
Sunday, December 29, 2013

Gene Regulatory Networks (GRNs): What are they? How do we build them? Why do we need them?


The cell is an amazing machine that can respond and adapt to ever-changing environments. Each cell’s responses are encoded in its DNA and carried out when proteins activate genes that produce other proteins. The expression of each gene shifts the balance within the complex system of interactions occurring in the cell.

What are Gene Regulatory Networks (GRNs)?  

Essentially, they capture the correlations between the transcription levels of different genes, or the regulatory interactions between them. Responding to any change in the environment usually requires more than one gene. GRNs represent the relationships between genes as a graph: each gene is a node, and each relationship is a line (edge) connecting two genes. The complete network of gene interactions resembles a road map. And that is exactly what it is: a road map that helps us understand all the effects that can occur when the level of one gene changes. Using this road map, we can see alternate routes that can substitute for a road (interaction) or town (gene) that has been blocked or removed.
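To make the road-map idea concrete, here is a minimal Python sketch, assuming a small hypothetical network stored as an adjacency list (each gene maps to the genes it regulates). A breadth-first search finds a shortest regulatory route between two genes, and treating some edges as blocked shows whether an alternate route exists. The gene names and the find_path helper are illustrative only, not taken from any real GRN or library.

```python
from collections import deque

# Hypothetical toy network: each gene maps to the genes it regulates.
GRN = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D"],
    "D": ["E"],
    "E": [],
}

def find_path(graph, start, goal, blocked=frozenset()):
    """Breadth-first search for a shortest regulatory path from start to goal.

    `blocked` is a set of (source, target) edges treated as removed,
    like a blocked road on the map. Returns a list of genes, or None.
    """
    parents = {start: None}
    queue = deque([start])
    while queue:
        gene = queue.popleft()
        if gene == goal:
            # Walk back through the parent links to rebuild the route.
            path = []
            while gene is not None:
                path.append(gene)
                gene = parents[gene]
            return path[::-1]
        for target in graph.get(gene, []):
            if (gene, target) not in blocked and target not in parents:
                parents[target] = gene
                queue.append(target)
    return None

print(find_path(GRN, "A", "E"))                        # ['A', 'B', 'D', 'E']
print(find_path(GRN, "A", "E", blocked={("A", "B")}))  # ['A', 'C', 'D', 'E']
print(find_path(GRN, "A", "E", blocked={("D", "E")}))  # None
```

Blocking the A to B interaction still leaves a route through C, while blocking D to E cuts E off entirely, because every route passes through that single interaction.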

Transcription factors (TFs) are proteins that regulate transcription. In eukaryotes, many TFs normally reside outside the nucleus and translocate into it once activated by another protein. Once inside the nucleus, they bind to DNA, where they can change the configuration of the chromatin and promote or repress transcription of a gene. It is these effects that we are trying to understand by building GRNs.

How do we build a GRN? 

We can experimentally observe where TFs are bound to the DNA with high-throughput techniques such as ChIP-chip and ChIP-seq. These methods chemically cross-link (lock) the TFs to the DNA, then use an antibody to pull down a specific TF together with the DNA fragments attached to it. The bound DNA is then identified either by hybridizing it to a microarray of known sequences (ChIP-chip) or by sequencing it and mapping the reads back to the genome (ChIP-seq), which reveals the binding locations. Performing these experiments under different conditions, or as a time series, gives us enormous amounts of data in which we can look for patterns in where the TFs bind. It is these patterns that help determine the structure of the GRN.
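One simple way to look for such patterns can be sketched in Python, assuming binding peaks have already been assigned to target genes: count how many conditions each TF-target pair appears in and keep the pairs that recur. The binding calls, condition names, and the consistent_edges helper are all hypothetical. This is only one heuristic; it deliberately discards condition-specific binding, which can itself be biologically meaningful.

```python
from collections import Counter

# Hypothetical binding calls: for each condition, the set of (TF, target gene)
# pairs where a ChIP-seq peak was assigned to that target.
binding_by_condition = {
    "heat_shock": {("TF1", "geneA"), ("TF1", "geneB"), ("TF2", "geneC")},
    "cold_shock": {("TF1", "geneA"), ("TF2", "geneC")},
    "starvation": {("TF1", "geneA"), ("TF2", "geneD")},
}

def consistent_edges(calls, min_conditions):
    """Keep TF -> target edges observed in at least `min_conditions` conditions.

    Returns a dict mapping each kept (TF, target) pair to its condition count.
    """
    counts = Counter(pair for pairs in calls.values() for pair in pairs)
    return {pair: n for pair, n in counts.items() if n >= min_conditions}

edges = consistent_edges(binding_by_condition, min_conditions=2)
for (tf, target), n in sorted(edges.items()):
    print(f"{tf} -> {target}: bound in {n} conditions")
```

Here TF1 binding geneA (seen in all three conditions) and TF2 binding geneC (seen in two) survive, while the single-condition calls are dropped.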
We can also use the results of experiments that measure transcript levels across the whole genome, such as RNA-seq. These experiments, too, can be performed under different conditions or as a time series to show how and when expression levels change. Comparing the expression levels across conditions gives us more information about the underlying behavior of the complex system of interactions.
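Expression data like this is often turned into a network by correlation, which matches the description of GRNs as capturing correlations between transcription levels. The sketch below, assuming made-up expression values for four hypothetical genes across five conditions, computes Pearson correlations and links gene pairs whose absolute correlation is at least 0.9 (an arbitrary threshold chosen for illustration). Correlation gives undirected edges and does not by itself show which gene regulates which.

```python
import math
from itertools import combinations

# Hypothetical expression levels for four genes across five conditions.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.1, 3.9, 6.2, 8.0, 9.9],   # rises with geneA
    "geneC": [5.0, 4.1, 2.9, 2.0, 1.1],   # falls as geneA rises
    "geneD": [3.0, 1.0, 4.0, 1.0, 5.0],   # no clear relationship
}

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def coexpression_edges(profiles, threshold=0.9):
    """Connect gene pairs whose absolute correlation meets the threshold."""
    edges = []
    for g1, g2 in combinations(sorted(profiles), 2):
        r = pearson(profiles[g1], profiles[g2])
        if abs(r) >= threshold:
            edges.append((g1, g2, round(r, 3)))
    return edges

for edge in coexpression_edges(expression):
    print(edge)
```

With these values, geneA, geneB, and geneC end up connected to each other (geneC through negative correlations), while geneD, whose profile does not track the others, has no edges.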

To extract information from the experimental data, we must make some assumptions. It is assumed that a TF always binds near the gene it regulates, although in some genomes the binding site may be many thousands of nucleotides away from the transcribed region. It is often assumed that the steady-state mRNA levels measured in DNA microarray expression profiling experiments accurately represent transcription, and that transcript levels represent protein levels. It is also assumed that expression of a transcription factor implies that the factor's protein is active, although it may actually require post-translational modifications or co-factors, or may be sequestered in signaling pathways before becoming active.
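The first assumption can be sketched as a nearest-gene rule: each binding peak is assigned to the gene whose transcription start site (TSS) is closest, provided it lies within a distance cutoff. The TSS coordinates, the 10,000 bp cutoff, and the nearest_gene helper below are hypothetical choices for illustration. A peak with no gene in range is left unassigned, and a distal regulatory site would be missed or assigned to the wrong gene.

```python
import bisect

# Hypothetical TSS positions (bp) on one chromosome, sorted by position.
TSS = [(1_000, "geneA"), (15_000, "geneB"), (40_000, "geneC")]

def nearest_gene(peak_pos, tss=TSS, max_distance=10_000):
    """Assign a TF binding peak to the gene with the nearest TSS.

    Returns None if no TSS lies within max_distance bp of the peak.
    """
    positions = [pos for pos, _ in tss]
    i = bisect.bisect_left(positions, peak_pos)
    # Only the TSS just before and just after the peak can be nearest.
    candidates = [tss[j] for j in (i - 1, i) if 0 <= j < len(tss)]
    pos, gene = min(candidates, key=lambda c: abs(c[0] - peak_pos))
    return gene if abs(pos - peak_pos) <= max_distance else None

for peak in (1_200, 14_000, 27_000, 39_500):
    print(peak, nearest_gene(peak))
```

With these values, the peaks at 1,200, 14,000, and 39,500 bp map to geneA, geneB, and geneC, while the peak at 27,000 bp is more than 10,000 bp from every TSS and is left unassigned.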
By abstracting away the intermediate steps of TF activation, we can search the data for general trends or other patterns of behavior using different techniques. There is a whole field of research in computational biology dedicated to the automated construction of GRNs using data mining and machine learning techniques. Below is a brief introduction to some of these methods.
If we are concerned only with the relationships between genes (ignoring expression levels), we can build Boolean networks. These can represent rules such as: protein C is produced if protein A AND protein B are active, or C is produced if A is active but NOT B. Depending on the algorithms used to discover the patterns, these rules can become very complex. However, Boolean networks ignore the large amount of information available in the expression levels. The level of a gene depends not only on which TFs are active, but also on the levels of those factors. A set of equations can capture how the level of each gene depends on the levels of its TFs. For example, the level of protein C in the immediate future depends on the current level of C, the levels of A and B, and their rates of interaction:
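$$|C|_{t+\Delta t} = |C|_t + \left(a\,|A|_t + b\,|B|_t\right)\Delta t, \qquad \text{which as } \Delta t \to 0 \text{ becomes} \qquad \frac{d|C|}{dt} = a\,|A| + b\,|B|$$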
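The difference between the two model types can be sketched on the same three genes. In the Python sketch below, the Boolean rules only look at whether A and B are on or off, while the continuous version treats the equation for C as one forward-Euler step: the new level of C is the current level plus the contributions of A and B over a small time step dt. The rate values, the time step, the constant levels of A and B, and the choice of a negative rate for B (so that B acts as a repressor) are all assumptions for illustration.

```python
from itertools import product

def c_and(a_on: bool, b_on: bool) -> bool:
    """Boolean rule: C is produced if A AND B are active."""
    return a_on and b_on

def c_and_not(a_on: bool, b_on: bool) -> bool:
    """Boolean rule: C is produced if A is active but NOT B."""
    return a_on and not b_on

def euler_step(c: float, a: float, b: float,
               rate_a: float, rate_b: float, dt: float) -> float:
    """One forward-Euler step of d|C|/dt = rate_a*|A| + rate_b*|B|."""
    return c + (rate_a * a + rate_b * b) * dt

# Boolean view: only on/off states matter.
print("A      B      A AND B  A AND NOT B")
for a_on, b_on in product([False, True], repeat=2):
    print(a_on, b_on, c_and(a_on, b_on), c_and_not(a_on, b_on))

# Continuous view: levels and rates matter (hypothetical values).
c = 0.0
for _ in range(10):
    c = euler_step(c, a=1.0, b=0.5, rate_a=0.8, rate_b=-0.4, dt=0.1)
print(f"|C| after 10 steps: {c:.2f}")  # 0.60
```

Because this linear form has no degradation term, |C| keeps rising for as long as A and B stay at these levels; the Boolean rules, by contrast, cannot express that B only partly offsets A.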
These sets of equations, termed ordinary differential equation (ODE) models, can be very complex, and each gene may depend on many other genes. The large set of equations has many unknown parameters (a, b, …) that can be “solved” for using the large amounts of experimental data. Unfortunately, there are so many parameters compared with the number of experiments that many different solutions fit the data equally well.
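Why many solutions can fit the same data can be seen in a minimal sketch, assuming four hypothetical experiments in which the level of B always happens to be twice that of A (for example, because both respond to the same upstream signal). The experiments then carry information about only one combination of the parameters, a + 2b, so very different values of a and b fit the observed rates of change of C equally well. The data and the sum_squared_error helper are made up for illustration.

```python
# Hypothetical measurements from four experiments: levels of A and B, and the
# observed rate of change of C. Here B always equals 2 * A.
A = [1.0, 2.0, 3.0, 4.0]
B = [2.0, 4.0, 6.0, 8.0]
dC = [3.0, 6.0, 9.0, 12.0]

def sum_squared_error(a, b):
    """How badly the linear model dC = a*A + b*B fits the observations."""
    return sum((a * x + b * y - obs) ** 2 for x, y, obs in zip(A, B, dC))

# Very different parameter sets, all satisfying a + 2b = 3.
for a, b in [(3.0, 0.0), (1.0, 1.0), (0.0, 1.5), (5.0, -1.0)]:
    print(f"a={a:5}, b={b:5}: error = {sum_squared_error(a, b)}")
```

Every parameter pair listed gives zero error, including one where B activates C and one where it represses it; only new experiments in which A and B vary independently could tell them apart.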
There are many other algorithms being used to capture these relationships. The behavior of the complex system can be viewed probabilistically, as a joint distribution over the individual behaviors of its components. Some methods are very good at capturing such distributions, such as Bayesian networks and Hidden Markov Models. When these models describe state changes within the cell over time (as Hidden Markov Models and dynamic Bayesian networks do), they can also capture temporal features, such as the time delay between a raised level of one gene and its effect on other genes. The more complex the modeling method, the more complex the patterns it can capture, but such methods usually also require more experimental data.
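As an illustration of how a Hidden Markov Model handles a time series, here is a minimal sketch of the forward algorithm, which computes the probability of an observed sequence by summing over all possible hidden-state paths. The two-state model of a single gene (hidden regulator activity off/on, observed expression low/high) and all of its probabilities are hypothetical. The self-transition probabilities control how long a state tends to persist, which is one way such models can represent delays. The brute-force enumeration is included only to check the result on a short sequence.

```python
from itertools import product

# Hypothetical two-state model of one gene; all probabilities are illustrative.
STATES = ("off", "on")
START = {"off": 0.7, "on": 0.3}
# Self-transitions (off->off, on->on) set how long each state tends to last.
TRANS = {"off": {"off": 0.8, "on": 0.2},
         "on":  {"off": 0.3, "on": 0.7}}
EMIT = {"off": {"low": 0.9, "high": 0.1},
        "on":  {"low": 0.2, "high": 0.8}}

def forward(observations):
    """Forward algorithm: P(observations), summed over all hidden state paths."""
    alpha = {s: START[s] * EMIT[s][observations[0]] for s in STATES}
    for obs in observations[1:]:
        alpha = {s: sum(alpha[p] * TRANS[p][s] for p in STATES) * EMIT[s][obs]
                 for s in STATES}
    return sum(alpha.values())

def brute_force(observations):
    """Same quantity by enumerating every hidden path (exponential cost)."""
    total = 0.0
    for path in product(STATES, repeat=len(observations)):
        p = START[path[0]] * EMIT[path[0]][observations[0]]
        for prev, cur, obs in zip(path, path[1:], observations[1:]):
            p *= TRANS[prev][cur] * EMIT[cur][obs]
        total += p
    return total

seq = ["low", "low", "high", "high"]
print(forward(seq), brute_force(seq))
```

The forward algorithm grows linearly with the length of the time series, whereas enumerating paths grows exponentially, which is what makes HMMs practical for long expression time courses.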

Why do we need GRNs?   

Perturbations, such as gene knockouts or mutations, often lead to changes in the transcription levels of many other genes and ultimately to altered phenotypes. Often these changes are detrimental to the cell. Many diseases are caused by subtle changes in the balance of the complex system of interactions. Even drugs intended to counter one of these perturbations can have far-reaching and sometimes dangerous side effects elsewhere in the genome, cell, or tissue. Given the road map of interactions within a cell, we can predict how the cell will behave when one path or gene is removed from the network. Sometimes we can even predict the phenotypes of those cells.
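A knockout prediction can be sketched with a small Boolean network. Under the assumptions below (four hypothetical genes and rules, synchronous updates, and a knocked-out gene forced off at every step), the network is run to a steady state with and without each knockout, and the results are compared. The rules, gene names, and the steady_state helper are illustrative only.

```python
# Hypothetical Boolean rules: each gene's next state from the current state.
RULES = {
    "A": lambda s: s["A"],                  # external signal, held constant
    "B": lambda s: s["A"],                  # A activates B
    "C": lambda s: s["A"] and not s["B"],   # A activates C, B represses C
    "D": lambda s: s["B"] or s["C"],        # either B or C can activate D
}

def steady_state(initial, knockout=None, max_steps=100):
    """Synchronously update all genes until the state stops changing.

    A knocked-out gene is forced to False at every step.
    """
    state = dict(initial)
    if knockout:
        state[knockout] = False
    for _ in range(max_steps):
        nxt = {gene: rule(state) for gene, rule in RULES.items()}
        if knockout:
            nxt[knockout] = False
        if nxt == state:
            return state
        state = nxt
    raise RuntimeError("no fixed point reached (the network may oscillate)")

start = {"A": True, "B": False, "C": False, "D": False}
for ko in (None, "B", "A"):
    print(ko or "wild type", steady_state(start, knockout=ko))
```

In this toy network, knocking out B leaves D on, because C takes over as an alternate route once B no longer represses it, while knocking out A switches every downstream gene off.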

Gene Regulatory Network of E. coli by Guzmán-Vargas and Santillán, BMC Systems Biology 2008, 2:13, doi:10.1186/1752-0509-2-13


The better and more quantitative our models of cell behavior become, the better we will be able to anticipate all the possible effects of changes, whether they come from disease or from our attempts to cure it.