The cell is an amazing machine that can respond and adapt to ever-changing environments. Each cell's response is encoded in its DNA and carried out by proteins that activate the production of other proteins. The expression of each gene affects the balance within the complex system of interactions occurring in a cell.
What are Gene Regulatory Networks (GRNs)?
Essentially, they capture the correlation of transcription levels, or the interactions, between different genes. A response to any change in environment usually requires more than one gene. GRNs visualize the relationships between genes as a graph: each gene is a node, and each relationship is a line connecting two genes. The complete network of gene interactions resembles a road map. And that is exactly what it is, a road map that helps us understand all the effects that can occur when the level of one gene is changed. Using this road map, we can see alternate routes that can substitute for a road (interaction) or town (protein) that is blocked or removed.
Transcription factors (TFs) are usually proteins that assist in the transcription process. In eukaryotes, many TFs normally reside outside the nucleus and translocate to the nucleus once activated by another protein. Once inside the nucleus they bind to DNA, change the configuration of the chromatin, and may promote or repress transcription of a gene. It is these effects that we are trying to understand by building GRNs.
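The road-map picture can be sketched as a small directed graph. The snippet below is only an illustration: the gene names and connections are invented placeholders, not real regulatory relationships, and find_route is a hypothetical helper that uses a plain breadth-first search to look for a route that avoids blocked nodes.

```python
from collections import deque

# Hypothetical toy network: each key regulates the genes in its list.
# The names and edges are placeholders, not real biology.
network = {
    "geneA": ["geneB", "geneC"],
    "geneB": ["geneD"],
    "geneC": ["geneD"],
    "geneD": [],
}

def find_route(graph, start, goal, blocked=frozenset()):
    """Breadth-first search for a shortest route from start to goal,
    skipping every node in `blocked`. Returns a list of nodes or None."""
    if start in blocked or goal in blocked:
        return None
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in graph.get(node, []):
            if neighbor not in visited and neighbor not in blocked:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None

# The usual route, a detour when one "town" is blocked, and no route at all
# when both intermediate genes are removed.
route = find_route(network, "geneA", "geneD")
detour = find_route(network, "geneA", "geneD", blocked={"geneB"})
no_route = find_route(network, "geneA", "geneD", blocked={"geneB", "geneC"})
print(route, detour, no_route)
```

In this toy network, blocking geneB still leaves a route through geneC, which is the kind of alternate path the road-map analogy describes.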
How do we build a GRN?
We can experimentally observe the locations of TFs bound to the DNA with high-throughput techniques, such as ChIP-chip and ChIP-seq. These methods essentially lock the TF to the DNA and then extract a specific TF along with the DNA to which it is locked. The bound DNA is then identified either by matching it against a known set of sequences (ChIP-chip) or by sequencing it and matching the reads to the genome (ChIP-seq), which determines the binding location. Performing these experiments in different conditions or within a time series gives us enormous amounts of data in which we can look for patterns in where the TFs are bound to the DNA. It is these patterns that help determine the GRN structure.
We can also use the results of experiments that measure the levels of transcription throughout the genome (RNA-seq). This technique can also be used in different conditions and time series to show how and when differences in expression levels occur. These experiments give us more information about the underlying behavior of the complex system of interactions by looking at the levels of expression in each of the conditions.
To extract information from the experimental data we must make some assumptions. It is assumed that a TF binds near the gene it regulates, although in some genomes the binding site may be many thousands of nucleotides away from the transcribed region. It is often assumed that the steady-state levels measured in DNA microarray expression profiling experiments accurately represent transcription, and that transcription in turn represents the protein level. It is also assumed that expression of a transcription factor implies that its protein is active, although the protein may actually require post-translational modifications or co-factors, or may be sequestered in a signaling pathway, before becoming active.
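The first assumption, that a TF regulates the gene nearest to its binding site, can be sketched in a few lines. Everything here is hypothetical: the coordinates are made-up toy values, nearest_gene is an illustrative helper, and the 5,000-nucleotide cutoff is an arbitrary choice. Real analyses also consider strand, gene bodies, and distal binding sites, which this sketch ignores.

```python
# Hypothetical gene start coordinates on a single chromosome (toy values).
gene_starts = {"geneA": 1_000, "geneB": 12_000, "geneC": 50_000}

# Hypothetical TF binding-site positions, e.g. peak centers from a ChIP experiment.
binding_sites = [1_400, 11_000, 30_000]

def nearest_gene(position, genes, max_distance=None):
    """Return (gene, distance) for the gene whose start is closest to
    `position`, or None if that distance exceeds `max_distance`."""
    gene, start = min(genes.items(), key=lambda item: abs(item[1] - position))
    distance = abs(start - position)
    if max_distance is not None and distance > max_distance:
        return None
    return gene, distance

# Assign each binding site to its nearest gene, within an arbitrary cutoff.
assignments = [
    nearest_gene(site, gene_starts, max_distance=5_000) for site in binding_sites
]
print(assignments)
```

The third site is left unassigned because no gene start lies within the cutoff, which shows how a "nearby" rule can miss regulation that acts over long distances.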
By abstracting away the intermediate steps for TF activation, we can search the data for general trends or other patterns of behavior using different techniques. There is a whole field of research in computational biology dedicated to the automated construction of GRNs using data mining and machine learning techniques. Below is a brief introduction to some of the methods.
If we are just concerned with the relationships between genes (ignoring the expression levels), then we can create Boolean networks. These can represent relationships such as: protein C is produced if protein A AND protein B are active, or C is produced if A but NOT B is active. Depending on the algorithms used to discover the patterns, these rules can become very complex. However, Boolean networks ignore the large amounts of data available in the expression levels. The level of a gene depends not only on whether other TFs are active, but also on the levels of those factors. Writing a set of equations lets us model the dependence on the level of each TF. For example, the level of protein C in the immediate future depends on the current level of C, the levels of A and B, and their rates of interaction.
|C|(t+1) = |C|(t) + a*|A|(t) + b*|B|(t)
These sets of equations, termed ordinary differential equation (ODE) models, can be very complex, and each gene may depend on many other genes. The large set of equations has many unknown parameters (a, b, …) that can be “solved” by using the large amounts of experimental data. Unfortunately, there are so many parameters compared to the number of experiments that many different solutions are possible.
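The problem of too many parameters can be seen even in the two-parameter example for protein C. The sketch below is illustrative only: the measurements and the candidate values of a and b are invented, and step is a hypothetical helper that applies one update of the rule.

```python
def step(c, a_level, b_level, a, b):
    """One update of the toy rule: next |C| = current |C| + a*|A| + b*|B|."""
    return c + a * a_level + b * b_level

def fits(experiment, a, b, tol=1e-9):
    """True if parameters (a, b) reproduce the observed next level of C."""
    c_now, a_level, b_level, c_next = experiment
    return abs(step(c_now, a_level, b_level, a, b) - c_next) < tol

# Invented measurements: (current |C|, |A|, |B|, observed next |C|).
experiment_1 = (1.0, 2.0, 1.0, 2.0)
experiment_2 = (1.0, 1.0, 1.0, 1.75)

# Two different parameter guesses (a, b); both are illustrative values.
candidates = [(0.5, 0.0), (0.25, 0.5)]

# With a single experiment, both guesses explain the data equally well.
fit_one = [fits(experiment_1, a, b) for a, b in candidates]

# A second experiment rules out one of the guesses.
fit_both = [
    fits(experiment_1, a, b) and fits(experiment_2, a, b) for a, b in candidates
]
print(fit_one, fit_both)
```

With one experiment and two unknowns, both guesses are consistent with the data. Adding a second experiment leaves only one of them. Real networks have far more parameters than experiments, so the ambiguity usually remains.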
There are many other algorithms being used to capture the relationships. The behavior of the complex system can be viewed as a distribution of individual behaviors of the components all mixed together. There are methods that are very good at capturing those distributions, such as Bayesian networks or Hidden Markov Models. Because these models capture the state changes within the cell, they can also capture temporal features, such as time delays between raised levels of one gene and the effect on other genes. The more complex the modeling method, the more complex the patterns that can be captured, but more complex methods usually also require more experimental data.
Why do we need GRNs?
Perturbations, such as gene knockouts or gene mutations, will often lead to changes in transcription levels of many other genes and ultimately lead to altered phenotypes. Often these changes are detrimental to the cell. Many diseases are caused by subtle changes in the balance of the complex system of interactions. Even drugs intended to battle one of these perturbations can have far-reaching and sometimes dangerous side effects at other locations within a genome, cell, or tissue. Given the road map for interactions within a cell, we can predict how a cell will behave when one path or gene is removed from the network. Sometimes we can even predict the phenotypes of those cells.
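A knockout prediction can be sketched with the Boolean-style rules described earlier. The network below is a made-up toy: the rule for C follows the "A but NOT B" example, while gene D, the treatment of A and B as fixed inputs, and the simulate helper are assumptions added purely for illustration.

```python
# Hypothetical Boolean rules: each gene's next state is computed from the
# current state of the whole network (synchronous update).
rules = {
    "A": lambda s: s["A"],                 # A is treated as an external input
    "B": lambda s: s["B"],                 # B is treated as an external input
    "C": lambda s: s["A"] and not s["B"],  # C is produced if A but NOT B is active
    "D": lambda s: s["C"],                 # D (an invented gene) follows C
}

def simulate(rules, state, steps, knocked_out=()):
    """Apply the rules `steps` times, forcing knocked-out genes to stay off."""
    state = dict(state)
    for gene in knocked_out:
        state[gene] = False
    for _ in range(steps):
        state = {
            gene: False if gene in knocked_out else bool(rule(state))
            for gene, rule in rules.items()
        }
    return state

initial = {"A": True, "B": True, "C": False, "D": False}
wild_type = simulate(rules, initial, steps=3)
b_knockout = simulate(rules, initial, steps=3, knocked_out=("B",))
print(wild_type)
print(b_knockout)
```

In this toy network, removing B releases the repression of C, and the change then propagates to D one step later. This is a small-scale version of one perturbation altering the levels of several other genes.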
Gene Regulatory Network of E. coli by Guzmán-Vargas and Santillán, BMC Systems Biology 2008 2:13 doi:10.1186/1752-0509-2-13