Visualizing Biological Processes

Sunday, December 29, 2013

Gene Regulatory Networks (GRNs): What are they? How do we build them? Why do we need them?

The cell is an amazing machine that can respond and adapt to ever-changing environments. Each cell’s response is encoded in its DNA and activated by proteins to create other proteins. The expression of each gene has an effect on the balance within the complex system of interactions occurring in a cell.

What are Gene Regulatory Networks (GRNs)?

Essentially, they capture the correlation of transcription levels or the relationship of interactions between different genes. Response to any change in environment usually requires more than one gene. GRNs visualize the relationships between different genes by representing the connections as a graph. Each gene is represented as a node and relationships as lines connecting the genes. The complete network of gene interactions resembles a road map. And that is exactly what it is, a road map for us to understand all the effects that can occur when the levels of one gene are changed. Using this road map, we can see alternate routes that can be taken to substitute for a road (interaction) or town (protein) being blocked or removed.
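The road map idea can be sketched in a few lines of Python. This is only an illustration: the gene names and connections are made up, and the function find_route is a hypothetical helper, not part of any GRN tool. It stores the network as a dictionary of connections and uses a breadth-first search to find a route, optionally with one "town" blocked.

```python
from collections import deque

# Hypothetical toy network: each gene maps to the genes it is connected to.
network = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D"],
    "D": [],
}

def find_route(network, start, goal, blocked=()):
    """Breadth-first search for a shortest route from start to goal,
    skipping any gene listed in blocked. Returns None if no route exists."""
    blocked = set(blocked)
    if start in blocked or goal in blocked:
        return None
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in network.get(node, []):
            if neighbor not in visited and neighbor not in blocked:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None

print(find_route(network, "A", "D"))                  # usual route, through B
print(find_route(network, "A", "D", blocked=["B"]))   # detour through C
```

With B blocked, the search still reaches D through C, which is the kind of alternate route the road map is meant to reveal.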

Transcription factors (TFs) are usually proteins that assist in the transcription process. In eukaryotes, many TFs normally reside outside the nucleus and translocate to the nucleus once activated by another protein. Once inside the nucleus, they bind to DNA, which can change the configuration of the chromatin and may promote or repress transcription of a gene. It is these effects that we are trying to understand by building GRNs.

How do we build a GRN?

We can experimentally observe the locations of TFs bound to the DNA with high-throughput techniques, such as ChIP-chip and ChIP-seq. These methods essentially lock the TF to the DNA and then extract a specific TF along with the DNA to which it is locked. The DNA to which the TF was attached is then either matched against a known set of sequences (ChIP-chip) or sequenced and matched to the genome (ChIP-seq) to determine the binding location. Performing these experiments in different conditions or within a time series gives us enormous amounts of data in which we can look for patterns in where the TFs are bound to the DNA. It is these patterns that help determine the GRN structure.

We can also use the results of experiments that measure the levels of transcription throughout the genome (RNA-seq). This technique can also be used in different conditions and time series to show how and when differences in expression levels occur. These experiments give us more information about the underlying behavior of the complex system of interactions by looking at the levels of expression in each of the conditions.
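One simple way to turn expression levels measured across several conditions into candidate network connections is to correlate every pair of genes and keep the strongly correlated pairs. The sketch below assumes made-up expression values, hypothetical gene names, and an arbitrary cutoff of 0.9. It uses the Pearson correlation, which is only one of many possible similarity measures.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical expression levels of four genes in five conditions.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.1, 3.9, 6.2, 8.0, 9.8],   # rises with geneA
    "geneC": [5.0, 4.1, 2.9, 2.0, 1.1],   # falls as geneA rises
    "geneD": [3.0, 1.0, 4.0, 1.0, 3.0],   # no clear trend
}

def coexpression_edges(expression, threshold):
    """Return (gene1, gene2, r) for every pair with |r| at or above threshold."""
    genes = sorted(expression)
    edges = []
    for i, g1 in enumerate(genes):
        for g2 in genes[i + 1:]:
            r = pearson(expression[g1], expression[g2])
            if abs(r) >= threshold:
                edges.append((g1, g2, r))
    return edges

for g1, g2, r in coexpression_edges(expression, threshold=0.9):
    print(f"{g1} -- {g2}: r = {r:.3f}")
```

A strong correlation only says that two genes change together. It does not say which one regulates the other, or whether a third gene drives both.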

To extract information from the experimental data we must make some assumptions. It is assumed that a TF always binds near the gene it is regulating, although in some genomes the binding site may be many thousands of nucleotides away from the transcribed region. It is often assumed that the steady-state levels measured in DNA microarray expression profiling experiments are an accurate representation of transcription, and that transcription in turn represents the protein level. It is also assumed that expression of a transcription factor implies that the factor's protein is active, although it may actually require post-translational modifications or co-factors, or may be sequestered in signaling pathways, before becoming active.
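The first assumption, that a TF regulates a nearby gene, can be written down as a small rule: assign each binding location to the closest gene, and give up if nothing is close enough. In this sketch the gene start positions, binding locations, and the 5,000 nucleotide cutoff are all invented for illustration, and nearest_gene is a hypothetical helper.

```python
# Hypothetical transcription start sites on one chromosome (in nucleotides).
gene_starts = {"geneA": 1_000, "geneB": 12_000, "geneC": 30_000}

# Hypothetical TF binding locations from a ChIP experiment.
binding_sites = [1_200, 11_500, 20_000, 29_000]

MAX_DISTANCE = 5_000  # assumed cutoff; the right value depends on the genome

def nearest_gene(site, gene_starts, max_distance):
    """Return (gene, distance) for the closest gene start,
    or (None, distance) if even the closest gene is beyond max_distance."""
    gene, position = min(gene_starts.items(), key=lambda item: abs(item[1] - site))
    distance = abs(position - site)
    if distance > max_distance:
        return None, distance
    return gene, distance

for site in binding_sites:
    gene, distance = nearest_gene(site, gene_starts, MAX_DISTANCE)
    print(site, gene, distance)
```

The binding location at 20,000 is left unassigned here, which shows the weakness of the assumption: a real regulatory site that far from any gene would simply be missed.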

By abstracting away the intermediate steps for TF activation, we can search the data for general trends or other patterns of behavior using different techniques. There is a whole field of research in computational biology dedicated to the automated construction of GRNs using data mining and machine learning techniques. Below is a brief introduction to some of the methods.

If we are just concerned with the relationships between genes (ignoring the expression levels), then we can create Boolean networks. These can represent relationships such as: protein C is produced if protein A AND protein B are active, or C is produced if A but NOT B is active. Depending on the algorithms used to discover the patterns, these rules can become very complex. However, Boolean networks ignore the large amounts of data available in the expression levels. The level of a gene depends not only on which other TFs are active, but also on the levels of those factors. A set of equations lets us model the dependence on the level of each TF. For example, the level of protein C in the immediate future depends on the current level of C, the levels of A and B, and their rates of interaction.

|C|(t+1) = |C|(t) + a*|A|(t) + b*|B|(t)
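Both kinds of rule from the paragraph above can be sketched side by side. The first function is a Boolean rule (C is produced if A but NOT B is active). The second reads the equation as an update from one time step to the next. The starting levels and the values of a and b are arbitrary, and A and B are simply held constant, which a real model would not do.

```python
def boolean_step(state):
    """Boolean rule: C is produced if A is active and B is not."""
    new_state = dict(state)
    new_state["C"] = state["A"] and not state["B"]
    return new_state

def level_step(levels, a, b):
    """One update of C from the current levels of A, B, and C.
    A and B are left unchanged in this toy version."""
    new_levels = dict(levels)
    new_levels["C"] = levels["C"] + a * levels["A"] + b * levels["B"]
    return new_levels

# Boolean view: only on/off matters.
print(boolean_step({"A": True, "B": False, "C": False}))
print(boolean_step({"A": True, "B": True, "C": False}))

# Level view: A pushes C up (a > 0), B pulls it down (b < 0).
levels = {"A": 2.0, "B": 1.0, "C": 0.0}
for step in range(3):
    levels = level_step(levels, a=0.5, b=-0.25)
    print(step + 1, levels["C"])
```

In the Boolean version, doubling the level of A changes nothing. In the level version, it changes how fast C accumulates, which is the extra information the expression data can supply.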

These sets of equations, termed ordinary differential equation (ODE) models, can be very complex, and each gene may be dependent on many other genes. The large set of equations has many unknown parameters (a, b, …) that can be “solved” for by using the large amounts of experimental data. Unfortunately, there are so many parameters compared to the number of experiments that many different solutions are possible.
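The problem of too many parameters for too few experiments shows up even in the smallest case. The sketch below uses invented measurements: each experiment records the levels of A and B and the observed change in C. With one experiment, very different values of a and b explain the data equally well. A second experiment rules most of them out.

```python
def misfit(params, experiments):
    """Sum of squared differences between the observed change in C
    and the change predicted by a*A + b*B."""
    a, b = params
    return sum((change - (a * A + b * B)) ** 2 for A, B, change in experiments)

# Hypothetical candidate parameter sets (a, b).
candidates = [(0.5, 0.0), (0.0, 1.0), (1.0, -1.0)]

# Each experiment is (level of A, level of B, observed change in C).
one_experiment = [(2.0, 1.0, 1.0)]
two_experiments = [(2.0, 1.0, 1.0), (1.0, 3.0, 0.5)]

for params in candidates:
    print(params, misfit(params, one_experiment), misfit(params, two_experiments))
```

All three candidates fit the single experiment perfectly, even though they disagree about whether B activates C, represses it, or has no effect. Only the first candidate still fits once the second experiment is added.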

There are many other algorithms being used to capture the relationships. The behavior of the complex system can be viewed as a distribution of individual behaviors of the components all mixed together. There are methods that are very good at capturing those distributions, such as Bayesian networks or Hidden Markov Models. Because these models capture the state changes within the cell, they can also capture temporal features, such as time delays between raised levels of one gene and the effect on other genes. The more complex the modeling method, the more complex the patterns that can be captured, but more complex methods usually also require more experimental data.

Why do we need GRNs?

Perturbations, such as gene knockouts or gene mutations, will often lead to changes in transcription levels of many other genes and ultimately lead to altered phenotypes. Often these changes are detrimental to the cell. Many diseases are caused by subtle changes in the balance of the complex system of interactions. Even drugs intended to battle one of these perturbations can have far-reaching and sometimes dangerous side effects at other locations within a genome, cell, or tissue. Given the road map for interactions within a cell, we can predict how a cell will behave when one path or gene is removed from the network. Sometimes we can even predict the phenotypes of those cells.
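A very rough first prediction of what a knockout will disturb is the set of genes that lie downstream of the removed gene in the network. The sketch below assumes a made-up directed network in which each regulator points to the genes it controls. The function downstream_of is a hypothetical helper, and it says nothing about how strongly or in which direction each gene is affected.

```python
# Hypothetical directed network: regulator -> genes it regulates.
regulates = {
    "tfA": ["gene1", "tfB"],
    "tfB": ["gene2", "gene3"],
    "tfC": ["gene3"],
    "gene1": [],
    "gene2": [],
    "gene3": [],
}

def downstream_of(regulates, knocked_out):
    """Return every gene reachable from the knocked-out gene,
    directly or through other regulators."""
    affected = set()
    stack = list(regulates.get(knocked_out, []))
    while stack:
        gene = stack.pop()
        if gene not in affected:
            affected.add(gene)
            stack.extend(regulates.get(gene, []))
    return affected

print(sorted(downstream_of(regulates, "tfA")))
print(sorted(downstream_of(regulates, "tfC")))
```

Knocking out tfA touches four genes here, while knocking out tfC touches one, and that one (gene3) still has another regulator. Quantitative models are needed to say what actually happens to the levels.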

Gene Regulatory Network of E. coli by Guzmán-Vargas and Santillán

BMC Systems Biology 2008 2:13 doi:10.1186/1752-0509-2-13

As we obtain better and more quantitative models of the behavior of the cell, we will be better able to envision all the possible effects of changes, whether they come from disease or from our attempts to cure it.

Monday, September 9, 2013

Reading the cellular instruction manual

It is much easier to use a machine if you can read the manual. But being a typical male I can understand the need to ignore the manual and just play with the knobs and buttons to see what they do.

Molecular biology experiments are doing just that. They isolate a single complex or molecule and play with it, changing its concentration or changing the structure of the molecule and measuring the differences that occur in the cell. By systematically "playing with" each molecule's knobs and buttons, we have explored how those individual molecules change the behavior of the system.

Using this knowledge, we try to change the behavior of cells that are not behaving correctly with drugs that change the active concentrations of molecules or inhibit the cell from making the wrong molecules. We try turning the knobs in order to make the cell more normal. But often we are only looking at what is happening at one place in the cell. Every action within this complex system has many effects everywhere in the cell.

The instruction manual for a cell's behavior is encoded into the DNA. The problem is that we still don't know how to read it. We know that some of the molecules in the cells can bind with the DNA and change the behavior by producing more (or less) of a protein. The process of converting the DNA into proteins can be broken down into simple steps: DNA is copied into RNA by a process called transcription; the RNA is used as the working blueprint from which many proteins can be built by a process called translation.

DNA is transcribed into RNA which is translated into Proteins.

The statement above is the central dogma of molecular biology. The statement should be extended to be circular, because the creation of proteins has an effect on which DNA is transcribed. The cell is in a constant state of flux and the system responds to changes in the external environment or within the cell. The system has many feedback loops creating checks and balances to keep the production of proteins within acceptable ranges.
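A feedback loop of this kind can be sketched with a single protein that slows down its own production. The model below is a common textbook form chosen for illustration, not a description of any particular gene: production falls as the protein level rises, and the protein is removed at a rate proportional to its level. The parameter values and the function name simulate are assumptions.

```python
def simulate(start_level, max_production=10.0, half_point=1.0,
             removal_rate=1.0, dt=0.01, steps=5000):
    """Follow the protein level over time with small fixed time steps.
    Production is max_production / (1 + level / half_point), so more
    protein means less new protein (negative feedback)."""
    level = start_level
    for _ in range(steps):
        production = max_production / (1.0 + level / half_point)
        removal = removal_rate * level
        level += dt * (production - removal)
    return level

# Start with no protein, and with far too much protein.
print(simulate(0.0))
print(simulate(20.0))
```

Both runs settle at the same level, which is the check-and-balance behavior described above: whether the cell starts with too little or too much protein, the loop brings it back to the same range.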

The DNA is a very long manual with contingencies for many different functions: some are used only during development of the organism, some are activated only when the cell needs to divide, and some are activated only when the cell needs to die. Within the entire genome (all the DNA), only a small portion represents the blueprints for producing proteins, and these segments are called genes. Some of the other DNA is used to directly control the transcription of the genes, but the purpose of much of the DNA is still unknown.

The proteins that bind to the DNA, collectively called DNA binding factors, can actually read the sequence of DNA and bind in specific positions to control the transcription activity around that position. The DNA encodes the signals of where these molecules bind, and molecular biologists have discovered many of the signals used by each individual factor. But many of these signals overlap within the DNA sequence, which forces the individual factors to compete for binding to the DNA. This competition lets the local concentrations of proteins dictate which proteins are bound. For example, a low level of a protein in the cell does not activate the expression of a gene, but once its concentration reaches a level that allows it to out-compete other factors, transcription is initiated. There could also be a feedback loop that causes the cell to stop production when the concentration increases to a level that lets the protein out-compete other factors in a region that blocks or inhibits transcription. Controlling the expression of genes is known as transcriptional regulation.
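The competition for an overlapping site can be sketched with a standard equilibrium binding formula. This is a simplification that assumes a single site, two factors that cannot be bound at the same time, and a system at equilibrium. The concentrations and binding strengths are invented, and occupancy is a hypothetical helper name.

```python
def occupancy(conc_a, conc_b, kd_a, kd_b):
    """Fraction of the time the site is held by factor A when A and B
    compete for it. kd_a and kd_b are dissociation constants: a smaller
    value means tighter binding."""
    weight_a = conc_a / kd_a
    weight_b = conc_b / kd_b
    return weight_a / (1.0 + weight_a + weight_b)

# Factor B is held at a fixed concentration while factor A increases.
conc_b = 50.0
for conc_a in [1.0, 10.0, 60.0, 100.0, 1000.0]:
    fraction = occupancy(conc_a, conc_b, kd_a=10.0, kd_b=10.0)
    print(f"A = {conc_a:7.1f}  site held by A {fraction:.2f} of the time")
```

With these numbers, A holds the site only a small fraction of the time at low concentration and most of the time at high concentration, even though B never changes. Where the cell draws the line for "enough A to start transcription" is a separate question that this sketch does not answer.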

We are only beginning to understand the complex balance of different mechanisms used in the cell to regulate transcription. As we begin to understand the individual mechanisms down to their behavior at individual nucleotides of DNA, we begin to see the small effects that changes in the concentration of a factor can have on the system as a whole. It is this complex system of small and sometimes subtle shifts in factors bound to the DNA on which my research is focused. I am creating a model of these DNA buttons and knobs based on our current knowledge of each factor's behavior, in an attempt to understand some of the signals embedded in the DNA. And maybe someday soon we will begin to understand how to read the manual and begin learning what is written in our DNA.

Saturday, August 17, 2013

Welcome to the Visualization of Biological Processes Blog

Welcome to the Visualization of Biological Processes Blog. The intent is to follow the development of a teaching application that encompasses modeling, simulation and a dynamic visualization of the transcription regulation process. The application will be used to teach biology students the concepts of stochastic transcriptional regulation.

I will use this blog to explore and write about different aspects of the modeling, simulation, and visualization methods that are available as well as the selection of methods for our teaching application.
