Tutorial - Reverse Engineering

Revision as of 12:43, 14 August 2006 by Smith



Overview

The Reverse Engineering component included in geWorkbench is being rewritten (as of April 2006), and hence in a future release the details of the interface may change. However, the functionality will be similar.

The primary use of the Reverse Engineering component is to infer regulatory interactions between genes and gene products. It finds these interactions using mutual information, a concept from information theory. Mutual information is in principle more sensitive and flexible than a simple correlation calculation, since it can also detect non-linear dependence. It is also invariant under invertible transformations of the data (for example, a log transform), so the details of normalization should not be important.
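
As a rough illustration of the idea, the sketch below estimates the mutual information between two expression profiles by sorting each profile into equal-frequency rank bins and applying the standard plug-in formula to the joint histogram. This is an assumption made for illustration only: the estimator actually used by the Reverse Engineering component is not described in this tutorial, and the function names and the bin count are hypothetical.

```python
import math
from collections import Counter


def rank_bins(values, n_bins):
    """Assign each value to one of n_bins equal-frequency bins by rank.

    Ties are broken by position, so the sketch is best suited to
    profiles without repeated values.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, idx in enumerate(order):
        bins[idx] = rank * n_bins // len(values)
    return bins


def mutual_information(x, y, n_bins=4):
    """Plug-in mutual information estimate (in nats) from binned ranks."""
    bx, by = rank_bins(x, n_bins), rank_bins(y, n_bins)
    n = len(x)
    joint = Counter(zip(bx, by))   # joint bin counts
    px, py = Counter(bx), Counter(by)  # marginal bin counts
    mi = 0.0
    for (i, j), c in joint.items():
        # p(i,j) * log( p(i,j) / (p(i) * p(j)) ), written with counts
        mi += (c / n) * math.log(c * n / (px[i] * py[j]))
    return mi
```

Because this sketch bins by rank, any strictly increasing transformation of a profile (such as a log transform) leaves the estimate unchanged, which mirrors the invariance property mentioned above.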

Reverse Engineering in the context of geWorkbench

The Reverse Engineering component calculates the information that the expression pattern of one gene carries about the expression of another gene; that is, it is a pairwise calculation. Larger datasets, containing more arrays per marker, will yield greater sensitivity and better statistical support. Full-scale runs of reverse engineering algorithms, which compare all markers against each other on datasets containing several hundred microarrays, are usually performed on large cluster computers and are not feasible on a desktop machine. During 2006 we hope to provide a remote service that can host such calculations for jobs launched from geWorkbench. At present, however, smaller-scale calculations are supported directly in geWorkbench.

As typically used in geWorkbench, the Reverse Engineering component calculates the Mutual Information score between a single hub gene and all other N markers in the dataset. In a second step, a subset containing the best M markers is chosen (with a current limit of 100), and a complete pairwise mutual information calculation, M(M-1)/2 comparisons (roughly MxM/2), is performed between them. The network resulting from this calculation can be displayed as a branched tree of interactions within the Cytoscape component.
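
The two-step procedure described above can be sketched as follows. This is a minimal outline, not the component's implementation: the function names are hypothetical, the scoring function is passed in as a parameter, and `toy_score` is only a stand-in so that the example runs without a mutual information estimator.

```python
from itertools import combinations


def rank_against_hub(data, hub, score, cutoff=0.0):
    """Step 1: score every other marker against the hub, best first.

    data maps marker name -> list of expression values (one per array).
    Only markers scoring above the cutoff are returned.
    """
    scored = [(name, score(data[hub], values))
              for name, values in data.items() if name != hub]
    kept = [(name, s) for name, s in scored if s > cutoff]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


def pairwise_scores(data, markers, score):
    """Step 2: score every unordered pair among the chosen markers."""
    return {(a, b): score(data[a], data[b])
            for a, b in combinations(markers, 2)}


def toy_score(x, y):
    """Stand-in similarity (NOT mutual information), for illustration."""
    return 1.0 / (1.0 + sum(abs(a - b) for a, b in zip(x, y)))


data = {
    "hub": [1, 2, 3, 4],
    "a": [1, 2, 3, 4],
    "b": [4, 3, 2, 1],
    "c": [1, 2, 4, 3],
    "d": [2, 3, 4, 5],
}
ranked = rank_against_hub(data, "hub", toy_score, cutoff=0.15)
top = [name for name, _ in ranked][:100]   # at most M = 100 markers
pairs = pairwise_scores(data, top, toy_score)
```

For M = 100 the second step covers 100 x 99 / 2 = 4950 pairs, which is why it remains feasible on a desktop machine while an all-against-all run over every marker is not.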

Prerequisites

A dataset containing multiple arrays (the more the better) should be loaded into geWorkbench. If data is loaded from separate files, it should be merged into a single microarray dataset, either when it is read in or afterwards. See the section Projects and Data Files. In this tutorial we will load a dataset also used in other tutorial sections, which has been normalized and then filtered to reduce the number of genes. This file, "webmatrix_quantile_log2_dev1_mv0.exp", will be made available in the tutorial data section. It can also be obtained by starting with the file "webmatrix.exp", available in the downloads section, and performing the following steps:

1. Load the file webmatrix.exp.

2. Quantile Normalize.

3. Log2 transform (also in the Normalize tab).

4. Filter out values having deviation less than 1.

5. Remove markers with filtered-out values using the missing values filter with a threshold of 0.
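
For readers who want to see what these steps do to the numbers, here is a minimal sketch outside geWorkbench. It rests on assumptions: "deviation" is taken to mean the sample standard deviation of a marker across arrays, steps 4 and 5 are collapsed into a single filter that drops such markers, and all function names are hypothetical. geWorkbench's own filters may differ in detail.

```python
import math
import statistics


def quantile_normalize(rows):
    """Give every array the same value distribution.

    rows maps marker name -> list of values, one per array.
    The k-th smallest value on each array is replaced by the mean of
    the k-th smallest values over all arrays. Ties are broken by order.
    """
    markers = list(rows)
    n_arrays = len(rows[markers[0]])
    sorted_cols = [sorted(rows[m][j] for m in markers)
                   for j in range(n_arrays)]
    ref = [sum(col[k] for col in sorted_cols) / n_arrays
           for k in range(len(markers))]
    out = {m: [0.0] * n_arrays for m in markers}
    for j in range(n_arrays):
        order = sorted(markers, key=lambda m: rows[m][j])
        for k, m in enumerate(order):
            out[m][j] = ref[k]
    return out


def log2_transform(rows):
    """Log2-transform every value (values must be positive)."""
    return {m: [math.log2(v) for v in values] for m, values in rows.items()}


def filter_low_deviation(rows, min_dev=1.0):
    """Keep markers whose standard deviation across arrays is >= min_dev."""
    return {m: values for m, values in rows.items()
            if statistics.stdev(values) >= min_dev}
```

The three functions would be applied in the order of the steps above: quantile normalization, then the log2 transform, then the deviation filter.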

Example - Profiler

  • Load the data file "webmatrix_quantile_log2_dev1_mv0.exp". This contains a set of 100 experiments on Affymetrix HG_U95Av2 chips. As described above, this file has been quantile normalized, log2 converted, and then filtered to remove markers with a deviation of less than 1, in order to reduce the size of the dataset, leaving 3837 markers.
  • In the upper right section of geWorkbench find the Reverse Engineering component. It should by default be displaying the Profiler tab.
  • In the Markers component search box, on the left side of the geWorkbench interface, enter 1973 and hit enter. This will find the marker 1973_s_at, which is the c-Myc gene, a well-known transcription factor with many interactions. Click on this marker in the list. This will enter the marker into the Hub Gene Label field of the Profiler.


T Markers Search1973.png


  • The default setting in the Profiler is Mutual Information (fast). With this selected, hit Analyze(2D). This will return a list of all markers having an MI score greater than the cutoff value (the default is 0.2).


T ReverseEngineering Basic.png


The next step is to create a network. By default, the calculation will be performed for the top 100 scoring genes on the list.

  • If you wish to use a smaller set of genes, select just those you wish to include from the list by highlighting them.
  • By right-clicking and selecting "Add to Set", the selected group will be added to the Markers component as a new set of markers which can be used in other components (sequence retrieval, annotation retriever etc.). If no selection is made, up to the top 100 scoring genes will be added to the new set.
  • Hit the Create Network button. The Mutual Information algorithm will be run again on the selected markers, but this time it will include all pairwise combinations of the selected genes, not just each against the hub gene. Each gene is then connected via an edge with the gene it most strongly interacts with, with the chosen hub-gene at the center.
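
The edge-selection rule in the last step can be sketched as below. This is one literal reading of "each gene is connected with the gene it most strongly interacts with"; the function name and the example scores are hypothetical, and the component may apply further rules (for example, to keep the hub gene at the center) that are not described here.

```python
def strongest_partner_edges(scores):
    """Connect each gene to its single highest-scoring partner.

    scores maps an unordered pair (gene_a, gene_b) -> interaction score.
    Returns a sorted list of undirected edges as (gene, gene) tuples.
    """
    best = {}  # gene -> (best partner, best score)
    for (a, b), s in scores.items():
        for gene, partner in ((a, b), (b, a)):
            if gene not in best or s > best[gene][1]:
                best[gene] = (partner, s)
    # Store each edge with its endpoints sorted so A-B and B-A coincide.
    edges = {tuple(sorted((gene, partner)))
             for gene, (partner, _) in best.items()}
    return sorted(edges)


# Made-up scores for a hub ("MYC") and three other genes.
example_scores = {
    ("MYC", "A"): 0.9, ("MYC", "B"): 0.5, ("A", "B"): 0.7,
    ("MYC", "C"): 0.3, ("A", "C"): 0.1, ("B", "C"): 0.2,
}
example_edges = strongest_partner_edges(example_scores)
```

In this made-up example gene B attaches to A rather than directly to the hub, because its score with A is higher; this is how branches away from the hub arise in the displayed tree.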

After network creation has been run, an adjacency matrix will be placed in the Projects Folder:


T ProjectFolders AdjacencyMatrix.png


  • Select the Cytoscape visualizer if it is not already active. The newly created network will appear similar to that shown here (we have selected the central hub-gene by clicking on it):


T ReverseEngineering InitialNet.png


A better visualization can be created in Cytoscape by going to the Layout menu and choosing yFiles->organic. The layout will now appear as:


T ReverseEngineering Central1973.png


  • Within the network created in Cytoscape, select the central gene as already shown above, and then from the Cytoscape menu choose Select->Nodes->First Neighbors of selected nodes.

T ReverseEngineering SelectFirstNeighbors.png


The first neighbors will be highlighted in the graph,


T ReverseEngineering FirstNeighbors.png


and also added as a new set in the Markers component.


T Markers ReverseEng Selected.png
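
Conceptually, the first neighbors of a selection are the nodes that share an edge with any selected node. The sketch below shows that idea on a plain edge list; it does not use the Cytoscape API, and the function name and gene names are hypothetical.

```python
def first_neighbors(edges, selected):
    """Return the nodes directly connected to any node in `selected`.

    edges is an iterable of undirected (node, node) pairs.
    The selected nodes themselves are not included in the result.
    """
    selected = set(selected)
    neighbors = set()
    for a, b in edges:
        if a in selected:
            neighbors.add(b)
        if b in selected:
            neighbors.add(a)
    return neighbors - selected


example_edges = [("MYC", "A"), ("MYC", "B"), ("A", "C")]
hub_neighbors = first_neighbors(example_edges, ["MYC"])
```

Here C is not returned, because it is linked to the hub only through A.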


We can return to the main Reverse Engineering component by clicking on the original dataset in the Project Folders component. If we select the first (highest MI score) marker on the list, the graph shown below is drawn in the Motif Location Histogram display. This shows a plot of the expression values on each array for the selected hub marker vs any other marker selected in the list.


T ReverseEngineering MotifHistogram.png

Options

Pearson - Uses a Pearson correlation function to calculate the interaction scores.
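
For reference, the standard Pearson correlation coefficient between two expression profiles can be computed as in the sketch below. The function name is hypothetical, and how the component turns the coefficient into an interaction score (for example, whether it takes the absolute value) is not stated here.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sum of products of deviations from the means (unnormalized covariance).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


linear = pearson([1, 2, 3, 4], [2, 4, 6, 8])          # perfectly linear
curved = pearson([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])  # y = x squared
```

The first pair is perfectly linear and gives a coefficient of 1, while the second pair is fully dependent (y is the square of x) yet gives a coefficient of 0. This is the kind of non-linear relationship that a correlation score can miss.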