Difference between revisions of "Tutorial - Reverse Engineering"


Revision as of 23:33, 25 April 2006




Overview

The Reverse Engineering component included in geWorkbench is being rewritten (as of April 2006), and hence in a future release the details of the interface may change. However, the functionality will be similar.

The primary use of the Reverse Engineering component is to infer regulatory interactions between genes and gene products. To find these interactions, the component uses mutual information, a concept from information theory. Mutual information is in principle more sensitive and flexible than a simple correlation calculation, because it can also detect relationships that are not linear. It is also, in principle, invariant under invertible transformations of the data (a log transform, for example), so the details of normalization should not be important.


Reverse Engineering in the context of geWorkbench

The Reverse Engineering component calculates the information that the expression pattern of one gene carries about the expression of another gene; that is, it is a pairwise calculation. Larger datasets, containing more arrays per marker, will yield greater sensitivity and better statistical support. Full-scale runs of reverse engineering algorithms, which compare all markers against each other and usually use datasets containing several hundred microarrays, are typically performed on large cluster computers and are not feasible on a desktop machine. During 2006 we hope to provide a remote service that can host such calculations for jobs launched from geWorkbench. At present, however, smaller-scale calculations are supported directly in geWorkbench.

As typically used in geWorkbench, the Reverse Engineering component first calculates the Mutual Information score between a single hub gene and each of the other N markers in the dataset. In a second step, a subset containing the best M markers is chosen (with a current limit of 100), and a complete pairwise mutual information calculation is performed among them, covering M(M-1)/2 distinct pairs (roughly MxM/2). The network resulting from this calculation can be displayed as a branched tree of interactions within the Cytoscape component.
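
The two-step calculation described above can be sketched outside geWorkbench as follows. This is only an illustration under stated assumptions: the tutorial does not say which mutual information estimator geWorkbench uses, so a simple histogram (plug-in) estimate is swapped in here, and the function names, the bin count, and the markers-by-arrays matrix layout are all hypothetical.

```python
import numpy as np

def mutual_information(x, y, bins=8):
    """Histogram (plug-in) estimate of mutual information, in nats.

    Assumption: this estimator stands in for whatever geWorkbench uses.
    """
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()                 # joint probabilities
    px = pxy.sum(axis=1, keepdims=True)       # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)       # marginal of y, shape (1, bins)
    nonzero = pxy > 0                         # skip empty cells to avoid log(0)
    return float(np.sum(pxy[nonzero] * np.log(pxy[nonzero] / (px @ py)[nonzero])))

def hub_then_pairwise(expr, hub, top_m=100, cutoff=0.2):
    """expr: markers x arrays matrix; hub: row index of the hub gene."""
    n = expr.shape[0]
    # Step 1: score the hub against every other marker.
    scores = np.array([
        mutual_information(expr[hub], expr[i]) if i != hub else -np.inf
        for i in range(n)
    ])
    ranked = np.argsort(scores)[::-1]         # best score first
    best = [int(i) for i in ranked if scores[i] > cutoff][:top_m]
    # Step 2: complete pairwise calculation among the M selected markers,
    # i.e. M(M-1)/2 distinct pairs.
    m = len(best)
    pair_mi = np.zeros((m, m))
    for a in range(m):
        for b in range(a + 1, m):
            value = mutual_information(expr[best[a]], expr[best[b]])
            pair_mi[a, b] = pair_mi[b, a] = value
    return best, scores, pair_mi
```

A histogram estimate is biased upward when there are few arrays per marker, so with a small dataset the bin count (or the cutoff) would need to be chosen with care.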


Prerequisites

A dataset containing multiple arrays (the more the better) should be loaded into geWorkbench. If data is loaded from separate files, it should be merged into a single microarray dataset, either when it is read in or afterwards. See the section Projects and Data Files. In this tutorial we will load a dataset also used in other tutorial sections, which has been normalized and then filtered to reduce the number of genes. This file, "webmatrix_quantile_log2_dev1_mv0.exp", will be made available in the tutorial data section. It can also be obtained by starting with the file "webmatrix.exp", available in the downloads section, and performing the following steps:

1. Load the file webmatrix.exp.

2. Quantile Normalize.

3. Log2 transform (also in the Normalize tab).

4. Filter out values having deviation less than 1.

5. Remove markers with filtered-out values using the missing values filter with a threshold of 0.
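
For readers who want to see what steps 2 to 5 do to the numbers, here is a rough sketch in Python with NumPy. It is not geWorkbench code. It assumes a markers-by-arrays matrix of positive intensities, and it assumes that the deviation filter marks every value of a marker whose standard deviation across arrays is below the bound as missing; the function names are hypothetical, and reading the .exp file format is not covered.

```python
import numpy as np

def quantile_normalize(data):
    """Give every array (column) the same distribution of values.

    Each value is replaced by the mean, across arrays, of the values
    that share its rank within their own array.
    """
    order = np.argsort(data, axis=0)
    ranks = np.argsort(order, axis=0)              # rank of each value in its column
    mean_sorted = np.sort(data, axis=0).mean(axis=1)
    return mean_sorted[ranks]

def preprocess(data, min_deviation=1.0):
    """data: markers x arrays matrix of positive intensities."""
    normalized = quantile_normalize(data)          # step 2
    logged = np.log2(normalized)                   # step 3
    # Step 4 (assumed behaviour): mark all values of low-deviation
    # markers as missing.
    deviation = logged.std(axis=1, ddof=1)
    filtered = logged.copy()
    filtered[deviation < min_deviation, :] = np.nan
    # Step 5: missing values filter with a threshold of 0, i.e. drop
    # every marker that has more than 0 missing values.
    keep = np.isnan(filtered).sum(axis=1) <= 0
    return filtered[keep], keep
```

The second return value is a boolean mask over the original markers, which makes it easy to check which markers survived the filtering.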


Example

  • Load the data file "webmatrix_quantile_log2_dev1_mv0.exp". This contains a set of 100 experiments on Affymetrix HG_U95Av2 chips. As described above, this file has been quantile normalized, log2 converted, and then filtered to remove markers with a deviation of less than 1, in order to reduce the size of the dataset, leaving 3837 markers.
  • In the upper right section of geWorkbench find the Reverse Engineering component. It should by default be displaying the Profiler tab.
  • In the Markers component search box, on the left side of the geWorkbench interface, enter 1973 and hit enter. This will find the marker 1973_s_at, which is the c-Myc gene, a well-known transcription factor with many interactions. Click on this marker in the list. This will enter the marker into the Hub Gene Label field of the Profiler.


[Image: T_Markers_Search1973.png]


  • The default setting in the Profiler is Mutual Information (fast). With this selected, hit Analyze(2D). This will return a list of all markers having an MI score greater than the cutoff value (the default is 0.2).


[Image: T_ReverseEngineering_Basic.png]


  • If at this point you hit the Create Network button, a network will be displayed based on the top 100 markers interacting with c-Myc. As described above, the MI algorithm is run again on these M=100 markers, in order to measure the interaction between each pair. Each marker is then connected by an edge to the marker it interacts with most strongly, with the chosen hub gene at the center. This is best seen in Cytoscape by going to the Layout menu and choosing yFiles->organic.
  • If a smaller list is desired, a set of markers can be highlighted in the list originally returned. Only this selected subset, up to 100 markers, will then be used if "Create Network" is pressed.
  • By right-clicking and selecting "Add to Set", this group will be added to the Markers component as a new set of markers which can be used in other components (sequence retrieval, annotation retriever etc.).
  • Within the network created in Cytoscape, one can select the central gene, and then on the Cytoscape menu choose Select->Nodes->First Neighbors of selected nodes. The first neighbors will be highlighted in the graph, and also added as a new set in the Markers component.
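
The edge rule used by Create Network, and the first-neighbor selection made in Cytoscape, can be illustrated with a small sketch. It assumes a symmetric matrix of pairwise MI scores is already available; the function names and the marker labels other than 1973_s_at are hypothetical, and the sketch does not reproduce how geWorkbench or Cytoscape implement these operations.

```python
import numpy as np

def strongest_partner_edges(pair_mi, labels):
    """Connect each marker to the marker it interacts with most strongly."""
    mi = np.array(pair_mi, dtype=float)    # copy, so the input is not modified
    np.fill_diagonal(mi, -np.inf)          # a marker cannot pick itself
    edges = set()
    for i in range(len(labels)):
        j = int(np.argmax(mi[i]))
        edges.add(frozenset((labels[i], labels[j])))   # undirected edge
    return edges

def first_neighbors(edges, node):
    """Markers joined to `node` by a single edge."""
    return {other for edge in edges if node in edge
            for other in edge if other != node}

# Hypothetical scores for the hub and three other markers.
labels = ["1973_s_at", "marker_A", "marker_B", "marker_C"]
pair_mi = [
    [0.0, 0.9, 0.8, 0.3],
    [0.9, 0.0, 0.2, 0.1],
    [0.8, 0.2, 0.0, 0.5],
    [0.3, 0.1, 0.5, 0.0],
]
edges = strongest_partner_edges(pair_mi, labels)
neighbors = first_neighbors(edges, "1973_s_at")
```

In this toy example marker_C attaches to marker_B rather than to the hub, which is how a branched tree arises instead of a simple star around the hub gene.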


Further Features

Motif Location Histogram display - This shows a plot of the expression values for the selected hub marker vs any other marker selected in the list.

Pearson - Uses a Pearson correlation function to calculate the interaction scores.

Analyze(3D) - not functional.

Mutual Information (Accurate) - not functional.

Linear - not functional.
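
As a point of comparison for the Pearson option above, the following sketch scores a hub marker against all markers with the Pearson correlation coefficient. The function name and the markers-by-arrays layout are assumptions for illustration; how geWorkbench turns the coefficient into an interaction score is not described in this tutorial.

```python
import numpy as np

def pearson_scores(expr, hub):
    """Pearson correlation of the hub row with every row of expr.

    expr: markers x arrays matrix; rows must not be constant.
    """
    centered = expr - expr.mean(axis=1, keepdims=True)
    norms = np.sqrt((centered ** 2).sum(axis=1))
    return centered @ centered[hub] / (norms * norms[hub])

# A purely quadratic dependence gives a Pearson score of about zero,
# even though the second row is fully determined by the first.
x = np.linspace(-1.0, 1.0, 101)
example = np.vstack([x, x ** 2, 2.0 * x + 1.0])
example_scores = pearson_scores(example, hub=0)
```

The quadratic row is the kind of relationship that a correlation measure misses and that a mutual information score can, in principle, pick up.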

Notes on Prototype Status

The current Reverse Engineering component is a prototype, and a new version is under development. Functions not described here, even if present in the interface, should be assumed to be non-functional.