Tutorial - Reverse Engineering

Revision as of 13:45, 21 April 2006 by Smith

Note

This section is being actively edited today, April 21, 2006.

Overview

The Reverse Engineering component included in geWorkbench is being rewritten (as of April 2006), so details of the interface may change in a future release. The type of functionality offered, however, will remain similar.

The primary use of the Reverse Engineering component is to infer regulatory interactions between genes and gene products. To find these interactions, the component uses the information-theoretic concept of mutual information. Mutual information is in principle more sensitive and flexible than a simple correlation calculation, because it can detect nonlinear as well as linear dependence. It is also invariant under invertible (for example, monotonic) transformations of the data, so the details of normalization should not be important.
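To make the idea concrete, the following is a minimal Python sketch of a mutual information estimate between two expression profiles. It assumes a simple equal-frequency (rank-based) binning estimator, which is an illustrative choice and not necessarily the estimator used by geWorkbench. The function names rank_bins and mutual_information, the bin count, and the simulated gene profiles are all hypothetical.

```python
import math
import random
from collections import Counter


def rank_bins(values, n_bins):
    """Assign each value to one of n_bins equal-frequency bins by rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, idx in enumerate(order):
        bins[idx] = rank * n_bins // len(values)
    return bins


def mutual_information(x, y, n_bins=5):
    """Plug-in estimate of mutual information (in nats) between two profiles."""
    bx, by = rank_bins(x, n_bins), rank_bins(y, n_bins)
    n = len(x)
    joint = Counter(zip(bx, by))      # joint bin counts
    px, py = Counter(bx), Counter(by)  # marginal bin counts
    mi = 0.0
    for (i, j), count in joint.items():
        p_ij = count / n
        # p_ij / (p_i * p_j), written in terms of counts
        mi += p_ij * math.log(p_ij * n * n / (px[i] * py[j]))
    return mi


# Simulated data (assumption): 200 arrays, three hypothetical genes.
random.seed(0)
gene_a = [random.gauss(0, 1) for _ in range(200)]
gene_b = [a * a + random.gauss(0, 0.2) for a in gene_a]  # nonlinear dependence on gene_a
gene_c = [random.gauss(0, 1) for _ in range(200)]         # independent of gene_a

mi_ab = mutual_information(gene_a, gene_b)
mi_ac = mutual_information(gene_a, gene_c)

# Apply a monotonic transformation to gene_a; its ranks are unchanged.
mi_ab_transformed = mutual_information([math.exp(a) for a in gene_a], gene_b)

print(f"MI(a, b) = {mi_ab:.3f}")
print(f"MI(a, c) = {mi_ac:.3f}")
print(f"MI(exp(a), b) = {mi_ab_transformed:.3f}")
```

In this sketch the dependent pair should score well above the independent pair, even though the relationship between gene_a and gene_b is quadratic rather than linear. Because the bins are assigned by rank, a monotonic transformation of the data leaves the estimate unchanged; estimators based on fixed-width bins would only approximate this invariance.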


Reverse Engineering in the context of geWorkbench

The Reverse Engineering component calculates the information that the expression pattern of one gene carries about the expression of another gene; that is, it is a pairwise calculation. Larger datasets, containing more arrays per marker, will yield greater sensitivity and better statistical support. Full-scale runs of reverse engineering algorithms, which compare all markers against each other and usually involve datasets containing several hundred microarrays, are typically performed on large cluster computers and are not feasible on a desktop machine. During 2006 we hope to provide a remote service that can host such calculations for jobs launched from geWorkbench. At present, however, smaller-scale calculations are supported directly in geWorkbench.

As typically used in geWorkbench, the Reverse Engineering component first calculates the mutual information score between a single hub gene and all other N markers in the dataset. In a second step, a subset containing the best M markers is chosen (with a current limit of 100), and a complete pairwise mutual information calculation is performed among them. This requires roughly MxM/2 comparisons (exactly M(M-1)/2 distinct pairs). The network resulting from this calculation can be displayed as a branched tree of interactions within the Cytoscape component.
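The two-step procedure described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the geWorkbench implementation: the function hub_network, the rank-based binning estimator, the dictionary data layout, and the simulated markers are all hypothetical, and the sketch assumes the hub itself is left out of the top-M subset.

```python
import math
import random
from collections import Counter
from itertools import combinations


def rank_bins(values, n_bins):
    """Assign each value to one of n_bins equal-frequency bins by rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, idx in enumerate(order):
        bins[idx] = rank * n_bins // len(values)
    return bins


def mutual_information(x, y, n_bins=4):
    """Plug-in estimate of mutual information (in nats), assumed estimator."""
    bx, by = rank_bins(x, n_bins), rank_bins(y, n_bins)
    n = len(x)
    joint, px, py = Counter(zip(bx, by)), Counter(bx), Counter(by)
    return sum((c / n) * math.log(c * n / (px[i] * py[j]))
               for (i, j), c in joint.items())


def hub_network(data, hub, m):
    """data maps marker name -> expression values across arrays (assumed layout)."""
    # Step 1: score the hub against every other marker.
    scores = {name: mutual_information(data[hub], values)
              for name, values in data.items() if name != hub}
    # Step 2: keep the m best-scoring markers (hub excluded, by assumption).
    top = sorted(scores, key=scores.get, reverse=True)[:m]
    # Step 3: complete pairwise calculation, m * (m - 1) / 2 distinct pairs.
    edges = {(a, b): mutual_information(data[a], data[b])
             for a, b in combinations(top, 2)}
    return scores, top, edges


# Simulated dataset (assumption): 60 arrays, one marker tied to the hub.
random.seed(1)
n_arrays = 60
data = {"hub": [random.gauss(0, 1) for _ in range(n_arrays)]}
data["target"] = [v + random.gauss(0, 0.1) for v in data["hub"]]
for k in range(6):
    data[f"noise{k}"] = [random.gauss(0, 1) for _ in range(n_arrays)]

scores, top, edges = hub_network(data, "hub", m=4)
print("best markers:", top)
print("pairwise comparisons:", len(edges))
```

With M = 4 the second step evaluates 6 pairs; at the stated limit of M = 100 it would evaluate 4950, which is why restricting the pairwise step to a subset keeps the calculation manageable on a desktop machine.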


Prerequisites

A dataset containing multiple arrays (the more the better) should be loaded into geWorkbench. If data is loaded from separate files, it should be merged into a single microarray dataset, either when it is read in or afterwards. See the section on loading data files. In this tutorial we will load a dataset also used in other tutorial sections, which has been normalized and then filtered to reduce the number of genes. This file, XXXX, will be made available in the tutorial data section (coming soon). It can also be obtained by starting with the file "webmatrix.exp", available in the downloads section, and performing the steps shown in section XXXX.