geWorkbench

Revision as of 15:01, 23 December 2008


MINDy

The MINDy algorithm uses gene expression data to determine whether a putative modulator gene (Mj) influences the regulatory activity of a transcription factor gene (TF) over a set of target genes (Ti). This influence is measured in terms of whether there is a change in the correlation (measured as mutual information) of expression between the TF and its targets Ti conditional on a change in the expression of Mj. The mutual information values used in MINDy are calculated using the ARACNe algorithm, which is also a part of geWorkbench.
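The quantities in the paragraph above can be written compactly as follows. The notation is ours, not geWorkbench's. I_{M_j^+} and I_{M_j^-} denote mutual information estimated using only the arrays in the high and low modulator groups. Subtracting the M- value from the M+ value is an assumed sign convention.

```latex
I(X;Y) = \sum_{x,y} p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)}
\qquad
\Delta I_i = I_{M_j^{+}}(\mathrm{TF};T_i) - I_{M_j^{-}}(\mathrm{TF};T_i)
```

Under this convention, a positive delta I means that higher expression of Mj goes with a stronger dependence between the TF and target Ti. The sum is the definition of mutual information. ARACNe estimates it from continuous expression data rather than evaluating this discrete formula directly.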

Outline of MINDy calculations

1. A microarray gene expression dataset is selected.
2. The user specifies a set of one or more candidate modulator genes (Mj), a hub transcription factor (TF), and a set of putative targets of the transcription factor (Ti).
3. Parameters for the MINDy run are set.
4. The unconditional mutual information is calculated for each pairing of the hub TF with a target gene Ti. Any target Ti that does not meet a user-specified MI threshold is screened out. Use of the DPI calculation is not recommended at this step.
5. The arrays in the experiment (columns in the data matrix) are ordered from lowest to highest by the expression value of the chosen modulator gene Mj.
6. Two subsets of arrays are then chosen, one from each end (tail) of the ordered list. One subset contains the arrays in which Mj shows the lowest expression. The other contains the arrays in which Mj shows the highest expression. The subsets do not overlap. A typical trial might assign the lowest 35% of the arrays, as measured by expression of Mj, to the low group (M-) and the highest 35% to the high group (M+). The remaining arrays are not considered further.
7. For each target Ti, the conditional mutual information between the hub TF and the target is then calculated separately for the array subsets M+ and M-, and the difference (delta I) is taken. The MI threshold should be set as low as the computing power of the available machine allows, because a lower threshold means a larger computation. A target must meet the threshold in both the low (M-) and high (M+) calculations to be included in the next stage.
8. A value of delta I above a specified threshold is taken to mean there is an interesting change in mutual information conditional on the expression of the modulator. Currently this is the same threshold as the one specified for the conditional MI calculation. In other words, the modulator affects the correlation of expression between the hub TF and the target gene.
9. The sign of the modulator's influence is determined. For example, does increasing the expression of the modulator gene Mj increase (or decrease) the correlation of expression between the hub TF and the target gene?
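The outline above can be sketched end to end in Python. This is a minimal illustration under stated assumptions, not geWorkbench's implementation. `binned_mi` is a simple equal-width histogram estimator standing in for ARACNe's MI estimator. The function names are hypothetical. The same threshold is reused for the conditional MI and delta I, as in step 8, and delta I is taken as M+ minus M-.

```python
import math
from collections import Counter


def binned_mi(x, y, bins=5):
    """Plug-in mutual information (nats) from equal-width histograms.

    A simple stand-in for ARACNe's estimator, which is not reproduced here.
    """
    def discretize(values):
        lo, hi = min(values), max(values)
        width = (hi - lo) / bins or 1.0  # constant profile: everything in bin 0
        return [min(int((v - lo) / width), bins - 1) for v in values]

    bx, by = discretize(x), discretize(y)
    n = len(bx)
    joint, px, py = Counter(zip(bx, by)), Counter(bx), Counter(by)
    return sum(c / n * math.log(c * n / (px[i] * py[j]))
               for (i, j), c in joint.items())


def tail_split(mod_expr, percent=35):
    """Indices of the lowest (M-) and highest (M+) `percent`% of arrays."""
    order = sorted(range(len(mod_expr)), key=lambda k: mod_expr[k])
    size = len(order) * percent // 100
    if 2 * size > len(order):
        raise ValueError("tails would overlap; use percent <= 50")
    return order[:size], order[len(order) - size:]


def mindy_sketch(data, tf, modulator, targets, mi_threshold=0.2, percent=35):
    """Return {target: (sign, delta_I)} for targets the modulator appears to affect.

    `data` maps marker name -> expression values, one per array.
    """
    hub = data[tf]
    # Step 4: unconditional screen of hub-target pairs (no DPI).
    kept = [t for t in targets if binned_mi(hub, data[t]) >= mi_threshold]
    # Steps 5-6: rank arrays by modulator expression and take the two tails.
    low, high = tail_split(data[modulator], percent)
    results = {}
    for t in kept:
        target = data[t]
        mi_minus = binned_mi([hub[k] for k in low], [target[k] for k in low])
        mi_plus = binned_mi([hub[k] for k in high], [target[k] for k in high])
        # Step 7: the target must pass the threshold in both M- and M+.
        if mi_minus < mi_threshold or mi_plus < mi_threshold:
            continue
        delta = mi_plus - mi_minus
        # Step 8: same threshold reused for delta I; step 9: sign of the effect.
        if abs(delta) > mi_threshold:
            results[t] = ("+" if delta > 0 else "-", delta)
    return results
```

A real analysis would call `mindy_sketch` once per candidate modulator. Histogram MI on a few dozen arrays is strongly biased, so the 0.2 threshold recommended for geWorkbench is not directly transferable to this estimator.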

Important notes on the calculation

MI Thresholds

In the current release of geWorkbench (v1.6.2), the threshold for the unconditional MI calculation can be set using either the raw MI score or a p-value. The p-value is only a rough approximation based on parameters calculated for a particular data set. We recommend an MI threshold of 0.2 for both the unconditional and conditional MI calculations. A complete calculation of the proper p-value will be available with the release of MINDy2 in geWorkbench (Spring 2009).

delta (MI) Threshold

In the current release of geWorkbench (v1.6.2), the significance of the delta (MI) value is not calculated. Computing this p-value requires substantial post-processing, which is planned for inclusion with the release of MINDy2 in geWorkbench (Spring 2009).

Prerequisites for MINDy calculations

The MINDy calculation relies on certain assumptions (Wang et al., unpublished):

(a) The expression of the modulator (gm) must have a range wide enough, relative to the experimental noise level, to separate its two expression tails. This can be ensured by running the deviation filter (Filtering component) on the dataset before starting the MINDy calculation.

(b) Any modulator whose expression profile (Mj) is not statistically independent of that of the hub transcription factor (TF) must be excluded. Independence can be assessed with a mutual information calculation (ARACNe). This check will be implemented directly in MINDy2 (expected in Spring 2009).
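Both prerequisites can be screened before a run. The sketch below assumes a population-standard-deviation cutoff as a stand-in for the deviation filter, whose exact criterion is not described here. It also uses a simple histogram MI estimator in place of ARACNe. `screen_modulators`, `min_sd`, and `max_mi` are hypothetical names with illustrative defaults.

```python
import math
import statistics
from collections import Counter


def binned_mi(x, y, bins=5):
    """Plug-in mutual information (nats) from equal-width histograms (ARACNe stand-in)."""
    def discretize(values):
        lo, hi = min(values), max(values)
        width = (hi - lo) / bins or 1.0  # constant profile: everything in bin 0
        return [min(int((v - lo) / width), bins - 1) for v in values]

    bx, by = discretize(x), discretize(y)
    n = len(bx)
    joint, px, py = Counter(zip(bx, by)), Counter(bx), Counter(by)
    return sum(c / n * math.log(c * n / (px[i] * py[j]))
               for (i, j), c in joint.items())


def screen_modulators(data, tf, candidates, min_sd=0.5, max_mi=0.2, bins=5):
    """Return candidates that pass both prerequisite checks.

    (a) expression spread: population SD across arrays >= min_sd
    (b) independence from the hub TF: binned MI with the TF < max_mi
    Both cutoffs are hypothetical defaults, not geWorkbench settings.
    """
    hub = data[tf]
    kept = []
    for m in candidates:
        profile = data[m]
        if statistics.pstdev(profile) < min_sd:
            continue  # (a) too little range to separate two tails from noise
        if binned_mi(hub, profile, bins) >= max_mi:
            continue  # (b) not independent of the TF
        kept.append(m)
    return kept
```

In this toy setting, a flat profile fails check (a), and a profile that copies the TF fails check (b).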

Setting the Main Parameters

Modulators List - [From File or From Set] - The list of candidate modulators can either be loaded from a file as a comma-separated list, or a set of markers can be selected from the Markers component. The gene expression profiles of the modulators should be independent of that of the hub TF gene, as measured by mutual information. This can be checked with a preliminary ARACNe run that includes just the modulators and the transcription factor.
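Loading the Modulators List from a file amounts to splitting the file's comma-separated marker IDs. The hypothetical helpers below also tolerate line breaks and report IDs that do not match any marker in the dataset. The example marker IDs are made up.

```python
def parse_marker_list(text):
    """Split comma-separated marker IDs (newlines also tolerated) into a clean list."""
    return [tok.strip() for tok in text.replace("\n", ",").split(",") if tok.strip()]


def resolve_markers(text, known_markers):
    """Return (found, missing) so that typos or stale IDs in the file are visible."""
    ids = parse_marker_list(text)
    found = [m for m in ids if m in known_markers]
    missing = [m for m in ids if m not in known_markers]
    return found, missing
```

The file's text can be read first, for example with `pathlib.Path(...).read_text()`, and then passed to `resolve_markers`.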

NOTE - The next version of MINDy will be able to calculate a p-value for the conditional mutual information score. When this is available, it will be useful to limit the number of modulator genes tested in order to minimize the multiple-testing correction.

Hub Marker - Enter the marker ID for a known or putative transcription factor gene. The Hub marker can be entered directly in the text field, or the most recently selected marker in the Markers component will be used. Note that even if one directly types in a marker name, it will be replaced if any selection is made in the Markers component, either in the list or in the default Marker set "Selection".

Target List - [All Markers, From File, or From Set] - The target list should be composed of genes thought to be regulated by the Hub Marker transcription factor. The list of target markers can be loaded from a file containing a comma-separated list, or a set of markers can be selected from the Markers component. Alternatively, All Markers can be selected.

(Note - the "All Markers" checkbox at the bottom of the Analysis component should not be used in the MINDy component).

Setting the Advanced Parameters

Sample per Condition (%) - MINDy calculates the difference in mutual information for the TF-Target interaction between the set where the modulator gene is most expressed (+) and the set where the modulator gene is least expressed (-). This parameter specifies the percentage of the available samples to include in each group. E.g. 35% means that the top and bottom 35% of a list of samples ranked by expression would be used.
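The arithmetic behind Sample per Condition is easy to get wrong at the boundaries. This sketch ranks arrays by modulator expression and takes the bottom and top groups using integer arithmetic. The function name is hypothetical, and how geWorkbench rounds a fractional group size is not stated here, so truncation is an assumption.

```python
def sample_groups(mod_expr, percent):
    """Return (minus, plus, unused) array indices for a Sample per Condition of `percent`."""
    n = len(mod_expr)
    order = sorted(range(n), key=lambda k: mod_expr[k])  # low -> high expression
    size = n * percent // 100  # integer arithmetic avoids float rounding surprises
    if 2 * size > n:
        raise ValueError("percent must be at most 50 so the groups do not overlap")
    minus, plus = order[:size], order[n - size:]
    unused = order[size:n - size]  # middle arrays are not considered further
    return minus, plus, unused
```

With 100 arrays and 35%, each group has 35 arrays and the middle 30 are unused. Values above 50% are rejected because the two groups must not overlap.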

Unconditional and Conditional - The underlying ARACNe calculation of mutual information allows a threshold to be set. This screens out TF-target pairs with low MI, because an MI value is returned for a target only when it exceeds the threshold. The threshold can be specified as a mutual information value or as a p-value. (Note - in the current release, geWorkbench 1.6.2, the p-value calculation is not available for the conditional MI calculation. We recommend trying an MI value of around 0.2 for both the unconditional and conditional parts of the calculation. The p-values calculated in the unconditional part are only an approximation.)

Mutual Info - The user can specify a threshold for the mutual information (MI) estimate. For example, a value of 0.20 filters out target genes with a MI score of less than 0.20. By default, no threshold is set (MI = 0).

P-value - Significance level for an unconditional mutual information (MI) estimate to be considered statistically different from zero. This is a value between 0 and 1, with 1 indicating no threshold. By default, the value is 0.01.

Correction - None or Bonferroni - correct for multiple testing.
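A Bonferroni correction divides the significance level by the number of tests. This minimal sketch assumes that each hub-target pair counts as one test and that p-values are already available. The p-value computation itself is not shown, and the function names are hypothetical.

```python
def bonferroni_alpha(alpha, n_tests):
    """Per-test significance level under a Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def significant_targets(p_values, alpha=0.01, correction="none"):
    """Return targets whose p-value is at or below the (optionally corrected) cutoff.

    p_values maps target -> p-value of its hub-target MI (hypothetical input).
    """
    if not p_values:
        return []
    cutoff = alpha
    if correction == "bonferroni":
        cutoff = bonferroni_alpha(alpha, len(p_values))
    return sorted(t for t, p in p_values.items() if p <= cutoff)
```

With the default p-value of 0.01 and 100 targets tested, the Bonferroni per-test cutoff becomes 0.0001.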

DPI Tolerance - The Data Processing Inequality (triangle inequality) can be used to remove the effects of indirect interactions. For example, if TF1->TF2->Target, DPI can be used to remove the indirect action of TF1 on the target. Stated another way, the DPI removes the weakest of the interactions among any three markers. The DPI tolerance specifies the degree of sampling error to be accepted, since with a finite sample size the exact MI value cannot be calculated.
- The DPI tolerance is normally between 0 and 0.15, since values larger than 0.15 yield more false positives.
- See the ARACNE tutorial page and Margolin et al. (2006) for further details on the use of DPI.
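The DPI step can be illustrated on a small network. This sketch uses one common formulation of the tolerance, which we assume here rather than take from geWorkbench. In each closed triangle, the weakest edge is removed if its MI is below (1 - tolerance) times the smaller of the other two. All triangles are evaluated against the original MI values before any edge is dropped. The function name is hypothetical.

```python
from itertools import combinations


def apply_dpi(mi, tolerance=0.0):
    """Return the edges that survive the DPI.

    mi maps frozenset({a, b}) -> MI value for each edge of the network.
    """
    nodes = sorted({n for edge in mi for n in edge})
    removed = set()
    for trio in combinations(nodes, 3):
        edges = [frozenset(pair) for pair in combinations(trio, 2)]
        if not all(e in mi for e in edges):
            continue  # DPI only applies to closed triangles
        weakest = min(edges, key=mi.get)
        others = [mi[e] for e in edges if e != weakest]
        # Remove the weakest edge only if it is clearly below both alternatives.
        if mi[weakest] < (1 - tolerance) * min(others):
            removed.add(weakest)
    return {e for e in mi if e not in removed}
```

Consider TF1-TF2 = 0.8, TF2-Target = 0.7 and TF1-Target = 0.65. A tolerance of 0 removes the TF1-Target edge. A tolerance of 0.15 keeps it, because 0.65 is not below 0.85 * 0.7 = 0.595.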

DPI Target List - The DPI target list can be used to limit the ARACNE calculation to transcriptional networks. It is used to screen out spurious regulatory interaction signals of genes that are tightly coexpressed but are not in a regulatory relationship to each other, for example genes for two proteins that are in a physical complex and hence always produced in the same amounts. A comma-separated list can be typed in, or it can be loaded from an external file. If used, the DPI Target List should contain all markers that are annotated as transcription factors. Signaling proteins could also be included.
- Details: If the box is checked, the user selects and loads a file which specifies markers (which should be a list of one or more presumptive transcription factors) which will be given preferential treatment during the DPI edge-removal step. Edges originating from markers on this list will not be removed by edges originating from markers not on this list. However, for DPI calculations where all three markers are members of the list, the weakest connecting edge may still be removed.
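The preferential treatment given to DPI Target List markers admits more than one reading. The sketch below encodes one of them, as an assumption. An edge touching a listed marker is protected unless the third marker of the triangle is also listed, so triangles made entirely of listed markers are processed normally. The tolerance rule is the same assumed formulation as for plain DPI, and the function name is hypothetical.

```python
from itertools import combinations


def apply_dpi_with_tf_list(mi, tf_list, tolerance=0.0):
    """DPI with protected edges for markers in tf_list (one reading of the rule).

    mi maps frozenset({a, b}) -> MI value; tf_list is a set of marker names.
    """
    nodes = sorted({n for edge in mi for n in edge})
    removed = set()
    for trio in combinations(nodes, 3):
        edges = [frozenset(pair) for pair in combinations(trio, 2)]
        if not all(e in mi for e in edges):
            continue  # only closed triangles are considered
        weakest = min(edges, key=mi.get)
        (third,) = set(trio) - weakest
        if weakest & tf_list and third not in tf_list:
            continue  # protected: would be removed via a non-listed marker
        others = [mi[e] for e in edges if e != weakest]
        if mi[weakest] < (1 - tolerance) * min(others):
            removed.add(weakest)
    return {e for e in mi if e not in removed}
```

Under this reading, TF1->TF2->Target still loses the TF1-Target edge when both TFs are listed. A triangle through a non-listed gene cannot remove an edge that touches a listed TF.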

Running a MINDy Analysis

1. Select a microarray set node in the Project Folder.

2. In the analysis pane (lower right of the application), select MINDy Analysis.

3. In the Main tab, populate the Modulators List by selecting a set of markers defined in the Markers component, or load a list from a file.

4. Populate the Target List textbox by selecting the choice "All Markers", or by selecting a set of markers defined in the Markers component, or by loading a list from a file.

5. Populate the Hub Gene textbox to designate the TF gene. Either (1) type the marker name (as displayed in the Markers component), or (2) click the marker name corresponding to the TF in the Marker tab of the Selection Area (lower left of the application).

6. Parameter values for the unconditional and conditional mutual information calculations can be set in the Advanced Tab or left at the default values.

7. Click Analyze. If successful, the project window is updated to reflect the MINDy result node. The result node is shown as a child node of the input dataset. Please note that the Dataset History tab captures the analysis parameters.

Viewing MINDy Results

1. Select the MINDy result node in the Project Folder.

2. In the Modulator tab, use the checkboxes to indicate the modulators of interest, or click Select All to display all modulators in the Table, List, and Heat Map views. The Modulators Selected count is updated to reflect the number of modulators selected. Only selected modulators are displayed in the Table, List, and Heat Map views. Additional actions include:

Marker Display: Indicate marker display preferences for the Modulator column (probe name or symbol).

Sort: Click on the column headers or use sort options available in the left pane.

Add to Set: Adds selected modulators to a Marker Set. You can select one or more Targets and/or Modulators, using the selection checkboxes.

All Markers: This checkbox determines if all the target genes are displayed or only genes in activated marker groups.

3. Select from the various tabs to view the data in alternate formats. See Navigating MINDy Visualization for additional information on these data views.

Navigating MINDy Visualization

MINDy includes the following data views: Modulator, Table, List and Heat Map.

Modulator

Modulator: This table-based view contains one row per modulator gene. Only modulators selected in this tab are included in the other data views. The value of the Mode column for a modulator M is "+", "-", or null (0), depending on whether M+ is larger than, smaller than, or equal to M-.
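The Mode column follows directly from comparing M+ and M-. This sketch assumes that M+ and M- are the counts of targets whose delta I is positive and negative, respectively, for a modulator. That reading, the helper names, and the "0" string for the null value are all assumptions.

```python
def modulator_mode(m_plus, m_minus):
    """Mode column value: '+' if M+ > M-, '-' if M+ < M-, '0' (null) if equal."""
    if m_plus > m_minus:
        return "+"
    if m_plus < m_minus:
        return "-"
    return "0"


def summarize(signs):
    """Count per-target '+'/'-' calls for one modulator (hypothetical input)."""
    m_plus = sum(1 for s in signs if s == "+")
    m_minus = sum(1 for s in signs if s == "-")
    return m_plus, m_minus, modulator_mode(m_plus, m_minus)
```

For a modulator that enhances two targets and weakens one, `summarize(["+", "+", "-"])` gives M+ = 2, M- = 1 and Mode "+".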

Table

Table: The rows of the table represent target genes and the columns represent modulators. Additional actions include:

Marker Display: Indicate marker display preferences for the Modulator column (probe name or symbol).

Sorting: Displays columns (modulators) from left to right in descending order by Aggregate (M#), Enhancing (M+), or Negative (M-).

Modulator Limits: When this checkbox is activated, the number of columns (modulators) displayed is limited to a defined value. The limit is applied to the current display order.

Display Options:

Color View: Enables a heat map display of each cell based on the value of the score. -1 is displayed as absolute blue and +1 as absolute red. Scores from 0 to 1 are mapped uniformly from white to shades of red, and scores from -1 to 0 are mapped uniformly from shades of blue to white.

Score View: Displays the discretized score values.
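The Sorting, Modulator Limits, and Color View options can be sketched together. The following assumptions are not confirmed by this page. The Aggregate score M# is taken as M+ plus M-. Ties keep their input order. "Absolute" red and blue are the pure RGB colors (255, 0, 0) and (0, 0, 255). The function names are hypothetical.

```python
def order_modulators(stats, key="aggregate", limit=None):
    """Column order for the Table view; stats maps modulator -> (m_plus, m_minus)."""
    score = {
        "aggregate": lambda pm: pm[0] + pm[1],  # assumed: M# = M+ + M-
        "enhancing": lambda pm: pm[0],
        "negative": lambda pm: pm[1],
    }[key]
    # Descending order; Python's sort stays stable with reverse=True.
    ordered = sorted(stats, key=lambda m: score(stats[m]), reverse=True)
    return ordered if limit is None else ordered[:limit]  # Modulator Limits


def score_to_rgb(score):
    """Map a score in [-1, 1] to an (r, g, b) tuple with 0-255 channels."""
    s = max(-1.0, min(1.0, score))  # clamp out-of-range scores
    if s >= 0:
        fade = round(255 * (1 - s))  # 0 -> white, +1 -> pure red
        return (255, fade, fade)
    fade = round(255 * (1 + s))  # 0 -> white, -1 -> pure blue
    return (fade, fade, 255)
```

For example, a score of 0.5 maps to a light red, (255, 128, 128), halfway between white and pure red.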

List

List: The table has three columns: Modulator, Target and Score. Additional actions include:

Marker Display: Indicate marker display preferences for the Modulator column (probe name or symbol).

Marker Override: Marker selection preferences. As markers are selected, the number of markers selected is listed next to the Enable Selection field. This number does not reflect the number of rows.

Heat Map

Heat Map: The Heat Map represents the expression values for individual markers (target genes). It contains two color mosaic panels. The rows correspond to target genes, and the columns (arrays) are ordered according to the expression of the TF gene (low to high). In the screenshot above, G8 represents the modulator, whose expression values are used to divide the data into two sets, and G9 represents the TF whose interaction with various targets is being evaluated. The left panel corresponds to the L- arrays, where the modulator is least expressed. The right panel corresponds to the L+ arrays, where the modulator is most expressed. Additional actions include:

Marker Display: Indicate marker display preferences for the Modulator column (probe name or symbol).

Transcription Factor: Displays the TF entered in the MINDy Analysis parameters.

Modulator: Select a modulator from the “Selected Modulators” list to update the heat map display.

Refresh: Resets the heat map display.

Image Snapshot: Captures the heat map as an image node in the Project Folder.
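The column layout of the two heat map panels can be sketched as follows. This sketch assumes that the panels use the same Sample per Condition split as the analysis, which the page does not state, and the function name is hypothetical. Arrays are split by modulator expression into the low and high groups, and each group is then sorted by TF expression, low to high.

```python
def heatmap_columns(mod_expr, tf_expr, percent=35):
    """Return (left, right) array indices for the two heat map panels.

    left: arrays where the modulator is least expressed; right: most expressed.
    Within each panel, arrays are sorted by TF expression, low to high.
    """
    n = len(mod_expr)
    order = sorted(range(n), key=lambda k: mod_expr[k])
    size = n * percent // 100
    if 2 * size > n:
        raise ValueError("percent must be at most 50")
    low, high = order[:size], order[n - size:]
    left = sorted(low, key=lambda k: tf_expr[k])
    right = sorted(high, key=lambda k: tf_expr[k])
    return left, right
```

Each returned list gives the order in which the arrays would appear as columns in its panel.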

References

Margolin, A., Wang, K., Lim, W.K., Kustagi, M., Nemenman, I., and Califano, A. (2006) Reverse Engineering Cellular Networks. Nature Protocols 1(2): 662-671.

Wang, K. et al. (unpublished) MINDY: An Algorithm for the Genome-wide Discovery of Modulators of Transcriptional Interactions.
