MINDy
MINDY
The MINDY algorithm uses gene expression data to determine whether a putative modulator gene (Mj) influences the regulatory activity of a transcription factor gene (TF) over a set of target genes (Ti). This influence is measured in terms of whether there is a change in the correlation (measured as mutual information) of expression between the TF and its targets Ti conditional on a change in the expression of Mj.
- Given a data matrix from a set of microarray hybridizations, Mindy sorts the arrays (columns in the data matrix) by the expression value of the chosen modulator gene Mj, from lowest to highest.
- Mindy then forms two sets of array experiments, one set containing arrays in which Mj shows the lowest expression, and the other set in which Mj shows the highest expression. The remaining, unassigned experiments are not further considered. A typical trial might involve assigning the lowest 35% of the arrays to the low group, as measured by expression of Mj, and the highest 35% to the high group.
- For each of the two sets separately, Mindy then measures the mutual information between the chosen hub TF and each of the genes in the dataset.
- Mindy then asks if the change in expression of Mj between the two sets has any effect on the correlation of expression between the hub TF and its target genes, as measured by the mutual information. That is, for a given gene, does the mutual information observed with the hub TF differ between the low group and the high group of Mj? For example, does increasing the expression of the modulator gene Mj increase (or decrease) the correlation of expression between the hub TF and the target gene?
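The four steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the geWorkbench implementation: the function names (`mutual_information`, `mindy_delta_mi`), the simple histogram-based MI estimator, the number of bins, and the sign convention (high group minus low group) are all choices made for this example.

```python
import numpy as np


def mutual_information(x, y, bins=4):
    """Histogram (plug-in) estimate of mutual information, in nats.

    Assumption: a simple equal-width binning estimator is used here for
    brevity; it is not the estimator used by MINDY itself.
    """
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)   # marginal of y, shape (1, bins)
    nonzero = pxy > 0                     # skip empty cells to avoid log(0)
    return float(np.sum(pxy[nonzero] * np.log(pxy[nonzero] / (px @ py)[nonzero])))


def mindy_delta_mi(expr, modulator, hub, fraction=0.35, bins=4):
    """Per-gene difference MI(hub, gene | high) - MI(hub, gene | low).

    expr is a genes x arrays matrix; modulator and hub are row indices.
    """
    n_arrays = expr.shape[1]
    n_group = int(round(fraction * n_arrays))
    if n_group < 2:
        raise ValueError("fraction is too small for this number of arrays")

    # Step 1: sort the arrays by modulator expression, lowest to highest.
    order = np.argsort(expr[modulator])
    # Step 2: keep the lowest and highest groups; the middle is discarded.
    low, high = order[:n_group], order[-n_group:]

    delta = np.zeros(expr.shape[0])
    for gene in range(expr.shape[0]):
        # Step 3: MI between the hub TF and this gene, within each group.
        mi_low = mutual_information(expr[hub, low], expr[gene, low], bins)
        mi_high = mutual_information(expr[hub, high], expr[gene, high], bins)
        # Step 4: how much does the MI change with the modulator level?
        delta[gene] = mi_high - mi_low
    return delta
```

With this convention, a clearly positive value for a gene suggests that the hub TF and that gene are more strongly coupled when the modulator is highly expressed, and a clearly negative value suggests the opposite.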
Setting the Main Parameters
Modulators List - [From File or From Set] - The list of candidate modulators can either be loaded from a file as a comma-separated list, or a set of markers can be selected from the Markers component.
Hub Marker - Enter the marker ID for a known or putative transcription factor gene. The Hub marker can be entered directly in the text field, or the most recently selected marker in the Markers component will be used. Note that even if one directly types in a marker name, it will be replaced if any selection is made in the Markers component, either in the list or in the default Marker set "Selection".
Target List - [All Markers, From File, or From Set] - The target list should be composed of genes thought to be regulated by the Hub Marker transcription factor. The list of target markers can be loaded from a file containing a comma-separated list, or a set of markers can be selected from the Markers component. Alternatively, All Markers can be selected.
(Note - the "All Markers" checkbox at the bottom of the Analysis component should not be used in the MINDY component).
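For readers who prepare the modulator or target list files with a script, the following sketch shows one way to read and check a comma-separated marker list before loading it. The helper names are hypothetical, and the tolerance for line breaks and the removal of duplicates are assumptions of this example; geWorkbench's own file parsing may behave differently.

```python
from pathlib import Path


def parse_marker_list(text):
    """Split a comma-separated marker list into cleaned, de-duplicated IDs.

    Assumption: line breaks are treated like commas and blank entries
    are ignored.
    """
    markers = []
    for token in text.replace("\n", ",").split(","):
        marker = token.strip()
        if marker and marker not in markers:
            markers.append(marker)
    return markers


def load_marker_list(path):
    """Read a marker list file and return the marker IDs in file order."""
    return parse_marker_list(Path(path).read_text())
```

Checking the parsed list against the marker IDs shown in the Markers component before running the analysis can catch typos early.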
Setting the Advanced Parameters
Sample per Condition (%) - MINDY calculates the difference in mutual information for the TF-Target interaction between the set where the modulator gene is most expressed (+) and the set where the modulator gene is least expressed (-). This parameter specifies the percentage of the available samples to include in each group. E.g. 35% means that the top and bottom 35% of a list of samples ranked by expression would be used.
Conditional (MINDY) - The conditional MINDY score for a given TF->Marker interaction is calculated as the difference in the MI values for that pair between low and high expression sets. A threshold for this MI difference can be set here. Calculating a P-value for the significance of this score requires an extra simulation step which is not currently implemented in geWorkbench. By default, no threshold is set (delta(MI) = 0).
- Correction - Inactivated since no P-value calculation is currently implemented.
Unconditional (ARACNE) - [P-value or Mutual Info] - These values refer to the MI calculations for each TF-target pair separately within the two sets (defined by low and high expression of the modulator).
- Mutual Info - The user can specify a threshold for the mutual information (MI) estimate. For example, a value of 0.20 filters out target genes with an MI score of less than 0.20. By default, no threshold is set (MI = 0).
- P-value - Significance level for an unconditional mutual information (MI) estimate to be considered statistically different from zero. This is a value between 0 and 1, with 1 indicating no threshold. By default, the value is 1.
- Correction - None or Bonferroni - correct for multiple testing.
- DPI Tolerance - The Data Processing Inequality (triangle inequality) can be used to remove the effects of indirect interactions, e.g. if TF1->TF2->Target, DPI can be used to remove the indirect action of TF1 on the target. Stated another way, the DPI can be used to remove the weakest interaction of those between any three markers. The DPI tolerance specifies the degree of sampling error to be accepted, since with a finite sample size an exact MI value cannot be calculated.
- The DPI tolerance is normally between 0 and 0.15, since values larger than 0.15 yield more false positives.
- See the ARACNE tutorial page and Margolin et al. 2006 for further details on use of DPI.
- DPI Target List - The DPI target list can be used to limit the ARACNE calculation to transcriptional networks. It is used to screen out spurious regulatory interaction signals of genes that are tightly coexpressed but are not in a regulatory relationship to each other, for example genes for two proteins that are in a physical complex and hence always produced in the same amounts. A comma-separated list can be typed in, or it can be loaded from an external file. If used, the DPI Target List should contain all markers that are annotated as transcription factors. Signaling proteins could also be included.
- Details: If the box is checked, the user selects and loads a file which specifies markers (which should be a list of one or more presumptive transcription factors) which will be given preferential treatment during the DPI edge-removal step. Edges originating from markers on this list will not be removed by edges originating from markers not on this list. However, for DPI calculations where all three markers are members of the list, the weakest connecting edge may still be removed.
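The DPI edge-removal step described above can be illustrated with a small sketch. It assumes a symmetric matrix of pairwise MI values and applies the tolerance multiplicatively (the weakest edge of a triplet is removed only if it is smaller than the second-weakest edge scaled by `1 - tolerance`); that form of the tolerance, the function name `apply_dpi`, and the omission of the DPI Target List preference are assumptions of this example, not a description of the geWorkbench code.

```python
import itertools

import numpy as np


def apply_dpi(mi, tolerance=0.0):
    """Return a copy of a symmetric MI matrix with DPI-pruned edges set to 0."""
    n = mi.shape[0]
    remove = np.zeros(mi.shape, dtype=bool)
    for i, j, k in itertools.combinations(range(n), 3):
        edges = [((i, j), mi[i, j]), ((i, k), mi[i, k]), ((j, k), mi[j, k])]
        # Only fully connected triplets are candidates for pruning.
        if min(value for _, value in edges) <= 0:
            continue
        edges.sort(key=lambda edge: edge[1])
        (a, b), weakest = edges[0]
        second_weakest = edges[1][1]
        # A larger tolerance protects more edges from removal.
        if weakest < second_weakest * (1.0 - tolerance):
            remove[a, b] = remove[b, a] = True
    # Edges are marked first and removed afterwards, so every triplet is
    # evaluated against the original MI values.
    pruned = mi.copy()
    pruned[remove] = 0.0
    return pruned
```

For a chain TF1->TF2->Target with MI values 0.8 (TF1, TF2), 0.7 (TF2, Target) and 0.3 (TF1, Target), this sketch removes the TF1-Target edge, which matches the indirect-interaction example in the text.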
Running a MINDY Analysis
1. Select a microarray set node in the Project Folder.
2. In the analysis pane (lower right of the application), select MINDY Analysis.
3. In the Main tab, populate the Modulators List by selecting a set of markers defined in the Markers component, or load a list from a file.
4. Populate the Target List textbox by selecting the choice "All Markers", or by selecting a set of markers defined in the Markers component, or by loading a list from a file.
5. Populate the Hub Gene textbox to designate the TF gene, either by (1) typing the marker name (as displayed in the Markers component) or by (2) clicking on the marker name corresponding to the TF in the Marker tab of the Selection Area (lower left of the application).
6. In the Advanced Tab, accept the default values or update the parameter values by clicking the parameter textbox and typing a new value.
7. Click Analyze. If successful, the project window is updated to reflect the MINDY result node which is a child node of the input dataset. Please note that the Dataset History tab captures the analysis parameters.
Viewing MINDY Results
1. Select the MINDY result node in the Project Folder.
2. In the Modulator Tab, indicate the modulators of interest using the checkboxes or click on Select All to display all modulators in the Table, List, and Heat Map views. The Modulators Selected is updated to reflect the number of modulators selected. Only selected Modulators are displayed on the Table, List and Heat Map views. Additional actions include:
- Marker Display: Indicate marker display preferences for the Modulator column (probe name or symbol).
- Sort: Click on the column headers or use sort options available in the left pane.
- Add to Set: Adds selected modulators to a Marker Set. You can select one or more Targets and/or Modulators, using the selection checkboxes.
- All Markers: This checkbox determines if all the target genes are displayed or only genes in activated marker groups.
3. Select from the various tabs to view the data in alternate formats. See the "Navigating Mindy" section for additional information on these data views.
MINDY includes the following data views: Modulator, Table, List and Heat Map.
Modulator
Modulator: This table-based view contains one row per modulator gene. Only modulators selected in this tab are included in the other data views. The value of the Mode column for a modulator M is either “+”, “-” or null (0), depending on whether M+ is larger than, smaller than, or equal to M-.
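The rule for the Mode column can be written out as a short sketch. It assumes that M+ counts the targets whose MI with the TF increases with the modulator, that M- counts the targets whose MI decreases, and that M# is their sum; these definitions, the optional `threshold`, and the function name `modulator_summary` are assumptions of this example rather than statements from the geWorkbench documentation.

```python
def modulator_summary(delta_mi, threshold=0.0):
    """Summarize one modulator from its per-target MI differences."""
    m_plus = sum(1 for d in delta_mi if d > threshold)    # MI increases
    m_minus = sum(1 for d in delta_mi if d < -threshold)  # MI decreases
    if m_plus > m_minus:
        mode = "+"
    elif m_plus < m_minus:
        mode = "-"
    else:
        mode = "0"
    return {"M#": m_plus + m_minus, "M+": m_plus, "M-": m_minus, "Mode": mode}
```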
Table
Table: The rows of the table represent target genes and the columns represent modulators. Additional actions include:
- Marker Display: Indicate marker display preferences for the Modulator column (probe name or symbol).
- Sorting: Displays columns (modulators) from left to right in descending order by Aggregate (M#), Enhancing (M+) or Negative (M-).
- Modulator Limits: Activate the checkbox to limit the number of columns (modulators) displayed to a defined value. This selection filters the modulator display based upon the current display order.
Display Options:
- Color View: Enables a heat map display of each cell based on the value of the score. -1 is displayed as absolute blue; +1 is displayed as absolute red; 0:1 is mapped uniformly from white to shades of red; -1:0 is mapped uniformly from shades of blue to white.
- Score View: Displays the discretized score values.
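The Color View mapping can be illustrated with a small function that converts a score to an RGB triple. It is a sketch of the linear mapping described above (blue at -1, white at 0, red at +1); the function name `score_to_rgb`, the 8-bit RGB encoding, and the clamping of out-of-range scores are assumptions of this example.

```python
def score_to_rgb(score):
    """Map a score in [-1, 1] to an (R, G, B) tuple with 8-bit channels."""
    s = max(-1.0, min(1.0, float(score)))       # clamp out-of-range scores
    fade = int(round(255 * (1.0 - abs(s))))     # 255 at score 0, 0 at |score| 1
    if s >= 0:
        return (255, fade, fade)                # white fading to red
    return (fade, fade, 255)                    # white fading to blue
```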
List
List: The table has three columns: Modulator, Target and Score. Additional actions include:
- Marker Display: Indicate marker display preferences for the Modulator column (probe name or symbol).
- Marker Override: Marker selection preferences. As markers are selected, the number of markers selected is listed next to the Enable Selection field. This does not reflect the number of rows.
Heat Map
Heat Map: The heat map contains two color mosaic panels. The rows correspond to target genes, and the arrays (columns) are ordered according to the expression of the TF gene (low to high). The columns of the left panel correspond to the L- arrays, where the modulator is least expressed, while the columns of the right panel correspond to the L+ arrays, where the modulator is most expressed. Additional actions include:
- Marker Display: Indicate marker display preferences for the Modulator column (probe name or symbol).
- Transcription Factor: Displays the TF entered in the Mindy Analysis parameters.
- Modulator: Select a modulator from the “Selected Modulators” list to update the heat map display.
- Refresh: Resets the heat map display.
- Image Snapshot: Captures the heat map as an image node in the Project Folder.
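The column layout of the two heat map panels can be reproduced outside the application with a short sketch. It assumes a genes x arrays expression matrix and the same percentage-based grouping as the Sample per Condition parameter; the function name `heat_map_panels` and its arguments are hypothetical.

```python
import numpy as np


def heat_map_panels(expr, modulator, hub, fraction=0.35):
    """Return (left, right) matrices for the L- and L+ panels.

    Columns in each panel are ordered by hub TF expression, low to high.
    """
    n_group = int(round(fraction * expr.shape[1]))
    by_modulator = np.argsort(expr[modulator])
    low, high = by_modulator[:n_group], by_modulator[-n_group:]
    # Within each group, reorder the arrays by expression of the hub TF.
    low_sorted = low[np.argsort(expr[hub, low])]
    high_sorted = high[np.argsort(expr[hub, high])]
    return expr[:, low_sorted], expr[:, high_sorted]
```

Plotting the two returned matrices side by side, with one row per target gene, gives a layout comparable to the Heat Map view.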
References
Margolin, A., Wang, K., Lim, W.K., Kustagi, M., Nemenman, I., and Califano, A. (2006). Reverse engineering cellular networks. Nature Protocols 1(2), pp. 662-671.