MINDy

Revision as of 14:10, 13 July 2010 by Smith (DPI Tolerance)


The MINDy algorithm (Modulator Inference by Network Dynamics) uses gene expression data to determine whether a putative modulator gene (Mj) influences the regulatory activity of a transcription factor gene (TF) over a set of target genes (Ti). This influence is measured in terms of whether there is a change in the correlation (measured as mutual information) of expression between the TF and its targets Ti conditional on a change in the expression of Mj. The change in correlation is calculated as the difference in mutual information (delta (MI)) for each TF-Ti pair between the two conditions (modulator high or low). The mutual information values used in MINDy are calculated using the ARACNe algorithm, which is also a part of geWorkbench.


Outline of MINDy calculations

  1. A microarray gene expression dataset is selected.
  2. The user specifies a set of one or more candidate modulator genes (Mj), a hub transcription factor (TF), and a set of putative targets of the transcription factor (Ti).
  3. Parameters for the MINDy run are set.
  4. Using the expression value of the chosen modulator gene Mj, the arrays in the experiment are ordered (as columns in the data matrix), from lowest to highest.
  5. Two subsets of arrays are then chosen from each end (tail) of the ordered list. One subset contains arrays in which Mj shows the lowest expression, and the other subset contains arrays in which Mj shows the highest expression. The subsets are non-overlapping. A typical trial might involve assigning the lowest 35% of the arrays to the low group (M-), as measured by expression of Mj, and the highest 35% to the high group (M+). The remaining arrays are not further considered.
  6. For each target Ti, the conditional mutual information between the hub TF and the target is then calculated for the array subsets M+ and M- separately, and the difference is taken (delta (MI)).
  7. The resulting delta (MI) values are displayed. At present, a p-value is not calculated on the delta (MI). Larger values of delta (MI) may indicate an interesting change in the mutual information conditional on the expression of the modulator, that is, the modulator has an effect on the correlation of expression between the hub TF and the target gene.
  8. The sign of the influence of the modulator is also displayed, that is, whether increasing the expression of the modulator gene Mj increases or decreases the correlation of expression between the hub TF and the target gene.
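Steps 4 to 6 above can be sketched in a few lines of Python. This is an illustrative sketch only, not the geWorkbench implementation: all function names are hypothetical, group sizes are assumed to be rounded down, the sign convention (M+ minus M-) is an assumption, and a simple equal-frequency binning estimate of mutual information stands in for the kernel-based ARACNe estimate that MINDy actually uses.

```python
import math
from collections import Counter


def split_by_modulator(modulator, percent=35):
    """Steps 4-5: rank arrays by modulator expression and return the
    indices of the low (M-) and high (M+) tails. percent is an integer."""
    if not 0 < percent <= 50:
        raise ValueError("percent must be in (0, 50] so the two groups cannot overlap")
    order = sorted(range(len(modulator)), key=lambda i: modulator[i])
    k = len(order) * percent // 100  # assumption: group size is rounded down
    if k == 0:
        raise ValueError("too few arrays for the requested percentage")
    return order[:k], order[-k:]


def rank_bins(values, n_bins=3):
    """Discretize values into roughly equal-sized bins by rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins


def mutual_information(x, y, n_bins=3):
    """Plug-in MI estimate (in nats) from binned values.
    Stand-in for the ARACNe estimate, for illustration only."""
    bx, by = rank_bins(x, n_bins), rank_bins(y, n_bins)
    n = len(x)
    joint, px, py = Counter(zip(bx, by)), Counter(bx), Counter(by)
    return sum(c / n * math.log(c * n / (px[a] * py[b]))
               for (a, b), c in joint.items())


def delta_mi(modulator, tf, target, percent=35):
    """Step 6: MI(TF, target) in the M+ arrays minus MI in the M- arrays."""
    low, high = split_by_modulator(modulator, percent)
    mi_low = mutual_information([tf[i] for i in low], [target[i] for i in low])
    mi_high = mutual_information([tf[i] for i in high], [target[i] for i in high])
    return mi_high - mi_low
```

With this convention, a positive result means the TF and target are more strongly coupled when the modulator is highly expressed, and a negative result means the opposite.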

Prerequisites for MINDy calculations

  1. Number of arrays - A microarray gene expression dataset with a sufficient number of arrays must be present. For optimal results, use at least 100 microarrays from a homogeneous cellular system (for example, isolable tumor cells or cell lines) covering a range of different expression conditions (distinct cellular phenotypes).
  2. Modulator expression variation - The expression of the modulator (Mj) must have a sufficient expression range to separate its two expression tails compared to the experimental noise level. Low variation markers can be removed by running the deviation filter (Filtering component) on the dataset before starting the MINDy calculation.
  3. Independence of modulator and TF hub - Any modulator whose expression profile (Mj) is not statistically independent of that of the hub transcription factor (TF) must be excluded. This can be determined using a mutual information calculation (ARACNe). This functionality is not currently directly implemented within MINDy in geWorkbench, but can be run directly using the ARACNe component.


Important notes on the calculation

MI Thresholds

  • Unconditional - The unconditional ARACNe mutual information calculation is intended to be used in calculating a significance value on the final delta MI score. This feature is not yet implemented. However, the unconditional run is still performed to initialize ARACNe for the following conditional runs. In particular, parameters for the conditional MI runs are calculated using the number of arrays present in the full dataset, before partitioning for the conditional runs.
  • Conditional - The conditional MI threshold influences how many target markers are returned: the lower the threshold, the more targets are returned. A target must have a value above the threshold in at least one of the two conditional ARACNe runs to be included in the output data. The threshold should be kept as close to zero as practical to avoid truncation effects on sub-threshold values.

delta (MI)

As implemented in geWorkbench, the significance of the delta (MI) values is not calculated.

Marker and Array Selection

  • All marker selection is done within the MINDy component interface. MINDy does not respect activated marker subsets in the Markers component.
  • MINDy does respect array subsets activated in the Arrays component.

Advanced - setting ARACNe dataset parameters

MINDy makes use of the original Fixed Bandwidth implementation of ARACNe. This algorithm can make use of parameters which are data set specific, if available (by separate calculation), and which can be used in setting the Kernel Width and Threshold. ARACNe includes default values with which to calculate these parameters, which also depend on the number of arrays in the dataset. However, it is possible to use the newer version of ARACNe (also called ARACNe2), which is included in geWorkbench as a separate component, to calculate the needed values for a particular dataset. The key is that ARACNe looks for two parameter files with the fitted parameters, and will use these if they are found. The files are called "config_kernel.txt" and "config_threshold.txt". If you want to use custom parameters in MINDy, you must create these two files by using a separate PREPROCESSING run of ARACNe on your dataset.

Running ARACNe in PREPROCESSING mode, with algorithm FIXED_BANDWIDTH, will create two files in the geWorkbench root directory, named according to the following template:

  • DatasetName_ARACNe_FBW_kernel.txt
  • DatasetName_ARACNe_FBW_threshold.txt

where "DatasetName" is the name of the microarray dataset for which you ran ARACNe. For example, for the Bcell-100.exp dataset, the following two files would be generated:

  • Bcell-100.exp_ARACNe_FBW_kernel.txt
  • Bcell-100.exp_ARACNe_FBW_threshold.txt

To make these files available to MINDy, rename them to "config_kernel.txt" and "config_threshold.txt".

Note that these default file names will be seen and the contents used by all versions of ARACNe, both standalone and within MINDy. So you should remove or rename these files before doing any other work with ARACNe/MINDy.
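The file handling described above could be scripted along the following lines. This is a minimal sketch under stated assumptions: the function names are hypothetical, the default-named files are assumed to belong in the same geWorkbench root directory where the PREPROCESSING run writes its output, and the files are copied rather than renamed so that the dataset-specific originals are kept.

```python
import shutil
from pathlib import Path


def install_aracne_config(dataset_name, root_dir):
    """Copy the PREPROCESSING output for one dataset to the default
    file names that ARACNe and MINDy look for."""
    root = Path(root_dir)
    for kind in ("kernel", "threshold"):
        source = root / f"{dataset_name}_ARACNe_FBW_{kind}.txt"
        shutil.copyfile(source, root / f"config_{kind}.txt")


def remove_aracne_config(root_dir):
    """Delete the default-named files so that later ARACNe/MINDy runs
    do not silently pick up parameters fitted for another dataset."""
    for kind in ("kernel", "threshold"):
        (Path(root_dir) / f"config_{kind}.txt").unlink(missing_ok=True)
```

For the Bcell-100.exp example, the first call would be install_aracne_config("Bcell-100.exp", root), where root is the path to the geWorkbench root directory on your system. Calling remove_aracne_config(root) afterwards restores the default behavior.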

Setting the Main Parameters

MINDy Parameters Main.png


Modulators List - [From File or From Set] - The list of candidate modulators can either be loaded from a file as a comma-separated list, or a set of markers can be selected from the Markers component. The gene expression profiles of the modulators should be independent of that of the hub TF gene as measured by mutual information. This could be determined using a preliminary run of ARACNe including just the modulators and the transcription factor.

When testing multiple modulators, consider the false-positive implications of multiple tests, even though no significance value is being calculated.


Hub Marker - Enter the marker ID for a known or putative transcription factor gene.

  • The Hub marker can be entered directly in the text field, or the most recently selected marker in the Markers component will be used, selected either in the list or in the default Marker set "Selection".
  • Note that even if one directly types in a marker name, it will be replaced if any selection is made in the Markers component.

Target List - [All Markers, From File, or From Set] - The target list should be composed of genes thought to be regulated by the Hub Marker transcription factor.

  • The list of target markers can be loaded from a file containing a comma separated list,
  • or, a set of markers can be selected from the Markers component.
  • Alternatively, All Markers can be selected.
  • Note - the "All Markers" checkbox at the bottom of the Analysis component should not be used in the MINDy component.
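A modulator or target list file can be prepared with any text editor, or generated by a short script such as the one below. This is only a sketch: the helper names are hypothetical, and it assumes that a single line of comma-separated marker IDs is an acceptable layout, which is the simplest reading of "comma-separated list" above.

```python
def write_marker_list(path, marker_ids):
    """Write marker IDs to a file as one comma-separated line."""
    with open(path, "w") as handle:
        handle.write(",".join(marker_ids) + "\n")


def read_marker_list(path):
    """Read marker IDs back, tolerating line breaks and stray spaces."""
    with open(path) as handle:
        text = handle.read()
    return [m.strip() for m in text.replace("\n", ",").split(",") if m.strip()]
```

For example, write_marker_list("my_targets.csv", ["marker_1", "marker_2"]) would create a two-entry list; the file name and marker IDs here are placeholders, not real probe set names.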

Setting the Advanced Parameters

MINDy Parameters Advanced.png

Sample per Condition (%)

MINDy calculates the difference in mutual information for the TF-Target interaction between the set where the modulator gene is most expressed (+) and the set where the modulator gene is least expressed (-). This parameter specifies the percentage of the available samples to include in each group. E.g. 35% means that the top and bottom 35% of a list of samples ranked by expression would be used.


Unconditional and Conditional

  • Conditional - The underlying ARACNe calculation of the conditional mutual information allows a threshold to be set. This allows TF-target pairs with low MI to be screened out - an MI value will only be returned for a target when it exceeds this value. The threshold for the conditional calculations can be specified as a raw mutual information score or as a P-value.
  • The unconditional MI is intended for use in the calculation of statistical significance of the final delta (MI) score and is not currently used.
  • Mutual Info - If selected, the user specifies a threshold for the mutual information (MI) estimates in terms of the raw MI score. For example, a value of 0.1 filters out target genes with an MI score of less than 0.1 in both the high and low modulator expression sets. By default, an MI threshold of 0.1 is set.
    • Note - if the MI score is above threshold in one condition but not the other, the lower score will be set to zero when calculating delta (MI). If the threshold is set too high, there may be artefactually large delta (MI) results as values below the threshold are set to zero before delta (MI) is calculated.
  • P-value - If selected, the user specifies a threshold for the conditional mutual information estimate in terms of a p-value. This is a value between 0 and 1, with 1 indicating no threshold. By default, the value is 0.01.
  • Correction - None or Bonferroni - correct for multiple testing if a p-value is specified.
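The effect of the raw MI threshold described in the note above can be illustrated with a small worked example. The function below is a hypothetical sketch of the rule as described on this page, not code taken from geWorkbench: a target is dropped if it is below threshold in both conditions, and a sub-threshold value in only one condition is set to zero before the difference is taken.

```python
def thresholded_delta_mi(mi_high, mi_low, threshold=0.1):
    """Return delta (MI) after thresholding, or None if the target is
    below threshold in both conditions and would not be reported."""
    if mi_high < threshold and mi_low < threshold:
        return None
    high = mi_high if mi_high >= threshold else 0.0
    low = mi_low if mi_low >= threshold else 0.0
    return high - low


# MI of 0.12 (modulator high) and 0.09 (modulator low): the true
# difference is small, but with a threshold of 0.1 the low value is
# zeroed and the reported delta (MI) becomes the full 0.12.
inflated = thresholded_delta_mi(0.12, 0.09, threshold=0.1)
honest = thresholded_delta_mi(0.12, 0.09, threshold=0.0)  # about 0.03
```

This is why a threshold close to zero is preferable when the goal is to compare delta (MI) values across targets.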

DPI Tolerance

The Data Processing Inequality (triangle inequality) can be used to remove the effects of indirect interactions, e.g. if TF1->TF2->Target, DPI can be used to remove the indirect action of TF1 on the target. Stated another way, the DPI can be used to remove the weakest interaction of those between any three markers. The DPI tolerance specifies the degree of MI sampling error to be accepted, since an exact MI value cannot be calculated from a finite sample.

  • The DPI tolerance is normally set between 0 and 0.15, since values larger than 0.15 tend to yield more false positives.
  • See the ARACNe tutorial page and Margolin et al. 2006 for further details on use of DPI.
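The role of the tolerance can be illustrated with a small sketch of DPI edge removal. The formulation used here is an assumption made for illustration (the weakest edge of a triangle is removed only if it is smaller than the next weakest edge scaled by one minus the tolerance); the function name is hypothetical and the code is not taken from ARACNe.

```python
from itertools import combinations


def apply_dpi(mi, tolerance=0.1):
    """Return the edges that survive DPI.

    mi maps frozenset({a, b}) to the MI of that pair. Within every fully
    connected triplet, the weakest edge is dropped if it is smaller than
    the next weakest edge scaled by (1 - tolerance). All triplets are
    evaluated against the original MI values.
    """
    nodes = sorted({n for edge in mi for n in edge})
    removed = set()
    for a, b, c in combinations(nodes, 3):
        triangle = [frozenset(p) for p in ((a, b), (a, c), (b, c))]
        if not all(e in mi for e in triangle):
            continue
        weakest, second, _ = sorted(triangle, key=lambda e: mi[e])
        if mi[weakest] < mi[second] * (1 - tolerance):
            removed.add(weakest)
    return {e: v for e, v in mi.items() if e not in removed}


# TF1 -> TF2 -> Target, with a weaker indirect TF1-Target signal.
network = {
    frozenset({"TF1", "TF2"}): 0.5,
    frozenset({"TF2", "Target"}): 0.4,
    frozenset({"TF1", "Target"}): 0.2,
}
direct_only = apply_dpi(network, tolerance=0.1)  # TF1-Target edge is removed
```

With a tolerance of 0, any weakest edge is removed, however small the margin. A larger tolerance keeps edges whose MI is close to that of the next weakest edge, which preserves more true interactions at the cost of more false positives.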

DPI Target List

The DPI target list can be used to limit the ARACNe calculation to transcriptional networks. It is used to screen out spurious regulatory interaction signals between genes that are tightly coexpressed but are not in a regulatory relationship to each other, for example genes for two proteins that are in a physical complex and hence always produced in the same amounts. A comma-separated list can be typed in, or it can be loaded from an external file. If used, the DPI Target List should contain all markers that are annotated as transcription factors. Signaling proteins could also be included.

  • Details: If the box is checked, the user selects and loads a file which specifies markers (which should be a list of one or more presumptive transcription factors) which will be given preferential treatment during the DPI edge-removal step. Edges originating from markers on this list will not be removed by edges originating from markers not on this list. However, for DPI calculations where all three markers are members of the list, the weakest connecting edge may still be removed.

Services (Grid)

MINDy can be run either locally within geWorkbench, or remotely as a grid job on caGrid. See the Grid Services section for further details on setting up a grid job.


Running an example MINDy Analysis

Setup

  • For this example, we use a list of four candidate MAPK markers, contained in a CSV format file. Download the example file.
  • In the Component Configuration Manager, check whether the MINDy component has been loaded, and if not, load it.

Run

MINDy main parameter tab set up to run the example below.


MINDy parameters mapk run.png


  1. Load the Bcell-100.exp microarray dataset, which is available in the geWorkbench data directory under "public_data".
  2. In the analysis tab (at lower right in the application), select MINDy Analysis.
  3. In the MINDy Parameters Main tab, populate the Modulators List by loading the file Mapk_list.csv.
  4. Populate the Target List textbox by selecting the choice "All Markers".
  5. Set the hub gene to be probeset name 1973_s_at (MYC).
  6. Parameter values for the conditional mutual information calculation can be set in the Advanced Tab. The values will depend on the specifics of the data set being used, in terms of number of arrays and number of markers. Here we use the default parameters:
    1. sample per condition: 35%
    2. conditional: MI 0.1 (or even 0)
    3. unconditional: not used, control disabled.
    4. DPI target list: blank
    5. DPI tolerance: 0.1
  7. Click Analyze. If successful, the Project Folders component is updated to add the MINDy result node. The result node is shown as a child node of the input dataset. Please note that the Dataset History tab captures the analysis parameters.

Viewing MINDy Results

1. Select the MINDy result node in the Project Folder.

2. In the Modulator Tab, indicate the modulators of interest using the checkboxes, or click on Select All to display all modulators in the Table, List, and Heat Map views. The Modulators Selected count is updated to reflect the number of modulators selected. Only selected modulators are displayed in the Table, List and Heat Map views. Additional actions include:


  • Marker Display: Indicate marker display preferences for the Modulator column (probe name or symbol).
  • Sort: Click on the column headers or use sort options available in the left pane.
  • Add to Set: Adds selected modulators to a Marker Set. You can select one or more Targets and/or Modulators, using the selection checkboxes.
  • All Markers: This checkbox determines if all the target genes are displayed or only genes in activated marker groups.

3. Select from the various tabs to view the data in alternate formats. See Navigating MINDy Visualization for additional information on these data views.

Navigating MINDy Visualization

  • View tabs - MINDy includes the following data views: Modulator, Table, List and Heat Map.
  • Displayed targets filter - the targets displayed can be filtered by activating marker subsets. This is controlled by a selection at the bottom of the MINDy results display.

Modulator

MINDy mapk initial result modulator tab.png

Modulator: This table-based view contains one row per modulator gene. Only modulators selected in this tab are included in the other data views. The value of the Mode column for a modulator M is “+”, “-” or null (0), depending on whether M+ is larger than, smaller than, or equal to M-.


Here all four modulators in the example have been selected, activating the other tabs.

MINDy mapk initial result modulator tab select all.png

Table

MINDy mapk Table.png

Table: The rows of the table represent target genes and the columns represent modulators. Additional actions include:

  • Marker Display: Indicate marker display preferences for the Modulator column (probe name or symbol).
  • Sorting: Displays columns (modulators) from left to right in descending order by Aggregate (M#), Enhancing (M+) or Negative (M-).
  • Modulator Limits: Check this box to limit the number of columns (modulators) displayed to a defined value. The limit is applied to the modulators in their current display order.

Display Options:

  • Color View: Enables a heat map display of each cell based on the value of the score. -1 is displayed as absolute blue; +1 is displayed as absolute red; 0:1 is mapped uniformly from white to shades of red; -1:0 is mapped uniformly from shades of blue to white.
  • Score View: Displays the actual score values rather than the default discretized values.
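The Color View mapping can be written as a simple linear interpolation. The sketch below is an illustration of the mapping as described, not the geWorkbench rendering code; the function name is hypothetical, and clamping of out-of-range scores and the exact rounding of intermediate shades are assumptions.

```python
def score_to_rgb(score):
    """Map a score in [-1, 1] to an (R, G, B) tuple:
    -1 is pure blue, 0 is white, +1 is pure red."""
    s = max(-1.0, min(1.0, score))  # assumption: out-of-range scores are clamped
    fade = round(255 * (1 - abs(s)))  # 255 at score 0, 0 at score -1 or +1
    if s >= 0:
        return (255, fade, fade)  # white fading to red
    return (fade, fade, 255)      # white fading to blue
```

Scores near zero therefore appear almost white, which makes weak interactions easy to distinguish from strong ones at a glance.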


Here, the "Color View" and "Score View" options have been checked.


MINDy mapk Table display options1.png


The order of the modulators in the table can be altered by choosing among the options "aggregate" (default), "enhancing" and "negative".


Aggregate: The modulators are sorted in descending order based on the total number of target interactions each has.

Enhancing effect: The number of targets showing an enhancing effect is shown, and the modulators are sorted in descending order by the number of such enhancing interactions with targets.


MINDy mapk Table display options enhancing.png


Negative effect: The number of targets showing a negative effect is shown, and the modulators are sorted in descending order by the number of such negative interactions with targets.


MINDy mapk Table display options negative.png
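The three sort orders, and the Mode value shown in the Modulator tab, can be reproduced from a set of scores as sketched below. The data structure and function names are hypothetical and chosen for illustration; the sketch assumes that a positive score counts as an enhancing interaction, a negative score as a negative interaction, and that zero entries are not interactions.

```python
def summarize(scores):
    """scores maps modulator -> {target: delta (MI) score}.
    Returns per-modulator counts and the Mode symbol."""
    summary = {}
    for modulator, targets in scores.items():
        plus = sum(1 for s in targets.values() if s > 0)
        minus = sum(1 for s in targets.values() if s < 0)
        mode = "+" if plus > minus else "-" if plus < minus else "0"
        summary[modulator] = {"M#": plus + minus, "M+": plus,
                              "M-": minus, "Mode": mode}
    return summary


def sort_modulators(summary, by="M#"):
    """Order modulators by the chosen count ("M#", "M+" or "M-"), largest first."""
    return sorted(summary, key=lambda m: summary[m][by], reverse=True)


# Placeholder modulator and target names, not real markers.
example = summarize({
    "modA": {"t1": 0.3, "t2": 0.2, "t3": -0.1},
    "modB": {"t1": -0.4, "t2": -0.2, "t3": -0.3, "t4": 0.1},
})
```

In this example modB has more interactions overall and so comes first in the aggregate order, while modA comes first when sorting by enhancing effect.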


The displayed target markers can also be filtered using a defined marker set.

Here we show an example of selecting every 10th marker...

MINDy mapk Table select markers.png


Click "Add to Set" to create a new Marker set.


MINDy mapk Table target selected.png


The new marker set can now be used to filter the displayed targets. On the pulldown menu at the bottom of the MINDy viewer, choose the desired set of markers to use to restrict the display:


MINDy mapk Table Displayed targets filter menu.png


After the filter is selected, only those markers contained in the selected set will appear:


MINDy mapk Table Displayed targets filtered.png

List

In the list view, all modulators are listed in the first column, their targets in the second column, while the third column contains the scores. That is, each modulator/target pair is listed individually. This view has the advantage of displaying only actual data values.

This contrasts with the Table view, where the results are displayed in a spreadsheet format. Because each modulator has its own set of targets, not every modulator/target pair will have a value. As a result, the Table view is padded with zeros where necessary.
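The relationship between the two views can be shown by converting list-style rows into a padded table. This is a hypothetical sketch for illustration; the function name and the row layout are assumptions, not a geWorkbench export format.

```python
def list_to_table(rows):
    """rows is a list of (modulator, target, score) tuples, as in the List view.
    Returns (modulators, table), where table maps each target to a list of
    scores in modulator order, padded with 0 where no value exists."""
    modulators = sorted({m for m, _, _ in rows})
    targets = sorted({t for _, t, _ in rows})
    lookup = {(m, t): s for m, t, s in rows}
    table = {t: [lookup.get((m, t), 0) for m in modulators] for t in targets}
    return modulators, table


# Three actual values become a 2 x 2 table with one padded zero.
modulators, table = list_to_table([
    ("modA", "t1", 0.3),
    ("modA", "t2", -0.2),
    ("modB", "t2", 0.5),
])
```

The List view corresponds to the input rows and the Table view to the output, so a zero in the Table view may simply mean that no value was returned for that pair.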

MINDy mapk List tab.png

List: The table has three columns: Modulator, Target and Score. Additional actions include:

  • Marker Display: Indicate marker display preferences for the Modulator column (probe name or symbol).
  • Marker Override: Marker selection preferences. As markers are selected, the number of markers selected is listed next to Enable Selection field. This does not reflect the number of rows.

Heat Map

The list at left shows the available modulators. The heat map is generated for only the targets of one modulator at a time.

Here the first modulator on the list is selected:


MINDy mapk Heat Map tab.png

Scrolling to the bottom of the image shows how the sense of the modulation is reversed for these markers compared to those at the top of the list:


MINDy mapk Heat Map tab MAP4K4 lower.png


Here the second modulator is selected, generating a new heat map:


MINDy mapk Heat Map tab mod2.png



As with the other views, the Heat Map view can be filtered using the "Displayed targets filter". When the marker set generated above, "mapk_targets_every10", is selected, a reduced heat map is generated which illustrates a transition from positive to negative effect of MYC when MAP4K4 is low, but not when MAP4K4 is high.

MINDy mapk Heat Map tab target filtered.png


Heat Map: The Heat Map represents the expression values for individual markers (target genes). It contains two color mosaic panels. The rows correspond to target genes and the columns (arrays) are ordered according to the expression of the TF gene (low to high). In the screenshot above, G8 represents the modulator, whose expression values are used to divide the data into two sets. G9 represents the TF whose interaction with various targets is being evaluated. The columns of the left panel correspond to the L- arrays, where the modulator is least expressed, while the columns of the right panel correspond to the L+ arrays, where the modulator is most expressed. Additional actions include:

  • Marker Display: Indicate marker display preferences for the Modulator column (probe name or symbol).
  • Transcription Factor: Displays the TF entered in the MINDy Analysis parameters.
  • Modulator: Select a modulator from the “Selected Modulators” list to update the heat map display.
  • Refresh: Resets the heat map display.
  • Image Snapshot: Captures the heat map as an image node in the Project Folder.


MINDy Heat Map node.png

Displayed targets filter

This menu is located at the bottom of the MINDy viewer component and controls the target markers displayed in the various view tabs just described. It contains a list of all marker sets available in the Markers component. Any one set can be chosen, and only MINDy targets which are also in this selected subset will be displayed (the intersection of the MINDy result set and the Marker set).

Note - Marker sets do not need to be activated to be used for result filtering here.

Values:

  • All non-zero markers - all markers with delta (MI) > 0 are displayed.
  • Selection - This refers to the default "Selection" set in the Markers component.
  • any other marker set name - all available marker sets will be listed in the menu and any one can be chosen.
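The filter is a plain set intersection, as the following sketch shows. The function and marker names are hypothetical; the "every 10th marker" set mirrors the example used in the Table section above.

```python
def filter_displayed_targets(result_targets, marker_set=None):
    """Keep only result targets that are also in the chosen marker set.
    result_targets maps target -> score; marker_set=None means no filtering."""
    if marker_set is None:
        return dict(result_targets)
    wanted = set(marker_set)
    return {t: s for t, s in result_targets.items() if t in wanted}


# Placeholder result with 30 targets, and a marker set of every 10th one.
targets = [f"marker_{i}" for i in range(30)]
result = {t: 0.1 for t in targets}
every_tenth = targets[::10]
displayed = filter_displayed_targets(result, every_tenth)
```

Markers in the chosen set that are not among the MINDy targets have no effect on the display.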


References

Margolin, A., Wang, K., Lim, W.K., Kustagi, M., Nemenman, I., and Califano, A. Reverse Engineering Cellular Networks. Nature Protocols, 2006, Vol 1(2), pp. 662-671.

Wang, K. et al., (in preparation) MINDY: An Algorithm for the Genome-wide Discovery of Modulators of Transcriptional Interactions. See http://arxiv.org/PS_cache/q-bio/pdf/0510/0510030v2.pdf