Difference between revisions of "MINDy"

(Marker Selection)
 
 
{{TutorialsTopNav}}
 
  
==MINDy==
 
  
The MINDy algorithm (Modulator Inference by Network Dynamics) uses gene expression data to determine whether a putative modulator gene (Mj) influences the regulatory activity of a transcription factor gene (TF) over a set of target genes (Ti).  This influence is measured in terms of whether there is a '''change in the correlation (measured as mutual information)''' of expression between the TF and its targets Ti conditional on a change in the expression of Mj.  The mutual information values used in MINDy are calculated using the ARACNe algorithm, which is also a part of geWorkbench.
+
=MINDy Analysis=
  
===Outline of MINDy calculations===
+
The MINDy algorithm (Modulator Inference by Network Dynamics) uses gene expression data to determine whether a putative modulator gene (Mj) influences the regulatory activity of a transcription factor gene (TF) over a set of target genes (Ti).  This influence is measured in terms of whether there is a '''change in the correlation (measured as mutual information)''' of expression between the TF and its targets Ti conditional on a change in the expression of Mj.  The change in correlation is calculated as the difference in mutual information ('''delta (MI)''') for each TF-Ti pair between the two conditions (modulator high or low).  The mutual information values used in MINDy are calculated using the [[Tutorial_-_ARACNE|ARACNe]] algorithm, which is also a part of geWorkbench.
 +
 
 +
 
 +
==Outline of MINDy calculations==
 
# A microarray gene expression dataset is selected.
 
 
# The user specifies a set of one or more candidate modulator genes (Mj), a hub transcription factor (TF), and a set of putative targets of the transcription factor (Ti).
 
 
# Two subsets of arrays are then chosen from each end (tail) of the ordered list. One subset contains arrays in which Mj shows the lowest expression, and the other subset contains arrays in which Mj shows the highest expression.  The subsets are non-overlapping. A typical trial might involve assigning the lowest 35% of the arrays to the low group (M-), as measured by expression of Mj, and the highest 35% to the high group (M+).  The remaining arrays are not further considered.
 
 
# For each target Ti, the conditional mutual information between the hub TF and the target is then calculated for the array subsets M+ and M- separately, and the difference is taken (delta (MI)).
 
# The resulting delta (MI)s are displayed.  At present, a p-value is not calculated on the delta (MI).  Larger values of delta (I) may indicate an interesting change in the mutual information conditional on the expression of the modulator, that is, the modulator has an effect on the '''correlation of expression''' between the hub TF and the target gene.   
+
# The resulting delta (MI)s are displayed.  At present, a p-value is not calculated on the delta (MI).  Larger values of delta (MI) may indicate an interesting change in the mutual information conditional on the expression of the modulator, that is, the modulator has an effect on the '''correlation of expression''' between the hub TF and the target gene.   
# The sign of the influence of the modulator is also displayed, e.g. does increasing the expression of the modulator gene Mj increase (or decrease) '''the correlation''' of expression between the hub TF and the target gene?
+
# The sign of the influence of the modulator is also displayed.  A positive modulation effect (+) is where high expression of the modulator gene Mj '''increases''' the mutual information between the hub TF and the target gene.  Likewise, a negative modulation effect (-) is where high expression of the modulator gene Mj '''decreases''' the mutual information between the hub TF and the target gene.
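
The steps above can be sketched in Python. This is an illustrative sketch only, not the geWorkbench implementation: the function names <code>histogram_mi</code> and <code>delta_mi</code> are hypothetical, and a simple histogram estimate of mutual information stands in for the kernel-based ARACNe estimator, so absolute values will differ from those MINDy reports.

```python
import numpy as np


def histogram_mi(x, y, bins=8):
    """Plug-in mutual information estimate (in nats) from a 2-D histogram.

    Stand-in only: ARACNe uses a kernel-based estimator, not this one.
    """
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)   # marginal of y, shape (1, bins)
    nz = pxy > 0                          # skip empty cells to avoid log(0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))


def delta_mi(modulator, tf, target, fraction=0.35, bins=8):
    """MI(TF; target) in the modulator-high tail minus MI in the low tail."""
    modulator, tf, target = map(np.asarray, (modulator, tf, target))
    if not 0 < fraction <= 0.5:
        raise ValueError("fraction must be in (0, 0.5] so the tails do not overlap")
    order = np.argsort(modulator)         # arrays ordered by modulator expression
    k = int(round(fraction * len(order)))
    if k < 1:
        raise ValueError("too few arrays for the requested fraction")
    low, high = order[:k], order[-k:]     # M- and M+ subsets; the rest is ignored
    return histogram_mi(tf[high], target[high], bins) - histogram_mi(tf[low], target[low], bins)
```

A positive return value corresponds to a positive modulation effect (+) and a negative value to a negative modulation effect (-), matching the sign convention described above.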
 +
 
 +
==Prerequisites for MINDy calculations==
 +
# '''Number of arrays''' - A microarray gene expression data set with a sufficient number of arrays must be present.  For optimal results, at least 250 to 300 microarrays of a homogeneous cellular system should be used, for example, isolable tumor cells or cell lines, with a range of different expression conditions (distinct cellular phenotypes).  (300 arrays have been found to give good results, while 250 has been found to be an absolute minimum.)
 +
# '''Modulator expression variation''' - The expression of the modulator (Mj) must have a sufficient expression range to separate its two expression tails compared to the experimental noise level.  Low variation markers can be removed by running the deviation filter (Filtering component) on the dataset before starting the MINDy calculation.
 +
# '''Independence of modulator and TF hub''' - Any modulator (Mj) whose expression profile  is not statistically independent of that of the hub transcription factor (TF) must be excluded.  This can be determined using a mutual information calculation (ARACNe).  This functionality is not currently directly implemented within MINDy in geWorkbench, but can be run directly using the ARACNe component.
 +
# '''Note''' - The "Target List" is also used to represent all markers which will be used in the calculations.  As such, all hub markers and candidate modulator markers must be included in this list.
 +
 
 +
==Parameters - Main==
 +
 
 +
 
 +
[[Image:MINDy_Parameters_Main.png|{{ImageMaxWidth}}]]
 +
 
 +
 
 +
The Modulators List, Target List, and Hub Marker fields are populated using marker IDs as represented in the Markers component.  Note that these are not gene names, but the identifiers of the particular markers (e.g. Affymetrix probesets) from the expression platform used to collect the data.
 +
 
 +
 
 +
===Modulators List===
 +
 
 +
The list of candidate modulators can either be loaded from a file as a comma-separated list, or a set of markers can be selected from the Markers component.  The gene expression profiles of the modulators should be independent of that of the hub TF gene as measured by mutual information.  This could be determined using a preliminary run of ARACNe including just the modulators and the transcription factor.
 +
 
 +
 
 +
Modulators List pulldown menu options are:
 +
* '''From File''' - Load a list of candidate modulators from a file (containing a comma separated list).
 +
* '''From Set''' - Select a set of candidate modulators defined in the Markers component.  When '''From Set''' is selected, entries can also be typed directly into the text box.
 +
 
 +
'''Note''' - any markers in the modulator list must also appear in the target set (see Target List).
 +
 
 +
 
 +
===Target List===
 +
The target list can include all markers or can be restricted to a subset of candidates, for example markers thought to be regulated by the Hub Marker transcription factor.
 +
 
 +
Target List pulldown menu options are:
 +
* '''All Markers''' - Run MINDy on all markers in the data set.
 +
* '''From File''' - Load a list of target markers from a file (containing a comma separated list).
 +
* '''From Set''' - Select a target marker set defined in the Markers component.
 +
 
 +
 
 +
* '''Important - Target list must also include the Hub Marker and all Modulator markers'''  
 +
** The MINDy main parameters tab requires the selection of Modulators, Targets, and a hub marker.  The Target List must also contain the Hub Marker and all the Modulator markers, because a single expression profile dataset is transferred to the algorithm for calculations. 
 +
** If "All Markers" is chosen, then no further attention to this point is required.
 +
 
 +
* Note - the "'''All Markers'''" ''checkbox'' at the bottom of the Analysis component should '''not''' be used in the MINDy component.
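
The requirement that the Target List contain the Hub Marker and all Modulator markers can be checked before starting a run. The following is a minimal sketch, assuming the lists are available as comma-separated text; the helper names <code>parse_marker_list</code> and <code>missing_from_targets</code> are hypothetical and are not part of geWorkbench.

```python
def parse_marker_list(text):
    """Split a comma-separated list of marker IDs, ignoring blanks and whitespace."""
    return [tok.strip() for tok in text.replace("\n", ",").split(",") if tok.strip()]


def missing_from_targets(targets, hub, modulators):
    """Return the marker IDs MINDy needs that are absent from the target list."""
    target_set = set(targets)
    required = [hub, *modulators]
    return [marker for marker in required if marker not in target_set]


# Hypothetical usage: "probe_x" and "probe_y" are made-up marker IDs.
targets = parse_marker_list("37724_at, probe_x")
print(missing_from_targets(targets, "37724_at", ["probe_x", "probe_y"]))
```

An empty result means the target list is complete; any IDs returned would need to be added to the target list (or "All Markers" chosen instead).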
 +
 
 +
 
 +
===Hub Marker===
 +
Enter the marker ID for a known or putative transcription factor gene.
 +
* The hub marker can be typed directly into the text field.  Otherwise, the marker most recently selected in the Markers component is used, whether it was selected in the marker list or in the default marker set "Selection".
 +
* '''Note''' - Even if one directly types in a marker name, it will be replaced if any selection is made in the Markers component.
 +
* '''Note''' - The hub marker must also appear in the target set (see Target List).
 +
 
 +
 
 +
===Note on non-use of activated marker sets===
 +
Because the MINDy component allows the Modulator, Hub and Target marker sets to be chosen directly in its own interface, it does not respect marker sets that may be activated in the Markers component.
 +
 
 +
==Parameters - Advanced==
 +
 
 +
 
 +
[[Image:MINDy_Parameters_Advanced_.png|{{ImageMaxWidth}}]]
 +
 
 +
 
 +
===Sample per Condition (%)===
 +
MINDy calculates the difference in mutual information for the TF-Target interaction between the set where the modulator gene is most expressed (+) and the set where the modulator gene is least expressed (-). This parameter specifies the percentage of the available samples to include in each group. E.g. 35% means that the top and bottom 35% of a list of samples ranked by expression would be used.
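
As a worked illustration of this parameter, the sketch below computes how many arrays fall into each group and how many are left unused. It assumes that fractional counts are rounded down; the rounding actually used by geWorkbench is not stated here, and the function name <code>tail_sizes</code> is hypothetical.

```python
def tail_sizes(n_arrays, percent):
    """Return (arrays per group, arrays left unused) for a given percentage."""
    if not 0 < percent <= 50:
        raise ValueError("percent must be in (0, 50] so the groups do not overlap")
    per_group = int(n_arrays * percent / 100)   # assumption: round down
    return per_group, n_arrays - 2 * per_group


print(tail_sizes(250, 35))
```

With 250 arrays and the 35% setting, each group holds 87 arrays under this assumption and the middle 76 arrays are not used.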
 +
 
 +
 
 +
===Conditional (threshold settings)===
 +
The underlying ARACNe calculation of the conditional mutual information allows a threshold to be set.  The threshold for the conditional calculations can be specified as a raw mutual information score or as a P-value.  An above-threshold MI value must be obtained in at least one of the two conditional ARACNe runs in order for the target to be included in the output data.
 +
 
 +
Options:
 +
* '''Mutual Info '''- If selected, the user specifies a threshold for the mutual information (MI) estimates in terms of the raw MI score. For example, a value of 0.1 filters out target genes with a MI score of less than 0.1 in both the high and low modulator expression sets.  By default, a MI threshold of 0.1 is set.
 +
** Note - if the MI score is above threshold in one condition but not the other, the lower score will be set to zero when calculating delta (MI). 
 +
* '''P-value''' - If selected, the user specifies a threshold for the conditional mutual information estimate in terms of a p-value.  This is a value between 0 and 1, with 1 indicating no threshold. By default, the value is 0.01.  The specified p-value is converted to a MI threshold.
 +
 
 +
* '''Correction''' -  correct for multiple testing if a p-value is specified.  The choices are
 +
** None - no correction of the p-value
 +
** Bonferroni - apply the Bonferroni correction to the p-value before it is converted to a threshold MI score.
 +
 
 +
* Note on p-value calculation in MINDy in geWorkbench - The p-value for the conditional runs of ARACNe is calculated using an approximation described in Margolin et al., 2006.
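
The threshold rules above can be sketched as follows. This is an illustration under stated assumptions, not the geWorkbench code: the function names are hypothetical, a score equal to the threshold is treated as passing, and the Bonferroni step is shown as the standard division of the p-value by the number of tests (how MINDy counts the tests is not specified here). The conversion of a p-value into a MI threshold is not shown.

```python
def thresholded_delta_mi(mi_high, mi_low, threshold=0.1):
    """Return delta (MI), or None if the target fails the threshold in both conditions."""
    if mi_high < threshold and mi_low < threshold:
        return None                                  # target is excluded from the output
    high = mi_high if mi_high >= threshold else 0.0  # sub-threshold score is set to zero
    low = mi_low if mi_low >= threshold else 0.0
    return high - low


def bonferroni(p_value, n_tests):
    """Standard Bonferroni-adjusted significance level for n_tests comparisons."""
    return p_value / n_tests


print(thresholded_delta_mi(0.3, 0.05))   # low score is zeroed before subtracting
print(thresholded_delta_mi(0.05, 0.04))  # below threshold in both conditions
```

The first call illustrates the truncation effect: a low-condition score of 0.05 is treated as zero, so the reported delta (MI) is 0.3 rather than 0.25.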
 +
 
 +
===Unconditional (threshold settings) (Not used in MINDy)===
 +
The unconditional MI is intended for use in the calculation of statistical significance of the final delta (MI) score and is not currently used.  This control is disabled.
 +
 
 +
 
 +
 
 +
 
 +
===ARACNe parameter files not supported in MINDy in geWorkbench===
 +
ARACNe allows files config_threshold.txt and config_kernel.txt to be read in from disk if present.  However, the version of ARACNe used in MINDy does not support this feature.  It uses default parameters to derive the threshold and kernel width values.
 +
 
 +
==Important notes on the calculation==
 +
 
 +
 
 +
===delta (MI)===
 +
As implemented in geWorkbench, the significance of the delta (MI) values is not calculated.
 +
 
 +
 
 +
===Marker and Array Selection===
 +
* '''Marker Sets''' - All marker selection is done within the MINDy component interface.  If the option "From Set" is chosen, one marker set from the Markers component can be selected.  MINDy '''does not''' respect activated marker subsets in the Markers component - that is, checking the box next to a marker subset in the Markers component has no effect on the markers used for the MINDy calculation or display.
 +
 
 +
* '''Array Sets''' - MINDy '''does''' respect array subsets activated in the Arrays component.  That is, the arrays used can be limited to particular subsets by activating those subsets in the Arrays component (by checking the boxes next to them).
 +
 
 +
* '''Important - Target list must also include the Hub Marker and all Modulator markers'''
 +
** The MINDy main parameters tab requires the selection of Modulators, Targets, and a hub marker.  The Target List must also contain the Hub Marker and all the Modulator markers, because a single expression profile dataset is transferred to the algorithm for calculations. 
 +
** If "All Markers" is chosen, then no further attention to this point is required.
 +
 
 +
* '''Testing of multiple modulators''' - When testing multiple modulators, consider the false-positive implications of multiple tests, even though no significance value is being calculated.
 +
 
 +
===ARACNe configuration files===
 +
The following discussion of configuration files applies only to the local version of MINDy, not the grid version.  On the grid version, only the default parameters for kernel width and threshold will be used.
 +
 
 +
MINDy in geWorkbench uses the original, fixed-bandwidth version of ARACNe.  This version of ARACNe uses two configuration files, config_kernel.txt and config_threshold.txt.  If these two files are not supplied, default parameters will be used, which should be sufficient for most cases.  The parameter files can also be generated using ARACNe2 in geWorkbench.  However, the files will be named after the dataset from which they are generated, and must be renamed to config_kernel.txt and config_threshold.txt to be seen by ARACNe.  Files with those names, if present in the geWorkbench installation root folder, will override any other dataset-specific configuration files for ARACNe2, and so should not be left on the system after MINDy has been run.
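
The renaming and clean-up steps can be scripted. The sketch below is one possible approach, assuming the generated files follow the naming template <code>DatasetName_ARACNe_FBW_kernel.txt</code> and <code>DatasetName_ARACNe_FBW_threshold.txt</code> and sit in the geWorkbench installation root folder; the function names and the <code>root</code> argument are hypothetical. It copies rather than renames, so the dataset-specific files are preserved.

```python
import shutil
from pathlib import Path

CONFIG_NAMES = ("config_kernel.txt", "config_threshold.txt")


def install_aracne_config(root, dataset_name):
    """Copy the dataset-specific parameter files to the names ARACNe looks for."""
    root = Path(root)
    sources = (
        f"{dataset_name}_ARACNe_FBW_kernel.txt",
        f"{dataset_name}_ARACNe_FBW_threshold.txt",
    )
    installed = []
    for src, dst in zip(sources, CONFIG_NAMES):
        shutil.copyfile(root / src, root / dst)
        installed.append(root / dst)
    return installed


def remove_aracne_config(root):
    """Delete the generic config files once the MINDy run has finished."""
    for name in CONFIG_NAMES:
        (Path(root) / name).unlink(missing_ok=True)  # requires Python 3.8+
```

Calling <code>remove_aracne_config</code> after the run avoids leaving files that would override dataset-specific settings in later ARACNe2 runs.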
 +
 
 +
=Services (Grid)=
 +
 
 +
MINDy can be run either locally within geWorkbench, or remotely as a grid job on caGrid.  See the [[Tutorial_-_Grid_Services | Grid Services]] section for further details on setting up a grid job.  A Columbia grid login must be obtained to use the Columbia grid service.
 +
 
 +
 
 +
=Running an example MINDy Analysis=
 +
 
 +
==Analysis Framework==
 +
 
 +
For general details on saving and storing parameter settings, and launching the analysis, see the [[Tutorial_-_Analysis|Analysis]] tutorial page.
 +
 
 +
 
 +
==Setup==
 +
* For this example, we use a list of four candidate MAPK markers, contained in a CSV format file. Right-click on the following link and save the file [[Media:Mapk_list.csv|Mapk_list.csv]] to disk.
 +
* In the Component Configuration Manager, check whether the MINDy component has been loaded, and if not, load it.
 +
 
 +
 
 +
==Run==
 +
 
 +
The figure illustrates the MINDy main parameter tab set up to run the example below.
 +
 
 +
 
 +
Modulator list loaded from file:
 +
 
 +
 
 +
[[Image:MINDy_parameters_mapk_run.png|{{ImageMaxWidth}}]]
 +
 
 +
 
 +
Modulator list loaded from Marker set:
 +
 
  
===Important notes on the calculation===
+
[[Image:MINDy_parameters_mapk_run_fromset.png|{{ImageMaxWidth}}]]
  
====MI Thresholds====
 
* '''Unconditional''' - The unconditional ARACNe mutual information calculation is intended to be used in calculating a significance value on the final delta MI score.  This feature is not yet implemented.  However, the unconditional run is still performed to initialize ARACNe for the following conditional runs.  In particular, parameters for the conditional MI runs are calculated  using the number of arrays present in the full dataset, before partitioning for the conditional runs.
 
* '''Conditional''' -  The conditional MI score set will influence how many target markers are returned - the lower the threshold, the more targets will be returned.  A target has to have a value above the set threshold in at least one of the two conditional ARACNe runs in order to be included in the output data.  The threshold should be kept as close to zero as practical to avoid truncation effects on sub-threshold values.
 
  
====delta (MI)====
 
As implemented in geWorkbench, the significance of the delta (MI) values are not calculated.
 
  
====Marker and Array Selection====
+
# Load the Bcell-100.exp microarray dataset, which is available in the geWorkbench data directory under "public_data".  If you wish to see gene names in the results, you must also load the associated annotation file.  See e.g. the tutorial [[Local_Data_Files |Local Data Files]] for further details.
* All marker selection is done within the MINDy component interface.  MINDy does not respect activated marker subsets in the Markers component.
+
# In the analysis tab (at lower right in the application), select '''MINDy Analysis'''.
 +
# In the MINDy Parameters Main tab, populate the '''Modulators List''' by loading the file [[Media:Mapk_list.csv|Mapk_list.csv]].  Or, the file can first be loaded into the Markers component with the "Load Set" button, then selected as "From Set" in the MINDy parameters.
 +
# Populate the '''Target List''' textbox by selecting the choice "All Markers". 
 +
# Set the hub gene to be marker (probeset) name 37724_at (MYC).  Type in the marker name directly, or search for and select it in the Markers component.
 +
# Parameter values for the conditional mutual information calculation can be set in the Advanced Tab.  The values will depend on the specifics of the data set being used, in terms of number of arrays and number of markers.  Here we use the default parameters:
 +
## Sample per Condition: 35%
 +
## Conditional: MI 0.1
 +
## Unconditional: not used, control disabled.
 +
## DPI target list: not used, control disabled.
 +
## DPI tolerance: not used, control disabled.
 +
# Click '''Analyze'''. If successful, the [[Workspace]] is updated to add the MINDy result node.  The result node is shown as a child node of the input dataset Bcell-100.exp. Please note that the Dataset History tab captures the analysis parameters.
  
* MINDy does respect array subsets activated in the Arrays component.
+
=Viewing MINDy Results=
  
====Advanced - setting ARACNe dataset parameters====
+
==General==
  
MINDy makes use of the original Fixed Bandwidth implementation of ARACNe.  This algorithm can make use of parameters which are data set specific, if available (by separate calculation), and which can be used in setting the Kernel Width and Threshold. ARACNe includes default values with which to calculate these parameters, which also depend on the number of arrays in the dataset.   However, it is possible to use the newer version of ARACNe (also called ARACNe2), which is also included in geWorkbench as a separate component, to calculate the needed values for a particular dataset.  The key is that ARACNe looks for two parameter files with the fitted parameters, and will use these if they are found.  The files are called "config_kernel.txt" and "config_threshold.txt".  If you want to use custom parameters in MINDy, you must create these two files by using a separate PREPROCESSING run of ARACNe on your dataset.
+
1. The MINDy result node should be automatically selected in the [[Workspace]] once the result is available. If not, select it.  This will display the MINDy result viewer.
  
Running ARACNe in PREPROCESSING mode, with algorithm FIXED_BANDWIDTH, will create two files in the geWorkbench root directory, named according to the following template:
+
2. In the Modulator Tab, indicate the modulators of interest using the checkboxes or click on '''Select All''' to display all modulators in the Table, List, and Heat Map views. The '''Modulators Selected''' is updated to reflect the number of modulators selected. Only selected Modulators are displayed on the Table, List and Heat Map views.
  
* DatasetName_ARACNe_FBW_kernel.txt
 
* DatasetName_ARACNe_FBW_threshold.txt
 
  
where "DatasetName" is the name of the microarray dataset for which you ran ARACNe.  For example, for the Bcell-100.exp dataset, the following two files would be generated:
+
==Common Features==
  
* Bcell-100.exp_ARACNe_FBW_kernel.txt
+
===Net modulatory effect values===
* Bcell-100.exp_ARACNe_FBW_threshold.txt
 
  
To make these files available to MINDy, just rename them to "config_kernel.txt" and "config_threshold.txt".
+
The first step of the MINDy algorithm is to sort the input expression arrays by the expression value for the candidate modulator.  It then forms two groups of arrays, those where the candidate modulator is most highly expressed, and those where it is least expressed.  Here we will refer to these as the "conditional high set" and the "conditional low set" - that is, they are sets of arrays conditioned on the expression of the candidate modulator.
  
Note that these default file names will be seen and the contents used by all versions of ARACNe.  So you should remove or rename these files before doing any other work with ARACNe/MINDy.
+
The following symbols are used to break out the total, positive, and negative effects:
  
===Prerequisites for MINDy calculations===
+
* '''M#''' - For each modulator, the total number of above-threshold transcription-factor-target (TF-Ti) MI scores found.
 +
* '''M+''' - The number of targets for which the TF-Ti pairs showed higher MI in the conditional high set compared with the MI in the conditional low set.
 +
* '''M-''' - The number of targets for which the TF-Ti pairs showed lower MI in the conditional high set compared with the MI in the conditional low set.
  
The MINDy calculation contains certain assumptions (Wang et al, unpublished):
 
  
(a) The expression of the modulator (gm) must have a sufficient expression range to separate its two expression tails compared to the experimental noise level. This can be done by running the deviation filter (Filtering component) on the dataset before starting the MINDy calculation.
+
===Controls===
  
(b) Any modulator whose expression profile (Mj) is not statistically independent of that of the hub transcription factor (TF) must be excluded.  This can be determined using a mutual information calculation (ARACNe). This functionality is not currently directly implemented within MINDy in geWorkbench.
+
====Marker Display====
 +
Controls how the marker name is displayed.  Options are:
 +
* '''Symbol''' - If an annotation file has been loaded, use the Gene Symbol associated with each marker.  
 +
* '''Probe Name''' - Use the marker probe name as given in the dataset.
  
(c) Optimally, at least 100 separate microarrays should be included in the analysis, with a range of different expression conditions (distinct cellular phenotypes).
+
====Add to Set====
 +
(Except Heat Map) -  Adds selected markers to a Marker Set. You can select one or more Targets and/or Modulators, using the selection check-boxes.
  
===Setting the Main Parameters===
 
  
===[[Image:T_Mindy_Main.png]]===
+
====Export====
 +
The results shown in the Modulator, Table, or List tabs can be exported to a CSV format file on disk using the "Export" button. Only the table in the  currently displayed tab is exported.
  
'''Modulators List''' - ['''From File''' or '''From Set'''] - The list of candidate modulators can either be loaded from a file as a comma separated list, or a set of markers can be selected from the Markers component.  The gene expression profiles of the modulators should be independent of that of the hub TF gene as measured by mutual information.  This could be determined using a preliminary run of ARACNE including just the modulators and the transcription factor.
+
===Displayed targets filter===
 +
This menu is located at the bottom of the MINDy viewer component and controls the target markers displayed in the various view tabs just described. It contains a list of all marker sets available in the Markers component.  Any one set can be chosen, and only MINDy targets which are also in this selected subset will be displayed (the intersection of the MINDy result set and the Marker set).  
  
NOTE- in the next version of MINDy, the ability to calculate a p-value on the conditional mutual information score will be implemented.  When available, it will be useful to limit the number of modulator genes tested to minimize the multiple testing correction.
+
'''Note''' - Marker sets do not need to be activated to be used for result filtering here.
  
'''Hub Marker''' - Enter the marker ID for a known or putative transcription factor gene. The Hub marker can be entered directly in the text field, or the most recently selected marker in the Markers component will be used. Note that even if one directly types in a marker name, it will be replaced if any selection is made in the Markers component, either in the list or in the default Marker set "Selection".
+
====Values====
 +
* '''All non-zero markers''' - all markers with delta (MI) > 0 are displayed.
 +
* '''Selection''' - This refers to the default "Selection" set in the Markers component.  
 +
* any other marker set name - all available marker sets will be listed in the menu and any one can be chosen.
  
'''Target List''' - ['''All Markers''', '''From File''', or '''From Set'''] - The target list should be composed of genes thought to be regulated by the Hub Marker transcription factor. The list of target markers can be loaded from a file containing a comma separated list, or a set of markers can be selected from the Markers component. Alternatively, '''All Markers''' can be selected.
 
  
(Note - the "'''All Markers'''" ''checkbox'' at the bottom of the Analysis component should '''not''' be used in the MINDy component).
+
==Modulator tab==
  
===Setting the Advanced Parameters===
+
[[Image:MINDy_mapk_initial_result_modulator_tab.png|{{ImageMaxWidth}}]]
  
[[Image:T_MINDy_Paramters_Advanced.png]]
+
This table-based view contains one row per modulator gene.  It summarizes the results, and is used to control the targets displayed in the other view tabs.
  
'''Sample per Condition (%)''' - MINDy calculates the difference in mutual information for the TF-Target interaction between the set where the modulator gene is most expressed (+) and the set where the modulator gene is least expressed (-). This parameter specifies the percentage of the available samples to include in each group. E.g. 35% means that the top and bottom 35% of a list of samples ranked by expression would be used.
+
===Controls===
  
 +
* '''List Selections'''
 +
** '''Select All''' checkbox - When checked, all modulators will be selected.  If not checked, the individual markers can be selected using the individual check boxes in the table.
 +
* '''Modulators selected''' - Shows a count of the number of individual modulators that have been selected in the table.
  
'''Unconditional''' and '''Conditional''' - The underlying ARACNe calculation of mutual information allows a threshold to be set.  This allows TF-target pairs with low MI to be screened out - an MI value will only be returned for a target when it exceeds this value.  The threshold can be specified as a mutual information value or as a P-value.  (Note - in current implementations of geWorkbench, the p-value calculation is not available for the conditional MI calculation, and we recommend that a MI value of around 0.2 be tried.  The unconditional MI is intended for use in the calculation of statistical significance of the final delta MI score and is not currently used.)
 
  
* '''Mutual Info '''- If selected, the user specifies a threshold for the mutual information (MI) estimates in terms of the raw MI score. For example, a value of 0.20 filters out target genes with a MI score of less than 0.20. By default, a MI threshold of 0.1 is set.
+
===Columns===
  
* '''P-value''' - If selected, the user specifies a threshold for the mutual information estimate in terms of a p-value - an estimate of the significance of the value.  This is a value between 0 and 1, with 1 indicating no threshold. By default, the value is 0.01. 
+
* M#, M+ and M- have already been described above under [[Tutorial_-_MINDy#Net_modulatory_effect_values |"Net modulatory effect values"]].  
** NOTE - p-value is not currently offered for the conditional calculation.  The p-value for the unconditional calculation is a rough estimate only.  Full p-value calculations will be implemented in a future release of MINDy within geWorkbench.
 
  
 +
* '''Check-boxes''' - Use these to select which modulators to include in generating the data views on the other tabs (Table, List, and Heat Map).
 +
* '''Modulator''' - The gene symbol or probe name for the putative modulators tested.
  
* '''Correction''' - None or Bonferroni - correct for multiple testing if a p-value is specified.
+
* '''Mode''' - Shows whether the net effect of the modulator, summed over all its targets, was positive or negative.
 +
**  If M+ - M- > 0, the result is "+", that is the candidate had a net positive modulatory effect (increased MI).
 +
**  If M+ - M- < 0, the result is "-", that is the candidate had a net negative modulatory effect (decreased MI).
 +
**  If M+ - M- = 0, the result is "0", that is the candidate modulator had a balanced effect.
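
The summary values can be illustrated with a short sketch. It assumes the input is the list of delta (MI) values for one modulator over its above-threshold targets, and that M# counts every such entry; the function name <code>summarize_modulator</code> is hypothetical and this is not the geWorkbench code.

```python
def summarize_modulator(delta_mis):
    """Compute M#, M+, M- and Mode for one modulator from its delta (MI) values."""
    m_plus = sum(1 for d in delta_mis if d > 0)    # higher MI in the conditional high set
    m_minus = sum(1 for d in delta_mis if d < 0)   # lower MI in the conditional high set
    net = m_plus - m_minus
    mode = "+" if net > 0 else "-" if net < 0 else "0"
    return {"M#": len(delta_mis), "M+": m_plus, "M-": m_minus, "Mode": mode}


print(summarize_modulator([0.2, -0.1, 0.3]))
```

In this made-up example two targets show increased MI and one shows decreased MI, so the mode is "+".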
  
  
* '''DPI Tolerance''' - The Data Processing Inequality (triangle inequality) can be used to remove the effects of indirect interactions, e.g. if TF1->TF2->Target, DPI can be used to remove the indirect action of TF1 on the target.  Stated another way, the DPI can be used to remove the weakest interaction of those between any three markers. The DPI tolerance specifies the degree of sampling error to be accepted, as with a finite sample size an exact MI value cannot be calculated.
+
Here all four modulators in the example have been selected, activating the other tabs.
** The DPI tolerance is normally between 0 and 0.15, since values larger than 0.15 yield more false positives.
+
** See the [[Tutorial_-_ARACNE]] tutorial page and Margolin et al. 2006 for further details on use of DPI.
+
[[Image:MINDy_mapk_initial_result_modulator_tab_select_all.png]]
  
 +
==Table==
  
* '''DPI Target List''' - The DPI target list can be used to limit the ARACNE calculation to transcriptional networks. It is used to screen out spurious regulatory interaction signals of genes that are tightly coexpressed but are not in a regulatory relationship to each other, for example genes for two proteins that are in a physical complex and hence always produced in the same amounts. A comma-separated list can be typed in, or it can be loaded from an external file. If used, the DPI Target List should contain all markers that are annotated as transcription factors. Signaling proteins could also be included.
 
** Details: If the box is checked, the user selects and loads a file which specifies markers (which should be a list of one or more presumptive transcription factors) which will be given preferential treatment during the DPI edge-removal step. Edges originating from markers on this list will not be removed by edges originating from markers not on this list. However, for DPI calculations where all three markers are members of the list, the weakest connecting edge may still be removed.
 
  
==Services (Grid)==
+
[[Image:MINDy_mapk_Table.png|{{ImageMaxWidth}}]]
  
MINDy can be run either locally within geWorkbench, or remotely as a grid job on caGrid.  See the [[Tutorial_-_Grid_Services | Grid Services]] section for further details on setting up a grid job.
 
  
 +
The column "Target" represents the target genes and the remaining columns represent the modulators tested.
  
===Running a MINDy Analysis===
+
* '''Discretization of scores''' - By default, the MI scores are discretized to +1 and -1 for positive and negative scores, respectively. Discretized scores are used to quantify the number of positive and negative modulation effects, as shown e.g. in the numbers in the column headers. If the "Score View" option is chosen, the actual scores will be shown. 
  
1. Select a microarray set node in the Project Folder.
 
  
2. In the analysis pane (lower right of the application), select''' MINDy Analysis'''.
+
===Controls===
 +
If many modulators were tested, it may be desirable to sort the display by their results.
  
3. In the Main tab, populate the ''' Modulators List''' by selecting a set of markers defined in the Markers component, or load a list from a file.
+
* '''Display Options:'''
 +
** '''Color View:''' Enables a heat map display of each cell based on the value of the score.  Positive values are displayed in shades of red, while negative values are displayed in shades of blue. The saturation of the color increases (starting from white for 0) with increasing absolute value of the score.
 +
** '''Score View:''' Displays the actual score values rather than the default discretized values.
  
4. Populate the '''Target List''' textbox by selecting the choice "All Markers", or by selecting a set of markers defined in the Markers component, or by loading a list from a file.
 
  
5. Populate the '''Hub Gene''' textbox to designate the TF gene by (1) typing the marker name (as displayed in the Markers component) or (2) in the Selection Area (lower left of the application) Marker Tab, click on the marker name corresponding to the TF.
+
Here, both the "Color View" and "Score View" options have been checked.
  
6. Parameter values for the unconditional and conditional mutual information calculations can be set in the Advanced Tab.  The values will depend on the specifics of the data set being used, in terms of number of arrays and number of markers.  A suggested "first try" set of parameters as shown in the above screenshot of the "advanced parameters" tab is:
 
  
* sample per condition: 35%
+
[[Image:MINDy_mapk_Table_display_options1.png|{{ImageMaxWidth}}]]
* conditional: MI 0.1 (or even 0)
 
* unconditional: MI 0.1
 
* DPI target list: blank
 
* DPI tolerance: 0.1
 
  
  
7. Click '''Analyze'''. If successful, the project window is updated to reflect the MINDy result node.  The result node is shown as a child node of the input dataset. Please note that the Dataset History tab captures the analysis parameters.
+
The modulators in the above figure are by default sorted by the aggregate count of targets for which a modulatory effect was seen (M#).
  
===Viewing MINDy Results===
+
The table can be sorted on the values in any column by clicking on its header.  In addition, the column headers (the modulators) can be sorted as described next:
  
1. Select the MINDy result node in the Project Folder.
+
* '''Modulator Sorting:''' - Displays columns (modulators) from left to right in descending order by the counts of: Aggregate ( M#), Enhancing (M+) or Negative (M-).
 +
** '''Aggregate (M#)''': The column header displays "M#" and the count of all targets for which a positive or negative modulatory effect was seen.  
 +
** '''Enhancing (M+)''': The column header displays "M+" and the count of all targets for which a positive modulatory effect was seen.
 +
** '''Negative (M-)''': The column header displays "M-" and the count of all targets for which a negative modulatory effect was seen.  
  
2. In the Modulator Tab, indicate the modulators of interest using the checkboxes or click on '''Select All''' to display all modulators in the Table, List, and Heat Map views. The '''Modulators Selected''' is updated to reflect the number of modulators selected. Only selected Modulators are displayed on the Table, List and Heat Map views. Additional actions include:
+
Example of sorting by "Enhancing":  The first modulator column header is "MAP4K4 (M+ 103)".
  
  
* '''Marker Display:''' Indicate marker display preferences for the Modulator column ( probe name or symbol).
+
[[Image:MINDy_mapk_Table_display_options_enhancing.png|{{ImageMaxWidth}}]]
  
* '''Sort:''' Click on the column headers or use sort options available in the left pane.
 
  
* '''Add to Set: ''' Adds selected modulators to a Marker Set. You can select one or more Targets and/or Modulators, using the selection checkboxes.
+
Example of sorting by "Negative": The first modulator column header is "MAP4K2 (M- 54)".
  
* '''All Markers:''' This checkbox determines if all the target genes are displayed or only genes in activated marker groups.
 
  
3. Select from the various tabs to view the data in alternate formats. See [#_Navigating_MINDy_Visualization Navigating MINDy] for additional information on these data views.
+
[[Image:MINDy_mapk_Table_display_options_negative.png|{{ImageMaxWidth}}]]
  
===Navigating MINDy Visualization===
 
  
MINDy includes the following data views:  Modulator, Table, List and Heat Map.
+
* '''Modulator Limits:''' When the checkbox is selected, the number of columns (modulators) is limited to the value set in the selector box.
====Modulator====
 
  
[[Image:T_Mindy_Viewer_Modulator.png]]
+
* '''Marker Selection'''
 +
** '''Enable Selection''' - When checked, a column of checkboxes appears in the table to allow individual selection of targets.  Shows a count of all selected modulators and targets.
 +
** '''All Modulators''' - Select or Clear buttons - Selects or clear all modulator check boxes (table columns).
 +
** '''All Targets''' - Select or clear buttons - Select or clear all displayed targets (table rows).
 +
** '''Add to Set''' (button) - All selected markers (modulators and/or targets) will be added to a new set in the Markers component.
  
'''Modulator''': This table-based view contains one row per modulator gene. Only modulators selected in this tab are included in the other data views. The value of the Mode column for a modulator M is either "+", "-" or null (0), depending on whether M+ is larger than, smaller than, or equal to M-.
 
  
 +
* '''Displayed targets filter'''  - The displayed target markers can be filtered using marker set defined in the Markers component.  After the filter is selected, only those markers contained in the selected set will appear.
  
====Table====
+
==List==
  
[[Image:T_Mindy_Viewer_Table.png]]
+
In the list view, all modulators are listed in the first column, their targets in the second column, while the third column contains the delta (MI) scores. That is, each modulator/target pair is listed individually. 
  
'''Table''': The rows of the table represent target genes and the columns represent modulators. Additional actions include:
+
This view has the advantage of displaying only actual data values.  This contrasts with the Table view, where the results are displayed in a spreadsheet format.  Because each modulator will have its own set of targets, not each modulator/target cell in the table will have a value.  Results in the Table view are padded with zeros as necessary.
  
* '''Marker Display: '''Indicate marker display preferences for the Modulator column ( probe name or symbol).
 
  
* '''Sorting: ''' Displays columns (modulators) from left to right in descending order by Aggregate (M#), Enhancing (M+) or Negative (M-).
+
[[Image:MINDy_mapk_List_tab.png]]
  
* '''Modulator Limits: '''Activates the checkbox to limit the columns (modulators) display to a defined value. This selection filters the modulator display based upon the current display order.
 
  
'''Display Options: '''
+
* '''Marker Selection''' (checkbox) - Controls which markers are used by the "Add to Set" button.
 +
** '''Enable Selection''' - When checked, a column of checkboxes appears beside each target and beside each marker to allow individual selection of each.  Shows a count of all selected modulators and targets.
 +
** '''Select all Modulators''' - Selects all modulators.
 +
** '''Select all Targets''' - Selects all targets.
 +
** '''Add to Set''' (button) - All selected markers will be added to a new set in the Markers component.
  
* '''Color View:''' Enables a heat map display of each cell based on the value of the score. -1 is displayed as absolute blue; +1 is displayed as absolute red; 0:1 is mapped uniformly from white to shades of red; -1:0 is mapped uniformly from shades of blue to white.
 
  
* '''Score View:''' Displays the discretized score values.
+
==Heat Map==
 +
The Heat Map represents the expression values for individual markers (target genes).  It contains two color mosaic panels. The rows correspond to target genes and are ordered according to their Pearson's correlation to the expression of the TF.  The columns (arrays) are ordered according to the expression of the TF gene, low (left) to high (right). The mosaic at left corresponds to the arrays where modulator was least expressed.  The mosaic at right corresponds to the arrays where the modulator expression was highest.  
  
 +
===Controls===
  
====List====
+
* '''Transcription Factor:''' Displays the TF hub gene entered in the MINDy Analysis parameters.
  
[[Image:T_Mindy_Viewer_List.png]]
+
* '''Modulators: ''' - The heat map is generated for the targets of only one modulator at a time.  The list shows the available modulators, and the text box above it shows the selected modulator.
  
'''List: ''' The table has three columns: Modulator, Target and Score. Additional actions include:
+
Here the first modulator on the list is selected:
  
* '''''Marker Display:''''' Indicate marker display preferences for the Modulator column ( probe name or symbol).
 
  
* '''''Marker Override: ''''' Marker selection preferences. As markers are selected, the number of markers selected is listed next to Enable Selection field. This does not reflect the number of rows.
+
[[Image:MINDy_mapk_Heat_Map_tab.png|{{ImageMaxWidth}}]]
  
====Heat Map====
 
  
[[Image:T_Mindy_Viewer_HeatMap.png]]
+
As shown below, scrolling to the bottom of the Heat Map image shows how the effect of modulation can differ for different genes.  The genes at top are directly correlated with MYC when MAP4K4 is low, whereas the genes at bottom are anti-correlated.
  
'''Heat Map:''' The Heat Map represents the expression values for individual markers (target genes).  It contains two color mosaic panels. The rows correspond to target genes and the columns (arrays) are ordered according to the expression of the TF gene (low to high). In the screenshot above, G8 represents the modulator, whose expression values are used to divide the data into two sets.  G9 represents the TF whose interaction with various targets is being evaluated. The left panel corresponds to the L- arrays, where the modulator is least expressed, while the right panel corresponds to the L+ arrays, where the modulator is most expressed. Additional actions include:
 
  
* '''''Marker Display:''''' Indicate marker display preferences for the Modulator column ( probe name or symbol).
+
[[Image:MINDy_mapk_Heat_Map_tab_MAP4K4_lower.png|{{ImageMaxWidth}}]]
  
* '''''Transcription Factor:''''' Displays the TF entered in the MINDy Analysis parameters.
 
  
* '''''Modulator: '''''Select a modulator from the “Selected Modulators” list to update the heat map display.
+
* '''Displayed targets filter''' - The targets displayed in the Heat Map view can be limited to those defined in a marker set.
  
* '''''Refresh: '''''Resets the heat map display.
+
* '''Image Snapshot:''' - Captures the Heat Map as an image node in the [[Workspace]].
  
* '''''Image Snapshot:''''' Captures the heat map as an image node in the Project Folder.
 
  
===References===
+
[[Image:MINDy_Heat_Map_node.png]]
  
Margolin, A., Wang, K., Lim, W.K., Kustagi, M., Nemenman, I., and Califano, A. Reverse Engineering Cellular Networks. Nature Protocols, 2006 Vol 1(2). pp. 662-671.
+
=References=
  
Wang, K. et al., (in preparation) MINDY: An Algorithm for the Genome-wide Discovery of Modulators of Transcriptional Interactions. See http://arxiv.org/PS_cache/q-bio/pdf/0510/0510030v2.pdf
+
# Margolin, A., Wang, K., Lim, W.K., Kustagi, M., Nemenman, I., and Califano, A. (2006) Reverse Engineering Cellular Networks. Nature Protocols 1(2):662-671. [http://www.ncbi.nlm.nih.gov/pubmed/17406294 link to pub.].
 +
# Wang K, Saito M, Bisikirska BC, Alvarez MJ, Lim WK, Rajbhandari P, Shen Q, Nemenman I, Basso K, Margolin AA, Klein U, Dalla-Favera R, Califano A. (2009)  Genome-wide identification of post-translational modulators of transcription factor activity in human B cells. Nat Biotechnol. 27(9):829-39. [http://www.ncbi.nlm.nih.gov/pubmed/19741643 link to pub.].

Latest revision as of 18:52, 22 January 2014




MINDy Analysis

The MINDy algorithm (Modulator Inference by Network Dynamics) uses gene expression data to determine whether a putative modulator gene (Mj) influences the regulatory activity of a transcription factor gene (TF) over a set of target genes (Ti). This influence is measured in terms of whether there is a change in the correlation (measured as mutual information) of expression between the TF and its targets Ti conditional on a change in the expression of Mj. The change in correlation is calculated as the difference in mutual information (delta (MI)) for each TF-Ti pair between the two conditions (modulator high or low). The mutual information values used in MINDy are calculated using the ARACNe algorithm, which is also a part of geWorkbench.


Outline of MINDy calculations

  1. A microarray gene expression dataset is selected.
  2. The user specifies a set of one or more candidate modulator genes (Mj), a hub transcription factor (TF), and a set of putative targets of the transcription factor (Ti).
  3. Parameters for the MINDy run are set.
  4. Using the expression value of the chosen modulator gene Mj, the arrays in the experiment are ordered (as columns in the data matrix), from lowest to highest.
  5. Two subsets of arrays are then chosen from each end (tail) of the ordered list. One subset contains arrays in which Mj shows the lowest expression, and the other subset contains arrays in which Mj shows the highest expression. The subsets are non-overlapping. A typical trial might involve assigning the lowest 35% of the arrays to the low group (M-), as measured by expression of Mj, and the highest 35% to the high group (M+). The remaining arrays are not further considered.
  6. For each target Ti, the conditional mutual information between the hub TF and the target is then calculated for the array subsets M+ and M- separately, and the difference is taken (delta (MI)).
  7. The resulting delta (MI)s are displayed. At present, a p-value is not calculated on the delta (MI). Larger values of delta (MI) may indicate an interesting change in the mutual information conditional on the expression of the modulator, that is, the modulator has an effect on the correlation of expression between the hub TF and the target gene.
  8. The sign of the influence of the modulator is also displayed. A positive modulation effect (+) is where high expression of the modulator gene Mj increases the mutual information between the hub TF and the target gene. Likewise, a negative modulation effect (-) is where high expression of the modulator gene Mj decreases the mutual information between the hub TF and the target gene.
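
Steps 4 to 6 of the outline can be sketched in a few lines of Python. This is an illustration only, not the geWorkbench implementation: the function names are hypothetical, and the simple histogram-based mutual information estimate below is a stand-in for the ARACNe estimator, chosen only to keep the example self-contained.

```python
import numpy as np

def histogram_mi(x, y, bins=4):
    """Plug-in mutual information (in nats) from a 2D histogram.
    Stand-in estimator for illustration; not the ARACNe estimator."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)   # marginal of y, shape (1, bins)
    nz = pxy > 0                          # skip empty cells to avoid log(0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

def delta_mi(modulator, tf, target, fraction=0.35):
    """delta (MI) for one TF-target pair, conditional on one modulator."""
    n = len(modulator)
    n_tail = int(round(fraction * n))
    if n_tail < 1 or 2 * n_tail > n:
        raise ValueError("tails must be non-empty and non-overlapping")
    order = np.argsort(modulator)         # step 4: arrays ordered low to high
    low = order[:n_tail]                  # step 5: lowest modulator expression
    high = order[-n_tail:]                # step 5: highest modulator expression
    mi_low = histogram_mi(tf[low], target[low])
    mi_high = histogram_mi(tf[high], target[high])
    return mi_high - mi_low               # step 6: positive means MI rises with Mj
```

With the sign convention used here (high minus low), a positive result corresponds to the positive modulation effect described in step 8.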

Prerequisites for MINDy calculations

  1. Number of arrays - A microarray gene expression data set with a sufficient number of arrays must be present. For optimal results, at least 250 to 300 microarrays of a homogeneous cellular system should be used, for example, isolable tumor cells or cell lines, with a range of different expression conditions (distinct cellular phenotypes). (300 arrays have been found to give good results, while 250 has been found to be an absolute minimum).
  2. Modulator expression variation - The expression of the modulator (Mj) must have a sufficient expression range to separate its two expression tails compared to the experimental noise level. Low variation markers can be removed by running the deviation filter (Filtering component) on the dataset before starting the MINDy calculation.
  3. Independence of modulator and TF hub - Any modulator (Mj) whose expression profile is not statistically independent of that of the hub transcription factor (TF) must be excluded. This can be determined using a mutual information calculation (ARACNe). This functionality is not currently directly implemented within MINDy in geWorkbench, but can be run directly using the ARACNe component.
  4. Note - The "Target List" also is used to represent all markers which will be used in the calculations. As such, all hub markers and candidate modulator markers must be included in this list.

Parameters - Main

MINDy Parameters Main.png


The Modulators List, Target List, and Hub Marker fields are populated using marker IDs as represented in the Markers component. Note that these are not gene names, but the identifiers of the particular markers (e.g. Affymetrix probesets) from the expression platform used to collect the data.


Modulators List

The list of candidate modulators can either be loaded from a file as a comma separated list, or a set of markers can be selected from the Markers component. The gene expression profiles of the modulators should be independent of that of the hub TF gene as measured by mutual information. This could be determined using a preliminary run of ARACNe including just the modulators and the transcription factor.


Modulators List pulldown menu options are:

  • From File - Load a list of candidate modulators from a file (containing a comma separated list).
  • From Set - Select a set of candidate modulators defined in the Markers component. When From Set is selected, entries can also be typed directly into the text box.

Note - any markers in the modulator list must also appear in the target set (see Target List).


Target List

The target list can include all markers or can be restricted to some subset of candidates e.g. thought to be regulated by the Hub Marker transcription factor.

Target List pulldown menu options are:

  • All Markers - Run MINDy on all markers in the data set.
  • From File - Load a list of target markers from a file (containing a comma separated list).
  • From Set - Select a target marker set defined in the Markers component.


  • Important - Target list must also include the Hub Marker and all Modulator markers
    • The MINDy main parameters tab requires the selection of Modulators, Targets, and a hub marker. The Target List must also contain the Hub Marker and all the Modulator markers, because a single expression profile dataset is transferred to the algorithm for calculations.
    • If "All Markers" is chosen, then no further attention to this point is required.
  • Note - the "All Markers" checkbox at the bottom of the Analysis component should not be used in the MINDy component.
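
The requirement that the Target List contain the Hub Marker and every Modulator marker can be checked before launching a run. The helper below is a hypothetical pre-flight check written for this tutorial, not a geWorkbench function; all marker IDs other than 37724_at are made up for the example.

```python
def missing_from_targets(targets, hub, modulators):
    """Return the hub/modulator marker IDs that are absent from the target list."""
    target_set = set(targets)
    required = [hub, *modulators]
    return [marker for marker in required if marker not in target_set]

# Hypothetical marker IDs, for illustration only
targets = ["37724_at", "mod_1_at", "tgt_1_at", "tgt_2_at"]
missing = missing_from_targets(targets, hub="37724_at",
                               modulators=["mod_1_at", "mod_2_at"])
print(missing)  # the markers that still need to be added to the Target List
```

An empty result means the list is complete; choosing "All Markers" makes the check unnecessary.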


Hub Marker

Enter the marker ID for a known or putative transcription factor gene.

  • The Hub marker can be entered directly in the text field, or the most recently selected marker in the Markers component will be used, selected either in the list or in the default Marker set "Selection".
  • Note - Even if one directly types in a marker name, it will be replaced if any selection is made in the Markers component.
  • Note - The hub marker must also appear in the target set (see Target List).


Note on non-use of activated marker sets

Because the MINDy component allows the Modulator, Hub and Target marker sets to be chosen directly in its own interface, it does not respect marker sets that may be activated in the Markers component.

Parameters - Advanced

MINDy Parameters Advanced .png


Sample per Condition (%)

MINDy calculates the difference in mutual information for the TF-Target interaction between the set where the modulator gene is most expressed (+) and the set where the modulator gene is least expressed (-). This parameter specifies the percentage of the available samples to include in each group. E.g. 35% means that the top and bottom 35% of a list of samples ranked by expression would be used.


Conditional (threshold settings)

The underlying ARACNe calculation of the conditional mutual information allows a threshold to be set. The threshold for the conditional calculations can be specified as a raw mutual information score or as a P-value. An above-threshold MI value must be obtained in at least one of the two conditional ARACNe runs in order for the target to be included in the output data.

Options:

  • Mutual Info - If selected, the user specifies a threshold for the mutual information (MI) estimates in terms of the raw MI score. For example, a value of 0.1 filters out target genes with a MI score of less than 0.1 in both the high and low modulator expression sets. By default, a MI threshold of 0.1 is set.
    • Note - if the MI score is above threshold in one condition but not the other, the lower score will be set to zero when calculating delta (MI).
  • P-value - If selected, the user specifies a threshold for the conditional mutual information estimate in terms of a p-value. This is a value between 0 and 1, with 1 indicating no threshold. By default, the value is 0.01. The specified p-value is converted to a MI threshold.
  • Correction - correct for multiple testing if a p-value is specified. The choices are
    • None - no correction of the p-value
    • Bonferroni - apply the Bonferroni correction to the p-value before it is converted to a threshold MI score.
  • Note on p-value calculation in MINDy in geWorkbench - The p-value for the conditional runs of ARACNe is calculated using an approximation described in Margolin et al., 2006.
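
The raw MI threshold rule described above can be written out as a small function. This is a sketch of the rule as stated in this section, with a hypothetical function name; treating a score exactly equal to the threshold as passing is an assumption.

```python
def thresholded_delta_mi(mi_high, mi_low, threshold=0.1):
    """Apply the conditional MI threshold to one TF-target pair.

    Returns delta (MI), or None when the target is filtered out because
    both conditional MI scores fall below the threshold.
    """
    if mi_high < threshold and mi_low < threshold:
        return None                      # excluded from the output data
    # A below-threshold score in only one condition is set to zero
    high = mi_high if mi_high >= threshold else 0.0
    low = mi_low if mi_low >= threshold else 0.0
    return high - low
```

For example, with the default threshold of 0.1, conditional scores of 0.3 (high set) and 0.05 (low set) give a delta (MI) of 0.3 rather than 0.25, because the lower score is zeroed first.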

Unconditional (threshold settings) (Not used in MINDy)

The unconditional MI is intended for use in the calculation of statistical significance of the final delta (MI) score and is not currently used. This control is disabled.



ARACNe parameter files not supported in MINDy in geWorkbench

ARACNe allows files config_threshold.txt and config_kernel.txt to be read in from disk if present. However, the version of ARACNe used in MINDy does not support this feature. It uses default parameters to derive the threshold and kernel width values.

Important notes on the calculation

delta (MI)

As implemented in geWorkbench, the significance of the delta (MI) values is not calculated.


Marker and Array Selection

  • Marker Sets - All marker selection is done within the MINDy component interface. If the option "From Sets" is chosen, one marker set from the Markers component can be selected. MINDy does not respect activated marker subsets in the Markers component - that is, checking the box next to a marker subset in the Markers component has no effect on the markers used for the MINDy calculation or display.
  • Array Sets - MINDy does respect array subsets activated in the Arrays component. That is, the arrays used can be limited to particular subsets by activating those subsets in the Arrays component (by checking the boxes next to them).
  • Important - Target list must also include the Hub Marker and all Modulator markers
    • The MINDy main parameters tab requires the selection of Modulators, Targets, and a hub marker. The Target List must also contain the Hub Marker and all the Modulator markers, because a single expression profile dataset is transferred to the algorithm for calculations.
    • If "All Markers" is chosen, then no further attention to this point is required.
  • Testing of multiple modulators - When testing multiple modulators, consider the false-positive implications of multiple tests, even though no significance value is being calculated.

ARACNe configuration files

The following discussion of configuration files applies only to the local version of MINDy, not the grid version. On the grid version, only the default parameters for kernel width and threshold will be used.

MINDy in geWorkbench uses the original, fixed-bandwidth version of ARACNe. This version of ARACNe uses two configuration files, config_kernel.txt and config_threshold.txt. If these two files are not supplied, default parameters will be used, which should be sufficient for most cases. The parameter files can also be generated using ARACNe2 in geWorkbench. However, the files will be named after the dataset from which they are generated, and must be renamed to config_kernel.txt and config_threshold.txt to be seen by ARACNe. Files with those names, if present in the geWorkbench installation root folder, will override any other dataset-specific configuration files for ARACNe2, and so should not be left on the system after MINDy has been run.

Services (Grid)

MINDy can be run either locally within geWorkbench, or remotely as a grid job on caGrid. See the Grid Services section for further details on setting up a grid job. A Columbia grid login must be obtained to use the Columbia grid service.


Running an example MINDy Analysis

Analysis Framework

For general details on saving and storing parameter settings, and launching the analysis, see the Analysis tutorial page.


Setup

  • For this example, we use a list of four candidate MAPK markers, contained in a CSV format file. Right-click on the following link and save the file Mapk_list.csv to disk.
  • In the Component Configuration Manager, check whether the MINDy component has been loaded, and if not, load it.


Run

The figure illustrates the MINDy main parameter tab set up to run the example below.


Modulator list loaded from file:


MINDy parameters mapk run.png


Modulator list loaded from Marker set:


MINDy parameters mapk run fromset.png


  1. Load the Bcell-100.exp microarray dataset, which is available in the geWorkbench data directory under "public_data". If you wish to see gene names in the results, you must also load the associated annotation file. See e.g. the tutorial Local Data Files for further details.
  2. In the analysis tab (at lower right in the application), select MINDy Analysis.
  3. In the MINDy Parameters Main tab, populate the Modulators List by loading the file Mapk_list.csv. Alternatively, the file can first be loaded into the Markers component with the "Load Set" button, then selected as "From Sets" in the MINDy parameters.
  4. Populate the Target List textbox by selecting the choice "All Markers".
  5. Set the hub gene to be marker (probeset) name 37724_at (MYC). Type in the marker name directly, or search for and select it in the Markers component.
  6. Parameter values for the conditional mutual information calculation can be set in the Advanced Tab. The values will depend on the specifics of the data set being used, in terms of number of arrays and number of markers. Here we use the default parameters:
    1. Sample per Condition: 35%
    2. Conditional: MI 0.1
    3. Unconditional: not used, control disabled.
    4. DPI target list: not used, control disabled.
    5. DPI tolerance: not used, control disabled.
  7. Click Analyze. If successful, the Workspace is updated to add the MINDy result node. The result node is shown as a child node of the input dataset Bcell-100.exp. Please note that the Dataset History tab captures the analysis parameters.

Viewing MINDy Results

General

1. The MINDy result node should be automatically selected in the Workspace once the result is available. If not, select it. This will display the MINDy result viewer.

2. In the Modulator Tab, indicate the modulators of interest using the checkboxes or click on Select All to display all modulators in the Table, List, and Heat Map views. The Modulators Selected is updated to reflect the number of modulators selected. Only selected Modulators are displayed on the Table, List and Heat Map views.


Common Features

Net modulatory effect values

The first step of the MINDy algorithm is to sort the input expression arrays by the expression value for the candidate modulator. It then forms two groups of arrays, those where the candidate modulator is most highly expressed, and those where it is least expressed. Here we will refer to these as the "conditional high set" and the "conditional low set" - that is, they are sets of arrays conditioned on the expression of the candidate modulator.

The following symbols are used to break out the total, positive, and negative effects:

  • M# - For each modulator, the total number of above-threshold transcription-factor-target (TF-Ti) MI scores found.
  • M+ - The number of targets for which the TF-Ti pairs showed higher MI in the conditional high set compared with the MI in the conditional low set.
  • M- - The number of targets for which the TF-Ti pairs showed lower MI in the conditional high set compared with the MI in the conditional low set.
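
As an illustration, the three counts can be derived from a modulator's delta (MI) values as follows. The function name and input layout are hypothetical, and the sketch assumes that M# is the number of targets with a nonzero delta (MI), so that M# equals M+ plus M-; how geWorkbench treats a delta of exactly zero is not stated here.

```python
def modulator_counts(delta_mi_by_target):
    """Summarize one modulator from a {target: delta (MI)} mapping."""
    values = list(delta_mi_by_target.values())
    m_plus = sum(1 for v in values if v > 0)    # MI higher in the conditional high set
    m_minus = sum(1 for v in values if v < 0)   # MI lower in the conditional high set
    return {"M#": m_plus + m_minus, "M+": m_plus, "M-": m_minus}

# Hypothetical delta (MI) values for three targets of one modulator
counts = modulator_counts({"tgt_1": 0.21, "tgt_2": -0.13, "tgt_3": 0.34})
print(counts)
```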


Controls

Marker Display

Controls how the marker name is displayed. Options are:

  • Symbol - If an annotation file has been loaded, use the Gene Symbol associated with each marker.
  • Probe Name - Use the marker probe name as given in the dataset.

Add to Set

(Except Heat Map) - Adds selected markers to a Marker Set. You can select one or more Targets and/or Modulators, using the selection check-boxes.


Export

The results shown in the Modulator, Table, or List tabs can be exported to a CSV format file on disk using the "Export" button. Only the table in the currently displayed tab is exported.

Displayed targets filter

This menu is located at the bottom of the MINDy viewer component and controls the target markers displayed in the various view tabs just described. It contains a list of all marker sets available in the Markers component. Any one set can be chosen, and only MINDy targets which are also in this selected subset will be displayed (the intersection of the MINDy result set and the Marker set).

Note - Marker sets do not need to be activated to be used for result filtering here.

Values

  • All non-zero markers - all markers with delta (MI) > 0 are displayed.
  • Selection - This refers to the default "Selection" set in the Markers component.
  • any other marker set name - all available marker sets will be listed in the menu and any one can be chosen.
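
The filter amounts to a set intersection that preserves the order of the result table. A minimal sketch, with a hypothetical function name and made-up marker IDs:

```python
def filter_displayed_targets(result_targets, marker_set):
    """Keep only the MINDy targets that also appear in the chosen marker set."""
    keep = set(marker_set)
    return [target for target in result_targets if target in keep]

# Hypothetical marker IDs, for illustration only
shown = filter_displayed_targets(["tgt_1", "tgt_2", "tgt_3"], ["tgt_3", "tgt_1", "other"])
print(shown)
```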


Modulator tab

MINDy mapk initial result modulator tab.png

This table-based view contains one row per modulator gene. It summarizes the results, and is used to control the targets displayed in the other view tabs.

Controls

  • List Selections
    • Select All checkbox - When checked, all modulators will be selected. If not checked, the individual markers can be selected using the individual check boxes in the table.
  • Modulators selected - Shows a count of the number of individual modulators that have been selected in the table.


Columns

  • Check-boxes - Use these to select which modulators to include in generating the data views on the other tabs (Table, List, and Heat Map).
  • Modulator - The gene symbol or probe name for the putative modulators tested.
  • Mode - Shows whether the net sum effect of the modulator over all its targets was enhancing or negative.
    • If M+ - M- > 0, the result is "+", that is the candidate had a net positive modulatory effect (increased MI).
    • If M+ - M- < 0, the result is "-", that is the candidate had a net negative modulatory effect (decreased MI).
    • If M+ - M- = 0, the result is "0", that is the candidate modulator had a balanced effect.
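
The three Mode rules can be written as a small function. This is only a sketch of the rule as stated in the list; the function name and the counts in the example are made up.

```python
def modulator_mode(m_plus, m_minus):
    """Return "+", "-" or "0" from the sign of M+ minus M-."""
    difference = m_plus - m_minus
    if difference > 0:
        return "+"
    if difference < 0:
        return "-"
    return "0"


# Made-up counts for three hypothetical modulators.
print(modulator_mode(40, 12))  # prints: +
print(modulator_mode(5, 30))   # prints: -
print(modulator_mode(8, 8))    # prints: 0
```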


Here all four modulators in the example have been selected, activating the other tabs.

MINDy mapk initial result modulator tab select all.png

Table

MINDy mapk Table.png


The column "Target" represents the target genes and the remaining columns represent the modulators tested.

  • Discretization of scores - By default, the MI scores are discretized to +1 and -1 for positive and negative scores, respectively. Discretized scores are used to quantify the number of positive and negative modulation effects, as shown e.g. in the numbers in the column headers. If the "Score View" option is chosen, the actual scores will be shown.
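
The choice between discretized and actual scores can be sketched as follows. The function name and the `score_view` flag are hypothetical, and the sketch assumes that a score of exactly zero is left as 0 in both modes.

```python
def displayed_value(score, score_view=False):
    """Return the value a table cell would show for one score.

    With score_view=False the score is reduced to its sign (+1, -1, or 0);
    with score_view=True the actual score is returned unchanged.
    """
    if score_view:
        return score
    if score > 0:
        return 1
    if score < 0:
        return -1
    return 0


print(displayed_value(0.27))                   # prints: 1
print(displayed_value(-0.03))                  # prints: -1
print(displayed_value(-0.03, score_view=True)) # prints: -0.03
```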


Controls

If many modulators were tested, it may be desirable to sort the display by their results.

  • Display Options:
    • Color View: Enables a heat map display of each cell based on the value of the score. Positive values are displayed in shades of red, while negative values are displayed in shades of blue. The saturation of the color increases (starting from white for 0) with increasing absolute value of the score.
    • Score View: Displays the actual score values rather than the default discretized values.
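
One simple way to obtain a coloring like the one described for Color View is a linear fade from white toward pure red or pure blue. The sketch below is an assumption about how such a mapping could work, not the scale geWorkbench actually uses; `max_abs` (the score at which the color is fully saturated) is a hypothetical parameter.

```python
def score_to_rgb(score, max_abs):
    """Map a score to an (R, G, B) tuple with components in 0..255.

    Zero maps to white; positive scores fade toward red and negative
    scores toward blue as abs(score) approaches max_abs.
    """
    if max_abs <= 0 or score == 0:
        return (255, 255, 255)
    intensity = min(abs(score) / max_abs, 1.0)
    fade = round(255 * (1.0 - intensity))
    if score > 0:
        return (255, fade, fade)
    return (fade, fade, 255)


print(score_to_rgb(0.0, 1.0))   # prints: (255, 255, 255)
print(score_to_rgb(1.0, 1.0))   # prints: (255, 0, 0)
print(score_to_rgb(-1.0, 1.0))  # prints: (0, 0, 255)
```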


Here, both the "Color View" and "Score View" options have been checked.


MINDy mapk Table display options1.png


The modulators in the above figure are by default sorted by the aggregate count of targets for which a modulatory effect was seen (M#).

The table can be sorted on the values in any column by clicking on its header. In addition, the column headers (the modulators) can be sorted as described next:

  • Modulator Sorting: Displays columns (modulators) from left to right in descending order by the counts of: Aggregate (M#), Enhancing (M+) or Negative (M-).
    • Aggregate (M#): The column header displays "M#" and the count of all targets for which a positive or negative modulatory effect was seen.
    • Enhancing (M+): The column header displays "M+" and the count of all targets for which a positive modulatory effect was seen.
    • Negative (M-): The column header displays "M-" and the count of all targets for which a negative modulatory effect was seen.

Example of sorting by "Enhancing": The first modulator column header is "MAP4K4 (M+ 103)".


MINDy mapk Table display options enhancing.png


Example of sorting by "Negative": The first modulator column header is "MAP4K2 (M- 54)".


MINDy mapk Table display options negative.png
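
The three sorting modes can be sketched in Python as a descending sort on a different key for each mode. The sketch assumes M# equals M+ plus M-, which follows from the definition of the aggregate count as all targets with a positive or negative effect. The function name, mode strings, and modulator names are hypothetical.

```python
def sort_modulators(counts, mode="aggregate"):
    """Return modulator names in descending order of the chosen count.

    counts maps a modulator name to a (M+, M-) tuple.
    mode is "aggregate" (M# = M+ + M-), "enhancing" (M+) or "negative" (M-).
    """
    key_functions = {
        "aggregate": lambda c: c[0] + c[1],
        "enhancing": lambda c: c[0],
        "negative": lambda c: c[1],
    }
    key = key_functions[mode]
    return sorted(counts, key=lambda name: key(counts[name]), reverse=True)


# Made-up (M+, M-) counts for three hypothetical modulators.
counts = {"ModA": (10, 2), "ModB": (4, 9), "ModC": (7, 7)}
print(sort_modulators(counts, "aggregate"))  # prints: ['ModC', 'ModB', 'ModA']
print(sort_modulators(counts, "enhancing"))  # prints: ['ModA', 'ModC', 'ModB']
print(sort_modulators(counts, "negative"))   # prints: ['ModB', 'ModC', 'ModA']
```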


  • Modulator Limits: When the checkbox is selected, the number of columns (modulators) is limited to the value set in the selector box.
  • Marker Selection
    • Enable Selection - When checked, a column of checkboxes appears in the table to allow individual selection of targets. Shows a count of all selected modulators and targets.
    • All Modulators - Select or Clear buttons - Select or clear all modulator check boxes (table columns).
    • All Targets - Select or clear buttons - Select or clear all displayed targets (table rows).
    • Add to Set (button) - All selected markers (modulators and/or targets) will be added to a new set in the Markers component.


  • Displayed targets filter - The displayed target markers can be filtered using a marker set defined in the Markers component. After the filter is selected, only those markers contained in the selected set will appear.

List

In the list view, all modulators are listed in the first column, their targets in the second column, while the third column contains the delta (MI) scores. That is, each modulator/target pair is listed individually.

This view has the advantage of displaying only actual data values. This contrasts with the Table view, where the results are displayed in a spreadsheet format. Because each modulator has its own set of targets, not every modulator/target cell in the table will have a value. Results in the Table view are padded with zeros as necessary.


MINDy mapk List tab.png
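
The relationship between the List view and the Table view can be illustrated by pivoting a list of (modulator, target, delta (MI)) rows into a target-by-modulator grid, filling missing pairs with zero. This Python sketch uses hypothetical names and made-up scores and is not the geWorkbench implementation.

```python
def list_to_table(pairs):
    """Pivot (modulator, target, delta_mi) rows into a zero-padded table.

    Returns (header, rows): header is ["Target", modulator names...] and
    each row is [target, score for each modulator]. Pairs that do not
    occur in the input are filled with 0.0.
    """
    modulators, targets, scores = [], [], {}
    for modulator, target, delta_mi in pairs:
        if modulator not in modulators:
            modulators.append(modulator)
        if target not in targets:
            targets.append(target)
        scores[(modulator, target)] = delta_mi
    rows = [
        [target] + [scores.get((m, target), 0.0) for m in modulators]
        for target in targets
    ]
    return ["Target"] + modulators, rows


# Made-up list-view rows: ModB has no result for target T1.
list_rows = [("ModA", "T1", 0.2), ("ModA", "T2", -0.1), ("ModB", "T2", 0.4)]
header, rows = list_to_table(list_rows)
print(header)  # prints: ['Target', 'ModA', 'ModB']
print(rows)    # prints: [['T1', 0.2, 0.0], ['T2', -0.1, 0.4]]
```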


  • Marker Selection (checkbox) - Controls which markers are used by the "Add to Set" button.
    • Enable Selection - When checked, a column of checkboxes appears beside each target and beside each modulator to allow individual selection of each. Shows a count of all selected modulators and targets.
    • Select all Modulators - Selects all modulators.
    • Select all Targets - Selects all targets.
    • Add to Set (button) - All selected markers will be added to a new set in the Markers component.


Heat Map

The Heat Map represents the expression values for individual markers (target genes). It contains two color mosaic panels. The rows correspond to target genes and are ordered according to their Pearson's correlation to the expression of the TF. The columns (arrays) are ordered according to the expression of the TF gene, low (left) to high (right). The mosaic at left corresponds to the arrays where the modulator was least expressed. The mosaic at right corresponds to the arrays where the modulator expression was highest.
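
The row and column ordering described for the Heat Map can be sketched in Python. Several details here are assumptions, not taken from geWorkbench: the `fraction` of arrays placed in each panel, computing the Pearson correlation over all arrays, listing the most positively correlated targets first, and all gene names.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if undefined)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def heat_map_layout(expr, tf, modulator, targets, fraction=0.35):
    """Return (row_order, low_panel_arrays, high_panel_arrays).

    expr maps a gene name to its expression values, one per array.
    fraction is a placeholder for the share of arrays in each panel.
    """
    n = len(expr[tf])
    k = max(1, int(n * fraction))
    # Rank arrays by modulator expression; take both extremes.
    by_modulator = sorted(range(n), key=lambda i: expr[modulator][i])
    low, high = by_modulator[:k], by_modulator[-k:]
    # Within each panel, order arrays by TF expression, low to high.
    order_by_tf = lambda idx: sorted(idx, key=lambda i: expr[tf][i])
    # Order target rows by their correlation with the TF.
    rows = sorted(targets, key=lambda g: pearson(expr[g], expr[tf]), reverse=True)
    return rows, order_by_tf(low), order_by_tf(high)


# Made-up expression values over six arrays.
expr = {
    "TF": [1, 2, 3, 4, 5, 6],
    "MOD": [6, 1, 5, 2, 4, 3],
    "G_up": [1, 2, 3, 4, 5, 7],
    "G_down": [6, 5, 4, 3, 2, 0],
}
layout = heat_map_layout(expr, "TF", "MOD", ["G_down", "G_up"], fraction=0.5)
print(layout)  # prints: (['G_up', 'G_down'], [1, 3, 5], [0, 2, 4])
```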

Controls

  • Transcription Factor: Displays the TF hub gene entered in the MINDy Analysis parameters.
  • Modulators: The heat map is generated for the targets of only one modulator at a time. The list shows the available modulators, and the text box above it shows the selected modulator.

Here the first modulator on the list is selected:


MINDy mapk Heat Map tab.png


As shown below, scrolling to the bottom of the Heat Map image shows how the effect of modulation can differ for different genes. The genes at top are directly correlated with MYC when MAP4K4 is low, whereas the genes at bottom are anti-correlated.


MINDy mapk Heat Map tab MAP4K4 lower.png


  • Displayed targets filter - The targets displayed in the Heat Map view can be limited to those defined in a marker set.
  • Image Snapshot: Captures the Heat Map as an image node in the Workspace.


MINDy Heat Map node.png

References

  1. Margolin, A., Wang, K., Lim, W.K., Kustagi, M., Nemenman, I., and Califano, A. (2006) Reverse Engineering Cellular Networks. Nature Protocols 1(2):662-671.
  2. Wang, K., Saito, M., Bisikirska, B.C., Alvarez, M.J., Lim, W.K., Rajbhandari, P., Shen, Q., Nemenman, I., Basso, K., Margolin, A.A., Klein, U., Dalla-Favera, R., and Califano, A. (2009) Genome-wide identification of post-translational modulators of transcription factor activity in human B cells. Nature Biotechnology 27(9):829-839.