Difference between revisions of "MINDy"
{{TutorialsTopNav}} | {{TutorialsTopNav}} | ||
− | |||
− | + | =MINDy Analysis= | |
− | + | The MINDy algorithm (Modulator Inference by Network Dynamics) uses gene expression data to determine whether a putative modulator gene (Mj) influences the regulatory activity of a transcription factor gene (TF) over a set of target genes (Ti). This influence is measured in terms of whether there is a '''change in the correlation (measured as mutual information)''' of expression between the TF and its targets Ti conditional on a change in the expression of Mj. The change in correlation is calculated as the difference in mutual information ('''delta (MI)''') for each TF-Ti pair between the two conditions (modulator high or low). The mutual information values used in MINDy are calculated using the [[Tutorial_-_ARACNE|ARACNe]] algorithm, which is also a part of geWorkbench. | |
− | |||
− | |||
− | |||
+ | ==Outline of MINDy calculations== | ||
+ | # A microarray gene expression dataset is selected. | ||
+ | # The user specifies a set of one or more candidate modulator genes (Mj), a hub transcription factor (TF), and a set of putative targets of the transcription factor (Ti). | ||
+ | # Parameters for the MINDy run are set. | ||
+ | # Using the expression value of the chosen modulator gene Mj, the arrays in the experiment are ordered (as columns in the data matrix), from lowest to highest. | ||
+ | # Two subsets of arrays are then chosen from each end (tail) of the ordered list. One subset contains arrays in which Mj shows the lowest expression, and the other subset contains arrays in which Mj shows the highest expression. The subsets are non-overlapping. A typical trial might involve assigning the lowest 35% of the arrays to the low group (M-), as measured by expression of Mj, and the highest 35% to the high group (M+). The remaining arrays are not further considered. | ||
+ | # For each target Ti, the conditional mutual information between the hub TF and the target is then calculated for the array subsets M+ and M- separately, and the difference is taken (delta (MI)). | ||
+ | # The resulting delta (MI)s are displayed. At present, a p-value is not calculated on the delta (MI). Larger values of delta (MI) may indicate an interesting change in the mutual information conditional on the expression of the modulator, that is, the modulator has an effect on the '''correlation of expression''' between the hub TF and the target gene. | ||
+ | # The sign of the influence of the modulator is also displayed. A positive modulation effect (+) is where high expression of the modulator gene Mj '''increases''' the mutual information between the hub TF and the target gene. Likewise, a negative modulation effect (-) is where high expression of the modulator gene Mj '''decreases''' the mutual information between the hub TF and the target gene. | |
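The calculation outlined above can be illustrated with a minimal Python sketch. It is not the geWorkbench implementation: the function names <code>mutual_information</code> and <code>delta_mi</code> are hypothetical, and a simple equal-frequency histogram estimate of mutual information is swapped in for the ARACNe estimator.

```python
import math
from collections import Counter


def mutual_information(x, y, bins=3):
    """Histogram estimate of mutual information (in nats).

    Assumption: equal-frequency binning stands in for the ARACNe
    estimator, which this sketch does not reproduce.
    """
    def discretize(values):
        # Rank the values and split the ranks into `bins` groups;
        # ties are broken by position in the input.
        order = sorted(range(len(values)), key=lambda i: values[i])
        labels = [0] * len(values)
        for rank, idx in enumerate(order):
            labels[idx] = rank * bins // len(values)
        return labels

    dx, dy = discretize(x), discretize(y)
    n = len(x)
    joint = Counter(zip(dx, dy))
    px, py = Counter(dx), Counter(dy)
    return sum((c / n) * math.log(c * n / (px[a] * py[b]))
               for (a, b), c in joint.items())


def delta_mi(modulator, tf, target, fraction=0.35):
    """MI(TF, target) in the modulator-high tail minus the modulator-low tail."""
    if not 0 < fraction <= 0.5:
        raise ValueError("fraction must be in (0, 0.5] so the tails do not overlap")
    n = len(modulator)
    k = int(n * fraction)  # arrays per tail (rounding down is an assumption)
    if k < 1:
        raise ValueError("not enough arrays for the requested fraction")
    # Order the arrays by modulator expression, lowest to highest.
    order = sorted(range(n), key=lambda i: modulator[i])
    low, high = order[:k], order[-k:]
    mi_low = mutual_information([tf[i] for i in low], [target[i] for i in low])
    mi_high = mutual_information([tf[i] for i in high], [target[i] for i in high])
    return mi_high - mi_low
```

A positive return value corresponds to a positive modulation effect (+), a negative value to a negative effect (-).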
− | === | + | ==Prerequisites for MINDy calculations== |
+ | # '''Number of arrays''' - A microarray gene expression data set with a sufficient number of arrays must be present. For optimal results, at least 250 to 300 microarrays of a homogeneous cellular system should be used, for example, isolable tumor cells or cell lines, with a range of different expression conditions (distinct cellular phenotypes). (300 arrays have been found to give good results, while 250 has been found to be an absolute minimum.) | |
+ | # '''Modulator expression variation''' - The expression of the modulator (Mj) must have a sufficient expression range to separate its two expression tails compared to the experimental noise level. Low variation markers can be removed by running the deviation filter (Filtering component) on the dataset before starting the MINDy calculation. | ||
+ | # '''Independence of modulator and TF hub''' - Any modulator (Mj) whose expression profile is not statistically independent of that of the hub transcription factor (TF) must be excluded. This can be determined using a mutual information calculation (ARACNe). This functionality is not currently directly implemented within MINDy in geWorkbench, but can be run directly using the ARACNe component. | ||
+ | # '''Note''' - The "Target List" also is used to represent all markers which will be used in the calculations. As such, all hub markers and candidate modulator markers must be included in this list. | ||
− | == | + | ==Parameters - Main== |
− | |||
− | + | [[Image:MINDy_Parameters_Main.png|{{ImageMaxWidth}}]] | |
− | |||
− | + | The Modulators List, Target List, and Hub Marker fields are populated using marker IDs as represented in the Markers component. Note that these are not gene names, but the identifiers of the particular markers (e.g. Affymetrix probesets) from the expression platform used to collect the data. | |
− | |||
− | + | ===Modulators List=== | |
− | The list of candidate modulators can either be loaded from a file as a comma-separated list, or a set of markers can be selected from the Markers component. The gene expression profiles of the modulators should be independent of that of the hub TF gene as measured by mutual information. This could be determined using a preliminary run of ARACNe including just the modulators and the transcription factor. |
− | ''' | + | Modulators List pulldown menu options are: |
+ | * '''From File''' - Load a list of candidate modulators from a file (containing a comma separated list). | ||
+ | * '''From Set''' - Select a set of candidate modulators defined in the Markers component. When '''From Set''' is selected, entries can also be typed directly into the text box. | ||
− | + | '''Note''' - any markers in the modulator list must also appear in the target set (see Target List). | |
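As an illustration of the "From File" format, the following Python sketch reads a comma-separated list of marker IDs. The function name <code>parse_marker_list</code> and the marker IDs in the usage note are hypothetical, and the tolerance for line breaks and repeated entries is an assumption rather than documented geWorkbench behavior.

```python
def parse_marker_list(text):
    """Split comma-separated marker IDs into a clean, ordered list.

    Assumptions: line breaks are treated like commas, surrounding
    whitespace is ignored, and repeated IDs are kept only once.
    """
    markers = []
    for token in text.replace("\n", ",").split(","):
        marker = token.strip()
        if marker and marker not in markers:
            markers.append(marker)
    return markers
```

For example, <code>parse_marker_list("1000_at, 1001_at,\n1000_at,")</code> yields the two IDs <code>1000_at</code> and <code>1001_at</code>.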
− | + | ===Target List=== | |
+ | The target list can include all markers or can be restricted to some subset of candidates e.g. thought to be regulated by the Hub Marker transcription factor. | ||
− | * ''' | + | Target List pulldown menu options are: |
+ | * '''All Markers''' - Run MINDy on all markers in the data set. | ||
+ | * '''From File''' - Load a list of target markers from a file (containing a comma separated list). | ||
+ | * '''From Set''' - Select a target marker set defined in the Markers component. | ||
− | |||
− | * ''' | + | * '''Important - Target list must also include the Hub Marker and all Modulator markers''' |
+ | ** The MINDy main parameters tab requires the selection of Modulators, Targets, and a hub marker. The Target List must also contain the Hub Marker and all the Modulator markers, because a single expression profile dataset is transferred to the algorithm for calculations. | ||
+ | ** If "All Markers" is chosen, then no further attention to this point is required. | ||
+ | * Note - the "'''All Markers'''" ''checkbox'' at the bottom of the Analysis component should '''not''' be used in the MINDy component. | ||
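When a target list is prepared outside geWorkbench, the requirement that it also contain the Hub Marker and all Modulator markers can be checked beforehand. The sketch below is only an illustration; <code>complete_target_list</code> is a hypothetical helper, not part of geWorkbench.

```python
def complete_target_list(targets, hub, modulators):
    """Return the target list with the hub and all modulators appended if missing."""
    result = list(targets)
    for marker in [hub, *modulators]:
        if marker not in result:
            result.append(marker)
    return result
```

The returned list can then be saved as a comma-separated file and loaded with the "From File" option.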
− | |||
− | |||
− | |||
− | * | + | ===Hub Marker=== |
− | * | + | Enter the marker ID for a known or putative transcription factor gene. |
+ | * The Hub marker can be entered directly in the text field, or the most recently selected marker in the Markers component will be used, selected either in the list or in the default Marker set "Selection". | ||
+ | * '''Note''' - Even if one directly types in a marker name, it will be replaced if any selection is made in the Markers component. | ||
+ | * '''Note''' - The hub marker must also appear in the target set (see Target List). | ||
− | |||
− | + | ===Note on non-use of activated marker sets=== | |
+ | Because the MINDy component allows the Modulator, Hub and Target marker sets to be chosen directly in its own interface, it does not respect marker sets that may be activated in the Markers component. | ||
− | + | ==Parameters - Advanced== | |
− | |||
− | + | [[Image:MINDy_Parameters_Advanced_.png|{{ImageMaxWidth}}]] | |
− | |||
− | + | ===Sample per Condition (%)=== | |
+ | MINDy calculates the difference in mutual information for the TF-Target interaction between the set where the modulator gene is most expressed (+) and the set where the modulator gene is least expressed (-). This parameter specifies the percentage of the available samples to include in each group. E.g. 35% means that the top and bottom 35% of a list of samples ranked by expression would be used. | ||
− | |||
− | === | + | ===Conditional (threshold settings)=== |
+ | The underlying ARACNe calculation of the conditional mutual information allows a threshold to be set. The threshold for the conditional calculations can be specified as a raw mutual information score or as a P-value. An above-threshold MI value must be obtained in at least one of the two conditional ARACNe runs in order for the target to be included in the output data. | ||
− | 1. | + | Options: |
+ | * '''Mutual Info '''- If selected, the user specifies a threshold for the mutual information (MI) estimates in terms of the raw MI score. For example, a value of 0.1 filters out target genes with a MI score of less than 0.1 in both the high and low modulator expression sets. By default, a MI threshold of 0.1 is set. | ||
+ | ** Note - if the MI score is above threshold in one condition but not the other, the lower score will be set to zero when calculating delta (MI). | ||
+ | * '''P-value''' - If selected, the user specifies a threshold for the conditional mutual information estimate in terms of a p-value. This is a value between 0 and 1, with 1 indicating no threshold. By default, the value is 0.01. The specified p-value is converted to a MI threshold. | ||
− | + | * '''Correction''' - correct for multiple testing if a p-value is specified. The choices are | |
+ | ** None - no correction of the p-value | ||
+ | ** Bonferroni - apply the Bonferroni correction to the p-value before it is converted to a threshold MI score. | |
+ | * Note on p-value calculation in MINDy in geWorkbench - The p-value calculation for the conditional runs of ARACNe is calculated using an approximation described in Margolin et al., 2006. | ||
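The Mutual Info threshold rule described above can be sketched as follows. This is an illustration only: the function name <code>thresholded_delta_mi</code> is hypothetical, and treating a score exactly equal to the threshold as passing is an assumption.

```python
def thresholded_delta_mi(mi_high, mi_low, threshold=0.1):
    """Apply the MI threshold to one TF-target pair.

    Returns None when both conditional MI scores are below the
    threshold (the target is left out of the output). Otherwise a
    below-threshold score is set to zero before the difference
    (high minus low) is taken.
    """
    if mi_high < threshold and mi_low < threshold:
        return None
    high = mi_high if mi_high >= threshold else 0.0
    low = mi_low if mi_low >= threshold else 0.0
    return high - low
```

With the default threshold of 0.1, scores of 0.3 (high) and 0.05 (low) give a delta (MI) of 0.3, because the lower score is replaced by zero.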
− | + | ===Unconditional (threshold settings) (Not used in MINDy)=== | |
+ | The unconditional MI is intended for use in the calculation of statistical significance of the final delta (MI) score and is not currently used. This control is disabled. | ||
− | |||
− | |||
− | |||
− | + | ===ARACNe parameter files not supported in MINDy in geWorkbench=== | |
+ | ARACNe allows files config_threshold.txt and config_kernel.txt to be read in from disk if present. However, the version of ARACNe used in MINDy does not support this feature. It uses default parameters to derive the threshold and kernel width values. | ||
− | == | + | ==Important notes on the calculation== |
− | |||
− | |||
− | + | ===delta (MI)=== | |
+ | As implemented in geWorkbench, the significance of the delta (MI) values is not calculated. | ||
− | |||
+ | ===Marker and Array Selection=== | ||
+ | * '''Marker Sets''' - All marker selection is done within the MINDy component interface. If the option "From Set" is chosen, one marker set from the Markers component can be selected. MINDy '''does not''' respect activated marker subsets in the Markers component - that is, checking the box next to a marker subset in the Markers component has no effect on the markers used for the MINDy calculation or display. | |
− | + | * '''Array Sets''' - MINDy '''does''' respect array subsets activated in the Arrays component. That is, the arrays used can be limited to particular subsets by activating those subsets in the Arrays component (by checking the boxes next to them). | |
− | + | * '''Important - Target list must also include the Hub Marker and all Modulator markers''' | |
+ | ** The MINDy main parameters tab requires the selection of Modulators, Targets, and a hub marker. The Target List must also contain the Hub Marker and all the Modulator markers, because a single expression profile dataset is transferred to the algorithm for calculations. | ||
+ | ** If "All Markers" is chosen, then no further attention to this point is required. | ||
− | ''' | + | * '''Testing of multiple modulators''' - When testing multiple modulators, consider the false-positive implications of multiple tests, even though no significance value is being calculated. |
− | + | ===ARACNe configuration files=== | |
+ | The following discussion of configuration files applies only to the local version of MINDy, not the grid version. On the grid version, only the default parameters for kernel width and threshold will be used. | ||
− | + | MINDy in geWorkbench uses the original, fixed-bandwidth version of ARACNe. This version of ARACNe uses two configuration files, config_kernel.txt and config_threshold.txt. If these two files are not supplied, default parameters will be used, which should be sufficient for most cases. The parameter files can also be generated using ARACNe2 in geWorkbench. However, the files will be named after the dataset from which they are generated, and must be renamed to config_kernel.txt and config_threshold.txt to be seen by ARACNe. Files with those names, if present in the geWorkbench installation root folder, will override any other dataset-specific configuration files for ARACNe2, and so should not be left on the system after MINDy has been run. | |
− | + | =Services (Grid)= | |
− | + | MINDy can be run either locally within geWorkbench, or remotely as a grid job on caGrid. See the [[Tutorial_-_Grid_Services | Grid Services]] section for further details on setting up a grid job. A Columbia grid login must be obtained to use the Columbia grid service. | |
− | |||
− | + | =Running an example MINDy Analysis= | |
+ | ==Analysis Framework== | ||
− | + | For general details on saving and storing parameter settings, and launching the analysis, see the [[Tutorial_-_Analysis|Analysis]] tutorial page. | |
− | |||
− | + | ==Setup== | |
+ | * For this example, we use a list of four candidate MAPK markers, contained in a CSV format file. Right-click on the following link and save the file [[Media:Mapk_list.csv|Mapk_list.csv]] to disk. | ||
+ | * In the Component Configuration Manager, check whether the MINDy component has been loaded, and if not, load it. | ||
− | |||
− | + | ==Run== | |
− | + | The figure illustrates the MINDy main parameter tab set up to run the example below. | |
− | |||
− | + | Modulator list loaded from file: | |
− | |||
− | + | [[Image:MINDy_parameters_mapk_run.png|{{ImageMaxWidth}}]] | |
− | |||
− | + | Modulator list loaded from Marker set: | |
− | |||
− | + | [[Image:MINDy_parameters_mapk_run_fromset.png|{{ImageMaxWidth}}]] | |
− | Margolin, A., Wang, K., Lim, W.K., Kustagi, M., Nemenman, I., and Califano, A. Reverse Engineering Cellular Networks. Nature Protocols | + | |
+ | |||
+ | # Load the Bcell-100.exp microarray dataset, which is available in the geWorkbench data directory under "public_data". If you wish to see gene names in the results, you must also load the associated annotation file. See e.g. the tutorial [[Local_Data_Files |Local Data Files]] for further details. | ||
+ | # In the analysis tab (at lower right in the application), select''' MINDy Analysis'''. | ||
+ | # In the MINDy Parameters Main tab, populate the '''Modulators List''' by loading the file [[Media:Mapk_list.csv|Mapk_list.csv]]. Alternatively, the file can first be loaded into the Markers component with the "Load Set" button, then selected as "From Set" in the MINDy parameters. | |
+ | # Populate the '''Target List''' textbox by selecting the choice "All Markers". | ||
+ | # Set the hub gene to be marker (probeset) name 37724_at (MYC). Type in the marker name directly, or search for and select it in the Markers component. | ||
+ | # Parameter values for the conditional mutual information calculation can be set in the Advanced Tab. The values will depend on the specifics of the data set being used, in terms of number of arrays and number of markers. Here we use the default parameters: | ||
+ | ## Sample per Condition: 35% | ||
+ | ## Conditional: MI 0.1 | ||
+ | ## Unconditional: not used, control disabled. | ||
+ | ## DPI target list: not used, control disabled. | ||
+ | ## DPI tolerance: not used, control disabled. | ||
+ | #Click '''Analyze'''. If successful, the [[Workspace]] is updated to add the MINDy result node. The result node is shown as a child node of the input dataset Bcell-100.exp. Please note that the Dataset History tab captures the analysis parameters. | ||
+ | |||
+ | =Viewing MINDy Results= | ||
+ | |||
+ | ==General== | ||
+ | |||
+ | 1. The MINDy result node should be automatically selected in the [[Workspace]] once the result is available. If not, select it. This will display the MINDy result viewer. | ||
+ | |||
+ | 2. In the Modulator Tab, indicate the modulators of interest using the checkboxes or click on '''Select All''' to display all modulators in the Table, List, and Heat Map views. The '''Modulators Selected''' is updated to reflect the number of modulators selected. Only selected Modulators are displayed on the Table, List and Heat Map views. | ||
+ | |||
+ | |||
+ | ==Common Features== | ||
+ | |||
+ | ===Net modulatory effect values=== | ||
+ | |||
+ | The first step of the MINDy algorithm is to sort the input expression arrays by the expression value for the candidate modulator. It then forms two groups of arrays, those where the candidate modulator is most highly expressed, and those where it is least expressed. Here we will refer to these as the "conditional high set" and the "conditional low set" - that is, they are sets of arrays conditioned on the expression of the candidate modulator. | ||
+ | |||
+ | The following symbols are used to break out the total, positive, and negative effects: | ||
+ | |||
+ | * '''M#''' - For each modulator, the total number of above-threshold transcription-factor-target (TF-Ti) MI scores found. | |
+ | * '''M+''' - The number of targets for which the TF-Ti pairs showed higher MI in the conditional high set compared with the MI in the conditional low set. | ||
+ | * '''M-''' - The number of targets for which the TF-Ti pairs showed lower MI in the conditional high set compared with the MI in the conditional low set. | ||
+ | |||
+ | |||
+ | ===Controls=== | ||
+ | |||
+ | ====Marker Display==== | ||
+ | Controls how the marker name is displayed. Options are: | ||
+ | * '''Symbol''' - If an annotation file has been loaded, use the Gene Symbol associated with each marker. | ||
+ | * '''Probe Name''' - Use the marker probe name as given in the dataset. | ||
+ | |||
+ | ====Add to Set==== | ||
+ | (Except Heat Map) - Adds selected markers to a Marker Set. You can select one or more Targets and/or Modulators, using the selection check-boxes. | ||
+ | |||
+ | |||
+ | ====Export==== | ||
+ | The results shown in the Modulator, Table, or List tabs can be exported to a CSV format file on disk using the "Export" button. Only the table in the currently displayed tab is exported. | ||
+ | |||
+ | ===Displayed targets filter=== | ||
+ | This menu is located at the bottom of the MINDy viewer component and controls the target markers displayed in the various view tabs just described. It contains a list of all marker sets available in the Markers component. Any one set can be chosen, and only MINDy targets which are also in this selected subset will be displayed (the intersection of the MINDy result set and the Marker set). | |
+ | |||
+ | '''Note''' - Marker sets do not need to be activated to be used for result filtering here. | ||
+ | |||
+ | ====Values==== | ||
+ | * '''All non-zero markers''' - all markers with delta (MI) > 0 are displayed. | ||
+ | * '''Selection''' - This refers to the default "Selection" set in the Markers component. | ||
+ | * any other marker set name - all available marker sets will be listed in the menu and any one can be chosen. | ||
+ | |||
+ | |||
+ | ==Modulator tab== | ||
+ | |||
+ | [[Image:MINDy_mapk_initial_result_modulator_tab.png|{{ImageMaxWidth}}]] | ||
+ | |||
+ | This table-based view contains one row per modulator gene. It summarizes the results, and is used to control the targets displayed in the other view tabs. | ||
+ | |||
+ | ===Controls=== | ||
+ | |||
+ | * '''List Selections''' | ||
+ | ** '''Select All''' checkbox - When checked, all modulators will be selected. If not checked, the individual markers can be selected using the individual check boxes in the table. | ||
+ | * '''Modulators selected''' - Shows a count of the number of individual modulators that have been selected in the table. | ||
+ | |||
+ | |||
+ | ===Columns=== | ||
+ | |||
+ | * M#, M+ and M- have already been described above under [[Tutorial_-_MINDy#Net_modulatory_effect_values |"Net modulatory effect values"]]. | ||
+ | |||
+ | * '''Check-boxes''' - Use these to select which modulators to include in generating the data views on the other tabs (Table, List, and Heat Map). | ||
+ | * '''Modulator''' - The gene symbol or probe name for the putative modulators tested. | ||
+ | |||
+ | * '''Mode''' - Shows whether the net sum effect of the modulator over all its targets was enhancing or negative. | ||
+ | ** If M+ - M- > 0, the result is "+", that is the candidate had a net positive modulatory effect (increased MI). | ||
+ | ** If M+ - M- < 0, the result is "-", that is the candidate had a net negative modulatory effect (decreased MI). | ||
+ | ** If M+ - M- = 0, the result is "0", that is the candidate modulator had a balanced effect. | ||
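The relationship between the M#, M+ and M- counts and the Mode column can be sketched in Python. The function name <code>summarize_modulator</code> is hypothetical, and taking M# as the sum of M+ and M- is an assumption based on the column descriptions.

```python
def summarize_modulator(delta_mis):
    """Summarize the delta (MI) scores of one modulator over its targets."""
    m_plus = sum(1 for d in delta_mis if d > 0)   # higher MI in the high set
    m_minus = sum(1 for d in delta_mis if d < 0)  # lower MI in the high set
    net = m_plus - m_minus
    mode = "+" if net > 0 else "-" if net < 0 else "0"
    # Assumption: M# counts every target with a nonzero delta (MI).
    return {"M#": m_plus + m_minus, "M+": m_plus, "M-": m_minus, "Mode": mode}
```

For example, three positive and one negative delta (MI) scores give M# = 4, M+ = 3, M- = 1 and a Mode of "+".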
+ | |||
+ | |||
+ | Here all four modulators in the example have been selected, activating the other tabs. | ||
+ | |||
+ | [[Image:MINDy_mapk_initial_result_modulator_tab_select_all.png]] | ||
+ | |||
+ | ==Table== | ||
+ | |||
+ | |||
+ | [[Image:MINDy_mapk_Table.png|{{ImageMaxWidth}}]] | ||
+ | |||
+ | |||
+ | The column "Target" represents the target genes and the remaining columns represent the modulators tested. | ||
+ | |||
+ | * '''Discretization of scores''' - By default, the MI scores are discretized to +1 and -1 for positive and negative scores, respectively. Discretized scores are used to quantify the number of positive and negative modulation effects, as shown e.g. in the numbers in the column headers. If the "Score View" option is chosen, the actual scores will be shown. | ||
+ | |||
+ | |||
+ | ===Controls=== | ||
+ | If many modulators were tested, it may be desirable to sort the display by their results. | ||
+ | |||
+ | * '''Display Options:''' | ||
+ | ** '''Color View:''' Enables a heat map display of each cell based on the value of the score. Positive values are displayed in shades of red, while negative values are displayed in shades of blue. The saturation of the color increases (starting from white for 0) with increasing absolute value of the score. | ||
+ | ** '''Score View:''' Displays the actual score values rather than the default discretized values. | ||
+ | |||
+ | |||
+ | Here, both the "Color View" and "Score View" options have been checked. | ||
+ | |||
+ | |||
+ | [[Image:MINDy_mapk_Table_display_options1.png|{{ImageMaxWidth}}]] | ||
+ | |||
+ | |||
+ | The modulators in the above figure are by default sorted by the aggregate count of targets for which a modulatory effect was seen (M#). | ||
+ | |||
+ | The table can be sorted on the values in any column by clicking on its header. In addition, the column headers (the modulators) can be sorted as described next: | ||
+ | |||
+ | * '''Modulator Sorting:''' - Displays columns (modulators) from left to right in descending order by the counts of: Aggregate ( M#), Enhancing (M+) or Negative (M-). | ||
+ | ** '''Aggregate (M#)''': The column header displays "M#" and the count of all targets for which a positive or negative modulatory effect was seen. | ||
+ | ** '''Enhancing (M+)''': The column header displays "M+" and the count of all targets for which a positive modulatory effect was seen. | ||
+ | ** '''Negative (M-)''': The column header displays "M-" and the count of all targets for which a negative modulatory effect was seen. | ||
+ | |||
+ | Example of sorting by "Enhancing": The first modulator column header is "MAP4K4 (M+ 103)". | |
+ | |||
+ | |||
+ | [[Image:MINDy_mapk_Table_display_options_enhancing.png|{{ImageMaxWidth}}]] | ||
+ | |||
+ | |||
+ | Example of sorting by "Negative": The first modulator column header is "MAP4K2 (M- 54)". | |
+ | |||
+ | |||
+ | [[Image:MINDy_mapk_Table_display_options_negative.png|{{ImageMaxWidth}}]] | ||
+ | |||
+ | |||
+ | * '''Modulator Limits:''' When the checkbox is selected, the number of columns (modulators) is limited to the value set in the selector box. | ||
+ | |||
+ | * '''Marker Selection''' | ||
+ | ** '''Enable Selection''' - When checked, a column of checkboxes appears in the table to allow individual selection of targets. Shows a count of all selected modulators and targets. | ||
+ | ** '''All Modulators''' - Select or Clear buttons - Select or clear all modulator check boxes (table columns). | |
+ | ** '''All Targets''' - Select or clear buttons - Select or clear all displayed targets (table rows). | ||
+ | ** '''Add to Set''' (button) - All selected markers (modulators and/or targets) will be added to a new set in the Markers component. | ||
+ | |||
+ | |||
+ | * '''Displayed targets filter''' - The displayed target markers can be filtered using a marker set defined in the Markers component. After the filter is selected, only those markers contained in the selected set will appear. | |
+ | |||
+ | ==List== | ||
+ | |||
+ | In the list view, all modulators are listed in the first column, their targets in the second column, while the third column contains the delta (MI) scores. That is, each modulator/target pair is listed individually. | ||
+ | |||
+ | This view has the advantage of displaying only actual data values. This contrasts with the Table view, where the results are displayed in a spreadsheet format. Because each modulator has its own set of targets, not every modulator/target cell in the table will have a value. Results in the Table view are padded with zeros as necessary. | |
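The relationship between the List and Table views can be sketched in Python. The function name <code>list_to_table</code> and the nested-dictionary layout are hypothetical; the sketch only illustrates the zero padding and the default discretization to +1 and -1.

```python
def list_to_table(rows, discretize=True):
    """Convert (modulator, target, delta_mi) rows into a target-by-modulator table.

    Cells without a List entry are padded with 0. With discretize=True,
    scores are reduced to +1 / -1 as in the default Table view; with
    discretize=False the actual scores are kept (as in "Score View").
    """
    modulators, targets = [], []
    for modulator, target, _ in rows:
        if modulator not in modulators:
            modulators.append(modulator)
        if target not in targets:
            targets.append(target)
    table = {t: {m: 0 for m in modulators} for t in targets}
    for modulator, target, score in rows:
        if discretize:
            score = 1 if score > 0 else -1 if score < 0 else 0
        table[target][modulator] = score
    return table
```

Each outer key corresponds to a row of the Table view (a target) and each inner key to a modulator column.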
+ | |||
+ | |||
+ | [[Image:MINDy_mapk_List_tab.png]] | ||
+ | |||
+ | |||
+ | * '''Marker Selection''' (checkbox) - Controls which markers are used by the "Add to Set" button. | ||
+ | ** '''Enable Selection''' - When checked, a column of checkboxes appears beside each target and beside each marker to allow individual selection of each. Shows a count of all selected modulators and targets. | ||
+ | ** '''Select all Modulators''' - Selects all modulators. | ||
+ | ** '''Select all Targets''' - Selects all targets. | ||
+ | ** '''Add to Set''' (button) - All selected markers will be added to a new set in the Markers component. | ||
+ | |||
+ | |||
+ | ==Heat Map== | ||
+ | The Heat Map represents the expression values for individual markers (target genes). It contains two color mosaic panels. The rows correspond to target genes and are ordered according to their Pearson's correlation to the expression of the TF. The columns (arrays) are ordered according to the expression of the TF gene, low (left) to high (right). The mosaic at left corresponds to the arrays where the modulator was least expressed. The mosaic at right corresponds to the arrays where the modulator expression was highest. | |
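The row and column ordering of one mosaic panel can be sketched in Python. The names <code>pearson</code> and <code>heat_map_order</code> are hypothetical, and two details are assumptions: rows are placed in descending order of correlation, and the correlation is computed over the arrays passed in.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def heat_map_order(tf, targets):
    """Return (row order, column order) for one panel.

    tf      -- TF expression values, one per array in the panel
    targets -- dict mapping target name to its expression values
    """
    # Columns: arrays ordered by TF expression, low (left) to high (right).
    columns = sorted(range(len(tf)), key=lambda i: tf[i])
    # Rows: targets ordered by correlation with the TF (descending is an assumption).
    rows = sorted(targets, key=lambda name: pearson(tf, targets[name]), reverse=True)
    return rows, columns
```

Applying the same function to the modulator-low and modulator-high array subsets would give the ordering for the left and right panels, respectively.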
+ | |||
+ | ===Controls=== | ||
+ | |||
+ | * '''Transcription Factor:''' Displays the TF hub gene entered in the MINDy Analysis parameters. | ||
+ | |||
+ | * '''Modulators: ''' - The heat map is generated for the targets of only one modulator at a time. The list shows the available modulators, and the text box above it shows the selected modulator. | ||
+ | |||
+ | Here the first modulator on the list is selected: | ||
+ | |||
+ | |||
+ | [[Image:MINDy_mapk_Heat_Map_tab.png|{{ImageMaxWidth}}]] | ||
+ | |||
+ | |||
+ | As shown below, scrolling to the bottom of the Heat Map image shows how the effect of modulation can differ for different genes. The genes at top are directly correlated with MYC when MAP4K4 is low, whereas the genes at bottom are anti-correlated. | ||
+ | |||
+ | |||
+ | [[Image:MINDy_mapk_Heat_Map_tab_MAP4K4_lower.png|{{ImageMaxWidth}}]] | ||
+ | |||
+ | |||
+ | * '''Displayed targets filter''' - The targets displayed in the Heat Map view can be limited to those defined in a marker set. | ||
+ | |||
+ | * '''Image Snapshot:''' - Captures the Heat Map as an image node in the [[Workspace]]. | ||
+ | |||
+ | |||
+ | [[Image:MINDy_Heat_Map_node.png]] | ||
+ | |||
+ | =References= | ||
+ | |||
+ | # Margolin, A., Wang, K., Lim, W.K., Kustagi, M., Nemenman, I., and Califano, A. (2006) Reverse Engineering Cellular Networks. Nature Protocols 1(2):662-671. [http://www.ncbi.nlm.nih.gov/pubmed/17406294 link to pub.]. | ||
+ | # Wang K, Saito M, Bisikirska BC, Alvarez MJ, Lim WK, Rajbhandari P, Shen Q, Nemenman I, Basso K, Margolin AA, Klein U, Dalla-Favera R, Califano A. (2009) Genome-wide identification of post-translational modulators of transcription factor activity in human B cells. Nat Biotechnol. 27(9):829-39. [http://www.ncbi.nlm.nih.gov/pubmed/19741643 link to pub.]. |
Latest revision as of 18:52, 22 January 2014
MINDy Analysis
The MINDy algorithm (Modulator Inference by Network Dynamics) uses gene expression data to determine whether a putative modulator gene (Mj) influences the regulatory activity of a transcription factor gene (TF) over a set of target genes (Ti). This influence is measured in terms of whether there is a change in the correlation (measured as mutual information) of expression between the TF and its targets Ti conditional on a change in the expression of Mj. The change in correlation is calculated as the difference in mutual information (delta (MI)) for each TF-Ti pair between the two conditions (modulator high or low). The mutual information values used in MINDy are calculated using the ARACNe algorithm, which is also a part of geWorkbench.
Outline of MINDy calculations
- A microarray gene expression dataset is selected.
- The user specifies a set of one or more candidate modulator genes (Mj), a hub transcription factor (TF), and a set of putative targets of the transcription factor (Ti).
- Parameters for the MINDy run are set.
- Using the expression value of the chosen modulator gene Mj, the arrays in the experiment are ordered (as columns in the data matrix), from lowest to highest.
- Two subsets of arrays are then chosen from each end (tail) of the ordered list. One subset contains arrays in which Mj shows the lowest expression, and the other subset contains arrays in which Mj shows the highest expression. The subsets are non-overlapping. A typical trial might involve assigning the lowest 35% of the arrays to the low group (M-), as measured by expression of Mj, and the highest 35% to the high group (M+). The remaining arrays are not further considered.
- For each target Ti, the conditional mutual information between the hub TF and the target is then calculated for the array subsets M+ and M- separately, and the difference is taken (delta (MI)).
- The resulting delta (MI)s are displayed. At present, a p-value is not calculated on the delta (MI). Larger values of delta (MI) may indicate an interesting change in the mutual information conditional on the expression of the modulator, that is, the modulator has an effect on the correlation of expression between the hub TF and the target gene.
- The sign of the influence of the modulator is also displayed. A positive modulation effect (+) is where high expression of the modulator gene Mj increases the mutual information between the hub TF and the target gene. Likewise, a negative modulation effect (-) is where high expression of the modulator gene Mj decreases the mutual information between the hub TF and the target gene.
Prerequisites for MINDy calculations
- Number of arrays - A microarray gene expression data set with a sufficient number of arrays must be present. For optimal results, at least 250 to 300 microarrays of a homogeneous cellular system should be used, for example, isolable tumor cells or cell lines, with a range of different expression conditions (distinct cellular phenotypes). (300 arrays have been found to give good results, while 250 has been found to be an absolute minimum.)
- Modulator expression variation - The expression of the modulator (Mj) must have a sufficient expression range to separate its two expression tails compared to the experimental noise level. Low variation markers can be removed by running the deviation filter (Filtering component) on the dataset before starting the MINDy calculation.
- Independence of modulator and TF hub - Any modulator (Mj) whose expression profile is not statistically independent of that of the hub transcription factor (TF) must be excluded. This can be determined using a mutual information calculation (ARACNe). This functionality is not currently directly implemented within MINDy in geWorkbench, but can be run directly using the ARACNe component.
- Note - The "Target List" also is used to represent all markers which will be used in the calculations. As such, all hub markers and candidate modulator markers must be included in this list.
Parameters - Main
The Modulators List, Target List, and Hub Marker fields are populated using marker IDs as represented in the Markers component. Note that these are not gene names, but the identifiers of the particular markers (e.g. Affymetrix probesets) from the expression platform used to collect the data.
Modulators List
The list of candidate modulators can either be loaded from a file as a comma separated list, or a set of markers can be selected from the Markers component. The gene expression profiles of the modulators should be independent of that of the hub TF gene as measured by mutual information. This could be determined using a preliminary run of ARACNe including just the modulators and the transcription factor.
Modulators List pulldown menu options are:
- From File - Load a list of candidate modulators from a file (containing a comma separated list).
- From Set - Select a set of candidate modulators defined in the Markers component. When From Set is selected, entries can also be typed directly into the text box.
Note - any markers in the modulator list must also appear in the target set (see Target List).
Target List
The target list can include all markers or can be restricted to some subset of candidates e.g. thought to be regulated by the Hub Marker transcription factor.
Target List pulldown menu options are:
- All Markers - Run MINDy on all markers in the data set.
- From File - Load a list of target markers from a file (containing a comma separated list).
- From Set - Select a target marker set defined in the Markers component.
- Important - Target list must also include the Hub Marker and all Modulator markers
- The MINDy main parameters tab requires the selection of Modulators, Targets, and a hub marker. The Target List must also contain the Hub Marker and all the Modulator markers, because a single expression profile dataset is transferred to the algorithm for calculations.
- If "All Markers" is chosen, then no further attention to this point is required.
- Note - the "All Markers" checkbox at the bottom of the Analysis component should not be used in the MINDy component.
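When the Target List is loaded from a file or a set, the requirement that it also contain the Hub Marker and every Modulator marker can be checked before starting a run. The following is a minimal sketch; the function name and the example marker IDs are hypothetical and are not part of geWorkbench.

```python
def missing_from_targets(targets, hub, modulators):
    """Return the hub/modulator marker IDs that are absent from the target list."""
    target_set = set(targets)
    return [marker for marker in [hub, *modulators] if marker not in target_set]
```

An empty result means the Target List satisfies the requirement; any marker IDs returned would need to be added to the Target List first.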
Hub Marker
Enter the marker ID for a known or putative transcription factor gene.
- The Hub marker can be entered directly in the text field. Otherwise, the marker most recently selected in the Markers component is used, whether it was selected in the marker list or in the default Marker set "Selection".
- Note - Even if one directly types in a marker name, it will be replaced if any selection is made in the Markers component.
- Note - The hub marker must also appear in the target set (see Target List).
Note on non-use of activated marker sets
Because the MINDy component allows the Modulator, Hub and Target marker sets to be chosen directly in its own interface, it does not respect marker sets that may be activated in the Markers component.
Parameters - Advanced
Sample per Condition (%)
MINDy calculates the difference in mutual information for the TF-Target interaction between the set where the modulator gene is most expressed (+) and the set where the modulator gene is least expressed (-). This parameter specifies the percentage of the available samples to include in each group. E.g. 35% means that the top and bottom 35% of a list of samples ranked by expression would be used.
Conditional (threshold settings)
The underlying ARACNe calculation of the conditional mutual information allows a threshold to be set. The threshold for the conditional calculations can be specified as a raw mutual information score or as a P-value. An above-threshold MI value must be obtained in at least one of the two conditional ARACNe runs in order for the target to be included in the output data.
Options:
- Mutual Info - If selected, the user specifies a threshold for the mutual information (MI) estimates in terms of the raw MI score. For example, a value of 0.1 filters out target genes with a MI score of less than 0.1 in both the high and low modulator expression sets. By default, a MI threshold of 0.1 is set.
- Note - if the MI score is above threshold in one condition but not the other, the lower score will be set to zero when calculating delta (MI).
- P-value - If selected, the user specifies a threshold for the conditional mutual information estimate in terms of a p-value. This is a value between 0 and 1, with 1 indicating no threshold. By default, the value is 0.01. The specified p-value is converted to a MI threshold.
- Correction - correct for multiple testing if a p-value is specified. The choices are
- None - no correction of the p-value
- Bonferroni - apply the Bonferroni correction to the p-value before it is converted to a threshold MI score.
- Note on p-value calculation in MINDy in geWorkbench - The p-value for the conditional runs of ARACNe is calculated using an approximation described in Margolin et al., 2006.
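The threshold rules described above can be summarized in a short sketch. The function names are hypothetical. The Bonferroni helper shows the standard form of the correction (dividing the p-value by the number of tests); which test count geWorkbench uses is not stated here, so n_tests is an assumption. The conversion of a p-value to an MI threshold (Margolin et al., 2006) is not reproduced.

```python
def thresholded_delta(mi_high, mi_low, threshold=0.1):
    """Return delta (MI), or None when the target is filtered out."""
    if mi_high < threshold and mi_low < threshold:
        return None  # below threshold in both conditions: target excluded
    # A score below threshold in only one condition is set to zero.
    high = mi_high if mi_high >= threshold else 0.0
    low = mi_low if mi_low >= threshold else 0.0
    return high - low

def bonferroni(p_value, n_tests):
    """Standard Bonferroni correction: divide the p-value by the number of tests."""
    return p_value / n_tests
```

For example, with the default MI threshold of 0.1, scores of 0.3 (high) and 0.05 (low) give a delta (MI) of 0.3, because the below-threshold low score is treated as zero.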
Unconditional (threshold settings) (Not used in MINDy)
The unconditional MI is intended for use in the calculation of statistical significance of the final delta (MI) score and is not currently used. This control is disabled.
ARACNe parameter files not supported in MINDy in geWorkbench
ARACNe allows files config_threshold.txt and config_kernel.txt to be read in from disk if present. However, the version of ARACNe used in MINDy does not support this feature. It uses default parameters to derive the threshold and kernel width values.
Important notes on the calculation
delta (MI)
As implemented in geWorkbench, the significance of the delta (MI) values is not calculated.
Marker and Array Selection
- Marker Sets - All marker selection is done within the MINDy component interface. If the option "From Set" is chosen, one marker set from the Markers component can be selected. MINDy does not respect activated marker subsets in the Markers component - that is, checking the box next to a marker subset in the Markers component has no effect on the markers used for the MINDy calculation or display.
- Array Sets - MINDy does respect array subsets activated in the Arrays component. That is, the arrays used can be limited to particular subsets by activating those subsets in the Arrays component (by checking the boxes next to them).
- Important - Target list must also include the Hub Marker and all Modulator markers
- The MINDy main parameters tab requires the selection of Modulators, Targets, and a hub marker. The Target List must also contain the Hub Marker and all the Modulator markers, because a single expression profile dataset is transferred to the algorithm for calculations.
- If "All Markers" is chosen, then no further attention to this point is required.
- Testing of multiple modulators - When testing multiple modulators, consider the false-positive implications of multiple tests, even though no significance value is being calculated.
ARACNe configuration files
The following discussion of configuration files applies only to the local version of MINDy, not the grid version. On the grid version, only the default parameters for kernel width and threshold will be used.
MINDy in geWorkbench uses the original, fixed-bandwidth version of ARACNe. This version of ARACNe uses two configuration files, config_kernel.txt and config_threshold.txt. If these two files are not supplied, default parameters will be used, which should be sufficient for most cases. The parameter files can also be generated using ARACNe2 in geWorkbench. However, the files will be named after the dataset from which they are generated, and must be renamed to config_kernel.txt and config_threshold.txt to be seen by ARACNe. Files with those names, if present in the geWorkbench installation root folder, will override any other dataset-specific configuration files for ARACNe2, and so should not be left on the system after MINDy has been run.
Services (Grid)
MINDy can be run either locally within geWorkbench, or remotely as a grid job on caGrid. See the Grid Services section for further details on setting up a grid job. A Columbia grid login must be obtained to use the Columbia grid service.
Running an example MINDy Analysis
Analysis Framework
For general details on saving and storing parameter settings, and launching the analysis, see the Analysis tutorial page.
Setup
- For this example, we use a list of four candidate MAPK markers, contained in a CSV format file. Right-click on the following link and save the file Mapk_list.csv to disk.
- In the Component Configuration Manager, check whether the MINDy component has been loaded, and if not, load it.
Run
The figure illustrates the MINDy main parameter tab set up to run the example below.
Modulator list loaded from file:
Modulator list loaded from Marker set:
- Load the Bcell-100.exp microarray dataset, which is available in the geWorkbench data directory under "public_data". If you wish to see gene names in the results, you must also load the associated annotation file. See e.g. the tutorial Local Data Files for further details.
- In the analysis tab (at lower right in the application), select MINDy Analysis.
- In the MINDy Parameters Main tab, populate the Modulators List by loading the file Mapk_list.csv. Alternatively, the file can first be loaded into the Markers component with the "Load Set" button, then selected as "From Set" in the MINDy parameters.
- Populate the Target List textbox by selecting the choice "All Markers".
- Set the hub gene to be marker (probeset) name 37724_at (MYC). Type in the marker name directly, or search for and select it in the Markers component.
- Parameter values for the conditional mutual information calculation can be set in the Advanced Tab. The values will depend on the specifics of the data set being used, in terms of number of arrays and number of markers. Here we use the default parameters:
- Sample per Condition: 35%
- Conditional: MI 0.1
- Unconditional: not used, control disabled.
- DPI target list: not used, control disabled.
- DPI tolerance: not used, control disabled.
- Click Analyze. If successful, the Workspace is updated to add the MINDy result node. The result node is shown as a child node of the input dataset Bcell-100.exp. Please note that the Dataset History tab captures the analysis parameters.
Viewing MINDy Results
General
1. The MINDy result node should be automatically selected in the Workspace once the result is available. If not, select it. This will display the MINDy result viewer.
2. In the Modulator Tab, indicate the modulators of interest using the checkboxes or click on Select All to display all modulators in the Table, List, and Heat Map views. The Modulators Selected is updated to reflect the number of modulators selected. Only selected Modulators are displayed on the Table, List and Heat Map views.
Common Features
Net modulatory effect values
The first step of the MINDy algorithm is to sort the input expression arrays by the expression value for the candidate modulator. It then forms two groups of arrays, those where the candidate modulator is most highly expressed, and those where it is least expressed. Here we will refer to these as the "conditional high set" and the "conditional low set" - that is, they are sets of arrays conditioned on the expression of the candidate modulator.
The following symbols are used to break out the total, positive, and negative effects:
- M# - For each modulator, the total number of above-threshold transcription-factor-target (TF-Ti) MI scores found.
- M+ - The number of targets for which the TF-Ti pairs showed higher MI in the conditional high set compared with the MI in the conditional low set.
- M- - The number of targets for which the TF-Ti pairs showed lower MI in the conditional high set compared with the MI in the conditional low set.
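As an illustration, the three counts can be derived from the delta (MI) values of one modulator as follows. The function name is hypothetical, and the sketch assumes that M# equals M+ plus M-, that is, that every reported target has a non-zero delta (MI).

```python
def modulator_counts(delta_mis):
    """Count targets with positive and negative delta (MI) for one modulator."""
    plus = sum(1 for d in delta_mis if d > 0)   # higher MI in the conditional high set
    minus = sum(1 for d in delta_mis if d < 0)  # lower MI in the conditional high set
    return {"M#": plus + minus, "M+": plus, "M-": minus}
```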
Controls
Marker Display
Controls how the marker name is displayed. Options are:
- Symbol - If an annotation file has been loaded, use the Gene Symbol associated with each marker.
- Probe Name - Use the marker probe name as given in the dataset.
Add to Set
(Except Heat Map) - Adds selected markers to a Marker Set. You can select one or more Targets and/or Modulators, using the selection check-boxes.
Export
The results shown in the Modulator, Table, or List tabs can be exported to a CSV format file on disk using the "Export" button. Only the table in the currently displayed tab is exported.
Displayed targets filter
This menu is located at the bottom of the MINDy viewer component and controls the target markers displayed in the various view tabs just described. It contains a list of all markers sets available in the Markers component. Any one set can be chosen, and only MINDy targets which are also in this selected subset will be displayed (the intersection of the MINDy result set and the Marker set).
Note - Marker sets do not need to be activated to be used for result filtering here.
Values
- All non-zero markers - all markers with a non-zero delta (MI) are displayed.
- Selection - This refers to the default "Selection" set in the Markers component.
- any other marker set name - all available marker sets will be listed in the menu and any one can be chosen.
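The filter behaves like a set intersection that keeps the order of the MINDy result. A minimal sketch, with a hypothetical function name:

```python
def filter_targets(result_targets, marker_set):
    """Keep only the MINDy result targets that are also in the chosen marker set."""
    allowed = set(marker_set)
    return [target for target in result_targets if target in allowed]
```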
Modulator tab
This table-based view contains one row per modulator gene. It summarizes the results, and is used to control the targets displayed in the other view tabs.
Controls
- List Selections
- Select All checkbox - When checked, all modulators will be selected. If not checked, the individual markers can be selected using the individual check boxes in the table.
- Modulators selected - Shows a count of the number of individual modulators that have been selected in the table.
Columns
- M#, M+ and M- have already been described above under "Net modulatory effect values".
- Check-boxes - Use these to select which modulators to include in generating the data views on the other tabs (Table, List, and Heat Map).
- Modulator - The gene symbol or probe name for the putative modulators tested.
- Mode - Shows whether the net sum effect of the modulator over all its targets was enhancing or negative.
- If M+ - M- > 0, the result is "+", that is the candidate had a net positive modulatory effect (increased MI).
- If M+ - M- < 0, the result is "-", that is the candidate had a net negative modulatory effect (decreased MI).
- If M+ - M- = 0, the result is "0", that is the candidate modulator had a balanced effect.
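The Mode rule can be written as a small function. The name is hypothetical; the logic follows the three cases listed above.

```python
def modulator_mode(m_plus, m_minus):
    """Return "+", "-" or "0" from the counts of enhanced and negative targets."""
    diff = m_plus - m_minus
    if diff > 0:
        return "+"
    if diff < 0:
        return "-"
    return "0"
```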
Here all four modulators in the example have been selected, activating the other tabs.
Table
The column "Target" represents the target genes and the remaining columns represent the modulators tested.
- Discretization of scores - By default, the MI scores are discretized to +1 and -1 for positive and negative scores, respectively. Discretized scores are used to quantify the number of positive and negative modulation effects, as shown e.g. in the numbers in the column headers. If the "Score View" option is chosen, the actual scores will be shown.
Controls
If many modulators were tested, it may be desirable to sort the display by their results.
- Display Options:
- Color View: Enables a heat map display of each cell based on the value of the score. Positive values are displayed in shades of red, while negative values are displayed in shades of blue. The saturation of the color increases (starting from white for 0) with increasing absolute value of the score.
- Score View: Displays the actual score values rather than the default discretized values.
Here, both the "Color View" and "Score View" options have been checked.
The modulators in the above figure are by default sorted by the aggregate count of targets for which a modulatory effect was seen (M#).
The table can be sorted on the values in any column by clicking on its header. In addition, the column headers (the modulators) can be sorted as described next:
- Modulator Sorting - Displays columns (modulators) from left to right in descending order by the counts of: Aggregate (M#), Enhancing (M+) or Negative (M-).
- Aggregate (M#): The column header displays "M#" and the count of all targets for which a positive or negative modulatory effect was seen.
- Enhancing (M+): The column header displays "M+" and the count of all targets for which a positive modulatory effect was seen.
- Negative (M-): The column header displays "M-" and the count of all targets for which a negative modulatory effect was seen.
Example of sorting by "Enhancing": The first modulator column header is "MAP4K4 (M+ 103)".
Example of sorting by "Negative": The first modulator column header is "MAP4K2 (M- 54)".
- Modulator Limits: When the checkbox is selected, the number of columns (modulators) is limited to the value set in the selector box.
- Marker Selection
- Enable Selection - When checked, a column of checkboxes appears in the table to allow individual selection of targets. Shows a count of all selected modulators and targets.
- All Modulators - Select or Clear buttons - Select or clear all modulator check boxes (table columns).
- All Targets - Select or clear buttons - Select or clear all displayed targets (table rows).
- Add to Set (button) - All selected markers (modulators and/or targets) will be added to a new set in the Markers component.
- Displayed targets filter - The displayed target markers can be filtered using a marker set defined in the Markers component. After the filter is selected, only those markers contained in the selected set will appear.
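The Modulator Sorting and Modulator Limits options described in the list above amount to a descending sort on one of the three counts, optionally followed by truncation. A sketch, in which the function name, the dictionary layout and the modulator names in the usage note are hypothetical:

```python
def sort_modulators(counts, by="M#", limit=None):
    """Order modulator names by the chosen count, highest first.

    counts maps a modulator name to a dict with keys "M#", "M+" and "M-".
    limit, if given, keeps only the first `limit` modulators (Modulator Limits).
    """
    ordered = sorted(counts, key=lambda name: counts[name][by], reverse=True)
    return ordered if limit is None else ordered[:limit]
```

For instance, calling sort_modulators(counts, by="M+") on counts for two modulators modA and modB would list first whichever has the larger M+ count.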
List
In the list view, all modulators are listed in the first column, their targets in the second column, while the third column contains the delta (MI) scores. That is, each modulator/target pair is listed individually.
This view has the advantage of displaying only actual data values. This contrasts with the Table view, where the results are displayed in a spreadsheet format. Because each modulator will have its own set of targets, not every modulator/target cell in the table will have a value. Results in the Table view are padded with zeros as necessary.
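The relationship between the two views can be illustrated by converting List-style rows into a zero-padded table. This is a sketch with a hypothetical function name and data layout; it is not the geWorkbench implementation.

```python
def list_to_table(rows):
    """Convert (modulator, target, score) rows into {target: {modulator: score}}.

    Modulator/target pairs with no result are padded with 0, as in the Table view.
    """
    modulators = sorted({modulator for modulator, _, _ in rows})
    targets = sorted({target for _, target, _ in rows})
    table = {t: {m: 0 for m in modulators} for t in targets}
    for modulator, target, score in rows:
        table[target][modulator] = score
    return table
```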
- Marker Selection (checkbox) - Controls which markers are used by the "Add to Set" button.
- Enable Selection - When checked, a column of checkboxes appears beside each modulator and beside each target to allow individual selection of each. Shows a count of all selected modulators and targets.
- Select all Modulators - Selects all modulators.
- Select all Targets - Selects all targets.
- Add to Set (button) - All selected markers will be added to a new set in the Markers component.
Heat Map
The Heat Map represents the expression values for individual markers (target genes). It contains two color mosaic panels. The rows correspond to target genes and are ordered according to their Pearson's correlation to the expression of the TF. The columns (arrays) are ordered according to the expression of the TF gene, low (left) to high (right). The mosaic at left corresponds to the arrays where the modulator was least expressed. The mosaic at right corresponds to the arrays where the modulator expression was highest.
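The ordering of rows and columns within one panel can be sketched as follows. The function names are hypothetical. The sketch assumes rows are sorted from most positively to most negatively correlated and that the correlation is computed over the arrays passed in; the exact arrays geWorkbench uses for the correlation are not stated here. Expression profiles are assumed not to be constant, since the correlation is undefined otherwise.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def heatmap_order(tf, targets):
    """Return (column order, row order) for one heat map panel.

    tf is the TF expression per array; targets maps a target name to its
    expression per array.
    """
    # Columns: arrays ordered by TF expression, low (left) to high (right).
    columns = sorted(range(len(tf)), key=lambda i: tf[i])
    # Rows: targets ordered by correlation with the TF, highest first (assumed).
    rows = sorted(targets, key=lambda name: pearson(tf, targets[name]), reverse=True)
    return columns, rows
```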
Controls
- Transcription Factor: Displays the TF hub gene entered in the MINDy Analysis parameters.
- Modulators: - The heat map is generated for the targets of only one modulator at a time. The list shows the available modulators, and the text box above it shows the selected modulator.
Here the first modulator on the list is selected:
As shown below, scrolling to the bottom of the Heat Map image shows how the effect of modulation can differ for different genes. The genes at top are directly correlated with MYC when MAP4K4 is low, whereas the genes at bottom are anti-correlated.
- Displayed targets filter - The targets displayed in the Heat Map view can be limited to those defined in a marker set.
- Image Snapshot: - Captures the Heat Map as an image node in the Workspace.
References
- Margolin, A., Wang, K., Lim, W.K., Kustagi, M., Nemenman, I., and Califano, A. (2006) Reverse Engineering Cellular Networks. Nature Protocols 1(2):662-671.
- Wang K, Saito M, Bisikirska BC, Alvarez MJ, Lim WK, Rajbhandari P, Shen Q, Nemenman I, Basso K, Margolin AA, Klein U, Dalla-Favera R, Califano A. (2009) Genome-wide identification of post-translational modulators of transcription factor activity in human B cells. Nat Biotechnol. 27(9):829-39.