Difference between revisions of "Master Regulator Analysis"

(Prerequisites)
(References)
 
(177 intermediate revisions by 2 users not shown)
Line 1: Line 1:
 
{{TutorialsTopNav}}
 
{{TutorialsTopNav}}
 +
  
 
=Overview=
 
=Overview=
Regulatory activity in the context of specific cellular phenotypes can be modeled using interaction networks. These are graphs where nodes represent genes and an edge between two nodes A and B means that genes ''A'' and ''B'' are participants in the same regulatory activity. E.g., ''A'' can be a transcription factor for ''B''; or, ''A'' can be an miRNA that silences ''B''. Analysis of such regulatory networks [[#Basso2005 | [Basso et al., 2005]]] has convincingly demonstrated their scale-free nature which is dominated by a relatively small number of nodes with a large degree of connectivity. The genes corresponding to those nodes are known as "master regulators" and collectively orchestrate the regulatory program of the underlying cellular phenotype(s).  
+
Regulatory activity in the context of specific cellular phenotypes can be investigated using interaction networks. These are graphs where nodes represent genes and an edge between two nodes A and B means that genes ''A'' and ''B'' are participants in the same regulatory activity. E.g., ''A'' can be a transcription factor for ''B''; or, ''A'' can be an miRNA that silences ''B''. Analysis of such regulatory networks [[#Basso2005 | [Basso et al., 2005]]] has convincingly demonstrated their scale-free nature which is dominated by a relatively small number of nodes with a large degree of connectivity. The genes corresponding to those nodes are known as "master regulators" and collectively orchestrate the regulatory program of the underlying cellular phenotype(s).  
  
Master Regulator analysis [[#Lefebvre2010 | [Lefebvre et al., 2010]]] is an algorithm used to identify transcription factors whose targets (e.g., as represented in an ARACNe-generated interactome) are enriched for a particular gene signature.  The enrichment is evaluated using a statistical test such as Fisher’s exact test.
+
Master Regulator analysis [[#Lefebvre2010 | [Lefebvre et al., 2010]]] is an algorithm used to identify transcription factors whose targets (e.g., as represented in an ARACNe-generated interactome) are enriched for a particular gene signature (e.g. a list of differentially expressed genes).  The enrichment is evaluated using a statistical test such as Fisher’s exact test or GSEA.  The objective is to place the signature genes within a regulatory context and identify the master regulators responsible for coordinating their activity, thus highlighting the regulatory apparatus driving phenotypic differentiation.  
  
The master regulator analysis (MRA) component in geWorkbench combines regulatory information from interaction networks with differential expression analysis. The objective is to place differentially expressed genes within a regulatory context and identify the master regulators responsible for coordinating their regulation, thus highlighting the regulatory apparatus driving phenotypic differentiation. Specifically, given an interaction network ''I'', a (presumed) master regulator gene ''A'', and two sets of microarrays representing two distinct phenotypes, MRA computes the intersection between two sets of genes:
+
Specifically, given an interaction network ''I'', a (presumed) master regulator gene ''A'', and a set of signature genes, MRA computes the enrichment of the signature genes in the regulon of ''A'', where the regulon of ''A'' is defined as its neighbors in the interaction network ''I''.
# The neighbors of ''A'' in the interaction network ''I'' (this gene set is called the '''''regulon''''' of ''A'').
 
# The set of differentially expressed genes in the array data from the two phenotypes of interest.
 
  
Fisher’s exact test is then used to quantify how likely it is to encounter an intersection of the observed size by chance alone. A small p-value is taken to imply that gene ''A'' may play a significant role in mediating the regulatory program that leads to the differential phenotypes.
+
Interaction networks are represented as "adjacency matrices".  An adjacency matrix lists the connections that each node takes part in, and includes a measure of the strength of that interaction (e.g. the mutual information in the case of matrices generated by ARACNe).
  
Interaction networks are represented as "adjacency matrices". An adjacency matrix lists the connections that each node takes part in, and includes a measure of the strength of that interaction (e.g. the mutual information in the case of matrices generated by ARACNe).
+
There are two master regulator analysis components implementing different methods to evaluate the enrichment of the signature in the regulon. Either method will quantify how likely it is to encounter an enrichment of (at least) the observed size by chance alone. A small p-value is taken to imply that gene ''A'' may play a significant role in mediating the regulatory program that leads to the differential phenotypes.
  
=Setting up an MRA run=
+
* '''[[MRA-FET|FET Method (local service)]]''' - this method uses Fisher's Exact Test.  This method is implemented locally in geWorkbench.
 +
* '''[[MARINa|MARINa Method *(grid service)]]''' - this method uses GSEA and differs in substantial ways from the FET-based method.  This method is only implemented as a grid service and currently has restricted availability due to its computational cost.  A t-test between two phenotype classes is built into the implementation to produce the gene signature.
  
==Prerequisites==
+
The MARINa method can use sample shuffling to correct for non-independence between the expression of various genes.  Sample shuffling is not implemented for the MRA-FET method, and hence in that method the p-values are not directly comparable between genes.
* First confirm that the Master Regulator Analysis (MRA) component is loaded in the [[Tutorial_-_Component_Configuration_Manager | Component Configuration Manager]].
 
* The MRA component will be listed along with the other analysis routines within the geWorkbench Analysis Panel.
 
  
[[Image:MRA_Parameters_panel.png]]
+
Please note that MARINa does not employ any of the special gene lists available for use with the GSEA algorithm, such as [http://www.broadinstitute.org/gsea/msigdb/index.jsp. MSigDB].  It uses only a calculated list of differentially expressed genes and the regulon of the TF being tested.
  
* A microarray dataset from a gene expression experiment (comprising measurements from multiple arrays/samples).  A t-test will be run to generate the graphic result display.
+
With the release of geWorkbench 2.5.0, MRA-FET and MRA-MARINa are located in two separate sets of components, which can be loaded in the [[Component_Configuration_Manager| CCM]].
* A list of signature genes derived from the same dataset. This list may come from a t-test, clustering, or a combination of methods.  The user must define this set using methods relevant to the particular dataset and study goals.
 
* A list of candidate master regulator genes.  This may be comprised of e.g. transcription factors and signalling pathway genes.
 
  
* Two array sets (each containing a distinct subset of arrays from the expression experiment) have been selected in the "Arrays/Phenotypes" component, representing the two phenotypes under investigation. One of the array sets (identified by the red-color pin) has been classified as containing the "Case" arrays (i.e., one of the 2 phenotypes) while the other one contains the "Controls" (the second phenotype):
+
=Setting up an MRA run=
  
[[Image:Array_set_class_assignment_MRA.png]]
+
==Prerequisites==
 +
* Either or both Master Regulator Analysis (MRA) components, MRA-FET and MRA-MARINa, must be loaded in the [[Component_Configuration_Manager | Component Configuration Manager]].
  
<u>'''NOTE'''</u>: The "Case"/"Control" assignment is needed because, in general, it is possible to select more than one array set for this analysis. In that case it is necessary to explicitly designate which sets belong to each of the 2 phenotypes.
+
[[Image:MRA-MARINa-CCM.png]]
  
==Parameters and Settings==
 
  
===Load Network===
+
===Gene Expression dataset===
There are 2 ways to designate the interaction network, represented by an adjacency matrix, that will be used for computing the regulons of the candidate master regulator genes:
+
A gene expression dataset in which the phenotypic signature was identified or can be demonstrated.  A t-test of differential expression will be run to generate the graphic "bar code" display of the effect of the master regulator on its regulon  or to generate the signature gene list (MARINa method).
* '''From File''': by choosing a file that describes a network.
 
* '''From Project''': by selecting an adjacency matrix node from the Project Folders component. Several analytical components in geWorkbench (e.g., [[Tutorial_-_ARACNE | ARACNE]]) produce adjacency matrix results nodes that can be utilized for this purpose.
 
  
===Master Regulators===
+
===Interaction Network===
There are 2 ways to designate the candidate master regulator genes to use:
+
An interaction network in the form of an adjacency matrix (see [[File_Formats|File Formats]]).  Networks can be loaded from a file, or calculated with ARACNe from a dataset which includes the particular cellular phenotypes being investigated. If calculating the network with ARACNe, all genes to be tested as possible master regulators should be used as hubs.
* '''From File''': by choosing a file that contains a (comma separated) list of marker names.
 
* '''From Sets''': by selecting a marker set from within the “Markers” component.
 
  
===T-test p-value (alpha)===
+
If the incorrect network format is chosen, the user is warned and the analysis setup is terminated.
Differential expression between the 2 phenotypes of interest is assessed using a t-test. The p-value provided by the user indicates the significance threshold below which a gene’s average expression is presumed to be significantly different in the 2 sets of arrays (cases and controls). Additional parameter settings affecting the execution of the t-test are defined within the “T-test” subtab. The parameters specified there are exactly the same as those used for the differential expression t-test analysis (see description in the corresponding [[Tutorial_-_Differential_Expression | tutorial]]).
 
  
=Working with and viewing the analysis results=
+
If the network is loaded into MARINa as gene symbols or Entrez IDs, it will be transformed (expanded) to include all probesets annotated to each such gene if an annotation file has been loaded for the expression dataset.
Following the successful completion of the MRA computation, a result node appears in the Project Folder area of the geWorkbench interface, under the microarray experiment node used for the computation:
 
  
[[Image:MRA_results_node.png]]
+
===Signature genes (FET method)===
 +
A list of signature gene markers which distinguish between two phenotypes.  This list may come from a t-test, clustering, or some combination of methods.  The user must define this set using methods relevant to the particular dataset and study goals.
  
The results of the analysis can be visualized in the MRA Viewer component by selecting the result node.
+
===Candidate master regulator list (FET method)===
 +
A set of gene markers that will be tested as candidate master regulators.  This set may be comprised of e.g. transcription factor and signalling pathway genes.
  
==MRA Results Viewer==
+
===Note on Marker Sets===
The MRA viewer is structured in 3 distinct areas.
+
geWorkbench provides a mechanism to restrict some analyses to using certain sets of markers by "activating" these sets in the Markers component.  However, as the MRA analysis component uses named marker sets directly, it does not respect the activation state of marker sets in the Markers component, and such activated sets will have no effect on the analysis.
  
FOSL2
+
However, activating microarray sets would restrict the markers used in generating the "bar graph" by the MRA viewer.
  
[[Image:MRA_viewer_GBM_FOSL2.png]]
+
For this reason, no marker sets should be "activated" (their check-box checked) during MRA analysis.
  
ZNF238
+
==Parameters and Settings==
 +
===Main===
 +
The settings on this tab apply to both the FET and MARINa methods.
  
[[Image:MRA_viewer_GBM_ZNF238.png]]
 
  
 +
====Load Network====
 +
There are 2 ways to designate the interaction network, represented by an adjacency matrix, that will be used for computing the regulons of the candidate master regulator genes:
 +
* '''From File''': by choosing a file that describes a network. 
 +
* '''From Workspace''': by selecting an adjacency matrix node from the [[Workspace|Workspace]] component.
  
===Summary Listing===
+
=====Load Network from File=====
This is a table with one row for each candidate master regulator.  
+
* The file loading controls will become active when this option is chosen.
 +
* Press the "Load" button to bring up the file browser.
 +
* After selecting a file, a second dialog will ask for details about the format and symbols used.
  
[[Image:MRA_Summary_listing.png]]
+
[[Image:MRA_Load_Network_Dialog.png]]
  
Each row contains 4 columns:
+
* '''File Format''':
# '''Master Regulator''': This is either the master regulator gene name or the marker/probeset name identifying the corresponding array feature (depending on the selection of the radio buttons “Symbol” and “Probe set”).
+
** ADJ
# '''P-value''': the p-value from Fisher’s exact test. The test utilizes a 2x2 contingency table where rows classify markers as differentially expressed versus non-differentially expressed, while columns indicate if a marker belongs to the regulon of the master regulator or not. Counts are computed using all markers found in the input experiment data. (The p-value from Fisher's exact test also includes the probabilities of more-extreme tables.)
+
** SIF
# '''Genes in regulon''': The number of genes in the regulon of the master regulator (this comprises the set of genes that are first neighbors of the master regulator in the interaction network specified in the parameters panel).
+
** MARINa 5-column format (internal use only)
# '''Genes in target list''': The number of differentially expressed genes that are also members of the regulon.
 
  
The contents of the table can be ordered by any column, by clicking on the column name.
+
* '''Nodes Represented by''':
 +
** probeset id
 +
** gene symbol
 +
** entrez id
 +
** other
  
===Detailed Listing===
+
If the network is loaded into MARINa as gene symbols or Entrez IDs, it will be transformed (expanded) to include all probesets annotated to each such gene if an annotation file has been loaded for the expression dataset.
By clicking on the radio button associated with a master regulator in the Summary table, it is possible to display the complete listing of the differentially expressed genes intersecting the regulon of that master regulator.  
 
  
[[Image:MRA_Detailed_listing.png]]
+
After the file has been loaded, its name will be displayed in the adjacent text field.
  
The genes are displayed in a table with the following columns:
+
=====Load Network from Workspace=====
* '''Genes in target list''': the names of the genes in the intersection set. Either the gene name or the marker/probe set name is used (based on the choice of "Symbol" or "Probe Set" radio buttons).
+
Several analytical components in geWorkbench (e.g., [[ARACNe | ARACNe]], [[Cellular_Networks_KnowledgeBase | CNKB]]) produce adjacency matrix results nodes that can be utilized for this purpose. Networks can also be loaded into the [[Workspace|Workspace]] directly from a file.
* '''P-value''': the p-value of the t-test statistic for this marker, computed over the 2 sets of arrays (cases versus controls).
 
* '''T-test value''': The actual value of the t-test statistic for the gene. A positive value indicates that the mean expression of the gene is higher in cases than in controls (a negative value has the opposite meaning).
 
  
The “P-val Threshold” parameter (located above the table) can be used to limit the number of target genes displayed. Specifically, only genes with a p-value less than the specified cutoff will be shown (if “P-val Threshold” is left empty, then all target genes in the intersection set are displayed). Additional functionality is made available through the following buttons:
+
* The pulldown menu for choosing an available adjacency matrix will become active.  Only adjacency matrices that are children of the current microarray dataset will be offered.
  
* '''Add to Set''': creates a set containing the markers that correspond to the genes currently displayed in the table. The new marker set appears in the Markers component and is named after the master regulator.
+
All edges in the network are assumed to be significant, and any strength value included is not used.
* '''Export selected''': same as “Add to Set” but instead of being added to a marker set, the markers are stored into a file.
 
* '''Export all''': stores into a file information for all master regulators (instead of only the one currently selected in the “Summary” view). Specifically, for each master regulator, the file lists the markers for all genes that are both differentially expressed and also belong to the master regulator’s regulon.
 
  
===Graph View===
+
====Enrichment Threshold====
For a given master regulator ''A'' and the intersection between its regulon and the set of differentially expressed genes, the graph view helps assess if the intersection genes are preferentially over-expressed in the cases versus the controls. The biological motivation comes from observing [[#Lim2009 | [Lim et al., 2009]]] that regulators with multiple targets tend to affect the expression level of (most of) their targets in one particular direction: they either promote their expression or inhibit it; but they rarely do both equally.  
+
Enter a p-value for the significance at which to accept the overlap of the regulon of a candidate TF and the signature set of genes.
 +
For the FET (local service), this is calculated using the FET. For the MARINa (grid service) method, this is calculated using GSEA.
  
[[Image:MRA_graph_view.png]]
+
===MRA-FET (Local service)===
 +
Please see the separate [[MRA-FET|MRA-FET]] chapter for details on running the FET version of master regulator analysis.
  
* The red-blue gradient at the bottom of the graph represents the range between the highest (red) and the lowest (blue) t-test statistic recorded among all differentially expressed genes.
+
===MARINA (grid service)===
* The vertical bars correspond to the genes displayed in the table under the “Detailed listing” portion of the interface, i.e., the intersection between the differentially expressed genes and the regulon of the master regulator ''A'' currently selected within the “Summary listing” table.
+
Please see the separate [[MARINa|MARINa]] chapter for details on running MARINa.
* The relative location of a bar on the gradient represents the t-test statistic recorded for the corresponding gene.
 
* Further, the color of each bar provides information about the correlation between the expression levels of the target gene and the master regulator A (correlations are computed as Pearson’s r, using data from all microarrays in the experiment): black means that the two genes are positively correlated (r > 0) while orange means that correlation is negative (r < 0).
 
  
 
=Dataset History=
 
=Dataset History=
Each results node stores the parameter settings used to set up the corresponding MRA run. The specific parameter values can be inspected within the Dataset History component, after clicking on the MRA results node in the Project Folders pane.
+
Each results node stores the parameter settings used to set up the corresponding MRA run. The specific parameter values can be inspected within the Dataset History component, after clicking on the MRA results node in the [[Workspace]].
 
 
=Example of running MRA=
 
This example uses the Bcell-100.exp dataset available in the data/public_data directory of geWorkbench, and further described on the [[Download]] page.  Briefly, this dataset is composed of 100 Affymetrix HG-U95Av2 arrays on which various B-cell lines, both normal and cancerous, were analyzed.  Thus it explores a potentially wide variety of expression phenotypes.
 
 
 
==Prerequisites==
 
# Obtain the annotation file for the HG-U95Av2 array type from the Affymetrix NetAffx website (http://www.affymetrix.com/support/technical/byproduct.affx?product=hgu95). The name will be similar to "HG_U95Av2.na29.annot.csv", where na29 is the version number. Loading the annotation file associates gene names and other information with the Affymetrix probeset IDs (see the geWorkbench FAQ for details on obtaining these files).
 
# Save the following 2 files to disk:
 
## [[Media:Interaction_network.txt | Interaction_network.txt]]: describes an interaction network with 1955 nodes and 3810 edges. The file has 1955 lines (one for each node; the node name is the first entry in every line) and each line lists the edges emanating from that node. Edges are described as tab-delimited pairs where the first member of the pair is the name of the target node and the second member is a number specifying a weight for the edge (MRA does not use the weight information but other geWorkbench components do). Node names correspond to marker ids.
 
## [[Media:Master_regulators.csv | Master_regulators.csv]]: a list of master regulators. The marker ids in this file correspond to genes whose [http://www.geneontology.org/ Gene Ontology] annotation (under the Molecular Function category) lists them as transcription factors.
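
The line layout described for Interaction_network.txt can be read with a few lines of Python. This is only a minimal sketch, not the geWorkbench parser: the function name <code>parse_adjacency_lines</code> and the sample marker ids are hypothetical, and it assumes every field on a line is separated by a tab.

```python
def parse_adjacency_lines(lines):
    """Parse lines of the form: node <TAB> target <TAB> weight <TAB> target <TAB> weight ...

    Returns a dict mapping each node name to a dict {target: weight}.
    Assumption: all fields are tab-separated; blank lines are skipped.
    """
    network = {}
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue
        fields = line.split("\t")
        node, rest = fields[0], fields[1:]
        targets = {}
        # Walk the remaining fields two at a time: (target name, edge weight).
        for i in range(0, len(rest) - 1, 2):
            targets[rest[i]] = float(rest[i + 1])
        network[node] = targets
    return network


# Hypothetical marker ids, for illustration only.
example_lines = [
    "1000_at\t1001_at\t0.52\t1002_at\t0.31\n",
    "1001_at\t1000_at\t0.52\n",
]
example_network = parse_adjacency_lines(example_lines)
```

Because MRA ignores the edge weights, the regulon of a node is simply the set of keys of its inner dictionary.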
 
 
 
==Loading and preparing the example data==
 
# Load a microarray dataset.  (See [[Tutorial_-_Local_Data_Files | Local Data Files]]).
 
# When prompted, load the annotation file.
 
# Load marker sets for the list of candidate master regulators and for the signature genes.
 
 
 
[[Image:MRA_GBM_Marker_sets.png]]
 
 
 
==Choosing array groups==
 
If needed, define array sets for 2 classes of arrays in the dataset.
 
 
 
* Here we show the MES and PN sets selected.
 
 
 
[[Image:MRA_array_set_activation.png]]
 
 
 
* The MES set is classified as "Case".  Right click on the thumbtack adjacent to the set name. 
 
 
 
[[Image:Array_set_class_assignment_MRA.png]]
 
 
 
==Setting up the parameters and starting MRA==
 
In the Analysis Panel, select the "MRA Analysis" entry and set the parameters as follows:
 
* '''Load Network''': from the drop down choose the option "From Set", click on the "Load" button, and select the desired set.
 
* '''Transcription Factors''': from the drop down choose the option "From Set", click on the "Load" button, and select the desired set.
 
 
 
[[Image:MRA_GBM_param_setup.png]]
 
 
 
 
 
* Click on the '''Analyze''' button.
 
 
 
==Results==
 
Upon completion of the analysis, an MRA results node is placed in the Project Folders tree. The analysis results can be browsed using the MRA viewer and are as shown above in the MRA Results Viewer section.
 
  
 
=References=
 
=References=
 
<span id="Basso2005"></span>
 
<span id="Basso2005"></span>
* Basso K, Margolin AA, Stolovitzky G, Klein U, Dalla-Favera R, Califano A: Reverse engineering of regulatory networks in human B cells. Nat Genet 2005, 37(4):382-390 ([http://www.nature.com/ng/journal/v37/n4/abs/ng1532.html link to paper]).
+
* Basso K, Margolin AA, Stolovitzky G, Klein U, Dalla-Favera R, Califano A (2005). Reverse engineering of regulatory networks in human B cells. Nat Genet 37(4):382-390 ([http://www.nature.com/ng/journal/v37/n4/abs/ng1532.html link to paper]).
 +
* Carro MS, Lim WK, Alvarez MJ, Bollo RJ, Zhao X, Snyder EY, Sulman EP, Anne SL, Doetsch F, Colman H, Lasorella A, Aldape K, Califano A, Iavarone A  (2010)  The transcriptional network for mesenchymal transformation of brain tumors.  Nature 463(7279):318-25. PMID: [http://www.ncbi.nlm.nih.gov/pubmed/20032975 20032975].
 
<span id="Lefebvre2010"></span>
 
<span id="Lefebvre2010"></span>
* Lefebvre C, Rajbhandari P, Alvarez MJ, Bandaru P, Lim WK, Sato M, Wang K, Sumazin P, Kustagi M, Bisikirska BC, Basso K, Beltrao P, Krogan N, Gautier J, Dalla-Favera R, Califano A (2010)  A human B-cell interactome identifies MYB and FOXM1 as master regulators of proliferation in germinal centers.  Mol Syst Biol.  6:377. PMID: 20531406 ([http://www.ncbi.nlm.nih.gov/pubmed/20531406 link to paper]).
+
* Lefebvre C, Rajbhandari P, Alvarez MJ, Bandaru P, Lim WK, Sato M, Wang K, Sumazin P, Kustagi M, Bisikirska BC, Basso K, Beltrao P, Krogan N, Gautier J, Dalla-Favera R, Califano A (2010)  A human B-cell interactome identifies MYB and FOXM1 as master regulators of proliferation in germinal centers.  Mol Syst Biol.  6:377. PMID: [http://www.ncbi.nlm.nih.gov/pubmed/20531406 20531406].
 
<span id="Lim2009"></span>
 
<span id="Lim2009"></span>
 
* Lim WK, Lyashenko E, Califano A: Master regulators used as breast cancer metastasis classifier. Pac Symp Biocomput. 2009:504-15 ([http://psb.stanford.edu/psb-online/proceedings/psb09/lim.pdf link to paper]).
 
* Lim WK, Lyashenko E, Califano A: Master regulators used as breast cancer metastasis classifier. Pac Symp Biocomput. 2009:504-15 ([http://psb.stanford.edu/psb-online/proceedings/psb09/lim.pdf link to paper]).
 +
* Phillips HS, Kharbanda S, Chen R, Forrest WF, Soriano RH, Wu TD, Misra A, Nigro JM, Colman H, Soroceanu L, Williams PM, Modrusan Z, Feuerstein BG, Aldape K (2006)  Molecular subclasses of high-grade glioma predict prognosis, delineate a pattern of disease progression, and resemble stages in neurogenesis.  Cancer Cell 9(3):157-73.

Latest revision as of 17:47, 31 July 2014




Overview

Regulatory activity in the context of specific cellular phenotypes can be investigated using interaction networks. These are graphs where nodes represent genes and an edge between two nodes A and B means that genes A and B are participants in the same regulatory activity. E.g., A can be a transcription factor for B; or, A can be an miRNA that silences B. Analysis of such regulatory networks [Basso et al., 2005] has convincingly demonstrated their scale-free nature which is dominated by a relatively small number of nodes with a large degree of connectivity. The genes corresponding to those nodes are known as "master regulators" and collectively orchestrate the regulatory program of the underlying cellular phenotype(s).

Master Regulator analysis [Lefebvre et al., 2010] is an algorithm used to identify transcription factors whose targets (e.g., as represented in an ARACNe-generated interactome) are enriched for a particular gene signature (e.g. a list of differentially expressed genes). The enrichment is evaluated using a statistical test such as Fisher’s exact test or GSEA. The objective is to place the signature genes within a regulatory context and identify the master regulators responsible for coordinating their activity, thus highlighting the regulatory apparatus driving phenotypic differentiation.

Specifically, given an interaction network I, a (presumed) master regulator gene A, and a set of signature genes, MRA computes the enrichment of the signature genes in the regulon of A, where the regulon of A is defined as its neighbors in the interaction network I.
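
The enrichment computation for the FET method can be sketched in Python as follows. This is an illustration under stated assumptions, not the geWorkbench implementation: the function names, the toy network, and the choice of a one-sided (over-representation) test are assumptions made here. The network is taken to be a mapping from each regulator to its neighbors, and the p-value is the hypergeometric probability of an overlap at least as large as the one observed.

```python
from math import comb


def regulon_of(network, regulator):
    """Regulon = the neighbors of the regulator in the interaction network."""
    return set(network.get(regulator, {}))


def fisher_enrichment_p(regulon, signature, universe):
    """One-sided Fisher's exact test for over-representation.

    Returns P(overlap >= observed) when a set of size |regulon| is drawn at
    random from the universe of markers containing |signature| signature genes.
    """
    regulon = regulon & universe
    signature = signature & universe
    n_total = len(universe)
    n_sig = len(signature)
    n_reg = len(regulon)
    observed = len(regulon & signature)
    # Sum the hypergeometric probabilities of the observed table and of
    # every more extreme table (larger overlaps).
    tail = sum(
        comb(n_sig, i) * comb(n_total - n_sig, n_reg - i)
        for i in range(observed, min(n_sig, n_reg) + 1)
    )
    return tail / comb(n_total, n_reg)


# Toy data with hypothetical gene names.
universe = {"g%d" % i for i in range(10)}
signature = {"g0", "g1", "g2"}
network = {"TF1": {"g0": 0.4, "g1": 0.3, "g2": 0.2}, "TF2": {"g7": 0.1}}

p_values = {
    tf: fisher_enrichment_p(regulon_of(network, tf), signature, universe)
    for tf in network
}
```

Candidates whose p-value falls below the chosen enrichment threshold would then be reported as putative master regulators.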

Interaction networks are represented as "adjacency matrices". An adjacency matrix lists the connections that each node takes part in, and includes a measure of the strength of that interaction (e.g. the mutual information in the case of matrices generated by ARACNe).

There are two master regulator analysis components implementing different methods to evaluate the enrichment of the signature in the regulon. Either method will quantify how likely it is to encounter an enrichment of (at least) the observed size by chance alone. A small p-value is taken to imply that gene A may play a significant role in mediating the regulatory program that leads to the differential phenotypes.

  • FET Method (local service) - this method uses Fisher's Exact Test. This method is implemented locally in geWorkbench.
  • MARINa Method *(grid service) - this method uses GSEA and differs in substantial ways from the FET-based method. This method is only implemented as a grid service and currently has restricted availability due to its computational cost. A t-test between two phenotype classes is built into the implementation to produce the gene signature.

The MARINa method can use sample shuffling to correct for non-independence between the expression of various genes. Sample shuffling is not implemented for the MRA-FET method, and hence in that method the p-values are not directly comparable between genes.
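
The general idea of sample shuffling can be sketched as a label-permutation test. The code below is not MARINa and does not use GSEA: it substitutes a deliberately simple stand-in statistic (the mean absolute Welch t-statistic over the regulon), and all function names and data are hypothetical. The point it illustrates is that whole samples are relabelled, so every gene sees the same shuffled labels and the correlation between genes is preserved in the null distribution.

```python
import random
import statistics


def t_statistic(case_values, control_values):
    """Welch t-statistic; positive when the case mean is higher."""
    mean_diff = statistics.mean(case_values) - statistics.mean(control_values)
    std_err = (
        statistics.variance(case_values) / len(case_values)
        + statistics.variance(control_values) / len(control_values)
    ) ** 0.5
    return mean_diff / std_err


def regulon_score(expression, labels, regulon):
    """Stand-in enrichment statistic: mean |t| over the regulon genes."""
    scores = []
    for gene in regulon:
        values = expression[gene]
        cases = [v for v, lab in zip(values, labels) if lab == "case"]
        controls = [v for v, lab in zip(values, labels) if lab == "control"]
        scores.append(abs(t_statistic(cases, controls)))
    return statistics.mean(scores)


def shuffled_p_value(expression, labels, regulon, n_perm=200, seed=0):
    """Empirical p-value from shuffling the sample labels n_perm times."""
    rng = random.Random(seed)
    observed = regulon_score(expression, labels, regulon)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # the same relabelling is applied to all genes
        if regulon_score(expression, shuffled, regulon) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Toy expression values: 6 case samples followed by 6 control samples.
labels = ["case"] * 6 + ["control"] * 6
expression = {
    "g1": [10, 11, 9, 10, 12, 11, 1, 0, 2, 1, 0, 1],
    "g2": [8, 9, 7, 9, 8, 10, 2, 1, 1, 3, 2, 2],
}
p_shuffled = shuffled_p_value(expression, labels, regulon=["g1", "g2"])
```

Because each regulator is compared against a null distribution built from the same shuffled data, p-values obtained this way are on a common footing across regulators.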

Please note that MARINa does not employ any of the special gene lists available for use with the GSEA algorithm, such as MSigDB. It uses only a calculated list of differentially expressed genes and the regulon of the TF being tested.

With the release of geWorkbench 2.5.0, MRA-FET and MRA-MARINa are located in two separate sets of components, which can be loaded in the CCM.

Setting up an MRA run

Prerequisites

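  • Either or both Master Regulator Analysis (MRA) components, MRA-FET and MRA-MARINa, must be loaded in the Component Configuration Manager.

MRA-MARINa-CCM.png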


Gene Expression dataset

A gene expression dataset in which the phenotypic signature was identified or can be demonstrated. A t-test of differential expression will be run to generate the graphic "bar code" display of the effect of the master regulator on its regulon or to generate the signature gene list (MARINa method).

Interaction Network

An interaction network in the form of an adjacency matrix (see File Formats). Networks can be loaded from a file, or calculated with ARACNe from a dataset which includes the particular cellular phenotypes being investigated. If calculating the network with ARACNe, all genes to be tested as possible master regulators should be used as hubs.

If the incorrect network format is chosen, the user is warned and the analysis setup is terminated.

If the network is loaded into MARINa as gene symbols or Entrez IDs, it will be transformed (expanded) to include all probesets annotated to each such gene if an annotation file has been loaded for the expression dataset.
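
The expansion from gene-level to probeset-level nodes can be pictured with the following sketch. How geWorkbench performs this step internally is not described here, so the function name, the sample identifiers, and the choice to drop genes that have no annotated probeset are all assumptions made for illustration.

```python
def expand_to_probesets(network, gene_to_probesets):
    """Expand a gene-level network so that every node is a probeset.

    network: dict mapping a hub gene to an iterable of target genes.
    gene_to_probesets: dict mapping a gene to its annotated probeset ids
    (in practice this would come from the loaded annotation file).
    Assumption: genes without any annotated probeset are dropped.
    """
    expanded = {}
    for hub, targets in network.items():
        hub_probes = gene_to_probesets.get(hub, [])
        target_probes = {
            probe
            for target in targets
            for probe in gene_to_probesets.get(target, [])
        }
        for hub_probe in hub_probes:
            # Each probeset of the hub is linked to every probeset of every target.
            expanded.setdefault(hub_probe, set()).update(target_probes - {hub_probe})
    return expanded


# Hypothetical gene symbols and probeset ids.
gene_network = {"GENE_A": ["GENE_B", "GENE_C"]}
annotation = {"GENE_A": ["p1_at", "p2_at"], "GENE_B": ["p3_at"]}
probe_network = expand_to_probesets(gene_network, annotation)
```

In this toy case GENE_C has no annotated probeset and therefore disappears from the expanded network, while both probesets of GENE_A inherit the edge to the single probeset of GENE_B.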

Signature genes (FET method)

A list of signature gene markers which distinguish between two phenotypes. This list may come from a t-test, clustering, or some combination of methods. The user must define this set using methods relevant to the particular dataset and study goals.

Candidate master regulator list (FET method)

A set of gene markers that will be tested as candidate master regulators. This set may be comprised of e.g. transcription factor and signalling pathway genes.

Note on Marker Sets

geWorkbench provides a mechanism to restrict some analyses to using certain sets of markers by "activating" these sets in the Markers component. However, as the MRA analysis component uses named marker sets directly, it does not respect the activation state of marker sets in the Markers component, and such activated sets will have no effect on the analysis.

Activating microarray sets would, on the other hand, restrict the markers used by the MRA viewer in generating the "bar graph".

For this reason, no marker sets should be "activated" (their check-box checked) during MRA analysis.

Parameters and Settings

Main

The settings on this tab apply to both the FET and MARINa methods.


Load Network

There are two ways to designate the interaction network, represented by an adjacency matrix, that will be used for computing the regulons of the candidate master regulator genes:

  • From File: by choosing a file that describes a network.
  • From Workspace: by selecting an adjacency matrix node from the Workspace component.

Load Network from File

  • The file loading controls will become active when this option is chosen.
  • Press the "Load" button to bring up the file browser.
  • After selecting a file, a second dialog will ask for details about the format and symbols used.

MRA Load Network Dialog.png

  • File Format:
    • ADJ
    • SIF
    • MARINa 5-column format (internal use only)
  • Nodes Represented by:
    • probeset id
    • gene symbol
    • entrez id
    • other

If the network is loaded into MARINa as gene symbols or Entrez IDs, it will be transformed (expanded) to include all probesets annotated to each such gene if an annotation file has been loaded for the expression dataset.

After the file has been loaded, its name will be displayed in the adjacent text field.

Load Network from Workspace

Several analytical components in geWorkbench (e.g., ARACNe, CNKB) produce adjacency matrix results nodes that can be utilized for this purpose. Networks can also be loaded into the Workspace directly from a file.

  • The pulldown menu for choosing an available adjacency matrix will become active. Only adjacency matrices that are children of the current microarray dataset will be offered.

All edges in the network are assumed to be significant, and any strength value included is not used.
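To make the "strength values are not used" point concrete, here is a minimal sketch that turns adjacency-style lines into regulons (hub to target set) while discarding the weights. The line layout is an assumption: tab-separated, with the hub first followed by alternating target and strength fields, and lines starting with ">" treated as comments. Consult the File Formats page for the actual specification; the function name is hypothetical.

```python
def read_regulons(adj_lines):
    """Build hub -> set(targets) from adjacency lines, ignoring edge strengths."""
    regulons = {}
    for line in adj_lines:
        line = line.rstrip("\n")
        if not line or line.startswith(">"):
            continue  # skip blank lines and assumed comment/header lines
        fields = line.split("\t")
        hub, rest = fields[0], fields[1:]
        targets = rest[0::2]  # every other field is a target; strengths are dropped
        regulons.setdefault(hub, set()).update(targets)
    return regulons


# Hypothetical network: two hubs sharing one target.
example_lines = [
    ">hypothetical header line",
    "TF_A\tG1\t0.5\tG2\t0.1",
    "TF_B\tG2\t0.9",
]
regulons = read_regulons(example_lines)
```

Because every listed edge is kept regardless of its strength, any filtering by significance has to happen before the network is loaded.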

Enrichment Threshold

Enter the p-value threshold at which the overlap between the regulon of a candidate TF and the signature gene set is accepted as significant. For the FET method (local service), the p-value is calculated using Fisher's exact test. For the MARINa method (grid service), it is calculated using GSEA.
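For the FET case, the overlap test can be sketched as a one-sided Fisher's exact test, computed here as the upper tail of the hypergeometric distribution. The one-sided form, the choice of gene universe, and the function name are assumptions made for illustration; the source does not specify how geWorkbench sets these up.

```python
from math import comb


def fet_enrichment_p(regulon, signature, universe):
    """One-sided Fisher's exact test p-value for regulon/signature overlap."""
    universe = set(universe)
    regulon = set(regulon) & universe
    signature = set(signature) & universe
    n_total, n_reg, n_sig = len(universe), len(regulon), len(signature)
    overlap = len(regulon & signature)
    # Probability of drawing at least `overlap` signature genes when picking
    # n_reg genes at random from the universe.
    tail = sum(
        comb(n_sig, k) * comb(n_total - n_sig, n_reg - k)
        for k in range(overlap, min(n_reg, n_sig) + 1)
    )
    return tail / comb(n_total, n_reg)


# Hypothetical data: 10 genes, 4 in the signature, a regulon of 4 with 3 shared.
universe = [f"g{i}" for i in range(10)]
signature = ["g0", "g1", "g2", "g3"]
regulon = ["g0", "g1", "g2", "g5"]

p_value = fet_enrichment_p(regulon, signature, universe)
is_enriched = p_value <= 0.05  # compare against the chosen enrichment threshold
```

In this toy case the p-value is 25/210 (about 0.119), so the regulon would not pass a threshold of 0.05.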

MRA-FET (Local service)

Please see the separate MRA-FET chapter for details on running the FET version of master regulator analysis.

MARINa (grid service)

Please see the separate MARINa chapter for details on running MARINa.

Dataset History

Each results node stores the parameter settings used to set up the corresponding MRA run. The specific parameter values can be inspected within the Dataset History component, after clicking on the MRA results node in the Workspace.

References

  • Basso K, Margolin AA, Stolovitzky G, Klein U, Dalla-Favera R, Califano A (2005) Reverse engineering of regulatory networks in human B cells. Nat Genet 37(4):382-390.
  • Carro MS, Lim WK, Alvarez MJ, Bollo RJ, Zhao X, Snyder EY, Sulman EP, Anne SL, Doetsch F, Colman H, Lasorella A, Aldape K, Califano A, Iavarone A (2010) The transcriptional network for mesenchymal transformation of brain tumors. Nature 463(7279):318-25. PMID: 20032975.
  • Lefebvre C, Rajbhandari P, Alvarez MJ, Bandaru P, Lim WK, Sato M, Wang K, Sumazin P, Kustagi M, Bisikirska BC, Basso K, Beltrao P, Krogan N, Gautier J, Dalla-Favera R, Califano A (2010) A human B-cell interactome identifies MYB and FOXM1 as master regulators of proliferation in germinal centers. Mol Syst Biol. 6:377. PMID: 20531406.
  • Lim WK, Lyashenko E, Califano A (2009) Master regulators used as breast cancer metastasis classifier. Pac Symp Biocomput. 2009:504-15.
  • Phillips HS, Kharbanda S, Chen R, Forrest WF, Soriano RH, Wu TD, Misra A, Nigro JM, Colman H, Soroceanu L, Williams PM, Modrusan Z, Feuerstein BG, Aldape K (2006) Molecular subclasses of high-grade glioma predict prognosis, delineate a pattern of disease progression, and resemble stages in neurogenesis. Cancer Cell 9(3):157-73.