Master Regulator Analysis

Revision as of 14:27, 26 November 2012




Overview

Regulatory activity in the context of specific cellular phenotypes can be investigated using interaction networks. These are graphs where nodes represent genes and an edge between two nodes A and B means that genes A and B are participants in the same regulatory activity. E.g., A can be a transcription factor for B; or, A can be an miRNA that silences B. Analysis of such regulatory networks [Basso et al., 2005] has convincingly demonstrated their scale-free nature which is dominated by a relatively small number of nodes with a large degree of connectivity. The genes corresponding to those nodes are known as "master regulators" and collectively orchestrate the regulatory program of the underlying cellular phenotype(s).

Master Regulator analysis [Lefebvre et al., 2010] is an algorithm used to identify transcription factors whose targets (e.g., as represented in an ARACNe-generated interactome) are enriched for a particular gene signature. The enrichment is evaluated using a statistical test such as Fisher’s exact test. The objective is to place the signature genes within a regulatory context and identify the master regulators responsible for coordinating their activity, thus highlighting the regulatory apparatus driving phenotypic differentiation.

Specifically, given an interaction network I, a (presumed) master regulator gene A, and a set of signature genes, MRA computes the intersection between two sets of genes:

  1. The neighbors of A in the interaction network I (this gene set is called the regulon of A).
  2. The set of signature genes.

Interaction networks are represented as "adjacency matrices". An adjacency matrix lists the connections that each node takes part in, and includes a measure of the strength of that interaction (e.g. the mutual information in the case of matrices generated by ARACNe).
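The regulon and intersection computation described above can be sketched as follows. This is a minimal illustration only, not geWorkbench's implementation: the function names, the edge-list input, and the toy gene names are assumptions made for the example, and edge strengths are left out.

```python
from collections import defaultdict


def build_regulons(edges):
    """Map each node to the set of its first neighbors (undirected network)."""
    regulons = defaultdict(set)
    for a, b in edges:
        regulons[a].add(b)
        regulons[b].add(a)
    return regulons


def intersection_set(regulons, regulator, signature):
    """Genes that are in the regulator's regulon and also in the signature."""
    return regulons.get(regulator, set()) & set(signature)


# Hypothetical toy network and signature (placeholder gene names).
edges = [("TF1", "G1"), ("TF1", "G2"), ("TF1", "G3"), ("TF2", "G3"), ("TF2", "G4")]
signature = {"G2", "G3", "G4"}

regulons = build_regulons(edges)
overlap = intersection_set(regulons, "TF1", signature)
print(sorted(overlap))  # ['G2', 'G3']
```

The size of `overlap` is the quantity whose significance the FET or MARINa method then evaluates.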

The MRA component supports two methods to evaluate the overlap of the signature and the regulon. Either method will quantify how likely it is to encounter an intersection of (at least) the observed size by chance alone. A small p-value is taken to imply that gene A may play a significant role in mediating the regulatory program that leads to the differential phenotypes. Another way to sort the results is by the number of genes in the overlap set. This may give a better evaluation of the actual biological significance of the MR. As the number of genes in each hub gene's regulon is different, the calculated p-values are not directly comparable to each other.

  • FET Method (local service) - this method uses Fisher's Exact Test. This method is implemented locally in geWorkbench.
  • MARINa Method (grid service) - this method uses GSEA and differs in substantial ways from the FET-based method. This method is only implemented as a grid service and currently has restricted availability due to its computational cost.

The choice of FET vs MARINa is made on the "Services" tab, by selecting the local (FET) vs grid (MARINa) service.


For the local FET method, generating a bar-graph display of the effect of a master regulator on its regulon requires performing a t-test of differential expression on a dataset representing the phenotypes distinguished by the signature, ideally the original dataset from which the signature was determined.

Setting up an MRA run

Prerequisites

Gene Expression dataset

A gene expression dataset in which the phenotypic signature was identified or can be demonstrated. A t-test of differential expression will be run to generate the graphic "bar code" display of the effect of the master regulator on its regulon (FET method) or to generate the signature gene list (MARINa method).


Interaction Network

An interaction network in the form of an adjacency matrix (see File Formats). Networks can be loaded from a file, or calculated with ARACNe from a dataset which includes the particular cellular phenotypes being investigated. If calculating the network with ARACNe, all genes to be tested as possible master regulators should be used as hubs.

If the incorrect network format is chosen, the user is warned and the analysis setup is terminated.

Signature genes (FET method)

A list of signature gene markers which distinguish between two phenotypes. This list may come from a t-test, clustering, or some combination of methods. The user must define this set using methods relevant to the particular dataset and study goals.

Candidate master regulator list (FET method)

A set of gene markers that will be tested as candidate master regulators. This set may comprise, e.g., transcription factor and signalling pathway genes.

Note on Marker Sets

geWorkbench provides a mechanism to restrict some analyses to using certain sets of markers by "activating" these sets in the Markers component. However, as the MRA analysis component uses named marker sets directly, it does not respect the activation state of marker sets in the Markers component, and such activated sets will have no effect on the analysis.

However, activating marker sets would restrict the markers used in generating the "bar graph" by the MRA viewer.

For this reason, no marker sets should be "activated" (their check-box checked) during MRA analysis.

Parameters and Settings

Main

The settings on this tab apply to both the FET and MARINa methods.


MRA Parameters Main.png

Load Network

There are 2 ways to designate the interaction network, represented by an adjacency matrix, that will be used for computing the regulons of the candidate master regulator genes:

  • From File: by choosing a file that describes a network.
  • From Project: by selecting an adjacency matrix node from the Project Folders component.

Load Network from File

  • The file loading controls will become active when this option is chosen.
  • Press the "Load" button to bring up the file browser.
  • After selecting a file, a second dialog will ask for details about the format and symbols used.

MRA Load Network Dialog.png

  • File Format:
    • ADJ
    • SIF
    • MARINa 5-column format (internal use only)
  • Nodes Represented by:
    • probeset id
    • gene symbol
    • entrez id
    • other

After the file has been loaded, its name will be displayed in the adjacent text field.

Load Network from Project

Several analytical components in geWorkbench (e.g., ARACNe, CNKB) produce adjacency matrix results nodes that can be utilized for this purpose. Networks can also be loaded from a file directly into the Project.

  • The pulldown menu for choosing an available adjacency matrix will become active. Only adjacency matrices that are children of the current microarray dataset will be offered.

All edges in the network are assumed to be significant, and any strength value included is not used.

Enrichment Threshold

Enter a p-value for the significance at which to accept the overlap of the regulon of a candidate TF and the signature set of genes. For the FET (local service), this is calculated using the FET. For the MARINa (grid service) method, this is calculated using GSEA.

FET (local service)

MRA Parameters FET v2.png

Master Regulators

A set of candidate master regulator markers.

  • This set must be loaded into the Markers component before running MRA. The set can be created directly there, or read in from a file.

Signature Markers

A set of markers comprising the signature that distinguishes the chosen phenotype from others.

  • This set must be loaded into the Markers component before running MRA. The signature can be generated directly, e.g. through a t-test, or loaded from a file.

FET Runs

  • One (Enrichment Only)
  • Two (Enrichment plus mode of activity) - the target markers are divided into two groups and two runs of FET are performed. See the description above at FET method details.

Multiple Testing Correction

  • No Correction
  • Standard Bonferroni
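
As an illustration of the "Standard Bonferroni" option, the sketch below multiplies each p-value by the number of candidate master regulators tested and caps the result at 1. Whether geWorkbench adjusts the p-values or instead divides the threshold is not stated here, so treat this as an assumption; the function name and the example values are hypothetical.

```python
def bonferroni(p_values):
    """Return Bonferroni-adjusted p-values: p * number_of_tests, capped at 1."""
    n_tests = len(p_values)
    return {name: min(1.0, p * n_tests) for name, p in p_values.items()}


# Hypothetical FET p-values for three candidate master regulators.
raw = {"TF1": 0.001, "TF2": 0.02, "TF3": 0.6}
adjusted = bonferroni(raw)

# Keep only candidates that pass the enrichment threshold after correction.
threshold = 0.05
significant = [name for name, p in adjusted.items() if p <= threshold]
print(significant)  # ['TF1']
```

Comparing adjusted p-values to the original threshold is equivalent to comparing raw p-values to the threshold divided by the number of tests.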


T-test for differential expression

In the Arrays component, a case and a control group must be defined for running a t-test.

A "bar-code" graphic is generated using a t-test of differential expression. However, all t-values are accepted (critical alpha = 1) and used to order the bars representing the regulon markers.

All that is required is to define sets of arrays representing two phenotypes of interest (and distinguished by the signature). At least two sets of arrays must be activated, and at least one must be marked as "case", representing the target phenotype of the gene signature. "Control" is the default classification. (See also the Differential Expression tutorial.)
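
The ordering step can be illustrated with a small sketch that computes a t-value per marker from the case and control arrays and then sorts the markers by it, with no significance cutoff. The Welch (unequal variance) form of the t-statistic is an assumption chosen for the example; this page does not say which variant geWorkbench applies. Marker names and expression values are placeholders.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(case, control):
    """Two-sample t-statistic with unequal variances (Welch form)."""
    se = sqrt(variance(case) / len(case) + variance(control) / len(control))
    return (mean(case) - mean(control)) / se


# Hypothetical expression values: marker -> (case arrays, control arrays).
expression = {
    "G1": ([5.0, 6.0, 7.0], [1.0, 2.0, 3.0]),
    "G2": ([1.0, 2.0, 3.0], [5.0, 6.0, 7.0]),
    "G3": ([2.0, 3.0, 4.0], [2.0, 3.5, 4.0]),
}

# Every t-value is kept (no cutoff); markers are simply ordered by it.
t_values = {m: welch_t(case, control) for m, (case, control) in expression.items()}
ordered = sorted(t_values, key=t_values.get)
print(ordered)  # ['G2', 'G3', 'G1']
```

A positive t-value here means higher expression in the "case" arrays than in the "control" arrays.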


Array set class assignment MRA.png

MARINa (grid service)

Please see the separate MARINa chapter for details on running MARINa.

Viewing MRA analysis results (FET Method)

Following the successful completion of the MRA FET computation, a result node (MRA) appears in the Project Folder area of the geWorkbench interface, under the microarray experiment node. Hovering the cursor over the MRA result node will show the number of master regulators found.


Project Folders MRA Tooltip.png

The results of the analysis can be visualized in the MRA Viewer component by selecting the result node.

MRA Results Viewer

The MRA viewer is structured in 3 distinct areas.

(In the figures below, the data is sorted on the "Genes in intersection set" column.)


MRA viewer GBM FOSL2 v2.png


Note - if no significant MRs are found, an empty result node is returned to the Project Folders component. The MRA viewer will appear but be empty.

Summary Listing

MRA Summary listing v2.png


First Row of Controls

  • Symbol - display the markers using their gene symbol (if available)
  • Probeset - display markers using their marker (probeset) name.
  • Results for top ... - Restrict the "bar graph" to at most the specified number of entries.
  • Bar height ... - set the height of the vertical lines in the bar graph in pixels.
  • Bars for
    • Regulon - draw bars for each marker in the hub marker's regulon
    • Intersection set - draw bars for only those markers in the hub's regulon that are also present in the list of signature markers.

Second Row of Controls

Export Table

This command will export the entire master regulator results table to a file. It exports the same information shown on screen, sorted in the same way if the table has been sorted on one of the columns. The user can choose to export the table in CSV (.csv) or tab-delimited text format (.txt).

The following columns are exported:

  • Master Regulator
  • FET P-Value
  • Genes in regulon (count)
  • Genes in intersection set (count)

Export all targets

This command writes a file to disk containing each MR in the table, along with each MR's targets and the t-value for each target.

The master regulators and their markers in the intersection set (the intersection of each MR's regulon and the signature genes) are exported, along with the t-test value calculated for display of the regulon. Each master regulator is listed on a line, followed by its intersection set markers with their t-values. Each MR is separated by a blank line from the preceding section. The order in the file is not changed by sorting the results table prior to export.

Export File format:

marker, gene name, t-value

Example:

220462_at, CSRNP3
200660_at, S100A11, 12.541623
201474_s_at, ITGA3, 7.4126143
202910_s_at, CD97, 10.785
....
202614_at, SLC30A9
160020_at, MMP14, 4.415267
200808_s_at, ZYX, 9.006654
200859_x_at, FLNA, 8.309419
....


Exported files automatically receive a ".csv" file name extension.
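
For downstream processing, a file in the format shown above could be read with a short script such as the following. This is a sketch based only on the layout in the example: a line with two fields starts a new master regulator, and a line with three fields is one of its targets. The function name is hypothetical, and the sample text reuses lines from the example above.

```python
def parse_targets_export(text):
    """Parse 'Export all targets' text.

    Returns {(mr_marker, mr_gene): [(marker, gene, t_value), ...]}.
    """
    result = {}
    current = None
    for line in text.splitlines():
        fields = [f.strip() for f in line.split(",")]
        if len(fields) == 2 and all(fields):
            # A two-field line names the next master regulator.
            current = (fields[0], fields[1])
            result[current] = []
        elif len(fields) == 3 and current is not None:
            # A three-field line is a target of the current master regulator.
            result[current].append((fields[0], fields[1], float(fields[2])))
        # Blank lines and anything else are skipped.
    return result


sample = """220462_at, CSRNP3
200660_at, S100A11, 12.541623
201474_s_at, ITGA3, 7.4126143

202614_at, SLC30A9
160020_at, MMP14, 4.415267
"""

parsed = parse_targets_export(sample)
for (marker, gene), targets in parsed.items():
    print(gene, len(targets))
```

Keying on the number of fields rather than on blank lines keeps the parser tolerant of files in which the separating blank line is missing.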

Add Targets to Set

Create a new marker set in the Markers component containing the intersection set for the selected master regulator. The set is named after the master regulator.

Mode

This set of radio buttons controls which mode results to display in the bar graph, if the two-FET method for MRA was used (See above section FET method details).

  • Both - display results with both plus and minus modes.
  • Plus (+) - display only "plus" mode results.
  • Minus (-) - display only "minus" mode results.

Table Column Headers

At upper left in the MRA viewer. For each candidate master regulator found to have a significant effect using Fisher's Exact test, the following columns are displayed:

  • Master Regulator - This is either the master regulator gene name or the marker/probeset name identifying the corresponding array feature (depending on the selection of the radio buttons “Symbol” and “Probe set”).
  • FET p-value - the p-value from Fisher’s exact test. The test utilizes a 2x2 contingency table where rows classify markers as belonging to the signature set or not, while columns indicate if a marker belongs to the regulon of the master regulator or not. Counts are computed using all markers found in the input experiment data. (The p-value from Fisher's exact test includes the probabilities of more extreme tables as well as that of the observed table.)
  • Genes in Regulon - the number of markers (genes) found to be first neighbors of the master regulator in the loaded network - its regulon.
  • Genes in Intersection Set - The number of markers found in the intersection of the signature and the regulon of the candidate MR.
  • Mode - Only used if MRA was run with the two-FET option. See the above section FET method details.
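
The FET p-value for the 2x2 table described above can be sketched with the hypergeometric distribution. The example assumes a one-sided test (the probability of an overlap at least as large as the one observed), which matches the description in the Overview but may differ in detail from the geWorkbench implementation. The function name and the counts are hypothetical.

```python
from math import comb


def fet_enrichment_pvalue(n_markers, n_signature, n_regulon, n_overlap):
    """One-sided Fisher's exact test: P(overlap >= n_overlap) by chance.

    n_markers   -- all markers in the dataset
    n_signature -- markers in the signature set
    n_regulon   -- markers in the regulon of the candidate master regulator
    n_overlap   -- markers in both sets (the intersection set)
    """
    max_overlap = min(n_signature, n_regulon)
    total = comb(n_markers, n_regulon)
    # Sum the observed table and every more extreme table.
    tail = sum(
        comb(n_signature, k) * comb(n_markers - n_signature, n_regulon - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical counts: 20 markers, 5 in the signature, regulon of 4, overlap of 2.
p = fet_enrichment_pvalue(20, 5, 4, 2)
print(round(p, 4))  # 0.2487
```

Because the regulon size enters the calculation directly, the same overlap size yields different p-values for regulons of different sizes.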

The contents of the table can be ordered by any column, by clicking on the column name. Sorting by the number of genes in the intersection set may give a list with the more biologically interesting hits on top. As each regulon is of a different size, the p-values are not directly comparable.

Clicking on the radio button for any of the master regulators will display the list of intersection genes in a table to the right (Detailed Listing), and will draw the regulon bar graph below.

Detailed Listing

The detailed list shows the genes/markers contained in the intersection set of the MR regulon and the signature.


MRA Detailed listing.png

The genes are displayed in a table with the following columns:

  • Genes in intersection set: the names of the genes in the intersection set. Either the gene name or the marker/probe set name is used (based on the choice of "Symbol" or "Probe Set" radio buttons).
  • -log10(p-value) * sign (t-value): A modified test statistic combining the -log10(p-value) with the sign of the t-value. The sign of the t-value indicates positive or negative differential expression.
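
The modified statistic in the second column can be written as a one-line function. This is a minimal sketch; the function name is hypothetical, and a p-value of exactly 0 is not handled (it would raise an error in `log10`).

```python
from math import copysign, log10


def signed_log_p(p_value, t_value):
    """Return -log10(p_value) carrying the sign of t_value."""
    # copysign treats a t-value of exactly 0 as positive.
    return copysign(-log10(p_value), t_value)


# Hypothetical values: a strongly down-regulated and a weakly up-regulated marker.
down = signed_log_p(0.001, -4.2)
up = signed_log_p(0.5, 0.7)
print(down < 0 < up)  # True
```

Small p-values give large magnitudes, so the most strongly up- and down-regulated markers end up at opposite ends of the scale.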


Export Table

This command will export to a file on disk the contents of the detailed target results table.

The file can be written in either tab-delimited (.txt) or CSV (.csv) format.

The columns exported are:

  • Genes in intersection set
  • -log10(P-value) * sign of t-value

Bar Graph View

Description

The bar graph is created based on ranked differential expression results for all markers in the dataset. However, only markers in the TF's regulon or intersection set (depending on the setting chosen) are drawn as vertical bars, allowing their positions in the entire set of markers to be visualized.

The value used to calculate the differential expression display is -log10(p-value) * sign (t-value), as in the detailed table display described above.


MRA graph view GBM.png

  • Vertical bars - The vertical bars correspond to ranked positions of the markers belonging to each TF's regulon or intersection set (depending on the setting chosen).
  • Bar position on horizontal axis - bars for displayed markers are positioned using their rank in a list of all markers ordered by (-log10(p-value) * sign (t-value)), calculated using a t-test for differential expression.
  • Bar Color - The color of each bar indicates the sign of the Spearman's Correlation between the expression profile of the TF and its targets (calculated using data from all microarrays in the experiment, not just those in the case and control sets):
    • Red means that the two markers are positively correlated (r >= 0) while
    • Blue means that correlation is negative (r < 0).
  • Gradient - The red-blue gradient at the bottom of the graph qualitatively represents the ranking between the lowest (blue) and the highest (red) test statistic. The white area in the middle represents the middle of the ranking (not necessarily zero differential expression).
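
The positioning and coloring rules in the list above can be sketched as follows. This is an illustration under stated assumptions, not the viewer's code: the function names and the toy data are hypothetical, the Spearman correlation is computed as the Pearson correlation of average ranks, and constant expression profiles (which would cause a division by zero) are not handled.

```python
def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


def bar_layout(statistic, expression, regulator, targets):
    """Return [(marker, rank_position, color)] for the markers to draw."""
    # Position: rank of each marker among ALL markers, ordered by the statistic.
    ordered = sorted(statistic, key=statistic.get)
    position = {marker: i for i, marker in enumerate(ordered)}
    bars = []
    for marker in targets:
        r = spearman(expression[regulator], expression[marker])
        bars.append((marker, position[marker], "red" if r >= 0 else "blue"))
    return sorted(bars, key=lambda bar: bar[1])


# Hypothetical data: signed statistic per marker and profiles over four arrays.
statistic = {"TF1": 2.0, "G1": -3.0, "G2": 1.0, "G3": 4.0}
expression = {
    "TF1": [1, 2, 3, 4],
    "G1": [4, 3, 2, 1],
    "G2": [1, 3, 2, 4],
    "G3": [2, 4, 6, 8],
}
bars = bar_layout(statistic, expression, "TF1", ["G1", "G3"])
print(bars)  # [('G1', 0, 'blue'), ('G3', 3, 'red')]
```

Note that the position comes from the differential expression ranking, while the color comes from the correlation with the regulator across all arrays, so the two are computed from different inputs.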

Detailed Examples

Detail for FOSL2:

MRA graph GBM FOSL2 v2.png

The bar graph shown above, for FOSL2, indicates that the expression profile of FOSL2 is positively correlated with its regulon targets having positive differential expression in the "case", mesenchymal phenotype. FOSL2 is more active in the mesenchymal than in the proneural phenotype.


Detail for ZNF238:

MRA graph GBM ZNF238 v2.png


The bar graph shown above, for ZNF238, indicates that the expression profile of ZNF238 is positively correlated with its regulon targets having positive differential expression in the control, proneural phenotype. ZNF238 is less active in the mesenchymal phenotype and more active in the proneural phenotype.

Save Image

Via a right-click menu on the bar graph, the user can save an image of the displayed bar graph to

  • the Project as an image snapshot, or
  • directly to a file on disk. Available formats are PNG, JPEG, TIF and BMP.

MRA graph save image.png

Graph View (prior to 2.4.0)

The bar code view in geWorkbench 2.3.0 and some prior versions was similar to that described above but positioned the bars based directly on t-value rather than on the ranking in all markers. The right and left extremes represented the largest negative and positive t-values seen among all results, not just for the depicted TF.

The results below were calculated using the same data as depicted in the above figures for the current graph method.

FOSL2:

MRA graph GBM FOSL2.png


ZNF238:

MRA graph GBM ZNF238.png

Dataset History

Each results node stores the parameter settings used to set up the corresponding MRA run. The specific parameter values can be inspected within the Dataset History component, after clicking on the MRA results node in the Project Folders pane.

Example of running MRA (FET Method)

This example uses a dataset of 176 microarrays described in Phillips et al. (2006). The analysis follows that described in Carro et al. (2010) for master regulators of glioblastoma.


Loading and preparing the example data

Microarray dataset

  1. Load a microarray dataset. (See Local Data Files).
  2. Normalize as desired. In this example, the data was log2 transformed.
  3. When prompted, load the annotation file.

Marker sets

Load marker sets for:

  1. the list of candidate master regulators
  2. the signature genes.

Note on Marker Sets

geWorkbench provides a mechanism to restrict some analyses to using certain sets of markers by "activating" these sets in the Markers component. However, as the MRA analysis component uses named marker sets directly, it does not respect the activation state of marker sets in the Markers component, and such activated sets will have no effect on the analysis.

However, activating marker sets would restrict the markers used in generating the "bar graph" by the MRA viewer.

For this reason, no marker sets should be "activated" (their check-box checked) during MRA analysis.


MRA GBM Marker sets.png

Array sets

Array sets are shown defined for the three phenotypic classes of arrays in the dataset: Mesenchymal (MES), Proneural (PN), and Proliferative (Prolif).

  • MES and PN are "activated" for use in the t-test by checking the boxes next to their names.
  • The MES set is classified as "Case". To set this, right-click on the thumbtack adjacent to the set name.

Array set class assignment MRA.png

Setting up the parameters and starting MRA

In the Project Folders component, right-click on the expression dataset and select "MRA Analysis".

In the "Main" parameters tab,

  • Load Network - load the network, either directly from a file, or choose a network that has been loaded into the Project.
  • P-value - The p-value for the FET may be set as desired.

If the network is loaded from a file, you will see the following dialog.

MRA Load Network Dialog.png

Set the network file format (ADJ or SIF) and type of symbol used in the file to represent the gene nodes (e.g. marker id, gene symbol, Entrez ID).


The figure below shows the Main parameters tab after a network has been loaded from the Project Folders component:


MRA Parameters Main FET Example network from project.png


or from a file


MRA Parameters Main FET Example.png


In the "FET" parameters tab, select the signature and master regulator marker sets, and set the FET Runs and Multiple Testing Correction choices.


MRA Parameters FET Example v2.png


  • Master regulators - select the desired set from those loaded in the Markers component.
  • Signature markers - select the desired set from those loaded in the Markers component.
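  • FET Runs - select One (Enrichment Only) or Two (Enrichment plus mode of activity).
  • Multiple Testing Correction - select No Correction or Standard Bonferroni.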


  • Click on the Analyze button.
  • As previously noted, you may wish to sort the result table by the number of genes in the intersection set rather than by p-value, as this may give a more biologically relevant list.

Results

Upon completion of the analysis, an MRA results node is placed in the Project Folders tree. The analysis results can be browsed using the MRA viewer and are as shown above in the MRA Results Viewer section.

References

  • Basso K, Margolin AA, Stolovitzky G, Klein U, Dalla-Favera R, Califano A (2005). Reverse engineering of regulatory networks in human B cells. Nat Genet 37(4):382-390.
  • Carro MS, Lim WK, Alvarez MJ, Bollo RJ, Zhao X, Snyder EY, Sulman EP, Anne SL, Doetsch F, Colman H, Lasorella A, Aldape K, Califano A, Iavarone A (2010). The transcriptional network for mesenchymal transformation of brain tumors. Nature 463(7279):318-25.
  • Lefebvre C, Rajbhandari P, Alvarez MJ, Bandaru P, Lim WK, Sato M, Wang K, Sumazin P, Kustagi M, Bisikirska BC, Basso K, Beltrao P, Krogan N, Gautier J, Dalla-Favera R, Califano A (2010). A human B-cell interactome identifies MYB and FOXM1 as master regulators of proliferation in germinal centers. Mol Syst Biol 6:377. PMID: 20531406.
  • Lim WK, Lyashenko E, Califano A (2009). Master regulators used as breast cancer metastasis classifier. Pac Symp Biocomput 2009:504-15.
  • Phillips HS, Kharbanda S, Chen R, Forrest WF, Soriano RH, Wu TD, Misra A, Nigro JM, Colman H, Soroceanu L, Williams PM, Modrusan Z, Feuerstein BG, Aldape K (2006). Molecular subclasses of high-grade glioma predict prognosis, delineate a pattern of disease progression, and resemble stages in neurogenesis. Cancer Cell 9(3):157-73.