Difference between revisions of "Master Regulator Analysis"

(Parameters and Settings)
Line 1: Line 1:
 
{{TutorialsTopNav}}
 
{{TutorialsTopNav}}
  
 +
=Overview=
 +
Regulatory activity in the context of specific cellular phenotypes can be modeled using interaction networks. These are graphs where nodes represent genes and an edge between two nodes ''A'' and ''B'' means that genes ''A'' and ''B'' are participants in the same regulatory activity. E.g., ''A'' can be a transcription factor for ''B''; or, ''A'' can be an miRNA that silences ''B''. Analysis of such regulatory networks [refs] has shown that they are scale-free: they are dominated by a relatively small number of nodes with a large degree of connectivity. The genes corresponding to those nodes are known as "master regulators" and collectively orchestrate the regulatory program of the underlying cellular phenotype(s).
  
==Overview==
+
The master regulator analysis (MRA) component in geWorkbench combines regulatory information from interaction networks with differential expression analysis. The objective is to place differentially expressed genes within a regulatory context and identify the master regulators responsible for coordinating their regulation, thus highlighting the regulatory apparatus driving phenotypic differentiation. Specifically, given an interaction network ''I'', a (presumed) master regulator gene ''A'', and two sets of microarrays representing two distinct phenotypes, MRA computes the intersection between two sets of genes:
 +
# The neighbors of ''A'' in the interaction network ''I'' (this gene set is called the '''''regulon''''' of ''A'').
 +
# The set of differentially expressed genes in the array data from the two phenotypes of interest.
  
The goal of Master Regulator Analysis (MRA) is to identify transcription factors (TFs) which control the regulation of a set of target genes (TGs) that demonstrate significant differential expression across two cellular phenotypes, e.g. “Case” and “Control” in a microarray dataset.  Differential expression is measured using a simple t-test. Sets of genes putatively controlled by each TF (each TF's regulon) are obtained from an adjacency matrix (interaction network) calculated by [[Tutorial_-_ARACNE | ARACNe]] or another source prior to MRA.
+
Fisher’s exact test is then used to quantify how likely it is to encounter an intersection of the observed size by chance alone. A small p-value is taken to imply that gene ''A'' may play a significant role in mediating the regulatory program that leads to the differential phenotypes.
  
The dataset from which the adjacency matrix is derived would not necessarily be the same one used for the t-test.  An ARACNe run requires a dataset which explores many different expression phenotypes of a particular cell type, whereas a differential expression experiment compares only two classes.
+
=Setting up an MRA run=
  
For each TF, MRA then calculates, using Fisher's Exact test, whether there is greater overlap between the set of the TF's target genes and the set of differentially expressed genes than would be expected by chance.  
+
==Prerequisites==
 +
* First confirm that the MRA component is available in geWorkbench. If not, it can be loaded using the [[Tutorial_-_Component_Configuration_Manager | Component Configuration Manager]].
 +
* The MRA will be listed along with the other analysis routines within the geWorkbench Analysis Panel.
  
The types of data which will be used in the MRA then are:
+
[[Image:MRA_Parameters_panel.png|800px]]
# A microarray dataset appropriate for examining differential gene expression using a t-test.
 
# A list of putative transcription factors which are to be tested against the differentially expressed genes.
 
# An interaction network in the form of an ARACNe adjacency matrix.  It should contain the results of an ARACNe run including, as hub markers, at least all of the transcription factors that will be tested in MRA.
 
  
==MRA Parameters and Settings==
+
* Data from an expression experiment (comprising measurements from multiple arrays/samples) has been loaded and the corresponding node has been selected within the Project Folders:
 +
 
 +
[[Image:Experiment_data_node_(MRA).png]]
 +
 
 +
* Two array sets (each containing a distinct subset of arrays from the expression experiment) have been selected in the "Arrays/Phenotypes" component, representing the two phenotypes under investigation. One of the array sets (identified by the red-color pin) has been classified as containing the "Case" arrays (i.e., one of the 2 phenotypes) while the other one contains the "Controls" (the second phenotype):
 +
 
 +
[[Image:Array_set_class_assignment_(MRA).png]]
 +
 
 +
<u>'''NOTE'''</u>: The "Case"/"Control" assignment is needed because, in general, it is possible to select more than one array set for this analysis. In that case it is necessary to explicitly designate which array sets belong to each of the 2 phenotypes.
 +
 
 +
==Parameters and Settings==
  
 
===Load Network===
 
===Load Network===
 +
There are 2 ways to designate the interaction network that will be used for computing the regulons of the candidate master regulator genes:
 +
* '''From File''': by choosing a file that describes a network.
 +
* '''From Project''': by selecting a node from the project folders which represents an interaction network. Several analytical components in geWorkbench (e.g., [[Tutorial_-_ARACNE | ARACNE]]) produce results nodes that can be utilized for this purpose.
 +
 +
===Master Regulators===
 +
There are 2 ways to designate the candidate master regulator genes to use:
 +
* '''From File''': by choosing a file that contains a list of (comma separated) marker names.
 +
* '''From Sets''': by selecting one among the marker sets within the “Markers” component.
 +
 +
===T-test p-value (alpha)===
 +
Differential expression between the 2 phenotypes of interest is assessed using a t-test. The p-value provided by the user indicates the significance threshold below which a gene’s average expression is presumed to be significantly different in the 2 sets of arrays (cases and controls). Additional parameter settings affecting the execution of the t-test are defined within the “T-test” subtab. The parameters specified there are exactly the same as those used for the differential expression t-test analysis (see description in the corresponding [[Tutorial_-_Differential_Expression | tutorial]]).
 +
 +
=Working with and viewing the analysis results=
 +
Following the successful completion of the MRA computation, a result node appears in the Project Folder area of the geWorkbench interface, under the microarray experiment node used for the computation:
 +
 +
[[Image:MRA_resulst_node.png]]
 +
 +
The results of the analysis can be visualized in the MRA Viewer component by selecting the result node.
 +
 +
==MRA Results Viewer==
 +
The MRA viewer is structured in 3 distinct areas.
 +
 +
[[Image:MRA_viewer_full.png]]
 +
 +
===Summary Listing===
 +
This is a table with one row for each candidate master regulator.
 +
 +
[[Image:MRA_Summary_listing.png]]
 +
 +
Each row contains 4 columns:
 +
# '''Master Regulator''': This is either the master regulator gene name or the marker/probeset name identifying the corresponding array feature (depending on the selection of the radio buttons “Symbol” and “Probe set”).
 +
# '''P-value''': the p-value of Fisher’s exact test. The test utilizes a 2x2 contingency table where rows classify markers as differentially expressed versus non-differentially expressed, while columns indicate if a marker belongs to the regulon of the master regulator or not. Counts are computed using all markers found in the input experiment data.
 +
# '''Genes in regulon''': The number of genes in the regulon of the master regulator (this comprises the set of genes that are first neighbors of the master regulator in the interaction network specified in the parameters panel).
 +
# '''Genes in target list''': The number of differentially expressed genes that are also members of the regulon.
 +
 +
The contents of the table can be ordered across any column, by clicking on the column name.
 +
 +
===Detailed Listing===
 +
By clicking on the radio button associated with a master regulator in the Summary table, it is possible to display the complete listing of the differentially expressed genes intersecting the regulon of that master regulator.
  
The network consists of an adjacency matrix generated by ARACNe.
+
[[Image:MRA_Detailed_listing.png]]
  
* '''From File''' - load an adjacency matrix generated by an external run of ARACNe.  
+
The genes are displayed in a table with the following columns:
* '''From Project''' - load an ARACNe adjacency matrix from a result node in the Project Folders component.
+
# '''Genes in target list''': the names of the genes in the intersection set. Either the gene name or the marker/probe set name is used (based on the choice of "Symbol" or "Probe Set" radio buttons).
 +
# '''P-value''': the p-value of the t-test statistic for this marker, computed over the 2 sets of arrays (cases versus controls).  
 +
# '''T-test value''': The actual value of the t-test statistic for the gene. A positive value indicates that the mean expression of the gene is higher in cases than in controls (a negative value has the opposite meaning).
  
===Transcription Factors===
+
The “P-val Threshold” parameter (located above the table) can be used to limit the number of target genes displayed. Specifically, only genes with a p-value less than the specified cutoff will be shown (if “P-val Threshold” is left empty, then all target genes in the intersection set are displayed). Additional functionality is made available through the following buttons:
  
* '''From File''' - Load a comma-separated list of transcription factors from a file.
+
* '''Add to Set''': creates a set containing the markers that correspond to the genes currently displayed in the table. The new marker set appears in the Markers component and is named after the master regulator.
* '''From Sets''' - Use a set defined in the Markers component as the list of transcription factors.
+
* '''Export selected''': same as “Add to Set” but instead of being added to a marker set, the markers are stored into a file.
 +
* '''Export all''': stores into a file information for all master regulators (instead of only the one currently selected in the “Summary” view). Specifically, for each master regulator, the file lists the markers for all genes that are both differentially expressed and members of the master regulator’s regulon.
  
===Significance Threshold===
+
===Graph View===
* '''T-test p-value (alpha)''' - The cutoff p-value by which to establish whether a particular marker shows a significant difference in expression between the two groups. (Note that multiple testing corrections are offered on the t-test parameters tab).
+
For a given master regulator ''A'' and the intersection between its regulon and the set of differentially expressed genes, the graph view helps assess if the intersection genes are preferentially over-expressed in the cases versus the controls. The biological motivation comes from observing [ref] that regulators with multiple targets tend to affect the expression level of (most of) their targets in one particular direction: they either promote their expression or inhibit it; but they rarely do both equally.  
  
 +
[[Image:MRA_graph_view.png|800px]]
  
 +
The red-blue gradient at the bottom of the graph represents the range between the highest (red) and the lowest (blue) t-test statistic recorded among all differentially expressed genes. The vertical bars correspond to the genes displayed in the table under the “Detailed listing” portion of the interface, i.e., the intersection between the differentially expressed genes and the regulon of the master regulator ''A'' currently selected within the “Summary listing” table. The relative location of a bar on the gradient represents the t-test statistic recorded for the corresponding gene. Further, the color of each bar provides information about the correlation between the expression levels of the target gene and the master regulator A (correlations are computed as Pearson’s r, using data from all microarrays in the experiment): black means that the two genes are positively correlated (r > 0) while orange means that correlation is negative (r < 0).
  
[[Image:T_MRA_Setup.png]]
+
=Dataset History=
 +
Each results node stores the parameter settings used to set up the corresponding MRA run. The specific parameter values can be inspected within the Dataset History component, after clicking on the MRA results node in the Project Folders pane.
  
 +
=Example of running MRA=
 +
This example uses the Bcell-100.exp dataset available in the data/public_data directory of geWorkbench, and further described on the [[Download]] page.  Briefly, this dataset is composed of 100 Affymetrix HG-U95Av2 arrays on which various B-cell lines, both normal and cancerous, were analyzed.  Thus it explores a potentially wide variety of expression phenotypes.
  
 +
==Prerequisites==
 +
# Obtain the annotation file for the HG-U95Av2 array type from the Affymetrix NetAffx website (http://www.affymetrix.com/support/technical/byproduct.affx?product=hgu95). The name will be similar to "HG_U95Av2.na29.annot.csv", where na29 is the version number. Loading the annotation file associates gene names and other information with the Affymetrix probeset IDs (see the geWorkbench FAQ for details on obtaining these files).
 +
# Save the following 2 files to disk:
 +
## [[Media:Interaction_network.txt | Interaction_network.txt]]: describes an interaction network with 1955 nodes and 3810 edges. The file has 1955 lines (one for each node, the node name is the first entry in every line) and each line lists the edges emanating from that node. Edges are described as tab-delimited pairs where the first member of the pair is the name of the target node and the second member is a number specifying a weight for the edge (MRA does not use the weight information but other geWorkbench components do). Node names correspond to marker ids.
 +
## [[Media:Master_regulators.csv | Master_regulators.csv]]: a list of master regulators. The marker ids in this file correspond to genes whose [http://www.geneontology.org/ Gene Ontology] annotation (under the Molecular Function category) lists them as transcription factors.
  
==t-test Parameters and Settings==
+
==Loading and preparing the example data==
 +
# Load the Bcell-100.exp dataset into geWorkbench as type "Affymetrix File Matrix".  (See [[Tutorial_-_Local_Data_Files | Local Data Files]]).
 +
# When prompted, load the annotation file.
  
The parameter settings available for the MRA t-test are shown in the figure below. These parameters are the same as those described in the [[Tutorial_-_Differential_Expression | t-test component tutorial]].
+
==Choosing array groups==
 +
The Bcell-100 dataset comes with predefined sets of arrays.   
  
===P-values based on===
+
* In the Arrays/Phenotypes component (at lower left in the geWorkbench GUI), choose the group in the pulldown menu called "Class".
* t-distribution - directly calculate the p-value
+
* Check the box beside the 2 array sets titled "non-GC B-Cell" and "non-GC Tumor", as shown in the figure below.
* Permutation - determine the p-value empirically through repeated trials against permuted data sets.
 
** Randomly group experiments  - #-times - how many permutations to carry out
 
** All permutations
 
  
 +
[[Image:MRA_array_set_activation.png]]
  
===Correction method===
+
* Right-click on the array set titled "non-GC Tumor" and from the resulting popup menu choose "Classification->Case" to indicate that the arrays in that set will be treated as the "Case" phenotype for the t-test:
* Just alpha (no correction)
 
* Standard Bonferroni - divide given p-value threshold by number of markers tested.
 
* Adjusted Bonferroni - same as Standard Bonferroni, except the divisor for each successive marker tested is decreased by one.
 
  
 +
[[Image:Array_set_class_assignment_(MRA).png]]
  
===Step-down Westfall and Young methods===
+
==Setting up the parameters and starting MRA==
(only if permutation is selected for p-value calculation)
+
In the Analysis Panel, select the "MRA Analysis" entry and set the parameters as follows:
* minP
+
* '''Load Network''': from the drop down choose the option "From File", click on the "Load" button, and select to open the file "Interaction_network.txt".
* maxT
+
* '''Master Regulators''': from the drop down choose the option "From File", click on the "Load" button, and select to open the file "Master_regulators.csv".
 +
* '''T-test p-value (alpha)''': set the significance threshold to 0.01.
  
===Group Variances===
+
[[Image:MRA_params_setting_1.png | 700px]]
Choose whether the variances in the two groups being compared are expected to be equal or not.
 
* Unequal (Welch approximation)
 
* Equal
 
  
 +
* Click on the "T-test" tab and under '''Correction Method''' select the option titled "Standard Bonferroni", to indicate that the p-value of the t-test statistic for each gene should be family-wise corrected before being compared to the p-value threshold (alpha) specified above.
  
[[Image:T_MRA_t-test.png]]
+
[[Image:MRA_params_setting_2.png | 700px]]
  
==Multiple testing considerations==
+
* Click on the '''Analyze''' button.
* '''t-test''' - The t-test for differential expression is run on each marker in turn, so that potentially thousands of tests may be performed.  The t-test tab within MRA offers simple multiple testing corrections such as the Bonferroni correction.  
 
  
* '''Fisher's Exact Test''' - Note that Fisher's Exact test is run for each transcription factor and a p-value reported. No correction is supplied for this occurrence of multiple testing.
+
==Results==
 +
Upon completion of the analysis, an MRA results node is placed in the Project Folders tree. The analysis results can be browsed using the MRA viewer.
  
==Running MRA==
+
=References=
# Select or load an adjacency matrix from an ARACNe run or other source.
 
# Select or load a list of transcription factors.
 
# Define two classes of arrays, e.g. case and control in the Arrays/Phenotypes component.
 
# Set the significance threshold and t-test parameters as desired.
 
# Press the '''Analyze''' button.  The t-test followed by the Fisher's Exact tests will be carried out.
 
# A table and a graphic will be displayed, showing the transcription factors whose regulons overlap significantly with the set of differentially expressed genes.
 

Revision as of 10:27, 23 October 2009



Overview

Regulatory activity in the context of specific cellular phenotypes can be modeled using interaction networks. These are graphs where nodes represent genes and an edge between two nodes A and B means that genes A and B are participants in the same regulatory activity. E.g., A can be a transcription factor for B; or, A can be an miRNA that silences B. Analysis of such regulatory networks [refs] has shown that they are scale-free: they are dominated by a relatively small number of nodes with a large degree of connectivity. The genes corresponding to those nodes are known as "master regulators" and collectively orchestrate the regulatory program of the underlying cellular phenotype(s).

The master regulator analysis (MRA) component in geWorkbench combines regulatory information from interaction networks with differential expression analysis. The objective is to place differentially expressed genes within a regulatory context and identify the master regulators responsible for coordinating their regulation, thus highlighting the regulatory apparatus driving phenotypic differentiation. Specifically, given an interaction network I, a (presumed) master regulator gene A, and two sets of microarrays representing two distinct phenotypes, MRA computes the intersection between two sets of genes:

  1. The neighbors of A in the interaction network I (this gene set is called the regulon of A).
  2. The set of differentially expressed genes in the array data from the two phenotypes of interest.

Fisher’s exact test is then used to quantify how likely it is to encounter an intersection of the observed size by chance alone. A small p-value is taken to imply that gene A may play a significant role in mediating the regulatory program that leads to the differential phenotypes.
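The two gene sets described above can be illustrated with a minimal Python sketch. It is not geWorkbench code: the function name `regulon`, the toy edge list, and the gene identifiers are hypothetical, and the network is assumed to be undirected, so that any gene sharing an edge with the regulator counts as a neighbor.

```python
def regulon(edges, regulator):
    """Return the first neighbors of `regulator` in an edge list.

    Assumption: edges are treated as undirected pairs of gene names.
    """
    neighbors = set()
    for a, b in edges:
        if a == regulator:
            neighbors.add(b)
        elif b == regulator:
            neighbors.add(a)
    return neighbors


# Hypothetical toy network and differential-expression result.
edges = [("TF1", "g1"), ("TF1", "g2"), ("g3", "TF1"), ("TF2", "g2"), ("TF2", "g4")]
differentially_expressed = {"g1", "g3", "g4"}

tf1_regulon = regulon(edges, "TF1")
overlap = tf1_regulon & differentially_expressed

print(sorted(tf1_regulon))  # ['g1', 'g2', 'g3']
print(sorted(overlap))      # ['g1', 'g3']
```

The size of `overlap` is the quantity whose significance is then assessed with Fisher's exact test.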

Setting up an MRA run

Prerequisites

  • First confirm that the MRA component is available in geWorkbench. If not, it can be loaded using the Component Configuration Manager.
  • The MRA will be listed along with the other analysis routines within the geWorkbench Analysis Panel.

MRA Parameters panel.png

  • Data from an expression experiment (comprising measurements from multiple arrays/samples) has been loaded and the corresponding node has been selected within the Project Folders:

Experiment data node (MRA).png

  • Two array sets (each containing a distinct subset of arrays from the expression experiment) have been selected in the "Arrays/Phenotypes" component, representing the two phenotypes under investigation. One of the array sets (identified by the red-color pin) has been classified as containing the "Case" arrays (i.e., one of the 2 phenotypes) while the other one contains the "Controls" (the second phenotype):

Array set class assignment (MRA).png

NOTE: The "Case"/"Control" assignment is needed because, in general, it is possible to select more than one array set for this analysis. In that case it is necessary to explicitly designate which array sets belong to each of the 2 phenotypes.

Parameters and Settings

Load Network

There are 2 ways to designate the interaction network that will be used for computing the regulons of the candidate master regulator genes:

  • From File: by choosing a file that describes a network.
  • From Project: by selecting a node from the project folders which represents an interaction network. Several analytical components in geWorkbench (e.g., ARACNE) produce results nodes that can be utilized for this purpose.

Master Regulators

There are 2 ways to designate the candidate master regulator genes to use:

  • From File: by choosing a file that contains a list of (comma separated) marker names.
  • From Sets: by selecting one among the marker sets within the “Markers” component.

T-test p-value (alpha)

Differential expression between the 2 phenotypes of interest is assessed using a t-test. The p-value provided by the user indicates the significance threshold below which a gene’s average expression is presumed to be significantly different in the 2 sets of arrays (cases and controls). Additional parameter settings affecting the execution of the t-test are defined within the “T-test” subtab. The parameters specified there are exactly the same as those used for the differential expression t-test analysis (see description in the corresponding tutorial).

Working with and viewing the analysis results

Following the successful completion of the MRA computation, a result node appears in the Project Folder area of the geWorkbench interface, under the microarray experiment node used for the computation:

MRA resulst node.png

The results of the analysis can be visualized in the MRA Viewer component by selecting the result node.

MRA Results Viewer

The MRA viewer is structured in 3 distinct areas.

MRA viewer full.png

Summary Listing

This is a table with one row for each candidate master regulator.

MRA Summary listing.png

Each row contains 4 columns:

  1. Master Regulator: This is either the master regulator gene name or the marker/probeset name identifying the corresponding array feature (depending on the selection of the radio buttons “Symbol” and “Probe set”).
  2. P-value: the p-value of Fisher’s exact test. The test utilizes a 2x2 contingency table where rows classify markers as differentially expressed versus non-differentially expressed, while columns indicate if a marker belongs to the regulon of the master regulator or not. Counts are computed using all markers found in the input experiment data.
  3. Genes in regulon: The number of genes in the regulon of the master regulator (this comprises the set of genes that are first neighbors of the master regulator in the interaction network specified in the parameters panel).
  4. Genes in target list: The number of differentially expressed genes that are also members of the regulon.

The contents of the table can be ordered across any column, by clicking on the column name.
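The 2x2 contingency table behind the P-value column can be sketched as follows. This is an illustration only: the function names and the toy marker names are hypothetical, and the test is assumed to be one-sided (over-representation of differentially expressed markers in the regulon), which this page does not state explicitly.

```python
from math import comb


def contingency_table(all_markers, regulon, de_markers):
    """Rows: differentially expressed / not; columns: in regulon / not."""
    universe = set(all_markers)
    reg = set(regulon) & universe
    de = set(de_markers) & universe
    a = len(de & reg)                      # DE and in regulon
    b = len(de - reg)                      # DE, not in regulon
    c = len(reg - de)                      # not DE, in regulon
    d = len(universe) - a - b - c          # not DE, not in regulon
    return (a, b), (c, d)


def fisher_one_sided_p(table):
    """P(overlap >= observed) under the hypergeometric null (assumed one-sided)."""
    (a, b), (c, d) = table
    n = a + b + c + d                      # all markers in the experiment
    k = a + c                              # regulon size
    m = a + b                              # number of DE markers
    tail = sum(comb(k, i) * comb(n - k, m - i) for i in range(a, min(k, m) + 1))
    return tail / comb(n, m)


# Hypothetical toy data: 10 markers, a regulon of 4, 3 DE markers, all in the regulon.
markers = [f"g{i}" for i in range(1, 11)]
table = contingency_table(markers, {"g1", "g2", "g3", "g4"}, {"g1", "g2", "g3"})
p = fisher_one_sided_p(table)

print(table)        # ((3, 0), (1, 6))
print(round(p, 4))  # 0.0333
```

In this sketch the top-left cell of the table corresponds to the "Genes in target list" column, and the sum of the first column corresponds to "Genes in regulon".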

Detailed Listing

By clicking on the radio button associated with a master regulator in the Summary table, it is possible to display the complete listing of the differentially expressed genes intersecting the regulon of that master regulator.

MRA Detailed listing.png

The genes are displayed in a table with the following columns:

  1. Genes in target list: the names of the genes in the intersection set. Either the gene name or the marker/probe set name is used (based on the choice of "Symbol" or "Probe Set" radio buttons).
  2. P-value: the p-value of the t-test statistic for this marker, computed over the 2 sets of arrays (cases versus controls).
  3. T-test value: The actual value of the t-test statistic for the gene. A positive value indicates that the mean expression of the gene is higher in cases than in controls (a negative value has the opposite meaning).
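The sign convention of the T-test value column can be illustrated with a short sketch. It assumes the unequal-variance (Welch) form of the statistic and uses hypothetical expression values; the actual statistic depends on the options chosen in the T-test subtab, and the p-value computation is omitted here.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(case_values, control_values):
    """Welch t statistic; positive when mean(case) > mean(control)."""
    standard_error = sqrt(
        variance(case_values) / len(case_values)
        + variance(control_values) / len(control_values)
    )
    return (mean(case_values) - mean(control_values)) / standard_error


# Hypothetical expression values for one marker.
cases = [8.0, 9.0, 10.0]
controls = [5.0, 6.0, 7.0]

t = welch_t(cases, controls)
print(round(t, 3))  # 3.674 (positive: higher in cases than in controls)
```

Swapping the two groups flips the sign of the statistic, which is why the Case/Control designation matters for interpreting this column.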

The “P-val Threshold” parameter (located above the table) can be used to limit the number of target genes displayed. Specifically, only genes with a p-value less than the specified cutoff will be shown (if “P-val Threshold” is left empty, then all target genes in the intersection set are displayed). Additional functionality is made available through the following buttons:

  • Add to Set: creates a set containing the markers that correspond to the genes currently displayed in the table. The new marker set appears in the Markers component and is named after the master regulator.
  • Export selected: same as “Add to Set” but instead of being added to a marker set, the markers are stored into a file.
  • Export all: stores into a file information for all master regulators (instead of only the one currently selected in the “Summary” view). Specifically, for each master regulator, the file lists the markers for all genes that are both differentially expressed and members of the master regulator’s regulon.

Graph View

For a given master regulator A and the intersection between its regulon and the set of differentially expressed genes, the graph view helps assess if the intersection genes are preferentially over-expressed in the cases versus the controls. The biological motivation comes from observing [ref] that regulators with multiple targets tend to affect the expression level of (most of) their targets in one particular direction: they either promote their expression or inhibit it; but they rarely do both equally.

MRA graph view.png

The red-blue gradient at the bottom of the graph represents the range between the highest (red) and the lowest (blue) t-test statistic recorded among all differentially expressed genes. The vertical bars correspond to the genes displayed in the table under the “Detailed listing” portion of the interface, i.e., the intersection between the differentially expressed genes and the regulon of the master regulator A currently selected within the “Summary listing” table. The relative location of a bar on the gradient represents the t-test statistic recorded for the corresponding gene. Further, the color of each bar provides information about the correlation between the expression levels of the target gene and the master regulator A (correlations are computed as Pearson’s r, using data from all microarrays in the experiment): black means that the two genes are positively correlated (r > 0) while orange means that correlation is negative (r < 0).
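The bar coloring rule can be sketched in a few lines of Python. The function names and expression profiles are hypothetical, and the handling of r = 0 is an assumption, since the description above only covers r > 0 and r < 0.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


def bar_color(r):
    """Black for positive correlation, orange for negative."""
    if r > 0:
        return "black"
    if r < 0:
        return "orange"
    return None  # r == 0 is not covered by the description (assumption)


# Hypothetical profiles across all arrays in the experiment.
regulator = [1.0, 2.0, 3.0, 4.0]
target_up = [2.0, 4.0, 6.0, 8.0]
target_down = [8.0, 6.0, 4.0, 2.0]

print(bar_color(pearson_r(regulator, target_up)))    # black
print(bar_color(pearson_r(regulator, target_down)))  # orange
```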

Dataset History

Each results node stores the parameter settings used to set up the corresponding MRA run. The specific parameter values can be inspected within the Dataset History component, after clicking on the MRA results node in the Project Folders pane.

Example of running MRA

This example uses the Bcell-100.exp dataset available in the data/public_data directory of geWorkbench, and further described on the Download page. Briefly, this dataset is composed of 100 Affymetrix HG-U95Av2 arrays on which various B-cell lines, both normal and cancerous, were analyzed. Thus it explores a potentially wide variety of expression phenotypes.

Prerequisites

  1. Obtain the annotation file for the HG-U95Av2 array type from the Affymetrix NetAffx website (http://www.affymetrix.com/support/technical/byproduct.affx?product=hgu95). The name will be similar to "HG_U95Av2.na29.annot.csv", where na29 is the version number. Loading the annotation file associates gene names and other information with the Affymetrix probeset IDs (see the geWorkbench FAQ for details on obtaining these files).
  2. Save the following 2 files to disk:
    1. Interaction_network.txt: describes an interaction network with 1955 nodes and 3810 edges. The file has 1955 lines (one for each node, the node name is the first entry in every line) and each line lists the edges emanating from that node. Edges are described as tab-delimited pairs where the first member of the pair is the name of the target node and the second member is a number specifying a weight for the edge (MRA does not use the weight information but other geWorkbench components do). Node names correspond to marker ids.
    2. Master_regulators.csv: a list of master regulators. The marker ids in this file correspond to genes whose Gene Ontology annotation (under the Molecular Function category) lists them as transcription factors.
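For readers who want to inspect the network file outside geWorkbench, the line format described above could be parsed roughly as follows. This sketch assumes that the node name and all target/weight entries on a line are separated by tabs; the function names and the sample lines are hypothetical and are not taken from Interaction_network.txt.

```python
def parse_network_line(line):
    """Split one line into (node, {target: weight})."""
    fields = line.rstrip("\n").split("\t")
    node, rest = fields[0], fields[1:]
    targets = {}
    # Entries after the node name come in (target, weight) pairs.
    for i in range(0, len(rest) - 1, 2):
        targets[rest[i]] = float(rest[i + 1])
    return node, targets


def load_network(lines):
    """Build {node: {target: weight}} from an iterable of lines."""
    network = {}
    for line in lines:
        if line.strip():  # skip blank lines
            node, targets = parse_network_line(line)
            network[node] = targets
    return network


# Hypothetical sample lines in the assumed format.
sample = [
    "1000_at\t1001_at\t0.25\t1002_s_at\t0.5\n",
    "1001_at\t1000_at\t0.25\n",
]
network = load_network(sample)

print(network["1000_at"])          # {'1001_at': 0.25, '1002_s_at': 0.5}
print(sorted(network["1001_at"]))  # ['1000_at']
```

Since MRA ignores the edge weights, only the keys of each inner dictionary matter for computing a regulon.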

Loading and preparing the example data

  1. Load the Bcell-100.exp dataset into geWorkbench as type "Affymetrix File Matrix". (See Local Data Files).
  2. When prompted, load the annotation file.

Choosing array groups

The Bcell-100 dataset comes with predefined sets of arrays.

  • In the Arrays/Phenotypes component (at lower left in the geWorkbench GUI), choose the group in the pulldown menu called "Class".
  • Check the box beside the 2 array sets titled "non-GC B-Cell" and "non-GC Tumor", as shown in the figure below.

MRA array set activation.png

  • Right-click on the array set titled "non-GC Tumor" and from the resulting popup menu choose "Classification->Case" to indicate that the arrays in that set will be treated as the "Case" phenotype for the t-test:

Array set class assignment (MRA).png

Setting up the parameters and starting MRA

In the Analysis Panel, select the "MRA Analysis" entry and set the parameters as follows:

  • Load Network: from the drop down choose the option "From File", click on the "Load" button, and select to open the file "Interaction_network.txt".
  • Master Regulators: from the drop down choose the option "From File", click on the "Load" button, and select to open the file "Master_regulators.csv".
  • T-test p-value (alpha): set the significance threshold to 0.01.

MRA params setting 1.png

  • Click on the "T-test" tab and under Correction Method select the option titled "Standard Bonferroni", to indicate that the p-value of the t-test statistic for each gene should be family-wise corrected before being compared to the p-value threshold (alpha) specified above.

MRA params setting 2.png

  • Click on the Analyze button.
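The effect of the Standard Bonferroni option chosen above can be sketched as follows: the threshold alpha is divided by the number of markers tested, and a marker is called differentially expressed only if its p-value falls below the corrected threshold. The function name and the p-values are hypothetical.

```python
def bonferroni_significant(p_values, alpha):
    """Return markers whose p-value is below alpha / (number of markers tested)."""
    threshold = alpha / len(p_values)
    return {marker for marker, p in p_values.items() if p < threshold}


# Hypothetical t-test p-values for four markers, with alpha = 0.01 as in this example.
p_values = {"g1": 0.0001, "g2": 0.004, "g3": 0.03, "g4": 0.2}

significant = bonferroni_significant(p_values, alpha=0.01)
print(sorted(significant))  # ['g1'] (corrected threshold is 0.01 / 4 = 0.0025)
```

Without the correction, both g1 and g2 would pass the 0.01 threshold; with thousands of markers on an array the corrected threshold becomes correspondingly stricter.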

Results

Upon completion of the analysis, an MRA results node is placed in the Project Folders tree. The analysis results can be browsed using the MRA viewer.

References