Difference between revisions of "ARACNe"
Revision as of 11:14, 13 July 2009
ARACNe
ARACNe (Algorithm for the Reconstruction of Accurate Cellular Networks) is an information-theoretic algorithm used to identify transcriptional interactions between gene products using microarray expression profile data. The resulting network is displayed using the Cytoscape component. ARACNe can be used to predict potential functional associations among genes, or to predict novel functions for uncharacterized genes, by identifying statistical dependencies between genes. The results take the form of a matrix of candidate interactions, also called an adjacency matrix, which can be used for further network visualization and analysis.
ARACNe can perform two separate calculations:
- Mutual Information: The mutual information (MI) of the expression profile(s) of one or more markers is calculated against all other active markers.
- Data Processing Inequality (DPI): The DPI calculation (triangle inequality) is used to remove the weakest interaction (edge) between any three markers. That is, if an MI value is available for each of the three possible pairings of three markers, the weakest interaction of the three will be removed from the output. The intent is to remove indirect interactions. For example, if A->B->C, the interaction A->C will likely be weaker than A->B or B->C and would be removed. A tolerance can be set for this comparison.
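The DPI step described above can be illustrated with a minimal Python sketch. This is not the geWorkbench or ARACNe code: the function name `apply_dpi`, the dictionary layout, and the exact form of the tolerance rule (an edge is dropped when its MI is below `(1 - tolerance)` times the smaller of the other two) are assumptions made for illustration, and the MI values are made-up toy numbers.

```python
from itertools import combinations

def apply_dpi(mi, tolerance=0.0):
    """Return a copy of `mi` without the weakest edge of each complete triplet.

    `mi` maps frozenset({a, b}) -> MI value. Assumed rule (illustrative only):
    the weakest edge is dropped when its MI is below (1 - tolerance) times
    the smaller of the other two MI values in the triplet.
    """
    nodes = sorted({n for edge in mi for n in edge})
    to_remove = set()
    for a, b, c in combinations(nodes, 3):
        edges = [frozenset((a, b)), frozenset((b, c)), frozenset((a, c))]
        if not all(e in mi for e in edges):
            continue  # DPI applies only when all three pairings have an MI value
        weakest = min(edges, key=lambda e: mi[e])
        others = [mi[e] for e in edges if e != weakest]
        if mi[weakest] < min(others) * (1.0 - tolerance):
            to_remove.add(weakest)
    # Removal is deferred so every triplet is judged against the original values.
    return {e: v for e, v in mi.items() if e not in to_remove}

# Toy values for the A->B->C example: A-C is the weak, indirect edge.
mi_values = {
    frozenset(("A", "B")): 0.80,
    frozenset(("B", "C")): 0.70,
    frozenset(("A", "C")): 0.30,
}
pruned = apply_dpi(mi_values, tolerance=0.1)
print(sorted(tuple(sorted(e)) for e in pruned))
```

Under this assumed rule, raising the tolerance makes the removal condition harder to meet, so fewer edges are removed.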
Parameters described below allow one to incorporate a list of putative transcription factors and optimize the run to discover targets that they may regulate.
Run ARACNe
- Load or select a microarray data set, or select an existing adjacency matrix in the project folders area of geWorkbench.
- In the analysis pane (lower right), select ARACNE analysis from the analysis list.
- Populate the parameters used for this analysis method (see below for details).
- Click on Analyze. If successful, the resulting adjacency matrix is added to the Project Folders component as a child of its parent dataset. The Dataset history captures the analysis parameters. The network will be depicted visually in the Cytoscape component.
Example:
- Load the tutorial dataset Bcell-100.exp.
- Select the hub gene option "List" (version 1.6.3) or "From Sets" (version 1.7.0) and enter probe 1973_s_at.
- Leave the other parameters at their default settings.
- Click "Analyze".
Parameters
Two new parameters introduced in geWorkbench version 1.7.0 - Algorithm and Mode
geWorkbench 1.7.0 includes a new release of ARACNe, referred to as ARACNe2. ARACNe2 includes two important new parameters, Algorithm and Mode.
- Algorithm: Two algorithms are offered, Adaptive Partitioning and Fixed Bandwidth.
- Adaptive Partitioning was added with the incorporation of the ARACNe2 code into geWorkbench in version 1.7.0. Adaptive Partitioning is much faster than the Fixed Bandwidth method, and is also considered to produce superior results. Adaptive Partitioning is now the recommended algorithm for all purposes.
- Fixed Bandwidth was the only algorithm offered with geWorkbench 1.6.3 and earlier and is included for compatibility with previous versions.
- Mode:
- Preprocessing - in this mode, runtime parameters are calculated, but no MI calculation is performed. Preprocessing for a given algorithm need only be run once. The results are written to one or two files in the geWorkbench root directory, and are specific both to the dataset used and the algorithm chosen. Each time ARACNe is run in Discovery mode, it will look for the dataset-specific parameter files in its root directory. If the files are not found (Preprocessing has not been run), default parameter values will be used.
- Fixed Bandwidth algorithm - two files are written to the geWorkbench root directory, one containing parameters for calculating the kernel width, and the other containing parameters for calculating a MI threshold from a specified P-value.
- Adaptive Partitioning algorithm - only the parameter file for calculating a MI threshold from a specified P-value is written.
- Discovery - The ARACNe mutual information calculation is run. If dataset-specific parameter files are present, they will be used if needed (based on settings selected for Kernel Width and Threshold).
- Complete - A Preprocessing run will be performed, followed immediately by a Discovery run. The dataset-specific parameter files created during the Preprocessing step will be used if needed (based on settings selected for Kernel Width and Threshold).
When is no preprocessing needed?
The preprocessing step can be time-consuming. If, for example, you are using Adaptive Partitioning and decide you do not need to specify a p-value threshold for accepting edges, you can simply set an MI value as the threshold and proceed directly to Discovery mode. This will, however, make interpreting the results more difficult.
If ARACNe does not find the dataset-specific parameter files it needs, as described above, it will by default use parameters calculated from the B-cell dataset (reference).
Version 1.6.3 and previous:
Version 1.7.0
Parameters in version 1.7.0
- Hub Marker(s): Specifies which gene markers will be treated as "hubs" in the ARACNE mutual information (MI) calculation. The mutual information is calculated for each specified hub marker against all other markers in the submitted dataset.
- "All vs All" - The MI of every pair of markers in the dataset is computed, that is, each is used as a hub.
- "From Sets" - allows a set of markers defined in the Markers component to be chosen from a pulldown menu. Alternatively, the user can type in the names of desired markers directly as a comma separated list.
- "From File" - allows a comma-separated list of markers to be read in from a file by clicking Load Markers.
- Algorithm: Two algorithms are offered, Adaptive Partitioning and Fixed Bandwidth.
- Adaptive Partitioning was added with the incorporation of the ARACNe2 code into geWorkbench in version 1.7.0.
- Fixed Bandwidth was the only algorithm offered with geWorkbench 1.6.3 and earlier and is included in later versions for compatibility.
Adaptive Partitioning is considered to produce superior results compared with the Fixed Bandwidth method, and it is also much faster. Adaptive Partitioning is now the recommended algorithm for all purposes.
- Mode:
- Preprocessing - in this mode, runtime parameters are calculated, but no MI calculation is performed. Preprocessing for a given algorithm need only be run once. Each time ARACNe is run in Discovery mode, it will look for the dataset-specific parameter files in its root directory.
- Fixed Bandwidth algorithm - two files are written to the geWorkbench root directory, one containing parameters for calculating the kernel width, and the other containing parameters for calculating a MI threshold from a specified P-value.
- Adaptive Partitioning algorithm - only the parameter file for calculating a MI threshold from a specified P-value is written.
- Discovery - The ARACNe mutual information calculation is run. If dataset-specific parameter files are present, they will be used if needed (based on settings selected for Kernel Width and Threshold).
- Complete - A Preprocessing run will be performed, followed immediately by a Discovery run. The dataset-specific parameter files created during the Preprocessing step will be used if needed (based on settings selected for Kernel Width and Threshold).
Parameters in version 1.6.3 and previous:
- Hub Marker(s): Specifies which gene markers will be treated as "hubs" in the ARACNE mutual information (MI) calculation. The mutual information is calculated for each specified hub marker against all other markers in the submitted dataset.
- "All vs All" - The MI of every pair of markers in the dataset is computed, that is, each is used as a hub.
- "List" - the hub marker(s) are taken from a user-entered list. A comma-separated list of marker IDs can be typed directly into the component, or the list can be loaded from a CSV file by clicking Load.
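To make the hub-marker idea concrete, the following Python sketch computes MI for each hub against every other marker. It deliberately swaps in a simple equal-width binning estimator, not the Fixed Bandwidth or Adaptive Partitioning estimators that ARACNe uses. The function names, the probe IDs, and the expression values are hypothetical toy inputs.

```python
import math
from collections import Counter

def parse_marker_list(text):
    """Split a comma-separated list of marker IDs, ignoring blanks."""
    return [m.strip() for m in text.split(",") if m.strip()]

def binned_mi(x, y, bins=3):
    """Plug-in MI estimate (in nats) from equal-width binning of two profiles.

    This is a stand-in estimator for illustration, not ARACNe's own.
    """
    def discretize(values):
        lo, hi = min(values), max(values)
        width = (hi - lo) / bins or 1.0  # avoid zero width for constant profiles
        return [min(int((v - lo) / width), bins - 1) for v in values]

    bx, by = discretize(x), discretize(y)
    n = len(x)
    px, py, pxy = Counter(bx), Counter(by), Counter(zip(bx, by))
    # sum over occupied cells of p(i,j) * log(p(i,j) / (p(i) * p(j)))
    return sum((c / n) * math.log(c * n / (px[i] * py[j]))
               for (i, j), c in pxy.items())

def hub_mi(profiles, hubs):
    """MI of each hub marker against every other marker in `profiles`."""
    return {(h, m): binned_mi(profiles[h], profiles[m])
            for h in hubs for m in profiles if m != h}

# Hypothetical toy expression profiles (one value per array).
profiles = {
    "probe_1": [1, 2, 3, 4, 5, 6],
    "probe_2": [2, 4, 6, 8, 10, 12],  # fully determined by probe_1
    "probe_3": [5, 1, 4, 2, 6, 3],
}
hubs = parse_marker_list("probe_1, ")
scores = hub_mi(profiles, hubs)
for pair, value in sorted(scores.items()):
    print(pair, round(value, 3))
```

With "All vs All", every marker would be passed as a hub, so each pair is scored.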
The remaining parameters are the same in all versions (except as noted).
- Threshold Type: This drop-down specifies the type of threshold to be used and can take the values “Mutual Info” or “P-value”. The actual value entered into the adjacent text area is always a number between 0 and 1.
- Kernel width: The Kernel width is a scaling parameter used for fitting a Gaussian function to the data when running the FIXED_BANDWIDTH algorithm only, otherwise this field is disabled. If used, the value can be either inferred or specified directly.
- Inferred: If PREPROCESSING has been run on the dataset (mode is set to PREPROCESSING or COMPLETE), the kernel width is calculated directly and will be used if "inferred" is selected. If PREPROCESSING has not been run, the kernel width is inferred based on parameters fitted to a large B-cell dataset (Margolin et al, 2006), extrapolated for the number of samples in the dataset being tested.
- Specify: The user can enter a value for the kernel width directly, e.g. based on a prior calculation with this dataset.
- DPI Tolerance - The Data Processing Inequality (triangle inequality) can be used to remove the effects of indirect interactions, e.g. if TF1->TF2->Target, DPI can be used to remove the indirect action of TF1 on the target. Stated another way, the DPI can be used to remove the weakest interaction of those between any three markers. The DPI tolerance specifies the degree of sampling error to be accepted, as with a finite sample size an exact MI value cannot be calculated. The higher the tolerance specified, the fewer the edges that will be removed.
- If the “Do Not Apply” option is specified, no DPI is applied.
- If the “Apply” option is selected then the DPI is applied and the value (between 0 and 1) in the associated text box is used to determine the stringency level of the DPI application.
- DPI Target List - The DPI target list can be used to limit the ARACNE calculation to transcriptional networks. It is used to screen out spurious regulatory interaction signals of genes that are tightly coexpressed but are not in a regulatory relationship to each other, for example genes for two proteins that are in a physical complex and hence always produced in the same amounts. A comma-separated list can be typed in, or it can be loaded from an external file. If used, the DPI Target List should contain all markers that are annotated as transcription factors. Signaling proteins could also be included.
- Details: If the box is checked, the user selects and loads a file which specifies markers (which should be a list of one or more presumptive transcription factors) which will be given preferential treatment during the DPI edge-removal step. Edges originating from markers on this list will not be removed by edges originating from markers not on this list. However, for DPI calculations where all three markers are members of the list, the weakest connecting edge may still be removed.
- All Markers: checking this box overrides any activated marker sets in the Markers component, so that all markers are included in the pairwise MI calculations. If the box is not checked and a marker set is activated, only the markers in that set will be used. If no marker sets are activated, all markers will be used.
- All Arrays: checking this box overrides any activated array sets in the Arrays/Phenotypes component, so that all arrays are included in the MI calculations. If the box is not checked and an array set is activated, only those arrays will be used. If no array sets are activated, all arrays will be used.
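The selection logic of the All Markers and All Arrays checkboxes can be summarized in a short Python sketch. The function `select_items` and its arguments are hypothetical names used only for illustration. The sketch also assumes that when several sets are activated their union is used, which the text above does not state.

```python
def select_items(all_items, activated_sets, use_all):
    """Return the markers (or arrays) that enter the MI calculation.

    `use_all` mirrors the checkbox; `activated_sets` is a list of sets.
    Taking the union of several activated sets is an assumption.
    """
    if use_all or not activated_sets:
        return list(all_items)  # checkbox ticked, or nothing activated
    active = set().union(*activated_sets)
    return [item for item in all_items if item in active]

markers = ["m1", "m2", "m3"]
print(select_items(markers, [{"m1"}, {"m3"}], use_all=False))
print(select_items(markers, [{"m1"}], use_all=True))
print(select_items(markers, [], use_all=False))
```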
Bootstrapping
Bootstrap analysis can be used to generate a more reliable estimate of statistical significance for the interactions. Please see Margolin et al. 2006, Nature Protocols, Vol 1, No. 2, pg. 663-672 for further details (full reference below). Briefly, repeated runs of ARACNE are made, with arrays drawn at random from the full dataset with replacement. The same number of arrays is drawn each time as is present in the original dataset. A permutation test is then used to obtain a null distribution, against which the statistical significance of support for each network edge connection (interaction) can be measured.
- Bootstrap number: Specifies the number of bootstrapping runs to perform.
- Consensus threshold (for bootstrapping only): After the bootstrapping runs are made, a permutation test is used to estimate the significance of interactions. The consensus threshold sets the cutoff point for calling the interactions significant and returning them in the final adjacency matrix.
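The resampling part of the bootstrap procedure can be sketched in Python as follows. The sketch only draws arrays with replacement and counts how often each edge is recovered. It does not reproduce the permutation test used to build the consensus network. The names `bootstrap_edge_support` and `toy_infer` are hypothetical, and `toy_infer` is a placeholder for an actual ARACNe run.

```python
import random
from collections import Counter

def bootstrap_edge_support(n_arrays, infer_edges, n_bootstraps=100, seed=0):
    """Count how often each edge is recovered across bootstrap runs.

    `infer_edges` is a hypothetical callable that takes a list of array
    indices and returns the edges inferred from those arrays.
    """
    rng = random.Random(seed)
    support = Counter()
    for _ in range(n_bootstraps):
        # Draw as many arrays as the original dataset has, with replacement.
        sample = [rng.randrange(n_arrays) for _ in range(n_arrays)]
        support.update(set(infer_edges(sample)))
    return support

def toy_infer(sample):
    """Stand-in for an ARACNe run; NOT a real network inference."""
    edges = [("g1", "g2")]  # always recovered
    if 0 in sample:         # recovered only when array 0 is drawn
        edges.append(("g1", "g3"))
    return edges

support = bootstrap_edge_support(n_arrays=5, infer_edges=toy_infer, n_bootstraps=50)
for edge, count in sorted(support.items()):
    print(edge, count / 50)
```

Edges that depend on a few arrays appear in only a fraction of the runs, which is the kind of instability the consensus threshold is meant to filter out.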
View ARACNe
ARACNe produces an adjacency matrix as output. This contains the MI score for each pair of expression profiles compared. The Cytoscape component can be used to visualize and further manipulate the inferred connectivity data, including selecting sets of interesting genes and returning them to the Markers component as a new set. A description of Cytoscape is located in the Cytoscape Tutorial and Help.
Selecting markers back into the Markers component in Cytoscape:
Using the mouse, left click and draw a box around markers of interest in the Cytoscape display. The enclosed markers will be highlighted in yellow.
The highlighted markers will also be returned to the Markers component as follows:
1. By default, the highlighted markers are returned to the default "Selection" set.
2. Alternatively, any existing set can be "tagged" by right-clicking on it and selecting "Tag for visualization". Any markers highlighted in Cytoscape will now be returned to the "tagged" set.
References
Margolin, A., Wang, K., Lim, W.K., Kustagi, M., Nemenman, I., and Califano, A. Reverse Engineering Cellular Networks. Nature Protocols (2006), Vol 1, No. 2, pgs. 663-672