Difference between revisions of "Tutorials"
Revision as of 17:28, 23 January 2014
The tutorials shown on this page provide a quick introduction to the most important features of geWorkbench. Additional information can be found in the User Guide and in the Online Help section of the program.
Using the basic framework of geWorkbench
The graphical interface, files and data
Quick Start
A quick jump into the most important topics for learning to use geWorkbench.
Basics
An introduction to the use of geWorkbench.
Menu_Bar
Many geWorkbench commands are available in the upper menu bar, as well as in the Workspace.
Component Configuration Manager
Customize geWorkbench to your needs. geWorkbench comes initially configured with only basic components installed. Use the CCM to load additional available modules.
Workspace
The Workspace is where data is loaded and analysis results are stored.
Information Panel
Describes the components used to record details of calculations and datasets.
Local Data Files
Covers loading data from files on your local computer.
File Formats
Details of several different file formats supported by geWorkbench.
caArray
How to download microarray data from caArray. geWorkbench can download "derived" data sets from caArray.
Array Sets
How to create and use sets of arrays for controlling data analysis.
Marker Sets
How to create and use sets of markers for controlling data analysis.
Viewing a Microarray Dataset
Survey of geWorkbench visualization tools for microarray data. Includes:
- Microarray Viewer
- Tabular Microarray Viewer
- CEL file image viewer
- Color Mosaic
- Expression Profiles
- Scatter Plot
Filtering
geWorkbench provides numerous methods for filtering microarray data.
Normalization
geWorkbench provides numerous methods for normalizing microarray data.
Tutorial Data
Downloadable data used in the tutorials.
Individual analysis and visualization components
Analysis Framework
Most analysis routines are found in the command area in the lower right quadrant of geWorkbench. This section describes the common framework these components share for saving parameter settings.
ANOVA
How to set up and run Analysis of Variance.
ARACNE
A formal method for reverse engineering: microarray datasets can be analyzed for interactions between genes. Now includes ARACNe2, which implements the much faster Adaptive Partitioning algorithm and accurate parameter estimation.
BLAST
Submits BLAST jobs to the NCBI server and displays and allows further interaction with alignment results.
Cellular Networks KnowledgeBase (CNKB)
The CNKB component queries a database of protein-protein and protein-DNA interactions maintained at Columbia University.
CeRNA_Query
This component provides query access to a precomputed database of competitive endogenous RNA (ceRNA) interactions, also called "sponge" interactions. These interactions underlie a post-transcriptional layer of regulation, and were predicted using the Hermes algorithm (Sumazin et al., 2011).
Classification
Several classification components have been ported by the GenePattern development team to work with geWorkbench. These include K-nearest neighbors (KNN), Principal Component Analysis (PCA), Support Vector Machines (SVM) and Weighted Voting (WV).
Color Mosaic
Displays expression results as a heat map.
Consensus Clustering
This component allows geWorkbench to run Consensus Clustering on a GenePattern server.
Cytoscape
Cytoscape is used to display network interaction diagrams (from adjacency matrices). It features two-way interaction with the geWorkbench Markers component.
Cupid
Cupid (Sumazin et al. 2011) generates information that can help predict if a gene is a target of a specific miRNA. The Cupid service provides a simple query interface to a database of precalculated Cupid results.
DeMAND
The DeMAND (Drug Mode of Action through Network Dysregulation) algorithm measures dysregulation between the expression of two genes in a network caused by, for example, a drug perturbation. The list of top dysregulated gene pairs can reveal details of a drug's mode of action in the tested cellular system or tissue.
Differential Expression (t-test)
Several variants of the t-test are available.
Expression Value Distribution
View and manipulate a histogram of the distribution of expression values for each array.
Fold Change
Compare the ratio of the expression of genes between two sets of arrays, e.g. case and control sets.
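As a rough illustration of the idea (not geWorkbench's own code), the fold change for a single marker can be sketched as the log2 ratio of the mean expression in the case set to the mean expression in the control set. The function name and the choice of the arithmetic mean are assumptions made for this example.

```python
import math
from statistics import mean


def log2_fold_change(case_values, control_values):
    """Return log2(mean(case) / mean(control)) for one marker.

    Hypothetical helper: the inputs are expression values for one marker
    across the arrays of the case set and of the control set.
    """
    case_mean = mean(case_values)
    control_mean = mean(control_values)
    if case_mean <= 0 or control_mean <= 0:
        # A ratio and its logarithm are only defined for positive means.
        raise ValueError("expression means must be positive to take a ratio")
    return math.log2(case_mean / control_mean)


# Made-up values: the marker is four times higher in the case arrays.
fc = log2_fold_change([8.0, 8.0], [2.0, 2.0])
```

A positive result indicates higher expression in the case set, a negative result higher expression in the control set.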
Gene Ontology Term Analysis
Finds Gene Ontology terms that are over-represented in a list of genes of interest.
Gene Ontology Viewer
The Gene Ontology Viewer provides a standalone GO Term browser and also displays the results of GO Term Analysis. Genes associated with a term can be copied back into a marker set for further analysis.
GenomeSpace
GenomeSpace allows for the transfer of data between a number of different genomics and bioinformatics software analysis platforms, including geWorkbench.
genSpace
GenSpace is a social networking tool that allows patterns of use (putative workflows) of geWorkbench components to be inferred and queried. Participation is entirely optional; if desired, it can be used to identify potential expert users of particular components who may be able to provide advice.
Grid Services
A number of geWorkbench data analysis components have been implemented as services on the National Cancer Institute's caGrid. caGrid is an infrastructure component of the NCI's caBIG(R) program.
GSEA
Implements a front-end for submitting data to, and viewing the results of, a GSEA (Subramanian et al., 2005) analysis on a GenePattern server.
Hierarchical Clustering
geWorkbench implements its own agglomerative hierarchical clustering algorithm.
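The general agglomerative approach can be sketched as follows. This is a generic illustration, not the geWorkbench implementation: the Euclidean distance, the average-linkage rule, and all function names are assumptions chosen for the example.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(c1, c2, points):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(euclidean(points[i], points[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def agglomerate(points):
    """Repeatedly merge the two closest clusters; return the merge history."""
    clusters = [(i,) for i in range(len(points))]  # start with singletons
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = average_linkage(clusters[a], clusters[b], points)
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return merges


# Made-up profiles forming two obvious groups.
profiles = [(1.0, 1.0), (1.0, 2.0), (9.0, 9.0), (9.0, 10.0)]
history = agglomerate(profiles)
```

Each entry in the merge history records the two clusters joined and the distance at which they were joined, which is the information a dendrogram displays.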
Jmol
Jmol is a molecular structure viewer for viewing PDB format files.
K-Means_Clustering
Provides an interface for running K-Means Clustering on a GenePattern server, and a viewer for the results.
LINCS_Query
This component provides for query and display of data generated by the Columbia LINCS Technology U01 and Computation U01 Centers. It provides experimental and computational results for drug mode of action and similarity calculations, and for synergy experiments.
Marker Annotations
Marker annotations can be retrieved, including BioCarta pathway diagrams.
MarkUs
The MarkUs component assists in the assessment of the biochemical function for a given protein structure. The component in geWorkbench provides an interface to the MarkUs web server at Columbia. MarkUs identifies related protein structures and sequences, detects protein cavities, and calculates the surface electrostatic potentials and amino acid conservation profile.
Master Regulator Analysis
The Master Regulator Analysis (MRA) component attempts to identify transcription factors which control the regulation of a set of differentially expressed target genes (TGs). Differential expression is determined using a t-test on microarray gene expression profiles from 2 cellular phenotypes, e.g. experimental and control.
MRA-FET
Master Regulator Analysis using Fisher's Exact Test.
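A minimal sketch of a one-sided Fisher's Exact Test is shown here, assuming the question is whether the targets of one transcription factor overlap the differentially expressed gene list more than chance would predict. How geWorkbench actually builds its contingency table is not described on this page, so the function name and parameters are hypothetical.

```python
from math import comb


def fisher_exact_greater(overlap, regulon_size, signature_size, total_genes):
    """One-sided p-value P(X >= overlap) under the hypergeometric null.

    Hypothetical parameters:
      overlap        - targets of the TF that are also differentially expressed
      regulon_size   - number of targets of the TF
      signature_size - number of differentially expressed genes
      total_genes    - number of genes in the background
    """
    max_overlap = min(regulon_size, signature_size)
    denom = comb(total_genes, signature_size)
    p = 0.0
    for k in range(overlap, max_overlap + 1):
        # Ways to draw k regulon genes and the remainder from outside the regulon.
        ways = comb(regulon_size, k) * comb(total_genes - regulon_size, signature_size - k)
        p += ways / denom
    return p


# Made-up counts: all 5 targets of a TF fall within a 5-gene signature out of 10 genes.
p_value = fisher_exact_greater(5, 5, 5, 10)
```

A small p-value suggests that the overlap between the regulon and the signature is unlikely to arise by chance.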
MARINa
Master Regulator Analysis using the MARINa algorithm. GSEA is used to compute enrichment.
MatrixREDUCE
MatrixREDUCE is a tool for inferring the binding specificity and nuclear concentration of transcription factors from microarray data.
MINDy
MINDy identifies modulators of gene regulation using conditional ARACNe calculations.
Pattern Discovery
Upstream sequences can be analyzed for conserved sequence patterns.
Principal Component Analysis (PCA)
Find components of the data responsible for the greatest variance. Provides a front-end to analysis on a GenePattern server, and graphical visualization of the results.
Promoter Analysis
Search a set of sequences against a promoter database.
Pudge
Pudge provides an interface to a protein structure prediction server (Honig lab) which integrates tools used at different stages of the structural prediction process.
SAM
Interface to run the R implementation of Significance Analysis of Microarrays.
Sequence Retriever
Genomic and protein sequences for selected genes can be retrieved for further analysis.
SkyBase
Search the SkyBase database with a sequence of interest to find homology models which meet user-defined alignment coverage and sequence identity constraints. SkyBase is a database that stores the homology models built by SkyLine analysis for:
- structures in the RCSB Protein Data Bank (PDB) with a 60% redundancy cutoff (PDB60)
- structures in the Northeast Structural Genomics Consortium database
SkyLine
SOM
Clustering using Self-Organizing Maps.
SVM
Classification using Support Vector Machines.
Viper_Analysis
The VIPER (Virtual Inference of Protein-activity by Enriched Regulon analysis) [Alvarez et al., manuscript in preparation] component in geWorkbench transforms the expression profile for each sample (column) into a transcription-factor activity profile, representing the relative activity of each TF in each sample.
Volcano_Plot
The Volcano Plot graphically depicts the results of the t-test for differential expression. The log2 fold change for each significant marker is plotted against the -log10 of the P-value.
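The coordinates described above can be sketched as follows. This is only an illustration: the input layout, the marker names, and the 0.05 significance threshold are assumptions, not details taken from geWorkbench.

```python
import math


def volcano_points(results, alpha=0.05):
    """Map (marker, fold_change_ratio, p_value) tuples to plot coordinates.

    Returns (marker, log2 fold change, -log10 p-value) for each marker whose
    p-value is at or below alpha (an assumed significance threshold).
    """
    points = []
    for marker, ratio, p_value in results:
        if p_value <= 0 or p_value > alpha or ratio <= 0:
            # Skip non-significant markers and values that cannot be log-transformed.
            continue
        points.append((marker, math.log2(ratio), -math.log10(p_value)))
    return points


# Made-up t-test results: (marker, case/control ratio, p-value).
results = [("geneA", 4.0, 0.001), ("geneB", 0.5, 0.01), ("geneC", 1.1, 0.4)]
points = volcano_points(results)
```

Markers with large changes and small p-values end up toward the upper left and upper right of the plot, which gives the display its volcano shape.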