GeWorkbench-web/T-Test

Revision as of 15:08, 13 March 2015 by Smith




Overview

A t-test analysis can be used to identify markers with statistically significant differential expression between two groups of microarrays. In geWorkbench, these groups are specified as the "Case" and "Control" sets.

Setting up a t-test analysis in geWorkbench involves two steps:

  1. At least two sets of arrays must be defined in the Arrays component.
  2. One or more array sets must be selected as "Case", and one or more others as "Control".


After the t-test is run, the markers meeting the significance threshold are displayed graphically in a Volcano Plot and are also placed into a new Marker Set called "Significant Genes".

The t-test result is calculated using the Apache Commons Math Library.

t-Test Parameters

Select Case and Control Arrays

T-Test web params main.png

  • Array Context - If more than one context for array sets has been created, first choose the desired context. Each context can contain its own collection of array sets; contexts are provided as a convenience for organizing array sets.
  • Select Case/Control Sets - In each of the two lists, select at least one array set. No array should be part of both case and control sets; if it is, the analysis will not be performed.
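
The requirement that no array belongs to both groups can be illustrated with a small check. This is a minimal sketch and not geWorkbench code: the function name, the dictionary layout and the array IDs are all hypothetical.

```python
def overlapping_arrays(case_sets, control_sets):
    """Return the arrays that appear in both groups (empty set if the selection is valid).

    case_sets / control_sets: {array set name: set of array IDs} (hypothetical layout).
    """
    case_arrays = set().union(*case_sets.values())
    control_arrays = set().union(*control_sets.values())
    return case_arrays & control_arrays


# Hypothetical selections; the array IDs are made up for illustration.
case_sets = {"GC-Tumor": {"a1", "a2", "a3"}}
control_sets = {"GC B-cell": {"a4", "a5"}}

overlap = overlapping_arrays(case_sets, control_sets)
if overlap:
    print("Analysis would not be performed; shared arrays:", sorted(overlap))
else:
    print("Case and control selections are disjoint.")
```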


P-value Parameters

T-Test web params pvalue.png

Select Calculation Method

The p-values can be calculated by transforming the t-statistic directly, or by carrying out a permutation analysis. The permutation analysis measures how often a t-statistic at least as large as that observed occurs by chance after array labels of case and control are permuted.

  • t-distribution (the default)
  • Permutation - If chosen, the number of permutations to carry out must also be specified.


T-Test web params pvalue permutation.png

If permutation is chosen, the relevant controls become enabled:

  • Randomly group experiments - Enter the integer number of random permutations to carry out in the adjoining text field (#times).
  • Use all Permutations - All possible permutations will be carried out.
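
The permutation calculation described above can be sketched for a single marker as follows. This is a generic illustration and not the geWorkbench implementation: the function names are hypothetical, and comparing absolute t-statistics (a two-sided test) is an assumption. Passing `n_random=None` corresponds to using all permutations, while an integer corresponds to randomly grouping the experiments that many times.

```python
import itertools
import math
import random
import statistics


def welch_t(case, control):
    """Unequal-variance t-statistic for two samples."""
    se2 = (statistics.variance(case) / len(case)
           + statistics.variance(control) / len(control))
    return (statistics.mean(case) - statistics.mean(control)) / math.sqrt(se2)


def permutation_p_value(case, control, n_random=None, seed=0):
    """Fraction of relabellings whose |t| is at least the observed |t|.

    n_random=None enumerates every way of choosing the case group;
    otherwise n_random random relabellings are drawn.
    """
    pooled = list(case) + list(control)
    n_case = len(case)
    observed = abs(welch_t(case, control))
    indices = range(len(pooled))
    if n_random is None:
        case_choices = itertools.combinations(indices, n_case)
    else:
        rng = random.Random(seed)
        case_choices = (rng.sample(indices, n_case) for _ in range(n_random))
    hits = total = 0
    for chosen in case_choices:
        chosen = set(chosen)
        perm_case = [pooled[i] for i in indices if i in chosen]
        perm_control = [pooled[i] for i in indices if i not in chosen]
        total += 1
        if abs(welch_t(perm_case, perm_control)) >= observed:
            hits += 1
    return hits / total


# Made-up expression values for one marker.
case = [5.1, 5.3, 5.2, 5.4]
control = [3.0, 3.2, 3.1, 2.9]
print(permutation_p_value(case, control))                # all permutations
print(permutation_p_value(case, control, n_random=200))  # random permutations
```

Exhaustive enumeration grows combinatorially with the number of arrays, which is why a fixed number of random permutations is often the practical choice for larger groups.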

Overall alpha (Critical p-value)

The p-value threshold below which a difference in expression between Case and Control sets is called significant (default 0.05). Multiple-testing corrections can be specified in the Alpha Corrections tab.

Data is log2-transformed

If the dataset has been log2 transformed, select "Yes". Having this information allows the fold change displayed in the Volcano Plot to be calculated in a consistent fashion.

Expression data typically better approximates a normal distribution if it has been log transformed. The transformation can be performed in an external program such as geWorkbench desktop.


Alpha corrections

T-Test web params alpha.png


For multiple testing (alpha) correction with the standard t-distribution calculation, the following options are offered:

  • Just alpha (no correction)
  • Standard Bonferroni Correction - the value of alpha is divided by the number of markers included in the analysis.
  • Adjusted (step down) Bonferroni Correction
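
The two Bonferroni options can be contrasted with a short sketch. The standard correction follows the description above. The step-down variant is shown here as a Holm-style procedure, which is an assumption about what "adjusted (step down)" means and not a confirmed description of the geWorkbench code. The function names and p-values are hypothetical.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Standard Bonferroni: every p-value is compared against alpha / n."""
    n = len(p_values)
    return [p <= alpha / n for p in p_values]


def step_down_significant(p_values, alpha=0.05):
    """Holm-style step-down (assumed variant).

    P-values are visited from smallest to largest; the one at rank k
    (k = 0, 1, ...) is compared against alpha / (n - k), and testing
    stops at the first p-value that fails.
    """
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    significant = [False] * n
    for rank, i in enumerate(order):
        if p_values[i] > alpha / (n - rank):
            break
        significant[i] = True
    return significant


# Made-up p-values for five markers.
p_values = [0.001, 0.012, 0.02, 0.04, 0.3]
print(bonferroni_significant(p_values))
print(step_down_significant(p_values))
```

Under these assumptions the step-down variant is never more conservative than the standard correction, because its threshold relaxes after each accepted marker.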


T-Test web params alpha permutation.png

If a permutation-based calculation method has been chosen, two variants of the Westfall and Young method also become available:

  • minP
  • maxT

Degrees of Freedom

T-Test web params dof.png


Group variances can be declared as:

  1. Unequal (Welch approximation); this is the default.
  2. Equal.
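
The effect of this choice on the t-statistic and its degrees of freedom can be sketched with the textbook formulas. geWorkbench itself relies on the Apache Commons Math Library for the calculation; the function below is a hypothetical stand-alone illustration, not that library's API.

```python
import math
import statistics


def t_statistic(case, control, equal_variance=False):
    """Return (t, degrees_of_freedom) for two samples."""
    n1, n2 = len(case), len(control)
    m1, m2 = statistics.mean(case), statistics.mean(control)
    v1, v2 = statistics.variance(case), statistics.variance(control)
    if equal_variance:
        # Pooled variance; degrees of freedom are n1 + n2 - 2.
        pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
        se = math.sqrt(pooled * (1 / n1 + 1 / n2))
        df = n1 + n2 - 2
    else:
        # Welch approximation: Welch-Satterthwaite degrees of freedom.
        a, b = v1 / n1, v2 / n2
        se = math.sqrt(a + b)
        df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return (m1 - m2) / se, df


# Made-up values with clearly different spreads in the two groups.
case = [5.0, 7.0, 9.0]
control = [1.0, 2.0, 3.0, 2.0]
print(t_statistic(case, control))                       # Welch (default)
print(t_statistic(case, control, equal_variance=True))  # equal variances
```

When the group variances differ, the Welch degrees of freedom are generally non-integer and smaller than n1 + n2 - 2, which makes the resulting p-value more conservative.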


Example

Preparation

This example uses the file Bcell-100_log2.exp, which is the Bcell data described in the Tutorial Data section, and which has been further normalized as follows:

  • Threshold Normalization - set a minimum value of 1.0 for each data point, followed by
  • Log2 transformation
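
The two preparation steps can be sketched as below. This only illustrates the arithmetic, assuming that threshold normalization raises any value below the minimum up to the minimum; the function name and sample values are hypothetical.

```python
import math


def threshold_then_log2(values, minimum=1.0):
    """Raise each value to at least `minimum`, then take log2."""
    return [math.log2(max(v, minimum)) for v in values]


# Made-up raw expression values, including one below the threshold
# and one negative value.
print(threshold_then_log2([0.2, 1.0, 8.0, -3.0]))
```

Applying the threshold first guarantees that every value passed to the logarithm is positive.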

You can also associate the Affymetrix HG-U95Av2 annotation file during data upload, although it is not required for this example. See the FAQ section for information on downloading the original annotation file from Affymetrix.


Set Parameters

  • Array Context - "Class"
  • Case array set - "GC-Tumor"
  • Control array set - "GC B-cell"
  • P-Value Parameters tab
    • P-value Overall Alpha - 0.01
    • Calculation method - t-distribution
    • Data is log2 transformed - select "Yes"
  • Alpha Corrections tab
    • Standard Bonferroni Correction
  • Degrees of Freedom tab
    • Welch approximation - unequal group variances.


Running the t-test analysis

  • Click Submit.

t-Test Results

Result Node

A t-test result node is placed into the Workspace as a child of the analyzed microarray dataset.

The list of significant markers is placed into a new set "Significant Genes" in Marker Sets. The number of markers in the set is shown in square brackets.

Fold Change

The method used to calculate fold change depends on whether the data was marked as log2 transformed when the t-test was run, using the "Data is log2 transformed" setting.

  • Linear data ("Data is log2 transformed" was not selected): for each marker, the fold change is calculated as the log2 transform of the average expression in the Case set divided by the average expression in the Control set, that is,
      Log2(Avg(cases)/Avg(controls))
    or, equivalently,
      Log2(Avg(cases)) - Log2(Avg(controls))    (difference of logs of averaged values).
  • Log2 transformed data ("Data is log2 transformed" was selected): for each marker, the average of the (log) control values is subtracted from the average of the (log) case values, that is,
      Avg(cases) - Avg(controls)    (difference of averaged log values).

For linear data, the fold change is not calculated if the average case or control value is negative, since the logarithm of a negative number is undefined.
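
The two fold-change calculations can be written out as a short sketch. The function name is hypothetical. Returning `None` when the fold change is not calculated, and also skipping an average of exactly zero, are assumptions made here so that the logarithm and the division are always defined.

```python
import math
import statistics


def log2_fold_change(case, control, data_is_log2):
    """Log2 fold change of case over control for one marker."""
    avg_case = statistics.mean(case)
    avg_control = statistics.mean(control)
    if data_is_log2:
        # Difference of averaged log values.
        return avg_case - avg_control
    # Linear data: not calculated for negative averages (zero is also
    # skipped here, which is an assumption beyond the text above).
    if avg_case <= 0 or avg_control <= 0:
        return None
    # Log of the ratio of averaged values.
    return math.log2(avg_case / avg_control)


# Made-up values for one marker.
print(log2_fold_change([8.0, 8.0], [2.0, 2.0], data_is_log2=False))
print(log2_fold_change([3.0, 3.0], [1.0, 1.0], data_is_log2=True))
print(log2_fold_change([-1.0, -3.0], [2.0, 2.0], data_is_log2=False))
```

The two modes are not interchangeable: averaging log values corresponds to a geometric mean on the linear scale, so the same dataset can give slightly different fold changes depending on which form is used.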


Volcano Plot Visualizer

T-Test web volcano plot.png


The Volcano Plot graphically depicts the results of the t-test for differential expression. It includes only markers that met the threshold for significance in the t-test. The log2 fold change for each marker is plotted against -log10(P-value).

The fold change is calculated as described in the previous section, based on whether the data was log2 transformed or not.
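
The coordinates plotted for each marker can be sketched as follows. This is an illustrative helper and not geWorkbench code: the function name, the input layout and the marker IDs are hypothetical.

```python
import math


def volcano_points(results, significant):
    """Map each significant marker to (log2 fold change, -log10 p-value).

    results: {marker: (log2_fold_change, p_value)} (hypothetical layout)
    significant: set of marker IDs that passed the t-test threshold
    """
    points = {}
    for marker, (fold_change, p_value) in results.items():
        # Skip markers without a fold change or with a p-value of zero,
        # for which -log10 is undefined.
        if marker in significant and fold_change is not None and p_value > 0:
            points[marker] = (fold_change, -math.log10(p_value))
    return points


# Made-up results for two markers; only "m1" is treated as significant.
results = {"m1": (2.0, 0.001), "m2": (-1.5, 0.2)}
print(volcano_points(results, significant={"m1"}))
```

Smaller p-values map to larger values of -log10(P-value), so the most significant markers appear highest in the plot.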