GeWorkbench-web/T-Test
Overview
A t-Test analysis can be used to identify markers with statistically significant differential expression between two sets of microarrays. In geWorkbench, these groups are specified as the "Case" and "Control" sets.
Setting up a t-test analysis in geWorkbench requires the following:
- At least two sets of arrays must be defined in the Arrays component.
- One or more array sets must be selected as "Case", and one or more others as "Control".
After the t-test is run, the markers that meet the significance threshold are displayed graphically in a Volcano Plot and are also placed into a new Marker Set called "Significant Genes".
The t-test result is calculated using the Apache Commons Math Library.
t-Test Parameters
Select Case and Control Arrays
- Array Context - If more than one context for array sets has been created, first choose the desired context. Each context can contain its own collection of array sets, as a convenience for organizing them.
- Select Case/Control Sets - In each of the two lists, select at least one array set. No array may belong to both the case and control sets; if one does, the analysis will not be performed.
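The rule above (at least one set on each side, and no array on both sides) can be sketched as a simple set check. This is a minimal illustration, not geWorkbench code: the dictionary layout, the function name `resolve_case_control`, and the array names `a1`, `b1`, etc. are hypothetical; only the set names come from the example later on this page.

```python
# Hypothetical layout: each array set name maps to the names of the arrays it contains.
def resolve_case_control(array_sets, case_names, control_names):
    """Return (case_arrays, control_arrays), or raise if the selection is invalid."""
    case_arrays = set()
    for name in case_names:
        case_arrays.update(array_sets[name])
    control_arrays = set()
    for name in control_names:
        control_arrays.update(array_sets[name])
    if not case_arrays or not control_arrays:
        raise ValueError("select at least one array set for both Case and Control")
    overlap = case_arrays & control_arrays
    if overlap:
        # An array on both sides makes the comparison meaningless, so refuse to run.
        raise ValueError(f"arrays in both Case and Control: {sorted(overlap)}")
    return case_arrays, control_arrays


sets = {
    "GC-Tumor": ["a1", "a2"],
    "GC B-cell": ["b1", "b2"],
    "Mixed": ["a2", "b3"],  # shares array a2 with GC-Tumor
}
case, control = resolve_case_control(sets, ["GC-Tumor"], ["GC B-cell"])
print(sorted(case), sorted(control))
```

Choosing "GC-Tumor" as Case and "Mixed" as Control would raise an error, mirroring the analysis refusing to run.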
P-value Parameters
Select Calculation Method
The p-values can be calculated either directly from the t-statistic, using the t-distribution, or by carrying out a permutation analysis. The permutation analysis measures how often a t-statistic at least as large as the observed one occurs by chance when the case and control labels of the arrays are permuted.
- t-distribution (the default)
- Permutation - If chosen, the number of permutations to carry out must also be specified.
If permutation is chosen, the following controls become enabled:
- Randomly group experiments - Enter the integer number of random permutations to carry out in the adjoining text field (#times).
- Use all Permutations - All possible permutations will be carried out.
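The permutation p-value described above can be sketched for a single marker as follows. This is an illustration under stated assumptions, not geWorkbench's implementation: it assumes the Welch (unequal-variance) t-statistic, a two-sided test on |t|, and a p-value of (relabellings with |t| >= observed) / (relabellings tried). Other tools use slightly different counting conventions, for example (hits + 1) / (B + 1) for random permutations. The function names are hypothetical.

```python
import itertools
import random
import statistics


def welch_t(case, control):
    """Welch t-statistic (unequal variances) for one marker."""
    se2 = (statistics.variance(case) / len(case)
           + statistics.variance(control) / len(control))
    return (statistics.mean(case) - statistics.mean(control)) / se2 ** 0.5


def permutation_p(case, control, n_perm=None, seed=0):
    """Two-sided permutation p-value for one marker.

    n_perm=None enumerates every relabelling ("Use all Permutations");
    an integer draws that many random relabellings ("Randomly group experiments").
    """
    pooled = list(case) + list(control)
    n1 = len(case)
    observed = abs(welch_t(case, control))

    def split(case_idx):
        # Arrays whose indices are in case_idx are relabelled as Case, the rest as Control.
        chosen = set(case_idx)
        new_case = [pooled[i] for i in case_idx]
        new_control = [v for i, v in enumerate(pooled) if i not in chosen]
        return new_case, new_control

    if n_perm is None:
        labellings = itertools.combinations(range(len(pooled)), n1)
    else:
        rng = random.Random(seed)
        labellings = (rng.sample(range(len(pooled)), n1) for _ in range(n_perm))

    hits = total = 0
    for idx in labellings:
        total += 1
        # Small tolerance so ties with the observed statistic are counted.
        if abs(welch_t(*split(idx))) >= observed - 1e-12:
            hits += 1
    return hits / total


print(permutation_p([10, 11, 12], [1, 2, 3]))                       # all 20 relabellings
print(permutation_p([10, 11, 12], [1, 2, 3], n_perm=500, seed=1))   # 500 random ones
```

With three arrays per group there are only 20 distinct relabellings, so the smallest achievable p-value is limited; this is why small groups often call for "Use all Permutations" rather than random sampling.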
Overall alpha (Critical p-value)
The critical p-value for calling a difference in expression between the Case and Control sets significant (default 0.05). Multiple-testing corrections can be specified in the Alpha Corrections tab.
Data is log2-transformed
If the dataset has been log2-transformed, select "Yes". This information allows the fold change displayed in the Volcano Plot to be calculated consistently.
Expression data typically approximates a normal distribution better if it has been log-transformed. The transformation can be performed in an external program such as geWorkbench desktop.
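Why the log2 setting matters for the fold change can be seen in a small sketch. It assumes, as is common practice but not confirmed here for geWorkbench, that the log2 fold change is the difference of group means when the data are already log2, and log2 of the ratio of group means when they are raw intensities. The function name is hypothetical.

```python
import math
import statistics


def log2_fold_change(case, control, data_is_log2):
    """Log2 fold change of Case over Control for one marker (a sketch)."""
    if data_is_log2:
        # Values are already log2, so a ratio of expression is a difference of means.
        return statistics.mean(case) - statistics.mean(control)
    # Raw intensities: take the ratio of the means, then log2 it.
    return math.log2(statistics.mean(case) / statistics.mean(control))


print(log2_fold_change([3.0, 3.0], [1.0, 1.0], data_is_log2=True))    # 2.0
print(log2_fold_change([8.0, 8.0], [2.0, 2.0], data_is_log2=False))   # 2.0
# Answering "No" for log2 data treats 3 and 1 as raw intensities:
print(log2_fold_change([3.0, 3.0], [1.0, 1.0], data_is_log2=False))   # about 1.585
```

The first two calls describe the same fourfold change and agree; the third shows the inconsistent value that results if log2 data are declared untransformed.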
Alpha corrections
For multiple testing (alpha) correction with the standard t-distribution calculation, the following options are offered:
- Just alpha (no correction)
- Standard Bonferroni Correction - the value of alpha is divided by the number of markers included in the analysis.
- Adjusted (step down) Bonferroni Correction
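The two Bonferroni options can be compared in a short sketch. The standard correction follows the description above. For the adjusted version, the sketch assumes that "step down" refers to Holm's procedure, which is the usual meaning of a step-down Bonferroni correction; the function names are hypothetical.

```python
def bonferroni(pvalues, alpha):
    """Standard Bonferroni: compare every p-value with alpha / m."""
    cutoff = alpha / len(pvalues)
    return [p <= cutoff for p in pvalues]


def holm_step_down(pvalues, alpha):
    """Step-down (Holm) Bonferroni, assumed here for the 'adjusted' option.

    The k-th smallest p-value (k = 0, 1, ...) is compared with alpha / (m - k);
    testing stops at the first p-value that fails.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    significant = [False] * m
    for k, i in enumerate(order):
        if pvalues[i] > alpha / (m - k):
            break
        significant[i] = True
    return significant


p = [0.01, 0.012, 0.02, 0.5]
print(bonferroni(p, 0.05))      # [True, True, False, False]
print(holm_step_down(p, 0.05))  # [True, True, True, False]
```

Both control the family-wise error rate, but the step-down version relaxes the cutoff after each rejection, so it flags at least as many markers as the standard correction.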
If a permutation-based calculation method has been chosen, two variants of the Westfall and Young method also become available:
- minP
- maxT
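The maxT variant can be sketched with the standard textbook step-down algorithm. Markers are ordered by observed |t|. For each permutation, successive maxima of the permuted |t| values are taken from the least significant marker upward and compared with the observed values. The adjusted p-values are then forced to be monotone. This is not taken from geWorkbench's source. The Welch statistic, the row layout (Case values first, then Control), and the function names are assumptions, and minP is not shown.

```python
import random
import statistics


def welch_t(case, control):
    se2 = (statistics.variance(case) / len(case)
           + statistics.variance(control) / len(control))
    return (statistics.mean(case) - statistics.mean(control)) / se2 ** 0.5


def maxt_adjusted_p(rows, n_case, n_perm=1000, seed=0):
    """Westfall-Young step-down maxT adjusted p-values (a sketch).

    rows: one list per marker, Case values first (n_case of them), then Control.
    The same shuffled column order is applied to every marker, which preserves
    the correlation between markers.
    """
    def stats(cols):
        return [abs(welch_t([r[j] for j in cols[:n_case]],
                            [r[j] for j in cols[n_case:]])) for r in rows]

    n_cols = len(rows[0])
    observed = stats(list(range(n_cols)))
    # Markers ordered from most to least significant observed |t|.
    order = sorted(range(len(rows)), key=lambda g: observed[g], reverse=True)
    counts = [0] * len(rows)
    rng = random.Random(seed)
    for _ in range(n_perm):
        cols = list(range(n_cols))
        rng.shuffle(cols)
        perm = stats(cols)
        running_max = 0.0
        # Walk from the least significant marker upward, keeping successive maxima.
        for rank in range(len(order) - 1, -1, -1):
            g = order[rank]
            running_max = max(running_max, perm[g])
            if running_max >= observed[g]:
                counts[rank] += 1
    adjusted_by_rank = [c / n_perm for c in counts]
    # Enforce monotonicity: a less significant marker never gets a smaller p-value.
    for rank in range(1, len(order)):
        adjusted_by_rank[rank] = max(adjusted_by_rank[rank], adjusted_by_rank[rank - 1])
    adjusted = [0.0] * len(rows)
    for rank, g in enumerate(order):
        adjusted[g] = adjusted_by_rank[rank]
    return adjusted


rows = [
    [10, 11, 12, 13, 1, 2, 3, 4],   # clearly differential
    [5, 3, 6, 4, 4, 6, 3, 5],       # no real difference
]
print(maxt_adjusted_p(rows, n_case=4, n_perm=500, seed=1))
```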
Degrees of Freedom
Group variances can be declared as:
- unequal (Welch approximation) (default)
- equal
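The two degrees-of-freedom choices correspond to the Welch t-test (separate variances, Welch-Satterthwaite degrees of freedom) and the pooled-variance Student t-test (df = n1 + n2 - 2). The sketch below computes t and df both ways and converts t to a two-sided p-value through the identity p = I_x(df/2, 1/2) with x = df / (df + t^2), where I is the regularized incomplete beta function. geWorkbench delegates these calculations to Apache Commons Math. This pure-Python version, with a Numerical Recipes-style continued fraction and hypothetical function names, is only an illustration.

```python
import math
import statistics


def t_test(case, control, equal_var=False):
    """Return (t, df) for a two-sample t-test on one marker's values."""
    n1, n2 = len(case), len(control)
    v1, v2 = statistics.variance(case), statistics.variance(control)
    diff = statistics.mean(case) - statistics.mean(control)
    if equal_var:
        # Pooled variance, df = n1 + n2 - 2.
        pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
        return diff / math.sqrt(pooled * (1 / n1 + 1 / n2)), n1 + n2 - 2
    # Welch: separate variances, Welch-Satterthwaite df (usually non-integer).
    a, b = v1 / n1, v2 / n2
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return diff / math.sqrt(a + b), df


def _betacf(a, b, x, eps=1e-14, max_iter=300):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    c, d = 1.0, 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        for aa in (m * (b - m) * x / ((a + 2 * m - 1) * (a + 2 * m)),
                   -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def two_sided_p(t, df):
    """Two-sided p-value of t under a Student t distribution with df degrees of freedom."""
    return reg_inc_beta(df / 2.0, 0.5, df / (df + t * t))


case, control = [1.0, 2.0, 3.0, 4.0], [10.0, 20.0]
for equal in (False, True):
    t, df = t_test(case, control, equal_var=equal)
    print(f"equal_var={equal}: t={t:.3f} df={df:.2f} p={two_sided_p(t, df):.4f}")
```

When one group is much more variable than the other, as in this usage example, the Welch degrees of freedom fall well below n1 + n2 - 2, which makes the test more conservative than the pooled version.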
Example
Preparation
This example uses the file Bcell-100_log2.exp, which is the Bcell data described in the Tutorial_Data section, and which has been further normalized as follows:
- Threshold Normalization - set a minimum value of 1.0 for each data point, followed by
- Log2 transformation
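The order of the two preparation steps matters. Log2 is undefined for zero or negative values, and flooring at 1.0 first maps every low value to log2(1.0) = 0. The per-value sketch below illustrates this. It is not the geWorkbench desktop normalizer itself, and the function name is hypothetical.

```python
import math


def threshold_then_log2(values, minimum=1.0):
    """Floor each value at `minimum`, then take log2 (a sketch of the two steps)."""
    return [math.log2(max(v, minimum)) for v in values]


print(threshold_then_log2([-5.0, 0.5, 1.0, 8.0]))  # [0.0, 0.0, 0.0, 3.0]
```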
You can also associate the Affymetrix HG-U95Av2 annotation file during data upload, although it is not required for this example. See the FAQ section for information on downloading the original annotation file from Affymetrix.
Set Parameters
- Array Context - "Class"
- Case array set - "GC-Tumor"
- Control array set - "GC B-cell"
- P-Value Parameters tab
- P-value Overall Alpha - 0.01
- Calculation method - t-distribution
- Data is log2 transformed - select "Yes"
- Alpha-corrections tab
- Standard Bonferroni
- Degrees of Freedom tab
- Welch approximation - unequal group variances.
Running the t-test analysis
- Click Submit.
t-Test Results
Result Sets
A t-test result node is placed into the Workspace as a child of the analyzed microarray dataset.
The list of significant markers is placed into a new set "Significant Genes" in Marker Sets. The number of markers in the set is shown in square brackets.
Volcano Plot Visualizer
The Volcano Plot graphically depicts the results of the t-test for differential expression. It includes only markers that met the significance threshold in the t-test. The log2 fold change of each marker is plotted against -log10(p-value).
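The coordinates of the Volcano Plot points can be sketched from per-marker results. The tuple layout, marker names, and function name are hypothetical. The `cutoff` argument stands for whatever per-marker threshold the chosen alpha correction produces, and markers at or below it are treated as significant.

```python
import math


def volcano_points(markers, cutoff):
    """Return (name, log2 fold change, -log10 p) for markers with p <= cutoff."""
    return [(name, fc, -math.log10(p)) for name, fc, p in markers if p <= cutoff]


# Hypothetical results: (marker name, log2 fold change, p-value).
markers = [("m1", 2.3, 1e-6), ("m2", -0.4, 0.3), ("m3", -1.8, 2e-4)]
for name, x, y in volcano_points(markers, cutoff=0.001):
    print(f"{name}: x={x:+.2f}  y={y:.2f}")
```

Taking -log10 turns small p-values into large heights, so the most significant markers sit highest on the plot. Markers far to the left or right have the largest fold changes.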