T-test

Revision as of 17:09, 28 February 2012 by Smith (talk | contribs) (Result Sets)



Overview

A t-Test analysis can be used to identify markers with statistically significant differential expression between two sets of microarrays. In geWorkbench, these groups are specified as the "Case" and "Control" sets.

There are several steps to setting up a t-test analysis in geWorkbench.

  1. At least two sets of arrays must be available in the Arrays component.
  2. The array sets to be used in the analysis must be "activated" by checking the box adjacent to their names in the Arrays component.
  3. One or more activated array sets must be designated "Case", and the others "Control" (which is the default classification).
  4. The t-test parameters must be set.


After the t-test is run, the results are displayed graphically, and all markers meeting the significance threshold are placed into a new marker set called "Significant Genes".

Please see the Example section below for instructions on preparing array sets for the t-test analysis.

t-Test Parameters

P-value Parameters

p-values based on

The p-values can be calculated either directly from the t-statistic, using the t-distribution, or by carrying out a permutation analysis. The permutation analysis measures how often a t-statistic at least as large as the observed one occurs by chance after the case and control labels of the arrays are permuted.

  • t-distribution (the default)
  • Permutation - If chosen, the number of permutations to carry out must also be specified.
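
The permutation approach described above can be sketched in a few lines of Python. This is a minimal illustration for a single marker, not the geWorkbench implementation: the function names, the use of a two-sided test on the Welch t-statistic, and the fixed random seed are all assumptions made for the example.

```python
import math
import random
from statistics import mean, variance


def welch_t(case, control):
    """Welch t-statistic for two lists of expression values (hypothetical helper)."""
    diff = mean(case) - mean(control)
    se2 = variance(case) / len(case) + variance(control) / len(control)
    if se2 == 0:
        # Degenerate case: no variation within either group.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / math.sqrt(se2)


def permutation_p_value(case, control, n_permutations=1000, seed=0):
    """Estimate a two-sided p-value by shuffling case/control labels.

    Counts how often the permuted |t| is at least as large as the
    observed |t|. The seed is fixed only to make the sketch repeatable.
    """
    rng = random.Random(seed)
    observed = abs(welch_t(case, control))
    pooled = list(case) + list(control)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        perm_case = pooled[:len(case)]
        perm_control = pooled[len(case):]
        if abs(welch_t(perm_case, perm_control)) >= observed:
            hits += 1
    return hits / n_permutations
```

With more permutations the estimate becomes more stable, at the cost of run time; the smallest non-zero p-value that can be reported is one divided by the number of permutations.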

Overall alpha (Critical p-value)

This is the p-value threshold below which a difference in expression between the Case and Control sets is called significant. A value of 0.05 is often used for a single test. Multiple-testing corrections can be specified in the Alpha Corrections tab.

Data is Log2-transformed

If the dataset has been log2-transformed, check this box. This information allows the fold change displayed in the Volcano Plot to be calculated consistently.

geWorkbench examines the current dataset and guesses whether the data has been log2-transformed. The user can override this guess using the check box.


T-test Pvalue params.png


Alpha corrections

For multiple testing (alpha) correction, the following options are offered:

  • No correction
  • Standard Bonferroni Correction - the value of alpha is divided by the number of markers included in the analysis.
  • Adjusted (step-down) Bonferroni Correction
  • Two variants of the Westfall and Young method are available if the p-value is estimated by permutation:
    • minP
    • maxT
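
The two Bonferroni options can be illustrated with a short Python sketch. The standard correction follows the description above directly. For the step-down variant, the sketch assumes a Holm-style procedure (thresholds alpha/n, alpha/(n-1), and so on, applied in order of increasing p-value); whether geWorkbench uses exactly this rule, and whether it compares with "<" or "<=", is an assumption here. The function names are hypothetical.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Standard Bonferroni: compare every p-value with alpha / n."""
    n = len(p_values)
    if n == 0:
        return []
    threshold = alpha / n
    return [p <= threshold for p in p_values]


def step_down_bonferroni_significant(p_values, alpha=0.05):
    """Holm-style step-down correction (assumed variant).

    The smallest p-value is compared with alpha / n, the next smallest
    with alpha / (n - 1), and so on. Testing stops at the first failure.
    """
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    significant = [False] * n
    for rank, idx in enumerate(order):
        if p_values[idx] <= alpha / (n - rank):
            significant[idx] = True
        else:
            break  # all remaining (larger) p-values are not significant
    return significant
```

For the p-values 0.001, 0.012, 0.02, 0.04 and 0.5 with alpha = 0.05, the standard correction keeps only the first marker, while the step-down rule also keeps the second, which shows why the step-down form is the less conservative of the two.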


T-test alpha corrections.png

Degrees of Freedom

Group variances can be declared as:

  1. Unequal (Welch approximation) (default)
  2. Equal
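
To show what this choice affects, the sketch below computes the degrees of freedom for one marker under both settings, using the textbook formulas: n1 + n2 - 2 for equal variances, and the Welch-Satterthwaite approximation for unequal variances. The function name and argument layout are hypothetical and are not taken from geWorkbench.

```python
from statistics import variance


def degrees_of_freedom(case, control, equal_variances=False):
    """Degrees of freedom for a two-sample t-test on one marker.

    Assumes each group has at least two values and that at least one
    group has non-zero variance.
    """
    n1, n2 = len(case), len(control)
    if equal_variances:
        # Pooled-variance (Student) t-test.
        return n1 + n2 - 2
    # Welch-Satterthwaite approximation for unequal variances.
    a = variance(case) / n1
    b = variance(control) / n2
    return (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
```

The Welch value is generally not an integer and never exceeds n1 + n2 - 2; the two settings agree when the group sizes and sample variances are equal.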


T-test degrees of freedom.png

Example

Preparation

Obtain the file "BCell-100.exp", which is contained in the data/public_data directory of the geWorkbench distribution, or can be directly downloaded from the tutorial data download area.

You may also wish to load the Affymetrix HG-U95Av2 annotation file, although it is not required for this example. See the FAQ section for information on downloading this file from Affymetrix.


For tips on loading data files, see Local Data Files and Projects.

In this example, we apply two normalization steps to the data set.

  1. Threshold Normalizer - set a minimum value of 1. Any value less than 1 will be set to 1.
  2. Log2 Transformation Normalizer - Log2 transform the data.

For an actual data analysis, you should apply data normalization steps appropriate to your own data and analysis design.
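
The effect of the two normalization steps used in this example can be sketched as follows. This is an illustration of the arithmetic only, not the geWorkbench normalizer code; the function name is hypothetical.

```python
import math


def threshold_then_log2(values, minimum=1.0):
    """Raise values below `minimum` to `minimum`, then take log2.

    Thresholding first guarantees that every value passed to log2 is
    positive, since log2 is undefined for zero and negative values.
    """
    return [math.log2(max(v, minimum)) for v in values]
```

With a minimum of 1, every thresholded value maps to a log2 value of 0 or greater.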

Array Classification

The t-test in geWorkbench requires that at least two sets of arrays be "activated". Only activated sets are considered in the analysis. In addition, at least one such set must be designated as "Case", and at least one other as "Control" (which is the default classification). Note that more than one set of arrays can be marked as "Case" or as "Control".

Array set classification is covered in the Arrays/Phenotypes chapter. However, for convenience, the steps are illustrated here.

The desired sets of arrays should be activated in the Arrays/Phenotypes component. This is done by checking the boxes next to the desired sets.

T-test Set activation BCell.png


The classification can be made directly by left-clicking on the "thumb-tack" icon adjacent to an array set name.

T-test Set classification left click Bcell.png


The array classification can also be set by right-clicking on the desired array set and selecting "Classification":


T-test Set classification right click Bcell.png


Using either method, the desired array set can be classified as "Case":


T-test Set selection BCell.png


The thumbtack image next to activated Array Sets is colored red.

Setting the Analysis Parameters

  1. The t-test component should be loaded by default in the Component Configuration Manager.
  2. From the Analysis Panel, select T-Test Analysis.
  3. P-value Parameters tab:
    1. P-values based on t-distribution.
    2. Note that here the default alpha (critical p-value) is set to 0.01.
    3. Check the box "Data is Log2-transformed".
  4. Alpha Corrections tab
    1. Standard Bonferroni
  5. Degrees of Freedom tab
    1. Welch approximation - unequal group variances.


The P-value Parameters tab set for the example analysis:


T-test Example setup.png

Running the t-test analysis

  1. Click Analyze. The results will be returned in three locations: the Projects Folder, the Markers component, and the Visualization area.

t-Test Results

Result Sets

A t-test result node is placed into the Projects Folder as a child of the microarray dataset that was analyzed.

The list of significant markers is placed into a new set in the Markers component. This set is labeled "Significant Genes". The number in square brackets indicates the number of markers in the set.

Volcano Plot Visualizer

Volcano plot.png


The Volcano Plot graphically depicts the results of the t-test for differential expression. It includes only markers that met the significance threshold in the t-test. For each marker, the fold change (on a log2 scale) is plotted against the -log10 of the p-value.

See the Volcano Plot tutorial for details on the calculation of the fold change, and on the color scheme used in the display. The fold change is calculated in a different fashion depending on whether the data was marked as log2 transformed or not.
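
As a rough illustration of why the log2 setting matters, the sketch below computes the two plotted quantities for one marker under a common convention: for log2-transformed data the log2 fold change is the difference of the group means, and for untransformed data it is the log2 of the ratio of the group means. This convention is an assumption made for the example; the exact calculation used by geWorkbench is the one documented in the Volcano Plot tutorial. The function name is hypothetical.

```python
import math
from statistics import mean


def volcano_point(case, control, p_value, is_log2=True):
    """Return (log2 fold change, -log10 p-value) for one marker.

    Assumed convention, not taken from geWorkbench:
      - log2 data:  difference of group means
      - raw data:   log2 of the ratio of group means (means must be positive)
    """
    if is_log2:
        log2_fc = mean(case) - mean(control)
    else:
        log2_fc = math.log2(mean(case) / mean(control))
    return log2_fc, -math.log10(p_value)
```

Under this convention, a marker whose mean is four times higher in the Case set has a log2 fold change of 2 whether the input was supplied as raw values or as log2 values, provided the flag matches the data.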

Technical Notes

  • If two data points have exactly the same coordinates, only the point which is "on top" will be shown when clicked or moused over.
  • If the graph has only one point, or has several points all with the exact same coordinates, the default JFreeChart graphing behavior may omit a scale on the X or Y axis. The ranges of the axes and the labels can be manually adjusted. Right-click on the X or Y axis label area and select Properties->Range. Turn off "auto-ranging" and set the desired ranges.

Color Mosaic Visualizer

The Color Mosaic tab shows all of the arrays and the p-value calculated for each marker. By default, the markers are sorted by p-value. The display of each type of annotation can be switched on and off.

Please see the Color Mosaic tutorial for a complete description of all the features and controls of this viewer.


T-Test Color Mosaic Control Descriptions.png