Difference between revisions of "PCA"


Revision as of 18:40, 26 November 2013

Principal Component Analysis (PCA)

Principal component analysis (Raychaudhuri et al., 2000) is used to find the most important contributors to the variance in a dataset.

geWorkbench can dispatch a PCA job to a GenePattern server, and display the returned result.

The analysis can be done either in terms of experiments (arrays) or genes. The result identifies the components, each a weighted combination of experiments or of genes, that account for the most variance in the data.

Note - due to memory limitations, analysis on a full set of genes may not be possible.
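As a rough illustration of why a gene-wise analysis can run out of memory, the sketch below estimates the storage for a dense variable-by-variable covariance matrix held in double precision. This is an assumption made for illustration only: the page does not state how the GenePattern module computes its result, and the function name `covariance_bytes` is hypothetical.

```python
def covariance_bytes(n_variables, bytes_per_value=8):
    """Bytes needed for a dense n x n matrix of floating-point values.

    Assumes one 8-byte double per entry and no sparse or packed storage.
    """
    return n_variables * n_variables * bytes_per_value


# Illustrative counts only, not taken from any particular dataset.
for n in (100, 20_000):
    print(n, "variables:", covariance_bytes(n) / 1e9, "GB")
```

Under that assumption, 20,000 genes would need about 3.2 GB for the matrix alone, whereas 100 arrays would need only 80 KB, which is consistent with the note that the gene-wise analysis is the one limited by memory.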

Requirements

The PCA 3D viewer has special requirements. The computer display driver must support OpenGL version 1.2 or higher. If it does not, a warning message will be displayed at the time the user attempts to create a 3D plot. Any recent graphics card should provide this support.

Parameters

The PCA component analysis interface is shown below.


PCA analysis genes.png


Variables

  • Genes - analyze the principal components in terms of genes. Each component will be composed from the weighted contributions of each gene. Available memory will limit how many genes can be successfully analyzed.
  • Experiments - analyze the principal components in terms of experiments. Each component will be composed from the weighted contributions of each experiment.
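The two choices of variable can be illustrated with a small, self-contained sketch. It is not the GenePattern implementation: it uses power iteration on the covariance matrix, a simple technique that recovers only the first principal component, and it assumes the expression matrix is stored with one row per gene and one column per array. All function names are hypothetical.

```python
import math


def transpose(matrix):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(column) for column in zip(*matrix)]


def first_component(observations, iterations=200):
    """Return (weights, variance) of the first principal component.

    observations: list of rows, one per observation, one value per variable.
    Requires at least two observations. The sign of the weights is arbitrary.
    """
    n = len(observations)
    p = len(observations[0])
    means = [sum(row[j] for row in observations) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in observations]
    # Sample covariance matrix between variables (p x p).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
           for i in range(p)]
    # Non-uniform start vector; power iteration stalls only if the start
    # happens to be exactly orthogonal to the leading eigenvector.
    weights = [float(j + 1) for j in range(p)]
    norm = math.sqrt(sum(w * w for w in weights))
    weights = [w / norm for w in weights]
    for _ in range(iterations):
        product = [sum(cov[i][j] * weights[j] for j in range(p))
                   for i in range(p)]
        norm = math.sqrt(sum(x * x for x in product))
        if norm == 0.0:  # all variables are constant
            break
        weights = [x / norm for x in product]
    # Variance explained by the component: w^T C w.
    variance = sum(weights[i] * sum(cov[i][j] * weights[j] for j in range(p))
                   for i in range(p))
    return weights, variance


def analyze(matrix, variable):
    """matrix has one row per gene and one column per array (assumed layout)."""
    if variable == "experiments":
        observations = matrix             # one weight per array
    elif variable == "genes":
        observations = transpose(matrix)  # one weight per gene
    else:
        raise ValueError("variable must be 'genes' or 'experiments'")
    return first_component(observations)


# Made-up values: 3 genes measured on 2 arrays.
expression = [[2.0, 0.0], [0.0, 1.0], [-2.0, -1.0]]
exp_weights, exp_variance = analyze(expression, "experiments")
gene_weights, gene_variance = analyze(expression, "genes")
```

With "experiments" the returned weight vector has one entry per array; with "genes" it has one entry per gene, so the covariance matrix grows with the number of genes rather than the number of arrays.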

GenePattern Server Settings

PCA GenePattern Server Settings.png

  • Protocol - http
  • Host - enter the URL of an available GenePattern server.
  • Port - enter the port number of the GenePattern server you are using.
  • Username - enter a GenePattern username.
  • Password - enter the GenePattern password for that username, if required.


Special behavior for Marker and Array Selections

The PCA component defines a special behavior with respect to activated Marker or Array sets.

If the analysis variable is "experiments", then activated array sets will be respected, but activated marker sets will be ignored - all markers will be used.

If the analysis variable is "genes", the activated marker sets will be respected, but activated array sets will be ignored - all arrays will be used.
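The selection rule above can be sketched as a small function. This is only an illustration of the described behavior, not geWorkbench code: the function name, the row-per-marker matrix layout, and the use of index lists for activated sets are all assumptions, as is the choice to use every marker or array when no set is activated.

```python
def select_for_pca(matrix, variable, active_markers=None, active_arrays=None):
    """Return the sub-matrix that would be analyzed.

    matrix: one row per marker, one column per array (assumed layout).
    active_markers / active_arrays: lists of indices, or None when no set
    is activated (assumed here to mean "use everything").
    """
    n_markers = len(matrix)
    n_arrays = len(matrix[0])
    if variable == "experiments":
        rows = range(n_markers)  # activated marker sets are ignored
        cols = active_arrays if active_arrays is not None else range(n_arrays)
    elif variable == "genes":
        rows = active_markers if active_markers is not None else range(n_markers)
        cols = range(n_arrays)   # activated array sets are ignored
    else:
        raise ValueError("variable must be 'genes' or 'experiments'")
    return [[matrix[i][j] for j in cols] for i in rows]


# Made-up 3-marker x 3-array matrix with marker 0 and arrays 1, 2 activated.
data = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
by_experiments = select_for_pca(data, "experiments", [0], [1, 2])
by_genes = select_for_pca(data, "genes", [0], [1, 2])
```

Here `by_experiments` keeps all three markers but only the two activated arrays, while `by_genes` keeps only the activated marker across all three arrays.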

Results Viewer

After an analysis has run, highlighting one or more components in the list graphs the weight that each original variable contributes to them. Each principal component is drawn as a line in a different color. The table below the graph lists the individual weights that define each principal component.


PCA Components.png


A plot can be produced by selecting two or three components in the list and pushing the "Plot" button.

Selecting two components will produce a 2-D graph.


PCA Projection 2D.png


Selecting three components will produce a 3-D graph.

The 3-D graph can be rotated by left-clicking any data point and dragging it with the mouse.


PCA Projection.png
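The page does not describe how the viewer derives plot coordinates. In the standard PCA formulation, each data point is centered and then projected onto the selected components, giving one coordinate per component. The sketch below shows that calculation under this assumption; the function name `project` is hypothetical.

```python
def project(observations, components):
    """Project observations onto two or three component weight vectors.

    observations: list of rows, one value per original variable.
    components: list of 2 or 3 weight vectors, each as long as a row.
    Returns one (x, y) or (x, y, z) tuple per observation.
    """
    if len(components) not in (2, 3):
        raise ValueError("select two or three components")
    n = len(observations)
    p = len(observations[0])
    means = [sum(row[j] for row in observations) / n for j in range(p)]
    points = []
    for row in observations:
        centered = [row[j] - means[j] for j in range(p)]
        # One coordinate per selected component: dot(centered, weights).
        points.append(tuple(sum(c * w for c, w in zip(centered, weights))
                            for weights in components))
    return points


# Made-up data; the two "components" are simply the coordinate axes.
points_2d = project([[1, 2], [3, 4], [5, 6]], [[1, 0], [0, 1]])
```

Two selected components give 2-tuples suitable for a 2-D scatter plot, and three give 3-tuples for a 3-D plot.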


Left-clicking on a data point will also cause it to be highlighted in the Markers component.


PCA Selection.png


Below, the PCA result node is shown in the Workspace, along with a highlighted gene.


PCA Result.png


References

  • Raychaudhuri S, Stuart JM, and Altman RB. (2000) Principal components analysis to summarize microarray experiments: application to sporulation time series. Pac. Symp. Biocomput. 455-466. PubMed ID 10902193.