Difference between revisions of "SOM"


Revision as of 16:58, 6 June 2006



Note

(June 6, 2006) A problem has been identified in the Hierarchical Clustering analysis, affecting the implementations of Average Linkage and Total Linkage. It is present in the current version (1.0.3) of geWorkbench and in all previous versions, and will be fixed in the next release. The implementation of Single Linkage is believed to be working correctly.

Background

Clustering methods help identify groups of markers with similar expression patterns. geWorkbench supports two clustering methods:

  1. Self-organizing maps (SOMs)
  2. Hierarchical Clustering

Self-organizing maps group the markers into a user-specified number of bins. In geWorkbench, a SOM visualizer component displays the results graphically. Hierarchical clustering constructs a tree-like relationship among the expression patterns of all markers present. Results are viewed in the Dendrogram component.
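To make the idea of grouping markers into a fixed number of bins more concrete, here is a minimal Python sketch of a self-organizing map on a 3 x 3 grid (nine bins). It is an illustration only and is not geWorkbench's implementation: the function names, the learning-rate and neighborhood schedules, and the default parameter values are all assumptions chosen for brevity.

```python
import math
import random


def best_matching_unit(weights, profile):
    """Return the grid node whose weight vector is closest to the profile."""
    return min(
        weights,
        key=lambda node: sum((w - x) ** 2 for w, x in zip(weights[node], profile)),
    )


def train_som(profiles, rows=3, cols=3, epochs=50, lr0=0.5, radius0=1.5, seed=0):
    """Train a small rows x cols SOM on marker expression profiles.

    profiles is a list of equal-length lists of floats (one list per marker).
    All schedule parameters are hypothetical defaults for this sketch.
    """
    rng = random.Random(seed)
    dim = len(profiles[0])
    nodes = [(r, c) for r in range(rows) for c in range(cols)]
    # Initialise each node with a copy of a randomly chosen profile.
    weights = {node: list(rng.choice(profiles)) for node in nodes}
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                      # learning rate decays linearly
        radius = max(radius0 * (1.0 - frac), 0.5)    # neighborhood shrinks, floor 0.5
        order = profiles[:]
        rng.shuffle(order)
        for profile in order:
            bmu = best_matching_unit(weights, profile)
            for node in nodes:
                grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                influence = math.exp(-grid_d2 / (2.0 * radius * radius))
                w = weights[node]
                for i in range(dim):
                    # Pull the node toward the profile, scaled by grid proximity.
                    w[i] += lr * influence * (profile[i] - w[i])
    return weights


def assign_bins(weights, profiles):
    """Map each grid node to the indices of the markers it wins."""
    bins = {node: [] for node in weights}
    for idx, profile in enumerate(profiles):
        bins[best_matching_unit(weights, profile)].append(idx)
    return bins


if __name__ == "__main__":
    toy = [[0.1, 0.2, 0.1], [0.0, 0.3, 0.2], [5.0, 5.1, 4.9], [5.2, 4.8, 5.0]]
    som = train_som(toy)
    for node, members in sorted(assign_bins(som, toy).items()):
        print(node, members)
```

Every marker ends up in exactly one bin, and bins that are neighbors on the grid tend to hold similar profiles, which is what a SOM display is meant to convey.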


Preparation

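For the clustering examples, load a microarray dataset such as "webmatrix_quantile_log2_dev1_mv0.exp", available in tutorial_data.zip on the Download page. This file was derived from "webmatrix.exp" by quantile normalization, log2 transformation, a deviation filter (deviation bound of 1), and a missing values filter (maximum of 0 missing arrays).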
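As a rough illustration of the kind of preprocessing the file name points to (quantile normalization, a log2 transformation, and deviation filtering), the following Python sketch applies those three steps to a tiny matrix. It is not geWorkbench code. The function names are hypothetical, the data are assumed to be complete and positive, and the reading of the deviation filter as "keep markers whose sample standard deviation across arrays is at least the bound" is an assumption.

```python
import math
import statistics


def quantile_normalize(matrix):
    """Give every array (column) the same value distribution.

    matrix[m][a] is the value of marker m on array a. Ties are broken by
    marker order, which is a simplification.
    """
    n_markers = len(matrix)
    n_arrays = len(matrix[0])
    columns = [[matrix[m][a] for m in range(n_markers)] for a in range(n_arrays)]
    sorted_cols = [sorted(col) for col in columns]
    # Mean of the k-th smallest value across all arrays.
    rank_means = [
        sum(col[k] for col in sorted_cols) / n_arrays for k in range(n_markers)
    ]
    result = [[0.0] * n_arrays for _ in range(n_markers)]
    for a, col in enumerate(columns):
        order = sorted(range(n_markers), key=lambda m: col[m])
        for rank, m in enumerate(order):
            result[m][a] = rank_means[rank]
    return result


def log2_transform(matrix):
    """Apply log2 to every value (values must be positive)."""
    return [[math.log2(v) for v in row] for row in matrix]


def deviation_filter(matrix, bound=1.0):
    """Return indices of markers whose standard deviation is >= bound (assumed rule)."""
    return [m for m, row in enumerate(matrix) if statistics.stdev(row) >= bound]


if __name__ == "__main__":
    raw = [[2.0, 4.0], [8.0, 6.0], [4.0, 2.0]]
    normalized = quantile_normalize(raw)
    logged = log2_transform(normalized)
    print(deviation_filter(logged, bound=0.5))
```

The filter returns marker indices rather than rows so that marker identities are not lost when rows are dropped.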


SOM Example

  • In the Arrays/Phenotypes component pulldown menu, select the group labeled "Class".
  • Activate two sets of arrays to compare, the GC B-cell and non-GC B-cell, by checking the boxes before the names.
  • Go to the Analysis component, and select SOM Analysis.

The default parameters are shown below. We will accept these parameters.

Image:T_SOM_Analysis_Parameters.png

The resulting display of nine clusters is shown here:

Image:T_SOM_display.png


Hierarchical Clustering - Example

  • In the Arrays/Phenotypes component, select the set of arrays labeled "Class".
  • Activate two classes of arrays to compare, the GC B-cell and non-GC B-cell, by checking the boxes before the names.
  • Go to the Analysis component, and select Hierarchical Clustering.
  • At the bottom of the Analysis component, make sure the All Arrays box is unchecked, so that clustering is run only on the arrays activated in the Arrays/Phenotypes component.
  • In Hierarchical Clustering, set the parameters to:
    • Clustering Method: Total Linkage
    • Clustering Dimension: Both
    • Clustering Metric: Euclidean
  • Click Analyze.

The results will be displayed in the Dendrogram component.

Image:T_HierarchicalClustering_BCregion.png
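For readers who want to see what the parameter choices above compute, here is a minimal Python sketch of agglomerative clustering with a Euclidean metric, applied to both dimensions (markers and arrays). It assumes that "Total Linkage" corresponds to what is usually called complete linkage, where the distance between two clusters is the largest pairwise distance between their members. The function names are hypothetical, and this is not the geWorkbench implementation.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def complete_linkage(vectors):
    """Agglomerative clustering; returns merges as (members_a, members_b, distance)."""
    clusters = [[i] for i in range(len(vectors))]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Complete linkage: the farthest pair defines the cluster distance.
                d = max(
                    euclidean(vectors[a], vectors[b])
                    for a in clusters[i]
                    for b in clusters[j]
                )
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


def cluster_both(matrix):
    """Cluster markers (rows) and arrays (columns) of matrix[marker][array]."""
    marker_merges = complete_linkage(matrix)
    array_merges = complete_linkage([list(col) for col in zip(*matrix)])
    return marker_merges, array_merges


if __name__ == "__main__":
    toy = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0]]
    markers, arrays = cluster_both(toy)
    print(markers)
    print(arrays)
```

The sequence of merges, read from first to last, is the information a dendrogram draws from the leaves up to the root. This brute-force version recomputes all distances at every step and is only suitable for very small inputs.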

Scrolling down a bit reveals a large region of interest that shows clear differences between groups of arrays. We will select two clearly differentiated clusters from it. Check the Enable Zoom checkbox, then highlight the first cluster of 12 markers as shown here:

Image:T_HierarchicalClustering_BC12Markers.png


Then left-click to select this subset of the dendrogram. It will be displayed alone.

Image:T_HierarchicalClustering_BC12MarkersZoom.png


Now right-click and select "Add to Set". In the Markers component, the selected markers are added as Cluster Tree [12], where 12 is the number of markers selected.


Repeat for the similar region just below, which contains another 44 markers.


Image:T_HierarchicalClustering_BC44Markers.png


This will result in two sets of markers having been added to the Markers component, as shown below:

Image:T_Markers_ClusterTree12and44.png