SOM

Revision as of 18:21, 13 September 2006 by Smith



Limitations

In version 1.0.4, the Hierarchical Clustering tutorial requires a machine with 1 GB of memory, and the Java heap size must be set to at least 450 MB. This is done in the file UILauncher.lax, which is found in the application root directory of the InstallAnywhere-packaged versions of geWorkbench. Specifically, set the following line in that file as shown:

lax.nl.java.option.java.heap.size.max=450678989
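If you maintain several installations, this edit can be scripted. The sketch below is a hypothetical helper, not part of geWorkbench: it takes the text of a UILauncher.lax file and raises the heap-size property to a minimum value, appending the line if it is missing. It assumes the value is given in bytes, as in the line above (450678989 bytes is roughly 450 MB).

```python
import re

# Property name copied from the UILauncher.lax line above.
# Assumption: the value is a plain integer number of bytes.
HEAP_KEY = "lax.nl.java.option.java.heap.size.max"


def ensure_heap_max(lax_text: str, min_bytes: int) -> str:
    """Return lax_text with the heap-size property raised to at least min_bytes.

    Hypothetical helper: an existing larger value is left unchanged, and the
    property is appended if the file does not define it.
    """
    pattern = re.compile(
        r"^" + re.escape(HEAP_KEY) + r"[ \t]*=[ \t]*(\d+)", re.MULTILINE
    )
    match = pattern.search(lax_text)
    if match is None:
        # Property missing: append it on its own line.
        sep = "" if not lax_text or lax_text.endswith("\n") else "\n"
        return f"{lax_text}{sep}{HEAP_KEY}={min_bytes}\n"
    if int(match.group(1)) >= min_bytes:
        return lax_text  # already large enough; leave the file untouched
    # Replace only the digits so line endings and spacing are preserved.
    return lax_text[: match.start(1)] + str(min_bytes) + lax_text[match.end(1):]


if __name__ == "__main__":
    sample = "lax.nl.java.option.java.heap.size.max=268435456\n"
    print(ensure_heap_max(sample, 450678989), end="")
```

Only the numeric value is rewritten, so the rest of the file, including its line endings, is left as it was.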

Version 1.0.3 and earlier versions of geWorkbench used a Fast Hierarchical Clustering algorithm that did not implement the commonly understood definitions of Average Linkage and Total (complete) Linkage. We do not recommend further use of those versions.
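For reference, the commonly understood linkage criteria define the distance between two clusters A and B from all pairwise distances between their members: single linkage takes the minimum, average linkage the mean, and complete linkage (which we assume corresponds to geWorkbench's "Total Linkage") the maximum. The sketch below illustrates these definitions with Euclidean distance as an example metric; it is not geWorkbench's code, and the function names are hypothetical.

```python
import math
from itertools import product
from typing import Callable, List, Sequence

Point = Sequence[float]
Metric = Callable[[Point, Point], float]


def _pair_distances(a: List[Point], b: List[Point], dist: Metric) -> List[float]:
    """Distances between every member of cluster a and every member of cluster b."""
    return [dist(x, y) for x, y in product(a, b)]


def single_linkage(a: List[Point], b: List[Point], dist: Metric = math.dist) -> float:
    # Closest pair of members.
    return min(_pair_distances(a, b, dist))


def average_linkage(a: List[Point], b: List[Point], dist: Metric = math.dist) -> float:
    # Mean over all |a| * |b| member pairs.
    d = _pair_distances(a, b, dist)
    return sum(d) / len(d)


def complete_linkage(a: List[Point], b: List[Point], dist: Metric = math.dist) -> float:
    # Farthest pair of members ("Total Linkage", by our assumption).
    return max(_pair_distances(a, b, dist))


if __name__ == "__main__":
    a = [(0.0, 0.0), (1.0, 0.0)]
    b = [(4.0, 0.0), (6.0, 0.0)]
    print(single_linkage(a, b), average_linkage(a, b), complete_linkage(a, b))
```

For the two small clusters in the usage example the pairwise distances are 4, 6, 3 and 5, so the three criteria give 3.0, 4.5 and 6.0 respectively.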

Outline

This section covers the following topics about clustering methods:

  1. What clustering is and how it can be used.
  2. Methods implemented in geWorkbench: SOMs and Hierarchical Clustering
  3. An example of performing an SOM analysis
  4. An example of performing a Hierarchical Clustering analysis.
  5. Saving a list of genes obtained from clustering.

Overview

Clustering methods identify groups of markers with similar expression patterns. A common application is to search for genes that appear to be co-regulated. A list of such markers, saved to the Markers component, can be used in further steps such as retrieving upstream sequences, Gene Ontology analysis, or viewing annotations.

geWorkbench supports two clustering methods:

  1. Self-Organizing Maps (SOMs)
  2. Hierarchical Clustering

Self-organizing maps group the markers into a user-specified number of bins. In geWorkbench, a SOM visualizer component displays the results graphically. Hierarchical clustering constructs a tree-like relationship among the expression patterns of all markers present. Results are viewed in the Dendrogram component.

SOM Example

  • Note: for the distance calculations used in SOM analysis to be valid, the data must have been normalized so that the scale of variation is the same on every array.
  • Load the microarray dataset "webmatrix_quantile_log2_dev1_mv0.exp", available in the tutorial_data.zip Download.
  • In the Arrays/Phenotypes component pulldown menu, select the group labeled "Class".
  • Activate two sets of arrays to compare, e.g. GC B-cell and non-GC B-cell, by checking the boxes before the names (these are chosen here because they are the smallest groups).
  • Go to the Analysis component, and select SOM Analysis.
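The normalization requirement in the note above can be illustrated with one common approach: rescaling each array to zero mean and unit standard deviation. This is a minimal sketch under that assumption; it is not necessarily the normalization geWorkbench applies, and the function name `scale_arrays` is hypothetical. Rows are markers and columns are arrays.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def scale_arrays(matrix: Sequence[Sequence[float]]) -> List[List[float]]:
    """Rescale each array (column) to mean 0 and population standard deviation 1.

    matrix[i][j] is the expression value of marker i on array j.
    """
    n_arrays = len(matrix[0])
    stats = []
    for j in range(n_arrays):
        column = [row[j] for row in matrix]
        sd = pstdev(column)
        # A constant array has no variation to rescale; its values become 0.
        stats.append((mean(column), sd if sd > 0 else 1.0))
    return [[(row[j] - m) / s for j, (m, s) in enumerate(stats)] for row in matrix]


if __name__ == "__main__":
    # Array 2 varies ten times more than array 1 before scaling.
    print(scale_arrays([[1.0, 10.0], [3.0, 30.0]]))
```

After scaling, both arrays in the example contribute equally to distances between markers, which is the property the note asks for.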

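Parameters:

  • Rows, Columns - the dimensions of the SOM grid; the marker expression patterns are separated into Rows × Columns bins.
  • Radius - the neighborhood radius, i.e. how far across the grid each training update spreads.
  • Iterations - the number of training iterations.
  • Alpha - the learning rate.
  • Function - the neighborhood function, either Bubble or Gaussian.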
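To show how the Rows, Columns, Radius, Iterations, Alpha and Function parameters interact, here is a minimal sketch of a generic Kohonen SOM. It is not geWorkbench's implementation: the random initialization from data rows, the linear decay of the learning rate and radius, and all function names are assumptions made for illustration.

```python
import math
import random
from typing import List, Sequence, Tuple


def neighborhood(d2: float, radius: float, function: str) -> float:
    """Weight for a grid node at squared grid distance d2 from the winning node."""
    if function == "bubble":
        # Bubble: every node within the radius is updated with full weight.
        return 1.0 if d2 <= radius * radius else 0.0
    if function == "gaussian":
        # Gaussian: the weight decays smoothly with grid distance from the winner.
        return math.exp(-d2 / (2.0 * radius * radius))
    raise ValueError(f"unknown neighborhood function: {function!r}")


def _sq_dist(u: Sequence[float], v: Sequence[float]) -> float:
    return sum((a - b) ** 2 for a, b in zip(u, v))


def best_node(nodes: List[List[List[float]]], x: Sequence[float]) -> Tuple[int, int]:
    """(row, column) of the grid node whose profile is closest to x."""
    cells = ((i, j) for i in range(len(nodes)) for j in range(len(nodes[0])))
    return min(cells, key=lambda ij: _sq_dist(nodes[ij[0]][ij[1]], x))


def train_som(data: Sequence[Sequence[float]], rows: int, columns: int,
              radius: float, iterations: int, alpha: float,
              function: str = "bubble", seed: int = 0) -> List[List[List[float]]]:
    rng = random.Random(seed)
    # One prototype expression profile per grid node, copied from random markers.
    nodes = [[list(rng.choice(data)) for _ in range(columns)] for _ in range(rows)]
    for t in range(iterations):
        frac = t / iterations
        rate = alpha * (1.0 - frac)          # assumed: learning rate decays linearly
        r = max(radius * (1.0 - frac), 0.1)  # assumed: radius shrinks, kept > 0
        x = rng.choice(data)
        bi, bj = best_node(nodes, x)
        for i in range(rows):
            for j in range(columns):
                h = neighborhood((i - bi) ** 2 + (j - bj) ** 2, r, function)
                if h > 0.0:
                    w = nodes[i][j]
                    for k in range(len(w)):
                        # Pull the node's profile a fraction rate * h toward x.
                        w[k] += rate * h * (x[k] - w[k])
    return nodes


def assign_bins(data: Sequence[Sequence[float]],
                nodes: List[List[List[float]]]) -> List[Tuple[int, int]]:
    """Bin (row, column) for each marker: its best-matching grid node."""
    return [best_node(nodes, x) for x in data]
```

A 3 × 3 grid, such as `train_som(data, rows=3, columns=3, radius=2.0, iterations=300, alpha=0.5, function="gaussian")`, yields nine bins; `assign_bins` then places each marker profile in its nearest bin. With Bubble, every node inside the radius moves by the same amount, while Gaussian moves nearby nodes more than distant ones.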

The default parameters are shown below; we will accept them.

T SOM Analysis Parameters.png


The resulting display of nine clusters is shown below. The user should experiment with different parameters to attempt to discern informative groupings.


T SOM display.png


Right-click any individual graph and choose "Add to Set" to add its markers to a new set in the Markers component. Each new set is given a name starting with "Cluster Grid", and the number of markers it contains is shown.

Hierarchical Clustering - Example

  • Load the microarray dataset "webmatrix_quantile_log2_dev1.2_mv0.exp", available in the tutorial_data.zip Download.
  • In this tutorial, we are going to include all arrays.
    • For a faster example, you could select subsets of arrays or markers in the Markers and Arrays/Phenotypes components.
    • To do so, choose a set in the pulldown menu of the component and check the boxes in front of the desired subsets.
  • Go to the Analysis component, and select Fast Hierarchical Clustering Analysis.
  • In Hierarchical Clustering, set the parameters to:
    • Clustering Method: Total Linkage
    • Clustering Dimension: Marker
    • Clustering Metric: Pearson's
  • Click Analyze.
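The parameter choices above can be sketched as a plain agglomerative clustering. Clustering Dimension "Marker" means each marker's expression profile across the arrays is one item; we assume the Pearson's metric is used as the distance 1 − r, and that Total Linkage is complete linkage (the farthest pair of members defines the distance between two clusters). This naive version is for illustration only and is not geWorkbench's algorithm; `cluster_markers` and `pearson_distance` are hypothetical names.

```python
import math
from typing import Dict, List, Sequence, Tuple, Union

# A dendrogram: a marker index (leaf) or a pair of subtrees.
Tree = Union[int, Tuple["Tree", "Tree"]]


def pearson_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """1 - Pearson r: 0 for perfectly correlated, 2 for perfectly anti-correlated."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    if su == 0.0 or sv == 0.0:
        raise ValueError("correlation is undefined for a constant profile")
    return 1.0 - cov / (su * sv)


def cluster_markers(profiles: List[Sequence[float]]) -> Tree:
    """Merge the two closest clusters (complete linkage) until one tree remains."""
    if not profiles:
        raise ValueError("no markers to cluster")
    n = len(profiles)
    # The pairwise matrix grows with the square of the number of markers.
    d = [[pearson_distance(profiles[i], profiles[j]) for j in range(n)] for i in range(n)]
    # cluster id -> (subtree, indices of the markers in that subtree)
    clusters: Dict[int, Tuple[Tree, List[int]]] = {i: (i, [i]) for i in range(n)}
    next_id = n
    while len(clusters) > 1:
        ids = list(clusters)
        best = None
        for x in range(len(ids)):
            for y in range(x + 1, len(ids)):
                a, b = ids[x], ids[y]
                # Complete linkage: the farthest pair of members.
                dist = max(d[i][j] for i in clusters[a][1] for j in clusters[b][1])
                if best is None or dist < best[0]:
                    best = (dist, a, b)
        _, a, b = best
        tree_a, members_a = clusters.pop(a)
        tree_b, members_b = clusters.pop(b)
        clusters[next_id] = ((tree_a, tree_b), members_a + members_b)
        next_id += 1
    return next(iter(clusters.values()))[0]


if __name__ == "__main__":
    # Markers 0 and 1 rise together across four arrays; markers 2 and 3 fall together.
    print(cluster_markers([[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1], [8, 6, 4, 2]]))
```

Because Pearson correlation ignores overall scale, markers 0 and 1 in the example are at distance 0 even though marker 1's values are twice as large, so co-regulated genes with different expression levels still cluster together.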

The results are placed in the Project Folders component and labeled "Hierarchical Clustering", and can be displayed in the Dendrogram component.

  • Go to the Dendrogram visualization component.
  • Change the width setting from 20 to 6 so that you can see all the arrays.
  • Scroll down towards the bottom of the results, and find the region pictured below.
  • Click the "Enable Zoom" button and, using the mouse cursor, select a subset of markers such as depicted here:


Tutorial-Dendrogram-ZoomSelection.png


  • Now left-click on this selection and it will be displayed alone:


Tutorial-Dendrogram-ZoomSelected.png


  • Right-click anywhere in this image and select "Add to Set", as shown above. The markers will be placed in a new marker set, labeled "Cluster Tree", with 84 members:


Tutorial-HierClust-84ClusterTreeMarkers.png
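Conceptually, selecting a branch of the dendrogram selects every marker (leaf) beneath that node, and "Add to Set" collects those markers. The sketch below shows this with a hypothetical nested-tuple representation of the tree; geWorkbench's internal data structures are not described here.

```python
from typing import List, Tuple, Union

# Hypothetical representation: a leaf is a marker name, an inner node a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def leaves(tree: Tree) -> List[str]:
    """Marker names under a dendrogram node, in left-to-right order.

    Uses an explicit stack so that very deep trees do not hit Python's recursion limit.
    """
    out: List[str] = []
    stack: List[Tree] = [tree]
    while stack:
        node = stack.pop()
        if isinstance(node, tuple):
            # Push the right child first so the left child is visited first.
            stack.append(node[1])
            stack.append(node[0])
        else:
            out.append(node)
    return out


if __name__ == "__main__":
    selected_branch = (("m1", "m2"), ("m3", ("m4", "m5")))
    members = leaves(selected_branch)
    print(len(members), members)
```

The size of the resulting list is the member count shown next to the new set, such as the 84 markers in the example above.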


Saving the list of genes

For use in future examples, you can save a list of genes from the Markers panel:

  • Highlight the set's entry in the Markers component (Cluster Tree[84]).
  • Right-click and select "Save".
  • Enter a name. We saved the list from the hierarchical clustering example above as "cluster_tree_total_pearsons_84_markers.csv" (the .csv extension is added automatically). This list is available under this name in the tutorial data download.
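The saved list can also be read outside geWorkbench. The exact layout of the file is not described here, so the sketch below assumes one marker identifier per row in the first column; if your file has a header row, drop the first entry. The function name `read_marker_list` is hypothetical.

```python
import csv
from typing import Iterable, List


def read_marker_list(lines: Iterable[str]) -> List[str]:
    """Marker identifiers from the first CSV column, skipping blank rows (assumed layout)."""
    markers: List[str] = []
    for row in csv.reader(lines):
        if row and row[0].strip():
            markers.append(row[0].strip())
    return markers


# Usage with the tutorial file:
# with open("cluster_tree_total_pearsons_84_markers.csv", newline="") as fh:
#     markers = read_marker_list(fh)
```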