Difference between revisions of "SOM"

Latest revision as of 19:28, 22 January 2014




Overview

The SOM (Self-Organizing Maps) method clusters markers into a user-specified number of bins based on their similarity to each other. In geWorkbench, a SOM visualizer component displays the results graphically. The markers clustered into a particular bin can be saved as a set back to the Markers component for further analysis.

  • Note - for the distance calculations used in SOM analysis to be valid, the data should be normalized such that the scale of variation over each array is equal.
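As an illustration of the note above, the following is a minimal sketch of one possible way to give every array the same scale of variation: each array (column) is standardized to mean 0 and standard deviation 1. The function name `standardize_arrays` and the list-of-rows layout are assumptions made for this example. This is not necessarily what the geWorkbench normalization components do.

```python
from statistics import mean, pstdev

def standardize_arrays(matrix):
    """Scale each array (column) to mean 0 and standard deviation 1.

    `matrix` is a list of marker rows; each row holds one value per array.
    """
    columns = list(zip(*matrix))
    scaled_columns = []
    for col in columns:
        mu, sigma = mean(col), pstdev(col)
        if sigma == 0:
            # A constant array carries no variation; map it to zeros.
            scaled_columns.append([0.0] * len(col))
        else:
            scaled_columns.append([(v - mu) / sigma for v in col])
    # Transpose back so that each row is again one marker.
    return [list(row) for row in zip(*scaled_columns)]

# Two arrays measured on very different scales.
data = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
scaled = standardize_arrays(data)
```

After scaling, the two arrays in the toy data contribute equally to a Euclidean distance between markers, even though the second array was originally ten times larger.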

A more formal description: the Self-Organizing Map (SOM) is an algorithm that clusters real-valued vectors defined on an instance space of a given dimensionality. The clusters found are described by prototypical instances, referred to as the neurons of the SOM, which are arranged topologically in a one- or two-dimensional grid, the Self-Organizing Map.
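The idea of prototypes arranged on a grid can be sketched as follows. This is a minimal illustration, assuming Euclidean distance and a nested-list grid. The name `best_matching_unit` is hypothetical and does not come from geWorkbench.

```python
import math

def best_matching_unit(neurons, vector):
    """Return the (row, col) of the neuron whose prototype is closest to `vector`."""
    best_pos, best_dist = None, math.inf
    for r, row in enumerate(neurons):
        for c, prototype in enumerate(row):
            d = math.dist(prototype, vector)  # Euclidean distance (assumed metric)
            if d < best_dist:
                best_pos, best_dist = (r, c), d
    return best_pos

# A 2 x 2 grid of two-dimensional prototypes.
grid = [[[0.0, 0.0], [0.0, 1.0]],
        [[1.0, 0.0], [1.0, 1.0]]]
winner = best_matching_unit(grid, [0.9, 0.2])  # (1, 0)
```

The vector is assigned to the cluster of the neuron at grid position (1, 0), because that prototype is the nearest one.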

SOM Parameters

SOM Parameters.png


Number of Rows and Columns

The final graphical display will be laid out using the given number of rows and columns. The product of the two gives the number of bins into which the markers will be clustered. Both rows and columns must be greater than 0.
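The relationship between rows, columns and bins can be sketched as below. The row-major numbering of bins is an assumption made for this example; the source only states that the number of bins is the product of rows and columns.

```python
def bin_position(bin_index, rows, cols):
    """Map a bin index to its (row, col) position, assuming row-major order."""
    if rows <= 0 or cols <= 0:
        raise ValueError("rows and columns must both be greater than 0")
    if not 0 <= bin_index < rows * cols:
        raise IndexError("bin index outside the grid")
    return divmod(bin_index, cols)

rows, cols = 3, 3
n_bins = rows * cols                      # 9 bins
position = bin_position(5, rows, cols)    # (1, 2)
```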

Radius

When the Bubble neighborhood function is selected, this floating-point value defines the extent of the neighborhood. If an SOM vector is within this distance of the winning node (the cluster to which an element has been assigned), then that node and its SOM vector are considered to be in the neighborhood, and the SOM vector is adapted. The radius must be greater than 0.
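A minimal sketch of the radius test follows. It assumes that distance is measured between grid coordinates and that a node exactly at the radius counts as inside. Both are assumptions for illustration, as the text above does not settle either point.

```python
import math

def bubble_neighborhood(winner, rows, cols, radius):
    """List the grid positions within `radius` of the winning node."""
    if radius <= 0:
        raise ValueError("radius must be greater than 0")
    wr, wc = winner
    return [(r, c)
            for r in range(rows)
            for c in range(cols)
            if math.hypot(r - wr, c - wc) <= radius]

# Centre node of a 3 x 3 map with radius 1: the centre plus its four
# direct neighbours are selected; the diagonal nodes are not.
neighbours = bubble_neighborhood((1, 1), 3, 3, 1.0)
```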

Iterations

The number of times the dataset will be presented to the Map. Each expression element will be presented this number of times to train the Nodes. This must be a number greater than 0.
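How the iteration count drives training can be sketched as a simple loop. This is an illustrative sketch only: it assumes a bubble-style neighborhood on grid coordinates, a fixed learning rate and radius (no decay over time, since none is described here), and prototypes initialized from randomly chosen data vectors. The name `train_som` and all of its parameters are hypothetical.

```python
import math
import random

def train_som(vectors, rows, cols, iterations, alpha, radius, seed=0):
    """Present every vector to the map `iterations` times and return the nodes."""
    if iterations <= 0:
        raise ValueError("iterations must be greater than 0")
    rng = random.Random(seed)
    # One prototype per grid position, initialized from the data (assumption).
    nodes = {(r, c): list(rng.choice(vectors))
             for r in range(rows) for c in range(cols)}
    for _ in range(iterations):
        for x in vectors:
            # The winning node is the one whose prototype is nearest to x.
            winner = min(nodes, key=lambda pos: math.dist(nodes[pos], x))
            for pos, w in nodes.items():
                # Adapt only nodes within `radius` of the winner on the grid.
                if math.dist(pos, winner) <= radius:
                    nodes[pos] = [wi + alpha * (xi - wi) for wi, xi in zip(w, x)]
    return nodes

som = train_som([[0.0, 0.0], [1.0, 1.0]], rows=2, cols=2,
                iterations=10, alpha=0.5, radius=1.0)
```

Each pass of the outer loop presents the whole dataset once, so a larger iteration count gives the prototypes more opportunities to move toward the data.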


Alpha

This value is used to scale the change of individual SOM vectors when a new expression vector is associated with a node. This must be a value between 0 and 1.
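The scaling role of Alpha can be illustrated with the usual SOM update, in which a prototype moves a fraction Alpha of the way toward the presented expression vector. The function name `adapt` is hypothetical, and the exact formula used by geWorkbench is not given here, so this is a sketch under that assumption.

```python
def adapt(prototype, expression, alpha):
    """Move `prototype` toward `expression` by the fraction `alpha`."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return [w + alpha * (x - w) for w, x in zip(prototype, expression)]

updated = adapt([0.0, 2.0], [1.0, 0.0], 0.5)  # [0.5, 1.0]
```

With Alpha equal to 0 the prototype does not move, and with Alpha equal to 1 it is replaced by the expression vector.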

Function

The neighborhood options indicate the conventions (formulas) used to update (adapt) an SOM vector once an expression vector has been added into a Node's neighborhood.

Bubble

This option uses the provided radius (see above) to determine which surrounding SOM nodes are in the neighborhood and are therefore candidates for adaptation. When this option is selected, the Alpha parameter for scaling the adaptation is used directly as provided by the user.

Gaussian

This option forces all SOM vectors in the network to be adapted regardless of proximity to the winning node. In this case the Alpha parameter is scaled based on the distance between the SOM vector to be adapted and the winning node's SOM vector.
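The difference between the two options can be sketched as the effective scaling factor applied to a node at a given distance from the winner. The Gaussian form `alpha * exp(-d^2 / (2 * sigma^2))` and the width parameter `sigma` are assumptions chosen for illustration; the exact formula is not stated above. The name `effective_alpha` is hypothetical.

```python
import math

def effective_alpha(function, alpha, distance, radius=1.0, sigma=1.0):
    """Scaling factor applied to a node at `distance` from the winning node."""
    if function == "bubble":
        # Inside the radius Alpha is used as given; outside, no adaptation.
        return alpha if distance <= radius else 0.0
    if function == "gaussian":
        # Every node is adapted, with Alpha shrinking as distance grows.
        return alpha * math.exp(-(distance ** 2) / (2.0 * sigma ** 2))
    raise ValueError("function must be 'bubble' or 'gaussian'")

near_bubble = effective_alpha("bubble", 0.5, distance=0.5)     # 0.5
far_bubble = effective_alpha("bubble", 0.5, distance=2.0)      # 0.0
far_gaussian = effective_alpha("gaussian", 0.5, distance=2.0)  # small but non-zero
```

Under these assumptions a distant node is left untouched by Bubble but still receives a small adjustment under Gaussian.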

Services (Grid)

SOM can be run either locally within geWorkbench, or remotely as a grid job on caGrid. See the Grid Services section for further details on setting up a grid job.

Analysis Actions

This component uses the standard analysis component framework, which provides three buttons:

  • Analyze - Start the clustering job.
  • Save Settings - Save the current settings to a named entry in the settings list.
  • Delete Settings - Delete the selected setting entry from the list.

SOM Example

For this example we will start with the data set used in the ANOVA tutorial. Briefly, the Bcell-100.exp dataset was quantile normalized and log2 transformed. A set of markers found by ANOVA analysis of four groups of arrays was saved to the Markers component. This is the set of markers that will be clustered here.

1. If desired, select a subset of markers and/or arrays on which to cluster. The following figure shows that the set of 1786 markers found in the ANOVA example has been activated by checking the adjacent check-box.

T HC set activation.png


2. Set the SOM parameters. The default parameters are shown above in the SOM Parameters section. We will accept these parameters.

3. Click Analyze. The result will be displayed graphically in the SOM Clusters Viewer.

4. The user might experiment with different parameter settings to attempt to discern informative groupings.

SOM Clusters Viewer

Running the example just described, with 3 rows and 3 columns, produces the nine clusters shown below.


T SOM result.png

Mouse over

Mousing over a particular point on any cluster will show the marker (probeset), the array and the expression value that the point is associated with.

T SOM mouse-over.png

Show selected

If the "Show selected" box is checked, the user may then click on any of the clusters and it will be enlarged to fill the display area. Unchecking the "Show selected" box will return to the original display of all clusters.

T SOM show selected.png

Left-click actions

Selecting

When a point on a cluster is clicked on with the mouse, the marker and array it corresponds to will be highlighted in the Markers and Arrays/Phenotypes components, respectively.

Zooming

While left-clicking on a cluster display, dragging the mouse downwards and to one side or the other will produce a selection box, which will have the effect of zooming in on the region selected. Dragging the mouse upwards will zoom back out.

Right-click menu

Right-clicking on a cluster produces the menu shown below.

T SOM right click menu.png

The individual choices are:


Zoom In/Zoom Out

Zoom in or zoom out in a particular cluster. Zooming can be done for both axes simultaneously, or individually using the sub-menus shown in the following figure.

T SOM ZoomIn detail.png

Auto Range

Return the cluster to original display size (fit to display area).

Image Snapshot

Add a snapshot of the selected cluster to the Workspace.

Add to Set

Add the markers in the selected cluster to a new set in the Markers component. The new set's name will start with "Cluster Grid".


Properties

The "Properties" item allows the title, scale, axis labels and other aspects of the cluster graphs to be customized.

T SOM Properties-title.png

T SOM Properties-Plot-Domain.png

T SOM Properties-Plot-Appearance.png

T SOM Properties-Other.png