Difference between revisions of "SOM"

Line 7: Line 7:
  
  
The file "webmatrix.exp" contains results from 100 Affymetrix HG-U95Av2 chips containing B-cell samples from numerous different disease states (phenotypes).  12600 markers are represented.  To prepare this dataset for clustering we will filter and normalize the data.  The steps shown are just an example of how filtering and normalization can be used, and each dataset should be handled according to the type of analysis being undertaken and its goals.
+
The file "webmatrix.exp" contains results from 100 Affymetrix HG-U95Av2 chips containing B-cell samples from numerous different disease states (phenotypes).  12600 markers are represented.  To prepare this dataset for clustering we will normalize and filter the data.  The steps shown below are just an example of how filtering and normalization can be used, and each dataset should be handled according to the type of analysis being undertaken and its goals.
  
For this dataset, we performed the following steps:
+
The dataset was created by the following steps:
 +
* Normalization: Quantile normalization.
 +
* Normalization: Log2 transformation.
 +
* Filtering: Deviation filter with Deviation bound of 1.
 +
* Filtering: Missing values filter with maximum number of missing arrays of 0.
  
1. Applied '''Expression Threshold Filter''' to remove very low expression values in the range 0-20.
+
The results of performing these steps are also available as the file "webmatrix_quantile_log2_dev1_mv0.exp" in the tutorial data section (coming soon).
  
2. Applied the '''Missing Values Filter''' with a maximum number of missing values per marker of 2. (Deletes markers with more than 2 missing values).  This reduced the number of markers to 6327.
+
==Example==
  
3. Performed '''Quantile Normalization''' using '''Averaging Method''' of '''Mean Marker Profile'''.
+
* In the Arrays/Phenotypes component, select the set of arrays labeled "ultrashort designation".
 +
* Activate two classes of arrays to compare, the GC B-cell and non-GC B-cell, by checking the boxes before the names.
 +
* Go to the Analysis component, and select Hierarchical Clustering.
 +
* At the bottom of the Analysis component, '''uncheck the box that says "All Arrays"'''. This will allow the clustering to be done only on those arrays which were activated in the Arrays/Phenotypes component. 
  
4. Applied the '''Deviation Filter''' with Deviation Bound of 20 and '''Missing Values''' set to '''Marker Average'''.
 
  
5. Applied the '''Missing Values Filter''' as in (2), which further reduced the number of markers to 6270.
+
* In Hierarchical Clustering, set the parameters to:
 +
** Clustering Method: Total Linkage
 +
** Clustering Dimension: Both
 +
** Clustering Metric: Euclidean
  
The resulting dataset was named '''webmatrix_fn.exp'''.
+
*Click '''Analyze'''.
  
 +
The results will be displayed in the Dendrogram component.
  
==Fast Hierarchical Clustering==
+
[[Image:T_HierarchicalClustering_BCregion.png]]
  
'''Fast Hierarchical Clustering''' is found in the '''Analysis Panel'''.
+
By scrolling down a bit, one finds a large interesting area, showing clear differences between groups of arrays.  We will select two clearly differentiated clusters.  Check the '''Enable Zoom''' checkbox. Then highlight the first cluster of 12 markers as shown here:
  
In this example we show Hierarchical Clustering being performed with the following options:
+
[[Image:T_HierarchicalClustering_BC12Markers.png]]
  
1. Clustering Method:  "Total Linkage"
 
  
2. Clustering Dimension: "Both"
+
Then left-click to select this subset of the dendrogram.  It will be displayed alone.  
  
3. Clustering Metric: "Euclidean"
+
[[Image:T_HierarchicalClustering_BC12MarkersZoom.png]]
  
  
[[Image:T_Analysis_FHC.png]]
+
Now right-click and select "Add to Set".  In the Markers component, the selected genes are added as Cluster Tree [12], where 12 is the number of markers selected.
  
  
Hit '''Analyze''' to run the clustering. The resulting dataset is inserted into the '''Project Panel'''
+
Repeat for the similar region just below, which contains another 44 markers.
  
  
[[Image:T_ProjectFolder_HierarchClust.png]]
+
[[Image:T_HierarchicalClustering_BC44Markers.png]]
 
 
 
 
and can be viewed in '''Dendrogram'''.
 
 
 
 
 
==Selecting a subtree in '''Dendrogram'''==
 
 
 
Here we will pick a subtree near the top for further investigation.
 
 
 
1. Click '''Enable Zoom'''.
 
 
 
2. Position the mouse pointer over the cluster subtree of interest.  It will be highlighted in blue.
 
 
 
 
 
[[Image:T_Dendrogram_SelectCluster.png]]
 
 
 
 
 
3. Left-click on the highlighted subtree to view it alone.
 
 
 
4. By right-clicking on the image, and selecting '''Add to Set''' (note that the picture uses a previous notation, "Add to Panel"),
 
 
 
[[Image:T_Dendrogram_ClusterDetailAdd.png]]
 
 
 
 
 
the markers in this subtree can be added as a new marker set in '''Markers.'''  It will be given the default name "Cluster Tree".
 
 
 
 
 
[[Image:T_MarkerSets_ClusterTree.png]]
 

Revision as of 20:30, 24 April 2006




Preparation: An example of filtering and normalization

The file "webmatrix.exp" contains results from 100 Affymetrix HG-U95Av2 chips containing B-cell samples from numerous different disease states (phenotypes). 12600 markers are represented. To prepare this dataset for clustering we will normalize and filter the data. The steps shown below are just an example of how filtering and normalization can be used, and each dataset should be handled according to the type of analysis being undertaken and its goals.

The dataset was created by the following steps:

  • Normalization: Quantile normalization.
  • Normalization: Log2 transformation.
  • Filtering: Deviation filter with Deviation bound of 1.
  • Filtering: Missing values filter with maximum number of missing arrays of 0.

The results of performing these steps are also available as the file "webmatrix_quantile_log2_dev1_mv0.exp" in the tutorial data section (coming soon).
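The four preparation steps can be sketched outside geWorkbench as follows. This is a minimal illustration in plain Python, not geWorkbench's own implementation. It rests on several assumptions: markers are rows and arrays are columns, missing values are represented as `None`, the deviation filter drops markers whose population standard deviation across arrays is below the bound, and all values are positive so that log2 is defined. For simplicity the sketch removes markers with missing values first so that quantile normalization sees complete columns, whereas the tutorial lists that filter last. The function names are hypothetical.

```python
import math
from statistics import mean, pstdev


def drop_missing(rows, max_missing=0):
    """Keep markers (rows) with at most max_missing missing (None) values."""
    return [r for r in rows if sum(v is None for v in r) <= max_missing]


def quantile_normalize(rows):
    """Give every array (column) the same value distribution.

    The k-th smallest value in each column is replaced by the mean of the
    k-th smallest values over all columns. Ties are broken by row order.
    """
    n_rows, n_cols = len(rows), len(rows[0])
    columns = [[rows[i][j] for i in range(n_rows)] for j in range(n_cols)]
    # order[j][k] is the row index of the k-th smallest value in column j.
    order = [sorted(range(n_rows), key=lambda i, c=col: c[i]) for col in columns]
    rank_means = [
        mean(columns[j][order[j][k]] for j in range(n_cols))
        for k in range(n_rows)
    ]
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for j in range(n_cols):
        for k, i in enumerate(order[j]):
            out[i][j] = rank_means[k]
    return out


def log2_transform(rows):
    """Replace every value by its base-2 logarithm (values must be > 0)."""
    return [[math.log2(v) for v in r] for r in rows]


def deviation_filter(rows, bound=1.0):
    """Keep markers whose standard deviation across arrays is >= bound."""
    return [r for r in rows if pstdev(r) >= bound]


def prepare(rows):
    """Run the steps; the missing-values filter is moved first (assumption)."""
    rows = drop_missing(rows, max_missing=0)
    rows = quantile_normalize(rows)
    rows = log2_transform(rows)
    return deviation_filter(rows, bound=1.0)


# Toy matrix: 4 markers x 3 arrays, with one missing value.
toy = [
    [2.0, 4.0, 16.0],
    [8.0, 8.0, None],
    [4.0, 2.0, 4.0],
    [16.0, 32.0, 8.0],
]
prepared = prepare(toy)
print(len(prepared), "marker(s) kept")
```

Because the order of the steps changes which values enter the quantile normalization, results from this sketch should not be expected to match the tutorial file exactly.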

Example

  • In the Arrays/Phenotypes component, select the set of arrays labeled "ultrashort designation".
  • Activate two classes of arrays to compare, the GC B-cell and non-GC B-cell, by checking the boxes before the names.
  • Go to the Analysis component, and select Hierarchical Clustering.
  • At the bottom of the Analysis component, uncheck the box that says "All Arrays". This restricts the clustering to the arrays that were activated in the Arrays/Phenotypes component.


  • In Hierarchical Clustering, set the parameters to:
    • Clustering Method: Total Linkage
    • Clustering Dimension: Both
    • Clustering Metric: Euclidean
  • Click Analyze.

The results will be displayed in the Dendrogram component.
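To make the parameter choices above concrete, here is a small sketch of agglomerative clustering with a Euclidean metric. It assumes that "Total Linkage" corresponds to what is usually called complete linkage, where the distance between two clusters is the largest distance between any pair of their members, and that "Both" means the procedure is run once on markers (rows) and once on arrays (columns). It is a naive illustration written for readability, not the algorithm geWorkbench uses, and the function names are hypothetical.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def complete_linkage(vectors):
    """Cluster vectors bottom-up; return the tree as nested tuples of indices."""
    # Each cluster is (tree, member_indices); start with one cluster per vector.
    clusters = [(i, [i]) for i in range(len(vectors))]
    while len(clusters) > 1:
        best = None
        for (p, (_, mp)), (q, (_, mq)) in combinations(enumerate(clusters), 2):
            # Complete linkage: cluster distance is the largest pairwise distance.
            d = max(euclidean(vectors[i], vectors[j]) for i in mp for j in mq)
            if best is None or d < best[0]:
                best = (d, p, q)
        _, p, q = best
        merged = (
            (clusters[p][0], clusters[q][0]),
            clusters[p][1] + clusters[q][1],
        )
        clusters = [c for k, c in enumerate(clusters) if k not in (p, q)]
        clusters.append(merged)
    return clusters[0][0]


# Toy matrix: 4 markers (rows) x 2 arrays (columns).
matrix = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
marker_tree = complete_linkage(matrix)            # cluster the markers
array_tree = complete_linkage(list(zip(*matrix)))  # cluster the arrays
print(marker_tree, array_tree)
```

The pairwise search is recomputed at every merge, which is far too slow for thousands of markers; it is meant only to show what the three settings control.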

T HierarchicalClustering BCregion.png

Scrolling down reveals a large region with clear differences between groups of arrays. We will select two clearly differentiated clusters. Check the Enable Zoom checkbox, then highlight the first cluster of 12 markers as shown here:

T HierarchicalClustering BC12Markers.png


Then left-click to select this subset of the dendrogram. It will be displayed alone.

T HierarchicalClustering BC12MarkersZoom.png


Now right-click and select "Add to Set". In the Markers component, the selected genes are added as Cluster Tree [12], where 12 is the number of markers selected.
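Conceptually, "Add to Set" collects every marker under the selected subtree and stores the list under a name that includes the marker count. The sketch below illustrates that idea. It assumes the dendrogram is represented as nested tuples whose leaves are row indices, and the marker names and function names are hypothetical.

```python
def leaves(tree):
    """Return the leaf indices under a (sub)tree, in left-to-right order."""
    if isinstance(tree, tuple):
        return leaves(tree[0]) + leaves(tree[1])
    return [tree]


def add_to_set(marker_sets, subtree, marker_names):
    """Store the markers under subtree as 'Cluster Tree [n]'; return the name."""
    members = [marker_names[i] for i in leaves(subtree)]
    name = f"Cluster Tree [{len(members)}]"
    marker_sets[name] = members
    return name


# Hypothetical marker names and a small tree over their indices.
names = ["marker_a", "marker_b", "marker_c", "marker_d"]
tree = ((0, (2, 1)), 3)
sets = {}
created = add_to_set(sets, tree[0], names)  # select the left subtree
print(created, sets[created])
```

In this sketch, two subtrees with the same number of markers would produce the same set name and the second would overwrite the first.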


Repeat for the similar region just below, which contains another 44 markers.


T HierarchicalClustering BC44Markers.png