Difference between revisions of "User:Smith"

 
Tutorial Design considerations -
1. Probably best not to use detailed section numbers, since we cannot autoupdate them in this wiki.  Instead, rely on links?
 
2. Each section should list example data files needed, and these should be part of distribution.
 
       
 
  
             
  
==Before you begin==
  
==Getting Started==
  
==Overview of the GUI and component interoperability==
 
  
The graphical user interface for geWorkbench is divided into four major sections:
  
1. Data management
  
2. Marker and Phenotype management
  
3. Visualization tools (primarily)
 
  
4. Analytical tools
  
  
The data management area (1) is called the Project Panel.  It can hold one workspace, and a workspace in turn can hold one or more projects.  Projects can be used as desired to group different data sets.  Any data file or analysis result is stored in a project.  A workspace and all the data it contains can be saved and reopened later.
 
  
  
The most important design goal of geWorkbench is to allow data produced or altered in one module to be easily transferred to other modules for successive analysis steps.  Two places hold shared data: the Project component (1) and the Panels component (2).  While the Project component holds files and various types of analysis result sets, the Panels component groups markers or phenotypes into panels.  These panels can then be selected so that further analysis uses only that particular subset of data.  For example, several analysis components produce lists of markers, and each such new list is placed into the Markers component as a new marker panel.  An example of using a phenotype panel is to group microarrays by their disease state.  In a series of tutorials below, we will demonstrate how a panel of markers is defined by selecting a cluster in the Hierarchical Clustering component, and how this panel of markers is then passed to the Sequence Retrieval component to begin sequence analysis.
  
 
A key feature of the GUI is that the modules displayed in the Visualization (3) and Analysis (4) areas depend on the type of data currently selected in the Project Panel. Thus you will see a different set of choices (tabs) when a microarray data set is selected, as compared to when a DNA or protein sequence file is selected.  When a new data file is loaded, or an analysis produces a new data set, not only is it added to the Project Panel, but an appropriate viewer in the Visualization area is automatically selected.
 
 
 
The GUI provides a menu bar at the top with a standard choice of commands.  Many commands that are available in the menu bar are also available by right-clicking on data objects.
 
 
 
 
 
==Tutorial: Loading and saving data==
 
 
 
 
 
 
 
===File types supported===
 
 
 
Expression
 
 
 
1. Affymetrix MAS5/GCOS (text files output by Affymetrix software)
 
 
 
2. Affymetrix File Matrix (.exp) (a geWorkbench-defined format)
 
 
 
3. RMAExpress Processed File
 
 
 
4. GenePix
 
 
 
5. '''Note - the type "Normalized no-confidence expression matrix" has the phenotype and gene labels switched - don't use until fixed.'''
 
 
 
 
 
Genotypic
 
 
 
1. '''Genotypic data files - is this working?'''
 
 
 
 
 
Sequence
 
 
 
1. FASTA
 
 
 
Pattern Detection
 
 
 
1. Pattern Files
 
 
 
 
 
 
 
===Loading MAS5/GCOS type files===
 
 
 
Use the 10 cardiomyopathy files from Harvard.
 
 
 
''What happens the first time a new chip-type is loaded - how long does it take, what is happening, what internal files are being built?''
 
 
 
 
 
===Merging loaded data===
 
 
           
 
===Saving data files===
 
       
 
 
 
 
 
 
 
==Tutorial: Use of panels==
 
Marker Panels
 
Phenotype Panels
 
Activating a panel
 
 
 
 
 
Use of activated phenotype and marker panels throughout application.
 
 
 
-- if no panels are activated, the "Activated Arrays" and "Activated Markers" check boxes should have no effect.
 
 
 
-- if gene or phenotype panels are activated, then these check boxes should control what is used or displayed.
 
 
 
-- if one of the boxes is checked, only activated markers or arrays will be used.
 
 
 
-- if the box is not checked, then ('''in most cases - are there any exceptions?''') the marker or phenotype panels will be ignored and all markers or phenotypes will be used.
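The intended check-box behavior listed above can be summarized in a small sketch. This is only an illustration of the rules as written in these notes, not geWorkbench code; the function name <code>items_in_use</code> and its arguments are hypothetical, and any exceptions to the "most cases" rule are not modeled.

```python
def items_in_use(all_items, panels, active_panel_names, box_checked):
    """Return the markers (or arrays) an analysis would use.

    all_items          -- every marker (or array) in the data set, in order
    panels             -- dict mapping panel name -> list of member items
    active_panel_names -- names of the panels that are currently activated
    box_checked        -- state of the "Activated Markers"/"Activated Arrays" box

    Hypothetical helper: it only encodes the rules stated in the notes above.
    """
    # Box unchecked, or no panel activated: panels are ignored, use everything.
    if not box_checked or not active_panel_names:
        return list(all_items)
    # Box checked and panels activated: use the union of the activated panels.
    selected = set()
    for name in active_panel_names:
        selected.update(panels[name])
    # Preserve the original data-set order.
    return [item for item in all_items if item in selected]


markers = ["g1", "g2", "g3", "g4"]
panels = {"cluster": ["g2", "g4"], "ttest": ["g4", "g1"]}
print(items_in_use(markers, panels, ["cluster"], box_checked=True))
print(items_in_use(markers, panels, ["cluster"], box_checked=False))
```

Returning the union of all activated panels is an assumption; the notes do not say how several simultaneously activated panels are combined.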
 
 
 
Note that there is a new "plot" button that is available only when a gene panel is active.
 
 
 
 
 
===Working with Marker and Phenotype Panels===
 
 
 
Use the cardiomyopathy dataset created in ''loading data.''
 
 
 
Creating Phenotype Panels
 
 
 
Assigning Case/Control status
 
 
 
Activating a phenotype panel
 
 
 
 
 
Creating Marker Panels
 
 
 
Activating a marker panel
 
 
 
 
 
Saving data to matrix file.
 
 
 
Use the cardiomyopathy dataset annotated in
 
 
 
==Tutorial: Viewing a microarray dataset==
 
 
 
===The Microarray component===
 
 
 
--Point out intensity and array sliders, color key and array name.
 
 
 
 
 
===The Tabular Microarray component===
 
 
 
 
 
===Color Mosaic===
 
 
 
--Point out only displays when "Display" button pushed.
 
 
 
--Point out intensity, accession, gene height and width controls.
 
 
 
--Explain whether the remaining controls work or not: Pat, Abs, Ratio???
 
 
 
 
 
===Expression Profiles===
 
 
 
-- displays expression level against array number. Each marker is a separate color line.
 
 
 
 
 
===Expression Value Distribution===
 
 
 
-- for a single array, plots expression value against marker number.
 
         
 
 
 
 
 
 
 
==Tutorial: Filtering and Normalizing Data==               
 
                 
 
 
 
==Tutorial: Differential Expression==
 
 
 
T-Test
 
 
 
 
 
 
 
==Tutorial: Hierarchical Clustering==
 
 
 
===Preliminary Filtering and Normalization===
 
 
 
 
 
The file "webmatrix.exp" contains results from 100 Affymetrix HG-U95Av2 chips containing B-cell samples from numerous different disease states (phenotypes).  12600 markers are represented.  To prepare this dataset for clustering we will filter and normalize the data.  The steps shown are just an example of how filtering and normalization can be used, and each dataset should be handled according to the type of analysis being undertaken and its goals.
 
 
 
For this dataset, we performed the following steps:
 
 
 
1. Applied '''Expression Threshold Filter''' to remove very low expression values in the range 0-20.
 
 
 
2. Applied the '''Missing Values Filter''' with a maximum number of missing values per marker of 2. (Deletes markers with more than 2 missing values).  This reduced the number of markers to 6327.
 
 
 
3. Performed '''Quantile Normalization''' using '''Averaging Method''' of '''Mean Marker Profile'''.
 
 
 
4. Applied the '''Deviation Filter''' with Deviation Bound of 20 and '''Missing Values''' set to '''Marker Average'''.
 
 
 
5. Applied the '''Missing Values Filter''' as in (2), which further reduced the number of markers to 6270.
 
 
 
The resulting dataset was named '''webmatrix_fn.exp'''.
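To make the effect of the first two filtering steps concrete, here is a minimal sketch. It assumes that the Expression Threshold Filter marks values inside the range as missing and that the Missing Values Filter then drops markers with too many missing values; this reading of the steps above is an assumption, and the function names are hypothetical rather than geWorkbench APIs. Quantile normalization and the Deviation Filter are not covered.

```python
def threshold_filter(matrix, low=0.0, high=20.0):
    """Mark every value inside [low, high] as missing (None).

    matrix maps marker name -> list of expression values, one per array.
    """
    return {
        marker: [None if v is not None and low <= v <= high else v for v in values]
        for marker, values in matrix.items()
    }


def missing_values_filter(matrix, max_missing=2):
    """Keep only markers with at most max_missing missing values."""
    return {
        marker: values
        for marker, values in matrix.items()
        if sum(v is None for v in values) <= max_missing
    }


# Toy data: three markers measured on four arrays.
data = {
    "m1": [5, 10, 15, 300],      # three values fall in the 0-20 range
    "m2": [100, 12, 250, 400],   # one value falls in the 0-20 range
    "m3": [50, 60, 70, 80],      # nothing is filtered
}
filtered = missing_values_filter(threshold_filter(data), max_missing=2)
print(sorted(filtered))  # m1 is dropped because it has 3 missing values
```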
 
 
 
 
 
===Fast Hierarchical Clustering===
 
 
 
'''Fast Hierarchical Clustering''' is found in the '''Analysis Panel'''.
 
 
 
In this example we show Hierarchical Clustering being performed with the following options:
 
 
 
1. Clustering Method:  "Total Linkage"
 
 
 
2. Clustering Dimension: "Both"
 
 
 
3. Clustering Metric: "Euclidean"
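For readers unfamiliar with these options, the sketch below shows agglomerative clustering with a Euclidean metric. It assumes that "Total Linkage" corresponds to what is usually called complete linkage (the distance between two clusters is the largest pairwise distance between their members); that mapping is an assumption, and this naive implementation is not the algorithm used by the Fast Hierarchical Clustering component.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def complete_linkage(profiles):
    """Naive agglomerative clustering with complete linkage.

    Returns the merges in order as (members_a, members_b, distance),
    where members are lists of row indices into profiles.
    """
    clusters = [[i] for i in range(len(profiles))]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Complete linkage: largest distance between any two members.
                d = max(euclidean(profiles[a], profiles[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


rows = [[0, 0], [0, 1], [10, 10], [10, 12]]
for left, right, dist in complete_linkage(rows):
    print(left, right, round(dist, 2))
```

With Clustering Dimension set to "Both", the same procedure would be applied once to the marker profiles (rows) and once to the array profiles (columns of the same matrix).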
 
 
 
 
 
[[Image:T_Analysis_FHC.png]]
 
 
 
 
 
Hit '''Analyze''' to run the clustering.  The resulting dataset is inserted into the '''Project Panel'''
 
 
 
 
 
[[Image:T_ProjectFolder_HierarchClust.png]]
 
 
 
 
 
and can be viewed in the '''Dendrogram Panel'''.  Here we will pick a subtree near the top for further investigation.
 
 
 
1. Click '''Enable Zoom'''.
 
 
 
2. Position the mouse pointer over the cluster subtree of interest.  It will be highlighted in blue.
 
 
 
 
 
[[Image:T_Dendrogram_SelectCluster.png]]
 
 
 
 
 
1. Left click on the highlighted subtree to view it alone.
 
 
 
2. By right clicking on the image, and selecting "Add to panel"
 
 
 
 
 
[[Image:T_Dendrogram_ClusterDetailAdd.png]]
 
 
 
 
 
the markers in this subtree can be added as a new marker panel to the '''Gene Panel.'''
 
 
 
 
 
[[Image:T_GenePanel_ClusterTree.png]]
 
 
 
 
 
==Tutorial: Marker Annotations==
 
 
 
For this tutorial, we will examine the group of markers selected in the '''Hierarchical Clustering''' tutorial.  geWorkbench can retrieve gene and pathway information from databases hosted at the NCI.
 
 
 
1. The desired marker panel is activated by checking its box in the '''Gene Panel'''.
 
 
 
2. In the '''Marker Annotations''' panel, select '''Retrieve Annotations'''.
 
 
 
 
 
[[Image:T_MarkerAnnotations_ClusterTree.png]]
 
 
 
 
 
The links under the heading '''Gene''' can be clicked to display information from the CGAP database at the NCI:
 
 
 
 
 
[[Image:T_CGAP_Page_for_NME1.png]]
 
 
 
 
 
The '''Pathway''' links can be clicked to display BioCarta pathway diagrams provided through the NCI's caCORE/caBIO resource.  The graphical components are themselves clickable to provide further information.
 
 
 
 
 
[[Image:T_caBIO_Pathways_h_ndkDynamin.png]]
 
 
 
 
 
==Tutorial: Sequence Retrieval==
 
 
 
 
 
geWorkbench contains a number of modules that allow DNA or protein sequences to be analyzed.  Sequences can be loaded from a local disk as a FASTA format file, or can be retrieved from a database.  Here we discuss retrieval of sequences from the network.
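As a reminder of what a FASTA file looks like, the following sketch parses FASTA text into a dictionary of sequences. It is a generic illustration of the format (a ">" header line followed by one or more sequence lines), not the loader used by geWorkbench; the function name <code>parse_fasta</code> is hypothetical.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into {record name: sequence}.

    The record name is the first whitespace-delimited token after '>'.
    Sequence lines belonging to one record are concatenated.
    """
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].strip()
            name = header.split()[0] if header else ""
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


example = """>seq1 example promoter fragment
ACGTACGT
TTGA
>seq2
GGGCCC
"""
print(parse_fasta(example))
```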
 
 
 
For this example, we will start with the group of markers selected in the '''Hierarchical Clustering''' tutorial.
 
 
 
We will download the sequence spanning ±2000 bp around the transcription start site of each gene.  This region may contain regulatory elements such as transcription factor binding sites.
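The window around the transcription start site (TSS) can be expressed as a pair of genomic coordinates. The sketch below assumes 1-based inclusive coordinates and a known strand; the function name <code>tss_window</code> and its parameters are hypothetical and do not describe how the Sequence Retrieval component computes the region internally.

```python
def tss_window(tss, strand, upstream=2000, downstream=2000, chrom_length=None):
    """Return 1-based inclusive (start, end) of the region around a TSS.

    On the '+' strand upstream lies at smaller coordinates; on the '-'
    strand it lies at larger coordinates. The window is clipped to the
    chromosome bounds when chrom_length is given.
    """
    if strand == "+":
        start, end = tss - upstream, tss + downstream
    elif strand == "-":
        start, end = tss - downstream, tss + upstream
    else:
        raise ValueError("strand must be '+' or '-'")
    start = max(1, start)
    if chrom_length is not None:
        end = min(end, chrom_length)
    return start, end


print(tss_window(10000, "+"))  # symmetric window of 2000 bp on each side
print(tss_window(500, "+"))    # clipped at the start of the chromosome
```

For the symmetric ±2000 bp case used in this tutorial the coordinates are the same on both strands; the strand only matters when the upstream and downstream extents differ, or when the retrieved sequence must be reverse-complemented.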
 
 
 
Press the '''Retrieve Sequences''' button to download the sequences.
 
 
 
 
 
[[Image:T_SeqeunceRetriever_ClusterTree.png]]
 
 
 
 
 
The retrieved sequences are placed in the Project Folder.  Note that when this entry is selected, the modules supporting sequence analysis will appear.
 
 
 
 
 
[[Image:T_ProjectFolder_ClusterSeqs.png]]
 
 
 
 
 
 
 
==Tutorial: Pattern Discovery==
 
 
 
The geWorkbench '''Pattern Discovery''' module uses an algorithm called '''SPLASH''' to search for common patterns in sets of DNA or protein sequences. This type of search can be used, for example, to search for common regulatory elements in otherwise unrelated sequences.
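To illustrate the idea of a pattern shared by several sequences, the sketch below counts exact k-mers that occur in at least a minimum number of sequences. This is deliberately '''not''' SPLASH, which handles much more general patterns; it is a naive stand-in, and the function name <code>shared_kmers</code> and its parameters are hypothetical.

```python
from collections import defaultdict


def shared_kmers(sequences, k=6, min_support=2):
    """Return {k-mer: number of sequences containing it}.

    Only k-mers found in at least min_support different sequences are
    reported. Matching is exact and case-insensitive.
    """
    support = defaultdict(set)
    for idx, seq in enumerate(sequences):
        seq = seq.upper()
        for pos in range(len(seq) - k + 1):
            # Record which sequence the k-mer came from, not how often.
            support[seq[pos:pos + k]].add(idx)
    return {kmer: len(ids) for kmer, ids in support.items()
            if len(ids) >= min_support}


seqs = ["ACGTACGT", "TTACGTAA", "GGGGGGGG"]
print(shared_kmers(seqs, k=4, min_support=2))
```

Raising <code>min_support</code> or <code>k</code> makes the search stricter, which loosely mirrors how support and pattern-length style parameters affect the sensitivity of a pattern search.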
 
 
 
For this tutorial, we will begin with the set of sequences retrieved in the '''Sequence Retrieval''' tutorial.  These sequences derive from a cluster of genes showing a similar expression pattern across a number of different experiments.
 
 
 
The user can adjust a number of parameters, as shown in the figure, to tune the sensitivity of the search.
 
 
 
 
 
[[Image:T_PatternDiscovery_Run.png]]
 
 
 
 
 
The result of the search can be viewed both in the '''Pattern Discovery''' module itself and in other sequence viewer modules.
 
 
 
 
 
[[Image:T_PatternDiscovery_Result.png]]
 
 
 
 
 
The results of a run of '''Pattern Discovery''' are placed in the Project Folder:
 
 
 
 
 
[[Image:T_ProjectFolder_PatternDiscovery.png]]
 
 
 
 
 
 
 
 
 
==Tutorial: Promoter Analysis==
 
 
 
   
 
 
 
==Regulatory Network==
 
 
 
 
 
==Integrated Annotation Information==
 
 
 
 
 
==Enrichment Analysis==
 
 
 
 
 
==Sequence Analysis - BLAST==
 

Latest revision as of 13:11, 6 August 2013

Resources:

http://geworkbench.org = http://wiki.c2b2.columbia.edu/workbench

http://wiki.c2b2.columbia.edu/workbook/index.php/Genomics_Workbook

https://sharepoint.c2b2.columbia.edu/c2b2/default.aspx

http://wiki.c2b2.columbia.edu/mantis/

http://wiki.c2b2.columbia.edu/mantis/view_all_bug_page.php

http://wiki.c2b2.columbia.edu/mantis/login_page.php

http://wiki.c2b2.columbia.edu/isrce/index.php/MARINa,_IDEA,_CUPID_Grid_Service_Implementation


http://gforge.nci.nih.gov

http://gforge.nci.nih.gov/projects/geworkbench

http://wiki.c2b2.columbia.edu/informatics/ same as (http://helpdesk.cu-genome.org/informatics/)


ICTVdb


http://wiki.c2b2.columbia.edu/ictvdb/

nonpublic documents:

adcvs.cu-genome.org:/cvs/magnet