Difference between revisions of "Sequence Retriever"


Revision as of 16:46, 20 July 2006



Background

geWorkbench contains a number of modules that allow DNA or protein sequences to be visualized and analyzed. Sequences can be loaded from a local disk as a FASTA format file, or can be retrieved from a remote resource. Here we discuss retrieval of sequences from the network.

Once a set of sequences has been obtained, it can be used for several types of analysis in geWorkbench, including searching for known promoter motifs (Promoter Analysis), running BLAST searches, or looking for common motifs using Pattern Discovery.
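As an illustration of the FASTA format mentioned above, the following is a minimal Python sketch of reading FASTA records into memory. It is not part of geWorkbench; the function name parse_fasta and the example marker names are hypothetical, and the sketch assumes plain FASTA text in which each record starts with a ">" header line followed by one or more sequence lines.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map each FASTA header (without the leading '>') to its full sequence."""
    records: Dict[str, str] = {}
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            # Sequence data may be wrapped over several lines.
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


# Hypothetical two-record example, for illustration only.
example = """>marker_A hypothetical record
ACGTACGT
TTGA
>marker_B
GGCC
""".splitlines()

sequences = parse_fasta(example)
for name, seq in sequences.items():
    print(name, len(seq))
```

The same function can be given an open file handle instead of a list of lines, since it only iterates over its input.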

Limitations

This section applies to geWorkbench 1.03, released April 5, 2006. For DNA sequences, only the region within ±2000 bp of the transcription start site is available, and only for markers on the Affymetrix HG_U95 chip. These sequences have been pre-cached on the geWorkbench server and are downloaded to the application the first time they are requested. For the next version, we are developing methods to obtain sequences directly from the UC Santa Cruz Golden Path database where possible. Amino-acid sequences are retrieved on the fly from the European Bioinformatics Institute (EBI).

Example - retrieving sequences for a list of gene markers

Obtaining a set of markers

We will start with a group of markers obtained in a manner similar to that shown in the Hierarchical Clustering example of the Clustering tutorial. The list of markers from that tutorial can also be loaded from the file "cluster_tree_12markers.csv" found in the tutorial data file (see Download). To load a set of markers, press the "Load Set" button at the bottom of the component and browse to the desired file.

Retrieving the sequences

We will retrieve the sequence within ±2000 bp of the transcription start site of each gene.

Verify that the sequence type is set to DNA. Press the Get Sequence button to download the sequences.
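To make the ±2000 bp region concrete, here is a small Python sketch of how such a window could be computed from a transcription start site position. It is only an illustration and does not describe how geWorkbench or its server computes the cached sequences; the function name promoter_window, the 1-based inclusive coordinate convention, and the optional chromosome-length clamp are all assumptions.

```python
from typing import Optional, Tuple


def promoter_window(tss: int, flank: int = 2000,
                    chrom_length: Optional[int] = None) -> Tuple[int, int]:
    """Return (start, end) of the region within `flank` bp of the TSS.

    Coordinates are assumed to be 1-based and inclusive. The window is
    clamped at position 1 and, if given, at the chromosome length.
    """
    start = max(1, tss - flank)
    end = tss + flank
    if chrom_length is not None:
        end = min(end, chrom_length)
    return start, end


# Hypothetical TSS positions, for illustration only.
print(promoter_window(10000))                     # window in mid-chromosome
print(promoter_window(500))                       # clamped at the left end
print(promoter_window(9500, chrom_length=10000))  # clamped at the right end
```

Because the window is symmetric around the start site, the strand of the gene does not change the coordinates in this sketch, only how the retrieved sequence would be oriented.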


T SequenceRetriever ClusterTree.png


Retrieved FASTA format sequences can be added to a project by clicking the Add to Project button at the lower right of the component (see interface picture above). If the resulting sequence entry in the Project Folder is then selected, modules supporting sequence analysis and visualization will appear in the Analytical Tools and Visualization areas of the GUI.

ProjectFolder SavedSequences.png

Viewing retrieved sequences

The underlying sequence for any marker in the Sequence Retriever display can be seen by double-clicking on the line representing it.


Sequence Retriever SequenceDetails.png


Double-clicking again on the sequence display will return to the summary view. Sequences can also be viewed in the Sequence and Promoter components in the Visualization area. Analytical tools for sequence analysis are displayed in the lower right portion of the interface.