Sequence Retriever

Revision as of 18:36, 17 August 2006


Outline

In this tutorial, we will

  • Discuss uses for retrieved sequences.
  • Review obtaining a set of markers for which we wish to retrieve sequences.
  • Retrieve DNA sequences from a remote resource.
  • View the sequences in a sequence browser.

Overview

geWorkbench contains a number of modules that allow DNA or protein sequences to be visualized and analyzed. Sequences can be loaded from a local disk as a FASTA format file, or can be retrieved from a remote resource. Here we discuss retrieval of sequences from the network.
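For readers unfamiliar with the FASTA format mentioned above, the following is a minimal Python sketch of how such a file is structured and read outside geWorkbench. It is not geWorkbench's own loader; the function name `parse_fasta` and the example marker names are hypothetical, chosen only for illustration.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into a {header: sequence} mapping.

    A record starts with a '>' header line; all following lines up to
    the next header are concatenated into one sequence string.
    """
    records: Dict[str, str] = {}
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


# Hypothetical two-record example; the marker names are made up.
example = """>marker_A hypothetical
ACGTACGT
ACGT
>marker_B hypothetical
TTTTEEEEGGGG
"""
records = parse_fasta(example.splitlines())
```

To read from disk instead, pass an open file object, for example `parse_fasta(open(path))`, since the function accepts any iterable of lines.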

Once a set of sequences has been obtained, it can be used for several types of analysis in geWorkbench, including searching for known promoter motifs (Promoter Analysis), running BLAST searches, or looking for common motifs using Pattern Discovery.

Limitations

This section applies to geWorkbench 1.03, released April 5, 2006. For DNA sequences, only the sequence ±2000 bp from the transcription start site is available, and only for markers on the Affymetrix HG_U95 chip. These sequences have been pre-cached on the geWorkbench server and are downloaded to the application the first time they are requested. The DNA sequences have had their exons masked with the letter "E".
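Because exons in the retrieved DNA are replaced by the letter "E", downstream scripts that work on exported sequences may need to separate masked from unmasked stretches. The sketch below shows one way to do this in Python. It assumes the mask is a single character and that positions are 0-based offsets into the retrieved string. The function names are hypothetical and not part of geWorkbench.

```python
import re
from typing import List, Tuple


def unmasked_segments(seq: str, mask: str = "E") -> List[Tuple[int, str]]:
    """Return (start_index, segment) pairs for each run of non-masked bases.

    start_index is a 0-based offset into seq.
    """
    pattern = re.compile(f"[^{re.escape(mask)}]+")
    return [(m.start(), m.group()) for m in pattern.finditer(seq)]


def masked_fraction(seq: str, mask: str = "E") -> float:
    """Fraction of positions in seq that carry the mask character."""
    return seq.count(mask) / len(seq) if seq else 0.0


# Made-up sequence: four bases, a four-letter masked exon, four bases.
segments = unmasked_segments("ACGTEEEETTGA")
fraction = masked_fraction("ACGTEEEETTGA")
```

Keeping the start offsets allows any motif hit found inside a segment to be mapped back to its position in the full retrieved sequence.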


For the next version, we are developing methods to obtain sequences directly from the UC Santa Cruz Golden Path database where possible. Amino-acid sequences are retrieved on-the-fly from the European Bioinformatics Institute (EBI).

Example - retrieving sequences for a list of gene markers

Obtaining a set of markers

We will start with a group of markers obtained in the tutorial Hierarchical Clustering. The list of markers from that tutorial can also be loaded from the file "cluster_tree_12markers.csv" found in the tutorial data file (see Download). To load a set of markers, press the "Load Set" button at the bottom of the component and browse to the desired file.

Retrieving the sequences

We will retrieve sequences from -1999 to +1 bp from the transcription start site of each gene.
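To make the window concrete, here is a small Python sketch that converts offsets relative to the transcription start site (TSS) into genomic coordinates. It rests on assumptions not stated in this tutorial: the TSS is numbered +1, there is no position 0, genomic coordinates are 1-based and inclusive, and upstream means lower coordinates on the "+" strand and higher coordinates on the "-" strand. The function names and the TSS value of 10000 are hypothetical.

```python
from typing import Tuple


def offset_to_genomic(tss: int, offset: int, strand: str) -> int:
    """Map a TSS-relative offset to a 1-based genomic coordinate.

    Assumes the TSS itself is offset +1 and that offset 0 does not exist.
    """
    if offset == 0:
        raise ValueError("offset 0 is undefined when the TSS is numbered +1")
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    # Distance in bases from the TSS, negative when upstream.
    step = offset - 1 if offset > 0 else offset
    return tss + step if strand == "+" else tss - step


def promoter_window(tss: int, start_offset: int, end_offset: int,
                    strand: str) -> Tuple[int, int]:
    """Return (low, high) inclusive genomic coordinates for the window."""
    a = offset_to_genomic(tss, start_offset, strand)
    b = offset_to_genomic(tss, end_offset, strand)
    return (min(a, b), max(a, b))


# Hypothetical TSS at genomic position 10000.
plus_window = promoter_window(10000, -1999, 1, "+")
minus_window = promoter_window(10000, -1999, 1, "-")
window_length = plus_window[1] - plus_window[0] + 1
```

Under these assumptions the window from -1999 to +1 covers 2000 bases and ends at the TSS itself. A different numbering convention (for example, one that includes a position 0) would shift the result by one base.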

Verify that the sequence type is set to DNA. Press the Get Sequence button to download the sequences.


T SequenceRetriever 84ClustSeqs.png


By double-clicking on one of the lines representing a returned sequence, you can switch to a detailed view of the sequence:


T SequenceRetriever 84ClustSeqs disp.png


Retrieved FASTA format sequences can be added to a project by clicking the Add to Project button at the lower right of the component (see interface picture above). Note that if the resulting sequence entry in the Project Folder is then selected, modules supporting sequence analysis and visualization will appear in the Analytical Tools and Visualization areas of the GUI. However, the Sequence Retriever component will no longer be visible. You must select the Project or the sequence's parent object to see the Sequence Retriever component again.


T SequenceRetriever 84ClustSeqs ProjFold.png