Sequence Retriever
Outline
In this tutorial, we will
- Discuss uses for retrieved sequences.
- Review obtaining a set of markers for which we wish to retrieve sequences.
- Retrieve DNA sequences from a remote resource.
- View the sequences in a sequence browser.
Overview
geWorkbench contains a number of modules that allow DNA or protein sequences to be visualized and analyzed. Sequences can be loaded from a local disk as a FASTA format file, or can be retrieved from a remote resource. Here we discuss retrieval of sequences from the network.
Once a set of sequences has been obtained, it can be used for several types of analysis in geWorkbench, including searching with known promoter motifs (Promoter Analysis), running BLAST searches, or looking for common motifs using Pattern Discovery.
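For readers who want to inspect a FASTA file outside geWorkbench, the format mentioned above can be read with a few lines of Python. This is only a minimal sketch and not part of geWorkbench; the function name `parse_fasta` is hypothetical, and it assumes the usual layout of a `>` header line followed by one or more sequence lines.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) tuples."""
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            # Sequence lines are concatenated until the next header.
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


if __name__ == "__main__":
    example = ">seq1 example\nACGT\nAC\n>seq2\nTT\n"
    for name, seq in parse_fasta(example):
        print(name, len(seq))
```

Sequence lines that appear before any header are ignored in this sketch; a stricter reader might raise an error instead.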
Limitations
This section applies to geWorkbench 1.03, released April 5, 2006. For DNA sequences, only the sequence ±2000 bp from the transcription start site is available, and only for markers on the Affymetrix HG_U95 chip. These sequences have been pre-cached on the geWorkbench server and are downloaded to the application the first time they are requested. The DNA sequences have had their exons masked with the letter "E".
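Because exons are masked with "E", which is not a standard nucleotide code, masked regions are easy to detect if you process a saved sequence yourself. The following sketch is illustrative only and is not geWorkbench code; the function names are hypothetical, and it assumes the mask character is an uppercase "E" as described above.

```python
import re


def masked_fraction(seq, mask_char="E"):
    """Return the fraction of positions in seq that carry the mask character."""
    if not seq:
        return 0.0
    return seq.count(mask_char) / len(seq)


def unmasked_segments(seq, mask_char="E"):
    """Split seq at runs of the mask character and return the non-empty pieces."""
    pattern = re.escape(mask_char) + "+"
    return [piece for piece in re.split(pattern, seq) if piece]


if __name__ == "__main__":
    s = "ACEEGTTEEA"
    print(masked_fraction(s))
    print(unmasked_segments(s))
```

Splitting at masked runs is one way to restrict a downstream motif search to non-exonic sequence; whether that is appropriate depends on the analysis.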
For the next version, we are developing methods to obtain sequences directly from the UC Santa Cruz Golden Path database where possible. Amino-acid sequences are retrieved on-the-fly from the European Bioinformatics Institute (EBI).
Example - retrieving sequences for a list of gene markers
Obtaining a set of markers
We will start with a group of markers obtained in the tutorial Hierarchical Clustering. The list of markers from that tutorial can also be loaded from the file "cluster_tree_12markers.csv" found in the tutorial data file (see Download). To load a set of markers, press the "Load Set" button at the bottom of the component and browse to the desired file.
Retrieving the sequences
We will retrieve sequences from -1999 to +1 bp from the transcription start site of each gene.
Verify that the sequence type is set to DNA. Press the Get Sequence button to download the sequences.
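The window described above (-1999 to +1 bp) can be turned into string indices as sketched below. This is a hedged illustration, not the component's actual logic: it assumes the common convention that the transcription start site is numbered +1 and that there is no position 0, and the names `offset_to_index`, `extract_window`, and `tss_index` are hypothetical.

```python
def offset_to_index(offset, tss_index):
    """Convert a TSS-relative offset to a 0-based string index.

    Assumes the TSS base is numbered +1 and that position 0 does not exist,
    so offset -1 is the base immediately upstream of the TSS.
    """
    if offset == 0:
        raise ValueError("offset 0 is undefined when the TSS is numbered +1")
    return tss_index + (offset - 1 if offset > 0 else offset)


def extract_window(seq, tss_index, start, end):
    """Return the bases from offset start to offset end, inclusive."""
    lo = offset_to_index(start, tss_index)
    hi = offset_to_index(end, tss_index)
    if lo < 0 or hi >= len(seq) or lo > hi:
        raise ValueError("window falls outside the sequence")
    return seq[lo:hi + 1]


if __name__ == "__main__":
    # Toy sequence: 1999 upstream bases, then the TSS base "T", then 10 more.
    seq = "N" * 1999 + "T" + "N" * 10
    window = extract_window(seq, tss_index=1999, start=-1999, end=1)
    print(len(window), window[-1])
```

Under this convention the window contains 1999 upstream bases plus the TSS base itself. If a tool numbers the TSS as 0 instead, the arithmetic shifts by one.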
By double-clicking on one of the lines representing a returned sequence, you can switch to a detailed view of the sequence:
Adding the returned sequences to the project
Retrieved FASTA format sequences can be added to a project by clicking the Add to Project button at the lower right of the component (see interface picture above). Note that if the resulting sequence entry in the Project Folder is then selected, modules supporting sequence analysis and visualization will appear in the Analytical Tools and Visualization areas of the GUI. However, the Sequence Retriever component will no longer be visible. You must select the Project or the sequence's parent object to see the Sequence Retriever component again.
- Here we saved the returned sequences under the name "cluster sequences".
Generating a new list of markers for the returned sequences
The Sequence Retriever does not necessarily return a sequence for every probe listed. We can generate a new list of genes for just those present here.
- In the Project Folders component, select the "cluster sequences" object just created. Its contents now appear below in the Markers component.
- In the Markers component, select all of the probes and right-click.
- Select Add to Set.
- Enter a name for the new set. Here we have used "cluster tree seqs".
- Note that the Marker Sets component shows there are 64 sequences in the set, from the 84 markers we started with.
- This new list can also be saved to disk by right-clicking on it and selecting Save. We have used the name "cluster_tree_total_pearsons_64of84_markers.csv".
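The same bookkeeping (which markers came back with a sequence and which did not) can also be done outside the GUI once the marker list and the sequences have been saved. The sketch below is an assumption-laden illustration: the function name `split_by_retrieval` is hypothetical, the probe IDs are made up, and it assumes each FASTA header begins with the marker ID followed by optional whitespace-separated description text, which may not match the headers geWorkbench writes.

```python
def split_by_retrieval(markers, fasta_headers):
    """Partition markers into those with a returned sequence and those without.

    Assumes the first whitespace-delimited token of each header is the marker ID.
    """
    returned_ids = {h.split()[0] for h in fasta_headers if h.strip()}
    found = [m for m in markers if m in returned_ids]
    missing = [m for m in markers if m not in returned_ids]
    return found, missing


if __name__ == "__main__":
    # Made-up IDs, for illustration only.
    markers = ["probe_1", "probe_2", "probe_3"]
    headers = ["probe_1 upstream region", "probe_3"]
    found, missing = split_by_retrieval(markers, headers)
    print(len(found), "of", len(markers), "markers have sequences")
    print("missing:", missing)
```

The order of the original marker list is preserved in both output lists.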
Saving the sequences to an external FASTA file
- Right-click on the "cluster sequences" entry you made in the Project Folders component.
- Select Save.
- Enter a suitable name. We have saved it as "640f84ClusterPearsonsSeqs.fasta".
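For reference, a saved FASTA file is plain text and can be produced by any script. The following sketch shows one way to write records with wrapped sequence lines. It is not how geWorkbench itself writes the file; the function name `format_fasta` and the 60-character line width are assumptions made for illustration.

```python
def format_fasta(records, width=60):
    """Format (header, sequence) pairs as FASTA text, wrapping sequence lines."""
    lines = []
    for header, seq in records:
        lines.append(">" + header)
        # Break the sequence into fixed-width chunks, one per line.
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    text = format_fasta([("example_seq", "ACGT" * 40)])
    print(text, end="")
```

To save the result, write the returned string to a file opened in text mode, for example `open("out.fasta", "w").write(text)`, where the output filename is your choice.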