Sequence Retriever

Revision as of 12:37, 8 June 2011


Outline

In this tutorial, we will

  • Discuss uses for retrieved sequences.
  • Review obtaining a set of markers for which we wish to retrieve sequences.
  • Retrieve DNA sequences from a remote resource.
  • View the sequences in a sequence browser.
  • Add the sequences to the Project Folders component.
  • Save the sequences to a file.

Overview

geWorkbench contains a number of modules that allow DNA or protein sequences to be visualized and analyzed. Sequences can be loaded from a local disk as a FASTA format file, or can be retrieved from a remote resource. Here we discuss retrieval of sequences from the network.

Once a set of sequences has been obtained, it can be used for several types of analysis in geWorkbench, including searching with known promoter motifs (Promoter Analysis), running BLAST searches, or looking for common motifs using Pattern Discovery.

Nucleotide sequences are obtained directly from the UC Santa Cruz Golden Path database. Amino-acid sequences are retrieved from the European Bioinformatics Institute (EBI).

Controls

  • Type - DNA or Protein
    • Source - The pulldown menu to the right of the "Type" pulldown shows the source to query for the data. At present, only one choice is supported for each type. DNA sequences are retrieved from UC Santa Cruz (UCSC). Protein sequences are retrieved from the EBI.
  • Marker - This panel shows markers that are in any activated set in the Markers component.
  • Find a Marker - This tab allows one to search in the component for a particular marker. However, we recommend performing searches in the Markers component and adding the results to a set.
  • "- and +" text fields - For DNA sequence retrieval, these two text fields specify the distance upstream (-) and downstream (+) from the transcription start site for the request. They are disabled when a protein query is selected.

Sequence Retriever MAP4K4 DNA pre.png
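
To make the meaning of the "-" and "+" fields concrete, the following minimal Python sketch computes the genomic interval implied by an upstream and a downstream distance around a transcription start site. It is an illustration only, not geWorkbench's own code: the function name promoter_window, the 1-based inclusive coordinate convention, and the strand handling are all assumptions made for this example.

```python
def promoter_window(tss, strand, upstream=10000, downstream=10000):
    """Return (start, end) of the region around a transcription start site.

    Assumptions (not taken from geWorkbench): coordinates are 1-based and
    inclusive, and 'upstream' is measured against the direction of
    transcription, so it lies at higher coordinates for '-' strand genes.
    """
    if upstream < 0 or downstream < 0:
        raise ValueError("distances must be non-negative")
    if strand == "+":
        start = tss - upstream
        end = tss + downstream
    elif strand == "-":
        start = tss - downstream
        end = tss + upstream
    else:
        raise ValueError("strand must be '+' or '-'")
    # Clamp at the first base so the window never runs off the chromosome start.
    return max(start, 1), end


if __name__ == "__main__":
    # Hypothetical gene on the '+' strand with its TSS at position 50,000.
    print(promoter_window(50000, "+"))
    # Same TSS on the '-' strand with asymmetric distances.
    print(promoter_window(50000, "-", upstream=2000, downstream=500))
```

With the default values of 10,000 in both fields, the requested region spans the transcription start site plus 10,000 positions on each side, which is why the retrieved DNA sequences in the example further down are long.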

Prerequisites

  • A microarray dataset must be loaded.
  • An annotation file must be associated with the microarray dataset at the time it is loaded. At present, only Affymetrix-format annotation files can be read in. These files can be obtained for Affymetrix chip types from affymetrix.com. For exact instructions, please see the geWorkbench FAQ page.

Example - retrieving sequences for a list of gene markers

Obtaining a set of markers

Sequences can be retrieved for any set of markers of interest. For this example we have loaded the tutorial data file BCell-100.exp and selected the last 10 markers into a new Marker Set:

T SequenceRetriever MarkerSet.png


When the set is activated (by ticking its check box), the selected marker set appears in the Sequence Retriever component:

T SequenceRetriever Setup.png


We will retrieve DNA sequences from UC Santa Cruz and keep the default settings of -10,000 and +10,000 relative to the transcription start site. After pressing Get Sequence, the sequences are downloaded:

T SequenceRetriever AfterRetrieval.png

Note that for several of the markers more than one sequence has been retrieved. All sequences associated with a given gene symbol are retrieved.


Double-clicking on one of the lines shows the sequence detail:

T SequenceRetriever SequenceDetail.png


The component provides check boxes which allow sequences of interest to be selected and added to the Project Folders component as a data node:

T SequenceRetriever SelectingForProject.png


When Add to Project is pushed, the user is asked for a name for the new data node:

T SequenceRetriever NamingSet.png


The resulting node is placed into the Project Folder as a child of the original dataset:

T SequenceRetriever SequenceNode.png


Note that when this node is added, the viewing area of the geWorkbench GUI switches to components that support working with sequences, and the Sequence Retriever component is no longer visible. To see the Sequence Retriever component again, select the Project or the sequence node's parent dataset.


Saving the sequences to an external FASTA file

  1. Right-click on the "selected sequences" entry you made in the Project Folders component.
  2. Select Save.
  3. Enter a suitable name and save the file.
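
Once saved, the FASTA file can be read by other tools. As a rough sketch of what such a file contains, the Python snippet below parses FASTA text into (header, sequence) pairs. It assumes the standard FASTA layout (a header line starting with ">" followed by one or more sequence lines); the function name read_fasta and the sample headers are hypothetical, and the exact header text that geWorkbench writes is not shown here.

```python
def read_fasta(lines):
    """Parse FASTA-formatted lines into a list of (header, sequence) tuples.

    A list is used rather than a dict because several sequences may share
    a gene symbol, so headers are not guaranteed to be unique.
    """
    records = []
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


if __name__ == "__main__":
    # Hypothetical file content; real headers will differ.
    sample = """>example_marker_1
ACGTACGT
TTGA
>example_marker_2
GGCC
"""
    for name, seq in read_fasta(sample.splitlines()):
        print(name, len(seq))
```

To read a file saved from geWorkbench, pass an open file handle instead of the sample text, for example `read_fasta(open("my_sequences.fasta"))`, where the file name is whatever was entered in step 3.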