Difference between revisions of "User:Smith"

Revision as of 13:14, 22 March 2006

Tutorial: Promoter Analysis

Tutorial: Regulatory Network Reverse Engineering

Tutorial: Integrated Annotation Information ???

Tutorial: Enrichment Analysis/GO Term component

Tutorial: Sequence Analysis - BLAST

Will provide some scenarios for when you might want to run BLAST queries in the context of geWorkbench, e.g. you have found an interesting marker, retrieved its gene, and want to see what it is related to (though we can't do by-gene queries in the sequence retriever right now).

Use the file "NM_024426-Wilms.fasta" provided in the tutorial data directory. This is a nucleotide sequence file. There is a second file which contains the corresponding protein sequence, "NP_077744-Wilms.fasta".
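To make the difference between the two tutorial files concrete, here is a minimal sketch of reading FASTA text and guessing whether a record is nucleotide or protein. It is not how geWorkbench loads files; the function names, the 0.9 threshold, and the two short demo records are hypothetical and are not the contents of the Wilms files.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Return {header: sequence} from FASTA-formatted lines."""
    records: Dict[str, str] = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts; the header is everything after '>'.
            header = line[1:]
            records[header] = ""
        elif header is not None:
            # Sequence data may be wrapped over several lines.
            records[header] += line
    return records


def looks_like_nucleotide(seq: str, threshold: float = 0.9) -> bool:
    """Heuristic (an assumption): nucleotide if most letters are A, C, G, T, U or N."""
    if not seq:
        return False
    hits = sum(1 for ch in seq.upper() if ch in "ACGTUN")
    return hits / len(seq) >= threshold


# Hypothetical demo records, not taken from the tutorial data directory.
example = """>demo_nucleotide hypothetical record
ATGGGTTCCGACGTTCGTGACCTGAACGCT
>demo_protein hypothetical record
MGSDVRDLNALLPAVPSLGGGGGCALPVSG
"""

records = parse_fasta(example.splitlines())
for name, seq in records.items():
    kind = "nucleotide" if looks_like_nucleotide(seq) else "protein"
    print(name.split()[0], len(seq), kind)
```

The heuristic only matters for choosing a BLAST program later: a nucleotide query and a protein query call for different programs.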

Provide a little background info about Wilms' tumor (it was chosen at random).

Go to Sequence Alignment.

Select BLAST.

Note that the subsequence setting displays the length of the longest sequence selected (here there is only one). It can be used to select a portion of the sequence to use for the query (this probably wouldn't make much sense if more than one sequence is selected for the query).
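As an illustration of what restricting the query to a subsequence amounts to, the sketch below extracts a range from a sequence. It assumes 1-based, inclusive start and end positions, which is a common convention for sequence coordinates but is not confirmed for geWorkbench; the function name is hypothetical.

```python
def subsequence(seq: str, start: int, end: int) -> str:
    """Return seq[start..end], assuming 1-based inclusive coordinates."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError(
            f"range {start}-{end} is outside the sequence (length {len(seq)})"
        )
    # Python slices are 0-based and end-exclusive, hence start - 1.
    return seq[start - 1:end]


query = "ACGTACGT"
print(subsequence(query, 3, 6))          # middle four bases
print(subsequence(query, 1, len(query))) # the full sequence
```

The upper bound of the range is the sequence length, which is why the length of the longest selected sequence is the natural default to display.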

Select a program. Since this is a nucleotide query, we want to select a nucleotide query program such as blastn.

Provide a one line description of each of the different blast algorithms. We have this info on the AMDeC website.
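As a starting point for those one-line descriptions, the sketch below records the standard query and database types of the five classic BLAST programs and looks up which ones accept a given query type. The dictionary layout and function name are hypothetical, and whether geWorkbench offers all five programs is not stated here.

```python
from typing import Dict, List, Tuple

# program -> (query type, database type, one-line description)
BLAST_PROGRAMS: Dict[str, Tuple[str, str, str]] = {
    "blastn": ("nucleotide", "nucleotide",
               "nucleotide query against a nucleotide database"),
    "blastp": ("protein", "protein",
               "protein query against a protein database"),
    "blastx": ("nucleotide", "protein",
               "translated nucleotide query against a protein database"),
    "tblastn": ("protein", "nucleotide",
                "protein query against a translated nucleotide database"),
    "tblastx": ("nucleotide", "nucleotide",
                "translated nucleotide query against a translated nucleotide database"),
}


def programs_for_query(query_type: str) -> List[str]:
    """List the programs that accept the given query type."""
    return [name for name, (qtype, _, _) in BLAST_PROGRAMS.items()
            if qtype == query_type]


for name in programs_for_query("nucleotide"):
    _, db_type, description = BLAST_PROGRAMS[name]
    print(f"{name}: {description} (needs a {db_type} database)")
```

The database type in each entry is what determines which databases make sense to offer once a program has been chosen.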

Now that the program has been selected, note that the appropriate databases are displayed (need to verify this for all algorithms). Here we will try ncbi/nt - the complete non-redundant nucleotide database.

Go to the advanced options tab. Make sure the matrix "dna.mat" is selected. Change the Expect value to 0.01. We will leave the box checked to use PFP (Paracel Filtering Package) filtering for repeated sequence elements.
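To show the effect of the Expect setting, here is a small sketch that keeps only hits whose E-value is at or below the threshold. The hit names and E-values are made up for illustration, and the function is hypothetical; the real filtering is done by the BLAST server, not by client code like this.

```python
from typing import List, Tuple

Hit = Tuple[str, float]  # (target name, E-value)


def filter_by_expect(hits: List[Hit], expect: float) -> List[Hit]:
    """Keep hits at least as significant as the Expect threshold."""
    return [(name, evalue) for name, evalue in hits if evalue <= expect]


# Hypothetical hits, not real results for the Wilms query.
hits = [("hit_1", 1e-50), ("hit_2", 0.005), ("hit_3", 0.5), ("hit_4", 8.0)]

kept = filter_by_expect(hits, expect=0.01)
print([name for name, _ in kept])
```

A lower Expect value is stricter: fewer chance matches are reported, at the risk of dropping weak but genuine similarities.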

In the Service tab, select Columbia.

Note the text field at the bottom, which shows that one sequence has been selected. If you have a fasta file with multiple sequences, you can select the ones you want in the Markers component and activate this selection, letting you search on a subset. Or, you can search on all the sequences in a file (all markers checkbox, or also by default if no subselection made? find out).

You can check the server status by hitting the "Refresh" button. For the Columbia machine, this can give you an idea of how busy it is.

Press the curved arrow submit button. Observe the progress bar "Blast is running".

When the results return, they are placed in the Project Folders as a child of the sequence they correspond to. Note that you can mouse over the result set to see how many sequences are in it. In this case, I found 160.


In the BLAST results viewer, you can examine the alignments (note that this component is not working right - no point taking a screenshot yet). This sequence hits many other target sequences. Each different target hit is listed on a line in the results table. A sequence can hit one target sequence in several different places; each such hit is listed as a separate subentry under that target. Note that there is a bug associated with the information displayed when there are such subhits.

Here you can select sequences to add back to the main project by checking "include" and then "Add selected sequences to project". You can also add just the aligned parts by hitting the appropriate button. Note: there is a bug here; it is adding the worst, not the best, aligned subsequence for each target.
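The intended behavior for adding aligned parts can be sketched as follows: group the subhits by target, then pick the best one per target. This is not the geWorkbench code. It assumes "best" means the highest bit score; the record fields, target names, and scores are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class SubHit(NamedTuple):
    target: str       # name of the target sequence that was hit
    start: int        # start of the aligned region on the target
    end: int          # end of the aligned region on the target
    bit_score: float  # higher is better


def group_by_target(subhits: List[SubHit]) -> Dict[str, List[SubHit]]:
    """Collect all subhits under the target they belong to."""
    grouped: Dict[str, List[SubHit]] = defaultdict(list)
    for hit in subhits:
        grouped[hit.target].append(hit)
    return dict(grouped)


def best_subhit_per_target(subhits: List[SubHit]) -> Dict[str, SubHit]:
    """Pick the highest-scoring subhit for each target (max, not min)."""
    return {target: max(group, key=lambda h: h.bit_score)
            for target, group in group_by_target(subhits).items()}


# Hypothetical subhits for two targets.
subhits = [
    SubHit("target_A", 10, 250, 410.0),
    SubHit("target_A", 900, 960, 52.5),
    SubHit("target_B", 1, 120, 88.0),
]

for target, hit in best_subhit_per_target(subhits).items():
    print(target, hit.start, hit.end, hit.bit_score)
```

Using `min` in place of `max` in `best_subhit_per_target` would reproduce the kind of behavior described in the bug note, returning the weakest alignment for each target.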

The Load button allows you to load an external BLAST file in HTML format into the viewer.