Difference between revisions of "SkyBase"


Revision as of 14:46, 16 July 2013



Overview

SkyBase is a database that stores the homology models built by SkyLine analysis for two sets of structures, PDB60 and NESG.

As of 7/19/2012, the databases have:

  • PDB60: 12,264 structures, 7,804,258 models.
  • NESG: 946 structures, 1,943,390 models.

Users can search the database with their sequence of interest to find homology models which meet user-defined alignment coverage and sequence identity constraints.

SkyBase Web Version

SkyBase can be used either within geWorkbench, or directly in a web browser. For more information about the web version, please see the following two links.

SkyBase Web Tutorial

http://skybase.c2b2.columbia.edu/nesg3/help/help.html

SkyBase Web Search Page

http://skybase.c2b2.columbia.edu/nesg3/nesg.php

SkyBase in geWorkbench

Parameters

BLAST is run using the query sequence to identify "hits" to existing models in the SkyBase database.

% Minimum Alignment Coverage

  • Minimum percentage of the hit sequence that the query sequence must align to, counting similarity matches as well as exact matches.
  • If the query sequence is shorter than the hit sequence, coverage is calculated relative to the query sequence instead.

% Minimum Sequence Identity

Minimum percentage of positions in the hit sequence at which the query sequence must have an exact letter match.
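
The two thresholds described above can be illustrated with a small sketch. This is not the SkyBase implementation; the function names and the counts passed in (aligned positions, identical positions, sequence lengths) are hypothetical inputs that would come from a BLAST alignment. It assumes that coverage is measured against the shorter of the two sequences (as the note on short queries suggests), that identity is always measured against the hit sequence, and that both thresholds are inclusive minimums.

```python
def alignment_coverage(aligned_positions, query_len, hit_len):
    """Percent coverage of the hit, or of the query if the query is shorter.

    Assumption: 'aligned_positions' counts exact and similarity matches.
    """
    reference_len = min(query_len, hit_len)
    return 100.0 * aligned_positions / reference_len


def sequence_identity(identical_positions, hit_len):
    """Percent of hit-sequence positions with an exact letter match."""
    return 100.0 * identical_positions / hit_len


def passes_thresholds(aligned_positions, identical_positions,
                      query_len, hit_len,
                      min_coverage=75.0, min_identity=30.0):
    """Return True if a hit meets both minimums (treated here as inclusive)."""
    coverage = alignment_coverage(aligned_positions, query_len, hit_len)
    identity = sequence_identity(identical_positions, hit_len)
    return coverage >= min_coverage and identity >= min_identity


# Query of 100 residues against a hit of 120 residues:
# coverage is computed against the shorter (query) sequence.
print(alignment_coverage(90, 100, 120))          # 90.0
print(passes_thresholds(90, 36, 100, 120))       # True
```

The default values of 75 and 30 mirror the example settings used elsewhere on this page; they are parameters, not fixed constants.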

Most Similar Hits to Report

The number of top hits to report, based on a calculated rank. The rank combines the model quality pG, the template coverage, and the model-template sequence identity.

  1. Models with a quality score < 0.7 are discarded.
  2. The remaining models are then binned by the quality score, pG, such that bin A > bin B > bin C:
    1. 0.9 <= pG < 1.0, bin A
    2. 0.8 <= pG < 0.9, bin B
    3. 0.7 <= pG < 0.8, bin C
  3. Within each bin, models are ranked by template coverage; higher coverage gets a higher rank.
  4. Within each bin, for any models with the same template coverage, ranks are further decided by sorting the hits on their sequence identity; higher identity gets higher rank.
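
The ranking steps above can be sketched as a filter followed by a multi-key sort. This is a minimal illustration, not the code SkyBase runs: the `Model` class and its field names are hypothetical, and because the bins are stated only for pG below 1.0, the sketch assumes that a pG of exactly 1.0 would also fall into bin A.

```python
from dataclasses import dataclass


@dataclass
class Model:
    name: str
    pg: float                 # model quality score
    template_coverage: float  # percent
    seq_identity: float       # model-template percent identity


def pg_bin(pg):
    """Return 0 for bin A, 1 for bin B, 2 for bin C, or None if discarded."""
    if pg < 0.7:
        return None
    if pg >= 0.9:   # assumption: pG == 1.0 is treated as bin A
        return 0
    if pg >= 0.8:
        return 1
    return 2


def rank_models(models, top_n):
    """Discard low-quality models, then sort by bin, coverage, identity."""
    kept = [m for m in models if pg_bin(m.pg) is not None]
    kept.sort(key=lambda m: (pg_bin(m.pg),
                             -m.template_coverage,
                             -m.seq_identity))
    return kept[:top_n]


models = [
    Model("m1", 0.95, 80, 40),
    Model("m2", 0.85, 99, 90),
    Model("m3", 0.65, 100, 100),  # discarded: pG < 0.7
    Model("m4", 0.92, 80, 55),
    Model("m5", 0.91, 90, 20),
]
print([m.name for m in rank_models(models, top_n=3)])  # ['m5', 'm4', 'm1']
```

Note that within a bin the exact pG value no longer matters: m5 outranks m1 despite a lower pG because its template coverage is higher, and m4 outranks m1 only on sequence identity.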


With the parameters shown below, the BLAST search returns the top 10 results that cover at least 75% of the hit sequence (or of the query sequence, if the query is shorter than the hit) and have over 30% sequence identity between the two sequences.

Homology Models SkyBase

For details of the two supported homology model databases, please see http://skybase.c2b2.columbia.edu/nesg3/help/help.html

  • PDB60 (default) - models generated based on structures in the PDB with a 60% redundancy cutoff.
  • NESG - models generated based on structures in the Northeast Structural Genomics Consortium database (~670 structures).


SkyBase Parameters.png

Grid Service

No local service implementation of SkyBase is available in geWorkbench. Instead, an open grid service is used. No username or password is required.

In the Services tab,

  1. Click on "Search Grid Services". This will retrieve the information for the SkyBase grid service from the index service.
  2. Select the radio button in front of the SkyBase grid service.
  3. Return to the "Parameters" tab.


SkyBase Grid Service.png

Running a SkyBase query

  1. Make sure SkyBase is loaded in the Component Configuration Manager.
  2. Load a protein sequence file for which you wish to find homology models.
  3. Select the SkyBase analysis component in the Control area of geWorkbench.
  4. Set the parameters as desired.
  5. Select the grid service in the "Services" tab.
  6. Back on the "Parameters" tab, click "Analyze".

Viewing SkyBase Results

Note - SkyLine results are maintained on the server, not in geWorkbench. Each time a different structure is selected for viewing, its details are retrieved from the SkyBase server. Although there is currently no data deletion policy, data of interest should be saved to disk or captured in screenshots.


Results of a query with the sequence for PDB structure "1e09" (file 1e09.fasta):


SkyBase 1e09 full normal.png

Table

Column Headers

  • %Id Query-Model Sequence - Percent identity in the query-model sequence alignment.
  • Model Start-End
  • Query Start-End
  • Model SeqID
  • Model Sequence
  • Query Sequence
  • pG - a log-transformed, length-normalized integration over the residue-by-residue Prosa II profile [Sippl, 1993].
  • Coverage Template
  • %Id Template-Model Sequences - Percent identity in the template-model sequence alignment.
  • Template
  • Template Length
  • eValue
  • Model Length
  • Model Coverage
  • Model Species
  • Model Description
  • Model File
  • Template-Model Alignment

Note on column sorting

In the initial display, the data is sorted in descending order on the second column, "%Id Query-Model Sequence". The table can be re-sorted on any column by clicking that column's header. Repeated clicks on the same header cycle through three sort orders:

  • Original order (column 2, descending).
  • Ascending order of clicked-on column.
  • Descending order of clicked-on column.
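
The three-way cycle above can be modeled as a small state machine. This is only a sketch of the described behavior, not geWorkbench code: the `SortCycler` class is hypothetical, rows are plain tuples, and it assumes that the first click on a header yields ascending order, the second descending, and the third a return to the original order.

```python
class SortCycler:
    """Cycle a table through ascending, descending, and original order."""

    def __init__(self, rows, default_col=1):
        # Original order: descending on the default column (index 1 here).
        self.original = sorted(rows, key=lambda r: r[default_col],
                               reverse=True)
        self.clicked_col = None
        self.state = 0  # 0 = original, 1 = ascending, 2 = descending

    def click(self, col):
        """Simulate a click on a column header and return the new row order."""
        if col != self.clicked_col:
            # Assumption: a new column always starts at ascending.
            self.clicked_col = col
            self.state = 1
        else:
            self.state = (self.state + 1) % 3
        if self.state == 0:
            return list(self.original)
        return sorted(self.original, key=lambda r: r[col],
                      reverse=(self.state == 2))


rows = [("a", 50, 0.9), ("b", 80, 0.7), ("c", 65, 0.8)]
table = SortCycler(rows)
print([r[0] for r in table.click(2)])  # ['b', 'c', 'a']  ascending
print([r[0] for r in table.click(2)])  # ['a', 'c', 'b']  descending
print([r[0] for r in table.click(2)])  # ['b', 'c', 'a']  original order
```

In this sample the ascending order on the third column happens to coincide with the original order, which shows why the state, not the visible row order, has to be tracked.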

Table Column Details, upper left

SkyBase 1e09 upper left.png

Table Column Details, upper right

SkyBase 1e09 upper right.png

Bar Chart

For each model, the bar chart plots several of the most important features for easy comparison:

  • Model Quality, pG - a log-transformed, length-normalized integration over the residue-by-residue Prosa II profile [Sippl, 1993].
  • Template Coverage
  • Model-Template Sequence Identity - Degree of identity between the model sequence and the structural template.
  • Rank - red line, not labeled.

SkyBase BarChart.png

Alignments

Jalview - Alignments between the model and the original template sequence, and between the model and the query sequence, can be viewed using the built-in Jalview multiple alignment viewer (http://www.jalview.org/). The residues are color-coded in the alignments.

This viewer offers a number of options for customizing the alignment view.

Model-template alignment (VAT)

SkyBase 1e09 VAT.png


Model-query alignment (VAQ)

SkyBase 1e09 VAQ.png


Controls

  • ATP - Add Structure to Project - The "ATP" button will add the currently displayed protein structure file (PDB file) as a new node to the Project in the Project Folders component.
  • VAT - View alignment between model and template - display the model-template sequence alignment in Jalview.
  • VAQ - View alignment between model and query - display the model-query sequence alignment in Jalview.

References

Lee H, Li Z, Silkov A, Fischer M, Petrey D, Honig B, Murray D (2010) High-throughput computational structure-based characterization of protein families: START domains and implications for structural genomics. J Struct Funct Genomics 11(1):51-9.

Mirkovic N, Li Z, Parnassa A, Murray D (2007) Strategies for High-Throughput Comparative Modeling: Applications to Leverage Analysis in Structural Genomics and Protein Family Organization. Proteins: Structure, Function, and Bioinformatics 66:766-777.

Sippl MJ (1993) Recognition of errors in three-dimensional structures of proteins. Proteins 17(4):355–62.