SkyBase
Overview
SkyBase is a database that stores the homology models built by SkyLine analysis for all NESG PSI2 protein structures. Users can search the database with their sequence of interest to find homology models which meet user-defined alignment coverage and sequence identity constraints.
SkyBase Web Version
SkyBase can be used either within geWorkbench, or directly in a web browser. For more information about the web version, please see the following two links.
SkyBase Web Tutorial
http://skybase.c2b2.columbia.edu/nesg3/help/help.html
SkyBase Web Search Page
http://skybase.c2b2.columbia.edu/nesg3/nesg.php
SkyBase in geWorkbench
Parameters
% minimum Alignment Coverage
- Percentage of the hit sequence that the query sequence must align to, including similarity matches.
- If the query sequence is shorter than the hit sequence, coverage is calculated relative to the query sequence instead.
% minimum sequence identity
Percentage of the hit sequence at which the query sequence must have an exact letter match.
most similar hits to report
The number of top hits to report, based on a calculated rank. The rank combines the model quality pG, the template coverage, and the model-template sequence identity.
- Models with a quality score (pG) below 0.7 are discarded.
- The remaining models are then binned by the quality score, pG, such that bin A > bin B > bin C:
- 0.9 <= pG < 1.0, bin A
- 0.8 <= pG < 0.9, bin B
- 0.7 <= pG < 0.8, bin C
- Within each bin, models are ranked by template coverage; higher coverage gets a higher rank.
- Within each bin, models with the same template coverage are ranked by sequence identity; higher identity gets a higher rank.
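The ranking rules above can be sketched in a few lines of Python. This is only an illustration of the described ordering, not the SkyBase server code: the `Model` class, its field names, and the function names are hypothetical, and a pG of exactly 1.0 (which the bin definitions above leave open) is assumed to fall into bin A.

```python
from dataclasses import dataclass


@dataclass
class Model:
    # Hypothetical record; field names are illustrative only.
    name: str
    pg: float                 # model quality score pG
    template_coverage: float  # template coverage, in percent
    seq_identity: float       # model-template sequence identity, in percent


def quality_bin(pg):
    """Return 0 for bin A, 1 for bin B, 2 for bin C, or None if discarded."""
    if pg < 0.7:
        return None           # below the quality cutoff
    if pg >= 0.9:
        return 0              # bin A (assumption: pG == 1.0 is included)
    if pg >= 0.8:
        return 1              # bin B
    return 2                  # bin C


def rank_models(models, top_n):
    """Discard low-quality models, then order by bin, coverage, identity."""
    kept = [m for m in models if quality_bin(m.pg) is not None]
    kept.sort(key=lambda m: (quality_bin(m.pg),
                             -m.template_coverage,
                             -m.seq_identity))
    return kept[:top_n]


example = [
    Model("m1", 0.95, 80, 40),
    Model("m2", 0.85, 99, 90),
    Model("m3", 0.95, 80, 55),
    Model("m4", 0.65, 100, 100),  # discarded: pG < 0.7
    Model("m5", 0.92, 90, 30),
    Model("m6", 0.75, 95, 60),
]
ranked_names = [m.name for m in rank_models(example, 10)]
# ranked_names == ["m5", "m3", "m1", "m2", "m6"]
```

Note that m2 has the highest coverage and identity in the example but still ranks below every bin A model, because the quality bin is compared first.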
With the parameters shown below, the BLAST search returns the top 10 results that have at least 75% sequence coverage of the hit sequence (or of the query sequence, if it is shorter than the hit sequence) and over 30% sequence identity between the two sequences.
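As a rough illustration of how the two thresholds act on a hit, the following Python sketch applies the parameter definitions given above. All function and argument names are hypothetical. It assumes that coverage is the number of aligned positions divided by the hit length (or by the query length when the query is shorter), that identity is the number of exact matches divided by the hit length, and that both thresholds are inclusive minimums; the actual server-side calculation may differ.

```python
def alignment_coverage(aligned_positions, query_len, hit_len):
    """Percent coverage, relative to the query if it is the shorter sequence."""
    reference_len = query_len if query_len < hit_len else hit_len
    return 100.0 * aligned_positions / reference_len


def sequence_identity(identical_positions, hit_len):
    """Percent of hit positions with an exact letter match (assumed definition)."""
    return 100.0 * identical_positions / hit_len


def passes_filters(aligned_positions, identical_positions, query_len, hit_len,
                   min_coverage=75.0, min_identity=30.0):
    """Return True if a hit meets both user-defined minimums."""
    coverage = alignment_coverage(aligned_positions, query_len, hit_len)
    identity = sequence_identity(identical_positions, hit_len)
    return coverage >= min_coverage and identity >= min_identity


# Query (120 residues) shorter than hit (200): coverage uses the query length.
short_query_ok = passes_filters(100, 70, query_len=120, hit_len=200)

# Query (300 residues) longer than hit (200): coverage is 140/200 = 70%.
long_query_ok = passes_filters(140, 90, query_len=300, hit_len=200)
```

In the first case coverage is 100/120 (about 83%) and identity is 70/200 (35%), so the hit is kept; in the second case coverage is only 70%, so the hit is rejected despite 45% identity.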
Grid Service
No local service implementation of SkyBase is available in geWorkbench. Instead, an open grid service is used. No username or password is required.
In the "Services" tab:
- Click on "Search Grid Services". This will retrieve the information for the SkyBase grid service from the index service.
- Select the radio button in front of the SkyBase grid service.
- Return to the "Parameters" tab.
Running a SkyBase query
- Make sure SkyBase is loaded in the Component Configuration Manager.
- Load a protein sequence file for which you wish to find homology models.
- Select the SkyBase analysis component in the Control area of geWorkbench.
- Set the parameters as desired.
- Select the grid service in the "Services" tab.
- Return to the "Parameters" tab and click "Analyze".
Viewing SkyBase Results
Note: SkyLine results are maintained on the server, not in geWorkbench. Each time a different structure is selected for viewing, its details are retrieved from the SkyBase server. Although there is currently no data deletion policy, data of interest should be saved to disk or captured in screenshots.
Results of a query with the sequence for PDB structure "1e09" (file 1e09.fasta):
Table
Note on column sorting
In the initial display, the data is sorted in descending order on the second column, "%Id Query-Model Sequence". The table can be re-sorted on any column by clicking that column's header. Repeated clicks on the same header cycle through three sort orders:
- Original order (column 2, descending).
- Ascending order of clicked-on column.
- Descending order of clicked-on column.
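The three-state cycle can be modeled as a small state machine. The sketch below is a hypothetical Python illustration, not the geWorkbench implementation; the class and method names are invented, and it assumes that the first click on a header gives ascending order and that clicking a different header restarts the cycle for that column.

```python
class SortableTable:
    DEFAULT_COLUMN = 1  # second column, zero-based index

    def __init__(self, rows):
        self.rows = list(rows)
        self.column = None  # header clicked most recently
        self.state = 0      # 0 = original, 1 = ascending, 2 = descending

    def click_header(self, column):
        """Advance the cycle for this column and return the sorted rows."""
        if column == self.column:
            self.state = (self.state + 1) % 3
        else:
            # Assumption: a new column starts its cycle at ascending order.
            self.column = column
            self.state = 1
        return self.view()

    def view(self):
        """Return the rows in the order implied by the current state."""
        if self.state == 0 or self.column is None:
            # Original order: default column, descending.
            return sorted(self.rows,
                          key=lambda r: r[self.DEFAULT_COLUMN],
                          reverse=True)
        return sorted(self.rows,
                      key=lambda r: r[self.column],
                      reverse=(self.state == 2))
```

Clicking the same header three times therefore returns the table to its initial display.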
Column Headers
- %Id Query-Model Sequence
- Model Start-End
- Query Start-End
- Model SeqID
- Model Sequence
- Query Sequence
- pG
- Coverage Template
- %Id Template-Model Sequences
- Template
- Template Length
- eValue
- Model Length
- Model Coverage
- Model Species
- Model Description
- Model File
- Template-Model Alignment
Table Column Details, upper left
Table Column Details, upper right
ATP - Add Structure to Project
The "ATP" button will add the currently displayed protein structure file (PDB file) as a new node to the Project in the Project Folders component.
VAT - View alignment between model and template
View alignment between model-template
VAQ - View alignment between model and query
View alignment between model-query
References
Lee H, Li Z, Silkov A, Fischer M, Petrey D, Honig B, Murray D. (2010) High-throughput computational structure-based characterization of protein families: START domains and implications for structural genomics. J Struct Funct Genomics. 11(1):51-9.
Mirkovic N, Li Z, Parnassa A, Murray D. (2007) Strategies for High-Throughput Comparative Modeling: Applications to Leverage Analysis in Structural Genomics and Protein Family Organization. Proteins: Structure, Function, and Bioinformatics. 66:766-777.