SkyBase

Revision as of 14:53, 14 June 2011 by Smith (talk | contribs) (Overview)



Overview

SkyBase is a database that stores the homology models built by SkyLine analysis for all NESG PSI2 protein structures. Users can search the database with their sequence of interest to find homology models which meet user-defined alignment coverage and sequence identity constraints.

SkyBase can be used either within geWorkbench, or directly in a web browser. For more information about the web version, please see the following two links.

SkyBase Web Tutorial

http://skybase.c2b2.columbia.edu/nesg3/help/help.html


SkyBase Database Web Search Page

http://skybase.c2b2.columbia.edu/nesg3/nesg.php

SkyBase in geWorkbench

Parameters

% minimum alignment coverage

  • Percentage of the hit sequence to which the query sequence must align, including similarity matches.
  • If the query sequence is shorter than the hit sequence, coverage is calculated relative to the query sequence instead.

% minimum sequence identity

Percentage of positions in the hit sequence at which the query sequence must match exactly (identical letters).
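
The two thresholds above can be illustrated with a short Python sketch. This is not the SkyBase implementation: the function names, the way aligned positions are counted, and the inclusive (>=) comparison are assumptions made for illustration. The defaults of 75 and 30 are taken from the example search on this page.

```python
def coverage_percent(aligned_positions: int, query_len: int, hit_len: int) -> float:
    """Percent coverage as described above.

    Coverage is measured against the hit sequence, unless the query is
    shorter than the hit, in which case the query length is used.
    aligned_positions counts identical plus similar matches (assumed input).
    """
    reference_len = query_len if query_len < hit_len else hit_len
    return 100.0 * aligned_positions / reference_len


def identity_percent(exact_matches: int, hit_len: int) -> float:
    """Percent of hit-sequence positions that the query matches exactly."""
    return 100.0 * exact_matches / hit_len


def passes_filters(aligned_positions: int, exact_matches: int,
                   query_len: int, hit_len: int,
                   min_coverage: float = 75.0,
                   min_identity: float = 30.0) -> bool:
    """Hypothetical filter: keep a hit only if it meets both minimums.

    Treating both thresholds as inclusive is an assumption.
    """
    coverage = coverage_percent(aligned_positions, query_len, hit_len)
    identity = identity_percent(exact_matches, hit_len)
    return coverage >= min_coverage and identity >= min_identity
```

For example, a 100-residue query with 90 aligned positions against a 120-residue hit has 90% coverage, because the shorter query length is used as the reference.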

most similar hits to report

The number of top hits to report, based on a calculated rank. The rank combines the model quality pG, the template coverage, and the model-template sequence identity.

  1. Models with a quality score (pG) below 0.7 are discarded.
  2. The remaining models are binned by pG, such that bin A ranks above bin B, which ranks above bin C:
    1. 0.9 <= pG < 1.0, bin A
    2. 0.8 <= pG < 0.9, bin B
    3. 0.7 <= pG < 0.8, bin C
  3. Within each bin, models are ranked by template coverage; higher coverage gets a higher rank.
  4. Within each bin, models with the same template coverage are ranked by sequence identity; higher identity gets a higher rank.
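
The ranking steps above can be sketched in Python as follows. This is an illustrative sketch only, not the SkyBase server code: the Model class, its field names, and the rank_models function are hypothetical. Placing a pG of exactly 1.0 in bin A is also an assumption, since the bins listed above do not cover that value.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Model:
    name: str                 # hypothetical identifier for the homology model
    pg: float                 # model quality score, pG
    template_coverage: float  # percent
    sequence_identity: float  # model-template sequence identity, percent


def quality_bin(pg: float) -> Optional[int]:
    """Return 0 for bin A, 1 for bin B, 2 for bin C, or None if discarded."""
    if pg >= 0.9:   # assumption: pG == 1.0 is treated as bin A
        return 0
    if pg >= 0.8:
        return 1
    if pg >= 0.7:
        return 2
    return None     # quality score below 0.7: model is discarded


def rank_models(models: List[Model], top_n: int = 10) -> List[Model]:
    """Discard low-quality models, then sort by bin, coverage, and identity."""
    kept = [m for m in models if quality_bin(m.pg) is not None]
    kept.sort(key=lambda m: (quality_bin(m.pg),
                             -m.template_coverage,
                             -m.sequence_identity))
    return kept[:top_n]
```

Note that within a bin the exact pG value no longer matters: a model with pG 0.91 and 90% template coverage ranks above a model with pG 0.95 and 80% coverage.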


With the parameters shown below, the BLAST search returns the top 10 results that have at least 75% coverage of the hit sequence (or of the query sequence, if the query is shorter than the hit) and over 30% sequence identity between the two sequences.


SkyBase Parameters.png

Grid Service

No local service implementation of SkyBase is available in geWorkbench. Instead, an open grid service is used. No username or password is required.

In the Services tab:

  1. Click on "Search Grid Services". This will retrieve the information for the SkyBase grid service from the index service.
  2. Select the radio button in front of the SkyBase grid service.
  3. Return to the "Parameters" tab.


SkyBase Grid Service.png

Running a SkyBase query

  1. Make sure SkyBase is loaded in the Component Configuration Manager.
  2. Load a protein sequence file for which you wish to find homology models.
  3. Select the SkyBase analysis component in the Control area of geWorkbench.
  4. Set the parameters as desired.
  5. Select the grid service in the "Services" tab.
  6. Return to the "Parameters" tab and click "Analyze".

Viewing SkyBase Results

Note: SkyBase results are maintained on the server, not in geWorkbench. Each time a different structure is selected for viewing, its details are retrieved from the SkyBase server. Although there is currently no data deletion policy, data of interest should be saved to disk or captured in screenshots.


The results after querying with the sequence for PDB structure "1e09" (file 1e09.fasta) are shown below:


SkyBase 1e09 full normal.png

Table

Note on column sorting

In the initial display, the data is sorted in descending order on the second column, "%Id Query-Model Sequence". The table can be re-sorted on any column by clicking that column's header. Repeated clicks on the same header cycle through three sort states:

  • Original order (column 2, descending).
  • Ascending order of clicked-on column.
  • Descending order of clicked-on column.
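
The three-state cycle can be modeled as a small state machine. The Python sketch below is hypothetical and is not taken from the geWorkbench source: the SortCycle class is invented for illustration, and it assumes that the first click on a header sorts ascending and that clicking a different header restarts the cycle.

```python
ORIGINAL, ASCENDING, DESCENDING = "original", "ascending", "descending"

# Order in which repeated clicks on the same header change the state (assumed).
_NEXT = {ORIGINAL: ASCENDING, ASCENDING: DESCENDING, DESCENDING: ORIGINAL}


class SortCycle:
    """Tracks which column was clicked and which sort state is active."""

    def __init__(self, default_column):
        self.default_column = default_column  # column used for the original order
        self.column = None                    # most recently clicked column
        self.mode = ORIGINAL

    def click(self, column):
        """Register a header click and return the new sort state."""
        if column != self.column:
            # A different header restarts the cycle at ascending (assumption).
            self.column, self.mode = column, ASCENDING
        else:
            self.mode = _NEXT[self.mode]
        return self.mode

    def apply(self, rows):
        """Return the rows (a list of dicts) sorted for the current state."""
        if self.mode == ORIGINAL:
            # Original order: default column, descending.
            return sorted(rows, key=lambda r: r[self.default_column], reverse=True)
        return sorted(rows, key=lambda r: r[self.column],
                      reverse=(self.mode == DESCENDING))
```

Under these assumptions, three clicks on the same header give ascending order, then descending order, then the original order again.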


Column Headers

  • %Id Query-Model Sequence
  • Model Start-End
  • Query Start-End
  • Model SeqID
  • Model Sequence
  • Query Sequence
  • pG
  • Coverage Template
  • %Id Template-Model Sequences
  • Template
  • Template Length
  • eValue
  • Model Length
  • Model Coverage
  • Model Species
  • Model Description
  • Model File
  • Template-Model Alignment


Table Column Details, upper left

SkyBase 1e09 upper left.png


Table Column Details, upper right

SkyBase 1e09 upper right.png

ATP - Add Structure to Project

The "ATP" button will add the currently displayed protein structure file (PDB file) as a new node to the Project in the Project Folders component.

VAT - View alignment between model and template

The "VAT" button displays the alignment between the model and its template.


SkyBase 1e09 VAT.png

VAQ - View alignment between model and query

The "VAQ" button displays the alignment between the model and the query sequence.


SkyBase 1e09 VAQ.png

References

Lee H, Li Z, Silkov A, Fischer M, Petrey D, Honig B, Murray D. (2010) High-throughput computational structure-based characterization of protein families: START domains and implications for structural genomics. J Struct Funct Genomics. 11(1):51-59.

Mirkovic N, Li Z, Parnassa A, Murray D. (2007) Strategies for High-Throughput Comparative Modeling: Applications to Leverage Analysis in Structural Genomics and Protein Family Organization. Proteins: Structure, Function, and Bioinformatics. 66:766-777.