Peptide Recognition Domain (PRD) Specificity Prediction

DREAM4, Challenge 1

Synopsis

Many important protein-protein interactions are mediated by peptide recognition domains (PRDs), which bind short linear sequence motifs in other proteins. For example, SH3 domains typically recognize proline-rich motifs, PDZ domains recognize hydrophobic C-terminal tails, and kinases recognize short sequence regions around a phosphorylatable residue [1].

Given the sequence of the domains, the challenge consists of predicting a position weight matrix (PWM) that describes the specificity profile of each of the given domains to their target peptides. Any publicly accessible peptide specificity information available for the domain may be used.

Background

Ideally, PRD specificity could be predicted directly from the sequence of the domain itself. This would enable the prediction of protein-protein interaction networks directly from the genome sequence.

The specificities of selected human SH3, synthetic PDZ and kinase PRDs were experimentally mapped using phage display and combinatorial peptide libraries. The peptide libraries contain many short peptides with diverse sequences, around ten amino acids in length. The domain is used to select peptides from the library that bind to it. The set of peptides that bind to a domain defines a short, linear sequence pattern that the domain is expected to recognize. This pattern can be represented probabilistically as a position weight matrix (PWM). The PWM representation implicitly assumes that the motif positions are independent of one another. Although some positions may interact in certain motifs, such interactions are neglected for this challenge.
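As an illustration of how a set of bound peptides maps to a PWM, the following minimal sketch counts amino acid frequencies independently at each position. It is not the organizers' procedure, which is not described on this page; the function name `pwm_from_peptides`, the optional `pseudocount` parameter, and the example peptides are all hypothetical.

```python
from collections import Counter

# The 20 standard amino acids, alphabetical by single-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def pwm_from_peptides(peptides, pseudocount=0.0):
    """Return {amino_acid: [p_1, ..., p_L]} for equal-length peptides.

    Each position is counted on its own, which is exactly the
    independence assumption built into the PWM representation.
    """
    if not peptides:
        raise ValueError("need at least one peptide")
    length = len(peptides[0])
    for peptide in peptides:
        if len(peptide) != length:
            raise ValueError("all peptides must have the same length")
        unknown = set(peptide) - set(AMINO_ACIDS)
        if unknown:
            raise ValueError(f"unknown residue(s): {sorted(unknown)}")

    # Denominator is the same for every column.
    total = len(peptides) + pseudocount * len(AMINO_ACIDS)
    pwm = {aa: [0.0] * length for aa in AMINO_ACIDS}
    for pos in range(length):
        counts = Counter(peptide[pos] for peptide in peptides)
        for aa in AMINO_ACIDS:
            pwm[aa][pos] = (counts.get(aa, 0) + pseudocount) / total
    return pwm


# Made-up peptides, for illustration only (not challenge data).
example_peptides = ["PPLP", "PPVP", "APLP", "PPLP"]
example_pwm = pwm_from_peptides(example_peptides)
```

A small positive `pseudocount` avoids zero probabilities when only a few peptides are available; whether that is appropriate here is a modeling choice left to the participant.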

Publicly available information about the domain family that may be useful for prediction includes known ligands of members of the domain family from the literature or databases like DOMINO [2] or PDZBase [3] and structures from the PDB [4].

The Challenge

Peptides bound by SH3, PDZ, and kinase PRDs were experimentally identified. These data constitute an unpublished "gold standard" for the binding specificity of the selected PRDs.

Given the sequence of the domains, the challenge consists of predicting a position weight matrix (PWM) that describes the specificity profile of each of the given domains to their target peptides. Any publicly accessible peptide specificity information available for the domain may be used.

Data

Submission

Using the provided tab-delimited template file, and keeping its formatting, submit a ten-column PWM for each domain. An example PWM is illustrated below. Each row corresponds to an amino acid and each column to a peptide position; each entry is the probability that the given amino acid is found at that position. Each of the ten columns must sum to 1.0. (Note that the amino acids are ordered alphabetically by IUPAC single-letter code. Please keep this template format.)

A	0.05	0.05	0.05	0.05	0.05	0.05	0.05	0.05	0.05	0.05
C	0.05	0.05	0.05	0.05	0.05	0.05	0.05	0.05	0.05	0.05
D	0.05	0.05	0.05	0.05	0.05	0.05	0.05	0.05	0.05	0.05
E	0.05	0.05	0.05	0.05	0.05	0.05	0.05	0.05	0.05	0.05
F	0.05	0.05	0.05	0.05	0.05	0.05	0.05	0.05	0.05	0.05
G	0.05	0.05	0.05	0.05	0.05	0.05	0.05	0.05	0.05	0.05
H	0.05	0.05	0.05	0.05	0.05	0.05	0.05	0.05	0.05	0.05
I	0.05	0.05	0.05	0.05	0.05	0.05	0.05	0.05	0.05	0.05
K	0.05	0.05	0.05	0.05	0.05	0.05	0.05	0.05	0.05	0.05
L	0.05	0.05	0.05	0.05	0.05	0.05	0.05	0.05	0.05	0.05
M	0.05	0.05	0.05	0.05	0.05	0.05	0.05	0.05	0.05	0.05
N	0.05	0.05	0.05	0.05	0.05	0.05	0.05	0.05	0.05	0.05
P	0.05	0.05	0.05	0.05	0.05	0.05	0.05	0.05	0.05	0.05
Q	0.05	0.05	0.05	0.05	0.05	0.05	0.05	0.05	0.05	0.05
R	0.05	0.05	0.05	0.05	0.05	0.05	0.05	0.05	0.05	0.05
S	0.05	0.05	0.05	0.05	0.05	0.05	0.05	0.05	0.05	0.05
T	0.05	0.05	0.05	0.05	0.05	0.05	0.05	0.05	0.05	0.05
V	0.05	0.05	0.05	0.05	0.05	0.05	0.05	0.05	0.05	0.05
W	0.05	0.05	0.05	0.05	0.05	0.05	0.05	0.05	0.05	0.05
Y	0.05	0.05	0.05	0.05	0.05	0.05	0.05	0.05	0.05	0.05
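Before submitting, it may help to check a PWM against the constraints stated above (20 rows in alphabetical order, ten columns, each column summing to 1.0) and to write it in the same tab-delimited layout as the example. The sketch below is one way to do that. The helper names `uniform_pwm`, `check_pwm` and `format_pwm` are hypothetical, and the real template file may contain elements not shown on this page, so treat the output layout as an assumption.

```python
# The 20 standard amino acids, alphabetical by single-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
N_POSITIONS = 10  # the challenge asks for ten columns


def uniform_pwm():
    """Return the uninformative PWM shown in the example (0.05 everywhere)."""
    return {aa: [1.0 / len(AMINO_ACIDS)] * N_POSITIONS for aa in AMINO_ACIDS}


def check_pwm(pwm, tol=1e-6):
    """Raise ValueError if the PWM violates the stated constraints."""
    if sorted(pwm) != sorted(AMINO_ACIDS):
        raise ValueError("rows must be exactly the 20 standard amino acids")
    for aa in AMINO_ACIDS:
        if len(pwm[aa]) != N_POSITIONS:
            raise ValueError(f"row {aa} must have {N_POSITIONS} entries")
        if any(p < 0.0 for p in pwm[aa]):
            raise ValueError(f"row {aa} contains a negative probability")
    for pos in range(N_POSITIONS):
        col_sum = sum(pwm[aa][pos] for aa in AMINO_ACIDS)
        if abs(col_sum - 1.0) > tol:
            raise ValueError(f"column {pos + 1} sums to {col_sum}, not 1.0")


def format_pwm(pwm, digits=4):
    """Return the PWM as tab-delimited text, one amino acid per row."""
    check_pwm(pwm)
    rows = []
    for aa in AMINO_ACIDS:
        rows.append("\t".join([aa] + [f"{p:.{digits}f}" for p in pwm[aa]]))
    return "\n".join(rows) + "\n"


template_text = format_pwm(uniform_pwm(), digits=2)
```

Note that rounding to a fixed number of digits can shift a column sum slightly away from 1.0, so it is worth re-checking the values after they have been written out.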

Scoring Metrics

The submitted PWM predictions will be judged exclusively by their similarity to the experimentally mapped PWMs, measured as the distance induced by the Frobenius norm (http://mathworld.wolfram.com/FrobeniusNorm.html).
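For two PWMs P and Q of the same shape, the distance induced by the Frobenius norm is the square root of the sum of squared entry-wise differences, d(P, Q) = sqrt(sum over i, j of (P_ij - Q_ij)^2). The sketch below computes this quantity for PWMs stored as dictionaries of rows. The function name `frobenius_distance` and the toy matrices are hypothetical, and any normalization or averaging across domains that the organizers may apply is not specified here.

```python
import math


def frobenius_distance(pwm_a, pwm_b):
    """Return the Frobenius-norm distance between two PWMs.

    Each PWM is a dict mapping a row label (amino acid) to a list of
    per-position probabilities; both must have the same labels and widths.
    """
    if pwm_a.keys() != pwm_b.keys():
        raise ValueError("PWMs must have the same row labels")
    squared_sum = 0.0
    for label in pwm_a:
        row_a, row_b = pwm_a[label], pwm_b[label]
        if len(row_a) != len(row_b):
            raise ValueError(f"row {label} differs in length")
        squared_sum += sum((x - y) ** 2 for x, y in zip(row_a, row_b))
    return math.sqrt(squared_sum)


# Toy two-letter, two-position matrices, for illustration only.
toy_a = {"A": [1.0, 0.0], "C": [0.0, 1.0]}
toy_b = {"A": [0.0, 0.0], "C": [1.0, 1.0]}
toy_distance = frobenius_distance(toy_a, toy_b)  # sqrt(2): two entries differ by 1
```

A distance of 0 means the two matrices are identical; larger values indicate a worse match to the reference.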

Domain specific notes:

References

  1. Pawson T, Nash P (2003) Assembly of cell regulatory systems through protein interaction domains. Science 300: 445-452.
  2. Ceol A, Chatr-aryamontri A, Santonico E, Sacco R, Castagnoli L, et al. (2007) DOMINO: a database of domain-peptide interactions. Nucleic Acids Res 35: D557-560.
  3. Beuming T, Skrabanek L, Niv MY, Mukherjee P, Weinstein H (2005) PDZBase: a protein-protein interaction database for PDZ-domains. Bioinformatics 21: 827-828.
  4. Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, et al. (2000) The Protein Data Bank. Nucleic Acids Res 28: 235-242.

Authors

The challenge was provided by Gary Bader and Philip M. Kim, from the Terrence Donnelly Center for Cellular and Biomolecular Research, University of Toronto. Pre-publication data were generously provided by Sachdev Sidhu, Terrence Donnelly Center for Cellular and Biomolecular Research, University of Toronto, and Ben Turk, Department of Pharmacology, Yale University. The challenge was designed in collaboration with Robert Prill and Gustavo Stolovitzky from the IBM T.J. Watson Research Center in New York.

Download

Don't hesitate to post a question in the DREAM discussion board if you need any clarification on this challenge.

Retrieved from "http://wiki.c2b2.columbia.edu/dream/index.php/D4c1"

This page was last modified 00:07, 3 July 2009.
