Cupid

Overview

Cupid (Sumazin et al., 2011) generates information that can help predict whether a gene is a target of a specific miRNA. The Cupid service provides a simple query interface to a database of precalculated Cupid results, which can be searched either by miRNA ID or by RefSeq gene ID.

For each miRNA M, the algorithm uses TargetScan (Lewis et al., 2005), PITA (Kertesz et al., 2007), and MIRANDA (Enright et al., 2003), at their default settings, to predict target sites of M in the 3' UTRs of RefSeq (hg19) transcripts. Each predicted site seed is scored according to its conservation across 46 vertebrate genomes. The distances of each site from the 5' and 3' ends of the 3' UTR are also annotated. Scores are normalized and input into a support vector machine tool (LIBSVM, Chang and Lin, 2011), which is trained against 684 validated miRNA targets (the “gold standard”) procured from http://mirecords.biolead.org and TRANSFAC. Site seeds classified by LIBSVM together with the gold standard interaction set are considered positive; all others are considered negative. The output produced by Cupid includes:

  • A list of sites where an miRNA is predicted to match the 3’ UTR region of a RefSeq gene. For each site the following information is listed:
    • The related TargetScan, PITA and MIRANDA scores.
    • The site’s conservation score.
    • The site’s gold standard classification score.
    • An overall probability of being a true target site (integrating all the above scores).

The Cupid interface is a standalone component in geWorkbench. Unlike many components which are available only when a relevant dataset type has been loaded in the Workspace, the Cupid interface is available as long as it has been loaded in the Component Configuration Manager.

Using the Cupid Interface

Prerequisites

The Cupid component must be loaded in the Component Configuration Manager.


Figure: Cupid query interface.

Parameters and Controls

  • Server URL - The URL of the Cupid service (at Columbia). The user should not need to change this.
  • Query Type - Cupid results can be queried either by RefSeq ID or by miRNA ID.
    • RefSeq ID - query by RefSeq ID.
    • miRNA ID - query by miRNA ID.
  • Query Value - The RefSeq or miRNA ID on which to query the database.
  • Submit - submit the query.
  • Export - export the displayed query results to a CSV-format file.

Results

The Cupid output lists all the sites where an miRNA is predicted to match the 3’ UTR region of a RefSeq gene. Each result line represents how well an miRNA (identified by the third column of the line) is predicted to match the 3’ UTR region of a gene (identified by the second column of the line, via its RefSeq ID). More precisely, each column in a line captures the following information (columns are listed in the order in which they appear in the Cupid output file):

  1. Interaction Probability - overall probability that the miRNA matches the 3' UTR region of the gene.
  2. RefSeq ID - RefSeq ID of the target gene.
  3. miRNA - identifier of the miRNA whose matching potential against the target gene is being assessed.
  4. Distance from Start of UTR - location of the starting site of the match of the miRNA sequence, on the gene’s 3’ UTR region. Distances are normalized by UTR length, e.g., 0.33 means that the beginning of the match site is 1/3 of the UTR length from its start.
  5. Distance from End of UTR - distance of the match of the miRNA sequence from the end of the gene’s 3’ UTR region. Distances are normalized by UTR length, e.g., 0.67 means that the match site is 2/3 of the UTR length from the end of the UTR. In the example output lines on this page, this value and the Distance from Start of UTR sum to 1.
  6. PITA Score - PITA score for the miRNA-target site match.
  7. MIRANDA Score - MIRANDA score for the miRNA-target site match.
  8. TargetScan Score - TargetScan score for the miRNA-target site match.
  9. Conservation Score - Conservation score of the sequence region of the miRNA match (computed against conservation across 46 vertebrate genomes).
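  10. Gold Standard Classification - "Yes" means that the site is classified as a gold standard; "No" means that it is not. The example output lines on this page show this column as 1 or 0, which presumably correspond to "Yes" and "No".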
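The normalized distances in columns 4 and 5 can be turned back into approximate nucleotide offsets when the length of the 3’ UTR is known. The following is a minimal sketch of that arithmetic; the function name `approx_offset` and the 1,200-nt UTR length are illustrative assumptions, not part of Cupid or geWorkbench.

```python
def approx_offset(normalized_distance: float, utr_length: int) -> int:
    """Convert a UTR-length-normalized distance into an approximate nucleotide offset.

    Hypothetical helper: Cupid reports only the normalized value, so the UTR
    length has to come from another source (e.g. a RefSeq annotation).
    """
    if not 0.0 <= normalized_distance <= 1.0:
        raise ValueError("normalized distance must lie between 0 and 1")
    if utr_length <= 0:
        raise ValueError("UTR length must be positive")
    return round(normalized_distance * utr_length)


# Example: a site reported at 0.33 from the start of an assumed 1,200-nt UTR
print(approx_offset(0.33, 1200))  # 396
```

Because the reported distances are rounded to two decimal places, the recovered offset is only accurate to about 1% of the UTR length.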

The Cupid Service

The Cupid servlet takes two parameters for a query, type and value.

  • type – a string value of "RefSeq ID" or "miRNA ID".
  • value – a string containing the RefSeq or miRNA ID on which to query.
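As an illustration of the two parameters, the sketch below URL-encodes a query using Python's standard library. The server address is a placeholder (the real one is the Server URL shown in the Cupid interface), and whether the servlet expects a GET or a POST request is not stated here, so the GET-style URL is an assumption.

```python
from urllib.parse import urlencode

# Placeholder only: substitute the "Server URL" value from the Cupid interface.
SERVER_URL = "http://example.org/cupid"  # hypothetical address

VALID_TYPES = ("RefSeq ID", "miRNA ID")


def build_query(query_type: str, value: str) -> str:
    """Return the URL-encoded 'type' and 'value' parameters for a Cupid query."""
    if query_type not in VALID_TYPES:
        raise ValueError(f"type must be one of {VALID_TYPES}")
    return urlencode({"type": query_type, "value": value})


def build_get_url(query_type: str, value: str, server_url: str = SERVER_URL) -> str:
    """Assemble a GET-style URL (assumption: the servlet may expect POST instead)."""
    return f"{server_url}?{build_query(query_type, value)}"


print(build_query("miRNA ID", "hsa-miR-122"))
# type=miRNA+ID&value=hsa-miR-122
```

If the servlet turns out to require POST, the same encoded string from `build_query` could be sent as the request body instead.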


Below are example lines from the output file produced by the Cupid code:

0.805531	NM_000034	hsa-miR-122	0.16	0.84	0.92	0.74	0.93	0.78	1
0.897525	NM_000034	hsa-miR-122	0.17	0.83	0.92	0.74	0.93	1.00	1
0.843500	NM_000038	hsa-miR-135a	0.07	0.93	0.00	0.00	0.93	1.00	1
0.836859	NM_000038	hsa-miR-135b	0.07	0.93	0.00	0.00	0.94	1.00	1
0.740299	NM_000059	hsa-miR-146a	0.65	0.35	0.00	0.29	0.90	0.00	1
0.762785	NM_000059	hsa-miR-146a	0.66	0.34	0.00	0.29	0.90	0.00	1
0.837864	NM_000076	hsa-miR-221	0.13	0.87	0.00	0.50	0.70	0.67	1
0.736315	NM_000076	hsa-miR-221	0.68	0.32	0.00	0.35	0.00	0.51	1
0.855172	NM_000076	hsa-miR-222	0.13	0.87	0.00	0.11	0.69	0.67	1
0.838109	NM_000088	hsa-miR-29c	0.63	0.37	0.00	0.00	0.41	1.00	1

However, the actual service uses pipe characters “|” rather than tabs as the delimiter, and each line ends with a trailing pipe:

0.838109|NM_000088|hsa-miR-29c|0.63|0.37|0.0|0.0|0.41|1.0|1|
0.840324|NM_000088|hsa-miR-29c|0.66|0.34|0.0|0.0|0.82|1.0|1|
0.854647|NM_000088|hsa-miR-29c|0.76|0.24|0.0|0.0|0.73|1.0|1|
0.837787|NM_000088|hsa-let-7a|0.57|0.43|0.0|0.0|0.68|1.0|0|
0.257611|NM_000088|hsa-let-7a|0.81|0.19|0.0|0.56|0.0|0.0|0|
0.740121|NM_000088|hsa-let-7a|0.9|0.1|0.0|0.12|0.0|0.62|0|
0.838051|NM_000088|hsa-let-7a*|0.95|0.05|0.0|0.0|0.63|1.0|0|
0.257282|NM_000088|hsa-let-7a-2*|0.77|0.23|0.08|0.0|0.0|0.0|0|
0.804941|NM_000088|hsa-let-7b|0.02|0.98|0.9|0.0|0.0|1.0|0|
0.837787|NM_000088|hsa-let-7b|0.57|0.43|0.0|0.0|0.68|1.0|0|
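The pipe-delimited lines above can be parsed with a few lines of Python. This is a minimal sketch rather than part of geWorkbench: the `CupidSite` type and its field names are made up for illustration and follow the column order given under Results, and reading a tenth-column value of 1 as a "Yes" gold standard classification is an assumption.

```python
from typing import NamedTuple


class CupidSite(NamedTuple):
    """One result line; field names are illustrative, ordered as in Results."""
    probability: float
    refseq_id: str
    mirna: str
    dist_from_start: float
    dist_from_end: float
    pita: float
    miranda: float
    targetscan: float
    conservation: float
    gold_standard: bool  # assumption: "1" means Yes, "0" means No


def parse_line(line: str) -> CupidSite:
    """Parse one pipe-delimited line as returned by the service."""
    fields = line.strip().split("|")
    if fields and fields[-1] == "":
        fields.pop()  # drop the empty field left by the trailing pipe
    if len(fields) != 10:
        raise ValueError(f"expected 10 columns, found {len(fields)}")
    return CupidSite(
        float(fields[0]),
        fields[1],
        fields[2],
        *(float(x) for x in fields[3:9]),
        fields[9] == "1",
    )


site = parse_line("0.838109|NM_000088|hsa-miR-29c|0.63|0.37|0.0|0.0|0.41|1.0|1|")
print(site.mirna, site.probability, site.gold_standard)
# hsa-miR-29c 0.838109 True
```

Parsed records can then be filtered in the usual way, for example by keeping only sites whose `probability` exceeds a chosen threshold.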

References

  • Chang, C.-C. and C.-J. Lin (2011) LIBSVM: a library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2: p. 27:1-27:27.
  • Enright, A.J., B. John, U. Gaul, T. Tuschl, C. Sander, and D.S. Marks (2003) MicroRNA targets in Drosophila. Genome Biol, 5(1): p. R1.
  • Kertesz, M., N. Iovino, U. Unnerstall, U. Gaul, and E. Segal (2007) The role of site accessibility in microRNA target recognition. Nat Genet. 39(10): p. 1278-84.
  • Lewis, B.P., C.B. Burge, and D.P. Bartel (2005) Conserved seed pairing, often flanked by adenosines, indicates that thousands of human genes are microRNA targets. Cell 120(1): p. 15-20.
  • Sumazin P, Yang X, Chiu HS, Chung WJ, Iyer A, Llobet-Navas D, Rajbhandari P, Bansal M, Guarnieri P, Silva J, Califano A. (2011) An extensive microRNA-mediated network of RNA-RNA interactions regulates established oncogenic pathways in glioblastoma. Cell 147(2):370-81. doi: 10.1016/j.cell.2011.09.041. PubMed 22000015