Database Info
Database Inventory
Facilities hosting databases:
(1) C2B2 (Center for Computational Biology and Bioinformatics) - https://www.c2b2.columbia.edu/page.php?pageid=22
(2) Herbert Irving Comprehensive Cancer Center - http://cancercenter.columbia.edu
(3) Genome Center - http://genome4.cpmc.columbia.edu/
| DB NAME | CONTACT | SIZE | DESCRIPTION | POINTER | DETAILS |
| GenBank | Richard Friedman (friedman<at>cancercenter.columbia.edu) | 82 million sequences, 65 billion nucleotides | Consists of nucleic acid sequences used for sequence retrieval and homolog identification. | adgate.cu-genome.org | GenBank Details |
| NCBI BlastDB | Richard Friedman (friedman<at>cancercenter.columbia.edu) | ~ 60 million sequences | Consists of nucleic acid and peptide sequences used for sequence retrieval and homolog identification. | adgate.cu-genome.org | BlastDB Details |
| GenPept | Richard Friedman (friedman<at>cancercenter.columbia.edu) | 5 million sequences, 1.5 billion amino acids | Consists of protein sequences used for sequence retrieval and homolog identification. | adgate.cu-genome.org | GenPept Details |
| UniProt | Richard Friedman (friedman<at>cancercenter.columbia.edu) | 5 million sequences, 300 million amino acids | Consists of protein sequences used for sequence retrieval and homolog identification. | adgate.cu-genome.org | UniProt Details |
| Geneways | Ilya Mayzus (im53<at>genomecenter.columbia.edu) | 2.1 million unique molecular interactions; Oracle: 6.7 GB, Flat-file/indices: 500 GB | Geneways is a system for automatically extracting, analyzing, visualizing and integrating molecular pathway data from research literature. It contains 4 million molecular interaction statements which describe 2.1 million unique molecular interactions, automatically extracted from one quarter million articles in 78 journals. | Geneways | Geneways Details |
| CNKB | Michael Honig (mhonig<at>c2b2.columbia.edu) | ~400k interactions | The Cellular Network Knowledge Base maintains the B-Cell Interactome and stores protein-protein and protein-DNA interactions from several public databases. For each interaction it stores sequence information, interaction type, GO annotation, cellular and molecular phenotype context, the algorithm or experimental procedure used to infer the interaction, and homology relationships across multiple species. | http://www.geworkbench.org. Available for querying via the geWorkbench plugin and for visualizing interactions in the geWorkbench Cytoscape module. B-cell Interactome (66,193 interactions) | |
| ADOMETA | Dr. Dennis Vitkup (vitkup<at>dbmi.columbia.edu), Lifeng Chen (lifeng.chen<at>dbmi.columbia.edu) | ~ 18,000 genes | ADOMETA stands for ADoption of Orphan METabolic Activities. It is a bioinformatics resource designed to predict genes for orphan metabolic activities, which are known biochemical activities not currently assigned to genes in some or all organisms. ADOMETA returns a ranked list of genes likely to catalyze a given metabolic activity in a selected organism. | ADOMETA | ADOMETA Details |
| HMAP | Kely Norel (rn98<at>columbia.edu) | ~ 29,000 templates | HMAP is a method for hybrid multidimensional alignment of profiles which combines sequence, secondary and tertiary information in protein structures into profiles to facilitate the detection of remote homologs and perform sequence-to-sequence alignments. The database for HMAP contains approximately 13,000 templates, which are 90% sequence non-redundant. | /ifs/data/c2b2/bh_lab/shares/databases/hmap/1dprof | HMAP Details |
| SCOP | Kely Norel (rn98<at>columbia.edu) | 105,725 PDB entries | The SCOP database provides a detailed and comprehensive description of all known protein structures. The classification is on hierarchical levels: the first two levels, family and superfamily, describe near and far evolutionary relationships; the third, fold, describes geometrical relationships. | SCOP, Internal: /ifs/data/c2b2/bh_lab/shares/databases/scop | SCOP Details |
| PDB | Kely Norel (rn98<at>columbia.edu) | 55,000 structure entries | The Protein Data Bank contains 3D structural coordinates for over 42,000 proteins and is used for protein visualization, structural analysis, and structure-based searching. | PDB, Internal: /ifs/data/c2b2/bh_lab/shares/databases/pdb | PDB Details |
| PROSITE | Richard Friedman (friedman<at>cancercenter.columbia.edu) | 1,300 patterns | PROSITE consists of documentation entries describing protein domains, families and functional sites as well as associated patterns and profiles to identify them. | PROSITE | PROSITE Details |
| Pfam | Richard Friedman (friedman<at>cancercenter.columbia.edu) | Statistical models of 9,000 protein families | Pfam is used for the identification of protein structural domains. | Pfam | Pfam Details |
| REBASE | Richard Friedman (friedman<at>cancercenter.columbia.edu) | 750 restriction enzymes and their binding sites | REBASE stores the locations of restriction enzyme binding sites. | REBASE | REBASE Details |
| Functional Annotation DB | Markus Fischer (mf2355<at>columbia.edu) | 350 structural genomics targets | The Functional Annotation Database stores annotation on PDB structures and the results from programs such as DALI and SKAN. | Functional Annotation DB | AnnotationDB Details |
| PQS db | Kely Norel (rn98<at>columbia.edu) | 66,000 structures | Protein Quaternary Structure database | Internal: /ifs/data/c2b2/bh_lab/shares/databases/macmol EBI: http://pqs.ebi.ac.uk/pqs-doc/pqs-doc.shtml | PQS Details |
