Database Inventory

Facilities hosting databases:

(1) C2B2 (Center for Computational Biology and Bioinformatics) -

(2) Herbert Irving Comprehensive Cancer Center -

(3) Genome Center -







GenBank
Contact: Richard Friedman (friedman<at>
Size: 82 million sequences, 65 billion nucleotides
Description: Consists of nucleic acid sequences used for sequence retrieval and homolog identification.
GenBank Details

NCBI BlastDB
Contact: Richard Friedman (friedman<at>
Size: ~60 million sequences
Description: Consists of nucleic acid and peptide sequences used for sequence retrieval and homolog identification.
BlastDB Details

GenPept
Contact: Richard Friedman (friedman<at>
Size: 5 million sequences, 1.5 billion amino acids
Description: Consists of protein sequences used for sequence retrieval and homolog identification.
GenPept Details

UniProt
Contact: Richard Friedman (friedman<at>
Size: 5 million sequences, 300 million amino acids
Description: Consists of protein sequences used for sequence retrieval and homolog identification.
UniProt Details

Geneways
Contact: Ilya Mayzus (im53<at>
Size: 2.1 million unique molecular interactions; Oracle: 6.7 GB, Flat-file/indices: 500 GB
Description: Geneways is a system for automatically extracting, analyzing, visualizing and integrating molecular pathway data from the research literature. It contains 4 million molecular interaction statements describing 2.1 million unique molecular interactions, automatically extracted from a quarter of a million articles in 78 journals.
Location: Geneways
Geneways Details

CNKB
Contact: Michael Honig (mhonig<at>
Size: ~400k interactions
Description: The Cellular Network Knowledge Base maintains the B-cell Interactome and stores protein-protein and protein-DNA interactions from several public databases. It stores sequence info, interaction type, GO annotation, the cellular and molecular phenotype context for the interaction, the algorithm or experimental procedure used to infer the interaction, and homology relationships across multiple species. It is available for querying via a geWorkbench plugin, and its interactions can be visualized in the geWorkbench Cytoscape module.

Interactions by source:
B-cell Interactome (66,193 interactions)
HPRD (6,575 interactions)
Reactome (63,801 interactions)
Geneways (45,317 interactions)
DIP (54,219 interactions)
MINT (17,640 interactions)
BioGRID (154,563 interactions)
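As a quick consistency check, the per-source counts listed for CNKB can be added up and compared with the stated size of roughly 400k interactions. The sketch below is only illustrative: the dictionary and function names are made up for this example, and it assumes the sources do not overlap, which the inventory does not state.

```python
# Per-source interaction counts, copied from the CNKB entry above.
# Assumption: the sources are treated as non-overlapping.
cnkb_sources = {
    "B-cell Interactome": 66_193,
    "HPRD": 6_575,
    "Reactome": 63_801,
    "Geneways": 45_317,
    "DIP": 54_219,
    "MINT": 17_640,
    "BioGRID": 154_563,
}


def total_interactions(counts):
    """Return the sum of all per-source interaction counts."""
    return sum(counts.values())


total = total_interactions(cnkb_sources)
print(f"{total:,} interactions")  # prints "408,308 interactions"
```

The total of 408,308 agrees with the rounded figure of ~400k given for the database as a whole.
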

ADOMETA
Contact: Dr. Dennis Vitkup (vitkup<at>, Lifeng Chen (lifeng.chen<at>
Size: ~18,000 genes
Description: ADOMETA stands for ADoption of Orphan METabolic Activities. It is a bioinformatics resource designed to predict genes for orphan metabolic activities, which are known biochemical activities not currently assigned to genes in some or all organisms. ADOMETA returns a ranked list of genes likely to catalyze a given metabolic activity in a selected organism.
Location: ADOMETA
ADOMETA Details

HMAP
Contact: Kely Norel (rn98<at>
Size: ~29,000 templates
Description: HMAP is a method for hybrid multidimensional alignment of profiles. It combines sequence, secondary-structure and tertiary-structure information from protein structures into profiles to facilitate the detection of remote homologs and to perform sequence-to-sequence alignments. The database for HMAP contains approximately 13,000 templates, which are 90% sequence non-redundant.
Location: /ifs/data/c2b2/bh_lab/shares/databases/hmap/1dprof
HMAP Details

SCOP
Contact: Kely Norel (rn98<at>
Size: 105,725 PDB entries
Description: The SCOP database provides a detailed and comprehensive description of all known protein structures. The classification is on hierarchical levels: the first two levels, family and superfamily, describe near and far evolutionary relationships; the third, fold, describes geometrical relationships.
Location: SCOP, Internal: /ifs/data/c2b2/bh_lab/shares/databases/scop
SCOP Details
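
The hierarchy described for SCOP can be illustrated with SCOP's compact classification strings (sccs), such as "b.1.1.1", which encode class, fold, superfamily and family in that order. The sccs notation and the class level above fold are not mentioned in this inventory; they are brought in here only for illustration, and the function names and example identifiers are hypothetical choices for this sketch.

```python
from typing import NamedTuple


class ScopClassification(NamedTuple):
    scop_class: str   # top level, e.g. "b"
    fold: int         # geometrical relationship
    superfamily: int  # far evolutionary relationship
    family: int       # near evolutionary relationship


def parse_sccs(sccs: str) -> ScopClassification:
    """Split a string like 'b.1.1.1' into its four hierarchy levels."""
    parts = sccs.strip().split(".")
    if len(parts) != 4:
        raise ValueError(f"expected class.fold.superfamily.family, got {sccs!r}")
    scop_class, fold, superfamily, family = parts
    return ScopClassification(scop_class, int(fold), int(superfamily), int(family))


def shared_level(a: str, b: str) -> str:
    """Return the deepest hierarchy level on which two classifications agree."""
    pa, pb = parse_sccs(a), parse_sccs(b)
    if pa.scop_class != pb.scop_class:
        return "none"
    if pa.fold != pb.fold:
        return "class"
    if pa.superfamily != pb.superfamily:
        return "fold"
    if pa.family != pb.family:
        return "superfamily"
    return "family"


print(shared_level("b.1.1.1", "b.1.1.2"))  # prints "superfamily"
print(shared_level("b.1.1.1", "b.1.2.1"))  # prints "fold"
```

Two domains that agree only down to the fold level share a geometrical relationship, while agreement at the superfamily or family level indicates a far or near evolutionary relationship, matching the description above.
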
PDB
Contact: Kely Norel (rn98<at>
Size: 55,000 structure entries
Description: The Protein Data Bank contains 3D structural coordinates for over 42,000 proteins and is used for protein visualization, structural analysis, and structure-based searching.
Location: PDB, Internal: /ifs/data/c2b2/bh_lab/shares/databases/pdb
PDB Details

PROSITE
Contact: Richard Friedman (friedman<at>
Size: 1,300 patterns
Description: PROSITE consists of documentation entries describing protein domains, families and functional sites as well as associated patterns and profiles to identify them.
Location: PROSITE
PROSITE Details
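
To give a sense of what a PROSITE pattern looks like in practice, the sketch below converts the basic pattern syntax into a Python regular expression and scans a sequence with it. It is a minimal illustration, not the scanning tool used at these facilities: the function name is made up, the test sequences are invented, and only the core syntax is handled (residues, `x`, `[...]`, `{...}`, repeat counts, and `<`/`>` anchors outside brackets).

```python
import re

# One pattern element: a residue, x, [allowed] or {forbidden},
# optionally followed by a repeat count such as (3) or (2,4).
_ELEMENT = re.compile(r"([A-Zx]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a basic PROSITE pattern into a Python regex string."""
    pieces = []
    for element in pattern.strip().rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):      # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):        # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = match.groups()
        if core == "x":                  # any amino acid
            core = "."
        elif core.startswith("{"):       # any amino acid except these
            core = "[^" + core[1:-1] + "]"
        repeat = ""
        if low and high:
            repeat = "{" + low + "," + high + "}"
        elif low:
            repeat = "{" + low + "}"
        pieces.append(prefix + core + repeat + suffix)
    return "".join(pieces)


# PS00001, the N-glycosylation site pattern.
regex = prosite_to_regex("N-{P}-[ST]-{P}.")
print(regex)                                  # prints "N[^P][ST][^P]"
print(re.search(regex, "AANGSAA").group())    # prints "NGSA"
print(re.search(regex, "AANPSAA"))            # prints "None"
```

The second sequence does not match because the residue after N is a proline, which the `{P}` element excludes.
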

Pfam
Contact: Richard Friedman (friedman<at>
Size: Statistical models of 9,000 protein families
Description: Pfam is used for the identification of protein structural domains.
Location: Pfam
Pfam Details

REBASE
Contact: Richard Friedman (friedman<at>
Size: 750 restriction enzymes and their binding sites
Description: REBASE stores the location of restriction enzymes' binding sites.
Location: REBASE
REBASE Details

Functional Annotation DB
Contact: Markus Fischer (mf2355<at>
Size: 350 structural genomics targets
Description: The Functional Annotation Database stores annotation on PDB structures and the results from programs such as DALI and SKAN.
Location: Functional Annotation DB
AnnotationDB Details

PQS db
Contact: Kely Norel (rn98<at>
Size: 66,000 structures
Description: Protein Quaternary Structure database
Location: Internal: /ifs/data/c2b2/bh_lab/shares/databases/macmol
PQS Details
