NCBI BlastDB

From Informatics

  • What labs are using the BLAST databases?
  • They are available to all labs within GCG, as well as to some labs outside Columbia University.
  • Who is the main “database authority” for the BLAST databases?
  • Pavel Morozov (pm59<at>columbia.edu), Hans-Erik Aronson
  • What kinds of databases are these? (flat-file, relational, XML, object-oriented, etc.)
  • Flat files: plain-text operating-system files in FASTA format
  • Database Attributes? (Name, BioCategory, Description, Size, Filepath)

NCBI BlastDB

    1. nt
      • nucleotide
      • All non-redundant GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS, GSS, or HTGS sequences)
      • 1.6 million sequences
      • /databases/blastdb/db1/ncbi
    2. nr
      • peptide
      • All non-redundant GenBank CDS translations+PDB+SWISS-PROT+PIR+PRF
      • 4.7 million sequences
      • /databases/blastdb/db1/ncbi
    3. swissprot
      • peptide
      • SWISS-PROT protein sequence database
      • 237,000 sequences
      • /databases/blastdb/db1/ncbi
    4. pataa
      • peptide
      • protein sequences derived from the Patent division of GenBank
      • 380,000 sequences
      • /databases/blastdb/db1/ncbi
    5. patnt
      • nucleotide
      • nucleotide sequences derived from the Patent division of GenBank
      • 3.7 million sequences
      • /databases/blastdb/db1/ncbi
    6. pdbaa
      • peptide
      • Protein sequences derived from the 3-dimensional structures in PDB
      • 29,318 sequences
      • /databases/blastdb/db1/ncbi
    7. pdbnt
      • nucleotide
      • Nucleotide sequences derived from the 3-dimensional structures in PDB
      • 7,051 sequences
      • /databases/blastdb/db1/ncbi
    8. est_human
      • nucleotide
      • Human subset of GenBank+EMBL+DDBJ sequences from the EST division
      • ~ 8 million sequences
      • /databases/blastdb/db1/ncbi
    9. est_mouse
      • nucleotide
      • Mouse subset of GenBank+EMBL+DDBJ sequences from the EST division
      • 4.8 million sequences
      • /databases/blastdb/db1/ncbi
    10. est_others
      • nucleotide
      • Non-redundant database of all other organisms' GenBank+EMBL+DDBJ EST sequences
      • ~ 11.9 million sequences
      • /databases/blastdb/db1/ncbi
    11. gss
      • nucleotide
      • Genome Survey Sequence, includes single-pass genomic data, exon-trapped sequences, and Alu PCR sequences
      • ~ 10.5 million sequences
      • /databases/blastdb/db1/ncbi
    12. sts
      • nucleotide
      • Non-redundant database of GenBank+EMBL+DDBJ STS divisions
      • 922,406 sequences
      • /databases/blastdb/db1/ncbi
    13. month.aa
      • peptide
      • All new or revised GenBank CDS translations + PDB + SWISS-PROT + PIR + PRF released in the last 30 days
      • 200,216 sequences
      • /databases/blastdb/db1/ncbi
    14. month.nt
      • nucleotide
      • All new or revised GenBank+EMBL+DDBJ+PDB sequences released in the last 30 days
      • 114,786 sequences
      • /databases/blastdb/db1/ncbi
    15. mito.aa
      • peptide
      • Database of mitochondrial protein sequences
      • 2,222 sequences
      • /databases/blastdb/db1/ncbi
    16. mito.nt
      • nucleotide
      • Database of mitochondrial nucleotide sequences
      • 129 sequences
      • /databases/blastdb/db1/ncbi
    17. alu.a
      • peptide
      • translations of select Alu repeats from REPBASE, suitable for masking Alu repeats from query sequences
      • 1,962 sequences
      • /databases/blastdb/db1/ncbi
    18. alu.n
      • nucleotide
      • select Alu repeats from REPBASE, suitable for masking Alu repeats from query sequences
      • 327 sequences
      • /databases/blastdb/db1/ncbi
    19. vector
      • nucleotide
      • Vector subset of GenBank (NCBI)
      • 911 sequences
      • /databases/blastdb/db1/ncbi
    20. yeast.aa
      • peptide
      • Yeast amino-acid sequences
      • 6,298 sequences
      • /databases/blastdb/db1/ncbi
    21. month.est_human
      • nucleotide
      • non-redundant database of Human GenBank+EMBL+DDBJ EST sequences
      • 61,643 sequences
      • /databases/blastdb/db1/ncbi
    22. month.est_mouse
      • nucleotide
      • non-redundant database of Mouse GenBank+EMBL+DDBJ EST sequences
      • 4,132 sequences
      • /databases/blastdb/db1/ncbi
    23. month.est_others
      • nucleotide
      • non-redundant database of all other organisms GenBank+EMBL+DDBJ EST sequences
      • 211,077 sequences
      • /databases/blastdb/db1/ncbi
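The inventory above records each database's name, molecule type, and path, which is enough to decide how a command-line user would search it. The Python sketch below is a minimal, hypothetical helper: it assumes the NCBI BLAST+ programs (`blastn`, `blastp`, `blastx`, `tblastn`) with their `-query`, `-db`, `-evalue`, and `-outfmt` options, whereas this installation may still run the legacy `blastall`, whose flags differ. The `CATALOG` subset, the `blast_command` name, and the default E-value are illustrative choices, not part of the original setup. It only builds the argument list; nothing is executed.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath

# Directory listed for every database in the inventory.
BLASTDB_DIR = PurePosixPath("/databases/blastdb/db1/ncbi")


@dataclass(frozen=True)
class BlastDb:
    name: str
    molecule: str  # "nucleotide" or "peptide", as in the inventory


# Hypothetical subset of the inventory; other entries follow the same pattern.
CATALOG = {
    db.name: db
    for db in [
        BlastDb("nt", "nucleotide"),
        BlastDb("nr", "peptide"),
        BlastDb("swissprot", "peptide"),
        BlastDb("est_human", "nucleotide"),
        BlastDb("yeast.aa", "peptide"),
    ]
}

# Standard BLAST program for each (query molecule, database molecule) pair.
PROGRAMS = {
    ("nucleotide", "nucleotide"): "blastn",
    ("peptide", "peptide"): "blastp",
    ("nucleotide", "peptide"): "blastx",   # translated query vs. protein db
    ("peptide", "nucleotide"): "tblastn",  # protein query vs. translated db
}


def blast_command(query_fasta: str, query_molecule: str, db_name: str,
                  evalue: float = 1e-5) -> list[str]:
    """Build (but do not run) a BLAST+ command line for one catalog database."""
    db = CATALOG[db_name]  # raises KeyError for names not in the catalog
    program = PROGRAMS[(query_molecule, db.molecule)]
    return [
        program,
        "-query", query_fasta,
        "-db", str(BLASTDB_DIR / db.name),  # database path prefix, no extension
        "-evalue", str(evalue),
        "-outfmt", "6",  # tabular output
    ]


if __name__ == "__main__":
    print(" ".join(blast_command("query.fa", "peptide", "nt")))
```

The returned list can be handed to `subprocess.run` unchanged; keeping it as a list rather than a single string avoids shell-quoting problems with file names.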

  • Anticipated yearly growth (megabytes/gigabytes)?
  • Backup procedures? How often?
    • Database backups (Hot, Cold, Both) [and/or]
    • Operating system backup
    OS backup
  • What servers/operating systems are hosting them (IP addresses)?
  • adtera.cu-genome.edu
  • Approximately how many *active* users?
  • Not sure; perhaps all AMDeC users.
  • How often is the database used? (Daily, Weekly, Monthly)
  • Daily
  • What platforms are being used? (Oracle, MySQL, PostgreSQL, etc.)
  • Not applicable: these are flat files, not an RDBMS.
  • What applications are using these databases?
    • Web interface?
    • Application (GUI)?
    • Command-line user interface (CUI)?
    CUI
    BLAST: 99% of the time
    Other: EMBOSS, BioPerl, HMMER, SSAHA, MUMmer
  • Is it accessible from outside the firewall to public users?
  • Yes: users connect to ADGATE via SSH.
  • What is the primary purpose of the database? (What types of information does it contain?)
  • Homology searching: comparing sequences to find similar ones.
    It contains nucleotide and amino-acid sequences for numerous species.
  • Are there any issues or problems with the database?
    • Specific error messages popping up?
    • Problems connecting from the application or web interface?
    • Performance issues (queries are slow, freezes at times, etc.)
    • etc...
          Frequent error message: "Segmentation fault"
          The web interface no longer works
          Formatting the databases sometimes fails or freezes
  • Would they like help in administering the database?
  • Not at the moment; it is mostly automatic, with occasional manual work.
  • What additional features or changes would users like to see?
    • new tables or queries?
    • additional screens on application or web interface?
    • migrate to a different database platform (e.g., MySQL to Oracle)?
    None
  • May I access the database and if so, what is the login info?
  • Yes: obtain OS permissions from Hans-Erik Aronson.
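The answers above describe the databases as FASTA flat files and report that formatting them sometimes fails or freezes, without recording why. One cheap precaution, sketched here in Python under stated assumptions, is to scan a FASTA file before formatting it: the sketch counts records (the "Size" attribute used in the inventory) and flags empty sequences, headers without an identifier, sequence data before the first header, and unexpected characters. The accepted character set (letters plus `*` and `-`) and the `check_fasta` helper are hypothetical; passing this check does not guarantee that the BLAST formatting step will succeed, and the real cause of the reported failures may lie elsewhere.

```python
import re
from dataclasses import dataclass, field
from typing import Iterable

# Assumption: letters (IUPAC codes), '*' (stop) and '-' (gap) are acceptable.
ALLOWED = re.compile(r"^[A-Za-z*\-]+$")


@dataclass
class FastaReport:
    sequences: int = 0
    problems: list[str] = field(default_factory=list)


def check_fasta(lines: Iterable[str]) -> FastaReport:
    """Count FASTA records and flag basic formatting problems."""
    report = FastaReport()
    header_line = None  # line number of the current record's header
    residues = 0        # residues seen since that header
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if line.startswith(">"):
            # Close the previous record before starting a new one.
            if header_line is not None and residues == 0:
                report.problems.append(f"line {header_line}: empty sequence")
            if not line[1:].strip():
                report.problems.append(f"line {lineno}: header has no identifier")
            report.sequences += 1
            header_line, residues = lineno, 0
        elif not line.strip():
            continue  # blank lines are tolerated
        elif header_line is None:
            report.problems.append(f"line {lineno}: sequence data before first header")
        else:
            seq = line.strip()
            if not ALLOWED.match(seq):
                report.problems.append(f"line {lineno}: unexpected characters")
            residues += len(seq)
    # The last record also needs the empty-sequence check.
    if header_line is not None and residues == 0:
        report.problems.append(f"line {header_line}: empty sequence")
    return report
```

Because `check_fasta` accepts any iterable of lines, an open file handle can be passed directly, so a multi-gigabyte source file is streamed line by line rather than loaded into memory.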