CoEDGI

coEDGI (co-expression Enhancer Discovery of Genomic Informatics)

A gene usually has introns (non-coding regions) and exons (coding regions) within its nucleotide sequence. A transcription
start site (TSS) and a transcription end site (TES) mark the first and last bases, respectively, of the transcribed sequence.
For the upstream and downstream regions of each gene, we want to capture data on chunks of specific sizes:
1k, 5k, 10k, and 35k base pairs.
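One way to derive the chunk coordinates is sketched below in Python. This is an illustration rather than the project's actual code. The function name `flanking_chunks`, the +1/-1 strand encoding, the 1-based inclusive coordinates, and the clipping at position 1 are all assumptions made for the example; only the four chunk sizes come from the text above.

```python
# Chunk sizes named in the description above (in base pairs).
CHUNK_SIZES = (1_000, 5_000, 10_000, 35_000)


def flanking_chunks(tss, tes, strand, sizes=CHUNK_SIZES):
    """Return {(region_type, size): (start, end)} for upstream/downstream chunks.

    Assumptions (not from the original page):
    - coordinates are 1-based and inclusive;
    - strand is +1 (forward) or -1 (reverse);
    - tss/tes are the first and last transcribed bases;
    - windows are clipped at position 1; clipping at the chromosome end
      is left out because the chromosome length is not known here.
    """
    if strand not in (1, -1):
        raise ValueError("strand must be +1 or -1")
    chunks = {}
    for size in sizes:
        if strand == 1:
            windows = {
                "upstream": (tss - size, tss - 1),
                "downstream": (tes + 1, tes + size),
            }
        else:
            # On the reverse strand the TSS has the larger coordinate,
            # so "upstream" lies at higher positions.
            windows = {
                "upstream": (tss + 1, tss + size),
                "downstream": (tes - size, tes - 1),
            }
        for region_type, (start, end) in windows.items():
            start = max(1, start)
            if start <= end:  # skip windows that fall entirely off the chromosome
                chunks[(region_type, size)] = (start, end)
    return chunks


forward = flanking_chunks(tss=10_000, tes=12_000, strand=1)
print(forward[("upstream", 1_000)])    # (9000, 9999)
print(forward[("downstream", 5_000)])  # (12001, 17000)
```

Note that a 35k upstream window for a gene close to the start of a chromosome comes back shorter than 35k because of the clipping.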


Table: gene - stores information about each gene
| Column | Column Type | Comment |
|---|---|---|
| id | integer | primary key; auto-generated |
| flybase_id | varchar(20) | FlyBase accession ID |
| entrezid | integer | NCBI Entrez Gene identifier |
| gene_symbol | varchar(20) | HGNC Gene Symbol identifier |
| refseq_id | varchar(20) | RefSeq database identifier |
| cg_id | varchar(20) | |
| species | varchar(40) | NCBI Taxonomy identifier |
| sequence_text | mediumtext | full length gene sequence |

Table: gene_to_ortholog - stores gene to ortholog mapping information

| Column | Column Type | Comment |
|---|---|---|
| gene_id | integer | part of composite primary key & foreign-key pointer to 'id' column in gene table |
| ortholog_id | integer | part of composite primary key & foreign-key pointer to 'id' column in gene table |
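The gene and gene_to_ortholog tables could be created and queried roughly as follows. The column types above look like MySQL; this sketch swaps in SQLite from the Python standard library so it runs without a database server, which means `mediumtext` becomes `TEXT`. The helper `orthologs_of` and the sample rows (`geneA`, `taxon-1`, and so on) are made up for illustration.

```python
import sqlite3

# SQLite translation of the two tables; an assumption, not the production DDL.
SCHEMA = """
CREATE TABLE gene (
    id            INTEGER PRIMARY KEY AUTOINCREMENT,
    flybase_id    VARCHAR(20),
    entrezid      INTEGER,
    gene_symbol   VARCHAR(20),
    refseq_id     VARCHAR(20),
    cg_id         VARCHAR(20),
    species       VARCHAR(40),
    sequence_text TEXT              -- mediumtext in the schema above
);
CREATE TABLE gene_to_ortholog (
    gene_id     INTEGER NOT NULL REFERENCES gene(id),
    ortholog_id INTEGER NOT NULL REFERENCES gene(id),
    PRIMARY KEY (gene_id, ortholog_id)  -- composite primary key
);
"""


def orthologs_of(conn, gene_id):
    """Return (id, gene_symbol, species) for every ortholog mapped to gene_id."""
    return conn.execute(
        """
        SELECT g.id, g.gene_symbol, g.species
        FROM gene_to_ortholog AS m
        JOIN gene AS g ON g.id = m.ortholog_id
        WHERE m.gene_id = ?
        ORDER BY g.id
        """,
        (gene_id,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript(SCHEMA)

# Placeholder rows; symbols and species values are invented for the example.
a = conn.execute(
    "INSERT INTO gene (gene_symbol, species) VALUES (?, ?)", ("geneA", "taxon-1")
).lastrowid
b = conn.execute(
    "INSERT INTO gene (gene_symbol, species) VALUES (?, ?)", ("geneB", "taxon-2")
).lastrowid
conn.execute("INSERT INTO gene_to_ortholog VALUES (?, ?)", (a, b))

print(orthologs_of(conn, a))
```

Because the mapping is stored in one direction only, a symmetric lookup would need either a second row (b, a) or a query that checks both columns; the page does not say which convention is used.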

Table: region - stores downstream, upstream, exon, and intron region information for a particular gene

| Column | Column Type | Comment |
|---|---|---|
| id | integer | primary key; auto-generated |
| gene_id | integer | foreign-key pointer to 'id' column in gene table; it links each gene region to its respective gene identifier |
| region_type | enum('exon','intron','upstream','downstream') | stipulates the portion of the gene being referenced |
| strand_direction | tinyint | direction of the DNA strand |
| chromosome | varchar(3) | chromosome identifier |
| masked_out | char(1) | Y/N; whether the region is completely masked out or not with N's |
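The masked_out flag can be derived from a region's sequence text. The helper below is a minimal sketch: the name `masked_out_flag` is hypothetical, and treating lowercase `n` as masked and rejecting empty sequences are assumptions that the page does not state.

```python
def masked_out_flag(sequence):
    """Return 'Y' if every base in the sequence is N, otherwise 'N'.

    Assumptions: case is ignored, and an empty sequence is an error
    rather than a masked region.
    """
    if not sequence:
        raise ValueError("sequence must not be empty")
    return "Y" if set(sequence.upper()) == {"N"} else "N"


print(masked_out_flag("NNNNNNNN"))  # Y
print(masked_out_flag("NNNNACGT"))  # N
```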

Table: region_sequence - stores specific sequence information for each particular gene region type

| Column | Column Type | Comment |
|---|---|---|
| id | integer | primary key; auto-generated |
| region_id | integer | foreign-key pointer to 'id' column in 'region' table; an upstream or downstream region, for example, can be referenced by 1k, 5k, 10k, or 30k sections. |
| region_start | integer | start site of region sequence |
| region_end | integer | end site of region sequence |
| sequence_length | integer | number of nucleotide bases in the region |
| sequence_text | mediumtext | full length sequence of region, including N's |
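To show how region and region_sequence fit together, here is a sketch that stores several sequence sections for one region and reads them back with a join. As an assumption it uses SQLite (Python standard library) instead of the MySQL-style types above, so the `enum` becomes a `CHECK` constraint and `mediumtext` becomes `TEXT`. The helpers `add_region_sequence` and `sequences_for`, the toy sequences, and the sample coordinates are invented for the example, and `sequence_length` is taken to be the length of `sequence_text`.

```python
import sqlite3

SCHEMA = """
CREATE TABLE region (
    id               INTEGER PRIMARY KEY AUTOINCREMENT,
    gene_id          INTEGER NOT NULL,  -- points at gene.id; gene table omitted here
    region_type      TEXT NOT NULL
        CHECK (region_type IN ('exon', 'intron', 'upstream', 'downstream')),
    strand_direction INTEGER,
    chromosome       VARCHAR(3),
    masked_out       CHAR(1) CHECK (masked_out IN ('Y', 'N'))
);
CREATE TABLE region_sequence (
    id              INTEGER PRIMARY KEY AUTOINCREMENT,
    region_id       INTEGER NOT NULL REFERENCES region(id),
    region_start    INTEGER,
    region_end      INTEGER,
    sequence_length INTEGER,
    sequence_text   TEXT
);
"""


def add_region_sequence(conn, region_id, start, end, sequence):
    """Insert one section; sequence_length is derived from the text (N's included)."""
    cur = conn.execute(
        "INSERT INTO region_sequence "
        "(region_id, region_start, region_end, sequence_length, sequence_text) "
        "VALUES (?, ?, ?, ?, ?)",
        (region_id, start, end, len(sequence), sequence),
    )
    return cur.lastrowid


def sequences_for(conn, gene_id, region_type):
    """Return (start, end, length) of every stored section, shortest first."""
    return conn.execute(
        """
        SELECT s.region_start, s.region_end, s.sequence_length
        FROM region AS r
        JOIN region_sequence AS s ON s.region_id = r.id
        WHERE r.gene_id = ? AND r.region_type = ?
        ORDER BY s.sequence_length
        """,
        (gene_id, region_type),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(SCHEMA)

# One upstream region for a hypothetical gene with id 1.
region_id = conn.execute(
    "INSERT INTO region (gene_id, region_type, strand_direction, chromosome, masked_out) "
    "VALUES (?, ?, ?, ?, ?)",
    (1, "upstream", 1, "2L", "N"),
).lastrowid

# Toy sections standing in for the real kilobase-sized chunks.
add_region_sequence(conn, region_id, 97, 100, "ACGT")
add_region_sequence(conn, region_id, 93, 100, "NNNNACGT")

print(sequences_for(conn, 1, "upstream"))
```

Keeping the sections in their own table lets one region row carry several differently sized sequences without repeating the strand and chromosome information.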