coEDGI (co-expression Enhancer Discovery of Genomic Informatics)
A gene usually contains introns (non-coding regions) and exons (coding regions) within its nucleotide sequence. A transcription
start site (TSS) and a transcription end site (TES) mark the first and last bases, respectively, of the transcribed sequence.
For the upstream and downstream regions of each gene, we want to capture data on chunks of specific sizes: 1k, 5k, 10k, and
35k base pairs.
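The upstream and downstream chunks can be derived from a gene's TSS and TES coordinates. The sketch below is a minimal illustration, not coEDGI's own code. The function name `flank_windows`, the 1-based inclusive coordinate convention, and the +1/-1 strand encoding are all assumptions. Upstream chunks end one base before the TSS, and downstream chunks begin one base after the TES, both measured in the gene's direction of transcription.

```python
# Hypothetical sketch: flanking-chunk coordinates around a gene.
# Assumptions (not from the coEDGI schema): 1-based inclusive coordinates,
# strand encoded as +1 (forward) or -1 (reverse), no clipping at chromosome ends.
CHUNK_SIZES = (1_000, 5_000, 10_000, 35_000)


def flank_windows(tss, tes, strand, sizes=CHUNK_SIZES):
    """Map (region_type, size) -> (start, end) for every chunk size."""
    windows = {}
    for size in sizes:
        if strand == 1:
            # Forward strand: upstream lies at lower coordinates than the TSS.
            upstream = (tss - size, tss - 1)
            downstream = (tes + 1, tes + size)
        elif strand == -1:
            # Reverse strand: the TSS has the larger coordinate, so
            # "upstream" lies at higher coordinates.
            upstream = (tss + 1, tss + size)
            downstream = (tes - size, tes - 1)
        else:
            raise ValueError("strand must be +1 or -1")
        windows[("upstream", size)] = upstream
        windows[("downstream", size)] = downstream
    return windows


if __name__ == "__main__":
    # Print every window with its length (end - start + 1).
    for key, (start, end) in sorted(flank_windows(20_000, 25_000, 1).items()):
        print(key, start, end, end - start + 1)
```

Each window spans exactly `size` bases. A real pipeline would also need to clip windows that run past either end of the chromosome.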
Table: gene - stores information about each gene
Column | Column Type | Description
id | integer | primary key; auto-generated
flybase_id | varchar(20) | FlyBase accession ID
entrezid | integer | NCBI Entrez Gene identifier
gene_symbol | varchar(20) | HGNC gene symbol
refseq_id | varchar(20) | RefSeq database identifier
cg_id | varchar(20) | FlyBase annotation (CG) identifier
species | varchar(40) | NCBI Taxonomy identifier
sequence_text | mediumtext | full-length gene sequence
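The column types above (mediumtext, auto-generated integer keys) look like MySQL types. As a self-contained, runnable illustration, the sketch below recreates the gene table in SQLite through Python's standard `sqlite3` module. This is an assumed translation, not the project's actual DDL. SQLite accepts MySQL-style type names such as `VARCHAR(20)` and `MEDIUMTEXT` (giving both TEXT affinity), and an `INTEGER PRIMARY KEY` column is filled in automatically, mirroring the auto-generated `id`. The row values are placeholders.

```python
import sqlite3

# Assumed SQLite translation of the gene table (the original types look like MySQL).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE gene (
        id            INTEGER PRIMARY KEY,  -- auto-generated when omitted on insert
        flybase_id    VARCHAR(20),
        entrezid      INTEGER,
        gene_symbol   VARCHAR(20),
        refseq_id     VARCHAR(20),
        cg_id         VARCHAR(20),
        species       VARCHAR(40),
        sequence_text MEDIUMTEXT            -- SQLite gives this TEXT affinity
    )
""")

# Placeholder values only; they do not identify a real gene.
cur = conn.execute(
    "INSERT INTO gene (flybase_id, gene_symbol, species, sequence_text) "
    "VALUES (?, ?, ?, ?)",
    ("FBgn_PLACEHOLDER", "geneA", "taxon_placeholder", "ATGCCGTAA"),
)
new_id = cur.lastrowid  # the auto-generated primary key

symbol, length = conn.execute(
    "SELECT gene_symbol, length(sequence_text) FROM gene WHERE id = ?", (new_id,)
).fetchone()
print(new_id, symbol, length)  # 1 geneA 9
```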
Table: gene_to_ortholog - stores gene-to-ortholog mapping information
Column | Column Type | Description
gene_id | integer | part of the composite primary key; foreign key to the 'id' column in the gene table
ortholog_id | integer | part of the composite primary key; foreign key to the 'id' column in the gene table
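Because both columns point back at the gene table, a gene's orthologs can be found with a self-join. The sketch below again uses SQLite via Python's `sqlite3` as a stand-in for the real database. The trimmed-down gene table, the helper `orthologs_of`, and the choice to store each pair in both directions are illustrative assumptions. The composite primary key rejects duplicate pairs, and with `PRAGMA foreign_keys = ON` SQLite also rejects IDs that do not exist in the gene table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE gene (
        id          INTEGER PRIMARY KEY,
        gene_symbol VARCHAR(20)
    );
    CREATE TABLE gene_to_ortholog (
        gene_id     INTEGER NOT NULL REFERENCES gene(id),
        ortholog_id INTEGER NOT NULL REFERENCES gene(id),
        PRIMARY KEY (gene_id, ortholog_id)   -- composite key: each pair stored once
    );
""")

# Placeholder genes; ids 1..3 are assigned automatically.
conn.executemany("INSERT INTO gene (gene_symbol) VALUES (?)",
                 [("geneA",), ("geneA_ortholog1",), ("geneA_ortholog2",)])
# Assumption: each ortholog pair is stored in both directions.
conn.executemany("INSERT INTO gene_to_ortholog VALUES (?, ?)",
                 [(1, 2), (2, 1), (1, 3), (3, 1)])


def orthologs_of(conn, gene_id):
    """Return the symbols of all genes mapped as orthologs of gene_id."""
    rows = conn.execute(
        """SELECT g.gene_symbol
           FROM gene_to_ortholog AS o
           JOIN gene AS g ON g.id = o.ortholog_id
           WHERE o.gene_id = ?
           ORDER BY g.gene_symbol""",
        (gene_id,),
    )
    return [symbol for (symbol,) in rows]


print(orthologs_of(conn, 1))  # ['geneA_ortholog1', 'geneA_ortholog2']
```

Storing both directions keeps the lookup to a single `WHERE o.gene_id = ?` filter; a schema that stores each pair once would need to query both columns instead.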
Table: region - stores downstream, upstream, exon, and intron region information for a particular gene
Column | Column Type | Description
id | integer | primary key; auto-generated
gene_id | integer | foreign key to the 'id' column in the gene table; links each region to its gene
region_type | enum('exon','intron','upstream','downstream') | the portion of the gene that the region represents
strand_direction | tinyint | direction of the DNA strand
chromosome | varchar(3) | chromosome identifier
masked_out | char(1) | Y/N flag; whether the region is completely masked out (replaced with N's)
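SQLite has no ENUM type, so a sketch of the region table can emulate `region_type` with a CHECK constraint and likewise restrict `masked_out` to 'Y' or 'N'. This is an assumed SQLite translation for illustration, run through Python's `sqlite3`. The +1/-1 strand encoding and the example row values are hypothetical, since the schema does not say how `strand_direction` is encoded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE gene (id INTEGER PRIMARY KEY, gene_symbol VARCHAR(20));
    CREATE TABLE region (
        id               INTEGER PRIMARY KEY,
        gene_id          INTEGER NOT NULL REFERENCES gene(id),
        -- CHECK constraint stands in for MySQL's enum(...)
        region_type      TEXT NOT NULL
            CHECK (region_type IN ('exon', 'intron', 'upstream', 'downstream')),
        strand_direction TINYINT CHECK (strand_direction IN (1, -1)),  -- assumed encoding
        chromosome       VARCHAR(3),
        masked_out       CHAR(1) CHECK (masked_out IN ('Y', 'N'))
    );
""")

conn.execute("INSERT INTO gene (gene_symbol) VALUES ('geneA')")  # placeholder gene, id 1
conn.executemany(
    "INSERT INTO region (gene_id, region_type, strand_direction, chromosome, masked_out) "
    "VALUES (?, ?, ?, ?, ?)",
    [(1, "upstream", 1, "2L", "N"),
     (1, "exon", 1, "2L", "N"),
     (1, "downstream", 1, "2L", "Y")],
)

# Flanking regions of gene 1 that are not fully masked out.
flanks = conn.execute(
    """SELECT region_type FROM region
       WHERE gene_id = ? AND region_type IN ('upstream', 'downstream')
         AND masked_out = 'N'""",
    (1,),
).fetchall()
print(flanks)  # [('upstream',)]

try:
    conn.execute("INSERT INTO region (gene_id, region_type) VALUES (1, 'promoter')")
except sqlite3.IntegrityError as err:
    print("rejected:", err)  # the CHECK constraint on region_type fails
```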
Table: region_sequence - stores the sequence of each chunk of a gene region
Column | Column Type | Description
id | integer | primary key; auto-generated
region_id | integer | foreign key to the 'id' column in the region table; an upstream or downstream region, for example, can be referenced by its 1k, 5k, 10k, or 30k sections
region_start | integer | start coordinate of the region sequence
region_end | integer | end coordinate of the region sequence
sequence_length | integer | number of nucleotide bases in the region sequence
sequence_text | mediumtext | full-length sequence of the region, including N's
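Since `sequence_text` keeps the N's, `sequence_length` can be derived from the coordinates and checked against the stored text. The sketch below assumes 1-based inclusive coordinates, so a 1k chunk spans `region_end - region_start + 1 == 1000` bases. The helper names `region_sequence_row` and `masked_out_flag` are hypothetical, and the SQLite table is an assumed translation of the schema above. `masked_out_flag` shows one way the region table's Y/N flag could be computed from a sequence.

```python
import sqlite3


def region_sequence_row(region_id, region_start, region_end, sequence_text):
    """Build a region_sequence row, deriving sequence_length from the coordinates.

    Assumes 1-based inclusive coordinates; N's count toward the length.
    """
    length = region_end - region_start + 1
    if length != len(sequence_text):
        raise ValueError(
            f"coordinates imply {length} bases but sequence has {len(sequence_text)}"
        )
    return (region_id, region_start, region_end, length, sequence_text)


def masked_out_flag(sequence_text):
    """Return 'Y' if every base is N (fully masked), otherwise 'N'."""
    return "Y" if sequence_text and set(sequence_text.upper()) == {"N"} else "N"


conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE region_sequence (
        id              INTEGER PRIMARY KEY,   -- auto-generated
        region_id       INTEGER NOT NULL,      -- points at region.id
        region_start    INTEGER NOT NULL,
        region_end      INTEGER NOT NULL,
        sequence_length INTEGER NOT NULL,
        sequence_text   MEDIUMTEXT
    )
""")

# Placeholder 10-base chunk containing some masked (N) positions.
row = region_sequence_row(1, 101, 110, "ACGTNNACGT")
conn.execute(
    "INSERT INTO region_sequence "
    "(region_id, region_start, region_end, sequence_length, sequence_text) "
    "VALUES (?, ?, ?, ?, ?)",
    row,
)
stored_length = conn.execute(
    "SELECT sequence_length FROM region_sequence WHERE region_id = 1"
).fetchone()[0]
print(stored_length, masked_out_flag(row[4]))  # 10 N
```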