Intragen Data
From Informatics
Format of Data Received from Genotyping Facility
Individual genotyping data is released in batches of up to 96 individual samples. Each batch file is labeled NYHP_PlateXXX_YY, where XXX is the plate number and YY (01-96) is the number of individual samples on the plate.
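As an illustration of the naming convention, the sketch below splits a batch label into its plate number and sample count. It assumes that XXX is always three digits, that YY is always two digits, and that any file extension has already been stripped. The names `parse_batch_name` and `BatchName` are hypothetical and not part of any existing Intragen code.

```python
import re
from typing import NamedTuple

# Hypothetical pattern for the NYHP_PlateXXX_YY convention, assuming a
# three-digit plate number and a two-digit sample count (01-96).
BATCH_NAME = re.compile(r"NYHP_Plate(\d{3})_(\d{2})")


class BatchName(NamedTuple):
    plate: int    # XXX: plate number
    samples: int  # YY: number of individual samples on the plate


def parse_batch_name(name: str) -> BatchName:
    """Split a batch file label (without extension) into plate and sample count."""
    match = BATCH_NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"not a batch file label: {name!r}")
    plate, samples = int(match.group(1)), int(match.group(2))
    if not 1 <= samples <= 96:
        raise ValueError(f"sample count outside 01-96: {name!r}")
    return BatchName(plate, samples)


print(parse_batch_name("NYHP_Plate007_96"))  # BatchName(plate=7, samples=96)
```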
Genotypes
SNP data is released in the following format:
Column 1 - SNP marker name (rs#)
Column 2 - Subject ID
Column 3 - Allele A
Column 4 - Allele B
Column 5 - Quality score
Column 6 - Chromosome (where the SNP marker is located)
Column 7 - Position (base number within the chromosome where the SNP is located)
The (Chromosome, Position) pair uniquely identifies the location of a SNP marker. This example contains the first few lines of an actual genotypes file as delivered by the genotyping facility. Each line gives one subject's genotype at a particular SNP marker.
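Because each line holds one subject's genotype at one marker, a reader of these files typically parses each line into a record and groups records by the (Chromosome, Position) key. The sketch below assumes whitespace-separated fields in the seven-column order listed above; the delimiter actually used by the facility is not stated here. The function names and the sample lines (placeholder rs numbers and subject IDs) are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Genotype(NamedTuple):
    snp: str          # SNP marker name (rs#)
    subject_id: str
    allele_a: str
    allele_b: str
    quality: float
    chromosome: str   # kept as text so non-numeric chromosomes (e.g. "X") parse
    position: int


def parse_genotype_line(line: str) -> Genotype:
    # Assumes whitespace-separated fields in the seven-column order above.
    snp, subject, a, b, quality, chrom, pos = line.split()
    return Genotype(snp, subject, a, b, float(quality), chrom, int(pos))


def group_by_locus(lines: Iterable[str]) -> Dict[Tuple[str, int], List[Genotype]]:
    """Collect every subject's genotype under its (chromosome, position) key."""
    loci: Dict[Tuple[str, int], List[Genotype]] = defaultdict(list)
    for line in lines:
        if line.strip():  # skip blank lines
            g = parse_genotype_line(line)
            loci[(g.chromosome, g.position)].append(g)
    return dict(loci)


# Hypothetical placeholder lines, not real facility data.
sample = [
    "rs0000001\tSUBJ01\tA\tG\t0.90\t1\t1000",
    "rs0000001\tSUBJ02\tA\tA\t0.85\t1\t1000",
]
loci = group_by_locus(sample)
print(len(loci[("1", 1000)]))  # 2
```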
Phenotypes
Phenotype files list the following subject attributes:
- Subject ID
- Plate number
- Sex
- Year of birth
- Ethnicity code
This file contains the actual phenotypic information for the first 224 individuals genotyped. Subject ethnicity is given as a code based on self-report, integrating the information each subject provides about themselves, their two parents, and their four grandparents. The ethnicity code is a concatenation of the ethnicities reported for all seven of these individuals, without regard to how often each ethnicity occurs in the family.
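One reading of "without regard to the frequency of occurrence" is that each reported ethnicity contributes to the code once, however many of the seven relatives report it. The sketch below encodes that reading. The one-letter codes in `ETHNICITY_LETTERS`, the function name `ethnicity_code`, and the assumption that codes are concatenated in order of first appearance (subject, parents, grandparents) are all hypothetical; the real code table is the one supplied with the phenotype data.

```python
from typing import Dict, List, Sequence

# Hypothetical one-letter codes; the actual code table is supplied with the
# phenotype data and is not reproduced here.
ETHNICITY_LETTERS: Dict[str, str] = {
    "African": "A",
    "Asian": "S",
    "European": "E",
    "Hispanic": "H",
}


def ethnicity_code(reports: Sequence[str]) -> str:
    """Concatenate the distinct ethnicities reported for seven individuals.

    Assumes the order subject, two parents, four grandparents, and that a
    repeated ethnicity adds nothing further to the code.
    """
    if len(reports) != 7:
        raise ValueError("expected reports for the subject, 2 parents and 4 grandparents")
    letters: List[str] = []
    for report in reports:
        letter = ETHNICITY_LETTERS[report]
        if letter not in letters:  # frequency within the family is ignored
            letters.append(letter)
    return "".join(letters)


print(ethnicity_code(["European", "European", "Asian",
                      "European", "European", "Asian", "Asian"]))  # ES
```

Under this assumption, a family reporting European six times and Asian once would receive the same code as one reporting each of them several times.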
Format of Data Exports
After authentication, registered users will be presented with a query interface that lets them select subjects for data export. In the first phase of implementation, the interface will support selecting either individual samples or all samples from the same plate. An export package will comprise three files, combined into a zip archive:
- genotypes: tab-delimited file with columns SNP Name, Sample ID, Allele1, Allele2, GC Score
- SNP map: tab-delimited file with columns SNP Name, Chromosome, Position. Markers should be listed in order of increasing chromosome number and, within each chromosome, in order of increasing position.
- phenotypes: Excel file with columns Sample ID, Plate Code, Gender, Year of Birth, Ethnicity
Additional worksheets within the phenotypes file will provide the codes for gender and ethnicity (see this file).
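The export steps can be sketched with the standard library alone: select samples individually or by plate, write the genotypes and SNP map as tab-delimited text, order the map by chromosome and then position, and bundle the result into a zip archive. This is a minimal sketch under several assumptions: records are already in memory, numbered chromosomes sort numerically with non-numeric ones (X, Y, MT) placed after them, and the archive member names `genotypes.txt` and `snp_map.txt` are hypothetical. The Excel phenotypes file is omitted because writing it requires a third-party library.

```python
import csv
import io
import zipfile
from typing import Dict, Iterable, List, Optional, Set, Tuple

GenotypeRow = Tuple[str, str, str, str, float]  # SNP, sample, allele1, allele2, GC score
SnpMap = Dict[str, Tuple[str, int]]             # SNP name -> (chromosome, position)


def select_samples(plates: Dict[str, str], sample_ids: Iterable[str] = (),
                   plate: Optional[str] = None) -> Set[str]:
    """Pick individual samples, or every sample on one plate (sample -> plate map)."""
    if plate is not None:
        return {s for s, p in plates.items() if p == plate}
    return set(sample_ids)


def chromosome_key(chrom: str) -> Tuple[int, int, str]:
    # Assumption: numbered chromosomes first, in numeric order; others afterwards.
    return (0, int(chrom), "") if chrom.isdigit() else (1, 0, chrom)


def tsv(header: List[str], rows: Iterable[tuple]) -> str:
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


def build_export(genotypes: List[GenotypeRow], snp_map: SnpMap,
                 selected: Set[str]) -> bytes:
    rows = [g for g in genotypes if g[1] in selected]
    snps = {g[0] for g in rows}
    # Increasing chromosome, then increasing position within each chromosome.
    ordered = sorted(snps, key=lambda s: (chromosome_key(snp_map[s][0]), snp_map[s][1]))
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("genotypes.txt", tsv(
            ["SNP Name", "Sample ID", "Allele1", "Allele2", "GC Score"], rows))
        zf.writestr("snp_map.txt", tsv(
            ["SNP Name", "Chromosome", "Position"],
            [(s, *snp_map[s]) for s in ordered]))
        # The phenotypes Excel workbook is omitted (needs a third-party writer).
    return buf.getvalue()


# Hypothetical placeholder records, exporting every sample on plate "001".
plates = {"SUBJ01": "001", "SUBJ02": "001", "SUBJ03": "002"}
genotypes = [("rs0000002", "SUBJ01", "A", "G", 0.9),
             ("rs0000001", "SUBJ01", "C", "C", 0.8),
             ("rs0000001", "SUBJ03", "C", "T", 0.7)]
snp_map = {"rs0000001": ("X", 500), "rs0000002": ("2", 100)}
data = build_export(genotypes, snp_map, select_samples(plates, plate="001"))
```

Keeping selection separate from packaging mirrors the two selection modes described above: the same `build_export` call serves both individual and whole-plate exports.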