Intragen Data


Format of Data Received from Genotyping Facility

Individual genotyping data is released in batches, each containing up to 96 individual samples. Each batch file is labeled NYHP_PlateXXX_YY, where XXX is the plate number and YY (01-96) is the number of individual samples on the plate.
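The naming convention above can be checked mechanically when batches arrive. The following is a minimal sketch, assuming XXX is always three digits and YY always two digits; the function name parse_batch_name is hypothetical and not part of the Intragen code base.

```python
import re

# Assumption: plate number is exactly three digits, sample count exactly two.
BATCH_NAME = re.compile(r"^NYHP_Plate(\d{3})_(\d{2})$")


def parse_batch_name(name):
    """Return (plate_number, sample_count) for a batch label, or raise ValueError."""
    match = BATCH_NAME.match(name)
    if match is None:
        raise ValueError(f"not a batch label: {name!r}")
    plate = int(match.group(1))
    count = int(match.group(2))
    # The page states that YY ranges from 01 to 96.
    if not 1 <= count <= 96:
        raise ValueError(f"sample count out of range 01-96: {count}")
    return plate, count
```

If the facility pads plate numbers differently, only the regular expression would need to change.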

Genotypes

SNP data is released in the following format:

 Column 1 - SNP marker name (rs#)
 Column 2 - SubjectID
 Column 3 - Allele A
 Column 4 - Allele B
 Column 5 - Quality Score
 Column 6 - Chromosome (where the SNP marker is located)
 Column 7 - Position (base number within the chromosome where the SNP is located)

The (Chromosome, Position) pair uniquely specifies the position of a SNP marker. This example contains the first few lines from an actual genotypes file as delivered to us by the genotyping facility. Each line gives the genotype for a particular SNP marker.
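A reader for this layout might look like the sketch below. It assumes the file is tab-delimited with no header row and that the columns appear in the order listed above; neither detail is stated on this page, so both should be checked against an actual delivery. The names Genotype and read_genotypes are illustrative only.

```python
import csv
from typing import NamedTuple


class Genotype(NamedTuple):
    snp: str          # SNP marker name (rs#)
    subject_id: str
    allele_a: str
    allele_b: str
    quality: float    # quality score
    chromosome: str   # kept as text so labels such as "X" survive
    position: int     # base number within the chromosome


def read_genotypes(handle):
    """Yield one Genotype per non-empty line of an open text file."""
    # Assumption: tab-delimited, seven columns, no header row.
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue
        snp, subject, a, b, quality, chrom, pos = row
        yield Genotype(snp, subject, a, b, float(quality), chrom, int(pos))
```

Because (Chromosome, Position) identifies a marker uniquely, the pair (record.chromosome, record.position) can serve as a dictionary key when grouping genotypes by marker.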

Phenotypes

Phenotype files list the following Subject attributes:

 Subject ID
 Plate number
 Sex
 Year of birth
 Ethnicity Code

This file contains the actual phenotypic information for the first 224 individuals genotyped. Each subject's ethnicity is given as a code based on self-report: it integrates what the subject reported about themselves, their two parents, and their four grandparents. The code concatenates the ethnicities reported for these seven individuals, without regard to how often each ethnicity occurs in the family.

Format of Data Exports

After authentication, registered users will be presented with a query interface that lets them select subjects for data export. In the first phase of implementation, the interface will support selecting either individual samples or all samples from the same plate. An export package will comprise three files, combined into a zip archive:

  • genotypes: tab-delimited file with columns
      SNP Name
      Sample ID
      Allele1
      Allele2
      GC Score
  • SNP map: tab-delimited file with columns (markers should be listed in order of increasing chromosome # and, within each chromosome, in order of increasing position #)
      SNP Name
      Chromosome
      Position
  • phenotypes: Excel file with columns
      Sample ID
      Plate Code
      Gender
      Year of Birth
      Ethnicity

Within the phenotypes file, additional worksheets will provide the codes for gender and ethnicity (see this file).
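The two tab-delimited parts of the export package could be assembled roughly as follows. This is a sketch under several assumptions that the specification above does not settle: the member file names (genotypes.txt, snp_map.txt) are made up, a header row is written to each file, and non-numeric chromosomes such as X are sorted after the numbered ones. The Excel phenotypes file is left out because it would need a third-party spreadsheet library.

```python
import io
import zipfile


def chromosome_key(chrom):
    # Numbered chromosomes first, in numeric order; any others (e.g. "X")
    # afterwards in alphabetical order. The latter ordering is an assumption.
    return (0, int(chrom), "") if chrom.isdigit() else (1, 0, chrom)


def build_export(records):
    """Return zip bytes for records of the form
    (snp, sample_id, allele1, allele2, gc_score, chromosome, position)."""
    geno_lines = ["\t".join(["SNP Name", "Sample ID", "Allele1", "Allele2", "GC Score"])]
    markers = {}  # SNP name -> (chromosome, position), one entry per marker
    for snp, sample, a1, a2, score, chrom, pos in records:
        geno_lines.append("\t".join([snp, sample, a1, a2, str(score)]))
        markers[snp] = (str(chrom), int(pos))

    # SNP map: increasing chromosome, then increasing position.
    ordered = sorted(markers.items(),
                     key=lambda item: (chromosome_key(item[1][0]), item[1][1]))
    map_lines = ["\t".join(["SNP Name", "Chromosome", "Position"])]
    map_lines += [f"{snp}\t{chrom}\t{pos}" for snp, (chrom, pos) in ordered]

    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("genotypes.txt", "\n".join(geno_lines) + "\n")
        archive.writestr("snp_map.txt", "\n".join(map_lines) + "\n")
    return buffer.getvalue()
```

Building the archive in memory keeps the function free of temporary files, which suits a web interface that streams the package back to the user.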
