SAEC Executive summary of data preparation

Data reformatting

The original Illumina data arrived in four comma-separated files (listed below), which were divided up by subject and stored in separate files. Four subject duplicates in the Illumina records were removed based on their call rate.

PGx40001_12278-DNA.csv
PGx40001_GSK_SJS_B137_28Aug2007_Genotype_Report_12276-DNA.csv
PGx40001_GSK_SJS_B137_28Aug2007_Genotype_Report_12277-DNA.csv
PGx40001_GSK_SJS_B137_28Aug2007_Genotype_Report_12914-DNA.csv

WG0012277-DNA_A10_2948_A10 and WG0012277-DNA_F04_2948_F04.
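
As a rough illustration of the reformatting step, the sketch below groups genotype-report rows by sample and, where several samples belong to the same subject, keeps the one with the higher call rate. The column names (`sample_id`, `snp`, `genotype`), the `--` no-call code, the sample-to-subject mapping and all function names are assumptions made for this example; the actual Illumina report layout and the procedure used for this data set may differ.

```python
import csv
import io
from collections import defaultdict

NO_CALL = "--"  # assumed no-call code; the real report may use another marker


def split_by_sample(rows):
    """Group report rows by their (assumed) sample_id column."""
    by_sample = defaultdict(list)
    for row in rows:
        by_sample[row["sample_id"]].append(row)
    return by_sample


def call_rate(rows):
    """Fraction of rows for one sample that carry a called genotype."""
    called = sum(1 for r in rows if r["genotype"] != NO_CALL)
    return called / len(rows)


def keep_best_sample(by_sample, sample_to_subject):
    """For each subject, keep the sample with the highest call rate."""
    best = {}
    for sample, rows in by_sample.items():
        subject = sample_to_subject[sample]
        rate = call_rate(rows)
        if subject not in best or rate > best[subject][1]:
            best[subject] = (sample, rate)
    return {subject: sample for subject, (sample, _rate) in best.items()}


# Toy data: subject "subj1" was genotyped twice (S1_A and S1_B).
toy_report = """sample_id,snp,genotype
S1_A,rs1,AA
S1_A,rs2,--
S1_B,rs1,AA
S1_B,rs2,AG
S2_A,rs1,GG
S2_A,rs2,GG
"""
toy_mapping = {"S1_A": "subj1", "S1_B": "subj1", "S2_A": "subj2"}

toy_rows = list(csv.DictReader(io.StringIO(toy_report)))
toy_by_sample = split_by_sample(toy_rows)
toy_kept = keep_best_sample(toy_by_sample, toy_mapping)
```

In this toy input, `S1_A` has one no-call out of two SNPs, so `S1_B` is the record retained for `subj1`.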

PLINK input data

info on reformatting

removing SNPs without founder genotype

link to file
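
To make the idea of "SNPs without founder genotype" concrete, here is a minimal sketch that scans PED-style lines and reports SNPs for which no founder has a non-missing genotype. It assumes the standard PED layout (family ID, individual ID, paternal ID, maternal ID, sex, phenotype, then two alleles per SNP), treats individuals with both parent IDs equal to `0` as founders, and treats allele `0` as missing. The function name and toy data are hypothetical, and this is not necessarily how the filtering was actually carried out here.

```python
def snps_without_founder_genotype(ped_lines, snp_names):
    """Return the SNPs for which no founder has a complete genotype."""
    n = len(snp_names)
    seen_in_founder = [False] * n
    for line in ped_lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        paternal_id, maternal_id = fields[2], fields[3]
        if paternal_id != "0" or maternal_id != "0":
            continue  # at least one parent is listed, so not a founder
        alleles = fields[6:]  # two allele columns per SNP
        for i in range(n):
            a1, a2 = alleles[2 * i], alleles[2 * i + 1]
            if a1 != "0" and a2 != "0":
                seen_in_founder[i] = True
    return [name for name, ok in zip(snp_names, seen_in_founder) if not ok]


# Toy pedigree: I1 and I2 are founders, I3 is their child.
toy_ped = [
    "F1 I1 0 0 1 1 A A 0 0 0 0",
    "F1 I2 0 0 2 1 A G 0 0 C C",
    "F1 I3 I1 I2 1 1 A A T T C C",
]
toy_snps = ["rs1", "rs2", "rs3"]

toy_flagged = snps_without_founder_genotype(toy_ped, toy_snps)
```

Here `rs2` is genotyped only in the child `I3`, so it is the one SNP flagged for removal.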

Following the removal of these obvious "outliers", a number of analyses were performed (see results page) that identified SNPs and subjects with inconsistencies. Those are listed below:

Individuals removed

Removed because of concordance and ethnic inconsistencies (link to results).

SNPs removed

SNPs were removed based on more than one criterion. Which ones? Where are the results?
