Difference between revisions of "SAEC exec sum"

Latest revision as of 15:45, 18 January 2008

SAEC Executive summary of data preparation

"[+]" denotes hidden additional information. Clicking on the "+" shows that information. "[*]" denotes available mouse-over information.

Data reformatting

The original Illumina data, which came in four comma-separated files [+], were divided up by subject and stored in separate files [*] (located on ~/SJS/Genotypes). Four subject duplicates in the Illumina records were removed based on their call rate [+].

PGx40001_12278-DNA.csv
PGx40001_GSK_SJS_B137_28Aug2007_Genotype_Report_12276-DNA.csv
PGx40001_GSK_SJS_B137_28Aug2007_Genotype_Report_12277-DNA.csv
PGx40001_GSK_SJS_B137_28Aug2007_Genotype_Report_12914-DNA.csv

WG0012277-DNA_A10_2948_A10 and WG0012277-DNA_F04_2948_F04.
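
The reformatting step above can be illustrated with a minimal sketch. It is not the script that was actually used: the column names (`Sample ID`, `Allele1`, `Allele2`), the no-call symbol `-`, and the rule "keep the duplicate with the highest call rate" are all assumptions made for illustration, since this summary does not describe the report layout or say which duplicate was retained.

```python
import csv
import io
from collections import defaultdict

# Hypothetical column names and no-call code; the real report layout
# is not described in this summary.
SAMPLE_COL = "Sample ID"
ALLELE_COLS = ("Allele1", "Allele2")
NO_CALL = "-"


def read_rows(text):
    """Parse comma-separated genotype text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def split_by_subject(rows):
    """Group genotype rows by sample ID (one group per per-subject file)."""
    by_subject = defaultdict(list)
    for row in rows:
        by_subject[row[SAMPLE_COL]].append(row)
    return dict(by_subject)


def call_rate(rows):
    """Fraction of rows in which both alleles were called."""
    if not rows:
        return 0.0
    called = sum(1 for r in rows if all(r[c] != NO_CALL for c in ALLELE_COLS))
    return called / len(rows)


def best_duplicate(by_subject, duplicate_ids):
    """Among sample IDs assumed to be the same person, pick the highest call rate."""
    return max(duplicate_ids, key=lambda sid: call_rate(by_subject[sid]))
```

Each group returned by `split_by_subject` would then be written to its own file, and for each set of duplicate sample IDs only the one chosen by `best_duplicate` would be kept.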

PLINK input data

info on reformatting

removing SNPs without founder genotype

link to file
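
As a rough illustration of what "without founder genotype" means, the sketch below scans PLINK-style PED lines and keeps only SNPs for which at least one founder has a called genotype. It assumes the standard PED layout (family ID, individual ID, paternal ID, maternal ID, sex, phenotype, then two allele columns per SNP), founders marked by paternal and maternal IDs of `0`, and `0` as the missing-allele code. The function names are hypothetical, and this page does not state how the filtering was actually carried out.

```python
MISSING = "0"  # assumed missing-allele code (PLINK's default in PED files)


def is_founder(ped_fields):
    """A founder has paternal and maternal IDs both '0' (PED columns 3 and 4)."""
    return ped_fields[2] == "0" and ped_fields[3] == "0"


def snps_with_founder_genotype(ped_lines, snp_names):
    """Return SNP names for which at least one founder has a called genotype."""
    seen = [False] * len(snp_names)
    for line in ped_lines:
        fields = line.split()
        if not is_founder(fields):
            continue
        alleles = fields[6:]  # two allele columns per SNP, in map order
        for i in range(len(snp_names)):
            a1, a2 = alleles[2 * i], alleles[2 * i + 1]
            if a1 != MISSING and a2 != MISSING:
                seen[i] = True
    return [name for name, ok in zip(snp_names, seen) if ok]
```

SNPs absent from the returned list are the ones that would be dropped; a genotype present only in non-founders does not rescue a SNP.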

Following the removal of these obvious "outliers", a number of analyses were performed (see results page) that identified SNPs and subjects with inconsistencies. Those are listed below:

Individuals removed

Removed because of concordance and ethnic inconsistencies (link to results).
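
For readers unfamiliar with the term, genotype concordance between two samples is commonly computed as the fraction of SNPs, called in both, that carry the same unordered genotype. The sketch below shows that calculation only. The data layout (a dict from SNP name to a two-letter genotype string, with `None` for a no-call) is an assumption, and the samples compared and the threshold applied in this project are not stated on this page.

```python
def concordance(genotypes_a, genotypes_b):
    """Fraction of SNPs called in both samples that have identical unordered genotypes.

    Returns None when the two samples share no called SNP.
    """
    shared = [
        snp for snp in genotypes_a
        if snp in genotypes_b
        and genotypes_a[snp] is not None
        and genotypes_b[snp] is not None
    ]
    if not shared:
        return None
    # Sorting the two alleles makes "AG" and "GA" compare equal.
    same = sum(
        1 for snp in shared
        if sorted(genotypes_a[snp]) == sorted(genotypes_b[snp])
    )
    return same / len(shared)
```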

SNPs removed

SNPs were removed based on more than one criterion. Which ones? Where are the results?