SAEC protocol
Protocol for managing and analyzing the SAEC data
Data reformatting for PLINK
1. from "PGX40001_GSK_SJS_B137_29Aug2007_DNAReport.xls" take columns B and C (DNA name, Subject ID) and store them in "mapping_info.txt".
2. take the four csv files provided by GSK and extract the data per individual with extractPatients.pl
perl extractPatients.pl -outfile test.out -outdir ..\data -infile "..\SJS Delivery from GSK\PGX40001_Illumina1M\Extracted Genotypes\PGx40001_12278-DNA.csv"
perl extractPatients.pl -outfile test.out -outdir ..\data -infile "..\SJS Delivery from GSK\PGX40001_Illumina1M\Extracted Genotypes\PGx40001_GSK_SJS_B137_28Aug2007_Genotype_Report_12914-DNA.csv"
perl extractPatients.pl -outfile test.out -outdir ..\data -infile "..\SJS Delivery from GSK\PGX40001_Illumina1M\Extracted Genotypes\PGx40001_GSK_SJS_B137_28Aug2007_Genotype_Report_12277-DNA.csv"
perl extractPatients.pl -outfile test.out -outdir ..\data -infile "..\SJS Delivery from GSK\PGX40001_Illumina1M\Extracted Genotypes\PGx40001_GSK_SJS_B137_28Aug2007_Genotype_Report_12276-DNA.csv"
3. sanity check: count all non-comment lines for a sample individual
$ gawk '!/^#/{print $1}' ../data/42.txt | sort -u | wc
1069083 1069083 10938646
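The same check can be repeated for every individual instead of a single sample. A minimal sketch, assuming the per-individual files all sit in one directory and end in .txt (the function names count_markers and check_all_individuals are made up for illustration):

```shell
# Count distinct values of the first field on non-comment lines of one file.
count_markers() {
    awk '!/^#/ { print $1 }' "$1" | sort -u | wc -l | tr -d ' '
}

# Print "<file><TAB><count>" for every per-individual file in a directory.
# The *.txt pattern is an assumption about how the files are named.
check_all_individuals() {
    for f in "$1"/*.txt; do
        [ -e "$f" ] || continue
        printf '%s\t%s\n' "$f" "$(count_markers "$f")"
    done
}
```

Running check_all_individuals ../data should then list one count per individual; any file whose count differs from the others probably deserves a closer look.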
4. copy columns (SubjectID, SEX, SBTY) from C:\SAEC\SJS Delivery from GSK\PGX40001_Clinical\Page1_4_5_7a_8a_9a_10_11_13a.txt to file phenotype.txt
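If the clinical file is tab-delimited with a header row (an assumption, as is the exact spelling of the column names), the three columns can be pulled out by name rather than by hand. A possible sketch, with select_columns being a hypothetical helper:

```shell
# Write the named columns of a tab-delimited file with a header row,
# in the order requested. The header row is kept in the output.
# Usage: select_columns INPUT "COL1,COL2,..." > OUTPUT
select_columns() {
    awk -F '\t' -v OFS='\t' -v want="$2" '
        { sub(/\r$/, "") }                      # tolerate Windows line endings
        NR == 1 {
            n = split(want, names, ",")
            for (i = 1; i <= NF; i++) pos[$i] = i
            for (j = 1; j <= n; j++) {
                if (!(names[j] in pos)) {
                    print "missing column: " names[j] > "/dev/stderr"
                    exit 1
                }
                idx[j] = pos[names[j]]
            }
        }
        {
            line = $(idx[1])
            for (j = 2; j <= n; j++) line = line OFS $(idx[j])
            print line
        }
    ' "$1"
}
```

For example: select_columns Page1_4_5_7a_8a_9a_10_11_13a.txt "SubjectID,SEX,SBTY" > phenotype.txt. The call stops with an error if one of the requested column names is not found in the header.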
5. create the PLINK map file by reordering the columns of the Illumina annotation file
C:\SAEC\SJS Delivery from GSK\PGX40001_Illumina1M\Documents\Locus_Annotation_Files>gawk '{print $2 "\t" $1 "\t" $4 "\t" $3}' Human1M_Physical_and_Genetic_Map_Coordinates.txt > illumina1M.map
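A standard PLINK .map file has four columns per marker: chromosome, marker ID, genetic distance, base-pair position. A quick way to confirm that the reordered file has that shape is sketched below; check_map is a hypothetical helper that only checks the field count, not the values.

```shell
# Report every line of a .map file that does not have exactly four
# tab-separated fields; the exit status is non-zero if any line is reported.
check_map() {
    awk -F '\t' '
        NF != 4 { printf "line %d: %d fields\n", NR, NF; bad = 1 }
        END { exit (bad ? 1 : 0) }
    ' "$1"
}
```

Calling check_map illumina1M.map prints nothing when every line has four fields. A header line carried over from the annotation file would not be caught by this check and has to be looked for separately.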