Protocol for managing and analyzing the SAEC data

Data reformatting for PLINK

1. From "PGX40001_GSK_SJS_B137_29Aug2007_DNAReport.xls", take columns B and C (DNA name, Subject ID) and store them in "mapping_info.txt".
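Step 1 can be scripted once the worksheet has been saved from Excel as tab-delimited text. This is a minimal sketch. It assumes a hypothetical export named DNAReport.txt, in which columns B and C become the 2nd and 3rd tab-separated fields. Any header row is copied through unchanged.

```shell
# Hypothetical helper: print columns B and C (tab-separated fields 2 and 3)
# of a tab-delimited export of the DNA report worksheet.
# Reads the files named as arguments, or stdin if none are given.
extract_mapping() {
    awk -F'\t' 'BEGIN { OFS = "\t" }
        {
            sub(/\r$/, "")    # drop a Windows carriage return, if present
            print $2, $3
        }' "$@"
}

# Usage (DNAReport.txt is an assumed file name):
# extract_mapping DNAReport.txt > mapping_info.txt
```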

2. Take the four CSV files provided by GSK and extract the data for each individual with extractPatients.pl:

 perl extractPatients.pl -outfile test.out -outdir ..\data -infile "..\SJS Delivery from GSK\PGX40001_Illumina1M\Extracted Genotypes\PGx40001_12278-DNA.csv"
 perl extractPatients.pl -outfile test.out -outdir ..\data -infile "..\SJS Delivery from GSK\PGX40001_Illumina1M\Extracted Genotypes\PGx40001_GSK_SJS_B137_28Aug2007_Genotype_Report_12914-DNA.csv"
 perl extractPatients.pl -outfile test.out -outdir ..\data -infile "..\SJS Delivery from GSK\PGX40001_Illumina1M\Extracted Genotypes\PGx40001_GSK_SJS_B137_28Aug2007_Genotype_Report_12277-DNA.csv"
 perl extractPatients.pl -outfile test.out -outdir ..\data -infile "..\SJS Delivery from GSK\PGX40001_Illumina1M\Extracted Genotypes\PGx40001_GSK_SJS_B137_28Aug2007_Genotype_Report_12276-DNA.csv"
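The four commands differ only in the input file, so they can also be run as a loop from a Unix-like shell, such as the one used for the sanity check in step 3. This sketch assumes that the Extracted Genotypes directory holds only these four CSV files. The names extract_all and EXTRACT_CMD are hypothetical. The EXTRACT_CMD override allows a dry run with EXTRACT_CMD=echo. The shared -outfile test.out is kept exactly as in the original commands.

```shell
# Hypothetical helper: run extractPatients.pl once per genotype CSV file.
# EXTRACT_CMD (an assumed override) defaults to the Perl script; set
# EXTRACT_CMD=echo to print the arguments instead of running it.
extract_all() {
    geno_dir=$1    # directory holding the GSK genotype CSV files
    out_dir=$2     # where the per-individual files are written
    for f in "$geno_dir"/*.csv; do
        [ -e "$f" ] || continue   # glob matched nothing
        ${EXTRACT_CMD:-perl extractPatients.pl} \
            -outfile test.out -outdir "$out_dir" -infile "$f"
    done
}

# Usage:
# extract_all "../SJS Delivery from GSK/PGX40001_Illumina1M/Extracted Genotypes" ../data
```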

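3. Sanity check: for one sample individual, count the distinct marker IDs (the first field of every non-comment line). wc prints line, word and byte counts. The first number, 1069083, is the number of distinct markers: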

$ gawk '!/^#/{print $1}' ../data/42.txt  |sort -u |wc
1069083 1069083 10938646
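The same check can be run on every individual, not only 42.txt. The sketch below prints each per-individual file whose marker count differs from the expected value. The helper names are hypothetical. Unlike the one-liner above, it also skips blank lines.

```shell
# Count distinct marker IDs (first field) in one per-individual file,
# skipping comment lines (starting with #) and blank lines.
count_markers() {
    awk '!/^#/ && NF > 0 { print $1 }' "$1" | sort -u | wc -l | tr -d ' '
}

# Print "file<TAB>count" for every *.txt file in directory $1 whose
# marker count is not equal to $2; prints nothing if all files agree.
check_all() {
    for f in "$1"/*.txt; do
        [ -e "$f" ] || continue   # glob matched nothing
        n=$(count_markers "$f")
        [ "$n" -eq "$2" ] || printf '%s\t%s\n' "$f" "$n"
    done
}

# Usage: check_all ../data 1069083
```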

4. Copy the columns SubjectID, SEX and SBTY from C:\SAEC\SJS Delivery from GSK\PGX40001_Clinical\Page1_4_5_7a_8a_9a_10_11_13a.txt into phenotype.txt.
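Selecting the columns by header name avoids counting column positions by hand. This is a minimal sketch. It assumes that the clinical file is tab-delimited and has a header row whose names contain no spaces. The helper select_columns is hypothetical. It exits with an error if a requested column is missing.

```shell
# Hypothetical helper: print the named columns of a tab-delimited file
# with a header row, in the order requested (header line included).
# Usage: select_columns FILE NAME...
select_columns() {
    file=$1
    shift
    awk -F'\t' -v want="$*" '
        BEGIN { OFS = "\t"; n = split(want, names, " ") }
        { sub(/\r$/, "") }    # drop a Windows carriage return, if present
        NR == 1 {
            # Map each header name to its field number.
            for (i = 1; i <= NF; i++) idx[$i] = i
            for (j = 1; j <= n; j++)
                if (!(names[j] in idx)) {
                    print "missing column: " names[j] > "/dev/stderr"
                    exit 1
                }
        }
        {
            line = $(idx[names[1]])
            for (j = 2; j <= n; j++) line = line OFS $(idx[names[j]])
            print line
        }' "$file"
}

# Usage, from the PGX40001_Clinical directory:
# select_columns Page1_4_5_7a_8a_9a_10_11_13a.txt SubjectID SEX SBTY > phenotype.txt
```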

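5. Create the PLINK map file illumina1M.map by reordering the columns of Human1M_Physical_and_Genetic_Map_Coordinates.txt into PLINK .map order (chromosome, SNP identifier, genetic distance, base-pair position). Run the command from the Locus_Annotation_Files directory:

C:\SAEC\SJS Delivery from GSK\PGX40001_Illumina1M\Documents\Locus_Annotation_Files>gawk '{print $2 "\t" $1 "\t" $4 "\t" $3}' Human1M_Physical_and_Genetic_Map_Coordinates.txt > illumina1M.map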
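PLINK .map files have no header line. If the annotation file begins with one, the command above copies it into illumina1M.map as a bogus first marker. This page does not record whether the file has a header. The sketch below repeats the same column reordering and can optionally skip a header (SKIP_HEADER=1). It also adds a check that every line has four fields and an integer base-pair position in column 4. The names make_map, check_map and SKIP_HEADER are hypothetical.

```shell
# Reorder annotation columns 2,1,4,3 as the gawk command above does,
# skipping the first line when SKIP_HEADER=1 (an assumed option).
make_map() {
    awk -v skip="${SKIP_HEADER:-0}" 'BEGIN { OFS = "\t" }
        skip && NR == 1 { next }
        { print $2, $1, $4, $3 }' "$1"
}

# Print the line number of every .map line that does not have exactly
# four fields or whose fourth field (base-pair position) is not an integer.
check_map() {
    awk 'NF != 4 || $4 !~ /^-?[0-9]+$/ { print NR }' "$1"
}

# Usage:
# SKIP_HEADER=1 make_map Human1M_Physical_and_Genetic_Map_Coordinates.txt > illumina1M.map
# check_map illumina1M.map    # no output means every line passed
```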
