POPSICLE: a software suite to determine population structure and establish genotype-phenotype associations using next-generation sequencing data
Use Case

We demonstrate the utility of POPSICLE using an example dataset consisting of chromosomes 1 and 2 from 30 Plasmodium falciparum samples. The reads from these 30 samples have been aligned to these two chromosomes and sorted by genome position. The data files are in the directory ./data. The commands below contain machine-specific paths; replace them with the locations of LPDtools.jar and the data directory on your system.

Step1: Find somies
The somies are found using the POPSICLE utility FindSomies.

    java -jar H:\JavaCodes\JarExecutables\LPDtools.jar FindSomies -i "C:\Users\shaikjs\Desktop\KompoZer 0.7.10\POPSICLE\data\bams" -o "C:\Users\shaikjs\Desktop\KompoZer 0.7.10\POPSICLE\data\Somies.txt" -m 1 -k "C:\Users\shaikjs\Desktop\KompoZer 0.7.10\POPSICLE\data\chrSizes.txt"

See the FindSomies page for more details.

Step2: Find alleles
The alleles are found using the POPSICLE utility findAlleles.

    java -jar H:\JavaCodes\JarExecutables\LPDtools.jar findAlleles -i "C:\Users\shaikjs\Desktop\KompoZer 0.7.10\POPSICLE\data\bams" -n 1 -o "C:\Users\shaikjs\Desktop\KompoZer 0.7.10\POPSICLE\data\alleleFiles"

See the FindAlleles page for more details.

Step3: Generate the POPSICLE input file
The input file is generated using the POPSICLE utility GenerateInputFromAlleleFiles.

    java -jar H:\JavaCodes\JarExecutables\LPDtools.jar GenerateInputFromAlleleFiles -i "C:\Users\shaikjs\Desktop\KompoZer 0.7.10\POPSICLE\data\alleleFiles" -j "C:\Users\shaikjs\Desktop\KompoZer 0.7.10\POPSICLE\data\snps" -o "C:\Users\shaikjs\Desktop\KompoZer 0.7.10\POPSICLE\data\popsicle1.txt" -k "vcf" -l "C:\Users\shaikjs\Desktop\KompoZer 0.7.10\POPSICLE\data\Somies.txt"

See the GenerateInputFromAlleleFiles page for more details.

Step4: Remove loci that contain the same allele in most of the samples
The input file is filtered using the POPSICLE utility RemoveInsignificantLoci.

    java -jar H:\JavaCodes\JarExecutables\LPDtools.jar RemoveInsignificantLoci -i "C:\Users\shaikjs\Desktop\KompoZer 0.7.10\POPSICLE\data\popsicle1.txt" -o "C:\Users\shaikjs\Desktop\KompoZer 0.7.10\POPSICLE\data\FilteredPopsicle1.txt" -m 0.9

See the RemoveInsignificantLoci page for more details.

Step5: Remove loci that contain a lot of missing data
The file from step 4 is filtered using the POPSICLE utility RemoveLociWithLotsOfMissingData.

    java -jar H:\JavaCodes\JarExecutables\LPDtools.jar RemoveLociWithLotsOfMissingData -i "C:\Users\shaikjs\Desktop\KompoZer 0.7.10\POPSICLE\data\FilteredPopsicle1.txt" -o "C:\Users\shaikjs\Desktop\KompoZer 0.7.10\POPSICLE\data\FilteredPopsicle2.txt" -m 0.5

See the RemoveLociWithLotsOfMissingData page for more details.

Step6: Find divergence from baseline
The nucleotide sequence data are converted to numeric data using the utility FindDivergenceFromBaseline.

    java -jar H:\JavaCodes\JarExecutables\LPDtools.jar FindDivergenceFromBaseline -i "C:\Users\shaikjs\Desktop\KompoZer 0.7.10\POPSICLE\data\FilteredPopsicle2.txt" -o "C:\Users\shaikjs\Desktop\KompoZer 0.7.10\POPSICLE\data\FilteredPopsicleBaseline1.txt"

See the FindDivergenceFromBaseline page for more details.

Step7: Convert the file from step 6 into ARFF format
The file is converted to ARFF format using the utility Convert2ARFFformat.

    java -jar H:\JavaCodes\JarExecutables\LPDtools.jar Convert2ARFFformat -i "C:\Users\shaikjs\Desktop\KompoZer 0.7.10\POPSICLE\data\FilteredPopsicleBaseline1.txt" -o "C:\Users\shaikjs\Desktop\KompoZer 0.7.10\POPSICLE\data\FilteredPopsicleBaseline1.ARFF"

See the Convert2ARFFformat page for more details.

Step8: Cluster the input data to find the population size
The clustering is performed using the utility PerformKmeansClustering.

    java -jar H:\JavaCodes\JarExecutables\LPDtools.jar PerformKmeansClustering -i "C:\Users\shaikjs\Desktop\KompoZer 0.7.10\POPSICLE\data\FilteredPopsicleBaseline1.ARFF" -o "C:\Users\shaikjs\Desktop\KompoZer 0.7.10\POPSICLE\data\FilteredPopsicleBaseline1.clusters" -n 2 -m 10

See the PerformKmeansClustering page for more details.

Step9: Find ancestries
The ancestries are found using the utility POPSICLEIntermediate.

    java -jar H:\JavaCodes\JarExecutables\LPDtools.jar POPSICLEIntermediate -i "C:\Users\shaikjs\Desktop\KompoZer 0.7.10\POPSICLE\data\FilteredPopsicleBaseline1.txt" -j "C:\Users\shaikjs\Desktop\KompoZer 0.7.10\POPSICLE\data\FilteredPopsicleBaseline1.clusters" -o "C:\Users\shaikjs\Desktop\KompoZer 0.7.10\POPSICLE\data\Mix_6.txt" -n 5000 -k "C:\Users\shaikjs\Desktop\KompoZer 0.7.10\POPSICLE\data\FilteredPopsicleBaseline1.ARFF" -l "C:\Users\shaikjs\Desktop\KompoZer 0.7.10\POPSICLE\data\temp"

See the POPSICLEIntermediate page for more details.

Step10: Polish the ancestry data to eliminate spurious assignments
The polishing is done using the utility PolishPosicle. The next step reads the polished output as Mix_6_polished.txt.

    java -jar H:\JavaCodes\JarExecutables\LPDtools.jar PolishPosicle -i "C:\Users\shaikjs\Desktop\KompoZer 0.7.10\POPSICLE\data\Mix_6.txt" -n 3

Step11: Arrange the polished ancestry data by clusters

    java -jar H:\JavaCodes\JarExecutables\LPDtools.jar ArrangePopsicleByClustersFormed -i "C:\Users\shaikjs\Desktop\KompoZer 0.7.10\POPSICLE\data\Mix_6_polished.txt" -o "C:\Users\shaikjs\Desktop\KompoZer 0.7.10\POPSICLE\data\Mix_6_polished_arranged.txt" -j "C:\Users\shaikjs\Desktop\KompoZer 0.7.10\POPSICLE\data\FilteredPopsicleBaseline1.clusters"

Step12: Draw a circos plot to reveal global and local ancestries

    java -jar H:\JavaCodes\JarExecutables\LPDtools.jar ConvertPopsicle2CircosHighlights -i "C:\Users\shaikjs\Desktop\KompoZer 0.7.10\POPSICLE\data\Mix_6_polished_arranged.txt" -o "C:\Users\shaikjs\Desktop\KompoZer 0.7.10\POPSICLE\data\circos" -j "C:\Users\shaikjs\Desktop\KompoZer 0.7.10\POPSICLE\data\FilteredPopsicleBaseline1.txt" -k "C:\Users\shaikjs\Desktop\KompoZer 0.7.10\POPSICLE\data\FilteredPopsicleBaseline1.clusters"

See the circos directory for more details. Then:
Open circos.conf.
Change line 15 to include the sample names that you currently have.
Replace line 41 with the information from the colors.txt file.
Run circos using the command:

    circos -conf circos.conf -outputFile first.png

Step13: Draw local ancestries
Draw local ancestry profiles using DrawPopsicle.R.

Step14: Find genotype-phenotype associations

    java -jar H:\JavaCodes\JarExecutables\LPDtools.jar FindGenotypePhenotypeBootstrap -i "C:\Users\shaikjs\Desktop\KompoZer 0.7.10\POPSICLE\data\Mix_6.txt" -j "C:\Users\shaikjs\Desktop\KompoZer 0.7.10\POPSICLE\data\VirulenceRegion.txt" -o "C:\Users\shaikjs\Desktop\KompoZer 0.7.10\POPSICLE\data\genopheno.txt" -n 50

Filter the output file to keep only the entries whose p-value is at or below a chosen threshold. In this example, entries with p-values less than or equal to 0.01 are kept and saved as genopheno0.01.txt, which is the input to the annotation step.

Step15: Find annotations

    java -jar H:\JavaCodes\JarExecutables\LPDtools.jar findAnnotations -i "C:\Users\shaikjs\Desktop\KompoZer 0.7.10\POPSICLE\data\genopheno0.01.txt" -o "C:\Users\shaikjs\Desktop\KompoZer 0.7.10\POPSICLE\data\genopheno0.01.annot" -j C:\Users\shaikjs\Desktop\PopsicleTemp\plasmodium\ReferenceFiles\PlasmoDB-32_Pfalciparum3D7.gff
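
The p-value filtering of the FindGenotypePhenotypeBootstrap output is described above only in words, and this page does not document the layout of genopheno.txt. The sketch below shows one possible way to do the filtering in Python. It assumes the file is tab-delimited with a single header line and that you know which column holds the p-value. The function names, the pvalue_index parameter, and the file layout are illustrative assumptions and are not part of POPSICLE, so check them against your own output file before relying on the result.

```python
import csv


def filter_by_pvalue(rows, pvalue_index, threshold=0.01, has_header=True):
    """Yield the rows whose p-value is less than or equal to threshold.

    rows         -- iterable of lists of strings (for example a csv.reader)
    pvalue_index -- ASSUMPTION: position of the p-value column; the real
                    layout of genopheno.txt is not documented on this page
    has_header   -- ASSUMPTION: the first row is a header and is kept as-is

    Rows whose p-value field is missing or not numeric are dropped.
    """
    rows = iter(rows)
    if has_header:
        header = next(rows, None)
        if header is not None:
            yield header
    for row in rows:
        try:
            pvalue = float(row[pvalue_index])
        except (IndexError, ValueError):
            continue  # empty line or non-numeric field
        if pvalue <= threshold:
            yield row


def filter_file(in_path, out_path, pvalue_index, threshold=0.01, has_header=True):
    """Read a tab-delimited file, filter it, and write the kept rows."""
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        reader = csv.reader(src, delimiter="\t")
        writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
        writer.writerows(
            filter_by_pvalue(reader, pvalue_index, threshold, has_header)
        )
```

If the p-value happened to be the last column of your file, a call such as filter_file("genopheno.txt", "genopheno0.01.txt", pvalue_index=-1) would write the file that findAnnotations takes as input in this example. The comparison is inclusive, matching the "less than or equal to 0.01" rule stated above.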
CITATION: Jahangheer S. Shaik, Asis Khan and Michael E. Grigg, "POPSICLE: A Software Suite to Study Population Structure and Ancestral Determinants of Phenotypes using Whole genome Sequencing Data", submitted to PLoS special edition