POPSICLE: a software suite to determine population structure and to establish genotype-phenotype associations using next-generation sequencing data






Use Case
We demonstrate the utility of POPSICLE using an example dataset consisting of chromosomes 1 and 2 from 30 Plasmodium falciparum samples. The reads from these 30 samples have been aligned to these two chromosomes and sorted by genome position. The data files are in the directory ./data.

Step1: Find somies
The somies (chromosome copy numbers) are found using the utility FindSomies from POPSICLE.
java -jar H:\JavaCodes\JarExecutables\LPDtools.jar FindSomies -i "C:\Users\shaikjs\Desktop\KompoZer 0.7.10\POPSICLE\data\bams" -o "C:\Users\shaikjs\Desktop\KompoZer 0.7.10\POPSICLE\data\Somies.txt" -m 1 -k "C:\Users\shaikjs\Desktop\KompoZer 0.7.10\POPSICLE\data\chrSizes.txt"
See the FindSomies page for more details.
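The command above hard-codes absolute Windows paths. As a minimal sketch (not part of POPSICLE), the same FindSomies call can be assembled from a jar location and a data directory. The JAR and DATA values and the popsicle_command helper are placeholders introduced here for illustration; only the utility name and the -i, -o, -m and -k options come from the command above.

```python
from pathlib import Path


def popsicle_command(jar, utility, options):
    """Return the argument list for one POPSICLE utility call.

    `options` is a list of (flag, value) pairs, kept in the order given.
    """
    command = ["java", "-jar", str(jar), utility]
    for flag, value in options:
        command += [flag, str(value)]
    return command


# Placeholder locations; adjust to wherever the jar and ./data actually live.
JAR = Path("LPDtools.jar")
DATA = Path("data")

find_somies = popsicle_command(
    JAR,
    "FindSomies",
    [
        ("-i", DATA / "bams"),
        ("-o", DATA / "Somies.txt"),
        ("-m", 1),
        ("-k", DATA / "chrSizes.txt"),
    ],
)
print(" ".join(find_somies))
```

Once Java and the jar are available, the list can be passed to subprocess.run(find_somies, check=True). Passing a list rather than a single string avoids quoting problems with directory names that contain spaces.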
Step2: Find alleles
The alleles are found using the utility findAlleles from POPSICLE.

java -jar H:\JavaCodes\JarExecutables\LPDtools.jar findAlleles -i "C:\Users\shaikjs\Desktop\KompoZer 0.7.10\POPSICLE\data\bams" -n 1 -o "C:\Users\shaikjs\Desktop\KompoZer 0.7.10\POPSICLE\data\alleleFiles"
See the FindAlleles page for more details.
Step3: Generate the POPSICLE input file
The input file is generated using the utility GenerateInputFromAlleleFiles from POPSICLE.

java -jar H:\JavaCodes\JarExecutables\LPDtools.jar GenerateInputFromAlleleFiles -i "C:\Users\shaikjs\Desktop\KompoZer 0.7.10\POPSICLE\data\alleleFiles" -j "C:\Users\shaikjs\Desktop\KompoZer 0.7.10\POPSICLE\data\snps" -o "C:\Users\shaikjs\Desktop\KompoZer 0.7.10\POPSICLE\data\popsicle1.txt" -k "vcf" -l "C:\Users\shaikjs\Desktop\KompoZer 0.7.10\POPSICLE\data\Somies.txt"
See the GenerateInputFromAlleleFiles page for more details.
Step4: Remove loci that contain the same allele in most of the samples
The input file is filtered using the utility RemoveInsignificantLoci from POPSICLE.

java -jar H:\JavaCodes\JarExecutables\LPDtools.jar RemoveInsignificantLoci -i "C:\Users\shaikjs\Desktop\KompoZer 0.7.10\POPSICLE\data\popsicle1.txt" -o "C:\Users\shaikjs\Desktop\KompoZer 0.7.10\POPSICLE\data\FilteredPopsicle1.txt" -m 0.9
See the RemoveInsignificantLoci page for more details.
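To illustrate what this filter is for, the sketch below keeps a locus only when its most common allele is carried by fewer than a given fraction of samples. It assumes that -m 0.9 plays the role of that fraction and represents each locus as a plain list of allele calls. Both are assumptions made for illustration, and the RemoveInsignificantLoci page remains the reference for the actual behavior and file format.

```python
from collections import Counter


def major_allele_fraction(calls):
    """Fraction of samples carrying the most common allele at one locus."""
    counts = Counter(calls)
    return counts.most_common(1)[0][1] / len(calls)


def drop_uninformative(loci, max_fraction=0.9):
    """Keep loci whose most common allele is in fewer than max_fraction of samples."""
    return {
        name: calls
        for name, calls in loci.items()
        if major_allele_fraction(calls) < max_fraction
    }


# Made-up calls for 10 samples at two loci (not taken from the example dataset).
loci = {
    "locus_a": list("AAAAAAAAAA"),  # same allele everywhere: removed
    "locus_b": list("AAAAATTTTT"),  # evenly split: kept
}
print(sorted(drop_uninformative(loci)))
```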
Step5: Remove loci with a large fraction of missing data
The file from step 4 is filtered using the utility RemoveLociWithLotsOfMissingData from POPSICLE.

java -jar H:\JavaCodes\JarExecutables\LPDtools.jar RemoveLociWithLotsOfMissingData -i "C:\Users\shaikjs\Desktop\KompoZer 0.7.10\POPSICLE\data\FilteredPopsicle1.txt" -o "C:\Users\shaikjs\Desktop\KompoZer 0.7.10\POPSICLE\data\FilteredPopsicle2.txt" -m 0.5
See the RemoveLociWithLotsOfMissingData page for more details.
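The idea behind this filter can be sketched in a few lines, assuming that -m 0.5 is the largest tolerated fraction of samples without a call at a locus. That reading of -m, the use of None for a missing call, and the list-per-locus layout are all assumptions for illustration; the RemoveLociWithLotsOfMissingData page documents the actual rule.

```python
def missing_fraction(calls):
    """Fraction of samples with no call (None) at one locus."""
    return sum(call is None for call in calls) / len(calls)


def drop_sparse(loci, max_missing=0.5):
    """Keep loci whose fraction of missing calls does not exceed max_missing."""
    return {
        name: calls
        for name, calls in loci.items()
        if missing_fraction(calls) <= max_missing
    }


# Made-up calls for 4 samples at two loci (not taken from the example dataset).
loci = {
    "locus_a": ["A", None, None, None],  # 75% missing: removed
    "locus_b": ["A", "T", "A", None],    # 25% missing: kept
}
print(sorted(drop_sparse(loci)))
```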
Step6: Find divergence from baseline
The nucleotide sequence data is converted to numeric data using the utility FindDivergenceFromBaseline
java -jar H:\JavaCodes\JarExecutables\LPDtools.jar FindDivergenceFromBaseline -i "C:\Users\shaikjs\Desktop\KompoZer 0.7.10\POPSICLE\data\FilteredPopsicle2.txt" -o "C:\Users\shaikjs\Desktop\KompoZer 0.7.10\POPSICLE\data\FilteredPopsicleBaseline1.txt"
See the FindDivergenceFromBaseline page for more details.
Step7: Convert the file from step 6 into ARFF format
The file can be converted to ARFF format using the utility Convert2ARFFformat.
java -jar H:\JavaCodes\JarExecutables\LPDtools.jar Convert2ARFFformat -i "C:\Users\shaikjs\Desktop\KompoZer 0.7.10\POPSICLE\data\FilteredPopsicleBaseline1.txt" -o "C:\Users\shaikjs\Desktop\KompoZer 0.7.10\POPSICLE\data\FilteredPopsicleBaseline1.ARFF"
See the Convert2ARFFformat page for more details.
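ARFF is the plain-text format read by Weka: a @relation line, one @attribute line per column, then a @data section with comma-separated rows. The sketch below writes a small numeric matrix in that layout. The one-row-per-sample, one-attribute-per-locus arrangement and the names used are assumptions for illustration; the layout produced by Convert2ARFFformat may differ.

```python
def to_arff(relation, attributes, rows):
    """Render numeric rows as ARFF text.

    Attribute names are written as given, so they must not contain spaces
    (ARFF requires quoting for names that do).
    """
    lines = [f"@relation {relation}", ""]
    lines += [f"@attribute {name} numeric" for name in attributes]
    lines += ["", "@data"]
    lines += [",".join(str(value) for value in row) for row in rows]
    return "\n".join(lines) + "\n"


# Made-up divergence values: two samples (rows) at two loci (columns).
text = to_arff("popsicle_example", ["locus_1", "locus_2"], [[0, 1], [1, 0]])
print(text)
```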
Step8: Cluster the input data to find the population size
The clustering can be performed using the utility PerformKmeansClustering.
java -jar H:\JavaCodes\JarExecutables\LPDtools.jar PerformKmeansClustering -i "C:\Users\shaikjs\Desktop\KompoZer 0.7.10\POPSICLE\data\FilteredPopsicleBaseline1.ARFF" -o "C:\Users\shaikjs\Desktop\KompoZer 0.7.10\POPSICLE\data\FilteredPopsicleBaseline1.clusters" -n 2 -m 10
See the PerformKmeansClustering page for more details.
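For readers unfamiliar with k-means, the sketch below shows the basic procedure (Lloyd's algorithm) on a handful of two-dimensional points. It is a generic illustration, not the code inside PerformKmeansClustering. The function name, its k and iterations parameters, and the sample points are all introduced here and are not meant to explain the -n and -m options of the utility.

```python
def kmeans(points, k, iterations=10):
    """Lloyd's algorithm; centroids start at the first k points (deterministic)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(
                range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])),
            )
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Made-up points forming two visibly separate groups.
points = [(0, 0), (0, 1), (5, 5), (6, 5), (1, 0), (5, 6)]
labels, centroids = kmeans(points, k=2)
print(labels)
```

Starting from the first k points keeps the example reproducible; practical implementations usually use randomized or k-means++ initialization and several restarts.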
Step9: Find ancestries
The ancestries can be found using the utility POPSICLEIntermediate.
java -jar H:\JavaCodes\JarExecutables\LPDtools.jar POPSICLEIntermediate -i "C:\Users\shaikjs\Desktop\KompoZer 0.7.10\POPSICLE\data\FilteredPopsicleBaseline1.txt" -j "C:\Users\shaikjs\Desktop\KompoZer 0.7.10\POPSICLE\data\FilteredPopsicleBaseline1.clusters" -o "C:\Users\shaikjs\Desktop\KompoZer 0.7.10\POPSICLE\data\Mix_6.txt" -n 5000 -k "C:\Users\shaikjs\Desktop\KompoZer 0.7.10\POPSICLE\data\FilteredPopsicleBaseline1.ARFF" -l "C:\Users\shaikjs\Desktop\KompoZer 0.7.10\POPSICLE\data\temp"
See the POPSICLEIntermediate page for more details.
Step10: Polish the ancestry data to eliminate spurious assignments
The polishing can be done using the utility PolishPosicle.
java -jar H:\JavaCodes\JarExecutables\LPDtools.jar PolishPosicle -i "C:\Users\shaikjs\Desktop\KompoZer 0.7.10\POPSICLE\data\Mix_6.txt" -n 3
Step10b: Arrange the polished ancestry data by clusters
java -jar H:\JavaCodes\JarExecutables\LPDtools.jar  ArrangePopsicleByClustersFormed -i "C:\Users\shaikjs\Desktop\KompoZer 0.7.10\POPSICLE\data\Mix_6_polished.txt" -o "C:\Users\shaikjs\Desktop\KompoZer 0.7.10\POPSICLE\data\Mix_6_polished_arranged.txt" -j "C:\Users\shaikjs\Desktop\KompoZer 0.7.10\POPSICLE\data\FilteredPopsicleBaseline1.clusters"
Step11: Draw circos plot to reveal global and local ancestries
java -jar H:\JavaCodes\JarExecutables\LPDtools.jar ConvertPopsicle2CircosHighlights -i "C:\Users\shaikjs\Desktop\KompoZer 0.7.10\POPSICLE\data\Mix_6_polished_arranged.txt" -o "C:\Users\shaikjs\Desktop\KompoZer 0.7.10\POPSICLE\data\circos" -j "C:\Users\shaikjs\Desktop\KompoZer 0.7.10\POPSICLE\data\FilteredPopsicleBaseline1.txt" -k "C:\Users\shaikjs\Desktop\KompoZer 0.7.10\POPSICLE\data\FilteredPopsicleBaseline1.clusters"

See the circos directory for more details.
Open circos.conf.
Change line 15 to include the sample names that you currently have.
Replace line 41 with the information from the colors.txt file.
Run Circos using the command: circos -conf circos.conf -outputFile first.png
Step12: Draw local ancestries
Draw local ancestry profiles using DrawPopsicle.R
Step13: Find genotype-phenotype associations
java -jar H:\JavaCodes\JarExecutables\LPDtools.jar FindGenotypePhenotypeBootstrap -i "C:\Users\shaikjs\Desktop\KompoZer 0.7.10\POPSICLE\data\Mix_6.txt" -j "C:\Users\shaikjs\Desktop\KompoZer 0.7.10\POPSICLE\data\VirulenceRegion.txt" -o "C:\Users\shaikjs\Desktop\KompoZer 0.7.10\POPSICLE\data\genopheno.txt" -n 50
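Filter the output file (genopheno.txt) to keep only the entries whose p-value is at or below a chosen threshold. In this example, entries with p-values less than or equal to 0.01 are selected; the filtered file (genopheno0.01.txt) is the input to the next step.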
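The filtering itself is not done by a POPSICLE utility in this walkthrough. A minimal sketch of one way to do it follows, assuming the file is tab-delimited with the p-value in a known column. The column position, the delimiter, and the sample rows are assumptions, since the layout of genopheno.txt is not described here.

```python
def filter_by_p_value(lines, p_column, threshold=0.01, delimiter="\t"):
    """Return the lines whose p-value field is <= threshold.

    Lines whose p-value field is absent or not a number (for example a
    header line) are skipped.
    """
    kept = []
    for line in lines:
        fields = line.rstrip("\n").split(delimiter)
        try:
            p_value = float(fields[p_column])
        except (ValueError, IndexError):
            continue
        if p_value <= threshold:
            kept.append(line)
    return kept


# Made-up rows; the real file would be read with open(...) and passed in.
rows = [
    "region\tp_value\n",
    "chr1:100\t0.005\n",
    "chr1:200\t0.2\n",
    "chr2:50\t0.01\n",
]
kept = filter_by_p_value(rows, p_column=1)
print("".join(kept), end="")
```

With a real file, the same function can be applied to the open file object and the kept lines written to genopheno0.01.txt with writelines.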
Step14: Find Annotations
java -jar H:\JavaCodes\JarExecutables\LPDtools.jar findAnnotations -i "C:\Users\shaikjs\Desktop\KompoZer 0.7.10\POPSICLE\data\genopheno0.01.txt" -o "C:\Users\shaikjs\Desktop\KompoZer 0.7.10\POPSICLE\data\genopheno0.01.annot" -j C:\Users\shaikjs\Desktop\PopsicleTemp\plasmodium\ReferenceFiles\PlasmoDB-32_Pfalciparum3D7.gff
 


CITATION: Jahangheer S. Shaik, Asis Khan and Michael E. Grigg, "POPSICLE: A Software Suite to Study Population Structure and Ancestral Determinants of Phenotypes using Whole genome Sequencing Data", submitted to PLoS special edition