
POPSICLE: a software suite to determine population structure and to establish genotype-phenotype associations using next-generation sequencing data

PerformKmeansClustering
Prerequisites

Generate the baseline file using the FindDivergenceFromBaseline utility of POPSICLE, then convert it to ARFF format using the Convert2ARFFformat utility of POPSICLE.
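For readers who want to see what an ARFF input looks like, here is a minimal Python sketch that writes a sample-by-feature matrix of numeric values as ARFF text. It is not Convert2ARFFformat. The relation name, the attribute names (hypothetical `win1`, `win2`), the helper name `to_arff` and the one-row-per-sample layout are all assumptions, and the file POPSICLE actually produces may be organized differently.

```python
def to_arff(relation, feature_names, rows):
    """Render a sample-by-feature numeric matrix as ARFF text.

    Assumption (hypothetical layout): one NUMERIC attribute per feature and
    one @DATA row per sample, in the same order as the input rows.
    """
    def quote(name):
        # ARFF names containing spaces or other special characters must be quoted.
        return name if name.isidentifier() else f"'{name}'"

    out = [f"@RELATION {quote(relation)}", ""]
    out += [f"@ATTRIBUTE {quote(n)} NUMERIC" for n in feature_names]
    out += ["", "@DATA"]
    for row in rows:
        if len(row) != len(feature_names):
            raise ValueError("row length does not match number of attributes")
        out.append(",".join(repr(float(v)) for v in row))
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Two hypothetical samples, two hypothetical divergence features each.
    print(to_arff("divergence", ["win1", "win2"], [[0.1, 0.2], [0.3, 0.4]]))
```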

What does it do?
It clusters the samples into each number of groups in a user-specified range, using the k-means implementation in WEKA. It scores each resulting clustering with the Dunn index, which rewards intra-cluster similarity and inter-cluster dissimilarity, and proposes the optimum number of clusters for your data.
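The idea of sweeping the number of clusters and keeping the one with the best Dunn index can be sketched in Python as below. This is an illustration under stated assumptions, not POPSICLE's or WEKA's code. It swaps in a plain Lloyd's k-means with deterministic farthest-point initialization and Euclidean distance, and it uses the classic Dunn index variant: the smallest distance between points in different clusters divided by the largest distance between two points in the same cluster (higher is better). The helper names `kmeans`, `dunn_index` and `sweep` are hypothetical.

```python
import math


def farthest_first_centroids(points, k):
    """Deterministic init: start at the first point, then repeatedly add the
    point farthest from all centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def kmeans(points, k, max_iter=100):
    """Plain Lloyd's k-means (not WEKA's implementation); one label per point."""
    centroids = farthest_first_centroids(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:  # converged: assignments stopped changing
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        # (an emptied cluster keeps its previous centroid).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def dunn_index(points, labels):
    """Smallest inter-cluster point distance / largest intra-cluster distance."""
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    clusters = list(groups.values())
    diameter = max((math.dist(a, b) for g in clusters for a in g for b in g),
                   default=0.0)
    separation = min((math.dist(a, b)
                      for i, g in enumerate(clusters)
                      for h in clusters[i + 1:]
                      for a in g for b in h), default=0.0)
    return separation / diameter if diameter > 0 else math.inf


def sweep(points, min_clusters, max_clusters):
    """Cluster for every k in [min_clusters, max_clusters]; return the Dunn
    score per k and the k with the highest score."""
    scores = {k: dunn_index(points, kmeans(points, k))
              for k in range(min_clusters, max_clusters + 1)}
    best = max(scores, key=scores.get)
    return scores, best


if __name__ == "__main__":
    samples = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10),
               (20, 0), (20, 1), (21, 0)]
    scores, best = sweep(samples, 2, 5)
    print(best)  # → 3
```

With three tight, well-separated groups of points, splitting any group (k > 3) or merging two groups (k = 2) lowers the ratio, so the sweep favors k = 3.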
How to run it?
java -jar LPDtools.jar PerformKmeansClustering -i sampledARFFfile -o outputClustersFile -n minClusters -m maxClusters
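Here, -i is the baseline file generated by FindDivergenceFromBaseline and converted into ARFF format with the Convert2ARFFformat utility of POPSICLE; -o is the output file for the cluster assignments; -n and -m are the minimum and maximum numbers of clusters to try.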
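If you drive the tool from a script, a small Python wrapper like the one below can build the command line shown above. It uses only the documented flags (-i, -o, -n, -m). The function names `kmeans_command` and `run_kmeans` are hypothetical, the jar is assumed to be in the working directory, and the check that 2 <= min <= max is a sanity check added here, not documented POPSICLE behavior.

```python
import subprocess


def kmeans_command(arff_file, output_file, min_clusters, max_clusters,
                   jar="LPDtools.jar"):
    """Build the PerformKmeansClustering command as an argument list."""
    # Sanity check added in this sketch: the Dunn index needs at least two clusters.
    if not 2 <= min_clusters <= max_clusters:
        raise ValueError("expected 2 <= min_clusters <= max_clusters")
    return ["java", "-jar", str(jar), "PerformKmeansClustering",
            "-i", str(arff_file), "-o", str(output_file),
            "-n", str(min_clusters), "-m", str(max_clusters)]


def run_kmeans(*args, **kwargs):
    """Run the command; raises CalledProcessError if java exits non-zero."""
    subprocess.run(kmeans_command(*args, **kwargs), check=True)
```

For example, `run_kmeans("samples.arff", "kmeansClusters", 2, 15)` (with a hypothetical input file name) would try 2 to 15 clusters. Passing an argument list instead of a single shell string avoids quoting problems with file names that contain spaces.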
Output
[Figure: example kmeansClusters output file]

Here, the user specified a minimum of 2 and a maximum of 15 clusters, as indicated in the first line. Line 2 shows the cluster assignment of each sample when the samples are split into 2 clusters, line 3 shows the assignments for 3 clusters, and so on. Line 17 shows the Dunn index score for 2 clusters, line 18 shows the score for 3 clusters, and so on. The software recommends 3 as the optimum number of clusters. If users disagree with this evaluation, they can change the last line to the number of clusters they consider relevant. In this example, the user chose 5 clusters (line 33).

CITATION: Jahangheer S. Shaik, Asis Khan and Michael E. Grigg, "POPSICLE: A Software Suite to Study Population Structure and Ancestral Determinants of Phenotypes using Whole genome Sequencing Data", submitted to PLoS special edition