kWIP: The <i>k</i>-mer weighted inner product, a <i>de novo</i> estimator of genetic similarity
by Kevin D. Murray, Christfried Webers, Cheng Soon Ong, Justin Borevitz, Norman Warthmann
Modern genomics techniques generate overwhelming quantities of data. Extracting population genetic variation demands computationally efficient methods to determine genetic relatedness between individuals (or “samples”) in an unbiased manner, preferably de novo. Rapid estimation of genetic relatedness directly from sequencing data has the potential to overcome reference genome bias, and to verify that individuals belong to the correct genetic lineage before conclusions are drawn using mislabelled, or misidentified samples. We present the k-mer Weighted Inner Product (kWIP), an assembly-, and alignment-free estimator of genetic similarity. kWIP combines a probabilistic data structure with a novel metric, the weighted inner product (WIP), to efficiently calculate pairwise similarity between sequencing runs from their k-mer counts. It produces a distance matrix, which can then be further analysed and visualised. Our method does not require prior knowledge of the underlying genomes and applications include establishing sample identity and detecting mix-up, non-obvious genomic variation, and population structure. We show that kWIP can reconstruct the true relatedness between samples from simulated populations. By re-analysing several published datasets we show that our results are consistent with marker-based analyses. kWIP is written in C++, licensed under the GNU GPL, and is available from https://github.com/kdmurray91/kwip.Publisher URL: http://journals.plos.org/ploscompbiol/article
DOI: 10.1371/journal.pcbi.1005727
Keeping up-to-date with research can feel impossible, with papers being published faster than you'll ever be able to read them. That's where Researcher comes in: we're simplifying discovery and making important discussions happen. With over 19,000 sources, including peer-reviewed journals, preprints, blogs, universities, podcasts and Live events across 10 research areas, you'll never miss what's important to you. It's like social media, but better. Oh, and we should mention - it's free.
Researcher displays publicly available abstracts and doesn’t host any full article content. If the content is open access, we will direct clicks from the abstracts to the publisher website and display the PDF copy on our platform. Clicks to view the full text will be directed to the publisher website, where only users with subscriptions or access through their institution are able to view the full article.