November 16, 2012

TreeMix paper "officially" published

About 8 months after the paper was pre-published in Nature Precedings, it has now also been "officially" published in PLoS Genetics. In the meantime, I count 18 posts labeled TreeMix on this blog, covering both the treemix software itself and its auxiliary threepop and fourpop programs. I also wrote a small script that converts ADMIXTURE output into TreeMix format, and generally had a lot of fun using it. I'm glad I didn't have to wait 8 months to learn that something like TreeMix existed.

In the grand scheme of things, an 8-month head start may not seem like much. But consider that someone else might have a use for TreeMix or want to build on it; if they, too, make their research available before official publication, a few more months might be gained. And if someone else then follows up on their work...

There are many arguments for immediate publication of research results, but I think that the potential for speeding up scientific progress is one of the best ones.

In the old days, it was really necessary to impose a delay between the time when a scientist put the final full stop on his paper and the time it appeared on another scientist's desk: publication involved significant expenses of paper, ink, and labor, so the frivolous or erroneous had to be weeded out; dissemination involved expensive transport by carriage or boat; storage involved a building, bookshelves, and additional cost.

All these costs have shrunk to insignificance; imposing delays on research dissemination now amounts to little more than placing a sleep() call in the unending loop of scientific advancement. And the one remaining argument for post-review publication ("weeding out the frivolous or erroneous") carries little weight: pre-review publication is a better guarantor of quality, because it exposes research to many more eyes and minds, which may scrutinize it more carefully once they have rid themselves of the idea that "if it's published it must be good".

PLoS Genet 8(11): e1002967. doi:10.1371/journal.pgen.1002967

Inference of Population Splits and Mixtures from Genome-Wide Allele Frequency Data

Joseph K. Pickrell, Jonathan K. Pritchard

Many aspects of the historical relationships between populations in a species are reflected in genetic data. Inferring these relationships from genetic data, however, remains a challenging task. In this paper, we present a statistical model for inferring the patterns of population splits and mixtures in multiple populations. In our model, the sampled populations in a species are related to their common ancestor through a graph of ancestral populations. Using genome-wide allele frequency data and a Gaussian approximation to genetic drift, we infer the structure of this graph. We applied this method to a set of 55 human populations and a set of 82 dog breeds and wild canids. In both species, we show that a simple bifurcating tree does not fully describe the data; in contrast, we infer many migration events. While some of the migration events that we find have been detected previously, many have not. For example, in the human data, we infer that Cambodians trace approximately 16% of their ancestry to a population ancestral to other extant East Asian populations. In the dog data, we infer that both the boxer and basenji trace a considerable fraction of their ancestry (9% and 25%, respectively) to wolves subsequent to domestication and that East Asian toy breeds (the Shih Tzu and the Pekingese) result from admixture between modern toy breeds and “ancient” Asian breeds. Software implementing the model described here, called TreeMix, is available at
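For readers wondering what "a Gaussian approximation to genetic drift" plus a mixture event amounts to, here is a toy sketch in Python. It is my own illustration under simplifying assumptions, not TreeMix's actual inference (which fits a whole graph of populations to the pattern of allele frequency covariance): each population's frequency drifts from its parent as a normal variable with variance c*p*(1-p), an admixed population's frequency is the weighted average w*a + (1-w)*b of its two sources, and the weight is then recovered by a plain least-squares regression. The function names, drift parameters, and the 0.16 weight (borrowed from the Cambodian example purely for flavor) are hypothetical.

```python
import random


def drift(p, c, rng):
    """Gaussian drift approximation: new frequency ~ N(p, c*p*(1-p)).

    Clipped to [0, 1] so frequencies stay valid (a simplification of this sketch).
    """
    q = rng.gauss(p, (c * p * (1 - p)) ** 0.5)
    return min(1.0, max(0.0, q))


def simulate(n_snps=20000, w=0.16, seed=1):
    """Simulate a hypothetical 3-population graph: two sources and one admixed population."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_snps):
        root = rng.uniform(0.2, 0.8)       # ancestral allele frequency at this SNP
        a = drift(root, 0.05, rng)         # source population A (minor contributor)
        b = drift(root, 0.05, rng)         # source population B (major contributor)
        mixed = w * a + (1 - w) * b        # admixture: weighted average of the sources
        c = drift(mixed, 0.02, rng)        # further drift in the admixed population
        rows.append((a, b, c))
    return rows


def estimate_weight(rows):
    """Least-squares fit of (c - b) = w * (a - b) + noise, solved for w."""
    num = sum((a - b) * (c - b) for a, b, c in rows)
    den = sum((a - b) ** 2 for a, b, c in rows)
    return num / den


if __name__ == "__main__":
    print(round(estimate_weight(simulate()), 3))
```

With 20,000 simulated SNPs the estimate lands close to the 0.16 used to generate the data. In real data the contributing populations are usually unsampled ancestral nodes rather than observed ones, which is why TreeMix infers the whole graph instead of running a single regression like this.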

