March 07, 2012

TreeMix: fitting trees in the presence of admixture

Pickrell and Pritchard have made available a preprint of a new paper that shows how to fit a graph model to a set of populations. Tree models are commonly used to infer the relationship between populations, but these are often inappropriate for populations within the same species where lateral gene flow may (and often does) play a role.

I have wished for something like this for a long time, so it's great that it has finally been attempted. Moreover, the TreeMix software is available for anyone who wants to play with it.


If my CPUs were not already on fire between several new projects, I would love to try this right away, but I'm sure that I will get around to it before too long.

Inference of population splits and mixtures from genome-wide allele frequency data

Joseph K. Pickrell and Jonathan K. Pritchard

Many aspects of the historical relationships between populations in a species are reflected in genetic data. Inferring these relationships from genetic data, however, remains a challenging task. In this paper, we present a statistical model for inferring the patterns of population splits and mixtures in multiple populations. In this model, the sampled populations in a species are related to their common ancestor through a graph of ancestral populations. Using genome-wide allele frequency data and a Gaussian approximation to genetic drift, we infer the structure of this graph. We applied this method to a set of 55 human populations and a set of 82 dog breeds and wild canids. In both species, we show that a simple bifurcating tree does not fully describe the data; in contrast, we infer many migration events. While some of the migration events that we find have been detected previously, many have not. For example, in the human data we infer that Cambodians trace approximately 16% of their ancestry to a population ancestral to other extant East Asian populations. In the dog data, we infer that both the boxer and basenji trace a considerable fraction of their ancestry (9% and 25%, respectively) to wolves subsequent to domestication, and that East Asian toy breeds (the Shih Tzu and the Pekingese) result from admixture between modern toy breeds and "ancient" Asian breeds. Software implementing the model described here, called TreeMix, is available at http://treemix.googlecode.com.
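To make the idea in the abstract a little more concrete, here is a rough toy sketch of what "a tree plus a migration edge" implies under a Gaussian view of drift. It is not TreeMix's code or its fitting procedure. The tree, the branch names, the drift values and the helper functions are all made up for illustration. The 0.16 weight simply borrows the Cambodian figure quoted above, and the sketch ignores any drift after the mixture event and any scaling by ancestral allele frequency. The only point is that covariance between two tree populations is the drift on the branches they share, and that an admixed population's covariances are weighted blends of its sources' covariances.

```python
from itertools import product

# Hypothetical toy tree (NOT from the paper): the root splits into A and an
# internal node I; I then splits into B and C. Values are assumed drift
# amounts accumulated along each branch.
BRANCHES = {"root->A": 0.03, "root->I": 0.02, "I->B": 0.01, "I->C": 0.04}

# Branches on the path from the root to each tree population.
PATHS = {
    "A": ["root->A"],
    "B": ["root->I", "I->B"],
    "C": ["root->I", "I->C"],
}


def tree_cov(p, q):
    """Covariance of two tree populations: total drift on shared branches."""
    return sum(BRANCHES[b] for b in PATHS[p] if b in PATHS[q])


def mix_weights(pop, admixed):
    """Write a population as a weighted mix of tree populations."""
    return admixed.get(pop, {pop: 1.0})


def graph_cov(p, q, admixed):
    """Covariance when some populations are mixtures (bilinear in weights)."""
    wp, wq = mix_weights(p, admixed), mix_weights(q, admixed)
    return sum(wp[a] * wq[b] * tree_cov(a, b) for a, b in product(wp, wq))


# Assumed admixed population M: 16% from A, 84% from C (illustrative only).
ADMIXED = {"M": {"A": 0.16, "C": 0.84}}

if __name__ == "__main__":
    pops = ["A", "B", "C", "M"]
    for p in pops:
        row = "  ".join(f"{graph_cov(p, q, ADMIXED):.4f}" for q in pops)
        print(f"{p}: {row}")
```

In this toy setup a pure tree would give M zero covariance with A if M sat entirely on the C side. The migration edge is what produces the extra shared covariance, and that kind of excess is the signal a tree alone cannot explain.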

Link

8 comments:

pconroy said...

How should I read the graph??

If I look at the diagram left-right, I see the root is the leftmost vertex - Denisovan/Neanderthal + San...

If I look at it bottom-up, I see the root as the bottommost vertex:
- African + West-European + East-European

Which is correct??

terryt said...

"in the human data we infer that Cambodians trace approximately 16% of their ancestry to a population ancestral to other extant East Asian populations".

Interesting, although the study doesn't include 'Papuans'. They could be the 'population ancestral to other extant East Asian populations'.

Easy772 said...

Wait a second. Wouldn't Papuans and Melanesians have their own tree? I thought Papuans were M carriers, which came after K right? I always assumed they were tropically adapted populations from the "K wave".

eurologist said...

Really nice work. Is there consensus now that most mid-latitude to northern East Asians are derived from Siberians? That is, a continental migration versus a coastal one?

The European - Native American connection will surely lead to some speculation...

Inclusion of the Oceanians surely twists things around a bit, but it is still believable (without them, SW Asians are derived from the same level as are Caucasians and Europeans; with them, Caucasians and Russians are intermediate). Both are probably true, and would need a migration study focused on these populations.

Either way, Cambodians as expected clearly have a very ancient element (and Melanesian admixture).

I wonder if the order Orcadian -> French -> Italian -> (Basque, Sardinian) indicates that the peopling of Europe started from North/Central Europe vs. a Mediterranean route, or indicates later (agriculture and IE) admixture.

Andrew Oh-Willeke said...

Cool! It would not have been obvious that this was even possible.

Easy772 said...

@eurologist

Melanesian or Papuan admixture in Cambodians. Thought ISEA populations had ~3% Melanesian while mainland SEA had ~3% Papuan. I could be wrong though.

So if I'm reading the display correctly, the ancestral Southeast Asian population is the same as the Ancestral South Asian population yes?

eurologist said...

"So if I'm reading the display correctly, the ancestral Southeast Asian population is the same as the Ancestral South Asian population yes?"

One of the authors and Razib seem to think so, but I don't. I think there are many reasons to believe the ancestral SE Asian population goes deeper in time, IMO including pre-Toba. Whereas, the ancestral South Indian population IMO stems from the climatic boundary inherent across India during cold/dry times ~70,000-60,000ya and again after ~45,000ya, but predominantly ~25,000 - 16,000 ya. Of course, the two admixed, but several important differences, including the amount of Denisovan admixture, indicate that until recently, admixture was low.

Unknown said...

"I wonder if the order Orcadian -> French -> Italian -> (Basque, Sardinian) indicates that the peopling of Europe started from North/Central Europe vs. a Mediterranean route, or indicates later (agriculture and IE) admixture."

It's hard to read, but I read the Orcadians as being near the root of the Russian/Adygei branch.

I can digest an Adygei-Beringia connection.

But Russian-Maya? An Orcadian-Maya connection could be later contamination, but Russian-Maya?