March 10, 2012

Trying TreeMix on HapMap populations

I gave TreeMix a try on samples of 30 individuals from all the HapMap-3 populations. The following was done with ~260k SNPs and default parameters. I specified -m 4, that is, I allowed for 4 migration events, and specified YRI as the outgroup to produce this result.
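For anyone wanting to try a similar run, here is a minimal sketch of the setup, not the exact pipeline used for this post. It assumes the usual TreeMix input layout (a header line of population names, then one line per SNP with comma-separated allele counts per population, gzipped) and the `-i`, `-m`, `-root` and `-o` options. The file names, helper functions and toy allele counts are hypothetical and only illustrate the format.

```python
import gzip
from typing import List, Sequence, Tuple

# (count of allele 1, count of allele 2) for one population at one SNP
AlleleCounts = Tuple[int, int]


def format_treemix_lines(pops: Sequence[str],
                         snps: Sequence[Sequence[AlleleCounts]]) -> List[str]:
    """Return the text lines of a TreeMix-style input file."""
    lines = [" ".join(pops)]  # header: population names
    for row in snps:
        if len(row) != len(pops):
            raise ValueError("each SNP needs one count pair per population")
        lines.append(" ".join(f"{a},{b}" for a, b in row))
    return lines


def write_treemix_input(path: str, pops: Sequence[str],
                        snps: Sequence[Sequence[AlleleCounts]]) -> None:
    """Write the lines to a gzipped file (path is hypothetical)."""
    with gzip.open(path, "wt") as handle:
        handle.write("\n".join(format_treemix_lines(pops, snps)) + "\n")


def treemix_command(input_path: str, out_stem: str,
                    migrations: int, root: str) -> List[str]:
    """Build (but do not run) a TreeMix command line."""
    return ["treemix", "-i", input_path, "-m", str(migrations),
            "-root", root, "-o", out_stem]


# Toy data: 3 populations, 2 SNPs, 30 individuals (60 chromosomes) each.
# These counts are made up for illustration.
pops = ["YRI", "CEU", "JPT"]
snps = [[(60, 0), (41, 19), (12, 48)],
        [(33, 27), (58, 2), (60, 0)]]

for line in format_treemix_lines(pops, snps):
    print(line)

# Assumed file names; -m 4 and YRI as the root mirror the run described above.
print(" ".join(treemix_command("hapmap3.treemix.gz", "hapmap3_m4", 4, "YRI")))
```

Building the command as a list rather than a single string makes it easy to hand to `subprocess.run` later without worrying about shell quoting.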

I'll have to explore the intricacies of the visualization parameters a little further, and see how to extract some raw numbers from the output files, but overall, the results seem quite reasonable:

If I'm interpreting the figure correctly:

  • Drift increases progressively (left-to-right) from Africa to Mexico (the only Native American-admixed population)
  • Mexicans are related to East Asian populations (CHD, CHB and especially JPT) and have a strong migration edge from a West Eurasian source (related to CEU and TSI)
  • African Americans (ASW) have a similar (but weaker) edge from West Eurasians
  • Maasai (MKK) have an even weaker edge from what appears to be a southern Caucasoid population (represented here by Tuscans TSI)
  • Gujarati Indians (GIH) are intermediate between East Asians and Europeans, and also have input from a West Eurasian-type population.


This seems like a neat tool for exploring human (and canid ;-) population history.

6 comments:

eurologist said...

Dienekes,

Good to see that you can use this new technique.

Given that Pickrell et al. didn't run enough migration events to show any European details, I think it would be really interesting to concentrate on Europe. Pick some of the major European groups you have (perhaps exclude Tuscans for simplicity), SW Asians, Caucasus/West Asians, Pakistan and perhaps NW India, but avoid as much known East Asian, Central Asian, and African admixture as possible to keep it simple.

If it works out, you could perhaps see some interesting ordering in Europe (what major tree does the algorithm pick, and in what order? What are the major migrations/admixtures?)!

Matt said...

Any plans to try using your Admixture zombie populations together with real populations in this kind of analysis?

Unless that would be redundant/misleading...

Dienekes said...

I've already tried it with ADMIXTURE components, and it seems to work fine, e.g., it finds that the Northwest_African component is hybrid from a Southern Caucasoid and African component.

One could, in principle, mix real with zombie populations, and perhaps I'll try that in the future as well.

Andrés said...

Dienekes, this new visualization is no good - trying to fix something that is inherently wrong (phylogenetic trees for human DNA data).

Human DNA is high-dimensional information. Things like PCA reduce the data to fewer dimensions, but the reduced complexity is paid for with less accuracy. Phylogenetic trees are the wrong way to depict the origin and development of populations, and TreeMix is building on bad foundations.

Also, the fact that you pick a random value for m out of the blue makes the whole thing useless for serious science - in any case, it should be the program that finds the number of migrations that best explains the data. The same goes for picking a random k in clustering algorithms without any serious basis.

Dienekes said...

Also, the fact that you pick a random value for m out of the blue makes the whole thing useless for serious science

The program itself picks the number of migration events; 4 is the maximum number allowed. It's "serious science" to have parameters input into programs, because the alternative (fitting every possible parameter with every possible value) may be desirable theoretically but is impossible practically in a universe where computers have finite memory and processing power.

Dienekes, this new visualization is no good - trying to fix something that is inherently wrong (phylogenetic trees for human DNA data).

Human DNA is high-dimensional information.


DNA evolves treelike, whether it is "high-dimensional information" or not. Homologous sequences share precisely one ancestor.

Populations don't evolve treelike, but trees are often good representations of their evolution, and trees with migration edges are even better.

Unknown said...

"Notice also, that it appears that the North_European input into Siberian precedes the Atlantic_Med input into North_European. So, this is consistent with an eastern origin of North_European which absorbed Atlantic_Med/Oetzi-like populations in Europe and contributed to the East Eurasian native population in Siberia."

IMO this is consistent with the North_European/Siberian admixture preceding the North_European/Atlantic_Med admixture in time. The first mix takes place lower down the North_European branch; the population moves along the branch with time. I agree this is the steppe dwellers (Gravettian, 22-32k years ago). The admixture with Atlantic_Med happened later, to a population that had moved further away from the root.

The indigenous North African component is interesting. I assume that the strength of the migration is reflected in how easily it manifests? This would then be the biggest mix: a very big movement from South West Asia into an indigenous North African population. I suppose that is possible.