July 25, 2012

Khoisan genetic prehistory (Pickrell et al. 2012)

This appears to be the first paper using the specialized Affymetrix chip, which was announced some time ago, and used in some of my previous experiments. The new array has been dubbed "Affymetrix Human Origins array" and has been composed by intersecting panels of SNPs ascertained in individuals from several world populations.

It is of course great to see that this paper has appeared as a preprint in arXiv, and hopefully this is a trend that will continue; biology should be like physics, with papers appearing immediately online for commenting, and not hidden away in authors', editors', and reviewers' drawers for months if not years before they become available to all.

I will highlight some points of particular interest to me:

Some caveats of interpretation here are warranted. First, all the Khoisan populations have some level of admixture with non-Khoisan populations. There is thus no single "split time" in their history, and any method (like the one used here) that estimates a single such time will actually be estimating a composite of several signals. Second, we have made the modeling assumption that history involves populations splitting in two with no gene flow after the split. More complex demographies are quite plausible, but render the interpretation of a split time nearly meaningless (if populations continue to exchange migrants after "splitting", they arguably have not split at all). We thus consider strong interpretations of split times estimated from genetic data to be impossible, but we nonetheless find the estimates to be useful in constraining the set of historical hypotheses that are consistent with the data.

This echoes (somewhat) my sentiments about split times being a tug-of-war in the presence of admixture. Another interesting bit from the paper:

Interestingly, a few of the Khoe-speaking populations have slightly positive f4 statistics in this comparison, and in the Shua the f4 statistic is significantly greater than zero. We speculate that some of the Khoe-speaking populations have a low level of east African ancestry, and that the relevant east African population was itself admixed with a western Eurasian population. The Shua also show a detectable signal of admixture LD, though we estimate the admixture date as much older (44 generations). This signal of east African ancestry specifically in Khoe-speaking populations is of particular interest in the light of the hypothesis that the Khoe-Kwadi languages were brought to southern Africa by a pre-Bantu pastoralist immigration from eastern Africa [Güldemann, 2008].
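
For readers unfamiliar with the f4 statistic mentioned in the quote, here is a minimal sketch of how it is usually defined: the average, over SNPs, of the product of two allele-frequency differences, f4(A, B; C, D) = mean[(a - b)(c - d)]. The population labels A to D and the toy frequencies are placeholders of mine, not the populations or data from the paper, and the significance test (normally a block jackknife over the genome) is left out.

```python
from statistics import mean


def f4(p_a, p_b, p_c, p_d):
    """Return f4(A, B; C, D): the mean over SNPs of (a - b) * (c - d).

    Each argument is a list of allele frequencies (one per SNP) for one
    population. A value near zero is what an unadmixed tree ((A, B), (C, D))
    predicts; a clearly non-zero value hints at gene flow between the pairs.
    """
    if not (len(p_a) == len(p_b) == len(p_c) == len(p_d)):
        raise ValueError("all populations must cover the same SNPs")
    if not p_a:
        raise ValueError("need at least one SNP")
    return mean([(a - b) * (c - d) for a, b, c, d in zip(p_a, p_b, p_c, p_d)])


# Toy frequencies (hypothetical): drift between A and B is correlated with
# drift between C and D, so the statistic comes out positive.
toy = f4([0.5, 0.2], [0.3, 0.4], [0.6, 0.1], [0.4, 0.3])
print(round(toy, 6))  # 0.04
```

A "slightly positive" f4 in this sense simply means that the two frequency differences tend to have the same sign across SNPs more often than a clean tree would allow.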

The authors also announce an improvement on TreeMix:

In the original TreeMix algorithm, one first builds the best-fitting tree of populations. However, this approach is not ideal if there are many admixed populations (as in our application here, where all of the Khoisan populations are admixed). To get around this, we allow for known admixture events to be incorporated into this tree-building step. Imagine that there are several populations that we think a priori might be unadmixed (in our applications, these are the Chimpanzee, Yoruba, Dinka, Europeans, and East Asians). We first build the best tree of these unadmixed populations using the standard TreeMix algorithm. Now assume we have an independent estimate of the admixture level of each Khoisan population, and imagine we know the source population for the mixture.

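
To give a feel for what "incorporating a known admixture event" can mean, here is a small sketch of the underlying linear-mixture intuition. It is not the TreeMix implementation, and the paper's actual procedure may well differ. If an admixed population's allele frequency is alpha * source + (1 - alpha) * ancestral, then knowing alpha and the source lets one solve for the ancestral part. The function name `unmix` and the toy numbers are my own assumptions.

```python
def unmix(p_observed, p_source, alpha):
    """Estimate pre-admixture allele frequencies under a linear mixture model.

    Assumes p_observed = alpha * p_source + (1 - alpha) * p_ancestral at
    every SNP, with alpha (the admixture fraction) and the source population
    known in advance. Results are clipped to [0, 1] because noisy inputs can
    push the algebraic solution slightly out of range.
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must be in [0, 1)")
    if len(p_observed) != len(p_source):
        raise ValueError("populations must cover the same SNPs")
    ancestral = []
    for obs, src in zip(p_observed, p_source):
        estimate = (obs - alpha * src) / (1.0 - alpha)
        ancestral.append(min(1.0, max(0.0, estimate)))
    return ancestral


# Toy example (hypothetical numbers): 25% ancestry from the source population.
observed = [0.30, 0.70]
source = [0.60, 0.40]
print([round(p, 6) for p in unmix(observed, source, 0.25)])  # [0.2, 0.8]
```

The sketch also shows why the choice of "unadmixed" reference populations matters: if the assumed source is itself admixed, the error carries straight through into the corrected frequencies.
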
I don't think that Sub-Saharan African populations can any longer be considered unadmixed. When one uses SNPs ascertained in Eurasian individuals, many Sub-Saharan populations appear symmetrically related to Eurasians, because they lack variation at sites where new polymorphism appeared outside Africa.

This is not, however, the case when one uses SNPs ascertained in African individuals, where a clear pattern of differential affiliation with West Eurasians across the continent is evident. As I have said before, I strongly suspect that this is due to fairly late back-migration of Eurasians into Africa, carrying Y-haplogroup DE chromosomes. Within haplogroup CT, both of its major subclades, CF and DE, are represented in Eurasia, as are D, E, and DE*. In Africa, as far as we know, only DE* and E are native. On balance, the weight of the evidence would suggest a Eurasian origin of the DE-YAP haplogroup.

(I would perhaps be as bold as to extend this to the even more basal clades of the phylogeny, which turn up with surprising regularity in Eurasian datasets and are usually discounted as the result of recent admixture. I'm not so sure; if recent admixture were at fault, then the African signal in Eurasia would be absolutely dominated by E-related lineages, but the A's and B's turn up in quite unexpected places. Are they really all recent Africans, or could they share a much deeper common ancestry? If I had deep pockets, I'd surely invest in genome sequencing the collection of such Eurasian erratics.)

As a parting thought, I hope that the data used in this paper will become publicly available in time, perhaps when the article appears in journal form. True open science depends not only on the public availability of research results, but also on that of the data that produced them.

UPDATE: Here is the ADMIXTURE analysis from the paper (Figure 7):

It would have been nice if the Fst values between ancestral populations had been reported in the paper, and also if an East Eurasian group had been added to the analysis. In any case, there does appear to be a pattern of differential affiliation with the French population (K=2). At K=3 the main Sub-Saharan (blue) component emerges, and a few populations continue to exhibit an excess of West Eurasian affiliation.

arXiv:1207.5552v1 [q-bio.PE]
The genetic prehistory of southern Africa

Joseph K. Pickrell et al.

The hunter-gatherer populations of southern and eastern Africa are known to harbor some of the most ancient human lineages, but their historical relationships are poorly understood. We report data from 22 populations analyzed at over half a million single nucleotide polymorphisms (SNPs), using a genome-wide array designed for studies of history. The southern Africans, here called Khoisan, fall into two groups, loosely corresponding to the northwestern and southeastern Kalahari, which we show separated within the last 30,000 years. All individuals derive at least a few percent of their genomes from admixture with non-Khoisan populations that began 1,200 years ago. In addition, the Hadza, an east African hunter-gatherer population that speaks a language with click consonants, derive about a quarter of their ancestry from admixture with a population related to the Khoisan, implying an ancient genetic link between southern and eastern Africa.



Unknown said...

The following PhD thesis is available through open access and covers some of the same populations:
Genetic variation in Khoisan-speaking populations from southern Africa
Schlebusch, Carina Maria

German Dziebel said...

As Fig. 3 shows, when corrected for admixture, Khoisans stop being "basal" to other human populations. Also, the high levels of admixture in Hadza suggest that linkage disequilibrium and admixture may indeed be correlated. More at http://anthropogenesis.kinshipstudies.org/2012/07/khoisans-are-genetically-admixed-and-not-basal-to-other-humans-hadza-are-recently-admixed/

terryt said...

"but the A's and B's turn up in quite unexpected places. Are they really all recent Africans, or could they share a much deeper common ancestry?"

That's an interesting thought.