September 21, 2012

Complex origins and natural selection of the Khoe-San

The Khoe-San were recently made the object of a study by Pickrell et al. in a paper that was posted on arXiv and ought to appear in journal form in the near future. Good things come in pairs, so on the heels of that study, a new paper in Science by Schlebusch et al. deals with a similar set of populations. The former paper used the Affymetrix Human Origins Array, which contains sets of SNPs ascertained in different individuals from around the world, and the dataset will be comparable to the HGDP set genotyped on the same chip. The current study uses the Illumina Omni 2.5, which would make its data comparable to the 1000 Genomes data, as well as to a variety of other data genotyped on Illumina platforms. So, from the data perspective, I would say that the two nicely complement each other.

There is an abundance of good stuff in the 176 pages of supplementary material, which is freely available on the Science website.

One important technical proposition in the paper is the use of a concordance ratio. As I understand it, this is based on the idea that when populations split, initially the signal that they did so is very weak, and becomes stronger with more time (and drift). So, by taking the ratio of concordant minus discordant alleles over concordant plus discordant ones, they can show support for a topology and estimate population split times.
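The ratio described above can be sketched in a few lines. This is only an illustration of the verbal description (concordant minus discordant over concordant plus discordant); how the paper actually classifies sites as concordant or discordant is spelled out in its supplement and is not reproduced here. The function name and the counts are made up for the example.

```python
def concordance_ratio(concordant: int, discordant: int) -> float:
    """Return (C - D) / (C + D) for counts of concordant and discordant sites.

    Hypothetical helper: it encodes only the ratio as described in the text,
    not the paper's rules for deciding which sites are concordant.
    """
    total = concordant + discordant
    if total == 0:
        raise ValueError("need at least one informative site")
    return (concordant - discordant) / total


# Invented counts, purely illustrative (not taken from the paper).
# A recent split leaves nearly as many discordant as concordant sites,
# so the ratio sits near zero; a deeper split pushes it toward one.
recent_split = concordance_ratio(concordant=520, discordant=480)
old_split = concordance_ratio(concordant=800, discordant=200)

print(f"recent split: {recent_split:.2f}")
print(f"old split:    {old_split:.2f}")
```

A ratio near zero means the data barely favor one topology over the alternative, which is what one expects shortly after a split.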

Of course, Khoe-San populations cannot really be seen as having split from the rest of mankind at some particular time. Pickrell et al. make this case on the basis of admixture LD in even the most "unadmixed" populations (such as HGDP San), but the most obvious reason why the simple split scenario cannot be true is that the Khoe-San possess a substantial percentage of Y-haplogroup E. This links them to other Sub-Saharan Africans, and even Eurasians, within a time frame of ~50ka at most, and probably much less, since they carry derived sublineages within E that were founded much more recently.

Nonetheless, this admixture was probably not so great as to destroy the evidence of isolation, and the authors give an estimate of ~100ka for the split:

This division forms the deepest divergence among extant humans (Fig. 2A, S32) and, assuming an effective population size (Ne) of 21,000 individuals (11, 12), the maximum likelihood divergence time is Ts = 0.083 × 2Ne generations (95% ML CI: 0.075-0.091) corresponding to ∼100,000 years ago (14), in agreement with previous estimates of 110,000-160,000 years ago (11, 12).
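As a rough check on the arithmetic in the quote, 0.083 × 2 × 21,000 comes to about 3,486 generations. The quoted passage does not state the generation time used to convert this to years, so the sketch below treats it as an assumed parameter and tries a few values; the function names are hypothetical.

```python
def split_generations(ts_scaled: float, ne: float) -> float:
    """Convert a split time in units of 2*Ne generations into generations."""
    return ts_scaled * 2 * ne


def split_time_years(ts_scaled: float, ne: float, generation_years: float) -> float:
    """Convert a scaled split time into years for an assumed generation time."""
    return split_generations(ts_scaled, ne) * generation_years


NE = 21_000                      # effective population size from the quote
TS_ML = 0.083                    # maximum likelihood estimate from the quote
TS_CI = (0.075, 0.091)           # 95% interval from the quote

# Generation times are assumptions; the quoted text does not give one.
for generation_years in (25, 29, 30):
    point = split_time_years(TS_ML, NE, generation_years)
    low = split_time_years(TS_CI[0], NE, generation_years)
    high = split_time_years(TS_CI[1], NE, generation_years)
    print(f"{generation_years} y/gen: {point:,.0f} years ({low:,.0f} to {high:,.0f})")
```

A generation time close to 29 years reproduces the ~100,000-year figure, while 25 years gives a somewhat younger date.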

But this estimate disagrees with the idea that the Khoe-San split off 250-300 thousand years ago, which has been advanced on the basis of the slower autosomal mutation rate. Many of the news headlines talk about the paper showing that the Khoe-San diverged before Out-of-Africa, but, using the new slow mutation rate, a date of 100ka is actually around the time of, or even after, Out-of-Africa, which now appears to have taken place twice as early as previously thought.

Thankfully, Schlebusch et al. do not only give absolute age estimates, but also express their age estimates in terms of the effective population size. But the effective population size is indirectly linked to the autosomal mutation rate, as I noted in my review of Gronau et al. and Veeramah et al., i.e. the two papers cited for the effective population size of 21,000 individuals. In order to generate the same amount of genetic divergence, a slower mutation rate requires a higher population size. Ergo, I don't think the estimates of Schlebusch et al. are discordant with those of Scally and Durbin, and the two may harmonize once effective population sizes are re-calculated on the basis of the slow human autosomal mutation rate.
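To make the rescaling argument concrete, here is a minimal sketch. It assumes the observed diversity (theta = 4 × Ne × mu) is held fixed, so that Ne scales inversely with the mutation rate, and that the scaled split time of 0.083 is unchanged. The fast rate of 2.0 × 10^-8 per site per generation is the value a commenter below reports for Gronau et al.; the slow rate is simply an illustrative halving, and the 29-year generation time is likewise an assumption.

```python
def rescale_ne(ne_old: float, mu_old: float, mu_new: float) -> float:
    """Rescale Ne to a new mutation rate, holding theta = 4*Ne*mu fixed."""
    return ne_old * mu_old / mu_new


def split_time_years(ts_scaled: float, ne: float, generation_years: float) -> float:
    """Convert a split time in units of 2*Ne generations into years."""
    return ts_scaled * 2 * ne * generation_years


TS_ML = 0.083        # scaled split time from Schlebusch et al.
NE_FAST = 21_000     # Ne under the faster mutation rate
MU_FAST = 2.0e-8     # per site per generation, as reported in the comments
MU_SLOW = 1.0e-8     # illustrative halving, not a published estimate
GEN_YEARS = 29       # assumed generation time

ne_slow = rescale_ne(NE_FAST, MU_FAST, MU_SLOW)
years_fast = split_time_years(TS_ML, NE_FAST, GEN_YEARS)
years_slow = split_time_years(TS_ML, ne_slow, GEN_YEARS)

print(f"Ne (fast rate): {NE_FAST:,.0f}  split: {years_fast:,.0f} years")
print(f"Ne (slow rate): {ne_slow:,.0f}  split: {years_slow:,.0f} years")
```

Under these assumptions, halving the mutation rate doubles both Ne and the split time in years, which is in line with the ~200,000-year figure Aylwyn Scally gives in the comments.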

The authors do acknowledge the possibility of archaic admixture in Africa. In my opinion, the presence of this admixture can harmonize the evidence of shallow common ancestry with Eurasians and African farmers (e.g., in the form of Y-haplogroup E) with the deep autosomal divergence times.

I am also looking forward to getting the new data when it appears at the Jakobsson lab data page. Together with the HGDP San (on both Affymetrix and Illumina platforms), and the Henn et al. data, there will shortly be no shortage of data on the Khoe-San. And, together with the data from Pagani et al. on Ethiopia it may be a good idea to update my africa9 calculator when I find the time for it.

Science DOI: 10.1126/science.1227721

Genomic Variation in Seven Khoe-San Groups Reveals Adaptation and Complex African History

Carina M. Schlebusch et al.

The history of click-speaking Khoe-San, and African populations in general, remains poorly understood. We genotyped ~2.3 million SNPs in 220 southern Africans and found that the Khoe-San diverged from other populations ≥100,000 years ago, but structure within the Khoe-San dated back to about 35,000 years ago. Genetic variation in various sub-Saharan populations did not localize the origin of modern humans to a single geographic region within Africa; instead, it indicated a history of admixture and stratification. We found evidence of adaptation targeting muscle function and immune response, potential adaptive introgression of UV-light protection, and selection predating modern human diversification involving skeletal and neurological development. These new findings illustrate the importance of African genomic diversity in understanding human evolutionary history.



eurologist said...

The authors state:
The deep divergence between Northern and Southern Khoe-San groups corresponded to 25,000-43,000 years, similar to estimates between West Africans and Eurasians [ref. #11].

This is a pretty long time, but could also be enhanced by different admixture. However, the fact that they compare this time to a ridiculously short West African - Eurasian divergence (from another paper) also makes me very worried about the clock used, here.

eurologist said...

I looked it up:

The current paper's ref #11 (Gronau et al., 2011) uses a human-chimp divergence time of 6.5 Mya, which in their runs corresponds to 2.0 × 10^-8 mutations per generation and site. Their N estimates are based on a generation length of 25 years.

Also, that paper estimates a matching population size of 9,000 - not 21,000 as stated in the current paper, if I read it correctly. Pretty confusing. Perhaps the supplement clarifies this, but I don't have the time to read that, now.

Aylwyn Scally said...

The 21,000 figure comes from the Veeramah et al. paper. And you (and Dienekes) are correct: the calculation on which it is based assumes a higher mutation rate than is measured in present-day humans. (In fact they assumed 6 My of human-chimp sequence divergence.) Rescaling it would result in a Khoe-San divergence here of ~200,000 years. Of course one should reiterate that the relationship between the Khoe-San and other humans is quite possibly more complicated than a simple divergence.