August 19, 2012

Raising a peace banner in the Neandertal Wars

The two camps in the Second Neandertal Wars (*) have assumed maximalist positions on opposing sides of the argument: "African structure explains it!" vs. "Neandertal admixture explains it!". Armed with the Vindija genome, that marvel of technological ingenuity, and a suite of impressive statistical models, the two sides have reached completely opposing conclusions.

In order to formulate my own position, I decided to do what I love best, i.e., to look at the data for myself. My main idea is that the signals of Neandertal and Denisova admixture as measured by these quantities (D-statistics) ...

D(Pop1, Yoruba, Neandertal, Chimp)
D(Pop1, Yoruba, Denisova, Chimp)

... will vary across SNP ascertainment panels. SNPs ascertained in Africans may carry an excess of Palaeoafrican alleles; SNPs ascertained in Neandertal-admixed populations will carry an excess of Neandertal alleles; and SNPs ascertained in Denisova-admixed populations will carry an excess of Denisova alleles. If a population has admixture from hominin X, this admixture, as measured by the D-statistic, will tend to be inflated in panels that contain alleles introgressed from X, and suppressed in panels that lack them.
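A D-statistic of the form D(Pop1, Yoruba, Neandertal, Chimp) can be computed from per-SNP allele frequencies. The Python sketch below assumes the ADMIXTOOLS-style estimator described by Patterson et al. (2012), in which a positive value means Pop1 shares more alleles with the archaic hominin than Yoruba does; the post does not spell out its exact formula or software, and `d_statistic` and its `sites` argument are hypothetical names. Restricting `sites` to the SNPs of one ascertainment panel is what allows the same populations to give different values on different panels.

```python
def d_statistic(w, x, y, z, sites=None):
    """D(W, X; Y, Z) from per-SNP allele frequencies in [0, 1].

    Assumes the Patterson et al. (2012) form:
        D = sum (w - x)(y - z) / sum (w + x - 2wx)(y + z - 2yz)
    A positive value means W shares more alleles with Y than X does.
    `sites` optionally restricts both sums to one ascertainment panel.
    """
    if sites is None:
        sites = range(len(w))
    num = den = 0.0
    for i in sites:
        num += (w[i] - x[i]) * (y[i] - z[i])
        den += (w[i] + x[i] - 2 * w[i] * x[i]) * (y[i] + z[i] - 2 * y[i] * z[i])
    # With no informative sites in the panel the statistic is undefined.
    return num / den if den else float("nan")
```

With invented toy frequencies (Pop1, Yoruba, Neandertal, Chimp at six SNPs), the same four groups give D = 1 on a panel of three sites and D = 0 on a different two-site panel, which is the panel dependence described above in miniature.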

The effect of ascertainment on archaic admixture was addressed by Skoglund and Jakobsson (2011). My aim is different: rather than asking how ascertainment biases admixture estimates, I want to exploit the observation of the preceding paragraph (that Palaeoafrican, Neandertal, or Denisovan SNPs will lurk at different rates when ascertained in different individuals) to see what it tells us about human differences.

The signal of "archaic admixture" may be generated by genuine archaic admixture in one population (e.g., Eurasians), making it more similar to the archaic group (e.g., Neandertals), or by archaic admixture of a different sort in another population (e.g., Africans), making that population less similar to the archaic group. Both processes may be at work, with different intensities in different populations and at different times.
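The two scenarios can be seen in the numerator of the D-statistic. Assuming the Patterson et al. (2012) estimator (an assumption; the post does not name its formula), with per-SNP allele frequencies w_i, x_i, y_i, z_i for Pop1, Yoruba, the archaic hominin, and Chimp, the numerator splits exactly into two shared-drift terms:

$$\sum_i (w_i - x_i)(y_i - z_i) \;=\; \sum_i (w_i - z_i)(y_i - z_i) \;-\; \sum_i (x_i - z_i)(y_i - z_i)$$

The first term grows if Pop1 received archaic admixture; the second shrinks if Yoruba carry ancestry from a deeply diverged African lineage that shares little drift with the archaic group. Either change raises the numerator and hence tends to raise D, so a positive D on its own is compatible with either process, or both.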

I used the Harvard HGDP set, which contains 12 SNP panels, each of which has been ascertained in two chromosomes of a single individual. These panels are:
San, Yoruba, Mbuti, French, Sardinian, Han, Cambodian, Mongolian, Karitiana, Papuan1, Papuan2, Melanesian
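Each panel keeps only the sites found heterozygous when the two chromosomes of one discovery individual are compared (the Keinan et al. 2007 design quoted in the comments). A minimal Python sketch of that filter, assuming phased 0/1 haplotypes for each discovery individual; `ascertain_panel` and `build_panels` are hypothetical names, and the sequencing false positives and false negatives mentioned in that quote are ignored here.

```python
def ascertain_panel(hap_a, hap_b):
    """Indices of SNPs where the discovery individual's two chromosomes differ.

    A site enters the panel only if the individual is heterozygous there,
    so the panel reflects that individual's ancestry, not global allele frequency.
    """
    return [i for i, (a, b) in enumerate(zip(hap_a, hap_b)) if a != b]


def build_panels(discovery):
    """Map each discovery individual's label to its ascertained SNP indices.

    `discovery` is a dict: label -> (haplotype_a, haplotype_b), each a list of 0/1.
    """
    return {name: ascertain_panel(a, b) for name, (a, b) in discovery.items()}
```

Each resulting index list can then serve as the set of sites over which the D-statistics for all populations are computed, one panel at a time.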
For all HGDP populations, as well as the two archaic hominins, I calculated the D-statistic relative to Neandertal and to Denisova on each of the 12 panels. I then used MCLUST to infer the number of distinct clusters from these statistics. In the optimal solution, MCLUST inferred 7 clusters, with each archaic hominin getting its own cluster, while the modern human populations were assigned to 5 clusters corresponding to the five major human races recognized by traditional physical anthropology (Mongoloid, Negroid, Australoid, Capoid, and Caucasoid).
Note that these are not admixture proportions, but assignment probabilities! All populations fell into their expected clusters. The populations from Pakistan, which are believed to be predominantly Caucasoid with varying degrees of minor Ancestral South Indian admixture, were assigned to the Caucasoid cluster, as were the Mozabite Berbers, a Caucasoid population with minority Negroid admixture. Finally, among the Central Asian populations, the Hazara of Pakistan showed mixed affiliation with the Caucasoid and Mongoloid clusters, while the Uygur were assigned to the Mongoloid cluster.
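MCLUST is an R package that fits Gaussian mixtures under several covariance structures and chooses the number of clusters by BIC, reported as 2·logL − p·log n (larger is better). The Python sketch below is only a stand-in for that idea, not MCLUST itself: it fits a single diagonal-covariance mixture by EM for k = 1..k_max and keeps the k with the highest BIC. The input matrix is assumed to have one row per population (or archaic hominin) and one column per D-statistic; the function names, the farthest-point initialisation, and the variance floor are illustrative choices of mine. The returned responsibilities are per-population assignment probabilities, not admixture proportions.

```python
import numpy as np


def _log_joint(X, weights, means, var):
    """log(weight_k * N(x_i | mean_k, diag(var_k))) for every row i and component k."""
    diff2 = (X[:, None, :] - means[None, :, :]) ** 2             # shape (n, k, d)
    log_pdf = -0.5 * (diff2 / var[None] + np.log(2 * np.pi * var[None])).sum(axis=2)
    return log_pdf + np.log(weights)[None, :]                   # shape (n, k)


def _farthest_point_init(X, k):
    """Deterministic start: row 0, then repeatedly the row farthest from those chosen."""
    chosen = [0]
    for _ in range(1, k):
        d2 = ((X[:, None, :] - X[chosen][None, :, :]) ** 2).sum(axis=2).min(axis=1)
        chosen.append(int(d2.argmax()))
    return X[chosen].copy()


def fit_diag_gmm(X, k, n_iter=200, var_floor=1e-3):
    """EM for a k-component Gaussian mixture with diagonal covariances.

    Returns (log-likelihood, responsibilities of shape (n, k), component means).
    """
    X = np.asarray(X, dtype=float)
    n, _ = X.shape
    means = _farthest_point_init(X, k)
    var = np.tile(X.var(axis=0) + var_floor, (k, 1))
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each row.
        lj = _log_joint(X, weights, means, var)
        resp = np.exp(lj - np.logaddexp.reduce(lj, axis=1, keepdims=True))
        # M-step: re-estimate weights, means and floored diagonal variances.
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / n
        means = (resp.T @ X) / nk[:, None]
        var = np.maximum((resp.T @ X**2) / nk[:, None] - means**2, var_floor)
    lj = _log_joint(X, weights, means, var)
    log_norm = np.logaddexp.reduce(lj, axis=1, keepdims=True)
    return float(log_norm.sum()), np.exp(lj - log_norm), means


def select_by_bic(X, k_max):
    """Fit k = 1..k_max and keep the fit with the largest BIC = 2 logL - p log n."""
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    best = None
    for k in range(1, k_max + 1):
        loglik, resp, means = fit_diag_gmm(X, k)
        n_params = 2 * k * d + (k - 1)   # means + variances + free mixing weights
        bic = 2 * loglik - n_params * np.log(n)
        if best is None or bic > best["bic"]:
            best = {"k": k, "bic": bic, "resp": resp, "means": means}
    return best
```

The `means` entry of the chosen fit plays the role of the per-cluster D-statistic means that MCLUST also reports; on real data, with many statistics per row and only a few dozen rows, MCLUST's more constrained covariance models matter more than in this sketch.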

It is noteworthy that, by exploiting patterns of relationship between modern and regional archaic humans, we have managed to recreate the major human groups. This is, perhaps, supportive of those who have argued that a degree of regional continuity across the Old World, and not only recent post-Out-of-Africa genetic divergence, is responsible for present-day inter-population differences.

MCLUST also reported the mean D-statistics for each of the 7 inferred clusters. Remember that each statistic compares a population Pop1 with Yoruba relative to an archaic hominin (Neandertal or Denisova), on each of the 12 ascertainment panels:


There are wonderful patterns to be discovered here. You can look at the data for yourselves; that's the open science thing to do.

All our ideas about human origins are conditioned on the availability of genomes from two archaic Eurasian hominins, and the lack of genomes of similar age from Africa.

But, remember:
  • You can fit Europe, China, India, and the US into Africa, with room to spare. 
  • If Vindija and Denisova, two caves less than 5,000 km apart, were home to people more divergent from each other than any two humans are today, it's strange to think that only "modern humans" inhabited Africa at the same time. 
  • The maximum genetic distance between living Africans is much higher than the maximum distance between living Eurasians: Africa is much more diverse than Eurasia. It's simpler to assume that the same relative pattern was true during the Middle Stone Age. The palaeoanthropology seems to support this, showing archaic forms present even during the terminal Pleistocene in Africa.
  • If modern humans did interbreed with 2/2 archaic humans whose sequences we possess, it's strange to think that they somehow shunned the African Others.
In view of the above, I humbly raise my peace banner in the Neandertal Wars, and declare that it isn't either-or: it's both!

(*) The First Neandertal Wars were fought decades ago by anthropologists working with calipers and magnifying lenses. Their outcome was to relegate Neandertals from the enviable position of our likely ancestors to that of an irrelevant sidekick, although a non-negligible minority continued an insurgency against the Out-of-Africa-only victors.

5 comments:

Unknown said...

Dienekes: I don't think you can use the "Harvard" HGDP panels as if there were no ascertainment. One would need to take this ascertainment into account before doing any sensible analysis on this data set.
This is however not a trivial matter...

I'm afraid it shows the limits of "open Science"...

Dienekes said...

The advantage of using the marker sets of the Harvard HGDP panel is (quoting the technical document associated with it):

"(Panels 1-12) Discovery of heterozygous sites within 12 individuals of known ancestry
The first 12 SNP ascertainment strategies are based on the idea of the Keinan, Mullikin et al. Nature Genetics 2007 paper. That paper takes advantage of the fact that by discovering SNPs in a comparison of two chromosomes from the same individual of known ancestry, and then genotyping in a larger panel of samples from the same population, one can learn about history in a way that is not affected by the frequency of the SNP in human populations. In particular, even though we may miss a substantial proportion of real SNPs in the individual (false-negatives), and even if a substantial proportion of discovered SNPs are false-positives, we expect that the inferences about history using SNPs discovered in this way will be as accurate as what would be obtained using SNPs identified from deep sequencing with perfect readout of alleles."

It is clear that when one uses different ascertainment panels, as I have done in this post, interesting patterns emerge.

Now, these patterns may depend on population history, but they are certainly interesting. For example, why does the signal of archaic admixture in Australoids become so pronounced with an Amerindian ascertainment, even higher than in Amerindians with an Amerindian ascertainment, or in Australoids with an Australasian ascertainment?

I'm afraid it shows the limits of "open Science"...

I think the open science thing to do is to put the experiment out there. It can be ignored, or someone might see something interesting in the patterns. Even if the probability of it being useful at all is low, it certainly doesn't hurt anyone; it only costs my own time, which I'm perfectly willing to sacrifice toying around with data :)

Unknown said...

It is not as simple as just having fewer SNPs than you would have without ascertainment.
The whole (multidimensional) site frequency spectrum is biased in a complex way. So allele frequencies within populations are biased, as is any index of population differentiation, not to mention any PCA done on the raw data...

Dienekes said...

It is not as simple as just having fewer SNPs than you would have without ascertainment.

I never said it was, which is why I pointed to the paper by Skoglund and Jakobsson, which showed how the signal of archaic admixture might be biased at the periphery of the expansion of modern humans.

On the other hand, there are interesting patterns in the data. For example, under the French ascertainment, the signal of Neandertal admixture for Caucasoids diminishes relative to the African ascertainments. Given that Caucasoids represent a bottlenecked OoA population AND that the French ascertainment ought to carry more Neandertal alleles, this is certainly a curious finding. The loss of Palaeoafrican alleles under a non-African ascertainment seems like a good explanation for this pattern, so here is a case where African structure may in fact explain some of the D-statistic differences between Europe and Africa.

Another case is the aforementioned one of Australoids under an Amerindian or Han ascertainment, which makes the statistics shoot up, while the reverse (Mongoloids under an Australoid ascertainment) makes them drop.

I don't pretend to know what goes on here, but hopefully someone smarter and more knowledgeable than I am may find the answer.

terryt said...

"In view of the above, I humbly raise my peace banner in the Neandertal Wars, and declare that it isn't either-or: it's both!"

That is probably correct.

"Africa is much more diverse than Eurasia. It's simpler to assume that the same relative pattern was true during the Middle Stone Age".

With H. erectus (or something similar) having evolved in Africa, I think it is safe to assume that such diversity goes back that far, way beyond the MSA.

"If modern humans did interbreed with 2/2 archaic humans whose sequences we possess, it's strange to think that they somehow shunned the African Others".

Perhaps they did not have the opportunity before the subset left Africa and entered the Levant/Arabia.

"the signal of archaic admixture might be biased at the periphery of the expansion of modern humans".

It has long been recognised that the species or subspecies most different to the average for that group tend to be found at the geographic margins.

"In the optimal solution, MCLUST inferred 7 clusters, with each archaic hominin getting its own cluster, while the modern human populations were assigned to 5 clusters corresponding to five major human races recognized by traditional physical anthropology (Mongoloid, Negroid, Australoid, Capoid, and Caucasoid)".

Those five modern clusters are found at the geographic margins of the human species' range before the expansion into Northwest Europe, Australia and America. The same five clusters are what I used in my conception of the human star:

http://humanevolutionontrial.blogspot.co.nz/2009/06/human-evolution-on-trial-human-star.html

You may find the essay interesting.