May 14, 2011

The strangeness of the human genome

Here is a little experiment:

Calculate the first principal component of variation between Papuans and Karitiana from Brazil. These are some of the populations most distant from Africa for which genetic data are available. (One Papuan, HGDP00544, is substantially different from the rest and is shifted towards East Asians, so he was removed. All analyses use 613,630 SNPs, with all no-calls removed.)

Project 18 Mbuti Pygmies+San (henceforth Palaeoafricans) and 21 Yoruba from the HGDP-CEPH onto this component.
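The two steps above (find PC1 on the reference pair, then project other samples onto it) can be sketched on toy data. This is only an illustration under stated assumptions: the post does not say which software was used, the genotype rows below are invented 0/1/2 allele counts rather than HGDP data, and PC1 is found here by plain power iteration instead of a dedicated PCA package.

```python
# Toy sketch: PC1 from two reference groups, then projection of other samples.
# All genotypes are invented (0/1/2 allele counts); none of this is HGDP data.

def column_means(rows):
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def center(rows, means):
    return [[x - m for x, m in zip(row, means)] for row in rows]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def first_pc(centered, iters=200):
    """Leading eigenvector of X^T X (X = centered rows) by power iteration."""
    v = [1.0] * len(centered[0])
    for _ in range(iters):
        scores = [dot(row, v) for row in centered]        # X v
        w = [dot(scores, col) for col in zip(*centered)]  # X^T (X v)
        norm = dot(w, w) ** 0.5
        v = [x / norm for x in w]
    return v

ref_a = [[2, 2, 2, 0, 0, 1], [2, 2, 1, 0, 0, 1], [2, 1, 2, 0, 1, 1]]  # stand-in for Papuans
ref_b = [[0, 0, 0, 2, 2, 1], [0, 0, 1, 2, 2, 1], [0, 1, 0, 2, 1, 1]]  # stand-in for Karitiana
test_x = [[1, 1, 1, 1, 1, 1]]  # stand-in for one projected African sample
test_y = [[1, 1, 1, 2, 1, 1]]  # stand-in for another projected African sample

reference = ref_a + ref_b
means = column_means(reference)
pc1 = first_pc(center(reference, means))

def scores(rows):
    # Center with the reference means so the axis is defined by the reference pair only.
    return [dot(row, pc1) for row in center(rows, means)]

a_scores, b_scores = scores(ref_a), scores(ref_b)
x_scores, y_scores = scores(test_x), scores(test_y)
print(a_scores, b_scores, x_scores, y_scores)
```

The projected samples play no part in defining the axis, so any difference between them along PC1 reflects how they relate to the contrast between the two reference groups.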

What do we expect? According to the standard Out-of-Africa model, Palaeoafricans and Yoruba should not differ from each other along the axis on which Amerindians differ from Papuans. If you take into account the Denisovan admixture in Australo-Melanesians, you might expect Africans (who lack this admixture) to be more Amerindian-like (since Amerindians also lack that archaic component). But in neither scenario do you expect Palaeoafricans to differ from Yoruba.

What the data say. Here are the data points along PC1:

green = Papuans
magenta = Karitiana

Here is a blowup of the middle part, showing the African populations:

red = Palaeoafrican
blue = Yoruba

A t-test supports (p < 0.000001) the obvious visual conclusion that Palaeoafricans differ from Yoruba in the same way that Papuans differ from Karitiana. This is quite remarkable: why would Yoruba differ from San/Pygmies in the same way that Amazonians differ from Australo-Melanesians?
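For readers who want to see what such a test involves, here is a minimal sketch of a two-sample t statistic on PC1 scores. The post does not say which variant of the t-test was used, so Welch's unequal-variance form is an assumption here, and the scores are invented purely for illustration. Turning t into a p-value needs the t distribution, which is left out.

```python
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic and approximate degrees of freedom for two samples."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / (va + vb) ** 0.5
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Invented PC1 scores for illustration only; NOT the HGDP values.
group_1 = [-0.0225, -0.0218, -0.0221, -0.0223, -0.0216]
group_2 = [-0.0203, -0.0198, -0.0201, -0.0196, -0.0204]
t, df = welch_t(group_1, group_2)
print(f"t = {t:.2f}, df = {df:.1f}")
```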

The difference is not that great: the Palaeoafrican/Yoruba means are -0.022 and -0.020, and the Papuan/Karitiana ones are -0.167 and 0.206, respectively. Hence, the difference between Palaeoafricans and Yoruba is only about 0.54% of the Papuan-Karitiana distance in this projection (0.002 / 0.373). But it is there, and it points to events in human prehistory not covered by the "standard model".
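The percentage can be reproduced from the four means quoted above. The helper name below is hypothetical; it simply divides the gap between the two projected groups by the length of the reference axis.

```python
def relative_shift(test_from, test_to, ref_from, ref_to):
    """Gap between two projected group means as a fraction of the reference axis length."""
    return (test_to - test_from) / (ref_to - ref_from)

# PC1 means quoted in the post for the Papuan-Karitiana axis.
shift = relative_shift(test_from=-0.022, test_to=-0.020, ref_from=-0.167, ref_to=0.206)
print(f"{100 * shift:.2f}%")  # prints 0.54%
```

The same calculation applies to the She, Tuscan and Onge comparisons further down.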

Discussion

I have long argued that Africa should not be viewed only as a source, but also as a destination of population movements. If Africa was only a source, then there would be absolutely no reason for two different African groups to differ from each other in the same way that two of the most distant (from Africa) groups do. No one can reasonably argue, I think, that Africans had any opportunity for gene flow with either Papuans or Amerindians.

I conjecture that the signal detected here is a legacy of a prehistoric episode of migration of Eurasians into Africa, which affected Yoruba more than it did Palaeoafricans. This population of Eurasians was slightly more similar to Karitiana than to Papuans. We will try to trace its origins next.

Using She instead of Karitiana

I repeat the previous experiment, but I use She, a far eastern ethnic group of China, instead of the Karitiana. Here are the PC1 co-ordinates in this projection:

Yoruba: 0.111, Palaeoafrican: 0.108, Papuan: -0.155, She: 0.248

Hence, Yoruba are shifted by 0.74% on the Papuan-She axis relative to Palaeoafricans.

Using Tuscans instead of She

Using Tuscans the PC1 co-ordinates are:

Yoruba: 0.169, Palaeoafrican: 0.164, Papuan: -0.144, Tuscan: 0.289

Hence, Yoruba are shifted by 1.15% on the Papuan-Tuscan axis relative to Palaeoafricans.

Using Onge instead of Tuscans

Finally, I substituted Onge from the Indian Ocean for Tuscans. This analysis is based on 112,041 SNPs, so it's not directly comparable with the previous ones. Nonetheless:

Yoruba: 0.075, Palaeoafrican: 0.07, Papuan: -0.15, Onge: 0.267

Hence, Yoruba are shifted by 1.2% on the Papuan-Onge axis relative to Palaeoafricans.

Conclusion

In a previous post on McEvoy et al. (2011) I speculated about a possible West Eurasian back-migration into Africa. The results presented here are compatible with that theory, but they are also compatible with a second Out-of-Africa movement; the latter, however, if it happened, did not only affect West Eurasians, but also East Asians and even Amerindians, at least relative to Papuans who may have been more isolated than the rest.

On balance, I prefer a scenario with back-migration:
  1. It is difficult to envision a second Out-of-Africa that reached Brazil in its spread but avoided Papua; moreover, there are no diagnostic uniparental markers of such an event.
  2. It is simpler to think of a movement of Y-haplogroup DE-bearing men a short distance from "somewhere between the Indian Ocean (where the Andamanese live), and East Africa", which would introduce Eurasian-like genes into Sub-Saharan Africa.

UPDATE (May 16): See the interesting discussion on the problem of potential ascertainment bias in the comments. In short: the signal seems to persist in the She and Tuscan comparisons, but not in the Karitiana one. Perhaps this means that whatever event took place postdates the migration of Amerindians into the New World?

16 comments:

terryt said...

"2.It is simpler to think of a movement of Y-haplogroup DE-bearing men a short distance from 'somewhere between the Indian Ocean (where the Andamanese live), and East Africa.' which would introduce Eurasian-like genes into Sub-Saharan Africa".

I'm sure that is so, however I'm still experiencing huge resistance (almost blind rage) from Maju when I have the audacity to suggest such a thing.

Eze said...

The slight shift the Yoruba have towards Eurasians (compared to Paleoafricans, that is) could be explained by the fact that the Paleoafricans split from most of humanity first and hence had more time to diverge from Eurasians, while the proto-Niger-Congo groups split off from the proto-Out-of-Africa group at a later stage.

This scenario would make a secondary Out of Africa migration or a Eurasian back-migration affecting Yoruba redundant.

Onur said...

"Palaeoafricans differ from Yoruba in the same way that Papuans differ from Karitiana."

Using the expression "in the same way" thrice does not make the genetic difference between Palaeoafricans and Yoruba the same as the genetic difference between Papuans and Karitiana. There is a huge difference between the two genetic differences, and the shift of Yoruba towards Eurasians compared to Paleoafricans is slight (less than I expected to see), as can be seen in your PCA plots. I agree with Eze that the slight shift of Yoruba towards Eurasians compared to Paleoafricans can be explained by the early split of Paleoafricans from the rest of humanity.

Dienekes said...

You don't get it: the "early split of Palaeoafricans from the rest of humanity" would not place them in a different position from Yoruba on the Papuan-Karitiana axis, because Papuans/Karitiana are part of the rest of humanity.

Onur said...

But as is clear from the Denisovan paper, Papuans are shifted away from that rest of humanity to some extent probably due to their unique archaic admixture and subsequent isolation from the rest of humanity. This should have an effect on the PCA positions of Paleoafricans and the rest of Sub-Saharans (who very probably split from non-Africans later than Paleoafricans) when they are compared with Papuans.

Dienekes said...

"But as is clear from the Denisovan paper, Papuans are shifted away from that rest of humanity to some extent probably due to their unique archaic admixture and subsequent isolation from the rest of humanity."

The Denisovan admixture is irrelevant in this context. At any locus where Karitiana and Papuans have a different allele because of Denisovan admixture in the latter, Yoruba and Palaeoafricans are both expected to have the Karitiana allele, since Africans lack "Denisovan admixture". Hence, the shift of Yoruba from Palaeoafricans along the Papuan-Karitiana axis is not due to Denisovan admixture in Papuans.

pos said...

Many studies have shown that Papuans appear less close to African pastoralists than expected (e.g. Reich et al. Nat Gen). In the light of the recent evidence, this is most likely explained by the putative ~5% archaic human (Denisova) ancestry in Papuans.

This archaic ancestry could result in a skew towards 'Paleoafricans' compared to Yoruba, partly because non-Africans are more related to Yorubans and more distant to 'paleoafricans'. In that sense, Denisova, for example, is closer to 'paleoafricans' in coalescent terms (if genetic drift is not higher in paleoafricans). Add to this the fact that ascertainment bias favours inclusion of SNPs in which the derived allele is frequent in non-Africans and Yoruba, causing 'paleoafricans' to have allele frequencies more similar to the archaic component of Papuans (more likely to be in the form of ancestral alleles in the loci on SNP chips). Simulations would be appropriate to test this hypothesis.

Dienekes said...

The archaic admixture in Denisovans and the "archaic" (African hunter-gatherer) element in San and Pygmies are not expected to be particularly close. While both are not derived from the main group that comprises the bulk of mankind, they diverged before the emergence of Homo sapiens. In short archaic African _not equals_ archaic Papuan, and hence the results cannot be explained by archaic admixture.

pos said...

Obviously San/pygmy variation does not 'equal' Denisova variation or any other archaic Eurasian population. But what you are looking at here are fine-scale patterns and differences on the order of a percent.

Even if you ignore the fact that the data is biased due to SNP discovery in a panel of Yorubans and non-africans, a population history such as e.g. (Denisova,(San,(Yoruba,(Papuans,Karitiana)))) with gene flow from Denisova into Papuans and probably a high effective size in 'paleoafricans' would result in a shorter distance between Denisova-'paleoafricans' compared to Denisova-Yoruba, measured in units of genetic drift. This applies because you use multiple diploid individuals in the PCA.

I agree that gene flow from West Eurasia to East Africa is likely though. This is quite evident in analyses of HapMap Maasai (e.g. the supplement of the HapMap 2010 paper in Nature).

Dienekes said...

Ok, I'll see if I can play some tricks to avoid ascertainment bias if that is really the problem.

Using only SNPs that are polymorphic in all studied groups oughta do the trick, right?

lars said...

Mr. Dienekes,
I think that kongoid and mongoloid admixture in the caucasoid groups follows a geographical pattern rather than an ethnolinguistic one.
That is, the farther east we go (within the caucasoid groups' realm), the more the mongoloid gradient increases, and the farther south we go, the more the kongoid gradient increases.
Thus it looks as if the mongoloid input in Anatolians is very likely a legacy of a very ancient mongoloid affinity and not the result of the invasion of a few Turk warriors.
And that's why Indo-Iranian India and Iran have more mongoloid input than Anatolia, i.e. simply because they are located farther east (geographically closer to Siberia and East Asia than Anatolia is).
OT
please one of the dear dieneksians comment on the "The case for Euphratic" so I can post the remaining 2 parts of my comment

Onur said...

"Ok, I'll see if I can play some tricks to avoid ascertainment bias if that is really the problem.

Using only SNPs that are polymorphic in all studied groups oughta do the trick, right?"

Great. I'll wait for the results before making further arguments. BTW, I tend to agree with most of Pos's points on this issue.

Dienekes said...

Ok, here goes:

I used --maf 0.01 in every source population, to ensure that only polymorphic sites in all populations were used. Also --geno 0.01 in every source population.

For the Papuan/Karitiana the Yoruba/Palaeoafrican effect disappeared.

For the Papuan/She it persisted (0.99% shift of Yorubans relative to Palaeoafricans on the Papuan-She axis)

It also persisted for the Papuan/Tuscan (1.15% shift).

Onur said...

So it seems a second Out-of-Africa migration is as compatible with the data as a back-migration to Africa. In particular, the shift of Yorubans towards Eurasians decreases with the distance of the Eurasian reference population from Africa (less Yoruba shift towards She than towards Tuscans), and a second Out-of-Africa migration is expected to be clinal, with its impact decreasing with increasing distance from Africa.

pos said...

Cool. Completely getting rid of ascertainment bias is probably near impossible, but excluding monomorphic SNPs is probably a good step. How about maf > 0.1 or something like that? In my opinion the archaic admixture is likely to still factor in, or at least it would be very hard to prove that it does not.

Andres Baldrich said...

Maybe it has some relation to R1b1 Y-DNA in Cameroon?