March 22, 2011

"Neandertal" genes in East Africa

John Hawks writes that different "Neandertal"-derived haplotypes are found in Europe and China. He attributes this to genetic drift after a population of modern humans admixed with Neandertals in West Asia.

As you all know, I've voiced significant reservations about the interpretation of the Neandertal genome data as evidence for Neandertal admixture in Eurasians. So, I decided to pull up an old experiment I had as a draft for ages because it is quite pertinent to the issue.

East Africa is a possible source of information about the issue of "Neandertal" admixture. The populations of the region are complex: they are thought to preserve features of very old Africans, perhaps the earliest Homo sapiens, but they have also been affected by gene flow from Sub-Saharan Africa and West Asia.

If Neandertal admixture occurred in West Asia, then we would not expect East Africans to possess any of it, as Neandertals did not exist in East Africa. At most we would expect them to possess as much of it as could be explained by back-migration from West Asia.
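
The back-migration expectation above can be put as simple arithmetic. This is a rough sketch, assuming the alleles entered East Africa only through West Asian gene flow and that no drift or selection acted afterwards; the function names and the numbers in the example are hypothetical, chosen only for illustration.

```python
def expected_frequency(admixture_fraction, source_frequency):
    """Allele frequency expected in the admixed population if the allele
    arrived only through gene flow from the source population."""
    return admixture_fraction * source_frequency


def required_source_frequency(observed_frequency, admixture_fraction):
    """Frequency the source population would need in order to explain the
    observed frequency by back-migration alone. A result above 1.0 means
    back-migration alone cannot account for the observation."""
    if admixture_fraction <= 0:
        raise ValueError("admixture_fraction must be positive")
    return observed_frequency / admixture_fraction


# Hypothetical illustration: 25% West Asian ancestry and a 4% allele
# frequency in the source population.
example_expected = expected_frequency(0.25, 0.04)

# Hypothetical illustration: a 5% observed frequency with 25% ancestry.
example_required = required_source_frequency(0.05, 0.25)
```

The second function is the more useful one here: it shows how common the alleles would have to be in a West Asian source for back-migration to be a sufficient explanation.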

So, I took the Maasai (MKK) sample from HapMap (r3 b36) and calculated allele frequencies for all SNPs that it shares with the 13 genomic regions of Neandertal admixture from Reich et al. (2010), first described by Green et al. (2010) and available here (xls).

There are 190 SNPs in that file, and 46 of them are in the HapMap data. Fortunately, this includes 9 SNPs on chromosome 5 (from rs17617368 to rs16898552) which span the full length of a 70kbp region attributed to Neandertal admixture (from 28986511 to 29056374).
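
A minimal sketch of this kind of calculation, assuming each individual's genotype at a SNP is a two-letter string such as "AG", with "NN" marking a missing call. The function names and the toy data are hypothetical; this is not the script actually used for the post.

```python
from collections import Counter


def shared_snps(region_snps, panel_snps):
    """Return the SNP IDs present in both lists, in the order of the
    region list."""
    panel = set(panel_snps)
    return [snp for snp in region_snps if snp in panel]


def allele_frequencies(genotypes, missing="NN"):
    """Return {allele: frequency} for one SNP from diploid genotype
    strings, skipping missing calls."""
    counts = Counter()
    for genotype in genotypes:
        if genotype == missing:
            continue  # missing call: contributes no alleles
        counts.update(genotype)  # counts each of the two allele letters
    total = sum(counts.values())
    if total == 0:
        return {}
    return {allele: n / total for allele, n in counts.items()}


# Toy data: five individuals, one of them with a missing call.
toy_genotypes = ["AA", "AG", "AA", "NN", "GG"]
toy_freqs = allele_frequencies(toy_genotypes)

# Toy SNP lists standing in for the region file and the genotyping panel.
toy_overlap = shared_snps(["rs1", "rs2", "rs3"], ["rs3", "rs1"])
```

The minor allele at each SNP is then simply the allele with the lower frequency in the returned dictionary.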

The interesting thing about this region is that 3/45 Asians possess the "Neandertal" alleles while Africans and Europeans (AFR and CEU) do not. So, it is an example of "Neandertal" genes that survived in Asians but not Europeans.

Here is a table of the minor ("Neandertal") allele frequencies in the MKK sample of 156 individuals:

Maasai seem to have some "Neandertal" genes in common with East Asians that are not shared by Europeans.

Admixture of Maasai with East Asians seems unlikely. Thus, there are three possibilities:
  • A recent back-migration of West Asians who possess these alleles
  • A really old back-migration of undifferentiated "Neandertal"-admixed West Asians in which these alleles had not yet been lost by drift
  • Origin of these alleles in the common ancestors of East Africans and Eurasians rather than introgression from Neandertals
I can't exclude the possibility that some recent Caucasoids from West Asia possessed these alleles while CEU do not. I will simply note that HapMap Tuscans (TSI) do not possess them, and neither do 471 Ashkenazi Jews from Bray et al. (2010) who are likely to be of West Asian/European ancestry. Nor do Kurds and Urkarah Dagestanis (from Xing et al. 2010) possess the 2 "Neandertal" alleles at SNPs available in that dataset.

So, I will tentatively exclude the possibility that recent Caucasoid back-migrations brought these alleles to East Africa.

This leaves open two possibilities: that (i) these aren't Neandertal genes at all and were part of the ancestral gene pool of East Africans and Eurasians, or that (ii) they were brought back to Africa by a major early back-migration of undifferentiated Eurasians.

To conclude the post:
  • It's not as simple as Africans vs. non-Africans.
  • Sample more diverse African groups for "Neandertal" genes. Both Green et al. (2010) and Reich et al. (2010) claim that African groups do not differ from each other with respect to Eurasian archaic hominins, which is what you'd expect for admixture that took place in Eurasia. But, they haven't sampled broadly in Africa to make that claim convincingly.

16 comments:

Aswe said...

Intriguing blog post.

I think you should compare these "Neanderthal" SNPs to Behar's Oromos, as well as Arabians. I don't know if those SNPs were genotyped, though. Comparisons with some other East Asian groups would be good too.

Andrew Oh-Willeke said...

The most plausible back migration event for East Africa would be the same event that put Austronesians in Madagascar together with Africans ca. 2000 years ago.

Eze said...

@Oh-Willeke,

That's a very narrow way to look at it. The Horn of Africa is quite close to the Arabian Peninsula and historically the most populated region in Arabia has been the western mountainous side close to Africa. Red Sea levels were also much lower during certain periods in prehistory than they are now, making it easier for people to migrate back and forth.

However, I am not convinced that all of the Eurasian affinity found in modern Eastern Africans is due to Back-to-Africa migrations. A good portion of it is native to this region and can be explained by the Out-of-Africa migration taking place there.

What puts this theory in doubt is that the Hadzabe (an indigenous (that is non-Bantu) hunter-gatherer group from Northern Tanzania) have an incredible dissimilarity with Eurasians. In fact, their cluster had a higher Fst distance from the Eurasian cluster than the South African Khoisan cluster did per Henn et al. 2011.

Strat said...

"The most plausible back migration event for East Africa would be the same event that put Austronesians in Madagascar together with Africans ca. 2000 years ago."

No, it is one of the least plausible and most impossible scenarios. There is no evidence of an Austronesian migration to the African mainland.

pconroy said...

@Andrew,

As someone who has proposed this before, I have to agree with you.

We've yet to see detailed sampling of the islands off the coast of east Africa - like Zanzibar and others - and I wouldn't be surprised if Austronesian DNA turned up.

Dienekes said...

The Maasai have 0% East Asian Y chromosomes, and 0% East Asian admixture in my experiments. I'd say the theory that they have Austronesian admixture has no basis on the available evidence.

German Dziebel said...

"Hadzabe (an indigenous (that is non-Bantu) hunter-gatherer group from Northern Tanzania) have an incredible dissimilarity with Eurasians. In fact, their cluster had a higher Fst distance from the Eurasian cluster than the South African Khoisan cluster did per Henn et al. 2011."

The greatest Fst in Henn 2010 is between Hadza and Europeans (Tuscans), not Eurasians. Asians were not in the sample. After Hadza-Europeans (256) the greatest Fst is between Hadza and South Khoisan (222), which is exceptionally high in view of the relative geographic proximity between Hadza and SAK. Also from Henn Suppl Mat: "when European individuals are included, the largest distance along PC1 occurs between southern KhoeSan and European Tuscans. Eastern African populations, such as the Sandawe and Maasai [Hadza is a close third - G.D.], are the closest African populations to the Europeans, which is consistent with shared variation between these populations, apparent at k= 2 through 6 (Fig. 1)." If we look at Fig. S2 in Henn's Suppl Mat, then East Africans (Maasai, Hadza and Sandawe) are the closest to Europeans of all African populations.

Strat said...

"The greatest Fst in Henn 2010 is between Hadza and Europeans (Tuscans)"

That seems to be enough to infer that Hadza are the genetically most distant African population to Eurasians.

eurologist said...

I totally agree that we need much more sampling within Africa to have a better picture of the scope of its diversity -- and how it relates to Eurasia. Stating the obvious: the most important missing piece of information is what are truly ancient Eurasian elements largely not shared with Africa, and what are instead ancient African contributions (now perhaps marginalized).

That said, in addition to an African origin of old segments, before ~300,000 ya everything points to the notion that gene flow between Europe, west Asia, and North Africa was freely open. Secondly, when contraction occurred after the expansion (caused by extremely favorable climate) ~130,000 - 100,000 ya, it seems likely that some of the contraction direction was back into East Africa.

So, I wouldn't be surprised to find some admixture from both times in Africans - in fact, that admixture may have been crucial in bringing about modern humans. On the flip side, admixture that only matches East Asians better fits the picture of a first migration that reached Asia, with a second one dominating the population up to India, and subsequently Europe (and thus not carrying the variation under discussion).

Unfortunately, with the timing, this admixture still could have occurred in the Middle East 130,000 - 100,000 ya, and it would be difficult to tell the difference from an original African contribution unless large statistics of such segments prove otherwise.

German Dziebel said...

"That seems to be enough to infer that Hadza are the genetically most distant African population to Eurasians."

Strat, this is enough to infer that, according to the Fst statistic, Hadza are the genetically most distant African population to Tuscans. Fst is good for pairwise comparisons but not for the whole dataset. According to PCA, Hadza is one of the three closest populations to Tuscans. And they are all East Africans. Also, Tuscans were included in the study just as a check for recent admixture in South Africa. It wasn't supposed to be a proxy for all of Eurasia.

Strat said...

"Strat, this is enough to infer that, according to the Fst statistic, Hadza are the genetically most distant African population to Tuscans. Fst is good for pairwise comparisons but not for the whole dataset. According to PCA, Hadza is one of the three closest populations to Tuscans. And they are all East Africans. Also, Tuscans were included in the study just as a check for recent admixture in South Africa. It wasn't supposed to be a proxy for all of Eurasia."

You're wrong, Dziebel. Tuscans are a proxy for Eurasians as a whole (and also for other non-Africans) when they are compared with Africans (this is clear from previous genetic studies), and it doesn't matter what intentions the authors had.

Fst is the right tool to use to make between-population genetic distance analyses, this is what Fst is primarily purposed to do. On the other hand, PCA plots, like ADMIXTURE plots, do not have such a primary purpose, and they usually capture only two dimensions (mostly the first two dimensions) and, again like ADMIXTURE plots, their results are shaped by the populations they include, so they can be misleading like ADMIXTURE plots.

German Dziebel said...

"Tuscans are a proxy for Eurasians as a whole (and also for other non-Africans) when they are compared with Africans (this is clear from previous genetic studies)"

References? You probably don't have any...

"Fst is the right tool to use to make between-population genetic distance analyses, this is what Fst is primarily purposed to do. On the other hand, PCA plots, like ADMIXTURE plots, do not have such a primary purpose, and they usually capture only two dimensions (mostly the first two dimensions) and, again like ADMIXTURE plots, their results are shaped by the populations they include, so they can be misleading like ADMIXTURE plots."

For a start, Strat, you have a complete confusion in your head. Read http://www.pnas.org/content/102/44/15942.full for how Fst analysis is done right and how it feeds into a PCA. Henn et al. used Fst to just complement their LD analysis but they didn't do extensive pairwise comparisons between genetically and geographically different populations. The reason they ran Fst is to check if recent admixture messed up their LD regressions. In Table 1 they report Fst at K=8 but apparently they ran Fst for K=14. At K=8 in Table 1, the values of 0.25 for SAK-Tuscan and 0.256 for Hadza-Tuscan are almost identical. I don't know what exactly popped in Fst based on K=14 clusters as the point of origin/largest distance. In Ramachandran above, Hadza wasn't part of the sample, so it's hard to compare and infer. But Henn et al. were trying to downplay East Africa to build a case for SAK to be the source for modern human variation with decreasing Fst, heterozygosity and increasing LD values emanating from South Africa through East Africa to areas outside of Africa. Hadza continued to pose a problem (e.g., in Table 1 high Fst between Hadza and European and Hadza and Sandawe, higher than the corresponding values for SAK), which they explained away as a "recent bottleneck." I questioned the recent bottleneck interpretation here: http://blogs.discovermagazine.com/gnxp/2011/03/population-structure-within-africa/

So why are we trying to solve big problems on the basis of incomplete data? And I already forgot what is your point, Strat, exactly? And, finally, who are you, Strat?

Strat said...

"References? You probably don't have any..."

Actually, it is so obvious that I thought you wouldn't ask for any references, so I didn't give any references. But as you have asked for references, here is a study that includes a worldwide Fst table (Table 1):

http://hmg.oxfordjournals.org/content/early/2009/11/18/hmg.ddp505.full.pdf+html

It is clear from that Fst table that genetic distances of non-Africans to specific Sub-Saharan African populations all follow the same order of genetic closeness: Yoruba-isiXhosa-Bushmen, so Tuscans are indeed a proxy for non-Africans when compared with Sub-Saharan Africans (you can substitute Tuscans with Chinese or any other non- or almost non-Sub-Saharan admixed population).

"For a start, Strat, you have a complete confusion in your head. Read http://www.pnas.org/content/102/44/15942.full for how Fst analysis is done right and how it feeds into a PCA. Henn et al. used Fst to just complement their LD analysis but they didn't do extensive pairwise comparisons between genetically and geographically different populations. The reason they ran Fst is to check if recent admixture messed up their LD regressions. In Table 1 they report Fst at K=8 but apparently they ran Fst for K=14. At K=8 in Table 1, the values of 0.25 for SAK-Tuscan and 0.256 for Hadza-Tuscan are almost identical. I don't know what exactly popped in Fst based on K=14 clusters as the point of origin/largest distance. In Ramachandran above, Hadza wasn't part of the sample, so it's hard to compare and infer. But Henn et al. were trying to downplay East Africa to build a case for SAK to be the source for modern human variation with decreasing Fst, heterozygosity and increasing LD values emanating from South Africa through East Africa to areas outside of Africa. Hadza continued to pose a problem (e.g., in Table 1 high Fst between Hadza and European and Hadza and Sandawe, higher than the corresponding values for SAK), which they explained away as a "recent bottleneck." I questioned the recent bottleneck interpretation here: http://blogs.discovermagazine.com/gnxp/2011/03/population-structure-within-africa/

So why are we trying to solve big problems on the basis of incomplete data? And I already forgot what is your point, Strat, exactly? And, finally, who are you, Strat?"

Nice attempt to downplay the Fst results of Henn et al. 2011 on your part, but the results speak for themselves. I expected a better criticism from you.

In your last link, apparently your interesting debate with the commenter named "onur" was suddenly disrupted for technical reasons. In his/her last comment (41) he/she explicitly states that when comparing the homogenization in Sub-Saharan Africa with the homogenization in the rest of the world he/she had in mind the more populous regions of the world for an apples-to-apples comparison. With regard to his/her conspiracy theory analogy, I don't know what to say.

Lastly, why are you curious about who I am?

German Dziebel said...

"It is clear from that Fst table that genetic distances of non-Africans to specific Sub-Saharan African populations all follow the same order of genetic closeness: Yoruba-isiXhosa-Bushmen, so Tuscans are indeed a proxy for non-Africans when compared with Sub-Saharan Africans (you can substitute Tuscans with Chinese or any other non- or almost non-Sub-Saharan admixed population)"

Thanks for the paper. But we still seem to be seeing and expecting different things. The paper you sent has no Tuscans in it. Then, every pairwise comparison between African and non-African populations yields significantly different values, so I don't see how Utah Europeans (213) can be seen as close to Chinese (245), all representing distance from San. Plus, Amerindians aren't in the sample, and they usually yield the largest distances from African populations.

Yes, there's a progression attested in all studies, whereby San are outliers, while other Africans (usually, a West African and a South African Bantu source) are closer to each other than to San and closer to non-Africans than San. Henn et al. may have found that Hadza are even more divergent than San but far less diverse and have the highest levels of LD in Africa. This puts them closer to non-Africans than San.

"but the results speak for themselves"

Could you tell me what they speak for themselves about, in your opinion, Strat? I explained mine in my conversation with onur.

"With regard to his/her conspiracy theory analogy, I don't know what to say."

Yes, Razib keeps his "tab" open for 2 weeks. Our time was up. Onur had been reading my string with Terry Toohill here on Dienekes, in which I referred to the anthropogenic theory of Pleistocene extinctions as a "conspiracy theory," and tossed it back at me. To it, I would respond that it does seem to be strange that a bunch of anatomically modern skulls supposedly dated in Africa to 200-50K years aren't associated with any specific behavioral adaptation or any specific tool type. As Shea recently wrote, "The absence of modern behaviors unique to Pleistocene Africa is even more odd. For at least 195,000 years, H. sapiens was an African hominin, and for much of that period (150–60 kya), it was exclusively African. There must have been behaviors derived uniquely among African H. sapiens. And yet, to read much of what has been written recently about H. sapiens evolution on that continent, the only such uniquely African pattern of human behavior appears to be the capacity to persist successfully for long periods of geological time without acting like Upper Paleolithic Europeans." And when you know that prior to the late 1980s these skulls were dated to the Late Stone Age, you willy-nilly start wondering if they are indeed that old. The genetics of Hadza suggests that Paleoafricans weren't as diverse as they are portrayed, hence likely derived from outside of Africa.

"Lastly, why are you curious about who I am?"

Usually, I know at least something about people I cross-comment with. You don't even have a Blogger profile.

Strat said...

"Thanks for the paper. But we still seem to be seeing and expecting different things. The paper you sent has no Tuscans in it. Then, every pairwise comparison between African and non-African populations yields significantly different values, so I don't see how Utah Europeans (213) can be seen as close to Chinese (245), all representing distance from San. Plus, Amerindians aren't in the sample, and they usually yield the largest distances from African populations.

Yes, there's a progression attested in all studies, whereby San are outliers, while other Africans (usually, a West African and a South African Bantu source) are closer to each other than to San and closer to non-Africans than San. Henn et al. may have found that Hadza are even more divergent than San but far less diverse and have the highest levels of LD in Africa. This puts them closer to non-Africans than San."

Dziebel, by being proxy, I was referring to the invariance of that progression of Sub-Saharan populations in genetic relationship with different non-African populations. Distances of non-African populations to Sub-Saharans of course vary among them, but the progression remains the same. That means Tuscans, as an ordinary non-African population, are a proxy when it comes to the progression of Sub-Saharan population distances to non-Africans.

"Could you tell me what they speak for themselves about, in your opinion, Strat? I explained mine in my conversation with onur."

I don't want to plunge into a long conversation like you and onur, so I am dropping the subject (that, of course, doesn't mean I have lost the argument; I have said enough on this subject, and I don't debate to "win" an argument; the search for the truth is what matters most to me).

"Usually, I know at least something about people I cross-comment with. You don't even have a Blogger profile."

I have a Blogger profile, but it is invisible. I don't care about the identity of the people I debate.

Strat said...

I am dropping the subject especially because we have slid away from the thread's subject.