April 26, 2011

Sub-Saharan admixture in West Eurasian groups (Moorjani et al. 2011)

Let me preface this by saying that I don't doubt that there exists some Sub-Saharan admixture in some West Eurasian (Caucasoid) groups, and I've quantified the different types of African admixture that can be found in many such groups, most recently here.

However, there are serious methodological flaws in a new paper by Moorjani et al. which render its estimates unreliable. This is unfortunate, as the authors assembled an important dataset, but they only consider a very simplistic model of 2-population admixture which is completely inappropriate for the problem they are studying.

Caucasoids on the Chinese-San axis of variation

Moorjani et al. motivate their study by projecting various West Eurasian groups from Europe and the Near East onto the first principal component of variation defined by CHB (Chinese) and San (Bushmen). The reasoning is the following:
"To study the signal of African gene flow into West Eurasian populations, we began by computing principal components (PCs) using San Bushmen (HGDP-CEPH-San) and East Eurasians (HapMap3 Han Chinese-CHB), and plotted the mean values of the samples from each West Eurasian population onto the first PC, a procedure called 'PCA projection' [17,18]. The choice of San and CHB, which are both diverged from the West Eurasian ancestral populations [19,20], ensures that the patterns in PCA are not affected by genetic drift in West Eurasians that has occurred since their common divergence from East Eurasians and South Africans."
This is indeed a good idea: if some Caucasoid group A has a common ancestral element with Sub-Saharans that is lacking in another Caucasoid group B, then A is expected to be shifted towards the San side of the first PC relative to B. Indeed, this is what the authors observe:
"We observe that many Levantine, Southern European and Jewish populations are shifted towards San compared to Northern Europeans, consistent with African mixture, and motivating formal testing for the presence of African ancestry (Figure 1, Figure S2)."
However, this is clearly a case of seeing the glass half full. The authors prefer the hypothesis that some Caucasoid groups have African ancestry, although the hypothesis that other Caucasoid groups have East Asian ancestry can equally well explain the observed pattern. Indeed, both hypotheses may explain the phenomenon they observe.

For example, African ancestry in Palestinians has been well-documented, so Palestinians are expected to be San-shifted relative to northern Europeans. On the other hand, East Eurasian ancestry has also been well-documented in HGDP Russians, so we expect them to be CHB-shifted relative to southern Europeans.

Things are not that clear for other Caucasoid populations, e.g., southern Europeans or northwestern Europeans. The authors assume that the different position of these two groups on the San-Chinese axis is due only to Sub-Saharan admixture in southern Europeans. This implicit assumption is the Achilles' heel of the paper.
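The ambiguity can be made concrete with a toy calculation. The sketch below is only an illustration under stated assumptions: the allele frequencies are made up, the helper names (axis_position, mix) are hypothetical, and a plain projection onto the San-CHB frequency difference stands in for the smartpca projection used in the paper.

```python
# Hypothetical allele frequencies at five loci (illustrative values only).
san = [0.10, 0.80, 0.30, 0.90, 0.20]
chb = [0.70, 0.20, 0.60, 0.10, 0.90]
west = [0.50, 0.40, 0.50, 0.40, 0.60]  # assumed unadmixed West Eurasian


def axis_position(pop, san, chb):
    """Position of pop on the San-CHB axis: 0.0 at CHB, 1.0 at San."""
    axis = [s - c for s, c in zip(san, chb)]
    num = sum((p - c) * a for p, c, a in zip(pop, chb, axis))
    return num / sum(a * a for a in axis)


def mix(base, source, fraction):
    """Linear mixture of allele frequencies."""
    return [(1 - fraction) * b + fraction * s for b, s in zip(base, source)]


# Scenario 1: the southern group has 3% African-like ancestry, the northern none.
south_1, north_1 = mix(west, san, 0.03), west
# Scenario 2: the southern group is unadmixed, the northern has 3% East Asian-like ancestry.
south_2, north_2 = west, mix(west, chb, 0.03)

gap_1 = axis_position(south_1, san, chb) - axis_position(north_1, san, chb)
gap_2 = axis_position(south_2, san, chb) - axis_position(north_2, san, chb)
print(gap_1, gap_2)  # both gaps are positive: south looks San-shifted either way
```

In both scenarios the southern group sits closer to San than the northern one, so the position on this single axis cannot by itself say which of the two groups is admixed.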

Tests of population admixture

Because of genetic drift, two populations that diverged from a common ancestor will have different allele frequencies. However, imagine if we looked at these allele differences and saw that a population A not only had different frequencies than B, but also the difference in frequencies tended to be in the direction of a Sub-Saharan population. For example, at some locus f(A)=0.4, f(B)=0.3, and f(Sub-Saharan)=0.1. You can see that B's frequency deviates from A's in the direction of Sub-Saharans. This may occur due to random drift for one particular marker, but if it occurs systematically across the genome, then admixture is a likely explanation. This is the basis of the 3-population test used by the authors.
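A minimal sketch of this idea, assuming the commonly used f3 form: the mean over markers of (test - ref_a) * (test - ref_b). The first marker uses the frequencies from the example above; the other markers are made up, and the function name f3 and variable names are illustrative. Sample-size corrections and standard errors, which a real analysis needs, are left out.

```python
def f3(test, ref_a, ref_b):
    """Mean over markers of (test - ref_a) * (test - ref_b).

    A clearly negative value means the test population tends to lie
    between the two references, which is what admixture would produce.
    """
    products = [(t - a) * (t - b) for t, a, b in zip(test, ref_a, ref_b)]
    return sum(products) / len(products)


# First marker: f(A)=0.4, f(B)=0.3, f(Sub-Saharan)=0.1; the rest are hypothetical.
freq_a = [0.40, 0.70, 0.20, 0.55]
freq_b = [0.30, 0.60, 0.30, 0.65]
freq_ssa = [0.10, 0.30, 0.60, 0.95]

print(f3(freq_b, freq_a, freq_ssa))  # negative: B lies between A and Sub-Saharans
print(f3(freq_a, freq_b, freq_ssa))  # positive: A does not lie between the other two
```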

Another idea is to see whether frequency differences between A and B are correlated with frequency differences between Sub-Saharans and another Eurasian population unrelated to either A or B. Differences between Caucasoids and Sub-Saharans are (in part) due to divergence between Sub-Saharans and ancestral Eurasians. Suppose, for example, that we've identified a group (e.g., Papuans) unlikely to have admixed with Caucasoids. If B differs from A (over many markers) in the same direction that Sub-Saharans differ from Papuans, this is consistent with the notion that B has some Sub-Saharan admixture that A lacks. This is the basis of the 4-population test.

Note that because of symmetry, a highly negative value in their 4-population test (x, CEU, Papuan, YRI) indicates Sub-Saharan admixture, while a highly positive one would indicate "Papuan" admixture! The authors do observe positive values, suggesting that some northern European populations are Papuan-shifted even with respect to CEU, most notably Russia with a Z-score of 11.4. Thankfully, we are spared a paper on Papuan admixture in Russia.
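The sign symmetry can be seen in a toy version of the statistic. This is a sketch assuming the usual f4 form, the mean over markers of (a - b) * (c - d); all frequencies are invented for illustration, and the helper names f4 and mix are hypothetical.

```python
def f4(a, b, c, d):
    """Mean over markers of (a - b) * (c - d)."""
    products = [(w - x) * (y - z) for w, x, y, z in zip(a, b, c, d)]
    return sum(products) / len(products)


def mix(base, source, fraction):
    """Linear mixture of allele frequencies."""
    return [(1 - fraction) * b + fraction * s for b, s in zip(base, source)]


# Hypothetical allele frequencies at four markers.
ceu = [0.40, 0.70, 0.20, 0.55]
papuan = [0.60, 0.50, 0.35, 0.30]
yri = [0.10, 0.30, 0.60, 0.95]

x_african = mix(ceu, yri, 0.05)    # CEU-like with 5% YRI-like ancestry
x_papuan = mix(ceu, papuan, 0.05)  # CEU-like with 5% Papuan-like ancestry

print(f4(x_african, ceu, papuan, yri))  # negative
print(f4(x_papuan, ceu, papuan, yri))   # positive
```

Swapping the last two arguments flips the sign, which is why the same test that flags Sub-Saharan admixture with negative values flags a "Papuan" shift with positive ones.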

Comparison to the Indian Cline work

These tests are an important statistical tool, and many of this paper's authors have used them before to study the Indian Cline of populations. However, the current paper has two important shortcomings in comparison to Reich et al. (2009).

In their study of the Indian Cline, Reich et al. (2009) excluded groups that were shifted towards CHB, thus ensuring that they were left with groups that could be modeled as a simple mix of two ancestral population elements.

Moreover, they used the Onge, a relatively isolated population from the Indian Ocean, as a control group that could be said to form a clade with Ancestral South Indians to the exclusion of West Eurasians. In the current paper it is simply assumed that northern Europeans have no African admixture.
"Application of the test to each West Eurasian population (using A = YRI and B = CEU) finds little or no evidence of mixture in North Europeans but highly significant evidence in many Southern European, Levantine and Jewish groups (Table 1)."
In other words: taking CEU (a northern European population) as the standard, northern Europeans have no evidence of African admixture.

Sardinians: an important test case

Sardinians are an important test case for the authors' model. Their 3-population test shows no evidence of admixture, while the 4-population test does. Moreover, their STRUCTURE analysis shows a trivial 0.2%, whereas the authors estimate their Sub-Saharan admixture as 2.9%.

Let's begin by performing a PCA analysis of Sardinians, CHB, and CEU, which is shown below.

(All PCA analyses are done in smartpca as implemented in EIGENSOFT 4.0 beta, with numoutlieriter set to 0. All analyses are performed over datasets merged in PLINK with the --geno 0.001 flag, which effectively keeps only common markers and ensures a high quality dataset.)

CEU is shifted towards CHB relative to Sardinian. This is made more visually obvious if we blow up the CEU/CHB portion of the above plot:

CEU is shifted towards CHB by 2.4% relative to Sardinians. This is quite close to the 2.5% East/South Asian K=3 admixture for Britons in my most recent analysis, done with a different East Asian reference and a different method (ADMIXTURE); the CEU sample of White Utahns has been repeatedly shown to be most similar to people from the British Isles or Northwestern Europe.

Now, let's look at Sardinians, CEU, and YRI:

and a blowup:

Sardinians are shifted 1.1% relative to CEU towards YRI. Again, this is close to the 0.9% K=3 Sub-Saharan ADMIXTURE result I recently obtained.

So, where does the 2.9% Sub-Saharan admixture in Sardinians come from? Moorjani et al. estimate this percentage under the assumption that Northern Europeans are not shifted towards Chinese, i.e., that East Eurasians are irrelevant. Clearly, as we have seen, this is wrong. As we shall see, this erroneous assumption leads to the erroneous admixture estimate.

2.9% Sub-Saharan admixture in Sardinians (?)

Now I will demonstrate how the spurious 2.9% result can be obtained. Doing so makes it obvious that Moorjani et al. arrived at this figure by ignoring the East Asian shift of their northern European sample.

Here is a PCA plot of Sardinians, CEU, CHB, YRI:

and the blowup:

When we run all four populations together, Sardinians are shifted towards YRI along Dimension 1, and CEU are shifted towards CHB along Dimension 2. Given that the eigenvalue for PC1 is approximately twice (50.15) that for PC2 (25.31), and doing a little high school geometry on the triangle (Sardinian, CEU, YRI), we project Sardinian onto the CEU-YRI line, intersecting at point X. We thus obtain the estimated "CEU" admixture as:

= [distance(YRI, Sardinian)^2-distance(CEU,Sardinian)^2] / distance(CEU,YRI)^2

which equals 0.971021, and so, "YRI" admixture is 2.9%!
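The geometric step, projecting one population onto the line joining two others, can be sketched as follows. This uses the textbook dot-product form of orthogonal projection rather than the squared-distance expression above, so it illustrates the geometry and is not meant to reproduce the 0.971021 figure. The coordinates are hypothetical, the function name projection_fraction is illustrative, and the optional per-axis weights (for example eigenvalues) are an assumption about how the axes might be scaled.

```python
def projection_fraction(point, end_a, end_b, weights=None):
    """Where the orthogonal projection of point falls on the line end_b -> end_a.

    Returns 1.0 if the projection lands on end_a and 0.0 if it lands on end_b.
    weights optionally scales each axis (e.g. by its eigenvalue).
    """
    if weights is None:
        weights = [1.0] * len(point)
    line = [a - b for a, b in zip(end_a, end_b)]
    offset = [p - b for p, b in zip(point, end_b)]
    num = sum(w * o * l for w, o, l in zip(weights, offset, line))
    den = sum(w * l * l for w, l in zip(weights, line))
    return num / den


# Hypothetical (PC1, PC2) coordinates, not the values behind the plots.
ceu = [0.00, 0.00]
yri = [1.00, 0.00]
sardinian = [0.03, -0.02]

ceu_fraction = projection_fraction(sardinian, ceu, yri)
print(ceu_fraction, 1 - ceu_fraction)  # "CEU" share and "YRI" share along the line
```

Note that the offset of the test population perpendicular to the CEU-YRI line (here the -0.02 on the second axis) drops out of the result entirely: whatever pulls the two European groups apart on another axis is simply ignored by a two-population estimate.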

Ashkenazi Jews

The example of the Sardinians showed how failing to control for the East Eurasian shift leads to an overestimate of Sub-Saharan admixture. Another test case is that of Ashkenazi Jews. The authors find no evidence of admixture with their 3-population test, but do find such evidence with their 4-population test, as well as with STRUCTURE.

On a PCA plot of CHB, Ashkenazi (Behar et al. 2011), and CEU, the Ashkenazi are shifted 3.3% towards CHB along eigenvector 1.

On a PCA plot of YRI, CEU, and Ashkenazi, the Ashkenazi are shifted by 5.3% towards YRI.

In the case of the Sardinians, their African-shift together with CEU's Asian-shift caused Sardinians/CEU to diverge on the African-Asian axis, and Moorjani et al. took the entirety of this divergence to represent African admixture in Sardinians.

In this case Ashkenazi are both Asian- and African-shifted relative to CEU. The two shifts partially cancel each other out: Ashkenazi are pulled towards Africans on the YRI-CHB axis because of their YRI-shift, and away from them because of their CHB-shift. Failing to account for these processes, the authors assume that only Sub-Saharan admixture in Ashkenazi can account for the different positions of CEU and Ashkenazi on the Asian-African axis, coming up with a 2.8-3.2% "Sub-Saharan admixture" in two different samples.

Here is a second way of seeing how this spurious admixture estimate follows from the phenomenon I am describing. In terms of Fst, CEU are 0.76 times as distant from CHB as they are from YRI (Fst=0.129 and 0.17, respectively). In other words, Sub-Saharan admixture is more "potent" at shifting a population than East Eurasian ancestry is. Ashkenazi are YRI-shifted by 5.3%, and they are CHB-shifted by 3.3%. Multiplying the latter by 0.76 we obtain: 5.3-0.76*3.3 = 2.8%!
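The arithmetic can be checked in a few lines. This sketch reads 0.17 as the CEU-YRI Fst and 0.129 as the CEU-CHB Fst, which is the assignment that reproduces the 0.76 ratio; the variable names are illustrative.

```python
fst_ceu_yri = 0.17   # assumed to be the CEU-YRI value
fst_ceu_chb = 0.129  # assumed to be the CEU-CHB value

# Relative "potency" of an East Eurasian shift compared with a Sub-Saharan one.
potency = fst_ceu_chb / fst_ceu_yri

yri_shift = 5.3  # Ashkenazi shift towards YRI, in percent
chb_shift = 3.3  # Ashkenazi shift towards CHB, in percent

net_shift = yri_shift - potency * chb_shift
print(round(potency, 2), round(net_shift, 1))  # prints: 0.76 2.8
```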

In other words, the 2.8% Sub-Saharan admixture in Ashkenazi Jews is a compromise between two different phenomena in a tug-of-war. It is not an accurate estimate of admixture.


I have also carried out an experiment with Sardinians, Ashkenazi Jews, CEU, and Papuans, instead of CHB, as Papuans are also used in the paper as an outgroup population.

and the blowup:
It is clear that the populations show differential shift towards Papuans that is concordant with their above-described shift towards the Chinese.

Luhya and Bilala

Failure to correct for differential shift towards Chinese/Papuans is problem enough, but the paper also fails to properly take into account non-West African populations. North African groups are conspicuous in their absence, while the HapMap3 Luhya (LWK) and a Bilala sample are used to represent East Africa.

Henn et al. (2011) contains Tuscan, Yoruba, Maasai, and Bulala samples, so I ran the Tuscans as test data in a supervised ADMIXTURE 1.1 analysis together with these African groups, HGDP-CEPH North_Italian, and HapMap3 CEU. That is, I'm playing along, for the sake of argument, with the idea that East Eurasians are irrelevant and that Tuscans can be seen as a mixture of CEU "Europeans" and African groups.

The results are unambiguous: Tuscans/North Italians are found to be 2.1%/1.2% "Maasai" and 0% of all the other African groups. In other words whatever element there is in common between Tuscans and Africans is not particularly West African.

The inclusion in the paper of the HapMap3 Luhya (a Bantu group) but not of the HapMap3 Maasai is puzzling, and the choice of one group over the other is passed over in silence.

In my own experiments, I distinguish between North, Sub-Saharan, and East African ancestral components.

Beyond a binary worldview

Much more can be said, but let's summarize: the model of Moorjani et al. (2011) fails because:
  1. It does not account for the West-East Eurasian axis, folding everything onto the North European-Sub-Saharan African one
  2. It undersamples African diversity by excluding both North African and East African populations
Perhaps I'll add more in the future, but I believe I've already said enough to cast serious doubt on this paper's conclusions.

PLoS Genet 7(4): e1001373. doi:10.1371/journal.pgen.1001373

The History of African Gene Flow into Southern Europeans, Levantines, and Jews

Priya Moorjani, Nick Patterson, Joel N. Hirschhorn, Alon Keinan, Li Hao, Gil Atzmon, Edward Burns, Harry Ostrer, Alkes L. Price, David Reich


Previous genetic studies have suggested a history of sub-Saharan African gene flow into some West Eurasian populations after the initial dispersal out of Africa that occurred at least 45,000 years ago. However, there has been no accurate characterization of the proportion of mixture, or of its date. We analyze genome-wide polymorphism data from about 40 West Eurasian groups to show that almost all Southern Europeans have inherited 1%–3% African ancestry with an average mixture date of around 55 generations ago, consistent with North African gene flow at the end of the Roman Empire and subsequent Arab migrations. Levantine groups harbor 4%–15% African ancestry with an average mixture date of about 32 generations ago, consistent with close political, economic, and cultural links with Egypt in the late middle ages. We also detect 3%–5% sub-Saharan African ancestry in all eight of the diverse Jewish populations that we analyzed. For the Jewish admixture, we obtain an average estimated date of about 72 generations. This may reflect descent of these groups from a common ancestral population that already had some African ancestry prior to the Jewish Diasporas.



