August 30, 2012

Scrubbing Sardinians

In a series of posts, I showed that European populations have east Eurasian-like admixture, an element that appears to be lacking in Sardinians. I did this both on the basis of the 3-population test and a number of different comparisons between West Eurasian populations, as well as on the basis of the 4-population test.

The fact that f4(Sardinian, CEU, Asian, African) is negative was interpreted by Moorjani et al. (2011) as evidence that Sardinians have ~2.9% African admixture. As I pointed out at the time, this level of admixture was predicated on the assumption that CEU did not have Asian admixture, and this assumption now appears not to hold.

Of course, the above-mentioned paper also used an admixture LD based method (ROLLOFF) to date the African admixture in Sardinians, coming up with an estimate of ~71 generations. But, we should remember that ROLLOFF does not quantify the extent of this admixture.

Imagine walking along a Sardinian genome: the negative f4 signal is created both by the occasional African-like segments you meet along the way and by the presence of East Eurasian SNPs in CEU at other locations where Sardinians may have no African admixture. The f4 signal is a genomewide average that is influenced by two different processes: punctuation by African segments, whose length distribution can supply information about the time of their introgression; and the background genome, which lacks the East Eurasian-like polymorphism present in CEU.

In this post, I will show that:
  • The admixture estimate of 2.9% is not robust, but depends on the choice of Asian population for f4 ancestry estimation, consistent with the idea that it is influenced by east Eurasian-like admixture that has affected northern European populations.
  • If Sardinians are "scrubbed" of any trace of African admixture, the negative f4(Sardinian, CEU, Asian, African) signal persists.

Estimates of African admixture in Sardinians depend on choice of Asian/American population

African ancestry in Sardinians was estimated by Moorjani et al. (2011), using the following ratio:

f4(San,Papuan; Sardinian,CEU) / f4(San,Papuan; YRI, CEU)

In Table S6 different ancestral populations were used for f4 ancestry estimation, and all results ranged between 2.9-3.4%.
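To make the ratio concrete, here is a minimal sketch of the f4 statistic and the f4 ratio on simulated allele frequencies. The frequencies, the mixing proportion `alpha`, and all function names are illustrative assumptions, not the real data or the code used for this post. It shows that when the test population is an exact mixture of the two populations in the denominator, the ratio recovers the mixing proportion; the argument of this post is that the real populations violate that assumption.

```python
import numpy as np

def f4(p_a, p_b, p_c, p_d):
    """f4(A,B;C,D): mean over SNPs of (a - b) * (c - d) for allele frequencies."""
    p_a, p_b, p_c, p_d = (np.asarray(x, dtype=float) for x in (p_a, p_b, p_c, p_d))
    return float(np.mean((p_a - p_b) * (p_c - p_d)))

# Toy allele frequencies (hypothetical, for illustration only).
rng = np.random.default_rng(0)
n_snps = 5000
san = rng.uniform(size=n_snps)
papuan = rng.uniform(size=n_snps)
yri = 0.5 * san + 0.5 * rng.uniform(size=n_snps)     # correlated with San
ceu = 0.5 * papuan + 0.5 * rng.uniform(size=n_snps)  # correlated with Papuan

# A test population that is exactly (1 - alpha) CEU-like plus alpha YRI-like.
alpha = 0.03
mixed = (1 - alpha) * ceu + alpha * yri

# Same form as f4(San,Papuan; Sardinian,CEU) / f4(San,Papuan; YRI,CEU).
estimate = f4(san, papuan, mixed, ceu) / f4(san, papuan, yri, ceu)
print(f"true alpha = {alpha}, f4 ratio estimate = {estimate:.4f}")
```

The estimate matches `alpha` only because the toy CEU carries no extra ancestry of its own. If CEU itself had a third source of ancestry that Sardinians lack, the numerator would pick it up as well.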

The signal of east Eurasian-like admixture in northern Europe is strongest when Karitiana is used as the Asian/American reference. If the level of "African" admixture in Sardinians is driven, as I suspect, by the presence of east Eurasian-like admixture in northern Europe, then I expect the estimate to be higher when Karitiana are used instead of Papuans. And, indeed, this is what I observe:

f4(San,Papuan;Sardinian,CEU) = 0.00118099 (Z=10.6838)
f4(San,Papuan;YRI,CEU) = 0.0379664 (Z=88.2287)

(in all experiments I use a set of 28 Sardinians vs. 27 in the Moorjani et al. paper, a set of 112 CEU, 147 YRI, a set of 166,770 SNPs, and -k 200 for fourpop)
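For readers wondering where the Z-scores come from, the following is a rough sketch of a delete-one-block jackknife over blocks of consecutive SNPs. I am assuming here that -k sets the number of SNPs per block; the exact estimator inside fourpop may differ, and the function name and toy per-SNP values are hypothetical.

```python
import numpy as np

def block_jackknife_z(per_snp_values, block_size):
    """Return (estimate, standard error, Z) using a delete-one-block jackknife.

    per_snp_values holds one value per SNP, e.g. (a - b) * (c - d).
    Trailing SNPs that do not fill a whole block are dropped.
    """
    x = np.asarray(per_snp_values, dtype=float)
    n_blocks = len(x) // block_size
    if n_blocks < 2:
        raise ValueError("need at least two full blocks")
    x = x[: n_blocks * block_size]
    block_sums = x.reshape(n_blocks, block_size).sum(axis=1)
    total = block_sums.sum()
    estimate = total / x.size
    # Estimate recomputed with each block left out in turn.
    leave_one_out = (total - block_sums) / (x.size - block_size)
    se = np.sqrt((n_blocks - 1) / n_blocks
                 * np.sum((leave_one_out - leave_one_out.mean()) ** 2))
    return estimate, se, estimate / se

# Toy per-SNP values (hypothetical), 100 blocks of 200 SNPs each.
rng = np.random.default_rng(1)
toy = rng.normal(loc=0.001, scale=0.01, size=20000)
est, se, z = block_jackknife_z(toy, block_size=200)
print(f"estimate={est:.6f} se={se:.6f} Z={z:.2f}")
```

Blocking matters because neighboring SNPs are correlated by linkage; treating every SNP as independent would understate the standard error and inflate Z.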

therefore, African admixture in Sardinians using Papuan reference = 0.00118099/0.0379664 = 3.1%

but

f4(San,Karitiana;Sardinian,CEU) = 0.00272141 (Z=22.7288)
f4(San,Karitiana;YRI,CEU) = 0.04449 (Z=100.19)

therefore, African admixture in Sardinians using Karitiana reference = 0.00272141/0.04449 = 6.1%

A ~2-fold difference in African admixture has resulted from a different choice of outgroup. This is unexpected if West Eurasians did not exchange genes with Papuans and Karitiana since their divergence, but expected if CEU received genes from an Asian population that was more like Karitiana and less like Papuans.
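The arithmetic behind the two estimates can be reproduced from the f4 values reported above. This is only a sketch of the division; the dictionary layout and names are my own illustrative choices.

```python
# f4 values as reported in this post; the keys are illustrative labels.
reported = {
    "Papuan":    {"sardinian_ceu": 0.00118099, "yri_ceu": 0.0379664},
    "Karitiana": {"sardinian_ceu": 0.00272141, "yri_ceu": 0.04449},
}

def african_fraction(entry):
    """f4(San,X; Sardinian,CEU) / f4(San,X; YRI,CEU) for reference X."""
    return entry["sardinian_ceu"] / entry["yri_ceu"]

estimates = {ref: african_fraction(vals) for ref, vals in reported.items()}
fold = estimates["Karitiana"] / estimates["Papuan"]

for ref, value in estimates.items():
    print(f"{ref}: {100 * value:.1f}% apparent African admixture")
print(f"fold difference: {fold:.2f}")
```

If both references were true outgroups with respect to West Eurasians, the two estimates should agree within sampling error rather than differ by about a factor of two.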

Scrubbing Sardinians

Another way to demonstrate that east Eurasian-like admixture in CEU is inflating the perceived level of African-like admixture in Sardinians is to comprehensively "scrub" Sardinians of all traces of African ancestry by replacing segments of their DNA when there is even a hint of such ancestry with missing values.

Going back to the mental experiment of walking along the Sardinian genome, we are going to remove every spot with even a remote possibility of African admixture. It will be shown that CEU continues to show evidence of east Eurasian-like admixture when the scrubbed Sardinians are used, suggesting that the signal is generated not only by African-like admixture in Sardinians, but also by East Eurasian-like admixture in CEU.

I used DIYDodecad to do this scrubbing, but one could potentially try any approach that can identify African segments, such as HAPMIX or PCA. I used the dataset assembled for K7b and K12b, and carried out a K=3 ADMIXTURE analysis, which resulted in 3 components centered on West Eurasia, Asia, and Africa. I chose not to use an African component from higher-K (e.g. the K7b calculator), because it is conceivable that African ancestry might be lurking in southern Caucasoid components inferred with these tools (e.g., the "Southern" component of K7b or the "Southwest Asian" one of K12b). The average African admixture in Sardinians using the K3b calculator is 0.9%, and for the subset of CEU used it is 0.2%.

Using the byseg mode of DIYDodecad, I created ancestry maps of the 28 HGDP Sardinians, and I only kept windows where the African admixture was exactly 0%. This is a very aggressive scrubbing, designed to remove virtually all African admixture from the population. For example, if a window has 99.9% West Eurasian admixture and 0.01% African, I will nonetheless remove it, even though chances are extremely high that the 0.01% represents only noise. I did not want to leave any doubt that any trace of identifiable African ancestry remained in my "scrubbed Sardinians".
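The masking step itself is simple, and a sketch may help. The array layout below (one genotype vector per individual, a window index per SNP, and a per-window African fraction) and the missing-value code are hypothetical; the actual byseg output format of DIYDodecad is not reproduced here.

```python
import numpy as np

MISSING = -1  # hypothetical code for a missing genotype

def scrub(genotypes, snp_window, african_by_window):
    """Set to missing every SNP that lies in a window with any African fraction.

    genotypes         : per-SNP genotype codes (0, 1, 2) for one individual
    snp_window        : index of the window each SNP falls in
    african_by_window : estimated African fraction for each window
    """
    scrubbed = np.asarray(genotypes).copy()
    african_by_window = np.asarray(african_by_window, dtype=float)
    # Keep a window only if its African fraction is exactly zero.
    keep_window = african_by_window == 0.0
    keep_snp = keep_window[np.asarray(snp_window)]
    scrubbed[~keep_snp] = MISSING
    return scrubbed

# Toy example: three windows of two SNPs each; the middle window has a
# tiny African score and is therefore removed in full.
genotypes = np.array([0, 1, 2, 1, 0, 2])
snp_window = np.array([0, 0, 1, 1, 2, 2])
african_by_window = [0.0, 0.0001, 0.0]
result = scrub(genotypes, snp_window, african_by_window)
print(result)
```

The strict equality test is deliberate: any non-zero score, however small, removes the whole window, which mirrors the aggressive rule described above at the cost of discarding a lot of data.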

I am very confident that my scrubbed Sardinians do not have any hint of African ancestry, but you can decide for yourselves. I base my confidence on (a) the extreme nature of the scrubbing, which threw away much of the Sardinian genome in order to ensure that no hints of local African ancestry remained, (b) re-assessment of the scrubbed Sardinians with K3b, showing that they are now 100% West Eurasian, and (c) ab initio ADMIXTURE analysis of CHB, YRI, CEU, and scrubbed Sardinians, demonstrating that the latter are 100% West Eurasian, while CEU has traces of 0.1% African and 0.3% Asian ancestry.

So, here are the results for the scrubbed Sardinians:

f4(San,Papuan;Sardinian_scrubbed,CEU) = 0.000678108 (Z=4.05225)
f4(San,Papuan;YRI,CEU) = 0.0379664 (Z=88.2287)
so scrubbed Sardinians with Papuan reference appear 0.000678108 / 0.0379664 = 1.8% African

and 

f4(San,Karitiana;Sardinian_scrubbed,CEU) = 0.00205526 (Z=11.2848)
f4(San,Karitiana;YRI,CEU) = 0.04449 (Z=100.19)
so scrubbed Sardinians with Karitiana reference appear 0.00205526/0.04449 = 4.6% African

Despite the thorough scrubbing, Sardinians continue to show African admixture using f4 ancestry estimation. This is consistent with the idea that much of the African ancestry inferred using f4 ancestry estimation in Sardinians is an artifact of not taking into account east Eurasian-like admixture in CEU.
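One way to quantify "continue to show" is the share of the original estimate that survives scrubbing. Since the denominator f4(San,X;YRI,CEU) is the same before and after, that share is just the ratio of the two numerators reported above. The variable names are illustrative.

```python
# Numerators f4(San,X; Sardinian,CEU) before and after scrubbing, as reported.
numerators = {
    "Papuan":    {"regular": 0.00118099, "scrubbed": 0.000678108},
    "Karitiana": {"regular": 0.00272141, "scrubbed": 0.00205526},
}

# Fraction of the apparent African signal that remains after scrubbing.
retained = {
    ref: vals["scrubbed"] / vals["regular"] for ref, vals in numerators.items()
}

for ref, share in retained.items():
    print(f"{ref} reference: {100 * share:.0f}% of the signal remains")
```

If the signal came only from African segments in Sardinians, and the scrubbing removed them, the retained share should be close to zero rather than more than half.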

Conversely, a significant signal of east Eurasian-like admixture in CEU persists whether one uses regular or scrubbed Sardinians:

With regular Sardinians

f4(San,Papuan;Sardinian,Karitiana) = 0.0084678 (Z=21.2137)
f4(San,Papuan;Sardinian,CEU) = 0.00118099 (Z=10.6838)

So, CEU appears = 0.00118099/0.0084678 = 13.9% East Eurasian

With scrubbed Sardinians

f4(San,Papuan;Sardinian_scrubbed,Karitiana) = 0.00774427 (SE=0.00056725, Z=13.6523)
f4(San,Papuan;Sardinian_scrubbed,CEU) = 0.000678108 (SE=0.000167341, Z=4.05225)

So, CEU appears = 0.000678108/0.00774427 = 8.8% East Eurasian
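As a sanity check on the scrubbed runs, the sketch below assumes the three numbers reported for each are the f4 value, its standard error, and the Z-score, verifies that Z equals f4 divided by the standard error, and then recomputes the CEU estimates with regular and scrubbed Sardinians. The names and tuple layout are my own.

```python
# (f4, standard error, Z) for the scrubbed runs, as reported above.
scrubbed_runs = {
    "Karitiana": (0.00774427, 0.00056725, 13.6523),
    "CEU":       (0.000678108, 0.000167341, 4.05225),
}

# Z should be the f4 value divided by its standard error.
z_recomputed = {name: f4 / se for name, (f4, se, _z) in scrubbed_runs.items()}

# CEU estimate = f4(San,Papuan; Sardinian,CEU) / f4(San,Papuan; Sardinian,Karitiana)
ceu_regular = 0.00118099 / 0.0084678
ceu_scrubbed = scrubbed_runs["CEU"][0] / scrubbed_runs["Karitiana"][0]

for name, z in z_recomputed.items():
    print(f"{name}: recomputed Z = {z:.4f}")
print(f"CEU with regular Sardinians:  {100 * ceu_regular:.1f}%")
print(f"CEU with scrubbed Sardinians: {100 * ceu_scrubbed:.1f}%")
```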

Conclusion

My "palimpsest" idea seems to be confirmed by the data. A first observation is that the level of African-like admixture in Sardinians depended on whether one used Papuans or Karitiana as an outgroup, suggesting that neither population was a true outgroup, and the signal of African admixture in Sardinians was driven in part by East Eurasian-like admixture in CEU. African admixture in Europe cannot be assessed accurately if one ignores the confounding effect of East Eurasian admixture.

When I aggressively scrubbed Sardinians so as to remove all traces of African ancestry, part of the African admixture fraction disappeared (expected, since African ancestry was removed from Sardinians), but a substantial part of it remained (unexpected if the signal was driven only by African admixture, but expected if it was driven in part by East Eurasian-like admixture in CEU). Conversely, using scrubbed Sardinians reduced, but did not eliminate, the East Eurasian admixture estimate for CEU.

13 comments:

aramt said...

Any thoughts on why Karitiana would be the better proxy? In another topic we already saw that Atlantic-Baltic shows some Amerindian-like component even in the absence of any Siberian-like comp.. Things are starting to get interesting.

truth said...

The study of Moorjani is pretty bad actually, all those high percentages didn't make sense, and it is very Anglo-centered in the sense that they use the CEU sample as the de facto pure European reference, without taking into account the northern-euro asian-like admixtures. I think your articles are pretty genius.

Dienekes said...

"The study of Moorjani is pretty bad actually, all those high percentages didn't make sense, and it is very Anglo-centered in the sense that they use the CEU sample as the de facto pure European reference, without taking into account the northern-euro asian-like admixtures. I think your articles are pretty genius."

I wouldn't go that far. It was a reasonable effort based on the available evidence. I don't think anyone could have suspected easily that east Eurasian-like admixture had taken place in northern Europe to an extent that the estimates would be affected. Also, the paper introduced a new methodology for inferring admixture times, which is still useful whether the admixture levels are perhaps a couple percent higher than they ought to be.

Dienekes said...

"Any thoughts on why Karitiana would be the better proxy? In another topic we already saw that Atlantic-Baltic shows some Amerindian-like component even in the absence of any Siberian-like comp.. Things are starting to get interesting."

My guess is it has something to do with the Y-haplogroup R-Q common ancestry. Some old Asian group that spawned both Rs and Qs, with some of the Qs moving to the Americas ~15ka, and some of the Rs drifting towards Europe and arriving there eventually fairly recently.

Lank said...

"A ~2-fold difference in African admixture has resulted from a different choice of outgroup. This is unexpected if West Eurasians did not exchange genes with Papuans and Karitiana since their divergence, but expected if CEU received genes from an Asian population that was more like Karitiana and less like Papuans."

It's also expected if Eurasians are African-shifted, relative to Americans. You might find evidence of this African affinity in Eurasia and Melanesia, if you were to run a K=2 analysis.

Your "scrubbing" method seems like a great method to test the effect that an Asian or African shift has on the 4-population test. However, it is limited by the use of ADMIXTURE components based on modern populations. If the K=3 components are assumed to be 'pure', then there's not even any need for the 4-population test; simply looking at the K=3 values would suffice. K=3 shows an average of 1.2% East Eurasian and 0.2% African in the CEU, and 0.9% African in the Sardinians. All Eurasians are closely related relative to Africans, which means that African admixture in Sardinians would cause a greater shift on the Asia-Africa axis than an equal amount of East Eurasian admixture in the CEU. This means that the Africa and Asia shifts of Sardinians and CEU, respectively, should roughly cancel each other out, if the 'West Eurasian' ADMIXTURE component is a proper ancestral population.

But it would be great if you tried a K=2 run as well, to shed light on the positions of different populations on the Africa-Asia (Africa-America?) axis.

Dienekes said...

"If the K=3 components are assumed to be 'pure', then there's not even any need for the 4-population test; simply looking at the K=3 values would suffice."

The K=3 components are not assumed to be pure. The only thing that is assumed is that if there is African ancestry in a window, it will register a non-zero African score in that window.

Given that African-ness or not of SNPs can only be assessed by comparing allele frequencies and allele correlations in different populations, I would challenge anyone to show me a segment that is of African origin that doesn't even register a 0.01% African in an ADMIXTURE test.

What kind of scenario would result in a real African segment that consists of SNPs whose frequencies are all concordant with the frequencies in West Eurasian and Asian populations to such an extent that not even a 0.01% trace remains?

Anyway, if anyone thinks there are African segments in the scrubbed data, I'd like to see the evidence for it.

Lank said...

"The K=3 components are not assumed to be pure. The only thing that is assumed is that if there is African ancestry in a window, it will register a non-zero African score in that window."

So you are assuming that the West Eurasian component is devoid of African influence at the very least, no?

What I think you have failed to bring to attention is the question about why increased East Eurasian admixture would even result in a decreased African affinity. The reason is that East Asians are further removed from Africans genetically than West Eurasians are. Americans, in turn, are even further removed from Africans. This is why an EA-mixed West Eurasian reference results in an increased African admixture estimate in populations with less East Eurasian admixture.

It would be interesting to calculate the African admixture in the scrubbed Sardinians with an East Asian reference, to compare with the Karitiana as a reference. This could then be compared to the difference between African admixture estimates in regular Sardinians using both ancestral populations. If the difference between the estimates of both experiments is more significant when the unscrubbed Sardinians are used, then this would further support the existence of the Africa-America axis I alluded to in my previous comment (Americans are further removed from Africans than anyone else).

Now, the compromise position in this whole debate would be that both groups (Sardinians, CEU) have some admixture that pulls them in different directions relative to a "West Eurasian" component. But simply using Northern Europeans or Sardinians as the assumed ancestral population to estimate admixture could lead to misleading results. I think the compromise position is the most viable explanation.

I'd like to add that we still know very little about the ancestral populations who lived many thousands of years ago and how they gave rise to the variation seen in populations today. It's even possible that Sardinians may have East Eurasian influence; they simply appear to lack this in comparison to other modern West Eurasians. Not to mention the African affinity of West Eurasians in general.

I wonder what methods the Reich lab is using in their upcoming paper.

Unknown said...

I think "scrubbing" is a potentially offensive adjective in this context and it would be better to choose another one. Excised, trimmed, corrected etc for example.

Anonymous said...

I won't lie that I'm an amateur to all this, so don't shoot me down too harshly, but, to me, the situation seems to be this: What distinguishes Sardinians from other Europeans is found here:

http://dodecad.blogspot.co.uk/2012/04/estimating-your-gok4-related-ancestry.html

In short, they have the lowest levels of native/pre-Neolithic European ancestry of the populations explored. I imagine a portion (but not all) of East Eurasian affinity in Europeans is proportionate to UP European ancestry: The latter were more Asian-shifted than Neolithic farmers. But does this require admixture in the formal sense? West Eurasians came later in time, as it were, than East Eurasians. West Eurasian is a subset of Eurasian -- the ancestral population -- and, because of that, distinction from that group (or from its East Eurasian cousins) is always going to be relative.

If a population (proto-West Eurasians) splinters away from East Eurasians and then splinters again at some point, it stands to reason that, because of a multitude of factors (population size, dispersal etc.), one will differentiate from the ancestral population at a faster rate.

If they later meet up again, one will almost certainly be more or less similar to this or that population. This is true of any group: Siberians and Americans, for instance -- if you threw them together, Siberians would be more African-shifted than Americans, despite no admixture.

Is that not, then, what we're seeing? I don't know if it's possible to distinguish admixture from affinity, so all of what I've said may well be moot, but, to me, it's intuitive that no two related-but-separated populations would diverge from an ancestral component at precisely the same rate.

Of course, the picture's complicated by multiple waves of later 'real' admixture, peaking in NE Europe.

Acid said...

I think a file is missing to run the K3b calculator (maybe K3b.par?). It doesn't work.

Dienekes said...

"I think a file is missing to run the K3b calculator (maybe K3b.par?). It doesn't work."

See updated link.

Mauri said...

Without having any opinion about the African admix among Sardinians, I want to thank you for this exercise, because I have seen so many times in genetic analyses that the globe is round and everything is relative. The principle is that we can take any two coordinates and give values for other locations using these two. If necessary we can pick third one :) .