September 07, 2012

Estimating admixture proportions and dates with ADMIXTOOLS (Patterson et al. 2012)

This is a very exciting new paper, both for what it has to say about human history and because it is accompanied by a new ADMIXTOOLS software package that contains methods to infer levels and dates of admixture between populations.

Ancient European origins

There was a tip about this paper in the recent study of Native American origins. It was suggested that northern Europeans have an excess of central/east Eurasian-related ancestry relative to Sardinians. I had noticed over a year ago that northern Europeans tended to be Asian-shifted relative to Mediterranean Europeans, and when the same effect was hinted at in the Native American paper, I set out to explore the issue in a series of posts using the f4 and f3 statistics. So it's great to finally see the formal treatment of the same subject.
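For readers who have not met these statistics, here is a minimal sketch of how f4 and f3 are defined in terms of per-SNP allele frequencies. It is only an illustration under simplifying assumptions: the toy frequencies and the names src_a, src_b and mixed are made up, and the finite-sample bias correction that a real f3 estimate needs is left out. It is not the ADMIXTOOLS implementation.

```python
def f4(a, b, c, d):
    """Average of (a - b) * (c - d) over SNPs; inputs are allele frequencies."""
    return sum((pa - pb) * (pc - pd) for pa, pb, pc, pd in zip(a, b, c, d)) / len(a)


def f3(c, a, b):
    """Average of (c - a) * (c - b) over SNPs.

    A clearly negative value suggests that C is a mixture of populations
    related to A and B (no sample-size bias correction is applied here).
    """
    return sum((pc - pa) * (pc - pb) for pc, pa, pb in zip(c, a, b)) / len(c)


# Hypothetical toy allele frequencies at four SNPs for two source populations.
src_a = [0.1, 0.8, 0.3, 0.6]
src_b = [0.9, 0.2, 0.7, 0.2]

# A 50/50 mixture of the two sources, with no drift after mixing.
mixed = [0.5 * x + 0.5 * y for x, y in zip(src_a, src_b)]

f3_mixed = f3(mixed, src_a, src_b)          # negative for an admixed target
f4_same_pair = f4(src_a, src_b, src_a, src_b)  # mean squared frequency difference
```

In this toy case the f3 value is negative because the mixed population sits between its two sources at every SNP, which is the signal the formal tests look for.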

Figure 9 from the paper shows a scenario envisioned by the authors as consistent with the evidence:


In my earlier post I had suggested a couple of explanations for this pattern, including an Asian-shift of the Mesolithic substratum which contributes more to northern than southern Europeans, as well as a possible influence by a northern stream of Indo-European invasion into Europe. The rolloff age estimate in the paper is:

In Figure 7e we show the rolloff results. The signal is clear enough, though noisy. We estimate an admixture date of 4150 ± 850 B.P. Our standard errors computed using a block jackknife (block size=5cM) are uncomfortably large here.  
However this date must be treated with great caution. We obtained a data set from the Illumina iControl database (http://www.illumina.com/science/icontroldb.ilmn) of ‘Caucasians’ and after curation have 1,232 samples of European ancestry genotyped on an Illumina SNP array panel. We merged the data with the HGDP Illumina 650Y genotype data obtaining a data set with 561,268 SNPs. Applying rolloff to this sample with HGDP Karitiana and Sardinians as sources, we get a much more recent date of 2200 ± 762 years B.P.
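The block jackknife mentioned in the quoted passage can be sketched roughly as follows. This is a simplified delete-one-block version that assumes equally weighted blocks; the function names and the toy per-block values are hypothetical, and the real software may weight blocks (for example by SNP count), so treat it as an illustration of the idea rather than the authors' code.

```python
import math


def block_jackknife_se(block_values, estimator):
    """Delete-one-block jackknife standard error.

    block_values: one summary value per genomic block (e.g. per 5 cM block).
    estimator: function mapping a list of block values to a single estimate.
    """
    n = len(block_values)
    # Recompute the estimate n times, each time leaving one block out.
    leave_one_out = [
        estimator(block_values[:i] + block_values[i + 1:]) for i in range(n)
    ]
    mean_loo = sum(leave_one_out) / n
    variance = (n - 1) / n * sum((x - mean_loo) ** 2 for x in leave_one_out)
    return math.sqrt(variance)


def mean_of_blocks(blocks):
    """Toy estimator: the plain average of the per-block values."""
    return sum(blocks) / len(blocks)


# Hypothetical per-block statistics, one number per block.
blocks = [1.0, 2.0, 3.0, 4.0, 5.0]
se = block_jackknife_se(blocks, mean_of_blocks)
```

Dropping whole blocks rather than single SNPs is what lets the standard error account for correlation between nearby markers.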

If the admixture event was related to admixture between Neolithic and Mesolithic peoples, one might guess that the admixture date would be earlier. On the other hand, the evidence shows that down to 5,000 years ago, there were farmers in Europe who were like modern Sardinians, and hunter-gatherers who were ultra-North European (even more than current north Europeans), so fusion between incoming and resident groups was not a one-time deal when they first met. A recent mtDNA study also suggests that farmers and hunter-gatherers did not completely fuse until 4,000 years BP, after which time their distinctive mtDNA types begin to expand in unison.

In my opinion, the fusion may have been effected post-5ka after the arrival of Indo-Europeans into most of Europe. Before that time, there lived in Europe groups who had either a lot or a little Neolithic ancestry. The IE invasion acted as a shock that broke down old loyalties and brought together different groups whose focus was the new military/trading elite associated primarily with metallurgy. This invasion could have acted both as a source (in its northern stream) of East Eurasian-like ancestry, since it spread east-west and passed through territory where evidence of east Eurasian mtDNA has been turning up; but it could also have acted as a blender, creating out of the "apples and oranges" that existed in Europe prior to 5ka, a new variable mix.

In any case, the authors of the current paper discuss the Mesolithic vs. Indo-European issue:
Ancient DNA studies have documented a clean break between the genetic structure of the Mesolithic hunter-gatherers of Europe and the Neolithic first farmers who followed them. Mitochondrial analyses have shown that the first farmers in central Europe, belonging to the Linear Pottery culture (LBK), were genetically strongly differentiated from European hunter-gatherers (BRAMANTI et al., 2009), with an ‘affinity’ to present day Near Eastern and Anatolian populations (HAAK et al., 2010). More recently, new insight has come from analysis of ancient nuclear DNA from three hunter-gatherers and one Neolithic farmer who lived roughly contemporaneously at about 5000 years B.P. in what is now Sweden (SKOGLUND et al., 2012). The farmer’s DNA shows a signal of genetic relatedness to Sardinians that is not present in the hunter-gatherers who have much more relatedness to present-day northern Europeans. These findings suggest that the arrival of agriculture in Europe involved massive movements of genes (not just culture) from the Near East to Europe and that people descending from the Near Eastern migrants initially reached as far north as Sweden with little mixing with the hunter-gatherers they encountered. However, the fact that today, northern Europeans have a strong signal of admixture of these two groups, as proven by this study and consistent with the findings of (SKOGLUND et al., 2012), indicates that these two ancestral groups subsequently mixed.   
Combining the ancient DNA evidence with our results, we hypothesize that agriculturalists with genetic ancestry close to modern Sardinians immigrated into all parts of Europe along with the spread of agriculture. In Sardinia, the Basque country, and perhaps other parts of southern Europe they largely replaced the indigenous Mesolithic populations, explaining why we observe no signal of admixture in Sardinians today to the limits of our resolution. In contrast, the migrants did not replace the indigenous populations in northern Europe, and instead lived side-by-side with them, admixing over time (perhaps over thousands of years). Such a scenario would explain why northern European populations today are admixed, and also have a rolloff admixture date that is substantially more recent than the initial arrival of agriculture in northern Europe. (An alternative history that could produce the signal of Asian-related admixture in northern Europeans is admixture from steppe herders speaking Indo-European languages, who after domesticating the horse would have had a military and technological advantage over agriculturalists (ANTHONY, 2007). However, this hypothesis cannot explain the ancient DNA result that northern Europeans today appear admixed between populations related to Neolithic and Mesolithic Europeans (SKOGLUND et al., 2012), and so even if the steppe hypothesis has some truth, it can only explain part of the data.)
Another application of the new methodology is to Spain, where many analyses (including some of the Dodecad Project) have shown that the population has both a "Mediterranean" and a "North European" component. The authors date this admixture to 3,600 ± 400 years BP, and they associate it with Bell Beaker-related backflow into Iberia. However, a newer study that probably appeared while this paper was in review showed that Mesolithic Iberians were also North European-like. So one probably does not need a special explanation for their case: the Neolithic/Mesolithic mix that occurred in Scandinavia probably also occurred in Spain. The 3.6ky signal for North European/Sardinian-like admixture in Spain is similar to the 4.15ky signal of North Eurasian/Sardinian admixture in northern Europe. Both cases may reflect the same event. The authors point out that these dates are inconsistent with Visigoths and the like contributing a major portion of north European ancestry to Spain, which is consistent with the Ralph and Coop (2012) study. It might even be tempting to ascribe the small ~0.5ky difference in the age of the signal to this later migration, or even to Celtic-related migrations, since the Celts, based on phenotypic descriptions by ancient authors, belonged to a substantial degree to the northern Europeoids.

It will certainly be interesting to study the Beaker folk's autosomal DNA in relation to European prehistory, as R1b makes its first appearance with them on the European scene. Were they the people who brought North European/East Eurasian-like ancestry into Iberia, or did the pre-existing I folk already possess it? As more ancient DNA is sampled, so will our ideas about the sequence of events be better informed. (If Iron Age people from Bulgaria were also like Sardinians, then, as they say, the plot thickens.)


Dates of admixture with rolloff

Here are a couple of examples of the rolloff fit of an exponential distribution that is used to estimate dates of admixture:

First, the Uygur (790 ± 60 years ago) shows a very good fit, and interesting things were happening in Central Asia in the 13th century.
Second, Xhosa, at a similar age (740 ± 30 years ago). I don't know much about them, but Wikipedia tells me that they were part of the Nguni migration:
They migrated southwards over many centuries, with large herds of Nguni cattle, probably entering what is now South Africa around 2,000 years ago in sporadic settlement, followed by larger waves of migration around 1400 AD.
A little early sporadic settlement and a large pulse at 1400 AD may very well average out to something very close to the given date.
And, here is the plot for Spain, where the signal is older (3600 ± 400) and noisier, as evidenced both by the wider reported error and the visual impression:
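The exponential fit behind dates like these can be sketched as follows. The sketch assumes a pure decay A0 * exp(-n * d), with d the genetic distance in Morgans and n the number of generations since admixture, fits it by least squares on the log scale, and converts generations to years with an assumed 29 years per generation. The synthetic, noise-free data and all names are made up for illustration; a real fit has to cope with noise and may include a constant offset term.

```python
import math


def fit_decay_rate(distances_morgans, signal):
    """Estimate n in signal = A0 * exp(-n * d) by a log-linear least-squares fit."""
    xs = distances_morgans
    ys = [math.log(s) for s in signal]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # The slope of log(signal) against distance is -n.
    return -sxy / sxx


YEARS_PER_GENERATION = 29  # assumed generation time, not taken from the text above

# Synthetic curve: 124 generations is roughly 3,600 years at 29 years/generation.
true_generations = 124
distances = [0.005 * k for k in range(1, 41)]  # 0.5 cM to 20 cM, in Morgans
signal = [0.02 * math.exp(-true_generations * d) for d in distances]

est_generations = fit_decay_rate(distances, signal)
est_years = est_generations * YEARS_PER_GENERATION
```

An older admixture event gives a faster decay with distance, so less of the curve rises above the noise, which fits the visual impression that the Spanish signal is noisier than the Uygur or Xhosa ones.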


It will certainly be fun to apply the same method to other data. I had waited for rolloff since it was originally announced, and "good things come to those who wait". One interesting test case might be that of Anatolian Turks, where, presumably, there were two episodes of admixture: an early one in Central Asia between West and East Eurasian people, and a second, recent one in Anatolia, when Central Asian Turkic speakers admixed with some of the pre-Turkic inhabitants. Another will probably be that of ANI-ASI admixture in South Asia; the group behind this paper has presented this research at a conference, so I'm guessing there's another paper on that topic as well, and perhaps on the even more mysterious admixture in the case of West Africans.

The authors also announce the Affymetrix Human Origins Array which is based on ascertainments included in the Harvard HGDP set, and which I've been occasionally using in some of my own experiments. This new chip was recently used in the South African study. A new curated version of the HGDP set that removes outliers is also announced:
We successfully genotyped the array in 934 samples from the HGDP, and made the data publicly available on August 12 2011 at ftp://ftp.cephb.fr/hgdp supp10/. The present study analyzes a curated version of this dataset in which we have used Principal Component Analysis (Patterson 2006) to remove samples that are outliers relative to others from their same populations; 828 samples remained after this procedure. This curated dataset is available for download from the Reich laboratory website (http://genetics.med.harvard.edu/reich/Reich_Lab/Datasets.html).
UPDATE (8 Sep 2012): The following discussion is now obsolete. See "f-statistics are robust to differences in sample age" for details.

Addendum on the applicability of tests of admixture to samples of different age

Finally, since the authors study f-statistics with the Tyrolean Iceman, it is worthwhile to link to one of my recent posts on the topic. I've made a small figure to repeat the main argument of that post:




This shows the relationship between three populations: A, Cmod, and B. A and Cmod form a clade, and B is a group whose possible tree-violating contributions we are investigating.

Canc is an ancient individual whose genome has been sampled and who belongs to the lineage leading up to Cmod. As such, he is missing a few thousand years of evolution (shown with the dashed line). On the left figure, Canc is very old (so he is missing a lot of evolution), while on the right, he is fairly recent (so he is missing only a little).

You can mentally slide Canc up and down its branch. As it tends to Cmod (right), Canc will appear unadmixed, because it will have "experienced" almost as many years of evolution as A has, and will be separated from B by exactly the same amount of evolution as A is. But as Canc becomes older (left) and approaches the Root, it will become much more related to B than A is, only on account of being older. A test that compares Canc with A and B may conclude that Canc is a mixture of A and B.

Here is another way to explain this:

Let B be the allele found in group B and A be the allele in group A.
The pattern ABB means that Canc has B, and hence matches B. This is consistent with admixture from B-to-Canc if the allele B first appeared on the B branch of the tree.

But, it is also consistent with B being an allele at the Root that went to both sides of the tree: Canc is more likely to match the allele at the root (because he's older, closer to the Root) than A (who's younger, so an allele at the root has had more time to be lost due to drift, or a new one to appear through mutation).

Now, I don't think this effect has played a major role in the analyses presented in this paper for the Iceman, because if A=North European, Canc=Iceman, and Cmod=Sardinians, then the f statistics show that it is A that is admixed with a B=Karitiana-like population. It might play a role in the Neandertal-like excess identified for the Iceman, because in that case A=Europeans, B=Vindija, and it is the Iceman that appears more admixed vis-à-vis living Europeans. I do think, however, on the basis of my ancestry map, that even if some of the signal is due to the proposed effect, not all of it is, since Neandertal-like segments in Oetzi tend to correspond with segments likely to be of European pre-Neolithic ancestry. And if pre-Neolithic Europeans were indeed more Neandertal-like, then that is another thing in which they may have resembled East Asians.

In any case, I do believe that some thought needs to be given to tests of admixture when either (i) one of the samples is an ancient genome, or (ii) there is some reason to think that the per annum rate of evolution has been different in two populations, which would "mimic" a closer/more distant relationship to the root. As we develop the ability to sample near 100ky-old samples, strange effects might appear if a test that does not take account of differences in sample ages is used.

Conclusion

To cap this long post, the new paper represents an exciting combination of new data, software, methods, and interpretation, that will probably give genome bloggers and all those interested in human history a lot to think about and/or play with in the coming months and years. I will certainly be unbundling and trying out the new ADMIXTOOLS suite.

UPDATE: Razib also covers this new paper.

Genetics doi: 10.1534/genetics.112.145037


Ancient Admixture in Human History

Nick Patterson et al.

Population mixture is an important process in biology. We present a suite of methods for learning about population mixtures, implemented in a software package called ADMIXTOOLS, that support formal tests for whether mixture occurred, and make it possible to infer proportions and dates of mixture. We also describe the development of a new single nucleotide polymorphism (SNP) array consisting of 629,433 sites with clearly documented ascertainment that was specifically designed for population genetic analyses, and that we genotyped in 934 individuals from 53 diverse populations. To illustrate the methods, we give a number of examples where they provide new insights about the history of human admixture. The most striking finding is a clear signal of admixture into northern Europe, with one ancestral population related to present day Basques and Sardinians, and the other related to present day populations of northeast Asia and the Americas. This likely reflects a history of admixture between Neolithic migrants and the indigenous Mesolithic population of Europe, consistent with recent analyses of ancient bones from Sweden and the sequencing of the genome of the Tyrolean ‘Iceman’. 

12 comments:

Arch Hades said...

Could the lack of West Asian component in this Bronze age "Thracian" be due a possible significant intrusion of West Asian admixture in South Eastern Europeans be to migrations during the Macedonian or Roman/Byzantine Empire times?

Dienekes said...

Could the lack of West Asian component in this Bronze age "Thracian" be due a possible significant intrusion of West Asian admixture in South Eastern Europeans be to migrations during the Macedonian or Roman/Byzantine Empire times?


We don't know what he lacked/had. Sardinians would still be the closest population if he e.g., had mostly "Med" ancestry with say, 10% North European or 10% West Asian, or even 10% of each.

While what you're suggesting may explain _some_ of the West Asian component in the Balkans, it can't be the only explanation, because ~10% West Asian also occurs in the far reaches of Europe in places like Ireland and Norway.

mooreisbetter said...

Some of this makes no sense whatsoever, like Sardinians being pure because they descended from Neolithic farmers.

Don't the most isolated Sardinians of La Barbagia bear the highest frequency of Y C Hg I-M26 and mtDNA Hg U? Aren't those believed to be pre-Neolithic?

jackson_montgomery_devoni said...

Wow this truly is exciting! So could we chalk up what the main European Dodecad K12b components may represent then?

North European=Admixture between Mesolithic and Neolithic Europeans component

Mediterranean=Neolithic European component

West Asian=Late Neolithic or Bronze Age West Asian origin component

Would any of this make sense?

andrew said...

It is worth pointing out a sound ecological reason that Scandinavia should be heavily admixed between a Northern and Sardinian component, while Spain is not so admixed.

The technological package associated with the Sardinian genetic profile included plant and animal domesticates that gave the Sardinian farmers a decisive edge over "Northern" hunter-gatherers, but that edge was a function of how competitive the farmer and herder package was with the hunter-gatherer package.

Spain's climate is reasonably close to the source climate of the plant and animal domesticates that the Sardinian type people introduced, so the edge provided by that package in Spain was decisive.

Scandinavia was a marginal package for the plant and animal domesticates (mostly because it is colder) that the Sardinian type people introduced, and the Scandinavians may have placed a higher degree of reliance on fishing. Also fishing is really intermediate in staying power between farming and hunter-gatherer food production. For example, while the non-nomadic lifestyle and coincident pottery development arrived only with farming and herding in the Fertile Crescent, in East Asia, fishing civilizations developed pottery first and farming second. It is quite possible that Scandinavian hunter-gatherers may have been more fishing based relative to terrestrial food sources than Spanish hunter-gatherers. It is also possible that Spanish hunter-gatherers simply had someplace acceptable to flee to from the incoming Sardinian type migrants as evidenced perhaps by the hg V levels that Spain and Scandinavia share, while Scandinavians were cornered. The pressures felt by cornered relicts (admittedly admixed and adapted somewhat by then) from outsiders could help explain politically and sociologically why Scandinavians would have transitioned from a hunting-gathering mode to an aggressive raiding mode during the Viking era. Like Eminem, for them, failure wasn't an option and they did what they felt they had to do to survive as a people in the face of unfair competition.

Historically, after Scandinavia first transitioned to farming and herding when the climate was favorable to that, it then "reverted" to hunting-gathering-fishing-raiding when the local climate became unfavorable for farming and herding the European domesticate package, only to go back to farming and herding again when the climate changed again and as the farming and herding techniques became more sophisticated.

In contrast, once Spain transitioned from hunting and gathering to farming and herding, it never reverted to hunting and gathering.

The transitional period when hunting and gathering and fishing were roughly equal in productiveness to farming and herding (something that much of Europe never experienced, since the technology was developed in the Near East and then leapfrogged in, already refined and optimized for the climate when it arrived in Western Europe) is a natural juncture for admixture.

In the same vein, models of relative proportions of language speakers in a bilingual community show that the relative economic prestige and benefits of membership in the respective linguistic groups is one of the main drivers of language shift in one direction or the other. Admixture is going to be greatest when it isn't clear at the moment which group will turn out to be more prosperous.

apostateimpressions said...

"The most striking finding is a clear signal of admixture into northern Europe, with one ancestral population related to present day Basques and Sardinians, and the other related to present day populations of northeast Asia and the Americas."

I have suspected this for a couple of years, not just because of Hg. N but due to my brief immersion in the Viking black metal scene. Odin and his aspects are rarely depicted in the modern iconography with striking Mongolianism. The snow is a bit deeper, the accompanying wolves are beefier with longer hair and Odin's dress is furrier and his horns more rounded. Brief research revealed commonalities between historical Germanic and Siberian paganism. Overall, the artistic symbolism implies an ancient ecological and cultural continuum in north Eurasia stretching westward into Scandinavia and its surrounding environs. My guess has been that a north Eurasian hunter-gatherer continuum reformed from the Paleolithic remnants during the Mesolithic and continued through the Bronze Age. Hence the East Eurasian-like shift in north Europeans. The rest is Neolithic and Bronze Age from the Near East. Intuitive and even superficial but possibly bang on. : )

raphael petit said...

What is also worth noting is that the authors again confirmed Moorjani's results about sub-Saharan gene flow from Africa into Sardinia:

"There is some modest level of sub-Saharan (probably west African-related) gene flow from Africa into Sardinia as is shown by analyses in MOORJANI et al. (2011)." (p.44)

Dienekes said...

The current paper does not deal with the issue of African gene flow into Sardinia at all. The levels of that gene flow in the cited paper were estimated using f4 ancestry estimation, using statistics of the form f4(CEU, Sardinian, East Eurasian, African) under the assumption that East Asians were an outgroup to Europeans; that assumption is now shown to be false, and hence the estimated proportions of African-related ancestry are not robust.

The non-robustness of these proportions can be shown in two different ways: either by showing how they are inflated when the East Eurasian "outgroup" is the one most similar to the population contributing genes to north Europe, or by masking out all segments having even a remote possibility of African admixture in Sardinians.

http://dienekes.blogspot.com/2012/08/scrubbing-sardinians.html

eurologist said...

What if much of the European neolithic was indigenous Balkan? Or part of it (Cardium) was not, but during migration became mostly old Mediterranean? In that case, there was little replacement in the south.

Mesolithic Balkans and Italians and Iberians could have been mostly rather similar (as they are today, except for more recently introduced W and SW Asian components).

If true, there was actually little neolithic replacement in the Mediterranean Europe.

Onur said...

First, the Uygur (790 ± 60 year ago) shows a very good fit, and interesting things were happening in Central Asia in the 13th century.

As conceded by the authors of the paper, that date is too recent and so is not a good fit (as is clear from ancient genetics, history, ancient anthropology and archaeology). I think the major reason why they estimated the timing of the Caucasoid-Mongoloid admixture of Uyghurs wrong is that the Caucasoid-Mongoloid admixture of Uyghurs occurred over a period of thousands of years, which involved many and punctuated major admixture episodes.

eurologist said...

"What if much of the European neolithic was indigenous Balkan? Or part of it (Cardium) was not, but during migration became mostly old Mediterranean? In that case, there was little replacement in the south.

Mesolithic Balkans and Italians and Iberians could have been mostly rather similar (as they are today, except for more recently introduced W and SW Asian components).

If true, there was actually little neolithic replacement in the Mediterranean Europe. "

See also the talk by Christina Papageorgopoulou, recently covered by Dienekes:

http://dienekes.blogspot.com/2012/12/talk-by-christina-papageorgopoulou-on.html