September 14, 2011

The Caucasus revisited (Yunusbayev et al. 2011)


This is another treasure trove of a paper, and together with Balanovsky et al. (2011) we now have a very clear picture of genetic variation in this most interesting of world regions.

Here is the ADMIXTURE analysis:

The authors also post results up to K=10 in the supplementary material, which show Druze-, Bedouin-, and Basque-centered components. It is actually possible to push the analysis higher than K=7 without such problematic population-specific components appearing, by retaining only individuals who are not closely related (using --genome in PLINK and then iteratively removing individuals from pairs with PI_HAT greater than some value).
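The relatedness-pruning step just described can be sketched in Python as follows. It assumes the whitespace-delimited table that PLINK 1.x writes for --genome (e.g. plink.genome), with a header row naming the FID1, IID1, FID2, IID2 and PI_HAT columns. The removal rule, the 0.1 threshold, and the function names are illustrative assumptions: the post does not say which member of a pair was dropped or which cutoff was used. Here, each round drops whoever appears in the most remaining over-threshold pairs, so a single individual related to several others is removed once instead of losing all of their relatives.

```python
from collections import Counter

# Toy rows in the column layout PLINK 1.x writes for --genome (later columns
# omitted). Illustrative values only, not data from the paper.
DEMO = """\
FID1 IID1 FID2 IID2 RT EZ Z0 Z1 Z2 PI_HAT
F1 A F1 B OT 0 0.00 1.00 0.00 0.50
F1 A F2 C UN NA 0.60 0.40 0.00 0.20
F1 B F2 C UN NA 0.94 0.06 0.00 0.03
F2 C F3 D UN NA 0.90 0.10 0.00 0.05
"""


def read_related_pairs(genome_text, threshold):
    """Return ((FID, IID), (FID, IID)) pairs whose PI_HAT exceeds threshold."""
    rows = [line.split() for line in genome_text.strip().splitlines()]
    header, body = rows[0], rows[1:]
    col = {name: i for i, name in enumerate(header)}
    pairs = []
    for r in body:
        if float(r[col["PI_HAT"]]) > threshold:
            a = (r[col["FID1"]], r[col["IID1"]])
            b = (r[col["FID2"]], r[col["IID2"]])
            pairs.append((a, b))
    return pairs


def prune_related(pairs):
    """Greedily drop individuals until no over-threshold pair remains.

    Each round removes whoever appears in the most remaining pairs
    (ties go to the lowest (FID, IID) in sort order); this is one
    hypothetical reading of "iteratively removing individuals".
    """
    removed = set()
    remaining = list(pairs)
    while remaining:
        counts = Counter(person for pair in remaining for person in pair)
        worst = max(sorted(counts), key=counts.__getitem__)
        removed.add(worst)
        remaining = [pair for pair in remaining if worst not in pair]
    return removed


if __name__ == "__main__":
    related = read_related_pairs(DEMO, threshold=0.1)
    print(sorted(prune_related(related)))  # individuals to exclude
```

With real data one would read the actual .genome file instead of DEMO and pass the removed FID/IID pairs back to PLINK (for example via its --remove option) before running ADMIXTURE.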

Nonetheless, the components emerging from this analysis will be familiar to followers of the Dodecad Project. In terms of Dodecad v3:
  • light yellow "North East Asian"
  • orange "South East Asian"
  • brown "Neo African" or "Sub_Saharan", as there are no African hunter-gatherers in the sample
  • dark blue "North European", as there is no split of east/west Europe at this level
  • middle blue "West Asian"
  • light blue "Southwest Asian"
  • green "South Asian", but anchored on Sindhi, a population from Pakistan, due to the lack of more southern populations from India
The labels of new populations sampled in this study are shown in brown. I particularly hope that the substantial new autosomal data will become publicly available, so that I can use them in the Dodecad Project. They will be an invaluable new resource, filling some "holes" in the Eurasian landscape (e.g., east of the Caspian; Bulgarians; several new Caucasus populations) left by the Li et al. (HGDP) and Behar et al. datasets.

(to be continued)

UPDATE I (Y-chromosomes):


Some observations:
  • C has a concentration in the Turkic Nogays
  • The presence of D this far west, again in the Nogays, is very surprising. This haplogroup has a relic distribution, with particular concentrations in Tibet, Mongolia, Japan, and the Andaman Islands. In all likelihood its presence here is linked to the Nogays' eastern origin
  • E and its subclades occur at a very low frequency here
  • G2a has a clear West Caucasus (both north and south) concentration
  • I, a common European haplogroup, also seems to have a mainly West Caucasus distribution, yet it reaches quite elevated frequencies among the Andis and Kara Nogays. It would be interesting to discover some historical correlate for the presence of I in Kara Nogays but not Kuban Nogays, and in Andis but not in most of the NE Caucasus
  • J1 has the expected Northeast Caucasus nexus. This haplogroup is bimodal, with a mode in Arabians and a secondary mode in the NE Caucasus. Note the paucity of J1e-P58, the reverse of the situation in Arabians; I've noted before the likely association of the P58 clade with Semitic languages.
  • The extreme concentration of J2 in Chechens and Ingush is probably associated with low variance. Apart from these atypical populations, a substantial presence of this haplogroup can be found in different NW/S Caucasus populations, in the form of different subclades.
  • The new LT mystery clade has its usual low-frequency wide distribution
  • N occurs in Nogays as expected and, like C, also in the NW Caucasus. This probably also represents an eastern influence, associated not only with the Nogays but also with various Tatar influences on the Caucasus.
  • Q occurs widely in the NW Caucasus but only in 1 Nogay. Perhaps this is more of a Tatar marker, although a finer-scale resolution of this haplogroup is really necessary.
  • R1a-related lineages occur less frequently here than among eastern Slavs, a main reason for the disconnect between the Eastern European plain and the Caucasus. There does, however, appear to be good diversity here, with the presence of R1a* and R1a1-M198*, among others. Note again how the Iranic Ossetians (both North and South) have almost no R1a1 compared to both their NW Caucasian and S Caucasian neighbors, suggesting that this may not have been an important Alan or steppe Iranian lineage, at least during the late antique time horizon. The occurrence of R1a1f-M458 may represent Slavic influence in the NW Caucasus.
  • R1b-related lineages seem ubiquitous in the Caucasus. R-M73 occurs substantially in Kara Nogays and Balkars, an apparent link with Central Asia, where this haplogroup occurs frequently.
UPDATE II (Caucasus-Eastern Europe discontinuity)

The authors of this paper highlight the genetic discontinuity between the eastern European plain and the Caucasus. This was also apparent in the Balanovsky et al. (2011) paper, and was also a major conclusion of the Dodecad Project, with Caucasians exhibiting a high percentage of the "West Asian" component, and eastern Slavs a low "West Asian" and a high "East European" percentage.

The interpretation of this discontinuity is more difficult. There are surely parts of the Caucasus region that are mountainous and pose an ecological contrast to the flatlands of eastern Europe. That is consistent with different populations living in each region for a long time, despite the well-attested archaeological contacts (e.g., Maikop, or the settlement of steppe nomads such as Alans or Sarmatians).

On the other hand, the eastern Slavic population may, at least in part, have expanded more recently, in the medieval period, as part of the early Slavic dispersals and of the Russian push to the north and east. These appear to have partly displaced Turkic groups from the north Pontic region, with all of the above having displaced the historical Scythian (Iranic) nomads, who, in turn, displaced the mysterious Cimmerians. If the discovery of east Eurasian mtDNA C in Neolithic and Bronze Age Ukraine stands up, there will be another layer of population replacement, as mtDNA C is quite rare in the broader region today. The Caucasus itself, in turn, may have been affected by population movements from the Near East, as Balanovsky et al. suggest.

So, in conclusion, the discontinuity is a fact that emerges from different types of analyses, but its causes remain uncertain, and it is not clear when and how it was first established.

Mol Biol Evol (2011) doi: 10.1093/molbev/msr221

The Caucasus as an asymmetric semipermeable barrier to ancient human migrations

Bayazit Yunusbayev et al.

Abstract

The Caucasus, inhabited by modern humans since the Early Upper Paleolithic and known for its linguistic diversity, is considered to be important for understanding human dispersals and genetic diversity in Eurasia. We report a synthesis of autosomal, Y chromosome and mitochondrial DNA (mtDNA) variation in populations from all major subregions and linguistic phyla of the area. Autosomal genome variation in the Caucasus reveals significant genetic uniformity among its ethnically and linguistically diverse populations, and is consistent with predominantly Near/Middle Eastern origin of the Caucasians, with minor external impacts. In contrast to autosomal and mtDNA variation, signals of regional Y chromosome founder effects distinguish the eastern from western North Caucasians. Genetic discontinuity between the North Caucasus and the East European Plain contrasts with continuity through Anatolia and the Balkans, suggesting major routes of ancient gene flows and admixture.


26 comments:

Kartveli said...

Dienekes,

Did they collect new Georgian samples or do they use Behar's data?

apostateimpressions said...

It is interesting to see the Dai and the Lahu -- absolutely pure SE Asians. The Lahu have no tribes or clans above the family level but they are divided into colour groups that allude to the traditional style of clothing. Unfortunately some Lahu villages have been converted to Christianity, a religion that is totally foreign to them.

http://en.wikipedia.org/wiki/Lahu_people

It is interesting to see the heavy W/ SW Asian admixture in the W/ S European populations, French, Italians, Romanians and Bulgarians, with levels reaching towards 50% and beyond in Italians -- with little or no NE/ S Asian admixture however. E Europeans by contrast have higher levels of NE/ S Asian, approaching 1/5 in Russians but little W/ SW Asian.

Turks on the other hand have higher levels of the W/ SW Asian components than Italians and Bulgarians, they have similar levels of S/ NE Asian to E Europeans and with the addition of a very slight SE Asian component in common with Iranians and Bulgarians.

The Jewish groups stand out here as almost completely Asian with absolutely minimal European admixture. They are similar in structure to the Druze but with sizable S Asian admixture.

The SS African component is virtually absent from Europeans and Turks and is limited to a slight presence in Iranians and other groups to the south.

It seems notable that no S European component is distinguished here.

princenuadha said...

"Genetic discontinuity between the North Caucasus and the East European Plain contrasts with continuity through Anatolia and the Balkans, suggesting major routes of ancient gene flows and admixture."

But look at how much Europe starts to curve towards the east as you go towards northeastern Europe. And there are a few Caucasians very close to Europeans that are mapped on the "Black Sea".

I think it looks like an eastern population, like the Caucasians, had a big impact on the Europeans, and not just the Near East.

Dienekes said...

Did they collect new Georgian samples or do they use Behar's data?

Behar's data.

yisha3 said...

Regarding P58, the association with Semitic would be obvious if not for the M368 (J1c3a) clade.
I think you should be referring to P58's major subclade, L147.1 (J1c3d) yet I will agree on a singular issue:
No study addressed this marker... Yet.

Gioiello said...

Anyway, R1b1b2 is at 6.5%, rather low for a haplogroup supposedly born in the Middle East or in Central Asia.
R1b1b1 is at 1.69%, much more in percentage, but we know that Central Asia has a higher frequency of this haplogroup, even though I think Western Europe has more variance and is probably also the origin of this haplogroup.
R1a (I presume M420): only 1 sample. I think this is the periphery of Western Europe, given that it is completely absent from India.
R* (I presume M207): only 1. Infinitely more among the 50 Tuscans of the 1000 Genomes Project: 2.

Probably this isn’t good news for “marcantonio”.

tndl said...

Not surprisingly, I2a is well represented. I believe Dodecad also found a connection between the Balkans and the Caucasus. Might this map hold the answer?
http://en.wikipedia.org/wiki/File:Map_of_Colchis,_Iberia,_Albania,_and_the_neighbouring_countries_ca_1770.jpg

Annie Mouse said...

I am taking from this:

NE Caucasians = J1*

NW Caucasians = G2a* with R1a1* and a noticeable trace of J2a*

Southern Caucasians = NW Caucasians with maybe a bit more J2a*. Consistent with the entrance of this haplogroup from the south.

J2a* spikes in the Chechens and Ingush for some reason. I wonder where these groups came from?

R1b is present but not consistently. I would say it is notably under-represented in the Caucasus. These folk don't appear to have come from here.

Very interesting, as from what we can see the neolithic/Bronze age in Western Europe seems to be associated with G2a. So, evidence supporting the idea that the neolithic spread from the NW Caucasus. Not proof but most definitely evidence.

sykes.1 said...

What, pray tell, is the map supposed to show?

Evidently there has been a change in color coding, so it's not possible to go from the admixture chart to the pca map.

Some commentary would be helpful.

Ebizur said...

Dienekes wrote,

"Q occurs widely in the NW Caucasus but only in 1 Nogay. Perhaps this is more of a Tatar marker, although a finer-scale resolution of this haplogroup is really necessary."

Approximately one quarter of male Nogays possess Y-DNA that appears to suggest an ancestral origin in eastern Asia:

15/163 = 9.2% C

15/163 = 9.2% N

7/163 = 4.3% O

5/163 = 3.1% D

However, the presence of haplogroups O and D, and the breakdown of the subclades of haplogroups C and N that have been found in the Nogays suggest that the eastern Asian influence on this population derives from Central Asia or East Asia sensu stricto:

13/163 = 8.0% C(xC3c)
2/163 = 1.2% C3c

8/163 = 4.9% N(xN1b, N1c)
3/163 = 1.8% N1b
4/163 = 2.5% N1c
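As an arithmetic check on the counts above, a short Python sketch (the dictionary layout and the pct helper are purely illustrative) confirms that the four eastern haplogroups add up to 42 of 163 Nogay men, about 25.8%, which is the "approximately one quarter", and that the subclade counts sum back to the C and N totals:

```python
# Nogay Y-chromosome counts (n = 163) as listed in the comment.
N_TOTAL = 163
EASTERN = {"C": 15, "N": 15, "O": 7, "D": 5}
SUBCLADES = {
    "C": {"C(xC3c)": 13, "C3c": 2},
    "N": {"N(xN1b, N1c)": 8, "N1b": 3, "N1c": 4},
}


def pct(count, total=N_TOTAL):
    """Percentage of the sample, rounded to one decimal place."""
    return round(100 * count / total, 1)


eastern_total = sum(EASTERN.values())
for hg, count in EASTERN.items():
    print(f"{hg}: {count}/{N_TOTAL} = {pct(count)}%")
print(f"eastern total: {eastern_total}/{N_TOTAL} = {pct(eastern_total)}%")

# Each subclade breakdown should add back up to its parent haplogroup count.
for hg, parts in SUBCLADES.items():
    assert sum(parts.values()) == EASTERN[hg], hg
```

The same pct helper reproduces each rounded figure quoted above (e.g. 4/163 gives 2.5%).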

A recent influence from North Asia (Siberia) should instead have resulted in elevated frequencies of C3c, N1b, and N1c. N(xN1b, N1c) is actually extremely rare in Siberia (except for the southernmost "corner" of Siberia, i.e. the Altai region, which is latitudinally further south than some parts of Kazakhstan, Mongolia, or Manchuria). The combination of N(xN1b, N1c), O, and D Y-DNA is almost surely a sign of some genetic influence from some place located between (and within the same range of latitude as) the erstwhile Soviet Central Asia and Japan. Haplogroup Q is generally quite rare in populations of this region (except for the Altai), so I do not see any reason to expect the Nogays to have a higher frequency of Q than their neighbors in the Caucasus in the first place.

Jim said...

"So, evidence supporting the idea that the neolithic spread from the NW Caucasus. Not proof but most definitely evidence."

That sounds like the proto-Pontic proposal I first saw referenced here.

Andrew Oh-Willeke said...

The Caucasus are actually the most "pure" in the "West Asian" component, and are a fairly plausible source for that component.

The unity of the Caucasus in sharing that West Asian autosomal component as strongly as it does is notable, given the immense lack of homogeneity in uniparental markers among the various subpopulations of the Caucasus. Given the remarkably low level of intergroup admixture revealed by the differing uniparental mixtures and the linguistic divides at short distances in the Caucasus, this points to the autosomal unity having an origin of great time depth.

terryt said...

"this points to the autosomal unity having an origin of great time depth".

Very true.

"The interpretation of this discontinuity is more difficult. There are surely parts of the Caucasus region that are mountainous and pose an ecological contrast to the flatlands of eastern Europe".

The mountains are likely to have been a refuge for earlier groups while people moved freely across the plains.

"13/163 = 8.0% C(xC3c)
2/163 = 1.2% C3c"

So what is that 'C(xC3c)'? C5 or something else?

"The combination of N(xN1b, N1c), O, and D Y-DNA is almost surely a sign of some genetic influence from some place located between (and within the same range of latitude as) the erstwhile Soviet Central Asia and Japan".

And C(x3c) is from there?

Ebizur said...

terryt asked,

"So what is that 'C(xC3c)'? C5 or something else?"

C5 has been found in some samples from Central Asia and the PRC, but it is very rare there (as it is in South Asia and Southwest Asia, the only other regions in which it has been detected so far). The C(xC3c) individuals in Yunusbaev's samples of Nogays are much more likely to belong to haplogroup C3(xC3c).


terryt asked,

"And C(x3c) is from there?"

I would guess so. C3(xC3c) is also found in indigenous Siberian populations, but the C3c subclade is generally much more common in Siberia. C3(xC3c) is found commonly and accompanied by very low frequencies of C3c in populations of, for example, the PRC (e.g. Manchus, Sibes). These indigenous populations of what are now the northernmost parts of the PRC also possess N(xN1b, N1c), O, and D Y-DNA, so they may be considered as potential source populations for the eastern influence observed in the Nogay Y-DNA pool.

eurologist said...

On the autosomal level, I agree there are pockets of a substrate from pre-European population. There are also a couple of y-haplogroups that are rather basal (pre-LGM R1b relicts). But:

"Very interesting, as from what we can see the neolithic/Bronze age in Western Europe seems to be associated with G2a. So, evidence supporting the idea that the neolithic spread from the NW Caucasus."

No - G2a in the Caucasus is derived, not basal. By looking at the fairly to highly derived states of all these tens of haplogroups in the Caucasus, it should be clear that it was the recipient of many rather different refugees. Tribes/people came, claimed a valley, and literally fiercely defended it for millennia.

Onur said...

Hi guys,

I have been corresponding via email with Mait Metspalu, one of the authors of both this paper and the Behar et al. 2010 paper, for some months. In our correspondence Mr. Metspalu informed me that all of the Turkish samples used in this paper are exactly the same as the Turkish samples used in Behar et al. 2010 (19 in total) and, more importantly, that all of the Behar et al. Turks (and consequently all of this paper's Turks) were sampled from the region of Turkey historically known as Cappadocia. This confirmed my suspicions: ever since the publication of the Behar et al. 2010 paper I had suspected that its Turks came from a rather limited region of Turkey, and their contrast with the genetically much more heterogeneous Dodecad Turks, who are from all over Turkey, had strengthened that suspicion. So, as the Dodecad Turks are from all over Turkey while all the Turks used in this paper and in Behar et al. 2010 are exclusively from the historic Cappadocia region, the Dodecad Turks are much more representative of ethnic Turkish genetic variation than the Turks used in these two papers.

According to the Dodecad Project's standard ADMIXTURE analysis results, the most noticeable differences between the Dodecad Turks and this paper's/Behar et al. 2010's Turks are these: the average summed Mongoloid component is 5.2% for the Dodecad Turks (23 samples as of now) and 6.9% for this paper's/Behar et al. 2010's Turks (19 samples, as I said); the average South Asian component is 2.3% for the Dodecad Turks (less than the 2.8% of the Dodecad Armenians [20 samples as of now], and almost equal to the 2.2% of the Dodecad Assyrians [12 samples as of now]) and 3% for this paper's/Behar et al. 2010's Turks; and on average the Dodecad Turks have more West European, more Mediterranean, more Southwest Asian and less West Asian than this paper's/Behar et al. 2010's Turks. Also according to the Dodecad Project's standard ADMIXTURE analysis results, the average summed Mongoloid component of the Dodecad Turks (5.2%) is very similar to that of the HGDP Adyghe (5.5%). Incidentally, Mr. Metspalu also informed me that the HGDP Adyghe are the same Adyghe samples used in this paper and in Behar et al. 2010.

terryt said...

"The C(xC3c) individuals in Yunusbaev's samples of Nogays are much more likely to belong to haplogroup C3(xC3c)".

Thanks Ebizur. That's what I thought on reflection.

"it was the recipient of many rather different refugees. Tribes/people came, claimed a valley, and literally fiercely defended it for millennia".

That makes sense.

Annie Mouse said...

I would expect the residual basal haplogroups to spread everywhere the population did, and derived haplogroups to spread from the point of origin of the mutation. It does not trouble me that the G2a might be derived; it should be, after all this time. It might trouble me a little if there was no basal, but I have not seen data for this. Perhaps you have a better source for the details of G2a?

Plus a basal group of men could easily create a new founder effect in another area at a later date, complicating the view.

terryt said...

Sorry. Not really on the subject of the Caucasus:

"C5 has been found in some samples from Central Asia and the PRC, but it is very rare there (as it is in South Asia and Southwest Asia, the only other regions in which it has been detected so far)".

That is far wider than just 'South Asia', the region so often quoted for the haplogroup. I notice its only subgroup is C5-P92. Any idea where that is found?

"I would expect the residual basal haplogroups to spread everywhere the population did, and derived haplogroups to spread from the point of origin of the mutation".

That is what I have always presumed. And I have always thought that the geographic range of mtDNA A makes it difficult to support an original N expansion east through South Asia. It's beginning to look as though the geographic distribution of Y-hap C makes it difficult to support an original expansion east through South Asia as well.

"Plus a basal group of men could easily create a new founder effect in another area at a later date, complicating the view".

I agree that is quite possible for the Y-chromosome, such as C, but is fairly unlikely for the mtDNA lineages.

eurologist said...

Annie Mouse,

I agree that things can get complicated quickly. Nevertheless, some things don't change. If you have a source population, then migrants from that, who simultaneously expand, will (i) be a subset (due to chance and drift), and (ii) form one or several new stars around that subset. Typically, a small source population will not expand but rather get fixed to fewer sub-groups due to drift, and will have no star-like pattern at the equivalent level. G2a pretty much disagrees with all of these expectations in the Caucasus.

A reasonable conclusion would be that G2a underwent growth both in Europe and in Anatolia (or some other Black Sea region) at roughly the same time (absolutely latest post Younger Dryas and very early neolithic, based on Derenburg) - but separately, like R1b1a2(M269) did (IMO). Then, some of that entered the Caucasus.

Dienekes posted a G2a median joining network figure in his Treilles entry:
http://3.bp.blogspot.com/-a9yFwkju2UU/TeVVmTZ6D8I/AAAAAAAADzw/qWptkbj-evk/s1600/g2a.png

We obviously need to dig much deeper into G2a SNPs to get a clearer picture. Luckily, as opposed to R1b, we have several really good timings from ~7,500 and >5,000 ya.

Annie Mouse said...

Hmm, I forgot that diagram. I wish it were annotated so you could clearly see the root and branches. I can't really tell what is what. It would be nice to see where Otzi and the rest sit.

terryt said...

"Typically, a small source population will not expand but rather get fixed to fewer sub-groups due to drift, and will have no star-like pattern at the equivalent level".

That seems to be what we see in basal G. A relatively prolonged period of drift in some, as yet unknown, region before it expanded. But where was that region? Hardly South Asia or it would have expanded along with the Indian Y-haps.

"If you have a source population, then migrants from that, who simultaneously expand, will (i) be a subset (due to chance and drift), and (ii) form one or several new stars around that subset".

I can see a case where that might not be so. If the source population remained isolated in its region of coalescence after members had left it would then be subject to continued drift. The apparently 'derived' haplogroup may in fact still reside in the original region and the apparent G*, or whatever, is the migrant population.

"We obviously need to dig much deeper into G2a SNPs to get a clearer picture".

Very true.

Annie Mouse said...

While trying to sort out what was what in the Treilles linkage map, I realized it is based entirely on STR data, which has recently come into disrepute. STRs are also particularly difficult to relate hierarchically in this way. So, regretfully, I think we have to view this pretty map with extreme caution.

I tried to find a better SNP-based one but couldn't. This is the closest I can get (Wikipedia, :) ).

G1= Iran
G2= East of Turkey (probably)

G2a1=Caucasians (and Stalin)
G2a2=British, Turkish
G2a3=Neolithic LBK, Europe, Turkey, Armenia
G2a4=Otzi, Western Europe, North Africa

G2b=Turkey (1 man)
G2c=Ashkenazi

Basically it looks like the population represented by G2a2 and G2a3 was present in both Turkey and Western Europe.

G2a1 is only about 3k years old so would not have been around when folk from the area headed across to Europe (neolithic LBK G2a3 is 5k old). This represents a post neolithic expansion from most probably a G2a population.

G2a4 was only identified in 2009, so we don't have much data on it, and it could well be in Turkey or the Caucasus also.

Everything we actually know is consistent with a population from the Caucasus (or nearby) being the source for European G2a.

eurologist said...

"I think we have to view this pretty map with extreme caution"

Annie Mouse,

I agree 100%. I think it shows that it is based on way too few markers - most of the transitions are not believable unless G's were all European traveling salesmen.

Of course one has to be careful about at what level in the tree people moved around. The G-Project people seem to think G's traveled to Europe every few hundred years, with each new group mostly wiping out their progenitors.

Well, that is a bit exaggerated, but they seem to be preoccupied with fancy migration stories and forget that G (at several levels), once in Europe, formed its own large clusters, and some amount of back-migration is to be expected. And of course, their time estimates are way off (too young by a factor of at least 2-3 in many cases, IMO). For example, they list G2a4 as 3,000 years old, when we know it is at the absolute minimum 5,000 years old, and most G2a3 subgroups as only ~1,000 to a few thousand years old, when we know that at the base level it is at least 7,000 years old (Derenburg) - and of course it should have begun its explosion in Europe right with LBK.

At any rate, from their compilation, G2* is found both in Europe and Armenia, G2a* basically everywhere where G is found. But only a few, derived sub-groups are Caucasus-specific: e.g., G2a1.

G2a3b1, the one most prevalent in Europe, is supposedly only 5,000 years old according to Wikipedia, yet found in North Africa, India, Uzbekistan, Siberia, Malaysia, and China! This is also found at high prevalence in the Caucasus, but is highly derived, so likely got there the same way it got (almost) all over the world.

A more parsimonious hypothesis is that 95% or so of G in Europe entered exactly twice: once before LGM (as G2a*), and once with LBK (perhaps from being present in the Balkans before agriculture), as G2a3b1*, G2a3b1a1* and G2a3b*.

Onur said...

BTW, Mr. Metspalu also told me that all of the Behar/Yunusbayev Turks are autochthonous Turks of the historic Cappadocia region, so, for instance, they do not descend from emigrants to that region from the Balkans or the Caucasus. So the Behar/Yunusbayev Turks can at most represent Turks from the historic Cappadocia region. I say "at most" because Mr. Metspalu doesn't know which provinces of historic Cappadocia they are from; he only knows that they are all autochthonous Turks of that region. So they may be from a single province of historic Cappadocia or from several; according to Mr. Metspalu, only the sample collectors know these details.

terryt said...

"G1= Iran
G2= East of Turkey (probably)"

That's interesting. And revealing. Region of origin?