October 26, 2013

Afghan mega-paper (Di Cristofaro et al.)

The admixture results are nicely presented on a map:

The authors note that none of the ancestral components peaks in Central Asia, concluding that this region has been a destination rather than a source of population movements. I certainly agree that Central Asia has a lot of recent history affecting it from virtually all directions. On the other hand, we should be cautious about interpreting geographical clines in terms of directionality of population movement; a good example is Sardinia, which often emerges as a "focus" of Mediterranean ancestry, but this does not mean that it is the origin of such ancestry. It would certainly be interesting to remove the layers of more recent ancestry from Central Asia to see what was there before the last few thousand years.

The PCA based on autosomal data:

The Y-chromosome haplogroup data can be found in Figure S7. The authors comment:
94% of the chromosomes are distributed within the following 9 main haplogroups: R-M207 (34%), J-M304 (16%), C-M130 (15%), L-M20 (6%), G-M201 (6%), Q-M242 (6%), N-M231 (4%), O-M175 (4%) and E-M96 (3%). Within the core haplogroups observed in the Afghan populations, there are sub-haplogroups that provide more refined insights into the underlying structure of the Y-chromosome gene pool. One of the important sub-haplogroups includes the C3b2b1-M401 lineage that is amplified in Hazara, Kyrgyz and Mongol populations. Haplogroup G2c-M377 reaches 14.7% in Pashtun, consistent with previous results [31], whereas it is virtually absent from all other populations. J2a1-Page55 is found in 23% of Iranians, 13% of the Hazara from the Hindu Kush, 11% of the Tajik and Uzbek from the Hindu Kush, 10% of Pakistanis, 4% of the Turkmen from the Hindu Kush, 3% of the Pashtun and 2% of the Kyrgyz and Mongol populations. Concerning haplogroup L, L1c-M357 is significantly higher in Burusho and Kalash (15% and 25%) than in other populations. L1a-M76 is most frequent in Balochi (20%), and is found at lower levels in Kyrgyz, Pashtun, Tajik, Uzbek and Turkmen populations. Q1a2-M25 lineage is characteristic of Turkmen (31%), significantly higher than all other populations. Haplogroup R1a1a-M198/M17 is characterized by its absence or very low frequency in Iranian, Mongol and Hazara populations and its high frequency in Pashtun and Kyrgyz populations.
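As a quick sanity check on the percentages quoted above, the nine main haplogroups can be summed to confirm the stated 94%. This is only a minimal sketch: the figures are copied from the quoted passage, they are rounded percentages (so the total is approximate), and the variable names are my own.

```python
# Frequencies (in percent) of the nine main Y-chromosome haplogroups,
# copied from the passage quoted above. Values are rounded in the source.
main_haplogroups = {
    "R-M207": 34,
    "J-M304": 16,
    "C-M130": 15,
    "L-M20": 6,
    "G-M201": 6,
    "Q-M242": 6,
    "N-M231": 4,
    "O-M175": 4,
    "E-M96": 3,
}

# Total share covered by the nine main haplogroups.
total = sum(main_haplogroups.values())

# Share covered by the three most frequent haplogroups alone.
ranked = sorted(main_haplogroups.items(), key=lambda kv: kv[1], reverse=True)
top3_share = sum(freq for _, freq in ranked[:3])

print(f"{len(main_haplogroups)} haplogroups, {total}% of chromosomes")
print(f"top three (R, J, C): {top3_share}%")
```

The total agrees with the authors' 94%, and R, J and C alone account for roughly two thirds of the chromosomes.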

PLoS ONE 8(10): e76748. doi:10.1371/journal.pone.0076748

Afghan Hindu Kush: Where Eurasian Sub-Continent Gene Flows Converge

Julie Di Cristofaro et al.

Despite being located at the crossroads of Asia, genetics of the Afghanistan populations have been largely overlooked. It is currently inhabited by five major ethnic populations: Pashtun, Tajik, Hazara, Uzbek and Turkmen. Here we present autosomal from a subset of our samples, mitochondrial and Y-chromosome data from over 500 Afghan samples among these 5 ethnic groups. This Afghan data was supplemented with the same Y-chromosome analyses of samples from Iran, Kyrgyzstan, Mongolia and updated Pakistani samples (HGDP-CEPH). The data presented here was integrated into existing knowledge of pan-Eurasian genetic diversity. The pattern of genetic variation, revealed by structure-like and Principal Component analyses and Analysis of Molecular Variance indicates that the people of Afghanistan are made up of a mosaic of components representing various geographic regions of Eurasian ancestry. The absence of a major Central Asian-specific component indicates that the Hindu Kush, like the gene pool of Central Asian populations in general, is a confluence of gene flows rather than a source of distinctly autochthonous populations that have arisen in situ: a conclusion that is reinforced by the phylogeography of both haploid loci.



AdygheChabadi said...

I was just about to send an e-mail and ask whether you had seen this paper.

AdygheChabadi said...

After looking at the Admixture analysis (Figure S2) at K=9, I think I can make out Dienekes' 'Globe 13' components fairly accurately:

AC3/ Light blue: Southwest Asian + Mediterranean

AC4/ Medium Blue: North European

AC6/ Light Green: West Asian/ Caucasus

AC7/ Darker Green: South Asian (Overwhelmingly predominant component of the "ASI" [Ancestral South Indian] component)

The others are self-explanatory.

The near identicalness of the Brahui, Balochi, and Makrani is nothing new here.

I wonder what that AC5 = Kalash component is in the French? Tiny, but noticeable. Same with the Russians and North Italians. Seems absent in the Sardinians.

Also, looking at Figure S1, the "South Asian" component seems to break down into 2 components at >K=12. Any idea what that would be? Also, why do the Pulliyar shift to 100% of the darker green "South Asian" component (as opposed to the dominant medium green "South Asian" component in the Gujaratis) in the three highest K's?

There are other components in the >K=9 Admixture analyses. Don't know how informative they are though.

Also, Dienekes, when are you going to do another Admixture analysis?

Rob said...

shame there was no finer resolution of R1a ...

Joshua Lipson said...

More G2c, guys. There's no dismissing an Ashkenazi-Pashtun Y-DNA identity of some kind any longer. What's an actually viable theory? A Khazar-Pashtun link is geographically implausible, but I don't buy the Lost Tribe bullshit either, for the standard reasons. Any finer-grained STR analysis of the lineage out there?

Slumbery said...

AC6 differs significantly in distribution from the Dodecad West Asian component; however, this may come from the map being incomplete in those areas.

The authors presumably used the data from the black-dot places to draw this map, therefore their colours in Scandinavia, most of Central and East Europe, Arabia and Egypt are complete fantasy based on their preconceptions. (They have not used any data from there...)

They got strange results on Hazara R1b1a1. An earlier article (I can't remember exactly which) indicated that this Y-Hg has a very high frequency among the Hazara. Now they got zero from the big Afghanistan sample, but almost 1/4 from the Pakistan Hazara sample. This suggests that this Hg has a sporadic distribution and that the earlier research just picked up a non-typical subgroup. (I sincerely hope that the difference does not come from the Afghanistan Hazara data of this article, or of both articles, being sampled from a single village or valley or something like that...)

Seinundzeit said...

It is deeply unfortunate that they could only manage 4-5 autosomal samples for each ethnic group. I mean, they should've tried to match the HGDP sample sizes; 20-25 autosomal samples would have been great. Let's be frank, 5 samples for the largest ethnic group in the country doesn't really seem wise. Also, I understand that the focus was on the Hindu Kush, but a sampling opportunity in Afghanistan is very rare. They could've sampled Pashtuns from the heartland (Paktia, Kandahar, and Nangarhar), but they chose isolated communities in the far north of the country (communities which may have admixture from neighboring populations, but we can't verify this, since there are only 4-5 samples). And we don't even know how much variation exists in these isolated Pashtun groups in the north, because we only have 5 samples. But I guess this is much better than nothing.

cris said...

AC3=Etrusco-Tyrsenian component
AC4=Macro-Vascono-Caucasic component
AC6=Proto Indo-European component
AC7=Proto Dravidian component
AC8=Macro-Sinic component
AC9=Macro-Altaic component

Unknown said...

I really, really wish they would include North Africa in these analyses.

mm said...

Is this a good-quality study?

Do Khazars appear in other studies as a distinct people?

Joshua Lipson said...

The Elhaik study was a true sham. The "Armenians as proxy for Khazars" kindergarten-level mistake.

Mark D said...

"But I guess this is much better than nothing."

I beg to disagree. With such low sample sizes, it's no better than reading tea leaves. It lacks credibility.

eurologist said...

Incredibly, there is no sample in C Europe or N Europe, and the E European samples exclude most of medium and southern E Europe. AC3 looks like SW Asian, not Middle Eastern. AC4 is polluted by the poor CE sampling. AC6 looks Gedrosian, but what happened to AC5?

Weird sampling, weird study.

andrew said...

The extent to which the ethnicities of Afghanistan remain distinct from each other despite close proximity for a substantial length of time and near universal conversion to Islam by all of the ethnicities is notable.

Rob said...

As Dienekes said: "The authors note that none of the ancestral components peaks in Central Asia, concluding that this region has been a destination rather than a source of population movements. On the other hand, we should be cautious about interpreting geographical clines in terms of directionality of population movement"

The simplicity in which these "scholars" draw their conclusions merely re-affirms how cursory their level of knowledge is. They should stick to "number crunching" and abstain from pretending to offer anything inciteful

mregdna said...

"Khazar-Pashtun link is geographically implausible"

I don't think so, because a part of the Khazars could have been Indo-Iranians from Central Asia. In that case, a common ancestor of the Ashkenazi G2c and the Pashtun one could be dated to more than 1500 years ago.

Proto Khazars being Armenians and Georgians is a very strange hypothesis that does not fit with the history of Ashina Clan. But it's possible that some Caucasians were assimilated into Khazar tribes and Empire.

andrew said...

Everyone should "abstain from pretending to offer anything inciteful" as that would tend to breach the peace and not foster civility.

Trying to offer something "insightful" is certainly to be encouraged; it is simply harder to achieve than it seems.

Davidski said...

Don't worry Dr Rob, the scientists who wrote this paper are obviously way ahead of you, because what they say in this study correlates very well with the latest ancient DNA and data on the phylogeography of R1a.

In fact, I won't be surprised if some of the names on this paper soon appear on papers covering the phylogeography of R1a and our other favorite haplogroups based on complete Y-chromosome sequences.

Rob said...

Davidski, what do u mean? Which R1a data from which paper? What ancient DNA exists from Central Asia?

Rob said...

The "peaks" in themselves mean little. E.g. the peak in Saami areas certainly doesn't represent a source of population migration, but rather their peculiar structuring due to relatively small Ne, drift, etc.
Whilst locally informative, they are less so on a global scale.

fmgarzam said...

About: “concluding that this region has been a destination rather than a source of population movements.” And “The absence of a major Central Asian-specific component indicates that the Hindu Kush, like the gene pool of Central Asian populations in general, is a confluence of gene flows rather than a source of distinctly autochthonous populations that have arisen in situ: a conclusion that is reinforced by the phylogeography of both haploid loci.”

My one cent: Confluence.

Afghanistan for centuries was the three way confluence place, the trade hub of the known world, peaking some 700 to 1000 years ago. Asia, Europe and Africa met there, a hub, the great meeting place. It was the quintessential trade/relay place, the gate to the other side of the world. Maritime and Oceanic navigation diminished its importance.

I understand that around 1000 years ago, in the time of Mahmoud of Ghazni, there were a lot of people living there; there was even a city of one million people there, later deserted. People from all over used to meet there for the great caravan market. We are talking about the gate to or from the Silk Road.

It probably was a destination for centuries for many, even Alexander the Great. Then when the time of decadence came it could have been a small source of people migration.

Spencer Wells was studying the Pashtun or some other group there, I wonder what happened.

In the very early '80s, by chance, I bought a cheap copy of Caravans, a novel by James A. Michener. It was a good way to try to learn about a who-knows-where country that had just been invaded by the Soviets. It is one of my greatest buys. It got me hooked on trade history.

Davidski said...

Dr Rob, you've never heard of the ISOGG Y-DNA Haplogroup Tree?

What about all the papers on ancient DNA from South Siberia, Western Siberia, Kazakhstan and Tarim Basin?

You seem to be very poorly informed for someone who apparently has an interest in R1a and the genetic history of Central Eurasia.

Rob said...

Davidski, perhaps I am indeed poorly informed. All I know is that they have discovered R1a in the places u mentioned. Exactly which subclades they were, and where they came from, remains to be demonstrated.

I know u have freely speculated that it was from the eastern European region. This may be so, but it still is far from proof of yr simplistic, and archaeologically incorrect, visions of "Kurgan" migrations.

Seinundzeit said...

It's been quite a while since Dienekes released a DIY Admixture calculator, and that's perfectly fine and reasonable. But the autosomal data for the "Afghan Hindu Kush" paper has finally been released. http://www.evolutsioon.ut.ee/MAIT/public_data/afghan/
I really hope Dienekes considers creating a new Calculator+Oracle with this data, it fills an important gap in Eurasia. It would be a very important addition.

Unknown said...

I like how you cite that it is a confluence or convergence location considering most of this is post neolithic expansion without a great deal of consideration for epi-paleolithic conditions. Recent discoveries that may extend the borders of the Fertile Crescent into eastern Iran (http://www.livescience.com/37963-agriculture-arose-eastern-fertile-crescent.html), if not beyond, may place the Iranian Plateau and Afghanistan as important to neolithic development as the Levantine corridor. It would be ironic if the importance of Afghanistan in later eras was critical because it was where it all began.