Y chromosomes in Iranians and Tajiks (Malyarchuk et al. 2013)

An interesting paper on Iranian and Tajik Y chromosomes. Iranian Y chromosomes were comprehensively studied by Grugni et al., but it is always good to have additional samples.

I have mentioned before the apparent distinction between west and east Iranians in terms of haplogroup J/R1a frequencies, with high ratios in Persians and Kurds and low ones in Pathans, and this seems to be reinforced here. The Tajiks are speakers of Persian (hence "western") but trace their ancestry to the east of the modern country of Iran, and they are intermediate between Persians and eastern Iranians.

The absence of R1a in this Kurdish sample, coupled with a high J frequency, parallels the situation in the Kurdish Anatolian settlement studied by Gokcumen et al., as well as the Georgian Kurmanji sample studied by Nasidze et al. On the other hand, R1a is present in the Kurmanji samples from Turkey and Turkmenistan in the latter study, as well as in the aforementioned Kurdish sample from Iran by Grugni et al. and the Kurdish sample from Turkmenistan studied by Wells et al. I'd say that the frequency of this haplogroup may vary among Kurdish groups, which might be worth further exploration.

It would also be very interesting to study the haplogroup I chromosomes from this region. Do they represent historical introgression from Europe, or are they, perhaps, local basal clades that reinforce the idea of a relic distribution of I in West Asia, prior to the migration into Europe, that was recently suggested by the discovery of IJ* chromosomes in Iran by Grugni et al.?

Annals of Human Biology, 2013; Early Online: 1–7

Y-chromosome variation in Tajiks and Iranians

Boris Malyarchuk et al.

Aim: The purpose of this study was to characterize Y-chromosome diversity in Tajiks from Tajikistan and in Persians and Kurds from Iran.

Method: Y-chromosome haplotypes were identified in 40 Tajiks, 77 Persians and 25 Kurds, using 12 short tandem repeats (STR) and 18 binary markers.

Results: High genetic diversity was observed in the populations studied. Six of 12 haplogroups were common in Persians, Kurds and Tajiks, but only three haplogroups (G-M201, J-12f2 and L-M20) were the most frequent in all populations, comprising together 60% of the Y-chromosomes in the pooled data set. Analysis of genetic distances between Y-STR haplotypes revealed that the Kurds showed a great distance to the Iranian-speaking populations of Iran, Afghanistan and Tajikistan. The presence of Indian-specific haplogroups L-M20, H1-M52 and R2a-M124 in both Tajik samples from Afghanistan and Tajikistan demonstrates an apparent genetic affinity between Tajiks from these two regions.

Conclusions: Despite the marked similarities between Y-chromosome gene pools of Iranian-speaking populations, there are differences between them, defined by many factors, including geographic and linguistic relationships.



I think the Y-DNA of certain groups might differ from that of other related groups due to bottleneck effects. The previous Y-DNA study on Iran showed around 20% R1a1 among Kurds of Iran, whereas this study shows 0%. As you have mentioned regarding other studies, Kurds do have high amounts of R1a1 compared to their non-Iranic neighbours.

Kurds themselves are not homogeneous; they speak different Iranic languages. Their ethnicity formed in post-Islamic times; before that, most of the Iranic tribes that inhabited their regions were Medes or Parthians.

But in autosomal DNA they seem to be similar to each other, with the same levels of the various components. This might indicate that certain male lines became more dominant than others, depending on the region or the Kurdish group.

"I have mentioned before the apparent distinction between west and east Iranians in terms of haplogroup J/R1a frequencies, with high ratios in Persians and Kurds, and low ones in Pathans, and this seems to be reinforced here; the Tajiks are speakers of Persian (hence "western") but trace their ancestry to the east of the modern country of Iran, and in-between Persians and eastern Iranians."

There is some evidence that a distinction between "high-R1a Iranics" and "high-J2 Iranics" might exist even within the modern Republic of Tajikistan.

Tajikistan (R. Spencer Wells, Nadira Yuldasheva, Ruslan Ruzibakiev et al. 2001):

Tajiks from Dushanbe (n = 16):

1/16 = 6.3% E-M96

1/16 = 6.3% F-M89(xI-M170, J2-M172, H1-M52, K-M9)

5/16 = 31.3% J2-M172

2/16 = 12.5% H1-M52

1/16 = 6.3% O-M175(xO1a-M119, O2a1-M95, O3-M122)

2/16 = 12.5% L-M20

1/16 = 6.3% R2a-M124

3/16 = 18.8% R1a1a-M17(xM87)

Tajiks from Khujand (n = 22):

1/22 = 4.5% C-M130(xC3a3-M48)
1/22 = 4.5% C3a3-M48
2/22 = 9.1% C-M130 total

1/22 = 4.5% F-M89(xI-M170, J2-M172, H1-M52, K-M9)

2/22 = 9.1% J2-M172

1/22 = 4.5% L-M20

2/22 = 9.1% R2a-M124

14/22 = 63.6% R1a1a-M17(xM87)

Although the sample sizes of Wells et al. 2001 leave much to be desired, it is notable that their sample of Tajiks from Dushanbe, the capital and largest city of Tajikistan, located in the central-western part of the country, exhibits a high frequency of J2-M172 and only a moderate frequency of R1a1a-M17 Y-DNA. In contrast, their sample of Tajiks from Khojant (Khujand), the second largest city of Tajikistan and capital of Sughd (Sogdiana) Province in the northern extremity of the country, exhibits an extraordinarily high frequency of R1a1a-M17 and a rather low frequency of J2-M172 Y-DNA.

Both of these samples include F-M89(xI-M170, J2-M172, H1-M52, K-M9), L-M20, and R2a-M124 as minority haplogroups. They differ in regard to their other minority haplogroups, with H1-M52, E-M96, and O-M175(xO1a-M119, O2a1-M95, O3-M122) (perhaps O2b) being found only in the sample of Tajiks from Dushanbe, and C-M130 (including both C-M130(xC3a3-M48) and C3a3-M48) being found only in the sample of Tajiks from Khojant; however, these differences are probably not so significant, considering the small sample sizes.
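The small-sample caveat can be made a little more concrete. The following is a minimal sketch, not something used by Wells et al. or in this comment: a two-sided Fisher's exact test, written with only the Python standard library, applied to the presence or absence of a minority haplogroup in the Dushanbe (n = 16) and Khujand (n = 22) samples. The function name `fisher_exact_two_sided` is hypothetical, and framing each comparison as a 2x2 table of carriers vs. non-carriers per sample is an assumption about how one might test these differences.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are samples; columns are (carriers, non-carriers) of a haplogroup.
    The p-value sums the hypergeometric probabilities of every table with the
    same margins that is no more likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c  # total carriers across both samples
    n = row1 + row2
    denom = comb(n, col1)

    def prob(k: int) -> float:
        # Probability that exactly k of the col1 carriers fall in row 1.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(col1, row1)
    probs = [prob(k) for k in range(lo, hi + 1)]
    # Small relative tolerance so tied probabilities are not lost to rounding.
    return sum(p for p in probs if p <= observed * (1 + 1e-7))


# Hypothetical usage with counts from the Wells et al. 2001 lists:
# H1-M52 is 2/16 in Dushanbe vs 0/22 in Khujand.
print(f"H1-M52: p = {fisher_exact_two_sided(2, 14, 0, 22):.3f}")  # p = 0.171
# C-M130 (total) is 0/16 in Dushanbe vs 2/22 in Khujand.
print(f"C-M130: p = {fisher_exact_two_sided(0, 16, 2, 20):.3f}")  # p = 0.499
```

With only two carriers in total, even a 2-versus-0 split gives p ≈ 0.17, which fits the caution about sample size.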

On the other hand, some other samples from the same study suggest a more complex picture:

Yagnobis (n = 31):

1/31 = 3.2% C-M130(xC3a3-M48)

10/31 = 32.3% J2-M172

1/31 = 3.2% K-M9(xO-M175, L-M20, N1c1-M46, P-M45)

3/31 = 9.7% L-M20

1/31 = 3.2% P-M45(xQ1a1-M120, Q1a3a1-M3, R1-M173, R2a-M124)

10/31 = 32.3% R1-M173(xR1a1a-M17)

5/31 = 16.1% R1a1a-M17(xM87)

Shugnanis (n = 44):

5/44 = 11.4% E-M96

1/44 = 2.3% C-M130(xC3a3-M48)

7/44 = 15.9% F-M89(xI-M170, J2-M172, H1-M52, K-M9)

5/44 = 11.4% J2-M172

7/44 = 15.9% L-M20

6/44 = 13.6% P-M45(xQ1a1-M120, Q1a3a1-M3, R1-M173, R2a-M124)

3/44 = 6.8% R1-M173(xR1a1a-M17)

10/44 = 22.7% R1a1a-M17(xM87)

The Yagnobis, a minority population from an isolated valley in the south of Sughd Province, located geographically between Khujand and Dushanbe, share the low R1a1a/J2 ratio of the sample of Tajiks from Dushanbe, but they speak an Eastern Iranic language in contrast to the Western Iranic Persian dialect of the Tajiks. The Yagnobi sample also exhibits R1-M173(xR1a1a-M17) (perhaps some subclade of R1b) Y-DNA with equal frequency to J2-M172.

This sample of the Shugnan population, which inhabits the most populous, southwestern part of the overall very sparsely populated Gorno-Badakhshan Autonomous Province in eastern Tajikistan, exhibits a higher frequency of R1a1a-M17 than J2-M172 like the Tajiks from Khujand, but the R1a1a/J2 ratio of the Shugnanis is only 2.0 whereas the R1a1a/J2 ratio of the sample of Tajiks from Khujand is 7.0. In addition, the total frequency of R1a1a+J2 in the Shugnani sample is only 15/44 = 34.1%, due to the rather high frequencies of F-M89(xI-M170, J2-M172, H1-M52, K-M9), L-M20, P-M45(xQ1a1-M120, Q1a3a1-M3, R1-M173, R2a-M124), and E-M96 in this sample. The R1a1a+J2 totals for the other samples are 15/31 = 48.4% Yagnobi, 8/16 = 50.0% Dushanbe Tajik, and 16/22 = 72.7% Khujand Tajik.
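The ratios and totals quoted above can be recomputed directly from the counts in the four lists. This is a minimal sketch; the dictionary keys and the helper name `summarize` are illustrative only, and the counts are transcribed by hand from the lists above.

```python
# Counts transcribed from the Wells et al. 2001 lists above.
# Each entry: (R1a1a-M17 count, J2-M172 count, sample size).
SAMPLES = {
    "Dushanbe Tajik": (3, 5, 16),
    "Khujand Tajik": (14, 2, 22),
    "Yagnobi": (5, 10, 31),
    "Shugnani": (10, 5, 44),
}


def summarize(r1a: int, j2: int, n: int) -> tuple[float, float]:
    """Return the R1a1a/J2 ratio and the combined R1a1a+J2 percentage."""
    ratio = r1a / j2  # every sample here has at least one J2 chromosome
    combined_pct = 100 * (r1a + j2) / n
    return ratio, combined_pct


for name, (r1a, j2, n) in SAMPLES.items():
    ratio, pct = summarize(r1a, j2, n)
    print(f"{name:15s} R1a1a/J2 = {ratio:.2f}  R1a1a+J2 = {r1a + j2}/{n} = {pct:.1f}%")
```

This reproduces the figures in the text (ratios of 2.0 for the Shugnanis and 7.0 for the Khujand Tajiks; totals of 34.1%, 48.4%, 50.0% and 72.7%) and also gives the unstated low ratios for the Dushanbe Tajiks (0.60) and Yagnobis (0.50).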

One thing about these samples that strikes me as a bit curious is that R1a1a-M17 is by far the most frequent among the Tajiks in Khujand, which is geographically and historically more closely connected to the Fergana Valley of eastern Uzbekistan than to the other regions of Tajikistan. This same sample of Tajiks from Khujand also exhibits C3a3-M48, the only unambiguous evidence of Turko-Mongol Y-DNA influence in any of this study's Tajikistani samples. However, the sample of Uzbeks from the Uzbekistani portion of the Fergana Valley does not exhibit such a high frequency of R1a1a-M17 (only 14/63 = 22.2%), although the Fergana Uzbeks' R1a1a/J2 ratio is rather high (2.33).

As Dax already mentioned, although Kurds are relatively homogeneous in autosomal DNA compared to their neighbors, Y-DNA and mtDNA frequencies can vary by tribal group, just as in most other places in the Near East. Take Southwest Turkey, which is high in haplogroup G, and compare it to some Central Anatolian Turkish villages, which show a considerable frequency of haplogroups like L, or to the Georgian Yezidi Kurds, who are atypically high in R2 and J2a lineages.

This new Kurdish sample seems odd in one respect: the Kurds entirely lack R1a*. This is the first time I have come across a sample of Kurds from the traditional area of Kurdistan that lacks it; most studies show a frequency of 10-25%. The exceptions are Kurdish settlements outside Kurdistan, for example in Central Anatolia and Georgia (most probably due to a bottleneck effect).

However, I do not agree with Dax that "Kurds are not homogeneous" works as an explanation for the variance in Kurdish Y-DNA. Today we know that this claim is not just wrong but also ignorant, and people who make it should first tell me which peoples in Western, South, or Central Asia are more homogeneous than the Kurds. Compared to many of their neighbors, the Kurds are far too homogeneous to be the result of a post-Islamic ethnogenesis.

The Kurds seem to be an ancestral component throughout Western Asia.