November 24, 2005

New paper on Indian Y-chromosome variation

A new paper on Y-chromosome variation in India has become available as an unedited preprint on the AJHG site. This is a huge study which covered linguistic/caste groups from the entire country and used 69 binary markers and 10 microsatellites to create a very thorough sampling of Indian Y-chromosomal variation. It will take some time to digest all the new information, along with the paper's supplemental materials, which have yet to be put online. I will blog more about this soon. In bullet form, some findings of the paper which caught my attention:
  • R1a1's molecular variance is highest in NW India and its age is substantial
  • R1a1's variance is high in tribals
  • The phylogeny of J2 has been refined: all J2 lineages now fall into one of two sister clades, J2a (defined by the newly discovered M410 marker) and J2b (defined by M12).
  • J2 is almost entirely absent from tribals, and its frequency decreases from upper castes to middle castes to lower castes.

The samples:
High-resolution assessment of Y-chromosome binary haplogroup composition was conducted on 728 Indian samples representing 36 populations, including 17 tribal populations, from six geographic regions and different social and linguistic categories. They comprise (Austro-Asiatic) Ho, Lodha, Santal, (Tibeto-Burman) Chakma, Jamatia, Mog, Mizo, Tripuri, (Dravidian) Irula, Koya Dora, Kamar, Kota, Konda Reddy, Kurumba, Muria, Toda, (Indo-European) Halba. The 18 castes include (Dravidian) Iyer, Iyengar, Ambalakarar, Vanniyar, Vellalar, Pallan and (Indo-European) Koknasth Brahmin, Uttar Pradesh Brahmin, West Bengal Brahmin, Rajput, Agharia, Gaud, Mahishya, Maratha, Bagdi, Chamar, Nav Buddha, Tanti. With the exception of the Koya Dora and Konda Reddy groups, these samples have been previously described (Basu et al. 2003).
J2 is divided into two main clades: J2a*-M410 and J2b*-M12:
New phylogenetic resolution has been achieved within the J2-M172 clade with the discovery of the M410 nucleotide A to G substitution (Table 2). Now all J2-M172 derived lineages can be assigned to one of two sister clades, namely J2a*-M410 and J2b*-M12, necessitating an updated revision of the previous “haplogroup by lineage” YCC nomenclature for J2 (Jobling et al. 2003). The J2*-M172 phylogenetic revisions are presented in supplemental data A5. We include the DYS413≤18 allele repeat node in the phylogeny as suggested by Di Giacomo et al. (2004). It is notable that no J2*-M172 haplogroup lacking both M410 and M12 derived alleles has yet been observed. The DYS413 locus was typed in M410 derived samples from India, Pakistan and Turkey. The vast majority displayed the ≤18 allele repeat, although 16/118 in Turkey had alleles ≥19, as did 5/17 in Pakistan and 5/28 in India, 4 of which were restricted to the Dravidian-speaking Iyengar and Iyer upper castes.
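To put the DYS413 counts quoted above on a common scale, here is a minimal sketch that converts them to percentages. The counts come straight from the excerpt; the variable and function names are my own and are not from the paper.

```python
# Counts quoted in the excerpt: (samples with DYS413 alleles >= 19,
# M410-derived samples typed) for each region.
counts = {
    "Turkey": (16, 118),
    "Pakistan": (5, 17),
    "India": (5, 28),
}


def share_percent(hits, total):
    """Return hits/total as a percentage rounded to one decimal place."""
    return round(100 * hits / total, 1)


# Percentage of M410-derived samples carrying the longer (>= 19) alleles.
shares = {region: share_percent(h, n) for region, (h, n) in counts.items()}

for region, pct in shares.items():
    print(f"{region}: {pct}% with DYS413 >= 19")
```

With samples this small (17 in Pakistan, 28 in India), these percentages are rough indications only.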
5 new clades in haplogroups C, L, Q, G, and I:
We report 5 new clades that improve the haplogroup topology within the Y-chromosome genealogy. The new subclade C5-M356 accounts for 85% of the former C* haplogroups. While its overall frequency is only 1.4% in the Indian sample, it occurs in all linguistic groups, and in both tribes and castes. It also occurs in 1 Dravidian Brahui in Pakistan (Table 3). The new L3-M357 subclade accounts for 86% of L-M20(xL1xL2) chromosomes in Pakistan, but occurs only sporadically (3/728) in India. All Indian haplogroup Q representatives belong to the new M346-subclade. This new Q clade will aid in future studies attempting to narrow the candidate Asian/Siberian precursors of Native American chromosomes. The G5-M377 substitution is independent of G1-M285 and G2-P15 subclades (Cinnioglu et al. 2004) and occurs in Pakistan. The M379 polymorphism defines the I1c2 subclade, which occurs only in our Pakistani data.
Indigenous Indian haplogroups:
On the basis of the combined phylogeographic distributions of haplotypes observed among populations defined by social and linguistic criteria, candidate haplogroups that most plausibly arose in situ within the boundaries of present day India include C5-M356, F*-M89, H*-M69 (and its sub-clades H1-M52 and H2-APT), R2-M124 and L1-M76. The congruent geographic distribution of H*-M69 and potentially paraphyletic F*-M89 Y-chromosomes in India suggests that they might share a common demographic history.
R1a1 and R2:
The widespread geographic distribution of haplogroup R1a1-M17 across Eurasia and the current absence of informative subdivisions defined by binary markers leave its geographic origin uncertain. However, the contour map of R1a1-M17 variance shows the highest variance in the northwest region of India (Figure 3).


In haplogroups R1a1 and R2 the associated mean microsatellite variance is highest in tribes (Table 8), not castes. This is a clear contradiction to what would be expected from an explanation involving a model of recent occasional admixture.
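For readers unfamiliar with the statistic, here is a rough sketch of one common way to compute a "mean microsatellite variance": take the variance of repeat counts at each locus across the chromosomes in a group, then average over loci. This is a generic definition that I am assuming for illustration, not necessarily the exact estimator used in the paper, and the haplotypes below are made-up toy data, not values from Table 8.

```python
from statistics import mean, pvariance


def mean_microsatellite_variance(haplotypes):
    """Average, over loci, the variance of repeat counts across chromosomes.

    `haplotypes` is a list of equal-length tuples, one tuple per chromosome,
    holding the repeat count at each microsatellite locus.
    """
    loci = list(zip(*haplotypes))  # regroup the repeat counts by locus
    return mean(pvariance(locus) for locus in loci)


# Hypothetical toy data: 4 chromosomes typed at 3 loci in each group.
diverse_group = [(14, 12, 23), (15, 12, 24), (13, 11, 23), (16, 13, 25)]
uniform_group = [(14, 12, 23), (14, 12, 23), (15, 12, 23), (14, 12, 24)]

print(mean_microsatellite_variance(diverse_group))
print(mean_microsatellite_variance(uniform_group))
```

All else being equal, a group whose chromosomes have had longer to accumulate mutations shows a larger value, which is why the paper treats higher variance in tribes as evidence against recent gene flow from castes.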


Specifically, they could have actually arrived in southern India from a southwest Asian source region multiple times with some episodes being considerably earlier than others. Considerable archeological evidence exists regarding the presence of Mesolithic peoples in India (Kennedy 2000), some of whom could have entered the subcontinent from the northwest during the late Pleistocene period. The high variance of R1a1 in India (Table 8), the spatial frequency distribution of R1a1 microsatellite variance clines (Figure 3) and its expansion time (Table 7) support this view.
Clustering of R1a1 haplotypes:
The ages of the Y-microsatellite variation (Table 7) for R1a1 and R2 in India suggest that the pre-historical context of these haplogroups will likely be complex. A PC plot of R1a1-M17 Y-microsatellite data (Figure 4) shows several interesting features: (a) one tight population cluster comprising S. Pakistan, Turkey, Greece, Oman and West Europe, (b) one loose cluster comprising all the Indian tribal and caste populations, with the tribal populations occupying an edge of this cluster, and (c) Central Asia and Turkey occupy intermediate positions. The upper and lower bounds of the divergence time between the two clusters are 12 kya and 8 kya, respectively. The pattern of clustering does not support the model that the primary source of the R1a1-M17 chromosomes in India was Central Asia or the Indus valley via Indo-European speakers.
The spread of J2a:
Figure 2 demonstrates the eastward expansion of J2a-M410 to Iraq, Iran and Central Asia coincident with painted pottery and ceramic figurines, well documented in the Neolithic archeological record (Cauvin 2000). Near the Indus valley, the Neolithic site of Mehrgarh beginning around 5000 BCE (Kenoyer 1998) displays the presence of these types of material culture correlated with the spread of J2a-M410 in Pakistan. While the association of agriculture with J2a-M410 is recognized, it is not necessarily the only explanation for its history. Despite an apparent exogenous frequency spread pattern of hg J2a towards North and Central India from the west (Figure 2), it is premature to attribute it to a simplistic demic expansion of early agriculturalists and pastoralists from the Middle East. It reflects the overall net process of spread that may contain numerous as yet unrevealed movements embedded within the general pattern. It may also reflect a combination of elements of earlier prehistoric Holocene epi-paleolithic peoples from the Middle East, subsequent Bronze Age Harappans of uncertain provenance and succeeding Iron Age Indo-Aryans from Central Asia (Kennedy 2000). Further, the relative position of the Indian tribals (Fig. 4), the high microsatellite variance among them (Table 8), the estimated age (14 kya) of microsatellite variation within R1a1 (Table 7) and the variance peak in the west (Fig. 3) are entirely inconsistent with a model of recent gene flow from castes to tribes and a large genetic impact of the Indo-Europeans on the autochthonous gene pool of India. Instead, our overall inference is that an early Holocene expansion in NW India (including the Indus) contributed R1a1-M17 chromosomes both to the Central Asian and S Asian tribes prior to the arrival of the Indo-Europeans.
J2a in upper caste Indians:
The J2 clade is nearly absent among Indian tribals, except among Austro-Asiatic speaking tribals (11%). Among the Austro-Asiatic tribals, the predominant J2b2 hg occurs only in the Lodha.

Haplogroup J2a-M410 is confined to upper caste Dravidian and Indo-European speakers, with little occurrence in the middle and lower castes. This absence of even modest admixture of J2a in south Indian tribes and middle and lower castes is inconsistent with the L1 data. Overall, therefore, our data provide overwhelming support to an Indian origin of Dravidian speakers.
Haplogroup frequencies: (image not available)

American Journal of Human Genetics (in press)

Polarity and Temporality of High Resolution Y-chromosome Distributions in India Identify Both Indigenous and Exogenous Expansions and Reveal Minor Genetic Influence of Central Asian Pastoralists

Sanghamitra Sengupta, Lev A. Zhivotovsky, Roy King, S. Q. Mehdi, Christopher A. Edmonds, Cheryl-Emiliane T. Chow, Alice A. Lin, Mitashree Mitra, Samir K. Sil, A. Ramesh, M.V. Usha Rani, Chitra M. Thakur, L. Luca Cavalli-Sforza, Partha P. Majumder and Peter A. Underhill


While considerable cultural impact on social hierarchy and language in south Asia is attributable to the arrival of nomadic Central Asian pastoralists, genetic data (mitochondrial and Y chromosomal) have yielded dramatically conflicting inferences on the genetic origins of tribes and castes of south Asia. We sought to resolve this conflict using high-resolution data on 69 informative Y-chromosome binary markers and 10 microsatellite markers from a large set of geographically, socially and linguistically representative ethnic groups of south Asia. We have found that the influence of Central Asia on the pre-existing gene pool was minor. The ages of accumulated microsatellite variation in the majority of Indian haplogroups exceed 10-15 kya, attesting to the antiquity of regional differentiation. Therefore, our data do not support models that invoke a pronounced recent genetic input from central Asia to explain the observed genetic variation in south Asia. R1a1 and R2 haplogroups indicate demographic complexity that is inconsistent with a recent single history. Associated microsatellite analyses of the high frequency R1a1 haplogroup chromosomes indicate independent recent histories of the Indus valley and the peninsular Indian region. Our data are also more consistent with a peninsular origin of Dravidian speakers than a source with proximity to the Indus and significant genetic input resulting from demic diffusion associated with agriculture. Our results underscore the importance of marker ascertainment towards distinguishing phylogenetic terminal branches from basal nodes when attributing ancestral composition and temporality to either indigenous or exogenous sources. Our reappraisal indicates that pre-Holocene and Holocene era – not Indo-European – expansions have shaped the distinctive south Asian Y-chromosome landscape.
