Showing posts with label J1. Show all posts

July 19, 2012

Huge study on Y-chromosome variation in Iran (Grugni et al. 2012)

This is the equivalent of a box of candy for anyone interested in Eurasian (pre-)history. I will have to digest all the goodies within, and post any of my comments as updates to this post.

UPDATE I: Here is the table of haplogroup frequencies for easy reference:

One of the most interesting finds is the presence of a few IJ-M429* chromosomes in the sample. Haplogroup IJ encompasses the major European I subclade, and the major West Asian J subclade. The discovery of IJ* chromosomes is consistent with the origin of this haplogroup in West Asia; it is widely believed that haplogroup I represents a pre-Neolithic lineage in Europe, although at present there are no Y chromosome-tested pre-Neolithic remains.

There is also a wide assortment of Q and R in Iran. While some of these may be intrusive (e.g., the 42.6% of Q1a2 in Turkmen, likely a legacy of their Central Asian origins), the overall picture appears consistent with a deep presence of these lineages in Iran. This is especially true for haplogroup R where pretty much every paragroup and derived group is present, excepting those likely to have originated recently elsewhere.

UPDATE II: From the paper:
Although accounting only for 25% of the total variance, the first two components (Figure 3) separate populations according to their geographic and ethnic origin and define five main clusters: East-African, North-African and Near Eastern Arab, European, Near Eastern and South Asian. The 1stPC clearly distinguishes the East African groups (showing a high frequency of haplogroup E) from all the others which distribute longitudinally along the axis with a wide overlapping between European and Arab peoples and between Near Eastern and South Asian groups. The 2ndPC separates the North-African and Near Eastern Arabs (characterized by the highest frequency of haplogroup J1) from Europeans (characterized by haplogroups I, R1a and R1b) and the Near Easterners from the South Asians (due to the distribution of haplogroups G, R2 and L). Iranian groups do not cluster all together, occupying intermediate positions among Arab, Near Eastern and Asian clusters. In this scenario, it is worth of noticing the position of three Iranian groups: (i) Khuzestan Arabs (KHU-Ar) who, despite their Arabic origin, are close to the Iranian samples; (ii) Armenians from Tehran (THE-Ar), whose position, in the upper part of the Iranian distribution, indicates a close affinity with the Near Eastern cluster, while their position near Turkey and Caucasus groups, due to the high frequency R1b-M269 and other European markers (eg: I-M170), is in agreement with their Armenia origin; (iii) Sistan Baluchestan (SB-Ba) that clusters with its neighbouring Pakistan.
UPDATE III: There are lots of little details in the haplogroup distribution that make historical sense. For example, C3 exists in Assyrians from Azarbaijan, and C*, C3, and O all exist in Zoroastrians from Yazd. It is often forgotten that before the spread of Islam, and for quite some time thereafter, Inner Asia was teeming with Zoroastrians and Nestorian Christians. It seems quite likely that these outliers represent a legacy of these communities.

UPDATE IV: I have a feeling that Razib will take exception to this statement: "Ancient Persian people were firstly characterized by the Zoroastrianism. After the Islamization, Shi'a became the main doctrine of all Iranian people."


UPDATE V: This confirms my observation from the recent studies in Afghanistan, that there is an inverse relationship of J2a and R1a in Iranian-speaking groups, with an excess of the latter among the eastern Iranians, and of the former among the Persians. From the paper:
Among the different J2a haplogroups, J2a-M530 [46] is the most informative as for ancient dispersal events from the Iranian region. This lineage probably originated in Iran where it displays its highest frequency and variance in Yazd and Mazandaran (Figure 2). Taking into account its microsatellite variation and age estimates along its distribution area (Tables S3 and S7), it is likely that its diffusion could have been triggered by the Euroasiatic climatic amelioration after the Last Glacial Maximum and later increased by agriculture spread from Turkey and Caucasus towards southern Europe. The high variance observed in the Italian Peninsula is probably the result of stratifications of subsequent migrations and/or of the presence of sub-lineages not yet identified. Of interest in the M530 network (Figures 2 and S3) is the presence of a lateral branch that is characterized by a DYS391 repeat number equal to 9. Differently from previous observations [46], this branch is not restricted to Anatolian Greek samples being shared with different eastern Mediterranean coastal populations. The M530 diffusion pattern seems to be also shared by the paragroups J2a-M410* and J2a-PAGE55*. In addition, the variance distribution of the rare R1b-M269* Y chromosomes, displaying decreasing values from Iran, Anatolia and the western Black Sea coastal region, is also suggestive of a westward diffusion from the Iranian plateau, although more complex scenarios can be still envisioned because of its non-star like structure.
Of course, the idea that the diffusion of J2a related lineages ties in with early agricultural expansions has been with us for a long time, but it is time to abandon it. First of all, as we have seen, J2a diminishes greatly as we head towards South Asia; it certainly doesn't look like the lineage of the multitude of agricultural settlements that sprang up along the southeastern vector soon after the invention of agriculture. Second, it is lacking so far in all ancient Y chromosome data from Europe down to 5,000 years ago. It seems much more probable that J2 related lineages spread from the highlands of West Asia much later.


The "age estimates" are the result of using the inappropriate "evolutionary mutation rate", and become even older because of the inclusion of the DYS388 marker that is very stable in many haplogroups but very mutable within haplogroup J. On the left you can see frequency, Y-STR variance, and haplotype network structures for various J-related groups.


It is unfortunate that there is no progress in the phylogeographic assessment of R1a in this paper. There have been substantial discoveries of SNPs within this haplogroup as a result of commercial testing; however there is clearly an ascertainment bias in the newer discoveries, as almost all these SNPs have been detected in Europeans. The new paper confirms the high levels of Y-STR variance in India, Pakistan, and Iran. Together with the cornucopia of related paragroups in Iran, there is little doubt that this haplogroup originated in the general area of Central/South Asia.


Personally, as I have stated before, I would relate this R1a with Neolithic peoples living east of the Caspian, in contrast to the R1b bearers who lived west and south of it. These two populations came under the influence of the Indo-Europeans and spread in different directions. The Indo-Iranians were then initially the mixed descendants of the Indo-Europeans and the R1a old agricultural population, and were formed in the territory of the Bactria-Margiana Archaeological Complex. 


This also explains the contrast between Iranian and Armenian groups: the latter mostly lack the R1a lineage, contrasting with all Iranian groups (even their Kurdish neighbors) who possess it. Conversely, Iranian groups, and especially eastern Iranians and Indo-Aryans, lack the R1b lineage. This is due to the fact that neither R1a nor R1b were originally part of the Indo-European community, but their geographical position was such that they came under the influence of the Indo-Europeans when the latter began their expansion.


UPDATE VI: I have created my own dendrogram using the Y-haplogroup frequencies and the hclust function of R (default parameters):


From top to bottom, one can identify some clusters:

  • Eastern Europe, further broken down into Balkans and Slavic+Hungary
  • West Asian/Caucasus
  • Iranian Proper
  • Arab

These correspond largely to the clusters identified by the authors, with India and the Turkmen sample emerging as the clear outliers. I omitted the Ethiopian samples, since E-M78 was not resolved phylogenetically, causing the Ethiopians to group with the likely E-V13 from the Balkans.
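The clustering step behind a dendrogram like this can be sketched in a few lines. The sketch below is not the R code used for the post; it is a plain Python illustration of complete-linkage agglomerative clustering on Euclidean distances, which is what hclust does by default when fed the output of dist(). The population names and frequencies are hypothetical placeholders, not values from the paper.

```python
from itertools import combinations
from math import dist

# HYPOTHETICAL frequencies of four haplogroups per population (not paper data).
freqs = {
    "PopA": [0.05, 0.10, 0.30, 0.15],
    "PopB": [0.04, 0.12, 0.28, 0.17],
    "PopC": [0.10, 0.45, 0.10, 0.02],
    "PopD": [0.12, 0.50, 0.08, 0.01],
}


def complete_linkage(points):
    """Agglomerative clustering where the distance between two clusters
    is the largest Euclidean distance between any two of their members."""
    clusters = [frozenset([name]) for name in points]
    merges = []

    def gap(pair):
        a, b = pair
        return max(dist(points[x], points[y]) for x in a for y in b)

    while len(clusters) > 1:
        # Merge the two clusters that are closest under complete linkage.
        a, b = min(combinations(clusters, 2), key=gap)
        merges.append((sorted(a), sorted(b), gap((a, b))))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return merges


merges = complete_linkage(freqs)
for left, right, height in merges:
    print(left, "+", right, f"merged at height {height:.3f}")
```

Each printed line corresponds to one join in the dendrogram, with the height at which the two branches meet. Populations with similar frequency profiles join early, and outliers join last, which is how samples such as India and the Turkmen end up on their own branches.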

UPDATE VII: I have also run MCLUST on the MDS representation of the distance matrix computed from the haplogroup frequency data. The maximum number of 10 clusters occurred with 5 MDS dimensions retained. Population assignments in the 10 clusters can be found in the table below:


Iran/Azerbaijan_Gharbi+Tehran_(Assyrian) 1
Iran/Lorestan_(Lur) 1
Iran/Tehran_(Armenian) 1
Iran/Azerbaijan_Gharbi_(Azeri) 2
Iran/Hormozgan_(Bandari+Afro-Iranian) 2
Iran/Hormozgan/Qeshmi 2
Iran/Khorasan_(Persian) 2
Iran/Kurdistan_(Kurd) 2
Iran/Sistan_Baluchestan_(Baluch) 2
Pakistan 2
Iran/Fars+Isfahan_(Persian) 3
Iran/Gilan_(Gilak) 3
Iran/Yazd+Tehran_(Zoroastrian) 3
Turkey/Central 3
Turkey/East 3
Turkey/West_ 3
Iran/Golestan_(Turkmen) 4
India 4
Iran/Khuzestan_(Arab) 5
Egypt_(Arab) 5
Iraq/Baghdad 5
Oman 5
Saudi_Arabia 5
Tunisia 5
United_Arab_Emirates 5
Iran/Mazandaran_(Mazandarani) 6
Iran/Yazd_(Persian) 6
Balkarian 6
Georgia 6
Albania 7
Greece 7
Bosnia 8
Croatia 8
Slovenia 8
Czech_Republic 9
Hungary 9
Poland 9
Ukraine 9
Iraq_(Marsh_Arab) 10
Qatar 10
Yemen 10


We can ignore cluster #4 which consists of the two outliers (India + Turkmen). The rest of the clusters seem relatively coherent. Notice, for example, the Arabian cluster #10, Balkan cluster #8, Eastern European cluster #9, Greek-Albanian cluster #7, Mixed Arab cluster #5.

PLoS ONE 7(7): e41252. doi:10.1371/journal.pone.0041252

Ancient Migratory Events in the Middle East: New Clues from the Y-Chromosome Variation of Modern Iranians

Viola Grugni et al.


Knowledge of high resolution Y-chromosome haplogroup diversification within Iran provides important geographic context regarding the spread and compartmentalization of male lineages in the Middle East and southwestern Asia. At present, the Iranian population is characterized by an extraordinary mix of different ethnic groups speaking a variety of Indo-Iranian, Semitic and Turkic languages. Despite these features, only few studies have investigated the multiethnic components of the Iranian gene pool. In this survey 938 Iranian male DNAs belonging to 15 ethnic groups from 14 Iranian provinces were analyzed for 84 Y-chromosome biallelic markers and 10 STRs. The results show an autochthonous but non-homogeneous ancient background mainly composed by J2a sub-clades with different external contributions. The phylogeography of the main haplogroups allowed identifying post-glacial and Neolithic expansions toward western Eurasia but also recent movements towards the Iranian region from western Eurasia (R1b-L23), Central Asia (Q-M25), Asia Minor (J2a-M92) and southern Mesopotamia (J1-Page08). In spite of the presence of important geographic barriers (Zagros and Alborz mountain ranges, and the Dasht-e Kavir and Dash-e Lut deserts) which may have limited gene flow, AMOVA analysis revealed that language, in addition to geography, has played an important role in shaping the nowadays Iranian gene pool. Overall, this study provides a portrait of the Y-chromosomal variation in Iran, useful for depicting a more comprehensive history of the peoples of this area as well as for reconstructing ancient migration routes. In addition, our results evidence the important role of the Iranian plateau as source and recipient of gene flow between culturally and genetically distinct populations.

Link

February 29, 2012

Serbian Y-chromosomes

Gene. 2012 Jan 31. [Epub ahead of print]

High levels of Paleolithic Y-chromosome lineages characterize Serbia.

Regueiro M, Rivera L, Damnjanovic T, Lukovic L, Milasin J, Herrera RJ.

Abstract

Whether present-day European genetic variation and its distribution patterns can be attributed primarily to the initial peopling of Europe by anatomically modern humans during the Paleolithic, or to latter Near Eastern Neolithic input is still the subject of debate. Southeastern Europe has been a crossroads for several cultures since Paleolithic times and the Balkans, specifically, would have been part of the route used by Neolithic farmers to enter Europe. Given its geographic location in the heart of the Balkan Peninsula at the intersection of Central and Southeastern Europe, Serbia represents a key geographical location that may provide insight to elucidate the interactions between indigenous Paleolithic people and agricultural colonists from the Fertile Crescent. In this study, we examine, for the first time, the Y-chromosome constitution of the general Serbian population. A total of 103 individuals were sampled and their DNA analyzed for 104 Y-chromosome bi-allelic markers and 17 associated STR loci. Our results indicate that approximately 58% of Serbian Y-chromosomes (I1-M253, I2a-P37.2, R1a1a-M198) belong to lineages believed to be pre-Neolithic. On the other hand, the signature of putative Near Eastern Neolithic lineages, including E1b1b1a1-M78, G2a-P15, J1-M267 and J2-M172 and R1b1a2-M269 accounts for 39% of the Y-chromosome. Furthermore, an examination of the distribution of Y-chromosome filiations in Europe indicates extreme levels of Paleolithic lineages in a region encompassing Serbia, Bosnia-Herzegovina and Croatia, possibly the result of Neolithic migrations encroaching on Paleolithic populations against the Adriatic Sea.

Link

November 16, 2011

Armenian Y-chromosomes revisited (Herrera et al. 2011)

Armenian Y-chromosomes have been largely ignored since the publication of the classic Weale et al. (2001) paper a decade ago. The Armenian DNA Project has largely filled the void during the intervening years, but it is nice that the topic is revisited by academics.

Armenia is sandwiched between Anatolia, the Fertile Crescent, the Iranian plateau, the Caucasus, and the Black and Caspian seas, making the study of Armenian Y-chromosomes extremely interesting for the student of Eurasian prehistory.

Gene flow from the surrounding regions may have affected the Armenian population over historical time, but the remoteness of the Armenian highlands, coupled with the national church -- which distinguished Armenians from the Orthodoxy of the Roman Empire, the Zoroastrianism of the Persians, and, later, the Islam of Arabs and Ottomans -- may have prevented it.

My comments on the paper will follow below once I read it.

UPDATE I: The paper spends a lot of time on analysis of Y-STR variance; my opinion of Y-STRs as a tool for inferring past population movements is, to put it mildly, low. When Bahamian Y-STR variance is higher than the African one, and E-V13, one of the youngest European Y-haplogroups (in terms of Y-STR variance), turns up in Spain in one of the earliest ancient DNA samples, it goes without saying that the burden of proof is on those who wish to continue to talk about Neolithic or other population movements to make the assumptions of their models clearer. Nonetheless, there is still some utility in Y-STRs, so I reproduce some tree diagrams from the paper (top left), and link to the supplementary info that has a collection of haplotypes that may be useful to genealogists.

From the paper:
However, owing to the contentions associated with the current calibrations of the Y-STR mutation rates,32,34,35,41 as well as the limitations of the assumptions utilized by the methodologies for time estimations, the absolute dates generated in this study should only be taken as rough estimates of upper bounds.
Indeed. We are at the point where Y-STRs are at the end of their utility, but the replacement technology of extensive Y-chromosome sequencing has not quite arrived in an economical way yet.


UPDATE II:
I will have some additional thoughts on Y-chromosome distribution in the third update, but, for the time being, the two most important "nuggets" of information are: (i) the unusual haplogroup frequencies in Sasun (high R2 and T), which may be due to a founder effect, but it would be interesting if Armenian historians could find some explanation for their occurrence there, and (ii) the occurrence of R-M269*(xL23) in Ararat Valley. I invite more knowledgeable readers to comment on the issue; the haplotypes are in Table 2 of the supplement.

UPDATE III: The ubiquity of haplogroup G2a in Neolithic Europe, coupled with the absence of other prominent present-day European haplogroups, has important implications about European discontinuity.

But, it also has implications about West Asian discontinuity. The Neolithic in Europe arrived by all accounts from either of two principal areas: Anatolia or the Levant. Today, in Anatolia and the Levant, we see a set of haplogroups of which haplogroup J is the most important and ubiquitous one. Haplogroup R1b is also quite frequent in Armenia, the east Caucasus, Anatolia, and Iran, but its frequency drops dramatically to the east and south. And, there is a whole assortment of other haplogroups with varying frequency.

Why didn't all these non-G2a haplogroups participate in the early Neolithic colonization of Europe? It could very well be that a very small founder population crossed the Aegean into Europe, one that happened to be G2a-dominated. But, that is ultimately not very satisfying: if there was plenty of J and R1b in West Asia at the time of the Neolithic expansion, why are these haplogroups so conspicuous in their absence -at least so far- from Neolithic Europe?

The case of haplogroup J is particularly problematic. If we had to guess, by looking at present-day distribution, which lineage tracks population movements from the Near East to Europe, there is simply no better candidate: every map of this haplogroup, and especially of its J2a sublineage shows an unambiguous pattern of radiation, with a core area consisting of Southern Italy, Greece, Anatolia, West Asia, Mesopotamia and the northern parts of the Levant. All these regions are crucial to the story of the Neolithic, so the absence of J in Neolithic Europe is perplexing.

And, the story has other complications. From the current paper:
The relative expansion times for haplogroup J2-M172 (Table 4) generally correspond with those yielded for R1b-M343, with the exception of Greece and Crete, which, unlike haplogroup R1b-M343, are slightly older than the dates yielded for several of the Near Eastern groups as well as the four Armenian populations.
As mentioned above, I don't give much weight to Y-STR evidence, but observations such as the above certainly add to the feeling of unease that something is not quite right with the default picture of prehistory.

Another observation on the Armenian population is its very low frequency of haplogroup R1a1. Proponents of the Kurgan model of Indo-European dispersals sometimes associate this haplogroup with the Proto-Indo-European community, and it is strange that -if their ideas are right- Armenia is so lacking in this haplogroup, like its Caucasian neighbors. Why would these hypothetical migrants make such a huge impact in faraway India and barely a dent in nearby Armenia?

Finally, the occurrence of some I2, E-V13, and, perhaps, J2b in Armenia may point to Balkan contacts. But, when did these contacts occur? Are they traceable to the migration of Phrygians to Anatolia, according to the Herodotean account of Armenian origins, or can they be attributed to later contacts with Greeks or other Europeans?

The veil of mystery seems to be raised even higher by every new study: we may be less certain of what really happened today than in the days of happy ignorance, ten years ago. Ultimately it is new data, like the ones included in this paper, that will make every piece of evidence fit, and the grand puzzle of the history of Eurasia will be revealed in all its glory.

European Journal of Human Genetics (16 November 2011) | doi:10.1038/ejhg.2011.192

Neolithic patrilineal signals indicate that the Armenian plateau was repopulated by agriculturalists

Kristian J Herrera, Robert K Lowery, Laura Hadden, Silvia Calderon, Carolina Chiou, Levon Yepiskoposyan, Maria Regueiro, Peter A Underhill and Rene J Herrera

Abstract
Armenia, situated between the Black and Caspian Seas, lies at the junction of Turkey, Iran, Georgia, Azerbaijan and former Mesopotamia. This geographic position made it a potential contact zone between Eastern and Western civilizations. In this investigation, we assess Y-chromosomal diversity in four geographically distinct populations that represent the extent of historical Armenia. We find a striking prominence of haplogroups previously implicated with the Agricultural Revolution in the Near East, including the J2a-M410-, R1b1b1*-L23-, G2a-P15- and J1-M267-derived lineages. Given that the Last Glacial Maximum event in the Armenian plateau occured a few millennia before the Neolithic era, we envision a scenario in which its repopulation was achieved mainly by the arrival of farmers from the Fertile Crescent temporally coincident with the initial inception of farming in Greece. However, we detect very restricted genetic affinities with Europe that suggest any later cultural diffusions from Armenia to Europe were not associated with substantial amounts of paternal gene flow, despite the presence of closely related Indo-European languages in both Armenia and Southeast Europe.

Link

October 05, 2011

Y-chromosomes of Marsh Arabs

What do the Marsh Arabs have to do with ancient Sumer? Nothing that can be determined on the basis of this data. There are plenty of ancient Sumerian skulls, so how about we study them directly?

As far as I can see, the only link between Marsh Arabs and Sumerians presented in this paper comes from dating Y-STR variation of their major J1-Page08 group using the evolutionary mutation rate, with a divergence time of 4.5 +/- 2.6 ky. Even if that mutation rate were correct (it is not) and the assumptions on which the confidence interval is based were exhaustive (they are not), we still have +/- 2.6 ky leeway to deal with, which spans not only the Sumerians but plenty more besides.

Not to mention that the evolutionary mutation rate is wrongly applied to every case under the sun, and that Y-STR based age estimation in general has been conclusively shown to be a rather futile exercise.
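The arithmetic behind this objection is easy to make explicit. Under the usual stepwise-mutation assumption, a Y-STR age is roughly the observed repeat variance divided by the mutation rate, so the result scales inversely with whichever rate is chosen. The short sketch below takes the 4.5 +/- 2.6 ky figure from the paper; the two mutation rates are illustrative assumptions of mine (a commonly cited "evolutionary" rate of about 6.9e-4 and a pedigree-based rate of roughly 2.1e-3 per locus per generation), not values stated in this post.

```python
# Estimate reported in the paper, in thousands of years (ky).
age_ky, err_ky = 4.5, 2.6
low, high = age_ky - err_ky, age_ky + err_ky
print(f"Reported interval: {low:.1f} to {high:.1f} ky")

# ASSUMED illustrative rates per locus per generation; not taken from the post.
EVOLUTIONARY_RATE = 6.9e-4
GENEALOGICAL_RATE = 2.1e-3


def rescale_age(age, old_rate, new_rate):
    """Age is proportional to variance / rate, so for a fixed observed
    variance, switching the rate rescales the age by old_rate / new_rate."""
    return age * old_rate / new_rate


rescaled = rescale_age(age_ky, EVOLUTIONARY_RATE, GENEALOGICAL_RATE)
print(f"Same variance under the assumed genealogical rate: {rescaled:.1f} ky")
```

The reported interval alone runs from about 1.9 to 7.1 ky, and with these assumed rates the point estimate shrinks to roughly a third of its value. Either way the date cannot single out the Sumerian period.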

Nonetheless, the paper does have value in demonstrating the paucity of J2 and R1 in the Marsh Arabs compared to the more cosmopolitan general Iraqi population:
Different from the Iraqi control sample, the Marsh Arab gene pool displays a very scarce input from the northern Middle East (Hgs J2-M172 and derivatives, G-M201 and E-M123), virtually lacks western Eurasian (Hgs R1-M17, R1-M412 and R1-L23) and sub-Saharan African (Hg E-M2) contributions.
Rather than "Sumerian", it seems that the Marsh Arabs have rather preserved a more pristine Semitic patrilineal gene pool compared to the cosmopolitan Iraqi samples that have absorbed pre-Arab and pre-Semitic population elements.


BMC Evolutionary Biology 2011, 11:288 doi:10.1186/1471-2148-11-288

In search of the genetic footprints of Sumerians: a survey of Y-chromosome and mtDNA variation in the Marsh Arabs of Iraq.

Nadia Al-Zahery et al.

Abstract (provisional)

Background
For millennia, the southern part of the Mesopotamia has been a wetland region generated by the Tigris and Euphrates rivers before flowing into the Gulf. This area has been occupied by human communities since ancient times and the present-day inhabitants, the Marsh Arabs, are considered the population with the strongest link to ancient Sumerians. Popular tradition, however, considers the Marsh Arabs as a foreign group, of unknown origin, which arrived in the marshlands when the rearing of water buffalo was introduced to the region.

Results
To shed some light on the paternal and maternal origin of this population, Y chromosome and mitochondrial DNA (mtDNA) variation was surveyed in 143 Marsh Arabs and in a large sample of Iraqi controls. Analyses of the haplogroups and sub-haplogroups observed in the Marsh Arabs revealed a prevalent autochthonous Middle Eastern component for both male and female gene pools, with weak South-West Asian and African contributions, more evident in mtDNA. A higher male than female homogeneity is characteristic of the Marsh Arab gene pool, likely due to a strong male genetic drift determined by socio-cultural factors (patrilocality, polygamy, unequal male and female migration rates).

Conclusions
Evidence of genetic stratification ascribable to the Sumerian development was provided by the Y-chromosome data where the J1-Page08 branch reveals a local expansion, almost contemporary with the Sumerian City State period that characterized Southern Mesopotamia. On the other hand, a more ancient background shared with to Northern Mesopotamia is revealed by the less represented Y-chromosome lineage J1-M267*. Overall our results indicate that the introduction of water buffalo breeding and rice farming, most likely from the Indian sub-continent, only marginally affected the gene pool of autochthonous people of the region. Furthermore, a prevalent Middle Eastern ancestry of the modern population of the marshes of southern Iraq implies that if the Marsh Arabs are descendants of the ancient Sumerians, also the Sumerians were most likely autochthonous and not of Indian or South Asian ancestry.

Link

September 14, 2011

The Caucasus revisited (Yunusbayev et al. 2011)


This is another treasure trove of a paper, and together with Balanovsky et al. (2011) we now have a very clear picture of genetic variation in this most interesting of world regions.

Here is the ADMIXTURE analysis:

The authors also post results up to K=10 in the supplementary material, which show Druze/Bedouin/Basque-centered components. It is actually possible to push the analysis higher than K=7 without such problem components appearing, by retaining only non-closely related individuals (using --genome in PLINK and then iteratively removing individuals from pairs with PI_HAT greater than some value).
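The iterative removal described in the parenthesis can be sketched as a simple greedy loop. This is only an illustration under stated assumptions: the pair list stands in for rows parsed from a PLINK --genome report, the individual IDs and PI_HAT values are made up, the 0.25 threshold is arbitrary, and the rule of dropping whoever appears in the most offending pairs is my choice rather than anything specified above.

```python
from collections import Counter

# HYPOTHETICAL (individual_1, individual_2, PI_HAT) rows for illustration only.
pairs = [
    ("ind1", "ind2", 0.52),
    ("ind1", "ind3", 0.27),
    ("ind2", "ind3", 0.26),
    ("ind4", "ind5", 0.05),
    ("ind5", "ind6", 0.31),
]


def prune_related(pairs, threshold):
    """Greedily remove individuals until no remaining pair has
    PI_HAT above the threshold; returns the set of removed IDs."""
    removed = set()
    while True:
        # Offending pairs among individuals that are still retained.
        related = [
            (a, b)
            for a, b, pi_hat in pairs
            if pi_hat > threshold and a not in removed and b not in removed
        ]
        if not related:
            return removed
        # Drop the individual in the most offending pairs (ties: lowest ID).
        counts = Counter(ind for pair in related for ind in pair)
        worst = min(counts, key=lambda ind: (-counts[ind], ind))
        removed.add(worst)


removed = prune_related(pairs, threshold=0.25)
print("Removed:", sorted(removed))
```

Removing the most-connected individual first tends to keep more samples than dropping one member of each pair at random, which matters when the populations are small to begin with.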

Nonetheless, the components emerging from this analysis will be familiar to followers of the Dodecad Project. In terms of Dodecad v3:
  • light yellow "North East Asian"
  • orange "South East Asian"
  • brown "Neo African" or "Sub_Saharan", as there are no African hunter-gatherers
  • dark blue "North European", as there is no split of east/west Europe at this level
  • middle blue "West Asian"
  • light blue "Southwest Asian"
  • green "South Asian", but anchored on Sindhi, a population from Pakistan, due to the lack of more southern populations from India
The labels of new populations sampled in this study can be seen in brown. I particularly hope that the substantial new autosomal data will become publicly available, so that I can use them in the Dodecad Project. It will be an invaluable new resource, filling some "holes" in the Eurasian landscape (e.g., east of the Caspian; Bulgarians; several new Caucasus populations) in the Li et al. (HGDP), and Behar et al. data.

(to be continued)

UPDATE I (Y-chromosomes):


Some observations:
  • C has a concentration in the Turkic Nogays
  • The presence of D this far west is very surprising, again in the Nogays. This haplogroup has a relic distribution, with particular concentrations in Tibet, Mongolia, Japan, and Andaman Islanders. In all likelihood its presence here is linked to the Nogays' eastern origin
  • E and its subclades occur at a very low frequency here
  • G2a has a clear West Caucasus (both north and south) concentration
  • I seems to have a mainly West Caucasus distribution as well; this is a common European haplogroup; it has quite elevated frequencies among the Andis and Kara Nogays. It would be interesting to discover some historical correlate for the presence of I in Kara Nogays but not Kuban Nogays and in Andis but not in most of the NE Caucasus
  • J1 has the expected Northeast Caucasus nexus. This haplogroup is bimodal, with a mode in Arabians and a secondary mode in NE Caucasus. Note the paucity of J1e-P58, the reverse of the situation of Arabians; I've noted before the likely association of the P58 clade with Semitic languages.
  • The extreme concentration of J2 in Chechens and Ingush is probably associated with low variance. Apart from these atypical populations, a substantial presence of this haplogroup can be found in the NW/S Caucasus in different populations and in the form of different subclades.
  • The new LT mystery clade has its usual low-frequency wide distribution
  • N occurs in Nogays as expected, and, like C, also in the NW Caucasus. This probably also represents an eastern influence, probably associated not only with the Nogays but also with various Tatar influences on the Caucasus.
  • Q occurs widely in the NW Caucasus but only in 1 Nogay. Perhaps this is more of a Tatar marker, although a finer-scale resolution of this haplogroup is really necessary.
  • R1a-related lineages occur less frequently here than among eastern Slavs, a main reason for the disconnect between the Eastern European plain and the Caucasus. There does, however, appear to be good diversity here, including R1a* and R1a1-M198*. Note again how the Iranic Ossetians (both North and South) have almost no R1a1 compared to both their NW Caucasian and S Caucasian neighbors, again suggesting that this may not have been an important Alan or steppe Iranian lineage, at least during the late antique time horizon. The occurrence of R1a1f-M458 may represent Slavic influence in the NW Caucasus.
  • R1b-related lineages seem ubiquitous in the Caucasus. R-M73 occurs substantially in Kara Nogays and Balkars, an apparent link with Central Asia where this haplogroup occurs frequently.
UPDATE II (Caucasus-Eastern Europe discontinuity)

The authors of this paper highlight the genetic discontinuity between the eastern European plain and the Caucasus. This was also apparent in the Balanovsky et al. (2011) paper, and was also a major conclusion of the Dodecad Project, with Caucasians exhibiting a high percentage of the "West Asian" component, while eastern Slavs show low "West Asian" and high "East European".

The interpretation of this discontinuity is more difficult. There are surely parts of the Caucasus region that are mountainous and pose an ecological contrast to the flatlands of eastern Europe. That is consistent with a different type of population living in either region for a long time, despite the well-attested archaeological contacts (e.g., Maikop or the settlement of steppe nomads such as Alans or Sarmatians).

On the other hand, the eastern Slavic population can, at least in part, have expanded more recently, in the medieval period, as part of the early Slavic dispersals, as well as the push to the north and east of the Russians. These appear to have partly displaced Turkic groups from the north Pontic region, with all of the above having displaced historical Scythian (Iranic) nomads, who, in turn, displaced the mysterious Cimmerians. If the discovery of east Eurasian mtDNA C in Neolithic and Bronze Age Ukraine stands up, there will be another layer of population replacement, as mtDNA C is quite rare in the broader region today. On the other hand, the Caucasus itself may have been affected by population movements from the Near East, as Balanovsky et al. suggest.

So, in conclusion, the discontinuity is a fact that emerges from different types of analyses, but its causes remain uncertain, and it is not clear when and how it was first established.

Mol Biol Evol (2011) doi: 10.1093/molbev/msr221

The Caucasus as an asymmetric semipermeable barrier to ancient human migrations

Bayazit Yunusbayev et al.

Abstract

The Caucasus, inhabited by modern humans since the Early Upper Paleolithic and known for its linguistic diversity, is considered to be important for understanding human dispersals and genetic diversity in Eurasia. We report a synthesis of autosomal, Y chromosome and mitochondrial DNA (mtDNA) variation in populations from all major subregions and linguistic phyla of the area. Autosomal genome variation in the Caucasus reveals significant genetic uniformity among its ethnically and linguistically diverse populations, and is consistent with predominantly Near/Middle Eastern origin of the Caucasians, with minor external impacts. In contrast to autosomal and mtDNA variation, signals of regional Y chromosome founder effects distinguish the eastern from western North Caucasians. Genetic discontinuity between the North Caucasus and the East European Plain contrasts with continuity through Anatolia and the Balkans, suggesting major routes of ancient gene flows and admixture.

Link

May 15, 2011

Genes and Languages in the Caucasus

If there was ever a paper that was the equivalent of a box of candy, this is probably it. I will update this post with my comments.

UPDATE I (Genealogical rate, Gene-language concordance, Ossetes): I seriously don't know where to begin with this paper. So, given the serendipitous appearance of an abstract on Y-chromosome mutation rates, here is a major new pro-genealogical rate quote from the new paper:
We found that “evolutionary” estimates of most clusters fall far outside the range of the respective linguistic dates, while “genealogical” estimates gave a good fit with the linguistic dates. At least two population events in the Caucasus are documented archaeologically, which allows additional comparison with these “historical” dates. In both cases, the historical (archaeological) date is similar to a genetic estimate based on the “genealogical” mutation rate (Supplementary Note 2).
And, here's a comparison of the linguistic and genetic (based on Y-chromosomes) trees from the paper:
The correspondence seems remarkable; the only major discrepancy is for Iranic (Indo-European) Ossetes who group with NW Caucasians genetically, which makes sense as the Ossetes are probably to a large extent NW Caucasians that underwent a language shift under the influence of the Alans.

Speaking of the Ossetes, their negligible R1a1-M198 frequency (0.4-0.8%) should be a warning that "Iranic steppe nomads" _does not equal_ "R1a1". While a limited contribution of Alans to the Ossetes is expected, it is not expected that Ossetes will have two of the lowest M198 frequencies in the Caucasus: in all probability R1a1 was not particularly important among Alans, and, by implication (?) Sarmatians.

UPDATE II (4 haplogroups for 4 language families):

The most interesting discovery in this paper is, of course, the correspondence between Y-chromosome haplogroups and language groups, thanks to the very large number of individuals tested and the deep phylogenetic resolution of the haplogroups:
Overall, the most frequent haplogroups in the Caucasus were G2a3b1-P303 (12%), G2a1a-P18 (8%), J1*-M267(xP58) (34%), and J2a4b*-M67(xM92) (21%), which together encompassed 73% of the Y chromosomes, while the other 24 haplogroups identified in our study comprise the remaining 27% (Table 2). ... haplogroup G2a3b1-P303 comprised at least 21% (and up to 86%) of the Y chromosomes in the Shapsug, Abkhaz and Circassians ... haplogroup G2a1a-P18 comprised at least 56% (and up to 73%) of the Digorians and Ironians (both from the Central Caucasus Iranic linguistic group), while not being found at more than 12% (average 3%) in other populations... haplogroup J2a4b*-M67(xM92) comprised 51-79% of the Y chromosomes in the Ingush and three Chechen populations (North-East Caucasus, Nakh linguistic group), while, in the rest of the Caucasus, its frequency was not higher than 9% (average 3%) ... haplogroup J1*-M267(xP58) comprised 44-99% of the Avar, Dargins, Kaitak, Kubachi, and Lezghins (South-East Caucasus, Dagestan linguistic group) but was less than 25% in Nakh populations and less than 5% in the rest of Caucasus.

Interestingly, G2a3 is one of the lineages of early Central European farmers, and of 2 medieval German knights. G2 is also, curiously, one of the West Eurasian lineages that are found in very small quantities in India, especially among upper caste Hindus. We are beginning to make connections across space and time, even though the patterns are far from clear yet.

The prevalence of J1*-M267(xP58) in Dagestan is well known (or suspected) from previous studies. Notice that J-P58, if we use the genealogical rate, has an age of ~5.4ky in Semitic groups, and this is in concordance with the 5,750 years ago origin of Semitic languages based on Bayesian phylogenetics. So, it is clear that part of haplogroup J1 was prevalent in ancient Semitic groups, and another, disjoint part in ancient Dagestani groups.

To make things more interesting, the Nakh groups (Ingush and Chechens) have J2a4b*-M67(xM92) as their modal haplogroup. Nakh is also a Northeast Caucasian language subfamily, like Dagestani, and indeed NE Caucasian is also called Nakho-Daghestanian. What did the early speakers of this family look like?

It would be tempting to think that Proto-Nakho-Dagestanians were J1-dominated, as J1 exists in both Nakh (16-25%) and Dagestani (58-99%) groups, whereas J2a4b-M67 (the Nakh modal haplogroup) is nearly completely absent in Dagestanians.

UPDATE III (No European influence):

Another interesting discovery of this study is the lack of European influence in the populations of the North Caucasus.
It seems that both R1a1a-M198 and I2a-P37 have a major barrier eastward in the Don river. Please note that the former is not strictly a European haplogroup, but it nonetheless experiences a massive drop in frequency, and is negligible everywhere except in Abkhaz-Circassians (NW Caucasus; 10.3-19.7%), with an outlier in Dargins (22%).

This seems to put a limit on the origin of any hypothetical movements across the Eurasian steppe east of the Don river, as haplogroup I2a-P37 is largely absent in Central Asia, and occurs 3 times in 1,525 individuals in this sample. So, while there have been proposals of a Central European origin of some steppe pastoralist groups, these are hard to reconcile with this picture.

UPDATE IV (Haplogroup G):

Two of the modal haplogroups in this paper are G2a1a-P18 (Iranic, 56-73%) and G2a3b1-P303 (NW Caucasians, 21-86%). Battaglia et al. (2008) also found a high frequency of G2a* in Georgians and Balkars (~30%, also modal in both populations). It appears that G2a is a mainly West (both NW and SW) Caucasian phenomenon within the context of this region.

UPDATE V (Starostin and Language depth)

The authors applied the methodology of the late Sergei Starostin to the problem of language time depth:
The present work employs Starostin’s methodology, and we made special efforts to create the high-quality linguistic databases required for this analysis. Thus, based on significantly extended and revised linguistic databases, we have applied a glotto-chronological approach to the North Caucasian languages. As a result, our study provides a unique opportunity to make direct comparisons of linguistic and genetic data from the same populations. Lexico-statistical methods have also been applied to a number of language families using a Bayesian approach to increase the statistical robustness of language classification (Gray and Atkinson, 2003; Kitchen et al., 2009; Greenhill et al., 2010). Using these methods with the Caucasus languages under study here will be the focus of future work.
It will certainly be interesting to see Bayesian phylogenetic methods applied to the Caucasus languages in the future, using the linguistic datasets developed here. The concordance of genetic-linguistic results in this paper, in addition to the many successes of the G&A approach, is making it increasingly difficult for those who doubt our ability to estimate the age of language families in a manner similar to that with which biologists estimate the age of genetic variation.

See also Tower of Babel project and the Evolution of Human Languages project at the Santa Fe Institute.

UPDATE VI (Haplogroup J2a)

I have recently speculated about a possible link between the Caucasus region and India based on the appearance of a "Dagestan" component in India, the clear West Asian origin of Ancestral North Indians, as well as a possible linguistic link between Northeast Caucasian, Hurrian, and Indo-European.

A problem with that theory is that the high J1*(xP58) frequency in Dagestan has no counterpart in South Asia. The current study, however, adds data on the Nakh part of the Nakho-Dagestanian (Northeast Caucasian) family, showing this to be J2a4b-M67 dominated. So, while I think that J1*(xP58) may have been present among Proto-Northeast Caucasians, these must have interacted with J2a folk.

J-M67 is clearly intrusive into the Central Caucasus, from the South where a much greater variety of J2a-related lineages is observed among Armenians, North Iranians, and Anatolian Turks.

We now have good coverage of J2a in the entirety of the West Asian region, with the exception of Azerbaijan, and a few patterns are beginning to emerge:
  1. The center of the J2a world is somewhere between eastern Turkey, Armenia, Azerbaijan, Iran, and Syria
  2. The Caucasus is a northern extension of this world, just as Greece and Italy are its main western extensions, with a strong extension into Central Asia as far as Xinjiang, and well into South Asia all the way to upper caste South Indian Hindus.
  3. In the Caucasus itself J-M67 dominates among Nakh speakers, but with little other J2a-related variation.
  4. In comparison to Nakhs, J2a seems more varied in Georgians, among Ossetes, and among NW Caucasian speakers
It is hard to make any pronouncements on how J2a spread northwards from its Transcaucasian cradle, but I would think that the Kura-Araxes and Maikop cultures are fairly good candidates for that spread, with the former being J2a dominated, and the latter being more G2a dominated. I would not, however, dismiss a more recent spread of J2a into the region.

UPDATE VII (Absence of E1b1b1):

This haplogroup has a more Mediterranean distribution and is conspicuously absent in the North Caucasus. Unfortunately no downstream markers were typed, but (a) its presence in small amounts in NW Caucasians (1-1.7%) together with a similar low frequency (1.5%) in Georgians, and (b) its absolute absence among Nakho-Dagestanians, except for one Lezghin, suggest to me that it arrived in the region from the west, and is probably a low-frequency trace of Ancient Greek colonies of the Black Sea, just as it is associated with Greek colonists in the West Mediterranean and Sicily.

UPDATE VIII (Haplogroups L and T):

There is a little haplogroup L in the North Caucasus. L-M27 and L-M317 seem concentrated in the Northwest, while L-M357 is found only in Nakh speakers. The detection of L-M357 in North but not South Iran may be related to this population, and also to the L-rich population of Syria, especially from the eastern inland area.

Haplogroup T has been the subject of a major recent paper. In this region, it is found in 2 NW Caucasians, 1 Ossete and a couple of Lezgins, but unfortunately with no fine phylogenetic resolution.

Mol Biol Evol (2011) doi: 10.1093/molbev/msr126

Parallel Evolution of Genes and Languages in the Caucasus Region

Oleg Balanovsky1,2,*, Khadizhat Dibirova1,*, Anna Dybo3, Oleg Mudrak4, Svetlana Frolova1, Elvira Pocheshkhova5, Marc Haber6, Daniel Platt7, Theodore Schurr8, Wolfgang Haak9, Marina Kuznetsova1, Magomed Radzhabov1, Olga Balaganskaya1,2, Alexey Romanov1, Tatiana Zakharova1, David F. Soria Hernanz10,11, Pierre Zalloua6, Sergey Koshel12, Merritt Ruhlen13, Colin Renfrew14, R. Spencer Wells10, Chris Tyler-Smith15, Elena Balanovska1 and The Genographic Consortium16

We analyzed 40 SNP and 19 STR Y-chromosomal markers in a large sample of 1,525 indigenous individuals from 14 populations in the Caucasus and 254 additional individuals representing potential source populations. We also employed a lexicostatistical approach to reconstruct the history of the languages of the North Caucasian family spoken by the Caucasus populations. We found a different major haplogroup to be prevalent in each of four sets of populations that occupy distinct geographic regions and belong to different linguistic branches. The haplogroup frequencies correlated with geography and, even more strongly, with language. Within haplogroups, a number of haplotype clusters were shown to be specific to individual populations and languages. The data suggested a direct origin of Caucasus male lineages from the Near East, followed by high levels of isolation, differentiation and genetic drift in situ. Comparison of genetic and linguistic reconstructions covering the last few millennia showed striking correspondences between the topology and dates of the respective gene and language trees, and with documented historical events. Overall, in the Caucasus region, unmatched levels of gene-language co-evolution occurred within geographically isolated populations, probably due to its mountainous terrain.

Link

October 14, 2010

African admixture in the Near East: where from?

Here is the result of running ADMIXTURE for K=5 using 275K SNPs on the combined HGDP + HapMap African and West Asian populations, also including Adygei and Tuscans. The populations are in order: Luhya, Maasai, Tuscans, Yoruba, Adygei, Bedouins, Druze, Mozabites, Palestinians.
At this level of detail, Africans are divided into three clusters which can be labeled Sub-Saharan (red), East African (blue), and "Mozabite" or North African (purple). Europeans and West Asians form the green cluster, while the Arab samples have a substantial contribution of the yellow cluster.

Here are the admixture proportions:

African admixture in the two European populations is probably within the limits of statistical noise and consists of "Mozabite" (0.4%) for Tuscans and "E African" for Adygei (0.6%).

Druze, an Arab population that was religiously isolated from Arab Muslims for about a thousand years, seems to have correspondingly missed most African admixture, registering 0.6% "Mozabite" and 1.1% "E African".

Non-Druze Arabs have clear traces of African admixture both in the form of "Mozabite" North African (4.5% for Palestinians, 4.9% for Bedouins), E African (6% for Palestinians and 5.7% for Bedouins) and a little Sub-Saharan (1.3% for Palestinians and 2.1% for Bedouins).
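To make the contrast between Druze and non-Druze Arabs easier to see, the African-related components can simply be summed per population. This is a minimal sketch that uses only the percentages quoted above; the dictionary layout and the function name `total_african` are my own illustrative choices, and components not reported in the text are left out rather than guessed.

```python
# Admixture percentages as quoted in the text (ADMIXTURE run, K=5).
# Only components explicitly reported above are listed.
african_components = {
    "Druze": {"Mozabite": 0.6, "E African": 1.1},
    "Palestinians": {"Mozabite": 4.5, "E African": 6.0, "Sub-Saharan": 1.3},
    "Bedouins": {"Mozabite": 4.9, "E African": 5.7, "Sub-Saharan": 2.1},
}


def total_african(components):
    """Sum the reported African-related components, rounded to 0.1%."""
    return round(sum(components.values()), 1)


totals = {pop: total_african(c) for pop, c in african_components.items()}

for pop, total in totals.items():
    print(f"{pop}: {total}% African-related components in total")
```

On these figures the Druze total is under 2%, while Palestinians and Bedouins are both above 10%.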

I had pointed out the mainly eastern African admixture in Near Eastern Arabs a year ago in my review of HAPMIX. Clearly Maasai are a better stand-in than the Yoruba for whatever African ancestry Arabs have.

It is quite interesting to note the genetic distance (expressed in Fst) between the five inferred clusters:

We can plainly see that proximity to Eurasians increases in the order of Sub-Saharan, East African, "Mozabite". I have little doubt that Somalis and Ethiopians from East Africa would occupy an intermediate position between Maasai from Kenya and "Mozabites" in that order.

An interesting observation is that the "Arab" cluster is slightly more distant from all African clusters than the European/W Asian cluster is. This might seem perplexing as geography might dictate that it should be closer to the African clusters.

However, this is not very surprising to me, as there was gene flow between West Asia and Europe and Africa in old times, evidenced by such things as the presence of Eurasian Y-haplogroup R-V88 in Africa and African haplogroup E1b in Europe and West Asia.

The original Arab ancestors were probably haplogroup J1e-bearing Semites exploiting arid environments of West Asia. Present-day Levantine Arabs (especially Bedouins, in the available samples) maintain a strong signal of this component of their ancestry, admixed, however, principally with the original Tuscan- and Adygei-like West Asians, and secondarily with E and N Africans.

Revisiting GenomesUnzipped "Ashkenazi Jewish" admixture

There were two individuals in my recent post who showed some evidence of "Ashkenazi Jewish" admixture (DBV001: 100% and VXP001: 32%). I list in the comments of that post some possible explanations for why VXP001 (who has no knowledge of Jewish ancestry) might get such a result. Naturally, using 275K SNPs is better than the 192 of EURO-DNA-CALC, so I did a separate run that included these two individuals.

The results are:

DBV001: 85.1% European/W Asian, 10.5% "Arab", 0.5% "E African", and 3.8% "Mozabite". This is entirely consistent with full known Jewish ancestry. The closest population to the Middle Eastern component of Jews are presumably the Druze, who have about 16.9% of the "Arab" (which should probably be relabeled "Semitic") cluster. Ashkenazi Jews are known to be intermediate between Levantine and European populations, and DBV001's result is entirely consistent with this.

As I've mentioned before, the exact percentage of Middle Eastern ancestry in modern European Jews is difficult to estimate, as this would depend on determining what percentage of the "European/W Asian" and "Semitic" components was present in their gene pool before they settled in Europe. If, for example, they were 100% in the "Semitic" cluster, then DBV001 would have about 10% Middle Eastern ancestry, but if they were like modern Druze, then this percentage would be 100*10.5/16.9 = 62.1%. The truth is probably somewhere in between.
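The two bounds above follow from the same simple ratio, and a short sketch makes the dependence on the assumed ancestral population explicit. The function name `middle_eastern_ancestry` is hypothetical, and the calculation assumes (as the arithmetic in the text implicitly does) that the non-Middle Eastern source contributed none of the "Arab"/"Semitic" cluster.

```python
def middle_eastern_ancestry(sample_semitic_pct, ancestral_semitic_pct):
    """Estimate % Middle Eastern ancestry from the "Semitic" cluster share.

    Assumes the Middle Eastern source population carried
    `ancestral_semitic_pct` of the "Semitic" cluster and that the other
    source carried none of it (a simplifying assumption).
    """
    if not 0 < ancestral_semitic_pct <= 100:
        raise ValueError("ancestral_semitic_pct must be in (0, 100]")
    return 100 * sample_semitic_pct / ancestral_semitic_pct


dbv001_semitic = 10.5  # DBV001's "Arab" cluster percentage from the run above

# Scenario 1: ancestors were 100% in the "Semitic" cluster.
lower = middle_eastern_ancestry(dbv001_semitic, 100.0)
# Scenario 2: ancestors resembled modern Druze (16.9% "Semitic").
upper = middle_eastern_ancestry(dbv001_semitic, 16.9)

print(f"Lower bound: {lower:.1f}%")
print(f"Upper bound: {upper:.1f}%")
```

Any assumed ancestral "Semitic" share between 16.9% and 100% gives an estimate between these two values.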

VXP001: A shorter story, as VXP001 comes out 100% "European/W Asian". Thus, I am inclined to believe that VXP001's AJ score is either due to the small number of markers, or to a European-origin component in the composite Ashkenazi Jewish gene pool that he happens to share.

UPDATE (Oct 23): A much more detailed analysis of Genomes Unzipped individuals.

June 03, 2010

Two major groups of living Jews (Atzmon et al. 2010)

(Last Update Jun 10)

More on this paper as soon as I read it carefully. Nature News has an overview of the research. This study addresses my question about the extent of Southern vs. Central/European ancestry in Jews.

It is also entirely consistent with my theory that Diaspora Jews are to a certain extent descended from Italian-Balkan-Anatolian groups, among which they lived in Hellenistic-Roman-Late Antique times; my guess is that Middle-Eastern Jews form a distinct group in relation to European/Syrian ones because, unlike them, they had a smaller opportunity of absorbing Euranatolians, and their admixture -if any- came from linguistically (and probably genetically) related Semitic groups.

UPDATE I (Jun 4)

It is a bit frustrating how the authors did not limit themselves to HGDP but also included a wide variety of populations from POPRES, which they bundled together in a fairly arbitrary way:
Next, each of 2407 European subjects was assigned into one of 10 groups based on geographic region: South: Italy, Swiss-Italian; Southeast: Albania, Bosnia-Herzegovina, Bulgaria, Croatia, Greece, Kosovo, Macedonia, Romania, Serbia, Slovenia, Yugoslavia; Southwest: Portugal, Spain; East: Czech Republic, Hungary; East-Southeast: Cyprus, Turkey; Central: Austria, Germany, Netherlands, Swiss-German; West: Belgium, France, Swiss-French, Switzerland; North: Denmark, Norway, Sweden; Northeast: Finland, Latvia, Poland, Russia, Ukraine; Northwest: Ireland, Scotland, UK.
I don't know exactly why Switzerland should be bundled with Belgium, while Austria with Netherlands, or that Finns would be bundled with Poles, or Albanians, Greeks, and Slavs would be bundled in a broad "Southeast" group. Anyway, the authors don't use this POPRES sample much in their actual paper, although they present some results in the supplement, so I won't dwell on it further.

UPDATE II (Jun 4)

On the left we have panel B from Figure 1 in the paper, which shows the first two principal components in a regional context. Capitalized labels represent Jewish groups. Note that Iranian and Iraqi Jews don't show a particular relationship to Arabs (Bedouins and Palestinians). It would be interesting to see if there is a relationship with Iraqis or Iranians, which might indicate whether admixture with these local groups is responsible for Iranian/Iraqi Jewish distinctiveness. As in previous studies, most other Jews, including Syrian Jews, are located between Europeans and Druze; the lack of non-Jewish Euranatolian populations is especially baffling.

I am particularly interested in the seemingly very close relationship between Greek, Turkish, and Italian Jews. The GRK-TUR relationship is not that puzzling, as these are mostly Ottoman Jews who found themselves on different sides of national borders, and we would not expect them to be any different. But, why would they be so similar to Italian Jews? Speaking of Greek Jews, how many of them are of Romaniote and how many of Sephardic extraction?

UPDATE III (Jun 4)

The STRUCTURE analysis is also quite interesting. Jews seem to lack appreciable levels of East Asian (orange) or Sub-Saharan African (yellow) admixture, or of Central/South Asian admixture (green). The lack of E/C/S Asian admixture is especially damning of the Khazar hypothesis.

We should probably not interpret the three main visible components ("European" blue, "Mozabite" purple, "Near Eastern" pink) as representing ancestral proportions of European, North African, and Near Eastern elements. For example, Mongoloids have some "purple" while it is unlikely that they have North African admixture; so, while purple has an obvious relationship to Mozabites, it is not a good fit for an ancestral population group. Its substantial presence in the Near East also precludes such an easy interpretation.

Nor can we easily infer the percentage of "European" and "Near Eastern" admixture in Jews. The "Pink" element seems to grade from prominence among Iranian Jews to insignificance among Basques, but what did the original European and Jewish groups look like? Depending on how close they were to the Basque and Iranian Jewish end of the gradient, quite different admixture proportions would arise.

In more mathematical terms, a gradient can be represented as a single variable x going from 0 to 1, e.g., pink/(pink+blue) in the STRUCTURE analysis or relative position between Basques and Druze in the PCA figure above. But x can be expressed in an infinite number of ways as a weighted summation of two other numbers between 0 and 1. If ancestral groups were exactly like Basques and Druze or they were exactly pure blue and pure pink, then we could arrive at exact ancestral proportions for living Jews, but unfortunately, unlike situations where clear-cut well-differentiated ancestral groups exist to act as yardsticks, this does not appear -as of yet- to be the case for intra-Caucasoid variation.
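The point can be illustrated numerically. If a group sits at position x on the gradient and the two ancestral groups sat at positions a and b, then x = (1 - w)*a + w*b, so the mixing proportion is w = (x - a)/(b - a). The sketch below uses entirely hypothetical values for x, a, and b (none are taken from the paper) to show how the same observed position yields very different proportions depending on where the ancestral groups are assumed to lie.

```python
def mixing_proportion(x, a, b):
    """Proportion w of the 'b' source such that x = (1 - w) * a + w * b.

    x: observed position on the gradient (0 to 1)
    a: assumed position of the first ancestral group
    b: assumed position of the second ancestral group (must exceed a)
    """
    if not a < b:
        raise ValueError("expected a < b")
    if not a <= x <= b:
        raise ValueError("x must lie between the two ancestral positions")
    return (x - a) / (b - a)


x = 0.5  # hypothetical observed position, e.g. pink / (pink + blue)

# Hypothetical ancestral positions: the same x, three different answers.
scenarios = [(0.0, 1.0), (0.2, 1.0), (0.0, 0.6)]
results = [mixing_proportion(x, a, b) for a, b in scenarios]

for (a, b), w in zip(scenarios, results):
    print(f"ancestral positions a={a}, b={b}: w = {w:.3f}")
```

With pure endpoints (0 and 1) the proportion equals x itself, but moving either endpoint inward changes the inferred proportion substantially, which is why exact figures cannot be read off the gradient alone.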

UPDATE IV (Jun 4)

From the paper:
Admixture with local populations, including Khazars and Slavs, may have occurred subsequently during the 1000 year (2nd millennium) history of the European Jews. Based on analysis of Y chromosomal polymorphisms, Hammer estimated that the rate might have been as high as 0.5% per generation or 12.5% cumulatively (a figure derived from Motulsky), although this calculation might have underestimated the influx of European Y chromosomes during the initial formation of European Jewry. Notably, up to 50% of Ashkenazi Jewish Y chromosomal haplogroups (E3b, G, J1, and Q) are of Middle Eastern origin,15 whereas the other prevalent haplogroups (J2, R1a1, R1b) may be representative of the early European admixture. The 7.5% prevalence of the R1a1 haplogroup among Ashkenazi Jews has been interpreted as a possible marker for Slavic or Khazar admixture because this haplogroup is very common among Ukrainians (where it was thought to have originated), Russians, and Sorbs, as well as among Central Asian populations, although the admixture may have occurred with Ukrainians, Poles, or Russians, rather than Khazars. In support of the ancestry observations reported in the current study, the major distinguishing feature between Ashkenazi and Middle Eastern Jewish Y chromosomes was the absence of European haplogroups in Middle Eastern Jewish populations.
I would not be so quick to assign haplogroups to European or Middle Eastern origin. For example, G seems to have originated in the Middle East, but it is quite plentiful in substantial parts of Europe. So, while its ultimate origins may be West Asian (it arose in a man who lived in West Asia thousands of years ago), its proximate origin may be European in some particular case.

As I have argued before, I doubt E3b (or E1b1b) was an original Jewish lineage, J2 probably represents Iranian/Euranatolian admixture in Jews, while J1 (or a subset thereof) has strong Semitic connotations.

UPDATE V (Jun 10):

Another paper by Behar et al. (2010) on the same topic.


AJHG doi:10.1016/j.ajhg.2010.04.015

Abraham's Children in the Genome Era: Major Jewish Diaspora Populations Comprise Distinct Genetic Clusters with Shared Middle Eastern Ancestry

Gil Atzmon et al.

Abstract

For more than a century, Jews and non-Jews alike have tried to define the relatedness of contemporary Jewish people. Previous genetic studies of blood group and serum markers suggested that Jewish groups had Middle Eastern origin with greater genetic similarity between paired Jewish populations. However, these and successor studies of monoallelic Y chromosomal and mitochondrial genetic markers did not resolve the issues of within and between-group Jewish genetic identity. Here, genome-wide analysis of seven Jewish groups (Iranian, Iraqi, Syrian, Italian, Turkish, Greek, and Ashkenazi) and comparison with non-Jewish groups demonstrated distinctive Jewish population clusters, each with shared Middle Eastern ancestry, proximity to contemporary Middle Eastern populations, and variable degrees of European and North African admixture. Two major groups were identified by principal component, phylogenetic, and identity by descent (IBD) analysis: Middle Eastern Jews and European/Syrian Jews. The IBD segment sharing and the proximity of European Jews to each other and to southern European populations suggested similar origins for European Jewry and refuted large-scale genetic contributions of Central and Eastern European and Slavic populations to the formation of Ashkenazi Jewry. Rapid decay of IBD in Ashkenazi Jewish genomes was consistent with a severe bottleneck followed by large expansion, such as occurred with the so-called demographic miracle of population expansion from 50,000 people at the beginning of the 15th century to 5,000,000 people at the beginning of the 19th century. Thus, this study demonstrates that European/Syrian and Middle Eastern Jews represent a series of geographical isolates or clusters woven together by shared IBD genetic threads.

Link

December 28, 2009

Y chromosomes of Dagestan highlanders

Journal of Human Genetics 54, 689–694 (1 December 2009) | doi:10.1038/jhg.2009.94

The key role of patrilineal inheritance in shaping the genetic variation of Dagestan highlanders

Laura Caciagli et al.

Abstract

The Caucasus region is a complex cultural and ethnic mosaic, comprising populations that speak Caucasian, Indo-European and Altaic languages. Isolated mountain villages (auls) in Dagestan still preserve high level of genetic and cultural diversity and have patriarchal societies with a long history of isolation. The aim of this study was to understand the genetic history of five Dagestan highland auls with distinct ethnic affiliation (Avars, Chechens-Akkins, Kubachians, Laks, Tabasarans) using markers on the male-specific region of the Y chromosome. The groups analyzed here are all Muslims but speak different languages all belonging to the Nakh-Dagestanian linguistic family. The results show that the Dagestan ethnic groups share a common Y-genetic background, with deep-rooted genealogies and rare alleles, dating back to an early phase in the post-glacial recolonization of Europe. Geography and stochastic factors, such as founder effect and long-term genetic drift, driven by the rigid structuring of societies in groups of patrilineal descent, most likely acted as mutually reinforcing key factors in determining the high degree of Y-genetic divergence among these ethnic groups.

Link

November 18, 2009

Y chromosomes of NE Portuguese Jews

American Journal of Physical Anthropology doi:10.1002/ajpa.21154

Phylogeographic analysis of paternal lineages in NE Portuguese Jewish communities

Inês Nogueiro et al.

Abstract

The establishment of Jewish communities in the territory of contemporary Portugal is archaeologically documented since the 3rd century CE, but their settlement in Trás-os-Montes (NE Portugal) has not been proved before the 12th century. The Decree of Expulsion followed by the establishment of the Inquisition, both around the beginning of the 16th century, accounted for a significant exodus, as well as the establishment of crypto-Jewish communities. Previous Y chromosome studies have shown that different Jewish communities share a common origin in the Near East, although they can be quite heterogeneous as a consequence of genetic drift and different levels of admixture with their respective host populations. To characterize the genetic composition of the Portuguese Jewish communities from Trás-os-Montes, we have examined 57 unrelated Jewish males, with a high-resolution Y-chromosome typing strategy, comprising 16 STRs and 23 SNPs. A high lineage diversity was found, at both haplotype and haplogroup levels (98.74 and 82.83%, respectively), demonstrating the absence of either strong drift or founder effects. A deeper and more detailed investigation is required to clarify how these communities avoided the expected inbreeding caused by over four centuries of religious repression. Concerning haplogroup lineages, we detected some admixture with the Western European non-Jewish populations (R1b1b2-M269, 28%), along with a strong ancestral component reflecting their origin in the Middle East [J1(xJ1a-M267), 12%; J2-M172, 25%; T-M70, 16%] and in consequence Trás-os-Montes Jews were found to be more closely related with other Jewish groups, rather than with the Portuguese non-Jewish population.

Link

October 16, 2009

The emergence and dispersal of haplogroup J-P58 (aka J1e)

The paper uses the evolutionary mutation rate, which, as I have argued elsewhere, overestimates time to most recent common ancestor (TMRCA) by about a factor of 3. The evolutionary mutation rate is appropriate for haplogroups subject to strong genetic drift that have not grown to large numbers, but it is completely inappropriate under conditions of strong population growth.

To make things concrete, according to the model of drift-induced variance reduction proposed by Zhivotovsky, Underhill, and Feldman (2006), in 10,000 years (or 400 generations), J-P58 should have grown to the grand number of 200 men, or at least five orders of magnitude lower than the actual present-day haplogroup size. To account for the observed J-P58 size of millions of men, strong growth over time is needed, and with either the Z.U.F. (2006) analysis or my own, strong growth results in an accumulation of variance at close to the germline mutation rate.

With that said, all ages in this paper should be divided by a factor of 3. This is not only theoretically sound, but harmonizes better with other lines of evidence.

The paper studies Y-STR variance in several Middle Eastern populations. The lack of samples from the Caucasus does not allow us to infer the levels of Y-STR variance in that region. Arabian J-P58 from Saudi Arabia, Qatar, and UAE are pooled, resulting in low mean Y-STR variance of 0.16. This low value stems primarily from Qatar and UAE as the Saudi Arabian J-P58 makes a very small contribution (4 examples) in the pooled sample.

Unfortunately the authors just missed the very recent paper on Arabian DNA by Abu-Amero et al., which shows that J-M267 variance is 0.27-0.29 in Yemen and Saudi Arabia, and much lower (0.16-0.19) in UAE and Qatar. This severely weakens the case for an expansion of J1 from the northern to the southern Levant, as it reveals that not only Oman and Yemen (mentioned in the paper), but also the geographically dominant Saudi Arabia is a region of high Y-STR diversity. Thus it is not the case that:
The timing and geographical distribution of J1e is representative of a demic expansion of agriculturalists and herder–hunters from the Pre-Pottery Neolithic B to the late Neolithic era.24,26 The higher variances observed in Oman, Yemen and Ethiopia suggest either sampling variability and/or demographic complexity associated with multiple founders and multiple migrations.
But rather Oman, Yemen and Ethiopia are not atypical for the southern J1e range, which also includes Saudi Arabia as a region of high Y-STR variance. It is rather only the small gulf states of UAE and Qatar that have lower variance.

An interesting find, however, is the high Y-STR variance (0.37, 0.43) in Alawites from Syria and Assyrians from Syria and Iraq. These populations have an impeccable Semitic historical record, and, in the case of the Assyrians, are one of the few non-Arabic populations included in the study. It is also interesting that Assyrians are said to be derived from both Assyrian- and Aramaic-speaking ancestors, and hence to potentially have a complex (both East- and Northwest-Semitic) origin. These facts probably explain their high Y-STR variance.

Translated into non-"evolutionary" years, the expansion time of 16.2ky for Assyrians becomes ~5.4ky. This age is in uncanny agreement with the recently estimated age of the Semitic languages, 5.75ky ago.
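The conversion is plain division, sketched below. The factor of 3 is the correction argued for in this post, not a value taken from the paper, and the function name is just a label I chose for the illustration.

```python
# Assumption: "evolutionary" ages overestimate TMRCA by a constant factor of 3,
# as argued in this post.
OVERESTIMATE_FACTOR = 3.0


def rescale_age(evolutionary_ky, factor=OVERESTIMATE_FACTOR):
    """Convert an age in thousands of years from the evolutionary rate to the corrected scale."""
    return evolutionary_ky / factor


# Ages in ky: the Assyrian figure quoted above and the two from the paper's abstract.
ages_ky = {
    "Assyrians (J-P58 expansion)": 16.2,
    "J1e overall": 10.0,
    "J1 with DYS388=13": 9.0,
}

for label, age in ages_ky.items():
    print(f"{label}: {age:.1f}ky -> {rescale_age(age):.1f}ky")
```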

The authors of the current paper cite the above-mentioned linguistic work, but have trouble bridging the gap between their own "evolutionary" dates and the date for the breakup of Proto-Semitic:
A recent Bayesian analysis of Semitic languages supports an origin in the Levant 5750 years ago and subsequent arrival in the Horn of Africa from Arabia 2800 years ago,11 thus providing an indirect support of our phylogenetic clock estimates. It is important to note that the glottochronological dates yield estimates for the break-up and expansion of the Proto-Semitic language. Proto-Semitic, itself, may have been spoken in a localized linguistic community for millennia before its bifurcation into the East and West Semitic branches.
If one rejects the "evolutionary" rate, there is no need to postulate that Proto-Semitic was spoken (but did not disperse) for millennia; indeed, a "static" Proto-Semitic/J-P58 community would be difficult to explain in view of the fact that mobile herding was their main economic activity. In my view, the J-P58 bearing Proto-Semites emerged in the 4th millennium BC out of a general J1 Middle Eastern background, just as their TMRCA suggests. They began to expand at that time, and appear in the historical record 1-2 thousand years later in both their Eastern (Akkadian) and, later, Western (Aramaic and Canaanite) forms.

The authors also cite their own work with respect to the correlation of J1 distribution with semi-arid environments in the Middle East and cite evidence to the effect that:
archeological studies have shown an early presence (ca. 6000–7000 BCE) of domesticated herding in the arid steppe desert regions
The presence of a high frequency of undifferentiated J*(xJ1, J2) chromosomes in Soqotra suggests that the Arabian peninsula once possessed such chromosomes, which now have a marginal status throughout the Middle East. I propose that the early steppe desert herders of 6000-7000 BC possessed J* chromosomes, that J1 arose in the Middle East, and that its subclade J-P58 experienced rapid growth associated with the breakup and expansion of Semitic languages in the 4th millennium BC.

In conclusion: this paper gives us important new data on the origin and expansion of Y-chromosome J-P58, and strengthens the case that this haplogroup may be a diagnostic marker of the Proto-Semitic population of the Near East.

European Journal of Human Genetics doi: 10.1038/ejhg.2009.166

The emergence of Y-chromosome haplogroup J1e among Arabic-speaking populations

Jacques Chiaroni et al.

Abstract

Haplogroup J1 is a prevalent Y-chromosome lineage within the Near East. We report the frequency and YSTR diversity data for its major sub-clade (J1e). The overall expansion time estimated from 453 chromosomes is 10 000 years. Moreover, the previously described J1 (DYS388=13) chromosomes, frequently found in the Caucasus and eastern Anatolian populations, were ancestral to J1e and displayed an expansion time of 9000 years. For J1e, the Zagros/Taurus mountain region displays the highest haplotype diversity, although the J1e frequency increases toward the peripheral Arabian Peninsula. The southerly pattern of decreasing expansion time estimates is consistent with the serial drift and founder effect processes. The first such migration is predicted to have occurred at the onset of the Neolithic, and accordingly J1e parallels the establishment of rain-fed agriculture and semi-nomadic herders throughout the Fertile Crescent. Subsequently, J1e lineages might have been involved in episodes of the expansion of pastoralists into arid habitats coinciding with the spread of Arabic and other Semitic-speaking populations.

Link