Showing posts with label Genographic. Show all posts

January 31, 2013

Y chromosome and mtDNA study of modern Middle Eastern populations (Badro et al. 2013)

I will just briefly comment on the occurrence of L3* mtDNA in the Near East. This is a critical haplogroup because of its age of ~70ky. If all L3* in the Near East represents African migrants, then only the M and N macrogroups appeared in Eurasia, and a good case can be made for a "late" OoA event.

On the other hand, it is quite possible that some of the L3* in the Near East does not represent recent admixture, but rather native forms of L3 with deep ancestry in the region. If that is the case, then the Near East will emerge as the origin of L3, with M, N representing Out-of-Near East-into-Eurasia founders, and the various L3*(xM, N) representing Out-of-Near East-into-Africa founders.

It is difficult to say at present what will turn out to be the case. Ancient DNA has the potential of resolving this issue, because if L3*(xM, N) in Eurasia is really recent (e.g., associated with Islamic/Arab dispersals spanning Africa and Eurasia), then it ought to be missing from the earliest genetic layers.

Also of interest is the geographical distribution of Y-haplogroups; nothing much new here, but still useful as a reference:





PLoS ONE 8(1): e54616. doi:10.1371/journal.pone.0054616

Y-Chromosome and mtDNA Genetics Reveal Significant Contrasts in Affinities of Modern Middle Eastern Populations with European and African Populations 

Danielle A. Badro et al.

The Middle East was a funnel of human expansion out of Africa, a staging area for the Neolithic Agricultural Revolution, and the home to some of the earliest world empires. Post LGM expansions into the region and subsequent population movements created a striking genetic mosaic with distinct sex-based genetic differentiation. While prior studies have examined the mtDNA and Y-chromosome contrast in focal populations in the Middle East, none have undertaken a broad-spectrum survey including North and sub-Saharan Africa, Europe, and Middle Eastern populations. In this study 5,174 mtDNA and 4,658 Y-chromosome samples were investigated using PCA, MDS, mean-linkage clustering, AMOVA, and Fisher exact tests of FST's, RST's, and haplogroup frequencies. Geographic differentiation in affinities of Middle Eastern populations with Africa and Europe showed distinct contrasts between mtDNA and Y-chromosome data. Specifically, Lebanon's mtDNA shows a very strong association to Europe, while Yemen shows very strong affinity with Egypt and North and East Africa. Previous Y-chromosome results showed a Levantine coastal-inland contrast marked by J1 and J2, and a very strong North African component was evident throughout the Middle East. Neither of these patterns were observed in the mtDNA. While J2 has penetrated into Europe, the pattern of Y-chromosome diversity in Lebanon does not show the widespread affinities with Europe indicated by the mtDNA data. Lastly, while each population shows evidence of connections with expansions that now define the Middle East, Africa, and Europe, many of the populations in the Middle East show distinctive mtDNA and Y-haplogroup characteristics that indicate long standing settlement with relatively little impact from and movement into other populations.

December 18, 2012

Genographic GenoChip paper (Elhaik et al. 2012)

... has been posted on the arXiv. I don't have time to comment on it at the moment, and any further thoughts will be posted as an update here. By the way, thanks to the authors for putting me in the acknowledgements section :)

On a related note, I have released a patch for Geno 2.0 data so that they can be used with my DIYDodecad tools. I have converted 3-4 files already using it, so it seems to work fine, but in one file there was a problem because there were a lot of manual line breaks; not sure if this is a general problem or it was caused by the submitter re-saving the file, but if you encounter it, you might want to try saving your .csv file in Unix file format, or using dos2unix to fix it.
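For readers who do not have dos2unix at hand, the line-ending fix can be sketched in a few lines of Python. This is only a minimal sketch, assuming the problem really is DOS (CRLF) or old Mac (CR) line endings; the function names and the file names in the usage note are hypothetical and are not part of the DIYDodecad tools.

```python
def normalize_newlines(data: bytes) -> bytes:
    """Convert DOS (CRLF) and old Mac (CR) line endings to Unix (LF)."""
    # Handle CRLF first so the lone-CR pass does not turn one break into two.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def convert_file(src_path: str, dst_path: str) -> None:
    """Read src_path as raw bytes and write a Unix-format copy to dst_path."""
    with open(src_path, "rb") as src:
        data = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(normalize_newlines(data))
```

Calling something like `convert_file("geno2_raw.csv", "geno2_raw_unix.csv")` (hypothetical file names) writes a converted copy and leaves the original download untouched.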

arXiv:1212.4116 [q-bio.PE]

The GenoChip: A New Tool for Genetic Anthropology

Eran Elhaik et al.

The Genographic Project is an international effort using genetic data to chart human migratory history. The project is non-profit and non-medical, and through its Legacy Fund supports locally led efforts to preserve indigenous and traditional cultures. In its second phase, the project is focusing on markers from across the entire genome to obtain a more complete understanding of human genetic variation. Although many commercial arrays exist for genome-wide SNP genotyping, they were designed for medical genetic studies and contain medically related markers that are not appropriate for global population genetic studies. GenoChip, the Genographic Project's new genotyping array, was designed to resolve these issues and enable higher-resolution research into outstanding questions in genetic anthropology. We developed novel methods to identify AIMs and genomic regions that may be enriched with alleles shared with ancestral hominins. Overall, we collected and ascertained AIMs from over 450 populations. Containing an unprecedented number of Y-chromosomal and mtDNA SNPs and over 130,000 SNPs from the autosomes and X-chromosome, the chip was carefully vetted to avoid inclusion of medically relevant markers. The GenoChip results were successfully validated. To demonstrate its capabilities, we compared the FST distributions of GenoChip SNPs to those of two commercial arrays for three continental populations. While all arrays yielded similarly shaped (inverse J) FST distributions, the GenoChip autosomal and X-chromosomal distributions had the highest mean FST, attesting to its ability to discern subpopulations. The GenoChip is a dedicated genotyping platform for genetic anthropology and promises to be the most powerful tool available for assessing population structure and migration history.


November 30, 2012

Using Genographic 2.0 data with DIYDodecad

I have released a converter for Genographic 2.0 data at the Dodecad blog. This will allow you to use DIYDodecad with your Genographic 2.0 raw data download.

November 29, 2012

South Indian Y chromosomes (+ a little complaining about methods)

The table of haplogroup frequencies (left) may prove quite useful, but I am fairly disappointed with what appears to be the state of the art in recent published research on Y chromosome variation. This is not to belittle the tremendous amount of labor and money needed to collect and genotype large representative samples of individuals; only to express hope that better use of the collected samples could be achieved.

First of all, it is inconceivable to me how scientists can continue to use the 3x slower "evolutionary mutation rate" for their analyses of Y-chromosome ages on the basis of Y-STR markers. I have done my small part in my Y-STR series to show that this mutation rate is applicable only for a rather specific demographic history, and completely unsuitable to real growing human populations where Y-STR variance accumulates at close to the genealogical rate. And, my observations merely elaborated quantitatively what was already present in Zhivotovsky et al. (2006) but has been completely ignored since:
In simulations of a neutral process with average rate of increase m = 1, the number of surviving haplogroups rapidly decreased with time and corresponded well with the theory of mutant survival (Li 1955, p. 242), and the average size of the surviving haplogroups increased each generation by a value rapidly approaching 0.5 (data not shown), which agrees with asymptotic fraction of 2/t of haplotypes that survive at generation t (Athreya and Ney 1972, p. 19). The accumulated variance increased almost linearly (fig. 1), at a rate of increase about 0.00028 per generation; that is, the actual rate of accumulation microsatellite variation was about 3.6 times less than that predicted from the germ line mutation rate. This corresponds perfectly to the 3- to 4-fold difference observed between germ line and evolutionarily effective mutation rate.
The issue is all but resolved in the amateur "genetic genealogy" community, but even professional geneticists often use either genealogical or evolutionary rate, or take an agnostic stance by reporting results based on both rates. To arrive at strong conclusions about a topic on the basis of a mutation rate that is, to say the least, controversial, without even acknowledging the existence of a controversy is unsatisfactory. Y-chromosome researchers ought to copy the attitude of those working with autosomal DNA, where a corresponding mutation rate controversy was not swept under the carpet, but acknowledged (e.g., in the recent Meyer et al. high-coverage Denisova paper), with the implications of the uncertainty during the present "transitional" period quantified in the form of wider confidence intervals.
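To see what is at stake, recall that variance-based dating divides the observed Y-STR variance by a per-generation mutation rate, so the choice of rate scales the age directly. A minimal sketch follows. The effective rate of 0.00028 is the figure from the Zhivotovsky et al. passage quoted above; the germline rate of 0.001 (implied by the quoted 3.6-fold ratio), the observed variance, and the 25-year generation length are assumptions chosen purely for illustration.

```python
EFFECTIVE_RATE = 0.00028   # per generation, from the quoted simulation
GERMLINE_RATE = 0.001      # assumed: roughly 3.6 times the effective rate
YEARS_PER_GENERATION = 25  # assumed generation length


def age_in_years(variance: float, rate: float) -> float:
    """Age estimate assuming linear growth: variance = rate * generations."""
    return variance / rate * YEARS_PER_GENERATION


observed_variance = 0.2  # hypothetical mean per-locus variance in a haplogroup

age_genealogical = age_in_years(observed_variance, GERMLINE_RATE)
age_evolutionary = age_in_years(observed_variance, EFFECTIVE_RATE)

print(f"genealogical rate: {age_genealogical:,.0f} years")
print(f"evolutionary rate: {age_evolutionary:,.0f} years")
print(f"ratio: {age_evolutionary / age_genealogical:.2f}")
```

The same variance yields ages that differ by the full ratio of the two rates, which is why a conclusion that rests on one rate without mentioning the other is hard to take at face value.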

This "mutation rate" issue notwithstanding, it was also recently shown by Busby et al. that Y-STR based estimates have a dependence on the set of Y-STRs used, with markers exhibiting linear behavior across different time spans. This does not invalidate their use as molecular clocks, but highlights the need to not only select a bunch of Y-STRs, but also either (i) demonstrate that the selected set exhibits linear behavior for the time span of interest, or (ii) correct for deviations from linearity. Again, this type of modelling of microsatellite behavior was recently achieved for autosomal STRs by Sun et al. Note that such deviations result in a slower rate than the genealogical one, but the mechanism whereby this is produced is completely different from the one proposed by Zhivotovsky et al.: it is not drift in a non-growing (m=1) population that reduces the effective rate, but rather "saturation" of the mutation process, whereby the variance at fast-mutating markers grows sub-linearly with time, because of physical constraints on their possible range of values.
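The saturation effect is easy to reproduce with a toy model. The sketch below is not the model of Busby et al. or Sun et al.; it assumes a symmetric single-step mutation process on a bounded range of repeat counts, where a mutation that would leave the range simply does not happen, and the rate and range values are made up for illustration. It tracks the exact probability distribution of a lineage's repeat count, so no random sampling is involved.

```python
def bounded_stepwise_variance(mu: float, n_states: int, generations: int) -> float:
    """Variance of the repeat count after `generations`, starting mid-range.

    Each generation the count moves +1 or -1 with probability mu/2 each;
    a move that would leave the allowed range [0, n_states - 1] is blocked.
    """
    dist = [0.0] * n_states
    dist[n_states // 2] = 1.0  # ancestral allele sits in the middle of the range
    for _ in range(generations):
        new = [0.0] * n_states
        for i, p in enumerate(dist):
            if p == 0.0:
                continue
            new[i] += p * (1.0 - mu)          # no mutation
            for j in (i - 1, i + 1):
                if 0 <= j < n_states:
                    new[j] += p * mu / 2.0    # ordinary single-step mutation
                else:
                    new[i] += p * mu / 2.0    # blocked at the range limit
        dist = new
    mean = sum(i * p for i, p in enumerate(dist))
    return sum((i - mean) ** 2 * p for i, p in enumerate(dist))


MU = 0.01       # hypothetical fast-mutating marker
N_STATES = 7    # hypothetical number of allowed repeat counts

for t in (3, 100, 1000, 10000):
    observed = bounded_stepwise_variance(MU, N_STATES, t)
    print(f"t={t:>6}  bounded variance={observed:.4f}  linear expectation={MU * t:.2f}")
```

In this toy setting the variance tracks the linear expectation mu*t only while the range limits are out of reach, and then flattens out; dividing a saturated variance by the genealogical rate would therefore understate the age, with no drift involved at all.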

I don't expect that Y-STR based age estimation will have much to offer in the coming years. But the third set of the 1000 Genomes Project is on its way, and this will include a variety of South Asian samples. Very soon we will be in a good position to study the time depth of common ancestry between e.g., European and South Asian Y-chromosomes within various haplogroups using point mutations, and these are not plagued by many of the problems associated with Y-STR variation and its interpretation.

Finally, I can't help but notice that this paper has not acknowledged the tremendous progress in resolving the Y chromosome phylogeny made by non-academic researchers. With the current state of our knowledge, the claim that haplogroup R1a1 is "autochthonous" in India is not tenable. Even if one discounts all the evidence provided by SNP discoveries in the commercial testing world (and why should they?), finer-scale structure within this haplogroup has now been officially published and appears to be inconsistent with a South Asian origin of this haplogroup.

Certainly, not all is resolved; for example, the representation of tribal populations in commercial DNA testing is almost non-existent, and a sampling of their Y-SNP diversity is urgently needed. A very useful paradigm of research is that of recent work on the most basal clade of the Y-chromosome phylogeny (A00) in which the identification of very unique Y-chromosomes by genetic genealogists was combined with academic samples of "indigenous" peoples to produce new knowledge.

Much of population genetic research will benefit from such consilience between academics and amateurs. This is not an idle hope, but a recognition that this field is one in which the public not only has a substantial interest but can also do something about it. Many might be interested in Mars exploration, but without Elon Musk's bank account, most are consigned to being consumers of information about the Red Planet. Hopefully, better ways of combining the efforts of research scientists and the educated public can be identified and used in the near future.

PLoS ONE 7(11): e50269. doi:10.1371/journal.pone.0050269

Population Differentiation of Southern Indian Male Lineages Correlates with Agricultural Expansions Predating the Caste System

GaneshPrasad ArunKumar et al.

Previous studies that pooled Indian populations from a wide variety of geographical locations, have obtained contradictory conclusions about the processes of the establishment of the Varna caste system and its genetic impact on the origins and demographic histories of Indian populations. To further investigate these questions we took advantage that both Y chromosome and caste designation are paternally inherited, and genotyped 1,680 Y chromosomes representing 12 tribal and 19 non-tribal (caste) endogamous populations from the predominantly Dravidian-speaking Tamil Nadu state in the southernmost part of India. Tribes and castes were both characterized by an overwhelming proportion of putatively Indian autochthonous Y-chromosomal haplogroups (H-M69, F-M89, R1a1-M17, L1-M27, R2-M124, and C5-M356; 81% combined) with a shared genetic heritage dating back to the late Pleistocene (10–30 Kya), suggesting that more recent Holocene migrations from western Eurasia contributed less than 20% of the male lineages. We found strong evidence for genetic structure, associated primarily with the current mode of subsistence. Coalescence analysis suggested that the social stratification was established 4–6 Kya and there was little admixture during the last 3 Kya, implying a minimal genetic impact of the Varna (caste) system from the historically-documented Brahmin migrations into the area. In contrast, the overall Y-chromosomal patterns, the time depth of population diversifications and the period of differentiation were best explained by the emergence of agricultural technology in South Asia. These results highlight the utility of detailed local genetic studies within India, without prior assumptions about the importance of Varna rank status for population grouping, to obtain new insights into the relative influences of past demographic events for the population structure of the whole of modern India.


July 26, 2012

A look at Y chromosomes of Romania via Count Dracula

In short: researchers tried to see whether they could identify a specific Y chromosome lineage associated with the House of Basarab in Romania, the most famous member of which is Vlad the Impaler, an inspiration for the mythical Count Dracula. To do this, they tested Basarab-surnamed individuals, as well as the general Romanian population.

The whole exercise was, in a sense, a failure, since it neither disclosed a Basarab-specific lineage, nor resolved the historical question about the origin of the House of Basarab (Vlach or Cuman). But, it gave us some wonderful new data on Romania that is, of course, quite welcome.

This seems like a good candidate for a future ancient DNA study, assuming of course, that Vlad and his family are still in their final resting place, and there are brave enough researchers to disturb them (j/k).

On a more serious note, the authors correctly state that even if the Basarab house was originally Turkic, they could still have carried West Eurasian chromosomes, since incoming Turkic groups in Europe were not purely Mongoloid like their more remote ancestors. On the other hand, I note that most of the Basarab-surnamed individuals belonged to E-V13, I-P37.2, and J-M241, all of which are almost certainly native Romanian. If one of them carries the original chromosome, then the odds are in favor of a Romanian origin, although nothing short of ancient DNA work can resolve the issue, assuming that's possible.

Table S1 contains the new Romanian data, and Table S2 data from surrounding populations (Hungary, Bulgaria, Ukraine).

PLoS ONE 7(7): e41803. doi:10.1371/journal.pone.0041803

Y-Chromosome Analysis in Individuals Bearing the Basarab Name of the First Dynasty of Wallachian Kings

Begoña Martinez-Cruz et al.

Vlad III The Impaler, also known as Dracula, descended from the dynasty of Basarab, the first rulers of independent Wallachia, in present Romania. Whether this dynasty is of Cuman (an admixed Turkic people that reached Wallachia from the East in the 11th century) or of local Romanian (Vlach) origin is debated among historians. Earlier studies have demonstrated the value of investigating the Y chromosome of men bearing a historical name, in order to identify their genetic origin. We sampled 29 Romanian men carrying the surname Basarab, in addition to four Romanian populations (from counties Dolj, N = 38; Mehedinti, N = 11; Cluj, N = 50; and Brasov, N = 50), and compared the data with the surrounding populations. We typed 131 SNPs and 19 STRs in the non-recombinant part of the Y-chromosome in all the individuals. We computed a PCA to situate the Basarab individuals in the context of Romania and its neighboring populations. Different Y-chromosome haplogroups were found within the individuals bearing the Basarab name. All haplogroups are common in Romania and other Central and Eastern European populations. In a PCA, the Basarab group clusters within other Romanian populations. We found several clusters of Basarab individuals having a common ancestor within the period of the last 600 years. The diversity of haplogroups found shows that not all individuals carrying the surname Basarab can be direct biological descendants of the Basarab dynasty. The absence of Eastern Asian lineages in the Basarab men can be interpreted as a lack of evidence for a Cuman origin of the Basarab dynasty, although it cannot be positively ruled out. It can be therefore concluded that the Basarab dynasty was successful in spreading its name beyond the spread of its genes.

July 25, 2012

Genographic 2.0 launched

The Genographic Project at National Geographic has announced their new Genographic 2.0 test. This is a new test which tests about 150,000 SNPs and gives you information about your ancestry:
  • Ancestral breakdown (admixture)
  • Hominid (Neandertal/Denisova) admixture
  • Y chromosome haplogroup (apparently on the order of 10,000 SNPs)
  • mtDNA haplogroup
I participated in a pre-launch online presentation of the test about a month ago, and it seems that the creators of the test have put some thought into identifying their set of SNPs. The number of SNPs is about right for ancestry comparisons, but it will be interesting to see how many of them intersect the many publicly available datasets that already exist.
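Once marker lists are available, checking the overlap is a simple set operation. The sketch below is a minimal illustration, assuming each dataset can be reduced to lines whose first comma-separated field is a SNP identifier; the function names, the file layout, and the rsIDs are all hypothetical, since the actual download format is not known to me yet.

```python
def snp_ids(lines):
    """Collect SNP identifiers from the first comma-separated field of each line.

    Blank lines and lines starting with '#' are skipped.
    """
    ids = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        ids.add(line.split(",")[0])
    return ids


def overlap_report(chip_ids, reference_ids):
    """Return (number of shared SNPs, fraction of the chip's SNPs that are shared)."""
    shared = chip_ids & reference_ids
    fraction = len(shared) / len(chip_ids) if chip_ids else 0.0
    return len(shared), fraction


# Made-up example data, not real markers from any array or dataset.
chip = snp_ids(["# hypothetical chip export", "rs1,A,G", "rs2,C,T", "rs3,G,G"])
reference = snp_ids(["rs2,1,200", "rs3,1,300", "rs4,1,400"])

count, fraction = overlap_report(chip, reference)
print(f"{count} shared SNPs ({fraction:.0%} of the chip)")
```

A real comparison would also have to reconcile strand and allele coding between datasets, which this sketch does not attempt.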

If you take the test and receive your raw data, drop me a line (but don't send me the data right away!) because I would be interested in seeing the format in which the data can be downloaded, for possible inclusion of Geno 2.0 data in my own Dodecad Project. It might be a good idea for a technical description of the new array to be posted on the website.

Overall, it is a great idea to update the Genographic test that was previously based only on Y chromosomes and mtDNA, and I will be following any further developments closely. Your Genetic Genealogist and The Genetic Genealogist have many more details on this.

(I do not personally endorse any particular testing company or product).

March 28, 2012

A rare look at the Y chromosomes of Afghanistan

I often bemoan the fact that some of the regions of the world that are most interesting to the student of prehistory (e.g., Mesopotamia and the Iranian Plateau) seem to also be the ones with more than their fair share of political trouble, hindering efforts to study them with the newest set of tools. Afghanistan is certainly one case that hasn't been quite the most welcoming of places in recent decades.

The country is transitional between the Iranic speaking world of Iran and the Indo-Aryan speaking world of South Asia, as well as between the Indo-Iranian world and the (mostly) Turkic-speaking world of Central Asia. Hence, the absence of data for that country has been acutely felt for all those who are trying to understand "what happened" in Eurasia.

The appearance of a new paper by the Genographic Project is a welcome sight, and a good example of what is best about this Project. I haven't been exactly a fan of the Genographic's interpretation of their own data, but kudos to them for getting them in the first place.

From the paper:
Pashtuns are the largest ethnic group in Afghanistan, accounting for about 42 percent of the population, with Tajiks (27%), Hazaras (9%), Uzbeks (9%), Aimaqs (4%), Turkmen people (3%), Baluch (2%), and other groups (4%) making up the remainder [6]. In the present study, eight ethnic groups were examined, with a focus on the largest four groups:
  • The Pashtuns, traditionally lived a seminomadic lifestyle, they reside mainly in southern and eastern Afghanistan and in western Pakistan. They speak Pashto which is a member of the Eastern Iranian languages.
  • The Tajiks are a Persian-speaking ethnic group which are closely related to the Persians of Iran. In Afghanistan, they are the largest Tajik population outside their homeland to the north in Tajikistan.
  • The Hazara population speaks Persian with some Mongolian words. They believe they are descendants of Genghis Khan's army that invaded during the twelfth century.
  • The Uzbeks are a Turkic speaking group that have been living a sedentary farming lifestyle in Northern Afghanistan.
The main features of the Y-chromosome gene pool:
Genotyping revealed 32 halpogroups present in Afghanistan's ethnic groups among our samples. Haplogroups R1a1a-M17, C3-M217, J2-M172, and L-M20 were the most frequent when Afghan ethnic groups were pooled, together comprising >66% of the chromosomes. Absolute and relative haplogroup frequencies are tabulated in Table S4.
The PCA (left) showcases wonderfully the correspondence between different haplogroups and the three main regions of the Near East (green), South Asia (yellow), and Central Asia (purple).

It is a real shame that the newer markers available within the most prominent R-M17 haplogroup were not tested:
The prevailing Y-chromosome lineage in Pashtun and Tajik (R1a1a-M17), has the highest observed diversity among populations of the Indus Valley [46]. R1a1a-M17 diversity declines toward the Pontic-Caspian steppe where the mid-Holocene R1a1a7-M458 sublineage is dominant [46]. R1a1a7-M458 was absent in Afghanistan, suggesting that R1a1a-M17 does not support, as previously thought [47], expansions from the Pontic Steppe [3], bringing the Indo-European languages to Central Asia and India.
Nonetheless, I can't really disagree with the dismissal of the R-M17/Indo-European theory. R-M17 is simply too populous in South Asia to be the genetic legacy of "Indo-Europeans": (i) under an elite-dominance model, its frequency is way too high (compared to well-attested examples of elite dominance, e.g., Hungary or Turkey where the genetic legacy of the elite element is in the minority), (ii) under a folk migration model, it is difficult to understand why a hypothetical migrating Indo-European people would have such an overwhelming influence in the region while at the same time hardly influencing at all other densely occupied agricultural landscapes of the Eurasian steppe periphery; moreover, no autosomal signal corresponding to a migration from eastern Europe to South Asia really exists -the main cline of variation links South with West Asia, not Europe- and the small signal that does exist does not really correspond to observed levels of R-M17.

From the paper:
The E1b1b1-M35 lineages in some Pakistani Pashtun were previously traced to a Greek origin brought by Alexander's invasions [48]. However, RM network of E1b1b1-M35 found that Afghanistan's lineages are correlated with Middle Easterners and Iranians but not with populations from the Balkans.
Greek populations are not homogeneous in their haplogroup E frequencies, so it would be useful to consider the possibility that the lack of this frequent Southeastern European haplogroup in South Asia may not reflect a complete lack of Greek influence in this region, but rather, an influence from a structured ancient Greek population.

Looking at the Y-haplogroup composition:

A few points of interest:

  • The clear link between C/N/O with Central Asia
  • A clear difference between Persian and Pashto speakers in terms of inverse J2a/R1a frequencies
  • The paucity of J1 chromosomes (only 1 Tajik) testifies to the absence of relatively recent Middle Eastern influences associated with the spread of Islam; consistent with the absence of the autosomal "Southwest Asian" component in South/Central Asia.
  • Paucity of R1b, except in a couple of Uzbeks and a Tajik; I have argued before that R1a had an early distribution in the arc of flatlands north and east of the Caspian, while R1b had a complementary distribution in the smaller arc of the highlands west and south of it, out of which the Tocharians may have originated.
  • The small Nurestani sample consists of J2a, R1a, and R2; these are linguistic relatives of the Kalash of Pakistan who -unlike the latter- were converted to Islam in the 19th century.
I would say that the evidence is pretty clear that the earliest Iranians may have included haplogroups R1a and J2, although I would not wager on their relative proportions and overall contribution to modern Iranian-speaking populations. For whatever reason, it seems that Kurds and Persians ended up with a J2-over-R1a advantage, while Pathans and (plausibly) Turkified Central Asian former Iranian speakers with the reverse. Nonetheless, the occurrence of both haplogroups in most Iranian groups, as well as in most Indo-Aryan ones is quite telling. It is unfortunate that the relationships between these Y chromosomes (still J2a*! six years after Sengupta et al.) and their West Eurasian brethren were not further pursued.

Hopefully, the data can be re-used down the road once the phylogeny of different haplogroups (and R1a in particular) is better understood. As I've stated before on this blog, I take Y-STR based age estimates with a huge grain of salt, so I would not put much faith in any of the ones presented in this paper.

Related: Firasat et al. (2006), Y-chromosomes of Afghanistan, Lashgary et al. (2011), Regueiro et al. (2006).

PLoS ONE doi:10.1371/journal.pone.0034288


Afghanistan's Ethnic Groups Share a Y-Chromosomal Heritage Structured by Historical Events

Marc Haber et al.

Abstract


Afghanistan has held a strategic position throughout history. It has been inhabited since the Paleolithic and later became a crossroad for expanding civilizations and empires. Afghanistan's location, history, and diverse ethnic groups present a unique opportunity to explore how nations and ethnic groups emerged, and how major cultural evolutions and technological developments in human history have influenced modern population structures. In this study we have analyzed, for the first time, the four major ethnic groups in present-day Afghanistan: Hazara, Pashtun, Tajik, and Uzbek, using 52 binary markers and 19 short tandem repeats on the non-recombinant segment of the Y-chromosome. A total of 204 Afghan samples were investigated along with more than 8,500 samples from surrounding populations important to Afghanistan's history through migrations and conquests, including Iranians, Greeks, Indians, Middle Easterners, East Europeans, and East Asians. Our results suggest that all current Afghans largely share a heritage derived from a common unstructured ancestral population that could have emerged during the Neolithic revolution and the formation of the first farming communities. Our results also indicate that inter-Afghan differentiation started during the Bronze Age, probably driven by the formation of the first civilizations in the region. Later migrations and invasions into the region have been assimilated differentially among the ethnic groups, increasing inter-population genetic differences, and giving the Afghans a unique genetic diversity in Central Asia.


March 13, 2012

Pre-Roman genetic structure has persisted in modern Basque populations

This is a fairly interesting study that paints a picture of continuity of genetic structure among Basques since pre-Roman times. I am not sufficiently familiar with either Basque history or geography to comment on this in detail, but the central conclusion that Basques differ from their neighbors in being more isolated and less cosmopolitan is something that I have also noticed in my own experiments (see for example the K12b population portraits, contrasting French_Basque and Pais_Vasco_1KG with other Iberian/French populations).

For those who know more, does the following scheme make sense?


Y-haplogroup frequencies, showing a preponderance of R-M269 related lineages and a strong showing of the I-M26 lineage are shown below. The latter links Basques with Sardinians, as well as probably with Neolithic France.



Codes (from the paper): BIG, Bigorre; BEA, Béarn; CHA, Chalosse; ZMI, Lapurdi/Baztan; NLA, Lapurdi Nafarroa; SOU, Zuberoa; RON, Roncal and Salazar valleys; NCO, Central Western Nafarroa; NNO, North Western Nafarroa; GUI, Gipuzkoa; GSO, South Western Gipuzkoa; ALA, Araba; BBA, Bizkaia; BOC, Western Bizkaia; CAN, Cantabria; BUR, Burgos; RIO, La Rioja; NAR, North Aragon.

The picture of continuity is further strengthened by ancient Basque Y-chromosomes, showing the same picture of R1b-majority/I minority as today. What we really need now is to bridge the gap between late antiquity and the Neolithic, and beyond, to better understand the temporal sequence of settlement.

Mol Biol Evol (2012) doi:10.1093/molbev/mss091

Evidence of pre-Roman tribal genetic structure in Basques from uniparentally inherited markers

Begoña Martínez-Cruz et al.

Basque people have received considerable attention from anthropologists, geneticists and linguists during the last century due to the singularity of their language and to other cultural and biological characteristics. Despite the multidisciplinary efforts performed to address the questions of the origin, uniqueness and heterogeneity of Basques, the genetic studies performed up to now have suffered from a weak study-design where populations are not analyzed in an adequate geographic and population context. To address the former questions and to overcome these design limitations, we have analyzed the uniparentally inherited markers (Y chromosome and mitochondrial DNA) of ∼900 individuals from 18 populations, including those where Basque is currently spoken and populations from adjacent regions where Basque might have been spoken in historical times. Our results indicate that Basque-speaking populations fall within the genetic Western European gene pool and they are similar to geographically surrounding non-Basque populations, and also that their genetic uniqueness is based on a lower amount of external influences compared to other Iberians and French populations. Our data suggest that the genetic heterogeneity and structure observed in the Basque region results from pre-Roman tribal structure related to geography and might be linked to the increased complexity of emerging societies during the Bronze Age. The rough overlap of the pre-Roman tribe location and the current dialect limits supports the notion that the environmental diversity in the region has played a recurrent role in cultural differentiation and ethnogenesis at different time periods.


February 25, 2012

Pre-Neolithic Basque mtDNA gene pool (?)

I haven't read the paper, but I'm unconvinced that an 8,000 YBP estimate of separation (even if it were made with the precision of an atomic clock) "clearly supports" genetic continuity with the Paleolithic/Mesolithic settlers of the Franco-Cantabrian region.

Not only because the date in question is within a millennium or two of the arrival of the Neolithic in Iberia, one of those "curious coincidences."

Much more importantly, because the wisdom of Barbujani continues to be ignored when it comes to tying genetic age estimates to archaeology.

The American Journal of Human Genetics, 23 February 2012 doi:10.1016/j.ajhg.2012.01.002

The Basque Paradigm: Genetic Evidence of a Maternal Continuity in the Franco-Cantabrian Region since Pre-Neolithic Times

Doron M. Behar et al.

Different lines of evidence point to the resettlement of much of western and central Europe by populations from the Franco-Cantabrian region during the Late Glacial and Postglacial periods. In this context, the study of the genetic diversity of contemporary Basques, a population located at the epicenter of the Franco-Cantabrian region, is particularly useful because they speak a non-Indo-European language that is considered to be a linguistic isolate. In contrast with genome-wide analysis and Y chromosome data, where the problem of poor time estimates remains, a new timescale has been established for the human mtDNA and makes this genome the most informative marker for studying European prehistory. Here, we aim to increase knowledge of the origins of the Basque people and, more generally, of the role of the Franco-Cantabrian refuge in the postglacial repopulation of Europe. We thus characterize the maternal ancestry of 908 Basque and non-Basque individuals from the Basque Country and immediate adjacent regions and, by sequencing 420 complete mtDNA genomes, we focused on haplogroup H. We identified six mtDNA haplogroups, H1j1, H1t1, H2a5a1, H1av1, H3c2a, and H1e1a1, which are autochthonous to the Franco-Cantabrian region and, more specifically, to Basque-speaking populations. We detected signals of the expansion of these haplogroups at ∼4,000 years before present (YBP) and estimated their separation from the pan-European gene pool at ∼8,000 YBP, antedating the Indo-European arrival to the region. Our results clearly support the hypothesis of a partial genetic continuity of contemporary Basques with the preceding Paleolithic/Mesolithic settlers of their homeland.

Link

May 15, 2011

Genes and Languages in the Caucasus

If there was ever a paper that was the equivalent of a box of candy, this is probably it. I will update this post with my comments.

UPDATE I (Genealogical rate, Gene-language concordance, Ossetes): I seriously don't know where to begin with this paper. So, given the serendipitous appearance of an abstract on Y-chromosome mutation rates, here is a major new pro-genealogical rate quote from the new paper:
We found that “evolutionary” estimates of most clusters fall far outside the range of the respective linguistic dates, while “genealogical” estimates gave a good fit with the linguistic dates. At least two population events in the Caucasus are documented archaeologically, which allows additional comparison with these “historical” dates. In both cases, the historical (archaeological) date is similar to a genetic estimate based on the “genealogical” mutation rate (Supplementary Note 2).
And, here's a comparison of the linguistic and genetic (based on Y-chromosomes) trees from the paper:
The correspondence seems remarkable; the only major discrepancy is for Iranic (Indo-European) Ossetes who group with NW Caucasians genetically, which makes sense as the Ossetes are probably to a large extent NW Caucasians that underwent a language shift under the influence of the Alans.

Speaking of the Ossetes, their negligible R1a1-M198 frequency (0.4-0.8%) should be a warning that Iranic steppe nomads _does not equal_ R1a1. While a limited contribution of Alans to the Ossetes is expected, it is not expected that Ossetes will have two of the lowest M198 frequencies in the Caucasus: in all probability R1a1 was not particularly important among Alans, and, by implication (?) Sarmatians.

UPDATE II (4 haplogroups for 4 language families):

The most interesting discovery in this paper is, of course, the correspondence between Y-chromosome haplogroups and language groups, thanks to the very large number of individuals tested and the deep phylogenetic resolution of the haplogroups:
Overall, the most frequent haplogroups in the Caucasus were G2a3b1-P303 (12%), G2a1a-P18 (8%), J1*-M267(xP58) (34%), and J2a4b*-M67(xM92) (21%), which together encompassed 73% of the Y chromosomes, while the other 24 haplogroups identified in our study comprise the remaining 27% (Table 2). ... haplogroup G2a3b1-P303 comprised at least 21% (and up to 86%) of the Y chromosomes in the Shapsug, Abkhaz and Circassians ... haplogroup G2a1a-P18 comprised at least 56% (and up to 73%) of the Digorians and Ironians (both from the Central Caucasus Iranic linguistic group), while not being found at more than 12% (average 3%) in other populations... haplogroup J2a4b*-M67(xM92) comprised 51-79% of the Y chromosomes in the Ingush and three Chechen populations (North-East Caucasus, Nakh linguistic group), while, in the rest of the Caucasus, its frequency was not higher than 9% (average 3%) ... haplogroup J1*-M267(xP58) comprised 44-99% of the Avar, Dargins, Kaitak, Kubachi, and Lezghins (South-East Caucasus, Dagestan linguistic group) but was less than 25% in Nakh populations and less than 5% in the rest of Caucasus.

Interestingly, G2a3 is one of the lineages found in early Central European farmers and in 2 medieval German knights. G2 is also, curiously, one of the West Eurasian lineages that are found in very small quantities in India, especially among upper caste Hindus. We are beginning to make connections across space and time, even though the patterns are far from clear yet.

The prevalence of J1*-M267(xP58) in Dagestan is well known (or suspected) from previous studies. Notice that J-P58, if we use the genealogical rate, has an age of ~5.4ky in Semitic groups, and this is in concordance with the 5,750 years ago origin of Semitic languages based on Bayesian phylogenetics. So, it is clear that part of haplogroup J1 was prevalent in ancient Semitic groups, and another, disjoint part in ancient Dagestani groups.

To make things more interesting, the Nakh groups (Ingush and Chechens) have J2a4b*-M67(xM92) as their modal haplogroup. Nakh is also a Northeast Caucasian language subfamily, like Dagestani, and indeed NE Caucasian is also called Nakho-Daghestanian. What did the early speakers of this family look like?

It would be tempting to think that Proto-Nakho-Dagestanians were J1-dominated, as J1 exists in both Nakh (16-25%) and Dagestani (58-99%) groups, whereas J2a4b-M67 (the Nakh modal haplogroup) is nearly completely absent in Dagestanians.

UPDATE III (No European influence):

Another interesting discovery of this study is the lack of European influence in the populations of the North Caucasus.
It seems that both R1a1a-M198 and I2a-P37 have a major barrier eastward in the Don river. Please note that the former is not strictly a European haplogroup, but it nonetheless experiences a massive drop in frequency, and is negligible everywhere except in Abkhaz-Circassians (NW Caucasus; 10.3-19.7%), with an outlier in Dargins (22%).

This seems to put a limit on the origin of any hypothetical movements across the Eurasian steppe east of the Don river, as haplogroup I2a-P37 is largely absent in Central Asia, and occurs 3 times in 1,525 individuals in this sample. So, while there have been proposals of a Central European origin of some steppe pastoralist groups, these are hard to reconcile with this picture.

UPDATE IV (Haplogroup G):

Two of the modal haplogroups in this paper are G2a1a-P18 (Iranic, 56-73%) and G2a3b1-P303 (NW Caucasians, 21-86%). Battaglia et al. (2008) also found a high frequency of G2a* in Georgians and Balkars (~30%, also modal in both populations). It appears that G2a is a mainly West (both NW and SW) Caucasian phenomenon within the context of this region.

UPDATE V (Starostin and Language depth)

The authors applied the methodology of the late Sergei Starostin to the problem of language time depth:
The present work employs Starostin’s methodology, and we made special efforts to create the high-quality linguistic databases required for this analysis. Thus, based on significantly extended and revised linguistic databases, we have applied a glotto-chronological approach to the North Caucasian languages. As a result, our study provides a unique opportunity to make direct comparisons of linguistic and genetic data from the same populations. Lexico-statistical methods have also been applied to a number of language families using a Bayesian approach to increase the statistical robustness of language classification (Gray and Atkinson, 2003; Kitchen et al., 2009; Greenhill et al., 2010). Using these methods with the Caucasus languages under study here will be the focus of future work.
It will certainly be interesting to see Bayesian phylogenetic methods applied to the Caucasus languages in the future, using the linguistic datasets developed here. The concordance of genetic-linguistic results in this paper, in addition to the many successes of the G&A approach, is making it increasingly difficult for those who doubt our ability to estimate the age of language families in a manner similar to that with which biologists estimate the age of genetic variation.

See also Tower of Babel project and the Evolution of Human Languages project at the Santa Fe Institute.

UPDATE VI (Haplogroup J2a)

I have recently speculated about a possible link between the Caucasus region and India based on the appearance of a "Dagestan" component in India, the clear West Asian origin of Ancestral North Indians, as well as a possible linguistic link between Northeast Caucasian, Hurrian, and Indo-European.

A problem with that theory is that the high J1*(xP58) frequency in Dagestan has no counterpart in South Asia. The current study, however, adds data on the Nakh part of the Nakho-Dagestanian (Northeast Caucasian) family, showing this to be J2a4b-M67 dominated. So, while I think that J1*(xP58) may have been present among Proto-Northeast Caucasians, these must have interacted with J2a folk.

J-M67 is clearly intrusive into the Central Caucasus, from the South where a much greater variety of J2a-related lineages is observed among Armenians, North Iranians, and Anatolian Turks.

We now have good coverage of J2a in the entirety of the West Asian region, with the exception of Azerbaijan, and a few patterns are beginning to emerge:
  1. The center of the J2a world is somewhere between eastern Turkey, Armenia, Azerbaijan, Iran, and Syria
  2. The Caucasus is a northern extension of this world, just as Greece and Italy are its main western extensions, with a strong extension into Central Asia as far as Xinjiang, and well into South Asia all the way to upper caste South Indian Hindus.
  3. In the Caucasus itself J-M67 dominates among Nakh speakers, but with little other J2a-related variation.
  4. In comparison to Nakhs, J2a seems more varied in Georgians, among Ossetes, and among NW Caucasian speakers.
It is hard to make any pronouncements on how J2a spread northwards from its Transcaucasian cradle, but I would think that the Kura-Araxes and Maikop cultures are fairly good candidates for that spread, with the former being J2a dominated, and the latter being more G2a dominated. I would not, however, dismiss a more recent spread of J2a into the region.

UPDATE VII (Absence of E1b1b1):

This haplogroup has a more Mediterranean distribution and is conspicuously absent in the North Caucasus. Unfortunately no downstream markers were typed, but (a) its presence in small amounts in NW Caucasians (1-1.7%) together with a similar low frequency (1.5%) in Georgians, (b) its absolute absence among Nakho-Dagestanians, except for one Lezghin, suggest to me that it arrived to the region from the west, and is probably a low-frequency trace of Ancient Greek colonies of the Black Sea, just as it is associated with Greek colonists in the West Mediterranean and Sicily.

UPDATE VIII (Haplogroups L and T):

There is a little haplogroup L in the North Caucasus. L-M27 and L-M317 seem concentrated in the Northwest, while L-M357 is found only in Nakh speakers. The detection of L-M357 in North but not South Iran may be related to this population, and also to the L-rich population of Syria, especially from the eastern inland area.

Haplogroup T has been the subject of a major recent paper. In this region, it is found in 2 NW Caucasians, 1 Ossete and a couple of Lezghins, but unfortunately with no fine phylogenetic resolution.

Mol Biol Evol (2011) doi: 10.1093/molbev/msr126

Parallel Evolution of Genes and Languages in the Caucasus Region

Oleg Balanovsky, Khadizhat Dibirova, Anna Dybo, Oleg Mudrak, Svetlana Frolova, Elvira Pocheshkhova, Marc Haber, Daniel Platt, Theodore Schurr, Wolfgang Haak, Marina Kuznetsova, Magomed Radzhabov, Olga Balaganskaya, Alexey Romanov, Tatiana Zakharova, David F. Soria Hernanz, Pierre Zalloua, Sergey Koshel, Merritt Ruhlen, Colin Renfrew, R. Spencer Wells, Chris Tyler-Smith, Elena Balanovska and The Genographic Consortium

We analyzed 40 SNP and 19 STR Y-chromosomal markers in a large sample of 1,525 indigenous individuals from 14 populations in the Caucasus and 254 additional individuals representing potential source populations. We also employed a lexicostatistical approach to reconstruct the history of the languages of the North Caucasian family spoken by the Caucasus populations. We found a different major haplogroup to be prevalent in each of four sets of populations that occupy distinct geographic regions and belong to different linguistic branches. The haplogroup frequencies correlated with geography and, even more strongly, with language. Within haplogroups, a number of haplotype clusters were shown to be specific to individual populations and languages. The data suggested a direct origin of Caucasus male lineages from the Near East, followed by high levels of isolation, differentiation and genetic drift in situ. Comparison of genetic and linguistic reconstructions covering the last few millennia showed striking correspondences between the topology and dates of the respective gene and language trees, and with documented historical events. Overall, in the Caucasus region, unmatched levels of gene-language co-evolution occurred within geographically isolated populations, probably due to its mountainous terrain.

Link

November 02, 2010

ADMIXTURE analysis of Spencer Wells

One of the people who've entrusted me with their DNA for analysis in the Dodecad Project is none other than the Genographic Project's Spencer Wells. His project ID is DOD162, and he is the very last individual to be included in the project's pilot phase.

I've enjoyed watching the Genographic Project's various documentaries and reading their published articles, often commenting on them in this blog, so it's a nice opportunity to give something back to the leader of one of the few organizations that is really helping advance our knowledge of human origins.


Here are the results of the admixture analysis: Spencer Wells is in the first bar, and admixture proportions of the 10 components I am using are color-coded for both himself and 36 other populations.

His results are uneventful: his bar is very similar to the one next to it, which summarizes the admixture proportions of 25 individuals from the HapMap-3 CEU population. His top component is North European (60.6%), as his appearance and Northwest European ancestry would suggest. Next are South European (24.9%) and West Asian (13.5%). Rounding out his results is a small slice (1%) of Southwest Asian.

Below you can see Spencer followed by the 25 CEU individuals; these are American Whites from Utah, and as you can see, while there is some small variation in proportions and minor components, he "fits right in" this population.
I have also compared him against the 16 Dodecad Project participants who belong to my "American White" category. This is a rather fuzzy category, consisting of European-descended Americans and Canadians whose ancestry was not entirely from one of my other categories. In that population, the average components are: 11.3% West Asian, 0.2% Northwest African, 26.8% South European, 0.1% Northeast Asian, 1.2% SW Asian, 60.4% N European, 0.1% S Asian, also quite close to Spencer's results.
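
The "quite close" comparison can be made concrete with a few lines of arithmetic. The sketch below uses only the percentages quoted in this post; it assumes that the components not listed for Spencer are zero, and the half-L1 distance is just one convenient summary that I am choosing here, not anything produced by ADMIXTURE itself.

```python
# Spencer Wells' proportions (%) as quoted in the post.
# ASSUMPTION: components the post does not list for him are treated as 0.
spencer = {
    "North European": 60.6,
    "South European": 24.9,
    "West Asian": 13.5,
    "Southwest Asian": 1.0,
}

# Average proportions (%) of the 16 "American White" participants, from the post.
american_white = {
    "West Asian": 11.3,
    "Northwest African": 0.2,
    "South European": 26.8,
    "Northeast Asian": 0.1,
    "Southwest Asian": 1.2,
    "North European": 60.4,
    "South Asian": 0.1,
}


def component_differences(a, b):
    """Per-component difference a - b, treating missing components as 0."""
    keys = sorted(set(a) | set(b))
    return {k: a.get(k, 0.0) - b.get(k, 0.0) for k in keys}


def half_l1_distance(a, b):
    """Half the sum of absolute differences, in percentage points."""
    return sum(abs(d) for d in component_differences(a, b).values()) / 2


diffs = component_differences(spencer, american_white)
# Print components from largest to smallest absolute difference.
for name, diff in sorted(diffs.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name:18s} {diff:+.1f}")
print(f"half-L1 distance: {half_l1_distance(spencer, american_white):.2f} points")
```

On these figures the largest single gap is in the West Asian component, and the overall distance is only a couple of percentage points.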

August 17, 2010

mtDNA relics in south China

In a past post I had noted that the occurrence of low-frequency divergent haplotypes in a population might be a "relic of a bygone age". The point I was trying to make is that early settlement in a region may create a diverse gene pool (as there is plenty of time for variation to accumulate), but this antiquity of settlement may be obscured by later (including fairly recent) expansions of sublineages that appear to be young in evolutionary terms.

Hence, the importance of outliers in age estimation, as these may alternatively be "relics" of the most ancient population (prior to the expansion, due to either selection or demographic increase, of the recent lineages), or introgressed lineages from abroad.

In order to discover outliers, you need a large sample. The authors of this paper, in the context of mtDNA, discovered 5 new basal (=near the trunk) lineages within Eurasian macrohaplogroups M and N. This is less than 0.1% of their huge Chinese sample. In a smaller sample, as is customary in most mtDNA studies, these outliers would probably have been undetected.
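
To see why sample size matters so much, here is a minimal sketch of the chance of catching at least one carrier of a rare lineage. It assumes simple independent random sampling, and both the lineage frequency (one carrier per 6,093 people) and the "small study" size of 200 are hypothetical numbers of mine; only the 6,093 figure comes from the paper.

```python
def prob_at_least_one(freq, sample_size):
    """Probability that a random sample of the given size contains at least
    one carrier of a lineage with population frequency `freq`.
    ASSUMPTION: individuals are sampled independently."""
    return 1.0 - (1.0 - freq) ** sample_size


# Hypothetical rare lineage: one carrier per 6,093 people.
rare_freq = 1 / 6093

# 200 is a hypothetical "ordinary" study size; 6,093 is the paper's sample.
for n in (200, 6093):
    print(f"n = {n:5d}: P(detect) = {prob_at_least_one(rare_freq, n):.3f}")
```

Under these assumptions the small sample has only a few percent chance of seeing such a lineage even once, while the large one sees it more often than not.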

What is most interesting is that the authors explicitly tried to distinguish between the two competing hypotheses described above: admixture and "relics". The new lineages do not appear to be the result of foreign admixture (e.g., some rare Indian M subclade that somehow found its way into southern China), but to be true relics.

The existence of relics pushes back the time of settlement/Out of Africa expansion, as more time is needed to "tie in" the relics with the rest of the tree.

This should serve as a warning for age estimation: so many times, peculiar lineages are brushed aside with a paragroup label as oddities, while researchers focus on the more established and phylogeographically informative lineages. While full-mtDNA sequencing is a viable option, the same procedure is not widely-applied in Y chromosomes, as the Y chromosome is much larger than mtDNA, and hence more difficult (and expensive) to fully sequence.

A 6,000-strong sample is probably not available for most countries and populations, except for the Genographic project, which seems to be missing in action of late. There are also large commercial samples which benefit from the desire of paying customers with unusual haplotypes to look deeper into their ancestry. Unfortunately these same customers are WEIRD, and give us little information about most of mankind, including about the most interesting and mysterious aspects of human prehistory.

Nonetheless, there is hope for the future, as sample sizes continue to increase and genotyping costs to decrease. While there is reason to share Craig Venter's bleak assessment of the accomplishment of genomics, the single, clear, field where human genetics has triumphed and will continue to triumph is that of human origins.

UPDATE: Gene Expression notes that commercial companies like 23andMe have even larger samples, and customers can download 550k SNPs for their sample. However, most of the people who buy 23andMe tests are, in the global context, near clones of each other, being predominantly of western European origin. Moreover, the thousands of SNPs included in the technology used by 23andMe include a limited number of mtDNA and Y chromosome SNPs which have been chosen for their informativeness, i.e., they define studied clades of the phylogeny, and are thus unsuitable for discovering new clades, as was done in this paper. I'm pretty sure there are paragroups a-plenty in both the 23andMe customer base and the Genographic Project, but, as far as I know, neither of the two aggressively mines its data for SNP discovery/phylogeny refinement, and there are ethical limitations to consider, as people who sign up for either service do not necessarily approve of their DNA sample being used beyond the narrow scope of the provided service.

Molecular Biology and Evolution, doi:10.1093/molbev/msq219

Large-scale mtDNA screening reveals a surprising matrilineal complexity in East Asia and its implications to the peopling of the region

Qing-Peng Kong et al.

In order to achieve a thorough coverage of the basal lineages in the Chinese matrilineal pool, we have sequenced the mitochondrial DNA (mtDNA) control region and partial coding-region segments of 6,093 mtDNAs sampled from 84 populations across China. By comparing with the available complete mtDNA sequences, 194 of those mtDNAs could not be firmly assigned into the available haplogroups. Completely sequencing 51 representatives selected from these unclassified mtDNAs identified a number of novel lineages, including five novel basal haplogroups that directly emanate from the Eurasian founder nodes (M and N). No matrilineal contribution from the archaic hominid was observed. Subsequent analyses suggested that these newly identified basal lineages likely represent the genetic relics of modern humans initially peopling East Asia, instead of being the results of gene flow from the neighboring regions. The observation that most of the newly recognized mtDNA lineages have already differentiated and show the highest genetic diversity in southern China provided additional evidence in support of the Southern-Route peopling hypothesis of East Asians. Specifically, the enrichment of most of the basal lineages in southern China and their rather ancient ages in Late Pleistocene further suggested that this region was likely the genetic reservoir of modern humans after they entered East Asia.

Link

February 04, 2010

Charles Darwin belonged to Y-chromosome haplogroup R1b

Not very surprising given his nationality. I guess there is a small chance of a non-paternity event in 4 transmission events, so the result is probably not as good as e.g., exhuming Charles Darwin himself and testing him directly, but that's probably just nitpicking.

DISCOVERING THE ORIGINS OF CHARLES DARWIN
Today, 200 years after his birth, DNA technology has helped determine who Darwin’s ancient ancestors were. Darwin’s great-great-grandson, Chris Darwin, 48, who lives in the Blue Mountains near Sydney, took a Genographic Project public participation cheek swab test analyzing his “Y” chromosome. According to Dr. Spencer Wells, project director of the Genographic Project, a research partnership between National Geographic and IBM with field support from the Waitt Family Foundation, Darwin’s deep ancestry shows his ancestors left Africa around 45,000 years ago.

“I couldn’t wait to find out my family’s deep ancestry. I suspect that most people would be fascinated to know their family history over the past 60,000 years. After all, how can you understand who you really are, if you don’t know where you have come from?,” Chris Darwin said.

The test revealed that Chris Darwin, and therefore his paternal great-great-grandfather, Charles Darwin, are from Haplogroup R1b, one of the most common European male lineages. “Approximately 70 percent of men in southern England belong to Haplogroup R1b, and in parts of Ireland and Spain that number exceeds 90 percent,” Wells said.

August 17, 2009

Coastal-inland differences in Y chromosomes of the Levant

More on this after I get a hold of and digest the information in the paper.

Just a quick comment, based only on the abstract, that the Levantine populations should be studied in a European context as well, as they have been influenced by prehistoric populations from the Aegean, Greeks, Romans, medieval Crusaders, or Ottomans of various origins.

UPDATE: The paper has several supplementary figures and tables.

In Figure S1 we see the biallelic markers used in this study, and their representation in the various populations. It is a chronic problem with studies of this sort to undertype samples; there are phylogeographically informative markers within haplogroups G, L, E1b1b, and J2 for example, which would have added important information about the specific affinities of these haplogroups in the studied populations.


In spite of these deficiencies, we may still make some useful observations. For example, IE-speaking Iranians have largely the same haplogroups as Arabs, but a much higher representation of haplogroup J2 compared to J1. The converse is true for all Arabs except the Lebanese. But we do know that even in Lebanon itself, Muslims have a higher J1/J2 ratio than Christians, and Islam was the main vehicle of Arabization in the region. The Christians are descended from the pre-Arab Byzantine Greco-Aramaic populations (with an addition of Western European Y-chromosomes in some Christian communities, which would not have substantially upset the J1/J2 balance).

It is fairly clear to me that in the Middle East, Greek and Iranian-settled regions have a higher J2/J1 ratio than regions with solid Semitic or NE Caucasian populations where J1 predominates.

UPDATE II (Aug 27):

The paper reports a near zero frequency of haplogroup J1 in Tunisia and Morocco, following an earlier study by the same authors. However, a different study (Onofri et al.) on Moroccan and Tunisian Y chromosomes reports 20% and 35% respectively, which is in agreement with an earlier study on North African Y-chromosomes (Arredi et al.). The discrepancy in the J1 frequency seems too large to have arisen by chance given the sample sizes, and it would be interesting to see how it may have arisen.

Annals of Human Genetics doi:10.1111/j.1469-1809.2009.00538.x

Geographical Structure of the Y-chromosomal Genetic Landscape of the Levant: A coastal-inland contrast

Mirvat El-Sibai et al.

Abstract

We have examined the male-specific phylogeography of the Levant and its surroundings by analyzing Y-chromosomal haplogroup distributions using 5874 samples (885 new) from 23 countries. The diversity within some of these haplogroups was also examined. The Levantine populations showed clustering in SNP and STR analyses when considered against a broad Middle-East and North African background. However, we also found a coastal-inland, east-west pattern of diversity and frequency distribution in several haplogroups within the small region of the Levant. Since estimates of effective population size are similar in the two regions, this strong pattern is likely to have arisen mainly from differential migrations, with different lineages introduced from the east and west.

Link

October 31, 2008

60,000-year-old Y-chromosome haplogroup D? Not really

I am probably sounding like a broken record, but here comes another study which uses the wholly inappropriate "evolutionary" mutation rate of 0.00069/locus/generation. This rate is suitable for a haplogroup that grows due to drift alone and which is expected in 60,000 years (or 2,400 generations) to have grown to the grand number of ~1,200 men.
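
For readers wondering where figures like 2,400 generations and ~1,200 men come from, here is a minimal sketch. It assumes a 25-year generation (implied by 60,000 years = 2,400 generations) and a lineage that behaves as a critical branching process with Poisson-like offspring variance of 1; under that assumption, a lineage that survives g generations has an expected size of roughly variance × g / 2. This is my reconstruction of the arithmetic, not code from any of the papers involved.

```python
YEARS = 60_000              # age discussed in the post
YEARS_PER_GENERATION = 25   # ASSUMPTION implied by 60,000 years = 2,400 generations


def generations(years, years_per_generation=YEARS_PER_GENERATION):
    """Convert an age in years to a number of generations."""
    return years / years_per_generation


def expected_size_given_survival(n_generations, offspring_variance=1.0):
    """Approximate expected number of living male descendants of a lineage
    that has drifted neutrally and survived for n_generations.
    ASSUMPTION: critical branching process (mean 1 son per man), for which
    P(survival) is roughly 2 / (variance * g), so the expected size
    conditional on survival is roughly variance * g / 2."""
    return offspring_variance * n_generations / 2


g = generations(YEARS)
size = expected_size_given_survival(g)
print(f"{g:.0f} generations, expected size of a surviving lineage: ~{size:.0f} men")
```

A haplogroup that today counts its members in the millions clearly did not grow by drift alone, which is why a rate calibrated for that scenario is a poor fit.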

Not only is this the case, but the authors give "confidence intervals" on their age estimates of 61-71kya which is almost certainly an underestimate of the truth based on an incomplete assessment of the factors affecting uncertainty about the haplogroup's age. This nice and tight estimate is accomplished using the grand total of eight STRs!

Based on using the wrong mutation rate, and artificially narrow confidence intervals, the authors joyously proclaim:
The estimated ages of the D-M174 lineages are older than those previously reported based on both Y chromosome and mtDNA variations in East Asia [8, 9, 21]. To see whether it is over-estimated, using the same method, we calculated the divergence time between DE* and E-M40. The estimated age is 27,176 years, which is much younger than the D-M174 lineage, but consistent with the previous estimation (27,800-37,000 years ago) [3]. Hence, the antiquity of D-M174 likely reflects the true prehistory of human populations in East Asia. The age estimation model developed by Zhivotovsky (2001) is not sensitive to effective population size and recent population expansion though the effect of population substructure cannot be totally ruled out. The antiquity of D-M174 was also supported by a previous study in which the origin of D-M174 was estimated more than 50,000 years ago [5].
Study [5] by Underhill et al., which supposedly supports the origin of haplogroup D 50,000 years ago, actually doesn't derive this estimate on the basis of any genetic data, but rather from theory about the "Southern Coastal Route":
The early human groups that used this route around 50000 years ago (taking the earliest occupation of Australia as the endpoint of this dispersal) were not restricted to coastal areas, and must have successfully colonized the Asian mainland, as shown by the distribution of surviving Group IV and V lineages.
I am constantly amazed by how the tremendous amount of effort required to identify, sample, catalogue, process, and genotype great numbers of people from around the world is accompanied by an apparently complete lack of interest in checking the basic premises on which interpretation of this data is based.

This paper and its supplementary data is a wonderful resource for Y-chromosome haplogroup D, but if you want to know more about the origins of this haplogroup, the sister clade of the common haplogroup E, you'll have to look elsewhere.

BMC Biology doi: 10.1186/1741-7007-6-45

Y chromosome evidence of earliest modern human settlement in East Asia and multiple origins of Tibetan and Japanese populations

Hong Shi et al.

Abstract

Background

The phylogeography of the Y chromosome in Asia previously suggested that modern humans of African origin initially settled in mainland southern East Asia, and about 25,000-30,000 years ago, migrated northward, spreading throughout East Asia. However, the fragmented distribution of one East Asian specific Y chromosome lineage (D-M174), which is found at high frequencies only in Tibet, Japan and the Andaman Islands, is inconsistent with this scenario.

Results

In this study, we collected more than 5,000 male samples from 73 East Asian populations and reconstructed the phylogeography of the D-M174 lineage. Our results suggest that D-M174 represents an extremely ancient lineage of modern humans in East Asia, and a deep divergence was observed between northern and southern populations.

Conclusions

We proposed that D-M174 has a southern origin and its northward expansion occurred about 60,000 years ago, predating the northward migration of other major East Asian lineages. The Neolithic expansion of Han culture and the last glacial maximum are likely the key factors leading to the current relic distribution of D-M174 in East Asia. The Tibetan and Japanese populations are the admixture of two ancient populations represented by two major East Asian specific Y chromosome lineages, the O and D haplogroups.

Link

October 30, 2008

"Phoenician" Y-chromosomes

It has been several years since the inception of the Genographic project, and to say that the quantity and quality of the work produced by it is underwhelming would be charitable.

The newest bit of Genographic wisdom is that haplogroup J2 in the Mediterranean is associated not with the Neolithic, Greek, or other population movements, but with the sea-faring Phoenicians. They achieve this feat by (allegedly) comparing areas of Phoenician influence with those of no (or low) such influence.

I have intentionally limited myself to five major weak points of the study: to cover more would be too time-consuming and unnecessary.



1. The Hellenistic age did not happen

A central assumption of this work is that the conquest and occupation of the Middle East by Alexander the Great does not count as Greek influence, despite the centuries of Greek domination that followed, both in Hellenistic and later in Roman times.

The authors write that their method could be further used to:
include systematic investigations of military expansions, such as the Greek signal, from the time of Alexander the Great in central and south Asia
Apparently they didn't think of applying it to West Asia itself, which was also conquered by Alexander the Great, and in which the Greek-speaking element persisted far longer than in "south Asia".

Thus, the population of Phoenicia and its "periphery" is implicitly assumed to be free of Greek influence. That is a bizarre contention, given that Greek was spoken in "Phoenicia" long after the Phoenician language became extinct.

2. Crete was influenced by the Phoenicians

This totally unsupported claim is necessary for the authors' thesis, since Crete has the world maximum of haplogroup J2. I have no doubt that Phoenicians traded with Cretans, just as Cretans traded with Phoenicians. But, that is no excuse to think of Crete as an area of Phoenician influence.

Indeed, settlement of the Levant by Aegean peoples is archaeologically supported, while Phoenician settlement of Crete is not.

But, speaking of Phoenician settlement, the only area of Greece where such settlement is believed to have taken place is in mainland Greece, in Thebes, where Cadmus and his Phoenicians founded Cadmeis. I doubt that this had any substantial effect, but if the authors wanted to be intellectually honest, they would list this as an area of Phoenician influence, rather than Crete.

3. West Asia Minor (or the Pontus) was not colonized by Greeks

The most laughable claim of the authors (see map) is the absence of blue (Greek) dots on West Asia Minor, and the Pontus (Northeast Turkey). Apparently the Greek colonies of the far West (such as Marseilles) count as areas of Greek influence, while the countless Greek cities on the Asian side of the Aegean, or in northeast Turkey do not.

The motivation for this is obvious, since Asia Minor is a J2-heavy area and asserting Greek influence there would upset the paper's thesis. But it is absurd to place blue dots in Paphlagonia and Caria and not in Ionia or the Pontus.

4. Modern Lebanese are descendants of Phoenicians

This central assumption of the paper has no actual support, except for a vague geographical congruence. Modern Lebanese are a hybrid people, divided into Christians and Muslims. Both are Arabs, with Muslims being more influenced by the original Arabians, and Christians more influenced by the pre-Arab (Greco-Syrian) and post-Arab (West European) migrations. Perhaps, there is a trace of Phoenician genes in them, but this is really not a self-evident claim.

5. R1b in Greece and Turkey is due to the Celts

R1b in Greece and Turkey belongs primarily to the "eastern" variety, and not the "western" variety. It is in Italy and north of Greece where the two varieties begin to blend with each other. The authors take no care to distinguish between these varieties.

Certainly, some R1b in this region may be due to Western Europeans (e.g. from the period of the Frankokratia), but to assign its totality to this factor is nonsensical. Apparently, the geniuses of the Genographic project have decreed that the brief foray of the Celts into Greece introduced massive amounts of R1b, but a thousand years of Greco-Roman domination of the Levant did nothing of the kind.

6 (bonus). Haplogroup J2 is more frequent in East than in West Sicily

Sicily is an island which had well-documented and not insignificant settlements by both Greeks and Phoenicians. Moreover, these settlements were geographically divided: Greeks in the East, Phoenicians in the West. It is in the East that J2 has its highest frequency, and not in the Phoenician West.

Conclusion

Is there anything of value in this paper? Well, it's a good idea to try to correlate Y-chromosome distribution with historical rather than pre-historical events. Too bad the authors botched the job, but their paper can at least serve as a reference point for how not to go about doing it.

UPDATE: Take a look at the "haplotype groups" suggested by the authors as signals of Phoenician and Greek colonization.



Not only are haplotype groups not clades (they do not designate common ancestry), but 7-marker haplotypes don't even designate anything that can be remotely tied to the time period in question, given the huge confidence intervals associated with even larger numbers of markers. Feel free to plug these haplotypes into yhrd or ysearch to find plenty of long-lost "Phoenicians" all over the planet.

UPDATE II: The "evolutionary" mutation rate rears its ugly head

From the paper:
Because there is a significant chance that a haplotype existing 3000 years ago has accumulated a one-step difference in an STR (we expect 0.6 mutations per seven-STR haplotype when a rate of 6.9 × 10^-4 per locus per 25 yr is used), these one-step neighbors have been included in each set, producing what we have labeled STR+s. STR-s can contain both haplotypes deriving from mutations, which should have been included, and independent haplotypes unconnected with the migrations that we are trying to detect.
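The 0.6 figure in the quote is easy to check by hand: 3000 years at 25 years per generation is 120 generations, and 120 generations × 7 loci × 6.9 × 10^-4 comes to about 0.58. Here is a minimal sketch of that arithmetic. The rate, generation length, locus count and time span are taken from the quote; the function names are mine, and the Poisson step used to turn the expectation into a "chance of at least one mutation" is my own assumption, not something the paper states.

```python
import math

# Values taken from the quoted passage of the paper.
RATE_PER_LOCUS_PER_GENERATION = 6.9e-4  # per locus per 25-year generation
YEARS_PER_GENERATION = 25
NUM_LOCI = 7                            # seven-STR haplotype
ELAPSED_YEARS = 3000


def expected_mutations(years, loci=NUM_LOCI,
                       rate=RATE_PER_LOCUS_PER_GENERATION,
                       years_per_generation=YEARS_PER_GENERATION):
    """Expected number of STR mutations on one haplotype over `years`."""
    generations = years / years_per_generation
    return generations * loci * rate


def prob_at_least_one(expected):
    """Chance of one or more mutations, ASSUMING Poisson-distributed events
    (my assumption for illustration; the paper does not say this)."""
    return 1.0 - math.exp(-expected)


expected = expected_mutations(ELAPSED_YEARS)
p_any = prob_at_least_one(expected)
print(f"expected mutations per haplotype: {expected:.2f}")
print(f"chance of at least one mutation (Poisson assumption): {p_any:.2f}")
```

Note that the result scales linearly with the rate plugged in, so the choice of mutation rate directly sets how many one-step neighbors look plausible.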
UPDATE III: What of the Arabs?

The modern Lebanese are Arabs, as are most modern North Africans where Phoenician colonies were founded. The Arabs also affected several Mediterranean islands, as well as Iberia. One would think that the most salient feature of modern Mediterranean populations would be mentioned in a paper which attempted to trace patterns of Y-chromosome variation in the Mediterranean.

Certainly, the Neolithic, Greek, and Phoenician migrations, as well as the Jewish Diaspora moved people around. But the Phoenicians have been extinct for 2,000 years. The Jews had (and have) communities around the Mediterranean, but did not amount to a significant population element anywhere. It is the Arabs who are the elephant in the room, and yet they are ignored. Are similarities between the Levant, North Africa and Spain due to Phoenicians or due to this later Arab movement? By failing to trace the distribution of their "Phoenician colonization signals" among Arabians, the authors have overstated their case.


American Journal of Human Genetics doi: 10.1016/j.ajhg.2008.10.012

Identifying Genetic Traces of Historical Expansions: Phoenician Footprints in the Mediterranean

Pierre A. Zalloua et al.

Abstract

The Phoenicians were the dominant traders in the Mediterranean Sea two thousand to three thousand years ago and expanded from their homeland in the Levant to establish colonies and trading posts throughout the Mediterranean, but then they disappeared from history. We wished to identify their male genetic traces in modern populations. Therefore, we chose Phoenician-influenced sites on the basis of well-documented historical records and collected new Y-chromosomal data from 1330 men from six such sites, as well as comparative data from the literature. We then developed an analytical strategy to distinguish between lineages specifically associated with the Phoenicians and those spread by geographically similar but historically distinct events, such as the Neolithic, Greek, and Jewish expansions. This involved comparing historically documented Phoenician sites with neighboring non-Phoenician sites for the identification of weak but systematic signatures shared by the Phoenician sites that could not readily be explained by chance or by other expansions. From these comparisons, we found that haplogroup J2, in general, and six Y-STR haplotypes, in particular, exhibited a Phoenician signature that contributed > 6% to the modern Phoenician-influenced populations examined. Our methodology can be applied to any historically documented expansion in which contact and noncontact sites can be identified.

Link

September 16, 2008

Genographic project paper on human mtDNA mutation rates

This is a fairly technical paper which should be of interest for those interested in uniparental markers and their age estimation.

Genetics doi: 10.1534/genetics.108.091116

Maximum Likelihood Estimation of Site-Specific Mutation Rates in Human Mitochondrial DNA from Partial Phylogenetic Classification

Saharon Rosset et al.

Abstract

The mitochondrial DNA hyper-variable segment I (HVS-I) is widely used in studies of human evolutionary genetics, and therefore accurate estimates of mutation rates among nucleotide sites in this region are essential. We have developed a novel maximum-likelihood methodology for estimating site-specific mutation rates from partial phylogenetic information, such as haplogroup association. The resulting estimation problem is a generalized linear model, with a non-standard link function. We develop inference and bias correction tools for our estimates and a hypothesis testing approach for site independence. We demonstrate our methodology using 16,609 HVS-I samples from the Genographic Project. Our results suggest that mutation rates among nucleotide sites in HVS-I are highly variable. The 16,400--16,500 region exhibits significantly lower rates compared to other regions, suggesting potential functional constraints. Several loci identified in the literature as possible termination associated sequences (TAS) do not yield statistically slower rates than the rest of HVS-I, casting doubt on their functional importance. Our tests do not reject the null hypothesis of independent mutation rates among nucleotide sites, supporting the use of site-independence assumption for analyzing HVS-I. Potential extensions of our methodology include its application to estimation of mutation rates in other genetic regions, like Y-chromosome short tandem repeats.

Link

June 13, 2008

mtDNA of Tarim mummies




The National Geographic documentary on the Tarim mummies reports on recent mtDNA work on them. The program doesn't really reveal anything new to anyone familiar with the story of these mummies, but there are some nice segments showing some of them as they would have looked during their lifetime. At some point, the camera shows what appear to be haplogroup assignments, although I wouldn't vouch as to what these actually mean, or to whom exactly they belong. What they do say is that they found markers from "Europe, West Eurasia, Siberia, Tibet, Mongolia, even India". They also mention that the "Beauty of Loulan" and the "Boy" have "unexpected marks of East Asian ancestry", that "Cherchen Man" also carries "a surprising East Asian lineage", and that the "Shaman" has a "lineage frequently seen in the Himalayas and India".

March 27, 2008

Christian and Muslim Lebanese do differ from each other after all

Like I said they did in 2007. BBC has a story about this:
The team analysed the Y chromosomes of 926 Lebanese males and found that patterns of male genetic variation in Lebanon fell more along religious lines than along geographical lines.

A genetic signature on the male chromosome called WES1, which is usually only found in European populations, was found among the Lebanese men included in the study.

"It seems to have come in from Europe and is found mostly in the Christian population," said Dr Spencer Wells, director of the Genographic Project.

"This is odd because typically we don't see this sort of stratification by religion when we are looking at the relative proportions of these lineages - and particularly immigration events."

He told BBC News: "Looking at the same data set, we saw a similar enrichment of lineages coming in from the Arabian Peninsula in the Muslim population which we didn't see [as often] in the Christian population."

Lebanese Muslim men were found to have high frequencies of a Y chromosome grouping known as J1. This is typical of populations originating from the Arabian Peninsula, who were involved in the Muslim expansion.

As I predicted, the finding of similarity between Christian and Muslim Lebanese in the older National Geographic story on Wells' and Zalloua's work was premature: it was based on their common possession of Y-haplogroup J, and did not look at the downstream markers which differentiate between Christians and Muslims. As I observed based on the work of Capelli et al., it is the overrepresentation of Y-haplogroup J*(xJ2), which consists almost entirely of J1 chromosomes, that is the mark of the Arab descent of Muslim Lebanese.

I will post the abstract of this study and any further comments when I see it.

UPDATE: The Genographic project has its own page on this research, as well as a link to the paper (pdf).

Y-Chromosomal Diversity in Lebanon Is Structured by Recent Historical Events

Pierre A. Zalloua et al.

Lebanon is an eastern Mediterranean country inhabited by approximately four million people with a wide variety of ethnicities and religions, including Muslim, Christian, and Druze. In the present study, 926 Lebanese men were typed with Y-chromosomal SNP and STR markers, and unusually, male genetic variation within Lebanon was found to be more strongly structured by religious affiliation than by geography. We therefore tested the hypothesis that migrations within historical times could have contributed to this situation. Y-haplogroup J*(xJ2) was more frequent in the putative Muslim source region (the Arabian Peninsula) than in Lebanon, and it was also more frequent in Lebanese Muslims than in Lebanese non-Muslims. Conversely, haplogroup R1b was more frequent in the putative Christian source region (western Europe) than in Lebanon and was also more frequent in Lebanese Christians than in Lebanese non-Christians. The most common R1b STR-haplotype in Lebanese Christians was otherwise highly specific for western Europe and was unlikely to have reached its current frequency in Lebanese Christians without admixture. We therefore suggest that the Islamic expansion from the Arabian Peninsula beginning in the seventh century CE introduced lineages typical of this area into those who subsequently became Lebanese Muslims, whereas the Crusader activity in the 11th-13th centuries CE introduced western European lineages into Lebanese Christians.