Showing posts with label E-V13. Show all posts

November 11, 2011

Falsification in action

I am an occasional critic of Anatole Klyosov's Y-STR based age estimation methodology on the GENEALOGY-DNA-L list. As I have mentioned before, I am boycotting Y-STRs because they are simply worthless for the student of prehistory due to their poor qualities as molecular clocks and lack of any clear correspondence with population movements.

Nonetheless, Klyosov's professional credentials and substantial "dna genealogy" paper production may lead some to give his work, characterized by very narrow confidence intervals and rather imaginative archaeological reconstructions, undue attention.

Klyosov resurfaced on GENEALOGY-DNA-L, taking a swipe at my criticism of his narrow confidence intervals:
Instead of walking in circles considering "bushy trees" all these years and complaining on "huge confidence intervals", one better take ACTUAL genealogy data, ACTUAL haplotype datasets, and compare actual dates with those resulted from DNA genealogy. This will show what ACTUAL margins of error looks like. With "bushy trees", they should be first subdivided on separate branches, and each branch should be analyzed individually.

Thankfully, the arrival of ancient DNA analysis can be used to falsify Klyosov's assertions. In December 2010 he discussed the possibility that some E1b1b1 subclades may have played a role in wiping out the "Bell Beakers":
However, E-V13 is already out, since it was formed around 2600 ybp (Lutak and Klyosov, Proceedings, 2009, April, pp. 639-669). E-V65 is out on the same reason (2625 ybp). E-V22 is a good candidate, with its common ancestor around 5075 ybp (ibid). E1b1b1a1-V12 also could be there, with its common ancestor of 4300+/-680 ybp. E3b1, as Adams et al (2008) called them (it is apparently E-81), has a common ancestor in Iberia around 4825 ybp (Klyosov, Proceedings, 2009, March, pp. 390-421), which nicely fit to the concept.
The recent publication of 7,000-year-old E-V13 from Neolithic Spain indicates that this haplogroup was in existence at least that long ago, and hence could not have been formed 2,600 years before present. Klyosov's error is at least 2.5x, consistent with my assertions that Y-STR based age estimates carry huge confidence intervals, and inconsistent with his self-assurance that they do not.
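The size of that error can be checked with simple arithmetic. The sketch below is only an illustration: it takes the 2,600 ybp estimate and the roughly 7,000-year age of the Spanish sample from the text above, and the function name is my own. Since the haplogroup must be at least as old as the oldest sample carrying it, the ratio is a lower bound on the error.

```python
# Fold-error of a Y-STR age estimate against an ancient-DNA lower bound.
estimated_age_ybp = 2600        # E-V13 estimate quoted above
ancient_sample_age_ybp = 7000   # approximate age of the Neolithic Spanish E-V13 sample


def minimum_fold_error(estimate_ybp, lower_bound_ybp):
    """Return how many times too young the estimate is, at minimum."""
    return lower_bound_ybp / estimate_ybp


fold = minimum_fold_error(estimated_age_ybp, ancient_sample_age_ybp)
print(f"minimum fold error: {fold:.2f}x")  # prints "minimum fold error: 2.69x"
```

The true error can only be larger, since the sample gives the latest possible date for the haplogroup's existence, not the date of its origin.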

I see nothing wrong in advancing speculative hypotheses based on the available evidence. I've advanced some of my own ideas for the spread of E-V13 that appear to be less plausible in the light of the ancient DNA evidence, even though a historical, Greek-mediated spread of a subset of E-V13 as proposed by Di Gaetano et al. and King et al. is still possible.

What is certainly wrong is to have over-confidence in one's assertions and not to admit the limitations of Y-STR based age estimates when they are staring us in the face on both theoretical and empirical grounds.

November 01, 2011

Y-haplogroups E-V13 and G2a in Neolithic Spain

I have not read the paper, so I can't comment in detail. Two quick comments:
  • The discovery of G2a is added to the finds from Treilles, Derenburg, and the Alps. It is now virtually certain that the Neolithic transition in much of Europe, both inland and coastal, involved G2a-bearing men.
  • The discovery of E-V13 in Spain is unexpected for a number of reasons: there is relatively little of it there now; it had previously been associated with the inland route of the spread of agriculture, as well as the spread of the Greeks to Sicily and Provence, or Roman soldiers at a much later date.
While this Neolithic E-V13 may well have come from the Balkans, and the common ancestor of the very uniform present-day Balkan cluster may have lived after this Spanish find, it is now certain that E-V13 was established in Europe long before the Bronze Age. This highlights the need to avoid Y-STR based calculations on modern populations for inferring patterns of ancient history, and not to conflate TMRCAs with "dates of arrival": "In short: a particular TMRCA is consistent with either the arrival of the lineage long before and long after the TMRCA in a particular geographical area."

At least for now, three of the major players of the European genetic landscape (E-V13, G2a, and I2a) have made their Neolithic appearance. Hopefully, as more ancient DNA is published, and even from later dates, more of them will turn up.

I will comment more when I get to read the paper.

UPDATE I:

From the paper:
For the six male samples, two complete and four partial Y-STRs haplotypes were obtained (Table 3). They allowed classification of individuals into two different haplogroups: G2a (individuals ave01, ave02, ave03, ave05, and ave06, which seem to share the same haplotype) and E1b1b1 (individual ave07). The four markers chosen to confirm belonging to these haplogroups (Y-E1b1b1-M35.1, Y-E1b1b1a1b-V13, Y-G2-M287, and Y-G2a-P15) were typed with a rate of 66%, which permitted confirmation that four males were G2a and one was E1b1b1a1b (Table 3).

Analysis of shared haplotypes showed that the G2a haplotype found in ancient specimens is rare in current populations: its frequency is less than 0.3% (Table S3). The haplotype of individual ave07 is more frequent (2.44%), particularly in southeastern European populations (up to 7%). The Ave07 haplotype was also compared with current Eb1b1a2 haplotypes previously published (10–14). It appeared identical at the seven markers tested to five Albanian, two Bosnian, one Greek, one Italian, one Sicilian, two Corsican, and two Provence French samples and are thus placed on the same node of the E1b1b1a1b-V13 network as eastern, central, and western Mediterranean haplotypes (Fig. S1).
The ancient remains all appeared to lack the common European lactase persistence genotype.

On the mtDNA:
Mitochondrial HVS-I sequences were obtained for the seven individuals and can be classified into four different haplotypes (Table 2). All are still frequent in current European populations (Table S1), and three of them were also found in ancient Neolithic samples (Table S2). These haplotypes permitted the determination that the individuals ave01, ave02, and ave06 belonged to K1a, ave04 and ave05 to T2b, ave03 to H3, and ave07 to U5 haplogroups.
The supporting information (pdf) has a lot of additional information.

PNAS doi: 10.1073/pnas.1113061108

Ancient DNA suggests the leading role played by men in the Neolithic dissemination

Marie Lacan et al.

The impact of the Neolithic dispersal on the western European populations is subject to continuing debate. To trace and date genetic lineages potentially brought during this transition and so understand the origin of the gene pool of current populations, we studied DNA extracted from human remains excavated in a Spanish funeral cave dating from the beginning of the fifth millennium B.C. Thanks to a “multimarkers” approach based on the analysis of mitochondrial and nuclear DNA (autosomes and Y-chromosome), we obtained information on the early Neolithic funeral practices and on the biogeographical origin of the inhumed individuals. No close kinship was detected. Maternal haplogroups found are consistent with pre-Neolithic settlement, whereas the Y-chromosomal analyses permitted confirmation of the existence in Spain approximately 7,000 y ago of two haplogroups previously associated with the Neolithic transition: G2a and E1b1b1a1b. These results are highly consistent with those previously found in Neolithic individuals from French Late Neolithic individuals, indicating a surprising temporal genetic homogeneity in these groups. The high frequency of G2a in Neolithic samples in western Europe could suggest, furthermore, that the role of men during Neolithic dispersal could be greater than currently estimated.

Link

March 14, 2011

The coming of the Greeks to Provence and Corsica (King et al. 2011)

I am sure I will have much more to say on this paper once I read it carefully, but, for the moment, I will remind readers of my 2008 post on Expansion of E-V13 explained, in which I postulated that E-V13 in Europe is largely attributable to Greek colonization.

The paper is also quite exciting as it includes samples of Greeks from the vicinity of Smyrna and Phocaia, the first, as far as I know, published samples of Greek men from Asia Minor. I do find somewhat bizarre, however, the use of Anatolian Greeks as the putative ancestors of the colonization of the West Mediterranean and of Anatolian Turks as the supposed representatives of the Neolithic population (Table 1). The claim that the latest Anatolian population stratum (Turks) can be linked to its earliest (Neolithic-era Anatolians) is rather suspect.

UPDATE I (Mar 15)

The authors claim:
This high frequency of haplogroup J2a-Page55 (formerly DYS413≤ 18) in Smyrna is characteristic of non Greek Anatolia.
This claim is based entirely on the authors' limited Balkan Greek samples. An inspection of more Greek samples shows that DYS413 less than or equal to 18 occurs at higher frequencies both in Crete and at several mainland sites (Serrai, Larisa, Patrai) spanning the entire country. Hence, I believe that the claim that J2a-Page55 distinguishes Greeks from non-Greeks is spurious.

UPDATE II (Mar 15)

The authors cite the "Phoenician" paper:
Previous Y-chromosome genetic studies of Phoenician colonization have demonstrated that haplogroup J2 frequency was amplified in regions containing the Phoenician colonies of Iberia and North Africa in comparison to areas not containing Phoenician colonies [7]
My scathing criticism of that paper, and the specific "Phoenician" association with J2 can be found here.

UPDATE III (Mar 15)

The authors make a big deal of the presumed relationship of Phocaea with Ionians and of Smyrna with Ionian/Aeolians. As I have mentioned before, it is a hard sell to think that two sites right next to each other, inhabited by people who had no ethnic or religious distinction for more than 2,000 years (any tribal Greek identities had disappeared by ancient times), nonetheless managed to retain gene pools distinct from each other over that time span that can be traced to archaic Greek tribal distinctions.

UPDATE (Mar 17)

The above-mentioned nitpicks do not, however, detract from the paper's thesis. So, it's worth repeating a few of the things on which this thesis is supported:
  • We have new Greek population samples from Asia Minor that show E-V13 frequencies well within the regional variation of mainland Greece, and higher than in the Turkish Anatolian population. This disproves the theory that E-V13 may have been introduced to the mainland Greek population recently from Albanians, Thracians, and other bizarre theories advocated by some, as these would not have affected substantially the Greeks of West Asia Minor.
  • It should be noted, however, that E-V13 frequencies vary substantially among Greek populations. This seems consistent with my theory of its Bronze Age "heroic" origin, as late lineages are expected to have non-homogeneous frequency distributions.
  • The Corsican evidence is consistent with the Greek origin of E-V13 due to the higher frequency of E-V13 around the colony of Alalia (4.6% East Corsica vs. 1.6% in West Corsica).
  • The absence of I-M423 in Provence precludes a substantial contribution to the Provencal population by Balkan populations north of Greece where I-M423 reaches a higher frequency.
It seems pretty clear to me that E-V13 bearing men of Provence are patrilineally descended from the Greeks of the archaic age. The same could be true for others (e.g., J-M92) assigned (erroneously in my opinion) to non-Greek Anatolians, but overall, the evidence supports the persistence of the gene pool of the Western Greeks among the present-day southern French.

BMC Evolutionary Biology 2011, 11:69 doi:10.1186/1471-2148-11-69

The coming of the Greeks to Provence and Corsica: Y-chromosome models of archaic Greek colonization of the western Mediterranean

Roy J King et al.

Abstract (provisional)

Background
The process of Greek colonization of the Central and Western Mediterranean during the Archaic and Classical Eras has been understudied from the perspective of population genetics. To investigate the Y chromosomal demography of Greek colonization in the Western Mediterranean, Y-chromosome data consisting of 29 YSNPs and 37 YSTRs were compared from 51 subjects from Provence, 56 subjects from Smyrna and 31 subjects whose paternal ancestry derives from Asia Minor Phokaia, the ancestral embarkation port to the 6th century BCE Greek colonies of Massalia (Marseilles) and Alalie (Aleria, Corsica).

Results
19% of the Phokaian and 12% of the Smyrnian representatives were derived for haplogroup E-V13, characteristic of the Greek and Balkan mainland, while 4% of the Provencal, 4.6% of West Corsican and 1.6% of East Corsican samples were derived for E-V13. An admixture analysis estimated that 17% of the Y-chromosomes of Provence may be attributed to Greek colonization. Using putative Neolithic Anatolian lineages: J2a-dys445=6, G2a-M406 and J2a1b1-M92 the data predict a 0% Neolithic contribution to Provence from Anatolia. Estimates of colonial Greek vs. indigenous Celto-Ligurian demography predict a maximum of a 10% Greek contribution, suggesting a Greek male elite-dominant input into the Iron Age Provence population.

Conclusions
Given the origin of viniculture in Provence is ascribed to Massalia, these results suggest that E-V13 may trace the demographic and socio-cultural impact of Greek colonization in Mediterranean Europe, a contribution that appears to be considerably larger than that of a Neolithic pioneer colonization.

Link

September 28, 2010

Y chromosome study of Serbian Roma

The haplogroups are available as supplementary material. I wonder whether different populations of Roma underwent different levels of admixture, or whether the Roma are themselves originally unrelated groups of wanderers which came to be identified by others as "Gypsies" and eventually believed it.

Both "massive admixture" and the scenario I am entertaining have their problems. In the former: why did one group of Roma admix so heavily while another did not at all? In the latter: how did groups of unrelated origin come to share common cultural-linguistic traits? Balkan ethnology is not easy.

Here is an interesting tidbit from the paper which complements my recent enumeration of the genealogical mutation rate's superiority:
For the majority of the populations, time estimates based on Zhivotovsky et al., (2004) and NETWORK using the evolutionary mutation rate are comparable.
On the other hand, time estimates using the genealogical mutation rate (Goedbloed et al., 2009) seem to fit better with historical data of the Romani diaspora.
American Journal of Physical Anthropology DOI: 10.1002/ajpa.21372

Divergent patrilineal signals in three Roma populations

Maria Regueiro et al.

Abstract

Previous studies have revealed that the European Roma share close genetic, linguistic and cultural similarities with Indian populations despite their disparate geographical locations and divergent demographic histories. In this study, we report for the first time Y-chromosome distributions in three Roma collections residing in Belgrade, Vojvodina and Kosovo. Eighty-eight Y-chromosomes were typed for 14 SNPs and 17 STRs. The data were subsequently utilized for phylogenetic comparisons to pertinent reference collections available from the literature. Our results illustrate that the most notable difference among the three Roma populations is in their opposing distributions of haplogroups H and E. Although the Kosovo and Belgrade samples exhibit elevated levels of the Indian-specific haplogroup H-M69, the Vojvodina collection is characterized almost exclusively by haplogroup E-M35 derivatives, most likely the result of subsequent admixture events with surrounding European populations. Overall, the available data from Romani groups points to different levels of gene flow from local populations.

July 03, 2010

Y chromosomes of Arbereshe from Calabria

From the paper:
The Arbereshe are one of the largest linguistic minorities in Italy. They are the result of complicated movements of Albanians around the end of the 15th and beginning of the 16th century, often linked to the invasion of the Balkans by the Ottoman Empire. Despite that, it is generally agreed that most of the immigrants started moving from the south of Albania (Toskeria), with, very often, intermediate steps in Greece, particularly in the Peloponnese (Zangari 1941). Further evidence is provided by linguistic research, according to which Arberisht, the language spoken by Arbereshe, is part of the Tosk dialect group of Albanian, a language originally spoken in Toskeria (Babiniotis 1998).
On the sample:
The Arbereshe Y-chromosome variation was investigated by sampling individuals from different villages of the Pollino area (Calabria) who bear one of the founding surnames of the population. The genotyping was performed using 12 microsatellites (STRs) and 31 unique event polymorphisms (UEPs), defining, respectively, haplotypes and haplogroups. The Italian and Balkan genetic backgrounds were explored using the large amount of data provided by recent Y-chromosome studies in the two peninsulas and by literature data on STRs from forensic research.
Comparison of Y-haplogroup frequency and diversity between Albanians from Tirana and Arbereshe from Calabria (from Table III):


The presence of F*(xG,I,J,K) in Albanians is interesting as this occurs in Romania and Bosnia Herzegovina (all groups), and in South Apulia. It could potentially be haplogroup H and may reflect a Gypsy element that was not present when the Arbereshe moved to Italy from the Balkans.

Haplogroup I shows similar frequencies, but:
I-M170 is the most common Balkan haplogroup (Pericic et al. 2005a,b) and the second most frequent Arbereshe clade. Nevertheless, analysis of its network reveals unexpected results: most of the Arbereshe I-M170 haplotypes are not included in the Balkan cluster (Figure 3), but are located in the long branches containing mainly Italian chromosomes. Comparisons with literature data (Semino et al. 2000; Barac et al. 2003, Rootsi et al. 2004) show that the core haplotype of the Balkan cluster (16-14-15-13-31-24-11-11-13; locus order as above) is consistent with the almost Balkan exclusive I2a (formerly I1b) clade. The proposed interpretation of the Arbereshe as a proxy of the founder Albanian population leads us to hypothesize that the I2a clade was less common in the southern Balkans 500 years ago than nowadays. The very tight shape of the I2a cluster in the network suggests a very recent expansion of this haplogroup in the southern Balkans. Furthermore, I2a is still rare in mountain populations such as the Albanians of Kosovo (Pericic et al. 2005a,b) and in a randomly selected Arbereshe sample from Rootsi et al. (2004).
This is an interesting finding in the light of recent evidence for selection in Y-haplogroup I.

The situation with J2 is also quite interesting as this is rarer in Arbereshe (3%) than Albanians (17%):
The scarcity of J2 chromosomes in the Arbereshe sample (1/40) is very difficult to explain, given that they are very common in both the Italian peninsula and the southern Balkans. Literature data on J2 indicate that most of the haplotypes included in the Balkan (B) cluster of the network (Figure 3) have an STR configuration consistent with the J2-M12 sub-clade (Di Giacomo et al. 2004; Semino et al. 2004; Cruciani et al. 2007). In contrast, most of the haplotypes in the other clusters agree with the STR configuration given for the J2-M67 clade, with its sub-clade J2-M92 (Di Giacomo et al. 2004). It is unconvincing to attribute the rarity of J2 in the Arbereshe to random sampling or to the effect of genetic drift. Furthermore, the Arbereshe sample analysed by Semino et al. (2004) also completely lacks the typically Balkan J2-M12 chromosomes. If we interpret our Arbereshe sample as representative of the founding Albanian population, we may hypothesize that the J2 haplogroup was considerably less diffuse in the southern Balkans five centuries ago than today.
What we can conclude from this study is that the founding Albanian population was J2- and I2a-lite compared to modern Albanians. The source for the I2a seems to be either the Albanization of people from the West Balkans and/or selection, although it would be difficult to see a massive increase in frequency in only five centuries. The I2a-deficiency of the Arbereshe also gives support to the theory that the Albanians are relatively recent arrivals from the northeast; this theory has been upheld in the past on the basis of (i) their historical obscurity until the last millennium, and (ii) the paucity of native sea terms and Greek loanwords in Albanian, which is difficult to explain if Albanians always occupied their current location on the Adriatic.

The source of J2 is less clear, and could be either the Albanization of Greeks (the only Balkan population with a sizeable J2 frequency) or remnants of Muslim Anatolians from Ottoman times. However, modern Albanians belong mainly to clade J2b, while Anatolians belong to J2a. Thus, I tend to dismiss the Anatolian connection.
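The paper's remark that random sampling is an unconvincing explanation for the scarcity of J2 can be given a rough number. The sketch below is a back-of-the-envelope check, not the paper's own analysis: it assumes, purely for illustration, that the founders had the modern Albanian J2 frequency (17%) and that the 40 sampled men are independent draws, which surname-based sampling does not strictly guarantee. The function name is my own.

```python
from math import comb


def prob_at_most(k, n, p):
    """Binomial probability of seeing k or fewer carriers among n sampled men."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


n_arbereshe = 40      # Arbereshe sample size quoted from the paper (1/40 J2)
observed_j2 = 1       # J2 chromosomes observed
assumed_freq = 0.17   # ASSUMPTION: founders had the modern Albanian J2 frequency

p_value = prob_at_most(observed_j2, n_arbereshe, assumed_freq)
print(f"P(at most {observed_j2} J2 in {n_arbereshe}) = {p_value:.4f}")
```

Under these assumptions the probability is well under 1%, which is in line with the authors' view that chance alone is a poor explanation. Genetic drift in a small founder group is a separate question that this calculation does not address.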

The low frequency of R1*x(R1a1) in the Arbereshe, together with the high E1b1b1a frequency, makes a quite convincing case for the Balkan origins of this population.

Ann Hum Biol. 2010 Jun 22. [Epub ahead of print]

Linking Italy and the Balkans. A Y-chromosome perspective from the Arbereshe of Calabria.

Boattini A, Luiselli D, Sazzini M, Useli A, Tagarelli G, Pettener D.

Abstract

Background: The Arbereshe are an Albanian-speaking ethno-linguistic minority who settled in Calabria (southern Italy) about five centuries ago. Aim: This study aims to clarify the genetic relationships between Italy and the Balkans through analysis of Y-chromosome variability in a peculiar case study, the Arbereshe. Subject and methods: Founder surnames were used as a means to identify a sample of individuals that might trace back to the Albanians at the time of their establishment in Italy. These results were compared with data of more than 1000 individuals from Italy and the Balkans. Results: The distributions of haplogroups (defined using 31 UEPs) and haplotypes (12 STRs) show that the Italian and Balkan populations are clearly divergent from each other. Within this genetic landscape, the Arbereshe are characterized by two peculiarities: (a) they are a clear outlier in the Italian genetic background, showing a strong genetic affinity with southern Balkans populations; and (b) they retain a high degree of genetic diversity. Conclusion: These results support the hypothesis that the surname-chosen Arbereshe are representative of the Y-chromosome genetic variability of the Albanian founder population. Accordingly, the Arbereshe genetic structure can contribute to the interpretation of the recent biological history of the southern Balkans. Intra-haplogroup analyses suggest that this area may have experienced important changes in the last five centuries, resulting in a marked increase in the frequency of haplogroups I2a and J2.

Link

May 12, 2009

Y chromosome haplogroup E-M78 subtyping in Italians

The overall frequency of haplogroup E-M78 was 7.78% in the Piedmont and 11.44% in Sicily, with E-V13 present at a frequency of 3.33% and 5.93% respectively. The Sicilian sample is the same as in Di Gaetano et al. The paper includes Y-STR minimum haplotype data in its supplementary material.

International Journal of Legal Medicine doi:10.1007/s00414-009-0350-y

Subtyping of Y-chromosomal haplogroup E-M78 (E1b1b1a) by SNP assay and its forensic application

S. Caratti et al.

Abstract
The continual discovery of new single-nucleotide polymorphisms (SNPs) has led to an increased resolution of the Y chromosome phylogeny. Some of these Y-SNPs have shown to be restricted to small geographical regions and therefore may prove useful in the forensic field as tools for the prediction of population of origin of unknown casework samples. Here, we describe a system for the molecular dissection of haplogroup E-M78 (E1b1b1a), consisting of multiplex polymerase chain reaction and minisequencing of M78 and nine population-informative Y-SNPs (M148, M224, V12, V13, V19, V22, V27, V32, V65) in a single reaction. Sensitivity and admixture studies demonstrated that the SNP protocol allows robust genotyping from as little as 50 pg of male DNA, even in the presence of 500-fold amounts of female DNA. In order to evaluate the suitability of E1b1b1a, subhaplogrouping for population-of-origin prediction, the distribution of E-M78 and its derived variants was determined in an Italian population sample (n = 326).

Link

April 21, 2009

In search of Bronze Age metal prospectors

UPDATE:

A post in the GENEALOGY-DNA-L gives some additional information from the scientists working on this:
We are following up on the Weale study (Mol. Biol. Evol. 19(7):1008-1021. 2002) which reported a much higher than average number of E3b individuals in Abergele. We are interested in the possibility that these may be linked to the Bronze age copper mines nearby, but obviously this is just one possibility. The first step is to see if we can replicate the findings of the 2002 study in a much larger sample.
The 2002 study had found a high frequency of HG21 in Abergele. It will be interesting to see which subclade of E3b (or E1b1b in the updated terminology) the NW Wales men belong to. If they do belong to E-V13, then this would be consistent with a Bronze Age origin, although this would be difficult to distinguish from other scenarios, e.g., the arrival of this haplogroup with the Romans.

Also of interest: The Litoroid Race in the Bronze Age.

DNA test to prove Bronze Age link
Men are needed for DNA tests to prove their distant ancestors moved from the Mediterranean to north west Wales as migrant workers 4,000 years ago.

...

Researchers at the University of Sheffield hope to link the migration of men in the Bronze Age to the discovery of copper.

The metal was found at both Parys Mountain on Anglesey, and on the Great Orme at Llandudno, Conwy.

The researchers are building on previous work carried out in the area which found a much higher-than-average presence of a DNA marker that is commonly found in people from the Balkans and Spain.

December 24, 2008

Expansion of E-V13 and I-M423 from the Balkans

The most interesting aspect of this paper is that it supports a European rather than Middle Eastern origin of E-V13 and I-M423 based on an analysis of relative Y-STR variance. However, the age estimates presented in this paper are based on the infamous "evolutionary mutation rate", and are thus suspect. What appears as "Mesolithic" using the wrong mutation rate is actually Bronze Age, although with hefty confidence intervals.

Furthermore, caution should be used when correlating TMRCA with archaeological events. As I have noted before, the founder of a haplogroup is not the same as the Most Recent Common Ancestor (MRCA) of the present-day population from that haplogroup. This study seems to argue against the Middle Eastern origin of E-V13 suggested by Cruciani et al.

When exactly E-V13 came to the Balkans remains to be seen, but its expansion is properly placed in a Bronze Age rather than Mesolithic time frame. Interestingly, the paper has turned up some additional evidence:
Only four E-M78*, which do not belong to any already described sub-clade, have been observed in the southern Balkans. Two of them (from Greece) turned out to be characterized by the mutation M521 and therefore represent a new M78 lineage.

...

The presence of E-M78* Y chromosomes in the Balkans (two Albanians), previously described virtually only in northeast Africa, upper Nile, gives rise to the question of what the original source of the E-M78 may have been.
This is suggestive that E-V13 expanded from the Balkans out of a pre-existing E-M78 ancestor, almost completely swamping that E-M78 population. When exactly E-M78 arrived in the Balkans, it is difficult to say, since Y-STR variance takes us only as far back as the MRCA who lived in the Bronze Age.

In any case, this paper adds important new data on Balkan Y-chromosomes, although it is unfortunately marred by facile Y-chromosome/archaeological correlations, and the use of the inappropriate evolutionary mutation rate.

At present, I see no reason to change my theory on the expansion of E-V13. The finding that E-V13 is less diverse in Anatolia and the Middle East further reinforces the idea of the Balkan origin of that expansion, while an estimated age in the 2nd millennium BC is consistent with the birth of the Greek world.

The authors write:
Interestingly, J-DYS445-6 and J-M92 (a sub-lineage of M67), both have expansion times between 7000 and 8000 years ago
Converted into non-"evolutionary" ages, these are again consistent with the expansion of the Greek world. J-M92 was correctly associated with the expansion of the Greek world by Di Giacomo et al. (2004) "Y chromosomal haplogroup J as a signature of the post-neolithic colonization of Europe", who did not fall into the evolutionary mutation rate trap.

Also of interest is the discovery of an extremely rare R1a*(xR1a1) in a single Macedonian Greek. Another instance of R1a*(xR1a1) was previously discovered in a Cretan Greek. R1a1 occurred in 16.3% of Greeks from Athens vs. 10.5% of Greeks from Macedonia, the opposite of what was observed by Semino et al. in 2000. R1a1 does not seem to have any clear geographical structure within Greece, which would be expected if it was of more recent introduction.

UPDATE: Having labeled E-V13 Mesolithic, the authors label haplogroups G and J2 as Neolithic. A most interesting observation (Table 1) is that haplogroups J2a-M410* and J2b-M12* have the maximum and second maximum Y-STR variance in the region, being much more diverse than other haplogroups supposedly representing pre-agricultural Europeans.

At least, such a finding should give pause to those who arrive at facile conclusions about ages of human migrations on the basis of Y-STR variance.

European Journal of Human Genetics doi:10.1038/ejhg.2008.249

Y-chromosomal evidence of the cultural diffusion of agriculture in southeast Europe

Vincenza Battaglia et al.

Abstract

The debate concerning the mechanisms underlying the prehistoric spread of farming to Southeast Europe is framed around the opposing roles of population movement and cultural diffusion. To investigate the possible involvement of local people during the transition of agriculture in the Balkans, we analysed patterns of Y-chromosome diversity in 1206 subjects from 17 population samples, mainly from Southeast Europe. Evidence from three Y-chromosome lineages, I-M423, E-V13 and J-M241, make it possible to distinguish between Holocene Mesolithic forager and subsequent Neolithic range expansions from the eastern Sahara and the Near East, respectively. In particular, whereas the Balkan microsatellite variation associated to J-M241 correlates with the Neolithic period, those related to E-V13 and I-M423 Balkan Y chromosomes are consistent with a late Mesolithic time frame. In addition, the low frequency and variance associated to I-M423 and E-V13 in Anatolia and the Middle East, support an European Mesolithic origin of these two clades. Thus, these Balkan Mesolithic foragers with their own autochthonous genetic signatures, were destined to become the earliest to adopt farming, when it was subsequently introduced by a cadre of migrating farmers from the Near East. These initial local converted farmers became the principal agents spreading this economy using maritime leapfrog colonization strategies in the Adriatic and transmitting the Neolithic cultural package to other adjacent Mesolithic populations. The ensuing range expansions of E-V13 and I-M423 parallel in space and time the diffusion of Neolithic Impressed Ware, thereby supporting a case of cultural diffusion using genetic evidence.

Link

September 14, 2008

Y chromosomes of Bayash Romani

Once again, the 0.00069/locus/generation rate is used in this paper, and hence its estimated ages are wrong. The given Y-STR variance for haplogroup H1a in Table 2 is 0.06, which corresponds to an age of ~800 years.

It's interesting though, that Zhivotovsky is a co-author of this paper which states that:
A recent refinement of E1b1b1a-M78 by novel biallelic markers indicates that its subhaplogroup E1b1b1a2-V13 is the most common in Europe (Cruciani et al., 2007). In fact, E1b1b1a2-V13 originated in Western Asia about 11 KYA and expanded in Southeastern Europe about 4.5 KYA, not in connection with the spread of agriculture as traditionally assumed, but rather at the beginning of the Balkan Bronze age, as a consequence of the in situ population increase in the already populated territory (Cruciani et al., 2007).
and he was a co-author of King et al. (2008) which stated that:
The calculated expansion time of haplogroup E3b1a2-V13 in mainland Greece is 8,600 y BP at Nea Nikomedeia and 9,200 y BP at Lerna/Franchthi Cave and is consistent with the late Mesolithic/initial Neolithic horizon. These dates exceed those reported previously for Europe (Cruciani et al., 2007) that date to the Bronze Age. This discrepancy arises mainly because of differences in the choice of mutation rate used.
Peter Underhill was also a co-author of the latter study, and also of the recent paper on Sicily which used germline mutation rates and:
The estimate of Time to Most Recent Common Ancestor is about 2380 years before present, which broadly agrees with the archaeological traces of the Greek classic era.
Mesolithic - Early Bronze Age - classical Greek. Three completely different ages using three different mutation rates: a mutation rate 3.6x slower than the germline rate => Mesolithic. A mutation rate 2.4 to 2.8x slower => Early Bronze Age. A germline mutation rate => classical Greek.

My most recent take. I'll be much surprised if E-V13 turns out to be anything other than 2nd millennium BC in the Balkans.

American Journal of Physical Anthropology doi: 10.1002/ajpa.20933

Dissecting the molecular architecture and origin of Bayash Romani patrilineages: Genetic influences from South-Asia and the Balkans

Irena Martinović Klarić et al.

Abstract

The Bayash are a branch of Romanian speaking Roma living dispersedly in Central, Eastern, and Southeastern Europe. To better understand the molecular architecture and origin of the Croatian Bayash paternal gene pool, 151 Bayash Y chromosomes were analyzed for 16 SNPs and 17 STRs and compared with European Romani and non-Romani majority populations from Europe, Turkey, and South Asia. Two main layers of Bayash paternal gene pool were identified: ancestral (Indian) and recent (European). The reduced diversity and expansion signals of H1a patrilineages imply descent from closely related paternal ancestors who could have settled in the Indian subcontinent, possibly as early as between the eighth and tenth centuries AD. The recent layer of the Bayash paternal pool is dominated by a specific subset of E1b1b1a lineages that are not found in the Balkan majority populations. At least two private mutational events occurred in the Bayash during their migrations from the southern Balkans toward Romania. Additional admixture, evident in the low frequencies of typical European haplogroups, J2, R1a, I1, R1b1b2, G, and I2a, took place primarily during the early Bayash settlement in the Balkans and the Romani bondage in Romania. Our results indicate two phenomena in the Bayash and analyzed Roma: a significant preservation of ancestral H1a haplotypes as a result of considerable, but variable level of endogamy and isolation and differential distribution of less frequent, but typical European lineages due to different patterns of the early demographic history in Europe marked by differential admixture and genetic drift.

Link

August 06, 2008

Sicilian Y-chromosomes: Greek and North African influences

In retrospect, posting my E-V13/Ancient Greek colonization theory a week before the appearance of this article was a very timely move. I was pondering whether I should wait or post it; I'm glad I did not wait.

And here is the money shot:
The mutation rate used is the average of rates taken from Gusmao et al27 for DYS460 and from the Y Chromosome Haplotype Reference Database (YHRD, http://www.yhrd.org) for the other microsatellites.
I feel slightly vindicated given my recent interest in Y-STR mutation rates.

Also of interest:
Haplogroup R1b1c-M269, the most frequent Y-chromosome Hg in Europeans, is differentially distributed among eastern (18.4%) and western (30.3%) areas of Sicily. ... E3b1a-M78, G2-P15 and J2-M172 show frequencies (0.22, 0.32,0.33), respectively. E3b1a2-V13 is present in both WSI (6.5%) and ESI (5.3%), whereas G2-P15 and J2-M172 are non-randomly distributed, occurring at higher frequencies in the eastern areas of the island ... Furthermore Q-P36- or M242-derived chromosomes also detected significant similarities between Sicily (2.54%) and Lebanese populations (1.53%).
The G2 frequency looks like a typo to me. It's listed as 4.1% (West) and 7.02% (East). J-M241 is more frequent in the West (7.38%) than in the East (1.75%). The paragroup J2*(xM67, J2a1k) is more frequent in the East (14.91%) than the West (6.55%). The overall haplogroup I breakdown is 5.08% I-M253, 1.27% I-M26, 0.42% I-M223, and 0.85% I*.


European Journal of Human Genetics advance online publication 6 August 2008; doi: 10.1038/ejhg.2008.120

Differential Greek and northern African migrations to Sicily are supported by genetic evidence from the Y chromosome

Cornelia Di Gaetano et al.

Abstract

The presence or absence of genetic heterogeneity in Sicily has long been debated. Through the analysis of the variation of Y-chromosome lineages, using the combination of haplogroups and short tandem repeats from several areas of Sicily, we show that traces of genetic flows occurred in the island, due to ancient Greek colonization and to northern African contributions, are still visible on the basis of the distribution of some lineages. The genetic contribution of Greek chromosomes to the Sicilian gene pool is estimated to be about 37% whereas the contribution of North African populations is estimated to be around 6%.

In particular, the presence of a modal haplotype coming from the southern Balkan Peninsula and of its one-step derivates associated to E3b1a2-V13, supports a common genetic heritage between Sicilians and Greeks. The estimate of Time to Most Recent Common Ancestor is about 2380 years before present, which broadly agrees with the archaeological traces of the Greek classic era. The Eastern and Western part of Sicily appear to be significantly different by the chi2-analysis, although the extent of such differentiation is not very high according to an analysis of molecular variance. The presence of a high number of different haplogroups in the island makes its gene diversity to reach about 0.9. The general heterogeneous composition of haplogroups in our Sicilian data is similar to the patterns observed in other major islands of the Mediterranean, reflecting the complex histories of settlements in Sicily.

Link

July 31, 2008

Expansion of E-V13 explained

E-V13 is the main European clade of haplogroup E. It has been variously interpreted as a signature of early Balkan Bronze Age, or Mesolithic, the Greek colonization of Southern Italy, Greek ancestry in some Pakistanis, or Roman soldiers of Balkan origin in Britain. A proper understanding of its age would help resolve the problem of its origins.

Age, of course, depends on a proper choice of mutation rate, and as I have argued (part I and part II), the proper effective mutation rate is near the germline rate and not 3.6x slower as argued by Zhivotovsky, Underhill, and Feldman (2006). This is especially true for a relatively young haplogroup (very low STR variance compared to other lineages), which is also quite frequent in its area of origin, while much reduced away from it, giving a definite impression of a sudden and relatively recent expansion.

In my previous post, I estimated a Late Bronze Age for E-V13 in Greece and areas affected by historical Greek colonization. I now used Ken Nordtvedt's Generations2 program to obtain estimates of the age of E-V13 in three different datasets: the King set, 12-marker data from the E-M35 Phylogeny Project (Haplozone), as well as E-M78 data -most of which should be E-V13- from Bosch et al. (2006). In the latter set, I used two marker sets: all 12 markers common between Generations2 and Bosch, as well as 8 markers common between them, but excluding markers after DYS392 (in the Generations2/FTDNA order).

| Dataset | N | Generations | Age (25y/gen) | Age (30y/gen) |
|---|---|---|---|---|
| Nea Nikomedeia | 8 | 149 | 1725 BC | 2470 BC |
| Sesklo/Dimini | 20 | 71 | 225 AD | 130 BC |
| Lerna Franchthi | 20 | 120 | 1000 BC | 1600 BC |
| Crete | 13 | 68 | 300 AD | 40 BC |
| Haplozone | 103 | 134 | 1350 BC | 2020 BC |
| Aromuns (12) | 32 | 71 | 225 AD | 130 BC |
| Aromuns (8) | 32 | 73 | 175 AD | 190 BC |
| Slavomacedonians (12) | 13 | 51 | 725 AD | 470 AD |
| Slavomacedonians (8) | 13 | 59 | 525 AD | 230 AD |
| Albanians (12) | 9 | 70 | 250 AD | 100 BC |
| Albanians (8) | 9 | 59 | 525 AD | 230 AD |

Both the King et al. E-V13 data, as well as the diverse, mostly European Haplozone E-V13 agree in placing the expansion of this haplogroup squarely in the Aegean Bronze Age.

Aromuns (Vlachs) coalesce to the Roman era, consistent with the idea that they are Balkan natives who became Latinized linguistically at around that era.

Albanians also coalesce to Roman/Late Antique times, consistent with the idea that their high frequency of haplogroup E-V13 (which reaches very high numbers in e.g. Kosovars) is not associated with high diversity. Founder effects in that time frame are the reason for the high frequency of E-V13 in them.

Finally, Slavomacedonians from the former Yugoslav Republic of Macedonia coalesce well into AD times, at around the time of the first Slavic arrivals in the Balkans. This suggests that E-V13 in them is the result of local founders at around that time who adopted the Slavic language. However, Pericic et al. (2005) (see below) report high (but unspecified) diversity of E-M78α in "Macedonia", so it is possible that a larger number of earlier inhabitants were absorbed.

Pericic et al. (2005) give a 7.3kya estimate for the expansion of E-M78α (almost perfectly equivalent to E-V13) for Southeastern European populations north of Greece. Due to their use of the 3.6x slower mutation rate, this figure needs to be converted to equivalent years. The Nea Nikomedeia time depth was estimated as 9.2kya by King et al. Therefore, the equivalent age for the Pericic et al. (2005) expansion is (7.3/9.2) * 149 generations or 118 generations (1,540-950BC). They note that STR variance is higher in Greece, Macedonia, and Apulia, all areas with well-known historical Greek connections.
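The conversion above can be reproduced in a few lines. This is only an illustrative sketch: the function names are hypothetical, and the reference year of 2000 is an assumption inferred from the arithmetic of the quoted dates (118 generations at 30 years each giving 1,540 BC).

```python
def rescale_generations(published_kya, reference_kya, reference_generations):
    """Scale a reference generation count by the ratio of two ages
    that were both computed with the same (3.6x slower) mutation rate."""
    return published_kya / reference_kya * reference_generations


def calendar_year(generations, years_per_generation, reference_year=2000):
    """Calendar year of the ancestor; negative values are years BC.
    reference_year=2000 is an assumption, not stated in the post."""
    return reference_year - generations * years_per_generation


# Pericic et al. (2005): 7.3 kya. King et al. (2008), Nea Nikomedeia: 9.2 kya,
# which corresponds to 149 generations in the Generations2 estimate.
generations = round(rescale_generations(7.3, 9.2, 149))
print(generations)                     # 118
print(calendar_year(generations, 30))  # -1540, i.e. 1,540 BC
print(calendar_year(generations, 25))  # -950, i.e. 950 BC
```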

Cruciani et al. (2007) propose that E-V13 arrived in Europe from West Asia and underwent an expansion in Europe at 4-4.7 kya. This age is calculated using effective mutation rates that are 2.4 or 2.8 slower than the germline rate, which seems to suggest a Late Bronze Age or even later expansion with a rate closer to the germline one.

In the Balkans, it is fairly clear that E-V13 is mostly concentrated south of the Jirecek Line which separated native Greek from Latin speakers. In Italy, the highest frequencies are found in the south, the areas of historical Greek colonization. High frequencies are also attained in Cyprus. Cyprus also has high STR diversity, consistent with an early arrival, suggestive of both early Mycenaean and later colonizations from the Aegean.

Conclusion

The age and distribution of E-V13 chromosomes suggest that expansions of the Greek world in the Bronze and later ages were the major causes of its diffusion.

Who was the E-V13 patriarch in Greece? He was perhaps one of the legendary figures of Greek mythology some of whom are said to have come from abroad. For whatever reason, his progeny grew, and were around to participate in the expansion of the Mycenaean world and the subsequent Greek colonization.

UPDATE (Aug. 1):

An additional piece of evidence is Y-chromosome distribution in Calabria, a Southern Italian region with well-known Greek connections. According to Semino et al. (2004) [Am. J. Hum. Genet. 74:1023–1034, 2004], the Calabrian sample has an E-M78 frequency of 16.3%, whereas "Calabria 2" representing the "Albanian community of the Cosenza province" has only 5.9%. This is consistent with the idea that E-V13 in modern Albanians is to a great degree due to Greek founders (Epirotes or ancient colonists).

July 21, 2008

How Y-STR variance accumulates: a comment on Zhivotovsky, Underhill and Feldman (2006)

An important erratum for this post.

Additions to this entry at the bottom (last update July 29)


In recent years, in most population genetics papers, an evolutionary mutation rate for Y chromosome microsatellites (STRs) of 0.00069/locus/generation has been used. This rate was proposed by Zhivotovsky et al. (2004) (pdf), and defended in Zhivotovsky et al. (2005), and especially Zhivotovsky, Underhill and Feldman (2006) (henceforth Z.U.F.)

This mutation rate is smaller than the observed germline mutation rate by a factor of 3-4. The germline mutation rate is observed by counting mutations directly, e.g., in father-son pairs, or in known pedigrees. Zhivotovsky et al. have provided two pieces of evidence in favor of their evolutionary rate:
  • Study of accumulation of STR variation in populations with known founding events, namely Bulgarian Roma and Maori, in their 2004 paper.
  • Simulations indicating a 3.6x discrepancy between the two rates in their 2006 paper, which is due to multiple bottlenecks in a haplogroup's history.
I was always apprehensive about what the "right" mutation rate should be:
We need to obtain good estimates of the mutation rate in order to pinpoint in time the common ancestor of a set of Y chromosomes. A factor of 3, especially for relatively recent events may correspond to a difference between early historical and late Paleolithic events.
Thus, I decided to look into the matter myself to be convinced -one way or another- of what the evolutionary mutation rate must be.

Methodology

The following assumptions, following Z.U.F. are made:
  • A man has 0, 1, 2, ... sons according to a Poisson distribution with mean m=1.
  • A step mutation (increase or decrease by 1 repeat) occurs with a mutation rate of µ=0.0025 (see note 1).
  • STR variance of the man's descendants is measured after g generations.
Results are averaged over N men who have descendants after g generations. I will call such men "Patriarchs". Thus, I generate random family trees for men until I have harvested N=10,000 of them who have living descendants today (see note 2).
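The assumptions above can be sketched as a small forward simulation. This is a minimal illustration and not the code actually used for the results in this post: it tracks a single STR locus, all function names are hypothetical, and the Poisson sampler is a simple textbook method that is adequate only for small means.

```python
import math
import random

MU = 0.0025  # step-mutation rate per locus per generation (value from the text)


def poisson(rng, mean):
    """Draw a Poisson variate by multiplying uniforms (fine for small means)."""
    limit = math.exp(-mean)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def simulate_patriarch(rng, generations, mean_sons=1.0, mu=MU):
    """Return the repeat offsets of the men alive after `generations`,
    or an empty list if the Patriarch's line died out."""
    alleles = [0]  # offset from the Patriarch's own repeat count
    for _ in range(generations):
        next_gen = []
        for allele in alleles:
            for _ in range(poisson(rng, mean_sons)):
                child = allele
                if rng.random() < mu:  # single-step mutation, up or down
                    child += rng.choice((-1, 1))
                next_gen.append(child)
        if not next_gen:
            return []
        alleles = next_gen
    return alleles


def variance(values):
    """Population variance of the repeat counts."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def harvest(n_patriarchs, generations, seed=0):
    """Keep generating family trees until n_patriarchs of them survive."""
    rng = random.Random(seed)
    groups = []
    while len(groups) < n_patriarchs:
        descendants = simulate_patriarch(rng, generations)
        if descendants:
            groups.append(descendants)
    return groups


# Small demonstration run; the text uses N=10,000 and g=100, which is much slower.
groups = harvest(n_patriarchs=50, generations=20, seed=1)
mean_variance = sum(variance(g) for g in groups) / len(groups)
print(len(groups), mean_variance)
```

Harvesting until N survivors are found, rather than starting from a fixed number of men, mirrors the choice described in note 2.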

Patriarch vs. MRCA

A consequence of the time-forward methodology of simulation, is that a Patriarch may not be the Most Recent Common Ancestor (MRCA) of his descendants g generations into the future. Trivially, if a Patriarch has only one son, then, that son -not the Patriarch- is the MRCA of his descendants. But, even if the Patriarch has many sons, and his group of descendants grows, it is possible (due to randomness of the fathering process) that at some generation only 1 descendant will survive.

Suppose that the Patriarch has lived in generation 0, and the MRCA lived in generation i. Thus, STR variance in the descendants at generation g (today) has accumulated over a time span of g-i generations, since, of course, at the generation i (of the MRCA), STR variance is zero.

Now, if we use a time-forward methodology from known foundation events (e.g. the arrival of the Roma in Bulgaria, or the Maori in New Zealand), it is perfectly right to see how STR variance accumulates from the known foundational event. We would then divide the accumulated STR variance by the known time span to determine an effective evolutionary mutation rate, similar to Zhivotovsky et al. (2004).

But, when the foundational event is unknown, when we are trying to estimate its age, then we can only go as far back as the MRCA, since at his time variance is zero. Therefore, by dividing accumulated variance by the evolutionary mutation rate of Z.U.F., we are over-estimating the time to the MRCA.

For example, with g=100, the average STR variance for the descendants of N=10,000 Patriarchs is 0.0755. But, if we average only those Patriarchs who are also the MRCA of their descendants, we obtain a value of 0.0824, or about 9% higher.
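One way to separate the two kinds of Patriarchs in a simulation is to note that a Patriarch is the MRCA of his living descendants exactly when at least two of his sons have descendants alive today. The sketch below illustrates that test only; it is an assumption about how the split could be implemented, with hypothetical names, and it ignores STR mutation entirely.

```python
import math
import random


def poisson(rng, mean):
    """Draw a Poisson variate by multiplying uniforms (fine for small means)."""
    limit = math.exp(-mean)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def patriarch_is_mrca(rng, generations, mean_sons=1.0):
    """Return None if the line dies out; otherwise True when the living
    descendants trace back through two or more of the Patriarch's sons."""
    # Generation 1: label every son with his own index.
    lineages = list(range(poisson(rng, mean_sons)))
    for _ in range(generations - 1):
        # Each man passes his founding-son label on to all of his sons.
        lineages = [son for son in lineages for _ in range(poisson(rng, mean_sons))]
        if not lineages:
            return None
    if not lineages:
        return None
    return len(set(lineages)) >= 2


rng = random.Random(42)
results = [patriarch_is_mrca(rng, generations=30) for _ in range(5000)]
survivors = [r for r in results if r is not None]
if survivors:
    print(len(survivors), sum(survivors) / len(survivors))
```

Averaging variance only over the Patriarchs for which this test returns True gives the MRCA-only figure.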

In general, the over-estimate (as a percentage) decreases as g increases: as g increases, the average number of descendants of a Patriarch increases, making them much less susceptible to a variance-reset type of bottleneck described here.

Thus, while the age difference between the MRCA and the Patriarch is real, its effect in the age estimate is not very pronounced. There is, however, a second, and much more serious problem, with the Z.U.F. rates when applied to evolutionary studies.

Prolific vs. Non-Prolific Patriarchs: an Observation Selection effect

Patriarchs starting at generation 0 will have a very variable number of descendants at generation g. By averaging over all of them, we are estimating the average STR variance in the descendants of men who lived g generations ago.

Now, consider how this average changes if we average only over the k most "prolific" men (with the most descendants) out of all the N=10,000 Patriarchs:


| k | Average Variance |
|---|---|
| 100 | 0.1721 |
| 1000 | 0.1407 |
| 2500 | 0.1219 |
| 5000 | 0.1033 |
| 10000 | 0.0755 |


It is clear that the STR variance in the descendants of the most "prolific" Patriarchs is much higher than in the descendants of the least "prolific" ones. In fact, for the most prolific Patriarchs, variance accumulates near the germline mutation rate, and not at the lower evolutionary effective rate.
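To make the comparison with the germline rate explicit, each average variance can be divided by µg, the variance expected if mutations accumulated at the full germline rate for g generations. A minimal sketch, assuming g=100 and µ=0.0025 as in the experiment above (the helper name is hypothetical):

```python
MU = 0.0025        # germline rate used in the simulation
GENERATIONS = 100  # g for this experiment

# k most prolific Patriarchs -> average variance (values from the table above)
average_variance = {100: 0.1721, 1000: 0.1407, 2500: 0.1219, 5000: 0.1033, 10000: 0.0755}


def effective_rate_fraction(variance, generations=GENERATIONS, mu=MU):
    """Effective mutation rate expressed as a fraction of the germline rate."""
    return variance / (generations * mu)


for k, v in average_variance.items():
    print(k, round(effective_rate_fraction(v), 2))
# first line: 100 0.69, last line: 10000 0.3
```

The last figure, about 0.3µ, is in the range of the 3-4x slower evolutionary rate, while the first approaches the germline rate.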

Below is the cumulative percentage of the descendants of the k most prolific Patriarchs, with k from 1 to N.

It can be seen that e.g., from the most prolific half of the Patriarchs stems 84% of the descendants. And this, assuming no social inequality in the number of progeny, i.e. each man having the exact same average probability (m=1) of fathering a son. Thus, in reality, the more prolific Patriarchs may have an even larger fraction of the descendants.

Why is this important? Because, in population studies, scientists are likely to observe (in the finite samples they collect) multiple descendants only of the most prolific of the Patriarchs. Thus, for the vast majority of the Patriarchs with few descendants, we are likely to sample no, or few of their descendants.

This means that there is an inherent observation selection effect in the types of Patriarchs we are likely to study: they are much more likely to be among the prolific ones. Coupling this observation with the knowledge that STR variance in the descendants of prolific Patriarchs accumulates near the germline mutation rate (0.69µ for the 100 most prolific ones in my experiment), we, once again, conclude that the STR variance in haplogroups likely to be made the object of scientific study accumulates near the germline mutation rate, and at the very least, faster than the evolutionary rate of Z.U.F.

Closing Remarks

Z.U.F. have also proposed two additional demographic scenarios under which a higher effective mutation rate would be observed:
  • A sudden jump in the size of the haplogroup after it appears
  • An expanding population (m>1)
Both factors seem reasonable for post-Holocene human populations. It is well known that -whatever temporary setbacks there were- mankind has overall experienced a substantial population growth in recent millennia. Thus, an expanding population seems like a fair assumption.

Moreover, it is reasonable to assume that in stratified human societies, a few males, (leaders, or conquerors), or groups of closely related males may have generated a disproportionate number of descendants in the short-term.

In summary:
  • The age difference between the Patriarch and the MRCA indicates that Variance/0.00069 overestimates the age of the MRCA somewhat (but not very much).
  • A prolific Patriarch's descendants are more likely to be sampled by scientists, and tend to have a higher STR variance. Hence, Variance/0.00069 overestimates the age of the MRCA, perhaps substantially.
  • Demographic factors, such as population growth, or short-term success by related males indicates that Variance/0.00069 overestimates the age of the MRCA.
In view of the above, and keeping in mind both the stochastic factors that cause STR variance to fluctuate around its expected value, as well as uncertainties in demographic history, I do believe that ages calculated with the evolutionary mutation rate of 0.00069/locus/generation are significantly overestimated.

1 Z.U.F. used a germline mutation rate of µ=0.001. For the purposes of simulation, this is not an important difference, as they themselves note. I choose the rate of 0.0025 because it is closer to the actual human germline mutation rate for STRs.
2 Z.U.F. generated 50,000 men and then averaged over the men who had descendants. I, on the other hand, generate as many men as it takes to harvest at least N men with descendants, to ensure that I average a substantially large number of such men.

Editorial change (Jul 22): erroneously written "exceeds", in paragraph 2, changed to "is smaller than".

Update (July 23):

To further elucidate how the observation selection effect may make lineages seem older than they really are, I carried out another small experiment (g=110, N=10,000, m=1).

The age of each group is inferred by dividing the accumulated variance by the evolutionary rate of 0.0006944 (=μ/3.6).

The average variance over all N in this experiment is 0.0867, thus, the average inferred age is 125 generations, close to the truth (110 generations), allowing for the correction in age between the Patriarch and the TMRCA.

However, if we calculated the average variance over ten groups of 1,000 lineages (out of all N=10,000) according to the number of descendants, we see, as described above, that more "prolific" lineages have accumulated more variance, whereas less "prolific" ones have accumulated less variance than the overall average of 0.0867.

Thus, over the 10% most populous lineages (right of the figure), the average inferred age is 209 generations, or a 90% overestimate of the true age!

But, as I mentioned, it is precisely these populous lineages (which don't just have "some" descendants today, but thousands and millions of them) that are likely to be studied, because they are the only ones that have enough representatives in a sample of 100-1,000 men, typically seen in a population study, to allow for an age estimate via a variance calculation.

Update (July 24): Haplogroup sizes

The number of a Patriarch's descendants after g generations is a random variable which depends on the parameters m (the population growth constant), and g, the number of generations.

Scientists typically look at haplogroups with thousands or millions of existing members. Are such haplogroups produced in the types of simulations performed by Z.U.F.?

I estimate the average size of the haplogroups produced by Z.U.F. for different g=10,20,...,700 and m=1.

It is evident that this number increases linearly with g at a rate estimated to be 0.5/generation [This was also noted by Z.U.F. who state: "the average size of the surviving haplogroups increased each generation by a value rapidly approaching 0.5"] However, this means, that the average haplogroup at 700 generations has a size of ~350 men.
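The 0.5/generation slope can also be checked without random simulation. The sketch below uses the standard branching-process identity that the mean size of surviving lines equals m^g divided by the survival probability, with the survival probability obtained by iterating the Poisson generating function. It is an independent check under the Poisson assumption stated earlier, not the procedure used for the figure, and the function names are hypothetical.

```python
import math


def survival_probability(generations, mean_sons=1.0):
    """Probability that a line with Poisson(mean_sons) sons per man
    still has living members after `generations`."""
    extinct = 0.0  # probability of being extinct at generation 0
    for _ in range(generations):
        # Poisson probability generating function: f(s) = exp(mean * (s - 1)).
        extinct = math.exp(mean_sons * (extinct - 1.0))
    return 1.0 - extinct


def mean_surviving_size(generations, mean_sons=1.0):
    """Average haplogroup size among lines that survive."""
    # E[size] = mean_sons**g over all lines, so E[size | alive] = mean_sons**g / P(alive).
    return mean_sons ** generations / survival_probability(generations, mean_sons)


size_700 = mean_surviving_size(700)
slope = (mean_surviving_size(700) - mean_surviving_size(690)) / 10
print(size_700, slope)  # roughly 350 men, growing by about 0.5 per generation
```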

Thus, not only is the average variance estimated by Z.U.F. inappropriate because of an observation selection effect (averaging over small and large haplogroups alike), but it seems to miss the relevant observations altogether, i.e. the really large haplogroups numbering in the hundreds of thousands or millions. Yet it is precisely for such large haplogroups that it has often been used in the literature.

How can we produce "realistic" haplogroup sizes, close to those likely to become an object of scientific study in contemporary human populations? We can either:
  • increase the number of initial representatives, i.e. start with many related men with identical Y chromosomes rather than just 1, or we can
  • increase the population growth constant m to something higher than 1, i.e. a growing population.
Yet, both these changes have the same effect, namely the accumulation of variance at a higher rate than the Z.U.F. rate.

Indeed, Z.U.F. produce some such large haplogroups in some of their simulations (Fig. 1 asterisks, Fig. 2 squares/diamonds), all of which show -predictably- a higher effective rate than their 3.6x slower rate.

They caution against such large haplogroup sizes ["population size exceeds 1 million by generation 1000, which is not realistic for many local tribes."]. Granted, -- if one looks at local tribes never growing to large numbers.

And yet, some or all of the co-authors of Z.U.F. did not limit their use of the 3.6x slower rate to local tribes: Cinnioglu et al. 2004 (pdf), Sengupta et al. (2006), King et al. (2008) all apply the 0.00069 rate for populations (and haplogroups) that have grown to much more than 1 million in less time, thus overestimating severely their age.

Update (July 24): Variance of a large haplogroup

Following the previous observations, naturally, I wanted to see for myself what the STR variance of an ancient lineage with a large number of modern descendants actually looks like. My target size is 1,000,000, which is about 20% of modern Greek males.

I consider two cases:
  • Expansion commencing in the Late Bronze Age (g=120 or 1,600BC with a generation length of 30)
  • Expansion commencing in the early Neolithic (g=300 or 7,000BC)

I harvest N=1,000 haplogroups for each of these cases. I set the growth constant at m=1.100694 for the Bronze Age, and m=1.039122 for the Neolithic. This ensures that enough "large" haplogroups will be generated during simulation. Naturally, the overall population grows at a smaller rate, but the successful lineages will grow much faster than the population average.
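As a sanity check on these growth constants, the expected number of descendants per founder after g generations is m^g when extinct lines are counted as zero. A minimal sketch with a hypothetical helper name:

```python
def mean_descendants(m, generations):
    """Expected number of male-line descendants per founder after
    `generations`, averaged over all lines including extinct ones."""
    return m ** generations


bronze_age = mean_descendants(1.100694, 120)
neolithic = mean_descendants(1.039122, 300)
print(round(bronze_age), round(neolithic))  # both close to 100,000
```

Both constants match 100000^(1/g) to six decimal places. Since most lines die out, the lines that do survive are larger than this overall average.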

Note that I harvest only haplogroups whose MRCA lived in the specified time span. Also, I harvest haplogroups whose final size is between 750,000 and 1,250,000 to match my target size of 1,000,000. Indeed, the average size of the harvested haplogroups is 964,327 for the Bronze Age, and 979,693 for the Neolithic.

Here are the results:
  • ~1 million descendants of a Bronze Age (120 generations ago) ancestor have an STR variance of 0.269 +/- 0.087
  • ~1 million descendants of a Neolithic (300 generations ago) ancestor have an STR variance of 0.629 +/- 0.156
If we used the germline mutation rate (μ=0.0025) we would estimate the ages of these haplogroups as:
  • Bronze Age: 107.6 generations, or a 10% underestimate
  • Neolithic: 251.6 generations, or a 16% underestimate
On the other hand, if we used the evolutionary rate of 0.00069 of Z.U.F., our estimates would be:
  • Bronze Age: 389.9 generations, or a 225% overestimate
  • Neolithic: 911.6 generations, or a 203% overestimate
It is clear that the Z.U.F. rate of 0.00069 substantially overestimates the ages of large recent haplogroups, whereas the germline rate underestimates them by a little.
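The arithmetic behind these figures is simply variance divided by the assumed rate, compared against the true age. A minimal sketch (the function names are hypothetical; the variances and rates are the ones quoted above):

```python
GERMLINE_RATE = 0.0025
ZUF_RATE = 0.00069


def inferred_generations(variance, rate):
    """Age in generations implied by an STR variance at a given rate."""
    return variance / rate


def percent_error(estimate, truth):
    """Positive values are overestimates, negative values underestimates."""
    return 100.0 * (estimate - truth) / truth


# label -> (mean simulated variance, true age in generations)
cases = {"Bronze Age": (0.269, 120), "Neolithic": (0.629, 300)}

for label, (var, true_age) in cases.items():
    for name, rate in (("germline", GERMLINE_RATE), ("Z.U.F.", ZUF_RATE)):
        estimate = inferred_generations(var, rate)
        print(label, name, round(estimate, 1), round(percent_error(estimate, true_age), 1))
```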

Let's look at some concrete examples of age estimates in the literature, where I compare my own (first) estimates with the published ones. Here is how my estimates are derived:

For a Bronze Age ancestor (g=120) it is: 0.269 =(approx) 0.9 μg

For a Neolithic ancestor (g=300) it is: 0.629 =(approx) 0.84 μg

Thus, the correction multiplier, if the variance is between 0.269 and 0.629 is between 0.84 and 0.9; I will use the midpoint 0.87. If the variance is less than 0.269, then I use 0.9. If the variance is more than 0.629 then I use 0.84. Of course, the correction factor could be expressed more accurately as a function of the variance.
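The rule just described can be written as a short function. This is a sketch of the stated thresholds only, with hypothetical names; it does not attempt the smoother variance-dependent correction mentioned above.

```python
MU = 0.0025  # germline mutation rate per locus per generation


def correction_multiplier(variance):
    """Piecewise multiplier calibrated on the two simulated cases."""
    if variance < 0.269:
        return 0.9
    if variance > 0.629:
        return 0.84
    return 0.87  # midpoint, used between the two calibration points


def corrected_generations(variance, mu=MU):
    """Age in generations from variance = multiplier * mu * g."""
    return variance / (correction_multiplier(variance) * mu)


generations = corrected_generations(0.18)
print(round(generations))       # 80
print(round(generations * 30))  # 2400 years at 30 years per generation
```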

Note that the generation length preferred by these authors is 25, by me it is 30. All ages are ky BC.

Cinnioglu et al. (2004)

In this paper, an evolutionary rate of 0.0007 is used.



| Haplogroup | Variance | Cinnioglu | Dienekes |
|---|---|---|---|
| E-M78 | 0.18 | 4.4 | 0.4 |
| G-P15 | 0.35 | 10.5 | 2.9 |
| I-P37 | 0.23 | 6.2 | 1.1 |
| J-M12 | 0.24 | 6.6 | 1.2 |
| J-M67 | 0.33 | 9.8 | 2.6 |
| R-M269 | 0.33 | 9.8 | 2.6 |

E-M78 is dated to 400BC, only a couple of centuries after the historical Greek colonization. E-M78 reaches its maximum in the Peloponnese, a major source of Greek colonists.

I-P37 and J-M12 are dated to 1,100BC and 1,200BC, at around the time that e.g. the Phrygians from the Balkans are believed to have migrated to Asia Minor. I-P37 and J-M12 reach their maxima in areas north of Greece where the Phrygians are said to have originated.

Sengupta et al. (2006)



| Haplogroup | Variance | Sengupta | Dienekes |
|---|---|---|---|
| J2-M410 | 0.38 | 11.7 | 3.3 |
| R-M17 | 0.39 | 12 | 3.4 |
| R-M17 (upper caste) | 0.26 | 7.3 | 1.5 |
| G-P15 | 0.29 | 8.5 | 2 |
| J-M241 | 0.38 | 11.8 | 3.3 |

Thus, all the exogenous West Asian lineages in India have post-Neolithic ages, with R-M17 having a suggestive age of 1,500BC coinciding with the suggested date for the Indo-Aryans.

King et al. (2008)



| Haplogroup (sample) | Variance | King | Dienekes |
|---|---|---|---|
| J-M12 (Nea Nikomedeia) | 0.18 | 4.7 | 0.4 |
| E-V13 (Sesklo/Dimini) | 0.24 | 6.6 | 1.2 |
| E-V13 (Lerna Franchthi) | 0.25 | 7.2 | 1.3 |
| J-M92 (Crete) | 0.14 | 3.1 | 0.1 AD |
| J-M319 (Crete) | 0.14 | 3.1 | 0.1 AD |
| E-V13 (Crete) | 0.09 | 1.1 | 0.8 AD |

These are very localized samples, so they should not be interpreted as reflecting expansion times in Greece itself, however, they do suggest a Bronze Age expansion of E-V13 and a much later arrival of E-V13 in Crete.

Note that for Crete, the 1,000,000-haplogroup size assumption is a substantial overestimate, so my age estimates are also substantial underestimates.

Update (July 25): R-M17 in South Siberia

Derenko et al. (2006) "Contrasting patterns of Y-chromosome variation in South Siberian populations from Baikal and Altai-Sayan regions" calculate the variance of R-M17 chromosomes in South Siberia, using the Z.U.F. rate, arriving at an age of 11.3kya corresponding to a value of 0.31. This corresponds to 2,300BC according to my estimate (see previous update).

Recently Bouakaze et al. (Int J Legal Med (2007) 121:493–499) reported the presence of R-M17 chromosomes in ancient inhabitants of South Siberia and the Andronovo culture (2,500BC-1,500BC).

The Andronovo culture is widely believed to be of Eastern European ultimate origin, reflecting the eastward movement of the Kurgan culture, and is associated by some with the ancestors of the Indo-Iranians.

In the Balkans, again in Z.U.F. years, the age of R-M17 is 15.8kya corresponding to variation of 0.44, corresponding to ~4,000BC according to my estimate.

Update (July 25): Baltic Y chromosomes

Lappalainen et al. (2008) use the Z.U.F. rate to estimate the antiquity of lineages in the Baltic region. Dates are ky BC.



| Haplogroup | Lappalainen | Dienekes |
|---|---|---|
| I1a | 5.7 | 1 |
| N3 | 6.8 | 1.5 |
| R1a1 | 8.7 | 1.9 |

1,000BC for I1a in the Baltic region is within the time frame of the emergence of the Germanic people who did experience a strong demographic growth.
1,500BC for N3 shows a rather late time for Finno-Ugrians. However, it must be noted that smaller demographic sizes would impose more drift, and hence a slower accumulation of variance. Therefore, this time is probably underestimated.
1,900BC for R1a1 is consistent with the northern edge of the expansion of R1a1. Once again, reduced variance may also be influenced by smaller population numbers, making this a possible underestimate.

Update (July 25): Southeastern Europe (the Balkans)

Pericic et al. (2005) use the Z.U.F. rate to estimate ages of Y-chromosome lineages in the Balkans. Dates are ky BC.



| Haplogroup | Pericic | Dienekes |
|---|---|---|
| I1b* (xM26) | 8.1 | 2 |
| E3b1α | 5.3 | 0.9 |
| R-M17 | 13.8 | 3.8 |
| R-M269 | 9.6 | 2.3 |
| J-M241 (without Kosovars) | 1 | 0.8 AD |

Thus, Balkan haplogroup I seems related to a Bronze Age origin, with R-M17 being substantially older, and deriving perhaps from northern Balkan Neolithic or alternatively intrusive Kurgan populations. J-M241 seems to be quite young, similar to J-M12 in Nea Nikomedeia (see discussion of King et al. (2008) above).

The young ages of J-M12 and J-M241 also explain the striking inverse correlation between this lineage and J-M410, which makes sense if it expanded later. A fairly late expansion also explains its under-representation in Southern Italy and Anatolia: it appears to be a rather young and "Epirotic" clade that arrived too late to participate significantly in the historical Greek colonization.

Update (July 26): E3b in Cyprus and Southern Italy

Capelli et al. (2005) [Population Structure in the Mediterranean Basin: A Y Chromosome Perspective] study Y-chromosome variation in many Mediterranean populations including Cyprus. I use a mutation rate of 0.0018 for the six markers used in this study (Quintana-Murci et al. AJHG 68(2) pp. 537-542). Ages are in ky BC.

I come up with an age of 1.4ky BC for E3b in Cyprus, which is consistent with Mycenaean and later Greek settlements on the island.

I also looked at Southern Italian Y chromosomes. I removed those with values other than (13,12) in (DYS19, DYS388), since these values are universal in Greek E-V13, in order to remove possible contamination from non-E-V13 chromosomes. The resulting age is 900BC, once again very close to the historical Greek colonization of Magna Graecia.
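
The filtering step can be sketched roughly as follows. This is only an illustration under stated assumptions: the haplotypes are made-up placeholders rather than data from Capelli et al., the helper names are hypothetical, and variance is taken here as the per-locus population variance of repeat counts averaged over the chosen loci, which may differ from the exact calculation behind the estimates above.

```python
from statistics import pvariance

# Motif described above as universal in Greek E-V13.
E_V13_MOTIF = {"DYS19": 13, "DYS388": 12}


def filter_by_motif(haplotypes, motif):
    """Keep only haplotypes that match the motif at every motif locus."""
    return [
        h for h in haplotypes
        if all(h.get(locus) == value for locus, value in motif.items())
    ]


def mean_locus_variance(haplotypes, loci):
    """Average, over the given loci, of the population variance of repeat counts."""
    return sum(pvariance([h[locus] for h in haplotypes]) for locus in loci) / len(loci)


# Hypothetical sample for illustration only; NOT real data.
sample = [
    {"DYS19": 13, "DYS388": 12, "DYS390": 24, "DYS391": 10},
    {"DYS19": 13, "DYS388": 12, "DYS390": 25, "DYS391": 10},
    {"DYS19": 14, "DYS388": 12, "DYS390": 24, "DYS391": 10},  # dropped: DYS19 != 13
    {"DYS19": 13, "DYS388": 12, "DYS390": 24, "DYS391": 11},
]

kept = filter_by_motif(sample, E_V13_MOTIF)
variance = mean_locus_variance(kept, ["DYS390", "DYS391"])
print(len(kept), variance)
```

Note that the two motif loci have zero variance after filtering by construction, so whether they are included in the average changes the result; the sketch leaves that choice to the caller.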

Update (July 26): A more elaborate population growth model

Z.U.F. also propose (Fig. 2 triangles) a more elaborate population growth with:
  • m=1.002 before 400 generations
  • m=1.012 from 400 to 14 generations ago
  • m=1.12 from 14 to 8 generations ago
  • m=1.25 from 8 generations ago to current time
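
The schedule above can be written down as a small piecewise function. This is a sketch, not the simulation itself: the function names are hypothetical, and the handling of the boundary generations (400, 14, and 8 generations ago) is my assumption, since the list does not say which interval each boundary belongs to. The second function only multiplies the growth factors together to give the expected number of descendants of a single founder with no conditioning on lineage survival.

```python
def growth_rate(generations_ago: int) -> float:
    """Per-generation growth factor m for a generation lying
    `generations_ago` generations before the present."""
    if generations_ago > 400:
        return 1.002
    if generations_ago > 14:   # assumption: generation 400 uses m=1.012
        return 1.012
    if generations_ago > 8:    # assumption: generation 14 uses m=1.12
        return 1.12
    return 1.25                # assumption: generation 8 uses m=1.25


def expected_growth(total_generations: int = 1000) -> float:
    """Product of the growth factors from `total_generations` ago to the present."""
    factor = 1.0
    for generations_ago in range(total_generations, 0, -1):
        factor *= growth_rate(generations_ago)
    return factor


print(expected_growth(1000))
```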

I ran a simulation (g=1000, N=10,000) with this population growth model. The average size of the descent groups of the MRCAs is 692,982 men. Averaged over all of them, the variance is 1.37.
  • With the germline mutation rate, an estimate of 549 generations (45% underestimate)
  • With the Z.U.F. evolutionary rate, an estimate of 1,988 generations (99% overestimate)
If we limit ourselves to the 10, 1,000, and 5,000 most prolific MRCAs (out of the N=10,000), we obtain the following ages, respectively:
  • With the germline mutation rate: 776, 747, 668 generations
  • With the Z.U.F. evolutionary rate: 2,810, 2,707, 2,419 generations

Thus, one can estimate that STR variance since the time of the MRCA accumulates at a rate of ~0.75μ / generation.
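
The arithmetic behind these figures is simply variance divided by the assumed per-generation rate. The sketch below reproduces it under two assumptions: the germline rate of 0.0025 is inferred from the figures above (549 generations at a variance of 1.37) rather than stated in this update, and the function names are hypothetical. Because the variance is rounded to 1.37, the results come out at roughly 548 and 1,986 generations, slightly below the 549 and 1,988 quoted above.

```python
GERMLINE_RATE = 0.0025  # assumption: implied by 549 generations at variance 1.37
ZUF_RATE = 0.00069      # Z.U.F. evolutionary rate quoted in this post


def generations_from_variance(variance: float, rate: float) -> float:
    """Age estimate in generations, assuming variance grows linearly at `rate`."""
    return variance / rate


def accumulation_fraction(estimated_generations: float, true_generations: float) -> float:
    """Ratio of estimated to true age, i.e. the effective rate as a fraction of mu."""
    return estimated_generations / true_generations


print(generations_from_variance(1.37, GERMLINE_RATE))
print(generations_from_variance(1.37, ZUF_RATE))
# Germline-rate estimate for the 1,000 most prolific MRCAs vs. the true g=1000.
print(accumulation_fraction(747, 1000))
```

The last line gives 0.747, which is where the ~0.75μ figure comes from.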

And yet the 0.00069 rate has been used to date Paleolithic events, e.g., by Semino et al. (2004) [Am. J. Hum. Genet. 74:1023–1034, 2004], leading to systematic age overestimates.

Update (July 29)

My discussion is continued in Haplogroup sizes and observation selection effects (continued)