I will probably update this entry when I read the actual paper carefully.
Nonetheless, it seems to confirm that the influence of marker set choice on TMRCA estimates, which Tim Janzen reported and I highlighted, is a nuisance even for a relatively young haplogroup. It is also probably consistent with the idea that Y-STR based estimates are suspect because of deviations from the linear model.
UPDATE I (An epitaph for Y-STRs)
The paper could just as easily have been titled "An epitaph for Y-STRs". Of course, Y-STRs do carry information related to antiquity, and there are so many datasets collected by both academics and genealogy enthusiasts. Thus, they will continue to be used and analyzed for at least a few years more.
Nonetheless, the conclusion is inescapable that a very specific use of Y-STRs on modern populations, with the goal of discovering tight links with archaeological/historical events, is all but dead.
The reason is simple: as clocks, they suck. A bad clock is not useless: it gives you some information about time. Moreover, you can often use several to iron out the inaccuracy of any single one of them.
Unfortunately, better estimation through averaging of bad estimators works only in one case: when the estimators are unbiased.
An unbiased estimator has an expected value equal to what you are trying to estimate. For example, suppose that the true age of a founder is 100 generations. For various reasons, bad clocks may give you estimates different than 100: some more, some less.
But, if some of them tend to give you an estimate of around 50 generations, and some of them tend to give you estimates around 200 generations, then averaging them out tells you nothing, except what ratio of slow and fast clocks you used.
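To make the point concrete, here is a minimal simulation sketch. The means of 50 and 200 generations come from the example above; the noise level, the clock counts and the function name are my own illustrative assumptions, not anything taken from the paper.

```python
import random
import statistics

def simulate_mixed_average(n_slow, n_fast, slow_mean=200.0, fast_mean=50.0,
                           noise_sd=20.0, seed=0):
    """Average the estimates from a mix of 'slow' and 'fast' clocks.

    Each clock returns a noisy estimate centred on its own biased mean
    (200 for slow clocks, 50 for fast ones), not on the true age of 100.
    noise_sd is an arbitrary, hypothetical amount of per-clock noise.
    """
    rng = random.Random(seed)
    estimates = [rng.gauss(slow_mean, noise_sd) for _ in range(n_slow)]
    estimates += [rng.gauss(fast_mean, noise_sd) for _ in range(n_fast)]
    return statistics.mean(estimates)

# Same total number of clocks (15), three different slow/fast mixes.
for n_slow, n_fast in [(12, 3), (5, 10), (3, 12)]:
    avg = simulate_mixed_average(n_slow, n_fast)
    print(f"{n_slow} slow + {n_fast} fast clocks -> mean of {avg:.0f} generations")
```

The averages track the mix of clocks rather than the true age: adding more clocks of either kind shrinks the noise, but never the bias.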
Use more fast ones, and get a recent estimate; use more slow ones and get a more ancient one. Here is a figure from the paper, showing age estimates of sub-haplogroups R-S21 vs. R-S116:
The different codes are explained in the supplementary material, but notice the difference between 4.A (the 4 most "linear") and 4.C (the 4 least "linear"). Using a generation length of 31.5 years, these correspond to 8.3ky BP and 2.4ky BP, i.e., a >3-fold difference.
Using "all" 15 Y-STRs (15.all) leads to an age of 3.4ky BP, but the analysis of Busby et al. shows how misleading this is. Using all 15 Y-STRs is simply averaging out a set of bad clocks: the 3.4ky BP figure is not determined by the actual split between the two haplogroups, but is an artefact of the set of clocks used.
Here is Table 1 from the paper; notice the last column:
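As a quick check of the arithmetic, the snippet below converts the three quoted ages back into generations using the 31.5-year generation length, and computes the ratio between the two extremes. The generation counts are back-calculated from the rounded ky BP figures in this post, not read off the paper's figure, so treat them as approximate.

```python
GENERATION_YEARS = 31.5  # generation length used in the text

def ky_to_generations(age_ky):
    """Convert an age in thousands of years to generations."""
    return age_ky * 1000 / GENERATION_YEARS

# Ages quoted in the text for three marker sets.
ages_ky = {"4.A": 8.3, "4.C": 2.4, "15.all": 3.4}

for label, age_ky in ages_ky.items():
    print(f"{label}: {age_ky} ky BP is about {ky_to_generations(age_ky):.0f} generations")

# Ratio between the most "linear" and least "linear" 4-marker sets.
fold = ages_ky["4.A"] / ages_ky["4.C"]
print(f"4.A / 4.C = {fold:.2f}-fold")
```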
The last column is an estimate of the duration of linearity for a Y-STR. It is basically an estimate (in years) of the time span during which a Y-STR accumulates variance in a predictable (linear) manner, which can be calculated from a combination of the range of the Y-STR (the possible values it can take), and its mutation rate (how often it changes its value).
The basic idea is simple: a big room (great range) allows more freedom of movement before you hit one of the walls; a fly (high mutation rate) is more likely to hit a wall before a tortoise.
A Y-STR with a small range and a high mutation rate is hopeless because its propensity to change its value (high mutation rate) is checked by its smaller range.
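The room-and-walls picture can be sketched with a toy simulation. This is not the calculation used in the paper: it assumes a simple stepwise mutation model, independent lineages descending from one founder, and an exaggerated, hypothetical mutation rate of 0.01 per generation so that the effect shows up quickly. All names and parameters are illustrative.

```python
import random
import statistics

def simulate_variance(mu, half_range, generations, n_lineages=500, seed=1):
    """Variance in repeat count among lineages descending from one founder.

    Stepwise model: each generation a lineage mutates with probability mu,
    moving one repeat up or down. Steps that would leave the allowed range
    [-half_range, +half_range] are rejected (the "walls" of the room).
    Pass half_range=None for a marker with no walls.
    """
    rng = random.Random(seed)
    alleles = [0] * n_lineages  # offsets from the founder's repeat count
    for _ in range(generations):
        for i in range(n_lineages):
            if rng.random() < mu:
                new = alleles[i] + rng.choice((-1, 1))
                if half_range is None or abs(new) <= half_range:
                    alleles[i] = new
    return statistics.pvariance(alleles)

MU = 0.01  # hypothetical, exaggerated mutation rate per generation
for t in (100, 400, 1600):
    free = simulate_variance(MU, None, t)
    walled = simulate_variance(MU, 2, t)  # only 5 allowed repeat values
    print(f"t={t:5d}  no walls: {free:5.2f}  5-value range: {walled:5.2f}")
```

Without walls, the variance keeps growing roughly in proportion to time (about mu times t under this model). With only five allowed values it levels off near 2, so beyond that point the marker looks much younger than it is.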
Going back to the table, we see that many Y-STRs have linearity durations lower than the middle of the Bronze Age, and some of them much lower. This means that including these Y-STRs will tend to suppress age estimates to make them appear younger.
(to be continued)
UPDATE II: Lack of cline in Europe
The authors showed that the observed east-west clinality of Y-STR variance from Turkey to the Atlantic was spurious and there is no longer any longitudinal pattern of decreasing variation. I made exactly the same point in January 2010, when Balaresque et al. appeared:
Equally flawed is the inference that R1b1b2 is clinal (Figure 2A). Microsatellite variance is not significantly higher in Turkey than in Europe -- even if one makes the questionable assumption that modern Anatolian Turks are patrilineal descendants of Neolithic Anatolians. The significance of the regression line disappears if 1 or 2 data points are excluded, and the plot has a quite visible "gap" between Turkey and Italy corresponding to the entirety of eastern Europe and the Balkans, i.e. the routes that any putative Neolithic lineages would have entered Europe
The authors of the current paper seem to be agnostic as to when R-M269 arrived in Europe. As Dr. Capelli says in an otherwise sensationalist BBC piece:
"At the moment it's not possible to claim anything about the age of this lineage," he told BBC News, "I would say that we are putting the ball back in the middle of the field."
In the actual paper, the lack of an east-west cline is interpreted as inconsistent with the Neolithic model:
the homogeneity of STR variance and distribution of sub-types across the continent are inconsistent with the hypothesis of the Neolithic diffusion of the R-M269 Y chromosome lineage.
Personally, I've often emphasized the huge (underappreciated) confidence intervals associated with Y-STR based estimates, so I appreciate the "caution" part of the paper. I was reading the Haplogroup R page on ISOGG, and the statement...
Haplogroup R1b1a2-M269 is observed most frequently in Europe, especially western Europe, but with notable frequency in southwest Asia. R1b1a2-M269 is estimated to have arisen approximately 4,000 to 8,000 years ago in southwest Asia and to have spread into Europe from there.
... pretty much sums up my views on the subject, although I would add that I consider the most likely place of origin of R-M269 to be in the highlands west and south of the Caspian Sea, "complementary" to an early R-M17 distribution in the arc of flatlands north and east of the Caspian.
I think that there are many possible migration routes and possible archaeological correlates of the R-M269 spread, but at the moment, a Neolithic-to-Bronze Age dispersal is the more likely hypothesis. Indeed, the Paleolithic hypothesis cannot be saved even with the recognition of the phenomena described in this paper, since, as we have seen, even the most "linear" markers produce an 8.3ky BP age. Only a descent to the murky territory of the evolutionary rate can save that hypothesis.
What about the lack of clinality across Europe? A point that is overlooked, I think, is that clinality does not necessarily follow from a geographical range expansion. Two additional conditions must hold:
- The dispersal must be slow, so that variation begins to accumulate at very different dates at the near and far ends of the expansion range.
- The number of founder colonists spreading at any stage of the expansion must be very low, otherwise they will carry pretty much all the diversity found in their parent population.
Proc. R. Soc. B doi: 10.1098/rspb.2011.1044
The peopling of Europe and the cautionary tale of Y chromosome lineage R-M269
George B. J. Busby et al.
Recently, the debate on the origins of the major European Y chromosome haplogroup R1b1b2-M269 has reignited, and opinion has moved away from Palaeolithic origins to the notion of a younger Neolithic spread of these chromosomes from the Near East. Here, we address this debate by investigating frequency patterns and diversity in the largest collection of R1b1b2-M269 chromosomes yet assembled. Our analysis reveals no geographical trends in diversity, in contradiction to expectation under the Neolithic hypothesis, and suggests an alternative explanation for the apparent cline in diversity recently described. We further investigate the young, STR-based time to the most recent common ancestor estimates proposed so far for R-M269-related lineages and find evidence for an appreciable effect of microsatellite choice on age estimates. As a consequence, the existing data and tools are insufficient to make credible estimates for the age of this haplogroup, and conclusions about the timing of its origin and dispersal should be viewed with a large degree of caution.