Showing posts with label Y-STR Series. Show all posts

August 27, 2011

Y-STR variance of Busby et al. (2011) dataset

I calculated the Y-STR variance of the Busby et al. (2011) dataset, for both the 10 and 15 Y-STR sets, as well as 4- and 5-most "linear" subsets thereof. Generation length of 31.5 years is used for the calendar year estimates.
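As a rough illustration of how a variance-based age of this kind can be computed, here is a minimal sketch. It assumes the simple linear stepwise model, in which the variance of repeat counts at a locus grows as mutation rate times generations; the exact procedure behind the figures in this post is not spelled out, so this is one common reading rather than the method used. The function name, the toy haplotypes and the per-locus mutation rates are all hypothetical; only the 31.5-year generation length comes from the text.

```python
from statistics import mean, pvariance

GENERATION_YEARS = 31.5  # generation length quoted in the post


def variance_age_years(haplotypes, mutation_rates, generation_years=GENERATION_YEARS):
    """Estimate an age in years from per-locus repeat-count variance.

    Assumes the linear stepwise model: variance = mutation rate * generations.
    """
    loci = list(zip(*haplotypes))  # transpose: one tuple of repeat counts per locus
    mean_variance = mean(pvariance(locus) for locus in loci)
    mean_rate = mean(mutation_rates)
    generations = mean_variance / mean_rate
    return generations * generation_years


# Hypothetical toy data: 4 haplotypes typed at 3 loci, with made-up rates per generation.
toy_haplotypes = [(13, 24, 14), (13, 25, 14), (14, 24, 15), (13, 23, 14)]
toy_rates = [0.002, 0.003, 0.002]
age = variance_age_years(toy_haplotypes, toy_rates)
```

For this made-up input the estimate comes out at about 3,900 years. The number itself means nothing; the point is that the result depends directly on which loci, and therefore which rates, go into the two averages.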

My position that Y-STRs are effectively dead for age estimation stands, but I thought it'd be a good exercise to do this, as my personal adieu to more than a decade of Y-STRs: they didn't live up to their promise, but, indirectly, they helped create an entire field of "genetic prehistory" that will live on after their demise.

The greatest contribution of the Busby et al. (2011) paper is that it has cured the naivete of some who bought into the "more STRs = more accuracy" scheme. After this paper all Y-STR based estimates (including my own, above) are suspect.

The non-linearity of the Y-STR mutation model is only one of the problems of Y-STRs. Over the last few years, I've examined many commonly held wrong assumptions about the way Y-STRs have been used:
  1. The "evolutionary" mutation rate and its inflated dates
  2. The lack of appreciation of the true confidence intervals of age estimates (even under a well-behaved, symmetric stepwise mutation model), which are wider than believed by many, once uncertainty about generation length, mutation rates, and the inherent stochasticity of the mutation process is taken into account
  3. A common conflation of haplogroup ages with migration events; a migration event may be actually much older or much younger than the Y-STR variance age, usually the latter, except in rare cases of the colonization of islands or remote regions of the world.
  4. Influence of foreigner contamination or relics in the estimation of population ages.
  5. Impact of population demography on age estimates, even "interclade" ones
From now on I am going on a Y-STR boycott on this blog. Y-STRs still have their obvious uses, for recent genealogy, or forensics. They may also convey some information about human prehistory in the broadest time scales.

But, on the whole, they are worse than useless for the prehistorian: not only do they produce estimates fraught with danger, but also, being the only game in town, are prone to over-interpretation and spurious associations.

Thankfully, it will only be a few years more until we can move past the Y-STR swamp, and into the more promising territory of well-behaved unique event polymorphisms that are currently too costly to type on a large number of samples. Archaeogenetics will also help, although that, too, has its own perils (namely contamination, and the inability to get data from the hot and humid regions of the world).

One way or another, we're bound to know more in the future, and destroying the Y-STR behemoth is the first step toward making some real progress in genetic prehistory.

August 24, 2011

Back to the drawing board for R-M269 (Busby et al. 2011)

I will probably update this entry when I read the actual paper carefully.

Nonetheless, it seems to confirm that the influence of the marker set on TMRCA estimates, which Tim Janzen reported and I highlighted, is a nuisance even for a relatively young haplogroup. It is also probably consistent with the idea that Y-STR based estimates are suspect because of deviations from the linear model.

UPDATE I (An epitaph for Y-STRs)

The paper could just as easily have been titled "An epitaph for Y-STRs". Of course, Y-STRs do carry information related to antiquity; and there are so many datasets collected from both academics and genealogist enthusiasts. Thus, they will continue to be used and analyzed for at least a few years more.

Nonetheless, the conclusion is inescapable that a very specific use of Y-STRs on modern populations, with the goal of discovering tight links with archaeological/historical events, is all but dead.

The reason is simple: as clocks, they suck. A bad clock is not useless: it gives you some information about time. Moreover, you can often use several to iron out the inaccuracy of any single one of them.

Unfortunately, better estimation through averaging of bad estimators works only in one case: when the estimators are unbiased.

An unbiased estimator has an expected value equal to what you are trying to estimate. For example, suppose that the true age of a founder is 100 generations. For various reasons, bad clocks may give you estimates different than 100: some more, some less.

But, if some of them tend to give you an estimate of around 50 generations, and some of them tend to give you estimates around 200 generations, then averaging them out tells you nothing, except what ratio of slow and fast clocks you used.
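The arithmetic behind this can be sketched in a few lines. This is only an illustration of the 50-versus-200 example above, with a hypothetical function name and made-up marker counts; it is not a model of any real marker set.

```python
def average_of_clocks(n_fast, n_slow, fast_estimate=50.0, slow_estimate=200.0):
    """Average age (in generations) reported by a mix of biased clocks.

    fast_estimate and slow_estimate are the values each kind of clock tends
    to report in the example; the true age of 100 never enters the result.
    """
    total = n_fast * fast_estimate + n_slow * slow_estimate
    return total / (n_fast + n_slow)


mostly_fast = average_of_clocks(n_fast=12, n_slow=3)  # 15 markers, mostly fast
mostly_slow = average_of_clocks(n_fast=3, n_slow=12)  # 15 markers, mostly slow
```

With the same 15 markers in total, the average moves from 80 to 170 generations purely by changing the mix, and neither value is the true 100.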

Use more fast ones, and get a recent estimate; use more slow ones and get a more ancient one. Here is a figure from the paper, showing age estimates of sub-haplogroups R-S21 vs. R-S116:


The different codes are explained in the supplementary material, but notice the difference between 4.A (the 4 most "linear") and 4.C (the 4 least "linear"). Using a generation length of 31.5 years, these correspond to 8.3ky BP and 2.4ky BP, i.e., a >3-fold difference.

Using "all" 15 Y-STRs (15.all) leads to an age of 3.4ky BP, but the analysis of Busby et al. shows how misleading this is. Using all 15 Y-STRs simply averages out a set of bad clocks: the 3.4ky BP figure is not determined by the actual split between the two haplogroups, but is an artefact of the set of clocks used.

Here is Table 1 from the paper, notice the last column:

The last column is an estimate of the duration of linearity for a Y-STR. It is basically an estimate (in years) of the time span during which a Y-STR accumulates variance in a predictable (linear) manner, which can be calculated from a combination of the range of the Y-STR (the possible values it can take), and its mutation rate (how often it changes its value).

The basic idea is simple: a big room (great range) allows more freedom of movement before you hit one of the walls; a fly (high mutation rate) is more likely to hit a wall before a tortoise.

A Y-STR with a small range and a high mutation rate is hopeless because its propensity to change its value (high mutation rate) is checked by its smaller range.

Going back to the table, we see that many Y-STRs have linearity durations lower than the middle of the Bronze Age, and some of them much lower. This means that including these Y-STRs will tend to suppress age estimates to make them appear younger.
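The room-and-walls picture can be made concrete with a small calculation. The following is a minimal sketch and not the calculation used by Busby et al.: it assumes a symmetric one-step mutation model on a fixed number of alleles, where a mutation that would cross a wall simply does not happen. The function name, the allele counts and the mutation rates are all invented for the example.

```python
def variance_trajectory(n_alleles, mu, generations):
    """Exact allele variance over time for a symmetric stepwise model with walls.

    Alleles are 0..n_alleles-1 and the founder sits in the middle. Each
    generation an allele steps +1 or -1 with probability mu/2 each; a step
    that would leave the allowed range is ignored (an assumed boundary rule).
    Returns a list whose entry t-1 is the variance after t generations.
    """
    probs = [0.0] * n_alleles
    probs[n_alleles // 2] = 1.0
    variances = []
    for _ in range(generations):
        nxt = [0.0] * n_alleles
        for state, p in enumerate(probs):
            if p == 0.0:
                continue
            nxt[state] += p * (1.0 - mu)  # no mutation this generation
            for step in (-1, 1):
                target = state + step
                if 0 <= target < n_alleles:
                    nxt[target] += p * mu / 2
                else:
                    nxt[state] += p * mu / 2  # blocked by the wall
        probs = nxt
        m = sum(s * p for s, p in enumerate(probs))
        variances.append(sum((s - m) ** 2 * p for s, p in enumerate(probs)))
    return variances


generations = 300
narrow_fast = variance_trajectory(n_alleles=5, mu=0.01, generations=generations)
wide_slow = variance_trajectory(n_alleles=21, mu=0.001, generations=generations)

# Ratio of actual variance to the linear expectation mu * t (1.0 = still linear).
narrow_fast_ratio = narrow_fast[-1] / (0.01 * generations)
wide_slow_ratio = wide_slow[-1] / (0.001 * generations)
```

Under these assumptions the wide, slow marker is still essentially on the linear line after 300 generations, while the narrow, fast one has fallen well below it. Dividing its variance by its mutation rate would therefore return an age that is too young, which is the direction of bias described above.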

(to be continued)

UPDATE II: Lack of cline in Europe

The authors showed that the observed east-west clinality of Y-STR variance from Turkey to the Atlantic was spurious and there is no longer any longitudinal pattern of decreasing variation. I made exactly the same point in January 2010, when Balaresque et al. appeared:
Equally flawed is the inference that R1b1b2 is clinal (Figure 2A). Microsatellite variance is not significantly higher in Turkey than in Europe -- even if one makes the questionable assumption that modern Anatolian Turks are patrilineal descendants of Neolithic Anatolians. The significance of the regression line disappears if 1 or 2 data points are excluded, and the plot has a quite visible "gap" between Turkey and Italy corresponding to the entirety of eastern Europe and the Balkans, i.e. the routes that any putative Neolithic lineages would have entered Europe
The authors of the current paper seem to be agnostic as to when R-M269 arrived in Europe. As Dr. Capelli says in an otherwise sensationalist BBC piece:
"At the moment it's not possible to claim anything about the age of this lineage," he told BBC News, "I would say that we are putting the ball back in the middle of the field."
In the actual paper, the lack of an east-west cline is interpreted as inconsistent with the Neolithic model:
the homogeneity of STR variance and distribution of sub-types across the continent are inconsistent with the hypothesis of the Neolithic diffusion of the R-M269 Y chromosome lineage.
Personally, I've often emphasized the huge (underappreciated) confidence intervals associated with Y-STR based estimates, so I appreciate the "caution" part of the paper. I was reading the Haplogroup R page on ISOGG, and the statement...
Haplogroup R1b1a2-M269 is observed most frequently in Europe, especially western Europe, but with notable frequency in southwest Asia. R1b1a2-M269 is estimated to have arisen approximately 4,000 to 8,000 years ago in southwest Asia and to have spread into Europe from there.
... pretty much sums up my views on the subject, although I would add that I consider the most likely place of origin of R-M269 to be in the highlands west and south of the Caspian sea, "complementary" to an early R-M17 distribution in the arc of flatlands north and east of the Caspian.

I think that there are many possible migration routes and possible archaeological correlates of the R-M269 spread, but at the moment, a Neolithic-to-Bronze Age dispersal is the more likely hypothesis. Indeed, the Paleolithic hypothesis cannot be saved even with the recognition of the phenomena described in this paper, since, as we have seen, even the most "linear" markers produce an 8.3ky BP age. Only a descent to the murky territory of the evolutionary rate can save that hypothesis.

What about the lack of clinality across Europe? A point that is overlooked, I think, is that clinality does not necessarily follow from a geographical range expansion. Two additional conditions must hold:
  1. The dispersal must be slow, so that variation begins to accumulate at very different dates at the near and far ends of the expansion range
  2. The number of founder colonists spreading at any stage of the expansion must be very low, otherwise they will carry pretty much all the diversity found in their parent population.
The classic demic diffusion model is dead, so I don't particularly see why we would expect to see a cline in Europe in either the case of small pioneer expeditions or folk migrations.


Proc. R. Soc. B doi: 10.1098/rspb.2011.1044

The peopling of Europe and the cautionary tale of Y chromosome lineage R-M269

George B. J. Busby et al.

Recently, the debate on the origins of the major European Y chromosome haplogroup R1b1b2-M269 has reignited, and opinion has moved away from Palaeolithic origins to the notion of a younger Neolithic spread of these chromosomes from the Near East. Here, we address this debate by investigating frequency patterns and diversity in the largest collection of R1b1b2-M269 chromosomes yet assembled. Our analysis reveals no geographical trends in diversity, in contradiction to expectation under the Neolithic hypothesis, and suggests an alternative explanation for the apparent cline in diversity recently described. We further investigate the young, STR-based time to the most recent common ancestor estimates proposed so far for R-M269-related lineages and find evidence for an appreciable effect of microsatellite choice on age estimates. As a consequence, the existing data and tools are insufficient to make credible estimates for the age of this haplogroup, and conclusions about the timing of its origin and dispersal should be viewed with a large degree of caution.

Link

May 14, 2011

Let the Y-STR mutation wars begin!

This should strictly go to the new ESHG abstracts post, but I am sure it will spark a lot of interest, so I am posting it separately. I recently noticed how Y-STR age estimates are dependent on the choice of Y-STRs used, so it will be very interesting to see what Busby and Capelli have come up with.

It is certainly a very good thing to reignite the debate, even though I do believe that Y-SNP based dating in the age of whole genome sequencing will solve many dating problems, especially for old clades of the tree. I have argued at length why the evolutionary mutation rate is wrong, but the more serious problem is the fact that different sets of Y-STRs lead to different age estimates (with slower-mutating ones producing much older ages than fast-mutating ones).


Microsatellite choice and Y chromosome variation: attempting to select the best STRs to date human Y chromosome lineages
G. B. J. Busby, C. Capelli
Recently the debate on the origins of the major European Y chromosome haplogroup R-M269 has reignited, and opinion has moved away from Paleolithic origins to the notion of a younger Neolithic spread of these chromosomes from the Near East. We investigate the young, STR-based Time to the Most Recent Common Ancestor estimates proposed so far for R-M269 related lineages and find evidence for an appreciable effect of microsatellite choice on age estimates. We further expand our analysis to include a worldwide dataset of over 60 STRs which differ in their molecular attributes. This analysis shows that by taking into account the intrinsic molecular characteristics of Y chromosome STRs, one can arrive at a more reliable estimate for the age of Y chromosome lineages. Subsequently, we suggest that most STR-based Y chromosome dates are likely to be underestimates due to the molecular characteristics of the markers commonly used, such as their mutation rate and the range of potential alleles that STR can take, which potentially leads to a loss of time-linearity. As a consequence, we update the STR-based age of important nodes in the Y chromosome tree, showing that credible estimates for the age of lineages can be made once these STR characteristics are taken into consideration. Finally we show that the STRs that are most commonly used to explore deep ancestry are not able to uncover ancient relationships, and we propose a set of STRs that should be used in these cases.

December 30, 2010

How old is Y-chromosome Adam?

The presumed shallow time depth of the human Y-chromosome phylogeny is one of the main arguments of the recent Out-of-Africa theory. One of the major things I found while working on my Y-STR series is that point estimates from Y-STR variation are associated with huge confidence intervals, because of uncertainty about factors such as generation length, population history, mutation rates, even if the mutation model behaves "perfectly" in symmetrical stepwise fashion.

Trouble is, the deeper we go in time, the more uncertain we are about the behavior of our models. That is why I have generally avoided providing any age estimates for events prior to the Neolithic.

Nonetheless, it is interesting to see the state of the art in this area, because claims about the shallow time depth of the human Y-chromosome phylogeny are always flying around, but, if you follow the citation labyrinth, you will soon realize that the whole edifice is erected on sand.

Fortunately, I was recently reminded of a thoughtful post by Tim Janzen on the GENEALOGY-DNA-L from 2009 which is probably the "best thing" when it comes to Y-chromosome age estimation for deep clades of the phylogeny.

The most basal clade in the phylogeny is haplogroup A which is found in Africa. By comparing A chromosomes with those of the BT clade (everyone else), we can arrive at an estimate of Y-chromosome Adam. And, since BT clade contains much structure itself, we can compare A chromosomes with different subclades within BT, e.g., E or J or T.

This is essentially what Tim did: he compared a group of haplogroup A chromosomes with all the major clades of the BT group. Different age estimates produced by this method are not independent, because different haplogroups share more recent common ancestors: for example A vs I and A vs J both contain a common line of patrilineal descent (from the BT founder to the IJ founder). In any case, the different age estimates should all give approximately the same figure, as they are estimating the same quantity: if they do not, this is evidence about the inability of Y-STRs to provide good age estimates.

Tim went a step further, and he did his comparisons on different sets of markers: slow-evolving ones to fast-evolving ones. Again, age estimates with fast vs. slow-evolving markers should give similar age estimates. If they do not, then this means that an age estimate is a product not only of the true age of a lineage, but also of the particular mix of fast- and slow-evolving markers that one uses.

In short: age estimates by comparing haplogroup A with several other haplogroups and by using different sets of markers should be roughly similar. But, that is hardly what happened.

Below is Tim's table of age estimates in years. I have added an extra row and extra column: this contains the standard deviation of each column/row divided by the average (in %), and is useful to quantify how varied the age estimates are across different BT haplogroups and across different marker sets.

The standard deviation of the age estimates across haplogroups is reasonably small, but large enough to render any archaeological correlations useless. The real trouble is in the standard deviation of the age estimates across marker sets: they are higher than 100%!
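The dispersion measure used here (standard deviation divided by the average, in %) is the coefficient of variation. A minimal sketch follows; whether the population or the sample standard deviation was used for the table is not stated, so the population form below is an assumption, and the two lists of ages are hypothetical values chosen only to show the contrast, not rows from Tim's table.

```python
from statistics import mean, pstdev


def cv_percent(values):
    """Standard deviation divided by the average, in percent."""
    return 100.0 * pstdev(values) / mean(values)


# Hypothetical ages in years, not taken from the table.
across_haplogroups = [50_000, 60_000, 70_000]   # same markers, different clades
across_marker_sets = [10_000, 20_000, 300_000]  # same clades, different markers

cv_haplogroups = cv_percent(across_haplogroups)
cv_marker_sets = cv_percent(across_marker_sets)
```

A value above 100% means the spread of the estimates is larger than their average, which is what happens in the second list.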

What this means is that age estimates are largely a function of whether one uses slow- or fast- mutating markers.

Age estimates vary overall between 6,530 years and 535,755! It is obvious that fast/medium mutating markers provide unbelievably small age estimates (most of them are less than 20 thousand years). However, if we limit the analysis to slow mutating markers, most age estimates are in excess of 300,000 years!

In short, you can arrive at any age estimate you want, by choosing a particular mix of slow and fast mutating markers.

It could be argued that using all markers (50 markers column) would provide a better estimate, and, indeed, that estimate is in the order of 40-80ky, which is close to what is usually reported for human Y-chromosomes.

But that is equivalent to having a number of different clocks, some of which tell you that 3 seconds have transpired, and some which tell you that it's been a whole minute. The rational thing to do is not to take an average, but to throw the clocks in the garbage, or figure out what's wrong with them.

Conclusion

At present I am aware of no research that quantifies the depth of the human Y-chromosome phylogeny with anything bearing a semblance of accuracy. The 1000 Genomes Project has the potential to do this using relatively well-behaved point mutations rather than Y-STRs, but in the initial publication no actual age estimates were given, and the samples used to produce Supplementary Figure 7 lacked the most basal part of the tree (both clade A and the next most basal clade B).

UPDATE (Jan 2, 2011):

In a post in GENEALOGY-DNA-L, I show that, by choosing slow- vs. fast-evolving markers, with the Ballantyne et al. mutation rates and the tested haplogroup A and haplogroup C 67-marker haplotypes from the respective FTDNA projects, you can arrive at age estimates anywhere between 10 and 219ky.

This has confirmed to my mind that Tim Janzen's numbers about the dependence of age estimates on marker mutation rates are basically correct, and that age estimates about Y-chromosome Adam using Y-STRs are basically useless.

Let's hope that the 1000 Genomes Project will produce the data in the coming year that will allow us to make a better estimate, in terms of the number of SNPs between A and non-A chromosomes, presented e.g. (i) as a fraction of the number of SNPs between human and chimpanzee, or (ii) divided by father-son Y-SNP mutation rates; the latter is already estimated but should become better fixed by looking at the father-son pairs included in the 1000 Genomes Project.

August 18, 2010

Age estimation of Y chromosome lineages (Adamov & Karzhavin 2010)

A nice paper in the Russian Journal of Genetic Genealogy that addresses the subject of age estimation using Y-STRs. The authors share my sentiments on the subject, and were good enough to compare their simulation results and analytical approximations with my 2008 post. Every post I've written on the subject can be found in the Y-STR series label.

The authors write:


One of numerous critics of «effective» mutation rate is D. Pontikos, who published in 2008 the results of his own calculations in his popular blog (Pontikos, 2008 [8]). Fig. 11 shows the results of Pontikos for a fixed interval of genealogical tree with the final size of 750000 – 1250000 individuals ... Those data match well with approximation (6), and the difference between them is only 0.3% and 1.4%, correspondingly.
I have not checked all the details of this paper, but it should be a good read for anyone interested in the subject. Hopefully as more people look at the evidence, age estimation in mainstream journals will catch up with the state of the art.

I will not repeat the long and involved arguments and observations of my Y-STR series, but to summarize the argument for new readers:

  • Most recent population genetics papers use an "effective" mutation rate that is about 3 times slower than the observed "germline" rate (of father-son pairs) and leads to age estimates that are about 3 times older than is justified.
  • This mutation rate is applicable to the constant population case in which a man has 1 son on average. Population size may vary stochastically under this model, but it generally does not grow to large numbers within the time frame of Homo sapiens. For example, in the 2,000 or so generations since Y-chromosome Adam, a lineage evolving under this model that has not gone extinct would have about 1,000 descendants on average, and the probability that it would have millions of descendants (like most real-world haplogroups in non-tribal populations) is practically zero.
  • If the constant population case does not hold, due to selection, or demographic growth, or social dominance, then the effective rate is not applicable, and age estimates using the germline rate are much closer to the truth.
  • The population sizes of real-world haplogroups are huge and could not have been generated by stochastic variation in a model where each man has 1 son on average. Most Y-chromosome age estimates in the mainstream literature are overestimates, and ascribe Paleolithic origins to Neolithic and Bronze Age founders.
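The size argument in the points above can be checked with a short calculation. This is a sketch under a stated assumption: each man's number of sons is Poisson distributed (a Galton-Watson branching process), which is a standard textbook model and not necessarily the one used in the cited simulations. The function names are hypothetical. The calculation uses two facts about this model: the extinction probability q after each generation follows q_next = exp(mean_sons * (q - 1)), and the expected size of a surviving line is the unconditional expectation divided by the survival probability.

```python
import math


def survival_probability(generations, mean_sons=1.0):
    """P(a founder's male line is still alive after the given generations).

    Assumes Poisson-distributed sons per man. The extinction probability q
    is iterated as q_next = exp(mean_sons * (q - 1)), starting from q = 0.
    """
    q = 0.0
    for _ in range(generations):
        q = math.exp(mean_sons * (q - 1.0))
    return 1.0 - q


def mean_size_if_surviving(generations, mean_sons=1.0):
    """Expected number of living descendants, given that the line survived."""
    expected_size = mean_sons ** generations  # unconditional expectation
    return expected_size / survival_probability(generations, mean_sons)


constant_case = mean_size_if_surviving(2000, mean_sons=1.0)
slight_growth = mean_size_if_surviving(2000, mean_sons=1.005)
```

With exactly 1 son on average, a line that survives 2,000 generations has roughly 1,000 living members on average, in line with the figure quoted above. With the hypothetical value of 1.005 sons on average, the same calculation gives a surviving line numbering in the millions, which is why even mild sustained growth takes a haplogroup outside the constant population case.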
The Russian Journal of Genetic Genealogy, Vol 1, No 2 (2010)

About the influence of population size on the accuracy of TMRCA estimation, done by standard methods using STR locus complex

Dmitry Adamov, Sergey Karzhavin

Abstract

Model calculations of influence of a population growth from the common male ancestor towards the final (present-day) population on the TMRCA estimation have been done. The estimation was made by linear and quadratic methods using STR locus of Y-chromosome. The modeling was done using computer simulation of a tribal population during fixed number of generations.
Universal approximations, allowing estimate the average correction for population effects as a function of the final population size, have been obtained. Authors calculated the variance of age estimations for an initial ancestor, which appears due to different types of population effects. Precision of the ancestral allele determining in a STR from the final population haplotypes set have been studied. An algorithm has been proposed for TMRCA calculation for a paternal (tribal) population, taking into account its total population size.

Link

October 12, 2009

Y-chromosome demographic history (Shi et al. 2009)

Just a quick heads up on this open access paper which seems very important in that it tests a very large number of Y-STR markers on a well-known dataset, and proposes a new recalibration of the "evolutionary mutation rate" that I have criticized elsewhere. I will have to read the paper carefully before passing judgment (Look in this space for updates).

UPDATE (Oct 13):

The paper adds nothing to the issue of the appropriate mutation rate choice for TMRCA estimation. The revised Evolutionary Mutation Rates (rEMR) proposed in this paper are nothing more than an application of the Zhivotovsky et al. (Z. et al.) Evolutionary Mutation Rate (EMR) for markers not included in the original calibration by Z. et al. and exhibiting either higher or lower variance than those that are included. The use of a Z. et al.-like calibration is taken uncritically for granted.

Furthermore, the authors use BATWING to generate genealogies in order to infer TMRCA of lineages and populations, employing their rEMR for this purpose. This is wrong because both the "evolutionary mutation rate" and BATWING take into account genealogy. By using rEMR in conjunction with BATWING they are correcting for loss of Y-STR diversity due to genetic drift twice. This mistake was also made in another paper this Spring. I wrote:
Indeed, in this paper they attempt to use Batwing to estimate ages using the effective rate. Batwing employs a Bayesian method with coalescent simulations, and thus takes into account "population history", the effects of which are supposedly encapsulated in the effective mutation rate. Thus, they are "correcting" (inappropriately of course) for population history twice.
In conclusion: the age estimates provided in this paper (which can be found in Supplementary Table S4) are useless. The paper is, nonetheless, useful, because it shows the relative ages of many haplogroups, even though the small sample sizes for many of them do not inspire confidence in their accuracy.

UPDATE II: The paper also completely ignores admixture as a source of genetic diversity.

Molecular Biology and Evolution, doi:10.1093/molbev/msp243

A worldwide survey of human male demographic history based on Y-SNP and Y-STR data from the HGDP-CEPH populations

Wentao Shi et al.

Abstract

We have investigated human male demographic history using 590 males from 51 populations in the HGDP-CEPH worldwide panel, typed with 37 Y-SNPs and 65 Y-STRs, and analyzed with the program BATWING. The general patterns we observe show a gradient from the oldest population TMRCAs and expansion times together with the largest effective population sizes in Africa, to the youngest times and smallest effective population sizes in the Americas. These parameters are significantly negatively correlated with distance from East Africa and the patterns are consistent with most other studies of human variation and history. In contrast, growth rate showed a weaker correlation in the opposite direction. Y lineage diversity and TMRCA also decrease with distance from East Africa, supporting a model of expansion with serial founder events starting from this source. A number of individual populations diverge from these general patterns, including previously-documented examples such as recent expansions of the Yoruba in Africa, Basques in Europe and Yakut in Northern Asia. However, some unexpected demographic histories were also found, including low growth rates in the Hazara and Kalash from Pakistan, and recent expansion of the Mozabites in North Africa.

Link

May 09, 2009

Gender differences in reproductive success (Brown et al. 2009)

This is a wonderful paper which gathers data to address the issue of how sexual selection operates in men and women. Bateman's principles as presented here are:
  1. greater mating variance in men than women;
  2. greater reproductive variance in men than women;
  3. correlation between mating and reproductive success.

Each man and woman has a certain number of lifetime sexual mates and a certain number of offspring. While women are more similar to each other, with relatively fewer having too few (or too many) partners/offspring compared to the average, men are more variable, with a few of them having no or many offspring/partners.

The authors bring up the interesting point that greater male variance does not -in itself- substantiate sexual selection as it is often assumed. This is because variance can be either due to selection or to random genetic drift.

A good way to see this (not found in the paper), is to imagine the same set of people living their lives either (a) in the peaceful countryside, or (b) in a big city during a series of air raids. In case (b) variance will be greater, as those killed or maimed by the raids will not mate or reproduce, and the survivors will, whereas in case (a) everyone will have the same a priori opportunities.

So, everyone having the same number of offspring as everyone else does imply a lack of sexual selection, but variability in reproductive success does not in itself imply selection. Only when mating and reproductive success are correlated (Bateman's third principle) do we have a good case for sexual selection.

The authors collect data on the male- and female- specific variance in mating and reproductive success, although they note a dearth of data in favor of the third principle. One can't disagree with their call for the collection of relevant data to investigate whether the three principles apply in humans, nor with their observation that what is applicable to fruit flies (the subject of Bateman's original research) does not necessarily apply to humans, and certainly not to all societies (*)

(*) An interesting observation from the paper is that although monogamous societies are a minority of human societies, they tend to encompass the largest number of people. Moreover, in about half of nominally polygamous societies, in practice monogamy is practiced by the great majority of the population.

Below is Table 1 from the paper.


Not related to the subject of this paper, but this gives us the opportunity to examine realistic demographic parameters in simulations such as these, where an assumption of Poisson distributed number of offspring (with mean m) is used. In the Poisson distribution, the variance is also m. As the table above shows, the variance is almost equal to the mean in some populations (e.g., USA), but quite different in others (e.g., 19th c. Sweden); indeed the latter seems more common.

Departure from the Poisson assumption in the direction of greater reproductive variance is entirely consistent with my observations in the above-linked post on the importance of reproductive inequality.
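One simple way to quantify such a departure is the variance-to-mean ratio of offspring counts, which equals 1 for a Poisson distribution. The sketch below uses hypothetical offspring counts, not figures from Table 1, and the function name is made up for the illustration.

```python
from statistics import mean, pvariance


def dispersion_index(offspring_counts):
    """Variance-to-mean ratio of offspring counts; 1 for a Poisson distribution."""
    return pvariance(offspring_counts) / mean(offspring_counts)


# Hypothetical populations with the same average of 2 offspring per man.
even_population = [2, 1, 3, 2, 2, 1, 3, 2]
unequal_population = [0, 0, 0, 1, 1, 2, 4, 8]

even_index = dispersion_index(even_population)
unequal_index = dispersion_index(unequal_population)
```

A ratio well above 1, as in the second list, indicates more reproductive inequality than the Poisson assumption allows for.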

Trends in Ecology & Evolution doi:10.1016/j.tree.2009.02.005

Bateman’s principles and human sex roles

Gillian R. Brown et al.

Abstract

In 1948, Angus J. Bateman reported a stronger relationship between mating and reproductive success in male fruit flies compared with females, and concluded that selection should universally favour ‘an undiscriminating eagerness in the males and a discriminating passivity in the females’ to obtain mates. The conventional view of promiscuous, undiscriminating males and coy, choosy females has also been applied to our own species. Here, we challenge the view that evolutionary theory prescribes stereotyped sex roles in human beings, firstly by reviewing Bateman's principles and recent sexual selection theory and, secondly, by examining data on mating behaviour and reproductive success in current and historic human populations. We argue that human mating strategies are unlikely to conform to a single universal pattern.

Link

May 07, 2009

Citation of my Y-STR mutation rate criticism

I was reading Tuuli Lappalainen's Ph.D. dissertation at the University of Helsinki, "Human genetic variation in the Baltic Sea region: Features of population history and natural selection," and, to my surprise, I noticed that my post on How Y-STR variance accumulates: a comment on Zhivotovsky, Underhill and Feldman (2006) was cited:
Estimating the age of haplogroups is important for connecting genetic patterns to historical phenomena. However, it is dependent on the correct estimation of the mutation rate, which has proven to be difficult. Rates calculated from pedigrees are 3-4 times higher than evolutionary rates (Parsons et al. 1997, Howell et al. 2003, Dupuy et al. 2004, Zhivotovsky et al. 2004, Zhivotovsky et al. 2006), and it is unclear which should be used for the calculation of the most recent common ancestor for major haplogroups in large geographic regions. It has recently been suggested (Pontikos 2008) that the widely used evolutionary rate of the Y chromosome (Zhivotovsky et al. 2004) is strongly underestimating the effective mutation rate due to not accounting for population growth and the bias of analyzing the biggest haplogroups that have grown at rates exceeding the general growth rate of the population. These analyses have not been published in a peer-reviewed journal, but they appear to correctly point out at least some problems of the commonly used models. Thus, the appropriate mutation rate to use for analyzing the temporal scale of the Y-chromosomal haplogroup variation may be a few times lower than was used in II – close to the pedigree rate. The same bias should apply to mitochondrial DNA, too. If the revised rates (Pontikos 2008) were used instead, TMRCAs for the main Y-chromosomal haplogroups I1a, N3 and R1a1 would be in the order of 3000-4000 years before present. These dates would imply that instead of the proposed Neolithic arrival of these haplogroups, their upper age limit would be in late Neolithic or early Bronze Age. Interestingly, the revised age of N3 variation in the Baltic Sea region would actually correspond nicely with the recently suggested Bronze Age arrival of the Finno-Ugric language (Häkkinen 2009). However, given the current uncertainty of the appropriate mutation rates, all time estimates should be used with great caution.
The cited post was the first one in the now extensive Y-STR series, in which I have tried to dissect various aspects of Y-STR-based age estimation.

December 01, 2008

Haplotype outliers and Y-chromosome age estimation

In a large collection of Y-chromosome haplotypes from a specific haplogroup and location, there are invariably a number of outliers, i.e., haplotypes that are too distant from the rest of the group. Consider the following table, which presents the number of mutations between pairs of haplotypes (a-f):


  a b c d e
b 1
c 2 1
d 1 2 2
e 3 2 1 2
f 7 5 6 6 5

While haplotypes a-e are all within 1-3 mutations of each other, haplotype f is 5-7 mutations away from any other haplotypes. It looks like it "doesn't belong".

Haplotypes such as f present a challenge:
  1. Are they true outliers? They might be an artifact of lab error, or simply extreme examples of normal variation. In the above example, if more haplotypes had been sampled, many more "pals" of f might be found, and it will no longer appear to be isolated.
  2. If they are true outliers, how did they end up in the collection?
This is not simply idle speculation: visit any forum dedicated to genetic genealogy, and you will find both (i) people who have too many exact and close matches and who are urged to upgrade their test results to a higher number of markers so that only their real close relatives "stand out" from the crowd, but also (ii) people who don't have any, or very few close matches, whose haplotypes seem to hang in mid-air, unconnected to any other set of Y chromosomes.

Spawn of the shipwrecked sailor

A popular explanation for outliers is that they are of foreign origin, the result of a chance event. According to this explanation, the distinctiveness of the outliers is due to being the product of a rare occurrence: a shipwrecked sailor, a lost explorer, a slave far from home, and so on.

To substantiate this as an explanation, it suffices to show that what is an "outlier" in a certain population X, is actually normal in another population Y. Then, it can be easily seen that the outlier may have ultimate origins in Y.

Relic of a bygone age

A different explanation is that outliers are relics of a previous age. Consider a country in which some important technological innovation, say farming, or iron, or the bow is introduced. Pretty soon, the inhabitants who acquire the new innovation may multiply in numbers, at the expense of their more isolated neighbors. Fast forward into the future, and the gene pool will be dominated by the closely related haplotypes of the "adopters" and the haplotypes of the "non-adopters" will stand out in the total population as oddities.

Implications for age estimation

Determining the cause of an outlier has important implications for determining the age of the common ancestor of the whole group:
  1. If the outlier is of foreign origin, then one must reject it, and age the remaining, more homogeneous haplotypes. This will lead to a younger age than if the entire group was used.
  2. If the outlier is a relic, then one must incorporate it, and downgrade the statistical weight of the larger more populous group; otherwise the age estimate will be dominated by the recently expanding group. This will lead to an older age than if the entire group was used.
As a practical example, there are 17 mutations in total across the 10 pairs of haplotypes a-e (Average = 1.7) and 46 mutations across the 15 pairs of haplotypes a-f (Average ≈ 3.1). The average number of mutations between f and the rest is, on the other hand, 5.8. If we purged f from the set, we would arrive at a young age (based on 1.7); if we did nothing, at an intermediate age (based on 3.1); and if we treated f and the young group (a-e) on an equal footing, at an old age (based on 5.8).
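The three averages can be checked directly from the table. The following is a minimal sketch; the names ROWS, DIST, mean_pairwise and mean_to_group are illustrative and not part of any published tool.

```python
from itertools import combinations

# Lower-triangular distance table from the text (mutations between haplotype pairs).
ROWS = {
    "b": [1],
    "c": [2, 1],
    "d": [1, 2, 2],
    "e": [3, 2, 1, 2],
    "f": [7, 5, 6, 6, 5],
}
NAMES = "abcdef"

# Expand the table into a lookup keyed by unordered pair.
DIST = {
    frozenset((row, NAMES[col])): value
    for row, values in ROWS.items()
    for col, value in enumerate(values)
}


def mean_pairwise(group):
    """Total and average number of mutations over all pairs within a group."""
    pairs = list(combinations(group, 2))
    total = sum(DIST[frozenset(p)] for p in pairs)
    return total, total / len(pairs)


def mean_to_group(outlier, group):
    """Average number of mutations between one haplotype and each member of a group."""
    return sum(DIST[frozenset((outlier, h))] for h in group) / len(group)


total_core, avg_core = mean_pairwise("abcde")   # outlier purged
total_all, avg_all = mean_pairwise("abcdef")    # outlier left in, every pair weighted equally
avg_f = mean_to_group("f", "abcde")             # outlier vs. main group on an equal footing
print(total_core, avg_core, total_all, round(avg_all, 2), avg_f)
```

The three numbers correspond to the "young", "intermediate" and "old" scenarios, in that order.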

Conclusion

The treatment of outliers in the existing literature is problematic. The default position seems to be not to analyze a haplotype group's substructure, and to use all sampled haplotypes. This may lead to either a substantial overestimation of the age (if foreign outliers are included), or a substantial underestimation (if relic outliers are swamped by the more populous main group, because every sampled haplotype is weighted equally).

Recommendation

For any collection of haplotypes, the first step should be to calculate the distribution of pairwise distances to detect outliers. Subsequently, a search of public databases or the literature should be performed to see if said outliers appear to be of foreign origin. Depending on this search (*), appropriate correction (inclusion/weighting) should be used in age estimation.

(*) Taking into account that the detection of foreign haplotypes depends on adequate sampling of the source population; hence, the absence of matches in other populations does not imply non-foreign origin.
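The first step of the recommendation might look something like the sketch below, which flags haplotypes whose nearest neighbour is unusually far away. The nearest-neighbour rule and the threshold of 3 are assumptions made for illustration; the post does not prescribe a particular outlier test.

```python
def nearest_neighbour_distances(names, lower_rows):
    """Distance from each haplotype to its closest other haplotype.

    lower_rows is the lower triangle of the pairwise distance matrix:
    row k holds the distances from names[k + 1] to names[0..k].
    """
    n = len(names)
    full = [[0] * n for _ in range(n)]
    for i, row in enumerate(lower_rows, start=1):
        for j, d in enumerate(row):
            full[i][j] = full[j][i] = d
    return {
        names[i]: min(full[i][j] for j in range(n) if j != i)
        for i in range(n)
    }


def flag_outliers(names, lower_rows, threshold):
    """Haplotypes with no neighbour within `threshold` mutations (hypothetical rule)."""
    nearest = nearest_neighbour_distances(names, lower_rows)
    return [h for h in names if nearest[h] > threshold]


# The example table from the earlier section of this post.
names = ["a", "b", "c", "d", "e", "f"]
lower_rows = [[1], [2, 1], [1, 2, 2], [3, 2, 1, 2], [7, 5, 6, 6, 5]]
candidates = flag_outliers(names, lower_rows, threshold=3)
print(candidates)  # ['f']
```

Whether a flagged haplotype is then excluded (foreign) or up-weighted (relic) still depends on the database search described above; the code only produces the list of candidates.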

October 18, 2008

Why Y-STR haplotype clusters are not clades

A Y-chromosome clade is the set of Y-chromosomes descended from a single Y-chromosome (the founder). In human terms, it consists of all the patrilineal descendants of a single man.

Clades are usually defined in terms of unique event polymorphisms (UEPs). Such polymorphisms occur rarely enough to be useful for cladistic analysis and determination of the human Y-chromosome phylogeny. A clade defined on the basis of UEPs is a haplogroup.

There is a misconception among some people that haplotypes, i.e. the alleles at several Y-STR loci can also define a clade. This is, however, impossible, for at least three reasons.

First, those who erroneously define clades based on Y-STR haplotypes do so by means of identification of a cluster of similar haplotypes.

But this isn't enough. Suppose you identify a cluster of haplotypes in which every pair has a genetic distance of at most 3. It must then be shown that the genetic distance between any haplotype in the cluster and any haplotype outside it is greater than 3. Suppose you have identified a cluster of haplotypes {a, b, c} with dist(a, b) = 3, and that there is another haplotype d with dist(a, d) = 3. You are not justified in excluding d from the proposed "clade", since it may share a common ancestor with a that is more recent than the common ancestor of a and b.

Moreover, since age estimates are associated with very wide confidence intervals, it is not guaranteed that greater genetic distance implies an older MRCA. To ensure that a group of Y-chromosomes are part of a clade, you must ensure that other Y-chromosomes have an even greater genetic distance than 3, so great indeed, that it is extremely unlikely that they are closely related to any Y-chromosomes in the haplotype cluster.
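The distinctness requirement can be written down as a simple check. This is only a sketch: the distances below are made up to mirror the {a, b, c} versus d example (plus a hypothetical far-away haplotype z), and the `margin` parameter stands in for the "so great indeed" safety gap, whose proper size would have to come from the confidence intervals of the age estimates.

```python
from itertools import combinations


def cluster_diameter(dist, cluster):
    """Largest genetic distance between two members of the cluster."""
    return max(dist[frozenset(p)] for p in combinations(cluster, 2))


def nearest_outsider(dist, cluster, outsiders):
    """Smallest genetic distance between a cluster member and a non-member."""
    return min(dist[frozenset((c, o))] for c in cluster for o in outsiders)


def is_separated(dist, cluster, outsiders, margin=0):
    """True if every outsider is farther from the cluster than its diameter plus a margin."""
    return nearest_outsider(dist, cluster, outsiders) > cluster_diameter(dist, cluster) + margin


# Hypothetical distances, for illustration only.
pairs = {
    ("a", "b"): 3, ("a", "c"): 2, ("b", "c"): 2,
    ("a", "d"): 3, ("b", "d"): 4, ("c", "d"): 4,
    ("a", "z"): 16, ("b", "z"): 15, ("c", "z"): 17, ("d", "z"): 16,
}
dist = {frozenset(k): v for k, v in pairs.items()}

print(is_separated(dist, ["a", "b", "c"], ["d", "z"]))        # False: d is as close to a as b is
print(is_separated(dist, ["a", "b", "c"], ["z"], margin=10))  # True: z is far from every member
```

Passing this check addresses only the first concern; it says nothing about unsampled intermediate haplotypes or about non-descendant relatives of the founder.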

Needless to say, none of the folks who propose various "clades" on the basis of Y-STR haplotypes have bothered to prove that their haplotype clusters share a common ancestor that is more recent than that between cluster members and non-cluster members.

Second, suppose that you have identified a very distinctive haplotype cluster that addresses the first concern. Suppose that every pair of haplotypes within this cluster is within a short genetic distance (e.g., 3) and very far from any other haplotype (e.g., more than 15). Is this sufficient to define a clade?

It is not, since you are not certain that you have sampled the relevant Y-chromosomes, i.e., those that bridge the gap between your cluster and other Y-chromosomes, revealing them to be part of a continuum, rather than distinct members of a particular clade.

There are several cases in which supposed clades were defined, e.g., if a marker has a value of 12 or 14 but no intermediate (13) values, only to be invalidated later on when chromosomes with intermediate values popped up.

So, while the first concern identifies the need for clusters to be tight and distinct, the second concern identifies the problem that tight and distinct clusters may be spurious due to incomplete sampling of the genetic continuum.

Third, suppose that you have identified a tight and distinct cluster, and that moreover you have extremely large and comprehensive samples that give you a strong degree of confidence in your cluster. Have you now identified a true clade of the Y-chromosome phylogeny?

The answer is still no, and the reason is the time symmetry of the mutation model of Y-STR loci. Consider the following Y-chromosome tree.


Nodes with capital letters are at most g=4 generations away from the clade founder. It is perhaps possible to devise a test that would be able to detect all these haplotypes as related. But any test that would identify these haplotypes as descendants of the "founder" node, despite 4 generations of mutations, would also erroneously identify all the lowercase nodes, which are also at most 4 generations away from the "founder", as members of the clade.

A haplotype cluster centered on a presumed founder who lived g generations ago will invariably include a set of Y chromosomes that do not form a clade.

Whereas a clade includes all the descendants of a single founder, a haplotype cluster will invariably include many men who are g generations away from the founder, whether they are his descendants or not.

Why are Y-STRs qualitatively different from UEPs? While a UEP at the founder defines a watershed moment, separating the founder's descendants (who possess the UEP derived state) from his other relatives (who do not), Y-STRs do not define such a moment: node "m", a cousin of the founder, will possess a haplotype that is 4-generations removed from the "founder", just as node "Q" who is a great great grandson. By looking at haplotypes it is impossible to distinguish between the two.
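The time symmetry can be made concrete by computing, under the symmetric stepwise model, the exact distribution of the allele difference at one marker between the founder and (i) a great great grandson, reached by four transmissions downward, and (ii) a cousin, reached by two transmissions up and two down. Treating "m" as a first cousin and using a rate of 0.0025 are assumptions made for this sketch.

```python
import math


def step(mu):
    """Change in repeat count over one father-to-son transmission."""
    return {-1: mu / 2, 0: 1 - mu, +1: mu / 2}


def negate(dist):
    """Distribution of the change when a transmission is traversed son-to-father."""
    return {-k: p for k, p in dist.items()}


def convolve(a, b):
    """Distribution of the sum of two independent changes."""
    out = {}
    for x, p in a.items():
        for y, q in b.items():
            out[x + y] = out.get(x + y, 0.0) + p * q
    return out


def path_distribution(steps):
    """Distribution of the total allele difference along a path of transmissions."""
    total = {0: 1.0}
    for s in steps:
        total = convolve(total, s)
    return total


MU = 0.0025  # assumed per-marker mutation rate
down = step(MU)
up = negate(down)

descendant = path_distribution([down, down, down, down])  # founder -> great great grandson
cousin = path_distribution([up, up, down, down])          # founder -> grandfather -> cousin

for diff in sorted(descendant):
    print(diff, descendant[diff], cousin[diff])
```

The two columns agree for every possible difference, which is the sense in which a haplotype alone cannot tell the descendant from the cousin.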

There is a practical reason why the distinction between haplotype clusters and clades is important, and this has to do with ancient DNA.

Suppose that a very old archaeological sample (of age A years) is Y-STR tested and reveals an R1b-like haplotype. Can we make the inference that this was a member of the R1b clade? No, since many (non-descendant) patrilineal relatives of the R1b founder would have similar haplotypes.

Are we justified in claiming that the founder of haplogroup R1b was earlier than A years? The answer is again no, as haplotypes similar to current R1b ones existed before R1b was founded.

How is this compatible with the known fact that haplogroups can be predicted from sufficiently long Y-STR haplotypes?

First, such predictions don't rely only on the Y-STR haplotypes, but also on a large number of haplotypes with known UEP results. Haplogroup prediction relies on UEPs and can't be made independently of UEPs.

Second, such predictions don't rely only on the Y-STR haplotypes, but also on the knowledge that they are present-day haplotypes (last row in the figure). Today, only the descendants of the clade founder survive in the haplotype cluster, but this is not necessarily true for earlier times.

Conclusion

Clades cannot be defined based on Y-STR haplotype clusters for several reasons, both practical and theoretical.

On the practical side, it is extremely difficult to define a clade using Y-STRs because haplotype clusters must be shown to be distinctive (clearly separated from other Y-chromosomes) and genuine (separated because of common descent, and not incomplete sampling).

But, even if a clear-cut genuine haplotype cluster is detected, it does not constitute a clade, since the time symmetry of Y-STR mutations necessitates that it will include (erroneously) non-descendant relatives of the founder.

There is nothing wrong with exploratory analysis of haplotype clusters, if one keeps in mind that such clusters are not and should not be thought of as clades of the Y-chromosome phylogeny.

October 13, 2008

Estimating TMRCA for a pair of Y-STR haplotypes using Average Squared Distance (ASD) cont'd

This is a continuation of On the use of average squared distance (ASD) to estimate the time to most recent common ancestor (TMRCA) of a pair of Y-STR haplotypes inspired by James Heald's interesting comments. I suggest that you read that post first if you want to make sense of this one.

Summary of first post

In the first post I studied via simulation, the distribution P(gest | g) where g is the real TMRCA, and gest = ASD(hta, htb)/2μest is its estimate via average squared distance of two haplotypes hta and htb.

It was shown that the expected value E[gest | g] = g, and also that gest varies by quite a lot around g.

E[g | gest]

What can we say about E[g | gest], i.e., the expected value of the TMRCA if we have an estimate of its age? In other words, the expected real age, given its apparent age.

To study this, I sample g from a prior distribution P(g) and determine for each sample (via simulation) the corresponding gest. Thus, for each gest I have a set {g1, g2, ..., gn} (real ages), whose apparent age is gest.

Then:

E[g | gest] = mean{g1, g2, ..., gn}

Figuring out the prior distribution P(g) is of course the real trick. Bruce Walsh suggests (Genetics 158: 897–912) using p(t) = λexp(-λt) with λ=1/Ne, where t is the TMRCA and Ne is the effective population size. He cites an estimate of Ne=5000 for humans, and finds that the posterior distribution of t is not very sensitive to the prior.

Various kinds of belief or evidence can be incorporated in the prior. For example, we can set P(g)=0 if g>2,500 since two Y-chromosomes cannot have an MRCA older than "Y-chromosome Adam" -- if we are convinced that 2,500 generations is an upper limit on the age of Y-chromosome Adam.

But, if we are comparing Y-chromosomes of patrilineal descendants of a known founder (e.g., Genghis Khan), then we can set P(g) = 0 if g>40. Conversely, if we are comparing an R1a and R1b haplotype then we can set P(g) = 0 for g less than e.g., 160 generations, since the MRCA of R1a and R1b must be older than the time in which R1a1 has been detected in ancient DNA.
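One way to encode such beliefs is to truncate the exponential prior to the range that is considered possible. The sketch below draws from p(t) = λexp(-λt) with λ = 1/Ne, restricted to [g_min, g_max], by inverse-CDF sampling. The function name and the decision to truncate the exponential prior in particular (rather than some other prior) are assumptions made for illustration.

```python
import math
import random


def sample_tmrca_prior(rng, ne=5000, g_min=0.0, g_max=2500.0):
    """Draw a TMRCA (in generations) from an exponential prior truncated to [g_min, g_max]."""
    lam = 1.0 / ne
    # CDF of the untruncated exponential at the two cut-offs.
    lo = -math.expm1(-lam * g_min)
    hi = -math.expm1(-lam * g_max)
    # Uniform draw between the two CDF values, mapped back through the inverse CDF.
    u = lo + (hi - lo) * rng.random()
    return -math.log1p(-u) / lam


rng = random.Random(0)
known_founder = [sample_tmrca_prior(rng, g_max=40) for _ in range(5)]      # e.g., a 40-generation cap
across_clades = [sample_tmrca_prior(rng, g_min=160) for _ in range(5)]     # e.g., MRCA older than 160 generations
print(known_founder)
print(across_clades)
```

Setting P(g) = 0 outside the allowed range is exactly what the truncation does; the shape inside the range is still the exponential one.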

The point is that different real ages can lead to the same observed apparent age. To go from the apparent age to the real one, we can use prior information about it, i.e., the P(g) distribution.

It is important to note that P(g) can be seen as our prior belief about the distribution of g, that is: given two Y-chromosomes, what is our guess about their TMRCA before we see their haplotypes?

But, it can also be seen as the actual distribution of g in the collection of haplotypes from which we have sampled a pair. This will be different e.g., in a rapidly expanding population vs. a static one, or in a population expanding early or late in its history, etc.

Simulation

In the following I use a prior P(g) that is Uniform(1, 2500). I take a million samples from P(g). The number of markers is 50, and the mutation rate is .0025. In the present post, I ignore the uncertainty about the mutation rate that was the topic of the previous post.

Note that my primary concern isn't to motivate this prior as realistic, since P(g) is dependent on prior knowledge (ancient DNA/population history) as mentioned in the previous section. Rather, my goal is to show (i) that E[g | gest] is not generally equal to gest, and (ii) that the choice of prior affects this quantity.

In the following I plot E[g | gest]/gest as a function of gest. For better visualization, and since a particular gest value may not be observed in the simulation, I group gest's into 100-generation long bins, i.e., I show the expected value of g, given that gest is between 1 and 100 generations, 101 and 200, and so on.
It is fairly obvious that E[g | gest] is not equal to gest. For small gest it is greater than gest, while for large gest it is smaller than gest.

It is easy to see why: consider for example ASD=0 and hence gest=0. ASD=0 is compatible with many real ages, all of which are greater than or equal to 1. Hence gest=0 is clearly an underestimate of the real age. On the other hand, under this prior an apparent age of 2,500 generations corresponds to real ages less than or equal to 2,500 generations, and hence the expected real age is smaller than the apparent age.

Now, consider a different prior: Uniform(1, 1000).
Or Uniform(200, 2500):
It is clear that E[g | gest] is not generally equal to gest and moreover depends on P(g).
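The experiment can be sketched in a few dozen lines. This is not the code used for the plots: it is a minimal re-implementation under the stated settings (symmetric stepwise model, 50 markers, μ = .0025, a Uniform prior on g), with bins of the form [0, 100), [100, 200), and so on, and with function names chosen for illustration.

```python
import math
import random


def simulate_asd(g, n_markers, mu, rng):
    """ASD of two haplotypes whose MRCA lived g generations ago (symmetric stepwise model)."""
    per_marker = 2 * g                  # transmissions separating the pair at one marker
    total = per_marker * n_markers      # all marker-transmissions laid end to end
    diffs = [0] * n_markers
    log_q = math.log(1.0 - mu)
    pos = -1
    while True:
        # Jump straight to the next mutated transmission (geometric waiting time).
        pos += 1 + int(math.log(1.0 - rng.random()) / log_q)
        if pos >= total:
            break
        diffs[pos // per_marker] += 1 if rng.random() < 0.5 else -1
    return sum(d * d for d in diffs) / n_markers


def real_vs_apparent(n_samples, g_lo, g_hi, n_markers=50, mu=0.0025,
                     bin_width=100, seed=0):
    """For each g_est bin: (mean real g, mean apparent g_est, number of samples)."""
    rng = random.Random(seed)
    sums = {}
    for _ in range(n_samples):
        g = rng.randint(g_lo, g_hi)     # draw the real age from the Uniform prior
        g_est = simulate_asd(g, n_markers, mu, rng) / (2 * mu)
        acc = sums.setdefault(int(g_est // bin_width), [0.0, 0.0, 0])
        acc[0] += g
        acc[1] += g_est
        acc[2] += 1
    return {b: (s[0] / s[2], s[1] / s[2], s[2]) for b, s in sorted(sums.items())}
```

Calling `real_vs_apparent(20000, 1, 2500)` and dividing the first entry of each bin by the second gives a rough version of the E[g | gest]/gest curve; swapping the bounds for (1, 1000) or (200, 2500) shows the dependence on the prior. A pure-Python run with a million samples is slow, so a smaller sample count is a practical compromise.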

Conclusion

To put the conclusions succinctly:
  • If the age g of the common ancestor is known, then ASD is expected to be 2μg.
  • If ASD is known, then the expected age of the common ancestor is not, in general, ASD/2μ.
  • The expected age (given an ASD value) varies with ASD, and the way in which it varies depends on the prior estimate of the TMRCA.
In short, expected ASD grows linearly with age; expected age does not grow linearly with ASD.

This adds an extra level of uncertainty about the TMRCA, namely population history. It does not seem to be the case that an unbiased estimate of the TMRCA can be obtained from ASD.

Of course, the way forward is, once again, to increase nm and pin down the mutation rate more accurately. This will increase the effect of the "evidence" in a Bayesian analysis, making it less susceptible to the background knowledge or belief represented by the prior.

October 05, 2008

On the use of average squared distance (ASD) to estimate the time to most recent common ancestor (TMRCA) of a pair of Y-STR haplotypes

UPDATE: See second part here.

In this post I study the effectiveness of average squared distance (ASD) to estimate the age of the most recent common ancestor (MRCA) of a pair of Y-STR haplotypes.

Each haplotype is a vector of nm allele values:

hta = (a1 a2 ... anm)
htb = (b1 b2 ... bnm)

The average squared distance is defined as:

ASD(hta, htb) = [(a1-b1)^2 + (a2-b2)^2 + ... + (anm-bnm)^2]/nm

If the MRCA lived g generations ago, then the expected value of ASD(hta, htb) is:

E[ASD(hta, htb)] = 2μg

where μ is the Y-STR mutation rate; the symmetric stepwise mutation model is assumed, in which a Y-STR allele increases or decreases by 1 repeat per generation with a probability of μ/2 each.
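For readers who want the intermediate step, the expectation follows from adding up the variances of the individual transmissions. In the sketch below, X_k denotes the change in repeat count at the k-th of the 2g transmissions that separate the two haplotypes at marker i; independence between transmissions is assumed, and n_m is the number of markers (nm in the text).

```latex
\begin{aligned}
a_i - b_i &\overset{d}{=} \sum_{k=1}^{2g} X_k,
  \qquad P(X_k = +1) = P(X_k = -1) = \tfrac{\mu}{2}, \quad P(X_k = 0) = 1 - \mu, \\
E[X_k] &= 0, \qquad \operatorname{Var}(X_k) = E[X_k^2] = \mu, \\
E\big[(a_i - b_i)^2\big] &= \sum_{k=1}^{2g} \operatorname{Var}(X_k) = 2\mu g, \\
E\big[\mathrm{ASD}(ht_a, ht_b)\big] &= \frac{1}{n_m} \sum_{i=1}^{n_m} E\big[(a_i - b_i)^2\big] = 2\mu g .
\end{aligned}
```

The first line uses the symmetry of the model: the g steps leading to one haplotype enter with a minus sign, but a negated step has the same distribution as a step.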

The above equation allows us to estimate g as:

gest = ASD(hta, htb)/2μest (Eq. 1)

Where, μest is the estimated mutation rate. This post investigates how accurate gest is.

Methodology

Two independent g-generation long chains of Y-chromosome transmissions are simulated, leading to two present-day haplotypes. Haplotypes are nm-marker long.

Each marker has an estimated mutation rate μest=0.0025. This estimate is assumed to be derived from direct observation on nfs father-son pairs. Hence, each marker mutates with a real mutation rate that is binomially distributed according to Binomial(nfs, μest)/nfs.

Estimates of the mean gest, its standard deviation and the 95% C.I. (2.5-97.5%) are presented over 10,000 simulation runs.

Results are presented for nm=10 or 50, to represent typical values for a research paper or commercial genealogical samples, respectively, and with nfs=1000 or 10000. The mutation rate of one of the most studied Y-STR loci, DYS19 is based on 9,390 observations as of this writing, and many other markers have established mutation rates based on a much lower number of samples.
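The procedure can be sketched as follows. This is a minimal re-implementation for illustration, not the original simulation code; in particular, redrawing each marker's real rate in every run and the simple percentile rule are assumptions.

```python
import math
import random


def count_successes(n, p, rng):
    """Binomial(n, p) draw via geometric skips between successes (fast when p is small)."""
    if p <= 0.0:
        return 0
    log_q = math.log(1.0 - p)
    pos, k = -1, 0
    while True:
        pos += 1 + int(math.log(1.0 - rng.random()) / log_q)
        if pos >= n:
            return k
        k += 1


def net_change(g, rate, rng):
    """Net change in repeat count along one g-generation chain (each mutation is +1 or -1)."""
    mutations = count_successes(g, rate, rng)
    return sum(1 if rng.random() < 0.5 else -1 for _ in range(mutations))


def simulate_g_est(g, n_markers, n_fs, mu_est, rng):
    """One run: two independent chains, ASD of the resulting haplotypes, then Eq. 1."""
    total = 0
    for _ in range(n_markers):
        # Real rate of this marker, given that mu_est came from n_fs father-son pairs.
        real_rate = count_successes(n_fs, mu_est, rng) / n_fs
        d = net_change(g, real_rate, rng) - net_change(g, real_rate, rng)
        total += d * d
    return (total / n_markers) / (2 * mu_est)


def summarize(g, n_markers, n_fs, mu_est=0.0025, runs=10000, seed=0):
    """Mean, standard deviation and 2.5%-97.5% range of g_est over many runs."""
    rng = random.Random(seed)
    est = sorted(simulate_g_est(g, n_markers, n_fs, mu_est, rng) for _ in range(runs))
    mean = sum(est) / runs
    sd = math.sqrt(sum((x - mean) ** 2 for x in est) / (runs - 1))
    return mean, sd, est[int(0.025 * (runs - 1))], est[int(0.975 * (runs - 1))]
```

For example, `summarize(100, 10, 1000)` corresponds to the settings g=100, nm=10, nfs=1000. Exact figures will vary with the random seed and with the assumptions noted above.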

Results

The following table summarizes the simulation results:

g nm nfs Mean(gest) s.d. (gest) 95% C.I.
100 10 1000 99 71 20-280
200 10 1000 201 131 40-540
300 10 1000 301 190 60-780
400 10 1000 399 243 80-1000
500 10 1000 500 298 120-1240
600 10 1000 601 354 140-1500
100 10 10000 101 65 20-260
200 10 10000 202 112 60-480
300 10 10000 302 159 80-700
400 10 10000 399 204 120-900
500 10 10000 499 248 140-1120
600 10 10000 601 297 180-1320
100 50 1000 100 33 48-176
200 50 1000 201 58 108-332
300 50 1000 301 84 164-492
400 50 1000 401 110 224-648
500 50 1000 500 133 276-792
600 50 1000 601 160 336-960
100 50 10000 100 28 52-164
200 50 10000 200 50 116-308
300 50 10000 299 71 176-456
400 50 10000 400 91 244-600
500 50 10000 500 113 308-744
600 50 10000 601 132 372-892

It is evident that:
  • The age estimate (Eq. 1) is unbiased, as Mean(gest) is quite close to the real g
  • The standard deviation of the age estimate increases with g in absolute value, but decreases in relative value (s.d. (gest)/g).
  • The standard deviation of the age estimate decreases with both nm (more markers) and nfs (better estimate of the mutation rate)
  • Even for nm=50 and nfs=10000, there is considerable uncertainty about the TMRCA. For example, a 300-generation most recent common ancestor can appear to be as young as 176 generations or as old as 456 generations, a span of 280 generations. If we add our uncertainty about generation length (e.g., 25 or 30 years), the range runs from 176 × 25 = 4,400 years to 456 × 30 = 13,680 years, a span of 9,280 years that stretches from the Bronze Age to the Upper Paleolithic.

Discussion

While ASD provides an unbiased estimator of TMRCA for a pair of haplotypes, it can provide -at present- a very imperfect estimate because of:
  1. Stochasticity of the mutation process itself
  2. Inaccurate knowledge of the mutation rate
  3. Inaccurate knowledge of the generation length

The age estimate is, in fact, probably even worse, since the current simulation did not take into account:
  1. Deviations from the stepwise symmetric mutation model (multi-step increases/decreases in number of repeats)
  2. Lineage or allele-dependent mutation rate

Conclusion

In a few years, when every bit of variable DNA on the Y-chromosome will be sequenced routinely, including Y-STRs, Y-SNPs, and indel polymorphisms, it will be possible to provide better TMRCA estimates for a pair of Y-chromosomes. For Y-STRs it is important to determine the mutation rate in even larger samples than are currently available (~10,000).

There will always be some residual uncertainty, e.g., because we will never be able to determine the generation length for prehistoric cultures. However, our estimates are likely to be much better than the ones possible today, which are really not much better than guesses.

It is important to be skeptical of the narrow confidence intervals associated with many published age estimates. The assumptions on which these intervals are based are rarely stated explicitly, and they may assume (inappropriately) that only one type of uncertainty (of at least five types; see Discussion) is at play.

September 18, 2008

Erratum on my Y-STR variance work (which, surprisingly further supports its central thesis)

I realized today that there was a central error in my assumptions throughout the entire Y-STR series. This error relates to the condition I used to detect whether a man was a Most Recent Common Ancestor:

Lineages start with one man, and over the generations grow (or shrink) in size as men have (or fail to have) sons. Previously, I considered that there was a new MRCA in the lineage if the number of descendants in one generation was reduced to 1 man.

So, for example let these be the number of patrilineal descendants in the first few generations:

Generation # of descendants
0 1 (Patriarch)
1 3
2 4
3 6
4 3
5 1
6 3
7 2


I concluded that the "Patriarch" (at generation 0) was not the MRCA (correct) and that the new MRCA was the single surviving man in generation 5 (not necessarily correct).

It is true that at generation 5, the man becomes the new MRCA, since no other patrilineal cousins of his exist at that time.

BUT, it is also possible for e.g., one of the 3 men of generation 6 to be an MRCA. All it takes is for the next generation (7) to be produced entirely by him.

So: the single survivor at generation 5 has three sons in generation 6, thus he is temporarily the MRCA. However, only one of his three sons produces the two men of generation 7, and thus this son becomes the new MRCA.

In retrospect this is an obvious mistake to make, and is the result of using a sufficient condition for an MRCA (the lineage reduced to one man) which is not however necessary for a man to be an MRCA.
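The difference between the two conditions is easy to see on the last three generations of the example. In the sketch below the men are given hypothetical names: S is the single survivor of generation 5, A, B and C are his sons in generation 6, and B1 and B2 are the two men of generation 7, both sons of B.

```python
def ancestors(man, father):
    """Patrilineal chain from a man back to the oldest known ancestor, starting with himself."""
    chain = [man]
    while man in father:
        man = father[man]
        chain.append(man)
    return chain


def mrca(living, father):
    """Most recent man who is on the patrilineal chain of every living man."""
    common = set(ancestors(living[0], father))
    for man in living[1:]:
        common &= set(ancestors(man, father))
    # Walking up one chain, the first shared man is the most recent one.
    return next(a for a in ancestors(living[0], father) if a in common)


# son -> father, hypothetical genealogy for generations 5-7 of the example
father = {"A": "S", "B": "S", "C": "S", "B1": "B", "B2": "B"}

print(mrca(["A", "B", "C"], father))  # S: MRCA as of generation 6
print(mrca(["B1", "B2"], father))     # B: MRCA as of generation 7
```

The old rule (the last generation with exactly one man) would still report S at generation 7, one generation older than the real MRCA, B.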

What are the consequences?

I will be issuing a bug fix for the Y-chromosome Microsatellite Genealogy Simulator over the next few days. But, it's interesting to see how this error affects the story I've been defending throughout the posts in the Y-STR series, namely that Y-STR variance accumulates at near the germline mutation rate for large observable present-day haplogroups.

For any particular lineage, the MRCA I calculated so far was no younger, and sometimes older than the real one. Consequently, Y-STR variance has accumulated at an even faster rate since the time of the real MRCA, even closer to the germline rate.

At this point I am not in a position to determine how big this effect will be, and I will repeat and report some of my earlier experiments. A few exploratory runs so far have revealed a small but not insignificant increase in the effective mutation rate.

Stay tuned...

UPDATE #1

I carried out some experiments to see how different in age the previously calculated "MRCA" and the real MRCA are. To differentiate between the two, I will call the previous "MRCA" a "Near Extinction" event, in which a lineage is reduced to one man in one generation. The real MRCA is simply a man of the lineage who produces the entire next generation.

As I mentioned in the very first post of this series, at the MRCA, variance is reset to 0; thus variance is of no use to determine the length of time that has elapsed between the Patriarch and the MRCA. This time can be estimated, however, from demographic considerations.

g m TMRCA s.d TNearExtinct s.d
50 1 40 16 43 13
50 1.05 44 10 46 9
100 1 86 26 91 22
100 1.05 95 10 97 7

It can be seen that the TMRCA is fairly close to the TNearExtinct, and that both are fairly close to the Patriarch, whose time is g. This closeness (in relative terms) increases as both g and m (the average number of sons/man) increase. Both TMRCA and TNearExtinct can be quite variable, as evidenced by their respective s.d.'s, but this variability decreases as m increases.
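A comparison of this kind can be sketched with a simple branching process. The code below is not the Y-chromosome Microsatellite Genealogy Simulator: it assumes a Poisson number of sons with mean m, keeps only lineages that survive g generations, and measures both times in generations before the present, so it should not be expected to reproduce the table's figures exactly.

```python
import math
import random


def poisson(lam, rng):
    """Poisson draw by Knuth's multiplication method (adequate for small lam)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def simulate_lineage(g, m, rng):
    """Father indices for generations 1..g, or None if the lineage dies out first.

    fathers[t][i] is the index, within generation t, of the father of man i
    in generation t + 1. Generation 0 is the Patriarch alone.
    """
    fathers, size = [], 1
    for _ in range(g):
        sons = [i for i in range(size) for _ in range(poisson(m, rng))]
        if not sons:
            return None
        fathers.append(sons)
        size = len(sons)
    return fathers


def lineage_ages(fathers):
    """(T_MRCA, T_NearExtinct), both in generations before the present."""
    g = len(fathers)
    sizes = [1] + [len(f) for f in fathers]
    # Old rule: the latest generation in which the lineage consisted of one man.
    t_near = g - max(t for t, n in enumerate(sizes) if n == 1)
    # Real MRCA: trace the ancestors of all living men back until they coincide.
    alive, t = set(range(sizes[g])), g
    while len(alive) > 1:
        alive = {fathers[t - 1][i] for i in alive}
        t -= 1
    return g - t, t_near


def surviving_runs(g, m, runs, seed=0):
    """Ages for `runs` lineages that survived to the present."""
    rng = random.Random(seed)
    out = []
    while len(out) < runs:
        fathers = simulate_lineage(g, m, rng)
        if fathers is not None:
            out.append(lineage_ages(fathers))
    return out
```

In every run the real MRCA is no older than the Near Extinction event, because the single man of a one-man generation is necessarily an ancestor of everyone alive today. Averaging the two columns of `surviving_runs(50, 1.0, 1000)` gives a rough analogue of the first row of the table, under the Poisson assumption.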

UPDATE #2

I have repeated my experiments on the effective mutation rate. You can compare the results directly, but it's obvious that a discrepancy exists, and only to a small degree, for m=1; the results are almost identical otherwise. Thus, the problem identified in this erratum does not appear to affect my past inferences about TMRCA and its relationship to Y-STR variance by much.

g m Size Var ASD μ/wv μ/wa
50 1.00 32 0.043 0.058 2.9 2.2
50 1.01 41 0.047 0.062 2.6 2.0
50 1.02 53 0.052 0.068 2.4 1.8
50 1.03 72 0.057 0.072 2.2 1.7
50 1.04 98 0.061 0.077 2.1 1.6
50 1.05 135 0.065 0.080 1.9 1.6
50 1.10 813 0.086 0.099 1.5 1.3
100 1.00 58 0.081 0.106 3.1 2.4
100 1.01 99 0.097 0.123 2.6 2.0
100 1.02 185 0.118 0.147 2.1 1.7
100 1.03 360 0.135 0.163 1.9 1.5
100 1.04 757 0.153 0.180 1.6 1.4
50 1.00 31 0.043 0.058 2.9 2.2
50 1.01 41 0.048 0.063 2.6 2.0
50 1.02 54 0.052 0.067 2.4 1.9
50 1.03 73 0.058 0.074 2.2 1.7
50 1.04 97 0.064 0.081 2.0 1.5
50 1.05 134 0.065 0.081 1.9 1.5
50 1.10 813 0.085 0.098 1.5 1.3
100 1.00 58 0.082 0.108 3.0 2.3
100 1.01 99 0.101 0.130 2.5 1.9
100 1.02 184 0.120 0.151 2.1 1.7
100 1.03 362 0.135 0.163 1.9 1.5
100 1.04 749 0.154 0.181 1.6 1.4
100 1.05 1593 0.168 0.193 1.5 1.3
200 1.00 111 0.162 0.208 3.1 2.4
200 1.01 347 0.228 0.280 2.2 1.8
200 1.02 1443 0.293 0.342 1.7 1.5
200 1.03 7132 0.350 0.391 1.4 1.3
200 1.04 38123 0.387 0.421 1.3 1.2
400 1.00 212 0.300 0.377 3.3 2.7
400 1.01 2767 0.589 0.670 1.7 1.5
400 1.02 78149 0.768 0.825 1.3 1.2