May 28, 2010

Accuracy of molecular dating with the rho statistic questioned (Cox et al. 2008)

The rho statistic was proposed in 1996 by Forster et al. The paper by Cox quoted below addresses its limitations, pointing out that we should take age estimates with a grain of salt. Rho is primarily used with mtDNA, but it has been used with Y chromosomes as well.
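For readers who have not met it, rho is the average number of mutations separating the sampled sequences from the root (ancestral) haplotype, and an age follows from multiplying it by an assumed number of years per mutation. Here is a minimal Python sketch of that arithmetic. The function name `rho_age`, the example mutation counts, and the 5,000 years per mutation are all hypothetical, and the standard error uses the simplest case of a perfectly star-like genealogy, which is an assumption of mine and not something taken from the paper.

```python
import math


def rho_age(mutation_counts, years_per_mutation):
    """Return (rho, se_rho, age, se_age) for one lineage.

    mutation_counts: mutations separating each sampled sequence from
        the root haplotype (hypothetical input format).
    years_per_mutation: assumed calibration, i.e. the average number
        of years for one mutation to arise in the region studied.
    """
    n = len(mutation_counts)
    if n == 0:
        raise ValueError("need at least one sampled sequence")

    # rho: average mutational distance from the root haplotype
    rho = sum(mutation_counts) / n

    # Standard error under an ASSUMED perfectly star-like genealogy,
    # where the variance of rho reduces to rho / n. Real genealogies
    # have shared branches, so this is a simplification.
    se_rho = math.sqrt(rho / n)

    # Convert mutational units into years
    age = rho * years_per_mutation
    se_age = se_rho * years_per_mutation
    return rho, se_rho, age, se_age


# Hypothetical sample: four sequences, 1, 2, 0 and 1 mutations from the root
rho, se_rho, age, se_age = rho_age([1, 2, 0, 1], 5000.0)
print(rho, se_rho, age, se_age)
```

Note that nothing in this calculation knows about demography: the interval comes only from the mutation counts and the sample size, with the calibration treated as exact.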

I have argued before about the influence of population demography on molecular age estimation methods, as well as the generally large confidence intervals associated with such methods. Quite often papers ignore demography when calculating confidence intervals, yet mutation rate uncertainty is only one contributor: population size dynamics, generation length, and other factors also stretch confidence intervals wide.

In general, there is usually a lack of knowledge about demography; some models are patently false, e.g., models of constant population size are incompatible with lineages that have millions of extant descendants. While I believe that in the last few thousand years population growth has been more or less continuous, with short fluctuations that have negligible effects on diversity, things become more complex as we move further into the past: population sizes shrink and stochastic factors take the upper hand.

We've seen time and again that inferences from modern populations don't jibe well with prehistoric DNA samples, so any paper that points out the limitations of such inferences is a welcome addition to the literature.

Human Biology 80(4):335-357. 2008
doi: 10.3378/1534-6617-80.4.335

Accuracy of Molecular Dating with the Rho Statistic: Deviations from Coalescent Expectations Under a Range of Demographic Models

Murray P. Cox

Abstract

The ρ statistic is commonly used to infer chronological dates for molecular lineages, especially from mitochondrial DNA sequences obtained in anthropological contexts. Since this approach was described 12 years ago, it has been applied to estimate molecular dates in more than 200 studies, including some published in top-tier journals. However, this method has not been well evaluated, and the accuracy of dates obtained from the ρ statistic remains unknown, especially for genetic data collected from populations with complex demographic histories. Here, molecular dates inferred from ρ are compared against coalescent expectations from a range of demographic models. This exercise reveals considerable inaccuracy. Molecular dates based on ρ have a slight downward bias with large asymmetric variance and commonly exhibit substantial type I error rates, where the true age of a lineage falls outside the 95% confidence bounds derived from the variance of ρ. Furthermore, demography proves to be a strong confounding factor in estimating molecular dates accurately, especially for populations in which bottlenecks, founder events, and size changes have played important historical roles. Therefore considerable caution should be applied to inferences made from molecular dates based on the ρ statistic, many of which may be misleading and warrant considerable

2 comments:

Marcel F. Williams said...

Anyone who takes any molecular clock date seriously is fooling themselves.

McG said...

As Dienekes says, recent population growth has been relatively undisturbed by major climatic changes. Certainly, war has affected some populations more than others. I have used pedigree mutation rates on some sets of data 2K years old or less. I find that history and the dates obtained correlate well. However, when I try to make longer-term estimates, I feel like time is compressed. I know that Dienekes doesn't agree with the analysis of ZUV et al. Yet, I feel something like a factor of 2 to 3 is needed to expand time.