November 05, 2008

Time-dependency of the human mtDNA evolutionary mutation rate

UPDATE (18/11): Some more thoughts on this topic in a newer post [end update]

I had planned for some time to write a post titled "On the difficulty of archaeological calibration of the mutation rate". My goal was to follow up my criticism of the proposed explanation for a supposedly lower evolutionary rate with a criticism of the alleged fact that such a lower rate is proven by archaeological calibration.

So, I was pleasantly surprised to see a new paper which allows me to frame my thoughts in a concrete context. The new paper's purpose is precisely this: to calibrate the evolutionary mutation rate archaeologically.


A molecular clock typically works according to this generic equation:

TIME = VARIATION / MUTATION RATE

Some measure of VARIATION is obtained in the present time (e.g., the ρ or π statistics in this paper), and some estimate of the MUTATION RATE is established. This allows us to calculate an estimate of TIME.

How is the MUTATION RATE established? Either by direct measurement, i.e., by looking at relatives and seeing how their genotypes actually differ from each other, or by calibration, whereby present-day VARIATION is measured for events with (supposedly) known TIME, leading to an estimate of the MUTATION RATE as VARIATION/TIME.


Two typical examples of calibration go like this:

  1. Humans and Chimps split T million years ago. The amount of human-chimp differentiation is D. Therefore, the human-chimp divergence rate was on average D/T over the time period of T years.
  2. The ancestors of Native Americans arrived T thousand years ago. Variation within one of their founding lineages (e.g., mtDNA haplogroup D1) is D. Therefore, variation increased at a rate of D/T over the time period of T years.
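Both calibrations reduce to the same arithmetic. A minimal sketch with invented numbers (the function names and all values are illustrative, not taken from the paper):

```python
# Toy illustration of archaeological calibration (all numbers invented).

def calibrated_rate(divergence, calibration_time):
    """Average rate inferred as observed divergence / assumed event age."""
    return divergence / calibration_time

# 1. Human-chimp split: D = 0.15 substitutions/site over T = 6 million years
human_chimp_rate = calibrated_rate(0.15, 6_000_000)

# 2. Haplogroup D1: rho = 2.4 mutations over T = 15,000 years since arrival
d1_rate = calibrated_rate(2.4, 15_000)

# The calibrated rate is then reused to date *other* events:
def date_event(variation, rate):
    return variation / rate

print(date_event(3.6, d1_rate))   # ≈ 22,500 years
```

Everything downstream of such a date rests on the assumed calibration time T, which is exactly where the trouble starts.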
What is the problem with this kind of calibration? First of all, the supposed calibration time T may not be generally known. The last common ancestor of Homo and Pan hasn't been dug up yet. At most, we have some fossils believed to be ancestral to both genera, but these are (by definition) older than the most recent common ancestor. We don't really have a secure time for T, only an iffy upper bound.

But, consider better-established archaeological events like the arrival of the Native Americans, which is based on carbon dating and surveys of many sites across the continent. This, apparently, gives us a secure T. Or does it?

There are at least two reasons why it does not: demography and selection.

The arrival of Europeans in the New World is perfectly known: it started in 1492. Earlier arrivals such as the Vikings do not appear to have made a lasting contribution. Yet, if we calculate diversity within European-origin haplogroups in the New World, we will find that they are pretty much as diverse as they are in Europe. Why? Because the migration involved a large number of migrants.

If, a few thousand years from now, after a collapse and rebirth of human civilization, geneticists look at the genes of Americans of that future time, they might conclude that Amerindians and Europeans arrived on the continent at the same time, or even that lineages of the latter (e.g., mtDNA haplogroup U5) preceded those of the former (e.g., mtDNA haplogroup D1).

But, suppose that a limited number of migrants arrive at the archaeologically calibrated time, i.e., a "founder effect", e.g., a single mtDNA D1 "mother" arrived in the New World T years ago.

How do we know that she is the most recent common ancestor of present-day D1 women in the New World?

We know that she is a common ancestor, but not necessarily the most recent common ancestor (MRCA). Indeed, if selection is at play, then particular lineages overwhelm those of their relatives, and a D1 woman with an advantageous lineage, who lived long after the first D1 woman, may be the real MRCA.

Thus, archaeological calibration is often an illusion: lineages may appear to be older or younger than the calibration time, depending on the population's demography and the effects of selection.


The authors of this study take for granted that there is a discrepancy between the mutation rates obtained by direct measurement and by calibration (germline vs. evolutionary). Yet, a recent paper paints quite a different picture, finding no difference between the two rates. How does it arrive at such a conclusion? By looking directly at the gene pool of a population at two different points in time (using ancient DNA), rather than relying on calibration.


These criticisms aside, the current paper does have some important implications about the evolutionary rate. Its central idea is that the molecular clock works like this:

TIME = VARIATION / MUTATION RATE(TIME)

i.e., that the MUTATION RATE is itself a function of time. The above equation, like its previous simpler version, allows us to estimate TIME from VARIATION. But it introduces an additional complication: how does the evolutionary mutation rate vary across different time scales?
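With a rate that depends on the age of the event, the clock becomes an implicit equation that can be solved iteratively. A sketch with an invented, decaying rate curve (this is not the curve fitted in the paper; the function and all constants are hypothetical):

```python
# Solve t = variation / mu(t) by fixed-point iteration, for an invented
# rate curve mu(t) that decays with age (NOT the curve fitted in the paper).

def mu(t):
    """Hypothetical evolutionary rate: pedigree-like for recent events,
    halving linearly by 50,000 years ago (illustrative numbers)."""
    pedigree_rate = 4e-4  # mutations/year, illustrative
    return pedigree_rate * (1.0 - 0.5 * min(t, 50_000) / 50_000)

def date_with_varying_rate(variation, iterations=100):
    t = variation / mu(0)          # start from the pedigree-rate estimate
    for _ in range(iterations):    # iterate t <- variation / mu(t)
        t = variation / mu(t)
    return t

# A decaying rate pushes the estimate older than the naive pedigree-rate date:
naive = 8.0 / mu(0)
print(naive, date_with_varying_rate(8.0))
```

With these made-up numbers the iteration converges to roughly 27,600 years for a lineage the pedigree rate would date to about 20,000, illustrating how the correction grows with age.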

This, in itself, is a step in the right direction. For example, for human Y-chromosomes a slower evolutionary rate was proposed by Zhivotovsky et al. (2004) (pdf) based on the ~1,000-year histories of Bulgarian Gypsies and Polynesians, while a very different rate was proposed by Forster et al. (2000), using a different calibration based on Native American prehistory. Yet, the more widely used Zhivotovsky rate has been applied to age both 2,700- and 60,000-year-old haplogroups, as if it were equally applicable on both time scales.

The authors of this paper make some interesting comments:
Genealogy-based rate estimates between 2,500-5,000ya are indistinguishable from pedigree-based mutation rate estimate (tables 1-2).
So, for events of the recent past, to at least the Bronze Age, "pedigree" and "evolutionary" rates are indistinguishable. Lineages that appear to be of Bronze Age origin based on the pedigree rate, are indeed that old, and not Neolithic or Paleolithic as might be predicted by use of a slower "evolutionary" rate.

The time period between 5,000ya and ~15,000ya represents both a break in our dataset and a sudden decline in estimated mutation rate (fig. 2). Estimates for the intermediate period roughly 15,000ya are about 40% lower than estimates calibrated on arrival dates less than 5,000ya
This is a break in their dataset because, well, they don't have any calibration points between 5-15ky. Thus, their estimate is nothing more than an extrapolation based on calibration points younger than 5ky or older than 15ky. But still, their estimates for even 15ky (which upper bounds all the Neolithic and Mesolithic for humans) are only 40% lower than using pedigree rates. Thus, even if we accept this extrapolation, ages between 5 and 15ky using the pedigree rate may be somewhat underestimated, but certainly not by a huge factor.
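A quick back-of-the-envelope check shows how bounded the possible error is. If the true rate is 40% below the pedigree rate, ages computed with the pedigree rate are too young by at most a factor of 1/0.6; the 9,000-year example below is invented for illustration:

```python
# If the true rate is 40% below the pedigree rate, an age computed with the
# pedigree rate understates the true age by at most this correction factor:
pedigree_rate = 1.0                      # arbitrary units
reduced_rate = pedigree_rate * (1 - 0.40)
correction = pedigree_rate / reduced_rate
print(round(correction, 2))              # -> 1.67

# e.g. a lineage dated to 9,000 years with the pedigree rate would be at most:
print(round(9_000 * correction))         # -> 15000
```

A correction of at most ~1.67x is a far cry from the near order-of-magnitude discrepancy between the pedigree and human-chimp calibrations.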

Using simulation, Zhivotovsky, Underhill and Feldman (2006) showed that microsatellite mutation rates estimated from small populations (haplogroups) undergoing serial bottlenecks are indeed reduced compared to pedigree-based rates.

The key word here is small, which is why using such reduced rates for haplogroups that consist of tens of millions of men is nonsense.
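A toy drift simulation makes the point concrete. This is not Zhivotovsky, Underhill and Feldman's microsatellite model, just a minimal Wright-Fisher sketch with invented parameters: in a small constant-size group, pairwise diversity reflects the (recent) MRCA of the survivors, not the founding date, so a rate calibrated as diversity divided by founding time comes out far below the true per-lineage rate.

```python
import random

# Toy Wright-Fisher simulation (NOT the ZUF microsatellite model): a constant,
# small haploid population with infinite-sites mutation. It shows why diversity
# in a small group understates the time elapsed since founding.

def simulate_pi(pop_size, generations, mu, seed=0):
    """Return mean pairwise difference (pi) after `generations` generations."""
    rng = random.Random(seed)
    next_id = 0                              # unique label for each new mutation
    pop = [frozenset() for _ in range(pop_size)]
    for _ in range(generations):
        new_pop = []
        for _ in range(pop_size):
            hap = set(rng.choice(pop))       # inherit from a random parent
            if rng.random() < mu:            # add a never-before-seen mutation
                hap.add(next_id)
                next_id += 1
            new_pop.append(frozenset(hap))
        pop = new_pop
    diffs = [len(a ^ b) for i, a in enumerate(pop) for b in pop[i + 1:]]
    return sum(diffs) / len(diffs)

mu, T = 0.05, 2000
pi = simulate_pi(pop_size=20, generations=T, mu=mu)
# A calibrator dividing observed diversity by the founding date T infers a
# rate far below the true per-lineage rate mu, because drift keeps pruning
# the genealogy back to a much younger MRCA:
print("apparent rate:", pi / (2 * T), " true rate:", mu)
```

The effect vanishes as the haplogroup grows large, which is exactly why a rate derived for small drifting groups should not be applied to haplogroups carried by tens of millions of people.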

The striking difference between our mutation rate estimates from before and after 20,000 years ago suggests that demographic history may play an important role. Prior to the Last Glacial Maximum lasting between 22,000-15,000ya, human populations were characterized by small, mobile hunter-gatherer groups that may have been frequently subject to fluctuations in population size. Following the LGM, humans experienced far fewer climatic swings (Mithen 2004). Particularly after the Younger Dryas cycle, agriculture facilitated dramatic population growth and serial bottlenecks were unlikely to substantially reduce the diversity of such large populations. A dramatic increase in population size during the Neolithic period is supported by mtDNA genomes from African, southeastern Asian, and European populations (Gignoux C, Henn B, unpublished data). We propose that a population history consisting of serial bottlenecks followed by recent population growth currently provides the most compelling explanation for the time-dependency of human hypervariable and coding region mtDNA mutation rate estimates.
In simple English, they propose that the evolutionary mutation rate for large Neolithic populations was close to the pedigree rate, while for the smaller pre-Neolithic hunter-gatherer groups it was much lower.

Which is pretty much what I've been saying for the last few months:
But, if you read Zhivotovsky, Underhill and Feldman (2006) or my two previous posts on the subject, you will realize that the effective rate depends on population history; that the 0.00069 rate is derived for constant-sized populations where haplogroups never grow to large numbers.
Of course, if one studies numerically small populations, it is possible that a slower effective rate may be desired. My concern is with the large human populations (e.g. Greeks or Indians) where real haplogroup sizes exceed greatly those produced by simulations with reproductive equality.
Z.U.F. have also proposed two additional demographic scenaria under which a higher effective mutation rate would be observed:
  • A sudden jump in the size of the haplogroup after it appears
  • An expanding population (m>1)
Both factors seem reasonable for post-Holocene human populations. It is well known that -whatever temporary setbacks there were- mankind has overall experienced a substantial population growth in recent millennia. Thus, an expanding population seems like a fair assumption.
Let's hope that -if nothing else- this paper will be the beginning of the end for the indiscriminate use of "evolutionary rates" across different time spans and marker systems. I am not counting on this happening any time soon, given the substantial intellectual inertia of the field.


I have expressed my reservations about the difficulty of "archaeological calibration" of the mutation rate. These thoughts pertain to the difficulty of obtaining a valid calibration point due to demography and selection. They also find indirect support from calibration of the mutation rate via ancient DNA.

Nonetheless, even within a calibrationist framework, this paper shows that human mtDNA over the last 5,000 years has accumulated variation at the germline (pedigree) mutation rate, and extrapolates that over the last 15 ky it did so at something close to it -- and definitely not at a much slower rate.

Hopefully, this paper's approach will be extended to rethink the calibration of human Y-chromosome and autosomal mutation rates, and its assumptions can be checked against ancient DNA-based calibration in both humans and other species.

Molecular Biology and Evolution doi: 10.1093/molbev/msn244

Characterizing the Time-Dependency of Human Mitochondrial DNA Mutation Rate Estimates

Brenna M. Henn et al.


Previous research has established a discrepancy of nearly an order of magnitude between pedigree-based and phylogeny-based (human vs. chimpanzee) estimates of the mitochondrial (mtDNA) control region mutation rate. We characterize the time-dependency of the human mitochondrial hypervariable region one (HVRI) mutation rate by generating fourteen new phylogeny-based mutation rate estimates using within-human comparisons and archaeological dates. Rate estimates based on population events between 15,000 and 50,000 years ago are at least twofold lower than pedigree-based estimates. These within-human estimates are also higher than estimates generated from phylogeny-based human-chimpanzee comparisons. Our new estimates establish a rapid decay in evolutionary mutation rate between approximately 2,500 and 50,000 years ago, and a slow decay from 50,000 to 6 million years ago. We then extend this analysis to the mtDNA coding region. Our within-human coding region mutation rate estimates display a similar, though less rapid, time-dependent decay. We explore the possibility that multiple hits explain the discrepancy between pedigree-based and phylogeny-based mutation rates. We conclude that while nucleotide substitution models incorporating multiple-hits do provide a possible explanation for the discrepancy between pedigree-based and human-chimpanzee mutation rate estimates, they do not explain the rapid decline of within-human rate estimates. We propose that demographic processes such as serial bottlenecks prior to the Holocene could explain the difference between rates estimated before and after 15,000 years ago. Our findings suggest that human mitochondrial DNA estimates of dates of population and phylogenetic events should be adjusted in light of this time-dependency of the mutation rate estimates.



Vincent said...

Great post, Dienekes. I share your conclusions and your hopes.

just passing by said...

Is the mutation rate for the coding region of the mtDNA variable? Stressful environment has been said to cause mutations for metabolism or etc. So exposure to stressful environments might cause mutations in the affected population, but not in populations living in more amiable climates.

McG said...

I believe that mutation rates at DYS loci change with time. I have shown that DYS 388 is different for I1a and R1b. That said, I also took each entire data set, I1a and R1b, summed the number of mutations and then normalized them together. My result was that the total number of mutations in each set was essentially the same. In other words some were increasing and some were decreasing in each set, balancing the total out. This would suggest that if you use a large enough sample, you have suggested 11, then you shouldn't see a bias due to this effect over time?