August 23, 2012

Or, maybe they speciated 3.7-6.6Ma ago? (Sun et al. 2012)

This has certainly been an eventful August in human origins research; if the Neandertal Wars weren't enough, a different issue that had simmered for a while now, the human autosomal sequence mutation rate, has now come to a full boil.

A couple of weeks ago, Langergraber et al. (2012) came out, and combined direct measurement of generation lengths in humans and other primates with the directly measured human autosomal sequence mutation rate to argue for an old 7-13Ma divergence between Pan and Homo.

Yesterday, Kong et al. (2012) independently derived a low direct mutation rate of 1.2x10^-8, and added the observation that older human fathers pass on more mutations to their offspring than younger ones. As I point out in my post on the topic, this has implications for the Homo-Pan divergence as well: if chimp dads are younger than human dads, they will tend to pass fewer mutations to their offspring. Thus, the chimp mutation rate (per generation) might be lower than the human one rather than equal to it, and this might push the speciation time even further back.

Today, a new paper has appeared in Nature Genetics which argues for an "intermediate" rate between the direct ~1-1.3x10^-8 rate and the widely used 2.5x10^-8 one: their rate estimate is 1.4–2.3x10^-8, and the corresponding Human-Chimp speciation time is 3.7-6.6 million years ago. Kari Stefansson is a co-author of the new paper, as he is of the Kong et al. one, which estimated the mutation rate at 1.2x10^-8.

The new paper builds what appears to be a very exhaustive model of microsatellite mutation:
Microsatellites have been widely used to make inferences about evolutionary history. However, the accuracy of these inferences has been limited by a poor understanding of the mutation process. We developed a new model of microsatellite evolution (Supplementary Note). This model can estimate the time to the most recent common ancestor (TMRCA) of two samples at a microsatellite by taking into account (i) the dependence of the mutation rate on allele length and parental age (Fig. 2a,c); (ii) the step size of mutations (Fig. 2b); (iii) the size constraints on allele length (Fig. 2d and Supplementary Figs. 8 and 9); and (iv) the variation in generation interval over history. In contrast to the generalized stepwise mutation model (GSMM), which predicts a linear increase of average squared distance (ASD) between microsatellite alleles over time, the new model predicts a sublinear increase (Fig. 3) and saturation of the molecular clock, due to the constraints on allele length. We also extended the model to estimate the sequence mutation rate, using the per-nucleotide diversity flanking each microsatellite as an additional datum. To implement the model, we used a Bayesian hierarchical approach, first generating global parameters common to all loci, followed by locus-specific parameters and finally the microsatellite alleles at each locus (Online Methods). We used Markov chain Monte Carlo to infer TMRCA and sequence mutation rate. 
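To get a feel for why a size constraint on allele length makes the microsatellite clock saturate, here is a toy calculation. It is emphatically not the authors' model: it assumes a symmetric one-step mutation process with a constant rate, a hard range of 11 allowed lengths, and an unrealistically high illustrative rate (mu = 0.01) so the effect shows up quickly. The function names and all parameter values are my own assumptions.

```python
def asd_unbounded(mu, t):
    """Expected ASD under a plain symmetric one-step model: grows linearly in t."""
    return 2.0 * mu * t


def asd_bounded(mu, t, n_states=11):
    """Expected squared length difference between two lineages that split
    t generations ago, with allele lengths confined to 0..n_states-1.

    Assumptions (toy model): one-step mutations only, probability mu/2 in
    each direction per generation, and a step outside the range is rejected.
    """
    start = n_states // 2              # ancestor sits in the middle of the range
    p = [0.0] * n_states
    p[start] = 1.0
    for _ in range(t):
        q = [0.0] * n_states
        for i, mass in enumerate(p):
            if mass == 0.0:
                continue
            q[i] += mass * (1.0 - mu)              # no mutation this generation
            for j in (i - 1, i + 1):
                if 0 <= j < n_states:
                    q[j] += mass * mu / 2.0        # one-step change
                else:
                    q[i] += mass * mu / 2.0        # blocked by the size constraint
        p = q
    mean = sum(i * m for i, m in enumerate(p))
    var = sum((i - mean) ** 2 * m for i, m in enumerate(p))
    # Two independent lineages from the same ancestor: E[(X - Y)^2] = 2 Var(X)
    return 2.0 * var


if __name__ == "__main__":
    mu = 0.01  # illustrative only, far above real microsatellite rates
    for t in (10, 100, 1000, 5000):
        print(t, round(asd_unbounded(mu, t), 3), round(asd_bounded(mu, t), 3))
```

In this toy setting the two curves agree for recent splits, but the bounded one flattens out as the allele distribution spreads across the allowed range, so a given ASD corresponds to an older TMRCA than the linear model would suggest.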
I haven't delved deeply into the details of how the sequence mutation rate (per nucleotide/per generation) can be derived by exploiting the microsatellite rate. But, why would the rate estimated with the new method be different than the directly measured one? The authors propose some ideas:
We hypothesize that the lower mutation rate estimates from the whole-genome sequencing studies might be due to (i) the limited number of mutations detected in these studies, which explains why their confidence intervals overlap ours, (ii) possible underestimation of the false negative rate in the whole-genome sequencing studies or (iii) variability in the mutation rate across individuals, such that a few families cannot provide a reliable estimate of the population-wide rate.  
Apparently, the team behind Sun et al. became aware of the new Kong et al. paper after their own was accepted, so they attached the following note at the end of it, as well as a discussion in the supplement:
Note added in proof: After this paper was accepted, another study35 was published that independently estimates the human sequence mutation rate, using a direct measurement in contrast to the indirect measurement we report here. In spite of some key similarities between our results and those of Kong et al.35 (the male-to-female mutation rate ratio and the absence of an effect of mother's age), they estimate a considerably stronger effect of father's age and an overall sequence mutation rate below the range we infer. The discrepancies in the sequence mutation rate may be in part due to the fact that Kong et al. focus on a more intensively filtered subset of the human genome than we analyze here, but other factors are also likely to be at work (Supplementary Note). As an initial attempt to compare the two studies in terms of their implications for evolutionary history, we ran the same Bayesian inference procedure we developed in this paper (integrating over uncertainty in unknown parameters), now using the sequence-based estimates rather than the microsatellite-based estimates as input (Supplementary Note). Notably, the inferred dates based on the measurement of the sequence mutation rate are older and no longer in direct conflict with the inference that S. tchadensis is on the human lineage since the split from chimpanzees. The sequence- and microsatellite-based data sets are very different, and an important direction for future research will be to understand why the direct sequence–based mutation rate estimate is lower than the one inferred on the basis of microsatellites. 
All this leaves me rather perplexed. I guess one take-home lesson from the debate would be to avoid making strong statements about the past that are dependent on a particular mutation rate. The following table from the supplementary material pretty much says it all:

Notice that one set of estimates is approximately double the other. Personally, I tend to favor the older dates, since they might "match" better with key developments: Out-of-Africa will become pre-100ka and consistent with the appearance of the Nubian technocomplex in Arabia, which seems to be the only real solid evidence of Out-of-Africa in the archaeological record. It would also be consistent with the appearance of modern humans in the Levant c. 100ka at Mt. Carmel, the first clear evidence of Homo sapiens in Eurasia. Moreover, it would explain the early appearance of Neandertaloid features in the Atapuerca hominins at c. 600ka, long before the inferred split of modern humans from Neandertals when the slowest rate is used.
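The "double" relationship is just arithmetic: for a fixed amount of genetic divergence and a fixed generation length, an inferred date scales inversely with the assumed mutation rate. The sketch below rescales a date from one rate to another; the function name and the 60 ka input are hypothetical, chosen only for illustration, and the calculation ignores everything else that goes into a real estimate (generation interval changes, ancestral polymorphism, and so on).

```python
def rescale_date(date, rate_used, new_rate):
    """Rescale a date inferred under rate_used to what it would be under new_rate.

    Assumes the date is inversely proportional to the per-generation mutation
    rate, with divergence and generation length held fixed.
    """
    return date * rate_used / new_rate


if __name__ == "__main__":
    # Rates mentioned in the post (mutations per base pair per generation)
    widely_used = 2.5e-8
    icelandic_direct = 1.2e-8
    microsatellite_based = 1.82e-8

    hypothetical_date_ka = 60.0  # illustrative input, not a figure from any paper
    for label, rate in (("direct", icelandic_direct),
                        ("microsatellite-based", microsatellite_based)):
        print(label, round(rescale_date(hypothetical_date_ka, widely_used, rate), 1), "ka")
```

Halving the rate doubles the date, which is why swapping the 2.5x10^-8 rate for one near 1.2x10^-8 moves events that were dated well under 100ka to beyond it.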

But, my confidence in these correspondences is low until the controversy is resolved one way or another. If the 1.8x10^-8 rate of this paper is closer to the truth, then my money would be on the false negative rate, i.e., full genome sequencing is systematically overlooking SNPs that exist in the genomes.

Apparently, now, we have three rates to contend with: (i) the Icelandic 1.2x10^-8 rate (and other similar rates, such as the 1.36x10^-8 one); (ii) the 2.5x10^-8 one that has been very widely used in the literature; and (iii) the "1.82x10^-8 mutations per base pair per generation (90% CI 1.40–2.28 × 10^-8; Table 2)" from this paper. This may be disheartening, but all setbacks represent opportunities to learn something new, and now that the issue is out in the open, I'm sure that many "top dogs" will try to figure out what is going on.

Nature Genetics doi:10.1038/ng.2398

A direct characterization of human mutation based on microsatellites

James X Sun et al.

Mutations are the raw material of evolution but have been difficult to study directly. We report the largest study of new mutations to date, comprising 2,058 germline changes discovered by analyzing 85,289 Icelanders at 2,477 microsatellites. The paternal-to-maternal mutation rate ratio is 3.3, and the rate in fathers doubles from age 20 to 58, whereas there is no association with age in mothers. Longer microsatellite alleles are more mutagenic and tend to decrease in length, whereas the opposite is seen for shorter alleles. We use these empirical observations to build a model that we apply to individuals for whom we have both genome sequence and microsatellite data, allowing us to estimate key parameters of evolution without calibration to the fossil record. We infer that the sequence mutation rate is 1.4–2.3 × 10^-8 mutations per base pair per generation (90% credible interval) and that human-chimpanzee speciation occurred 3.7–6.6 million years ago.



andrew said...

I am skeptical that paternal age effects will have much impact on average mutation rates. Mutation rates are high in the low percentage of births to advanced paternal age fathers, fewer fathers lived to advanced ages in the past than they do now, and the age at which mutation rates become elevated is probably species specific - i.e. it is tied to the relative progression of the aging process not the absolute number of years that pass.

But, paternal age may be much more important in evaluating natural selection models. In those models the issue is not the average mutation rate, but the likelihood that someone at some time gets born with a mutation that can provide a fitness advantage in a niche. Even a single mutation in a single individual, regardless of the size of the population, can make the difference between a mutationally limited community where no one has the variant needed to be more fit in a particular niche, and one where the mutation needed to survive best in an environment is within the range of natural variation. The faster the environment that needs to be adapted to changes and the smaller the effective population in question is, the more important mutational limitations are to ability of the community to utilize natural selection to adapt.

A modest percentage of children born to fathers of advanced paternal age can tweak the extent to which a population is mutationally limited much more strongly than it does the average mutation rate.

jeffhsu3 said...

One other explanation they offer up for the difference (in the text, but not in the note) is ascertainment bias: microsatellites are microsatellites because they are polymorphic, and thus these alleles potentially have a higher mutation rate.

terryt said...

" Even a single mutation in a single individual, regardless of the size of the population, can make the difference between a mutationally limited community where no one has the variant needed to be more fit in a particular niche, and one where the mutation needed to survive best in an environment is within the range of natural variation".

But that mutation still has to spread through the community (or species). What we know from the study of dairy cattle is that most mutations responsible for genetic change start out as recessive genes. Of course in the case of dairy cattle most mutations are harmful but I think we can safely extrapolate to advantageous genes. So a recessive, advantageous gene has to spread through numbers in a population before it can provide an advantage for members of that population. Perhaps its spread will be aided if it is co-dominant with the existing gene.