December 01, 2012

Recent origin of protein-coding variants in humans (Fu et al. 2012)

From the paper:
We estimated the age of all 1,146,401 SNVs using 6 different demographic models [5,6,8–11], 3 of which considered recent explosive population growth [5,6,8] (Supplementary Table 2). Estimates of allele age were generally robust across different demographic models, with the largest discrepancies resulting in a twofold difference in average age across all SNVs (Supplementary Table 3 and Supplementary Fig. 8a). However, because most SNVs arose recently (see below), differences among demographic models were highly concordant (Supplementary Information). Accordingly, we report results based on a modified Out-of-Africa model [9] in which accelerated population growth began 5,115 years ago with a per-generation growth rate of 1.95% and 1.66% for European Americans and African Americans, respectively [6].
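To get a feel for what "accelerated growth" with these parameters means, here is a minimal sketch that computes the fold-increase in population size implied by a constant per-generation growth rate sustained since 5,115 years ago. The helper `fold_increase` is hypothetical, and the 25-year generation time is an assumption (it is not stated in the quoted passage).

```python
import math


def fold_increase(rate_per_gen: float, years_ago: float,
                  years_per_gen: float = 25.0, compound: bool = True) -> float:
    """Fold-increase in population size under constant per-generation growth.

    ASSUMPTION: 25 years per generation by default (not given in the quote).
    """
    generations = years_ago / years_per_gen
    if compound:
        # Discrete compounding: N_t = N_0 * (1 + r) ** g
        return (1.0 + rate_per_gen) ** generations
    # Continuous approximation: N_t = N_0 * exp(r * g)
    return math.exp(rate_per_gen * generations)


for label, r in [("European Americans", 0.0195), ("African Americans", 0.0166)]:
    print(f"{label}: ~{fold_increase(r, 5115):.0f}-fold growth")
```

Under these assumptions the quoted rates correspond to roughly 50-fold growth for European Americans and roughly 30-fold for African Americans; the continuous approximation gives slightly larger figures than discrete compounding.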
The six models considered are:

None of them is very satisfactory, because five of the six place the Out-of-Africa event at 87.5 kya or later, and it now seems likely that this event took place before 100 kya. On the other hand, this parameter may not be critical here: while the earliest colonization of Eurasia by modern humans did indeed take place before 100 kya, the Eurasian population appears to stem from a post-70 kya bottleneck. The OOA event may not have caused the Eurasian bottleneck; the latter may instead be a consequence of environmental deterioration in the North Africa-Arabia belt c. 70 kya.

The modified Out-of-Africa model used in the main paper is by Gravel et al. (2011). That model used a mutation rate of 2.38×10⁻⁸ mutations/bp/generation to convert its estimates into years, which is a little less than twice the slower mutation rate (about 1.2×10⁻⁸) inferred recently with a variety of methods. Its age estimates should therefore be pushed back by roughly a factor of two. This would also resolve the inferred 23 ky differentiation between Europeans and Asians without invoking any special mechanism: it would simply be a consequence of the peopling of West and East Eurasia in the last 40-50 thousand years.
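The factor-of-two argument can be made concrete with a small sketch. It assumes that the model's times were inferred in mutation-scaled (coalescent) units and then converted to years at a fixed generation time, so that a time in years scales as 1/μ. The helper `rescale_time` is hypothetical; the two rates are the ones discussed in this post.

```python
def rescale_time(t_years: float, mu_old: float, mu_new: float) -> float:
    """Re-express a time calibrated with mu_old under mu_new.

    ASSUMPTION: times scale inversely with the mutation rate
    (fixed generation time), so t_new = t_old * mu_old / mu_new.
    """
    return t_years * mu_old / mu_new


GRAVEL_MU = 2.38e-8  # rate used by Gravel et al. (2011)
SLOW_MU = 1.2e-8     # slower rate reported in several recent studies

eu_as_split = rescale_time(23_000, GRAVEL_MU, SLOW_MU)
print(f"Europe-Asia differentiation: ~{eu_as_split / 1000:.0f} kya")
```

With the slower rate, the 23 ky figure becomes about 46 ky, within the 40-50 thousand year window for the peopling of West and East Eurasia.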

In the current paper (Fu et al.), however, a mutation rate of 1.5×10⁻⁸ has been used. This is a fairly slow rate, albeit a little faster than the 1.2×10⁻⁸ reported in a number of studies. It's not entirely clear to me what the consequence would be of combining a model (Gravel et al.'s) whose parameters were inferred using a 2.38×10⁻⁸ mutation rate with a mutation rate of 1.5×10⁻⁸. Overall, I'd expect the main effect to be a downward bias in age estimates, although the analysis should probably be repeated.
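One way to see what a *consistent* combination would look like is to rescale all of the model's parameters together. The sketch below does this for the growth phase quoted above, under two loudly stated assumptions: that the 5,115-year onset and the growth rates were calibrated with the 2.38×10⁻⁸ rate, and that coalescent scaling applies (times and population sizes scale by c = μ_old/μ_new, per-generation rates by 1/c). The `GrowthPhase` class and `rescale_phase` helper are hypothetical illustrations, not the method of either paper.

```python
from dataclasses import dataclass


@dataclass
class GrowthPhase:
    onset_years: float       # years before present when growth accelerated
    rates: dict[str, float]  # per-generation growth rate, by population


def rescale_phase(phase: GrowthPhase, mu_old: float, mu_new: float) -> GrowthPhase:
    """Rescale a growth phase calibrated with mu_old so it is consistent with mu_new.

    Under coalescent scaling at a fixed generation time, times (and population
    sizes) scale by c = mu_old / mu_new and per-generation rates by 1 / c, so
    rate * generations, and hence total continuous fold-growth, is unchanged.
    """
    c = mu_old / mu_new
    return GrowthPhase(
        onset_years=phase.onset_years * c,
        rates={pop: r / c for pop, r in phase.rates.items()},
    )


# ASSUMPTION: treat the quoted parameters as calibrated with 2.38e-8.
quoted = GrowthPhase(5115, {"European Americans": 0.0195, "African Americans": 0.0166})
rescaled = rescale_phase(quoted, 2.38e-8, 1.5e-8)
print(f"growth onset: ~{rescaled.onset_years:,.0f} years ago")
for pop, r in rescaled.rates.items():
    print(f"{pop}: {100 * r:.2f}% per generation")
```

Under these assumptions, the growth onset moves back to roughly 8,100 years ago, with slower per-generation rates of about 1.23% and 1.05%. Plugging 1.5×10⁻⁸ into the model without this rescaling leaves those times unchanged, which is one reading of why the ages could be biased downward.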

Table S3 provides some idea of how different demographic models affect age estimates:

I took a look at Nelson et al. (which has the higher time estimates), in which a median mutation rate of 1.38×10⁻⁸ was inferred. Tennessen et al. estimate OOA at 51 kya, so they are probably using the faster (and probably outdated) rate.

When did population growth begin? (Growth meant more bodies, hence more new mutations, and a higher chance for mildly deleterious mutations to survive.) One possibility is that it was a response to deglaciation and rising temperatures after the end of the last Ice Age. Another is that it was driven by agriculture, which increased the land's carrying capacity. Ancient DNA may help inform this discussion by measuring the number of deleterious SNVs in individuals across the last 10 thousand years or so.
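The "more bodies, more mutations" step can be illustrated with the standard expectation that a diploid population of N individuals receives about 2Nμ new mutations per site per generation. The sketch below uses the 1.5×10⁻⁸ rate from Fu et al.; the helper name, the 1 Mb stretch of sequence, and the two population sizes are hypothetical values chosen only for illustration.

```python
def expected_new_mutations(n_individuals: int, mu_per_site: float, n_sites: int) -> float:
    """Expected new mutations entering a diploid population in one generation.

    Each individual carries 2 genome copies, and each copy receives on
    average mu_per_site new mutations at each of n_sites sites.
    """
    return 2 * n_individuals * mu_per_site * n_sites


MU = 1.5e-8        # per-site, per-generation rate used by Fu et al.
SITES = 1_000_000  # hypothetical: a 1 Mb stretch of sequence

for n in (10_000, 500_000):  # hypothetical population sizes
    print(f"N = {n:,}: ~{expected_new_mutations(n, MU, SITES):,.0f} new mutations per generation")
```

The mutational input scales linearly with N, so a population that has grown fifty-fold generates fifty times as many new variants each generation, most of them necessarily young.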

Nature (2012) doi:10.1038/nature11690

Analysis of 6,515 exomes reveals the recent origin of most human protein-coding variants

Wenqing Fu et al.

Establishing the age of each mutation segregating in contemporary human populations is important to fully understand our evolutionary history [1,2] and will help to facilitate the development of new approaches for disease-gene discovery [3]. Large-scale surveys of human genetic variation have reported signatures of recent explosive population growth [4,5,6], notable for an excess of rare genetic variants, suggesting that many mutations arose recently. To more quantitatively assess the distribution of mutation ages, we resequenced 15,336 genes in 6,515 individuals of European American and African American ancestry and inferred the age of 1,146,401 autosomal single nucleotide variants (SNVs). We estimate that approximately 73% of all protein-coding SNVs and approximately 86% of SNVs predicted to be deleterious arose in the past 5,000–10,000 years. The average age of deleterious SNVs varied significantly across molecular pathways, and disease genes contained a significantly higher proportion of recently arisen deleterious SNVs than other genes. Furthermore, European Americans had an excess of deleterious variants in essential and Mendelian disease genes compared to African Americans, consistent with weaker purifying selection due to the Out-of-Africa dispersal. Our results better delimit the historical details of human protein-coding variation, show the profound effect of recent human history on the burden of deleterious SNVs segregating in contemporary populations, and provide important practical information that can be used to prioritize variants in disease-gene discovery.

