November 17, 2012

Population histories with a diffusion process formulation

On the left you can see the best topology on a diffusion time scale. Interestingly, CEU appear closer to Africans than JPT do, and YRI appear closer to Eurasians than BIA (Biaka Pygmies) do.

Mol Biol Evol (2012) doi: 10.1093/molbev/mss257

Inferring population histories using genome-wide allele frequency data

Mathieu Gautier and Renaud Vitalis

The recent development of high throughput genotyping technologies has revolutionized the collection of data in a wide range of both model and non-model species. These data generally contain huge amounts of information about the past demographic history of populations.

In this study we introduce a new method to estimate divergence times on a diffusion time-scale from large SNP datasets, conditionally on a population history which is represented as a tree. We further assume that all the observed polymorphisms originate from the most ancestral (root) population, i.e. we neglect mutations that occur after the split of the most ancestral population. This method relies on a hierarchical-Bayesian model, based on Kimura's time-dependent diffusion approximation of genetic drift. We implemented a Metropolis–Hastings within Gibbs sampler to estimate the posterior distribution of the parameters of interest in this model, which we refer to as the Kimura model. Evaluating the Kimura model on simulated population histories, we found that it provides accurate estimates of divergence time. Assessing model fit using the deviance information criterion (DIC) proved efficient for retrieving the correct tree topology among a set of competing histories. We show that this procedure is robust to low-to-moderate gene flow, as well as to ascertainment bias, provided that the most distantly related populations are represented in the discovery panel. As an illustrative example, we finally analyzed published human data consisting of genotypes for 452,198 SNPs from individuals belonging to four populations worldwide.

Our results suggest that the Kimura model may be helpful to characterize the demographic history of differentiated populations, using genome-wide allele frequency data.
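
To make "diffusion time-scale" concrete, here is a minimal sketch of why drift-based methods recover divergence only as a ratio of generations to population size. It is not the authors' code. It uses the textbook Wright-Fisher result that expected heterozygosity decays as H_t = H_0 (1 - 1/(2N))^t, and it assumes the scaling tau = t / (2N) for a diploid population; the paper may use a different convention. The function names are made up for illustration.

```python
import math


def expected_heterozygosity(h0, n_diploid, generations):
    """Exact Wright-Fisher expectation: H_t = H_0 * (1 - 1/(2N))^t."""
    return h0 * (1.0 - 1.0 / (2.0 * n_diploid)) ** generations


def diffusion_time(n_diploid, generations):
    """Time in diffusion units, tau = t / (2N) (assumed scaling convention)."""
    return generations / (2.0 * n_diploid)


if __name__ == "__main__":
    h0 = 0.5  # starting heterozygosity, e.g. an allele at frequency 0.5
    # Two hypothetical histories: very different N and t, same t / (2N).
    for n, t in [(1000, 200), (10000, 2000)]:
        tau = diffusion_time(n, t)
        exact = expected_heterozygosity(h0, n, t)
        approx = h0 * math.exp(-tau)  # diffusion (large-N) approximation
        print(f"N={n:6d} t={t:5d} tau={tau:.3f} "
              f"H_exact={exact:.5f} H_diffusion={approx:.5f}")
```

Both histories leave nearly the same expected signature of drift, so allele frequencies alone cannot separate "small population, few generations" from "large population, many generations". Converting tau into years requires an outside estimate of population size.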


eurologist said...

Looks promising. I have thought about this before. Although I am far from being an expert, it appears intuitive that ignoring selection and mutation might be OK for tree models, since it may only produce a general noise level not that different for the various branches. Perhaps this requires the human population at the root to be larger than what some feel comfortable with - but other recent findings actually support that notion. Of course, in the end I am more worried about how these simplifications, including neglecting admixture, affect time scales (unfortunately, here as almost always coupled to the unknown population size).

terryt said...

"Perhaps this requires the human population at the root to be larger than what some feel comfortable with - but other recent findings actually support that notion".

Yes. As we've been trying to explain to Maju, it is extremely unlikely that humans all originated in some Garden of Eden and then expanded in some sort of biblical Exodus from the Garden. Yet, as you say, many seem to believe in something similar and are uncomfortable with any alternative explanation. Gene flow over time through the various human 'species' is the most logical explanation for the development of 'modern' humans. A sort of wave theory of evolution.