January 14, 2015

SpaceMix preprint

bioRxiv http://dx.doi.org/10.1101/013474

A Spatial Framework for Understanding Population Structure and Admixture.

Gideon Bradburd, Peter L. Ralph, Graham Coop

Geographic patterns of genetic variation within modern populations, produced by complex histories of migration, can be difficult to infer and visually summarize. A general consequence of geographically limited dispersal is that samples from nearby locations tend to be more closely related than samples from distant locations, and so genetic covariance often recapitulates geographic proximity. We use genome-wide polymorphism data to build “geogenetic maps”, which, when applied to stationary populations, produce a map of the geographic positions of the populations, but with distances distorted to reflect historical rates of gene flow. In the underlying model, allele frequency covariance is a decreasing function of geogenetic distance, and nonlocal gene flow such as admixture can be identified as anomalously strong covariance over long distances. This admixture is explicitly co-estimated and depicted as arrows, from the source of admixture to the recipient, on the geogenetic map. We demonstrate the utility of this method on a circum-Tibetan sampling of the greenish warbler (Phylloscopus trochiloides), in which we find evidence for gene flow between the adjacent, terminal populations of the ring species. We also analyze a global sampling of human populations, for which we largely recover the geography of the sampling, with support for significant histories of admixture in many samples. This new tool for understanding and visualizing patterns of population structure is implemented in a Bayesian framework in the program SpaceMix.
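
To make the core idea of the abstract concrete, here is a rough Python sketch of a covariance that decays with distance on a map, and of how admixture produces unexpectedly strong covariance between distant samples. This is an illustration only, not the SpaceMix implementation: the powered-exponential decay, the parameter names a0, a1, a2, the function names, and the simplification that the admixture source is another sample (rather than a location on the map) are all assumptions made for the example.

```python
import math

def geogenetic_covariance(coords, a0=1.0, a1=1.0, a2=1.0):
    """Covariance matrix that decreases with pairwise map distance.

    Assumed form (illustrative): cov = (1/a0) * exp(-(a1 * d) ** a2).
    """
    n = len(coords)
    cov = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d = math.dist(coords[i], coords[j])  # Euclidean distance on the map
            cov[i][j] = (1.0 / a0) * math.exp(-((a1 * d) ** a2))
    return cov

def admix(cov, recipient, source, w):
    """Covariance after `recipient` draws a fraction w of ancestry from `source`.

    Treats the recipient's allele frequency as (1 - w) * f_recipient + w * f_source
    and applies the usual rules for the covariance of a linear combination.
    """
    n = len(cov)
    out = [row[:] for row in cov]
    for j in range(n):
        if j != recipient:
            mixed = (1 - w) * cov[recipient][j] + w * cov[source][j]
            out[recipient][j] = mixed
            out[j][recipient] = mixed
    out[recipient][recipient] = (
        (1 - w) ** 2 * cov[recipient][recipient]
        + 2 * w * (1 - w) * cov[recipient][source]
        + w ** 2 * cov[source][source]
    )
    return out

# Three hypothetical samples on a line: two neighbours and one far away.
coords = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0)]
cov = geogenetic_covariance(coords)
admixed = admix(cov, recipient=2, source=0, w=0.3)
```

In this toy setup the distant sample covaries only weakly with the first one before admixture and much more strongly afterwards, which is the kind of "anomalously strong covariance over long distances" the abstract describes as the signal of nonlocal gene flow.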


1 comment:

eurologist said...

This new tool appears to show good promise as a complement to non-geographic tree-with-admixture models.

I am not sure I agree with the authors' (and others') criticism of standard PC analysis: orthogonality does not need to be a "problem." I think some of this "intuitive interpretation failure" may stem from the choice of normalization (standard PC normalization appears to over-emphasize the weight of SNPs lost due to drift over SNPs common between groups).