November 27, 2012

Ancestry Mapper (Magalhães et al. 2012)

The idea of Ancestry Mapper is fairly simple: each individual is represented as a vector of similarities to a fixed number of reference populations chosen a priori. These vectors can then be processed (e.g., with clustering) like any other type of high-dimensional data (e.g., PC co-ordinates).
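To make the idea concrete, here is a minimal sketch of how such a similarity vector could be built. It is not the Ancestry Mapper package's formula. As a stand-in, I assume each reference population is summarized by its allele frequencies, distance is the mean absolute difference between the individual's allele dosages and the population's expected dosages, and distances are rescaled so that the closest population scores 100 and the most distant scores 0. The function name `amid_like_vector` is hypothetical.

```python
import numpy as np

def amid_like_vector(genotype, ref_freqs):
    """Hypothetical AMid-style coordinates for one individual (not the package's formula).

    genotype: shape (n_snps,), allele dosages coded 0/1/2.
    ref_freqs: shape (n_pops, n_snps), allele frequencies of each reference population.
    Returns one score per reference population: 100 = closest, 0 = most distant.
    """
    genotype = np.asarray(genotype, dtype=float)
    expected = 2.0 * np.asarray(ref_freqs, dtype=float)  # expected dosage in each population
    # Mean absolute dosage difference to each population (smaller means more similar).
    dist = np.abs(expected - genotype).mean(axis=1)
    span = dist.max() - dist.min()
    if span == 0:  # equally distant from every population
        return np.full(len(dist), 100.0)
    return 100.0 * (dist.max() - dist) / span

# Toy example with 4 SNPs and 3 reference populations.
g = np.array([0, 0, 2, 2])
freqs = np.array([[0.0, 0.0, 1.0, 1.0],
                  [1.0, 1.0, 0.0, 0.0],
                  [0.5, 0.5, 0.5, 0.5]])
print(amid_like_vector(g, freqs))  # [100.   0.  50.]
```

With 51 reference populations, as in the paper, each individual would become a 51-dimensional vector that can be fed to any clustering method.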

Readers who know my MDS/MCLUST "Clusters Galore" methodology will find the following figure familiar:

This was produced by applying PAM (partitioning around medoids) clustering to the AMids. I don't think this is a better way to do clustering than PCA/MDS+MCLUST, for two reasons. First, PAM is a less expressive model than the suite of models that MCLUST can consider and choose from. Second, AMids assume an a priori assignment of individuals to populations, which the "Galore" approach does not need: it uses MDS/PCA to reduce the dimensionality of individuals and is agnostic about their population labels. In any case, it is useful to know that a large number of meaningful clusters can be inferred with both a different dimensionality reduction method and a different clustering algorithm.
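For readers unfamiliar with PAM, here is a small sketch of the algorithm on a precomputed distance matrix (e.g., Euclidean distances between AMids). It is a generic illustration, not the code behind the figure; in R the usual tool is `pam()` from the `cluster` package. BUILD greedily picks k medoids, and SWAP repeatedly applies the single medoid/non-medoid exchange that most lowers the total distance of points to their nearest medoid, stopping when no swap helps. The toy 2-D points stand in for real AMids.

```python
import numpy as np
from itertools import product

def pam(dist, k, max_iter=100):
    """PAM (k-medoids) on an n x n distance matrix; returns (medoids, labels)."""
    n = dist.shape[0]

    def total_cost(meds):
        # Sum over points of the distance to their nearest medoid.
        return dist[:, meds].min(axis=1).sum()

    # BUILD: start with the most central point, then greedily add medoids.
    medoids = [int(np.argmin(dist.sum(axis=1)))]
    while len(medoids) < k:
        candidates = [c for c in range(n) if c not in medoids]
        medoids.append(min(candidates, key=lambda c: total_cost(medoids + [c])))

    # SWAP: apply the best improving swap until none lowers the cost.
    cost = total_cost(medoids)
    for _ in range(max_iter):
        best_trial, best_cost = None, cost
        for i, h in product(range(k), range(n)):
            if h in medoids:
                continue
            trial = medoids.copy()
            trial[i] = h
            c = total_cost(trial)
            if c < best_cost - 1e-12:
                best_trial, best_cost = trial, c
        if best_trial is None:
            break
        medoids, cost = best_trial, best_cost

    labels = np.argmin(dist[:, medoids], axis=1)  # index of nearest medoid
    return medoids, labels

# Two well-separated toy groups standing in for AMid vectors.
X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
meds, labels = pam(D, 2)
print(labels)  # first three points share one label, last three the other
```

Because PAM only needs pairwise distances, the same code works unchanged whether the vectors are AMids or PC/MDS co-ordinates.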

PLoS ONE 7(11): e49438. doi:10.1371/journal.pone.0049438

HGDP and HapMap Analysis by Ancestry Mapper Reveals Local and Global Population Relationships

Knowledge of human origins, migrations, and expansions is greatly enhanced by the availability of large datasets of genetic information from different populations and by the development of bioinformatic tools used to analyze the data. We present Ancestry Mapper, which we believe improves on existing methods, for the assignment of genetic ancestry to an individual and to study the relationships between local and global populations. The principal function of the method, named Ancestry Mapper, is to give each individual analyzed a genetic identifier, made up of just 51 genetic coordinates, that corresponds to its relationship to the HGDP reference population. As a consequence, the Ancestry Mapper Id (AMid) has intrinsic biological meaning and provides a tool to measure similarity between world populations. We applied Ancestry Mapper to a dataset comprised of the HGDP and HapMap data. The results show distinctions at the continental level, while simultaneously giving details at the population level. We clustered AMids of HGDP/HapMap and observe a recapitulation of human migrations: for a small number of clusters, individuals are grouped according to continental origins; for a larger number of clusters, regional and population distinctions are evident. Calculating distances between AMids allows us to infer ancestry. The number of coordinates is expandable, increasing the power of Ancestry Mapper. An R package called Ancestry Mapper is available to apply this method to any high density genomic data set.
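The abstract's claim that distances between AMids can be used to infer ancestry can be illustrated with a nearest-centroid rule. This is my own hypothetical sketch, not the R package's procedure: each labelled population is summarized by the mean of its members' AMids, and a query individual's populations are ranked by Euclidean distance to those means. The function name `closest_populations` and the toy 2-coordinate AMids are assumptions for illustration.

```python
import numpy as np

def closest_populations(query_amid, ref_amids, ref_labels, top=3):
    """Hypothetical nearest-centroid ranking of populations for one AMid."""
    ref_amids = np.asarray(ref_amids, dtype=float)
    ref_labels = np.asarray(ref_labels)
    pops = np.unique(ref_labels)
    # Mean AMid of each labelled population.
    centroids = np.array([ref_amids[ref_labels == p].mean(axis=0) for p in pops])
    # Euclidean distance from the query to every population centroid.
    d = np.linalg.norm(centroids - np.asarray(query_amid, dtype=float), axis=1)
    order = np.argsort(d)[:top]
    return [(str(pops[i]), float(d[i])) for i in order]

ref = [[100, 0], [90, 10], [0, 100], [10, 90]]
labels = ["A", "A", "B", "B"]
print(closest_populations([95, 5], ref, labels))  # population A first, at distance 0.0
```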

