November 09, 2012

Multiway Admixture Deconvolution with MULTIMIX

The software will appear here. MULTIMIX has already been used in the recent 1000 Genomes paper. Below is the analysis of the MEX data:

I ran a small CEU/YRI/MEX K=3 analysis using ADMIXTURE on 30 random individuals from each population.
Notice that ADMIXTURE assigns 100% "American" ancestry to the most Amerindian-admixed individuals. MULTIMIX, on the other hand, correctly avoids assigning any Mexican individual 100% to the Amerindian component, because it makes use of LD to infer the ancestry of individual segments.
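
To make the segment idea concrete, here is a toy sketch of how per-segment (local) ancestry calls roll up into a genome-wide proportion. The function name, the segment coordinates and the "NAT"/"EUR" labels are all made up for illustration; this is not MULTIMIX output or its file format.

```python
def ancestry_proportions(segments):
    """Length-weighted share of each ancestry label.

    segments: list of (start_bp, end_bp, label) local ancestry calls
    (a hypothetical format, assumed here for illustration only).
    """
    totals = {}
    for start, end, label in segments:
        totals[label] = totals.get(label, 0) + (end - start)
    covered = sum(totals.values())
    return {label: length / covered for label, length in totals.items()}


# Made-up example: one mostly Amerindian stretch with a single European segment.
toy_segments = [
    (0, 90_000_000, "NAT"),
    (90_000_000, 100_000_000, "EUR"),
]
toy_proportions = ancestry_proportions(toy_segments)
print(toy_proportions)
```

Even a single confidently called non-Amerindian segment keeps the genome-wide estimate below 100%, which is the behaviour described above.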

A couple of advantages of the new method are that it is not limited to two ancestral populations and does not require phased data as input, although phased data may provide some accuracy benefit if available.

I'm eager to try the new software when it becomes available. I am not sure how it will scale (CPU/Memory-wise) with more individuals/components, so it'll be fun to experiment with.

Genet Epidemiol. 2012 Nov 7. doi: 10.1002/gepi.21692. [Epub ahead of print]

Multiway Admixture Deconvolution Using Phased or Unphased Ancestral Panels. 

Churchhouse C, Marchini J.


We describe a novel method for inferring the local ancestry of admixed individuals from dense genome-wide single nucleotide polymorphism data. The method, called MULTIMIX, allows multiple source populations, models population linkage disequilibrium between markers and is applicable to datasets in which the sample and source populations are either phased or unphased. The model is based upon a hidden Markov model of switches in ancestry between consecutive windows of loci. We model the observed haplotypes within each window using a multivariate normal distribution with parameters estimated from the ancestral panels. We present three methods to fit the model: Markov chain Monte Carlo sampling, the Expectation Maximization algorithm, and a Classification Expectation Maximization algorithm. The performance of our method on individuals simulated to be admixed with European and West African ancestry shows it to be comparable to HAPMIX, the ancestry calls of the two methods agreeing at 99.26% of loci across the three parameter groups. In addition to it being faster than HAPMIX, it is also found to perform well over a range of extent of admixture in a simulation involving three ancestral populations. In an analysis of real data, we estimate the contribution of European, West African and Native American ancestry to each locus in the Mexican samples of HapMap, giving estimates of ancestral proportions that are consistent with those previously reported.
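
For readers who want a feel for the model in the abstract, here is a rough toy sketch of a window-based HMM with multivariate normal emissions. It is not the MULTIMIX implementation: the panel means and covariances are fixed, made-up numbers rather than estimates from real ancestral panels, the switch probability is an assumed constant, and I decode with plain Viterbi instead of the MCMC, EM or CEM fitting described in the paper. All function and variable names are hypothetical, and at least two panels are assumed.

```python
import math


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def mvn_logpdf(x, mean, cov):
    """Log-density of a multivariate normal, using the Cholesky factor of cov."""
    n = len(x)
    low = cholesky(cov)
    z = []  # solves low * z = (x - mean) by forward substitution
    for i in range(n):
        r = x[i] - mean[i] - sum(low[i][k] * z[k] for k in range(i))
        z.append(r / low[i][i])
    log_det = 2.0 * sum(math.log(low[i][i]) for i in range(n))
    return -0.5 * (n * math.log(2.0 * math.pi) + log_det + sum(v * v for v in z))


def viterbi(windows, panels, switch_prob):
    """Most likely ancestry label per window.

    windows: one allele vector per window for a single haplotype.
    panels: {label: [(mean, cov) for each window]} (assumed, not estimated).
    switch_prob: assumed probability of changing ancestry between windows.
    """
    labels = sorted(panels)
    k = len(labels)
    stay = math.log(1.0 - switch_prob)
    move = math.log(switch_prob / (k - 1))
    score = {a: -math.log(k) + mvn_logpdf(windows[0], *panels[a][0]) for a in labels}
    back = []
    for w in range(1, len(windows)):
        new_score, pointers = {}, {}
        for a in labels:
            best = max(labels, key=lambda b: score[b] + (stay if a == b else move))
            pointers[a] = best
            step = stay if a == best else move
            new_score[a] = score[best] + step + mvn_logpdf(windows[w], *panels[a][w])
        back.append(pointers)
        score = new_score
    state = max(labels, key=lambda a: score[a])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# Made-up data: four windows of two SNPs each; the off-diagonal covariance
# term stands in for LD between the two markers in a window.
cov = [[0.05, 0.02], [0.02, 0.05]]
toy_panels = {
    "EUR": [([0.9, 0.9], cov)] * 4,
    "AFR": [([0.1, 0.1], cov)] * 4,
}
toy_windows = [[1, 1], [1, 1], [0, 0], [0, 0]]
toy_calls = viterbi(toy_windows, toy_panels, switch_prob=0.1)
print(toy_calls)  # ['EUR', 'EUR', 'AFR', 'AFR']
```

The full covariance matrix is the part that lets within-window LD inform the call; with a diagonal covariance each SNP would be scored independently.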


1 comment:

bogdan said...

LAMPLD below also works for multiway mixtures, uses haplotypes, was used in the 1000Genomes and is publicly available.