March 15, 2011

StepPCO for admixture estimation

The authors describe a PCA-based genome scan (StepPCO) for estimating admixture proportions, and introduce wavelet transform analysis as a method for dating the time of admixture. They claim to perform better than HAPMIX, which is probably the state of the art when it comes to this sort of thing.

As I pointed out in my review of HAPMIX, the problem with this type of tool is that quite often you don't have access to the parental populations of an admixed population, either because they no longer exist in unadmixed form themselves, or because you are using inappropriate stand-ins for them. This is not much of a problem for unsupervised admixture analysis, which makes no assumptions about which populations combined to form an admixed population, but looks only at individuals.

Indeed, I'd say there is plenty of room for researchers to come up with unsupervised versions of HAPMIX/StepPCO and/or to extend them so that they can handle tri-source populations, as they currently assume only two sources of admixture.

An interesting quote from the paper:
Average admixture proportions estimated by the StepPCO method for the African-Americans, Polynesians and Fijians are 19% European ancestry, 24.9% Melanesian ancestry, and 40.2% Melanesian ancestry respectively (Figure 6a). Individual admixture estimates vary substantially among the African-Americans, with some individuals exhibiting very low European ancestry (less than 5%), and some substantially higher (more than 40%). These results were substantiated by the frappe [13] analysis, which agree quite closely with the per-chromosome ancestry estimates from the StepPCO analysis (Figure 6b). A similar pattern is observed in Fiji, with Melanesian ancestry ranging from 22% to 63%. Despite the fact that the Polynesian sample is very diverse, coming from seven different islands [19], the level of Melanesian ancestry is much more uniform across individuals (varying from 18 to 28%).

Contra the speculations of some, per-chromosome ancestry estimates do not differ greatly from those obtained from a genome-wide maximum likelihood algorithm like frappe; the latter maximizes the same likelihood model as ADMIXTURE, the software I use in the Dodecad Project, although the two programs use different optimization procedures. Nor is there any evidence that maximum likelihood algorithms suppress low-level admixture: the Mandenka show 2% European admixture in the 2-way analysis by both StepPCO and HAPMIX, and they show 1.66% West Eurasian admixture in my K=3 global unsupervised admixture analysis, which looked at 139 different populations.

The main advantage of HAPMIX/StepPCO over maximum likelihood methods is not their greater accuracy, but rather the fact that they can date admixture events, with the above-mentioned caveats. From the paper:
The spectral analysis of the StepPCO signal revealed that the average dominant frequency for the African-Americans is located at level 1.8, which would correspond to an abundance of low frequency wavelets (that is, wider ancestry blocks), while for the Fijians and the Polynesians the average dominant frequency is at level 3.06 and 3.63 respectively, which is indicative of much narrower ancestry blocks (Figure 7). Based on simulations, the WT center of 1.8 corresponds to an admixture time of 6 generations ago (95% CI: 4-8 generations) for the African Americans. Assuming a generation time of 30 years [33], our results indicate that the admixture in the African Americans started about 180 years ago. Similarly, the simulations indicate that the WT center of 3.63 for the Polynesians corresponds to an admixture time of 90 generations (95% CI: 77-131 generations), or about 2,700 years ago (Figure 8). The time estimation for Fiji is based on simulated data with a 40% admixture rate (to match the higher admixture rate of Fiji), and here the WT center of 3.06 corresponds to an admixture time of 37 generations (95% CI: 29-39) or about 1,100 years ago.
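The calendar dates in the quoted passage follow from multiplying the generation counts by the assumed 30-year generation time. A minimal sketch of that arithmetic is below; the function and variable names are my own, and converting the 95% CI bounds to years is my extrapolation, since the paper quotes the intervals only in generations.

```python
# Generation time assumed in the quoted passage (30 years per generation).
GENERATION_YEARS = 30

# (central estimate, 95% CI low, 95% CI high) in generations, as quoted above.
ESTIMATES_IN_GENERATIONS = {
    "African Americans": (6, 4, 8),
    "Fijians": (37, 29, 39),
    "Polynesians": (90, 77, 131),
}


def generations_to_years(generations, generation_years=GENERATION_YEARS):
    """Convert a number of generations into years before present."""
    return generations * generation_years


def summarize(estimates, generation_years=GENERATION_YEARS):
    """Return {population: (central, low, high)} with every value in years."""
    return {
        population: tuple(
            generations_to_years(g, generation_years) for g in gens
        )
        for population, gens in estimates.items()
    }


years = summarize(ESTIMATES_IN_GENERATIONS)
for population, (central, low, high) in years.items():
    print(f"{population}: ~{central} years ago (CI bounds: {low}-{high})")
```

Passing a different `generation_years` (say 25) shows how sensitive the calendar dates are to the generation-time assumption, which matters more for the older Polynesian and Fijian events than for the African American one.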

The central estimate for African Americans seems plausible, given that admixture in that population has taken place from colonial times up to the present, as AA children of half-white heritage are usually considered (by society) as "black" (cf. Obama), and two centuries or so seems like a reasonable middle ground. The ~2.7ky estimate for Polynesian admixture also agrees with the ~3ky obtained with the different method of Wollstein et al. (2010).

The software runs in R and is available online.

Genome Biology 2011, 12:R19 doi:10.1186/gb-2011-12-2-r19

Dating the age of admixture via wavelet transform analysis of genome-wide data

Irina Pugach et al.

Abstract

We describe a PCA-based genome scan approach to analyze genome-wide admixture structure, and introduce wavelet transform analysis as a method for estimating the time of admixture. We test the wavelet transform method with simulations and apply it to genome-wide SNP data from eight admixed human populations. The wavelet transform method offers better resolution than existing methods for dating admixture, and can be applied to either SNP or sequence data from humans or other species.


4 comments:

Andrew Oh-Willeke said...

I'd be curious to see what dates it assigns to ANI v. ASI in South Asia.

Eze said...

It would be interesting to know the age of the admixture events in Northern Africa. Would be a nice experiment.

pconroy said...

I'd like to see this run with the Lezgins and other European and South Asian populations.

astenb said...

@Eze: Some of the North African estimates were actually listed in the full text (Mozabite = 131 generations ago). I think it would be more interesting to see African-specific admixture dates that contrast Nilo-Saharan with Bantu, Horn African with Nilo-Saharan and Bantu, and South African with Horn African (E-M293) and Bantu.

Furthermore, it would be interesting to infer admixture dates of North Africa contrasted with East and West Sub-Saharan Africa, as well as of North and East/West Sub-Saharan Africa on the Middle East.