From the paper itself, this figure highlights a problem I have previously identified:
In this experiment, the authors "European-ized" East Asian reference panels by introducing TSI (Tuscan) segments into them. From the paper:
Current day Native American haplotypes used as proxy for the Native American component of Latinos are presumed to contain European gene flow. In order to test the effect of this phenomenon on ancestry inference, we introduced TSI segments into the Asian haplotypes of a reference set composed of 117 CEU, 169 (CHB+CHD) and 115 YRI haplotypes. We performed 10 experiments, in each choosing at random a 5 Mb region along the chromosome, and replacing a percentage of the (CHB+CHD) haplotypes with TSI haplotypes along the chosen region.
We observed that the typical effect of increasing the number of TSI segments present in the Native American reference panels is an increase in the estimated proportion of the Native American ancestry along the modified region, at the expense of the estimated European proportion.
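The panel-perturbation experiment quoted above can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes haplotypes are lists of 0/1 alleles at SNPs sorted by position, and that each replaced (CHB+CHD) haplotype receives the alleles of one randomly chosen TSI haplotype across the chosen region. The function name `inject_segments` and its parameters are hypothetical.

```python
import random
from typing import List, Optional

Haplotype = List[int]  # alleles (0/1) at each SNP, in chromosomal order


def inject_segments(
    asian: List[Haplotype],
    donor: List[Haplotype],
    positions: List[int],
    fraction: float,
    region_bp: int = 5_000_000,
    rng: Optional[random.Random] = None,
) -> List[Haplotype]:
    """Hypothetical sketch: return a copy of `asian` in which `fraction` of the
    haplotypes carry donor (e.g. TSI) alleles across one random region."""
    rng = rng or random.Random()
    # Pick a random start so that the region fits within the typed positions.
    start = rng.randint(positions[0], max(positions[0], positions[-1] - region_bp))
    end = start + region_bp
    # Indices of SNPs falling inside the half-open region [start, end).
    idx = [i for i, p in enumerate(positions) if start <= p < end]
    out = [h[:] for h in asian]  # leave the input panel untouched
    n_replace = round(fraction * len(asian))
    for target in rng.sample(range(len(asian)), n_replace):
        src = rng.choice(donor)  # assumption: one donor haplotype per target
        for i in idx:
            out[target][i] = src[i]
    return out
```

Repeating this for several random regions and increasing values of `fraction`, then rerunning ancestry inference on the modified panel, mirrors the design of the ten experiments described in the quote.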
In the case of Native American admixture, the occurrence of European segments in the reference panels is a problem, because we can be fairly sure that prior to 1492 there was no recent European ancestry in the Americas.
But the problem also arises in other cases where this is less certain, for example the arrival of East Eurasian ancestry in West Eurasia via Uralic and Turkic speakers from Siberia and Central Asia. In that case, we cannot be entirely certain whether the presence of European haplotypes in reference populations (e.g., present-day Siberians or Central Asians) is due to contact in the eastern source areas after the migration or before it.
To make my observation clearer: suppose that an eastern population X contributes to a European population Y. We can then estimate how much "X ancestry" population Y has absorbed. But if X today is "more East Asian" than X was when it contributed to Y, then the proportion of admixture will be underestimated; in the converse case, it will be overestimated.
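The direction of the bias can be checked with a little arithmetic. This sketch assumes a simplified linear mixing model (an illustrative assumption, not the method of any particular program): Y's East Asian fraction equals the true contribution of X times the East Asian fraction that X had at the time of contact, and the contribution is then estimated by dividing by present-day X's East Asian fraction. The function name is hypothetical.

```python
def estimated_contribution(f_true: float, east_then: float, east_now: float) -> float:
    """Estimate of X's contribution to Y when present-day X is the proxy.

    Simplified linear mixing model (illustrative assumption):
    Y's East Asian fraction = f_true * east_then, and the estimate divides
    that observed fraction by present-day X's East Asian fraction east_now.
    """
    observed_east_in_y = f_true * east_then
    return observed_east_in_y / east_now


# X became more East Asian after contributing to Y: contribution underestimated.
print(round(estimated_contribution(0.3, east_then=0.5, east_now=0.8), 4))  # 0.1875
# X became less East Asian after contributing to Y: contribution overestimated.
print(round(estimated_contribution(0.3, east_then=0.8, east_now=0.5), 4))  # 0.48
```

In both cases the estimate is off by the factor `east_then / east_now`, which is why different present-day eastern proxies can yield substantially different admixture estimates for the same population Y.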
This was made evident in my recent analysis of Turks, in which substantially different admixture estimates were obtained using different eastern populations. The evidence of that analysis suggests that major admixture occurred in Central Asia after it had already occurred in Anatolia.
Bioinformatics (2012) 28 (10): 1359-1367. doi: 10.1093/bioinformatics/bts144
Fast and accurate inference of local ancestry in Latino populations
Yael Baran et al.
Motivation: It is becoming increasingly evident that the analysis of genotype data from recently admixed populations is providing important insights into medical genetics and population history. Such analyses have been used to identify novel disease loci, to understand recombination rate variation and to detect recent selection events. The utility of such studies crucially depends on accurate and unbiased estimation of the ancestry at every genomic locus in recently admixed populations. Although various methods have been proposed and shown to be extremely accurate in two-way admixtures (e.g. African Americans), only a few approaches have been proposed and thoroughly benchmarked on multi-way admixtures (e.g. Latino populations of the Americas).
Results: To address these challenges we introduce here methods for local ancestry inference which leverage the structure of linkage disequilibrium in the ancestral population (LAMP-LD), and incorporate the constraint of Mendelian segregation when inferring local ancestry in nuclear family trios (LAMP-HAP). Our algorithms uniquely combine hidden Markov models (HMMs) of haplotype diversity within a novel window-based framework to achieve superior accuracy as compared with published methods. Further, unlike previous methods, the structure of our HMM does not depend on the number of reference haplotypes but on a fixed constant, and it is thereby capable of utilizing large datasets while remaining highly efficient and robust to over-fitting. Through simulations and analysis of real data from 489 nuclear trio families from the mainland US, Puerto Rico and Mexico, we demonstrate that our methods achieve superior accuracy compared with published methods for local ancestry inference in Latinos.