November 26, 2012

LAMP-LD paper and software

On a topic similar to the recent MULTIMIX software, this paper describes the performance of the LAMP-LD software on Latinos with ancestry from Europe, Africa, and the Americas. The software can be obtained from this site.

From the paper itself, this figure highlights a problem I have previously identified:

In this experiment, the authors "European-ized" East Asian reference panels by introducing TSI (Tuscan) segments into them. From the paper:

Current day Native American haplotypes used as proxy for the Native American component of Latinos are presumed to contain European gene flow. In order to test the effect of this phenomenon on ancestry inference, we introduced TSI segments into the Asian haplotypes of a reference set composed of 117 CEU, 169 (CHB+CHD) and 115 YRI haplotypes. We performed 10 experiments, in each choosing at random a 5 Mb region along the chromosome, and replacing a percentage of the (CHB+CHD) haplotypes with TSI haplotypes along the chosen region. 
We observed that the typical effect of increasing the number of TSI segments present in the Native American reference panels is an increase in the estimated proportion of the Native American ancestry along the modified region, at the expense of the estimated European proportion.
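
The setup of that experiment can be illustrated with a minimal sketch. This is not the authors' code: the function name `europeanize_panel`, the toy all-0/all-1 haplotypes, and the SNP spacing are assumptions made purely for illustration. It only shows the panel-modification step (pick a random 5 Mb window, overwrite a fraction of the panel's haplotypes with donor alleles inside that window), not the ancestry inference itself.

```python
import random

def europeanize_panel(panel, donors, positions, region_bp, fraction, rng):
    """Return a modified copy of `panel` with donor alleles copied into one random window.

    panel, donors: lists of haplotypes (lists of alleles), all the same length
    positions:     sorted SNP positions in base pairs, one per allele
    region_bp:     window length in base pairs (5 Mb in the paper's experiment)
    fraction:      share of panel haplotypes to overwrite inside the window
    """
    # Pick a window start so that the whole window lies on the chromosome.
    last_start = max(positions[0], positions[-1] - region_bp)
    start = rng.uniform(positions[0], last_start)
    in_region = [i for i, p in enumerate(positions) if start <= p < start + region_bp]

    # Choose which panel haplotypes receive donor segments.
    n_replace = round(fraction * len(panel))
    chosen = sorted(rng.sample(range(len(panel)), n_replace))

    new_panel = [list(h) for h in panel]  # copy, leave the input untouched
    for h in chosen:
        donor = rng.choice(donors)
        for i in in_region:
            new_panel[h][i] = donor[i]
    return new_panel, in_region, chosen

# Toy data (hypothetical): 200 SNPs spaced 100 kb apart on a 20 Mb chromosome.
rng = random.Random(0)
positions = list(range(0, 20_000_000, 100_000))
asian_panel = [[0] * len(positions) for _ in range(10)]  # stand-in for CHB+CHD
tsi_panel = [[1] * len(positions) for _ in range(4)]     # stand-in for TSI

modified, region, chosen = europeanize_panel(
    asian_panel, tsi_panel, positions, 5_000_000, 0.3, rng
)
print(len(region), "SNPs in the window;", len(chosen), "haplotypes modified")
```

In a real test one would then run local ancestry inference with the original and the modified panel and compare the estimated proportions along the window.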

In the case of Native American admixture, the occurrence of European segments in the reference panels is a problem, because we can be fairly sure that prior to 1492 there was no recent European ancestry in the Americas.

But the problem also arises in other cases where this is less certain, for example the arrival of East Eurasian ancestry in West Eurasia via Uralic and Turkic speakers from Siberia and Central Asia. In that case, we cannot be entirely certain whether the presence of European haplotypes in reference populations (e.g., present-day Siberians/Central Asians) is due to contact in the eastern source areas before or after the migration.

To make my observation clearer: suppose that an eastern population X contributes to a European population Y. We can then estimate how much "X ancestry" population Y has absorbed. But if X today is "more East Asian" than X was when it contributed to Y, then the proportion of admixture will be underestimated; in the converse case it will be overestimated.
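
A toy calculation makes the direction of the bias concrete. Everything here is an assumption for illustration: populations are placed on a single hypothetical "East Asian-ness" axis (0 for the European reference, 1 for fully East Asian), Y is modeled as a simple linear mixture, and the helper names `mix` and `estimate_fraction` are made up. Real admixture estimators are far more involved; this only sketches the arithmetic of the argument.

```python
def mix(a, source, other):
    """Position of a population that is a fraction `a` source and (1 - a) other."""
    return a * source + (1 - a) * other

def estimate_fraction(y, x_ref, e_ref):
    """Admixture fraction implied by y if x_ref and e_ref are taken as the sources."""
    return (y - e_ref) / (x_ref - e_ref)

e = 0.0        # European reference on the hypothetical axis
x_then = 0.5   # population X at the time it contributed to Y
true_a = 0.20  # true share of X ancestry in Y
y = mix(true_a, x_then, e)

scenarios = [
    ("X unchanged since contact", 0.5),
    ("X more East Asian today", 0.8),
    ("X less East Asian today", 0.3),
]
estimates = {label: estimate_fraction(y, x_now, e) for label, x_now in scenarios}
for label, a_hat in estimates.items():
    print(f"{label}: estimated X ancestry = {a_hat:.3f}")
```

With these made-up numbers the true 20% is recovered only when X has not changed; it shrinks to 12.5% when present-day X is more East Asian, and grows to about 33% when present-day X is less East Asian.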

This was made evident in my recent analysis of Turks, where substantially different admixture estimates were obtained using different eastern populations. The evidence of that analysis suggests that major admixture occurred in Central Asia after it did in Anatolia.

Bioinformatics (2012) 28 (10): 1359-1367. doi: 10.1093/bioinformatics/bts144

Fast and accurate inference of local ancestry in Latino populations

Yael Baran et al.

Motivation: It is becoming increasingly evident that the analysis of genotype data from recently admixed populations is providing important insights into medical genetics and population history. Such analyses have been used to identify novel disease loci, to understand recombination rate variation and to detect recent selection events. The utility of such studies crucially depends on accurate and unbiased estimation of the ancestry at every genomic locus in recently admixed populations. Although various methods have been proposed and shown to be extremely accurate in two-way admixtures (e.g. African Americans), only a few approaches have been proposed and thoroughly benchmarked on multi-way admixtures (e.g. Latino populations of the Americas).

Results: To address these challenges we introduce here methods for local ancestry inference which leverage the structure of linkage disequilibrium in the ancestral population (LAMP-LD), and incorporate the constraint of Mendelian segregation when inferring local ancestry in nuclear family trios (LAMP-HAP). Our algorithms uniquely combine hidden Markov models (HMMs) of haplotype diversity within a novel window-based framework to achieve superior accuracy as compared with published methods. Further, unlike previous methods, the structure of our HMM does not depend on the number of reference haplotypes but on a fixed constant, and it is thereby capable of utilizing large datasets while remaining highly efficient and robust to over-fitting. Through simulations and analysis of real data from 489 nuclear trio families from the mainland US, Puerto Rico and Mexico, we demonstrate that our methods achieve superior accuracy compared with published methods for local ancestry inference in Latinos.




shenandoah said...

"...because we can be fairly sure that prior to 1492 there was no recent European ancestry in the Americas."

According to Oxford professor of human genetics Bryan Sykes, "European" genes have been present in North America for at least 10,000 years. He also quotes Mike MacPherson, a statistician at 23andMe, as calculating that the pre-1492 Native American population in North America possessed anywhere from roughly 15-35% "European" genes.

In practically all chromosome paintings of ~full-blooded North American Native Americans, a similar percentage of "European" 'admixture' is evident; also, in many native Europeans whose families never resided in North America, a percentage of 'Amerindian' may be apparent.

Does your analysis support their theories?

Mark D said...

You insightfully point out, as you have several times in other posts, an issue that arises in admixture analysis of Amerindians, including the Latino samples mentioned here: purported European ancestry when there may not be any recent admixture at all. Dr. Sykes also mentions this in his recent book DNA USA, writing that "this can be partly due to the Asia/Europe border artifacts of Siberian chromosomes rather than a genuinely recent European admixture." (p.315) He mentions a commercial lab that usually found around 25% European admixture in Native American samples. I think there is much more fine-tuning left to do, and this article may contribute to solving the problem presented.