April 08, 2008

HMM-based analysis of polymorphisms in admixed ancestries

New Genomics Software Infers Ancestry With High Accuracy
"With very high accuracy, even for 20 generations, we can trace the populations of those individuals who are indeed represented in your genome," says Stanford computer science Assistant Professor Serafim Batzoglou, who led a team of graduate students to create HAPAA. They include co-lead authors Andreas Sundquist and Eugene Fratkin, as well as Chuong B. Do.

Batzoglou points out that because the HapMap database, a genetic record of 270 individuals of Western European, West African and East Asian ancestry, is very small, HAPAA can currently generate an ethnic profile only in terms of these three populations.

Fratkin himself was able to verify that he is of European ancestry, but not that he is 1/64th Polish. More genomic data will become available, however, and the researchers said this will further expand the software's ability to help people discern their roots.
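For a sense of scale, here is the expected-value arithmetic behind "1/64th". It assumes that each parent passes on half of the autosomal genome and counts distinct genealogical ancestors (no pedigree collapse):

$$N_g = 2^{g}, \qquad \mathbb{E}[f_g] = 2^{-g}, \qquad \frac{1}{64} = 2^{-6} \;\Rightarrow\; g = 6$$

Here $N_g$ is the number of ancestors $g$ generations back and $f_g$ is the share of the genome inherited from any one of them. A 1/64th Polish fraction therefore corresponds, in expectation, to one Polish ancestor among 64 six generations back. Because chromosomes are inherited in recombined blocks, the realized share scatters around this expectation and, for sufficiently distant ancestors, can be zero.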


For now, HAPAA is a proof of concept with limited practical utility, given the small size of the HapMap database. In the future, Batzoglou said, the software will benefit not only from having more individuals available for comparison but also from more detailed data about each individual. Today's genome samples track about 500,000 markers, or common genetic differences, out of about 10 million candidates; a typical individual carries about 3 million such differences. As genomics technology improves, he says, so will HAPAA's ability to infer ancestry from the data.
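A back-of-the-envelope calculation shows why marker density matters. It rests on assumptions that are not stated in the article: a haploid genome of roughly $3\times10^{9}$ bp, an average recombination rate of about 1 cM per Mb, and the standard single-pulse approximation that ancestry tracts from a minor ancestral component admixed $g$ generations ago average about $1/g$ Morgans in length.

$$\text{marker spacing} \approx \frac{3\times10^{9}\ \text{bp}}{5\times10^{5}} = 6\times10^{3}\ \text{bp}$$

$$\bar{L}_{g=20} \approx \frac{1}{20}\ \text{M} = 5\ \text{cM} \approx 5\times10^{6}\ \text{bp} \;\Rightarrow\; \frac{5\times10^{6}\ \text{bp}}{6\times10^{3}\ \text{bp}} \approx 830\ \text{markers per tract}$$

Under these assumptions, a typical tract from 20 generations ago is still covered by hundreds of markers, and genotyping all 10 million candidates would shrink the spacing to about 300 bp.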
Genome Research, DOI: 10.1101/gr.072850.107

Effect of genetic divergence in identifying ancestral origin using HAPAA

Andreas Sundquist, Eugene Fratkin, Chuong B. Do, and Serafim Batzoglou

The genome of an admixed individual with ancestors from isolated populations is a mosaic of chromosomal blocks, each following the statistical properties of variation seen in those populations. By analyzing polymorphisms in the admixed individual against those seen in representatives from the populations, we can infer the ancestral source of the individual’s haploblocks. In this paper we describe a novel approach for ancestry inference, HAPAA (HMM-based analysis of polymorphisms in admixed ancestries), that models the allelic and haplotypic variation in the populations and captures the signal of correlation due to linkage disequilibrium, resulting in greatly improved accuracy. We also introduce a methodology for evaluating the effect of genetic divergence between ancestral populations and time-to-admixture on inference accuracy. Using HAPAA, we explore the limits of ancestry inference in closely related populations.
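To illustrate the general idea behind HMM-based local ancestry inference, here is a minimal Python sketch. It is deliberately simpler than HAPAA and is not the authors' model. The hidden states are the ancestral populations, and emissions use only per-marker allele frequencies from reference panels, so, unlike HAPAA, it ignores haplotype structure and linkage disequilibrium. Transitions assume a single admixture pulse `generations` ago, with ancestry switch points falling at a rate of g per Morgan. The function `viterbi_local_ancestry` and all of its parameters are hypothetical names chosen for this example.

```python
import math
from typing import List, Sequence


def _log(x: float) -> float:
    # Log that maps zero probability to -inf instead of raising ValueError.
    return math.log(x) if x > 0 else -math.inf


def viterbi_local_ancestry(
    alleles: Sequence[int],            # phased haplotype: 0/1 allele at each marker
    freqs: Sequence[Sequence[float]],  # freqs[k][j]: frequency of allele 1 in population k at marker j
    dists: Sequence[float],            # dists[j]: genetic distance in Morgans between markers j and j+1
    mix: Sequence[float],              # admixture proportions, one per population, summing to 1
    generations: float,                # assumed generations since a single admixture pulse
) -> List[int]:
    """Most likely ancestral population at each marker (Viterbi path).

    Simplified allele-frequency HMM for illustration only; not the HAPAA model.
    """
    n_pops, n_markers = len(freqs), len(alleles)

    def log_emit(k: int, j: int) -> float:
        # Clamp so an allele unseen in a small reference panel is unlikely, not impossible.
        p = min(max(freqs[k][j], 1e-3), 1.0 - 1e-3)
        return math.log(p if alleles[j] == 1 else 1.0 - p)

    # First marker: ancestry drawn according to the admixture proportions.
    score = [_log(mix[k]) + log_emit(k, 0) for k in range(n_pops)]
    back: List[List[int]] = []

    for j in range(1, n_markers):
        # Chance that a switch point (Poisson, rate g per Morgan) falls in this interval;
        # after a switch point the ancestry is redrawn from `mix` and may stay the same.
        switch = 1.0 - math.exp(-generations * dists[j - 1])
        new_score: List[float] = []
        pointers: List[int] = []
        for dest in range(n_pops):
            best_src, best = 0, -math.inf
            for src in range(n_pops):
                trans = switch * mix[dest] + ((1.0 - switch) if src == dest else 0.0)
                cand = score[src] + _log(trans)
                if cand > best:
                    best_src, best = src, cand
            new_score.append(best + log_emit(dest, j))
            pointers.append(best_src)
        score = new_score
        back.append(pointers)

    # Trace the best path backwards from the highest-scoring final state.
    state = max(range(n_pops), key=lambda k: score[k])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


if __name__ == "__main__":
    # Two toy reference populations that differ strongly at every marker.
    freqs = [[0.9] * 10, [0.1] * 10]
    haplotype = [1] * 5 + [0] * 5
    print(viterbi_local_ancestry(haplotype, freqs, [0.01] * 9, [0.5, 0.5], generations=10))
```

With markers 1 cM apart and g = 10, the toy haplotype decodes as five markers of population 0 followed by five of population 1. When g is made very small, switching becomes so costly that the whole haplotype is assigned to a single population. This illustrates how the assumed time since admixture trades off against the evidence at each marker.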



n/a said...

I posted on this weeks ago.

Looks promising.

saphorr said...

Whatever. 1/64 Polish? How do you tell 1/64 of a "Polish gene" from that of someone just over the border in the Ukraine?

No software will ever be able, in general, to determine which modern nation states your 2^10 ancestors 10 generations ago came from. To suggest this is possible in general betrays a complete misunderstanding of genetics and human history.

saphorr said...

I should say I don't believe that the author himself is asserting such a thing, but I think many people will take this as his meaning, and he ought to prepare for that and issue preemptive disclaimers.

If such a technique is invented, I would like to see which of my genes are flagged as "Irish" and which as "Scottish". Then I would like to try the same trick starting with a different seed sample!