"With very high accuracy, even for 20 generations, we can trace the populations of those individuals who are indeed represented in your genome," says Stanford computer science Assistant Professor Serafim Batzoglou, who led a team of graduate students to create HAPAA. They include co-lead authors Andreas Sundquist and Eugene Fratkin, as well as Chuong B. Do.
Genome Research, DOI: 10.1101/gr.072850.107
Batzoglou points out that because the HapMap database, a genetic record of 270 individuals of Western European, West African and East Asian ancestry, is very small, HAPAA can currently generate an ethnic profile only in terms of these populations.
Fratkin himself was able to verify that he is of European ancestry, but not that he is 1/64th Polish. But more genomics data will become available, the researchers said, which will further expand the software's ability to help people discern their roots.
For now the HAPAA software serves as a proof of concept, but its utility is limited by the small size of the HapMap database. In the future the software will benefit not only from having more individuals available for comparison, Batzoglou said, but also from more detailed data about each individual. Today's genome samples track about 500,000 markers, or common genetic differences, but there are about 10 million candidates. Most individuals have about 3 million such specific differences. As genomics technology improves, he says, so will HAPAA's ability to infer ancestry from the data.
Effect of genetic divergence in identifying ancestral origin using HAPAA
Andreas Sundquist, Eugene Fratkin, Chuong B. Do, and Serafim Batzoglou
The genome of an admixed individual with ancestors from isolated populations is a mosaic of chromosomal blocks, each following the statistical properties of variation seen in those populations. By analyzing polymorphisms in the admixed individual against those seen in representatives from the populations, we can infer the ancestral source of the individual’s haploblocks. In this paper we describe a novel approach for ancestry inference, HAPAA (HMM-based analysis of polymorphisms in admixed ancestries), that models the allelic and haplotypic variation in the populations and captures the signal of correlation due to linkage disequilibrium, resulting in greatly improved accuracy. We also introduce a methodology for evaluating the effect of genetic divergence between ancestral populations and time-to-admixture on inference accuracy. Using HAPAA, we explore the limits of ancestry inference in closely related populations.
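The mosaic idea in the abstract can be illustrated with a deliberately simplified hidden Markov model. The sketch below is not the HAPAA model: it assumes markers are independent given ancestry (so it ignores the linkage disequilibrium that HAPAA captures), uses a single haploid sequence of 0/1 alleles, and applies one constant switch probability between adjacent markers. The function name `viterbi_ancestry`, its parameters, and the allele frequencies are hypothetical, chosen only for illustration.

```python
import math

def viterbi_ancestry(alleles, freqs, switch_prob=0.01):
    """Most likely ancestral population per marker under a toy HMM.

    alleles: list of 0/1 alleles observed along one haplotype.
    freqs: dict of population name -> allele-1 frequency at each marker
           (values must lie strictly between 0 and 1; at least two populations).
    switch_prob: assumed probability that ancestry changes between
                 adjacent markers (a stand-in for time since admixture).
    """
    pops = list(freqs)
    k = len(pops)

    def emit(pop, i):
        # Log-probability of the observed allele given the population.
        p = freqs[pop][i]
        return math.log(p if alleles[i] == 1 else 1.0 - p)

    stay = math.log(1.0 - switch_prob)
    move = math.log(switch_prob / (k - 1))

    # score[s] is the best log-probability of any path ending in state s.
    score = [math.log(1.0 / k) + emit(pop, 0) for pop in pops]
    back = []
    for i in range(1, len(alleles)):
        new_score, pointers = [], []
        for s, pop in enumerate(pops):
            best_prev = max(
                range(k),
                key=lambda r: score[r] + (stay if r == s else move),
            )
            step = stay if best_prev == s else move
            new_score.append(score[best_prev] + step + emit(pop, i))
            pointers.append(best_prev)
        score = new_score
        back.append(pointers)

    # Trace the best path backwards from the best final state.
    state = max(range(k), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return [pops[s] for s in path]


# Made-up frequencies for two hypothetical populations, A and B.
toy_freqs = {"A": [0.9] * 10, "B": [0.1] * 10}
toy_alleles = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
print(viterbi_ancestry(toy_alleles, toy_freqs, switch_prob=0.05))
```

With these made-up inputs the decoded path assigns the first five markers to A and the last five to B, which is the block structure the abstract describes. When the two populations' allele frequencies are closer together, each marker carries less signal and the inferred blocks become less reliable, which is the effect of genetic divergence the paper sets out to measure.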