December 18, 2012

Genographic GenoChip paper (Elhaik et al. 2012)

... has been posted on the arXiv. I don't have time to comment on it at the moment, and any further thoughts will be posted as an update here. By the way, thanks to the authors for putting me in the acknowledgements section :)

On a related note, I have released a patch for Geno 2.0 data so that it can be used with my DIYDodecad tools. I have already converted 3-4 files with it, so it seems to work fine. In one file, however, there was a problem caused by a lot of stray line breaks. I am not sure whether this is a general problem or was caused by the submitter re-saving the file, but if you encounter it, try saving your .csv file in Unix file format, or use dos2unix to fix it.
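If dos2unix is not at hand, the same line-ending cleanup can be approximated in a few lines. The sketch below is an assumption-laden stand-in, not part of the DIYDodecad patch. It assumes the stray breaks come from Windows (CRLF) or old Mac (lone CR) line endings. Like `dos2unix` combined with `mac2unix`, it rewrites both as Unix LF. The names `normalize_line_endings`, `convert_file` and `fix_newlines.py` are hypothetical.

```python
import sys
from pathlib import Path


def normalize_line_endings(data: bytes) -> bytes:
    """Convert Windows (CRLF) and old Mac (lone CR) line endings to Unix LF."""
    # Replace CRLF first so its CR is not turned into an extra blank line below.
    data = data.replace(b"\r\n", b"\n")
    # Any remaining bare CR is treated as a line break and becomes LF.
    return data.replace(b"\r", b"\n")


def convert_file(src: str, dst: str) -> None:
    """Read src as raw bytes, normalize its line endings, and write the result to dst."""
    raw = Path(src).read_bytes()
    Path(dst).write_bytes(normalize_line_endings(raw))


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python fix_newlines.py INPUT.csv OUTPUT.csv
    convert_file(sys.argv[1], sys.argv[2])
```

The script works on bytes rather than text, so the file's character encoding is never decoded or changed. Only the line terminators are rewritten before the file is fed to the patch.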

arXiv:1212.4116 [q-bio.PE]

The GenoChip: A New Tool for Genetic Anthropology

Eran Elhaik et al.

The Genographic Project is an international effort using genetic data to chart human migratory history. The project is non-profit and non-medical, and through its Legacy Fund supports locally led efforts to preserve indigenous and traditional cultures. In its second phase, the project is focusing on markers from across the entire genome to obtain a more complete understanding of human genetic variation. Although many commercial arrays exist for genome-wide SNP genotyping, they were designed for medical genetic studies and contain medically related markers that are not appropriate for global population genetic studies. GenoChip, the Genographic Project's new genotyping array, was designed to resolve these issues and enable higher-resolution research into outstanding questions in genetic anthropology. We developed novel methods to identify AIMs and genomic regions that may be enriched with alleles shared with ancestral hominins. Overall, we collected and ascertained AIMs from over 450 populations. Containing an unprecedented number of Y-chromosomal and mtDNA SNPs and over 130,000 SNPs from the autosomes and X-chromosome, the chip was carefully vetted to avoid inclusion of medically relevant markers. The GenoChip results were successfully validated. To demonstrate its capabilities, we compared the FST distributions of GenoChip SNPs to those of two commercial arrays for three continental populations. While all arrays yielded similarly shaped (inverse J) FST distributions, the GenoChip autosomal and X-chromosomal distributions had the highest mean FST, attesting to its ability to discern subpopulations. The GenoChip is a dedicated genotyping platform for genetic anthropology and promises to be the most powerful tool available for assessing population structure and migration history.



eurologist said...

150,000 markers are not all that many, but they seem to be very well chosen for most non-medical purposes. From the preliminary results I have seen, both Y-DNA and mtDNA resolution are extremely good.

I think a main question for many prospective customers is how good their autosomal-based recent-ancestry tools will eventually be. I am not impressed with those of 23andme and others, but that business has only just started and is still in junk in, junk out mode (two major providers lump France and Germany together as Central European? Seriously?). The good news is that they seem to ask for very specific, local information (village level, if possible!). The bad news is that they only ask about the deepest level you know of along your uniparental lines (which may still not be such a bad approximation for autosomal ancestry, if they have good corrective filters).

Also, I have to say I am not a fan of Spencer Wells's confidence in dates and migration routes; I personally disagree with just too much of that. But that is easily ignored. Let's hope their tools improve over time.

Mark D said...

"I think a main question for many tentative customers is how good their autosomal-based recent-ancestry tools eventually will be."
As a long-time customer of DNA testing, I was very disappointed with Geno2's autosomal ancestry analysis. The "reference" comparisons from Europe are too similar to be of any use to those of us who seek to learn more about our heritage; the Tuscan and Greek comparisons are in fact identical. On top of that, Geno2 uses 1000 Genomes sample labels, so "Mexican" really means Los Angeles, and "Puerto Rican" is, to me, likewise uninformative for ancestry purposes. They might as well have used "The Bronx" and obtained similar if not identical results. I hope those in this academic field can use Geno2 to much better effect, but as a consumer product it leaves much to be desired.

eurologist said...


I think this area is relatively new: it is not as if other companies have offered this forever, or have done a good job at it. Clearly, 23andme have their own difficulties, some of them self-made. I would expect Geno 2.0 to improve over time as they collect localization data, but there are of course no guarantees. Right now, they pretty much only offer ancient admixture levels, which are useful only for people who are not recently admixed.

Mark D said...


I agree and have seen the posts on 23andMe, having followed Dienekes' blog for about two years now. The issue of sample grouping and labeling is one that commercial companies will have to better address if they are to be successful in selling their products. The datasets are expanding and the science used in analyzing them is advancing, and I thank Dienekes and everyone else involved in moving this field forward.