June 11, 2008

deCODEme, 23andMe SNPs and published Caucasoid substructure studies

Both 23andMe and deCODEme offer SNP genotyping services that include an assessment of ancestry. Both companies offer admixture estimates for major continental groups (races), as well as an assessment of similarity with more specific groups: 23andMe uses categories such as Northern European, Southern European, Near Easterner etc., while deCODEme compares a client's profile with "reference individuals" such as Basque, Russian, Tuscan, Orcadian etc.

Both services seem to use the HGDP populations for the more specific (subracial) similarity assessment. In the last couple of years, there have been a few studies that looked at intra-European or intra-Caucasoid genomic variation, so it might be possible to devise a test using the published results and the SNPs tested by these companies.

deCODEme uses the Illumina 1M BeadChip, while 23andMe uses the Illumina HumanHap550+ BeadChip with an additional custom set of markers. The deCODEme chip measures 1,072,820 SNPs, while 23andMe (according to the "Greg Mendel" data you can download from their website) measures 571,754 SNPs.

Price et al. (2008) have identified a set of 300 ancestry informative markers that distinguish between NW and SE Europeans, and between SE Europeans and Ashkenazi Jews. The deCODEme set tests for 192 of these markers, whereas 23andMe tests for 169 of them.
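Counts like these come down to intersecting a published marker list with the list of SNPs on a chip. The following is a minimal sketch of that comparison, assuming both lists are available as plain rsID strings; the function name and the toy rsIDs are hypothetical and are not taken from any of the studies or from either company's data.

```python
def panel_coverage(panel_ids, chip_ids):
    """Return (shared_count, fraction_of_panel) for a marker panel and a chip.

    Both arguments are iterables of rsID strings. IDs are stripped and
    lower-cased so that "RS123 " and "rs123" count as the same marker.
    """
    panel = {s.strip().lower() for s in panel_ids if s.strip()}
    chip = {s.strip().lower() for s in chip_ids if s.strip()}
    shared = panel & chip  # markers that appear in both lists
    fraction = len(shared) / len(panel) if panel else 0.0
    return len(shared), fraction


# Toy example with made-up rsIDs (assumption: real lists would be read
# from the published marker file and the downloadable chip data).
toy_panel = ["rs1001", "rs1002", "rs1003", "rs1004"]
toy_chip = ["rs1002", "rs1004", "rs9999"]
count, fraction = panel_coverage(toy_panel, toy_chip)
print(count, fraction)
```

Matching on rsID alone ignores markers that were renamed or merged between dbSNP builds, so real counts may differ slightly depending on how such cases are handled.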

Tian et al. (2008) have identified 1,441 European substructure ancestry informative markers (rtf) (ESAIMs). deCODEme tests 1,412 of them, while 23andMe tests 1,424 of them. As far as I can tell, the original study did not publish either frequency data or individual genotypes for these markers, so using them to infer ancestry may not be possible. (Let me know if this data exists and I missed it.)

(added Jun 12) Bauchet et al. (2007) have identified a panel of 1,200 markers for European population substructure and report frequency data for Southeastern and Northern Europeans (xls). deCODEme tests for 508 of them, and 23andMe tests for 438 of them.

Seldin et al. (2006) have provided frequency data (pdf) in several populations for 5,735 SNP markers. deCODEme tests for 3,502 of these, while 23andMe tests for 2,242 of them.
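Published per-population allele frequencies are what make a do-it-yourself comparison possible. One simple way to use them, sketched below, is to score a genotype against each population by its log-likelihood. This is only an illustration under strong assumptions (Hardy-Weinberg proportions within each population and independence between markers); it is not the method used by either company or by any of the studies cited here, and all names and numbers in it are hypothetical.

```python
import math


def genotype_log_likelihood(allele_count, freq, eps=1e-6):
    """Log-probability of carrying 0, 1 or 2 copies of an allele whose
    population frequency is `freq`, assuming Hardy-Weinberg proportions."""
    p = min(max(freq, eps), 1 - eps)  # clamp to avoid log(0)
    if allele_count == 0:
        prob = (1 - p) ** 2
    elif allele_count == 1:
        prob = 2 * p * (1 - p)
    elif allele_count == 2:
        prob = p ** 2
    else:
        raise ValueError("allele_count must be 0, 1 or 2")
    return math.log(prob)


def population_log_likelihoods(genotypes, frequencies):
    """Score one individual against several populations.

    genotypes:   {rsid: copies of the counted allele (0, 1 or 2)}
    frequencies: {population: {rsid: frequency of the same allele}}
    Only markers present in the genotype and in every population are used,
    so that the scores are comparable. Markers are treated as independent.
    """
    shared = set(genotypes)
    for freqs in frequencies.values():
        shared &= set(freqs)
    return {
        pop: sum(genotype_log_likelihood(genotypes[s], freqs[s]) for s in shared)
        for pop, freqs in frequencies.items()
    }


# Toy data (made-up frequencies, not from any published table).
toy_freqs = {
    "north": {"rsA": 0.8, "rsB": 0.3},
    "south": {"rsA": 0.4, "rsB": 0.6},
}
toy_genotype = {"rsA": 2, "rsB": 0}
scores = population_log_likelihoods(toy_genotype, toy_freqs)
best = max(scores, key=scores.get)
print(best, scores)
```

With only a handful of markers the difference between populations will usually be small, and markers in linkage disequilibrium would be double-counted by this sketch, so any result should be read as a rough similarity score rather than an ancestry assignment.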

One could also use the 650K SNPs from Stanford. There are 660,755 non-mitochondrial SNPs in the freely available data, all of which are tested by deCODEme; 23andMe, which partially funded the study (Li et al. (2008)), tests 549,118 of them.

In conclusion, it seems possible to make your own test using commercially available SNPs and freely available data other than the HGDP populations.


