November 02, 2008

23andme's advanced global similarity tool

UPDATE: I am told that this tool is currently in alpha version, so it's not clear when it will be fully ready for 23andme customers. As per my comments below, I think this is a great initiative to tie individual customers' genetic data to the many new genetic studies showing genomic-geographic correlations. I am sure that 23andme's blog, the Spittoon, will cover this when it is ready for public release, including any features that I may have overlooked. I will be following this story closely. [end update]

23andme has added a new advanced global similarity tool to their website (you need to register in order to play with it). This tool places a customer, as well as other customers he is "connected" with, on a map of the first two principal components, like the maps recently published in several papers.

The tool allows one to look at the PC map at the global, continental, or subcontinental level.


This is quite useful, and a step in the right direction that I pointed out earlier. However, there are some points of criticism.
  • The axes are labeled North/South Migration and East/West Migration. While the pattern in the first two principal components does correspond roughly with longitude and latitude, it is erroneous to label these principal components as "North/South" and "East/West". It is even more erroneous to label them as "Migration", since a geographical cline is not necessarily produced by a migration event.
  • The "Take a Tour" feature presents a simplistic and misleading account of human prehistory in terms of "migrations". This account is a simple branching pattern, e.g., Africa -> Near East -> Europe, or Africa -> Near East -> Central Asia -> East Asia. The observed pattern did not emerge in this manner. For example, Central Asian people such as the Uyghur are intermediate between Western Eurasians (Caucasoids) and Eastern Eurasians (Mongoloids) because of a later admixture event; they can't be thought of as "ancestors" of the East Eurasians.
  • Partitioning human variation into this hierarchical set of groups is not the best way to satisfy customers' needs. For example, a Hispanic person may wish to see himself on a PC map which includes "Southern European" and "Native American" groups, an African American person on a PC map which includes "Northern European" and "West African" groups, an Ethiopian on a Sub-Saharan/Near Eastern map, and a European Jew on a European/Near Eastern map. Of course, the number of possible combinations grows combinatorially, but there is no reason why some of the more common ones (customer feedback may play a role here) may not be supported.
  • Why should this tool be limited to the first two principal components? Of course, additional components do not have such a strong geographical correspondence, but they will nonetheless separate populations in different ways, and allow individuals to place themselves more fully in context.
  • The tool could offer much more information. On mouse hover over an individual, a small label could appear, identifying the individual (e.g. origin and HGDP code) and listing its PC coordinates. This is especially useful for power users. A pretty, uncluttered picture is no substitute for as much information as possible.
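
To make the last few points concrete, here is a minimal sketch of what a custom PC map could look like computationally: fit principal components on a chosen subset of reference groups, keep more than two components, and project a customer onto them. This is not 23andme's implementation. The data are synthetic, and the names `fit_pca`, `project`, `group_a`, `group_b` and `customer` are hypothetical, chosen only for illustration.

```python
import numpy as np

def fit_pca(genotypes, n_components=4):
    """Fit principal axes on a reference panel.

    genotypes: (n_individuals, n_snps) array of 0/1/2 allele counts.
    Returns the per-SNP mean and the top n_components principal axes.
    """
    mean = genotypes.mean(axis=0)
    centered = genotypes - mean
    # Rows of vt are orthonormal principal axes, ordered by variance explained.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_components]

def project(genotypes, mean, axes):
    """Place individuals on the PC map defined by (mean, axes)."""
    return (genotypes - mean) @ axes.T

# Synthetic stand-in for two reference groups (an assumption, not real data).
rng = np.random.default_rng(0)
n_snps = 500
freq_a = rng.uniform(0.1, 0.9, n_snps)
freq_b = np.clip(freq_a + rng.normal(0, 0.2, n_snps), 0.05, 0.95)
group_a = rng.binomial(2, freq_a, size=(30, n_snps)).astype(float)
group_b = rng.binomial(2, freq_b, size=(30, n_snps)).astype(float)

# The "map" is defined only by the groups the user chose to include.
reference = np.vstack([group_a, group_b])
mean, axes = fit_pca(reference, n_components=4)
ref_coords = project(reference, mean, axes)

# A hypothetical customer drawn from intermediate allele frequencies.
customer = rng.binomial(2, (freq_a + freq_b) / 2, size=(1, n_snps)).astype(float)
customer_coords = project(customer, mean, axes)

# Coordinates on all four retained components, as a hover label might list them.
print(customer_coords.round(2))
```

Because the axes are fitted only on the selected groups, swapping in a different pair of reference groups yields a different map, and the extra columns of `customer_coords` are the additional components that a two-dimensional plot discards.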

4 comments:

Andrew Yates said...

Good writeup; I've pushed this to the news.thinkgene.com genomic link share.

albertine meunier said...

albertine is doing an artistic experiment, called 200gr

she need your click to buy her dna kit.
with all your contribution she will be able to discover if her dna weight is 200gr as science says

so just click on the google ads on this page :
http://www.albertinemeunier.net/200grammes/

mikej2 said...

Dienekes, you are right about hierarchical sets. I am at second level exactly in the middle of northern and eastern europeans and can see only the eastern european map due to the drawn subgroup square areas. In practice the result is useless for me.

zadeh79 said...

Actually this tool can be quite inaccurate for Middle Easterners, as it uses a very limited sample, based on the Human Genome Diversity Project. The PCA is based on only three Middle Eastern groups (Druze, Bedouins, and Palestinians), so don't be surprised if you fall into an odd position. I would suggest getting a McDonald PCA, through Dr. Doug McDonald.