23andMe has added a new advanced global similarity tool to their website (you need to register in order to play with it). This tool places a customer, as well as other customers he is "connected" with, on a map of the first two principal components, like the ones recently published in several papers.
The tool allows one to look at the PC map at the global, continental, or subcontinental level.
This is quite useful, and a step in the right direction, one I pointed out earlier. However, there are some points of criticism.
- The axes are labeled North/South Migration and East/West Migration. While the pattern in the first two principal components does correspond roughly with longitude and latitude, it is erroneous to label these principal components as "North/South" and "East/West". It is even more erroneous to label them as "Migration", since a geographical cline is not necessarily produced by a migration event.
- The "Take a Tour" feature presents a simplistic and misleading account of human prehistory in terms of "migrations". This account is a simple branching pattern, e.g., Africa -> Near East -> Europe, or Africa -> Near East -> Central Asia -> East Asia. The observed pattern did not emerge in this manner. For example, Central Asian people such as the Uyghur are intermediate between Western Eurasians (Caucasoids) and Eastern Eurasians (Mongoloids) because of a later admixture event; they can't be thought of as "ancestors" of the East Eurasians.
- Partitioning human variation into this hierarchical set of groups is not the best way to satisfy customers' needs. For example, a Hispanic person may wish to see himself on a PC map which includes "Southern European" and "Native American" groups, an African American person may wish to see himself on a PC map which includes "Northern European" and "West African" groups, an Ethiopian on a Sub-Saharan/Near Eastern map, and a European Jew on a European/Near Eastern map. Of course, the number of possible combinations is combinatorial, but there is no reason why some of the more common ones (customer feedback may play a role here) may not be supported.
- Why should this tool be limited to the first two principal components? Of course, additional components do not have such a strong geographical correspondence, but they will nonetheless separate populations in different ways, and allow individuals to place themselves more fully in context.
- The tool could offer much more information. On mouse hover over an individual, a small label identifying it (e.g., origin and HGDP code) and listing its PC coordinates could appear. This is especially useful for power users. A pretty, uncluttered picture is no substitute for as much information as possible.
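To make the last few points concrete, here is a minimal sketch of what a custom PC map might involve: fit the principal components on a customer-chosen pair of reference groups, keep more than two components, and report the customer's coordinates. This is not how 23andMe computes its maps, which I have no knowledge of. The genotype data is simulated, and the group names, the functions `pca_fit` and `pca_project`, and all parameters are my own assumptions for illustration.

```python
import numpy as np

def pca_fit(genotypes, n_components=4):
    """Fit PCA on an (individuals x markers) matrix using SVD."""
    mean = genotypes.mean(axis=0)
    centered = genotypes - mean
    # Rows of vt are the principal axes, ordered by decreasing variance.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = (s ** 2) / np.sum(s ** 2)
    return mean, vt[:n_components], explained[:n_components]

def pca_project(genotypes, mean, axes):
    """Place individuals on previously fitted principal axes."""
    return (genotypes - mean) @ axes.T

# Simulated data only: hypothetical allele frequencies for two groups.
rng = np.random.default_rng(0)
n_markers = 500
freq_a = rng.uniform(0.1, 0.9, n_markers)
freq_b = np.clip(freq_a + rng.normal(0, 0.15, n_markers), 0.05, 0.95)

def sample(freq, n):
    """Draw n individuals with genotypes coded as 0, 1 or 2 allele copies."""
    return rng.binomial(2, freq, size=(n, len(freq))).astype(float)

reference = {"group_a": sample(freq_a, 40), "group_b": sample(freq_b, 40)}

# The customer picks which reference groups define the map.
chosen = ["group_a", "group_b"]
panel = np.vstack([reference[name] for name in chosen])
mean, axes, explained = pca_fit(panel, n_components=4)

# A simulated customer drawn from frequencies midway between the two groups.
customer = sample((freq_a + freq_b) / 2, 1)
coords = pca_project(customer, mean, axes)

# Text a hover label could show: all four coordinates, not just the first two.
label = ", ".join(f"PC{i + 1}={c:.2f}" for i, c in enumerate(coords[0]))
print(label)
```

Fitting the axes on the chosen panel only, and then projecting the customer onto them, means the map is defined by the reference groups rather than by whoever happens to be plotted on it. Whether a production tool should work this way is a design choice; I am only showing that nothing about it is computationally exotic.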