In a previous post I showed that the new paper by Elhaik et al. presents as its own (without citation) two of my ideas that were published on the web ~2 years before the paper was submitted.
I also took some time to evaluate the new aspect of their "geographical positioning system" (GPS), an algorithm that determines the geographic position of samples given their genetic distances to a group of reference populations. This is described under the heading "Calculating the biogeographical origin of a test sample" in their paper, and I include a screenshot of it on the left to help you follow along.
From Equation (2) it is easy to see that the predicted position of the test sample is shifted away from the position of the best-matching reference population (Position_best) and towards the other reference populations (Position(m)). The contribution of each reference population is weighted by w_m, which is the ratio of the distance to the closest population to the distance to the m-th reference population.
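The weighting described above can be sketched in a few lines. Only the definition of w_m comes from the description here; the way the weights are combined in `predict_position` (a plain average of weighted offsets on flat coordinates) is my own assumption for illustration and is not a transcription of Equation (2). The function names are hypothetical.

```python
def genetic_weights(distances):
    """w_m = (distance to closest reference) / (distance to m-th reference)."""
    d_best = min(distances)
    if d_best == 0:
        # Exact match with one reference: it gets all the weight.
        return [1.0 if d == 0 else 0.0 for d in distances]
    return [d_best / d for d in distances]


def predict_position(positions, distances):
    """Shift the best match's position towards the other references.

    ASSUMPTION: offsets are averaged and coordinates are treated as a flat
    plane. This is one plausible reading, not the paper's exact formula.
    """
    weights = genetic_weights(distances)
    best = distances.index(min(distances))
    x0, y0 = positions[best]
    others = [m for m in range(len(positions)) if m != best]
    if not others:
        return (x0, y0)
    dx = sum(weights[m] * (positions[m][0] - x0) for m in others) / len(others)
    dy = sum(weights[m] * (positions[m][1] - y0) for m in others) / len(others)
    return (x0 + dx, y0 + dy)


example_weights = genetic_weights([2.0, 4.0, 8.0])
example_position = predict_position([(0.0, 0.0), (10.0, 0.0)], [1.0, 2.0])
```

Note that the prediction depends on the genetic distances only through their ratios, which is the property the rest of the argument relies on.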
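The problem is geometric. The set of all points whose distances to two fixed points A and B are in a constant ratio is a circle (the circle of Apollonius). That is, for points C, D, and P on such a circle, BC/CA = BD/DA = BP/PA.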
In terms of the Elhaik et al. (2014) algorithm, this constant ratio is w_m, so if A and B are two reference populations and the test sample is, e.g., either C or D, then the same constant weight applies (and this is true for all points on the circle).
In practical terms, the algorithm of Elhaik et al. (2014) will predict the same geographical location for all points on the circle. This will be perfectly accurate for C and biased for every other point on the circle (with D being the absolute worst).
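The constant-ratio property is easy to check numerically. The sketch below is my own illustration, using made-up coordinates on a flat plane: it builds the circle of points P with BP/PA = 0.5 for two reference points A and B, samples twelve points on it, and confirms that every one of them yields the same ratio, and hence the same weight.

```python
import math


def apollonius_circle(a, b, k):
    """Circle of points P with |PB| / |PA| == k (requires k > 0 and k != 1).

    Returns (center, radius, c, d), where c and d are the two points of the
    circle that lie on the line through a and b.
    """
    if k <= 0 or k == 1:
        raise ValueError("k must be positive and different from 1")
    c = tuple((k * ai + bi) / (1 + k) for ai, bi in zip(a, b))  # between a and b
    d = tuple((k * ai - bi) / (k - 1) for ai, bi in zip(a, b))  # outside segment ab
    center = tuple((ci + di) / 2 for ci, di in zip(c, d))
    return center, math.dist(c, d) / 2, c, d


def distance_ratio(p, a, b):
    """|PB| / |PA| for a point p and reference points a, b."""
    return math.dist(p, b) / math.dist(p, a)


# Hypothetical reference positions; B is the closer reference for every P.
A, B = (0.0, 0.0), (3.0, 0.0)
center, radius, C, D = apollonius_circle(A, B, 0.5)

# Twelve points spread evenly around the circle.
points = [
    (center[0] + radius * math.cos(i * math.pi / 6),
     center[1] + radius * math.sin(i * math.pi / 6))
    for i in range(12)
]
ratios = [distance_ratio(p, A, B) for p in points]
```

Every entry of `ratios` equals 0.5 up to rounding error, so any predictor that sees only this ratio cannot tell these twelve points apart.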
It is actually easy to test whether the test population is like C or like D: in the case of C, CA + CB = AB. This is a simple test of collinearity that exploits the fact that we know not only the distance of the test population to the reference ones, but also the distance of the two reference populations from each other. Indeed, for a test population P we can estimate the genetic distances AP, PB, and AB, and these uniquely define the circle on which the point must lie. Do this for all pairs of reference populations, find the distribution of the intersections of these circles, find a peak of this distribution (if one exists), et voilà: you have a sound mechanism for localizing individuals based on genetic distances. I expect to see something like this in Nature Communications circa 2016.
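Here is a rough sketch of the procedure just outlined, under strong simplifying assumptions that are mine: positions live on a flat plane, distances are noise-free and behave like Euclidean distances, and the "peak" is simply the intersection point with the most other intersection points nearby. All function names and the `radius` parameter are hypothetical.

```python
import itertools
import math


def is_collinear_between(ap, pb, ab, rel_tol=1e-6):
    """True when AP + PB equals AB within tolerance, i.e. P is 'like C'."""
    return math.isclose(ap + pb, ab, rel_tol=rel_tol)


def apollonius_circle(a, b, k):
    """Center and radius of the circle of points P with |PB| / |PA| == k."""
    c = [(k * ai + bi) / (1 + k) for ai, bi in zip(a, b)]
    d = [(k * ai - bi) / (k - 1) for ai, bi in zip(a, b)]
    center = tuple((ci + di) / 2 for ci, di in zip(c, d))
    return center, math.dist(c, d) / 2


def circle_intersections(c1, r1, c2, r2):
    """Intersection points of two circles (empty list if they do not meet)."""
    d = math.dist(c1, c2)
    if d == 0 or d > r1 + r2 or d < abs(r1 - r2):
        return []
    x = (d * d + r1 * r1 - r2 * r2) / (2 * d)  # offset along the center line
    h = math.sqrt(max(r1 * r1 - x * x, 0.0))   # offset perpendicular to it
    ux, uy = (c2[0] - c1[0]) / d, (c2[1] - c1[1]) / d
    mx, my = c1[0] + x * ux, c1[1] + x * uy
    return [(mx - h * uy, my + h * ux), (mx + h * uy, my - h * ux)]


def localize(ref_positions, dist_to_refs, radius=0.5):
    """Peak of the pairwise circle intersections, or None if there are none."""
    circles = []
    for i, j in itertools.combinations(range(len(ref_positions)), 2):
        k = dist_to_refs[j] / dist_to_refs[i]
        if math.isclose(k, 1.0):
            continue  # the locus is a straight line; skipped in this sketch
        circles.append(apollonius_circle(ref_positions[i], ref_positions[j], k))
    points = []
    for (c1, r1), (c2, r2) in itertools.combinations(circles, 2):
        points.extend(circle_intersections(c1, r1, c2, r2))
    if not points:
        return None
    # Crude peak: the point with the most intersection points within `radius`.
    return max(points, key=lambda p: sum(math.dist(p, q) <= radius for q in points))


# Made-up example: four references and a "true" location at (3, 4).
refs = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
truth = (3.0, 4.0)
dists = [math.dist(truth, r) for r in refs]
estimate = localize(refs, dists)
```

With noise-free input every circle passes through the true location, so the intersections pile up there. With real genetic distances the intersections would scatter, and a proper density estimate would be a better choice than the neighbor count used here.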