The latitudinal/longitudinal error for leave-one-out validation is shown on the left. In this procedure, N-1 of the N samples are used to build an estimator, which then predicts the longitude/latitude of the Nth sample that was left out of the estimator.
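The leave-one-out loop described above can be sketched as follows. This is only an illustration: the estimator here is an ordinary least-squares fit from per-individual features to coordinates, which is an assumption and not necessarily the estimator used in the paper, and the function name `leave_one_out_errors` and the synthetic data are made up for the example.

```python
import numpy as np

def leave_one_out_errors(X, coords):
    """Return per-sample absolute (latitude, longitude) errors.

    X      : (N, d) array of per-individual features (e.g. PCA scores)
    coords : (N, 2) array of known (latitude, longitude)
    The linear least-squares estimator is an assumption for illustration.
    """
    n = X.shape[0]
    errors = np.empty((n, 2))
    # Add an intercept column so the fit is not forced through the origin.
    Xb = np.hstack([X, np.ones((n, 1))])
    for i in range(n):
        train = np.arange(n) != i  # the N-1 samples used to build the estimator
        W, *_ = np.linalg.lstsq(Xb[train], coords[train], rcond=None)
        predicted = Xb[i] @ W      # guess for the sample that was left out
        errors[i] = np.abs(predicted - coords[i])
    return errors

# Synthetic stand-in data; these are not values from the paper.
rng = np.random.default_rng(0)
coords = np.column_stack([rng.uniform(37, 55, 50), rng.uniform(-9, 21, 50)])
X = coords @ rng.normal(size=(2, 2)) + rng.normal(scale=0.5, size=(50, 2))
lat_err, lon_err = leave_one_out_errors(X, coords).mean(axis=0)
print(f"mean |lat error| = {lat_err:.2f}, mean |lon error| = {lon_err:.2f}")
```

Each sample is predicted by a model that never saw it, so the averaged errors estimate how well the estimator would place a new individual.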
The authors make a point that latitudinal error is smaller than longitudinal error. However, we should keep in mind that the "rectangle" of the sampled populations (east-west limits: Portugal-Serbia) is far from a "square": it extends further east-west than north-south. Once each error is divided by the extent it was measured over, the relative error (absolute longitudinal error/longitudinal extent) is not that different from its latitudinal counterpart.
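To make the relative-error argument concrete, here is a small arithmetic sketch. The extents and errors below are hypothetical round numbers chosen only to illustrate the point; they are not taken from the paper, and `relative_error` is a helper name invented for this example.

```python
def relative_error(absolute_error_km, extent_km):
    """Absolute error divided by the extent of the sampled region along that axis."""
    if extent_km <= 0:
        raise ValueError("extent must be positive")
    return absolute_error_km / extent_km

# Hypothetical values for illustration only (not from the paper):
# a sampled region twice as wide east-west as it is tall north-south.
lat_extent_km, lon_extent_km = 1500.0, 3000.0
lat_error_km, lon_error_km = 200.0, 400.0

lat_rel = relative_error(lat_error_km, lat_extent_km)
lon_rel = relative_error(lon_error_km, lon_extent_km)
print(f"relative latitudinal error:  {lat_rel:.3f}")
print(f"relative longitudinal error: {lon_rel:.3f}")
```

With these made-up numbers the longitudinal error is twice the latitudinal one in absolute terms, yet the two relative errors are identical, because the longitudinal extent is also twice as large.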
PLoS ONE doi:10.1371/journal.pone.0011892
Inferring Geographic Coordinates of Origin for Europeans Using Small Panels of Ancestry Informative Markers
Petros Drineas, Jamey Lewis, Peristera Paschou
Recent large-scale studies of European populations have demonstrated the existence of population genetic structure within Europe and the potential to accurately infer individual ancestry when information from hundreds of thousands of genetic markers is used. In fact, when genomewide genetic variation of European populations is projected down to a two-dimensional Principal Components Analysis plot, a surprising correlation with actual geographic coordinates of self-reported ancestry has been reported. This substructure can hamper the search of susceptibility genes for common complex disorders leading to spurious correlations. The identification of genetic markers that can correct for population stratification becomes therefore of paramount importance. Analyzing 1,200 individuals from 11 populations genotyped for more than 500,000 SNPs (Population Reference Sample), we present a systematic exploration of the extent to which geographic coordinates of origin within Europe can be predicted, with small panels of SNPs. Markers are selected to correlate with the top principal components of the dataset, as we have previously demonstrated. Performing thorough cross-validation experiments we show that it is indeed possible to predict individual ancestry within Europe down to a few hundred kilometers from actual individual origin, using information from carefully selected panels of 500 or 1,000 SNPs. Furthermore, we show that these panels can be used to correctly assign the HapMap Phase 3 European populations to their geographic origin. The SNPs that we propose can prove extremely useful in a variety of different settings, such as stratification correction or genetic ancestry testing, and the study of the history of European populations.