The latitudinal/longitudinal error for leave-one-out validation is shown on the left. This involves using N-1 out of N samples to build an estimator, and then guessing the longitude/latitude of the Nth sample that was not included in the estimator.
The authors make a point that latitudinal error is smaller than longitudinal error. However, we should keep in mind that the "rectangle" of the sampled populations (east-west limits: Portugal-Serbia) does not approach a "square", so the relative error (absolute longitudinal error/longitudinal extent) is not that different.
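To make the leave-one-out procedure concrete, here is a minimal sketch in Python. It is not the paper's estimator (which is not described in this post). It assumes a plain linear least-squares fit as a stand-in, and the feature matrix, the coordinates, and the name `leave_one_out_errors` are all hypothetical, generated only for illustration.

```python
import numpy as np

def leave_one_out_errors(X, Y):
    """Return an (N, 2) array of absolute errors, one row per held-out sample.

    X: (N, d) features per individual (assumed here: e.g. a few PC scores).
    Y: (N, 2) coordinates per individual, columns = (latitude, longitude).
    """
    n = X.shape[0]
    A = np.hstack([X, np.ones((n, 1))])  # append an intercept column
    errors = np.empty_like(Y, dtype=float)
    for i in range(n):
        keep = np.arange(n) != i  # use the other N-1 samples
        coef, *_ = np.linalg.lstsq(A[keep], Y[keep], rcond=None)
        errors[i] = np.abs(A[i] @ coef - Y[i])  # guess the held-out sample
    return errors

# Synthetic, made-up data: 50 individuals, 2 features, noisy linear map to coordinates.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
true_map = np.array([[3.0, 1.0], [0.5, 8.0]])
Y = X @ true_map + np.array([48.0, 10.0]) + rng.normal(scale=0.5, size=(50, 2))

errs = leave_one_out_errors(X, Y)
print("mean |latitude error|  (deg):", errs[:, 0].mean())
print("mean |longitude error| (deg):", errs[:, 1].mean())
```

Dividing each column of `errs` by the corresponding extent of the sampled region would give the relative error discussed above.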
PLoS ONE doi:10.1371/journal.pone.0011892
Inferring Geographic Coordinates of Origin for Europeans Using Small Panels of Ancestry Informative Markers
Petros Drineas, Jamey Lewis, Peristera Paschou
Recent large-scale studies of European populations have demonstrated the existence of population genetic structure within Europe and the potential to accurately infer individual ancestry when information from hundreds of thousands of genetic markers is used. In fact, when genomewide genetic variation of European populations is projected down to a two-dimensional Principal Components Analysis plot, a surprising correlation with actual geographic coordinates of self-reported ancestry has been reported. This substructure can hamper the search of susceptibility genes for common complex disorders leading to spurious correlations. The identification of genetic markers that can correct for population stratification becomes therefore of paramount importance. Analyzing 1,200 individuals from 11 populations genotyped for more than 500,000 SNPs (Population Reference Sample), we present a systematic exploration of the extent to which geographic coordinates of origin within Europe can be predicted, with small panels of SNPs. Markers are selected to correlate with the top principal components of the dataset, as we have previously demonstrated. Performing thorough cross-validation experiments we show that it is indeed possible to predict individual ancestry within Europe down to a few hundred kilometers from actual individual origin, using information from carefully selected panels of 500 or 1,000 SNPs. Furthermore, we show that these panels can be used to correctly assign the HapMap Phase 3 European populations to their geographic origin. The SNPs that we propose can prove extremely useful in a variety of different settings, such as stratification correction or genetic ancestry testing, and the study of the history of European populations.
7 comments:
The authors make a point that latitudinal error is smaller than longitudinal error. However, we should keep in mind that the "rectangle" of the sampled populations (east-west limits: Portugal-Serbia) does not approach a "square", so the relative error (absolute longitudinal error/longitudinal extent) is not that different.
Apart from the poor sampling of the northeast, the area is almost a square: 30 degrees longitude versus 15 degrees latitude makes for an aspect ratio of about 1.3, taking into account the reduction of distance (for a given angle) at northern latitude: 2*cos 48° ~1.3.
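The arithmetic in the comment above can be checked with a short sketch. The 30 and 15 degree extents and the 48° latitude are the commenter's figures. The function name `aspect_ratio` is just an illustrative choice, and the spherical-Earth scaling (east-west distance shrinks by the cosine of the latitude) is the usual approximation.

```python
import math

def aspect_ratio(lon_extent_deg, lat_extent_deg, mid_lat_deg):
    """East-west to north-south ratio of a lat/lon box, in ground distance.

    A degree of longitude covers cos(latitude) times the ground distance
    of a degree of latitude (spherical approximation).
    """
    east_west = lon_extent_deg * math.cos(math.radians(mid_lat_deg))
    return east_west / lat_extent_deg

ratio = aspect_ratio(30, 15, 48)
print(round(ratio, 2))  # 2 * cos(48 deg), about 1.34
```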
Hmmm.
I am presuming that admixed people will get complete rubbish from this model. And the folk who need this information are the people who don't know their origin and so don't know if they are admixed or not.
The errors in the table can relate to the error in the model, or the degree to which the population is admixed. More admixed populations can expect to have more error.
Which makes the results for the Germans and the Irish the weirdest, and interesting. Either they are the most admixed populations in Europe (I don't think so) or they are geographically not where they are supposed to be.
Which makes the results for the Germans and the Irish the weirdest, and interesting. Either they are the most admixed populations in Europe (I don't think so) or they are geographically not where they are supposed to be.
Not sure about the Irish, but Germans are clearly quite admixed with the numerous populations at their boundaries (I once posted some images in this blog to show just that). Also, it would certainly help if the actual location (averaged over the four grandparents) were used, rather than some country central value.
If I were to take a guess at diversity in Europe (before the past 100 years or so), I might rank Germany at #1, then France, then Italy.
The Balkans surely would take the #1 spot were it not for their numerous nations (they don't typically count as one) and history of ethnic cleansing.
Spain and Romania are also way up there - as is Russia, especially if defined properly, geographically - it easily can take the number 1 spot by a wide margin, in some definitions...
eurologist:
"(I once posted some images in this blog to show just that). "
You might be interested in some of the euro Y and mt DNA data that someone has been putting up on the eupedia website.
Germany is presented by regions north, south, east and west.
The regions you presented in your images six months ago somewhat align with North, South, East and West Germany.
I'm not sure if we can or want to declare any country as a diversity "winner". Each country has its own history and challenges, triumphs and tragedies.
I'm not sure if we can or want to declare any country as a diversity "winner".
Sorry, that wasn't my intent. I just wanted to point out that it seems logical that Germans (as a whole) would be quite diverse (given their geographical location), versus the belief that they are somehow an isolate and/or homogeneous.
On the flip side, there is no question that there are numerous sub-regions within Germany (and within many other European countries) that indeed appear like small, homogeneous islands with relatively low outside exchange for perhaps millennia.
Greece, Italy, France and Spain could surely cream Germany in admixture stakes.