May 08, 2009

Genetic structure in Europeans (Nelis et al. 2009)

Yet another study on Europeans, this time with ~3K individuals and ~270K SNPs. The studied populations now include a wide assortment of Slavs, Balts, as well as Estonians and several other populations from all over Europe.

The most interesting new fact from this study:
Estonia is a small country with no geographic barriers and its Estonian population is merely one million. In order to study the genetic structure of Estonia in more detail, all Estonian individuals were grouped here by their county of birth. Then, PCA was performed and the mean values of the two first PC of the counties were plotted onto the Estonian regional map (Figure 2). Surprisingly, the resulting genetic map correlates almost perfectly with the geographic map, although Estonia is only 43,400 km2 in size, and the mean area of a county only 2,900 km2. Thus, fine-scale genetic difference can be revealed by PC analysis, and the results can be useful for identification of the distant relatives.
Figure 2 is reproduced here; the Estonian map is on the bottom right.
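
The county-level step described in the quote (group individuals by county of birth, then average the first two PCs) can be sketched roughly as below. This is only an illustration, assuming the PC scores have already been computed elsewhere; the function name, the county labels and the scores are made up and are not taken from the paper.

```python
from collections import defaultdict
from statistics import mean


def county_pc_means(rows):
    """Average PC1 and PC2 per county.

    rows: iterable of (county, pc1, pc2) tuples, one per individual.
    Returns {county: (mean_pc1, mean_pc2)}.
    """
    grouped = defaultdict(list)
    for county, pc1, pc2 in rows:
        grouped[county].append((pc1, pc2))
    return {
        county: (mean(p[0] for p in pts), mean(p[1] for p in pts))
        for county, pts in grouped.items()
    }


# Hypothetical PC scores for three individuals in two invented counties.
rows = [("A", 1.0, 2.0), ("A", 3.0, 4.0), ("B", -1.0, 0.5)]
means = county_pc_means(rows)
```

Each county then becomes a single point (mean PC1, mean PC2), which is what would be drawn onto the regional map.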

What seems very interesting is how Swedes and Estonians both deviate towards Finns but from different "starting points", a North German-Central European one and Baltic-West Russian one respectively. This is quite reasonable, as Swedes are Germanics who absorbed some Finnish elements, while Estonians are Finno-Ugrians surrounded by Balto-Slavs.

As the authors note, the multi-dimensional scaling plot is quite similar to the results of the PCA analysis:


Also of interest is the result of PCA within individual countries for which more than one geographical sample was available.


As the authors note:
Interestingly, PC analysis was also capable of highlighting intra-population differences, such as between the two Finnish and the two Italian samples, respectively. A low level of intra-population differentiation in Germany has been reported previously [18], and was confirmed here. In addition, we detected intra-population differences within the Czech and Estonian samples (Figure S3).
The two Finnish samples were from Helsinki and Kuusamo, the German ones from Schleswig-Holstein and Augsburg, and the Italian ones from the Borbera valley in Piedmont and from Apulia.

PLoS ONE doi:10.1371/journal.pone.0005472

Genetic Structure of Europeans: A View from the North–East

Mari Nelis et al.

Abstract

Using principal component (PC) analysis, we studied the genetic constitution of 3,112 individuals from Europe as portrayed by more than 270,000 single nucleotide polymorphisms (SNPs) genotyped with the Illumina Infinium platform. In cohorts where the sample size was >100, one hundred randomly chosen samples were used for analysis to minimize the sample size effect, resulting in a total of 1,564 samples. This analysis revealed that the genetic structure of the European population correlates closely with geography. The first two PCs highlight the genetic diversity corresponding to the northwest to southeast gradient and position the populations according to their approximate geographic origin. The resulting genetic map forms a triangular structure with a) Finland, b) the Baltic region, Poland and Western Russia, and c) Italy as its vertexes, and with d) Central- and Western Europe in its centre. Inter- and intra- population genetic differences were quantified by the inflation factor lambda (λ) (ranging from 1.00 to 4.21), fixation index (Fst) (ranging from 0.000 to 0.023), and by the number of markers exhibiting significant allele frequency differences in pair-wise population comparisons. The estimated lambda was used to assess the real diminishing impact to association statistics when two distinct populations are merged directly in an analysis. When the PC analysis was confined to the 1,019 Estonian individuals (0.1% of the Estonian population), a fine structure emerged that correlated with the geography of individual counties. With at least two cohorts available from several countries, genetic substructures were investigated in Czech, Finnish, German, Estonian and Italian populations. Together with previously published data, our results allow the creation of a comprehensive European genetic map that will greatly facilitate inter-population genetic studies including genome wide association studies (GWAS).
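
For readers unfamiliar with the inflation factor mentioned in the abstract, here is a minimal sketch of the usual genomic-control definition: the median of the observed 1-df chi-square association statistics divided by the median expected under the null (about 0.455). The exact procedure used in the paper is not described in this post, so the function and the toy numbers below are assumptions for illustration only.

```python
from statistics import median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def inflation_factor(chi2_stats):
    """Genomic-control lambda: observed median statistic over the null median."""
    return median(chi2_stats) / CHI2_1DF_MEDIAN


# Toy statistics (invented): one set centred on the null median,
# and the same set with every statistic doubled.
null_like = [0.1, 0.3, 0.4549364, 0.9, 2.5]
inflated = [2 * x for x in null_like]

lambda_null = inflation_factor(null_like)
lambda_inflated = inflation_factor(inflated)
```

A lambda near 1 means no inflation; values well above 1 indicate that allele frequency differences between the merged populations would inflate association statistics.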

Link

60 comments:

mikej2 said...

Very interesting to see how Estonians are placed. A surprise for me was that they still have so much in common with Finns.

About Finns, some facts. Samples from Helsinki correspond very closely to the overall Finnish genome map, from east to west, because Helsinki has drawn people from all over the country for centuries, being the biggest economic center in Finland. The original southern coast population, which arrived in the empty land as late as the 1300s, disappeared long ago. Helsinki is a good example representing the whole country.

Kuusamo is the biggest favourite of geneticists because of its isolated population.

Richard said...

I am glad you brought the issue of the Finnicization of Helsingfors into the picture. Yes, it's sad that the indigenous Swedish ethnicity has been wiped out, but your remark on the time frame was a bit over the top. Helsingfors was still 100% Swedish in the early 1800s. However, we Swedes are still here with our 7% share of the population, mostly concentrated in the southern downtown area by the sea.

"In the Helsingfors (Helsinki) capital region, municipalities that were formerly exclusively Swedish-speaking today often have a proportion of Swedish-speakers lower than ten percent".

Hedberg, C. 2004. The Finland-Swedish wheel of migration.

pconroy said...

Interesting also that some people from Switzerland also move towards Finland - again, as I commented in another post, probably as a result of the settlement of Huns in a number of Swiss valleys, after their defeat at Chalons.

Huns being a Turkic population, with some affinities to other Ural-Altaic speakers, like Finns.

mikej2 said...

"Interesting also that some people from Switzerland also move towards Finland"

Yes, this is interesting. I noticed that Germans deviate towards Finns. Looking at Finnish history, it seems clear that Finns have absorbed German elements. But I have no idea about Switzerland :)

mikej2 said...

"I am glad you took the issue of finnicazation of Helsingfors in the picture."

You are welcome, I said only the truth :)

Polak said...

Mike,

My hat is still here in one piece.

So was I right, or was I right?

mikej2 said...

Polak, I said that we need more samples from Russia. I now have a Russian study with a bigger sample, which shows a different result from this one on Dienekes' blog, so don't lose your hat :)

pconroy said...

It's interesting that although the Polish sample is taken from the extreme North West of the country, the Russian sample are still nearer to them than to the Baltic countries of Lithuania or Latvia.

IIRC the initial founders of the city of Novgorod were Polish Slavs. Novgorod then went on to colonize a huge swath of land to the North East of the city, that became its own principality in the Middle Ages. The huge area was later conquered by Kievan Rus, and later still, after the destruction of Novgorod and Kiev by the Mongols, was eventually conquered by the principality of Moscow - which went on to become Russia.

Maybe this has some influence on today's population.

Polak said...

pconroy,

The Polish sample comes from Szczecin, which was largely settled post-WWII by Poles from Vilnius, now in Lithuania.

If you want a western Polish sample, you have to go to Poznan, my home town. For a northern sample, you gotta go to Kujawy, not Gdansk, which is also mostly home to eastern Poles.

pconroy said...

Actually here's a link to Novgorod, and the very well preserved Birch Bark documents which were found there in the 1950's, it says:

Due to the writings as well, it was at another time proved that the Novgorod and
Pskov Slovens spoke a dialect that was very much unlike the Kievan region's
mother tongue. So the population of the region was of another origin, and so the
whole territory of Kievan Rus was inhabited by at least two branches of Slavonic
people. BTW, academician Rybakov even supposed, bringing some proof, that the
Slovens were actually the Western Slavs, and that the name is translated as
"messengers of the Veneds"- "Sly Vene" (the Veneds was the general name of the
Baltic Slavs even before they moved there from Central Europe).
Veneds of course refer to the Venedi - or Western Slavs (Poles), who were called Wends in German.

Polak said...

pconroy,

Well, the Poles in this study are truly eastern. They're even more eastern than the Poles from modern day eastern Poland.

Having said that, all Poles and all western/central Russians, and their descendants in other parts of Russia, are gonna be damn close.

But the real mystery is why the Russians are too western, in terms of their genetics, for where they live?

Corded Ware expansions?

pconroy said...

Polak,

Didn't know you were from Poznan. My first wife grew up in Poznan, and my current wife's paternal grandfather is from there.

My father-in-law has asked me to track down some genealogy info on Poznan for him - do you currently live there?

mikej2 said...

pconroy,

Novgorod was in the beginning a multinational town, not only Slavic.

"It's interesting that although the Polish sample is taken from the extreme North West of the country, the Russian sample are still nearer to them than to the Baltic countries of Lithuania or Latvia."

It is also interesting that a Finnish countryside municipality, Kuusamo, with only 17000 inhabitants, has in Fst scaling genetic variation equal to that of big Russia :)

Btw, those Swiss going to Finland are obviously Finnish F1-drivers.

Polak said...

Mike,

The Finnish "variation" you see there is actually a lack of variation.

Finns are inbred.

Polak said...

P.S. No I don't live in Poznan now. But most of my family does.

mikej2 said...

Polak, I only looked at the PCA and MDS pictures and the principal component axes. So I don't know what to believe.

On the west-east axis Finns have more variation than Central European countries, but that is not possible in a tiny countryside village like Kuusamo.

Dienekes said...

Finns show a classic pattern of an admixed population, with the two ancestral components being (i) the pre-admixed Finns and (ii) north Germanics. This is also in quite good agreement with their Y-chromosome composition (N3 and I1a being the two most important haplogroups in them).

Polak said...

Yeah, it looks like one of those PCAs with Europeans and Africans, and a long sliver of African Americans drifting towards the Europeans from Africa.

Nevertheless, we really need to sample some Finnic groups with LD patterns not quite as skewed as the Kuusamo isolate.

eurologist said...

"I noticed that Germans deviate towards Finns"

I don't see that in any of the plots. The northern German sample is from Schleswig-Holstein, which would be expected to be almost identical to southern Denmark - so there is no surprise it stretches exactly along the lower left boundary of the Swedish points. Of course, you will also have some Baltic elements in Schleswig-Holstein - which is why the points stop exactly at the westernmost Estonian outliers, but north of the Polish ones.

I find it very interesting to see the Hungarian dots to mingle so well with the Southern German ones. They are essentially both a Central-European "Danubian" population, and it looks like the Hungarians have less Slavic admixture than (some) Germans - and clearly less than Czechs.

Polak said...

^ Don't take the visuals too literally. They're only two dimensional, and genetic relationships are multi-dimensional. It's not wise to get a ruler and start measuring each millimeter.

There's an Fst table if you want to have a look at the pairwise genetic distances between the populations. And the Hungarians are about as close to Slavs as to the Germanics (0.000 Fst with Czechs and South Germans, 0.001 with Poles and North Germans, 0.002 with Russians and Swedes).

mikej2 said...

Eurologist,

there is certainly no proof of Germans deviating towards Finns. I wrote this as a joke, to refer to all the uncertainty in what we can see in these maps, and in the same category as the remark that Swiss are deviating towards Finns. I am convinced that these PCA and MDS maps are still so undeveloped that it would be foolish to say anything too seriously.

The fact is, however, that during the Hanseatic League about 5% of all inhabitants of Finland were Germans.

Second, we can look at how Finns are placed in the MDS maps of 23andMe. It is obvious that Finns split on the Eastern European map into two groups, one lying in the Russian subgroup and one in the Polish subgroup. This certainly does not mean that part of the Finns are Poles, only that they have some Central European elements.

Maju said...

"But the real mystery is why the Russians are too western, in terms of their genetics, for where they live?

Corded Ware expansions?"

Psah. Corded Ware also affected Baltics, AFAIK.

Are Russians too Western or are Finno-Baltics too Eastern? Or is it something totally different?

In fact, isn't actually the PC1 graph showing a "south to north" cline, from Southern Italy in one extreme to Latvia (and Finland) in the other? The actual PC1 sequence is: S. Italy-N. Italy-Spain-Switzerland/France/Bulgaria-Hungary/Germany/Czechia/Sweden-Poland-Russia-Lithuania/Estonia-Latvia-Finland, which seems a S-N cline to me (SW-NE if you push me).

Then there is that other PC2 cline with Latvia and Eastern Finland at the extreme, for which most of European populations are rather neutral (some Baltic area nations tend to the Latvian pole anyhow).

Maju said...

Btw, the S. Italy - Finland axis is something way too familiar in k-means analysis of autosomal DNA, right? In other papers you'd see Greeks taking the extreme place at the Mediterranean pole, but it must be the same old red-blue dichotomy. Of course, the presence of those components is in fact much diluted in most populations when you reach deeper levels of analysis, but it is still the same old Finnic-Aegean dichotomy.

You still have PC2, which must be a product of oversampling in that region around Estonia and that seems pretty much irrelevant outside of Eastern Europe (neutral values).

mikej2 said...

Maju, did you notice that Finns actually deviate strongly towards Central Europe from the south-north axis? Without this deviation Estonians and Finns would be in a straight line with the others. This seems to be German influence, which has also been seen by philologists. Latvians and Lithuanians have less of this effect than Estonians and Western Finns. This is only a historical point of view.

mikej2 said...

And also, consistently, the Slavic influence in Estonian and Finnish samples is insignificant, which has also been stated in the doctoral thesis of T. Lappalainen. This seems to be the reason why Estonians and Finns rotate far away from the Slavic samples. (Slavic "pushes" Finns and Estonians away in their connection to Germans.)

Maju said...

"Maju, did you notice that Finns actually deviate strongly to Central Europe from the south-north axis?"

The two-dimensional image that you see there is in fact nothing but a depiction of two distinct uni-dimensional ones. Your "deviation" therefore is surely a delusion.

Of course, some Swedes tend towards Finns on both axes and some Finns tend towards Swedes and Estonians, also on both axes, but what you really see, once everything that is neither PC1 nor PC2 is removed (together they make up only 13%, and maybe even less in some populations), is how they compare on these two bipolar parameters.

A population that has only 1% +PC1 but 0% -PC1 will show up at the right extreme, while one that has 30% +PC1 and 30% -PC1 will show up at the center, exactly as one with 0% of each would. Maybe these extreme cases do not actually exist but they give a good idea of what to expect. Same for PC2 (and in this case I think it's clear that many populations have nearly nothing of either factor, as it only shows meaningful divergence in Eastern Europe).

It's not a map, just a Euclidean 2D space where dots are plotted as a function of their values on each of the axes.

Polak said...

Maju,

Yeah, it looks like Balts and Finns are the ultra Northern Europeans, but PC2 splits them apart.

The split is most likely the result of Finnish founder effect and isolation, because both groups carry a lot of N. At the same time, there must be a reason why Estonians are only 0.001 Fst away from Poles, and I think that reason is whoever carried R1a1 up there.

Also, I think if that plot was 3D, we'd also see that Finns and Italians don't line up even remotely, because the former deviate from the Central European cluster on an angle.


Mike,

Although PCAs are very dynamic, depending on who's on them, genetic distances are static. Slavs won't make Finns less or more related to anyone. If you're not sure about what a PCA is showing, cross check the results with the Fst distances. They're not dictated by who's on the table.
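
To illustrate why a pairwise Fst does not change when other populations are added to or removed from the comparison, here is a minimal sketch of Wright's Fst for a single biallelic SNP, computed only from the allele frequencies of the two populations being compared. It assumes equal weighting of the two populations and is not the estimator used in the paper (which this thread does not specify); the frequencies are invented.

```python
def fst_two_pops(p1, p2):
    """Wright's Fst for one biallelic SNP in two equally weighted populations.

    p1, p2: frequency of the same allele in population 1 and population 2.
    """
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)  # expected heterozygosity, pooled
    if h_total == 0:
        return 0.0  # SNP is monomorphic in both populations
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-population
    return (h_total - h_within) / h_total


# Invented allele frequencies for illustration.
identical = fst_two_pops(0.3, 0.3)
fixed_difference = fst_two_pops(0.0, 1.0)
small_difference = fst_two_pops(0.50, 0.52)
```

Only `p1` and `p2` enter the calculation, so no third population can shift the value, whereas a PCA is recomputed from whichever samples are included.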

Dienekes said...

Maju your "explanation" shows that you don't have a clue about how PCA works.

mikej2 said...

Maju, I know how these maps work, but if there is no message in these maps, why make them? The fact is that FROM THIS point of view Germans and Estonians/Finns are in some connection, and the map indicates just that, and the map is what it is. If we took another PC axis, it might show the same or not.

mikej2 said...

Polak: "Slavs won't make Finns less or more related to anyone."

Don't put words into my mouth, it is not nice :) I did not say that. I said that Estonians and Finns have been drawn towards Germans, and the moderate distance to Slavs makes them rotate to the place where they are on the map.

Saturday, May 09, 2009 12:00:00 PM

Polak said...

Dienekes,

How do you explain the extreme differentiation between the Finns and Estonians/Balts on PC1? Both carry a lot of N, which is what I presume you would say is the paternal North Eurasian influence in the Finns.

Polak said...

PC2 I meant...

Maju said...

Mikej: it's not a map. As I said above: Baltics and Finnish occupy an extreme in PC1 and diverge in PC2. Russians and Poles are closest among the rest in PC1 and tend towards Baltics in PC2.

Dienekes said...

"How do you explain the extreme differentiation between the Finns and Estonians/Balts on PC1? Both carry a lot of N, which is what I presume you would say is the paternal North Eurasian influence in the Finns."

The difference between Estonians and Finns is due to the fact that the latter found themselves fairly isolated, sandwiched between Swedes and Slavicized Finno-Ugrians, while the former found themselves living among Balts and Slavs.

Polak said...

I meant why don't the Balts also pull in an "Eurasian" direction because they also carry a lot of N, and if the Finnish direction is indeed the Eurasian one?

mikej2 said...

Polak, Y-DNA haplogroups are a different thing from autosomal DNA. The same haplogroups exist across large areas of Eurasia.

mikej2 said...

Maju, I know that it is not a real map. Make your point on this issue and don't try to teach me :)

Polak said...

Mike,

Are Y-chromosome haplogroups really different from autosomal DNA, really??? (please note sarcasm).

The Balts carry a lot of N. So they obviously share a lot of ancestry with the Finns, one way or another. This has to reflect somehow on autosomal DNA (and it does show in PC1 here). But the question is, why are the Finns so different from them on PC2? And if it's not Eurasian heritage (including HG N), then what?

mikej2 said...

Dienekes: "The difference between Estonians and Finns is due to the fact that the latter found themselves fairly isolated, sandwiched between Swedes and Slavicized Finno-Ugrians, while the former found themselves living among Balts and Slavs."

Yes, that is correct, but I wish to see more of those Slavicized Finno-Ugrians in studies. Estonians have stronger connections to the Balts; the Finns lack these connections and have somewhat more genetic drift, and it can be seen in PCAs and MDSs. There is no mystery.

Dienekes said...

"I meant why don't the Balts also pull in an "Eurasian" direction because they also carry a lot of N, and if the Finnish direction is indeed the Eurasian one?"

Patrilocality + a few thousand years of marrying Caucasoid women.

mikej2 said...

Yes Polak, a bit of sarcasm is welcome :)

I think, not sure, but think that the gap between present-day Balts and Finns is the Slavs!

Balts and Finns were earlier (prior to 2000 BP) much closer to each other, but the Slavic expansion had a great impact on the Balts and almost none on the Finns. It also affected the Estonians, not as much as the Balts, but more than the Finns. Do you agree with this, even in a tiny small part? :D

Polak said...

Dienekes,

We'll know when we get some Fst scores between East Asians and Finns, Balts and Russians. If you're correct, there will be a huge difference, with the Finns deviating a lot more to Asia.

Mike,

I agree.

Dienekes said...

"We'll know when we get some Fst scores between East Asians and Finns, Balts and Russians. If you're correct, there will be a huge difference, with the Finns deviating a lot more to Asia."

"More than Russians" depends on the sample of Russians one picks; these ones are on the low end of Finno-Ugrian admixture.

mikej2 said...

Here is an MDS image from 23andMe's Advanced Global Similarity. It is captured from the Eastern European submap. The global picture depends on what kind of genes are excluded from this image, but at this Eastern European level Finnish genes are placed according to how they behave among Eastern Europeans, i.e. by which genes they have in common.

So far I have understood that 23andMe's Russian reference data was collected from a very uncommon place, never seen in these studies: Vologda. Vologda is known for a high concentration of HG I1 in Russia.

http://www.cephb.fr/en/hgdp/table.php

Blue dots - Finns
Yellow - Estonian

Russian reference data is shown as blue squares (Vologda), Polish as pink squares

On the European level Finns are to the west, northwest and north of the Russians. The global level is too compact to be readable.

http://img17.imageshack.us/img17/3775/plot.jpg

Dienekes said...

"So far I have understood that 23andme's Russian reference data is collected from very uncommon place, never seen in these studies, from Vologda. Vologda is known about a high consentration of HG I1 in Russia."

Vologda is not very unusual in terms of its presence of Finno-Ugrian admixture, however. According to Fechner et al., five regions have more N3 and six less than Vologda.

Polak said...

That 23andMe plot of Eastern Europe is terrible.

It's hampered by too few reference populations (Poles, Ukrainians, Russians), and the main components don't correspond to east/west and north/south at all, so the labels make no sense.

Their European wide plot is much better. And on that one Finns usually cluster above the Scandinavians and North Russians.

mikej2 said...

Polak, you are right, reading all the submaps of 23andMe requires much knowledge about both the methods used and the sample data.

You are wrong, Finns are not to the north of Russians. I'll show you - and you can make new explanations if you wish :)

Finns are the circled samples. Western Finns are the lowest inside the circle (in fact 4 Finns and one Estonian, the uppermost in this group); Eastern Finns are the uppermost.

Just below the Western Finns are Poles, and to the southwest are Norwegians and perhaps also Swedes, but I have not been lucky enough to collect enough of them.

So we can discuss whether Finns have some genetic drift or not, but you can see, if you wish, that the Northern Russian sample data may have affected the location of the Finns. And you can also measure the distance between Central Europeans and Finns versus that between Central Europeans and Italians.



http://img134.imageshack.us/my.php?image=europe070509.jpg

Maju said...

"I meant why don't the Balts also pull in an "Eurasian" direction because they also carry a lot of N, and if the Finnish direction is indeed the Eurasian one?"

Polak: because PC2 represents a cline within Balto-Finnic populations primarily and Eastern Europeans almost exclusively.

And PC2 does not show an "Eurasian" (you mean Asian or Eastern European; Eurasia = Asia with Europe included, all of Europe) trend but a Balto-Finnic trend. In this sample at least, where populations around Estonia were oversampled (it's after all a paper on Estonians rather than Europeans as a whole), one of the PCs (PC1) shows a Balto-Finnic pole contrasting with a Mediterranean antipode and the other (PC2) shows a Baltic-Finnic dichotomy, with Slavs, naturally, being closer to Baltics than to Finns along this axis. Both PCs represent only small portions of the whole European variation anyhow.

argiedude said...

The Italian results struck me immediately as simply wrong. They contradict the previous study by Seldin (2006), in which he used 86 Italians split equally between north, central, and south Italy. The geographic center of this mass of samples was central Italy, not far away from the location of the "south" Italy samples in this new study, which come from Foggia, north Apulia, close to the limit between what can be considered central and south Italy. But the Seldin distance to central Europe was 0,0030 FST, while in this study it's 0,0080 FST, a huge difference that can't be due to the normal margin of error.

Seldin used 86 samples, and this study used 53 and 57 (north and south, respectively).

But what's going on here basically amounts to inbreeding, while sample size is a secondary issue.

They used 19 populations. After doing "quality control" they retained a subset of the original samples. For the non-Italian populations, the average number of retained samples was 96% of the original samples. Only 55% of North Italian samples made it past quality control, and only 60% of South Italian samples did. That's very weird and requires an explanation. My explanation is that they didn't pass the are-you-related test. The study says the North Italian samples come from the Borbera Valley. This place doesn't even qualify as a small city. It's half a dozen villages of 200 to 2,000 people, in a river valley in the mountains. They were probably selected purposefully under the thinking that people in remote areas best preserve the original genetic make-up of the regional population. From the supp: "The Italian samples were randomly chosen from those enrolled in a population study named Carlantino Project, which is focused on inhabitants arising from an isolated village at the border between Central and Southern Italy (Province of Foggia, Region of Apulia with 1200 inhabitants)." Inbreeding is unquestionably a factor here.

The distance between north and "south" Italy is 0,0050 FST. At least in the case of the big difference between Italy and trans-Alpine Europe the argument can be made that the Alps caused the great genetic differentiation. In fact, I was expecting a "big" jump between North Italy and Switzerland of 0,0015 to 0,0020 FST, equal to the distance between Spain and Belgium/England. But you can't make any such argument to justify the incredible 0,0050 FST between north and south Italy. Any other European population separated by the same geographic distance would have had 0,0005 FST, not 0,0050 FST. Second of all, when you line up 3 populations, the middle population will always have an FST to the other 2 that, added up, will be equal to or slightly smaller than the distance between the 2 extremes. France has 0,003 and 0,005 FST to the north and south. Switzerland has 0,003 and 0,004 FST. South Germany has 0,004 and 0,006 FST. Since north and south Italy have 0,0050 FST between them, then in all 3 cases the sum of the inner FST distances is notably bigger than the FST distances between the 2 ends of the line. North Italy should be intermediate to either point, and thus its distance equal to (or less than) the distance between those 2 points. Its extra distance is inbreeding; it's shooting off into a direction that isn't present in either French/Swiss or south Italians. The inbred direction. The extra 0,0030 FST is the inbred factor.

I think that, if anything, we can make a guesstimate of what the north versus "south" (Foggia) distance will eventually be in a better study by subtracting the distance to north and south Italy of France, Switzerland, and south Germany. On average, it's 0,0017 FST ( (0,002 + 0,001 + 0,002) / 3). That sounds reasonable to me, though I'll bet it will be somewhat less than that. And given that Seldin found a distance of 0,0030 FST between central Italy and central Europe, that leaves 0,0015 FST for the distance between north Italy and central Europe, which is exactly what I am expecting it to be. But we'll have to wait for a proper study, which this one to me definitely doesn't seem to be (regarding Italy, only, of course).
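
The arithmetic in the two paragraphs above can be checked with a short script. It takes the commenter's additive "populations on a line" assumption at face value (Fst need not behave that way in general) and uses only the Fst values quoted in the comment; the variable names are illustrative.

```python
from decimal import Decimal as D

# Fst between North and "South" (Foggia) Italy, as quoted in the comment.
NORTH_TO_SOUTH_ITALY = D("0.005")

# (Fst to North Italy, Fst to South Italy) for each outside population, as quoted.
outer = {
    "France": (D("0.003"), D("0.005")),
    "Switzerland": (D("0.003"), D("0.004")),
    "South Germany": (D("0.004"), D("0.006")),
}

# Excess: (outside -> N. Italy) + (N. Italy -> S. Italy) minus (outside -> S. Italy).
excess = {
    name: to_north + NORTH_TO_SOUTH_ITALY - to_south
    for name, (to_north, to_south) in outer.items()
}

# Guesstimate of the North-South distance: mean of (to_south - to_north).
diffs = [to_south - to_north for to_north, to_south in outer.values()]
estimate = sum(diffs, D(0)) / len(diffs)
estimate_rounded = estimate.quantize(D("0.0001"))
```

With these inputs the excess comes out at 0.003 for France and South Germany and 0.004 for Switzerland, and the averaged difference rounds to 0.0017, matching the figure given in the comment.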

.................................................

When dealing with Europe, putting the results at only 3 decimal places is very lousy. Most country-to-country distances are less than 0,0010 FST, and in Heath there was one country-to-country result that was less than 0,0001 FST (Czech-Slovakia).

.................................................

"Mean Fst was 0.0010 for the 14 Estonian counties". [from the study]

There were 966 Estonian samples, so that's 69 samples per county, and you know that's an ideal average, but that the reality must have included counties with just 20 or 30 samples and others with 100. The margin of error using these sample sizes is... 0,0010 FST! Ha ha. So no, they did NOT find a mean of 0,0010 FST between Estonian counties, that's just the margin of error. The real inner-Estonian FST is probably very close to 0,0000 FST.

.................................................

I'm noticing lots of little details that I just find wrong. They note that LD is notably higher in Poland than the rest of Europe, comparable to the Kuusamo, and then say this is probably because the Polish samples are from West Pomerania, which was populated after WWII from more eastern parts. Ok, but then why does the Russian sample, which seems to be located very close to those more eastern parts, have a normal LD like the rest of Europeans? How could they not notice this?

They also said that autosomal and y-dna studies showed a northwest to southeast gradient. The autosomal studies did, but the y-dna studies showed an east-west gradient.

.................................................

But there's a very interesting observation about PC analysis. It says: "The twenty-two (11 SNPs for the first PC and 11 SNPs for the second) most variable SNPs presented as default output of the PC analysis. These SNPs have significantly different allele frequencies between studied populations and correspond to the largest eigenvalues of the first two PCs explaining the most variance."

They built the 2 axes of the PC graph using just 11 SNPs for each axis. And it all works out mathematically. They start with 273,464 SNPs, but of course, after eliminating SNPs in LD they probably end up with about 50,000 SNPs. [correction, the study at the end says specifically they ended up with 68,000 SNPs after LD corrections and other stuff]

The typical FST across wide stretches of Europe is 0,0040 FST, which multiplied by 50,000 SNPs would result in some 200 SNPs (that are highly differentiated within Europe). The 1st and 2nd PCs explain 8,7% and 4,9% of the total variation. Since 100% of the variation is explained by 200 SNPs, then 8,7% + 4,9% is equal to 27 SNPs, very close to the actual 22 SNPs in reality. Good, so now I finally know how many SNPs really go into these PC graphs. This also explains something else: the PC graphs show a greater variance than actually exists. In many previous studies, the PC graphs will show several samples that are way out of where they're supposed to be. This seems to indicate the sample is from someone who is of mixed ancestry, but it could easily be explained by the rare individuals who will be positive/negative for just 3 or 4 SNPs that people from their specific region shouldn't be. Just 3 or 4 bad rolls of the die and the sample ends up 2 countries off from where it should be located!
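
The back-of-envelope numbers in the paragraph above can be reproduced as follows. This only replays the commenter's own arithmetic, using the figures quoted in the comment (typical Fst 0,0040, PC shares 8,7% and 4,9%, and either the guessed 50,000 or the corrected 68,000 SNPs); it does not model how a PCA weights SNPs, and the function names are invented.

```python
def differentiated_snps(n_snps, typical_fst=0.004):
    """Commenter's estimate: typical Fst times the number of SNPs."""
    return typical_fst * n_snps


def snps_behind_pcs(n_snps, pc_shares=(0.087, 0.049)):
    """Commenter's estimate: share of variance in PC1 + PC2 times that count."""
    return sum(pc_shares) * differentiated_snps(n_snps)


# SNP counts from the comment: the initial guess and the corrected figure.
for n in (50_000, 68_000):
    print(n, round(differentiated_snps(n)), round(snps_behind_pcs(n)))
```

Under the same reasoning, the corrected count of 68,000 SNPs gives about 272 and 37 rather than 200 and 27.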

The larger variance in PC graphs can be seen in Figure 2a. The blob formed by the mass of European samples, compared with the triangle formed by the CEU/Yoruba/Han+JPT samples, is relatively much larger than the European blob in my own FST genetic graph compared with the same triangle of Europeans/Nigerians/Japanese+Han. My genetic graph was painfully (emphasis!) built on whole genome results (aka FST), while these PC chumps are built using a couple dozen significantly differentiated SNPs.

eurologist said...

Both PCs represent only small portions of the whole European variation anyhow. Seems it would be useful to do two analyses, one that has only the Germanic, Baltic, and Finnish populations, and one that has only the Slavic, Baltic, and Finnish populations. That way perhaps one could distinguish if there is such a thing as an independent Baltic element, and how large/important it is.

In this study, where too much is put into the mix, it looks like the Balts are the most Slavic Slavs - but that certainly only appears so because they are the least Mediterranean, which is also mixed into PC1.

Likewise, Hungary and Bulgaria "look" less Slavic here also because they are more Mediterranean than e.g. Czechs.

Sometimes less is more.

Polak said...

argiedude,

Nice post. I agree with a lot of what you say. I think we should take this effort as an exciting preliminary one, and wait for more detailed stuff which I'm sure will come next year.

In regards to the West Pomeranians having that skewed LD. I can totally see it, because these people come from Polish communities in the east (Lithuania and Belarus), which were somewhat isolated from the main Polish population by geography, and from the locals by religion, language and ethnicity.

And you can add to that the fact that even though Poland is a large country, it's probably the most homogenous one in Europe by far for its size.

Maju said...

To argiedude:

Not sure if the particular focus in NE Europe in this study may skew the effects for Italy. I would not make much of this: it is a paper on Estonians in the European context, not a paper on Europeans in general, much less about Italians.

Apulia is South Italy by all standards: Greek colonization, part of the Sicily-Naples Kingdom, etc.

I was expecting a "big" jump between North Italy and Switzerland of 0,0015 to 0,0020 FST, equal to the distance between Spain and Belgium/England. But you can't make any such argument to justify the incredible 0,0050 FST between north and south Italy. Any other European population separated by the same geographic distance would have had 0,0005 FST, not 0,0050 FST.

Yah, this 50 FST, I guess, can be because neither researched component is really major in Italy. They are looking at Estonia and at most at Europe as a whole from an Estonian viewpoint.

Anyhow, Italy was a meaningful receptor of population from outside in the Neolithic and Metal Ages. The South especially was strongly influenced from the Balkans and in general Eastern Mediterranean influences all that time, the North instead was more influenced from the Danubian region even since those very Neolithic beginnings (though both regions are Cardium Pottery, the south gets Neolithic earlier from the Dalmatia-Albania area while the north gets it later from the Bosnia-Croatia one - the south by Sea, the North by land). Later developments magnify these differences.

Overall the South should and does tend to approach Greece and nearby areas, while the North is closer to the Transalpine surroundings.

Anyhow, the distance between Milan and Bern (or Venice and Zagreb, or Genoa and Marseilles) is much shorter, even considering the barrier-effect of the mountains, than that between Milan and Naples. Italy is a big country and especially a long one.

The margin of error using these sample sizes is... 0,0010 FST! Ha ha. So no, they did NOT find a mean of 0,0010 FST between Estonian counties, that's just the margin of error. The real inner-Estonian FST is probably very close to 0,0000 FST.

Not necessarily: it means that it's between 0.0000 and 0.0020 in fact (+/-0.0010 error margin means that). You are arbitrarily choosing one of the extremes of the EM, it could perfectly be the other.

The last section of your comment is a very interesting read anyhow. I'd love to see your resulting graphs based in total variance.

Caudium said...

Maju said:
The South especially was strongly influenced from the Balkans and in general Eastern Mediterranean influences all that time, the North instead was more influenced from the Danubian region even since those very Neolithic beginnings (though both regions are Cardium Pottery, the south gets Neolithic earlier from the Dalmatia-Albania area while the north gets it later from the Bosnia-Croatia one - the south by Sea, the North by land). Later developments magnify these differences.

Actually, Maju, there are Y-DNA observations that suggest a common source of most of the Neolithic-derived lineages in Italy (both North and South).

Read:
Y chromosome J2 subtyping in an Italian sample: Population and forensic implications by Valerio Onofria et al.

The occurrence of J2 types is significantly different in the two areas (North and South Italy) (Chi-square test χ2=4.55, P=0.03). In order to investigate if such a difference was reflected also for lineages within J2, we compared the relative frequencies of the J2 lineages between the two regions using the Fisher's Exact test as implemented in the Arlequin package [7]. The two groups were not significantly different (P=0.7). We finally explored if any of the single lineages had significantly different frequencies between the two groups by contingency tables. None of the comparison was significant (P>0.05).

So frequencies of lineages can be different North vs. South. But ratios of the lineages are relatively consistent throughout the length of the Italic peninsula.
Mind you, this is only J2, but aside from G and E (which is common throughout the peninsula, with higher rates of E in the South; as well as J1, which is more evident in the South, though still quite weak), this is the one most commonly associated with Neolithic diffusion in Italy.

Maju said...

So frequencies of lineages can be different North Vs. South.

Well, that's it. I don't understand what you mean with all the rest.

But anyhow I'm not pointing out to Y-DNA (which normally does not say too much about overall ancestry) but what autosomal data (e.g. Bauchet 2007 and others) do seem to indicate - albeit in absence of an Italian-specific component that I believe must exist.

Caudium said...

Basically, the ratios of the different subclades are the same. The practical application of this is that the entry of J2 into the North and South Italian genetics came from a common event. There was definitely a vector of demic and cultural diffusion from South Italy towards the North. I don't think that Northcentral Italy and Southern Italy were cut off from each other all that much during the Neolithic. Aside from that, I see the logic in what argiedude wrote regarding Italy.

Caudium said...

I understand that Autosomal DNA is a much better tool for understanding total ancestry.

But Y-DNA can be useful to get an idea of how certain lineages entered a population. Since we are talking about Paleoanthropology (ie. the nature of Neolithic demic diffusion) I referenced that particular study.

My statement wasn't meant to refute this autosomal study, but rather as a way to supplement the information arising from the implications of this study.

Maju said...

I don't mean they were "cut from each other". But I think that they have got pretty much different histories overall. Britain and Spain (or Italy and Spain or whatever) were never "cut from each other" either but they have pretty much different histories anyhow, and within Britain and within Spain (or France, etc.) you also find strong regional differences.

It's just impossible that such a large area is homogeneous like, say, Estonia. It's comparing an apple with a whole orchard (almost any Italian region, except Val d'Aosta surely, is larger than Estonia).

bau said...

Dear Dienekes, do you know if the data of the article are freely accessible? I cannot find the link or the ID for them.