March 11, 2013

Genomewide structure of populations from European Russia (Khrunin et al. 2013)


  1. The intermediate position of Estonians between Balts and Finns
  2. The intermediate position of some Russian groups between Komi and the main body of Europeans.

PLoS ONE 8(3): e58552. doi:10.1371/journal.pone.0058552

A Genome-Wide Analysis of Populations from European Russia Reveals a New Pole of Genetic Diversity in Northern Europe

Andrey V. Khrunin et al.

Several studies examined the fine-scale structure of human genetic variation in Europe. However, the European sets analyzed represent mainly northern, western, central, and southern Europe. Here, we report an analysis of approximately 166,000 single nucleotide polymorphisms in populations from eastern (northeastern) Europe: four Russian populations from European Russia, and three populations from the northernmost Finno-Ugric ethnicities (Veps and two contrast groups of Komi people). These were compared with several reference European samples, including Finns, Estonians, Latvians, Poles, Czechs, Germans, and Italians. The results obtained demonstrated genetic heterogeneity of populations living in the region studied. Russians from the central part of European Russia (Tver, Murom, and Kursk) exhibited similarities with populations from central–eastern Europe, and were distant from Russian sample from the northern Russia (Mezen district, Archangelsk region). Komi samples, especially Izhemski Komi, were significantly different from all other populations studied. These can be considered as a second pole of genetic diversity in northern Europe (in addition to the pole, occupied by Finns), as they had a distinct ancestry component. Russians from Mezen and the Finnic-speaking Veps were positioned between the two poles, but differed from each other in the proportions of Komi and Finnic ancestries. In general, our data provides a more complete genetic map of Europe accounting for the diversity in its most eastern (northeastern) populations.



  1. Would be interesting to see where Sami go. (Out a bit further than Finns?)

  2. Is it even plausible that one of the major Eurasian primary genetic components has remained undiscovered until now, and that it's found in a small isolated population of a Finnic-speaking, reindeer-herding village?

  3. Do the (or some) Komi have an East-Ural background? They are almost twice as far away from Finns as both are from central Europeans. Surely both have in common being some of the earliest northern populations after the LGM, with the least Neolithic and later admixture - the Urals would be a reasonable barrier that could have separated them for a long time.

  4. The Russian results of this study closely resemble the Russian results of another study, which also found a northern-southern differentiation among Russians from Europe that is positioned on a cline from Finno-Ugrics to Poles.

  5. Dienekes, I don't know if my previous comment was sent correctly, because my phone doesn't give confirmation, so I am sending it again.

    In certain cases the study's plots, both ADMIXTURE and PCA, show isolation rather than admixture. In this sense the results don't correspond to the real world around us as we know it from history. Hopefully the authors will see this some day. A position between two groups doesn't necessarily mean admixture; it can also reflect decreased diversity in younger subpopulations. It is possible to reproduce this phenomenon everywhere by using village data. We become very little wiser after reading this.

  6. lol @ eurologist

    I'll give you a hint about what you're seeing - genetic drift.

  7. @ MOCKBA

    "Is it even plausible that one of the major Eurasian primary genetic components have remained undiscovered until now, and that it's found in a small isolated population of a Finnic-speaking, reindeer-herding village?"

    Funny, I thought just the same... But we can still find some villages there.

  8. MOCK-BA
    When you see an elderly bedridden man you cannot from that decide that he was not powerful earlier on. Just a minute's work with Wiki informs one that prior to Russian contact theirs was a larger region. And earlier yet???
    May I remind you that a finger bone from a Siberian cave has had an immense effect on this field.

  9. aeolius, it's a nice point about Denisova admixture, but it goes without saying that it doesn't make up a substantial principal component of variation in extant humans.

    As to Izhma Komi, I travelled extensively through their herding area, and studied their history. They seem to have descended from just a handful of families who took up reindeer herding, revolutionized the winter range usage, and eventually displaced the Nenets and even penetrated the Sami range. A narrow founder base and a substantial drift might have made a huge impact on their genetic variation.

  10. Davidski,

    May I suggest you work on your manners if you want to participate in an adult conversation?

    Of course drift plays a big role. But why were these populations (Komi and Finns) able to drift that excessively and from each other? Part of it is surely because they were isolated and did not interact and mix with agriculturalists and Metal Age people as much as peoples further south and west.

    But drift can't explain everything, because they share their special status with respect to Central Europeans (CE) in PC1 and they share a relatively recent language subgroup!

    You would have to propose that they first were the same people, drifted away from CE, and then got separated and drifted for twice the amount of time. That makes no sense (i) because PC1 also separates Italians from CE and Czechs from Balts - and that surely is not due to drift, alone, and (ii) the Komi and Finns are now neighbors - so it is almost certain that they (or part of their make-up!) previously were not, and the Urals are a convenient geographic barrier at hand.

    There is Komi influence on the Mansi language, and both peoples live together in the Khanty-Mansi region (i.e., East of the Ural), today - but the Ugric people are relatively recently intrusive, as are the Nenets.

    The Komi intermarry with Nenets today - they could have easily intermarried with people who lived there before the Nenets, as well. At any rate, it would be good to see where the Nenets place on the PC diagram.

    Finally, Permic is not old enough to interpret the observed genetic distance as drift, alone.

    My guess is that Finns are Finnic intruders who intermarried with native Scandinavians, and Komi are Permian intruders who intermarried with local people of the North-Uralic region. And that, not relatively recent drift, sets them apart.

  11. With regard to Eastern Ural contact in the Komi region, for those unfamiliar with the geography, I should add that east of Troitsko-Pechorsk, the "mighty Urals" form a gentle plateau only 400 m high (rising from 200 m "high" surroundings), with the river valleys that drain West into the Pechora and East into the Ob less than 2 km apart.

  12. This comment has been removed by the author.

  13. eurologist

    It is genetic drift that plays with these isolated populations. Of course there could be some old admixture differing from the other tested populations, but there cannot be tens or hundreds of northern villages all holding different archaic genes. If you look at studies made by Finnish researchers, you see that all villages in Lapland are quite distant from each other. It is much more probable that they have drifted in different directions than that they all carry different archaic genes.

    The reason there is such a pattern for Estonians, Finns and Kuusamoans lies in the method. Admixture tools tend to use the most homogeneous populations as anchors, but this doesn't mean that they are old populations; it means only that they are homogeneous. They could be colonists who arrived only 300 years ago.

  14. As a bit of an aside, I see three Czechs and one Pole who seem to be outside the north-south European axis of variation.

    I say that this is likely Ashkenazi admixture, although Roma admixture is possible too, but much less likely historically and culturally in Poland and the Czech Republic than in say, Romania.

    It makes me wonder if these studies should (soon, when the sequencing becomes cheap enough) be doing Y and mtDNA analysis to help try to explain any discordant results. It would be nice if someone could run various versions of DIYDodecad on these samples to identify the source of admixture and the length of the exotic segments.

    Also, if these Izhemski Komi represent a new pole in the axis of European variation, then they should produce a new principal component for ancestry painting.

    Are there "Native American-like" segments among these Izhemski Komi, and if so, how much?

  15. eurologist,

    You need to learn a few basics in population genetics before we can have an adult discussion on this and similar topics.

    Those PCA results are actually very easy to understand, and yes, they're horribly skewed by recent genetic drift.

    The Finns and Komi are located on the left of the plot because they carry the least southern European admixture. Moreover, the Finns are at the top of the plot because they're more western European than the Komi.

    However, the bloated distances between them and other samples are due to genetic drift, and they only obscure the picture.

    In fact, if you look at the ROH results in this study you might realize that recent endogamy and drift are the most important factors in understanding how Balto-Slavic and other Eastern Europeans behave in intra-European analyses.

    West Eurasian and global analyses are much more useful when looking at the effects of prehistoric and historic population movements around Europe, because the effects of genetic drift aren't able to skew the results at these lower levels of biogeographic resolution.

  16. "It is the genetic drift that plays with these isolated population."

    Sorry, I disagree, for a number of quantitative reasons.

    My main gripe is directed at the notion that drift can explain the first few PC components in a sufficiently large study. We don't see that with Sardinians, Irish, Orcadians, or Jews, for example - and for a reason.

    Drift is of course important, also for PC2, here - but when properly done, the matrix elements of the analysis are weighted with a function of the allele frequency (e.g., EIGENSOFT; Patterson, Price, and Reich, 2006). This strongly emphasizes rare SNPs over SNPs that are mostly missing in a particular, small group, only (i.e., due to drift). You can see that in (the slightly tilted) PC1, because otherwise, as I mentioned, it would not be the main S-N differentiator. SNPs that Finns lost due to subsequent drift actually dilute the S-N differentiation (moves Northern and Central Europeans closer to Italians), which, however, is obviously not the case.

    At K=4, it looks as though Central Europeans have a good chunk of admixture with what is modal in Komi and Finns, respectively. At K=5 it becomes clear that this is in fact Baltic admixture - which (i) makes much more sense, and (ii) are known, true SNP signatures - not just the lack thereof due to drift. So, up to Estonians the (tilted) PC1 is by a vast majority due to characteristic SNPs - not the lack thereof, and thus not due to drift away from a large original population.

    I am quite confident that, similarly, much of the genetic distance of Komi is a set of unique SNPs, rather than a lack of them - in this case, incorporated via admixture. Nenets and surely also Mansi are heavily removed from extant Europeans, and so likely also were the original people who were displaced/ incorporated during either-side Uralic expansion.

    The main question I posed is: is this simply a N Asian signature, or perhaps an ancient NE Uralic element that is distinct, and thus of great importance in our understanding of Mesolithic and Paleolithic population structure.

  17. Eurologist, I will later show here PCAs from the Finnish study I mentioned. It shows several Lappish villages, similar to Kuusamo. It also shows very large distances between all those villages. We are playing with common and differing SNPs between sample groups. The difference is mainly due to drift. The difference between the various drifted populations lies in the selection. Some sample groups, like Sardinians, are old isolates and have less in common with other Southern Europeans, while Kuusamoans, for example, are a very young isolate with the full recent Finnish genetic makeup, only drifted and with reduced diversity, thus touching other Finns on the plot and sharing the same K with them.

  18. Also, just to clarify, we are not talking about a village or two, here. There are a million Permic speakers, and about 400,000 Komi.

    This is comparable to the just over a million Estonians (who appear much less "drifted").

    And contrast that with only ~10,000, highly endogamous Veps (who, yet, appear much less "drifted").

    As to the measured endogamy, HGDP-Russians had a lower value than Poles, yet clearly show motion into the second PC direction (slightly counter-clockwise rotated, similar to Veps). This is clearly a measurable genetic component that, like the Finnish component, is outside standard "Baltic" (i.e., general N/NE European) and not just an artifact of drift.

    The fact that Priluzski Komi have significant Asian admixture but Izhemski Komi do not makes me think this probably has nothing to do with recent West Siberian admixture. As I mentioned before, it would be highly desirable to test further populations around the Urals to determine how widespread this component is and whether it should be called Permian or Uralic.

    It should also be noted that in this analysis, HGDP-Russians are almost 40% Finnish/Komi/East Asian.

  19. Here is the plot I promised. Eurologist, they really are Finns and cannot be compared with the relationship between Sardinians and Italians. AFAIK Sardinians are a very old population with an old genetic makeup, while these Finnish groups are Finns with a reduced FINNISH genetic makeup. Very likely we can find similar villages and small local groups everywhere where more southern people moved during the northern expansion era.

  20. @Eurologist

    "Also, just to clarify, we are not talking about a village or two, here. There are a million Permic speakers, and about 400,000 Komi."

    How do you know that the gathered samples represent all Komis? I mentioned and have shown that genetic drift varies significantly from village to village in northern regions. It is hard to deny, because it is shown by PCA in my previous message, the very same method that you use in your conclusions. Beyond that, it is also worth noting that at least in Finland the northern sample group represents only 0.4% of the Finnish population, so these locals are overrepresented by a factor of over 100.

  21. "As to the measured endogamy, HGDP-Russians had a lower value than Poles, yet clearly show motion into the second PC direction (slightly counter-clockwise rotated, similar to Veps). This is clearly a measurable genetic component that, like the Finnish component, is outside standard "Baltic" (i.e., general N/NE European) and not just an artifact of drift."

    I didn't say the positions of the samples on the PCA plot were dictated by their specific levels of genetic drift.

    I said they were dictated by the genetic drift in the Komi and Finns.

    So even if the HGDP Russians show lower levels of endogamy than Poles, this doesn't mean they should be further away from the Komi and Finns.

    The reason they're closer to the Komi and Finns is because they're in large part of Finnic origin, and have mixed with Komi and Baltic Finns very recently, especially with the Komi.

    That means they carry many more of the alleles that have drifted in Komi and Finns, and at higher levels, than Poles do.

  22. mikej2,

    Thanks for providing that plot. My understanding of it is that the only additional groups there are CEU and Swedes. That's my point, though: in a sufficiently large (wide) study, drifted populations don't usually have that kind of impact on the first couple of principal components. If no other groups are included, then by definition here Finns form one end of PC1, and PC2 is defined by the drifted Finns/Saami.

    In the study of this thread, on the other hand, there were 8 control groups, many of them with negligible Finnish (and almost non-existent Komi) admixture:

    Italians, Czechs, Germans, Poles, 3 Southern Russian populations, and Latvians.

    And the same way you likely agree that a general Finn (not isolated villages) component is meaningful, I argue that a general Komi/Permian component (defined by at least one million people!) is meaningful and not an artifact of recent drift.

  23. Eurologist

    "That's my point, though: in a sufficiently large (wide) study, drifted populations don't usually have that kind of impact on the first couple of principal components. If no other groups are included, then by definition here Finns form one end of PC1, and PC2 is defined by the drifted Finns/Saami."

    You are basically right; a larger number of samples from a larger geographic area makes genetic drift negligible, even though all these samples are locally drifted. But on these plots small, drifted northern populations (villages or city minorities) are heavily overrepresented, as I mentioned, by a factor of over 100 compared to their source population (Finns - Kuusamo Finns). Their genetic makeup strengthens a small part of their "mother population". As an outcome their genetic components also become overrepresented on the PCA.

    Now, read carefully.

    PCA looks over all samples and searches for the biggest difference to create dimension 1 and the second biggest difference to create dimension 2. It doesn't take into account who belongs to big populations (like Italians); it only looks at the differences between samples and their magnitude. When small drifted populations are vastly overrepresented, as Komis/Kuusamoans are, the biggest difference is found between them and Italians in this case, and the result is placed on the first or second dimension, or on both, depending on the samples. This is why drift plays a huge role even though Italians and other big populations are present.
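
The sampling argument in the comment above can be illustrated with a toy calculation. This is only a sketch under invented assumptions: the two-dimensional coordinates, the group sizes, and the names `pc1_direction` and `make_samples` are hypothetical and have nothing to do with the study's data. It shows how the first principal component of an unweighted PCA can switch axes once a small, distinct group is heavily oversampled.

```python
import math

def pc1_direction(points):
    """Return the unit vector of the first principal component of 2-D points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    cxx = sum((x - mean_x) ** 2 for x, _ in points) / n
    cyy = sum((y - mean_y) ** 2 for _, y in points) / n
    cxy = sum((x - mean_x) * (y - mean_y) for x, y in points) / n
    # For a symmetric 2x2 covariance matrix, the leading eigenvector
    # makes this angle with the x axis.
    theta = 0.5 * math.atan2(2.0 * cxy, cxx - cyy)
    return math.cos(theta), math.sin(theta)

def make_samples(n_drifted):
    """Two large groups split along x, one small group offset along y (all invented)."""
    big_a = [(0.0, 0.0)] * 50          # hypothetical large population A
    big_b = [(1.0, 0.0)] * 50          # hypothetical large population B
    drifted = [(0.5, 1.0)] * n_drifted  # hypothetical small, distinct group
    return big_a + big_b + drifted

few = pc1_direction(make_samples(2))     # small group sampled sparsely
many = pc1_direction(make_samples(100))  # small group heavily oversampled

print("PC1 with 2 drifted samples:  ", few)
print("PC1 with 100 drifted samples:", many)
```

With only two samples from the small group, PC1 follows the axis separating the two large groups; with one hundred, PC1 follows the axis that separates the small group from everyone else. The toy says nothing about whether the Komi signal in the study is drift or admixture, only that sample proportions affect which contrast PCA ranks first.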

  24. "How do you know that gathered samples represent all Komis?"

    Mikej2 - they studied two, and Veps and at least 3 Russian populations also show deviation in the (tilted) PC2 component. So, it's a geographically wide-spread phenomenon, and one can be fairly certain it will be replicated with other Komi and nearby populations.

    It is the same situation as with Finns: the fact that some northern groups and Lapps are isolated and heavily drifted does not take away from the fact that any Finns, even those with significant population numbers farthest south, and also Estonians, show the same principal component.

    You can map Estonians to their counties if you concentrate on them (as has been done) - but the situation completely changes if there are sufficient out-groups.

    The following is what I base my claim on that properly weighted PC analysis is fairly insensitive to drift, but extremely sensitive to ancient markers or the lack of participation in sweeps and the like. I posted this elsewhere - correct me if my interpretation of EIGENSOFT is wrong:

    The matrix elements are weighted with the function w(j) = 1/sqrt[p(j)*(1 -p(j))], where j is the SNP marker index, and p(j) is the allele frequency calculated as 1/2 of the average for that site. For autosomal data, a site can have the values 0, 1 (only one copy) or 2 (both copies carry the SNP). In the end, the square of the matrix elements (after projection onto the eigenvectors) enters, so, roughly speaking, w^2 = 1/[p(j)*(1 -p(j))] is more relevant. Now, this is where the number of populations comes in: say, you have 10 different populations, and for simplicity, all populations are fairly homogeneous. The most extreme scenarios are that only one carries the marker, and only once - or that all but one population carry the marker twice, and that one still carries it once. Then p = 1/20 or 19/20 and w^2 ~20 in both cases - that is, the calculation is heavily weighted in favor of such scenarios. On the flip side, in a case in which SNPs are all over the place in the different populations, say 3 have the marker twice, 4 once, and 3 don't have it, p = 1/2 and w^2 = 4, i.e., a factor 5 difference in weight. This weight difference keeps increasing with the number of (different) populations included, and increases with the number of individuals if the populations are not homogeneous. So, if 1000 individuals are studied, in the extreme case, the weight can be 2000!

    And those scenarios with the high weight are what you would expect from an ancient contribution: an SNP left over but long gone in all comparison populations (individuals), or a rare new mutation, or not being part of a sweep. On the flip side, heavily drifted populations largely have randomly lost markers, or have an unusual build-up of decently common (but not extremely rare) markers (if they were rare, then that is of significance even without subsequent drift). Both such cases do not yield a high weight. In general, drift occurs at a rate sqrt[p(j)*(1 -p(j))] - that is, it occurs most swiftly at sites that receive the lowest weight in EIGENSOFT.
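
The weighting arithmetic in the comment above can be checked with a short sketch. It implements only the formula as the commenter states it, w^2 = 1/[p*(1-p)], and is not taken from the EIGENSOFT source; the function names and the genotype lists are hypothetical, with each "population" reduced to one representative individual as in the comment's simplification.

```python
def allele_frequency(genotypes):
    """Allele frequency from per-individual copy counts (0, 1 or 2)."""
    return sum(genotypes) / (2 * len(genotypes))

def weight_squared(p):
    """Squared weight 1/[p*(1-p)] as given in the comment; undefined at p = 0 or 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("weight is undefined for a monomorphic site")
    return 1.0 / (p * (1.0 - p))

# Ten homogeneous populations, one representative genotype each (invented values).
rare = [1] + [0] * 9                  # one copy in one population: p = 1/20
near_fixed = [1] + [2] * 9            # all but one carry it twice:  p = 19/20
mixed = [2] * 3 + [1] * 4 + [0] * 3   # spread across populations:   p = 1/2

w2_rare = weight_squared(allele_frequency(rare))
w2_near_fixed = weight_squared(allele_frequency(near_fixed))
w2_mixed = weight_squared(allele_frequency(mixed))

# The comment's extreme case: a single copy among 1000 individuals.
w2_singleton_1000 = weight_squared(allele_frequency([1] + [0] * 999))

print("rare:", w2_rare)
print("near fixed:", w2_near_fixed)
print("mixed:", w2_mixed)
print("ratio rare/mixed:", w2_rare / w2_mixed)
print("singleton among 1000:", w2_singleton_1000)
```

Under these assumptions the rare and near-fixed cases both give 400/19, a little over the "~20" quoted in the comment, the mixed case gives exactly 4, and the singleton among 1000 individuals gives a value just above 2000, so the commenter's round numbers hold as approximations.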

  25. Does anyone know what R1a and R1b haplogroups the Komi carry? I would be interested to know if they carry many more Central Asian variants compared to Russians, who carry more western types. Their mtDNA could also be of a more Central Asian type. The fact that Russians are similar to Germans and Czechs could be due to Slavic migrations from West to East, and the Komi would then represent a more ancient Central Asian people.

    The difference between Finns, on the one hand, and Southern Russians, Germans and Czechs, on the other, could be due to Finns (and Finnic people) being more Paleo-European (with less Neolithic gene flow and in particular with less southwestern and Mediterranean gene flow) and also because they are admixed with Saami, Kuusamo Finns in particular, who seem to have received Arctic East Asian gene flow.


