April 01, 2011

Genetic structure in North-Central Europe with the Galore approach (revisited)

This is an update of a previous post, now with a much larger sample: 416 individuals from 26 populations.

The sources of the data are:
  • FIN and GBR from the 1000 Genomes Project
  • Populations ending in _D from the Dodecad Project
  • Populations ending in _H from the HGDP
  • Populations ending in _B from Behar et al. (2010)
20 clusters were inferred with 14 MDS dimensions retained. Below is the number of individuals assigned from each population to each cluster:
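A table of this kind is a cross-tabulation of population labels against cluster assignments. The following is only a minimal sketch of that bookkeeping, not the code used for this analysis; the function names and the toy labels (`Pop_A`, `Pop_B`) are made up for illustration and are not the actual results.

```python
from collections import Counter, defaultdict


def crosstab(populations, clusters):
    """Count individuals for each (population, cluster) pair."""
    table = defaultdict(Counter)
    for pop, cluster in zip(populations, clusters):
        table[pop][cluster] += 1
    return table


def row_shares(table):
    """Fraction of each population that falls in each cluster."""
    shares = {}
    for pop, row in table.items():
        total = sum(row.values())
        shares[pop] = {cluster: n / total for cluster, n in row.items()}
    return shares


# Hypothetical toy data: one population label and one cluster id per individual.
populations = ["Pop_A", "Pop_A", "Pop_A", "Pop_A", "Pop_B", "Pop_B"]
clusters = [1, 1, 2, 1, 2, 2]

table = crosstab(populations, clusters)
shares = row_shares(table)

for pop in sorted(table):
    for cluster in sorted(table[pop]):
        print(pop, cluster, table[pop][cluster], shares[pop][cluster])
```

The row shares are what make populations of different sample sizes comparable, since raw counts favor the larger samples.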

Some details on the clusters:
  • #1-3 are dominated by all 100 Finns plus 2 Swedes
  • #4 is clearly Balto-Slavic
  • #5 is clearly Russian
  • #6 is Norwegian-Swedish
  • #8 is British Isles
  • #9 is also British Isles, but additionally encompasses all 3 Danes and one Dutch individual
  • #10 is French dominated
  • #11 is Central European (German-Hungarian)
  • #13-14 are British-Orcadian
Some points of interest:
  1. A single Estonian groups with Balto-Slavs
  2. A single Austrian groups with a Hungarian, French, and British
By definition, a cluster can be inferred only if 2 or more individuals belong to it! Hence, singleton project participants are likely to be grouped with some broader group irrespective of their closeness to it. We should therefore not conclude that, e.g., the Estonian is indistinguishable from Balto-Slavs, but rather that any genetic distinctiveness of the Estonian population must await more Estonian samples.
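One practical consequence is that populations represented by a single individual are worth flagging before reading anything into their cluster assignment. A minimal sketch, assuming one population label per individual; the helper name and the labels are hypothetical:

```python
from collections import Counter


def singleton_populations(populations):
    """Return population labels that are represented by exactly one individual."""
    counts = Counter(populations)
    return sorted(pop for pop, n in counts.items() if n == 1)


# Hypothetical toy labels, not the actual sample.
labels = ["Pop_A", "Pop_A", "Pop_B", "Pop_C", "Pop_C", "Pop_C"]
flagged = singleton_populations(labels)
print(flagged)
```

Any population in the flagged list cannot form a cluster of its own, so its placement says little about its distinctiveness.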

It is also interesting that the hitherto distinctive Finnish and British Isles populations have split into several clusters. This is the power of numbers, and I anticipate this to occur for other population groups with large sample sizes.

On the flip side, the inclusion of a wide array of Balto-Slavic populations has tended to make them all fall into a single cluster. Belonging to a single cluster does not mean that there is no population differentiation, but rather that this does not take the form of separate "blobs" of individuals that an algorithm working on unlabeled individuals can uncover.

This also suggests an explanation for the mega-British Isles/American White cluster discovered in the most recent analysis of Project participants: the inclusion of multiple admixed individuals has probably served to fill in the gaps within the general population of that origin, whereas the current analysis, which included only individuals of a single origin as well as what is presumably a good geographical sampling of the GBR population, has allowed population structure to be more visible.

UPDATE (Apr 4): It has come to my attention that the single "Hungarian" and the single "Austrian" joined the project under false pretenses and are the relatives of other Project members. In retrospect it is not surprising that they failed to join the German and Hungarian clusters respectively.

22 comments:

Paul Ó Duḃṫaiġ said...

The split of the Irish and British populations into Clusters 8 and 9 is interesting. 76% of the Irish are in Cluster 8, whereas for British_D the split is 35% in Cluster 8 and 50% in Cluster 9. Given that Cluster 9 includes Dutch and Danes, could we be seeing a split between an "Insular Celtic" population and "Germanic" incomers?

Is GBR a British group as well?

Dienekes said...

>> Is GBR a British group as well?

Yes, it is comprised of English and Scots.

Paul Ó Duḃṫaiġ said...

We need to get some specifically Welsh and Scottish people to join Dodecad; it would be interesting to try to break out the variation across the island of Britain. I know that in past studies Scots have often overlapped with both Irish and English.

In the GBR group, which obviously excludes the Welsh, the level of Cluster 8 drops from the 35% in British_D to about 21%.

-Paul
(pduffy81)

Andrew Oh-Willeke said...

We would expect at least two subgroups in Finns: true Finns and Swede-Finns. With three groups, one would suspect a strongly Saami-admixed Finn category, a less strongly Saami-admixed Finn category, and a Swede-Finn mix, or alternately a Finn, a Slavic-admixed, and a Swede-Finn category.

Debbie Kennett said...

Great Britain also includes Wales.

Dienekes said...

>> Great Britain also includes Wales.

GBR does not.

eurologist said...

As I have pointed out a number of times, in pretty much all autosomal studies Germans and Hungarians rank very close.

Germans are still represented a bit sparsely, here - so there are missing matches with Swedes, Danes, or Dutch that would probably occur in a larger population sample. It is still interesting to see that eight of the ten Germans group with (18 out of 21) Hungarians. What a strong "Danubian" connection!

Paul Ó Duḃṫaiġ said...

Regarding the GBR set is there any data regarding the breakdown of how many of the 90 were in Scotland or in England?

About 1/10th of the total population of Britain (6 million) have at least one Irish grandparent; if you cover all immigration since the famine, the number goes up to about 1/4 of the population with some Irish ancestry/admixture.

Average Joe said...

This may be a result of the small sample sizes but I find it interesting that there is so little overlap between the British and German samples.

Debbie Kennett said...

I understood that the British data in the 1000 Genomes Project was taken from the People of the British Isles Project:

http://www.peopleofthebritishisles.org/press/nl4.pdf

This project includes participants from Wales and Northern Ireland. Do you have a breakdown somewhere of the geographical origins of the samples they supplied?

Dienekes said...

>> I understood that the British data in the 1000 Genomes Project was taken from the People of the British Isles Project:

"British from England and Scotland (GBR)"

http://www.1000genomes.org/about

Johnserrat said...

I know where you can get more Dutch data! ;)

Unknown said...

The cluster overlap with the Danes will be connected with the group of Vikings that came that way, and Danish rule of Northern England.

And yes, the lack of significant German overlap is striking. But this agrees with haplogroup data that indicates that the impact with groups like the Saxons was minimal.

IMO similarities between the Germans and British Isles populations are related to a much older underlying structure combining a southern-origin component (related to the Sardinians) and a northern structure (related to Lithuanians).

I don't like this cluster-only data, as I think it can be misinterpreted. Can we have the graphic also? It really adds to the reality of the situation.

Unknown said...

Thinking about it further, the Danish/Dutch cluster looks to just illustrate the similarity between these people and the British. If it were Danish rule/Viking input, it would not be so dominant a cluster. Neither made many inroads further south in England.

Fits more in with the Doggerland idea than anything.

eurologist said...

>> This may be a result of the small sample sizes but I find it interesting that there is so little overlap between the British and German samples.

Average Joe - actually, 4 of the 10 German samples do overlap: 2 (including Norway in the group), and another 2 (including France and Hungary in the group).

But, yes, as I said before, I suspect undersampling of the North - otherwise, there would be more matches with Netherlands, Denmark, and Sweden, and with that, also British.

Paul Ó Duḃṫaiġ said...

Looking at the sample map from the PDF that Debbie provided, it looks like they have quite a poor sampling rate for most of Scotland. Most Scots samples are from the east and south, as well as Orcadian. Not a lot of samples from the west, in the traditional Gàidhlig-speaking areas.

In comparison they got quite a good sample rate from Wales, Cornwall and Northern Ireland.

Debbie Kennett said...

http://www.1000genomes.org/about

Thanks for the link. Perhaps the intention is to add the samples from Wales and Northern Ireland at a later date.

Average Joe said...

I notice that one of the French samples has members in cluster 8 which seems largely exclusive to the Irish and British populations. Does this French sample include Bretons?

Average Joe said...

>> Given that Cluster 9 includes Dutch and Danes we could be seeing split between "Insular Celtic" population and "Germanic" incomers?

In addition to Irish and British samples, cluster 9 includes Danish, Dutch and even French samples but no German or Scandinavian ones. I think that this makes it unlikely for cluster 9 to be Germanic. I think that cluster 9 is a Celtic one that was germanized in Denmark and the Netherlands and latinized in France. From what I understand Y-chromosome R1b - which is more common in Celtic populations than in Germanic ones - is more common in Denmark and the Netherlands than it is in Germany and Scandinavia which may be evidence of a significant Celtic survival in the Dutch and Danish populations.

princenuadha said...

How come the Swiss aren't included? Switzerland is a Central European country, nearly all of its population is north of the Alps, and its southern part is north of much of France.

I know there was at least one German Swiss and I'm sure they'd cluster with northern/central.

Rafael said...

@princenuadha
There is actually a Swiss in the clusters galore and he is in the North-Italian cluster.

princenuadha said...

@Rafael
Thanks for the info, but was the guy Italian-Swiss (a small minority)? The Italian Swiss have already been shown to be quite different from the German and French Swiss in that one 2008 study. In fact they did cluster with northern Italians.

The one German Swiss I was thinking of, swissgirl, clustered with northern Europeans and I am almost sure most Swiss would also.