October 23, 2010

Detailed admixture analysis of West Eurasian populations (+ GenomesUnzipped individuals)

Here is the result of my ancestry analysis of several West Eurasian populations from the HapMap 3, HGDP-CEPH, and Behar et al. (2010) datasets using ADMIXTURE with K=10. Some non-West Eurasian populations were also added, to squeeze out the partial admixtures from outside the region in some populations.
I have tried to find an informative label for each of the inferred components, corresponding to the common attribute of the populations where it appears the highest. I make no claim about the geographical/temporal/ethnic origin of any of them, or what they actually represent, so please don't take the labels to be more than mental aids for processing the visual information.
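
For readers who want to reproduce the per-population bars, here is a minimal sketch of how individual proportions can be averaged by population. It assumes the usual ADMIXTURE output layout (a whitespace-delimited .Q file with one row per individual and K columns, in the same order as the input samples). The function names, the toy numbers, and the population labels are all made up for illustration; they are not the values from this analysis.

```python
from collections import defaultdict


def parse_q_text(text):
    """Parse ADMIXTURE-style .Q content: one row per individual, K columns."""
    return [[float(x) for x in line.split()]
            for line in text.splitlines() if line.strip()]


def population_means(q_rows, pop_labels):
    """Average the K component proportions over the individuals of each population."""
    if len(q_rows) != len(pop_labels):
        raise ValueError("need exactly one population label per individual")
    sums = {}
    counts = defaultdict(int)
    for row, pop in zip(q_rows, pop_labels):
        totals = sums.setdefault(pop, [0.0] * len(row))
        for k, value in enumerate(row):
            totals[k] += value
        counts[pop] += 1
    return {pop: [t / counts[pop] for t in totals] for pop, totals in sums.items()}


# Toy input with K=3 and made-up numbers (NOT results from this analysis).
toy_q = """0.70 0.20 0.10
0.50 0.40 0.10
0.10 0.10 0.80
"""
toy_labels = ["PopA", "PopA", "PopB"]  # hypothetical population labels

means = population_means(parse_q_text(toy_q), toy_labels)
for pop, props in means.items():
    print(pop, [round(p, 3) for p in props])
```

Each row of the result is one bar of a population-level plot; the component labels themselves still have to be assigned by eye, as described above.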

Some comments on the components:

The West African component is centered on the Yoruba, is represented among the Maasai and North Africans, and also occurs in the Near East. As my previous analysis of African populations suggests, this component is not really limited to West Africa, as the Yoruba show clear ties with the Luhya of East Africa. However, I prefer to call it West African, rather than Sub-Saharan here, as the West African Yoruba are its only representative.

The East African component is centered on the Maasai. It is also represented in North Africans, being more important than the West African among Egyptians, whereas, as expected, the reverse is true for Moroccans and Mozabites. As I noted in my analysis of the African ancestry of Near Eastern populations, it is more important in them than the West African component as well.

The North European component reaches its maximum in Balto-Slavs. However, its substantial presence among Hungarians, French, and Basque, and indeed all European populations, and even Lezgins, suggests that it is a broader phenomenon.

The Druze component is centered on the Druze of the Near East, occurring at a lower frequency even in Europe, and in many populations of the Near East.

The East Eurasian component is centered on the HapMap Chinese, and occurs at a noticeable frequency among Turks, Iranians, Adygei, Russians, and Chuvash.

The Arab component reaches its highest frequency in Bedouins and Saudis but occurs widely in Semitic populations. A very interesting observation about it is that it occurs in Egyptians and Moroccans, but not in the Mozabite Berbers, perhaps reinforcing its Arab and/or Semitic associations. Its paucity among people from the Caucasus (Georgians, Adygei, Lezgins) and Armenians is a further argument for that association.

The NW African component has a clear association with Moroccans and Mozabites, also occurring in Egypt, Morocco, Sephardic Jews, and various other populations.

The Semitic component attains high frequency among Arabs and Jews, hence its name. It seems more widely distributed than the Arab component, which makes sense, as many Near Eastern and European populations have had a longer time in which to encounter different Semitic groups since their Bronze Age appearance in the historical scene, but the corresponding time for encountering Arabs has been shorter.

The SW European component attains its highest frequency among Sardinians and Basques, hence its name. It has an opposite cline of distribution compared to the next component:

The W Asian component dominates in people from the Caucasus and West Asia and is widely distributed across West Eurasia.

Revisiting the Genomes Unzipped individuals

I started the party of using the data of Genomes Unzipped volunteers for ancestry analysis and was soon joined by others. I took another jab at this data as an afterthought in my Near Eastern African analysis, and now it's time to revisit the topic using dense genotype data on a variety of populations, and the power of ADMIXTURE.

First of all, a note of explanation as to why I don't use PCA/MDS in my ancestry analysis. I have nothing against them per se, and the visual examination of a large number of principal components can give a very useful overview of the data. The main problem, however, that I perceive with these techniques is their treatment of individuals of different backgrounds:

The offspring of a Greek and a Norwegian, a Russian and a Spaniard, and two Hungarians may all fall in the same spot on a PCA map. By its very nature, PCA synthesizes across ancestries, representing individuals as singular dots, whereas ADMIXTURE tries to analyze them into their several underlying components.

If we are interested in how an individual compares against other living humans, by all means PCA/MDS are great tools. But you'll never know how the dot came to be, whether it is (a) descended from a long line of ancestors inhabiting the corresponding geographic space, or it is (b) the product of admixture between more distant ancestors whose genomic average projected on the first few principal components matches that of (a).
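
A toy illustration of the point, under simplifying assumptions: the projection onto principal components is linear in the genotypes, so the expected position of a child is roughly the midpoint of its parents. The coordinates below are hypothetical and chosen only so that the midpoints coincide; the parental fractions stand in for the kind of decomposition that ADMIXTURE attempts (in practice it reports inferred components, not named parental populations).

```python
# Hypothetical coordinates on the first two principal components,
# picked so that three very different couples share the same midpoint.
pc_coords = {
    "Greek": (2.0, -2.0),
    "Norwegian": (-2.0, 2.0),
    "Russian": (4.0, 2.0),
    "Spaniard": (-4.0, -2.0),
    "Hungarian": (0.0, 0.0),
}


def expected_offspring_position(parent_a, parent_b):
    """Midpoint of the two parental positions (linear-projection approximation)."""
    ax, ay = pc_coords[parent_a]
    bx, by = pc_coords[parent_b]
    return ((ax + bx) / 2, (ay + by) / 2)


def parental_fractions(parent_a, parent_b):
    """Half of the ancestry from each parent's population."""
    fractions = {}
    for parent in (parent_a, parent_b):
        fractions[parent] = fractions.get(parent, 0.0) + 0.5
    return fractions


couples = [("Greek", "Norwegian"), ("Russian", "Spaniard"), ("Hungarian", "Hungarian")]
positions = [expected_offspring_position(a, b) for a, b in couples]
fractions = [parental_fractions(a, b) for a, b in couples]

for couple, pos, frac in zip(couples, positions, fractions):
    print(couple, "-> PCA dot", pos, "| ancestry", frac)
```

All three children land on the same dot, while their ancestry breakdowns are completely different, which is exactly the information a single dot cannot carry.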

With that long introduction, here is the analysis of the Genomes Unzipped individuals. These were listed as the "People" group in the above plot, but now we are looking at their individual level components.
The preponderance of "N European" and "SW European" components in everyone but Dan Vorhaus immediately tells us that these are Europeans. Moreover, the relative importance of the "SW European" vs. the "N European" component tells us that they are not from eastern Europe (compare with Balto-Slavs of previous figure). Finally, the relative insignificance of "W Asian" and "Semitic" components is suggestive of NW European ancestry.

Only three individuals stand out in the analysis:

VXP001 (Vincent Plagnol) shows an excess of "SW European" component, supporting an excess of SW over NW European ancestry, but also a tiny slice (1.5% to be precise) of the "Arab" component lacking in the other individuals. Such tiny slices occur in several southern populations, so these results reinforce the "SW" rather than "NW" impression for this sample.

JKP001 (Joe Pickrell) also has elevated levels of "SW European" vs "N European" component, also suggesting more southern ancestry. He also has small slices of "Semitic" (4.4%), "Arab" (1.1%), and "Druze" (1.1%) components. Inherited from his Italian grandparent, perhaps?

Finally, DBV001 (Dan Vorhaus) shows a relative importance of "SW European" (26.6%), "Semitic" (17.9%), "Arab" (2.7%), and "Druze" (1.8%) components, suggestive of a more southern and eastern origin than the other individuals. It is useful to compare him with the Ashkenazi Jewish average, side by side, as can be seen on the left. Dan is a close match for his people, with the exception of a small green slice of "NW African" in the Ashkenazi sample, which is lacking in Dan.

But, how is that component distributed among Ashkenazi Jews? Here is the answer for the 21 individuals in the sample:

It is evident that this is detectable in some individuals (like #418), excessive in others (like #426). Thus, Dan falls perfectly within the continuum of his people.

It's quite interesting that the three individuals that stood out from the rest in my initial analysis are also the ones who stand out in this more comprehensive assessment, and some of the reasons for it were revealed.

UPDATE (Nov 1): Joe Pickrell discovers Jewish great-grandparent


Onur Dincer said...

First of all, a note of explanation as to why I don't use PCA/MDS in my ancestry analysis...

I understand your reluctance to use PCA/MDS, but I don't understand your reluctance to use Fst distances, which are the main tools for measuring genetic distances and relationships between populations.

Unknown said...

I think you should consider adding the Ethiopian samples as well. In these runs you just posted, the West African component of the Maasai is smaller than it has been in other tests by you, as well as other studies.

I think the East African may be absorbing some West/Central African ancestry of the Maasai, which is lowering the Eurasian affinity of the East African cluster and possibly leading to a lower "East African" score for some Eurasians. The African component of Ethiopians is more "purely East African", so to say.

I think calling the "West African" cluster Western is fine. In the study by Tishkoff et al. from 2009, even with 14 African clusters, a West African cluster connected Niger-Kordofanian speakers from all over Africa. That's why, notwithstanding geographic proximity between the Maasai and the Luhya, the Luhya have a much higher West African affinity.

Onur Dincer said...

I think you should consider adding the Ethiopian samples as well.

Also South Indians (to determine Dravidoid admixtures).

Gioiello said...

If we all are a cocktail, I think it is undeniable that Tuscans are the most similar to Ashkenazi Jews. Perhaps it happened by chance, but the excellence of a cocktail is due to the measure of its components.

Onur Dincer said...

I think it is undeniable that Tuscans are the most similar to Ashkenazi Jews.

Based on the genetic analyses I've seen to date, that isn't true.

alfio said...

I agree with you, Onur.
To me there has been nothing but chaos lately over Tuscans.
First someone claimed a great "similarity" with Turks because of the Etruscans, but no one has ever shown an overlap between the two; now there is this new trend of saying that Tuscans are close to Ashkenazi, but they are distant in every plot.

Tuscans' plots basically mirror their geographic location.
Basically they are where they should be, nothing more nothing less.

Fanty said...

"You should use more northwestern European populations in your analysis such as English, Irish, Scottish, Danish, Dutch and Germans."

There are no NW Europeans in the raw data he is using, except for Orcadians who, so at least the legends tell, are 50% Norse (males) and 50% Scottish (females).
On MDS, most US Americans seem to fall either in the same space as Orcadians or in the gap between Orcadians and French.

Norse and Swedes alike appear as a "bridge" between the Orkney cluster and the Russian one (which matches geography as well as R1a1a distribution patterns).

As for the components and their distribution...

Some of these clusters also show up in Polako's "Genographic project".

Maybe with slightly different percentages, but I made maps to get a better understanding of where the clusters spread.

The black numbers are from the studies' profiles and the RED numbers are based on (FTDNA) project members. Among these you have some Irish, UK, Swedish, North German and South German, for example.

Here is the "Northern European Component", that's currently called "Baltic" in the Genographic Project.
I used a "Races of Europe" map from Nazi Germany schoolbooks as background, as these went at least somewhat by physical appearance and, for example in France and Russia, follow distributions of Y-DNA amazingly well.

Oh, and I added a blond hair and blue eyes map to it, for I have the impression that the distribution of the "North European" component is quite similar to the distribution of these stereotypical "Nordic" attributes.

This here is "Mediterranean", what Dienekes calls "Southwest European":


That's "Anatolian/Caucasus", similar to Dienekes' "West Asian"


Jack said...

I looked at the components very briefly and one thing that does not convince me for now is the Semitic component. I am not sure what it's meant to be. What is it, a present-day Palestinian component with another name? Palestine can hardly be considered representative of anything given its history. Why the large difference with Lebanese, Jordanians and Syrians? They are not as fully "Semitic"? Come on.
I think this component should be called "20th century Palestinian", not Semitic.
I suspect some tweaking is needed here.

Structure said...

"I think it is undeniable that Tuscans are the most similar to Ashkenazi Jews"

Not true. The only Italians who cluster with most Jews are South Italians, as we can clearly see on this 500K West Eurasian MDS plot (IT2,3,5,6 on the map):


eurologist said...

* it would be good to have Germany in these studies, so one could better see if the "Slavo/Baltic" = ancient northern component is perhaps actually more central and not Eastern European.

* I suspect that the "Western Asian" = Caucasian category has several important sub-groups, so it would be really helpful to include Afghanistan, Pakistan, and NW India.

Some interesting findings:

- French Basques are different because they are only old Europeans - like Lithuanians, just inverse in the two SW/NE components

- the Caucasian/W Asian component seems to have entered Europe at least via two different routes. I smell dung. I mean, farming and agriculture. If this pans out, we may have a nice tool for finding the percentage of Anatolian agriculturalists in Europe?

Fanty said...

"* it would be good to have Germany in these studies, so one could better see if the "Slavo/Baltic" = ancient northern component is perhaps actually more central and not Eastern European."

One needs to take into account that admixture lowers components.
So, imagine that all of northern Europe had been inhabited by the people that the North European-Baltic component represents. Their contribution to the gene pool would drop in areas that, say, Neolithic farmers migrated into.

Lithuania has 2 other interesting factors I recall:

1. I recall a study about hunter-gatherer mtDNA, which claimed that European H/G mtDNA best matches modern Baltic states mtDNA. Not even Finnish mtDNA is that close.

It also mused about central European H/G retreating to Scandinavia first, then being followed by farmers and retreating further to the Baltic.

The second interesting thing about the Baltic states is that supporters of the Baltic-Urheimat hypothesis claim the Baltic languages as the most archaic (closest to the original) Indo-European languages, and especially claim Lithuanian as the "closest to (calculated) Proto-Indo-European" language of the world.

Of course this only means that the Lithuanians must have been very isolated, so that the Indo-European tongue has not changed that strongly.

The second problem with it is: why would European "Natives" (hunter-gatherers) and "Indo-Europeans" be one and the same people?

Especially since that one study, which claimed discontinuity in European mtDNA, said natives and farmers combined explain 20% of modern European mtDNA. That leaves 80% for a third (Indo-European?) source.

Anonymous said...

Very interesting. This seems to substantiate the old-time breakdown of Europeans into Nordics, Alpines, and Mediterraneans, with most people being a blend of the components. Likewise, DNA testing seems to substantiate the Caucasian invasion of India and the blending with Dravidians in varying degrees.

Jack said...

I did a comparison of Tuscans with some East Med populations, doing a historical guess-regression. Assuming these components mean something, my first guesstimate is that Tuscans could be the result of an old south-central European population with very roughly:
15% Etruscans
5-8% Jews
I did it for fun, but this might explain what a few here see as a vague similarity between Jews and Tuscans.
I do not see A. Jews as emanating especially from Tuscany based on these "bars".

Gioiello said...

Unfortunately I don’t have graph paper at my disposal, but Ashkenazi Jews and Tuscans are composed of about the same components in slightly different quantities, except the East Eurasian, present in AJ and not in Tuscans. I challenge you all to find another people in Dienekes’ diagram (except the other Jews) who is closer to AJ.
What isn’t acceptable is the claim that Vorhaus is 100% AJ: he is, like the other AJs, a cocktail of W Asian, SW European, Semitic, Arab, East Eurasian, Druze and N European.
Dienekes has my data from deCODEme and 23andMe. If he wants, he could run my SNPs. My ancestry has been Tuscan for at least one thousand years, and probably for many thousands more.

alfio said...

Jack, by 15% Etruscan do you mean Druze, and by 5% Jews do you mean Semitic?

Anyway I agree; based on these bars it seems to me that N. European and SW European are a bit overestimated in A. Jews and a bit underestimated in Tuscans.
I have seen many plots, and in 23andMe, to give an example, a Tuscan falls more easily in the N. Italian or southern France cluster than anywhere else. When that does not happen, they form their own cluster, but always very far from Turks, Jews, or any other Middle Eastern people.

horacioh said...

OK Gioiello.
But don't forget the E.A. component, here diminished, and really from 4% to 27% -inlays in semitic ancient pool partially-.
The Jews pre Askenazim arrived to Rome and Tuscany from Alexadria and others Near east regions mainly when the Muslims arrived to Egypt en the VII century AE, and from there to the North, they were heading to the Rhine river to the nowadays Lorena and Alsace, they got to oriental Europe where they mixed and got in contact with Khazars Jews -I call "four jews center, East Europe"not ancient like the others since XI century -, It is also known that the resellers were mostly male people who were making up communities with local women and then they kept their genetic pool among the Jewish community without any other important external later influences, this manifests the importance of the religious restriction about Jewish relations their evident endogamy.

For this reason to say : " Curiously, the Ashkenazi mtDNA pool of recent European descent includes Hg “L” at a frequency major to that among North African Jewry [4], [5].", as Behar et al. express in their papers, I think the L markers come from Levant also.

I have to point out that communities of Central Asia, Turkey or Caucasian ones as well as the ones of Iran or Iraq, lack practically that "L" marker, the same is significantly present in Yemeni communities (10.4%), Ethiopian one (14%), Ashkenazim (4%) and North African Jewish (2%), only maternal components are considered in these figures. These components are undetectable (or almost undetectable) in non-Jewish communities of eastern or western Europe. The Middle East origin of the Ashkenazim is outstanding, either the four founder mothers highlighted by Behar group researchers- which would be 40 % -include "N1"- individuals carrying these mitochondrial markers- as well from a fifth mother that include the "L1, l2" and "M" haplogroup also and from the Wide or extending Middle East that includes Abyssinian region – who represents 50 % of the nowadays Ashkenazi community.

horacioh said...

Remember that the hyperhaploydia present in Ashkenazim is only compatible with a most ancient population about 8500 years or a mixed with people, along the Diaspora life without religious restriction that is not the case.
The Ashkenazim hyperhaploydia and heterozygosis, - practically almost absent in Sephardim- that could cluster these Ashkenazim populations everywhere you want (not common in isolated population, the same for mtDNA coming in great rate from host population, and endogamy practice that Ashkenazim hold) is explained by the superposition and overlay of diverse fount or source population , that are all of this of Jewish origin (that consider converted into intraJewish assimilations) , one coming from the “Syrian European nucleous” – that Sephardic as well as preAshenazim bring inside -. The other convergence were the “Coptic Jewish nucleous”, coming from Alexandria, the main and largest Judaic center in ancient times – the buried and graves in Jewish graveyards and catacombs of Tuscan, and Alsace as too Rhineland cities take a lot of Egyptian ornaments and display figures from these, as well as Y and mtDNA markers - . The great Jews migration from Egypt beginning after the Muslim invaders from Arabia in the VII AE century. The “Babylonian and Persian nucleous” take place and contacts newly with and when the “preAshenazim second fase” were migrating to the East Europe. A remarkable contact was with the fourth “East Europe Jews nucleous”-not related or little related with ME-, with the descendant of the Jews Khazarians ones, spreading every where and carrying a lot of East Europe and Eurasian markers. That happens between the XI and XII century AE. Note that population events like bottleneck, loosing and losing markers, marriage with local women at the first Jews setting communities. Masculine Murders pogroms and in minor rape must be consider too.

Dienekes said...

horacioh stop posting the same thing over and over again in different threads. I will not approve any more repetitive posts.

horacioh said...

Very sorry Mr. Dienekes.
But there was something changes in that post about the origin of the 35% to 55% european component.
DR. H.H.O.C.

Spy said...

I disagree with Jack. The light green represents the pre-Arabic strain in the Near East.

We can see that, while both populations are endogamous, the Samaritans share in that greenness, but the Druze do not. That suggests that the Druze have so much drift that they make their own cluster, whereas the Palestinians and Samaritans score high in a regional non-Arabic cluster. Call it PaleoLevantine, if you prefer. Palestinians do not constitute their own outlier population and should not name that which they epitomize.

Jack said...

No, to your question.
What I did is I compared several ME bars and their components, and "extrapolated" with a seat-of-the-pants statistical technique what I suspect Etruscans were like when they went to Italy more than 2000 years ago.
The Jewish component I guessed the same way except that I assumed they did not all go straight from Palestine.
Note that in reality the Jewish "component" for me is just a proxy that groups part of the ME influence not from Anatolia, which I guesstimate to be roughly 5-8%. I do not really mean that such a large number of Jews actually went to Tuscany.
Oh, please consider that this was a fun exercise for me, as is explaining what I did, just to look at things from another perspective.

Dienekes said...

There are only 3 Samaritans, so please keep that in mind. I got some weird clusters centered on Samaritans, Yemenese, Iranians, Moroccan Jews, so I'm guessing these samples have some related individuals in them.

alfio said...

Thanks jack.
Nice experiment anyway.

Spy said...

Do you need more Greek specimens to add to this test? I have one with a EuroDNACalc result of 100% SE, but I predict that the attendant EurasianDNACalc0.4 breakout will be something like 48% West Asian, 25% SW, 20% Semitic, 7% NE.

By the way, I was surprised to see the Georgians out-WE the Armenians, and wonder if Assyrians might score even higher.

Katharós said...

"Without wanting to dip into Ideology."
The area referred to as the Philistine coast by Herodotus and Sargon II, or even earlier in the Hebrew Bible, is interesting. It is not only an important transition point of ancient Semites into Egypt via the Sinai; what is also interesting is that the Philistine coast stands largely outside of the "Hebrew realm", especially as devoted "Pagans".
From the Hebrew Bible to more recent events in the 5th century AD, for instance, Gaza was known as a devoted polytheist stronghold, worshiping the Canaanite-Greco god Marnas, "City God of Gaza".

Dagon Marnas

Porphyry of Gaza

eurologist said...

Lithuanians are interesting because of their isolation. Their language, however, is much more recent than their autosomal DNA, and may have originated somewhere between the Ukraine and the Urals.

The thing is, French have much more of the "northeastern" European component than, e.g., Romanians. That in itself indicates it is more of a central northern than eastern European phenomenon. My guesstimate is somewhere around 60% to 70% for Germany, with only a slight NE to SW cline. Thing is, 65% of 80 million is over 50 million, compared to 3 million Lithuanians...

At any rate, I also think much can be learned by adding and subtracting populations, and looking at the difference in outcome.

"Likewise, DNA testing seems to substantiate the Caucasian invasion of India and the blending with Dravidians in varying degrees."

Or, it may show that Caucasians are more recently/closely/less "perturbed" derivatives from NW Indians than old Europeans are. Or a little of both.

Dienekes said...

Do you need more Greek specimens to add to this test?

Spy, thanks for the offer. I am currently trying to make the test as robust and detailed as possible, and I will definitely make an announcement/call for samples to fill some of the holes in the sampling. Greeks, of course, are at the top of my list, but please wait until I make the announcement.

The inclusion of enough new samples from the Balkans and southern Italy may even reveal a new cluster. Many populations may have Balkan ancestry which is not represented here, because, quite simply, the only Balkan population are the Romanians.

Fanty said...

I had an error (service unavailable).
If the post worked, this is a double post and can be removed.

"My guesstimate is somewhere around 60% to 70% for Germany"

I know of values for 3 Germans from Polako's project.
He, however, uses 2 more European clusters (Basques and Chuvash too), while missing most of the African and Asian ones.

There, Lithuanians are at 81%.
A single southeast Swedish woman is at 65%. A single Finnish guy is at 63%.

The average of 6 people from the UK is 57%, that of 2 Irish even at 59%.

The French average 44% in Polako's calculation.

The 3 Germans:

1. Me, 1/4 German/Dutch border, 1/4 German/French border, 1/4 East Prussia, 1/4 Silesia (Southwest Poland today) + a Lithuanian Great Great Grandmother in the lineage from East Prussia:

Northern European: 55%
Southern European: 20%
East Asian: 4%
Southwest Asian: 2%
North-West African: 0%

Plus the Additional Clusters:
Ural (Chuvash): 12%
Atlantic (French Basque): 8%

German 2:
1/2 Northeast Germany (Baltic coast), 1/4 Swabian (Southwest Germany), 1/4 Swiss

Northern European: 57%
Southern European: 16%
East Asian: 7%
Southwest Asian: 4%
North-West Africa: 2%

Ural: 3%
Atlantic: 12%

German 3:
Not sure, but I think it's the daughter of German 2, and the mother is from Hesse (central West Germany).

Northern European: 48%
Southern European: 19%
East Asian: 9%
Southwest Asian: 4%
North-West Africa: 1%

Ural: 2%
Atlantic: 16%

To round it up, here are the maps of the "Ural" and "Atlantic" components:


Atlantic (I bet this Finnish guy is not "unmixed"):

Dienekes said...

These East Asian numbers look very excessive for German individuals. I get less than 1% east Asian even for Balto-Slavs (with the exception of HGDP Russians).

To round it up, here are the maps of the "Ural" and "Atlantic" components:

The Chuvash are a relatively simple mix of Northern Europeans and Northeast Asians. If this is not recognized and they are treated as an ancestral population, then the extent of the "Ural" component will be extended much further to the west than is justified.
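
To make the arithmetic concrete, here is a minimal sketch that unpacks a composite cluster back into its assumed sources. The 75/25 recipe for the "Ural" cluster is purely hypothetical (I have not estimated it from data), and the function name is made up; the two input values are the Northern European and Ural percentages quoted above for the first German.

```python
def unpack_composite(profile, composite, recipe):
    """Redistribute a composite cluster's share onto its assumed source components."""
    share = profile.get(composite, 0.0)
    result = {name: value for name, value in profile.items() if name != composite}
    for source, weight in recipe.items():
        result[source] = result.get(source, 0.0) + share * weight
    return result


# HYPOTHETICAL composition of the "Ural" cluster, for illustration only.
ural_recipe = {"Northern European": 0.75, "Northeast Asian": 0.25}

# Two of the percentages quoted in the comment above for the first German.
german_1 = {"Northern European": 55.0, "Ural": 12.0}

unpacked = unpack_composite(german_1, "Ural", ural_recipe)
print(unpacked)
```

Under that assumed recipe, most of the apparent "Ural" share folds back into Northern European, and only a small remainder is left as Northeast Asian. A different recipe would shift the numbers but not the direction of the effect.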

Fanty said...

"These East Asian numbers look very excessive for German individuals."

You are right.
But it's my mistake.

I don't know how this happened, but replace "East Asian" with "WEST Asian"...LOL

Georgian Center.

Onur Dincer said...

French Basques are different because they are only old Europeans

Even if that is true (which I doubt very much, as you should take into account the drifting and homogenizing effects of small population size and isolation; it is also very unlikely that only they and the Sardinians, a clearly isolated and drifted population, would be genetically so "pure"), they are currently genetically quite atypical for Europe and in no way genetically represent Europeanness, so they shouldn't be included in long-range genetic analyses (like Dienekes' analyses).