November 07, 2010

Multidimensional scaling and ADMIXTURE across Northern Eurasia corresponds to geography and language

Here is a multi-dimensional scaling plot of a number of North Eurasian populations. In comparison to my previous post, I have excluded Americans and Greenlanders, and added several other populations from Central Asia and West Eurasia.

Population labels have been printed at the co-ordinates of the population averages. These largely correspond with identifiable blobs of colored points, but some populations have several outliers that pull their averages away from the main cluster, so their labels appear in white space. Most notable in that respect are the Koryak, Chukchi, and Nganasan, all of whom have some apparently European-admixed individuals.


"Mongol" corresponds to the Mongol sample of Rasmussen et al. (2010), while "Mongola" corresponds to the HGDP-CEPH one. The population codes on the left overlap and may not be clearly legible. They are CEU, LT, HU (relatively unadmixed Caucasoids), FI/RU (Uralian-admixed northern Caucasoids), and IR/TR (Altaic-admixed southern Caucasoids). The West Eurasian part of the plot can be seen blown up on the right.

The correspondence with geography and language is striking. The Siberian isolates from the extreme north and east, Koryak and Chukchi, are at the top; HapMap Chinese are at the bottom. Between them are Uralians (Selkup, Yukagir, Nganasan) and Altaics (Mongol-Tungus-Turkic peoples).
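The post does not say which software produced the MDS coordinates. Purely as an illustration, here is a minimal sketch of classical (Torgerson) MDS in NumPy, assuming you already have an n x n matrix of pairwise genetic distances between individuals (for example, 1 minus identity-by-state). The function names `classical_mds` and `population_centroids` are hypothetical. The second function computes the per-population averages at which labels are printed, which shows how a few outliers can drag a label into white space.

```python
import numpy as np


def classical_mds(dist: np.ndarray, k: int = 2) -> np.ndarray:
    """Embed n items in k dimensions from an n x n distance matrix (Torgerson MDS)."""
    n = dist.shape[0]
    # Centering matrix J = I - (1/n) * 11'
    j = np.eye(n) - np.ones((n, n)) / n
    # Double-centred Gram matrix B = -1/2 * J D^2 J
    b = -0.5 * j @ (dist ** 2) @ j
    # eigh returns the eigenvalues of a symmetric matrix in ascending order
    vals, vecs = np.linalg.eigh(b)
    top = np.argsort(vals)[::-1][:k]
    # Scale eigenvectors by sqrt(eigenvalue); clip tiny negative values caused by rounding
    return vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))


def population_centroids(coords: np.ndarray, labels) -> dict:
    """Mean MDS coordinates per population, i.e. where a population label would be printed."""
    labels = np.asarray(labels)
    return {str(pop): coords[labels == pop].mean(axis=0) for pop in np.unique(labels)}


# Toy example: a tight blob of three individuals plus one outlier from the same population
coords = np.array([[0.0, 0.0], [0.0, 0.2], [0.1, 0.1], [4.0, 0.0]])
print(population_centroids(coords, ["A", "A", "A", "A"]))
```

In the toy example the single outlier moves the label to about (1.03, 0.08), well away from the blob near the origin, which is the effect described for the Koryak, Chukchi, and Nganasan.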

Below is ADMIXTURE analysis for the same set of populations, for K=7:
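ADMIXTURE, which produced the K=7 result, models each individual i as drawing a fraction q_ik of its genome from each of K ancestral components, where component k has allele frequency f_kj at SNP j; the expected allele frequency for i at j is then the sum over k of q_ik * f_kj. The sketch below only evaluates that binomial log-likelihood for given Q and F matrices; it does not perform ADMIXTURE's optimization. `admixture_loglik` is a hypothetical name, and genotypes are assumed to be coded as 0/1/2 allele counts.

```python
import numpy as np


def admixture_loglik(g: np.ndarray, q: np.ndarray, f: np.ndarray) -> float:
    """Binomial log-likelihood of genotypes under the admixture model (constant terms dropped).

    g: (individuals x SNPs) allele counts in {0, 1, 2}
    q: (individuals x K) ancestry proportions, rows summing to 1
    f: (K x SNPs) allele frequencies of the K ancestral components, strictly between 0 and 1
    """
    # Expected allele frequency for each individual at each SNP: p_ij = sum_k q_ik * f_kj
    p = q @ f
    return float(np.sum(g * np.log(p) + (2 - g) * np.log(1.0 - p)))


# Two individuals, K = 2 components, one SNP
q = np.array([[1.0, 0.0], [0.5, 0.5]])  # the second individual is a 50/50 mix
f = np.array([[0.2], [0.8]])
g = np.array([[0], [1]])
print(admixture_loglik(g, q, f))
```

Under this model, a statement such as "Turks have more of the Altaic than of the Nganasan component" is a comparison between two entries in the Turks' rows of Q.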


Finns and Russians seem to have an excess of the "Nganasan" component over the Altaic, while Turks have the opposite. Below is a table of Fst distances between components:


The close relationship between the two Caucasoid components is apparent (Fst=0.033), but note the fairly large Fst divergences between the morphologically Mongoloid groups. I attribute this mostly to the very small population sizes of these groups, which have probably subjected them to genetic drift. For the less demographically constrained Altaic and East Asian components, Fst=0.044.
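The post does not state which Fst estimator was used for the component table. As a minimal sketch, assuming per-SNP allele-frequency vectors for two components (for example, two rows of an ADMIXTURE frequency matrix), here is Nei's estimator (G_ST), (H_T - H_S)/H_T, combined across SNPs as a ratio of sums. The second function gives the classic Wright-Fisher expectation for divergence from the ancestral population due to drift alone, F_t = 1 - (1 - 1/(2N))^t, which illustrates the point above: over the same number of generations t, a small population size N produces much more divergence. Both `nei_fst` and `expected_drift_f` are hypothetical names.

```python
import numpy as np


def nei_fst(p1, p2) -> float:
    """Nei's Fst (G_ST) between two populations from per-SNP allele frequencies, ratio of sums."""
    p1 = np.asarray(p1, dtype=float)
    p2 = np.asarray(p2, dtype=float)
    p_bar = (p1 + p2) / 2
    # Expected heterozygosity of the pooled population
    h_t = 2 * p_bar * (1 - p_bar)
    # Mean expected heterozygosity within the two populations
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    # Skip SNPs that are monomorphic in both populations
    keep = h_t > 0
    return float((h_t[keep] - h_s[keep]).sum() / h_t[keep].sum())


def expected_drift_f(n_e: float, generations: int) -> float:
    """Expected Wright-Fisher divergence from the ancestral population after t generations of drift."""
    return 1.0 - (1.0 - 1.0 / (2.0 * n_e)) ** generations


print(nei_fst([0.2, 0.5], [0.8, 0.5]))
# Same number of generations, very different population sizes
print(expected_drift_f(500, 100), expected_drift_f(10000, 100))
```

With 100 generations, a population of 500 is expected to diverge roughly twenty times more than one of 10,000, regardless of any admixture.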

If you are not familiar with these ethnic groups, the Red Book of the Peoples of the Russian Empire and the Ethnologue indexes on Altaic and Uralic are invaluable, as are the portraits of ethnic groups of China. On the right is a picture of a Nganasan.

UPDATE: Also, a past post from the blog, collating Y-haplogroup N frequencies with anthropological descriptions. Nganasans apparently belong to haplogroup N at a frequency of 92.1%!

23 comments:

Samequeen said...

"But as for certain truth, no man has known it,
Nor will he know it; neither of the gods
Nor yet of all things of which I speak,
And even if by chance he were to utter
The perfect truth, he would himself not know it;
For all is but a woven web of guesses."

Xenophanes
B34, translation by Karl Popper

Gui S said...

It would have been very interesting to see where the Japanese and the Koreans fit within this (if there are any samples available).
I also hope that one day reliable Ainu and Nivkh samples can be obtained for this kind of thing.

Dienekes said...

I have one Korean, and Japanese are included in HapMap, so I might have a go at it when I find some time. I don't use the Japanese, primarily because I think it unlikely that they influenced the gene pool of most of Eurasia, which is my region of interest.

Polak said...

Where'd you get these samples? I can only find the data for the ancient Eskimo.

Pikeperch said...

Siberian small populations are interesting.
However, in the Americas similar populations would be called Mestizos, or in Greenland Danish-Inuit, and would hardly be used as elements of analysis.

Jan said...

Unfortunately, you've omitted Ket from this run. In the last one three Siberian components appeared: Southwestern (Nganasan - Uralic), Central (Ket - Yeniseian) and Northeastern (Koryak). As a result, Selkups "lost" their genetic identity.

Dienekes said...

"Unfortunately, you've omitted Ket from this run. In the last one three Siberian components appeared: Southwestern (Nganasan - Uralic), Central (Ket - Yeniseian) and Northeastern (Koryak). As a result, Selkups 'lost' their genetic identity."

Probably because there are only 2 Kets in the data. I'll keep it in mind to include them in future analyses. Unfortunately I just started one with 67 populations that I don't want to interrupt, but in the next iteration I will put them in.

"Where'd you get these samples? I can only find the data for the ancient Eskimo."

They are in GEO under accession GSE22494.

Onur Dincer said...

Lithuanians are a fairly good proxy for unadmixed NE European Caucasoids, but Iranians aren't a good proxy for unadmixed Caucasoids of Asia Minor. You should add Armenians to the analysis to see how unadmixed Anatolian Caucasoids would appear on MDS.

Anonymous said...

I am sure those samples will be very interesting to NW and NE Europeans who seem to be gifted with admixture from NE Eurasia.

I am wondering about the small but unexplained NE/North Eurasian admixture seen in some Southern Europeans. Is it all Altaic, a consequence of the Anatolian Turks, or something quite different? There is very little said about the movements of Altaic speakers in Southern Europe, such as the Huns or Avars, and what effect did the Ottomans have in the Balkans?

Andrew Oh-Willeke said...

The geographical and linguistic correspondence is expected, so I don't find it striking.

What I find more striking is the great level of diversity in a very sparsely populated area, in which many populations were historically more compact geographically than they are now (Mongol and Altaic language speakers made their way to the West only in historic times).

While there are clear clusters, they are not tight ones, and for language families, the clusters take up an immense space on the MDS map compared to Europeans or the Chinese.

On the East-West axis, some of that may simply be tracking levels of admixture between very different West Eurasian and East Eurasian populations, or may be an artifact of the scale units. It would be interesting to see what the scatterplot would look like with dimension two twice as fine and dimension one a quarter as fine as it is in the plot. That would make the North-South gradient from East Asia, to Altaic, to Uralic, to proto-Siberian more visually obvious, while understating the significance of the European vs. East Asian gradient across Siberia.
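The suggestion above amounts to rescaling the two MDS axes before plotting. A minimal NumPy sketch, reading "twice as fine" as stretching dimension two by a factor of 2 and "a quarter as fine" as compressing dimension one to a quarter of its width (that reading is an assumption); `rescale_axes` is a hypothetical helper. This changes only the picture: the MDS solution and the genetic distances behind it are untouched.

```python
import numpy as np


def rescale_axes(coords: np.ndarray, dim1_scale: float = 0.25, dim2_scale: float = 2.0) -> np.ndarray:
    """Return a copy of 2-D MDS coordinates with each axis multiplied by its own factor."""
    return coords * np.array([dim1_scale, dim2_scale])


# Toy points: one East-West gap of 0.04 and one North-South gap of 0.01
coords = np.array([[0.0, 0.0], [0.04, 0.0], [0.0, 0.01]])
print(rescale_axes(coords))
```

After rescaling, the East-West gap shrinks to 0.01 while the North-South gap grows to 0.02, so the vertical gradient becomes the visually dominant one.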

Onur Dincer said...

I also wonder how the inclusion of South Asians would alter these results. How much of the East and West Eurasian components would be eaten up by the South Asian component?

Onur Dincer said...

Dieneke, one very important point, remember that in Auton et al. 2009, which used the POPRES samples, the only non-West Eurasian component in Turks was the South Asian component with no East Eurasian component. Is the difference because the Auton et al. paper used the Affymetrix microarray platform (500K) instead of Illumina? Which microarray platform is more reliable according to you?

Auton et al. 2009 results:

http://2.bp.blogspot.com/_Ish7688voT0/Sfttv4ydl5I/AAAAAAAABVE/ycfoDOsujnQ/s1600-h/auton_structure.jpg

princenuadha said...

"...who seem to be gifted with admixture from NE Eurasia."

Lol, are you referring to the 0.1% East Asian found in the CEU again? It's really only the Finns and Russians who have a small but significant amount of East/Northeast Asian component.

If HU stands for Hungarians, the "Huns" and "Avars" did not contribute much non-European blood at all. However, if you look at Dienekes' recent analysis which included the Romanians, you'll see that they have a sizable contribution from the east.

Also what did you mean by "gifted'?

princenuadha said...

@dienekes

Do you have data on the Swiss, such as from the 2008 study "Genes mirror geography within Europe", or from any other study, especially one including non-western Swiss? I would really like to see the Swiss in Dodecad. I don't really know what they are (central European/German, French-like, or Alpine/northern Italian).

Dienekes said...

"Dieneke, one very important point, remember that in Auton et al. 2009, which used the POPRES samples, the only non-West Eurasian component in Turks was the South Asian component with no East Eurasian component. Is the difference because the Auton et al. paper used the Affymetrix microarray platform (500K) instead of Illumina? Which microarray platform is more reliable according to you?"

Lol, do you see any East Asian reference populations in what you are linking? How do you expect to see the East Eurasian admixture in Turks in a study that doesn't include East Eurasians?

Onur Dincer said...

"Lol, do you see any East Asian reference populations in what you are linking? How do you expect to see the East Eurasian admixture in Turks in a study that doesn't include East Eurasians?"

Dieneke, Auton et al. 2009 of course includes East Eurasians, namely, Chinese, Japanese and Taiwanese.

Look at the link again:

http://2.bp.blogspot.com/_Ish7688voT0/Sfttv4ydl5I/AAAAAAAABVE/ycfoDOsujnQ/s1600-h/auton_structure.jpg

Dienekes said...

Onur, the barplot you sent me has a vertical height of about 33 pixels. This means that ~5-6% East Asian admixture in Turks will occupy 1-2 pixels, which, moreover, may be averaged out in the lossy JPG format.

Also, that sample includes only 4 Turks.
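The resolution argument is simple arithmetic: a component's share of a stacked bar maps linearly onto the bar's pixel height. A quick check in Python, using the ~33-pixel bar height and the ~5-6% East Asian share quoted above (`component_pixels` is a hypothetical name):

```python
def component_pixels(fraction: float, bar_height_px: int = 33) -> float:
    """Height in pixels occupied by an ancestry component in a stacked bar of the given height."""
    return fraction * bar_height_px


for share in (0.05, 0.06):
    print(f"{share:.0%} of a 33-pixel bar = {component_pixels(share):.2f} px")
```

This prints 1.65 px and 1.98 px, i.e. between one and two pixels before any JPG smoothing.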

Onur Dincer said...

Dieneke, it is obvious even at this level of resolution that the results for Auton et al.'s Turks are very different from those for your and Behar et al.'s Turks. There is no visible East Eurasian component in Auton et al.'s Turks, and their South Asian component is much bigger than in your and Behar et al.'s Turks. Is it because Auton et al. include only 4 Turks? Maybe. But Auton et al. use a different microarray platform from that of Behar et al. and your project, and this may be the reason for the difference too. I think you can easily test this possibility using the Affymetrix microarray platform on Turks and other populations. POPRES samples would be invaluable, but they aren't open source AFAIK.

Mongols said...

great work!

Anonymous said...

Dienekes is keeping us expectantly on the edge of our sofas!

Mongols said...

Genetic Landscape of Eurasia and "Admixture" in Uigurs, 2009

I don't know if the paper's Tibetan-Qiangic data are available to you; if so, it would be better to add them to compare with Siberians and Mongolians. According to that older paper, Mongolians share a relatively large grey cluster with Tibetans and Qiangs, while the red cluster they share with southern Chinese, Koreans, and Japanese is relatively smaller. Thanks!

princenuadha said...

@onur

You linked (http://2.bp.blogspot.com/_Ish7688voT0/Sfttv4ydl5I/AAAAAAAABVE/ycfoDOsujnQ/s1600-h/auton_structure.jpg) showing the results of Auton et al. 2009. I'm so glad that it has results for the German Swiss. Are the results displayed in any other way, such as Fst values or composition graphs?

Also how would you classify the German Swiss?

Onur Dincer said...

Nuadha, the Auton et al. paper doesn't have intra-European Fst estimates; and unfortunately its intra-European haplotype analyses are between groups of countries and ethnic groups, not between single countries and/or ethnic groups, so they don't provide any specific information about the German Swiss. As to its other graphs, again they don't have any intra-European results, and they have no European country/ethnicity labels. In short, the Auton et al. paper has nothing of value about the German Swiss, so I can't make a sub-racial classification for them based on this paper.

The paper is free:

http://genome.cshlp.org/content/19/5/795.abstract