September 28, 2010

Some ADMIXTURE estimates in Eurasia

(Last Update: Sep 29)

Continuing my exploration of ADMIXTURE, I turned to the HGDP data, which has 660,918 SNPs for a wide assortment of worldwide populations. After pruning 12,086 SNPs with more than 1% missing genotypes, I was still left with ~650k SNPs.
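The missingness filter is simple to express. The post does not say which tool performed it. PLINK's `--geno 0.01` option applies the same rule: drop SNPs with a missing-call rate above 1%. What follows is only a minimal NumPy sketch of that rule on a small simulated genotype matrix, not the pipeline actually used. The function name `filter_by_missingness` is hypothetical.

```python
import numpy as np

def filter_by_missingness(genotypes: np.ndarray, max_missing: float = 0.01):
    """Keep SNPs (columns) whose fraction of missing calls is at most max_missing.

    genotypes: individuals x SNPs, coded 0/1/2 with np.nan for a missing call.
    Returns the filtered matrix and the boolean mask of kept SNPs.
    """
    missing_rate = np.isnan(genotypes).mean(axis=0)  # per-SNP missing fraction
    keep = missing_rate <= max_missing               # drop SNPs with > 1% missing
    return genotypes[:, keep], keep

# Simulated toy data (not HGDP): 200 individuals x 5 SNPs.
rng = np.random.default_rng(0)
g = rng.integers(0, 3, size=(200, 5)).astype(float)
g[:3, 2] = np.nan   # SNP 2: 3/200 = 1.5% missing -> removed
g[:1, 4] = np.nan   # SNP 4: 1/200 = 0.5% missing -> kept

kept, mask = filter_by_missingness(g)
print(mask.tolist())  # [True, True, False, True, True]
print(kept.shape)     # (200, 4)
```

Applied to the HGDP figures above, 660,918 - 12,086 = 648,832 SNPs survive, which is the "~650k" quoted.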

Here are some experiments on this dataset. First, a clustering with K=2 of Han Chinese, Russians, and Orcadians (left to right):

The emergence of 2 clusters (red=Mongoloid, blue=Caucasoid) is as expected, with Russians showing a small participation in the red cluster (7.2%). These northern Russians are believed to have a substantial Finno-Ugric genetic origin, so this is in line with a recent estimate of less than 10% for the eastern component in the westernmost Finno-Ugric speakers (but see below).

Notice a couple of Chinese individuals with a small Caucasoid component: as I've mentioned before, Mongolians, and presumably northern Han, have a small Caucasoid component from early movements of Iranian speakers from the west. That's an advantage of doing your own admixture analysis: you can look at the data in fine detail and not rely on the published figures.

Next, a clustering of Orcadians, Uygur, and Han Chinese:
The variable admixture in Uygurs is evident (47.2-63.7%, mean: 54.2%).
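Ranges like the Uygur one come from per-individual estimates. ADMIXTURE writes its `.Q` output with one row per individual and K ancestry fractions per row. Below is a minimal sketch of how the min, max, and mean of one cluster could be summarized from such a file. The three rows are invented for illustration and are not the Uygur values. Which column corresponds to which cluster has to be checked against the plot, and `summarize_component` is a hypothetical helper.

```python
from io import StringIO

import numpy as np

# Invented rows in the layout of an ADMIXTURE .Q file for K=2: one line per
# individual, one whitespace-separated ancestry fraction per cluster.
# These are NOT the Uygur estimates from the run above.
q_text = """0.520000 0.480000
0.610000 0.390000
0.475000 0.525000
"""

def summarize_component(q: np.ndarray, k: int):
    """Return (min, max, mean) of cluster k across individuals, in percent."""
    col = q[:, k] * 100.0
    return col.min(), col.max(), col.mean()

q = np.loadtxt(StringIO(q_text))  # shape: (individuals, K)
low, high, mean = summarize_component(q, 1)
print(f"{low:.1f}-{high:.1f}%, mean: {mean:.1f}%")  # 39.0-52.5%, mean: 46.5%
```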

Next, a clustering of Druze, Bedouin, and Bantu from Kenya.

Druze appear completely Caucasoid (red) and Bantu completely Negroid (save for a couple of individuals), while Bedouins show a quite variable minor Negroid component. This variable African contribution (0-17.6%) makes an elongated cluster out of Bedouins in a recent analysis, pulling them away from other Middle Eastern populations in a Sub-Saharan direction.

Finally, I clustered European populations together with Mandenka and Han Chinese:

The populations are in the following order: Han, Mandenka, Orcadian, French Basque, French, North Italian, Tuscan, Sardinian, Russian.

Here are the admixture proportions:

Notice how the eastern component in Russians is now estimated as 10.9%. This probably reflects the inclusion of French Basque and Sardinians, i.e., populations that have historically had no opportunity for eastern Eurasian admixture, rather than only Orcadians. This underscores the importance of having appropriate poles in inter-continental admixture estimates (see Appendix I).

Note also that the 100% value for the Han Chinese is not incompatible with the presence of the two aforementioned Caucasoid-admixed individuals, who appear here with an estimated 1.9% and 0.5% of such admixture. Spread over a sample of 40+ individuals, this contributes little to the average.

The minor (0.1%) Sub-Saharan admixture in Tuscans and Sardinians is also interesting. As you can guess from the figure, this stems from a handful of individuals (green specks) with less than 1% admixture, which is, however, more than the numerical floor of 0.001% inferred for most Europeans by the software.

UPDATE I: Eurasian Cline

Below is a run for the following populations (left to right: French Basque, Russians, Uygur, Mongolians, Daur, Han Chinese). Notice that the Mongolic speakers (Mongolian and Daur from HGDP) have a small Caucasoid admixture, as I have mentioned before.

APPENDIX I: The importance of choosing poles

The choice of appropriate poles in the estimation of inter-continental admixture is extremely important.

If there is a racial admixture continuum between two major races, such as we observe in Eurasia, then we can express each intermediate population as a weighted sum of populations that live to the east and west of it.

For example, I will use a variable in the interval [0, 1] to represent the position in the continuum, with 0 being pure western and 1 pure eastern.

A population at 0.4 can be expressed as the following weighted sum:

0.4 = 0.6*0 + 0.4*1

i.e., as an admixture of 60% western, and 40% eastern.

But it can also be expressed as, e.g.,

0.4 = 0.612*0.02 + 0.388*1

Notice that the choice of a slightly eastward-tilted "western pole" (at position 0.02 in the continuum) has resulted in a reduction of the inferred eastern component (from 40% to 38.8%).
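The arithmetic above generalizes. For a population at position x between a western pole at a and an eastern pole at b, the eastern weight is w = (x - a)/(b - a). A minimal Python sketch reproduces both decompositions of the 0.4 population. The function name `eastern_fraction` is hypothetical.

```python
def eastern_fraction(x: float, west_pole: float = 0.0, east_pole: float = 1.0) -> float:
    """Weight w such that x = (1 - w) * west_pole + w * east_pole."""
    if east_pole == west_pole:
        raise ValueError("the two poles must be at different positions")
    return (x - west_pole) / (east_pole - west_pole)

w_pure = eastern_fraction(0.4)                     # western pole at 0
w_tilted = eastern_fraction(0.4, west_pole=0.02)   # eastward-tilted western pole
print(round(w_pure, 3), round(1 - w_pure, 3))      # 0.4 0.6
print(round(w_tilted, 3), round(1 - w_tilted, 3))  # 0.388 0.612
```

Because a appears in both the numerator and the denominator, any eastward shift of the western pole (larger a, with x still between the poles) lowers w.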

This is exactly what happened in our example: the Russian eastern admixture was reduced when we used Orcadians, rather than French Basque, as the western pole.

Note also that this is all done automatically. No one told ADMIXTURE to identify these two poles; it was the presence of unlabeled individuals from different ends of the spectrum that influenced the admixture estimates for the rest.
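The same effect shows up across many loci. ADMIXTURE itself fits a maximum-likelihood model to individual genotypes, and that is not what is shown here. The sketch below substitutes a much simpler least-squares projection of a population's allele frequencies onto the line between two reference populations. The frequencies are simulated rather than taken from HGDP, and `admixture_weight` is a hypothetical helper. A western reference carrying 2% eastern ancestry reproduces the drop from 40% to 38.8% worked out above.

```python
import numpy as np

def admixture_weight(p_x, p_west, p_east) -> float:
    """Least-squares w minimizing ||p_x - ((1 - w) * p_west + w * p_east)||^2."""
    d = p_east - p_west
    return float(np.dot(p_x - p_west, d) / np.dot(d, d))

rng = np.random.default_rng(1)
n_snps = 10_000
p_west = rng.uniform(0.05, 0.95, n_snps)  # simulated unadmixed western frequencies
p_east = rng.uniform(0.05, 0.95, n_snps)  # simulated unadmixed eastern frequencies
p_x = 0.6 * p_west + 0.4 * p_east         # target population: 40% eastern

# A western reference that itself carries 2% eastern ancestry
p_west_tilted = 0.98 * p_west + 0.02 * p_east

w_pure = admixture_weight(p_x, p_west, p_east)
w_tilted = admixture_weight(p_x, p_west_tilted, p_east)
print(round(w_pure, 3), round(w_tilted, 3))  # 0.4 0.388
```

In this noise-free setup the tilted estimate is exactly 0.38/0.98 for any number of loci, because the tilt rescales both the numerator and the denominator of the projection by the same difference vector.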

APPENDIX II: Latent populations

Another important point that needs to be remembered has to do with the possible existence of latent ancestral populations.

For example, it is true that Eurasia (minus South Asia) is economically described as a continuum from the Caucasoids of the Atlantic coast to the Mongoloids of the Pacific, with a transition zone in Central Asia and Siberia, and spillovers on either side. But we cannot exclude the prehistoric existence of other races in the Eurasian landmass that do not exist today in a relatively unadmixed form.

In Eurasia, the Proto-Uralic race was postulated as such a "third race" with features of its own, not reducible to simple Caucasoid-Mongoloid admixture. It is difficult to determine whether these features are ancestral peculiarities (prior to admixture with Caucasoids and Mongoloids), or whether they arose in a mixed Caucasoid-Mongoloid population.

It is also important to understand how such latent populations affect genetic continua:

First, if the latent population is equidistant from the two major races, then its admixture has no effect on an individual's position in the continuum between the two races. However, it is possible that the latent population was more related to one of the two major races. In that case, admixture with it will move a population towards that race.

So while the jury is still out on the existence of a Proto-Uralic race in Eurasia, its effects on admixed populations indicate that, if it existed, it was genetically closer to Mongoloids than to Caucasoids.


pconroy said...


I wouldn't use Orcadians for any population admixture analysis, they are NOT a good sample!

On 23andMe they are skewed far away from the other European clusters, just like the Basques and Sardinians. Americans who are of British, Scandinavian and Native American heritage frequently cluster with the Orcadians, providing evidence that there is some Central Asian/Native American heritage in the Orkneys, or else that there is some very ancient North Eurasian heritage there. Either way, they don't represent Northern or Western Europeans too well.

On deCODEme, although I am Irish, I am closer to the French and Icelandic samples than to the Orkney one, so it is a poor proxy for British/Irish too.

Salabencher said...

pconroy, what do you think about FTDNA using the Orcadians as the Western Euro reference? Have you taken the Pop finder test?

Onur Dincer said...

I wouldn't use Orcadians for any population admixture analysis, they are NOT a good sample!

Not only would I not use Orcadians, I also wouldn't use any other isolated population like Sardinians and Basques. Caucasoidness should be measured using typical West European populations like the English, French, etc. as poles, just as Mongoloidness is measured using typical central East Asian populations like the Han Chinese, Japanese, etc. as poles.

Fanty said...

Orcadians are a mix of Norse Vikings and British women in the first place. So they should be a typical Germanic/Celtic blend and thus represent a lot of Americans.

If there is native American or central Asian in them, it must be from their Norse admixture.

If they were such a bad anchor, why would FTDNA make them the most heavily weighted reference population for their "Western European" tag?

Well, OK, they hand out "Western European" like candy, maybe because of the quite central position of Orcadians.

I made this (more easily viewable) map:

from this project here:
500K SNPs, trying to recreate the Behar map with more populations (volunteers who send in their profiles):

Some of these clusters are still based on too few people (UK on 2, Germany on 2...).

I have heard an update is coming today, which possibly contains my data as well.

Onur Dincer said...

I don't know much about Orcadian history. But if they are genetically non-isolated (unlike Basques and Sardinians) and are genetically typical West Europeans, they can be used as a pole for Caucasoidness.

Dienekes said...

Basques are not a genetic isolate

Onur Dincer said...

Basques are not a genetic isolate

I should put it this way: Basques are from a genetically relatively inbred/isolated segment of SW Europeans, so they shouldn't be used as a pole for Caucasoidness. Spaniards as a whole or the Portuguese represent the genetics of SW Europe much better than Basques.

Mauri said...

Well, the use of genetic poles is apparently important for obtaining distinct results, but it can also be misleading when the populations under analysis have historically had nothing to do with one or another of those poles.

Onur Dincer said...

Basques are neither isolated nor inbred, nor distinct from other Iberians.

Basques deviate from other Iberians to some extent (not necessarily very much) in all the freely accessible studies I've seen that included them both (the study you mention isn't freely accessible). But other Iberians too are distinct from other Europeans to some extent.

My proposal is this: If we are to calculate the Caucasoidness and Mongoloidness of a population, we'd better compare it to a bundle of West European and a bundle of East Asian populations. This will prevent the effects of population-specific deviations.

If we were to use Iberians in general, we would be using a population with North African (and in some cases Negroid) admixture.

I think the non-Caucasoid admixture in Iberians is being exaggerated.

Onur Dincer said...

What I mean by bundle is average.

Dienekes said...

My proposal is this: If we are to calculate the Caucasoidness and Mongoloidness of a population, we'd better compare it to a bundle of West European and a bundle of East Asian populations. This will prevent the effects of population-specific deviations.

Your proposal is misguided, as there are Caucasoid/East Asian populations with East Asian/Caucasoid admixture. Hence, your study design would not produce an accurate estimate.

If one is interested in determining how much gold was added to an alloy, they do it by measuring against pure (or as unadmixed as possible) gold, they do not measure against what most "gold" artifacts are.

GrIQ said...

The Spaniards always cluster with the French and Northern Italians, so I would expect Spaniards to be 98.1 Euro as well. Nothing strange.

princenuadha said...

You guys are way over my head, but isn't race genetics about structure, rather than viewing populations as elemental (irreducible)?

Supposedly the French Basques are the most "European" because they've had less admixture. But that implies there is a parent population shared by Europeans that is being considered pure. Haven't the Basques been drifting from this parent population due to evolution? Also, that parent population was never perfectly homogeneous. I just don't understand how you decide on an elemental population. The above, combined with the fact that the Basques are more isolated from other Europeans than, say, the English, possibly makes them less European. Wouldn't the English better capture recent European-specific mutations?

I guess I'm talking about a trade off.

Onur Dincer said...

there are Caucasoid/East Asian populations with East Asian/Caucasoid admixture

If ingredients of the West European and East Asian averages are chosen from the racially purest West European and East Asian populations, then their averages can be used as gold standards for Caucasoidness and Mongoloidness respectively. A similar gold standard for Dravidoidness (South Asian race) can be established from an average of the racially most Dravidoid South Asian populations. In fact, similar gold standards can be established for virtually all races.

Fanty said...

The dog is chasing its own tail.

"Racial purity" can only be measured by autosomal genetic testing.

But to interpret the results you need to know which are the "racially purest" populations.

So this is doomed to fail.

I have recently made several different maps of Behar's admixture calculations (using the more detailed charts in the PDF) to get a better picture of what they claim.

Here are maps of the K2 and K3 intra-European runs:

That's a map of the K8 intra-European run:

And this is a rather lazily done K10, worldwide (actually Europe, Africa, Asia), in a different style (one map for each admixture component).

The darkness of the color is also not proportional to the admixture, but it is good enough to get the idea.

It's that admixture result that somehow made Lithuanians appear "THE European".

Onur Dincer said...

Fanty, your third group of maps are implausible as they show the "southern Mongoloid" component in Asia Minor in equal shade with Kazakhstan and show no visible "southern Mongoloid" component in southern Central Asia, and also as they show the "northern Mongoloid" component more in East Europe and Asia Minor than in most of Central Asia. These are all in direct contradiction with the results of Behar et al. and all other genetic studies of these regions so far and also with anthropology and craniometry in addition to genetics.

The dog is chasing its own tail.

"Racial purity" can only be measured by autosomal genetic testing.

But to interpret the results you need to know which are the "racially purest" populations.

So this is doomed to fail.

My formula above is plausible. All I need are experiments to show that it works.

Onur Dincer said...

One more thing to add: most of Persia, Mesopotamia, Syria and Transcaucasia have the "northern Mongoloid" component in equal shade with most of Central Asia in your maps, which is also implausible.

Pikeperch said...

In Russia the largest East Asian component is probably in the regions affected by the Turco-Mongol invasion, like those close to Kazakhstan, the Southern Volga and Tatarstan. describes both mtDNA and Y in various regions, and the share of eastern mtDNA is high in these areas but absent in old Finno-Ugric populations like the Mari and Komi-Zyryans.
In the Far North there might be another source of Eastern mtDNA that originally came from the Paleosiberian Q populations to the Khanty over the Ob-Yenisey river systems. Some of it was transmitted with the fur trade westwards to the Komi and Vepsians and from them to Novgorod.
The Saami are very heterogeneous and might have an old Iberian component along the Atlantic coast, but also a Q component along the Atlantic coast from the East.
A trace of Q can still be found in Trondheim region in Norway, and might have reached the Orkney islands also.
DNA Tribes seems to find some "Amerind" influence over Siberia. now has a rather new article by Anatoli Klyosov that might deserve its own discussion.

Pallantides said...

Y-DNA Q is absent in the Saami, but found at 4% in Norwegians

Errant-programmer said...

How does this scheme work for the Finns and the Scandinavian nations?