December 10, 2011

First analysis of Metspalu et al. (2011) data (plus K12a admixture calculator)

Here are the results of my first analysis of the new Metspalu et al. (2011) data (populations with _M endings), together with a large number of other samples from various sources, including Chaubey et al. (2011) (_Ch endings), that I had not used before.




Spreadsheet of population averages; no outliers were removed from the source datasets. I'll defer all the technical and other details until I release Dodecad v4, which will (most likely) be based on the same dataset.

Fst divergences:


MDS plot of first two dimensions based on above table.
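
For readers who want to see how a plot like this can be derived from a table of pairwise Fst values, here is a minimal sketch of classical (Torgerson) MDS in Python with NumPy. It is not necessarily the procedure used for the plot above: the function name `classical_mds`, the four cluster labels, and the Fst values are made up for illustration, and treating Fst directly as a distance is an assumption.

```python
import numpy as np

def classical_mds(dist, n_dims=2):
    """Classical (Torgerson) MDS: find coordinates whose Euclidean
    distances approximate the entries of the distance matrix `dist`."""
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]
    # Double-center the squared distances to obtain a Gram matrix.
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    # Eigen-decompose and keep the n_dims largest eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:n_dims]
    # Negative eigenvalues cannot be turned into real coordinates; clip them to 0.
    top_vals = np.clip(eigvals[order], 0.0, None)
    return eigvecs[:, order] * np.sqrt(top_vals)

# Hypothetical Fst values for four made-up clusters (NOT taken from the table above).
labels = ["A", "B", "C", "D"]
fst = np.array([
    [0.000, 0.020, 0.110, 0.120],
    [0.020, 0.000, 0.100, 0.115],
    [0.110, 0.100, 0.000, 0.030],
    [0.120, 0.115, 0.030, 0.000],
])

coords = classical_mds(fst)
for label, (x, y) in zip(labels, coords):
    print(f"{label}: {x:+.4f} {y:+.4f}")
```

The two columns of `coords` are the first two MDS dimensions; plotting one against the other gives a scatter plot of the clusters.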

You can use DIYDodecad 2.1 with the 'K12a' calculator, which incorporates the K=12 inferred clusters of the above analysis.

Instructions: uncompress the contents of the K12a bundle to your working directory, and follow the instructions in the DIYDodecad 2.1 README file, substituting 'K12a' for 'dv3' in all those instructions.

Terms of use: 'K12a', including all files in the downloaded RAR file, is free for non-commercial personal use. Commercial uses are forbidden. Contact me for non-personal uses of the calculator.

33 comments:

pconroy said...

Map of Gedrosia - for those like me, who never heard of it!

http://en.wikipedia.org/wiki/File:Gedrosia-Map-Route-of-Alexander-1823-Lucas.png

Onur Dincer said...

Dienekes, why did you use fewer samples for some of the Dodecad populations in this run than you used in some of the previous runs?

princenuadha said...

I like this analysis best; the different elements seem to spread out in a very orderly way. Like the Caucasus element that peaks in the Caucasus and drops quickly outside the Caucasus. In the last analysis the "West Asian" was spread out in a more random way.

And wow is "North European" very structured. It goes across northern Europe and has a huge drop from Ukraine to Romania, and a pretty significant drop from the British to the French.

Also I noticed that Moroccans essentially have no "North European". Since Moroccans do have European admixture, maybe they got it before "North European" was in western Europe.

@onur

I noticed that in this run the D results matched better (very closely) with the non-D sets.

Vasishta said...

A rather interesting choice of labeling for the Balochistan centered component;

Gedrosia ( /dʒɨˈdroʊʒə/; Greek: Γεδρωσία) from Pashto Gwadar-khua is the hellenized name of an area that corresponds to today's Balochistan. Eastern Balochistan is southwestern province of Pakistan and parts of southwestern and south-central Afghanistan and western Balochistan is divided between Iranian provinces of Hormozgan and Sistan va Baluchestan. The area which is named Gedrosia, in books about Alexander the Great and his successors, runs from the Indus River to the southern edge of the Strait of Hormuz. It is directly to the south of the Iranian countries of Bactria, Arachosia and Drangiana, to the east of the Iranian countries of Persia and Carmania and due west of the Indus River which formed a natural boundary between it and Western India. In 325 BC, Alexander the Great crossed the area on his way back to Babylon after campaigning in the east. Historians say he lost three-quarters of his army to the harsh desert conditions along the way. John Prevas, in his book "Envy of the Gods: Alexander the Great's Ill-Fated Journey Across Asia", says that Alexander wanted to punish his army for retreating from risking a military campaign against the powerful kingdoms in India proper.

Gedrosia here, seems to parallel the South-Central Asian component inferred in your ADMIXTURE analysis of Eurasian populations with K=15 and Harappa's North Indian component in Ref3 + Yunusbayev Caucasus Data Admixture @ K=15. The South Asian component, that peaks in the Tamil Pulayar, is more akin to an ASI-like component (but still with a West-Eurasian fraction subsumed under it), than the South Asian component generally inferred by Admixture.

Dienekes, I'm interested in what you think of the occurrence of the North European component even in some of the more peasant castes of Uttar Pradesh, in North-Central India, which is the most populous state in India. Here is a list of the non-upper caste groups from U.P who seem to have non-noisy amounts of the component. I have mentioned their traditional professions in the brackets wherever possible;
- U.P Chamar (varied menial professions) - 2.4%
- U.P Dharkar (weavers, cane manufacturers, porters etc) - 5.8%
- U.P Dusadh (porters, watchmen) - 2.5%
- U.P Kanjar (nomadic group, reputation as criminals) - 6.2%
- U.P Kurmi (agriculturalists) - 4.8%

The Rajasthani Meghwal are a generic peasant caste from Rajasthan in NW India, and are traditionally often weavers. They seem to have 3.8% of this component. These percentages, sans the Kanjar, Dharkar and Kurmi, are rather small, but still notable. It's interesting how generic peasant castes in South India lack this component altogether, while those from U.P seem to have a small percentage of the same. This seems to correlate with the non-Indo-Aryan languages of the former and the Indo-Aryan languages of the latter. In light of the discussion we were having a while ago, the component's wide distribution suggests that it is more ancestral in nature and should be attributed more antiquity, as opposed to admixture with an intrusive and somewhat recent folk like the Saka and other intrusive non-South Asians. India's populousness would have made it rather difficult for such invaders to spread their genes to such an extent.

Dienekes said...

Uttar Pradesh was part of the Kushan Empire, South India was not. While it is impossible to discount the possibility that some of this influence predated the Indo-Scythian invasions, its levels are comparable to what we see elsewhere in Eurasia (e.g., Anatolia) with regard to recently intrusive mobile elements superimposed on a dense agricultural population.

Also, the difference between 3.8% and 0% is not one of kind (presence/absence), but of degree, with 0% signifying minimum, not absence.

Another place where the clear Indo-Scythian origin of this influence can be seen is in Pakistan; the Iranic-speaking Pathans, who are presumably more descended from this element than the Indo-Aryan speaking Sindhi, have more of it, and both have more than the populations of Balochistan who lived outside the Kushan Empire, who, conversely have more of the "Southwest Asian" element, which reflects their own superimposed foreign influence (from the Near East).

Onur Dincer said...

its levels are comparable to what we see elsewhere in Eurasia (e.g., Anatolia) with regard to recently intrusive mobile elements superimposed on a dense agricultural population.

The "North European" component levels of North-Central Indians are too high to be completely or mostly from Saka and/or other recent steppe influence. The bulk of that component probably comes from pre-Iron Age migrations to North-Central India.

Dienekes said...

"Too high" or "too low" are useless opinions without evidence or accompanying argument.

Onur Dincer said...

"Too high" or "too low" are useless opinions without evidence or accompanying argument.

I made my statement based on a comparison of the "North European" component levels of North-Central Indians with those of the peoples of the surrounding regions (Pakistan, the rest of India, Afghanistan, Iran, Central Asia), historical information on migrations from Central Asia to South Asia and the estimated demographics of the areas in question in the relevant eras.

Dienekes said...

A collection of words does not an argument make.

Onur Dincer said...

The only South Asian population with probable Saka-Scythian (or Tocharian) genetic influence that I have seen so far is the Jats. There may of course be others, but the indigenous peoples of Uttar Pradesh do not seem to be among them.

Andrés said...

I've been following the genome blogosphere for a while, but there are still some things I don't understand. I have a lot of questions about the criteria for choosing K, the specific clustering algorithms, the percentage of variance lost in dimensionality reduction, etc.
Is there any web page or paper that describes the standard practices of genome information processing?
If anyone knows, please post a link; it would be much appreciated.

princenuadha said...

I'm not sure if it is significant but does anyone have an idea on why the gedrosia element forms a nice cline in Europe that peaks in the northwest?

Matt said...

Dienekes,

Do you think most of these clusters reflect population movements or just areas where people tend to breed more frequently with one another, without any actual movements of people occurring?

I remember reading on your blog a few years ago that a cluster involving particular groups means that breeding between groups was more frequent between those groups, i.e. if you have population A and B and C and D, and A breeds more with B and C breeds more with D, you'd get AB and CD clusters without population movements.

So I'd guess you've thought about it and would have good reasons either way.

Giulia said...

i'm perplexed.. how can some northern european group have more mediterranean admixture than some italians like northern italians or tuscans?

Dienekes said...

The component can be called whatever one wants. I called it Mediterranean because it was modal in Sardinians and Basques. Whatever it's called, it has the distribution shown.

Grey said...

@Giulia
"i'm perplexed.. how can some northern european group have more mediterranean admixture than some italians like northern italians or tuscans?"

If Mediterranean wasn't Mediterranean but basal, i.e. base Neolithic farmers, and that was then mixed in varying proportions with northern migrations folding backwards in proportion to the path of least resistance, then the "Mediterranean" element might survive better in inaccessible parts of the northern or central regions (like Wales or Brittany) than in the accessible parts of southern regions (like Tuscany).

It would survive the best in the least accessible parts of the south.

Anonymous said...

You included the two Romanians from Behar et al. 2010 with Gypsy admixture.

AP said...

Re: "Uttar Pradesh was part of the Kushan Empire"

Uttar Pradesh was the heart of the Kushan empire with Mathura as one of their capitals. But why would the Kushans introduce that European element? There are some supposed remnants of the Kushans who are called Banaphars (cf. Vanashpara in Sarnath Inscription) made famous in lore by their clansmen Alha and Udal. http://books.google.com/books?id=uYXDB2gIYbwC&pg=PA4

Average Joe said...

Dienekes:

Nice work! Do you have any theories as to why the Gedrosia component is relatively high in the Irish sample compared to other European populations?

Dienekes said...

@AP The Northern European element is not limited to Europe. There is actually a latitudinal arrangement of the Caucasoid components in Asia, from north to south: NorthEuro, Caucasus, Gedrosia. The Kushans are derived ultimately from the north, so they would have shifted people in the direction measured by the North European component.

Dienekes said...

Nice work! Do you have any theories as to why the Gedrosia component is relatively high in the Irish sample compared to other European populations?

I don't have an easy theory. My guess is that Central Asia has something to do with it, and there is a latent element that is beyond our reach using modern populations, because Central Asia has been much changed due to the arrival of Turkic peoples.

This may also have something to do with it, although I don't necessarily agree

http://rbedrosian.com/Classic/Indop2.jpg

Anonymous said...

That was my very first thought after looking at the Fst distances. It could be a wave of West Asians, mostly R1b carriers, that became differentiated along the way through admixture with Central Asians, much as Northwest_African is Mediterranean plus African admixture. The only Europeans that show a >1 Gedrosia to Caucasus ratio are the Basque, British Islanders (Scots>Irish>English), Scandinavians, and Dutch. These are isolated areas, which could partly explain the higher ratios.

Eduardo Pinto said...

Hello Dienekes

Is this an unsupervised run? If so, how come the "Mediterranean" component is dominant in the two greatest European genetic isolates, the Basques and the Sardinians?

Wouldn't it be more likely for each one of these two populations to create their own specific cluster?

Dienekes said...

Yes, this is an unsupervised run.

Onur Dincer said...

The level of the "South Asian" component is unusual for the Dodecad Egyptians. If it is purely due to a single individual, I think he/she is most likely a Gypsy or recent Gypsy descendant. I would permanently remove such an individual from the Dodecad Egyptian population, as a Gypsy or recent Gypsy descendant cannot represent the Egyptian gene pool.

Anonymous said...

> The level of the "South Asian" component is unusual for the Dodecad Egyptians.

The Egyptians (n=12) from Behar et al. 2010 are 0.3 South_Asian. What are you talking about?

Dienekes said...

There are no Egyptians in the spreadsheet, since there are not 5 Egyptians in the Dodecad Project.

Onur Dincer said...

There are no Egyptians in the spreadsheet, since there are not 5 Egyptians in the Dodecad Project.

But there are Dodecad Egyptians in your ADMIXTURE analysis:

http://imageshack.us/photo/my-images/607/40469901.png/

Dienekes said...

I know what's in the ADMIXTURE analysis. What I'm telling you is that there are not 5 Egyptians in the Dodecad Project, hence there is no Egyptian average reported, hence your complaint that "The level of the "South Asian" component is unusual for the Dodecad Egyptians." is meaningless. If there are ever enough Dodecad Egyptians, we can tell whether the South Asian component in any number of them is unusual within the broader context, and hence whether they constitute outliers.

Dienekes said...

Also, after checking the actual Egyptian_D average for "South Asian" (pink) it is less than 3%, so I don't get what the big deal is anyway.

Anonymous said...

I didn't realize that picture showed Dodecad populations with n<5. Danes also have a Gedrosia:Caucasus ratio above 1. I calculated the averages for Austrians and German Swiss, because I wanted to know. For anyone interested:

Austrian_D: 45.6 N_Eu, 35.7 Med, 10.9 Cau, 4.8 Ged, 2.6 SW_As, 0.3 Sib, 0.2 S_As
Swiss_German_D: 39.2 Med, 36.5 N_Eu, 17.2 Cau, 4.9 Ged, 1.0 SW_As, 0.3 S_As, 0.3 Sib, 0.2 NW_Af, 0.2 SE_As

Oracle showed me closest to Hungarians (9.1088) followed by Mixed_Germanic_D, German_D, and Dutch_D (11.1405). I calculated the following distances, and they're closer to me which isn't surprising.

Austrian_D (4.9356)
Swiss_German_D (7.5980)
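
A minimal sketch of this kind of calculation, using the two averages quoted above. It assumes that the distance is the plain Euclidean distance between vectors of component percentages, with unlisted components taken as 0; that is an assumption, since the Oracle's exact method is not described here. The function names are made up for illustration.

```python
import math

# Component averages (percent) as quoted in this comment.
# Components not listed for a population are assumed to be 0.
austrian_d = {"N_Eu": 45.6, "Med": 35.7, "Cau": 10.9, "Ged": 4.8,
              "SW_As": 2.6, "Sib": 0.3, "S_As": 0.2}
swiss_german_d = {"Med": 39.2, "N_Eu": 36.5, "Cau": 17.2, "Ged": 4.9,
                  "SW_As": 1.0, "S_As": 0.3, "Sib": 0.3,
                  "NW_Af": 0.2, "SE_As": 0.2}

def gedrosia_caucasus_ratio(pop):
    """Ratio of the Gedrosia component to the Caucasus component."""
    return pop["Ged"] / pop["Cau"]

def euclidean_distance(a, b):
    """Euclidean distance between two component vectors (assumed metric)."""
    components = set(a) | set(b)
    return math.sqrt(sum((a.get(c, 0.0) - b.get(c, 0.0)) ** 2
                         for c in components))

for name, pop in [("Austrian_D", austrian_d), ("Swiss_German_D", swiss_german_d)]:
    print(f"{name}: Gedrosia:Caucasus = {gedrosia_caucasus_ratio(pop):.2f}")

print(f"Distance between the two averages: "
      f"{euclidean_distance(austrian_d, swiss_german_d):.4f}")
```

The same `euclidean_distance` function can be applied to an individual's own K12a percentages against any population average in the spreadsheet.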

Anonymous said...

So what would settled Indo-Iranian populations look like autosomally without the ASI admixture/later West Eurasian ancestry (Greeks, British, Turks)?

Basically, what were the Andronovo/BMAC (and IVC) mixed people autosomally, and in what percentages? Would there be Med among them (maybe associated with the small amounts of R1b that apparently do exist; are these from the Neolithic or later north Iranian empires)?

Would SW Asian be present?

Is Gedrosia associated with SW Asian or W Asian? Is it purely West Eurasian?

And what type of phenotype would that mix produce?

anthrospain said...

This Mediterranean component is more like a Southwestern one, since it peaks in Basques/Sardinians, and it is more frequent in Germanics than in Greeks, Italians or Cypriots, so it's not really Mediterranean.