March 29, 2011

The power of Clusters Galore: Iranians and Arabs

The full power of Clusters Galore depends on its ability to infer clusters of arbitrary size, shape, and orientation in a high-dimensional space. It achieves this by using MCLUST over an MDS or PCA representation of dense genomic data.
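The "arbitrary size, shape, and orientation" property comes from fitting a mixture of Gaussians whose covariance matrices are left unconstrained. The sketch below is not MCLUST itself (an R package) but a minimal Python illustration of the same idea for 2D points. The function names, the `means` starting centroids, and the toy points are assumptions made for illustration. MCLUST also chooses the number of clusters and the covariance model by BIC, which this sketch omits.

```python
import math


def gaussian_pdf(p, mean, cov):
    """Density of a 2D Gaussian with a full (unconstrained) covariance."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    dx, dy = p[0] - mean[0], p[1] - mean[1]
    # Quadratic form (p - mean)^T cov^-1 (p - mean) via the 2x2 inverse.
    q = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return math.exp(-0.5 * q) / (2.0 * math.pi * math.sqrt(det))


def fit_gmm(points, means, iters=50, ridge=1e-6):
    """EM for a Gaussian mixture; `means` are hypothetical starting centroids."""
    k, n = len(means), len(points)
    means = [tuple(m) for m in means]
    covs = [((1.0, 0.0), (0.0, 1.0)) for _ in range(k)]
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for p in points:
            dens = [weights[j] * gaussian_pdf(p, means[j], covs[j])
                    for j in range(k)]
            total = sum(dens)
            resp.append([x / total for x in dens])
        # M-step: re-estimate weight, centroid, and covariance per component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            mx = sum(r[j] * p[0] for r, p in zip(resp, points)) / nj
            my = sum(r[j] * p[1] for r, p in zip(resp, points)) / nj
            sxx = sum(r[j] * (p[0] - mx) ** 2 for r, p in zip(resp, points)) / nj
            syy = sum(r[j] * (p[1] - my) ** 2 for r, p in zip(resp, points)) / nj
            sxy = sum(r[j] * (p[0] - mx) * (p[1] - my)
                      for r, p in zip(resp, points)) / nj
            weights[j] = nj / n
            means[j] = (mx, my)
            # A small ridge keeps the covariance invertible.
            covs[j] = ((sxx + ridge, sxy), (sxy, syy + ridge))
    # Hard assignment: each point goes to its most responsible component.
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return means, covs, labels


# Toy data, invented for illustration: a round blob and a diagonal one.
blob_a = [(0, 0), (1, 0), (0, 1), (1, 1), (0.5, 0.5)]
blob_b = [(5, 5), (6, 6), (7, 7), (6, 5.5), (6, 6.5)]
means, covs, labels = fit_gmm(blob_a + blob_b, means=[(0, 0), (6, 6)])
```

The off-diagonal term of each covariance matrix is what lets a component tilt. A method restricted to spherical clusters would not capture the diagonal blob's orientation.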

Nonetheless, we can still get a sense of it even in a simple 2D representation such as the following:
This was produced by applying MDS on 240 individuals (from Behar et al. 2010, HGDP, Xing et al. 2010, and the Dodecad Project).
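For readers unfamiliar with MDS, here is a minimal sketch of classical (metric) MDS in Python, which turns a matrix of pairwise distances into low-dimensional coordinates. It assumes classical MDS is a fair stand-in for what was run here. The function name, the power-iteration eigen-solver, and the four toy points are illustrative choices, not the actual pipeline or data.

```python
import math
import random


def classical_mds(dist, dims=2, iters=500, seed=0):
    """Embed n items in `dims` dimensions from an n x n distance matrix."""
    n = len(dist)
    # Double-centre the squared distances: B = -1/2 * J * D^2 * J.
    sq = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row = [sum(r) / n for r in sq]
    total = sum(row) / n
    b = [[-0.5 * (sq[i][j] - row[i] - row[j] + total) for j in range(n)]
         for i in range(n)]

    rng = random.Random(seed)
    coords = [[0.0] * dims for _ in range(n)]
    for d in range(dims):
        # Power iteration converges to the leading eigenvector of b.
        v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = sum(v[i] * bv[i] for i in range(n))  # Rayleigh quotient
        scale = math.sqrt(max(lam, 0.0))
        for i in range(n):
            coords[i][d] = scale * v[i]
        # Deflate so the next pass finds the next eigenvector.
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)]
             for i in range(n)]
    return coords


# Toy example: four corners of a 3 x 4 rectangle, known only by distances.
points = [(0, 0), (3, 0), (3, 4), (0, 4)]
dist = [[math.hypot(p[0] - q[0], p[1] - q[1]) for q in points] for p in points]
coords = classical_mds(dist)
```

The recovered coordinates match the original layout only up to rotation and reflection, which is why the axes of an MDS plot have no intrinsic meaning. Only the distances between individuals are preserved.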

One can see that the Behar et al. and Dodecad Iranians form a small cluster on the right, together with the Xing et al. Kurds and the single Dodecad Kurd. Arabs are considerably more variable: the Druze extend to the bottom of the figure, and the Bedouin form two groups, one similar to other Arabs and the other extending to the left of the figure. There are also a few Arabs stretching to the top.

The variability of the Arabs can be attributed to reproductive isolation, inbreeding, and variable amounts of African admixture. Let's apply MCLUST over these 240 2D points:
The above visual representation shows the centroids and shapes of the 5 inferred clusters. Here are the numbers of individuals from each population assigned to each cluster:

Notice cluster #5: it consists of all Xing et al. Kurds, most Behar et al. Iranians and all Dodecad ones, and the single Dodecad Kurd, plus a Lebanese and a Syrian. It is overall 96% Iranic in composition. It is quite tempting to think that the Syrian and Lebanese members have some links to Iranian peoples, either through Kurdish ancestry or through the Shia form of Islam.
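A composition figure like this one is a cross-tabulation of population labels against cluster assignments. Here is a minimal Python sketch of how it can be computed. The labels and assignments are toy values invented for illustration, not the data behind this post, and the helper names are hypothetical.

```python
from collections import Counter, defaultdict


def crosstab(populations, clusters):
    """Count individuals per (cluster, population) pair."""
    table = defaultdict(Counter)
    for pop, cluster in zip(populations, clusters):
        table[cluster][pop] += 1
    return table


def share(table, cluster, groups):
    """Fraction of a cluster's members whose population is in `groups`."""
    counts = table[cluster]
    return sum(counts[g] for g in groups) / sum(counts.values())


# Toy labels, invented for illustration only (not the post's data).
populations = ["Iranian", "Iranian", "Kurd", "Kurd", "Syrian", "Druze", "Druze"]
clusters = [5, 5, 5, 5, 5, 4, 4]

table = crosstab(populations, clusters)
iranic = share(table, 5, {"Iranian", "Kurd"})
print(f"Cluster 5 is {iranic:.0%} Iranic in this toy example")
```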

The more variable Arabs are split into multiple clusters: the main, tight cluster #3, which includes most of the Levantine Arabs but also some Saudis and Yemenis; the extremely variable African-admixed cluster #1, dominated by some Yemenis but including a few others; the "Arabian" cluster #2, dominated by Saudis and Bedouin; and the Druze-specific cluster #4.

It seems that just as the distinction between Celto-Germans and Balto-Slavs is not only cultural but also genetic, so is the distinction between Iranian and Arab. In the case of the Arabs, though, religious distinctions (e.g., the Druze), variable African admixture, and quite possibly Arabization of Levantine populations have resulted in a non-homogeneous array of genetic clusters.

PS: Iranic groups are also not homogeneous if one includes some of those from South Asia, as evidenced by this previous genetic map of West Eurasians which analyzed Kurds and Iranians together with Pathans and Balochis.

20 comments:

astenb said...

Typo in the Yemeni Sample size.

Davidski said...

Dienekes, why does "Clusters Galore" lump the Vologda Russians with Finns? As far as I can see, it does that.

These groups are different from each other, as proved by recent research, and that's easily seen by running a few basic intra-European PCAs.

Does that mean that "Clusters Galore" doesn't work very well in at least some instances?

Dienekes said...

Dienekes, why does "Clusters Galore" lump the Vologda Russians with Finns? As far as I can see, it does that.

It does not:

http://dodecad.blogspot.com/2010/12/genetic-structure-in-north-central.html

All 25 HGDP Russians in cluster #12 and no one else except 2 of 7 Dodecad Russians.
All 7 Dodecad Finns in cluster #6 and no one else.

I.e., Clusters Galore distinguishes between the two groups perfectly.

Does that mean that "Clusters Galore" doesn't work very well in at least some instances?

Perhaps, but the instance you refer to is one in which it works perfectly well.

Davidski said...

I meant instances like this, where the Finnish and Russian clusters overlap...

http://dodecad.blogspot.com/2011/02/clusters-galore-with-dodecad.html

I see now that these are "your" Russians, and not the HGDP Russians.

I suppose it's possible that some of them might be very similar to Finns.

Dienekes said...

I suppose it's possible that some of them might be very similar to Finns.

Either that, or there are not enough samples in the Dodecad Project alone to form a Russian cluster.

If I added the 25 HGDP Russians to the reference set, some of the Dodecad Russians would attach themselves to them, just as they did in the experiment I referred to, which showed the perfect separability of Finns and Russians.

Dienekes said...

Typo in the Yemeni Sample size.

Thnx for the tip. I did a replace "0" with "", but I forgot to tick "match entire cell contents". It's alright now.

Eze said...

The HGDP Bedouins seem to consist of two very different clans/tribes. I believe they were all sampled in the Negev desert of Israel yet still show strong population substructure. One group is more similar to Palestinians, while the other group has an affinity with Saudis.

Daro said...

It would be interesting to include other northern Middle Eastern populations in the Clusters Galore and MCLUST data, e.g. Jews from Iran/Iraq, Assyrians, Turks, Armenians and Georgians. I suspect that they would fall into cluster #5.

Andrew Oh-Willeke said...

The very distinctive cluster of the Druze that is orthogonal to the Bedouin-Kurd/Iranian continuum, rather than on it, while not really surprising (anyone who's looked at the data long enough knows that the Druze are genetically distinctive), is notable in that it is at odds with a community origin myth, one that really isn't that old, of the Druze being multi-ethnic with their deepest ties to somewhere in the general vicinity of Kurdistan-Iran. Yet it seems to be the opposite axis of a very thin subset of Yemeni-Bedouins. For world populations they are already distinct at K=7, which suggests distinct ethnic roots far older than the Druze religion. How did a Near East ethnic crossroads population manage to get so distinct? A 2008 paper calls them a refugium, but from whom and from where? Their mix of mtDNA X1, X2 and X* screams that this ethnicity has been distinctive since the Upper Paleolithic, not just for the last thousand years. What happened to the groups genetically intermediate between them and other Near Eastern populations?

IHTG said...

Most likely, what happened is that there were particular endogamous tribes in the Levant that became Druze, when that religion emerged. Before that, they had a different identity.
(It's quite likely they wouldn't have remained endogamous if they'd not become Druze)

dok101 said...

@ Daro

In Dienekes' "A genetic map of West Eurasians" analysis from a couple of months ago, the populations of the Mid-East and Caucasus clustered as follows:

Cluster 2: 1/19 SEJ
Cluster 5: 12/12 AJD, 1/16 MOJ, 6/19 SEJ
Cluster 8: 6/7 ARD, 7/8 ASD, 8/8 AZJ, 3/4 GEJ, 4/4 IRJ, 5/11 IQJ, 1/2 UZJ
Cluster 9: 1/7 ARD, 16/20 GEO
Cluster 10: 12/12 CYP, 2/19 SEJ
Cluster 13: 11/17 ADY, 3/20 GEO, 1/18 LEZ, 2/19 TUR
Cluster 15: 1/8 ASD, 31/42 DRZ, 6/11 IQJ, 3/3 SAM, 1/16 SYR
Cluster 16: 1/17 ADY, 1/20 GEO
Cluster 18: 11/46 BED, 1/4 GEJ, 15/20 JOR, 2/7 LEB, 39/46 PAL, 1/20 SAU, 5/16 SYR
Cluster 19: 8/42 DRZ
Cluster 20: 3/42 DRZ, 3/20 JOR, 4/7 LEB, 1/46 PAL, 1/20 SAU, 8/16 SYR
Cluster 21: 1/7 LEB, 1/16 SYR, 14/19 TUR, 1/2 UZJ
Cluster 23: 15/16 MOJ, 10/19 SEJ
Cluster 24: 5/17 ADY, 17/18 LEZ, 5/5 STL, 1/19 TUR, 18/18 URK
Cluster 25: 13/46 BED, 6/46 PAL, 4/20 SAU, 1/15 YEJ
Cluster 26: 14/15 YEJ
Cluster 27: 3/20 IRA, 1/20 JOR, 1/20 SAU, 1/16 SYR
Cluster 29: 15/20 IRA, 24/24 KUR, 2/19 TUR
Cluster 30: 2/46 BED, 11/20 SAU
Cluster 31: 1/20 SAU
Cluster 32: 2/20 IRA, 1/20 SAU
Cluster 33: 1/20 JOR
Cluster 34: 9/46 BED
Cluster 35: 10/46 BED
Cluster 36: 1/46 BED

Andrew Oh-Willeke said...

"there were particular endogamous tribes in the Levant that became Druze"

Fair enough. But who? There are 4500 years of historical records that precede the establishment of the Druze religion, and historical records are richer in the general vicinity of the Levant than almost anywhere else on Earth in that time frame. An endogamous tribe that coherent ought to have been mentioned by someone at some point and to stick out pretty distinctly.

Daro said...

@dok101
I am confused.
In Dodecad Ancestry Project, it was stated:
"It appears that Kurds are not particularly closely related to their linguistic cousins, the Iranians. Neither are they very close to Turks and Armenians...
The distinctiveness of the Kurds is also evident in the ADMIXTURE analysis:
The high blue component distinguishes Kurds from both Iranians and Armenians/Turks. Iranians have slightly more of it, suggesting a somewhat closer relationship."
The ADMIXTURE analysis and MDS plot (Dodecad) show that Kurds are distinct from Iranians.
Samples from Kurds, Iranians, Turks and Armenians were used.

But the current MDS plot and Clusters Galore analysis (5 clusters) show that Kurds are not distinct from Iranians.
Samples from Kurds, Iranians, Syrians, Lebanese, Palestinians, Druze, Bedouins, and Yemenites were used.
The previous Clusters Galore analysis (13 Clusters) that you mentioned (A genetic map of West Eurasians) also did not show the distinctiveness of the Kurds. Surprisingly, the MDS plot showed some differences between Kurds and Iranians.
I don't get it.

Dienekes said...

Daro, the study you refer to was about West Asian IE groups

http://dodecad.blogspot.com/2010/12/structure-in-west-asian-indo-european.html

It is true that Kurds are distinct from Iranians. But that difference is of a lower order than that between Iranic speakers and Arabs.

In this particular experiment, the first two dimensions capture certain aspects of variation, and the Kurdish-Iranian distinction is minor compared to the Arab-Iranic/Druze-Levantine/Caucasoid-African etc, so it is not evident. It would appear if more MDS dimensions were plotted.

Creative said...

I have no doubt that all Arab Bedouins have a deep Semitic root. But nomadic pastoralism is and was such a successful way of living in the Middle East, regardless of Aramaic/Arab-ization or any other Semitic process, that it wouldn't be surprising if some Arab Bedouins are actually ethnic Akkadians or some other Semitic-speaking nomads of the past.

agit123 said...

Another fantastic piece of work, Dienekes. By the way, the Kurd from Dodecad is an Eastern (Iranian) Kurd and still shows stronger affinities to Kurds in the South (Iraq). It is nice to see how the Kurds, even while divided, are still somehow similar.

agit123 said...

I would bet that when some Anatolian and Syrian Kurds get tested, we will get a continuum from Iranian Kurdistan to Anatolian Kurds. The border seems not to have affected the Kurds that much. As an Anatolian Kurd, I am thinking about getting autosomal testing done.

agit123 said...

@Daro Compared to Bedouins, Kurds and Iranians are much closer to each other. But compared with other West Asians like Turks, Georgians, and Armenians, it becomes clearer that Kurds are distinctive. I also mentioned that those tested Kurds are all from South Kurdistan (Iraq), and the Kurd from Dodecad is an Iranian Kurd from East Kurdistan. North Iraq is situated between Anatolia and Iran, so this is no surprise to me. I suspect that the Anatolian and Syrian Kurds might form a continuum from the one Kurd of Dodecad, over Georgians, to Armenians and Turks, so somewhere between those, somewhat like their geographic situation.

Dienekes said...

agit123, don't double- or triple-post. I let it slide because you are new here.