March 29, 2011

The power of Clusters Galore: Iranians and Arabs

The full power of Clusters Galore depends on its ability to infer clusters of arbitrary size, shape, and orientation in a high-dimensional space. It achieves this by using MCLUST over an MDS or PCA representation of dense genomic data.

Nonetheless, we can still get a sense of it even in a simple 2D representation such as the following:
This was produced by applying MDS on 240 individuals (from Behar et al. 2010, HGDP, Xing et al. 2010, and the Dodecad Project).
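
For readers unfamiliar with MDS, here is a minimal sketch of classical (Torgerson) MDS in plain Python. It is only an illustration: the post does not say which MDS variant or distance measure was used, so the function name `classical_mds`, the power-iteration eigen-solver, and the toy distance matrix are all assumptions. It also assumes a symmetric, Euclidean-like distance matrix.

```python
import math
import random

def classical_mds(dist, dims=2, iters=500, seed=0):
    """Embed n items in `dims` dimensions from a symmetric n x n distance matrix."""
    n = len(dist)
    # Double-center the squared distances: B = -1/2 * J * D^2 * J
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(r) / n for r in sq]
    total_mean = sum(row_mean) / n
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + total_mean)
          for j in range(n)] for i in range(n)]

    rng = random.Random(seed)
    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        # Power iteration for the current leading eigenvector of B
        v = [rng.random() for _ in range(n)]
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # nothing left to extract
                break
            v = [x / norm for x in w]
        eigval = sum(v[i] * sum(b[i][j] * v[j] for j in range(n)) for i in range(n))
        scale = math.sqrt(max(eigval, 0.0))
        for i in range(n):
            coords[i][k] = v[i] * scale
        # Deflate B so the next pass finds the next eigenvector
        for i in range(n):
            for j in range(n):
                b[i][j] -= eigval * v[i] * v[j]
    return coords

# Toy example (made-up points, not genomic data): four corners of a rectangle
points = [(0.0, 0.0), (4.0, 0.0), (4.0, 2.0), (0.0, 2.0)]
toy_dist = [[math.dist(p, q) for q in points] for p in points]
embedding = classical_mds(toy_dist)
```

The embedding is only defined up to rotation and reflection, so it is the pairwise distances, not the axes themselves, that carry meaning. With real data one would normally use an established implementation rather than this hand-rolled solver.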

One can see that the Behar et al. and Dodecad Iranians form a small cluster on the right, together with the Xing et al. Kurds and the single Dodecad Kurd. Arabs are considerably more variable: the Druze extend to the bottom of the figure, while the Bedouin form two groups, one similar to other Arabs and the other extending to the left of the figure. There are also a few Arabs stretching to the top.

The variability of the Arabs can be attributed to reproductive isolation, inbreeding, and variable amounts of African admixture. Let's apply MCLUST over these 240 2D points:
The above visual representation shows the centroids and shapes of the 5 inferred clusters. Here are the numbers of individuals from each population assigned to each cluster:

Notice cluster #5: it consists of all Xing et al. Kurds, most Behar et al. Iranians and all Dodecad ones, and the single Dodecad Kurd, plus a Lebanese and a Syrian. It is overall 96% Iranic in composition. It is quite tempting to think that the Syrian and Lebanese members have some links to Iranian peoples, either through Kurdish ancestry or through the Shia form of Islam.
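
As an illustration of how a population-by-cluster table and a composition figure like the one above can be derived from cluster assignments, here is a small sketch. The function names and the toy labels are hypothetical and the counts are made up; they are not the numbers from this analysis.

```python
from collections import Counter, defaultdict

def crosstab(populations, clusters):
    """Count how many individuals of each population fall in each cluster."""
    table = defaultdict(Counter)
    for pop, cluster in zip(populations, clusters):
        table[cluster][pop] += 1
    return table

def share(table, cluster, groups):
    """Fraction of a cluster's members that belong to the given populations."""
    counts = table[cluster]
    total = sum(counts.values())
    return sum(counts[g] for g in groups) / total

# Toy input (made up): one population label and one cluster label per individual
toy_pops = ["Iranian", "Iranian", "Kurd", "Syrian", "Druze", "Druze"]
toy_clusters = [5, 5, 5, 5, 4, 4]

table = crosstab(toy_pops, toy_clusters)
iranic_share = share(table, 5, {"Iranian", "Kurd"})
print(dict(table[5]), iranic_share)
```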

The more variable Arabs are split into multiple clusters: the main, tight cluster #3, which includes most of the Levantine Arabs but also some Saudis and Yemenis; the extremely variable, African-admixed cluster #1, dominated by some Yemenis but including a few others; the "Arabian" cluster #2, dominated by Saudis and Bedouin; and the Druze-specific cluster #4.
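
Why can a model-based method tell a tight cluster from a highly variable one? MCLUST fits Gaussian mixture models, in which every cluster has its own covariance matrix, so membership depends on a cluster's size, shape, and orientation and not just on the distance to its centroid. The sketch below shows only the assignment step for two components in 2D. The parameters are hypothetical and not taken from this analysis, and the fitting itself (EM plus model selection) is left out.

```python
import math

def log_gaussian_2d(point, mean, cov):
    """Log-density of a 2D Gaussian with full covariance [[a, b], [b, c]]."""
    (a, b), (_, c) = cov
    det = a * c - b * b
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    # Mahalanobis distance via the closed-form inverse of a 2x2 matrix
    maha = (c * dx * dx - 2 * b * dx * dy + a * dy * dy) / det
    return -0.5 * (2 * math.log(2 * math.pi) + math.log(det) + maha)

def assign(point, components):
    """Index of the component with the highest weighted density at `point`."""
    scores = [math.log(w) + log_gaussian_2d(point, m, s) for w, m, s in components]
    return max(range(len(scores)), key=scores.__getitem__)

# Hypothetical components: (weight, mean, covariance)
components = [
    (0.5, (0.0, 0.0), ((0.01, 0.0), (0.0, 0.01))),  # tight, round cluster
    (0.5, (1.0, 0.0), ((0.04, 0.0), (0.0, 4.0))),   # cluster stretched vertically
]

point = (0.3, -2.0)
nearest_centroid = min(range(2), key=lambda i: math.dist(point, components[i][1]))
print(nearest_centroid, assign(point, components))
```

In this toy setup the point is slightly closer to the centroid of the tight cluster, yet the mixture model assigns it to the stretched one, because a point that far out is implausible under the tight cluster's small variance.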

It seems that just as the distinction between Celto-Germans and Balto-Slavs is not only cultural, but also genetic, so is the distinction between Iranian and Arab. In the case of the Arabs though, religious distinctions (e.g., the Druze), variable African admixture, and quite possibly Arabization of Levantine populations have resulted in a non-homogeneous array of genetic clusters.

PS: Iranic groups are also not homogeneous if one includes some of those from South Asia, as evidenced by this previous genetic map of West Eurasians which analyzed Kurds and Iranians together with Pathans and Balochis.

20 comments:

  1. Dienekes, why does "Clusters Galore" lump the Vologda Russians with Finns? As far as I can see, it does that.

    These groups are different from each other, as proved by recent research, and that's easily seen by running a few basic intra-European PCAs.

    Does that mean that "Clusters Galore" doesn't work very well in at least some instances?

  2. "Dienekes, why does 'Clusters Galore' lump the Vologda Russians with Finns? As far as I can see, it does that."

    It does not:

    http://dodecad.blogspot.com/2010/12/genetic-structure-in-north-central.html

    All 25 HGDP Russians in cluster #12 and no one else except 2 of 7 Dodecad Russians.
    All 7 Dodecad Finns in cluster #6 and no one else.

    I.e., Clusters Galore distinguishes between the two groups perfectly.

    "Does that mean that 'Clusters Galore' doesn't work very well in at least some instances?"

    Perhaps, but the instance you refer to is one in which it works perfectly well.

  3. I meant instances like this, where the Finnish and Russian clusters overlap...

    http://dodecad.blogspot.com/2011/02/clusters-galore-with-dodecad.html

    I see now that these are "your" Russians, and not the HGDP Russians.

    I suppose it's possible that some of them might be very similar to Finns.

  4. "I suppose it's possible that some of them might be very similar to Finns."

    Either that, or there are not enough samples in the Dodecad Project alone to form a Russian cluster.

    If I added the 25 HGDP Russians to the reference set, some of the Dodecad Russians would attach themselves to them, just as they did in the experiment I referred to, which showed the perfect separability of Finns and Russians.

  5. "Typo in the Yemeni Sample size."

    Thanks for the tip. I replaced "0" with "", but I forgot to tick "match entire cell contents". It's alright now.

  6. The HGDP Bedouins seem to consist of two very different clans/tribes. I believe they were all sampled in the Negev desert of Israel yet still show strong population substructure. One group is more similar to Palestinians, while the other group has an affinity with Saudis.

  7. It would be interesting to include other northern Middle Eastern populations in the Clusters Galore and MCLUST data, e.g., Jews from Iran/Iraq, Assyrians, Turks, Armenians, and Georgians. I suspect that they would fall into cluster #5.

  8. The very distinctive cluster of the Druze, which is orthogonal to the Bedouin-Kurd/Iranian continuum rather than on it, is not really surprising (anyone who's looked at the data long enough knows that the Druze are genetically distinctive). It is notable, though, in that it is at odds with a community origin myth, one that really isn't that old, of the Druze being multi-ethnic with their deepest ties to somewhere in the general vicinity of Kurdistan-Iran. Yet the cluster seems to lie on the opposite axis from a very thin subset of Yemeni-Bedouins. Among world populations they are already distinct at K=7, which suggests distinct ethnic roots far older than the Druze religion. How did a Near East ethnic crossroads population manage to get so distinct? A 2008 paper calls them a refugium, but from whom and from where? Their mix of mtDNA X1, X2 and X* screams that this ethnicity has been distinctive since the Upper Paleolithic, not just for the last thousand years. What happened to the groups genetically intermediate between them and other Near Eastern populations?

  9. Most likely, what happened is that there were particular endogamous tribes in the Levant that became Druze, when that religion emerged. Before that, they had a different identity.
    (It's quite likely they wouldn't have remained endogamous if they'd not become Druze)

  10. @ Daro

    In Dienekes' "A genetic map of West Eurasians" analysis from a couple of months ago, the populations of the Mid-East and Caucasus clustered as follows:

    Cluster 2: 1/19 SEJ
    Cluster 5: 12/12 AJD, 1/16 MOJ, 6/19 SEJ
    Cluster 8: 6/7 ARD, 7/8 ASD, 8/8 AZJ, 3/4 GEJ, 4/4 IRJ, 5/11 IQJ, 1/2 UZJ
    Cluster 9: 1/7 ARD, 16/20 GEO
    Cluster 10: 12/12 CYP, 2/19 SEJ
    Cluster 13: 11/17 ADY, 3/20 GEO, 1/18 LEZ, 2/19 TUR
    Cluster 15: 1/8 ASD, 31/42 DRZ, 6/11 IQJ, 3/3 SAM, 1/16 SYR
    Cluster 16: 1/17 ADY, 1/20 GEO
    Cluster 18: 11/46 BED, 1/4 GEJ, 15/20 JOR, 2/7 LEB, 39/46 PAL, 1/20 SAU, 5/16 SYR
    Cluster 19: 8/42 DRZ
    Cluster 20: 3/42 DRZ, 3/20 JOR, 4/7 LEB, 1/46 PAL, 1/20 SAU, 8/16 SYR
    Cluster 21: 1/7 LEB, 1/16 SYR, 14/19 TUR, 1/2 UZJ
    Cluster 23: 15/16 MOJ, 10/19 SEJ
    Cluster 24: 5/17 ADY, 17/18 LEZ, 5/5 STL, 1/19 TUR, 18/18 URK
    Cluster 25: 13/46 BED, 6/46 PAL, 4/20 SAU, 1/15 YEJ
    Cluster 26: 14/15 YEJ
    Cluster 27: 3/20 IRA, 1/20 JOR, 1/20 SAU, 1/16 SYR
    Cluster 29: 15/20 IRA, 24/24 KUR, 2/19 TUR
    Cluster 30: 2/46 BED, 11/20 SAU
    Cluster 31: 1/20 SAU
    Cluster 32: 2/20 IRA, 1/20 SAU
    Cluster 33: 1/20 JOR
    Cluster 34: 9/46 BED
    Cluster 35: 10/46 BED
    Cluster 36: 1/46 BED

  11. This comment has been removed by the author.

  12. "there were particular endogamous tribes in the Levant that became Druze"

    Fair enough. But who? There are 4500 years of historical records that precede the establishment of the Druze religion, and historical records are richer in the general vicinity of the Levant than almost anywhere else on Earth in that time frame. An endogamous tribe that coherent ought to have been mentioned by someone at some point and should stick out pretty distinctly.

  13. @dok101
    I am confused.
    In Dodecad Ancestry Project, it was stated:
    "It appears that Kurds are not particularly closely related to their linguistic cousins, the Iranians. Neither are they very close to Turks and Armenians...
    The distinctiveness of the Kurds is also evident in the ADMIXTURE analysis:
    The high blue component distinguishes Kurds from both Iranians and Armenians/Turks. Iranians have slightly more of it, suggesting a somewhat closer relationship."
    The ADMIXTURE analysis and MDS plot (Dodecad) show that Kurds are distinct from Iranians.
    Samples from Kurds, Iranians, Turks and Armenians were used.

    But the current MDS plot and Clusters Galore analysis (5 clusters) show that Kurds are not distinct from Iranians.
    Samples from Kurds, Iranians, Syrians, Lebanese, Palestinians, Druze, Bedouins, and Yemenites were used.
    The previous Clusters Galore analysis (13 clusters) that you mentioned (A genetic map of West Eurasians) also did not show the distinctiveness of the Kurds. Surprisingly, the MDS plot showed some differences between Kurds and Iranians.
    I don't get it.

  14. Daro, the study you refer to was about West Asian IE groups

    http://dodecad.blogspot.com/2010/12/structure-in-west-asian-indo-european.html

    It is true that Kurds are distinct from Iranians. But that difference is of a lower order than that between Iranic speakers and Arabs.

    In this particular experiment, the first two dimensions capture certain aspects of variation, and the Kurdish-Iranian distinction is minor compared to the Arab-Iranic, Druze-Levantine, Caucasoid-African, etc. distinctions, so it is not evident. It would appear if more MDS dimensions were plotted.

  15. I have no doubt that all Arab Bedouins have a deep Semitic root. But considering that nomadic pastoralism is and was such a successful way of living in the Middle East, regardless of Aramaization, Arabization, or any other Semitic process, it wouldn't be surprising if some Arab Bedouins are actually ethnic Akkadians or some other Semitic-speaking nomads of the past.

  16. Another fantastic piece of work, Dienekes. By the way, the Kurd from Dodecad is an Eastern (Iranian) Kurd and still shows stronger affinities to Kurds in the South (Iraq). It is nice to see how the Kurds, even while divided, are still somehow similar.

  17. I would bet that when some Anatolian and Syrian Kurds get tested, we will see a continuum from Iranian Kurdistan to Anatolian Kurds. The border seems not to have affected the Kurds that much. As an Anatolian Kurd, I am thinking about getting autosomally tested.

  18. @Daro, compared to Bedouins, Kurds and Iranians are much closer to each other. But compared with other West Asians like Turks, Georgians, and Armenians, it becomes clearer that Kurds are distinctive. I also mentioned that the tested Kurds are all from South Kurdistan (Iraq), and the Kurd from Dodecad is an Iranian Kurd from East Kurdistan. North Iraq is situated between Anatolia and Iran, so this is no surprise to me. I suspect that Anatolian and Syrian Kurds might form a continuum from the one Dodecad Kurd, over Georgians, to Armenians and Turks; that is, somewhere between those, much like their geographic situation.

  19. agit123, don't double- or triple-post. I let it slide because you are new here.

