December 14, 2011

Clusters Galore analysis of West Eurasians

It's been a while since the last Clusters Galore analysis, so I've decided to use my recently assembled dataset to run one on the individuals who belong to the Six main West Eurasian components.

First, I identified the 945 individuals in my dataset whose combined admixture proportions in the Six exceeded 95%. I then ran MDS on this set, keeping 50 dimensions.
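The two steps above (threshold on the Six, then MDS) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code used for this post: it assumes the ADMIXTURE proportions are in a NumPy array `q` (individuals × components), that the column indices of the Six are known, and that the MDS is classical (Torgerson) MDS on a precomputed pairwise distance matrix (as in R's `cmdscale`). The post does not say which distance measure or MDS implementation was used, and the function names are hypothetical.

```python
import numpy as np


def select_west_eurasians(q, six_cols, threshold=0.95):
    """Boolean mask of individuals whose admixture proportions in the Six
    components (columns `six_cols` of `q`) sum to more than `threshold`."""
    return q[:, six_cols].sum(axis=1) > threshold


def classical_mds(dist, n_dims):
    """Classical (Torgerson) MDS of an n x n distance matrix.

    Returns an (n, n_dims) coordinate array whose columns are ordered by
    decreasing eigenvalue, i.e. dimension 1 carries the most variance.
    """
    n = dist.shape[0]
    # Double-centre the squared distances: B = -1/2 * J D^2 J, J = I - 11'/n.
    j = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * j @ (dist ** 2) @ j
    # eigh returns eigenvalues in ascending order, so reverse the order.
    eigvals, eigvecs = np.linalg.eigh(b)
    top = np.argsort(eigvals)[::-1][:n_dims]
    # Clip small negative eigenvalues (non-Euclidean distances) to zero.
    return eigvecs[:, top] * np.sqrt(np.clip(eigvals[top], 0.0, None))
```

With a hypothetical full distance matrix `dist`, `keep = select_west_eurasians(q, six_cols)` followed by `classical_mds(dist[np.ix_(keep, keep)], 50)` would mirror the selection and the 50 retained dimensions. Note the strict `>`: an individual at exactly 95% would be excluded.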

One of the open issues in Clusters Galore analysis is how many MDS dimensions to retain. So far, I have used a heuristic: choose the number of MDS dimensions that maximizes the number of clusters inferred by MCLUST. However, when I actually inspect the MDS plots, meaningful information often seems to be present in even higher MDS dimensions. I have therefore decided to pick the number of dimensions as follows.

The main idea is that, along an uninformative MDS dimension, the data points will look largely like Gaussian noise. We can therefore use a test of normality (I chose the Shapiro-Wilk test) to detect dimensions that do not appear to be noise. Below is the p-value of this test for different MDS dimensions:
Up to 22 dimensions, there is a strong non-Gaussian signal (all p-values are below 0.001). I therefore used the first 22 dimensions in the MCLUST analysis, which estimated the number of clusters as 35. This is roughly a 6-fold increase in resolution (35/6 ≈ 5.8) over the Six components inferred by ADMIXTURE.
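A minimal sketch of the two steps in this paragraph, under stated assumptions: the MDS coordinates are the columns of a NumPy array (945 × 50 here), normality is tested with SciPy's `scipy.stats.shapiro`, and, since MCLUST is an R package, scikit-learn's `GaussianMixture` scored by BIC is swapped in as a stand-in for it. The function names and the synthetic demo data are hypothetical; this is an illustration of the rule, not the code behind the results above.

```python
import numpy as np
from scipy.stats import norm, shapiro
from sklearn.mixture import GaussianMixture


def leading_non_gaussian_dims(coords, alpha=1e-3):
    """Count the leading MDS dimensions whose Shapiro-Wilk p-value is below alpha.

    Returns the count and the per-dimension p-values; the count stops at the
    first dimension that looks Gaussian (p >= alpha).
    """
    pvals = np.array([shapiro(coords[:, j])[1] for j in range(coords.shape[1])])
    n_keep = 0
    for p in pvals:
        if p >= alpha:
            break
        n_keep += 1
    return n_keep, pvals


def n_clusters_by_bic(x, max_k=10, seed=0):
    """Number of Gaussian mixture components with the best (lowest) BIC.

    Stand-in for MCLUST: only full-covariance models are tried here, whereas
    MCLUST also searches over several covariance parameterizations.
    """
    bics = [
        GaussianMixture(n_components=k, covariance_type="full", random_state=seed)
        .fit(x)
        .bic(x)
        for k in range(1, max_k + 1)
    ]
    return int(np.argmin(bics)) + 1


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 300
    # Synthetic "MDS" output: two structured dimensions (three well-separated
    # groups) followed by two dimensions of ideal Gaussian noise.
    centres = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
    signal = np.vstack([c + rng.normal(size=(n // 3, 2)) for c in centres])
    noise = norm.ppf((np.arange(n) + 0.5) / n)
    coords = np.column_stack([signal, rng.permutation(noise), rng.permutation(noise)])

    n_keep, pvals = leading_non_gaussian_dims(coords)
    print(n_keep, n_clusters_by_bic(coords[:, :n_keep], max_k=6))
```

One design point to keep in mind: scikit-learn's BIC is lower-is-better, whereas MCLUST reports BIC with the opposite sign, so there the best model is the one with the highest value.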

The cluster totals for the different populations can be seen in the spreadsheet.
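Such a spreadsheet is, in effect, a population-by-cluster table of counts. Here is a minimal sketch of how the totals could be tallied, assuming one population label and one cluster number per included individual (hypothetical inputs, not the actual files behind the spreadsheet):

```python
from collections import Counter, defaultdict


def cluster_totals(populations, clusters):
    """Count how many individuals of each population fall in each cluster.

    `populations` and `clusters` are parallel sequences: the population label
    and the assigned cluster number of each individual.
    """
    totals = defaultdict(Counter)
    for pop, cl in zip(populations, clusters):
        totals[pop][cl] += 1
    # Plain dicts make the result easy to print or write out as rows.
    return {pop: dict(counts) for pop, counts in totals.items()}
```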

Important Caveat: Some populations (e.g., Finnish_D or Turkish_D) have many individuals who do not meet the "95% in the Six" inclusion threshold. Results for these populations are therefore not representative; they only indicate the cluster assignments of the subsets that do meet the threshold. You can check whether individuals have been removed from the original dataset by comparing the sample sizes in the Clusters Galore spreadsheet with those in the K12a one.
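The sample-size comparison described in this caveat is easy to automate. A sketch, assuming the per-population sample sizes from the two spreadsheets have been read into dictionaries keyed by population label (the dictionaries and the function name are hypothetical):

```python
def removed_counts(k12a_sizes, galore_sizes):
    """Populations that lost individuals to the inclusion threshold.

    Maps each such population to the number of individuals removed; a
    population missing from the Clusters Galore table lost all of its members.
    """
    removed = {}
    for pop, n in k12a_sizes.items():
        dropped = n - galore_sizes.get(pop, 0)
        if dropped > 0:
            removed[pop] = dropped
    return removed
```

For example, `removed_counts({"PopA": 10, "PopB": 5}, {"PopA": 3, "PopB": 5})` flags only the hypothetical `PopA`, with 7 individuals removed.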

Here are some observations on the 35 clusters. For each one, I list the modal population (or region):
  1. Ashkenazi
  2. Scandinavian
  3. French
  4. British Isles
  5. Armenian
  6. S Italian/Sicilian
  7. Kurd
  8. Greek
  9. Cypriot
  10. Balto-Slavic
  11. Hungarian
  12. Balkan
  13. Sephardic
  14. Spanish
  15. Iberian
  16. North Italian/Tuscan
  17. Morocco Jews (main)
  18. Saudis
  19. Georgian/Abkhazian
  20. Basque
  21. Bedouin
  22. Druze #1
  23. Druze #2
  24. Druze (main)
  25. Mozabite (main)
  26. Mozabite #1
  27. Orkney
  28. Sardinian
  29. Azerbaijan Jews
  30. Iran/Iraq Jews
  31. Lezgins
  32. Morocco Jews #1
  33. Samaritan
  34. Yemen Jews
  35. Abkhazian

26 comments:

Onur said...

How realistic is it to include more than 5% Negroid-admixed populations and individuals in this analysis (Mozabites are the most obvious example) if analyzing exclusively people more than 95% Caucasoid is your aim? It is clear, for example, that the Northwest African component has a significant Negroid element, so it is misleading to treat it as if it is a pure Caucasoid component.

Kurti said...

May I ask why you only used two of the Iranian and Iranian_D samples in your Clusters Galore analysis?

Dienekes said...

>> May I ask why you only used two of the Iranian and Iranian_D samples in your Clusters Galore analysis?

Read the post re: inclusion criteria.

>> How realistic is it to include more than 5% Negroid-admixed populations and individuals in this analysis (Mozabites are the most obvious example) if analyzing exclusively people more than 95% Caucasoid is your aim?

The aim is not to analyze people who are more than 95% Caucasoid. If that was the aim, then the inclusion criteria would be based on admixture proportions at a lower K where a component that could be labeled Caucasoid had emerged.

bau said...

nice!
I was wondering what Greek_Italian means.

Does D mean Dodecad project?

Thank you for your work!

Acid said...

And don't exaggerate: the Negroid element in the Northwest African component is not that significant. I recommend looking at pictures of real ethnic Berbers; they are completely Caucasoid in appearance. Even South Asians (who are not among the so-called Six) are largely Caucasoid, and the resemblances are easy to notice on quick inspection.

Also, if you consider, for example, the Gedrosia component to be "fully Caucasoid", you'll still be wrong. The same goes for the Caucasus component. The Fst distances show that both have extra affinities with other components, which basically implies that both carry a substantial amount of such elements, like it or not.

Dienekes said...

>> I was wondering what Greek_Italian means.

>> Does D mean Dodecad project?

Greek_Italian are a couple of individuals that are part Greek and part Italian, and _D means Dodecad project.

Kurti said...

>>Read the post re: inclusion criteria.

Does this mean that only two of the Iranian samples fit the criteria, and that the other ones form their own cluster?

And another question: do you believe that Iranian gene flow might have started from West Asia, more precisely from Kurdistan, and spread all the way into Central Asia? I mean, the first Iranic traces are found in East Anatolia and northern Mesopotamia (Mitanni).

Onur said...

>> And don't exaggerate: the Negroid element in the Northwest African component is not that significant. I recommend looking at pictures of real ethnic Berbers; they are completely Caucasoid in appearance.

Acid, you are overgeneralizing about Berbers, and your inference is certainly wrong. Physical appearance among Berbers (which is not an ethnicity but an umbrella term for all ethnic groups speaking Berber languages) varies greatly, from completely Caucasoid to more Negroid than Caucasoid. Also, we are not dealing with Berbers as a whole but only with the Mozabite ethnic group. There are certainly Mozabites with significant Negroid admixture:

http://www.encore-editions.com/types-alg%C3%A9riens-mozabite-nd

In Dienekes' last ADMIXTURE analysis, the HGDP Mozabites average 82.6% in the "Northwest African" component, 6.9% in the "West African" component and 1.4% in the "East African" component. The same Mozabites average ~25% in the "African" component (a completely Negroid component) at K=8 of Zack's Reference 3 ADMIXTURE analysis. That means the Mozabites' "Northwest African" component captures most of their Negroid element. The Negroid element of the "Northwest African" component is also obvious from its Fst distances to the Sub-Saharan African components, compared with those of the other components of the Six.

>> Also, if you consider, for example, the Gedrosia component to be "fully Caucasoid", you'll still be wrong. The same goes for the Caucasus component. The Fst distances show that both have extra affinities with other components, which basically implies that both carry a substantial amount of such elements, like it or not.

The other components of the Six may also have affinities, however small, with non-Caucasoid elements; I am open to that possibility. The problem is that the exact boundary (assuming such a thing exists) between Caucasoid and non-Caucasoid is unclear, owing to uncertainties and to varying levels of admixture in the distant past of the extant human races.

>> The aim is not to analyze people who are more than 95% Caucasoid. If that was the aim, then the inclusion criteria would be based on admixture proportions at a lower K where a component that could be labeled Caucasoid had emerged.

Dieneke, what is the best K in your last ADMIXTURE analysis for calculating the amount of the Caucasoid element in a population or individual?

Dienekes said...

>> Does this mean that only two of the Iranian samples fit the criteria, and that the other ones form their own cluster?

The other ones are not considered in this analysis. They could very well belong to the same cluster as the ones included; e.g., one might have 95.1% in the Six components and another 94.9%. The inclusion criteria are just a way of identifying a subset of West Eurasian individuals, nothing more.

Dienekes said...

>> And another question: do you believe that Iranian gene flow might have started from West Asia, more precisely from Kurdistan, and spread all the way into Central Asia? I mean, the first Iranic traces are found in East Anatolia and northern Mesopotamia (Mitanni).

As I've mentioned somewhere in my other blog, I think that Indo-Iranians broke up in the territory of the BMAC. The Indo-European element in the BMAC, however, came from West Asia.

Whether it had already differentiated into something like proto-Indo-Iranian, or whether that happened in the BMAC itself, is impossible to determine at present. I am inclined towards the latter view.

Kurti said...

Thanks for your answer Dienekes.

The Kurdish sample is biased toward the east and south, because more participants come from Iran and Iraq. Out of 8 samples, only one is fully from Anatolia and another is about half Anatolian. Take DOD 834, who is a Kurd from Turkey.

Do you believe that he would fall into the Kurd cluster or a different one? In my opinion, his admixture results show a Kurdish profile, just with more of the "Mediterranean" component, tending more towards the West and placing him in East Anatolia.

Acid said...

@Onur

I wasn't referring to the Berbers as a whole; I know some are only culturally Berber and quite Negroid-admixed. For this reason I used the term "ethnic", to make the distinction. And of course, among the Mozabites, you can find several individuals scoring almost 100% Northwest African if you check the v3 population portraits. I don't expect those to be much different in the present analysis, and I'm not denying some Negroid element in it, but those who get results like this should at least look very similar to most Kabyle Berbers (no Negroid traits).

As for the mostly Caucasoid components, it's not a possibility, it's a clear fact. If you want to think that only the Northwest African component includes the mentioned elements, well, you are free to do so, but Caucasus, Gedrosia, SW Asian, etc., are no different in this regard. They are just slightly closer to Europe, nothing else. Otherwise they are the same, consistent with their Fst distances. So they are not fully Caucasoid either.

idurar said...

Onur, you are confused. There are no 'negroid' or 'caucasoid' components.
phenotype ≠ components from ADMIXTURE

And Berbers as a whole are rather homogeneous (once you remove the few individuals with recent SSA admixture), and are at least 20% African (non-Eurasian), Kabyles included. Being partly African doesn't mean being less 'caucasoid' than a 100% Eurasian population.

Here, the issue is that the Northwest African component, centered on the Mozabites, is partly African. The interesting thing is that despite the African shift, it is as distant from the South Asian component as the Mediterranean and the Southwest Asian ones are, for instance.

The Clusters Galore exercise doesn't work for North_African_D because all of them have less than 95% in the Six, since they aren't as inbred as the Mozabites.

Andrew Oh-Willeke said...

Seven groups of Jews in 35 clusters! And this for a population that makes up a tiny percentage of the overall population of the region. I guess a diaspora population that admixes with a given local population and then maintains internal stability naturally forms a cluster.

One would expect a very similar pattern for the Roma (i.e., the historic-era diaspora of South Asians), but perhaps they aren't in the dataset, or they have remained a pan-European intermixing population and so didn't coalesce into distinct subclusters to the same extent.

The multiple Druze clusters are even more interesting because the Druze aren't nearly as geographically diverse, although this is not too surprising, because the Druze are genetically unique in many other respects (e.g., they are the only population with significant percentages of mtDNA hg X1, X2 and X*, strongly suggesting an affinity with the source population of mtDNA hg X) and are known to have a fair amount of community-specific substructure.

It is also notable that many of the other clusters correspond to small outlier populations (e.g., Orkney and Sardinian) isolated by water, mountains or strong ethnic boundaries. You could probably fit something like 97% of the population of the E.U. into just ten of the 35 clusters, even at this extremely high level of detail; at this level of detail you'd probably get more than a hundred clusters for the whole world.

To the extent that genetic similarity is a valid basis for political nationalism (admittedly a very touchy subject), this analysis does seem to suggest that the Basque, Northern vs. Southern Italian, and Kurdish ethnicities really have some objective reality that goes beyond hotheads trying to define group membership whether or not such groups exist, while undermining the argument for the Balkanization of the Balkans.

Onur said...

>> I wasn't referring to the Berbers as a whole; I know some are only culturally Berber and quite Negroid-admixed. For this reason I used the term "ethnic", to make the distinction.

Your personal opinion about who is a real Berber and who is a fake one has no value.

>> I don't expect those to be much different in the present analysis, and I'm not denying some Negroid element in it, but those who get results like this should at least look very similar to most Kabyle Berbers (no Negroid traits).

As is clear from Zack's and many others' ADMIXTURE analyses, the HGDP Mozabites are about 25% Negroid on average. Kabyle people are irrelevant to our discussion; we are talking about Mozabites.

>> As for the mostly Caucasoid components, it's not a possibility, it's a clear fact. If you want to think that only the Northwest African component includes the mentioned elements, well, you are free to do so, but Caucasus, Gedrosia, SW Asian, etc., are no different in this regard. They are just slightly closer to Europe, nothing else. Otherwise they are the same, consistent with their Fst distances. So they are not fully Caucasoid either.

As Dienekes proposed, the total amount of the Caucasoid element in an individual or population is clearer at lower Ks, so only by comparing with the lower Ks can we arrive at a safe estimate of how much non-Caucasoid element a particular component at a higher K contains. But the non-Caucasoid element in the "Northwest African" component (its existence, not its exact amount) is obvious even without looking at the lower Ks of this analysis.

>> Onur, you are confused. There are no 'negroid' or 'caucasoid' components.
>> phenotype ≠ components from ADMIXTURE

There is certainly a correlation between ADMIXTURE components and racial phenotypes to a degree, no one can deny that.

>> And Berbers as a whole are rather homogeneous (once you remove the few individuals with recent SSA admixture), and are at least 20% African (non-Eurasian), Kabyles included. Being partly African doesn't mean being less 'caucasoid' than a 100% Eurasian population.

The "African" component of K=8 of Zack's Reference 3 ADMIXTURE analysis totally or almost totally represents the Negroid genetic element and is thus non-Caucasoid. BTW, how do you know how homogeneous Berbers are?

>> The Clusters Galore exercise doesn't work for North_African_D because all of them have less than 95% in the Six, since they aren't as inbred as the Mozabites.

The "Northwest African" component of Dienekes' last analysis is largely the result of the inbredness of at least some of the Mozabite samples. A more detailed and Mozabite-focused analysis can disentangle the Negroid element of the HGDP Mozabites even at higher Ks.

Acid said...

You are confused; it's not a matter of what I consider. Download the Mozabite population portraits from the v3 run, and you'll see some of them near 100% Northwest African. Forget about the most admixed individuals and the averages; those I mentioned are a very good genetic approximation. Kabyles are just an example of roughly how they could look.

The non-Caucasoid element is also obvious in the rest of the components; you seem to be the only one who does not see it. Also, it depends on the percentages scored: 2.5% Northwest African contains much less non-Caucasoid element than 25% Caucasus, and we could go on and on with approximations like this. It doesn't matter whether the example is exact or not; just notice that, according to the Fst distances, only the Mediterranean seems to be a fully Caucasoid component, and all the rest start deviating from it. It's not possible to calculate the exact affinities with these analyses, OK, but that doesn't mean we cannot know about their existence.

Well, given all this, I don't see the point of claiming such things about the Northwest African component. The component is mostly Eurasian in terms of affinities; there is nothing more to argue.

idurar said...

>> There is certainly a correlation between ADMIXTURE components and racial phenotypes to a degree, no one can deny that.

Not here, especially since most Berbers wouldn't be out of place phenotypically in the Middle East, and some even in Europe. So much for a 'quadroon' population.
It's absurd to associate non-Eurasian (African) ancestry with the word 'negroid'.

>> BTW, how do you know how homogeneous Berbers are?

Just take the Moroccans, Mozabites and North_African_D (Moroccans, Algerians, Tunisians), and remove the 3 Moroccan outliers and the Mozabite outliers.
In past analyses, Dienekes analysed Sahrawis, southern Moroccans, northern Moroccans, Algerians and Tunisians: once the outliers were removed, the group wasn't very heterogeneous.

>> The "Northwest African" component of Dienekes' last analysis is largely the result of the inbredness of at least some of the Mozabite samples.

I already know that. I asked Dienekes to do something about it and, as he said, he will 'defer' the details about the dataset to v4. I also asked David (Eurogenes) to do something, and he kept only 5 Mozabites (the most outbred) for his analyses.

anthrospain said...

Dienek, will you show individual results for this Galore?

Kurti said...

Just my opinion.
I think the term "South European" would be much better than "Mediterranean", because this component in K12a obviously peaks in Southern Europe and is strong in Central and North(west) Europe, just as North European is also strong in Southern Europe. Outside of Europe it hardly goes over 15%, which can be explained by general influence.

Andrew Oh-Willeke said...

One other point. Using Gaussian noise as the basis for a non-subjective test that establishes how many clusters to use is really methodological gold. One of the fundamental issues of cluster analysis is operationally defining the line between meaningful clusters and meaningless divisions that amount to noise. This method is a very convincing way to draw that line.

Onur said...

The subject of my discussion is Mozabites, not any other Berber ethnic group, so please do not distort the discussion by mentioning them. The HGDP Mozabites are about 25% Negroid (=Black), not 25% African, and African is not a racial term but a geographical term; otherwise we would have to consider North Africans and Sub-Saharans to be of the same race.

The "inbredness" of some Mozabites does not mean that the "outbred" Mozabites have interbred with non-Mozabites. "Inbred" Mozabites may just be a small subset of the total Mozabite population.

As I explained, the boundary between Caucasoid and non-Caucasoid is not clear, and this is especially so in the transition zone between West Asia and South Asia, as South Asians invariably have West Asian ancestry while the opposite is not the case. There is also the problem of isolated extreme populations. Are Basques and Sardinians the most Caucasoid or do they just seem so because of their strong isolation in the extremities of Europe?

dalouh said...

@ Onur
"Also, we are not dealing with Berbers as a whole but only the Mozabite ethnic group. There are certainly significantly Negroid-admixed Mozabites:

http://www.encore-editions.com/types-alg%C3%A9riens-mozabite-nd"

Please take a good look at these Mozabite individuals:

http://farm6.static.flickr.com/5066/5587265156_1857ba1fdb.jpg

http://3.bp.blogspot.com/_odjI7dH0ZFA/TEMe-FetLyI/AAAAAAAAAN8/_WSX951ldhs/s1600/DSC08702.JPG

https://lh5.googleusercontent.com/_g4O6pdLOYZ4/S2n_OZ2mK0I/AAAAAAAAA-Y/lVqJnKFJNkU/s640/ghardaia%20oasis%20Mohamed%20flowers.jpg

https://lh3.googleusercontent.com/_y9YO5PqnQ70/TDV3PIM91WI/AAAAAAAALok/_9lSh3pB_Kk/s720/20100703MV_61.JPG

6a2dab90-27b3-11e1-adc0-000bcdcb471e said...

The idea that the Med component is the only West Eurasian one is ridiculous. Others might be close to non-West Eurasian components due to later separation, but they are also fully West Eurasian. Saying Med is the only one reflects your agenda.

Dienekes said...

>> Dienek, will you show individual results for this Galore?

No, all new results will be posted at the Dodecad blog.

Onur said...

Dalouh, what is your point? I gave the Negroid element percentage of the HGDP Mozabites. I do not think Mozabites' physical appearance (I have seen many others on the internet) is incompatible with these genetic data, as they are still about 75% Caucasoid according to them.

anthrospain said...

It's interesting: for the first time we see two separate clusters for Iberia. We now have one that includes some Spaniards and some French, which looks like a southern French/Spanish cluster.