The great thing about researchers putting their data online, like Henn et al. (2011) did, is that they can expect anyone with a computer, a bit of knowledge, and a bit of time, to study it, analyze it, play with it, and perhaps add a little value of their own.
As soon as I realized that there were 30 populations and 587 individuals in this dataset, most of them previously unsampled Africans, I had to get my hands on them and try my Galore approach. This can be summarized as dimensionality reduction via PCA/MDS, followed by MCLUST for unsupervised clustering of unlabeled individuals, with no a priori setting of the number of clusters K. (If you want to try it, instructions are here.)
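For readers who want to see what the dimensionality-reduction step amounts to, here is a minimal sketch of classical (metric) MDS in Python with NumPy. It assumes you already have a symmetric matrix of pairwise distances between individuals; the function name `classical_mds` and its parameters are illustrative only, and this is not necessarily the implementation used for the analysis in this post.

```python
import numpy as np


def classical_mds(dist, n_dims):
    """Classical MDS: embed n points in n_dims dimensions.

    dist   : (n, n) symmetric matrix of pairwise distances (assumed given)
    n_dims : number of dimensions to retain
    """
    d2 = np.asarray(dist, dtype=float) ** 2
    n = d2.shape[0]
    # Double-center the squared distances to get a Gram (inner product) matrix.
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ d2 @ centering
    # eigh returns eigenvalues in ascending order; keep the largest n_dims.
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:n_dims]
    # Clip tiny negative eigenvalues caused by rounding or non-Euclidean input.
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale
```

Each row of the returned array holds the coordinates of one individual, and it is these coordinates that would then be passed to the clustering step.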
As I have explained before, my favorite way of using the Galore method is by iterating over the number of retained MDS dimensions, seeing the optimal K chosen by MCLUST based on the Bayesian Information Criterion, and reporting the results for the number of dimensions which produces the highest K. Considering only the first 20 dimensions, the highest K was 42 clusters, obtained with 15 retained MDS dimensions.
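The selection rule just described can be sketched in a few lines of Python. This is only an outline of the outer loop: the clustering itself is left as a pluggable function, here called `optimal_k` (a hypothetical name), which is assumed to return the BIC-optimal K for a given number of retained dimensions. The numbers in the usage example are made up for illustration and are not results from this dataset.

```python
from typing import Callable, Iterable, Tuple


def pick_dimensions(dims: Iterable[int],
                    optimal_k: Callable[[int], int]) -> Tuple[int, int]:
    """Return (n_dims, K) for the number of retained dimensions
    that yields the highest BIC-optimal K.

    optimal_k is assumed to run the clustering on the first n_dims
    dimensions and return the chosen number of clusters.
    """
    best_dims, best_k = None, -1
    for n_dims in dims:
        k = optimal_k(n_dims)
        # Strict '>' means ties go to the smaller number of dimensions.
        if k > best_k:
            best_dims, best_k = n_dims, k
    if best_dims is None:
        raise ValueError("no dimensions were supplied")
    return best_dims, best_k


# Toy usage with invented K values (not from the post):
toy_k = {1: 2, 2: 4, 3: 7, 4: 5}
print(pick_dimensions(range(1, 5), toy_k.__getitem__))
```

In a real run, `optimal_k` would wrap the MDS and MCLUST steps, and `dims` would be `range(1, 21)` to cover the first 20 dimensions.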
I have placed a RAR archive of scatterplots of the first 20 dimensions here. Below you can see the first 2 dimensions, which show a triangle with vertices anchored on Tuscans, San, and the bulk of Sub-Saharan Africans.
Here are the results of the Galore analysis, showing the number of individuals from each population assigned to each cluster.
I would say that the Galore approach had remarkable success in grouping unlabeled individuals into very meaningful clusters:
- Some populations got their own exclusive clusters (e.g., Mandenka, Tuscans, and Mada)
- A few clusters included individuals from related populations, e.g., #12 from two different groups of San, or #26-32 of various types of North and Saharan Africans
- Some populations were split across different clusters; I think it is instructive to see which ones were: the quite diverse San, Hadza, and Sandawe, and also the quite heterogeneous North Africans. In the latter case, Arab, Berber, and Sub-Saharan ancestry probably co-exist in varying proportions within individuals.
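A table of individuals per population per cluster, like the one summarized above, is just a cross-tabulation of two label lists. Here is a minimal sketch, assuming you have each individual's population label and assigned cluster in matching order. The function names are hypothetical, and "exclusive" is taken here to mean a cluster whose members all come from a single population.

```python
from collections import Counter, defaultdict


def crosstab(populations, clusters):
    """Count individuals per (population, cluster) pair.

    populations, clusters : equal-length sequences, one entry per individual.
    Returns {population: {cluster: count}}.
    """
    if len(populations) != len(clusters):
        raise ValueError("need one cluster label per individual")
    table = defaultdict(Counter)
    for pop, cluster in zip(populations, clusters):
        table[pop][cluster] += 1
    return {pop: dict(counts) for pop, counts in table.items()}


def exclusive_clusters(table):
    """Map each cluster drawn from a single population to that population."""
    owners = defaultdict(set)
    for pop, counts in table.items():
        for cluster in counts:
            owners[cluster].add(pop)
    return {c: next(iter(pops)) for c, pops in owners.items() if len(pops) == 1}
```

Populations split across clusters are then simply the rows of the table with more than one entry.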