Now, I am revisiting the issue by using MCLUST on the entirety of Howells' dataset, including both the "training" and "testing" datasets. Moreover, in the interest of transparency, I've placed all the necessary code to repeat the analysis online. So, feel free to repeat or improve my experiment, or indeed shred it to pieces, if you like (see Appendix).
The most interesting part of this analysis for me is the inclusion of several Upper Paleolithic skulls in Howells' testing dataset, and I will show how MCLUST assigns them to very meaningful clusters.
MCLUST on the training data (2,524 skulls with 57 measurements/skull)
Different types of output are included in the rar file; here I will only show the number of skulls from each population assigned to each of the 15 clusters inferred by MCLUST. Note that this is one more cluster than in my previous analysis, because I have included 20 extra Maori skulls.
The 15 clusters might be labeled:
1: Caucasoid, 2: Amerindian, 3: East Asian, 4: Mokapu, 5: Easter Island, 6: Tasmania, 7: Australoid, 8: Ainu, 9: Santa Cruz, 10: Bushman, 11: Andaman, 12: Buriat, 13: Negroid, 14: Moriori/Maori, 15: Eskimo
Note that skulls falling in clusters other than the expected one reflect, in part, the limitations of the method and, in part, genuine outliers, some of which were detected by Howells himself.
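To make the population-by-cluster counts concrete, here is a minimal sketch in base R of how such a table can be built and how skulls outside their population's most common cluster can be flagged. The population names are real Howells samples, but the cluster assignments are invented for illustration, and the variable names (population, cluster, freq, off_mode) are my own assumptions, not necessarily what code.r uses.

```r
# Toy stand-in data: the cluster numbers below are invented for illustration
# and are NOT the real MCLUST output.
population <- c("Norse", "Norse", "Norse", "Zulu", "Zulu",
                "Buriat", "Buriat", "Buriat")
cluster    <- c(1, 1, 1, 13, 13, 12, 12, 3)

# Cross-tabulate: rows are populations, columns are cluster numbers,
# cells are the number of skulls.
freq <- table(population, cluster)
print(freq)

# Most common (modal) cluster for each population.
modal <- colnames(freq)[apply(freq, 1, which.max)]
names(modal) <- rownames(freq)

# Flag skulls assigned to a cluster other than their population's modal one.
off_mode <- cluster != as.integer(modal[population])
print(data.frame(population, cluster, off_mode))
```

With real output, the flagged rows would be the candidates to check against the outliers Howells himself noted.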
MCLUST on the training+test data (3,048 skulls with 57 measurements/skull)
Now, let's add the 524 test skulls and repeat the MCLUST analysis with all 3,048 skulls. I recommend looking at the howells.test.txt file in the bundle you can download, because it contains extra information about each skull.
As there are over 189 different populations and individual skulls, I am showing here only the first part of this table, on the training populations. All the data can be found in the download bundle in the frequency_all.csv file.
The 14 clusters inferred in this analysis can be labeled:
1: Linear Caucasoid, 2: Negroid, 3: Amerindian, 4: Moriori/Maori, 5: Santa Cruz, 6: Neandertal, 7: Lateral Caucasoid, 8: Mokapu/Easter Island, 9: Bushman, 10: Australoid, 11: East Asian, 12: Buriat, 13: Andaman, 14: Eskimo
The test data contains various populations as well as many individual skulls. You can look at them in detail in the download bundle, but, here, I will focus on some "famous" skulls.
First of all, notice that cluster #6 is a Neandertal cluster. It includes La Ferrassie I, La Chapelle, Skhul V, Shanidar 1, and Djebel Irhoud 1. Of course I am aware that there are controversies about some of these skulls, but MCLUST doesn't seem to have any doubts: they are all placed in cluster #6 with 100% probability, and no other skulls have any probability of belonging to this cluster.
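For readers who want to check a statement like this themselves, here is a minimal sketch in base R. In mclust, a fitted model carries a matrix of posterior membership probabilities (the z component), with one row per observation and one column per cluster. The matrix below is a hypothetical stand-in with invented numbers, and the names skull_a, c1, tol and so on are assumptions made for illustration.

```r
# Hypothetical posterior matrix: rows are skulls, columns are clusters.
# The values are invented; with a real fit this would be the z component.
z <- rbind(
  skull_a = c(1.00, 0.00, 0.00),
  skull_b = c(0.00, 0.93, 0.07),
  skull_c = c(0.00, 0.40, 0.60)
)
colnames(z) <- c("c1", "c2", "c3")

# Hard assignment: the cluster with the highest posterior probability.
assigned <- colnames(z)[max.col(z, ties.method = "first")]
names(assigned) <- rownames(z)

# Uncertainty of each assignment: 1 minus the winning probability.
uncertainty <- 1 - apply(z, 1, max)

# Check whether cluster k is cleanly separated: its members belong to it
# with probability ~1 and every other skull has probability ~0 of
# belonging to it (a small tolerance allows for rounding).
k <- "c1"
tol <- 1e-8
members <- assigned == k
separated <- all(z[members, k] > 1 - tol) && all(z[!members, k] < tol)
print(separated)
```

In this toy example cluster c1 passes the check, while skull_c shows what a doubtful assignment looks like: it lands in c3 with an uncertainty of 0.4.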
Chancelade, which was described as Eskimoid in the early literature, is assigned to the Linear Caucasoid cluster, and so are Predmost III and IV, Mladec 1, and Abri Pataud. The inclusion of Mladec 1, the earliest complete European (>30ky), in the main Caucasoid cluster undermines the idea that Caucasoid morphology developed in the Holocene, or more recent ideas that Eurasians were supposedly undifferentiated as recently as 18,000 years ago.
Cro-Magnon 1 is assigned to the Lateral Caucasoid cluster, and so is Afalou-bou-Rhummel 5. It is interesting that the Lateral Caucasoid cluster is centered on the population of Berg, described by Howells as "Alpine" in the classical sense; Coon conjectured that Alpines were a foetalized evolutionary development of the Upper Paleolithic population. Cro-Magnon 1 is long-skulled but broad-faced, yet its overall suite of measurements places it squarely as a European.
Grimaldi, described by some as Negroid, is actually assigned to the Australoid cluster, and so are Markina Gora, Djebel Qafzeh 6, and Keilor.
There are many other individual skulls and populations in the data, so feel free to look at it yourself. Also, if you have any other data that has been measured in Howells' standard variables, I'll be happy to include them in an MCLUST analysis.
In order to run the experiment you need to follow these steps:
- Download and install R
- Launch R and in the menu Packages->Install package(s) choose to install the mclust package
- Load the mclust package via the Packages->Load package menu
- Download my code and extract it in the directory of your choice in your computer
- Change the directory in R via the File->Change dir menu
- Enter the command source("code.r") at the command prompt. This will take a while to run and will produce a series of output files. You may open code.r in a text editor to see exactly what it does and/or to modify it.
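If you prefer typing to menus, the steps above can be sketched as R console commands. The directory path is a placeholder that you must replace with wherever you extracted the bundle; everything else uses standard R functions.

```r
# Install mclust once (console equivalent of Packages -> Install package(s)).
install.packages("mclust")

# Load the package (equivalent of Packages -> Load package).
library(mclust)

# Point R at the folder where the bundle was extracted.
# "path/to/extracted/bundle" is a placeholder, not a real path.
setwd("path/to/extracted/bundle")

# Run the analysis script from the bundle.
source("code.r")
```

The install step is only needed the first time; in later sessions you can start from library(mclust).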
UPDATE (Dec 7): A new post has Mahalanobis distances between the 14 clusters inferred in the MCLUST analysis.