December 05, 2010

World craniometric analysis with MCLUST revisited

I used MCLUST to cluster Howells' world craniometric dataset back in 2004.

Now, I am revisiting the issue by using MCLUST on the entirety of Howells' dataset, including both the "training" and "testing" datasets. Moreover, in the interest of transparency, I've placed all the code needed to repeat the analysis online. So, feel free to repeat or improve my experiment, or indeed shred it to pieces, if you like (see Appendix).

The most interesting part of this analysis for me is the inclusion of several Upper Paleolithic skulls in Howells' testing dataset, and I will show how MCLUST assigns them to very meaningful clusters.

Let's start:

Part I
MCLUST on the training data (2,524 skulls with 57 measurements/skull)

Different types of output are included in the RAR file; here I will show only the number of skulls from each population assigned to each of the 15 clusters inferred by MCLUST. Note that this is one more cluster than in my previous analysis, because I have included 20 extra Maori skulls.
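For readers who want to see how such a population-by-cluster count table is put together, here is a minimal sketch in Python (not the R code from the bundle). The function name frequency_table and the toy labels are my own assumptions for illustration; they are not taken from Howells' data or from the actual results.

```python
from collections import Counter, defaultdict


def frequency_table(populations, clusters):
    """Count how many skulls of each population fall in each cluster."""
    if len(populations) != len(clusters):
        raise ValueError("one cluster label is needed per skull")
    table = defaultdict(Counter)
    for pop, cl in zip(populations, clusters):
        table[pop][cl] += 1
    return table


# Made-up toy labels, NOT taken from Howells' data or the post's results.
populations = ["A", "A", "A", "B", "B", "C"]
clusters = [1, 1, 2, 2, 2, 3]

table = frequency_table(populations, clusters)
all_clusters = sorted(set(clusters))

# Print one row per population and one column per cluster.
print("population", *all_clusters, sep="\t")
for pop in sorted(table):
    print(pop, *(table[pop][c] for c in all_clusters), sep="\t")
```

Each row is a population and each column a cluster, so a population that is well captured by the clustering shows most of its count in a single column.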


The 15 clusters might be labeled:

1: Caucasoid, 2: Amerindian, 3: East Asian, 4: Mokapu, 5: Easter Island, 6: Tasmania, 7: Australoid, 8: Ainu, 9: Santa Cruz, 10: Bushman, 11: Andaman, 12: Buriat, 13: Negroid, 14: Moriori/Maori, 15: Eskimo

Note that skulls falling in clusters other than the expected one are partly due to the limitations of the method and partly genuine outliers, some of which were detected by Howells himself.

Part II
MCLUST on the training+test data (3,048 skulls with 57 measurements/skull)

Now, let's add the 524 test skulls and repeat the MCLUST analysis with all 3,048 skulls. I recommend looking at the howells.test.txt file in the bundle you can download, because it contains extra information about each skull.



As there are over 189 different populations and individual skulls, I am showing here only the first part of this table, covering the training populations. All the data can be found in the frequency_all.csv file in the download bundle.

The 14 clusters inferred in this analysis can be labeled:

1: Linear Caucasoid, 2: Negroid, 3: Amerindian, 4: Moriori/Maori, 5: Santa Cruz, 6: Neandertal, 7: Lateral Caucasoid, 8: Mokapu/Easter Island, 9: Bushman, 10: Australoid, 11: East Asian, 12: Buriat, 13: Andaman, 14: Eskimo

The test data contains various populations as well as many individual skulls. You can look at them in detail in the download bundle, but, here, I will focus on some "famous" skulls.

First of all, notice that cluster #6 is a Neandertal cluster. It includes La Ferrassie I, La Chapelle, Skhul V, Shanidar 1, and Djebel Irhoud 1. Of course I am aware that there are controversies about some of these skulls, but MCLUST doesn't seem to have any doubts: they are all placed in cluster #6 with 100% probability, and no other skulls have any probability of belonging to this cluster.
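To make the "100% probability" statement concrete: MCLUST fits a Gaussian mixture model, and each skull gets a posterior probability of membership in every cluster. The sketch below, in Python rather than R, computes such posteriors for a one-dimensional, two-component mixture. The weights, means, and standard deviations are hypothetical numbers chosen for illustration, not fitted values from this analysis, which uses 57 measurements per skull.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a univariate normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posterior(x, weights, means, sds):
    """Posterior probability of each mixture component given x (Bayes' rule)."""
    joint = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
    total = sum(joint)
    return [j / total for j in joint]


# Hypothetical mixture: a large cluster and a small, well-separated one.
weights = [0.9, 0.1]
means = [0.0, 10.0]
sds = [1.0, 1.0]

for x in (0.0, 5.0, 10.0):
    probs = posterior(x, weights, means, sds)
    print(x, [round(p, 4) for p in probs])
```

When a cluster is far from all the others relative to its spread, the posterior for its members is so close to 1 that it rounds to 100%, and the posterior of every other observation for that cluster rounds to 0.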

Chancelade, which was described as Eskimoid in the early literature, is assigned to the Linear Caucasoid cluster, and so are Predmost III and IV, Mladec 1, and Abri Pataud. The inclusion of Mladec 1, the earliest complete European (>30ky), in the main Caucasoid cluster undermines the idea that Caucasoid morphology developed in the Holocene, as well as more recent suggestions that Eurasians were still undifferentiated as recently as 18,000 years ago.

Cro-Magnon 1 is assigned to the Lateral Caucasoid cluster, and so is Afalou-bou-Rhummel 5. It is interesting that the Lateral Caucasoid cluster is centered on the population of Berg, described by Howells as "Alpine" in the classical sense; Coon conjectured that Alpines were a foetalized evolutionary development of the Upper Paleolithic population. Cro-Magnon 1 is long-skulled but broad-faced, yet its overall suite of measurements places it squarely as a European.

Grimaldi, described by some as Negroid, is actually assigned to the Australoid cluster, and so are Markina Gora, Djebel Qafzeh 6, and Keilor.

There are many other individual skulls and populations in the data, so feel free to look at it yourself. Also, if you have any other data that has been measured in Howells' standard variables, I'll be happy to include them in an MCLUST analysis.

Appendix

In order to run the experiment you need to follow these steps:
  1. Download and install R
  2. Launch R and in the menu Packages->Install package(s) choose to install the mclust package
  3. Load the mclust package via the Packages->Load package menu
  4. Download my code and extract it in the directory of your choice in your computer
  5. Change the directory in R via the File->Change dir menu
  6. Enter the command source("code.r") in the command prompt. This will take a while to run and will produce a series of output files. You may open code.r in a text editor to see exactly what it does and/or to modify it.
I have bundled Howells' data in the RAR file, but you can just as well download it from the repository instead.

UPDATE (Dec 7): A new post has Mahalanobis distances between the 14 clusters inferred in the MCLUST analysis.

10 comments:

ashraf said...

If I am not mistaken the first table shows that African Egyptians' skulls are from the same type of the European norses(cluster 1)and different from the African Zulus and Bushmen skulls(cluster 13) although 4 Egyptians and 1 Norse cluster with the Africans!!!

Andrew Oh-Willeke said...

So, is it a correct reading of the post to say that the MCLUST program assigns Upper Paleolithic anatomically modern human skulls from Europe and the Near East to three different clusters?

Dienekes said...

If I am not mistaken the first table shows that African Egyptians' skulls are from the same type of the European norses(cluster 1)and different from the African Zulus and Bushmen skulls(cluster 13) although 4 Egyptians and 1 Norse cluster with the Africans!!!

Yes, with Zalavar too, but remember that this is a single Egyptian series and there is variability in the ancient Egyptian osteological material.

So, is it a correct reading of the post to say that the MCLUST program assigns Upper Paleolithic anatomically modern human skulls from Europe and the Near East to three different clusters?

Well I did not go through all 189 groups painstakingly, but, I'd say the skulls I've seen are classified as Linear Caucasoid, Lateral Caucasoid, Australoid, and Neandertal mostly.

The Australoid category probably can be attributed to the great robusticity of these early skulls, coupled with their linearity, as Australo-Melanesian skulls are very linear and robust.

So, while I'd say that not all Upper Paleolithic West Eurasians looked like recent Europeans overall, many of them did. The fact that so many did is incompatible with them still living as undifferentiated Eurasians as some researchers have suggested.

Also, note that I see absolutely no evidence that the Upper Paleolithic skulls form a unit. For example, the Upper Cave skull has Polynesioid affinities (I believe this was noted by Howells too, although I don't remember clearly).

C Bard said...

I read that Howells would often transform the measurements into "C-scores" to minimize the effects of size differences among the skulls. Have you tried this (or any other size correction method), and if so, were your results affected in any significant way?

Dienekes said...

C-scores are a cludge, and, yes, I've tried them back in 2004 but they are not needed for an algorithm like MCLUST that can adapt to clusters of different size/shape/orientation.

Even Z-scores are not really needed. If you repeat the experiment without Z-scores you'll still get pretty much the same clusters, but, if I remember correctly there are 1-2 sex-specific clusters that also appear. I only use Z-scores to normalize male and female skulls.

aargiedude said...

This is really outstanding. What would be nice now is to be able to somehow measure the relative distances of these clusters from each other.

Ant said...

It would be interesting to find out into which cluster the Hofmeyr skull (South Africa 30000 BC) fits.

aargiedude said...

It would be interesting to find out into which cluster the Hofmeyr skull (South Africa 30000 BC) fits.

It would indeed. Surely there are all sorts of detailed measurements done on that skull, if they were publicly available, Dienekes could add this skull on his own to the collection. [hint :) ]

astenb said...

Are there samples in the database? Can they be added to the run:

-Nubian C,X group.
-Jebel Moya
-Haya
-Dahomy
-US negro

Dienekes said...

astenb, look at the results files included in the RAR file, as some of these are already included.

Dahomey, for example is 100% in the Teita-Zulu cluster #2.

1 Jebel Moya skull is classified as Linear Caucasoid and 2 as Negroid.