September 21, 2005

Human population genetic structure

I was re-reading a classic paper by Wilson et al. [1], which was the first to use the model-based clustering program STRUCTURE to cluster human populations. Rosenberg et al. [2] later applied the same approach to many more populations and markers. Two tables from the Wilson et al. paper are quite useful. The first shows, in its right column, the probability of each number of clusters K given the data.

In that table, this probability is ~1 for K=4. Contrary to often-repeated claims, the number of subdivisions ("races") of a group of individuals is not arbitrary: for a given set of individuals, some numbers (in this case 4) fit the data much better than others. Of course, with more markers or larger samples, some of these clusters may be further refined, but the basic structure would not change. An alternative clustering with, say, 2 or 3 clusters would not emerge.
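
To make the "probability of K given the data" concrete, here is a minimal sketch of how such a column can be derived from estimated log-likelihoods, assuming a uniform prior over the candidate values of K. The function name and the log-likelihood values are hypothetical and chosen only for illustration; they are not the numbers from Wilson et al.

```python
import math

def posterior_over_k(log_likelihoods):
    """Convert estimated ln Pr(X | K) values into Pr(K | X).

    Assumes a uniform prior over the candidate K values, so the
    posterior is proportional to exp(ln Pr(X | K)).
    """
    # Subtract the maximum before exponentiating to avoid underflow.
    max_ll = max(log_likelihoods.values())
    weights = {k: math.exp(ll - max_ll) for k, ll in log_likelihoods.items()}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}

# Hypothetical log-likelihood estimates (NOT taken from the paper).
example_log_likelihoods = {1: -4500.0, 2: -4380.0, 3: -4310.0, 4: -4250.0, 5: -4262.0}

posterior = posterior_over_k(example_log_likelihoods)
best_k = max(posterior, key=posterior.get)

for k in sorted(posterior):
    print(f"K={k}: Pr(K|X) = {posterior[k]:.6f}")
```

Because the log-likelihoods are exponentiated, even a modest gap between the best K and the runner-up concentrates nearly all of the posterior mass on a single value, which is why a table like this can show a probability of ~1 for one K.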

The second table shows that human populations usually fall within the clusters that correspond to the classical anthropological racial categories.

It is also interesting that the Ethiopians fall partly in the Caucasoid cluster A and partly in the Negroid cluster C. The Ethiopians don't "fit well" in the 4-race scheme, a fact that was also appreciated by traditional anthropology. In all likelihood, both ancient links between Proto-Eurasians and East Africans and recent migrations of Caucasoids into East Africa are responsible for Ethiopian intermediacy.

Admixture analysis using K clusters summarizes the genetic structure of populations and individuals with K proportions that add up to 1, i.e., with K-1 degrees of freedom. But these proportions cannot distinguish between similarity deriving from common descent and similarity deriving from recent admixture.
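
The K-1 degrees of freedom can be illustrated with a small sketch: once K-1 cluster proportions are known, the last one is fixed by the sum-to-1 constraint. The helper name and the proportions are hypothetical, for illustration only.

```python
def complete_membership(free_proportions):
    """Given K-1 cluster proportions, return all K proportions.

    The final proportion is not free: it is whatever remains
    after the first K-1 are subtracted from 1.
    """
    remainder = 1.0 - sum(free_proportions)
    if any(p < 0 for p in free_proportions) or remainder < 0:
        raise ValueError("proportions must be non-negative and sum to at most 1")
    return list(free_proportions) + [remainder]

# Hypothetical individual under K = 4: only three numbers are chosen freely.
q = complete_membership([0.5, 0.25, 0.125])
print(q)  # the fourth entry is implied by the other three
```

Note that the vector carries no information about history: two populations that reached the same proportions by different routes (shared descent in one case, recent admixture in the other) would be summarized by identical numbers.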

For example, Kazakhs and South Asians both score highly for European and Asian ancestry in AncestryByDNA-type tests. In the case of the former, this is due to admixture between Caucasoids and Mongoloids in Central Asia, whereas in the latter it is due to admixture between Caucasoids and Proto-Asians, i.e., non-Mongoloid people sharing common descent with East Asians.

This is why autosomal markers are useful for determining overall (genomic) similarity, but we have to turn to haploid markers such as mtDNA and the Y chromosome to interpret this similarity. Such markers can be tied to regions and times of origin and can thus be used to determine the actual processes of expansion and admixture that have led to the observable genetic variation.

[1] J.F. Wilson et al. Nature Genetics 29, 265-269 (2001)
[2] N.A. Rosenberg et al. Science 298 (5602), 2381-2385 (2002)
