November 30, 2010

Cluster galore: re-analysis of Behar et al. (2010) data

I have re-analyzed the data of Behar et al. (2010) using my Clusters Galore method. See my previous post on the HGDP panel for some technical details.

Here are the 47 clusters of the optimal mclust solution over the MDS representation retaining 26 dimensions:

Each row has the number of individuals who are mapped to each of the 47 clusters. Here are a few comments:

The discovery that Jewish populations can be subdivided into numerous clusters is not inconsistent with Behar et al. (2010) and their observation of the existence of three major clusters in Jewish populations. This is a difference of detail.

Most clusters map strongly to single populations; many populations with "tribal" traditions and high levels of consanguinity are split into multiple clusters, suggesting the existence of sub-structure in them. There are also a few clusters spanning several populations, such as #1 (Balto-Slavic), #22 (Syrians-Jordanians-Lebanese), #29 (Ethiopians and Ethiopian Jews), #25 (Romanians and Hungarians), and #31 (Iranian and Iraqi Jews).

November 29, 2010

Clusters galore in HGDP panel

For background on this type of analysis, please read:

I've taken the Stanford HGDP dataset and extracted the markers common to it and to HapMap-3, Behar et al. (2010), Rasmussen et al. (2010) and the 23andMe v2 genotyping platform, or about 500k SNPs in total (I removed C/G and A/T SNPs as a precaution and flipped strand in discordant ones to the HapMap-3 standard when it differed from that of HGDP).
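The strand handling described above can be sketched in a few lines of Python. This is only an illustration under simple assumptions (biallelic SNPs, alleles given as single letters); the helper names `is_strand_ambiguous` and `harmonize` are made up for this sketch and are not part of PLINK or of the pipeline actually used.

```python
# Hypothetical helpers for illustration; alleles are single-letter strings.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def is_strand_ambiguous(a1, a2):
    """True for A/T and C/G SNPs, whose two alleles are each other's complement."""
    return COMPLEMENT[a1] == a2

def harmonize(alleles, reference):
    """Return the alleles on the reference strand, or None if the SNP is dropped.

    alleles and reference are 2-tuples such as ("A", "G").
    """
    if is_strand_ambiguous(*alleles) or is_strand_ambiguous(*reference):
        return None                      # A/T or C/G: removed as a precaution
    if set(alleles) == set(reference):
        return tuple(alleles)            # already on the reference strand
    flipped = tuple(COMPLEMENT[a] for a in alleles)
    if set(flipped) == set(reference):
        return flipped                   # discordant strand, flipped to match
    return None                          # alleles cannot be reconciled

print(harmonize(("A", "G"), ("T", "C")))   # flipped to the reference strand
print(harmonize(("A", "T"), ("A", "T")))   # ambiguous, dropped
```

A/T and C/G SNPs are dropped because flipping them yields the same pair of letters, so a strand mismatch cannot be detected from the alleles alone.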

I removed SNPs with less than 99% genotyping rate in any of the four data sources, and about 434k SNPs were retained. Subsequently I applied linkage disequilibrium-based pruning on the HGDP set (PLINK parameter: --indep-pairwise 50 5 0.3) resulting in a final dataset of about 177k SNPs. In all analyses of the HGDP set, I followed the recommendations of Rosenberg et al. (2006) keeping the 940 individuals in common between his 952-individual panel and the Stanford data.

Subsequently I ran multidimensional-scaling (MDS) on the 940 individual/57 population/177k SNP set in PLINK, and then I applied model-based clustering as implemented in mclust over the first 42 MDS dimensions, with a maximum number of clusters = 70. In total there were 64 clusters in the optimal solution suggested by mclust (*)

Before I give the results, it might be worth looking at the pairwise MDS scatterplots for just the first 5 dimensions:

As you can see, clusteredness emerges in different dimensions. Rather than inspecting innumerable 2D combinations visually (and indeed we should inspect 3D and higher-dimensional combinations as well, because clusters might emerge in 3D and higher subspaces that are not discernible in 2D projections), we let mclust iterate over k, the number of clusters, and over different shapes, orientations, and volumes of clusters, using the well-known EM algorithm together with the Bayesian Information Criterion to choose a good solution that maximizes detail without sacrificing parsimony.

Below you can see how many individuals are assigned to each of the 64 clusters from each of the 57 populations:

This is rather astonishing. There are many clusters with 100% correspondence to HGDP populations. A few populations, mostly from regions with high levels of inbreeding, are split into multiple sub-clusters, perhaps reflecting some type of tribal affinity. And there are a few populations, such as Tuscans and North Italians, that are not split. But the fact that this was inferred from unlabeled individuals is remarkable.

I remember reading Rosenberg et al. (2002), "The genetic structure of human populations" (pdf) which used structure, a model-based algorithm on raw genetic data to infer the existence of 6 clusters corresponding to continental populations. How is it that so much more detail can be achieved today?

There are three reasons: First, dense genotyping data are much better than the few hundreds of microsatellites used by Rosenberg et al. (2002). Second, the use of dimensionality reduction in the form of MDS allowed us to remove most of the "noise" in the genotyping data and focus on dimensions capturing a lot of distinctions. Third, the use of a sophisticated clustering algorithm such as mclust which can adapt to clusters of different shape, size, and orientation without human input was able to produce this result. mclust is computationally expensive, but it works like a charm (in a few minutes) with a few dozen dimensions and about a thousand individuals, producing a clustering of obviously good value.

How to repeat the experiment

If anyone wants to repeat this experiment they can do it easily. After you've managed to put the HGDP data (or any other data for that matter) into PLINK ped/map format, say in files HGDP.ped and HGDP.map, just run

> plink --cluster --mds-plot d --file HGDP

where d is the number of dimensions you want to retain. This produces a plink.mds file in which there is a header line, and each line after that corresponds to an individual: the individual's coordinates in the first d dimensions are in columns 4 to d+3.
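For readers who want to inspect the file outside R, here is a minimal Python sketch of reading those columns. It assumes the usual whitespace-delimited layout with a header of the form FID, IID, SOL, C1, C2, ...; the function name `read_mds` and the two-individual sample text are made up for illustration.

```python
def read_mds(lines, d):
    """Parse PLINK .mds text; return (ids, coords) with d coordinates per individual.

    Assumes whitespace-delimited columns FID, IID, SOL, C1, C2, ...
    so that the coordinates sit in columns 4 to d+3 (counting from 1).
    """
    rows = [ln.split() for ln in lines if ln.strip()]
    body = rows[1:]                                   # skip the header line
    ids = [(r[0], r[1]) for r in body]                # (family ID, individual ID)
    coords = [[float(x) for x in r[3:3 + d]] for r in body]
    return ids, coords

# Made-up two-individual file with d = 2, for illustration only.
sample = """FID IID SOL C1 C2
POP1 ind1 0 0.0123 -0.0045
POP2 ind2 0 -0.0871 0.0032
""".splitlines()

ids, coords = read_mds(sample, d=2)
print(ids[0], coords[0])
```

With a real file, `open("plink.mds").read().splitlines()` would take the place of the sample text.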

Then, in R, after you install and load the mclust package (see the MCLUST page for limitations on its use and licensing information), you just run:

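> library(mclust)
> MDS <- read.table("plink.mds", header=TRUE)
> d <- 42          # number of MDS dimensions retained (same d as in the PLINK command)
> maxclust <- 70
> MCLUST <- Mclust(MDS[, 4:(d+3)], G=1:maxclust)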

where maxclust is the maximum number of clusters you want to consider.

Then, if you run:

> MCLUST$z

you will see a table in which each line corresponds to an individual and each column to the probability that it belongs to the i-th cluster.

There's much more that you can do in R with the mclust package, but this is enough for anyone wanting to repeat the experiment in its basic form.

(*) The number of clusters in the optimal solution varied between 11 with 2 dimensions retained and 64 with 42 dimensions retained. There was a secondary maximum of 60 clusters with 30 dimensions retained; choosing more dimensions than 42 (up to 50 that I examined), also resulted in a very high number of clusters, but I've decided to keep the one with 42 dimensions and 64 clusters as it is enough to serve the purpose of this post.

Human effective sex ratio: different at different time scales

The authors manage to harmonize the seemingly contradictory results of Keinan et al. and Hammer et al.

From the paper:
Recently, two studies estimated Q in order to detect sex biases in similar human populations [16,17] and found seemingly contradictory conclusions [25]. Using SNP data from the International HapMap Project [26], Keinan et al. found evidence for a male bias during the dispersal of modern humans out of Africa (Figure 1A) [17]. Hammer and colleagues, however, found evidence for a female bias throughout human history in six populations from the Human Genome Diversity Panel (HGDP) (Figure 1A) [16].

This figure from the paper shows the model inferred by the authors which resolves the seeming contradiction.

They write:
Long-term sex-biased processes, such as polygyny or higher female dispersal rates in ancestral human populations, likely caused the Qπ estimates found by Hammer et al.
The male bias detected by Keinan et al. can be explained by a recent event associated with the out-of-Africa dispersal, as initially proposed by the authors. The Q ratios detected by Keinan et al. suggest a very strong male bias for the entire portion of the non-African lineage before the split of Asians from Europeans.

I am not entirely convinced of this explanation. The authors' model suggests a higher male/female ratio in Eurasians than in Africans due to male bias in the Eurasian lineage against an ancestral background of high female/male ratio (due to polygyny).

But, an alternative explanation is that the higher female/male ratio in Africans is due to the fact that they are descended from a relatively small number of males who overwhelmed the pre-existing African gene pool.

There are reasons to believe this is the case: Africa has the deepest lineages in the human Y-chromosome phylogeny (A and B), but the balance is made up entirely of haplogroup E chromosomes, the sister clade of Eurasian D. The extremely diverse Eurasian haplogroup F is represented only by some subclades in Africa, due to back-migration.

So, while Eurasian males are descended from the expansion of F and DE males, African males are largely descended from the expansion of E males. These are the Afrasians I've often spoken of, the common ancestors of Eurasians and Africans. In Africa, the Afrasians could take the women of the Paleo-Africans, but Eurasia was largely empty land, and the Eurasians could only take the women they had brought with them.

The American Journal of Human Genetics, 24 November 2010

Estimators of the Human Effective Sex Ratio Detect Sex Biases on Different Timescales

Leslie S. Emery

Determining historical sex ratios throughout human evolution can provide insight into patterns of genomic variation, the structure and composition of ancient populations, and the cultural factors that influence the sex ratio (e.g., sex-specific migration rates). Although numerous studies have suggested that unequal sex ratios have existed in human evolutionary history, a coherent picture of sex-biased processes has yet to emerge. For example, two recent studies compared human X chromosome to autosomal variation to make inferences about historical sex ratios but reached seemingly contradictory conclusions, with one study finding evidence for a male bias and the other study identifying a female bias. Here, we show that a large part of this discrepancy can be explained by methodological differences. Specifically, through reanalysis of empirical data, derivation of explicit analytical formulae, and extensive simulations we demonstrate that two estimators of the effective sex ratio based on population structure and nucleotide diversity preferentially detect biases that have occurred on different timescales. Our results clarify apparently contradictory evidence on the role of sex-biased processes in human evolutionary history and show that extant patterns of human genomic variation are consistent with both a recent male bias and an earlier, persistent female bias.


Y-chromosomes of Niger-Congo groups

It is interesting that African farmers, like their European counterparts, seem to have dispersed rapidly at the beginning. Hopefully, ancient DNA analysis in Europe will be able to discover their Y-chromosomes, as the inference from modern populations is not as clear-cut as in the (more recent) spread of Bantu farmers.

Mol Biol Evol (2010) doi: 10.1093/molbev/msq312

Y-chromosomal variation in Sub-Saharan Africa: insights into the history of Niger-Congo groups

Cesare de Filippo et al.


Technological and cultural innovations, as well as climate changes, are thought to have influenced the diffusion of major language phyla in sub-Saharan Africa. The most widespread and the richest in diversity is the Niger-Congo phylum, thought to have originated in West Africa ∼10,000 years ago. The expansion of Bantu languages (a family within the Niger-Congo phylum) ∼5,000 years ago represents a major event in the past demography of the continent. Many previous studies on Y chromosomal variation in Africa associated the Bantu expansion with haplogroup E1b1a (and sometimes its sub-lineage E1b1a7). However, the distribution of these two lineages extends far beyond the area occupied nowadays by Bantu speaking people, raising questions on the actual genetic structure behind this expansion. To address these issues, we directly genotyped 31 biallelic markers and 12 microsatellites on the Y chromosome in 1195 individuals of African ancestry focusing on areas that were previously poorly characterized (Botswana, Burkina Faso, D.R.C, and Zambia). With the inclusion of published data, we analyzed 2736 individuals from 26 groups representing all linguistic phyla and covering a large portion of Sub-Saharan Africa. Within the Niger-Congo phylum, we ascertain for the first time differences in haplogroup composition between Bantu and non-Bantu groups via two markers (U174 and U175) on the background of haplogroup E1b1a (and E1b1a7), which were directly genotyped in our samples and for which genotypes were inferred from published data using Linear Discriminant Analysis on STR haplotypes. No reduction in STR diversity levels was found across the Bantu groups, suggesting the absence of serial founder effects. In addition, the homogeneity of haplogroup composition and pattern of haplotype sharing between Western and Eastern Bantu groups suggest that their expansion throughout Sub-Saharan Africa reflects a rapid spread followed by backward and forward migrations. 
Overall, we found that linguistic affiliations played a notable role in shaping sub-Saharan African Y chromosomal diversity, although the impact of geography is clearly discernible.


November 27, 2010

Clusters galore: extremely fine-scale ancestry inference

By way of introduction, here is the command that literally made me jump from my seat:
> MCLUST <- Mclust(X,G=1:36)
Warning messages:
1: In summary.mclustBIC(Bic, data, G = G, modelNames = modelNames) :
best model occurs at the min or max # of components considered
2: In Mclust(X, G = 1:36) :
optimal number of clusters occurs at max choice
It may look like gibberish, but this is what happened when I tried to apply Model-based clustering as implemented in the R package mclust, over the first few dimensions of Multidimensional Scaling (MDS) of my standard 36-population, 692-individual dataset I have been using in the Dodecad Project.

But, let's take the story from the beginning...

The basic idea

When we look at an MDS or PCA plot, like the following MDS plot of the 11 HapMap-3 populations, it is obvious that individuals form clusters.

Here are dimensions 1 and 2:
West and East Eurasians form a cluster, and Africans form an elongated cluster towards West Eurasians. Gujaratis and Mexicans overlap between West and East Eurasians.

Here are dimensions 2 and 3:
Here, the Gujarati are shown to be quite different from the Mexicans.

We can use a standard clustering algorithm such as k-means to infer the existence of these clusters. This has two benefits:
  • We don't have to visually inspect an exponential number of 2D scatterplots
  • We can put some actual numbers on our visual impression of the existence of clusters
Actually, k-means is not a very good way to find clusters. For two reasons:
  • You have to specify k. But, how can you know which k is supported by the data, unless you look at an exponential number of 2D scatterplots?
  • k-means, using the Euclidean distance measure prefers "spherical" clusters. But, as you can see, some populations, especially recently admixed ones form elongated clusters, stretched towards their two (or more) ancestral populations.
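The second point can be illustrated with a toy calculation. The sketch below uses made-up 2D numbers, equal mixing proportions, and axis-aligned Gaussian clusters; it is not mclust itself. It compares the k-means rule (nearest centroid) with a model-based rule (highest Gaussian density) for a point sitting on the tail of an elongated cluster.

```python
import math

def sq_euclid(p, mean):
    """Squared Euclidean distance, the quantity k-means minimizes."""
    return sum((x - m) ** 2 for x, m in zip(p, mean))

def log_density_diag(p, mean, var):
    """Log density of an axis-aligned (diagonal-covariance) Gaussian."""
    return sum(-0.5 * ((x - m) ** 2 / v + math.log(2 * math.pi * v))
               for x, m, v in zip(p, mean, var))

# Made-up clusters: one stretched along the first axis, one small and round.
clusters = {
    "elongated": {"mean": (0.0, 0.0), "var": (16.0, 0.25)},
    "compact": {"mean": (5.0, 2.0), "var": (0.25, 0.25)},
}
point = (4.0, 0.3)   # lies far out along the elongated cluster's long axis

nearest = min(clusters, key=lambda k: sq_euclid(point, clusters[k]["mean"]))
likeliest = max(clusters, key=lambda k: log_density_diag(
    point, clusters[k]["mean"], clusters[k]["var"]))
print(nearest, likeliest)   # compact elongated
```

The nearest centroid belongs to the compact cluster, yet the point is far more plausible under the elongated one, which is the situation of an admixed individual stretched towards an ancestral population.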
I had previously used mclust, a model-based clustering algorithm to infer the existence of 14 different clusters in a standard worldwide craniometric dataset. This was 6 years ago, and only recently have geneticists been able to reach that level of resolution with genomic data.

But, for assessing ancestry, genomic data are obviously much better than craniometric ones: the latter reflect both genes and environmental/developmental factors.

So, while 6 years ago I had neither the computing power nor the data to push the envelope of fine-scale ancestry inference, today that's possible.

What mclust does (in a nutshell)

mclust has many bells and whistles for anyone willing to study it, but the basic idea is this: the program iterates between different k and different "forms" of clusters (e.g., spherical or ellipsoidal) and finds the best one.

Best is defined as the one that maximizes the Bayesian Information Criterion (BIC). Without getting too technical, this tries to balance the "detail" of the model (how many parameters, e.g., k, it has) with its parsimony (how conservative it is in inferring the existence of phantom clusters).
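As a rough illustration of this trade-off, the sketch below computes the criterion in its "larger is better" form, 2 log L - p log n, where L is the maximized likelihood, p the number of free parameters and n the number of individuals. The log-likelihood values are invented for the example, and the parameter count assumes axis-aligned clusters with freely varying variances in 2 dimensions.

```python
import math

def n_params_diag(k, d):
    """Free parameters of a k-component mixture of axis-aligned Gaussians:
    k-1 mixing proportions, k*d means and k*d variances."""
    return (k - 1) + k * d + k * d

def bic(loglik, n_params, n_obs):
    """BIC in the 'larger is better' form: 2*logL - p*log(n)."""
    return 2.0 * loglik - n_params * math.log(n_obs)

# Invented maximized log-likelihoods for k = 2, 3, 4 on 275 individuals in 2D.
logliks = {2: -1500.0, 3: -1420.0, 4: -1415.0}
n_obs, d = 275, 2

scores = {k: bic(ll, n_params_diag(k, d), n_obs) for k, ll in logliks.items()}
best_k = max(scores, key=scores.get)
print(best_k)   # 3
```

Going from 3 to 4 clusters improves the fit only slightly, so the penalty for the extra parameters outweighs the gain and k = 3 wins.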

How to combine mclust with PCA or MDS

mclust does not work on 0/1 binary SNP data; it needs scalar data such as skull measurements. However, that's not a problem, because you can convert 0/1 (or ACGT) SNP data into scalar variables using either MDS or PCA.

From a few hundred thousand SNPs, representing each individual, you get a few dozen numerical values placing the individual along each of the first few dimensions of MDS or PCA.

You can then run mclust over that reduced-dimensional representation. This is exactly what I attempted to do.

Clusters galore in HapMap-3 populations

I had previously used ADMIXTURE to infer admixture in the HapMap-3 populations, reaching K=9. So, naturally, I wanted to see whether the approach I just described could do as well as ADMIXTURE.

I used about 177k SNPs after quality-control and Linkage-disequilibrium based pruning and ran MDS as implemented in PLINK over a set of 275 individuals, 25 from each of the 11 HapMap-3 populations. I kept 11 dimensions, equal to the number of populations.

MDS took a few minutes to complete. Subsequently I ran mclust on the 275 individuals, allowing k to be as high as 11. Thus, if there were as many clusters as populations, I wanted mclust to find them. mclust finished running in a second. Here are the results (population averages):
The software essentially rediscovered the existence of 10 different populations in the data, but was unable to split the Denver Chinese from the Beijing Chinese. Notice also a mysterious low-frequency component in the Maasai reminiscent of that which appeared in the previous ADMIXTURE experiment.

A question might arise: why do most of these populations look completely unadmixed? Even the Mexicans and African Americans get their own cluster. This is due to mclust's ability to use clusters of different shape. In particular, the "best" model was the one called "VVI", which allows for diagonal (axis-aligned) clusters of varying volume and shape. In short, the software detected the presence of the elongated clusters associated with the admixed groups.

Indeed, the approach I am describing is not really measuring admixture. It is quantifying the probability that a sample is drawn from each of a set of inferred populations. Hence it is not really suitable for recently admixed individuals, but works like a charm in guessing the population labels of unlabeled individuals.

Clusters galore in Eurasia

Let's now see what clusters are inferred in the 36-population, 692-individual dataset I commonly use in the Dodecad Project. This is done with 177k SNPs, 36 MDS dimensions retained, and allowing k to be as high as 36. This is what made me jump from my seat, and since I don't have enough colors to represent it, I'll put it in tabular form:

I could hardly believe this when I saw it, but the conclusion is inescapable: dozens of distinct populations can be inferred from unlabeled individuals, and, on a posteriori inspection, they largely correspond to the individuals' population labels.

UPDATE: The above table has the average probabilities for the 36 clusters, but a better way might be to look at how many individuals are assigned to each cluster from each population:

For example, out of the 28 French individuals, 23 are assigned to cluster #1 (the French-CEU cluster), and 5 to cluster #3 (the North-Italian/Spanish/Tuscan cluster).
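A count table of this kind can be derived from the per-individual probabilities by assigning each individual to its most probable cluster and tallying by population label. The Python sketch below uses a made-up four-individual, three-cluster probability table; `hard_assign` and `crosstab` are names chosen for this illustration.

```python
from collections import Counter

def hard_assign(z_row):
    """1-based index of the most probable cluster for one individual."""
    return max(range(len(z_row)), key=lambda j: z_row[j]) + 1

def crosstab(labels, z):
    """Count individuals per (population label, assigned cluster) pair."""
    counts = Counter()
    for label, row in zip(labels, z):
        counts[(label, hard_assign(row))] += 1
    return counts

# Made-up labels and cluster membership probabilities, for illustration only.
labels = ["French", "French", "French", "Sardinian"]
z = [
    [0.98, 0.01, 0.01],
    [0.36, 0.00, 0.64],
    [0.90, 0.05, 0.05],
    [0.00, 1.00, 0.00],
]

table = crosstab(labels, z)
print(table[("French", 1)], table[("French", 3)])   # 2 1
```

Note that the population labels are used only after clustering, to read the table; the clustering itself never sees them.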

Some interesting observations:
  1. Some populations (e.g., CEU and French, or Belorussians and Lithuanians) remain unsplit even at K=36.
  2. Some populations are split into multiple components (e.g., Sardinians into 2)
  3. Some mini-clusters emerge (e.g., 4 clusters in Maasai, each of them corresponding to 8% of 25 = 2 individuals). These may correspond to pairs of relatives or very genetically close individuals.
Quantifying uncertainty

Naturally, we want to be able to assess how good a particular classification is. Fortunately, this is easy to do with mclust and its uncertainty feature. Looking at my 692-individual dataset, 687 have a less than 5% uncertainty level, and 682 have less than 1%. I did not inspect these fully, but some of them are "borderline" individuals who might belong on several components, e.g., a Frenchman who could either go to the CEU-French cluster #1 (36% probability) or the North/Central Italian-Spanish cluster #3 (64% probability).
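The uncertainty of a classification is simply one minus the probability of the most likely cluster. Here is a minimal Python sketch with made-up probabilities (the second row mirrors the borderline Frenchman above); the function name is chosen for this illustration.

```python
def uncertainty(z_row):
    """One minus the largest cluster membership probability."""
    return 1.0 - max(z_row)

# Made-up membership probabilities for four individuals and two clusters.
z = [
    [0.999, 0.001],
    [0.36, 0.64],     # borderline: could go to either cluster
    [0.97, 0.03],
    [0.0, 1.0],
]

u = [uncertainty(row) for row in z]
below_5pct = sum(x < 0.05 for x in u)
below_1pct = sum(x < 0.01 for x in u)
print(below_5pct, below_1pct)   # 3 2
```

Individuals above the chosen threshold are the ones worth inspecting by hand.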

Here is a dendrogram of the 36 components:

What does it all mean?

What this means, in short, is that the day of extremely fine-scale ancestry inference has arrived. We already had premonitions of this in the ability of researchers to place individuals within a few 100km of their place of birth in Europe. Now, it is clear that model-based clustering + MDS/PCA can infer ethnic/national identity, or something quite close to it.

This is obviously just the beginning. I allowed K to vary from 1 to 36, not really hoping that the optimal number of clusters would be 36. This raises the question: more than 36?



I have followed up on this exciting new technique in the Dodecad Project blog:

November 26, 2010

ADMIXTURE on the shores of the Indian Ocean

I have applied Multidimensional Scaling and ADMIXTURE on a dataset of 15 populations:
Cambodians, Papuan, NAN_Melanesian, Gujarati, Malayan, Paniya, North_Kannadi, Sakilli, Singaporean Indians, Singaporean Chinese, Singaporean Malay, Yemenese, Saudis, Maasai, Ethiopians
These were collected from HGDP, Behar et al. (2010), HapMap-3, and the Singapore Genome Variation Project. There are 423 individuals in total (I've used samples of 25 individuals from the HapMap populations).

Here is the MDS plot:

At the bottom are the Papuans, relatively unadmixed Australoids. Close to them, but deviating towards East Eurasians are the NAN Melanesians; these are the Nasioi, Papuan speakers from Bougainville, which they inhabit together with Austronesian speakers.

At the top left are the Singaporean Chinese (CHS) who are Mongoloids. Deviating from them towards Indians are the Cambodians, a Southeast Asian group which according to physical anthropology is a basically Mongoloid population, but admixed with a pre-Mongoloid southern population element similar to that which has been preserved in India. Similar to them are the Singaporean Malay (MAS), another population that is basically Mongoloid but has absorbed Indian-like population elements.

The Singaporean Indians (INS), the North Kannada, the Sakilli and the Gujarati (GIH25) form the third population element in the region of interest.

The other two are the Caucasoids, represented here by the Saudis, with the Yemenese spread toward Africa and the more Caucasoid-admixed Ethiopians and the relatively unadmixed Maasai (MKK25).

These are the main population elements of our region of interest: Ethiopids and Australoids framing the Ocean on the west and east; the South Asians occupying India, and the Mongoloids occupying Southeast Asia, having absorbed the Indian-like former inhabitants of the region.

Here is a blowup of the middle part of the MDS plot, focusing on the Indians:
It's fairly clear that North Kannada and Sakilli (South Indians) occupy a place that is furthest from Caucasoids, while Gujarati and Singaporean Indians are positioned towards Caucasoids (to the top-right).

Let's now turn to ADMIXTURE to confirm the visual impression from the MDS:

Notice the following components:
  1. Light blue, Indian
  2. Dark blue, East African
  3. Light green, Southeast Asian
  4. Dark green, Chinese Mongoloid
  5. Pink, Arabian Caucasoid
  6. Red, Australoid
Finally, here is the table of Fst distances between these 6 inferred components:

Notice the small distance (0.023) between Chinese and Southeast Asian Mongoloids. The Indian component is equidistant between Caucasoids and Mongoloids, but as the MDS plot makes clear, and as the study of Y-chromosome and mtDNA polymorphisms has shown, the distinctive component in Indians is sui generis and not the result of admixture between Caucasoids and Mongoloids. And, finally, the Australoid component is clearly distant from all of the above.

Lexical borrowing in the history of Indo-European languages

This is an open access paper.

Proc. R. Soc. B doi: 10.1098/rspb.2010.1917

Networks uncover hidden lexical borrowing in Indo-European language evolution

Shijulal Nelson-Sathi et al.


Language evolution is traditionally described in terms of family trees with ancestral languages splitting into descendent languages. However, it has long been recognized that language evolution also entails horizontal components, most commonly through lexical borrowing. For example, the English language was heavily influenced by Old Norse and Old French; eight per cent of its basic vocabulary is borrowed. Borrowing is a distinctly non-tree-like process—akin to horizontal gene transfer in genome evolution—that cannot be recovered by phylogenetic trees. Here, we infer the frequency of hidden borrowing among 2346 cognates (etymologically related words) of basic vocabulary distributed across 84 Indo-European languages. The dataset includes 124 (5%) known borrowings. Applying the uniformitarian principle to inventory dynamics in past and present basic vocabularies, we find that 1373 (61%) of the cognates have been affected by borrowing during their history. Our approach correctly identified 117 (94%) known borrowings. Reconstructed phylogenetic networks that capture both vertical and horizontal components of evolutionary history reveal that, on average, eight per cent of the words of basic vocabulary in each Indo-European language were involved in borrowing during evolution. Basic vocabulary is often assumed to be relatively resistant to borrowing. Our results indicate that the impact of borrowing is far more widespread than previously thought.


Y-chromosomes of South Africans

Investigative Genetics 2010, 1:6

Development of a single base extension method to resolve Y chromosome haplogroups in sub-Saharan African populations

Thijessen Naidoo et al.


Background: The ability of the Y chromosome to retain a record of its evolution has seen it become an essential tool of molecular anthropology. In the last few years, however, it has also found use in forensic genetics, providing information on the geographic origin of individuals. This has been aided by the development of efficient screening methods and an increased knowledge of geographic distribution. In this study, we describe the development of single base extension assays used to resolve 61 Y chromosome haplogroups, mainly within haplogroups A, B and E, found in Africa.

Results: Seven multiplex assays, which incorporated 60 Y chromosome markers, were developed. These resolved Y chromosomes to 61 terminal branches of the major African haplogroups A, B and E, while also including a few Eurasian haplogroups found occasionally in African males. Following its validation, the assays were used to screen 683 individuals from Southern Africa, including south eastern Bantu speakers (BAN), Khoe-San (KS) and South African Whites (SAW). Of the 61 haplogroups that the assays collectively resolved, 26 were found in the 683 samples. While haplogroup sharing was common between the BAN and KS, the frequencies of these haplogroups varied appreciably. Both groups showed low levels of assimilation of Eurasian haplogroups and only two individuals in the SAW clearly had Y chromosomes of African ancestry.

Conclusions: The use of these single base extension assays in screening increased haplogroup resolution and sampling throughput, while saving time and DNA. Their use, together with the screening of short tandem repeat markers would considerably improve resolution, thus refining the geographic ancestry of individuals.

November 25, 2010

Some Indians as genetically diverse as Africans, recent Out of Africa in serious trouble?

Razib alerts me to a very interesting new paper, which discovered that some Indian populations are more diverse than Africans in a sequenced 100kb region. I will have much more to say about this once I digest it fully, but as I said in my recent review of the Oceania paper, I don't believe in long "interludes" of humans leaving Africa and then spending tens of thousands of years camping in one place before starting to expand again. I don't believe that there is evidence for Neandertal admixture in Eurasians either; the title of my post hints at what I believe. Update to follow.

Thanks to the Jorde Lab for putting up their genotype data easily accessible online! They're an example for others to follow.


Here is the crux of the paper:
As previously observed, heterozygosity (a measure of genetic diversity) decreases with distance from East Africa (represented here by the Luhya LWK HapMap population). The only trouble is that this pattern disappears once the Indian populations are included in the analysis.

Things are even worse though: the Luhya are an admixed population. Thus, their level of heterozygosity is inflated because of their relatively recent admixture associated with the spread of Bantu languages. Remove them, and it's clear the pattern of diminution of genetic diversity from East Africa completely disappears. Indeed, I am convinced that this pattern may be completely due to the admixed status of East Africans; the Maasai are another HapMap population from East Africa that seems to be missing from this analysis, and it is less heterozygous than the Luhya.

Even though the pattern of diminution of genetic diversity from East Africa (in autosomal genes, at least) may be largely due to the admixed status of East Africans, the same could be true for the Indian groups, who are largely composed of an indigenous "Ancestral South Indian", and an invasive "Ancestral North Indian" component. But, the point is that these two groups must have been substantially differentiated to produce a larger level of heterozygosity than in the Africans.

A caveat should be registered: genetic diversity in African hunter-gatherers (Bushmen and Pygmies) may be even higher than in the Yoruba and Luhya. Also, the mtDNA phylogeny is pretty unambiguous about the matrilineal origin of humanity being in Africa. And, the earliest known fossils of anatomically modern humans are in Africa. Thus, some kind of Out-of-Africa scenario still finds support in the data.

What no longer finds support in the data is the idea of a recent Out-of-Africa exodus 40-60 thousand years ago. The authors of the current paper write:
the divergence time between African and the ancestral Eurasian population (88-112 kya, CIs: 63-150 kya) is much older than the divergence time among the Eurasian groups (27-39 kya, CI: 20-59 kya).
A divergence between Africans and Eurasians 100 ky ago is consistent with the paleoanthropological finds from the Levant and China, showing the presence of anatomically modern humans thousands of kilometers apart at that time outside Africa. If there was an Out of Africa, it happened 100 thousand years ago.

The second important point is the supposed maintenance of a Eurasian population outside Africa, in the Levant, for tens of thousands of years before the breakup of the Eurasians:

There are serious reasons to doubt this hiatus:

First, the presence of AMH in the Levant and China 100,000 years ago is hardly consistent with the maintenance of a geographically circumscribed population of Eurasians in the Levant until 40,000 years ago.

Second, it is hardly parsimonious that such a population would maintain itself in a geographically circumscribed area for so long. If they moved from East Africa to the Near East, why on earth would they stop there?

In my opinion two underappreciated factors should be considered:
  • Gene flow within Eurasia reduces divergence times between Eurasians; West, South, and East Eurasians did not branch out from a common ancestor; there were episodes of gene flow between them, some of them very recent, some of them beyond any record. Such lateral gene flow did not abolish differences between them, but it would have reduced the inferred divergence time.
  • Gene flow between Afrasians (i.e., Eurasians' unadmixed ancestors in East Africa) and other Palaeoafricans inhabiting other parts of the continent would have increased the inferred divergence time between Africans and Eurasians.
These two factors might suffice to explain the observed pattern, without invoking a long hiatus.
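To make the first factor concrete, here is a toy calculation, not the method used in the paper. It assumes pure drift in two equal-sized populations, so that Fst is approximately 1 - exp(-T/2N), and it assumes that a single pulse of gene flow replacing a fraction m of one population shrinks allele-frequency differences by (1 - m), and hence Fst by roughly (1 - m)^2. The effective size, generation time, and pulse sizes are all hypothetical. The second factor (substructure within Africa) is not modeled here.

```python
import math

def fst_after_split(t_gen, n_e):
    """Approximate Fst between two isolated populations t_gen generations after a split."""
    return 1.0 - math.exp(-t_gen / (2.0 * n_e))

def inferred_split_time(fst, n_e):
    """Split time (in generations) that a pure-isolation model would infer from Fst."""
    return -2.0 * n_e * math.log(1.0 - fst)

def fst_after_pulse(fst, m):
    """Toy assumption: one pulse of gene flow of size m scales Fst by (1 - m)**2."""
    return fst * (1.0 - m) ** 2

N_E = 10_000        # hypothetical effective population size
GEN_YEARS = 25      # hypothetical generation time in years
true_t = 100_000 / GEN_YEARS  # a split 100 thousand years ago, in generations

fst = fst_after_split(true_t, N_E)
for m in (0.0, 0.1, 0.3):
    t_hat = inferred_split_time(fst_after_pulse(fst, m), N_E)
    print(f"gene flow m={m:.1f}: inferred split {t_hat * GEN_YEARS / 1000:.0f} kya")
```

Under these assumptions, the more lateral gene flow there has been, the younger the split looks to a model that assumes clean branching.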

Genome Biology 2010, 11:R113 doi:10.1186/gb-2010-11-11-r113

Genetic diversity in India and the inference of Eurasian population expansion

Jinchuan Xing et al.

Abstract (provisional)

Genetic studies of populations from the Indian subcontinent are of great interest because of India's large population size, complex demographic history, and unique social structure. Despite recent large-scale efforts in discovering human genetic variation, India's vast reservoir of genetic diversity remains largely unexplored.

To analyze an unbiased sample of genetic diversity in India and to investigate human migration history in Eurasia, we resequenced one 100 kb ENCODE region in 92 samples collected from three castes and one tribal group from the state of Andhra Pradesh in south India. Analyses of the four Indian populations, along with eight HapMap populations (692 samples), showed that 30% of all SNPs in the south Indian populations are not seen in HapMap populations. Several Indian populations, such as the Yadava, Mala/Madiga, and Irula, have nucleotide diversity levels as high as those of HapMap African populations. Using unbiased allele-frequency spectra, we investigated the expansion of human populations into Eurasia. The divergence time estimates among the major population groups suggest that Eurasian populations in this study diverged from Africans during the same time frame (approximately 90-110 thousand years ago). The divergence among different Eurasian populations occurred more than 40,000 years after their divergence with Africans.

Our results show that Indian populations harbor large amounts of genetic variation that have not been surveyed adequately by public SNP discovery efforts. Our data also support a delayed expansion hypothesis in which an ancestral Eurasian founding population remained isolated long after the out-of-Africa diaspora, before expanding throughout Eurasia.


November 24, 2010

mtDNA and Y chromosomes from pre-Columbian Andean Highlanders

They found all A, B, C, D mtDNA haplogroups. The evidence for genetic continuity can probably be ascribed to the small timespan to the present, and the relative isolation of Amerindians in comparison to populations of Eurasia where genetic discontinuities have been discovered over longer time periods.

From the paper:
All individuals belong to haplogroup Q1a3a∗. Only individuals where it was possible to determine the full profile of six SNPs are considered here. There is a high number of individuals showing allelic dropout, presumably due to DNA degradation, for one or more SNPs. For two individuals with realised polymorphisms in M242 and M3, only M19 could not be typed, so there is a chance that these individuals could belong to haplogroup Q1a3a2 and not Q1a3a∗.

Annals of Human Genetics doi: 10.1111/j.1469-1809.2010.00620.x

Diachronic Investigations of Mitochondrial and Y-Chromosomal Genetic Markers in Pre-Columbian Andean Highlanders from South Peru

Lars Fehren-Schmitz et al.

This study examines the reciprocal effects of cultural evolution, and population dynamics in pre-Columbian southern Peru by the analysis of DNA from pre-Columbian populations that lived in the fringe area between the Andean highlands and the Pacific coast. The main objective is to reveal whether the transition from the Middle Horizon (MH: 650–1000 AD) to the Late Intermediate Period (LIP: 1000–1400 AD) was accompanied or influenced by population dynamic processes. Tooth samples from 90 individuals from several archaeological sites, dating to the MH and LIP, in the research area were collected to analyse mitochondrial, and Y-chromosomal genetic markers. Coding region polymorphisms were successfully analysed and replicated for 72 individuals, as were control region sequences for 65 individuals and Y-chromosomal single nucleotide polymorphisms (SNPs) for 19 individuals, and these were compared to a large set of ancient and modern indigenous South American populations. The diachronic comparison of the upper valley samples from both time periods reveals no genetic discontinuities accompanying the cultural dynamic processes. A high genetic affinity for other ancient and modern highland populations can be observed, suggesting genetic continuity in the Andean highlands at the latest from the MH. A significant matrilineal differentiation to ancient Peruvian coastal populations can be observed suggesting a differential population history.


23andMe $99 sale

Just an alert for anyone that might be interested (it will bring me more samples for the Dodecad Project). There doesn't seem to be an official announcement, but I've tested it, and it seems to work as of this writing. It's $99+shipping+1 year of mandatory Personal Genome Service at $5/month.

I won't be ordering myself, however. The price is not bad, but the mandatory $5/month for a year subscription to a Personal Genome Service is not my cup of tea.

I am all for choice, and for people being able to choose what they want. Personally, I want a few hundred thousand SNPs without any of the trimmings.

I don't want some intermediary to provide me with Relative Finder matches until I stop paying them a fee. In fact, I'm not interested in finding relatives at all; I already know who they are. I am not interested in personalized health reports, because, frankly, the amount of health information you can get from a personal genome test is tiny. I am not interested in 23andMe's ancestry analysis, because mine and those produced by other dilettantes are, frankly, more cutting edge.

So, hopefully, a company will realize that there's money to be made by people who want to get their DNA genotyped and interpret it themselves using freeware community resources and networking. Until that happens, feel free to use every available option, including the great 23andMe sale, but count me out.

November 23, 2010

East Eurasian population structure as a window into the human past

Here is an MDS plot of 454 individuals from 32 East Eurasian/substantially East Eurasian populations:
Population labels have been placed in the (x,y) spot of the population averages. This corresponds -usually- to the midpoint of blobs of individuals from that population, but some populations have a few European-admixed individuals, and hence their population average is transposed. Consult the legend for color/point information for the 32 populations.
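The effect of a few admixed individuals on label placement can be sketched as follows. The coordinates are hypothetical, not taken from the plot; the per-axis median is shown only for comparison, since the labels in the plot use the population average.

```python
from statistics import mean, median

def label_position(points, center=mean):
    """Return the (x, y) label spot for a population, using `center` on each axis."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return center(xs), center(ys)

# Hypothetical MDS coordinates: a tight blob plus one European-admixed outlier.
blob = [(-0.50, 0.10), (-0.52, 0.12), (-0.48, 0.09), (-0.51, 0.11)]
admixed = [(0.30, 0.40)]
population = blob + admixed

mean_xy = label_position(population)                   # pulled toward the outlier
median_xy = label_position(population, center=median)  # stays inside the blob

print("label at mean:  ", mean_xy)
print("label at median:", median_xy)
```

A single outlier is enough to move the average-based label well outside the main blob, which is why some labels in the plot do not sit on top of their populations.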

The most striking feature of this plot is the extreme homogeneity of the vast majority of East Eurasians. They may occupy a tiny blob on the left of the plot, but the various ethnic groups of China, the Japanese, and the Cambodians all appear to be very close to each other; indeed you can hardly see their labels in the mass of points. Here is a blowup of that part of the plot.

These dots represent the overwhelming mass of East Eurasians, and indeed the biggest single set of human groups. The scattering of populations to the right of the main MDS plot are, in comparison, demographically insignificant, some of them numbering fewer than 1,000 individuals.

What this plot shows, in tangible form, is a picture of mankind's past: before the invention of agriculture, most humans lived in small tribes, scattered across the world. We can be fairly certain that the action of genetic drift and natural selection would have created a cornucopia of human diversity, with high between-group diversity due to high levels of genetic drift.

Out of all this variety, some tribes of hunters made the transition to agriculture, growing in numbers, filling the areas they exploited, and expanding into new ones. The hunters were on their way to extinction, but new tribes formed at the fringes, those of pastoral nomads exploiting animals to thrive where neither farmer nor hunter could.

In the world of farmers, with growing population densities and expansion came the breakdown of isolates: this led to a further homogenization of the farmers' gene pool, as different tribes that had adopted the new way of life lost all trace of their past tribal identities and formed new ones based on the common language of the agriculturalists and the new way of life.

With more human bodies in the farming communities, came more novel mutations, and hence more of the raw materials of selection.

Coupled with the new challenges of agriculture, to which man was unaccustomed, the social challenges of living close to many others in villages, and, later, cities, the cognitive challenges of new symbolic systems of communication, selection further reduced diversity in key aspects of human appearance and behavior, while maintaining it, or even increasing it in others, such as resistance to pathogens.

In western Eurasia this process was pushed to its limits, and there are virtually no nomads or hunters to be found there anymore. Africa was explored by Europeans just in time to find living hunters such as the San and Pygmies still in existence there. A few centuries more, and perhaps they, too, would have been absorbed into the mass of expanding farmers.

The few dots of European- and Chinese-admixed individuals that deviate from their own populations are a reminder of what would have happened if genetic science had not arrived on the scene when it did. For better or worse, the odds are stacked against most of these peoples surviving as distinct entities. The numbers are against them, and, sooner or later, they will be assimilated.

November 22, 2010

Ancient mtDNA from Sargat culture

From the paper:
The Sargat culture was located in the forest-steppe region of southwestern Siberia, near what is now the border of Russia and northern Kazakhstan, from around the 5th century BC until the 5th century AD. It is associated with a number of similar archaeological cultures in the region from the same period or slightly preceding it, for example, the Gorokhovo, Iktul, and Baitovo. The Sargat culture is also known for containing a number of kurgan burials (Koryakova and Daire 2000; Matveeva 2000; Andrey Shpitonkov et al., personal communication, 2004), and roughly half of all graves contain the remains of horse harnesses (Koryakova 2000). On the basis of archaeological evidence, the Sargat culture has been ascribed to a zone of intermixture between the Iranian steppe peoples to the south, such as the Saka or Sarmatians, and native Ugrian and/or Siberian populations (Koryakova and Daire 2000; Matveeva 2000; Andrey Shpitonkov et al., personal communication, 2004). Previous craniological research has also suggested some intrusion of Iranian peoples from the south (Matveeva 2000).

I've written before about the intrusion of Iranian speakers into Uralic territory, so this is a nice confirmation of the fact:
The southern sites were both successful in all phases to varying degrees. The results can be seen in Table 2. The three Kurtuguz individuals belonged to haplogroups A, C, and Z.


The four Sopininsky samples represent two different graves, corresponding to two individuals. The kurgan burial included a tooth and a rib sample, which resulted in a sequence belonging to haplogroup T (more specifically, T1). This sequence is a relatively uncommon variant of T/T1 having the mutation 16243C. The flat grave included a tooth and a metatarsal sample, both of which yielded a sequence belonging to haplogroup Z, with one amplification showing a double peak (C/T) at position 16224.
From the paper:
Furthermore, the specific subtype T1 tends to be found farther east and is common in Central Asian and modern Turkic populations (Lalueza-Fox et al. 2004), who inhabit much of the same territory as the ancient Saka, Sarmatian, Andronovo, and other putative Iranian peoples of the 2nd and 1st millennia BC.
The haplogroups of the other samples—A, C, and the two variants of Z—are typical of Siberian populations. Haplogroups A, C, and Z are common in northern Asia, particularly north of the Altai Mountains and the Amur River (i.e.,Siberia), and they decrease in frequency as one moves south, with haplogroup Z being rare at best (as one might expect, there is one individual of haplogroup Z present in the Iranian sample discussed earlier, three members of haplogroup C, and only three individuals with a variant of haplogroup A). In fact, haplogroups A, C, and Z along with haplogroups D, G, and Y constitute approximately 75% of the haplogroups of Siberia (Derenko et al. 2007; Mishmar et al. 2003).
The authors make a good point that this T in Siberians cannot be the result of Slavic expansion, as that postdates these ancient DNA samples. So, the picture seems reasonably consistent with what I know about Siberian prehistory, namely the presence of a Paleolithic substratum of east Eurasian origin, that was modified at its fringes by movements of Scytho-Sarmatian type of people of the steppes, and, more recently, by the expansion of the Russian Empire.

Human Biology, Volume 82, Number 2, April 2010

Investigation of Ancient DNA from Western Siberia and the Sargat Culture

Casey C. Bennett, Frederika A. Kaestle

Mitochondrial DNA from 14 archaeological samples at the Ural State University in Yekaterinburg, Russia, was extracted to test the feasibility of ancient DNA work on their collection. These samples come from a number of sites that fall into two groupings. Seven samples are from three sites, dating to the 8th-12th century AD, that belong to a northern group of what are thought to be Ugrians, who lived along the Ural Mountains in northwestern Siberia. The remaining seven samples are from two sites that belong to a southern group representing the Sargat culture, dating between roughly the 5th century BC and the 5th century AD, from southwestern Siberia near the Ural Mountains and the present-day Kazakhstan border. The samples are derived from several burial types, including kurgan burials. They also represent a number of different skeletal elements and a range of observed preservation. The northern sites repeatedly failed to amplify after multiple extraction and amplification attempts, but the samples from the southern sites were successfully extracted and amplified. The sequences obtained from the southern sites support the hypothesis that the Sargat culture was a potential zone of intermixture between native Ugrian and/or Siberian populations and steppe peoples from the south, possibly early Iranian or Indo-Iranian, which has been previously suggested by archaeological analysis.


November 19, 2010

Y chromosomes and mtDNA in Guinea-Bissau

I am not entirely convinced that R1b in this population represents only "European influx". Part of it may be related to the other African R1b.

Forensic Sci Int Genet. 2010 Nov 2. [Epub ahead of print]

Paternal and maternal lineages in Guinea-Bissau population.

Carvalho M, Brito P, Bento AM, Gomes V, Antunes H, Costa HA, Lopes V, Serra A, Balsa F, Andrade L, Anjos MJ, Corte-Real F, Gusmão L.

The aim of the present work was to study the origin of paternal and maternal lineages in Guinea-Bissau population, inferred by phylogeographic analyses of mtDNA and Y chromosome defined haplogroups. To determine the male lineages present in Guinea-Bissau, 33 unrelated males were typed using a PCR-SNaPshot multiplex based method including 24 Y-SNPs, which characterize the main haplogroups in sub-Saharan Africa and Western Europe. In the same samples, 17 Y-STRs (included in the YFiler kit, Applied Biosystems) were additionally typed. The most frequent lineages observed were E1b1a (xE1b1a4,7)-M2 (68%) and E1a-M33 (15%). The European haplogroup R1b1-P25 was represented with a frequency of 12%. The two hypervariable mtDNA regions were sequenced in 79 unrelated individuals from Guinea-Bissau, and haplogroups were classified based on control region motifs using mtDNA manager. A high diversity of haplogroups was determined in our sample being the most frequent haplogroups characteristic of populations from sub-Saharan Africa, namely L2a1 (15%), L3d (13%), L2c (9%), L3e4 (9%), L0a1 (8%), L1b (6%) and L1c1 (6%). None of the typical European haplogroups (H, J and T) were found in the present sample of Guinea-Bissau. From our results, it is possible to confirm that Guinea-Bissau presents a typically West African profile, marked by a high frequency of the Y chromosome haplogroup E1b1a(xE1b1a4,7)-M2 and a high proportion of mtDNA lineages belonging to the sub-Saharan specific sub-clusters L1 to L3 (89%). A small European influx has been also detected, although restricted to the male lineages.


November 18, 2010

Joint effects explain some hidden variance (Culverhouse et al. 2010)

It seems that my lego-block paradigm has found support after all. The authors found genetic effects for their trait of interest (nicotine dependence) for pairs of SNPs, where the SNPs themselves showed no individual effect. Thus: unremarkable "commodity" building blocks combined to produce a particular effect.

Note that this was done only on pairs of SNPs. But there is no reason to think that there won't be triads, or tetrads, or n-nads of SNPs having such an effect, and the important thing is: even if n-1 SNPs have no effect, n might. To give an everyday analogy: press Ctrl: nothing happens; press Alt: nothing happens; press Del: nothing happens, or perhaps a character is deleted; but press them all together, and all of a sudden something big happens.

There is a catch, however: for independent SNPs, the number of individuals that possess a particular n-long combination decreases exponentially with n. In short, you'd need to sample the whole population of the Earth, and you'd still not be able to find any individuals having some particular effective n-long combination, let alone a large enough sample to establish a statistical dependence with the trait of interest.
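The arithmetic behind this catch can be sketched in a few lines. The numbers are illustrative assumptions, not values from the paper: every SNP in the combination is taken to be independent with the same genotype frequency, and the world population is taken as roughly 6.9 billion.

```python
WORLD_POPULATION = 6.9e9  # rough 2010 figure, used only for illustration

def expected_carriers(n_snps, genotype_freq, population=WORLD_POPULATION):
    """Expected number of people carrying one specific genotype at all n independent SNPs."""
    return population * genotype_freq ** n_snps

def max_combination_length(genotype_freq, min_carriers, population=WORLD_POPULATION):
    """Largest n for which at least `min_carriers` people are expected to carry the combination."""
    if not 0.0 < genotype_freq < 1.0:
        raise ValueError("genotype_freq must be strictly between 0 and 1")
    n = 0
    while expected_carriers(n + 1, genotype_freq, population) >= min_carriers:
        n += 1
    return n

# Hypothetical case: each required genotype is carried by 25% of people.
for n in (2, 5, 10, 20):
    print(n, "SNPs ->", expected_carriers(n, 0.25), "expected carriers")

print("longest combination with >= 1 expected carrier:", max_combination_length(0.25, 1))
print("longest combination with >= 1000 expected carriers:", max_combination_length(0.25, 1000))
```

Even with fairly common genotypes, the pool of carriers runs out after a modest number of SNPs, long before a sample large enough for an association test could be assembled.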

To reiterate: genome-wide association studies treat humans like black boxes: flip a SNP and see if the person is nicotine dependent or not. Or, as in this study, flip two SNPs that looked like "dead switches" when you tried to flip them individually. That approach is a dead end for most complex traits of interest, and the way forward is to get into the box, and see what genes actually do.

HUMAN GENETICS DOI: 10.1007/s00439-010-0911-7

Uncovering hidden variance: pair-wise SNP analysis accounts for additional variance in nicotine dependence

Robert C. Culverhouse et al.


Results from genome-wide association studies of complex traits account for only a modest proportion of the trait variance predicted to be due to genetics. We hypothesize that joint analysis of polymorphisms may account for more variance. We evaluated this hypothesis on a case–control smoking phenotype by examining pairs of nicotinic receptor single-nucleotide polymorphisms (SNPs) using the Restricted Partition Method (RPM) on data from the Collaborative Genetic Study of Nicotine Dependence (COGEND). We found evidence of joint effects that increase explained variance. Four signals identified in COGEND were testable in independent American Cancer Society (ACS) data, and three of the four signals replicated. Our results highlight two important lessons: joint effects that increase the explained variance are not limited to loci displaying substantial main effects, and joint effects need not display a significant interaction term in a logistic regression model. These results suggest that the joint analyses of variants may indeed account for part of the genetic variance left unexplained by single SNP analyses. Methodologies that limit analyses of joint effects to variants that demonstrate association in single SNP analyses, or require a significant interaction term, will likely miss important joint effects.


November 17, 2010

mtDNA haplogroup C1 in Icelanders: a genetic mystery

American Journal of Physical Anthropology DOI: 10.1002/ajpa.21419

A new subclade of mtDNA haplogroup C1 found in Icelanders: Evidence of pre-Columbian contact?

Sigríður Sunna Ebenesersdóttir et al.

Although most mtDNA lineages observed in contemporary Icelanders can be traced to neighboring populations in the British Isles and Scandinavia, one may have a more distant origin. This lineage belongs to haplogroup C1, one of a handful that was involved in the settlement of the Americas around 14,000 years ago. Contrary to an initial assumption that this lineage was a recent arrival, preliminary genealogical analyses revealed that the C1 lineage was present in the Icelandic mtDNA pool at least 300 years ago. This raised the intriguing possibility that the Icelandic C1 lineage could be traced to Viking voyages to the Americas that commenced in the 10th century. In an attempt to shed further light on the entry date of the C1 lineage into the Icelandic mtDNA pool and its geographical origin, we used the deCODE Genetics genealogical database to identify additional matrilineal ancestors that carry the C1 lineage and then sequenced the complete mtDNA genome of 11 contemporary C1 carriers from four different matrilines. Our results indicate a latest possible arrival date in Iceland of just prior to 1700 and a likely arrival date centuries earlier. Most surprisingly, we demonstrate that the Icelandic C1 lineage does not belong to any of the four known Native American (C1b, C1c, and C1d) or Asian (C1a) subclades of haplogroup C1. Rather, it is presently the only known member of a new subclade, C1e. While a Native American origin seems most likely for C1e, an Asian or European origin cannot be ruled out.

Y-chromosome Microsatellite Genealogy Simulator request

If anyone downloaded YMGS version 1.0.3 before, please send me a copy. The site where it was hosted has been taken down, and I only have version 1.0.1 saved on the computer I am currently using.

UPDATE: Never mind, found it; new link updated in the YMGS page.

November 16, 2010

Genomic runs of homozygosity in worldwide populations (Kirin et al. 2010)

This is a very interesting paper about the global distribution of runs of homozygosity. Such runs are typical of recently inbred individuals (who have a greater chance of inheriting the same chunk of DNA from their related parents), but also occur because of population history (populations that today number in the millions are descended from a much smaller number of ancestors, so even if one's parents aren't "relatives" in the genealogical sense, they nonetheless carry chunks of identical DNA).

"Old" inbreeding manifests itself in small chunks, as DNA chunks of ancestors get cut into ever finer pieces across the generations, while recently inbred individuals may have very long chunks.

Oceanians and Native Americans, for example, who are descended from relatively few founders because of the bottlenecks associated with maritime voyages and the crossing of Beringia, have an excess of short runs of homozygosity, but Native Americans also have long ones, suggestive of recent consanguinity.
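The link between the age of shared ancestry and the length of a run can be sketched with a standard textbook approximation, which is not taken from the paper: if the two copies of a segment trace back to a common ancestor g generations ago, they are separated by 2g meioses, and with about one crossover per Morgan per meiosis the expected segment length is roughly 100/(2g) centimorgans.

```python
def expected_segment_cm(generations_to_ancestor):
    """Expected length (cM) of a segment shared through an ancestor g generations back.

    Assumes recombination acts as a Poisson process at one crossover per Morgan
    per meiosis, so the length is exponential with mean 100 / (2 * g) cM.
    """
    if generations_to_ancestor < 1:
        raise ValueError("generations_to_ancestor must be at least 1")
    meioses = 2 * generations_to_ancestor
    return 100.0 / meioses

# Offspring of first cousins: the shared ancestors are great-grandparents (g = 3).
print("first-cousin parents:", round(expected_segment_cm(3), 1), "cM")
# Shared ancestry tens or hundreds of generations ago gives much shorter segments.
for g in (10, 50, 100):
    print(f"ancestor {g} generations ago:", expected_segment_cm(g), "cM")
```

This is why long runs point to recent consanguinity, while an excess of short runs points to old bottlenecks and small ancestral population size.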

Raw data can be found in the supplement.

PLoS ONE 5(11): e13996. doi:10.1371/journal.pone.0013996

Genomic Runs of Homozygosity Record Population History and Consanguinity

Mirna Kirin et al.

The human genome is characterised by many runs of homozygous genotypes, where identical haplotypes were inherited from each parent. The length of each run is determined partly by the number of generations since the common ancestor: offspring of cousin marriages have long runs of homozygosity (ROH), while the numerous shorter tracts relate to shared ancestry tens and hundreds of generations ago. Human populations have experienced a wide range of demographic histories and hold diverse cultural attitudes to consanguinity. In a global population dataset, genome-wide analysis of long and shorter ROH allows categorisation of the mainly indigenous populations sampled here into four major groups in which the majority of the population are inferred to have: (a) recent parental relatedness (south and west Asians); (b) shared parental ancestry arising hundreds to thousands of years ago through long term isolation and restricted effective population size (Ne), but little recent inbreeding (Oceanians); (c) both ancient and recent parental relatedness (Native Americans); and (d) only the background level of shared ancestry relating to continental Ne (predominantly urban Europeans and East Asians; lowest of all in sub-Saharan African agriculturalists), and the occasional cryptically inbred individual. Moreover, individuals can be positioned along axes representing this demographic historic space. Long runs of homozygosity are therefore a globally widespread and under-appreciated characteristic of our genomes, which record past consanguinity and population isolation and provide a distinctive record of the demographic history of an individual's ancestors. Individual ROH measures will also allow quantification of the disease risk arising from polygenic recessive effects.


Demographic history of Oceania (Wollstein et al. 2010)

frappe analysis on the left: New Guinea Highlanders split at K=4 (light blue), with Polynesian-Fijians remaining aligned with East Asians; at K=5 the specificity of the East Eurasian component in Polynesians-Fijians (teal) is revealed; at K=5 the specificity of Borneo is apparent (red), but there are individuals of clearer East Asian ancestry remaining.

From the paper:
Among the three demographic models examined for the peopling of Near Oceania (Figure 4, models 2a–2c), the model receiving the highest support involves a split of New Guineans from a common European-East Asian (i.e., Eurasian) ancestor population. This finding does not support the southern dispersal hypothesis of separate human migrations from Africa to Near Oceania and to East Asia [33, 34]. The existence of a single ancestral population for all present-day non-Africans is supported, among other genetic evidence, by recent data from the Neandertal genome sequence, indicating that all present-day non-African genome sequences studied(including one from a Papua New Guinean) have equivalent amounts of Neandertal admixture [46].
However, the authors date the split of Near Oceanians from the common Eurasians at 27ky and of East Asians from Europeans at only 18ky. These dates are far too low, in my opinion, as there is evidence that Upper Paleolithic Europeans were already robust versions of modern Caucasoids.

Moreover, if Eurasian unity broke down at 27ky, then where were the Eurasians since the time they acquired "Neandertal admixture" until 27ky?

It is difficult to imagine Eurasians camping in the Near East or Europe (where Neandertals are attested) for tens of thousands of years before starting to split off at 27ky. And indeed, the fact that there are anatomically modern humans from South China and the Levant at around 100ky makes the idea that Eurasians got Neandertal admixture in one place before starting to disperse a few tens of thousands of years ago hard to believe.

I personally don't buy the idea that New Guineans have the same "Neandertal admixture" as Europeans. In fact, I doubt there is any substantial Neandertal admixture in Eurasians at all, and if there is, it is certainly not the 1-4% evenly distributed element across Eurasia that was discovered in the recent paper.

In any case, this issue is peripheral to this paper which offers important new data on the question of Oceanian origins.


Curr Biol. 2010 Nov 10. [Epub ahead of print]

Demographic History of Oceania Inferred from Genome-wide Data.

Wollstein A, Lao O, Becker C, Brauer S, Trent RJ, Nürnberg P, Stoneking M, Kayser M.


BACKGROUND: The human history of Oceania comprises two extremes: the initial colonizations of Near Oceania, one of the oldest out-of-Africa migrations, and of Remote Oceania, the most recent expansion into unoccupied territories. Genetic studies, mostly using uniparentally inherited DNA, have shed some light on human origins in Oceania, particularly indicating that Polynesians are of mixed East Asian and Near Oceanian ancestry. Here, we use ∼1 million single nucleotide polymorphisms (SNPs) to investigate the demographic history of Oceania in a more detailed manner.

RESULTS: We developed a new approach to account for SNP ascertainment bias, used approximate Bayesian computation simulations to choose the best-fitting model of population history, and estimated demographic parameters. We find that the ancestors of Near Oceanians diverged from ancestral Eurasians ∼27 thousand years ago (kya), suggesting separate initial occupations of both territories. The genetic admixture in Polynesian history between East Asians (∼87%) and Near Oceanians (∼13%) occurred ∼3 kya, prior to the colonization of Polynesia. Fijians are of Polynesian (∼65%) and additional Near Oceanian (∼35%) ancestry not found in Polynesians, with this admixture occurring considerably after the initial settlement of Remote Oceania. Our data support a greater contribution of East Asian women than men in the admixture history of Remote Oceania and highlight population substructure in Polynesia and New Guinea.

CONCLUSIONS: Despite the inherent ascertainment bias, genome-wide SNP data provide new insights into the genetic history of Oceania. Our approach to correct for ascertainment bias and obtain reliable inferences concerning demographic history should prove useful in other such studies.
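The admixture proportions quoted in the abstract can be chained to get the implied overall ancestry of Fijians. This is simple arithmetic on the rounded figures above, under the assumption (mine, not stated in the abstract) that the Polynesian component in Fijians has the same East Asian/Near Oceanian makeup as Polynesians generally.

```python
# Rounded proportions quoted in the abstract.
POLYNESIAN = {"East Asian": 0.87, "Near Oceanian": 0.13}
FIJIAN = {"Polynesian": 0.65, "Near Oceanian": 0.35}

def fijian_composition(polynesian=POLYNESIAN, fijian=FIJIAN):
    """Express Fijian ancestry in terms of East Asian and Near Oceanian sources."""
    east_asian = fijian["Polynesian"] * polynesian["East Asian"]
    near_oceanian = (fijian["Polynesian"] * polynesian["Near Oceanian"]
                     + fijian["Near Oceanian"])
    return {"East Asian": east_asian, "Near Oceanian": near_oceanian}

for source, share in fijian_composition().items():
    print(f"{source}: {share:.1%}")
```

On these figures, Fijians come out at a little over half East Asian and a little under half Near Oceanian in ultimate ancestry.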


November 15, 2010

Reconstruction of 2,500 year old Carthaginian

Racial Reality points me to this reconstruction. From the related story:
An anthropological study of the skeleton showed that the man died between the age of 19 and 24, had a pretty robust physique and was 1.7 metres (five feet six inches) tall, according to a description by Jean Paul Morel, director of the French archaeological team at Carthage Byrsa.


"We can clearly see that this exceptional witness to Carthage in the Punic era is a Mediterranean man, he has all the characteristics," noted Sihem Roudesli, a paleo-anthropologist at the Tunisian National Heritage Institute.

November 14, 2010

Austronesians in Nias

Mol Biol Evol (2010) doi: 10.1093/molbev/msq300

Unexpected island effects at an extreme: reduced Y-chromosome and mitochondrial DNA diversity in Nias

Mannis van Oven et al.

The amount of genetic diversity in a population is determined by demographic and selection events in its history. Human populations which exhibit greatly reduced overall genetic diversity, presumably resulting from severe bottlenecks or founder events, are particularly interesting, not least because of their potential to serve as valuable resources for health studies. Here, we present an unexpected case, the human population of Nias Island in Indonesia, that exhibits severely reduced Y chromosome (NRY) and to a lesser extent also reduced mitochondrial (mt)DNA diversity as compared with most other populations from the Asia / Oceania region. Our genetic data, collected from more than 400 individuals from across the island, suggest a strong, previously undetected bottleneck or founder event in the human population history of Nias, more pronounced for males than for females, followed by subsequent genetic isolation. Our findings are unexpected given the island's geographic proximity to the genetically highly diverse Southeast Asian world, as well as our previous knowledge about the human history of Nias. Furthermore, all NRY and virtually all mtDNA haplogroups observed in Nias can be attributed to the Austronesian expansion, in line with linguistic data, and in contrast with archaeological evidence for a pre-Austronesian occupation of Nias that, as we show here, left no significant genetic footprints in the contemporary population. Our work underlines the importance of human genetic diversity studies not only for a better understanding of human population history, but also because of the potential relevance for genetic disease mapping studies.


November 13, 2010

Skin yellowness but not masculinity as indicators of male facial attractiveness

We should keep in mind that this sample was recruited from a fairly homogeneous population, and thus is not directly relevant if one is interested in attractiveness of people of different ancestry.

From the paper:
The regression retained only skin yellowness as a predictor of attractiveness, and the effect of skin yellowness was positive and highly significant (F(1,71) = 10.806, Beta = .366, t = 3.287, p < .002). Skin lightness, redness and morphological masculinity did not significantly predict attractiveness (all p > .114, see Table 1).
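As a quick sanity check on the quoted statistics: when a regression retains a single predictor, the F statistic on (1, df) degrees of freedom should equal the square of that predictor's t statistic. The short Python sketch below only re-uses the two numbers quoted above; the variable names are my own and nothing here comes from the paper's data.

```python
# Values as reported in the quoted passage (single retained predictor).
t_reported = 3.287
f_reported = 10.806

# With one predictor, F(1, df) is the squared t statistic of that predictor.
f_from_t = t_reported ** 2

print(f"t squared  = {f_from_t:.3f}")
print(f"reported F = {f_reported:.3f}")
print(f"difference = {abs(f_from_t - f_reported):.3f}")
```

The two values agree to within rounding of the reported t, so the quoted F and t are mutually consistent.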

On the left, faces that scored low/high in brightness (top), redness (middle), and yellowness (bottom). On the right, composites of low/high masculinity based on morphology (top row), ratings (middle row), and morphing of faces in a more feminine (bottom left) or more masculine direction (bottom right).
PLoS One. 2010 Oct 27;5(10):e13585.

Does masculinity matter? The contribution of masculine face shape to male attractiveness in humans.

Scott IM, Pound N, Stephen ID, Clark AP, Penton-Voak IS.


BACKGROUND: In many animals, exaggerated sex-typical male traits are preferred by females, and may be a signal of both past and current disease resistance. The proposal that the same is true in humans - i.e., that masculine men are immunocompetent and attractive - underpins a large literature on facial masculinity preferences. Recently, theoretical models have suggested that current condition may be a better index of mate value than past immunocompetence. This is particularly likely in populations where pathogenic fluctuation is fast relative to host life history. As life history is slow in humans, there is reason to expect that, among humans, condition-dependent traits might contribute more to attractiveness than relatively stable traits such as masculinity. To date, however, there has been little rigorous assessment of whether, in the presence of variation in other cues, masculinity predicts attractiveness or not.

METHODOLOGY/PRINCIPAL FINDINGS: The relationship between masculinity and attractiveness was assessed in two samples of male faces. Most previous research has assessed masculinity either with subjective ratings or with simple anatomical measures. Here, we used geometric morphometric techniques to assess facial masculinity, generating a morphological masculinity measure based on a discriminant function that correctly classified >96% faces as male or female. When assessed using this measure, there was no relationship between morphological masculinity and rated attractiveness. In contrast, skin colour - a fluctuating, condition-dependent cue - was a significant predictor of attractiveness.

CONCLUSIONS/SIGNIFICANCE: These findings suggest that facial morphological masculinity may contribute less to men's attractiveness than previously assumed. Our results are consistent with the hypothesis that current condition is more relevant to male mate value than past disease resistance, and hence that temporally fluctuating traits (such as colour) contribute more to male attractiveness than stable cues of sexual dimorphism.


November 12, 2010

Y chromosomes in Brabant

From the paper:
The Duchy of Brabant was a historical region in the Low Countries between the 12th and 18th century and consisted of a present-day Dutch province and three contemporary Belgian provinces together with the Brussels-Capital Region. The total area is 14,425 km² with approximately 150 km between the two most remote places in Brabant. The main reason for selecting this region was the ability to obtain reliable genealogical data of the patrilineal line for each of the numerous donors living together on a small geographical scale.

The authors typed 37 Y-STRs, and 103 Y-SNPs. They write:
All individuals were correctly assigned to the main haplogroups using the Whit Athey's Haplogroup Predictor. In total, eight main haplogroups were observed with almost 85% of the samples belonging to haplogroup R (63%) and I (21%) (Table 1). On the lowest observed level of the phylogenetic tree 32 subhaplogroups were found in the dataset, whereby nearly 70% of all samples belonged to only four subhaplogroups: R1b1b2a1 (R-U106), R1b1b2a2* (R-P312*), R1b1b2a2g (R-U152) and I1* (I-M253*).

They found star-patterns in all their subhaplogroups, but uncovered some structure in their J2a* (J-M410*) chromosomes. Youngest expansion ages "were observed for E1b1b1a2 (E-V13) and I1* (I-M253*), respectively 4182–5855 and 4531–6344 years ago."

a strong downward trend in the frequency of haplogroup R was observed from North to South (Table 1; Fig. S5). The difference in the frequency of R haplogroups was circa 10% between the most northern and southern part, mainly due to the downward frequency of R1b1b2a1 (R-U106).
The European-wide distribution of R-U106 suggests to me that it was a Germanic lineage.

Moreover, it was even possible to detect further substructuring within subhaplogroup J2a* (J-M410*) based on the network analysis of all single-allele Y-STR haplotypes. Nevertheless, it was remarkable that the network analyses could not differentiate all observed subhaplogroups within R1b1b2 (R-M269) and I2b (I-M223). This might be due to the relatively young age of these specific subhaplogroups making it impossible to differentiate these groups based on the Y-STRs.

The extraordinary success of these subhaplogroups is one of the most interesting questions: natural selection, or demographic dominance of a recently formed population group storming Western Europe by force of numbers? Ancient Y-DNA urgently needed...

The occurrence of haplogroup Q1 at 2.6% in Kempen and 1.59% in Mechelen is an oddity that might merit further study.

In short, this might be called a "model study" of Y-chromosome variation, owing to the large number of individuals (477) and markers tested.

Forensic Sci Int Genet. 2010 Oct 29. [Epub ahead of print]

Micro-geographic distribution of Y-chromosomal variation in the central-western European region Brabant.

Larmuseau MH, Vanderheyden N, Jacobs M, Coomans M, Larno L, Decorte R.


One of the future issues in the forensic application of the haploid Y-chromosome (Y-chr) is surveying the distribution of the Y-chr variation on a micro-geographical scale. Studies on such a scale require observing Y-chr variation on a high resolution, high sampling efforts and reliable genealogical data of all DNA-donors. In the current study we optimised this framework by surveying the micro-geographical distribution of the Y-chr variation in the central-western European region named Brabant. The Duchy of Brabant was a historical region in the Low Countries containing three contemporary Belgian provinces and one Dutch province (Noord-Brabant). 477 males from five a priori defined regions within Brabant were selected based on their genealogical ancestry (known pedigree at least before 1800). The Y-haplotypes were determined based on 37 Y-STR loci and the finest possible level of substructuring was defined according to the latest published Y-chr phylogenetic tree. In total, eight Y-haplogroups and 32 different subhaplogroups were observed, whereby 70% of all participants belonged to only four subhaplogroups: R1b1b2a1 (R-U106), R1b1b2a2* (R-P312*), R1b1b2a2g (R-U152) and I1* (I-M253*). Significant micro-geographical differentiation within Brabant was detected between the Dutch (Noord-Brabant) vs. the Flemish regions based on the differences in (sub)haplogroup frequencies but not based on Y-STR variation within the main subhaplogroups. A clear gradient was found with higher frequencies of R1b1b2 (R-M269) chromosomes in the northern vs. southern regions, mainly related to a trend in the frequency of R1b1b2a1 (R-U106).