September 19, 2011

Inference of ancient human demography from individual genomes (Gronau et al. 2011)

This new paper is reminiscent of Li & Durbin (2011), in that it also fits a model of ancient human demography based on individual genome sequences. Unlike that paper, it also considers a San individual, and is hence a good realization of the project I proposed in response to the Li & Durbin paper.

As is so often the case, the absolute age estimates are based on a calibration, which is spelled out quite nicely in the supplementary material (pdf; p. 55). In particular, the age estimates are based on:
  • Human-chimp divergence of 6.5Mya
  • Generation length of 25 years
As the authors note, their calibration results in:
"an adjusted estimate of the per generation mutation rate would be slightly more than 2 × 10^-8 mutations per site. This adjusted estimate agrees well with independent estimates of 1.8–2.5 × 10^-8 (Nachman and Crowell, 2000; Kondrashov, 2003). It is slightly higher than recently reported estimates of 1.0–1.3 × 10^-8 (The 1000 Genomes Project Consortium, 2010; Lynch, 2010; Roach et al., 2010), but, considering the many sources of uncertainty in these studies, we do not regard this difference as a serious concern. It is difficult to reconcile per-generation mutation rate estimates as low as 1 × 10^-8 with the observed levels of human/chimpanzee genomic divergence."
However, Nachman & Crowell do not provide a mutation rate estimate independent of demography. As can be seen from Table 3 of their paper, their mutation rate estimate depends on the date of human-chimp speciation as well as on assumptions about ancestral effective population size. First, they assume a generation length of 20 years, hence their calibrations need to be scaled: 6.5My in 25y generations is equivalent to 5.2My in 20y generations. Nachman and Crowell estimate the mutation rate at 2.5 × 10^-8 and 1.4 × 10^-8 with an effective size of 10,000 individuals and speciation at 5 or 5.5Mya.
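The generation-length rescaling above is simple arithmetic: hold the number of generations fixed and change the years per generation. A minimal sketch follows; the function name `rescale_years` is my own for illustration and does not come from either paper.

```python
def rescale_years(years, gen_length_from, gen_length_to):
    """Convert a date in years between two generation-length assumptions,
    keeping the number of elapsed generations fixed."""
    generations = years / gen_length_from
    return generations * gen_length_to


# 6.5 My at 25 years per generation, re-expressed at 20 years per generation.
split_in_20y_generations = rescale_years(6.5e6, 25, 20)
print(f"{split_in_20y_generations / 1e6:.1f} My")
```

Under these inputs the split corresponds to 260,000 generations, which is why it maps to 5.2My on Nachman & Crowell's 20-year scale.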

Hence, their mutation rate estimate for 5.2My would be between 1.4 × 10^-8 and 2.5 × 10^-8, i.e., close to the value of Gronau et al. (2011), assuming that the effective population size was 10,000 individuals. Gronau et al. estimate the effective population size at 9,000 individuals. So, there is nothing independent about N&C's mutation rate estimate: it is dependent on the effective population size. Gronau et al.'s mutation rate/effective size estimate of 2.0 × 10^-8/9,000 individuals may be consistent with the data, but so is a lower mutation rate combined with a higher effective size.
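One way to see the trade-off is to assume, as a simplification, that the sequence data mainly constrain the standard population-genetic product theta = 4 * Ne * mu rather than Ne and mu separately. The sketch below rests on that assumption; the function names are hypothetical, and 1.0 × 10^-8 is used only as the low end of the range quoted above.

```python
def theta(ne, mu):
    """Population-scaled mutation rate, theta = 4 * Ne * mu (diploid)."""
    return 4 * ne * mu


def ne_given_theta(theta_value, mu):
    """Effective size implied by a fixed theta under a different mutation rate."""
    return theta_value / (4 * mu)


# Gronau et al.'s pairing as described in the post: mu ~ 2.0e-8, Ne ~ 9,000.
observed_theta = theta(9_000, 2.0e-8)

# Same theta, but with the slower rate of 1.0e-8.
ne_slow = ne_given_theta(observed_theta, 1.0e-8)
print(f"theta = {observed_theta:.2e}, implied Ne at slower rate = {ne_slow:,.0f}")
```

In this simplified picture, halving the mutation rate doubles the implied effective size without changing the fit to the data.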

Note that, unlike Li & Durbin, Gronau et al. do not consider a model with a structured African population, or the presence of archaic admixture. These would have produced observed divergence times by a combination of a younger divergence between modern human groups, coupled with admixture with a more distantly diverged (archaic or "Palaeoafrican") population, for which there is now genetic and palaeoanthropological evidence.

I do not have a strong opinion on how the 2-fold mutation rate difference between different papers will be resolved. If the slower empirical estimates are accepted, then this would result in deeper divergences between human populations, as well as an earlier human-chimp split, but the change in dates is not necessarily linear in the mutation rate.
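For a rough sense of scale, here is the naive linear case, in which every date is simply multiplied by the ratio of the two mutation rates. This is only a first-order sketch: as noted, the real adjustment need not be linear. The date ranges are those given in the paper's abstract (quoted further down), and the rates 2.0 × 10^-8 and 1.0 × 10^-8 are round illustrative endpoints, not the authors' exact calibration.

```python
def rescale_date(date_kya, mu_assumed, mu_alternative):
    """Naive linear rescaling: a date inferred under mu_assumed,
    re-expressed under mu_alternative (slower rate => older date)."""
    return date_kya * mu_assumed / mu_alternative


# Date ranges (kya) from the abstract of Gronau et al. (2011).
published = {
    "San vs. other humans": (108, 157),
    "Eurasians vs. Africans": (38, 64),
}

for split, (low, high) in published.items():
    new_low = rescale_date(low, 2.0e-8, 1.0e-8)
    new_high = rescale_date(high, 2.0e-8, 1.0e-8)
    print(f"{split}: {low}-{high} kya -> {new_low:.0f}-{new_high:.0f} kya")
```

With a two-fold slower rate, the linear approximation simply doubles each date.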

As I have noted before, there is no reason I can think of why parent-offspring rates should be slower than evolutionary ones. Two potential processes might actually make them appear faster: phantom mutations produced by current whole genome sequencing technology, or loss of mutations due to drift across geological time scales. So, unless there is a technical reason for the low 1000 Genomes rate, I'm more inclined to trust it than circular calibrations of demography/mutation rate/effective population size. In any case, we will have more full genome sequences from family members in the coming years, so the mutation rate will be calibrated directly, without recourse to human-chimp speciation or ancestral population sizes.

A slower mutation rate would make sense to me on palaeoanthropological grounds:
  • The authors estimate European/East Asian divergence at 30-45kya. But the presence of clearly derived Caucasoid morphology in the Upper Paleolithic population of Europe suggests to me that divergence may have begun some time before.
  • Table S2 of adjusted Mahalanobis distances from Harvati et al. (2011) leaves little doubt that the Eurasian anatomically modern humans (EAM) from the Levant (Skhul/Qafzeh) are related to subsequent Eurasians. EAM has a distance of -0.25 to later Upper Cave from China (UC); 6.42 to recent Oceanians (OCE); 7.19 to Upper Paleolithic Eurasians. All of the above are well within the maximum divergence observed between any two modern human groups. Ancestral Eurasians likely lived before 100kya, and did not split from Africans only 50kya.
  • If there was a long isolation between Khoe-San and the rest of mankind, then where did it happen? It is no longer plausible to postulate multiple fully modern groups in Africa that are absolutely absent from the palaeoanthropological record in the timeframe in question, in reproductive isolation to the multiple archaic or archaic-like ones that keep turning up.
  • How did the ur-humans in Africa manage reproductive isolation for tens of thousands of years between themselves (Khoe-San vs. rest or moderns vs. archaics), but apparently mix a-plenty with Neandertals/Denisovans right after they left Africa? Were Neandertal women really that sexy?
  • Actually, the fragmentary record, as it stands, has not revealed any traces of a Proto-San population, and the Hofmeyr skull from South Africa stands as an outlier in the African paleoanthropological record with its strong affinities to Upper Paleolithic Eurasians.
We are only now beginning to harness the power of full human genomes for evolutionary inferences, but it is inevitable that a new theory of human origins will appear that will reconcile the different and conflicting lines of evidence. That theory must take into account latent admixture as a cause of African genetic diversity, and it must also harmonize with the paleoanthropological record.

Nature Genetics (2011) doi:10.1038/ng.937

Bayesian inference of ancient human demography from individual genome sequences

Ilan Gronau et al.

Whole-genome sequences provide a rich source of information about human evolution. Here we describe an effort to estimate key evolutionary parameters based on the whole-genome sequences of six individuals from diverse human populations. We used a Bayesian, coalescent-based approach to obtain information about ancestral population sizes, divergence times and migration rates from inferred genealogies at many neutrally evolving loci across the genome. We introduce new methods for accommodating gene flow between populations and integrating over possible phasings of diploid genotypes. We also describe a custom pipeline for genotype inference to mitigate biases from heterogeneous sequencing technologies and coverage levels. Our analysis indicates that the San population of southern Africa diverged from other human populations approximately 108–157 thousand years ago, that Eurasians diverged from an ancestral African population 38–64 thousand years ago, and that the effective population size of the ancestors of all modern humans was ~9,000.


10 comments:

German Dziebel said...

"Actually, the fragmentary record, as it stands, has not revealed any traces of a Proto-San population..."

Not exactly. See here http://www.scielo.org.za/scielo.php?pid=S0038-23532007000400020&script=sci_arttext. There's nothing in the Paleolithic contexts, though.

"How did the ur-humans in Africa manage reproductive isolation for tens of thousands of years between themselves (Khoe-San vs. rest or moderns vs. archaics), but apparently mixed a-plenty right after they left Africa with Neandertals/Denisovans?"

Good point. Also, how can San be completely modern in their language and behavior if supposedly they separated from the rest of humanity 60-100K years before we begin seeing stable traces of modern human behavior in the archaeological record? Did modern language and behavior evolve independently at least twice in Africa? This doesn't make sense.

Geneticists constantly create these living fossils in the form of Bushmen and Pygmies, while the reality is that they represent local and relatively recent African developments and not ancient retentions.

Jean said...

Have the authors taken into account back-migration into Africa of farmers and pastoralists carrying Y-DNA E and R1b?

If they haven't then this could mess up their calculations. If the major part of the population of Africa today actually has some ancestry in common with the Near East, surely the date of their separation from Eurasian people is going to look a lot younger than the actual "out of Africa" migration? Or am I just a clueless non-mathematician?

Andrew Oh-Willeke said...

A factor that could really screw with the inferred mutation rate is the possibility that mutation rates are higher in circumstances that put more environmental stress on the average human (e.g. drought-caused malnutrition), leading to periods of relatively high mutation rates and periods of relatively low mutation rates.

Dienekes said...

"There's nothing in the Paleolithic contexts, though."

That is the point, that the San are said to have been diverging from other modern humans for more than 100,000 years, and nothing resembling the San even remotely is found until very recent times.

German Dziebel said...

"That is the point, that the San are said to have been diverging from other modern humans for more than 100,000 years, and nothing resembling the San even remotely is found until very recent times."

We're in complete agreement on that. I was just saying that there are Khoisanid skulls from the early Holocene on. This gives us a bit of a timeline on when San may have diverged from their source population. Linguists date the Khoisan family at 15-10K, which is pretty close. With lots of uncertainties surrounding this language family and the rate of skull recoveries, it's still notable that, if we calibrate gene divergence by a cross between paleoanthropological and linguistic dates, we'll have to multiply the evolutionary rate ten- to twenty-fold to arrive at the 100-200K window for San divergence. This is in addition to dividing it by three in order to arrive at the Caucasoid divergence. This is fine by me, as I think the mutation rate is lineage-specific, but how does it sound to you?

terryt said...

"As I have noted before, there is no reason I can think of why parent-offspring rates should be slower than evolutionary ones".

Or even constant. Surely the 'mutation rate' relies on constant survival of genetic changes, but that is unlikely to be the case. Even mutations not subject to selection either for or against can hardly be assumed to arise at a constant rate, especially in the short term. Andrew sees the same problem:

"A factor that could really screw with the inferred mutation rate is the possibility that mutation rates are higher in circumstances that put more environmental stress on the average human (e.g. drought caused malnutrition), leading to period of relatively high mutation rates and relatively low mutation rates".

Azerty said...

The TMRCA of the Khoisan Y-DNA marker A3b1 is only about 30kya to 15kya, and probably 9kya.

What about the Aterian culture of 200kya? It still existed until 20kya, with the beginning of the Iberomaurusian culture in the Maghreb at 50kya. Why ignore the North African ones?!

Jim said...

"Linguists date the Khoisan family at 15-10K, which is pretty close."

And then there are the many linguists who doubt it is one family at all, so a date for the proto-language is moot. At best Khoisan is an areal grouping. One supposed sub-group does appear to be a Sprachbund in the group, which can look like a language family at first glance. But that's about all there is to the so-called Khoisan language family.


One of the groups shows some affinities with Afro-Asiatic that the others don't, which is another nail in the coffin of Khoisan. One of these affinities was in having noun gender, and this further made for similarities in the pronouns. Since pronouns are rarely borrowed this suggests a genetic rather than an areal connection.

German Dziebel said...

@Jim

That's exactly what my caveat was: "With lots of uncertainties surrounding this language family..."

"One of the groups shows some affinities with Afro-Asiatic that the others don't"

Can you be more specific?

"Since pronouns are rarely borrowed this suggests a genetic rather than an areal connection."

But pronouns, including the whole pronominal systems, DO get borrowed (http://books.google.com/books?id=h36tPYqAZPwC&pg=PA248&lpg=PA248&dq=lyle+campbell+pronouns+borrowed&source=bl&ots=etjvxplwjO&sig=dtTcnNltM0_wKp9hJf_iSrBVTTc&hl=en&ei=ZON5Tom1JIivsAKQl-CiAw&sa=X&oi=book_result&ct=result&resnum=7&ved=0CEwQ6AEwBg#v=onepage&q&f=false). Whether it's rare or not doesn't really matter when we're talking about a single case which may easily fall into a "rare" category.

On a separate note, Hadza, which is considered an outlier in the Khoisan language family hypothesis, is also a genetic outlier.

Jim said...

"But pronouns, including the whole pronominal systems, DO get borrowed..."

Yeah, that's why I said "suggests" rather than "indicates". Thai, for instance, has borrowed English "you", and for two reasons. One is that pronouns are an open class in Thai, unlike in most languages outside SE Asia, so adding pronouns doesn't disrupt the system because there is none; and two, since the class is open because it accommodates a complex social ranking system, the simplicity of having a single word to cover all second person reference is very appealing. Those are very special conditions.

But as you say, rarity does not mean something never occurs. In the case of "Central Khoisan" and Afroasiatic, borrowing of pronouns at the very least is an indicator of geographic vicinity. That's interesting by itself.

And the same holds for the noun gender system. There are examples all over the place of similar typological borrowings and the name for this is "areal feature".

However, when you have a situation where you are looking for genetic affiliation between groups, and you have evidence like pronoun and gender systems that link one group to another, as opposed to weak, impressionistic observations of phonetic similarities such as shared click phonemes linking it to some other group, it seems likeliest that the first alignment is more valid than the second.