May 04, 2014

Genealogical vs. Evolutionary Y-STR mutation rate

Long-time readers will remember my Y-STR series, which was inspired by my desire to figure out why some papers used the directly observed mutation rate for Y-STRs while others used a 3-times slower "evolutionary" one. My conclusion was that the evolutionary rate was misapplied: its theoretical justification hinged on an assumption of constant population size that was wrong for modern humans, and even modest amounts of growth led to a mutation rate that was closer to the genealogical one.

My enthusiasm for Y-STR based dating waned in 2011 when Busby et al. (2011) published a study which showed the effect of microsatellite choice on TMRCA estimates due to the fact that Y-STRs don't follow the unconstrained strict symmetrical stepwise model (i.e., increase by +/- 1 step per mutational event with equal probability). This meant that various published age estimates depended on the cocktail of slow/fast Y-STRs used as well as the age of the target node in the phylogeny (because deviations from linearity were more egregious for older nodes). Additionally, at that time it was clear that whole Y chromosome sequencing was around the corner and so the issue would soon be resolved by a new technology.

A new preprint presents an interesting coda to this long-standing controversy and basically agrees that the genealogical rate is better than the evolutionary one, in the sense that it produces age estimates that are closer to those from resequencing. The evolutionary rate is good only for the very deep split in the tree, which is not surprising since this is the domain where deviations from linearity plague the genealogical rate, so a slower rate will do better.

One comment to the authors is that the generation length of 25 years is not applicable to modern humans, but rather a male generation length of 31-32 years has been estimated across different cultures. The authors estimate the "super-Eurasian" CT clade at 56.26kya (using sequence data) which would correspond to ~71 thousand years if we apply a 31.5/25 multiplier. This would bring it closer to the age estimate for Eurasian M+N mtDNA which has been estimated without an assumption of generation length using ancient DNA. (There is of course no a priori reason that the two should match, but the dates seem to match with the drying up of the Sahara-Arabia zone c. 70kya. As I've argued before, the population contraction that must have accompanied such a traumatic event would be a good opportunity for drift to shed genetic lineages and leave the CT/M+N pair as the inheritors of Eurasia).
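The rescaling above can be written out as a minimal sketch. It assumes, as the multiplier in the text does, that the estimated age scales linearly with the assumed generation length; the function name `rescale_age` and its parameters are hypothetical and not taken from the preprint.

```python
# Hypothetical helper: rescale an age estimate that was computed with one
# assumed generation length to a different generation length.
def rescale_age(age_kya, assumed_gen_years=25.0, revised_gen_years=31.5):
    """Return the age in kya after swapping the generation length.

    Assumption: the age scales linearly with generation length, i.e. the
    underlying rate is fixed per generation rather than per year.
    """
    return age_kya * revised_gen_years / assumed_gen_years


# CT clade age from the post (sequence-based, 25-year generations).
ct_rescaled = rescale_age(56.26)
print(f"CT rescaled: {ct_rescaled:.1f} kya")  # CT rescaled: 70.9 kya
```

If the rate behind the sequence-based date were instead fixed per year, no such rescaling would apply, so the ~71 thousand year figure is only as good as that assumption.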

bioRxiv doi: 10.1101/004705

Comparison of Y-chromosomal lineage dating using either evolutionary or genealogical Y-STR mutation rates

Chuan-Chao Wang, Li Hui

We have compared the Y chromosomal lineage dating between sequence data and commonly used Y-SNP plus Y-STR data. The coalescent times estimated using evolutionary Y-STR mutation rates correspond best with sequence-based dating when the lineages include the most ancient haplogroup A individuals. However, the times using slow mutated STR markers with genealogical rates fit well with sequence-based estimates in main lineages, such as haplogroup CT, DE, K, NO, IJ, P, E, C, I, J, N, O, and R. In addition, genealogical rates lead to more plausible time estimates for Neolithic coalescent sublineages compared with sequence-based dating.


22 comments:

Blender said...

Hi Dienekes,

I think you overlooked Wei et al. 2013 in your discussion.
http://www.fsigenetics.com/article/S1872-4973(13)00102-6/abstract

Unknown said...

Dienekes - you were ahead of the zeitgeist on this one. That's quite a bit of difference and makes a lot of diversity quite recent.

terryt said...

Unfortunately Australian Y-DNA was not included, neither C1d nor K1. This is more than a little surprising considering Australia provides some of the earliest evidence for modern human habitation outside Africa.

Annie Mouse said...

Hmm a glaring absence of G2, so far as I can tell. Which is after all the oldest Y haplogroup we have actually found, hence a good calibration check. Wonder why they left it out?

"In addition, genealogical rates lead to more plausible time estimates for Neolithic coalescent sublineages compared with sequence-based dating"

Hmmmm. So sequence based calculations are fine for the old stuff, but not when it gives ages that are too old (not "plausible") for the neolithic expansion haplogroups. For that we need the notoriously dodgy STRs with their genealogical rates. Hmmmm. How very self fulfilling.

eurologist said...

We should just disregard generation length altogether, and focus on a mutation rate over time. Male offspring mutation load increases approximately linearly with the age of the father, with the zero intercept approximately at birth. This is even true for autosomal DNA, for which the male line mutations overwhelm the female line ones. Thus, generation length drops out of the picture except for studies that use recent known lineages. However, for these the generation length is known and can be factored out.
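A small sketch of why generation length cancels under that linear model. The per-year slope used here is an arbitrary illustrative number, not a measured rate, and `mutations_over_span` is a hypothetical name.

```python
import math


def mutations_over_span(span_years, gen_years, per_year_slope):
    """Expected mutations along one male line over span_years.

    Assumption: mutations per generation grow linearly with the father's
    age, with zero intercept at birth, so each generation contributes
    per_year_slope * gen_years mutations.
    """
    generations = span_years / gen_years
    per_generation = per_year_slope * gen_years
    return generations * per_generation


# Illustrative slope only; the two generation lengths are those discussed above.
short_gen = mutations_over_span(50_000, 25.0, 0.001)
long_gen = mutations_over_span(50_000, 31.5, 0.001)
print(math.isclose(short_gen, long_gen))
```

The generation length appears once in the numerator and once in the denominator, so the total depends only on elapsed time and the per-year slope.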

Mark Moore (Moderator) said...

We can presume C1d and K1 came after CT, which makes it quite recent even by the new measure. Good call by our host on the doubtful rate used by so many scientists. Why? Why use a number several times slower than the observed rate?

I do wonder about the generational age though. Is it the age of the male at their first offspring or their median age for all offspring? If the latter, maybe even 32 is too low. If the former, I would suggest that even 25 would be too high in many cultures. I would guess the age at first child for a hunter-gatherer in a plentiful environment to still be in the teens. For poor agricultural societies it might be later. That leads to the question of how much of the mutational difference in sub-Saharan Africans, and Australians, and other Eurasians is a function of difference in generation times?

andrew said...

"The authors estimate the "super-Eurasian" CT clade at 56.26kya (using sequence data) which would correspond to ~71 thousand years if we apply a 31.5/25 multiplier. This would bring it closer to the age estimate for Eurasian M+N mtDNA which has been estimated without an assumption of generation length using ancient DNA. (There is of course no a priori reason that the two should match, but the dates seem to match with the drying up of the Sahara-Arabia zone c. 70kya. As I've argued before, the population contraction that must have accompanied such a traumatic event would be a good opportunity for drift to shed genetic lineages and leave the CT/M+N pair as the inheritors of Eurasia)"

I agree that 25 years is very low. The 29 year generation is pretty standard for estimates, and your reference of 31 years for male uniparental and 25-28 of female generations with 29 years for autosomal is about right.

Even more key, I think Dienekes is right to infer that the mutation events should line up with a key event that would be a post-Out of Africa population bottleneck leading to genetic drift, rather than the Out of Africa event itself. The absence of mtDNA M and N lineages, or of Y-DNA C or F lineages in Africa, except for lineages convincingly traceable to Upper Paleolithic or later back migration, strongly argues for the case that the Out of Africa population had predominantly or exclusively mtDNA L3 and Y-DNA B from which C and F derive (I deliberately don't resolve the issue of the source of Y-DNA DE clades - but Y-DNA E in Eurasia is probably post-LGM, or at least Upper Paleolithic in origin, while Y-DNA D has a very quirky distribution that might have been part of a separate wave from the C and F derived clades in Eurasia). In a continually expanding population model, drift would have a very low probability of so completely removing mtDNA L3 in favor of mtDNA M and N, and removing Y-DNA B in favor of Y-DNA CT from the proto-Eurasian population, or of mtDNA M and N emerging so closely in time to each other. But a proto-Eurasian modern human population bottleneck in SW Asia (which would also be a plausible time for the introgression of the part of Neanderthal ancestry shared by all Eurasians) makes a purge of mtDNA L3 and Y-DNA B in favor of the Out of Africa derived versions of these clades seem much more plausible.

Also, if the genetic dates are pointing to the start of the expansion from the most severe post-Out of Africa, pre-into SE Asia population bottleneck of proto-Eurasians, rather than to the Out of Africa event itself, then the discrepancy between the ca. 71kya genetic dates, which have an appropriate Toba eruption associated climate correspondence, and the earliest archaeological evidence for modern humans outside Africa, ca. 100kya to 120kya, is explained. A population bottleneck around that time fits with the gap in the archaeological evidence for modern humans in the Levant from ca. 75kya to 50kya.

The climate event at ca. 75kya may have killed off many mtDNA L3/Y-DNA B people, driven others back to Africa, and left as the sole remaining proto-Eurasians only those who fled to West Asia and South Asia, and perhaps the interior of Arabia, which could be a much smaller founder population clan that experienced lots of uniparental marker drift from the entire initial Out of Africa population that would already be drifted relative to East and NE Africans.

terryt said...

"We can presume C1d and K1 came after CT, which makes it quite recent even by the new measure".

CT is placed at 56,000 years which allows plenty of time for the ancestor of C1d to have reached Australia by 45-50,000 years ago. Makes sense. But we would have a better idea if C1d had been sampled. Surely one Aboriginal C haplogroup is available. K1 at more recent than 44,000 years basically eliminates K as being the first Y-DNA to Australia though. But again a sample of K from Australia would have been helpful.

Annie Mouse said...

The actual "Resequencing" age numbers (shown in figure a and b) are not in this paper (so far as I can tell) although the other age calculations are in the supplementary material.

I presumed the numbers would be in Wei's paper which I have linked here (to save others time).

http://genome.cshlp.org/content/23/2/388

However, although the full Y chromosome sequences are present in Table 2 of the supplementary material, the calculated ages were not. Darn it. Perhaps someone can recalculate the exact ages from the sequences; I don't have the software.

Anyhow I wanted to explore if these age estimates were reasonable in terms of real events. The best estimate is that the aborigines arrived in Australia 40-50k years ago. The main Y haplogroups are C1d (previously C4) and K. From Figures a and b

K ~= 33kya
C ~= 27kya
"C3" now C2~= 26kya (presumably similar to C1d)

These clearly do not fit a >40kya arrival date.

Applying the correction for generation time (31.5/25).

K ~= 41.58kya
C ~= 34.02kya
C2~= 32.76kya

This pushes K (just barely) into the right time window. But C4 must have arrived later (or C originated in situ from CF).
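The same check can be sketched in a few lines. The input ages are the approximate values read off the figures in this comment, the 40kya threshold is the lower end of the 40-50k arrival estimate, and all names are hypothetical; it assumes the age scales linearly with generation length.

```python
GEN_ASSUMED = 25.0       # years per generation used in the preprint
GEN_REVISED = 31.5       # male generation length suggested in the post
ARRIVAL_MIN_KYA = 40.0   # lower bound of the 40-50 kya arrival estimate

# Approximate ages (kya) as read off the figures, per the comment above.
figure_ages_kya = {"K": 33.0, "C": 27.0, "C2": 26.0}


def corrected(age_kya):
    """Apply the generation-length multiplier (assumes linear scaling)."""
    return age_kya * GEN_REVISED / GEN_ASSUMED


def old_enough(age_kya):
    """True if the corrected age reaches the arrival threshold."""
    return corrected(age_kya) >= ARRIVAL_MIN_KYA


for name, age in figure_ages_kya.items():
    verdict = "fits" if old_enough(age) else "too young"
    print(f"{name}: {corrected(age):.2f} kya ({verdict})")
```

Because the inputs are eyeballed from a figure, the outcome for K is sensitive to a shift of a thousand years or so in either direction.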

Anyhow IMO either the displayed "Recalibrated" age estimates are too low, or the first men to arrive in Australia were not K or C4.

Personally I think K probably WAS the first to arrive, and even with the generation time correction the "Recalibrated" Y chromosome age estimates are still too low. I expected K to be dated to >55kya as it probably originated in SW Asia, and a subgroup arrived in Australia around 50kya.

If C did originate in Australia I suppose the first Australian men would have been CF and the age estimates could be right. The CF split is dated 55kya. Also this would put a whole new swing on La Brana, which is also C1 of some description (formerly C6).

andrew said...

I would also add that the paper is not very well done as a piece of scholarly work. While p values are calculated, there is no discussion at all of margin of error, which is the more pertinent measurement for most purposes and is hard to determine from the available data in the paper.

The organization of the paper's sections is somewhat irregular, jumping ahead to results; the description of the methodology is rather vague; and the engagement with the prior literature in both archaeology and inferred archaic admixture is shallow. The results section also contains no meaningful discussion of what limitations in terms of ability to reach a valid conclusion might be particular to this methodology.

terryt said...

"Personally I think K probably WAS the first to arrive"

For several reasons I think that unlikely, and we've been here before. I would expect the first people to arrive in Sahul from Timor to have arrived in Australia and only later to have reached New Guinea. We have C in Australia but virtually none in New Guinea, and what is there is derived from Timor C1c-M38 and is relatively young. On the other hand KMNOPS-derived Y haplogroups are far more common and diverse in New Guinea/Melanesia than in Australia. Of course your comment here could be correct:

"the first men to arrive in Australia were not K or C4".

Or this one:

"If C did originate in Australia I suppose the first Australian men would have been CF and the age estimates could be right. The CF split is dated 55kya. Also this would put a whole new swing on La Brana, which is also C1 of some description (formerly C6)".

I am sure KMNOPS originated in SE Asia, and so the newly defined C1 may have moved west with the P element from it.

terryt said...

I'd like to clarify my earlier comment. I certainly don't expect CF to have originated in Australia, or even C. That is in spite of trying to point out to German it is far easier to 'prove' modern humans evolved in Australia than it is to prove they arose in America. The big split in C is between C1 and C2. C2 almost certainly first appeared in northern China/Mongolia/Tibet. Therefore C1 most probably developed somewhere south of that region, but nearby. Perhaps near SW China/Burma/northeast India.

The split between CF may simply be a west/east one at each end of the Tibetan Plateau, F near the Iranian Plateau and C to the east (or north originally). The geographic split may be the result of increased aridity or cold having pushed both haplogroups southward.

At Dienekes' blog here:

http://dienekes.blogspot.co.nz/2014/04/mtdna-history-of-oceania-duggan-et-al.html

I have put a simplified version of what I consider to be the most likely sequence of mt-DNA entries into the Australia/New Guinea region. We can be sure that any mt-DNA into any uninhabited region must have been accompanied by some Y-DNA or it wouldn't have produced descendants. One of the papers I linked to there makes it obvious that although mt-DNA P evolved in Australia, P1'2'8'10 moved to Eastern Indonesia quite early in the piece, long before either P3b or P4a moved to New Guinea or P1 and P2 moved to Australia/New Guinea. This presence in Eastern Indonesia would have provided some impetus to expansion further westward.

Another interesting piece of evidence supporting a New Guinea settlement after that of Australia is that the oldest mt-DNAs in New Guinea seem to first appear in the east rather than in the west where we would expect them to have arrived. This phenomenon is most easily explained if they arrived there by voyaging along the northern coast of Australia before following the Cape York Peninsula to New Guinea. And again the same paper shows that mt-DNA Q1'2 also moved back to Eastern Indonesia before the separate haplogroups eventually returned to (western) New Guinea. As a result we can be sure of quite a level of backwards and forwards movement in the region which almost certainly included movement back across Wallace's Line into SE Asia.

Annie Mouse said...

"The big split in C is between C1 and C2. C2 almost certainly first appeared in northern China/Mongolia/Tibet."

My personal opinion is that C originated in Central Asia. The huge span of C1 (Japan, Australia to Europe) suggests to me that C1 also originates nearby, somewhere fairly central anyway.

I guess my main point in my previous comment is that I think we are still underestimating Y haplogroup ages, but it is also possible that neither K nor C1d represent the first Australian men.

Annie Mouse said...

It occurs to me that if C1 travelled from central asia to Australia, then they have to be the front running candidate for the carrier of the Denisovan DNA.

One possible scenario is that C1 arrived in Australia carrying Denisovan and a robust phenotype to the preexisting gracile Australians. Maybe.

It is just also possible that Denisovan was already in south Asia as well as Siberia.

terryt said...

"My personal opinion is that C originated in Central Asia".

That is probably correct. But it must have been pushed out of much of Central Asia at some stage.

" The huge span of C1 (Japan, Australia to Europe) suggests to me that C1 also originates nearby, somewhere fairly central anyway".

I don't think so. Japanese C1 is separated geographically from Australian C1 by a swathe of C2. The two regions are not really connected genetically. Japanese C1 is far more closely connected to West Eurasian (European?) C1 than to Australian, SE Asian or even South Asian C1. I agree C1 must have originated somewhere near where C2 originated but to me that region is unlikely to have been Central Asia in the case of C1. C1's Central Asian distribution looks tied to the Upper Paleolithic expansion while Australian C1 is earlier. Unless this comment is correct:

"it is also possible that neither K nor C1d represent the first Australian men".

"I guess my main point in my previous point is that I think we are still underestimating Y haplogroup ages"

Possible. But different researchers seem to be coming up with consistent results, and those results easily fit archeological evidence. They just don't fit some people's preconceptions. I'm certainly prepared to accept the dates until it is shown they are wildly out.

terryt said...

"The huge span of C1 (Japan, Australia to Europe) suggests to me that C1 also originates nearby, somewhere fairly central anyway".

Another couple of factors to bear in mind. C1 forms four subgroups: C1a, C1b, C1c and C1d. Three of them are southern haplogroups: for example, C1d is confined to Australia and C1c is found in Eastern Indonesia and Melanesia, although it is of greatest age in Southern Wallacea. Both C1a and C1b form two subgroups: C1b1 is primarily in Gujarat and C1b2 in Bangladesh. C1a is the odd one out, the only member of C1 found in more northern regions: C1a1 in Japan and C1a2 in Europe. Some years ago Ebizur mentioned that C-M38 (now C1c) showed an age of 46,000 years in that region. That is considerably earlier than any realistic date for C1a's presence in Europe. To me that is a considerable argument against C1 having moved from Central Asia, through South Asia to Southern Wallacea and so to Australia. Instead it has every indication of showing a migration in the reverse order.

terryt said...

"It occurs to me that if C1 travelled from central asia to Australia, then they have to be the front running candidate for the carrier of the Denisovan DNA".

That's what I used to think but two things changed my mind. Firstly, the new phylogeny and secondly, someone mentioned the Denisovan element is more pronounced in New Guinea rather than in Australia. It is still possible that 'Denisova' was carried to SE Asia by Y-DNA C but it is noticeably absent in East Asia and so that scenario is unlikely. I have come to believe Denisova was earlier carried to SE Asia by an expansion of Homo heidelbergensis when it contributed to the change from SE Asian H. erectus to H. soloensis. In other words it is not just also possible that Denisovan was already in south Asia as well as Siberia, but quite likely it was. Its presence in New Guinea would then be attributed to introgression into Y-DNA MNOPS and mt-DNA M haplogroups.

Further to my comments regarding the age of C1. Although C1c was probably in Southern Wallacea by 45,000 years ago it is unlikely anyone, C1a or otherwise, reached Japan much before 35,000 years ago. That fits with an Upper Paleolithic expansion from northwest South Asia that spread C1a to both Japan and Europe.

Annie Mouse said...

"it is unlikely anyone, C1a or otherwise, reached Japan much before 35,000 years ago."

OK I can buy that. But Japanese C1a is not in this paper unfortunately, so I can't look at that. Just C2 data (see below).

But again if C2 is 28kya then C1 is probably similar in age, which is a bit young for a 35kya colonization of Japan by C1 men. Although the generation time correction would lift it to a more reasonable figure (28 x 31.5/25 = 35.28kya).

C2 (Formerly C3, M217, PK2, P44)
= ~28kya
C2c (formerly C3e,P53.1)
= ~7kya

Incidentally Genghis Khan was C3e now C2c. A ~7kya date for the Kereys has to be interesting for those interested in the origins of the Horde horse clans and nomadic pastoralists in general.

terryt said...

"But again if C2 is 28kya then C1 is probably similar in age"

Not necessarily so. We can get no idea of the age of any split between C1 and C2 by looking only at C2. The 'age' of a haplogroup actually reflects its period of diversification. Such diversification could occur quite some time after the haplogroup had separated from its closest relations. C2 does have a 'tail' of a few mutations although it is doubtful that the full list of mutations has been discovered for any C haplogroup. And as I mentioned Ebizur, using Scheinfeldt et al. (2006), came up with around 45,000 years for the diversification of what is now C1c. That fits exactly an arrival of C1d's ancestors in Australia around that time.

"Incidentally Genghis Khan was C3e now C2c. A ~7kya date for the Kereys has to be interesting for those interested in the origins of the Horde horse clans and nomadic pastoralists in general.."

Interestingly I noticed a few minutes ago when checking my diagram of Y-DNA C that C2e is actually the one C2 haplogroup that has expanded widely. The others are more geographically limited in extent (apart from C2b's presence in North America). C2e1b-Z8440 is spread throughout the Han Chinese and even reaches Bangladesh, while C2e1a has a more northern distribution. A time of 7000 years for that overall expansion fits a southward Neolithic expansion very well, presumably in association with many Y-DNA Os' expansions.

Annie Mouse said...

"Incidentally Genghis Khan was C3e now C2c. A ~7kya date for the Kereys has to be interesting for those interested in the origins of the Horde horse clans and nomadic pastoralists in general."

I did not make my point very clearly so I thought I would hammer it home a bit. I was surprised to see the famous Genghis Khan (C2c/C3e) star cluster dated a massive 7kya. It is far too early for Genghis (which did not surprise me). What surprised me was that it is also far too early for the Kereys or even the Hordes. Even the Scythians look to be too recent.

This is a much earlier event. The very first horselords expanding from a post glacial refuge perhaps?

terryt said...

"What surprised me was that it is also far to early for the Kereys or even the Hordes. Even the Scythians look to be too recent".

But those groups didn't spring from nowhere (in spite of what German seems to be claiming regarding some populations). C2c may have remained geographically restricted for quite some time before its later wide expansion. On the other hand the haplogroup must have diversified regionally to some extent before that expansion otherwise it would have remained as a single lineage.

By the way, has Genghis Khan's haplogroup changed from C3e to C2c? I thought the new nomenclature had only changed the numbering system, not the following letters.

Tyler said...

"This is a much earlier event. The very first horselords expanding from a post glacial refuge perhaps?"

AFAIK the domestication of the horse is dated to about 6kya. It is tempting to tie an event like domesticating the horse to the expansion of C2c people.

The time between migrating out of a post glacial refuge and the later expansion also would seem plausible.