October 12, 2009

Y-chromosome demographic history (Shi et al. 2009)

Just a quick heads up on this open access paper which seems very important in that it tests a very large number of Y-STR markers on a well-known dataset, and proposes a new recalibration of the "evolutionary mutation rate" that I have criticized elsewhere. I will have to read the paper carefully before passing judgment (Look in this space for updates).

UPDATE (Oct 13):

The paper adds nothing to the issue of the appropriate mutation rate choice for TMRCA estimation. The revised Evolutionary Mutation Rates (rEMR) proposed in this paper are nothing more than an application of the Zhivotovsky et al. (Z. et al.) Evolutionary Mutation Rate (EMR) for markers not included in the original calibration by Z. et al. and exhibiting either higher or lower variance than those that are included. The use of a Z. et al.-like calibration is taken uncritically for granted.

Furthermore, the authors use BATWING to generate genealogies in order to infer the TMRCA of lineages and populations, employing their rEMR for this purpose. This is wrong because both the "evolutionary mutation rate" and BATWING take genealogy into account. By using rEMR in conjunction with BATWING they are correcting twice for the loss of Y-STR diversity due to genetic drift. This mistake was also made in another paper this spring, about which I wrote:
Indeed, in this paper they attempt to use Batwing to estimate ages using the effective rate. Batwing employs a Bayesian method with coalescent simulations, and thus takes into account "population history", the effects of which are supposedly encapsulated in the effective mutation rate. Thus, they are "correcting" (inappropriately of course) for population history twice.
In conclusion: the age estimates provided in this paper (which can be found in Supplementary Table S4) are useless. The paper is, nonetheless, useful, because it shows the relative ages of many haplogroups, even though the small sample sizes for many of them do not inspire confidence in their accuracy.
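To make concrete how much the choice of rate matters, here is a minimal sketch of the simple variance-based age estimator (age in generations = mean Y-STR variance / mutation rate). This is not BATWING and not the method used in the paper; the toy haplotypes are invented, the 25-year generation length is an assumption, the germline rate is an illustrative placeholder of the order reported in father-son studies, and the evolutionary rate is the commonly quoted Z. et al. value of 6.9e-4 per locus per 25 years.

```python
from statistics import mean, pvariance

GENERATION_YEARS = 25  # assumed generation length


def mean_str_variance(haplotypes):
    """Average, over loci, of the population variance in repeat counts."""
    loci = zip(*haplotypes)  # transpose: one tuple of repeat counts per locus
    return mean(pvariance(locus) for locus in loci)


def variance_age_years(haplotypes, rate_per_generation):
    """Linear estimator: age in generations = variance / mutation rate."""
    generations = mean_str_variance(haplotypes) / rate_per_generation
    return generations * GENERATION_YEARS


# Hypothetical toy data: four chromosomes typed at three Y-STR loci.
toy_haplotypes = [
    (14, 12, 23),
    (15, 12, 23),
    (14, 13, 24),
    (14, 12, 22),
]

GERMLINE_RATE = 2.1e-3      # illustrative per-locus, per-generation rate (assumption)
EVOLUTIONARY_RATE = 6.9e-4  # Z. et al.-style effective rate per 25 years (assumption)

age_germline = variance_age_years(toy_haplotypes, GERMLINE_RATE)
age_evolutionary = variance_age_years(toy_haplotypes, EVOLUTIONARY_RATE)

print(f"germline-rate age:     {age_germline:.0f} years")
print(f"evolutionary-rate age: {age_evolutionary:.0f} years")
print(f"inflation factor:      {age_evolutionary / age_germline:.2f}")
```

With these assumed rates the evolutionary rate inflates the age by the ratio of the two rates, roughly threefold, whatever the data. The slower rate is meant to compensate for diversity lost to drift when a simple estimator like this one is used; feeding it to a coalescent method that already models drift applies that compensation a second time.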

UPDATE II: The paper also completely ignores admixture as a source of genetic diversity.

Molecular Biology and Evolution, doi:10.1093/molbev/msp243

A worldwide survey of human male demographic history based on Y-SNP and Y-STR data from the HGDP-CEPH populations

Wentao Shi et al.


We have investigated human male demographic history using 590 males from 51 populations in the HGDP-CEPH worldwide panel, typed with 37 Y-SNPs and 65 Y-STRs, and analyzed with the program BATWING. The general patterns we observe show a gradient from the oldest population TMRCAs and expansion times together with the largest effective population sizes in Africa, to the youngest times and smallest effective population sizes in the Americas. These parameters are significantly negatively correlated with distance from East Africa and the patterns are consistent with most other studies of human variation and history. In contrast, growth rate showed a weaker correlation in the opposite direction. Y lineage diversity and TMRCA also decrease with distance from East Africa, supporting a model of expansion with serial founder events starting from this source. A number of individual populations diverge from these general patterns, including previously-documented examples such as recent expansions of the Yoruba in Africa, Basques in Europe and Yakut in Northern Asia. However, some unexpected demographic histories were also found, including low growth rates in the Hazara and Kalash from Pakistan, and recent expansion of the Mozabites in North Africa.



  1. You are not going to like it, Dienekes: their R1b1b2 (global) has an MRCA some 36 ky old, and the Basque one for the same lineage is c. 12 ky old. They also get an expansion date for Basques (recent in the European context) of 2000 years before the Neolithic.

    However, I'm not enthusiastic about it either: populations like Sardinians or Northern Russians, which we know are recent, appear to have the oldest European chronologies, for example. There are some artefacts that have not been properly checked and give unusual results, not only for the Yoruba and Xiabei, for whom they acknowledge that possibility.

    My opinion? An elaborate theoretical exercise that needs more refining and probably also a better sample: again, important populations like Sudanese or Indians, or in this case Australian Aboriginals too, have been completely overlooked, while the typical nonsense populations like the Mozabites have been included one more time. 500 individuals seem rather few for a worldwide survey too.

  2. Who says that Sardinians are recent? Not the scholars who wrote papers on them and not I.
    I must read the paper and now unfortunately I must go to work.

  3. I am really sorry about this dumb question. But I have been trying to download some NRY sequence data from Manfred Kayser's series of publications, and I can't seem to, and neither can I download data from the paper that you have quoted here. Is there any way that I can? It might just be that I don't know how. Please help!


  4. "Who says that Sardinians are recent?"

    Recent in the sense that the island seems uninhabited before Neolithic (as happens with most other Mediterranean islands).

    The age is an artefact of the combined haplogroups, which in Sardinia includes what they describe, with their limited set of SNP markers, as R1b(xR1b1b2), as well as I and others.

    Essentially they seem to have calculated (supplementary material spreadsheet) the age for each haplogroup (among the SNPs they use) within each population and then picked the oldest one as the MRCA (or something like that). It is roughly correct, but that doesn't mean the MRCA of Sardinians lived in Sardinia (surely he did not; he was in fact some Asian F guy, or rather IJK, but they don't use this marker).

  5. Bluejay: sounds like a problem with your software. Maybe you have JavaScript deactivated or some other similar issue. I had no problems at all.

  6. Of course I don’t understand why Dienekes and others continue to criticize all the most famous scholars in genetics when they use the Zhivotovsky mutation rate. Is it possible that all these scholars don’t understand genetics and only Vizachero (who, as far as I know, isn’t a geneticist) does? I can understand Klyosov’s criticism of the paper on the CMH (the Zhivotovsky mutation rate isn’t appropriate for a few thousand years), but I think it is appropriate for many tens of thousands.
    Of course I am very glad that the Tuscans (page 14) are the most ancient population to expand in Europe (35 ky), and this confirms all my theories.

    To Maju I can say that the Sardinians probably didn't live in Sardinia, because they lived in Tuscany. If you can read Italian (and being Spanish I think you can), I suggest you read: Massimo Pittau, La lingua dei Sardi Nuragici e degli Etruschi, Editrice Libreria Dessì, Sassari 1981.

  7. We're talking about the patrilineal MRCA, who for any population (or sample) that has combined R and I (like most Europeans) is IJK, and for those that also have E (also most) is Y(xB,A) or the "Eurasian Adam". I wouldn't talk of Tuscany or anywhere else in Europe for such remote ancestors.

    However, the global TMRCAs for each haplogroup may be meaningful.

  8. "Recent in the sense that the island seems uninhabited before Neolithic (as happens with most other Mediterranean islands)".

    Why would that be so if humans possessed boats surely capable of reaching those islands for such a long time before then?

  9. Terry: you're just another one who likes to insist on your dogma, even if your style is different. I pass.

  10. Maju says: “I wouldn't talk of Tuscany or anywhere else in Europe for such remote ancestors”.

    They didn’t name Tuscans 35kyBP, but there is a reason (a sufficient reason would say Leibnitz) if the base populations for European ancestry comparison are Tuscans, Sardinians, Italians, French, Orcadians, Russians and Adyghey? Or not?

  11. "They didn’t name Tuscans 35kyBP"

    You did. Quite gratuitously IMO.

    "(...) but there is a reason (a sufficient reason would say Leibnitz) if the base populations for European ancestry comparison are Tuscans, Sardinians, Italians, French, Orcadians, Russians and Adyghey? Or not?"

    And Basques.

    I don't think it's a good reason. The populations are an arbitrary sample of Europeans and they could be meaningless for these purposes. Also, Tuscans show a more recent TMRCA than Sardinians.

  12. Really, if we are to believe the paper, Tuscans have the highest values in Europe and Basques the lowest. Speaking a strange language (probably of Caucasian origin, as demonstrated by the great Italian linguist Alfredo Trombetti, Le origini della lingua basca (1925)) is not sufficient for being genetically ancient. I hope we’ll discover where the Basques took their language from, but genetically they are the same as the Spaniards: for R1b1b2, very recent. But I supposed that Spain/France was the refugium of R1b1*, and I still believe the same. Of course the few R1b1*s don’t change the antiquity of today’s population. But we can also think that the Basque language was the language of R1b1*, which came with this haplogroup from the Caucasus. Between the Jewish R1b1* Sam Vass and the Basque Arellano, I have always maintained that Arellano was the donor and Vass the converted.

  13. Highest values of what? Have you even checked the supplementary material?

    There it's clear that if the sample happens to have some DE, you get an older overall date than those samples that do not show it.

    Does that mean anything real? No: they are just artefacts of small samples and a quite curious but otherwise limited way of calculating.

    CF'DE will always be an older MRCA than F alone. But virtually all Europeans, Basques included, have in fact some E, even if it does not show up in these small samples.

    As we know that E, I and R (and possibly also R1b and R1a) belong to different processes and likely have different origins as regards Europe, we can't do as this study does, much less with such small unrepresentative samples.

    It is an artefact caused by diverse origins: the more diverse the origins, the older the MRCA. If you take a multiracial or mixed population of America, for example, you'll get Y-DNA Adam as the MRCA, or at least one as old as your Etruscans (CF'DE node). However, the population as such is only a few centuries old at most.

  14. The values I speak of are those of Expansion Time. Of course they ought to have tested each haplogroup separately; why didn’t they? Of course peoples who have a high percentage of R1b are really tested above all for this haplogroup, and their results are reliable. In a worldwide comparison the results are also reliable, it being unlikely that the San have hg. R1b1b2 or Tuscans hg. A or B. Of course the researchers presuppose that peoples are stable over time, and for long periods this isn’t true.


