August 18, 2010

Age estimation of Y chromosome lineages (Adamov & Karzhavin 2010)

A nice paper in the Russian Journal of Genetic Genealogy addresses the subject of age estimation using Y-STRs. The authors share my sentiments on the subject, and were good enough to compare their simulation results and analytical approximations with my 2008 post. Every post I've written on the subject can be found under the Y-STR series label.

The authors write:

One of numerous critics of «effective» mutation rate is D. Pontikos, who published in 2008 the results of his own calculations in his popular blog (Pontikos, 2008 [8]). Fig. 11 shows the results of Pontikos for a fixed interval of genealogical tree with the final size of 750000 – 1250000 individuals ... Those data match well with approximation (6), and the difference between them is only 0.3% and 1.4%, correspondingly.

I have not checked all the details of this paper, but it should be a good read for anyone interested in the subject. Hopefully as more people look at the evidence, age estimation in mainstream journals will catch up with the state of the art.

I will not repeat the long and involved arguments and observations of my Y-STR series, but to summarize the argument for new readers:

  • Most recent population genetics papers use an "effective" mutation rate that is about 3 times slower than the "germline" rate observed in father-son pairs. Using it yields age estimates that are about 3 times older than is justified.
  • The effective rate is applicable to the constant population case in which a man has 1 son on average. Population size may vary stochastically under this model, but it generally does not grow to large numbers within the time frame of Homo sapiens. For example, in the 2,000 or so generations since Y-chromosome Adam, a lineage evolving under this model that has not gone extinct would have about 1,000 descendants on average, and the probability that it would have millions of descendants (like most real-world haplogroups in non-tribal populations) is practically zero.
  • If the constant population case does not hold, due to selection, or demographic growth, or social dominance, then the effective rate is not applicable, and age estimates using the germline rate are much closer to the truth.
  • The population sizes of real-world haplogroups are huge and could not have been generated by stochastic variation in a model where each man has 1 son on average. Most Y-chromosome age estimates in the mainstream literature are overestimates, and ascribe Paleolithic origins to Neolithic and Bronze Age founders.
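
The constant-population numbers in the list above can be checked with a short calculation. What follows is a minimal sketch, not the method of the paper: it assumes each man's number of sons is Poisson distributed with mean 1 (the argument only fixes the mean, so the Poisson choice is an assumption), and the function and variable names are made up for illustration. It iterates the standard extinction-probability recursion of a branching process, and uses the exponential (Yaglom) limit as a rough approximation for the size of a surviving lineage.

```python
import math


def survival_probability(generations: int) -> float:
    """Probability that a lineage founded by one man is not extinct after
    `generations`, assuming Poisson(1) sons per man (an assumption)."""
    p = 1.0  # the founder is alive at generation 0
    for _ in range(generations):
        # Extinction recursion q' = exp(q - 1), rewritten for p = 1 - q.
        # expm1 keeps precision when p is very small.
        p = -math.expm1(-p)
    return p


GENERATIONS = 2000  # "2,000 or so generations", as in the text

p_survive = survival_probability(GENERATIONS)

# With 1 son on average the unconditional mean lineage size stays at 1,
# so the mean size of the lineages that survive is 1 / P(survive).
mean_size_if_surviving = 1.0 / p_survive

# Rough approximation (Yaglom limit): given survival, lineage size is
# close to exponentially distributed with the mean computed above.
# Work in log10 because the probability underflows a float.
log10_p_millions = (
    math.log10(p_survive)
    - (1_000_000 / mean_size_if_surviving) / math.log(10)
)

print(f"P(lineage survives {GENERATIONS} generations): {p_survive:.6f}")
print(f"Mean size of a surviving lineage: {mean_size_if_surviving:.1f}")
print(f"log10 P(lineage exceeds 1,000,000 men): {log10_p_millions:.0f}")
```

Under these assumptions the mean size of a surviving lineage comes out at roughly 1,000 men, and the chance of exceeding a million is a number with hundreds of zeros after the decimal point. A different offspring distribution with mean 1 would change the figures in proportion to its variance, but not the conclusion.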

The Russian Journal of Genetic Genealogy, Vol 1, No 2 (2010)

About the influence of population size on the accuracy of TMRCA estimation, done by standard methods using STR locus complex

Dmitry Adamov, Sergey Karzhavin


Model calculations of influence of a population growth from the common male ancestor towards the final (present-day) population on the TMRCA estimation have been done. The estimation was made by linear and quadratic methods using STR locus of Y-chromosome. The modeling was done using computer simulation of a tribal population during fixed number of generations.
Universal approximations, allowing estimate the average correction for population effects as a function of the final population size, have been obtained. Authors calculated the variance of age estimations for an initial ancestor, which appears due to different types of population effects. Precision of the ancestral allele determining in a STR from the final population haplotypes set have been studied. An algorithm has been proposed for TMRCA calculation for a paternal (tribal) population, taking into account its total population size.



German Dziebel said...

The authors write: "As the current research has shown, the size of tribal population is a critical parameter affecting the quality of the estimates of the initial ancestor life time and of the ancestral haplotype values. Tribal populations with number of members less than 200-300 men, can not be reliably investigated using above-mentioned parameters at all. It is concerned first of all the isolates in the Amazonian and the Indonesian jungles, as well as some nations on the Far North."

Let's compare it with a statement made by Zhivotovsky (with a nod at Cavalli-Sforza) in his 2001 paper "Estimating Divergence Time with the Use of Microsatellite Genetic Distances":

"Indeed, among the current human populations, South American aboriginals [and these are specifically the Surui and Karitiana that have low population sizes - GD] can be considered a reference for microsatellite variation in an ancient African ancestor because their population size is low and might be compared with that estimated for an African ancestor, from one to a few thousand gametes (Rogers and Harpending 1992 ; Rogers 1995 ; Rogers and Jorde 1995; Zhivotovsky et al. 2000), and they have maintained their style of life probably since they arrived in this area (L. L. Cavalli-Sforza, personal communication). On the other hand, the Southern American Indian populations have descended from Asians, whose variation is very high, and thus the time since human colonization of South America might not have been sufficient to reduce microsatellite variation to the level of that in the ancestral population in Africa. Also, gene flow between even small villages can greatly increase within-population variance at microsatellite loci (Feldman, Kumm, and Pritchard 1999). Nevertheless, there is no objective proof of whether or not variation at the studied loci in South Amerindians is close to an African ancestral value. Therefore, the corresponding estimates of TD should be taken cautiously."

Any thoughts of how these two statements [I included full quotes for context] correlate or conflict with each other and what the implications might be?

Steve Sailer said...

Dear Dienekes:

I just wanted to say how much I enjoy your site.