The age of the Y-MRCA can be estimated, among other ways, from either the allele variance or the average squared distance (ASD) of a sample.
Here xi is the observed allele in the i-th of the n descendants of the unknown Y-MRCA. The ancestral allele, xa, is generally unknown, and can be estimated, e.g., by taking the modal (most frequent) or median allele among the xi.
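The two statistics can be written out as follows. This is a sketch assuming the standard definitions: the post's own formulas are not reproduced in the text, and in particular the n-1 normalization of the variance and the sample mean x̄ are assumptions.

```latex
% Allele variance around the sample mean \bar{x} (assumed n-1 normalization)
\mathrm{Var} = \frac{1}{n-1} \sum_{i=1}^{n} \left( x_i - \bar{x} \right)^2,
\qquad
\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i

% Average squared distance from the (estimated) ancestral allele x_a
\mathrm{ASD} = \frac{1}{n} \sum_{i=1}^{n} \left( x_i - x_a \right)^2
```

The only structural difference is the reference point: Variance is measured around the sample mean, ASD around the ancestral allele xa.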
We can relate these statistics to the age g, in generations, through two linear equations: ASD = wa × g and Variance = wv × g.
The parameters wa and wv are effective mutation rates, and they govern how quickly ASD or Variance accumulates with the passage of time (g).
Zhivotovsky, Underhill and Feldman (2006) (Z.U.F.) studied wv using many simulations with many different population histories, and Zhivotovsky et al. (2004) derived wa from "known" real-life histories (of Bulgarian Gypsies and Maori). Nonetheless, the proposed effective rate of 0.00069/locus/generation (which is 3.6x lower than the observed germline rate) has been used indiscriminately in the literature for all kinds of populations and all kinds of statistics (ASD, Variance, and even ρ, as in the recent paper on African pastoralists).
But if you read Zhivotovsky, Underhill and Feldman (2006), or my two previous posts on the subject, you will realize that the effective rate depends on population history. The 0.00069 rate is derived for constant-sized populations where haplogroups never grow to large numbers, whereas most of the interesting haplogroups that scientists date with it are so large that they cannot have grown under the assumptions leading to that rate. Hence, there has been a general overestimation of Y-MRCA ages whenever the one-size-fits-all rate is used.
What is an appropriate effective mutation rate?
I have previously hinted that the appropriate effective mutation rate is much closer to the germline rate μ, i.e. the probability that a son's allele differs by 1 repeat from that of his father. Now, I present some more systematic simulations which address the issue of the rate for (i) different rates of population growth m (the average number of sons per man, with the number of sons following a Poisson distribution) and (ii) different antiquity in generations g. As usual, I keep μ=0.0025 and average results over 10,000 simulation runs.
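A simulation of this kind might look roughly like the sketch below. It is not the code used for the post, and several details are assumptions: mutations are symmetric single steps (+1 or -1 repeat) with total probability μ per transmission, extinct lineages are discarded before averaging, the variance uses an n-1 denominator, and ASD is taken around the median allele. All function names are hypothetical.

```python
import math
import random
from statistics import median


def poisson(lam, rng):
    """Draw a Poisson(lam) variate with Knuth's method (fine for lam near 1)."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def simulate_haplogroup(g, m, mu, rng):
    """Alleles of the founder's male-line descendants after g generations."""
    alleles = [0]  # founder's allele, coded as an offset of 0 repeats
    for _ in range(g):
        sons = []
        for allele in alleles:
            for _ in range(poisson(m, rng)):
                # Assumption: symmetric single-step mutation, total rate mu.
                step = rng.choice((-1, 1)) if rng.random() < mu else 0
                sons.append(allele + step)
        alleles = sons
        if not alleles:  # the lineage went extinct
            break
    return alleles


def variance(alleles):
    """Sample variance around the mean (assumed n-1 denominator)."""
    n = len(alleles)
    if n < 2:
        return 0.0
    mean = sum(alleles) / n
    return sum((x - mean) ** 2 for x in alleles) / (n - 1)


def asd(alleles):
    """Average squared distance from the median allele."""
    xa = median(alleles)
    return sum((x - xa) ** 2 for x in alleles) / len(alleles)


def average_stats(g, m, mu=0.0025, runs=1000, seed=0):
    """Average Size, Var and ASD over the runs that did not go extinct."""
    rng = random.Random(seed)
    survivors = []
    for _ in range(runs):
        alleles = simulate_haplogroup(g, m, mu, rng)
        if alleles:
            survivors.append(alleles)
    if not survivors:
        return None
    k = len(survivors)
    mean_var = sum(variance(a) for a in survivors) / k
    mean_asd = sum(asd(a) for a in survivors) / k
    return {
        "Size": sum(len(a) for a in survivors) / k,
        "Var": mean_var,
        "ASD": mean_asd,
        # Correction dividers, using Var = wv*g and ASD = wa*g.
        "mu/wv": mu * g / mean_var if mean_var else float("inf"),
        "mu/wa": mu * g / mean_asd if mean_asd else float("inf"),
    }
```

For instance, `average_stats(g=100, m=1.02, runs=2000)` returns one row of averages for that (g, m) setting. Tracking every descendant in a list keeps the sketch simple but becomes slow for fast-growing haplogroups, where one would track counts per allele instead.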
The following table lists g, m, Size, Var, ASD, μ/wv, μ/wa. The average number of descendants is Size. I calculate ASD using the median allele from the observed ones.
(Note on Sep 18: An updated table with slightly different results is found in an erratum)
As you can see, the correction factor μ/wv=3.6 is at the upper limit of these rates. For a constant-sized population (m=1), μ/wv approaches 3.6 with increasing g. But, for all (g, m) settings resulting in a fairly large haplogroup (but still smaller than observed ones), the correction factor for both ASD and Variance is less than 1.3.
A Practical Example
Suppose that a haplogroup has Size=1,000,000 men, and an ASD=0.275.
Using the Z.U.F. rate leads to a TMRCA estimate of 0.275/0.00069 = 399 generations.
Yet, in the simulations, after 400 generations with m=1 the ASD is already 0.37, higher than the observed 0.275, while the Size is only 207.
If the haplogroup did originate 400 generations ago, it should have grown at a faster rate than 1.02/generation to reach its current size; but this would have led to an ASD greater than 0.821 (last row of the table), far above the observed 0.275.
So, the estimate of 399 generations using the Z.U.F. rate is a gross overestimate.
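The arithmetic of this example can be checked with a few lines of Python. The sketch assumes the linear relation g = ASD / wa; the third estimate applies a correction factor of 1.3, the upper bound quoted above for fairly large haplogroups, and is meant as an illustration rather than a dating of any real haplogroup. The helper name `tmrca_generations` is hypothetical.

```python
MU = 0.0025         # germline rate used in the simulations
ZUF_RATE = 0.00069  # Z.U.F. effective rate per locus per generation
OBSERVED_ASD = 0.275


def tmrca_generations(asd, effective_rate):
    """Age in generations under the linear model ASD = rate * g."""
    return asd / effective_rate


zuf_age = tmrca_generations(OBSERVED_ASD, ZUF_RATE)
germline_age = tmrca_generations(OBSERVED_ASD, MU)          # correction factor 1
corrected_age = tmrca_generations(OBSERVED_ASD, MU / 1.3)   # correction factor 1.3

print(round(zuf_age), round(germline_age), round(corrected_age))
# prints: 399 110 143
```

Under these assumptions, a correction factor between 1 and 1.3 puts the age at roughly 110 to 143 generations, against 399 with the Z.U.F. rate.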
Appendix A: Relationship of ln(haplogroup Size), and correction Divider (μ/wa) with m (for g=140)
Appendix B: Relationship of ln(haplogroup Size), and correction Divider (μ/wa) with g (for m=1.02)