August 08, 2008

On the effective mutation rate for Y-STR variance

This is part III in the trilogy on dating Y-chromosome Most Recent Common Ancestors (Y-MRCAs) using microsatellite variation. See part I and part II.

The age of the Y-MRCA can be estimated (among other ways) using either allele variance or average squared distance (ASD):

xi is the observed allele in one of the n descendants of the unknown Y-MRCA. The ancestral allele, xa, is generally unknown, and can be estimated e.g. by taking the modal (most frequent) or median allele from the xi's.
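As a rough illustration of the two statistics, here is a minimal Python sketch. The function names and the sample of repeat counts are hypothetical, and the normalizations (n-1 for the variance, n for ASD) are my assumptions about the usual definitions rather than something stated above.

```python
from collections import Counter
from statistics import median_low


def allele_variance(alleles):
    """Sample variance of the observed repeat counts (assumes n-1 normalization)."""
    n = len(alleles)
    mean = sum(alleles) / n
    return sum((x - mean) ** 2 for x in alleles) / (n - 1)


def modal_allele(alleles):
    """Most frequent allele, one possible stand-in for the unknown ancestral allele."""
    return Counter(alleles).most_common(1)[0][0]


def average_squared_distance(alleles, ancestral):
    """Mean squared difference between each observed allele and the ancestral one."""
    return sum((x - ancestral) ** 2 for x in alleles) / len(alleles)


# Hypothetical repeat counts at one Y-STR locus for n=8 descendants.
sample = [14, 14, 15, 14, 13, 14, 16, 14]

variance = allele_variance(sample)
asd_modal = average_squared_distance(sample, modal_allele(sample))
asd_median = average_squared_distance(sample, median_low(sample))
```

In this made-up sample the modal and median alleles coincide, so both ASD estimates agree; with real data they need not.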

We can relate these statistics to age g in generations according to these two equations:

ASD = wa × g
Var = wv × g

The parameters wa and wv are effective mutation rates, and they govern how quickly ASD or Variance accumulates with the passage of time (g).

Zhivotovsky, Underhill and Feldman (2006) (Z.U.F.) studied wv using many simulations with many different population histories, and Zhivotovsky et al. (2004) derived wa from "known" real-life histories (of Bulgarian Gypsies and Maori). Nevertheless, the proposed effective rate of 0.00069/locus/generation (which is 3.6x lower than the observed germline rate) has been used indiscriminately in the literature for all kinds of populations and all kinds of statistics (ASD, Variance, and even ρ, as in the recent paper on African pastoralists).

But, if you read Zhivotovsky, Underhill and Feldman (2006) or my two previous posts on the subject, you will realize that the effective rate depends on population history; that the 0.00069 rate is derived for constant-sized populations where haplogroups never grow to large numbers; that most interesting haplogroups that scientists date with it are so large that they can't have grown under the assumptions leading to the 0.00069 rate; hence, there has been a general overestimation of Y-MRCA ages whenever the one-size-fits-all rate is used.

What is an appropriate effective mutation rate?

I have previously hinted that the appropriate effective mutation rate is much closer to the germline rate μ, i.e. the probability that a son's allele differs by 1 repeat from that of his father. Now, I present some more systematic simulations which address the issue of the rate for (i) different population growth m (=average number of sons/man according to a Poisson process) and (ii) different antiquity in generations g. As usual, I keep μ=0.0025 and average results over 10,000 simulation runs.
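For readers who want to experiment, here is a minimal Python sketch of this kind of simulation. It is not my actual code: it assumes a symmetric single-step mutation model (a mutation adds or removes one repeat with equal probability), averages only over runs where the lineage survives, and sets Var to 0 when a single man survives. All function names are hypothetical.

```python
import math
import random
from statistics import median_low


def poisson(lam, rng):
    """Draw from Poisson(lam) using Knuth's method (adequate for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_lineage(g, m, mu, rng):
    """Alleles (offsets from the founder's) of men alive after g generations.

    Returns an empty list if the lineage dies out.
    """
    alleles = [0]
    for _ in range(g):
        sons = []
        for allele in alleles:
            for _ in range(poisson(m, rng)):
                if rng.random() < mu:
                    # Assumed symmetric stepwise mutation: +1 or -1 repeat.
                    sons.append(allele + rng.choice((-1, 1)))
                else:
                    sons.append(allele)
        if not sons:
            return []
        alleles = sons
    return alleles


def variance_and_asd(alleles):
    """Sample variance and ASD, taking the (low) median allele as ancestral."""
    n = len(alleles)
    mean = sum(alleles) / n
    var = sum((x - mean) ** 2 for x in alleles) / (n - 1) if n > 1 else 0.0
    ancestral = median_low(alleles)
    asd = sum((x - ancestral) ** 2 for x in alleles) / n
    return var, asd


def average_statistics(g, m, mu=0.0025, runs=200, seed=0):
    """Average Size, Var and ASD over `runs` surviving lineages."""
    rng = random.Random(seed)
    sizes, variances, asds = [], [], []
    while len(sizes) < runs:
        alleles = simulate_lineage(g, m, mu, rng)
        if not alleles:
            continue  # extinct lineages leave nothing to observe
        var, asd = variance_and_asd(alleles)
        sizes.append(len(alleles))
        variances.append(var)
        asds.append(asd)
    return sum(sizes) / runs, sum(variances) / runs, sum(asds) / runs


if __name__ == "__main__":
    size, var, asd = average_statistics(g=50, m=1.05)
    print(f"Size={size:.0f} Var={var:.3f} ASD={asd:.3f}")
```

With only 200 runs the averages are noisy, and tracking every man individually becomes slow for settings where Size reaches the tens of thousands; counting men per allele instead would scale better.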

The following table lists g, m, Size, Var, ASD, μ/wv, μ/wa. The average number of descendants is Size. I calculate ASD using the median allele from the observed ones.

g    m     Size   Var    ASD    μ/wv  μ/wa
50   1.00  31     0.042  0.055  3.0   2.3
50   1.01  39     0.046  0.061  2.7   2.0
50   1.02  51     0.051  0.066  2.4   1.9
50   1.03  69     0.056  0.072  2.2   1.7
50   1.04  93     0.059  0.075  2.1   1.7
50   1.05  131    0.064  0.080  1.9   1.6
50   1.10  795    0.084  0.096  1.5   1.3
100  1.00  57     0.080  0.105  3.1   2.4
100  1.01  96     0.098  0.127  2.5   2.0
100  1.02  176    0.116  0.145  2.2   1.7
100  1.03  357    0.134  0.162  1.9   1.5
100  1.04  736    0.152  0.179  1.6   1.4
100  1.05  1595   0.167  0.193  1.5   1.3
200  1.00  109    0.155  0.200  3.2   2.5
200  1.01  337    0.226  0.276  2.2   1.8
200  1.02  1417   0.296  0.343  1.7   1.5
200  1.03  6876   0.346  0.387  1.4   1.3
200  1.04  37360  0.385  0.419  1.3   1.2
400  1.00  207    0.296  0.370  3.4   2.7
400  1.01  2744   0.577  0.655  1.7   1.5
400  1.02  76392  0.764  0.821  1.3   1.2

(Note on Sep 18: An updated table with slightly different results is found in an erratum)
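The last two columns can be re-derived from the others. The sketch below assumes that each statistic accumulates linearly with g (Var = wv × g and ASD = wa × g), so that the effective rate is the statistic divided by g; the three rows are copied from the table, and the function name is hypothetical.

```python
MU = 0.0025  # germline rate used in the simulations

# (g, m, Var, ASD) copied from three rows of the table.
rows = [
    (50, 1.00, 0.042, 0.055),
    (200, 1.04, 0.385, 0.419),
    (400, 1.02, 0.764, 0.821),
]


def correction_divider(statistic, g, mu=MU):
    """mu divided by the effective rate, where effective rate = statistic / g."""
    return mu / (statistic / g)


dividers = [
    (g, m, round(correction_divider(var, g), 1), round(correction_divider(asd, g), 1))
    for g, m, var, asd in rows
]
```

For these rows the rounded values agree with the μ/wv and μ/wa columns, which is at least consistent with the linear relations assumed here.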

As you can see, the correction factor μ/wv=3.6 is at the upper limit of these rates. For a constant-sized population (m=1), μ/wv approaches 3.6 with increasing g. But for the (g, m) settings that result in a fairly large haplogroup (tens of thousands of men, still smaller than observed ones), the correction factor for both ASD and Variance is about 1.3 or less.

A Practical Example

Suppose that a haplogroup has Size=1,000,000 men, and an ASD=0.275.

Using the Z.U.F. rate leads to a TMRCA estimate of 0.275/0.00069 = 399 generations.

Yet the table shows that after 400 generations, even with m=1, the ASD is already 0.37 (above the observed 0.275), while the Size is only 207.

If the haplogroup did originate 400 generations ago, it would have had to grow at a faster rate than 1.02/generation to reach its current size; but this would have led to an ASD greater than 0.821 (last row of the table).

So, the estimate of 399 generations using the Z.U.F. rate is a gross overestimate.
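The arithmetic can be sketched in a few lines of Python. The dividers tried here (1.2, 1.5, 2.0) are only illustrative picks from the range seen in the table for growing haplogroups, not a recommended value, and the function name is hypothetical; since the right divider itself depends on g and m, this shows the sensitivity of the estimate rather than a final age.

```python
MU = 0.0025         # germline rate used in the simulations
ZUF_RATE = 0.00069  # one-size-fits-all effective rate


def tmrca_generations(asd, effective_rate):
    """Age in generations, assuming ASD = effective_rate * g."""
    return asd / effective_rate


observed_asd = 0.275

zuf_estimate = tmrca_generations(observed_asd, ZUF_RATE)

# Illustrative correction dividers (mu / wa); the effective rate is MU / divider.
corrected = {
    divider: tmrca_generations(observed_asd, MU / divider)
    for divider in (1.2, 1.5, 2.0)
}

if __name__ == "__main__":
    print(f"Z.U.F. rate: {zuf_estimate:.0f} generations")
    for divider, generations in corrected.items():
        print(f"divider {divider}: {generations:.0f} generations")
```

Under these assumptions, even a divider of 2.0 gives an age of roughly half the Z.U.F. estimate.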

Appendix A: Relationship of ln(haplogroup Size), and correction Divider (μ/wa) with m (for g=140)
Appendix B: Relationship of ln(haplogroup Size), and correction Divider (μ/wa) with g (for m=1.02)


McG said...

Consider the following scenario, which I happen to think is correct. You have a paleolithic/mesolithic start in Western Europe, probably in Iberia, and the northern Mediterranean. Pick a time such as 12000 to 10000 BC, just as the Ice Age is ending and the climate is starting to improve. I don't know exactly what the mutation rate should be considering the difficulties of living then, but I would imagine it would be less than the current germline mutation rate. About 6200 to 5600 BC, the great flood occurs and significant life is lost all over Western Europe, especially the coastal regions, which are probably where most inhabitants are? Essentially, it is a restart for the inhabitants remaining. Assume that the present day germline rate is quickly reattained and life goes on fairly benignly until now. What would the average effective mutation rate be for that scenario?? It would meet your criterion for some of the recent events that you time, since they have enjoyed that same rate. But for anything prior to 6000 BC, it would be estimated as much shorter in time back? Your thoughts would be greatly appreciated.

terryt said...

"About 6200 to 5600 BC; the great flood occurs". What great flood are you refering to?

McG said...

This is the mythical biblical flood. However, I don't think it is a myth. The last glacier lakes over northern Canada burst about 8200 BP to 7600 BP and emptied 700 miles into the Atlantic in a series of bursts of water at a period of 1 to 1.5 days per burst (have you ever emptied a 5 gallon carboy? It's not laminar flow). This triggered a huge underwater tsunami off the coast of Norway which inundated Doggerland and the east coast of the British Isles. In the Mediterranean, Mt. Etna lost a huge amount of soil and another tsunami occurred; this may have precipitated the opening of the straits of Bosporus and the flooding with saline water of the Black Sea?? The current paper Dienekes refers to, Aug 12, again references a complete obliteration of the mesolithic settlements on Cyprus. I can understand why?

terryt said...

Mcg. "The current paper Dienekes refers to, Aug 12, again references a complete obliteration of the mesolithic settlements on Cyprus".

The only relevant comment I can find is, "there appears to have been a sharp decline in Late Mesolithic population levels". Not extinction. The decline can be attributed to the fact that most population expansions are followed by decline once the easily exploitable resources are extinguished after the initial population growth. Population then falls until a new technology is either invented or introduced.

Is there any actual evidence for such a tsunami as you describe? And anyway the statement, "significant life is lost all over western europe" is hardly likely to be true even if such a tsunami occurred.

You mentioned, "it is a restart for the inhabitants remaining". I thought the biblical flood wiped out everyone except for a single family.

I'd be fairly sure the biblical flood is mythical. Probably a fisherman's type exaggeration of a localised Mesopotamian flood which stranded people on the natural rise at Eridu. After all fish was eaten during ritual meals at the first temple there.