The story so far
In my previous post I showed how the "evolutionary rate" of Zhivotovsky, Underhill, and Feldman (2006) is inappropriate for TMRCA calculations, because:
- It is not calculated from the time depth of the MRCA, but from that of an earlier "Patriarch"; and, more importantly:
- It is an average over many simulated haplogroups of small size, and not the kinds of haplogroups one is usually interested in dating in population studies
How big are the haplogroups in Z.U.F.-type simulations?
Z.U.F. consider several different demographic models, differing in their choice of m, the population growth constant. The population size increases (stochastically) by 100(m-1)% per generation on average.
I ran N=10,000 simulations for each reported number. The figures below give the average and maximum number of descendants over these N simulations.
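The simulation itself is simple to sketch. Below is a minimal reimplementation, assuming, as an illustration and not necessarily Z.U.F.'s exact setup, that each man fathers a Poisson-distributed number of sons with mean m:

```python
import numpy as np

def haplogroup_size(m, generations, rng):
    """Male-line descendants of a single founder after `generations`
    generations, where each man fathers Poisson(m) sons."""
    size = 1
    for _ in range(generations):
        if size == 0:
            break  # the lineage has gone extinct
        size = int(rng.poisson(m, size).sum())
    return size

rng = np.random.default_rng(42)
N = 10_000
sizes = [haplogroup_size(1.0, 320, rng) for _ in range(N)]
survivors = [s for s in sizes if s > 0]
print(f"extinct: {N - len(survivors)} of {N}")
print(f"mean size of survivors: {np.mean(survivors):.0f}, max: {max(sizes)}")
```

Setting m to 1.01 instead of 1.0 gives the expanding-population case discussed below.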
Constant population size (m=1)
Under this assumption, haplogroup size grows purely due to randomness of the fathering process; there is no overall population growth. This is an important case, because the 3.6x slower evolutionary rate has been derived from it.
[Table: Number of Descendants — average and maximum, constant population size (m=1)]
It is clear that this type of simulation produces very small haplogroups. Even over 320 generations (roughly the early Neolithic for Greece), the very largest haplogroup produced had 1,310 descendants, while the average one (among lineages that survived) had the theoretically predicted ~160.
Small haplogroups => more drift => loss of variance => lower "effective" mutation rate.
So, as I mentioned in my previous post, to calculate the 3.6x slower rate, not only do we average over haplogroups of all sizes, small and large alike, but we are actually missing the relevant observations. But more on this in the next section.
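The drift-variance link can be illustrated directly. The sketch below is my own, using a symmetric single-step mutation model (Z.U.F.'s exact mutation model may differ); it tracks one STR locus through each surviving haplogroup and compares the mean within-haplogroup variance to the star-genealogy expectation of μG:

```python
import numpy as np

def str_variance(m, generations, mu, rng):
    """Repeat-count variance of one STR locus within a haplogroup
    founded by one man; returns None if the lineage goes extinct."""
    repeats = np.zeros(1, dtype=np.int64)  # founder's allele, centered at 0
    for _ in range(generations):
        if repeats.size == 0:
            return None
        sons = rng.poisson(m, repeats.size)
        repeats = np.repeat(repeats, sons)         # sons inherit father's allele
        mutated = rng.random(repeats.size) < mu    # each son mutates w.p. mu
        steps = rng.choice((-1, 1), repeats.size)  # symmetric single-step model
        repeats = repeats + np.where(mutated, steps, 0)
    return float(repeats.var()) if repeats.size else None

rng = np.random.default_rng(0)
mu, G = 0.002, 320
variances = [v for v in (str_variance(1.0, G, mu, rng) for _ in range(10_000))
             if v is not None]
ratio = np.mean(variances) / (mu * G)  # 1.0 would mean no loss to drift
print(f"{len(variances)} survivors, variance ratio: {ratio:.2f}")
```

The ratio comes out well below 1: within these small drifting haplogroups, variance accumulates at only a fraction of the germline rate, which is exactly the "effective rate" phenomenon.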
Expanding population (m=1.01)
[Table: Number of Descendants — average and maximum, expanding population (m=1.01)]
Predictably, haplogroups end up bigger in an expanding population, but still far short of the sizes of commonly dated real-world haplogroups. The case of m=1.01 is important because it yields the maximum effective mutation rate considered by Z.U.F. for haplogroups that start with a single individual.
Thus, even the highest effective mutation rate considered by Z.U.F. (about 0.55μ over 400 generations) is derived by averaging over unrealistically small haplogroups. In the real world, Y-STR variance accumulates at a higher rate.
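The scale of the problem is easy to check: since the lineage multiplies by m on average each generation, the expected number of male-line descendants after g generations is simply m ** g.

```python
# Expected male-line descendants of one founder after g generations
m, g = 1.01, 400
print(f"expected descendants: {m ** g:.0f}")  # ~54 men
```

Even after 400 generations of 1% growth, the average lineage numbers only a few dozen men, nothing like the haplogroups of millions that population studies actually date.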
Why are Z.U.F.-style simulated haplogroups so small?
At first it seems surprising that these simulated haplogroups end up so small, looking nothing like commonly studied haplogroups, even in an expanding population.
The apparent mystery is resolved once we realize that m is nothing more than the average number of sons a man has. The reason why we see haplogroups so much bigger than the simulated ones is that for individual men, m may be much more, or much less, than its population average. In other words, there is reproductive inequality, which could be due either to social advantage or to natural selection.
So, rather than assuming a uniform m for all men, we can allow m to vary across lineages. A man A may have mA < m if he is impoverished or carries a deleterious Y-chromosome gene, and mA > m if he is a ruler or carries an advantageous Y-chromosome gene.
The advantage could be slight but long-standing (a small fitness improvement) or brief but intense (a conquest or the foundation of a dynasty). Its effect on the lucky lineage is an increase in its number of descendants; its effect on Y-STR variance is a rate of increase approaching the germline rate.
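The quantitative effect of even a modest lineage-specific advantage is dramatic, because it compounds over generations. A hypothetical illustration (the specific mA values below are my own, not from Z.U.F.):

```python
# Expected male-line descendants of one founder after 320 generations,
# for lineages with different average numbers of sons per man (mA)
for m_A in (1.00, 1.01, 1.02, 1.05):
    print(f"mA = {m_A}: {m_A ** 320:,.0f} expected descendants")
```

A lineage sustaining just 5% more sons per man than average reaches millions of descendants over Neolithic time depths, while the average lineage stays in the tens.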
It is clear by now that realistic haplogroup sizes can occur only when there is reproductive inequality. They are not the result of genetic drift, but of natural or social selection. And effective mutation rates should be calculated over successful haplogroups under conditions of reproductive inequality, not over all haplogroups under conditions of reproductive equality.(*)
A note on sampling
Consider a lineage of 1,000 men (i.e. roughly the maximum size produced under reproductive equality) in a population of 1,000,000 men. Its frequency is thus 0.1%.
We take a sample of 1,000 men from this population; this is a much larger sample than is typically used in population studies, and from a smaller population. We expect on average to find just 1 man from the lineage in question in our sample. You can't do a variance-based age estimate with one man!
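The arithmetic can be made exact. With a lineage at frequency p = 0.001 and a sample of n = 1,000, the number of sampled lineage members is Binomial(n, p):

```python
from math import comb

n, p = 1000, 0.001
expected = n * p
# probability of sampling at least 2 lineage members (the minimum
# needed to even speak of within-lineage variance)
p_two_or_more = 1 - sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in (0, 1))
print(f"expected members in sample: {expected:.1f}")
print(f"chance of sampling even 2: {p_two_or_more:.1%}")
```

More often than not the sample contains zero or one member of the lineage, so no within-lineage variance can be computed at all.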
Thus, it becomes clear why haplogroups produced by Z.U.F.-style simulations are uninteresting. You just never encounter enough representatives from them in a real population study. You are typically interested in the much larger haplogroups, which could only have proliferated under conditions of reproductive inequality, and which are the only ones that can yield enough representatives in a sample to allow for a variance calculation.
In the previous post I showed that Z.U.F. calculate their effective rate over all simulated observations, but the rate is applied in the literature to a very specific set of observations, i.e. large haplogroups.
In this post, I showed that Z.U.F.-style simulations just don't produce realistic haplogroup sizes. Drift alone can't explain why millions of men share patrilineal ancestry. Large haplogroup sizes require an assumption of reproductive inequality, and Y-STR variance within them accumulates near the germline rate.
(*) Of course, if one studies numerically small populations, a slower effective rate may be appropriate. My concern is with large human populations (e.g. Greeks or Indians), where real haplogroup sizes greatly exceed those produced by simulations with reproductive equality.
UPDATE (August 8): Continued in On the effective mutation rate for Y-STR variance