When we sample two alleles from a population, we generally don't know when their common ancestor lived. The common ancestor of alleles a and b may have lived 100 generations ago, and the common ancestor of alleles a and c may have lived 200 generations ago.
The Interclade method circumvents this problem by cleverly exploiting the Y-chromosome phylogeny: alleles found in two different haplogroups (neither of which is a subgroup of the other) coalesce to precisely one man, the common ancestor of these two haplogroups.[1]
For example, an allele sampled from a haplogroup J1 man and an allele sampled from a haplogroup J2 man coalesce to the unique common ancestor of haplogroups J1 and J2.[2]
The Interclade method is encapsulated in the following formula:

$$\frac{1}{N_A N_B}\sum_{x \in A}\sum_{y \in B}(x - y)^2 = 2\mu g$$
where $x$ and $y$ are the repeat counts of alleles sampled from groups A and B, $N_A$ and $N_B$ are the numbers of alleles in the two groups, $\mu$ is the mutation rate at the locus under consideration, and $g$ is the number of generations that have elapsed since the common ancestor of groups A and B. Dividing the left-hand side by $2\mu$ therefore yields an estimate of $g$.
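A minimal single-locus sketch, assuming the left-hand side is the average squared repeat difference over all cross-group pairs and that μ is known. The function name `interclade_age` is hypothetical. With two made-up groups [14, 16] and [13, 15] and an assumed μ = 0.0025, the four cross-group squared differences are 1, 1, 9 and 1, averaging 3, so the estimate is 3 / 0.005 = 600 generations.

```python
from itertools import product

def interclade_age(group_a, group_b, mu):
    """Interclade age estimate (in generations) at a single STR locus.

    group_a, group_b: allele repeat counts sampled from groups A and B.
    mu: per-generation mutation rate at the locus (assumed known).
    """
    if not group_a or not group_b:
        raise ValueError("each group needs at least one allele")
    # Sum (x - y)^2 over all N_A * N_B cross-group pairs, then average.
    total = sum((x - y) ** 2 for x, y in product(group_a, group_b))
    mean_sq = total / (len(group_a) * len(group_b))
    # The formula equates this average with 2 * mu * g; solve for g.
    return mean_sq / (2 * mu)

print(interclade_age([14, 16], [13, 15], mu=0.0025))  # ≈ 600 generations
```

Enumerating every pair keeps the code a literal transcription of the double sum; for large groups the same average can be computed from group means and mean squares without forming all pairs.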
In this post, I examine the properties of this estimator. Each reported number is averaged over 10,000 simulations. The number of sons each man has is Poisson-distributed with mean m. The two groups are created by first generating two independent founders who lived g − g' generations after the common ancestor, i.e. g' generations before the present; their present-day descendants then make up the two groups. In my simulations, I keep g = 100 and vary m, which regulates how fast haplogroups grow, and g', the antiquity of the two groups.[3]
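The setup above can be sketched in Python with NumPy. The post does not state its mutation model or mutation rate, so the symmetric single-step model (±1 repeat with probability μ/2 each per transmission) and μ = 0.002 below are assumptions, as is discarding replicates in which either group dies out. The names `mutate`, `simulate_group`, `interclade_estimate` and `run` are hypothetical. Keep m > 1 (otherwise almost every replicate goes extinct and the retry loop stalls) and m**g' moderate, since the whole present-day group is held in memory.

```python
import numpy as np

def mutate(alleles, mu, rng):
    """One generation of symmetric single-step mutation (assumed model):
    each allele moves -1 or +1 with probability mu/2 each, else stays put."""
    steps = rng.choice([-1, 0, 1], size=alleles.size, p=[mu / 2, 1 - mu, mu / 2])
    return alleles + steps

def simulate_group(g, g_prime, m, mu, rng, max_men=1_000_000):
    """Present-day alleles of one group, or None if the group dies out.

    The common ancestor (allele 0) lived g generations ago. A single line of
    descent reaches the group's founder after g - g_prime generations; from
    then on every man has Poisson(m) sons for g_prime generations.
    """
    # Unbranched line from the common ancestor down to the founder.
    founder = rng.choice([-1, 0, 1], size=g - g_prime, p=[mu / 2, 1 - mu, mu / 2]).sum()
    alleles = np.array([founder], dtype=np.int64)
    for _ in range(g_prime):
        sons = rng.poisson(m, size=alleles.size)
        total = sons.sum()
        if total == 0:
            return None
        if total > max_men:
            raise RuntimeError("group too large; lower m or g_prime")
        # Each son inherits his father's allele, then may mutate.
        alleles = mutate(np.repeat(alleles, sons), mu, rng)
    return alleles

def interclade_estimate(a, b, mu):
    """Mean of (x - y)^2 over all cross pairs, divided by 2 * mu.
    Expanded as E[x^2] + E[y^2] - 2 E[x] E[y] to avoid an N_A x N_B matrix."""
    mean_sq = np.mean(a ** 2) + np.mean(b ** 2) - 2 * np.mean(a) * np.mean(b)
    return mean_sq / (2 * mu)

def run(m, g_prime, g=100, mu=0.002, n_sims=10_000, seed=0):
    """Return (average age estimate, average |estimate - g|)."""
    rng = np.random.default_rng(seed)
    estimates = []
    while len(estimates) < n_sims:
        a = simulate_group(g, g_prime, m, mu, rng)
        b = simulate_group(g, g_prime, m, mu, rng)
        if a is None or b is None:  # assumption: condition on both groups surviving
            continue
        estimates.append(interclade_estimate(a, b, mu))
    estimates = np.array(estimates)
    return estimates.mean(), np.abs(estimates - g).mean()
```

For example, `run(m=1.5, g_prime=15, n_sims=1000)` returns one (estimate, error) pair for a single cell of a table like the one below; the numbers it produces depend on the assumed μ and mutation model and are not the post's results.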
The following table shows m, g', the average age estimate, and the average absolute error |age estimate − 100|.
| m | g' | Estimated Age | Estimate Error |
|---|---|---|---|
The Interclade method is unbiased, a very attractive property: its average estimate does not depend on how recent the two groups are or on what kind of population expansion they experienced.
Its error, however, does depend on population history (m) and on the antiquity of the two groups (g'). The error is smallest when the two groups were founded soon after their common ancestor and then expanded rapidly.
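Why the estimator is unbiased, yet has an error that depends on m and g', can be sketched under an assumption the post does not state: a symmetric single-step mutation model in which each father-to-son transmission changes the repeat count by −1 or +1 with probability μ/2 each. The path from any allele x in A up to the common ancestor and down to any allele y in B consists of 2g independent transmissions with steps s_k, so:

$$
\begin{aligned}
x - y &= \sum_{k=1}^{2g} s_k, \qquad \Pr(s_k = +1) = \Pr(s_k = -1) = \tfrac{\mu}{2}, \quad \Pr(s_k = 0) = 1 - \mu,\\
\mathbb{E}[s_k] &= 0, \qquad \operatorname{Var}(s_k) = \mu,\\
\mathbb{E}\big[(x - y)^2\big] &= \operatorname{Var}(x - y) = 2\mu g,\\
\mathbb{E}\Big[\frac{1}{N_A N_B}\sum_{x \in A}\sum_{y \in B}(x - y)^2\Big] &= 2\mu g.
\end{aligned}
$$

The last line follows by linearity of expectation no matter how the sampled men within each group are related, so the mean is free of m and g'. Shared ancestry within a group does make the N_A N_B pairwise terms correlated, and that correlation, which is governed by m and g', is what drives the variance and hence the error.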
The average error is substantial, but in practice the estimator will be applied over many STR loci, which averages out much of this stochastic error. The remaining error of the age estimate will then be due almost entirely to our ignorance of (i) generation length, (ii) precise germline mutation rates, and (iii) the mutation process in general.
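The multi-locus argument can be made a little more precise under further assumptions that are mine, not the post's: L loci with known rates that mutate independently along the single genealogy T shared by all Y-chromosome loci, combined by simply averaging the per-locus estimates (one possible choice). Under the single-step model, each locus is unbiased conditional on T, so the genealogical term of the law of total variance vanishes:

$$
\begin{aligned}
\hat g &= \frac{1}{L}\sum_{l=1}^{L}\hat g_l, \qquad \mathbb{E}[\hat g_l \mid T] = g,\\
\operatorname{Var}(\hat g) &= \mathbb{E}\big[\operatorname{Var}(\hat g \mid T)\big] + \operatorname{Var}\big(\mathbb{E}[\hat g \mid T]\big)
= \frac{1}{L^2}\sum_{l=1}^{L}\mathbb{E}\big[\operatorname{Var}(\hat g_l \mid T)\big] + 0.
\end{aligned}
$$

If every per-locus term is bounded by some σ², the variance is at most σ²/L and shrinks as loci are added. What this averaging cannot remove are errors in the assumed mutation rates, in the mutation model, and in the generation length used to convert generations to years.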
[1] Ignoring, of course, as is commonly done, stochasticity in generation size.
[2] This common ancestor was a J man but not necessarily the ancestor of all J men, since there are also J*(xJ1,J2) men in the world, i.e. men who belong to J but to neither J1 nor J2.
[3] In general, the two groups will have different coalescence ages, but assuming that both coalesce g' generations ago lets us investigate how their antiquity affects the estimator.