September 18, 2008

Erratum on my Y-STR variance work (which, surprisingly, further supports its central thesis)

I realized today that there was a central error in my assumptions throughout the entire Y-STR series. The error concerned the condition I used to detect whether a man was a Most Recent Common Ancestor (MRCA):

Lineages start with one man, and over the generations grow (or shrink) in size as men have (or fail to have) sons. Previously, I assumed that the lineage acquired a new MRCA whenever the number of descendants in one generation was reduced to a single man.

For example, let these be the numbers of patrilineal descendants in the first few generations:

Generation   Number of descendants
0            1 (Patriarch)
1            3
2            4
3            6
4            3
5            1
6            3
7            2


I concluded that the "Patriarch" (at generation 0) was not the MRCA (correct) and that the new MRCA was the single surviving man in generation 5 (not necessarily correct).

It is true that at generation 5, the man becomes the new MRCA, since no other patrilineal cousins of his exist at that time.

BUT, it is also possible for one of the 3 men of generation 6, for example, to be an MRCA. All it takes is for him to father the entire next generation (7).

So: the single survivor of generation 5 has three sons in generation 6, and is thus temporarily the MRCA. However, only one of his three sons fathers the two men of generation 7, and thus this son becomes the new MRCA.

In retrospect, this is an obvious mistake to make. It results from using a sufficient condition for an MRCA (the lineage reduced to one man) that is not, however, necessary for a man to be an MRCA.
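To make the difference between the two conditions concrete, here is a minimal Python sketch applied to the example above. It assumes a hypothetical representation in which each generation is a list holding, for every man, the index of his father in the previous generation. The table gives only head counts, so the father assignments in generations 1 to 4 are invented for illustration, and the function names are mine, not those of the simulator.

```python
# Hypothetical representation: lineage[k] lists, for each man in generation k,
# the index of his father in generation k-1. The Patriarch has no father.
# Father assignments for generations 1-4 are illustrative; only the head
# counts (1, 3, 4, 6, 3, 1, 3, 2) come from the example table.
EXAMPLE = [
    [None],              # generation 0: the Patriarch
    [0, 0, 0],           # 1: three sons of the Patriarch
    [0, 0, 1, 2],        # 2
    [0, 1, 1, 2, 3, 3],  # 3
    [0, 2, 5],           # 4
    [1],                 # 5: a single survivor
    [0, 0, 0],           # 6: his three sons
    [1, 1],              # 7: both fathered by the second son
]


def near_extinction(lineage):
    """Old (sufficient-only) test: the latest generation reduced to one man."""
    return max(k for k, gen in enumerate(lineage) if len(gen) == 1)


def mrca(lineage):
    """Corrected test: the latest man who fathers the entire next generation.

    Returns (generation, index within that generation). If the final
    generation consists of a single man, he is trivially the MRCA.
    """
    last = len(lineage) - 1
    if len(lineage[last]) == 1:
        return last, 0
    for k in range(last - 1, -1, -1):
        fathers = set(lineage[k + 1])
        if len(fathers) == 1:
            return k, fathers.pop()
    raise ValueError("no single man fathers an entire generation")
```

Here near_extinction(EXAMPLE) returns 5, the old answer, while mrca(EXAMPLE) returns (6, 1), the second son of the generation-5 survivor. A generation reduced to one man always passes the new test as well (he fathers all of the next generation), which is why the corrected MRCA is never older than the one the old test finds.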

What are the consequences?

I will be issuing a bug fix for the Y-chromosome Microsatellite Genealogy Simulator over the next few days. But it is interesting to see how this error affects the story I've been defending throughout the posts in the Y-STR series, namely that Y-STR variance accumulates at close to the germline mutation rate for large, observable present-day haplogroups.

For any particular lineage, the MRCA I have calculated so far was never younger, and sometimes older, than the real one. Consequently, since the time of the real MRCA, Y-STR variance has accumulated at an even faster rate, even closer to the germline rate.

At this point I am not in a position to determine how big this effect will be, so I will repeat some of my earlier experiments and report the results. A few exploratory runs so far have revealed a small but not insignificant increase in the effective mutation rate.

Stay tuned...

UPDATE #1

I carried out some experiments to see how much the previously calculated "MRCA" and the real MRCA differ in age. To distinguish between the two, I will call the previous "MRCA" a "Near Extinction" event, in which a lineage is reduced to one man in one generation. The real MRCA is simply a man of the lineage who fathers the entire next generation.

As I mentioned in the very first post of this series, variance is reset to 0 at the MRCA, so it is of no use in determining how much time elapsed between the Patriarch and the MRCA. This time can, however, be estimated from demographic considerations.

g     m      TMRCA   s.d.   TNearExtinct   s.d.
50    1      40      16     43             13
50    1.05   44      10     46             9
100   1      86      26     91             22
100   1.05   95      10     97             7

It can be seen that the TMRCA is fairly close to TNearExtinct, and that both are fairly close to the time of the Patriarch, g (all times are in generations before the present). This closeness, in relative terms, increases as g and m (the average number of sons per man) increase. Both TMRCA and TNearExtinct can be quite variable, as their standard deviations show, but this variability decreases as m increases.
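The kind of comparison in the table above can be approximated with a small Monte Carlo sketch in Python. It is not the simulator's code: it assumes that each man has a Poisson-distributed number of sons with mean m, conditions on the lineage surviving all g generations, and measures both times in generations before the last simulated generation. Because the offspring distribution is an assumption, its numbers should not be expected to match the table exactly; all function names are hypothetical.

```python
import math
import random
import statistics


def poisson(rng, lam):
    """Draw a Poisson(lam) variate with Knuth's method (fine for lam near 1)."""
    limit = math.exp(-lam)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def simulate_lineage(g, m, rng):
    """Grow a lineage for g generations.

    Returns a list of generations, each a list of father indices into the
    previous generation, or None if the lineage dies out. Assumption: the
    number of sons per man is Poisson with mean m.
    """
    lineage = [[None]]  # generation 0: the Patriarch
    for _ in range(g):
        sons = [father
                for father in range(len(lineage[-1]))
                for _ in range(poisson(rng, m))]
        if not sons:
            return None
        lineage.append(sons)
    return lineage


def times_before_present(lineage):
    """Return (TMRCA, TNearExtinct) in generations before the last generation."""
    last = len(lineage) - 1
    # Near Extinction: latest generation reduced to a single man.
    near = max(k for k, gen in enumerate(lineage) if len(gen) == 1)
    # MRCA: latest man who fathers the entire next generation
    # (or the sole man of the last generation, if there is only one).
    if len(lineage[last]) == 1:
        mrca_gen = last
    else:
        mrca_gen = max(k for k in range(last) if len(set(lineage[k + 1])) == 1)
    return last - mrca_gen, last - near


def summarize(g, m, surviving=200, seed=1):
    """Mean and s.d. of TMRCA and TNearExtinct over surviving lineages."""
    rng = random.Random(seed)
    tmrca, tnear = [], []
    while len(tmrca) < surviving:
        lineage = simulate_lineage(g, m, rng)
        if lineage is None:
            continue  # condition on survival to the present
        t_mrca, t_near = times_before_present(lineage)
        tmrca.append(t_mrca)
        tnear.append(t_near)
    return (statistics.mean(tmrca), statistics.stdev(tmrca),
            statistics.mean(tnear), statistics.stdev(tnear))
```

Calling summarize(100, 1.05) returns the mean and standard deviation of TMRCA followed by those of TNearExtinct. For m = 1 most lineages die out, so the loop discards many runs before collecting enough survivors.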

UPDATE #2

I have repeated my experiments on the effective mutation rate. You can compare the results directly: a small discrepancy exists only for m=1, and the results are almost identical otherwise. Thus, the problem identified in this erratum does not appear to affect my past inferences about TMRCA and its relationship to Y-STR variance by much.

g m Size Var ASD μ/wv μ/wa
50 1.00 32 0.043 0.058 2.9 2.2
50 1.01 41 0.047 0.062 2.6 2.0
50 1.02 53 0.052 0.068 2.4 1.8
50 1.03 72 0.057 0.072 2.2 1.7
50 1.04 98 0.061 0.077 2.1 1.6
50 1.05 135 0.065 0.080 1.9 1.6
50 1.10 813 0.086 0.099 1.5 1.3
100 1.00 58 0.081 0.106 3.1 2.4
100 1.01 99 0.097 0.123 2.6 2.0
100 1.02 185 0.118 0.147 2.1 1.7
100 1.03 360 0.135 0.163 1.9 1.5
100 1.04 757 0.153 0.180 1.6 1.4
50 1.00 31 0.043 0.058 2.9 2.2
50 1.01 41 0.048 0.063 2.6 2.0
50 1.02 54 0.052 0.067 2.4 1.9
50 1.03 73 0.058 0.074 2.2 1.7
50 1.04 97 0.064 0.081 2.0 1.5
50 1.05 134 0.065 0.081 1.9 1.5
50 1.10 813 0.085 0.098 1.5 1.3
100 1.00 58 0.082 0.108 3.0 2.3
100 1.01 99 0.101 0.130 2.5 1.9
100 1.02 184 0.120 0.151 2.1 1.7
100 1.03 362 0.135 0.163 1.9 1.5
100 1.04 749 0.154 0.181 1.6 1.4
100 1.05 1593 0.168 0.193 1.5 1.3
200 1.00 111 0.162 0.208 3.1 2.4
200 1.01 347 0.228 0.280 2.2 1.8
200 1.02 1443 0.293 0.342 1.7 1.5
200 1.03 7132 0.350 0.391 1.4 1.3
200 1.04 38123 0.387 0.421 1.3 1.2
400 1.00 212 0.300 0.377 3.3 2.7
400 1.01 2767 0.589 0.670 1.7 1.5
400 1.02 78149 0.768 0.825 1.3 1.2
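The Var, ASD and μ/w columns are not defined in this update, so the following Python sketch rests on stated assumptions. Var is taken as the variance of repeat counts about their mean and ASD as the average squared distance from the modal allele, both averaged over loci. The μ/w columns are reproduced by setting w = statistic / g and μ = 0.0025 mutations per locus per generation; that reading is my inference from the numbers in the table (for example 0.0025 / (0.043 / 50) ≈ 2.9), not something stated in the post. All function names are hypothetical.

```python
from collections import Counter
import statistics


def locus_var_asd(alleles):
    """Variance about the mean, and average squared distance from the modal
    allele, for one locus (repeat counts). Ties for the mode: lowest allele."""
    var = statistics.pvariance(alleles)
    counts = Counter(alleles)
    top = max(counts.values())
    mode = min(a for a, c in counts.items() if c == top)
    asd = sum((a - mode) ** 2 for a in alleles) / len(alleles)
    return var, asd


def haplotype_var_asd(haplotypes):
    """Average Var and ASD over loci; haplotypes is a list of equal-length
    tuples of repeat counts, one tuple per man."""
    per_locus = [locus_var_asd(list(locus)) for locus in zip(*haplotypes)]
    return (statistics.mean(v for v, _ in per_locus),
            statistics.mean(a for _, a in per_locus))


def rate_ratio(stat, g, mu=0.0025):
    """mu / w, with w = stat / g the effective rate per generation measured
    from the Patriarch, g generations ago (assumed reading of the table)."""
    return mu / (stat / g)
```

For instance, rate_ratio(0.168, 100) gives about 1.49, in line with the 1.5 in the g = 100, m = 1.05 row. Because the mean minimizes the average squared deviation, ASD measured from the mode can never be smaller than Var, which agrees with every row of the table.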

2 comments:

McG said...

Some comments on TMRCA. 1. In doing TMRCA analysis, you converge to a haplotype; the question is who had that haplotype. I believe new lineages may occur whenever a mutation occurs (read that tree line), but the common characteristic of every person in the oldest tree is that they all possess that same first mutation. Clan Gregor is a good example: all "genetic" MacGregors have a 10 at 385a, which is the original mutation. Other mutations occur as the number of descendants grows, but they all have that same mutation. Surprisingly, based on analysis, the current clan chieftain has the same haplotype the Patriarch had!!! Now another example, the Kerchner family. Charles claims Frederick is the MRCA for his family. When I do a TMRCA analysis I converge to his father Adam; the math is saying to me that Adam had the mutation, had one son who had the mutation, and then Frederick had three sons and the line became a tree. We may be saying the same thing, but I claim that convergence is to a haplotype and to the first person who had that mutation??

McG said...

I have been studying the relationship between the Chandler mutation rates and the ZUL-derived rates I use. One observation is that in most R1b data sets I have observed, the numbers of mutations at 426 and 388 are similar, and my rates for the two are within about 10%. Chandler has a 3:1 ratio for 388 over 426. The only way I can make sense of that is that, as he says, he used multiple Hg's, including I. As you have admitted, 388 varies across haplogroups. My point is, I have never read an in-depth analysis of Chandler's technique for extracting data. How did he exclude father/son samples and cousin samples, which will increase rates??? So my basic question is: how accurate are Chandler's germline rates? As you know, ZUL's approach has been subjected to much scrutiny.