For example, consider the following figure, in which I've made all three populations have different ages:
Now, it is true that, in terms of sequence divergence, divergence(B, C) will be less than divergence(A, C), simply because A has had more time to "evolve" away from C, i.e., x is greater than y.
But, if we look at the f statistics, no signal of admixture will appear.
Let's take the f3 statistic, which is a scaled version of the correlation of allele frequency differences (B-A)*(B-C), since we are investigating whether population B is admixed. The frequency difference B-A is proportional to the path x+y and the frequency difference B-C is proportional to the path y+z+w. The two paths share only y, B's own branch, which can contribute only a non-negative term. The remaining segments, x and z+w, represent independent periods of drift, so the expected value of their correlation is zero. The f3 statistic is therefore not expected to be negative, and a negative value is what would signal admixture.
The same is true for the f4 statistic, for which we have to add an outgroup D:
Now, we are interested in correlations of the form (B-A)*(C-D). Again, B-A is proportional to x+y and C-D is proportional to w+m+n. Since these again represent independent periods of drift, the correlation has an expected value of 0.
So, I am now convinced that the f statistics are robust to sample age. B may have a sequence that is very close to C (if y+z+w is very small compared to x), but the f statistics will not show it to be a mix of A and C.
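The argument can also be checked numerically. What follows is only a rough sketch under my own assumptions, not a reproduction of the figures: the function names, the tree layout (A and B share an ancestor reached from an internal node by z, C hangs off that node by w, D by m+n), the branch lengths in generations, and the population size are all made up for illustration. Drift is modeled as plain binomial resampling of allele frequencies, and the mean squared frequency difference (f2) stands in loosely for sequence divergence.

```python
import numpy as np


def drift(p, generations, n_chrom, rng):
    """Wright-Fisher drift: resample n_chrom chromosomes once per generation."""
    p = p.copy()
    for _ in range(generations):
        p = rng.binomial(n_chrom, p) / n_chrom
    return p


def simulate(n_snps=20000, n_chrom=200, x=100, y=20, z=40, w=30, m=30, n=50, seed=0):
    """Simulate allele frequencies on an assumed tree and return f2, f3, f4.

    Branch lengths are in generations and are hypothetical; x > y mimics
    B being an older sample than A.
    """
    rng = np.random.default_rng(seed)
    node = rng.uniform(0.1, 0.9, n_snps)     # frequencies at the internal node
    anc_ab = drift(node, z, n_chrom, rng)    # ancestor of A and B
    A = drift(anc_ab, x, n_chrom, rng)       # long branch (younger sample)
    B = drift(anc_ab, y, n_chrom, rng)       # short branch (older sample)
    C = drift(node, w, n_chrom, rng)
    D = drift(node, m + n, n_chrom, rng)     # outgroup
    return {
        "f2_AC": np.mean((A - C) ** 2),        # stand-in for divergence(A, C)
        "f2_BC": np.mean((B - C) ** 2),        # stand-in for divergence(B, C)
        "f3": np.mean((B - A) * (B - C)),      # test of B as a mix of A and C
        "f4": np.mean((B - A) * (C - D)),
    }


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name}: {value:.4f}")
```

With these settings I would expect B to come out closer to C than A is, f3 to stay positive, and f4 to sit near zero, which is the pattern described above. Changing x and y should shift the f2 values while leaving the sign of f3 and the near-zero f4 alone.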
And, this is very good, because it means we may have one less thing to worry about when dealing with ancient DNA samples.