September 08, 2012

f-statistics are robust to differences in sample age

In a previous post I raised the possibility that ancient DNA samples might appear admixed only on account of their greater age: with fewer generations (less drift) separating them from an outgroup population, they would appear genetically closer to it.

For example, consider the following figure, in which I've made all three populations have different ages:



Now, it is true that, in terms of sequence divergence, divergence(B,C) will be less than divergence(A,C), simply because A has had more time to "evolve" away from C; that is, x is greater than y.

But, if we look at the f statistics, no signal of admixture will appear.

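Let's take the f3 statistic, which, if we are investigating whether population B is admixed, is a scaled version of the product of allele frequency differences (B-A)*(B-C), averaged over loci. The frequency difference B-A is proportional to the path x+y and the frequency difference B-C is proportional to the path y+z+w. The two paths share only y, the drift on B's own branch, and that shared segment can only contribute a non-negative term. The remaining segments, x and z+w, represent independent periods of drift, so the expected value of their correlation is zero. Admixture in B would show up as a negative f3, and nothing here can push the statistic below zero, regardless of how x compares with y.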
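A quick simulation makes this concrete. What follows is only a minimal sketch under assumptions of my own choosing: the tree is the one implied by the path lengths quoted above (A and B hang off a common ancestor by x and y, that ancestor sits z below the root, and C sits w below the root), drift is modelled as an unbounded Gaussian step whose variance equals the branch length, and the function name `simulate_f3` and all the numeric branch lengths are hypothetical. With these settings the average of (B-A)*(B-C) should come out close to the small drift on B's own branch y, and not negative, even though B is clearly closer to C than A is.

```python
import random


def simulate_f3(x, y, z, w, n_loci=20000, seed=1):
    """Return (f3(B; A, C), mean (A-C)^2, mean (B-C)^2) under Gaussian drift.

    Assumed tree: root -> C (length w); root -> ancestor of A and B (length z);
    that ancestor -> A (length x) and -> B (length y).
    """
    rng = random.Random(seed)
    f3_sum = div_ac_sum = div_bc_sum = 0.0
    for _ in range(n_loci):
        root = rng.uniform(0.3, 0.7)                # ancestral allele frequency
        c = root + rng.gauss(0.0, w ** 0.5)         # C drifts for w
        anc_ab = root + rng.gauss(0.0, z ** 0.5)    # shared ancestor drifts for z
        a = anc_ab + rng.gauss(0.0, x ** 0.5)       # A drifts for x (younger sample)
        b = anc_ab + rng.gauss(0.0, y ** 0.5)       # B drifts for y (older sample)
        f3_sum += (b - a) * (b - c)
        div_ac_sum += (a - c) ** 2
        div_bc_sum += (b - c) ** 2
    return f3_sum / n_loci, div_ac_sum / n_loci, div_bc_sum / n_loci


# Hypothetical branch lengths with x much greater than y.
x, y, z, w = 0.010, 0.001, 0.002, 0.003
f3, div_ac, div_bc = simulate_f3(x, y, z, w)

print(f"mean (A-C)^2 = {div_ac:.5f}")
print(f"mean (B-C)^2 = {div_bc:.5f}")
print(f"f3(B; A, C)  = {f3:.5f}  (drift on B's own branch: y = {y})")
```

Changing x while keeping y fixed should move the divergence figures but leave f3 essentially where it was, which is the whole point.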

The same is true for the f4 statistics, in which we have to add another outgroup D:


Now, we are interested in correlations of the form (B-A)*(C-D). Again, B-A is proportional to x+y and C-D is proportional to w+m+n. Since these again represent independent periods of drift, the correlation has an expected value of 0.
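The same kind of check can be run for f4. Again, this is a hedged sketch rather than a real analysis: I am assuming the tree implied by the path lengths above (D hangs off the root by n, the ancestor of A, B and C by m, C off that ancestor by w, and the A/B ancestor a further z down), Gaussian drift with variance equal to branch length, and made-up branch lengths; `simulate_f4` is a hypothetical helper name. As a sanity check it also computes (A-C)*(B-D), whose two paths do overlap along z under this assumed tree, to show the statistic is not simply blind to everything.

```python
import random


def simulate_f4(x, y, z, w, m, n, n_loci=20000, seed=2):
    """Return the averages of (B-A)*(C-D) and (A-C)*(B-D) under Gaussian drift."""
    rng = random.Random(seed)

    def drift(p, t):
        # One independent period of drift: Gaussian step with variance t.
        return p + rng.gauss(0.0, t ** 0.5)

    disjoint_sum = overlap_sum = 0.0
    for _ in range(n_loci):
        root = rng.uniform(0.3, 0.7)   # ancestral allele frequency
        d = drift(root, n)             # outgroup D
        anc_abc = drift(root, m)       # ancestor of A, B and C
        c = drift(anc_abc, w)
        anc_ab = drift(anc_abc, z)     # ancestor of A and B
        a = drift(anc_ab, x)           # younger sample, more drift
        b = drift(anc_ab, y)           # older sample, less drift
        disjoint_sum += (b - a) * (c - d)   # paths x+y and w+m+n: no overlap
        overlap_sum += (a - c) * (b - d)    # both paths run along z
    return disjoint_sum / n_loci, overlap_sum / n_loci


# Hypothetical branch lengths with x much greater than y.
x, y, z, w, m, n = 0.010, 0.001, 0.002, 0.003, 0.004, 0.005
f4_ab_cd, f4_ac_bd = simulate_f4(x, y, z, w, m, n)

print(f"mean (B-A)*(C-D) = {f4_ab_cd:.5f}  (expected near 0)")
print(f"mean (A-C)*(B-D) = {f4_ac_bd:.5f}  (expected near z = {z})")
```

The first number hovering around zero while the second stays near z is what the path argument predicts: only shared drift shows up, and the age difference between A and B creates none.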

So, I am now convinced that the f statistics are robust to sample age. B may have a sequence that is very close to C's (if y+z+w is very small compared to x), but the f statistics will not show B to be a mix of A and C.

And, this is very good, because it means we may have one less thing to worry about when dealing with ancient DNA samples.
