September 03, 2012

Ancient DNA age/Mutation rate per annum, and signal of archaic admixture

UPDATE (8 Sep 2012): The following discussion, set in small type, is now obsolete. See "f-statistics are robust to differences in sample age" for details.

The D-statistic takes the form:

D(H1, H2, Vindija, Chimp) = (sum of ABBA - sum of BABA) / (sum of ABBA + sum of BABA)

Let A denote the chimp allele, and suppose that Vindija (the Neandertal source of the "Neandertal genome") carries the derived allele B. Then, at sites where two individuals H1 and H2 differ, there are two possible patterns:

ABBA = H2 matches Neandertal, but H1 does not
BABA = H1 matches Neandertal, but H2 does not

If Neandertals did not contribute more DNA to one of H1 and H2 than to the other, then ABBA and BABA occur at equal rates, and the D-statistic has an expected value of 0.
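The definition above can be sketched in a few lines of Python. This is only a minimal illustration: the function name d_statistic, the tuple layout (H1, H2, Neandertal, Chimp), and the toy sites are all made up for the example, and real analyses work on genome-wide data and estimate standard errors, which this sketch does not attempt.

```python
def d_statistic(sites):
    """Compute D from an iterable of (h1, h2, neandertal, chimp) alleles.

    Hypothetical helper for illustration only. The chimp allele is
    treated as ancestral (A) and the Neandertal allele as derived (B).
    """
    abba = 0
    baba = 0
    for h1, h2, neandertal, chimp in sites:
        if neandertal == chimp:
            # Neandertal carries the ancestral allele: not informative.
            continue
        if h1 == chimp and h2 == neandertal:
            abba += 1  # H2 matches Neandertal, but H1 does not
        elif h1 == neandertal and h2 == chimp:
            baba += 1  # H1 matches Neandertal, but H2 does not
    if abba + baba == 0:
        raise ValueError("no ABBA or BABA sites found")
    return (abba - baba) / (abba + baba)


# Toy data (invented): three ABBA sites, one BABA site, two uninformative.
toy_sites = [
    ("A", "G", "G", "A"),  # ABBA
    ("G", "A", "G", "A"),  # BABA
    ("A", "G", "G", "A"),  # ABBA
    ("G", "G", "G", "A"),  # both humans derived: ignored
    ("A", "A", "A", "A"),  # Neandertal ancestral: ignored
    ("C", "T", "T", "C"),  # ABBA
]

print(d_statistic(toy_sites))
```

With these invented sites there are 3 ABBA and 1 BABA patterns, so D = (3 - 1) / (3 + 1) = 0.5; with equal counts of the two patterns the function returns 0.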

Now, suppose that H1 is a living human and H2 is a human who lived X years ago. ABBA and BABA are then no longer expected to be equal. Suppose that modern humans and Neandertals diverged Y years ago, and that Vindija is V years old. Then H1 (the living human) is separated from Vindija by 2*Y-V years of evolution, but H2 (the ancient human) by only 2*Y-V-X years. H2 is therefore expected to match Neandertal more often than H1 does at any site; consequently, there will be an excess of ABBA over BABA, and a non-zero statistic.
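To make the arithmetic concrete, here is a small sketch of the two separations. The numbers plugged in for Y, V and X are purely hypothetical placeholders chosen for round figures, not estimates from any study, and the function name separation_years is invented for the example.

```python
def separation_years(divergence_y, vindija_age_v, sample_age_x=0):
    """Years of evolution separating a human sample from Vindija.

    Follows the expressions in the text: 2*Y - V for a living human
    (sample_age_x = 0) and 2*Y - V - X for a human who lived X years ago.
    """
    return 2 * divergence_y - vindija_age_v - sample_age_x


# Hypothetical values, for illustration only.
Y = 500_000  # assumed modern human / Neandertal divergence, years ago
V = 40_000   # assumed age of Vindija, years
X = 5_000    # assumed age of the ancient human H2, years

living = separation_years(Y, V)      # H1: 2*Y - V
ancient = separation_years(Y, V, X)  # H2: 2*Y - V - X

print(living, ancient, living - ancient)
```

Under these assumed values the living human is separated from Vindija by 960,000 years and the ancient one by 955,000 years; the difference is always exactly X, whatever Y and V are.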

Ancient genomes may thus appear to be archaic-admixed even if they are not, and the older they are, the more archaic-admixed they will appear to be.

There is a different complication that may arise from the fact that the mutation rate per annum may not be the same in different human populations. If H1 and H2 are both modern humans, but the mutation rate per annum in the ancestry of H2 is less than the mutation rate per annum in the ancestry of H1, then H2 will be effectively closer to an archaic hominin (such as Vindija) than H1, and will appear to be archaic-admixed relative to H1.

It is not clear whether the mutation rate per annum has been the same in the ancestry of individuals who inhabit different climate zones, tend to have different body sizes, or have different generation lengths. Table S15 of Meyer et al. (2012) may suggest that it is not:

It appears that the San- and Yoruba-specific branches are a little longer than the Eurasian-specific branches. This may contribute to a signal of archaic admixture in Eurasians.

In both described cases, it remains to be seen how much of the signal of admixture might be explainable on the basis of these effects, and how much will remain intact.

Andrew Millard said...

There is one other potential bias you have not considered. H2 and Vindija are both ancient DNA and we know that diagenesis can cause systematic changes in aDNA. So any unidentified diagenetic changes are likely to be the same in H2 and Vindija, and make them appear closer than they really are. Only when we have very high coverage of the ancient genomes can we be confident of eliminating diagenetic changes from the list of SNPs.

AndyC said...

Curious -- John Hawks posted that Otzi's genome matched Neandertal more closely than any existing group. Given the expected mutation rate and the age of Otzi, would the difference be entirely accounted for by this effect?

I think this is not right, at least for the case of no archaic admixture and no double mutations. Consider the derived allele frequency at the root of (H1, H2), say p. Then we can calculate that, conditional on Vindija having the derived allele,

P(ABBA) = P(BABA) = p(1-p)

The point is that "new" mutations don't affect ABBA or BABA.

Dienekes said...

I'll think about it some more.

But, consider the following case:

X = living sapiens
Y = Neandertal

now:

W = ancient human on sapiens lineage almost immediately after the common ancestor of (X,Y)

W is nearly symmetrically related to X and Y, because he is separated from them by nearly the same number of generations. So, I expect

D(X,W,Y,Chimp) ~ 0.5

If I "slide down" W on the sapiens lineage towards the present, then D(X,W,Y,Chimp)->0.

If Neandertal=B and Chimp=A, then B is a new mutation on the Neandertal lineage some of the time, but most of the time it is the ancestral state for the (Neandertal, Modern) common ancestor, because that ancestor is closer to Neandertal (in generations) than to Chimp.

So, I do think that as we slide up the modern human lineage to the common ancestor, the D-statistic will show admixture even if it's only an unadmixed sample.