In a previous post I calculated f3-statistics between my K=7 and K=12 ancestral components. The basic idea is to discover which component A can be seen as a mixture of two other components, B and C, in which case (assuming A does not have excessive drift), we expect a negative f3(A; B, C) statistic.
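To make the sign convention concrete, here is a minimal sketch of a naive f3 estimate computed from per-SNP allele frequencies. The function name and the toy frequencies are made up for illustration. The sketch leaves out the finite-sample bias correction and the block-jackknife standard errors that dedicated f3 tools typically include, and it is not the code used for the results in this post.

```python
from statistics import mean

def f3(a, b, c):
    """Naive f3(A; B, C): the mean over SNPs of (a - b) * (a - c).

    a, b, c are equal-length sequences of allele frequencies for the
    target A and the two candidate sources B and C.
    """
    if not (len(a) == len(b) == len(c)):
        raise ValueError("frequency vectors must have equal length")
    return mean((pa - pb) * (pa - pc) for pa, pb, pc in zip(a, b, c))

# Toy allele frequencies at four SNPs (invented, not real data).
b = [0.10, 0.80, 0.30, 0.60]
c = [0.90, 0.20, 0.70, 0.40]

# A is built as an exact 50/50 mixture of B and C, with no extra drift.
mixed = [0.5 * pb + 0.5 * pc for pb, pc in zip(b, c)]

print(f3(mixed, b, c))  # negative: A sits "between" B and C
print(f3(b, mixed, c))  # positive: B is not a mixture of A and C
```

With real components, drift in A after the mixture adds a positive term, which is why a mixed component can still fail to show a negative value.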
As part of my analysis of the world dataset, I calculated f3-statistics for each K from K=3 to K=12; that is, for each K, I tested whether any one of the K inferred components could be seen as a mixture of two of the remaining K-1. No negative f3 statistics appeared at all, which suggests that the components inferred by ADMIXTURE at a given K tend to form an "orthogonal" set whose members are not mixtures of each other.
More generally, we can calculate f3 statistics where A, B, and C are components inferred from any of the K=3 to K=12 runs. There are 75 such components in total (3+4+...+12), and hence 75*(74 choose 2) = 202,575 such f3 statistics. Since calculating these would take a while (and would become intractable as K increases further), I decided to calculate f3 statistics over pairs of runs, i.e., statistics where A, B, and C are constrained to come from two successive runs, K and K+1. The significant results can be seen in the spreadsheet.
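The counts above are easy to check. The following sketch reproduces the 75 and 202,575 figures and shows how quickly the full set grows. The helper name `n_f3` is hypothetical, and the count for the successive-run restriction assumes that A, B, and C may be any three distinct components from the union of the K and K+1 runs, which is only one possible reading of that restriction.

```python
from math import comb

def n_f3(n_components):
    """Number of f3(A; B, C) statistics among n components:
    pick the target A, then an unordered pair {B, C} from the rest."""
    return n_components * comb(n_components - 1, 2)

# All components from the K=3 .. K=12 runs pooled together.
total_components = sum(range(3, 13))
print(total_components, n_f3(total_components))  # 75 202575

# Assumption: for each successive pair (K, K+1), use the union of the
# two runs, i.e. K + (K + 1) components.
pairwise = sum(n_f3(k + (k + 1)) for k in range(3, 12))
print(pairwise)  # 17325

# Growth of the full, unrestricted set as the largest K increases.
for k_max in (12, 15, 20):
    n = sum(range(3, k_max + 1))
    print(k_max, n, n_f3(n))
```

Under that reading, the restricted set is roughly a twelfth of the full one, at the cost of never comparing components from runs that are more than one K apart.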
It might be worthwhile to develop an automated way of using these statistics to guide us in the interpretation of ADMIXTURE components. In any case, they are useful as a source of information.
For example, consider the following (the third column represents the mixed population):
Atlantic_Baltic_6/globe6_Z Near_East_6/globe6_Z European_5/globe5_Z -0.013911 0.000084 -166.457
This means that the European component at K=5 can be seen as a mix of the Atlantic_Baltic and Near_East components at K=6. So, this suggests that the European component can be seen as "secondary", the product of admixture. But:
European_5/globe5_Z Amerindian_5/globe5_Z Atlantic_Baltic_6/globe6_Z -0.003964 0.000175 -22.588
This indicates conversely that the Atlantic_Baltic component at K=6 can be seen as a mix of the European and Amerindian components at K=5.
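For anyone who wants to work through the spreadsheet programmatically, here is a small sketch that parses result lines like the two above. It assumes the six whitespace-separated columns are source 1, source 2, mixed population, f3, standard error, and Z-score (my reading of the numbers, since f3 divided by the fifth column is close to the sixth), and that labels follow the `Name_K/run` pattern seen here. The class and function names are hypothetical.

```python
from typing import NamedTuple

class F3Row(NamedTuple):
    source1: str
    source2: str
    target: str   # third column: the putatively mixed component
    f3: float
    se: float     # assumed to be the standard error
    z: float      # assumed to be the Z-score

def parse_f3_line(line):
    """Parse one whitespace-separated result line into an F3Row."""
    s1, s2, target, f3, se, z = line.split()
    return F3Row(s1, s2, target, float(f3), float(se), float(z))

def component_and_k(label):
    """Split a label such as 'European_5/globe5_Z' into ('European', 5)."""
    name, k = label.split("/")[0].rsplit("_", 1)
    return name, int(k)

lines = [
    "Atlantic_Baltic_6/globe6_Z Near_East_6/globe6_Z European_5/globe5_Z -0.013911 0.000084 -166.457",
    "European_5/globe5_Z Amerindian_5/globe5_Z Atlantic_Baltic_6/globe6_Z -0.003964 0.000175 -22.588",
]
rows = [parse_f3_line(line) for line in lines]

for row in rows:
    (t, tk), (a, ak), (b, bk) = (
        component_and_k(x) for x in (row.target, row.source1, row.source2)
    )
    print(f"{t} (K={tk}) as a mix of {a} (K={ak}) and {b} (K={bk}): "
          f"f3={row.f3}, Z={row.z}")
```

Recomputing Z as f3 divided by the printed standard error gives slightly different values from the sixth column, because the standard error is shown with limited precision.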
It would be very interesting to use f-statistics to guide one in the choice of an "orthogonal" set of ancestral populations, or to summarize the relationships between them in tree or network form. One could potentially use my ADMIXTURE to TreeMix script to do something like this, although as K increases, the total number of components grows quickly, with a probable slowdown in runtime and blowup in memory usage that might render this approach unusable, at least for large K.
Very interesting. I'd like to see you flesh these ideas out more. I was wondering if something like that would be possible, thinking along the lines of "if subsequent components show up as mixtures of previous components, then 'force' the program to keep the previous components, otherwise new components win", which is probably not quite what you mean, and might not technically work.
Also, the following seemed of note to me when I looked at the spreadsheet, and I can't recall having seen it explicitly said before.
Of the three West Eurasian components, at K=10 through K=12:
Atlantic_Baltic and West_Asian (the two closest components in the analysis) seem basically identical in their relationship to the non-West Eurasian components (as reflected by Fst).
Assuming these all map to real ancient populations, that would seem to be explained by the Atlantic_Baltic and West_Asian populations having been a single population while all their major admixture events took place, and then drifting away from one another and from the rest of the world at about the same rates, with no further admixture.
(Btw, this seems to me to support the idea that the ancient West_Asian population (if WA ever did map to an existing population, be they Indo-Europeans or what may) was somewhat more like those populations who today are extremely high in Atlantic_Baltic (e.g. Lithuanians, Norwegians, Belorussians) than we would think when we look at present-day populations who are high in West_Asian, like Georgians and Balochis (these populations also having a good whack of the Southern or South Asian components, which are more distal to West_Asian), as much as these pops would make a better proxy than, say, Lithuanians.)
Southern, meanwhile, has elevated differences everywhere compared to Atlantic_Baltic & West_Asian, presumably reflecting greater drift, but this is least so with Africans (presumably reflecting either differential mixture with Africans in Southern, or with the East-Eurasian/Oceanian clade in Atlantic_Baltic & West_Asian).