In a previous post I calculated f3-statistics between my K=7 and K=12 ancestral components. The basic idea is to discover whether a component A can be seen as a mixture of two other components, B and C; if so (and assuming A has not experienced excessive drift), we expect the f3(A; B, C) statistic to be negative.
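To make the sign argument concrete, here is a minimal sketch of a naive f3 estimate computed from per-SNP allele frequencies. It is not the estimator used for the results in this post: it omits the sample-size bias correction and the block-jackknife standard error that real f3 software applies, and the function name `naive_f3` and the toy frequencies are made up for illustration.

```python
from typing import Sequence


def naive_f3(a: Sequence[float], b: Sequence[float], c: Sequence[float]) -> float:
    """Naive f3(A; B, C): the mean over SNPs of (a - b) * (a - c).

    a, b, c are allele frequencies of the same SNPs in components A, B, C.
    Assumption: no bias correction and no standard error are computed here.
    """
    if not (len(a) == len(b) == len(c)) or len(a) == 0:
        raise ValueError("a, b, c must be non-empty and of equal length")
    return sum((ai - bi) * (ai - ci) for ai, bi, ci in zip(a, b, c)) / len(a)


# Toy frequencies (hypothetical, three SNPs).
b = [0.1, 0.9, 0.5]
c = [0.9, 0.1, 0.3]

# A as an exact 50/50 mixture of B and C, with no drift of its own.
mixed = [0.5 * bi + 0.5 * ci for bi, ci in zip(b, c)]

# A with frequencies lying outside the B-C range, i.e. its own drift.
drifted = [0.0, 1.0, 0.9]

print(naive_f3(mixed, b, c))    # negative: A lies between B and C at every SNP
print(naive_f3(drifted, b, c))  # positive: A's own drift dominates
```

For an exact mixture with proportion 1/2, each term equals -(b - c)^2 / 4, which is why the statistic cannot be positive in that case; drift specific to A adds a positive term that can mask the signal.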
As part of my analysis of the world dataset, I calculated f3-statistics for each K from 3 to 12; that is, for each K, I tested whether any one of the K inferred components could be seen as a mixture of two of the remaining K-1. No negative f3 statistics appeared at all, which suggests that the components inferred by ADMIXTURE at a given K tend to form an "orthogonal" set whose members are not mixtures of each other.
More generally, we can calculate f3 statistics where A, B, and C are components inferred from any of the K=3 to K=12 runs. There are 75 such components in total (3 + 4 + ... + 12), and hence 75*(74 choose 2) = 202,575 such f3 statistics. Since calculating all of these would take a while (and would become intractable as K increases further), I decided instead to calculate only pairwise f3 statistics, i.e., statistics where A, B, and C are constrained to come from two successive runs, K and K+1. The significant results can be seen in the spreadsheet.
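The counts above are easy to check. The sketch below reproduces the 75 components and the 202,575 statistics, and also counts the statistics in the restricted scheme under one possible reading of it: for each successive pair of runs, every choice of A, B, and C from the pooled 2K+1 components. That reading is an assumption, and it counts triples that fall entirely within one run more than once.

```python
from math import comb

K_VALUES = range(3, 13)  # K = 3 .. 12


def n_f3(n_components: int) -> int:
    """Number of f3(A; B, C) statistics among n components:
    n choices for the target A, times unordered pairs {B, C} from the rest."""
    return n_components * comb(n_components - 1, 2)


total_components = sum(K_VALUES)
all_stats = n_f3(total_components)

# Assumed reading of the restricted scheme: pool the components of runs K and
# K+1 (2K + 1 components) and count every f3 among them, for K = 3 .. 11.
pairwise_stats = sum(n_f3(k + (k + 1)) for k in range(3, 12))

print(total_components)  # 75
print(all_stats)         # 202575
print(pairwise_stats)
```

Under that reading the restricted scheme needs more than an order of magnitude fewer statistics than the full set.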
It might be worthwhile to develop an automated way of using these statistics to guide the interpretation of ADMIXTURE components. In any case, they are useful as a source of information.
For example, consider the following (the third column represents the mixed population):
Atlantic_Baltic_6/globe6_Z Near_East_6/globe6_Z European_5/globe5_Z -0.013911 0.000084 -166.457
This means that the European component at K=5 can be seen as a mix of the Atlantic_Baltic and Near_East components at K=6. So, this suggests that the European component can be seen as "secondary", the product of admixture. But:
European_5/globe5_Z Amerindian_5/globe5_Z Atlantic_Baltic_6/globe6_Z -0.003964 0.000175 -22.588
This indicates, conversely, that the Atlantic_Baltic component at K=6 can be seen as a mix of the European and Amerindian components at K=5.
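Result lines like the two above can be read mechanically. The sketch below assumes the six whitespace-separated columns are: first source, second source, mixed population, f3, standard error, and Z-score. Only the meaning of the third column is stated in this post; the last three are inferred from the values (Z is roughly f3 divided by the standard error). The Z < -3 cutoff is a common convention, not necessarily the one used for the spreadsheet, and the names `F3Result`, `parse_f3_line`, and `is_significant` are hypothetical.

```python
from typing import NamedTuple


class F3Result(NamedTuple):
    source1: str
    source2: str
    target: str     # the putatively mixed population (third column)
    f3: float
    std_err: float  # assumed meaning of the fifth column
    z: float        # assumed meaning of the sixth column


def parse_f3_line(line: str) -> F3Result:
    """Parse one whitespace-separated result line into an F3Result."""
    fields = line.split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}: {line!r}")
    source1, source2, target = fields[:3]
    f3, std_err, z = (float(x) for x in fields[3:])
    return F3Result(source1, source2, target, f3, std_err, z)


def is_significant(result: F3Result, z_threshold: float = -3.0) -> bool:
    """Negative f3 with a Z-score below the (assumed) threshold."""
    return result.f3 < 0 and result.z < z_threshold


lines = [
    "Atlantic_Baltic_6/globe6_Z Near_East_6/globe6_Z European_5/globe5_Z -0.013911 0.000084 -166.457",
    "European_5/globe5_Z Amerindian_5/globe5_Z Atlantic_Baltic_6/globe6_Z -0.003964 0.000175 -22.588",
]

results = [parse_f3_line(line) for line in lines]
for r in results:
    if is_significant(r):
        print(f"{r.target} ~ mix of {r.source1} and {r.source2} (Z = {r.z})")
```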
It would be very interesting to use f-statistics to guide the choice of an "orthogonal" set of ancestral populations, or to summarize the relationships between them in tree or network form. One could potentially use my ADMIXTURE to TreeMix script to do something like this. However, as K increases, there is a combinatorial explosion in the total number of components, and the resulting slowdown in runtime and growth in memory usage might render this approach unusable, at least for large K.
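As a rough illustration of what such guidance might look like, the sketch below applies one very simple heuristic: keep only the components that never appear as the mixed population in a significant f3 result. This is just one possible rule, not a method I have tested; the function name `unexplained_components` is hypothetical and the component labels are shortened from the two example results above.

```python
from typing import Iterable, List, Tuple

# (source1, source2, target) for each significantly negative f3 result.
Triple = Tuple[str, str, str]


def unexplained_components(
    components: Iterable[str], significant: Iterable[Triple]
) -> List[str]:
    """Return the components that are never the target of a significant
    negative f3, i.e. that no pair of other components 'explains'."""
    targets = {target for _, _, target in significant}
    return [c for c in components if c not in targets]


# Labels shortened from the two example results in this post.
components = ["European_5", "Amerindian_5", "Atlantic_Baltic_6", "Near_East_6"]
significant = [
    ("Atlantic_Baltic_6", "Near_East_6", "European_5"),
    ("European_5", "Amerindian_5", "Atlantic_Baltic_6"),
]

print(unexplained_components(components, significant))
# ['Amerindian_5', 'Near_East_6']
```

Note that European_5 and Atlantic_Baltic_6 each appear as a source for the other, so this rule discards both; a tree or network summary would probably handle such mutual relationships better than a simple filter.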