I had previously issued a note of caution on admixture estimates. In the present post, I will touch upon another subject, namely: what does it mean when two populations share an inferred ancestral component?
It is a common tendency to think in terms of gene flow from the population where the component occurs at a high fraction (say 50%), towards the one where it occurs at a low one (say 10%). But, in reality, component sharing has four possible interpretations:
- Gene flow from the high-fraction to the low-fraction group
- Gene flow from the low-fraction to the high-fraction group
- Gene flow from an unsampled third group to both
- Common ancestry of the two groups without substantial gene flow after the initial divergence
1. Gene flow from the high-fraction to the low-fraction group
In the first example, we have three populations: it appears that the population in the middle is admixed, and is composed of a minority element from the one on the left (light grey) and a majority element from the one on the right (dark grey). Indeed, this is what happened, and the middle population (African Americans, ASW) is a mix of white Americans (CEU), and West Africans (YRI).
2. Gene flow from the low-fraction to the high-fraction group
This is much like the previous figure, where it appears that the middle group is admixed, while the left and right ones are unadmixed. In reality, the middle group are Anatolian Turks, the left one are Sardinians, and the right one are Gujarati Indians. The explanation that Anatolian Turks are the result of admixture between the other two groups is much less plausible than the alternative: that both Sardinians and Gujarati Indians have experienced gene flow from Anatolia, due, e.g., to the spread of the Neolithic economy.
Note, also, that this does not exclude the possibility that some gene flow from Western Europe and South Asia did take place! However, one would be remiss to interpret the observed pattern as gene flow from the high-fraction groups (Sardinians and Gujaratis) to the low-fraction one (Anatolian Turks) and not the opposite.
3. Gene flow from an unsampled third group to both
Here it appears that some individuals from the light grey population have admixture from the dark grey one, and many individuals from the dark grey one have admixture from the light grey one. The two populations are actually Iranians and Ethiopians, and the observed pattern does not necessarily indicate the migration of Persians to Ethiopia or Ethiopians to Persia (although that might have taken place!), but is probably mediated by the geographically intermediate Arabians. Adding the Saudis (right), we obtain the following:
Notice how the "Iranian" component largely disappears from the Ethiopians, and is replaced by the component modal in Saudis. One could, indeed, extend the above, by adding even more groups that may be confounding results, e.g., South Asians in the case of Iranians, or Sub-Saharan Africans in the case of Ethiopians. That is why it's important to sample as broadly as possible, and to include populations bordering one's region of interest.
4. Common ancestry of the two groups without substantial gene flow after the initial divergence
Once again, it appears that there are two relatively "pure" groups and an admixed one, but, in reality, the three groups are Russians, Selkups, and Tongans. It is extremely unlikely that the Selkups from Central Siberia and the Tongans from the South Pacific experienced direct gene flow; the Tongans are believed to be a mix of East Asian-like and Papuan-like people who colonized the Pacific from Southeast Asia and Near Oceania, and any relationship that they have with the Selkups is due to deep common ancestry, rather than more recent gene flow.
Bonus: Lack of visible admixture is not lack of admixture
These three populations, except for three individuals, appear not to share any ancestral components. They are in fact Cambodians, Papuans, and Tongans, and the Tongan population did not appear out of thin air, but is actually derived from Southeast Asia and Near Oceania, from ancestors similar to Cambodians and Papuans.
A good way to see this is to reduce K to 2, which reveals that Tongans are predominantly Cambodian-like, but differ from Cambodians by having some "Papuan" admixture.
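Comparing runs at different K is easier with per-population averages than by eyeballing bar plots. The following is a minimal Python sketch, assuming the ancestry fractions are available as whitespace-delimited rows, one individual per row and one column per component (the layout I understand ADMIXTURE's .Q output to use). The function names, the population labels, and all numbers are invented for illustration and are not results from any real run.

```python
from collections import defaultdict


def parse_q(text):
    """Parse whitespace-delimited ancestry fractions, one individual per row."""
    return [[float(x) for x in line.split()] for line in text.strip().splitlines()]


def population_means(q_rows, labels):
    """Average each ancestry column within each population label."""
    if len(q_rows) != len(labels):
        raise ValueError("need exactly one population label per individual")
    sums = {}
    counts = defaultdict(int)
    for row, pop in zip(q_rows, labels):
        if pop not in sums:
            sums[pop] = [0.0] * len(row)
        for k, frac in enumerate(row):
            sums[pop][k] += frac
        counts[pop] += 1
    return {pop: [s / counts[pop] for s in col] for pop, col in sums.items()}


# Hypothetical toy values: three populations that look unrelated at K=3 ...
q_k3 = parse_q("""
0.98 0.01 0.01
0.97 0.02 0.01
0.01 0.98 0.01
0.02 0.97 0.01
0.01 0.01 0.98
0.02 0.01 0.97
""")
# ... while at K=2 the third one resolves as a mix of the other two.
q_k2 = parse_q("""
0.99 0.01
0.98 0.02
0.01 0.99
0.02 0.98
0.75 0.25
0.80 0.20
""")
labels = ["popA", "popA", "popB", "popB", "popC", "popC"]

means_k3 = population_means(q_k3, labels)
means_k2 = population_means(q_k2, labels)
for pop in ("popA", "popB", "popC"):
    print(pop, "K=3:", means_k3[pop], "K=2:", means_k2[pop])
```

In this toy case popC has its own component at K=3 and shows no sharing with the other two, yet at K=2 it appears as mostly popA-like with a popB-like minority, which is the kind of contrast worth checking for before concluding that a population is unadmixed.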
I have used ADMIXTURE for months now, and I consider it one of the three best pieces of code a genome blogger may employ (the others being MCLUST, as used in the Galore approach, and, of course, the indispensable PLINK).
ADMIXTURE reveals common ancestral elements in populations, but the interpretation of these elements has to be done with caution:
- Use common sense and background knowledge
- Notice the Fst divergences between components, as these constrain their deep relationships
- Experiment, experiment, experiment with your data
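To make the point about Fst divergences concrete, here is a rough Python sketch that computes a pairwise divergence between inferred components from their allele frequencies. It assumes the frequencies are given as one row per marker and one column per component (the layout I understand ADMIXTURE's .P output to use), and it uses a simple Hudson-style ratio-of-sums formula with no sample-size correction. This is a swapped-in approximation, not necessarily the estimator ADMIXTURE itself reports; the function names and the toy frequencies are hypothetical.

```python
def hudson_fst(p1, p2):
    """Hudson-style Fst (ratio of sums) from two lists of allele frequencies.

    No sample-size correction is applied, since the inputs are treated as
    inferred component frequencies rather than counts from sampled individuals.
    """
    if len(p1) != len(p2):
        raise ValueError("frequency lists must cover the same markers")
    num = 0.0
    den = 0.0
    for a, b in zip(p1, p2):
        num += (a - b) ** 2
        den += a * (1.0 - b) + b * (1.0 - a)
    if den == 0.0:
        raise ValueError("no polymorphic markers to compare")
    return num / den


def pairwise_fst(p_rows):
    """Fst for every pair of components; p_rows holds one row per marker."""
    k = len(p_rows[0])
    cols = [[row[j] for row in p_rows] for j in range(k)]
    return {
        (i, j): hudson_fst(cols[i], cols[j])
        for i in range(k)
        for j in range(i + 1, k)
    }


# Hypothetical toy frequencies: components 0 and 1 are close, 2 is distant.
toy_p = [
    [0.10, 0.12, 0.80],
    [0.50, 0.55, 0.20],
    [0.90, 0.85, 0.30],
]
fst = pairwise_fst(toy_p)
for pair, value in sorted(fst.items()):
    print(pair, round(value, 4))
```

A small divergence between two components suggests that sharing them may reflect a relatively recent split or common ancestry, while a large one makes a deep, ancient relationship the only kind of common ancestry on offer; either way it narrows which of the four interpretations are plausible.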
I will probably extend this not only to the ~140 populations with the full set of markers used in Dodecad v3, but also to 100+ more with a smaller number of markers, as well as to all unrelated Project participants, encompassing thousands of individuals. So, I am looking forward to hearing people's theories on how to interpret the evidence, and, hopefully, the notes of caution in this and my previous posts will be helpful in doing so.