June 25, 2011

Interpretation of ADMIXTURE results: component sharing

I had previously issued a note of caution on admixture estimates. In the present post, I will touch upon another subject, namely: what does it mean when two populations share an inferred ancestral component?

It is a common tendency to think in terms of gene flow from the population where the component occurs at a high fraction (say 50%) towards the one where it occurs at a low fraction (say 10%). But, in reality, component sharing has four possible interpretations:
  1. Gene flow from the high-fraction to the low-fraction group
  2. Gene flow from the low-fraction to the high-fraction group
  3. Gene flow from an unsampled third group to both
  4. Common ancestry of the two groups without substantial gene flow after the initial divergence
I will illustrate each of these cases with a simple example. In each of them, we know roughly what happened.

1. Gene flow from the high-fraction to the low-fraction group
In the first example, we have three populations. It appears that the population in the middle is admixed, composed of a minority element from the one on the left (light grey) and a majority element from the one on the right (dark grey). Indeed, this is what happened: the middle population (African Americans, ASW) is a mix of white Americans (CEU) and West Africans (YRI).
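A minimal sketch of case 1, under stated assumptions: the allele frequencies below are simulated rather than taken from HapMap, and the "CEU-like" and "YRI-like" sources are treated as known, loosely analogous to a supervised run. The likelihood is the basic ADMIXTURE model, in which each allele an individual carries comes from a source with probability equal to that source's ancestry fraction, so the expected frequency at a SNP is q*f_CEU + (1 - q)*f_YRI. The helper names (`drift`, `log_lik`, `estimate_q`), the drift approximation, and the 20% admixture figure are all hypothetical choices for illustration, and the ternary search is a simple stand-in for ADMIXTURE's own optimizer.

```python
import math
import random

def drift(p, F, rng):
    # Normal approximation to genetic drift (variance F*p*(1-p)), clamped away from 0 and 1.
    return min(0.99, max(0.01, rng.gauss(p, math.sqrt(F * p * (1 - p)))))

def log_lik(q, genos, f1, f2):
    # Binomial log-likelihood of diploid genotypes (0, 1 or 2 copies of the counted
    # allele) when the expected frequency at each SNP is q*f1 + (1-q)*f2.
    total = 0.0
    for g, a, b in zip(genos, f1, f2):
        m = q * a + (1 - q) * b
        total += g * math.log(m) + (2 - g) * math.log(1 - m)
    return total

def estimate_q(genos, f1, f2, iters=40):
    # The log-likelihood is concave in q, so ternary search on [0, 1] finds its maximum.
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if log_lik(m1, genos, f1, f2) < log_lik(m2, genos, f1, f2):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2

rng = random.Random(1)
root = [rng.uniform(0.1, 0.9) for _ in range(4000)]
ceu = [drift(p, 0.1, rng) for p in root]  # "light grey" source (simulated)
yri = [drift(p, 0.1, rng) for p in root]  # "dark grey" source (simulated)

true_q = 0.2  # assumed CEU-like fraction of each simulated admixed individual
mix = [true_q * a + (1 - true_q) * b for a, b in zip(ceu, yri)]
estimates = []
for _ in range(5):
    genos = [(rng.random() < x) + (rng.random() < x) for x in mix]
    estimates.append(estimate_q(genos, ceu, yri))

print([round(e, 2) for e in estimates])
```

Recovering the proportion says nothing about the direction of gene flow: the same arithmetic applies whichever way the genes moved, which is what the remaining cases are about.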

2. Gene flow from the low-fraction to the high-fraction group
This is much like the previous figure: it appears that the middle group is admixed, while the left and right ones are unadmixed. In reality, the middle group are Anatolian Turks, the left one Sardinians, and the right one Gujarati Indians. The explanation that the Turks are the result of admixture between the other two groups is much less plausible than the alternative that both Sardinians and Gujarati Indians have experienced gene flow from Anatolia, due, e.g., to the spread of the Neolithic economy.

Note, also, that this does not exclude the possibility that some gene flow from Western Europe and South Asia into Anatolia did take place! However, one would be remiss to interpret the observed pattern as gene flow from the high-fraction groups (Sardinians and Gujaratis) to the low-fraction one (Anatolian Turks) and not the opposite.
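The ambiguity in case 2 can be reproduced with a toy simulation. In the hypothetical sketch below, a donor population A contributes half the ancestry of two otherwise unrelated populations B and C and receives nothing in return, yet fitting A as a mixture of B and C still returns roughly 50/50, which is what one would also see if A really were a hybrid of B and C. The estimate is the least-squares projection of A's allele frequencies onto the line between B and C, a simplification of the ADMIXTURE likelihood rather than the program's own algorithm; the 50% gene-flow figure and drift settings are assumptions.

```python
import math
import random

def drift(p, F, rng):
    # Normal approximation to genetic drift, clamped away from 0 and 1.
    return min(0.99, max(0.01, rng.gauss(p, math.sqrt(F * p * (1 - p)))))

def two_source_fit(target, s1, s2):
    # Least-squares q minimizing sum((target - (q*s1 + (1-q)*s2))**2), clipped to [0, 1].
    num = sum((t - b) * (a - b) for t, a, b in zip(target, s1, s2))
    den = sum((a - b) ** 2 for a, b in zip(s1, s2))
    return min(1.0, max(0.0, num / den))

rng = random.Random(2)
root = [rng.uniform(0.1, 0.9) for _ in range(5000)]
anatolia = [drift(p, 0.1, rng) for p in root]  # donor population "A"
local_b = [drift(p, 0.1, rng) for p in root]   # unsampled pre-existing lineage of "B"
local_c = [drift(p, 0.1, rng) for p in root]   # unsampled pre-existing lineage of "C"

# B and C each receive 50% of their ancestry from A; A receives nothing.
sardinia = [0.5 * a + 0.5 * b for a, b in zip(anatolia, local_b)]
gujarat = [0.5 * a + 0.5 * c for a, c in zip(anatolia, local_c)]

q = two_source_fit(anatolia, sardinia, gujarat)
print(round(q, 2))  # roughly 0.5: the donor "looks" like an even mix of B and C
```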

3. Gene flow from an unsampled third group to both
Here it appears that some individuals from the light grey population have admixture from the dark grey one, and many individuals from the dark grey one have admixture from the light grey one. The two populations are actually Iranians and Ethiopians, and the observed pattern does not necessarily indicate the migration of Persians to Ethiopia or Ethiopians to Persia (although that might have taken place!), but is probably mediated by the geographically intermediate Arabians. Adding the Saudis (right), we obtain the following:
Notice how the "Iranian" component largely disappears from the Ethiopians, and is replaced by the component modal in Saudis. One could, indeed, extend the above, by adding even more groups that may be confounding results, e.g., South Asians in the case of Iranians, or Sub-Saharan Africans in the case of Ethiopians. That is why it's important to sample as broadly as possible, and to include populations bordering one's region of interest.
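The role of an unsampled intermediary can be mimicked by fitting a target's allele frequencies as a non-negative mixture of candidate components, first without and then with the intermediary. In this hypothetical simulation, Iranian-like and Ethiopian-like groups each receive gene flow from an Arabian-like group and none from each other; the component frequencies are assumed known, and a grid search over mixture weights stands in for ADMIXTURE's estimation. The names, drift settings, and the 20%/40% gene-flow figures are illustrative assumptions.

```python
import math
import random

def drift(p, F, rng):
    # Normal approximation to genetic drift, clamped away from 0 and 1.
    return min(0.99, max(0.01, rng.gauss(p, math.sqrt(F * p * (1 - p)))))

def simplex_points(k, n):
    # All k-tuples of non-negative integers that sum to n.
    if k == 1:
        yield (n,)
        return
    for i in range(n + 1):
        for rest in simplex_points(k - 1, n - i):
            yield (i,) + rest

def simplex_fit(target, sources, n=100):
    # Grid search (step 1/n) for non-negative weights summing to 1 that minimize
    # sum((target - sum_k w_k * source_k) ** 2). Precomputing the Gram matrix
    # makes each candidate cheap to score.
    k = len(sources)
    gram = [[sum(a * b for a, b in zip(sources[i], sources[j])) for j in range(k)]
            for i in range(k)]
    cross = [sum(a * t for a, t in zip(sources[i], target)) for i in range(k)]

    def score(w):  # squared error minus a constant that does not depend on w
        quad = sum(w[i] * w[j] * gram[i][j] for i in range(k) for j in range(k))
        return quad - 2 * sum(w[i] * cross[i] for i in range(k))

    return min(([c / n for c in pt] for pt in simplex_points(k, n)), key=score)

rng = random.Random(3)
root = [rng.uniform(0.1, 0.9) for _ in range(5000)]
iran0 = [drift(p, 0.1, rng) for p in root]      # unadmixed Iranian-like lineage
ethiopia0 = [drift(p, 0.1, rng) for p in root]  # unadmixed Ethiopian-like lineage
arabia = [drift(p, 0.1, rng) for p in root]     # Arabian-like intermediary

# Both sampled groups receive gene flow from Arabia and none from each other.
iranians = [0.8 * a + 0.2 * s for a, s in zip(iran0, arabia)]
ethiopians = [0.6 * e + 0.4 * s for e, s in zip(ethiopia0, arabia)]

without_saudis = simplex_fit(ethiopians, [iranians, ethiopia0])
with_saudis = simplex_fit(ethiopians, [iranians, ethiopia0, arabia])
print("Iranian weight in Ethiopians, without Saudis:", without_saudis[0])
print("Iranian weight in Ethiopians, with Saudis:", with_saudis[0])
```

Without the Arabian-like source, part of the Ethiopians' Arabian ancestry is credited to the "Iranian" component, because Iranians carry some of it too; once the intermediary is offered as a candidate, that weight moves to it.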

4. Common ancestry of the two groups without substantial gene flow after the initial divergence
Once again, it appears that there are two relatively "pure" groups and an admixed one, but, in reality, the three groups are Russians, Selkups, and Tongans. It is extremely unlikely that the Selkups from Central Siberia and the Tongans from the South Pacific experienced direct gene flow; the Tongans are believed to be a mix of East Asian-like and Papuan-like people who colonized the Pacific from Southeast Asia and Near Oceania, and any relationship that they have with the Selkups is due to deep common ancestry, rather than more recent gene flow.
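Case 4 can be illustrated with a toy tree in which no gene flow occurs after any split. In the hypothetical sketch below, an "east" lineage drifts away from the root and then splits into Selkup-like and East Asian-like branches, while a Tongan-like group is formed from the East Asian-like branch and a Papuan-like lineage. The average product of frequency deviations from the root is a simple shared-drift measure, computable here only because the simulated root is known. It comes out clearly positive for the Selkup-like and Tongan-like pair and close to zero for the Russian-like and Tongan-like pair, although the code never moves an allele between Selkups and Tongans. Branch lengths and mixing fractions are assumptions.

```python
import math
import random

def drift(p, F, rng):
    # Normal approximation to genetic drift, clamped away from 0 and 1.
    return min(0.99, max(0.01, rng.gauss(p, math.sqrt(F * p * (1 - p)))))

def shared_drift(x, y, root):
    # Mean product of two populations' frequency deviations from the root.
    # Positive values indicate drift the two lineages experienced in common.
    return sum((a - r) * (b - r) for a, b, r in zip(x, y, root)) / len(root)

rng = random.Random(4)
root = [rng.uniform(0.1, 0.9) for _ in range(5000)]
west = [drift(p, 0.05, rng) for p in root]        # Russian-like lineage
east = [drift(p, 0.05, rng) for p in root]        # ancestral eastern lineage (unsampled)
papuan = [drift(p, 0.1, rng) for p in root]       # Papuan-like lineage
selkup = [drift(p, 0.05, rng) for p in east]      # branches off "east"
east_asian = [drift(p, 0.05, rng) for p in east]  # also branches off "east"
# Tongan-like group: East Asian-like plus Papuan-like ancestry, no Selkup input at all.
tongan = [0.6 * a + 0.4 * p for a, p in zip(east_asian, papuan)]

print("Selkup-Tongan shared drift:", round(shared_drift(selkup, tongan, root), 4))
print("Russian-Tongan shared drift:", round(shared_drift(west, tongan, root), 4))
```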

Bonus: Lack of visible admixture is not lack of admixture
These three populations, except for three individuals, appear not to share any ancestral components. They are in fact Cambodians, Papuans, and Tongans, and the Tongan population did not appear out of thin air, but is actually derived from Southeast Asia and Near Oceania, from ancestors similar to Cambodians and Papuans.

A good way to see this is to reduce K to 2, which reveals that Tongans are predominantly Cambodian-like but differ from Cambodians by having some "Papuan" admixture.
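The effect of lowering K can be sketched with a tiny unsupervised fit. The code below is a plain EM algorithm for the same likelihood that ADMIXTURE maximizes (ADMIXTURE itself uses a faster block-relaxation optimizer), run at K=2 on simulated genotypes in which a Tongan-like group is 75% Cambodian-like and 25% Papuan-like. Group sizes, SNP counts, drift settings, the 75/25 split, and the seeding heuristic are all assumptions for illustration. At K=2 the admixed group should come out intermediate but mostly assigned to the component modal in the Cambodian-like group.

```python
import math
import random

def drift(p, F, rng):
    # Normal approximation to genetic drift, clamped away from 0 and 1.
    return min(0.99, max(0.01, rng.gauss(p, math.sqrt(F * p * (1 - p)))))

def admixture_em(genos, iters=50):
    """Plain EM for the K=2 admixture likelihood; genos[i][j] is 0, 1 or 2.
    Returns Q, where Q[i][c] is individual i's estimated fraction of component c."""
    n, m, k = len(genos), len(genos[0]), 2
    # Seeding heuristic (an assumption, not ADMIXTURE's): start the components at
    # individual 0 and at the individual whose genotypes differ most from it.
    far = max(range(n), key=lambda i: sum((a - b) ** 2 for a, b in zip(genos[0], genos[i])))
    P = [[min(0.95, max(0.05, g / 2)) for g in genos[s]] for s in (0, far)]
    Q = [[0.5, 0.5] for _ in range(n)]
    for _ in range(iters):
        q_acc = [[0.0] * k for _ in range(n)]
        ones = [[0.0] * m for _ in range(k)]   # expected counted alleles per component
        zeros = [[0.0] * m for _ in range(k)]  # expected other alleles per component
        for i in range(n):
            for j in range(m):
                g = genos[i][j]
                w1 = [Q[i][c] * P[c][j] for c in range(k)]
                w0 = [Q[i][c] * (1 - P[c][j]) for c in range(k)]
                s1, s0 = sum(w1), sum(w0)
                for c in range(k):
                    # E-step: split the g counted and 2-g other alleles among components.
                    x1, x0 = g * w1[c] / s1, (2 - g) * w0[c] / s0
                    q_acc[i][c] += x1 + x0
                    ones[c][j] += x1
                    zeros[c][j] += x0
        # M-step: update ancestry fractions and component allele frequencies.
        Q = [[v / (2 * m) for v in row] for row in q_acc]
        P = [[min(0.999, max(0.001, a / (a + b))) for a, b in zip(ones[c], zeros[c])]
             for c in range(k)]
    return Q

def mean_q(rows, c):
    return sum(r[c] for r in rows) / len(rows)

rng = random.Random(5)
root = [rng.uniform(0.1, 0.9) for _ in range(400)]
cambodian = [drift(p, 0.25, rng) for p in root]
papuan = [drift(p, 0.25, rng) for p in root]
tongan = [0.75 * c + 0.25 * p for c, p in zip(cambodian, papuan)]  # assumed mix

groups = [cambodian] * 10 + [papuan] * 10 + [tongan] * 10
genos = [[(rng.random() < f) + (rng.random() < f) for f in freqs] for freqs in groups]
Q = admixture_em(genos)

cam = 0 if mean_q(Q[:10], 0) > mean_q(Q[:10], 1) else 1  # column modal in Cambodians
for name, rows in (("Cambodian", Q[:10]), ("Papuan", Q[10:20]), ("Tongan", Q[20:])):
    print(name, round(mean_q(rows, cam), 2))
```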
Epilogue

I have used ADMIXTURE for months now, and I consider it one of the three best pieces of code a genome blogger may employ (the others being MCLUST, as used in the Galore approach, and, of course, the indispensable PLINK).

ADMIXTURE reveals common ancestral elements in populations, but the interpretation of these elements has to be done with caution:
  • Use common sense and background knowledge
  • Notice the Fst divergences between components, as these constrain their deep relationships
  • Experiment, experiment, experiment with your data
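For the second point, a small helper for pairwise Fst between component allele-frequency vectors (for example, columns of an ADMIXTURE `.P` file) may be useful. The sketch below uses Hudson's estimator in its "ratio of averages" form, computed directly from frequencies with no sample-size correction; this choice is an assumption for illustration, and the Fst divergences ADMIXTURE itself reports may be computed differently. The toy frequencies are made up, not real components.

```python
from itertools import combinations

def hudson_fst(p1, p2):
    """Hudson's Fst (ratio of averages) between two allele-frequency vectors:
    sum of (p1 - p2)^2 over sum of between-population heterozygosity."""
    num = sum((a - b) ** 2 for a, b in zip(p1, p2))
    den = sum(a * (1 - b) + b * (1 - a) for a, b in zip(p1, p2))
    return num / den

def fst_table(components):
    """Pairwise Fst for a dict mapping component name -> allele frequencies."""
    return {(x, y): hudson_fst(components[x], components[y])
            for x, y in combinations(sorted(components), 2)}

# Toy frequencies at four SNPs (illustrative only).
components = {
    "A": [0.10, 0.50, 0.90, 0.30],
    "B": [0.15, 0.45, 0.85, 0.35],
    "C": [0.80, 0.20, 0.10, 0.70],
}
for pair, value in fst_table(components).items():
    print(pair, round(value, 3))
```

Low values (here between A and B) suggest closely related components whose sharing may reflect common ancestry rather than admixture; high values (A or B versus C) suggest deep divergence.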
I invite readers to try their hand at interpreting the new results of the Dodecad v3 platform. Thanks to my ideas of using "zombies", converting unsupervised to supervised ADMIXTURE runs, and using framing populations, it is now possible to estimate ancestral components in a very large number of populations with the exact same measuring instrument.
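The "zombies" are not spelled out in this post. Assuming they are synthetic individuals whose genotypes are drawn from one inferred component's allele frequencies (one column of ADMIXTURE's `.P` output, which has one row per SNP and one column per component), a minimal generator might look like this. The function names and the in-memory text input are hypothetical; writing the result out in a PLINK-readable format is left aside.

```python
import random

def parse_p_file(text):
    # One line per SNP, one whitespace-separated allele frequency per component.
    return [[float(x) for x in line.split()] for line in text.splitlines() if line.strip()]

def make_zombies(p_rows, component, count, rng):
    # Each zombie's genotype at a SNP is drawn as Binomial(2, f), where f is the
    # chosen component's allele frequency at that SNP.
    freqs = [row[component] for row in p_rows]
    return [[(rng.random() < f) + (rng.random() < f) for f in freqs] for _ in range(count)]

example_p = """0.10 0.80
0.50 0.20
0.95 0.05
"""
rows = parse_p_file(example_p)
zombies = make_zombies(rows, component=0, count=1000, rng=random.Random(6))
# Mean genotype per SNP should be close to 2 * frequency (0.2, 1.0, 1.9).
means = [sum(z[j] for z in zombies) / len(zombies) for j in range(len(rows))]
print([round(x, 2) for x in means])
```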

I will probably extend this not only to the ~140 populations with the full set of markers used in Dodecad v3, but also to 100+ more with a smaller number of markers, as well as to all unrelated Project participants, encompassing thousands of individuals. So, I am looking forward to hearing people's theories on how to interpret the evidence, and, hopefully, the notes of caution in this and my previous posts will be helpful in doing so.

4 comments:

Unknown said...

I have a problem with scenario 2.

If Anatolia was the source for Gujarati and Sardinians, I would expect that the Gujarati and Sardinians would contain BOTH of the Anatolian components, assuming that Anatolia was not composed of two distinct and completely separate ethnicities.

A more likely scenario is that Sardinian is derived from a different population (e.g., a people that travelled north of the Black Sea). Some of this population made it to Anatolia. And the Gujarati are predominantly from another population, some of which spread to Anatolia.

Alternatively, a pair of the populations are related (e.g., Anatolian ancestral to Sardinian), and Anatolia later received a big admixture from the third population.

Dienekes said...

If Anatolia was the source for Gujarati and Sardinians, I would expect that the Gujarati and Sardinians would contain BOTH of the Anatolian components, assuming that Anatolia was not composed of two distinct and completely separate ethnicities.

There are no "Anatolian components" in example 2. The point of #2 is to show that a population that appears admixed from two other populations that appear pure is not necessarily so.

A more likely scenario is that Sardinian is derived from a different population (e.g., a people that travelled north of the Black Sea).

That scenario is not likely at all. According to my latest results, Sardinians appear to be "Mediterranean" to a substantial degree, and also "West European", a component which is closely related to the "West Asian" component, with the remainder made up of "West Asian", "Southwest Asian" and "Northwest African" components. They are in fact one of a few populations with absolutely 0% "Eastern European".

http://dodecad.blogspot.com/2011/06/dodecad-v3-population-averages.html

Fanty said...

Is such a scenario possible:

Populations A,B,C

B is a population that is 50% of A and 50% of C

But ADMIXTURE chose, for whatever reason, B as the anchor population.

It now suggests that A and C both have 50% admixture from the "B component".

This again leads to the false guess that A and C must be 50% equal, because they share 50% "B component". But in fact, they don't share anything at all, because each one's "B component" is their own DNA in the B population.

Dienekes said...

With K=2, that scenario is unlikely; B will most likely be seen as admixed, because the "benefit" (in terms of likelihood) of having a component anchored at B will be outweighed by the penalty of having A and C share another component at 50%.

At K=3, however, it is possible that A and C may have their own component and also admixture from a component anchored at B. In fact, this often happens when one uses populations that generate population-specific components (e.g. Druze or Kalash).

With v3 I have sought to avoid all such components, with the possible exception of the NW African that is strongly anchored on the Mozabites, but that was unavoidable due to a lack of populations with appropriate marker coverage.