I want to expand on a theme I touched upon briefly in a previous post: the importance of choosing appropriate parental populations in admixture analyses.
I will first show empirically the impact of this choice on the admixture proportions. Then I will deal with a special and difficult case: the Indian Cline.
The not so easy case of Mexican Mestizos
Mexican Mestizos are a tri-hybrid population composed of European, Native American, and West African elements. These elements began interbreeding only in the last half millennium or so, and, hence, the process occurred in historical time.
Consider a sample of 25 Mexicans from HapMap, with 25 Yoruba from HapMap, 25 Iberian Spanish from the 1000 Genomes Project, and 14 Pima from the HGDP as parental populations. We obtain for our Mexican sample:
- 59.7% European
- 36.9% "Native American"
- 3.4% African
- 49.9% European
- 47.3% "Native American"
- 2.8% African
- 70% "Native American"
- 29.7% European
- 0.4% African
The "Native American" component has increased again! The explanation is simple: as we exclude less admixed Native American groups, Mexicans appear (comparatively) more Native American. The "Native American pole" has shifted, and so has the relative position of populations between them.
In other terms, what is labeled "Native American" in the three experiments is not the same: in the first one it is anchored on the more unadmixed Pima, in the last one on the more admixed Mexicans.
A color analogy is apt: imagine you had white and black paint, and you wanted to achieve a medium grey hue: you could mix equal parts white and black (1/2 each) to achieve this. Now, imagine that instead of white paint, you had a light grey hue. You would now have to mix greater amounts of light grey (more than 1/2) to achieve the same medium hue.
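The arithmetic behind the paint analogy can be sketched in a few lines. This is only an illustration of two-component linear mixing, not the model ADMIXTURE fits; the function name `mixing_fraction` and the shade values (0 for black, 1 for white, 0.8 for light grey) are assumptions chosen for the example.

```python
def mixing_fraction(target, pole_a, pole_b):
    """Fraction of pole_a needed so that a blend of pole_a and pole_b hits target.

    Assumes simple linear mixing: target = f * pole_a + (1 - f) * pole_b.
    """
    if pole_a == pole_b:
        raise ValueError("the two poles must differ")
    return (target - pole_b) / (pole_a - pole_b)


# Hypothetical shade scale: 0.0 is black, 1.0 is white.
BLACK, WHITE = 0.0, 1.0
LIGHT_GREY = 0.8   # assumed value for the "light grey" paint
MEDIUM_GREY = 0.5  # the hue we want to reach

with_white = mixing_fraction(MEDIUM_GREY, WHITE, BLACK)
with_light_grey = mixing_fraction(MEDIUM_GREY, LIGHT_GREY, BLACK)

print(f"share of white needed:      {with_white:.3f}")
print(f"share of light grey needed: {with_light_grey:.3f}")
```

With a pure white pole, half of the mix is white; with the assumed light grey pole, the light component has to rise above one half to reach the same hue. In the same way, the apparent "Native American" share grows when the population anchoring that pole is itself admixed.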
- If you are going to study admixture, you'd better find unadmixed representatives of ancestral populations.
As we will now see, this is not always possible:
No unadmixed populations: the Indian Cline
What if the process of admixture had occurred for a thousand more years and all inhabitants of the New World had acquired a generous portion of European ancestry? We would then have no unadmixed native populations to use in the estimation of admixture proportions.
This is, in essence, the problem that Reich et al. (2009) had to deal with in the context of India. West Eurasian-like people have been arriving in the Indian subcontinent since at least Neolithic times and until quite recently. The caste system has served to barricade gene flow to some extent, but, nonetheless, the populations of India are, today, variable mixes of West Eurasians and indigenous Indians.
In their data, all Indian populations except the Andamanese Islanders showed evidence of the West Eurasian-like element (which the authors termed Ancestral North Indian, or ANI). Looking back to the Mexican example, the lack of unadmixed reference populations along the cline would inflate estimates of native ancestry.
To see whether this is the case, I took the 18 populations of the Indian Cline described by Reich et al. (2009) together with 25 Europeans from HapMap CEU and ran ADMIXTURE over the set. Below you can see the comparison between the "West Eurasian" component of ADMIXTURE and the Ancestral North Indian:
The cline is preserved in both representations, but the right column has smaller numbers than the left one, confirming our intuition about the use of admixed populations.
Below is a scatterplot of the two columns, with the regression equation on the chart:
The high R² value suggests that the two techniques are measuring the same underlying reality, but ADMIXTURE produces lower West Eurasian admixture (by about 38 percentage points) than the technique of Reich et al. (2009). Indeed, this is what we expect, as Reich et al. (2009) assign 38.8% ANI ancestry to the "most indigenous" group (the Mala) along the cline.
The position of populations along the cline is roughly the same, but the two sets of admixture proportions are shifted by about 38 percentage points with respect to each other.
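As a rough illustration of that shift, the sketch below converts an ADMIXTURE "West Eurasian" percentage into an approximate ANI percentage by adding a constant offset. It is not the regression equation from the chart, which is not reproduced here. The function name `approx_ani_percent` and the default offset of 38 points are assumptions taken from the approximate shift described above, and the result is only meaningful for populations within the range of the cline.

```python
def approx_ani_percent(west_eurasian_percent, offset_points=38.0):
    """Crude ANI estimate: ADMIXTURE 'West Eurasian' percentage plus a constant offset.

    The offset is an assumed round figure (about 38 percentage points),
    not a fitted regression coefficient.
    """
    if not 0.0 <= west_eurasian_percent <= 100.0:
        raise ValueError("percentage must lie between 0 and 100")
    # Cap at 100 so the crude shift never reports an impossible percentage.
    return min(100.0, west_eurasian_percent + offset_points)


# Hypothetical input: a group scoring 10% "West Eurasian" in ADMIXTURE.
example = approx_ani_percent(10.0)
print(f"approximate ANI: {example:.1f}%")
```

A group that ADMIXTURE places at the "South Asian" pole (0% West Eurasian) comes out at about 38% ANI under this shift, which is in line with the 38.8% that Reich et al. (2009) assign to the Mala.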
(Reich et al. (2009) removed 8 individuals from their dataset, as well as 7 Pathans and 14 Sindhis, as outliers. I followed the recommendations of Rosenberg with respect to the Pathans and Sindhis, using his H971 set, and kept all the Indian individuals of Reich et al. (2009). As can be seen, the slightly different datasets did not greatly affect the correlation between admixture proportions.)
Reich et al. (2009) were able to infer the existence of ANI ancestry even in the most "indigenous" of Indian populations by exploiting the simple structure of the problem, namely:
- Admixture occurred between only 2 ancestral groups
- The 2 groups were related to extant human populations that are not part of the cline: CEU and Adygei for ANI and Onge for ASI
- There was treelike evolution of all studied groups except for the ANI-ASI admixture event
It is a beautiful result, showing that there are cases where the extent of admixture can be inferred even in the absence of unadmixed populations representative of the groups involved.
Much more can be said on this issue, but let's summarize a couple of lessons:
- The full extent of an admixture cline can be captured only if unadmixed populations on either side of the cline exist. Use as many populations as possible to capture the full extent of an admixture cline.
- Use of an admixed population in lieu of an unadmixed native one inflates the inferred native component. Use native populations instead of admixed ones if possible.
- Even in the absence of unadmixed native populations, it is sometimes possible to reconstruct the admixture proportions as per Reich et al. (2009).
PS: The substantial correlation between the ANI-ASI proportions of Reich et al. (2009) and the "West Eurasian"-"South Asian" proportions in the K=2 ADMIXTURE analysis makes it possible to infer a person's ANI-ASI proportions from their ADMIXTURE results. Dodecad Project members of South Asian heritage should keep an eye on the Dodecad Project blog for that type of inference.