September 16, 2011

Latent admixture causes spurious serial founder effect

There is a series of interesting papers on Amerindian populations in the early view section of the American Journal of Physical Anthropology. One of them caught my interest, because it deals with an issue that has been a familiar topos of this blog, and has, in my opinion, much greater potential applicability than the settlement of the Americas.

The basic idea of the paper is the following: the serial founder effect (SFE) is a model in which populations expand by successive splits, with daughter populations colonizing new territories. It is a tree model, with the nodes furthest from the root representing late founder populations, and the ones closest to the root representing early splits close (geographically and temporally) to the initial colonization impetus.
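To make the SFE expectation concrete, here is a minimal Python sketch. It is not taken from the paper: it assumes each colonization step is a single bottleneck of `founders` diploid individuals, ignores mutation and gene flow, and uses the textbook drift recurrence for expected gene identity, J' = 1/(2N) + (1 - 1/(2N))J. The function name and all numbers are illustrative.

```python
def serial_founder_identity(j_root, founders, n_steps):
    """Expected within-population gene identity along a chain of founder events.

    j_root   -- gene identity in the source population (assumed value)
    founders -- diploid founders per colonization step (assumed constant)
    n_steps  -- number of successive founder events
    """
    # Probability that two gene copies descend from the same founder gene copy.
    drift = 1.0 / (2 * founders)
    identities = [j_root]
    for _ in range(n_steps):
        previous = identities[-1]
        identities.append(drift + (1.0 - drift) * previous)
    return identities


chain = serial_founder_identity(j_root=0.20, founders=50, n_steps=5)
for step, j in enumerate(chain):
    print(f"founder step {step}: expected gene identity = {j:.4f}")
```

Under these assumptions identity rises with every step away from the root, which is the pattern a pure SFE predicts with increasing distance from the point of origin.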

Gene identity is the probability that two alleles drawn at random, either from two individuals in the same population or from two individuals in different populations, are identical. It has been used to argue for an SFE in the Americas, because the pattern apparently matches expectations: the most basal populations are in North America, and gene identity increases toward South America.
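As a rough illustration of the definition, the Python sketch below estimates gene identity from lists of sampled alleles at one locus. The allele values are made up, and the function names are mine, not the paper's; a real analysis would average over many loci.

```python
from collections import Counter


def identity_within(alleles):
    """Probability that two alleles drawn without replacement from one sample match."""
    n = len(alleles)
    counts = Counter(alleles)
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))


def identity_between(alleles_a, alleles_b):
    """Probability that one allele from each of two samples match."""
    counts_a, counts_b = Counter(alleles_a), Counter(alleles_b)
    matches = sum(counts_a[allele] * counts_b[allele] for allele in counts_a)
    return matches / (len(alleles_a) * len(alleles_b))


# Hypothetical microsatellite repeat counts sampled from two populations.
pop_a = [10, 10, 10, 12]
pop_b = [10, 12, 12, 14]

print("within A:", identity_within(pop_a))
print("within B:", identity_within(pop_b))
print("between A and B:", identity_between(pop_a, pop_b))
```

In this toy sample, population A is less diverse than B and so has the higher within-population identity.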

However, the authors of the current paper show that the observed pattern is due to European admixture in Native American populations; this makes the North American populations (which are more European-admixed and hence more different from the rest) appear both more basal and more diverse.
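The mechanism can be sketched with allele frequencies. In the Python example below, the frequency vectors and admixture fractions are invented for illustration, and expected gene identity is taken as the sum of products of allele frequencies (large samples assumed). It shows only that, for these numbers, mixing in a divergent source lowers both within-population identity and identity with the unadmixed population.

```python
def expected_identity(p, q):
    """Expected gene identity between two allele-frequency vectors."""
    return sum(a * b for a, b in zip(p, q))


def admix(native, source, m):
    """Allele frequencies after a fraction m of ancestry comes from source."""
    return [(1 - m) * a + m * b for a, b in zip(native, source)]


# Hypothetical frequencies at one locus with three alleles.
native = [0.8, 0.2, 0.0]
european = [0.1, 0.3, 0.6]

results = {}
for m in (0.0, 0.1, 0.3):
    mixed = admix(native, european, m)
    within = expected_identity(mixed, mixed)
    with_unadmixed = expected_identity(mixed, native)
    results[m] = (within, with_unadmixed)
    print(f"m={m:.1f}  within={within:.4f}  with unadmixed={with_unadmixed:.4f}")
```

A more admixed population therefore looks both more diverse and more distant from its relatives, which is how it can end up near the root of a tree built from gene identities.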

From the paper:
Many aspects of the pattern of neutral genetic variation in the Americas are consistent with the predictions of the serial founder effects process. The NJ tree is rooted in northern North America, it shows a north-south pattern of internal branching, and gene identity within populations increases steadily with increasing geographic distance from Beringia. However, admixture with Europeans could account for all of these features. The tree is rooted in northern North America because the gene identities between the three northern North America populations and the other Native American populations are particularly low (Fig. 2). European admixture has contributed to this low identity, and, in principle, it could account for the position of the root. The Admixture tree (Fig. 5A) topology indicates that the north-south pattern of branching in the NJ tree might be the result of relatively high admixture in northern North America, intermediate levels in Central America and northern South America, and low levels in eastern South America. The partial correlation analyses show that the north-south increase in gene identity within populations can also be explained by geographically patterned admixture (Table 2). We conclude that geographically patterned admixture between Native Americans and Europeans has obscured our ability to reconstruct precontact evolutionary processes in the Americas.
Of course, this is an extremely important piece of work that future studies of Amerindian populations must take into account. It is no longer feasible to interpret the observed gene identity pattern in the Americas as a remnant of the migration and spread of Amerindian ancestors thousands of years ago. It is more likely a result of much more recent events, namely the differing intensity of European admixture in post-1492 times.
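The partial correlation logic mentioned in the quoted passage can be sketched as follows. This is not the authors' analysis or data: the six "populations" below are invented, and the code simply applies the standard first-order partial correlation formula to ask whether distance still predicts gene identity once admixture is held constant.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def partial_correlation(x, y, z):
    """Correlation of x and y after removing the linear effect of z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


# Hypothetical values for six populations ordered north to south.
distance = [1, 2, 3, 4, 5, 6]                       # arbitrary distance units
admixture = [0.30, 0.22, 0.20, 0.10, 0.08, 0.02]    # assumed European fraction
gene_identity = [0.56, 0.58, 0.59, 0.66, 0.67, 0.68]

raw = pearson(distance, gene_identity)
controlled = partial_correlation(distance, gene_identity, admixture)
print(f"raw correlation:     {raw:.3f}")
print(f"partial correlation: {controlled:.3f}")
```

In this constructed example the strong raw correlation between distance and identity largely disappears once admixture is controlled for, which is the kind of result that would point to admixture rather than founder effects.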

Hunley and Healy is important not only for the Americas, however. The serial founder effect has been invoked to explain both the spread of modern humans from east Africa and more recent Neolithic expansions in different parts of the world. We must now be alert to the possibility that these patterns may, in part, be the result of latent admixture.

In the Americas, we know (from historical documents) that this admixture took place, and we have relatively unadmixed populations still in existence. But, there may very well have been admixture events before the birth of history, and many ancestral populations may no longer exist in unadmixed form. So, we may be interpreting patterns of modern human variation as the result of ancient colonization processes, oblivious to the presence of latent admixture.

For example, there is an increase in gene identity from eastern Africa through Arabia, and India, all the way to Siberia, and southward across the Americas. Hunley and Healy deal with the latter part of this cline, but the whole of it has been interpreted as evidence of an orderly Out of Africa colonization as a series of founder effects.

However, the Eurasian portion of the pattern may also be spurious: current east Africans, for example, are partially admixed, both with West Eurasians and with people from other parts of the continent. Likewise, Arabians often have African admixture, whereas South Asians have been convincingly shown to be largely 2-way mixes of West Eurasians and "Ancestral South Indians". To top it all off, we now have convincing evidence that archaic admixture may have played a role in the evolution of some living Africans: this would further increase their gene diversity and contribute to a perceived Eurasian cline.

Tree models are orderly and well-behaved. It would be great if people behaved that way, because the math would be easier. But, people aren't laboratory mice that follow predefined paths in a maze: they mix with their neighbors, they split and move forward, but sometimes, they split and move backward. Hopefully, H&H's paper will lead to an increased appreciation of admixture in the human story, beyond the case of the Americas.

AJPA DOI: 10.1002/ajpa.21506

The Impact of Founder Effects, Gene Flow, and European Admixture on Native American Genetic Diversity

Keith Hunley and Meghan Healy

Abstract
Recent studies have concluded that the global pattern of neutral genetic diversity in humans reflects a series of founder effects and population movements associated with our recent expansion out of Africa. In contrast, regional studies tend to emphasize the significance of more complex patterns of colonization, gene flow, and secondary population movements in shaping patterns of diversity. Our objective in this study is to examine how founder effects, gene flow, and European admixture have molded patterns of neutral genetic diversity in the Americas. Our strategy is to test the fit of a serial founder effects process to the pattern of neutral autosomal genetic variation and to examine the contribution of gene flow and European admixture to departures from fit. The genetic data consist of 678 autosomal microsatellite loci assayed by Wang and colleagues in 530 individuals in 29 widely distributed Native American populations. We find that previous evidence for serial founder effects in the Americas may be driven in part by high levels of European admixture in northern North America, intermediate levels in Central America, and low levels in eastern South America. Geographically patterned admixture may also account for previously reported genetic differences between Andean and Amazonian groups. Though admixture has obscured the precise details of precontact evolutionary processes, we find that genetic diversity is still largely hierarchically structured and that gene flow between neighboring groups has had surprisingly little impact on macrogeographic patterns of genetic diversity in the Americas.

14 comments:

Andrés said...

Tree models are simple Aristotelian categories as seen through 19th-century classification systems. They may be valid in a macroevolutionary context, to show the divergence of different species. But I think they are unsuitable for the study of current evolution inside a single species. Admixture is always possible, and trees make it look as if populations just split and never touch each other again.

Andrew Oh-Willeke said...

I'm skeptical because there are quite a few uniparental genes on both the Y-DNA and mtDNA side that are found in Native American populations in North America, but not South America, that could not possibly be a result of European admixture because they are derivative of uniparental markers that are almost entirely absent from Europe and in particular from the parts of Europe that were the source of early European admixture with Native American populations.

It also raises the question of why there isn't more latent European admixture in the Pacific coastal region of South America, which had European colonial impacts long before they were present in North America.

It is certainly no surprise that Native Americans believed to be "pure blooded" in fact have some latent European admixture, but the evidence for a serial founder effect from an Asian sourced population is too robust to be overcome by this fact.

sykes.1 said...

Are the authors saying that European admixture into Canadian Indians is larger than in Mexican Indians? This is extremely hard to believe.

Anonymous said...

I'm with Andrew. Surely there is the mystery of the X2a mtDNA population (Kennewickians?) as well as the red-haired Spirit Cave corpses.

terryt said...

"there are quite a few uniparental genes on both the Y-DNA and mtDNA side that are found in Native American populations in North America, but not South America, that could not possibly be a result of European admixture because they are derivative of uniparental markers that are almost entirely absent from Europe"

I've recently become interested in American haplogroups through discussion with German, so I'm interested in what haplogroups those are. Can you elaborate please?

German Dziebel said...

@Terry

What Andrew wrote makes no sense. Overall, mtDNA N lineages are more frequent in NA than in SA. Out of them, hg X is exclusive to NA. MtDNA M lineages are more frequent in SA. There are no major mtDNA lineages that are only found in SA and not in NA, although there are SA-specific sublineages. As far as Y-DNA is concerned, both hgs Q and C are found in NA and in SA. Hg C is more frequent in NA, hg Q is more frequent in SA, but hg C IS present in SA. They just recently detected hg C in Ecuador: http://www.ncbi.nlm.nih.gov/pubmed/20932815

El Lurker said...

Andrew, Pacific coastal South America had millions of inhabitants. The natives there are less admixed than Mexicans because fewer Spaniards reached Peru: it was easier to emigrate to Mexico (it is easier to just cross the Atlantic than to cross the Atlantic, cross the Isthmus of Panama on a mule, and then go from Panama to Lima on a ship).

Also, the natives of Peru and Bolivia survived better for the reason mentioned above: there were fewer Spaniards to spread their diseases than in Mexico. The natives in Peru and Bolivia also lived in isolated villages in very high mountains, so there was less contact. In Mexico the geography was much more similar to Spain, with many mountains but nothing like the Andes.

Simply put, the tens of thousands of Spaniards who moved to Peru in colonial times could not genetically affect the millions of natives who lived in the former Inca empire.

In the USA you did not have any place with the population density of Mexico, Central America and the central Andes of Peru, Ecuador and Bolivia.

pos said...

I think this reasoning is wrong. Compare with this ADMIXTURE analysis: (http://www.nature.com/nature/journal/v463/n7282/fig_tab/nature08835_F3.html) Here you can see that what looks like "European" in e.g. the Na-Dene in the Structure analysis in the paper is actually primarily "East Asian". Even so, that does not mean that it is due to East Asian admixture. Structure/admixture does not magically find 'true' populations; see for example Engelhardt and Stephens' paper in PLoS Genetics. Genetic drift (e.g. serial founder effects) can cause false signs of admixture, such as the example of bottlenecked South American populations that characterise a certain 'cluster' just because they have such characteristic genetics. The effect of this on both PCA and Structure is the illusion of a gradient of ancestry that might just be a gradient in genetic similarity.

KLH said...

H&H do not refute the serial founder effects process in the Americas. Admixture has clouded the picture, but it has not completely erased evidence of other evolutionary processes. In fact, they show clearly that the pattern of gene-identity variation in the Americas is largely hierarchically patterned. Additionally, both authors have argued in favor of a serial founder effects process at the global level. Dienekes is right to urge caution in blindly assuming that the process is the only one that has shaped human genetic diversity. It is not. H&H have provided some insights into one of these processes.
One of the other posts is also right to point out that haploid genetic systems also provide some, albeit relatively weak, evidence for a serial founder effects process in the Americas. While haploid genetic data are relatively information-poor, they are required to sort out the details of the evolutionary process and its larger social and scientific meaning.
As to the pattern of admixture in the Americas, there is a large genetic and ethnographic literature supporting the admixture results in H&H (see also papers by Wang and colleagues, which inspired the analysis of H&H).
Lastly, we encourage the post citing Engelhardt and Stephens to read it and H&H more carefully. The latter clearly show that the discrete model underlying the STRUCTURE analysis is valid, and the former clearly explain why STRUCTURE is appropriate under this model.

terryt said...

"MtDNA M lineages are more frequent in SA".

And that is very interesting in itself. Haplogroups C and D have South America almost to themselves, so perhaps they were the first in. As I understand the situation we have D2, a member of the D4e1'5 haplogroup and C1 and C4c, from two separate subclades of the four within C. What is the geographic distribution of the American subclades within these groups? Related members of both haplogroups are widespread through South, East and Central Asia and are especially common in Northeast Asia, with more than 50% C in the Yukagir for example. D makes up a substantial proportion of the remainder. But neither haplogroup made it far into mainland SE Asia, and so failed to make it into New Guinea, Australia or out into the Pacific.

"Overall, mtDNA N lineages are more frequent in NA than in SA. Out of them, hg X is exclusive to NA".

Possibly a result of their being in the 'having been left behind' element of the population. Again the distribution of the American subclades would probably be revealing. American N haplogroups are just members of A2, one of the six subclades within A4, and B2, one of the two subclades within B4b. A is spread through Central Asia from the Volga east, via the Tibetan Plateau, to Northeast Asia although not as far north as C and D. Like the above haplogroups A made it to mainland SE Asia, but unlike them made it to the Philippines. But not beyond.

When we look at B we see an altogether different pattern. B is almost circum-Pacific with B4a1a1a in Polynesia and B2, a clade within B4'b'd'e, in America. But even then B failed to make much of an impression on New Guinea, and even less on Australia.

German Dziebel said...

"And that is very interesting in itself."

Yes, Terry, all the patterns you identified are very interesting and not easy to explain. The mistake people are making is to try to explain the heterogeneous nature of the Amerindian population as a result of either multiple migrations or a random collecting effort on the part of a wandering Eurasian population that cherry picked Amerindian haplogroups a la carte from a wider menu of available lineages. IMO, the best way to interpret the data is that Amerindian and Old World populations share a deep common origin but have been evolving separately from each other for a very long time. Different selective pressures drove New World diversity down, while Old World diversity up resulting in the random preservation of certain lineages in one region and other in another region. This created an illusion of Amerindians being a subset of non-Amerindians.

"But even then B failed to make much of an impression on New Guinea, and even less on Australia."

If we look at Y-DNA, hg C is found in the Americas, in Polynesia and in Australia. If we look at mtDNA, America and Polynesia have hg B and Australia has hg P, all from the same clade U.

"As I understand the situation we have D2"

No, we have D1 in America.

terryt said...

"all the patterns you identified are very interesting and not easy to explain".

I think they are very easy to explain.

"If we look at Y-DNA, hg C is found in the Americas, in Polynesia and in Australia".

Three different basal groups. In order: C3 (same as NE Asia), C2 (same as Southern Wallacea) and C4 (on its own). Therefore almost as ancient as the division between C and F.

"If we look at mtDNA, America and Polynesia have hg B and Australia has hg P, all from the same clade U".

No. All three are separate clades within R. None are more closely related to each other than is any other R-derived haplogroup.

"No, we have D1 in America".

Sorry. Can't remember where I got the idea that D2 was American. But according to Phylotree D1 is just one of 17 subclades of D4.

German Dziebel said...

"No. All three are separate clades within R. None are more closely related to each other than is any other R-derived haplogroup."

Yes, you confused D1 with D2 and I confused U and R.

Denise Neufeld said...

For some reason I thought mtDNA haplogroups A2, B2, C1, D1, X2a were found both in NA and SA according to page 3 of this report. They are all Indigenous people.

http://www.familytreedna.com/pdf/Fagundes-et-al.pdf