June 19, 2013

Native American origins from whole-genome and exome data (Gravel et al. 2013)

arXiv:1306.4021 [q-bio.PE]

Reconstructing Native American Migrations from Whole-genome and Whole-exome Data

Simon Gravel et al.

There is great scientific and popular interest in understanding the genetic history of populations in the Americas. We wish to understand when different regions of the continent were inhabited, where settlers came from, and how current inhabitants relate genetically to earlier populations. Recent studies unraveled parts of the genetic history of the continent using genotyping arrays and uniparental markers. The 1000 Genomes Project provides a unique opportunity for improving our understanding of population genetic history by providing over a hundred sequenced low coverage genomes and exomes from Colombian (CLM), Mexican-American (MXL), and Puerto Rican (PUR) populations. Here, we explore the genomic contributions of African, European, and especially Native American ancestry to these populations. Estimated Native American ancestry is 48% in MXL, 25% in CLM, and 13% in PUR. Native American ancestry in PUR appears most closely related to Equatorial-Tucanoan-speaking populations, supporting a Southern America ancestry of the Taino people of the Caribbean. We present new methods to estimate the allele frequencies in the Native American fraction of the populations, and model their distribution using a three-population demographic model. The ancestral populations to the three groups likely split in close succession: the most likely scenario, based on a peopling of the Americas 16 thousand years ago (kya), supports that the MXL Ancestors split 12.2kya, with a subsequent split of the ancestors to CLM and PUR 11.7kya. The model also features a Mexican population of 62,000, a Colombian population of 8,700, and a Puerto Rican population of 1,900. Modeling Identity-by-descent (IBD) and ancestry tract length, we show that post-contact populations also differ markedly in their effective sizes and migration patterns, with Puerto Rico showing the smallest size and the earlier migration from Europe.



  1. Or maybe they were completely different long before they got to the Americas; that's just way too little time for that kind of differentiation.

  2. So what does this imply about the wave theory, that the peopling of the Americas happened in 3 or more waves? This seems to say that it was just one population that then dispersed in rapid succession. That is what the genes say, but the languages say something different. Or am I overlooking or overanalyzing something here, i.e. is it just talking about what happened within one of those waves?

    The Na-Dene/Eskimo-Aleut/North Amerindian genetic relationship is nothing new. However, as has been pointed out, Na-Dene doesn't fit with them linguistically. Eskimo-Aleut is definitely distinct because it exists on both sides of the Bering Sea and is postulated to be related to Uralic or, at least, to have had some definite contact with it. There are "weird" instances of Eskimo-Aleut-like names in central and eastern Siberia. For example, Eskimo groups in Siberia call themselves Yupigyt, a term which means "authentic people" (from yuk "person"). “Yugra” (the origin of the word "Ugrian") or “yuk”-like names are found throughout the Eurasian taiga, including: Ural, Ircae, Ugor, Yugra, Uyghur, Uriankhai, Yurak, Yukaghir, the name of the Yeniseian Yugh, and Yakut.

    We know Uralic had an extensive range. Yeniseian has a clearly Uralic loanword for river, *-ses. Even though it is reconstructed at the proto-level for Yeniseian, it is clearly a word taken over from Uralic, as this word (or some variant of it) is widespread in Uralic. When the Yeniseians moved into the region they clearly encountered the Uralic speakers of the area. Based on that and the hydronyms, we know Uralic extended well into eastern Siberia. It should be mentioned that the Yukaghir people also live in the far east of Siberia, and their language is postulated to be directly related to Uralic or to have had some very extensive contact with it, much more so than Eskimo-Aleut. This demonstrates the vast range of Uralic in that region. So it should not be surprising that one would find a relationship, linguistically, between Eskimo-Aleut and Uralic: they clearly had some direct areal contact at some point in the past, if they don't have some sort of linguistic genetic tie.

    All of that being said, clearly the Eskimo-Aleuts came from Asia at a relatively recent date. Those who likely came from Asia in the first wave had no apparent contact with Uralic. Neither, apparently, did the Na-Dene people. That is strange, since Yeniseian did, IF it is true that Yeniseian and Na-Dene are related (something not clearly proven as of yet).

    What all this has to do with Y-DNA haplogroups Q and N, others can sort that out.

  3. A question for people better educated in this than me. Doesn't the very even spread of West African ancestry across Mexicans, in nearly every individual, suggest that the time of admixture is in the very distant past? True, it has been 500 years, so maybe 25 generations is enough since the first Africans we know of arrived. It just rubs me wrong though. The higher, uneven spikes in Colombia/Puerto Rico are easily attributed to recent mixing. But Mexico, with roughly the same measure right across the board, doesn't look right. Am I way out of it here? Maybe I just read too much Thor Heyerdahl as a kid.

  4. I absolutely laud the authors' efforts to study the indigenous segments in fig. 2b - although I feel more could and should be done in this regard.

    Taking that figure at face value (thus ignoring the fact that spurious non-Native contributions may have an effect), things look to me like what I have proposed before, namely, that there were two major (proto-Mongoloid) groups involved in the first migration from Beringia: a previously primarily coastal one (now Central Amerind) and an initially inland (or simply earlier) Beringian hunting group (now Chibchan). Both have some admixture with each other and a small amount with newcomers.

    Then, all later groups vary in their admixture with those two. E.g., Northern Amerind and Na-Dene are Mongoloids with primarily modest Central Amerind admixture, Inuits have rather little admixture, and Andeans are some advanced group of Na-Dene that got south earlier but of course also went through Central Amerind regions, and are much more highly admixed. The Tucanoans appear to have yet greater Chibchan (i.e., first southern population) admixture.

    Well, at least that is one interpretation.

    I also like their fig. 8 - very much inspired by similar error range diagrams used in fundamental particle physics and cosmology.

  5. Look at the ADMIXTURE analysis results carefully. The most Native American-descended Mexican individuals lack the Negroid admixture.

  6. Nice to see this.

    In case some of you hadn't seen the Genographic reference for those groups, look here

    We know different Arawak tribes were living in Western Venezuela. Most coastal areas from where Colombia is to 100-150 kilometres East of Coro were populated by Arawak nations, speaking a language similar to those in most of the central Caribbean Islands.

    There are Arawak groups further in the inner parts of South America.

    The Caribs were expanding from the eastern and central coast of Venezuela (and further up to Guyana) towards the islands at the time the Spaniards arrived, and the Arawaks feared them.

  7. Just to correct my earlier comment.

    Apparently, there are Uralic loans in Na-Dene, at least according to linguist Arnaud Fournet. He actually proposes a genetic relationship, though I would not go that far. He compares Eyak to Uralic.

  8. As you said, on linguistic grounds, it has been said that there were three migrations to the Americas: Amerind, Na Dené and Eskimo-Aleut migrations. However, I have some difficulties in accepting the single origin of all Amerind groups because of the considerable linguistic diversity of these languages.

    Judging by the Y-DNA, there could even be six arrivals: Q-M346 (upstream Q-L54), Q-L54 (upstream Q-M3 and Q-Z780), Q-M3 and Q-Z780 (parallel to M3), C3* and C3b. According to Wikipedia, small [migrating] groups had few founders, but they must have included men from these four Q lines. Q-M346 could have the same origin as the Saqqaq individual. A similar haplotype is still found in Koryak men. In the Zegura et al. paper they say that Native American haplogroup Q-M242 is typical of Chipewyans, who belong to the Na Dené language family. I now wonder if this Q-M242 is the same as Q-L54 above. However, this haplogroup could represent the same language family as the Yeniseian Q1a3-M346 migration in Altai. C3* seems to be found, mysteriously, in small numbers in the Andes, and C3b is present in Na Dené speakers. Originally, the languages spoken by Q-carrying men and C-carrying men must have been very different!

    According to the Malyarchuk paper, the coalescence age of Q1a3-M346 in South Siberia is only 4500 years, and it has been said that the Yeniseian Ket arrived quite recently at their current location, i.e. near the core area of Uralic languages. On the other hand, N1c should be quite recent in Beringia, because it did not reach America and is not diverse in this area. If so, this might explain why the contact with Uralic was so late. At a later stage, there were surely close contacts between these northeasternmost people in the arctic zone. I also remember having read that Eskimo-Aleut is the youngest language family in America.

    As we know, on the mtDNA side, there are five different haplogroups. I have seen people arguing that hg A is the original companion of Y-DNA Q. It is said that it has an East Asian origin, but I think that it could even have arrived in Asia with Y-DNA Q. Haplogroup B must have been in East Asia before Y-DNA Q, and must have been picked up by Q-carrying men somewhere in China. I have seen some people arguing for a more northern origin for haplogroup B than usually postulated, but it is quite rare in North East Asia. IMO, the origin of haplogroup C is not obvious. Wikipedia says that its origin is in Central Asia. Haplogroup C1 also has interesting small branches in India, the Near East and Europe. If hg C originated in Central Asia, it could have arrived with Y-DNA Q, but if so, then Y-DNA Q should be older than what is said on Wikipedia, where its age is estimated at only c. 17,000-22,000 years. mtDNA C may also have arrived with Y-DNA NO, which is clearly older, c. 34,600 years old. Haplogroup D could have been picked up by Q-carrying men in China, along with haplogroup B, if it did not arrive in America with Y-DNA C3. If I remember correctly, D1 has been found in ancient Japanese. As for the last Native American mtDNA haplogroup, I think that mtDNA X could, in fact, have arrived with Na Dené Q-M242.

    I do not have enough detailed data on the distribution of different Native American mtDNA subclades, so I have no idea what mtDNA could have arrived with Y-DNA C3* and C3b, but there are surely sub-haplogroups that fit the distribution patterns of C3 lineages.

    With all of the above, i.e. with groups arriving at different times from different locations and with a different input from women, it would be easier to understand, at least for me, the huge diversity of Native American languages.


