March 31, 2012

Iceman's sheep belonged to mtDNA haplogroup B

This establishes that the main mtDNA haplogroup (B) of extant European sheep was already present in the ~5.3ky-old sheep hair shafts of the Tyrolean Iceman's clothing. The fact that the precise sheep sequence (like the Iceman's own) has not been identified in modern sheep testifies to the important role that drift and/or selection have played in the recent evolution of the species.

PLoS ONE 7(3): e33792. doi:10.1371/journal.pone.0033792

Phylogenetic Position of a Copper Age Sheep (Ovis aries) Mitochondrial DNA

Abstract
Sheep (Ovis aries) were domesticated in the Fertile Crescent region about 9,000-8,000 years ago. Currently, few mitochondrial (mt) DNA studies are available on archaeological sheep. In particular, no data on archaeological European sheep are available.

Methodology/Principal Findings
Here we describe the first portion of mtDNA sequence of a Copper Age European sheep. DNA was extracted from hair shafts which were part of the clothes of the so-called Tyrolean Iceman or Otzi (5,350 - 5,100 years before present). Mitochondrial DNA (a total of 2,429 base pairs, encompassing a portion of the control region, tRNAPhe, a portion of the 12S rRNA gene, and the whole cytochrome B gene) was sequenced using a mixed sequencing procedure based on PCR amplification and 454 sequencing of pooled amplification products. We have compared the sequence with the corresponding sequence of 334 extant lineages.

A phylogenetic network based on a new cladistic notation for the mitochondrial diversity of domestic sheep shows that the Otzi's sheep falls within haplogroup B, thus demonstrating that sheep belonging to this haplogroup were already present in the Alps more than 5,000 years ago. On the other hand, the lineage of the Otzi's sheep is defined by two transitions (16147, and 16440) which, assembled together, define a motif that has not yet been identified in modern sheep populations.


Three quarters of Kerey clan men belong to Genghis Khan Y chromosome cluster

From the paper:
According to the historical data, the split between two sub-clans of the Kereys occurred about 20-22 generations ago (Khalidullin 2005). Estimation of divergence time (TD) of two groups of 15 STR haplotypes (except for DYS385a,b loci) found in the Kereys sub-clans demonstrates that TD value equal to 630 ± 190 years (or approximately 21 ± 6 generations) is resulted when a mean of per-locus, per-generation mutation rate of 0.0033 and a 30-year generation time are used. Note that similar value of mutation rate (0.00324) has been calculated as optimal for 15 STR haplotypes by Busby et al. (2011) who have investigated the question on how average squared distance (ASD) estimates change within haplotype sets when using different combinations of Y-chromosome STRs. This mutation rate belongs to a class of so called genealogical STR mutation rates revealed by direct observation in father/son pairs (Kayser et al. 2000; Goedbloed et al. 2009).
The correspondence between the split time of the Kerey sub-clans and the age estimate of their Y-STR divergence is quite interesting and provides an independent historical argument for the correspondence between the C3* star cluster and Genghis Khan (or at least his direct patrilineal kin). Note that the star cluster's age matches Genghis Khan only when a genealogical mutation rate is used, not the widely (mis)used "effective" mutation rate. The timeframe is recent enough to render any saturation effects from non-linearity (as described by Busby et al.) relatively unimportant.
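The quoted passage does not say which estimator produced TD = 630 ± 190 years. One standard choice for the divergence of two groups of STR haplotypes is Goldstein's (δμ)² distance: under a single-step stepwise mutation model its expected value per locus is 2μt after t generations, so t ≈ (δμ)²/(2μ). The sketch below assumes that estimator (it is not necessarily the one Abilev et al. used), plugs in the quoted μ = 0.0033 and 30-year generation time, and uses made-up two-locus haplotypes.

```python
from statistics import mean


def delta_mu_squared(group_a, group_b):
    """Goldstein's (delta mu)^2, averaged over loci.

    Each group is a list of haplotypes; each haplotype is a list of
    repeat counts given in the same locus order.
    """
    n_loci = len(group_a[0])
    per_locus = []
    for locus in range(n_loci):
        mean_a = mean(h[locus] for h in group_a)
        mean_b = mean(h[locus] for h in group_b)
        per_locus.append((mean_a - mean_b) ** 2)
    return mean(per_locus)


def divergence_time(group_a, group_b, mu=0.0033, years_per_generation=30):
    """Return (generations, years) since the two groups split.

    Assumes the single-step stepwise mutation model, under which the
    expected (delta mu)^2 grows as 2 * mu * t for t generations.
    """
    generations = delta_mu_squared(group_a, group_b) / (2 * mu)
    return generations, generations * years_per_generation


# Hypothetical haplotypes for two sub-clans (illustration only).
sub_clan_1 = [[13, 12], [13, 12], [13, 13]]
sub_clan_2 = [[14, 12], [14, 12], [14, 13]]
print(divergence_time(sub_clan_1, sub_clan_2))
```

Because t scales as 1/μ, substituting a slower "effective" rate inflates the age by the same factor, which is why the choice of rate decides whether the cluster's age lines up with Genghis Khan. As a consistency check on the quoted figures, 630/30 = 21 and 190/30 ≈ 6.3 generations.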


The data reported above, taken together with the known arguments in favor of the possible Genghis Khan's descent of Y-chromosome C3* star-cluster (Zerjal et al. 2003), allow us to suggest two hypotheses.
(1) The star-cluster is not directly related to the descendants of Genghis Khan, but rather is associated with the Kerait clan members. Mongol conquest with participation of the Keraits as special Khan's military forces allowed them to disseminate the Kerait-specific Y-chromosomes in the vast area inhabited by various peoples.
(2) Genghis Khan by himself belonged to the Keraits. This is supported by the following historical evidence (Man 2004; Khalidullin 2005). The Keraits inhabited the banks of the Onon River, where the camp of Genghis Khan's father Yesukhei was located. Yesukhei was declared as a blood brother of the Keraits' Khan Toghrul (Wang Khan). Toghrul then declared Genghis Khan his son-in-law. Fraternization of the Genghis Khan family with the Keraits' Khan suggests that a real blood relationship, though probably not approved officially, existed between them.

Human Biology: Vol. 84: Iss. 1, Article 4.

The Y-chromosome C3* star-cluster attributed to Genghis Khan's descendants is present at high frequency in the Kerey clan from Kazakhstan

Serikbai Abilev et al.

In order to verify the possibility that the Y-chromosome C3* star-cluster attributed to Genghis Khan and his patrilineal descendants is relatively frequent in the Kereys, who are the dominant clan in Kazakhstan and in Central Asia as a whole, polymorphism of the Y-chromosome was studied in Kazakhs, represented mostly by members of the Kerey clan. The Kereys showed the highest frequency (76.5%) of individuals carrying the Y-chromosome variant known as C3* star-cluster ascribed to the descendants of Genghis Khan. C3* star-cluster haplotypes were found in two sub-clans, Abakh-Kereys and Ashmaily-Kereys, diverged about 20-22 generations ago according to the historical data. Median network of the Kerey star-cluster haplotypes at 17 STR loci displays a bipartite structure, with two subclusters defined by the only difference at DYS448 locus. It is noteworthy that there is a strong correspondence of these subclusters with the Kerey sub-clans affiliation. The data obtained suggest that the Kerey clan appears to be the largest known clan in the world descending from a common Y-chromosome ancestor. Possible ways of Genghis Khan's relation to the Kereys are discussed.


March 28, 2012

A rare look at the Y chromosomes of Afghanistan

I often bemoan the fact that some of the regions of the world that are most interesting to the student of prehistory (e.g., Mesopotamia and the Iranian Plateau) also seem to be the ones with more than their fair share of political trouble, hindering efforts to study them with the newest set of tools. Afghanistan has certainly not been the most welcoming of places in recent decades.

The country is transitional between the Iranic-speaking world of Iran and the Indo-Aryan-speaking world of South Asia, as well as between the Indo-Iranian world and the (mostly) Turkic-speaking world of Central Asia. Hence, the absence of data for the country has been acutely felt by all those who are trying to understand "what happened" in Eurasia.

The appearance of a new paper by the Genographic Project is a welcome sight, and a good example of what is best about this Project. I haven't exactly been a fan of the Genographic Project's interpretation of their own data, but kudos to them for collecting the data in the first place.

From the paper:
Pashtuns are the largest ethnic group in Afghanistan, accounting for about 42 percent of the population, with Tajiks (27%), Hazaras (9%), Uzbeks (9%), Aimaqs (4%), Turkmen people (3%), Baluch (2%), and other groups (4%) making up the remainder [6]. In the present study, eight ethnic groups were examined, with a focus on the largest four groups:
  • The Pashtuns, traditionally lived a seminomadic lifestyle, they reside mainly in southern and eastern Afghanistan and in western Pakistan. They speak Pashto which is a member of the Eastern Iranian languages.
  • The Tajiks are a Persian-speaking ethnic group which are closely related to the Persians of Iran. In Afghanistan, they are the largest Tajik population outside their homeland to the north in Tajikistan.
  • The Hazara population speaks Persian with some Mongolian words. They believe they are descendants of Genghis Khan's army that invaded during the twelfth century.
  • The Uzbeks are a Turkic speaking group that have been living a sedentary farming lifestyle in Northern Afghanistan.
The main features of the Y-chromosome gene pool:
Genotyping revealed 32 haplogroups present in Afghanistan's ethnic groups among our samples. Haplogroups R1a1a-M17, C3-M217, J2-M172, and L-M20 were the most frequent when Afghan ethnic groups were pooled, together comprising >66% of the chromosomes. Absolute and relative haplogroup frequencies are tabulated in Table S4.
The PCA (left) wonderfully showcases the correspondence between different haplogroups and the three main regions of the Near East (green), South Asia (yellow), and Central Asia (purple).
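A PCA of this kind can be run directly on a table of haplogroup frequencies: centre each haplogroup column across populations and take the singular value decomposition. This is a minimal sketch assuming frequencies (not counts) as input and a small made-up table; it is not necessarily the exact procedure Haber et al. used.

```python
import numpy as np


def pca_scores(freqs, n_components=2):
    """PCA of a populations x haplogroups frequency table.

    Columns are centred, then decomposed by SVD; returns the population
    scores on the leading components and the share of variance each explains.
    """
    centred = freqs - freqs.mean(axis=0)
    u, s, _ = np.linalg.svd(centred, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = s ** 2 / np.sum(s ** 2)
    return scores, explained[:n_components]


# Hypothetical frequencies of three haplogroups in four populations.
table = np.array([
    [0.6, 0.3, 0.1],
    [0.5, 0.4, 0.1],
    [0.1, 0.3, 0.6],
    [0.2, 0.2, 0.6],
])
scores, explained = pca_scores(table)
print(scores.round(3), explained.round(3))
```

The sign of each component is arbitrary, so only the relative placement of populations (which ones fall on the same side of PC1) carries meaning.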

It is a real shame that the newer markers available within the most prominent R-M17 haplogroup were not tested:
The prevailing Y-chromosome lineage in Pashtun and Tajik (R1a1a-M17), has the highest observed diversity among populations of the Indus Valley [46]. R1a1a-M17 diversity declines toward the Pontic-Caspian steppe where the mid-Holocene R1a1a7-M458 sublineage is dominant [46]. R1a1a7-M458 was absent in Afghanistan, suggesting that R1a1a-M17 does not support, as previously thought [47], expansions from the Pontic Steppe [3], bringing the Indo-European languages to Central Asia and India.
Nonetheless, I can't really disagree with the dismissal of the R-M17/Indo-European theory. R-M17 is simply too populous in South Asia to be the genetic legacy of "Indo-Europeans": (i) under an elite-dominance model, its frequency is way too high (compared to well-attested examples of elite dominance, e.g., Hungary or Turkey, where the genetic legacy of the elite element is in the minority); (ii) under a folk migration model, it is difficult to understand why a hypothetical migrating Indo-European people would have such an overwhelming influence in the region while hardly influencing at all the other densely occupied agricultural landscapes of the Eurasian steppe periphery. Moreover, no autosomal signal corresponding to a migration from eastern Europe to South Asia really exists (the main cline of variation links South with West Asia, not Europe), and the small signal that does exist does not really correspond to observed levels of R-M17.

From the paper:
The E1b1b1-M35 lineages in some Pakistani Pashtun were previously traced to a Greek origin brought by Alexander's invasions [48]. However, RM network of E1b1b1-M35 found that Afghanistan's lineages are correlated with Middle Easterners and Iranians but not with populations from the Balkans.
Greek populations are not homogeneous in their haplogroup E frequencies, so the absence of Balkan-related E1b1b1-M35 lineages in Afghanistan need not reflect a complete lack of Greek influence in the region; it could instead reflect influence from a structured ancient Greek population, i.e., from Greeks in whom this otherwise frequent Southeastern European haplogroup was rare.

Looking at the Y-haplogroup composition:

A few points of interest:

  • The clear link between C/N/O and Central Asia
  • A clear difference between Persian and Pashto speakers, with inverse J2a/R1a frequencies
  • The paucity of J1 chromosomes (only 1 Tajik) testifies to the absence of relatively recent Middle Eastern influences associated with the spread of Islam, consistent with the absence of the autosomal "Southwest Asian" component in South/Central Asia.
  • Paucity of R1b, except in a couple of Uzbeks and a Tajik; I have argued before that R1a had an early distribution in the arc of flatlands north and east of the Caspian, while R1b had a complementary distribution in the smaller arc of highlands west and south of it, out of which the Tocharians may have originated.
  • The small Nurestani sample comprises J2a, R1a, and R2; the Nurestanis are linguistic relatives of the Kalash of Pakistan but, unlike them, were converted to Islam in the 19th century.
I would say that the evidence is pretty clear that the earliest Iranians may have included haplogroups R1a and J2, although I would not wager on their relative proportions or their overall contribution to modern Iranian-speaking populations. For whatever reason, it seems that Kurds and Persians ended up with a J2-over-R1a advantage, while Pathans and (plausibly) the Turkified former Iranian speakers of Central Asia ended up with the reverse. Nonetheless, the occurrence of both haplogroups in most Iranian groups, as well as in most Indo-Aryan ones, is quite telling. It is unfortunate that the relationships between these Y chromosomes (still J2a*! six years after Sengupta et al.) and their West Eurasian brethren were not pursued further.

Hopefully, the data can be re-used down the road once the phylogeny of different haplogroups (and R1a in particular) is better understood. As I've stated before on this blog, I take Y-STR based age estimates with a huge grain of salt, so I would not put much faith in any of the ones presented in this paper.

Related: Firasat et al. (2006), Y-chromosomes of Afghanistan, Lashgary et al. (2011), Regueiro et al. (2006).

PLoS ONE doi:10.1371/journal.pone.0034288

Afghanistan's Ethnic Groups Share a Y-Chromosomal Heritage Structured by Historical Events

Marc Haber et al.


Afghanistan has held a strategic position throughout history. It has been inhabited since the Paleolithic and later became a crossroad for expanding civilizations and empires. Afghanistan's location, history, and diverse ethnic groups present a unique opportunity to explore how nations and ethnic groups emerged, and how major cultural evolutions and technological developments in human history have influenced modern population structures. In this study we have analyzed, for the first time, the four major ethnic groups in present-day Afghanistan: Hazara, Pashtun, Tajik, and Uzbek, using 52 binary markers and 19 short tandem repeats on the non-recombinant segment of the Y-chromosome. A total of 204 Afghan samples were investigated along with more than 8,500 samples from surrounding populations important to Afghanistan's history through migrations and conquests, including Iranians, Greeks, Indians, Middle Easterners, East Europeans, and East Asians. Our results suggest that all current Afghans largely share a heritage derived from a common unstructured ancestral population that could have emerged during the Neolithic revolution and the formation of the first farming communities. Our results also indicate that inter-Afghan differentiation started during the Bronze Age, probably driven by the formation of the first civilizations in the region. Later migrations and invasions into the region have been assimilated differentially among the ethnic groups, increasing inter-population genetic differences, and giving the Afghans a unique genetic diversity in Central Asia.


High mtDNA mutation rate from deep-rooted Costa Rican pedigrees

American Journal of Physical Anthropology DOI: 10.1002/ajpa.22052

High mitochondrial mutation rates estimated from deep-rooting Costa Rican pedigrees

Lorena Madrigal et al.


Estimates of mutation rates for the noncoding hypervariable Region I (HVR-I) of mitochondrial DNA vary widely, depending on whether they are inferred from phylogenies (assuming that molecular evolution is clock-like) or directly from pedigrees. All pedigree-based studies so far were conducted on populations of European origin. In this article, we analyzed 19 deep-rooting pedigrees in a population of mixed origin in Costa Rica. We calculated two estimates of the HVR-I mutation rate, one considering all apparent mutations, and one disregarding changes at sites known to be mutational hot spots and eliminating genealogy branches which might be suspected to include errors, or unrecognized adoptions along the female lines. At the end of this procedure, we still observed a mutation rate equal to 1.24 × 10⁻⁶ per site per year, i.e., at least threefold as high as estimates derived from phylogenies. Our results confirm that mutation rates observed in pedigrees are much higher than estimated assuming a neutral model of long-term HVRI evolution. We argue that until the cause of these discrepancies will be fully understood, both lower estimates (i.e., those derived from phylogenetic comparisons) and higher, direct estimates such as those obtained in this study, should be considered when modeling evolutionary and demographic processes.
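A direct (pedigree) rate is, in essence, the observed number of mutations divided by the opportunity for mutation: the number of HVR-I sites examined times the total number of years spanned by the mother-to-child transmissions compared. The sketch below uses hypothetical inputs, not the Costa Rican data, and ignores the hot-spot and error filtering the authors describe. The last line shows the ceiling that "at least threefold" implies for the phylogenetic rates being compared.

```python
def pedigree_rate(mutations, sites, transmission_years):
    """Mutations per site per year from a pedigree study.

    transmission_years is the summed length (in years) of all maternal
    transmissions compared, so sites * transmission_years counts the
    site-years in which a mutation could have occurred.
    """
    return mutations / (sites * transmission_years)


def fold_difference(direct_rate, phylogenetic_rate):
    """How many times faster the direct estimate is."""
    return direct_rate / phylogenetic_rate


# Hypothetical numbers, for illustration only.
rate = pedigree_rate(mutations=3, sites=300, transmission_years=10_000)
print(rate)

# Largest phylogenetic rate consistent with 1.24e-6 being "at least threefold" higher.
print(1.24e-6 / 3)
```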


Improved eigenanalysis with Minimum Average Partial test (Shriner 2012)

I had mentioned a previous article by the same author on the topic of how many PCA dimensions to retain. I had identified this problem in the context of my "Clusters Galore" analysis, and the topic has recently resurfaced in the Lawson and Falush preprint, which unfortunately appeared at the same time as this new paper.

Personally, I have tried three methods for choosing the number of principal components to retain:

  1. Tracy-Widom, which seems to retain more dimensions than necessary, resulting in reduced clustering quality.
  2. A test of normality (such as Shapiro-Wilk), which tends to identify a smaller number of dimensions along which the data appear not to be normally distributed and hence may contain useful information about population structure.
  3. A more pragmatic approach: picking the number of components to retain that maximizes the number of clusters inferred by MCLUST.
It would be great if the minimum average partial test could be incorporated into future versions of EIGENSOFT.
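The normality-based approach (the second method in the list above) can be sketched without any statistics library if the Jarque-Bera test is swapped in for Shapiro-Wilk: the JB statistic combines sample skewness and kurtosis and is asymptotically χ² with 2 degrees of freedom, whose tail probability has the closed form exp(-x/2). This is a substitute test, not Shapiro-Wilk, and the stopping rule (count leading PCs until the first one that looks normal) is an assumption made for illustration.

```python
import math


def jarque_bera_pvalue(values):
    """Asymptotic Jarque-Bera p-value for normality of a 1-D sample."""
    n = len(values)
    mean = sum(values) / n
    centred = [v - mean for v in values]
    m2 = sum(c ** 2 for c in centred) / n
    m3 = sum(c ** 3 for c in centred) / n
    m4 = sum(c ** 4 for c in centred) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)
    # Chi-square with 2 df has survival function exp(-x / 2).
    return math.exp(-jb / 2)


def count_structured_pcs(pc_scores, alpha=0.05):
    """Count leading PCs whose per-individual scores reject normality.

    pc_scores is a list of score lists, ordered PC1, PC2, ...; counting
    stops at the first PC that does not reject normality.
    """
    count = 0
    for scores in pc_scores:
        if jarque_bera_pvalue(scores) >= alpha:
            break
        count += 1
    return count
```

In use, each list would hold the individuals' scores on one principal component; a bimodal PC (two populations) rejects normality, while a PC made of noise usually does not.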

Human Heredity Vol. 73, No. 2, 2012

Improved Eigenanalysis of Discrete Subpopulations and Admixture Using the Minimum Average Partial Test

Daniel Shriner

Abstract Principal components analysis of genetic data has benefited from advances in random matrix theory. The Tracy-Widom distribution has been identified as the limiting distribution of the lead eigenvalue, enabling formal hypothesis testing of population structure. Additionally, a phase change exists between small and large eigenvalues, such that population divergence below a threshold of FST is impossible to detect and above which it is always detectable. I show that the plug-in estimate of the effective number of markers in the EIGENSOFT software often exceeds the rank of the sample covariance matrix, leading to a systematic overestimation of the number of significant principal components. I describe an alternative plug-in estimate that eliminates the problem. This improvement is not just an asymptotic result but is directly applicable to finite samples. The minimum average partial test, based on minimizing the average squared partial correlation between individuals, can detect population structure at smaller FST values than the corrected test. The minimum average partial test is applicable to both unadmixed and admixed samples, with arbitrary numbers of discrete subpopulations or parental populations, respectively. Application of the minimum average partial test to the 11 HapMap Phase III samples, comprising 8 unadmixed samples and 3 admixed samples, revealed 13 significant principal components.
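The minimum average partial criterion is Velicer's: for m = 0, 1, 2, … remove the first m principal components from the correlation matrix, rescale the residual to partial correlations, and pick the m at which the average squared off-diagonal partial correlation is smallest. This is a minimal numpy sketch of that generic criterion, assuming a plain Pearson correlation between columns; Shriner correlates individuals across markers, and his exact data normalization may differ. The simulated data are hypothetical.

```python
import numpy as np


def map_test(data):
    """Velicer's minimum average partial (MAP) test.

    data: 2-D array whose columns are the variables being correlated
    (individuals, in Shriner's setting) and whose rows are observations
    (markers). Returns the m that minimises the average squared partial
    correlation, plus the whole curve for m = 0 .. p-2.
    """
    r = np.corrcoef(data, rowvar=False)
    p = r.shape[0]
    eigvals, eigvecs = np.linalg.eigh(r)
    order = np.argsort(eigvals)[::-1]              # largest eigenvalue first
    loadings = eigvecs[:, order] * np.sqrt(np.clip(eigvals[order], 0, None))
    off_diag = ~np.eye(p, dtype=bool)
    curve = [np.mean(r[off_diag] ** 2)]            # m = 0: nothing partialled out
    for m in range(1, p - 1):
        a = loadings[:, :m]
        residual = r - a @ a.T                     # covariance left after m components
        d = np.sqrt(np.diag(residual))
        partial = residual / np.outer(d, d)        # partial correlation matrix
        curve.append(np.mean(partial[off_diag] ** 2))
    return int(np.argmin(curve)), curve


# Simulated data: one latent factor shared by six variables.
rng = np.random.default_rng(0)
factor = rng.standard_normal(500)
data = 0.8 * factor[:, None] + 0.6 * rng.standard_normal((500, 6))
n_components, curve = map_test(data)
print(n_components, np.round(curve, 3))
```

The curve stops at m = p - 2 because, once only one component remains in the residual, every partial correlation is ±1 and the criterion carries no information.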


mtDNA links between Africa and Europe, old and new (Cerezo et al. 2012)

Link to open access supplementary material.

Genome Research

Reconstructing ancient mitochondrial DNA links between Africa and Europe

María Cerezo et al.


Mitochondrial DNA (mtDNA) lineages of macro-haplogroup L (excluding the derived L3 branches M and N) represent the majority of the typical sub-Saharan mtDNA variability. In Europe, these mtDNAs account for less than 1% of the total but, when analyzed at the level of control region, they show no signals of having evolved within the European continent, an observation that is compatible with a recent arrival from the African continent. To further evaluate this issue, we analyzed 69 mitochondrial genomes belonging to various L sublineages from a wide range of European populations. Phylogeographic analyses showed that ∼65% of the European L lineages most likely arrived in rather recent historical times, including the Romanization period, the Arab conquest of the Iberian Peninsula and Sicily, and during the period of the Atlantic slave trade. However, the remaining 35% of L mtDNAs form European-specific subclades, revealing that there was gene flow from sub-Saharan Africa toward Europe as early as 11,000 yr ago.


March 27, 2012

Cranial variation and the transition to agriculture in Europe

Students of physical anthropology won't be surprised that Pinhasi and von Cramon-Taubadel find that the Neolithic and pre-Neolithic populations of Europe were differentiated cranially, just as they apparently were genetically.

It has long been recognized that the Neolithic population of Europe was different from the Upper Paleolithic population of the continent. Carleton Coon ascribed this differentiation to the migration of narrow-faced Mediterraneans into the territory of the robust, broad-faced Upper Paleolithics. Ilse Schwidetzky likewise envisaged a migration from the Southeast of gracile Mediterraneans who gradually replaced the broad-faced Cro-Magnoids.

So, it is nice to read that the re-analysis of a wide assortment of skulls on 15 cranial variables has revealed that:
The major shape differences separating hunter-gatherer Mesolithic populations and farming Neolithic populations are coded by PC1 with Neolithic specimens having longer and taller vaults, and Mesolithic specimens having larger, and broader faces.
There are two (or three) puzzles in European prehistory:
  • How the robust, low-skulled, broad-faced hunter-gatherers became more high-skulled, narrow-faced and gracile
  • How the latter became brachycephalized until early modern times
  • Why they have become partially debrachycephalized in the most recent of times
Anthropologists have tended to favor either migration or adaptation to explain these trends, with some even suggesting simple phenotypic plasticity without any major genetic change. It is now clear that, whatever the role of adaptation or plasticity, the Upper Paleolithic population of Europe did not simply become more gracile on its own, but was affected by an already gracile population of foreign origin who set the ball rolling. There is already work on the genetic basis of facial structure, so it is quite possible that we will eventually be able to track directly the genetic changes underlying the phenotypic transformation of Europeans.

From the paper:
Nonetheless, the craniometric analysis allows us to discern certain patterns. For example, the ‘Forest Neolithic’ specimens are clearly much more similar to other Mesolithic hunter-gatherers than to Neolithic farmers in terms of their craniometric shape, suggesting a large degree of cultural diffusion in this region. However, it is also evident that the earliest potential colonisers of southeast and central Europe are very similar to the Anatolian Çatal Höyük population, congruent with an initial demic diffusion from the Near East/Anatolia.
The "Forest Neolithic" included pottery-using groups of eastern Europe (hence Neolithic, since pottery is one of the hallmarks of that period), but should not be confused with the early agriculturalists who apparently practiced farming without pottery early on in the Near East and Greece, and then acquired pottery and expanded with it into the rest of Europe, together with their full "package" of domesticated crops and animals.

Human Biology vol. 84

Cranial variation and the transition to agriculture in Europe

Ron Pinhasi, Noreen Von Cramon-Taubadel


Debates surrounding the nature of the Neolithic demographic transition in Europe have historically centred on two opposing models; a 'demic' diffusion model whereby incoming farmers from the Near East and Anatolia effectively replaced or completely assimilated indigenous Mesolithic foraging communities and an 'indigenist' model resting on the assumption that ideas relating to agriculture and animal domestication diffused from the Near East, but with little or no gene flow. The extreme versions of these dichotomous models have been heavily contested primarily on the basis of archaeological and modern genetic data. However, in recent years there has been a growing acceptance of the likelihood that both processes were ongoing throughout the Neolithic transition and that a more complex, regional approach is required to fully understand the change from a foraging to a primarily agricultural mode of subsistence in Europe. Craniometric data have been particularly useful for testing these more complex scenarios, as they can reliably be employed as a proxy for the genetic relationships amongst Mesolithic and Neolithic populations. In contrast, modern genetic data assume that modern European populations accurately reflect the genetic structure of Europe at the time of the Neolithic transition, while ancient DNA data are still not geographically or temporally detailed enough to test continent-wide processes. Here, with particular emphasis on the role of craniometric analyses, we review the current state of knowledge regarding the cultural and biological nature of the Neolithic transition in Europe.


March 26, 2012

Similarity matrices and clustering (Lawson and Falush)

Lawson and Falush have a new review paper on different clustering methods using haplotype data such as their own ChromoPainter/fineSTRUCTURE methodology, as well as the MCLUST/fastIBD methods that I started playing with a while back.

I won't have much time over the next few days to review this new work comprehensively, but I will add one data point to the discussion by pointing to my ChromoPainter and fastIBD analyses of the same dataset. I will add any further comments to this blog post once I get the opportunity to read the paper.

Another point that needs to be made is how commendable the ChromoPainter folks' attitude towards the topic has been. Not only did they post their ChromoPainter preprint and software online months before their original paper was published, but they quickly took up my comments and suggestions on their paper to write this new review paper, making it available as a preprint itself. I'm guessing this saved a year or two over what would have been possible if all the formalities of "traditional" publishing had been observed. It's also a very nice example of the synergy between professional and amateur science that the Internet and social media have made possible.

Similarity matrices and clustering algorithms for population identification using genetic data

Daniel John Lawson and Daniel Falush


A large number of algorithms have been developed to identify population structure from genetic data. Recent results show that the information used by both model-based clustering methods and Principal Components Analysis can be summarised by a matrix of pairwise similarity measures between individuals. Similarity matrices have been constructed in a number of ways, usually treating markers as independent but differing in the weighting given to polymorphisms of different frequencies. Additionally, methods are now being developed that better exploit the power of genome data by taking linkage into account. We review several such matrices and evaluate their ‘information content’. A two-stage approach for population identification is to first construct a similarity matrix, and then perform clustering. We review a range of common clustering algorithms, and evaluate their performance through a simulation study. The clustering step can be performed either directly, or after using a dimension reduction technique such as Principal Components Analysis, which we find substantially improves the performance of most algorithms. Based on these results, we describe the population structure signal contained in each similarity matrix, finding that accounting for linkage leads to significant improvements for sequence data. We also perform a comparison on real data, where we find that population genetics models outperform generic clustering approaches, particularly in regards to robustness against features such as relatedness between individuals.


March 24, 2012

Report on the symposium on Modern Human Genetic Variation

Joshua Akey summarizes the talks of a recent symposium at the Swedish Royal Academy of Sciences. Two bits of information stand out from his report. The first:

In another talk focused on demography, Mattias Jakobsson (Uppsala University, Sweden) presented novel data on the impact of the agricultural revolution on the genetics of contemporary European populations. Specifically, Jakobsson and colleagues obtained nearly 250 Mb of sequence from three 5,000-year-old remains of Neolithic hunter-gatherers and one Neolithic farmer excavated in Scandinavia. Analysis of these sequences in the context of the present day European gene pool suggests that the spread of agriculture involved the northward migrations of farmers. Thus, these data provide the most direct and compelling support for the demic diffusion model of agriculture (as opposed to cultural diffusion) described to date. 

It seems I have my answer to the what's next question. Jakobsson has been doing some interesting work on the demography of human emergence and dispersal, so it will be interesting to see not only the novel sequences from these Neolithic Scandinavians, but also how they fit into existing models of demic diffusion.

The second bit of information:

Similarly, Jeff Wall (University of California San Francisco, USA) described a novel method for inferring archaic admixture, which he applied to publicly available whole-genome sequence data generated by Complete Genomics. Provocatively, he finds higher rates of introgression in Asians compared to Europeans. An advantage of Wall’s method is that it does not require an archaic genome to infer introgression, and thus he was able to also test the hypothesis that contemporary African genomes have signatures of gene flow with archaic human ancestors. Strikingly, Wall indeed did find evidence of archaic admixture in African genomes, suggesting that modest amounts of gene flow were widespread throughout time and space during the evolution of anatomically modern humans.

I guess that I shouldn't throw explanation #1 out the window yet. Wall was involved in the recent paper on archaic African admixture, which only looked at a small subset of the genome, so it is nice to see that he is now working with full genomes, and that the race to data mine complete genomes for archaic admixture is afoot.

The book of abstracts is online at the symposium site. The Jakobsson paper does seem to agree with our emerging picture of a non-local origin of northern European farmers as well as greater survival of pre-farming populations in the northern periphery of Europe, but it will be interesting to see where exactly extant populations fall on the farmer-hunter/gatherer continuum.
Origins and genetic legacy of Neolithic farmers and hunter-gatherers in Northern Europe 
Mattias Jakobsson
Department of Evolutionary Biology, Evolutionary Biology Centre (EBC), Uppsala University, Sweden 
The prehistoric spread of farming in Europe has garnered intense interest for almost a century, and was one of the first questions to which population genetic data was used to investigate demographic hypotheses. However, the impact of the agricultural revolution on the European gene pool remains largely unknown. We obtained 249 million base pairs of quality-filtered human autosomal sequence data from some 5,000 year-old remains of three Neolithic hunter-gatherers and one Neolithic farmer excavated in Scandinavia, the northernmost fringe of agricultural practice at the time. Applying novel methods to study population structure based on low genome-coverage data, we find that Northern European Neolithic farmers are most similar to modern-day southern Europeans, contrasting sharply to Neolithic hunter-gatherers who are most similar to extant individuals from northern Europe. With most extant European populations appearing genetically intermediate between the two Neolithic groups, our results suggest that migration from the south by a genetically distinct group of humans accompanied the spread of agriculture to geographic regions where hunting and gathering was the mode of subsistence, but that admixture eventually shaped modern-day patterns of genomic variation.
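One simple way to place an extant population on the farmer versus hunter-gatherer continuum is to model its allele frequencies as a mixture of the two Neolithic groups and solve for the mixing proportion by least squares: α = (t − h)·(f − h) / ‖f − h‖², clipped to [0, 1]. This is a generic textbook estimator, not the "novel methods" of Jakobsson's group, and it ignores drift since the admixture, sampling noise, and the low-coverage nature of ancient data. The frequencies below are made up.

```python
def admixture_proportion(target, source_1, source_2):
    """Least-squares alpha in target ~ alpha * source_1 + (1 - alpha) * source_2.

    All arguments are allele-frequency lists over the same markers.
    The estimate is clipped to the interval [0, 1].
    """
    diff = [a - b for a, b in zip(source_1, source_2)]
    offset = [t - b for t, b in zip(target, source_2)]
    denom = sum(d * d for d in diff)
    alpha = sum(o * d for o, d in zip(offset, diff)) / denom
    return min(1.0, max(0.0, alpha))


# Hypothetical allele frequencies at four markers.
farmer = [0.2, 0.7, 0.5, 0.9]
hunter = [0.6, 0.1, 0.4, 0.3]
modern = [0.3 * f + 0.7 * h for f, h in zip(farmer, hunter)]
print(admixture_proportion(modern, farmer, hunter))
```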

Archaic admixture in the human genome 
Jeff D Wall
Department of Epidemiology & Biostatistics, University of California, San Francisco, USA 
We describe a method that uses patterns of linkage disequilibrium in extant human populations to identify regions of the genome that were inherited from ‘archaic’ human ancestors, such as Neandertals, Homo erectus or H. floresiensis. We validate this approach using two recently published archaic human genomes, and show that several ancient admixture events must have occurred, both within and outside of Africa. We also explore differences in the amount of archaic admixture across different contemporary human populations.

Finally, here is the meeting report:

Investigative Genetics 2012, 3:7 doi:10.1186/2041-2223-3-7

Understanding human evolutionary history: a meeting report of the Swedish Royal Academy of Sciences symposium of modern human genetic variation 

Joshua M Akey

Link (pdf)

March 23, 2012

African American athletes have more European mtDNA than the general AA population

Does anyone have a theory why this is the case?

Scandinavian Journal of Medicine & Science in Sports DOI: 10.1111/j.1600-0838.2010.01289.x

Importance of mitochondrial haplotypes and maternal lineage in sprint performance among individuals of West African ancestry

M. Deason et al.

Mitochondrial DNA (mtDNA) is inherited solely along the matriline, giving insight into both ancestry and prehistory. Individuals of sub-Saharan ancestry are overrepresented in sprint athletics, suggesting a genetic advantage. The purpose of this study was to compare the mtDNA haplogroup data of elite groups of Jamaican and African-American sprinters against respective controls to assess any differences in maternal lineage. The first hypervariable region of mtDNA was haplogrouped in elite Jamaican athletes (N=107) and Jamaican controls (N=293), and elite African-American athletes (N=119) and African-American controls (N=1148). Exact tests of total population differentiation were performed on total haplogroup frequencies. The frequency of non-sub-Saharan haplogroups in Jamaican athletes and Jamaican controls was similar (1.87% and 1.71%, respectively) and lower than that of African-American athletes and African-American controls (21.01% and 8.19%, respectively). There was no significant difference in total haplogroup frequencies between Jamaican athletes and Jamaican controls (P=0.551 ± 0.005); however, there was a highly significant difference between African-American athletes and African-American controls (P less than 0.001). The finding of statistically similar mtDNA haplogroup distributions in Jamaican athletes and Jamaican controls suggests that elite Jamaican sprinters are derived from the same source population and there is neither population stratification nor isolation for sprint performance. The significant difference between African-American sprinters and African-American controls suggests that the maternal admixture may play a role in sprint performance.
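The percentages in the abstract translate back into whole counts. For African Americans, 25 of 119 athletes and 94 of 1,148 controls carry non-sub-Saharan haplogroups. For Jamaicans, the figures are 2 of 107 and 5 of 293. The sketch below runs a two-sided Fisher exact test on that 2x2 split as a rough check on the contrast. This is a simpler, swapped-in test. The paper's exact test of population differentiation uses the full haplogroup frequency distribution rather than a binary non-sub-Saharan/sub-Saharan split, so the p-values will not match the reported ones.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Probability of x in the top-left cell, given fixed margins
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(col1, row1)
    # The small relative tolerance guards against floating-point ties
    return sum(p for p in (prob(x) for x in range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))

# Counts reconstructed from the abstract: 25/119 = 21.01%, 94/1148 = 8.19%,
# 2/107 = 1.87%, 5/293 = 1.71%.
# Rows: athletes, controls; columns: non-sub-Saharan, sub-Saharan haplogroups
p_aa = fisher_exact_two_sided(25, 119 - 25, 94, 1148 - 94)
p_jam = fisher_exact_two_sided(2, 107 - 2, 5, 293 - 5)
print(f"African-American athletes vs controls: p = {p_aa:.2e}")
print(f"Jamaican athletes vs controls: p = {p_jam:.2f}")
```

The results agree with the abstract. The African-American table falls far below 0.001, and the Jamaican table is nowhere near significance.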


Another look at Oetzi with 'euro7' and 'world9' calculators

After taking a first look at the genome of the Tyrolean Iceman, I decided to run him through a couple more calculators developed by the Dodecad Project.

The first was euro7, which has somewhat more resolution within Europe. Oetzi was:

  • 37.8% Southwestern
  • 37.7% Southeastern
  • 22.5% Northwestern
  • 1.9% African
  • 0.1% Far_Asian

These results are consistent with his major K12b "Atlantic_Med" ancestral component. Once again, he appears to be a very close match to the Sardinian component values under the same calculator.

The second was world9, a "global" calculator that includes Amerindian and Australasian components. Oetzi was:

  • 47.3% Atlantic_Baltic
  • 46.4% Southern
  • 3.1% Caucasus_Gedrosia
  • 1.7% Australasian
  • 0.9% African
  • 0.6% East_Asian

Again, these match the world9 values for Sardinians quite well. Sardinians are a bit more Atlantic_Baltic and a little less Southern than Oetzi, as I noted before for the K7b comparison. (K7b is similar to world9, which adds the Amerindian and Australasian components.)

Overall, this is a nice demonstration that Oetzi's genome is indeed Sardinian-like, as argued by Keller et al. It also shows that the Dodecad Project calculators based on the idea of "zombies" are working as they're supposed to. (Note that the previous K=7 and K=12 comparisons were not based on "zombies", but they produced essentially the same conclusion as the supervised runs in this post.)

This is all the more impressive because only ~44k SNPs were used in these experiments. That is the intersection of two sets: the SNPs I have for Oetzi (1,459,228 SNPs mapped to hg18, or 156,691 SNPs after intersection with my main Stanford HGDP reference), and the ~160-170k SNPs used in my various calculators after linkage disequilibrium-based pruning.

So, despite a roughly 4-fold reduction in the number of SNPs, the results are excellent. Hopefully, I'll find some time in the future to create new calculators that use all ~160k of Oetzi's SNPs. However, intersecting them with all dozen of the Illumina-based datasets I currently have leaves only ~72k SNPs in all.

But I have to say that I'm already growing tired of Oetzi, with his Sardinian-like predictability: what's the next ancient genome in the works? (E-mail me if you want to tip me off.)

March 22, 2012

Projecting Sub-Saharan Africans on the European-East Asian axis

Continuing my exploration of Sub-Saharan origins, I projected African populations onto the Principal Components created by a set of European and East Asian populations. The set of SNPs was ascertained on a San individual, as before, and 112 European/110 East Asian individuals were used to avoid sample size issues.

Note that populations projected onto PCs generated by other populations tend to regress somewhat toward the mean. Overall, African populations appear strongly shifted to the European side of the first principal component.
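The projection step, and the regression toward the mean it causes, can be sketched with NumPy. This is a toy simulation, not the pipeline behind the figure. It assumes two made-up reference populations, 2,000 unlinked SNPs and genotypes drawn as Binomial(2, p). The PCs are fitted on the reference individuals only. Every other individual is then placed in that space using the same centering and loadings. Projecting held-out members of a reference population shows the effect: their PC1 scores sit closer to zero than those of the individuals the PCs were fitted on. This bias is usually called shrinkage.

```python
import numpy as np

def fit_pca(ref, n_pcs=2):
    """Fit PCs on a reference genotype matrix (individuals x SNPs, coded 0/1/2)."""
    mean = ref.mean(axis=0)
    # Rows of vt are unit-length SNP loadings, ordered by decreasing variance
    _, _, vt = np.linalg.svd(ref - mean, full_matrices=False)
    return mean, vt[:n_pcs]

def project(genotypes, mean, loadings):
    """Place individuals in the reference PC space, reusing its centering and loadings."""
    return (genotypes - mean) @ loadings.T

rng = np.random.default_rng(0)
n_snps = 2000
# Hypothetical allele frequencies: shared ancestral values plus mild drift in A and B
base = rng.uniform(0.1, 0.9, n_snps)
freq_a = np.clip(base + rng.normal(0, 0.05, n_snps), 0.01, 0.99)
freq_b = np.clip(base + rng.normal(0, 0.05, n_snps), 0.01, 0.99)

def sample(freqs, n):
    """Draw n diploid genotypes per SNP as Binomial(2, freq)."""
    return rng.binomial(2, freqs, size=(n, freqs.size)).astype(float)

ref_a, ref_b = sample(freq_a, 50), sample(freq_b, 50)
held_out_a = sample(freq_a, 50)  # population A, but not used to fit the PCs

mean, loadings = fit_pca(np.vstack([ref_a, ref_b]))
pc1_fitted = project(ref_a, mean, loadings)[:, 0]
pc1_projected = project(held_out_a, mean, loadings)[:, 0]
print("mean |PC1|, fitted A:   ", np.abs(pc1_fitted).mean())
print("mean |PC1|, projected A:", np.abs(pc1_projected).mean())
```

The held-out individuals stay on population A's side of PC1 but sit nearer the origin. Shrinkage is stronger when reference samples are few or the populations are weakly differentiated. In this toy setup, the drift is deliberately small.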

There is a dearth of haplogroups of Sub-Saharan origin in Europe. Most of them occur as outliers, and in small percentages. Hence, the very strong shift of African populations towards Europeans cannot be ascribed to occasional African admixture in Europe.

Y-haplogroup E is the only major link between Africa and Europe that is not also shared by Africa and East Asia. But this haplogroup occurs at very low frequencies in all the included European populations, and at quite variable frequencies overall. As McEvoy et al. correctly observed, even European populations devoid of Y-haplogroup E show a closeness to Africans that East Asians do not share.

If admixture with Y-haplogroup E bearing males in Europe cannot account for the relative closeness of Africans to Europeans, what does? As I have mentioned in my review of McEvoy et al., there are two possibilities:

  • An Asian component not shared by Eurafricans is actually pulling East Asians away from Europeans and Africans. With the discovery of Denisovans and archaic-leaning Red Deer Cave people, this is a possibility that must be seriously entertained.
  • Y-haplogroup E bearers were originally closer to West than to East Eurasians. This is also quite likely. The origins of haplogroup E are traced to East Africa, from DE-bearing ancestors who may well have lived in Eurasia; Asia possesses both sister clades E and D, as well as both sister clades DE and CF one level up. But even if DE originated in East Africa itself, the population there was originally different from many of the region's current inhabitants. It may very well have been genetically closest to the emerging Proto-Caucasoids of the adjacent regions of Asia, rather than to the more distant populations that would ultimately evolve into today's East Asians.
I tend toward the second explanation for two reasons: the relatively shallow divergence times between East Asians and Europeans, and the lack of any archaic Y chromosomes or mtDNA. It is hard to imagine such a strong divergence being explained by a little archaic admixture in East Asia.

Interpreting the Beaker phenomenon in Mediterranean France

Antiquity Volume: 86 Number: 331 Page: 131–143

Interpreting the Beaker phenomenon in Mediterranean France: an Iron Age analogy

Olivier Lemercier

The author offers a new descriptive explanation of the Beaker phenomenon, by focusing on Mediterranean France and making reference to the Greek influx in the same area 2000 years later. In the Iron Age, the influence began with an exploratory phase, and then went on to create new settlements and colonise new areas away from the coast. The Beaker analogy is striking, with phases of exploration and implantation and acculturation, but adjusted to include a final phase where Beaker practice was more independent. Comparing the numerous models put forward to explain it, the author shows that immigration and a cultural package are both aspects of the Beaker phenomenon.


Keros: maritime sanctuary of the Early Bronze Age

Antiquity Volume: 86 Number: 331 Page: 144–160

The oldest maritime sanctuary? Dating the sanctuary at Keros and the Cycladic Early Bronze Age

Colin Renfrew, Michael Boyd and Christopher Bronk Ramsey

The sanctuary on the island of Keros takes the form of deposits of broken marble vessels and figurines, probably brought severally for deposition from elsewhere in the Cyclades. These acts of devotion have now been accurately dated, thanks to Bayesian analyses of the contemporary stratigraphic sequence on the neighbouring islet of Dhaskalio. The period of use—from 2750 to 2300 cal BC—precedes any identified worship of gods in the Aegean and the site is among the earliest ritual destinations only accessible by sea. The authors offer some preliminary thoughts on the definition of these precocious acts of pilgrimage.


March 20, 2012

Rare mtDNA haplogroups of North Asia

From the paper:
The results of our study provided additional support for the existence of limited maternal gene flow between eastern Asia/southern Siberia and eastern Europe, as previously revealed by analysis of modern and ancient mtDNAs [12], [37], [39], [48], [42], [58], [59]. Two more mtDNA subclusters that may indicate an eastern Asian influx into the gene pool of eastern Europeans have been revealed within haplogroups M10 and N9a. The presence of the N9a3a subcluster only in eastern European populations may indicate that it arose there after the arrival of founder mtDNA from eastern Asia about 8–13 kya. It is noteworthy that another eastern Asian-specific lineage, C5c1, found exclusively in some European populations (Poles, Belorussians, Romanians), shows evolutionary ages of 6.6–11.8 kya depending on the mutation rate values [12]. In addition, a recent molecular-genetic study of Neolithic skeletons from archaeological sites in the Alföld (Hungary) has demonstrated a high frequency of eastern Asian mtDNA haplogroups in ancient inhabitants of the Carpathian Basin [42]. Specifically, haplogroups N9a and C5 were also found in these remains, indicating that genetic continuity for some eastern Asian mtDNA lineages in Europeans since the Neolithic Period is possible. The pottery-making tradition initially emerged in the forest-steppe belt of northern Eurasia starting at about 16 kya and spread to the west, reaching the south-eastern confines of the eastern European Plain by about 8 kya [60]. Prehistoric migrations associated with its spread could be a potential cause of the appearance of eastern Asian mtDNA haplogroups in Europe. More information from complete mtDNA sequences, as well as from other genetic markers in contemporary and extinct populations of Eurasia, would help validate our conclusions.
PLoS ONE 7(2): e32179. doi:10.1371/journal.pone.0032179

Complete Mitochondrial DNA Analysis of Eastern Eurasian Haplogroups Rarely Found in Populations of Northern Asia and Eastern Europe

Miroslava Derenko et al.

With the aim of uncovering all of the most basal variation in the northern Asian mitochondrial DNA (mtDNA) haplogroups, we have analyzed mtDNA control region and coding region sequence variation in 98 Altaian Kazakhs from southern Siberia and 149 Barghuts from Inner Mongolia, China. Both populations exhibit the prevalence of eastern Eurasian lineages accounting for 91.9% in Barghuts and 60.2% in Altaian Kazakhs. The strong affinity of Altaian Kazakhs and populations of northern and central Asia has been revealed, reflecting both influences of central Asian inhabitants and essential genetic interaction with the Altai region indigenous populations. Statistical analyses data demonstrate a close positioning of all Mongolic-speaking populations (Mongolians, Buryats, Khamnigans, Kalmyks as well as Barghuts studied here) and Turkic-speaking Sojots, thus suggesting their origin from a common maternal ancestral gene pool. In order to achieve a thorough coverage of DNA lineages revealed in the northern Asian matrilineal gene pool, we have completely sequenced the mtDNA of 55 samples representing haplogroups R11b, B4, B5, F2, M9, M10, M11, M13, N9a and R9c1, which were pinpointed from a massive collection (over 5000 individuals) of northern and eastern Asian, as well as European control region mtDNA sequences. Applying the newly updated mtDNA tree to the previously reported northern Asian and eastern Asian mtDNA data sets has resolved the status of the poorly classified mtDNA types and allowed us to obtain the coalescence age estimates of the nodes of interest using different calibrated rates. Our findings confirm our previous conclusion that northern Asian maternal gene pool consists of predominantly post-LGM components of eastern Asian ancestry, though some genetic lineages may have a pre-LGM/LGM origin.


mtDNA haplogroup HV4a1a from Franco-Cantabria

PLoS ONE 7(3): e32851. doi:10.1371/journal.pone.0032851

Genetic Continuity in the Franco-Cantabrian Region: New Clues from Autochthonous Mitogenomes

Alberto Gómez-Carballa et al.



The Late Glacial Maximum (LGM), ~20 thousand years ago (kya), is thought to have forced the people inhabiting vast areas of northern and central Europe to retreat to southern regions characterized by milder climatic conditions. Archaeological records indicate that Franco-Cantabria might have been the major source for the re-peopling of Europe at the beginning of the Holocene (11.5 kya). However, genetic evidence is still scarce and has been the focus of an intense debate.

Methods/Principal Findings

Based on a survey of more than 345,000 partial control region sequences and the analysis of 53 mitochondrial DNA (mtDNA) genomes, we identified an mtDNA lineage, HV4a1a, which most likely arose in the Franco-Cantabrian area about 5.4 kya and remained confined to northern Iberia.


The HV4a1a lineage and several of its younger branches reveal for the first time genetic continuity in this region and long-term episodes of isolation. This, in turn, could at least in part explain the unique linguistic and cultural features of the Basque region.


Dual origins of Sub-Saharan Africans?

A reader quotes from Coon's The Living Races of Man (1965):
''Meanwhile we may note that a detailed analysis of 571 modern Negro crania, made by advanced mathematical techniques, has shown that these crania gravitate between two poles, a Mediterranean Caucasoid and a Pygmy one. The former type is again divisible into an ordinary Mediterranean and a Western Asian type, which suggests more than a single northern point of origin for the Caucasoid element. As we shall see in greater detail in Chapters 8 and 9, the Negroes closely resemble Caucasoids in a number of genetic traits that are inherited in a simple fashion. Examples of these are fingerprints, types of earwax, and the major blood groups. The Negroes also have some of the same local, predominantly African, blood types as the Pygmies. 
This evidence suggests that the Negroes are not a primary sub-species but rather a product of mixture between invading Caucasoids and Pygmies who lived on the edges of the forest, which at the end of the Pleistocene extended farther north and east than it does now.''
Since I'm not one to reject old theories just because they're old, I decided to test this particular idea. I used the San ascertainment panel of the Harvard HGDP dataset, and plotted all African populations, together with West and East Eurasian ones in a PCA plot:
African populations fall along a cline towards West Eurasian populations, with the most isolated Mbuti Pygmies on one end, and a tight blob of West Eurasian populations on the other. Hence, there is evidence for variable affiliation of Sub-Saharans with West Eurasians, but no real evidence for variable affiliation of West Eurasians with Sub-Saharans, except for the Mozabites and HGDP Arabs, with their well-known African admixture.

Nor can the results be explained by a more recent common ancestry of African farmers with Eurasians in general. African populations fall on a clear cline toward West Eurasian populations specifically, not toward Eurasians as a whole.

While I would hesitate to say that the above results prove the correctness of Coon's theory, they're certainly quite consistent with it. It may very well be that Y-haplogroup E1b1 bearers from East Africa, descended from Y-haplogroup DE-YAP from Eurasia are ultimately responsible for the introduction of the Caucasoid component into Africa.

Maghrebi origin of early south Iberian Neolithic

Quaternary Research
Volume 77, Issue 2, March 2012, Pages 221–234

The Mesolithic–Neolithic transition in southern Iberia

Miguel Cortés Sánchez et al.

New data and a review of historiographic information from Neolithic sites of the Malaga and Algarve coasts (southern Iberian Peninsula) and from the Maghreb (North Africa) reveal the existence of a Neolithic settlement at least from 7.5 cal ka BP. The agricultural and pastoralist food producing economy of that population rapidly replaced the coastal economies of the Mesolithic populations. The timing of this population and economic turnover coincided with major changes in the continental and marine ecosystems, including upwelling intensity, sea-level changes and increased aridity in the Sahara and along the Iberian coast. These changes likely impacted the subsistence strategies of the Mesolithic populations along the Iberian seascapes and resulted in abandonments manifested as sedimentary hiatuses in some areas during the Mesolithic–Neolithic transition. The rapid expansion and area of dispersal of the early Neolithic traits suggest the use of marine technology. Different evidences for a Maghrebian origin for the first colonists have been summarized. The recognition of an early North-African Neolithic influence in Southern Iberia and the Maghreb is vital for understanding the appearance and development of the Neolithic in Western Europe. Our review suggests links between climate change, resource allocation, and population turnover.


March 19, 2012

The effects of ascertainment on admixture estimates

In a previous experiment, I discovered a clear signal of West Eurasian-like admixture in Sub-Saharan African populations, using a set of markers ascertained in a San individual. The marker panels of the Harvard HGDP dataset have been ascertained on different individuals from around the world, and so they are very useful in showing the effects of ascertainment on admixture estimates.

I have repeated the same K=5 experiment using the panels ascertained on a French, a Han, a Papuan1, a San, and a Yoruba individual. Irrespective of the panel used, the same five components emerged: Asian, West-Eurasian, African, Australasian, and Amerindian. However, there are substantial differences in the inferred admixture proportions. The average admixture proportions can be found in this spreadsheet.

The levels of the "African" component in the HGDP African populations are summarized below:

These are almost the same in the French/Han/Papuan1 ascertainments, despite the different number of markers used. When SNPs are ascertained on Eurasian individuals, many SNPs present in African populations are not discovered, and hence, African populations appear "purer" by having a higher proportion of the "African" ancestral component.

However, when SNPs are ascertained on African populations, a different picture emerges, with clear evidence of West Eurasian admixture:

San ascertainment:
Yoruba ascertainment:

It is now evident that there is Eurasian admixture in African populations that was "hidden" in panels of SNPs ascertained on Eurasian individuals. Moreover, this Eurasian admixture seems to be more related to West Eurasians.
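The mechanism behind this hidden admixture can be illustrated with a toy model in plain Python. It is a deliberately crude sketch with made-up parameters, not a model of the actual HGDP panels. Every SNP has a frequency in an "African" population. A hypothetical 30% of SNPs were lost outright in a bottlenecked "Eurasian" population. A panel keeps only the SNPs at which a single diploid ascertainment individual is heterozygous. A Eurasian-ascertained panel can therefore never contain the SNPs that Eurasians lack, so that part of African variation is invisible to it.

```python
import random

def make_snps(n, p_lost=0.3, seed=42):
    """Toy (African, Eurasian) allele frequencies per SNP; all parameters are hypothetical."""
    rng = random.Random(seed)
    snps = []
    for _ in range(n):
        p_afr = rng.uniform(0.05, 0.95)
        if rng.random() < p_lost:
            p_eur = 0.0  # variant lost in the bottlenecked 'Eurasian' population
        else:
            # otherwise a drifted frequency, kept strictly between 0 and 1
            p_eur = min(0.99, max(0.01, p_afr + rng.gauss(0, 0.15)))
        snps.append((p_afr, p_eur))
    return snps

def ascertain(snps, pop, seed=7):
    """Keep SNPs at which one diploid individual from pop (0 = African, 1 = Eurasian) is heterozygous."""
    rng = random.Random(seed)
    panel = []
    for snp in snps:
        p = snp[pop]
        allele1, allele2 = rng.random() < p, rng.random() < p
        if allele1 != allele2:
            panel.append(snp)
    return panel

def share_african_private(panel):
    """Fraction of panel SNPs that are monomorphic in the 'Eurasian' population."""
    return sum(1 for _, p_eur in panel if p_eur == 0.0) / len(panel)

snps = make_snps(20000)
african_panel = ascertain(snps, pop=0)
eurasian_panel = ascertain(snps, pop=1)
print(share_african_private(african_panel))
print(share_african_private(eurasian_panel))  # 0.0: these SNPs can never be ascertained in a Eurasian
```

Under African ascertainment, the share of African-private SNPs in the panel comes out close to the simulated 30%.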

Someone might argue that the West Eurasian admixture observed in African populations results from a second Out-of-Africa migration that affected only West Eurasians. This, however, is a weak argument. Apart from a few populations with known recent African admixture, such as the HGDP Arabs, there is no variation in African ancestry in Eurasia: it is virtually zero everywhere. In Africa, by contrast, there are populations with a lot of West Eurasian admixture (Mozabites), an intermediate amount (Yoruba, Bantu) and a minimal amount (San, Pygmies). If the HGDP included more East African and Saharan populations, we would see an even clearer picture of variable West Eurasian admixture throughout Africa. In short, the variable levels of West Eurasian admixture in Africa, coupled with the consistent lack of any substantial African admixture in West Eurasia, are a telltale sign of a West Eurasia-to-Africa migration, not the reverse. 

This migration created a cline of West Eurasian admixture in Africa, with minima in isolated African hunter-gatherer (San/Pygmy populations), maxima in North and East Africa, and sharp transitions across geographical barriers (such as the Sahara), or ethnic differences (e.g., African agriculturalists vs. foragers). It is no longer tenable to view West Eurasian back-migrations as limited events that affected only North and East Africa: their effects are clearly evident throughout Africa, having affected different populations to a different extent.

The existence of Eurasian admixture throughout Africa is an interesting and novel finding. How much admixture is there? As I explain here, the inferred proportion of foreign admixture in an admixed population increases if we include "purer" indigenous populations. Mexican Mestizos appear less "European" if remote Amazonians are excluded from the analysis, and North Indians appear more "South Asian" if South Indian tribals are excluded.

The African hunter-gatherers (San and Pygmies) are the least admixed Africans currently in existence, but we cannot tell their actual proportion of indigenous African vs. Eurasian ancestry. We simply don't have the genomes of pre-back-migration Africans to compare against. Even so, there are strong hints from palaeoanthropology that these Africans included forms that do not fit within the present-day Homo sapiens continuum, such as Iwo Eleru.

It is at present unknown what percentage of African genomes derives from Eurasian back-migrants, from anatomically modern humans indigenous to Africa, and from more divergent indigenous African hominins.

Several tools may help elucidate the emerging picture of the multiple origins of Sub-Saharan Africans: "virtual genome" techniques, inference methods that allow for migration (such as TreeMix), full genome sequencing and, with luck, ancient African DNA. Africa may have been the cradle of H. sapiens, but many of her Eurasian sons came back.

Panel for admixture inference in Americas

This is based on only a few hundred AIMs, but nonetheless is quite useful because it reports new data for a variety of South American populations.

PLoS Genet 8(3): e1002554. doi:10.1371/journal.pgen.1002554

Development of a Panel of Genome-Wide Ancestry Informative Markers to Study Admixture Throughout the Americas

Joshua Mark Galanter et al.

Most individuals throughout the Americas are admixed descendants of Native American, European, and African ancestors. Complex historical factors have resulted in varying proportions of ancestral contributions between individuals within and among ethnic groups. We developed a panel of 446 ancestry informative markers (AIMs) optimized to estimate ancestral proportions in individuals and populations throughout Latin America. We used genome-wide data from 953 individuals from diverse African, European, and Native American populations to select AIMs optimized for each of the three main continental populations that form the basis of modern Latin American populations. We selected markers on the basis of locus-specific branch length to be informative, well distributed throughout the genome, capable of being genotyped on widely available commercial platforms, and applicable throughout the Americas by minimizing within-continent heterogeneity. We then validated the panel in samples from four admixed populations by comparing ancestry estimates based on the AIMs panel to estimates based on genome-wide association study (GWAS) data. The panel provided balanced discriminatory power among the three ancestral populations and accurate estimates of individual ancestry proportions (R2>0.9 for ancestral components with significant between-subject variance). Finally, we genotyped samples from 18 populations from Latin America using the AIMs panel and estimated variability in ancestry within and between these populations. This panel and its reference genotype information will be useful resources to explore population history of admixture in Latin America and to correct for the potential effects of population stratification in admixed samples in the region.


March 18, 2012

Neandertal/Denisovan admixture using PCA and ADMIXTURE (and another African surprise)

I have repeated the previous experiment, but this time I created K=5 synthetic populations from the components inferred by ADMIXTURE in the Harvard HGDP set. At K=5, the five inferred components corresponded to Sub-Saharan, West-Eurasian, East-Asian, Amerindian, and Australasian populations.
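For readers unfamiliar with the "zombie" idea, here is a minimal NumPy sketch of how synthetic individuals can be drawn from ADMIXTURE output. The post does not spell out the exact procedure, so this is an assumption. Each synthetic individual draws its genotype at every SNP as Binomial(2, p), where p is one component's allele frequency. The frequencies come from a matrix shaped like ADMIXTURE's .P file, with one row per SNP and one column per component. `make_zombies` is a hypothetical helper name.

```python
import numpy as np

def make_zombies(p_matrix, n_per_component, seed=0):
    """Draw unadmixed synthetic individuals from component allele frequencies.

    p_matrix: array of shape (n_snps, K), one column per ADMIXTURE component
    (the layout of a .P file, e.g. loaded with np.loadtxt).
    Returns a (K * n_per_component, n_snps) genotype matrix and component labels.
    """
    rng = np.random.default_rng(seed)
    n_snps, k = p_matrix.shape
    blocks, labels = [], []
    for comp in range(k):
        # Two independent allele draws per SNP: genotype ~ Binomial(2, p)
        blocks.append(rng.binomial(2, p_matrix[:, comp], size=(n_per_component, n_snps)))
        labels.extend([comp] * n_per_component)
    return np.vstack(blocks), np.array(labels)

# Toy 3-SNP, 2-component frequency matrix (hypothetical values)
p = np.array([[0.0, 1.0],
              [0.3, 0.7],
              [1.0, 0.0]])
zombies, labels = make_zombies(p, n_per_component=50)
print(zombies.shape)  # (100, 3)
```

With K=5 and 50 individuals per component, this produces the kind of 5 x 50 synthetic sample listed in the table of average PC positions.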
It is noteworthy that, for the first time, some deep Sub-Saharan African populations show evidence of Eurasian admixture. The type of this admixture is not clear, but it seems to be mostly West rather than East Eurasian. It may, in fact, reflect an element that was already West Eurasian-like, since there is a definite excess of the West Eurasian-centered component in the Sub-Saharan populations.

Note that this element was picked up because the SNP ascertainment was done on a San individual.  The San were not generally used for the ascertainment of SNPs in commercial SNP arrays designed for association studies, and hence their SNP diversity is generally under-reported. But, the panel4 of SNPs was ascertained in a San individual with full genome sequencing (as described in the Harvard HGDP materials).

Hence, the San-centered component appears to account for 82.9% of the Yoruba, 78.3% of BantuKenya, 86.9% of BantuSouthAfrica, 94.7% of BiakaPygmy, 80.9% of Mandenka, and 97.9% of MbutiPygmy. San-centered components have appeared before in ADMIXTURE experiments. The crucial difference here is that the rest of the African populations' ancestry falls in the Eurasian components, not in another Sub-Saharan-centered one.
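The Eurasian share implied by these figures is the complement of the San-centered component. This assumes, as the text does, that everything else falls in the Eurasian-centered components. The short Python snippet below tabulates it. The Yoruba value of 17.1% is the figure used later in the post when comparing the Yoruba with the San.

```python
# Share of the San-centered component reported in the post (percent)
san_centered = {
    "Yoruba": 82.9, "BantuKenya": 78.3, "BantuSouthAfrica": 86.9,
    "BiakaPygmy": 94.7, "Mandenka": 80.9, "MbutiPygmy": 97.9,
}
# Assumption from the text: the remainder lies in the Eurasian-centered components
non_san = {pop: round(100 - share, 1) for pop, share in san_centered.items()}
for pop, share in sorted(non_san.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{pop:<17}{share:5.1f}%")
```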

Remember that ~0% of an ADMIXTURE component means minimal influence, not zero influence, within the context of the studied set of populations. We can therefore reasonably infer that the San and Pygmy groups have minimal Eurasian influence, not that it is entirely absent. Eurasian ancestry in other Sub-Saharan Africans turned up when we ascertained SNPs in the San. In the same way, Eurasian ancestry in the San could turn up if we ascertained SNPs in an even more divergent African population. Unfortunately, we're out of luck, because the San are the most divergent African population currently in existence, having barely survived the recent Bantu onslaught.

Was there a major episode of admixture between a Eurasian-like and a very divergent (unknown) African hominin, and if so, how much? To find out, we would need the genome of that hominin, as we have for Neandertals and Denisovans, and would then measure how far different populations are shifted toward it. I don't hold out much hope that such an ancient genome has survived the African heat. But if such a population did in fact persist down to the Holocene boundary, there may yet be hope of recovering their DNA directly. Otherwise, testing the hypothesis will take ingenuity with full genomes of modern populations.

Getting back on topic, below is the projection of the 5 ancestral components onto the PCA space created by Chimp, Denisova, and Neandertal:

The exact average positions of the different groups are:

Population     Sample_Size     PC1      PC2
Chimp                    1   0.815   -0.051
Neander                  1  -0.452   -0.680
Denisova                 1  -0.363    0.731
East-Asian              50   0.034   -0.011
West-Eurasian           50   0.033   -0.013
Sub-Saharan             50   0.050   -0.001
Australasian            50   0.024    0.000
Amerindian              50   0.034   -0.012
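To make "shifted towards" concrete, one can measure how far a group's mean position has moved relative to a baseline group, along the direction from that baseline to an archaic hominin. The sketch below does this with the table's rounded coordinates. It is my own illustrative measure, not the procedure behind the plots. `coords` and `shift_toward` are hypothetical names, and with only three decimal places the smaller differences are suggestive at best.

```python
import math

# Mean PC1/PC2 positions copied from the table
coords = {
    "Chimp": (0.815, -0.051), "Neander": (-0.452, -0.680), "Denisova": (-0.363, 0.731),
    "East-Asian": (0.034, -0.011), "West-Eurasian": (0.033, -0.013),
    "Sub-Saharan": (0.050, -0.001), "Australasian": (0.024, 0.000),
    "Amerindian": (0.034, -0.012),
}

def shift_toward(group, baseline, archaic):
    """Signed length of (group - baseline) along the unit vector from baseline to archaic."""
    gx, gy = coords[group]
    bx, by = coords[baseline]
    ax, ay = coords[archaic]
    dx, dy = gx - bx, gy - by
    ux, uy = ax - bx, ay - by
    return (dx * ux + dy * uy) / math.hypot(ux, uy)

print(shift_toward("West-Eurasian", "Sub-Saharan", "Neander"))
print(shift_toward("West-Eurasian", "East-Asian", "Neander"))
```

West-Eurasian comes out shifted toward Neander relative to Sub-Saharan, and more weakly relative to East-Asian. The same function places Australasian ahead of East-Asian in the Denisova direction.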

So: Neandertals are "bottom left", and Denisova are "top left". Below is the plot of the ancestral components, zoomed in on the "central" portion occupied by modern human populations:
The results are as expected. There may even be a hint of a slight excess of "Neandertal" influence in the West Eurasian population relative to East Asians, in accordance with what John Hawks has reported. Also, if the Yoruba are 17.1% more Eurasian than the San, this immediately explains the small amount of Neandertal ancestry that Hawks's experiments detected in them.

March 16, 2012

Neandertal/Denisovan admixture using PCA

Eric Durand at 23andMe developed a test of Neandertal ancestry based on PCA. The main idea is to map modern human samples onto a PCA space of Chimpanzee, Neandertal, and Denisova, and see if there are any shifts in direction towards the two archaic hominins.
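The core of this idea can be sketched in NumPy. The genotypes here are random stand-ins, not real chimpanzee or archaic data, and the 20% introgression fraction is arbitrary. The post does not describe the 23andMe implementation details, so this is only an illustration of the geometry. With only three reference genomes, the PCA space is at most a plane. A modern sample that has copied Neandertal genotypes at some SNPs lands closer to the Neandertal point than the same sample without them.

```python
import numpy as np

rng = np.random.default_rng(1)
n_snps = 5000

# Hypothetical toy genotypes (0/1/2) standing in for chimp, Neandertal, Denisova
refs = rng.integers(0, 3, size=(3, n_snps)).astype(float)
mean = refs.mean(axis=0)
_, _, vt = np.linalg.svd(refs - mean, full_matrices=False)
loadings = vt[:2]  # three points span at most a 2-D plane

def project(genotypes):
    """Place genotypes in the PC space defined by the three reference genomes."""
    return (genotypes - mean) @ loadings.T

ref_pcs = project(refs)  # rows: chimp, Neandertal, Denisova
modern = rng.integers(0, 3, size=n_snps).astype(float)
# Copy the Neandertal genotype at a random ~20% of SNPs to mimic introgression
mask = rng.random(n_snps) < 0.2
introgressed = np.where(mask, refs[1], modern)

d_before = np.linalg.norm(project(modern) - ref_pcs[1])
d_after = np.linalg.norm(project(introgressed) - ref_pcs[1])
print(d_before, d_after)
```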

I tried my hand at replicating this experiment using the Harvard HGDP set. In particular, I used the panel4 SNPs, which were ascertained on a San individual. Below is the broad-view PCA:

blue (top left) = Neandertal
green (bottom left) = Denisova
red (right) = Chimp

Here is a blow-up of the middle portion of the figure, showing the human populations.

As expected, Eurasians deviate from Africans in a Neandertal direction, while Papuans deviate from Eurasians in a Denisova direction.

TreeMix analysis of North Eurasians (and an African surprise)

I used my K12b dataset to isolate a set of 537 individuals with less than 10% combined membership in the Gedrosia, Northwest_African, Southeast_Asian, South_Asian, East_African, Southwest_Asian, and Sub_Saharan components. Hence, these 537 individuals had 90%+ membership in the remaining Atlantic_Med, North_European, Caucasus, Siberian, and East_Asian components.
  • The Atlantic_Med component is frequent in northwestern Europe
  • The North_European component is dominant in northeastern Europe and forays into Siberia
  • The Caucasus component is dominant in the Caucasus and forays into Central Asia
  • The Siberian component is dominant in North Asia and forays into Europe
  • The East_Asian component is frequent in East Asia and forays into North Asia
This pruning procedure may not be perfect, but it helps isolate a dataset consisting (mostly) of North Eurasian individuals. I also removed all populations with fewer than 5 individuals remaining after the first pruning step. In the end, I had a dataset of 38 populations/452 individuals. The remaining populations were:
Russian_D, Polish_D, German_D, Finnish_D, Swedish_D, Mixed_Slav_D, Norwegian_D, Lithuanian_D, Japanese_D, Daur, French, French_Basque, Hezhen, Japanese, Oroqen, Russian, Sardinian, Yakut, CEU30, JPT30, Belorussian, Chuvashs, Hungarians, Lithuanians, Romanians, Selkup, Evenk, Tuva, Yukagir, Nganassan, Dolgan, Buryat, Mongol, FIN30, Kent_1KG, Bulgarians_Y, Ukranians_Y, Mordovians_Y
Additionally, a sample of 30 Yoruba from HapMap-3 was used as an outgroup.
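The two-step pruning described above can be sketched as follows, assuming each individual is represented as a (population, {component: fraction}) pair of K12b admixture proportions. The data layout, the function name, and the toy populations are hypothetical; only the thresholds (under 10% combined in the excluded components, at least 5 individuals per population) come from the text.

```python
from collections import Counter

# K12b components whose combined membership had to stay below 10%
EXCLUDED = {"South_Asian", "Northwest_African", "Southeast_Asian",
            "East_African", "Gedrosia", "Southwest_Asian", "Sub_Saharan"}

def prune(individuals, max_excluded=0.10, min_pop_size=5):
    """individuals: list of (population, {component: fraction}) pairs."""
    # Step 1: keep individuals with < max_excluded total in excluded components
    kept = [(pop, q) for pop, q in individuals
            if sum(q.get(c, 0.0) for c in EXCLUDED) < max_excluded]
    # Step 2: drop populations left with fewer than min_pop_size individuals
    sizes = Counter(pop for pop, _ in kept)
    return [(pop, q) for pop, q in kept if sizes[pop] >= min_pop_size]

# Hypothetical toy input
toy = ([("PopA", {"North_European": 0.95, "South_Asian": 0.05})] * 5
       + [("PopB", {"Atlantic_Med": 0.80, "Southwest_Asian": 0.20})] * 5
       + [("PopC", {"Siberian": 1.0})] * 3)
result = prune(toy)
print(Counter(pop for pop, _ in result))  # Counter({'PopA': 5})
```

PopB fails the first filter (20% Southwest_Asian), while PopC passes it but is dropped in the second step because only 3 individuals remain.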

TreeMix analysis

The TreeMix analysis was performed with default parameters, allowing for a varying number of migration edges.
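The exact command lines are not given here; assuming the standard `treemix` binary and its `-i` (input file), `-o` (output prefix), `-root` and `-m` (number of migration edges) options, the series of runs could be scripted roughly like this. The input file name, the output prefixes, and rooting on Yoruba are assumptions for illustration.

```python
import shlex

def treemix_commands(infile="north_eurasia.treemix.gz", root="Yoruba", max_m=10):
    """Build one TreeMix command per number of migration edges m = 0..max_m.

    File names are hypothetical; all other parameters are left at defaults.
    """
    cmds = []
    for m in range(max_m + 1):
        cmds.append(["treemix", "-i", infile, "-root", root,
                     "-m", str(m), "-o", f"out_m{m}"])
    return cmds

for cmd in treemix_commands(max_m=2):
    print(shlex.join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` on a machine where TreeMix is installed; building the commands separately from running them makes it easy to inspect them first.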

Nomenclature: The direction of gene flow is best seen in the figure and/or associated treeout files.
In the text, I write the common ancestor of two populations in parentheses, e.g., (French_Basque,Sardinian), and the subtree rooted at a particular node X as (X, *), e.g., (Buryat, *).

0 migration edges:

The West and East Eurasian clusters are identified, with some likely admixed populations placed closer to the Eurasian root.

1 migration edge:
64% from (French_Basque,Sardinian) to Yoruba; this is difficult to interpret, but there has been evidence in the past that Africans share more ancestry with West Eurasians than with East Asians. In the linked post, I proposed a major episode of back-migration into Africa, and it is perhaps this that is being captured by this migration edge: Sardinians and Basques are the only two Southwest Eurasian populations included, and any back-migration into Africa must have originated in the southern parts of West Eurasia.
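To make the 64% figure concrete: in TreeMix, a migration edge with weight w means the recipient population is modelled as drawing a fraction w of its ancestry from the source branch and 1 - w from its own position in the tree. A simplified single-locus view of that mixture is sketched below; the function name and the allele frequencies are hypothetical, and the real model works on the covariance of drift across many SNPs rather than on one locus.

```python
def admixed_frequency(f_source: float, f_parent: float, w: float) -> float:
    """Allele frequency of a population receiving a fraction w of its
    ancestry along a migration edge and 1 - w along its own tree edge.

    Hypothetical single-locus illustration of a migration weight.
    """
    if not 0.0 <= w <= 1.0:
        raise ValueError("migration weight must lie in [0, 1]")
    return w * f_source + (1.0 - w) * f_parent

# Hypothetical frequencies: 0.30 in the source branch, 0.70 on the parent branch
print(round(admixed_frequency(0.30, 0.70, 0.64), 3))  # 0.444
```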

Such a high level of back-migration may in fact be plausible, since the Yoruba are a predominantly Y-haplogroup E population, and the origin of the DE clade of the human Y-chromosome phylogeny is uncertain, with both an African and a Eurasian origin having been proposed. Personally, I favor the Eurasian case, since the CT clade has two subclades: CF (Eurasian) and DE (Eurasian/African).

Interestingly, John Hawks has recently reported an unanticipated excess of "Neandertal ancestry" in Yoruba. This may also point to a back-migration into Africa and/or admixture of a group of Africans related to Eurasians (whom I've called Afrasians) with groups of Africans (Palaeoafricans) that split before the H. sapiens/H. neandertalensis common ancestor.

There is, however, another detail in the figure that may have escaped your notice: the drift axis now spans about 0.5 (left to right), as opposed to only 0.12 in the tree without migration edges. So, perhaps what we are seeing is indeed the first sign of admixture between modern and archaic humans in Africa, a possibility made more likely by recent anthropological discoveries.

It's not clear to me whether TreeMix has stumbled onto something important, but it is certainly worth keeping in mind that the above model fits the data better than the simple tree model. Moreover, TreeMix attempts to reverse the polarity of migration edges, and, apparently, the (Sardinian, French_Basque)-to-Yoruba edge is preferred over the reverse.

So, we should keep our minds open to the possibility that the greater similarity of West Eurasians to Africans is not the result of multiple Out-of-Africa waves, one of which affected only West Eurasians, but of an Into-Africa back-migration from West Eurasia.

So far, tree-based models have focused on how diverse African groups are, and hence, the reduced diversity of Eurasians has been interpreted as an Out-of-Africa bottleneck that carried a subset of African variation into Eurasia.

But there is an alternative interpretation of the evidence, namely that African groups are diverse because they carry a superset of ancient Into-Africa variation, with the African-specific part of their variation resulting from admixture with pre-existing African hominins. Such a scenario cannot be captured by tree models, but it is apparently considered and not rejected by TreeMix, which allows for lateral gene flow. Let's wait and see what new things come from full genome sequencing.

2 migration edges:

The (French_Basque,Sardinian)-to-Yoruba edge persists (64%), and a new (Buryat, *)-to-Mongol edge (85%) was added. The "Mongol" sample consists of Siberian Mongols described by Rasmussen et al. (2010). An inspection of their K12b population portrait indicates that they do, in fact, have West Eurasian admixture, which according to the K12b spreadsheet amounts to about 18% in total.

3 migration edges:
The aforementioned (French_Basque,Sardinian)-to-Yoruba (64%) and (Buryat,*)-to-Mongol (85%) edges persist, and now we have a 68% Nganasan-to-Selkup edge.

These are the two Siberian Uralic populations in the dataset. This seems to parallel the K12b results, as the Selkups have a North_European element which the Nganasans (Uralic speakers from the Arctic coast of Central Siberia) lack, so we are seeing the hybridity of the Selkups here, who, like the Mongol sample, are partly of West Eurasian ancestry.

4 migration edges:

The aforementioned (French_Basque,Sardinian)-to-Yoruba (64%), (Buryat,*)-to-Mongol (84%), and Nganasan-to-Selkup (68%) edges persist, and now we have an 89% (Buryat, *)-to-Tuva edge. According to K12b, the Tuva have 13.3% West Eurasian admixture, so again we have reasonably good agreement between TreeMix and ADMIXTURE.
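The comparison between the two methods can be made explicit with a few lines of arithmetic, using only numbers quoted above (the m = 4 edge weights and the K12b West Eurasian totals). Reading the 1 - w share that stays on the recipient's own tree branch as its West Eurasian ancestry is an assumption made for this sketch; the dictionary layout and function name are hypothetical.

```python
# (Buryat, *) edge weights at m = 4 and K12b West Eurasian totals, as quoted in the text
OBSERVED = {
    "Mongol": {"treemix_weight": 0.84, "k12b_west": 0.18},
    "Tuva": {"treemix_weight": 0.89, "k12b_west": 0.133},
}

def compare(observed):
    """Return pop -> (1 - w, K12b West Eurasian fraction, difference).

    Assumes (hypothetically) that the 1 - w share on the recipient's own
    tree branch corresponds to its West Eurasian ancestry.
    """
    out = {}
    for pop, v in observed.items():
        other = 1.0 - v["treemix_weight"]
        out[pop] = (round(other, 3), v["k12b_west"], round(other - v["k12b_west"], 3))
    return out

for pop, (tm, adm, diff) in compare(OBSERVED).items():
    print(f"{pop}: TreeMix 1-w = {tm:.3f}, K12b = {adm:.3f}, diff = {diff:+.3f}")
```

Under this reading, both populations differ by only about two percentage points between the two methods.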

Interestingly, the non-"eastern" component of the Selkups and Tuvans now forms a clade. It seems that a Nganasan-like and a (Buryat, *)-like population converged on southern Siberia, absorbed a common local element, and became the Selkup and Tuva, respectively.

5 migration edges:

The aforementioned (French_Basque,Sardinian)-to-Yoruba (64%), (Buryat,*)-to-Mongol (85%), Nganasan-to-Selkup (68%), and (Buryat, *)-to-Tuva (90%) edges persist, and now we have a new 18% Oroqen-to-(Yakut, Evenk) edge. The Oroqen and the Evenk are Tungusic speakers, whereas the Yakut are a Turkic people of northeastern Siberia who migrated there from the vicinity of Lake Baikal during the last millennium.
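Keeping track of which edges "persist" as m grows is easy to do mechanically. The sketch below records the (source, target) edges and weights reported above for m = 1 to 5 and returns the edges present in every run together with their weight range. The data structure and function name are hypothetical; the weights are copied from the text.

```python
YORUBA = ("(French_Basque,Sardinian)", "Yoruba")
MONGOL = ("(Buryat,*)", "Mongol")
SELKUP = ("Nganasan", "Selkup")
TUVA = ("(Buryat,*)", "Tuva")
YAKUT_EVENK = ("Oroqen", "(Yakut,Evenk)")

# (source, target) -> migration weight for m = 1..5, as reported in the text
RUNS = {
    1: {YORUBA: 0.64},
    2: {YORUBA: 0.64, MONGOL: 0.85},
    3: {YORUBA: 0.64, MONGOL: 0.85, SELKUP: 0.68},
    4: {YORUBA: 0.64, MONGOL: 0.84, SELKUP: 0.68, TUVA: 0.89},
    5: {YORUBA: 0.64, MONGOL: 0.85, SELKUP: 0.68, TUVA: 0.90, YAKUT_EVENK: 0.18},
}

def persistent_edges(runs):
    """Edges present in every run, mapped to their (min, max) weight."""
    common = set.intersection(*(set(edges) for edges in runs.values()))
    return {
        edge: (min(r[edge] for r in runs.values()),
               max(r[edge] for r in runs.values()))
        for edge in common
    }

# Edges that survive from m = 3 through m = 5
print(persistent_edges({m: RUNS[m] for m in (3, 4, 5)}))
```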

6 migration edges:

The aforementioned (French_Basque,Sardinian)-to-Yoruba (64%), (Buryat,*)-to-Mongol (85%), Nganasan-to-Selkup (68%), (Buryat, *)-to-Tuva (90%), and Oroqen-to-(Yakut, Evenk) (18%) edges persist, and a new 16% Nganasan-to-Oroqen edge appears. Interestingly, this has allowed the Oroqen and Hezhen to form their own clade, which makes sense as both are Tungusic speakers from northeastern China. The other Tungusic population, the Evenk, groups with the Turkic Yakut; what they have in common is an origin close to Lake Baikal in Siberia.

7 migration edges:

The aforementioned (French_Basque,Sardinian)-to-Yoruba (64%), (Buryat,*)-to-Mongol (85%), Nganasan-to-Selkup (68%), (Buryat, *)-to-Tuva (90%), Oroqen-to-(Yakut, Evenk) (18%), and Nganasan-to-Oroqen (16%) edges persist, and there is a new 81% Evenk-to-Yukagir edge. The remainder of the Yukagirs' ancestry is derived from the West Eurasian tree. The Yukagir language is rather mysterious, with some links to Uralic having been postulated. Here it pays off to look at the population portraits, since it is apparent that, unlike in the Selkup, their West Eurasian ancestry is limited to a few individuals.

It is fairly interesting that Russian anthropologists placed the Yukagirs in the Baikal group of the Central Asian race, the same as the Evenks, who are their biggest donors. So, the Yakuts, Evenks, and Yukagirs all seem to share the same Baikal-type origin.

8 migration edges:
The edges are now: 64% Sardinian-to-Yoruba, 16% Oroqen-to-Yukagir, 20% (Buryat, *)-to-(Yakut, Evenk), 24% Nganasan-to-Chuvash, 29% Oroqen-to-(Yakut,Evenk), 88% (Buryat, *)-to-Tuva, 62% Nganasan-to-Selkup, and 85% (Buryat, *)-to-Mongol.

The tree has been substantially reorganized, with two main Siberian groups identified: an eastern group (Hezhen, Daur, Oroqen, Buryat) and a central group (Yukagir, Dolgan, Nganasan, Yakut, Evenk, Selkup). The Chuvash, predominantly Europeoid Turkic speakers from Russia, show evidence of gene flow from the central group as well, whereas the Selkup, Uralic speakers from Siberia who belong to the central group, show evidence of gene flow from Europe.

9 migration edges:

64% (French_Basque,Sardinian)-to-Yoruba, 85% (Buryat, *)-to-Mongol, 68% Nganasan-to-Selkup, 92% (Buryat,*)-to-Tuva, 14% Oroqen-to-(Yakut,Evenk), 14% Nganasan-to-Oroqen, 82% Yakut-to-Yukagir, 90% Evenk-to-Dolgan, 13% Hezhen-to-(Nganasan, *).

10 migration edges:

64% (French_Basque, Sardinian)-to-Yoruba, 85% (Nganasan, *)-to-Mongol, 68% Nganasan-to-Selkup, 92% (Nganasan,*)-to-Tuva, 15% Oroqen-to-(Yakut,Evenk), 15% Nganasan-to-Oroqen, 82% Yakut-to-Yukagir, 90% Evenk-to-Dolgan, 43% Hezhen-to-Buryat, 14% Sardinian-to-Bulgarian.

I will stop at this point. I may add more migration edges later to this post, but I'm tired of typing this stuff.

You can download all the plots and *.treeout files here.

UPDATE (March 20): I have repeated the experiment with HGDP San rather than Yoruba as the outgroup:

There is now a 63% migration edge from (Basque, Sardinian) to San.