October 29, 2010

Continuity between Neolithic, Bronze Age, and recent Sardinians

The serendipity of this paper appearing just now is incredible. Sardinians emerge as belonging, at over 96.2%, to a "Southern European" genetic component revealed by ADMIXTURE analysis, while peninsular Italians typically also possess, to a great extent, the West Asian, Northern European, and Southwest Asian components.

(On the right my ADMIXTURE analysis, with the Sardinian element in light green)

So, it is a great joy to read that this component may indeed correspond to an ancient southern European population inhabiting the island since Neolithic times, while peninsular Italy has deviated from this population, probably as the result of the settlement of Romano-Celts, Greeks, and all the other historical processes that have affected it, and which can be detected in the genetic record.

Note that Sardinia is also one of the few locations in the world so far where there is clear evidence of mtDNA continuity, with many other samples showing contrasts with modern populations.

Homo. 2010 Oct 25. [Epub ahead of print]

Craniofacial morphometric variation and the biological history of the peopling of Sardinia.

D'Amore G, Di Marco S, Floris G, Pacciani E, Sanna E.


The aim of this work is to explore the pattern of craniofacial morphometric variation and the relationships among five prehistoric Sardinian groups dated from Late Neolithic to the Nuragic Period (Middle and Late Bronze Age), in order to formulate hypotheses on the peopling history of Sardinia. Biological relationships with coeval populations of central peninsular Italy were also analysed to detect influences from and towards extra-Sardinian sources. Furthermore, comparison with samples of contemporary populations from Sardinia and from continental Italy provided an indication of the trend leading to the final part of the peopling history. Finally, Upper Palaeolithic and Mesolithic samples were included in the analyses to compare the prehistoric Sardinians with some of their potential continental ancestors. The analysis is based on multivariate techniques including Mahalanobis D² distance, non-parametric multidimensional scaling (MDS) and principal component analysis (PCA). The results showed the tendency to progressive differentiation between Sardinian groups and peninsular Italian groups, with the possible exception of a discontinuity showed by the Bonnànaro (Early Bronze Age) Sardinian sample. Several aspects of the morphological results were found to agree with the current genetic evidence available for the present-day Sardinian population and a Nuragic sample: (1) biological divergence between the Sardinian and peninsular Italian populations; (2) similarity/continuity among Neolithic, Bronze Age and recent Sardinians; (3) biological separation between the Nuragic and Etruscan populations; (4) contribution of a Palaeo-Mesolithic gene pool to the genetic structure of current Sardinians.


Another theory about Neandertal-modern interbreeding

Jean M points me to a paper about the theory of Neandertal introgression into the modern human gene pool that I had overlooked.

The paper echoes many of my criticisms about the theory which I expressed here and here.

The authors state quite succinctly what the two competing theories (Out-of-Africa vs. multi-regional), which stand on opposite ends of the spectrum, expected of the Neandertal genome:
There are two predominant models of modern human origins: multiregional evolution and recent African replacement. Multiregional evolution posits that the evolution of contemporary peoples occurred around the globe, with archaic populations such as the Neandertals contributing locally in their geographic regions [4]. This model predicts that Neandertals will share significant genetic variation with Europeans to the exclusion of other populations. Recent African replacement suggests that contemporary humans owe their heritage to a small African population that spread around the world replacing archaic populations with little to no interbreeding [5]. This model predicts that Neandertals will be equally distantly related to all contemporary human populations.
The surprising find about the Neandertal genome is that they were more closely related to Eurasians than to Africans (Yoruba and San): this conflicts with both models. The authors of the original study saw two ways out of this:

First, that humans absorbed Neandertal genes in the Levant that somehow managed to get distributed evenly across Eurasia, to the extent that a Papuan is as close to Neandertals as a European is, even though there is no hint of Neandertal presence in Papua or within thousands of km of it.

Second, that there was African population structure, and that Neandertals and modern humans formed a clade with respect to other African hominids. The breakdown of this structure drove some Africans away from Neandertals, rather than Neandertal admixture in Eurasians driving them towards Neandertals.

The authors of the current paper propose that:
We propose a third alternative. The paleontological and archaeological records suggest that modern humans and Neandertals overlapped in the Eastern Mediterranean region around 100 thousand years ago during a time when the African faunal zone extended temporarily into the Middle East. The range of modern humans then likely contracted back into Africa, severing contact with Neandertals, before finally expanding their range out of Africa around 50 thousand years ago [11]. Admixture may not have been possible during this time because a southern route out of Africa through the Arabian peninsula [12] would not have put the populations in contact. Any admixture would have occurred prior to the expansion of modern humans out of Africa between East Africans and Neandertals (Figure 1C). If this is correct, Neandertal genes will be found at low frequency in East Africans and perhaps others. These low-frequency Neandertal genes may then have been pushed to high frequency or fixation in the out of Africa populations through the iterated founder effect associated with range expansions [13].
The authors call (like I did) for sampling East Africans for evidence of "Neandertal genes".

Of course East African-Neandertal admixture prior to the Eurasian expansion would have the desired effect, the same as my favored model of deep African population structure.

But, a new curveball was thrown our way by the discovery of 100ky anatomically modern humans in south China. The third alternative seems less plausible now: it would be possible to think of modern (Neandertal-admixed) humans retreating back to east Africa after 100ky due to a retreat of African fauna from the Middle East when we didn't know about the Chinese modern Homo sapiens. But, now, we have to conclude that there were modern humans living between Israel and south China 100ky ago, and any idea of retreat becomes implausible.

Thus, I still believe that the 2nd scenario (deep African population structure + common Neandertal-sapiens ancestor in East Africa) is more plausible.

Nonetheless, I, like the authors, have arrived, for different reasons, at the same conclusion: east Africans need to be sampled for Neandertal genes.

If humans admixed with Neandertals in Eurasia, then we expect East Africans to have Neandertal genes in proportion to their Eurasian admixture. Let's say 4% Neandertal genes and 10% Eurasian admixture in a particular east African population. Then, we expect about 0.4% Neandertal genes in this population. If we find much more, then there will be something wrong with the theory of "Neandertal admixture in Eurasia".
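
The arithmetic above can be written out as a quick check. This is only a minimal sketch: the function name is made up, and the inputs (4% and 10%) are the illustrative figures from the paragraph, not measured values.

```python
def expected_neandertal_fraction(eurasian_neandertal: float,
                                 eurasian_admixture: float) -> float:
    """Expected Neandertal share in a population whose only source of
    Neandertal genes is its Eurasian admixture (the assumption being tested)."""
    return eurasian_neandertal * eurasian_admixture


# Illustrative figures from the text: 4% Neandertal genes in Eurasians,
# 10% Eurasian admixture in a given east African population.
expected = expected_neandertal_fraction(0.04, 0.10)
print(f"{expected:.1%}")
```

An observed share well above this product would be the "something wrong" signal described above.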

Current Biology Volume 20, Issue 12, 22 June 2010, Pages R517-R519

Neandertal Genome: The Ins and Outs of African Genetic Diversity

Jason A. Hodgson, Christina M. Bergey and Todd R. Disotell


Analysis of the Neandertal genome indicates gene flow between Neandertals and modern humans of Eurasia but not Africa. This surprising result is difficult to reconcile with current models of human origins and might have to do with insufficient African sampling.


Facial composite of "global human"

Razib points me to a Dutch article where a "global average" was created, by averaging 1470 people in 25 countries, taking into account population size. Not sure what the exact methodology is (feel free to comment if you have more info), but it looks plausible enough. Here is a link to the original article (in Dutch).

October 28, 2010

1000 Genomes Project has arrived

John Hawks covers it (it's open access). But, here is an interesting tidbit from the supplement. On to it, Y-chromosome sleuths!
14.4. Y chromosome Haplogroups
A maximum likelihood haplogroup tree under a HKY model of evolution was produced using phyML, and bootstrap values were produced using 100 subsamplings. Trees were produced using both all 2870 filtered sites (Supplementary Figure 7), and the 1971 UYR sites; though there was very little difference between the two trees. The haplogroup tree classifies all the major haplogroups as monomorphic, and recovers the relationships between them, with high bootstrap confidence. It also shows evidence for a deep division between haplogroups DE and CT, previously identified only by a single marker (P143; Karafet, Mendez et al. 2008). New insights into recent human evolution can also be gained from the branch lengths; for example, the short internal branch lengths within the haplogroup R1b relative to the other haplogroups suggest a recent expansion of this European haplogroup (Balaresque, Bowden et al. 2010).

Nature 467, 1061–1073 (28 October 2010) doi:10.1038/nature09534

A map of human genome variation from population-scale sequencing

The 1000 Genomes Project Consortium

The 1000 Genomes Project aims to provide a deep characterization of human genome sequence variation as a foundation for investigating the relationship between genotype and phenotype. Here we present results of the pilot phase of the project, designed to develop and compare different strategies for genome-wide sequencing with high-throughput platforms. We undertook three projects: low-coverage whole-genome sequencing of 179 individuals from four populations; high-coverage sequencing of two mother–father–child trios; and exon-targeted sequencing of 697 individuals from seven populations. We describe the location, allele frequency and local haplotype structure of approximately 15 million single nucleotide polymorphisms, 1 million short insertions and deletions, and 20,000 structural variants, most of which were previously undescribed. We show that, because we have catalogued the vast majority of common variation, over 95% of the currently accessible variants found in any individual are present in this data set. On average, each person is found to carry approximately 250 to 300 loss-of-function variants in annotated genes and 50 to 100 variants previously implicated in inherited disorders. We demonstrate how these results can be used to inform association and functional studies. From the two trios, we directly estimate the rate of de novo germline base substitution mutations to be approximately 10⁻⁸ per base pair per generation. We explore the data with regard to signatures of natural selection, and identify a marked reduction of genetic variation in the neighbourhood of genes, due to selection at linked sites. These methods and public data will support the next phase of human genetic research.


Long live the 28th October 1940

In celebration of today's holiday, I've opened up submission to the Dodecad project (for a limited time) to many more groups than my target ones.

October 27, 2010

Origin of Indian Austroasiatic speakers

Mol Biol Evol (2010) doi: 10.1093/molbev/msq288

Population Genetic Structure in Indian Austroasiatic speakers: The Role of Landscape Barriers and Sex-specific Admixture

Gyaneshwer Chaubey et al.

The geographic origin and time of dispersal of Austroasiatic (AA) speakers, presently settled in South and Southeast Asia, remains disputed. Two rival hypotheses, both assuming a demic component to the language dispersal, have been proposed. The first of these places the origin of Austroasiatic speakers in Southeast Asia with a later dispersal to South Asia during the Neolithic, whereas the second hypothesis advocates pre-Neolithic origins and dispersal of this language family from South Asia. To test the two alternative models this study combines the analysis of uniparentally inherited markers with 610,000 common SNP loci from the nuclear genome. Indian AA speakers have high frequencies of Y chromosome haplogroup O2a; our results show that this haplogroup has significantly higher diversity and coalescent time (17-28 KYA) in Southeast Asia, strongly supporting the first of the two hypotheses. Nevertheless, the results of principal component and “structure-like” analyses on autosomal loci also show that the population history of AA speakers in India is more complex, being characterised by two ancestral components - one represented in the pattern of Y chromosomal and EDAR results, the other by mtDNA diversity and genomic structure. We propose that AA speakers in India today are derived from dispersal from Southeast Asia, followed by extensive sex-specific admixture with local Indian populations.


Ancient mtDNA from Titriş Höyük, southeastern Turkey

I will comment later.

International Journal of Osteoarchaeology, n/a. doi: 10.1002/oa.1213

Understanding Early Bronze Age social structure through mortuary remains: A pilot aDNA study from Titriş Höyük, southeastern Turkey

T. Matney et al.

This report describes a pilot study examining aDNA from a skeletal population excavated in the 1990s at the late Early Bronze Age (EBA, c. 2300-2100 BC) urban settlement of Titriş Höyük in southeastern Turkey. Typically, late EBA burials at Titriş Höyük consisted of periodically reused underground family crypts contained within houses. However, one unique set of remains dated to the latest phase of the late EBA occupation at the site departed from this burial pattern entirely. It consisted of an above ground mortuary installation (B98.87) displaying the skulls and post-cranial bones of 19 individuals, most exhibiting a variety of fatal traumas. In this article, we compare the mtDNA sequences of these individuals with those buried in contemporary traditional late EBA intramural crypts. After successful extraction and amplification of ancient DNA molecules during a double blind study of 13 skeletons selected for the pilot study, our team was able to compare the genetic relatedness of individuals displayed in B98.87 to individuals buried elsewhere on the site. Based on archaeological evidence alone, earlier we had suggested that the occupants of B98.87 were perhaps outsiders, possibly soldiers vanquished in hostilities taking place shortly before the abandonment of the city at the end of the late EBA. However, our pilot study showed no clear genetic difference between the population of B98.87 and the broader population of the late EBA city, contrary to our expectations.


October 26, 2010

100,000 year old anatomically modern humans from Zhirendong

I'm sure I'll have a lot to say about this paper once I read it, but right now I'm focusing on the pilot phase of the Dodecad Project.

I'll post my comments later in this post; for now, I'll just say: those pesky ancestors have a way of upsetting scientific theories. But, in a sense, that's the beauty of science.

Related: the previous "oldest modern human" was Liujiang.

The paper's section on populational implications:
Populational Implications. Assuming that modern human biology emerged initially in the late Middle Pleistocene of equatorial Africa (8, 31, 36), the presence of derived, modern human mandibular features in East Asia by early MIS 5 implies early modern human population dispersal or gene flow across at least southern Asia sometime before the age of the Zhiren Cave human remains or independent emergence of these features in East Asia. The early modern human MIS 5 dispersal into Southwest Asia may therefore have included further population dispersal or gene flow eastward across southern Asia.

However, the Zhiren 3 complex mosaic of distinctly derived, modern human features of the anterior mandibular symphysis, combined with archaic features of the lingual symphysis and overall mandibular robustness, indicates that any “dispersal” involved substantial admixture between dispersing early modern human populations (cf. 5) or gene flow into regional populations (cf. 37, 38). The paleontological data are insufficient to assess the levels of such gene flow or admixture, but the morphological mosaic of Zhiren 3 is most parsimoniously explained as the result of such populational processes. It is not easily accommodated into any Out-of-Africa with populational replacement scenario.
The short story: anatomically modern humans (AMHs) first emerge in East Africa in examples like Omo and Herto about 200-150ky ago. The first undeniably modern finds in Eurasia were from Qafzeh in the Levant, roughly contemporaneous with the new Zhiren sample.

These Qafzeh AMHs were usually interpreted as the Out-of-Africa-that-failed, an early excursion of anatomically modern humans into Eurasia that seems to have fizzled: AMHs appear again, first as isolated teeth and then as fossils like the Oase mandible and Mladec in Europe and Liujiang in East Asia, only 50-60 thousand years later.

Until now, it was supposed that these later AMHs were descendants of the Out-of-Africa-that-succeeded, which postdated Qafzeh and was contemporaneous with the Aurignacian and the emergence of full-blown behavioral modernity.

The new Zhirendong find upsets this standard model: anatomically modern humans existed 100 thousand years ago in Africa, the Levant, and East Asia. It's extremely difficult to make the argument now that two of these AMH populations died out and the African one repopulated the world.

The two pillars of Out of Africa are (i) the genetics, i.e., the evidence for greater African genetic diversity, the decline of heterozygosity with distance from east Africa, and the corresponding increase of linkage disequilibrium, and (ii) the palaeoanthropology, i.e., the temporal gap between AMHs in Africa and Eurasia.

Factor (ii) has just taken a huge blow. Moreover, Out-of-Africa supporters must now either (a) come up with scenarios for the dispersal of AMHs at least 50,000 years earlier than in their current models, or (b) accept the emergence of modernity in Eurasia without dispersals from Africa.

UPDATE: John Hawks questions the chin=African equation.

PNAS doi: 10.1073/pnas.1014386107

Human remains from Zhirendong, South China, and modern human emergence in East Asia

Wu Liu et al.

The 2007 discovery of fragmentary human remains (two molars and an anterior mandible) at Zhirendong (Zhiren Cave) in South China provides insight in the processes involved in the establishment of modern humans in eastern Eurasia. The human remains are securely dated by U-series on overlying flowstones and a rich associated faunal sample to the initial Late Pleistocene, >100 kya. As such, they are the oldest modern human fossils in East Asia and predate by >60,000 y the oldest previously known modern human remains in the region. The Zhiren 3 mandible in particular presents derived modern human anterior symphyseal morphology, with a projecting tuber symphyseos, distinct mental fossae, modest lateral tubercles, and a vertical symphysis; it is separate from any known late archaic human mandible. However, it also exhibits a lingual symphyseal morphology and corpus robustness that place it close to later Pleistocene archaic humans. The age and morphology of the Zhiren Cave human remains support a modern human emergence scenario for East Asia involving dispersal with assimilation or populational continuity with gene flow. It also places the Late Pleistocene Asian emergence of modern humans in a pre-Upper Paleolithic context and raises issues concerning the long-term Late Pleistocene coexistence of late archaic and early modern humans across Eurasia.


October 25, 2010

Dodecad Ancestry Project

I have set up a new ancestry project for admixture analysis of Eurasian individuals. Please read the introductory post in the new dedicated blog.

More detailed analysis of Eurasian populations (K=10)

I have removed some populations from the previous run (such as Moroccan Jews and Samaritans) that tended to generate mini-clusters due to the presence of close relatives and/or inbreeding in the sample. I have removed some redundant populations to even out the dataset, and I have also added North Kannadi and Gujarati, which helped reveal the gradient of ancestry in South Asia.

ADMIXTURE results:

Admixture proportions:

Some interesting observations:
  • The occurrence of 3.8% South Asian in Romanians may reflect Romania's Roma population. Indeed, almost all of this comes from a 25% South Asian individual, almost certainly a Roma.
  • The small African component in Spaniards which was revealed in a previous K=8 run turns out to be East African (0.5%) rather than West African (0.1%). If this holds up in larger sets then it might signify that its origin is from East African admixed populations from the east, rather than Sub-Saharan Africans.
  • The multiplicity of ancestries of the Uygur is made evident, in agreement with the extensive craniometric and genetic data on prehistoric and extant populations from the area.
  • The proportion of the two East Eurasian components in Turkic populations is interesting. It seems that the earliest departures from the Turkic homeland (such as the Chuvash and Yakut) have a predominance of the NE Asian component, the Anatolian Turks are intermediate, and the Uygurs, the only ones to have stayed close to the homeland, have experienced an increase in the E Asian component.
  • The absence of the West African component in Ethiopians is striking. Here are the individual results for Ethiopians, illustrating the variability of the Southwest Asian vs. East African components. The Ethiopian sample consists of a number of different ethnic groups of the country, some of which (like the Amharas) are of Western Eurasian linguistic origin.

I am currently running K=11 and K=12 on the exact same data to see how the log-likelihood and Bayes Information Criterion will move and whether new mini-clusters will appear, or if the mega-components (such as the "West Asian", "South European", and "North European") will split informatively. I will update this post with information on what actually happened, and with additional plots, if I get robust results.

October 23, 2010

Detailed admixture analysis of West Eurasian populations (+ GenomesUnzipped individuals)

Here is the result of my ancestry analysis of several West Eurasian populations from the HapMap 3, HGDP-CEPH, and Behar et al. (2010) datasets using ADMIXTURE for K=10. Some non-West Eurasian populations were also added, to squeeze out the partial admixtures from outside the region in some populations.
I have tried to find an informative label for each of the inferred components, corresponding to the common attribute of the populations where it appears the highest. I make no claim about the geographical/temporal/ethnic origin of any of them, or what they actually represent, so please don't take the labels to be more than mental aids for processing the visual information.

Some comments on the components:

The West African component is centered on the Yoruba, is represented among the Maasai and North Africans, and also occurs in the Near East. As my previous analysis of African populations suggests, this component is not really limited to West Africans, as the Yoruba show clear ties with the Luhya of East Africa. However, I prefer to call it West African, rather than Sub-Saharan here, as the West African Yoruba are its only representative.

The East African component is centered on the Maasai. It is also represented in North Africans, being more important than the West African among Egyptians, but, as expected, the reverse is true for Moroccans and Mozabites. Also, as I've noted in my analysis of the African ancestry of Near Eastern populations, it is also more important in them than the West African.

The North European component reaches its maximum in Balto-Slavs. However, its substantial presence among Hungarians, French, and Basques, and indeed all European populations, and even Lezgins, suggests that it is a broader phenomenon.

The Druze component is centered on the Druze of the Near East, occurring at a lower frequency even in Europe, and in many populations of the Near East.

The East Eurasian component is centered on the HapMap Chinese, and occurs at a noticeable frequency among Turks, Iranians, Adygei, Russians, and Chuvash.

The Arab component reaches its highest frequency in Bedouins and Saudis but occurs widely in Semitic populations. A very interesting observation about it is that it occurs in Egyptians and Moroccans, but not in the Mozabite Berbers, perhaps reinforcing its Arab and/or Semitic associations. Its paucity among people from the Caucasus (Georgians, Adygei, Lezgins) and Armenians is a further argument for that association.

The NW African component has a clear association with Moroccans and Mozabites, also occurring in Egypt, Morocco and Sephardic Jews, and in various other populations.

The Semitic component attains high frequency among Arabs and Jews, hence its name. It seems more widely distributed than the Arab component, which makes sense, as many Near Eastern and European populations have had a longer time in which to encounter different Semitic groups since their Bronze Age appearance in the historical scene, but the corresponding time for encountering Arabs has been shorter.

The SW European component attains its highest frequency among Sardinians and Basques, hence its name. It has an opposite cline of distribution compared to the next component:

The W Asian component dominates in people from the Caucasus and West Asia and is widely distributed across West Eurasia.

Revisiting the Genomes Unzipped individuals

I started the party of using the data of Genomes Unzipped volunteers for ancestry analysis and was soon joined by others. I took another stab at this data as an afterthought in my Near Eastern African analysis, and now it's time to revisit the topic using dense genotype data on a variety of populations, and the power of ADMIXTURE.

First of all, a note of explanation as to why I don't use PCA/MDS in my ancestry analysis. I have nothing against them per se, and the visual examination of a large number of principal components can give a very useful overview of the data. The main problem, however, that I perceive with these techniques is their treatment of individuals of different backgrounds:

The offspring of a Greek and a Norwegian, a Russian and a Spaniard, and two Hungarians may all fall in the same spot on a PCA map. By its very nature, PCA synthesizes across ancestries, representing individuals as singular dots, whereas ADMIXTURE tries to analyze them into their several underlying components.

If we are interested in how an individual compares against other living humans, by all means PCA/MDS are great tools. But you'll never know how the dot came to be, whether it is (a) descended from a long line of ancestors inhabiting the corresponding geographic space, or it is (b) the product of admixture between more distant ancestors whose genomic average projected on the first few principal components matches that of (a).
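
A toy illustration of this point, assuming (as a simplification) that a first-generation offspring projects roughly at the midpoint of its parents' population centroids. The coordinates below are invented so that the midpoints coincide; they are not taken from any real PCA.

```python
# Hypothetical (PC1, PC2) population centroids; invented numbers, not real data.
centroids = {
    "Greek": (2.0, -3.0),
    "Norwegian": (-2.0, 3.0),
    "Russian": (4.0, 2.0),
    "Spaniard": (-4.0, -2.0),
    "Hungarian": (0.0, 0.0),
}


def offspring_position(parent_a: str, parent_b: str) -> tuple[float, float]:
    """Midpoint of the two parental centroids (a simplifying assumption)."""
    (x1, y1), (x2, y2) = centroids[parent_a], centroids[parent_b]
    return ((x1 + x2) / 2, (y1 + y2) / 2)


couples = [("Greek", "Norwegian"), ("Russian", "Spaniard"), ("Hungarian", "Hungarian")]
positions = {couple: offspring_position(*couple) for couple in couples}
for couple, pos in positions.items():
    print(couple, pos)
```

In this made-up setting all three offspring land on the same dot, even though their ancestries differ; the dot alone cannot tell cases (a) and (b) apart.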

With that long introduction, here is the analysis of the Genomes Unzipped individuals. These were listed as the "People" group in the above plot, but now we are looking at their individual level components.
The preponderance of "N European" and "SW European" components in everyone but Dan Vorhaus immediately tells us that these are Europeans. Moreover, the relative importance of the "SW European" vs. the "N European" component tells us that they are not from eastern Europe (compare with Balto-Slavs of previous figure). Finally, the relative insignificance of "W Asian" and "Semitic" components is suggestive of NW European ancestry.

Only three individuals stand out in the analysis:

VXP001 (Vincent Plagnol) shows an excess of "SW European" component, supporting an excess of SW over NW European ancestry, but also a tiny slice (1.5% to be precise) of the "Arab" component lacking in the other individuals. Such tiny slices occur in several southern populations, so these results reinforce the "SW" rather than "NW" impression for this sample.

JKP001 (Joe Pickrell) also has elevated levels of "SW European" vs "N European" component, also suggesting more southern ancestry. He also has small slices of "Semitic" (4.4%), "Arab" (1.1%), and "Druze" (1.1%) components. Inherited from his Italian grandparent, perhaps?

Finally, DBV001 (Dan Vorhaus) shows a relative importance of "SW European" (26.6%), "Semitic" (17.9%), "Arab" (2.7%), and "Druze" (1.8%) components, suggestive of a more southern and eastern origin than the other individuals. It is useful to compare him with the Ashkenazi Jewish average, side by side, as can be seen on the left. Dan is a close match for his people, with the exception of a small green slice of "NW African" in the Ashkenazi sample, which is lacking in Dan.

But, how is that component distributed among Ashkenazi Jews? Here is the answer for the 21 individuals in the sample:

It is evident that this is detectable in some individuals (like #418), excessive in others (like #426). Thus, Dan falls perfectly within the continuum of his people.

It's quite interesting that the three individuals that stood out from the rest in my initial analysis are also the ones who stand out in this more comprehensive assessment, and some of the reasons for it were revealed.

UPDATE (Nov 1): Joe Pickrell discovers Jewish great-grandparent

October 22, 2010

mtDNA H1 in North Africa

The 95% CI for North African H1 is 4.4-11.5ky, so the notion that this reflects post-glacial expansions from Iberia is a bit forced. Also, the age of North African H1 is useless for determining when its ancestors arrived; to achieve that, one should have estimated the age of the common ancestor of European and North African H1s; if the two split during the Paleolithic, then the common ancestor would have a correspondingly old age.

The age estimation of the North African specific subclades does put a lower limit to the arrival age, but this is 4.3ky (0-9.8ky 95% CI) for the oldest H1v clade.

Thus, I conclude that an old arrival of H1-related lineages to North Africa is likely, but a pre-Neolithic one is neither supported nor refuted by the data.

PLoS ONE 5(10): e13378. doi:10.1371/journal.pone.0013378

Mitochondrial Haplogroup H1 in North Africa: An Early Holocene Arrival from Iberia

Claudio Ottoni et al.

The Tuareg of the Fezzan region (Libya) are characterized by an extremely high frequency (61%) of haplogroup H1, a mitochondrial DNA (mtDNA) haplogroup that is common in all Western European populations. To define how and when H1 spread from Europe to North Africa up to the Central Sahara, in Fezzan, we investigated the complete mitochondrial genomes of eleven Libyan Tuareg belonging to H1. Coalescence time estimates suggest an arrival of the European H1 mtDNAs at about 8,000–9,000 years ago, while phylogenetic analyses reveal three novel H1 branches, termed H1v, H1w and H1x, which appear to be specific for North African populations, but whose frequencies can be extremely different even in relatively close Tuareg villages. Overall, these findings support the scenario of an arrival of haplogroup H1 in North Africa from Iberia at the beginning of the Holocene, as a consequence of the improvement in climate conditions after the Younger Dryas cold snap, followed by in situ formation of local H1 sub-haplogroups. This process of autochthonous differentiation continues in the Libyan Tuareg who, probably due to isolation and recent founder events, are characterized by village-specific maternal mtDNA lineages.


October 20, 2010

The shape of things to come

(Last Update: Oct 22)

Here is a teaser from my ongoing ADMIXTURE experiments. I have assembled a nice set of Eurasian populations, and this time I will continue increasing K until the Bayesian Information Criterion (BIC) maxes out.

I had previously used BIC to successfully choose K in my craniometric analysis of world populations. I am now at K=7 and it keeps on rising! Either it will stop or my computer will burn out doing Quasi-Newton iterations.
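The stopping rule described here can be sketched in a few lines. This is only an illustration, not the code used for these runs: it assumes the convention in which BIC = 2·logL − p·ln(n), so that larger is better, and the log-likelihoods and parameter counts below are hypothetical placeholders rather than values from any ADMIXTURE run.

```python
import math

def bic(log_likelihood, n_params, n_obs):
    """BIC in the 'larger is better' convention: 2*logL - p*ln(n)."""
    return 2.0 * log_likelihood - n_params * math.log(n_obs)

def choose_k(runs, n_obs):
    """Return the K with the highest BIC, plus all scores.

    runs maps K -> (log_likelihood, n_params); both are supplied by the caller.
    """
    scores = {k: bic(ll, p, n_obs) for k, (ll, p) in runs.items()}
    best_k = max(scores, key=scores.get)
    return best_k, scores

# Hypothetical numbers, for illustration only (not results from any run).
example_runs = {2: (-1200.0, 10), 3: (-1100.0, 20), 4: (-1095.0, 30)}
best_k, scores = choose_k(example_runs, n_obs=500)
```

In this toy input the likelihood gain from K=3 to K=4 is too small to pay for the extra parameters, which is the point at which one would stop increasing K.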

Onto the teaser: I'll only say that "People" are the 12 Genomes Unzipped individuals; I will leave their individual proportions a mystery - for now.

In color:
Sampled populations, # of individuals, ancestral components:

UPDATE (Oct 21):

Here are the K=8 results. I've added some more populations. The next step is to integrate HapMap data with HGDP-CEPH and the Behar et al. dataset. Stay tuned.

I've decided to present the data as population averages, as this is much more readable, especially as the number of individuals grows.
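For readers who want to do the same with their own runs, here is a minimal sketch of the averaging step. It assumes one row of K ancestry proportions per individual (the layout of an ADMIXTURE Q file) and a parallel list of population labels; the function name and the toy values are hypothetical.

```python
from collections import defaultdict

def population_averages(labels, q_rows):
    """Average per-individual ancestry proportions within each population."""
    sums = {}
    counts = defaultdict(int)
    for pop, row in zip(labels, q_rows):
        if pop not in sums:
            sums[pop] = [0.0] * len(row)
        for i, value in enumerate(row):
            sums[pop][i] += value
        counts[pop] += 1
    return {pop: [s / counts[pop] for s in total] for pop, total in sums.items()}

# Hypothetical toy input: three individuals, K=2.
labels = ["A", "A", "B"]
q_rows = [[0.9, 0.1], [0.7, 0.3], [0.2, 0.8]]
averages = population_averages(labels, q_rows)
```

Averaging hides within-population variation, so it is worth keeping the individual-level rows around for any population that looks heterogeneous.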

In this run, one of the clusters (purple) became associated with north Mongoloids (Yakuts, and partially Mongols and Daur), whereas Han are mostly in the green component. Notice that Yakut, Uzbeks, Chuvash, and Turks (all Turkic populations) are predominantly in the north cluster as far as their Mongoloid component is concerned, as expected.

UPDATE II (Oct 22):

Here are results for K=9. The addition of Maasai (MKK) has revealed the east African component which they share with Ethiopians; the latter also have a significant West Asian component.

For lack of a better term, I decided to call the lime component "Sardinian", as it is dominant in that population, but it is clearly reflective of something much broader.

October 19, 2010

Origin of Neolithic N1a

My comments on this paper will be posted in this space; needless to say, I have serious objections to the idea that N1a is a Mesolithic European mtDNA.

UPDATE (Oct 20):

The paper makes a significant contribution to the phylogeny of N1a by full sequencing of mtDNAs belonging to individuals from different populations, ascribing the Neolithic sequences to several identified clades. That part of the paper is solid and interesting.

My main problems with the paper are threefold:

1. The paper attempts to infer the origin of mtDNA founders using TMRCA estimates. But, as I have pointed out, TMRCAs of clades tell us nothing about where the common ancestors lived. The TMRCA of Latin American R1b, for example, predates the arrival of Western Europeans into the Americas by thousands of years.

2. The paper attempts to infer the origin of mtDNA founders using modern populations. Given the clear evidence for discontinuity between Mesolithic, Neolithic, and present-day populations in Central Europe, due to either demography or selection, I find it a very questionable proposition. Unless N1a turns up in dated human remains associated with a Mesolithic culture of Europe prior to the arrival of the Neolithic economy, the hypothesis that it is a forager lineage is unsubstantiated.

3. Finally, the paper uses the frequency of identified subclades to infer the location of the founders. In addition to the above 2 criticisms, I must point out that this puts the cart before the horse. In a uniform landscape, we do expect present-day frequency to be related to the place of origin of a mutation. But, we are not dealing with such a landscape. In particular, we are dealing with an expanding population exploiting new territory. As an analogy, the few people who made the crossing into the Americas left a few relatives in the Old World, and produced a plethora of new ones (descendants) in the New World. They did so by exploiting their new environment.

An additional objection is, of course, that the idea of Mesolithic N1a in Europe requires a virtual partition of pre-Neolithic European populations, to account for its non-existence among Mesolithic North/Central Europeans. That is hard to swallow, unless by "Mesolithic" one means "very shortly before the Neolithic", which would make it possible for a lineage to establish itself in, e.g., the Balkans shortly prior to the onset of the Neolithic, and begin expanding shortly thereafter. However, I don't see how an argument could be made for such a scenario, and, of course, I doubt that genetic dating methods in modern populations have enough precision to allow one to distinguish between late Mesolithic and early Neolithic intrusions.

BMC Evolutionary Biology 2010, 10:304 doi:10.1186/1471-2148-10-304

Mitochondrial haplogroup N1a phylogeography, with implication to the origin of European farmers

Malliya GOUNDER Palanichamy et al.

Tracing the genetic origin of central European farmer N1a lineages can provide a unique opportunity to assess the patterns of the farming technology spread into central Europe in the human prehistory. Here, we have chosen twelve N1a samples from modern populations which are most similar with the farmer N1a types and performed the complete mitochondrial DNA genome sequencing analysis. To assess the genetic and phylogeographic relationship, we performed a detailed survey of modern published N1a types from Eurasian and African populations.

The geographic origin and expansion of farmer lineages related N1a subclades have been deduced from combined analysis of 19 complete sequences with 166 N1a haplotypes. The phylogeographic analysis revealed that the central European farmer lineages have originated from different sources: from eastern Europe, local central Europe, and from the Near East via southern Europe.

The results obtained emphasize that the arrival of central European farmer lineages did not occur via a single demic diffusion event from the Near East at the onset of the Neolithic spread of agriculture into Europe. Indeed these results indicate that the Neolithic transition process was more complex in central Europe and possibly the farmer N1a lineages were a result of a 'leapfrog' colonization process.


30,000-year old evidence of plant food processing

PNAS doi: 10.1073/pnas.1006993107

Thirty thousand-year-old evidence of plant food processing

Anna Revedin et al.

European Paleolithic subsistence is assumed to have been largely based on animal protein and fat, whereas evidence for plant consumption is rare. We present evidence of starch grains from various wild plants on the surfaces of grinding tools at the sites of Bilancino II (Italy), Kostenki 16–Uglyanka (Russia), and Pavlov VI (Czech Republic). The samples originate from a variety of geographical and environmental contexts, ranging from northeastern Europe to the central Mediterranean, and dated to the Mid-Upper Paleolithic (Gravettian and Gorodtsovian). The three sites suggest that vegetal food processing, and possibly the production of flour, was a common practice, widespread across Europe from at least ~30,000 y ago. It is likely that high energy content plant foods were available and were used as components of the food economy of these mobile hunter–gatherers.


October 18, 2010

Joe Pickrell on his ancestry

Joe Pickrell has a nice post on his ancestry at Genomes Unzipped, prompted in part by my use of the EURO-DNA-CALC program on his data. He explores his data using PCA on a much larger set of markers than the 192 used by EURO-DNA-CALC (the overlap between the Price et al. paper on which it is based and the 23andMe set).

I did not report Joe's results in my re-analysis of the Genomes Unzipped data, as his confidence interval intersected 0. For the record, his sample, like that of Vincent Plagnol, didn't show any of the "yellow" cluster that was shared by Arabs and Dan Vorhaus (an Ashkenazi Jew).

Joe's explanation that his anomalous result is due to similar allele frequencies in some markers between Ashkenazi Jews and southern Europeans is quite interesting, and it is based on a dissection of the markers used by EURO-DNA-CALC.

As I stated in my reanalysis of Plagnol's data, his result might also be due to:
a European-origin component in the composite Ashkenazi Jewish gene pool that he happens to share.
Joe's discovery about similar allele frequencies in some markers between Ashkenazi Jews and Italians is quite interesting in terms of my theory about the origin of the European component in the Ashkenazi Jewish gene pool.

Some early studies on AJ, using Y-chromosomes (pdf) overestimated their Near Eastern component by considering them a mix between Levantine and north/central European populations. But, this made the assumption that Jewish ancestors took a Lufthansa flight to Germany rather than spend 1,000+ years in the territory of the Roman Empire and the Hellenistic world where there might also have been introgression of European elements into their gene pool.

Thus, it may very well be that Ashkenazi Jews are distinctive with respect to NW Europeans both in terms of an ancient Near Eastern ancestry (shared with Arabs and "discovered" in my re-analysis) and in terms of Southern European ancestry corresponding to particular populations they interacted with during their sojourn in the Roman Empire.

To conclude, the release of the Genomes Unzipped data on the web has been beneficial to all involved: a few bloggers like myself could run and test their tools on the data, and people like Joe could give feedback on these tools. This is a strong argument in favor of open source tools and publicly available data, and against the use of proprietary databases/ancestry estimation methods/datasets barricaded behind various controls.

As I continue my various ADMIXTURE-experiments, I will be sure to revisit the Genomes Unzipped folk and all those who wish to join them.

UPDATE (Oct 23): A much more detailed analysis of Genomes Unzipped individuals.
UPDATE (Nov 1): Joe Pickrell discovers Jewish great-grandparent

October 17, 2010

ADMIXTURE across Eurasia: from Anatolia to Siberia

(Last Update: Oct 17)

Here is a result of an ADMIXTURE run of a few populations from Eurasia (left to right: Turks, Armenians, Georgians, followed by a mix of Uygur, Mongolians, Yakut, Hezhen in no order), combining the HGDP dataset with that of Behar et al. (2010).

It's more of a test, rather than a final result, as I've just finished integrating the two datasets, but it's a nice comparison of a wide assortment of linguistic families.

Notice Turks and Armenians being quite similar to each other (green+blue), although Turks are differentiated by the presence of an east Eurasian component (5.5%). On the basis of uniparental markers, five years ago, I estimated this component as 6.2%, which seems to be right on the money. In the combined Armenian/Georgian sample this admixture is only 0.14% and, as can be seen, is limited to a handful of Georgian individuals.

It is interesting that Georgians belong semi-uniquely to the green cluster. Turks' non-Mongoloid ancestors were Indo-European speaking like the Armenians still are. It would be tempting to see in the blue-green contrast an Indo-European/Caucasian one, especially as the Caucasoid component further east seems to be mainly blue, in agreement with the idea that it was Indo-Europeans (in particular mainly Iranic speakers) who brought Caucasoid genes to the heartland of Asia.

UPDATE I (Oct 17):

Moving to the north, we see (left-to-right) Han (red), Hungarian/Belorussian (blue), Chuvash (first red "step"), Uzbek (second red "step"). Unlike the Turks, the Hungarians, who also speak a language that came from the east, seem to lack a noticeable east Eurasian component.

Their linguistic conversion was one of elite dominance, where a handful of Mongoloid and quasi-Mongoloid upper echelons left their language but not their genes:
According to his observations, the “overlords” were characterized by Turanid, Uralian and Pamir race elements and also by certain long-headed components. The “middle layer” or “warriors’ layer”, however, showed an anthropological profile distinctly different from that of the overlords. It was essentially constituted by Mediterraneans, Nordoids (who might also have been tall robust Mediterraneans) and Pamir component while the absence of Turanid and Uralian race characteristics was remarkable. As regards the third layer, the so-called “common folk”, they were dominated, just as the middle layer was, by Mediterranean and Nordoid elements but, in addition, the Cromagnoid ones were also significant.
The Chuvash are Turkic and live in Europe, while the Uzbeks, closer to the Altaic homeland in Asia, are also Turkic and have a predictably higher percentage of east Eurasian genes.

October 15, 2010

Y chromosome and mtDNA of Louis XVI of France (?)

From the paper:
After the execution of Louis XVI in January 21st, 1793, eyewitnesses stated that many people from the crowd dipped their handkerchiefs in the king’s blood and kept these objects as mementos [8]. An Italian family has owned for more than a hundred years – as demonstrated by a letter addressed to the director of the Musée Carnavalet in Paris, January 31st, 1900 – a dessicated gourd that presumably contained one of these handkerchiefs.
The mtDNA results:
the majority of the cloned sequences (87%) showed a rare N1b haplotype, with the substitutions 16093C-16145A-16176(G)-16223T. The same results were found in Bologna by direct sequencing, along with another substitution (16390G), not included in the amplicon generated in Barcelona. The haplotype found at the mtDNA HVR2(73G, 151T, 152C, 189G, 194T, 195C, 263G and 315.1C), is consistent with the N1b haplotype from the HVR1, although the substitutions 151T, 189G and 194T are not described in the current N1b dataset lineages. We interpret these three substitutions as additional, undescribed modifications of a N1b haplotype.
The Y-chromosome STR markers:

A ysearch query reveals a handful of distant (3 off in 9 markers) matches ranging from Anatolia to Scotland. Likewise yhrd turns up no matches using either the full or restricted panel, but a 1-off match with DYS389II-29 in the restricted panel in Marche, Italy.

Wikipedia tells me that Louis XVI's patrilineage goes all the way to Robert the Strong, and his matrilineage to Catherine of Mayenne.

One would think that a 1,000-year long line of kings and nobles would have left enough side branches and bastards along the way to register a few hits on the European map. Perhaps that's a reason to stress the "presumptive" in the paper's title. On the other hand, I find it interesting that the presumptive haplogroup of Louis XVI was G2a, the same as 2 of 5 warriors from Merovingian Bavaria (7th c. AD).

There is a way to authenticate the results, as the authors note:
At present it is not possible to prove genetically that the sample really belongs to the king Louis XVI. One possibility would be to extract a new sample from the dry heart attributed to the Dauphin Louis XVII, son of Louis XVI, preserved at the Basilique Saint-Denis in Paris, and compare both Y-chromosome profiles. Owing to the fact that the Y-chromosome profile found is not present in our current genetic databases such as YHRD, a potential match would directly authenticate the studied blood sample.

Forensic Sci Int Genet. 2010 Oct 10. [Epub ahead of print]

Genetic analysis of the presumptive blood from Louis XVI, king of France.

Lalueza-Fox C, Gigli E, Bini C, Calafell F, Luiselli D, Pelotti S, Pettener D.

Institut de Biologia Evolutiva, CSIC-UPF, Dr. Aiguader 88, 08003 Barcelona, Spain.

A text on a pyrographically decorated gourd dated to 1793 explains that it contains a handkerchief dipped with the blood of Louis XVI, king of France, after his execution. Biochemical analyses confirmed that the material contained within the gourd was blood. The mitochondrial DNA (mtDNA) hypervariable region 1 (HVR1) and 2 (HVR2), the Y-chromosome STR profile, some autosomal STR markers and a SNP in HERC2 gene associated to blue eyes, were retrieved, and some results independently replicated in two different laboratories. The uncommon mtDNA sequence retrieved can be attributed to a N1b haplotype, while the novel Y-chromosome haplotype belongs to haplogroup G2a. The HERC2 gene showed that the subject analyzed was a heterozygote, which is compatible with a blue-eyed person, as king Louis XVI was. To confirm the identity of the subject, an analysis of the dried heart of his son, Louis XVII, could be undertaken.


October 14, 2010

African admixture in the Near East: where from?

Here is the result of running ADMIXTURE for K=5 using 275K SNPs on the combined HGDP + HapMap African and West Asian populations, also including Adygei and Tuscans. The populations are in order: Luhya, Maasai, Tuscans, Yoruba, Adygei, Bedouins, Druze, Mozabites, Palestinians.
At this level of detail, Africans are divided into three clusters which can be labeled Sub-Saharan (red), East African (blue), and "Mozabite" or North African (purple). Europeans and West Asians form the green cluster, while the Arab samples have a substantial contribution of the yellow cluster.

Here are the admixture proportions:

African admixture in the two European populations is probably in the limits of statistical noise and consists of "Mozabite" (0.4%) for Tuscans and "E African" for Adygei (0.6%).

Druze, an Arab population that was religiously isolated from Arab Muslims for about a thousand years seems to have correspondingly missed most African admixture, registering 0.6% "Mozabite" and 1.1% "E African".

Non-Druze Arabs have clear traces of African admixture both in the form of "Mozabite" North African (4.5% for Palestinians, 4.9% for Bedouins), E African (6% for Palestinians and 5.7% for Bedouins) and a little Sub-Saharan (1.3% for Palestinians and 2.1% for Bedouins).

I had pointed out the mainly eastern African admixture in Near Eastern Arabs a year ago in my review of HAPMIX. Clearly Maasai are a better stand-in than the Yoruba for whatever African ancestry Arabs have.

It is quite interesting to note the genetic distance (expressed in Fst) between the five inferred clusters:

We can plainly see that proximity to Eurasians increases in the order of Sub-Saharan, East African, "Mozabite". I have little doubt that Somalis and Ethiopians from East Africa would occupy an intermediate position between Maasai from Kenya and "Mozabites" in that order.
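For readers unfamiliar with Fst, here is a minimal sketch of how such a distance can be computed from per-SNP allele frequencies of two clusters. The post does not say which estimator produced the numbers above, so this uses the textbook definition Fst = (H_T − H_S) / H_T, summed over SNPs, as an assumption; the function name is hypothetical.

```python
def fst(freqs_a, freqs_b):
    """Fst between two populations from per-SNP allele frequencies.

    Uses Fst = (H_T - H_S) / H_T with heterozygosities summed over SNPs
    (a ratio of sums), giving both populations equal weight.
    """
    h_s_total = 0.0  # mean within-population heterozygosity, summed over SNPs
    h_t_total = 0.0  # heterozygosity of the pooled population, summed over SNPs
    for p_a, p_b in zip(freqs_a, freqs_b):
        p_bar = (p_a + p_b) / 2.0
        h_s_total += (2 * p_a * (1 - p_a) + 2 * p_b * (1 - p_b)) / 2.0
        h_t_total += 2 * p_bar * (1 - p_bar)
    if h_t_total == 0.0:
        return 0.0  # no variation at all, so no measurable differentiation
    return (h_t_total - h_s_total) / h_t_total
```

Identical frequencies give 0 and fixed differences at every SNP give 1, so smaller values mean closer clusters, which is how "proximity" should be read here.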

An interesting observation is that the "Arab" cluster is slightly more distant to all African clusters than the European/W Asian cluster is. This might seem perplexing as geography might dictate that it should be closer to the African clusters.

However, this is not very surprising to me, as there was gene flow between West Asia and Europe and Africa in old times, evidenced by such things as the presence of Eurasian Y-haplogroup R-V88 in Africa and African haplogroup E1b in Europe and West Asia.

The original Arab ancestors were probably haplogroup J1e-bearing Semites exploiting arid environments of West Asia. Present-day Levantine Arabs (especially Bedouins, in the available samples) maintain a strong signal of this component of their ancestry, admixed, however, principally with the original Tuscan- and Adygei-like West Asians, and secondarily with E and N Africans.

Revisiting GenomesUnzipped "Ashkenazi Jewish" admixture

There were two individuals in my recent post who showed some evidence of "Ashkenazi Jewish" admixture (DBV001: 100% and VXP001: 32%). I list in the comments of that post some possible explanations for why VXP001 (who has no knowledge of Jewish ancestry) might get such a result. Naturally, using 275K SNPs is better than the 192 of EURO-DNA-CALC, so I did a separate run that included these two individuals.

The results are:

DBV001: 85.1% European/W Asian, 10.5% "Arab", 0.5% "E African", and 3.8% "Mozabite". This is entirely consistent with full known Jewish ancestry. The closest population to the Middle Eastern component of Jews are presumably the Druze, who have about 16.9% of the "Arab" (which should probably be relabeled "Semitic") cluster. Ashkenazi Jews are known to be intermediate between Levantine and European populations, and DBV001's result is entirely consistent with this.

As I've mentioned before, the exact percentage of Middle Eastern ancestry in modern European Jews is difficult to estimate, as this would depend on determining what percentage of the "European/W Asian" and "Semitic" components were present in their gene pool before they settled in Europe. If, for example, they were 100% in the "Semitic" cluster, then DBV001 would have about 10% Middle Eastern ancestry, but if they were like modern Druze, then this percentage would be 100*10.5/16.9 = 62.1%. The truth is probably somewhere in between.
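The two bounds can be reproduced with a one-line calculation. This is a sketch under a simplifying assumption that the post only implies: the European contributors carried none of the "Semitic" cluster, so all of an individual's "Semitic" share comes from the pre-European source population. The function name is hypothetical.

```python
def middle_eastern_share(individual_semitic, source_semitic):
    """Implied Middle Eastern ancestry (%) of an individual.

    individual_semitic: the individual's "Semitic" cluster percentage.
    source_semitic: the "Semitic" percentage assumed for the source
    population before settlement in Europe.
    """
    return 100.0 * individual_semitic / source_semitic

lower = middle_eastern_share(10.5, 100.0)  # source entirely in the "Semitic" cluster
upper = middle_eastern_share(10.5, 16.9)   # source like modern Druze
```

The estimate is very sensitive to the assumed source composition, which is why only a range can be given.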

VXP001: A shorter story, as VXP001 comes out 100% "European/W Asian". Thus, I am inclined to believe that VXP001's AJ score is either due to the small number of markers, or to a European-origin component in the composite Ashkenazi Jewish gene pool that he happens to share.

UPDATE (Oct 23): A much more detailed analysis of Genomes Unzipped individuals.

October 13, 2010

Clustering European Populations with ADMIXTURE

Here is the result of running ADMIXTURE for K=6 using 275K SNPs on the combined HGDP+HapMap European populations + the CHB Chinese sample from HapMap. (Only the larger HapMap Tuscan sample was included, rather than the smaller HGDP one).

Using a large K in closely related populations makes convergence considerably harder, but the end result does correspond to several identifiable clusters.

yellow: Northern European
green: "Adygei"
red: "Sardinian"
purple: "Basque"
light and dark blue: Chinese

It's interesting that the Chinese split before some Europeans do. One of the two Chinese components (light blue) is the one represented in Russians, so this split probably reflects some type of northern/southern differentiation in Chinese, as the east Eurasian component in Russians was introduced from northern Finno-Ugric speakers.

Here are the admixture proportions:

It would be interesting to add the Xing et al. dataset into the mix, however, there is little overlap between all three datasets, rendering the aggregate quite useless for such fine-level work.

October 11, 2010

Running EURO-DNA-CALC on GenomesUnzipped

(Last Update Oct 23)

genomesunzipped is a new initiative to put data of personal genomics customers online. It's a great idea, and the data will be quite useful to many people.

I downloaded the available data and ran EURO-DNA-CALC on them. Of course it is meant to be used for European or West Eurasian people, which all of them seem to be.

Here are the results for the 12 people whose data were online as of this writing. In bold are components whose confidence intervals do not intersect 0.

Most of them seem to be of NW European descent as their names suggest, and a couple seem to be partly or significantly of Jewish descent.

If you are one of these people, feel free to write to me or leave a comment to tell me if I'm right or dead wrong!

UPDATE (Oct 14): A more detailed analysis of DBV001 and VXP001 in this post.
UPDATE (Oct 23): A much more detailed analysis of all Genomes Unzipped individuals in the context of western Eurasia.
UPDATE (Nov 1): Joe Pickrell discovers Jewish great-grandparent

Deep ancestors of human DNA compatible with structured African population

(Last Update Oct 13)

This is a wonderful paper as it directly deals with the old coalescence times of human autosomal DNA and their presumed incompatibility with the Out of Africa model:
A genome-wide frequency distribution of the TMRCAs has been reported by curating the literature (Garrigan and Hammer 2006) but no systematic and consistent analysis has been performed in a single genome-wide data set. We report the first genome-wide estimation of the TMRCAs of anatomically modern humans, and we investigate if different scenarios of human evolutionary history are supported by this estimate.
The four scenarios considered by the authors are seen schematically in the following figure from the paper:

The Recent out of Africa: Single Origin Population model is the simple model that has found support in the shallow coalescence times of human Y-chromosomes and mtDNA and has made the jump to popular culture. In this model, humans are a young species that underwent a bottleneck, and Eurasians are descended from a group of Africans that left the continent. This model has been criticized for its perceived inability to explain deep divergence times in autosomal DNA.

The Recent out of Africa: Multiple Archaic Populations is the model I have advocated over the years (check out the "Palaeoafrican" label of the post for my past writings on the subject). It agrees with the previous model in the recent African origin of modern Homo sapiens but it states that the African population was structured and not panmictic: divided into fairly isolated long-standing subpopulations, and that Eurasians are descended from a single one of these African subpopulations (which I have termed "Afrasians").

The existence of a structured African population makes easy work of deep divergence times, as the variants that have such deep origins are presumed to have evolved separately in different African subpopulations, and then to have found themselves in the modern gene pool after the breakdown of this structure.

The Multi-Regional: Recent Admixture model is the one advocated by those seeing Neandertal and/or Homo erectus introgression in Eurasia. Like the previous two models, it agrees on the recent African origin of modern humans, but it sees a place for long isolated pre-existing Eurasian hominids, who contributed some of their DNA to modern humans.

Like the previous model, deep divergence times are no problem, as two variants with deep common ancestry are presumed to stem from the separated Eurasian and African Homo. This model has found recent support from analysis of the Neandertal genome but, as the authors of that study and I have stressed, the evidence for 1-4% Neandertal introgression into Eurasians has an alternative explanation consistent with the previous (Multiple Archaic Populations in Africa) model.

Finally, the Multi-Regional: Long Standing Admixture model sees no special place for Africa, except as the point of origin of human Y chromosomes and mtDNA. Humans are descended from Homo populations from around the world that have always maintained gene flow between them. This model obviously explains deep divergence times, but has a difficult time explaining the African origin of the uniparental markers, the palaeoanthropological evidence for an emergence of anatomical modernity in East Africa and the genetic evidence for a diminution of genetic variation in Eurasia with increasing distance from East Africa.

The authors seem to propose a fifth model, Ancestral Bottleneck which is noted as a bottleneck 150,000 years ago in a possibly ancestral structured population. This model doesn't get its own figure, but can be seen in the Single Origin Population model as "Potential bottleneck 150,000 years ago".

This model seems to combine elements of the first two ones: it is an essentially single origin model for extant humans, but it keeps the possibility of structure in Africa prior to the bottleneck, and pushes the breakdown of this structure before the bottleneck.

Here is what the distribution of TMRCAs for autosomal DNA, mtDNA, and Y-chromosomes looks like:

The authors observe that really old most recent common ancestors are predicted by all four models, so they are no reason to discount the Single Origin Population model. However, it is plain that the variance of TMRCAs observed for actual human autosomal DNA is great (the black curve is "flat"). Here is what they write:
The variance of the empirical TMRCAs is larger than the variance predicted by three of the four different models of human evolution (see Figure 2 and Supplementary Table 3), and this large variance has been interpreted as the result of archaic sub-structure in Africa (Harding and McVean 2004). Indeed, the `Multiple Archaic Populations' (scenario 2) shows similar variance of TMRCAs as the empirical data, but the inflated variance of the empirical TMRCA estimates can also be due to variation in mutation or recombination rate across the 40 sequence-regions (McVean et al. 2004).
In other words, the variance is great (more young and old TMRCAs than expected), either because of variation in mutation and recombination rates (i.e., different genomic regions evolve at different paces), or because of the multiple archaic populations idea. Unfortunately, the paper does not attempt to show how e.g., a variable genome-wide mutation rate might serve to flatten the TMRCA variance of the three models that fail to reproduce the data.

When we look at uniparental markers (mtDNA and Y-chromosomes), all four models predict older ancestors than observed. Here is what they write:
The models of human evolution typically predict older TMRCAs compared to the estimated 170,000 years for mtDNA (Ingman et al. 2000) and the upper estimate of 100,000 years for the Y-chromosome (Tang et al. 2002; Wilder et al. 2004; Shi et al. 2010). For mtDNA, a TMRCA of 170,000 years is within the range of values predicted by the `Multiple Archaic Populations' scenario (P(TMRCA less than 170,000) = 0.21), but the mitochondrial TMRCA estimate is difficult to reconcile with the remaining three scenarios (P less than 4x10^-2). For the Y-chromosome, a TMRCA of 100,000 years is clearly at odds with three of the models (P less than 6x10^-4), but for the `Multiple Archaic Populations' scenario with archaic African admixture, the proportion of simulated gene trees with TMRCAs younger than 100,000 years is larger than for the other three models, albeit quite small (P = 1.5x10^-2).
Thus, while all four models can perhaps account for old autosomal TMRCAs (the "multiple archaics" model on its own, the other three with help from variable genome-wide evolution), none of them can fully account for the young ages of human Y-chromosomes and mtDNA, with "multiple archaics" again coming out on top, being consistent with "mitochondrial Eve", and coming closer (but not quite) to consistency with "Y-chromosome Adam".

There are ways to reconcile all four models with the uniparental markers, however. For the Multiple Archaic Populations model, they acknowledge that the Y-chromosome problem would go away if they increased the number of these populations from their current 3, while for the rest they invoke selection to account for the recency of human mtDNA and Y-chromosomes.

The effective population size tug of war

Parenthetically, it is important to note here the problem of the effective population size, as it has fueled quite a lot of sensationalistic media stories and documentaries (of the "humans were at the brink of extinction, and then a small band of them survived and went on to conquer the world" kind).

Here are some useful observations:

High effective population size => old TMRCAs
Low effective population size => young TMRCAs
Directional selection => young TMRCAs
Balancing selection => old TMRCAs
Structured population => old TMRCAs

In order to account for the recency of human Y-chromosomes and mtDNA, scientists came up with very low population sizes for our ancestors ("the endangered tribe" meme).

Unfortunately, this has the side-effect of predicting very low ages for autosomal DNA, lower than observed! To fix one problem, another one is created.
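The tug of war can be made concrete with the standard neutral coalescent expectation for a single panmictic population of constant size: for a sample of n lineages drawn from M gene copies, E[TMRCA] = 2M(1 − 1/n) generations. The sketch below is illustrative only; the population sizes, sample size, and 25-year generation time are assumptions chosen for the example, not values from the paper.

```python
def expected_tmrca_generations(n_individuals, n_lineages, locus="autosomal"):
    """Expected TMRCA (in generations) under the standard neutral coalescent.

    For n lineages sampled from M gene copies, E[TMRCA] = 2*M*(1 - 1/n).
    Autosomes: M = 2N copies; mtDNA or Y: M = N/2 copies (equal sex ratio).
    """
    if locus == "autosomal":
        copies = 2.0 * n_individuals
    elif locus == "uniparental":
        copies = n_individuals / 2.0
    else:
        raise ValueError("locus must be 'autosomal' or 'uniparental'")
    return 2.0 * copies * (1.0 - 1.0 / n_lineages)

GENERATION_YEARS = 25  # assumed generation time, for illustration only

for n_e in (2_000, 10_000):  # hypothetical effective population sizes
    auto = expected_tmrca_generations(n_e, 100) * GENERATION_YEARS
    uni = expected_tmrca_generations(n_e, 100, "uniparental") * GENERATION_YEARS
    print(n_e, round(auto), round(uni))
```

Because both expectations scale linearly with population size, in this simple model the autosomal value is always four times the uniparental one. Shrinking the population to make Adam and Eve young therefore drags the autosomal ages down with them, which is exactly the side-effect described above.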

Can we have our cake and eat it too? An idea is to invoke balancing selection in autosomal DNA, i.e., the persistence of two variants at a given locus because they confer different advantages/disadvantages and an equilibrium between them exists, not allowing one or the other to reach its destiny of fixation.

Another idea is to invoke directional selection in Y-chromosomes and mtDNA. In directional selection, competing alleles are weeded out not by the winds of fortune, but by the supremacy of the successful alleles (Adam and Eve in our case) which push them to the side.

A different idea is to invoke ancient population structure. This immediately adds time to the TMRCA (since the different sub-populations became separated), and can thus explain old divergence times.
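
The effect of structure can be sketched with a simple isolation model. Assuming two sub-populations that split some number of generations ago with no gene flow since, two autosomal lineages sampled one from each cannot coalesce until they are back in the common ancestral population, so their expected coalescence time is the split time plus the usual ancestral expectation of 2Ne generations. The function names and the numbers below are hypothetical.

```python
# Expected pairwise coalescence time with and without ancient structure.
# Assumptions for illustration: clean split, no migration afterwards,
# constant ancestral size, neutral autosomal locus.


def expected_pairwise_tmrca_split(split_generations, ancestral_ne_diploid):
    """Two lineages, one from each sub-population isolated since the split."""
    # The lineages stay apart for split_generations, then coalesce at rate
    # 1 / (2 * Ne) per generation in the ancestral population.
    return split_generations + 2.0 * ancestral_ne_diploid


def expected_pairwise_tmrca_panmictic(ne_diploid):
    """Two lineages from a single randomly mating population."""
    return 2.0 * ne_diploid


if __name__ == "__main__":
    ne = 10000      # hypothetical ancestral effective size
    split = 20000   # hypothetical split time in generations
    print("panmictic :", expected_pairwise_tmrca_panmictic(ne), "generations")
    print("structured:", expected_pairwise_tmrca_split(split, ne), "generations")
```

With these made-up values the structured case doubles the expected pairwise age without any change in the effective size.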

A fourth idea is to invoke "technical" things like variable mutation rate across the genome, or see problems in the standard age estimations for Adam and Eve. That way you can explain why there are more old autosomal TMRCAs than your model predicts, or why Adam and Eve are younger.

No wonder that there is no consensus among experts!


This paper certainly shows that the multiple archaic African populations model that I have advocated is a strong contender for being close to what actually happened. A priori, I think that the ecological and climatic variation in Africa (especially given its north-south orientation) and the long-established presence of Homo in the continent make it unlikely that a single population of Homo survived there at the expense of all others.

In short, I think that: humans were never endangered in Africa, never dwindled to small numbers (inferred ancestral effective population sizes in the paper are 8k for Multiple Archaic Populations and 14k for Ancestral Bottleneck), and were not a single panmictic population spanning ecological niches and climate zones.

Rather, there were always separate populations in Africa, and climatic change (and more recently behavioral/subsistence change) has resulted in an ever-present process of population fusions and fissions. One of these sub-populations, living somewhere in East Africa, accumulated enough biological advantages to become extremely successful. On the one hand it populated Eurasia, where some admixture with archaic Eurasians may have taken place; on the other, it also populated the rest of Africa, where it absorbed other subpopulations of Homo.

UPDATE (Oct 13): Some discussion of the paper and my own theories at Gene Expression, wherein Chris Stringer, a leading proponent of the "Recent out of Africa: Single Origin Population" model, says that:
"My new book covers all this, and your recent work, but I do agree with Dienekes on the importance of deep African population substructure to the story."
While Gregory Cochran thinks I'm wrong:
"Dienekes is wrong about the Neanderthal interbreeding results being explained by African population substructure, but there are a lot of indications that there was significant substructure. A lot of this involves work that is not yet published: I look forward to seeing the details. Some of what I hear is remarkable."
For myself, I'm waiting to see data on segments of "Neandertal" ancestry in native East Africans. Let's look at native groups from Somalia, Kenya, Ethiopia, and Tanzania with limited Caucasoid admixture, and let's see how much "Neandertal" ancestry they have. If they don't have any, then "Neandertal" genes must have a Eurasian admixture explanation. If they have only a little, then it can be explained by Caucasoid admixture in more recent times. But if they have much more "Neandertal" admixture than Caucasoid admixture can explain, then the obvious solution is African population substructure.

Mol Biol Evol (2010) doi: 10.1093/molbev/msq265

Deep divergences of human gene trees and models of human origins

Michael GB Blum and Mattias Jakobsson

Two competing hypotheses are at the forefront of the debate on modern human origins. In the first scenario, known as the recent Out-of-Africa hypothesis, modern humans arose in Africa about 100,000-200,000 years ago, and spread throughout the world by replacing the local archaic human populations. By contrast, the second hypothesis posits substantial gene flow between archaic and emerging modern humans. In the last two decades, the young time estimates – between 100,000 and 200,000 years – of the most recent common ancestors for the mitochondrion and the Y-chromosome provided evidence in favor of a recent African origin of modern humans. However, the presence of very old lineages for autosomal and X-linked genes has often been claimed to be incompatible with a simple, single origin of modern humans. Through the analysis of a public DNA sequence database, we find, similar to previous estimates, that the common ancestors of autosomal and X-linked genes are indeed very old, living, on average, respectively 1,500,000 and 1,000,000 years ago. However, contrary to previous conclusions, we find that these deep gene genealogies are consistent with the Out-of-Africa scenario provided that the ancestral effective population size was approximately 14,000 individuals. We show that an ancient bottleneck in the Middle Pleistocene, possibly arising from an ancestral structured population, can reconcile the contradictory findings from the mitochondrion on the one hand, with the autosomes and the X-chromosome on the other hand.


October 10, 2010

ADMIXTURE on African HapMap populations

Here is the result of running ADMIXTURE on the three African HapMap-3 populations, using about 440K SNPs, including Tuscans as a non-African group.

The Tuscans are in purple and show no trace of African admixture. The African populations are all separated from one another: red: Luhya (Bantu); green: Maasai (Nilotes); turquoise: Yoruba (Niger-Congo).

The two East African groups show asymmetrical affinities: the Maasai have some Luhya red, while the Luhya have little Maasai green but substantial West African turquoise, consistent with the origin of their Bantu language.