January 31, 2012

The American Anthropological Association opposes open science

Why am I surprised? This is the same group of people who wanted to remove "science" from the definition of anthropology. Apparently, not only do they think that science might not be "inclusive" enough for them (=they want to include under their auspices those who don't do science), but they also think that it should not be open.

We write today to make the case that while we share the mutual objective of enhancing the public understanding of scientific enterprise and support the wide dissemination of materials that can reach those in the public who would benefit from such knowledge (consistent with our associations' mission), broad public access to such information currently exists, and no federal government intervention is currently necessary.
Really? Try accessing the current issue of the flagship journal of the AAA, the American Anthropologist. If you're in the small part of the population that is a member of the AAA or in a subscribing institution, you can. If you are not an anthropologist, or not in a subscribing institution: tough luck. You have to pay! Either by buying access, or by spending time and money to go to a library that has access.

What is the effect of this? That anthropological knowledge will remain locked away from the vast majority of the population; that the population that funds the AAA either directly or indirectly will have no access to its work; and, most importantly, that the population at large will never be exposed to the kind of heavy-on-the-prose/light-on-the-facts/heavy-on-the-politics/light-on-the-science drivel that their tax money buys from the good folks at the AAA (*).

More from the AAA response:
We know of no research that demonstrates a problem with the existing system for making the content of scholarly journals available to those who might benefit from it.
Read on: by "those who might benefit from it", the good folks at AAA mean "other researchers". Apparently, for AAA, plain anthropoi might not really benefit from their product. What a nice way to create an insular community of anthropologists reading and citing each other's papers with no reference to society at large!

Indeed, AAA is right: plain people do not really benefit from their papers. And, sooner or later, people will wake up and decide that either the good folks of AAA must not lock away their product where it cannot be seen, or they'll have to do without the (direct or indirect) funding that keeps them alive.

we dispute assertions underlying many of them suggesting that the federal government has the legal right to mandate public access to scholarly journal articles which result from federally funded research.
Their argument is that since journal articles are the product of non-federally funded individuals (such as editors and designers), the federal government doesn't have the right to mandate public access.

Trouble is, these "non-federally funded" persons aren't really so non-federally funded. They are paid by publishers, who make money largely from subscribing anthropologists and subscribing libraries, who... get their money from government. They aren't funded by government directly, but if government stopped funding anthropology departments and libraries, most of their income would disappear.

It's a nice racket: let's protect the interests of designers, publishers, and distributors! How about we protect the interests of the regular people who fund anthropology departments and libraries via their taxes, and shouldn't be forced to pay twice for the same product?

(*) With exceptions, of course.

mtDNA of domestic horses: Ancestral Mare ~130-160 thousand years old

The paper is open access.

PNAS doi: 10.1073/pnas.1111637109

Mitochondrial genomes from modern horses reveal the major haplogroups that underwent domestication

Alessandro Achilli et al.

Archaeological and genetic evidence concerning the time and mode of wild horse (Equus ferus) domestication is still debated. High levels of genetic diversity in horse mtDNA have been detected when analyzing the control region; recurrent mutations, however, tend to blur the structure of the phylogenetic tree. Here, we brought the horse mtDNA phylogeny to the highest level of molecular resolution by analyzing 83 mitochondrial genomes from modern horses across Asia, Europe, the Middle East, and the Americas. Our data reveal 18 major haplogroups (A–R) with radiation times that are mostly confined to the Neolithic and later periods and place the root of the phylogeny corresponding to the Ancestral Mare Mitogenome at ∼130–160 thousand years ago. All haplogroups were detected in modern horses from Asia, but F was only found in E. przewalskii—the only remaining wild horse. Therefore, a wide range of matrilineal lineages from the extinct E. ferus underwent domestication in the Eurasian steppes during the Eneolithic period and were transmitted to modern E. caballus breeds. Importantly, now that the major horse haplogroups have been defined, each with diagnostic mutational motifs (in both the coding and control regions), these haplotypes could be easily used to (i) classify well-preserved ancient remains, (ii) (re)assess the haplogroup variation of modern breeds, including Thoroughbreds, and (iii) evaluate the possible role of mtDNA backgrounds in racehorse performance.


January 30, 2012

AAPA 2012 abstracts (Part 3)

Continuing from Part 2.

An analysis of the Klasies River hominins using a hybrid model.
LILY MALEKFAR. Anthropology, Northern Illinois University.
Current research indicates that modern Homo sapiens originated in East Africa and then migrated across Africa as well as out of Africa, where they encountered archaic hominins. The Klasies River Main site (KRM) in South Africa is one location where there is evidence that modern and archaic Homo sapiens may have interacted. As Smith and other researchers have suggested, the KRM mandibular sample, in particular, exhibits significant size and morphological variability, which counters claims that the KRM specimens are fully modern.
The null hypothesis predicts that KRM’s range of variation does not significantly differ from the ranges of variation indicated in the comparative samples, including Sima de los Huesos, Krapina, Skhul, Qafzeh, and the Northern Illinois University (NIU) Collection, the latter containing specimens classified as modern Homo sapiens from India. If the null hypothesis is rejected, this would be tentative support that the KRM sample may possibly be a hybrid sample. This study examines first and second mandibular molar lengths and widths as well as mandibular corpus height and breadth in adult hominins and compares patterns of variation using the coefficient of variation.
The results demonstrate that the KRM sample is markedly more variable than any of the comparative samples, which rejects the null hypothesis and is one possible indicator of an admixed sample at KRM. This study is limited by small sample sizes for KRM. This and the fact that KRM spans several thousand years may impact these results.
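The comparison described in the abstract rests on the coefficient of variation (standard deviation divided by the mean). A minimal sketch of that calculation follows; the sample names and measurements are hypothetical values invented for illustration, not data from the study.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Return the sample coefficient of variation (SD / mean) as a percentage."""
    return 100.0 * stdev(values) / mean(values)

# Hypothetical molar lengths in mm, for illustration only (not from the abstract).
samples = {
    "sample_A": [10.0, 11.0, 12.0, 13.0, 14.0],
    "sample_B": [11.5, 12.0, 12.0, 12.5, 12.0],
}

# One CV per sample; the sample with the largest CV is the most variable.
cvs = {name: coefficient_of_variation(vals) for name, vals in samples.items()}
most_variable = max(cvs, key=cvs.get)
print(cvs, most_variable)
```

With only a handful of specimens per sample, as the abstract itself cautions, a CV estimated this way is quite noisy.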

The origins of dental modernity.
Research over the past decade has established that the study of dental morphological characters is a useful and important tool for interpreting the later stages of human evolution. A good deal of this research has focused on identifying dental characters that are relevant specifically to the distinction between Neandertals and H. sapiens, and more broadly to the question of modern human origins. However, while the dental patterns of certain recent H. sapiens populations have been described as primitive (e.g., Sub-Saharan Africans) or derived (Northeast Asians) relative to other groups, no study to date has proposed a dental pattern that characterizes H. sapiens as a species. To this end, this study investigates (1) whether or not there is a unique dental pattern in H. sapiens; (2) if so, which traits comprise this pattern; and (3) when, during the course of human evolution, these traits emerge. Our results show that size notwithstanding, H. sapiens has few uniquely derived dental traits that distinguish them from other hominins. These include the U-shaped fissure pattern of the lower P4, relatively flat, featureless upper incisors that are buccolingually narrow, lower molars lacking a hypoconulid and lower molars lacking any form of trigonid crest on enamel and dentine surfaces. Early H. sapiens from Qafzeh, Klasies River Mouth and Jebel Irhoud possess some of these characters. Interestingly, none of the recently discovered teeth from Qesem Cave, Israel exhibit any derived H. sapiens non-metric traits, while the molars of H. floresiensis are derived toward the H. sapiens condition.

Endocranial shape in early modern humans.
Humans have more globular brains and therefore endocasts than our extant and extinct relatives: chimpanzees and Neanderthals both have anterioposteriorly elongated endocasts. Based on an ontogenetic series of recent modern humans, we have previously shown that this modern human globular shape develops directly after birth during an ontogenetic phase that is absent in chimpanzees and Neanderthals. However, it is unclear at which point in the evolution of our species this unique pattern of brain development appeared.
Here, we aim to trace its evolutionary origin. Based on the shape of fossil adult humans, we investigate the morphological evolution of Homo sapiens endocasts using geometric morphometrics. Investigating representatives of H. sapiens from different time periods (comprising samples from Jebel Irhoud, Qafzeh, Skhul, Mladec, Cro-Magnon) makes it possible to assess when and how (gradually or rapidly) this developmental phase appeared in the course of recent human evolution. As several relevant fossils are fragmentary and partly deformed, they require reconstruction before they can be analyzed. To this end, we generate and reconstruct virtual endocasts based on CT scans. We first use mirror-imaging and segmentation techniques, and then the thin-plate-spline interpolation function for reference-based reconstruction. Generating multiple reconstructions based on landmarks of 60 recent human endocasts, we keep track of the reconstruction uncertainty throughout the shape analysis. We document temporal trends of endocranial shape within anatomically modern humans during the Late Pleistocene and discuss potential implications for the evolution of the modern human brain.

AAPA 2012 abstracts (Part 2)

Continuing from Part 1.

I am quite looking forward to the following study which echoes some of my own thoughts on the subject:

Vindija Neandertals as evidence for gene flow from early modern humans.
The Vindija Neandertal remains have played a critical role in discussions on the emergence of modern Eurasians and the possible involvement of Neandertals in that process. Most recently, fragments from Vindija yielded a draft sequence of a Neandertal genome revealing a 1-4% contribution of Neandertals to recent Eurasians. Morphology of the Vindija Neandertals has long been regarded as showing progressive features in a late Neandertal sample, but the interpretation of the meaning of this pattern has varied over time. Although various studies have shown the Vindija pattern is not due to any type of sample bias, that interpretation is still cited. Otherwise the morphology is seen as either reflecting the process of modern human emergence in Eurasia or as just a part of “normal” Neandertal variation. If Vindija does reflect the process of transition to modern humans, the question is how does it reflect this process? We suggest that the Vindija morphology reflects evidence for gene flow from early modern populations into Neandertals. We show how the Vindija cranial and mandibular pattern reflects that process and demonstrate that indications of mixing among stratigraphic levels at the site do not impact biological interpretations of the Vindija sample. This direction of gene flow has not been detected in genetic studies so far. Our interpretation underscores the importance of using both morphological and genetic data in approaching questions of late human evolution.

Assessing the pattern of Neandertal ancestry in living human populations.
JOHN HAWKS. Department of Anthropology, University of Wisconsin-Madison.
People living outside Africa today derive 2 to 4% of their ancestry from Neandertal populations. This initial estimate was based on whole-genome sequencing of a small number of individuals, and the pattern of Neandertal ancestry has yet to be characterized. Here I employ the sequencing data from the 1000 Genomes Project to identify Neandertal-derived haplotypes in living human populations. Initial sequence-level comparison allowed development of a genome-wide sample of SNP haplotypes informative of Neandertal ancestry. Humans within a population differ little in the amount of Neandertal ancestry, but the fraction does vary significantly among samples from different regions. Most Neandertal genes today are rare, existing only in one or two copies in the 1000 Genomes sample. However, a few have become majority haplotypes, 50% or higher. Europeans, South Asians, and East Asian populations differ substantially in which Neandertal-derived haplotypes are presently common, so that a haplotype present in one of these regions is very likely to be absent in samples from other regions. This heterogeneity of present-day Neandertal ancestry provides information about the Late Pleistocene dispersals of humans. In particular, today's populations outside Africa differentiated under strong genetic drift. A relatively small proportion of Neandertal-derived haplotypes contain candidates for selection in later human populations, based on their current pattern of extended haplotype heterozygosity and fraction of derived SNP alleles. Additionally, I report on the application of these methods to investigate and visualize Neandertal ancestry at the whole-genome level from commercial SNP genotype data.

Rates of Neandertal introgression in genic versus intergenic regions of the human genome.
The Neandertal genome project recently estimated that 1-4% of the genetic material found in non-African populations is the result of the introgression of Neandertal genes. When populations that were previously isolated admix, incompatibility at the genic level can often result in distinctive patterns of introgression. It can be predicted that intergenic regions will be more likely to introgress into a population than protein coding changes when two populations or species have lowered hybrid viability or fertility. As coding changes are more likely to be associated with inviability and infertility due to epistatic interactions between gene products, these regions are less likely to be exchanged between diverging populations. Coding regions, therefore, should show an earlier divergence time than intergenic regions. To test this hypothesis, we looked at Neandertal introgression in five genic and five intergenic regions from six geographically distinct modern human populations (Han Chinese, Gujarati Indian, Italian, Puerto Rican, Japanese, and CEPH Europeans). We chose regions with similar recombination rates that did not show strong departures from neutrality. Using maximum likelihood estimation, we calculated the time to the most recent common ancestor (TMRCA) for each of the 10 regions separately based on human-Neanderthal-chimp sequence alignments. Our results highlight the patterns of introgression for intergenic and coding regions in different human populations while expanding our understanding of Neandertal population dynamics and raising new questions about human-Neandertal admixture.

AAPA 2012 abstracts (part 1)

Here are some interesting abstracts from the 81st Annual Meeting of the American Association of Physical Anthropologists.

Maternal marks of admixture in Cape Coloreds of South Africa.
Previous studies of genetic diversity have suggested that the Cape Coloureds of South Africa are a highly admixed population with genetic roots from indigenous African groups including Khoisans, and the later arrival of Bantu speaking Xhosa farmers. Further genetic contributions came during European colonization of South Africa, which added to the inclusion of largely male European markers to the gene pool. Slaves from Indonesia, Malaysia, Madagascar and India are also thought to have contributed to the genetic makeup of this ethnic group. This study examines the maternal contribution of each of these groups to the genetic diversity of the Cape Coloreds through sequencing of the hypervariable region I of the mitochondrial DNA and through restriction fragment length polymorphism.
A total of 123 individuals were examined for this study. High frequencies of haplogroups L1 and L2 were found at 81.3 percent in this group (100 of the 123 individuals), which indicates that this group has a large African contribution to its mitochondrial makeup. Restrictions of the major European haplogroups identified nine individuals, 7.3 percent of the sample, belonged to haplogroups I and J. Five individuals (4.1 percent of the sample) belonged to the superhaplogroup M, indicating that Asian slaves did contribute to the maternal gene pool. The majority of maternal lineages in this Cape Coloured sample are African in origin, with some European influence and a small contribution from Asian maternal lineages.
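The percentages quoted in the abstract can be checked directly from the reported counts. The short sketch below does only that arithmetic; the dictionary labels are informal shorthand, and the abstract does not say how the individuals left over after these three groups were classified.

```python
TOTAL = 123  # individuals examined, as stated in the abstract

# Counts reported in the abstract, keyed by informal labels.
counts = {
    "L1/L2 (African)": 100,
    "I/J (European)": 9,
    "M (Asian)": 5,
}

# Percentage of the full sample in each group, rounded to one decimal place.
percentages = {group: round(100 * n / TOTAL, 1) for group, n in counts.items()}

# Individuals not covered by the three groups above.
unaccounted = TOTAL - sum(counts.values())
print(percentages, unaccounted)
```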

Ancient DNA reveals the population origin of the Eastern Xinjiang.
Connecting with the Turpan Basin, the Eurasia steppe and the Gansu Corridor, the Eastern region of Xinjiang has played a significant role in the history of human migration, cultural developments, and communications between the East and the West. The population origin, migration and integration of this region have attracted extensive interest among scientists.
In order to research the population origin and movement of the Eastern Xinjiang, genetic polymorphisms studies of the Hami population were conducted. The Hami site is located in the East of Tian Mountain in Xinjiang, dating back to the Bronze-early Iron Age. Archaeological studies showed that the culture of the Hami site possessed features from both the East and the West. Ancient mtDNA analysis showed that A, C, D, F, G, Z and M7 of the Eastern maternal lines, and W, U2e, U4, and U5a of the Western maternal lines were identified. Tajima’s D test and mismatch distribution analysis show that the Hami population had experienced population expansion in recent time. The demographic analysis of haplogroups suggests that the populations of the Northwest China, Siberia and the Central Asia have contributed to the mtDNA gene pool of the Hami population.
Our study reveals the genetic structure of the early population in Eastern Xinjiang, and its relationships with other Eurasian populations. The results will provide valuable genetic information to further explore the population origin and migration of Xinjiang and Central Asia.

Analysis of Chuvash mtDNA points to Finno-Ugric origin.
A sample of 92 unrelated individuals from Chuvashia, Russia was sequenced for hypervariable region-I (HVR-I) of the mtDNA molecule. These data have been verified using RFLP analysis of the control region, revealing that the majority exhibit haplogroups H (31%), U (22%), and K (11%), which occur in high frequencies in western and northern Europe, but are virtually absent in Altaic or Mongolian populations. Multidimensional scaling (MDS) was used to examine distances between the Chuvash and reference populations from the literature. Neutrality tests (Tajima’s D (-1.43365) p<0.05, Fu’s FS (-25.50518) p<0.001) and mismatch analysis, which illustrates unimodal distribution, all suggest an expanding population.
The Chuvash speak a Turkic language that is not mutually intelligible to other extant Turkish groups, and their genetics are distinct from Turkic-speaking Altaic groups. Some scholars have suggested that they are remnants of the Golden Horde, while others have advocated that they are the products of admixture between Turkic and Finno-Ugric speakers who came into contact during the 13th century. Earlier genetic research using autosomal DNA markers indicated a Finno-Ugric origin for the Chuvash. This study examines uniparental mitochondrial DNA markers to better elucidate their origins. Results from this study maintain that the Chuvash are not related to Altaic or Mongolian populations along their maternal line, thus supporting the “Elite” hypothesis that their language was imposed by a conquering group —leaving Chuvash mtDNA largely of Eurasian origin. Their maternal markers appear to most closely resemble Finno-Ugric speakers rather than Turkic speakers.
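For readers unfamiliar with the neutrality test cited in this abstract, here is a minimal sketch of Tajima's D computed from summary statistics, using the standard formula from Tajima (1989). The example inputs are illustrative and are not the Chuvash data; a significantly negative D, like the value reported in the abstract, is usually read as an excess of rare variants, which is consistent with population expansion.

```python
import math

def tajimas_d(n, seg_sites, pi):
    """Tajima's D from sample size n, number of segregating sites,
    and mean number of pairwise differences pi."""
    if n < 4 or seg_sites < 1:
        # For n < 4 the variance term is zero, so D is undefined.
        raise ValueError("need n >= 4 and at least one segregating site")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    # Difference between the two estimators of theta, scaled by its standard deviation.
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - seg_sites / a1) / math.sqrt(variance)

# Illustrative inputs only (not from the abstract).
d_example = tajimas_d(n=10, seg_sites=16, pi=3.888889)
print(d_example)
```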

An ancient DNA perspective on the Iron Age “princely burials” from Baden-Wurttemberg, Germany.
During the Iron Age in Europe, fundamental social principles such as age, gender, status, and kinship were thought to have played an important role in the social structure of Late Hallstatt and Early Latene societies. In order to address the question of kinship relations represented in the Iron Age “princely burials” that are characterized by their rich material culture, we carried out genetic analysis of individuals associated with the Late Hallstatt culture from Baden-Wurttemberg, Germany. Bone specimens of thirty-eight skeletal remains were collected from five sites including Asperg Grafenbuhl, Muhlacker Heidenwaldle, Hirschlanden, Ludwigsburg, and Schodeingen. Specimens were subjected to DNA extraction and amplification under strict criteria for ancient DNA analysis. We successfully obtained mitochondrial DNA (mtDNA) control region sequences from seventeen individuals that showed different haplotypes, which were assigned to nine haplogroups including haplogroups H, I, K, U5, U7, W, and X2b. Despite the lack of information from nuclear DNA to infer familial relations, information from the mtDNA suggests an intriguing genetic composition of the Late Hallstatt burials. In particular, twelve distinct haplotypes from Asperg Grafenbuhl suggest a heterogeneous composition of maternal lineages represented in the “princely burials”. The results from this study provide clues to the social structure reflected in the burial patterns of the Late Hallstatt culture and implications on the genetic landscape during the Iron Age in Europe.

Genetic snapshot from ancient nomads of Xinjiang.
Nomads of the Eurasian steppes are known to have played an important role in the transfer of commodities and culture among East Asia, Central Asia, and Europe. However, the organization of nomadic societies and the initial population genetic composition of nomads are still poorly understood because of scarce archaeological materials and written history.
In this study, a genetic snapshot of nomads emerged from examining mitochondrial DNA and Y-chromosome DNA of 30 human remains from the Heigouliang (HGL) site in eastern Xinjiang, which dates to 2000 years ago and is associated with nomadic culture by archaeological studies. Mitochondrial DNA analysis showed that the HGL population included both East Eurasian haplogroups (A, C, D, G, F and Z) and West Eurasian haplogroups (H, K, J, M5 and H). The Eastern haplogroup component is dominant. The distribution frequency and Fst values of Eastern haplogroups indicated that the HGL population had close genetic affinity to nearby modern populations of Gansu and Qinghai, while those of Western haplogroups were similar to Mongolian and Siberian populations. The results implied that various maternal lineages were introduced into the HGL population. Regarding the Y-chromosomal DNA analysis, nearly all samples belonged to haplogroup Q, which is thought to be the mark of Northern Asian nomads. We identified paternal kinship among three individuals in the same tomb by Y-STR markers.
Combined with archaeological and anthropological investigations, we inferred that the gene flow from the neighboring regions was possibly associated with the expansion of Xiongnu Empire.

Vikings, merchants and pirates at the top of the world: Y-chromosomal signatures of recent and ancient migrations in the Faroe Islands.
The Faroe Islands are a small archipelago in the North Atlantic Ocean. With a current population of approximately 48,000 individuals and evidence of high levels of genetic drift, the Faroese are thought to have remained highly homogeneous since the islands were settled by Vikings around 900CE. Despite their geographic isolation, however, there is historical evidence that the Faroese experienced sporadic contact with other populations since the time of founding. Contact with Barbary pirates in the seventeenth century is documented in the Faroes; there is also the possibility of modern migrations to work in the highly productive fishery. This study set out to distinguish the signal of the original founders from later migrants. Eleven Y-chromosomal STR markers were scored for 139 Faroese males from three geographically dispersed islands. Haplotypes were analyzed using Athey's method to infer haplogroup. Median-joining networks within haplogroups were constructed to determine the phylogenetic relationships within the Faroese and between likely parental populations—Danish, Irish, and Norwegians. Dispersal patterns of individuals around Faroese haplogroups suggest different times of haplotype introduction to the islands. The most common haplogroup, R1a, consists of a large node with a tight network of neighbor haplotypes, such that 68% of individuals are one or two mutational steps away. This pattern may represent the early founder event of R1a in the Faroes. Other distributions, especially of non-Scandinavian haplotypes, document more recent introductions to the islands. The overall pattern is one of a strong founder effect followed by minor instances of later migrations.

Date estimates for major mitochondrial haplogroups in Yemen.
Yemen occupies a key location as the first stop for anatomically modern humans on a theoretical southern migration route out of Africa. If modern humans did pass through Yemen during the first migrations out of Africa and if they left modern-day descendants, we would expect to see deep divergences in the Yemeni mitochondrial gene tree. Alternatively, if modern humans passed through Yemen but did not leave modern-day descendants or if Yemen was not on the path of these ancient migrations, we would expect more recent dates to be associated with Yemeni mitochondrial haplogroups.
Using 44 previously sequenced mitochondrial genomes as well as 24 newly sequenced mitochondrial genomes from samples collected throughout Yemen, several methods were used to estimate divergence dates of major Yemeni haplogroups including L2, M, R0a and HV. Specifically, phylogenetic trees were generated using MrBayes and maximum likelihood methods. Bayesian and ρ statistic based methods were used to estimate dates of Yemeni haplogroups and these dates were compared with each other, previously published dates for these haplogroups, approximate dates of climatic change that might be expected to correlate with population expansions, and estimates based on archaeological and paleontological evidence for the first migrations out of Africa. These comparisons are intended to cover the range of possible haplogroup divergence dates with respect to the history of early modern humans in southern Arabia.
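As a rough illustration of the ρ statistic mentioned in the abstract: ρ is the average number of mutations separating the sampled sequences in a haplogroup from its inferred root haplotype, and an age estimate is obtained by multiplying ρ by an assumed time per mutation. The sketch below assumes that reading; the mutation counts and the calibration rate are hypothetical placeholders, not values from the study.

```python
def rho_age(mutation_counts, years_per_mutation):
    """Return (rho, age_in_years) for one haplogroup.

    mutation_counts: number of mutations from the root haplotype to each sampled sequence.
    years_per_mutation: assumed calibration (hypothetical here).
    """
    if not mutation_counts:
        raise ValueError("need at least one sequence")
    rho = sum(mutation_counts) / len(mutation_counts)
    return rho, rho * years_per_mutation

# Hypothetical inputs for illustration only.
counts_from_root = [2, 3, 1, 4, 2]
assumed_years_per_mutation = 3000.0

rho, age = rho_age(counts_from_root, assumed_years_per_mutation)
print(rho, age)
```

Because the result scales linearly with the assumed rate, the choice of calibration dominates the estimate, which is one reason the abstract compares ρ-based dates against Bayesian and previously published ones.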

January 29, 2012

Early Neandertals used red ochre

I have never quite understood the fascination of archaeologists with red ochre. As far as I can tell, this fascination stems from the fact that it is a pigment that survives over time, and that it was used by the earliest artists during the Upper Paleolithic. By extrapolation, its presence in earlier contexts has been interpreted as evidence of "art" or "symbolic behavior". So, a new paper that appeared in PNAS slightly demystifies the pigment by discovering its use more than 200 thousand years ago in Europe, probably by early Neandertals, and at the same time as its earliest traces in Africa.

In my opinion, this points to the conclusion that use of pigments, and red ochre in particular, is not a modern human innovation that was adopted (late) by the sister Neandertal taxon, but rather something that humans used long before the advent of "modernity", dating, perhaps, to H. heidelbergensis, the common ancestor of modern humans and Neandertals.

Of interest in that regard is the following tidbit of information from an unrelated source:
Sicevo Gorge - a canyon cut into the Kunivica plateau in southeastern Serbia - contains a series of caves, at least one of which has yielded evidence of human presence during the Ice Age of present-day Europe. In 2008, anthropologists excavating in a small cave uncovered a partial human lower jaw with three teeth.
"We were looking for Neanderthals," said Dr. Mirjana Roksandic, a participating palaeo-anthropologist with the University of Winnipeg (Canada) and a leading research team member. "But this is much better."
What they discovered was definitely a human that, at least in terms of morphology, predated the Neanderthal and may have had more in common physically with Homo erectus - thought by many scientists to be the precursor to both Neanderthals and modern humans. Recent tests conducted by Dr. Norbert Mercier at the University of Bordeaux (France) produced a date of "older than" 113,000 years BP - long before modern humans in present-day Europe - and the fossil could be substantially older.
So, I would not assume that 200-250ky ago in Europe was definitely "early Neandertals".

John Hawks covers the paper in detail.

PNAS doi: 10.1073/pnas.1112261109

Use of red ochre by early Neandertals

Wil Roebroeks et al.


The use of manganese and iron oxides by late Neandertals is well documented in Europe, especially for the period 60–40 kya. Such finds often have been interpreted as pigments even though their exact function is largely unknown. Here we report significantly older iron oxide finds that constitute the earliest documented use of red ochre by Neandertals. These finds were small concentrates of red material retrieved during excavations at Maastricht-Belvédère, The Netherlands. The excavations exposed a series of well-preserved flint artifact (and occasionally bone) scatters, formed in a river valley setting during a late Middle Pleistocene full interglacial period. Samples of the reddish material were submitted to various forms of analyses to study their physical properties. All analyses identified the red material as hematite. This is a nonlocal material that was imported to the site, possibly over dozens of kilometers. Identification of the Maastricht-Belvédère finds as hematite pushes the use of red ochre by (early) Neandertals back in time significantly, to minimally 200–250 kya (i.e., to the same time range as the early ochre use in the African record).


January 28, 2012

Chris Stringer, "Rethinking Out of Africa"

Over at the Edge:
I'm thinking a lot about species concepts as applied to humans, about the "Out of Africa" model, and also looking back into Africa itself. I think the idea that modern humans originated in Africa is still a sound concept. Behaviorally and physically, we began our story there, but I've come around to thinking that it wasn't a simple origin. Twenty years ago, I would have argued that our species evolved in one place, maybe in East Africa or South Africa. There was a period of time in just one place where a small population of humans became modern, physically and behaviourally. Isolated and perhaps stressed by climate change, this drove a rapid and punctuational origin for our species. Now I don’t think it was that simple, either within or outside of Africa.
There is a 44' video at the site (which I haven't viewed yet).

UPDATE: at 10:45, he suggests that Broken Hill is much younger than "many of us think". This seems exceptionally important, since BH (or Kabwe) is thought to be the African branch of H. heidelbergensis and a precursor to the later H. sapiens. Given the current extent of dates proposed for the specimen, it seems almost certain that "much younger" means post-Omo, and hence: (i) one possible precursor for H. sapiens disappears from Africa, and (ii) one additional post-H. sapiens archaic hominin is added.

January 27, 2012

The Arabian cradle (Fernandes et al. 2012)

I have written about Out-of-Arabia before. It is important to remember, when discussing the prehistory of Arabia in terms of the modern inhabitants, that the peninsula undergoes periods of extreme aridity followed by periods of relative humidity. Hence, unlike other regions of the world where continuous occupation can be argued due to a fairly stable climate, this is not the case for Arabia.

This observation is important because when looking at modern populations we cannot a priori assume the survival of the most ancient inhabitants. Nonetheless, it can be well argued that Homo sapiens is an extremely adaptable species: not only did it spread throughout the world in a geological blink of an eye over the last 50 thousand years or so, but also persisted throughout most of the world, coming to occupy nearly every corner of the planet.

So, even though hyper-arid periods may have driven away most people from desert areas, perhaps they did not drive away everybody. There may yet be relics of ancient populations to be found. This is exactly what a new paper proposes: that Arabia possesses extremely old mtDNA lineages within the major macro-haplogroup N, dating to about 60,000 years ago. This is quite close to the estimated time depth of haplogroup L3 which unites many Africans with the Eurasians belonging to macrohaplogroups M and N.

The mainstream understanding of what happened -according to most geneticists- is that modern humans began spreading from Africa at around that time, about 60-70 thousand years ago. In contrast, archaeologists have found indisputable evidence (palaeoanthropological or archaeological) of modern humans in Asia from before 100 thousand years ago, stretching from the Levant to the southern parts of Arabia.

There are two possibilities:

  • The pre-70ka modern humans in Asia left absolutely no traces of mtDNA, and all of the extant mtDNA in Asia is derived from post-70ka Africans. Hence, the pre-70ka modern humans in Asia were the descendants of failed exodi.
  • The people who expanded post-70ka in Asia were descended from people who lived in Asia before 100ka, descendants of successful exodi perhaps associated with the Mount Carmel hominins or the recently discovered Nubian Complex.
I am rather in favor of the second hypothesis; the authors of the current paper favor the first. It seems unnatural that pre-70ka modern humans in Asia would just vanish: why would they? They, apparently, lived across a vast area, and were bearers of technologies that were no worse than contemporaneous African cultures. Moreover, there is simply no archaeological evidence about population movements originating in Africa at 70-60ka.

However, if the second hypothesis is true, there is a problem: haplogroup L3 is dated to 70ka, so if the expansion associated with it started in Asia, that means that there must have been substantial back-migration of L3-related lineages to Africa. I don't see any major problem with that hypothesis, but it is true that many scientists are reluctant to incorporate extensive back-migration to Africa into their models. At present it has not been possible to determine to what extent genetic diversity in Africa is due to great antiquity vs. admixture of divergent human populations, which I have called Afrasian (related to Eurasians) and Palaeoafrican. If L3 did originate in Africa, then the conclusion of a recent African exodus is inescapable.

The major contribution of the current paper is that it fixes a major human expansion Out-of-Arabia at very close to 60ka. Whether this expansion originated from transient Out-of-Africans who had recently exited Africa, or from long settled populations of Asia (prior to 100ka) remains to be seen.

From the paper:
The presence of archaeological sites in the Gulf basin demonstrates a long tradition of human occupation.9 However, neither direct cultural influences from the Levant nor any African influence has been detected in the Upper Palaeolithic (Late Pleistocene) lithics observed in eastern Arabia, pointing to a local development of cultural techniques.9,47 Curiously, however, the fact that some of the branches studied here include deep lineages in eastern Africa (haplogroups I, N1a, and N1f) shows that migration back to Africa occurred a number of times between 15 and 40 ka ago. 
The hypothesized Gulf Oasis9 appears to be the most likely locus of the earliest branching of haplogroup N, including the three relict basal N(xR) haplogroups studied here, as well as the major Eurasian haplogroup R. Time estimates, frequencies, and genetic diversities reported here for these haplogroups are often similar between the Levant and Arabia, challenging the hypothesis of longterm isolation between these two regions. The other two refugia identified in the south and southwest of the Peninsula might have acted as a corridor for migrations west, back toward eastern Africa. Y chromosome microsatellite diversity in the Arabian Peninsula has suggested that Dubai and Oman share genetic affinities with other Near Eastern populations, whereas Saudi Arabia and Yemen show signs of greater isolation (although for fast-evolving microsatellites, these differences might reflect more recent events).4

The American Journal of Human Genetics, 26 January 2012 doi:10.1016/j.ajhg.2011.12.010

The Arabian Cradle: Mitochondrial Relicts of the First Steps along the Southern Route out of Africa

Verónica Fernandes et al.

A major unanswered question regarding the dispersal of modern humans around the world concerns the geographical site of the first human steps outside of Africa. The “southern coastal route” model predicts that the early stages of the dispersal took place when people crossed the Red Sea to southern Arabia, but genetic evidence has hitherto been tenuous. We have addressed this question by analyzing the three minor west-Eurasian haplogroups, N1, N2, and X. These lineages branch directly from the first non-African founder node, the root of haplogroup N, and coalesce to the time of the first successful movement of modern humans out of Africa, ∼60 thousand years (ka) ago. We sequenced complete mtDNA genomes from 85 Southwest Asian samples carrying these haplogroups and compared them with a database of 300 European examples. The results show that these minor haplogroups have a relict distribution that suggests an ancient ancestry within the Arabian Peninsula, and they most likely spread from the Gulf Oasis region toward the Near East and Europe during the pluvial period 55–24 ka ago. This pattern suggests that Arabia was indeed the first staging post in the spread of modern humans around the world.


fineSTRUCTURE paper (Lawson et al. 2012)


PLoS Genet 8(1): e1002453. doi:10.1371/journal.pgen.1002453

Inference of Population Structure using Dense Haplotype Data

Daniel John Lawson et al.

The advent of genome-wide dense variation data provides an opportunity to investigate ancestry in unprecedented detail, but presents new statistical challenges. We propose a novel inference framework that aims to efficiently capture information on population structure provided by patterns of haplotype similarity. Each individual in a sample is considered in turn as a recipient, whose chromosomes are reconstructed using chunks of DNA donated by the other individuals. Results of this “chromosome painting” can be summarized as a “coancestry matrix,” which directly reveals key information about ancestral relationships among individuals. If markers are viewed as independent, we show that this matrix almost completely captures the information used by both standard Principal Components Analysis (PCA) and model-based approaches such as STRUCTURE in a unified manner. Furthermore, when markers are in linkage disequilibrium, the matrix combines information across successive markers to increase the ability to discern fine-scale population structure using PCA. In parallel, we have developed an efficient model-based approach to identify discrete populations using this matrix, which offers advantages over PCA in terms of interpretability and over existing clustering algorithms in terms of speed, number of separable populations, and sensitivity to subtle population structure. We analyse Human Genome Diversity Panel data for 938 individuals and 641,000 markers, and we identify 226 populations reflecting differences on continental, regional, local, and family scales. We present multiple lines of evidence that, while many methods capture similar information among strongly differentiated groups, more subtle population structure in human populations is consistently present at a much finer level than currently available geographic labels and is only captured by the haplotype-based approach. 
The software used for this article, ChromoPainter and fineSTRUCTURE, is available from http://www.paintmychromosomes.com/.

January 26, 2012

Y chromosomes of West African descendants (Torres et al. 2012)

PLoS ONE 7(1): e29687. doi:10.1371/journal.pone.0029687

Y Chromosome Lineages in Men of West African Descent

Jada Ben Torres et al.

The early African experience in the Americas is marked by the transatlantic slave trade from ~1619 to 1850 and the rise of the plantation system. The origins of enslaved Africans were largely dependent on European preferences as well as the availability of potential laborers within Africa. Rice production was a key industry of many colonial South Carolina low country plantations. Accordingly, rice plantations owners within South Carolina often requested enslaved Africans from the so-called “Grain Coast” of western Africa (Senegal to Sierra Leone). Studies on the African origins of the enslaved within other regions of the Americas have been limited. To address the issue of origins of people of African descent within the Americas and understand more about the genetic heterogeneity present within Africa and the African Diaspora, we typed Y chromosome specific markers in 1,319 men consisting of 508 west and central Africans (from 12 populations), 188 Caribbeans (from 2 islands), 532 African Americans (AAs from Washington, DC and Columbia, SC), and 91 European Americans. Principal component and admixture analyses provide support for significant Grain Coast ancestry among African American men in South Carolina. AA men from DC and the Caribbean showed a closer affinity to populations from the Bight of Biafra. Furthermore, 30–40% of the paternal lineages in African descent populations in the Americas are of European ancestry. Diverse west African ancestries and sex-biased gene flow from EAs has contributed greatly to the genetic heterogeneity of African populations throughout the Americas and has significant implications for gene mapping efforts in these populations.


January 24, 2012

Paleolithic Siberian domestic dog

From the press release:
A 33,000-year-old dog skull unearthed in a Siberian mountain cave presents some of the oldest known evidence of dog domestication and, together with an equally ancient find in a cave in Belgium, indicates that modern dogs may be descended from multiple ancestors.
I've been following the dog domestication saga for a few years now; it seems that geneticists are in general agreement that domestic dogs share a fairly recent ancestry from East Asia, although there are some lingering controversies about the role of other dogs in the formation of modern breeds. On the other hand, there are now two cases of Upper Paleolithic domesticated dogs, from both Belgium and Siberia. I can't wrap my head around the idea that dogs that were domesticated more than 30 thousand years ago, and would presumably have had plenty of time to adapt, would be totally replaced.

It would be great if we could get some Paleolithic dog DNA for comparison, as this would show whether some modern dog breeds are differentially affiliated to Paleolithic dogs, which would support a "multiregional evolution of domestic dogs".

PLoS ONE 6(7): e22821. doi:10.1371/journal.pone.0022821

A 33,000-Year-Old Incipient Dog from the Altai Mountains of Siberia: Evidence of the Earliest Domestication Disrupted by the Last Glacial Maximum

Nikolai D. Ovodov et al.

Virtually all well-documented remains of early domestic dog (Canis familiaris) come from the late Glacial and early Holocene periods (ca. 14,000–9000 calendar years ago, cal BP), with few putative dogs found prior to the Last Glacial Maximum (LGM, ca. 26,500–19,000 cal BP). The dearth of pre-LGM dog-like canids and incomplete state of their preservation has until now prevented an understanding of the morphological features of transitional forms between wild wolves and domesticated dogs in temporal perspective.

Methodology/Principal Finding
We describe the well-preserved remains of a dog-like canid from the Razboinichya Cave (Altai Mountains of southern Siberia). Because of the extraordinary preservation of the material, including skull, mandibles (both sides) and teeth, it was possible to conduct a complete morphological description and comparison with representative examples of pre-LGM wild wolves, modern wolves, prehistoric domesticated dogs, and early dog-like canids, using morphological criteria to distinguish between wolves and dogs. It was found that the Razboinichya Cave individual is most similar to fully domesticated dogs from Greenland (about 1000 years old), and unlike ancient and modern wolves, and putative dogs from Eliseevichi I site in central Russia. Direct AMS radiocarbon dating of the skull and mandible of the Razboinichya canid conducted in three independent laboratories resulted in highly compatible ages, with average value of ca. 33,000 cal BP.

The Razboinichya Cave specimen appears to be an incipient dog that did not give rise to late Glacial – early Holocene lineages and probably represents wolf domestication disrupted by the climatic and cultural changes associated with the LGM. The two earliest incipient dogs from Western Europe (Goyet, Belgium) and Siberia (Razboinichya), separated by thousands of kilometers, show that dog domestication was multiregional, and thus had no single place of origin (as some DNA data have suggested) and subsequent spread.


January 20, 2012

Archaic DNA data mining for dummies

I have repeatedly stressed how full genome sequencing will allow us to detect archaic DNA in modern humans, so I thought of writing a simple post where I lay out the rationale behind my conviction.

The age of the microarray

Microarrays test for a few 10^5 variants in the human genome. Conceptually, we can view the difference between two individuals as follows:


As you can see, these two individuals differ in a couple of locations tested by the microarray and are the same in one.

The age of the full genome

What will happen when we use full genomes? All the unknown positions in the two sequences will be known.

This may end up looking like this (Possibility #1):


i.e., the sites that were polymorphic in the microarray were the only ones that were polymorphic, and the rest of the sequence appears like a carbon copy of each other.

Or, it may end up looking like this (Possibility #2):


i.e., there are additional differences between the two individuals that were not captured by the microarray.

In the second scenario, there are 6 mutations between the two sequences compared to only 2 in the first one. So, the two sequences share a much older common ancestor compared to the first scenario.

By scanning stretches of DNA in full genomes, it is possible to identify regions where the number of mutations between two sequences is so large (expressed e.g., as a fraction of the number of differences between humans and chimps), that the common ancestor must have lived a very long time ago, even millions of years ago.
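To make the counting argument concrete, here is a minimal Python sketch. It counts differences between two aligned sequences in sliding windows and converts the per-site difference into a very rough age by scaling against human-chimp divergence. The 1.2% human-chimp divergence and the 6-million-year split time are round-number assumptions for illustration, not values from this post, and all function names are made up.

```python
def count_differences(seq_a, seq_b):
    """Number of positions at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


def window_differences(seq_a, seq_b, window, step):
    """Return (start, differences) for each sliding window."""
    return [
        (start, count_differences(seq_a[start:start + window],
                                  seq_b[start:start + window]))
        for start in range(0, len(seq_a) - window + 1, step)
    ]


def rough_age_years(differences, length,
                    human_chimp_divergence=0.012,      # assumed per-site value
                    human_chimp_split_years=6_000_000):  # assumed split time
    """Scale per-site divergence by the assumed human-chimp yardstick."""
    per_site = differences / length
    return per_site / human_chimp_divergence * human_chimp_split_years


if __name__ == "__main__":
    a = "ACGTACGTACGTACGT"  # toy sequences, not real data
    b = "ACGAACGTACGTTCGA"
    for start, diffs in window_differences(a, b, window=8, step=4):
        print(start, diffs)
```

Windows whose scaled age sits far above the genomewide average are the ones worth a closer look; the scaling itself is crude and ignores rate variation along the genome.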

In some cases, we will be able to directly compare these sequences to actual archaic hominins, which is how Mendez et al. were able to infer archaic introgression from a Denisova-like hominin into Melanesians. But, even in the absence of archaic DNA, a good enough case of archaic admixture can be made.

Balancing selection

Balancing selection is one mechanism whereby two very different sequences could be maintained for a very long time in the human population. The major histocompatibility complex is one part of the human genome where this is believed to take place.

Balancing selection occurs when heterozygotes have a selective advantage over homozygotes. In "regular" evolution, either due to drift or to selection, one allele drives another one to extinction either due to simple chance (drift) or due to an advantage (directional selection). In balancing selection the two alleles are maintained because people who have both of them (heterozygotes) outbreed people who have only one or the other (homozygotes).
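The contrast between the two regimes can be sketched with the textbook one-locus, two-allele selection recursion. The fitness values below are arbitrary assumptions chosen only to show the behavior, the function names are hypothetical, and the model is deterministic (it ignores drift entirely).

```python
def next_frequency(p, w_aa, w_ab, w_bb):
    """Frequency of allele A after one generation of selection.

    p is the current frequency of A; w_aa, w_ab, w_bb are the relative
    fitnesses of the AA, AB and BB genotypes.
    """
    q = 1.0 - p
    mean_fitness = p * p * w_aa + 2 * p * q * w_ab + q * q * w_bb
    return (p * p * w_aa + p * q * w_ab) / mean_fitness


def frequency_after(p0, generations, w_aa, w_ab, w_bb):
    """Iterate the recursion for a number of generations."""
    p = p0
    for _ in range(generations):
        p = next_frequency(p, w_aa, w_ab, w_bb)
    return p


if __name__ == "__main__":
    # Heterozygote advantage (assumed fitnesses): both alleles persist.
    print(frequency_after(0.01, 2000, w_aa=0.9, w_ab=1.0, w_bb=0.8))
    # Directional selection (assumed fitnesses): allele A heads to fixation.
    print(frequency_after(0.01, 2000, w_aa=1.0, w_ab=0.95, w_bb=0.9))
```

With heterozygote advantage the allele settles at an intermediate frequency (2/3 for these particular values) no matter where it starts, whereas under directional selection one allele drives the other out.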

It is, however, possible to distinguish between sequences maintained by balancing selection and those that are not. For example, one can examine the functional consequence of polymorphism, or survey the geographical distribution of the variant sequences.


A different issue is that of recombination. Recombination slices up genome sequences and stitches up new sequences that are a combination of those inherited from one's father and mother. Going back to our previous example:


Now consider this:


You can see that now the two sequences appear more similar to each other. This could in fact be because a stretch of DNA (ATTA in blue) from the top sequence has become stitched onto the bottom one.

If there has been archaic admixture in modern humans, we cannot expect to find very long stretches of archaic DNA. Rather, we expect to find a pastiche of archaic and modern sequence due to multiple generations of recombination. For really old admixture events recombination may obliterate all traces of admixture altogether!

This is why full genome sequencing is important, since it allows us to look at arbitrarily small lengths of DNA. Archaic sequences of various lengths may lurk in-between the test points covered by microarrays, and by comparing full genomes we have a chance of uncovering them.
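A back-of-the-envelope sketch of the recombination argument: under a simple model in which breakpoints accumulate at a constant rate, the expected length of an introgressed stretch after g generations is roughly 1/(r·g), where r is the per-base-pair recombination rate. The rate of 1e-8 per base pair per generation, the 25-year generation time, the 500,000-marker array and the 3-billion-bp genome are all round-number assumptions for illustration, not figures from this post.

```python
def generations_since(years, years_per_generation=25):
    """Convert years to generations (generation time is an assumption)."""
    return years / years_per_generation


def expected_tract_length_bp(generations, recomb_rate_per_bp=1e-8):
    """Rough expected length of an unbroken introgressed tract, in bp."""
    return 1.0 / (recomb_rate_per_bp * generations)


def expected_markers_per_tract(tract_length_bp,
                               n_markers=500_000,              # assumed array size
                               genome_length_bp=3_000_000_000):  # assumed genome size
    """Average number of evenly spaced array markers falling in a tract."""
    return tract_length_bp * n_markers / genome_length_bp


if __name__ == "__main__":
    for years in (50_000, 200_000, 1_000_000):
        g = generations_since(years)
        length = expected_tract_length_bp(g)
        print(years, round(length), round(expected_markers_per_tract(length), 1))
```

Under these assumptions, tracts from older admixture events shrink until only a handful of array markers, or none, fall inside them, which is the sense in which archaic sequence can hide between the test points.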

It may not, however, be possible to detect archaic admixture in very small lengths, because of statistics: 10 mutations in a length of 100 and 100 mutations in a length of 1,000 both give the same age estimate, but the latter has a much tighter confidence interval.
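The point about confidence intervals can be illustrated with a small sketch that treats the mutation count as Poisson and uses a normal approximation for the interval. The approximation is crude for a count as small as 10, so read the numbers as indicative only; the function name is hypothetical.

```python
import math


def rate_with_interval(mutations, length, z=1.96):
    """Per-site mutation rate with an approximate 95% interval.

    Assumes the count is Poisson, so its standard deviation is
    sqrt(mutations); the interval is a normal approximation.
    """
    rate = mutations / length
    half_width = z * math.sqrt(mutations) / length
    return rate, max(0.0, rate - half_width), rate + half_width


if __name__ == "__main__":
    for mutations, length in ((10, 100), (100, 1000)):
        rate, low, high = rate_with_interval(mutations, length)
        print(mutations, length, round(rate, 3), round(low, 3), round(high, 3))
```

Both cases give the same point estimate, but the interval for the shorter stretch is wider by a factor of the square root of 10, i.e. a bit more than three times as wide.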


Full genome sequencing will allow us to detect archaic DNA in modern humans by identifying regions of DNA that have common ancestors that are much older than the genomewide average. Some of these regions may be explained by balancing selection, while traces of others may have been lost by recombination. Nonetheless, not all of the evidence will have disappeared (especially for events in the last 100-200 thousand years), so expect it to surface sooner or later.

Introgression of archaic haplotype at OAS1 in Melanesians (Mendez et al. 2012)

It seems that Michael Hammer made good on his promise for 2012: "This year, we should be able to confirm what we found and go way beyond that." In a new paper, conclusive evidence is presented about introgression of an archaic sequence into Melanesian populations. The argument is as follows:

  • Melanesians are more diverse in that region than Africans.
  • The common ancestor of the "archaic" and "African" haplotypes lived >3 million years ago.
  • The "archaic" haplotype matches the ancient DNA from the Denisova hominin.
  • Balancing selection (which can sometimes maintain extremely old polymorphism) is not reasonable in this case, because it would need to maintain both "archaic" and "African" haplotypes for a long time, but then (inexplicably) would continue to operate in Melanesia and cease to operate everywhere else.

Notice that once again, this is based on resequencing a small region of the genome. This is why I am all the more confident in my prediction that the advent of full genome sequencing will uncover more archaic admixture in humans. It may not always be possible to use all the criteria listed above to confirm this admixture (since we do not and cannot have ancient DNA from all the archaic hominins that once roamed the planet), but the remaining ones will suffice to make a very good case for introgression.

What I find particularly interesting is that Mendez et al. re-iterate a few times that genomewide averages admit of different explanations:

Full genome comparisons of the Neandertal and Denisova draft genomes with modern human sequences have revealed different amounts of shared ancestry between each of these archaic forms and anatomically modern human (AMH) populations from different geographic regions. For example, a higher proportion of SNPs was shared between non-African and Neandertal, and between Melanesian and the Denisova genomes, than between either Neandertal or Denisova and extant African genomes (Green et al. 2010; Reich et al. 2010). An intriguing possibility is that these patterns result from introgression of archaic genes into AMH populations in Eurasia. However, this SNP sharing pattern could also be explained by ancestral population structure in Africa (i.e., without the need to posit introgression). For example, if non-Africans and the ancestors of Neandertals descend from the same deme in a subdivided African population, and this structure persisted with low levels of gene flow among African residents until the ancestors of non-Africans migrated into Eurasia, then we would expect more SNP sharing between non-Africans and Neandertals (Durand et al. 2011). 
While genome-wide comparisons detect more sequence agreement between non-African and Neandertal genomes, and between Melanesian and Denisova genomes, the specific loci exhibiting these signals have not yet been identified. Furthermore, current analyses do not elucidate the relative roles of recent introgression versus long-term population structure in Africa in explaining these patterns.

The current paper does a good job at showing how in one particular region archaic introgression into Melanesians is indeed the best explanation for the evidence. But, the fact that the authors seem to re-iterate the possibility of African population structure and repeatedly caution against using patterns of genomewide sharing between modern and archaic humans is a strong hint that there are more things to come on the topic.

We should remember that the widely-circulated estimates of Neandertal->Eurasian introgression are based on genomewide averages. It is true that Reich et al. (2010) identified 13 regions of potential Neandertal introgression, which together make up a very small portion of the human genome. So, the jury is out on whether African population structure or Neandertal introgression is responsible for most of the genomewide pattern.

What you can be sure of is that many scientists are busy lining up full genomes from different human populations as we speak, and finding plenty of regions where haplotypes of extremely old divergence times co-exist in our species. We will probably learn more about such efforts during 2012.

Mol Biol Evol (2012) doi: 10.1093/molbev/msr301

Global genetic variation at OAS1 provides evidence of archaic admixture in Melanesian populations

Fernando L. Mendez, Joseph C. Watkins and Michael F. Hammer

Recent analysis of DNA extracted from two Eurasian forms of archaic human show that more genetic variants are shared with humans currently living in Eurasia than with anatomically modern humans in sub-Saharan Africa. While these genome-wide average measures of genetic similarity are consistent with the hypothesis of archaic admixture in Eurasia, analyses of individual loci exhibiting the signal of archaic introgression are needed to test alternative hypotheses and investigate the admixture process. Here, we provide a detailed sequence analysis of the innate immune gene, OAS1, a locus with a divergent Melanesian haplotype that is very similar to the Denisova sequence from the Altai region of Siberia. We re-sequenced a 7 kb region encompassing the OAS1 gene in 88 individuals from 6 Old World populations (San, Biaka, Mandenka, French Basque, Han Chinese, and Papua New Guineans) and discovered previously unknown and ancient genetic variation. The 5' region of this gene has unusual patterns of diversity, including 1) higher levels of nucleotide diversity in Papuans than in sub-Saharan Africans, 2) very deep ancestry with an estimated time to the most recent common ancestor of >3 million years, and 3) a basal branching pattern with Papuan individuals on either side of the rooted network. A global geographic survey of >1500 individuals showed that the divergent Papuan haplotype is nearly restricted to populations from eastern Indonesia and Melanesia. Polymorphic sites within this haplotype are shared with the draft Denisova genome over a span of ∼90 kb and are associated with an extended block of linkage disequilibrium, supporting the hypothesis that this haplotype introgressed from an archaic source that likely lived in Eurasia.


January 19, 2012

Shortage of female math geniuses not due to "stereotype threat"

Men are over-represented at the high end of math performance: there are more male math geniuses than female ones.

A theory that was proposed to explain that fact is that of stereotype threat. According to this theory, there is a stereotype in society that "women are bad in math"; women internalize this stereotype and lose confidence about their math abilities, and so they tend to perform sub-optimally in math tests, hence rendering the idea of "women are bad in math" a self-fulfilling prophecy.

This new study demonstrates that much of the literature that has accumulated around the idea of a "stereotype threat" can be relegated to the trash bin, and those who hope that fighting the stereotype will lead to more females joining the mathematical elite have their work cut out for them.

A video on the topic by the first author:

Review of General Psychology, Jan 16, 2012, No Pagination Specified. doi: 10.1037/a0026617

Can stereotype threat explain the gender gap in mathematics performance and achievement?

Stoet, Gijsbert; Geary, David C.

Men and women score similarly in most areas of mathematics, but a gap favoring men is consistently found at the high end of performance. One explanation for this gap, stereotype threat, was first proposed by Spencer, Steele, and Quinn (1999) and has received much attention. We discuss merits and shortcomings of this study and review replication attempts. Only 55% of the articles with experimental designs that could have replicated the original results did so. But half of these were confounded by statistical adjustment of preexisting mathematics exam scores. Of the unconfounded experiments, only 30% replicated the original. A meta-analysis of these effects confirmed that only the group of studies with adjusted mathematics scores displayed the stereotype threat effect. We conclude that although stereotype threat may affect some women, the existing state of knowledge does not support the current level of enthusiasm for this as a mechanism underlying the gender gap in mathematics. We argue there are many reasons to close this gap, and that too much weight on the stereotype explanation may hamper research and implementation of effective interventions.


January 18, 2012

Manifold Learning for Human Population Structure Studies (Siu et al. 2012)

Software implementing this should be available here.

PLoS ONE 7(1): e29901. doi:10.1371/journal.pone.0029901

Manifold Learning for Human Population Structure Studies

Hoicheong Siu et al.

The dimension of the population genetics data produced by next-generation sequencing platforms is extremely high. However, the “intrinsic dimensionality” of sequence data, which determines the structure of populations, is much lower. This motivates us to use locally linear embedding (LLE) which projects high dimensional genomic data into low dimensional, neighborhood preserving embedding, as a general framework for population structure and historical inference. To facilitate application of the LLE to population genetic analysis, we systematically investigate several important properties of the LLE and reveal the connection between the LLE and principal component analysis (PCA). Identifying a set of markers and genomic regions which could be used for population structure analysis will provide invaluable information for population genetics and association studies. In addition to identifying the LLE-correlated or PCA-correlated structure informative marker, we have developed a new statistic that integrates genomic information content in a genomic region for collectively studying its association with the population structure and LASSO algorithm to search such regions across the genomes. We applied the developed methodologies to a low coverage pilot dataset in the 1000 Genomes Project and a PHASE III Mexico dataset of the HapMap. We observed that 25.1%, 44.9% and 21.4% of the common variants and 89.2%, 92.4% and 75.1% of the rare variants were the LLE-correlated markers in CEU, YRI and ASI, respectively. This showed that rare variants, which are often private to specific populations, have much higher power to identify population substructure than common variants. The preliminary results demonstrated that next generation sequencing offers a rich resources and LLE provide a powerful tool for population structure analysis.


January 17, 2012

Comparison of MCLUST with fineSTRUCTURE

Dan Lawson has written up a comparison of fineSTRUCTURE and MCLUST and a PDF with further details. Dan first talked to me about doing this comparison in December, and it's unfortunate that I didn't try my new fastIBD method in time, so it could also be included in the analysis.

There are two parts to this type of structure inference:

  • Deriving a matrix of relationships between individuals (using PLINK IBS, ChromoPainter, or fastIBD, or ...)
  • Clustering these relationships (using fineSTRUCTURE, MCLUST, or ...)
Assessing the quality of the inferred structure is tricky, since these linkage-based methods tend to infer clusters that are finer-scaled than the level of population labels. It's not easy to know what e.g., a couple of Sardinian clusters mean if one does not have finer-level details about the origin of different Sardinian individuals. I tend to take a pragmatic view, that if clusters correspond to real-world phenomena (as the Iberian or Armenian ones do), then they are of value.
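The two-part pipeline described above can be sketched on toy data. The first step below computes a pairwise identity-by-state (IBS) similarity from 0/1/2 genotype codes; the second step groups individuals by a simple similarity threshold. The threshold grouping is only a stand-in to show where a clustering method plugs in: it is not MCLUST or fineSTRUCTURE, and the genotypes, threshold and function names are all made up for illustration.

```python
def ibs_similarity(g1, g2):
    """Proportion of alleles shared identical-by-state across markers.

    Genotypes are coded 0, 1 or 2 (copies of one allele), so each marker
    contributes 2 - |g1 - g2| shared alleles out of 2.
    """
    return sum(2 - abs(a - b) for a, b in zip(g1, g2)) / (2 * len(g1))


def similarity_matrix(genotypes):
    """Step 1: matrix of pairwise relationships between individuals."""
    n = len(genotypes)
    return [[ibs_similarity(genotypes[i], genotypes[j]) for j in range(n)]
            for i in range(n)]


def threshold_clusters(matrix, threshold):
    """Step 2 (stand-in): link individuals whose similarity >= threshold."""
    n = len(matrix)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and matrix[i][j] >= threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


if __name__ == "__main__":
    toy_genotypes = [  # four hypothetical individuals, six markers
        [0, 0, 1, 2, 2, 0],
        [0, 0, 1, 2, 2, 1],
        [2, 2, 1, 0, 0, 2],
        [2, 2, 0, 0, 0, 2],
    ]
    print(threshold_clusters(similarity_matrix(toy_genotypes), threshold=0.8))
```

Keeping the two steps separate is what makes it possible to swap in a different relationship matrix or a different clustering method and compare the results on the same data.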

The analysis of Lawson and Falush seems to identify the main issues quite well: MCLUST is much faster, as good, but requires tuning for the number of dimensions; fineSTRUCTURE on the other hand does not require such tuning, is slower, but requires a prior (which is good or bad depending on whether you're a Bayesian or not). Both clustering algorithms perform better in the presence of linkage information than in the absence thereof.

One additional issue that MCLUST seems good at is its ability to detect clusters of varying shape, and hence discover recently admixed populations that form such clusters in PCA/MDS space. The simulated data of Lawson & Falush assume a biological model of splits/expansions, so it is not clear how their approach would handle lateral gene flow that results in "stretched" clusters of individuals.

I would love to see many different methods evaluated on a standard real-world dataset. Running ChromoPainter/fineSTRUCTURE is computationally very expensive, but I will try my hand at the Stanford HGDP set and the No1stOr2ndDegreeRelatives subset thereof, which consists of 940 individuals. If anyone wants to try alternative methods on the same real-world set, drop me an e-mail or write a comment, and I'll link to your analysis.

PS: I also have to applaud the quick response of Lawson and Falush to my idea of comparing MCLUST and fineSTRUCTURE. It is exactly the type of "open science" that I am a strong advocate for.

January 16, 2012

Phased Omni haplotypes with ShapeIT

The working directory of the 1000 Genomes ftp site contains phased haplotypes for 2,123 individuals from the 1000 Genomes Project (US/Europe). The data were phased with ShapeIT, which I've recently played with, and which I can recommend as fairly user-friendly, high-quality phasing software.

You can use vcftools to convert the data into PLINK format, which appears to be quite efficient (but use --plink-tped) compared to doing it on the single file I previously linked to. So, it's also a way to get 1000Genomes data into the more useful PLINK format, and it's pre-phased as a bonus.
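The conversion step can be sketched as a small Python wrapper around vcftools. This is only a sketch under stated assumptions: the file name and output prefix in the usage note are hypothetical, and the wrapper merely assembles the `--gzvcf`, `--plink-tped`, `--out` and (optionally) `--chr` options before handing them to whatever vcftools binary is on the path.

```python
import shutil
import subprocess


def vcftools_tped_command(vcf_gz, out_prefix, chromosome=None):
    """Build a vcftools command that writes PLINK transposed (tped/tfam) files."""
    cmd = ["vcftools", "--gzvcf", vcf_gz, "--plink-tped", "--out", out_prefix]
    if chromosome is not None:
        # Restrict the conversion to a single chromosome.
        cmd += ["--chr", str(chromosome)]
    return cmd


def convert(vcf_gz, out_prefix, chromosome=None):
    """Run the conversion if vcftools is installed; otherwise only report the command."""
    cmd = vcftools_tped_command(vcf_gz, out_prefix, chromosome)
    if shutil.which("vcftools") is None:
        print("vcftools not found; would run:", " ".join(cmd))
        return None
    return subprocess.run(cmd, check=True)
```

For example, `convert("chr22.phased.vcf.gz", "chr22", chromosome=22)` (hypothetical file names) would be expected to leave `chr22.tped` and `chr22.tfam` next to the input, ready for PLINK.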

January 13, 2012

Napoleon Bonaparte belonged to haplogroup E1b1b1c1* (E-M34*)

A previous paper on his mtDNA found that it was H. A previous study found that Hitler also belonged to haplogroup E1b1b. So, expect plenty of war and mayhem if a new European leader emerges with a haplogroup E1b1b chromosome -- and, yes, I'm joking.

Journal of Molecular Biology Research Vol 1, No 1 (2011)

Haplogroup of the Y Chromosome of Napoléon the First

Gerard Lucotte, Thierry Thomasset, Peter Hrechdakian

This paper describes the finding of the determination of the Y-haplogroup of French Emperor Napoléon I (Napoléon Bonaparte). DNA was extracted from two islands of follicular sheaths located at the basis of two of his beard hairs, conserved in the Vivant Denon reliquary. The Y-haplogroup of Napoléon I, determined by the study of 10 NRY-SNPs (non-recombinant Y-single nucleotide polymorphisms), is E1b1b1c1*. Charles Napoléon, the current collateral male descendant of Napoléon I, belongs to this same Y-haplogroup; his Y-STR profile was determined by using a set of 37 NRY-STRs (non-recombinant Y-microsatellites).


Back to (North) Africa (Henn et al. 2012)

A great new paper has just appeared, presenting new data, new conclusions about African prehistory, and new methodologies. I'll have to read it before I comment on it, but since it's open access you can read it for yourselves.


The new data are publicly available here, with information about samples here.
The new PCADMIX software is also available.

PLoS Genet 8(1): e1002397. doi:10.1371/journal.pgen.1002397 

Genomic Ancestry of North Africans Supports Back-to-Africa Migrations 

Brenna Henn et al.

 North African populations are distinct from sub-Saharan Africans based on cultural, linguistic, and phenotypic attributes; however, the time and the extent of genetic divergence between populations north and south of the Sahara remain poorly understood. Here, we interrogate the multilayered history of North Africa by characterizing the effect of hypothesized migrations from the Near East, Europe, and sub-Saharan Africa on current genetic diversity. We present dense, genome-wide SNP genotyping array data (730,000 sites) from seven North African populations, spanning from Egypt to Morocco, and one Spanish population. We identify a gradient of likely autochthonous Maghrebi ancestry that increases from east to west across northern Africa; this ancestry is likely derived from “back-to-Africa” gene flow more than 12,000 years ago (ya), prior to the Holocene. The indigenous North African ancestry is more frequent in populations with historical Berber ethnicity. In most North African populations we also see substantial shared ancestry with the Near East, and to a lesser extent sub-Saharan Africa and Europe. To estimate the time of migration from sub-Saharan populations into North Africa, we implement a maximum likelihood dating method based on the distribution of migrant tracts. In order to first identify migrant tracts, we assign local ancestry to haplotypes using a novel, principal component-based analysis of three ancestral populations. We estimate that a migration of western African origin into Morocco began about 40 generations ago (approximately 1,200 ya); a migration of individuals with Nilotic ancestry into Egypt occurred about 25 generations ago (approximately 750 ya). Our genomic data reveal an extraordinarily complex history of migrations, involving at least five ancestral populations, into North Africa.


January 11, 2012

How people get blue eyes

Genome-wide association studies can uncover links between genetic variants and phenotypes, even in the absence of any knowledge of how these links come about. All it takes is to make a statistical case linking genetic variation with the recorded phenotypic information.
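To make "a statistical case" concrete, here is a minimal sketch of the simplest kind of association test: a chi-square test on a 2x2 table of allele counts in two phenotype groups. The counts are hypothetical and chosen only for illustration; real studies test hundreds of thousands of markers and must correct for multiple testing and population structure, which this sketch ignores.

```python
import math


def allelic_chi_square(case_a, case_b, control_a, control_b):
    """Chi-square test (1 d.f.) on a 2x2 table of allele counts.

    Rows are phenotype groups, columns are alleles. Returns (chi2, p_value).
    """
    n = case_a + case_b + control_a + control_b
    row1, row2 = case_a + case_b, control_a + control_b
    col1, col2 = case_a + control_a, case_b + control_b
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs at least one observation")
    chi2 = n * (case_a * control_b - case_b * control_a) ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of a chi-square variable with one degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts: allele A seen 30 times and allele B 70 times in one
# phenotype group, versus 10 and 90 times in the other group.
chi2, p = allelic_chi_square(30, 70, 10, 90)
print(f"chi2 = {chi2:.2f}, p = {p:.2e}")
```

Nothing in this calculation says anything about mechanism: the test only registers that allele and phenotype co-occur more often than chance would suggest.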

This is somewhat unsatisfactory for a couple of reasons. First, we would like to know how cause and effect works, rather than simply observe that it does. Why do some people with certain genetic alleles have blue eyes?

Second, such functional studies allow us to predict phenotypes from genotypes. A great number of genetic mutations may cause particular phenotypes, and we are only able to discover associations between a subset of them that happens to exist in a population. Developing knowledge about function, rather than just statistical association, may help us in the future to infer the phenotypes of individuals from the deep past for which all non-osteological traces of phenotype have vanished, and may have been affected by genetic variants that are now extinct.

Many human traits are governed by a great number of genes, either through additive effects, or through complex interactions. Eye color is an example of a particular trait the genetic underpinnings of which in Caucasoids (other races have eyes that are uniformly brown) have been known for a while. Now a new study shows precisely how genetic mutations disrupt the formation of pigment in melanocytes, resulting in light-pigmented irides.

Genome Res doi:10.1101/gr.128652.111

HERC2 rs12913832 modulates human pigmentation by attenuating chromatin loop formation between a long-range enhancer and the OCA2 promoter

Mijke Visser et al.

Pigmentation of skin, eye and hair reflects some of the most evident common phenotypes in humans. Several candidate genes for human pigmentation are identified, and the SNP rs12913832 has strong statistical association with human pigmentation. It is located within an intron of the non-pigment gene HERC2, 21 kb upstream of the pigment gene OCA2, and the region surrounding rs12913832 is highly conserved among animal species. However, the exact functional role of HERC2 rs12913832 in human pigmentation is unknown. Here we demonstrate that the HERC2 rs12913832 region functions as an enhancer regulating OCA2 transcription. In darkly pigmented human melanocytes carrying the rs12913832 T-allele, we detected binding of the transcription factors HLTF, LEF1 and MITF to the HERC2 rs12913832 enhancer, and a long-range chromatin loop between this enhancer and the OCA2 promoter which leads to elevated OCA2 expression. In contrast, in lightly pigmented melanocytes carrying the rs12913832 C-allele, chromatin-loop formation, transcription factor recruitment and OCA2 expression are all reduced. Hence, we demonstrate that allelic variation of a common non-coding SNP located in a distal regulatory element not only disrupts the regulatory potential of this element but also affects its interaction with the relevant promoter. We provide the key mechanistic insight that allele-dependent differences in chromatin-loop formation (i.e. structural differences in the folding of gene loci) results in differences in allelic gene expression that affects common phenotypic traits. This concept is highly relevant for future studies aiming to unveil the functional basis of genetically-determined phenotypes including diseases.


Lactase persistence in Neolithic Iberia

This is an extremely important study as it establishes the occurrence of lactase persistence in Neolithic Europe. This invalidates the idea proposed by some about a very late (post-Neolithic) introduction of lactase persistence into Europe by a pastoral population from the east, since we now have good evidence about the presence of this trait in a Neolithic sample from Atlantic Europe.

The frequency is higher than in the early Neolithic Linearbandkeramik (where it was absent in the tested samples), and lower than in present-day Basques, although levels of 27% are quite comparable to some modern south European populations. We are unlikely to detect the earliest occurrence of this trait (when it was limited to the original mutant and his descendants, prior to having a substantial advantage for digesting milk), but the new findings represent a new non-zero data point in the time series, which will certainly fill up as more points in space and time are tested.

European Journal of Human Genetics advance online publication 11 January 2012; doi: 10.1038/ejhg.2011.254

Low prevalence of lactase persistence in Neolithic South-West Europe

Theo S Plantinga et al.

The ability of humans to digest the milk component lactose after weaning requires persistent production of the lactose-converting enzyme lactase. Genetic variation in the promoter of the lactase gene (LCT) is known to be associated with lactase production and is therefore a genetic determinant for either lactase deficiency or lactase persistence during adulthood. Large differences in this genetic trait exist between populations in Africa and the Middle-East on the one hand, and European populations on the other; this is thought to be due to evolutionary pressures exerted by consumption of dairy products in Neolithic populations in Europe. In this study, we have investigated lactase persistence of 26 out of 46 individuals from Late Neolithic through analysis of ancient South-West European DNA samples, obtained from two burials in the Basque Country originating from 5000 to 4500 YBP. This investigation revealed that these populations had an average frequency of lactase persistence of 27%, much lower than in the modern Basque population, which is compatible with the concept that Neolithic and post-Neolithic evolutionary pressures by cattle domestication and consumption of dairy products led to high lactase persistence in Southern European populations. Given the heterogeneity in the frequency of the lactase persistence allele in ancient Europe, we suggest that in Southern Europe the selective advantage of lactose assimilation in adulthood most likely took place from standing population variation, after cattle domestication, at a post-Neolithic time when fresh milk consumption was already fully adopted as a consequence of a cultural influence.


Clusters Galore (fastIBD edition)

(You can scroll down to the Results section, if you are not interested in the technical stuff)

When I proposed Clusters Galore in November 2010, I was pleasantly surprised to see that very fine scale population structure could be uncovered using a combination of two algorithms:
  • A dimensionality reduction technique (such as PCA or MDS) applied to dense genotypic data
  • MCLUST, a state-of-the-art model-based normal mixture clustering algorithm that had enough chops to uncover clusters of arbitrary size, shape, and orientation in multidimensional space
I explained how to carry out Clusters Galore analysis here. A most recent analysis of West Eurasians can be found here.

I have been thinking ever since about ways to improve the methodology. Since MCLUST is hard to best in my experience, I thought that improvement could be produced in the first step of the analysis; I have tried various ideas about choosing how many dimensions to retain, based on tests of normality, Tracy-Widom statistics or some newer ideas. My conclusion has been that one could expect only delta improvements with any of these ideas.
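One way to read the normality-based rule is: keep the leading dimensions for as long as each one deviates from normality, and stop at the first one that does not. The sketch below follows that reading, which is an assumption rather than a description of the exact procedure. To stay dependency-free it swaps in the Jarque-Bera test as a stand-in for a test such as Shapiro-Wilk; the function names are illustrative.

```python
import math


def jarque_bera_pvalue(values):
    """Asymptotic p-value of the Jarque-Bera normality test (chi-square, 2 d.f.)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    if m2 == 0:
        return 0.0  # a constant column is treated as non-normal
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    jb = n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    return math.exp(-jb / 2.0)  # survival function of chi-square with 2 d.f.


def leading_non_normal_dimensions(coords, alpha=0.05, pvalue=jarque_bera_pvalue):
    """Count the leading columns of `coords` (rows = individuals) that reject normality."""
    if not coords:
        return 0
    kept = 0
    for d in range(len(coords[0])):
        column = [row[d] for row in coords]
        if pvalue(column) >= alpha:
            break  # first dimension that looks normal: stop here
        kept += 1
    return kept
```

The `pvalue` argument is pluggable, so with SciPy installed one could pass `lambda x: scipy.stats.shapiro(x)[1]` to use Shapiro-Wilk instead of the stand-in.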

After reading an abstract by Myers et al. in last year's ICHG, I realized that further improvement in resolution might be had by exploiting the linkage structure of dense genotype data, i.e., the pattern of co-inheritance of alleles along a stretch of chromosome.

Since I didn't want to reinvent the wheel, I found the paintmychromosomes website, which I've also covered here, noting that it is unclear to what extent the ability of this methodology to infer fine-scale population structure is due to its exploitation of linkage. Unfortunately, the processing pipeline for this technique is computationally daunting, and I would probably have to wait months to carry out any meaningful experiment on the types of datasets I'm used to working with.

An hour's worth of coding may save you a month of runtime, so I decided to look elsewhere.

I've been experimenting with fastIBD for a while now, so I invested some time to write some auxiliary code that would help me use it for my purposes. fastIBD finds identical-by-descent segments in a collection of individuals. It also has several attractive properties:

  • It is fast
  • It does its own phasing
  • It runs within BEAGLE, a very well-known genetic analysis software
In principle one could do a single fastIBD run over an entire dataset, but the memory footprint is prohibitive. So, rather than beg for money for a bigger computer, I took ten minutes to write some code that combines the results of 22 fastIBD runs (one per chromosome). I also wrote some code that calculates how much (in Morgans) IBD sharing exists in a pair of individuals.
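The combining step might look roughly like the sketch below. It assumes the per-chromosome results have already been parsed into `(id1, id2, start_cm, end_cm)` tuples with genetic positions in centimorgans; that tuple layout is a hypothetical convenience, not fastIBD's native output format, and segments for the same pair are simply summed without checking for overlap.

```python
from collections import defaultdict


def total_ibd_sharing(segments_by_chromosome):
    """Sum IBD segment lengths per pair of individuals across chromosomes.

    `segments_by_chromosome` maps a chromosome label to an iterable of
    (id1, id2, start_cm, end_cm) tuples (assumed, already-parsed layout).
    Returns {(id_a, id_b): Morgans shared}, with each pair key sorted.
    """
    sharing = defaultdict(float)
    for chromosome, segments in segments_by_chromosome.items():
        for id1, id2, start_cm, end_cm in segments:
            if end_cm < start_cm:
                raise ValueError(f"segment ends before it starts on chromosome {chromosome}")
            pair = tuple(sorted((id1, id2)))
            sharing[pair] += (end_cm - start_cm) / 100.0  # centimorgans to Morgans
    return dict(sharing)
```

Sorting each pair key means that a segment reported as (B, A) on one chromosome and (A, B) on another ends up in the same total.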

fastIBD is tunable in various ways, but the default parameters seem to work fine for my purposes.

The end result of my labors is an NxN matrix (if there are N individuals) of IBD-based distances between individuals (measured as the Morgans that are not IBD between each pair of individuals). This can then be fed into R's MDS routine, and then it's business as usual.
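As a rough illustration of this step, the sketch below builds such a distance matrix from per-pair IBD totals and runs classical (Torgerson) MDS on it with NumPy. Several things here are assumptions: the total genome length is a parameter the caller has to supply, pairs with no recorded sharing are given the full genome length as their distance, and classical MDS is used on the guess that the R routine in question is something like `cmdscale`.

```python
import numpy as np


def ibd_distance_matrix(ids, sharing, genome_length_morgans):
    """Distance = Morgans of the genome not shared IBD by each pair."""
    index = {name: i for i, name in enumerate(ids)}
    dist = np.full((len(ids), len(ids)), float(genome_length_morgans))
    np.fill_diagonal(dist, 0.0)
    for (a, b), shared in sharing.items():
        d = max(genome_length_morgans - shared, 0.0)
        dist[index[a], index[b]] = dist[index[b], index[a]] = d
    return dist


def classical_mds(dist, k=2):
    """Classical (Torgerson) MDS of a symmetric distance matrix into k dimensions."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering  # double-centered Gram matrix
    eigenvalues, eigenvectors = np.linalg.eigh(gram)
    order = np.argsort(eigenvalues)[::-1][:k]  # indices of the k largest eigenvalues
    top = np.clip(eigenvalues[order], 0.0, None)  # negative eigenvalues contribute nothing
    return eigenvectors[:, order] * np.sqrt(top)
```

The rows of the array returned by `classical_mds` are the per-individual coordinates that would then go to the clustering step.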


I assembled a North European dataset for testing my ideas. A thing that always bugged me was the lack of ability to detect much population structure in the British Isles. So, I hoped that the added "punch" of using fastIBD would finally uncover this structure. All analyses were run with 256,932 SNPs.

Clusters Galore (fastIBD)

The 26 first MDS dimensions deviated from normality according to a Shapiro-Wilk test, and MCLUST found a total of 21 clusters using these dimensions.

Population N 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
Russian_D 22 2 20
Irish_D 22 19 2 1
Polish_D 23 23
German_D 21 1 18 2
Finnish_D 17 10 7
Swedish_D 13 1 12
English_D 12 12
British_D 13 1 12
Norwegian_D 11 11
Lithuanian_D 10 1 9
Dutch_D 9 5 3 1
British_Isles_D 8 8
Mixed_Scandinavian_D 4 4
Danish_D 3 3
Ukrainian_D 2 2
Latvian_D 1 1
Estonian_D 1 1
Russian 25 1 24
Orcadian 15 15
Lithuanians 10 1 9
Belorussian 9 8 1
Orkney_1KG 25 19 2 2 2
Kent_1KG 38 34 2 2
Cornwall_1KG 33 1 30 2
Argyll_1KG 4 1 3
FIN30 30 14 16
Ukranians_Y 20 20
Mordovians_Y 15 13 2

The clusters could be labeled as:
  1. Mordovian
  2. Slavic
  3. Irish
  4. English/British 
  5. German
  6. Mini-cluster of 2 related Germans?
  7. Scandinavian
  8. Finnish 1
  9. Finnish 2
  10. Lithuanian
  11. Orkney
  12. Vologda Russians (HGDP)
  13. Mini-cluster of 2 Vologda Russians
  14. Mini-cluster of 2 Vologda Russians
  15. Mini-cluster of 2 Vologda Russians
  16. Mini-cluster of 2 Kent English
  17. Mini-cluster of 2 Kent English
  18. Cornwall
  19. Mini-cluster of 2 Cornwall
  20. Argyll
  21. Mini-cluster of 2 Mordovians
So, it seems that my intuition was correct. There is a fairly clean division of Lithuanians and Slavs that was much more muddled whenever it came up before, a clean division of Mordvins and Russians, and a fairly comprehensive split of British Isles populations: a quite clean Irish cluster, a Cornwall cluster, an Argyll cluster, and a Kent/English cluster. Note that British_D and British_Isles_D populations consist mostly of English+some other British Isles, so I am not very surprised that they fall in the English main cluster.

The entire analysis (fastIBD + MCLUST) took a few hours to run.

Clusters Galore (PLINK/MDS)

For comparison, using the "classical" Clusters Galore with PLINK's MDS facility, there were 15 non-normal dimensions. A total of 16 clusters were inferred:

Population N 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
Russian_D 22 2 15 5
Irish_D 22 6 6 10
Polish_D 23 20 3
German_D 21 1 8 4 3 5
Finnish_D 17 17
Swedish_D 13 12 1
English_D 12 5 2 5
British_D 13 1 3 9
Norwegian_D 11 5 2 3 1
Lithuanian_D 10 10
Dutch_D 9 2 4 3
British_Isles_D 8 3 1 4
Mixed_Scandinavian_D 4 4
Danish_D 3 1 1 1
Ukrainian_D 2 1 1
Latvian_D 1 1
Estonian_D 1 1
Russian 25 8 1 16
Orcadian 15 6 9
Lithuanians 10 10
Belorussian 9 9
Orkney_1KG 25 2 14 5 2 2
Kent_1KG 38 15 8 11 2 2
Cornwall_1KG 33 17 1 13 2
Argyll_1KG 4 2 2
FIN30 30 1 29
Ukranians_Y 20 20
Mordovians_Y 15 11 1 3

Some of the distinctions are lost: 2 Finnish clusters rolled into 1; Lithuanians and Slavs rolled into 1; there are 3 British Isles clusters with substantial overlap between different populations as well as with Scandinavians; Mordovians and Russians overlap; no German cluster.

It seems pretty clear to me that Clusters Galore (fastIBD) is the way to go into the future for this type of analysis, and hopefully further refinements to the methodology and the addition of more project participants will add even more resolution.

Clustering relies on (i) the ability to detect "blobs" of individuals, and (ii) the existence of such "blobs" of individuals. Clusters Galore (fastIBD edition) seems to be pretty good at doing (i), but it's as good as the data it's fed. For example, currently the Dutch seem split between the English and the Germans, but I have little doubt that if their sample sizes were to grow, they would also form their own specific cluster.

If you are a Project participant from these groups, you can find the results of this run in this spreadsheet.