August 17, 2014

Indo-Europeans preceded Finno-Ugrians in Finland and Estonia

That is the claim made in the abstract of a Ph.D. thesis (below). It would appear to fit well with the dating of the signature Y-chromosome haplogroup of Finno-Ugrians.

Bidrag till Fennoskandiens språkliga förhistoria i tid och rum (Heikkilä, Mikko)
My academic dissertation "Bidrag till Fennoskandiens språkliga förhistoria i tid och rum" ("Spatiotemporal Contributions to the Linguistic Prehistory of Fennoscandia") is an interdisciplinary study of the linguistic prehistory of Northern Europe chiefly in the Iron Age (ca. 700 BC―AD 1200), but also to some extent in the Bronze Age (ca. 1700―700 BC) and the Early Finnish Middle Ages (ca. AD 1200―1323). The disciplines represented in this study are Germanistics, Nordistics, Finnougristics, history and archaeology. The language-forms studied are Proto-Germanic, Proto-Scandinavian, Proto-Finnic and Proto-Sami. This dissertation uses historical-comparative linguistics and especially loanword study to examine the relative and absolute chronology of the sound changes that have taken place in the proto-forms of the Germanic, Finnic and Samic languages. Phonetic history is the basis of historical linguistics studying the diachronic development of languages. To my knowledge, this study is the first in the history of the disciplines mentioned above to examine the systematic dating of the phonetic development of these proto-languages in relation to each other. In addition to the dating and relating of the phonetic development of the proto-languages, I study Fennoscandian toponyms. The oldest datable and etymologizable place-names throw new light on the ethnic history and history of settlement of Fennoscandia. For instance, I deal with the etymology of the following place-names: Ahvenanmaa/Åland, Eura(joki), Inari(järvi), Kemi(joki), Kvenland, Kymi(joki), Sarsa, Satakunta, Vanaja, Vantaa and Ähtäri. 
My dissertation shows that Proto-Germanic, Proto-Scandinavian, Proto-Finnic and Proto-Sami all date to different periods of the Iron Age. I argue that the present study along with my earlier published research also proves that a (West-)Uralic language – the pre-form of the Finnic and Samic languages – was spoken in the region of the present-day Finland in the Bronze Age, but not earlier than that. In the centuries before the Common Era, Proto-Sami was spoken in the whole region of what is now called Finland, excluding Lapland. At the beginning of the Common Era, Proto-Sami was spoken in the whole region of Finland, including Southern Finland, from where the Sami idiom first began to recede. An archaic (Northwest-)Indo-European language and a subsequently extinct Paleo-European language were likely spoken in what is now called Finland and Estonia, when the linguistic ancestors of the Finns and the Sami arrived in the eastern and northern Baltic Sea region from the Volga-Kama region probably at the beginning of the Bronze Age. For example, the names Suomi ʻFinlandʼ and Viro ʻEstoniaʼ are likely to have been borrowed from the Indo-European idiom in question. (Proto-)Germanic waves of influence have come from Scandinavia to Finland since the Bronze Age. A considerable part of the Finnic and Samic vocabulary is indeed Germanic loanwords of different ages which form strata in these languages. Besides mere etymological research, these numerous Germanic loanwords make it possible to relate to each other the temporal development of the language-forms that have been in contact with each other. That is what I have done in my extensive dissertation, which attempts to be both a detailed and a holistic treatise.

August 15, 2014

ISBA 2014 titles

Some interesting talks and posters from the upcoming International Symposium on Biomolecular Archaeology. I don't see any abstracts on the site (yet?) but the titles are intriguing. Some that caught my eye:

  • Investigating the maternal lineage diversity from an early medieval site in Southern Italy
  • Ancient mitochondrial and Y chromosomal DNA reveals the western Carpathian Basin as a corridor of the Neolithic expansion
  • Ancient mitochondrial DNA from the Northern fringe of the Neolithic farming expansion in Europe sheds light on the dispersion process
  • The effect of demography and natural selection on pigmentation heterogeneity in late Pleistocene and early Holocene Europeans
  • The genomics of equine speciation and domestication
  • Ancient population genetics: new insights on horse domestication
  • Species identification and analysis of the Tyrolean Iceman's clothes using next generation sequencing of ancient DNA.
  • Early evidence for the use of pottery: extending the ancient lipid record to the Pleistocene.
  • Whey to go – first identification of lactose in prehistoric pottery
  • Use of the earliest pottery on the Western and Eastern side of the Baltic
  • The geographical distribution of the Polynesian cultural complex and its association with P33-C2a1 Y chromosomes: adding data from Aotearoa (New Zealand)
  • Interdisciplinary investigation of an archaic hominin femur from the Swabian Jura (South-West Germany)
  • Tracing the genetic history of farming populations of El Portalón Cave in the Sierra de Atapuerca, Spain.
  • Ancient human genomes suggest three ancestral populations for present-day Europeans
  • Ancient DNA from Early Neolithic farmers in Europe
  • Genomic diversity and admixture in Stone-Age farmer and hunter-gatherer groups in Scandinavia
  • Ancient DNA reveals the complex genetic history of the New World Arctic
  • A prediction of the hybridisation potential between Hominin species using mitochondrial DNA
  • Population Genomics of Vikings
  • Tracing the genetic profile of Sus scrofa on Romanian territory from the Neolithic period until the Middle Ages
  • The origins of the Aegean palatial civilizations from a population genetic perspective
  • Ancient DNA evidence for a diversified origin of ancestor of Han Chinese

mtDNA from Chalcolithic Iberia (El Mirador cave)

A very exciting new study from Chalcolithic Iberia. The authors compare their mtDNA data with those from the Brandt et al. (2013) paper which includes German samples from the same time.

The following plot seems quite useful. From its caption:
This study: El Mirador (MIR). Published prehistoric cultures [21]: Hunter-gatherer central (HGC), Linear Pottery culture (LBK), Rössen culture (RSC), Schöningen group (SCG), Baalberge culture (BAC), Salzmünde culture (SMC), Bernburg culture (BEC), Corded Ware culture (CWC), Bell Beaker culture (BBC), Unetice culture (UC), Funnel Beaker culture (FBC), Pitted Ware culture (PWC), Hunter-Gatherer south (HGS), (Epi) Cardial (CAR), Neolithic Portugal (NPO), Neolithic Basque Country and Navarre (NBQ), Treilles culture (TRE), Hunter-gatherer east (HGE), Bronze Age Siberia (BAS), Bronze Age Kazakhstan (BAK).

From the paper:
In none of the analyses El Mirador sample shows close genetic affinities with a contemporaneous Bell Beaker population of 29 specimens gathered from three sites in Germany. The Bell Beaker mtDNA signal is characterized by high frequencies (around 50%) of H haplogroup that in El Mirador only reaches 26%. This heterogeneity in the genetic composition of geographically close populations adds further complexity to future reconstructions of these ancient expansions and correlates with the existence of contemporaneous groups with and without the typical Bell Beaker burial kit.
mtDNA may not be the best tool for studying the spread of Bell Beakers (if this involved men), but this shows that the high frequency of H in Bell Beakers of Germany (observed by Brandt et al.) is not due to an even higher frequency of H in Iberia.

PLoS ONE 9(8): e105105. doi:10.1371/journal.pone.0105105

Mitochondrial DNA from El Mirador Cave (Atapuerca, Spain) Reveals the Heterogeneity of Chalcolithic Populations

Daniel Gómez-Sánchez, Iñigo Olalde et al.

Previous mitochondrial DNA analyses on ancient European remains have suggested that the current distribution of haplogroup H was modeled by the expansion of the Bell Beaker culture (ca 4,500–4,050 years BP) out of Iberia during the Chalcolithic period. However, little is known on the genetic composition of contemporaneous Iberian populations that do not carry the archaeological tool kit defining this culture. Here we have retrieved mitochondrial DNA (mtDNA) sequences from 19 individuals from a Chalcolithic sample from El Mirador cave in Spain, dated to 4,760–4,200 years BP and we have analyzed the haplogroup composition in the context of modern and ancient populations. Regarding extant African, Asian and European populations, El Mirador shows affinities with Near Eastern groups. In different analyses with other ancient samples, El Mirador clusters with Middle and Late Neolithic populations from Germany, belonging to the Rössen, the Salzmünde and the Baalberge archaeological cultures but not with contemporaneous Bell Beakers. Our analyses support the existence of a common genetic signal between Western and Central Europe during the Middle and Late Neolithic and points to a heterogeneous genetic landscape among Chalcolithic groups.


August 12, 2014

168 South Asian Genomes

PLoS ONE 9(8): e102645. doi:10.1371/journal.pone.0102645

The South Asian Genome

John C. Chambers et al.

The genetic sequence variation of people from the Indian subcontinent who comprise one-quarter of the world's population, is not well described. We carried out whole genome sequencing of 168 South Asians, along with whole-exome sequencing of 147 South Asians to provide deeper characterisation of coding regions. We identify 12,962,155 autosomal sequence variants, including 2,946,861 new SNPs and 312,738 novel indels. This catalogue of SNPs and indels amongst South Asians provides the first comprehensive map of genetic variation in this major human population, and reveals evidence for selective pressures on genes involved in skin biology, metabolism, infection and immunity. Our results will accelerate the search for the genetic variants underlying susceptibility to disorders such as type-2 diabetes and cardiovascular disease which are highly prevalent amongst South Asians.


August 09, 2014

New estimates of human mtDNA node dates and substitution rates (Rieux et al. 2014)

This is a quite useful paper as it compares different methods of obtaining mutation rate estimates, either using "archaeological calibration" based on known migration events or ancient mtDNA genomes (with known archaeological dates). The authors write:
Our estimate of 143 Kya [112-180 95% HPD] for the TMRCA of all modern human mtDNA is slightly younger but highly consistent with the 157 Kya [120-197 95% HPD] value obtained by Fu et al. (2013b). We estimate the coalescence of the L3 haplogroup (the lineage from which all non-African mtDNA haplogroups descend), often used to date the “out-of-Africa” event, to 72 Kya [54-93 95%HPD], a value also consistent with Fu et al. (2013b) estimation of 78 Kya [62-95 95%HPD]. This estimation rather places a conservative upper bound of 93 kya for the time of the last major gene exchange between non-African and sub-Saharan African populations. As pointed out by Fu et al. (2013b), it is important to recognize that this divergence time may merely represent the most recent gene exchanges between the ancestors of non-Africans and the most closely related sub-Saharan Africans and thus may reflect only the most recent population split in a long, drawn-out process of population separation (Scally and Durbin 2012).
The 72kya date would agree quite well with my postulated Out-of-Arabia event circa 70 thousand years ago.

It should be fairly easy to pick out the common ancestor of Eurasian mtDNA (the common ancestor of M+N). I am reasonably sure that the two African red dots to the right of event "8" in the figure are African L3's, and this would place them within the Eurasian variation, and in particular as a relative of Eurasian M.

A similar observation could be found in Supplementary Figure 14 of the Lippold et al. (2014) preprint, with African L3 lineages clearly related to Eurasian M (and nested within the Eurasian phylogeny).

In any case, I don't see any evidence at all from this phylogeny that the date of L3 corresponds to an Out-of-Africa event. Unfortunately I couldn't see an estimate for the split of L3 from the rest of the phylogeny; my eyeball estimate from the figure is that it's about 20ky earlier. Hopefully, someone sooner or later will deal with the question of L3 phylogeny, because the "conventional wisdom" that Eurasian M, N are nested within African L3 variation does not appear to be quite right.

Mol Biol Evol (2014) doi: 10.1093/molbev/msu222

Improved calibration of the human mitochondrial clock using ancient genomes

Adrien Rieux et al.

Reliable estimates of the rate at which DNA accumulates mutations (the substitution rate) are crucial for our understanding of the evolution and past demography of virtually any species. In humans, there are considerable uncertainties around these rates, with substantial variation among recent published estimates. Substitution rates have traditionally been estimated by associating dated events to the root (e.g. the divergence between humans and chimpanzees) or to internal nodes in a phylogenetic tree (e.g. first entry into the Americas). The recent availability of ancient mtDNA sequences allows for a more direct calibration by assigning the age of the sequenced samples to the tips within the human phylogenetic tree. But studies also vary greatly in the methodology employed and in the sequence panels analysed, making it difficult to tease apart the causes for the differences between previous estimates. To clarify this issue, we compiled a comprehensive dataset of 350 ancient and modern human complete mtDNA genomes, among which 146 were generated for the purpose of this study, and estimated substitution rates using calibrations based both on dated nodes and tips. Our results demonstrate that, for the same dataset, estimates based on individual dated tips are far more consistent with each other than those based on nodes and should thus be considered as more reliable.


August 08, 2014

mtDNA haplogroup V7 from ~5,000-year old kurgan of the Novosvobodnaya culture

I am not sure that the finding of a single mtDNA V7 sample suggests "a role of the TRB culture in the development of the Novosvobodnaya culture", nor, indeed, that I agree with the labeling of the TRB as "Indo-European". In any case, it's good to see some ancient DNA from the North Caucasus.

Acta Naturae. 2014 Apr-Jun; 6(2): 31–35.

Analysis of the Mitochondrial Genome of a Novosvobodnaya Culture Representative using Next-Generation Sequencing and Its Relation to the Funnel Beaker Culture

A. V. Nedoluzhko et al.

The Novosvobodnaya culture is known as a Bronze Age archaeological culture in the North Caucasus region of Southern Russia. It dates back to the middle of the 4th millennium B.C. and seems to have occurred during the time of the Maikop culture. There are now two hypotheses about the emergence of the Novosvobodnaya culture. One hypothesis suggests that the Novosvobodnaya culture was a phase of the Maikop culture, whereas the other one classifies it as an independent event based on the material culture items found in graves. Comparison between Novosvobodnaya pottery and Funnelbeaker (TRB) pottery from Germany has allowed researchers to suggest that the Novosvobodnaya culture developed under the influence of Indo-European culture. Nevertheless, the origin of the Novosvobodnaya culture remains a matter of debate. We applied next-generation sequencing to study ~5000-year-old human remains from the Klady kurgan grave in Novosvobodnaya stanitsa (now the Republic of Adygea, Russia). A total of 58,771,105 reads were generated using Illumina GAIIx with a coverage depth of 13.4x over the mitochondrial (mt) DNA genome. The mtDNA haplogroup affiliation was determined as V7, suggesting a role of the TRB culture in the development of the Novosvobodnaya culture and supporting the model of sharing between Novosvobodnaya and early Indo-European cultures.


August 06, 2014

Dairy farming transition ~2,500 years BC in the far north of Europe

Proceedings of the Royal Society B doi: 10.1098/rspb.2014.0819

Neolithic dairy farming at the extreme of agriculture in northern Europe

Lucy J. E. Cramp et al.

The conventional ‘Neolithic package’ comprised animals and plants originally domesticated in the Near East. As farming spread on a generally northwest trajectory across Europe, early pastoralists would have been faced with the challenge of making farming viable in regions in which the organisms were poorly adapted to providing optimal yields or even surviving. Hence, it has long been debated whether Neolithic economies were ever established at the modern limits of agriculture. Here, we examine food residues in pottery, testing a hypothesis that Neolithic farming was practiced beyond the 60th parallel north. Our findings, based on diagnostic biomarker lipids and δ13C values of preserved fatty acids, reveal a transition at ca 2500 BC from the exploitation of aquatic organisms to processing of ruminant products, specifically milk, confirming farming was practiced at high latitudes. Combining this with genetic, environmental and archaeological information, we demonstrate the origins of dairying probably accompanied an incoming, genetically distinct, population successfully establishing this new subsistence ‘package’.


YFitter preprint and software

arXiv:1407.7988 [q-bio.PE]

YFitter: Maximum likelihood assignment of Y chromosome haplogroups from low-coverage sequence data

Luke Jostins, Yali Xu, Shane McCarthy, Qasim Ayub, Richard Durbin, Jeff Barrett, Chris Tyler-Smith

(Submitted on 30 Jul 2014)

Low-coverage short-read resequencing experiments have the potential to expand our understanding of Y chromosome haplogroups. However, the uncertainty associated with these experiments mean that haplogroups must be assigned probabilistically to avoid false inferences. We propose an efficient dynamic programming algorithm that can assign haplogroups by maximum likelihood, and represent the uncertainty in assignment. We apply this to both genotype and low-coverage sequencing data, and show that it can assign haplogroups accurately and with high resolution. The method is implemented as the program YFitter, which can be downloaded from this http URL


Craniofacial morphology of Greeks through 4,000 years

Anthropol Anz. 2014;71(3):237-57.

Craniofacial morphology in ancient and modern Greeks through 4,000 years.

Papagrigorakis MJ, Kousoulis AA, Synodinos PN.


Multiple 20th century studies have speculated on the anthropological similarities of the modern inhabitants of Greece with their ancient predecessors. The present investigation attempts to add to this knowledge by comparing the craniofacial configuration of 141 ancient (dating around 2,000-500 BC) and 240 modern Greek skulls (the largest material among relevant national studies).


Skulls were grouped in age at death, sex, era and geographical categories; lateral cephalograms were taken and 53 variables were measured and correlated statistically. The craniofacial measurements and measurements of the basic quadrilateral and cranial polygon were compared in various groups using basic statistical methods, one-way ANOVA and assessment of the correlation matrices.


Most of the measurements for both sexes combined followed an akin pattern in ancient and modern Greek skulls. Moreover, sketching and comparing the outline of the skull and upper face, we observed a clock-wise movement. The present study confirms that the morphological pattern of Greek skulls, as it changed during thousands of years, kept some characteristics unchanged, with others undergoing logical modifications.


The analysis of our results allows us to believe that the influence upon the craniofacial complex of the various known factors, including genetic or environmental alterations, is apt to alter its form to adapt to new conditions. Even though 4,000 years seems too narrow a span to provoke evolutionary insights using conventional geometric morphometrics, the full presentation of our results makes up a useful atlas of solid data. Interpreted with caution, the craniofacial morphology in modern and ancient Greeks indicates elements of ethnic group continuation within the unavoidable multicultural mixtures.


July 31, 2014

Wine cup of Pericles found

Wine cup used by Pericles found in grave north of Athens
Experts are "99 per cent" sure that the cup was used by the Athenian statesman, as one of the other names listed, Ariphron, is that of Pericles' elder brother.

"The name Ariphron is extremely rare," Angelos Matthaiou, secretary of the Greek Epigraphic Society, told the newspaper.

"Having it listed above that of Pericles makes us 99 per cent sure that these are the two brothers," he said.
Finding the cup of Pericles is cool, but finding his actual tomb would be even cooler. Thanks to Pausanias and other ancient observers, the location and identity of many of the tombs of prominent ancient Athenians are known.

July 29, 2014

Lethal mutations quantified

A very interesting new preprint is on the arXiv (so it can be freely read). It studies a founder population, the Hutterites. The key sentence:
Our approach indicates that on average, one in every two humans carries a recessive lethal allele on the autosomes that lead to lethality after birth and before reproductive age or to complete sterility.

arXiv:1407.7518 [q-bio.PE]

An estimate of the average number of recessive lethal mutations carried by humans

Ziyue Gao, Darrel Waggoner, Matthew Stephens, Carole Ober, Molly Przeworski

The effects of inbreeding on human health depend critically on the number and severity of recessive, deleterious mutations carried by individuals. In humans, existing estimates of these quantities are based on comparisons between consanguineous and non-consanguineous couples, an approach that confounds socioeconomic and genetic effects of inbreeding. To circumvent this limitation, we focused on a founder population with almost complete Mendelian disease ascertainment and a known pedigree. By considering all recessive lethal diseases reported in the pedigree and simulating allele transmissions, we estimated that each haploid set of human autosomes carries on average 0.29 (95% credible interval [0.10, 0.83]) autosomal, recessive alleles that lead to complete sterility or severe disorders at birth or before reproductive age when homozygous. Comparison to existing estimates of the deleterious effects of all recessive alleles suggests that a substantial fraction of the burden of autosomal, recessive variants is due to single mutations that lead to death between birth and reproductive age. In turn, the comparison to estimates from other eukaryotes points to a surprising constancy of the average number of recessive lethal mutations across organisms with markedly different genome sizes.
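To see how the abstract's per-haploid figure relates to the "one in every two humans" sentence, here is a rough back-of-the-envelope sketch in Python. The doubling from haploid to diploid follows directly from the abstract's numbers; the Poisson assumption used to turn a mean count into a probability of carrying at least one such allele is mine, not the authors', and the function name is made up for illustration.

```python
import math

def carrier_probability(per_haploid_mean):
    """Return (mean count per diploid genome, P(at least one allele)).

    Assumption (not from the preprint): the number of recessive lethal
    alleles per person is Poisson-distributed around its mean.
    """
    diploid_mean = 2 * per_haploid_mean          # two haploid sets per person
    p_at_least_one = 1 - math.exp(-diploid_mean)  # Poisson: 1 - P(zero)
    return diploid_mean, p_at_least_one

# Point estimate and credible-interval bounds quoted in the abstract.
for label, value in [("lower", 0.10), ("point", 0.29), ("upper", 0.83)]:
    mean, prob = carrier_probability(value)
    print(f"{label}: mean per person = {mean:.2f}, P(carrier) = {prob:.2f}")

point_mean, point_prob = carrier_probability(0.29)
```

Under these assumptions the point estimate gives a mean of 0.58 alleles per person and roughly a 44% chance of carrying at least one, which is in the neighborhood of "one in every two".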


July 26, 2014

Ancestry of Cubans

PLoS Genet 10(7): e1004488. doi:10.1371/journal.pgen.1004488

Cuba: Exploring the History of Admixture and the Genetic Basis of Pigmentation Using Autosomal and Uniparental Markers

Beatriz Marcheco-Teruel et al.

We carried out an admixture analysis of a sample comprising 1,019 individuals from all the provinces of Cuba. We used a panel of 128 autosomal Ancestry Informative Markers (AIMs) to estimate the admixture proportions. We also characterized a number of haplogroup diagnostic markers in the mtDNA and Y-chromosome in order to evaluate admixture using uniparental markers. Finally, we analyzed the association of 16 single nucleotide polymorphisms (SNPs) with quantitative estimates of skin pigmentation. In the total sample, the average European, African and Native American contributions as estimated from autosomal AIMs were 72%, 20% and 8%, respectively. The Eastern provinces of Cuba showed relatively higher African and Native American contributions than the Western provinces. In particular, the highest proportion of African ancestry was observed in the provinces of Guantánamo (40%) and Santiago de Cuba (39%), and the highest proportion of Native American ancestry in Granma (15%), Holguín (12%) and Las Tunas (12%). We found evidence of substantial population stratification in the current Cuban population, emphasizing the need to control for the effects of population stratification in association studies including individuals from Cuba. The results of the analyses of uniparental markers were concordant with those observed in the autosomes. These geographic patterns in admixture proportions are fully consistent with historical and archaeological information. Additionally, we identified a sex-biased pattern in the process of gene flow, with a substantially higher European contribution from the paternal side, and higher Native American and African contributions from the maternal side. This sex-biased contribution was particularly evident for Native American ancestry. Finally, we observed that SNPs located in the genes SLC24A5 and SLC45A2 are strongly associated with melanin levels in the sample.


July 17, 2014

More selection on the X than in autosomes in humans

Mol Biol Evol (2014) doi: 10.1093/molbev/msu166

Evidence for Increased Levels of Positive and Negative Selection on the X Chromosome versus Autosomes in Humans

Krishna R. Veeramah et al.

Partially recessive variants under positive selection are expected to go to fixation more quickly on the X chromosome as a result of hemizygosity, an effect known as faster-X. Conversely, purifying selection is expected to reduce substitution rates more effectively on the X chromosome. Previous work in humans contrasted divergence on the autosomes and X chromosome, with results tending to support the faster-X effect. However, no study has yet incorporated both divergence and polymorphism to quantify the effects of both purifying and positive selection, which are opposing forces with respect to divergence. In this study, we develop a framework that integrates previously developed theory addressing differential rates of X and autosomal evolution with methods that jointly estimate the level of purifying and positive selection via modeling of the distribution of fitness effects (DFE). We then utilize this framework to estimate the proportion of nonsynonymous substitutions fixed by positive selection (α) using exome sequence data from a West African population. We find that varying the female to male breeding ratio (β) has minimal impact on the DFE for the X chromosome, especially when compared with the effect of varying the dominance coefficient of deleterious alleles (h). Estimates of α range from 46% to 51% and from 4% to 24% for the X chromosome and autosomes, respectively. While dependent on h, the magnitude of the difference between α values estimated for these two systems is highly statistically significant over a range of biologically realistic parameter values, suggesting faster-X has been operating in humans.


Craniofacial feminization and the origin of behavioral modernity

Current Anthropology Vol. 55, No. 4, August 2014

Robert L. Cieri et al.


The past 200,000 years of human cultural evolution have witnessed the persistent establishment of behaviors involving innovation, planning depth, and abstract and symbolic thought, or what has been called “behavioral modernity.” Demographic models based on increased human population density from the late Pleistocene onward have been increasingly invoked to understand the emergence of behavioral modernity. However, high levels of social tolerance, as seen among living humans, are a necessary prerequisite to life at higher population densities and to the kinds of cooperative cultural behaviors essential to these demographic models. Here we provide data on craniofacial feminization (reduction in average brow ridge projection and shortening of the upper facial skeleton) in Homo sapiens from the Middle Pleistocene to recent times. We argue that temporal changes in human craniofacial morphology reflect reductions in average androgen reactivity (lower levels of adult circulating testosterone or reduced androgen receptor densities), which in turn reflect the evolution of enhanced social tolerance since the Middle Pleistocene.


Early Neandertal disappearance in Iberia

Journal of Human Evolution DOI: 10.1016/j.jhevol.2014.06.002

New evidence of early Neanderthal disappearance in the Iberian Peninsula

Bertila Galván et al.

The timing of the end of the Middle Palaeolithic and the disappearance of Neanderthals continue to be strongly debated. Current chronometric evidence from different European sites pushes the end of the Middle Palaeolithic throughout the continent back to around 42 thousand years ago (ka). This has called into question some of the dates from the Iberian Peninsula, previously considered as one of the last refuge zones of the Neanderthals. Evidence of Neanderthal occupation in Iberia after 42 ka is now very scarce and open to debate on chronological and technological grounds. Here we report thermoluminescence (TL) and optically stimulated luminescence (OSL) dates from El Salt, a Middle Palaeolithic site in Alicante, Spain, the archaeological sequence of which shows a transition from recurrent to sporadic human occupation culminating in the abandonment of the site. The new dates place this sequence within MIS 3, between ca. 60 and 45 ka. An abrupt sedimentary change towards the top of the sequence suggests a strong aridification episode coinciding with the last Neanderthal occupation of the site. These results are in agreement with current chronometric data from other sites in the Iberian Peninsula and point towards possible breakdown and disappearance of the Neanderthal local population around the time of the Heinrich 5 event. Iberian sites with recent dates (less than 40 ka) attributed to the Middle Palaeolithic should be revised in the light of these data.


July 15, 2014

k-means and structure

I was reading one of the many negative reviews of Nicholas Wade's new book when I came across this statement:
"The problem is that Structure, which uses an algorithm called “k-means,”"
I pointed out that Structure does not use k-means and a small discussion ensued on Twitter. I see that the above statement has now been removed from the article, but an endnote on the topic remains:
*Originally, I wrote that STRUCTURE uses the k-means algorithm. Some population geneticists thought that I oversimplified what STRUCTURE does. Different clustering algorithms make different assumptions. STRUCTURE is indeed very similar to k-means, but with a particular error structure – binomial instead of gaussian. This is a fine technical detail compared with the principal point, which is that k is picked by the user, and does not emerge from the data automatically. To learn more, see this Twitter chain and this and this. Thanks to Graham Coop at UC Davis.
I did not intend to spend more time on this, but since the author of the article invited me to comment at more than 140 characters on the topic, I thought it was a good idea to do so.

k-means is completely unrelated to the structure algorithm of Pritchard, Stephens and Donnelly (2000). Remember that structure can be run in either a no-mixture or a mixture mode. In both modes, the input is a set of N individuals and K, the number of ancestral populations. In the no-mixture mode, individuals are assigned to one of K populations, while in the mixture mode, their ancestry proportions from K populations are inferred. (Incidentally, allele frequencies in the K ancestral populations are also inferred, although usually not reported).

k-means has no mixture mode, but rather it is a clustering algorithm which assigns individuals to K populations. Thus, it can be used to solve the same problem as the no-mixture mode of structure. The two algorithms solve this problem in entirely different ways. Saying that structure uses k-means is equivalent to saying that any partitioning method into k groups uses k-means.

More importantly, structure is commonly used in mixture mode, including in the landmark paper by Rosenberg et al. (2002) that both Wade and the author of the review refer to. In this mode, structure does not even solve the same problem as k-means. Rather than find some partitioning of N individuals into K disjoint clusters, it estimates the mixture proportions of each of N individuals into all K populations. In practice (including the paper by Rosenberg et al. 2002), many individuals often have most (or all) of their ancestry from one or a few of the K populations. If humans had no structure at a particular K, the algorithm could very well produce a jumbled mess of different colors. Instead it produces neat ancestral populations that correspond well to what may be instantly recognizable as major human groups.

The reader is invited to look at any standard implementation of k-means, such as the one in R, to be convinced that k-means does not even produce the same output as structure. The point is a trivial one, but k-means estimates N parameters (the cluster label for each of N individuals), whereas structure estimates N(K-1) parameters (the mixture proportions of N individuals in K populations; only K-1 numbers are needed as they have to add up to unity).
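The difference in output shape can be made concrete with a toy sketch in Python. The k-means below is a bare-bones Lloyd's algorithm on one-dimensional made-up data, with a deterministic initialization chosen only to keep the example reproducible; it is not R's implementation. The `mixture` matrix is hypothetical, written by hand to show what mixture-mode output looks like, and is not the result of running structure. The parameter counts follow the accounting in the paragraph above.

```python
def kmeans_1d(points, k, iters=20):
    """Lloyd's algorithm on 1-D data; returns one hard cluster label per point.

    Assumes k >= 2. Initial centers are spread evenly over the sorted data so
    the toy run is deterministic (an illustrative choice, not R's default).
    """
    ordered = sorted(points)
    n = len(ordered)
    centers = [ordered[i * (n - 1) // (k - 1)] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: every point gets exactly one label.
        labels = [min(range(k), key=lambda j: (x - centers[j]) ** 2)
                  for x in points]
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [x for x, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = sum(members) / len(members)
    return labels

def free_parameters(n, k):
    """N labels for a hard partition versus N(K-1) mixture proportions."""
    return {"hard_labels": n, "mixture_proportions": n * (k - 1)}

points = [0.0, 0.1, 5.0, 5.1, 10.0, 10.1]   # six made-up "individuals"
labels = kmeans_1d(points, k=3)
print(labels)   # one integer per individual

# Hypothetical mixture-mode output for the same six individuals:
# one row per individual, K = 3 proportions that sum to 1.
mixture = [
    [1.0, 0.0, 0.0],
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.0, 1.0, 0.0],
    [0.2, 0.1, 0.7],
    [0.0, 0.0, 1.0],
]
counts = free_parameters(len(points), 3)
print(counts)
```

A hard partition can always be written as a mixture matrix whose rows are all 0s and a single 1, but not the other way around, which is why the two outputs are not interchangeable.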

The only thing these algorithms have in common is that they require that the user input K. This point has been used by the plethora of negative reviews of Wade's book to argue that the classification of humans into biological races is arbitrary as it is subjective (it relies on user input of K).

This is a rather weak objection, for at least a couple of reasons. First, K can also be estimated from data: there are indeed clustering algorithms (such as fineStructure) that do not require user input of K, but instead identify a value of K and organize the K ancestral populations into a hierarchical tree whose deep splits correspond exactly to the continental human races. Another popular algorithm, ADMIXTURE, proposes a cross-validation procedure to choose K. So, the choice of K can be automated and need not be subjective.

The more important reason against the "subjective K" objection is that it does not in any way invalidate the partitioning of humans into different K at different levels of granularity. This is reasonably easy to understand: the whole field of taxonomy divides living things into a hierarchical structure. In some cases it is useful to speak of vertebrates, and in others it's useful to speak of mammals, or primates, etc. In humans it's sometimes useful to speak of the entire species H. sapiens in contradistinction to other species, when studying what is common to humans, and sometimes it is useful to speak of major populations of H. sapiens (such as Europeans or East Asians), or minor ones (e.g., Mongols and Vietnamese), when studying how human groups differ from one another. These groupings are not arbitrary, but appear when biological traits (e.g., SNPs) are subjected to various types of analysis (including structure and similar algorithms).