May 31, 2011

Y-chromosome, mtDNA, and autosomal DNA from Treilles (5,000 years ago, Neolithic France)

The paper is behind a paywall, but there is plentiful raw genetic data in the online supplement. I'll probably have much more to say on this when I read it, but here's the groundbreaking part:

Most of this sample belonged to haplogroup G2a-P15 with some I2a-P37.2 also represented.

G2a was also one of the haplogroups represented in a small sample from Neolithic Central Europe. I think we can now safely say that G2a may have been the main Neolithic link that ties the farmers that went north across the Balkans to Central Europe, and those that followed the western, maritime route to the Western Mediterranean. The unambiguous West Asian origin of this lineage should put to rest any ideas about Neolithic farmers in the Western Mediterranean being descended from indigenous Mesolithic foragers.

I-P37.2 is also quite interesting, as it is tied to the Balkans, but also modern Southwestern Europe (it is especially frequent in Sardinia in its derived M26+ form). ISOGG tells me that:
I2-M438 et al includes I2* which shows some membership from Armenia, Georgia and Turkey; I2a-P37.2, which is the most common form in the Balkans and Sardinia. I2a1-M26 is especially prevalent in Sardinia. I2b-M436 et al reaches its highest frequency along the northwest coast of continental Europe. I2b1-M223 et al occurs in Britain and northwest continental Europe. I2b1a-M284 occurs almost exclusively in Britain, so it apparently originated there and has probably been present for thousands of years.
If these aren't signals of a pioneer colonization that followed the maritime route along the Mediterranean and Atlantic, I don't know what is.

What is absent is just as interesting as what is present. The absence of E1b1b is consistent with my theory about the Bronze Age Greek expansion of that haplogroup in Europe that has been tied to the historical Greeks of the West Mediterranean.

R-M269, which because of its apparently young Y-STR age has been tied by some to either the Mediterranean or the Central European Neolithic, is conspicuously absent from both at the moment. It may yet surface in a Neolithic context, but its absence this late from a region where, today, it is abundant only adds to its mystery. The absence of J2 is equally mysterious, as this is another putative Neolithic lineage which has failed to appear so far in a Neolithic context, while its J1 sister clade did make an appearance in much later aboriginals from the Canary Islands.

UPDATE I: Interestingly, some French researchers had noted a littoral distribution of haplogroups I, J, G in the Finistère, on the Atlantic side.

UPDATE II: I was reviewing my Ancient Y-chromosome studies compendium and one thing starts to become clear: many of the earliest samples we have were dominated by 1-2 haplogroups, whereas there is a plethora of haplogroups in most modern populations. Treilles, Krasnoyarsk, Xiaohe, and Pengyang all belonged to a single haplogroup, while Yangtze China belonged to several lineages, all of which were in the O haplogroup.

Look at the MDS plot of the Y-chromosome and mtDNA from Treilles:

The Y-chromosome is an extreme outlier compared to modern groups, probably because of its heavy G2a domination, whereas the mtDNA from Treilles appears just like a normal and unexceptional Mediterranean-type population.

Perhaps the modern Caucasus, where particular ethnic groups are dominated by particular Y-haplogroups, is a good analogy for prehistoric man: many different groups, each with its signature haplogroups, kept disjoint patrilineal gene pools before beginning to merge in late prehistoric and historical times.

UPDATE III: A poster at dna-forums as well as Ken Nordtvedt both agree that the I2a haplotypes belong to I-M26, a haplogroup that is modal in the SW Mediterranean, reaching very high frequencies in Sardinia. This may be consistent with the great biological continuity since the Neolithic in Sardinia, continuity which is also evident in the mtDNA. It also shows why the inference of pre-Neolithic I2a in Sardinia was flawed: it relied on the evolutionary mutation rate, whereas the origin and expansion of I-M26 in "genealogical rate" years becomes 5-7ky, consistent with a Neolithic origin of that haplogroup and with its ancient DNA presence in Neolithic France.
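
To make the mutation-rate point concrete, here is a minimal sketch of how the same Y-STR variance turns into very different ages depending on the rate one plugs in. Everything numeric in it is an assumption for illustration: the variance is made up, the 25-year generation is a convention, and the two rates (about 6.9e-4 for the "evolutionary" rate and about 2.1e-3 for the "genealogical" rate, per locus per generation) are commonly quoted ballpark figures, not values taken from the paper or from the posts mentioned above.

```python
# Rough sketch: age of a haplogroup from Y-STR variance, T = variance / rate.
# All numbers below are illustrative assumptions, not values from the paper.

YEARS_PER_GENERATION = 25  # assumed generation length

# Assumed per-locus, per-generation mutation rates (ballpark figures).
EVOLUTIONARY_RATE = 6.9e-4   # "evolutionary" effective rate
GENEALOGICAL_RATE = 2.1e-3   # "genealogical" (father-son) rate


def age_in_years(mean_str_variance, rate_per_generation):
    """Convert mean per-locus repeat variance into an age estimate in years."""
    generations = mean_str_variance / rate_per_generation
    return generations * YEARS_PER_GENERATION


if __name__ == "__main__":
    variance = 0.5  # hypothetical mean repeat-count variance across loci
    evo = age_in_years(variance, EVOLUTIONARY_RATE)
    gen = age_in_years(variance, GENEALOGICAL_RATE)
    print(f"evolutionary rate: {evo:,.0f} years")
    print(f"genealogical rate: {gen:,.0f} years")
    print(f"ratio: {evo / gen:.1f}x")
```

With these assumed inputs the two estimates differ by a factor of about three, which is the whole difference between a pre-Neolithic and a Neolithic date.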

UPDATE IV: Table S4 lists (in %) shared mtDNA lineages between Treilles and modern populations. The top ones are: Welsh (17.391), Cornish (16.667), Central Greeks (14.286), Bulgarians (12.5). Several Italian groups as well as South Tyrol Ladins and Germans are also greater than 10%.

UPDATE V: The G2a median joining network shows that the Treilles haplotypes are disjoint from those that dominate the North Caucasus, with clear links to the Middle East, Central/East Mediterranean regions, as well as the South Caucasus.

UPDATE VI: Some more good news: "The ancient DNA Lacan is now extracting from skeletons across France and Spain, Haak says, should provide more “piece[s] of the enormous puzzle we are trying to put together.”"

PNAS doi: 10.1073/pnas.1100723108

Ancient DNA reveals male diffusion through the Neolithic Mediterranean route

Marie Lacan et al.

The Neolithic is a key period in the history of the European settlement. Although archaeological and present-day genetic data suggest several hypotheses regarding the human migration patterns at this period, validation of these hypotheses with the use of ancient genetic data has been limited. In this context, we studied DNA extracted from 53 individuals buried in a necropolis used by a French local community 5,000 y ago. The relatively good DNA preservation of the samples allowed us to obtain autosomal, Y-chromosomal, and/or mtDNA data for 29 of the 53 samples studied. From these datasets, we established close parental relationships within the necropolis and determined maternal and paternal lineages as well as the absence of an allele associated with lactase persistence, probably carried by Neolithic cultures of central Europe. Our study provides an integrative view of the genetic past in southern France at the end of the Neolithic period. Furthermore, the Y-haplotype lineages characterized and the study of their current repartition in European populations confirm a greater influence of the Mediterranean than the Central European route in the peopling of southern Europe during the Neolithic transition.


May 30, 2011

Tonal preferences in music/speech

PLoS ONE 6(5): e20160. doi:10.1371/journal.pone.0020160

Co-Variation of Tonality in the Music and Speech of Different Cultures

Shui' er Han et al.

Whereas the use of discrete pitch intervals is characteristic of most musical traditions, the size of the intervals and the way in which they are used is culturally specific. Here we examine the hypothesis that these differences arise because of a link between the tonal characteristics of a culture's music and its speech. We tested this idea by comparing pitch intervals in the traditional music of three tone language cultures (Chinese, Thai and Vietnamese) and three non-tone language cultures (American, French and German) with pitch intervals between voiced speech segments. Changes in pitch direction occur more frequently and pitch intervals are larger in the music of tone compared to non-tone language cultures. More frequent changes in pitch direction and larger pitch intervals are also apparent in the speech of tone compared to non-tone language cultures. These observations suggest that the different tonal preferences apparent in music across cultures are closely related to the differences in the tonal characteristics of voiced speech.


How to create Zombies from ADMIXTURE

An important new (semi-)technical post over at the Dodecad blog shows how to harness allele frequencies from unsupervised ADMIXTURE results to create dummy populations corresponding to the inferred ancestral populations, and how to use these "zombies" to do cool things fast.
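
The general idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure from the Dodecad post: it assumes a table of allele frequencies with one row per SNP and one column per inferred component (faked here with toy numbers), and it assumes that a "zombie" is made by drawing two alleles per SNP independently from the chosen component's frequency. The function name `make_zombies` is hypothetical.

```python
# Sketch: simulate "zombie" individuals from per-component allele frequencies.
# Assumes a frequency table with one row per SNP and one column per ancestral
# component; the table used below is a toy example, not real output.
import random


def make_zombies(freqs, component, n_individuals, seed=0):
    """Draw diploid genotypes (0, 1 or 2 allele copies) for one component.

    freqs: list of rows, each row holding one allele frequency per component.
    """
    rng = random.Random(seed)
    zombies = []
    for _ in range(n_individuals):
        genotype = []
        for row in freqs:
            p = row[component]
            # Two independent allele draws per SNP (Hardy-Weinberg assumption).
            genotype.append(int(rng.random() < p) + int(rng.random() < p))
        zombies.append(genotype)
    return zombies


if __name__ == "__main__":
    # Hypothetical frequencies: 5 SNPs x 2 components.
    toy_freqs = [
        [0.90, 0.10],
        [0.20, 0.80],
        [0.50, 0.50],
        [0.99, 0.01],
        [0.05, 0.95],
    ]
    for k in range(2):
        for person in make_zombies(toy_freqs, component=k, n_individuals=3):
            print(f"component {k}:", person)
```

The simulated individuals can then be fed to other tools as if they were a reference population, which is where the speed-up would come from.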

A side-story is that it's now possible to discover the origin of the enigmatic Kalash, a population that tends to form its own cluster; it turns out these Dardic speakers of Pakistan are of West Asian origin.

UPDATE (May 30): Another post, reconstructing ANI/ASI "zombies".

May 29, 2011

Dolgopolsky on the two homelands of PIE

A classic study of the problem, which makes the two most important linguistic points:
  1. Lexical borrowing between PIE and Kartvelian/Semitic languages places the early PIE homeland in the Near East
  2. The maximum dialectal diversity within IE in the Balkans places the secondary PIE homeland in Southeastern Europe
My only point of disagreement with Dolgopolsky's model is that, in my view, the secondary Balkan homeland was responsible only for the European IE languages (and Armenian), not for the eastern migration of the Tocharians and Indo-Iranians.

I have argued elsewhere in my blog about the West Asian origin of these Asian IE branches; a Balkan origin for them seems unlikely at the moment, due to the lack of Y-haplogroup I and of the "Southern European" component among the eastern Indo-Europeans. Of course, we must wait to see what surprises archaeogenetics may have in store for us.

Mediterranean Language Review 1987 3:7-31

The Indo-European Homeland and Lexical Contacts of Proto-Indo-European with Other Languages

A. Dolgopolsky

Link (doc)

May 28, 2011

Tight/loose culture influenced by a nation's past

Audio interview with the lead author on NPR, which also has a table with the 10 loosest and 10 tightest cultures.

Another podcast in Science.

From the press release:
Gelfand and colleagues found that countries such as Japan, Korea, Singapore and Pakistan are much tighter whereas countries such as the Ukraine, Israel, Brazil, and the U.S. are looser. Their research further showed that a nation's tightness or looseness is in part determined by the ecological and human factors that have shaped its history – including wars, natural disasters, disease outbreaks, population density and scarcity of natural resources. Tight and loose societies also vary in their institutions—with tight societies having more autocratic governments, more closed media, and criminal justice systems that had more monitoring and greater deterrence of crime as compared to loose societies.

The study found that the situations that people encounter differ in tight and loose societies. For example, everyday situations—like being in park, a classroom, the movies, a bus, at job interviews, restaurants, and even one's bedroom—constrain behavior much more in tight societies and afford a wider range of behavior in loose societies.

"We also found that the psychological makeup of individual citizens varies in tight and loose societies," Gelfand said. "For example, individuals in tight societies are more prevention focused (attentive to rules), have higher self-regulation strength (more impulse control) and have higher needs for order and self-monitoring abilities than individuals in loose societies." These attributes, Gelfand said, help people to adapt to the level of constraint (or latitude) in their cultural context, and at the same time, reinforce it.

Science 27 May 2011:
Vol. 332 no. 6033 pp. 1100-1104
DOI: 10.1126/science.1197754

Differences Between Tight and Loose Cultures: A 33-Nation Study

Michele Gelfand et al.

With data from 33 nations, we illustrate the differences between cultures that are tight (have many strong norms and a low tolerance of deviant behavior) versus loose (have weak social norms and a high tolerance of deviant behavior). Tightness-looseness is part of a complex, loosely integrated multilevel system that comprises distal ecological and historical threats (e.g., high population density, resource scarcity, a history of territorial conflict, and disease and environmental threats), broad versus narrow socialization in societal institutions (e.g., autocracy, media regulations), the strength of everyday recurring situations, and micro-level psychological affordances (e.g., prevention self-guides, high regulatory strength, need for structure). This research advances knowledge that can foster cross-cultural understanding in a world of increasing global interdependence and has implications for modeling cultural change.


New Identity-by-Descent methods: MCMC and DASH

Genome Research doi: 10.1101/gr.115360.110

A method for detecting IBD regions simultaneously in multiple individuals—with applications to disease genetics

Ida Moltke et al.

All individuals in a finite population are related if traced back long enough and will, therefore, share regions of their genomes identical by descent (IBD). Detection of such regions has several important applications—from answering questions about human evolution to locating regions in the human genome containing disease-causing variants. However, IBD regions can be difficult to detect, especially in the common case where no pedigree information is available. In particular, all existing non-pedigree based methods can only infer IBD sharing between two individuals. Here, we present a new Markov Chain Monte Carlo method for detection of IBD regions, which does not rely on any pedigree information. It is based on a probabilistic model applicable to unphased SNP data. It can take inbreeding, allele frequencies, genotyping errors, and genomic distances into account. And most importantly, it can simultaneously infer IBD sharing among multiple individuals. Through simulations, we show that the simultaneous modeling of multiple individuals makes the method more powerful and accurate than several other non-pedigree based methods. We illustrate the potential of the method by applying it to data from individuals with breast and/or ovarian cancer, and show that a known disease-causing mutation can be mapped to a 2.2-Mb region using SNP data from only five seemingly unrelated affected individuals. This would not be possible using classical linkage mapping or association mapping.


The American Journal of Human Genetics, doi:10.1016/j.ajhg.2011.04.023

DASH: A Method for Identical-by-Descent Haplotype Mapping Uncovers Association with Recent Variation

Alexander Gusev et al.

Rare variants affecting phenotype pose a unique challenge for human genetics. Although genome-wide association studies have successfully detected many common causal variants, they are underpowered in identifying disease variants that are too rare or population-specific to be imputed from a general reference panel and thus are poorly represented on commercial SNP arrays. We set out to overcome these challenges and detect association between disease and rare alleles using SNP arrays by relying on long stretches of genomic sharing that are identical by descent. We have developed an algorithm, DASH, which builds upon pairwise identical-by-descent shared segments to infer clusters of individuals likely to be sharing a single haplotype. DASH constructs a graph with nodes representing individuals and links on the basis of such segments spanning a locus and uses an iterative minimum cut algorithm to identify densely connected components. We have applied DASH to simulated data and diverse GWAS data sets by constructing haplotype clusters and testing them for association. In simulations we show this approach to be significantly more powerful than single-marker testing in an isolated population that is from Kosrae, Federated States of Micronesia and has abundant IBD, and we provide orthogonal information for rare, recent variants in the outbred Wellcome Trust Case-Control Consortium (WTCCC) data. In both cohorts, we identified a number of haplotype associations, five such loci in the WTCCC data and ten in the isolated, that were conditionally significant beyond any individual nearby markers. We have replicated one of these loci in an independent European cohort and identified putative structural changes in low-pass whole-genome sequence of the cluster carriers.


Rapidly mutating Y-STR panel

This should also be of interest to genealogists, as the ability to tell apart very closely related individuals is essential to proving a genealogical link, rather than a more general link of an individual to a broader patrilineal kinship group.

Forensic Sci Int Genet. 2011 May 23. [Epub ahead of print]

A new future of forensic Y-chromosome analysis: Rapidly mutating Y-STRs for differentiating male relatives and paternal lineages.

Ballantyne KN, Keerl V, Wollstein A, Choi Y, Zuniga SB, Ralf A, Vermeulen M, de Knijff P, Kayser M.


The panels of 9-17 Y-chromosomal short tandem repeats (Y-STRs) currently used in forensic genetics have adequate resolution of different paternal lineages in many human populations, but have lower abilities to separate paternal lineages in populations expressing low Y-chromosome diversity. Moreover, current Y-STR sets usually fail to differentiate between related males who belong to the same paternal lineage and, as a consequence, conclusions cannot be drawn on the individual level as is desirable for forensic interpretations. Recently, we identified a new panel of rapidly mutating (RM) Y-STRs, composed of 13 markers with mutation rates above 1×10(-2), whereas most Y-STRs, including all currently used in forensics, have mutation rates in the order of 1×10(-3) or lower. In the present study, we demonstrate in 604 unrelated males sampled from 51 worldwide populations (HGDP-CEPH) that the RM Y-STRs provide substantially higher haplotype diversity and haplotype discrimination capacity (with only 3 haplotypes shared between 8 of the 604 worldwide males), than obtained with the largest set of 17 currently used Y-STRs (Yfiler) in the same samples (33 haplotypes shared between 85 males). Hence, RM Y-STRs yield high-resolution paternal lineage differentiation and provide a considerable improvement compared to Yfiler. We also find in this worldwide dataset substantially less genetic population substructure within and between geographic regions with RM Y-STRs than with Yfiler Y-STRs. Furthermore, with the present study we provide enhanced data evidence that the RM Y-STR panel is extremely successful in differentiating between closely and distantly related males. 
Among 305 male relatives, paternally connected by 1-20 meiotic transfers in 127 independent pedigrees, we show that 66% were separated by mutation events with the RM Y-STR panel whereas only 15% were with Yfiler; hence, RM Y-STRs provide a statistically significant 4.4-fold increase of average male relative differentiation relative to Yfiler. The RM Y-STR panel is powerful enough to separate closely related males; nearly 50% of the father and sons, and 60% of brothers could be distinguished with RM Y-STRs, whereas only 7.7% and 8%, respectively, with Yfiler. Thus, by introducing RM Y-STRs to the forensic genetic community we provide important solutions to several of the current limitations of Y chromosome analysis in forensic genetics.
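
For genealogists wondering what this means in practice, here is a back-of-the-envelope sketch of the chance that two paternally related males differ at one or more markers. It assumes independent markers and ignores back-mutation. The per-marker rates are round-number assumptions: 1e-2 is only the lower bound the abstract gives for the RM markers, so this floor value will understate the panel's real resolving power, and 1e-3 is an assumed figure for a Yfiler-like set.

```python
# Sketch: chance that two paternally related males differ at one or more
# Y-STRs, given per-marker mutation rates and the number of meioses between
# them. Rates below are hypothetical round numbers, not the paper's estimates.


def prob_differentiated(rates, meioses):
    """P(at least one mutation across all markers over `meioses` transfers)."""
    p_no_mutation = 1.0
    for mu in rates:
        p_no_mutation *= (1.0 - mu) ** meioses
    return 1.0 - p_no_mutation


if __name__ == "__main__":
    rm_panel = [1e-2] * 13     # 13 markers at the abstract's 1e-2 lower bound
    yfiler_like = [1e-3] * 17  # 17 markers at an assumed 1e-3 each

    for meioses in (1, 2, 5, 10):
        rm = prob_differentiated(rm_panel, meioses)
        yf = prob_differentiated(yfiler_like, meioses)
        print(f"{meioses:2d} meioses: RM {rm:.1%}  Yfiler-like {yf:.1%}")

    # The fold increase quoted in the abstract follows from its own figures.
    print(f"66% / 15% = {66 / 15:.1f}-fold")
```

Even with the floor rates, the gap between the two panels widens quickly as the number of meioses grows, which is the behavior that makes the RM panel useful for telling close relatives apart.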


May 26, 2011

Accurate fake smile detecting collectivist Chinese

Random bit of knowledge from the paper:
In Eastern cultures, especially China, “one must NOT show ones' teeth when smiling” is a strict rule of discipline for women that has lasted thousands of years, ever since the Tang Dynasty (so the Mona Lisa's smile could also have been appreciated by ancient Chinese). Ancient Chinese women even used adornments around the mouth (e.g., fake dimples) to compensate for the lack of emotional information conveyed by the mouth during their closed-mouth smiles. A good example of such an historic and prevalent influence of cultural value on the role of the mouth in smiles can be illustrated by contrasting the smile emoticons used on the Internet by Easterners and Westerners. In common Western smile emoticons such as :-) or :), the mouth is exaggerated with a crimped line whereas the eyes are simplified as two dots. As a contrast, Japanese use smile emoticons with a simplified mouth but crimped eyes, e.g., (ʌ.ʌ) or (ʌ_ʌ) [16], [17], [18]. Chinese, especially females, go even further by not only adopting simplified mouth and crimped eyes, but also infusing them with the ancient tradition of attaching fake dimples, e.g., (*ʌ_ʌ*), ( = ʌ_ʌ = ), or (@ʌ_ʌ@) [19].
I think the distinction between Western and Eastern here may not accurately capture the temporal dynamics of this process. If one looks back at Western art (the Mona Lisa example is apt), I think one would be hard pressed to find examples of the big bright smile that seems to be favored in much of Western culture today.

Indeed, when I was collecting pictures for my "women from the 40s vs. women from the 2000s facial composites", I noticed how difficult it was to find pictures of the modern women that did not adopt the typical wide smile. I don't remember ever seeing an ancient Greek depiction of a modern-type smile, and the Greek meidiama was a decidedly closed-mouth affair, with the occasional depiction of satyrs slightly deviating towards a more open-mouth smile, but of a malevolent rather than pleasant bent.

Going back to the East Asian smile, this picture of an anime character seems to match the emoticon quite well, also making it obvious why "the eyes have it" when it comes to accurate detection of smiling:

PLoS ONE 6(5): e19903. doi:10.1371/journal.pone.0019903

Eyes Are Windows to the Chinese Soul: Evidence from the Detection of Real and Fake Smiles

Xiaoqin Mai et al.

How do people interpret the meaning of a smile? Previous studies with Westerners have found that both the eyes and the mouth are crucial in identifying and interpreting smiles, yet less is known about Easterners. Here we reported that when asking the Chinese to judge the Duchenne and non-Duchenne smiles as either real or fake, their accuracy and sensitivity were negatively correlated with their individualism scores but positively correlated with their collectivism scores. However, such correlations were found only for participants who stated the eyes to be the most useful references, but not for those who favored the mouth. Moreover, participants who favored the eyes were more accurate and sensitive than those who favored the mouth. Our results thus indicate that Chinese who follow the typical Eastern decoding process of using the eyes as diagnostic cues to identify and interpret others' facial expressions and social intentions, are particularly accurate and sensitive, the more they self-report greater collectivistic and lower individualistic values.


May 24, 2011

Michael Frachetti on the Inner Asian Mountain Corridor

Here is another video of Michael Frachetti's talk in the Secrets of the Silk Road symposium:

The abstract of the talk:

Abstract - Seeds for the Soul: East/West Diffusion of Domesticated Grains along the Inner Asian Mountain Corridor.
Inner Asia has commonly been conceived as a region of Nomadic societies surrounded by agricultural civilizations throughout Antiquity. Societies of China, SW Asia, and Eastern Europe each developed agriculture in the Neolithic, while the earliest evidence for agriculture from the Eurasian steppe shows it was not a major part of local economies until the Iron Age (c. 700 BC). Newly discovered botanical evidence of ancient domesticated wheat and millet at the site of Begash in Kazakhstan, however, show that mobile pastoralists of the steppe had access to domesticated grains already by 2300 BC and that they were likely essential to the diffusion of wheat into China, as well as millet into SW Asia and Europe in the mid-3rd millennium BC. Currently, Begash provides the only directly dated botanical evidence of these crisscrossed channels of interaction. Whatsmore, the seeds from Begash were found in a ritual cremation context rather than domestic hearths. This fact may suggest that the earliest transmission of domesticated grains between China and SW Asia was sparked by ideological, rather than economic forces. This paper describes the earliest known evidence of wheat in the Eurasian steppes and explores the extent of ritual use of domesticated grains from China to SW Asia, across the Inner Asian mountains.

All in all, a very enlightening talk. It suggests that the mountain corridor south of the Caspian Sea, lands that would later be part of the Silk Road, was the main conduit for cultural exchange between east and west, with Begash having the earliest presence of wheat in the steppe, in a ritual context (more below).

His passing remark about the absence of grains east of the Don and all the way to Mongolia is interesting in terms of some of my recent comments.

Frachetti points out how misguided it is to view the Eurasian steppe as a uniform culture area: the horse and cattle were more important in the European steppe, whereas goats were much more important in the Asian steppe, which had a full-blown pastoralist economy that did not depend on horses.

He thinks that domesticated wheat and millet moved in opposite directions (from West Asia and China) and arrived in Central Asia, a land formerly devoid of the cereals that were used in the great civilizations of the Aegean, Near East, South Asia, and China.

His inference that the wheat at Begash and Xiaohe had a ritual funerary use seems very well-argued, although over time wheat acquired an alimentary role as well. They basically find no grains anywhere on the site except at a cremation burial from an early period where wheat was deposited; the existence of a cremation burial is in itself interesting.

During the Q&A an attendee expresses incredulity that wheat would be used in such a context, but really I see no problem with it, as the offering of wheat in that context has a long history, and is, indeed, widely practiced even today.

At Begash we seem to be witnessing the beginnings of the spread of ideology to a steppe population. These steppe pastoralists seem to be adopting the use of wheat as a symbol of life, or "food for the dead", and the fact that they probably traded for this commodity suggests its symbolic importance to them.

Finding the founder of Stockholm

Birger and his son belonged to Y-haplogroup I1 and had haplogroup H and Z1a mtDNA respectively. The presence of Z1a is interesting, suggesting that occasional Asian mtDNA sequences in Swedes may have been present in that population from a fairly early historical period. The female had mtDNA U5b1.

The authors also tested for the lactase persistence allele: Birger was heterozygous, and the other two individuals had the T (persistent) allele.

The Y-chromosome results will be added to the Ancient Y-chromosome studies page.

Annals of Anatomy - Anatomischer Anzeiger

Finding the founder of Stockholm – A kinship study based on Y-chromosomal, autosomal and mitochondrial DNA

Helena Malmström et al.

Historical records claim that Birger Magnusson (died 1266), famous regent of Sweden and the founder of Stockholm, was buried in Varnhem Abbey in Västergötland. After being lost for centuries, his putative grave was rediscovered during restoration work in the 1920s. Morphological analyses of the three individuals in the grave concluded that the older male, the female and the younger male found in the grave were likely to be Birger, his second wife Mechtild of Holstein and his son Erik from a previous marriage. More recent evaluations of the data from the 1920s seriously questioned these conclusions, ultimately leading to the reopening and reexamination of the grave in 2002. Ancient DNA-analyses were performed to investigate if the relationship between the three individuals matched what we would expect if the individuals were Birger, Erik and Mechtild. We used pyrosequencing of Y-chromosomal and autosomal SNPs and compared the results with haplogroup frequencies of modern Swedes to investigate paternal relations. Possible maternal kinship was investigated by deep FLX-sequencing of overlapping mtDNA amplicons. The authenticity of the sequences was examined using data from independent extractions, massive clonal data, the c-statistics, and real-time quantitative data. We show that the males carry the same Y-chromosomal haplogroup and thus we cannot reject a father–son type of relation. Further, as shown by the mtDNA analyses, none of the individuals are maternally related. We conclude that the graves indeed belong to Birger, Erik and Mechtild, or to three individuals with the exact same kind of biological relatedness.


The reality of the Altaic language family

Personally I'm not surprised by this; my own look at genomic data has identified an "Altaic" component which peaks at the Turkic Yakut and Tungusic Evenk, and is shared by every Turkic, Mongolic, and Tungusic population available to me. The same component also occurs to some extent among all the Japanese (5) and Korean (4) members of the Dodecad Project, while it is lacking in all the Chinese ones (8).

Of particular interest is the degree of CCM between Indo-European and Semitic languages (Tables 2 and 3). For many of the most geographically distant languages these values are less than 10; by comparison, among Semitic languages they are all greater than 20. This seems to be quite in agreement with the idea that Semitic is a Bronze Age language family, Indo-European a Neolithic one.

This impression is strengthened by the fact that CCM values between reconstructed proto-languages (e.g. Proto-Iranian and Proto-Slavic = 20) are much higher. Since these proto-languages are a few thousand years closer to the root of PIE than present-day languages, and differences between them are similar to those of Semitic languages, the notion that PIE is a few thousand years older than Proto-Semitic seems quite consistent with the evidence.
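
For readers who want a feel for what a CCM-style score is, here is a toy sketch. It is not the authors' implementation: the consonant-class table is a simplified, hypothetical grouping, the use of the first two consonants is my assumption about how "strict phonological identification" might be operationalized, and the word lists are ordinary spellings rather than phonological transcriptions. Words are compared only when they have the same meaning, which is the one feature taken directly from the abstract.

```python
# Sketch of a consonant-class comparison in the spirit of CCM.
# The class table is a simplified, hypothetical grouping and the word lists
# are toy orthographic forms; neither is taken from the paper.

CONSONANT_CLASSES = {
    "P": "pbfv",
    "T": "tdθ",
    "S": "sz",
    "K": "kgxhcq",
    "M": "m",
    "N": "n",
    "R": "rl",
    "W": "w",
    "J": "jy",
}
CLASS_OF = {ch: cls for cls, chars in CONSONANT_CLASSES.items() for ch in chars}


def skeleton(word, length=2):
    """Return the classes of the first `length` consonants in a word."""
    classes = [CLASS_OF[ch] for ch in word.lower() if ch in CLASS_OF]
    return "".join(classes[:length])


def ccm_score(list_a, list_b):
    """Count meanings whose words share the same consonant-class skeleton."""
    return sum(
        1
        for a, b in zip(list_a, list_b)
        if skeleton(a) and skeleton(a) == skeleton(b)
    )


if __name__ == "__main__":
    # Same five meanings in each list: water, night, stone, hand, fish.
    english = ["water", "night", "stone", "hand", "fish"]
    german = ["wasser", "nacht", "stein", "hand", "fisch"]
    finnish = ["vesi", "yö", "kivi", "käsi", "kala"]

    print("English vs German: ", ccm_score(english, german))   # 4
    print("English vs Finnish:", ccm_score(english, finnish))  # 0
```

A real analysis would use a standard word list and then ask how likely the observed number of matches is by chance, which is the statistical step the abstract emphasizes and which this sketch leaves out.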

Journal of Language Relationship • Вопросы языкового родства • 3 (2010) • Pp. 117–126 • © Turchin P., Peiros I., Gell-Mann M., 2010

Analyzing genetic connections between languages by matching consonant classes

Peter Turchin (University of Connecticut)
Ilia Peiros (Santa Fe Institute)
Murray Gell-Mann (Santa Fe Institute)

The idea that the Turkic, Mongolian, Tungusic, Korean, and Japanese languages are genetically related (the “Altaic hypothesis”) remains controversial within the linguistic community. In an effort to resolve such controversies, we propose a simple approach to analyzing genetic connections between languages. The Consonant Class Matching (CCM) method uses strict phonological identification and permits no changes in meanings. This allows us to estimate the probability that the observed similarities between a pair (or more) of languages occurred by chance alone. The CCM procedure yields reliable statistical inferences about historical connections between languages: it classifies languages correctly for well-known families (Indo-European and Semitic) and does not appear to yield false positives. The quantitative patterns of similarity that we document for languages within the Altaic family are similar to those in the non-controversial Indo-European family. Thus, if the Indo-European family is accepted as real, the same conclusion should also apply to the Altaic family.

Link (pdf)

May 23, 2011

Perceptions of reverse racism on the increase

From the press release:
Whites believe that they are replacing blacks as the primary victims of racial discrimination in contemporary America, according to a new study from researchers at Tufts University's School of Arts and Sciences and Harvard Business School. The findings, say the authors, show that America has not achieved the "post-racial" society that some predicted in the wake of Barack Obama's election.

Both whites and blacks agree that anti-black racism has decreased over the last 60 years, according to the study. However, whites believe that anti-white racism has increased and is now a bigger problem than anti-black racism.
Perspectives on Psychological Science May 18, 2011 vol. 6 no. 3 215-218

Whites See Racism as a Zero-Sum Game That They Are Now Losing
Michael I. Norton and Samuel R. Sommers


Although some have heralded recent political and cultural developments as signaling the arrival of a postracial era in America, several legal and social controversies regarding “reverse racism” highlight Whites’ increasing concern about anti-White bias. We show that this emerging belief reflects Whites’ view of racism as a zero-sum game, such that decreases in perceived bias against Blacks over the past six decades are associated with increases in perceived bias against Whites—a relationship not observed in Blacks’ perceptions. Moreover, these changes in Whites’ conceptions of racism are extreme enough that Whites have now come to view anti-White bias as a bigger societal problem than anti-Black bias.

May 22, 2011

Horse not important for the emergence of steppe pastoralism

The earliest horses from the Botai culture of Kazakhstan were used for the mares' milk and were hunted for food. It has also been suggested that the horse was instrumental in the early emergence of Eurasian pastoralism. If that is true, then we expect to find horse remains in steppe pastoralist cultures alongside the other domesticated animals (goats and cattle; the pig is lacking).

A paper by Frachetti and Benecke looked at the chronological sequence of the Begash culture from southeastern Kazakhstan. Surprisingly, they found no horse bones in the earliest period and only a few in subsequent periods; horses reached 14 per cent of the animals only in the latest (post-Mongolian) phases.

From the paper:
The relative frequencies show a steady increase of this species through time, from 2 per cent in Phase 1b to 14 per cent in the Phases 5 and 6. Whether the lack of horses in the earliest phase of occupation (Phase 1a) is an artefact of the small size of the total faunal collection or was a reality remains an open question. The second half of the third millennium BC, which roughly corresponds with Phase 1a, is considered as the period when horse domestication flourished in Western Asia (Benecke & von den Driesch 2003). Nevertheless, percentages of horse remains at Begash remain below 6 per cent until approximately AD 50 (Phase 3b).

The steady increase in horses in the faunal record does correlate with documented political and social expansions of eastern Eurasian mobile pastoralists in the mid-first millennium BC. For example, the increase in horses in Phase 3 (750 BC-AD 50) corresponds with the growth of nomadic steppe confederacies such as the Saka and Wusun (Chang et al. 2003; Rogers 2007).


The domestic horse is documented at Begash by the start of the second millennium BC, but its impact on pastoralism is not clear. It is true that by increasing their use of the horse throughout the Iron Age and later periods, the inhabitants of Begash likely improved their mobility and access to pastures across various ecological niches for their primary herd animals. Nevertheless, the zooarchaeological record from Begash illustrates that the increase in horses through time correlates first with opportunistic hunting forays at the end of the Bronze Age and then with expanding political engagements that undoubtedly reshaped the organisation of Eurasian pastoralist communities from the first millennium BC onward.

When compared to the relative stability of other domesticates at Begash, the small Bronze Age presence and limited expansion of horses in the faunal record before historic periods demands that we reconsider the degree to which domestic horses played a dominant role in emerging pastoralist lifeways or in aiding the diffusion of regional material culture throughout prehistory. Specifically, there is not ample evidence for extensive horse riding during the second millennium BC at Begash. To the degree that Begash is indicative of other pastoralist settlements in the region, the faunal evidence directly challenges the image of middle to late Bronze Age pastoralists (2000-1000 BC) as derived from migrating horse-riders (Kuz’mina 2008) and suggests that horse riding was not the most significant catalyst for regional diffusion at this point in prehistory. This does not demote the importance of domestic horse riding as a key innovation for Eurasian populations in general or defray its historical impact on the region writ large; rather these data suggest there were other mechanisms by which pastoralism, material culture, and ideology developed among regional populations in the third and second millennia BC (Frachetti 2008a).
The early "cowboys of the steppe" paradigm is slowly collapsing. Certainly the horse was known, milked, eaten, and occasionally ridden on the steppes, but its central role in the emergence of Eurasian pastoralism has been overstated on rather flimsy evidence.

It is only in the 1st millennium BC, when it is picked up by Iranic/Turkic warrior confederations, that the horse starts to affect Eurasia in a big way. That is precisely the time when the Scythians appear in West Eurasia from their eastern homeland, followed centuries later by nomadic groups from the west and north making their presence felt in China.

Volume: 83 Number: 322 Page: 1023–1037

From sheep to (some) horses: 4500 years of herd structure at the pastoralist settlement of Begash (south-eastern Kazakhstan)

Michael Frachetti and Norbert Benecke

Does the riding of horses necessarily go with the emergence of Eurasian pastoralism? Drawing on their fine sequence of animal bones from Begash, the authors think not. While pastoral herding of sheep and goats is evident from the Early Bronze Age, the horse appears only in small numbers before the end of the first millennium BC. Its adoption coincides with an increase in hunting and the advent of larger politically organised


More on Out of North Africa

I already covered this commentary (Was North Africa the Launch Pad for Modern Human Migrations?) back in January when it appeared in Science, but it's a great time to read it again, as the redrawing of the human Y-chromosome tree may vindicate its perspective, and indicate that Out-of-Africa was in reality Out-of-North Africa.

In any case, here's a pdf copy (and another) of the Balter piece, as well as a podcast in which he talks about "the growing evidence that North Africa was the original home of the modern humans who first trekked out of the continent".

Here is another spooky date coincidence with our new 142ka forefather:
During the past 2 years, the dates have gotten even older. In 2009, Barton and Abdeljalil Bouzouggar of the National Institute of Archaeological and Heritage Sciences in Rabat reported OSL dates of at least 110,000 years from the Aterian site of Dar es-Soltan in Morocco; and in a new volume edited by Garcea, the team reports similarly old dates from three other Moroccan caves. Then in September, TL dates of about 145,000 years were reported for Ifri n’Ammar in Morocco. “The Aterian goes back at least 145,000 years,” Stringer says. “That’s an incredible length of time.”
PS: This is probably also a good place to remind readers of the story of Kiffians and Tenerians. Methinks that the Kiffians may have been some of the last unadmixed indigenous descendants of the ancestral Saharan population during its latest Holocene wet phase; these late hunter-gatherers were eventually replaced by the tide of farmers on both sides of the Sahara, but perhaps some of their genes live on in NW Africa.

May 20, 2011

Is Jebel Irhoud the Father of mankind?

The redating of the human Y-chromosome phylogeny to about 142 thousand years ago and the relocation of its most ancient lineages from east and south Africa to the Northwest mark a watershed moment in our understanding of human prehistory.

It is a fortuitous coincidence that there actually is a sample of humans from Northwest Africa from around the same time: Jebel Irhoud in Morocco, dated to about 160 thousand years ago:
Jebel Irhoud is a cave site located about 100 km west of Marrakech, Morocco. The site is known for the numerous hominid fossils discovered there. Currently, the site has yielded seven specimens. The best known of these are portions of two adult skulls, Irhoud 1 and 2, a child’s mandible (Irhoud 3), and a child’s humerus (Irhoud 4). Fossils 1-3 were discovered while the cave was being quarried for barytes and thus their exact context and age has been subject to debate. Originally the Irhoud hominids were considered North African Neandertals. It is now clear that they are best grouped with other early anatomically modern humans such as Qafzeh (Israel) and Skhul (Israel).

A 2007 article by Smith et al. is extremely important for this population:
Earliest evidence of modern human life history in North African early Homo sapiens

Tanya M. Smith et al.

Recent developmental studies demonstrate that early fossil hominins possessed shorter growth periods than living humans, implying disparate life histories. Analyses of incremental features in teeth provide an accurate means of assessing the age at death of developing dentitions, facilitating direct comparisons with fossil and modern humans. It is currently unknown when and where the prolonged modern human developmental condition originated. Here, an application of x-ray synchrotron microtomography reveals that an early Homo sapiens juvenile from Morocco dated at 160,000 years before present displays an equivalent degree of tooth development to modern European children at the same age. Crown formation times in the juvenile's macrodont dentition are higher than modern human mean values, whereas root development is accelerated relative to modern humans but is less than living apes and some fossil hominins. The juvenile from Jebel Irhoud is currently the oldest-known member of Homo with a developmental pattern (degree of eruption, developmental stage, and crown formation time) that is more similar to modern H. sapiens than to earlier members of Homo. This study also underscores the continuing importance of North Africa for understanding the origins of human anatomical and behavioral modernity. Corresponding biological and cultural changes may have appeared relatively late in the course of human evolution.
In the recent paper on the Ceprano calvarium, Irhoud 1 clearly belonged in the modern human cluster, and so it was in my re-analysis of that data, as were skulls from the Sudan and Tanzania in Africa, and the Qafzeh/Skhul early skulls from the Levant.

It thus seems to me that the earliest modern human skulls are found in North/East Africa and West Asia, while the root of the Y-chromosome phylogeny is provisionally in Northwest Africa and seems to be in agreement with the autosomal evidence for a bottleneck in the human population at around 150,000 years ago.

Here is the interesting part: Irhoud had once been seen as a Neandertal. Indeed, it displayed some Neandertal-like leanings in a previous analysis. The consensus now (supported by the results of Mounier et al.) seems to be that it was modern human, but the Neandertal connection does not stop there:

The lithic industries of Jebel Irhoud were Mousterian, the same as those of Neandertals. Mousterian industries link European Neandertals with modern humans in North Africa and the Near East. The Mousterian industries represented genuine progress over the Acheulean tools that archaic humans had been using for hundreds of thousands of years before, and they, in turn, were replaced by the Aurignacian at exactly the time to which Cruciani et al. date the main Y-chromosome CT clade that encompasses all Eurasians and most Africans.

The evidence seems to be in astonishing agreement with my hypothesis about the so-called "Neandertal admixture" in modern humans:
  • Early modern humans originated in North Africa, or at least somewhere between North and East Africa. Their traces may very well be hidden under the sands of the once (or thrice) green Sahara
  • They formed a clade with Neandertals, and used the same Mousterian tools, while humans elsewhere continued to use the older Acheulean ones. Both of them could very well have descended from Homo heidelbergensis, although the transition is not yet clear.
  • They expanded briefly into West Asia after Marine Isotope Stage 5, 120,000 years or so ago, and appeared in the Levant (Skhul/Qafzeh). As the Sahara dried up, they must've spread both to West Asia, and deeper into Africa, and, not surprisingly, the next major branching of the Y-chromosome phylogeny dates to about that time; this accounts for the deep (but not deepest) Y-chromosome lineages in modern day San.
  • Eventually (around 40,000 years ago, after the end of wet Marine Isotope Stage 3), they developed the even more advanced Aurignacian technology, and went on to conquer most of the world, driving the Neandertals to extinction. As the Sahara dried up, they expanded into Sub-Saharan Africa once again, and this time they inundated it with their genes.
Hence, the Modern-Neandertal affinity is not the result of any hypothetical admixture event between the two: Sub-Saharan Africans have also preserved some of the genetic legacy of the older Acheulean-using populations of the continent which shifts them somewhat away from other modern humans and Neandertals.

The model in brief

195 ka: Anatomically modern humans appear in East Africa (Omo skulls)
160 ka: Mousterian-using modern humans in North Africa (Jebel Irhoud)
142 ka: Y-chromosome Adam
MIS 5 120-110 ka: Demographic expansion of modern humans in Sahara during this wet phase, followed by collapse as Sahara becomes dry; escape to West Asia and Sub-Saharan Africa; possible admixture with Neandertals and Acheulean-using Palaeoafricans respectively.
MIS 3 50-45 ka: Second demographic expansion of modern humans in Sahara, followed by collapse as Sahara becomes dry; development of Aurignacian
after 45 ka: Colonization of Eurasia and Sub-Saharan Africa by modern humans sensu stricto: anatomically and behaviorally modern people with advanced tools. Most human Y-chromosomes (belonging to C, DE, and F) coalesce to this period.
40 ka: Neandertals extinct after contact with modern humans (absorbed/killed?); some survive in periphery
Today: Most living humans descended from post 45-ka expansion people. Sub-Saharan Africans shift away from Eurasians due to a little archaic African admixture. Papuans shifted away from Eurasians due to "Denisovan" admixture. Contribution of other archaic hominins to regional Homo sapiens populations TBD.

The father of us all: 142 thousand years ago

  • Age estimates of the human Y-chromosome phylogeny root (Y-chromosome Adam) based on microsatellites were unreliable and depended on the set of fast- or slow-evolving markers one used
  • Y-chromosome Adam may have lived much earlier than what you might have heard
  • We would eventually switch from microsatellites to SNPs which would provide a better age estimate of the "father of us all"
All these predictions have come true, thanks to a long-awaited new paper by Cruciani et al. that has just appeared in the American Journal of Human Genetics.

The new age of our male-line common ancestor is ~142 thousand years ago, which is in agreement with multiple new lines of evidence, as well as a recent genetic model in which the bottleneck that gave rise to modern humans occurred 150 thousand years ago, and not 60 thousand years ago, as has commonly been thought/reported.

Cruciani et al. not only estimate a new age for Y-chromosome Adam, but also shake up the most basal clades of the tree, revising its deepest phylogeny:
To test the robustness of the backbone and the root of current Y chromosome phylogeny, we searched for SNPs that might be informative in this respect. To this aim, a resequencing analysis of a 205.9 kb MSY portion (183.5 kb in the X-degenerate and 22.4 kb in the X-transposed region) was performed for each of seven chromosomes that are representative of clade A (four chromosomes belonging to haplogroups A1a, A1b, A2, and A3), clade B, and clade CT (two chromosomes belonging to haplogroups C and R) (Table S1 available online).
The Y-chromosome phylogeny as it was understood until today had haplogroup A as the first lineage to branch off, followed by haplogroup B, with almost all non-Africans belonging to the remaining haplogroup CT.

If this phylogeny were correct, then all haplogroup A chromosomes would be placed on the same branch of the tree, and all non-A ones on another. But this is not what the authors uncovered:
The deepest branching separates A1b from a monophyletic clade whose members (A1a, A2, A3, B, C, and R) all share seven mutually reinforcing derived mutations (five transitions and two transversions, all at non-CpG sites).

So, now the most basal clade of the Y-chromosome phylogeny is what was previously known as A1b. The strength of the evidence can be seen on the left, while the comparison of the "new" with the "old" phylogeny can be seen below:

The authors also came up with new age estimates for various nodes in the tree:
To estimate the age of ancestral nodes in the tree, we used the rho statistic,20,21 considering a germline MSY mutation rate of 1.0 × 10^-9 single-nucleotide substitutions (SNS) per base per year.22 Indel variants were excluded from this calculation. We obtained a time estimate for the root of the MSY tree of 141.5 ± 15.6 KY, with an age in mutations (rho) of 29.1 ± 3.2 and values of 107.6 ± 12.2 KY, 104.9 ± 13.1 KY, 74.5 ± 12.5 KY, and 38.8 ± 9.7 KY for the coalescence age of A1a-T, A2-T, BT, and CT, respectively (Figure 1 and Table S3).
The age for CT (38,800 years) is particularly interesting as it seems to correspond well to the Upper Paleolithic revolution. It seems that the vast majority of men have a recent male-line common ancestor in the last 40,000 years, while a few are descended from Palaeoafricans who split at various times up to 100,000 years earlier than this.
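As a rough check on the numbers in the quoted passage, the conversion from rho (average number of mutations to the root) into years can be sketched as below. It assumes the conversion is simply rho divided by the per-year mutation rate of the whole resequenced region, taking the 205.9 kb and the rate per base per year from the quotes; the function names are made up, and the authors may have used a slightly different effective length.

```python
# Values taken from the quoted passages of the paper.
MUTATION_RATE = 1.0e-9      # substitutions per base per year
SEQUENCED_BASES = 205_900   # 205.9 kb of MSY resequenced per chromosome

# Expected substitutions per year over the whole resequenced region.
REGION_RATE = MUTATION_RATE * SEQUENCED_BASES


def rho_to_years(rho):
    """Convert an age in mutations (rho) into an age in years."""
    return rho / REGION_RATE


def years_to_rho(years):
    """Inverse conversion: how many mutations an age in years corresponds to."""
    return years * REGION_RATE


root_age = rho_to_years(29.1)    # rho reported for the root
root_error = rho_to_years(3.2)   # its reported uncertainty
ct_rho = years_to_rho(38_800)    # CT age converted back into mutations

print(round(root_age), round(root_error), round(ct_rho, 2))
```

Under these assumptions the root comes out at roughly 141 thousand years with an error of about 15.5 thousand, close to the published 141.5 ± 15.6 KY, and the 38.8 KY age of CT corresponds to only about eight mutations.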

It will only get better from here. The authors sequenced only about 200 thousand bases of the Y-chromosome. With the advent of cheap whole genome sequencing, this will be blown up for sure by a couple orders of magnitude: the confidence intervals will get smaller, the phylogeny may be revised at different levels of the tree, and many more internal nodes will be precisely dated.

200 thousand bases are plenty for dating the root of the tree, because there has been plenty of time for mutations to accrue since the first split between A1b and the rest. But that is not the case for much younger splits in the tree, e.g., between sub-haplogroups R1a and R1b of the R1 clade. To get good estimates of those, we will need to sequence much more of the Y-chromosome, but, thankfully, that will become possible very soon, if it is not already.
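To put rough numbers on this argument, here is a small sketch of how many mutations one expects on a lineage for a given age and amount of sequence. It uses the mutation rate quoted from the paper; the 5,000-year split age, the 100-fold larger region, and the Poisson rule of thumb for the relative error are illustrative assumptions, not values from the paper.

```python
import math

MUTATION_RATE = 1.0e-9  # substitutions per base per year (rate quoted from the paper)


def expected_mutations(bases, years):
    """Expected number of substitutions on one lineage over `years`."""
    return MUTATION_RATE * bases * years


def relative_error(n_mutations):
    """Rule-of-thumb Poisson relative error, 1/sqrt(n) (an assumption)."""
    return 1.0 / math.sqrt(n_mutations)


CURRENT_BASES = 205_900               # what the paper resequenced
LARGER_BASES = 100 * CURRENT_BASES    # hypothetical: two orders of magnitude more

for label, years in [("root, ~142 ky", 142_000), ("hypothetical 5 ky split", 5_000)]:
    for bases in (CURRENT_BASES, LARGER_BASES):
        n = expected_mutations(bases, years)
        print(f"{label:24s} {bases:>11,d} bases: "
              f"{n:8.1f} mutations, ~{100 * relative_error(n):.0f}% error")
```

With about 200 kb, a lineage that is only a few thousand years old is expected to carry roughly one mutation, which is far too few to date it, while the same lineage over a region a hundred times larger carries on the order of a hundred.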

Meet our newer older relatives

It has become fashionable to treat the Khoi-San people of South Africa as our most distant cousins that stayed behind in Africa. However, the revision of the phylogeny implies a different picture:
Four subjects (two Berbers from northwest Africa, one Tuareg and one Fulbe from Niger) were confirmed as belonging to clade A1a.24,29 It is worth noting that this clade was previously detected in west Africa, although at low frequencies.10,30,31,32 Three chromosomes from the Bakola pygmy group from southern Cameroon (central Africa) were found to carry the derived allele at V164, V166, V196, and P114 and were classified as A1b. Interestingly, one chromosome from an Algerian Berber group (northwest Africa) was found to carry the derived allele at V164, V166, and V196 but carried the ancestral one at P114, implying a bipartite structure for A1b, where P114 defines an internal node.
Between Cameroon and North Africa where the oldest A1b and A1a clades of the tree occur, it seems that our search for the ur-fathers of mankind has taken us to an unexpected quarter of Africa:
Third, contrary to previous phylogeny-based conclusions,15,16 the deepest clades of the revised MSY phylogeny are currently found in central and northwest Africa. MSY lineages from these regions coalesce at an older time (142 KY) than do those from east and south Africa (105 KY), opening new perspectives concerning early modern human evolution. A scenario of a Y chromosome “Adam” living in central-northwest Africa about 140 KY ago would provide a good fit to the present data. However, we also note that, because of the still largely incomplete geographic coverage of the African MSY diversity and unknown consequences of past population processes such as growth, extinction, and migration, any phylogeny-based inference on the geographical origin of human MSY diversity in Africa should be made with caution.
This ought to be a potent reminder that different genetic systems tell different stories. The autosomal evidence seems to suggest that the most divergent modern humans (from the rest) are the Mbuti Pygmies and Khoi-San, but Y-chromosomes point to Bakola and Berbers. Perhaps this is yet another much-needed wakeup call to consider more interesting within-Africa scenarios.

UPDATE (Haplogroup DE):

Haplogroup DE was not dated in this analysis, which was unfortunate. Karafet et al. used a SNP-counting method that, unlike this one, relied on a calibration point, which they set at 70ka for CT. They then estimated an age for DE of 65ka. In this study, CT was directly dated to 38.8ka, hence the age of DE using this calibration becomes ~36ka, but it would certainly be a good idea to compare D and E using the methodology of this paper directly. In any case, the expansion of DE also probably fits the scenario of human origins I describe in a new post.


It appears that Karafet et al. (2009) called CT the M168 node; this includes DE. They estimated an age for CF = 68.9ka and for DE = 65ka, based on a calibration for CT = 70ka. In any case, all these ages translate to about 36-39ka in the current paper's terms.
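The rescaling behind these figures is plain proportionality: each age that Karafet et al. obtained relative to a 70ka CT is multiplied by 38.8/70. A minimal sketch, assuming the relative node ages stay the same under the new calibration (the function name is made up):

```python
OLD_CT_CALIBRATION_KA = 70.0   # calibration point used by Karafet et al.
NEW_CT_AGE_KA = 38.8           # CT age estimated directly by Cruciani et al.


def rescale(age_ka):
    """Rescale an age from the old CT calibration to the new CT estimate."""
    return age_ka * NEW_CT_AGE_KA / OLD_CT_CALIBRATION_KA


de_age = rescale(65.0)   # DE under the old calibration
cf_age = rescale(68.9)   # CF under the old calibration

print(round(de_age, 1), round(cf_age, 1))  # about 36.0 and 38.2
```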

UPDATE II (May 23, DE):

Apparently the 1000 genomes also distinguishes between CT and DE:
The haplogroup tree classifies all the major haplogroups as monomorphic, and recovers the relationships between them, with high bootstrap confidence. It also shows evidence for a deep division between haplogroups DE and CT, previously identified only by a single marker (P143).

Related: Phylogeographic analysis of Y-haplogroups A and B

The American Journal of Human Genetics, 19 May 2011

A Revised Root for the Human Y Chromosomal Phylogenetic Tree: The Origin of Patrilineal Diversity in Africa

Fulvio Cruciani et al.

To shed light on the structure of the basal backbone of the human Y chromosome phylogeny, we sequenced about 200 kb of the male-specific region of the human Y chromosome (MSY) from each of seven Y chromosomes belonging to clades A1, A2, A3, and BT. We detected 146 biallelic variant sites through this analysis. We used these variants to construct a patrilineal tree, without taking into account any previously reported information regarding the phylogenetic relationships among the seven Y chromosomes here analyzed. There are several key changes at the basal nodes as compared with the most recent reference Y chromosome tree. A different position of the root was determined, with important implications for the origin of human Y chromosome diversity. An estimate of 142 KY was obtained for the coalescence time of the revised MSY tree, which is earlier than that obtained in previous studies and easier to reconcile with plausible scenarios of modern human origin. The number of deep branchings leading to African-specific clades has doubled, further strengthening the MSY-based evidence for a modern human origin in the African continent. An analysis of 2204 African DNA samples showed that the deepest clades of the revised MSY phylogeny are currently found in central and northwest Africa, opening new perspectives on early human presence in the continent.


On Tocharian origins

Where did the Tocharians originate from? J.P. Mallory's recent talk has been somewhat of an eye-opener for me, as Prof. Mallory brought to my attention two important issues:
  1. The lack of a clear connection between Afanasyevo and the Tarim Basin.
  2. The existence (in Tocharian) of a rich agricultural IE terminology related to cereals, as well as the domesticated pig, which cannot be easily explained if Tocharians arrived in Xinjiang from the steppes to the north, and, ultimately from eastern Europe.
To begin with, I want to point out an important issue: we cannot assume that the earliest Caucasoids of Xinjiang, including some of the famous early Tarim mummies, were Tocharian speaking. There are several arguments why this is so:
  1. Tocharian is first attested in the 8th c. AD, that is, about 3 thousand years after the earliest detected Caucasoids in the region
  2. There has been a shift in the region from Tocharian and eastern Iranian languages to Turkic over the last thousand years or so. Why assume linguistic continuity in the preceding three thousand?
  3. Indeed, there has been linguistic shift throughout other regions of Eurasia in shorter timespans, such as the spread of Slavic across most of eastern Europe, the virtual extinction of Celtic in most of western Europe, the replacement of multiple languages by Arabic in the Near East, and so on. Linguistic continuity does not seem to be an appropriate default position in the absence of direct evidence.
  4. The earliest Caucasoids of the Tarim were already substantially mixed with Mongoloids at least in their mtDNA. This reduces our confidence that they spoke an Indo-European language, as there is a pattern of Caucasoid patrilineages combined with Mongoloid mtDNA in present-day non-IE South Siberians
  5. Indeed, the current Turkic Uyghurs, who are closer (temporally) to the Tocharians than the early Bronze Age Caucasoids, have a rich assortment of Caucasoid Y-chromosome haplogroups, whereas the early Bronze Age ones seem to have belonged uniformly to R1a1. What languages were spoken by the non-R1a1 Caucasoids who arrived in the Tarim prior to the Turkification of the region?
To summarize the first part of the argument: the early population of the Tarim does not have clear steppe connections, it may not have been Indo-European speaking, and even if it were, it did not necessarily speak the same language as the later Tocharians. Moreover, the Tocharian language has a vocabulary without clear steppe associations, but with rich agricultural ones.

In search of the Tocharians

We may discover the origin of the Tocharians by a careful sorting of Y-chromosome lineages in the present-day Uyghur population of Xinjiang that is assumed to have absorbed the pre-Turkic inhabitants of the region:
  1. Remove all east Eurasian lineages that are likely to be associated with the Xiongnu, Mongols, or Uyghur
  2. Remove all west Eurasian lineages that can be explained from a non-Tocharian source (such as Iranians, or various Silk Road outliers)
  3. See if anything is left
A recent paper by Zhong et al. provides rich data on Uyghurs that can be used to carry out this program.

The phylogeographic analysis of these lineages does leave some candidates:
  1. Haplogroup D can be excluded as Mongolian/Tibetan
  2. Haplogroup E can be excluded as Mediterranean/African
  3. Haplogroup C can be excluded as Altaic/South Asian (C5)
  4. Haplogroup G2a* (West Asian) does not seem to have an important presence (3 samples)
  5. Haplogroup H can be excluded as South Asian
  6. Haplogroup I can be excluded as a European outlier (1 sample)
  7. Haplogroup J*(xJ2) can be excluded as NE Caucasian/Semitic with small presence (2 samples)
  8. Haplogroup NO; haplogroup N has been found in a Xiongnu context, so it is likely intrusive; O is East Eurasian
  9. Haplogroup Q is also associated with Xiongnu nomads from Pengyang
This analysis leaves four candidates: J2-M172, R1a1a-M17, R1b-M343, and L-M20.
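The sorting above amounts to a simple filter. As a minimal sketch, with the haplogroup labels and exclusion reasons copied from the list and nothing else assumed about the underlying data (frequencies are not modeled, and NO is split into N and O as in the text):

```python
# Haplogroups excluded in the list above, with the stated reason.
EXCLUDED = {
    "D": "Mongolian/Tibetan",
    "E": "Mediterranean/African",
    "C": "Altaic/South Asian (C5)",
    "G2a*": "West Asian, no important presence (3 samples)",
    "H": "South Asian",
    "I": "European outlier (1 sample)",
    "J*(xJ2)": "NE Caucasian/Semitic, small presence (2 samples)",
    "N": "found in a Xiongnu context, likely intrusive",
    "O": "East Eurasian",
    "Q": "associated with Xiongnu nomads from Pengyang",
}

# Haplogroups discussed for the Uyghur sample, in the order they appear.
OBSERVED = [
    "D", "E", "C", "G2a*", "H", "I", "J*(xJ2)", "N", "O", "Q",
    "J2-M172", "R1a1a-M17", "R1b-M343", "L-M20",
]

# Whatever has no exclusion reason remains a candidate.
candidates = [hg for hg in OBSERVED if hg not in EXCLUDED]
print(candidates)
```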

We can exclude L-M20 because its overall low frequency in most populations makes it difficult, at present, to make a definitive pronouncement on its origin, except perhaps for its Indian L1 clade which is absent here.

J2, present here in both its J2a and J2b subclades at substantial frequencies, has an origin in West Asia, as well as a substantial presence among Indo-Iranian speakers. While it is possible (indeed likely, in my opinion) that it was present among the Tocharians, we cannot exclude the possibility that it represents either a specifically Iranian influence, or even something earlier than both.

R1a1a is present on the steppe, as well as in South Asia and West Asia. Its high frequency among some Indo-Iranian populations also makes it difficult to ascribe a specifically Tocharian origin to it.

This leaves only R1b-M343 as a candidate. Have we found a genuine Tocharian genetic signature?

The West Asian roots of R-M343 (?)

R-M343 and its main R-M269 clade are in a sense exasperating: their widespread distribution from Africa and the Atlantic to the depths of Inner Asia, combined with their apparent Y-STR-estimated youth, makes it nearly impossible to associate them with a specific archaeological or historical phenomenon.

Where could R-M269 have come from? It was not present, as far as we can tell, in early Bronze Age Xinjiang, and neither has it been detected in south Siberians. The steppe/"northern" route seems out.

A southern route from the Indian subcontinent also seems out: despite its ubiquity elsewhere in Eurasia, R-M269 seems to have (mostly) skipped both India and (to an extent) Pakistan.

An indigenous origin seems highly unparsimonious, as it would require that it trek all the way to the Atlantic, but make hardly an impact in either East Asia or South Asia.

As far as I can tell, the only explanation for the presence of R-M343 in Xinjiang is an origin in West Asia, or at least in Central Asia west of the Tarim. There it can be found at a high frequency in Armenians, Turks, north Iranians, and Lezgins among others. And, unlike both J2 and R1a1a, R-M343 does not seem to be Indo-Iranian (due to its absence in India).

Gamkrelidze and Ivanov cited W. N. Henning to the effect that the ancestors of the Tocharians could be identified with the Gutians from the Zagros, a people that attacked the Sumerians and founded a dynasty. As usual, I don't presume to know the linguistic evidence for this, but this hypothesis would place the ancestors of the Tocharians in the "right spot": virtually all of their Caucasoid Y-chromosome gene pool could be explained with an origin in north Iran.

A model of Tocharian origins

The model of Tocharian origins I present is simplicity itself:

First, Tocharians are descended from a group of farmers that moved east of the PIE homeland and settled on the Zagros and beyond, south of the Caspian sea.

Second, their trek to the Tarim was a simple west-to-east movement along what would later become the Silk Road, beyond the Taklamakan desert and into the Tarim basin. There they must've mixed with the early pre-IE mixed Caucasoid/Mongoloid population of the early Bronze Age. The desert probably sheltered them, to an extent, from encroachments by the Iranians.

An open question remains: were the Tocharians late fugitives who were pushed out of their ancestral homelands by the emergence of the Iranians and entered the Tarim late? Or were they established there fairly early, and were the historical Tocharians the eastern relics of a once great people that, unlike most of the people of Central Asia, was not Iranized?

Autosomal evidence

The fine-scale analysis of the Dodecad project on a sample of 10 Uyghurs provides some additional evidence:

The Uyghurs seem to lack the Southwest Asian component that is ubiquitous in most of West Asia today, and may have, in large part, expanded with the more recent spread of Semitic languages. They are similar in that respect to South Asians, suggesting that neither the spread of Islam to the east nor the cosmopolitanism of the Silk Road was enough to bring this component to the region. Hence, the plethora of Caucasoid Y-haplogroups in the region cannot be attributed to recent arrivals.

The absence of specific South European components in them also suggests that the opinion of some linguists and archaeologists, who would see the Tocharians as related to Celts and moving from deep within Europe, or even Western Europe, to the Tarim, is unlikely to be correct; the south European component is ubiquitous in Europe, and the Uyghurs, like South Asians, seem to lack it entirely.

Their Caucasoid components are primarily West Asian and North European. Projecting them on the East Eurasian/West Asian/North European PCA plot (left), it is clear that they are more West Asian than North European, a result that is in agreement with their ADMIXTURE results.

Notice also how the North European/West Asian ratio is reversed for the more northern-latitude Uralic/Altaic speakers (Selkups, Dolgans, etc.).

Of course, the results should be interpreted with caution, but they seem perfectly in agreement with the model presented here:
  • the Uyghurs are partly Mongoloid both because they may carry the legacy of the ancient mixed population of the Tarim, and also because of their more recent Turkic/Xiongnu associations.
  • with respect to their Caucasoid components, they are mainly West Asian (with the West Asian component also being primary in South Asia), but somewhat shifted to the north due to their absorption of mixed Northern Caucasoid/Mongoloid peoples from the steppelands.


The mystery of the Tocharians may be that there is no mystery. The Tocharians are revealed to have been just another West Asian branch of the Indo-European family that, unlike most of its cousins, went east, absorbed Northern Caucasoid, Mongoloid, and South Asian population elements, emerged long enough in history to leave us a written record of their presence, before succumbing to the Xiongnu and the Mongols.

Thankfully, by combining the remnants of their language and the fragments of their DNA in their descendants, we are able to reconstruct the history of this once-forgotten people.

May 19, 2011

Nicholls and Ryder: Semitic 4.4-5.1 thousand years before present

The same authors dated Proto-Indo-European at 8.4ky, in agreement with the work of Gray and Atkinson. In the current paper they re-analyze the data of Kitchen et al. (2009) for Semitic languages, and their estimate is somewhat younger than the 5,750 years of that paper. All in all, it's good to see different researchers using different techniques but coming up with similar solutions.

It is increasingly clear that while the Proto-Indo-Europeans originated in the Neolithic Near East, the Proto-Semites followed them by about three thousand years. In the latter case there is also a Y-chromosome marker (J-P58) with an apparent age in impeccable agreement with the linguistic evidence, now that the genealogical-"evolutionary" mutation wars seem to have been won.

This also brings into focus the weakness of the argument that Anthony (2007) (p. 76) brings to the table by hypothesizing that the first farmers of northern Syria were Afro-Asiatic speakers like the Semites of the Near Eastern lowlands. Semites come into the picture 5,000 years after the onset of the Neolithic, and 3,000 years after the Proto-Indo-Europeans. Their relationship with the Afroasiatic speakers of Africa makes it quite likely that they lived in the south, probably in Arabia, and certainly not in eastern Anatolia or northern Syria.

Indeed, the recent discovery that haplogroup J1*(xP58) is associated with Northeast Caucasian languages, together with the absence or paucity of J1 in most African Afroasiatic speakers, suggests to me that the J-P58 Proto-Semites may be the result of the transfer of an African language onto a basically West Asian population. Such a scenario might also explain some of the (incorrectly quantified, but nonetheless existent) African genetic components in both Jews and Arabs, as well as the pastoralist/dry-climate J1 associations.

Proceedings of the 26th International Workshop on Statistical Modelling.

Phylogenetic models for Semitic vocabulary.
Geoff K Nicholls and Robin J. Ryder

Abstract: Kitchen et al. (2009) analyze a data set of lexical trait data for twenty five Semitic languages, including ancient languages Hebrew, Aramaic and Akkadian, modern South Arabian and Arabic languages and fifteen ethiosemitic languages. They estimate a phylogenetic tree for the diversification of lexical traits using tree and trait models and methods set up for genetic sequence data. We reanalyze the data in a homoplasy-free model for lexical trait data. We use a prior on phylogenies which is non-informative with respect to some of the key scientific hypotheses (concerning topology and root time). Our results are in broad agreement with those of Kitchen et al. (2009), though our 95% HPD for the root of the Semitic tree (the branching of Akkadian) is [4400, 5100]BP and we place Moroccan and Ogaden Arabic in the Modern South Arabian Group.

May 18, 2011

The Central Asian element in Turks (part 3)

In a previous post I summarized extensive evidence by myself and Turkish researchers to the effect that modern Turks are about 1/7 descended from Central Asian Turkic speakers, and 6/7 from pre-Turkic West Asians.

Some people have argued that Uzbeks, the best representative of the Central Asian ancestors of the Turks, are inappropriate as a parental population.

Can Turks be modeled as a 1/7-6/7 simple mix of West Eurasians and Central Asians? I refer to my most recent K=11 ADMIXTURE results as useful data that can be used to test this hypothesis once again.

I will use the 4-way average of Greek_D, Armenian_D, Georgians, and Syrians as representative of the "West Eurasian" component in Turks. These 4 populations border Turks from the West, East, North, and South, and their average is expected to be a good stand-in for what pre-Turkish Anatolians were like, and probably more robust than choosing arbitrarily just one of the 4 populations.

I will use Uzbeks as representative of Central Asian Turks, and I will calculate the weighted average of the two (1/7 Uzbek + 6/7 "West Eurasian"). I will then compare this with the average of the Turks (from Behar et al. 2010)+Turkish_D combined sample.
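The calculation described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the script actually used: the function names are my own, and the numbers are made-up three-component placeholders (not the K=11 ADMIXTURE values), chosen only so that each profile sums to 100%.

```python
def mean_profile(profiles):
    """Component-wise unweighted mean of several admixture profiles."""
    n = len(profiles)
    return [sum(values) / n for values in zip(*profiles)]


def mix(central_asian, west_eurasian, central_fraction=1 / 7):
    """Weighted average: central_fraction of one profile plus the rest of the other."""
    return [
        central_fraction * c + (1 - central_fraction) * w
        for c, w in zip(central_asian, west_eurasian)
    ]


# PLACEHOLDER percentages, invented for illustration only.
# Each row stands in for one of the four neighboring populations.
neighbors = [
    [60.0, 30.0, 10.0],
    [70.0, 20.0, 10.0],
    [50.0, 40.0, 10.0],
    [60.0, 30.0, 10.0],
]
# PLACEHOLDER profile standing in for the Central Asian parental population.
central_asian = [25.0, 5.0, 70.0]

west_eurasian = mean_profile(neighbors)        # the 4-way average
simulated = mix(central_asian, west_eurasian)  # 1/7 + 6/7 mix

print(west_eurasian)
print([round(v, 2) for v in simulated])
```

Because both inputs sum to 100%, the simulated profile does too, so it can be compared component by component with an empirical average.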

If Turks can be modeled as the simple mix I have claimed, then the empirical Turkish average will be similar to the simulated one (1/7 Uzbek + 6/7 "West Eurasian"). Here are the actual numbers:

As you can see, the simulated average is virtually identical to the empirical one. No component deviates from it by more than 0.4%, except for the most important one, the West Asian, which deviates by a mere 1.8%; in relative terms (divided by the mean of 49.2%) this represents a 3.7% error.

Given the finite sample sizes, the limitations of ADMIXTURE, and the use of a 4-way average as a proxy for pre-Turkish Anatolians, I can confidently claim that this not only confirms the validity of my model, but does so to an extraordinary degree.

A different way of testing the model's validity is the correlation between the empirical and simulated admixture proportions which is 0.99956. I don't think I need to point out how remarkable this is.
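For readers who want to repeat this kind of check, here is a minimal sketch of the two measures used above: the per-component deviation and the Pearson correlation between the empirical and simulated proportions. The helper names are my own and the two vectors are invented placeholders, not the K=11 values; only the 1.8% and 49.2% figures in the last lines come from this post.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def max_abs_deviation(empirical, simulated):
    """Largest absolute difference between matching components."""
    return max(abs(e - s) for e, s in zip(empirical, simulated))


# PLACEHOLDER percentages, invented for illustration only.
empirical = [50.0, 30.0, 20.0]
simulated = [48.0, 31.0, 21.0]

r = pearson(empirical, simulated)
worst = max_abs_deviation(empirical, simulated)
print(f"correlation = {r:.5f}, largest deviation = {worst:.1f} points")

# Relative error of the West Asian component, using the figures quoted above.
relative_error = 100 * 1.8 / 49.2
print(f"relative error = {relative_error:.1f}%")
```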


The empirical data are consistent with the idea that Anatolian Turks are a simple mix of a West Eurasian population element equivalent to the average of their immediate neighbors, and a Central Asian population element similar to Uzbeks, in a 6:1 ratio. These results confirm and extend the extensive evidence of the previous post.

UPDATE (May 21): In a new experiment, I demonstrate that all available Turkic samples fall almost perfectly on a cline between West and East Eurasians. That experiment also shows that Uzbeks are the most West Eurasian out of the available Central Asian Turkic populations.

It is still unclear what the ratio of West/East Eurasian elements was in the Turkic people who entered Anatolia, but these results certainly indicate that the Uzbeks are not unusually Mongoloid in their makeup among Turkic peoples; rather the opposite.