March 31, 2011

Winners of televised election debates, or, hoi polloi are easily swayed

The authors seem to argue that the ease with which viewers of electoral debates were swayed by scientifically-clad distortion in the form of the "worm" is a danger to democracy. Certainly, their study should be worrisome for anyone invested in the good function of modern democratic polities, i.e., most of us.

However, I would argue that the study brings to the forefront an even greater problem: the ease with which voters are swayed. The effect of the distorted worm was non-trivial.

Advocates of democracy often claim that this form of constitution is superior because the outcome of an election is determined by the balance of interests of most people in society.

Yet experiments such as this one show that it may not be a balance of rational interests that determines who wins an election, but rather a balance of influences exerted by political marketers, who act in many more ways than just through the "worm".

PLoS ONE 6(3): e18154. doi:10.1371/journal.pone.0018154

Social Influence in Televised Election Debates: A Potential Distortion of Democracy

Colin J. Davis et al.


A recent innovation in televised election debates is a continuous response measure (commonly referred to as the “worm”) that allows viewers to track the response of a sample of undecided voters in real-time. A potential danger of presenting such data is that it may prevent people from making independent evaluations. We report an experiment with 150 participants in which we manipulated the worm and superimposed it on a live broadcast of a UK election debate. The majority of viewers were unaware that the worm had been manipulated, and yet we were able to influence their perception of who won the debate, their choice of preferred prime minister, and their voting intentions. We argue that there is an urgent need to reconsider the simultaneous broadcast of average response data with televised election debates.


Iklaina tablet

Tablet Discovery Pushes Earliest European Writing Back 150 Years
A clay tablet discovered in Greece changes what is known about the origins of literacy in the western world (obviously a good thing) and, unfortunately, also about the origins of bureaucracy. Measuring 2 inches by 3 inches, the tablet fragment is the earliest known written record in Europe, dating to between 1450 and 1350 B.C., 100-150 years before the tablets from the Petsas House at Mycenae.

The tablet was unearthed last summer during the excavation at the site in Iklaina, which sits in the middle of an olive grove in southwest Greece. Iklaina dates to the Mycenaean period (ca. 1500-1100 B.C.), an era famous for such mythical sagas as the Trojan War. It was one of the capital cities of famed King Nestor, who figures prominently in Homer's "Iliad." Iklaina is the rare case where archeology meets mythology.


“Iklaina could potentially challenge what we know about the origins of states in ancient Greece,” Cosmopoulos said. “Not only does it push the origins of those states back in time by at least a century and a half, but the tablet shows that literacy and bureaucracy appeared earlier and were more widespread than what we had thought until now. We still have a lot to learn about the ancient world.”

Here is the website of the Iklaina Archaeological Project.

March 29, 2011

The power of Clusters Galore: Iranians and Arabs

The full power of Clusters Galore depends on its ability to infer clusters of arbitrary size, shape, and orientation in a high-dimensional space. It achieves this by using MCLUST over an MDS or PCA representation of dense genomic data.

Nonetheless, we can still get a sense of it even in a simple 2D representation such as the following:
This was produced by applying MDS on 240 individuals (from Behar et al. 2010, HGDP, Xing et al. 2010, and the Dodecad Project).

One can see that the Behar et al. and Dodecad Iranians form a small cluster on the right, together with the Xing et al. Kurds and the single Dodecad Kurd. Arabs are considerably more variable: the Druze extend to the bottom of the figure, while the Bedouin form two groups, one similar to other Arabs and the other extending to the left of the figure. There are also a few Arabs stretching to the top.

The variability of the Arabs can be attributed to reproductive isolation, inbreeding, and variable amounts of African admixture. Let's apply MCLUST over these 240 2D points:
The above visual representation shows the centroids and shapes of the 5 inferred clusters. Here are the numbers of individuals from each population assigned to each cluster:

Notice cluster #5: it consists of all Kurds, most Behar et al. Iranians and all Dodecad ones, and the single Dodecad Kurd, plus a Lebanese and a Syrian. It is overall 96% Iranic in composition. It is quite tempting to think that the two Syrian and Lebanese members have some links to Iranian peoples either due to Kurdish ancestry or the Shia form of Islam.

The more variable Arabs are split into multiple clusters: the main, tight cluster #3, which includes most of the Levantine Arabs but also some Saudis and Yemenis; the extremely variable, African-admixed cluster #1, dominated by some Yemenis but including a few others; the "Arabian", Saudi-Bedouin-dominated cluster #2; and the Druze-specific cluster #4.

It seems that just as the distinction between Celto-Germans and Balto-Slavs is not only cultural but also genetic, so is the distinction between Iranians and Arabs. In the case of the Arabs, though, religious distinctions (e.g., the Druze), variable African admixture, and quite possibly the Arabization of Levantine populations have resulted in a non-homogeneous array of genetic clusters.

PS: Iranic groups are also not homogeneous if one includes some of those from South Asia, as evidenced by this previous genetic map of West Eurasians which analyzed Kurds and Iranians together with Pathans and Balochis.

March 28, 2011

Relationship between Iran and the Arabian peninsula

J Hum Genet. 2011 Mar;56(3):235-46. Epub 2011 Feb 17.

Mitochondrial DNA and Y-chromosomal stratification in Iran: relationship between Iran and the Arabian Peninsula.

Terreros MC, Rowold DJ, Mirabal S, Herrera RJ.

Modern day Iran is strategically located in the tri-continental corridor uniting Africa, Europe and Asia. Several ethnic groups belonging to distinct religions, speaking different languages and claiming divergent ancestries inhabit the region, generating a potentially diverse genetic reservoir. In addition, past pre-historical and historical events such as the out-of-Africa migrations, the Neolithic expansion from the Fertile Crescent, the Indo-Aryan treks from the Central Asian steppes, the westward Mongol expansions and the Muslim invasions may have chiseled their genetic fingerprints within the genealogical substrata of the Persians. On the other hand, the Iranian perimeter is bounded by the Zagros and Alborz mountain ranges, and the Dasht-e Kavir and Dasht-e Lut deserts, which may have restricted gene flow from neighboring regions. By utilizing high-resolution mitochondrial DNA (mtDNA) markers and reanalyzing our previously published Y-chromosomal data, we have found a previously unexplored, genetic connection between Iranian populations and the Arabian Peninsula, likely the result of both ancient and recent gene flow. Furthermore, the regional distribution of mtDNA haplogroups J, I, U2 and U7 also provides evidence of barriers to gene flow posed by the two major Iranian deserts and the Zagros mountain range.


March 26, 2011

Pastoral and farmer populations from the Sahel

In Central Asia, the expansion of the Mongol nomads led to the predominance of a few Y-chromosome lineages of relatively recent vintage. The opposite seems to be true for the African pastoralists examined here: they seem to be more diverse in their Y chromosomes than farmers are, and less so in their mtDNA. This suggests that they were a fairly old group who did not particularly marry farmers' daughters.

Mol Biol Evol (2011) doi: 10.1093/molbev/msr067

Genetic structure of pastoral and farmer populations in the African Sahel

Viktor Černý et al.

Traditional pastoralists survive in but few places in the world. They can still be encountered in the African Sahel, where annual alternations of dry and wet seasons force them to continual mobility. Little is known about the genetic structure of these populations. We present here the population distribution of 312 HVS-I mtDNA and 364 Y-STR haplotypes in both farmer and pastoralist groups from the Lake Chad Basin and the West African Sahel. We show that the majority of pastoral populations (represented in the African Sahel by the Fulani nomads) fail to show significant departure from neutrality for mtDNA as evidenced by Fu's Fs statistics, and exhibit lower levels of intra-population diversity measures for mtDNA when contrasted with farmers. These differences were not observed for the Y chromosome. Furthermore, AMOVA analyses and population distributions of the mtDNA haplotypes show more heterogeneity in the sedentary groups than in the pastoralists. On the other hand, pastoralists retain a signature of a wide phylogenetic distance contributing to their male gene pool, whereas in at least some of the farmer populations a founder effect and/or drift might have led to the presence of a single major lineage. Interestingly, these observations are in contrast with those recorded in Central Asia, where similar comparisons of farmer and pastoral groups have recently been carried out. We can conclude that in Africa there have been no substantial mating exchanges between the Fulani pastoralists coming to the Lake Chad Basin from the West African Sahel and their farmer neighbors. At the same time we suggest that the emergence of pastoralism might be an earlier and/or a demographically more important event than the introduction of sedentary agriculture, at least in this part of Africa.
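The abstract compares "intra-population diversity measures" for mtDNA and Y-STR haplotypes without listing them here. One standard measure is Nei's unbiased haplotype (gene) diversity, H = n/(n-1) * (1 - sum of p_i^2), where p_i is the frequency of haplotype i in a sample of size n. The sketch below assumes that measure and uses made-up haplotype labels; it is not a reconstruction of the paper's pipeline.

```python
from collections import Counter


def haplotype_diversity(haplotypes):
    """Nei's unbiased haplotype diversity for one population sample.

    `haplotypes` is a list of hashable labels (e.g. HVS-I sequences or
    Y-STR allele tuples); identical labels count as the same haplotype.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two sampled haplotypes")
    counts = Counter(haplotypes)
    sum_p2 = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_p2)


# Hypothetical samples: one dominated by a single lineage, one more even.
founder_like = ["H1"] * 8 + ["H2", "H3"]
even = ["H1", "H1", "H2", "H2", "H3", "H3", "H4", "H4", "H5", "H5"]
print(round(haplotype_diversity(founder_like), 3))
print(round(haplotype_diversity(even), 3))
```

A sample dominated by one founder lineage scores lower than an evenly spread one, which is the kind of contrast the abstract draws between groups.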


March 24, 2011

Harappa Ancestry Project tests Clusters Galore

The Harappa Ancestry Project has applied the Clusters Galore approach to its dataset, coming up with 28 clusters. This seems roughly similar to my fine-scale ancestry analysis for South Asians, which turned up 34 clusters. I encourage others to try this approach using the simple instructions so that they too can verify its power to detect extremely fine-scale population structure.

March 22, 2011

"Neandertal" genes in East Africa

John Hawks writes that different "Neandertal"-derived haplotypes are found in Europe and China. He attributes this to genetic drift after a population of modern humans admixed with Neandertals in West Asia.

As you all know, I've voiced significant reservations about interpreting the Neandertal genome data as evidence for Neandertal admixture in Eurasians. So, I decided to pull up an old experiment that has been sitting in my drafts for ages, because it is quite pertinent to the issue.

East Africa is a potential source of information on the question of "Neandertal" admixture. The populations of the region are complex: they are thought to preserve features of very old Africans, perhaps of the earliest Homo sapiens, but they have also been affected by gene flow from Sub-Saharan Africa and West Asia.

If Neandertal admixture occurred in West Asia, then we would not expect East Africans to possess any of it, as Neandertals did not exist in East Africa. At most we would expect them to possess as much of it as could be explained by back-migration from West Asia.

So, I took the Maasai (MKK) sample from HapMap (r3 b36) and calculated allele frequencies for all SNPs shared between it and the 13 genomic regions of Neandertal admixture from Reich et al. (2010), first described by Green et al. (2010) and available here (xls).

There are 190 SNPs in that file, and 46 of them are in the HapMap data. Fortunately, these include 9 SNPs on chromosome 5 (from rs17617368 to rs16898552) that span the entire length of a ~70 kbp region attributed to Neandertal admixture (positions 28986511 to 29056374).

The interesting thing about this region is that 3/45 Asians possess the "Neandertal" alleles while Africans and Europeans (AFR and CEU) do not. So, it is an example of "Neandertal" genes that survived in Asians but not Europeans.

Here is a table of the minor ("Neandertal") allele frequencies on the MKK sample of 156 individuals:

Maasai seem to have some "Neandertal" genes in common with East Asians that are not shared by Europeans.

Admixture of Maasai with East Asians seems unlikely. Thus, there are three possibilities:
  • A recent back-migration of West Asians who possess these alleles
  • A really old back-migration of undifferentiated "Neandertal"-admixed West Asians in which these alleles had not yet been lost by drift
  • Origin of these alleles in the common ancestors of East Africans and Eurasians rather than introgression from Neandertals
I can't exclude the possibility that some recent Caucasoids from West Asia possessed these alleles while CEU do not. I will simply note that HapMap Tuscans (TSI) do not possess them, and neither do 471 Ashkenazi Jews from Bray et al. (2010), who are likely to be of West Asian/European ancestry. Nor do Kurds and Urkarah Dagestanis (from Xing et al. 2010) possess the 2 "Neandertal" alleles at SNPs available in that dataset.

So, I will tentatively exclude the possibility that recent Caucasoid back-migrations brought these alleles to East Africa.

This leaves open two possibilities: that (i) these aren't Neandertal genes at all and were part of the ancestral gene pool of East Africans and Eurasians, or that (ii) they were brought back to Africa by a major early back-migration of undifferentiated Eurasians.

To conclude the post:
  • It's not as simple as Africans vs. non-Africans.
  • Sample more diverse African groups for "Neandertal" genes. Both Green et al. (2010) and Reich et al. (2010) claim that African groups do not differ from each other with respect to Eurasian archaic hominins, which is what you'd expect for admixture that took place in Eurasia. But they haven't sampled Africa broadly enough to make that claim convincingly.
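The allele-frequency calculation used in this post amounts to counting allele copies at each SNP among called genotypes. The sketch below assumes HapMap-style two-letter genotype calls with "NN" marking a missing call; the genotypes and the choice of "A" as the "Neandertal" allele are hypothetical, not taken from the MKK data.

```python
def allele_frequency(genotypes, allele, missing="NN"):
    """Frequency of `allele` among called diploid genotypes such as "AG".

    Genotypes equal to `missing` are skipped; each called genotype
    contributes two allele copies to the denominator.
    """
    called = [g for g in genotypes if g != missing]
    if not called:
        raise ValueError("no called genotypes at this SNP")
    copies = sum(g.count(allele) for g in called)
    return copies / (2 * len(called))


# Hypothetical calls at one SNP for six individuals (one missing).
snp_calls = ["AG", "GG", "GG", "AA", "NN", "GG"]
freq = allele_frequency(snp_calls, "A")
print(f"'Neandertal' allele frequency: {freq:.3f}")
```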

March 21, 2011

A note of caution on admixture estimates

I want to expand on a theme I touched upon briefly in a previous post: the importance of choosing appropriate parental populations in admixture analyses.

I will first show empirically the impact of this choice on the admixture proportions. Then, I will deal with a special and difficult case: the Indian Cline.

The not so easy case of Mexican Mestizos

Mexican Mestizos are a tri-hybrid population composed of European, Native American, and West African elements. These elements began interbreeding only in the last half millennium or so, and, hence, the process occurred in historical time.

Consider a sample of 25 Mexicans from HapMap, with 25 Yoruba from HapMap, 25 Iberian Spanish from the 1000 Genomes Project, and 14 Pima from the HGDP as parental populations. We obtain for our Mexican sample:
  • 59.7% European
  • 36.9% "Native American"
  • 3.4% African
Now, substitute the Pima with 21 Maya from the HGDP as representative of Native Americans. We now obtain:
  • 49.9% European
  • 47.3% "Native American"
  • 2.8% African
Notice that the Native American component has increased. We will see shortly why this is the case. But, let's run a final experiment with just the Mexicans, Spanish, and Yoruba, i.e., with no Native American samples. At K=3 we obtain:
  • 70% "Native American"
  • 29.7% European
  • 0.4% African
The "Native American" component has increased again! The explanation is simple: as we exclude less admixed Native American groups, Mexicans appear (comparatively) more Native American. The "Native American pole" has shifted, and so has the relative position of populations between the two poles.

In other words, what is labeled "Native American" in the three experiments is not the same: in the first one it is anchored on the less admixed Pima; in the last one, on the more admixed Mexicans.

A color analogy is apt: imagine you had white and black paint, and you wanted to achieve a medium grey hue: you could mix equal parts white and black (1/2 each) to achieve this. Now, imagine that instead of white paint, you had a light grey hue. You would now have to mix greater amounts of light grey (more than 1/2) to achieve the same medium hue.
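The paint analogy is a one-line linear calculation: if a target value t lies between two source values a and b, the fraction x of source a solves x*a + (1 - x)*b = t. The grey levels below (white = 1.0, black = 0.0, light grey = 0.8) are illustrative numbers, not genetic data, but they show the same shift that occurs when a source "pole" is itself partly mixed.

```python
def mixing_fraction(target, source_a, source_b):
    """Fraction x of source_a such that x*source_a + (1 - x)*source_b == target."""
    if source_a == source_b:
        raise ValueError("the two sources must differ")
    return (target - source_b) / (source_a - source_b)


WHITE, BLACK, LIGHT_GREY, MEDIUM_GREY = 1.0, 0.0, 0.8, 0.5

# With a "pure" white pole, medium grey needs half white.
print(mixing_fraction(MEDIUM_GREY, WHITE, BLACK))       # 0.5
# With a light-grey pole standing in for white, more of it is needed.
print(mixing_fraction(MEDIUM_GREY, LIGHT_GREY, BLACK))  # 0.625
```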

The moral:
  • If you are going to study admixture, you'd better find unadmixed representatives of ancestral populations.
As we will now see, this is not always possible:

No unadmixed populations: the Indian Cline

What if the process of admixture had occurred for a thousand more years and all inhabitants of the New World had acquired a generous portion of European ancestry? We would then have no unadmixed native populations to use in the estimation of admixture proportions.

This is, in essence, the problem that Reich et al. (2009) had to deal with in the context of India. West Eurasian-like people have been arriving in the Indian subcontinent from at least Neolithic times until quite recently. The caste system has served to restrict gene flow to some extent, but, nonetheless, the populations of India are, today, variable mixes of West Eurasians and indigenous Indians.

Even the Andamanese Islanders had evidence of the West Eurasian-like element (which Reich et al. termed Ancestral North Indian, or ANI). Looking back to the Mexican example, the lack of unadmixed reference populations would inflate estimates of native ancestry.

To see whether this is the case, I took the 18 populations of the Indian Cline described by Reich et al. (2009) together with 25 Europeans from HapMap CEU and ran ADMIXTURE over the set. Below you can see the comparison between the "West Eurasian" component of ADMIXTURE and the Ancestral North Indian:

The cline is preserved in both representations, but the right column has smaller numbers than the left one, confirming our intuition about the use of admixed populations.

Below is a scatterplot of the two columns, with the regression equation on the chart:

The high R² value suggests that the two techniques are measuring the same underlying reality, but ADMIXTURE produces West Eurasian admixture estimates about 38 percentage points lower than those of Reich et al. (2009). Indeed, this is what we expect, as Reich et al. (2009) assign 38.8% ANI ancestry to the "most indigenous" group (the Mala) along the cline.

The position of populations along the cline is roughly the same, but the two sets of admixture proportions are shifted by about 38 percentage points with respect to each other.

(Reich et al. (2009) removed 8 individuals from their dataset, as well as 7 Pathans and 14 Sindhis, as outliers. I followed Rosenberg's recommendations with respect to the Pathans and Sindhis, using his H971 set, and kept all the Indian individuals of Reich et al. (2009). As can be seen, the slightly different datasets did not greatly affect the correlation between admixture proportions.)
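The regression line and R² on the scatterplot come from an ordinary least-squares fit of one column of estimates on the other. Since the two columns are not reproduced in this post, the sketch below uses hypothetical values built as a pure shift of 0.38 between the ADMIXTURE "West Eurasian" estimate and the ANI estimate, purely to show how a high R² can coexist with a large offset; it is not the actual fit.

```python
def ols_fit(xs, ys):
    """Least-squares line ys ~ intercept + slope * xs, plus R^2."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length columns with at least 2 points")
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = sxy * sxy / (sxx * syy)  # squared Pearson correlation
    return intercept, slope, r_squared


# Hypothetical columns: the second estimate is the first shifted by 0.38.
admixture_west_eurasian = [0.05, 0.20, 0.35, 0.50]
ani_estimate = [x + 0.38 for x in admixture_west_eurasian]
a, b, r2 = ols_fit(admixture_west_eurasian, ani_estimate)
print(f"ANI ~ {a:.3f} + {b:.3f} * ADMIXTURE, R^2 = {r2:.3f}")
```

R² only measures how well the points line up; the offset shows up in the intercept, which is why the two methods can agree on the ordering of populations while disagreeing on absolute proportions.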

Reich et al. (2009) were able to infer the existence of ANI ancestry even in the most "indigenous" of Indian populations by exploiting the simple structure of the problem, namely:
  1. Admixture occurred between only 2 ancestral groups
  2. The 2 groups were related to extant human populations that are not part of the cline: CEU and Adygei for ANI and Onge for ASI
  3. There was treelike evolution of all studied groups except for the ANI-ASI admixture event
It is a beautiful result, showing that there are cases where the extent of admixture can be inferred even in the absence of unadmixed representatives of the source populations.


Much more can be said on this issue, but let's summarize a couple of lessons:
  • The full extent of an admixture cline can be captured only if unadmixed populations on either side of the cline exist. Use as many populations as possible to capture the full extent of an admixture cline.
  • Use of an admixed population in lieu of an unadmixed native one inflates the inferred native component. Use native populations instead of admixed ones if possible.
  • Even in the absence of unadmixed native populations, it is sometimes possible to reconstruct the admixture proportions as per Reich et al. (2009).
Capturing the complexities of human prehistory from modern populations is tricky. Nonetheless, with increased coverage of human genetic diversity (there are already ~9k individuals in my database), new analytical techniques, and, hopefully, some archaeogenetic calibration, we are bound to learn much more about the distant human past in the not-so-distant future.

PS: The substantial correlation between the ANI-ASI proportions of Reich et al. (2009) and the "West Eurasian"-"South Asian" proportions of a K=2 ADMIXTURE analysis makes it possible to infer a person's ANI-ASI proportions from their ADMIXTURE results. Dodecad Project members of South Asian heritage should keep an eye on the Dodecad Project blog for that type of inference.

Combe Capelle RIP

John Hawks points me to an announcement about the date of Combe Capelle. The skull, long thought to date to the early Upper Paleolithic, comes from a burial that has now been redated to 7,575 BCE, making it Epipaleolithic.

Combe Capelle differs quite a bit from the "Cro-Magnons" who inhabited France during the Upper Paleolithic by being hyperdolichocranic, higher-skulled, and narrower-faced. This led some to postulate the existence of two primordial European races.

The younger age of Combe Capelle upsets this theory, although skulls of this general type also appear in the Moravian site of Dolni Vestonice.

In 2004, another supposed Aurignacian specimen, Vogelherd, was demoted to the last 5,000 years. It's always a good idea to be vigilant about possible misdating, especially if a sample seems discordant with other individuals from the same place and time.

(Combe Capelle Photograph Gunter Bechly)

March 20, 2011

D-statistic paper (Durand et al. 2011)

The D-statistic was introduced in Green et al. (2010) (the Neandertal admixture paper) and used in Reich et al. (2010) (the Denisovan admixture paper). In basic terms, it tests whether, of a pair of populations P1 and P2, one is closer to a third population P3, using P4 as an outgroup.


In the aforementioned papers it was usually used like this:

D(Eurasian, African, Archaic, Chimpanzee)

and its positive values were interpreted as evidence of different kinds of archaic admixture in subsets of modern humans (non-Africans and Melanesians).
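The statistic can be computed by counting "ABBA" and "BABA" site patterns. The sketch below follows the convention D = (n_ABBA - n_BABA) / (n_ABBA + n_BABA), where ABBA means P1 carries the outgroup (ancestral) allele while P2 and P3 share the derived one; under this convention a positive D means P2 shares more derived alleles with P3. Sign conventions and population ordering differ between papers and software, so treat that choice as an assumption. It also assumes one sampled allele per population at each site, and the site list is invented.

```python
def d_statistic(sites):
    """ABBA-BABA D-statistic for ordered populations (P1, P2, P3, outgroup).

    `sites` is an iterable of 4-tuples of alleles, one sampled allele per
    population; the outgroup allele is taken as ancestral. Only ABBA and
    BABA patterns are counted; all other sites are ignored.
    """
    abba = baba = 0
    for p1, p2, p3, out in sites:
        if p1 == out and p2 == p3 != out:
            abba += 1  # P2 and P3 share the derived allele
        elif p2 == out and p1 == p3 != out:
            baba += 1  # P1 and P3 share the derived allele
    if abba + baba == 0:
        raise ValueError("no informative ABBA/BABA sites")
    return (abba - baba) / (abba + baba)


# Invented sites: 3 ABBA, 1 BABA, and one uninformative site.
toy_sites = [("A", "G", "G", "A")] * 3 + [("G", "A", "G", "A"), ("A", "A", "G", "A")]
print(d_statistic(toy_sites))  # 0.5
```

Under a simple tree with no admixture the two discordant patterns are expected to be equally frequent, so D near 0 is the null; Durand et al. (2011) discuss when ancestral structure can also push it away from 0.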

The new paper is highly technical, but suggests that one can use this statistic to infer archaic admixture even in the absence of an ancient specimen. I'm not entirely clear on the nuts and bolts of how this works, but the gist seems to be the detection of levels of genetic divergence that can be explained either by thousands of generations of population structure that was subsequently broken down, or by archaic admixture.

It might be interesting to see whether new types of archaic admixture can be predicted from the genomes of modern populations while we wait for the next archaic hominin to be sequenced. As I've mentioned in the past, hot and humid climates may make DNA preservation impossible, and hence there may never be an ancient sequence to compare against. In a sense, the Denisovan paper got lucky because the Denisova group (in the Altai) may have been related to the people that Melanesians admixed with much further south -- unless they took a massive detour.

So, hopefully, as full genomic data on diverse human populations become widely available over the next few years, new traces of archaic admixture and/or deep population structure may be inferred. The admixture record stands at 2 for 2 for the only two archaic hominin groups tested so far, so I'd bet that this isn't the end of the story. Anthropological theory is bound to move away from a naive recent Out-of-Africa model and towards a more nuanced view of human origins, in which more diverse ancestors have their own place.

Mol Biol Evol (2011) doi: 10.1093/molbev/msr048

Testing for ancient admixture between closely related populations

Eric Y. Durand et al.

One enduring question in evolutionary biology is the extent of archaic admixture in the genomes of present-day populations. In this paper, we present a test for ancient admixture that exploits the asymmetry in the frequencies of the two non-concordant gene trees in a three-population species tree. This test was first developed to detect interbreeding between Neandertals and modern humans. We derive the analytic expectation of a test statistic, called the D-statistic, which is sensitive to asymmetry under alternative demographic scenarios. We show that the D-statistic is insensitive to some demographic assumptions such as ancestral population sizes, and requires only the assumption that the ancestral populations were randomly mating. An important aspect of D-statistics is that they can be used to detect archaic admixture even when no archaic sample is available. We explore the effect of sequencing error on the false positive rate of the test for admixture, and we show how to estimate the proportion of archaic ancestry in the genomes of present-day populations. We also investigate a model of subdivision in ancestral populations that can result in D-statistics that indicate recent admixture.


March 18, 2011

Pan-Asian SNP Consortium analysis

The Harappa Ancestry Project tipped me off about the availability of the Pan-Asian SNP data. I was able to download it and do a quick ADMIXTURE test run on it.

The admixture proportions can be found in the spreadsheet.

Analysis of 1000 Genomes + HapMap 3 data

A reader tipped me off about the availability of data from the 1000 Genomes Project genotyped on the Illumina Omni 2.5 chip. Of the roughly 2.5 million SNPs, about 720,000 have rs-numbers in the working dataset. There are a few new populations in the data:
  • GBR (Great Britain)
  • FIN (Finland)
  • IBS (Iberian Spanish)
  • CLM (Colombians)
  • MXL (Mexican Americans from Los Angeles)
  • PUR (Puerto Ricans)
I've been rebuilding my various datasets to account for common markers, high-quality SNPs, and linkage disequilibrium, so this analysis is based on about 133,000 markers. I also limited the number of individuals to 25 per population.
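Thinning markers for linkage disequilibrium is commonly done by dropping one SNP from each highly correlated pair. The greedy sketch below, similar in spirit to windowed r² pruning in tools such as PLINK, keeps a SNP only if its squared genotype correlation r² with every already-kept SNP in a sliding window stays below a threshold. The window size and threshold are hypothetical parameters, and this is not necessarily the procedure that produced the ~133,000 markers.

```python
def r_squared(g1, g2):
    """Squared Pearson correlation between two 0/1/2 genotype vectors."""
    n = len(g1)
    m1, m2 = sum(g1) / n, sum(g2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(g1, g2))
    v1 = sum((a - m1) ** 2 for a in g1)
    v2 = sum((b - m2) ** 2 for b in g2)
    if v1 == 0 or v2 == 0:
        return 0.0  # monomorphic SNP: treat as uncorrelated
    return cov * cov / (v1 * v2)


def ld_prune(snps, window=50, threshold=0.5):
    """Greedy LD pruning of SNPs listed in genomic order; returns kept indices.

    A SNP is kept if its r^2 with every kept SNP fewer than `window`
    positions earlier is below `threshold`.
    """
    kept = []
    for i, g in enumerate(snps):
        recent = [j for j in kept if i - j < window]
        if all(r_squared(g, snps[j]) < threshold for j in recent):
            kept.append(i)
    return kept


# Hypothetical genotypes (rows = SNPs, columns = individuals).
toy_snps = [[0, 1, 2, 0], [0, 1, 2, 0], [2, 1, 0, 1]]
print(ld_prune(toy_snps, threshold=0.8))
```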

I took the HapMap-3 data to make sure that the integration was correct and ran various analytical techniques over the joint dataset of 17 populations and 425 individuals.

Multidimensional Scaling

As expected, the three poles correspond to West Eurasians (top left, GBR, CEU, TSI), East Eurasians (bottom left, CHB, CHD, JPT), and Sub-Saharan Africans (YRI).

Other populations fall between the three poles: for example, FIN are slightly displaced from West Eurasians in an East Eurasian direction; Mexicans and Gujarati Indians (GIH) lie between West and East Eurasians; and African Americans (ASW) and Maasai East Africans (MKK) lie between Sub-Saharan Africans and West Eurasians.
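An MDS plot like this places individuals so that their plotted distances approximate a matrix of genetic distances. The post does not specify the MDS variant or the distance measure, so the sketch below assumes classical (Torgerson) MDS on a precomputed distance matrix, extracting eigenvectors by plain power iteration with deflation; a real analysis on hundreds of individuals would use a proper eigensolver. The points in the usage example are hypothetical.

```python
import math


def classical_mds(dist, dims=2, iters=200):
    """Classical (Torgerson) MDS of a symmetric n x n distance matrix.

    Double-centres the squared distances, then extracts the leading
    eigenvectors by power iteration with deflation. Returns n rows of
    `dims` coordinates (trailing columns stay 0.0 if fewer positive
    eigenvalues are found).
    """
    n = len(dist)
    d2 = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in d2]
    grand_mean = sum(row_mean) / n
    # B = -1/2 * J D^2 J, written out element-wise.
    b = [[-0.5 * (d2[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]

    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        # Arbitrary deterministic start vector, different for each component
        # (a constant vector would fail: B maps it to zero).
        v = [math.sin((k + 1) * (i + 1.0)) for i in range(n)]
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break  # remaining matrix is numerically zero along v
            v = [x / norm for x in w]
        vnorm = math.sqrt(sum(x * x for x in v))
        v = [x / vnorm for x in v]
        bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = sum(x * y for x, y in zip(v, bv))  # Rayleigh quotient
        if lam <= 1e-12:
            break  # no further positive eigenvalue recovered
        for i in range(n):
            coords[i][k] = v[i] * math.sqrt(lam)
        # Deflate so the next pass finds the next eigenvector.
        for i in range(n):
            for j in range(n):
                b[i][j] -= lam * v[i] * v[j]
    return coords


# Hypothetical 2-D points; MDS should recover their pairwise distances.
pts = [(0.0, 0.0), (2.0, 0.0), (0.0, 1.0), (2.0, 1.0)]
dmat = [[math.dist(p, q) for q in pts] for p in pts]
for row in classical_mds(dmat, dims=2):
    print([round(x, 3) for x in row])
```

The recovered axes are only defined up to rotation, reflection and translation, which is why MDS plots are read in terms of relative positions rather than absolute coordinates.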

Clusters Galore Analysis

I then used the Clusters Galore approach to cluster individuals. As I've mentioned before, individuals with quite distinct origins may overlap in the MDS representation, and the Galore approach is able to discover distinct clusters by looking at several dimensions at the same time and using a state-of-the-art clustering algorithm, MCLUST.

As can be seen in the MDS plot, Mexicans and Gujarati Indians overlap, as well as African Americans and Maasai. Obviously these populations are completely different mixtures that happen to coincide in genomic space due to the relatedness of their ancestral components that intermixed at different times and in different continents.

Here are the results of the Galore analysis. With 20 MDS dimensions retained (the maximum I considered), there were 35 clusters in the MCLUST solution that maximized the Bayesian Information Criterion (BIC).

This is quite instructive:
  • Some populations (FIN and YRI) form their own very specific clusters #2 and #35
  • Some clusters join 2 or more populations. For example White Americans (CEU) and Britons (GBR) form cluster #1
  • Latinos form several clusters, especially the Mexicans. This should've been anticipated from the MDS plot where they are shown to be widely dispersed (quite variable). In essence, Latinos are not homogeneous populations but sets of individuals possessing variable admixture proportions
Note also that some populations that are folded into a single cluster in this analysis (e.g., Spanish and Tuscans in #3) can in fact be distinguished from each other, although not easily in the first 20 dimensions considered here, as these are dominated by more salient features of the global genetic landscape.
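MCLUST (the R package mclust) chooses the number of clusters by fitting Gaussian mixtures with EM and keeping the model with the highest BIC, which mclust defines as 2*logL - p*log(n), so larger is better. The sketch below reproduces only that selection logic, in one dimension with unequal variances; mclust itself works in many dimensions and also compares covariance structures, which this toy omits. The data are synthetic.

```python
import math


def _normal_pdf(x, mean, var):
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(xs, k, iters=200):
    """EM for a 1-D Gaussian mixture with k components and unequal variances.

    Returns (log_likelihood, weights, means, variances).
    """
    n = len(xs)
    xs_sorted = sorted(xs)
    mean_all = sum(xs) / n
    var_all = sum((x - mean_all) ** 2 for x in xs) / n
    # Deterministic start: means at evenly spaced quantiles, shared variance.
    means = [xs_sorted[int((j + 0.5) * n / k)] for j in range(k)]
    variances = [var_all] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each point.
        resp = []
        for x in xs:
            dens = [w * _normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means and variances.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            variances[j] = max(
                sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj, 1e-6
            )
    loglik = sum(
        math.log(sum(w * _normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)))
        for x in xs
    )
    return loglik, weights, means, variances


def bic(loglik, k, n):
    """mclust-style BIC (larger is better); p = 3k - 1 free parameters in 1-D."""
    return 2.0 * loglik - (3 * k - 1) * math.log(n)


def best_k(xs, ks=(1, 2, 3)):
    """Number of mixture components with the highest BIC."""
    return max(ks, key=lambda k: bic(fit_gmm_1d(xs, k)[0], k, len(xs)))


# Synthetic data: two well-separated groups of 20 points each.
data = [0.1 * i for i in range(20)] + [10.0 + 0.1 * i for i in range(20)]
print(best_k(data, ks=(1, 2)))
```

The penalty term grows with the number of components, so extra clusters are accepted only when they improve the likelihood enough; this is what lets a BIC-based search return a specific number such as the 35 clusters reported above.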

ADMIXTURE analysis

I then ran ADMIXTURE over the dataset for K=5.

Here are the admixture proportions corresponding to this plot:

This is quite instructive with respect to the absence of particular reference populations: Finns show East Eurasian influences in the form of "Native American" (1.5%) and "East Asian" (6.2%) elements. Clearly, we don't have to imagine Native Americans moving into Finland; these two components are stand-ins for the Siberian ancestors of the European Finns. Similarly, the Spanish show African admixture (1.6%). This is probably due to both North and Sub-Saharan African elements, but the absence of appropriate North African references makes the distinction impossible. Finally, the Maasai show European and African admixture. This may be due to the non-emergence of a specific East African component at this level of resolution, as well as the absence of appropriate West Asian Caucasoid groups that are more likely to have influenced them. The absence of West Asian reference populations also probably affects Tuscans, as their West Asian admixture may be misinterpreted as South Asian.

Here are the Fst distances between components:

This is also instructive: the South Asian component, in the absence of relatively unadmixed South Asian references, is closer to Europeans than to East Asians. In fact, it is a composite of West Eurasian and indigenous South Asian population elements, the latter being distantly related to East Asians. Similarly, in the absence of Amerindian references, the Native American component (a bit of a misnomer) is equidistant from Europeans and East Asians. In fact, it is also a composite of West Eurasian and pre-Columbian American populations.
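The post does not say which estimator underlies the Fst values ADMIXTURE reports between components. As an illustration only, the sketch below computes a simple Nei-style Fst, (H_T - H_S) / H_T summed over SNPs, between two sets of per-SNP allele frequencies, such as those ADMIXTURE estimates for each component; the frequencies used are made up.

```python
def fst(freqs_a, freqs_b):
    """Ratio-of-averages Fst between two populations from per-SNP allele frequencies.

    Uses H_T = 2*pbar*(1 - pbar) and H_S = mean of 2p(1 - p) within each
    population, summed over SNPs before taking (H_T - H_S) / H_T.
    """
    if len(freqs_a) != len(freqs_b) or not freqs_a:
        raise ValueError("need two non-empty, equal-length frequency lists")
    num = den = 0.0
    for p, q in zip(freqs_a, freqs_b):
        pbar = (p + q) / 2.0
        h_t = 2.0 * pbar * (1.0 - pbar)
        h_s = (2.0 * p * (1.0 - p) + 2.0 * q * (1.0 - q)) / 2.0
        num += h_t - h_s
        den += h_t
    return num / den if den > 0 else 0.0


# Made-up frequencies at three SNPs for two components.
comp_x = [0.9, 0.3, 0.5]
comp_y = [0.1, 0.4, 0.5]
print(round(fst(comp_x, comp_y), 4))
```

Identical frequency vectors give 0 and fixed differences give 1; an admixed "component" sits between its sources, which is why a composite component can appear closer to one reference than a pure one would.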


The Omni 2.5 data seem to work fine, and genome bloggers can anticipate good things from the 1000 Genomes Project, as many more populations are in the pipeline. The full-sequence data will probably be too much for most hobbyists to handle at the moment, but for anthropological investigations the 2.5 million SNPs will be more than enough.

The few experiments I carried out here also served to highlight the problems associated with using a limited number of reference populations. But, thankfully, this was a contrived problem aimed to make a point: there are now publicly available data for most major human populations, so the field is wide open for anyone interested in the study of human variation.

HAPMIX 2 released

Via the Reich lab. A related method, StepPCO, uses wavelets. Both algorithms estimate the ancestral origin of DNA segments in admixed populations and can estimate the time of the admixture event.

March 16, 2011

Longevity of people with long-lived relatives

European Journal of Human Genetics , (16 March 2011) | doi:10.1038/ejhg.2011.40

The genetic component of human longevity: analysis of the survival advantage of parents and siblings of Italian nonagenarians

Alberto Montesanto et al.

Many epidemiological studies have shown that parents, siblings and offspring of long-lived subjects have a significant survival advantage when compared with the general population. However, how much of this reported advantage is due to common genetic factors or to a shared environment remains to be resolved.

We reconstructed 202 families of nonagenarians from a population of southern Italy. To estimate the familiarity of human longevity, we compared survival data of parents and siblings of long-lived subjects to that of appropriate Italian birth cohorts. Then, to estimate the genetic component of longevity while minimizing the variability due to environment factors, we compared the survival functions of nonagenarians' siblings with those of their spouses (intrafamily control group).

We found that both parents and siblings of the probands had a significant survival advantage over their Italian birth cohort counterparts. On the other hand, although a substantial survival advantage was observed in male siblings of probands with respect to the male intrafamily control group, female siblings did not show a similar advantage. In addition, we observed that the presence of a male nonagenarians in a family significantly decreased the instant mortality rate throughout lifetime for all the siblings; in the case of a female nonagenarians such an advantage persisted only for her male siblings.

The methodological approach used here allowed us to distinguish the effects of environmental and genetic factors on human longevity. Our results suggest that genetic factors in males have a higher impact than in females on attaining longevity.
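The abstract compares the survival functions of nonagenarians' siblings with those of their spouses. A standard way to estimate a survival function from ages at death, when some relatives are still alive (censored), is the Kaplan-Meier product-limit estimator. The sketch below assumes that estimator; it is not necessarily the authors' exact procedure, and the function name and ages are hypothetical, not data from the paper.

```python
def kaplan_meier(ages, died):
    """Kaplan-Meier product-limit estimate of the survival function.

    ages: age at death, or age at last observation if still alive (censored).
    died: True if the death was observed, False if censored.
    Returns (age, S(age)) pairs at each age where at least one death occurs.
    """
    records = sorted(zip(ages, died))
    at_risk = len(records)
    survival = 1.0
    curve = []
    i = 0
    while i < len(records):
        age = records[i][0]
        deaths = removed = 0
        # Group everyone who dies or is censored at the same age; censored
        # individuals still count as at risk at that age.
        while i < len(records) and records[i][0] == age:
            deaths += int(records[i][1])
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((age, survival))
        at_risk -= removed
    return curve


# Hypothetical ages for two small groups (illustration only).
siblings = kaplan_meier([78, 85, 85, 91, 94], [True, True, False, True, False])
spouses = kaplan_meier([70, 75, 75, 80, 85], [True, True, False, True, False])
print(siblings)
print(spouses)
```

Comparing the two curves (for example, with a log-rank test) is the usual next step; that part is omitted here.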


March 15, 2011

StepPCO for admixture estimation

The authors introduce the wavelet transform as a method of estimating admixture proportions and dating the time of admixture. They claim that it performs better than HAPMIX, which is probably the state of the art for this sort of analysis.
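The intuition behind wavelet dating can be shown with a toy example: recombination breaks ancestry blocks down over the generations, so older admixture leaves narrower blocks, and the wavelet scale carrying most of the signal's energy shifts toward finer scales. The sketch below assumes a clean 0/1 ancestry signal and a plain Haar transform; StepPCO itself works on PCA-based ancestry scores along each chromosome and uses its own level numbering (in the paper, higher levels mean narrower blocks, whereas here index 0 is the finest scale). All names are hypothetical.

```python
import math


def haar_level_energy(signal):
    """Energy (sum of squared detail coefficients) at each level of an
    orthonormal Haar transform. Index 0 is the finest scale (adjacent pairs);
    each following index doubles the scale. len(signal) must be a power of 2."""
    n = len(signal)
    if n < 2 or n & (n - 1):
        raise ValueError("signal length must be a power of 2 (>= 2)")
    approx = [float(x) for x in signal]
    energies = []
    while len(approx) > 1:
        evens, odds = approx[0::2], approx[1::2]
        detail = [(a - b) / math.sqrt(2) for a, b in zip(evens, odds)]
        approx = [(a + b) / math.sqrt(2) for a, b in zip(evens, odds)]
        energies.append(sum(d * d for d in detail))
    return energies


def dominant_level(signal):
    """Index of the Haar level carrying the most detail energy."""
    energies = haar_level_energy(signal)
    return max(range(len(energies)), key=energies.__getitem__)


def ancestry_blocks(n, block):
    """Hypothetical 0/1 ancestry signal alternating every `block` markers."""
    return [(i // block) % 2 for i in range(n)]


# Older admixture -> shorter blocks -> finer (lower-index) dominant level.
for block in (32, 8, 2):
    print(block, dominant_level(ancestry_blocks(256, block)))
```

With blocks aligned to powers of two, all the energy falls at index log2(block), so the loop prints levels 5, 3 and 1. Real ancestry tracts have random lengths, which is why the paper calibrates the dominant level against simulations.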

As I had pointed out in my review of HAPMIX, the problem with this type of tool is that quite often you don't have access to the parental populations of an admixed population, either because they no longer exist in unadmixed form, or because the populations you use as stand-ins for them are inappropriate. This is much less of a problem for unsupervised admixture analysis, which makes no assumptions about which populations combined to form an admixed population, but looks only at individuals.

Indeed, I'd say there is plenty of room for researchers to come up with unsupervised versions of HAPMIX/StepPCO and/or to extend them so that they can handle tri-source populations, as they currently assume only two sources of admixture.

An interesting quote from the paper:
Average admixture proportions estimated by the StepPCO method for the African-Americans, Polynesians and Fijians are 19% European ancestry, 24.9% Melanesian ancestry, and 40.2% Melanesian ancestry respectively (Figure 6a). Individual admixture estimates vary substantially among the African-Americans, with some individuals exhibiting very low European ancestry (less than 5%), and some substantially higher (more than 40%). These results were substantiated by the frappe [13] analysis, which agree quite closely with the per-chromosome ancestry estimates from the StepPCO analysis (Figure 6b). A similar pattern is observed in Fiji, with Melanesian ancestry ranging from 22% to 63%. Despite the fact that the Polynesian sample is very diverse, coming from seven different islands [19] , the level of Melanesian ancestry is much more uniform across individuals (varying from 18 to 28%).

Contra the speculations of some, per-chromosome ancestry estimates do not differ greatly from those obtained from a genome-wide maximum likelihood algorithm like frappe; the latter uses the same likelihood model as ADMIXTURE, the software I use in the Dodecad Project. Nor is there any evidence that maximum likelihood algorithms suppress low-level admixture: the Mandenka show 2% European admixture in the 2-way analysis by both StepPCO and HAPMIX, and 1.66% West Eurasian admixture in my K=3 global unsupervised admixture analysis, which looked at 139 different populations.

The main advantage of HAPMIX/StepPCO over maximum likelihood methods is not their greater accuracy, but rather the fact that they can date admixture events, with the above-mentioned caveats. From the paper:
The spectral analysis of the StepPCO signal revealed that the average dominant frequency for the African-Americans is located at level 1.8, which would correspond to an abundance of low frequency wavelets (that is, wider ancestry blocks), while for the Fijians and the Polynesians the average dominant frequency is at level 3.06 and 3.63 respectively, which is indicative of much narrower ancestry blocks (Figure 7). Based on simulations, the WT center of 1.8 corresponds to an admixture time of 6 generations ago (95% CI: 4-8 generations) for the African Americans. Assuming a generation time of 30 years [33] , our results indicate that the admixture in the African Americans started about 180 years ago. Similarly, the simulations indicate that the WT center of 3.63 for the Polynesians corresponds to an admixture time of 90 generations (95% CI: 77-131 generations), or about 2,700 years ago (Figure 8). The time estimation for Fiji is based on simulated data with a 40% admixture rate (to match the higher admixture rate of Fiji), and here the WT center of 3.06 corresponds to an admixture time of 37 generations (95% CI: 29-39) or about 1,100 years ago.
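The conversion from generations to years in the quoted passage is simple multiplication by the assumed 30-year generation time. The sketch below applies it to the point estimates and to the 95% intervals, which the quote gives only in generations; the variable names are hypothetical, and the intervals ignore any uncertainty in the generation time itself.

```python
GENERATION_YEARS = 30  # generation time assumed in the quoted passage

# (point estimate, lower 95% CI, upper 95% CI) in generations, from the quote
ADMIXTURE_GENERATIONS = {
    "African Americans": (6, 4, 8),
    "Polynesians": (90, 77, 131),
    "Fijians": (37, 29, 39),
}


def generations_to_years(generations, generation_years=GENERATION_YEARS):
    """Convert a time in generations to years."""
    return generations * generation_years


for pop, (point, lo, hi) in ADMIXTURE_GENERATIONS.items():
    years = [generations_to_years(g) for g in (point, lo, hi)]
    print(f"{pop}: ~{years[0]} years ago (95% CI {years[1]}-{years[2]})")
```

This prints about 180 years (120-240) for the African Americans, 2,700 (2,310-3,930) for the Polynesians and 1,110 (870-1,170) for the Fijians, the last of which the paper rounds to about 1,100. With a 25-year generation, the same estimates would shrink to 150, 2,250 and 925 years.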

The central estimate for African Americans seems plausible: admixture in that population took place from colonial times until much more recently, as AA children of half-white heritage are usually considered (by society) "black" (cf. Obama), so two centuries or so seems like a reasonable middle ground. The ~2.7ky estimate for Polynesian admixture also agrees with the ~3ky obtained by the different method of Wollstein et al. (2010).

The software runs in R and is available online.

Genome Biology 2011, 12:R19 doi:10.1186/gb-2011-12-2-r19

Dating the age of admixture via wavelet transform analysis of genome-wide data

Irina Pugach et al.


We describe a PCA-based genome scan approach to analyze genome-wide admixture structure, and introduce wavelet transform analysis as a method for estimating the time of admixture. We test the wavelet transform method with simulations and apply it to genome-wide SNP data from eight admixed human populations. The wavelet transform method offers better resolution than existing methods for dating admixture, and can be applied to either SNP or sequence data from humans or other species.


March 14, 2011

The coming of the Greeks to Provence and Corsica (King et al. 2011)

I am sure I will have much more to say about this paper once I read it carefully, but for the moment, I will remind readers of my 2008 post, Expansion of E-V13 explained, in which I postulated that E-V13 in Europe can be attributed largely to Greek colonization.

The paper is also quite exciting as it includes samples of Greeks from the vicinity of Smyrna and Phocaea, the first published samples, as far as I know, of Greek men from Asia Minor. I do find somewhat bizarre, however, the use of Anatolian Greeks as the putative ancestors of the colonists of the West Mediterranean and of Anatolian Turks as the supposed representatives of the Neolithic population (Table 1). The claim that the latest Anatolian population stratum (Turks) can be linked to its earliest (Neolithic-era Anatolians) is rather suspect.

UPDATE I (Mar 15)

The authors claim:
This high frequency of haplogroup J2a-Page55 (formerly DYS413 ≤ 18) in Smyrna is characteristic of non Greek Anatolia.
This claim is based entirely on the authors' limited Balkan Greek samples. An inspection of more Greek samples shows that DYS413 ≤ 18 occurs at higher frequencies not only in Crete but also at several mainland sites (Serrai, Larisa, Patrai) spanning the entire country. Hence, I believe that the claim that J2a-Page55 distinguishes Greeks from non-Greeks is spurious.

UPDATE II (Mar 15)

The authors cite the "Phoenician" paper:
Previous Y-chromosome genetic studies of Phoenician colonization have demonstrated that haplogroup J2 frequency was amplified in regions containing the Phoenician colonies of Iberia and North Africa in comparison to areas not containing Phoenician colonies [7]
My scathing criticism of that paper, and of its specific "Phoenician" association with J2, can be found here.


The authors make a big deal of the presumed relationship of Phocaea with Ionians and of Smyrna with Ionians/Aeolians. As I have mentioned before, it is a hard sell that two neighboring sites, inhabited by people with no ethnic or religious distinction between them for more than 2,000 years (any tribal Greek identities had disappeared by ancient times), nonetheless managed to retain gene pools distinct from each other over that time span, distinctions that can be traced to archaic Greek tribal divisions.

UPDATE (Mar 17)

The above-mentioned nitpicks do not, however, detract from the paper's thesis. So it's worth repeating a few of the points on which this thesis rests:
  • We have new Greek population samples from Asia Minor that show E-V13 frequencies well within the regional variation of mainland Greece, and higher than in the Turkish Anatolian population. This disproves the theory that E-V13 was introduced to the mainland Greek population recently from Albanians or Thracians, along with other bizarre theories advocated by some, as such introductions would not have substantially affected the Greeks of West Asia Minor.
  • It should be noted, however, that E-V13 frequencies vary substantially among Greek populations. This seems consistent with my theory of its Bronze Age "heroic" origin, as late lineages are expected to have non-homogeneous frequency distributions.
  • The Corsican evidence is consistent with the Greek origin of E-V13, given the higher frequency of E-V13 around the colony of Alalia (4.6% in East Corsica vs. 1.6% in West Corsica).
  • The absence of I-M423 in Provence precludes a substantial contribution to the Provencal population by Balkan populations north of Greece where I-M423 reaches a higher frequency.
It seems pretty clear to me that the E-V13-bearing men of Provence are patrilineally descended from the Greeks of the archaic age. The same could be true for other lineages (e.g., J-M92) assigned (erroneously, in my opinion) to non-Greek Anatolians, but overall, the evidence supports the persistence of the gene pool of the Western Greeks among the present-day southern French.

BMC Evolutionary Biology 2011, 11:69 doi:10.1186/1471-2148-11-69

The coming of the Greeks to Provence and Corsica: Y-chromosome models of archaic Greek colonization of the western Mediterranean

Roy J King et al.

Abstract (provisional)

The process of Greek colonization of the Central and Western Mediterranean during the Archaic and Classical Eras has been understudied from the perspective of population genetics. To investigate the Y chromosomal demography of Greek colonization in the Western Mediterranean, Y-chromosome data consisting of 29 YSNPs and 37 YSTRs were compared from 51 subjects from Provence, 56 subjects from Smyrna and 31 subjects whose paternal ancestry derives from Asia Minor Phokaia, the ancestral embarkation port to the 6th century BCE Greek colonies of Massalia (Marseilles) and Alalie (Aleria, Corsica).

19% of the Phokaian and 12% of the Smyrnian representatives were derived for haplogroup E-V13, characteristic of the Greek and Balkan mainland, while 4% of the Provencal, 4.6% of West Corsican and 1.6% of East Corsican samples were derived for E-V13. An admixture analysis estimated that 17% of the Y-chromosomes of Provence may be attributed to Greek colonization. Using putative Neolithic Anatolian lineages: J2a-dys445=6, G2a-M406 and J2a1b1-M92 the data predict a 0% Neolithic contribution to Provence from Anatolia. Estimates of colonial Greek vs. indigenous Celto-Ligurian demography predict a maximum of a 10% Greek contribution, suggesting a Greek male elite-dominant input into the Iron Age Provence population.

Given the origin of viniculture in Provence is ascribed to Massalia, these results suggest that E-V13 may trace the demographic and socio-cultural impact of Greek colonization in Mediterranean Europe, a contribution that appears to be considerably larger than that of a Neolithic pioneer colonization.


Y chromosomes of Altaian Kazakhs

This paper uses both the pedigree (genealogical or germline) and evolutionary mutation rates. Readers of the blog are aware that I've been a vehement critic of the latter on theoretical grounds since 2008, and I've started keeping track of cases where the germline rate has a better fit to the archaeological record than the evolutionary rate.

In this particular case, the difference between the two rates (about 3-fold) is especially interesting because of the whole "Genghis Khan" theory, according to which a large number of central Asian men belong to a haplotype cluster dated to around the time of the Mongol conqueror, and may be the descendants of Genghis and his close male relatives. This theory relies on the use of the germline rate: otherwise, the genetic signature attributed to the Khan must be redated to a much earlier time.
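Why the rate matters so much can be seen from a simple Y-STR dating sketch: the age of a cluster is estimated as the average squared distance (ASD) of its haplotypes from a founder, per locus, divided by the mutation rate, so the date scales inversely with the rate and a 3-fold difference in rate is a 3-fold difference in age. This assumes a star-like cluster whose founder is the modal haplotype and single-step mutations; the two rates are illustrative values chosen to differ about 3-fold, and the haplotypes and function names are hypothetical, not the paper's.

```python
from collections import Counter

# Illustrative per-locus, per-generation rates (assumptions, ~3-fold apart).
GERMLINE_RATE = 2.1e-3
EVOLUTIONARY_RATE = 6.9e-4


def modal_haplotype(haplotypes):
    """Most common repeat count at each locus (assumed founder haplotype)."""
    return [Counter(h[i] for h in haplotypes).most_common(1)[0][0]
            for i in range(len(haplotypes[0]))]


def asd_tmrca(haplotypes, rate):
    """Generations since the founder: mean per-locus squared distance
    from the modal haplotype, divided by the mutation rate."""
    modal = modal_haplotype(haplotypes)
    n_loci = len(modal)
    per_hap = [sum((h[i] - modal[i]) ** 2 for i in range(n_loci)) / n_loci
               for h in haplotypes]
    return sum(per_hap) / len(per_hap) / rate


# Hypothetical 4-locus haplotypes from a tight cluster.
cluster = [[14, 13, 29, 24], [14, 13, 29, 24], [15, 13, 29, 24],
           [14, 12, 29, 24], [14, 13, 30, 24]]
print(asd_tmrca(cluster, GERMLINE_RATE), asd_tmrca(cluster, EVOLUTIONARY_RATE))
```

Whatever the data, the ratio of the two estimates equals the ratio of the rates, which is why the choice of rate alone decides whether a cluster lands in the 13th century CE or many centuries earlier.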

The authors give a convincing argument in favor of the pedigree rate:
The difficulty in reliably determining the coalescent dates for the lineages found in Kazakh populations makes it nearly impossible to determine whether these lineages were present in ancestral nomadic steppe groups (Scythians, Xiongnu, Xianbei, Toba, and Jou-Jan) or were contributed by the descendents of Genghis Khan and the Mongol armies that, at one time, held control over the region. An important reason for caution here is the current debate about the most appropriate mutation rate for NRY coalescence estimates. The evidence provided by Zerjal et al. [14] supports the younger estimates, suggesting that the Kazakh haplotypes could be the direct result of the Mongol influence in the 13th century CE. The presence of the C3* haplotype cluster in the Kazakh also supports the genealogical assertions that (for at least some Kazakh men) there is a direct paternal connection to Genghis Khan.

If the evolutionary rate is the more accurate value for Y-STRs, then the Kazakh lineages coalesce to roughly 2,000 years ago. This date suggests a far older source for them, possibly with the westward movements of Altaic-speaking peoples around the second and first centuries BCE. In this case, we would expect to see multiple haplotype clusters exhibiting a similar pattern as the Genghis Khan cluster. However, we do not observe this pattern. As Zerjal et al. [14] pointed out, this haplotype cluster is unique. Therefore, given the evidence presented here and in Zerjal et al. [14], we believe the best interpretation of the data is that Kazakh Y-chromosome diversity was strongly influenced by the Mongols of the 13th century CE.

The younger ages of the Mongoloid lineages in this population make good historical sense, as these are derived from Turko-Mongolian tribes that (more recently) established control over the pre-existing Iranian populations of the steppe. The gene pool of the latter has been marginalized, but it maintains its genetic diversity.

The presence of haplogroup J2a here as the modal Caucasoid lineage, followed by haplogroups G1 and G2a, is also quite interesting, and plausibly places the origin of the ancestors of the pre-Altaic inhabitants of the region in close proximity to the West Asian homeland of the ancestors of the Indo-Aryans.

PLoS ONE 6(3): e17548. doi:10.1371/journal.pone.0017548

Y-Chromosome Variation in Altaian Kazakhs Reveals a Common Paternal Gene Pool for Kazakhs and the Influence of Mongolian Expansions

Matthew C. Dulik et al.

Kazakh populations have traditionally lived as nomadic pastoralists that seasonally migrate across the steppe and surrounding mountain ranges in Kazakhstan and southern Siberia. To clarify their population history from a paternal perspective, we analyzed the non-recombining portion of the Y-chromosome from Kazakh populations living in southern Altai Republic, Russia, using a high-resolution analysis of 60 biallelic markers and 17 STRs. We noted distinct differences in the patterns of genetic variation between maternal and paternal genetic systems in the Altaian Kazakhs. While they possess a variety of East and West Eurasian mtDNA haplogroups, only three East Eurasian paternal haplogroups appear at significant frequencies (C3*, C3c and O3a3c*). In addition, the Y-STR data revealed low genetic diversity within these lineages. Analysis of the combined biallelic and STR data also demonstrated genetic differences among Kazakh populations from across Central Asia. The observed differences between Altaian Kazakhs and indigenous Kazakhs were not the result of admixture between Altaian Kazakhs and indigenous Altaians. Overall, the shared paternal ancestry of Kazakhs differentiates them from other Central Asian populations. In addition, all of them showed evidence of genetic influence by the 13th century CE Mongol Empire. Ultimately, the social and cultural traditions of the Kazakhs shaped their current pattern of genetic variation.


March 09, 2011

Clusters Galore analysis of Henn et al. (2011) data

The great thing about researchers putting their data online, like Henn et al. (2011) did, is that they can expect anyone with a computer, a bit of knowledge, and a bit of time, to study it, analyze it, play with it, and perhaps add a little value of their own.

As soon as I realized that there were 30 populations and 587 individuals in this dataset, most of them previously unsampled Africans, I had to get my hands on them and try my Galore approach. This can be summarized as dimensionality reduction via PCA/MDS, followed by MCLUST for an unsupervised clustering of unlabeled individuals with no a priori setting of the number of clusters K. (If you want to try it, instructions here)

As I have explained before, my favorite way of using the Galore method is to iterate over the number of retained MDS dimensions, note the optimal K chosen by MCLUST based on the Bayesian Information Criterion, and report the results for the number of dimensions that produces the highest K. Considering up to the first 20 dimensions, the highest K was 42 clusters, obtained with 15 retained MDS dimensions.
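The Galore procedure described above can be sketched in Python with a swapped-in toolchain, not the one used here: scikit-learn's `PCA` stands in for PCA/MDS, and `GaussianMixture` with its `bic` method stands in for R's MCLUST. Note that MCLUST reports BIC with the opposite sign (larger is better there), while scikit-learn's `bic` is minimized. The function names, parameters and synthetic data are hypothetical.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture


def best_k_for_dims(X, n_dims, max_k=10, seed=0):
    """Project X onto its first n_dims principal components and return the
    number of mixture components K with the lowest BIC, plus its labels."""
    Z = PCA(n_components=n_dims).fit_transform(X)
    best = None
    for k in range(1, max_k + 1):
        gm = GaussianMixture(n_components=k, covariance_type="full",
                             n_init=3, random_state=seed).fit(Z)
        bic = gm.bic(Z)  # lower is better in scikit-learn's convention
        if best is None or bic < best[0]:
            best = (bic, k, gm.predict(Z))
    return best[1], best[2]


def galore(X, max_dims=20, max_k=10, seed=0):
    """Iterate over the number of retained dimensions (2..max_dims, which must
    not exceed the number of samples or features) and keep the run whose
    BIC-optimal K is highest."""
    best_dims, best_k, best_labels = None, 0, None
    for d in range(2, max_dims + 1):
        k, labels = best_k_for_dims(X, d, max_k, seed)
        if k > best_k:
            best_dims, best_k, best_labels = d, k, labels
    return best_dims, best_k, best_labels


# Synthetic data: three well-separated groups of 100 "individuals" each.
rng = np.random.default_rng(1)
centers = rng.normal(scale=20, size=(3, 10))
X = np.vstack([c + rng.normal(size=(100, 10)) for c in centers])
dims, k, labels = galore(X, max_dims=5, max_k=6)
print(dims, k)
```

On real genotype data, X would be the individuals' MDS or PCA coordinates (or the genotype matrix itself), and each cluster's membership would then be cross-tabulated against population labels, as in the results table.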

I have placed a RAR archive of scatterplots of the first 20 dimensions here. Below you can see the first 2 dimensions, which show a triangle with vertices anchored on Tuscans, San, and the bulk of Sub-Saharan Africans.

Here are the results of the Galore analysis, showing the number of individuals from each population assigned to each cluster.
I would say that the Galore approach had remarkable success in grouping unlabeled individuals into very meaningful clusters:
  • Some populations got their own exclusive clusters (e.g., Mandenka, Tuscans, and Mada)
  • A few clusters included individuals from related populations, e.g., #12 from two different groups of San, or #26-32 of various types of North and Saharan Africans
  • Some populations were split across different clusters; I think it is instructive to see which ones were: the quite diverse San, Hadza, and Sandawe, and also the quite heterogeneous North Africans. In the latter case Arab, Berber, and Sub-Saharan ancestry probably co-exist in various proportions in individuals.
I anticipate that the ~55k SNPs included in the released data will be largely compatible with the datasets included in the Dodecad Project, and while that project's focus is on Eurasian populations, the availability of such rich and varied African data will surely be welcome, and allow me to frame the ancestry of African-admixed individuals more accurately.

Out of South Africa? Out of anywhere?

The most widely circulated theory of modern human origins involves the emergence of our species from a small tribe of only a few thousand people who lived in East Africa. There are three main arguments in favor of this theory:
  1. The earliest anatomically modern (albeit with archaic traits) skulls are found in Ethiopia (Omo skulls, dated to 195 thousand years ago)
  2. The shallow coalescence times of human mtDNA and Y-chromosomes (within the last 200 thousand years) are seen as evidence for a recent emergence of our species.
  3. Diminution of genetic diversity across Eurasia is proportional to distance from East Africa
Of course, there are counter-arguments for all these claims:
  1. Modern human skulls may have been preserved in Ethiopia because of its climate, which is favorable to preservation, and/or the intense interest of palaeoanthropologists in this region.
  2. The shallow coalescence times may be the result of selective sweeps affecting these uniparental markers and do not, in general, say much about the time depth of the species. Anyway, the molecular dates are highly suspect, not to mention that other genetic systems support more complex processes than the recent Out of Africa model popularized in the media.
  3. Diminution of genetic diversity across Eurasia tells us nothing about how genetic diversity is distributed within Africa itself. Indeed:
  • We would expect the exit of modern humans to occur in East Africa irrespective of whether they originated there or not: decreased genetic diversity with distance from east Africa simply means that Eurasians may have passed through east Africa, not that they originated there.
  • The law-like diminution of genetic diversity from east Africa is questionable
  • The paper that is the subject of this post shows that within Africa east Africans are not the most genetically diverse
The most scathing criticism of the east African, or, indeed, any single origin of mankind comes from multiregional evolution (MRE). This theory, seen in a more favorable light after ancient DNA research's 2-for-2 record of inferring archaic admixture in modern humans, questions the very idea of an "origin" of humans in some small, geographically circumscribed place.

Rather, it proposes that modern humans (Homo sapiens) are genomic blends of components that originated at different times in different places: there was never an "African tribe on the verge of extinction that went on to populate the world", but a single set of interbreeding Homo populations where alleles could (and did) originate anywhere, and:
  1. the greater observed African diversity is due to a higher African effective population size
  2. the reduced overall diversity within our species is not due to a bottleneck but to the culling of variation by natural selection, although not of the classical sweep kind.
The current paper argues against an east African origin of mankind and in favor of a south African one. I can't help but note the irony that north Africa was recently implicated in modern human origins, while just yesterday I posted an abstract from the upcoming AAPA 2011 meeting which rejects a south African origin and favors higher central/east African genetic diversity (at least for the Y-chromosome)!

Things are clearly not simple.

What is the underlying assumption causing so many divergent opinions? I would say it is the phylogeographic axiom that greater diversity implies place of origin. This is made explicit by an author of the paper in the following quote:
Henn admits that migration could certainly be a possibility, but counters that when a population migrates, typically only a subset moves to a new area, and this subset is less genetically diverse than the parent population. She argues that if a group left eastern African for southern Africa it would be expected to have less diversity in the south. "This is not what we find in the data," she says.

True, but this tree-like model of human migration does not really capture the complexity of what happened. A population's diversity increases not only when it is ancient, but also when it is admixed, that is, the product of the coming together of two genetically divergent populations. Indeed, within the paper itself can be found this statement:
Recently Tishkoff et al. (3) suggested a potential origin for modern humans in southern Africa, on the basis of heterozygosity estimates from microsatellite data. However, their sample of KhoeSan was small, and the directionality of a southwestern origin of humans based on heterozygosity could have been driven by the inclusion of a highly admixed (and thus highly heterozygous) “Coloured” population.
But the same could very well be true for the highly diverse KhoeSan population of African hunter-gatherers! Their greater genetic diversity could mean that they are highly admixed, rather than very old.
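The claim that admixture raises diversity can be made concrete for a single biallelic locus. Expected heterozygosity is 2p(1-p); pooling a fraction alpha of one source with 1 - alpha of another gives a population whose heterozygosity exceeds the weighted average of the sources by exactly 2·alpha·(1-alpha)·(p1-p2)², so the excess grows with how divergent the sources are. The allele frequencies below are hypothetical, and this single-locus sketch ignores drift after the admixture event.

```python
def expected_het(p):
    """Expected heterozygosity 2p(1-p) at a biallelic locus."""
    return 2 * p * (1 - p)


def admixed_het(p1, p2, alpha):
    """Expected heterozygosity of a gene pool drawn in proportions
    alpha : 1 - alpha from two sources with allele frequencies p1, p2."""
    return expected_het(alpha * p1 + (1 - alpha) * p2)


def admixture_excess(p1, p2, alpha):
    """Gain over the weighted source average: 2*alpha*(1-alpha)*(p1-p2)**2."""
    return 2 * alpha * (1 - alpha) * (p1 - p2) ** 2


# Two divergent hypothetical sources mixed half and half.
p1, p2, alpha = 0.1, 0.9, 0.5
sources = alpha * expected_het(p1) + (1 - alpha) * expected_het(p2)
print(sources, admixed_het(p1, p2, alpha))
```

Here each source has heterozygosity 0.18 while the mixture reaches 0.5, with no extra time depth involved at all.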

There is, however, an additional argument, based on patterns of linkage disequilibrium (LD). Admixed individuals typically have high LD, as they inherit whole blocks of DNA from one or the other of the source populations. LD decays over time, so it is a recently admixed population that is expected to have high LD. It may very well be, then, that the very low LD observed in the KhoeSan is compatible with their status as an admixed population, if the admixture event took place long ago.
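The decay argument follows from two textbook results, assuming a single pulse of admixture and random mating afterwards. Mixing two sources in proportions alpha : 1 - alpha creates linkage disequilibrium D0 = alpha·(1-alpha)·δA·δB between two loci, where δA and δB are the allele-frequency differences between the sources; each later generation multiplies D by (1 - c), where c is the recombination fraction between the loci. The function names and numbers below are hypothetical illustrations.

```python
def initial_admixture_ld(alpha, delta_a, delta_b):
    """D created by mixing two sources (no LD within either source), where
    delta_a, delta_b are the between-source allele-frequency differences."""
    return alpha * (1 - alpha) * delta_a * delta_b


def ld_after(d0, c, generations):
    """Expected D after `generations` of random mating, recombination fraction c."""
    return d0 * (1 - c) ** generations


# Maximal admixture LD: 50/50 mix of sources fixed for different alleles.
d0 = initial_admixture_ld(0.5, 1.0, 1.0)
for t in (0, 10, 100, 1000):
    print(t, ld_after(d0, 0.01, t))
```

For loci 1 cM apart (c = 0.01), D falls only slightly after 10 generations, to roughly a third of its starting value after 100, and to essentially nothing after 1,000, so an old admixture event can leave little admixture LD behind.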

Readers of the blog know that this is precisely what I have proposed: that African hunter-gatherers are to a large extent the product of old admixture between "modern" humans and archaic Africans, just as Eurasians may be the product of old admixture between "modern" humans and archaic Eurasians, such as the Neandertals or Denisovans.

Old admixture implies non-tree-like evolution and invalidates the aforementioned phylogeographic axiom. It implies that places of high diversity may be due to admixture, and not to antiquity.

In any case, while I don't believe that this paper proves the south African origin of mankind, it is an extremely important contribution to the sampling of African genomic diversity. Hopefully its data will be useful in the future, and I have no doubt that I will find some use for the ~55K SNPs from several populations in my Dodecad Project.

PNAS doi: 10.1073/pnas.1017511108

Hunter-gatherer genomic diversity suggests a southern African origin for modern humans

Brenna M. Henn et al.

Africa is inferred to be the continent of origin for all modern human populations, but the details of human prehistory and evolution in Africa remain largely obscure owing to the complex histories of hundreds of distinct populations. We present data for more than 580,000 SNPs for several hunter-gatherer populations: the Hadza and Sandawe of Tanzania, and the ≠Khomani Bushmen of South Africa, including speakers of the nearly extinct N|u language. We find that African hunter-gatherer populations today remain highly differentiated, encompassing major components of variation that are not found in other African populations. Hunter-gatherer populations also tend to have the lowest levels of genome-wide linkage disequilibrium among 27 African populations. We analyzed geographic patterns of linkage disequilibrium and population differentiation, as measured by FST, in Africa. The observed patterns are consistent with an origin of modern humans in southern Africa rather than eastern Africa, as is generally assumed. Additionally, genetic variation in African hunter-gatherer populations has been significantly affected by interaction with farmers and herders over the past 5,000 y, through both severe population bottlenecks and sex-biased migration. However, African hunter-gatherer populations continue to maintain the highest levels of genetic diversity in the world.


March 07, 2011

Origin of Life from Outer Space?

Overcoming Bias links to a paper and commentary about the possible discovery of cyanobacteria-like fossils in three meteorites. The paper, by Richard B. Hoover, Ph.D., of NASA/Marshall Space Flight Center, argues that the discovered structures do not appear to be recent terrestrial contaminants.

I won't comment on the validity of the case, but I will give my €0.02 on the "Origin of Life" question, on which there are two main theories:
  • The view, popularized by Carl Sagan in Cosmos and also found in The Selfish Gene, that life began with a single replicator molecule formed by chemical accident on the primordial Earth, and that this molecule eventually led (via evolution) to all extant life on Earth.
  • The panspermia idea of Hoyle and Wickramasinghe: that life is common in the universe, that it can travel through space in objects such as meteoroids, and that it arrived on Earth from space.
As a non-expert, I see that both ideas have their advantages:
  • The Earthly Origin idea is simple, as we know that life exists on Earth, so why posit its origin from elsewhere?
  • The Panspermia hypothesis is also simple in a Copernican principle sort of way: the universe is large, so why posit that life originated here just because it is found here?
How can we weigh the relative advantages of the two hypotheses? The Earthly Origin hypothesis assumes that:
  1. The formation and survival of the first replicator on Earth is not too improbable, and could have happened within the less than a billion years between the formation of the Earth and the earliest attestation of life.
  2. The transport and survival of a replicator through space is improbable.
If we think about it:
  • the probability that a replicator will emerge in a given volume of space increases with that volume and must approach unity for the entire universe (because life does exist)
  • The probability that it will reach Earth from a certain distance decreases with that distance.

So it is a matter of weighing the probabilities: a replicator may spontaneously form throughout the Universe, but how likely is it to spread from its point of origin across space and reach us?

The various commentaries on the paper make some interesting points, and I will add some of my own:
  1. If we could date the origin of earthly life (using some type of molecular clock) and find it to be older than the geological age of the Earth, that would favor panspermia. A commentator states that the origin of life actually goes back twice the age of the Earth, but I am too skeptical of molecular clocks across such time scales to put much faith in that claim, especially as the evidence offered consists of one study with no math and one still under review.
  2. If we found extra-terrestrial life that would also favor panspermia, provided we exclude the possibility of contamination; Hoover's paper presents evidence that this is the case.
  3. On the other hand, ancient-looking extra-terrestrial life fossils falling out of the sky in the 19th century could in fact be due to the nostos (homecoming) of bacteria ejected from the Earth during its formative period billions of years ago.
  4. If life isn't found elsewhere in the solar system, then the argument for panspermia would be weakened, because presumably Earth is not the only world that could have been hit by life-bearing meteorites, nor the only world where life might flourish. So far, no extra-terrestrial life has been found in the solar system.
  5. If we could show the spontaneous generation of a replicator in the lab, that would be evidence for Earthly Origin; of course, no lab is large enough and no experiment long-running enough to statistically exclude this possibility.
  6. We could also demonstrate the perseverance of life experimentally, by, e.g., sending a probe or commandeering an asteroid and setting it on a highly elliptical trajectory that would bring it back to Earth's vicinity in a century or so: seed it with bacteria and see what comes back on its next pass near Earth. Again, this might show that the transport of life is possible across "short" distances, but not that it is possible across interstellar space.
I always carry a small basket when it comes to alien life claims from NASA-affiliated scientists, but it's certainly interesting to think about the Origin of Life, regardless of the fate of this particular paper (which I'm sure will be torn to pieces by skeptics, if it hasn't been already by the time you read this).

My personal opinion is that panspermia would be cool if true, but Fermi's Paradox is a bit difficult to reconcile with a life-filled universe, unless we accept that life is common but EM-transmitting intelligence is not.

PS: The fact that the Journal of Cosmology seems to be a web-based effort certainly makes me a tiny bit suspicious, and a 10-minute perusal of its contents does suggest that it publishes "unconventional" content. On the other hand, I like this statement from its editor, who seems legit:
Official Statement from Dr. Rudy Schild,
Center for Astrophysics, Harvard-Smithsonian,
Editor-in-Chief, Journal of Cosmology.

Dr. Richard Hoover is a highly respected scientist and astrobiologist with a prestigious record of accomplishment at NASA. Given the controversial nature of his discovery, we have invited 100 experts and have issued a general invitation to over 5000 scientists from the scientific community to review the paper and to offer their critical analysis. Our intention is to publish the commentaries, both pro and con, alongside Dr. Hoover's paper. In this way, the paper will have received a thorough vetting, and all points of view can be presented. No other paper in the history of science has undergone such a thorough analysis, and no other scientific journal in the history of science has made such a profoundly important paper available to the scientific community, for comment, before it is published. We believe the best way to advance science, is to promote debate and discussion.
That certainly agrees with my general philosophy on peer review. Whatever the fate of the paper, putting it out there for criticism by all is a praiseworthy attitude and certainly a better guarantee of scrutiny than a handful of Nature or Science reviewers.

March 06, 2011

AAPA 2011 abstracts

A draft of the abstracts from the 80th meeting of the American Association of Physical Anthropologists is online. Some titles of interest:

Cristian Capelli et al.
Early Y chromosome lineages in Africa: the origin and dispersal of Homo sapiens.
The study of Y chromosome variation in extant populations has provided significant insights into the genetic history of Homo sapiens. Focusing on sub-Saharan Africa, demographic events associated with the spread of languages, agriculture and pastoralism have been targeted but little is known about the early history of the continent. The first two branches of the Y chromosome genealogy, namely haplogroup A and B, are African specific, with average continental frequencies of 14-34%, reaching up to 65% in groups of foragers. Despite the potential of such lineages in revealing signatures of the ancient peopling of the continent, an exhaustive investigation of their distribution and variation is currently missing. Here we show that their systematic dissection provides novel insights into the early history of our species. We highlighted complex patterns of population dynamics among hunter-gatherer communities, evidence for the peopling of western and southern Africa, and showed the retention of the very early human Y chromosome lineages in eastern and central but not southern Africa. These results open new perspectives on the early African history of Homo sapiens, with particular attention to areas of the continent where human fossil remains and archaeological data are scanty.

Aslihan Sen et al.
The genetic history of the Karachays: Insights from mtDNA and Y-chromosome evidence
The Karachay-Malkar population of the northwestern Caucasus Mountains has an interesting but unclear history. Oral traditions indicate that they descended from the Alans, ancient Iranian tribes who entered the region starting in the 1st century BC. However, they now speak a Kipchak Turkic language, which was purportedly brought to the Caucasus by the Kumans from the Minusinsk Basin (Yenisei River-Altai Mountains). They are also allegedly related to the Hun-Bulgars, with the name Malkar/Balkar being evidence for this affiliation. Therefore, to elucidate their genetic past, we characterized genetic variation in 106 Karachay individuals using a combination of HVS1/HVS2 sequencing and SNP analysis for mtDNAs and SNP and STR analysis for Y-chromosomes. We observed a predominance of mtDNA haplogroups H and U in this population, along with a minority of East Eurasian lineages, and mostly Y-chromosome haplogroups G, I, J and R1. The mtDNA data suggest that the Karachay are most similar to the Adygei, among Caucasus populations, and have affinities with eastern Iranians, supporting the hypothesized link to Scythio-Iranians (Alans), although they are quite distant from Turkic-speaking indigenous Altaians. By contrast, Y-chromosome data point to genetic links with populations from Anatolia, the Near East and the Balkans, as well as the Volga-Ural region, Central Asia and Siberia, the source area for ancient Turkic populations. Using these data and associated genealogical and linguistic evidence, we attempt to reconstruct the history of the Karachay population and assess its genetic relationships to the diverse ethnolinguistic groups of the Caucasus.

Jasem Theyab et al.
The genetic structure of the Kuwaiti population: mitochondrial DNA markers.
In the past few decades, researchers using human mitochondrial DNA (mtDNA) have significantly contributed to our understanding of human evolution and migration. However, little attention has been paid to the Arabian Peninsula, which is assumed to be one of the first inhabited regions following the expansion of early Homo sapiens out of Africa. Recently, a number of investigations have started to reconstruct human expansion through archaeology and the study of the genetic structure of populations of the Arabian Peninsula. Populations of Kuwait, located in the Northeast portion of the Arabian Peninsula, have not been studied from a molecular genetic perspective. This research investigated the mitochondrial DNA (mtDNA) genetic variation in 117 unrelated individuals to determine the genetic structure of the Kuwaiti population and compared the Kuwaiti population to their neighboring populations. Restriction fragment length polymorphism (RFLP) and mtDNA sequencing analyses were used to reconstruct the genetic structure of Kuwait. The results showed that the Kuwaiti population has a high frequency of haplogroup pre-HV (18%) and U (12%), similar to other Arabian populations. In addition, the African influence was detected through the presence of haplogroup L (1.6%). Furthermore, the MDS plot showed that the Kuwaiti population is clustered with neighboring populations, including Iran and Saudi Arabia, but not Iraq.

Kristin L. Young et al.
Paternal genetic history of the Basque population of Spain.
This study examines the genetic variation in Basque Y chromosome lineages using data on 12 Y-STR loci in a sample of 158 males from four Basque provinces of Spain. In agreement with previous studies, the Basques are characterized by high frequencies of haplogroup R1b (83%). Five additional haplogroups were identified in this sample: E1b1b (6%), J2a (3%), I2 (3%), G2a (2%), and L (1%). Only 8% of haplotypes were found in more than one province, and the AMOVA analysis shows only a small amount of variation (1.71%, p = 0.0369) is accounted for between provinces, demonstrating the overall homogeneity of this population. Gene and haplotype diversity levels in the Basques are on the low end of the European distribution (gene diversity: 0.4268; haplotype diversity: 0.9421). Other isolated populations in Europe, including the Swedish Saami, the Roma in Portugal, and Albanians in Kosovo, also exhibit low haplotype diversity levels. Comparison of the Garza-Williamson Index for the Basques and 36 additional European populations shows no significant impact of a recent genetic bottleneck on the continent. A bootstrapped neighbor-joining tree (R² = 0.922) of Shriver’s genetic distances (DSW) clusters Basque populations with other Atlantic Fringe groups (Galicia, Ireland) and the non-Indo-European Saami. Paleolithic and Neolithic contribution to the paternal Basque gene pool was estimated by measuring the proportion of proposed Paleolithic (R1b, I2a2) and Neolithic haplogroups (E1b1b, G2a, J2a). The Basque provinces show varying degrees of post-Neolithic contribution in the paternal lineages, with 10.9% Neolithic lineages in the combined sample.
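The gene and haplotype diversity figures quoted in the Basque abstract are commonly computed with Nei's unbiased estimator. The abstract does not say which estimator was used, so what follows is only a minimal sketch under that assumption, run on made-up toy data. It also shows why the two numbers can differ so much: haplotype diversity treats each full multi-locus haplotype as one unit, while gene diversity averages the per-locus values.

```python
from collections import Counter
from collections.abc import Hashable, Sequence


def nei_diversity(alleles: Sequence[Hashable]) -> float:
    """Nei's unbiased diversity: n/(n-1) * (1 - sum of squared frequencies).

    Applied to whole haplotypes this gives haplotype diversity; applied to
    the alleles at a single locus it gives that locus's gene diversity.
    """
    n = len(alleles)
    if n < 2:
        raise ValueError("need at least two sampled chromosomes")
    counts = Counter(alleles)
    sum_p2 = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_p2)


def mean_gene_diversity(haplotypes: Sequence[Sequence[int]]) -> float:
    """Average of the per-locus Nei diversity across all STR loci."""
    n_loci = len(haplotypes[0])
    per_locus = [nei_diversity([h[i] for h in haplotypes]) for i in range(n_loci)]
    return sum(per_locus) / n_loci


if __name__ == "__main__":
    # Hypothetical toy data: four men typed at three Y-STR loci (repeat counts).
    sample = [
        (14, 24, 11),
        (14, 24, 11),
        (14, 23, 11),
        (13, 24, 11),
    ]
    print(f"haplotype diversity: {nei_diversity(sample):.4f}")
    print(f"mean gene diversity: {mean_gene_diversity(sample):.4f}")
```

In the toy sample, three distinct haplotypes among four men give a haplotype diversity of about 0.83, while the average per-locus gene diversity is only about 0.33 because one locus is monomorphic. A similar effect, on a larger scale, plausibly accounts for the gap between 0.9421 and 0.4268 in the Basque figures.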

Timothy D. Weaver
Did a short-term event in the Middle Pleistocene give rise to modern humans?

It is often stated that modern humans originated 250,000-150,000 years ago. This statement implies, at least implicitly, that something “special” happened at this point in the Middle Pleistocene, such as a speciation event that was perhaps triggered by, or resulted in, a bottleneck in human population size. Two pieces of evidence are usually said to support this contention: that living human mitochondrial DNA haplotypes coalesce ~200,000 years ago, and that fossil specimens classified as anatomically modern humans begin to appear shortly afterward. Alternatively, modern human origins could have been a lengthy process that lasted from the divergence of the modern human and Neandertal evolutionary lineages ~400,000 years ago to the expansion of modern humans out of Africa ~50,000 years ago, and nothing particularly “special” happened 250,000-150,000 years ago. Because this alternative model does not posit a discrete origins event, it may be better able to explain why >50,000-year-old fossils are arguably only “near modern” in anatomy. Here I use computer simulations based on theory from population and quantitative genetics to show that the alternative lengthy-process model also is consistent with a ~200,000-year-old mitochondrial DNA coalescence time and the appearance shortly afterward of fossil specimens that, at least for some traits, appear to be anatomically modern. I further discuss how these two models differ in their predictions and whether or not it is possible to distinguish between them with current fossil and genetic evidence.

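Weaver's point about the ~200,000-year mitochondrial coalescence can be illustrated with the textbook neutral coalescent: even a population of constant size, with no bottleneck or speciation event, has a most recent common ancestor at some finite depth. The sketch below is not Weaver's model (his simulations also draw on quantitative genetics); it simulates only the standard Kingman coalescent for a haploid locus such as mtDNA, and the sample size, female effective size, and 25-year generation time are hypothetical values chosen for illustration.

```python
import random


def simulate_tmrca(n: int, pop_size: float, rng: random.Random) -> float:
    """One draw of the time (generations) to the most recent common ancestor
    of n haploid lineages under the standard Kingman coalescent.

    While k lineages remain, the waiting time to the next merger is
    exponential with rate C(k, 2) / pop_size, where pop_size is the haploid
    effective size (for mtDNA, roughly the number of breeding females).
    """
    t = 0.0
    for k in range(n, 1, -1):
        rate = k * (k - 1) / 2.0 / pop_size
        t += rng.expovariate(rate)
    return t


def expected_tmrca(n: int, pop_size: float) -> float:
    """Analytical expectation: 2 * N * (1 - 1/n) generations."""
    return 2.0 * pop_size * (1.0 - 1.0 / n)


if __name__ == "__main__":
    rng = random.Random(42)
    n_samples = 50    # hypothetical number of sampled mtDNA lineages
    female_ne = 5_000 # hypothetical constant female effective size
    gen_years = 25    # assumed generation time in years
    draws = sorted(simulate_tmrca(n_samples, female_ne, rng) for _ in range(2_000))
    mean_years = sum(draws) / len(draws) * gen_years
    lo_years = draws[int(0.05 * len(draws))] * gen_years
    hi_years = draws[int(0.95 * len(draws))] * gen_years
    print(f"analytical mean TMRCA: {expected_tmrca(n_samples, female_ne) * gen_years:,.0f} years")
    print(f"simulated mean TMRCA:  {mean_years:,.0f} years")
    print(f"90% of replicates between {lo_years:,.0f} and {hi_years:,.0f} years")
```

With these made-up values the analytical expectation is 2 × 5,000 × (1 − 1/50) = 9,800 generations, about 245,000 years, and the simulated replicates scatter widely around it. The point is only that a coalescence date of this order arises without any special event; it is not an estimate of ancient human population size.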
Steven L. Wang
Regional isolation and extinction? The story of mid-Pleistocene hominins in Asia.

Over the past decade, numerous reviews of the Middle Pleistocene record have taken place in light of new fossil discoveries. However, with primary foci on the Euro-African records, much of the rich fossil evidence in Asia was sidelined and overlooked. It is thus unsurprising that in the minds of many, Asia remains terra incognita, and its hominin record exotic. Moreover, the accuracy of the Asian chronology remains problematic, adding another layer of impediment to our understanding of regional evolution and local adaptation. In this context, I bring a synergistic review of the chronology of mid-Pleistocene hominins from East and South Asia, including recent new dates from key sites such as Zhoukoudian Locality 1 and Hathnora. Using 3-D geometric morphometric data, I examine cranial shape changes between H. erectus and mPH (post-erectus, non-Neandertal mid-Pleistocene Homo), as well as both to later Pleistocene hominins. A large number of not-often-discussed specimens are considered (e.g., Hexian, Nanjing 1, Maba, and Ngawi), many of them original fossils. The cranial anatomy from the Asian mid-Pleistocene suggests the existence of at least two distinctive groups in the region. Additionally, a north-south (geographical) shape difference is observed, hinting at the presence of paleodemes each evolving in relative isolation. The shape affinity of mPH to extra-Asian fossils is confirmed; however, depending on the fossil in question (Dali or Narmada), the said affinity to Kabwe and Petralona is exclusive. This, coupled with a limited number of good samples, warrants caution against lumping all Asian mPH within the H. heidelbergensis hypodigm.

John Hawks
Deep genealogy, Neandertal ancestors, and our accelerating evolution
Anthropologists have long confused genealogical and behavioral definitions of humanity. At least five out of six living humans have Neandertal ancestors, which comprise an estimated 1 to 4% of their ancestry. Human genes have divergent genealogical histories, representing multiple "archaic" populations inside and outside of Africa. Late Pleistocene populations show comparable technical and symbolic abilities within and outside of Africa. A humanlike vocal-auditory channel had appeared before 600,000 years ago. Yet humans of the last 40,000 years have evolved extremely rapidly, in some instances diversifying; in others paralleling each other. Using new visualization methods, I examine the genealogical patterns of human genes. The impact of our rapid Holocene evolution simplifies some genealogical relationships while partially obscuring earlier ones. The genetic echoes of Neandertals and other archaic populations emerge against a slim network binding all living people. These networks show the impact of adaptive potential in ancient human populations. A broad view of human cultural and technical records suggests that gene-culture interaction may be a fundamental aspect of Pleistocene human evolution.