August 30, 2010

When inclusive fitness is right and when it can be wrong (van Veelen 2010)

I've been doing a little reading to put the recent Nowak et al. paper in context, and I came across another interesting recent critique of kin selection and its limitations. This is a fascinating area of study, and the fact that theoretical arguments have continued so long since the introduction of the inclusive fitness concept tells me that it is perplexing even for experts.

In the interest of clarity, here is Figure 1 from the paper, which gives a nice graphical representation of the cost-benefit relationship.

In this figure: "c the net cost of the behaviour to the acting individual, and b the aggregated benefits to the others."
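
As a rough aid to reading the cost-benefit picture, here is a minimal sketch of the textbook inclusive fitness condition (Hamilton's rule, r*b > c). The relatedness coefficient r, the function names, and the example numbers are my own additions for illustration and are not taken from the paper, whose point is that this kind of additive bookkeeping is only guaranteed to work for some classes of models.

```python
def inclusive_fitness_effect(r: float, b: float, c: float) -> float:
    """Return r*b - c, the inclusive fitness effect under Hamilton's rule.

    r: assumed relatedness between actor and recipients (not from the paper)
    b: aggregated benefits to the others
    c: net cost of the behaviour to the acting individual
    """
    return r * b - c


def favoured(r: float, b: float, c: float) -> bool:
    """True when the simple rule r*b > c predicts that the behaviour spreads."""
    return inclusive_fitness_effect(r, b, c) > 0


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to show both outcomes.
    print(favoured(r=0.5, b=3.0, c=1.0))   # 0.5*3 - 1 is positive
    print(favoured(r=0.25, b=3.0, c=1.0))  # 0.25*3 - 1 is negative
```

The sketch covers only the linear case; models with synergies, which the abstract singles out, would need extra terms that a single r*b - c expression does not capture.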

I'll keep on thinking about both papers before I write anything more on the subject.

Journal of Theoretical Biology
Volume 259, Issue 3, 7 August 2009, Pages 589-600

Group selection, kin selection, altruism and cooperation: When inclusive fitness is right and when it can be wrong

Matthijs van Veelen

Group selection theory has a history of controversy. After a period of being in disrepute, models of group selection have regained some ground, but not without a renewed debate over their importance as a theoretical tool. In this paper I offer a simple framework for models of the evolution of altruism and cooperation that allows us to see how and to what extent both a classification with and one without group selection terminology are insightful ways of looking at the same models. Apart from this dualistic view, this paper contains a result that states that inclusive fitness correctly predicts the direction of selection for one class of models, represented by linear public goods games. Equally important is that this result has a flip side: there is a more general, but still very realistic class of models, including models with synergies, for which it is not possible to summarize their predictions on the basis of an evaluation of inclusive fitness.


August 28, 2010

Y-chromosome haplogroup I and heart disease

There was a recent study on AIDS progression and Y-chromosome haplogroups. There is a new one on haplogroup I and coronary heart disease. I haven't tracked down the article yet, but here is the press release from the European Society of Cardiology:
Scientists in the UK have shown that genetic variations in the Y chromosome affect a male’s risk of coronary heart disease. It is well known that males have a higher incidence of coronary heart disease than females due, in part, to the Y chromosome they inherit from their fathers. To investigate the role of the Y chromosome further, a team from the University of Leicester carried out research to determine whether genetic variations in the Y-chromosome affect risk for males.

Not all Y chromosomes are the same. There are variants within the male gender called “Y-haplogroups”, which are usually associated with specific geographic regions and tend to indicate the origin of the ancestral line. Professor Nilesh Samani explains the background to the project that was funded by the British Heart Foundation, “We set out to determine if men with differing types of Y chromosome were at differing risk of heart disease. We tested nearly 3,000 British males, and found that those carrying the I-haplogroup variant had a 55 percent higher risk of coronary heart disease.”

Of the 3,000 men tested, 1,295 were the cohort group of those with coronary heart disease and the rest were the control group. The Y-haplogroup was identified in all men, and the results showed that those in the I-haplogroup had an approximately 55 percent higher risk of coronary heart disease compared to the others. The association of the I-haplogroup with coronary heart disease was independent of, and not explained by, traditional heart risk factors such as cholesterol, high blood pressure and smoking.

Commonly found in central, eastern and northern Europe, the I-haplogroup is carried by about 13 percent of British men. Its origin is thought to be of the Gravettian culture, which arrived in Europe from the Middle East about 25,000 years ago. Since the I-haplogroup is not so prevalent in southern parts of Europe, an interesting speculation is whether it contributes to the higher levels of coronary heart disease in the north compared to the south – however, this requires further research and testing.

What is clear from this study though, is that men carrying the I-haplogroup are more likely to suffer from coronary heart disease than men with other Y-haplogroups.

August 26, 2010

Analysis of Ashkenazi Jewish genomes (Bray et al. 2010)

The paper hasn't gone live at the PNAS site as of this writing, but here is part of the press release. The abstract and my comments on the paper will be posted here (after I get through the zillion other interesting papers that the last week of August seems to have brought us):
Investigators in the laboratory of Stephen Warren, PhD, chairman of human genetics at Emory University School of Medicine, used DNA microarray technology to read variant sites across the entire genomes of 471 Ashkenazi Jews. The work comes from a collaboration between Warren and Ann Pulver, ScD, associate professor of psychiatry and behavioral sciences at Johns Hopkins University School of Medicine, who recruited the participants for a study of schizophrenia genetics.

Researchers looked for close to one million single nucleotide polymorphisms (SNPs): common alternative spellings in the genome, analogous to American and British spellings of words such as organize/organise. One measure of genetic diversity in a population is heterozygosity, or how many of the SNPs inherited from the mother and father are different; a more inbred population has less heterozygosity.

"We were surprised to find evidence that Ashkenazi Jews have higher heterozygosity than Europeans, contradicting the widely-held presumption that they have been a largely isolated group," says first author Steven Bray, PhD, a postdoctoral fellow in Warren's laboratory.


High linkage disequilibrium can come either from an isolated population (for example, an island whose residents are all descendents of shipwreck survivors) or the relatively recent mixture of separate populations. Bray and his colleagues did find evidence of elevated linkage disequilibrium in the Ashkenazi Jewish population, but were able to show that this matches signs of interbreeding or "admixture" between Middle Eastern and European populations.

The researchers were able to estimate that between 35 and 55 percent of the modern Ashkenazi genome comes from European descent.

"Our study represents the largest cohort of Ashkenazi Jews examined to date with such a high density of genetic markers, and our estimate of admixture is considerably higher than previous estimates that used the Y chromosome to calculate European admixture at between five and 23 percent," Bray says.


"Only six of the 21 disease genes that we examined showed evidence of selection," Bray says. "This supports the argument that most of the Ashkenazi-prevalent diseases are not generally being selected for, but instead are likely a result of a genetic bottleneck effect, followed by random drift."
The new paper comes on the heels of two other papers by Behar et al. and Atzmon et al., which considered Jews in general and discovered additional clusters of Jews that were distinct from Ashkenazi Jews. As I have argued in my review of these papers, the different clusters are not the result of isolation, as the different groups of Jews deviate not only from each other, but also in the direction of their host populations. It would be worthwhile to perform similar admixture analyses on non-AJ populations to determine how much influence they have received from their host populations. With a little effort it would be possible to reconstruct the ancestral Jewish population by identifying what is common to the different Jewish populations.

UPDATE: The paper is now online and is open access.

From the paper:
The fixation index, FST, calculated concurrently to the PCA, confirms that there is a closer relationship between the AJ and several European populations (Tuscans, Italians, and French) than between the AJ and Middle Eastern populations (Fig. S2B). This finding can be visualized with a phylogenetic tree built using the FST data (Fig. S2C), showing that the AJ population branches with the Europeans and not Middle Easterners. Two recent studies performing PCA and population clustering with high-density SNP genotyping from many Jewish Diaspora populations, both showed that of the Jewish populations, the Ashkenazi consistently cluster closest to Europeans (13, 25). Genetic distances calculated by both groups also show that the Ashkenazi are more closely related to some host Europeans than to the ancestral Levant (13, 25). Although the proximity of the AJ and Italian populations could be explained by their admixture prior to the Ashkenazi settlement in Central Europe (13), it should be noted that different demographic models may potentially yield similar principal component projections (33); thus, it is also consistent that the projection of the AJ populations is primarily the outcome of admixture with Central and Eastern European hosts that coincidentally shift them closer to Italians along principle component axes relative to Middle Easterners. Taken as a whole, our results, along with those from previous studies, support the model of a Middle Eastern origin of the AJ population followed by subsequent admixture with host Europeans or populations more similar to Europeans. Our data further imply that modern Ashkenazi Jews are perhaps even more similar with Europeans than Middle Easterners.
The bolded part reminds me of what I wrote in my review of Atzmon et al. regarding the choice of parental populations and how it affects admixture estimates. The "Middle Eastern" component estimate will increase if central and eastern Europeans are used as representative of the European admixture, while the "European" estimate will increase if Italians are used. But the same applies to the other end of the continuum: even if ancestral Jews were indeed like the ancestors of current Middle Easterners such as the Druze or Palestinians, the latter may have moved (in genetic space) away from ancient Levantines due to subsequent admixture (Arabs, and in the case of Palestinians even Africans), and this would reduce the inferred Middle Eastern component.

Estimating admixture percentages in the absence of clear knowledge about parental populations is no easy thing, but the intermediate-leaning-on-Europe status of AJ relative to living Europeans and living Middle Easterners seems to be a pretty secure conclusion.
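
To make the dependence on parental populations concrete, here is a minimal sketch under a deliberately simple assumption: at each marker the admixed population's allele frequency is a linear mix of two source populations, and the mixing share is recovered by least squares. This is not the method used by Bray et al.; the function name and all allele frequencies below are hypothetical.

```python
def admixture_fraction(mixed, source_a, source_b):
    """Least-squares share of source_a in `mixed`, assuming per marker
    mixed[i] = alpha * source_a[i] + (1 - alpha) * source_b[i]."""
    num = sum((m - b) * (a - b) for m, a, b in zip(mixed, source_a, source_b))
    den = sum((a - b) ** 2 for a, b in zip(source_a, source_b))
    if den == 0:
        raise ValueError("source populations have identical frequencies")
    return num / den


# Hypothetical allele frequencies at four markers.
middle_east = [0.10, 0.20, 0.30, 0.40]
distant_europe = [0.50, 0.60, 0.10, 0.80]
# A second European proxy lying halfway between the two populations above.
near_europe = [(d + m) / 2 for d, m in zip(distant_europe, middle_east)]
# Admixed population built with a true 40% share from the distant source.
mixed = [0.4 * d + 0.6 * m for d, m in zip(distant_europe, middle_east)]

if __name__ == "__main__":
    print(admixture_fraction(mixed, distant_europe, middle_east))
    print(admixture_fraction(mixed, near_europe, middle_east))
```

With these toy numbers the same admixed population comes out as roughly 40% "European" against the distant proxy and roughly 80% against the nearer one, which is the effect described above.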

PNAS doi: 10.1073/pnas.1004381107

Signatures of founder effects, admixture, and selection in the Ashkenazi Jewish population

Steven M. Bray et al.

The Ashkenazi Jewish (AJ) population has long been viewed as a genetic isolate, yet it is still unclear how population bottlenecks, admixture, or positive selection contribute to its genetic structure. Here we analyzed a large AJ cohort and found higher linkage disequilibrium (LD) and identity-by-descent relative to Europeans, as expected for an isolate. However, paradoxically we also found higher genetic diversity, a sign of an older or more admixed population but not of a long-term isolate. Recent reports have reaffirmed that the AJ population has a common Middle Eastern origin with other Jewish Diaspora populations, but also suggest that the AJ population, compared with other Jews, has had the most European admixture. Our analysis indeed revealed higher European admixture than predicted from previous Y-chromosome analyses. Moreover, we also show that admixture directly correlates with high LD, suggesting that admixture has increased both genetic diversity and LD in the AJ population. Additionally, we applied extended haplotype tests to determine whether positive selection can account for the level of AJ-prevalent diseases. We identified genomic regions under selection that account for lactose and alcohol tolerance, and although we found evidence for positive selection at some AJ-prevalent disease loci, the higher incidence of the majority of these diseases is likely the result of genetic drift following a bottleneck. Thus, the AJ population shows evidence of past founding events; however, admixture and selection have also strongly influenced its current genetic makeup.


More structure in haplogroup R1b (Cruciani et al. 2010)

The stack of exciting papers to read keeps growing. Good things come in pairs for R1b folk.

Forensic Science International: Genetics doi:10.1016/j.fsigen.2010.07.006

Strong intra- and inter-continental differentiation revealed by Y chromosome SNPs M269, U106 and U152

Fulvio Cruciani et al.

More than 2700 unrelated individuals from Europe, northern Africa and western Asia were analyzed for the marker M269, which defines the Y chromosome haplogroup R1b1b2. A total of 593 subjects belonging to this haplogroup were identified and further analyzed for two SNPs, U106 and U152, which define haplogroups R1b1b2g and R1b1b2h, respectively. These haplogroups showed quite different frequency distribution patterns within Europe, with frequency peaks in northern Europe (R1b1b2g) and northern Italy/France (R1b1b2h).


August 25, 2010

Kin selection dead?

I recently blogged about E.O. Wilson's change of mind about group selection, where it was hinted that he no longer thought that Hamilton's kin selection was a major evolutionary force. Now, a new paper has been published in Nature by Nowak, Tarnita, and Wilson which formalizes the argument.

An accompanying story in Nature gives a high-level overview of the paper:
Kin selection is based on 'inclusive fitness', the idea that, for example, sterile workers can accrue reproductive benefits by helping their relatives. In doing so, they help shared genes to survive and get passed on to the next generation. This provides a route for eusociality to evolve.

But Martin Nowak, a mathematical biologist at Harvard University in Cambridge, Massachusetts, and the lead author of the analysis, says, "there is no need for inclusive fitness to explain eusociality".

Nowak and his team provide the first mathematical analysis of inclusive fitness theory. They calculated which of two behaviours, for example defection — such as going off to set up a separate colony — or cooperation would become more prevalent in a population if standard natural selection was at work. They then worked out what assumptions would be needed for inclusive fitness theory to deliver the same result.

The team discovered that inclusive fitness delivers the same result only in a limited set of specific situations that would rarely hold in reality. For example, inclusive fitness worked only if the two behaviours were very similar — so that the pressure to select one over the other is vanishingly small — and if just two individuals were interacting at one time.

And when the inclusive fitness theory worked, the answer that it provided was mathematically equivalent to that derived from standard natural selection.

"We show that inclusive fitness is not a general theory of evolution as its proponents had claimed," says Nowak. "In the limited domain where inclusive fitness theory does work, it is identical to standard natural selection. Hence there is no need for inclusive fitness. It has no explanatory power."

In a second mathematical analysis, the team investigated how eusociality could evolve through standard natural selection. They found that a gene for eusociality could spread readily as long as the advantages it confers — increasing the lifespan and reproductive success of the queen — kick in even for small colonies. So colonies that have as few as two or three workers must provide significant advantages to their queen for the gene and the behaviour to become widespread.

"Whether or not eusociality evolves depends on how colony size affects the mortality and fecundity of the queen," says Nowak. "Our model also shows that eusociality is hard to evolve but is very stable once it is established."
Going against four decades of theory is no joke, so I will have to think about the significance of this new paper.

UPDATE (Aug 26):

From the press release:
Eusociality is rare, but important in evolutionary biology because the few species that adhere to it -- including social insects and, to an extent, humans -- rank among the planet's most dominant. The biomass of ants alone composes more than half that of all insects, exceeding that of all terrestrial nonhuman vertebrates combined. Humans, who are more loosely eusocial, dominate land vertebrates. "Eusociality has arisen independently some 10 to 20 times in the course of evolution," says Tarnita, a junior fellow in Harvard's Society of Fellows. "Our model shows that it is difficult to get eusociality in the first place, but that it is very stable once it is established. A colony behaves like a 'superorganism,' reproducing the genome of the queen and the sperm she has stored."
Nowak, Tarnita, and Wilson's proposal on eusocial evolution sketches out three distinct steps species can take to sidestep eusociality's evolutionary cost:
  • First, species must form groups within a population, such as when nests or food attract individuals to discrete locations some distance apart, when parents and offspring remain together, or when migrating flocks follow leaders.
  • Second, species must accumulate traits, arising through ordinary natural selection, that favor the switch to eusociality. For instance, Ceratina and Lasioglossum bees, which appear perched on the cusp of eusociality, cooperate in foraging, tunneling, and guarding resources. Another such pre-adaptation is progressive provisioning, in which a female builds a nest, lays an egg in it, and then feeds or guards larvae until they mature. Most importantly, the candidate species must build a defensible nest.
  • Finally, individuals must develop genes supporting eusociality, whether by mutation or recombination. Crossing the threshold to eusociality essentially requires that a female and her offspring not disperse to start new, individual nests, but rather remain at the old nest. While eusocial genes have yet to be identified, at least two eusocial ant species are known to have genes that quell the urge to roam from the nest.

If these steps are followed and a species becomes eusocial, the evolutionary costs of individuals foregoing reproduction are compensated by the greatly reduced mortality of the queen and her larvae, which are protected by the colony. In some ant species, a queen that might live for only a few months if alone can live for 25 years or more as part of a colony, producing millions of offspring in the process.

Nature 466, 1057-1062 (26 August 2010) | doi:10.1038/nature09205; Received 10 March 2010; Accepted 26 May 2010

The evolution of eusociality

Martin A. Nowak, Corina E. Tarnita & Edward O. Wilson


Eusociality, in which some individuals reduce their own lifetime reproductive potential to raise the offspring of others, underlies the most advanced forms of social organization and the ecologically dominant role of social insects and humans. For the past four decades kin selection theory, based on the concept of inclusive fitness, has been the major theoretical attempt to explain the evolution of eusociality. Here we show the limitations of this approach. We argue that standard natural selection theory in the context of precise models of population structure represents a simpler and superior approach, allows the evaluation of multiple competing hypotheses, and provides an exact framework for interpreting empirical observations.


R1b founder effect in Central and Western Europe

Post will be updated after I read the paper. (Last Update: Aug. 29)


From the paper:
The ages of various haplogroups in populations were estimated using the methodology described by Zhivotovsky et al. (30), modified according to Sengupta et al. (10), using the evolutionary effective mutation rate of 6.9 x 10^-4 per 25 years. The accuracy and appropriateness of this mutation rate has been independently confirmed in several deep-rooted pedigrees of the Hutterites.
Of course readers of the blog are aware that I disagree with the use of the evolutionary rate. My comments on the Hutterites paper will be posted separately after I read it. I will simply say that there are numerous cases where the use of the genealogical rate makes better sense of the evidence than use of the "evolutionary" rate. Off the top of my head, the genealogical rate harmonizes with the Genghis Khan cluster, the expansion of Na-Dene speakers into the Americas, the expansion of Balto-Slavic, the Bronze Age spread of Semitic speakers, in accordance with the linguistic evidence, the expansion of Bantu in Angola, more recent British surnames, the formation of Arabian kingdoms, Greek colonization of Sicily, and the Bronze Age origin of Indo-Aryans and Finno-Ugrians (and I skipped a few).

UPDATE II (Aug 26):

Here is the phylogeny of R-M207 from the paper. For reference, the R-M207 page from ISOGG.

UPDATE III (Aug 26):

Going through the material in this paper in a systematic manner is not easy, so I will probably do a potpourri of updates covering various topics of interest.

As noted in the other recent paper, and shown in the above Figure from the current one, R-U106 peaks in northern Europe. Its frequency (including the R-U198 sublineage) is 36.8% in the Netherlands, 20.9% in Germany and Austria, 18.2% in Denmark, 18.2% in England, 12.6% in Switzerland, 7.5% in France, 6.1% in Ireland, 5.9% in Poland, 5.6% in north Italy, 4.4% in Czech Republic and Slovakia, 3.5% in Hungary, 4.8% in Estonia, 4.3% in south Sweden, 2.5% in Spain and Portugal, 1.3% in eastern Slavs, 0.8% in south Italy, 0.6% in Balkan Slavs, 0.5% in Greeks (i.e. 2 of 193 Cretans, and no mainland Greeks), 0.4% in Turks, 0% in Middle East.
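
Frequencies built on a handful of carriers are worth reading with their sampling error in mind. As a minimal sketch, here is a 95% Wilson score interval for the one raw count given above (2 of 193 Cretans). The choice of the Wilson interval and the function name are mine, not something taken from the paper.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need 0 <= successes <= n and n > 0")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


if __name__ == "__main__":
    low, high = wilson_interval(2, 193)
    print(f"2 of 193: {2 / 193:.1%}, 95% interval {low:.1%} to {high:.1%}")
```

The interval for such a count spans roughly an order of magnitude, so differences among the low-frequency populations in the list should not be over-interpreted.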

The age of R-U106 is estimated by the authors as 8.7ky BP, which translates to about 2.5ky BP with the germline rate. The existence of R-U106 as a major lineage within the Germanic group is self-evident, as Germanic populations have a higher frequency than all their neighbors (Romance, Irish, Slavs, Finns). Indeed, the highest frequencies are attained in the Germanic countries, followed by countries where Germanic speakers are known to have settled in large numbers but to have ultimately been absorbed or fled (such as Ireland, north Italy, and the lands of the Austro-Hungarian empire). South Italy, the Balkans, and West Asia are areas of the world where no Germanic settlement of any importance is attested, and correspondingly R-U106 shrinks to near-zero.
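
The conversion between the two dates can be sketched as a simple rescaling, assuming (as a simplification of variance-based dating) that the estimated age is inversely proportional to the mutation rate used. The evolutionary rate is the one quoted from the paper; the germline rate below is not an independently sourced value but is merely back-calculated from the 8.7ky and 2.5ky figures, and the function names are hypothetical.

```python
EVOLUTIONARY_RATE = 6.9e-4  # per 25 years, as quoted from the paper


def rescale_age(age_ky: float, rate_used: float, rate_wanted: float) -> float:
    """Rescale an age estimate, assuming age is proportional to 1 / rate."""
    return age_ky * rate_used / rate_wanted


def implied_rate(age_known: float, rate_known: float, age_other: float) -> float:
    """Rate that would turn `age_known` into `age_other` under the same assumption."""
    return rate_known * age_known / age_other


# Germline rate implied by the 8.7ky -> 2.5ky conversion (an inference, not a citation).
germline_rate = implied_rate(8.7, EVOLUTIONARY_RATE, 2.5)

if __name__ == "__main__":
    print(f"implied germline rate: {germline_rate:.2e} per 25 years")
    print(f"ratio of rates: {germline_rate / EVOLUTIONARY_RATE:.2f}")
    print(f"rescaled age: {rescale_age(8.7, EVOLUTIONARY_RATE, germline_rate):.1f} ky")
```

Under this assumption the two rates differ by a factor of about 3.5, so every "evolutionary" date in the paper shrinks by roughly that factor when the germline rate is used.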

UPDATE IV (Aug 26):

Another informative lineage, as noted in the other recent paper as well, is R-U152:

Of interest is the fact that while R-U152 has a clear French-Italian center of weight, the locations exhibiting highest STR variance are Germany and Slovakia, i.e., Central Europe. My guess is that R-U152 originated in Central Europe and spread to the west and south, perhaps with Italo-Celtic speakers or some subset thereof. In its home territory of Central Europe, its frequency was reduced by the introduction of the Germanic and Slavic speaking elements which dominate the region.

Irrespective of what the ultimate origin of R-U152 is, it provides us with a good diagnostic marker for population movements out of the French-Italian area. In Italy, for example, it is found at 26.6% in the north and 10.5% in the south. It would be extremely interesting to see its occurrence in Balkan Vlachs, as this would confirm or disprove the Italian component in their origin. However, R-U152 occurs in 7.3% of Cretans, suggesting introgression of Y-chromosomes of North Italian (Venetian) origin during the four centuries of Venetian rule of the island. It also occurs in 4.1% of Greeks, where it might come from any period from the Roman annexation of the Hellenistic states to the Vlachs. However, its presence in only 1.8% of Romanians makes a large Italian contribution to the Romanian population unlikely. Balkan R-U152 chromosomes should be better resolved to determine when they arrived from the northwest.

The paucity of R-U152 in Turks (0.6%) makes tales of wandering Galatians less likely to be true. There is no doubt that Galatians settled in Anatolia, but they were probably so few in number that they did not permanently alter the population. Knowledgeable readers should chime in about the Lebanese Christian R1b which was posited as a signature of the Crusades a couple of years ago, and its position in the phylogeny.

UPDATE V (Aug 26):

The most common R1b subgroup in Europe is R-M269, and its most common subgroup is R-L23, which encompasses the vast majority of European R-M269 chromosomes. It is interesting to see where R-M269(xL23) is concentrated. In Europe I see cases in Germany, Switzerland, Slovenia, Poland, Hungary, Russia, and the Ukraine. It is most prominent, however, in the Balkans, where every population except mainland Croatia (N=108) possesses it. In the Caucasus it does not exist except in the northeast. In Turkey and Iran there is some, although it is not clear in which regions.

UPDATE VI (Aug 27):

The authors write with respect to haplogroup R-V88:
With the exception of rare incidences of R1b-V88 in Corsica, Sardinia (13) and Southern France (Supplementary Table S4), there is nearly mutually exclusive patterning of V88 across trans-Saharan Africa vs the prominence of P297-related varieties widespread across the Caucasus, Circum-Uralic regions, Anatolia and Europe. The detection of V88 in Iran, Palestine and especially the Dead Sea, Jordan (Supplementary Table S4) provides an insight into the back to Africa migration route.
Haplogroup R-V88 has been the subject of a recent study and was associated with the migration of Chadic speakers in Africa. It is difficult to say whether or not the authors' results really provide any insight into an alleged movement of this haplogroup from Asia to Africa, as it occurs in only a single Palestinian, and a single Iranian. Neither is the higher frequency (13.7%) observed in the Amman and Dead Sea area of Jordan really evidence of its antiquity there.

Neither the aforementioned paper nor the current one presents any evidence (e.g., Y-STR variance) for any great antiquity of the Asian R-V88 with respect to the African one. Indeed, with the exception of the aforementioned Jordanian sample, R-V88 is rare in Asia, while it is widespread in African Berbers. I see no clear reason at present to think that it migrated to Africa from Asia, rather than being a relic of an older, widely dispersed R1b population that led to R-V88 in Africa itself.

UPDATE VII (Aug 28):

The paper repeats the standard claims about the origin of R1b and its main sublineage R-M269 in Asia, but presents no new information that would support this claim. With the state of the evidence, I see no real reason to prefer a West Asian to a Southeastern European origin for this haplogroup.

I don't give much credence to small differences in Y-STR variance, due to the large confidence intervals associated with such estimates, and it is interesting that the authors do not present an argument from Y-STR variation about the origin of R1b, preferring to make broad statements about Mesolithic-Neolithic movements into Europe.

A study of supplementary table S2 which gives coalescent times reveals that there is no clear pattern of greater Asian diversity within haplogroup R1b or its subclades. And, while Central-Western Europe does appear to be an outgrowth of R1b rather than a place of origin (with the dominance of derived R-M412 lineages) there is nothing in the paper that would make one prefer West Asia to Southeastern Europe as a place of origin.

Personally I think the issue cannot be settled yet, but there are reasons to prefer the latter option. An Asian origin of R1b has a major parsimony hurdle: it would require a seemingly directed Drang nach Westen for R1b, into Europe and into North Africa, with a paucity of R1b in the opposite direction (among Arabians and to the south and in South Asia) and a scattering of very young R-M73 and R-M269 to the east of Europe.


R-S116 shows maximum Y-STR diversity in France and Germany but maximum frequency in Iberia and the British Isles. In the latter region it is represented mainly by R-M529 with the R-M222 subclade being particularly prominent in Ireland but also North England. It would be interesting to see data for Scotland, and I do not doubt that R-M222 would be prominent there as well. R-S116 also shows signs of being a Celtic, or Celtiberian-related lineage.

European Journal of Human Genetics doi: 10.1038/ejhg.2010.146

A major Y-chromosome haplogroup R1b Holocene era founder effect in Central and Western Europe

Natalie M Myres et al.

The phylogenetic relationships of numerous branches within the core Y-chromosome haplogroup R-M207 support a West Asian origin of haplogroup R1b, its initial differentiation there followed by a rapid spread of one of its sub-clades carrying the M269 mutation to Europe. Here, we present phylogeographically resolved data for 2043 M269-derived Y-chromosomes from 118 West Asian and European populations assessed for the M412 SNP that largely separates the majority of Central and West European R1b lineages from those observed in Eastern Europe, the Circum-Uralic region, the Near East, the Caucasus and Pakistan. Within the M412 dichotomy, the major S116 sub-clade shows a frequency peak in the upper Danube basin and Paris area with declining frequency toward Italy, Iberia, Southern France and British Isles. Although this frequency pattern closely approximates the spread of the Linearbandkeramik (LBK), Neolithic culture, an advent leading to a number of pre-historic cultural developments during the past ≤10 thousand years, more complex pre-Neolithic scenarios remain possible for the L23(xM412) components in Southeast Europe and elsewhere.


Novel component in the genetic structure of sub-Saharan Africans

Post will be updated after I read the paper. (Last Update Aug 29)

UPDATE I (Aug 29):

Here is the PCA plot with the full marker set:

It is the familiar V shape, with Caucasoids at the bottom, and Mongoloids/Negroids at the left/right tips. Notice the variable position of Middle Eastern Arabs (brown) which corresponds to their variable African affiliation, and the heterogeneous group of populations between Caucasoids and Mongoloids, which includes mainly South-Central Asians but also a couple of (yellow) clusters of Papuan-Melanesians.

It is useful to zoom in on the African portion of the plot: notice the circles and diamonds for African Americans and Maasai that are tilted toward Caucasoids, while the crosses and squares for Pygmies are further removed near the top right.

European Journal of Human Genetics doi: 10.1038/ejhg.2010.141

A genomic analysis identifies a novel component in the genetic structure of sub-Saharan African populations

Martin Sikora et al.


Studies of large sets of single nucleotide polymorphism (SNP) data have proven to be a powerful tool in the analysis of the genetic structure of human populations. In this work, we analyze genotyping data for 2841 SNPs in 12 sub-Saharan African populations, including a previously unsampled region of southeastern Africa (Mozambique). We show that robust results in a world-wide perspective can be obtained when analyzing only 1000 SNPs. Our main results both confirm the results of previous studies, and show new and interesting features in sub-Saharan African genetic complexity. There is a strong differentiation of Nilo-Saharans, much beyond what would be expected by geography. Hunter-gatherer populations (Khoisan and Pygmies) show a clear distinctiveness with very intrinsic Pygmy (and not only Khoisan) genetic features. Populations of the West Africa present an unexpected similarity among them, possibly the result of a population expansion. Finally, we find a strong differentiation of the southeastern Bantu population from Mozambique, which suggests an assimilation of a pre-Bantu substrate by Bantu speakers in the region.


August 24, 2010

Social selection in Y-chromosome haplogroup C3 clusters

There was another paper on Y-chromosome haplogroup C recently. The authors of the current paper also present dates with both the evolutionary and genealogical mutation rate. They write:
The age of accumulated STR variation within hg C3, estimated using the method of Zhivotovsky et al. (2004), is about 14.9 ky or 4.1 ky depending on the mutation rate values selected for calculations (Table 3). The older time estimate is most compatible with the view that hg C3 haplotypes were present in Siberia during the Last Glacial Maximum from where the ancestors of C3b Native Americans migrated to the Beringia (Karafet et al., 2002; Zegura et al., 2004).
However, as I noted in my review of the earlier paper:
A case in point is haplogroup C3b-P39; according to the authors' date, this ought to be related to the early arrival of the ancestors of Amerindians, but haplogroup C in the Americas has a strong relationship with Na-Dene speakers such as Athapaskans, and it seems to me that a late spread of this haplogroup is more consistent with its limited geographical distribution and strong linguistic associations.
If C3 had spread into the Americas "early" (together with the other main haplogroup, Q), then we would expect to see it today all over the Americas (perhaps lost here and there due to drift in small populations), and in all language groups. However, this is not what we see. Hence, I am inclined to believe in the more recent spread of C3 into the Americas, together with Na-Dene speakers.

From the paper:
The median joining network of subcluster C3c appears to be complex, with several common haplotypes present in different populations (Fig. 2). Our analysis revealed that the age of this subcluster is about 5.9 ky or 1.6 ky, whereas the age of subcluster C3d appears to be younger – about 2.0 ky or 0.5 ky, depending on the mutation rate values selected.
Haplogroup C3c dominates Kalmyks, Evenks, Evens, i.e. Mongolic-Tungusic populations; C3d in Mongols, Buryats, Khamnigans (Mongolic-speakers).

As I noted in the Cohen Modal Haplotype paper, age estimation must harmonize with both the observed Y-STR variation and known demography. The key questions are: has there been enough time for a given haplogroup to accumulate so much variation, and to grow to such a population size?

Younger haplogroup ages often hit a stumbling block in trying to explain demography (hence my reservations on the CMH paper). Yet, it is much easier to consider massive demographic growth/social selection in Mongols and associated peoples, as the evidence for that growth and expansion is a part of history, and it also harmonizes with what we know about nomadic peoples.

In any case, the real age could be different than the point estimates, both due to the limitations of Y-STR markers, as well as the potential presence of outliers, in which the recently expanded group within a haplogroup dominates the age estimate.

The authors also deal with the Genghis Khan "star cluster" which is part of the paragroup C3*. Here is what they have to say:
It is suggested that this subcluster appears to have originated in Mongolia about 1 ky ago, taking into account the genealogical mutation rate (Zerjal et al., 2003). Our present and previous (Derenko et al., 2007b) studies have shown that the highest frequency of the “star cluster” in C3∗ is observed in Mongols (35%), whereas in Siberia it varies from 8% in Altaian Kazakhs and 6.5% in Buryats to less than 3% in Tuvinians, Altaians and Shors (Table S2). According to our data, the age of the “star cluster” in C3∗ is 2.8 ± 1.0 or 0.78 ± 0.27 ky, based on the evolutionary and genealogical mutation rates, respectively.
Genghis Khan flourished about 0.8 ky ago. If we accept (as I do) the genealogical rate as closer to the truth, then the "star cluster" is related to Genghis and his male relatives; otherwise it is a completely unrelated phenomenon.
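The paired ages quoted above can be checked with a few lines of arithmetic. The sketch below uses only the numbers quoted from the paper; the variable names are mine, and treating each ± figure as a simple interval is an assumption made for illustration.

```python
# (evolutionary-rate age, genealogical-rate age) in ky, as quoted from the paper
ages_ky = {
    "C3": (14.9, 4.1),
    "C3c": (5.9, 1.6),
    "C3d": (2.0, 0.5),
    "star cluster": (2.8, 0.78),
}

# Ratio between the two estimates for each cluster
ratios = {name: evo / gen for name, (evo, gen) in ages_ky.items()}


def within(point, centre, spread):
    """True if `point` lies inside centre +/- spread (assumed simple interval)."""
    return centre - spread <= point <= centre + spread


genghis_ky = 0.8  # approximate time before present used in the post
evolutionary_fits = within(genghis_ky, 2.8, 1.0)
genealogical_fits = within(genghis_ky, 0.78, 0.27)

for name, ratio in ratios.items():
    print(f"{name}: evolutionary / genealogical = {ratio:.1f}")
print("evolutionary estimate covers 0.8 ky:", evolutionary_fits)
print("genealogical estimate covers 0.8 ky:", genealogical_fits)
```

Each pair differs by a factor between about 3.6 and 4, so the choice of mutation rate, not the observed variation, drives the disagreement between the two columns.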

Annals of Human Genetics DOI: 10.1111/j.1469-1809.2010.00601.x

Phylogeography of the Y-chromosome haplogroup C in northern Eurasia

Boris Malyarchuk et al.

To reconstruct the phylogenetic structure of Y-chromosome haplogroup (hg) C in populations of northern Eurasia, we have analyzed the diversity of microsatellite (STR) loci in a total sample of 413 males from 18 ethnic groups of Siberia, Eastern Asia and Eastern Europe. Analysis of SNP markers revealed that all Y-chromosomes studied belong to hg C3 and its subhaplogroups C3c and C3d, although some populations (such as Mongols and Koryaks) demonstrate a relatively high input (more than 30%) of yet unidentified C3* haplotypes. Median joining network analysis of STR haplotypes demonstrates that Y-chromosome gene pools of populations studied are characterized by the presence of DNA clusters originating from a limited number of frequent founder haplotypes. These are subhaplogroup C3d characteristic for Mongolic-speaking populations, “star cluster” in C3* paragroup, and a set of DYS19 duplicated C3c Y-chromosomes. All these DNA clusters show relatively recent coalescent times (less than 3000 years), so it is probable that founder effects, including social selection resulting in high male fertility associated with a limited number of paternal lineages, may explain the observed distribution of hg C3 lineages.


Hitler's "Jewish and African" roots

I'm not going to honor with a link all the stories circulating in a number of newspapers about Hitler being supposedly of "Jewish or African" origin because he belonged to haplogroup E1b1b1.

This comes from a study of men believed to share patrilineal origin with Hitler.

It is difficult to see how much of the hype is due to the original geneticists or to the "journalists" who report on the work.

These "Jewish and African" roots are supposedly due to the fact that Hitler belonged to haplogroup E1b1b1. But, without further information about the subclade to which Hitler belonged, there is no reason to think that he was of Jewish or African ancestry. He could just as well be of Greek or Albanian patrilineal ancestry. But, I guess that "Hitler's Greek or Albanian roots" doesn't have the same zing that his "Jewish and African roots" does.

What do we know? That Hitler may have had distant Y-chromosome E1b1b1 cousins that wouldn't be considered Aryan in Nazi Germany. But, this would be true no matter what haplogroup he had (except I, perhaps).

If the researchers had found evidence of a recent "non-Aryan" ancestor that would disqualify Hitler from possessing Aryan status according to his race laws, then that would be an interesting and ironic discovery. The same would hold if they had found that he belonged to a subclade such as E-M81 (found in Berbers and other north Africans) that has most plausibly introgressed into Europe in historical times.

However, the available public evidence is that Hitler may have belonged to a haplogroup that isn't as common in Germany as e.g., in the Balkans, just as some people from the Balkans belong to haplogroups that are more common in Germany, or some Jews belong to haplogroups that are more common in Europe, and so on.

August 23, 2010

Geographic origin of Europeans with ancestry informative markers (Drineas et al. 2010)

The latitudinal/longitudinal error for leave-one-out validation is shown on the left. This involves using N-1 out of N samples to build an estimator, and then guessing the longitude/latitude of the Nth sample that was not included in the estimator.

The authors make a point that latitudinal error is smaller than longitudinal error. However, we should keep in mind that the "rectangle" of the sampled populations (east-west limits: Portugal-Serbia) does not approach a "square", so the relative error (absolute longitudinal error/longitudinal extent) is not that different.
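For readers unfamiliar with the procedure, here is a minimal sketch of leave-one-out validation and of the relative error mentioned above. The data are made up, and the nearest-neighbour estimator is a stand-in I chose for brevity; it is not the estimator used in the paper.

```python
def leave_one_out_errors(samples, predict):
    """Absolute prediction error for each sample when it is held out.

    samples: list of (feature, coordinate) pairs
    predict: function(training_samples, feature) -> predicted coordinate
    """
    errors = []
    for i, (feature, truth) in enumerate(samples):
        training = samples[:i] + samples[i + 1:]  # the other N-1 samples
        errors.append(abs(predict(training, feature) - truth))
    return errors


def nearest_neighbour(training, feature):
    """Toy estimator: copy the coordinate of the closest training sample."""
    return min(training, key=lambda s: abs(s[0] - feature))[1]


# Hypothetical (PC score, longitude in degrees) pairs, not real data
samples = [(0.0, 10.0), (1.0, 12.0), (2.0, 14.0), (3.0, 16.0)]

errors = leave_one_out_errors(samples, nearest_neighbour)
mean_error = sum(errors) / len(errors)

# Relative error: absolute error divided by the extent of the sampled range
extent = max(t for _, t in samples) - min(t for _, t in samples)
relative_error = mean_error / extent
print(errors, mean_error, round(relative_error, 2))
```

The same absolute error looks larger or smaller depending on the extent it is divided by, which is why a wide east-west range and a narrow north-south range should not be compared on absolute error alone.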

PLoS ONE doi:10.1371/journal.pone.0011892

Inferring Geographic Coordinates of Origin for Europeans Using Small Panels of Ancestry Informative Markers

Petros Drineas, Jamey Lewis, Peristera Paschou

Recent large-scale studies of European populations have demonstrated the existence of population genetic structure within Europe and the potential to accurately infer individual ancestry when information from hundreds of thousands of genetic markers is used. In fact, when genomewide genetic variation of European populations is projected down to a two-dimensional Principal Components Analysis plot, a surprising correlation with actual geographic coordinates of self-reported ancestry has been reported. This substructure can hamper the search of susceptibility genes for common complex disorders leading to spurious correlations. The identification of genetic markers that can correct for population stratification becomes therefore of paramount importance. Analyzing 1,200 individuals from 11 populations genotyped for more than 500,000 SNPs (Population Reference Sample), we present a systematic exploration of the extent to which geographic coordinates of origin within Europe can be predicted, with small panels of SNPs. Markers are selected to correlate with the top principal components of the dataset, as we have previously demonstrated. Performing thorough cross-validation experiments we show that it is indeed possible to predict individual ancestry within Europe down to a few hundred kilometers from actual individual origin, using information from carefully selected panels of 500 or 1,000 SNPs. Furthermore, we show that these panels can be used to correctly assign the HapMap Phase 3 European populations to their geographic origin. The SNPs that we propose can prove extremely useful in a variety of different settings, such as stratification correction or genetic ancestry testing, and the study of the history of European populations.


August 19, 2010

Which population is most genetically distant from Africans?

Razib had a post yesterday on the question of which population is more distant from Africans. In that post he concluded that:

That means that those of us of non-African ancestry are all equally distant from the African root.
I was planning to rebut this conclusion, but Razib beat me to the punch by correcting himself that Amerindians are more distant from Africans. First, let's deal with the original post:

He based this conclusion on some observations on a recent study:

The tree makes it clear: all non-Africans form their own independent branch from Africans. In the PCA you see that along the biggest component of variation in the genetic data the non-African groups are about the same distance from Africans. And in the ADMIXTURE analysis when you assume four ancestral populations, the Africans and non-Africans separate out cleanly excluding groups which a high likelihood of European or Arab admixture.
Neither PCA nor ADMIXTURE tells us anything about who is more distant from Africans. First of all, the clusters identified by ADMIXTURE are not phylogenetic units, nor is the pattern of splits at successive K evidence for the human phylogeny. For example, at K=2, East Eurasians split from Europeans/Africans, even though it is clear that East Eurasians do not represent a sister clade to a clade comprising Europeans/Africans. In short, ADMIXTURE (as well as frappe, STRUCTURE and assorted Bayesian clustering methods) tells us nothing about who is more distant from whom.

Nor does PCA tell us anything more. Sure, in the first 2 dimensions Africans split off from non-Africans, but that is insufficient to conclude that Africans are equidistant to all Eurasians. This is due to the fact that distance in the first two dimensions of PCA is a lower bound on total distance (a consequence of the Pythagorean theorem, no less). What does this mean? Populations that are equidistant on the (PC1, PC2) plane may actually be non-equidistant overall.
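A toy example makes the lower-bound point concrete. The coordinates below are invented for illustration and do not come from any dataset; the third coordinate stands for everything PC3 and beyond would add.

```python
import math


def dist(a, b, dims=None):
    """Euclidean distance using only the first `dims` coordinates (all if None)."""
    d = len(a) if dims is None else dims
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a[:d], b[:d])))


# Hypothetical principal-component coordinates (PC1, PC2, PC3)
africa = (0.0, 0.0, 0.0)
pop_a = (3.0, 4.0, 0.0)
pop_b = (4.0, 3.0, 12.0)

# On the (PC1, PC2) plane both populations are equally far from the reference
print(dist(africa, pop_a, 2), dist(africa, pop_b, 2))

# With the third dimension included, the distances differ
print(dist(africa, pop_a), dist(africa, pop_b))
```

Adding a dimension can only add a non-negative squared term under the square root, so the 2-D distance never exceeds the full one, and equality on the plane says nothing about equality overall.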

Finally, the tree argument provides evidence for Out of Africa, but it does not provide evidence for equidistance, as a tree encodes not only phylogenetic relationships (who is closer to whom in evolutionary terms) but also distance (branch lengths). It is perfectly possible that two taxa A and B may form a clade relative to a third taxon C, but that does not guarantee that A will be closer to B than to C.
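Here is a small sketch of that last point, with branch lengths I made up purely for illustration: A and B form a clade, yet A ends up closer to C than to B because B sits on a long branch.

```python
# Hypothetical branch lengths for the tree ((A, B), C); not taken from any study
branch = {"A": 1.0, "B": 10.0, "C": 1.0, "AB_stem": 2.0}


def tree_distance(x, y):
    """Path length between two different tips of the tree ((A, B), C)."""
    if {x, y} == {"A", "B"}:
        return branch["A"] + branch["B"]
    # Any path to C crosses the stem joining the (A, B) clade to the root
    other = x if y == "C" else y
    return branch[other] + branch["AB_stem"] + branch["C"]


print("A-B:", tree_distance("A", "B"))
print("A-C:", tree_distance("A", "C"))
print("B-C:", tree_distance("B", "C"))
```

The topology is the same whatever the branch lengths are, so clade membership alone cannot settle questions of distance.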

As Razib notes in his newer post, it is actually Amerindians who are most distant from Africans. Here is part of a table of Fst values that I was planning to use to make that point (he uses a different one, but all studies pretty much tell the same story). It's from the recent Korean study:

YRI GU 0.118332
YRI GJ 0.119501
YRI JC 0.1192
YRI YC 0.118593
YRI PC 0.118986
YRI GR 0.119158
YRI JJ 0.11892
YRI NJ 0.118754
YRI US 0.119086
YRI CA 0.11903
YRI VN 0.116789
YRI CB 0.108875
YRI JL 0.117513
YRI MH 0.109053
YRI CHB 0.117318
YRI JPT 0.118053
YRI AI 0.142997
YRI CEU 0.101197

Note that Yoruba (YRI) are most distant from Amerindians (AI), while all other populations fall within a very narrow range of 0.101197 to 0.119501, or about 18% variation in distance to Yoruba.

Things become more interesting if we exclude Caucasoids (CEU) and limit ourselves to East Eurasians. Now, the range of distance from Yoruba becomes 0.108875-0.119501, or only about 10% variation in distance to Yoruba.
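The two percentages can be reproduced directly from the table. In this sketch the spread is taken as (max - min) / min, which is my reading of "variation in distance"; the dictionary and function names are illustrative.

```python
# Fst between Yoruba (YRI) and each population, copied from the table above
fst_to_yri = {
    "GU": 0.118332, "GJ": 0.119501, "JC": 0.1192, "YC": 0.118593,
    "PC": 0.118986, "GR": 0.119158, "JJ": 0.11892, "NJ": 0.118754,
    "US": 0.119086, "CA": 0.11903, "VN": 0.116789, "CB": 0.108875,
    "JL": 0.117513, "MH": 0.109053, "CHB": 0.117318, "JPT": 0.118053,
    "AI": 0.142997, "CEU": 0.101197,
}


def relative_spread(values):
    """(max - min) / min: how much the largest value exceeds the smallest."""
    return (max(values) - min(values)) / min(values)


most_distant = max(fst_to_yri, key=fst_to_yri.get)
without_ai = [v for k, v in fst_to_yri.items() if k != "AI"]
east_eurasians = [v for k, v in fst_to_yri.items() if k not in ("AI", "CEU")]

print("most distant from YRI:", most_distant)
print(f"spread without AI: {relative_spread(without_ai):.0%}")
print(f"spread without AI and CEU: {relative_spread(east_eurasians):.0%}")
```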

Yoruba are closest to Caucasoids, then East Asians, then Amerindians, in that order. Distant from all, but not equidistant.

Razib advances an explanation for the Amerindian-African distance:
So what does this mean? And why is this so? I think I won’t revise my model of the out of Africa migration. I don’t think there was serious secondary migration out of Africa after the initial one (at least until recently). And yet somehow the indigenous populations of the New World are more genetically distinct. This is because of genetic drift. Specifically, a set of serial founder events, where the genetic variation was reduced and ancestral allele frequencies changed rapidly. When a population goes through a bottleneck, and then becomes isolated, it “goes its own way,” as there isn’t gene flow to requilibrate the frequencies. The push east, to Australasia and to the New World, was accompanied by founder events due to fissioning off of small groups from the main ancestral population. From what we can tell there was relatively little gene flow after the initial settlement of the New World and Oceania (actually, there may have been several waves into the New World to be fair, but it looks like there wasn’t enough Eurasian gene flow to dampen the reduction in heterozygosity caused by bottlenecks).
"Drift" gets invoked so often, so it is worthy to consider if it is responsible for the greater distance of Amerindians to Africans. "Drift" can indeed shift allele frequencies in a small population, and increase Fst. However, there are dozens of small and isolated populations throughout Eurasia, yet none of them are more distant from Africans than Amerindians are. If "drift" is responsible, then we would expect some isolated Eurasians to repeat the pattern of greater distance that Amerindians present. Good luck finding any study where any Eurasian population is more distant to Africans than (unadmixed) Amerindians are.

There is no need to invoke "drift" to explain Amerindian-African distance, as there is a simpler explanation: Amerindians, unlike Eurasians, are more distant from Africans, not because of "drift" shifting their allele frequencies, but because they lacked the opportunity for substantial gene flow with Africans.

It is worth revisiting the Fst table: Amerindians are 1/5 further from Africans than East Asians are because they spent about 1/5 of the time since "Out of Africa" isolated from Africans. East Asians had about 60 thousand years worth of opportunities for gene flow with Africans (however limited), while the corresponding time for Amerindians is about 45-50 thousand years.
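A rough check of the two "1/5" figures, as a sketch only. I take CHB and JPT as the East Asian reference points, which is my own choice, since the post does not say which East Asian value is being compared.

```python
# Fst to Yoruba, copied from the table above
ai, chb, jpt = 0.142997, 0.117318, 0.118053

# How much further Amerindians are than each East Asian reference
excess_vs_chb = ai / chb - 1
excess_vs_jpt = ai / jpt - 1

# Isolation as a share of the ~60 ky since Out of Africa, using the post's
# 10-15 ky difference (60 ky versus 45-50 ky of opportunity for gene flow)
isolation_share = (10 / 60, 15 / 60)

print(f"extra distance: {excess_vs_chb:.0%} (vs CHB), {excess_vs_jpt:.0%} (vs JPT)")
print(f"share of time isolated: {isolation_share[0]:.0%} to {isolation_share[1]:.0%}")
```

Both quantities land in the neighbourhood of one fifth, which is the coincidence the argument rests on; the sketch does not test whether one causes the other.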

There is actual evidence for post-Out of Africa gene flow between Africa and Eurasia, but not between Africa and the Americas. It comes from the YAP marker of the human Y-chromosome. This defines the DE-YAP clade of the Y-chromosome phylogeny. This clade is absent in the Americas, but it is present in Africa, West and East Eurasia. Africa and West Eurasia share the E subclade, and East Eurasia possesses the D subclade.

What YAP tells us is that Out-of-Africa was not a one-time event: there has been subsequent gene flow linking Africa and Eurasia, and YAP is the smoking gun for this gene flow (*). There is no way around it: post-OOA men bearing YAP Y-chromosomes spread across a range spanning from Nigeria and South Africa to the isles of Japan, but not into the Americas. These were not necessarily the only conduits for post-OOA gene flow, but they prove the existence of such gene flow.

Moreover, the greater proximity of Caucasoids to Africans can also be explained by the joint possession (by Africans and Caucasoids) of the E subclade of YAP (at the exclusion of East Asians and Negritos, who possess the D subclade).

In conclusion: the population that is most distant from Africans is the Amerindians. They are more distant not because of drift but because of an earlier (10-15kya) cessation of any important gene flow between them and the rest of the species: about 1/5 less gene flow => about 1/5 more distance (relative to East Asians). Finally, this cessation of gene flow does not only make sense archaeologically, but is also evidenced genetically by the prevalence of the YAP marker, which records African-Eurasian contacts (and a nested subset of African-Caucasoid contacts) at the exclusion of Amerindians.

(*) The direction of this gene flow is contested as both Eurasian and African origin of YAP have been proposed. That's not important however, as YAP links Africa and Eurasia at the exclusion of the Americas either way.

UPDATE (Dec 24): It turns out that the greatest distance within our species is between Mbuti Pygmies and Papuans.

August 18, 2010

Age estimation of Y chromosome lineages (Adamov & Karzhavin 2010)

A nice paper in the Russian Journal of Genetic Genealogy that addresses the subject of age estimation using Y-STRs. The authors share my sentiments on the subject, and were good enough to compare their simulation results and analytical approximations with my 2008 post. Every post I've written on the subject can be found in the Y-STR series label.

The authors write:

One of numerous critics of «effective» mutation rate is D. Pontikos, who published in 2008 the results of his own calculations in his popular blog (Pontikos, 2008 [8]). Fig. 11 shows the results of Pontikos for a fixed interval of genealogical tree with the final size of 750000 – 1250000 individuals ... Those data match well with approximation (6), and the difference between them is only 0.3% and 1.4%, correspondingly.
I have not checked all the details of this paper, but it should be a good read for anyone interested in the subject. Hopefully as more people look at the evidence, age estimation in mainstream journals will catch up with the state of the art.

I will not repeat the long and involved arguments and observations of my Y-STR series, but to summarize the argument for new readers:

  • Most recent population genetics papers use an "effective" mutation rate that is about 3 times slower than the observed "germline" rate (of father-son pairs) and leads to age estimates that are about 3 times older than is justified.
  • This mutation rate is applicable to the constant population case in which a man has 1 son on average. Population size may vary stochastically under this model, but it generally does not grow to large numbers within the time frame of Homo sapiens. For example, in the 2,000 or so generations since Y-chromosome Adam, a lineage evolving under this model would have 1,000 descendants on average, and the probability that it would have millions of descendants (like most real-world haplogroups in non-tribal populations) is practically zero.
  • If the constant population case does not hold, due to selection, or demographic growth, or social dominance, then the effective rate is not applicable, and age estimates using the germline rate are much closer to the truth.
  • The population sizes of real-world haplogroups are huge and could not have been generated by stochastic variation in a model where each man has 1 son on average. Most Y-chromosome age estimates in the mainstream literature are overestimates, and ascribe Paleolithic origins to Neolithic and Bronze Age founders.
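The constant-population argument can be illustrated with a small branching-process simulation. This is a sketch under my own assumptions, not the model from the paper or from my earlier posts: each man has 0, 1 or 2 sons with probability 1/4, 1/2, 1/4 (mean 1), and only 200 generations are simulated to keep it fast.

```python
import random


def simulate_lineage(generations, rng):
    """Number of male-line descendants of one founder after `generations`.

    Assumed offspring distribution: 0, 1 or 2 sons with probability
    1/4, 1/2, 1/4, i.e. two fair coin flips per man (mean 1 son).
    """
    n = 1
    for _ in range(generations):
        if n == 0:
            break  # the lineage is extinct
        # Binomial(2n, 1/2): count the set bits among 2n fair coin flips
        n = bin(rng.getrandbits(2 * n)).count("1")
    return n


def summarize(lineages=2000, generations=200, seed=1):
    """Simulate many founders and report how their lineages fared."""
    rng = random.Random(seed)
    sizes = [simulate_lineage(generations, rng) for _ in range(lineages)]
    survivors = [s for s in sizes if s > 0]
    return {
        "extinct_fraction": 1 - len(survivors) / lineages,
        "largest_lineage": max(sizes),
    }


result = summarize()
print(result)
```

Under these assumptions the overwhelming majority of founders leave no descendants at all, and the surviving lineages stay small; reaching millions of descendants requires something other than a mean of 1 son per man.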
The Russian Journal of Genetic Genealogy, Vol 1, No 2 (2010)

About the influence of population size on the accuracy of TMRCA estimation, done by standard methods using STR locus complex

Dmitry Adamov, Sergey Karzhavin


Model calculations of influence of a population growth from the common male ancestor towards the final (present-day) population on the TMRCA estimation have been done. The estimation was made by linear and quadratic methods using STR locus of Y-chromosome. The modeling was done using computer simulation of a tribal population during fixed number of generations.
Universal approximations, allowing estimate the average correction for population effects as a function of the final population size, have been obtained. Authors calculated the variance of age estimations for an initial ancestor, which appears due to different types of population effects. Precision of the ancestral allele determining in a STR from the final population haplotypes set have been studied. An algorithm has been proposed for TMRCA calculation for a paternal (tribal) population, taking into account its total population size.


Ancient Megalithic mtDNA from France

An extremely interesting paper, the first one on Megalithic remains, and a link between the Megalithic people and the early central European Neolithic Linearbandkeramik, where N1a was unexpectedly detected as a major component a few years ago. I'll probably have more to say on this after I read the paper.


From the paper:
We reproducibly retrieved partial HVR-I sequences (nps 16,165 to 16,390) from three human remains (Prissé 1, 2, and 4, Table 1), one adult and two children deposited during different stages of use of the burial chamber. Corresponding sequences could be unambiguously assigned to haplogroups X2, U5b, and N1a (Table 2 and Supporting Online Information).
Haplogroup U5b subclusters are believed to have spread from central-southern Europe post-LGM. Haplogroup X2 is believed to have spread from the Near East and Mediterranean Europe; it is one of those mystery haplogroups that turn up in the Taklamakan desert as well as Native Americans. Together with the clearly invasive nature of N1a, these results are consistent with migrationism.

The authors write:
The widespread distribution of the N1a lineage in Early and Middle Neolithic northwestern Europe may indicate genetic continuity from Mesolithic populations. This scenario would support a Mesolithic contribution to the earliest Neolithic of Atlantic Europe. This would imply that the N1a lineage was already common in indigenous north European populations and that the spread of the Neolithic was principally the result of cultural diffusion. Although so far the N1a lineage has not been encountered among late European hunter-gatherers in central and north Europe (Bramanti et al., 2009; Malmström et al., 2009), it is worth noting that less than half of the hunter-gatherers’ paleogenetic data come indeed from the pre-Neolithic period (predating LBK expansion). Finally, no paleogenetic data currently exist for the Mesolithic period in Western Europe. This prevents any conclusion being drawn about N1a occurrence during the Mesolithic period in those regions.
Of course we won't know if N1a occurred in France prior to the Neolithic until we test pre-Neolithic French samples. However, if N1a was present in France prior to the Neolithic, then why wasn't it present in central-northern Europe where substantial sample sizes exist? This would require a partition of pre-Neolithic populations of Europe, and also existence of N1a in both the Linearbandkeramik (that spread on a south-north vector) and in Mesolithic French. So, while we wait for pre-Neolithic Western Europeans to turn up N1a, I'm willing to wager that they will not, and that N1a spread into France with the Neolithic or the later spread of Megalithic cultures.


American Journal of Physical Anthropology DOI: 10.1002/ajpa.21376

News from the west: Ancient DNA from a French megalithic burial chamber

Marie-France Deguilloux et al.

Recent paleogenetic studies have confirmed that the spread of the Neolithic across Europe was neither genetically nor geographically uniform. To extend existing knowledge of the mitochondrial European Neolithic gene pool, we examined six samples of human skeletal material from a French megalithic long mound (c.4200 cal BC). We retrieved HVR-I sequences from three individuals and demonstrated that in the Neolithic period the mtDNA haplogroup N1a, previously only known in central Europe, was as widely distributed as western France. Alternative scenarios are discussed in seeking to explain this result, including Mesolithic ancestry, Neolithic demic diffusion, and long-distance matrimonial exchanges. In light of the limited Neolithic ancient DNA (aDNA) data currently available, we observe that all three scenarios appear equally consistent with paleogenetic and archaeological data. In consequence, we advocate caution in interpreting aDNA in the context of the Neolithic transition in Europe. Nevertheless, our results strengthen conclusions demonstrating genetic discontinuity between modern and ancient Europeans whether through migration, demographic or selection processes, or social practices.


The mother of us all lived 200 thousand years ago

From the press release:
Cyran said human genetic models have become more complex over the past couple of decades as theorists have tried to correct for invalid assumptions. But some of the corrections -- like adding branching processes that attempt to capture the dynamics of population growth in early human migrations -- are extremely complex. Which raises the question of whether less complex models might do equally well in capturing what's occurring.

"We wanted to see how sensitive the estimates were to the assumptions of the models," Kimmel said. "We found that all of the models that accounted for random population size -- such as different branching processes -- gave similar estimates. This is reassuring, because it shows that refining the assumptions of the model, beyond a certain point, may not be that important in the big picture."
Theoretical Population Biology

Alternatives to the Wright–Fisher model: The robustness of mitochondrial Eve dating

Krzysztof A. Cyran and Marek Kimmel

Methods of calculating the distributions of the time to coalescence depend on the underlying model of population demography. In particular, the models assuming deterministic evolution of population size may not be applicable to populations evolving stochastically. Therefore the study of coalescence models involving stochastic demography is important for applications. One interesting approach which includes stochasticity is the O’Connell limit theory of genealogy in branching processes. Our paper explores how many generations are needed for the limiting distributions of O’Connell to become adequate approximations of exact distributions. We perform extensive simulations of slightly supercritical branching processes and compare the results to the O’Connell limits. Coalescent computations under the Wright–Fisher model are compared with limiting O’Connell results and with full genealogy-based predictions. These results are used to estimate the age of the so-called mitochondrial Eve, i.e., the root of the mitochondrial polymorphisms of the modern humans based on the DNA from humans and Neanderthal fossils.


August 17, 2010

mtDNA relics in south China

In a past post I had noted that the occurrence of low-frequency divergent haplotypes in a population might be a "relic of a bygone age". The point I was trying to make is that early settlement in a region may create a diverse gene pool (as there is plenty of time for variation to accumulate), but this antiquity of settlement may be obscured by later (including fairly recent) expansions of sublineages that appear to be young in evolutionary terms.

Hence, the importance of outliers in age estimation, as these may alternatively be "relics" of the most ancient population (prior to the expansion, due to either selection or demographic increase, of the recent lineages), or introgressed lineages from abroad.

In order to discover outliers, you need a large sample. The authors of this paper, in the context of mtDNA, discovered 5 new basal (=near the trunk) lineages within Eurasian macrohaplogroups M and N. This is less than 0.1% of their huge Chinese sample. In a smaller sample, as is customary in most mtDNA studies, these outliers would probably have been undetected.

What is most interesting, is that the authors explicitly tried to distinguish between the two competing hypotheses described above: admixture and "relics". The new lineages do not appear to be the result of foreign admixture (e.g., some rare Indian M subclade that somehow found itself into southern China), but to be true relics.

The existence of relics pushes back the time of settlement/Out of Africa expansion, as more time is needed to "tie in" the relics with the rest of the tree.

This should serve as a warning for age estimation: so many times, peculiar lineages are brushed aside with a paragroup label as oddities, while researchers focus on the more established and phylogeographically informative lineages. While full-mtDNA sequencing is a viable option, the same procedure is not widely applied in Y chromosomes, as the Y chromosome is much larger than mtDNA, and hence more difficult (and expensive) to fully sequence.

A 6,000-strong sample is probably not available for most countries and populations, except for the Genographic project, which seems to be missing in action of late. There are also large commercial samples which benefit from the desire of paying customers with unusual haplotypes to look deeper into their ancestry. Unfortunately these same customers are WEIRD, and give us little information about most of mankind, including about the most interesting and mysterious aspects of human prehistory.

Nonetheless, there is hope for the future, as sample sizes continue to increase and genotyping costs to decrease. While there is reason to share Craig Venter's bleak assessment of the accomplishment of genomics, the single clear field where human genetics has triumphed and will continue to triumph is that of human origins.

UPDATE: Gene Expression notes that commercial companies like 23andMe have even larger samples, and customers can download 550k SNPs for their sample. However, most of the people who buy 23andMe tests are, in the global context, near clones of each other, being predominantly of western European origin. Moreover, the thousands of SNPs included in the technology used by 23andMe include a limited number of mtDNA and Y chromosome SNPs which have been chosen for their informativeness, i.e., they define studied clades of the phylogeny, and are thus unsuitable for discovering new clades, as was done in this paper. I'm pretty sure there are paragroups a-plenty in both the 23andMe customer base and the Genographic Project, but, as far as I know, neither of the two aggressively mines its data for SNP discovery/phylogeny refinement, and there are ethical limitations to consider, as people who sign up for either service do not necessarily approve of their DNA sample being used beyond the narrow scope of the provided service.

Molecular Biology and Evolution, doi:10.1093/molbev/msq219

Large-scale mtDNA screening reveals a surprising matrilineal complexity in East Asia and its implications to the peopling of the region

Qing-Peng Kong et al.

In order to achieve a thorough coverage of the basal lineages in the Chinese matrilineal pool, we have sequenced the mitochondrial DNA (mtDNA) control region and partial coding-region segments of 6,093 mtDNAs sampled from 84 populations across China. By comparing with the available complete mtDNA sequences, 194 of those mtDNAs could not be firmly assigned into the available haplogroups. Completely sequencing 51 representatives selected from these unclassified mtDNAs identified a number of novel lineages, including five novel basal haplogroups that directly emanate from the Eurasian founder nodes (M and N). No matrilineal contribution from the archaic hominid was observed. Subsequent analyses suggested that these newly identified basal lineages likely represent the genetic relics of modern humans initially peopling East Asia, instead of being the results of gene flow from the neighboring regions. The observation that most of the newly recognized mtDNA lineages have already differentiated and show the highest genetic diversity in southern China provided additional evidence in support of the Southern-Route peopling hypothesis of East Asians. Specifically, the enrichment of most of the basal lineages in southern China and their rather ancient ages in Late Pleistocene further suggested that this region was likely the genetic reservoir of modern humans after they entered East Asia.


August 16, 2010

Y chromosomes and mtDNA from Comoros islands

From the paper:

The low incidence of E-M293 (0.8%) and A-M91 (0%) on the Comoros contrasts strongly with the frequency of these haplogroups in East African populations.

This is an interesting piece of evidence in support of the idea of very recent genetic changes in east Africa.


A comparison of the relative incidences of E-M78(V22), E-M123, G, J, L, Q and R on the Comoros with populations around the Arabian Sea shows greatest similarities with Southern Iran, and, to a lesser extent, Turkey.


A possible source of the Northern Y-chromosomes is therefore the Shirazi traders from Southern Iran who established trading posts on the Comoros by 800YBP.
I have often noted that what you don't find in a population is often more informative about ancient history than what you do find, as it points -in the absence of a very small founding population- to its absence in the source populations. In the case of the Comoros, I note the absence of the R1b clade, which ties these islands with India and parts of the Middle East as the only R1b-less regions influenced by Caucasoids.

A trace of E-M78, in the form of E-V22 (0.5%) is also interesting, and certainly ties the Comoros with the interior of the Middle East where E-M78 is rare, rather than the more western regions where it is frequent.

The presence of 0.5% haplogroup I-P38 is also interesting, coupled with the absence of R1b: is this native Near Eastern haplogroup I or European admixture? Here are the I haplotypes for anyone interested in digging deeper into this:

DYS456 DYS389I DYS390 DYS389II DYS458 DYS19 DYS385a DYS385b DYS393 DYS391 DYS439 DYS635 DYS392 Y-GATA-H4 DYS437 DYS438 DYS448
15 12 24 28 17 15 14 19 13 11 11 25 11 11 14 11 19
16 13 23 30 16 14 15 16 12 10 12 21 11 13 15 9 21

Feel free to leave a comment if you figure out something extra about these chromosomes.
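
As a starting point for anyone who wants to dig in, here is a minimal Python sketch that loads the two haplotypes listed above and reports the markers at which they differ. The marker names are normalized to DYS389II and DYS437, the function name is my own, and a raw count of mismatching markers is only a rough first look, not a proper genetic distance.

```python
# Y-STR haplotypes of the two haplogroup I chromosomes, copied from the table above.
# Marker names are normalized (DYS389II, DYS437); values are repeat counts.
MARKERS = ("DYS456 DYS389I DYS390 DYS389II DYS458 DYS19 DYS385a DYS385b DYS393 "
           "DYS391 DYS439 DYS635 DYS392 Y-GATA-H4 DYS437 DYS438 DYS448").split()
HAP_1 = [int(x) for x in "15 12 24 28 17 15 14 19 13 11 11 25 11 11 14 11 19".split()]
HAP_2 = [int(x) for x in "16 13 23 30 16 14 15 16 12 10 12 21 11 13 15 9 21".split()]


def differing_markers(a, b, markers=MARKERS):
    """Return {marker: (allele_a, allele_b)} for every marker where a and b differ."""
    if not (len(a) == len(b) == len(markers)):
        raise ValueError("haplotypes and marker list must have equal length")
    return {m: (x, y) for m, x, y in zip(markers, a, b) if x != y}


diffs = differing_markers(HAP_1, HAP_2)
shared = [m for m in MARKERS if m not in diffs]
print(f"{len(diffs)} of {len(MARKERS)} markers differ; identical at: {shared}")
```

The two chromosomes differ at 16 of the 17 markers and match only at DYS392, so they are clearly not close relatives of each other.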

As for the SE Asian component:

We found the O1 lineage (6%) in the Comoros sample, providing genetic evidence for an SEA influence ... All but one of the Comorian O1 chromosomes are O1a-M50
There were also C* and K* lineages on the islands, which could also plausibly be SEA in origin. However, note that these are C*(xC1-5) and K*(xLMNOPQRST).

European Journal of Human Genetics advance online publication 11 August 2010; doi: 10.1038/ejhg.2010.128

Genetic diversity on the Comoros Islands shows early seafaring as major determinant of human biocultural evolution in the Western Indian Ocean

Said Msaidie et al.

The Comoros Islands are situated off the coast of East Africa, at the northern entrance of the channel of Mozambique. Contemporary Comoros society displays linguistic, cultural and religious features that are indicators of interactions between African, Middle Eastern and Southeast Asian (SEA) populations. Influences came from the north, brought by the Arab and Persian traders whose maritime routes extended to Madagascar by 700–900 AD. Influences also came from the Far East, with the long-distance colonisation by Austronesian seafarers that reached Madagascar 1500 years ago. Indeed, strong genetic evidence for a SEA, but not a Middle Eastern, contribution has been found on Madagascar, but no genetic trace of either migration has been shown to exist in mainland Africa. Studying genetic diversity on the Comoros Islands could therefore provide new insights into human movement in the Indian Ocean. Here, we describe Y chromosomal and mitochondrial genetic variation in 577 Comorian islanders. We have defined 28 Y chromosomal and 9 mitochondrial lineages. We show the Comoros population to be a genetic mosaic, the result of tripartite gene flow from Africa, the Middle East and Southeast Asia. A distinctive profile of African haplogroups, shared with Madagascar, may be characteristic of coastal sub-Saharan East Africa. Finally, the absence of any maternal contribution from Western Eurasia strongly implicates male-dominated trade and religion as the drivers of gene flow from the North. The Comoros provides a first view of the genetic makeup of coastal East Africa.


August 09, 2010

Local adaptation and admixture

This is a very interesting and well-written paper, and I highly recommend it (it's open access).

The authors deal with the problem of admixture between locally-adapted populations and newly introduced populations. Local adaptation is the occurrence of alleles and allele combinations that are well-suited to the local environment.

Admixture is a double-edged sword: on the one hand, it dilutes locally adapted gene pools by introducing foreign (non-adapted) alleles. On the other, it reduces homozygosity and inbreeding depression. Local adaptation is fitness enhancing, while inbreeding depression is fitness damaging. Whether one or the other wins out is probably case specific, and would depend e.g., on the level of adaptation, as well as the level of inbreeding depression in the population.

The paper does not address humans in particular, but a human example might be instructive. Consider the movement of tropically-adapted people into an arctic village. If inbreeding depression is substantial, then a proportion of the native-native (arctic) offspring would be homozygous for deleterious alleles and would not be able to compete successfully with invaders or native-invader offspring. On the other hand, invader and invader-native hybrids would lack locally adapted genomes (e.g. related to heat production) and would thus be at a disadvantage against functional natives.
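
To make the trade-off concrete, here is a toy calculation in Python. It is my own simplification and not the model used in the paper: I assume a fully native genome gains a fitness bonus `a` from local adaptation, native-native offspring lose a fraction `d` to inbreeding depression, hybrids keep half the adaptation (additivity) and pay no inbreeding cost, and invaders sit at a baseline of 1 (their own possible inbreeding is ignored). All names and parameter values are hypothetical.

```python
def offspring_fitness(a, d):
    """Toy relative fitnesses in the native environment (all values are assumptions).

    a: fitness bonus of a fully locally adapted genome (a >= 0)
    d: fraction of fitness lost by native-native offspring to inbreeding (0 <= d <= 1)
    """
    return {
        "native": (1 + a) * (1 - d),  # fully adapted, but pays the inbreeding cost
        "hybrid": 1 + a / 2,          # half the adaptation (additivity assumed), no inbreeding cost
        "invader": 1.0,               # baseline: no local adaptation
    }


def admixture_favoured(a, d):
    """True if hybrid offspring out-compete native-native offspring in this toy model."""
    w = offspring_fitness(a, d)
    return w["hybrid"] > w["native"]


# Strong local adaptation, mild inbreeding depression: natives do better unmixed.
print(admixture_favoured(a=0.5, d=0.1))
# Same adaptation, heavier inbreeding depression: hybrids win.
print(admixture_favoured(a=0.5, d=0.3))
# No local adaptation yet (a recently introduced population): any inbreeding cost favours admixture.
print(admixture_favoured(a=0.0, d=0.1))
```

Under these assumptions admixture is favoured only when d > a / (2(1 + a)), which is one simple way to see why the outcome is case specific.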

The consequences of admixture in introduced populations are also interesting. Such populations lack local adaptation, and they are also often less genetically diverse (as they represent an often small subset of founders drawn from a larger non-native population). Thus, they benefit from admixture doubly: by receiving locally adapted variants, and by increasing their genetic diversity and thus becoming more capable of adaptation (*).

Thus, in considering the process of admixture between native and introduced populations, we must take into account a few factors:
  • Inbreeding depression, generally reduced by admixture
  • Genetic incompatibilities between divergent gene pools, generally exposed by admixture
  • The potential for selection, generally increased by admixture
  • Local adaptation, generally reduced by admixture in the native population
The paper's utility is quite broad, and, while the authors do well not to limit themselves to a particular species, the implications for human societies are worth considering.

Proceedings of the Royal Society B doi:10.1098/rspb.2010.1272

Population admixture, biological invasions and the balance between local adaptation and inbreeding depression

Koen J. F. Verhoeven et al.

When previously isolated populations meet and mix, the resulting admixed population can benefit from several genetic advantages, including increased genetic variation, the creation of novel genotypes and the masking of deleterious mutations. These admixture benefits are thought to play an important role in biological invasions. In contrast, populations in their native range often remain differentiated and frequently suffer from inbreeding depression owing to isolation. While the advantages of admixture are evident for introduced populations that experienced recent bottlenecks or that face novel selection pressures, it is less obvious why native range populations do not similarly benefit from admixture. Here we argue that a temporary loss of local adaptation in recent invaders fundamentally alters the fitness consequences of admixture. In native populations, selection against dilution of the locally adapted gene pool inhibits unconstrained admixture and reinforces population isolation, with some level of inbreeding depression as an expected consequence. We show that admixture is selected against despite significant inbreeding depression because the benefits of local adaptation are greater than the cost of inbreeding. In contrast, introduced populations that have not yet established a pattern of local adaptation can freely reap the benefits of admixture. There can be strong selection for admixture because it instantly lifts the inbreeding depression that had built up in isolated parental populations. Recent work in Silene suggests that reduced inbreeding depression associated with post-introduction admixture may contribute to enhanced fitness of invasive populations. We hypothesize that in locally adapted populations, the benefits of local adaptation are balanced against an inbreeding cost that could develop in part owing to the isolating effect of local adaptation itself. The inbreeding cost can be revealed in admixing populations during recent invasions.


(*) Selection can proceed faster in more diverse populations, as selection depends on heterozygosity, and is, in fact, the differential survival of one allele over another at a locus.
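
The footnote can be illustrated with the simplest textbook case, which I am assuming here for illustration: a single locus with two alleles under haploid (genic) selection, where the favoured allele has fitness 1 + s and the other has fitness 1. The one-generation change in the favoured allele's frequency p is then s·p(1 − p)/(1 + s·p), so it scales with p(1 − p), which is half the expected heterozygosity 2p(1 − p). The function name and the values of s and p below are arbitrary.

```python
def delta_p(p, s):
    """Change in frequency p after one generation of haploid selection.

    The favoured allele has relative fitness 1 + s, the alternative has fitness 1.
    """
    mean_fitness = 1 + s * p
    return s * p * (1 - p) / mean_fitness


s = 0.1  # arbitrary selection coefficient, for illustration only
for p in (0.01, 0.1, 0.5, 0.9, 1.0):
    het = 2 * p * (1 - p)  # expected heterozygosity at this locus
    print(f"p={p:.2f}  heterozygosity={het:.3f}  delta_p={delta_p(p, s):.5f}")
```

When one allele is fixed (p = 0 or p = 1) there is no variation for selection to act on and the change is zero; the response is much larger at intermediate frequencies, where diversity at the locus is high.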

August 06, 2010

A rare genomic look at Aboriginal Australians

How strange that modern genetics is supposed to have invalidated the concept of race, yet, at every turn, it confirms most of the basic racial taxonomic observations of people working only with their eyes and, much later, their calipers.

On the left is the frappe analysis from the supplementary material; the Oceanian populations are seen on the far right.

The Australasid cluster emerges as an entity at K=5, showing Caucasoid admixture (AUR), Mongoloid admixture (MEL), and no apparent admixture (PAP).

At K=8 it is evident that the Caucasoid admixture in Aboriginal Australians is specifically European in origin, certainly the result of colonization in very recent times.

What can account for the Mongoloid admixture in Melanesians? It is probably the recent spread of Austronesian languages, arguably the most epic maritime language spread before Columbus, which affected a good deal of the southern hemisphere from Madagascar through Indonesia, Micronesia, Melanesia, and all the way to Polynesia on the far end.

As for the unadmixed Papuans, the indigenous inhabitants of New Guinea, their results are not surprising: there is a lack of admixture of East Asian Y chromosomes on the island, even in its most affected NW corner (Bird's Head) where this admixture runs only to about 2.5%.

The American Journal of Human Genetics, doi:10.1016/j.ajhg.2010.07.008

Whole-Genome Genetic Diversity in a Sample of Australians with Deep Aboriginal Ancestry

Brian P. McEvoy et al.

Australia was probably settled soon after modern humans left Africa, but details of this ancient migration are not well understood. Debate centers on whether the Pleistocene Sahul continent (composed of New Guinea, Australia, and Tasmania) was first settled by a single wave followed by regional divergence into Aboriginal Australian and New Guinean populations (common origin) or whether different parts of the continent were initially populated independently. Australia has been the subject of relatively few DNA studies even though understanding regional variation in genomic structure and diversity will be important if disease-association mapping methods are to be successfully evaluated and applied across populations. We report on a genome-wide investigation of Australian Aboriginal SNP diversity in a sample of participants from the Riverine region. The phylogenetic relationship of these Aboriginal Australians to a range of other global populations demonstrates a deep common origin with Papuan New Guineans and Melanesians, with little evidence of substantial later migration until the very recent arrival of European colonists. The study provides valuable and robust insights into an early and important phase of human colonization of the globe. A broader survey of Australia, including diverse geographic sample populations, will be required to fully appreciate the continent's unique population history and consequent genetic heritage, as well as the importance of both to the understanding of health issues.


Genome-wide study of Indian men

Not a type of study I usually cover, but it has a hidden gem in Figure 2, reproduced on the left, which shows a PCA plot of Indian men color-coded by religion and language.

The existence of two clusters is fairly obvious, while their interpretation is not, as dots of the same color appear in both clusters: a placement of these individuals in a global context might have been useful here. Things are clearer in the top cluster, which shows a clear gradient anchored by Punjabi Sikhs and Hindu Tamils on either end.

Also of interest is the group of isolated Muslim/Christian individuals on the left who deviate strongly from the mainstream; these probably represent exogenous elements that don't resemble the bulk of the Indian population.

PLoS ONE 5(8): e11961. doi:10.1371/journal.pone.0011961

A Genome-Wide Association Study of the Metabolic Syndrome in Indian Asian Men

Delilah Zabaneh, David J. Balding