Molecular Ecology DOI: 10.1111/j.1365-294X.2011.05361.x
Ancient DNA from an Early Neolithic Iberian population supports a pioneer colonization by first farmers
C. GAMBA et al.
The Neolithic transition has been widely debated particularly regarding the extent to which this revolution implied a demographic expansion from the Near East. We attempted to shed some light on this process in northeastern Iberia by combining ancient DNA (aDNA) data from Early Neolithic settlers and published DNA data from Middle Neolithic and modern samples from the same region. We successfully extracted and amplified mitochondrial DNA from 13 human specimens, found at three archaeological sites dated back to the Cardial culture in the Early Neolithic (Can Sadurní and Chaves) and to the Late Early Neolithic (Sant Pau del Camp). We found that haplogroups with a low frequency in modern populations (N* and X1) are found at higher frequencies in our Early Neolithic population (∼31%). Genetic differentiation between Early and Middle Neolithic populations was significant (FST∼0.13, P < 10−5), suggesting that genetic drift played an important role at this time. To improve our understanding of the Neolithic demographic processes, we used a Bayesian coalescence-based simulation approach to identify the most likely of three demographic scenarios that might explain the genetic data. The three scenarios were chosen to reflect archaeological knowledge and previous genetic studies using similar inferential approaches. We found that models that ignore population structure, as previously used in aDNA studies, are unlikely to explain the data. Our results are compatible with a pioneer colonization of northeastern Iberia at the Early Neolithic characterized by the arrival of small genetically distinctive groups, showing cultural and genetic connections with the Near East.
November 27, 2011
September 19, 2011
Inference of ancient human demography from individual genomes (Gronau et al. 2011)
This new paper is reminiscent of Li & Durbin (2011), in that it also fits a model of ancient human demography based on individual genome sequences. Unlike that paper, it also considers a San individual, and is hence a good realization of the project I proposed in response to the Li & Durbin paper.
Note that, unlike Li & Durbin, Gronau et al. do not consider a model with a structured African population, or the presence of archaic admixture. Such models could reproduce the observed divergence times through a combination of a younger divergence between modern human groups and admixture with a more distantly diverged (archaic or "Palaeoafrican") population, for which there is now genetic and palaeoanthropological evidence.
As is so often the case, the absolute age estimates are based on a calibration, which is spelled out quite nicely in the supplementary material (pdf; p. 55). In particular, the age estimates are based on:
- Human-chimp divergence of 6.5Mya
- Generation length of 25 years
an adjusted estimate of the per generation mutation rate would be slightly more than 2 × 10−8 mutations per site. This adjusted estimate agrees well with independent estimates of 1.8–2.5 ×10−8 (Nachman and Crowell, 2000; Kondrashov, 2003). It is slightly higher than recently reported estimates of 1.0–1.3 ×10−8 (The 1000 Genomes Project Consortium, 2010; Lynch, 2010; Roach et al., 2010), but, considering the many sources of uncertainty in these studies, we do not regard this difference as a serious concern. It is difficult to reconcile per-generation mutation rate estimates as low as 1×10−8 with the observed levels of human/chimpanzee genomic divergence.
However, Nachman & Crowell do not provide a mutation rate estimate independent of demography. As can be seen from Table 3 of their paper, their estimate depends on the assumed date of human-chimp speciation as well as on assumptions about ancestral effective population size. Moreover, they assume a generation length of 20 years, so their calibrations need to be rescaled: 6.5My in 25-year generations is equivalent to 5.2My in 20-year generations. Nachman and Crowell estimate the mutation rate at 2.5x10-8 and 1.4x10-8 with an effective size of 10,000 individuals and speciation at 5 or 5.5Mya.
Hence, assuming an effective population size of 10,000 individuals, their mutation rate estimate for 5.2My would be between 1.4x10-8 and 2.5x10-8, i.e., close to the value of Gronau et al. (2011). Gronau et al. estimate the effective population size at 9,000 individuals. So, there is nothing independent about N&C's rate estimate: it depends on the effective population size. Gronau et al.'s mutation rate/effective size estimate of 2.0x10-8/9,000 individuals may be consistent with the data, but so is a lower mutation rate combined with a higher effective size.
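The rescaling above is plain arithmetic: a calibration date fixes a number of generations, and that number of generations is then re-expressed in years under a different generation length. A minimal Python sketch (the function name is illustrative and does not come from either paper):

```python
def rescale_calibration(years: float, gen_from: float, gen_to: float) -> float:
    """Re-express a calibration date under a different generation length.

    The number of generations (years / gen_from) is held fixed and
    converted back to years using the new generation length gen_to.
    """
    generations = years / gen_from
    return generations * gen_to


# Gronau et al.'s 6.5 My split (25-year generations) in 20-year-generation terms:
print(rescale_calibration(6.5e6, 25, 20))  # 5200000.0
# Nachman & Crowell's 5 My split (20-year generations) in 25-year terms:
print(rescale_calibration(5.0e6, 20, 25))  # 6250000.0
```

The second line gives the 6.25 million year figure that comes up again in the discussion of Li & Durbin.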
I do not have a strong opinion on how the 2-fold mutation rate difference between different papers will be resolved. If the slower empirical estimates are accepted, this would result in deeper divergences between human populations, as well as an earlier human-chimp split, but the effect is not necessarily linear.
As I have noted before, there is no reason I can think of why parent-offspring rates should be slower than evolutionary ones. Two potential processes might actually make them appear faster: phantom mutations produced by errors in current whole-genome sequencing technology, or the loss of mutations due to drift across geological time scales. So, unless there is a technical reason for the low 1000 Genomes rate, I'm more inclined to trust it than circular calibrations of demography/mutation rate/effective population size. In any case, we will have more full genome sequences from family members in the coming years, so the mutation rate will be calibrated directly, without recourse to human-chimp speciation or ancestral population sizes.
A slower mutation rate would make sense to me on palaeoanthropological grounds:
- The authors estimate European/East Asian divergence at 30-45kya. But the presence of clearly derived Caucasoid morphology in the Upper Paleolithic population of Europe suggests to me that divergence may have begun some time before.
- Table S2 of adjusted Mahalanobis distances from Harvati et al. (2011) leaves little doubt that the Eurasian anatomically modern humans (EAM) from the Levant (Skhul/Qafzeh) are related to subsequent Eurasians. EAM has a distance of -0.25 to later Upper Cave from China (UC); 6.42 to recent Oceanians (OCE); 7.19 to Upper Paleolithic Eurasians. All of the above are well within the maximum divergence observed between any two modern human groups. Ancestral Eurasians likely lived before 100ky, and did not split from Africans only 50ky ago.
- If there was a long isolation between Khoe-San and the rest of mankind, then where did it happen? It is no longer plausible to postulate multiple fully modern groups in Africa that are absolutely absent from the palaeoanthropological record in the timeframe in question, in reproductive isolation from the multiple archaic or archaic-like ones that keep turning up.
- How did the ur-humans in Africa maintain reproductive isolation among themselves (Khoe-San vs. rest or moderns vs. archaics) for tens of thousands of years, yet apparently mix a-plenty with Neandertals/Denisovans right after they left Africa? Were Neandertal women really that sexy?
- Actually, the fragmentary record, as it stands, has not revealed any traces of a Proto-San population, and the Hofmeyr skull from South Africa stands as an outlier in the African paleoanthropological record with its strong affinities to Upper Paleolithic Eurasians.
We are only now beginning to harness the power of full human genomes for evolutionary inferences, but it is inevitable that a new theory of human origins will appear that will reconcile the different and conflicting lines of evidence. That theory must take into account latent admixture as a cause of African genetic diversity, and it must also harmonize with the paleoanthropological record.
Related:
Nature Genetics (2011) doi:10.1038/ng.937
Bayesian inference of ancient human demography from individual genome sequences
Ilan Gronau et al.
Whole-genome sequences provide a rich source of information about human evolution. Here we describe an effort to estimate key evolutionary parameters based on the whole-genome sequences of six individuals from diverse human populations. We used a Bayesian, coalescent-based approach to obtain information about ancestral population sizes, divergence times and migration rates from inferred genealogies at many neutrally evolving loci across the genome. We introduce new methods for accommodating gene flow between populations and integrating over possible phasings of diploid genotypes. We also describe a custom pipeline for genotype inference to mitigate biases from heterogeneous sequencing technologies and coverage levels. Our analysis indicates that the San population of southern Africa diverged from other human populations approximately 108–157 thousand years ago, that Eurasians diverged from an ancestral African population 38–64 thousand years ago, and that the effective population size of the ancestors of all modern humans was ~9,000.
July 17, 2011
A good idea for a new project
I was a little disappointed that the excellent new paper by Li and Durbin did not include full genomes from Palaeoafrican individuals, as these are perhaps the most interesting ones in terms of the deep ancestry of our species.
I was then reminded that a full genome of a Khoisan individual (KB1) was, in fact, published by Schuster et al. in 2010, and both the paper and the genome are freely available online.
Why is this interesting? Consider the following figure from Schuster et al. (2010):
Notice that the African hunter-gatherer (KB1) has 1,704 private SNPs compared to a Yoruba (NA19240) and Archbishop Desmond Tutu (ABT), and 2,038 SNPs compared to a European American (J. C. Venter) and a Chinese (YH).
This amount of private variation admits two explanations:
- Higher effective population size in Khoisan
- Deep population structure followed by admixture
There is no mystery why this is the case: accumulated genetic variation is a consequence of the mutation rate (how aggressively variation is introduced), and the effective population size (which controls how severely variation is lost due to drift).
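The trade-off between mutation rate and effective size can be made concrete with the standard neutral-model expectation for diversity, θ = 4N_eμ (an assumption: a constant-size, randomly mating diploid population under the infinite-sites model). Because diversity alone identifies only the product N_eμ, a genome with more private variation could reflect a larger N_e, a higher μ, or both. A short Python sketch with hypothetical numbers, not estimates from Schuster et al.:

```python
def expected_diversity(ne: float, mu: float) -> float:
    """Expected pairwise differences per site, theta = 4 * Ne * mu
    (standard neutral model, constant-size diploid population)."""
    return 4.0 * ne * mu


def implied_ne(theta: float, mu: float) -> float:
    """Effective size implied by an observed theta under an assumed mu."""
    return theta / (4.0 * mu)


# Two hypothetical (Ne, mu) pairs that predict the same diversity:
a = expected_diversity(10_000, 2.5e-8)
b = expected_diversity(20_000, 1.25e-8)
print(a, b)  # both approximately 1e-3

# The Ne inferred from one hypothetical observed theta depends on the assumed mu:
theta_obs = 1e-3
for mu in (2.5e-8, 1.25e-8):
    print(mu, round(implied_ne(theta_obs, mu)))
```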
A substantial difference in effective population size means that almost certainly the indiscriminate use of a single 2.5x10-8 mutation rate for different human populations is unwise.
This is, as far as I can tell, a serious limitation of the PSMC method introduced by Li & Durbin, as it assumes a single mutation rate parameter, which is then used to estimate past population sizes.
In any case, it would be interesting to see how deep the divergence of the Khoisan individual from other humans turns out to be, even if the 2.5x10-8 rate is employed, how large the Khoisan effective population size is, and how ancient a population substructure, followed by admixture within Africa, would be needed to "save the phenomena."
Another interesting observation is that the genealogical autosomal mutation rate in humans (1.1x10-8) is actually lower than the estimated evolutionary rate from human-chimpanzee divergence (2.5x10-8).
Nothing in evolutionary biology can account for such a discrepancy, I think, unless there is extreme balancing selection maintaining variation across the entire genome.
So, either:
- There is a serious flaw in the genealogical rate as estimated from 1000 Genomes trios, or
- We are about to find out that quite deep population structure and admixture played a role in the history of the genus Homo, deep in the sense of human-ape interbreeding after Homo-Pan speciation 7 million years ago, an idea that was proposed, for different reasons, a few years ago
For example, Li & Durbin propose that gene flow between Eurasians could have been effected during the Ice Age, as they retreated southwards; such a proposal is needed to account for a divergence between Europeans and East Asians of ~20ky, which is about half the age of the earliest known colonization of Europe. Halving the mutation rate harmonizes the genetic divergence with archaeology, but would push the divergence of Eurasians from West Africans back to the dawn of anatomical modernity, and the antiquity of African hunter-gatherers well beyond it.
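Genetic dates are obtained by dividing an observed amount of divergence by an assumed rate, so under a simple clock they scale inversely with that rate. The sketch below assumes divergence accumulates at rate μ per generation along each of two lineages (T = d / 2μ generations) and ignores ancestral polymorphism; under that simplification halving μ exactly doubles every date, whereas calibrations that also involve ancestral population size need not scale so simply. Function names and the divergence value are hypothetical.

```python
def divergence_time(d: float, mu_per_gen: float, gen_years: float = 25.0) -> float:
    """Years since a split, assuming per-site divergence d = 2 * mu * T
    (simple clock, ancestral polymorphism ignored)."""
    generations = d / (2.0 * mu_per_gen)
    return generations * gen_years


def rescale_date(years: float, mu_old: float, mu_new: float) -> float:
    """Re-express a date estimated under mu_old using mu_new instead."""
    return years * mu_old / mu_new


# A ~20 ky European/East Asian split under 2.5e-8 becomes ~40 ky at half that rate:
print(rescale_date(20_000, 2.5e-8, 1.25e-8))
# Under the 1.1e-8 genealogical rate, every date stretches by a factor of 2.5/1.1:
print(rescale_date(20_000, 2.5e-8, 1.1e-8))
# The same hypothetical divergence dated under each rate:
for mu in (2.5e-8, 1.25e-8):
    print(mu, divergence_time(1e-3, mu))
```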
I predict that the next few years will reignite many old debates in anthropology.
July 13, 2011
Human population history from single human genomes (Li & Durbin 2011)
I will update this blog entry when I read the paper. In the meantime, see Nature News and New Scientist.
UPDATE I
From the supplementary material (p. 8):
The TMRCA estimated by the PSMC model is in the units of mutation per site. To rescale TMRCA in the units of years, we need to know the mutation rate per site per year, which can be estimated by using closely related species. Table S1 implies that in primates, the mutation rate is broadly around 10−9 per site per year, the rate we used in rescaling the PSMC estimate (we assumed a 2.5 × 10−8 mutation rate per site per generation and a 25-year generation time, which is translated to a 1.0 × 10−9 mutation rate per site per year).
The mutation rate was an issue in another recent paper, which used a 2.36x10-8 rate similar to the one here, rather than the much lower rate from a couple of 1000 Genomes family trios.
However, recent direct measurement using whole genome sequences in pedigrees suggest that in the individuals examined the mutation rate per site per generation approaches 10−8 (Roach et al.,2010; 1000 Genomes Project Consortium, 2010), twice smaller than the rate we use. Nonetheless,what matters for population genetic based methods such as PSMC is the time average. A comparatively small fraction of higher mutation rates could change this average significantly. We therefore feel that although direct measurements are clearly valuable, there are not enough yet to change the mutation rates used in population genetic based analyses.
As I said in that post (and more recently), we clearly have a lot to learn about autosomal mutation rates yet, and hopefully we will both get a better estimate of the rate from more trios of the 1000 Genomes project, as well as establish possible population variation in that rate.
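The rescaling in the quoted supplement is a two-step unit conversion: divide the per-generation rate by the generation time to get a per-year rate, then divide a TMRCA expressed in mutations per site by that per-year rate. A minimal sketch of the arithmetic only (this is not the PSMC software, and the TMRCA value is hypothetical):

```python
def per_year_rate(mu_per_gen: float, gen_years: float) -> float:
    """Per-site, per-year mutation rate from a per-generation rate."""
    return mu_per_gen / gen_years


def tmrca_years(tmrca_mut_per_site: float, mu_per_year: float) -> float:
    """Convert a TMRCA expressed in expected mutations per site into years."""
    return tmrca_mut_per_site / mu_per_year


mu_year = per_year_rate(2.5e-8, 25)  # ~1.0e-9, as in the quoted supplement
print(mu_year)
# A hypothetical TMRCA of 1e-4 mutations per site:
print(tmrca_years(1e-4, mu_year))  # ~100,000 years
```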
UPDATE II (Divergence of Europeans and East Asians):
From the supplementary material (p. 13):
On the other hand, several studies using nuclear DNA placed the East Asian-European divergence around 17–25kya (Keinan et al., 2007; Garrigan et al., 2007; Gutenkunst et al., 2009). Our PSMC estimate from the combined Venter and YH X chromosomes is also very recent (Figure S7d). This leads to the apparent inconsistency with the fossil evidence that anatomically modern human have spread across the continent by at least 40kya. One of the possible explanations is that during the Last Glacial Maximum at about 20kya, the non-African populations retreated southward (Forster, 2004), and gene flows may have occurred between the different populations again. Under this hypothesis, the recent gene flow between YRI.X and KOR.X would be reasonable, although autosomal data from more populations are needed to further confirm the existence of the recent gene flow.
Gravel et al. suggested that there may have been "ghost populations" intermediate between Europeans and East Asians that suppress their divergence times; the explanation of Li & Durbin is different, but of the same kind.
An easy reconciliation of the archaeological divergence times with the genetic evidence would, of course, be immediately effected if the "slow" family-derived rate were adopted: this would double the West/East Eurasian split time to about 40kya, but would also push back the split of West Africans from Eurasians to the dawn of anatomical modernity (more than 200kya), and that of the African hunter-gatherers (not examined here) well into multiregional evolution time depths.
UPDATE III (Jul 14): (A chicken and egg problem)
The authors use a 2.5 × 10−8 mutation rate per site per generation and a 25-year generation time in the paper, citing Nachman and Crowell (2000).
Nachman and Crowell estimate this rate with a chimpanzee-human divergence at 5 million years and an ancestral population size of 10,000. However, since their generation length is 20 years, their 5 million years become 6.25 million in 25-year generation terms; the authors of the current paper (Table S1) put the human-chimp divergence at 7 million years.
What is most interesting is that the current paper estimates ancestral population sizes by fixing the mutation rate, whereas Nachman and Crowell (2000) estimated the mutation rate by making different assumptions about ancestral population size. For example, their rate of 2.5x10-8 assumes an ancestral population size of 10,000, whereas for an ancestral population size of 100,000 it becomes 1.5x10-8.
In other words, it's a chicken and egg problem: the mutation rate has been calibrated on assumptions about ancestral population size in the earlier paper; ancestral population size is estimated by using the mutation rate in the current one.
I really do think that the way forward is to get a better estimate of the mutation rate from actual parents and children, because I see no obvious way to go around the above-mentioned problem.
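The dependence of the calibrated rate on ancestral population size can be made explicit. Assuming the textbook expectation that per-site divergence between two species is d = 2μ(t + 2N_e), with t the split time in generations and 2N_e the mean extra coalescence time in a diploid ancestral population, solving for μ gives μ = d / (2(t + 2N_e)). The sketch back-calculates a hypothetical divergence from the 2.5x10-8 / 10,000 / 5 My / 20-year combination quoted above, then re-solves for μ with a tenfold larger ancestral size. It illustrates the direction and size of the effect; it is not Nachman and Crowell's actual procedure or data.

```python
def calibrated_rate(d: float, split_years: float, gen_years: float, ne: float) -> float:
    """Per-generation, per-site mutation rate from species divergence d,
    assuming d = 2 * mu * (t + 2 * Ne), with t the split time in generations."""
    t = split_years / gen_years
    return d / (2.0 * (t + 2.0 * ne))


# Hypothetical divergence implied by mu = 2.5e-8, Ne = 10,000, 5 My, 20-year generations
t_gen = 5.0e6 / 20
d_hyp = 2 * 2.5e-8 * (t_gen + 2 * 10_000)  # ~1.35% divergence

for ne in (10_000, 100_000):
    print(ne, calibrated_rate(d_hyp, 5.0e6, 20, ne))
# A tenfold larger ancestral size lowers the calibrated rate to ~1.5e-8.
```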

UPDATE IV (Jul 14): (Possible population structure)
From the paper:
All populations showed increased Ne between 60 and 200 kyr ago, about the time of origin of anatomically modern humans[17]. An alternative to an increase in actual population size during this time would be that there was population structure involving separation and admixture[11,16] (Supplementary Fig 5).
In the supplement, the authors consider a split into two or three sub-populations at 250ky followed by admixture at 60ky. In such a scenario, the pattern of growth between 200ky and 60ky can be explained without any actual growth taking place: the apparent growth is due to the admixture event between different types of humans.
I would also add that the greater apparent severity of the Eurasian bottleneck after 60ky (compared to Africans) may also be due to continued admixture in Africa, which keeps the apparent effective size high, whereas Eurasians, now beginning to move out of Africa, no longer have the opportunity to mix with archaic Africans.
UPDATE V (Jul 14): It is extremely unfortunate that this type of research was not carried out on Native Americans, Native Australians, and African hunter-gatherers. All of these would provide useful insight:
- Native Americans, because they would be somewhat immune to "late" gene flow with Africans that is hypothesized to have affected even East Asians
- Native Australians or Papuans, because their substantial hypothesized "Denisovan" admixture ought to register as an episode of "higher effective population size" prior to the admixture event
- African hunter-gatherers, because they, more than anyone else, would push the limits of inference to the past.
Nature (2011) doi:10.1038/nature10231
Inference of human population history from individual whole-genome sequences
Heng Li & Richard Durbin
The history of human population size is important for understanding human evolution. Various studies[1–5] have found evidence for a founder event (bottleneck) in East Asian and European populations, associated with the human dispersal out-of-Africa event around 60 thousand years (kyr) ago. However, these studies have had to assume simplified demographic models with few parameters, and they do not provide a precise date for the start and stop times of the bottleneck. Here, with fewer assumptions on population size changes, we present a more detailed history of human population sizes between approximately ten thousand and a million years ago, using the pairwise sequentially Markovian coalescent model applied to the complete diploid genome sequences of a Chinese male (YH)[6], a Korean male (SJK)[7], three European individuals (J. C. Venter[8], NA12891 and NA12878 (ref. 9)) and two Yoruba males (NA18507 (ref. 10) and NA19239). We infer that European and Chinese populations had very similar population-size histories before 10–20 kyr ago. Both populations experienced a severe bottleneck 10–60 kyr ago, whereas African populations experienced a milder bottleneck from which they recovered earlier. All three populations have an elevated effective population size between 60 and 250 kyr ago, possibly due to population substructure[11]. We also infer that the differentiation of genetically modern humans may have started as early as 100–120 kyr ago[12], but considerable genetic exchanges may still have occurred until 20–40 kyr ago.
December 01, 2010
Human genetic variation: the first 50 dimensions
Here is a huge data dump for anyone interested in human variation. Part of the reason I started the Dodecad Project was to be able to analyze data on my own, rather than having to squint to make sense of a plot, to speculate about what might show up at higher dimensions, or with more clusters, to wonder how the inclusion of additional populations would affect the results, and so on.
The following dataset represents the culmination (so far) of my efforts.
Number of SNP markers: ~177,000 as in here
Populations: 139
Individuals: 2,230
In the RAR file (~11MB) you will find 49 scatterplots (5000x5000 pixels each) representing the first 50 dimensions of a multi-dimensional scaling analysis of this dataset, together with information about the samples and their sources. There is a plot of the 1st and 2nd dimensions, 2nd and 3rd, 3rd and 4th, and so on, until the 49th and 50th.
I don't believe Picasa allows such huge pics, so I've made a few smaller ones (still 1600x1600 pixels each) to give you an idea of what to expect. Note that the legend in these smaller ones is only partly visible.
In all plots, population labels have been placed on the population averages; these usually correspond to blobs of data points belonging to that population, but occasionally they are shifted due to the presence of outliers.
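Each plot pairs consecutive axes of a multidimensional scaling. As a rough illustration of what such axes are, here is a toy classical (Torgerson) MDS in pure Python: it double-centres the squared distance matrix, extracts the leading eigenvectors by power iteration with deflation, and averages coordinates per population to find where labels would sit. The post does not say which software or genetic distance produced these plots, so everything here (function names, toy input) is hypothetical; it assumes Euclidean-like distances (non-negative leading eigenvalues) and is far too slow for a real dataset of 2,230 individuals.

```python
import math
import random


def _matvec(m, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def classical_mds(dist, k, iters=500, seed=0):
    """Classical (Torgerson) MDS: k coordinates per individual from a
    symmetric n x n distance matrix."""
    n = len(dist)
    # Double-centre the squared distances: B = -1/2 * J D^2 J
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]
    grand = sum(row_mean) / n
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand) for j in range(n)]
         for i in range(n)]

    rng = random.Random(seed)
    coords = [[0.0] * k for _ in range(n)]
    for axis in range(k):
        # Power iteration from a random unit vector converges to the leading eigenvector
        v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = _matvec(b, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # no variance left along the remaining axes
                break
            v = [x / norm for x in w]
        lam = sum(x * y for x, y in zip(v, _matvec(b, v)))  # Rayleigh quotient
        scale = math.sqrt(max(lam, 0.0))
        for i in range(n):
            coords[i][axis] = v[i] * scale
        # Deflate: remove this axis so the next pass finds the next one
        for i in range(n):
            for j in range(n):
                b[i][j] -= lam * v[i] * v[j]
    return coords


def population_centroids(coords, labels):
    """Average coordinates per population label (where plot labels would sit)."""
    sums, counts = {}, {}
    for xy, pop in zip(coords, labels):
        acc = sums.setdefault(pop, [0.0] * len(xy))
        for axis, x in enumerate(xy):
            acc[axis] += x
        counts[pop] = counts.get(pop, 0) + 1
    return {pop: [s / counts[pop] for s in acc] for pop, acc in sums.items()}


# Hypothetical toy input: four individuals from two made-up populations, A and B
points = [(0.0, 0.0), (4.0, 0.0), (0.0, 1.0), (4.0, 1.0)]
dist = [[math.dist(p, q) for q in points] for p in points]
coords = classical_mds(dist, k=2)
centroids = population_centroids(coords, ["A", "B", "A", "B"])

# Consecutive axis pairs, as in the plots: (1, 2), (2, 3), ..., (49, 50)
axis_pairs = [(a, a + 1) for a in range(1, 50)]
print(len(axis_pairs))  # 49
```

Because the axes come out in order of decreasing eigenvalue, the early plots capture the broadest structure and later ones progressively finer distinctions.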
Before I proceed, it might be worth giving a visual representation of the three poles of human variation in its broadest context: Basques/Sardinians, Mbuti/Biaka Pygmies, and She. Well, these are only marginally closer to the three poles than many others, but they will do:
Mbuti image by Mikael Strandberg; She image from Portraits of Chinese ethnic groups and links therein.
1 vs 2
3 vs 4
5 vs 6
7 vs 8
Inspection of these plots gives you an idea of why Clusters Galore works so well. It can detect "clusteredness" of individuals along multiple dimensions. It does not look at a series of 2D plots, but it considers proximity of individuals to each other along multiple dimensions, and adapts to the shape, size, and orientation of the clusters.
October 11, 2010
Deep ancestors of human DNA compatible with structured African population
(Last Update Oct 13)
This is a wonderful paper as it directly deals with the old coalescence times of human autosomal DNA and their presumed incompatibility with the Out of Africa model:
A genome-wide frequency distribution of the TMRCAs has been reported by curating the literature (Garrigan and Hammer 2006) but no systematic and consistent analysis has been performed in a single genome-wide data set. We report the first genome-wide estimation of the TMRCAs of anatomically modern humans, and we investigate if different scenarios of human evolutionary history are supported by this estimate.
The four scenarios considered by the authors are seen schematically in the following figure from the paper:
The Recent out of Africa: Single Origin Population model is the simple model that has found support in the shallow coalescence times of human Y-chromosomes and mtDNA and has made the jump to popular culture. In this model, humans are a young species that underwent a bottleneck, and Eurasians are descended from a group of Africans that left the continent. This model has been criticized for its perceived inability to explain deep divergence times in autosomal DNA.
The Recent out of Africa: Multiple Archaic Populations is the model I have advocated over the years (check out the "Palaeoafrican" label of this blog for my past writings on the subject). It agrees with the previous model on the recent African origin of modern Homo sapiens, but it states that the African population was structured rather than panmictic, i.e., divided into fairly isolated, long-standing subpopulations, and that Eurasians are descended from a single one of these African subpopulations (which I have termed "Afrasians").
The existence of a structured African population makes easy work of deep divergence times, as the variants that have such deep origins are presumed to have evolved separately in different African subpopulations, and then to have found themselves in the modern gene pool after the breakdown of this structure.
The Multi-Regional: Recent Admixture model is the one advocated by those seeing Neandertal and/or Homo erectus introgression in Eurasia. Like the previous two models, it agrees on the recent African origin of modern humans, but it sees a place for long-isolated pre-existing Eurasian hominids, who contributed some of their DNA to modern humans.
Like the previous model, this one has no problem with deep divergence times, as two variants with deep common ancestry are presumed to stem from the separated Eurasian and African Homo. This model has found recent support in the analysis of the Neandertal genome, but as both the authors of that study and I have stressed, the evidence for 1-4% Neandertal introgression into Eurasians has an alternative explanation consistent with the previous (Multiple Archaic Populations in Africa) model.
Finally, the Multi-Regional: Long Standing Admixture model sees no special place for Africa, except as the point of origin of human Y chromosomes and mtDNA. Humans are descended from Homo populations from around the world that have always maintained gene flow between them. This model obviously explains deep divergence times, but has a difficult time explaining the African origin of the uniparental markers, the palaeoanthropological evidence for an emergence of anatomical modernity in East Africa and the genetic evidence for a diminution of genetic variation in Eurasia with increasing distance from East Africa.
The authors seem to propose a fifth model, Ancestral Bottleneck, which posits a bottleneck 150,000 years ago in a possibly structured ancestral population. This model doesn't get its own figure, but it can be seen in the Single Origin Population model as "Potential bottleneck 150,000 years ago".
This model seems to combine elements of the first two: it is essentially a single-origin model for extant humans, but it keeps the possibility of structure in Africa prior to the bottleneck, and pushes the breakdown of this structure to before the bottleneck.
Here is what the distribution of TMRCAs for autosomal DNA, mtDNA, and Y-chromosomes looks like:
The authors observe that really old most recent common ancestors are predicted by all four models, so these are no reason to discount the Single Origin Population model. However, it is plain that the variance of TMRCAs observed for actual human autosomal DNA is large (the black curve is "flat"). Here is what they write:
The variance of the empirical TMRCAs is larger than the variance predicted by three of the four different models of human evolution (see Figure 2 and Supplementary Table 3), and this large variance has been interpreted as the result of archaic sub-structure in Africa (Harding and McVean 2004). Indeed, the Multiple Archaic Populations' (scenario 2) shows similar variance of TMRCAs as the empirical data, but the inflated variance of the empirical TMRCA estimates can also be due to variation in mutation or recombination rate across the 40 sequence-regions (McVean et al. 2004).
In other words, the variance is large (more young and old TMRCAs than expected), either because of variation in mutation and recombination rates (i.e., different genomic regions evolve at different paces), or because of multiple archaic populations. Unfortunately, the paper does not attempt to show how, e.g., a variable genome-wide mutation rate might flatten the TMRCA distributions of the three models that fail to reproduce the data.
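To get a feel for how wide the spread of TMRCAs is even under the simplest model, here is a minimal sketch of the standard (Kingman) coalescent for a single constant-size, panmictic population, with each locus simulated independently. The effective size (NE), sample size (N_SAMPLED), number of loci, and 25-year generation time are illustrative assumptions, not the paper's parameters, and mutation-rate variation is not modelled.

```python
import random
import statistics


def simulate_tmrca(n_sampled: int, n_copies: float, rng: random.Random) -> float:
    """TMRCA (generations) of n_sampled gene copies drawn from a constant
    population of n_copies gene copies, under the standard coalescent."""
    t = 0.0
    for k in range(n_sampled, 1, -1):
        # With k lineages, each of the k(k-1)/2 pairs coalesces at rate 1/n_copies.
        rate = k * (k - 1) / 2 / n_copies
        t += rng.expovariate(rate)
    return t


def expected_tmrca(n_sampled: int, n_copies: float) -> float:
    """Analytical mean TMRCA: 2 * n_copies * (1 - 1/n_sampled) generations."""
    return 2 * n_copies * (1 - 1 / n_sampled)


NE = 10_000      # assumed diploid effective population size
GEN_YEARS = 25   # assumed generation time in years
N_SAMPLED = 20   # hypothetical number of gene copies sampled per locus
N_LOCI = 20_000  # independent loci to simulate

rng = random.Random(1)
autosomal_copies = 2 * NE  # each diploid individual carries two autosomal copies
tmrcas = [simulate_tmrca(N_SAMPLED, autosomal_copies, rng) * GEN_YEARS
          for _ in range(N_LOCI)]

mean_years = statistics.fmean(tmrcas)
sd_years = statistics.stdev(tmrcas)
# Share of simulated loci younger than a threshold, analogous to the
# P(TMRCA < x) proportions of simulated gene trees reported by the authors.
frac_young = sum(t < 500_000 for t in tmrcas) / N_LOCI
```

The simulated mean lands close to the analytical 2 × n_copies × (1 − 1/n) generations, and the standard deviation is more than half the mean; the paper's point is that the empirical spread is wider still than three of the four models predict.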
When we look at uniparental markers (mtDNA and Y-chromosomes), all four models predict older ancestors than observed. Here is what they write:
The models of human evolution typically predict older TMRCAs compared to the estimated 170,000 years for mtDNA (Ingman et al. 2000) and the upper estimate of 100,000 years for the Y-chromosome (Tang et al. 2002; Wilder et al. 2004; Shi et al. 2010). For mtDNA, a TMRCA of 170,000 years is within the range of values predicted by the `Multiple Archaic Populations' scenario (P(TMRCA < 170,000) = 0.21), but the mitochondrial TMRCA estimate is difficult to reconcile with the remaining three scenarios (P < 4×10⁻²). For the Y-chromosome, a TMRCA of 100,000 years is clearly at odds with three of the models (P < 6×10⁻⁴), but for the `Multiple Archaic Populations' scenario with archaic African admixture, the proportion of simulated gene trees with TMRCAs younger than 100,000 years is larger than for the other three models, albeit quite small (P = 1.5×10⁻²).
Thus, while all four models can perhaps account for old autosomal TMRCAs (the "multiple archaics" model on its own, the other three with help from variable genome-wide evolution), none of them can account for the young ages of human Y-chromosomes and mtDNA. "Multiple archaics" again comes out on top: it is consistent with "mitochondrial Eve", and it comes closer (but not quite) to consistency with "Y-chromosome Adam".
There are ways to reconcile all four models with the uniparental markers, however. For the Multiple Archaic Populations model, the authors acknowledge that the Y-chromosome problem would go away if they increased the number of these populations from their current 3, while for the rest they invoke selection to account for the recency of human mtDNA and Y-chromosomes.
The effective population size tug of war
Parenthetically, it is important to note here the problem of the effective population size, as it has fueled quite a lot of sensationalistic media stories and documentaries (of the "humans were on the brink of extinction, and then a small band of them survived and went on to conquer the world" kind).
Here are some useful observations:
High effective population size => old TMRCAs
Low effective population size => young TMRCAs
Directional selection => young TMRCAs
Balancing selection => old TMRCAs
Structured population => old TMRCAs
In order to account for the recency of human Y-chromosomes and mtDNA, scientists came up with very low population sizes for our ancestors ("the endangered tribe" meme).
Unfortunately, this has the side effect of predicting very low ages for autosomal DNA, lower than observed! Fixing one problem creates another.
Can we have our cake and eat it too? One idea is to invoke balancing selection in autosomal DNA, i.e., the persistence of two variants at a given locus because they confer different advantages/disadvantages and an equilibrium between them exists, preventing either one from reaching its destiny of fixation.
Another idea is to invoke directional selection in Y-chromosomes and mtDNA. In directional selection, competing alleles are weeded out not by the winds of fortune (genetic drift), but by the supremacy of the successful alleles (Adam and Eve in our case), which push them aside.
A different idea is to invoke ancient population structure. This immediately adds time to the TMRCA (the time since the different sub-populations became separated), and can thus explain old divergence times.
A fourth idea is to invoke "technical" factors such as a variable mutation rate across the genome, or to see problems in the standard age estimates for Adam and Eve. That way you can explain why there are more old autosomal TMRCAs than your model predicts, or why Adam and Eve are younger.
No wonder there is no consensus among experts!
Conclusion
This paper certainly shows that the multiple archaic African populations model that I have advocated is a strong contender for being close to what actually happened. A priori, I think that the ecological and climatic variation in Africa (especially due to its north-south geographical orientation) and the long-established presence of Homo on the continent make it unlikely that a single population of Homo survived there at the expense of all others.
In short, I think that humans were never endangered in Africa, never dwindled to small numbers (inferred ancestral effective population sizes in the paper are 8k for Multiple Archaic Populations and 14k for Ancestral Bottleneck), and were not a single panmictic population spanning ecological niches and climate zones.
Rather, there were always separate populations in Africa, and climatic change (and, more lately, behavioral/subsistence change) has resulted in an ever-present process of population fusions and fissions. One of these sub-populations, living somewhere in East Africa, accumulated enough biological advantages to become extremely successful: it populated Eurasia on the one hand, where some admixture with archaic Eurasians may have taken place, and it also successfully populated the rest of Africa, where it absorbed the other subpopulations of Homo on the continent.
UPDATE (Oct 13): There is some discussion of the paper and my own theories in Gene Expression, wherein Chris Stringer, a leading proponent of the "Recent out of Africa: Single Origin Population" model, says that:
My new book covers all this, and your recent work, but I do agree with Dienekes on the importance of deep African population substructure to the story.
While Gregory Cochran thinks I'm wrong:
Dienekes is wrong about the Neanderthal interbreeding results being explained by African population substructure, but there are a lot of indications that there was significant substructure. A lot of this involves work that is not yet published: I look forward to seeing the details. Some of what I hear is remarkable.
For myself, I'm waiting to see data on native East Africans on segments of "Neandertal" ancestry. Let's look at native groups from Somalia, Kenya, Ethiopia, and Tanzania with limited Caucasoid admixture and see how much "Neandertal" ancestry they have. If they don't have any, then "Neandertal" genes must have a Eurasian admixture explanation. If they have only a little, then it can be explained by Caucasoid admixture in more recent times. But if they have much more "Neandertal" ancestry than Caucasoid admixture can explain, then the obvious solution is African population substructure.
Mol Biol Evol (2010) doi: 10.1093/molbev/msq265
Deep divergences of human gene trees and models of human origins
Michael GB Blum and Mattias Jakobsson
Two competing hypotheses are at the forefront of the debate on modern human origins. In the first scenario, known as the recent Out-of-Africa hypothesis, modern humans arose in Africa about 100,000-200,000 years ago, and spread throughout the world by replacing the local archaic human populations. By contrast, the second hypothesis posits substantial gene flow between archaic and emerging modern humans. In the last two decades, the young time estimates – between 100,000 and 200,000 years – of the most recent common ancestors for the mitochondrion and the Y-chromosome provided evidence in favor of a recent African origin of modern humans. However, the presence of very old lineages for autosomal and X-linked genes has often been claimed to be incompatible with a simple, single origin of modern humans. Through the analysis of a public DNA sequence database, we find, similar to previous estimates, that the common ancestors of autosomal and X-linked genes are indeed very old, living, on average, respectively 1,500,000 and 1,000,000 years ago. However, contrary to previous conclusions, we find that these deep gene genealogies are consistent with the Out-of-Africa scenario provided that the ancestral effective population size was approximately 14,000 individuals. We show that an ancient bottleneck in the Middle Pleistocene, possibly arising from an ancestral structured population, can reconcile the contradictory findings from the mitochondrion on the one hand, with the autosomes and the X-chromosome on the other hand.
Link
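A back-of-the-envelope check of the abstract's numbers: under a constant-size coalescent, the expected TMRCA of a large sample is about 2 × (number of gene copies) generations, i.e. roughly 4Ne for autosomes, 3Ne for the X, and Ne for mtDNA or the Y, assuming equal numbers of males and females. The 25-year generation time is an assumption, and the paper's own model includes a bottleneck, so this is an order-of-magnitude sketch only, not the authors' calculation.

```python
from typing import Optional

GEN_YEARS = 25  # assumed generation time in years (not taken from the paper)

# Gene copies per diploid individual, assuming a 1:1 sex ratio.
COPIES_PER_INDIVIDUAL = {
    "autosome": 2.0,  # two copies in everyone
    "X": 1.5,         # two copies in females, one in males
    "mtDNA": 0.5,     # one copy, transmitted by females only
    "Y": 0.5,         # one copy, transmitted by males only
}


def expected_tmrca_years(ne: float, locus: str, n_sampled: Optional[int] = None,
                         gen_years: float = GEN_YEARS) -> float:
    """Mean TMRCA under a constant-size coalescent: 2 * copies * (1 - 1/n) generations.

    n_sampled=None uses the large-sample limit, where (1 - 1/n) -> 1.
    """
    n_copies = COPIES_PER_INDIVIDUAL[locus] * ne
    sample_factor = 1.0 if n_sampled is None else 1 - 1 / n_sampled
    return 2 * n_copies * sample_factor * gen_years


for locus in COPIES_PER_INDIVIDUAL:
    print(f"{locus:>8}: {expected_tmrca_years(14_000, locus):>12,.0f} years")
```

With Ne = 14,000 this gives about 1.4 million years for autosomes and 1.05 million for the X, close to the averages in the abstract, but about 350,000 years for mtDNA, well above the ~170,000-year mitochondrial estimate; that mismatch is the tension the abstract's Middle Pleistocene bottleneck is meant to resolve.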
July 14, 2010
mtDNA of Yemeni and Ethiopian Jews
From the paper:
Mitochondrial DNA analysis also revealed a high diversity of sub-Saharan African and Eurasian haplotypes in both the Yemenite and Ethiopian Jewish populations (see Fig. 2). Specifically, common haplotypes (haplotypes present at >5%) in Yemenite Jews include the African haplogroup L3x1 and Eurasian haplogroups R0a [renamed from (preHV)1 (Torroni et al., 2006)], HV1, J2a1a [renamed from J1b (Palanichamy et al., 2004)], K, R2, U, and U1, and in Ethiopian Jews include African haplogroups L2a1b2 and L5a1 and Eurasian haplogroups R0a and M1a1 (see Fig. 2). Overall, sub-Saharan African L haplotypes [hereafter referred to as L(xM,N), i.e., all African haplotypes except M and N, following the nomenclature of Behar et al. (2008)] comprise a large proportion of the genetic variation in both Jewish populations, representing 20% in the Yemenite Jews and 50% in Ethiopian Jews. This high frequency contrasts with other Jewish populations, such as Near Eastern and Ashkenazi Jews, who almost entirely lack L(xM,N) haplogroups (Thomas et al., 2002; Richards et al., 2003).
I think that the authors' conclusion that Yemenite Jews are partially descended from Israeli exiles is premature. Sure, they can exclude large-scale introgression of Yemeni mtDNA, but the universe of possibilities is not limited to either Israeli or Yemenite.
The way I see it, only a large-scale study of all global Jewish populations may uncover verified ancient Jewish lineages for both Y-chromosomes and mtDNA. The recent studies on Jews have uncovered several genetic sub-clusters of Jews, and only lineages that occur in 2 or more of these clusters, preferably geographically separated ones, have a strong claim to representing original Jewish lineages. There is a limit to what can be uncovered about the past from the study of living populations.
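The criterion above (a lineage found in two or more Jewish sub-clusters, preferably in different regions) can be made explicit as a simple filter. This is a hypothetical sketch: the function name, cluster names, regions, and lineage labels are placeholders, not data from any study.

```python
from collections import defaultdict
from typing import Dict, Optional, Set


def candidate_shared_lineages(
    clusters: Dict[str, Set[str]],
    regions: Optional[Dict[str, str]] = None,
    min_clusters: int = 2,
) -> Dict[str, Set[str]]:
    """Return lineages found in at least `min_clusters` clusters.

    If `regions` maps each cluster to a geographic region, also require that
    the supporting clusters span at least two different regions.
    """
    found_in: Dict[str, Set[str]] = defaultdict(set)
    for cluster, lineages in clusters.items():
        for lineage in lineages:
            found_in[lineage].add(cluster)

    result = {}
    for lineage, where in found_in.items():
        if len(where) < min_clusters:
            continue
        if regions is not None and len({regions[c] for c in where}) < 2:
            continue  # shared only within one region: weaker evidence
        result[lineage] = where
    return result


# Hypothetical clusters and lineage labels, for illustration only.
toy_clusters = {
    "cluster_A": {"lineage_1", "lineage_2"},
    "cluster_B": {"lineage_2", "lineage_3"},
    "cluster_C": {"lineage_3", "lineage_4"},
}
toy_regions = {"cluster_A": "north", "cluster_B": "north", "cluster_C": "south"}
```

Here lineage_2 and lineage_3 each occur in two clusters, but only lineage_3 survives the stricter geographic requirement, since lineage_2 is confined to two clusters in the same region.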
American Journal of Physical Anthropology doi: 10.1002/ajpa.21360
Mitochondrial DNA reveals distinct evolutionary histories for Jewish populations in Yemen and Ethiopia
Amy L. Non et al.
Abstract
Southern Arabia and the Horn of Africa are important geographic centers for the study of human population history because a great deal of migration has characterized these regions since the first emergence of humans out of Africa. Analysis of Jewish groups provides a unique opportunity to investigate more recent population histories in this area. Mitochondrial DNA is used to investigate the maternal evolutionary history and can be combined with historical and linguistic data to test various population histories. In this study, we assay mitochondrial control region DNA sequence and diagnostic coding variants in Yemenite (n = 45) and Ethiopian (n = 41) Jewish populations, as well as in neighboring non-Jewish Yemeni (n = 50) and Ethiopian (previously published Semitic speakers) populations. We investigate their population histories through a comparison of haplogroup distributions and phylogenetic networks. A high frequency of sub-Saharan African L haplogroups was found in both Jewish populations, indicating a significant African maternal contribution unlike other Jewish Diaspora populations. However, no identical haplotypes were shared between the Yemenite and Ethiopian Jewish populations, suggesting very little gene flow between the populations and potentially distinct maternal population histories. These new data are also used to investigate alternate population histories in the context of historical and linguistic data. Specifically, Yemenite Jewish mitochondrial diversity reflects potential descent from ancient Israeli exiles and shared African and Middle Eastern ancestry with little evidence for large-scale conversion of local Yemeni. In contrast, the Ethiopian Jewish population appears to be a subset of the larger Ethiopian population suggesting descent primarily through conversion of local women.
Link
January 02, 2010
R-V88 and migration of Chadic speakers across the Sahara
The presence of R1b chromosomes in Africa is one of a few Y-chromosome phylogeographic anomalies I noted long ago. This new paper offers insight into the migration of these chromosomes, along with the Chadic branch of Afroasiatic, from Asia to Africa. More on this after I read the paper.
European Journal of Human Genetics doi:10.1038/ejhg.2009.231
Human Y chromosome haplogroup R-V88: a paternal genetic record of early mid Holocene trans-Saharan connections and the spread of Chadic languages
Fulvio Cruciani et al.
Abstract
Although human Y chromosomes belonging to haplogroup R1b are quite rare in Africa, being found mainly in Asia and Europe, a group of chromosomes within the paragroup R-P25* are found concentrated in the central-western part of the African continent, where they can be detected at frequencies as high as 95%. Phylogenetic evidence and coalescence time estimates suggest that R-P25* chromosomes (or their phylogenetic ancestor) may have been carried to Africa by an Asia-to-Africa back migration in prehistoric times. Here, we describe six new mutations that define the relationships among the African R-P25* Y chromosomes and between these African chromosomes and earlier reported R-P25 Eurasian sub-lineages. The incorporation of these new mutations into a phylogeny of the R1b haplogroup led to the identification of a new clade (R1b1a or R-V88) encompassing all the African R-P25* and about half of the few European/west Asian R-P25* chromosomes. A worldwide phylogeographic analysis of the R1b haplogroup provided strong support to the Asia-to-Africa back-migration hypothesis. The analysis of the distribution of the R-V88 haplogroup in >1800 males from 69 African populations revealed a striking genetic contiguity between the Chadic-speaking peoples from the central Sahel and several other Afroasiatic-speaking groups from North Africa. The R-V88 coalescence time was estimated at 9200–5600 kya, in the early mid Holocene. We suggest that R-V88 is a paternal genetic record of the proposed mid-Holocene migration of proto-Chadic Afroasiatic speakers through the Central Sahara into the Lake Chad Basin, and geomorphological evidence is consistent with this view.
Link
UPDATE (8/1/10):
The paper, to its credit, acknowledges that the "effective mutation rate" depends on population growth history, as I argued a year and a half ago. The authors write:
Owing to the uncertainties associated with the estimate of the evolutionary effective microsatellite mutation rates, depending on the haplogroup demographic history,³⁷ we considered two different population models: (1) a constant size population and (2) a single rate of m=0.01 for exponential population growth. After calibration for the specific microsatellites used in this study,¹³ we found evolutionary effective mutation rates of 7.9×10⁻⁴ and 1.3×10⁻³, respectively.
and:
As an upper limit, we used the coalescence time of the R-M343/P25 haplogroup (12.9 ky, 95% CI=11.6–14.3 ky, under a conservative scenario of constant population size), which, on the basis of the accumulated nucleotide and microsatellite diversity (Table 1; Figure 2), most likely originated outside Africa. The coalescence time of the seemingly African-specific haplogroup R-V69 (6.0 ky, 95% CI=4.2–8.2 ky, under the hypothesis of an expanding population) was used as a lower limit.
As I noted in haplogroup sizes and observation selection effects, haplogroup sizes provide a sanity check on assumptions about population growth history:
Haplogroups do not reach commonly-observed present-day sizes under the assumption of constant population size. Inferences of age based on such an assumption are a very conservative upper limit. However, the assumption of m=0.01 also does not result in "large" present day haplogroups (see previous link).
Thus, I suppose that the age of R-V88 is younger than 4.2–8.2 ky, and could be as young as ~3-4ky in a rapidly expanding population. To determine how fast R-V88 actually grew, we must take into account its present-day demographic size (how many people in the world now possess it). The final estimate must be consistent with both the demographic size and the current Y-STR variance.
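How strongly these dates hinge on the assumed rate can be sketched with a simple rescaling. The sketch assumes that the age estimate is inversely proportional to the effective mutation rate, as it is for variance- or ASD-based STR estimators (age ≈ V/μ generations); the paper's exact estimator may differ, and the 2.0×10⁻³ "fast" rate and 25-year generation time below are illustrative assumptions, not values from the paper.

```python
GEN_YEARS = 25  # assumed generation time in years


def age_from_variance(variance: float, rate: float, gen_years: float = GEN_YEARS) -> float:
    """Age in years from average per-locus STR variance, using age ~ V / rate generations."""
    return variance / rate * gen_years


def rescale_age(age: float, rate_used: float, rate_new: float) -> float:
    """Re-express an age computed with rate_used as if rate_new had been used."""
    return age * rate_used / rate_new


RATE_CONSTANT = 7.9e-4  # from the paper: constant-size model
RATE_GROWTH = 1.3e-3    # from the paper: exponential growth, m = 0.01
RATE_FAST = 2.0e-3      # hypothetical faster effective rate (assumption)

# R-M343/P25: 12.9 ky under the constant-size rate, re-expressed under the growth rate.
m343_growth = rescale_age(12.9, RATE_CONSTANT, RATE_GROWTH)  # about 7.8 ky
# R-V69: 6.0 ky under the growth rate, re-expressed under the hypothetical faster rate.
v69_fast = rescale_age(6.0, RATE_GROWTH, RATE_FAST)          # about 3.9 ky
```

Under this assumption the same Y-STR variance yields ages that shrink in direct proportion as the effective rate approaches faster values, which is how an expansion-aware rate can pull estimates toward the ~3-4 ky range.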
I don't have data on R-V88 prevalence today, but it really doesn't take a very large haplogroup in order to infer a very fast growth rate, and a Y-STR variance accumulation rate (effective rate) close to the germline one. Therefore, I am guessing that R-V88 is also one of a growing palette of haplogroups that expanded during the Bronze Age.
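The link between present-day haplogroup size and growth rate can be sketched with a deterministic exponential model: a lineage founded by one man t generations ago and growing at rate r per generation has about e^(rt) carriers today, so reaching S carriers requires r = ln(S)/t. This simplification ignores drift and lineage extinction; reading m = 0.01 as a per-generation rate is an assumption; and the 100,000-carrier target is hypothetical, since no present-day R-V88 count is given here.

```python
import math

GEN_YEARS = 25  # assumed generation time in years


def carriers_today(rate: float, generations: float) -> float:
    """Expected carriers from a single founder under deterministic exponential growth."""
    return math.exp(rate * generations)


def required_rate(size_today: float, generations: float) -> float:
    """Per-generation growth rate needed for one founder to reach size_today."""
    return math.log(size_today) / generations


gens = 6_000 / GEN_YEARS                # a 6,000-year-old lineage: 240 generations
at_m = carriers_today(0.01, gens)       # about 11 carriers at m = 0.01
needed = required_rate(100_000, gens)   # hypothetical 100,000 carriers today
```

Even a modest hypothetical present-day size implies a growth rate several times larger than m = 0.01, which is the sense in which a haplogroup need not be very large to imply fast growth.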
October 30, 2008
"Phoenician" Y-chromosomes
It has been several years since the inception of the Genographic project, and to say that the quantity and quality of the work produced by it is underwhelming would be charitable.
The newest bit of Genographic wisdom is that haplogroup J2 in the Mediterranean is associated not with the Neolithic, Greek, or other population movements, but with the seafaring Phoenicians. They achieve this feat by (allegedly) comparing areas of Phoenician influence with those of no (or low) such influence.
I have intentionally limited myself to five major weak points of the study: to cover more would be too time-consuming and unnecessary.

1. The Hellenistic age did not happen
A central assumption of this work is that the conquest and occupation of the Middle East by Alexander the Great does not count as Greek influence, despite the centuries of Greek domination that followed, in both Hellenistic and later Roman times.
The authors write that their method could be further used to:
Thus, the population of Phoenicia and its "periphery" is implicitly assumed to be free of Greek influence. That is a bizarre contention, given that Greek was spoken in "Phoenicia" long after the Phoenician language became extinct.
2. Crete was influenced by the Phoenicians
This totally unsupported claim is necessary for the authors' thesis, since Crete has the world maximum of haplogroup J2. I have no doubt that Phoenicians traded with Cretans, just as Cretans traded with Phoenicians. But, that is no excuse to think of Crete as an area of Phoenician influence.
Indeed, settlement of the Levant by Aegean peoples is archaeologically supported, while Phoenician settlement of Crete is not.
But, speaking of Phoenician settlement, the only area of Greece where such settlement is believed to have taken place is in mainland Greece, in Thebes, where Cadmus and his Phoenicians founded Cadmeis. I doubt that this had any substantial effect, but if the authors wanted to be intellectually honest, they would list this as an area of Phoenician influence, rather than Crete.
3. West Asia Minor (or the Pontus) was not colonized by Greeks
The most laughable claim of the authors (see map) is the absence of blue (Greek) dots on West Asia Minor, and the Pontus (Northeast Turkey). Apparently the Greek colonies of the far West (such as Marseilles) count as areas of Greek influence, while the countless Greek cities on the Asian side of the Aegean, or in northeast Turkey do not.
The motivation of this is obvious, since Asia Minor is a J2-heavy area and asserting the Greek influence there would upset the paper's thesis. But, it is absurd to place blue dots in Paphlagonia and Caria and not in Ionia or the Pontus.
4. Modern Lebanese are descendants of Phoenicians
This central assumption of the paper has no actual support, except for a vague geographical congruence. Modern Lebanese are a hybrid people, divided into Christians and Muslims. Both are Arabs, with Muslims being more influenced by the original Arabians, and Christians more influenced by the pre-Arab (Greco-Syrian) and post-Arab (West European) migrations. Perhaps, there is a trace of Phoenician genes in them, but this is really not a self-evident claim.
5. R1b in Greece and Turkey is due to the Celts
R1b in Greece and Turkey belongs primarily to the "eastern" variety, not the "western" one. It is in Italy and north of Greece that the two varieties begin to blend with each other. No care is taken to distinguish between these varieties.
Certainly, some R1b in this region may be due to Western Europeans (e.g. from the period of the Frankokratia), but to assign its totality to this factor is nonsensical. Apparently, the geniuses of the Genographic project have decreed that the brief foray of the Celts into Greece introduced massive amounts of R1b, but a thousand years of Greco-Roman domination of the Levant did nothing of the kind.
6 (bonus). Haplogroup J2 is more frequent in East than in West Sicily
Sicily is an island which had well-documented and not insignificant settlements by both Greeks and Phoenicians. Moreover, these settlements were geographically divided: Greeks in the East, Phoenicians in the West. It is in the East that J2 has its highest frequency, and not in the Phoenician West.
Conclusion
Is there anything of value in this paper? Well, it's a good idea to try to correlate Y-chromosome distribution with historical rather than pre-historical events. Too bad the authors botched the job, but their paper can at least serve as a reference point for how not to go about doing it.
UPDATE: Take a look at the "haplotype groups" suggested by the authors as signals of Phoenician and Greek colonization.

Not only are haplotype groups not clades (they do not designate common ancestry), but 7-marker haplotypes don't even designate anything that can be remotely tied to the time period in question, given the huge confidence intervals associated with even larger numbers of markers. Feel free to plug these haplotypes into YHRD or Ysearch to find plenty of long-lost "Phoenicians" all over the planet.
UPDATE II: The "evolutionary" mutation rate rears its ugly head
From the paper:
The modern Lebanese are Arabs, as are most modern North Africans where Phoenician colonies were founded. The Arabs also affected several Mediterranean islands, as well as Iberia. One would think that the most salient feature of modern Mediterranean populations would be mentioned in a paper which attempted to trace patterns of Y-chromosome variation in the Mediterranean.
Certainly, the Neolithic, Greek, and Phoenician migrations, as well as the Jewish Diaspora moved people around. But the Phoenicians have been extinct for 2,000 years. The Jews had (and have) communities around the Mediterranean, but did not amount to a significant population element anywhere. It is the Arabs who are the elephant in the room, and yet they are ignored. Are similarities between the Levant, North Africa and Spain due to Phoenicians or due to this later Arab movement? By failing to trace the distribution of their "Phoenician colonization signals" among Arabians, the authors have overstated their case.
American Journal of Human Genetics doi: 10.1016/j.ajhg.2008.10.012
Identifying Genetic Traces of Historical Expansions: Phoenician Footprints in the Mediterranean
Pierre A. Zalloua et al.
Abstract
The Phoenicians were the dominant traders in the Mediterranean Sea two thousand to three thousand years ago and expanded from their homeland in the Levant to establish colonies and trading posts throughout the Mediterranean, but then they disappeared from history. We wished to identify their male genetic traces in modern populations. Therefore, we chose Phoenician-influenced sites on the basis of well-documented historical records and collected new Y-chromosomal data from 1330 men from six such sites, as well as comparative data from the literature. We then developed an analytical strategy to distinguish between lineages specifically associated with the Phoenicians and those spread by geographically similar but historically distinct events, such as the Neolithic, Greek, and Jewish expansions. This involved comparing historically documented Phoenician sites with neighboring non-Phoenician sites for the identification of weak but systematic signatures shared by the Phoenician sites that could not readily be explained by chance or by other expansions. From these comparisons, we found that haplogroup J2, in general, and six Y-STR haplotypes, in particular, exhibited a Phoenician signature that contributed > 6% to the modern Phoenician-influenced populations examined. Our methodology can be applied to any historically documented expansion in which contact and noncontact sites can be identified.
Link
The newest bit of Genographic wisdom is that haplogroup J2 in the Mediterranean is associated not with the Neolithic, Greek, or other population movements, but with the seafaring Phoenicians. The authors achieve this feat by (allegedly) comparing areas of Phoenician influence with areas of no (or low) such influence.
I have intentionally limited myself to five major weak points of the study: to cover more would be too time-consuming and unnecessary.

1. The Hellenistic age did not happen
A central assumption of this work is that the conquest and occupation of the Middle East by Alexander the Great does not count as Greek influence, despite the centuries of Greek domination that followed, both in Hellenistic and later in Roman times.
The authors write that their method could be further used to:
include systematic investigations of military expansions, such as the Greek signal, from the time of Alexander the Great in central and south Asia
Apparently they didn't think of applying it to West Asia itself, which was also conquered by Alexander the Great, and in which the Greek-speaking element persisted far longer than in "south Asia".
Thus, the population of Phoenicia and its "periphery" is implicitly assumed to be free of Greek influence. That is a bizarre contention, given that Greek was spoken in "Phoenicia" long after the Phoenician language became extinct.
2. Crete was influenced by the Phoenicians
This totally unsupported claim is necessary for the authors' thesis, since Crete has the highest frequency of haplogroup J2 in the world. I have no doubt that Phoenicians traded with Cretans, just as Cretans traded with Phoenicians. But that is no excuse for treating Crete as an area of Phoenician influence.
Indeed, settlement of the Levant by Aegean peoples is archaeologically supported, while Phoenician settlement of Crete is not.
But speaking of Phoenician settlement, the only area of Greece where such settlement is believed to have taken place is in mainland Greece, at Thebes, where Cadmus and his Phoenicians founded Cadmeis. I doubt that this had any substantial effect, but if the authors wanted to be intellectually honest, they would have listed this as an area of Phoenician influence, rather than Crete.
3. West Asia Minor (or the Pontus) was not colonized by Greeks
The most laughable claim of the authors (see map) is the absence of blue (Greek) dots in western Asia Minor and the Pontus (northeast Turkey). Apparently the Greek colonies of the far West (such as Marseilles) count as areas of Greek influence, while the countless Greek cities on the Asian side of the Aegean, or in northeast Turkey, do not.
The motivation for this is obvious: Asia Minor is a J2-heavy area, and acknowledging Greek influence there would upset the paper's thesis. But it is absurd to place blue dots in Paphlagonia and Caria and not in Ionia or the Pontus.
4. Modern Lebanese are descendants of Phoenicians
This central assumption of the paper has no actual support, except for a vague geographical congruence. Modern Lebanese are a hybrid people, divided into Christians and Muslims. Both are Arabs, with Muslims being more influenced by the original Arabians, and Christians more influenced by the pre-Arab (Greco-Syrian) and post-Arab (West European) migrations. Perhaps there is a trace of Phoenician genes in them, but this is hardly a self-evident claim.
5. R1b in Greece and Turkey is due to the Celts
R1b in Greece and Turkey belongs primarily to the "eastern" variety, not the "western" one. It is in Italy and north of Greece that the two varieties begin to blend with each other. The authors take no care to distinguish between them.
Certainly, some R1b in this region may be due to Western Europeans (e.g. from the period of the Frankokratia), but to assign its totality to this factor is nonsensical. Apparently, the geniuses of the Genographic project have decreed that the brief foray of the Celts into Greece introduced massive amounts of R1b, but a thousand years of Greco-Roman domination of the Levant did nothing of the kind.
6 (bonus). Haplogroup J2 is more frequent in East than in West Sicily
Sicily is an island which had well-documented and not insignificant settlements by both Greeks and Phoenicians. Moreover, these settlements were geographically divided: Greeks in the East, Phoenicians in the West. It is in the East that J2 has its highest frequency, and not in the Phoenician West.
Conclusion
Is there anything of value in this paper? Well, it's a good idea to try to correlate Y-chromosome distribution with historical rather than pre-historical events. Too bad the authors botched the job, but their paper can at least serve as a reference point for how not to go about doing it.
UPDATE: Take a look at the "haplotype groups" suggested by the authors as signals of Phoenician and Greek colonization.

Not only are haplotype groups not clades (they do not designate common ancestry), but 7-marker haplotypes cannot even be remotely tied to the time period in question, given the huge confidence intervals associated with even larger numbers of markers. Feel free to plug these haplotypes into YHRD or Ysearch to find plenty of long-lost "Phoenicians" all over the planet.
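To put rough numbers on how loosely a seven-marker haplotype is tied to a 3,000-year horizon, here is a minimal sketch. It assumes that mutations along a lineage are independent events (so the count is Poisson-distributed) and uses the paper's own figure of 6.9×10⁻⁴ per locus per 25 years; the second rate of 2×10⁻³ per locus per generation is only an illustrative assumption for a pedigree-scale germline rate, not a value from the paper. The helper `one_step_neighbors` is a hypothetical illustration of adding one-step neighbours to a haplotype set (what the paper calls STR+ sets), and the example haplotype is invented.

```python
import math

GENERATION_YEARS = 25   # the quoted rate is expressed per 25-year generation
ELAPSED_YEARS = 3000    # time depth of the Phoenician expansion discussed in the paper
N_LOCI = 7              # the paper's haplotypes use seven Y-STR markers


def expected_mutations(rate_per_locus_per_gen, n_loci=N_LOCI,
                       years=ELAPSED_YEARS, gen_years=GENERATION_YEARS):
    """Expected number of STR mutations along one lineage (rate x loci x generations)."""
    generations = years / gen_years
    return rate_per_locus_per_gen * n_loci * generations


def prob_unchanged(expected):
    """Poisson probability that a lineage has accumulated zero mutations."""
    return math.exp(-expected)


def one_step_neighbors(haplotype):
    """HYPOTHETICAL helper: all haplotypes differing by +/-1 repeat at exactly one locus."""
    neighbors = []
    for i, repeats in enumerate(haplotype):
        for step in (-1, 1):
            if repeats + step > 0:  # repeat counts must stay positive
                neighbor = list(haplotype)
                neighbor[i] = repeats + step
                neighbors.append(tuple(neighbor))
    return neighbors


evolutionary = expected_mutations(6.9e-4)  # rate quoted in the paper
germline = expected_mutations(2e-3)        # ASSUMED illustrative pedigree-scale rate

for label, m in (("evolutionary", evolutionary), ("germline (assumed)", germline)):
    print(f"{label}: {m:.2f} expected mutations, P(unchanged) = {prob_unchanged(m):.2f}")

# HYPOTHETICAL seven-locus haplotype, for illustration only
example = (14, 12, 23, 10, 11, 13, 15)
print(len(one_step_neighbors(example)), "one-step neighbours")
```

With the quoted rate, a lineage is expected to carry about 0.58 mutations after 120 generations (the paper rounds this to 0.6), so only about 56% of lineages would still match their 3,000-year-old ancestor exactly; at the assumed germline rate the expectation rises to about 1.68 and the unchanged share drops to about 19%. Each seven-locus haplotype also has 14 one-step neighbours, which shows how quickly such sets widen.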
UPDATE II: The "evolutionary" mutation rate rears its ugly head
From the paper:
Because there is a significant chance that a haplotype existing 3000 years ago has accumulated a one-step difference in an STR (we expect 0.6 mutations per seven-STR haplotype when a rate of 6.9×10⁻⁴ per locus per 25 yr is used), these one-step neighbors have been included in each set, producing what we have labeled STR+s. STR-s can contain both haplotypes deriving from mutations, which should have been included, and independent haplotypes unconnected with the migrations that we are trying to detect.
UPDATE III: What of the Arabs?
The modern Lebanese are Arabs, as are most modern North Africans where Phoenician colonies were founded. The Arabs also affected several Mediterranean islands, as well as Iberia. One would think that the most salient feature of modern Mediterranean populations would be mentioned in a paper which attempted to trace patterns of Y-chromosome variation in the Mediterranean.
Certainly, the Neolithic, Greek, and Phoenician migrations, as well as the Jewish Diaspora, moved people around. But the Phoenicians have been extinct for 2,000 years. The Jews had (and have) communities around the Mediterranean, but did not amount to a significant population element anywhere. It is the Arabs who are the elephant in the room, and yet they are ignored. Are similarities between the Levant, North Africa and Spain due to Phoenicians or due to this later Arab movement? By failing to trace the distribution of their "Phoenician colonization signals" among Arabians, the authors have overstated their case.
September 25, 2008
Deshpande et al. (2008) on Out of Africa
This is an important new paper which adds some complexity to the Out of Africa theory. Much existing work has focused on a "tree-like" story of the emergence of modern humans, with an African source population at the root, and other populations being less diverse the further they are (geographically) from the source.
Proceedings of the Royal Society B doi: 10.1098/rspb.2008.0750
A serial founder effect model for human settlement out of Africa
Omkar Deshpande, Serafim Batzoglou, Marcus W. Feldman, L. Luca Cavalli-Sforza
Abstract
The increasing abundance of human genetic data has shown that the geographical patterns of worldwide genetic diversity are best explained by human expansion out of Africa. This expansion is modelled well by prolonged migration from a single origin in Africa with multiple subsequent serial founding events. We discuss a new simulation model for the serial founder effect out of Africa and compare it with results from previous studies. Unlike previous models, we distinguish colonization events from the continued exchange of people between occupied territories as a result of mating. We conduct a search through parameter space to estimate the range of parameter values that best explain key statistics from published data on worldwide variation in microsatellites. The range of parameters we use is chosen to be compatible with an out-of-Africa migration at 50–60Kyr ago and archaeo–ethno–demographic information. In addition to a colonization rate of 0.09–0.18, for an acceptable fit to the published microsatellite data, incorporation into existing models of exchange between neighbouring populations is essential, but at a very low rate. A linear decay of genetic diversity with geographical distance from the origin of expansion could apply to any species, especially if it moved recently into new geographical niches.
Link
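The core idea of a serial founder effect can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the authors' simulation model: it keeps only the colonization step (no exchange between neighbouring demes, no population growth), places demes on a one-dimensional chain, and forms each new deme by binomially sampling 2N gene copies from the previous one. The 50 founders, 500 biallelic loci and starting allele frequency of 0.5 are arbitrary assumptions, and all function names are hypothetical.

```python
import random


def mean_heterozygosity(freqs):
    """Average expected heterozygosity 2p(1-p) across biallelic loci."""
    return sum(2 * p * (1 - p) for p in freqs) / len(freqs)


def found_new_deme(freqs, n_founders, rng):
    """Binomially resample 2 * n_founders gene copies at every locus."""
    copies = 2 * n_founders
    new_freqs = []
    for p in freqs:
        count = sum(1 for _ in range(copies) if rng.random() < p)
        new_freqs.append(count / copies)
    return new_freqs


def serial_founder_chain(n_demes, n_founders, n_loci, seed=1):
    """Heterozygosity in each deme of a chain, each founded from the previous deme."""
    rng = random.Random(seed)
    freqs = [0.5] * n_loci  # ASSUMED source population: every locus at p = 0.5
    het = [mean_heterozygosity(freqs)]
    for _ in range(n_demes - 1):
        freqs = found_new_deme(freqs, n_founders, rng)
        het.append(mean_heterozygosity(freqs))
    return het


het = serial_founder_chain(n_demes=21, n_founders=50, n_loci=500)
# Expected heterozygosity after 20 founding events: H0 * (1 - 1/(2N))**20
expected_last = 0.5 * (1 - 1 / (2 * 50)) ** 20
print([round(h, 3) for h in het[::5]], round(expected_last, 3))
```

Each founding event multiplies expected heterozygosity by (1 − 1/(2N)), so diversity falls steadily with the number of steps away from the source; when N is large this geometric decline is close to linear over the range shown.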
This new model is not limited to colonization, i.e., the movement of a subset of a territory's population into a new, uninhabited territory, but also includes "lateral" gene exchange between pre-established populations.
From the paper:
Unlike previous models, ours separated colonization events from the continued exchange of people between occupied territories. Our estimates of the exchange rate between neighbouring populations were very low (below 0.01), with carrying capacities ranging from approximately 600 to 1200. Assuming that the census size is three times this effective population size, we derive a census size of approximately 1800–3600 people in each deme. Since each deme has dimensions of 125×125 km, this corresponds to a population density of approximately 0.11–0.23 persons km⁻², well within the range for hunter–gatherers referred to by Liu et al. (2006).
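The density figures in this passage follow from simple arithmetic, sketched below using only the numbers given in the quote (effective sizes of 600 and 1200, a census-to-effective ratio of 3, and 125×125 km demes). The function name is hypothetical.

```python
DEME_SIDE_KM = 125        # deme dimensions given in the quoted passage
CENSUS_TO_EFFECTIVE = 3   # census size taken as 3 x effective size, as in the quote


def density_per_km2(effective_size):
    """Persons per square kilometre in one deme for a given effective size."""
    census = CENSUS_TO_EFFECTIVE * effective_size
    area_km2 = DEME_SIDE_KM ** 2  # 15,625 km^2
    return census / area_km2


for ne in (600, 1200):
    print(ne, CENSUS_TO_EFFECTIVE * ne, f"{density_per_km2(ne):.3f}")
```

This gives about 0.115 and 0.230 persons per square kilometre, matching the 0.11–0.23 range in the paper and confirming that the unit is per km², not per m².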
Related: Geographic and genetic distance in human populations, A Geographically Explicit Genetic Model of Worldwide Human-Settlement History
November 17, 2006
Human eye color explained by a three-SNP haplotype
A very exciting new preprint in AJHG describes how a haplotype defined by three SNPs, i.e., single-letter changes in the genetic code, explains most variation in human eye color. I am sure that this paper will make the news once its edited version appears, but this is a very exciting development for many different reasons.
American Journal of Human Genetics (preprint)
A three-SNP haplotype in the first intron of OCA2 explains most human eye color variation
David L. Duffy, Grant W. Montgomery, Wei Chen, Zhen Zhen Zhao, Lien Le, Michael R. James, Nicholas K. Hayward, Nicholas G. Martin, Richard A. Sturm
Abstract
We have previously shown that a QTL linked to the OCA2 region of 15q accounts for 74% of variation in human eye color. We conducted additional genotyping to clarify the role of the OCA2 locus in the inheritance of eye color and other pigmentary traits associated with skin cancer risk in white populations. Fifty eight synonymous and non-synonymous exonic SNPs and tagging SNPs were typed in a collection of 3839 adolescent twins, their sibs, and parents. The highest association for blue:non-blue eye color was found with three OCA2 SNPs; rs7495174 T/C, rs6497268 G/T and rs11855019 T/C (P-values of 1.02×10⁻⁶¹, 1.57×10⁻⁹⁶, and 4.45×10⁻⁵⁴ respectively) in intron 1. These three SNPs are in one major haplotype block with TGT representing 78.4% of alleles. The TGT/TGT diplotype found in 62.2% of samples was the major genotype seen to modify eye color, with a frequency of 0.905 in blue or green compared with only 0.095 in brown eye color. This genotype was also at highest frequency in subjects with light brown hair and was more frequent in fair and medium skin types, consistent with the TGT haplotype acting as a recessive modifier of lighter pigmentary phenotypes. Homozygotes for rs11855019 C/C were predominantly without freckles and had decreased mole counts. The minor population impact of the nonsynonymous coding region polymorphisms Arg305Trp and Arg419Gln associated with non-blue eyes, and the tight linkage of the major TGT haplotype within the first intron of OCA2 with blue eye color and lighter hair and skin tones, suggest that differences within the 5’ proximal regulatory control region of OCA2 gene alter expression or mRNA transcript levels and may be responsible for these associations.
Link
- First, it shows that a very striking observable difference among humans can be explained by minute differences in the genetic code. This should be a reminder to those who engage in grocery-style genetics. Quantity matters not.
- Second, eye color is an important phenotypical character that people actually care about. Genetics becomes exciting when it's about stuff that people are interested in (intelligence, eye color, the chance of getting cancer before 40, etc.).
- Third, we are finally getting to the point where genetics can be used to infer characteristics of organisms that are not preserved in bones. This will doubtlessly lead to applications in ancient DNA research (see also here).
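The two frequencies in the abstract, TGT at 78.4% of haplotypes and TGT/TGT at 62.2% of samples, can be cross-checked with a quick Hardy-Weinberg calculation. This is only a rough sketch: it assumes random mating and treats the three-SNP haplotype as one multi-allelic locus, while the study sample consists of twins and their relatives, so individuals are not independent draws. The function name is hypothetical.

```python
TGT_FREQ = 0.784          # TGT share of haplotypes, from the abstract
OBSERVED_TGT_TGT = 0.622  # observed TGT/TGT diplotype frequency, from the abstract


def hwe_homozygote_freq(p):
    """Expected homozygote frequency p**2 under Hardy-Weinberg proportions."""
    return p ** 2


expected = hwe_homozygote_freq(TGT_FREQ)
print(f"expected TGT/TGT: {expected:.3f}, observed: {OBSERVED_TGT_TGT:.3f}")
```

The expected homozygote share (about 0.615) sits close to the observed 0.622, so the reported haplotype and diplotype frequencies are mutually consistent under this simple model.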
April 19, 2005
Three phylogeographic anomalies
In the last few years, the phylogeography of many clades of the human mtDNA and Y-chromosome systems has been adequately resolved, but several big puzzles still remain.
The first one is that of mtDNA haplogroup X, which has been addressed in a recent paper. This is a very ancient clade, which is found at low frequencies almost everywhere, and is divided into two subclades: X1 is found mainly in eastern and northern Africa, whereas X2 is found in northern Africa and everywhere else, including Native Americans. It is interesting that X2 seems to have spread after the Last Glacial Maximum, and that the Native American clade, X2a, was an "early split": today's Siberian X2 lineages seem to be more recently derived from Western Eurasia than those of the ancient trek which brought X2 into the New World. It is fascinating that X2 was brought into the New World by some ancient expansion that did not leave any traces in the genes of modern inhabitants of the likely routes.
The second great puzzle is mtDNA haplogroup M1, which occurs in East and North Africa, West Asia and Southern Europe, but apparently nowhere else. M1 is a branch of the mainly Asian macrohaplogroup M, which is of great antiquity in Asia and likely originated there. According to a recent abstract, Holden et al. indicate that M1 is found at high frequencies in East and Northern Africa but not in Sub-Saharan Africa, and hint that it may be linked to the Afro-Asiatic language family. This suggestion is reasonable, and in my opinion the correspondence between M1 and Y-chromosome haplogroup E3b is quite remarkable throughout the broad peri-Mediterranean region, with E3b also reaching high frequencies in Afro-Asiatic speakers.
The third puzzle is that of Y-chromosome haplogroup DE, defined by the YAP mutation. The E clade of YAP encompasses the great majority of African Y-chromosomes, and is clearly split into a subclade, E3b, which has a peri-Mediterranean distribution similar to that of the aforementioned M1, and all the rest, almost exclusively limited to Sub-Saharan Africa. The "brother" of E is haplogroup D, which is found in such peoples as the Andamanese, the Tibetans, and the Ainu. At present it seems reasonable that E originated somewhere in Africa, but the origin of D is far from certain, as it is now found in certain "fringe" populations, but also in low frequencies among many Asians. Perhaps D had a much wider distribution in the past, but the expansion of later successful lineages, such as O, the main haplogroup found in East Asians today, overwhelmed those earlier Asian populations. What about YAP itself? Did it originate in Asia, where its D descendants are located, or in Africa, where its E descendants are? As late as 2003, we don't know, and no new research has appeared to shed light on this problem.