Showing posts with label Palaeoafricans. Show all posts

August 25, 2012

Genes and Geography (Wang et al. 2012)

Gene-geography correlations have been explored before at a regional level. More recently, they were also studied at the global level with the SPA method. A new open access paper shows gene-geography correlations across the world.

These correlations arise from the fact that humans tend to intermarry with their neighbors, so alleles have a decreasing probability of being transmitted from a person at location X to future generations, the further we go from X. But, the more interesting cases are those which show a violation of the overall pattern. These can usually arise because of genetic isolation or long-distance migration. An example is that of the African hunter-gatherer groups:
When hunter-gatherer populations (!Kung, San, Biaka Pygmy, and Mbuti Pygmy) and Mbororo Fulani were included in the analysis, they appeared as isolated clusters on the PCA plots and greatly reduced the similarity between PCA maps and geographic maps (Figure S3, Table S7). The similarity score decreased from 0.790 to 0.548 after including all five of these populations in the analysis. This value, however, is still statistically significant, with a P-value of ; further, if we disregard the hunter-gatherer populations and Mbororo Fulani in Figure S3B and only examine the relative locations of the original 23 populations, we can still find a clear resemblance between genetic and geographic coordinates. Compared to the other 23 populations, the four hunter-gatherer populations appear as isolated groups at the south, and Mbororo Fulani appears at the north. These observations are clearer in plots with only one among the five outlier populations included at a time (Figure S3C–S3G), each of which also produces significant similarity scores between genetic and geographic coordinates (Figure S4, Table S7).
Figure S3 is very informative:



Observe that in Figure S3C, the Mbororo Fulani appear in the Balkans (!) relative to Sub-Saharan Africans. That is, of course, due to their partial West Eurasian ancestry, but the magnitude of the difference is such that one suspects that it is not only due to this factor; if it were, then the Fulani would place somewhere between Europe and Central Africa.

The remaining figures (D-G) supply the explanation: the four hunter-gatherer groups appear well south of their actual locations; the Pygmy groups not in W/C Africa, but in S Africa; the Khoisan ones not in S Africa but in the Ocean well south of it.
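
The similarity score quoted above compares a PCA map with a geographic map. Here is a minimal sketch of how such a score can be computed, assuming a Procrustes-style comparison in which the score is sqrt(1 - D) for the minimized disparity D. The quoted passage does not give the formula, so this definition, the function name `procrustes_similarity`, and the toy coordinates are assumptions for illustration only.

```python
import numpy as np

def procrustes_similarity(genetic_xy, geographic_xy):
    """Similarity in [0, 1] between two 2-D point sets (1 = identical shape)."""
    a = np.asarray(genetic_xy, dtype=float)
    b = np.asarray(geographic_xy, dtype=float)
    # Remove location and scale: center, then scale to unit sum of squares.
    a = a - a.mean(axis=0)
    b = b - b.mean(axis=0)
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    # The singular values of a^T b give the best fit over rotations/reflections.
    singular_values = np.linalg.svd(a.T @ b, compute_uv=False)
    disparity = 1.0 - singular_values.sum() ** 2
    # Assumed score definition: sqrt(1 - D).
    return float(np.sqrt(max(0.0, 1.0 - disparity)))

# Toy check: a rotated, rescaled, shifted copy has the same shape.
points = [(0, 0), (1, 0), (0, 2), (3, 1)]
moved = [(-2 * y + 5, 2 * x - 1) for x, y in points]
score = procrustes_similarity(points, moved)
```

A population that sits far from its geographic position on the PCA map, like the hunter-gatherers discussed here, lowers this kind of score for the whole configuration.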

Why does gene-geography correlation suffer such a violation in Africa? Figure S3 shows how different groups relate to W/C Africans. But, one could also use hunter-gatherers as an anchor point (i.e., place them where they actually live): in that case the W/C Africans would be the ones who would be pushed north towards the Mediterranean.

And, indeed, that is a good argument for the idea I've floated a few times, of substantial Eurasian back-migration into Africa: the genetic difference between African farmers and African hunter-gatherers dwarfs the geographic distance. This can easily be explained if we assume that back-migration from Eurasia affected the former much more than the latter. So, African farmers can be shown to be the outcome of mixture between two divergent elements: one Eurasian-like, one African hunter-gatherer-like. The latter could include both groups like existing African H-Gs but might also include other groups who had the misfortune of being completely absorbed before the Eye of Science set its sights on the African continent.


PLoS Genet 8(8): e1002886. doi:10.1371/journal.pgen.1002886

A Quantitative Comparison of the Similarity between Genes and Geography in Worldwide Human Populations

Chaolong Wang et al.

The spatial pattern of human genetic variation provides a basis for investigating the history of human migrations. Statistical techniques such as principal components analysis (PCA) and multidimensional scaling (MDS) have been used to summarize spatial patterns of genetic variation, typically by placing individuals on a two-dimensional map in such a way that pairwise Euclidean distances between individuals on the map approximately reflect corresponding genetic relationships. Although similarity between these statistical maps of genetic variation and the geographic maps of sampling locations is often observed, it has not been assessed systematically across different parts of the world. In this study, we combine genome-wide SNP data from more than 100 populations worldwide to perform a formal comparison between genes and geography in different regions. By examining a worldwide sample and samples from Europe, Sub-Saharan Africa, Asia, East Asia, and Central/South Asia, we find that significant similarity between genes and geography exists in general in different geographic regions and at different geographic levels. Surprisingly, the highest similarity is found in Asia, even though the geographic barrier of the Himalaya Mountains has created a discontinuity on the PCA map of genetic variation.

Link

August 19, 2012

Raising a peace banner in the Neandertal Wars

The two camps in the Second Neandertal Wars (*)  have assumed maximalist positions on opposing sides of the argument: African structure explains it! vs. Neandertal admixture explains it!. Armed with the Vindija genome, that marvel of technological ingenuity, and a suite of impressive statistical models, the two sides have reached completely opposing conclusions.

In order to formulate my own position, I decided to do what I love best, i.e., to look at the data for myself. My main idea is that the signals of Neandertal and Denisova admixture as measured by these quantities (D-statistics) ...

D(Pop1, Yoruba, Neandertal, Chimp)
D(Pop1, Yoruba, Denisova, Chimp)

... will vary on different SNP ascertainment panels. SNPs ascertained in Africans may have a great number of Palaeoafrican alleles; SNPs in Neandertal-admixed populations will have a great number of Neandertal alleles; SNPs in Denisova-admixed populations will have a great number of Denisova alleles. If a population has admixture from hominin X, this admixture, as measured by the D-statistic, will tend to be inflated in panels possessing alleles that introgressed from X, and suppressed in panels that lack them.
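
For readers who want to try this, here is a minimal sketch of the statistic, assuming the commonly used frequency-based form: the sum over SNPs of (w - x)(y - z), divided by the sum of (w + x - 2wx)(y + z - 2yz). This is not necessarily the exact implementation behind the numbers in this post; the function name and the toy frequencies are hypothetical.

```python
def d_statistic(pop1, yoruba, archaic, chimp):
    """D(Pop1, Yoruba, Archaic, Chimp) from per-SNP allele frequencies.

    Each argument is a sequence of frequencies (0..1) of the same allele
    at the same SNPs. A positive value means Pop1 shares more alleles with
    the archaic sample than Yoruba does.
    """
    numerator = 0.0
    denominator = 0.0
    for w, x, y, z in zip(pop1, yoruba, archaic, chimp):
        numerator += (w - x) * (y - z)
        denominator += (w + x - 2 * w * x) * (y + z - 2 * y * z)
    if denominator == 0:
        raise ValueError("no informative SNPs")
    return numerator / denominator

# Toy data: four SNPs, haploid-style 0/1 frequencies.
toy_pop1 = [1, 0, 1, 0]
toy_yoruba = [0, 1, 0, 0]
toy_archaic = [1, 1, 1, 0]
toy_chimp = [0, 0, 0, 0]
d_value = d_statistic(toy_pop1, toy_yoruba, toy_archaic, toy_chimp)
```

Running the same function on SNP subsets from different ascertainment panels is what makes the panel-to-panel comparison possible.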

The issue of ascertainment and archaic admixture was addressed by Skoglund and Jakobsson (2011); my aim is different: I am not so much interested in how ascertainment affects admixture estimates, but rather in exploiting the observation of the preceding paragraph (that Palaeoafrican, Neandertal, or Denisovan SNPs will lurk at different rates when ascertained in different individuals) to see what it tells us about human differences.

The signal of "archaic admixture" may be generated by genuine archaic admixture in one population (e.g., Eurasians), making it more similar to the archaic group (e.g., Neandertals), or by archaic admixture -of a different sort- in another population (e.g., Africans), making it less similar to that group. Both these processes may be at work, operating at different intensity in different populations and across different timelines.

I used the Harvard HGDP set, which contains 12 SNP panels, each of which has been ascertained in two chromosomes of a single individual. These panels are:
San, Yoruba, Mbuti, French, Sardinian, Han, Cambodian, Mongolian, Karitiana, Papuan1, Papuan2, Melanesian
A D-statistic was calculated relative to either Neandertal or Denisova for all HGDP populations, as well as the two archaic hominins. Subsequently, I used MCLUST to infer the number of different clusters on the basis of these statistics. In the optimal solution, MCLUST inferred 7 clusters, with each archaic hominin getting its own cluster, while the modern human populations were assigned to 5 clusters corresponding to five major human races recognized by traditional physical anthropology (Mongoloid, Negroid, Australoid, Capoid, and Caucasoid).
Note that these are not admixture proportions, but assignment probabilities! All populations fell into their expected clusters. The populations from Pakistan who are believed to be predominantly Caucasoid with varying degrees of minor admixture of an Ancestral South Indian element were assigned to the Caucasoid cluster. So were the Mozabite Berbers, a Caucasoid population with minority Negroid admixture. Finally, of the Central Asian populations, the Hazara of Pakistan showed mixed affiliations in the Caucasoid and Mongoloid clusters, while the Uygur were assigned to the Mongoloid cluster.

It is noteworthy that by exploiting patterns of relationship of modern to regional archaic humans, we have managed to recreate the major human groups. This is, perhaps, supportive of those who have argued that a degree of regional continuity across the Old World, and not only recent post-Out of Africa genetic divergence is responsible for present-day inter-population differences.

MCLUST also gave us the D-statistic means for the 7 inferred clusters. Remember that these are differences between a population Pop1 and Yoruba, relative to an archaic hominin (Neandertal or Denisova), and for 12 different ascertainment panels:


There are wonderful patterns to be discovered here; you can look at the data for yourselves; that's the open science thing to do.

All our ideas about human origins are conditioned on the availability of genomes from two archaic Eurasian hominins, and the lack of genomes of similar age from Africa.

But, remember:
  • You can fit Europe, China, India, and the US into Africa, with room to spare. 
  • If Vindija and Denisova, two caves less than 5,000km apart were home to people more divergent from each other than any two humans are today, it's strange to think that only "modern humans" inhabited Africa at the same time. 
  • The maximum genetic distance between living Africans is much higher than the maximum distance between living Eurasians: Africa is much more diverse than Eurasia. It's simpler to assume that the same relative pattern was true during the Middle Stone Age. The palaeoanthropology seems to support this, showing archaic forms present even during the terminal Pleistocene in Africa.
  • If modern humans did interbreed with 2/2 archaic humans whose sequences we possess, it's strange to think that they somehow shunned the African Others.
In view of the above, I humbly raise my peace banner in the Neandertal Wars, and declare that it isn't either-or: it's both!

(*) The First Neandertal Wars were fought decades ago by anthropologists working with calipers and magnifying lenses. Their outcome was to relegate Neandertals from the enviable position of our likely ancestors to that of an irrelevant sidekick, although a not-negligible minority continued an insurgency against the Out-of-Africa-only victors.

August 16, 2012

Eureka! African population structure contributing to signal of "Neandertal admixture"

Just as I was finishing my recent post arguing for a serious consideration of the African structure hypothesis, a much better way of proving it beyond a reasonable doubt occurred to me.

Consider a SNP where African populations are polymorphic (they have both alleles), but Eurasians are monomorphic (they only have one of the two alleles). Such a SNP may indicate one of three things:

  1. Ancestral modern humans possessed both alleles, but one of them was lost in the Out-of-Africa bottleneck.
  2. Ancestral modern humans possessed only one of the two alleles, and the second one appeared by mutation in Africans after the Out-of-Africa event.
  3. Ancestral modern humans possessed only one of the two alleles, and the second one appeared by introgression, from a second African population, after the Out-of-Africa event.
The first two cases are "pure Out-of-Africa" with no archaic introgression whatsoever. In such SNPs, humans are expected to be equally related to any archaic hominin.

The third case involves a structured population: one sub-population ("Afrasians") is monomorphic and spawns Eurasians, while another ("Palaeoafricans") possesses the second allele.

When population structure breaks down, "Afrasians" and "Palaeoafricans" mix, but the Out-of-Africa event has already happened, so Eurasians, descended exclusively from "Afrasians" do not have the opportunity to possess the second allele. Africans, on the other hand, end up with both alleles at the SNP.
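
The SNP class described above can be picked out with a simple filter. This is a minimal sketch, assuming allele frequencies per population are already available; the data layout, the function name, and the requirement that every listed African population be polymorphic are my assumptions for illustration, not a description of the actual pipeline.

```python
def african_only_polymorphic(snps, african_pops, eurasian_pops):
    """Return ids of SNPs polymorphic in the African populations listed
    and fixed for the same allele in all Eurasian populations listed.

    snps maps SNP id -> {population name: allele frequency (0..1)}.
    """
    selected = []
    for snp_id, freqs in snps.items():
        african = [freqs[p] for p in african_pops]
        eurasian = [freqs[p] for p in eurasian_pops]
        polymorphic_in_africa = all(0.0 < f < 1.0 for f in african)
        fixed_in_eurasia = (all(f == 0.0 for f in eurasian)
                            or all(f == 1.0 for f in eurasian))
        if polymorphic_in_africa and fixed_in_eurasia:
            selected.append(snp_id)
    return selected

# Toy frequencies (hypothetical SNP ids and values).
toy_snps = {
    "snp1": {"Yoruba": 0.3, "Mandenka": 0.4, "French": 0.0, "Han": 0.0},
    "snp2": {"Yoruba": 0.3, "Mandenka": 0.4, "French": 0.2, "Han": 0.0},
    "snp3": {"Yoruba": 1.0, "Mandenka": 0.9, "French": 1.0, "Han": 1.0},
    "snp4": {"Yoruba": 0.6, "Mandenka": 0.7, "French": 1.0, "Han": 1.0},
}
kept = african_only_polymorphic(toy_snps, ["Yoruba", "Mandenka"], ["French", "Han"])
```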

So, let's see what the PCA looks like when we consider such SNPs, which constitute ~14% of panel 4.

Full PCA:
Modern human blowup:

Population means:


It is clear that the signal of differential Neandertal affinity is preserved for this class of SNPs. Eurasians are about 5.1% shifted towards Neandertals along the Neandertal-Yoruba axis, which is much stronger than in any of the experiments of the previous post.

The only reasonable explanation for this pattern is African population structure.
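
A shift "along the Neandertal-Yoruba axis" can be read as the position of a population mean when projected onto the line from the Yoruba mean to the Neandertal mean. The sketch below assumes that reading; the function name and the toy coordinates are hypothetical.

```python
def shift_along_axis(point, origin, target):
    """Fraction of the way from origin to target at which point projects.

    0.0 means the point projects onto origin, 1.0 onto target.
    """
    axis = [t - o for t, o in zip(target, origin)]
    offset = [p - o for p, o in zip(point, origin)]
    axis_length_sq = sum(a * a for a in axis)
    if axis_length_sq == 0:
        raise ValueError("origin and target coincide")
    return sum(a * b for a, b in zip(axis, offset)) / axis_length_sq

# Toy PC coordinates: Yoruba at the origin, Neandertal 10 units along PC1.
toy_shift = shift_along_axis((0.51, 3.0), (0.0, 0.0), (10.0, 0.0))
```

With these toy numbers the projected shift is 5.1% of the axis length; the component perpendicular to the axis is ignored.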

UPDATE (Aug 16): I have repeated the experiment including Papuans and Melanesians, and requiring SNPs to be monomorphic in both Papuans and Melanesians and polymorphic in Yoruba and Mandenka. The results can be seen below:
It is clear that modern Eurasians deviate towards archaic Eurasians for this class of SNPs that are monomorphic in Papuans/Melanesians and polymorphic in Africans, which is again consistent only with a structured ancestral African population.

August 15, 2012

African population structure and/or Eurasian back-migration contribute to signal of "Neandertal admixture"

The recent Sankararaman et al. (2012) paper used SNPs that occur at a low frequency in modern French (minor allele frequency MAF less than 10%) to estimate the date of modern human-Neandertal admixture. Presumably such low-frequency alleles have a higher chance of having been inherited from Neandertals, and, indeed, the authors detect a stronger "Neandertal admixture" signal in this class of SNPs.

But, does this justify a casual dismissal of African population structure as a contributing factor to the observed signal?

To investigate this issue, I carried out a small experiment using the panel 4 (San-ascertainment) of the Harvard HGDP set. I used smartpca to do PCA using Neandertal/Denisova/Chimp, and projected a set of modern populations (Yoruba, Mandenka, French, Basque, Dai, Han-NChina) onto this PC space.
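
The idea of defining the PC space with a few reference samples and then projecting others onto it can be sketched as follows. This is a bare-bones illustration with NumPy, not a reproduction of what smartpca does internally (its normalization and handling of missing data are not modelled here); the function name and the toy genotype rows are hypothetical.

```python
import numpy as np

def project_onto_reference_pcs(reference, others, n_components=2):
    """Define principal axes from reference rows only, then project other rows.

    Rows are samples (or population means), columns are SNPs.
    """
    ref = np.asarray(reference, dtype=float)
    oth = np.asarray(others, dtype=float)
    center = ref.mean(axis=0)
    # Rows of vt are the principal axes of the centered reference samples.
    _, _, vt = np.linalg.svd(ref - center, full_matrices=False)
    axes = vt[:n_components]
    return (ref - center) @ axes.T, (oth - center) @ axes.T

# Three reference rows (think Neandertal, Denisova, Chimp) over four SNPs.
toy_reference = [[0, 0, 1, 1], [1, 0, 0, 1], [1, 1, 0, 0]]
# Two projected rows: the reference centroid, and a copy of the first row.
toy_others = [[2 / 3, 1 / 3, 1 / 3, 2 / 3], [0, 0, 1, 1]]
ref_pcs, other_pcs = project_onto_reference_pcs(toy_reference, toy_others)
```

Because the projected samples play no part in defining the axes, their positions reflect only how they relate to the reference samples.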

I carried out this analysis using all SNPs in panel 4, as well as SNPs with minor allele frequency less than 10% in French, Dai, Yoruba, and MbutiPygmy. While the authors chose to limit themselves to low-frequency SNPs in French, I chose to do the same with low-frequency SNPs in Africans. The full results can be seen at the end of this post.
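
The frequency filter is simple to state in code. A minimal sketch, assuming per-population allele frequencies keyed by SNP id; the layout and names are hypothetical, and whether SNPs that are monomorphic in the chosen population (MAF of zero) should be kept is an assumption made here, not something stated above.

```python
def low_maf_snps(freqs_by_snp, population, threshold=0.10):
    """Return ids of SNPs whose minor allele frequency in `population`
    is below `threshold`.

    freqs_by_snp maps SNP id -> {population name: allele frequency (0..1)}.
    Assumption: SNPs with MAF == 0 in that population are kept.
    """
    selected = []
    for snp_id, freqs in freqs_by_snp.items():
        frequency = freqs[population]
        maf = min(frequency, 1.0 - frequency)
        if maf < threshold:
            selected.append(snp_id)
    return selected

# Toy frequencies (hypothetical SNP ids and values).
toy_freqs = {
    "snp1": {"French": 0.05, "Yoruba": 0.50},
    "snp2": {"French": 0.40, "Yoruba": 0.95},
    "snp3": {"French": 0.92, "Yoruba": 0.20},
}
low_in_french = low_maf_snps(toy_freqs, "French")
low_in_yoruba = low_maf_snps(toy_freqs, "Yoruba")
```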

It turns out that the "Neandertal admixture" signal persists even in the class of SNPs with MAF less than 10% in the Yoruba and MbutiPygmy.

There can be two explanations for this:
  • If these SNPs are of Neandertal origin, then their presence in Africa indicates substantial back-migration from Eurasia, which introduced them into West Africa.
  • If these SNPs are not of Neandertal origin, then their presence in Africa indicates archaic African admixture that has shifted West Africans away from the common ancestor of modern humans and Neandertals, and a structured African population.

I have argued for both Eurasian back-migration into Africa and African population structure. What does not appear to be tenable is a model which simultaneously (i) dismisses Eurasian back-migration into Africa, and (ii) dismisses African population structure as a possible explanation for the observation of greater Neandertal-Eurasian than Neandertal-African similarity.


All SNPs:

Full PCA:
Blowup of modern humans:



SNPs with French MAF less than 0.1:

Full PCA:

Blowup of modern humans:



SNPs with Dai MAF less than 0.1:

Full PCA:

Blowup of modern humans:



SNPs with Yoruba MAF less than 0.1:

Full PCA:

Blowup of modern humans:



SNPs with MbutiPygmy MAF less than 0.1:

Full PCA:

Blowup of modern humans:


UPDATE: The complete set of population means for the above experiments can be found in this spreadsheet.

UPDATE II: Even stronger evidence in a newer post.

July 30, 2012

Estimating the age of Y-chromosome Adam (again)

UPDATE (August 1): The dates in this post have been superseded by the ones in Dates of major clades of the Y-chromosome phylogeny.

I have used the official phase1 chrY SNP data instead of the working data used in my first experiment. The histogram of pairwise TMRCA values looks much sharper now; not sure what the difference between the two datasets was:

In any case, the divergence of the most basal African clade is very evident here on the right, corresponding to an age for the human Y-MRCA of 159,298 years.

Also of interest are the other peaks in the distribution of pairwise TMRCAs which correspond to 6.5, 40.0, 66.3 thousand years. I think we are getting some good signals corresponding to Out-of-Arabia (66.3ky?) where a hyper-arid phase in Arabia may very well have caused a bottleneck in the population of modern humans, and full behavioral modernity/UP revolution (40.0ky?) where modern humans start turning up all over Eurasia, and even some Africans look like UP Europeans.
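
For those who want to experiment, a pairwise TMRCA can be sketched from the number of differences between two aligned Y-chromosome sequences. The block below assumes a constant per-site, per-year mutation rate and the simple estimator differences / (2 x rate x length). The rate in the example is a placeholder and the function name is hypothetical; neither is taken from the calculation behind the dates in this post.

```python
def pairwise_tmrca_years(seq_a, seq_b, mutation_rate_per_site_per_year):
    """Estimate the TMRCA (in years) of two aligned haploid sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences are empty")
    differences = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    # Mutations accumulate on both lineages since the ancestor: factor of 2.
    return differences / (2.0 * mutation_rate_per_site_per_year * len(seq_a))

# Toy example: 1,000 aligned sites, 2 differences, placeholder rate of 1e-6.
toy_a = "A" * 1000
toy_b = "A" * 998 + "GG"
toy_age = pairwise_tmrca_years(toy_a, toy_b, 1e-6)
```

Computing this for every pair of chromosomes and plotting the values gives a histogram of the kind discussed above, with peaks where many pairs coalesce at about the same time.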

Perhaps, I'll spend some more time assigning the Y-chromosomes to haplogroups so that I can give a more complete estimate of the major clades of the Y-chromosome phylogeny.

UPDATE: Node Ages


I will add various node ages as I calculate them. Thanks to ISOGG for a convenient correspondence between haplogroups and SNP genetic positions.



| Clade | Comparison | Age (years) | Notes |
|---|---|---|---|
| BT | B-M247 vs. CT-M294 | 71,188 | |
| C1-C3 | C1-P122 vs. C3-Z1453 | 25,022 | |
| CF | C-M130 vs. F-P137 | 47,379 | |
| CT | DE-M145 vs. CF-P143 | 62,439 | |
| DE | D-Page3 vs. E-P169 | 62,205 | (***) |
| E | E1-P147 vs. E2-M75 | 57,703 | (****) |
| E1b1 | E1b1a-V38 vs. E1b1b-M215 | 43,587 | |
| E1b1b1a1a-E1b1b1a1b | E1b1b1a1a-V12 vs. E1b1b1a1b-V13 | 13,817 | |
| E1b1b1a1a-E1b1b1a1c | E1b1b1a1a-V12 vs. E1b1b1a1c-V22 | 19,482 | |
| E1b1b1a1b-E1b1b1a1c | E1b1b1a1b-V13 vs. E1b1b1a1c-V22 | 16,210 | |
| I | I1-L75 vs. I2-L68 | 26,885 | |
| I2a1 | I2a1a-M26 vs. I2a1b-S328 | 19,513 | (*****) |
| IJ | I1-L75 vs. J-L60 | 35,589 | |
| IJK | IJ-P125 vs. K-P131 | 41,910 | |
| J | J1-M267 vs. J2-M172 | 24,497 | (*) |
| J2 | J2a-L212 vs. J2b-M102 | 23,420 | |
| K-M9 | | 36,389 | (x) |
| NO | N-M231 vs. O-P191 | 32,467 | |
| O1-O2 | O1a-M119 vs. O2-M268 | 26,145 | |
| O1-O3 | O1a-M119 vs. O3-P198 | 26,303 | |
| O2-O3 | O2-M268 vs. O3-P198 | 27,870 | |
| O3a1-O3a2 | O3a1-L465 vs. O3a2-P201 | 18,765 | |
| P | Q-M242 vs. R-P224 | 33,043 | |
| R1 | R1a-M420 vs. R1b-M343 | 23,657 | (**) |
| R1b1a2a1a1a5 | R1b1a2a1a1a5a-Z156 vs. R1b1a2a1a1a5b-Z301 | 6,476 | |



(*) This is, strictly speaking, the common ancestor of J1 and J2, since J*(xJ1, J2) chromosomes have also been observed.
(**) This is also the common ancestor of R1a and R1b, since R1* chromosomes have also been observed
(***) Paragroup DE* chromosomes have also been observed
(****)  Paragroup E* chromosomes have also been observed, so this is, strictly speaking, the common ancestor of E1 and E2; this is only a small underestimate, given that the DE node (62,205 years) is only marginally older
(*****) There is also the paragroup I2a1* and I2a1c

(x) Due to the absence of a published bifurcating structure within K-M9, I estimated this using the same model-free method used for Y-MRCA. This results in an "older peak" of pairwise K-M9 TMRCAs of 36,389 years. This seems appropriate, as it lies between IJK (41,910 years) and P (33,043 years).

July 29, 2012

David Reich gives us some more hints



From 7:24 onwards:
For example people often think that Europeans are homogeneous group that arrived in a simple way there maybe 40 or 50 thousand years ago maybe based on the archaeology and just kind of sat there until they became the Europeans they are today, but that's probably not true: the Europeans today are a replacement population who came in much more recently and replaced the people who were there originally 40 thousand years ago. 

People in Africa certainly very diverse, but in fact there is very deep strands of variation in Africa and for example West Africans, people, the primary ancestral group of African Americans today, are actually turn out to be mixtures of very differently diverged groups that go very deep in time. So this is all very interesting, also true for East Asians, people in China today are not the same people who were there from the first time 40 thousand years ago, in fact they are a replacement population largely, that arrived after the first people there. So you see this history of more complex population movements than people think at first.
Previous hints about Europeans: one, two.

All these themes ought to be familiar to readers of the blog, and I personally can't wait to sink my teeth into the new research results when they finally see the light of day.

July 26, 2012

New evidence for archaic admixture in African hunter gatherers (Lachance et al. 2012)

At the end of last year I predicted that full genome sequencing would begin turning up evidence for more archaic admixture in Africa. Halfway into the year, it appears that my prediction has proven to be correct: a new study in Cell by Lachance et al. documents the existence of such admixture between an archaic hominin and Pygmies from Cameroon, and the East African Hadza and Sandawe.

Archaic admixture in Biaka and San was previously detected by Hammer et al. Hence, we now have evidence for archaic admixture from several regions that encompass all major regions within sub-Saharan Africa. It seems that my old idea about layers of Palaeoafricans being absorbed by early modern humans in Africa was basically correct, and that some of these layers correspond to archaic African populations.

But not all agree. The New York Times coverage of the paper suggests that there is a controversy surrounding the new study:
All human fossil remains in Africa for the last 100,000 years, and probably the last 200,000 years, are of modern humans, providing no support for a coexistent archaic species. 
... 
Paleoanthropologists like Dr. Klein consider it “irresponsible” of the geneticists to publish genetic findings about human origins without even trying to show how they may fit in with the existing fossil and archaeological evidence. Dr. Akey said he agreed that genetics can provide only part of the story. “But hopefully this is just a period when new discoveries are being made and there hasn’t been enough incubation time to synthesize all the disparities,” he said.
This is of course completely wrong; as Chris Stringer mentions in the NY Times piece, there is ample evidence for archaic Africans down to quite recent times in the form of Iwo Eleru and Ishango, and there is more evidence besides. Indeed, it does not appear at all that there was a punctuational event that replaced archaic hominins with a new Homo sapiens species. If anyone wants to criticize the new study, complaining about it being in disharmony with physical anthropology is not a good way to go about it. Nor is it, of course, "irresponsible" to report the new findings.  And, apparently, there is more on the way:

In a report still under review, a third group of geneticists says there are signs of Neanderthals having interbred with Asians and East Africans. But Neanderthals were a cold-adapted species that never reached East Africa.
Things are bound to become quite interesting.

From the paper:

A striking finding in our data set is that compelling evidence exists that extant hunter-gatherer genomes contain introgressed archaic sequence, consistent with previous studies (Hammer et al., 2011; Plagnol and Wall, 2006; Reich et al., 2010; Shimada et al., 2007; Wall et al., 2009). We note that unambiguous evidence of introgression is difficult to obtain in the absence of an archaic reference sequence, which currently does not exist and may never be feasible given the rapid decay of fossils in Africa. Although we carefully filtered our data set in an attempt to analyze only high-quality sequences (Supplementary Information), it is possible that unrecognized structural variants or other alignment errors could generate a spurious signature similar to introgression. Encouragingly, we did not see an enrichment of structural variation calls in our candidate introgression regions. Additionally, through extensive simulations and analysis of European whole-genome sequences (Supplementary Information), we have demonstrated that the signatures of introgression that we observed are unlikely to be entirely accounted for due to other aspects of population demographic history, natural selection, or sequencing errors. Moreover, we did not find strong evidence that introgressed regions were clustered in the genome more often than expected by chance (p > 0.05; Supplemental Information). Nor did we find significant evidence that introgressed regions were enriched in genic regions (p > 0.05); rather, genic regions were significantly depleted for introgression in several populations (Supplemental Information). Therefore, the simplest interpretation of these data is that introgressed regions in extant human populations represent neutrally evolving vestiges of archaic sequences. 
In short, we find that low levels of introgression from an unknown archaic population or populations occurred in the three African hunter-gatherer samples examined, consistent with findings of archaic admixture in non-Africans (Reich et al., 2010). 


What are the implications of the new research? Where did modern humans actually originate and how can their archaic admixture be explained?

One possible explanation, consistent with multi-regional evolution (MRE) theory, is that modern humans didn't originate anywhere in particular; they emerged out of Homo populations that lived everywhere. And, certainly, the discovery of archaic admixture of a local origin is quickly reducing the number of places where the common ancestors of modern humans could have begun their expansion. Western Eurasia is out due to Neandertals; East Eurasia and Oceania are out due to Denisovans; the entirety of Sub-Saharan Africa seems to also be out. North Africa and Southwest Asia appear to be the only remaining candidates.

I don't particularly agree with MRE; one of its predictions (about the relevance of archaic hominins to the human story) has proven to be correct: it increasingly seems that there never was a new Homo sapiens species that was in reproductive isolation from the rest of the Homo genus. On the other hand, the existence of local admixture with different sets of archaic hominins, together with the relative homogeneity of our species is indicative of a range expansion that largely replaced archaic humans -- but not completely.

There does seem to have been a Big Bang of modern humans which caused the demographical explosion of a particular subset of genetic variation. This Big Bang is often associated with Out-of-Africa, but there are good reasons to doubt the traditional 60,000-year old Out-of-Africa theory, according to which humans from South or East Africa crossed into Arabia and followed the coast to populate the world. We now have more reasons to doubt this: evidence of archaic admixture in both the postulated homelands: South Africa, often cited as the region where the first signs of behavioral modernity appear, and East Africa, where the earliest anatomically modern human fossils appear.

My money continues to be on the "two deserts" theory I proposed some time ago:

  • A green Sahara pumping the ancestors of modern humans pre-100 thousand years ago, and 
  • a deteriorating green Arabia pumping them post-70 thousand years ago, with some back-migration into Africa.
This would relate the two regions where no evidence (yet?) for archaic humans exists (North Africa and Southwest Asia), explain the causes of their dispersal (climate change), and harmonize with the evidence for archaic admixture, since the expanding wave of modern humans would partially absorb pre-existing hominins in both Sub-Saharan Africa and across Eurasia.


It must be noted that scientists have been rather conservative in their estimates of archaic admixture in the absence of ancient DNA sequence. Recombination obliterates traces of really old admixture, because introgressed segments become ever smaller, resulting in a pastiche of modern and archaic sequence that no longer looks statistically archaic. But, hopefully, the ever-solidifying case for archaic admixture in our species will finally deal the death blow to tree models, and reveal a much more interesting story of our origins.


Other coverage of the new paper: Nature, Science, ScienceDaily, EurekAlert, Washington Post, SciAm.


Cell doi:10.1016/j.cell.2012.07.009

Evolutionary History and Adaptation from High-Coverage Whole-Genome Sequences of Diverse African Hunter-Gatherers

Joseph Lachance et al.


To reconstruct modern human evolutionary history and identify loci that have shaped hunter-gatherer adaptation, we sequenced the whole genomes of five individuals in each of three different hunter-gatherer populations at >60x coverage: Pygmies from Cameroon and Khoesan-speaking Hadza and Sandawe from Tanzania. We identify 13.4 million variants, substantially increasing the set of known human variation. We found evidence of archaic introgression in all three populations, and the distribution of time to most recent common ancestors from these regions is similar to that observed for introgressed regions in Europeans. Additionally, we identify numerous loci that harbor signatures of local adaptation, including genes involved in immunity, metabolism, olfactory and taste perception, reproduction, and wound healing. Within the Pygmy population, we identify multiple highly differentiated loci that play a role in growth and anterior pituitary function and are associated with height.


Link

July 25, 2012

Khoisan genetic prehistory (Pickrell et al. 2012)

This appears to be the first paper using the specialized Affymetrix chip, which was announced some time ago, and used in some of my previous experiments. The new array has been dubbed "Affymetrix Human Origins array" and has been composed by intersecting panels of SNPs ascertained in individuals from several world populations.

It is of course great to see that this paper has appeared as a preprint in arXiv, and hopefully this is a trend that will continue; biology should be like physics, with papers appearing immediately online for commenting, and not hidden away in authors', editors', and reviewers' drawers for months if not years before they become available to all.

I will highlight some points of particular interest to me:

Some caveats of interpretation here are warranted. First, all the Khoisan populations have some level of admixture with non-Khoisan populations. There is thus no single "split time" in their history, and any method (like the one used here) that estimates a single such time will actually be estimating a composite of several signals. Second, we have made the modeling assumption that history involves populations splitting in two with no gene flow after the split. More complex demographies are quite plausible, but render the interpretation of a split time nearly meaningless (if populations continue to exchange migrants after "splitting", they arguably have not split at all). We thus consider strong interpretations of split times estimated from genetic data to be impossible, but we nonetheless find the estimates to be useful in constraining the set of historical hypotheses that are consistent with the data.

This echoes (somewhat) my sentiments about split times being a tug-of-war in the presence of admixture. Another interesting bit from the paper:

Interestingly, a few of the Khoe-speaking populations have slightly positive f4 statistics in this comparison, and in the Shua the f4 statistic is significantly greater than zero. We speculate that some of the Khoe-speaking populations have a low level of east African ancestry, and that the relevant east African population was itself admixed with a western Eurasian population. The Shua also show a detectable signal of admixture LD, though we estimate the admixture date as much older (44 generations). This signal of east African ancestry specifically in Khoe-speaking populations is of particular interest in the light of the hypothesis that the Khoe-Kwadi languages were brought to southern Africa by a pre-Bantu pastoralist immigration from eastern Africa [Guldemann, 2008]
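
For reference, an f4 statistic of the kind mentioned in the quote can be sketched as the average over SNPs of a product of allele frequency differences. The quote does not say which four populations enter the comparison, so the argument names, the function name, and the toy frequencies below are hypothetical.

```python
def f4_statistic(a, b, c, d):
    """f4(A, B; C, D): mean over SNPs of (a - b) * (c - d).

    Each argument is a sequence of frequencies (0..1) of the same allele
    at the same SNPs. Under a simple tree ((A, B), (C, D)) with no
    admixture the expected value is zero.
    """
    products = [(pa - pb) * (pc - pd) for pa, pb, pc, pd in zip(a, b, c, d)]
    if not products:
        raise ValueError("no SNPs supplied")
    return sum(products) / len(products)

# Toy frequencies at two SNPs.
toy_f4 = f4_statistic([0.5, 0.2], [0.1, 0.2], [0.9, 0.3], [0.4, 0.8])
```

In practice a standard error, usually from a block jackknife over the genome, is needed before calling a value "significantly greater than zero"; that step is left out of this sketch.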

The authors also announce an improvement on TreeMix:

In the original TreeMix algorithm, one first builds the best-fitting tree of populations. However, this approach is not ideal if there are many admixed populations (as in our application here, where all of the Khoisan populations are admixed). To get around this, we allow for known admixture events to be incorporated into this tree-building step. Imagine that there are several populations that we think a priori might be unadmixed (in our applications, these are the Chimpanzee, Yoruba, Dinka, Europeans, and East Asians). We first build the best tree of these unadmixed populations using the standard TreeMix algorithm. Now assume we have an independent estimate of the admixture level of each Khoisan population, and imagine we know the source population for the mixture.

I don't think that Sub-Saharan African populations can any longer be considered unadmixed. When one uses SNPs ascertained in Eurasian individuals, many Sub-Saharan populations appear symmetrically related to Eurasians, because they lack variation at sites where new polymorphism appeared outside Africa.

This is not, however, the case when one uses SNPs ascertained in African individuals, and a clear pattern of differential affiliation with West Eurasians across the continent is evident. As I have said before, I strongly suspect that this is due to fairly late back-migration of Eurasians into Africa, carrying Y-haplogroup DE chromosomes. Within haplogroup CT, both major subclades, CF and DE, are represented in Eurasia, and within DE all of D, E, and DE* are found there as well. In Africa, as far as we know, only DE* and E are native. On balance, the weight of the evidence would suggest a Eurasian origin of the DE-YAP haplogroup.

(I would perhaps be as bold as to extend this into the even more basal clades of the phylogeny which turn up with surprising regularity in Eurasian datasets, and are usually discounted as the result of recent admixture. I'm not so sure; if recent admixture was at fault, then the African signal in Eurasia would be absolutely dominated by E-related lineages: but the A's and B's turn up in quite unexpected places. Are they really all recent Africans, or could they share a much deeper common ancestry? If I had deep pockets, I'd surely invest in genome sequencing the collection of such Eurasian erratics)

As a parting thought, I hope that the data used in this paper will become publicly available in time, perhaps when the article appears in journal form. True open science depends not only on the public availability of research results, but also on that of the data that produced them.

UPDATE: Here is the ADMIXTURE analysis from the paper (Figure 7):

It would have been nice if the Fst values between ancestral populations were reported in the paper, and if an East Eurasian group had been added to the analysis. In any case, there does appear to be a pattern of differential affiliation with the French population (K=2). At K=3 the main Sub-Saharan (blue) component emerges, and a few populations continue to exhibit an excess of West Eurasian affiliation.

arXiv:1207.5552v1 [q-bio.PE]
The genetic prehistory of southern Africa

Joseph K. Pickrell et al.

The hunter-gatherer populations of southern and eastern Africa are known to harbor some of the most ancient human lineages, but their historical relationships are poorly understood. We report data from 22 populations analyzed at over half a million single nucleotide polymorphisms (SNPs), using a genome-wide array designed for studies of history. The southern Africans-here called Khoisan-fall into two groups, loosely corresponding to the northwestern and southeastern Kalahari, which we show separated within the last 30,000 years. All individuals derive at least a few percent of their genomes from admixture with non-Khoisan populations that began 1,200 years ago. In addition, the Hadza, an east African hunter-gatherer population that speaks a language with click consonants, derive about a quarter of their ancestry from admixture with a population related to the Khoisan, implying an ancient genetic link between southern and eastern Africa.

Link

July 24, 2012

Broken Hill humerus: date unknown

Recently it was hinted that the Broken Hill skull is younger than has been thought before. Now, a brief communication by Erik Trinkaus in AJPA suggests that the BH humerus which has featured in debates about the evolution of human postcranial morphology lacks proper stratigraphic context, and is from a location where modern human disturbance is likely. The conclusion: it should be avoided in studies about the evolution of humans until it is securely dated.

Redating fossils is a problem that has plagued anthropology for a while. For example, H. sapiens idaltu (Herto) was given a Linnaean moniker and recognized as a precursor of modern humans because of its mix of modern and archaic traits, but shortly thereafter the Omo skulls were re-dated to tens of thousands of years earlier (~195ka), indicating that the more modern morphology of Omo I actually preceded the more archaic one of Herto Man.

When an older fossil is redated to a younger age, what was once seen as showing signs of evolutionary trends leading up to modern humans ends up being an embarrassing survival of archaic traits in a period where they are supposed to have been on their way to replacement. The once popular story of modern humans shedding their archaic traits and attaining modernity in Africa, and then populating the world, is increasingly unbelievable, both because the 60,000-year-old coastal migration is bunk and because archaic traits persist in some African populations down to the Holocene. I strongly suspect that the story of our origins may be much more interesting than anyone had imagined.

AJPA DOI: 10.1002/ajpa.22118

Brief communication: The human humerus from the Broken Hill Mine, Kabwe, Zambia

Erik Trinkaus et al.

The distal half of a right human humerus (E.898), recovered ex situ in 1925 by Hrdlicka at the Broken Hill Mine, Kabwe, Zambia, has figured prominently in assessments of Middle Pleistocene Homo postcranial variation and of the phylogenetic polarity and functional anatomy of Pleistocene Homo upper limb morphology. Reassessment of distal humeral features that distinguish modern human and some archaic Homo humeri, especially relative olecranon breadth and medial and lateral pillar thicknesses, confirm previous studies placing it morphologically close to recent humans, as well as possibly to Early Pleistocene Homo. However, it completely lacks stratigraphic context, and there is faunal and archeological evidence for human activity at Broken Hill from the Middle Pleistocene to the Holocene. Given its uncertain geological age and modern human morphology, the Broken Hill E.898 humerus should not be used in analyses of Pleistocene humans until it is securely dated. Am J Phys Anthropol, 2012. © 2012 Wiley Periodicals, Inc.

Link

July 21, 2012

Admixture matters

Until recently, tree models dominated models of human demography. Under such models, populations split off from each other in a branching pattern. African populations, and especially African hunter-gatherers, which are the most divergent, occupy the basal positions in the tree. The story has been repeated many times: Africans are more genetically diverse, Eurasians carry a subset of African genetic variation, a small subset of Africans left the continent and colonized the world after going through a severe bottleneck, and so on.

It's a simple and attractive story, but one which is wholly dependent on ignoring admixture. There are two types of admixture that are pertinent: one is admixture between modern human groups. An example of this is Ethiopia. Many studies have presumed to identify a signal of Out-of-East Africa based on diminishing distance from East Africa. But it is completely unclear how this model fares when one takes into account that East Africans are a recently admixed population: their great genetic diversity may be due to the recent intermingling of two very divergent groups of people (Caucasoids and aboriginal East Africans).

Or, consider two Englishmen, one with a Nigerian and another with a Chinese grandparent. These two individuals might appear greatly diverged from each other genetically and phenotypically, but this is the aggregate of sharing 3/4 of quite recent common ancestry (from their English grandparents), and not sharing 1/4 each of highly divergent ancestry (from their Chinese and Nigerian ones).

The situation is more interesting when we realize that admixture can occur not only between modern human groups, but also between modern humans and archaic ones. Both archaic genomes published so far (Neandertal and Denisova) show differential affiliation to modern human groups, and indirect evidence suggests that some African groups also admixed with archaic species that once lived in Africa.

Of course, levels of archaic admixture inferred from these studies are usually small, but we must remember that a little archaic goes a long way. This is due to the fact that modern humans and archaic ones diverged from each other a long time ago. Their admixture, even in highly favorable (for modern humans) proportions, introduces a substantial amount of new genetic variation. As a result, populations harboring archaic admixture appear more divergent from each other.

This point is made quite well in a new article:
If human populations do not all have the same level of archaic introgression, the current genetic structure of human populations might be partly shaped by differential admixture. Estimates of population sizes and divergence times between human populations should thus be affected by past admixture events. The divergence time between an admixed and a non-admixed population should be overestimated if admixture is not properly modelled. Similarly, the effective size of admixed populations should be overestimated as archaic lineages inflate genetic diversity. In Figure 2, we report a simulation study of this bias in a very simple case of population divergence without migration. The overestimations of divergence time and admixed population size are almost linearly increasing with admixture rate (Figure 2). For instance, a divergence time of 1,600 generations (40,000 y assuming a 25-y generation time) is perfectly recovered if none of the populations is admixed, but is overestimated by 100 generations (2,500 y) with 1% admixture in one population, and already by 350 generations (8,750 y) with 5% admixture. Even though our simulated scenario is unrealistically simple, it is likely that differential admixture should affect population genetic affinities under more complex models of population differentiation. The proper interpretation of human genetic affinities should thus probably be re-evaluated in the light of these results. In particular, the divergence between Africans and Oceanians (showing up to 5% archaic admixture [16]) could be more recent than previously reported (62–75 Kya [24]). It remains unclear whether the method used by Rasmussen et al. [24] to date this divergence is also sensitive to differential introgression, but, if that was the case, the colonization wave to Oceania thought to well predate that towards East Asia [24] could have occurred at roughly the same time once differential admixture had been taken into account.
This is an important point: the inferred early dispersal of Oceanians could in fact be the result of archaic admixture in both Africans and Oceanians. 
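
The figures in the quoted passage are easy to check. The sketch below converts generations to years using the 25-year generation time assumed in the quote, and expresses each reported overestimate as a fraction of the true divergence time; the function and variable names are my own, and the only data are the numbers quoted above.

```python
GENERATION_YEARS = 25          # generation time assumed in the quoted passage
TRUE_SPLIT_GENERATIONS = 1600  # simulated divergence time in the quote

# Admixture rate -> overestimate of the split time, in generations (from the quote).
REPORTED_BIAS = {0.01: 100, 0.05: 350}


def generations_to_years(generations, generation_years=GENERATION_YEARS):
    """Convert a number of generations to years."""
    return generations * generation_years


def relative_bias(extra_generations, true_generations=TRUE_SPLIT_GENERATIONS):
    """Overestimate expressed as a fraction of the true split time."""
    return extra_generations / true_generations


for rate, extra in REPORTED_BIAS.items():
    print(
        f"{rate:.0%} admixture: +{generations_to_years(extra):,} y "
        f"({relative_bias(extra):.1%} of "
        f"{generations_to_years(TRUE_SPLIT_GENERATIONS):,} y)"
    )
```

With 5% admixture the overestimate is roughly 22% of the true split time, which is why even modest differential admixture matters for dates like the 62-75 Kya estimate mentioned in the quote.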


Low levels of archaic admixture are sufficient to make two individuals or populations appear much more distant from each other. Archaic Homo populations may be as much as an order of magnitude more divergent from H. sapiens than particular H. sapiens groups are from each other.
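
A toy calculation shows why. Suppose one population is unadmixed and the other derives a fraction a of its ancestry from an archaic group; a lineage sampled from the second population is then archaic with probability a, so the expected divergence between the two populations is a weighted average of the modern-modern and modern-archaic divergences. This is my own back-of-the-envelope sketch, not a model from any of the papers discussed, and the tenfold ratio is simply an "order of magnitude" taken at face value.

```python
def expected_divergence(d_modern, d_archaic, archaic_fraction):
    """Expected divergence between an unadmixed population and one carrying
    a given fraction of archaic ancestry (simple weighted average)."""
    if not 0.0 <= archaic_fraction <= 1.0:
        raise ValueError("archaic_fraction must be between 0 and 1")
    return (1.0 - archaic_fraction) * d_modern + archaic_fraction * d_archaic


D_MODERN = 1.0    # divergence between two modern groups (arbitrary units)
D_ARCHAIC = 10.0  # assumed: archaic lineages ten times more divergent

for fraction in (0.0, 0.01, 0.05):
    inflated = expected_divergence(D_MODERN, D_ARCHAIC, fraction)
    print(f"{fraction:.0%} archaic ancestry -> apparent divergence x{inflated / D_MODERN:.2f}")
```

Under these assumptions, 5% archaic ancestry in just one of the two populations inflates their apparent divergence by 45%.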

But admixture can also deflate divergence, if there is subsequent gene flow between the diverged populations. As an example, Near Eastern Arab populations have both diverged from Europeans, by receiving African admixture, and converged with them, because Europeans have Neolithic Near Eastern admixture which renewed bonds between Europe and the Near East. It means little to speak of "when" Europeans and Near Eastern people diverged from each other: it's a balancing act of centrifugal and centripetal influences. If a Crusader lands on the Levant and marries a local woman, he diminishes apparent Europe-Near East genetic divergence; if a Somali does the same, he increases it. So, in the end, the apparent "divergence" between Europe and the Near East may have little to do with how much time has transpired since the colonization of a new region, and more to do with "who had sex with whom" in the intervening period.

In fact, the ability of admixture to "converge" populations is the basis of the multi-regional evolution theory, although that is usually posited in terms of gene flow. But, the basic idea is still the same: our relatively uniform human species may not be entirely the result of tree-like divergence of populations from an original African population, but rather of a confluence of streams of ancestry derived from Lower and Middle Paleolithic populations of Homo.

Admixture may not only lead us to overestimate divergence between populations: it might lead us to wrongly estimate the directionality of migration itself. 

Consider a future geneticist, working thousands of years after a collapse of civilization in the near future which led into a breakdown of long-distance travel. Such a scientist would perhaps conclude that the highest genetic diversity is to be found in North America, and conclude that North America colonized the rest of the world.

In some cases, it can well be argued that serial founder effects/bottlenecks restrict genetic variation/effective size. For example, the colonization of the Americas in three waves created a population that is clearly a subset of the East Eurasian parental population. But, notice that researchers trying to understand it had to carefully disentangle the various migration waves and cleanse their data from recent European admixture. The directionality of migration can be recovered through a diligent treatment of the evidence.

But, let's forget about the nested-subset analogy: the fact that a population X may appear to be a subset of another Y does not indicate that Y founded X any more than the fact that the genetic variation of any single European country is a subset of the cosmopolitan populations of the Americas will indicate an America-to-Europe migration to the future geneticist. Sometimes, X is a subset of Y because Y has a superset of variation formed by union with a divergent other population.

Sub-Saharan Africa is one example of a terra incognita for the historian. In the absence of written sources, science is mostly clueless as to what was going on there for thousands of years after the invention of writing in Mesopotamia. In some areas, due to the moist climate/abundant vegetation/political instability, even archaeological evidence is lacking. What this means is that we are in the dark about what admixtures were going on in the Dark Continent. Thankfully, people are working on it.

My own position is that while an origin of anatomically modern humans in Africa still seems to be correct, the pattern of divergence and reduced effective size of Eurasians from Africans is not wholly due to a small group of them leaving the continent at some Middle Pleistocene epoch.

How much of the African divergence and higher African effective population size is due to a Biblical-level bottleneck coinciding with Out-of-Africa? As readers of the blog know, I don't buy the recent Out-of-Africa model, especially in its "endangered but crafty tribe of pioneers following the coast 60,000 years ago" variety. Archaic admixtures in Eurasia and Africa inflate divergence times; back-migration may deflate them. It's yet another balancing act.

Hopefully statisticians, with a little help from archaeology and palaeoanthropology, can untangle the palimpsest of events and present us with a believable story about our own origins. It's time to give up trees and embrace networks!


PLoS Genet 8(7): e1002837. doi:10.1371/journal.pgen.1002837

Genomic Data Reveal a Complex Making of Humans

Isabel Alves et al.

In the last few years, two paradigms underlying human evolution have crumbled. Modern humans have not totally replaced previous hominins without any admixture, and the expected signatures of adaptations to new environments are surprisingly lacking at the genomic level. Here we review current evidence about archaic admixture and lack of strong selective sweeps in humans. We underline the need to properly model differential admixture in various populations to correctly reconstruct past demography. We also stress the importance of taking into account the spatial dimension of human evolution, which proceeded by a series of range expansions that could have promoted both the introgression of archaic genes and background selection.

Link

June 24, 2012

SMBE 2012 abstracts (Part II)

Some more abstracts from SMBE 2012.


The Neolithic trace in mitochondrial haplogroup U8 
Joana Barbosa Pereira (1,2), Marta Daniela Costa (1,2), Pedro Soares (2), Luísa Pereira (2,3), Martin Brian Richards (1,4)
(1) Institute of Integrative and Comparative Biology, Faculty of Biological Sciences, University of Leeds, Leeds, UK; (2) Instituto de Patologia e Imunologia Molecular da Universidade do Porto, Porto, Portugal; (3) Faculdade de Medicina da Universidade do Porto, Porto, Portugal; (4) School of Applied Sciences, University of Huddersfield, Huddersfield, UK

The mitochondrial DNA (mtDNA) still remains an important marker in the study of human history, especially if considering the increasing amount of data available. Among the several questions regarding human history that are under debate, the model of expansion of agriculture into Europe from its source in the Near East is still unclear. Recent studies have indicated that clusters belonging to haplogroup K, a major clade from U8, might be related with the Neolithic expansions. Therefore, it is crucial to identify the founder lineages of the Neolithic in Europe so that we may understand the real genetic input of the first Near Eastern farmers in the current European population and comprehend how agriculture spread so quickly throughout all Europe.

In order to achieve this goal, a total of 55 U8 samples from the Near East, Europe and North Africa were selected for complete characterisation of mtDNA. A maximum-parsimonious phylogenetic tree was constructed using all published sequences available so far. Coalescence ages of specific clades were estimated using ρ statistic, maximum likelihood and Bayesian methods considering a mutation rate for the complete molecule corrected for purifying selection.

Our results show that U8 dates to ~37-54 thousand years ago (ka) suggesting that this haplogroup might have been carried by the first modern humans to arrive in Europe, ~50 ka. Haplogroup K most likely originated in the Near East ~23-32 ka where it might have remained during the Last Glacial Maximum, between 26-19 ka. The majority of K subclades date to the Late Glacial and are related with the repopulation of Europe from the southern refugia areas. Only a few lineages appear to reflect post glacial, Neolithic or post-Neolithic expansions, mostly occurring within Europe. The major part of the lineages dating to the Neolithic period seems to have an European origin with exception of haplogroup K1a4 and K1a3. Clade K1a4 appears to be originated from the Near East where it also reaches its highest peak of diversity. Despite the main clades of K1a4 arose in the Near East during the Late Glacial, its subclade K1a4a1 dates to ~9-11 ka and is most likely related with the Neolithic dispersal to Europe. Similarly, K1a3 probably originated in the Near East during the Late Glacial and its subclade K1a1a dispersed into Europe ~11-13 ka alongside with the expansion of agriculture.

Late Glacial Expansions in Europe revealed through the fine-resolution characterisation of mtDNA haplogroup U8
Marta Daniela Costa (1,2), Joana Barbosa Pereira (1,2), Pedro Soares (2), Luisa Pereira (2,3), Martin Brian Richards (1,4)
(1) Institute of Integrative and Comparative Biology, Faculty of Biological Sciences, University of Leeds, Leeds, UK; (2) IPATIMUP - Instituto de Patologia e Imunologia Molecular da Universidade do Porto, Porto, Portugal; (3) Faculdade de Medicina, Universidade do Porto, Porto, Portugal; (4) School of Applied Sciences, University of Huddersfield, Huddersfield, UK

The maternally inherited and fast evolving mitochondrial DNA (mtDNA) molecule is a highly informative tool with which to reconstruct human prehistory. This has become even more true in recent years, as mtDNA based studies are becoming more robust and powerful due to the availability of complete mtDNA genomes. These allow better mutation rate estimates and fine-resolution characterisation of the phylogeography of mtDNA haplogroups, or named clades. MtDNA haplogroup K, the major subclade of U8, occurs at low frequencies through West Eurasian populations, and is much more common in Ashkenazi Jews. However, the lack of variation on the first hypervariable segment (HVSI) has precluded any meaningful phylogeographic analysis to date. We therefore completely sequenced 50 haplogroup K and 5 non-K U8 mtDNA samples from across Europe and the Near East, and combined them with 343 genomes previously deposited in GenBank, in order to reconstruct a detailed phylogenetic tree. By combining several inference methods, including maximum parsimony, maximum likelihood and Bayesian inference it was possible to trace the timescale and geography of the main expansions and dispersals associated with this lineage. We confirmed that haplogroup K, dating to ~32 thousand years (ka) ago, descended from the U8 clade, which coalesces ~48 ka ago. The latter is close to the timing of the first arrival of modern humans in Europe and U8 could be one of the few surviving mtDNA lineages brought by the first settlers from the Near East. U8 split into the widespread U8b, at ~43 ka, and U8a, which seems to have expanded only in Europe ~24 ka ago. Considering the pattern of diversity and the geographic distribution, haplogroup K is most likely to have arisen in the Near East, ~32 ka ago. However, some subclades were evidently carried to Europe during the Last Glacial Maximum (LGM). We observed significant expansions of haplogroup K lineages in the Late Glacial period (14-19 ka), reflecting expansions out of refuge areas in southwest and possibly also southeast Europe.

Reticulated origin of domesticated tetraploid wheat 
Peter Civan, Centro de Ciencias do Mar, Universidade do Algarve, Faro, Portugal

The past 15 years have witnessed a notable scientific interest in the topic of crop domestication and the emergence of agriculture in the Near East. Multi-disciplinary approaches brought a significant amount of new data and a multitude of hypotheses and interpretations. However, some seemingly conflicting evidence, especially in the case of emmer wheat, caused certain controversy and a broad scientific consensus on the circumstances of the wheat domestication has not been reached, yet.

The past phylogenetic research has translated the issue of wheat domestication into somewhat simplistic mono-/polyphyletic dilemma, where the monophyletic origin of a crop signalizes rapid and geographically localized domestication, while the polyphyletic evidence suggests independent, geographically separated domestication events. Interestingly, the genome-wide and haplotypic data analyzed in several studies did not yield consistent results and the proposed scenarios are usually in conflict with the archaeological evidence of lengthy domestication.

Here I suggest that the main cause of the above mentioned inconsistencies might lie in the inadequacy of the divergent, tree-like evolutional model. The inconsistent phylogenetic results and implicit archaeological evidence indicate a reticulate (rather than divergent) origin of domesticated emmer. Reticulated genealogy cannot be properly represented on a phylogenetic tree; hence different sets of samples and genetic loci are prone to conclude different domestication scenarios. On a genome-wide super-tree, the conflicting phylogenetic signals are suppressed and the origin of domesticated crop may appear monophyletic, leading to misinterpretations of the circumstances of the Neolithic transition.

The network analysis of multi-locus sequence data available for tetraploid wheat clearly supports the reticulated origin of domesticated emmer and durum wheat. The concept of reticulated genealogy of domesticated wheat sheds new light onto the emergence of Near-Eastern agriculture and is in agreement with current archaeological evidence of protracted and dispersed emmer domestication.

High-coverage population genomics of diverse African hunter-gatherers 
Joseph Lachance (1), Benjamin Vernot (2), Clara Elbers (1), Bart Ferwerda (1), Alain Froment (3), Jean-Marie Bodo (4), Godfrey Lema (5), Thomas Nyambo (5), Timothy Rebbeck (1), Kun Zhang (6), Joshua Akey (2), Sarah Tishkoff (1)
(1) University of Pennsylvania, Philadelphia, PA, USA; (2) University of Washington, Seattle, WA, USA; (3) IRD-MNHN, Musee de l'Homme, Paris, France; (4) Ministere de la Recherche Scientifique et de l'Innovation, Yaounde, Cameroon; (5) Muhimbili University College of Health Sciences, Dar es Salaam, Tanzania; (6) University of California at San Diego, San Diego, CA, USA

In addition to their distinctive subsistence patterns, African hunter-gatherers belong to some of the most genetically diverse populations on Earth. To infer demographic history and detect signatures of natural selection, we sequenced the whole genomes of five individuals in each of three geographically and linguistically diverse African hunter-gatherer populations at >60x coverage. In these 15 genomes we identify 13.4 million variants, many of which are novel, substantially increasing the set of known human variation. These variants result in allele frequency distributions that are free of SNP ascertainment bias. This genetic data is used to infer population divergence times and demographic history (including population bottlenecks and inbreeding). We find that natural selection continues to shape the genomes of hunter-gatherers, and that deleterious genetic variation is found at similar levels for hunter-gatherers and African populations with agricultural or pastoral subsistence patterns. In addition, the genomes of each hunter-gatherer population contain unique signatures of local adaptation. These highly-divergent genomic regions include genes involved in immunity, metabolism, olfactory and taste perception, reproduction, and wound healing.

Reconstructing past Native American genetic diversity in Puerto Rico from contemporary populations
Marina Muzzio (1,2), Fouad Zakharia (1), Karla Sandoval (1), Jake K. Byrnes (3), Andres Moreno-Estrada (1), Simon Gravel (1), Eimear Kenny (1), Juan L. Rodriguez-Flores (5), Chris R. Gignoux (6), Wilfried Guiblet (4), Julie Dutil (7), The 1000 Genomes Consortium, Andres Ruiz-Linares (8), David Reich (9,10), Taras K. Oleksyk (4), Juan Carlos Martinez-Cruzado (4), Esteban Gonzalez Burchard (6), Carlos D. Bustamante (1)
(1) Department of Genetics, Stanford University School of Medicine, Stanford, California, USA; (2) Facultad de Ciencias Naturales, Universidad Nacional de La Plata, La Plata, Buenos Aires, Argentina; (3) Ancestry.com®, San Francisco, California, USA; (4) Department of Biology, University of Puerto Rico at Mayagüez, Mayagüez, Puerto Rico; (5) Department of Genetic Medicine, Weill Cornell Medical College, New York, New York, USA; (6) Institute for Human Genetics, University of California San Francisco, San Francisco, California, USA; (7) Ponce School of Medicine, Ponce, Puerto Rico; (8) Department of Genetics, Evolution and Environment, University College London, London, UK; (9) Department of Genetics, Harvard Medical School, Boston, Massachusetts, USA; (10) Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA

The Caribbean region has a rich cultural and biological diversity, including several countries with different languages,  and important historical events like the arrival of the Europeans in the late fifteenth century affected it deeply. Although it  has been said that two main Native American groups peopled the Caribbean at the time of Columbus’s voyages—the  Arawakan-speaking Tainos and the Caribs—this model has been questioned because it comes from the descriptions  written by the conquerors. The archaeological record shows a richer picture of trade among the islands, cultural change  and diversity than what colonial documents depict, from the early settlements around 8000 B.P. to the chiefdoms and  towns at the time of contact. How this area was peopled and how its inhabitants interacted with the surrounding  continent are questions that remain to be answered due to the fragmentary nature of the historical and archaeological  records.   
We aim to reconstruct the Native American genetic diversity from the time of the Spanish arrival at the island of Puerto  Rico from its contemporary population. We seek to find out how the original peopling of Puerto Rico occurred, along  with which contemporary Native American populations are the most closely related to the Native tracks found. We used  PCAdmix to trace Native American segments in admixed individuals, thus enabling us to reconstruct the original native  lineages previous to the European and African contact.   

Specifically, we generated local ancestry calls for the 70 parents of the 35 complete Puerto Rican trios from the whole-genome and Illumina Omni 2.5M chip Genotype data of the 1000 Genomes Project, both to examine genome-wide admixture patterns and to infer demographic historical events from ancestry tract length distributions and an ancestry-specific PCA approach, adding 55 Native American groups as potential source populations (N=475 genotyped through Illumina's 650K array) and 15 selected Mexican trios (genotyped on Affymetrix's 6.0 array, including about 906,000 SNPs) to provide population context. ADMIXTURE analysis has shown that in Puerto Rico there is no single source of contribution for the Native component. Rather, this component seems to include a mixture of major Mexican and Andean components with little contributions from the Amazonian isolates. On the other hand, the ancestry-specific PCA plotted the Puerto Rican Native segments tightly clustered with the Native segments of groups from the same language family as the Tainos (Equatorial-Tucanoan), showing a clear association between linguistics and genetics instead of a geographical one.

Inference of demographic history and natural selection in African Pygmy populations from whole-genome sequencing data
Martin Sikora (1), Etienne Patin (2), Helio Costa (1), Katherine Siddle (2), Brenna M Henn (1), Jeffrey M Kidd (1,3), Ryosuke Kita (1), Carlos D Bustamante (1), Lluis Quintana-Murci (2)
(1) Department of Genetics, School of Medicine, Stanford University, Stanford, CA, USA; (2) Unit of Human Evolutionary Genetics, Institut Pasteur, CNRS URA3012, Paris, France; (3) Departments of Human Genetics and Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI, USA

The Pygmy populations of Central Africa are some of the last remaining hunter-gatherers among present-day human populations, and can be broadly classified into two geographically separated groups, the Western and Eastern Pygmies. Compared to their neighboring populations of predominantly Bantu origin, Pygmy populations show distinct cultural and physical characteristics, most notably short stature, often referred to as the “Pygmy phenotype”. Given their distinct physical characteristics, the questions of the demographic history and origin of the Pygmy phenotype have attracted much attention. Previous studies have shown an ancient divergence (~60,000 years ago) of the ancestors of modern-day Pygmies from non-Pygmies, and a more recent split of the Eastern and Western Pygmy groups. However, these studies were generally based on a relatively small set of markers, precluding accurate estimations of demographic parameters. Furthermore, despite the considerable interest, to date there is still little known about the genetic basis of the small stature phenotype of Pygmy populations.
In order to address these questions, we sequenced the genomes of 47 individuals from three populations: 20 Baka, a  Pygmy hunter-gatherer population from the Western subgroup of the African Pygmies; 20 Nzebi, a neighboring nonPygmy agriculturist population from the Bantu ethnolinguistic group; as well as 7 Mbuti, Eastern Pygmy population, from  the Human Genome Diversity Project (HGDP). We performed whole-genome sequencing using Illumina Hi-Seq 2000 to  a median sequencing depth of 5.5x per individual. After stringent quality control filters, we call over 17 Million SNPs  across the three populations, 32% of them novel (relative to dbSNP 132). Genotype accuracy after imputation was  assessed using genotype data from the Illumina OMNI1 SNP array, and error rates were found to be comparable to  other low-coverage studies (< 3% for most individuals). Preliminary results show relatively low genetic differentiation  between the Baka and the Nzebi (mean FST = 0.026), whereas the Mbuti show higher differentiation to both Baka and  Nzebi (mean FST = 0.060 and 0.070, respectively). Furthermore, we find that alleles previously found to be associated with height in other populations are not enriched for the “small” alleles in the Pygmy populations. We find a number of  highly differentiated genomic regions as candidate loci for height differentiation, which will be verified using simulations  under the best-fit demographic model, inferred from multi-dimensional allele frequency spectra using DaDi. Our dataset  will allow a detailed investigation of the demographic history and the genomics of adaptation in these populations.
Genetic structure in North African human populations and the gene flow to Southern Europe
Laura R Botigué 1, Brenna M Henn 2, Simon Gravel 2, Jaume Bertranpetit 1, Carlos D Bustamante 2, David Comas 1
1 Institut de Biologia Evolutiva (IBE, CSIC-UPF), Barcelona, Spain; 2 Stanford University, Stanford CA, USA
Despite being in the African continent and at the shores of the Mediterranean, North African populations might have experienced a different population history compared to their neighbours. However, the extent of their genetic divergence and gene flow from neighbouring populations is poorly understood. In order to establish the genetic structure of North Africans and the gene flow with the Near East, Europe and sub-Saharan Africa, genome-wide SNP genotyping array data (730,000 sites) from several North African and Spanish populations were analysed and compared to a set of African, European and Middle Eastern samples. We identify a complex pattern of autochthonous, European, Near Eastern, and sub-Saharan components in extant North African populations, where the autochthonous component diverged from the European and Near Eastern component more than 12,000 years ago, pointing to a pre-Neolithic “back-to-Africa” gene flow. To estimate the time of migration from sub-Saharan populations into North Africa, we implement a maximum likelihood dating method based on the frequency and length distribution of migrant tracts, which has suggested a migration of western African origin into Morocco ~1,200 years ago and a migration of individuals with Nilotic ancestry into Egypt ~750 years ago.
We characterize broad patterns of recent gene flow between Europe and Africa, with a gradient of recent African ancestry that is highest in southwestern Europe and decreases in northern latitudes. The elevated shared African ancestry in SW Europe (up to 20% of the individuals’ genomes) can be traced to populations in the North African Maghreb. Our results, based on both allele frequencies and shared haplotypes, demonstrate that recent migrations from North Africa substantially contribute to the higher genetic diversity in southwestern Europe.

Estimating a date of mixture of ancestral South Asian populations
Priya Moorjani 1,2, Nick Patterson 2, Periasamy Govindaraj 3, Danish Saleheen 4, John Danesh 4, Lalji Singh* 3,5, Kumarasamy Thangaraj* 3, David Reich* 1,2
1 Harvard University, Boston, Massachusetts, USA; 2 Broad Institute, Cambridge, Massachusetts, USA; 3 Centre for Cellular and Molecular Biology, Hyderabad, Andhra Pradesh, India; 4 Dept of Public Health and Care, University of Cambridge, Cambridge, UK; 5 Genome Foundation, Hyderabad, Andhra Pradesh, India
Linguistic and genetic studies have demonstrated that almost all groups in South Asia today descend from a mixture of two highly divergent populations: Ancestral North Indians (ANI) related to Central Asians, Middle Easterners and Europeans, and Ancestral South Indians (ASI) not related to any populations outside the Indian subcontinent. ANI and ASI have been estimated to have diverged from a common ancestor as much as 60,000 years ago, but the date of the ANI-ASI mixture is unknown. Here we analyze data from about 60 South Asian groups to estimate that major ANI-ASI mixture occurred 1,200-4,000 years ago. Some mixture may also be older (beyond the time we can query using admixture linkage disequilibrium), since it is universal throughout the subcontinent: present in every group speaking Indo-European or Dravidian languages, in all caste levels, and in primitive tribes. After the ANI-ASI mixture that occurred within the last four thousand years, a cultural shift led to widespread endogamy, decreasing the rate of additional mixture.
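The abstract dates the mixture from admixture linkage disequilibrium. As a rough illustration of the idea (not the authors' actual pipeline), admixture LD between markers is commonly modeled as decaying exponentially with genetic distance d (in Morgans) at a rate equal to the number of generations n since mixture, roughly A·exp(-n·d). The sketch below recovers n by a log-linear least-squares fit. The function name, the synthetic noise-free curve and the 29-year generation time are all assumptions made for illustration only.

```python
import math

def fit_generations(distances_morgans, ld_values):
    """Least-squares fit of log(LD) = log(A) - n * d; returns the estimate of n."""
    xs = distances_morgans
    ys = [math.log(v) for v in ld_values]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return -sxy / sxx  # slope of the log-linear fit is -n

# Synthetic, noise-free decay curve (hypothetical values, not data from the study).
true_generations = 100
distances = [0.005 * k for k in range(1, 41)]  # 0.5 cM to 20 cM, in Morgans
ld_curve = [0.02 * math.exp(-true_generations * d) for d in distances]

n_hat = fit_generations(distances, ld_curve)
years_per_generation = 29  # assumed generation time
date_years = n_hat * years_per_generation
print(round(n_hat), round(date_years))
```

Real LD curves are noisy and can dip below zero at long distances, so a practical fit would use nonlinear least squares on the raw values rather than taking logarithms.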
Long IBD in Europeans and recent population history 
Peter Ralph, Graham Coop  UC Davis, Davis, CA, USA  
Numbers of common ancestors shared at various points in time across populations  can tell us about recent demography, migration, and population movements.  These rates of shared ancestry over tens of generations can be inferred from  genomic data, thereby dramatically increasing our ability to infer population  history much more recent than was previously possible with population genetic  techniques.  We have analyzed patterns of IBD in a dataset of thousands of  Europeans from across the continent, which provide a window into recent  European geographic structure and migration.   
Gene flow between human populations during the exodus from Africa, and the timeline of recent human  evolution  
Aylwyn Scally, Richard Durbin  Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, UK 
We present a novel test for historical gene flow between populations using unphased genotypes in present-day individuals, based on the sharing of derived alleles and making a minimal set of assumptions about their demographic history. We apply this test to data for three human individuals of African, European and Asian ancestry. We find that the joint distribution of European and Asian genotypes is compatible with these populations having separated cleanly at some time in the past without subsequent genetic exchange. However the same is not true of the European-African and Asian-African distributions, which instead suggest an extended period of continued exchange between African and non-African populations after their initial separation.
We discuss this in comparison with recent models and estimates of separation time between these populations. We also consider the impact of recent direct experimental studies of the human mutation rate, which suggest rates of around 0.5 × 10⁻⁹ bp⁻¹ y⁻¹, substantially lower than prior estimates of 1 × 10⁻⁹ bp⁻¹ y⁻¹ obtained from calibration against the primate fossil record. We show that in several places the lower rate, implying older dates, yields better agreement between genetic and non-genetic (paleoanthropological and archaeological) evidence for events surrounding the exodus of modern humans from Africa and their dispersion worldwide.
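Why a lower mutation rate implies older dates can be seen with a simple molecular-clock calculation. This is a minimal sketch, assuming a strict clock in which two lineages accumulate a per-base-pair divergence d at rate μ each, so that T = d / (2μ). The divergence value is hypothetical, and the formula ignores ancestral polymorphism; it is not the test described in the abstract.

```python
def split_time_years(divergence_per_bp, mu_per_bp_per_year):
    """Strict-clock split time: T = d / (2 * mu), since both lineages accumulate mutations."""
    return divergence_per_bp / (2.0 * mu_per_bp_per_year)

d = 1.0e-4  # hypothetical per-bp divergence, for illustration only

t_fossil_rate = split_time_years(d, 1.0e-9)    # rate from fossil calibration
t_direct_rate = split_time_years(d, 0.5e-9)    # rate from direct measurement

# Halving the rate doubles the inferred time for the same observed divergence.
ratio = t_direct_rate / t_fossil_rate
print(t_fossil_rate, t_direct_rate, ratio)
```

Because the observed divergence is fixed by the data, every date inferred this way scales inversely with the assumed rate.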
Long-term presence versus recent admixture: Bayesian and approximate-Bayesian analyses of genetic  diversity of human populations in Central Asia 
Friso Palstra, Evelyne Heyer, Frederic Austerlitz  Eco-anthropologie et Ethnobiologie UMR 7206 CNRS, Equipe Genetique des Populations Humaines, Museum National  d'Histoire Naturelle, Paris, France 
A long-standing goal in population genetics is to unravel the relative importance of evolutionary forces that shape  genetic diversity. Here we focus on human populations in Central Asia, a region that has long been known to contain  the highest genetic diversity on the Eurasian continent. However, whether this variation principally reflects long-term  presence, or rather the result of admixture associated with repeated migrations into this region in more recent historical  times, remains unclear. Here we investigate the underlying demographic history of Central Asian populations in explicit  relation to Western Europe, Eastern Asia and the Middle East. For this purpose we employ both full Bayesian and  approximate-Bayesian analyses of nuclear genetic diversity in 20 unlinked non-coding resequenced DNA regions,  known to be at least 200 kb apart from any known gene, mRNA or spliced EST (total length of 24 kb), and 22 unlinked  microsatellite loci.   
Using an approximate Bayesian framework, we find that present patterns of genetic diversity in Central Asia may be  best explained by a demographic history which combines long-term presence of some ethnic groups (Indo-Iranians)  with a more recent admixed origin of other groups (Turco-Mongols). Interestingly, the results also provide indications  that this region might have genetically influenced Western European populations, rather than vice versa. A further  evaluation in MCMC-based Bayesian analyses of isolation-with-migration models confirms the different times of  establishment of ethnic groups, and suggests gene flow into Central Asia from the east. The results from the  approximate Bayesian and full Bayesian analyses are thus largely congruent. In conclusion, these analyses illustrate  the power of Bayesian inference on genetic data and suggest that the high genetic diversity in Central Asia reflects both  long-term presence and admixture in more recent historical times. 
Population structure and evidence of selection in the Khoe-San and Coloured populations from southern Africa 
Carina Schlebusch 1, Pontus Skoglund 1, Per Sjödin 1, Lucie Gattepaille 1, Sen Li 1, Flora Jay 2, Dena Hernandez 3, Andrew Singleton 3, Michael Blum 2, Himla Soodyall 4,5, Mattias Jakobsson 1
1 Uppsala University, Uppsala, Sweden; 2 Université Joseph Fourier, Grenoble, France; 3 National Institute on Aging (NIH), Bethesda, USA; 4 University of the Witwatersrand, Johannesburg, South Africa; 5 National Health Laboratory Service, Johannesburg, South Africa

The San and Khoe people currently represent remnant groups of a much larger and widely distributed population of  hunter-gatherers and pastoralists who had exclusive occupation of southern Africa before the arrival of Bantu-speaking  groups in the past 1,200 years and sea-borne immigrants within the last 350 years. Mitochondrial DNA, Y-chromosome  and autosomal studies conducted on a few San groups revealed that they harbour some of the most divergent lineages  found in living peoples throughout the world.   

We used autosomal data to characterize patterns of genetic variation among southern African individuals in order to understand human evolutionary history, in particular the demographic history of Africa. To this end, we successfully genotyped ~2.3 million genome-wide SNP markers in 220 individuals, comprising seven Khoe-San, two Coloured and two Bantu-speaking groups from southern Africa. After quality filtering, the data were combined with publicly available SNP data from other African populations to investigate stratification and demography of African populations. We also applied a newly developed method of estimating population topology and divergence times. Genotypes and inferred haplotypes were used to assess genetic diversity, patterns of haplotype variation and linkage disequilibrium in different populations.

We found that six of the seven Khoe-San populations form a common population lineage basal to all other modern human populations. The studied Khoe-San populations are genetically distinct, with diverse histories of gene flow with surrounding populations. A clear geographic structuring among Khoe-San groups was observed; the northern and southern Khoe-San groups were most distinct from each other, with the central Khoe-San group being intermediate. The Khwe group contained variation that distinguished it from other Khoe-San groups. Population divergence within the Khoe-San group is approximately 1/3 as ancient as the divergence of the Khoe-San as a whole to other human populations (on the same order as the time of divergence between West Africans and Eurasians). Genetic diversity in some, but not all, Khoe-San populations is among the highest worldwide, but it is influenced by recent admixture. We furthermore find evidence of a Nilo-Saharan ancestral component in certain Khoe-San groups, possibly related to the introduction of pastoralism to southern Africa.

We searched for signatures of selection in the different population groups by scanning for differentiated genome-regions  between populations and scanning for extended runs of haplotype homozygosity within populations. By means of the  selection scans, we found evidence for diverse adaptations in groups with different demographic histories and modes of  subsistence. 
Impacts of life-style on human evolutionary history: A genome-wide comparison of herder and farmer  populations in Central Asia 
Michael C. Fontaine 1,2, Laure Segurel 2,3, Christine Lonjou 4, Tatiana Hegay 5, Almaz Aldashev 6, Evelyne Heyer 2, Frederic Austerlitz 1,2
1 Ecology, Systematics & Evolution, UMR8079 Univ. Paris Sud - CNRS - AgroParisTech, Orsay, France; 2 Eco-Anthropologie et Ethnobiologie, UMR 7206 CNRS, MNHN, Univ Paris Diderot, Sorbonne Paris Cite, Paris, France; 3 Department of Human Genetics, University of Chicago, Chicago, USA; 4 C2BiG (Centre de Bioinformatique/Biostatistique Genomique d’Ile de France), Plateforme Post-genomique P3S, Hopital Pitie Salpetriere, Paris, France; 5 Uzbek Academy of Sciences, Institute of Immunology, Tashkent, Uzbekistan; 6 Institute of Molecular Biology and Medicine, National Center of Cardiology and Internal Medicine, Bishkek, Kyrgyzstan

Human populations use a variety of subsistence strategies to exploit an exceptionally broad range of habitats and dietary components. These aspects of human environments have changed dramatically during human evolution, giving rise to new selective pressures. Here we focused on two populations in Central Asia with long-term contrasted lifestyles: Kyrgyz, who are traditionally nomadic herders, with a traditional diet based on meat and milk products, and Tajiks, who are traditionally agriculturalists, with a traditional diet based mostly on cereals. We genotyped 93 individuals for more than 600,000 SNP markers (Human-660W-Quad-V1.0 from Illumina) spread across the genome. We first analysed the population structure of these two populations in the world-wide context by combining our results with other available genome-wide data. Principal component and Bayesian clustering analyses revealed that Tajiks and Kyrgyz are both admixed populations which differed, however, from each other with respect to their ancestry proportions: Tajiks display a much larger proportion of common ancestry with European populations, while Kyrgyz share a larger common ancestry with Asiatic populations. We then examined the regions of the genome displaying unusual population differentiation between these two populations to detect natural selection and checked whether they were specific to Central Asia or not. We complemented these analyses with haplotype-based analyses of selection.
Bayesian inference of the demographic history of Niger-Congo speaking populations 
Isabel Alves 1,2, Lounès Chikhi 2,3, Laurent Excoffier 1,4
1 CMPG, Institute of Ecology and Evolution, Berne, Switzerland; 2 Population and Conservation Genetics Group, Instituto Gulbenkian de Ciência, Oeiras, Portugal; 3 CNRS, Université Paul Sabatier, ENFA, Toulouse, France; 4 Swiss Institute of Bioinformatics, Lausanne, Switzerland
The Niger-Congo phylum encompasses more than 1500 languages spread over sub-Saharan Africa. This current wide range is mostly due to the spread of Bantu-speaking people across sub-equatorial regions in the last 4000-5000 years. Although several genetic studies have focused on the evolutionary history of Bantu-speaking groups, much less effort has been put into the relationship between Bantu and non-Bantu Niger-Congo groups. Additionally, archaeological and linguistic evidence suggest that the spread of these populations occurred in distinct directions from the core region located in what is now the border between Nigeria and Cameroon towards West and South Africa, respectively. We have performed coalescent simulations within an approximate Bayesian computation (ABC) framework in order to statistically evaluate the relative probability of alternative models of the spread of Niger-Congo speakers and to infer demographic parameters underlying these important migration events. We have analysed 61 high-quality microsatellite markers, genotyped in 130 individuals from three Bantu-speaking and three non-Bantu-speaking populations, representing a "Southern wave" or the Bantu expansion, and a "Western wave", respectively. Preliminary results suggest that models inspired by a spatial spread of the populations are better supported than classical isolation with migration (IM) models. We also find that Niger-Congo populations currently maintain high levels of gene flow with their neighbours, and that they expanded from a single source between 200 and 600 generations ago, even though available genetic data do not provide enough information to accurately infer these demographic parameters.
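For readers unfamiliar with approximate Bayesian computation, the simplest variant is rejection sampling: draw a parameter from the prior, simulate data under it, and keep the draw only if a summary statistic of the simulated data falls close to the observed one. The sketch below is a toy under stated assumptions, not the authors' analysis: the exponential "simulator" stands in for a coalescent simulator, and the prior bounds, observed value and tolerance are hypothetical.

```python
import random

def simulate_summary(theta, n_loci, rng):
    """Toy stand-in for a coalescent simulator: mean of n_loci exponential draws with mean theta."""
    return sum(rng.expovariate(1.0 / theta) for _ in range(n_loci)) / n_loci

def abc_rejection(observed, prior_low, prior_high, n_draws, tolerance, n_loci=61, seed=0):
    """Return the prior draws whose simulated summary lies within `tolerance` of `observed`."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(prior_low, prior_high)   # sample from a flat prior
        simulated = simulate_summary(theta, n_loci, rng)
        if abs(simulated - observed) <= tolerance:   # accept only near-matches
            accepted.append(theta)
    return accepted

# Hypothetical observed summary statistic and prior range.
posterior = abc_rejection(observed=5.0, prior_low=1.0, prior_high=10.0,
                          n_draws=5000, tolerance=0.25)
posterior_mean = sum(posterior) / len(posterior)
print(len(posterior), round(posterior_mean, 2))
```

The accepted draws approximate the posterior distribution. Comparing acceptance rates across competing simulators is one simple way to weigh alternative models, which is the kind of model comparison the abstract describes.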

A genetic study of skin pigmentation variation in India  
Mircea Iliescu 1, Chandana Basu Mallick 2,3, Niraj Rai 4, Anshuman Mishra 4, Gyaneshwer Chaubey 2, Rakesh Tamang 4, Märt Möls 3, Rie Goto 1, Georgi Hudjashov 2,3, Srilakshmi Raj 1, Ramasamy Pitchappan 5, CG Nicholas Mascie-Taylor 1, Lalji Singh 4,6, Marta Mirazon-Lahr 7, Mait Metspalu 2,3, Kumarasamy Thangaraj 4, Toomas Kivisild 1,3
1 Division of Biological Anthropology, University of Cambridge, Cambridge, UK; 2 Evolutionary Biology Group, Estonian Biocentre, Tartu, Estonia; 3 Institute of Molecular and Cell Biology, University of Tartu, Tartu, Estonia; 4 Centre for Cellular and Molecular Biology, Hyderabad, India; 5 Chettinad Academy of Research and Education, Chettinad Health City, Chennai, India; 6 Banaras Hindu University, Varanasi, India; 7 Leverhulme Centre for Human Evolutionary Studies, Division of Biological Anthropology, University of Cambridge, Cambridge, UK

Human skin colour is a polygenic trait that is primarily determined by the amount and type of melanin produced in the skin. The pigmentation variation between human populations across the world is highly correlated with geographic latitude and the amount of UV radiation. Association studies together with research involving different model organisms and coat colour variation have largely contributed to the identification of more than 378 pigmentation candidate genes. These include TYR and OCA2, which are known to cause albinism, MC1R, responsible for the red hair phenotype, and genes such as MATP, SLC24A5 and ASIP that are involved in normal pigmentation variation. In particular, SLC24A5 has been shown to explain one third of the pigmentation difference between Europeans and Africans. However, the same gene cannot explain the lighter East Asian phenotype; therefore, light pigmentation could be the result of convergent evolution. A study on UK residents of Pakistani, Indian and Bangladeshi descent found significant association of SLC24A5, SLC45A2 and TYR genes with skin colour. While these genes may explain a significant proportion of interethnic differences in skin colour, it is not clear how much variation such genes explain within Indian populations, who are known for their high level of diversity of pigmentation. We have tested 15 candidate SNPs for association with melanin index in a large sample of 1300 individuals, from three related castes native to South India. Using a logistic regression model, we found that the SLC24A5 functional SNP, rs1426654, is strongly associated with pigmentation in our sample and alone explains more than half of the skin colour difference between the light and the dark group of individuals. Conversely, the other tested SNPs fail to show any significance; this strongly argues in favour of one gene having a major effect on skin pigmentation within ethnic groups of South India, with other genes having small additional effects on this trait. We genotyped the SLC24A5 variant in over 40 populations across India and found that latitudinal differences alone cannot explain its frequency patterns in the subcontinent. Key questions arising from this research are when and where the light skin variant entered South Asia, and the manner and reason for its spread across the Indian subcontinent. Hence, a comprehensive view of skin colour evolution requires that in-depth sequence information be corroborated with population (genetic) history and with ancient DNA data of past populations of Eurasia.