August 25, 2012

Genes and Geography (Wang et al. 2012)

Gene-geography correlations have been explored before at a regional level. More recently, they were also studied at the global level with the SPA method. A new open-access paper quantifies gene-geography correlations across the world and within several of its regions.

These correlations arise from the fact that humans tend to intermarry with their neighbors, so the probability that an allele carried by a person at location X is passed on to future generations elsewhere decreases the further we go from X. The more interesting cases, however, are those that violate the overall pattern. These usually arise because of genetic isolation or long-distance migration. An example is that of the African hunter-gatherer groups:
When hunter-gatherer populations (!Kung, San, Biaka Pygmy, and Mbuti Pygmy) and Mbororo Fulani were included in the analysis, they appeared as isolated clusters on the PCA plots and greatly reduced the similarity between PCA maps and geographic maps (Figure S3, Table S7). The similarity score decreased from 0.790 to 0.548 after including all five of these populations in the analysis. This value, however, is still statistically significant, with a P-value of ; further, if we disregard the hunter-gatherer populations and Mbororo Fulani in Figure S3B and only examine the relative locations of the original 23 populations, we can still find a clear resemblance between genetic and geographic coordinates. Compared to the other 23 populations, the four hunter-gatherer populations appear as isolated groups at the south, and Mbororo Fulani appears at the north. These observations are clearer in plots with only one among the five outlier populations included at a time (Figure S3C–S3G), each of which also produces significant similarity scores between genetic and geographic coordinates (Figure S4, Table S7).
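To make the idea of a "similarity score" between a PCA map and a geographic map concrete, here is a minimal sketch of one standard way to compute such a score: a Procrustes comparison, in which one 2D point set is shifted, scaled, rotated and possibly reflected to fit the other as well as it can. I am assuming that a statistic of this kind underlies the numbers quoted above; the function names, the permutation test and the toy coordinates are my own illustration, not code or data from the paper.

```python
import math
import random


def _centered_complex(points):
    """Turn (x, y) pairs into complex numbers centered on their centroid."""
    zs = [complex(x, y) for x, y in points]
    mean = sum(zs) / len(zs)
    return [z - mean for z in zs]


def procrustes_similarity(genetic_xy, geographic_xy):
    """Similarity in [0, 1] between two matched 2D configurations.

    1 means one map is an exact shifted/scaled/rotated/reflected copy of
    the other. Assumes both inputs list the same samples in the same order
    and that neither set collapses to a single point.
    """
    z = _centered_complex(genetic_xy)
    w = _centered_complex(geographic_xy)
    norm = math.sqrt(sum(abs(a) ** 2 for a in z) * sum(abs(b) ** 2 for b in w))
    # Best fit using rotation and scaling only.
    rotation_only = abs(sum(a.conjugate() * b for a, b in zip(z, w)))
    # Best fit when the genetic map is mirrored first.
    with_reflection = abs(sum(a * b for a, b in zip(z, w)))
    return max(rotation_only, with_reflection) / norm


def permutation_p_value(genetic_xy, geographic_xy, n_perm=1000, seed=0):
    """Share of random relabelings that match geography at least as well."""
    rng = random.Random(seed)
    observed = procrustes_similarity(genetic_xy, geographic_xy)
    shuffled = list(geographic_xy)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if procrustes_similarity(genetic_xy, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical sampling locations (arbitrary units, not real coordinates).
geo = [(0, 0), (1, 0), (2, 1), (0, 2), (3, 3)]
# A "genetic map" that is just the geography rotated, scaled and shifted.
pca_like = [(-2 * y + 5, 2 * x - 1) for x, y in geo]
# The same map with the last sample displaced far from where it "should" be.
pca_outlier = geo[:-1] + [(3, -10)]

print(procrustes_similarity(pca_like, geo))
print(procrustes_similarity(pca_outlier, geo))
print(permutation_p_value(pca_like, geo, n_perm=200))
```

The point of the toy example is that a single badly displaced group is enough to pull the score well below 1 even when every other sample sits exactly where geography predicts, which is the qualitative behavior described in the quoted passage.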
Figure S3 is very informative:

Observe that in Figure S3C, the Mbororo Fulani appear in the Balkans (!) relative to Sub-Saharan Africans. That is, of course, due to their partial West Eurasian ancestry, but the magnitude of the displacement is such that one suspects this is not the only factor; if it were, the Fulani would be placed somewhere between Europe and Central Africa.
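The reasoning behind "somewhere between" can be sketched with a few lines of code. Under the simplest two-way admixture model, a mixed population's expected allele frequencies are a weighted average of those of its two sources, and any fixed linear projection of the kind PCA produces carries that average over to the map. The frequencies, axes and function names below are hypothetical and only illustrate the arithmetic; real populations also experience drift, which this sketch ignores.

```python
def mix(freqs_a, freqs_b, alpha):
    """Expected allele frequencies with a fraction alpha of ancestry from A."""
    return [alpha * a + (1 - alpha) * b for a, b in zip(freqs_a, freqs_b)]


def project(freqs, center, axes):
    """Project frequencies onto fixed axes after subtracting a center."""
    return tuple(
        sum((f - c) * w for f, c, w in zip(freqs, center, axis))
        for axis in axes
    )


# Hypothetical allele frequencies at four SNPs in two source populations.
source_a = [0.9, 0.1, 0.6, 0.3]
source_b = [0.2, 0.7, 0.5, 0.8]
alpha = 0.25  # assumed share of ancestry from source A

# Arbitrary center and axes standing in for a fitted PCA.
center = [0.5, 0.5, 0.5, 0.5]
axes = [[0.5, -0.5, 0.5, -0.5], [0.5, 0.5, -0.5, -0.5]]

pa = project(source_a, center, axes)
pb = project(source_b, center, axes)
pm = project(mix(source_a, source_b, alpha), center, axes)

# The mixed population sits on the segment between its sources,
# a fraction alpha of the way from B toward A.
expected = tuple(alpha * a + (1 - alpha) * b for a, b in zip(pa, pb))
print(pm, expected)
```

A population that plots well outside the segment joining its presumed sources, as the Fulani do here, therefore calls for some explanation beyond plain admixture in this idealized picture.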

The remaining figures (D-G) supply the explanation: the four hunter-gatherer groups appear well south of their actual locations; the Pygmy groups not in W/C Africa, but in S Africa; the Khoisan ones not in S Africa, but in the ocean well south of it.

Why does gene-geography correlation suffer such a violation in Africa? Figure S3 shows how different groups relate to W/C Africans. But one could also use hunter-gatherers as an anchor point (i.e., place them where they actually live): in that case, the W/C Africans would be the ones pushed north towards the Mediterranean.

And, indeed, that is a good argument for the idea I've floated a few times, of substantial Eurasian back-migration into Africa: the genetic difference between African farmers and African hunter-gatherers dwarfs the geographic distance. This can easily be explained if we assume that back-migration from Eurasia affected the former much more than the latter. So, African farmers can be shown to be the outcome of mixture between two divergent elements: one Eurasian-like, one African hunter-gatherer-like. The latter could include groups like existing African H-Gs, but might also include other groups who had the misfortune of being completely absorbed before the Eye of Science set its sights on the African continent.

PLoS Genet 8(8): e1002886. doi:10.1371/journal.pgen.1002886

A Quantitative Comparison of the Similarity between Genes and Geography in Worldwide Human Populations

Chaolong Wang et al.

The spatial pattern of human genetic variation provides a basis for investigating the history of human migrations. Statistical techniques such as principal components analysis (PCA) and multidimensional scaling (MDS) have been used to summarize spatial patterns of genetic variation, typically by placing individuals on a two-dimensional map in such a way that pairwise Euclidean distances between individuals on the map approximately reflect corresponding genetic relationships. Although similarity between these statistical maps of genetic variation and the geographic maps of sampling locations is often observed, it has not been assessed systematically across different parts of the world. In this study, we combine genome-wide SNP data from more than 100 populations worldwide to perform a formal comparison between genes and geography in different regions. By examining a worldwide sample and samples from Europe, Sub-Saharan Africa, Asia, East Asia, and Central/South Asia, we find that significant similarity between genes and geography exists in general in different geographic regions and at different geographic levels. Surprisingly, the highest similarity is found in Asia, even though the geographic barrier of the Himalaya Mountains has created a discontinuity on the PCA map of genetic variation.


1 comment:

Ted Kandell said...

Why does Southwest Cameroon appear to be the origin of "Non-Paleoafrican AMH"?

We would have expected this to be somewhere in East Africa, not Cameroon.

This may not just be because of the Bantu Expansion from Cameroon. Perhaps the Cameroon region was the source of the East African population that expanded outward to Eurasia. Also, Cameroon and Equatorial Guinea have a substantial percentage of Y R1b1c-V88. Perhaps this "second reverse migration from Eurasia" (if Y E* was in fact the first) contributes to a greater similarity of nearby populations to Eurasians than the earlier Paleoafricans.

Can we plot the YRI NA18507 whole genome sequence and some of the new Paleoafrican whole genome sequences without the ascertainment bias of the SNP arrays based mainly on CEU and CHB/JPT ascertained SNPs? Or rather, plots based on whole genome sequences only worldwide, not SNP array data, so we can see if these are radically different?

My guess is that when these African whole genomes are put on a map of Africa, we may see a very different arrangement of East Africans and most Eurasians relative to Africans, and a real distance only among Denisovan admixed populations (the Papuan and Australian whole genome sequences). The SNP ascertainment bias problem may be emerging when we try to "fit" the African SNP array data geographically within Africa rather than the YRI and the Paleoafrican whole genomes to see where everyone else plots.
This should not be too difficult to replicate.

The inclusion of the Australian whole genome should be particularly revealing since this has not been utilized yet. There are certainly enough low-coverage whole genome sequences worldwide to give us a rather complete unbiased picture for the first time.