November 12, 2012

Ainu/Ryukyuan paper

The paper I had mentioned earlier is online (and open access!) at the Journal of Human Genetics. From the paper:
The SNP genotype data determined in this study are available upon requests to corresponding authors, under the conditions of collaboration with us and with an appropriate approval of human genomic DNA research ethics committee of institutions to which researchers involved in the data analyses belong.
I guess that means that I won't be able to use this data, but hopefully it will be made available to academic researchers who can use it for different analyses than those presented in this paper, some of which I have suggested here.

For example, in my review of MULTIMIX, I noted that individuals who have 100% of one component in ADMIXTURE analysis (which has the same model as frappe, used here) are not necessarily unadmixed. In the frappe analysis shown at the top left, some Ainu individuals fall fully on the "blue" Ainu cluster, while others show evidence of admixture. But are the "100% Ainu" really unadmixed? Using either the aforementioned MULTIMIX or ALDER, it may be possible to show whether even they have some admixture. And, using the methodology introduced in a recent Mexican admixture study, it may be possible to create "virtual" unadmixed Ainu genomes.

Journal of Human Genetics advance online publication 8 November 2012; doi: 10.1038/jhg.2012.114

The history of human populations in the Japanese Archipelago inferred from genome-wide SNP data with a special reference to the Ainu and the Ryukyuan populations

Japanese Archipelago Human Population Genetics Consortium: Timothy Jinam, Nao Nishida, Momoki Hirai, Shoji Kawamura, Hiroki Oota, Kazuo Umetsu, Ryosuke Kimura, Jun Ohashi, Atsushi Tajima, Toshimichi Yamamoto, Hideyuki Tanabe, Shuhei Mano, Yumiko Suto, Tadashi Kaname, Kenji Naritomi, Kumiko Yanagi, Norio Niikawa, Keiichi Omoto, Katsushi Tokunaga and Naruya Saitou

Abstract

The Japanese Archipelago stretches over 4000 km from north to south, and is the homeland of the three human populations; the Ainu, the Mainland Japanese and the Ryukyuan. The archeological evidence of human residence on this Archipelago goes back to >30 000 years, and various migration routes and root populations have been proposed. Here, we determined close to one million single-nucleotide polymorphisms (SNPs) for the Ainu and the Ryukyuan, and compared these with existing data sets. This is the first report of these genome-wide SNP data. Major findings are: (1) Recent admixture with the Mainland Japanese was observed for more than one third of the Ainu individuals from principal component analysis and frappe analyses; (2) The Ainu population seems to have experienced admixture with another population, and a combination of two types of admixtures is the unique characteristics of this population; (3) The Ainu and the Ryukyuan are tightly clustered with 100% bootstrap probability followed by the Mainland Japanese in the phylogenetic trees of East Eurasian populations. These results clearly support the dual structure model on the Japanese Archipelago populations, though the origins of the Jomon and the Yayoi people still remain to be solved.

November 11, 2012

A00 at FTDNA2012: history in the making?

I've been following the #FTDNA2012 tag on Twitter, where attendees have been reporting on Dr. Mike Hammer's talk about A00, the new most basal clade of the human Y-chromosome phylogeny. Apparently, the Y-chromosome ancestor of modern humans is ~338ky old (at 98% confidence), with the most basal clade found in western Cameroon and in African Americans, the African American chromosome being separated by only ~500 years from the Cameroonian one.

The root of the human Y-chromosome phylogeny is now much older than both mtDNA Eve and the first modern human fossils.

Conference attendees, feel free to correct or supplement my understanding of what was said.

UPDATE: With respect to the confidence interval, Bonnie Schrack says:
The 338,000 years ago figure was the median (middle) of the confidence interval, which I believe was 95%, and not 98%. The lower limit of the confidence interval was still a bit over 200,000, I think -- that is, still before the time when fossils have been found showing fully anatomically modern features. Mike specifically said that even if the true age of A00 varied by 10 or 20% from the estimate, it would still be before the time when anatomically modern humans are thought to have appeared. I don't remember the upper limit too clearly, but as I recall, it was over 500,000 ybp.
UPDATE II: There is some uncertainty about the confidence level, with different people remembering anything from 90% to 98%. Some newer information from Tim Janzen:
Michael gave a TMRCA estimate of 338,000 years with a confidence interval range of 246,000 and 563,000 years for the A00/A0 node. He gave a TMRCA estimate of 202,000 years with a confidence interval range of 133,000 to 366,000 years for the A0/R-M269 node.
I guess we will have to wait for the publication to see the exact numbers, but it certainly appears that A00 branched off from the rest of mankind at an age that is much earlier than the next most basal clade (A0).

November 10, 2012

Iron Age Pazyryk mtDNA

The term "Scythian" is often used to describe a whole host of unrelated peoples across time periods, a practice that is not new: it goes back to the classical writers, who were not well acquainted with the world of Eurasian nomads.

The genetic and geographical boundaries between "west" and "east" were not always concordant. East Eurasian mtDNA has been uncovered as far west as Ukraine, and West Eurasian mtDNA well to the east of Europe, in Siberia and eastern Central Asia. The former spread through the boreal zone of north Eurasian hunter-gatherers, the latter through the intermediate steppe zone. The results of this paper might suggest that the Europeoid zone extended only up to the Altai, but a previous study discovered mtDNA U5a at Lake Baikal, well to the east of this region. A temporal transect of a particular region, such as the one reported here, may help elucidate not only the mixing of west/east types (which seems to be ancient across the northern parts of Eurasia) but also the kinds of elements involved. For example, haplogroups K and J, which are well-represented in the Iron Age results presented in this paper (especially the former), made their first appearance at the transition to the Iron Age in the Baraba forest-steppe zone to the west, at around the same time. The picture is still muddy, but a few patterns have begun to emerge: first U lineages, then T lineages during the Andronovo horizon, followed by a wide assortment of lineages during the "Scythian" Iron Age. As I've written before, I strongly suspect that the last stratum originated in the area east of the Caspian Sea, where the likely Proto-Indo-Iranian homeland existed, and where a segment of the BMAC population "went nomad" after the desiccation of their homeland.

PLoS ONE 7(11): e48904. doi:10.1371/journal.pone.0048904

Tracing the Origin of the East-West Population Admixture in the Altai Region (Central Asia)

Mercedes González-Ruiz et al.

Abstract

A recent discovery of Iron Age burials (Pazyryk culture) in the Altai Mountains of Mongolia may shed light on the mode and tempo of the generation of the current genetic east-west population admixture in Central Asia. Studies on ancient mitochondrial DNA of this region suggest that the Altai Mountains played the role of a geographical barrier between West and East Eurasian lineages until the beginning of the Iron Age. After the 7th century BC, coinciding with Scythian expansion across the Eurasian steppes, a gradual influx of East Eurasian sequences in Western steppes is detected. However, the underlying events behind the genetic admixture in Altai during the Iron Age are still unresolved: 1) whether it was a result of migratory events (eastward firstly, westward secondly), or 2) whether it was a result of a local demographic expansion in a ‘contact zone’ between European and East Asian people. In the present work, we analyzed the mitochondrial DNA lineages in human remains from Bronze and Iron Age burials of Mongolian Altai. Here we present support to the hypothesis that the gene pool of Iron Age inhabitants of Mongolian Altai was similar to that of western Iron Age Altaians (Russia and Kazakhstan). Thus, this people not only shared the same culture (Pazyryk), but also shared the same genetic east-west population admixture. In turn, Pazyryks appear to have a similar gene pool that current Altaians. Our results further show that Iron Age Altaians displayed mitochondrial lineages already present around Altai region before the Iron Age. This would provide support for a demographic expansion of local people of Altai instead of westward or eastward migratory events, as the demographic event behind the high population genetic admixture and diversity in Central Asia.

Investigating East Asian admixture in Balkans/Anatolia/Caucasus

I used ALDER with a dataset of populations from the Balkans, Anatolia, and Caucasus, using the She, Japanese, Miaozu, and Dai as East Asian references. A few caveats for this analysis:

  1. Some populations may possess "South Asian" admixture which may be mistaken for East Asian
  2. Populations differ in the number of SNPs used in the analysis; for example, the Armenian_D sample includes mostly Family Finder data which has a smaller overlap with the SNP set used
  3. Populations differ in the number of individuals used, from a low of 5 (e.g., Turkish_Cypriot) to a high of 45 (Armenian_D)
I have also added the HGDP Europeans to the analysis. The results can be seen below; I have made bold those rows where all estimates are at least one standard error above zero, and bold/big those where they are at least two standard errors above zero. I consider the latter to be the most reliable.
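The bolding rule amounts to checking the weakest estimate in each row. A minimal sketch, assuming each table row holds one (estimate, standard error) pair per East Asian reference and that standard errors are positive; the function name and tier labels are hypothetical:

```python
def significance_tier(row):
    """Classify one table row by how far its estimates sit above zero.

    row: list of (estimate, standard_error) pairs, one per reference
    population; standard errors are assumed to be positive.
    """
    # "All estimates" must pass, so the weakest one decides the tier.
    weakest_z = min(est / se for est, se in row)
    if weakest_z >= 2:
        return "bold/big"  # every estimate at least 2 SE above zero
    if weakest_z >= 1:
        return "bold"      # every estimate at least 1 SE above zero
    return "plain"


# Hypothetical rows: (admixture time in generations, SE) per reference.
print(significance_tier([(30.1, 8.2), (27.5, 9.9)]))
print(significance_tier([(12.0, 9.0), (15.0, 7.0)]))
```

Taking the minimum z-score across references is what makes a single shaky estimate demote the whole row.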



I have already discussed the Turkish signal of admixture at length elsewhere. I will note that the Iranian_D sample produces similar or younger admixture dates, which would make sense, given the fact that the Iranians came under control of the Mongols, while the successes of the latter in Anatolia were short-lived.

A very interesting signal is that of the North Ossetians, who show admixture ~9-10 centuries ago. This seems to have occurred a little after the foundation of the kingdom of Alania, and I think it makes excellent sense to view it as the signal of Eurasian nomads (who must have carried some East Asian admixture at that time) intermingling with pre-Iranic local Caucasus populations. Two other populations from the Caucasus, the Georgians and Lezgins (and also the Abkhaz and Chechens), show earlier admixture signals that could very well date to the period of east-west Eurasian migrations inaugurated by the Huns, although a possible Sassanian origin of such influence cannot be overlooked.

The Kurds are another interesting case, where the Dodecad sample and the Yunusbayev et al. sample produce very different dates. The different number of SNPs may be at play here, or it may be that some Dodecad participants have recent Turkish ancestry that causes the average admixture date to appear lower, although the globe13 analysis suggests that the "East Asian" component found here may in fact be "South Asian".

It seems to me that with large, dense, and well-curated sample sets from several of these populations, their admixture dynamics will become clearer.

November 09, 2012

Multiway Admixture Deconvolution with MULTIMIX

The software will appear here. This has already been used in the recent 1000 Genomes paper. Below is the analysis of the MEX data:



I ran a small CEU/YRI/MEX K=3 analysis using ADMIXTURE on 30 random individuals from each population.
Notice that ADMIXTURE assigns 100% "American" ancestry to the most Amerindian-admixed individuals. MULTIMIX, on the other hand, has correctly not assigned any Mexicans 100% to the Amerindian component, because it makes use of LD to infer the ancestry of individual segments.

A couple of advantages of the new method are that it is not limited to two ancestral populations and does not require phased data as input, although phased data may improve accuracy if available.

I'm eager to try the new software when it becomes available. I am not sure how it will scale (CPU/Memory-wise) with more individuals/components, so it'll be fun to experiment with.

Genet Epidemiol. 2012 Nov 7. doi: 10.1002/gepi.21692. [Epub ahead of print]

Multiway Admixture Deconvolution Using Phased or Unphased Ancestral Panels. 

Churchhouse C, Marchini J.

Abstract

We describe a novel method for inferring the local ancestry of admixed individuals from dense genome-wide single nucleotide polymorphism data. The method, called MULTIMIX, allows multiple source populations, models population linkage disequilibrium between markers and is applicable to datasets in which the sample and source populations are either phased or unphased. The model is based upon a hidden Markov model of switches in ancestry between consecutive windows of loci. We model the observed haplotypes within each window using a multivariate normal distribution with parameters estimated from the ancestral panels. We present three methods to fit the model: Markov chain Monte Carlo sampling, the Expectation Maximization algorithm, and a Classification Expectation Maximization algorithm. The performance of our method on individuals simulated to be admixed with European and West African ancestry shows it to be comparable to HAPMIX, the ancestry calls of the two methods agreeing at 99.26% of loci across the three parameter groups. In addition to it being faster than HAPMIX, it is also found to perform well over a range of extent of admixture in a simulation involving three ancestral populations. In an analysis of real data, we estimate the contribution of European, West African and Native American ancestry to each locus in the Mexican samples of HapMap, giving estimates of ancestral proportions that are consistent with those previously reported.
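To make the "hidden Markov model of switches in ancestry between consecutive windows" concrete, here is a toy forward-backward sketch. It is not MULTIMIX: it treats a single haplotype rather than diploid, possibly unphased genotypes, and it assumes the per-window log-likelihood under each ancestry has already been computed (MULTIMIX derives these from a multivariate normal fitted to the ancestral panels). The switch model (at each window boundary, with probability `switch_prob`, ancestry is redrawn from `prior`) and all names are hypothetical:

```python
from math import exp


def ancestry_posteriors(loglik, switch_prob, prior):
    """Posterior ancestry probabilities per window via forward-backward.

    loglik[t][k]: log-likelihood of window t's data under ancestry k
                  (assumed to be precomputed elsewhere).
    switch_prob:  chance that, at a window boundary, ancestry is redrawn
                  from `prior` (the redraw may pick the same ancestry).
    prior:        ancestry proportions, summing to 1.
    """
    if not loglik:
        return []
    n_anc = len(prior)
    n_win = len(loglik)

    def trans(j, k):
        stay = 1.0 if j == k else 0.0
        return (1.0 - switch_prob) * stay + switch_prob * prior[k]

    def normalise(vec):
        total = sum(vec)
        return [v / total for v in vec]

    # Rescale each window's likelihoods by its maximum to avoid underflow.
    emis = []
    for row in loglik:
        top = max(row)
        emis.append([exp(v - top) for v in row])

    # Forward pass, normalised at every step.
    alpha = [normalise([prior[k] * emis[0][k] for k in range(n_anc)])]
    for t in range(1, n_win):
        prev = alpha[-1]
        alpha.append(normalise([
            emis[t][k] * sum(prev[j] * trans(j, k) for j in range(n_anc))
            for k in range(n_anc)
        ]))

    # Backward pass, also normalised; the constants cancel in the posterior.
    beta = [[1.0] * n_anc for _ in range(n_win)]
    for t in range(n_win - 2, -1, -1):
        beta[t] = normalise([
            sum(trans(j, k) * emis[t + 1][k] * beta[t + 1][k] for k in range(n_anc))
            for j in range(n_anc)
        ])

    return [normalise([alpha[t][k] * beta[t][k] for k in range(n_anc)])
            for t in range(n_win)]


# Three windows favouring ancestry 0, then three favouring ancestry 1.
ll = [[0.0, -10.0]] * 3 + [[-10.0, 0.0]] * 3
for window, probs in enumerate(ancestry_posteriors(ll, 0.1, [0.5, 0.5])):
    print(window, [round(p, 3) for p in probs])
```

Averaging such per-window posteriors along the genome gives a global ancestry estimate that, unlike a single ADMIXTURE component, need not reach exactly 100% for an individual with a few foreign segments.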

Report on Sardinian-like ancient Europeans from ASHG 2012

Tia Ghose reports from ASHG2012: 
To answer that question, Sikora's team sequenced Ötzi's entire genome and compared it with those from hundreds of modern-day Europeans, as well as the genomes of a Stone Age hunter-gatherer found in Sweden, a farmer from Sweden, a 7,000-year-old hunter-gatherer iceman found in Iberia, and an Iron Age man found in Bulgaria. 
The team confirmed that, of modern people, Sardinians are Ötzi's closest relatives. But among the prehistoric quartet, Ötzi most closely resembled the farmers found in Bulgaria and Sweden, while the Swedish and Iberian hunter-gatherers looked more like present-day Northern Europeans.
...

The findings add to a growing body of evidence showing that farming played a major role in shaping the people of Europe, said Chris Gignoux, a geneticist at the University of California San Francisco, who was not involved in the study.

"I think it's really intriguing," Gignoux said. "The more that people are sequencing these ancient genomes from Europe, that we're really starting to see the impact of farmers moving into Europe."
Any ASHG attendees want to add any interesting details from the presentation? Comment away or e-mail me.

November 08, 2012

Deep structure in Y-chromosome phylogeny (Scozzari et al. 2012)

The following two figures summarize the new phylogenetic information:



The discovery of the new haplogroup C7 in an Italian is potentially important, since it may mean that haplogroup C, for which region-specific clades are known in East Asia (with an American twig), South Asia, and Australasia, may also have been present natively in Europe. It will certainly be interesting to resolve, at a finer level, the C chromosomes that have occasionally been found in West Eurasia, and to do some whole-Y sequencing on the different C clades to figure out exactly how they are phylogenetically related.

It is also interesting to note that the geographical origin of the human Y-chromosome phylogeny (NW Africa) is discordant with that of mtDNA (apparently in the Khoe-San of South Africa). This underscores the futility of trying to determine "where" modern humans originated in a geographically circumscribed area.

PLoS ONE 7(11): e49170. doi:10.1371/journal.pone.0049170

Molecular Dissection of the Basal Clades in the Human Y Chromosome Phylogenetic Tree

Rosaria Scozzari et al.

One hundred and forty-six previously detected mutations were more precisely positioned in the human Y chromosome phylogeny by the analysis of 51 representative Y chromosome haplogroups and the use of 59 mutations from literature. Twenty-two new mutations were also described and incorporated in the revised phylogeny. This analysis made it possible to identify new haplogroups and to resolve a deep trifurcation within haplogroup B2. Our data provide a highly resolved branching in the African-specific portion of the Y tree and support the hypothesis of an origin in the north-western quadrant of the African continent for the human MSY diversity.

Okinawans and admixture in East Asia

I don't use the Pan-Asian SNP Consortium data much, but the upcoming paper on the Ainu spurred me to give it a look, because it contains an Okinawan sample (JP-RK). I calculated all f3-statistics that involved this sample, and report the lowest f3-statistic for all populations in this set that appear to be admixed:


Several of these are interesting:
  • A set of Indonesian populations (ID prefix; Lamaholot, Lembata, Kambera, Manggarai) are mixed with Melanesians (AX-ME)
  • A set of Indian populations appear admixed (IN prefix). It seems that the Okinawan sample acts as a surrogate for "Asian" ancestry 
  • Filipino populations PI-UI and PI-UN (listed as Visaya, Chabakano and Tagalog) are seen as mixtures of Okinawans and PI-UB (Ilocano)
  • The three Singaporean populations (SG prefix) are seen as mixtures with Caucasoids (the SG-ID Tamil Indians with CEU), Sunda Indonesians (SG-ML Malay with ID-SU), and Zhuang Chinese (SG-CH Singaporean Chinese with CN-CC Zhuang, northern)
  • Tai Yuan from Thailand with Mlabri (TH-TU with TH-MA)
  • Taiwanese (Hakka TW-HA and Minnan TW-HB) with CN-CC (Zhuang) and Jiamao (CN-JI)
  • Cantonese CN-GA with Jiamao (CN-JI)
  • Uygur CN-UG with West Eurasians (CEU)
And, of course, JPT and JP-ML (Japanese) are seen as mixtures of Okinawans with Mandarin Han (CN-SH) and Beijing Chinese (CHB).

An interesting question is whether the mainland East Asian Yayoi element in Japanese is more similar to Han (as the f3 statistic suggests) or to Koreans. Interestingly, Koreans themselves (KR-KR) appear admixed between Han (CN-SH) and Okinawans. So, it seems that whatever this Okinawan element represents was not limited to the isles of Japan.
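For readers unfamiliar with the f3 test used above, a minimal sketch follows. It assumes per-population allele frequencies on a common SNP set and computes the plain average of (c - a)(c - b) over SNPs; it omits the finite-sample correction, normalization, and block-jackknife standard errors that dedicated tools apply, so only the sign and ranking are meaningful here. Function names and the toy populations are hypothetical:

```python
from itertools import combinations


def f3(c, a, b):
    """Uncorrected f3(C; A, B): mean over SNPs of (c - a) * (c - b).

    c, a, b are allele frequencies of the same SNPs in populations C, A, B.
    A clearly negative value suggests C is a mixture of populations
    related to A and B.
    """
    if not c or not (len(c) == len(a) == len(b)):
        raise ValueError("need equal-length, non-empty frequency lists")
    return sum((ci - ai) * (ci - bi) for ci, ai, bi in zip(c, a, b)) / len(c)


def lowest_f3(target, freqs):
    """Return ((A, B), f3) for the source pair giving the lowest f3(target; A, B)."""
    others = sorted(name for name in freqs if name != target)
    best = None
    for a_name, b_name in combinations(others, 2):
        value = f3(freqs[target], freqs[a_name], freqs[b_name])
        if best is None or value < best[1]:
            best = ((a_name, b_name), value)
    return best


# Toy frequencies at two SNPs: C sits halfway between A and B.
freqs = {"A": [0.2, 0.9], "B": [0.8, 0.1], "C": [0.5, 0.5], "D": [0.5, 0.6]}
print(lowest_f3("C", freqs))
```

A population that is an intermediate blend of two sources produces (c - a) and (c - b) of opposite signs at most SNPs, which is why admixture pushes f3 below zero.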

I also calculated the D-statistic:

D(CN-SH, KR-KR; JP-RK, YRI) = -0.0154 (Z = -14.779)

which indeed suggests that there is an excess of "Okinawan"-like ancestry in Koreans compared to the Chinese. This is very interesting, because it suggests that the similarity between Koreans and Japanese is due to a common substratum in the two populations.
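The D-statistic above can be sketched with the frequency-based estimator D(W, X; Y, Z) = Σ(w - x)(y - z) / Σ(w + x - 2wx)(y + z - 2yz), under which a negative value means X shares more alleles with Y than W does (here, KR-KR with JP-RK relative to CN-SH). The Z-score comes from a delete-one-block jackknife; as an assumption, blocks are equally weighted runs of a fixed number of contiguous SNPs, a simplification of the weighted jackknife over genetic-distance blocks used by dedicated tools. Names are hypothetical:

```python
from math import sqrt


def _d_sums(w, x, y, z):
    """Numerator and denominator sums of the frequency-based D-statistic."""
    num = 0.0
    den = 0.0
    for wi, xi, yi, zi in zip(w, x, y, z):
        num += (wi - xi) * (yi - zi)
        den += (wi + xi - 2 * wi * xi) * (yi + zi - 2 * yi * zi)
    return num, den


def d_statistic(w, x, y, z, block_size=1000):
    """Return (D, jackknife SE, Z-score) for D(W, X; Y, Z).

    w, x, y, z: allele frequencies at the same SNPs, in genomic order.
    Negative D: X shares more alleles with Y than W does.
    """
    n = len(w)
    if n == 0 or not (len(x) == len(y) == len(z) == n):
        raise ValueError("need equal-length, non-empty frequency lists")
    num, den = _d_sums(w, x, y, z)
    d = num / den

    # Partial sums per block of contiguous SNPs (fixed block size: an assumption).
    blocks = [
        _d_sums(w[i:i + block_size], x[i:i + block_size],
                y[i:i + block_size], z[i:i + block_size])
        for i in range(0, n, block_size)
    ]
    g = len(blocks)
    if g < 2:
        return d, float("nan"), float("nan")

    # Delete-one-block estimates, then the standard jackknife variance.
    loo = [(num - b_num) / (den - b_den) for b_num, b_den in blocks]
    mean_loo = sum(loo) / g
    se = sqrt((g - 1) / g * sum((v - mean_loo) ** 2 for v in loo))
    z_score = d / se if se > 0 else float("nan")
    return d, se, z_score
```

With real data, one would pass the CN-SH, KR-KR, JP-RK and YRI frequency vectors as W, X, Y and Z; swapping W and X only flips the sign of D.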

November 07, 2012

71,000-year old South African microliths

Press release, Nature News, and podcast.

Nature (2012) doi:10.1038/nature11660

An early and enduring advanced technology originating 71,000 years ago in South Africa

Kyle S. Brown et al.

There is consensus that the modern human lineage appeared in Africa before 100,000 years ago. But there is debate as to when cultural and cognitive characteristics typical of modern humans first appeared, and the role that these had in the expansion of modern humans out of Africa. Scientists rely on symbolically specific proxies, such as artistic expression, to document the origins of complex cognition. Advanced technologies with elaborate chains of production are also proxies, as these often demand high-fidelity transmission and thus language. Some argue that advanced technologies in Africa appear and disappear and thus do not indicate complex cognition exclusive to early modern humans in Africa. The origins of composite tools and advanced projectile weapons figure prominently in modern human evolution research, and the latter have been argued to have been in the exclusive possession of modern humans. Here we describe a previously unrecognized advanced stone tool technology from Pinnacle Point Site 5–6 on the south coast of South Africa, originating approximately 71,000 years ago. This technology is dominated by the production of small bladelets (microliths) primarily from heat-treated stone. There is agreement that microlithic technology was used to create composite tool components as part of advanced projectile weapons. Microliths were common worldwide by the mid-Holocene epoch, but have a patchy pattern of first appearance that is rarely earlier than 40,000 years ago, and were thought to appear briefly between 65,000 and 60,000 years ago in South Africa and then disappear. Our research extends this record to ~71,000 years, shows that microlithic technology originated early in South Africa, evolved over a vast time span (~11,000 years), and was typically coupled to complex heat treatment that persisted for nearly 100,000 years. Advanced technologies in Africa were early and enduring; a small sample of excavated sites in Africa is the best explanation for any perceived ‘flickering’ pattern.

Major new Ainu genetic study forthcoming

Genetic kinship found between Ainu and native Okinawans (The Asahi Shimbun):
The researchers examined and compared the DNA of 36 Ainu, 35 native Okinawans, and 243 people living in Honshu and elsewhere in Japan. They also studied the DNA of ethnic Han Chinese living in Beijing. The Ainu DNA was from stored samples that had been collected about 30 years ago.

The analysis found that the DNA of the Ainu bore closest similarity to people who had lived for generations in Okinawa. There was increasing dissimilarity with--in this order--those from Honshu, South Koreans and Chinese.
Meanwhile, the researchers found that the DNA of people living in Honshu showed similarities with that of South Koreans and Chinese.

The findings were to be published Nov. 1 in the Journal of Human Genetics.
I don't see the paper on the journal site yet. Loh et al. (2012) were able to infer that admixture in the Japanese occurred 45 +/- 6 generations ago, and involved at least 41 +/- 3% Yayoi ancestry. Another recent paper (He et al. 2012) estimated 23.1∼39.5% "Paleolithic" ancestry in mainland Japanese. But both studies lacked an Ainu genetic sample, which will apparently now become available (and I hope publicly so).

It will now be possible not only to do a 2-reference test of admixture for the Japanese with software like ALDER, but also, and perhaps more importantly, to do a 1-reference test of admixture for the Ainu themselves! It is important to remember that the Ainu are not unmodified descendants of the Jomon, and their own ancestry is likely to be complex.

And, there will now be a second population of Y-haplogroup D descendants (the Ainu) to complement the Andamanese islanders genotyped by Reich et al. (2009). It is not clear to me whether there will be any autosomal signal left to link these peoples together, but the issue can now be investigated.

Finally, there is the whole issue of the relationship of the Ainu with West Eurasians; while research has not been supportive of that notion, it may still be useful to see whether the hirsuteness of the Ainu and other phenotypic similarities with Europeans have the same genetic aetiology or not. A link of a different kind that might be useful to investigate is the East Eurasian/Amerindian-like gene flow into Europe which seems to be more pronounced for Amerindians: will the signal also be present for the Ainu, and how strong will it be? And, of course there is that whole other issue of levels of affinity to Eurasian archaic hominins...

It is great that the last few gaps in our sampling of world genetic variation are being filled. Time and again we have discovered that at the "edges of variation" we often find the most interesting nuggets of information about our prehistoric past (e.g., Sardinians re: prehistoric Europe, Australo-Melanesians re: Denisovan admixture, Amerindians re: North Eurasian admixture in Europe, Khoe-San re: earliest divergences in the human family). The Ainu are likely to offer us new insight not only about their own origins, and those of the Japanese, but also about events taking place much further from the isles of Japan.

November 05, 2012

GWAS study of pigmentation in four European countries

From the paper:
Males (M) have consistently lighter pigmentation (lower scored) than females (F) in all four countries. Among countries, the largest pigmentation difference is with Ireland, where, in our sample, individuals have lighter pigmentation or lower M index on average than in Poland, Italy, or Portugal. Hair pigmentation histogram (C) and boxplot by country (D) in 341 individuals showing the distribution of hair pigmentation and the differences among countries. In our sample, individuals from Northern European countries (Ireland, Poland) have on average lighter hair pigmentation than individuals from Southern European countries (Italy, Portugal). The distributions in males are similar to those in females in all countries except Ireland, where, in our sample, males have darker hair color than females (not shown). Eye pigmentation histogram (E) and boxplot by country (F) in 468 individuals showing the bimodal distribution of eye pigmentation and the differences among countries. Comparison with self-reported phenotypes shows that the two modes of the distribution correspond to blue and brown eye color, while individuals reporting green and hazel eye color have intermediate C’ values. As with hair pigmentation, in our sample, individuals from Northern European countries have on average lighter eye pigmentation than individuals from Southern European countries. 
...   
Interestingly, our analysis of variation in skin color in Europe demonstrates a consistent difference in skin color between the sexes. By the DermaSpectrometer M index measure, males are more lightly pigmented than females in each of the four European countries we studied. The same trend in M index was reported previously in a sample of European Americans [38]. Our results in populations of European ancestry contradict earlier anthropological studies that have concluded females are more lightly pigmented than males in most populations (reviewed in [2]). One potential reason for the conflicting results is the different instruments used. In early studies, which used the Evans Electric Limited (EEL) and Photovolt broad-spectrum spectrophotometers, skin pigmentation estimates may be confounded by the hemoglobin level to a greater extent than for the DermaSpectrometer used in the present study [46].

Some data (lower = lighter):



One thing of interest is that while Irish males and females are both lighter-eyed than other Europeans, including Poles from northern Europe, Irish females appear to be lighter-haired than Irish males (96.3 vs. 106.7), whereas the Poles show no such substantial sex difference in this trait (107.5 vs. 109.5). Across the four countries, sexual dimorphism seems to lean toward lighter skin in males and lighter hair in females.

Peter Frost has offered the theory that "gentlemen prefer blondes" because during the Ice Age boreal hunters lived a harsh lifestyle that killed many of them, but the remainder could not adopt a polygynous lifestyle, because provisioning for a wife was expensive. As a result, women had to compete for the remaining men, and men could be picky, preferring those with a "rare color advantage." It is not immediately clear to me how this might explain the Ireland vs. Poland differentiation, assuming it reflects a broader NW/NE trend, since NE Europeans are more likely to be descended from hunter-gatherers of the tundra-steppe.

PLoS ONE 7(10): e48294. doi:10.1371/journal.pone.0048294

Genome-Wide Association Studies of Quantitatively Measured Skin, Hair, and Eye Pigmentation in Four European Populations

Sophie I. Candille et al.

Pigmentation of the skin, hair, and eyes varies both within and between human populations. Identifying the genes and alleles underlying this variation has been the goal of many candidate gene and several genome-wide association studies (GWAS). Most GWAS for pigmentary traits to date have been based on subjective phenotypes using categorical scales. But skin, hair, and eye pigmentation vary continuously. Here, we seek to characterize quantitative variation in these traits objectively and accurately and to determine their genetic basis. Objective and quantitative measures of skin, hair, and eye color were made using reflectance or digital spectroscopy in Europeans from Ireland, Poland, Italy, and Portugal. A GWAS was conducted for the three quantitative pigmentation phenotypes in 176 women across 313,763 SNP loci, and replication of the most significant associations was attempted in a sample of 294 European men and women from the same countries. We find that the pigmentation phenotypes are highly stratified along axes of European genetic differentiation. The country of sampling explains approximately 35% of the variation in skin pigmentation, 31% of the variation in hair pigmentation, and 40% of the variation in eye pigmentation. All three quantitative phenotypes are correlated with each other. In our two-stage association study, we reproduce the association of rs1667394 at the OCA2/HERC2 locus with eye color but we do not identify new genetic determinants of skin and hair pigmentation supporting the lack of major genes affecting skin and hair color variation within Europe and suggesting that not only careful phenotyping but also larger cohorts are required to understand the genetic architecture of these complex quantitative traits. Interestingly, we also see that in each of these four populations, men are more lightly pigmented in the unexposed skin of the inner arm than women, a fact that is underappreciated and may vary across the world.

November 03, 2012

Recent admixture in Altaic populations: a legacy of Empire?

Continuing my experiments with ALDER, I took every publicly available Altaic population, i.e., the following 25 populations:
Altai, Balkars_Y, Buryat, Chuvashs_16, Daur, Dolgan, Evenk_15, Hezhen, Kumyks_Y, Kyrgyz_Bishkek_Ho, Mongol, Mongola, Nogais_Y, Oroqen, Tu, Turkish_Aydin_Ho, Turkish_Istanbul_Ho, Turkish_Kayseri_Ho, Turkmens_Y, Turks, Tuva, Uygur, Uzbeks, Xibo, Yakut
I also took three West Eurasian populations unlikely to have historical East Asian admixture (French, French_Basque, and Sardinians), and three East Eurasian populations unlikely to have historical West Eurasian admixture (Dai, She, Miaozu). I merged all of the above in PLINK with a --geno 0.03 flag, and extracted SNPs present in the Rutgers recombination map for Illumina chips (a total of 524,822 SNPs).
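The effect of the --geno 0.03 flag plus the map-based extraction on the SNP set can be sketched in plain Python. This mirrors PLINK's documented behavior of dropping variants whose missing call rate exceeds the threshold; it is an illustration, not PLINK's code, and the data layout (a dict of per-SNP call lists with None marking a missing call) and function name are hypothetical:

```python
def filter_snps(calls_by_snp, map_snps, max_missing=0.03):
    """Keep SNPs whose missing call rate is <= max_missing and that also
    appear in the recombination map.

    calls_by_snp: dict mapping SNP id -> list of genotype calls (None = missing).
    map_snps:     set of SNP ids present in the recombination map.
    """
    kept = []
    for snp, calls in calls_by_snp.items():
        if not calls:
            continue
        missing_rate = sum(call is None for call in calls) / len(calls)
        if missing_rate <= max_missing and snp in map_snps:
            kept.append(snp)
    return kept


# Hypothetical data: rs2 is 4% missing, rs3 is absent from the map.
calls = {
    "rs1": [0, 1, 2] * 33 + [None],
    "rs2": [0] * 96 + [None] * 4,
    "rs3": [1] * 100,
}
print(filter_snps(calls, {"rs1", "rs2"}))
```

Because the merged dataset combines several genotyping platforms, the missingness filter and the map intersection together determine the final SNP count.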

I then ran ALDER for all 25 Altaic populations using each of the 3*3 West/East Eurasian reference pairs, a total of 25*3*3 = 225 runs. I retained only those 2-ref admixture analyses for which ALDER reported "success" with no warnings.
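The 225 runs are simply the Cartesian product of targets and reference pairs. A minimal sketch; the population labels follow the text, the variable names are hypothetical, and nothing here invokes ALDER itself:

```python
from itertools import product

# The 25 Altaic target populations, as listed in the text.
ALTAIC = [
    "Altai", "Balkars_Y", "Buryat", "Chuvashs_16", "Daur", "Dolgan",
    "Evenk_15", "Hezhen", "Kumyks_Y", "Kyrgyz_Bishkek_Ho", "Mongol",
    "Mongola", "Nogais_Y", "Oroqen", "Tu", "Turkish_Aydin_Ho",
    "Turkish_Istanbul_Ho", "Turkish_Kayseri_Ho", "Turkmens_Y", "Turks",
    "Tuva", "Uygur", "Uzbeks", "Xibo", "Yakut",
]
WEST_REFS = ["French", "French_Basque", "Sardinians"]
EAST_REFS = ["Dai", "She", "Miaozu"]

# One 2-reference run per (target, west reference, east reference) triple.
runs = list(product(ALTAIC, WEST_REFS, EAST_REFS))
print(len(runs))  # 225
```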

I then converted reported times to calendar dates: a generation of 29 years was assumed; lacking information about the age of the sampled individuals, I assumed that the "present" is 1980; finally, I report the earliest and latest -/+ limits of any confidence interval, as well as the median of all estimates.
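The date conversion is simple arithmetic: calendar year = 1980 - 29 × generations. The sketch below assumes each successful run reports an admixture time in generations with a standard error, and that the limits of each interval are the estimate ± one standard error; the function names are hypothetical:

```python
from statistics import median

GENERATION_YEARS = 29  # assumed generation length
PRESENT = 1980         # assumed "present" for the sampled individuals


def to_calendar_year(generations):
    """Convert an admixture time in generations before present to a calendar year."""
    return PRESENT - GENERATION_YEARS * generations


def summarize_dates(estimates):
    """estimates: list of (generations, standard_error) from successful runs.

    Returns (earliest limit, latest limit, median point estimate) as calendar years.
    """
    # More generations ago means an earlier year, so the earliest limit
    # comes from estimate + SE and the latest from estimate - SE.
    earliest = min(to_calendar_year(g + se) for g, se in estimates)
    latest = max(to_calendar_year(g - se) for g, se in estimates)
    mid = median(to_calendar_year(g) for g, _ in estimates)
    return earliest, latest, mid


print(summarize_dates([(7.0, 1.0), (6.0, 0.5)]))  # (1748.0, 1820.5, 1791.5)
```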

The results can be seen below; for 11 of the 25 populations there was at least one test which was successful with no warnings. This does not mean that the other populations are unadmixed, but the following cases appear to be most "well-behaved":


Now, these appear to make excellent sense.

Of the Dolgans:
There also existed a group of Russian settlers on the River Heta, who, by the end of the 19th century, had become Dolganized and had gradually adopted the way of life of nomadic reindeer breeders. ... The tribes forming the nucleus of the Dolgans migrated from the banks of the River Lena at the end of the 17th century. One of the reasons for migration was the fact that Russian goods, flour, for instance, were coming to the Taimyr Peninsula by the boats on the Lena.
The 1770-1860AD range for the admixture appears to coincide with the period when the Dolgans came under Russian influence.

Of the Evenks:
The history of the Evenks' habitation can be traced in detail from the 17th century on. At that time the Evenks left several of their previous territories, for instance, the River Angara, when the Yakut, the Buryat and the Russians appeared in the province. The Evenks had especially bad relations with the Yakuts, who had settled in the river basin of the Lena in the 13th century. In the 18th and 19th centuries the Evenks living there adopted the Yakut language. In the Baikal area the Evenks began to speak the Buryat and the Mongolian languages, and even converted to lamaism. The southern Evenk -- the Manegir, the Birar, the Solon -- were influenced by the Manchu, Daur and Chinese cultures. The arable lands in Siberia were occupied by Russian settlers, migrating there in the 17th century, and those Evenks, living in the vicinity on the upper reaches of the Lena and near Baikal, were russified.
Again, the 1630-1800AD admixture range seems consistent with the time when the Evenks came into contact with Russians.

Of the Nogais:
In the first half of the 17th century a number of Nogay tribes were nomadic on the steppes between the Danube and the Caspian. The invasion of the warlike Kalmyks forced several of the Nogay tribes to leave their home steppes and withdraw to the foothills of the North Caucasus. By the River Kuban they met with the Cherkess. In the Moscow chronicles from the 16th and 17th centuries there are several mentions of the Nogay, including the two Nogay Hordes, the Great and the Small. The former roamed beyond the River Volga, the latter somewhat to the west. Both had numerous military encounters with the Russians. In the 17th century some of the Nogay chiefs entered into an alliance with Moscow and fought at times together with the Russians against the Kabardians, the Kalmyks and peoples of Dagestan.
The 1610-1730AD range intersects the period when the Nogais settled in the North Caucasus and interacted with North Caucasians and Russians.

Not much needs to be said about the admixture signal in the Uygur, Uzbeks, Kyrgyz, and Mongols, which collectively ranges from 1260 to 1500AD. This was a period of Mongol power, when Mongolian- and Turkic-speaking peoples took control of Central Asia and to a great degree replaced its previous inhabitants.

The origin of the Balkars is less certain, because they are an old Turkic group that settled in the Caucasus, but their admixture date (830-1220AD) seems plausible. So, of course, does that of the Turks from Kayseri (ancient Caesarea; 990-1260AD), which parallels the results of my recent experiment and can be associated with the takeover of Anatolia following the Battle of Manzikert. Finally, I don't have a ready explanation for the 11th-12th century signal of admixture in the Siberian Altai and Buryat, but presumably it has something to do with the expansions of Altaic peoples around that time, which were also felt in the west; perhaps this involved some mixture with Caucasoid groups in Siberia.

The admixture dates are quite helpful for interpreting other signals of admixture, such as those from ADMIXTURE analyses (e.g., globe13). For example, the Dolgan have 13.1% North_European in that experiment and the Altai have 13.2%, but this admixture apparently occurred centuries apart and may have involved different West Eurasian groups.

In conclusion, ALDER seems to find some quite plausible dates for major admixture episodes in the history of Altaic populations that are compatible with fairly recent historical events.

rolloff and ALDER analysis of Turks

I carried out rolloff analysis of the Behar et al. (2010) sample of Turks together with the sample of Uzbeks from the same study and the Yunusbayev et al. (2011) sample of Armenians. A --geno 0.03 flag was applied during merging, and only SNPs present in the Rutgers recombination map for Illumina chips were used.

The exponential decay can be seen below:

The signal of admixture seems pretty clear and extends up to several cM. Of course, as always, this does not mean that exactly these two populations mixed to form the Turks sample, but it does mean that they are reasonable stand-ins.

The jackknife gives an admixture time estimate of 27.622 +/- 5.348 generations, or 800 +/- 160 years at 29 years per generation, which makes perfect historical sense: it falls between the first arrival of the Seljuks in Anatolia and the final consolidation of power by the Ottomans. Note also that this probably applies principally to this particular sample (which I believe is from Cappadocia), and there may have been different admixture dynamics elsewhere.
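The date comes from the rate of the exponential decay: weighted LD at genetic distance d (in Morgans) is modeled as a*exp(-n*d) + c, where n is the number of generations since admixture. Below is a minimal pure-Python sketch of such a fit under that model, scanning n on a grid and solving for a and c by ordinary least squares at each step. It uses synthetic, noise-free data generated with n = 27.6 (illustrative values only); it is not how rolloff or ALDER fit their curves internally, and it omits the jackknife that produces the +/- error.

```python
import math


def fit_decay(dist_morgans, weighted_ld, n_grid):
    """Fit weighted_ld ~ a * exp(-n * d) + c.

    For each candidate n, a and c follow from a simple linear regression of
    weighted_ld on exp(-n * d); the n with the smallest residual sum of squares
    wins. Returns (n, a, c).
    """
    best = None
    my = sum(weighted_ld) / len(weighted_ld)
    for n in n_grid:
        x = [math.exp(-n * d) for d in dist_morgans]
        mx = sum(x) / len(x)
        sxx = sum((xi - mx) ** 2 for xi in x)
        sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, weighted_ld))
        a = sxy / sxx
        c = my - a * mx
        rss = sum((yi - (a * xi + c)) ** 2 for xi, yi in zip(x, weighted_ld))
        if best is None or rss < best[0]:
            best = (rss, n, a, c)
    return best[1], best[2], best[3]


# Synthetic curve: distances 1.5-30 cM, true age 27.6 generations (illustrative values).
dists = [cm / 100 for cm in (0.5 * k for k in range(3, 61))]
true_n, true_a, true_c = 27.6, 3e-5, 1e-6
curve = [true_a * math.exp(-true_n * d) + true_c for d in dists]

n_hat, a_hat, c_hat = fit_decay(dists, curve, [k / 10 for k in range(10, 1001)])
```

At the 29 years per generation assumed elsewhere on this page, n = 27.6 corresponds to roughly 800 years.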

I had started this analysis before the announcement of ALDER, but since it is very fast, I decided to give it a go as well. Below is the raw output:




                    *** Admixture test summary ***

Weighted LD curves are fit starting at 1.45 cM

Pre-test: Does Turks have a 1-ref weighted LD curve with Armenians_Y?
   1-ref decay z-score:    0.09
   1-ref amp_exp z-score: -0.01
                                  NO: curve is not significant

Pre-test: Does Turks have a 1-ref weighted LD curve with Uzbeks?
   1-ref decay z-score:    6.56
   1-ref amp_exp z-score:  5.02
                                  YES: curve is significant

Does Turks have a 2-ref weighted LD curve with Armenians_Y and Uzbeks?
   2-ref decay z-score:    5.61
   2-ref amp_exp z-score:  5.58
                                  YES: curve is significant

Do 2-ref and 1-ref curves have consistent decay rates?
   1-ref Armenians_Y - 2-ref z-score:                  0.01   ( 13%)
   1-ref Uzbeks - 2-ref z-score:                       0.69   ( 11%)
   1-ref Uzbeks - 1-ref Armenians_Y z-score:          -0.00   ( -1%)
                                  YES: decay rates are consistent

Test FAILS (z=5.58, p=2.4e-08) for Turks with {Armenians_Y, Uzbeks} weights

DATA: failure 2.4e-08 Turks Armenians_Y Uzbeks 5.58 -0.01 5.02 13% 23.92 +/- 4.26 0.00002930 +/- 0.00000525 27.18 +/- 302.36 -0.00000082 +/- 0.00013129 26.84 +/- 4.09 0.00002316 +/- 0.00000461

DATA: test status p-value test pop ref A ref B 2-ref z-score 1-ref z-score A 1-ref z-score B max decay diff % 2-ref decay 2-ref amp_exp 1-ref decay A 1-ref amp_exp A 1-ref decay B 1-ref amp_exp B



The age estimate (23.92 +/- 4.26 generations for the 2-ref curve) is very similar, and the weighted LD curves are significant, except for the 1-ref curve with Armenians_Y. This makes good sense. From Loh et al. (2012):
Also, if a reference A' shares some of the same admixture history as C or is simply very closely related to C, the pre-test will typically identify long-range correlated LD and deem A' an unsuitable reference to use for testing admixture.
In our case, A'=Armenians and C=Turks. We can be fairly sure that Armenians do not share the admixture history of Turks (because they were not affected by the Central Asian Turkic invasions), but we can substantiate this with a 1-ref analysis of Armenians with Uzbeks. The admixture lower bound estimate is 7.6 +/- 88.2, a huge interval, and the jackknife is unable to estimate the admixture time. Thus, the second explanation more plausibly applies: because Armenians_Y are very closely related to Turks, they are deemed an inappropriate reference for testing admixture.

Finally, the lower bound of the admixture fraction for Turks with an Uzbek reference is estimated as:

Mixture fraction % lower bound (assuming admixture): 29.8 +/- 4.0

This is a very interesting number. We can be fairly sure that the Central Asian Turkic people who invaded Anatolia carried an East Eurasian component with them, but in what proportion relative to their West Eurasian one? The East Eurasian element in Turks has been rather consistently estimated at ~5-7% with various methods, so perhaps it formed the minority element in the Turkic people who arrived in Anatolia.
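The reasoning in the last sentence can be made explicit with a back-of-the-envelope calculation. Assume (a simplification, not something the ALDER analysis establishes) that all of the East Eurasian ancestry in Turks arrived through the Central Asian, Uzbek-like source, and that this source contributed at least the 29.8% lower bound. Then the East Eurasian share of the incoming population is at most the ratio of the two numbers; the function name is hypothetical.

```python
def max_east_share_of_source(east_in_turks: float, source_lower_bound: float) -> float:
    """Upper bound on the East Eurasian fraction of the incoming (source) population.

    Assumes all East Eurasian ancestry in the admixed population came via that
    source, so east_in_turks = share_in_source * source_fraction; a source
    fraction larger than the lower bound would imply an even smaller share.
    """
    return east_in_turks / source_lower_bound


for east in (0.05, 0.07):  # the ~5-7% East Eurasian estimates cited above
    share = max_east_share_of_source(east, 0.298)
    print(f"{east:.0%} East Eurasian -> at most {share:.0%} of the source")
```

This gives at most about 17-23%, i.e., a minority of the incoming ancestry, under the stated assumptions.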

On the other hand, this case is rather muddled by the occurrence of bi-directional gene flow: Uzbeks may have West Eurasian ancestry of ultimately West Asian origin, just as Turks have Central Asian ancestry. And, indeed, when we estimate the admixture fraction of Uzbeks with the Turks as a reference, we obtain:

Mixture fraction % lower bound (assuming admixture): 46.7 +/- 2.4

The age estimate for this is ~16 +/- 2 generations = 460 +/- 60 years. Very similar time estimates appear when Armenians are used as a West Eurasian reference. So, this might indicate that the Uzbek population was formed by admixture after the Anatolian Turks were so formed.

I see no easy way to solve the problem of estimating admixture proportions when each of the two extant populations has been both a donor and a recipient of gene flow, but in any case, these numbers are something to think about.

Analysis of Turks with a variety of Turkic and East Asian populations

I subsequently formed a new dataset by merging the sample of Turks with a variety of Turkic and East Asian populations (same procedure for SNP choice).


For the calendar year calculation, I arbitrarily set the birthdate of the modern sampled individuals at 1980; I have no information about the age profile of the individuals in the Behar et al. sample of Turks. I also used mindis=0.5cM, which made it easy to extract the dates automatically from the ALDER output and also put all the reference populations on a level playing field. The age picked by ALDER using its own adaptive threshold usually did not differ from the reported one by more than a few generations.

The results indicate two things:

  • The % of admixture depends on the choice of reference population, being highest with Uzbeks as a reference and lowest with the Far East Asian populations from China. This reflects our uncertainty about the East/West Eurasian-ness of the people who settled in Anatolia.
  • Admixture times, on the other hand, appear to be fairly constant and frame an important watershed in Anatolian history, the Battle of Manzikert, which paved the way for the eventual Turkification of the peninsula. The Turkmen sample is an outlier in this respect, which might indicate that a limited migration of Turkmen tribes occurred at a later date.

Admixture in the Chuvash and the Uygur

I took the Behar et al. (2010) sample of Chuvash, excluding GSM536731, which has atypical ancestry, and merged it with the Li et al. HGDP French_Basque and Dai. The latter two populations show no evidence of admixture according to either the f3-statistic or ALDER (Loh et al. 2012). (I used a --geno 0.03 flag in PLINK and extracted the subset of SNPs included in the Rutgers recombination map for Illumina chips.)

The f3-statistic f3(Chuvashs_16; French_Basque, Dai) was equal to -0.011311 (Z=-31.308), indicative of admixture.
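The f3-statistic has a simple form: f3(C; A, B) is the average over SNPs of (c - a)(c - b), where a, b and c are allele frequencies in A, B and the target C. If C is a mixture of populations related to A and B, its frequencies tend to fall between theirs, so the product tends to be negative, as it is here. Below is a minimal point-estimate sketch that subtracts the sample-size bias term for the target described by Patterson et al. (2012). It leaves out any normalization and the block-jackknife standard error behind the Z-score, so its output is not directly comparable to the values quoted on this page, and the toy frequencies are hypothetical.

```python
def f3(target, ref_a, ref_b, n_target):
    """Point estimate of f3(target; ref_a, ref_b) from per-SNP allele frequencies.

    target, ref_a, ref_b: sequences of allele frequencies (one per SNP).
    n_target: number of sampled chromosomes in the target population (> 1);
    c(1 - c) / (n - 1) is an unbiased estimate of the sampling bias that
    inflates (c - a)(c - b), so it is subtracted at each SNP.
    """
    if not (len(target) == len(ref_a) == len(ref_b)) or not target:
        raise ValueError("need equal-length, non-empty frequency lists")
    total = 0.0
    for c, a, b in zip(target, ref_a, ref_b):
        total += (c - a) * (c - b) - c * (1 - c) / (n_target - 1)
    return total / len(target)


# Hypothetical frequencies: the target sits between the two references at every SNP.
print(f3([0.5, 0.4], [0.2, 0.1], [0.8, 0.7], n_target=100))  # negative
```

A target whose frequencies lie outside the range of both references gives a positive value instead, which is why only a significantly negative f3 is read as evidence of admixture.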

I then ran an ALDER analysis:


Test SUCCEEDS (z=4.85, p=1.2e-06) for Chuvashs_16 with {French_Basque, Dai} weights

DATA: success (warning: decay rates inconsistent) 1.2e-06 Chuvashs_16 French_Basque Dai 4.85 3.78 5.18 50% 40.27 +/- 5.80 0.00032377 +/- 0.00006676 28.21 +/- 7.47 0.00004231 +/- 0.00000962 47.08 +/- 4.53 0.00016628 +/- 0.00003212

DATA: test status p-value test pop ref A ref B 2-ref z-score 1-ref z-score A 1-ref z-score B max decay diff % 2-ref decay 2-ref amp_exp 1-ref decay A 1-ref amp_exp A 1-ref decay B 1-ref amp_exp B

This indicates that the Chuvash can be seen as admixed, but with inconsistent decay rates: the one with the French Basque (28.21 generations) is younger than the one with the Dai (47.08 generations). I think this makes fairly good sense, because the Chuvash are descended from people who came to Europe during the 1st millennium AD and must later have mixed with Europeans, perhaps with eastern Slavs as these made their way eastward during the 2nd millennium AD.
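The "max decay diff %" column and the "decay rates inconsistent" warning compare the three decay estimates in each DATA line. The sketch below pulls those estimates out of a DATA line and computes the largest pairwise difference, relative to the mean of each pair. This formula is an inference, not taken from ALDER's documentation: it reproduces the reported column for the runs on this page (50% here, 15% for the Uygur, 55% for the Mozabite, 13% for Turks), and the parser assumes the six "+/-" pairs follow the header's column order. The function names are hypothetical.

```python
import re
from itertools import combinations

# "value +/- error" pairs, e.g. "40.27 +/- 5.80" or "0.00032377 +/- 0.00006676".
PAIR = re.compile(r"(-?\d+(?:\.\d+)?)\s*\+/-\s*(-?\d+(?:\.\d+)?)")


def decays_from_data_line(line):
    """Return (2-ref decay, 1-ref decay A, 1-ref decay B) from an ALDER DATA line.

    Assumes the six '+/-' pairs follow the header's column order: 2-ref decay,
    2-ref amp_exp, 1-ref decay A, 1-ref amp_exp A, 1-ref decay B, 1-ref amp_exp B.
    """
    values = [float(m.group(1)) for m in PAIR.finditer(line)]
    if len(values) != 6:
        raise ValueError(f"expected 6 '+/-' pairs, found {len(values)}")
    return values[0], values[2], values[4]


def max_decay_diff_percent(decays):
    """Largest pairwise |x - y| as a percentage of the pair's mean (inferred formula)."""
    return max(abs(x - y) / ((x + y) / 2) * 100 for x, y in combinations(decays, 2))


chuvash = ("DATA: success (warning: decay rates inconsistent) 1.2e-06 Chuvashs_16 "
           "French_Basque Dai 4.85 3.78 5.18 50% 40.27 +/- 5.80 0.00032377 +/- 0.00006676 "
           "28.21 +/- 7.47 0.00004231 +/- 0.00000962 47.08 +/- 4.53 0.00016628 +/- 0.00003212")
print(round(max_decay_diff_percent(decays_from_data_line(chuvash))))  # -> 50
```

Here the largest gap is between the two 1-ref curves (28.21 vs. 47.08 generations), which is the inconsistency discussed above.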

I then carried out similar analyses on the HGDP Uygur. As expected, f3(Uygur; French_Basque, Dai) = -0.023917 (Z = -60.362), indicative of admixture. The ALDER analysis:


Test SUCCEEDS (z=6.85, p=7.4e-12) for Uygur with {French_Basque, Dai} weights

DATA: success 7.4e-12 Uygur French_Basque Dai 6.85 4.47 7.39 15% 20.56 +/- 3.00 0.00036760 +/- 0.00003660 22.59 +/- 5.06 0.00010920 +/- 0.00002025 19.46 +/- 2.64 0.00007864 +/- 0.00000710

DATA: test status p-value test pop ref A ref B 2-ref z-score 1-ref z-score A 1-ref z-score B max decay diff % 2-ref decay 2-ref amp_exp 1-ref decay A 1-ref amp_exp A 1-ref decay B 1-ref amp_exp B

suggests very recent admixture on both the European and East Asian sides. It seems fairly clear that, whatever admixture was taking place in Central Asia, perhaps for thousands of years, the present-day Uygur were formed, at least in part, by a fairly recent, perhaps post-Mongol admixture event.

November 02, 2012

ALDER estimates of East Eurasian admixture in Europe

I used the 1-reference method of ALDER to infer lower bounds of East Eurasian admixture in a few European populations. Unlike the 2-reference method or the f3 test, this method does not include a statistical test of admixture, but we can reasonably suppose that some such admixture did take place, given the combined evidence of the f3 test and ADMIXTURE analyses.

In any case, I took the East Asian populations of Loh et al. (2012) which had no evidence of admixture with either ALDER or the f3 test, and also a few populations from Rasmussen et al. (2010) that included representatives of Siberian Uralic speakers, as well as the three main branches of narrow-sense Altaic (Turkic, Mongolian, Tungusic), and estimated lower bounds of admixture for a set of European populations. Results can be seen below:


The evidence for admixture appears most convincing in the 1000 Genomes Finns and HGDP Russians, where the +/- interval does not include or approach zero, irrespective of the Asian population chosen. For these populations, the percentages vary from ~4-5% for the "pure" East Eurasians to ~10% for some Siberian groups such as the Selkup and Altai. The latter carry some West Eurasian admixture, so it makes sense that more admixture with them is needed to account for the observed "East Eurasian" influence. And, indeed, it is probably via such "intermediate" Siberian populations that some East Eurasian ancestry flowed into Europe, rather than via the relatively untouched populations of the Far East.

PS: Note that this probably represents the most recent signal of admixture, and not the older and more general "North Eurasian"/Amerindian-like admixture that, as Loh et al. mention in their paper, cannot be captured with ALDER.

ALDER paper and software (Loh et al. 2012)

A new paper has appeared on the arXiv that introduces ALDER, a method for testing for admixture and inferring its parameters (when it happened and the proportions of the two mixing populations). You can get the software from here.

I have already tried it and can confirm two claims in the paper: (i) it's extremely fast, and (ii) it is conservative, in the sense that its test sometimes fails even when an f3 test indicates admixture. For one case where it did detect admixture, ASW as CEU+YRI, I got the output on the right, which shows a very clear pattern of exponential decay. I also tried a different experiment using Mozabites as the admixed population. The results are quite interesting:

Test SUCCEEDS (z=10.39, p=2.7e-25) for Mozabite with {CEU30, YRI30} weights

DATA: success (warning: decay rates inconsistent) 2.7e-25 Mozabite CEU30 YRI30 10.39 6.75 11.39 55%  17.45 +/- 1.68 0.00037417 +/- 0.00003187 28.63 +/- 3.84 0.00005311 +/- 0.00000787 16.21 +/- 1.42 0.00023789 +/- 0.00001752

DATA: test status p-value test pop ref A ref B 2-ref z-score 1-ref z-score A 1-ref z-score B max decay diff % 2-ref decay 2-ref amp_exp 1-ref decay A 1-ref amp_exp A 1-ref decay B 1-ref amp_exp B

Notice that the 1-reference decay using CEU is 28.63 and with YRI it is 16.21, while the 2-reference (both CEU and YRI) is an intermediate 17.45. I believe that this is capturing the same behavior as Jin et al. (2012), according to which:
There was an almost complete absence of recent gene flow from European populations to the Mozabite gene pool (Figure 6A). For the Sub-Saharan African ancestral component, there were more long CSDAs at the tail of empirical distribution than those in the HI model, which confirmed that recent gene flow from African populations had contributed to the Mozabite gene pool (Figure 6B). 
This is also what ALDER is telling us: the decay using CEU is more "abrupt" (hence a lack of the long admixture segments that would indicate recent admixture), while that using YRI is less so (hence recent Sub-Saharan admixture has contributed longer segments).

In any case, enough with my own preliminary experiments. From the paper itself, there are interesting applications of the new methodology for Sardinians, Japanese, and Central African Pygmies:
Both Central African Pygmy populations in the HGDP, the Mbuti and Biaka, show evidence of admixture (Table 1), about 28 +/- 4 generations (800 years) ago for Mbuti and 38 +/- 4 generations (1100 years) ago for Biaka, estimated using San and Yoruba as reference populations (Figure 2A,C). The intra-population heterogeneity is low, as demonstrated by the negligible affine terms. In each case, we also generated weighted LD curves with the Pygmy population itself as one reference and a variety of second references. We found that using populations French, Han, or Yoruba as the second reference gave very similar amplitudes, but the amplitude was significantly smaller with the other Pygmy population or San as the second reference (Figure 2B,D). Using the amplitudes with Yoruba, we estimated mixture fractions of at least 15.9 +/- 0.9% and 28.8 +/- 1.4% Yoruba-related ancestry for Mbuti and Biaka, respectively. 
For Sardinians:
We detect a very small proportion of Sub-Saharan African ancestry in Sardinians, which our ALDER tests identified as admixed (Table 1; Figure 3A). To investigate further, we computed weighted LD curves with Sardinian as a test population and all pairs of the HapMap CEU, YRI and CHB populations as references (Table 2). We observed an abnormally large amount of shared long-range LD in chromosome 8, likely due to an extended inversion segregating in Europeans (PRICE et al. 2008), so we omitted it from these analyses. The CEU–YRI curve has the largest amplitude, suggesting both that the LD present is due to admixture and that the small non-European ancestry component, for which we estimated a lower bound of 0.6+/-0.2%, is from Africa. The existence of a weighted LD decay curve with CHB and YRI as references provides further evidence that the LD is not simply due to a population bottleneck or other non-admixture sources, as does the fact that our estimated dates from all three reference pairs are roughly consistent at about 40 generations (1200 years). Our findings thus confirm the signal of African ancestry in Sardinians reported in MOORJANI et al. (2011). The date, small mixture proportion, and geography are consistent with a small influx of migrants from North Africa, who themselves traced only a fraction of their ancestry ultimately to Sub-Saharan Africa, consistent with the findings of DUPANLOUP et al. (2004).
Moorjani et al. (2011) had estimated 2.9% admixture in Sardinians occurring at 71 +/- 28 generations, so the new results appear to differ, perhaps on account of the treatment of the chromosome 8 inversion or the ability of ALDER to pick the distance threshold (hard-set at 0.5cM in rolloff) adaptively. Also, note that ALDER is able to estimate admixture proportions from the amplitude of the weighted LD, whereas in the earlier analysis the proportions were calculated using an F4 ratio test, which did not take into account East Eurasian-like gene flow into the CEU population and treated both CEU and Sardinians as having experienced no Asian-related gene flow.

So it appears that the African admixture in Sardinians is real, but may be both lower and later than previously estimated. In a recent experiment, I "scrubbed" possible segments of African ancestry in Sardinians, and this diminished their African ancestry from 3.1% to 1.8%. If we consider the 1.8% to be the spurious admixture due to Asian-related gene flow into northern Europe, then African admixture in Sardinians will be the remainder 1.3%, and perhaps lower due to the very "intensive" nature of the scrubbing procedure.

globe4 estimates African admixture in Sardinians as 0.8%, with some heterogeneity in its apportionment across 28 individuals (left): three individuals appear as outliers, and the remainder are randomly distributed around the 0.8% median. The outlier individuals are HGDP01062, HGDP01076, and HGDP01071; the last of these is not included in the curated version of HGDP released by Patterson et al. (2012). ALDER includes a facility for detecting heterogeneity in admixture, but I did not see this specifically discussed in my first scan of the paper. In any case, it now appears that different methods converge on a small African admixture in Sardinians, and the 1200-year-old age estimate seems consistent with medieval history.


The paper also deals with the Japanese: 
Genetic studies have suggested that present-day Japanese are descended from admixture between two waves of settlers, responsible for the Jomon and Yayoi cultures (HAMMER and HORAI 1995; HAMMER et al. 2006; RASTEIRO and CHIKHI 2009). We also observed evidence of admixture in Japanese (Table 1), and while our ability to learn about the history is limited by the absence of a close surrogate for the original Paleolithic mixing population, we were able to take advantage of the one-reference inference capabilities of ALDER. We observed a clear weighted LD curve using HapMap JPT as the test population and JPT–CHB weights (Figure 3B). This curve yields an estimate of 45 +/- 6 generations, or about 1,300 years, as the age of admixture. To our knowledge, this is the first time genome-wide data have been used to date admixture in Japanese. As with previous estimates based on coalescence of Y-chromosome haplotypes (HAMMER et al. 2006), our date is consistent with the archaeologically attested arrival of the Yayoi in Japan roughly 2300 years ago (we suspect that our estimate is from later than the initial arrival because admixture may not have happened immediately). Based on the amplitude of the curve, we also obtain a (likely very conservative) genome-wide lower bound of 41 +/- 3% “Yayoi” ancestry using formula (12) (under the reasonable assumption that Han Chinese are fairly similar to the Yayoi population). It is important to note that observation of a single-reference weighted LD curve is not sufficient evidence to prove that a population is admixed, but we did find a pair of references with which the ALDER test identified Japanese as admixed, which, combined with previous work and the lack of any signal of reduced population size, makes us confident that our inferences are based on true historical admixture.
This is a useful application of the idea that you don't need both reference populations to estimate admixture. If a population A experiences gene flow from another B, then A will become more like B over time, and allele frequency differences between A and B will diminish but will continue to reflect differences between the local and introgressing element. This idea was first used by Pickrell et al. (2012), and a new variation of it is used in the current paper.
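The one-reference idea can be written as a single line of algebra. In a simplified sketch that ignores drift and sampling noise (an assumption, not the paper's full model), let A_0 be a hypothetical label for the unadmixed local population, which is not available, and let A receive a fraction β of its ancestry from B. At each SNP, the allele frequency difference between A and B is then a shrunken copy of the local-versus-introgressing difference:

```latex
p_A = (1-\beta)\,p_{A_0} + \beta\,p_B
\quad\Longrightarrow\quad
p_A - p_B = (1-\beta)\,\bigl(p_{A_0} - p_B\bigr)
```

Weights built from p_A - p_B are therefore proportional, up to the constant factor 1 - β, to the weights one would build from the missing A_0, so the decay rate, and hence the date, is unaffected; it is the amplitude of the curve that carries information about the mixture fraction.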

According to Wikipedia, Japanese skeletons of the Kofun period resemble those of modern Japanese, so perhaps the age estimate is a little younger than the actual period of admixture. In any case, admixture between populations carrying varying amounts of Yayoi/Jomon ancestry was perhaps not instantaneous, in which case ALDER may be picking up not the beginning but a later point of a continuous process that lasted for several centuries.

Finally, there is a reference to another paper currently in submission: "MOORJANI, P., N. PATTERSON, P. LOH, M. LIPSON, and OTHERS, 2012 Reconstructing Roma history from genome-wide data. In submission." Given that the Roma likely possess really old West Eurasian admixture related to "Ancestral North Indians", as well as really recent European admixture after they migrated to Europe, and perhaps even intermediate West/Central Asian admixture as they made their way from India to the west, this seems like a very complicated case, involving admixture at different time scales, and between different but related populations, so it will be interesting to see how it will all fit together.

To conclude, ALDER seems like a very practical tool for studying admixture in human populations, so I'm sure it will prove quite useful in the future.

arXiv:1211.0251 [q-bio.PE]

Inference of Admixture Parameters in Human Populations Using Weighted Linkage Disequilibrium

Po-Ru Loh, Mark Lipson, Nick Patterson, Priya Moorjani, Joseph K. Pickrell, David Reich, Bonnie Berger

Abstract

Long-range migrations and the resulting admixture between populations have been an important force shaping human genetic diversity. Most existing methods for detecting and reconstructing historical admixture events are based on allele frequency divergences or patterns of ancestry segments in chromosomes of admixed individuals. An emerging new approach harnesses the exponential decay of admixture-induced linkage disequilibrium (LD) as a function of genetic distance. Here, we comprehensively develop LD-based inference into a versatile tool for investigating admixture. We present a new weighted LD statistic that can be used to infer mixture proportions as well as dates with fewer constraints on reference populations than previous methods. We define an LD-based three-population test for admixture and identify scenarios in which it can detect admixture that previous formal tests cannot. We further show that we can discover phylogenetic relationships between populations by comparing weighted LD curves obtained using a suite of references. Finally, we describe several improvements to the computation and fitting of weighted LD curves that greatly increase the robustness and speed of the computation. We implement all of these advances in a software package, ALDER, which we validate in simulations and apply to test for admixture among all populations from the Human Genome Diversity Project (HGDP), highlighting insights into the admixture history of Central African Pygmies, Sardinians, and Japanese.

Link