November 02, 2012

ALDER paper and software (Loh et al. 2012)

A new paper has appeared on the arXiv that introduces ALDER, a method for testing for admixture and inferring its parameters (when it happened and the proportions of the two mixing populations). You can get the software from here.

I have already tried it and can confirm two claims in the paper: (i) it is extremely fast, and (ii) it is conservative, in the sense that its test fails even in cases where an f3 test indicates admixture. In one case where it did detect admixture, ASW as CEU+YRI, I got the output on the right, which shows a very clear pattern of exponential decay. I also tried a different experiment using Mozabites as the admixed population. The results are quite interesting:

Test SUCCEEDS (z=10.39, p=2.7e-25) for Mozabite with {CEU30, YRI30} weights

DATA fields and values:
test status: success (warning: decay rates inconsistent)
p-value: 2.7e-25
test pop: Mozabite
ref A: CEU30
ref B: YRI30
2-ref z-score: 10.39
1-ref z-score A: 6.75
1-ref z-score B: 11.39
max decay diff %: 55%
2-ref decay: 17.45 +/- 1.68
2-ref amp_exp: 0.00037417 +/- 0.00003187
1-ref decay A: 28.63 +/- 3.84
1-ref amp_exp A: 0.00005311 +/- 0.00000787
1-ref decay B: 16.21 +/- 1.42
1-ref amp_exp B: 0.00023789 +/- 0.00001752
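
For anyone who wants to tabulate several runs, the field names and values of the ALDER output above can be paired up programmatically. This is only a minimal sketch: it assumes the fields are tab-separated (the tabs did not survive in this post, so the line is rebuilt by hand here), and the helper name parse_estimate is my own, not part of ALDER.

```python
# Field names copied from the header of the ALDER output quoted in this post.
FIELDS = [
    "test status", "p-value", "test pop", "ref A", "ref B",
    "2-ref z-score", "1-ref z-score A", "1-ref z-score B",
    "max decay diff %", "2-ref decay", "2-ref amp_exp",
    "1-ref decay A", "1-ref amp_exp A", "1-ref decay B", "1-ref amp_exp B",
]

# Values copied from the DATA line; joining with tabs is an assumption
# about the original separator.
DATA_LINE = "\t".join([
    "success (warning: decay rates inconsistent)", "2.7e-25", "Mozabite",
    "CEU30", "YRI30", "10.39", "6.75", "11.39", "55%",
    "17.45 +/- 1.68", "0.00037417 +/- 0.00003187",
    "28.63 +/- 3.84", "0.00005311 +/- 0.00000787",
    "16.21 +/- 1.42", "0.00023789 +/- 0.00001752",
])


def parse_estimate(text):
    """Split a 'value +/- error' string into a (value, error) pair of floats."""
    value, error = text.split("+/-")
    return float(value), float(error)


# Pair each field name with its value.
record = dict(zip(FIELDS, DATA_LINE.split("\t")))

for name in ("2-ref decay", "1-ref decay A", "1-ref decay B"):
    value, error = parse_estimate(record[name])
    print(f"{name}: {value} generations (s.e. {error})")
```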

Notice that the 1-reference decay using CEU is 28.63 and with YRI it is 16.21, while the 2-reference (both CEU and YRI) is an intermediate 17.45. I believe that this is capturing the same behavior as Jin et al. (2012), according to which:
There was an almost complete absence of recent gene flow from European populations to the Mozabite gene pool (Figure 6A). For the Sub-Saharan African ancestral component, there were more long CSDAs at the tail of empirical distribution than those in the HI model, which confirmed that recent gene flow from African populations had contributed to the Mozabite gene pool (Figure 6B). 
This is also what ALDER is telling us, since the decay using CEU is more "abrupt" (hence lack of long segments of admixture that might indicate recent admixture), while that using YRI is less so (and hence recent Sub-Saharan admixture has contributed longer segments).
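
To make the "abruptness" of the decay concrete, here is a rough sketch that treats each weighted LD curve as a plain exponential, amplitude * exp(-n * d), with n the decay in generations and d the genetic distance in Morgans. That is a simplification I am assuming for illustration (single pulse of admixture, no affine term), not ALDER's fitting code; the amplitudes and decay rates are the ones from the Mozabite output above.

```python
import math


def weighted_ld(d_cm, amplitude, decay_generations):
    """Expected weighted LD at distance d_cm (centimorgans) under a plain
    exponential model with no affine term (a simplifying assumption)."""
    return amplitude * math.exp(-decay_generations * d_cm / 100.0)


def half_distance_cm(decay_generations):
    """Genetic distance (cM) over which the curve drops to half its value."""
    return 100.0 * math.log(2) / decay_generations


# (amplitude, decay in generations) taken from the Mozabite run above.
CURVES = {
    "CEU (1-ref)": (0.00005311, 28.63),
    "YRI (1-ref)": (0.00023789, 16.21),
    "CEU+YRI (2-ref)": (0.00037417, 17.45),
}

for label, (amplitude, decay) in CURVES.items():
    print(
        f"{label}: halves every {half_distance_cm(decay):.2f} cM, "
        f"weighted LD at 5 cM = {weighted_ld(5.0, amplitude, decay):.3e}"
    )
```

A larger decay rate means the curve halves over a shorter genetic distance, which is what one expects when few long admixture segments remain.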

In any case, enough with my own preliminary experiments. From the paper itself, there are interesting applications of the new methodology for Sardinians, Japanese, and Central African Pygmies:
Both Central African Pygmy populations in the HGDP, the Mbuti and Biaka, show evidence of admixture (Table 1), about 28 +/- 4 generations (800 years) ago for Mbuti and 38 +/- 4 generations (1100 years) ago for Biaka, estimated using San and Yoruba as reference populations (Figure 2A,C). The intra-population heterogeneity is low, as demonstrated by the negligible affine terms. In each case, we also generated weighted LD curves with the Pygmy population itself as one reference and a variety of second references. We found that using populations French, Han, or Yoruba as the second reference gave very similar amplitudes, but the amplitude was significantly smaller with the other Pygmy population or San as the second reference (Figure 2B,D). Using the amplitudes with Yoruba, we estimated mixture fractions of at least 15.9 +/- 0.9% and 28.8 +/- 1.4% Yoruba-related ancestry for Mbuti and Biaka, respectively. 
For Sardinians:
We detect a very small proportion of Sub-Saharan African ancestry in Sardinians, which our ALDER tests identified as admixed (Table 1; Figure 3A). To investigate further, we computed weighted LD curves with Sardinian as a test population and all pairs of the HapMap CEU, YRI and CHB populations as references (Table 2). We observed an abnormally large amount of shared long-range LD in chromosome 8, likely due to an extended inversion segregating in Europeans (PRICE et al. 2008), so we omitted it from these analyses. The CEU–YRI curve has the largest amplitude, suggesting both that the LD present is due to admixture and that the small non-European ancestry component, for which we estimated a lower bound of 0.6+/-0.2%, is from Africa. The existence of a weighted LD decay curve with CHB and YRI as references provides further evidence that the LD is not simply due to a population bottleneck or other non-admixture sources, as does the fact that our estimated dates from all three reference pairs are roughly consistent at about 40 generations (1200 years). Our findings thus confirm the signal of African ancestry in Sardinians reported in MOORJANI et al. (2011). The date, small mixture proportion, and geography are consistent with a small influx of migrants from North Africa, who themselves traced only a fraction of their ancestry ultimately to Sub-Saharan Africa, consistent with the findings of DUPANLOUP et al. (2004).
Moorjani et al. (2011) had estimated 2.9% admixture in Sardinians occurring at 71 +/- 28 generations, so the new results appear to be different, perhaps on account of the treatment of the chromosome 8 inversion or the ability of ALDER to pick the distance threshold (hard-set at 0.5cM in rolloff) adaptively. Also, note that ALDER is able to estimate admixture proportions based on the amplitude of the weighted LD, whereas in the previous test the proportions were calculated using an F4 ratio test which did not take into account East Eurasian-like gene flow into the CEU population, and considered both CEU and Sardinians as having experienced no Asian-related gene flow.

So it appears that the African admixture in Sardinians is real, but may be both lower and later than previously estimated. In a recent experiment, I "scrubbed" possible segments of African ancestry in Sardinians, and this diminished their African ancestry from 3.1% to 1.8%. If we consider the 1.8% to be the spurious admixture due to Asian-related gene flow into northern Europe, then African admixture in Sardinians will be the remainder 1.3%, and perhaps lower due to the very "intensive" nature of the scrubbing procedure.

globe4 estimates African admixture in Sardinians as 0.8%, with some heterogeneity in its apportionment in 28 different individuals (left), with three individuals appearing as outliers and the remainder randomly distributed around the 0.8% median. The outlier individuals are HGDP01062, HGDP01076, and HGDP01071; the last of these is not included in the curated version of HGDP released by Patterson et al. (2012). ALDER includes a facility for detecting heterogeneity in admixture, but I did not see this particularly discussed in my first scan of the paper. In any case, it now appears that different methods converge on a small African admixture in Sardinians, and the 1200-year old age estimate seems consistent with medieval history.


The paper also deals with the Japanese: 
Genetic studies have suggested that present-day Japanese are descended from admixture between two waves of settlers, responsible for the Jomon and Yayoi cultures (HAMMER and HORAI 1995; HAMMER et al. 2006; RASTEIRO and CHIKHI 2009). We also observed evidence of admixture in Japanese (Table 1), and while our ability to learn about the history is limited by the absence of a close surrogate for the original Paleolithic mixing population, we were able to take advantage of the one-reference inference capabilities of ALDER. We observed a clear weighted LD curve using HapMap JPT as the test population and JPT–CHB weights (Figure 3B). This curve yields an estimate of 45 +/- 6 generations, or about 1,300 years, as the age of admixture. To our knowledge, this is the first time genome-wide data have been used to date admixture in Japanese. As with previous estimates based on coalescence of Y-chromosome haplotypes (HAMMER et al. 2006), our date is consistent with the archaeologically attested arrival of the Yayoi in Japan roughly 2300 years ago (we suspect that our estimate is from later than the initial arrival because admixture may not have happened immediately). Based on the amplitude of the curve, we also obtain a (likely very conservative) genome-wide lower bound of 41 +/- 3% “Yayoi” ancestry using formula (12) (under the reasonable assumption that Han Chinese are fairly similar to the Yayoi population). It is important to note that observation of a single-reference weighted LD curve is not sufficient evidence to prove that a population is admixed, but we did find a pair of references with which the ALDER test identified Japanese as admixed, which, combined with previous work and the lack of any signal of reduced population size, makes us confident that our inferences are based on true historical admixture.
This is a useful application of the idea that you don't need both reference populations to estimate admixture. If a population A experiences gene flow from another B, then A will become more like B over time, and allele frequency differences between A and B will diminish but will continue to reflect differences between the local and introgressing element. This idea was first used by Pickrell et al. (2012), and a new variation of it is used in the current paper.
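
The reasoning can be checked with a toy calculation. In a deliberately simplified model that ignores drift after admixture, a population C formed as a fraction alpha of source A and (1 - alpha) of source B has allele frequencies p_C = alpha * p_A + (1 - alpha) * p_B, so the difference between C and B is just a scaled copy of the difference between A and B. The frequencies and alpha below are made up for illustration; this is not ALDER's actual computation.

```python
ALPHA = 0.4  # hypothetical fraction of ancestry from source A

# Made-up allele frequencies at four SNPs in the two source populations.
P_A = [0.10, 0.55, 0.80, 0.30]
P_B = [0.40, 0.25, 0.70, 0.90]

# Frequencies in the admixed population C, ignoring drift since admixture.
p_c = [ALPHA * a + (1 - ALPHA) * b for a, b in zip(P_A, P_B)]

# Differences available with both references versus only reference B.
two_ref_diff = [a - b for a, b in zip(P_A, P_B)]
one_ref_diff = [c - b for c, b in zip(p_c, P_B)]

# Each one-reference difference is ALPHA times the two-reference difference.
ratios = [one / two for one, two in zip(one_ref_diff, two_ref_diff)]
print(ratios)
```

Every ratio comes out at alpha (up to floating-point rounding), which is why the admixed population itself can stand in for a missing reference, at the cost of a smaller signal.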

According to Wikipedia, Japanese skeletons of the Kofun period resemble those of modern Japanese, so perhaps the age estimate is a little younger than the actual period of admixture. In any case, admixture between populations carrying varying amounts of Yayoi/Jomon ancestry was perhaps not instantaneous, in which case ALDER would not be picking up the beginning of a continuous process that lasted for several centuries.
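
As a sanity check on the dates quoted in this post, the generation counts can be converted to years. The 29 years per generation used here is my assumption, chosen because it reproduces the rounded figures in the quoted passages; the function name is likewise just for illustration.

```python
YEARS_PER_GENERATION = 29  # assumed generation time, not taken from the quotes

# Population -> (generations since admixture, years as quoted in the paper).
QUOTED = {
    "Mbuti": (28, 800),
    "Biaka": (38, 1100),
    "Sardinian": (40, 1200),
    "Japanese": (45, 1300),
}


def generations_to_years(generations):
    """Convert a date in generations to years under the assumed generation time."""
    return generations * YEARS_PER_GENERATION


for population, (generations, quoted_years) in QUOTED.items():
    years = generations_to_years(generations)
    print(f"{population}: {generations} generations = {years} years "
          f"(quoted as about {quoted_years})")
```

The same conversion applies to the standard errors, so the +/- 6 generations on the Japanese estimate corresponds to roughly +/- 174 years under this assumption.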

Finally, there is a reference to another paper currently in submission: "MOORJANI, P., N. PATTERSON, P. LOH, M. LIPSON, and OTHERS, 2012 Reconstructing Roma history from genome-wide data. In submission." Given that the Roma likely possess really old West Eurasian admixture related to "Ancestral North Indians", as well as really recent European admixture after they migrated to Europe, and perhaps even intermediate West/Central Asian admixture as they made their way from India to the west, this seems like a very complicated case, involving admixture at different time scales, and between different but related populations, so it will be interesting to see how it will all fit together.

To conclude, ALDER seems like a very practical tool for studying admixture in human populations, so I'm sure it will prove quite useful in the future.

arXiv:1211.0251 [q-bio.PE]

Inference of Admixture Parameters in Human Populations Using Weighted Linkage Disequilibrium

Po-Ru Loh, Mark Lipson, Nick Patterson, Priya Moorjani, Joseph K. Pickrell, David Reich, Bonnie Berger

Abstract

Long-range migrations and the resulting admixture between populations have been an important force shaping human genetic diversity. Most existing methods for detecting and reconstructing historical admixture events are based on allele frequency divergences or patterns of ancestry segments in chromosomes of admixed individuals. An emerging new approach harnesses the exponential decay of admixture-induced linkage disequilibrium (LD) as a function of genetic distance. Here, we comprehensively develop LD-based inference into a versatile tool for investigating admixture. We present a new weighted LD statistic that can be used to infer mixture proportions as well as dates with fewer constraints on reference populations than previous methods. We define an LD-based three-population test for admixture and identify scenarios in which it can detect admixture that previous formal tests cannot. We further show that we can discover phylogenetic relationships between populations by comparing weighted LD curves obtained using a suite of references. Finally, we describe several improvements to the computation and fitting of weighted LD curves that greatly increase the robustness and speed of the computation. We implement all of these advances in a software package, ALDER, which we validate in simulations and apply to test for admixture among all populations from the Human Genome Diversity Project (HGDP), highlighting insights into the admixture history of Central African Pygmies, Sardinians, and Japanese.

Link

5 comments:

Mark D said...

One topic I have not seen is the impact of slavery on the admixture of Med peoples such as Sardinians. You mention the results as being consistent with medieval history, but before that there were settlers from Carthage and Rome who no doubt brought slaves with them. Some African admixture could stem from that.

Creative said...

@Mark D
I think interracial slavery is only one aspect of the story; nevertheless, if you look at the Fayum mummy portraits it is apparent that some sort of racial overlapping driven by the high cultures of the Mediterranean was present at all times.
For instance, if Sardinia represents the ancient Sherden who were part of the Sea People coalition, it would indicate that both sides of the Mediterranean were well aware of each other, especially in more peaceful times of Bronze Age maritime trade. The Hyksos capital of Avaris "Egypt" and its Minoan connection would underline this.

Mark D said...

@Creative
I'm sure you're right. I was only raising one aspect, slavery, that could account for any sub-Saharan African admixture. Cross-Mediterranean migrations and trade no doubt extend throughout prehistory and into historic times. I believe we all fail to realize the extent of our ancestors' wanderlust. That is why I accept admixture analysis from a small modern DNA sample with a bit of positive skepticism.

Juventus said...

The Al Fayum portraits show pure Mediterranean faces, not a mix of races.

Aileen Kawagoe said...

Is the Y-DNA haplogroup DE (20.2 percent) mentioned in your own paper on Greek Y-DNA considered to be upstream or ancestral to the hg D of the Ainu and Japanese? What are your thoughts on where the split of hg D from E might have taken place?