Showing posts with label ALDER. Show all posts

February 20, 2015

Bronze Age mixing of multiple populations => Armenians (?)

As far as I can tell, the hypothesis of "several mixtures" comes from looking at many pairs of populations and seeing that different types of pairs seem like they mixed to make Armenians. Possibility (1) is that Armenians have multiple mixtures, and possibility (2) is that none of the sources work very well.

Hellenthal et al. did not find mixture in Armenians, but they worked with a different methodology and a smaller sample size. Either the N=173 sample size enabled detection of this admixture, or differences in methodology account for the differences in conclusions. If true, the admixture dates in this paper would be some of the earliest discovered by looking at modern populations (without the help of ancient DNA).

The TreeMix analysis (Figure 4) is inconclusive about admixture from a population best represented by Neolithic Europeans. There is no plot of residuals in this figure, so this model with one migration event may not be adequate. Prior knowledge suggests that it isn't, as Pakistani and European populations have no admixture in Figure 4.

It's great that the authors will share their data!
ftp://ngs.sanger.ac.uk/scratch/project/team19/Armenian
As of this writing, the data is not "live"; it might appear when the paper is published.

bioRxiv doi: http://dx.doi.org/10.1101/015396

Genetic evidence for an origin of the Armenians from Bronze Age mixing of multiple populations

Marc Haber et al.

The Armenians are a culturally isolated population who historically inhabited a region in the Near East bounded by the Mediterranean and Black seas and the Caucasus, but remain underrepresented in genetic studies and have a complex history including a major geographic displacement during World War One. Here, we analyse genome-wide variation in 173 Armenians and compare them to 78 other worldwide populations. We find that Armenians form a distinctive cluster linking the Near East, Europe, and the Caucasus. We show that Armenian diversity can be explained by several mixtures of Eurasian populations that occurred between ~3,000 and ~2,000 BCE, a period characterized by major population migrations after the domestication of the horse, appearance of chariots, and the rise of advanced civilizations in the Near East. However, genetic signals of population mixture cease after ~1,200 BCE when Bronze Age civilizations in the Eastern Mediterranean world suddenly and violently collapsed. Armenians have since remained isolated and genetic structure within the population developed ~500 years ago when Armenia was divided between the Ottomans and the Safavid Empire in Iran. Finally, we show that Armenians have higher genetic affinity to Neolithic Europeans than other present-day Near Easterners, and that 29% of the Armenian ancestry may originate from an ancestral population best represented by Neolithic Europeans.

Link

August 08, 2013

Major admixture in India took place ~4.2-1.9 thousand years ago (Moorjani et al. 2013)

A new paper on the topic of Indian population history has just appeared in the American Journal of Human Genetics. In previous work it was determined that Indians trace their ancestry to two major groups, Ancestral North Indians (ANI) (= West Eurasians of some kind), and Ancestral South Indians (ASI) (= distant relatives of Andaman Islanders, existing today only in admixed form). The new paper demonstrates that admixture between these two groups took place ~4.2-1.9 thousand years ago.

The authors caution about this evidence of admixture:
It is also important to emphasize what our study has not shown. Although we have documented evidence for mixture in India between about 1,900 and 4,200 years BP, this does not imply migration from West Eurasia into India during this time. On the contrary, a recent study that searched for West Eurasian groups most closely related to the ANI ancestors of Indians failed to find any evidence for shared ancestry between the ANI and groups in West Eurasia within the past 12,500 years3 (although it is possible that with further sampling and new methods such relatedness might be detected). An alternative possibility that is also consistent with our data is that the ANI and ASI were both living in or near South Asia for a substantial period prior to their mixture. Such a pattern has been documented elsewhere; for example, ancient DNA studies of northern Europeans have shown that Neolithic farmers originating in Western Asia migrated to Europe about 7,500 years BP but did not mix with local hunter gatherers until thousands of years later to form the present-day populations of northern Europe.15, 16, 44 and 45
This is of course true, because admixture postdates migration and it is conceivable that the West Eurasian groups might not have admixed with ASI populations immediately after their arrival into South Asia. On the other hand, a long period of co-existence without admixture would be against much of human history (e.g., the reverse movement of the Roma into Europe, who picked up European admixture despite strong social pressure against it by both European and Roma communities, or the absorption of most Native Americans by incoming European, and later African, populations in post-Columbian times). It is difficult to imagine really long reproductive isolation between neighboring peoples.

Such reproductive isolation would require a cultural shift from a long period of endogamy (ANI migration, followed by ANI/ASI co-existence without admixture) to exogamy ~4.2-1.9kya (to explain the thoroughness of blending that left no group untouched), and then back to fairly strict endogamy (within the modern caste system). It might be simpler to postulate only one cultural shift (migration with admixture soon thereafter, with later introduction of endogamy which greatly diminished the admixture).

The authors cite the evidence from neolithic Sweden which does, indeed, suggest that the neolithic farmers this far north were "southern European" genetically and had not (yet) mixed with contemporary hunter-gatherers, as they must have done eventually. But, perhaps farmers and hunters could avoid each other during first contact, when Europe was sparsely populated. It is not clear whether the same could be said for India ~4 thousand years ago with the Indus Valley Civilization providing evidence for a large indigenous population that any intrusive group would have encountered. In any case, the problem of when the West Eurasian element arrived in India will probably be solved by relating it to events elsewhere in Eurasia, and, in particular, to the ultimate source of the "Ancestral North Indians".

It is also possible that some of the ANI-ASI admixture might actually pre-date migration. At present it's anyone's guess where the original limes between the west Eurasian and ASI worlds were. There is some mtDNA haplogroup M in Iran and Central Asia, which is otherwise rare in west Eurasia, so it is not inconceivable that ASI may have once extended outside the Indian subcontinent: the fact that it is concentrated today in southern India (hence its name) may indicate only the area of this element's maximum survival, rather than the extent of its original distribution. In any case, all mixture must have taken place somewhere in the vicinity of India.

A second interesting finding of the paper is that admixture dates in Indo-European groups are later than in Dravidian groups. This is demonstrated quite clearly in the rolloff figure on the left. Moreover, it does not seem that the admixture times for Indo-Europeans coincide with the appearance of the Indo-Aryans, presumably during the 2nd millennium BC: they are much later. I believe that this is fairly convincing evidence that north India has been affected by subsequent population movements from central Asia of "Indo-Scythian"-related populations, for which there is ample historical evidence. So, the difference in dates might be explained by secondary (later) admixture with other West Eurasians after the arrival of Indo-Aryans. Interestingly, the paper does not reject simple ANI-ASI admixture "often from tribal and traditionally lower-caste groups," while finding evidence for multiple layers of ANI ancestry  in several other populations.

My own analysis of Dodecad Project South Indian Brahmins arrived at a date of 4.1ky, and of North Indian Brahmins, a date of 2.3ky, which seems to be in good agreement with these results.

The authors also report that "we find that Georgians along with other Caucasus groups are consistent with sharing the most genetic drift with ANI". I had made a post on the differential relationship of ANI to Caucasus populations which seems to agree with this, and, of course, in various ADMIXTURE analyses, the component which I've labeled "West Asian" tends to be the major west Eurasian element in south Asia.

Here are the estimated admixture proportions/times from the paper:


Sadly, the warm and moist climate of India, and the adoption of cremation have probably destroyed any hope of studying much of its recent history with ancient DNA. On the other hand, the caste system has probably "fossilized" old socio-linguistic groups, allowing us to tell much by studying their differences and correlating them with groups outside India.

Coverage elsewhere: Gene Expression, HarappaDNA
Related podcast on BBC.

AJHG doi:10.1016/j.ajhg.2013.07.006

Genetic Evidence for Recent Population Mixture in India

Priya Moorjani et al.

Most Indian groups descend from a mixture of two genetically divergent populations: Ancestral North Indians (ANI) related to Central Asians, Middle Easterners, Caucasians, and Europeans; and Ancestral South Indians (ASI) not closely related to groups outside the subcontinent. The date of mixture is unknown but has implications for understanding Indian history. We report genome-wide data from 73 groups from the Indian subcontinent and analyze linkage disequilibrium to estimate ANI-ASI mixture dates ranging from about 1,900 to 4,200 years ago. In a subset of groups, 100% of the mixture is consistent with having occurred during this period. These results show that India experienced a demographic transformation several thousand years ago, from a region in which major population mixture was common to one in which mixture even between closely related groups became rare because of a shift to endogamy.

Link

November 26, 2012

Medieval signal of Swedish (?) admixture in Finland

I took the FIN (Finnish), GBR (British), and CDX (Chinese Dai) samples of the 1000 Genomes Project, each of which has a sample size of 100, in order to investigate the signal of East-West Eurasian admixture in Finns. While neither Britons nor Dai could be imagined to have contributed to Finns directly, they ought to make useful proxies of a NW European population lacking recent East Eurasian ancestry, and an East Eurasian population lacking recent West Eurasian ancestry, respectively.

In the following, I will assume a generation length of 29 years and a sample birthyear of 1980 as in previous experiments.

First, the 1-reference analysis of FIN using GBR produced an admixture proportion lower bound of 37.4 +/- 5.1 percent.

The corresponding analysis of FIN using CDX produced an admixture proportion lower bound of 4.4 +/- 1.0 percent.

The 2-ref admixture test with {GBR,CDX} reported success:

Test SUCCEEDS (z=2.76, p=0.0057) for FIN with {GBR, CDX} weights
But, the decay rates were inconsistent, a situation which might occur when major admixture from different sources took place at different times. In particular, the one using CDX corresponded to 65.57 +/- 8.36 generations, and the one using GBR to 25.48 +/- 4.93 generations.

In calendar dates, Finns are estimated to have mixed with an East Eurasian CDX-like population between 170BC-320AD and with a NW European GBR-like population between 1100-1380AD.
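The conversion from generations to calendar dates used above can be sketched as follows. This is only a minimal illustration of the arithmetic, assuming the stated generation length of 29 years and sample birth year of 1980; the function names are hypothetical, and negative years are read loosely as BC (the missing year 0 is ignored).

```python
def generations_to_year(generations, gen_length=29.0, birth_year=1980.0):
    """Calendar year of an event `generations` before the sample's birth year.

    Negative results are (approximately) years BC.
    """
    return birth_year - generations * gen_length


def year_interval(estimate, std_err, gen_length=29.0, birth_year=1980.0):
    """Return (earliest, central, latest) years for estimate +/- one standard error."""
    earliest = generations_to_year(estimate + std_err, gen_length, birth_year)
    central = generations_to_year(estimate, gen_length, birth_year)
    latest = generations_to_year(estimate - std_err, gen_length, birth_year)
    return earliest, central, latest


# Decay estimates reported above for FIN with CDX and with GBR.
cdx = year_interval(65.57, 8.36)
gbr = year_interval(25.48, 4.93)
print("CDX-like:", [round(y) for y in cdx])
print("GBR-like:", [round(y) for y in gbr])
```

Rounding the interval ends to the nearest decade gives the ranges quoted in the text.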

The central date of the latter estimate is 1240 AD, which corresponds quite closely to the beginning of Swedish rule and is in the middle of the 13th century, between the time when Finland was initially claimed for western Christendom (12th c.) and the time when the conflict between Sweden and Russia was settled (14th c.).

November 22, 2012

ALDER signal of admixture in Ashkenazi Jews

(You can skip the first part if you want, and head straight to the RESULTS section)

Previous studies on uniparental markers have indicated that Ashkenazi Jews (AJ) were formed by admixture between a Near Eastern population and European host populations; the evidence for the former element seems pretty clear on the basis of Y-chromosomes where Jews possess a relatively high frequency of Y-haplogroup J1 (and a few others) that are quite rare in non-Jewish north/east Europeans. As for the latter, it seems probable on the basis of the location of Ashkenazi Jews on PCA plots where they tend to occupy an intermediate position between extant populations of the Levant (including Near Eastern Jews) and non-Jewish Europeans.

Anyone who has played around with genetic data will know that while AJ may be positioned in the aforementioned "intermediate" location within the "West Eurasian continuum" between Europe and Near East, they tend to form their own cluster at higher dimensions. And, indeed, this is why it's fairly easy for a clustering algorithm, such as my "Clusters Galore" (MCLUST/MDS) approach to pick out a very specific AJ cluster (e.g., here, or here, using a fastIBD approach). An Ashkenazi Jewish-specific cluster also pops out at higher K in ADMIXTURE analyses. This cluster may reflect endogamy within the AJ community until quite recent times.

One way of detecting admixture in a group is through the use of f3-statistics. The statistic f3(AJ; European, Near_East) could be negative, which would indicate admixture, but it is usually not, at least in the combinations of (European, Near_East) I've tried, and a non-negative value is consistent with either the presence or the absence of admixture.

A simple and intuitive way to see why post-admixture drift might mask the presence of admixture can be seen by means of a simple calculation. Remember that the f3-statistic's +/- sign depends on the +/- sign of quantities (c-a)*(c-b) where c is an allele frequency in the admixed (?) population we are investigating, and a, b in the two reference populations. We can pick a to be less than b with no loss of generality.

In the absence of strong drift (e.g., if all populations have a very large number of individuals), then the allele frequency c=xa+(1-x)b where x is the amount of admixture --between 0 and 1-- from group A and (1-x) from group B, and this c will be maintained little changed in the post-admixture phase. With the aid of a little algebra, we get that:

(c-a)*(c-b) = (xa+(1-x)b-a)*(xa+(1-x)b-b)
= (xa+b-xb-a)*(xa+b-xb-b)
= (1-x)(b-a)*x(a-b)
= x(x-1)(a-b)^2

and this is of course negative because x lies strictly between 0 and 1 (and a differs from b).

In a large population, this c will remain near-constant, because of the lack of strong drift. As long as it remains within the interval (a,b), then (c-a)*(c-b) will also remain negative, and so will the f3 statistic.

But, what if strong drift affects the admixed population? Allele frequencies fluctuate more wildly in smaller populations, so c might go outside the (a,b) interval. Without loss of generality, assume that c becomes greater than b, in which case (c-a)*(c-b) will become positive.

The f3-statistic averages over many SNPs, so, depending on (i) the initial differentiation of the mixing populations, which could be seen as b-a, and (ii) the amount of drift, which causes c to jump outside the (a, b) interval as discussed above, it is possible that the evidence for admixture may disappear.
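The argument above can be made concrete with a small sketch. It is not the f3 estimator itself, only the per-SNP quantity (c-a)*(c-b) discussed here, and it assumes a simple neutral Wright-Fisher model in which drift leaves the expected frequency at c but adds variance c(1-c)[1-(1-1/(2N))^t] after t generations in a population of N diploid individuals. The function names and the numbers (a=0.2, b=0.4, x=0.5, 30 generations, population sizes) are illustrative assumptions, not values from any analysis in this post.

```python
def admixed_frequency(a, b, x):
    """Allele frequency right after admixture: fraction x from A, 1-x from B."""
    return x * a + (1 - x) * b


def drift_fraction(pop_size, generations):
    """Wright-Fisher drift accumulated over t generations: 1 - (1 - 1/(2N))^t."""
    return 1.0 - (1.0 - 1.0 / (2.0 * pop_size)) ** generations


def expected_f3_term(a, b, x, pop_size, generations):
    """Expected value of (c-a)*(c-b) after post-admixture drift.

    E[(c-a)(c-b)] = (c0-a)(c0-b) + Var(c), since drift keeps E[c] = c0.
    """
    c0 = admixed_frequency(a, b, x)
    no_drift = (c0 - a) * (c0 - b)  # equals x(x-1)(a-b)^2, always <= 0
    drift_var = c0 * (1 - c0) * drift_fraction(pop_size, generations)
    return no_drift + drift_var


# Illustrative values only: same admixture, very different population sizes.
large_pop = expected_f3_term(0.2, 0.4, 0.5, pop_size=100_000, generations=30)
small_pop = expected_f3_term(0.2, 0.4, 0.5, pop_size=270, generations=30)
print(f"large population: {large_pop:+.5f}")
print(f"small population: {small_pop:+.5f}")
```

Under these assumptions the term stays negative in the large population and turns positive in the small one, which is the masking effect described above.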

So, drift can obliterate a signal of admixture that relies on allele frequency differences. But, there is a different signal of admixture that uses the decay of admixture linkage-disequilibrium, most recently discussed in the ALDER paper. The admixture LD signal's evidence may also disappear in time, but only because the signal occurs at increasingly shorter genetic distances over time due to recombination. Thankfully, for admixture within the last few thousand years, it tends to occur at distances large enough that the SNP density of existing genotyping platforms, which measure a few hundred thousand SNPs per individual, is sufficient.

METHODS

Naturally I was curious to see whether the admixture LD mechanism would produce the evidence of admixture that the f3-statistics did not. I combined three datasets in my possession (HGDP by Li et al., Behar et al., and Yunusbayev et al.) and identified sets of European and Semitic populations. (Remember that these sets are non-exhaustive, but presumably usable surrogates for the true mixing populations exist within them):

Abhkasians_Y, Adygei, Belorussian, Bulgarians_Y, Chechens_Y, Chuvashs, French, French_Basque, Georgians, Hungarians, Lezgins, Lithuanians, Mordovians_Y, North_Italian, North_Ossetians_Y, Orcadian, Romanians, Russian, Sardinian, Spaniards, Tuscan, Ukranians_Y

and:

Bedouin, Druze, Egyptans, Ethiopian_Jews, Ethiopians, Iraq_Jews, Jordanians, Lebanese, Morocco_Jews, Palestinian, Saudis, Sephardic_Jews, Syrians, Yemenese, Yemen_Jews

I used my Dodecad Project sample of AJ which numbers 36 individuals and is larger than any other usable public sample available to me.

(ALDER was run with default parameters, using the Rutgers recombination map for Illumina chips, and with the merged dataset prepared with a --geno 0.03 flag. Note that the Ashkenazi_D sample consists of individuals typed on different Illumina platforms from 23andMe and FamilyTreeDNA. The total number of SNPs considered was 527,165.)

RESULTS

I report below the tests for which ALDER reported "success" for the test with no warnings:



The median of all these estimates is 36.78 generations or 1070 years, which corresponds to a calendar date of 910CE, assuming the sample's birth year was 1980 and a generation length of 29 years.

Palamara et al. placed the beginning of demographic expansion of AJ in a similar timeframe (33 generations), following a severe founder effect reducing the population to ~270 individuals. Such a founder effect may have indeed served to produce positive f3-statistics, masking the presence of admixture, the occurrence of which appears to be substantiated on the basis of the ALDER test of admixture.

As for the levels of admixture, using a 1-ref analysis with the European populations, I get the following lower bounds:



I'd be interested in hearing people's opinions on the plausibility of these dates/proportions, as well as their potential historical associations; a lot of factors might affect these results, so perhaps this analysis could be improved in the future.

November 10, 2012

Investigating East Asian admixture in Balkans/Anatolia/Caucasus

I used ALDER with a dataset of populations from the Balkans, Anatolia, and Caucasus, using the She, Japanese, Miaozu, and Dai as East Asian references. A few caveats for this analysis:

  1. Some populations may possess "South Asian" admixture which may be mistaken for East Asian
  2. Populations differ in the number of SNPs used in the analysis; for example, the Armenian_D sample includes mostly Family Finder data which has a smaller overlap with the SNP set used
  3. Populations differ in the number of individuals used, from a low of 5 (e.g., Turkish_Cypriot) to a high of 45 (Armenian_D)
I have also added the HGDP Europeans to the analysis. The results can be seen below; I have made bold those rows where all estimates are at least one standard error above zero, and bold/big those where the estimates are at least two standard errors above zero. I consider the latter to be the most reliable.



I have already discussed the Turkish signal of admixture at length elsewhere. I will note that the Iranian_D sample produces similar or younger admixture dates, which would make sense, given the fact that the Iranians came under control of the Mongols, while the successes of the latter in Anatolia were short-lived.

A very interesting signal is that of the North Ossetians, who show admixture ~9-10 centuries ago. This seems to have occurred a little after the foundation of the kingdom of Alania, and I think it makes excellent sense to view it as the signal of Eurasian nomads (who must have carried some East Asian admixture at that time) intermingling with pre-Iranic local Caucasus populations. Two other populations from the Caucasus, the Georgians and Lezgins (and also the Abkhaz and Chechens), show earlier admixture signals that could very well date to the period of east-west Eurasian migrations inaugurated by the Huns, although a possible Sassanian origin of such influence cannot be overlooked.

The Kurds are another interesting case where the Dodecad sample and the Yunusbayev et al. sample produce very different dates. The different number of SNPs may be at play here, or it may be that some Dodecad participants have recent Turkish ancestry that causes the admixture date average to appear lower, although the globe13 analysis suggests that the "East Asian" found here may be in fact "South Asian".

It seems to me that with large, dense, and well-curated sample sets from several of these populations, their admixture dynamics will become more distinct.

November 03, 2012

rolloff and ALDER analysis of Turks

I carried out rolloff analysis of the Behar et al. (2010) sample of Turks together with the sample of Uzbeks from the same, and the Yunusbayev et al. (2011) sample of Armenians. A --geno 0.03 flag was applied for merging and SNPs available in the Rutgers recombination map for Illumina chips were used.

The exponential decay can be seen below:

The signal of admixture seems pretty clear and extends up to several cM. Of course, as always, this does not mean that exactly these two populations mixed to form the Turks sample, but it does mean that they are reasonable stand-ins.

The jackknife gives an admixture time estimate of 27.622 +/- 5.348 generations or 800 +/- 160 years, which of course makes perfect historical sense as it is a date between the first arrival of the Seljuks in Anatolia and the final consolidation of power by the Ottomans. Note also that this probably applies principally to this particular sample (which I believe is from Cappadocia) and there were perhaps different admixture dynamics elsewhere.
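To illustrate how an admixture time falls out of an exponential decay, here is a deliberately simplified sketch. It assumes the weighted LD curve has the form A*exp(-n*d), with d the genetic distance in Morgans and n the number of generations since admixture, and it fits a straight line to the logarithm of the curve. This is not how rolloff or ALDER are implemented (they handle noise, an affine term, and jackknife standard errors); the function name and the synthetic amplitude are assumptions made for the example, and the curve is noise-free by construction.

```python
import math


def fit_exponential_decay(dist_cm, weighted_ld):
    """Least-squares line through ln(y) = ln(A) - n * d, with d in Morgans.

    Returns (n, A). Assumes strictly positive y values and no affine term.
    """
    xs = [d / 100.0 for d in dist_cm]        # cM -> Morgans
    ys = [math.log(y) for y in weighted_ld]
    k = len(xs)
    mean_x = sum(xs) / k
    mean_y = sum(ys) / k
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return -slope, math.exp(intercept)


# Synthetic, noise-free curve built from the point estimate quoted above.
true_n, true_amp = 27.622, 3.0e-5            # amplitude is an arbitrary choice
dist_cm = [0.5 + 0.1 * i for i in range(200)]
curve = [true_amp * math.exp(-true_n * d / 100.0) for d in dist_cm]

n_hat, amp_hat = fit_exponential_decay(dist_cm, curve)
print(f"recovered n = {n_hat:.3f} generations")
```

With real data the curve is noisy and the log transform breaks down wherever values are non-positive, which is one reason a nonlinear fit is the usual choice.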

I had started this analysis before the announcement of ALDER, but since it is very fast, I decided to give it a go as well. Below is the raw output:




                    *** Admixture test summary ***

Weighted LD curves are fit starting at 1.45 cM

Pre-test: Does Turks have a 1-ref weighted LD curve with Armenians_Y?
   1-ref decay z-score:    0.09
   1-ref amp_exp z-score: -0.01
                                  NO: curve is not significant

Pre-test: Does Turks have a 1-ref weighted LD curve with Uzbeks?
   1-ref decay z-score:    6.56
   1-ref amp_exp z-score:  5.02
                                  YES: curve is significant

Does Turks have a 2-ref weighted LD curve with Armenians_Y and Uzbeks?
   2-ref decay z-score:    5.61
   2-ref amp_exp z-score:  5.58
                                  YES: curve is significant

Do 2-ref and 1-ref curves have consistent decay rates?
   1-ref Armenians_Y - 2-ref z-score:                  0.01   ( 13%)
   1-ref Uzbeks - 2-ref z-score:                       0.69   ( 11%)
   1-ref Uzbeks - 1-ref Armenians_Y z-score:          -0.00   ( -1%)
                                  YES: decay rates are consistent

Test FAILS (z=5.58, p=2.4e-08) for Turks with {Armenians_Y, Uzbeks} weights

DATA: failure 2.4e-08 Turks Armenians_Y Uzbeks 5.58 -0.01 5.02 13% 23.92 +/- 4.26 0.00002930 +/- 0.00000525 27.18 +/- 302.36 -0.00000082 +/- 0.00013129 26.84 +/- 4.09 0.00002316 +/- 0.00000461

DATA: test status p-value test pop ref A ref B 2-ref z-score 1-ref z-score A 1-ref z-score B max decay diff % 2-ref decay 2-ref amp_exp 1-ref decay A 1-ref amp_exp A 1-ref decay B 1-ref amp_exp B
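For readers who want to tabulate many such runs, the DATA line above can be split into named fields. The sketch below is a hypothetical helper, not part of ALDER; it assumes the layout shown in the header line (a status that may contain spaces, followed by 8 single-token fields and 6 "value +/- error" triples) and therefore counts tokens from the right. The dictionary keys are names chosen for this example.

```python
FIELDS_AFTER_STATUS = 26  # 8 single-token fields + 6 "value +/- error" triples


def parse_pm(tokens):
    """Parse ['23.92', '+/-', '4.26'] into (23.92, 4.26)."""
    value, sep, err = tokens
    if sep != "+/-":
        raise ValueError(f"expected '+/-', got {sep!r}")
    return float(value), float(err)


def parse_alder_data_line(line):
    """Split one 'DATA:' result line into a dict of named fields."""
    tokens = line.split()
    if tokens and tokens[0] == "DATA:":
        tokens = tokens[1:]
    if len(tokens) < FIELDS_AFTER_STATUS + 1:
        raise ValueError("line is too short to be a DATA result line")
    rest = tokens[-FIELDS_AFTER_STATUS:]
    record = {
        # Status may be several words, e.g. "success (warning: ...)".
        "status": " ".join(tokens[:-FIELDS_AFTER_STATUS]),
        "p_value": float(rest[0]),
        "test_pop": rest[1],
        "ref_a": rest[2],
        "ref_b": rest[3],
        "z_2ref": float(rest[4]),
        "z_1ref_a": float(rest[5]),
        "z_1ref_b": float(rest[6]),
        "max_decay_diff_pct": float(rest[7].rstrip("%")),
    }
    names = ["decay_2ref", "amp_2ref", "decay_1ref_a",
             "amp_1ref_a", "decay_1ref_b", "amp_1ref_b"]
    for i, name in enumerate(names):
        record[name] = parse_pm(rest[8 + 3 * i: 11 + 3 * i])
    return record


turks = parse_alder_data_line(
    "DATA: failure 2.4e-08 Turks Armenians_Y Uzbeks 5.58 -0.01 5.02 13% "
    "23.92 +/- 4.26 0.00002930 +/- 0.00000525 27.18 +/- 302.36 "
    "-0.00000082 +/- 0.00013129 26.84 +/- 4.09 0.00002316 +/- 0.00000461"
)
print(turks["status"], turks["decay_2ref"], turks["decay_1ref_b"])
```

Counting from the right keeps the parser working for lines whose status carries a parenthesised warning.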



The age estimate appears to be very similar, and most curves appear to be significant, except the one with Armenians_Y. This makes good sense. From Loh et al. (2012):
Also, if a reference A' shares some of the same admixture history as C or is simply very closely related to C, the pre-test will typically identify long-range correlated LD and deem A' an unsuitable reference to use for testing admixture.
In our case, A'=Armenians and C=Turks. We can be fairly sure that Armenians lack the same admixture history as Turks (because they were not affected by Central Asian Turkic invasions), but we can try a 1-ref analysis of Armenians with Uzbeks to substantiate it. The admixture lower bound estimate is a huge interval 7.6 +/- 88.2 and the jackknife is unable to estimate the admixture time. Thus, more plausibly, the second explanation applies, and because Armenians_Y are very closely related to Turks, they are deemed as an inappropriate reference to test admixture.

Finally, the lower bound of the admixture fraction for Turks with an Uzbek reference is estimated as:

Mixture fraction % lower bound (assuming admixture): 29.8 +/- 4.0

This is a very interesting number. We can be fairly sure that Central Asian Turkic people who invaded Anatolia carried with them an East Eurasian component, but in what proportion to their West Eurasian one? The East Eurasian element in Turks has been rather consistently estimated at ~5-7% with various methods, so perhaps this formed the minority element in the Turkic people who arrived in Anatolia. 

On the other hand, this case is rather muddled by the occurrence of bidirectional gene flow: Uzbeks may have West Eurasian ancestry of ultimate West Asian origin, just as Turks have Central Asian ancestry. And, indeed, when we estimate the admixture fraction of Uzbeks with the Turks as a reference, we obtain:

Mixture fraction % lower bound (assuming admixture): 46.7 +/- 2.4

The age estimate for this is ~16 +/- 2 generations = 460 +/- 60 years. Very similar time estimates appear when Armenians are used as a West Eurasian reference. So, this might indicate that the Uzbek population was formed by admixture after the Anatolian Turks were so formed.

I see no easy way to solve the problem of estimating admixture proportions when both extant populations have been both donors and recipients of gene flow, but in any case, these numbers are something to think about.

Analysis of Turks with a variety of Turkic and East Asian populations

I subsequently formed a new dataset by merging the sample of Turks with a variety of Turkic and East Asian populations (same procedure for SNP choice).


For the calendar year calculation, I arbitrarily set the birthdate of the modern sampled individuals at 1980; I have no idea on the age profile of the individuals comprising the Behar et al. sample of Turks. I have also used a mindis=0.5cM which facilitated the convenient automated extraction of the dates from the ALDER output and also gave a level playing field for all the reference populations. The age picked by ALDER using its own adaptive threshold did not usually differ from the reported one by more than a few generations.

The results indicate two things:

  • The % of admixture depends on the choice of population, with highest amount using Uzbeks  as a reference, and lowest using the far Asian populations from China. This indicates our uncertainty regarding the East/West Eurasian-ness of the people who settled in Anatolia.
  • Admixture times, on the other hand appear to be fairly constant and appear to frame an important watershed moment of Anatolian history, the Battle of Manzikert which paved the way for the eventual Turkification of the peninsula. The Turkmen sample appears as an outlier in this respect, which might indicate that limited migration of Turkmen tribes may have occurred at a later date.

Admixture in the Chuvash and the Uygur

I took the Behar et al. (2010) sample of Chuvash, excluding GSM536731 which has atypical ancestry, and merged it with the Li et al. HGDP French_Basque and Dai. The latter two populations don't show evidence of admixture according to both the f3-statistic and ALDER (Loh et al. 2012). (I used a --geno 0.03 flag in PLINK and extracted a subset of SNPs included in the Rutgers recombination map for Illumina chips.)

The f3-statistic f3(Chuvashs_16; French_Basque, Dai) was equal to -0.011311 (Z=-31.308), indicative of admixture.

I then ran an ALDER analysis:


Test SUCCEEDS (z=4.85, p=1.2e-06) for Chuvashs_16 with {French_Basque, Dai} weights

DATA: success (warning: decay rates inconsistent) 1.2e-06 Chuvashs_16 French_Basque Dai 4.85 3.78 5.18 50% 40.27 +/- 5.80 0.00032377 +/- 0.00006676 28.21 +/- 7.47 0.00004231 +/- 0.00000962 47.08 +/- 4.53 0.00016628 +/- 0.00003212

DATA: test status p-value test pop ref A ref B 2-ref z-score 1-ref z-score A 1-ref z-score B max decay diff % 2-ref decay 2-ref amp_exp 1-ref decay A 1-ref amp_exp A 1-ref decay B 1-ref amp_exp B

This indicates that the Chuvash can be seen as admixed, but with inconsistent decays: the one with the French Basque (=28.21) is younger than the one with the Dai (=47.08). I think this makes fairly good sense, because the Chuvash are descended from people who came to Europe during the 1st millennium AD and must have later mixed with Europeans, perhaps with eastern Slavs as these made their way eastward during the 2nd millennium AD.
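The "max decay diff %" column can apparently be reproduced from the three decay estimates. The sketch below takes the largest pairwise difference relative to the mean of the pair; this formula is inferred from the printed numbers in this post, not taken from the ALDER source or documentation, so treat it as an assumption. The function names are hypothetical.

```python
from itertools import combinations


def relative_diff_pct(rate_1, rate_2):
    """Absolute difference as a percentage of the mean of the two rates."""
    return 100.0 * abs(rate_1 - rate_2) / ((rate_1 + rate_2) / 2.0)


def max_decay_diff_pct(rates):
    """Largest pairwise relative difference among the decay estimates."""
    return max(relative_diff_pct(p, q) for p, q in combinations(rates, 2))


# 2-ref, 1-ref A and 1-ref B decay estimates from the outputs above.
chuvash = max_decay_diff_pct([40.27, 28.21, 47.08])
turks = max_decay_diff_pct([23.92, 27.18, 26.84])
print(f"Chuvash: {chuvash:.1f}%  Turks: {turks:.1f}%")
```

After rounding, these agree with the 50% and 13% printed in the respective DATA lines.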

I then carried out similar analyses on the HGDP Uygur. As expected f3(Uygur; French_Basque, Dai) = -0.023917 (Z = -60.362), indicative of admixture. The ALDER analysis:


Test SUCCEEDS (z=6.85, p=7.4e-12) for Uygur with {French_Basque, Dai} weights

DATA: success 7.4e-12 Uygur French_Basque Dai 6.85 4.47 7.39 15% 20.56 +/- 3.00 0.00036760 +/- 0.00003660 22.59 +/- 5.06 0.00010920 +/- 0.00002025 19.46 +/- 2.64 0.00007864 +/- 0.00000710

DATA: test status p-value test pop ref A ref B 2-ref z-score 1-ref z-score A 1-ref z-score B max decay diff % 2-ref decay 2-ref amp_exp 1-ref decay A 1-ref amp_exp A 1-ref decay B 1-ref amp_exp B

suggests a very recent admixture on both the European and East Asian side. It seems fairly clear that whatever admixture was taking place in Central Asia, perhaps for thousands of years, the present-day Uygur were formed, at least in part, by a fairly recent, perhaps post-Mongol admixture event.
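As a sanity check on the "Test SUCCEEDS" lines above, the reported p-values look consistent with a two-sided tail of the standard normal distribution evaluated at the reported z-score. The sketch below only checks that arithmetic; it is an assumption about how the numbers relate, and ALDER may apply further adjustments (for example when many reference pairs are tested) that are not modelled here.

```python
import math


def two_sided_p(z):
    """Two-sided standard normal tail probability for a z-score."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# z-scores reported above for Chuvashs_16 and Uygur with {French_Basque, Dai}.
for label, z in [("Chuvashs_16", 4.85), ("Uygur", 6.85)]:
    print(f"{label}: z={z}, p={two_sided_p(z):.1e}")
```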

November 02, 2012

ALDER estimates of East Eurasian admixture in Europe

I used the 1-reference method of ALDER to infer lower bounds of East Eurasian admixture in a few European populations. This method does not include a statistical test of admixture (as does the 2-reference one or the f3 test), but we can probably reasonably suppose that some such admixture did take place on the combined evidence of the f3 test and ADMIXTURE evidence.

In any case, I took the East Asian populations of Loh et al. (2012) which had no evidence of admixture with either ALDER or the f3 test, and also a few populations from Rasmussen et al. (2010) that included representatives of Siberian Uralic speakers, as well as the three main branches of narrow-sense Altaic (Turkic, Mongolian, Tungusic), and estimated lower bounds of admixture for a set of European populations. Results can be seen below:


The evidence for admixture appears most convincing in the 1000 Genomes Finns and HGDP Russians where the +/- interval does not intersect or approach zero irrespective of the Asian population chosen. For these populations, the percentages vary from ~4-5% for the "pure" East Eurasians to ~10% for some Siberian groups such as Selkup and Altai. These latter carry some West Eurasian admixture, so it makes sense that a greater deal of admixture with them is necessary to account for the observed "East Eurasian" influence. And, indeed, it is probably via such "intermediate" Siberian populations that some East Eurasian ancestry flowed into Europe, rather than via the relatively untouched populations of the Far East.

PS: Note that this probably represents the most recent signal of admixture, and not the older and more general "North Eurasian"/Amerindian-like admixture that, as Loh et al. mention in their paper, cannot be captured with ALDER.

ALDER paper and software (Loh et al. 2012)

A new paper has appeared on the arXiv that introduces ALDER, a method for testing for admixture and inferring its parameters (when it happened and the proportions of the two mixing populations). You can get the software from here.
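The dating part of the method rests on the fact that admixture-induced LD decays exponentially with genetic distance: if d is measured in Morgans, the decay constant is the number of generations since admixture. The sketch below illustrates only that idea on a synthetic, noise-free curve, using a simple log-linear least-squares fit. It is not ALDER's actual fitting procedure (which also handles an affine term and noise), and the amplitude and date used are arbitrary values chosen for illustration.

```python
import math

def fit_decay(distances_morgans, weighted_ld):
    """Least-squares line through (d, ln a); returns (amplitude M, decay n)."""
    xs = list(distances_morgans)
    ys = [math.log(a) for a in weighted_ld]
    k = len(xs)
    mean_x = sum(xs) / k
    mean_y = sum(ys) / k
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope

# Synthetic curve a(d) = M * exp(-n * d); M and n are made-up values.
true_m, true_n = 4e-4, 6.0
d = [0.005 * i for i in range(1, 41)]  # 0.5 cM to 20 cM, expressed in Morgans
a = [true_m * math.exp(-true_n * x) for x in d]

est_m, est_n = fit_decay(d, a)
print(f"amplitude ~ {est_m:.2e}, decay ~ {est_n:.2f} generations")
```

On real data the curve is noisy and may not start at zero distance, so a plain log-linear fit like this one would be too fragile; it is only meant to show how a decay rate turns into a date.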

I have already tried it and I can confirm two claims in the paper: (i) it's extremely fast, and (ii) it is conservative, in the sense that its test fails even in cases where an f3 test indicates admixture. In one case where it did detect admixture, ASW as CEU+YRI, the output plot showed a very clear pattern of exponential decay. I also tried a different experiment using Mozabites as the admixed population. The results are quite interesting:

Test SUCCEEDS (z=10.39, p=2.7e-25) for Mozabite with {CEU30, YRI30} weights

DATA: success (warning: decay rates inconsistent) 2.7e-25 Mozabite CEU30 YRI30 10.39 6.75 11.39 55%  17.45 +/- 1.68 0.00037417 +/- 0.00003187 28.63 +/- 3.84 0.00005311 +/- 0.00000787 16.21 +/- 1.42 0.00023789 +/- 0.00001752

DATA: test status p-value test pop ref A ref B 2-ref z-score 1-ref z-score A 1-ref z-score B max decay diff % 2-ref decay 2-ref amp_exp 1-ref decay A 1-ref amp_exp A 1-ref decay B 1-ref amp_exp B

Notice that the 1-reference decay using CEU is 28.63 and with YRI it is 16.21, while the 2-reference (both CEU and YRI) is an intermediate 17.45. I believe that this is capturing the same behavior as Jin et al. (2012), according to which:
There was an almost complete absence of recent gene flow from European populations to the Mozabite gene pool (Figure 6A). For the Sub-Saharan African ancestral component, there were more long CSDAs at the tail of empirical distribution than those in the HI model, which confirmed that recent gene flow from African populations had contributed to the Mozabite gene pool (Figure 6B). 
This is also what ALDER is telling us, since the decay using CEU is more "abrupt" (hence lack of long segments of admixture that might indicate recent admixture), while that using YRI is less so (and hence recent Sub-Saharan admixture has contributed longer segments).
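To make the "decay rates inconsistent" warning concrete, here is a small sketch that compares the three decay estimates from the DATA line pairwise. The relative-difference formula (absolute difference divided by the mean of the two values) is my assumption; I have not checked it against the ALDER source, but it does reproduce the 55% reported in the "max decay diff %" column.

```python
from itertools import combinations

# Decay estimates (generations) copied from the Mozabite DATA line above.
decays = {
    "2-ref (CEU30+YRI30)": 17.45,
    "1-ref A (CEU30)": 28.63,
    "1-ref B (YRI30)": 16.21,
}

def relative_diff(x, y):
    """Absolute difference divided by the mean of the two values (assumed formula)."""
    return abs(x - y) / ((x + y) / 2)

diffs = {
    (a, b): relative_diff(decays[a], decays[b])
    for a, b in combinations(decays, 2)
}
for (a, b), value in diffs.items():
    print(f"{a} vs {b}: {value:.0%}")

max_diff = max(diffs.values())
print(f"max decay diff: {max_diff:.0%}")
```

The largest gap is between the two 1-reference curves, which is the CEU vs. YRI contrast discussed above; the 2-reference estimate sits close to the YRI one.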

In any case, enough with my own preliminary experiments. From the paper itself, there are interesting applications of the new methodology for Sardinians, Japanese, and Central African Pygmies:
Both Central African Pygmy populations in the HGDP, the Mbuti and Biaka, show evidence of admixture (Table 1), about 28 +/- 4 generations (800 years) ago for Mbuti and 38 +/- 4 generations (1100 years) ago for Biaka, estimated using San and Yoruba as reference populations (Figure 2A,C). The intra-population heterogeneity is low, as demonstrated by the negligible affine terms. In each case, we also generated weighted LD curves with the Pygmy population itself as one reference and a variety of second references. We found that using populations French, Han, or Yoruba as the second reference gave very similar amplitudes, but the amplitude was significantly smaller with the other Pygmy population or San as the second reference (Figure 2B,D). Using the amplitudes with Yoruba, we estimated mixture fractions of at least 15.9 +/- 0.9% and 28.8 +/- 1.4% Yoruba-related ancestry for Mbuti and Biaka, respectively. 
For Sardinians:
We detect a very small proportion of Sub-Saharan African ancestry in Sardinians, which our ALDER tests identified as admixed (Table 1; Figure 3A). To investigate further, we computed weighted LD curves with Sardinian as a test population and all pairs of the HapMap CEU, YRI and CHB populations as references (Table 2). We observed an abnormally large amount of shared long-range LD in chromosome 8, likely due to an extended inversion segregating in Europeans (PRICE et al. 2008), so we omitted it from these analyses. The CEU–YRI curve has the largest amplitude, suggesting both that the LD present is due to admixture and that the small non-European ancestry component, for which we estimated a lower bound of 0.6+/-0.2%, is from Africa. The existence of a weighted LD decay curve with CHB and YRI as references provides further evidence that the LD is not simply due to a population bottleneck or other non-admixture sources, as does the fact that our estimated dates from all three reference pairs are roughly consistent at about 40 generations (1200 years). Our findings thus confirm the signal of African ancestry in Sardinians reported in MOORJANI et al. (2011). The date, small mixture proportion, and geography are consistent with a small influx of migrants from North Africa, who themselves traced only a fraction of their ancestry ultimately to Sub-Saharan Africa, consistent with the findings of DUPANLOUP et al. (2004).
Moorjani et al. (2011) had estimated 2.9% admixture in Sardinians occurring at 71 +/- 28 generations, so the new results appear to be different, perhaps on account of the treatment of the chromosome 8 inversion or the ability of ALDER to pick the distance threshold (hard-set at 0.5cM in rolloff) adaptively. Also, note that ALDER is able to estimate admixture proportions based on the amplitude of the weighted LD, whereas in the previous test the proportions were calculated using an F4 ratio test which did not take into account East Eurasian-like gene flow into the CEU population, and considered both CEU and Sardinians as having experienced no Asian-related gene flow.

So it appears that the African admixture in Sardinians is real, but may be both lower and later than previously estimated. In a recent experiment, I "scrubbed" possible segments of African ancestry in Sardinians, and this diminished their African ancestry from 3.1% to 1.8%. If we consider the 1.8% to be the spurious admixture due to Asian-related gene flow into northern Europe, then African admixture in Sardinians will be the remaining 1.3%, and perhaps lower due to the very "intensive" nature of the scrubbing procedure.

globe4 estimates African admixture in Sardinians as 0.8%, with some heterogeneity in its apportionment in 28 different individuals, with three individuals appearing as outliers and the remainder randomly distributed around the 0.8% median. The outlier individuals are HGDP01062, HGDP01076, and HGDP01071; the last of these is not included in the curated version of HGDP released by Patterson et al. (2012). ALDER includes a facility for detecting heterogeneity in admixture, but I did not see this particularly discussed in my first scan of the paper. In any case, it now appears that different methods converge on a small African admixture in Sardinians, and the 1200-year-old age estimate seems consistent with medieval history.


The paper also deals with the Japanese: 
Genetic studies have suggested that present-day Japanese are descended from admixture between two waves of settlers, responsible for the Jomon and Yayoi cultures (HAMMER and HORAI 1995; HAMMER et al. 2006; RASTEIRO and CHIKHI 2009). We also observed evidence of admixture in Japanese (Table 1), and while our ability to learn about the history is limited by the absence of a close surrogate for the original Paleolithic mixing population, we were able to take advantage of the one-reference inference capabilities of ALDER. We observed a clear weighted LD curve using HapMap JPT as the test population and JPT–CHB weights (Figure 3B). This curve yields an estimate of 45 +/- 6 generations, or about 1,300 years, as the age of admixture. To our knowledge, this is the first time genome-wide data have been used to date admixture in Japanese. As with previous estimates based on coalescence of Y-chromosome haplotypes (HAMMER et al. 2006), our date is consistent with the archaeologically attested arrival of the Yayoi in Japan roughly 2300 years ago (we suspect that our estimate is from later than the initial arrival because admixture may not have happened immediately). Based on the amplitude of the curve, we also obtain a (likely very conservative) genome-wide lower bound of 41 +/- 3% “Yayoi” ancestry using formula (12) (under the reasonable assumption that Han Chinese are fairly similar to the Yayoi population). It is important to note that observation of a single-reference weighted LD curve is not sufficient evidence to prove that a population is admixed, but we did find a pair of references with which the ALDER test identified Japanese as admixed, which, combined with previous work and the lack of any signal of reduced population size, makes us confident that our inferences are based on true historical admixture.
This is a useful application of the idea that you don't need both reference populations to estimate admixture. If a population A experiences gene flow from another B, then A will become more like B over time, and allele frequency differences between A and B will diminish but will continue to reflect differences between the local and introgressing element. This idea was first used by Pickrell et al. (2012), and a new variation of it is used in the current paper.

According to Wikipedia, Japanese skeletons of the Kofun period resemble those of modern Japanese, so perhaps the age estimate is a little younger than the actual period of admixture. In any case, admixture between populations carrying varying amounts of Yayoi/Jomon ancestry was perhaps not instantaneous, so ALDER may not be picking up the beginning of a continuous process that lasted for several centuries.
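For reference, the generation counts quoted from the paper can be converted to years with a few lines of code. The 29 years per generation used here is an assumption on my part: it is not stated in the quoted passages, but it reproduces the rounded figures given there (800, 1100, 1200 and 1300 years).

```python
YEARS_PER_GENERATION = 29  # assumption: inferred from the quoted conversions

def generations_to_years(generations, std_err=None):
    """Scale a date (and optional standard error) from generations to years."""
    years = generations * YEARS_PER_GENERATION
    err = None if std_err is None else std_err * YEARS_PER_GENERATION
    return years, err

# (generations, +/- error) as quoted from Loh et al. (2012) above
quoted = {
    "Mbuti": (28, 4),
    "Biaka": (38, 4),
    "Sardinians": (40, None),  # "about 40 generations", no error quoted
    "Japanese": (45, 6),
}

for population, (gens, err) in quoted.items():
    years, years_err = generations_to_years(gens, err)
    suffix = "" if years_err is None else f" +/- {years_err}"
    print(f"{population}: {years}{suffix} years (~{round(years, -2)})")

yayoi_arrival = 2300  # years ago, from the quoted passage
japanese_years, _ = generations_to_years(45, 6)
print(f"gap to Yayoi arrival: ~{yayoi_arrival - japanese_years} years")
```

Under this assumption the Japanese estimate falls roughly a thousand years after the archaeologically attested Yayoi arrival, which is the gap discussed above.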

Finally, there is a reference to another paper currently in submission: "MOORJANI, P., N. PATTERSON, P. LOH, M. LIPSON, and OTHERS, 2012 Reconstructing Roma history from genome-wide data. In submission." Given that the Roma likely possess really old West Eurasian admixture related to "Ancestral North Indians", as well as really recent European admixture after they migrated to Europe, and perhaps even intermediate West/Central Asian admixture as they made their way from India to the west, this seems like a very complicated case, involving admixture at different time scales, and between different but related populations, so it will be interesting to see how it will all fit together.

To conclude, ALDER seems like a very practical tool for studying admixture in human populations, so I'm sure it will prove quite useful in the future.

arXiv:1211.0251 [q-bio.PE]

Inference of Admixture Parameters in Human Populations Using Weighted Linkage Disequilibrium

Po-Ru Loh, Mark Lipson, Nick Patterson, Priya Moorjani, Joseph K. Pickrell, David Reich, Bonnie Berger

Abstract

Long-range migrations and the resulting admixture between populations have been an important force shaping human genetic diversity. Most existing methods for detecting and reconstructing historical admixture events are based on allele frequency divergences or patterns of ancestry segments in chromosomes of admixed individuals. An emerging new approach harnesses the exponential decay of admixture-induced linkage disequilibrium (LD) as a function of genetic distance. Here, we comprehensively develop LD-based inference into a versatile tool for investigating admixture. We present a new weighted LD statistic that can be used to infer mixture proportions as well as dates with fewer constraints on reference populations than previous methods. We define an LD-based three-population test for admixture and identify scenarios in which it can detect admixture that previous formal tests cannot. We further show that we can discover phylogenetic relationships between populations by comparing weighted LD curves obtained using a suite of references. Finally, we describe several improvements to the computation and fitting of weighted LD curves that greatly increase the robustness and speed of the computation. We implement all of these advances in a software package, ALDER, which we validate in simulations and apply to test for admixture among all populations from the Human Genome Diversity Project (HGDP), highlighting insights into the admixture history of Central African Pygmies, Sardinians, and Japanese.
