November 22, 2012

ALDER signal of admixture in Ashkenazi Jews

(You can skip the first part if you want, and head straight to the RESULTS section)

Previous studies on uniparental markers have indicated that Ashkenazi Jews (AJ) were formed by admixture between a Near Eastern population and European host populations; the evidence for the former element seems pretty clear on the basis of Y-chromosomes where Jews possess a relatively high frequency of Y-haplogroup J1 (and a few others) that are quite rare in non-Jewish north/east Europeans. As for the latter, it seems probable on the basis of the location of Ashkenazi Jews on PCA plots where they tend to occupy an intermediate position between extant populations of the Levant (including Near Eastern Jews) and non-Jewish Europeans.

Anyone who has played around with genetic data will know that while AJ may be positioned in the aforementioned "intermediate" location within the "West Eurasian continuum" between Europe and Near East, they tend to form their own cluster at higher dimensions. And, indeed, this is why it's fairly easy for a clustering algorithm, such as my "Clusters Galore" (MCLUST/MDS) approach to pick out a very specific AJ cluster (e.g., here, or here, using a fastIBD approach). An Ashkenazi Jewish-specific cluster also pops out at higher K in ADMIXTURE analyses. This cluster may reflect endogamy within the AJ community until quite recent times.

One way of detecting admixture in a group is through the use of f3-statistics. The statistic f3(AJ; European, Near_East) could be negative --which would indicate admixture-- but it is usually not --at least in the combinations of (European, Near_East) I've tried-- and a non-negative f3 is consistent with either the presence or the absence of admixture.

Why post-admixture drift might mask the presence of admixture can be seen by means of a simple calculation. Remember that the f3-statistic's +/- sign depends on the +/- sign of the quantities (c-a)*(c-b), where c is an allele frequency in the admixed (?) population we are investigating, and a, b are the corresponding frequencies in the two reference populations. We can pick a to be less than b with no loss of generality.

In the absence of strong drift (e.g., if all populations have a very large number of individuals), then the allele frequency c=xa+(1-x)b where x is the amount of admixture --between 0 and 1-- from group A and (1-x) from group B, and this c will be maintained little changed in the post-admixture phase. With the aid of a little algebra, we get that:

(c-a)*(c-b) = (xa+(1-x)b-a)*(xa+(1-x)b-b)
= (xa+b-xb-a)*(xa+b-xb-b)
= ((x-1)(a-b))*(x(a-b))
= x(x-1)(a-b)^2

and this is of course negative because x lies strictly between 0 and 1 (so x is positive and x-1 is negative), provided that a differs from b.

In a large population, this c will remain near-constant, because of the lack of strong drift. As long as it remains within the interval (a,b), then (c-a)*(c-b) will also remain negative, and so will the f3 statistic.
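The algebra above can be checked numerically. The short Python sketch below is not part of the original analysis, and its function names are made up for illustration. It uses exact fractions to confirm that (c-a)*(c-b) equals x(x-1)(a-b)^2 over a grid of values, and that the product is negative whenever x is strictly between 0 and 1 and a differs from b.

```python
from fractions import Fraction
from itertools import product


def product_term(a, b, x):
    """(c-a)*(c-b) for the no-drift admixed frequency c = x*a + (1-x)*b."""
    c = x * a + (1 - x) * b
    return (c - a) * (c - b)


def closed_form(a, b, x):
    """The simplified expression x(x-1)(a-b)^2."""
    return x * (x - 1) * (a - b) ** 2


# Grid of allele frequencies and admixture fractions, all strictly inside (0, 1).
freqs = [Fraction(k, 10) for k in range(1, 10)]
fractions_x = [Fraction(k, 10) for k in range(1, 10)]

identity_holds = all(
    product_term(a, b, x) == closed_form(a, b, x)
    for a, b, x in product(freqs, freqs, fractions_x)
)
always_negative = all(
    product_term(a, b, x) < 0
    for a, b, x in product(freqs, freqs, fractions_x)
    if a != b
)
print(identity_holds, always_negative)
```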

But, what if strong drift affects the admixed population? Allele frequencies fluctuate more wildly in smaller populations, so c might go outside the (a,b) interval. Without loss of generality, assume that c becomes greater than b, in which case (c-a)*(c-b) will become positive.

The f3-statistic averages over many SNPs, so, depending on (i) the initial differentiation of the mixing populations, which could be seen as b-a, and (ii) the amount of drift, which causes c to jump outside the (a, b) interval as discussed above, the evidence for admixture may disappear.
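The masking effect can be illustrated with a toy simulation. The Python sketch below is only an illustration under stated assumptions: it averages the plain product (c-a)*(c-b) over SNPs rather than using the bias-corrected f3 estimator, it models drift as simple Wright-Fisher resampling, and the uniform source frequencies, x = 0.5, 2N = 40 allele copies and 40 generations are arbitrary choices, not values from the analysis in this post.

```python
import random


def drift(freq, two_n, generations, rng):
    """Wright-Fisher drift: resample 2N allele copies in each generation."""
    for _ in range(generations):
        copies = sum(1 for _ in range(two_n) if rng.random() < freq)
        freq = copies / two_n
    return freq


def mean_product(x, two_n, generations, n_snps=500, seed=1):
    """Average of (c-a)*(c-b) over simulated SNPs (simplified f3-like quantity)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_snps):
        a = rng.uniform(0.05, 0.95)   # frequency in source population A (assumed)
        b = rng.uniform(0.05, 0.95)   # frequency in source population B (assumed)
        c = x * a + (1 - x) * b       # frequency right after admixture
        c = drift(c, two_n, generations, rng)
        total += (c - a) * (c - b)
    return total / n_snps


no_drift = mean_product(x=0.5, two_n=40, generations=0)
strong_drift = mean_product(x=0.5, two_n=40, generations=40)
print("no drift:    ", no_drift)
print("strong drift:", strong_drift)
```

With no drift every SNP contributes a non-positive term, so the average is negative; with a small population drifting for many generations the added variance in c pushes the average above zero, even though the population is admixed by construction.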

So, a signal of admixture that relies on allele frequency differences may be obliterated by drift. But, there is a different signal of admixture that uses the decay of admixture linkage-disequilibrium, most recently discussed in the ALDER paper. The admixture LD signal's evidence may also disappear in time, but only because the signal occurs at increasingly lower genetic distances over time due to recombination. Thankfully, for admixture within the last few thousand years, it tends to occur at distances large enough for the SNP density of existing genotyping platforms, which measure a few hundred thousand SNPs per individual, to be sufficient.
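To get a feel for the distances involved, here is a rough Python sketch of the textbook expectation for a single pulse of admixture: LD between two markers separated by d Morgans is expected to decay by a factor of exp(-n*d) after n generations. This is not ALDER's code, the function name is invented for illustration, and the 37 generations used in the loop is just an example value.

```python
import math


def remaining_ld_fraction(generations, distance_cm):
    """Expected fraction of the initial admixture LD left between two markers,
    assuming a single admixture pulse and exp(-n*d) decay with d in Morgans."""
    distance_morgans = distance_cm / 100.0
    return math.exp(-generations * distance_morgans)


# How much of the signal is left at various marker spacings after 37 generations?
for distance_cm in (0.5, 1.0, 2.0, 5.0, 10.0):
    fraction = remaining_ld_fraction(37, distance_cm)
    print(f"{distance_cm:5.1f} cM: {fraction:.3f}")
```

Under this model the signal falls to 1/e of its starting value at 100/n centimorgans, so older admixture concentrates the remaining signal at ever shorter distances.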


Naturally I was curious to see whether the admixture LD mechanism would produce the evidence of admixture that the f3-statistics did not. I combined three datasets in my possession (HGDP by Li et al., Behar et al., and Yunusbayev et al.) and identified sets of European and Semitic populations. (Remember that these sets are non-exhaustive, but presumably usable surrogates for the true mixing populations exist within them):

Abhkasians_Y, Adygei, Belorussian, Bulgarians_Y, Chechens_Y, Chuvashs, French, French_Basque, Georgians, Hungarians, Lezgins, Lithuanians, Mordovians_Y, North_Italian, North_Ossetians_Y, Orcadian, Romanians, Russian, Sardinian, Spaniards, Tuscan, Ukranians_Y


Bedouin, Druze, Egyptans, Ethiopian_Jews, Ethiopians, Iraq_Jews, Jordanians, Lebanese, Morocco_Jews, Palestinian, Saudis, Sephardic_Jews, Syrians, Yemenese, Yemen_Jews

I used my Dodecad Project sample of AJ which numbers 36 individuals and is larger than any other usable public sample available to me.

(ALDER was run with default parameters, using the Rutgers recombination map for Illumina chips, and with the merged dataset prepared with a --geno 0.03 flag. Note that the Ashkenazi_D sample consists of individuals typed on different Illumina platforms from 23andMe and FamilyTreeDNA. The total number of SNPs considered was 527,165.)


I report below the tests for which ALDER reported "success" with no warnings:

The median of all these estimates is 36.78 generations, or about 1070 years, which corresponds to a calendar date of 910 CE, assuming a birth year of 1980 for the sample and a generation length of 29 years.
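The conversion from generations to a calendar date is simple arithmetic, spelled out in the Python sketch below. The variable names are illustrative; the numbers are the ones quoted above, and rounding the interval to the nearest ten years is an assumption about how the 1070 figure was obtained.

```python
median_generations = 36.78       # median ALDER date estimate, in generations
generation_length_years = 29     # assumed generation length
sample_birth_year = 1980         # assumed birth year of the sampled individuals

# Time since admixture in years, before and after rounding to the nearest decade.
years_before_birth = median_generations * generation_length_years
rounded_years = round(years_before_birth / 10) * 10

admixture_year_ce = sample_birth_year - rounded_years
print(f"{years_before_birth:.2f} years, rounded to {rounded_years}")
print(f"estimated admixture date: {admixture_year_ce} CE")
```

Without the rounding step the same inputs give roughly 913 CE, so the quoted date should be read as approximate.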

Palamara et al. placed the beginning of demographic expansion of AJ in a similar timeframe (33 generations), following a severe founder effect reducing the population to ~270 individuals. Such a founder effect may have indeed served to produce positive f3-statistics, masking the presence of admixture, the occurrence of which appears to be substantiated on the basis of the ALDER test of admixture.

As for the levels of admixture, using a 1-ref analysis with the European populations, I get the following lower bounds:

I'd be interested in hearing people's opinions on the plausibility of these dates/proportions, as well as their potential historical associations; a lot of factors might affect these results, so perhaps this analysis could be improved in the future.


Anonymous said...

To put this in perspective, similar runs using Sephardic Jewish and Moroccan Jewish as test populations would be good.

Then ditto using Middle Eastern Jewish as the test population.

Questions that arise:

1) Is this pattern of mixture specific to only some Jewish populations?

2) Can the North European part of this mixture be isolated and characterized (a la ANI-ASI)? Is it garden variety European mixture (Slavic or German or etc.), or something else?

3) Did the bottleneck detected in Ashkenazi Jewish populations affect one or both sources of this mixture (i.e., the Middle Eastern part or the North European part)?

4) Did the source populations mix and then experience a bottleneck, or did one or both source populations experience bottlenecks and then later mix?

Charles Nydorf said...

Thanks for the clear explanation of the positive f3!

Anonymous said...

Sorry to double comment. A couple more thoughts:

1. The reference pops that get big 1-ref %'s (French, Romanians, and Tuscans) get younger 2-ref date estimates. Compared to Lithuanians e.g. Not sure what to make of that.

2. Spanish really should be in there as a reference population.

3. To get a handle on these outputs, you should also run similar analysis using Ashkenazi Jewish samples as a reference pop. That goes for Europeans and Arabs (are they partly descended from Jewish populations), etc.

4. The Levite R1a seems to be non-Slavic and instead have an Asian origin. So far no one seems to be commenting on this.

5. I will throw something out there. What if the North European part of Ashkenazi Jewish populations came from absorbing from an exotic bottlenecked population, maybe say a remnant of "Northern Caucasoids" (as opposed to extant Northern Europeans) from outside of Europe.

This might only be a small % (say, 10%) of Ashkenazi Jewish genetics, but might account for the extreme bottleneck effect. It might also be a source of the Levite R1a.

Dienekes said...

2. Spanish really should be in there as a reference population.

They are, Spaniards from Behar et al. (2010), but they don't produce that signal of admixture.

a said...

"the evidence for the former element seems pretty clear on the basis of Y-chromosomes where Jews possess a relatively high frequency of Y-haplogroup J1 (and a few others) that are quite rare in non-Jewish north/east Europeans."

Remember your Dendrogram?

West Asian is closer to North European. Red Sea is on a separate branch. This also shows in ydna clusters of West Asian Ydna lines, Q/R/G, and common Red Sea ydna clusters J&E.

Red Sea Jewish ancestral ydna lines, more likely to be found around "Saudis" region;

"Approximately 30% to 40% of Jewish men are in the paternal line known as haplogroup J[Note 1] and its sub-haplogroups. This Haplogroup is particularly present in the Middle East, Southern Europe, and Northern Africa.[14] Furthermore, 15 to 30% are in haplogroup E1b1b[Note 2] (or E-M35) and its sub-haplogroups."

West Asian ydna lines more likely to be found around "Mordovians" region;

Q-245, [10-20] R1b-R1a-R2 lines, including v-88, z2105- L277 L584 branches, R1a-93, and approximately 10 ydna G lines, some uncommon.

aspromavro said...

It would be interesting to do this analysis for the Balkans, starting from the premise that the Iron Age population was Sardinian-like, and then calculating the additional admixture from West Asia, North Europe, etc.

andrew said...

Historically, this coincides with one of the first major pogroms in Europe which led to a major population bottleneck, more or less at the same time as the earlier Crusades. Jews were expelled from Britain around this time (and remained absent for centuries thereafter) and were massacred in many other places. Despite arguably being the most severe of the Rabbinic era Judaism massacres other than the Holocaust, it is not very well known comparatively, perhaps because it was too late for inclusion in the Hebrew Bible and pre-dated the invention of the printing press.