September 07, 2008

How bottlenecks affect Y-STR variance (not much)

Continuing my investigation of how Y-STR variance changes over time (most recent entry), I wanted to see how a bottleneck affects Y-STR variance. An ice age, a plague, a major war or military defeat are all possible causes of a massive reduction in population size, with an associated loss of STR variance.

To study this, I modified my code so that, rather than having a single growth constant m, a run may consist of multiple segments, each governed by its own constant. In particular, I simulated a period of g=99 generations with m=1 or m=1.05, and then, in the last (100th) generation, I set m to 0.5, 0.1, or 0.01, representing a bottleneck equivalent to the loss of half, nine tenths, or ninety-nine hundredths of the haplogroup population; I will call these "mild", "severe", and "extreme" bottlenecks.
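For readers who want to experiment, the setup described above can be sketched roughly as follows. This is not the code used for the results in this post. It is a minimal sketch that assumes each man has a Poisson-distributed number of sons with mean m, that the marker mutates by a single step (+1 or -1) with probability mu per generation, and that mu=0.0025; the function names (`schedule`, `simulate`, `mean_variance`) are made up for illustration.

```python
import math
import random


def poisson(rng, lam):
    """Draw a Poisson(lam) variate with Knuth's multiplication method."""
    limit = math.exp(-lam)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def schedule(m, g, bottleneck_m=None):
    """Growth constants per generation: g generations at m, then one final
    generation at bottleneck_m (or at m again if no bottleneck is given)."""
    return [m] * g + [m if bottleneck_m is None else bottleneck_m]


def simulate(growth, mu, rng):
    """Grow a single founder's lineage through the given growth schedule.
    Returns the allele offsets of the final generation, or None if the
    lineage becomes extinct along the way."""
    alleles = [0]  # repeat count relative to the founder's allele
    for m in growth:
        children = []
        for a in alleles:
            for _ in range(poisson(rng, m)):  # assumed Poisson number of sons
                if rng.random() < mu:  # assumed symmetric single-step mutation
                    children.append(a + rng.choice((-1, 1)))
                else:
                    children.append(a)
        if not children:
            return None
        alleles = children
    return alleles


def variance(alleles):
    """Population variance of the allele values."""
    mean = sum(alleles) / len(alleles)
    return sum((a - mean) ** 2 for a in alleles) / len(alleles)


def mean_variance(growth, mu, runs, rng):
    """Average variance over the runs in which the lineage survived."""
    values = []
    for _ in range(runs):
        alleles = simulate(growth, mu, rng)
        if alleles is not None:
            values.append(variance(alleles))
    return sum(values) / len(values) if values else float("nan")


if __name__ == "__main__":
    rng = random.Random(2008)
    mu = 0.0025  # assumed mutation rate
    runs = 100   # far fewer than 10,000, so the output is noisy
    base = mean_variance(schedule(1.05, 99), mu, runs, rng)
    for b in (0.5, 0.1, 0.01):
        v = mean_variance(schedule(1.05, 99, b), mu, runs, rng)
        print(f"last-generation m={b}: {100 * (v / base - 1):+.1f}% vs no bottleneck")
```

With so few runs the percentages will jump around from seed to seed; the point of the sketch is only to show how a per-generation growth schedule replaces the single constant m.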

In the following table I list the observed change in Y-STR variance compared to the case where there is no bottleneck (negative values are reductions). As usual, the results are averaged over 10,000 runs.



Change in variance (%) relative to no bottleneck

m      Mild (0.5)   Severe (0.1)   Extreme (0.01)
1        -1.9         -2.03          -15.86
1.05      1.6          0.55           -4.98

As expected, variance decreases more in an extreme bottleneck, but not as sharply as one might expect. Indeed, if the long-term trend is positive (m=1.05), the expected variance of surviving groups may even increase after a bottleneck. What gives?

During a bottleneck, two things happen:
  • Y-STR variance within individual lineages decreases as, e.g., rare alleles are lost; however:
  • Low-frequency lineages (which usually have lower Y-STR variance) are more likely to be lost, i.e., to become extinct during the bottleneck.
Thus, while all lineages suffer a loss of Y-STR variance during a bottleneck, the loss of low-frequency lineages means that the lineages that survive a bottleneck may, on average, even have higher Y-STR variance than those that existed before it.

Predictably, larger groups (created at m=1.05) experience less severe effects during a bottleneck. Note also that in this simulation haplogroups did not even grow to very large sizes (only ~1,600 men for m=1.05 and no bottleneck); thus, real-world haplogroups will probably be even less susceptible to bottlenecks.

Conclusion

Bottlenecks don't seem to reduce Y-STR variance dramatically, especially for large haplogroups. So, while they are a potential mechanism for reducing the effective mutation rate, by periodically removing variance, their efficacy is limited.

Indeed, bottlenecks reduce not only Y-STR diversity but also haplogroup abundance. Thus, in order to achieve the same present-day haplogroup abundance, growth in the post-bottleneck eras must proceed even faster than if no bottleneck had occurred, which implies a higher effective rate during those rebound periods.
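To put a number on "even faster", here is a back-of-the-envelope sketch. It assumes purely deterministic (expected) growth and a hypothetical rebound period of k generations, neither of which comes from the simulation itself: if a bottleneck leaves a fraction s of the population, the rebound constant must satisfy s * m_rebound^k = m^k.

```python
def rebound_growth(m, survival, k):
    """Growth constant needed for k generations after a bottleneck that
    leaves a fraction `survival` of the population, so that the expected
    size equals k generations of growth at m with no bottleneck.
    Solves survival * m_rebound**k == m**k for m_rebound."""
    return m * survival ** (-1.0 / k)


if __name__ == "__main__":
    # Hypothetical example: "severe" bottleneck, 50 generations to recover.
    print(rebound_growth(1.05, 0.1, 50))
```

The shorter the rebound period k, the larger the required constant, which is the sense in which post-bottleneck growth has to outpace the long-term trend.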

In conclusion, while dramatic bottlenecks at a time when human population sizes were small, e.g., the Ice Ages during the Paleolithic, may have effectively reduced Y-STR variance of surviving lineages, this effect seems to have been unimportant in large human populations emerging from the Neolithic onwards.

14 comments:

McG said...

I agree with your conclusion. The early papers by ZUL supporting their initial premise are not persuasive to me. I also think the 3.6X factor is fictitious. Using different sets of data and different dys loci, I get ratios between Chandler's rates and ZUL's over a range of 2 to 4.

That there is a constraint on the way mutations occur across dys loci is one possible, but almost esoteric explanation. Chandler used mixed haplogroups in his analysis and I do know that in different haplogroups, dys loci have different rates, spec. dys loci 388 in I1a and R1b. Oddly enough though, the average mutation rate for each haplogroup appears to be about the same.

I continue to believe the evolutionary rate is the more correct rate, but I cannot explain why.

Darius said...

Bottlenecks do not occur randomly. Individuals that survive such a big "wipe-out" are most likely relatives with similar genetic features, ergo bottlenecks must affect Y-STR variance.

McG said...

The underlying question is: do bottlenecks affect mutation rates? My answer, similar to Dienekes', is no. When you have a bottleneck, there is a reduction in the population, and that shows up in the TMRCA equation as a different factor. The ASD equation is the sum of the squared differences in allele values at a dys locus divided by the number of entries. I would agree that the variance of the variance may increase with a smaller number of entries, but overall, as Dienekes shows, it is not the root of the discrepancy between evolutionary and germ-line rates. Net sum: I don't believe bottlenecks change mutation rates.
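For reference, the ASD calculation described in this comment can be sketched as below. The choice of reference allele (founder or modal value) and the function names are assumptions for illustration; the age estimate is the ASD/mu quantity that appears later in this thread.

```python
def asd(alleles, reference):
    """Average squared difference of allele values from a reference
    allele (assumed here to be the founder or modal value)."""
    return sum((a - reference) ** 2 for a in alleles) / len(alleles)


def age_estimate(alleles, reference, mu):
    """Age in generations estimated as ASD / mu for a single marker."""
    return asd(alleles, reference) / mu


if __name__ == "__main__":
    sample = [13, 13, 14, 12]  # made-up repeat counts at one marker
    print(asd(sample, 13), age_estimate(sample, 13, 0.0025))
```

Note that the number of entries only enters as the divisor of the average, which is why the sample size affects the noise of the estimate rather than its expected value.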

DNACousins said...

What happens if you introduce the bottleneck earlier in the process, say at the 25th, 50th, and 75th generation?

McG said...

A hiccup, I think? Again, the expression we're looking at is square difference over N. As long as mutation rates don't change, the variance is the same whether you have N = 1000 or 100,000. e.g., if I have a mutation rate of .002 per gen over 37 dys loci. Then for the two populations I should see in one generation 2 mu's and 2000 mu's. The variance is unchanged.

Fundamentally, as Dienekes first put it, the question is: how does variance accumulate? As Dienekes shows in his analysis, m=1 for the Poisson model is probably not a good assumption, because of his Patriarch concept: a patriarch may start things going with many sons. There are several "errors" that I see in current counting techniques: 1. ASD/Variance overcounts multi-step mutations. 2. m=1 as the mean number of sons probably doesn't represent Patriarchs and their lines. 3. The way that variance is assumed to grow is by consecutive single steps down a single line. Especially for slow mutators, I don't think that happens often. The data seems to support that assumption? If it is true that multi-steps occur at the 5% level (assumption), then if a histogram of mutations at a dys locus shows more than 5% of the mutations exceeding one step, consecutive single steps can be assumed to have happened. However, if the histogram only shows that 2 or 3% of the mutations exceed one step, then, I believe, it is more probable that those mutations were multi-step?

I don't have Dienekes' C++ programming skills or tools, so I am thinking this through with a hand calculator.

In sum, for fixed mutation rates the only variable that can affect TMRCA is variance/ASD and its changes. This is where, I believe, the problem lies - very simply, how do you count mutations that appear on a histogram. That is the data we have to work with!!! Outside of that, it is the applicability of the model we create.

Dienekes said...

"Bottlenecks do not occur randomly. Individuals that survive such a big 'wipe-out' are most likely relatives with similar genetic features, ergo bottlenecks must affect Y-STR variance."

This may be the case (or not), but is not really relevant for this simulation.

Dienekes said...

"What happens if you introduce the bottleneck earlier in the process, say at the 25th, 50th, and 75th generation?"

If you introduce it earlier it has a bigger effect than if you introduce it later, but not a really dramatic difference. The reason is that at the earlier date, the haplogroup has a smaller size, and hence is more "vulnerable". 100 survivors out of 1,000 in a "severe" bottleneck will retain most of the accumulated variance, but 5 out of 50 will lose a lot of it.

McG said...

The definition of increasing variance requires values greater than 1 from the modal to produce it. Variance is not required to make a TMRCA estimate, all that is needed to be known is the number of mutations which have occurred. In fact for slow mutators, which are the dominant number, no variance is usually contributed. Look over the historgrams provided by Robert Tarin in his Iberian ad non Iberian data sets at World Families network Variance was selected as a tool, early on before no data existed. It answered some of the mental what ifs pop gens asked at that time about counting mutations.

At this point in time with the abundance of dys loci available, we don't need to use ASD/Variance. We just need to use slow mutators and recognize multi-steps where possible. At the slow mutators, accumulated variance is zero and if you were using the slow dys loci, no hiccup would appear. At different points in time N would be simply different.

So my point is that with all the problems Variance/ASD introducues; why use Variance???

At present, in many data sets, the paucity of data at large number of dys loci is disconcerting, but in time this will be remedied.

McG said...

Please excuse my bad English/spelling.
a.histograms not historgrams
b. and not ad
c. period after Network
d. introduces not introducues.

This is an important subject to me and I hurried.

Dienekes said...

"At this point in time with the abundance of dys loci available, we don't need to use ASD/Variance. We just need to use slow mutators and recognize multi-steps where possible."

Slow mutators have the advantage of no back-mutations within reasonable time frames, so they may be useful if someone uses an infinite alleles model.

They have the disadvantage (which can be validated by anyone using YMGS) of worse estimate variance.

For example

(m=1.02,g=100,N=10000,mu=0.0025)

Age (ASD/mu) = 58 (s.d.=82)

(m=1.02,g=100,N=10000,mu=0.0005)

Age (ASD/mu) = 59 (s.d.=163)

McG said...

"Slow mutators have the advantage of no back-mutations within reasonable time frames, so they may be useful if someone uses an infinite alleles model.

They have the disadvantage (which can be validated by anyone using YMGS) of worse estimate variance."

I do not agree with your first statement. In my observations, back and forward mutations are almost random, with generally about equal numbers. Consider the Tarin Iberian and non-Iberian data at: www.bartonsite.org/observed_R1b_Allele_Frequencies_Tarin. Let's just observe the first 12 dys loci and their +1/-1 mutations. 393: 5.1%+, 2.8%-; 390: 14.7%+, 23%-; 19: 7.8%+, 1.5%-; 391: 4.3%+, 28.3%-; 385a: 8.9%+, 2.6%-; 385b: 17.2%+, 10%-; 426: 1.0%+, 0.8%-; 388: 1.2%+, 0.5%-; 439: 13.2%+, 19.9%-; 389i: 12.8%+, 5.l8%-; 392: 10.9%+, 1.2%-; 389ii: 22%+, 9.9%-.

I consider a slow mutator to be defined as one in which all mutations greater than +/-1 make up less than 5% of the total apparent mutations. Under that criterion, all the above dys loci except 385b are slow. I generally do not use multiple-copy dys loci when I make TMRCA estimates.

I am not trained in C++, so I could not evaluate your simulation. However, the above data, which I find typical, does not support your back-mutation statement.

Dienekes said...

Slow mutators are defined as those whose mutation rate is much lower than average.

In g generations, the probability that a marker undergoes one forward (+1 or -1) and one backward mutation which restores the original value is:

(g choose 2) * mu^2 * (1-mu)^(g-2)

You can run these two in Google to see that back-mutations are less probable if the mutation rate is low.

(100 choose 2)*0.0005^2*(1-0.0005)^98

(100 choose 2)*0.0025^2*(1-0.0025)^98
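The same two expressions can also be evaluated with a few lines of Python, for anyone who prefers that to a search box. The function name is made up for illustration. As written, the expression is the binomial probability of exactly two mutations in g generations; whether the second one actually undoes the first would be a further factor that is not modeled here.

```python
from math import comb


def two_mutation_probability(g, mu):
    """(g choose 2) * mu^2 * (1 - mu)^(g - 2): probability of exactly
    two mutation events at one marker in g generations."""
    return comb(g, 2) * mu ** 2 * (1 - mu) ** (g - 2)


if __name__ == "__main__":
    for mu in (0.0005, 0.0025):
        print(mu, two_mutation_probability(100, mu))
```

A fivefold increase in the mutation rate raises this probability roughly twentyfold at g=100, since the mu^2 term dominates.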

McG said...

What you are calling a "back" mutation is what I call a "hidden" mutation. It is simply not observable.

In my parlance I use a different, possibly incorrect math? I simply consider the mutation rate = P(mutation). If I sum over all dys loci, I get the P(mutation). One minus that value is the P(no mutation), usually of the order of .998 or so. If I look up a tree line, the P(mutation) = the probability found for each dys locus. What's the P(two mutations at the same dys locus down a tree line)? I argue they are independent events and it is the P(mutation)^2. Using ZUL rates I see numbers like 10^-10 to 10^-12.

These are extremely small numbers, almost improbable. I do not believe "hidden mutations" affect the number of mutations counted in any significant way. By the way, it is just this argument back in the 90's that got popgens going down the trail of Variance to avoid these kinds of questions. I really don't believe that, for slow mutators, it has any measurable impact on the simple count of mutations.

McG said...

Additional comment: I should point out that "hidden mutations" cannot be counted by any counting technique, be it "simple" or ASD/Variance.