Dienekes’ Anthropology Blog: Y-STR variance of Busby et al. (2011) dataset

August 27, 2011

Y-STR variance of Busby et al. (2011) dataset

I calculated the Y-STR variance of the Busby et al. (2011) dataset, for both the 10 and 15 Y-STR sets, as well as 4- and 5-most "linear" subsets thereof. Generation length of 31.5 years is used for the calendar year estimates.

My position that Y-STRs are effectively dead for age estimation stands, but I thought it'd be a good exercise to do this, as my personal adieu to more than a decade of Y-STRs: they didn't live up to their promise, but, indirectly, they helped create an entire field of "genetic prehistory" that will live on after their demise.

The greatest contribution of the Busby et al. (2011) paper is that it has cured the naivete of some who bought into the "more STRs = more accuracy" scheme. After this paper all Y-STR based estimates (including my own, above) are suspect.

The non-linearity of the Y-STR mutation model is only one of the problems of Y-STRs. Over the last few years, I've examined many commonly held wrong assumptions about the way Y-STRs have been used:

The "evolutionary" mutation rate and its inflated dates
The lack of appreciation of the true confidence intervals of age estimates (even under a well-behaved, symmetric stepwise mutation model), which are wider than believed by many, once uncertainty about generation length, mutation rates, and the inherent stochasticity of the mutation process is taken into account
A common conflation of haplogroup ages with migration events; a migration event may be actually much older or much younger than the Y-STR variance age, usually the latter, except in rare cases of the colonization of islands or remote regions of the world.
Influence of foreigner contamination or relics in the estimation of population ages.
Impact of population demography on age estimates, even "interclade" ones

From now on I am going on a Y-STR boycott on this blog. Y-STRs still have their obvious uses, for recent genealogy, or forensics. They may also convey some information about human prehistory in the broadest time scales.

But, on the whole, they are worse than useless for the prehistorian: not only do they produce estimates fraught with danger, but also, being the only game in town, are prone to over-interpretation and spurious associations.

Thankfully, it will only be a few years more until we can move past the Y-STR swamp, and into the more promising territory of well-behaved unique event polymorphisms that are currently too costly to type on a large number of samples. Archaeogenetics will also help, although that, too, has its own perils (namely contamination, and the inability to get data from the hot and humid regions of the world).

One way or another, we're bound to know more in the future, and destroying the Y-STR behemoth is the first step toward making some real progress in genetic prehistory.

55 comments:

Anatole Klyosov said...: Excellent, Dienekes. I truly appreciate your boycott. It means that one more person who understands nothing in the area, is out.

Until you and others realize that DNA genealogy takes the chemical kinetics approach, and that it has nothing to do with "population genetics", you will fail to obtain meaningful data.

Here are a few rules of DNA genealogy:

(1) Separate a haplotype dataset into DNA-lineages. Typically, there is a mix of them in almost any dataset. In those cases a "common ancestor" is a phantom.

(2) Employ the mutation rate constant which is calibrated and which is different for ANY haplotype format. There are more than 30 haplotype formats in current use. Hence, there are more than 30 mutation rate constants which should be in use.

(3) Employ well-defined criteria to prove that every separate DNA-lineage in a dataset has one and only one common ancestor. There are several criteria, and two principal ones are to be (a) a separate branch on a haplotype tree, and (b) a fit between a "linear" and "logarithmic" calculation procedures. In other words, time-dependent dynamics of mutations for each lineage should obey the first-order kinetics.

The way you have "calculated" data based on mutations across a dataset is exactly the same as Zhivotovsky did. Threw everything into a blend, got some meaningless cocktail, and voila. The problem with Zhivotovsky's and your "calculations" is not a wrong mutation rate, but lack of separation of DNA-lineages. In those cases all "calculations" are doomed.

You better listen to a professional in chemical and biological kinetics, rather than follow your unqualified and primitive way of considering pretty complex haplotype datasets, which, nevertheless, obey very clear rules of kinetics. If you believe that anthropologists understand physical chemistry in general, and chemical kinetics in particular, you are completely wrong.

Regards,

Anatole Klyosov; Sunday, August 28, 2011 1:46:00 am
Dienekes said...: Anatole, I've simulated your method and it doesn't work; the logarithmic method in particular is crap and has incredibly huge confidence intervals.

Your lineage sorting is also crap, since haplotype clusters are not lineages, ancestral haplotypes can't be inferred (hence the need for "base" haplotypes). You go one step further and infer a number of phantom ancestors, reifying haplotype trees as phylogenies.

http://dienekes.blogspot.com/2008/09/reconstructing-ancestral-allele-value.html

In particular, erroneously reconstructed ancestral haplotypes lead to a systematic underestimation of ages.

Your method overlooks all important sources of uncertainty to come up with artificially low confidence intervals. You believe your own BS and come up with fanciful scenaria of R1b Proto-Turks invading Europe, or R1a's coming from the Altai to Europe and then going back to Siberia all with clockwork precision.

You better listen to a professional in chemical and biological kinetics

Perhaps I will when I get into the biochemical business; for the time being I don't have to listen to you at all.

PS: I'd appreciate it if you stopped spamming my inbox with unsolicited copies of your genetics papers.; Sunday, August 28, 2011 2:21:00 am
Anatole Klyosov said...: Well, no further comments then. They do not fall into a receptive ear.

Regards,

Anatole Klyosov; Sunday, August 28, 2011 3:56:00 am
Anatole Klyosov said...: O.K., since the readers are silent, let me explain why the Fig. 4 from the paper ("Here is a figure from the paper, showing age estimates of sub-haplogroups R-S21 vs. R-S116") is not adequate, mildly speaking.

First, they took totally wrong mutation rates - from Ballantyne et al (2010) [from father-son transmission]. The data are based on just a few mutations, and awfully unreliable. Let me show it.

Everyone who worked with haplotypes and their mutations, knows that DYS393 is a very slow marker, and DYS390 is a fairly fast one. Indeed, Chandler's table, the most reliable one for the first 12 markers, shows the respective mutation rate constants as 0.00076 and 0.00311 (mutations per marker per generation), a 4-time difference.

What do we see in the Ballantyne's data? 0.00211 and 0.00152 (!) DYS393 is FASTER than DYS390 (!!). An utter nonsense. How did it happen? Very simple: Among 1758 father-son pairs Ballantyne et al observed just 3 mutations in DYS393, and 2 mutations in DYS390, and they took it (!) as a solid base for their absurd mutation rate constants.

This is applicable to all their "mutation rates". The reason is that among those almost 2000 father-son pairs, there were 3, 2, 7, 5, 3, 6, 0, 0, 6, 9, 1, 6 mutations in the first 12 markers. It just cannot be used for mutation rate estimates.
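To put a number on how little 2 or 3 observed events constrain a rate, here is a minimal sketch that computes an exact 95% Poisson interval for each count and divides by the 1758 father-son pairs mentioned above. It assumes a simple Poisson model for the count and does not attempt to reproduce the published rate values quoted in this comment; the function names are illustrative only.

```python
import math

def poisson_cdf(k: int, mu: float) -> float:
    """P(X <= k) for X ~ Poisson(mu)."""
    term = math.exp(-mu)
    total = term
    for i in range(1, k + 1):
        term *= mu / i
        total += term
    return total

def solve(g, lo: float, hi: float, iters: int = 100) -> float:
    """Bisection root of g on [lo, hi]; assumes g(lo) and g(hi) differ in sign."""
    g_lo = g(lo)
    for _ in range(iters):
        mid = (lo + hi) / 2
        g_mid = g(mid)
        if (g_mid > 0) == (g_lo > 0):
            lo, g_lo = mid, g_mid
        else:
            hi = mid
    return (lo + hi) / 2

def exact_poisson_ci(k: int, alpha: float = 0.05):
    """Exact two-sided interval for a Poisson mean, given k observed events."""
    bracket = k + 10 * math.sqrt(k + 1) + 10  # generous upper search limit
    if k == 0:
        lower = 0.0
    else:
        # Lower limit: the mean at which P(X >= k) equals alpha / 2.
        lower = solve(lambda mu: (1 - poisson_cdf(k - 1, mu)) - alpha / 2, 0.0, bracket)
    # Upper limit: the mean at which P(X <= k) equals alpha / 2.
    upper = solve(lambda mu: poisson_cdf(k, mu) - alpha / 2, 0.0, bracket)
    return lower, upper

PAIRS = 1758  # father-son pairs, as stated in the comment
for locus, count in (("DYS393", 3), ("DYS390", 2)):
    lo, hi = exact_poisson_ci(count)
    print(f"{locus}: {count} mutations, rate between {lo / PAIRS:.5f} and {hi / PAIRS:.5f}")
```

Under these assumptions the two intervals overlap almost entirely, so counts this small cannot establish which of the two markers is faster.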

Enough? Not quite. Just a minor thing in this context. When one works with "fast" markers, a correction for back mutations is MUCH higher. Otherwise one obtains a systematic deviation in TRCAs from slow to fast markers.

So, the conclusion number one: forget about Fig. 4 and the "principal conclusions" of the Busby et al paper. They are all wrong.

Now, I can present here data on "age estimates of sub-haplogroups R-S21 vs. R-S116", based on a much better approach. This is an important question, because it likely sets a good DNA-related time estimate for Bell Beaker movements from Iberia up North to the continental Europe.

I think, Dienekes, that since you have presented here the negative "side of the coin", you, in fairness, would like to see its positive side. Wouldn't you?

Thank you.

Anatole Klyosov; Monday, August 29, 2011 3:21:00 pm
Tiger Mike said...: Oh, well. I suppose Mark Twain would have the appropriate words.... "Reports of my death have been greatly exaggerated."; Monday, August 29, 2011 6:23:00 pm
Anatole Klyosov said...: O.K., good. Now, let's move to HOW TMRCAs from U106 (S21) and P312 (S116) haplotype datasets should be properly calculated.

For U106 I will use an old (2008) collection of 284 haplotypes, from the U106 FTDNA Project. For those who want to know, I can give a reference to the published haplotype tree of all 284 haplotypes. It is nice and symmetrical, showing that it all derived from one (in terms of DNA genealogy) common ancestor.

The 25 marker base haplotype of the tree is as follows:

13 23 14 11 11 14 12 12 12 13 13 29 -- 17 9 10 11 11 25 15 19 29 15 15 17 17

All 284 25 marker haplotypes contain collectively 1853 mutations from the above base haplotype. It gives 1853/284/0.046 = 142 --> 166 "conditional" generations (25 years each), that is 4150+/-430 years to a common ancestor.

Here 0.046 is the mutation rate constant for 25-marker haplotypes, 142-->166 is the correction for back mutations, as both described in J. Genet.Geneal. (2009).
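The arithmetic of this "linear" step can be checked with a short sketch. It reproduces only the uncorrected figure (142); the back-mutation correction to 166 is taken as given from the comment, and the function name is just illustrative.

```python
def uncorrected_generations(total_mutations: int, n_haplotypes: int,
                            rate_per_haplotype: float) -> float:
    """Average mutations per haplotype divided by the rate constant."""
    return total_mutations / n_haplotypes / rate_per_haplotype

YEARS_PER_GENERATION = 25  # the "conditional" generation used in the comment

raw = uncorrected_generations(1853, 284, 0.046)
corrected = 166  # value quoted in the comment after back-mutation correction
print(round(raw), "uncorrected generations")
print(corrected * YEARS_PER_GENERATION, "years to the common ancestor")
```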

In order to verify the calculation procedure, just for fun, one can look at the Donald Clan dataset for the main, the Red Group R1a1 haplotypes. Rather recently the list contained 60 haplotypes, currently there are 124 haplotypes in the group. They contained 69 and 166 mutations, respectively, in 25 marker haplotypes.

It gives 69/60/0.046 = 25 --> 26 generations, that is 650+/-100 years to the common ancestor for the earlier dataset, and

166/124/0.046 = 29-->30 generations, that is 750+/-95 years to the common ancestor.

For the record, John Lord of the Isles died in 1386.

O.K., back to R1b1a2. U106 subclade has 4150+/-430 years to its common ancestor, calculated from 25 marker haplotypes. Anyone can go to the U106 Project, get 25, 37 and 67 marker haplotypes, and calculate each series the same way. The mutation rate constant for 67 markers is 0.12, for 37 markers it is 0.09. I bet you will get the same figure within the margin of error or much better for each series.

The same procedure for P312 can be applied for 337 P312 haplotypes in Mike Walsh FTDNA "L21" Project. They contain 1981 mutations in 25 marker format, 3663 mutations in 37 marker format, and 4956 mutations in 67 marker format.

Therefore,

1981/337/0.046 = 128-->147+/-15 generations,

3663/337/0.09 = 121 -->138+/-14 generations,

4956/337/0.12 = 123 -->141+/-14 generations

It all gives 3675, 3450, and 3525 years, plus-minus as indicated.
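As a quick cross-check of the three panels, the sketch below recomputes the uncorrected generations from the quoted mutation counts and converts the quoted corrected generations into years. The corrected values (147, 138, 141) are copied from the comment rather than derived, and the dictionary layout is only an illustration.

```python
N_HAPLOTYPES = 337
YEARS_PER_GENERATION = 25

# markers -> (total mutations, rate constant, corrected generations as quoted)
panels = {
    25: (1981, 0.046, 147),
    37: (3663, 0.09, 138),
    67: (4956, 0.12, 141),
}

raw_generations = {}
years = {}
for markers, (mutations, rate, corrected) in panels.items():
    raw_generations[markers] = mutations / N_HAPLOTYPES / rate
    years[markers] = corrected * YEARS_PER_GENERATION
    print(f"{markers} markers: {raw_generations[markers]:.1f} uncorrected, "
          f"{corrected} corrected, {years[markers]} years")
```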

As you see, all complaints about "meaninglessly wide margins of error" are false.

In the 67 marker format the U106 and P312 base haplotypes differ from each other by 6 mutations. This places THEIR common ancestor, presumably L11, at 4800 years bp. I can explain how it was calculated. VERY easy.

Anatole Klyosov; Monday, August 29, 2011 11:30:00 pm
jeanlohizun said...: Anatole Klyosov said:

O.K., since the readers are silent, let me explain why the Fig. 4 from the paper ….

Firstly I would like to say that, unlike Chandler’s et al(2006) publication in the Journal of Genetic Genealogy, at least the Ballantyne et al(2010) paper actually studied father-son pairs, so in that sense it is far more reliable and descriptive than panels of 12, 25 or 37 markers from amateur projects such as FTDNA.

Anatole Klyosov said:

What do we see in the Ballantyne's data? 0.00211 and 0.00152 (!) DYS393 is FASTER than DYS390 (!!). An utter nonsense.....

So what if the sample size of mutations was small? At least this was an actual generation (father-son) mutational measurement, not some calculated mutations based on the assumed descendants of a man who lived 650 years ago, which can lead to large confidence intervals in terms of errors, especially when assuming a constant generation time of 25 years regardless of the place of origin or history of each of the lineages. I have to say: Welcome to science doctor!! Sometimes randomly collected data doesn’t yield the preconceived results one was expecting.

As for your formulas for correction of back mutation, they have certain curiosities that perhaps the readers ought to know. For example whenever the tree is asymmetric, meaning epsilon is not equal to 0.5, the correction factor doesn’t work, or it yields the same corrected mutation regardless of the observed mutation. Now for a symmetric tree, the higher the mutation rate per marker the higher the correction. How did you test your exponential formula? What led you to believe that back mutations increment in an exponential way as time goes by? Did you experimentally test the result, or was it one of your assumptions from observed data from FTDNA? Anyhow, this polynomial fits, with R^2 = 1, your corrected mutations per marker (y) vs. observed mutations per marker (x) data:

y = 0.7284x^2 + 0.9515x + 0.0021

Anatole Klyosov said:

So, the conclusion number one: forget about Fig. 4 and the "principal conclusions" of the Busby et al paper. They are all wrong.

As much as you wish this paper was wrong, the essence of what this paper has brought forward cannot be overturned by your rants about mutation rates in experimentally obtained data. The main argument brought forward by this paper (Busby et al(2011)) against Balaresque et al or your theories is the fact that R1b-M269(xL11) achieves its highest variance in Central Europe (Figure-2b), something that cannot be invalidated by whatever value they chose to use for the mutation rate per generation for DYS393 or DYS390.

Regards,

Jean Lohizun; Tuesday, August 30, 2011 3:43:00 am
Anatole Klyosov said...: As always in the commentor's remarks, there is no substance.

Comment 1: negative and irrelevant

Comment 2: too wordy and incorrect

Comment 3: off target, highly questionable, and irrelevant to the subject of this discussion.

Anatole Klyosov; Tuesday, August 30, 2011 1:20:00 pm
jeanlohizun said...: Anatole Klyosov said:

As always in the commentor's remarks, there is no substance.

Comment 1: negative and irrelevant

Comment 2: too wordy and incorrect

Comment 3: off target, highly questionable, and irrelevant to the subject of this discussion.

Spare me the Ad Hominems sir, let the readers decide whether there is substance or not. I see that now we are to the point of stubbornly saying: "You are wrong". I was hoping someone who has written so many papers was going to be able to engage in a friendly debate a bit better, instead of just calling any position that doesn't agree with his "wrong". So you think my first comment was negative and irrelevant, yet when you called the data of Ballantyne et al (2010) "An utter nonsense or absurd" it was ok. Comment 3 has everything to do with the subject of the discussion, given the fact that you were already happily calling the conclusions of Busby et al (2011) "All wrong".

Now onto your studies: You provided the following equations to calculate the corrected average number of mutations per marker (λ) to account for back mutations given the observed average number of mutations per marker (λobs):

λ = (λobs/2)(1+exp(a_1* λobs))

a_1 = 1- a^0.8

a = (2*ε -1)^2

You mentioned that ε is the degree of asymmetry (Where 0.5<=ε<=1 ). How did you test the robustness of this correction? Also you give out an average mutation rate per generation per marker, yet you don't give out the standard deviation of that average mutation rate, would you care to provide such value for say 5, 10, 15, 19, 25 and 37 STR Markers. Also the 95% Confidence Intervals would be appreciated for the average calculated.
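For readers who want to try the formula, here is a minimal sketch of the correction exactly as quoted above, compared with the quadratic fit quoted earlier in this thread (y = 0.7284x^2 + 0.9515x + 0.0021). The function names are illustrative, and whether the fit was made against this formula or against tabulated values is an assumption here.

```python
import math

def corrected_mutations_per_marker(lam_obs: float, eps: float = 0.5) -> float:
    """Back-mutation correction as quoted: (lam_obs / 2) * (1 + exp(a1 * lam_obs))."""
    a = (2 * eps - 1) ** 2       # a = (2*eps - 1)^2, with 0.5 <= eps <= 1
    a1 = 1 - a ** 0.8            # a_1 = 1 - a^0.8
    return (lam_obs / 2) * (1 + math.exp(a1 * lam_obs))

def quadratic_fit(x: float) -> float:
    """Polynomial quoted earlier in the thread."""
    return 0.7284 * x ** 2 + 0.9515 * x + 0.0021

for x in (0.1, 0.2, 0.3, 0.4, 0.5):
    symmetric = corrected_mutations_per_marker(x, eps=0.5)
    fully_asymmetric = corrected_mutations_per_marker(x, eps=1.0)
    print(f"x={x:.1f}  eps=0.5: {symmetric:.4f}  fit: {quadratic_fit(x):.4f}  "
          f"eps=1.0: {fully_asymmetric:.4f}")
```

With ε = 0.5 the exponent is simply λobs, and the quadratic tracks the formula closely over this range. With ε = 1 the exponent vanishes and the formula returns λobs unchanged, that is, no correction at all.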

Now I believe that you calculated the average mutation rate per generation for the Iberian samples of Adams et al(2008) using the Donald Clan taken from the Donald Clan FTDNA Project of R1a1. I'm assuming you likely looked at the total number of mutations in the same 19 STRs that Adams et al(2008) typed in the Iberian sample, and that number turned out to be 63 mutations. Now the question is how you came up with the 26 generations number; you mentioned that John Lord of the Isles died in 1386, so it seems to me you simply took the year when the paper was written (2008), subtracted the year of death of John Lord (1386), and divided the result by 25 years per generation. This yields 24.88 generations; likely you added 1 or 2 generations assuming he was a father or a grandfather when he died, and voila 26 generations. Did I get that right? Now here is a major issue with your calculation: not every generation is 25 years long. Often, especially during the Middle Ages, people had children even when they were 50. For example I know a person who was born in 1983, and he has a distant ancestor born in 1560 in Lekeitio, Vizcaya. Assuming a generation time of 25 years, they must be 16-17 generations apart from each other. In reality they are only 12 generations apart from one another, and I know people younger than him who are once removed from the same ancestor, meaning they are 11 generations apart from that common ancestor. Thus assuming a generation time of 25 years can sometimes lead to erroneous mutation rates per generation, which is why it is far better to do what Ballantyne et al (2010) did and actually measure the mutations in a sample of fathers and sons.

Summary:

1-What are the SD and CI of your average mutation rates per marker/haplotype.

2- How did you come up with the correction formula, was there any experiment, or was it just that it would fit your data?

3- How do you account for irregular generations (i.e. More than 25 years, less than 25 years, adoptions, etc) when calculating average mutation rate per generation using this ancestry clan projects from FTDNA?; Wednesday, August 31, 2011 3:35:00 pm
Anatole Klyosov said...: Dear friend,

Essentially, what you have asked me is to educate you. However, your track record in this and a "parallel" discussion does not arm me with any enthusiasm to do that. The reason is simple - whatever I tell you, you jump and attack. Not that I care (I could not care less), however, I feel that it is not worthwhile to lecture you.

If someone else asks me the same questions, I will explain gladly. After all, it is my job to explain things to people who are willing to listen.

A hint: I have published answers and explanations to all those questions you asked. And to hundreds of other questions which you have not asked.; Wednesday, August 31, 2011 10:12:00 pm
jeanlohizun said...: Anatole Klyosov said:
Dear friend,

Essentially, what you have asked me is to educate you. However, your track record in this and a "parallel" discussion does not arm me with any enthusiasm to do that. The reason is simple - whatever I tell you, you jump and attack. Not that I care (I could not care less), however, I feel that it is not worthwhile to lecture you.

If someone else asks me the same questions, I will explain gladly. After all, it is my job to explain things to people who are willing to listen.

A hint: I have published answers and explanations to all those questions you asked. And to hundreds of other questions which you have not asked.

Perhaps I should make the following clarifications:

I’m not asking you to educate me in Genetics or Genetic Genealogy. I’m asking you as a peer and fellow scientist (Yes I’m in Academia) to answer the following questions. Like I said before, I’m not trying to attack you or anything; I do find it ironic that you are so susceptible to engage in a friendly debate, when you have called the work of others very harsh things. Again the questions still stand:

1-What are the SD and 95% CI of your average mutation rates per marker/haplotype, for the most commonly used sets in your studies.

2- How did you come up with the correction formula, was there any experiment, or was it just so that it would fit your data?

3- How do you account for irregular generations (i.e. More than 25 years, less than 25 years, adoptions, etc) when calculating average mutation rate per generation using this ancestry clan projects from FTDNA?

How do you make sure the following doesn’t happen:

We have a DNA Project for R1b-M269 bearers, and they were tested using say 10 STRs. Then on that sample we have Murcians (n=23) and Catalans (n=200). It just so happens that the Murcians have 150 mutations from the base haplotype, while the Catalans have 400 mutations from the base haplotype. Thus if we analyze the sample indiscriminately without separating the two subgroups, it yields a total of 550 mutations, which translates into an average number of mutations per marker of 550/(223*10)=0.2466. Then using a constant mutation rate of 0.0025 (this is a hypothetical number), it yields about 99 generations, which of course is then multiplied by a constant for back mutations.
Now if instead of using the whole sample we take just the Murcians (n=23), it yields a total of 150 mutations, which translates into an average number of mutations per marker of 150/(23*10)=0.6522; again using the 0.0025 (hypothetical) rate gives 261 generations, which translates into a TMRCA about 2.6 times older than that of the combined sample.
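The pooling effect in this hypothetical can be reproduced in a few lines. All figures are the made-up ones from the example (10 STRs, a rate of 0.0025 per marker per generation), no back-mutation correction is applied, and the function name is illustrative.

```python
N_MARKERS = 10
RATE_PER_MARKER = 0.0025  # hypothetical rate from the example

def generations(total_mutations: int, n_haplotypes: int) -> float:
    """Average mutations per marker divided by the per-marker rate."""
    per_marker = total_mutations / (n_haplotypes * N_MARKERS)
    return per_marker / RATE_PER_MARKER

murcians = generations(150, 23)
catalans = generations(400, 200)
pooled = generations(150 + 400, 23 + 200)

print(f"Murcians alone: {murcians:.1f} generations")
print(f"Catalans alone: {catalans:.1f} generations")
print(f"Pooled sample:  {pooled:.1f} generations")
print(f"Murcians / pooled: {murcians / pooled:.2f}")
```

The pooled figure is dominated by the larger subgroup, which is why it lands much closer to the Catalan value than to the Murcian one.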

Again I did not get the impression from your analysis of Adams et al(2008) that you looked at each single Iberian population independently, nor did I get the impression that you broke down haplogroups R1b-M153 or R1b-M167 from the R1b(xM65,M153 or M167) to do the STR diversity analysis.

Regards,

Jean Lohizun; Wednesday, August 31, 2011 11:15:00 pm
Anatole Klyosov said...: O.K., I have noticed a change in your tone. Let's make peace not war.

1. Standard deviations, margins of error, confidence intervals.

They are determined by a number of mutations in a properly handled haplotype dataset, by an assumed margin of error in the mutation rate constant, by a choice of sigma confidence level, and by actual fit to actual data (actual genealogy and actual known historical events).

After calculations of hundreds and probably thousands of haplotype datasets, I came to a formula which makes the best (it seems) and reasonable fit with actual data. I have demonstrated the result above in this thread (with the example of 337 haplotypes using 25, 37 and 67 marker haplotypes; explanations are given above):

1981/337/0.046 = 128-->147+/-15 generations,

3663/337/0.09 = 121 -->138+/-14 generations,

4956/337/0.12 = 123 -->141+/-14 generations

It all gives 3675, 3450, and 3525 years, plus-minus as indicated.

As you see, 25, 37 and 67 marker haplotypes fit quite well with each other. This is for one particular dataset, for R-L21. An average number of generations is 142, and the principal numbers are within 3.3% difference. For a much more massive dataset the number is 150+/-15 generations (3750+/-380 years) from a common ancestor, which is pretty much the same. As you remember, L21 is under M269, under L23, under L51, under L11, under P312.

A whole ladder for 30 subclades of R1b1a2-M269 with dates and confidence intervals is given in Proceedings, June 2011 (vol. 4, No. 6, p. 1127), http://aklyosov.home.comcast.net

2. Corrections for reverse (back) mutations.

When you look at the formula you have quoted above (taken from my paper in J. Genet. Geneal., 2009) [I did not like the title of the journal, that is why I set up a new journal, on DNA genealogy, and it is very different indeed], you immediately recognize that it is the Poisson distribution. The way this particular formula was derived is described in the Proceedings, October 2008, pp. 631-641. You need to use an e-translator, because the article is in Russian. It contains a few pages of mathematics, which I cannot reproduce here due to limited space. It does not need any experimental confirmation since it is based on the Poisson distribution, as are hundreds and thousands of other applications in the world. In fact, all calculations of mutation rates, as well as practically all chemical kinetics, are based on the PD. This is also illustrated in the cited J. Genet. Geneal.

3. 25 years per "conditional" generation.

It has nothing to do with a common "generation". It is a mathematical value. A common generation is a floating timespan. It depends on many factors - culture, lifestyle, climate, wars, famine, etc. It cannot be used in calculations, therefore, all those endless discussions about which value to take - 16, 18, 20, 25, 27, 30, 31.5, etc. years - are practically senseless. None of them would be right for different people and different times.

In fact, it is not needed. When you work with haplotypes and mutations, you get "kt", a product of the mutation rate constant and a number of generations. Therefore you can choose whatever t (the length of generation) you like, and you get the mutation rate constant from it. They are bound to each other. I chose 25 years, and the mutation rate constant came out of it as 0.022 for 12 marker haplotypes, 0.046 for 25 marker ones, 0.090 for 37 marker ones, 0.12 for 67 marker haplotypes, and 0.198 for 111 marker haplotypes. The last value should be verified, but it would not be changed much. If one does not like 25 years per generation, he/she can take, say, 31 years, and the mutation rate constants will shift proportionally. The final result (in years) will be exactly the same.
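The invariance described here is easy to verify numerically. The sketch below assumes, as the comment states, that the rate constant scales in proportion to the chosen generation length; it uses the 25 marker U106 figures from earlier in the thread without back-mutation correction, and the helper names are illustrative.

```python
def rescale_rate(rate: float, old_years: float, new_years: float) -> float:
    """Rate constant per generation, rescaled to a different generation length."""
    return rate * new_years / old_years

def tmrca_years(mutations_per_haplotype: float, rate: float,
                generation_years: float) -> float:
    """Generations (mutations / rate) multiplied by the generation length."""
    return mutations_per_haplotype / rate * generation_years

mutations_per_haplotype = 1853 / 284    # 25 marker U106 dataset from the thread
rate_25 = 0.046                         # per 25-year "conditional" generation
rate_31 = rescale_rate(rate_25, 25, 31)

print(f"25-year generations: {tmrca_years(mutations_per_haplotype, rate_25, 25):.0f} years")
print(f"31-year generations: {tmrca_years(mutations_per_haplotype, rate_31, 31):.0f} years")
```

Both lines print the same number of years, because only the product of rate and time enters the calculation.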

I hope it helps.

Regards,

Anatole Klyosov; Thursday, September 01, 2011 2:19:00 am
Anatole Klyosov said...: Yes, one more thing (it exceeded the limit in the preceding message):

How do you make sure the following doesn’t happen:
We have a DNA Project for R1b-M269 bearers, and they were tested using say 10 STRs.....

This is a kindergarten "problem". It is "Rule No. 1" with which I have started this thread - do not mix different lineages (branches, populations). In that example Murcians have 150/23/0.025 = 261 generations to a common ancestor (w/out a correction), Catalans have 400/200/0.025 = 80 generations. If you mix the two, you get a phantom "common ancestor" somewhere in between. It is a no-no. In that particular case if you mix them you get 550/223/0.025 = 99 generations, almost three times lower than the oldest, 261 generations. In fact, after correction for back mutations you get 99-->110 generations and 261-->350 generations, that is the error equals 320%. That is exactly how "Zhivotovsky method" works: they mix all branches, get much lower TMRCA, and multiply it by 3 (by using 3 times lower "population mutation rate"). The problem is that they multiply by 3 everything, right and wrong TMRCAs, and as a result you never know what you got. Often a result is 300% higher than it should be.

Anatole Klyosov; Thursday, September 01, 2011 2:21:00 am
Dienekes said...: Anatole, do you really believe that R1a1 folk built both Gobekli Tepe _and_ Stonehenge? Your "new science" of DNA Genealogy sure leads to some funky conclusions.; Thursday, September 01, 2011 12:41:00 pm
Anatole Klyosov said...: Dienekes said...

>Anatole, do you really believe that R1a1 folk built both Gobekli Tepe _and_ Stonehenge? Your "new science" of DNA Genealogy sure leads to some funky conclusions.

Dienekes,

The question is not what I "believe" or "not believe". "Believe" belongs to religion, not science. By the way, I have never wrote what you have mentioned in any scientific edition, such as I know or I believe. However, a legitimately scientific question would be whether we have any data (or indications) of R1a1 (possible) presence in the regions of Gobekli Tepe and Stonehenge and in the respective time periods? Next question would be whether the two mentioned sites have something certain in common, say, from architectural viewpoint? Design?

Agree, that when we start to lay out DATA and their thoughtful consideration, we gain knowledge regardless the (current) answer would be "yes" or "no". I said "current", because some new data might appear and a current answer might go from yes to no, or in the opposite direction.

Agree, this is the way science works.

So I suggest that you ask me a different question - such as "Do you have ANY indications that R1a1 were in Anatolia some 11-9 thousand years before present and in Europe 5-4 thousand years bp?"

From my side, I would ask you a reciprocal question: "Do you know for a fact that there is not ANY indication that R1a1 were in Anatolia some 11-9 thousand years before present and in Europe 5-4 thousand years bp?"

This is how a constructive discussion takes place, rather than throwing questions with an accusatory connotation.; Thursday, September 01, 2011 1:15:00 pm
Dienekes said...: Do you deny that you wrote this?

http://www.rodstvo.ru/forum/index.php?showtopic=1866

in which you not only argue that R1a1 folks built Gobekli Tepe on their way from the Altai to Europe (where they built Stonehenge), but also that they "taught" J1, J2, G, E to build before going west.

I just thought it'd be a good idea for people to know some of your bizarre ideas that don't make it to your "scientific" publications.; Thursday, September 01, 2011 1:23:00 pm
Tiger Mike said...: Dienekes said:
Your "new science" of DNA Genealogy sure leads to some funky conclusions.

Dienekes also said:
Anatole, I've simulated your method and it doesn't work; the logarithmic method in particular is crap and has incredibly huge confidence intervals.

My comments:
Anatole does make speculative proposals on ancient cultural associations with haplogroups, but I don't think he has been misleading in presenting these proposals as anything other than speculative. I disagree with many of these proposals but appreciate anyone who is willing to go out on a limb as long as it is sincere.

I think many readers are interested in the objective arguments related to the methods and tools used. In that regards, can the concerns about Anatole's methods be explained more explicitly? or are those concerns directly and explicitly explained elsewhere?; Thursday, September 01, 2011 5:14:00 pm
Dienekes said...: Klyosov's methods have been extensively discussed on GENEALOGY-DNA-L (a particular lively criticism of his R1b from the Altai thesis transpired in Nov 2010). My own experiments on Y-STRs can be found in the Y-STR Series label of this blog.

People are free to evaluate his arguments and data for themselves, but I don't plan to devote any more time on them and on Y-STRs in general, unless some convincing new development takes place that will make me change my opinion on them.; Thursday, September 01, 2011 5:36:00 pm
Anatole Klyosov said...: Dienekes said:

Do you deny that you wrote this?

http://www.rodstvo.ru/forum/index.php?showtopic=1866

in which you not only argue that R1a1 folks built Gobekli Tepe on their way from the Altai to Europe (where they built Stonehenge), but also that they "taught" J1, J2, G, E to build before going west.

I think that you have to check your e-translator, or examine your informers.

First, that is what I wrote above:

>>By the way, I have never wrote what you have mentioned in any scientific edition

What you refer to is not a scientific edition. It is an open-minded discussion site, packed with jokes, entertaining things, testing area for new ideas, etc. etc.

Second, there was only one line in which I have mentioned those haplogroups, and it was as follows (Теоретически, Гобекли могли строить и G, J2, E) which translates as "theoretically, Gobekli could have been built by G, J2, E". Do you see any resemblance with what you have "translated"??

>Klyosov's methods have been extensively discussed on GENEALOGY-DNA-L (a particular lively criticism of his R1b from the Altai thesis transpired in Nov 2010).

Again off target. The question was

can the concerns about Anatole's methods be explained more explicitly?

You wrote "methods have been extensively discussed". Can you see a difference?

My answer to the question is that NOBODY has ever shown that the method is incorrect or based on incorrect assumptions, or that the mutation rate constants are incorrect, or the results are incorrect. Nobody. In fact, nobody has even tried to consider it, using any specific examples. Your quotation above is a good example. You have just said in passing that the method is bad. I gave above a specific example, with R-L21. Where are the critics? Take any L21 dataset (or any dataset for that matter), show me that my methodology is incorrect, that the results should be noticeably different, that the margin of error makes the data meaningless, etc.

It is O.K., I have seen in my life in science very many people who have no idea what they are talking about, or who criticise without offering anything positive on the subject.

We have the same situation here. Not the first time, not the last one.

My own experiments on Y-STRs can be found in the Y-STR Series label of this blog.

You forgot to mention my response to it, which completely nullified your "own experiments on the Y-STR".

Sincerely,

Anatole Klyosov; Thursday, September 01, 2011 7:06:00 pm
Dienekes said...: Second, there was only one line in which I have mentioned those haplogroups, and it was as follows (Теоретически, Гобекли могли строить и G, J2, E) which translates as "theoretically, Gobekli could have been built by G, J2, E". Do you see any resemblance with what you have "translated"??

You didn't have one post in that discussion, look further down.

"А может и вообще это были J1, J2, G, E, которых R1a1 научили строить, а сами ушли на запад. "

What you refer to is not a scientific edition. It is an open-minded discussion site, packed with jokes, entertaining things, testing area for new ideas, etc. etc.

Hopefully you acknowledge that your R1a1-built-Gobekli Tepe-and-Stonehenge theory is a joke.; Thursday, September 01, 2011 8:08:00 pm
Anatole Klyosov said...: >You didn't have one post in that discussion, look further down.

"А может и вообще это были J1, J2, G, E, которых R1a1 научили строить, а сами ушли на запад. "

Oh, yes, further down was a number of jokes indeed. It was one of them. Again, you quote a site with thousands of comments, jokes, etc., and not a scientific publication. I am impressed that even that stuff can be used as an "argument" against brainstorming.

In fact, the whole discussion was based on a recent paper in National Geographic (June 2011, p. 39-59), which said, I quote - "Known as Gobekli Tepe, the site is vaguely reminiscent of Stonehenge, except that Gobekli Tepe was built much earlier... some 11,600 years ago". The "vague" resemblance with Stonehenge was explained on a picture showing Gobekli Tepe as vertical pillars (16 tons of weight) arranged in a circle, with horizontally placed huge stone blocks on top of each one (T-shaped pillars).

But the main question here is - since when are hypotheses (brainstorming) met with such hostility? Do you know who built Stonehenge? Which "haplogroup"? Do you know who built Gobekli Tepe? Which "haplogroup"? O.K., you do not know. In fact, nobody knows. Again, why such hostility to ideas, which - in principle - can be examined and verified?

Hopefully you acknowledge that your R1a1-built-Gobekli Tepe-and-Stonehenge theory is a joke.

First, it is not a "theory". At least not yet. However, there are indications that R1a1 were in Central Asia some 20,000 years ago, that they might have been moving through Anatolia westward some 12-10,000 years ago (where some proto-IE language could have been left), and that they have been in Europe thousands of years before 4600 ybp (the last date from Haak et al, 2008). Stonehenge was built between 4500 and 3500 ybp (radiocarbon data are summarized in Johnson, Solving Stonehenge, 2008, 288 pp).

Please, Dienekes, try sometimes to be positive in considering someone else's ideas, which CANNOT be overthrown, at least not yet, by contemporary science. Do not try to control things when you can just listen. Peacefully. Openmindedness is a good thing in science. It is probably not a good thing for police officers and prison guards.

Regards,

Anatole Klyosov; Thursday, September 01, 2011 9:31:00 pm
Onur Dincer said...: You believe your own BS and come up with fanciful scenaria of R1b Proto-Turks invading Europe, or R1a's coming from the Altai to Europe and then going back to Siberia all with clockwork precision.

Are these fanciful scenari really Anatole's, and even if not his, at least supported by him, or, are these actually libels against him by his foes? I am asking this question as it is hard for me to believe that a scientist in Anatole's position really conceives of, creates, endorses and/or takes seriously such fanciful and infantile scenari.; Thursday, September 01, 2011 11:50:00 pm
Anatole Klyosov said...: Mike said...

Anatole does make speculative proposals on ancient cultural associations with haplogroups, but I don't think he has been misleading in presenting these proposals as anything other than speculative. I disagree with many of these proposals...

Dear Mike,

First, please lay out here, for example, a couple of those proposals of mine you disagree with, and JUSTIFY (based on clear and commonly accepted scientific evidence) what is wrong with them.

I bet that you fail.

ALL my hypotheses are based on actual data that cannot be just dismissed. People generally base their attitude on "conventional wisdom", not on scientific data.

Please keep in mind that I am a professional scientist, and I have been trained to advance hypotheses in a brain storming manner. After I offer them, the next step is a careful consideration what they support, what they contradict, and what is a balance between the two. And by "support" and "contradict" I mean DATA, not "opinions", and not "conventional wisdom". This is what science is about.

It is in a way depressing, how so many folks are far away - mentally - from this truly scientific methodology. They curse, "dismiss", aggressively attack, as if it is something that intrudes on their privacy, their family life, their wellbeing, the loyalty of their spouses. Sometimes I want to scream - folks, it is just a HYPOTHESIS! It is just my attempt to explain things that contemporary science cannot explain. Why such hostility? Do YOU have anything to offer in that regard?

A mystery, truly.; Friday, September 02, 2011 3:49:00 am
Anatole Klyosov said...: >Onur said...

Are these fanciful scenari really Anatole's...

Dear Onur,

You are quite right in your doubts.

Here are some relevant data and observations regarding what was quoted in a "funny" fashion.

1. R1b1 arose in Central Asia around 16,000 ybp. There are a number of indications of it, such as a pattern of mutations in R1b1 haplotypes in Central Asia (Uyghurs, Kazakhs, Tuvans, Bashkirs, etc.), which dramatically differs from that in R1b1 in Europe, and the respective datings. Then, a trail of archaic Turkic languages which coincides well with the R1b1 migration route. Then, similar elements which might belong to a proto-Turkic language which only remotely resembles current Turkic languages along this R1b1 migration trail. Linguists cannot read that proto-Turkic language, it has practically disappeared. On the other hand, there are common elements in the Basque, Northern Caucasian languages, Sumerian language, some Sino-Caucasian languages and other agglutinative languages, compared to flective IE languages. So my suggestion was that an ancient language of R1b1 was that agglutinative language which can be called Sino-Caucasian, or Proto-Turkic, or whatever, it does not make any difference what we call their non-IE language, since there is no name for it as yet. I can call it "Erbin" to reflect that it was the R1b1 tribe who spoke it. On the other hand, it is known that there were many non-IE languages across Europe before 1,000 BC (and some of them were carried into the 1st millennium AD).

In short, R1b1 arose ~16,000 ybp in Central Asia, they spoke a non-IE language (Erbin), they made their way to Europe and arrived ~4800 ybp by several routes. They brought their non-IE language to Europe that time, and only 500 BC a first IE language was found in Celts/Kelts, who - arguably - can be referred to as R1b1. Still, it is not clear, they might have been R1a1. After it, and along with disappearance of the Etruscans, Europe started switching to IE languages back again. "Back" - because IE languages are much older, and probably they were in Europe, or even dominating in Europe some 10-6 thousand years before present.

As you see, this is a very interesting and vaguely understood story of migrations of people and languages in Europe.

A similar story was with R1a1. Clearly, they did not appear in Europe out of the blue. Clearly, it is a brother subclade of R1b1. Clearly, they also came from Asia a very long time ago. Apparently, it was them who spoke IE languages 10-6 thousand years before present, and clearly it was them who brought IE language to India and Iran ~3500 ybp.

Indeed, it was shown that the most ancient R1a arose in Central Asia, either in the Altai region or in North China region (some of the local Chinese populations are up to 25-30% R1a1), where they are dated - by different haplotype datasets - between 17 and 21 thousand years ago. From there they have migrated to Europe by the "Southern Route" (unlike the "Northern Route" of R1b1) - via Tibet, Hindustan, Iran, Anatolia to the Balkans, where some "oldest" R1a haplotypes are found (with DYS393=13, not 11 as among other R1a1). This branch is coined "R1a Old European branch" among total 22 branches of the R1a1 clade, and it is dated between 11 and 8 thousand years back.

From there R1a1 appeared ~4800-4600 ybp on the Russian Plain, and moved eastward as a chain of archaeological cultures now shows, up to Andronovo and Afanasievo cultures (it is unclear for the last one, are they from Europe, via Andronovo, or they are the original R1a from 20,000 ybp). R1a1 are excavated in South Siberia and in Tarim basin.

I do not know what is the "clockwork precision" here and why?

Again, this pattern explains and puts together many historical and linguistic puzzles, including those of the "Nostratic concept". Again, here I laid out only a small fraction of facts and observations. They are described in more detail in my publications.; Friday, September 02, 2011 3:52:00 pm
jeanlohizun said...: Anatole Klyosov said:

This is a kindergarten "problem". It is "Rule No. 1" with which I have started this thread - do not mix different lineages (branches, populations).
[…]
That is exactly how "Zhivotovsky method" works: they mix all branches, get much lower TMRCA, and multiply it by 3 (by using 3 times lower "population mutation rate"). The problem is that they multiply by 3 everything, right and wrong TMRCAs, and as a result you never know what you got. Often a result is 300% higher than it should be.

Well, it seems you failed to follow your own rule in your Anatole et al(2008) paper published in the Journal of Genetic Genealogy. At least that is the impression the reader gets from what you wrote in the paper.

I’m talking about this paper: http://www.jogg.info/52/files/Klyosov1.pdf

All 750 haplotypes showed 2796 mutations with respect to the above base haplotype, with a degree of asymmetry of 0.56. Therefore, the mutations are fairly symmetrical, and a correction for the asymmetry would be a minimal one. The whole haplotype set contains 16 base haplotypes.

An average mutation rate for the 19-marker haplotypes is not available in the literature, as far as I am aware of, and cannot be calculated using the Chandler's, Kerchner's, or other similar data. However, the Donald Clan latest edition of 88 haplotypes contains 63 mutations in the above 19 markers. Taking into account the 26 generations to the Clan founder (see above), this results in the mutation rate of 0.0015 mut/marker/gen and 0.0285 mut/haplotype/gen, listed in Table-1.

The logarithmic method gives ln(750/16)/0.0285 = 135 generations, and a correction for reverse mutations results in 156 generations (Table A), that is 3900 years to a common ancestor of all the 750 Iberian 19-marker haplotypes. It corresponds well with 3500±480 ybp value, obtained above for 12- and 25-marker Basque haplotype series. The "mutation count" method gives 2796/750/19 = 0.196±0.004 mutations per marker (without a correction for back mutations, that is obs = 0.196±0.004), or after the correction it is 0.218 ± 0.004 mutations per marker, or 0.218/0.0015 = 145±15 generations, that is 3625±370 years to a common ancestor of all 750 Iberian R1b1 haplotypes.
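The arithmetic in the quoted passage can be re-run as a quick check. This is only a sketch: it assumes 25 years per generation (the value implied by 156 generations = 3900 years), takes the back-mutation-corrected figure of 0.218 mutations per marker straight from the quote rather than deriving it, and uses variable names invented for illustration.

```python
import math

# Figures taken from the quoted passage
n_haplotypes = 750            # Iberian 19-marker haplotypes
n_base = 16                   # haplotypes identical to the base haplotype
n_mutations = 2796            # mutations counted from the base haplotype
n_markers = 19
rate_per_marker = 0.0015      # mut/marker/gen
rate_per_haplotype = 0.0285   # mut/haplotype/gen
years_per_gen = 25            # assumption: implied by 156 gen = 3900 years

# Logarithmic method: generations = ln(N / N_base) / k
log_generations = math.log(n_haplotypes / n_base) / rate_per_haplotype

# Mutation-count ("linear") method, before the back-mutation correction
obs_per_marker = n_mutations / n_haplotypes / n_markers

# Corrected value as quoted in the passage (not derived here)
corrected_per_marker = 0.218
linear_generations = corrected_per_marker / rate_per_marker

print(f"logarithmic: {log_generations:.0f} generations (uncorrected)")
print(f"observed mutations per marker: {obs_per_marker:.3f}")
print(f"linear: {linear_generations:.0f} generations, "
      f"{round(linear_generations) * years_per_gen} years")
```

Both routes depend on the same assumed rate, so their agreement says nothing about whether 0.0015 mut/marker/gen is itself correct.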

So why did you not separate the lineages of the different Iberian populations and instead used the bulk of lineages? By not separating the lineages and mutations into the different populations you have completely missed any possible internal population that can have a much higher diversity value. Because as I showed before it is possible for two populations to have a lower TMRCA when they are combined together than one of them can have separately. As for your estimation of the mutation rate of 0.0015 mut/marker/gen, I have to say, the results are not reproduced when using the average measured mutation rates of 14 of the 19 STR markers Adams et al(2008) used. Also certain STR have been shown to mutate faster in one haplogroup than in others(i.e. DYS392 in Haplogroup N and Q, or DYS388 in Haplogroup J). You used estimated data from one R1a1 project, and extrapolated that data to R1b1b2 bearers, yet there are indeed differences between the mutation rates of different haplogroups.

For example out of the 19 STR used by Adams et al(2008) 14 have mutation rates readily available on the Web, I used the old estimated mutation rates by Chandler et al(2006) which have a 15-20% error for the mean of each mutation rate, and differ from empirically measured mutation rates found in the YHRD.org.; Friday, September 02, 2011 10:38:00 pm
jeanlohizun said...: These 14 STR are DYS19(0.00151), DYS388(0.00022), DYS389i(0.00186), DYS389ii(0.00242), DYS390(0.00311), DYS391(0.00265), DYS392(0.00052), DYS393(0.00076), DYS437(0.00099), DYS438(0.00055), DYS439(0.00477), DYS460(0.00402), DYS385a(0.00226), and DYS385b(0.00226). There aren’t estimated or measured values for DYS434, DYS435, DYS436, DYS461 or DYS462 respectively.

Then we have that the mean mut/marker/gen is 0.0020. The problem is that the standard deviation of these rates is 0.0014, which leads to huge margins of error. Anatole mentioned that there were 2796 mutations present in the 19 STRs, so we subtract the mutations found in DYS434 (16 mutations), DYS435 (4 mutations), DYS436 (5 mutations), DYS461 (127 mutations) and DYS462 (45 mutations), a total of 197 +-20 mutations. Thus the number of mutations per marker in the remaining sample would be:

2599/750/14=0.2475+-0.0019 mutations per marker

Now an assumption taken when taking the average of markers with very different mut/marker/gen rates is that one should expect faster markers to have the same ratio of mutations to the slow markers as they have in mutation rates (i.e., there should be 2.9 times more mutations on DYS19 than on DYS392), in order to compensate for the mean mut/marker/gen. However, as we will see, that is not the case. For example, dropping the slowest STR markers and their mutations from the sample should have no effect on the generations to MRCA, because we assume they behave linearly. In the Adams et al(2008) sample the slowest STRs are: DYS388(0.00022), DYS393(0.00076), DYS392(0.00052), DYS437(0.00099) and DYS438(0.00055).

If we drop these 5 STRs then the mean mut/marker/gen turns into 0.0028 and the standard deviation drops to 0.0010.
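The means and standard deviations quoted for the rate sets can be reproduced from the 14 rates listed above. A minimal sketch, assuming the sample (n - 1) standard deviation was used, which is what matches the quoted figures; the same loop also covers the five slowest markers discussed further down.

```python
from statistics import mean, stdev

# Per-marker mutation rates (mut/marker/gen) as listed in the comment
rates = {
    "DYS19": 0.00151, "DYS388": 0.00022, "DYS389i": 0.00186,
    "DYS389ii": 0.00242, "DYS390": 0.00311, "DYS391": 0.00265,
    "DYS392": 0.00052, "DYS393": 0.00076, "DYS437": 0.00099,
    "DYS438": 0.00055, "DYS439": 0.00477, "DYS460": 0.00402,
    "DYS385a": 0.00226, "DYS385b": 0.00226,
}
slowest = ["DYS388", "DYS392", "DYS393", "DYS437", "DYS438"]

all14 = list(rates.values())
slow5 = [rates[m] for m in slowest]
fast9 = [r for m, r in rates.items() if m not in slowest]

for label, subset in [("14 STR", all14), ("9 fastest", fast9), ("5 slowest", slow5)]:
    m = mean(subset)
    s = stdev(subset)  # sample standard deviation (n - 1 denominator)
    print(f"{label}: mean={m:.5f}  sd={s:.5f}  sd/mean={s / m:.2f}")
```

The sd/mean column is the point of the argument: for the full 14-marker set the spread of the rates is a large fraction of their mean.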

Let’s now explore what the effect of dropping the 5 slowest STRs is on the overall variance of the Adams et al(2008) sample. Thus far there are a total of 2599+-20 mutations in the sample; subtracting the mutations found in DYS388(17 mutations), DYS392(67 mutations), DYS393(103 mutations), DYS437(223 mutations) and DYS438(54 mutations), which are a total of 464 +-40 mutations, the number of mutations per marker in the remaining sample would be:

2135/750/9=0.3163+-0.0089 mutations per marker

Doing the same thing for the 5 slowest STRs, which have a mean mut/marker/gen of 6.08*10^-4 and a standard deviation of 2.8752*10^-4, gives:

464/750/5= 0.1237+-0.0106 mutations per marker
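The three per-marker figures follow directly from the mutation counts quoted above. A short sketch of the bookkeeping (the helper name `per_marker` is made up for illustration; the +- terms are not modelled):

```python
# Mutation counts quoted in the comment (Adams et al. 2008 sample)
n_haplotypes = 750
total_19 = 2796
no_rate = {"DYS434": 16, "DYS435": 4, "DYS436": 5, "DYS461": 127, "DYS462": 45}
slow = {"DYS388": 17, "DYS392": 67, "DYS393": 103, "DYS437": 223, "DYS438": 54}

total_14 = total_19 - sum(no_rate.values())   # 14 markers with known rates
total_slow5 = sum(slow.values())              # 5 slowest markers
total_fast9 = total_14 - total_slow5          # 9 fastest markers

def per_marker(mutations, markers):
    """Average number of mutations per marker per haplotype."""
    return mutations / n_haplotypes / markers

for label, muts, k in [("14 STR", total_14, 14),
                       ("9 fastest", total_fast9, 9),
                       ("5 slowest", total_slow5, 5)]:
    print(f"{label}: {muts} mutations -> {per_marker(muts, k):.4f} per marker")
```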

Now doing the correction for back mutations assuming a symmetric tree yields:

14 STR---Corrected mutations per marker: 0.2823+-0.0025
14STR --- Mean mut/marker/gen: 0.0020
14STR--- TMRCA(25 years/gen): 3529 +-31.25 ybp

9STR(Fastest)--- Corrected mutations per marker: 0.3751+-0.0125
9STR(Fastest)--- Mean mut/marker/gen: 0.0028
9STR(Fastest)--- TMRCA(25 years/gen): 3349 +-112 ybp

5 STR(Slowest)---Corrected mutations per marker: 0.1318+-0.0121
5 STR(Slowest)--- Mean mut/marker/gen: 6.08*10^-4
5 STR(Slowest)---TMRCA(25 years/gen): 5412 +-495 ybp
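The step from observed to corrected mutations per marker, and on to years, can be sketched as follows. The comment does not say which back-mutation correction was applied; the formula below is an assumption, chosen because it reproduces the three corrected values to four decimals. The resulting ages land within about ten years of the quoted ones, the small gaps presumably coming from rounding of intermediate values.

```python
import math

YEARS_PER_GEN = 25  # as stated in the comment

def correct_back_mutations(obs):
    # ASSUMPTION: symmetric-tree correction, corrected = obs/2 * (1 + exp(obs)).
    # The comment does not give its formula; this one matches its numbers.
    return obs / 2 * (1 + math.exp(obs))

def tmrca_years(obs_per_marker, mean_rate):
    """Corrected mutations per marker divided by the mean rate, in years."""
    generations = correct_back_mutations(obs_per_marker) / mean_rate
    return generations * YEARS_PER_GEN

# (observed mutations per marker, mean mut/marker/gen) from the comment
sets = {
    "14 STR": (0.2475, 0.0020),
    "9 fastest": (0.3163, 0.0028),
    "5 slowest": (0.1237, 6.08e-4),
}
for label, (obs, rate) in sets.items():
    print(f"{label}: corrected={correct_back_mutations(obs):.4f}  "
          f"TMRCA ~ {tmrca_years(obs, rate):.0f} years")
```

The +- values in the list propagate only the counting error on the mutations; they carry no uncertainty from the mutation rates or the generation length.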

So which one are we going to choose as the right TMRCA? In the ideal case all STRs should yield a very close TMRCA regardless of their mutation rate, which is why Dr. Klyosov uses an average mut/marker/gen, but as I just showed above, the choice of STRs is very important in the calculation of TMRCA. Fast mutating STRs tend to overshadow slow mutating STRs in big samples, whereas slow mutating STRs tend to overshadow fast mutating STRs in small samples. The thing is that, given that we cannot control the direction (back or forth) of the mutations, over long spans of time slow STRs are better at predicting TMRCA. However, because they are slow, a small sample size would tend to undermine their presence. On the other hand, the method of averaging mutation rates regardless of their values is very unreliable because it tends to undermine the values of slow STRs in big samples. Thus I agree with Busby et al(2011) conclusion that is not the amount of STR but the sets of STRs what matters when determining TMRCA.

Regards,

Jean Lohizun; Friday, September 02, 2011 10:48:00 pm
Anatole Klyosov said...: My dear friend,

Unfortunately, you are too quick with the word "failed" directed at others, who know MUCH more than you in the area you dare to criticize. You should have written - "I seem to have failed to understand...". This would have been a fair and correct statement.

>So why did you not separate the lineages of the different Iberian populations and instead used the bulk of lineages?

Because there was no need for it back in 2009. I could have done it, and the fit between the linear and logarithmic methods would have been better. But it was good enough: 145±15 and 158 generations, respectively. As you see, they are the same within the margin of error. The difference between their principal values is 9%, which is well within typical margins of error in DNA genealogy.

Remember, in the above example which you gave earlier, the difference was 4 times, or 400%. Can you see the difference between 400% and 9%?

The second reason why I did not separate the lineages back in 2009 is that at that time I was not looking for recent lineages. I have done the separation in a recent paper, in June 2011. In the latest paper I have isolated 30 subclades and determined the TMRCA for each one. Please notice that the Iberian major R1b1a2 subclade had the TMRCA of 3900±400 ybp (page 1133) and 3525±360 ybp (pp 1162-1163, different datasets). As you see, they are all the same thing.

One more piece of advice to you: please look at the core data, not at some "side dishes". I have already explained to you earlier about checkers on a cab. They are not that important compared to the ride itself.

So, we are done with the first part.

The second part, in which you have manipulated with those five markers, is totally flawed. In fact, you got huge margins of errors, and you did not realize it. Instead, you wrote truly absurd things. Look what have you done:

14STR--- TMRCA(25 years/gen): 3529 +-31.25 ybp

Can you imagine? Margin of error of 0.9% (!!) What is 31.25 years (!?)

Next:

9STR-- TMRCA(25 years/gen): 3349 +-112 ybp

Margin of error of 3% (!!)

Next:

5 STR---TMRCA(25 years/gen): 5412 +-495 ybp

Margin of error of 9%.

It is all grossly incorrect. The way how you have been doing it would give you margins of error no less than 60-80%.

Result: all your calculations are based on an incorrect platform. You should have re-calibrated your "estimated" 4, 5, and 9-marker "haplotypes", using some known genealogy, to make sure how they work. In fact, all "datings" which you have obtained - 3529, 3349, and 5412 are around ~ 4100+/-1140 ybp, within 30% margin of error. That is what you have actually obtained.
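The ~4100+/-1140 figure appears to be the mean and sample standard deviation of the three dates. That reading is an inference from the numbers, not something stated in the comment, and can be checked in a few lines:

```python
from statistics import mean, stdev

dates = [3529, 3349, 5412]   # the three TMRCA values (ybp) under discussion

center = mean(dates)
spread = stdev(dates)        # sample standard deviation (n - 1 denominator)

print(f"mean = {center:.0f} ybp, sd = {spread:.0f} years, "
      f"relative = {spread / center:.0%}")
```

Treating the three dates as repeated measurements of one quantity is itself a modelling choice: the 9- and 5-marker sets are subsets of the 14-marker set, so the three values are not independent.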

You have made many additional errors in your "explanations", but it is not worthwhile to go over them. The thing is that your main conclusion -

Thus I agree with Busby et al(2011) conclusion that is not the amount of STR but the sets of STRs what matters when determining TMRCA

is applicable to you personally, but not to the field itself. You just need to learn how to treat data correctly.

Anatole Klyosov; Saturday, September 03, 2011 9:55:00 pm
Onur Dincer said...: Anatole, your scenario of the spread of languages and haplogroups is very much on the speculative side of the debate so much so that it is on the border between science and pseudo-science.

BTW, Turkic languages are now accepted as a branch of the Altaic language family by most linguists, and both the homeland of the Turkic language family and the homeland of the Altaic family are commonly thought to be somewhere in Greater Mongolia and/or eastern part of Siberia, thus both language families were most probably originally spoken by full or almost full Mongoloid people. So your R1b1-Proto-Turkic connection isn't plausible (whatever R1b1 Central Asian Turkic people now carry seems to be entirely from the pre-Turkic inhabitants of Central Asia).; Sunday, September 04, 2011 12:54:00 am
jeanlohizun said...: Anatole Klyosov said:

The second part, in which you have manipulated with those five markers, is totally flawed. In fact, you got huge margins of errors, and you did not realize it. Instead, you wrote truly absurd things.

I did not manipulate anything sir, I would appreciate it if you would stop it with the Ad Hominems. If you think I manipulated something then why don’t you go back and count the amount of mutations present in the Adams et al(2008) sample for the markers DYS388(0.00022), DYS393(0.00076), DYS392(0.00052), DYS437(0.00099) and DYS438(0.00055). You would see that there are 464 +-40 mutations. You are the one that doesn’t realize that once you mix together slow STRs such as DYS388, DYS393, DYS392, DYS437, DYS438 with fast STR such as the other 9 you undermine the value of the slow STRs.

In fact, while the mean (0.002) mut/marker/gen was close to your approximated value of 0.0015, the standard deviation was 0.0014. You are quick to label my writings absurd, but you can’t truly explain why, when only the slowest 5 STRs are used, the sample produces a TMRCA older by at least 1500 years than that of the combined sample or the fastest 9 STRs. I’m simply showing you that the mean mut/marker/gen method is flawed, because for it to work you would have to have a direct proportionality between mutation rate and number of mutations present in each STR, and such proportionality does not exist, at least not in the Adams et al sample.
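The proportionality claim can be checked marker by marker for the five slow STRs, the only ones with both a rate and a mutation count quoted in this thread. A rough sketch: if counts were proportional to rates, each marker would imply the same uncorrected age. Single-marker ages are noisy and the rates carry their own 15-20% uncertainty, so this is an illustration, not a test.

```python
# Rates (mut/marker/gen) and mutation counts for the five slowest markers,
# both as quoted earlier in the thread
markers = {
    "DYS388": (0.00022, 17),
    "DYS392": (0.00052, 67),
    "DYS393": (0.00076, 103),
    "DYS437": (0.00099, 223),
    "DYS438": (0.00055, 54),
}
N_HAPLOTYPES = 750
YEARS_PER_GEN = 25

# Under strict proportionality every marker gives the same value of
# (mutations per haplotype) / rate, i.e. the same uncorrected age.
ages = {}
for name, (rate, count) in markers.items():
    ages[name] = count / N_HAPLOTYPES / rate * YEARS_PER_GEN
    print(f"{name}: {ages[name]:.0f} years (uncorrected)")

print(f"max/min ratio = {max(ages.values()) / min(ages.values()):.1f}")
```

With these inputs the implied ages differ by nearly a factor of three between DYS388 and DYS437.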

to be continued...; Sunday, September 04, 2011 2:59:00 am
jeanlohizun said...: Anatole Klyosov said:

Look what have you done:

14STR--- TMRCA(25 years/gen): 3529 +-31.25 ybp

Can you imagine? Margin of error of 0.9% (!!) What is 31.25 years (!?)

Next:

9STR-- TMRCA(25 years/gen): 3349 +-112 ybp

Margin of error of 3% (!!)

Next:

5 STR---TMRCA(25 years/gen): 5412 +-495 ybp

Margin of error of 9%.

Those margins of error do not include the uncertainty on the mutation rate which is 15-20%, I was simply trying to show the readers that while in your ideal world slow STRs, fast STRs and the combined sample should all produce a somewhat close result, the reality is that they don’t. There is no way around it, using the 5 slowest STRs in the Adams et al sample produces 464 mutations, which divided by the average mutation rate of those 5 slowest produces 209 generations to TMRCA without the correction for back mutations.

Anatole Klyosov said:

It is all grossly incorrect. The way how you have been doing it would give you margins of error no less than 60-80%.

No sir, your assumption of a mean mutation rate has margins of error on the 60-80% order. Your mean mut/marker/gen for 25 markers has a standard deviation of 0.0024 almost the same as the mutation rate. I know in your assumptions you estimated the standard deviations to be in the order 5-10%, however the calculated ones are far worse than your estimated ones.

Anatole Klyosov said:

Result: all your calculations are based on an incorrect platform. You should have re-calibrated your "estimated" 4, 5, and 9-marker "haplotypes", using some known genealogy, to make sure how they work. In fact, all "datings" which you have obtained - 3529, 3349, and 5412 are around ~ 4100+/-1140 ybp, within 30% margin of error. That is what you have actually obtained.

You have made many additional errors in your "explanations", but it is not worthwhile to go over them. The thing is that your main conclusion -

Thus I agree with Busby et al(2011) conclusion that is not the amount of STR but the sets of STRs what matters when determining TMRCA

is applicable to you personally, but not to the field itself. You just need to learn how to treat data correctly.

Results: I have just proven to you that you cannot use a combined set of STRs where some STRs are two orders of magnitude faster than the slowest STRs. I used the same data you used, which anyone here can use, and anyone who counts the mutations in the slowest STRs will see what I’m saying. Whether you are ready to admit that your methodology might have overlooked the possibility of the STRs not behaving linearly is another story. Again, I could explain it to you in more detail if you want, but basically STRs do not behave as nicely and predictably as you want to portray them.; Sunday, September 04, 2011 3:01:00 am
Anatole Klyosov said...: My dear friend,

You have just written many words on a subject you are not knowledgeable in. Your manipulations are all based on an indiscriminate extraction of Chandler's numbers without their verification. Without even thinking that they (or just one of them) might be incorrect. Has it occurred to you that some numbers might be incorrect indeed?

You have overlooked my comment above in this thread:

>>Indeed, Chandler's table, the most reliable one for the first 12 markers...

Why do you think I mentioned only the first 12 markers as the most reliable? Because I know all of them, and examined and cross-examined each one of them. This reflects the principal difference between a professional such as myself (in kinetics of time-related processes) and you with your ignorance in the subject. You just grab numbers without thinking how reliable those numbers are.

You do not look at the core of the problem. You do not want to pay any attention to the fact that those Iberian R1b1a2 haplotypes produce essentially the same TMRCA whether they are calculated using 19 marker haplotypes, 25 marker, 37 marker, or 67 marker haplotypes, using either linear or logarithmic methods. All those results are within margins of error. In spite of this obvious result, you grab something indiscriminately, manipulate it mindlessly, and voila. In a way, you have repeated the same flawed "approach" as the paper beloved by you. They also grabbed something (wrong "rates" from father-son pairs) without thinking, and voila.

THIS is pseudo-science. And you with your manipulations fully belong to it, at least in this particular case.

Anatole Klyosov; Sunday, September 04, 2011 3:53:00 pm
Anatole Klyosov said...: Onur said...

...your scenario of the spread of languages and haplogroups is very much on the speculative side of the debate so much so that it is on the border between science and pseudo-science.

It is rather senseless to discuss the matter after I laid out here the background of the concept point by point, and you did not bother to respond with DATA. Pseudo-science is the way you chose to respond.

Turkic languages are now accepted as a branch of the Altaic language family by most linguists

WHICH Turkic languages? Do you know datings of origin/split of those languages which are considered by "most linguists"? Do you realize that I am talking about a time period for R1b1 and their language between 16 and 5 thousand ybp?

>...both the homeland of the Turkic language family and the homeland of the Altaic family are commonly thought to be somewhere in Greater Mongolia and/or eastern part of Siberia... So your R1b1-Proto-Turkic connection isn't plausible (whatever R1b1 Central Asian Turkic people now carry seems to be entirely from the pre-Turkic inhabitants of Central Asia).

You gave a very confusing and self-conflicting statement (rather, a mix of conflicting statements), considering that R1b1 arose in Central Asia, and likely in the Altai region, that their language was "Erbin" which linguists have not even considered (if they did, a reference please), and that considering their possible route this Erbin might have been a Proto-Turkic language. How much do linguists know about Proto-Turkic languages? How much do YOU know about them? Have you ever read about Proto-Turkic languages? I have.

How would you respond to a paper featured in the current Dienekes selection, which concluded that Turkic language in the present day Turkey was not brought from the East, as many other Turkic languages in the world?

I would not bet on that conclusion, however, it does not conflict with what I am talking about. Maybe there is something in it.

A lesson to you - don't throw around words such as "speculative", "pseudo-science". It is a bad sign, at least for a scientist. I doubt you are one of them. Your words mean that you either do not think, or just cannot think. You probably have not heard of "brain storming", when you first express various scenarios, and then look at available data and see what potentially confirms and what clearly contradicts. Where is that balance in your "critique"?

In a way, I am pleased. I see time and again that there are not many people around who want and can think and analyze DATA.

Anatole Klyosov; Sunday, September 04, 2011 4:25:00 pm
Dienekes said...: Turkic languages belong to the Altaic language family. Altaic languages (Mongolic, Turkic, and Tungusic, and more distantly Korean and Japanese) are primarily located in central-east Eurasia and spoken by Mongoloids and admixed Mongoloids. We also have good evidence now that there is a common autosomal component to Altaic speakers (see one of the links in the post), and this component is also aligned with East Eurasians (Mongoloids).

Every single piece of evidence points to the fact that Turkic languages were first spoken by Mongoloids. So, even if Anatole was right about R1b in the Altai a very long time ago -which he isn't, as discussed in GENEALOGY-DNA-L in 2010- there is no reason to associate that R1b with any sort of Turks. Actually the time depth he is talking about precedes the formation of the Turkic languages anyway.

Anatole always speaks about DATA, but he disregards all the data and relies entirely on his own Y-chromosome analysis to pretty much come up with the most imaginative scenaria.; Sunday, September 04, 2011 9:44:00 pm
Dienekes said...: How would you respond to a paper featured in the current Dienekes selection, which concluded that Turkic language in present-day Turkey was not brought from the East, as many other Turkic languages in the world?

Turkic languages were brought to Anatolia from the east; we are fortunate that this event happened in full light of history, so there is no argument to be had here.

What the current paper has concluded is that the arrival of Turkic languages in Anatolia was not the result of massive migrations from the East, which is a reasonable conclusion that has been confirmed time and again.; Sunday, September 04, 2011 9:46:00 pm
jeanlohizun said...: Anatole Klyosov said:
Why do you think I mentioned only the first 12 markers as the most reliable? Because I know all of them, and examined and cross-examined each one of them. This reflects the principal difference between a professional such as myself (in kinetics of time-related processes) and you with your ignorance in the subject. You just grab numbers without thinking how reliable those numbers are.

No sir, I’m sorry but I have had it with your ad hominems. Firstly, you are NOT a geneticist; you are a biochemist who is doing genetics as a hobby. Yet you dare to call the work of other geneticists, and that of teams of scientists, “utter nonsense”. I was trying to be respectful towards you because you are older than me, and because unlike you I have professionalism. Anyhow, here is a taste of reality.

So you think that your estimated mean mutation rate from the Donald clan project is far more reliable than using empirically measured mutation data from father-son pairs? Well I got news for you: certain STRs such as DYS388 mutate differently in different haplogroups, so there goes the first strike in using the data from R1a1 to R1b1a2. The second strike is that, as I showed above and, unfortunately for you, will show again using different data sets, the use of a combined STR set tends to result in disastrous TMRCA estimations. I bet you didn’t even bother to check the standard deviation of the combined STR set. You seem to have forgotten that when the standard deviation of a set of mutation rates is almost the same as their mean, then you are in trouble. Something you don’t realize is that applying the mutation counting method to the Donald clan assumes that in the slowest markers the R1a1 people of the Donald clan would have the same number of mutations relative to their sample size and to their TMRCA as the Iberian set from Adams et al. (2008).

In order for your calibrated mutation rates to work (even if used independently and not in the disastrous mean mut/marker/gen method), if the people from the Donald Clan project had, for example, 14 mutations in DYS388, DYS392, DYS393, DYS437, and DYS438 across their 88 haplotypes, the people from Adams et al. (2008) would need to have a similar number proportional to their sample size and TMRCA. You don’t take into account the effect of random chance, and the fact that a completely different set can have more or fewer mutations in those five positions. Also sir, did you know that the forward and backward mutation rates can be different for several STRs?

Again this has come down to you pretty much trying to invalidate the arguments of others based on you being a “Professional” and having written papers about it (though not published in any major journal). Again you seem to make a lot of assumptions, for example, you are assuming that I have no previous experience with STR mutation rates, as well as, that I do not have a degree in a related field.

So in this hypothetical example of the people from the Donald Clan having 14 mutations in the 5 positions mentioned above, what if one were to test a different set of people who had a known common ancestor also 26 generations ago, and they turned out to have only 4 mutations in those 5 positions? The only way to truly calibrate the rates using genealogies instead of actual measured empirical data would be to test multiple datasets with a known common ancestor, average the mutation rate of each STR independently, and check that the standard deviation is somewhat reasonable. Anyhow, if this is going to come down to you just claiming that no one but you understands the mutation rates, I think we have reached the end of our discussion. I would gladly back off now, before this discussion gets too personal, which I’m afraid it is getting. Nonetheless I will keep bringing forth data that shows that the use of a combined STR sample is flawed because fast STRs undermine slow STRs in large datasets.

Regards,

Jean Lohizun; Sunday, September 04, 2011 10:30:00 pm
Onur Dincer said...: It seems Dienekes has already written exactly what more or less I would have written in reply to Anatole's comment, so there is no need to add anything to Dienekes' last comments. The only thing I could add is that Proto-Altaic and all of its Proto-branches (Proto-Turkic, Proto-Mongolic, Proto-Tungusic, Proto-Korean and Proto-Japonic) were most probably all spoken in a region comprising what is now Greater Mongolia, northeast of what is now China and/or the eastern parts of Siberia, thus by full or almost full Mongoloids. Also, the Altai region isn't fully in Central Asia but in the intersection between Central Asia, Greater Mongolia and eastern Siberia. Central Asia proper is the region comprising the former Soviet republics that end with "-stan", and, unlike the regions that are the most likely homelands of the Proto-Altaic and Proto-Turkic languages, which are both east of Central Asia and are Mongoloid regions from time immemorial, Central Asia proper was originally inhabited by full or almost full Caucasoids until the migrations of Altaic speakers there beginning from the 1st millennium BCE at the earliest.; Sunday, September 04, 2011 10:32:00 pm
Anatole Klyosov said...: Dienekes said...

Turkic languages belong to the Altaic language family... So, even if Anatole was right about R1b in the Altai a very long time ago -which he isn't, as discussed in GENEALOGY-DNA-L in 2010- there is no reason to associate that R1b with any sort of Turks. Actually the time depth he is talking about precedes the formation of the Turkic languages anyway.

The last sentence in the above quotation dismisses the first one. There are more inconsistencies in the quotation; however, I have gotten used to it.

The discussion is pointless unless you, Dienekes, describe which language the earlier R1b1 spoke, between, say, 16 and 6 thousand years ago (or in any time period within that range). Why argue if you do not know? You cannot dismiss what you do not know, and when you do not have an answer.

This is a fundamental part of a scientific paradigm: if you do not know, and cannot even suggest, do not argue.

Science is not based on dismissals and denials, it is based on advancing of hypotheses and on their examinations and verifications.

What you do is neither examination nor verification. It is an attempt at vague discrediting without offering a counterhypothesis.

Anatole Klyosov; Monday, September 05, 2011 1:56:00 am
Anatole Klyosov said...: >jeanlohizun said...

So you think that your estimated mean mutation rate from the Donald clan project is far more reliable than using empirically measured mutation data from father-son pairs?

No doubt. However, you forgot to mention that the Donald Clan was just the first step, and after it the data obtained were cross-examined and cross-verified on dozens of genealogies, historical events and other evaluations, and some adjustments were made and again examined and verified.

If you care, in the Proceedings, December 2010, p. 2039-2058, the first paper has the title "Reconsideration of the average mutation rate constant for 67 marker haplotypes from 0.145 to 0.120 mutations per haplotype per generation", by Klyosov and Rozhanskii. Hundreds of 67 marker datasets are considered there, to make the reconsideration. Can you see how serious people are who work in the area?

I have explained why the father-son pairs are not there yet, and why they often give misleading and incorrect data due to poor statistics. Furthermore, I gave above concrete examples of those incorrect and inconsistent data from multiple father-son pairs. It is not my fault that you cannot understand it.

You are constantly bragging that I am not a geneticist. Mutation rate constants are not "genetics", it is chemical kinetics, my direct profession. You fail to understand it as well.

Well I got news for you: certain STRs such as DYS388 mutate differently in different haplogroups, so there goes the first strike in using the data from R1a1 to R1b1a2.

:-))))))))))

Here your ignorance goes again. It is not "news", I have researched into it for years. The verdict: DYS388 as well as ALL mutation rate constants are the same for ALL haplogroups. In short - the copying enzyme (in fact, the whole copying machinery) does not know what haplogroup is picked by us for that particular individual or a population.

DYS388 seems to be "jumpy" in the J2 haplogroup only because one does not separate branches on a haplotype tree. Because the haplogroup is an ancient one, it contains many branches with different DYS388 values. When one mixes them, the dataset contains DYS388 alleles in a rather wide range. When one separates branches, DYS388 is the same in each one of them. The best mutation rate constant for DYS388 is 0.00022 per the conditional generation of 25 years. It is THE SAME for all haplogroups and their subclades.

Examples for father-son transmissions:

-- in the Ballantine series: 0 mutations in 1636 pairs, that is the MRC is <0.0006 per generation,
-- in the Burgarella collection: 0.00042 per generation.

Do you see how jumpy and inconsistent data are in father-son pairs, even for about 2000 of them?

In the R1b1a2-P312 series of 2299 haplotypes (that is, 2299 DYS388 values) - 82 mutations. Since the calculations gave 4000 ybp for the dataset (to a common ancestor), that is 160 "conditional generations" of 25 years, the mutation rate constant for DYS388 is equal to 82/2299/160 = 0.00022 per generation.

In the R1a1 series of 1198 haplotypes (that is, 1198 DYS388 values) - 48 mutations (including 17 mutations with DYS388=10 counting them as one mutation each). Since the calculations gave 4600 ybp for the dataset, that is 184 "conditional generations" of 25 years, the mutation rate constant for DYS388 is equal to 48/1198/184 = 0.000217 per generation, that is practically 0.00022.

In the Chandler table it is the same 0.00022 per generation.
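The arithmetic in the two DYS388 examples above can be checked with a short sketch. The function name is hypothetical, and the formula is simply the one used in this comment (mutations divided by haplotypes and by generations); the age of each dataset is taken as given from the text, not derived here.

```python
def rate_per_generation(mutations, haplotypes, years, years_per_generation=25):
    """Mutations per marker per generation, following the division used above."""
    generations = years / years_per_generation
    return mutations / haplotypes / generations

# DYS388 figures quoted in the comment
p312 = rate_per_generation(82, 2299, 4000)  # 4000 ybp -> 160 generations
r1a1 = rate_per_generation(48, 1198, 4600)  # 4600 ybp -> 184 generations

print(f"R1b1a2-P312: {p312:.6f}")
print(f"R1a1:        {r1a1:.6f}")
```

Both values come out at roughly 0.00022. Note that the result scales inversely with the assumed age of the dataset, so it is only as good as that age.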

you are assuming that I have no previous experience with STR mutation rates, as well as, that I do not have a degree in a related field.

I do not care about your degree, I have seen in my life tons of ignorant people with degrees. Regarding "STR mutation rates" you are totally unqualified.

I would suggest you to listen to a professional.

Anatole Klyosov; Monday, September 05, 2011 3:03:00 am
Dienekes said...: This is a fundamental part of a scientific paradigm: if you do not know, and cannot even suggest, do not argue.

Science is not based on dismissals and denials, it is based on advancing of hypotheses and on their examinations and verifications.

Science IS based on dismissals and denials: it's called falsification. What I wrote is an entire array of arguments falsifying your R1b-Proto-Turkic theory. You can argue against my arguments, if you want, but you can't claim that falsification is not scientific.

Moreover, if you actually read what I wrote, you'd see that I _did_ advance a hypothesis about the origin of Turkic languages: that they came from Siberia/Central Asia and were associated with Mongoloids initially.; Monday, September 05, 2011 9:11:00 am
Onur Dincer said...: I _did_ advance a hypothesis about the origin of Turkic languages: that they came from Siberia/Central Asia and were associated with Mongoloids initially.

It is very unlikely that Proto-Turkic or any other Altaic main branch or Altaic itself developed in Central Asia proper. From the early Chinese records we know that Altaic speaking tribes (including Turkic speaking ones) primarily lived in a region north of, not west of, where Chinese speakers lived in early Chinese historical times (well into the Imperial Chinese times). Central Asia proper, on the other hand, was home to various Indo-European speakers before the Turkic and other Altaic expansions, which began during the 1st millennium BCE at the earliest (for most of the southern regions of Central Asia proper, as late as the 2nd millennium CE), thus well after the formation of the main branches of the Altaic language family. I also think the term "Altaic" is a misnomer, as the Altaic main branches are concentrated in a region east of the Altai mountains, so the Altaic homeland is probably east of the Altai region.

Correction to: Central Asia proper was originally inhabited by full or almost full Caucasoids until the migrations of Altaic speakers there beginning from the 1st millennium BCE at the earliest

Central Asia proper was originally inhabited by non-Altaic-speaking full or almost full Caucasoids until the migrations of Altaic-speaking Mongoloids to there from the east beginning from the 1st millennium BCE at the earliest; Monday, September 05, 2011 12:19:00 pm
jeanlohizun said...: Anatole Klyosov said:

If you care, in the Proceedings, December 2010, p. 2039-2058, the first paper has the title "Reconsideration of the average mutation rate constant for 67 marker haplotypes from 0.145 to 0.120 mutations per haplotype per generation", by Klyosov and Rozhanskii. Hundreds of 67 marker datasets are considered there, to make the reconsideration. Can you see how serious people are who work in the area?

I can see that you stubbornly keep using the average of combined sets of STRs (slow + fast), even though anyone with access to any data set can see by running a quick experiment that oftentimes the data from one data set cannot be extrapolated to another one and, even more so, that in big data sets fast STRs undermine the TMRCA compared to slow STRs. So what if you analyzed hundreds of 67 marker datasets? None of them were actual datasets of father-son pairs, where one can truly see the true mutation rate of several markers across a generation. All of them were based on the assumption of a single common ancestor who lived x time ago, who you assume is the common ancestor of all branches of that tree, with all branches x generations from him. Again, all these assumptions carry a lot of built-in errors with them that you keep dismissing. Look, let’s talk science and not psychology: if you were “a serious member who works in the area” like you claim to be, you would have had a grant, you would have gone into the field and collected data yourself, and your results would have been in one of the major genetics journals. Instead you scavenge others’ collected data (i.e. Adams et al. 2008) or use data collected from FTDNA projects, which no serious person would consider using because it lacks the quality control required for it to be considered a randomly collected representation of a population.

Anatole Klyosov said:

I have explained why the father-son pairs are not there yet, and why they often give misleading and incorrect data due to poor statistics. Furthermore, I gave above concrete examples of those incorrect and inconsistent data from multiple father-son pairs. It is not my fault that you cannot understand it.

Not quite. The fact that the sample size might be small doesn’t invalidate it at all; it is still by far the best empirically collected data we have on mutation rates across a generation. So they are "often misleading" because they contradict your estimated mutation rates. Thus your argument turns again into: if it doesn’t agree with my data, then it is wrong. Well, your data is subject to tons of errors, and I tell you what: for the 25 markers, find the average mutation rate in each one of those 25 positions, then tell me what the mean mutation rate is, and what the standard deviation for that mean is. So it is your methodology that is misleading and incorrect because of poor statistics (i.e. the mean mut/marker/gen for 25 markers, ~0.002, is in the same range as the standard deviation).

Anatole Klyosov said:

You are constantly bragging that I am not a geneticist. Mutation rate constants are not "genetics", it is chemical kinetics, my direct profession. You fail to understand it as well.

No, it is more like you are constantly bragging of being an expert and attacking others by calling them ignorant, I was simply kindly reminding you that you are no geneticist. The thing is that there might be no such thing as mutation rate constants. Moreover molecular genetics papers require deep knowledge of Statistics, Math, and Biomechanics which are not your direct profession.

Anatole Klyosov said:

Here your ignorance goes again. It is not "news"….

Here goes your favorite tactic again, ad hominems left and right. Yes, in an ideal world a position along the Y-Chromosome where a Short Tandem Repeat occurs should have nothing to do with SNPs found at other positions in the Y-Chromosome; however, correlations have been found between them. This is rather puzzling, yet true.

to be continued..; Monday, September 05, 2011 3:16:00 pm
jeanlohizun said...: Anatole Klyosov said:

Examples for father-son transmissions:

-- in the Ballantine series: 0 mutations in 1636 pairs, that is the MRC is <0.0006 per generation,
-- in the Burgarella collection: 0.00042 per generation.

Do you see how jumpy and inconsistent data are in father-son pairs, even for about 2000 of them?

The data is neither jumpy nor inconsistent; it is randomly collected data, so it comes as no surprise that DYS388, being a slow marker, had no mutations present in one of the samples. It is still far better than some estimated mutation rate with a trillion assumptions. Only a person like you would argue that estimates are better than empirical data.
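To make the "no surprise" point concrete, here is a minimal sketch of how likely a zero count is in 1636 father-son pairs. It assumes each transmission is independent with a fixed per-generation mutation probability (a simple binomial model, which is my assumption and not something either study states), and the function name is hypothetical.

```python
def prob_zero_mutations(rate, pairs):
    """Chance of seeing no mutation at all in `pairs` independent transmissions."""
    return (1.0 - rate) ** pairs

PAIRS = 1636  # father-son pairs in the Ballantine series quoted above

# The two DYS388 rates quoted in this thread
for rate in (0.00022, 0.00042):
    p0 = prob_zero_mutations(rate, PAIRS)
    print(f"rate {rate}: P(0 mutations in {PAIRS} pairs) = {p0:.2f}")
```

Under these assumptions a zero count would not be unusual at either of the two rates, so the two father-son results are not in conflict with each other.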

Anatole Klyosov said:

In the R1b1a2-P312 series of 2299 haplotypes (that is, 2299 DYS388 values) - 82 mutations. Since the calculations gave 4000 ybp for the dataset (to a common ancestor), that is 160 "conditional generations" of 25 years, the mutation rate constant for DYS388 is equal to 82/2299/160 = 0.00022 per generation.

:-)) How did you know their common ancestor was 160 “conditional generations” ago? Well you had to use estimated numbers from another data set. So you seem to forget that if there is an intrinsic error carried from the other calculations (which I’m pretty sure by now, there is), this mutation rate constant would also carry that intrinsic error. Again it seems we are going in circles.

Anatole Klyosov said:

I do not care about your degree, I have seen in my life tons of ignorant people with degrees. Regarding "STR mutation rates" you are totally unqualified.

I would suggest you to listen to a professional.

Of course you don’t care about my degree, because it is not of importance to this conversation. As for me being unqualified to talk about “STR mutation rates”, well, let the people choose who is qualified and who is not. Yes, I listen to professionals every day at my workplace, and it is because I listen to them that I can clearly show everyone here how flawed your methodology is. I will tell you, though, that you do not sound like a professional at all.

Kind Regards,

Jean Lohizun.; Monday, September 05, 2011 3:17:00 pm
Onur Dincer said...: From the early Chinese records we know that Altaic speaking tribes (including Turkic speaking ones) primarily lived in a region north of, not west of, where Chinese speakers lived in early Chinese historical times (well into the Imperial Chinese times).

Anatole, just in case you don't know already, let me inform you: All the regions north of Chinese speakers have been Mongoloid regions from time immemorial.; Monday, September 05, 2011 4:58:00 pm
Anatole Klyosov said...: jeanlohizun said...

Well, forget it. It does not matter what he said. The individual has shown that he was not receptive to any data, reasoning, and explanations. He is a lost case.

I gave up.

Anatole Klyosov; Monday, September 05, 2011 5:20:00 pm
Anatole Klyosov said...: Dienekes said...
Onur said...

All right, now you are talking about some interesting things, minus confusions, unrelated to the subject of R1b1 appearance, migrations, and their language.

For example, known Turkic languages, which appear in the 1st millennium AD, are irrelevant here, since we are talking about R1b1 between, say, 16 and 6 thousand ybp. So why constantly bring in contemporary Turkic languages?

If you folks did not catch it, my "R1b-Proto-Turkic theory" can be equally called "R1b-Sino-Caucasian theory", or "R1b-Erbin theory", or whatever, since linguists do not have a name for that language. They apparently see some remnants of it, however, do not know where to assign it. Hence, "Eniseian", "Proto-Turkic", "Sino-Caucasian", "Sumer", "North-Caucasian", "Basque" languages, which are, apparently, all tips of one and the same iceberg, which was an ancient R1b language. Erbin.

Here is what I am talking about. All said languages are agglutinative languages, all belong to different time periods, all are found on the migration route of R1b1. That is why they do not look alike. They reflect different millennia, hence, differ from each other but all have similar elements. That is why Sumer language was considered as Manjurian, Siberian, having Turkic elements, etc. by various linguists, and all, of course, are in disagreement with each other. Sounds familiar? That is why Basque language was considered as related to North-Caucasian languages, or Sino-Caucasian, and some linguists support it and some deny it. It is normal, since all those are ancient languages.

Regarding "Mongoloids", both Q and R are sister haplogroups, both are from Central Asia, and, maybe, both are from the same Altai region. So why could R1b1 and Q not have interacted some 16,000 years ago and much later?

Unless we all agree that what is written above makes sense, there is no point to argue.

Do we all agree? If not, why not?

Anatole Klyosov; Monday, September 05, 2011 5:44:00 pm
Dienekes said...: Hence, "Eniseian", "Proto-Turkic", "Sino-Caucasian", "Sumer", "North-Caucasian", "Basque" languages, which are, apparently, all tips of one and the same iceberg, which was an ancient R1b language. Erbin.

Linking these various languages is a very controversial, albeit valid hypothesis.

The link, however, of the proposed macro-family with R1b is completely arbitrary. What is the evidence that "Sumer" was R1b or that Sino-Tibetan (which is part of Sino-Caucasian) was R1b?

Regarding "Mongoloids", both Q and R are sister haplogroups, both are from Central Asia, and, maybe, both are from the same Altai region. So why could R1b1 and Q not have interacted some 16,000 years ago and much later?

First of all, R1b1 is not from the Altai even if R is from Central Asia (something that is uncertain in itself).

Second, R1b1 is not the same as R, and R1b1 is absent in a great many Altaic speaking populations. The fact that R1b1 is found in some Turkic populations does not mean that it goes up to the Proto-Altaic population; indeed the largely Mongoloid character of that population and the absence of R1b1 in most of its extant branches argue strongly against it. So, you have trouble getting R1b1 to Proto-Altaic, let alone whatever hypothetical "Erbin" might have been spoken in even more ancient times in inner Asia.; Monday, September 05, 2011 9:20:00 pm
jeanlohizun said...: Here is something, folks, so you can see how the approach taken by Klyosov in calculating TMRCA by calibrating the mutation rates per generation using family clan projects interacts with reality. Klyosov has completely dismissed any empirically measured data from father-son pairs because of what he calls “poor statistics”, mainly fluctuations in the mutation rate per generation when two different samples are tested. He claims such is not the case with his calculations; this is what he said about it:

In the R1b1a2-P312 series of 2299 haplotypes (that is, 2299 DYS388 values) - 82 mutations. Since the calculations gave 4000 ybp for the dataset (to a common ancestor), that is 160 "conditional generations" of 25 years, the mutation rate constant for DYS388 is equal to 82/2299/160 = 0.00022 per generation.

In the R1a1 series of 1198 haplotypes (that is, 1198 DYS388 values) - 48 mutations (including 17 mutations with DYS388=10 counting them as one mutation each). Since the calculations gave 4600 ybp for the dataset, that is 184 "conditional generations" of 25 years, the mutation rate constant for DYS388 is equal to 48/1198/184 = 0.000217 per generation, that is practically 0.00022.

Of course, because the TMRCA is not a known one, but an estimated one using the same methodology, this is going in circles. Why? Because if his mean mutation rate methodology is known for underestimating TMRCA, using an underestimated TMRCA would likely result in the mutation rate being faster than it truly is. Nonetheless I have argued that one cannot extrapolate estimated data from one set to another; instead it is better to use the empirically measured data from father-son pairs. But let’s assume for a second that Anatole Klyosov is correct, and as he showed above we should expect DYS388 to yield an average mutation rate around 0.00022. Let’s see what happens when we use a different sample:

The Iberian sample from Adams et al. (2008) was used in the first paper published by Anatole Klyosov as a practical example of his methodology. He puts the base haplotype value of DYS388 at 12. The Iberian sample has 19 mutations (17 mutations up, 2 down) at DYS388 amongst the 750 R1b-M269 haplotypes. Anyone can repeat the analysis using the data in the supplementary information section:
http://download.cell.com/AJHG/mmcs/journals/0002-9297/PIIS0002929708005922.mmc1.pdf

Now Anatole Klyosov in his study used the mean of 0.0015 mut/marker/gen obtained from counting the mutations in the same 19 STRs used by Adams et al. (2008) in the members of the Donald Clan project, who all descend from a putative ancestor, John Lord of the Isles, who was an R1a1 bearer and lived 26 generations ago. With this mean mutation rate Anatole Klyosov dated the TMRCA of Iberians to 3625+-370 ybp.

to be continued; Tuesday, September 06, 2011 12:37:00 am
jeanlohizun said...: Here are his results:

“The "mutation count" method gives 2796/750/19 = 0.196±0.004 mutations per marker (without a correction for back mutations, that is λobs =0.196±0.004), or after the correction it is 0.218 ± 0.004 mutations per marker, or 0.218/0.0015 = 145±15 generations, that is 3625±370 years to a common ancestor of all 750 Iberian R1b1 haplotypes.”

http://www.jogg.info/52/files/Klyosov1.pdf
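
For anyone who wants to retrace the quoted numbers, here is a minimal sketch of the "mutation count" arithmetic. The function name is hypothetical, and the back-mutation-corrected value 0.218 is copied from the quote rather than recomputed, since the correction formula is not given in this thread.

```python
YEARS_PER_GENERATION = 25      # "conditional generation" used in the quote
MEAN_RATE = 0.0015             # mut/marker/gen, from the Donald Clan calibration

def mutations_per_marker(total_mutations, haplotypes, markers):
    """Average number of mutations per marker across the dataset."""
    return total_mutations / haplotypes / markers

lam_obs = mutations_per_marker(2796, 750, 19)   # uncorrected value
lam_corrected = 0.218                           # taken as given from the quote

generations = lam_corrected / MEAN_RATE
years = round(generations) * YEARS_PER_GENERATION  # quote rounds to whole generations

print(f"observed:    {lam_obs:.3f} mutations per marker")
print(f"generations: {generations:.1f}")
print(f"years:       {years}")
```

The whole estimate rests on the single mean rate of 0.0015: every other number in the chain is a count or a fixed constant.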

Now let’s try to see if the readily available mutation rate for DYS388, which was shown to yield similar results in two different datasets according to Klyosov, reproduces the TMRCA in this different sample. There are 19 mutations in DYS388, with 17 up and 2 down; this has a degree of asymmetry of 0.8947, thus a=0.6233, a1=0.3149.

The observed mutation rate per marker is:

19/750=0.02533 mutations per marker.

Applying the correction factor yields 0.02543.

Now 0.02543/0.00022 mut/marker/gen=116 generations (2890 ybp), clearly outside the range of error of 3625+-370 ybp. Does this mean that all Iberians descend from a common ancestor who lived 2890 years ago? Likely not. What this means is that if we assume the TMRCA of Iberians to be 3625 ybp (145 generations), as Anatole Klyosov claimed, then the estimated mut/marker/gen that would produce such a TMRCA would be 1.754*10^-4. This shows what I said above: using an underestimated TMRCA (i.e. 4000 ybp for R1b-P312) would likely result in the mutation rate being faster than it truly is (i.e. 0.00022>0.000175).
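
The DYS388 steps above can be retraced with a minimal sketch. The function name is hypothetical, and the corrected value 0.02543 is taken as given from the text rather than recomputed from the asymmetry factors.

```python
YEARS_PER_GENERATION = 25   # "conditional generation" used in this thread

def generations_to_ancestor(mutations_per_marker, rate):
    """Generations implied by a mutation load and a per-generation rate."""
    return mutations_per_marker / rate

observed = 19 / 750         # DYS388 mutations per haplotype in the Iberian set
corrected = 0.02543         # after the correction, copied from the text

gens = generations_to_ancestor(corrected, 0.00022)
print(f"observed: {observed:.5f}")
print(f"{gens:.0f} generations, about {gens * YEARS_PER_GENERATION:.0f} years")

# Rate that would be needed to reproduce 145 generations (3625 ybp) instead
implied_rate = corrected / 145
print(f"implied rate for 145 generations: {implied_rate:.3e}")
```

The same division run in both directions is the whole argument: fix the rate and the age follows, or fix the age and the rate follows.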

Now let’s look at DYS392, which has 71 mutations (35 mutations up, 36 down) from the base haplotype. This has a factor of symmetry of 0.5070, thus a=1.983*10^-4, and a1=0.9989. The observed mutation rate per marker is:

71/750=0.09466

Applying the correction factor yields 0.09935.

Now using estimated mean mut/marker/gen of Chandler et al(2006) of 0.00052 yields:
0.09935/0.00052 mut/marker/gen=191 generations(4776 ybp).

Using empirically measured mutation rates for DYS392, which were estimated to have a mean mut/marker/gen of 4.123 × 10^-4 (95% CI: 1.513 × 10^-4 to 8.972 × 10^-4), we get 0.09935/0.0004123=241 generations (6024 ybp).
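
As a rough illustration of how much the choice of rate matters, the sketch below reruns the DYS392 division for each rate quoted above, including the two ends of the 95% CI. Treating the CI endpoints as plain rate bounds while holding the mutation count fixed is my simplifying assumption; the labels and function name are hypothetical.

```python
YEARS_PER_GENERATION = 25
CORRECTED = 0.09935   # corrected DYS392 mutations per marker, from the text

def age_years(rate):
    """Age implied by the corrected mutation load at a given per-generation rate."""
    return CORRECTED / rate * YEARS_PER_GENERATION

rates = {
    "Chandler (2006)":        0.00052,
    "father-son mean":        4.123e-4,
    "father-son CI, fastest": 8.972e-4,
    "father-son CI, slowest": 1.513e-4,
}

ages = {label: age_years(rate) for label, rate in rates.items()}
for label, age in ages.items():
    print(f"{label:24s} {age:8.0f} ybp")
```

With the rate allowed to range over the whole interval, the implied age spans from under 3000 to over 16000 years, which is why single-marker dates should be read with great caution.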

So thus far it has become very clear that estimated mutation rates tend to underestimate TMRCA even for slow markers, and that calibrated mutation rates estimated from FTDNA projects for family clans fail when extrapolated to a different dataset. So where does this leave us? We have to keep collecting more data from father-son pairs, and even three-generation triples, so that we at least get a more decent picture of how mutation rates work. In any case, if we find a TMRCA using a set of slow STRs which is far greater than that of a combined set of STRs, chances are that the slow STRs TMRCA represents the true TMRCA, as the combined set might be affected by back mutations happening in the other STRs which even the correction factor might not account for. Should anyone have any questions, please let me know.

Kind Regards,

Jean Lohizun.; Tuesday, September 06, 2011 12:38:00 am
Anatole Klyosov said...: Dienekes said...

AK >>Hence, "Eniseian", "Proto-Turkic", "Sino-Caucasian", "Sumer", "North-Caucasian", "Basque" languages, which are, apparently, all tips of one and the same iceberg, which was an ancient R1b language. Erbin.

Dienekes Linking these various languages is a very controversial, albeit valid hypothesis.

The link, however, of the proposed macro-family with R1b is completely arbitrary. What is the evidence that "Sumer" was R1b or that Sino-Tibetan (which is part of Sino-Caucasian) was R1b?

The purpose of my "presentation" here is not to convince everyone, nor even to lay out here all relevant facts and observations. There is no room for it, nor is it my desire, nor your intention to listen. The purpose is to show how science works: one collects some facts and observations, advances a hypothesis which explains a chain of things and events which are not explained as yet, and sometimes have not even been considered from the same angle, then examines pros and cons, adds some other facts and observations which were missed in the first version, adjusts some parts of the hypothesis, etc. It is a never ending process.

Nothing is easier than to sit on a fence and criticize without adding anything to our knowledge in this particular case.

Nothing is easier than to say "The link, however, of the proposed macro-family with R1b is completely arbitrary". In fact, it is not. However, offer another haplogroup, or several of them if you KNOW (based on DATA) which are those several ones. Q? No, it is very unlikely. O? N? R1a? I highly doubt it. There is only one haplogroup, R1b1, which made this way, left a trail of R1b, and those R1b in Asia-Middle East today speak agglutinative languages with common elements between them. Some of them still speak very archaic and distinct variants of Turkic.

What is the evidence that "Sumer" was R1b

Nothing in this concept is out of the blue. Assyrians as likely descendants of Sumers have R1b as the predominant haplogroup compared with others. R1b1 in the Middle East is for 6000-5500 years, since the Sumerian times. Jews of R1b1a2 have the TMRCA of 5500 ybp, Sumerian times. Sumerian language at different times was associated with Scythian language, Turkic language, Manjurian language, North Caucasian languages.

It is not my job to do a linguistic analysis; I am not qualified. However, it is my job to point in this direction.

First of all, R1b1 is not from the Altai even if R is from Central Asia (something that is uncertain in itself).

I do not know how you define the "Altai", however, South of Altai/Altay is located in Xinjiang, with the town named Altay there. Many Uigurs have R1b1, which is VERY different from European R1b, with an estimated common ancestor 16,000 ybp.

Again, I have no desire to discuss things here when I do not see a receptive ear.

Wait for publications.

Regards,

Anatole Klyosov; Tuesday, September 06, 2011 1:54:00 am
Anatole Klyosov said...: My "opponent" has demonstrated once more that he is not qualified. Nobody in his right mind calculates TMRCA using just one marker, in this case DYS388.

I repeat - "hopeless".

Anatole Klyosov; Tuesday, September 06, 2011 2:13:00 am
Gioiello said...: Lohizun writes:
“In any case, if we find a TMRCA using a set of slow STRs which is far greater than that of a combined set of STRs, chances are that the slow STRs TMRCA represents the true TMRCA, as the combined set might be affected by back mutations happening in the other STRs which even the correction factor might not account for”.

Back mutations and forwards ones I’d say.
I have expressed these same concepts for many years (also to Anatole Klyosov and to Ken Nordtvedt, who know mathematics better than I do), but in vain:

1) mutations around the modal
2) convergence to the modal as time passes
3) clusters when a mutation (backwards or forwards) goes for the tangent (of course mainly of slow mutating markers).; Tuesday, September 06, 2011 7:39:00 am
Anatole Klyosov said...: Gioiello said...

1) mutations around the modal
2) convergence to the modal as time passes
3) clusters when a mutation (backwards or forwards) goes for the tangent (of course mainly of slow mutating markers).

Dear Gioiello,

Maybe there are deep thoughts behind those three items, however, the language employed in describing them effectively nullifies any use of them. Why wouldn't you give examples to illustrate what you mean?

What is "mutations around the modal"? Are 12-->13 and 12-->11 in, say, DYS388 not "around the modal"? How about 12-->13-->14? Are they not "around the modal"? What is new in what you have said? What problem does it solve?

What is "convergence to the modal as time passes"? Do you mean reverse mutations? Of course some of them returned back to the initial, base haplotype. It is all described mathematically and it is the core of calculations in "my" approach.

What is "clusters when a mutation (backwards or forwards) goes for the tangent"??

Care for an example, please?

Generally, it is good to have fast AND slow mutating markers in a haplotype dataset for calculations, since they balance each other. When a common ancestor lived only a few centuries ago, "slow" mutations are silent. So, effectively they are not there. Between, say, 2000-5000 ybp both slow and fast markers are good. For more than 10,000 ybp and to 100,000 ybp and older I have developed the 22 "superslow" marker panel; one mutation in those happens on average in about 5,000 years.

A correction for back mutations is applied to the whole panel, not to single markers. It does not matter that some markers are slow and some fast. We work with average values. When you pump air into your tire, some molecules move like crazy, some slower. However, your manometer shows a stable, average pressure. All of chemistry also rests on that concept, because molecules move very differently from each other.

When you toss a coin, heads and tails happen in various combinations; however, the average is 0.5, but only after MANY tosses.

Mutations in haplotypes behave in the same fashion.
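The coin-toss analogy above can be checked with a short simulation. This is only a sketch of the averaging argument itself (a fair coin, with a hypothetical helper name `fraction_of_heads` and a fixed seed chosen here for repeatability); it does not model Y-STR mutations or any back-mutation correction.

```python
import random


def fraction_of_heads(n_tosses, seed=0):
    """Return the fraction of heads in n_tosses simulated fair coin tosses.

    The seed is an assumption made for repeatability; any seed shows the
    same qualitative behaviour.
    """
    rng = random.Random(seed)
    heads = sum(1 for _ in range(n_tosses) if rng.random() < 0.5)
    return heads / n_tosses


# Small samples can sit far from 0.5; large samples settle close to it.
for n in (10, 100, 10_000, 100_000):
    print(f"{n:>7} tosses: fraction of heads = {fraction_of_heads(n):.4f}")
```

The spread around 0.5 shrinks roughly as one over the square root of the number of tosses, which is why the average is only reliable after many tosses.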

Regards,

Anatole Klyosov; Tuesday, September 06, 2011 1:42:00 pm
Gioiello said...: On Dec 30, 2009, at 10:24 AM, Gioiello Tognoni wrote:

Ken, as I said to you in the mail of 11 Dec 2009:
“Ken, I know better my R1b1b2.
DYS426 is a very slow mutating marker.
R1b1* had 12
R1b1b2 (L23-) had 11
R1b1b2/L23+ (mine) had and has 12
R1b1b2/L51+ had and has 13
All subclades have 12
From R1b1* to R1b1b2a1b, 40,000 years may have passed, and not a few thousand. Probably fewer than 40,000 have passed, but certainly not about 6,000 as Vizachero claims and hopes.
In the meanwhile, faster mutating markers have changed many times around the modal and perhaps are now the same as at the origin, except for some: see DYS385: R1b1b2 14-11, R1a1 11-14 etc.
This is my thought (and my hope if you want)”.
How many years do you calculate from R1b1* to R1b1b2a1b? If a very slow mutating marker like DYS426 has had 4 mutations, how many mutations have the other, faster mutating markers had?
Do you consider all their mutations in your calculations? And how about what I said re: mtDNA, where the more recent clades have fewer mutations because they derive from a purified mitochondrial?
Gioiello; Tuesday, September 06, 2011 3:18:00 pm
jeanlohizun said...: Just some thoughts, folks: once more Dr. Anatole Klyosov has resorted to a “strawman” instead of addressing the main subject at hand. I said that using an estimated TMRCA to calculate the per-generation mutation rate of an STR marker likely results in a calibrated rate that is faster than the actual mutation rate. That was clearly proven in my examples above. When one assumes that the R1b-P312 DYS388 mutations reflect an estimated common ancestry of 4000 ybp, we get a mutation rate that is much faster than the one obtained if we assume the Iberian sample from Adams et al. (2008) had a TMRCA of 3625 ybp. Indeed, 0.00022 is certainly faster than 0.000175. DYS392 was used to show readers that estimated mutation rates often yield more recent TMRCAs than the mutation rates observed in father-son pairs.

In fact, in an ideal world of constant mutation rates, almost all markers, whether used independently or in combined sets, should yield relatively close TMRCAs. There would likely always be outliers (e.g. DYS388 in the Iberian sample); nonetheless, if you have a set of 19 markers and the slowest 5 of them produce a TMRCA which is far older than that of the combined set, then the assumptions behind using a mean mut/marker/gen have broken down. This is mainly because the standard deviations of those mean per-generation mutation rates are so large that they almost reach the same value as the mean.

Folks, as you can see, what I mentioned before about Dr. Anatole Klyosov's assumptions is indeed what he believes; just take a look at what he wrote:

Generally, it is good to have fast AND slow mutating markers in a haplotype dataset for calculations, since they balance each other. When a common ancestor lived only a few centuries ago, "slow" mutations are silent. So, effectively they are not there. Between, say, 2000-5000 ybp both slow and fast markers are good. For more than 10,000 ybp and to 100,000 ybp and older I have developed the 22 "superslow" marker panel; one mutation in those happens on average in about 5,000 years.

This shows that his methodology assumes that the number of mutations per marker in the fast markers, relative to that in the slow markers, should be directly proportional to their mutation rates. That is, if the 9 fastest markers mutate 4.6 times faster than the slow markers, then there should be about 4.6 times more mutations per marker in the fast-mutating positions than in the slow ones. That is not the case in reality. Take, for example, the Iberian sample from Adams et al. (2008): it has a total of 0.3162 mutations per marker in the 9 fast-mutating positions, with a mean mutation rate per generation of 0.0028, whereas the 5 slowest markers have only 0.1237 mutations per marker, with a mean mutation rate of 6.08*10^-4.

0.3162/0.1237 = 2.56 vs. the expected ratio of 4.6.
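The arithmetic behind this comparison can be reproduced in a few lines. The sketch below uses only the four figures quoted in the comment; the variable names and the helper `linear_generations` are hypothetical, and the "generations" figure is a plain linear estimate (mutations per marker divided by rate) with no back-mutation correction, which is an assumption made here for illustration rather than either commenter's full method.

```python
# Figures quoted in the comment for the Iberian sample of Adams et al. (2008).
fast_mut_per_marker = 0.3162   # 9 fastest markers
slow_mut_per_marker = 0.1237   # 5 slowest markers
fast_rate = 0.0028             # mean mutations per marker per generation
slow_rate = 6.08e-4            # mean mutations per marker per generation

# Ratio of observed mutations per marker vs. ratio of mutation rates.
observed_ratio = fast_mut_per_marker / slow_mut_per_marker
expected_ratio = fast_rate / slow_rate


def linear_generations(mut_per_marker, rate):
    """Uncorrected linear estimate: generations = mutations per marker / rate."""
    return mut_per_marker / rate


gens_fast = linear_generations(fast_mut_per_marker, fast_rate)
gens_slow = linear_generations(slow_mut_per_marker, slow_rate)

print(f"observed ratio: {observed_ratio:.2f}")
print(f"expected ratio: {expected_ratio:.2f}")
print(f"generations implied by fast markers: {gens_fast:.0f}")
print(f"generations implied by slow markers: {gens_slow:.0f}")
```

Under this uncorrected linear model the slow markers imply a noticeably larger number of generations than the fast ones, which is the discrepancy the comment is pointing at; it says nothing about which of the two figures is closer to the truth.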

Again, there is no such balance. In fact, by assuming that they would balance each other out, he is assuming that the TMRCA is likely in the 2000-5000 ybp time frame, so he doesn’t even bother testing different STR sets to check whether the TMRCA is indeed in that time frame or not. Should you folks have any questions, please do not hesitate to ask.

Regards,

Jean Lohizun; Tuesday, September 06, 2011 6:40:00 pm
Anatole Klyosov said...: Hopeless.

You completely ignore margins of error, which are much higher when you operate with slow markers, which produce MUCH fewer mutations.

As a result you get a ~50% margin of error, which explains the disparity.
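One way to see why few mutations mean wide margins: if the total mutation count is treated as a Poisson-distributed count (an assumption made here purely for illustration, not necessarily the error model used in this comment), its relative standard error is about 1/sqrt(n). The helper name `relative_error` is hypothetical.

```python
import math


def relative_error(n_mutations):
    """Relative standard error of a Poisson count: 1 / sqrt(n).

    Assumes the total number of observed mutations behaves as a Poisson
    count; uncertainty in mutation rates or generation length is ignored.
    """
    if n_mutations <= 0:
        raise ValueError("n_mutations must be positive")
    return 1.0 / math.sqrt(n_mutations)


# A handful of mutations gives a very wide margin; hundreds give a narrow one.
for n in (4, 25, 100, 400):
    print(f"{n:>3} mutations: about {relative_error(n):.0%} relative error")
```

Under this assumption a margin of around 50% corresponds to only about four observed mutations, while a panel that accumulates a hundred mutations would be nearer 10%.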

That is why I do not use individual markers but whole panels of markers.

You need to practice more in order to understand such things.; Sunday, September 18, 2011 1:19:00 am
