Comments on Dienekes’ Anthropology Blog: Y-STR variance of Busby et al. (2011) dataset

Hopeless. You completely ignore margins of error,...

2011-09-18T01:19:50.035+03:00

Hopeless.

You completely ignore margins of error, which are much higher when you operate with slow markers, which produce MUCH fewer mutations.

As a result you get a ~50% margin of error, which explains the disparity.

That is why I do not use individual markers but whole panels of markers.

You need to practice more in order to understand such things.
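
To put a rough number on the margin-of-error point, here is a minimal Python sketch. It assumes that mutation counts behave like Poisson counts, so the relative margin of error of a count N is roughly z/sqrt(N); that assumption is not a formula stated in the comment. The counts 19, 71 and 2796 are the DYS388, DYS392 and whole-panel totals quoted elsewhere in this thread for the 750 Iberian haplotypes, and the function name is hypothetical.

```python
import math


def relative_counting_error(n_mutations, z=1.96):
    """Approximate relative margin of error of a Poisson count.

    Assumption: the count is Poisson distributed, so its standard
    deviation is sqrt(N) and the ~95% relative margin is z / sqrt(N).
    """
    if n_mutations <= 0:
        raise ValueError("need at least one observed mutation")
    return z / math.sqrt(n_mutations)


# Counts quoted in this thread: DYS388 alone, DYS392 alone, all 19 markers.
for label, count in (("DYS388", 19), ("DYS392", 71), ("19-marker panel", 2796)):
    print(f"{label}: {count} mutations, ~{relative_counting_error(count):.0%} margin")
```

Under this assumption a single slow marker with 19 mutations carries a margin of roughly 45%, while the whole panel is below 4%.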

Just some thoughts folks, once more Dr.Anatole Kly...

2011-09-06T18:40:10.884+03:00

Just some thoughts, folks. Once more Dr. Anatole Klyosov has resorted to a “strawman” instead of addressing the main subject at hand. I said that using an estimated TMRCA to calibrate the per-generation mutation rate of an STR marker likely yields a faster mutation rate than the actual one. That was clearly shown in my examples above. When one assumes that the R1b-P312 DYS388 mutations trace back to a common ancestor estimated at 4000 ybp, we get a mutation rate that is much faster than the one obtained if we assume that the Iberian sample from Adams et al. (2008) had a TMRCA of 3625 ybp. Indeed, 0.00022 is certainly faster than 0.000175. DYS392 was used to show readers that estimated mutation rates often yield a more recent TMRCA than the mutation rates observed in father-son pairs.

In fact, in an ideal world of constant mutation rates, almost all markers, whether used independently or in combined sets, should yield relatively close TMRCAs. There would likely always be outliers (e.g. DYS388 in the Iberian sample); nonetheless, if you have a set of 19 markers and the slowest 5 of them produce a TMRCA far older than that of the combined set, then the assumptions behind using a mean mut/marker/gen have broken down. This is mainly because the standard deviation of those mean mutation rates per generation is so large that it almost reaches the value of the mean itself.

Folks, as you can see, what I said before about Dr. Anatole Klyosov's assumptions is indeed what he believes. Just take a look at what he wrote:

Generally, it is good to have fast AND slow mutating markers in a haplotype dataset for calculations, since they balance each other. When a common ancestor lived only a few centuries ago, "slow" mutations are silent. So, effectively they are not there. Between, say, 2000-5000 ybp both slow and fast markers are good. For more than 10,000 ybp and to 100,000 ybp and older I have developed the 22 "superslow" marker panel, one mutation in those happens on average in about 5,000 years.

This shows that his methodology assumes that the number of mutations per marker in fast markers relative to slow markers should be directly proportional to the ratio of their mutation rates. That is, if the 9 fastest markers mutate 4.6 times faster than the slow markers, then there should be about 4.6 times as many mutations per marker in the fast-mutating positions as in the slow ones. That is not the case in reality. Take, for example, the Iberian sample from Adams et al. (2008): it has a total of 0.3162 mutations per marker in the 9 fast-mutating positions, with a mean mutation rate of 0.0028 per generation, whereas the 5 slowest markers have only 0.1237 mutations per marker, with a mean mutation rate of 6.08*10^-4.

0.3162/0.1237 = 2.56 vs. the expected ratio of 4.6.
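
The comparison above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic using the figures quoted in the comment; the variable names are hypothetical, and the "implied generations" lines are an extra, uncorrected division (no back-mutation correction) added here for illustration.

```python
# Figures quoted in the comment for the Iberian sample of Adams et al. (2008).
fast_mut_per_marker = 0.3162   # 9 fastest markers
slow_mut_per_marker = 0.1237   # 5 slowest markers
fast_rate = 0.0028             # mean mutations per marker per generation
slow_rate = 6.08e-4

# Ratio actually observed vs. the ratio expected if mutation counts
# scaled in direct proportion to the mutation rates.
observed_ratio = fast_mut_per_marker / slow_mut_per_marker
expected_ratio = fast_rate / slow_rate

# Illustration only: generations implied by each subset on its own,
# without any correction for back mutations.
fast_generations = fast_mut_per_marker / fast_rate
slow_generations = slow_mut_per_marker / slow_rate

print(f"observed ratio: {observed_ratio:.2f}, expected ratio: {expected_ratio:.2f}")
print(f"fast markers imply ~{fast_generations:.0f} generations")
print(f"slow markers imply ~{slow_generations:.0f} generations")
```

With these inputs the slow markers imply noticeably more generations than the fast ones, which is the imbalance the comment is pointing at.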

Again, there is no such balance. In fact, by assuming that they would balance each other out, he is assuming that the TMRCA would likely be in the 2000-5000 ybp time frame, so he doesn’t even bother to test the data with different STR sets to check whether the TMRCA is indeed in that time frame. Should you folks have any questions, please do not hesitate to ask.

Regards,

Jean Lohizun

On Dec 30, 2009, at 10:24 AM, Gioiello Tognoni wro...

2011-09-06T15:18:47.180+03:00

On Dec 30, 2009, at 10:24 AM, Gioiello Tognoni wrote:

Ken, as I said to you in the mail of 11 Dec 2009:
“Ken, I know better my R1b1b2.
DYS426 is a very slow mutating marker.
R1b1* had 12
R1b1b2 (L23-) had 11
R1b1b2/L23+ (mine) had and has 12
R1b1b2/L51+ had and has 13
All subclades have 12
From R1b1* to R1b1b2a1b 40,000 years may have passed, and not a few thousand. Probably fewer than 40,000 have passed, but certainly not about 6,000 as Vizachero claims and hopes.
In the meantime, faster mutating markers have changed many times around the modal and perhaps are now the same as at the origin, except for some: see DYS385: R1b1b2 14-11, R1a1 11-14 etc.
This is my thought (and my hope if you want)”.
How many years do you calculate from R1b1* to R1b1b2a1b? If a very slow mutating marker like DYS426 has had 4 mutations, how many mutations have the faster mutating markers had?
Do you consider all their mutations in your calculations? And how about what I said re: mtDNA, where the more recent clades have fewer mutations because they derive from a purified mitochondrial?
Gioiello

Gioiello said... 1) mutations around the modal 2...

2011-09-06T13:42:55.414+03:00

Gioiello said...

1) mutations around the modal
2) convergence to the modal as time passes
3) clusters when a mutation (backwards or forwards) goes for the tangent (of course mainly of slow mutating markers).

Dear Gioiello,

Maybe there are deep thoughts behind those three items; however, the language employed in describing them effectively nullifies any use of them. Why not give examples to illustrate what you mean?

What is "mutations around the modal"? Are 12-->13 and 12-->11 in, say, DYS388 not "around the modal"? How about 12-->13-->14? Are they not "around the modal"? What is new in what you have said? What problem does it solve?

What is "convergence to the modal as time passes"? Do you mean reverse mutations? Of course some of them return to the initial, base haplotype. It is all described mathematically and it is the core of calculations in "my" approach.

What is "clusters when a mutation (backwards or forwards) goes for the tangent"??

Care for an example, please?

Generally, it is good to have fast AND slow mutating markers in a haplotype dataset for calculations, since they balance each other. When a common ancestor lived only a few centuries ago, "slow" mutations are silent. So, effectively they are not there. Between, say, 2000-5000 ybp both slow and fast markers are good. For more than 10,000 ybp and to 100,000 ybp and older I have developed the 22 "superslow" marker panel, one mutation in those happens on average in about 5,000 years.

A correction for back mutation is applied to the whole panel, not to single markers. It does not matter that some markers are slow and some fast. We work with average values. When you pump air into your tire, some molecules move like crazy, some slower. However, your manometer shows a stable, average pressure. All of chemistry also stands on that concept, because molecules move very differently from each other.

When you toss a coin, heads and tails happen in various combinations; however, the average is 0.5, but only after MANY tosses.

Mutations in haplotypes behave in the same fashion.
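
The coin-toss analogy can be illustrated with a small simulation. This is a sketch only: it uses a seeded pseudo-random generator as a stand-in for a fair coin, the function name is hypothetical, and it says nothing by itself about whether STR mutations really average out the same way.

```python
import random


def mean_of_tosses(n_tosses, seed=0):
    """Fraction of heads in n_tosses simulated fair-coin tosses."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses


# The fraction of heads wanders for small samples and settles near 0.5
# only when the number of tosses is large.
for n in (10, 1000, 100000):
    print(n, mean_of_tosses(n))
```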

Regards,

Anatole Klyosov

Lohisun writes: ”In any case, if we find a TMRCA u...

2011-09-06T07:39:51.949+03:00

Lohisun writes:
”In any case, if we find a TMRCA using a set of slow STRs which is far greater than that of a combined set of STRs, chances are that the slow STRs TMRCA represents the true TMRCA, as the combined set might be affected by back mutations happening in the other STRs which even the correction factor might not account for”.

Back mutations and forward ones, I’d say.
I have expressed these same concepts (also to Anatole Klyosov and to Ken Nordtvedt, who know mathematics better than I do) for many years, but in vain:

1) mutations around the modal
2) convergence to the modal as time passes
3) clusters when a mutation (backwards or forwards) goes for the tangent (of course mainly of slow mutating markers).

My "opponent" has demonstrated once more ...

2011-09-06T02:13:47.799+03:00

My "opponent" has demonstrated once more that he is not qualified. Nobody in his right mind calculates a TMRCA using just one marker, in this case DYS388.

I repeat - "hopeless".

Anatole Klyosov

Dienekes said... AK >>Hence, "Eniseia...

2011-09-06T01:54:02.011+03:00

Dienekes said...

AK >>Hence, "Eniseian", "Proto-Turkic", "Sino-Caucasian", "Sumer", "North-Caucasian", "Basque" languages, which are, apparently, all tips of one and the same iceberg, which was an ancient R1b language. Erbin.

Dienekes >>Linking these various languages is a very controversial, albeit valid hypothesis.

The link, however, of the proposed macro-family with R1b is completely arbitrary. What is the evidence that "Sumer" was R1b or that Sino-Tibetan (which is part of Sino-Caucasian) was R1b?

The purpose of my "presentation" here is not to convince everyone, nor even to lay out all the relevant facts and observations. There is no room for it, nor do I have the desire, nor do you have the intention to listen. The purpose is to show how science works: one collects some facts and observations, advances a hypothesis which explains a chain of things and events which have not yet been explained, and sometimes have not even been considered from the same angle, then examines pros and cons, adds some other facts and observations which were missed in the first version, adjusts some parts of the hypothesis, etc. It is a never-ending process.

Nothing is easier than to sit on a fence and criticize without adding anything to our knowledge in this particular case.

Nothing is easier than to say "The link, however, of the proposed macro-family with R1b is completely arbitrary". In fact, it is not. However, offer another haplogroup, or several of them, if you KNOW (based on DATA) which ones they are. Q? No, it is very unlikely. O? N? R1a? I highly doubt it. There is only one haplogroup, R1b1, which made this way and left a track of R1b, and those R1b in Asia and the Middle East today speak agglutinative languages with common elements between them. Some of them still speak very archaic and distinct variants of Turkic.

What is the evidence that "Sumer" was R1b

Nothing in this concept is out of the blue. Assyrians, as likely descendants of Sumerians, have R1b as the predominant haplogroup compared with others. R1b1 has been in the Middle East for 5500-6000 years, since Sumerian times. Jews of R1b1a2 have a TMRCA of 5500 ybp, Sumerian times. The Sumerian language has at different times been associated with the Scythian, Turkic, Manchurian and North Caucasian languages.

It is not my job to do a linguistic analysis; I am not qualified. However, it is my job to point in this direction.

First of all, R1b1 is not from the Altai even if R is from Central Asia (something that is uncertain in itself).

I do not know how you define the "Altai", however, South of Altai/Altay is located in Xinjiang, with the town named Altay there. Many Uigurs have R1b1, which is VERY different from European R1b, with an estimated common ancestor 16,000 ybp.

Again, I do not have a desire to discuss here things when I do not see a presence of a receptive ear.

Wait for publications.

Regards,

Anatole Klyosov

Here are his results: “The "mutation count"...

2011-09-06T00:38:32.533+03:00

Here are his results:

“The "mutation count" method gives 2796/750/19 = 0.196±0.004 mutations per marker (without a correction for back mutations, that is λobs = 0.196±0.004), or after the correction it is 0.218±0.004 mutations per marker, or 0.218/0.0015 = 145±15 generations, that is 3625±370 years to a common ancestor of all 750 Iberian R1b1 haplotypes.”

http://www.jogg.info/52/files/Klyosov1.pdf
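
The arithmetic in the quoted passage can be retraced in Python. This is a sketch of the division steps only: the back-mutation-corrected value 0.218 is taken as given from the quote rather than recomputed, the 25-year generation is the "conditional generation" used throughout this thread, and the variable names are hypothetical.

```python
total_mutations = 2796       # mutations counted across the dataset
n_haplotypes = 750           # Iberian R1b1 haplotypes
n_markers = 19               # STRs per haplotype
mean_rate = 0.0015           # mut/marker/gen used in the quoted paper
years_per_generation = 25    # "conditional generation"

# Observed mutations per marker, before any back-mutation correction.
observed_per_marker = total_mutations / n_haplotypes / n_markers

# Corrected value quoted in the passage (taken as given, not derived here).
corrected_per_marker = 0.218

generations = corrected_per_marker / mean_rate
years = round(generations) * years_per_generation

print(f"observed: {observed_per_marker:.3f} mutations per marker")
print(f"TMRCA: ~{generations:.0f} generations, ~{years} years")
```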

Now let us see whether the readily available mutation rate for DYS388, which according to Klyosov yields similar results in two different datasets, produces the same TMRCA in this different sample. There are 19 mutations in DYS388, 17 up and 2 down; this gives a degree of asymmetry of 0.8947, thus a=0.6233, a1=0.3149.

The observed mutation rate per marker is:

19/750=0.02533 mutations per marker.

Applying the correction factor yields 0.02543.

Now 0.02543/0.00022 mut/marker/gen = 116 generations (2890 ybp), clearly outside the error range of 3625±370 ybp. Does this mean that all Iberians descend from a common ancestor who lived 2890 years ago? Likely not. What it means is that if we assume the TMRCA of Iberians to be 3625 ybp (145 generations), as Anatole Klyosov claimed, then the estimated mut/marker/gen that would produce such a TMRCA is 1.754*10^-4. This shows what I said above: using an underestimated TMRCA (i.e. 4000 ybp for R1b-P312) would likely result in the mutation rate appearing faster than it truly is (i.e. 0.00022 > 0.000175).
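
The DYS388 numbers above can be checked with a short Python sketch. The corrected value 0.02543 is taken as given from the comment (the asymmetric correction itself is not reproduced here), and the variable names are hypothetical.

```python
n_haplotypes = 750
mutations_up, mutations_down = 17, 2
total_mutations = mutations_up + mutations_down

asymmetry = mutations_up / total_mutations            # degree of asymmetry
observed_per_marker = total_mutations / n_haplotypes  # before correction
corrected_per_marker = 0.02543                        # as quoted, not derived here

calibrated_rate = 0.00022     # DYS388 rate calibrated on P312 / R1a1
years_per_generation = 25

# TMRCA implied by the calibrated DYS388 rate.
generations = corrected_per_marker / calibrated_rate
years = generations * years_per_generation

# Rate that would be needed to reproduce the 145 generations (3625 ybp)
# obtained from the 19-marker panel.
implied_rate = corrected_per_marker / 145

print(f"asymmetry: {asymmetry:.4f}, observed: {observed_per_marker:.5f}")
print(f"TMRCA: ~{generations:.0f} generations, ~{years:.0f} ybp")
print(f"rate implied by 145 generations: {implied_rate:.3e}")
```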

Now let us look at DYS392, which has 71 mutations (35 up, 36 down) from the base haplotype. This gives a factor of symmetry of 0.5070, thus a=1.983*10^-4 and a1=0.9989. The observed mutation rate per marker is:

71/750=0.09467

Applying the correction factor yields 0.09935.

Now using the estimated mean mut/marker/gen of Chandler et al. (2006), 0.00052, yields:
0.09935/0.00052 mut/marker/gen = 191 generations (4776 ybp).

Using the empirically measured mutation rate for DYS392, which was estimated to have a mean mut/marker/gen of 4.123 × 10^-4 (95% CI: 1.513 × 10^-4 to 8.972 × 10^-4), we get 0.09935/0.0004123 = 241 generations (6024 ybp).
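
The DYS392 figures can be retraced the same way. In this sketch the corrected value 0.09935 is taken as given from the comment; the last two lines, which push the quoted 95% CI bounds of the father-son rate through the same division, are an addition for illustration and are not a calculation made in the comment. Variable names are hypothetical.

```python
corrected_per_marker = 0.09935   # DYS392, as quoted after correction
years_per_generation = 25

chandler_rate = 0.00052                       # Chandler et al. (2006) estimate
father_son_rate = 4.123e-4                    # empirically measured mean
father_son_ci = (1.513e-4, 8.972e-4)          # quoted 95% CI of that mean


def tmrca_years(mutations_per_marker, rate):
    """Years to the common ancestor for a given per-generation rate."""
    return mutations_per_marker / rate * years_per_generation


chandler_years = tmrca_years(corrected_per_marker, chandler_rate)
father_son_years = tmrca_years(corrected_per_marker, father_son_rate)

# Illustration only: the TMRCA range spanned by the quoted CI of the rate.
youngest = tmrca_years(corrected_per_marker, father_son_ci[1])
oldest = tmrca_years(corrected_per_marker, father_son_ci[0])

print(f"Chandler rate:   ~{chandler_years:.0f} ybp")
print(f"father-son rate: ~{father_son_years:.0f} ybp")
print(f"father-son CI:   ~{youngest:.0f} to ~{oldest:.0f} ybp")
```

The width of the last range shows how much of the disagreement could come from the uncertainty of a single-marker rate alone.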

So thus far it has become very clear that estimated mutation rates tend to underestimate the TMRCA even for slow markers, and that calibrated mutation rates estimated from FTDNA family clan projects fail when extrapolated to a different dataset. So where does this leave us? We have to keep collecting more data on father-son pairs, and even three-generation triples, so that we at least get a more decent picture of how mutation rates work. In any case, if we find a TMRCA using a set of slow STRs which is far greater than that of a combined set of STRs, chances are that the slow STRs TMRCA represents the true TMRCA, as the combined set might be affected by back mutations happening in the other STRs which even the correction factor might not account for. Should anyone have any questions, please let me know.

Kind Regards,

Jean Lohizun.

Here is something folks so you can see how the app...

2011-09-06T00:37:38.932+03:00

Here is something, folks, so you can see how Klyosov's approach of calculating TMRCA by calibrating per-generation mutation rates on family clan projects interacts with reality. Klyosov has completely dismissed any empirically measured data from father-son pairs because of what he calls “poor statistics”, mainly fluctuations in the per-generation mutation rate when two different samples are tested. He claims such is not the case with his calculations; this is what he said about it:

In the R1b1a2-P312 series of 2299 haplotypes (that is, 2299 DYS388 values) - 82 mutations. Since the calculations gave 4000 ybp for the dataset (to a common ancestor), that is 160 "conditional generations" of 25 years, the mutation rate constant for DYS388 is equal to 82/2299/160 = 0.00022 per generation.

In the R1a1 series of 1198 haplotypes (that is, 1198 DYS388 values) - 48 mutations (including 17 mutations with DYS388=10 counting them as one mutation each). Since the calculations gave 4600 ybp for the dataset, that is 184 "conditional generations" of 25 years, the mutation rate constant for DYS388 is equal to 48/1198/184 = 0.000217 per generation, that is practically 0.00022.
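
The two calibrations quoted above follow one formula: mutations / haplotypes / generations. The Python sketch below reproduces them and then varies the assumed TMRCA to show how directly the resulting rate depends on it. The function name and the alternative TMRCA values in the loop are hypothetical and chosen only for illustration.

```python
def calibrated_rate(n_mutations, n_haplotypes, assumed_tmrca_years,
                    years_per_generation=25):
    """Per-generation rate obtained by calibrating against an assumed TMRCA."""
    generations = assumed_tmrca_years / years_per_generation
    return n_mutations / n_haplotypes / generations


# The two DYS388 calibrations quoted above.
rate_p312 = calibrated_rate(82, 2299, 4000)   # R1b1a2-P312 series
rate_r1a1 = calibrated_rate(48, 1198, 4600)   # R1a1 series
print(f"P312: {rate_p312:.6f}  R1a1: {rate_r1a1:.6f}")

# Illustration: the calibrated rate is inversely proportional to the
# TMRCA that was assumed for the dataset.
for assumed_years in (3000, 4000, 5000, 6000):
    rate = calibrated_rate(82, 2299, assumed_years)
    print(f"assumed TMRCA {assumed_years} ybp -> rate {rate:.6f}")
```

Because the rate scales as 1/TMRCA, any error in the assumed TMRCA carries over to the calibrated rate in the opposite direction.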

Of course, because the TMRCA is not a known one but one estimated using the same methodology, this is going in circles. Why? Because if his mean mutation rate methodology underestimates TMRCA, then using an underestimated TMRCA would likely result in the mutation rate appearing faster than it truly is. I have argued that one cannot extrapolate estimated data from one set to another; instead it is better to use the data empirically measured in father-son pairs. But let’s assume for a second that Anatole Klyosov is correct and, as he showed above, we should expect DYS388 to yield an average mutation rate of around 0.00022. Let us see what happens when we use a different sample:

The Iberian sample from Adams et al. (2008) was used in the first paper published by Anatole Klyosov as a practical example of his methodology. He gives the base haplotype value of DYS388 as 12. The Iberian sample has 19 mutations (17 up, 2 down) at DYS388 among the 750 R1b-M269 haplotypes. Anyone can do the analysis using the data found in the supplementary information section:
http://download.cell.com/AJHG/mmcs/journals/0002-9297/PIIS0002929708005922.mmc1.pdf

Now, Anatole Klyosov in his study used a mean rate of 0.0015 mut/marker/gen, obtained by counting the mutations in the same 19 STRs used by Adams et al. (2008) among the members of the Donald Clan project, who all descend from a putative ancestor, John, Lord of the Isles, who was an R1a1 bearer and lived 26 generations ago. With this mean mutation rate Anatole Klyosov dated the TMRCA of Iberians to 3625±370 ybp.

to be continued

Hence, "Eniseian", "Proto-Turkic"...

2011-09-05T21:20:46.998+03:00

Hence, "Eniseian", "Proto-Turkic", "Sino-Caucasian", "Sumer", "North-Caucasian", "Basque" languages, which are, apparently, all tips of one and the same iceberg, which was an ancient R1b language. Erbin.

Linking these various languages is a very controversial, albeit valid hypothesis.

The link, however, of the proposed macro-family with R1b is completely arbitrary. What is the evidence that "Sumer" was R1b or that Sino-Tibetan (which is part of Sino-Caucasian) was R1b?

Regarding "Mongoloids", both Q and R are sister haplogroups, both are from Central Asia, and, maybe, both are from the same Altai region. So why could R1b1 and Q not have interacted some 16,000 years ago and much later?

First of all, R1b1 is not from the Altai even if R is from Central Asia (something that is uncertain in itself).

Second, R1b1 is not the same as R, and R1b1 is absent in a great many Altaic speaking populations. The fact that R1b1 is found in some Turkic populations does not mean that it goes up to the Proto-Altaic population; indeed the largely Mongoloid character of that population and the absence of R1b1 in most of its extant branches argue strongly against it. So, you have trouble getting R1b1 to Proto-Altaic, let alone whatever hypothetical "Erbin" might have been spoken in even more ancient times in inner Asia.

Dienekes said... Onur said... All right, now yo...

2011-09-05T17:44:01.826+03:00

Dienekes said...
Onur said...

All right, now you are talking about some interesting things, minus confusions, unrelated to the subject of R1b1 appearance, migrations, and their language.

For example, the known Turkic languages, which appear in the 1st millennium AD, are irrelevant here, since we are talking about R1b1 between, say, 16 and 6 thousand ybp. So why constantly bring in contemporary Turkic languages?

If you folks did not catch it, my "R1b-Proto-Turkic theory" can equally be called "R1b-Sino-Caucasian theory", or "R1b-Erbin theory", or whatever, since linguists do not have a name for that language. They apparently see some remnants of it, however, do not know where to assign it. Hence, "Eniseian", "Proto-Turkic", "Sino-Caucasian", "Sumer", "North-Caucasian", "Basque" languages, which are, apparently, all tips of one and the same iceberg, which was an ancient R1b language. Erbin.

Here is what I am talking about. All said languages are agglutinative languages, all belong to different time periods, all are found on the migration route of R1b1. That is why they do not look alike. They reflect different millennia, hence differ from each other, but all have similar elements. That is why the Sumerian language was considered Manchurian, Siberian, having Turkic elements, etc. by various linguists, and all, of course, are in disagreement with each other. Sound familiar? That is why the Basque language was considered related to North-Caucasian languages, or Sino-Caucasian, and some linguists support it and some deny it. It is normal, since all those are ancient languages.

Regarding "Mongoloids", both Q and R are sister haplogroups, both are from Central Asia, and, maybe, both are from the same Altai region. So why could R1b1 and Q not have interacted some 16,000 years ago and much later?

Unless we all agree that what is written above makes sense, there is no point to argue.

Do we all agree? If not, why not?

Anatole Klyosov

jeanlohizun said... Well, forget it. It does not...

2011-09-05T17:20:58.021+03:00

jeanlohizun said...

Well, forget it. It does not matter what he said. The individual has shown that he was not receptive to any data, reasoning, and explanations. He is a lost cause.

I gave up.

Anatole Klyosov

From the early Chinese records we know that Altaic...

2011-09-05T16:58:11.620+03:00

From the early Chinese records we know that Altaic speaking tribes (including Turkic speaking ones) primarily lived in a region north of, not west of, where Chinese speakers lived in early Chinese historical times (well into the Imperial Chinese times).

Anatole, just in case you don't know already, let me inform you: All the regions north of Chinese speakers have been Mongoloid regions from time immemorial.

Anatole Klyosov said: Examples for father-son tra...

2011-09-05T15:17:06.712+03:00

Anatole Klyosov said:

Examples for father-son transmissions:

-- in the Ballantine series: 0 mutations in 1636 pairs, that is the MRC is <0.0006 per generation,
-- in the Burgarella collection: 0.00042 per generation.

Do you see how jumpy and inconsistent data are in father-son pairs, even for about 2000 of them?

The data are neither jumpy nor inconsistent. They are randomly collected data, so it comes as no surprise that DYS388, being a slow marker, had no mutations in one of the samples. It is still far better than some estimated mutation rate with a trillion assumptions. Only a person like you would argue that estimates are better than empirical data.
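
Whether "0 mutations in 1636 pairs" really conflicts with a rate of 0.00042 can be checked with a small Python sketch. It assumes independent father-son transmissions with a constant per-generation rate (a simple binomial model, which is an assumption made here, not something stated in either comment); the function names are hypothetical.

```python
def prob_zero_mutations(rate, n_pairs):
    """Chance of seeing no mutation at all in n_pairs transmissions."""
    return (1.0 - rate) ** n_pairs


def upper_bound_95(n_pairs):
    """One-sided 95% upper bound on the rate when zero mutations are seen.

    Solves (1 - rate) ** n_pairs = 0.05 for rate.
    """
    return 1.0 - 0.05 ** (1.0 / n_pairs)


n_pairs = 1636  # Ballantine series, 0 mutations observed at DYS388

for label, rate in (("Burgarella", 0.00042), ("calibrated", 0.00022)):
    p = prob_zero_mutations(rate, n_pairs)
    print(f"{label} rate {rate}: P(0 mutations in {n_pairs} pairs) = {p:.2f}")

print(f"95% upper bound on the rate: {upper_bound_95(n_pairs):.5f}")
```

Under this model, seeing no mutations in 1636 pairs is about a coin flip at a rate of 0.00042, so the two father-son results quoted above are compatible with each other.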

Anatole Klyosov said:

In the R1b1a2-P312 series of 2299 haplotypes (that is, 2299 DYS388 values) - 82 mutations. Since the calculations gave 4000 ybp for the dataset (to a common ancestor), that is 160 "conditional generations" of 25 years, the mutation rate constant for DYS388 is equal to 82/2299/160 = 0.00022 per generation.

:-)) How did you know their common ancestor was 160 “conditional generations” ago? Well you had to use estimated numbers from another data set. So you seem to forget that if there is an intrinsic error carried from the other calculations (which I’m pretty sure by now, there is), this mutation rate constant would also carry that intrinsic error. Again it seems we are going in circles.

Anatole Klyosov said:

I do not care about your degree, I have seen in my life tons of ignorant people with degrees. Regarding "STR mutation rates" you are totally unqualified.

I would suggest you listen to a professional.

Of course you don’t care about my degree, because it is of no importance to this conversation. As for my being unqualified to talk about “STR mutation rates”, well, let the people choose who is qualified and who is not. Yes, I listen to professionals every day at my workplace, and it is precisely because I listen to them that I can clearly show everyone here how flawed your methodology is. I would tell you, though, that you do not sound like a professional at all.

Kind Regards,

Jean Lohizun.

Anatole Klyosov said: If you care, in the Proceed...

2011-09-05T15:16:28.729+03:00

Anatole Klyosov said:

If you care, in the Proceedings, December 2010, p. 2039-2058, the first paper has the title "Reconsideration of the average mutation rate constant for 67 marker haplotypes from 0.145 to 0.120 mutations per haplotype per generation", by Klyosov and Rozhanskii. Hundreds of 67 marker datasets are considered there, to make the reconsideration. Can you see how serious people are who work in the area?

I can see that you stubbornly keep using the average of combined sets of STRs (slow + fast), even though anyone with access to any dataset can see, by running a quick experiment, that the data from one dataset often cannot be extrapolated to another, and, even more so, that in big datasets fast STRs pull the TMRCA down compared to slow STRs. So what if you analyzed hundreds of 67-marker datasets? None of them were actual datasets of father-son pairs, where one can see the true mutation rate of several markers across a generation. All of them were based on the assumption of a single common ancestor who lived x time ago, who you assume is the common ancestor of all branches of that tree, with all branches x generations from him. Again, all these assumptions carry a lot of built-in errors that you keep dismissing. Look, let’s talk science and not psychology: if you were “a serious member who works in the area”, as you claim to be, you would have had a grant, you would have gone into the field and collected data yourself, and your results would have been in one of the major genetics journals. Instead you scavenge data collected by others (e.g. Adams et al. 2008) or use data collected from FTDNA projects, which no serious person would consider using because it lacks the quality control required for it to be considered a randomly collected representation of a population.

Anatole Klyosov said:

I have explained why the father-son pairs are not there yet, and why they often give misleading and incorrect data due to poor statistics. Furthermore, I gave above concrete examples of those incorrect and inconsistent data from multiple father-son pairs. It is not my fault that you cannot understand it.

Not quite. The fact that the sample size might be small doesn’t invalidate it at all; it is still by far the best empirically collected data we have on mutation rates across a generation. So they are “often misleading” because they contradict your estimated mutation rates. Thus your argument again turns into: if it doesn’t agree with my data, then it is wrong. Well, your data is subject to tons of errors, and I tell you what: for the 25 markers, find the average mutation rate at each of those 25 positions, then tell me what the mean mutation rate is and what the standard deviation of that mean is. Your own methodology is misleading and incorrect because of poor statistics (i.e. the mean mut/marker/gen for 25 markers, ~0.002, is in the same range as its standard deviation).

Anatole Klyosov said:

You are constantly bragging that I am not a geneticist. Mutation rate constants are not "genetics", it is chemical kinetics, my direct profession. You fail to understand it as well.

No, it is more like you are constantly bragging of being an expert and attacking others by calling them ignorant, I was simply kindly reminding you that you are no geneticist. The thing is that there might be no such thing as mutation rate constants. Moreover molecular genetics papers require deep knowledge of Statistics, Math, and Biomechanics which are not your direct profession.

Anatole Klyosov said:

Here your ignorance goes again. It is not "news"….

Here goes your favorite tactic again, ad hominems left and right. Yes, in an ideal world a position along the Y chromosome where a Short Tandem Repeat occurs should have nothing to do with SNPs found at other positions on the Y chromosome; however, correlations between them have been found. This is rather puzzling, yet true.

to be continued..

I _did_ advance a hypothesis about the origin of T...

2011-09-05T12:19:54.433+03:00

I _did_ advance a hypothesis about the origin of Turkic languages: that they came from Siberia/Central Asia and were associated with Mongoloids initially.

It is very unlikely that Proto-Turkic or any other Altaic main branch or Altaic itself developed in Central Asia proper. From the early Chinese records we know that Altaic speaking tribes (including Turkic speaking ones) primarily lived in a region north of, not west of, where Chinese speakers lived in early Chinese historical times (well into the Imperial Chinese times). Central Asia proper, on the other hand, was home to various Indo-European speakers before the Turkic and other Altaic expansions, which began during the 1st millennium BCE at the earliest (for most of the southern regions of Central Asia proper, as late as the 2nd millennium CE), thus well after the formation of the main branches of the Altaic language family. I also think the term "Altaic" is a misnomer, as the Altaic main branches are concentrated in a region east of the Altai mountains, so the Altaic homeland is probably east of the Altai region.

Correction to: Central Asia proper was originally inhabited by full or almost full Caucasoids until the migrations of Altaic speakers there beginning from the 1st millennium BCE at the earliest

Central Asia proper was originally inhabited by non-Altaic-speaking full or almost full Caucasoids until the migrations of Altaic-speaking Mongoloids to there from the east beginning from the 1st millennium BCE at the earliest

This is a fundamental part of a scientific paradig...

2011-09-05T09:11:42.680+03:00

This is a fundamental part of a scientific paradigm: if you do not know, and cannot even suggest, do not argue.

Science is not based on dismissals and denials, it is based on advancing of hypotheses and on their examinations and verifications.

Science IS based on dismissals and denials: it's called falsification. What I wrote is an entire array of arguments falsifying your R1b-Proto-Turkic theory. You can argue against my arguments, if you want, but you can't claim that falsification is not scientific.

Moreover, if you actually read what I wrote, you'd see that I _did_ advance a hypothesis about the origin of Turkic languages: that they came from Siberia/Central Asia and were associated with Mongoloids initially.

>jeanlohizun said... So you think that you es...

2011-09-05T03:03:15.738+03:00

>jeanlohizun said...

So you think that your estimated mean mutation rate from the Donald clan project is far more reliable than empirically measured mutation data from father-son pairs?

No doubt. However, you forgot to mention that the Donald Clan was just the first step, and after it the data obtained were cross-examined and cross-verified on dozens of genealogies, historical events and other evaluations, and some adjustments were made and again examined and verified.

If you care, in the Proceedings, December 2010, p. 2039-2058, the first paper has the title "Reconsideration of the average mutation rate constant for 67 marker haplotypes from 0.145 to 0.120 mutations per haplotype per generation", by Klyosov and Rozhanskii. Hundreds of 67 marker datasets are considered there, to make the reconsideration. Can you see how serious people are who work in the area?

I have explained why the father-son pairs are not there yet, and why they often give misleading and incorrect data due to poor statistics. Furthermore, I gave above concrete examples of those incorrect and inconsistent data from multiple father-son pairs. It is not my fault that you cannot understand it.

You are constantly bragging that I am not a geneticist. Mutation rate constants are not "genetics", it is chemical kinetics, my direct profession. You fail to understand it as well.

Well I got news for you: certain STRs such as DYS388 mutate differently in different haplogroups, so there goes the first strike in using the data from R1a1 to R1b1a2.

:-))))))))))

Here your ignorance goes again. It is not "news", I have researched into it for years. The verdict: DYS388 as well as ALL mutation rate constants are the same for ALL haplogroups. In short - the copying enzyme (in fact, the whole copying machinery) does not know what haplogroup is picked by us for that particular individual or a population.

DYS388 seems to be "jumpy" in the J2 haplogroup only because one does not separate branches on a haplotype tree. Because the haplogroup is an ancient one, it contains many branches with different DYS388 values. When one mixes them, the dataset contains DYS388 alleles in a rather wide range. When one separates the branches, DYS388 is the same in each one of them. The best mutation rate constant for DYS388 is 0.00022 per conditional generation of 25 years. It is THE SAME for all haplogroups and their subclades.

Examples for father-son transmissions:

-- in the Ballantine series: 0 mutations in 1636 pairs, that is the MRC is <0.0006 per generation,
-- in the Burgarella collection: 0.00042 per generation.
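The "<0.0006" figure for the Ballantine series corresponds to treating a single mutation as the smallest observable count, 1/1636. As a rough sketch only (the function names and the 95% confidence level are assumptions chosen for illustration, not part of either cited dataset), a zero-mutation result can also be converted into an exact binomial upper bound:

```python
def naive_bound(pairs: int) -> float:
    """Rate implied if exactly one mutation had been seen in `pairs` transmissions."""
    return 1.0 / pairs


def zero_count_upper_bound(pairs: int, confidence: float = 0.95) -> float:
    """Largest per-generation rate still compatible with observing zero
    mutations in `pairs` independent father-son transmissions.

    Solves (1 - rate) ** pairs = 1 - confidence for rate.
    The confidence level is an illustrative assumption.
    """
    return 1.0 - (1.0 - confidence) ** (1.0 / pairs)


pairs = 1636  # father-son pairs quoted above for the Ballantine series
print(f"{naive_bound(pairs):.5f}")
print(f"{zero_count_upper_bound(pairs):.5f}")
```

Under these assumptions the two bounds come out at roughly 0.00061 and 0.0018 per generation, so a zero count in 1636 pairs is compatible with either 0.00022 or 0.00042.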

Do you see how jumpy and inconsistent the data are in father-son pairs, even for about 2000 of them?

In the R1b1a2-P312 series of 2299 haplotypes (that is, 2299 DYS388 values) - 82 mutations. Since the calculations gave 4000 ybp for the dataset (to a common ancestor), that is 160 "conditional generations" of 25 years, the mutation rate constant for DYS388 is equal to 82/2299/160 = 0.00022 per generation.

In the R1a1 series of 1198 haplotypes (that is, 1198 DYS388 values) - 48 mutations (including 17 mutations with DYS388=10 counting them as one mutation each). Since the calculations gave 4600 ybp for the dataset, that is 184 "conditional generations" of 25 years, the mutation rate constant for DYS388 is equal to 48/1198/184 = 0.000218 per generation, that is practically 0.00022.
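The arithmetic in the two calibrations above can be reproduced with a few lines. This is a minimal sketch: the function name is hypothetical, and the TMRCA values are taken as given from the text rather than derived.

```python
def rate_per_generation(mutations: int, haplotypes: int,
                        tmrca_years: float, years_per_generation: float = 25) -> float:
    """Mutations per marker per generation, given an ASSUMED time to the
    common ancestor. The result is only as reliable as `tmrca_years`."""
    generations = tmrca_years / years_per_generation
    return mutations / haplotypes / generations


# Figures quoted above for DYS388
p312 = rate_per_generation(82, 2299, 4000)   # R1b1a2-P312, 160 generations
r1a1 = rate_per_generation(48, 1198, 4600)   # R1a1, 184 generations

print(f"P312: {p312:.6f}")
print(f"R1a1: {r1a1:.6f}")

# The TMRCA sits in the denominator, so doubling the assumed age halves the rate.
print(f"P312 at 8000 ybp: {rate_per_generation(82, 2299, 8000):.6f}")
```

Both calibrations land near 0.00022 (about 0.000223 and 0.000218). Note that the result scales inversely with whatever TMRCA is fed in.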

In the Chandler table it is the same 0.00022 per generation.

you are assuming that I have no previous experience with STR mutation rates, as well as, that I do not have a degree in a related field.

I do not care about your degree; I have seen in my life tons of ignorant people with degrees. Regarding "STR mutation rates" you are totally unqualified.

I would suggest you listen to a professional.

Anatole Klyosov

Dienekes said... Turkic languages belong to the ...

2011-09-05T01:56:11.668+03:00

Dienekes said...

Turkic languages belong to the Altaic language family... So, even if Anatole was right about R1b in the Altai a very long time ago -which he isn't, as discussed in GENEALOGY-DNA-L in 2010- there is no reason to associate that R1b with any sort of Turks. Actually the time depth he is talking about precedes the formation of the Turkic languages anyway.

The last sentence in the above quotation dismisses the first one. There are more inconsistencies in the quotation; however, I have got used to it.

The discussion is pointless unless you, Dienekes, describe which language the earlier R1b1 spoke, between, say, 16 and 6 thousand years ago (or in any time period within that range). Why argue if you do not know? You cannot dismiss what you do not know, and when you do not have an answer.

This is a fundamental part of a scientific paradigm: if you do not know, and cannot even suggest, do not argue.

Science is not based on dismissals and denials; it is based on advancing hypotheses and on examining and verifying them.

What you do is neither examination nor verification. It is an attempt at vague discrediting without offering a counterhypothesis.

Anatole Klyosov

It seems Dienekes has already written exactly what...

2011-09-04T22:32:30.430+03:00

It seems Dienekes has already written exactly what more or less I would have written in reply to Anatole's comment, so there is no need to add anything to Dienekes' last comments. The only thing I could add is that Proto-Altaic and all of its Proto-branches (Proto-Turkic, Proto-Mongolic, Proto-Tungusic, Proto-Korean and Proto-Japonic) were most probably all spoken in a region comprising what is now Greater Mongolia, northeast of what is now China and/or the eastern parts of Siberia, thus by full or almost full Mongoloids. Also, the Altai region isn't fully in Central Asia but in the intersection between Central Asia, Greater Mongolia and eastern Siberia. Central Asia proper is the region comprising the former Soviet republics that end with "-stan", and, unlike the regions that are the most likely homelands of the Proto-Altaic and Proto-Turkic languages, which are both east of Central Asia and are Mongoloid regions from time immemorial, Central Asia proper was originally inhabited by full or almost full Caucasoids until the migrations of Altaic speakers there beginning from the 1st millennium BCE at the earliest.

Anatole Klyosov said: Why do you think I mentioned...

2011-09-04T22:30:01.171+03:00

Anatole Klyosov said:
Why do you think I mentioned only the first 12 markers as the most reliable? Because I know all of them, and examined and cross-examined each one of them. This reflects the principal difference between a professional such as myself (in kinetics of time-related processes) and you with your ignorance in the subject. You just grab numbers without thinking how reliable those numbers are.

No sir, I’m sorry, but I have had it with your ad hominems. Firstly, you are NOT a geneticist; you are a biochemist who is doing genetics as a hobby. Yet you dare to call the work of other geneticists, and that of a team of scientists, “utter nonsense”. I was trying to be respectful towards you because you are older than me, and because unlike you I have professionalism. Anyhow, just to give you a taste of reality.

So you think that your estimated mean mutation rate from the Donald clan project is far more reliable than using empirically measured mutation data from father-son pairs? Well I got news for you: certain STRs such as DYS388 mutate differently in different haplogroups, so there goes the first strike in using the data from R1a1 to R1b1a2. The second strike is that, as I showed above and, unfortunately for you, will show again using different data sets, the usage of a combined STR set tends to result in disastrous TMRCA estimations. I bet you didn’t even bother to check the standard deviation of the combined STR set. You seem to have forgotten that when the standard deviation of a set of mutation rates is almost the same as their mean, you are in trouble. Something you don’t realize is that applying the mutation counting method to the Donald clan assumes that, in the slowest markers, the R1a1 people of the Donald clan would have the same number of mutations relative to their sample size and to their TMRCA as the Iberian set from Adams et al. (2008).

In order for your calibrated mutation rates to work (even if used independently and not in the disastrous mean mut/marker/gen method), if the people from the Donald Clan project had, say, 14 mutations in DYS388, DYS392, DYS393, DYS437, and DYS438 of their 88 haplotypes, the people from Adams et al. (2008) would have to have a similar number proportional to their sample size and TMRCA. Yet you don’t take into account the effect of random chance, and the fact that a completely different set can have more or fewer mutations in those five positions. Also, sir, did you know that the forward and backward mutation rates can be different for several STRs?

Again, this has come down to you pretty much trying to invalidate the arguments of others based on you being a “Professional” and having written papers about it (though not published in any major journal). Again, you seem to make a lot of assumptions; for example, you are assuming that I have no previous experience with STR mutation rates, as well as, that I do not have a degree in a related field.

So, in this hypothetical example of the people from the Donald Clan having 14 mutations in the 5 positions mentioned above, what if one were to test a different set of people who had a known common ancestor also 26 generations ago, and they turned out to have only 4 mutations in those 5 positions? The only way to truly calibrate the data using the genealogy calibration instead of actual measured empirical data would be to test multiple datasets with a known common ancestor, average the mutation rate of each STR independently, and check that the standard deviation is somewhat reasonable. Anyhow, if this is going to come down to you just claiming that no one but you understands the mutation rates, I think we have reached the end of our discussion. I would gladly back off now, before this discussion gets too personal, which I’m afraid it is getting. Nonetheless, I will keep bringing forth data that shows that the usage of a combined STR sample is flawed because fast STRs undermine slow STRs in large datasets.
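The hypothetical above can be put into numbers. This is only a sketch under stated assumptions: the second group's size is not given, so it is set to 88 haplotypes to match the first, and the five markers are pooled because only pooled counts are given (a real calibration would treat each STR separately, as proposed above). The function name is hypothetical.

```python
from statistics import mean, stdev


def pooled_rate(mutations: int, haplotypes: int, markers: int, generations: int) -> float:
    """Mutations per marker per generation, pooled over `markers` loci."""
    return mutations / (haplotypes * markers * generations)


# Group A: 14 mutations in 5 markers of 88 haplotypes, 26 generations (from the text).
# Group B: 4 mutations, same 26 generations; 88 haplotypes is an ASSUMPTION.
rates = [
    pooled_rate(14, 88, 5, 26),
    pooled_rate(4, 88, 5, 26),
]

average = mean(rates)
spread = stdev(rates)   # sample standard deviation across the two calibrations
cv = spread / average   # coefficient of variation

print(f"rates:   {rates[0]:.6f}, {rates[1]:.6f}")
print(f"average: {average:.6f}")
print(f"sd/mean: {cv:.2f}")
```

With these two made-up genealogies the standard deviation is close to 80% of the mean, which is the situation described above where a single calibrated rate says little.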

Regards,

Jean Lohizun

How would you respond to a paper featured in the c...

2011-09-04T21:46:30.809+03:00

How would you respond to a paper featured in the current Dienekes selection, which concluded that the Turkic language in present-day Turkey was not brought from the East, unlike many other Turkic languages in the world?

Turkic languages were brought to Anatolia from the east; we are fortunate that this event happened in full light of history, so there is no argument to be had here.

What the current paper has concluded is that the arrival of Turkic languages in Anatolia was not the result of massive migrations from the East, which is a reasonable conclusion that has been confirmed time and again.

Turkic languages belong to the Altaic language fam...

2011-09-04T21:44:18.797+03:00

Turkic languages belong to the Altaic language family. Altaic languages (Mongolic, Turkic, and Tungusic, and more distantly Korean and Japanese) are primarily located in central-east Eurasia and spoken by Mongoloids and admixed Mongoloids. We also have good evidence now that there is a common autosomal component to Altaic speakers (see one of the links in the post), and this component is also aligned with East Eurasians (Mongoloids).

Every single piece of evidence points to the fact that Turkic languages were first spoken by Mongoloids. So, even if Anatole was right about R1b in the Altai a very long time ago -which he isn't, as discussed in GENEALOGY-DNA-L in 2010- there is no reason to associate that R1b with any sort of Turks. Actually the time depth he is talking about precedes the formation of the Turkic languages anyway.

Anatole always speaks about DATA, but he disregards all the data and relies entirely on his own Y-chromosome analysis to pretty much come up with the most imaginative scenaria.

Onur said... ...your scenario of the spread of l...

2011-09-04T16:25:33.884+03:00

Onur said...

...your scenario of the spread of languages and haplogroups is very much on the speculative side of the debate so much so that it is on the border between science and pseudo-science.

It is rather senseless to discuss the matter after I laid out here the background of the concept point by point, and you did not bother to respond with DATA. Pseudo-science is the way you chose to respond.

Turkic languages are now accepted as a branch of the Altaic language family by most linguists

WHICH Turkic languages? Do you know datings of origin/split of those languages which are considered by "most linguists"? Do you realize that I am talking about a time period for R1b1 and their language between 16 and 5 thousand ybp?

>...both the homeland of the Turkic language family and the homeland of the Altaic family are commonly thought to be somewhere in Greater Mongolia and/or eastern part of Siberia... So your R1b1-Proto-Turkic connection isn't plausible (whatever R1b1 Central Asian Turkic people now carry seems to be entirely from the pre-Turkic inhabitants of Central Asia).

You gave a very confusing and self-conflicting statement (rather, a mix of conflicting statements), considering that R1b1 arose in Central Asia, and likely in the Altai region, that their language was "Erbin", which linguists have not even considered (if they did, a reference please), and that, considering their possible route, this Erbin might have been a Proto-Turkic language. How much do linguists know about Proto-Turkic languages? How much do YOU know about them? Have you ever read about Proto-Turkic languages? I have.

How would you respond to a paper featured in the current Dienekes selection, which concluded that the Turkic language in present-day Turkey was not brought from the East, unlike many other Turkic languages in the world?

I would not bet on that conclusion; however, it does not conflict with what I am talking about. Maybe there is something in it.

A lesson to you - don't throw around words such as "speculative", "pseudo-science". It is a bad sign, at least for a scientist. I doubt you are one of them. Your words mean that you either do not think, or just cannot think. You probably have not heard of "brainstorming", when you first express various scenarios, and then look at available data and see what potentially confirms and what clearly contradicts. Where is that balance in your "critique"?

In a way, I am pleased. I see time and again that there are not many people around who want to, and can, think and analyze DATA.

Anatole Klyosov

My dear friend, You have just written many words ...

2011-09-04T15:53:27.925+03:00

My dear friend,

You have just written many words on a subject you are not knowledgeable in. Your manipulations are all based on an indiscriminate extraction of Chandler's numbers without verifying them, and without even considering that they (or just one of them) might be incorrect. Has it occurred to you that some numbers might indeed be incorrect?

You have overlooked my comment above in this thread:

>>Indeed, Chandler's table, the most reliable one for the first 12 markers...

Why do you think I mentioned only the first 12 markers as the most reliable? Because I know all of them, and examined and cross-examined each one of them. This reflects the principal difference between a professional such as myself (in kinetics of time-related processes) and you with your ignorance in the subject. You just grab numbers without thinking how reliable those numbers are.

You do not look at the core of the problem. You do not want to pay any attention to the fact that those Iberian R1b1a2 haplotypes produce essentially the same TMRCA whether they are calculated using 19 marker haplotypes, 25 marker, 37 marker, or 67 marker haplotypes, using either linear or logarithmic methods. All those results are within margins of error. In spite of this obvious result, you grab something indiscriminately, manipulate it mindlessly, and voila. In a way, you have repeated the same flawed "approach" as the paper beloved by you. They also grabbed something (wrong "rates" from father-son pairs) without thinking, and voila.

THIS is pseudo-science. And you with your manipulations fully belong to it, at least in this particular case.

Anatole Klyosov