October 02, 2009

Decreased Rate of Evolution in Y Chromosome STR Loci of Increased Size of the Repeat Unit (Järve et al. 2009)

The paper unfortunately repeats the false explanation for the alleged difference between the germline and "evolutionary" mutation rate:
These so-called ‘pedigree’ rates have turned out to be an order of magnitude higher than the ‘evolutionary’ rate estimate of 2.6×10⁻⁴ per generation for the same STR loci, obtained in a study based on counting the number of mutations on the branches of a haplotype network [14].

This discrepancy might be explained by the fact that a large share of STR variation derived within a haplogroup is being effectively removed by genetic drift, rendering mutation rate estimates based on evolutionary considerations 3 or more times lower than those based on pedigree studies [15]. The effective mutation rate (based on evolutionary considerations) has been estimated as 1.52×10⁻³ per generation for an average autosomal dinucleotide STR locus and as 0.85–0.93×10⁻³ per generation for tri- and tetranucleotide loci [16]; the mutation rate for an average Y chromosome tri- or tetranucleotide STR locus has been estimated as 6.9×10⁻⁴ per 25 years [17].

PLoS ONE 4(9): e7276. doi:10.1371/journal.pone.0007276

Decreased Rate of Evolution in Y Chromosome STR Loci of Increased Size of the Repeat Unit

Mari Järve et al.

Abstract

Background

Polymorphic Y chromosome short tandem repeats (STRs) have been widely used in population genetic and evolutionary studies. Compared to di-, tri-, and tetranucleotide repeats, STRs with longer repeat units occur more rarely and are far less commonly used.

Principal Findings

In order to study the evolutionary dynamics of STRs according to repeat unit size, we analysed variation at 24 Y chromosome repeat loci: 1 tri-, 14 tetra-, 7 penta-, and 2 hexanucleotide loci. According to our results, penta- and hexanucleotide repeats have approximately two times lower repeat variance and diversity than tri- and tetranucleotide repeats, indicating that their mutation rate is about half of that of tri- and tetranucleotide repeats. Thus, STR markers with longer repeat units are more robust in distinguishing Y chromosome haplogroups and, in some cases, phylogenetic splits within established haplogroups.

Conclusions

Our findings suggest that Y chromosome STRs of increased repeat unit size have a lower rate of evolution, which has significant relevance in population genetic and evolutionary studies.
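The two summary statistics named in the abstract, repeat variance and diversity, can be sketched as follows. This is only an illustration: the allele lists are invented, and using the unbiased sample variance and Nei's unbiased gene diversity is an assumption, since the abstract does not say which estimators the authors applied.

```python
from collections import Counter

def repeat_variance(alleles):
    """Unbiased sample variance of the repeat counts observed at one locus."""
    n = len(alleles)
    mean = sum(alleles) / n
    return sum((a - mean) ** 2 for a in alleles) / (n - 1)

def gene_diversity(alleles):
    """Nei's unbiased gene diversity: n/(n-1) * (1 - sum of squared allele frequencies)."""
    n = len(alleles)
    counts = Counter(alleles)
    sum_sq_freq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq_freq)

# Hypothetical repeat counts for ten men at two loci (invented for illustration).
tetra_locus = [13, 14, 14, 15, 13, 16, 14, 12, 15, 14]
penta_locus = [10, 10, 11, 10, 10, 9, 10, 10, 11, 10]

for name, locus in (("tetra", tetra_locus), ("penta", penta_locus)):
    print(name, round(repeat_variance(locus), 3), round(gene_diversity(locus), 3))
```

With these invented samples the pentanucleotide locus shows the lower variance and diversity, which is the kind of contrast the abstract describes; the actual magnitudes in the paper come from its own data.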

Link

73 comments:

Gioiello said...

Of course I don’t agree with what Dienekes says: “The paper unfortunately repeats the false explanation for the alleged difference between the germline and "evolutionary" mutation rate”.

I think that this paper is decisive of the controversy.
I invite you all to meditate on this:

Table 3. Coalescence age estimates and ancestral haplotypes of Y chromosome haplogroups.

Columns:
Penta/hexanucleotide repeats: Y PENTA 1-DYS594-DYS596-Y PENTA 2-DYS643-DYS645-DYS438-DYS448
Tri/tetranucleotide repeats: DYS19-DYS389I-DYS389II-DYS390-DYS391-DYS392-DYS393-DYS437-DYS439-DYS456-DYS458-DYS635-Y GATA H4
SNP-based coalescence age estimates [24]

Coalescence age estimate

Haplogroup | Penta/hexanucleotide repeats | Tri/tetranucleotide repeats | SNP-based [24]
R1a | 17,500±2,700 | 15,800±3,100 | -
R1b1b1 | 16,700±4,700 | 22,900±9,300 | -
R1b1b2 | 10,900±1,800 | 16,600±6,000 | -
R1 | 30,900±3,300 | 31,900±6,200 | -
R1 (Europe, 14 R1a+14 R1b1b2) | 23,300±4,300 | 27,000±5,500 | 18,500 (12,500–25,700)
R (8 balanced samples) | 39,600±5,300 | 41,800±11,400 | 26,800 (19,900–34,300)
P (8 R+4 Q) | 31,700±4,500 | 41,300±8,100 | 34,000 (26,600–41,400)
K (12 P+4 NO+1 L) | 42,100±3,900 | 42,600±9,200 | 47,400 (40,000–53,900)
F (27 samples, incl 17 K) | 43,600±3,100 | 46,000±10,000 | 48,000 (38,700–55,700)
CF | 64,700±5,700 | 42,200±7,200 | 68,900 (64,600–69,900)

Ancestral haplotype

Haplogroup | Penta/hexanucleotide repeats | Tri/tetranucleotide repeats
R1a | 11-10-10-10-10-8-11-20 | 16-13-17-25-11-11-13-14-10-16-15-23-12
R1b1b1 | 13-10-10-10-9-8-10-19 | 14-14-17-21-11-13-13-15-13-15-16-23-11
R1b1b2 | 11-10-10-11-10-8-12-19 | 14-13-16-24-11-13-13-15-12-16-17-23-12
R1 | 11-10-10-10-10-8-11-19 | 15-13-17-24-11-12-13-15-11-16-16-23-12
R1 (Europe, 14 R1a+14 R1b1b2) | 11-10-10-10-10-8-11-20 | 15-13-16-24-11-12-13-15-11-16-16-23-12
R (8 balanced samples) | 11-10-10-10-10-8-11-19 | 15-13-16-24-11-12-13-15-12-15-17-23-12
P (8 R+4 Q) | 11-10-10-10-10-8-11-19 | 15-14-16-24-10-11-13-15-11-15-17-23-12
K (12 P+4 NO+1 L) | 11-10-9-10-10-8-10-19 | 15-13-16-23-10-13-13-15-11-15-17-22-12
F (27 samples, incl 17 K) | 11-10-9-10-10-8-10-20 | 15-13-16-23-10-11-13-15-12-15-16-21-12
CF | 11-11-10-9-10-8-10-20 | 15-13-16.5-24-10-11-13-14-12-15-17-22-11

Coalescence age estimates, based on penta/hexanucleotide and tri/tetranucleotide repeats and the respective mutation rates, and ancestral haplotypes (estimated as the weighted median number of repeats at each locus) of Y chromosome haplogroups. SNP-based age estimates from [24] are reported for comparison. Multicopy markers DYF411S1 and DYS385a/b were excluded from the calculations.

Vincent said...

I'm not sure I'd be quick to condemn the authors of this paper re: "evolutionary" vs pedigree rates. What I see them doing in the quote highlighted by Dienekes is attempting to explain WHY the so-called evolutionary rates differ from pedigree (aka "actual") rates.

While I'd prefer to see the whole notion of "evolutionary" rates dispensed with, introducing the concept of drift (even if they don't treat it with any detail) at least opens the door to the crucial question: "are haplogroups likely to have been affected by drift?". I think any thoughtful approach to that question would arrive at the same answer that many of us (Dienekes included) seem to have arrived at: almost certainly "no". That answer, in turn, would give us a strong defense of using pedigree rates to date the coalescence of at least most large, extant y-haplogroups.

More broadly, I think this paper adds some forward progress to the field of coalescence estimation in that it highlights the fact that all STRs are not created equal with regard to their "clockiness". They open some doors (e.g. drift and mutational saturation) that hold some promise.

I do wish they had pushed a little harder on the "evolutionary" vs "pedigree" thing. They go out of their way to highlight in the paper that R1b1b1, unlike R1b1b2, appears - on the basis of their MJ network - to have been subject to drift. Yet they turn around and use the same "evolutionary" mutation rate for both groups even after admitting earlier that drift is affecting the "evolutionary" rate. It doesn't seem self-consistent to me.

VV

Gioiello said...

If we look at the modal, for instance of DYS392, CF had 11, K had 13, P had 11, R had 12, R1b1b2 had 13, and I have had a last mutation to 12. Within 65,000 years my haplotype has changed 6 times, but if we compare my 12 with the ancestral CF=11, it would seem it has changed only by 1. For this reason I think Zhivotovsky is right.

It is what I tried to say when I spoke in the past of the mutations turning around the modal.
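The contrast between the net difference and the amount of change along the lineage can be tallied from the modal DYS392 values quoted in the comment. This is only a sketch: treating the listed modals as one direct line of descent and counting a two-repeat jump as two single steps are both assumptions, so the tallies need not match the figure of 6 given above.

```python
# Modal DYS392 values as quoted in the comment, from oldest to youngest.
dys392_path = [("CF", 11), ("K", 13), ("P", 11), ("R", 12),
               ("R1b1b2", 13), ("commenter", 12)]

values = [v for _, v in dys392_path]
pairs = list(zip(values, values[1:]))

# Difference seen when only the two endpoints are compared.
net_change = abs(values[-1] - values[0])
# Number of successive nodes whose modal value differs.
state_changes = sum(1 for a, b in pairs if a != b)
# Total single-repeat steps, assuming every change proceeds one repeat at a time.
single_steps = sum(abs(b - a) for a, b in pairs)

print(net_change, state_changes, single_steps)  # 1 5 7
```

Whichever way the changes are counted, back-and-forth mutation hides most of them when only the endpoints are compared.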

German Dziebel said...

From their Fig 1 (B) and 2 (B) it looks like Underhill's deepest rooting African haplogroups A and B are in fact composed of several distinct haplotypes which are distributed across several nodes in the tree and hence are closer to different non-African haplogroups than to other A and B haplotypes. This suggests that Africa may have been peopled by several distinct groups of ancient hunter-gatherers ultimately derived from specific non-African populations.

Gioiello said...

German, I would say more: not only did Hg. E probably enter Africa from the Middle East, and of course Hg. J, but R1b1* is certainly from Asia through the Middle East. In the past I wondered about the closeness of the Bantu languages to Eurasian languages (and the tone languages of the Gulf of Guinea remind me of Chinese) and I think we must consider the idea that these languages were brought to Africa by R1b1* or other Eurasian haplogroups.

Maju said...

...the crucial question: "are haplogroups likely to have been affected by drift?". I think any thoughtful approach to that question would arrive at the same answer that many of us (Dienekes included) seem to have arrived at: almost certainly "no".

And how come? Especially when dealing with male-only transmission, which if anything should favor drift and fixation...

It seems you are just denying the very concept of genetic drift. This is totally counter-intuitive: obviously after several generations some lineages will always be more or less lucky than others, eventually leading to the extinction of many and the fixation of the lucky ones. This of course is in function of time and population size but all Paleolithic populations were small and had plenty of time (almost all of humankind's history is Paleolithic).

And then also the question is: if drift is not behind the quite shocking uniformity of haplogroups in some populations, then what? We should see an almost infinite mosaic of different haplogroups all hanging from the root of the human genealogical tree (well, there would not even be a tree at all most likely because all haplogroups that ever existed would have survived).

All this seems totally absurd. And if you're thinking of replacing drift by fitness selection, the bad news is that the effective behaviour would be the same as with drift. You'd not be able to tell the difference.

Maju said...

German: the graphs shown there are not reliable, precisely they are examples of imperfect typing. Only SNPs are reliable and what this paper seems to show is that STRs are just messing around and confusing things. They do claim though that the full set of STRs does approach the SNP-based tree but this is not in any of the graphs.

Vincent said...

Maju said "It seems you are just denying the very concept of genetic drift."

Not at all. I'm just saying that the haplogroups we find ourselves studying (e.g. E1b1b1a2, I1, R1b1b2, etc.) most often have not experienced drift to any substantial degree.

I know that drift occurs, but I also know the conditions under which it occurs, and young haplogroups that have been expanding during their entire existence are not likely to have had enough "drift" to push the "evolutionary" mutation rate far away from the pedigree rate.

As you move farther back in time then drift can become a bigger factor, of course. But I seriously doubt anyone can, with a straight face, use "drift" to justify using a Zhivesque rate to put a TMRCA on R1b1b2 (for example).

However, I also think that in cases where you believe drift is a significant factor you should not be using intraclade variance to date clades anyway.

VV

German Dziebel said...

Let's wait and see. I'll write Lev Zhivotovsky. He and I used to talk a lot about genetics when he did projects with Marcus Feldman at Stanford. The B graphs fit my data (kinship systems, languages, folklore motifs, etc.) better. All these phylogenies on "SNP-steroids" may artificially present younger clades (A and B) as old and older clades (P-Q) as young. Same for mtDNA: 9 bp deletion is a unique event but it's presumed to have occurred several times (independently in America-Asia from Africa) just because there's a bunch of unique SNPs in Africa that make several 9 bp carrying haplotypes look dramatically different from each other.

terryt said...

"African haplogroups A and B are in fact composed of several distinct haplotypes which are distributed across several nodes in the tree".

That's very interesting, but makes sense when you think about it.

"This suggests that Africa may have been peopled by several distinct groups of ancient hunter-gatherers ultimately derived from specific non-African populations".

I'm glad to see someone else has come to the conclusion that (at least some) Y-hap As may have come into Africa. I've been arguing something similar with Maju elsewhere.

"if drift is not behind the quite shocking uniformity of haplogroups in some populations, then what?"

I'd still claim selection, but not through any innate survivability of the Y-chromosome. Technology is often passed from father to son, so any improvements are likely to lead to Y-hap expansion.

Maju said...

I'm just saying that the haplogroups we find ourselves studying (e.g. E1b1b1a2, I1, R1b1b2, etc.) most often have not experienced drift to any substantial degree.

You think they are too recent for that, ok. Well, it's really difficult to see how R1b1b2a got fixated in Western Europe the way it did without any help from drift. But that's precisely what makes me think that the lineage can't be so recent after all.

...young haplogroups that have been expanding during their entire existence are not likely to have had enough "drift" to push the "evolutionary" mutation rate far away from the pedigree rate.

But how do you get a pedigree rate mutation fixated? They should always remain more or less "private", as all lineages have the same statistical chances of success.

Any novel mutation always happens in a single individual; how do you make it reach 80 or 90% of a population without any drift?

However, I also think that in cases where you believe drift is a significant factor you should not be using intraclade variance to date clades anyway.

That's an interesting observation, thanks.

Vincent said...

You think they are too recent for that, ok. Well, it's really difficult to see how R1b1b2a got fixated in Western Europe the way it did without any help from drift. But that's precisely what makes me think that the lineage can't be so recent after all.

You are focused on allele frequencies, which is understandable given the textbook treatments of drift, but the paper is focused on the effect of drift on mutation rates. Related, but not the same thing.

"Genetic drift" encompasses a lot of different real world scenarios, and different scenarios will have different impacts on frequencies and accumulated variance.

For example, in a situation where the population is very small but expands rapidly allele frequencies can fixate in pretty short order without having any significant impact on accumulated variance (aka "mutation rate"). In a population that remains small - even just "moderately small" - for a very long period of time, you are likely to see allele frequencies fixate AND variance to accumulate at something less than the "pedigree" rate.
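The two scenarios can be compared with a small deterministic sketch. It assumes a haploid Wright-Fisher population with symmetric single-step mutation, for which the expected within-population repeat variance follows the recursion V(t+1) = (1 - 1/N(t))·V(t) + μ, starting from a single founder haplotype. The mutation rate, time depth and population sizes below are invented for illustration and are not taken from the paper.

```python
def expected_variance(parent_sizes, mu):
    """Expected repeat variance after one generation per entry in parent_sizes.

    Each generation, two lineages share a father with probability 1/N (drift
    removes variance) and mutation adds mu (single-step, symmetric).
    """
    v = 0.0  # a single founder haplotype: no variance at the start
    for n in parent_sizes:
        v = (1.0 - 1.0 / n) * v + mu
    return v

MU = 0.002          # hypothetical per-generation mutation rate
GENERATIONS = 1000  # hypothetical time depth

# Scenario 1: the population stays small for the whole period.
small_constant = [100] * GENERATIONS
# Scenario 2: tiny founding group growing 5% per generation, capped at one million.
early_expansion = [min(10 * 1.05 ** g, 1e6) for g in range(GENERATIONS)]

for label, sizes in (("small constant", small_constant),
                     ("early expansion", early_expansion)):
    v = expected_variance(sizes, MU)
    # Variance per generation is what an "effective" rate would measure.
    print(label, round(v, 4), "effective rate:", v / GENERATIONS)
```

Under these assumptions the small constant population levels off near N·μ, so its effective rate is roughly a tenth of μ, while the expanding population keeps most of μ·t even though it started from a handful of men.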

Again, don't think I am dismissing drift as EVER being a factor, or even dismissing that it may be a partial explanation for the frequency of R1b1b2 in Europe. But the question of how drift has or has not affected the accumulation of variance in R1b1b2 is largely a different question.

VV

Maju said...

I understand the differences but the basic concept is the same and the question is: was there high (decisive) or low (unimportant) drift? When you have high frequencies of a single allele (fitness neutral) the natural conclusion is that there was some very intense drift at some point in the past, causing that fixation.

For example, in a situation where the population is very small but expands rapidly allele frequencies can fixate in pretty short order...

This I do not understand/agree with. If the population is expanding, drift is by definition low (all clades have high chances of effectively reproducing because there is demographic growth for all). It is high when the population is small and stable (or contracting or even growing slowly), low when the population grows rapidly. Expansion can explain that a previously fixed allele extends its area of hegemony, or can explain local founder effects but cannot be a reason behind drift, rather the opposite.

In a population that remains small - even just "moderately small" - for a very long period of time, you are likely to see allele frequencies fixate AND variance to accumulate at something less than the "pedigree" rate.

Very much less. The new mutations have almost no chance of survival. The tendency is conservative and, randomness allowing, the already dominant alleles should tend to fixation. Only in the long run, very much longer than the pedigree rate, can some of the novel mutations, by the very effect of randomness, succeed and replace them. But most novel "pedigree rate" mutations would just fail in that, only maybe one in 100 or even 1000 will succeed because those are the chances (more or less).

The pedigree rate is trivial because it implies that all novel mutations succeed and then no haplogroups are formed except as private lineages, which is not what we mean, right? Of course, the pedigree rate can only be approached, never fully accomplished (there's always some drift), when the population is very large and expanding energetically. But that's not a process that can form haplogroups in the usual sense: just a myriad of private lineages.

Going back to our fetish example of R1b1b2a, it cannot explain the haplogroup, much less its dominant status in such a large and historically diverse region, but it can maybe explain the many private lineages downstream, which have not been significantly affected by drift.

Vincent said...

the question is: was there high (decisive) or low (unimportant) drift?

The question is not that simple. I was trying to point out that the genetic history could lead to decisive allele frequency fixation without decisive effect on the accumulation of variance. It just depends on what was driving the drift.

If the population is expanding, drift is by definition low (all clades have high chances of effectively reproducing because there is demographic growth for all).

I think it will be hard for us to resolve this in an abstract discussion. But I think you'll agree that even though everyone may have high CHANCES for demographic growth, not everyone experiences equally high reproductive success. The first few generations are obviously the critical ones. Have you read any of the recent simulation papers on mutational wave fronts?

The pedigree rate is trivial because it implies that all novel mutations succeed and then no haplogroups are formed except as private lineages, which is not what we mean, right?

I think you are using the term "pedigree rate" to mean something quite different from what people usually mean. We are talking about mutation rate, in this case Y-STR mutation rate, in father-son generational events. No implication about whether "novel mutations succeed" or about haplogroups forming is involved. A father has a son, and the son has some probability mu of being slightly different in haplotype from the father. That's pedigree rate, and it is not a direct determinant of the rate of variance accumulation. The rate of variance accumulation (aka "evolutionary rate") is a conflation of pedigree mutation rate with a host of other demographic factors (e.g. degree of drift).

VV

Maju said...

I think it will be hard for us to resolve this in an abstract discussion.

You can toss some maths in but if you cannot make it appear logical in abstract or at least a simplified example...

But I think you'll agree that even though everyone may have high CHANCES for demographic growth, not everyone experiences equally high reproductive success.

That's because drift (effective result from the otherwise neutral "coin tossing") remains active.

But still a huge number of rare lineages would have great chances of success within a context of an expansive population. There's no way that anything similar to fixation can happen in such context: exactly the opposite is true.

Have you read any of the recent simulation papers on mutational wave fronts?

I fear not. Am I missing something important? What's that of a fitness neutral "mutational wave front"? Sounds quite illogical, at least pretty much counter-intuitive.

I think you are using the term "pedigree rate" to mean something quite different from what people usually mean. We are talking about mutation rate, in this case Y-STR mutation rate, in father-son generational events.

Maybe I'm misunderstanding something but I believe it's exactly the same: when people talk about pedigree rate they basically assume that if a mutation happens, then it succeeds (stays for many generations). That's why the so-called pedigree rate is identical to the mutation rate.

The reality is very different as many mutations will just succumb to drift eventually, even when drift is low. How many men do, after three or four generations have still patrilineal male descendants? Maybe a half? Three fourths? Even if it's 90%, the pure pedigree rate just can't apply in reality: there's always some drift.
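The question of how many men still have patrilineal descendants after a few generations can be explored with a simple branching-process sketch. It assumes that each man's number of sons is Poisson distributed and independent of everyone else's, which is a modelling convenience and not something stated in the thread; the mean numbers of sons are hypothetical.

```python
import math

def male_line_survival(mean_sons, generations):
    """Probability that one man's patrilineal line is still alive after `generations`.

    Galton-Watson process with Poisson(mean_sons) sons per man: the extinction
    probability is iterated through the offspring generating function
    f(s) = exp(mean_sons * (s - 1)).
    """
    extinct = 0.0
    for _ in range(generations):
        extinct = math.exp(mean_sons * (extinct - 1.0))
    return 1.0 - extinct

# Hypothetical scenarios: a stable population (1 son on average) and a
# growing one (1.5 sons on average), both after four generations.
for mean_sons in (1.0, 1.5):
    print(mean_sons, round(male_line_survival(mean_sons, 4), 3))
```

Under these assumptions roughly 31% of male lines survive four generations in the stable case and roughly 62% in the growing case, so a sizeable share of lines is lost even when the population expands.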

A father has a son, and the son has some probability mu of being slightly different in haplotype from the father. That's pedigree rate, and it is not a direct determinant of the rate of variance accumulation.

But the few mutated sons always have lower chances of success than the majority of non-mutated ones. It's just a matter of numbers. For this it doesn't even matter if drift is high or low: as long as not every son founds a highly successful patrilineage that continues to the present (and not all will, unavoidably, even in the most favorable circumstances), the chances are (in fitness neutral cases) in favor of the conservative alleles, because the chances of mutation are always quite low.

[Note: per Chandler 2006, the mutation rates of various usual DYS are between 0.00061 and 0.00530: the highest chance of mutation for a single locus is 5 per thousand, the lowest of 0.6 per thousand. Even adding all the usual DYS markers you can only get maybe a 3% mutation chance at best. The chances of success or survival of the novel lineage therefore are very low. In the long run some will succeed by mere luck but in general, even in favorable expansive conditions, most just won't. Hence the pedigree rate applied to any real situation is an absurd concept; the real mutation rate must be a lot slower].
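The chance that a son differs from his father at one or more loci can be bounded from the two per-locus extremes quoted in the note. This is a rough sketch: the panel size of 12 loci is hypothetical, mutations at different loci are assumed independent, and every locus is set to the same extreme rate only to bracket the answer.

```python
def prob_any_mutation(rates):
    """Probability that at least one locus mutates in a single father-son transmission."""
    p_none = 1.0
    for mu in rates:
        p_none *= 1.0 - mu  # this locus stays unchanged
    return 1.0 - p_none

N_LOCI = 12          # hypothetical panel size, not taken from the comment
SLOWEST = 0.00061    # lowest per-locus rate quoted in the note
FASTEST = 0.00530    # highest per-locus rate quoted in the note

low = prob_any_mutation([SLOWEST] * N_LOCI)
high = prob_any_mutation([FASTEST] * N_LOCI)
print(round(low, 4), round(high, 4))
```

A real panel mixes fast and slow loci, so its per-transmission probability falls somewhere between these two bounds.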

The rate of variance accumulation (aka "evolutionary rate") is a conflation of pedigree mutation rate with a host of other demographic factors (e.g. degree of drift).

Right. But as there is always some drift, even if low, the pedigree rate as anything happening in reality is an oxymoron. For me even Zhivotovsky's rate is too high, at least for pre-Neolithic circumstances. It may be correct for post-Neolithic ones.

Vincent said...

I fear not. Am I missing something important? What's that of a fitness neutral "mutational wave front"? Sounds quite illogical, at least pretty much counter-intuitive.

You probably should read them.

Maybe I'm misunderstanding something but I believe it's exactly the same: when people talk about pedigree rate they basically assume that if a mutation happens, then it succeeds (stays for many generations). That's why the so-called pedigree rate is identical to the mutation rate.

The pedigree rate IS identical to the mutation rate, precisely because there are no embedded assumptions of the kind that you mention. The pedigree mutation rate is merely the probability that a son's haplotype differs from his father's (per locus or summed over the whole Y). It makes no difference what happens to the son. He could be eaten by wolves at the age of 9 months. We don't care: if his haplotype is different from the father's, there is a mutation.

The "evolutionary effective" mutation rate is measuring something different: the rate of variance accumulation in a population. It is something less than the pedigree rate, but how much less depends on a bunch of things.

VV

Maju said...

You probably should read them.

I probably would if I knew which ones they are.

The pedigree rate IS identical to the mutation rate, precisely because there are no embedded assumptions of the kind that you mention. The pedigree mutation rate is merely the probability that a son's haplotype differs from his father's (per locus or summed over the whole Y). It makes no difference what happens to the son. He could be eaten by wolves at the age of 9 months. We don't care: if his haplotype is different from the father's, there is a mutation.

That's absolutely impractical and unrealistic. What matters is how many mutations survive, not how many happened but died off. This last may (probably should) be a variable in the equations but never the result.

The "evolutionary effective" mutation rate is measuring something different: the rate of variance accumulation in a population. It is something less than the pedigree rate, but how much less depends on a bunch of things.

Ok. But we agree that the effective mutation rate is ALWAYS smaller than the technical mutation rate.

That means that you just CANNOT use the pedigree rate without any correction. You need a corrected rate that reflects as well as possible the effective chances of a mutation to survive.

Just in case there is any doubt, this does not only affect populations anyhow but clades as well. In the evolutionary history of whichever haplogroup, there were many many mutations that happened but never made it after all: many more than the ones we can find and track. Of course, recent mutations are much more likely to still exist than older ones, which have probably either gone extinct or (the few lucky ones) become fixated (when the haplogroup was carried by just a small population, back in the Paleolithic probably).

Vincent said...

That's absolutely impractical and unrealistic. What matters is how many mutations survive, not how many happened but died off. This last may (probably should) be a variable in the equations but never the result.

Clear thinking is definitely called for. The mutation rate is our divisor: this rate measures how frequently mutations HAPPEN, not how frequently we OBSERVE them. Both are important in TMRCA estimation, but they go to different places in the equation. Zhiv et al. were intellectually sloppy in trying to conflate them, in my opinion, and we should be able to think more clearly by keeping the apples from the oranges. But since we are talking about effective mutation rates, we are by definition already doing sloppy thinking.
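The role of the rate as a divisor can be made concrete with a minimal sketch. It assumes the simplest variance-based estimator, age = variance / rate × generation length, which may differ from the method used in the paper. The observed variance is invented, the 6.9×10⁻⁴ rate per 25 years is the figure quoted in the post, and the pedigree rate is set to exactly three times that only to mirror the "3 or more times" gap mentioned there.

```python
def tmrca_years(variance, rate_per_generation, years_per_generation=25):
    """Variance-based age estimate: accumulated variance divided by the mutation rate."""
    generations = variance / rate_per_generation
    return generations * years_per_generation

EVOLUTIONARY_RATE = 6.9e-4             # per 25-year generation, as quoted in the post
PEDIGREE_RATE = 3 * EVOLUTIONARY_RATE  # assumption: a threefold gap, for illustration
observed_variance = 0.3                # hypothetical intraclade repeat variance

age_evolutionary = tmrca_years(observed_variance, EVOLUTIONARY_RATE)
age_pedigree = tmrca_years(observed_variance, PEDIGREE_RATE)
print(round(age_evolutionary), round(age_pedigree))
```

The same observed variance gives an age three times older under the slower rate, which is why the choice of divisor dominates the dating argument.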

Ok. But we agree that the effective mutation rate is ALWAYS smaller than the technical mutation rate.

I think I can agree with that.

That means that you just CANNOT use the pedigree rate without any correction. You need a corrected rate that reflects as well as possible the effective chances of a mutation to survive.

Okay, but what is the correction? Is it small, or large? In fact, it depends on a lot of factors. But this brings me right back where I started, which is to say that in the case of the kind of haplogroups we are usually bickering about (e.g. R1b1b2) the correction is so small we can usually ignore it: to any statistically significant degree, for this kind of case, the effective mutation rate IS the pedigree rate.

Maju said...

I think I can agree with that.

Nice.

But this brings me right back where I started, which is to say that in the case of the kind of haplogroups we are usually bickering about (e.g. R1b1b2) the correction is so small we can usually ignore it: to any statistically significant degree, for this kind of case, the effective mutation rate IS the pedigree rate.

I'd like to know why you think that. You have a lineage that is not just widespread in most of West Eurasia but actually dominant in almost half of it (and in many cases extremely dominant, almost the only haplogroup). With a recent coalescence date, there would be neither time nor demographic conditions for any fixation to happen, so such a situation of de-facto fixation in half of Europe requires an older date: one when the population was sufficiently small for fixation to happen. That can only be in the Paleolithic. I'd dare say before the Late Paleolithic, when the population apparently began expanding at high speed (this expansive scenario would not allow for drift to cause a fixation anymore, at least in principle).

Even if the lineage expanded with some Neolithic process, it should have been already fixated at the source population(s) for the derived founder effects to be so homogeneous.

Vincent said...

I'd like to know why you think that. You have a lineage that is not just widespread in most of West Eurasia but actually dominant in almost half of it (and in many cases extremely dominant, almost the only haplogroup).

First, let me make clear that many of the assumptions you are making about the demographic history of Europe I don't necessarily agree with. Principally, I don't grant that the Paleolithic is the "only" time when the population was sufficiently small for R1b1b2 to reach fixation.

Second, let me also assert that drift may not be the only explanation for high frequency of R1b1b2 in Europe. If R1b1b2 arrived in Europe as part of a neolithic package (as I suspect it did), it doesn't seem too far fetched to assert that the advantages of that package may have conferred a degree of advantage on R1b1b2 itself: like some kind of linkage disequilibrium, except with the linkage being between the Y-chromosome and technology/culture instead of between two genetic loci.

But that begs the question of WHY I assert these things. I don't have time to produce a complete defense of my view, but in short I am persuaded by my phylogeographic assessment of R1b1b2. The subclades of R1b1b2 align spatially in a clear cline, appearing to radiate from the Near East. Yet these clades all have virtually identical modal haplotypes. A scenario like the one you advocate (long drift in a large population) would be expected to produce more long branches in the phylogeny than we observe. On the other hand, a neolithic expansion scenario explains the data quite well: high frequency, expanding along the path of expansion, with little trace of non-linear variance accumulation.

BTW, here's one of the mutational wave front papers I was thinking about earlier.

http://www.pnas.org/content/101/4/975.full.pdf+html

Also, Arredi et al. have written a couple of papers (one on Europe and another on North Africa) that illustrate this phenomenon with direct application to the neolithic.

Anonymous said...

It is said that STRs with higher allele values will mutate faster, than the same STRs with lower allele values. Haplogroup J1 is cited as an example with DYS388.

Experimental mutation rates are well and good, as is Dienekes' "reduce them all by dividing them by three" idea, but we must rely on what happens in practice, in real life, between men and their sons. Also, when dealing with many STRs in a haplotype of a haplogroup it would be prudent to use the average mutation rate rather than one calculated for one STR out of the haplotype. For example, I am J1e with anomalous DYS607 and DYS578. I am the only J1 with DYS578=7, and together with a genetic cousin with the same surname but no paper connection, we are the only J1s who are DYS607=8; the mode is 14. Using the mutation rates for both STRs, we would not be closely related to any other J1 men for many thousands of years. Common sense must enter the calculation of TMRCA.

Any dating should be accompanied by proven archeological or other data to substantiate it. At the present time, there has been absolutely no evidence, just supposition, to link any haplogroup with any archeological site or ancient cultural group based on pottery and other finds or language groups. Using today's genetic distributions is not good enough. More effort should be made to link languages, cultures, "fossil" remains with haplogroups, and the peoples living today in the same localities as the archeological finds. Most studies on ancient DNA have yielded results that show no connections of modern populations with ancient ones. Only recent migrations of people have shown positive results, e.g. Avars in Hungary or the presence of mtDNA F in Croatia. And now results of horses used by Avars compared with Hungarians! As for R1b in Western Europe, it shows all the hallmarks of a recent founder effect. Same with J1e in Arabia, and North Caucasus. Frequencies over 50% beg questioning regardless of the estimated age of the haplogroup.

Maju said...

If R1b1b2 arrived in Europe as part of a neolithic package (as I suspect it did), it doesn't seem too far fetched to assert that the advantages of that package may have conferred a degree of advantage on R1b1b2 itself...

Sure, founder effect is in theory possible at the beginning of Neolithic. But there is always a problem with the fact that there was not one single homogeneous European Neolithic but at least half a dozen (two main waves and many local original ones in the East and West specially).

We'd also need the lineage to be highly dominant at the origin, but R1b is low in Greece and only weak in Turkey or the Balkans. Even in crucial intermediate places like South Italy or Hungary it's still too low to justify such a multiple massive founder effect. Nor can we associate it with other logical Neolithic clades like E1b1b1 or probably also J2b, which do have geographic patterns that correspond very well with the documented Neolithic spread from the Balkans.

The subclades of R1b1b2 align spatially in a clear cline, appearing to radiate from the Near East.

That is not the case AFAIK within R1b1b2a, which is particularly low in West Asia and the Balkans. IMO we are in the presence of two different haplogroups: R1b1b2a1 in West Europe and R1b1b2b (conjectural: based on Ht35) in Turkey and SE Europe. In any case, R1b1b2a1 should be treated as something distinct: just as you make a distinction between R1b1b2 and the rest of the lineages, you should make a distinction just one step downstream, because there's a clear geographic and phylogenetic split at that point.

We need to clarify R1b1b2a1, which is the surprisingly homogeneous bloc, not R1b or R1b1b2 (which is just transitional between R1b and R1b1b2a if not actually a parallel haplogroup mostly) before we can deal with anything upstream. Ht35 may have spread in the Neolithic but that does not prejudge anything for the bulk of European R1b, which does not belong to that haplotype cluster and likely distinct haplogroup.

A scenario like the one you advocate (long drift in a large population) would be expected to produce more long branches in the phylogeny than we observe.

That is a very good point. However the phylogeny we know may still be faulty. We are almost every other day adding SNPs to the tree, specially in this area. The downstream phylogeny of R1b1b2a1 or R1b as a whole is relatively well known for Northern Europe but really not much elsewhere. This is because the lead has been largely in private hands, who only research their customers, who in turn have mostly that ancestry. Non-Basque Iberian R1b is almost all still being classified as R1b1b2a1* because no downstream clades have been described. This is probably also the case with Eastern Mediterranean R1b and of course with African one (probably yet another distinct haplogroup).

The problem is that we are still talking of R1b or R1b1b2 when these categories show little homogeneity. We should be using lower tier denominations where needed, the same we make a difference between E1b-V13 and the bulk of E1b1b1. In the case of R1b it urgently needs further research in the Mediterranean specially both in the West (Iberia) and the East, as well as in Sahelian Africa (though this conjectural lineage is probably rather upstream). And it needs differentiated treatment for each subclade too. R1b1b-Ht35 is very interesting indeed but it's not the same (and very possibly not even at the root) of R1b1b2a1-L51 and R1b1b2a1a-L11. Almost all European genetics come from West Asia, whether Paleo- or Neolithic, anyhow.

BTW, here's one of the mutational wave front papers I was thinking about earlier...

Thanks. I'll take a look immediately.

Gioiello said...

Ponto, as I said above, using DYS392, Zhivotovsky et al. counted mutations step by step, and these mutations were fluctuating around the modal value. If you look closely, we sometimes have, passing from one haplogroup to another, a two-step difference, and we can attribute it either to a multistep mutation or to the fact that whoever had the SNP that generated the new haplogroup was two steps distant from the preceding haplogroup. In this case, between my DYS392=12 and the CF=11 there would be not 6 mutations but 7, not counting the initial 11. Of course, if you, with your multistep mutation, give rise to a new haplogroup, we should take this into account when calculating the distance from a modal for this marker. This may have happened in the past, for instance with the DYS390 of R1b1b1.
Yours are clearly multistep mutations, which only you and your cousin have, and they are a quasi-SNP, interesting for surely connecting you both, but in a large comparison they should be counted as a 1-step mutation.
I am studying a Brazilian R1b1b2a1b who has found two very close cousins, but they differ in DYS458: he has the low value 15 and they have 18. What must I think? That, since they are R1b1b2a1b, they probably have the true value and my Brazilian friend has recently had a multistep mutation from 18 to 15.

Gioiello said...

Maju says: "R1b1b-Ht35 is very interesting indeed but it's not the same (and very possibly not even at the root) of R1b1b2a1-L51 and R1b1b2a1a-L11".

You know my theory is completely different: not only R-L51 and R-L11 presuppose Italian R-L23-, R-L23+, but the crucial R-L23+/L150-, so far the Italian Romitti.

Maju said...

Vincent: read the paper already. It is interesting but adds nothing that I would not have considered to the discussion. The key issue is whether a clade or population expands (and hence *effectively mutates* much faster, generating higher internal diversity) or does not. It does not provide any suggestion that sublineages would become fixated or anything of the like, nor when these expansions may have happened.

Interestingly enough, by the end of the paper the Rh blood system is mentioned as example and the Rh- (dce) type has a distribution totally consistent in West Europe with that of R1b1b2a1 (or the local R1b). But the authors do not suggest that Rh- is a recent Neolithic phenomenon but rather a very old founder effect of the time of Eurasian expansion. Food for thought.

Maju said...

You know my theory is completely different: not only R-L51 and R-L11 presuppose Italian R-L23-, R-L23+, but the crucial R-L23+/L150-, so far the Italian Romitti.

I know... but I don't have to agree with it, right? I am quite persuaded that there are many untyped relevant SNPs still to be discovered within R1b so the *apparent* structure may (and probably will be eventually demonstrated to be) in fact a misrepresentation of the real tree. For me Ht35 (like so many haplotype clusters in the past) will eventually be shown to be a distinct haplogroup.

So far, excepting the Nordic sub-branch of R1b1b2a1 (namely R1b1b2a1a1), the whole haplogroup lacks (somehow surprisingly) enough internal definition. No wonder when most often the description of the clade has been P(xR1a). Like has happened in the past in so many cases, sufficiently defined haplotype clusters are likely to be distinct subhaplogroups.

Enjoy.

Gioiello said...

Maju, your theory is interesting and must be treated with care, but my question to you is the same one I have put many times to Vizachero: find R-L23+/L150- somewhere. If the origin isn’t in Italy, you must find this haplotype/haplogroup elsewhere. I am still waiting. Anyway, even if there are many SNPs not yet discovered, as you think, R-L23+/L150- is crucial. But we haven’t yet found R-L23+/L49- and vice versa, nor R-M269+ and some S3/10/13/17-… Of course, when we have many SNPs together there has been a bottleneck. And if there has been a bottleneck, whoever carried those SNPs has died without descendants and we'll never find them.

Maju said...

You have mentioned it before but L150 remains unlisted at ISOGG as of today. Hence I really do not know what you're talking about when you mention that clade. From what I can see at ISOGG, the sequence is L23, L49, etc. (defining R1b1b2a) and downstream L51 (defining R1b1b2a1). I can't give an opinion on the rest.

Gioiello said...

Maju, I already told you in the past that L150 is in Adriano's spreadsheet. Romitti is R-L23+/L150-; I am R-L23+/L150+ like everyone downstream. All your haplotypes/haplogroups descend from me, both R-U106 and R-P312.
This SNP is so important that Vizachero is testing all his guys for it, but so far he probably hasn't found it (and I think he won't find it where he is looking for it).
But I invite you to move to the thread above, which I believe is explosive.

Anonymous said...

Multistep mutations don't occur very often. Most studies, the father/son ones show that most mutations are of the order of one, plus one. Multistep mutations occur very rarely. Mutations reducing allele values are less common than those increasing allele values. Of course, many studies must be undertaken to form a reasoned average mutation rate. YHRD shows a number of mutation rates for each locus and an average rate.

I do not accept that R1b entered Europe in the Neolithic, that is less than 10 kya. I accept an entry time some thousands of years earlier, during the time after the LGM but before the Neolithic age, that is before the domestication of the horse, Proto-I.E. languages and farming. I just don't accept any of the associations of R1b to any AMH remains found in Europe like those of Cro Magnon or Grimaldi or the others. R1b entered from the east and ended up in Iberia, the Western Isles and Northern Italy much later than its presence in the ameliorating locales of Europe now called France and Germany. Iberia and Italy were cut off by ice-covered mountains. Many here are referring to subclades of R1b as found today in Spain or Ireland or Italy. They are only at best 4 kya. It would be better to concentrate on the SNPs upstream from those recent mutations to find the progenitor of those Johnny-come-lately subclades.

I don't know about you folks but for me, my haplogroups, SNPs and STRs were ascertained from the DNA in the nucleus of my somatic cells, not my germ line cells. In fact just about every DNA test done by commercial testing companies is on DNA from somatic cells. How those DNA results from somatic cells compare with the DNA in germ line cells, frankly I do not want to know. Obviously a man is going to pass his Y chromosome haplogroup to his son via his germ line cells and any differences would mostly occur there. I don't know of any studies which have done tests on the DNA obtained from that source between father and son!

Blood group Rhesus -ve may coincide with a subclade of R1b. Well, congratulations. Coincidence is not the basis for anything. Rhesus -ve may coincide with halitosis or a tendency for ingrown toenails, and having dark brown to black hair. Rhesus negative is not an advantage. Even Basques are far from 100%. Many neonatal and postnatal complications must have arisen due to the high likelihood of Rhesus -ve women being impregnated by Rhesus +ve men. It is like red hair. It carries more disadvantages than advantages.

Gioiello said...

My father was 0 Rh-, I am 0 Rh+, and so my sons.
I invite you too, like Maju, to the thread up, I believe explosive.

Maju said...

Many here are referring to subclades of R1b as found today in Spain or Ireland or Italy. They are only at best 4 kya.

That is what I find simply impossible (if you mean R1b1b2a1, which is widespread through all Western Europe, sensu lato) because no possible migration that late could have caused such a huge impact, with such a tremendous uniformization everywhere. Downstream clades found maybe in 5-10% of the people might be that recent but not something that is shared by 50-90%, depending on the area.

Four or five thousand years ago, there were already thriving civilizations over there. Celtic expansion, the only sizeable phenomenon within that timeframe of yours, never affected even half of the Iberian geography, often as hybrid cultures. It just doesn't fit anywhere.

I don't know about you folks but for me, my haplogroups, SNPs and STRs were ascertained from the DNA in the nucleus of my somatic cells, not my germ line cells.

I've never tested my DNA. What for? What matters is population history, not individual ancestry. Anyhow, I know exactly the towns and even farmhouse where my patri- and matrilineal ancestors lived at, so I'd expect to be R1b1b2a1 and either H or U5. I could be wrong about that (who knows, maybe E1b1b and J?) but that would not change the real deeply rooted local ancestry anyhow. I'm more interested in the ancestry of my people (either Basque or European or Human in general) than in my own private lineages. I'd donate my DNA to a public "open source" database though.

Maju said...

Blood group Rhesus -ve, may coincide with a subclade of R1b. Well, congratulations. Coincidence is not the basis for anything.

I don't believe in coincidences, much less when recessive deleterious traits are involved, as is the case with Rh-. A large inflow of exotic males, as the one needed to cause such a homogeneity in post-Neolithic times, would just have erased the Rh- to virtually zero levels - unless the incoming males would have been also mostly Rh-. But that would be a coincidence!

Rhesus negative is not an advantage. (...) Many neonatal and post natal complications must have arisen due to the high likelihood of Rhesus -ve women being impregnated by Rhesus +ve men.

Yes indeed, specially for the children of Rh+ fathers, except the first one. It's a "racist killer" gene... in about 13% of cases... so not that intense anyhow, specially if once upon a time everybody was Rh-, as was probably the case (otherwise the trait would have been destroyed by evolutionary pressure long ago). But whatever the case, it is clear that it is recessive and deleterious, yet it's there anyhow in huge numbers, so there can't have been any such huge immigration after all.
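
A figure of that order can be reproduced with simple random-mating arithmetic, though whether that is where the 13% above comes from is not stated. The sketch below assumes an Rh-negative phenotype frequency of 15%, a commonly quoted figure for European populations; both that input and the function name are illustrative assumptions, not values from this thread.

```python
def rh_incompatible_mating_share(rh_negative_freq):
    """Share of random matings that pair an Rh- mother with an Rh+ father.

    rh_negative_freq is the Rh-negative phenotype frequency in the
    population, assumed equal in both sexes, with random mating.
    """
    if not 0.0 <= rh_negative_freq <= 1.0:
        raise ValueError("frequency must be between 0 and 1")
    return rh_negative_freq * (1.0 - rh_negative_freq)


# Assumed input: 15% of the population is Rh-negative.
share = rh_incompatible_mating_share(0.15)
print(f"share of matings at risk: {share:.3f}")
```

With the assumed 15% input the share comes out just under 13%. The same function shows the share shrinking toward zero as the population approaches all Rh- or all Rh+.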

McG said...

I agree with Gioiello's comment re: modal haplotype value. It does not appear to change/drift from Hg E to R1b for the 67 FTDNA DYS loci.

I think the paper's conclusions are fairly trivial. We are aware that there is about a 100:1 range of STR mu rates.

I have a haplotype very similar to ht35, yet I am L-21. How can that be? Rather than drift, I would believe catastrophe (Doggerland submersion) or multiple SNP mutations are the culprit. I am Ysearch z5hg3.

The data indicates that multiple step mutations occur at about the 5% level. They are not at all rare and confound variance estimates.
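
To get a feel for how much a 5% multistep share could distort variance, here is a minimal sketch under a symmetric stepwise model, where repeat variance grows each generation by the mutation rate times the mean squared step size. The 5% share is taken from the comment above; treating every multistep event as exactly two repeats, and the function name, are assumptions for illustration.

```python
def variance_inflation(step_probs):
    """Ratio of expected variance growth to the pure single-step case.

    step_probs maps absolute step size (in repeats) to its share of all
    mutations. Under a symmetric stepwise model the per-generation variance
    gain is mu * E[step**2], so relative to single steps the ratio is
    simply E[step**2].
    """
    total = sum(step_probs.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("step probabilities must sum to 1")
    return sum(share * step ** 2 for step, share in step_probs.items())


# Assumed split: 95% single-step and 5% two-step mutations.
inflation = variance_inflation({1: 0.95, 2: 0.05})
print(f"variance inflation: {inflation:.2f}x")
```

An age computed from variance with a single-step assumption would be overstated by about the same factor, and larger steps (three or more repeats) would push it higher.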

Vincent said...

The key issue is whether a clade or population expands (and hence *effectively mutates* much faster, generating higher internal diversity) or does not. It does not provide any suggestion that sublineages would become fixated or anything of the like, nor when these expansions may have happened.

In the case of a "wave", such as the paper discusses, there are two possible outcomes: either the new allele survives, or it does not.

Most often it does not survive.

But when it DOES survive, the paper accounts for some interesting phenomena. One is that in cases when the novel allele survives it is likely to rise to extremely high frequency. Another is that the frequency of the novel allele is likely to be much higher AWAY FROM the place it first appears than AT the place it first appears.

Of course when we look back at the neolithic transition, we don't have very good insight into the genetic composition of the people instigating the transition. And we have the disadvantage of only observing one outcome of all the possible outcomes.

But we can look at what we see with R1b1b2: high frequency, which is highest away from the point of origin; and lower STR variance, which is lowest away from the point of origin. And we can ask ourselves, what kind of process would be the best explanation for that data? A mutational wave front. And how many SE to NW wave fronts has Europe experienced? Not so many, and two are clearly stronger than the others: the first arrival of modern humans, and the neolithic transition. Given the low variance of R1b1b2 overall, the second is a hugely better fit than the first.

VV

Gioiello said...

Vizachero says: “And how many SE to NW wave fronts has Europe experienced? Not so many, and two are clearly stronger than the others: the first arrival of modern humans, and the Neolithic transition. Given the low variance of R1b1b2 overall, the second is a hugely better fit than the first”.

The true problem is when this happened: 4000 (as you are thinking), 6000, 8000, 10000 or more years ago.
We have many proofs that the most part of your hypotheses were wrong:
1) A very recent entrance was due to justify the Middle Eastern origin of Jewish R. Falsified ad abundantiam: all Jewish Ashkenazi R (and not only) are of European extraction (the last demonstration is given by the paper up on this list).
2) Western Europeans inhabit here from Paleolithic times (see the same paper up) and we can’t think only to women, and men are above all hg. R.
3) Your calculation of the R1b1b2 age has been falsified ad abundantiam. The last researches of Argiedude (and mine) has demonstrated that R1b1* which generated all the European haplogroups has wintered in the Cantabrian Refugium, and we find the origin of the subclades from R1b1b2 (L23-, L23+) above all in Italy. My theory is that from Italy there was an expansion to the Balkans, reaching the Anatolia and Middle East with Italian mtDNA U5b3. Certainly R1b1b2 and Indo Europeans were in that time from Italy to Balkans- Black Sea, in contact with Ugro-Finnic people in Ukraine/South Russia (see the same paper above).They gave origin to Linearbandkeramik, and they generated all the modern Central-North European with the most recent subclades.
4) When you’ll find elsewhere than in Italy an R-L23+/L150-, write to me, and I’ll be willing to change my theory.

Maju said...

But when it DOES survive, the paper accounts for some interesting phenomena. One is that in cases when the novel allele survives it is likely to rise to extremely high frequency.

Yah, that's a founder effect.

Another is that the frequency of the novel allele is likely to be much higher AWAY FROM the place it first appears than AT the place it first appears.

Guess it does fit with what I understand by founder effect as well.

Of course when we look back at the neolithic transition, we don't have very good insight into the genetic composition of the people instigating the transition. And we have the disadvantage of only observing one outcome of all the possible outcomes.

Well, we know they came from the Balkans (except very possibly the peripheral cases of the Atlantic and Eastern Europe) where previously there was surely a founder effect from West Asia (E-V13 and all that).

But we also know that "the wave" were in fact two mostly unrelated waves (very different material culture) and that the Mediterranean wave only shows so much signs of genuine colonization, as in most places the toolkit remained the same as in the Epipaleolithic.

We also know that there was no cultural group that even approaches the modern extension of R1b1b2a1 in Western Europe - except the Dolmenic Megalithic phenomenon, which is generally interpreted as implying no or limited colonization as well, but rather a religious phenomenon within already established cultures. But if modern Western European haploid genetics have to be Neolithic and can't be Paleolithic, then it's the only phenomenon that can explain it. It would mean that R1b1b2a1 expanded from Portugal in the Chalcolithic (late Neolithic in British usage). Does that fit with the data?

Anyhow, autosomal genetics do not correspond with that distribution at all: they emphasize a rather homogeneous Central/North European cluster, whatever the peripheral influences, and distinct Iberian and Basque clusters, as well as a distinct SE European/Anatolian one too. I have generally considered that these reflected the Epipaleolithic/Neolithic regional homogenization but, if we have to follow a recentist model, then how did they form?

But we can look at what we see with R1b1b2: high frequency, which is highest away from the point of origin; and lower STR variance, which is lowest away from the point of origin. And we can ask ourselves, what kind of process would be the best explanation for that data? A mutational wave front.

I know of no study as of yet that has analyzed the STR variance of R1b1b2a1 as such, which is the mutation we are interested in. All studies have dealt with R1b as a whole, which is totally confusing, as wherever several lineages converge, the diversity will almost always be greater. For example, the R1b diversity is surely much higher in Brazil than in Portugal, or in the USA than in Britain, but that does not mean either is the origin.

Maju said...

(cont.)

And how many SE to NW wave fronts has Europe experienced? Not so many, and two are clearly stronger than the others: the first arrival of modern humans, and the neolithic transition.

We do not know for a fact that there were no more waves in the Paleolithic. The origins of novel cultures such as the Gravettian are still obscure, and they might well have originated in West Asia too, where there are potential precursors. There were back-migrations to West Asia as well.

And, crucially, we do not know if the R1b of West Asia and largely SE Europe too is comparable with that of Western Europe. By mixing them we are probably comparing apples and oranges, or sort of that (so, where there are both types of fruits, the diversity is always higher).

You are the expert with access to relatively abundant data, Vincent: what do you think is the origin of R1b1b2a1a-L11 (and several other SNPs)? Probably L11 (rather than L51) is the important clade, as there is some indication of somewhat long coalescence prior to expansion because it has at least four SNPs without ramifications on top (and then branches out quite quickly).

Also, I was thinking that, after all, this whole issue of wavefronts is just a computer simulation, which may or may not be correct. Does it agree with real known cases like the Bantu expansion or the colonization of North America, or is it just another hypothesis that does not match reality?

The classical explanation for a situation like the one we witness re. R1b1b2a1a is that it has receded as other waves (Neolithic, Indoeuropeans) have marched against it, much like Rh- and some autosomal stuff.

Of course a massive founder effect would twist things but there are no really good occasions for such a FE to happen with the geography needed. Would it be a FE, I'd expect at least two different ones: one in the Mediterranean and another in Central Europe because it'd be too much coincidence that the people that migrated by these two different routes, with so much different cultures, were all cousins.

Vincent said...

And, crucially, we do not know if the R1b of West Asia and largely SE Europe too is comparable with that of Western Europe.

The composition of R1b in SW Asia is indeed somewhat different from what we see in W Europe: R1b1*, R1b1b1, R1b1b2*, R1b1b2a*, etc. are more frequent in SW Asia than in W Europe while R1b1b2a1a and subclades are more common in W Europe than in SW Asia. Exactly what you'd expect for a haplogroup that originated in SW Asia and expanded into Europe.

However, when it comes to variance within R1b1b2 what we observe is that it is slightly higher in SW Asia but not dramatically so. And the period of time elapsed from the overall R1b1b2 MRCA to the MRCA of the European-dominant forms (e.g. R-L11) is very short: 1,000 years or less.

In short, a dramatic spatial cline in frequency and a much less dramatic spatial cline in variance: exactly what you'd expect to result from the kind of wave that Edmonds was describing and that Arredi et al. propose for both North Africa (vis-a-vis E-M81) and indeed for Europe vis-a-vis R-M269.

VV

Vincent said...

Also, let us not ever forget that we are implicitly assuming that the Y-chromosome is strictly a neutral marker. It may not be.

VV

Maju said...

However, when it comes to variance within R1b1b2 what we observe is that it is slightly higher in SW Asia but not dramatically so.

And you have not yet focused on R1b1b2a1a, which is the real matter.

And the period of time elapsed from the overall R1b1b2 MRCA to the MRCA of the European-dominant forms (e.g. R-L11) is very short: 1,000 years or less.

Are you telling me that 4 SNPs happened in just 1000 years? It's about 1/4 of the known length of the whole R1b lineage and more than 1/3 of the better studied R1b1b2 clade.

...

Anyhow, the case is that I did focus today a bit in the haplotypes that might correspond to R1b1b2a1a and got a bit frustrated because, following the limited data of Alonso'05, the apparent diversity center remains in SE Europe (Croatia, followed by Turkey and Italy, and only then by Central Europe and some Basque samples).

But also got to know some interesting and quite odd stuff in the process, that makes me think that diversity alone isn't such a good indicator and that the wavefront model doesn't really work:

1. England seems to have quite lower diversity (relative to sample) than the Celtic countries around it. This makes no sense because whoever migrated to Wales, Ireland or Scotland at any time almost necessarily had to go through England.

This makes me think that huge samples can't really increase much the absolute number of clades (for which England and Iberia are leaders), which may be limited after all (at least with the limited amount of DYS markers that Alonso used), but increases the factor by which these are divided. Seems some sort of decreased benefits rule but applied to statistical genetics.

2. Iceland, which could be a great real example of wavefront migration, with its corresponding founder effects, enhanced by extreme drift in a very small isolated population... has almost the same apparent diversity as Denmark or Norway (and well, Norway should have less diversity than Denmark too, right?). It is surely frustrating, but how can such an ideal case of colonization of a totally empty land by a bunch of people reproduce almost exactly the motherland's genetic diversity? It should not, but it does.

I don't know what to think after noticing those two striking oddities, sincerely.

Vincent said...

Are you telling me that 4 SNPs happened in just 1000 years?

The focus should be on the tree nodes, not the SNPs. And there are 3 SNPs between the node for R-M269 and the node for R-L11 (L23, L51, and L11: M269 doesn't count).

But, yes, I am saying that the MRCA of R1b1b2a1a lived less than 1,000 years after the MRCA of R1b1b2.

VV

Maju said...

I'm not in agreement with the focus only being on the nodes: SNPs are specially meaningful mutations (much more than STRs) and take time to appear and consolidate. A series of four (or more) SNPs without any individual hanging by the middle strongly suggests that the process took some time of calm for such a drift to happen. Would there be only one SNP, I could take a swift expansion but with a long SNP chain without a single ramification I just can't.

Of course, we can't rely too much on the SNPs because we do not know all those that in fact exist but just those that have been discovered and described. But still...

And there are 3 SNPs between the node for R-M269 and the node for R-L11 (L23, L51, and L11: M269 doesn't count).

They do, of course. Who said the opposite?

But, at the current state of knowledge, the main periods of coalescent calm within R1b1b2 appear to have happened at the formation of this lineage (5 SNPs) and at the formation of R1b1b2a1 (4 SNPs). A shorter "coalescent calm" period may be assigned to the formation of R1b1b2a (2 SNPs).

R1b1b2 is a pretty well studied haplogroup, so the known SNPs may be of more significance than in less well known lineages like C or H: they should be a reasonably good sample of the real thing.

From R1b1b2a1a1d1a (an arbitrary well studied subclade) to the root there are slightly more than 100 SNPs. If four SNPs mean less than 1000 years, then Y-DNA Adam lived probably some 25,000 years ago only. Just makes no sense at all.
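
The extrapolation here is plain proportional arithmetic. It holds only if known SNPs are spread evenly in time along the branch, which is an assumption rather than an established fact; the function name is illustrative.

```python
def extrapolated_root_age(snps_to_root, snps_in_interval, interval_years):
    """Linear extrapolation of a root age from a dated stretch of SNPs.

    Assumes SNPs accumulate, and have been discovered, at the same
    constant rate along the whole branch.
    """
    years_per_snp = interval_years / snps_in_interval
    return snps_to_root * years_per_snp


# Figures from the comment: about 100 SNPs to the root, 4 SNPs per 1,000 years.
print(extrapolated_root_age(100, 4, 1000))  # 25000.0
```

Because the 1,000 years is given as an upper bound ("less than"), the 25,000 years is an upper bound under the same assumption.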

Maju said...

Erratum: "...at the formation of R1b1b2a1 (4 SNPs)" should read "... at the formation of R1b1b2a1a (4 SNPs)". Damn nomenclature!

Vincent said...

I'm not in agreement with the focus only being on the nodes: SNPs are specially meaningful mutations (much more than STRs) and take time to appear and consolidate. A series of four (or more) SNPs without any individual hanging by the middle strongly suggests that the process took some time of calm for such a drift to happen. Would there be only one SNP, I could take a swift expansion but with a long SNP chain without a single ramification I just can't.

The trouble is that you cannot use the discovered and published SNPs (e.g. the ISOGG ones or YCC ones) as a clock. We really don't know two important things: what portion of SNPs have been discovered, and whether the rate of discovery is equal on every branch. STRs actually work much better, at least for now, since the ascertainment bias is so much lower.

Simply put, you cannot reliably use the ISOGG tree as any sort of clock.

From R1b1b2a1a1d1a (an arbitrary well studied subclade) to the root there are slightly more than 100 SNPs. If four SNPs mean less than 1000 years, then Y-DNA Adam lived probably some 25,000 years ago only. Just makes no sense at all.

The rate of Y-SNP discovery is definitely not equal on every branch (see above), so this kind of extrapolation just doesn't work.

VV

Maju said...

The trouble is that you cannot use the discovered and published SNPs (e.g. the ISOGG ones or YCC ones) as a clock. We really don't know two important things: what portion of SNPs have been discovered, and whether the rate of discovery is equal on every branch.

True. You can do it but at your own risk: the conclusions are not compelling - suggestive at best.

But you can't just go with the MC hypothesis and blind faith and claim that what looks like a valley is a highly sloped mountain. I, who have been for long highly sceptical of MC and its conclusions, will tell you: erm, 4 SNPs on the way, think again.

And by "think again" I mean even all the TMRCA theory, etc.

There might be a molecular clock or not, it may behave regularly or probably not, and we may be just ignoring out of simplicity loads of factors.

Plus, yeah, what's the TMRCA of "Adam" with your system? Have you even tried it at all? If you apply shorter mutation rates than the usual "conservative" ones (that IMO are very high anyhow), you move all haplogroups many many millennia towards the present and you do that with all the phylogenetic tree, so what's the age of "Adam", the human MRCA with your equations?

It can't be but long after Toba: surely in the Solutrean. You have gone to extremes that bring you to that unavoidably. You just don't seem able to look at the overall picture and measure the consequences of applying extremely fast, uncorrected, mutation rates.

Even in an expansive process, no matter what the ideal simulations with groups of 10 virtual people say (groups ideally designed to imitate hunter-gatherers, not farmers), there must be irregularities of all sorts. And people have mostly not expanded but remained idle locally.

The "pedigree" mutation rate is just absurdly extreme, the so-called "evolutionary" mutation rate is already fast enough that looks like a fast expansion. IMO. We need an even slower normalized mutation rate to account for maybe 100,000 years of humans in Asia, as suggested by the fossil record. People most of the time did not expand at all.
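
How much the choice of rate moves a date can be sketched with the usual variance-based estimate, in which generations elapsed equal repeat variance divided by the per-generation mutation rate. The evolutionary rate below is the 6.9×10−4 per 25 years quoted in the post; the pedigree rate (set to roughly three times that) and the observed variance are hypothetical inputs chosen only for illustration.

```python
def tmrca_years(variance, rate_per_generation, years_per_generation=25):
    """Variance-based age estimate under a single-step mutation model."""
    generations = variance / rate_per_generation
    return generations * years_per_generation


EVOLUTIONARY_RATE = 6.9e-4  # per locus per 25-year generation, as quoted in the post
PEDIGREE_RATE = 2.1e-3      # assumed: about three times the evolutionary rate
observed_variance = 0.25    # hypothetical mean per-locus repeat variance

for label, rate in [("evolutionary", EVOLUTIONARY_RATE), ("pedigree", PEDIGREE_RATE)]:
    print(f"{label}: ~{tmrca_years(observed_variance, rate):,.0f} years")
```

With these inputs the same variance dates to about 9,000 years under the slower rate and about 3,000 years under the faster one, so any still slower rate would push every date back in proportion.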

Maju said...

The rate of Y-SNP discovery is definitely not equal on every branch (see above), so this kind of extrapolation just doesn't work.

You can correct for that. But anyhow I was using only one branch, the longest and best researched (SNP-wise) one.

But sure, I agree that this method can only produce very crude approximations. Still you just cannot ignore that, IMO, because the STR production mechanism was lazy at the time for the loci you use.

terryt said...

"Any novel mutation happens always in a single individual, how come do you make it become 80 or 90% of a population without any drift?"

Vincent has offered the same explanation as I did, so obviously I think it is correct: 'like some kind of linkage disequilibrium, except with the linkage being between the Y-chromosome and technology/culture instead of between two genetic loci'.

"We need an even slower normalized mutation rate to account for maybe 100,000 years of humans in Asia, as suggested by the fossil record".

Aren't you making an assumption here? You're assuming that the 100,000 year old humans in Asia possessed modern haplogroups. Until we're able to actually check that we have no way of knowing for sure. And Dienekes has reminded us often enough that haplogroups can easily be replaced.

Vincent said...

But you can't just go with the MC hypothesis and blind faith and claim that what looks like a valley is a highly sloped mountain.

Who said anything about "blind faith"? Not me, for sure. We use the best data and the best methods, and keep looking for ways to get better. In fact, that's the whole point of the paper this post is about.

Plus, yeah, what's the TMRCA of "Adam" with your system?

I already said that using intraclade variance to estimate TMRCAs where drift is big factor is unwise. This surely applies to questions of Y-Adam.

But assume you cared so much about this that you persisted anyway. Think this through. The tool I use to measure the width of a room may be very different from the tool you use to measure the width of an ocean.

There is nothing incongruous about that, so why should you be shocked to find that the right tool for estimating the TMRCA of R1b1b2 may not be identical to the right tool for estimating the TMRCA of the entire Y-chromosome phylogeny?

Again, that's a crucial take-away from the current paper: if you pick the appropriate markers for your task in the first place, you won't have to do all the "evolutionary effective" correction BS. Don't use dinucleotide or trinucleotide STRs for dating Y-Adam. Pick as many pentanucleotide and hexanucleotide markers as you can find and work from there. Or better yet, use SNPs.

VV

Maju said...

Someone from the extinct Yugoslavia wrote: "be two to agree is not to be two to be right".

Linkage disequilibrium between the Y-DNA and culture, WTF? What culture, btw? I only see generalist claims all based on the MCH (or rather an extremist version of it) and little prehistorical knowledge. Stones and bones are still more real than fuzzy equations and computer models.

Aren't you making an assumption here? You're assuming that the 100,000 year old humans in Asia possessed modern haplogroups.

I am assuming they were somewhere in the tree, in fact in the Eurasian Y(xA,B) branch. Could be wrong but is a reasonable thought if you do not begin with the MC straight away. Alternatively they went extinct.

And Dienekes has reminded us often enough that haplogroups can easily be replaced.

That's his opinion. Just that. I don't see Dienekes as any teacher but as someone who shares interests with me, even if from a very different viewpoint often.

Who said anything about "blind faith"? Not me, for sure.

Obviously it was me - and I assume all the responsibility, of course. When you build all the explanations based on a badly tested hypothesis (or rather a racial variant of it, much more controversial), instead of using a wider array of data, when you, rather shockingly, claim things are that way based only on that very feeble theoretical basis, I call that faith. The MC is not C-14 by any means but you treat them the same way.

It's not essentially different than when someone claims that Earth is 4000 years old based "on the Bible" only without much or any contrast with other more relevant data.

I think most scientists are much more cautious when dealing with these. They do not make bold claims based only on TMRCA educated hunches: they wisely just state them and let us and other scientists judge. But in a more "commoner" layer of the debate some people have gone too far in this line and are just a mere step away from worshipping MCH estimates.

And I wonder how much of this has to do with the business of selling genetic tests and promising "accurate results" that should also sound familiar (i.e. recent and not remotely prehistoric and rather trivial) to customers.

I already said that using intraclade variance to estimate TMRCAs where drift is big factor is unwise.

We can agree on that. The problem is that I understand that this applies to all lineages that are widespread: that non-drifted lineages are all private or at most moderately extended. That it is impossible to explain things like R1b or the like without a brutal drift at various stages prior to the R1b1b2a1a1 expansion.

The tool I [use] to measure the width of a room may be very different from the tool you use to measure the width of an ocean.

Absolutely. But we are talking of something only some 5 times longer (I mean: the "most derived" R1b branch has, from the R1b node, about 1/5 of the whole length when we count from the absolute human root), if we attend to known SNPs. It's a room and a house what we're comparing here: the difference is not that big - or does not look that way at least.

Or better yet, use SNPs.

Hmmm... ok. But if I can use SNPs to measure the house why can't I use them to measure the room? It's all in meters after all.

Maju said...

Erratum: "or rather a racial variant" should read "or rather a radical variant".

Vincent said...

But we are talking of something only some 5 times longer (I mean: the "most derived" R1b branch has, from the R1b node, about 1/5 of the whole length when we count from the absolute human root), if we attend to known SNPs. It's a room and a house what we're comparing here: the difference is not that big - or does not look that way at least.

For starters, the cases are more different than you concede. R1b1b2 has a TMRCA of less than 8,000 years while Y-Adam lived more like 150,000 years ago. Don't let yourself fall again into the weak thinking of using the known SNPs as a clock.

And the thing you must aim to avoid is mentioned in the paper: mutational saturation. Saturation refers to the effect of constraints on allele range, which cause variance to accumulate non-linearly with time. In other words, it is what causes a marker to have different degrees of "clockiness" when compared over different lengths of time.

Some STRs that work perfectly well as a clock for short time frames (say, 100 generations) are horrible clocks for longer time frames (say, 1000 generations) and completely unreliable at even longer time frames (say, 5000 generations).

So STRs are great for estimating short periods of time. They "tick" quickly, so you can measure small numbers of generations with a small number of markers. With greater numbers of generations, saturation starts to kick in and their performance as clocks deteriorates.
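
As a rough illustration of saturation, here is a minimal sketch, assuming a symmetric single-step mutation model in which an allele may drift at most `MAX_OFFSET` repeats from the ancestral length. The rate `MU`, the range limit, and the function name are illustrative assumptions, not values or methods from the paper. It computes the exact expected variance of one marker and prints it next to the linear expectation `MU * t` of an unconstrained clock.

```python
def stepwise_variance(generations, mu, max_offset):
    """Expected variance of the repeat-count offset under a bounded,
    symmetric single-step mutation model (a simplification)."""
    size = 2 * max_offset + 1
    probs = [0.0] * size
    probs[max_offset] = 1.0  # every lineage starts at the ancestral allele
    for _ in range(generations):
        nxt = [0.0] * size
        for i, p in enumerate(probs):
            if p == 0.0:
                continue
            nxt[i] += p * (1.0 - mu)  # no mutation this generation
            for step in (-1, 1):
                j = i + step
                if 0 <= j < size:
                    nxt[j] += p * mu / 2.0  # gain or lose one repeat
                else:
                    nxt[i] += p * mu / 2.0  # blocked at the range limit
        probs = nxt
    # The model is symmetric, so the mean offset is 0 and variance is E[x^2].
    return sum(p * (i - max_offset) ** 2 for i, p in enumerate(probs))


MU = 0.002      # hypothetical per-generation mutation rate (assumption)
MAX_OFFSET = 3  # hypothetical allele range of +/- 3 repeats (assumption)

for t in (100, 1000, 5000):
    linear = MU * t  # what an unconstrained clock would predict
    bounded = stepwise_variance(t, MU, MAX_OFFSET)
    print(t, round(linear, 2), round(bounded, 2))
```

Under these assumptions the variance can never exceed that of a uniform spread over the seven allowed alleles, which is 4, while the linear expectation keeps growing (0.2, 2 and 10 for the three time depths). The marker tracks the linear clock closely at 100 generations and falls further and further behind it afterwards.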

SNPs are great for estimating long periods of time. You need a huge number of them if you want any precision, but each SNP "ticks" so slowly that saturation is hardly a concern at all.

Want to estimate TMRCA of R1b1b2? 50 to 100 medium-fast STRs should do the trick.

Want to estimate TMRCA of Y-Adam? Dump the medium-fast STRs and find 50 to 100 really slow ones, or else sequence 100k bp of SNPs.
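
To put rough numbers on why SNPs need a lot of sequence, here is a back-of-the-envelope sketch, assuming SNPs accumulate along a lineage as a Poisson process, so that a count with expectation n has a relative standard error of about 1/sqrt(n). The per-site rate `RATE` is a purely hypothetical placeholder, the 25-year generation is the convention used in the paper's rate estimate, and the function names are made up for illustration.

```python
import math


def expected_snps(rate_per_bp_per_gen, sites_bp, generations):
    """Expected number of SNPs accumulated along one lineage."""
    return rate_per_bp_per_gen * sites_bp * generations


def relative_poisson_error(expected_count):
    """Relative standard error of a Poisson count: 1 / sqrt(mean)."""
    return 1.0 / math.sqrt(expected_count)


RATE = 2e-8        # hypothetical mutations per bp per generation (assumption)
SITES = 100_000    # the "100k bp" mentioned above
YEARS_PER_GEN = 25

for label, years in (("R1b1b2-scale", 8_000), ("Y-Adam-scale", 150_000)):
    generations = years / YEARS_PER_GEN
    n = expected_snps(RATE, SITES, generations)
    print(label, round(n, 2), round(relative_poisson_error(n), 2))
```

With this placeholder rate, 100k bp yields about a dozen expected SNPs over 150,000 years and well under one over 8,000 years, which is the sense in which SNPs suit deep splits and fast STRs suit shallow ones. A different assumed rate changes the counts but not the 1/sqrt(n) scaling.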

VV

Maju said...

For starters, the cases are more different than you concede. R1b1b2 has a TMRCA of less than 8,000 years while Y-Adam lived more like 150,000 years ago. Don't let yourself fall again into the weak thinking of using the known SNPs as a clock.

But that's mere circular reasoning: to justify a TMRCA estimate you use a TMRCA estimate. It's not valid evidence.

All the rest you say is very interesting but only applies within the MC paradigm. And MCH has not been proven in any way like C14 or other generally accepted dating methods have been once and again. It's just a theoretical construct.

Vincent said...

But that's mere circular reasoning: to justify a TMRCA estimate you use a TMRCA estimate. It's not valid evidence.

I am not trying to "justify" anything other than using the best data and methods for the task at hand. I wouldn't have thought that such rigor demanded justification.

And MCH has not been proven in any way like C14 or other generally accepted dating methods have been once and again. It's just a theoretical construct.

That's just flim-flam. The molecular clock is "generally accepted" practice in the field of population genetics, without doubt.

And C-14 dating is no less a "theoretical construct" than the molecular clock is. Or, said more directly, both are equally real. C-14 dating may be both more accurate and more precise than STR based dating or Y-SNP based dating, but there is nothing "theoretical" about observing a genome mutate from father to son.

VV

terryt said...

"I am assuming they were somewhere in the tree, in fact in the Eurasian Y(xA,B) branch. Could be wrong but is a reasonable thought".

Only because it fits your pre-existing beliefs. I agree it could be, but not necessarily so.

Vincent. That link you provided on waves is most interesting. I've always maintained that individual genes move through populations effectively in a wave.

A short digression from the direction you and Maju are going: I have a problem with the author's Rh data. He (or rather an author he quotes) claims the original was Dce, but it seems this claim is based primarily on the fact that this version is predominant in Africa, and it is then presumed all modern human genes originated there.

We know from animal and poultry breeding that mutations are much more likely to give rise to recessive genes than to dominant genes. This is just as well because we all have at least one allele that would be very disadvantageous if we had a double dose. Therefore my guess would be that the original Rh gene would have been DCE, never mind that we don't find it today. However the author mentions that dCE, a single mutation from DCE, is still around though very rare. Another single mutation DcE is found in Siberia.

The next single mutation DCe is found in SE Asia and Oceania. The African version, Dce, requires another mutation from either this version or from the Siberian DcE, so is probably not the original version. The other mutation from DCe, dCe, is present today but again described by the author as being a minor haplogroup. The Rh- gene, dce, could derive from either dCe or Dce, depends on where the first is most commonly found.

Maju said...

I wouldn't have thought that such rigor demanded justification.

You present conjectural stuff as proven and that demands a justification... or rather proof.

That's just flim-flam. The molecular clock is "generally accepted" practice in the field of population genetics, without doubt.

But it is not demonstrated in any consistent way. Lobotomy and electroshock were "generally accepted practice" not so long ago in the field of psychiatric medicine. That it is commonplace (maybe for lack of better means or maybe for intellectual laziness and academic inertia) means absolutely nothing about it being true.

C-14 is a valid dating method but before it was accepted, maybe in a less credulous age, it had to be satisfactorily proven. Even after that, it has needed some significant refinements to reach the quite decent calibrated dates we have now. But first of all it was confirmed in its efficiency; were it a mere statistical conjecture like the MC, it would never have been accepted.

I'd like some clear evidence of MC really working as it's claimed before it is presented as evidence of anything else. But some people seem to prefer to put the cart before the horses and to "prove" hypotheses on the results of equations based on other hypotheses. That I call credulity.

I accept, with due reservations, the MC as a method of conjectural estimation but I cannot accept it as proof of anything that is against other logical elements present. I cannot just accept it on its own regardless of everything else.

... but there is nothing "theoretical" about observing a genome mutate from father to son.

I have absolutely no problem with that part. But the accumulative processes in the haploid genome hardly have to do anything with that raw mutation rate. They are much more dependent on demographic factors than anything else.

In a sense we are in agreement: we agree that the raw mutation rate cannot be used in all circumstances. But you claim that it can be used in some circumstances, while I instead think that such a claim is highly conjectural and contradictory. I rather think that with an expanding population and the raw mutation rate, no fixation can ever happen.

For example, you are using a computer simulation (by definition highly imperfect) that is conceived to reproduce a very simplified hunter-gathering model, to justify the same process happening within a farming context, where the population figures should be 10 or 100 times higher at least. And you are not even including by any means the dual nature of European Neolithic waves, which should have caused not one but at least two different founder effects.

For example, you are simply discarding a lot of contradictory evidence from the presence of other haplogroups that do fit much better with the Neolithic waves structure, like E1b1b1 and J2b. Or, for example, you are ignoring the very apparent period of calm that lies between R1b1b2a1-L51 and R1b1b2a1a-L11, by mere SNP count.

How come, for example, the Neolithic wave, that is an East to West movement, produces links between SW and NW Europe, an axis that is clearly perpendicular to the double Neolithic path?

You may not be yet aware but it has been known these days that small mammals like shrews and voles also have their own "Celtic fringe" in Great Britain, in a very much parallel way to that of humans. The human "Celtic fringe" is not just defined by extremely high R1b but also by other genetic factors like very high Rh-, factors that you prefer to ignore.

For you there seems to be nothing else than conjectural MC estimates. And these are for you enough evidence to discard everything else.

Obviously I can't agree with you in this: I want to look at the whole picture and not allow the tree of my obsessions to hide the forest behind it. MC estimates alone cannot prove anything.

Vincent said...

Obviously I can't agree with you in this: I want to look at the whole picture and not allow the tree of my obsessions to hide the forest behind it. MC estimates alone cannot prove anything.

We may be at an impasse, because you discard data and the scientific method as implausible because they don't fit your preconceptions. That is no way to carry on a discussion.

But before I abandon you, let me warn you once more to abandon any attempt to use the ISOGG tree as a molecular clock. That you do so belies your grasp of the subject, I'm afraid, and risks leading other poor souls down the same path.

That you reference "the very apparent period of calm that lies between R1b1b2a1-L51 and R1b1b2a1a-L11, by mere SNP count" is sadly bizarre. Sad, because it is completely wrong. Bizarre, because you dismiss the molecular clock as a tool on one hand, then cling to it with the other.

VV

Maju said...

I cannot accept that you claim the MC as "scientific method". It's essentially conjectural, nothing else.

The scientific method basically has three phases:

1. Hypothesis, including predictions (this is the stage of the MC conjecture for most of its corpus)
2. Testing of such predictions experimentally (the MCH has never gone through this)
3. Refinement, including independent replication

Bizarre, because you dismiss the molecular clock as a tool on one hand, then cling to it with the other.

As I said before I don't fully dismiss the MCH, I just take it for what it is: an unconfirmed fashionable hypothesis. I know of the severe limitations of using the SNP tree for MC analysis but I also know that a more or less random sample of the actual SNPs is already there for the longer and best studied branches. And I know that when you claim (on mere hypothetical unconfirmed grounds, worse: on your personal radical reading of such feeble premises) that 4% of the sample suddenly appeared in less than 1% of the time, it looks suspicious.

It's not my sole objection, if you bothered reading.

eurologist said...

My theory is that from Italy there was an expansion to the Balkans...

Gioiello,

Also, don't forget that the geography and climate strongly hint that the Northern Italian and NW Balkan LGM refugia may very well be identical:

The northern part of the Adriatic (now under water), together with the Po valley and the NE Adriatic coastline formed one continuous source region, and probably by a wide margin the most abundant one in the general area. However, during summers and milder years, it had easy access to both the very nearby small inland plains and the farther, larger ones (around today's Ljubljana and Zagreb). In addition, the NE Adriatic coast receives and likely received at earlier times some of the most rainfall (and snowfall) in the region.

In contrast, most of today's Hungary and Romania must have been brutally dry and cold (they are even today, in comparison) - perhaps someone with better regional knowledge can tell me about their suitability to animal grazing during LGM.

Gioiello said...

I spoke on this some years ago on "Genealogy-dna" before they banned me. But in those times the refugia were in Spain, Balkans, South Russia and nobody spoke of Italy. This was what was unacceptable for me. Now someone speaks also of Italy as a refugium and for me is already something.

eurologist said...

That may be because people pictured Italy just like most of (Southern) Spain, Greece, and much of SW Anatolia: a dry wasteland that could not sustain pines nor oaks nor berries or other sources of vegetable food, nor relevant animals other than perhaps rabbits, and had few reliable sources of fresh water (except at the base of the highest mountains). None of this is of course true for the contiguous region I outlined above.

Just looking at modern data as a proxy:

Ljubljana: three months below freezing, average low -1C to -4C, average precipitation: around 100mm year-round

Zagreb is similar in the winter with an average rainfall of 70mm.

Craiova, on the other hand, has much lower extremes, averages ~-8C in January and February, with ~5 months of average below-freezing temps, and about half of Ljubljana's precipitation.

Budapest today is just slightly warmer in the winter, but as dry year round.

In comparison, Kaiserstuhl, in the southern upper Rhine valley, where it is still disputed whether any humans could have lived at all during the LGM, has an almost Mediterranean climate.

Maju said...

Eurologist: Hungary, or at least parts of it (as well as of nearby Austria and very specially Moravia) were inhabited in long Ice Age periods, however the LGM as such is not clear enough. Moravia in particular is claimed to have been an LGM "oasis" (i.e. it had patches of forest even in that extremely cold period) and may have hosted some of the Central European "survivors" of the time (but I know of no material fossils).

Now for the material record, in Hungary there seems to be a late "Solutrean", before even the Magdalenian expansion (but after the LGM). In NW Germany there seems to have been late Aurignacian around the LGM, that is claimed sometimes as the inspiration for Magdalenian (though this culture evolved as such in SW France).

You are right about Romania (and in general the Balkans) being quite empty (at least as regards the fossil record) in the UP. Romania specifically nevertheless was colonized from Ukraine in the Epipaleolithic.

And you're right too that the East Adriatic should correlate with Italy more than anything else.

However, beyond industries ("cultures"), there is a cultural divide in UP Europe that I never fully understood: there are areas with rock art and others where all art is portable. This may be trivial or not. The areas with rock art are the Franco-Cantabrian region (very specially), Iberia, southern (but not northern or central) Italy, Dalmatia and, out of Europe as such and rather late, southern Turkey and southern Egypt. The rest, including Central-North France, Central Europe, most of Italy and Eastern Europe only show portable art (venuses and other stuff).

The main climatic areas (excluding arctic/tundra/taiga) seem to be:

1. Loess steppe (rather rich) in the Rhine and the Danube basins, as well as in southern East Europe.
2. Continental steppe (hostile) in Northern France, the Balkans and central Iberia. These areas were mostly devoid of humans.
3. What I would call Ice Age Oceanic (semi-steppe but milder and more humid) in most of the Franco-Cantabrian region. Possibly the best climate.
4. Ice Age Mediterranean in most of Italy, coastal Iberia and surely most of Greece too, dominated with deciduous forests. Not so good for humans attending to the abundance of remains but quite temperate.

eurologist said...

Thanks, Maju.

I knew about Moravia, and that - as well as the Italian refugium - could very well be candidates for the "central axis" of Y-haplogroup I (perhaps I2, only).

I think during LGM, southern Iberia, southern Italy, and Greece were probably much, much less hospitable than just before.

In the above, I just wanted to point out the huge East-West gradient of temperature and precipitation both away from the Atlantic (long scale), and away from the Mediterranean (shorter scale). Even today, about a factor 3 in precipitation away from the Adriatic, and over 10C colder in the winter, within just a little over 500 km - that's roughly the equivalent of a 2,000m altitude change!

As to the cave art, AFAIK there are remains of paint on several southern German caves. However, most caves are of material and/or situated in regions that do not allow much surface preservation over long time. Clearly, Hohle Fels and other Danubian caves have shown that portable art and musical instruments were pretty much part of the people right when they came to the area, 35,000 to 40,000 years ago, or at least a second wave.

Maju said...

I think during LGM, southern Iberia, southern Italy, and Greece were probably much, much less hospitable than just before.

The case I know best is Mediterranean Iberia (or just Iberia for short) and the opposite is actually the case: the LGM is when Gravettian culture arrived (Aurignacian settlement was very sparse) and soon after hybridized with Solutrean, generating the most original culture of the regional UP: the Iberian Gravetto-Solutrean (which IMO may be at the origin of Oranian, but that's another story).

Even if it got colder, it was always warmer than the rest of Europe. It had a quality of refugium, that's quite clear, but it's not likely that it actively participated in any post-LGM recolonizations. For that you have to look at the Franco-Cantabrian region mostly. In this sense, I always regret the little attention that Occitania (Southern France) is being paid by geneticists.

In the above, I just wanted to point out the huge East-West gradient of temperature and precipitation both away from the Atlantic (long scale), and away from the Mediterranean (shorter scale). Even today, about a factor 3 in precipitation away from the Adriatic, and over 10C colder in the winter, within just a little over 500 km - that's roughly the equivalent of a 2,000m altitude change!

That's largely because of the orography of Europe, with all the mountains being rather close to the Mediterranean. The transition when you cross those chains is very intense. There's nothing of the like at the Atlantic-Interior axis. But that says nothing against human life being possible and even good quality at the Mediterranean coasts. The differences between the Med and the Atlantic/Continental areas seem more a product of ecology: the steppes (or at least some of them) may have been more productive than the forests. In the very particular case of the FC province, the rather harsh steppe conditions were largely ameliorated by the Ocean, being somewhat warmer and more humid. It is also several degrees to the south in comparison to the Central or East European provinces.

Whatever the case, it is quite demonstrated that the FC region hosted most of the population of Europe through all the Ice Age. All the others were secondary provinces.

As to the cave art, AFAIK there are remains of paint on several southern German caves. However, most caves are of material and/or situated in regions that do not allow much surface preservation over long time.

Maybe I'm totally wrong when suggesting this but the case is that the Bramanti paper (and other aDNA data, like Chandler's Portuguese) would seem to suggest a certain duality between Southern and Central Europe that just doesn't fit well with the cultural processes of the tool kits. That's why I am considering other possible variables; because obviously Neolithic did not expand from Portugal or anywhere in SW Europe, so H must have existed in other parts of southern Europe, like the Balkans or Middle Danube.

It might be totally unrelated to any particular cultural element excepting maybe the somewhat dual early colonization of the continent, as Mellars suggested. Or it might be that the spread of U5 may be associated with Gravettian specifically but only at the origin (Central Europe), with the situation elsewhere being different. It might even be that both spread with Gravettian and that SW Gravettian arrived from Italy and not Germany (though Gravettian was late and weak in the FC province).

Largely speculating in search of the best possible answer(s). What I think is that the Taforalt data strongly suggests that H existed in SW Europe in Gravettian times (LGM roughly for that region), as it's been apparently established that North African H is almost totally of Iberian derivation. Chandler's Portuguese data seems to ratify this idea, as do arguably the Paglicci samples.

Just food for thought.

eurologist said...

Thanks for the link. The Adriatic is of course a bit inconclusive, since much of it is under water, now.

I agree that populations in the plains would always be larger, as long as good sustenance is possible: it's basically 2-D versus a bit over 1-D (fractal) along a coast, and the possibility of hunting grazing animals that often don't exist close to coasts.

I think one could make a good case for the Danubian to be a bit of a melting pot in the 500 or so years of transition before agriculture truly took off. Both the Danubian and the Mediterranean coastline IMO had their own reasons why initially there was little genetic impact from the Anatolian agriculturalists (except in the southern Balkans, of course):

before the advent of large sea-faring populations, agriculture could spread in a filtered way along the coastline through existing populations with very little genetic input. My guess is that later travels (Phoenicians, Etruscan, Greeks) had a much larger impact.

At the Danubian, agriculture just got stuck for the numerous reasons I have often mentioned. But there are indications that the region was already "up-and-coming" with some of their own plantings and separation of labor (fishing, stone collecting and tool making).

Since agriculture was just being tested and was marginal, you really didn't have that single genetically overwhelming population there coming in from Anatolia: in fact, at the NW boundary agriculture obviously for centuries was so marginal that population growth ceased altogether.

And when it did take off, it pretty much started with some random local assortment of haplogroups. Some of it local H, some of it likely Anatolian, and some of it just idiosyncratic (N1a).

I still don't know what to make of today's anatomical differences between Northern Europeans and Mediterraneans, that are neither reflected in Y or Mt-DNA nor in linguistics. The latter are clearly more closely related to the original Southern groups and later (mostly bronze age and historic) Mediterranean newcomers.

But the more Northern populations should have originated from the same LGM refugia - with the only difference of a more significant, homogeneous agricultural wave, and subsequent waves of invasions from the East. Some of that seems to make sense, but much of it is not reflected in the genetic data (outside of autosomal).

It would be so much easier if one could pin-point later population groups to specific refugia, but that seems to be somewhere between wishful thinking and extremely difficult, at this point.

One thing is clear: once the climate improved, the large grazing animals vanished, life became harder again, and population density likely was relatively low before the advent of agriculture.

Maju said...

The Adriatic is of course a bit inconclusive, since much of it is under water, now.

Sure. Although that also applies to other areas with large continental platform, like SW France or Doggerland. We can only extrapolate from the still existing land's findings.

My guess is that later travels (Phoenicians, Etruscan, Greeks) had a much larger impact.

In some specific regions maybe but in others it's pretty clear that the impact was strictly limited to some colonies and outposts (emporia). For instance I would not expect any strong genetic influence in Iberia from either colonial ethnos. Now, I am quite intrigued by the possible role of less famed "colonialist powers" of the Bronze and Iron age, like post-Mycenaean Cyprus or whatever was driving the cultural exchanges across the E-W Mediterranean axis in the pre-Mycenaean Copper/Bronze Age (Troy?, Cyclades?, Crete?).

But I still suspect that most of the East Mediterranean genetic impact in "the Hesperides" was anyhow Neolithic, just that it was initially very localized in some specific areas of true colonization like the Valencia-Alicante area, getting diluted only with time.

You make a difference between North and South Europe on mere anatomical evaluation but there is a real autosomal genetic evaluation (Bauchet 2007 for example) that clearly defines more than one South European group. Iberians for example, while carrying a good deal of East Mediterranean blood, are essentially just Iberians (a distinct group). Basques too (with much lower exotic input). So either the affinities are older or they are merely caused by climatic adaptation (i.e. the big differences/affinities are in pigmentation in fact, not in other anatomical aspects).

But the more Northern populations should have originated from the same LGM refugia...

That's something we don't know for sure. We know that the Magdalenian cultural package expanded in such a pattern but the case for a total depopulation of Central Europe in the LGM has been questioned a lot and it may in the end turn out that Central Europeans (and by extension Northern Europeans too) are largely derived from local pre-LGM populations, or at least that they were until the arrival of Neolithic.

One thing is clear: once the climate improved, the large grazing animals vanished, life became harder again, and population density likely was relatively low before the advent of agriculture.

Can you document that? I don't see that clear at all. AFAIK there was a shift in the focus of the economy, more focused on smaller animals and seafood, a shift in some cultural manifestations (rock art for example vanished in many areas, it seems) and a microlithization of the toolkit. But I have never seen any evidence that population decreased. In fact many areas that were once uninhabitable became available instead. You see people migrating to Scandinavia and Scotland, almost as fast as the ice receded, but you also see people colonizing inner Iberia, that was mostly a desert before.

For example in the case of the Basque Country, the number of important archaeological sites grows in Azilian in contrast to the Magdalenian period, partly driven by that colonization of the south but not only. And Epipaleolithic is much shorter than the Magdalenian period.

terryt said...

"Can you document that? I don't see that clear at all. AFAIK there was a shift in the focus of the economy, more focused on smaller animals and seafood".

We know that the large grazing animals disappeared, and we can safely assume that 'smaller animals and seafood' had always been a part of the diet, so life would certainly have become more difficult. The example of 'people migrating to Scandinavia and Scotland, almost as fast as the ice receded' is quite a while after the animals had died out, same for the Magdalenian/Azilian transition.

Maju said...

No species of big game went extinct except the mammoth (early on in the UP, I think), the woolly rhinoceros and the cave bear (if Neanderthals had left any). All the major species survived, though reindeer and maybe bison migrated northwards, where they were followed by some groups maybe (though most likely they followed seals instead, IMO). Aurochs, deer and wild goat, which made up a good deal of the UP diet in the FC region (and many other places surely too), survived and did not have to migrate. Deer (and similar species) were in fact favored by the expansion of the woodlands northwards, and the same must have happened with boars.

More important, in my opinion, may have been the migration of seals, and that may account for an increased consumption of invertebrate seafood.

I have no particular reason to think that the population of Europe got negatively affected by warming. The archaeological data rather suggests continuity and even limited expansion. What may have increased, as the big herds moved north, was semi-sedentarism, already existent in the UP anyhow. As people relied more on aurochs, deer, boar, wild goat, fish and seafood, instead of migratory herds like bison and reindeer, they probably needed to migrate much less behind their "livestock".

Vincent said...

I know of the severe limitations of using the SNP tree for MC analysis but I also know that a more or less random sample of the actual SNPs is already there for the longer and best studied branches.

I would think even a novice could see that the published SNP tree is nothing remotely like a random sample of "actual SNPs".

Take R1b1b as an example.

How many SNPs in the ISOGG tree between the R1b node and a living man in R-L21? Sixteen.

How many SNPs in the ISOGG tree between the R1b node and a living man in R-L23? Nine.

How many SNPs in the ISOGG tree between the R1b node and a living man in R-M73? Three.

How many SNPs in the ISOGG tree between the R1b node and a living man in R-P25? Two.

The time elapsed between the R1b MRCA and now (e.g. today) is a single, definite length. Yet according to the ISOGG tree one of these guys differs from the R1b MRCA by 8 TIMES as many SNPs as another. It should be quite obvious that all those branches have not been sampled equally.
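The comparison above can be checked with a few lines of arithmetic. This is only a minimal sketch: the counts are the four figures quoted in this comment, and the helper name `count_ratios` is made up for illustration.

```python
# SNP counts between the R1b node and a living man, as quoted above
# from the ISOGG tree (figures taken from this comment, not re-checked).
snps_from_r1b = {"R-L21": 16, "R-L23": 9, "R-M73": 3, "R-P25": 2}


def count_ratios(counts):
    """Divide each branch's SNP count by the smallest count in the set."""
    smallest = min(counts.values())
    return {branch: n / smallest for branch, n in counts.items()}


ratios = count_ratios(snps_from_r1b)
for branch, ratio in ratios.items():
    print(f"{branch}: {snps_from_r1b[branch]} SNPs, {ratio:.1f}x the shortest branch")
```

All four branches span the same elapsed time, so under even sampling the ratios would be expected to sit close to 1 rather than range up to 8.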

In short, it is possible to use SNPs as a molecular clock, but not by simply gathering up whichever SNPs happen to be lying about close at hand. Karafet et al. are the only ones to have published an attempt, at least with Y-SNPs, and they make clear reference to the limitations that ascertainment bias imposes: limitations that should not be ignored.

VV

Maju said...

Please. I always think in terms of the root of the human tree or maybe of something as remote as the F node. I would not dare to establish a chronology between the ill-studied R1ba and the highly studied R1b1b2a1a - it would be at best a mere crude exercise with virtually no value.

But this says nothing about the fact that you are trying to push four SNPs into a fraction of what we would expect them to take to evolve and become fixated just because of blind faith in the MCH, or rather in an extremist revised version of it.

You are entitled to do that but I'm entitled not to believe a word and to say it loudly. Based on the SNP data, which for me is not worse than the MCH, much less than your peculiar revised version of it, there should be a pause and not an acceleration of growth between R1b1b2a1 and R1b1b2a1a, i.e. between the arrival of R1b in Western Europe and its likely main expansion. This suggests a time to drift, maybe the LGM, before R1b1b2a1a expanded, and no MCH speculations will make me think otherwise, especially if they are so blatantly in contradiction with the archaeological data.

Where are the two distinct founder effects in "Cardial" Mediterranean Europe and "Linear" Central Europe? Considering the random nature of FEs, they should be any two (or even fifteen) different "West Asian" or at least "Balkan" lineages, but they are nowhere! Hence it cannot be Neolithic, at least most probably not.

Think again, please. You need drift to produce what we see and you need a cultural and demic phenomenon that extended through the Western half of Europe as a set, not one through Central Europe and another through the Mediterranean (and a whole array of different ones like slow growing mushrooms throughout the Atlantic).

Vincent said...

But this says nothing about the fact that you are trying to push four SNPs into a fraction of what we would expect them to take to evolve and become fixated just because of blind faith in the MCH, or rather in an extremist revised version of it.

First, I am not pushing anything. I am looking at data and understanding it. There is no "push" involved.

Second, do you have an idea about the Y-SNP mutation rate and the length of the Y-chromosome? If you do, then you know that there is roughly one new SNP produced each generation. So three or six or eight SNPs in a lineage over 30 generations is barely even remarkable.

Further, if you want to believe those three SNPs between R-M269 and R-L11 account for 10,000 years instead of 800 then what do you think about the 60 or so SNPs between R1 and R1b1b2? Is R1 really 200,000 years old in your conception of things? I sincerely hope not.
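The extrapolation in the paragraph above can be reproduced as follows. This is a rough sketch under the assumption being criticised, namely that every SNP on the published tree stands for the same amount of time; the function name `extrapolate_age` is hypothetical and the count of 60 is the approximate "60 or so" quoted above.

```python
def extrapolate_age(snp_count, ref_years, ref_snps):
    """Scale a reference span (ref_years for ref_snps SNPs) to snp_count SNPs.

    Assumes a constant number of years per tree SNP, which is exactly
    the assumption under dispute here.
    """
    return snp_count * ref_years / ref_snps


# Three SNPs between R-M269 and R-L11, under the two competing datings.
r1_age_if_10000 = extrapolate_age(60, 10_000, 3)
r1_age_if_800 = extrapolate_age(60, 800, 3)

print(f"3 SNPs = 10,000 years -> R1 to R1b1b2 spans {r1_age_if_10000:,.0f} years")
print(f"3 SNPs = 800 years    -> R1 to R1b1b2 spans {r1_age_if_800:,.0f} years")
```

Neither output is offered as a date estimate; the sketch only shows where the 200,000-year figure comes from when raw tree counts are treated as a clock.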

And there is no refuge for your logic further up the tree, I'm afraid. Look at the 23andMe dataset. How many SNPs between F and Q? About 80. How many between F and O? 32 or fewer, depending on which branch of O you count.

The point is, the universe of known Y-SNPs is extremely skewed by ascertainment bias. Only a rigorous approach can make that skewed sample into a clock, and counting the SNPs on the ISOGG tree is not such a rigorous approach.

Maju said...

Forget about the SNPs. It's not my main point, just another element of consideration.

My main point is the impossibility of a recent demic expansion causing the "Celtic fringe" type of distribution of R1b in Europe. This can only be the product of a demic expansion not of R1b itself but of something else (Neolithic, Indoeuropeans).

And there is not one single Neolithic but several, and you keep avoiding this crucial issue, which simply makes the kind of expansion you propose impossible. You are just using the apparent SNP/tandem repeat contradiction to divert from the real issues here.

If you were telling me: "I think that, as my TMRCA estimates give me a "recent" expansion and the pattern is in concordance with the Tardenoisian expansion in the Epipaleolithic, it must be Epipaleolithic", I'd say: ok, it's a most interesting possibility. But you are talking Neolithic all the time, as if the Neolithic was a single culture and its expansion a single process. So I have to say: no way!

Maju said...

Or to make an extreme comparison: there has been a very fast demic expansion in America (not just the USA but the whole continent/s) in the last few centuries, surely with rates of demic replacement that Neolithic farmers could only dream of. Do you see any of your model's ideas become real in America? Or do you rather see a series of local founder effects with no or very limited phylogenies?

Maybe there's a limited founder effect's trail expanding from Pennsylvania to Oregon but this will never be the same as the founder effect that extends from Cuba to Tijuana or from Panamá to Peru. These are just assumed examples, just in case.

Plus in America you also have a "Celtic fringe" of sorts: a Native fringe where the pre-colonial peoples and their lineages are much more frequent than elsewhere, for instance the Andes or the Arctic. All this, that is just common sense even for the extreme case of massive demic replacement like the modern colonialist one, is missing in your equation.