September 18, 2008

Y chromosomes from the Pyrenees

Once again, this paper uses the inappropriate 0.00069/locus/generation mutation rate, hence all its age estimates are wrong. I wonder who the first scientist will be to say that the Emperor has no clothes; the practice of uncritically using a mutation rate derived under totally inapplicable demographic assumptions will eventually be noticed.

From the paper:
However comparing the average STR variances of the R1b1b2c (0.243), R1b1b2d (0.207) and I2a2 (0.278) lineages considered in this study and given the replicated estimates pointing to a Mesolithic time frame for the origin, diversification and diffusion of the I2a2 clade (Rootsi et al. 2004), the temporal interpretation here provided for R1b1b2c seems reliable.
Reliable indeed. Even with the wrong mutation rate these lineages can't be pushed to the Paleolithic. Better estimates for them are: R1b1b2c: ~1,350BC; R1b1b2d: ~850BC; I2a2: ~1,800BC.

From the paper:
However, the time to the most-recent common ancestor (TMRCA) of the Pyrenean R1b1b2d lineages was here estimated at 7383 ± 1477 years ago, which is consistent with an early dispersion of R1b1b2d all over the Pyrenees and subsequent dissemination outside the mountain range from the Neolithic era onwards. The much younger age estimated by Hurles et al. (1999) for the SRY2627 mutation can, nevertheless, be explained by the mutation rate used (2.1×10−3, for microsatellites), which does not take into account evolutionary considerations (see Zhivotovsky et al. 2006).
Hurles was right; the authors should follow their own advice and see Zhivotovsky et al. 2006. They will realize that their 0.00069/locus/generation is derived for a demographic scenario in which a lineage originating 7383 years ago has only ~150 living descendants, an underestimation of several orders of magnitude.
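
To see how strongly the choice of rate drives these dates, here is a minimal sketch. It assumes the simplest linear estimator (generations = variance / rate) and a 25-year generation; neither assumption is taken from the paper or from the estimates above, and the function name is purely illustrative.

```python
# Assumptions (not from the paper or the post): a linear variance/rate
# estimator and a 25-year generation length.
YEARS_PER_GENERATION = 25


def tmrca_years(str_variance, rate_per_generation,
                years_per_generation=YEARS_PER_GENERATION):
    """Age in years from average STR variance: generations = variance / rate."""
    generations = str_variance / rate_per_generation
    return generations * years_per_generation


# Average STR variances quoted from the paper.
VARIANCES = {"R1b1b2c": 0.243, "R1b1b2d": 0.207, "I2a2": 0.278}

EVOLUTIONARY_RATE = 0.00069  # per locus per generation, the rate used in the paper
HURLES_RATE = 0.0021         # 2.1e-3, the rate the paper attributes to Hurles et al. (1999)

for name, variance in VARIANCES.items():
    slow = tmrca_years(variance, EVOLUTIONARY_RATE)
    fast = tmrca_years(variance, HURLES_RATE)
    print(f"{name}: {slow:.0f} years at 0.00069, {fast:.0f} years at 0.0021")
```

Under these assumptions the R1b1b2d variance gives about 7,500 years at the 0.00069 rate, in the neighbourhood of the paper's 7383, and every age shrinks by a factor of about three at the 0.0021 rate. The sketch is not meant to reproduce the alternative dates given above, which rest on a method not spelled out in this post.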

From the paper:
The Y lineages representative of what might have been a pre-Neolithic male genetic composition in Iberia, were those bearing the Palaeolithic mutations M269, including its Mesolithic derived branches R1b1b2c-M153 and R1b1b2d-SRY2627, plus those falling in the I clade defined by the Mesolithic M170.
It's as if time has frozen and scientists are doomed to forever repeat what other scientists have said before them.

Annals of Human Genetics doi: 10.1111/j.1469-1809.2008.00478.x

In search of the Pre- and Post-Neolithic Genetic Substrates in Iberia: Evidence from Y-Chromosome in Pyrenean Populations

A. M. López-Parra et al.

Abstract

The male-mediated genetic legacy of the Pyrenean population was assessed through the analysis of 12 Y-STR and 27 Y-SNP loci in a sample of 169 males from 5 main geographical areas in the Spanish Pyrenees: Cinco Villas (Western Pyrenees), Jacetania and Valle de Arán (Central Pyrenees) and Alto Urgel and Cerdaña (Eastern Pyrenees). In the Iberian context, the Pyrenean samples present some specificities, being characterized by a high proportion of chromosomes R1b1b2-M269 (including the usually uncommon R1b1b2d-SRY2627 and R1b1b2c-M153 types) or I2a2-M26 and low proportions of other haplogroups. Our results indicate that an old pre-Neolithic substrate is preponderant in populations of the whole Pyrenean fringe. However, AMOVA revealed a high level of substructure within Pyrenean populations, partially explained by drift effects as well as by the signature of an ancient genetic differentiation between Western and Eastern Pyrenees.

18 comments:

Maju said...

Most interesting for me is the apparent west/east divide between the two R1b1b2 subclades, with Basques (Cinco Villas included) having the -a2c (M153) subclade in greater proportion and other Pyreneans instead showing the -a2d (M167) as comparatively dominant.

It is very noticeable that this last subclade is found most frequently among Aranese, who are not just a very isolated Pyrenean population but also ethnically Gascons (Aranese are the only Gascons in Spain and the Val d'Aran is the only municipality that has Gascon language as co-official anywhere). This suggests that Gascony and, in general, Occitania (roughly Southern France) deserve more and better genetic studies. After all the Basque, Gascon and Occitan countries comprise most of the most important Paleolithic province of Europe and some genetic remains should still be there, especially among Gascons (who used to be Basques until some 1000 years ago).

As for age estimates, I think it was me who used the metaphor of the emperor's nakedness in a past discussion on the same issue. The reason why geneticists prefer the Zhivotovsky approach to that of Hurles and yourself is surely that it fits much better with the archaeological record, regardless of theoretical considerations. Personally I would even push it somewhat (15-30%) into the past, as that is probably how far the Homo-Pan divergence age should be moved to be realistic, which in this case would yield dates of c. 10-9,000 BP for the R1b1b2 subclades' origin.

Anyhow, if I don't understand you wrong, you claim that there are a handful of guys who get it all (meaning: descendants) and a vast majority that gets nothing in the end. This idea doesn't seem very reasonable, sincerely, as in our quotidian experience we can perfectly see that most guys do have offspring, normally including males, and that the most successful men don't seem to manage to transfer their gender-biased advantage to their descendants for many generations.

dienekesp said...

Anyhow, if I don't understand you wrong, you claim that there are a handful of guys who get it all (meaning: descendants) and a vast majority that gets nothing in the end.

Even under conditions of reproductive equality, in which each man has offspring according to the same probability distribution, over many generations, most of the offspring are produced by a small percentage of the males.

as in our quotidian experience we can perfectly see that most guys do have offspring, normally including males

From this generation, a fraction of the males won't have children, and a fraction of those who do won't have sons.

Repeat for a few dozen generations, and you see that a small percentage of living males will have patrilineal descendants.
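
The argument above can be sketched as a simple branching process. This is a minimal simulation under stated assumptions, none of which come from the discussion itself: every man draws his number of sons from the same Poisson distribution with a mean of one, lineages are independent, and the founder count, generation count and seed are arbitrary illustrative values.

```python
import math
import random


def poisson(rng, mean):
    """Draw a Poisson-distributed integer (Knuth's method, fine for small means)."""
    limit = math.exp(-mean)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def surviving_founders(n_founders=1000, generations=30, mean_sons=1.0, seed=1):
    """Return (founders with living patrilineal descendants, living descendants)."""
    rng = random.Random(seed)
    # lineage_sizes[i] = living patrilineal descendants of founder i
    lineage_sizes = [1] * n_founders
    for _ in range(generations):
        # Every living man in a lineage has sons according to the same distribution.
        lineage_sizes = [
            sum(poisson(rng, mean_sons) for _ in range(size))
            for size in lineage_sizes
        ]
    survivors = sum(1 for size in lineage_sizes if size > 0)
    return survivors, sum(lineage_sizes)


survivors, living = surviving_founders()
print(f"{survivors} of 1000 founders still have patrilineal descendants")
print(f"{living} living descendants in total")
```

With a mean of exactly one son, branching-process theory puts the chance that a lineage is still alive after n generations at roughly 2/n, so only a small minority of founders should remain after a few dozen generations even though every man faced the same odds. Raising `mean_sons` above one, as in a growing population, lets more lineages survive.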

dienekesp said...

The reason why geneticists prefer the Zhivotovsky approach to that of Hurles and yourself is surely that it fits much better with the archaeological record, regardless of theoretical considerations.

Well, the Zhivotovsky approach is not based on archaeological considerations but on genetic ones.

It is based on a theory that Y-STR variance is reduced by bottlenecks in a haplogroup's history. That theory in turn is based on two assumptions that make it inapplicable:

(1) Y-STR variance is reckoned from a founding "Patriarch" of a lineage and not from the actual MRCA. Hence it can't be used for TMRCA calculations under _any_ demographic scenario.

(2) The 0.00069 rate is specific to a particular scenario (m=1), and a higher rate is observed when m is greater than 1. The assumption of m=1 would lead to a haplogroup of ~150 men living today, while m>1 would lead to larger haplogroups, and as I have shown, for such larger haplogroups the rate is close to the germline rate.

So, if anyone wants to uphold the "evolutionary rate", they must propose a new genetic mechanism that would lead to such a low rate. Saying that it correlates with archaeology in a non-quantitative hand-waving type of way (i) isn't persuasive, and (ii) isn't what geneticists are paid to do.

What has happened is a circular reasoning trap, where the Semino/Wells model from the early 2000s, which was based mostly on archaeological correlations of haplogroups, has come to be taken by archaeologists themselves as evidence that the geneticists agree with their own statist theories.

Maju said...

Repeat for a few dozen generations, and you see that a small percentage of living males will have patrilineal descendants.

From historical records I can easily consider one or two dozen generations (that's 3-6 centuries, nothing more) and there doesn't seem to be any major change in surname frequencies. Anyone who has done some genealogical research can tell you that. Sure, some surnames may tend to decrease/increase slightly and even a handful may disappear totally, but the effect is definitely not even a fraction as dramatic as you describe.

Surnames are not exactly the same as Y-DNA clades (adoptions and illegitimate children are treated differently) but they are a pretty good approximation.

You are ultimately talking of drift and it's well known that the effects of drift in large populations tend to zero. It only behaves as you say in small populations, like the ones that surely existed in Paleolithic times. As population grows, the random effect of drift becomes nearly trivial.

...

Your explanation of Zhivotovsky's method and underlying theory is interesting in any case... but I fail to see what's the most recent common ancestor in a pure patrilineage: it is the same for all humans (and beyond until the first male of whichever ancestral haploid species - some Jurassic "mammaloid" lizard probably). The concept of "patriarch" (the first man having a particular clade, the first one carrying that mutation in his Y-DNA) makes sense instead.

The assumption of m=1 would lead to a haplogroup of ~150 men living today

In fact, we surely have that kind of minihaplogroup too: they are all those termed "private" or "familial". They are not considered in broad population genetics and seldom analyzed or even detected at all (their defining SNPs are normally not known). We always talk of haplogroups that are much larger, earlier steps of the same diversification chain, older common ancestors or patriarchs, patriarchs that in most cases must have existed in Paleolithic times or soon after (the only time when drift would have been effective enough to fixate them).

So, if anyone wants to uphold the "evolutionary rate", they must propose a new genetic mechanism that would lead to such a low rate.

I am no geneticist but I understand that the mechanism is simply that drift (and the subsequent fixation) are largely (almost totally) neutralized because of the much larger (and growing) population sizes that have existed since the Neolithic onwards.

That's why all those family minihaplogroups are mostly unknown and irrelevant: they had very limited options to expand and create large well defined clades. Instead, in the time of small hunter-gatherer bands, they did that easily.

Saying that it correlates with archaeology in a non-quantitative hand-waving type of way (i) isn't persuasive, and (ii) isn't what geneticists are paid to do.

Well, I have also addressed it in genetic terms in this post. But, as Julien Riel-Salvatore says in his most recent post at A Very Remote Period Indeed: DNA provides some information, fossils provide other types of information and archaeology provides yet other information, all of which is necessary and complementary to reach an adequate understanding of this process.

If these different types of data appear to be contradictory, something is most probably wrong. The model needs a good review.

dienekesp said...

Maju, the idea that a very small number of males of the past produced all the males of today isn't my theory; it's a basic fact of coalescent theory.

The only scenario in which this wouldn't happen is if every man in a population has at least one son, something which never happens in real human populations.

The distribution of surnames is really irrelevant. For example if x% of people were called Gonzalez in 1800 and x% of them are called Gonzalez today, that doesn't mean that all or most of the Gonzalez males of 1800 have patrilineal descendants today; only a small percentage of them do.
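
A stable surname share and a shrinking set of ancestors can coexist, as this minimal sketch suggests. It assumes a Wright-Fisher style model that is not part of the discussion: a constant number of males, each drawing his father at random from the previous generation, with the population size, the 5% surname share and the eight generations (roughly 1800 to today at 25 years each) all chosen arbitrarily for illustration.

```python
import random


def wright_fisher_surnames(n_males=5000, share=0.05, generations=8, seed=2):
    """Return (final surname share, fraction of named founders with descendants)."""
    rng = random.Random(seed)
    n_named = int(n_males * share)
    # Each man is (founder_id, carries_surname); founders are the generation-0 men.
    population = [(i, i < n_named) for i in range(n_males)]
    for _ in range(generations):
        # Every man in the new generation inherits both the founder id and the
        # surname from a father drawn at random from the previous generation.
        population = [rng.choice(population) for _ in range(n_males)]
    final_share = sum(1 for _, named in population if named) / n_males
    founders_left = len({fid for fid, named in population if named})
    return final_share, founders_left / n_named


final_share, founder_fraction = wright_fisher_surnames()
print(f"surname share after 8 generations: {final_share:.3f}")
print(f"fraction of the original bearers with living descendants: {founder_fraction:.2f}")
```

In a model like this the surname share drifts only slightly in a large population, while the fraction of the original bearers who still have patrilineal descendants falls with every generation. How small that fraction becomes depends on the number of generations and on the offspring distribution, so the sketch illustrates the mechanism and does not settle the numbers.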

dienekesp said...

but I fail to see what's the most recent common ancestor in a pure patrilineage: it is the same for all humans (and beyond until the first male of whichever ancestral haploid species - some Jurassic "mammaloid" lizard probably).

The most recent common ancestor is defined for a set of Y-chromosomes. For the set of all human Y-chromosomes it is "Y-chromosome Adam", while for a set of E-V13 Y-chromosomes it is either the guy who first had the V13 mutation or one of his patrilineal descendants.

dienekesp said...

The Jurassic mammaloid is _a_ common ancestor but is not the _most recent_ common ancestor.

ren said...

Maju, when I first showed you the expansion of Neolithic cultures from Eastern Europe and the eastern Mediterranean into the Iberian peninsula, you argued that it goes against the genetic evidence.

Now, it's the genetic evidence that you claim to contradict the archaeological evidence. This analysis by Dienekes (superbly done in a series of entries) is simply the latest piece to a mountain of evidence that's been piling up for a few years now.

My advice to you is to just give up this religion of yours about how the Basques are Paleolithic. You'd feel a lot happier once that addiction is let go.

Maju said...

The distribution of surnames is really irrelevant. For example if x% of people were called Gonzalez in 1800 and x% of them are called Gonzalez today...

I was thinking of Basque surnames. Not sure elsewhere but here each is rooted at a specific household and even the most common surnames are very rare. The distribution of surnames has not been altered significantly in one or two dozen generations. If your theory were true, nearly everybody would now be an Agirre or an Urrutia - but even these somewhat more common surnames are one in a thousand maybe.

Spanish surnames are, for the most part, not viable because they are too often patronymics ("son of...") or professional identifiers that were shared in origin by many many unrelated men. This is not the case with most Basque surnames certainly. Even Basque "González" (from the south) are not just that but González de *some place*, meaning their ancestral household or village. This kind of surname can be safely assimilated to Y-DNA markers (with the exception of bastards and adoptees, of course).

Coalescent theory is strongly dependent on population sizes. In large populations this effect is not zero but so close to it that it's irrelevant. It is not different in practice from what I was referring to as genetic drift. The overall tendency is zero for large populations such as the ones we find since Neolithic times.

... for a set of E-V13 Y-chromosomes it is either the guy who first had the V13 mutation or one of his patrilineal descendants.

I understand now (after some meditation) what you mean. But I suspect that the difference between the haplogroup patriarch and MRCA is minor if not null. A haplogroup that would have been kept by a thin lineage of many single sons (effective sons, who did transmit the lineage) is extremely unlikely to have ever existed. Only in small populations, where drift is an effective force, would this be the final outcome (fixation), but this drift and fixation is also part of the clade expansion process. Were it not the case for a significant number of generations, we would eventually find new SNPs in the middle of the lineage (as often happens - and many of these SNPs are not detected yet), defining lower level haplogroups (a new "patriarch" and MRCA). When you see instead a haplogroup that branches out (be it starlike or not), you are looking at some sort of expansion.

You are right re. the mammaloid lizard anyhow. Not sure what I was thinking when I wrote that. Even for a diverse population, the MRCA can't be older than Y-DNA Adam - as long as we talk of H. sapiens.

...

@Ren: I can say the same re. your "religion" of hyperrecent ancestry. Dienekes is complaining that all geneticists are adhering to older age estimates than the ones he and you seem to prefer. I am no geneticist certainly but I do feel vindicated by such an overwhelming majority of experts.

What, IMO, is missing here is a realistic consideration of the odds of clade fixation (virtually null) in large and growing populations, like the ones that have existed since Neolithic times.

Geneticists are obviously aware of this fact.

McG said...

Very interesting thread!! I agree with Maju here, in part. First, ZUL had an ace up their sleeve in their first paper. Zhiv had already shown that autosomal microsatellites had a rate similar to what he proposed for STR microsatellites, and I can think of no reason why they shouldn't be the same(similar)? His analysis brought together everything they knew in 2004 about rates. There is no mention of his Poisson model. That was presented in 2006 to justify bottlenecks and basically I think that analysis is flawed. Until we build "intelligence" into the mutation process, I think we are going to miss the boat??? In some sense the descendants of a common ancestor are "connected"!!
I should also mention that I think Chandler's rates are flawed. He only used haplotypes that were close to each other in his sampling. This is antithetical to what Kerchner advises in the analysis of his family. If you blindly count the mutations of his family you get 14, the real number of unique transmission events is 8!! The database which has been created isn't random, there are a lot of close relationships structured in it. It's what people are looking for, close relationships. Note his estimates approach ZUL's at the slow mutators where this effect is smallest (388, 426, 392 etc.). Another problem with Chandler's rates is that they are "cross" (mixed) haplogroup. 388 for example has a much different rate in I than R1b.
In closing I will support one of Dienekes' observations based on living in Mazatlan, Mex. There are strong "name" trends among the Mestizo of Mexico; fully 5%(?) of Mazatlecos are named Lizarraga from a French colonist; Osuna is another very popular one, Sanchez etc. The names aren't extremely diverse in the Mestizo; not true in the upper class though.

Maju said...

Re. Mestizo surnames. The input of European male colonists in Latin America has been very high, causing probably some important, yet localized, founder effects; but another reason might be that the Church forced natives to adopt Spanish surnames, maybe those of their masters or patronymics.

(And, as mentioned before, Lizarraga is a Basque surname).

yeomanrycavalry said...

In regards to SRY2627+.
I wonder what cities in the Valle D'Aran they tested in. Also the comarca seems to be divided in dialect or even culture between Les and Arties, meaning that in Salardu we see more Catalan-like influence, whereas in areas such as Bossost we would see more Gascon influence. So far results have been trickling in from areas within France but nothing that is an eye opener. The subclade is a relatively small one with no more than 200 confirmed as SRY2627+, so a number such as 48% is a big jump from the early estimates of 22% in the Catalonia (Girona) region.

Maju said...

It doesn't look like a small clade to me. It's fairly common through the Pyrenees, notably among Catalans, Gascons, Basques and others. It seems the most important (known) subhaplogroup of R1b1b2a1 in Iberia with an eastern distribution centered in Catalonia and, per Wikipedia, is common enough also in France (5%) and also in Bavaria (3%). Aran is small and isolated enough to have been affected by whatever type of founder effects and drift.

yeomanrycavalry said...

Yep, looks like a small subclade in comparison to many other ones in Europe.

Maju said...

Catalans alone are six million people: they are more than Scots. Gascons or Basques, each, are comparable to the Irish, and the lineage is also common in Aragon and the other Valencian countries (and other places, it seems). This lineage is no doubt, by numbers, the largest one R1b1b2a1a2, at least in Europe (not sure in the former colonies).

Not sure what you understand by "small" but obviously it's not the case.

Maju said...

Erratum: I meant "the other Catalan countries", not "Valencian", though this means Valencia and Balearic islands primarily.

yeomanrycavalry said...

"This lineage is no doubt, by numbers, the largest one R1b1b2a1a2, at least in Europe (not sure in the former colonies)."

Hardly, it remains fairly small in comparison to many other subclades, not to say it's the smallest one.

yeomanrycavalry said...

This lineage is no doubt, by numbers, the largest one R1b1b2a1a2, at least in Europe (not sure in the former colonies).

Where are the numbers? I have yet to see large numbers of SRY2627 anywhere. Only 30% of Catalonia is not that big. The highest numbers are still in Val D'Aran at 48%, albeit this can be due to various reasons. The numbers for this subclade barely exceed 10% in any other region away from Catalonia, and most of those regions have a considerably smaller population than the Basque Country, etc.