First, I calculated the histogram of pairwise TMRCAs among all 109 of these Y-chromosomes:
Most times are around 6-7 thousand years ago, but there is an outlier bump at around 15 thousand years ago. To further investigate this bump, I carried out multidimensional scaling of the collection of Y-chromosomes:
It is clear that the group of high pairwise TMRCAs corresponds to the individual on the left of the figure, who emerges as a clear outlier vis-à-vis the rest. The ID of that individual is HG00640 (from the PUR population). One possibility is that this individual is M343+ due to sequencing error and belongs to a different lineage altogether. However, HG00640 is also R1-S1+ and R1b1-L278+ but R1b1a-P297-, so a single erroneous call at M343 would not explain his position.
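For readers who want to try this kind of outlier hunt themselves, the MDS step can be sketched roughly as below. This is not the code used for the figures in this post. It is a minimal pure-Python classical (Torgerson) MDS reduced to a single dimension, it treats pairwise TMRCAs as if they were distances (an assumption), and it runs on a made-up 5x5 matrix in thousands of years. The labels A to D and X, the numbers, and the function name `classical_mds_1d` are all placeholders.

```python
import math

def classical_mds_1d(dist, iters=200):
    """First classical-MDS coordinate of a symmetric distance matrix."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(r) / n for r in sq]
    grand = sum(row_mean) / n
    # Double-centre the squared distances: B = -1/2 * J * D^2 * J
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand)
          for j in range(n)] for i in range(n)]
    # Power iteration for the dominant eigenvector of B.
    # Assumes the dominant eigenvalue is positive (true for the toy data).
    v = [float(i + 1) for i in range(n)]
    lam = 0.0
    for _ in range(iters):
        w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
        lam = norm
    return [math.sqrt(lam) * x for x in v]

# Hypothetical data: four chromosomes ~6.5 ky apart, one ~15 ky from all.
labels = ["A", "B", "C", "D", "X"]
tmrca = [
    [0.0, 6.5, 6.5, 6.5, 15.0],
    [6.5, 0.0, 6.5, 6.5, 15.0],
    [6.5, 6.5, 0.0, 6.5, 15.0],
    [6.5, 6.5, 6.5, 0.0, 15.0],
    [15.0, 15.0, 15.0, 15.0, 0.0],
]

coords = classical_mds_1d(tmrca)
median = sorted(coords)[len(coords) // 2]
# The sample furthest from the median coordinate is the outlier candidate.
outlier = labels[max(range(len(coords)), key=lambda i: abs(coords[i] - median))]
print(outlier, [round(c, 2) for c in coords])
```

With real data one would use two dimensions and a proper eigen-solver; the single-axis version is only meant to show why one very divergent chromosome ends up alone at one end of the plot.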
It would appear, therefore, that the HG00640 Puerto Rican belongs to the R1b1-L278 clade, but not to the R1b1a-P297 subclade. He thus represents an earlier split from the tree than R1b1a2-M269 (frequent in West Eurasia), as well as R1b1a1-M73 (frequent in Central Asia). It seems that I have chanced upon a real relic Y-chromosome!
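The placement logic used above is simple enough to write down. The sketch below walks a chain of markers from the root toward the tip and reports the deepest derived clade before the first ancestral call. The chain contains only the four markers named in this post, the calls are the ones reported here for HG00640, and the names `CHAIN` and `place` are made up for illustration. It is not a full ISOGG tree.

```python
# Marker chain from upstream to downstream, as named in this post.
CHAIN = [
    ("R1", "S1"),
    ("R1b", "M343"),
    ("R1b1", "L278"),
    ("R1b1a", "P297"),
]

def place(calls):
    """Return the deepest supported clade given '+'/'-' SNP calls."""
    deepest = None
    for clade, snp in CHAIN:
        state = calls.get(snp)
        if state == "+":
            deepest = clade          # derived: move one step down the tree
        elif state == "-":
            # ancestral: the chromosome split off before this marker arose
            return f"{deepest}(x{snp})" if deepest else f"unplaced(x{snp})"
        else:
            break                    # no call: stop rather than guess
    return deepest or "unplaced"

# Calls for HG00640 as reported in the text.
hg00640 = {"S1": "+", "M343": "+", "L278": "+", "P297": "-"}
placement = place(hg00640)
print(placement)
```

A chain like this cannot represent sibling branches, so markers off the main path would need a real tree structure.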
The estimated TMRCA between HG00640 and the remaining M343+ chromosomes that cluster on the right is 15,426 years. We now have direct evidence that haplogroup R1b1 is quite old, and R1b-M343 itself must have emerged sometime between 23,657 years ago (the TMRCA of R1a vs. R1b) and 15,426 years ago.
This little exercise reinforces my belief, first expressed in the outliers article, that there are real relic Y chromosomes in the world today, and we neglect them at our own peril.
Most European and European-derived men from the 1000 Genomes Project who belong to the R1b-M343 clade share patrilineal descent within the last 7,000 years or so. But not all of them do, and outliers like HG00640 can only be caught with very large worldwide sample sizes and full genome sequencing.
* * *
Addendum: There appears to be an R1b1(xP297) DNA Project. It appears to contain quite a rich collection of men with SNP results similar to HG00640, including R1b1c-V88+ individuals (as suggested by Roy King in the comments), but also V88- individuals. I see great utility in such projects, because if one can detect very aberrant Y-STR haplotypes (which can be done with a simple histogram or MDS plot, as in this post), then one can identify candidate Y chromosomes for full sequencing.
* * *
17 comments:
Very interesting! I wonder whether HG00640 is V88+? There is also a small cluster from the MDS plot that is situated on the lower right of the figure. Could these be L23*(xL51)?
I don't see the V88 SNP in the 1000 Genomes data. Are there any equivalent SNPs that might not be in the ISOGG reference?
Someone needs to invest in FGS of individuals who belong to unusual haplotypes. There are so many interesting cases in the literature: Y*(xBT) far from Africa, B in Iran and Afghanistan, that whole suite of R*, R1*, R1a* etc. in Iran, and so on.
I think we are blinded by the mega-successful lineages, but there are probably many very low frequency ones that have managed to linger on without ever experiencing the success of their brethren.
So...
(1) Is he a back mutation?
(2) Is he a fellow traveller from West Asia who for some reason did not expand?
(3) Or is he a relict of a West European source population for R1b1a1 AND R1b1a2?
I am voting for a stray Iranian. We would need more of these to say anything more.
As I write in the updated post, there are many individuals like HG00640. Only through a careful sorting of these individuals in large samples will we be able to progressively bridge the gap between 15 and 7 thousand years, and discover where the very successful founder(s) corresponding to the latter period lived.
Dienekes, I did a little snooping on the Web. The 1000 Genomes Project Sample Group listed a University of Puerto Rico professor - Taras Oleksyk, who is in fact Ukrainian. Possibly his own or a family member's sample?
I see a Puerto Rican with a Hispanic name who is P297- in the R1b1(xP297) DNA Project, so I don't think we need to look for the exotic in this case.
There are a few known bifurcations in the R1b1 tree that ISOGG has not yet incorporated. The most important is that all R-M343 Y-chromosomes are either V88+ or L389+ (discovered to be relevant in an analysis of 23andMe results, but ISOGG ignores it).
http://vizachero.com/R1b1/R1btreev2.png
I notice that HG00640 is L389+ and so therefore V88-.
http://eng.molgen.org/viewtopic.php?f=86&t=385
"I notice that HG00640 is L389+ and so therefore V88-. http://eng.molgen.org/viewtopic.php?f=86&t=385"
Are there any more SNPs one ought to look at, or does the list in the above post cover all the known ones?
Since he is known to be P297-, he should also be negative for L320, rs4032353, rs4141961, rs7067278, rs9785953, rs9786169, rs9786335, rs9786353, rs9786386, rs9786576, & rs9786772.
If he is positive for any of those, that would be an interesting find.
"I see a Puerto Rican with a Hispanic name who is P297- in the R1b1(xP297) DNA Project, so I don't think we need to look for the exotic in this case."
It would be nice if the 1000 Genomes Project listed last known ancestors, as FTDNA projects do (as you linked). "Puerto Rican", I believe, is an inappropriate descriptor, as there are no indigenous Puerto Ricans (Carib and Arawak) remaining. Most are descendants of Spanish immigrants, but there are many other ethnicities now living there. Who really knows where HG00640's paternal ancestors came from?
"Since he is known to be P297-, he should also be negative for L320, rs4032353, rs4141961, rs7067278, rs9785953, rs9786169, rs9786335, rs9786353, rs9786386, rs9786576, & rs9786772. If he is positive for any of those, that would be an interesting find."
I manually checked all of these and they don't appear to exist in the 1000G data. But someone ought to double-check this, because doing it manually is a little cumbersome.
Puerto Rico's population history was radically altered around 1830, when the King of Spain opened up the island to immigration by any Roman Catholic who would swear allegiance to him. The largest numbers of these newer migrants came from Corsica, the Turkish Empire (mostly Lebanese & Iraqi Catholics), and Portugal. Simultaneously, immigration from Spain shifted away from a predominance of Andalusians (characteristic of earlier Spanish colonialism) to more migrants from northern Spanish regions such as Asturias, Galicia, the Basque Country, and Catalonia. According to some historians, having lost most of his colonies in the Americas, the King of Spain was deliberately diluting the more mestizo and mulatto mix of older Puerto Rican ethnic stock so as to avoid another insurrection.
There is a new (Phase II) sample from the 1000 Genomes Project (PEL sample HG01947) that also seems to belong to R1b1(xP297):
https://docs.google.com/spreadsheet/ccc?key=0Agq_ez43qXCjdFlxemtlUnZ1Qk01cVhMRVBFcm5WX3c&authkey=CIOag_UD#gid=6
"The 1000 Genomes Project Sample Group listed a University of Puerto Rico professor - Taras Oleksyk, who is in fact Ukrainian. Possibly his own or a family member's sample?"
Very interesting observation, but I have to disappoint you: it was not my sample. We restricted sampling to Puerto Ricans who had grandparents born in Puerto Rico.
Taras
I said it 10 years ago, and I'll say it again: it is extremely likely, given the range of ultra-rare R groups in Iran, that R1b formed somewhere in and about the plateau, probably in a zone that includes eastern Turkey and the South Caucasus states. Additionally, haplogroups I, G, and J likely had an Iranian origin.
I completely agree with this, good work. It has been very obvious to me that Western R1b is much older than the pundits have been claiming.
Europe too is full of relic lines, though not this far back. These have also been ignored to create a particular theory that doesn't hold water.
By the way pairwise correspondence always underestimates coalescent times.