September 06, 2012

Y-chromosome phylogeny using Complete Genomics data

Since taking a trip to Innsbruck on a day's notice is out of the question, and since I don't want to wait for the eventual journal publication, I figured I'd try my hand at using the Complete Genomics data for myself to build a Y-chromosome phylogeny.

With the aid of PhyML (default parameters and 100 bootstrap replicates), here is what I get:

Note that the above was done by isolating the Y-SNPs for the 28 unrelated males in the data. I also threw out all SNPs that had no-calls. I tried to infer terminal classifications for the different individuals based on current ISOGG nomenclature, although it's possible that there are downstream mutations that I missed. NA18940, which is cut off in the figure, is D2a-M116.1, and NA19649 is R1b1a2a1a1b2a1a-L20. I couldn't quite figure out NA19670.
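For anyone who wants to try the same kind of filtering, here is a minimal Python sketch of the no-call step. It is not the script used for this post: the function names, the use of "N" as the no-call symbol, and the toy samples are all assumptions made for illustration, and it does not read the Complete Genomics files themselves. It only shows the idea of keeping the sites where every male has a call and writing the result as a PHYLIP-style alignment of the kind PhyML reads.

```python
def drop_no_call_sites(calls, no_call="N"):
    """Keep only the sites where every sample has a call.

    `calls` maps a sample name to a string with one character per Y-SNP
    site. The "N" no-call symbol is an assumption for this sketch.
    """
    samples = list(calls)
    n_sites = len(calls[samples[0]])
    if any(len(calls[s]) != n_sites for s in samples):
        raise ValueError("all samples must cover the same sites")
    # A site survives only if no sample has a no-call there.
    keep = [i for i in range(n_sites)
            if all(calls[s][i] != no_call for s in samples)]
    return {s: "".join(calls[s][i] for i in keep) for s in samples}


def to_phylip(alignment):
    """Format an alignment as sequential PHYLIP-style text."""
    samples = list(alignment)
    n_sites = len(alignment[samples[0]])
    lines = [f"{len(samples)} {n_sites}"]
    lines += [f"{s}  {alignment[s]}" for s in samples]
    return "\n".join(lines) + "\n"


# Toy data (hypothetical samples, not taken from the real genomes).
toy = {"sample_A": "ACGNT", "sample_B": "ACTTT", "sample_C": "GCTTN"}
filtered = drop_no_call_sites(toy)
print(to_phylip(filtered))
```

In the toy data the last two sites each contain a no-call in some sample, so only the first three sites survive.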

Here is the tree code for anyone who wants to play with it:

  (NA21732_MKK_E1b1b1a1c-V22:0.00295296,NA21737_MKK_E1b1b1a1c-V22:0.00270971,(NA20510_TSI_E1b1b1a1b-V13:0.02092798,((NA19239_YRI_E1a2-P110:0.08297556,(NA18940_JPT_D2a-M116.1:0.12168757,((NA12891_Utah_I1a4-Z63:0.01039303,(NA06994_CEU_I1a3a1b-Z73:0.00945249,NA20511_TSI_I1a3a1a-Z140:0.01697537)54:0.00049670)100:0.07005125,(NA19670_MXL_?:0.08181470,(NA18558_CHB_NO-M214:0.10352315,(NA19735_MXL_Q1a3a1-M3:0.05725892,(NA20845_GIH_R2a1a-L294:0.05384443,((NA20846_GIH_R1a1a1b2-Z93:0.01116683,NA20850_GIH_R1a1a1b2-Z93:0.00810758)100:0.02945897,(((NA12889_Utah_R1b1a2a1a1b5-DF19:0.01118764,HG00731_PUR_R1b1a2a1a1b1-DF27:0.00955759)24:0.00020607,NA07357_CEU_R1b1a2a1a1b2-U152:0.00963759)21:0.00012284,(NA19649_MXL_R1b1a2a1a1b2a1a-L20:0.05342631,(NA20509_TSI_R1b1a2a1a1b2c3-Z146:0.00943290,NA10851_CEU_R1b1a2a1a1b3-L21:0.01307073)15:0.00019824)5:0.00000002)100:0.03211886)100:0.00848072)100:0.00921281)100:0.02274049)100:0.00299324)54:0.00000005)100:0.03274296)100:0.01917298)100:0.00999561,((NA18504_YRI_E1b1a1a1f1a1-U174:0.01490192,(NA19026_LWK_E1b1a1a1f1a1-U174:0.01033155,NA19834_ASW_E1b1a1a1f1a1-U174:0.00961706)75:0.00031406)100:0.00787632,(NA18501_YRI_E1b1a-V38:0.00966164,((NA19020_LWK_E1b1a-V38:0.01193590,NA19025_LWK_E1b1a-V38:0.00793484)100:0.00317626,(NA19700_ASW_E1b1a-V38:0.01130782,NA19703_ASW_E1b1a-V38:0.01054823)77:0.00022885)100:0.00122837)100:0.00457000)100:0.04581828)100:0.04898306)100:0.01922685);
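The tree code above is a Newick string, with each leaf labelled sample_population_haplogroup and followed by its branch length. As a starting point for playing with it, here is a small Python sketch that lists the leaves. It uses only the standard library, and `newick_leaves` is a name invented for this example rather than part of any package. It strips all whitespace first, in case a copy of the string has been hard-wrapped in the middle of a name or number, and it assumes every leaf label contains at least two underscores and no spaces, as the labels here do.

```python
import re

# A leaf is a label that directly follows "(" or "," and is followed by
# ":<branch length>". Bootstrap supports follow ")" and so never match.
LEAF = re.compile(r"[(,]([^(),:;]+):([0-9.eE+-]+)")


def newick_leaves(newick):
    """Return (sample, population, haplogroup, branch_length) per leaf."""
    text = "".join(newick.split())  # undo any hard line wrapping
    leaves = []
    for label, length in LEAF.findall(text):
        # Assumes labels look like sample_population_haplogroup.
        sample, population, haplogroup = label.split("_", 2)
        leaves.append((sample, population, haplogroup, float(length)))
    return leaves


# The I1 clade copied from the tree, with its outer support value and
# branch length dropped, and wrapped mid-number on purpose.
clade = """(NA12891_Utah_I1a4-Z63:0.01039303,(NA06994_CEU_I1a3a1b-Z73:0.009
45249,NA20511_TSI_I1a3a1a-Z140:0.01697537)54:0.00049670);"""

leaves = newick_leaves(clade)
for row in leaves:
    print(row)
```

A regular expression is enough for listing leaves, but it ignores the nesting. For anything that depends on the topology, a real Newick parser from a phylogenetics library is the safer choice.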

6 comments:

  1. Nice tree, Dienekes! But you should add the number of Y SNPs for each branch, so that you and others can estimate dates for various lineages.

  2. Am I missing something, or does it claim that R1b is more closely related to the Asian R1a than to the European R1a?

  3. "There is no European R1a here."

    Ah! All right. I missed that it's an R2, not an R1.

    When I googled for L294 I got a list of individuals that included Czechs and Slovaks, so I was fooled into thinking this must be the Euro R1a. X-D

  4. In case it helps, our collaborative spreadsheet at https://docs.google.com/spreadsheet/ccc?key=0Agq_ez43qXCjdFlxemtlUnZ1Qk01cVhMRVBFcm5WX3c&authkey=CIOag_UD#gid=12 has NA19670 as G2a3b1a2 (L497+). (I didn't personally contribute to this particular categorization.)

  5. @GregRM,

    Thanks! I had read somewhere that within F, haplogroup G branches off early, and this seems consistent with that.

    I think the deep phylogeny will be near perfectly resolved soon, based on the papers that are coming out.

