A cool new paper by a team of citizen scientists. The most important new piece of evidence is the joining together of haplogroup M (Papuans) with P in a new MP internal node. Your guess is as good as mine as to where this MP may have come from, as his descendants are presently spread from the Atlantic via Siberia to the Amazon and all the way to New Guinea. The Mal'ta boy belonged to haplogroup R, itself a descendant of P.
The other interesting discovery is of one Telugu man from India who shares mutations with haplogroups N and O but belongs to neither N nor O, so this defines a new "X" clade in the phylogeny. I wonder whether this could instead be called NO0, by analogy with the way the more basal clades of the entire phylogeny were named A0, A00, and so on. Terminology is tricky...
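To make the placement logic concrete, here is a minimal Python sketch, assuming a toy tree in which each clade is defined by a set of SNPs and a sample descends into a child clade only if it carries all of that clade's defining SNPs in the derived state. The marker names (`NO_1`, `N_1`, `NEW_1`, etc.) are hypothetical placeholders, not real SNP names, and no-calls are ignored. A sample carrying the NO-defining SNPs but none of the N- or O-specific ones stops at the NO node, which is the situation of the Telugu sample.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Clade:
    name: str
    defining_snps: set[str]  # SNPs that must be derived to enter this clade
    children: list[Clade] = field(default_factory=list)


def place_sample(root: Clade, derived: set[str]) -> str:
    """Return the deepest clade reachable by repeatedly entering a child
    whose defining SNPs are all derived in the sample."""
    node = root
    while True:
        # First child whose defining SNPs are all present in the sample, if any
        nxt = next((c for c in node.children if c.defining_snps <= derived), None)
        if nxt is None:
            return node.name
        node = nxt


# Hypothetical toy tree: NO splits into N and O; marker names are placeholders.
root = Clade("root", set(), [
    Clade("NO", {"NO_1", "NO_2"}, [
        Clade("N", {"N_1"}),
        Clade("O", {"O_1"}),
    ]),
])

print(place_sample(root, {"NO_1", "NO_2", "N_1"}))    # N
print(place_sample(root, {"NO_1", "NO_2", "NEW_1"}))  # NO (neither N nor O)
```

Real haplogroup assignment also has to cope with no-calls and recurrent mutations, which this sketch deliberately leaves out.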
I am aware of a few commercial ventures to resequence Y chromosomes, and I'm pretty sure that citizen scientists will soon not only be able to re-analyze data such as those from the 1000 Genomes Project, but will be able to generate data of their own.
bioRxiv doi: 10.1101/000802
Generation of high-resolution a priori Y-chromosome phylogenies using “next-generation” sequencing data
Gregory R Magoon et al.
An approach for generating high-resolution a priori maximum parsimony Y-chromosome (chrY) phylogenies based on SNP and small INDEL variant data from massively-parallel short-read (next-generation) sequencing data is described; the tree-generation methodology produces annotations localizing mutations to individual branches of the tree, along with indications of mutation placement uncertainty in cases for which "no-calls" (through lack of mapped reads or otherwise) at particular sites preclude a precise placement of the mutation. The approach leverages careful variant site filtering and a novel iterative reweighting procedure to generate high-accuracy trees while considering variants in regions of chrY that had previously been excluded from analyses based on short-read sequencing data. It is argued that the proposed approach is also superior to previous region-based filtering approaches in that it adapts to the quality of the underlying data and will automatically allow the scope of sites considered to expand as the underlying data quality (e.g. through longer read lengths) improves. Key related issues, including calling of genotypes for the hemizygous chrY, reliability of variant results, read mismappings and "heterozygous" genotype calls, and the mutational stability of different variants are discussed and taken into account. The methodology is demonstrated through application to a dataset consisting of 1292 male samples from diverse populations and haplogroups, with the majority coming from low-coverage sequencing by the 1000 Genomes Project. Application of the tree-generation approach to these data produces a tree involving over 120,000 chrY variant sites (about 45,000 sites if singletons are excluded). The utility of this approach in refining the Y-chromosome phylogenetic tree is demonstrated by examining results for several haplogroups.
The results indicate a number of new branches on the Y-chromosome phylogenetic tree, many of them subdividing known branches, but also including some that inform the presence of additional levels along the trunk of the tree. Finally, opportunities for extensions of this phylogenetic analysis approach to other types of genetic data are examined.
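The abstract does not spell out how mutations are localized, so the following is only a simplified illustration, not the paper's algorithm. It assumes a fixed rooted tree (no tree building and no iterative reweighting) and a single biallelic site with calls of 1 (derived), 0 (ancestral) or `None` (no-call), and lists the branches on which one mutation could sit. With no-calls the answer can be a run of several consecutive branches, which is one way to picture the "placement uncertainty" mentioned above. The tree and the sample names `s1` to `s4` are hypothetical.

```python
from __future__ import annotations


def leaves_below(children: dict[str, list[str]], node: str) -> set[str]:
    """All leaves in the subtree rooted at `node`."""
    kids = children.get(node, [])
    if not kids:
        return {node}
    out: set[str] = set()
    for kid in kids:
        out |= leaves_below(children, kid)
    return out


def candidate_branches(children: dict[str, list[str]], root: str,
                       states: dict[str, int | None]) -> list[str] | None:
    """Nodes whose parental branch could carry a single mutation at this site,
    lowest first. Returns None if there are no derived calls or if the calls
    need more than one mutation."""
    parent = {c: p for p, kids in children.items() for c in kids}
    derived = {leaf for leaf, s in states.items() if s == 1}
    ancestral = {leaf for leaf, s in states.items() if s == 0}
    if not derived:
        return None
    # Climb from any derived leaf to the common ancestor of all derived leaves
    node = next(iter(derived))
    while not derived <= leaves_below(children, node):
        node = parent[node]
    if leaves_below(children, node) & ancestral:
        return None  # incompatible: recurrent mutation or genotyping error
    # Keep climbing while the subtree still contains no ancestral calls
    candidates = []
    while node != root and not (leaves_below(children, node) & ancestral):
        candidates.append(node)
        node = parent[node]
    return candidates


# Hypothetical tree: root -> (n1, s4); n1 -> (n2, s3); n2 -> (s1, s2)
example_tree = {"root": ["n1", "s4"], "n1": ["n2", "s3"], "n2": ["s1", "s2"]}

print(candidate_branches(example_tree, "root", {"s1": 1, "s2": 1, "s3": 0, "s4": 0}))     # ['n2']
print(candidate_branches(example_tree, "root", {"s1": 1, "s2": 1, "s3": None, "s4": 0}))  # ['n2', 'n1']
```

In the second call the no-call at `s3` leaves the mutation either on the branch above `n2` or on the one above `n1`; a confident call at `s3` would settle it.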