In a past post I had noted that the occurrence of low-frequency divergent haplotypes in a population might be a "relic of a bygone age". The point I was trying to make is that early settlement in a region may create a diverse gene pool (as there is plenty of time for variation to accumulate), but this antiquity of settlement may be obscured by later (including fairly recent) expansions of sublineages that appear to be young in evolutionary terms.
Hence the importance of outliers in age estimation: these may be either "relics" of the most ancient population (predating the expansion of the recent lineages, whether that expansion was due to selection or demographic increase) or introgressed lineages from abroad.
In order to discover outliers, you need a large sample. The authors of this paper, in the context of mtDNA, discovered 5 new basal (i.e., near the trunk) lineages within Eurasian macrohaplogroups M and N. This is less than 0.1% of their huge Chinese sample. In a smaller sample, as is customary in most mtDNA studies, these outliers would probably have gone undetected.
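To put a rough number on the sample-size argument, here is a minimal sketch of how likely a study is to pick up at least one carrier of a rare lineage. It assumes simple random sampling with independent draws, and the 0.1% frequency is only an illustrative value, not a figure from the paper; the function name is my own.

```python
def detection_probability(frequency: float, sample_size: int) -> float:
    """Chance of sampling at least one carrier of a lineage.

    Assumes independent random draws from a population in which the
    lineage occurs at the given frequency (a simplifying assumption).
    """
    return 1.0 - (1.0 - frequency) ** sample_size


# Assumed, illustrative frequency: 1 carrier per 1,000 people (0.1%).
RARE_FREQUENCY = 0.001

# Compare typical small study sizes with the 6,093 mtDNAs of this paper.
for n in (100, 500, 6093):
    p = detection_probability(RARE_FREQUENCY, n)
    print(f"n = {n:>5}: P(at least one carrier) = {p:.3f}")
```

Under these assumptions a 100-person sample has roughly a 10% chance of containing even one carrier, a 500-person sample about 39%, and a 6,093-person sample over 99%. Real sampling is structured by population, so these are only order-of-magnitude intuitions.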
What is most interesting is that the authors explicitly tried to distinguish between the two competing hypotheses described above: admixture and "relics". The new lineages do not appear to be the result of foreign admixture (e.g., some rare Indian M subclade that somehow found its way into southern China), but to be true relics.
The existence of relics pushes back the time of settlement/Out of Africa expansion, as more time is needed to "tie in" the relics with the rest of the tree.
This should serve as a warning for age estimation: all too often, peculiar lineages are brushed aside with a paragroup label as oddities, while researchers focus on the more established and phylogeographically informative lineages. While full mtDNA sequencing is a viable option, the same procedure is not widely applied to Y chromosomes, as the Y chromosome is much larger than mtDNA, and hence more difficult (and expensive) to fully sequence.
A 6,000-strong sample is probably not available for most countries and populations, except through the Genographic Project, which seems to be missing in action of late. There are also large commercial samples, which benefit from the desire of paying customers with unusual haplotypes to look deeper into their ancestry. Unfortunately, these same customers are WEIRD (Western, Educated, Industrialized, Rich, Democratic), and give us little information about most of mankind, including the most interesting and mysterious aspects of human prehistory.
Nonetheless, there is hope for the future, as sample sizes continue to increase and genotyping costs continue to decrease. While there is reason to share Craig Venter's bleak assessment of the accomplishments of genomics, the one clear field where human genetics has triumphed and will continue to triumph is that of human origins.
UPDATE: Gene Expression notes that commercial companies like 23andMe have even larger samples, and customers can download 550k SNPs for their sample. However, most of the people who buy 23andMe tests are, in the global context, near clones of each other, being predominantly of western European origin. Moreover, the thousands of SNPs included in the technology used by 23andMe include a limited number of mtDNA and Y chromosome SNPs which have been chosen for their informativeness, i.e., they define already-studied clades of the phylogeny, and are thus unsuitable for discovering new clades, as was done in this paper. I'm pretty sure there are paragroups aplenty in both the 23andMe customer base and the Genographic Project, but, as far as I know, neither of the two aggressively mines its data for SNP discovery/phylogeny refinement, and there are ethical limitations to consider, as people who sign up for either service do not necessarily approve of their DNA sample being used beyond the narrow scope of the provided service.
Molecular Biology and Evolution, doi:10.1093/molbev/msq219
Large-scale mtDNA screening reveals a surprising matrilineal complexity in East Asia and its implications to the peopling of the region
Qing-Peng Kong et al.
In order to achieve a thorough coverage of the basal lineages in the Chinese matrilineal pool, we have sequenced the mitochondrial DNA (mtDNA) control region and partial coding-region segments of 6,093 mtDNAs sampled from 84 populations across China. By comparing with the available complete mtDNA sequences, 194 of those mtDNAs could not be firmly assigned into the available haplogroups. Completely sequencing 51 representatives selected from these unclassified mtDNAs identified a number of novel lineages, including five novel basal haplogroups that directly emanate from the Eurasian founder nodes (M and N). No matrilineal contribution from the archaic hominid was observed. Subsequent analyses suggested that these newly identified basal lineages likely represent the genetic relics of modern humans initially peopling East Asia, instead of being the results of gene flow from the neighboring regions. The observation that most of the newly recognized mtDNA lineages have already differentiated and show the highest genetic diversity in southern China provided additional evidence in support of the Southern-Route peopling hypothesis of East Asians. Specifically, the enrichment of most of the basal lineages in southern China and their rather ancient ages in Late Pleistocene further suggested that this region was likely the genetic reservoir of modern humans after they entered East Asia.