Roewer et al. had previously discovered structure in European Y-chromosomes using Y-STRs. The new study, five years later, uses a huge database of population samples. Haplogroups defined by Y-SNPs are safer because they avoid homoplasy, which can be a problem when only a few Y-STR markers are used. Still, I believe that most major haplogroups can be distinguished even with few Y-STRs, so the paper's results are valid.
From the paper:
In a total of 33,010 males we identified 4176 different haplotypes, 2192 were unique, and 56 corresponded to 42% of the Y chromosomes
Interesting that such a small fraction of haplotypes corresponds to almost half the Y chromosomes. 7 Y-STRs are generally not sufficient to define monophyletic lineages (as the Cohen Modal Haplotype folks well know by now). It would be interesting to see what this fraction is expected to be under an assumption of reproductive equality, to assess the strength of social selection that I've speculated may be behind the mega-haplogroups we observe in the world today.
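To put the quoted figures in proportion, here is a quick back-of-the-envelope calculation. It uses only the numbers quoted from the paper; the variable names are mine, and I am assuming that "unique" means a haplotype observed in exactly one male.

```python
# Figures quoted from the paper
total_males = 33010
distinct_haplotypes = 4176
unique_haplotypes = 2192       # assumption: each seen in exactly one male
top_haplotypes = 56
top_share_of_males = 0.42      # the 56 haplotypes cover 42% of Y chromosomes

# What fraction of all distinct haplotypes are the 56 common ones?
share_of_haplotypes = top_haplotypes / distinct_haplotypes

# Approximate number of males carrying one of the 56, and the mean per haplotype
males_in_top = top_share_of_males * total_males
mean_carriers_top = males_in_top / top_haplotypes

# Fraction of males carrying a haplotype seen only once (under the assumption above)
singleton_share_of_males = unique_haplotypes / total_males

print(f"{share_of_haplotypes:.1%} of haplotypes cover {top_share_of_males:.0%} of males")
print(f"about {mean_carriers_top:.0f} carriers per common haplotype on average")
print(f"{singleton_share_of_males:.1%} of males carry a singleton haplotype")
```

So roughly 1.3% of the distinct haplotypes account for 42% of the sample. This says nothing yet about what reproductive equality would predict; it only makes the skew explicit.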
Here is a synthetic map of Europe showing distribution of different clusters:
Now, take a look at a map of predicted language distribution by Finnish scholar Kalevi Wiik for 5,500 BC:
The correspondence is not perfect, but it's close enough to merit study. The small differences can be ascribed to 7,500 years of history; for example, in 5,500 BC there were probably no Germanic speakers in Scandinavia.
Also of interest:
Two clusters were assigned to large areas of the Balkan Peninsula: (1) Croatia, Bosnia and Herzegovina, Serbia, Romania, Western and Eastern Hungary, and Central Ukraine: cluster 18; (2) continental Greece, Bulgaria, and Macedonia: cluster 2. Cluster 13 was assigned to Albania and to the western area of the Balkans 10 and cluster 11 to the Caucasus.
Forensic Science International: Genetics doi:10.1016/j.fsigen.2010.09.010
Geostatistical inference of main Y-STR-haplotype groups in Europe
Amalia Diaz-Lacava et al.
We examined the multifarious genetic heterogeneity of Europe and neighboring regions from a geographical perspective. We created composite maps outlining the estimated geographical distribution of major groups of genetically similar individuals on the basis of forensic Y-chromosomal markers. We analyzed Y-chromosomal haplotypes composed of 7 highly polymorphic STR loci, genotyped for 33,010 samples, collected at 249 sites in Europe, Western Asia and North Africa, deposited in the YHRD database (www.yhrd.org). The data set comprised 4176 different haplotypes, which we grouped into 20 clusters. For each cluster, the frequency per site was calculated. All geostatistical analysis was performed with the geographic information system GRASS-GIS. We interpolated frequency values across the study area separately for each cluster. Juxtaposing all 20 interpolated surfaces, we point-wisely screened for the highest cluster frequencies and stored it in parallel with the respective cluster label. We combined these two types of data in a composite map. We repeated this procedure for the second highest frequencies in Europe. Major groups were assigned to Northern, Western and Eastern Europe. North Africa built a separate region, Southeastern Europe, Turkey and Near East were divided into several regions. The spatial distribution of the groups accounting for the second highest frequencies in Europe overlapped with the territories of the largest countries. The genetic structure presented in the composite maps fits major historical geopolitical regions and is in agreement with previous studies of genetic frequencies, validating our approach. Our genetic geostatistical approach provides, on the basis of two composite maps, detailed evidence of the geographical distribution and relative frequencies of the most predominant groups of the extant male European population, examined on the basis of forensic Y-STR haplotypes. 
The existence of considerable genetic differences among geographic subgroups in Europe has important consequences for the statistical inference in forensic Y-STR haplotype analyses.
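The composite-map step described in the abstract (take the interpolated frequency surface of each cluster, then at every point keep the highest frequency along with its cluster label, and repeat for the second highest) can be sketched in a few lines. This is not the authors' code: they worked in GRASS-GIS, the interpolation itself is skipped here, and the function name `composite_maps` and the toy 2x2 grids are hypothetical stand-ins for real interpolated surfaces.

```python
def composite_maps(surfaces):
    """Build top and second-highest composite maps.

    surfaces: dict mapping cluster label -> 2D list of interpolated
    frequencies, all with the same shape (assumed, not checked).
    Returns two grids whose cells are (cluster_label, frequency) pairs.
    """
    if len(surfaces) < 2:
        raise ValueError("need at least two cluster surfaces")
    labels = list(surfaces)
    first = surfaces[labels[0]]
    n_rows, n_cols = len(first), len(first[0])

    top, second = [], []
    for r in range(n_rows):
        top_row, second_row = [], []
        for c in range(n_cols):
            # Rank clusters at this grid point by interpolated frequency
            ranked = sorted(labels, key=lambda k: surfaces[k][r][c], reverse=True)
            top_row.append((ranked[0], surfaces[ranked[0]][r][c]))
            second_row.append((ranked[1], surfaces[ranked[1]][r][c]))
        top.append(top_row)
        second.append(second_row)
    return top, second


# Toy example: three made-up clusters on a 2x2 grid
toy_surfaces = {
    1: [[0.5, 0.1], [0.1, 0.25]],
    2: [[0.3, 0.6], [0.2, 0.35]],
    3: [[0.2, 0.3], [0.7, 0.4]],
}
top_map, second_map = composite_maps(toy_surfaces)
top_labels = [[label for label, _ in row] for row in top_map]
second_labels = [[label for label, _ in row] for row in second_map]
print(top_labels)
print(second_labels)
```

Storing the label and the frequency side by side mirrors the abstract's "stored it in parallel with the respective cluster label": the label grid gives the map's regions, and the frequency grid shows how dominant the winning cluster actually is at each point.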