Here is a huge data dump for anyone interested in human variation. Part of the reason I started the Dodecad Project was to be able to analyze data on my own, rather than having to squint to make sense of a plot, to speculate about what might show up at higher dimensions, or with more clusters, to wonder how the inclusion of additional populations would affect the results, and so on.
The following dataset represents the culmination (so far) of my efforts.
Number of SNP markers: ~177,000 as in here
In the RAR file (~11MB) you will find 49 scatterplots (5000x5000 pixels each) representing the first 50 dimensions of a multi-dimensional scaling analysis of this dataset, together with information about the samples and their sources. There is a plot of the 1st and 2nd dimensions, 2nd and 3rd, 3rd and 4th, and so on, until the 49th and 50th.
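To make the plot count concrete, here is a minimal sketch of how consecutive dimension pairs can be enumerated. The function name `consecutive_pairs` is my own for illustration and is not part of any tool used to produce the RAR file; it only shows why 50 dimensions yield 49 plots when each plot pairs a dimension with the next one.

```python
def consecutive_pairs(n_dims):
    """Return 1-based (x, y) dimension pairs: (1, 2), (2, 3), ..., (n_dims - 1, n_dims)."""
    return [(d, d + 1) for d in range(1, n_dims)]


pairs = consecutive_pairs(50)
print(len(pairs))            # 49
print(pairs[0], pairs[-1])   # (1, 2) (49, 50)
```

Each dimension except the first and the last therefore appears twice: once on the vertical axis and once on the horizontal axis of the next plot.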
I don't believe Picasa allows such huge pics, so I've made a few smaller (still 1600x1600 pixels each) ones to give you an idea of what to expect. Note that the legend in these small ones is only partly visible.
In all plots, population labels have been placed on the population averages; these usually correspond to blobs of datapoints belonging to that population, but occasionally they are shifted due to the presence of outliers.
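The effect of an outlier on a label can be sketched with a few made-up points. The coordinates below are hypothetical, not taken from the dataset, and the helper names `label_position` and `median_position` are mine. The median is shown only as a point of comparison; the plots themselves use averages.

```python
from statistics import mean, median


def label_position(points):
    """Place the label at the per-axis average of a population's points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (mean(xs), mean(ys))


def median_position(points):
    """Per-axis median, shown for comparison only (not what the plots use)."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (median(xs), median(ys))


# Hypothetical population: a tight blob of four individuals...
blob = [(1.0, 1.0), (1.0, 3.0), (3.0, 1.0), (3.0, 3.0)]
# ...plus one outlier far away from the rest.
with_outlier = blob + [(12.0, 12.0)]

print(label_position(blob))           # (2.0, 2.0), in the middle of the blob
print(label_position(with_outlier))   # (4.0, 4.0), pulled outside the blob
print(median_position(with_outlier))  # (3.0, 3.0), still on the blob's edge
```

With the outlier included, the average lands at (4.0, 4.0), outside the square occupied by the other four individuals, which is the kind of label shift described above.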
Before I proceed, it might be worthwhile to give a visual representation of the three poles of human variation in its broadest context; these are Basques/Sardinians, Mbuti/Biaka Pygmies, and She. Well, these are marginally more toward the three poles than many others, but they will do:
Mbuti image by Mikael Strandberg; She image from Portraits of Chinese ethnic groups and links therein.
1 vs 2
3 vs 4
5 vs 6
7 vs 8
Inspection of these plots gives you an idea of why Clusters Galore works so well. It can detect "clusteredness" of individuals along multiple dimensions. Rather than looking at a series of 2D plots, it considers the proximity of individuals to each other along multiple dimensions, and adapts to the shape, size, and orientation of the clusters.
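A toy illustration of why proximity along multiple dimensions matters. This is not the Clusters Galore algorithm, only a sketch using plain Euclidean distance (which, unlike Clusters Galore, does not adapt to cluster shape or orientation). The three individuals and their coordinates are hypothetical, and `nearest` is a helper name I made up for the example.

```python
from math import dist

# Hypothetical MDS coordinates: one tuple per individual, dimensions 1..4.
coords = {
    "A": (0.0, 0.0, 0.0, 0.0),
    "B": (3.0, 4.0, 0.0, 0.0),
    "C": (0.0, 0.0, 0.0, 12.0),  # coincides with A in dimensions 1-2 only
}


def nearest(name, coords, n_dims):
    """Return the individual closest to `name` using the first n_dims dimensions."""
    origin = coords[name][:n_dims]
    others = (k for k in coords if k != name)
    return min(others, key=lambda k: dist(origin, coords[k][:n_dims]))


print(nearest("A", coords, 2))  # C: on the 1 vs 2 plot, A and C sit on the same spot
print(nearest("A", coords, 4))  # B: with all four dimensions, C is far away
```

On a single 2D plot, A and C would look like members of the same blob; only when the fourth dimension is taken into account does it become clear that A is closer to B.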