I've been trying out ADMIXTURE recently. It is much faster than both frappe and STRUCTURE, its main competitors for admixture estimation, and it is also simple to use and well documented.
My immediate goal was to analyze the data from the recent Xing et al. (2010) paper. Unfortunately, many recent papers do not put their data online, or hide them behind various institutional controls, but the data from that paper (a total of 40 populations typed for a quarter million markers) are available online.
My longer-term goal is to update EURO-DNA-CALC, making it more powerful and extending it with non-European populations. Two constraints are particularly important:
- You can't assume that people will have the computing power and know-how to go through various steps to run ADMIXTURE themselves.
- The alternative of having people send me their genotype data is not workable, both because of legitimate privacy concerns and because I could not accommodate a large number of requests.
Here is a 10k SNP/K=7 run of ADMIXTURE on the aforementioned data, which took a few minutes to run on my machine. As you can see, 10k SNPs already do quite well at separating different groups of individuals. I will probably use more SNPs in the final version.
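For anyone who wants to try something similar, here is a minimal sketch of one way to thin a marker set down to 10k SNPs. It assumes the genotypes are in PLINK binary format (so marker IDs sit in the second column of the .bim file) and it picks markers uniformly at random, which is not necessarily how the subset above was chosen. The function name `sample_snp_ids` and the file names in the note below are hypothetical.

```python
import random


def sample_snp_ids(bim_lines, n_snps, seed=0):
    """Pick n_snps marker IDs at random from PLINK .bim lines.

    Assumption: each non-empty line is whitespace-separated and the
    marker ID is the second field, as in a standard .bim file.
    The result keeps the markers in their original file order.
    """
    ids = [line.split()[1] for line in bim_lines if line.strip()]
    if n_snps > len(ids):
        raise ValueError("requested more SNPs than the file contains")
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    chosen = sorted(rng.sample(range(len(ids)), n_snps))
    return [ids[i] for i in chosen]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    with open("xing2010.bim") as handle:
        subset = sample_snp_ids(handle, 10000)
    with open("snps_10k.txt", "w") as out:
        out.write("\n".join(subset) + "\n")
```

The resulting list could then be passed to PLINK's `--extract` option together with `--make-bed` to write a reduced dataset, and the reduced .bed file given to ADMIXTURE along with the number of clusters (e.g. `admixture subset.bed 7`).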
Feel free to leave comments on what features you'd like to see in the new version. I can't promise a timetable, but I will try to incorporate as many suggestions as I can.
UPDATE I (Sep 21):
Here is a run with all 246,554 SNPs for the 850 individuals. This looks like the figure published in the Xing et al. paper, although I've kept the individuals in the order they appear in the genotype file, while the published version re-arranges them so that the different clusters appear contiguously. This run took several minutes. I estimate that the full run for K=12, i.e., the one needed to generate the other figure from the paper, will take about half a day, so I will probably leave it running overnight one of these days and post it as well.
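If you want the clusters to appear contiguously, as in the published figure, the individuals can be re-ordered after the run. The sketch below is one simple way to do it, not the paper's method: it assumes the ADMIXTURE .Q output (one row per individual, K whitespace-separated proportions) and groups individuals by their largest component, strongest members first. The names `parse_q` and `cluster_order` are hypothetical.

```python
def parse_q(text):
    """Parse the contents of a .Q file into a list of rows of floats."""
    return [[float(x) for x in line.split()]
            for line in text.splitlines() if line.strip()]


def cluster_order(q_rows):
    """Return individual indices grouped by dominant cluster.

    Within each cluster, individuals with the highest proportion come
    first; ties fall back to the original file order.
    """
    def sort_key(i):
        row = q_rows[i]
        top = max(range(len(row)), key=row.__getitem__)  # dominant cluster
        return (top, -row[top], i)

    return sorted(range(len(q_rows)), key=sort_key)


# Toy example with four individuals and K=2.
q = parse_q("0.1 0.9\n0.8 0.2\n0.3 0.7\n0.95 0.05\n")
print(cluster_order(q))  # → [3, 1, 0, 2]
```

The returned indices can be used to re-order both the Q rows and the individual labels before plotting. Grouping by population label instead would need the population assignments, which are not in the .Q file.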
UPDATE II (Sep 12):
Here are the results for K=12 and 246,554 SNPs, which took about 10.5 hours to compute, in line with my estimate.