Back in 2008, personal genome services using microarrays had only recently launched, and it was a great opportunity to bring together the published data about human populations in the scientific literature with the new flood of data from customers of the new companies. I thought that these customers were either underserved by the simple European-Asian-African model in the more reputable companies, or fed fairytales by the less reputable ones.
Of course, EURO-DNA-CALC was rudimentary by today's standards, as it used only a few hundred ancestry informative SNPs. Nonetheless, it did manage to be of some use before it was retired.
Almost a year ago, I decided to take up the mantle of genome blogger once again, with the goal of updating and improving EURO-DNA-CALC with the new wealth of population data that had since become available. Two reasons made me deviate from my original plan:
- ADMIXTURE only runs on Linux or MacOS, while my favorite, R, is quite underpowered for the job; hence, the vast majority of regular PC users would not or could not try a new tool that substantially upped both the number of SNPs and the number of ancestral populations.
- I realized the value not only of providing the community with a tool based on published data, but also of collecting data myself; this would ensure that the tool would take into account several regions of the world that are "black holes" as far as publicly accessible data are concerned.
Nonetheless, I always kept thinking of how I could encompass the Dodecad Project's main admixture analysis in a DIY tool; I explain the reasons why in my post introducing the new software, but they all boil down to one:
- Interest in the Project has been huge, and I always felt bad when I had to turn down someone's relatives, or people of mixed ancestry, or the n-th member of a well-represented group. With all the automation in place, it still takes me a couple of minutes to process a sample, and the task of doing it myself for potentially thousands is daunting.
I learned that the hard way when I briefly opened submission to everybody; I had to close it in less than 12 hours, because of the overwhelming demand.
The new tool allows everyone to calculate their Dodecad v3 results. It did take me a few hours to write a couple hundred lines of code for it, but it will both make Dodecad analysis accessible to nearly everyone and save me a lot of time. Some of that time will, hopefully, be spent on experimenting with new ideas for the Project beyond Clusters Galore, concordance ratios, zombies, the Dodecad Oracle, etc.
So, if you have a PC or Linux machine and 23andMe or Family Finder data, give it a try!