July 26, 2011

DIY Dodecad

It has been more than three years since I started the non-commercial autosomal ancestry analysis field with the release of EURO-DNA-CALC. Today, I am releasing DIY Dodecad 1.0, the next generation of ancestry self-analysis.

Back in 2008, personal genome services using microarrays had only recently started, and it was a great opportunity to bring together the published data about human populations in the scientific literature with the new flood of data from customers of the new companies. I thought that these customers were either underserved by the simple European-Asian-African model in the more reputable companies, or fed fairytales by the less reputable ones.

Of course, EURO-DNA-CALC was rudimentary by today's standards, as it used only a few hundred ancestry informative SNPs. Nonetheless, it did manage to be of some use before it was retired.

Almost a year ago, I decided to take up the mantle of genome blogger once again, with the goal of updating and improving EURO-DNA-CALC with the new wealth of population data that had since become available. Two reasons made me deviate from my original plan:
  • ADMIXTURE only runs on Linux or MacOS, while my favorite R is quite underpowered to do the job; hence, the vast majority of regular PC users would not or could not try a new tool that substantially upped both the number of SNPs and the number of ancestral populations.
  • I realized the value not only of providing a tool to the community based on published data, but also of collecting data myself; this would ensure that the tool would take into account several regions of the world that are "black holes" as far as publicly accessible data are concerned.
Thus began the Dodecad Ancestry Project, based on the idea of providing the community with results in exchange for data that could then create better results, and so on, in a virtuous circle.

Nonetheless, I always kept thinking of how I could encompass the Dodecad Project's main admixture analysis in a DIY tool; I explain the reasons why in my post introducing the new software, but they all boil down to one:

  • Interest in the Project has been huge, and I always felt bad when I had to turn down someone's relatives, or people of mixed ancestry, or the n-th member of a well-represented group. With all the automation in place, it still takes me a couple of minutes to process a sample, and the task of doing it myself for potentially thousands is daunting.

I learned that the hard way when I briefly opened submission to everybody; I had to close it in less than 12 hours, because of the overwhelming demand.

The new tool allows everyone to calculate their Dodecad v3 results. It did take me a few hours to write a couple hundred lines of code for it, but it will both make Dodecad analysis accessible to nearly everyone and save me a lot of time, some of which will, hopefully, be spent on experimenting with new interesting ideas for the Project beyond Clusters Galore, concordance ratios, zombies, the Dodecad Oracle, etc.

So, if you have a PC or Linux machine and 23andMe or Family Finder data, give it a try!


apostateimpressions said...

Would people recommend 23andMe or Family Finder?

Dienekes said...


1. You are off-topic
2. What "articles" are you referring to and by what "Greek comrades-in-arms?"?
3. I'm happily opposed to the ideology of multiculturalism and to left-wing ideology in general; some murderous Norwegian nutcase's opinions are irrelevant to me and would certainly not sway me one way or another in my political views.

Dienekes said...

Back to the topic.

Larry said...

I downloaded your DIYDodecadWin and found it a little hard to get used to, but the results were rewarding. I have myself and three siblings in my 23andMe account. However, I am the only one in your project. I computed the values for the three other siblings and computed a composite profile for the family. There are only minor variations in the four major components for the four of us. I believe this confirms the validity of the methods to some degree; the results were surprisingly close. I fit the data to a mixture of:

Population Admixture
Orkney_1KG 69.51%
N_Italian_D 20.78%
Russian_D 7.19%
Lezgins 1.78%
Kalash 0.75%
RMSD 0.0410

As you can see, the RMSD (root mean square deviation) for the fit is only 0.041, which I consider a very good fit. The data:
Component          SmiserComposite  Mixture as below  Differences
East European      12.44            12.42             -0.01
West European      49.17            49.17             0.00
Mediterranean      27.13            27.12             0.00
Neo African        0.00             0.00              0.00
West Asian         9.23             9.23              -0.01
South Asian        0.85             0.85              0.00
Northeast Asian    0.32             0.43              0.12
Southeast Asian    0.00             0.14              0.14
East African       0.01             0.00              -0.01
Southwest Asian    0.83             0.88              0.05
Northwest African  0.02             0.09              0.07
Palaeo African     0.00             0.00              0.00
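
For readers who want to check a fit like this themselves, here is a minimal sketch of a root mean square deviation between two 12-component profiles, using the rounded percentages from the two data rows above. The function name rmsd and the choice to average over all 12 components are assumptions for illustration; the comment does not say exactly how its RMSD of 0.0410 was computed, and with rounded inputs this sketch should not be expected to reproduce that figure exactly.

```python
import math

# Rounded percentages copied from the "SmiserComposite" and "Mixture" rows above,
# in the order: East European, West European, Mediterranean, Neo African,
# West Asian, South Asian, Northeast Asian, Southeast Asian, East African,
# Southwest Asian, Northwest African, Palaeo African.
composite = [12.44, 49.17, 27.13, 0.00, 9.23, 0.85, 0.32, 0.00, 0.01, 0.83, 0.02, 0.00]
mixture = [12.42, 49.17, 27.12, 0.00, 9.23, 0.85, 0.43, 0.14, 0.00, 0.88, 0.09, 0.00]


def rmsd(a, b):
    """Root mean square deviation between two equal-length profiles.

    Assumption: the squared differences are averaged over all components.
    """
    if len(a) != len(b):
        raise ValueError("profiles must have the same number of components")
    squared = [(x - y) ** 2 for x, y in zip(a, b)]
    return math.sqrt(sum(squared) / len(squared))


print(f"RMSD = {rmsd(composite, mixture):.4f}")
```

A smaller value means the mixture reproduces the observed profile more closely; identical profiles give exactly zero.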

B.M. said...

I didn't succeed in getting my downloaded Dodecad software to work.
There will be skillful people who did succeed.

May I ask if one of them would be so kind as to analyze my genome, which was tested at 23andMe?
My ancestors in the fifth generation, 32 persons, were all born within a circle of 30 kilometers round Maastricht in the South of the Netherlands.

If so, I will send the text file by e-mail.

Anonymous said...


I have accounts at both. 23andMe has a much bigger database, is more professional, is better value for money and has more extras.

Folk at Family Finder are more likely to respond to correspondence and share with you. They also have the associated STR Surname projects.

But the size of the 23andMe database is the clincher.

jes-r said...

Wonderful program, it worked for me, no issues.

B.M. said...

Pffff! Success!

East_European 10.15%
West_European 48.71%
Mediterranean 26.26%
Neo_African 0.08%
West_Asian 11.73%
Southeast_Asian 0.17%
Southwest_Asian 2.08%
Northwest_African 0.81%

