August 12, 2012

East Eurasian and African fastIBD analyses

Mark D said...

Dienekes, I appreciate the caveat you included, "These results should not be construed as measures of overall genetic similarity or origins. Rather, they suggest which populations have exchanged genes in the relative recent past."

I find the growing number of commercial attempts to depict autosomal DNA ancestry troubling. They range from 23andMe's "Ancestry Painting", which shows only Europe, Asia and Africa, to DNA Tribes' exhaustive list of regions and sub-regions. In between is FTDNA's "Population Finder", which regionalizes to some extent. Your recent mention of the 1000 Genomes Project alludes to an attempt there, and the upcoming National Geographic Geno 2.0 will also include some type of autosomal painting. In response to an email I sent FTDNA about Geno 2.0, I was told that "we will analyze a collection of more than 130,000 other markers from across your entire genome to reveal the regional genetic affiliations of your ancestry."

The trouble I have with all this is the almost total absence of any description of sample size and methodology. FTDNA simply states that they use their own databank and base their "biogeographic" analysis on samples from "scientific studies". I've tested with both FTDNA and 23andMe, but not with DNA Tribes, who claim that they use a "proprietary algorithm".

Moreover, I found both tests less than informative: 23andMe's because it was limited to three continents, and FTDNA's because of its descriptions and margins of error. The test showed 86.41% ± 10.38% Western European, which they describe as including French, Spanish and "Orcadian". Orcadian? Of course I had to look up what they meant by Orcadian, which refers to the Orkney Islands north of Scotland. Well, the residents of the Orkneys are mostly descended from Vikings with some Brits and Scots thrown in, but it's a small community and I couldn't imagine what their sample size from the Orkneys was. So did FTDNA really mean Scandinavian? And this leads back to what I consider most troubling.

Using a small sample as somehow representative of a relatively small region of the world can be grossly misleading. In a comment on your blog post about the R1b that showed up in the 1000 Genomes Project database from a Puerto Rican sample, I said that I considered the description inappropriate, as there are no indigenous Puerto Ricans remaining. Will that now be the descriptive region for the Project? If so, and if the sample was indeed from the Ukrainian professor, will a sizable portion of Europeans be told they have a percentage of Puerto Rican in them?

Of course all this is ridiculous, and I assume the Project will not use such a descriptor, but it points out the pitfalls of arbitrarily assigning regional descriptions to small samples as if they were representative of that region. 1000 genomes may be enough to describe an Icelandic modal admixture, provided the 1000 people tested verified their Icelandic heritage a few centuries back, as most of them can, but it can hardly form the basis of a worldwide "biogeographic" analysis. Again, I appreciate your caveat and hope you can one day expand your Dodecad Project beyond 12,000. (Am I reading too much into the name, or are you from Rhodes?)

Joey B. said...

Are the Z-scores, haplotype counts, total and mean IBD in cM shared available?

The Heatmaps don't appear to correlate with the median shared IBD by rows.

Dienekes said...

The Heatmaps don't appear to correlate with the median shared IBD by rows.