October 03, 2010

PopAffiliator: estimating origin with forensic autosomal STRs

Anders Pålsen alerts me to PopAffiliator, a neat little tool that guesses the origin of a sample from one of several major population groups:
The STR collection database used to train and evaluate the machine learning model encompasses data gathered from more than 40 different studies and contains a total of 56,222 individuals, distributed across 7 major geographical locations: East Asia, Eurasia, sub-Saharan Africa, North Africa, Near East, Central-South America and North America. The data is available here.
The tool uses 17 forensic autosomal STR markers. This may seem like too few in this day and age, but it is sufficient for the purpose at hand.

A few years ago, someone posted on the dna-forums site (I can't seem to find the topic today) a STRUCTURE-based calculator using mostly the same markers. That calculator contained 500 German, 500 Chinese, and 500 African individuals:
  • Group 3=Africans
  • Group 2=Chinese
  • Group 1=Germans
Back then, I ran STRUCTURE on the data, yielding the following results:

Group          Cluster 1  Cluster 2  Cluster 3    N
1 (Germans)      0.041      0.025      0.934     500
2 (Chinese)      0.949      0.012      0.039     500
3 (Africans)     0.019      0.955      0.026     500

If we look at the 1,500 individuals, it turns out that the correct "guess" of a person's origin (i.e., his largest inferred cluster membership coefficient falling in the cluster that corresponds to his real population) occurred for 497/500 Africans, 491/500 Chinese, and 490/500 Germans.
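The "correct guess" criterion is just an argmax over each individual's membership coefficients. A minimal sketch, assuming each individual is stored as a (true population, [q1, q2, q3]) pair; each population's "own" cluster is read off the STRUCTURE summary table, while the individual-level rows below are hypothetical, not data from the run:

```python
# STRUCTURE summary table: population -> membership in inferred clusters 1..3.
TABLE = {
    "German": [0.041, 0.025, 0.934],
    "Chinese": [0.949, 0.012, 0.039],
    "African": [0.019, 0.955, 0.026],
}

def assigned_cluster(q):
    """Index (0-based) of the largest membership coefficient."""
    return max(range(len(q)), key=lambda k: q[k])

# Each population's "own" cluster is the one it dominates in the summary table.
CLUSTER_OF = {pop: assigned_cluster(q) for pop, q in TABLE.items()}

def accuracy_by_population(rows):
    """Per population: (individuals whose top cluster is their own, total individuals)."""
    hits, totals = {}, {}
    for pop, q in rows:
        totals[pop] = totals.get(pop, 0) + 1
        if assigned_cluster(q) == CLUSTER_OF[pop]:
            hits[pop] = hits.get(pop, 0) + 1
    return {pop: (hits.get(pop, 0), n) for pop, n in totals.items()}

# Hypothetical individual-level rows, for illustration only.
toy_rows = [
    ("German", [0.04, 0.02, 0.94]),
    ("German", [0.60, 0.05, 0.35]),  # misassigned: top cluster is the "Chinese" one
    ("Chinese", [0.95, 0.01, 0.04]),
    ("African", [0.02, 0.96, 0.02]),
]
print(accuracy_by_population(toy_rows))

# Overall rate from the counts reported in the post.
reported = {"African": (497, 500), "Chinese": (491, 500), "German": (490, 500)}
correct = sum(h for h, _ in reported.values())
total = sum(n for _, n in reported.values())
print(f"{correct}/{total} = {correct / total:.1%}")  # 1478/1500 = 98.5%
```

Summed over the three groups, the reported counts give 1,478 correct assignments out of 1,500.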

Pretty good! Typing hundreds of thousands of SNPs to guess whether someone is East Asian, European, or sub-Saharan African is overkill, and there is already widespread forensic profiling of numerous human populations, so why not put all this data to use?

The problem with using so few markers isn't that they are insufficient to guess one's origin, but that they are insufficient to estimate admixture, if present. Here is the triangle plot from my aforementioned mini-experiment:


If hundreds of thousands of SNPs had been used, the red, green, and blue dots would be gathered in tight clusters in the three corners, with the occasional individual deviating in a different direction. Nonetheless, it's also clear that very few individuals deviate from their true origin beyond the 50% cutoff, which is what might fool one into assigning them to the wrong population.
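Why a handful of markers gives only a loose admixture estimate can be illustrated with a small simulation. It deliberately simplifies things: biallelic (SNP-like) loci instead of multiallelic STRs, hypothetical source allele frequencies drawn uniformly between 0.05 and 0.95, and a grid-search maximum-likelihood estimate of one individual's ancestry fraction q under the usual admixture model (each allele copy comes from source A with probability q). The sketch compares the typical error of that estimate at 17 loci and at 500 loci:

```python
import math
import random

def simulate_individual(p_a, p_b, q, rng):
    """Allele counts (0, 1 or 2 copies of allele '1') for ancestry fraction q from source A."""
    genotype = []
    for pa, pb in zip(p_a, p_b):
        f = q * pa + (1 - q) * pb  # allele frequency under the admixture model
        genotype.append(sum(rng.random() < f for _ in range(2)))
    return genotype

def estimate_q(genotype, p_a, p_b, steps=50):
    """Grid-search maximum-likelihood estimate of the ancestry fraction q."""
    best_q, best_ll = 0.0, -math.inf
    for i in range(steps + 1):
        q = i / steps
        ll = 0.0
        for g, pa, pb in zip(genotype, p_a, p_b):
            f = q * pa + (1 - q) * pb
            ll += g * math.log(f) + (2 - g) * math.log(1 - f)
        if ll > best_ll:
            best_q, best_ll = q, ll
    return best_q

def rms_error(n_loci, true_q=0.8, reps=40, seed=1):
    """Root-mean-square error of the q estimate over simulated individuals."""
    rng = random.Random(seed)
    p_a = [rng.uniform(0.05, 0.95) for _ in range(n_loci)]  # hypothetical source A
    p_b = [rng.uniform(0.05, 0.95) for _ in range(n_loci)]  # hypothetical source B
    squared = []
    for _ in range(reps):
        g = simulate_individual(p_a, p_b, true_q, rng)
        squared.append((estimate_q(g, p_a, p_b) - true_q) ** 2)
    return math.sqrt(sum(squared) / reps)

results = {n: rms_error(n) for n in (17, 500)}
for n, err in results.items():
    print(n, "loci: RMS error of q =", round(err, 3))
```

A single forensic STR locus carries more information than a single SNP, so 17 STRs do better than 17 biallelic loci would; the sketch only shows the direction of the effect, not PopAffiliator's actual precision.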

Since I have some autosomal test data of my own, I decided to give PopAffiliator a try.


I will add a link to PopAffiliator to the blog's right sidebar.

INTERNATIONAL JOURNAL OF LEGAL MEDICINE
DOI: 10.1007/s00414-010-0472-2

PopAffiliator: online calculator for individual affiliation to a major population group based on 17 autosomal short tandem repeat genotype profile

Luísa Pereira et al.

Because of their sensitivity and high level of discrimination, short tandem repeat (STR) marker systems are currently the method of choice in routine forensic casework and data banking, usually in multiplexes of up to 15–17 loci. Constraints related to sample amount and quality, frequently encountered in forensic casework, will not allow this picture to change in the near future, notwithstanding the technological developments. In this study, we present a free online calculator named PopAffiliator (http://cracs.fc.up.pt/popaffiliator) for individual population affiliation in the three main population groups, Eurasian, East Asian and sub-Saharan African, based on genotype profiles for the common set of STRs used in forensics. This calculator performs affiliation based on a model constructed using machine learning techniques. The model was constructed using a data set of approximately fifteen thousand individuals collected for this work. The accuracy of individual population affiliation is approximately 86%, showing that the common set of STRs routinely used in forensics provides a considerable amount of information for population assignment, in addition to being excellent for individual identification.
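The abstract does not spell out which machine learning technique PopAffiliator's model uses. For contrast, here is the classic frequency-based way to assign an STR profile to a population, which is a different technique: multiply Hardy-Weinberg genotype probabilities across loci assumed independent, separately for each group, then normalize under equal priors. The allele-frequency tables, the two loci shown, and the floor frequency for unseen alleles are all hypothetical placeholders, not real reference data:

```python
import math

# Hypothetical allele-frequency tables (NOT real data): group -> locus -> allele -> frequency.
FREQS = {
    "Eurasian": {"D3S1358": {15: 0.26, 16: 0.25, 17: 0.21}, "TH01": {6: 0.22, 9.3: 0.35}},
    "East Asian": {"D3S1358": {15: 0.33, 16: 0.32, 17: 0.20}, "TH01": {6: 0.10, 9.3: 0.05}},
    "sub-Saharan African": {"D3S1358": {15: 0.30, 16: 0.30, 17: 0.20}, "TH01": {6: 0.15, 9.3: 0.10}},
}
MIN_FREQ = 0.001  # assumed floor for alleles missing from a table

def genotype_log_prob(freq_table, a, b):
    """Hardy-Weinberg genotype probability: p^2 for homozygotes, 2pq for heterozygotes."""
    p = freq_table.get(a, MIN_FREQ)
    q = freq_table.get(b, MIN_FREQ)
    return math.log(p * p) if a == b else math.log(2 * p * q)

def posterior(profile):
    """Group membership probabilities under equal priors, assuming independent loci."""
    log_like = {
        group: sum(genotype_log_prob(tables[locus], a, b) for locus, (a, b) in profile.items())
        for group, tables in FREQS.items()
    }
    top = max(log_like.values())  # subtract the maximum before exponentiating, for stability
    weights = {g: math.exp(ll - top) for g, ll in log_like.items()}
    total = sum(weights.values())
    return {g: w / total for g, w in weights.items()}

profile = {"D3S1358": (15, 16), "TH01": (6, 9.3)}
print(posterior(profile))
```

With these made-up numbers the profile comes out about 73% Eurasian. A real calculator would need properly sampled frequency tables for all 17 loci, and any scheme of this kind inherits the independence and Hardy-Weinberg assumptions.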


24 comments:

onur said...

I was a bit surprised, as, being Greek from both sides of the Aegean, I expected to register at least some "Near Eastern" probability, but the populations in the Near Eastern group included Arabs and Iranians; these are probably poor representatives of any West Asian component in my genome.

There is no "Near Eastern probability"; read the abstract and the site information carefully. "Near Eastern" is just one of the population categories in the database, but the results have no "Near Eastern" category, only the three categories appearing in your result, namely: 1- "Eurasian", 2- "East Asian" and 3- "sub-Saharan African".

Dienekes said...

Cool, thanks.

onur said...

Cool, thanks.

You're welcome. Being an Anatolian/Balkan Greek mix, you must surely have much, much, much, much ... more Near Eastern DNA than East Asian or sub-Saharan DNA, even if you do carry some DNA from these two regions, which are far away from the Balkans and Asia Minor. I wish they also tested for Near Easternness.

Spy said...

Hey Dienekes, wait until 23andme has another outreach sale (full package, $99). It will be great fun to see what your Ancestry Painting result is. I'll bet that your Mongoloid component is enough to make you wonder if you have any genes from the Karamanli or Yürüks. While you haven't said expressly which parts of the Ottoman Empire you derive from, your comments make me think Cappadocia and Arcadia.
I've discovered likely Venetian and Vlach ancestry no one in my family knew about. It's fun!
What do you predict for your own EuroDNACalc result? I've seen 100% SE in peninsular Greeks, but not in any other Balkan person. I wish some Sicilians would post their results in the 23andme fora (another cool feature).

Ponto said...

I don't consider 23andMe the acme of genetic profiling. The fora is as dumb as you can get. Americans!

Why the interest in Sicilians? I noted a couple of Sicilian Italians, and Southern Italians, both 100% European. My results, 100% European. I am Maltese, and that island is further south than Sicily. A Greek from Crete was 100% European, yet the Norwegians and some American Whites? have Asian admixture. Pocahontas? The Cherokee Princess?

onur said...

While you haven't said expressly which parts of the Ottoman Empire you derive from, your comments make me think Cappadocia and Arcadia.

Didn't Dieneke write earlier in his blog that his paternal side is completely from Pontus (Northeast Asia Minor)? As to his maternal side, it is completely from the Balkans but I don't know from which part.

As to the non-Caucasoid admixture, he needs to try much more detailed tests like the 23andMe ancestry test instead of very simple and rough ones like PopAffiliator to learn whether he really has non-Caucasoid admixture or not and how much he has.

onur said...

Even his surname is Pontikos (=from Pontus) BTW.

onur said...

Karamanli

Which Karamanli do you mean? If you mean the famous Turkish-speaking Karamanli Christians, I should note that they were all Greeks who, over time, adopted Turkish as their mother tongue.

Dienekes said...

Even his surname is Pontikos (=from Pontus) BTW.

Correct. Also, why don't you open Notepad and combine your comments into one, instead of breaking the multiple-comment rule once again?

Spy said...

"The fora is as dumb as you can get. Americans!" saith Ponto.
In the fora, at least those Americans deploy complete subject-verb agreement. :)

Ponto, my interest in Sicilians has to do with their subracial breakdown, not their macroracial assignment. A recent paper had 37% of the male-line Sicilian ancestry as Greek. DNATribes and the defunct EurasianDNA both split off what I would call a Byzantine Cluster that stretches from Southern Italy through Asia Minor. Naturally, I expect EuroDNACalc's southeastern readings to peak in Greece.

Onur, thanks for your note about the Karamanlides. Turkification of Greek Christians is indeed a more likely explanation for their origin than resettled Turkish converts, but as long as we're speculating, why not source Asianness to a variety of possible sources? By the way, I don't see the Pontus as so out of the way. You could get Ukrainian or Laz genes that way!
As to Dienekes' regions and surname, I always took those to be noms de plume. But if they're chosen to identify his local ancestry, then he's half Spartan!

Dienekes said...

I don't see the Pontus as so out of the way.

The Pontus region occupies about half the area of Greece, so there's plenty of variety there.

But if they're chosen to identify his local ancestry, then he's half Spartan!

Sparta is not out of the way.

princenuadha said...

Wow, I didn't know that Greek colonies survived that long outside of modern Greece and the Aegean. I thought that might also be the case in Cyprus and parts of southern Italy. I know Greek colonies got absorbed almost everywhere else. I just looked up Pontus and, man, it's pretty.

I just checked out 23andMe and I feel quite iffy about it. How much of their research is based on consumer-initiated testing and self-descriptions given online? And what's up with them using what seems to be 3 races plus intermediates... it just looks sloppy to group NA, Chinese and Australians together. The other problem is that I couldn't get much detail about what descriptions you get back.

How useful would it be for me? I should be nearly all Western European, from multiple nations. I guess that's good because I'm not really from intermediate areas (making the admixture more meaningful?), and I might be varied enough to make it interesting. I really want a result showing some "mixed" Western European heritage, something like: I'm likely part Scottish, English, and Swiss (or, I suppose, British Isles and Central European). Or would it just average those into, say, Belgian? Also, I have a relative saying they are exactly 5 generations away from an NA ancestor. Can 23andMe test that claim reliably, as they say?

Lastly, how does 23andMe compare to DNA Tribes? I have a better idea of how they work.

onur said...

Dieneke, I forgot to mention Leucosyri as one of the ancestral peoples (maybe the most important one) to Pontian Greeks.

Correction for "Tsan": Tzan

Spy: why not source Asianness to a variety of possible sources?

Spy, what "Asianness" are you talking about? If you are referring to Dienekes' PopAffiliator result: as Dienekes and I have pointed out above, PopAffiliator is a very poor tool for detecting admixture, so his result doesn't tell us anything clear about Mongoloid or Negroid admixture, and he may very well be 100% Caucasoid. 23andMe, on the other hand, using hundreds of thousands of SNPs instead of only the 17 (yes, only 17!) STRs of PopAffiliator, is far better at detecting real or close-to-real admixture values.

Dienekes said...

Dieneke, I forgot to mention Leucosyri as one of the ancestral peoples (maybe the most important one) to Pontian Greeks.

How can you intuit that the most important ancestral people to Pontian Greeks is a people that exists as a footnote or two in an ancient text?

Actually Strabo says that the Greeks called the Leucosyri Cappadocians, so they were probably not a particularly strong element in the Pontus.

Dienekes said...

onur, I've warned you several times not to post multiple times, and now you've sent another 4 comments.

As long as you don't respect the rules, you can't post here.

onur said...

Sorry, Dieneke, as I said, I made a serious mistake while reading your post, so I had to repost.

So I will post my previous comment in its last form:

Dieneke, I said maybe, didn't make a definite judgment. Besides, you didn't publish my previous comment, which was mentioning peoples ancestral to Pontian Greeks other than Leucosyri. So I will reframe my previous comment in its updated form:

Spy: By the way, I don't see the Pontus as so out of the way. You could get Ukrainian or Laz genes that way!

Pontian Greeks are essentially Hellenized/Romanized (in Ancient and Byzantine times) Leucosyri, Tzans, Laz, Georgians and other indigenous peoples of the Pontus area with a minor contribution of Greek colonizers (Ancient and Byzantine-era) mostly in the coastal areas.

Spy: I've discovered likely Venetian and Vlach ancestry no one in my family knew about. It's fun!

What are your ethnicity and location?

Ponto said...

When people try to understand their ancestry going back some hundreds to thousands of years, or the ancestry of certain nationalities like Italians or Greeks or Spaniards, they should leave the history books in the library and just go with the data that comes from DNA testing, combined with evidence from archaeological digs. Someone said history is bunk; at least most of it is distorted and speculative. With present-day Jews, the Bible is used out of its religious context as some sort of reference. No one has proven the Jews have any connection to any Biblical folk or that they come from the Levantine region. DNA shows present Jews to be essentially Southeast Europeans, hard to distinguish from Sicilian Italians, Southern peninsula Italians, Anatolians and others in the eastern basin of the Mediterranean Sea. They could be from the Levant or Southern Italy or Anatolia or the Caucasus region. All my SNP results show is that I come from where I was born; it is a GPS indicator and very accurate. The results haven't told me who my ancestors were ethnically or racially, other than West Eurasians.

The point is: stop speculating about the origins of certain people unless you have proof. Don't just quote the Bible or use schoolbook history about Romans or Phoenicians to try to prove your points.

It is interesting that FTDNA have admitted, in their information about their Population Finder, that with their small battery of SNPs they cannot distinguish Jews from Southern Italians from Anatolians from Greeks from Levantines. Their PF is not very sophisticated. This is from the firm that employs that Behar man who did a recent study on Jews! Makes his recent study utter rubbish.

Spy said...

Nuadha asks if 23andme locates one's ancestry down to the level of admixture of ethnies. No. Ancestry Painting calculates macroracial admixture and shows you on which chromosomes such components lie, whereas Global Similarity—Advanced reduces your allele frequencies to two composite variables correlating with west-east and north-south migration, and then places you on a graph with those two axes. So, you should find yourself clustering with a geographically intermediate country, usually.
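Spy's description of Global Similarity (two composite axes summarizing allele data, with each customer placed on the resulting graph) fits principal component analysis, though 23andMe's exact procedure isn't given here. A generic pure-Python sketch, assuming genotypes coded 0/1/2 and a small hypothetical two-group reference panel: fit the leading axes by power iteration on the SNP covariance matrix, then project individuals onto them:

```python
import random

def column_means(rows):
    """Mean genotype per SNP across the rows (individuals)."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]

def covariance(rows, means):
    """SNP-by-SNP sample covariance of the centered genotype matrix."""
    n, m = len(rows), len(means)
    centered = [[r[j] - means[j] for j in range(m)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(m)]
            for i in range(m)]

def top_eigenvectors(matrix, k=2, iters=300, seed=0):
    """Leading k unit eigenvectors via power iteration with deflation."""
    rng = random.Random(seed)
    m = len(matrix)
    a = [row[:] for row in matrix]
    vectors = []
    for _ in range(k):
        v = [rng.random() for _ in range(m)]
        for _ in range(iters):
            w = [sum(a[i][j] * v[j] for j in range(m)) for i in range(m)]
            norm = sum(x * x for x in w) ** 0.5
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(a[i][j] * v[j] for j in range(m)) for i in range(m))
        # Remove the found component so the next pass converges to the next axis.
        a = [[a[i][j] - lam * v[i] * v[j] for j in range(m)] for i in range(m)]
        vectors.append(v)
    return vectors

def project(genotype, means, axes):
    """Coordinates of one individual on the reference panel's principal axes."""
    centered = [g - mu for g, mu in zip(genotype, means)]
    return [sum(c * x for c, x in zip(centered, axis)) for axis in axes]

# Hypothetical reference panel: two groups with contrasting allele frequencies at 8 SNPs.
rng = random.Random(42)
p_a = [0.1, 0.8, 0.2, 0.9, 0.3, 0.7, 0.15, 0.85]
p_b = [0.9, 0.2, 0.8, 0.1, 0.7, 0.3, 0.85, 0.15]

def draw(freqs):
    """Simulate 0/1/2 genotypes, two independent allele draws per SNP."""
    return [sum(rng.random() < f for _ in range(2)) for f in freqs]

panel_a = [draw(p_a) for _ in range(15)]
panel_b = [draw(p_b) for _ in range(15)]
means = column_means(panel_a + panel_b)
axes = top_eigenvectors(covariance(panel_a + panel_b, means))
print("group A centroid:", project(column_means(panel_a), means, axes))
print("group B centroid:", project(column_means(panel_b), means, axes))
print("panel midpoint:  ", project(means, means, axes))
```

The two group centroids fall on opposite sides of the first axis, and an individual at the panel midpoint projects to the origin, which is why someone of mixed or intermediate ancestry tends to plot between the source populations.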

Your buddy could be mistaken about his putative NA ancestry. If his genome shows NO Asian admixture, then he surely has no such ancestor a mere five generations back.
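The arithmetic behind this: on average a person inherits half of each parent's autosomal DNA, so the expected share from one particular ancestor halves with every generation. A minimal sketch, assuming "five generations away" means a great-great-great-grandparent (parent = 1 generation) and treating the ancestor's own Native American fraction as a parameter; the realized share scatters around this expectation because of recombination:

```python
def expected_share(generations, ancestor_fraction=1.0):
    """Expected autosomal fraction from one ancestor `generations` back (parent = 1).

    ancestor_fraction is the share of the relevant ancestry the ancestor carried
    (1.0 = unmixed). This is an average; actual inheritance varies around it.
    """
    return ancestor_fraction * 0.5 ** generations

for g in range(1, 7):
    print(g, "generations back:", f"{expected_share(g):.2%}")
```

So an unmixed NA ancestor five generations back would be expected to contribute about 3% of the genome; if that ancestor was already mixed, the expectation scales down in proportion.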

You can also run freeware on your Raw Data File for ancestry (think EuroDNACalc) and health-related applications.

Dienekes, I agree Sparta is not out of the way. Other than Arcadia, West Thessaly sounds like a good candidate for a non-coastal redoubt west of the Aegean.

Onur, one of the 23andme features is called 'Ancestry Finder'. It allows you to see which customers have 'half-identical runs' with you. So, for example, on the X chromosome I have HIRs of at least 5 cM with a Romanian. I have at least five other Romanians appearing on my autosomes who must also be distant cousins. Conclusion? Vlachs on my mother's side.
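Half-identical runs rest on a simple observation: over a stretch where two people share at least one allele at every SNP, they may share a chromosome segment, and the clearest evidence against sharing is an opposite homozygote (one person AA, the other BB). A simplified sketch, assuming genotypes coded as copies of one allele (0/1/2) and known genetic-map positions in cM; 23andMe's actual matching rules (error tolerance, minimum SNP counts, and so on) aren't described here, so both thresholds are hypothetical:

```python
def half_identical_runs(geno_a, geno_b, positions_cm, min_cm=5.0, min_snps=3):
    """Return (start_cM, end_cM) for runs with no opposite homozygotes (0 vs 2)."""
    runs = []
    start = last = None
    for i, (a, b) in enumerate(zip(geno_a, geno_b)):
        if {a, b} == {0, 2}:  # opposite homozygotes: no shared allele, run breaks here
            if start is not None:
                runs.append((start, last))
            start = None
        else:
            if start is None:
                start = i
            last = i
    if start is not None:
        runs.append((start, last))
    # Keep only runs that are long enough in both genetic distance and SNP count.
    return [
        (positions_cm[s], positions_cm[e])
        for s, e in runs
        if positions_cm[e] - positions_cm[s] >= min_cm and e - s + 1 >= min_snps
    ]

# Hypothetical toy data: 8 SNPs spaced 1.5 cM apart.
positions = [0.0, 1.5, 3.0, 4.5, 6.0, 7.5, 9.0, 10.5]
geno_a = [0, 1, 2, 2, 1, 0, 2, 1]
geno_b = [1, 1, 2, 1, 1, 0, 0, 1]
print(half_identical_runs(geno_a, geno_b, positions))
```

The opposite homozygote at the seventh SNP ends the first run, which spans 7.5 cM and passes both thresholds; the single SNP after it is too short to count.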

I also wrote to a Greek who shares my mother's maiden name. He informed me that he had traced it to Souli by way of Venezia. That plausibility was enhanced by at least two historical events, by etymology, and by my having two distant Brazilian cousins in Ancestry Finder whose public profiles, luckily for me, listed as their common mitochondrial ancestor a Venetian lady (d. 1840). (Otherwise, they were Portuguese and Amerindian.)

About Asianness: In my experience, when you compare classifiers against admixture calculators, the admixture percentage for minor components is greater than the classifier likelihood, which makes sense. I had a friend (and distant relative according to 23andme!) mistake EuroDNACalc1.0 for 1.1 and tell me that he was "98.7% Northwestern". Wait a minute! You only get whole numbers from that. EDC1.0 is saying that IF he had a single origin THEN it's 98.7% likely to be NW. He downloaded and ran EDC1.1 and obtained a Maximum Likelihood Admixture of 85% NW/15% SE.
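The distinction between a classifier and an admixture calculator can be made concrete. A classifier that assumes a single origin compares only "all NW" against "all SE", so with many markers its probability for the better-fitting group approaches 1, whereas an admixture estimate searches for the best mixing fraction in between. A sketch under simplifying assumptions (hypothetical biallelic loci with random allele frequencies for two source groups labeled NW and SE, a simulated individual who is 85% NW, equal priors); it is not EuroDNACalc's actual code:

```python
import math
import random

def log_likelihood(genotype, p_nw, p_se, q):
    """Log-likelihood of allele counts when a fraction q of ancestry is NW."""
    total = 0.0
    for g, a, b in zip(genotype, p_nw, p_se):
        f = q * a + (1 - q) * b
        total += g * math.log(f) + (2 - g) * math.log(1 - f)
    return total

def classifier_posterior_nw(genotype, p_nw, p_se):
    """P(NW | single origin) with equal priors: compares q = 1 against q = 0 only."""
    diff = log_likelihood(genotype, p_nw, p_se, 0.0) - log_likelihood(genotype, p_nw, p_se, 1.0)
    if diff >= 0:  # numerically stable logistic of -diff
        z = math.exp(-diff)
        return z / (1.0 + z)
    return 1.0 / (1.0 + math.exp(diff))

def admixture_estimate_nw(genotype, p_nw, p_se, steps=100):
    """Maximum-likelihood NW fraction over a grid of q values."""
    return max((i / steps for i in range(steps + 1)),
               key=lambda q: log_likelihood(genotype, p_nw, p_se, q))

rng = random.Random(7)
n_loci, true_q = 300, 0.85
p_nw = [rng.uniform(0.05, 0.95) for _ in range(n_loci)]  # hypothetical frequencies
p_se = [rng.uniform(0.05, 0.95) for _ in range(n_loci)]
genotype = [sum(rng.random() < true_q * a + (1 - true_q) * b for _ in range(2))
            for a, b in zip(p_nw, p_se)]

post = classifier_posterior_nw(genotype, p_nw, p_se)
q_hat = admixture_estimate_nw(genotype, p_nw, p_se)
print("classifier P(NW):", round(post, 4))
print("admixture NW fraction:", q_hat)
```

The classifier's probability for the minor (SE) source ends up far below the admixture estimate of the SE fraction, which is the pattern described in the comment.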

As for my own ethnicity and regions, I'm an all-Greek American with ancestry from areas fully redeemed in 1832.

onur said...

In my experience, when you compare classifiers against admixture calculators, the admixture percentage for minor components is greater than the classifier likelihood, which makes sense. I had a friend (and distant relative according to 23andme!) mistake EuroDNACalc1.0 for 1.1 and tell me that he was "98.7% Northwestern". Wait a minute! You only get whole numbers from that. EDC1.0 is saying that IF he had a single origin THEN it's 98.7% likely to be NW. He downloaded and ran EDC1.1 and obtained a Maximum Likelihood Admixture of 85% NW/15% SE.

EuroDNACalcuses hundreds of markers, but PopAffiliator uses only 17! Such a small number of markers, no matter how carefully chosen, should always be viewed with suspicion. Maybe one day Dieneke decides to take a much more detailed test like 23andMe and removes most of the doubts.

Marnie said...

" Conclusion? Vlachs on my mothers side."

Spy, you might want to consider that your mom's ancestors were Saracaciani (that's what the Sarakatsani are called in Romania). Or maybe a Sarakatsani/Vlach mix.

Here's a map of Greece showing the current areas for Vlach and Sarakatsani populations:
http://www.eliznik.org.uk/RomaniaHistory/maps/Sarakatsani-map.htm

The map is posted on a nice website that discusses the Vlach populations of the Balkans.

onur said...

http://www.eliznik.org.uk/RomaniaHistory/maps/Sarakatsani-map.htm

Was nomadism so widespread in Greece before modernization?

Typo correction for "EuroDNACalcuses" in my previous post: EuroDNACalc uses

Spy said...

Thanks, Marnie. It looks like Vlachs outnumber Sarakatsans by a large ratio. (2M to 80K) So, for now, I'm just going with the more likely population.

What happened to Sarakatsans in Romania? Wikipedia has 80k Sarakatsans in Greece and 4k in Bulgaria, with no mention of Romania.

Marnie said...

Spy:

"What happened to Sarakatsans in Romania? Wikipedia has 80k Sarakatsans in Greece and 4k in Bulgaria, with no mention of Romania."

Don't know. Here's this link:

http://ro.wikipedia.org/wiki/Sărăcăciani

I haven't run it through Google Translate, but Google Translate apparently handles Romanian.

Here's a fun collection of village phrases from the KOZANH Facebook group (Discussions, "xoriatka phrases"). A few of them are latiniko/vlahiko (Vlach):

http://www.facebook.com/topic.php?uid=2209228234&topic=4617

Marnie said...

Spy:

One other comment regarding the current population of Sarakatsans is that I'm pretty sure their current official number is low simply because Sarakatsans gradually abandoned their traditional way of life. Shepherding hasn't been profitable as a business for several generations. Many of the areas of traditional shepherding have also experienced significant outmigration of people.

So the current population of Sarakatsans is much lower than their traditional numbers of even two or three generations ago.

Vlachs, on the other hand, may have hung onto their ethnic designation in greater number by way of the fact that they speak a different language.