October 09, 2010

Utah Whites vs. Tuscans with 1.5 million SNPs

This time I took the full ~1.5M SNPs and ran ADMIXTURE on two HapMap-3 European populations, CEU Utah Whites and Tuscan Italians. Even with the individuals sorted by their ancestral proportions, the cutoff is quite obvious, and the separation is perfect.

CEU individuals are 93.7% in the blue cluster, and TSI individuals are 94.9% in the red one.

The most red Utah White has 31.4% "southern" ancestry, while the most blue Tuscan has 21.2% "northern" ancestry, i.e., 78.8% "southern". Thus, the cutoff "jump" between the two populations, at the middle of the figure, is 78.8% - 31.4% = 47.4%.
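The arithmetic behind the "jump" can be checked with a short sketch. The function and variable names are made up for illustration; only the two percentages come from the post.

```python
# Figures quoted in the post, in percent.
most_red_ceu_southern = 31.4   # highest "southern" share among Utah Whites
most_blue_tsi_northern = 21.2  # highest "northern" share among Tuscans


def cutoff_jump(max_southern_in_north_pop, max_northern_in_south_pop):
    """Gap in "southern" ancestry between the two closest individuals
    on either side of the population boundary (K=2, shares sum to 100)."""
    min_southern_in_south_pop = 100.0 - max_northern_in_south_pop
    return min_southern_in_south_pop - max_southern_in_north_pop


jump = cutoff_jump(most_red_ceu_southern, most_blue_tsi_northern)
print(f"cutoff jump: {jump:.1f}%")
```

With two clusters, each individual's shares sum to 100%, so the most blue Tuscan's 21.2% "northern" share is converted to a "southern" share before the two are subtracted.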

The standard deviation of the "southern" component among CEU individuals is 6.9%, and the standard deviation of the "northern" component among Tuscans is 5.7%.
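As a minimal sketch of how such per-population summaries could be computed from a table of per-individual ancestry fractions (such as the Q matrix ADMIXTURE outputs): the data below are toy values, not the HapMap results, and the helper name and the choice of the sample standard deviation are assumptions, since the post does not say which estimator was used.

```python
import statistics


def summarize_component(q_rows, labels, population, component):
    """Mean and sample standard deviation (in percent) of one ancestry
    component among the individuals carrying a given population label."""
    values = [
        100.0 * row[component]
        for row, label in zip(q_rows, labels)
        if label == population
    ]
    return statistics.mean(values), statistics.stdev(values)


# Toy Q matrix: column 0 = "northern" (blue), column 1 = "southern" (red).
q_rows = [(0.95, 0.05), (0.90, 0.10), (0.85, 0.15), (0.10, 0.90), (0.04, 0.96)]
labels = ["CEU", "CEU", "CEU", "TSI", "TSI"]

mean_south, sd_south = summarize_component(q_rows, labels, "CEU", 1)
print(f"CEU southern component: mean {mean_south:.1f}%, sd {sd_south:.1f}%")
```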

How real is this "admixture"?

As I have mentioned before on this blog, apparent "mixedness" between populations decreases as the number of markers increases. Thus, the question arises: is the apparent "admixture" between Tuscans and Utahns a real effect of individuals diverging toward a population other than their own, or an artefact of a limited number of markers?

We should note that increasing the number of markers has diminishing returns: most new markers are in linkage disequilibrium with existing markers, and hence provide little additional information. Going from 10 to 110 markers has a huge effect, but going from 1000 to 1100 has a trivial one.
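A rough illustration of the 10-to-110 versus 1000-to-1100 comparison, setting linkage disequilibrium aside: if every marker carried independent information (an assumption made here purely for illustration, and one that does not hold for dense SNP sets), the sampling error of an averaged estimate would shrink roughly as 1/sqrt(n). The function name is hypothetical.

```python
import math


def error_shrink_factor(n_before, n_after):
    """Factor by which a 1/sqrt(n) sampling error shrinks when the marker
    count grows from n_before to n_after.

    Assumes independent markers (illustration only); with linkage
    disequilibrium the real gain would be smaller.
    """
    return math.sqrt(n_after / n_before)


small_panel = error_shrink_factor(10, 110)     # adding 100 markers to 10
large_panel = error_shrink_factor(1000, 1100)  # adding 100 markers to 1000

print(f"10 -> 110 markers: error shrinks {small_panel:.2f}x")
print(f"1000 -> 1100 markers: error shrinks {large_panel:.2f}x")
```

Even under this optimistic independence assumption, the same 100 extra markers matter far less once the panel is already large.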

To study this question, I took a random 1/5 sample of the markers, or about 300K SNPs, and repeated the ADMIXTURE run:
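The post does not say which tool drew the random sample, so the following is only a sketch of the idea: pick one fifth of the marker IDs at random, keeping their original order. The marker IDs, function name, and seed are hypothetical.

```python
import random


def sample_markers(marker_ids, fraction=0.2, seed=0):
    """Return a random subset of marker_ids of the given fraction,
    preserving the original marker order."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    k = round(len(marker_ids) * fraction)
    chosen = set(rng.sample(marker_ids, k))
    return [m for m in marker_ids if m in chosen]


# Hypothetical marker list standing in for the ~1.5M SNP IDs.
markers = [f"rs{i}" for i in range(1, 1501)]
subset = sample_markers(markers)
print(len(subset), "of", len(markers), "markers kept")
```

The resulting list could then be used to restrict the genotype data to the sampled markers before rerunning the analysis.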

Now, CEU are 93.9% in the blue cluster (vs. 93.7% in the 1.5M run) and the standard deviation of the red component in CEU individuals is 6.8% (vs. 6.9% in the 1.5M run).

Tuscans are 94.0% in the red cluster (vs. 94.9% in the 1.5M run) and the standard deviation of the blue component in TSI individuals is 6.5% (vs. 5.7% in the 1.5M run).

The conclusion is obvious: the 5-fold increase in markers from 300K to 1.5M had no noticeable effect on the apparent mixedness of populations and individuals.
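The comparison between the two runs can be laid out side by side. The numbers below are the ones quoted above; the dictionary labels are chosen here for illustration.

```python
# Figures quoted in the post, in percent: (300K run, 1.5M run).
runs = {
    "CEU share in blue cluster": (93.9, 93.7),
    "CEU spread of red component": (6.8, 6.9),
    "TSI share in red cluster": (94.0, 94.9),
    "TSI spread of blue component": (6.5, 5.7),
}

# Change in percentage points when going from 300K to 1.5M markers.
changes = {name: round(full - sub, 1) for name, (sub, full) in runs.items()}
largest_change = max(abs(delta) for delta in changes.values())

for name, delta in changes.items():
    print(f"{name}: {delta:+.1f} points from 300K to 1.5M")
print(f"largest absolute change: {largest_change:.1f} points")
```

Every quantity moves by less than one percentage point, which is small next to the spread of roughly 6 to 7 points among individuals within each population.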

10 comments:

Marnie said...

Nice.

On this dataset, how about 60K markers and 12K markers?

Anonymous said...

My guess is that given two populations and running Admixture or Structure, the program will separate the two populations. For example North Germans and South Germans, however the separation into two parts per population may be identical or nearly so. It would not matter too much how many markers you used in the run.

Can't wait until EuroDNACalc mark II comes out so I can play around with my SNP data and see what the run produces.

Andrew Oh-Willeke said...

It is worth noting, of course, that this is not a case of admixture in the ordinary sense of the word.

There are two genetically distinct populations, with a certain amount of commonality due to common ancestry in Europe. But, the commonalities arose in a time period well before anyone but Native Americans lived in Utah (or earlier) and before Tuscany was a socially or politically meaningful entity. In all likelihood, not more than one or two people, and probably no one in the sample, has roots that involve actual admixture of Tuscans and Whites from Utah at any point where either community actually existed.

One suspects one is seeing basically a distinction between LBK Neolithic populations (modified by subsequent invasions from the East), and Mediterranean/Atlantic populations, with the split rooted somewhere in the North Levant or Greece or the Balkans around 6,000 BCE that make their way to Utah through the fact that the Protestant Reformation split along North-South lines with Mormons mostly coming from the Protestant side of the line and Tuscans coming from the Catholic side of the line.

Average Joe said...

These results confirm once again that North Europeans and Italians aren't different races but clinal parts of the same race (Caucasoid).

How can you conclude that? Over 90% of Utah whites are in the blue cluster while over 90% of Tuscans are in the red cluster. In other words, most members of each group are genetically distinct from the other group.

Onur Dincer said...

How can you conclude that? Over 90% of Utah whites are in the blue cluster while over 90% of Tuscans are in the red cluster. In other words, most members of each group are genetically distinct from the other group.

Average, the intra-Caucasoid cline would be much more clear if intermediate populations like the French and North Italians were included in the analysis.

Fanty said...

I would also say, it would do it with any 2 populations.

It would do that with 2 families from the same population even.

The software is meant to find Clusters and patterns. So it WILL find em. No matter what.

I would think that software like this, with the right number of patterns, would do this with your father's and mother's families.

One family the red and one family the blue cluster. With yourself among the admixed ones between those "races" ;-)

We should not forget that there are NO different RACES in the one human race.

At least not anymore. Only one human race survived.

The Neandertal people are a human RACE.

Onur Dincer said...

We should not forget that there are NO different RACES in the one human race.

At least not anymore. Only one human race survived.

The Neandertal people are a human RACE.


Neanderthals were more than one race (=subspecies), as they had, like us, more than one race, so calling them one race is an oversimplification at best. We too have more than one race and this isn't something to be afraid of or suppress.

Onur Dincer said...

Fanty, today, with a large enough number of markers, two nearby villages can be completely or nearly completely genetically distinguished from each other. But that doesn't make them two separate races because, as I have said many times, races (=subspecies) are much more than genetic clusters; for instance, morphology is also a very important distinguishing factor in defining races.

princenuadha said...

Onur, what have you thought of my definition (of race) so far?

Onur Dincer said...

Onur, what have you thought of my definition (of race) so far?

Nuadha, my confidence in your definition has somewhat increased since our last talk about it. But I am still open to other definitions of race (=subspecies).

For those who don't know, here is your definition:

"A race is a cluster on a global scale which cannot be recreated from other clusters."

And these were your examples to clarify it:

"Mestizos are an example of a cluster that could be recreated by a combination of other clusters (mostly Spanish and NA).

Caucasians are a global cluster which cannot be recreated using other clusters; so Caucasian is a race."

For sources and more details, see our discussion at:

http://dienekes.blogspot.com/2010/08/rare-genomic-look-at-aboriginal.html