September 22, 2010

Using ADMIXTURE on the Xing et al. (2010) dataset

This is the result of running ADMIXTURE on 246,554 SNPs / 850 individuals / 40 populations from the Xing et al. (2010) dataset.

Below are the admixture proportions for the 40 populations:


The clusters can be loosely labeled as:

A: South Asian
B: Altaic
C: Irula (S Indian tribals)
D: !Kung (Khoisan speakers from Africa)
E: Sub-Saharan
F: Polynesian
G: Southeast Asian
H: Pygmy
I: West Asian
J: European
K: Amerindian
L: Northeast Asian
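
For readers who want to turn ADMIXTURE's per-individual output into population-level proportions like the ones above, here is a minimal sketch rather than the exact procedure used for this post. It assumes the standard .Q output layout (one line per individual, K whitespace-separated fractions) and a separate list of population labels in the same order as the individuals. The function names and the toy numbers are made up for illustration. Note that the column order in a .Q file is arbitrary, so letters such as A to L have to be assigned by inspecting which populations peak in which column.

```python
from collections import defaultdict


def parse_q(text):
    """Parse .Q-style text: one individual per line, K whitespace-separated fractions."""
    return [[float(x) for x in line.split()] for line in text.splitlines() if line.strip()]


def population_means(q_rows, pop_labels):
    """Average the per-individual ancestry fractions within each population."""
    if len(q_rows) != len(pop_labels):
        raise ValueError("need exactly one population label per individual")
    sums = {}
    counts = defaultdict(int)
    for row, pop in zip(q_rows, pop_labels):
        if pop not in sums:
            sums[pop] = [0.0] * len(row)
        # Accumulate column-wise totals for this population.
        sums[pop] = [s + x for s, x in zip(sums[pop], row)]
        counts[pop] += 1
    return {pop: [s / counts[pop] for s in total] for pop, total in sums.items()}


# Toy input with K=3 and invented values (not numbers from this run).
toy_q = parse_q("0.90 0.10 0.00\n0.70 0.30 0.00\n0.00 0.20 0.80\n")
toy_pops = ["PopX", "PopX", "PopY"]  # hypothetical labels, one per individual

means = population_means(toy_q, toy_pops)
for pop, fracs in means.items():
    print(pop, " ".join(f"{100 * f:.1f}%" for f in fracs))
```

Averaging within populations hides individual variation, so a population mean of a few tenths of a percent can come from a single individual or from noise spread across everyone.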

24 comments:

TwoYaks said...

Have you tried breaking out the clusters and running them separately (à la Vähä et al. 2007)? I've suspected that ADMIXTURE sometimes gets stuck on high-level clusters, missing subdivision...

Maju said...

That is an interesting piece of research, thanks.

Some points of interest I see:

1. "Mixed" South Asians, such as Pakistanis or Dravidian Brahmins (known to have arrived from the North), show a strong West Asian component (21-45%) but only a tiny European component (0.4-3%), which does not support any meaningful IE flow from Europe. It does, however, support a strong Neolithic flow from West Asia.

2. Southern African Bantus show high levels (28-39%) of the Khoisan component. This anyhow should be studied in an African-only frame as there may be other hidden components such as the one found (very strong) in Mozambicans (Sikora 2010 but previously also detected by Patin 2009, without emphasizing it because the paper is about Pygmies).

3. The only Mongol representatives, the Buryats, show a negligible (0.5%) European component, unlike what you have claimed in the past.

4. Chinese (but not Japanese) show a strong SE Asian component (17-30%), much as SE Asians show a strong Mid-East Asian component (14-48%). This is not at all suggestive of any unidirectional flow in mainland East Asia but rather of diffuse, generalized bidirectional flows. Curiously, Austroasiatics look much more "Chinese/Japanese" than Thais, who are sometimes claimed to have arrived from the North. Again, this is probably best understood in an East Asian-only analysis.

5. Tuscans appear as 34% West Asian but, given the lack of other South European samples, this affinity may well be sample-biased.

6. Genuine Native Americans (Totonac) appear unmixed (98% in their own cluster). In the past you have sometimes claimed that they are admixed. Mixed Native Americans (Bolivians) still have 88% of NA blood (the rest being West Eurasian and, considering the levels of West Asian component in this fraction, potentially supportive of my doubts about the Tuscans being properly described).

Onur Dincer said...

Luis, have you read this new paper:

"W. S. Watkins et al. Admixture in New World populations: an analysis of Y-chromosome, mtDNA, and genome-wide microarray data

The first major interaction between Native Americans and Europeans is documented historically and occurred less than 550 years ago. This recent time frame provides an excellent opportunity to investigate the effects of admixture between two populations that were previously separated for hundreds of generations. To characterize European admixture in Native American populations, we sampled and analyzed a group of isolated Totonac agriculturists from tropical Mexico near Veracruz and a group of native Bolivians predominantly from the mountainous region near La Paz, Bolivia. Mitochondrial sequencing of HVS1 showed that all samples had pre-Columbian mtDNA haplogroups (A, B, C, and D). Using a panel of 48 STRs or 12 Y-chromosome SNPs, Totonac Y-chromosome lineages were all assigned to the pre-Columbian haplogroup Q1a3a, and Bolivian Y-chromosome lineages were assigned to haplogroups Q1a3a, R1, and J2. Haplogroups R1 and J2 are common in European populations. Principal components analysis (PCA) using >800K autosomal SNPs typed in 24 Totonacs and 23 Bolivians showed that all Totonacs and 14 Bolivians clustered distinctly from Eurasian individuals. Nine Bolivians, however, were positioned between the New World and European PCA clusters. Admixture analysis showed that these nine samples had 21 - 33% European admixture using a European reference population. All three observed Y-chromosome haplogroups, including the well-studied pre-Columbian haplogroup Q1a3a, occurred in the admixed individuals. Two of the nine admixed individuals had pre-Columbian mtDNA and Y-chromosome haplogroups but 21-23% European ancestry. This result demonstrates that Y-chromosome and mtDNA haplogroups are only partial indicators of an individual’s complete ancestry."

It is in the American Society of Human Genetics 2010. Judging by the abstract, it seems to confirm our common conclusion that the Totonac are genetically almost pure Natives.

Anonymous said...

The Tuscans are north-central Italians and would not be representative of South Europeans in general. Samples from Greeks (particularly Cretan Greeks), Sicilian Italians and Southern Spanish would be a truer indicator of West Asian admixture in those Southern Europeans. It is likely to be higher than in the Tuscan group, except for the Spanish. The accepted belief is that the Neolithic farmers from Anatolia settled densely in the Balkans, then moved into Central Europe via river systems and, sporadically, around the Mediterranean Basin by ship. There must be a higher West Asian component in the Balkans despite the movement of Slavic speakers into the Balkans in the 6th century.

Fanty said...

One needs to know what ethnicity the "Neolithic farmers" were in the first place.

Because at the moment there are only guesses in the dark.

One knows the mtDNA of Neolithic farmers. That's all.

And that mtDNA indicates that only about 5%-10% of Europeans have Neolithic farmer mtDNA.

The best way would be to have an autosomal DNA profile of Neolithic farmers, made from several hundred Neolithic farmer individuals.

princenuadha said...

Majuuuuuuuu your back dienekes just wasn't the same; I could tell he wanted a real challenge.

@your post

3) he said before that Mongolians represent the eastern limit of Caucasian (or, well, perhaps European), which is consistent with this data. Limit means it goes to and not beyond, which is the case here. European does not need to be in every Mongolian population. It needs only be in one, but not in any northeast Asians.

4. It does suggest bidirectional flow but I don't know what "diffuse generalized bidirectional flow" means... the admixtures in those countries vary greatly between one another (not very diffuse). That's not just semantics, the data fits with dienekes saying that the northeast Asian contributions in southeast Asia happened relatively recently.

As for the Tai vs Asiatics NEA composition I noticed that NEA admixture in southeast Asia seems to be a function of distance from China along the coast. Maybe that suggests migrations happened mostly along the coast.

6 ... the ratio of west Asian to west Eurasian in the columbians is about 1/4 which is not at all suspicious when considering both history and this data. That ratio is in-between Tuscans and northern Europeans; why would that be unusual?

princenuadha said...

One thing I notice is the parallel in the relative levels of altaic and NA in Europeans. From highest to lowest levels of either altaic or native American in European populations it goes Slovenia, NE and CEU, then Tuscans with 0 for both.

Considering the low levels of NA in altaic populations I think that the small component of NA in Europeans was not mediated by the altaics but instead represents a very old shared ancestry between NA and most Europeans (along with other Eurasian populations). A shared ancestry not in common with many other global populations. What is so intriguing is that the Tuscans don't have it, at least not measurably. One explanation would be that the West Asians settling Europe had even less NA than modern Europeans, and since Tuscans have more West Asian they have less NA. Another explanation would be that, to some extent, the north Europeans and south Europeans diverged a very long time ago.

Maju said...

"Majuuuuuuuu your back dienekes just wasn't the same; I could tell he wanted a real challenge".

LOL.

"he said before that Mongolians represent the eastern limit of Caucasian"

I'm sure he has stubbornly argued for European admixture on very weak data. I have argued that what he sees is not apparent when specifically Central Asian components show up in the structure.

"It does suggest bidirectional flow but I don't know what "diffuse generalized bidirectional flow" means..."

I'm not really sure either, but it probably means that the structure is weak and has been "trespassed" repeatedly in both directions. That there's never been a real barrier to gene flow even if both populations are somewhat distinct.

"That's not just semantics, the data fits with dienekes saying that the northeast Asian contributions in southeast Asia happened relatively recently".

I'm not even arguing against this, just pointing out that the flow is bidirectional. And, considering how these structure algorithms work, surely of similar age in both directions.

"As for the Tai vs Asiatics NEA composition I noticed that NEA admixture in southeast Asia seems to be a function of distance from China along the coast. Maybe that suggests migrations happened mostly along the coast".

Compare with the HUGO paper, which I pondered here and here. I suspect that the Austronesian-specific component is not showing up here (maybe at greater K depth?) and, when that happens, typically populations show up as a poorly defined mixture of their secondary components. The same is probably happening to Tuscans and the Iberian component in Bolivians.

Also, while Vietnamese might have some "Chinese" input, I have no reason to think that Cambodians are more "Chinese" than Thais, in light of all the data I have seen over time. If anything, the opposite should be true.

"... the ratio of west Asian to west Eurasian in the columbians is about 1/4 which is not at all suspicious when considering both history and this data. That ratio is in-between Tuscans and northern Europeans; why would that be unusual?"

Bolivians! Colombians, with 'O' (from Colón, as Columbus is spelled in Spanish), are another nation and are surely much more European in their overall ancestry.

See what I just said regarding Vietnamese and Cambodians and the misleading appearance of admixture when the local dominant component does not show up. Both Tuscans and the Iberian component in Bolivians are showing up as their minor components from North Europe and West Asia. They are not showing their real colors for lack of depth and/or small sample size.

...

"have you read this new paper..."

No.

"Judging by the abstract, it seems to confirm our common conclusion that the Totonac are genetically almost pure Natives".

Judging from what I have seen in more specialized studies too, most populations described as Native American or "Indian" are very pure. The main exception might be Mayas. Notice that there was a huge pressure in Spanish America to become a Mestizo, and if possible a white creole, because of racism (but a racism that was not apartheid as in Angloamerica but assimilationist), so those who keep their Native identity are generally very much genuine natives, while those who have a creole identity are often also natives to a large degree, sometimes a very striking degree, as happens with the undifferentiated Bolivians.

Onur Dincer said...

"One thing I notice is the parallel in the relative levels of altaic and NA in Europeans. From highest to lowest levels of either altaic or native American in European populations it goes Slovenia, NE and CEU, then Tuscans with 0 for both.

Considering the low levels of NA in altaic populations I think that the small component of NA in Europeans was not mediated by the altaics but instead represents a very old shared ancestry between NA and most Europeans (along with other Eurasian populations). A shared ancestry not in common with many other global populations. What is so intriguing is that the Tuscans don't have it, at least not measurably. One explanation would be that the West Asians settling Europe had even less NA than modern Europeans, and since Tuscans have more West Asian they have less NA. Another explanation would be that, to some extent, the north Europeans and south Europeans diverged a very long time ago."


No, that admixture looks more like originally northeast European rather than north as a whole, probably related to Uralics and/or Altaics. Its extremely minuscule presence in northwest Europeans can easily be explained with diffusion of NE European genes to NW Europe over the long term. And I am saying these not as a NW European, but as a Turk.

princenuadha said...

"No, that admixture looks more like originally northeast European rather than north as a whole, probably related to Uralics and/or Altaics. Its extremely minuscule presence in northwest Europeans can easily be explained with diffusion of NE European genes to NW Europe over the long term. And I am saying these not as a NW European, but as a Turk."

It sounds like your hypothesis is that some other (non-north European) population diffused NA into the Northern Europeans. However, that population would need to be at least equal parts NA and altaic, since northern Europeans have equal parts. Furthermore, if any contributions to Europe had more altaic than NA (perhaps Huns, Mongols, Scythians) then there must be a population with more NA than altaic to contribute to Europeans. You still think the Finns fit the bill?

I still think northern Europeans' greater NA comes from old divergence in north and south Europeans or old divergence in Europeans and west Asians.

princenuadha said...

While most of the small numbers seem reasonable and leading, some are just way off. What is with the hema having northeast Asian. Or the Japanese having. 4% jung. Or the Slovenians having. 3% sub-saharan African, more than the Tuscans.

Also, wasn't instant commenting working? Why go back?

Dienekes said...

"What is with the hema having northeast Asian."

0.1% is not really "having".

"Or the Japanese having. 4% jung."

That's 0.4% !Kung

"Or the Slovenians having. 3% sub-saharan African, more than the Tuscans."

That's 0.3% and I don't see why Tuscans should have more or less. In any case, both this and the ones you mentioned are within the typical imperfections of admixture analysis like this.

princenuadha said...

Yes, I meant .x% but messed up.

Is the 1/1000 place even meaningful (at face value)?

Onur Dincer said...

"then there must be a population with more NA than altaic to contribute to Europeans. You still think the Finns fit the bill?

I still think northern Europeans' greater NA comes from old divergence in north and south Europeans or old divergence in Europeans and west Asians."


There is no Uralic population tested to compare to. BTW, Urkarah, a NE Caucasian-speaking population, has ten times more "NA" component than "Altaic" component.

Anyway, I think Dienekes is right in saying that these are within the typical imperfections of ADMIXTURE analysis. We shouldn't make too much of such minuscule numbers, as such genetic analyses are imperfect. I think if more populations from the central Eurasian areas (including Uralic lands) had been tested, the "NA" component would appear in smaller amounts (maybe nil, at least in West Eurasia) in Eurasia.

Onur Dincer said...

"Is the 1/1000 place even meaningful (at face value)?"

I don't think so given the limitations and imperfections of this analysis due to the sampling strategy (this is the major problem I think), SNP choice and software.

terryt said...

"the data fits with dienekes saying that the northeast Asian contributions in southeast Asia happened relatively recently".

And has been generally accepted, except by some who've come recently to the subject.

"I suspect that the Austronesian-specific component is not showing up here"

Presumably would have to be part of the Polynesian base (F), although what is labelled such probably includes Austro-Asiatic as well. Interestingly what is labelled Polynesian shows as a very small proportion just about everywhere. Widespread in South Asia (A) and SE Asia (G), absent only in some Sub-Saharan and European populations, and not at all common in West Asia (I).

Onur Dincer said...

"Interestingly what is labelled Polynesian shows as a very small proportion just about everywhere."

As Dienekes and I explained above, very small proportions should be taken with a grain of salt (with a better analysis, most of them would probably show up nil).

Unknown said...

Terry wrote

"Presumably would have to be part of the Polynesian base (F), although what is labelled such probably includes Austro-Asiatic as well. Interestingly what is labelled Polynesian shows as a very small proportion just about everywhere"

terryt said...

"As Dienekes and I explained above, very small proportions should be taken with a grain of salt"

I certainly took the Polynesian admixture with a grain of salt. But others seem to be taking such small admixture levels seriously.

"while Vietnamese might have some 'Chinese' input, I have no reason to think that Cambodians are more 'Chinese' than Thais, in light of all the data I have seen over time. If anything, the opposite should be true".

The Vietnamese have more than 'some' Chinese input. They are nearly 50% Chinese, which is not surprising given the amount of Y-hap O through the region.

Onur Dincer said...

"But others seem to be taking such small admixture levels seriously."

In this blog or outside?

terryt said...

I've gone back through the comments and it certainly wasn't here. Must have got confused.

Corduene said...

Good research, but the lack of Kurdish samples from other regions, like Anatolia, where the large majority lives, makes it not really representative. The study is unfortunately only about Iraqi Kurds.