May 07, 2011

Beware of sample sizes: why Ancestral North Indians came from West Asia, not Eastern Europe

This is the final part of my Indian Trilogy. See also Part I and Part II.

In the previous two parts of my trilogy, I presented the evidence for the clear West Asian origin of the bulk of South Asian Caucasoid ancestry. One thing nagged me, however: Reich et al. (2009) had presented evidence, based on their 4-population test, that Ancestral North Indians (ANI) formed a clade with CEU (White Utahns of northern and western European ancestry, from the HapMap project) to the exclusion of the Adygei from the Caucasus. This seemed inconsistent with my theory, and I considered many potential solutions to the problem until I recently realized what was happening.

A good way to determine whether ANI is more similar to CEU or to Adygei is to calculate the first principal component (PC1) of variation between CEU and Adygei, and then project the Indian Cline samples onto it. Since these samples are composed of an Onge-like Ancestral South Indian (ASI) component, which acts as an outgroup, and a Caucasoid factor X, their position on the CEU-Adygei PC1 will be determined by whether X is more closely related to CEU or to Adygei.
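A minimal sketch of such a projection, in Python with NumPy. It assumes genotypes coded as 0/1/2 allele counts (individuals as rows, SNPs as columns) with no missing data, and rescales the axis so that the Adygei mean sits at 0 and the CEU mean at 1. The function names and the simulated panels are hypothetical; this illustrates the idea and is not the pipeline behind the figures in this post.

```python
import numpy as np

def pc1_axis(ref_a, ref_b):
    """Centering vector and unit PC1 loadings for the pooled reference panels."""
    ref = np.vstack([ref_a, ref_b]).astype(float)
    means = ref.mean(axis=0)
    # The first right singular vector of the centered matrix is the PC1 loading vector.
    _, _, vt = np.linalg.svd(ref - means, full_matrices=False)
    return means, vt[0]

def scaled_projection(samples, ref_a, ref_b):
    """Project samples onto PC1, rescaled so the ref_a mean is 0 and the ref_b mean is 1."""
    means, axis = pc1_axis(ref_a, ref_b)

    def project(g):
        return (np.asarray(g, dtype=float) - means) @ axis

    a0 = project(ref_a).mean()
    b1 = project(ref_b).mean()
    # Dividing by (b1 - a0) also cancels the arbitrary sign of the SVD axis.
    return (project(samples) - a0) / (b1 - a0)

# Toy data only: 0/1/2 genotypes simulated from made-up allele frequencies.
rng = np.random.default_rng(0)
n_snps = 1000
p_adygei = rng.uniform(0.1, 0.9, n_snps)
p_ceu = np.clip(p_adygei + rng.normal(0.0, 0.2, n_snps), 0.01, 0.99)
adygei = rng.binomial(2, p_adygei, size=(17, n_snps))
ceu = rng.binomial(2, p_ceu, size=(17, n_snps))
# A test group whose frequencies lie 70% of the way from "Adygei" to "CEU".
test = rng.binomial(2, 0.3 * p_adygei + 0.7 * p_ceu, size=(10, n_snps))
print(scaled_projection(test, adygei, ceu).mean())
```

On this scale a group near 0.5 is equidistant from the two references, while values toward 0 or 1 lean towards Adygei or CEU respectively.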

Here are the results:

Notice that many populations lie around the 0.5 mark between Adygei (0) and CEU (1), i.e., they are not particularly closer to one population than to the other. But a few of them, notably Pathans and Kashmiri Pandits, are closer to CEU than to Adygei.

Now, consider the following table (Note S3 Table 1) from Reich et al. (2009):

This is the evidence, based on the 4-population test, that CEU and ANI form a clade. Notice that it is based on comparisons with Pathans and Kashmiri Pandits, i.e., with the two groups that seem to deviate towards CEU in the PC1 projection. Indeed, only for the Pathans (the most CEU-like group) does the Z-score exceed 3, the threshold considered necessary for statistical significance. We can thus conclude that ANI does not, in general, form a clade with CEU. This may hold for the most CEU-like South Asian populations, but it is not generally true.
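For readers unfamiliar with the 4-population test, here is a simplified sketch of the kind of statistic involved: a per-SNP f4(A, B; C, D) = (pA - pB)(pC - pD), averaged over SNPs, with a Z-score from a delete-one-block jackknife. f4 is expected to be zero when (A, B) and (C, D) form clades relative to each other, and |Z| > 3 is the significance threshold discussed above. This is not Reich et al.'s implementation; as an assumption for brevity, the blocks here are just equal runs of consecutive SNPs rather than windows defined by genomic position, and the four populations are hypothetical.

```python
import numpy as np

def f4_per_snp(pa, pb, pc, pd):
    """Per-SNP f4(A, B; C, D) = (pA - pB) * (pC - pD), from allele frequencies."""
    return (np.asarray(pa, dtype=float) - pb) * (np.asarray(pc, dtype=float) - pd)

def jackknife_z(per_snp, n_blocks):
    """Return (mean, Z) where Z = mean / SE and SE is a delete-one-block jackknife.

    Simplification (assumption): blocks are equal runs of consecutive SNPs,
    not windows defined by genomic distance, and blocks are unweighted.
    """
    values = np.asarray(per_snp, dtype=float)
    blocks = np.array_split(values, n_blocks)
    estimate = values.mean()
    # Recompute the mean with each block left out in turn.
    loo = np.array([np.concatenate(blocks[:i] + blocks[i + 1:]).mean()
                    for i in range(n_blocks)])
    se = np.sqrt((n_blocks - 1) / n_blocks * np.sum((loo - loo.mean()) ** 2))
    return estimate, estimate / se

# Toy usage with made-up frequencies for four hypothetical populations.
rng = np.random.default_rng(0)
pa, pb, pc, pd = rng.uniform(0.05, 0.95, size=(4, 20000))
est, z = jackknife_z(f4_per_snp(pa, pb, pc, pd), n_blocks=100)
print(f"f4 = {est:.5f}, Z = {z:.2f}, |Z| > 3: {abs(z) > 3}")
```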

Now, we will see that it is not true even for the most CEU-like South Asian populations.

Clearly, the PC1 projections presented above hint at why the evidence for a CEU-ANI clade is stronger for Pathans and Kashmiri Pandits. But they seem to go against all the data I presented in my earlier two posts about the mainly West Asian origin of Ancestral North Indians. If that origin were correct, we would expect Ancestral North Indians to project closer to Adygei (0), rather than in the middle (0.5) or towards CEU (1).

I puzzled for a long time over why this was the case, considering, inter alia, that the Adygei were not good representatives of West Asians, that ASI was not a true outgroup, or that I was wrong. All of these explanations failed until I realized the true culprit: the uneven sample sizes of Adygei and CEU.

Let's repeat the PC1 projection, but using a 17-person random sample from CEU, so that Adygei and CEU have equal sample sizes.
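The equalizing step amounts to drawing individuals without replacement from the larger panel. A minimal sketch, assuming the same 0/1/2 genotype-matrix layout (individuals as rows); the 60-person "CEU" panel and the random genotypes are placeholders, not the real HapMap data, and `random_subsample` is a hypothetical helper.

```python
import numpy as np

def random_subsample(genotypes, n, seed=None):
    """Draw n individuals (rows) without replacement, keeping their original order."""
    genotypes = np.asarray(genotypes)
    if n > genotypes.shape[0]:
        raise ValueError("cannot draw more individuals than the panel contains")
    rng = np.random.default_rng(seed)
    rows = np.sort(rng.choice(genotypes.shape[0], size=n, replace=False))
    return genotypes[rows]

# Placeholder panels: random 0/1/2 genotypes, 200 SNPs each.
rng = np.random.default_rng(1)
ceu = rng.integers(0, 3, size=(60, 200))     # hypothetical larger panel
adygei = rng.integers(0, 3, size=(17, 200))  # 17 individuals, as in the post
ceu17 = random_subsample(ceu, n=adygei.shape[0], seed=42)
print(ceu17.shape)  # (17, 200)
```

Fixing the seed makes the subsample reproducible; rerunning with a few other seeds is a cheap way to check that a result does not hinge on one particular draw.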

Unexpectedly, all Indian Cline populations are now clearly shifted towards the Adygei side of the CEU-Adygei PC1, and the results are compatible with a mainly West Asian origin of ANI.

It's not entirely clear to me why this is happening without dissecting the results, but here is my tentative guess: CEU and Adygei both possess low-frequency West Eurasian variants, some of which are absent from the smaller Adygei sample but present in the much larger CEU one. When one of these variants pops up in an Indian Cline sample, it is mistaken for a CEU variant. Once sample sizes are equalized, CEU no longer has an edge over Adygei in capturing low-frequency variants, and this bias is removed.
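The guess can be made concrete with a back-of-the-envelope calculation. Treating the 2n chromosomes of n diploid individuals as independent draws, an allele at population frequency p is seen at least once with probability 1 - (1 - p)^(2n). The sketch compares 17 individuals (the Adygei sample size) with a hypothetical larger panel of 60; the frequencies are illustrative, not estimates from these data.

```python
def prob_observed(freq, n_individuals):
    """Chance that an allele at population frequency `freq` shows up at least once
    among n diploid individuals, treating the 2n chromosomes as independent draws."""
    return 1.0 - (1.0 - freq) ** (2 * n_individuals)

# Illustrative sizes: 17 (the Adygei sample) vs. a hypothetical panel of 60.
for freq in (0.01, 0.02, 0.05):
    print(freq, round(prob_observed(freq, 17), 2), round(prob_observed(freq, 60), 2))
```

For an allele at 2% frequency this gives roughly 0.5 for 17 people versus roughly 0.91 for 60, so a rare variant shared by both source populations is much more likely to be catalogued only in the larger panel.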

I have also carried out another experiment, substituting CEU with 10 Lithuanians and 9 Belorussians from Behar et al. (2010), and Adygei with 19 Iranians from the same study.

Any northern European component in Indians is likely to be more similar to eastern Europeans than to CEU, who are mainly of northwestern European origin. Also, I did not use Russians, as their low-level East Eurasian admixture might have made them somewhat different from the putative ancestors of the Indo-Aryans, who lacked such East Eurasian influences.

I chose Iranians because they are the linguistic cousins of the Indo-Aryans, and because their sample size of 19 conveniently matched the combined Belorussian and Lithuanian sample (9 + 10 = 19).

Here are the results:

And another experiment using Hungarians as a Central European sample:

I have also carried out the same experiment using the 17-person CEU subsample (CEU17) and Iranians:
There you have it: clear evidence that the Ancestral North Indian component is more closely related to West Asians than to N/C/E Europeans.


To cut a long story short:
  1. CEU and ANI do not form a clade: the evidence for this clade is based on the most CEU-like Indian Cline populations, and even in their case it is an artefact of unequal sample sizes
  2. ANI is most similar to people from West Asia rather than Eastern Europe


Onur Dincer said...

I think you should repeat the experiment with different West Asian populations, as Iranians are geographically adjacent to South Asia and linguistically closely related to the IE speaking part of South Asia so they may be genetically much closer to South Asians than the average West Asian.

Also you can use South Europeans instead of North Europeans to further test different scenarios.

Nirjhar999 said...

LOL! again so what? I mean:
1. Having a West Eurasian, North European or East European component means nothing, as the depth of its presence can't be judged!
2. "Arya" in the oldest religious book, the Rig Veda, is in no way racial! It's an adjective for those who perform the Arya rituals. It can refer to both the ANI (45,000 ybp) and ASI (60,000 ybp) people, and fully the opposite: in the Battle of the Ten Kings in the Rig Veda, lots of Arya-blooded people (Bharatas, Druhyus etc.) are called un-Arya! So it's an identity of work, not of blood or colour.
3. Very archaic sages such as Vashistha and Vyaasa were low-caste born, but they hold the supreme position; their having an ASI component is not carnal, it is a result of fair play, which has no trace in academic circles at the moment.
P.S. Check your country's Nicholas Kazanas's attempts.

batman said...

Interesting indeed.

May one ask what the relations would look like if one compared Red Indians with the oldest 'Uralians' still kicking, as in the Finnish population...!?

A few decades ago, Scandinavian anthropologists could make friendly jokes about the rural, nature-oriented Finns, calling them "The Indians of Europe". In later years the same Finns have been named "The UFO of the European Genome".

One just wonders what the Finns and the Red Indians would look like from a genetic perspective.

Unknown said...

I wish people would stop using CEU. Its time has passed. It is clearly an admixtured population.

Mauri said...

Many thanks for your work showing how sample size affects the results of component-based analyses. It has been obvious for a long time. Biased sample sizes lead not only to false closeness, but also to false differences between populations. And going further, biased sample sizes lead to biased historical analyses. This has been a very common problem.

Dienekes said...

I think you should repeat the experiment with different West Asian populations, as Iranians are geographically adjacent to South Asia and linguistically closely related to the IE speaking part of South Asia so they may be genetically much closer to South Asians than the average West Asian.

In case you missed it, the experiment was first done with Adygei who are about the most distant West Asian population you can find (genetically, they are technically European).

LOL! again so what? I mean:

Nirjhar999, you are off-topic

I wish people would stop using CEU. Its time has passed. It is clearly an admixtured population

I don't see any evidence that CEU is particularly admixed. It's quite similar to people from Northwestern Europe, and the British Isles in all my experiments. Nonetheless, it was used here because I was comparing with published work, and I've also used other European populations with quite similar results.

Davidski said...

Dienekes, I was just told by someone that you actually think the greater genetic diversity of horse haplotypes around the Caspian means this is where the Indo-European pastoralist expansion took place from.

If true, and I don't know if it is, then it's a pretty stupid thing to say. You don't have to be a biologist specializing in horses to understand that higher genetic diversity in animals points to wild expansion zones.

On the other hand, lower diversity, with high frequencies of certain haplotypes, points to domesticated expansion events.

So the Caspian area is likely a wild horse expansion region, but wild horses didn't speak Indo-European.

Onur Dincer said...

In case you missed it, the experiment was first done with Adygei who are about the most distant West Asian population you can find (genetically, they are technically European).

If you compared West Asians, South Asians and, say, Africans, I would agree with you. But as you are comparing West Asians, South Asians and North Europeans, and as Adygei have elevated North European genetic affinity compared to West Asians proper, it would be better for you to use a third West Asian population other than Iranians and Adygei (and also other than Lezgins) as a proxy for West Asians.

Nirjhar999 said...

"Off topic"? Well, I am rejecting what you are trying to prove (in disguise) via your experiments! You are smart, but hey, where is the indisputable clue for your West Eurasians, East Eurasians and North Europeans coming to India and creating Vedic culture?
You are trying to prove that Europeans don't have an ASI component, but even Reich et al. have said ASI is not related to any other population in the world! So:
1. ANI and ASI mixing is only possible in India.

2. You said: if ANI has been present in India since 45,000 ybp, then why don't Europeans have the ASI component?
Ans: It's simple. One branch of the ANI left India well before any mixing and created the bulk of the European peoples, while the other stayed home and later mixed with the ASI people "blocked in" India.
According to Oppenheimer, R1a1a* populated Eurasia from the Indian subcontinent 30,000 years ago.

eurologist said...

I am still worried that these kinds of studies can't distinguish between recent (i.e., Neolithic) and ancient relations.

I know it's for some reason not very popular, but suppose for a moment that northern Indian and Pakistani populations were, as expected from climate studies, separated from the south for many tens of millennia after the initial peopling of the subcontinent, due to severe drought and the absence of a monsoon. Then most West Asian and European populations may very well originate from this northern portion of the subcontinent's population, which could only reunite with the south after the LGM (when the climate removed the large intervening desert areas there).

So, what I am saying is that a significant portion of the similarity between ANI and both West Asian and European may go back ~45,000 years.

Then, only that portion of the signal which is exclusively West Asian is relevant to the neolithic. From that alone, I would concentrate on northern European populations that have the least known West Asian admixture. And I also would not be surprised that even for such Northern European populations with very little (neolithic) West Asian admixture, there is a remaining 20% - 30% affinity to ANI.

wagg said...

1/ Iranians possibly received an archaic European input (for instance, the Zarzian culture might be related to the eastern Gravettian culture that apparently arrived via the north of the Caucasus, maybe before 15,000 BCE). Your earlier autosomal data could also support this, as the "European" components in the Iranian profile differ in relative size from those in South, Central and East Asia.

2/ How is it that (mainly north-west) Indians share a mutation for lactase persistence (the -13,910 C>T allele) that West Asian populations totally lack but that is very frequent among Europeans? (See fig. 3.)

3/ And it's also odd that Saraswathy et al. (2010) ("Brief communication: Allelic and haplotypic structure at the DRD2 locus among five North Indian caste populations") show the North Indian high castes as quite close to populations of (north-)eastern Europe, such as Russians and Chuvash, in their study. These populations actually live in the region of the Abashevo culture, which several scholars propose as the place of origin of the Proto-Indo-Iranian language. Strange coincidence.

It seems to go against it.

sykes.1 said...

One of the difficulties in reading your site is your use of undefined acronyms.

What pray tell is a CEU?

I assume you mean central Eurasian. However, this acronym is not defined in any of your posts on the ANI.

When I advised graduate students, I told them that in every paper one first spelled out the phrase and then put the abbreviation in parentheses after it. One could then use the abbreviation in place of the phrase.

I always insisted that they do this for even the most widely understood abbreviations, because there will always be a first time reader who doesn't know all the standard ones.

Unless of course, you're running a closed club with secret handshakes.

simon said...

I am dubious about how these components are defined. If you defined an Indus component, it would probably show a spread all over Eurasia, real or imagined.

Fanty said...

"What pray tell is a CEU?"

"CEU" is a widely used term on various DNA-related websites and forums.

Dienekes most often uses "White Utahn" instead, which I have never read anywhere else but which in fact describes much better what it actually is.

"CEU" are white Americans from the state of Utah.

Utah is known for being settled mainly by Brits, Scandinavians and Germans. That's why "CEU" is often used as a proxy for "Northwest European".

These "CEU", however, are kind of strange. I have seen numbers in which North Germans are closer to CEU than to South Germans, while South Germans are likewise closer to CEU than to North Germans.

In Dienekes' experiments, the current state is that the only Europeans still clustering with the CEU are the Brits, while Scandinavians and Germans have already separated.

Dienekes said...

The Caspian horse isn't a wild animal, it's one of the oldest domesticated horse breeds in existence. The paper on its genetic diversity calculates admixture estimates using the Caspian Horse as the eastern refugium contributor, and these are very high even for British and Iberian horses that also show evidence for the secondary domestication center. Diversity of eastern European horse breeds is lower than in the Caspian horse, of course someone can use that as evidence that horses were domesticated in eastern Europe, but using the same kind of "logic" we would conclude that the lowest diversity areas point to the area of domestication, so wheat, barley, sheep, oxen, etc. were all domesticated in Europe and not the Middle East.

The only thing that remains to be seen is whether Russian or Central Asian horse breeds exceed the Caspian in diversity. Time will tell. Central Europe is out, however.

It is important to note that the Caspian Horse does not have steppe antecedents; it is derived from the E. ferus populations of the Middle East. Its high diversity, its discovery in an early Iranian context, and its continued use down to the historical period in a ceremonial and symbolic context are strongly inconsistent with the idea that "steppe pastoralists" introduced horses to Iran.

Unknown said...


Most of the samples in your data are from South India; they are highly skewed there.

By excluding Kashmiri Pandits and Sindhis, and by agreeing yourself that they are close to CEU, you made your argument weak.

Of the 20 samples in the list, Sindhi, Meghawal and Srivastava are the only other North Indian samples.

Geographically, West Asia is closest, so gene flow in either direction is not surprising. Whichever side has more diversity indicates the direction of gene flow.

In all these (South) Indian populations you sampled, you can see in the actual data a high frequency of Y-DNA R2 and mtDNA M.

You should plot the graph with real North Indian populations.

Dienekes said...

By excluding Kashmiri Pandits and Sindhis, and by agreeing yourself that they are close to CEU, you made your argument weak.

I did not agree that these populations are close to CEU. I showed why Kashmiri Pandits and Pathans appear to be closer to CEU than to Adygei when there are unequal samples of CEU and Adygei. When samples are equalized all Indian Cline groups (from southmost India to Pakistan) form a narrow band on PC1 that is shifted to the West Asian side.

Nirjhar999 said...

No conclusiveness. Suppose the Basques and N.E. are virgin and the most unadmixed; then why do they have Y-DNAs like I, which is clearly Mediterranean?