October 18, 2012

Relatives/duplicates in ADMIXTURE

The presence of relatives in a dataset tends to throw ADMIXTURE off, but this does not always happen. In particular, I've noticed that at low K, relatives do not appear to form their own hyper-specific clusters. A good example of this is the Yunusbayev et al. Armenians_Y sample (N=16), which happens to include what appears to be an individual (or a twin?) in common with my own Armenian_D sample from the Dodecad Project. I discovered this the last time I ran ADMIXTURE, and since then I have used a subset of 15 Armenians (Armenians_15_Y) from that dataset whenever I also include my Dodecad sample.

In my ongoing analysis of the world dataset, I included two versions of the Sakilli, Paniya, and Malayan samples, one from Behar et al. and one from Chaubey et al. I believe the Harappa Ancestry Project (HAP) has previously found that some of these are not exactly the same individuals, so I wanted to see the ancestry of all of them to help me decide which ones to keep.

Here are the K=5 ancestral proportions of the Behar et al. Sakilli:


GSM536813 10.2 7.8 2.2  0 79.9
GSM536814  8.5 9.3 2.1  0 80.0
GSM536815  9.7 7.9 3.6  0 78.8
GSM536816  8.8 8.7 2.1  0 80.4

and of the Chaubey et al. Sakilli:

SAKD60 10.2 7.8 2.2  0 79.9
SAKD72  9.7 7.9 3.6  0 78.8
SAKD75  8.8 8.7 2.1  0 80.4
SAKD64  8.5 9.4 2.1  0 80.0

These appear to be the same individuals, which was confirmed by IBD analysis.
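
The comparison above can be sketched in a few lines of Python. This is only an illustration, not the procedure actually used for this post: the helper name `match_duplicates` and the 0.15 percentage-point tolerance are assumptions, and matching proportions at low K are merely suggestive of identity (here the match was confirmed separately by IBD analysis).

```python
# Hypothetical helper: pair up individuals whose ADMIXTURE proportions
# agree within a tolerance (in percentage points) on every component.

def match_duplicates(set_a, set_b, tol=0.15):
    """Return (id_a, id_b) pairs whose K proportions all differ by <= tol."""
    pairs = []
    for id_a, props_a in set_a.items():
        for id_b, props_b in set_b.items():
            if all(abs(x - y) <= tol for x, y in zip(props_a, props_b)):
                pairs.append((id_a, id_b))
    return pairs

# K=5 proportions copied from the two Sakilli tables above.
behar = {
    "GSM536813": (10.2, 7.8, 2.2, 0.0, 79.9),
    "GSM536814": (8.5, 9.3, 2.1, 0.0, 80.0),
    "GSM536815": (9.7, 7.9, 3.6, 0.0, 78.8),
    "GSM536816": (8.8, 8.7, 2.1, 0.0, 80.4),
}
chaubey = {
    "SAKD60": (10.2, 7.8, 2.2, 0.0, 79.9),
    "SAKD72": (9.7, 7.9, 3.6, 0.0, 78.8),
    "SAKD75": (8.8, 8.7, 2.1, 0.0, 80.4),
    "SAKD64": (8.5, 9.4, 2.1, 0.0, 80.0),
}

pairs = match_duplicates(behar, chaubey)
for id_a, id_b in pairs:
    print(id_a, "<->", id_b)
```

The tolerance is set slightly above the 0.1 rounding step of the reported values, so that a pair such as GSM536814 and SAKD64 (9.3 vs. 9.4 on one component) still counts as a match.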

The Malayan individuals also appear to be the same:

GSM536915 0.3 15.5 2.7  0 81.6
GSM536812 3.3 16.6 2.8  0 77.3

A382 0.3 15.5 2.7  0 81.6
MLYA383 3.3 16.6 2.8  0 77.3

But, as noticed by HAP, the Paniya individuals are not the same:

GSM536916 5.1 11.2 2.2 0.0 81.6
GSM536806 0.4 69.7 0.0 4.3 25.6
GSM536807 0.0 79.7 0.0 2.4 18.0
GSM536808 0.0 77.5 0.5 1.7 20.3

D36   5.1 11.2 2.2 0.0 81.6
PNYD9 0.0 19.8 2.5 0.6 77.1
PNYD3 0.0 21.2 1.5 0.0 77.3
PNYD1 0.0 21.7 2.7 0.3 75.2

As I move forward in my "world" analysis, I've decided to drop GSM536916 and the Chaubey et al. versions of Sakilli and Malayan. Thus, PANIYA will refer to the Southeast Asian-like individuals of the Behar et al. set, and Paniya_Ch to the South Asian-like individuals of the Chaubey et al. set, with one copy of the duplicated individual removed.

5 comments:

terryt said...

"As I move forward in my 'world' analysis"

Have you managed to track down any Australian Aboriginal or Papuan populations? They would make any 'world' analysis much more meaningful than any analysis without them. I would presume that the people who managed to cross Wallace's Line had advanced enough boating technology to expand back west and north as well.

Etyopis said...

Unfortunately, I cannot perform my clinality test on this particular dataset, since you have opted not to report your K=2 results. I have found that the ADMIXTURE-generated components on a global level, even at higher K values, vary significantly as a function of the 'clinality' (or lack thereof) of the particular dataset....

Dienekes said...

@Etyopis,

In general, your idea of determining the clinality of a dataset by taking pairwise differences across the sorted order is interesting, and could be useful for real clines that occur within the human species.

However, at K=2 there is no such cline in the human species; in particular Caucasoids (who appear intermediate in a K=2 analysis) are not really the product of admixture between Africans and East Eurasians (who appear terminal). Populations of Caucasoid+African or Caucasoid+Asian ancestry (or even African+Asian, although there are not many of those) may come to occupy neighboring positions in terms of their K=2 proportions, despite having very different origins.

Etyopis said...

"However, at K=2 there is no such cline in the human species"

But there indeed is a cline that is pegged by Africans on one side and East Asians / Amerindians on the other side. Furthermore, the component that appears at K=3, peaking in Basques and Sardinians, certainly emerges from the synthesis of those polarized K=2 components, as evidenced by the fact that the West Asian component that emerges at K=3 has an intermediate Fst distance with respect to the East Asian and African components already present at K=2. Whether this 'intermediateness' is a signal of common ancestry with Africans and East Asians that was later preserved in West Asians, or a signal of later admixture events, is of course a different story altogether. But the facts do stipulate that it is an intermediate cluster composed of ~1/3 African and ~2/3 East Asian in terms of K=2 ADMIXTURE proportions.

Dienekes said...

"But there indeed is a cline that is pegged by Africans on one side and East Asians / Amerindians on the other side"

If by "cline" you mean a sequence of numbers that increases, then there will always be a tautological cline.

But that is not the definition of a cline. A cline is the variation of a trait (in this case, an admixture proportion) that changes monotonically along a geographical path.

As I have already explained, populations that are not geographically close will end up as neighbors in the sequence of increasing proportions; that sequence is not a cline.