October 01, 2005

Clusters strike back

Yet another paper discovers that self-reported ethnicity corresponds almost perfectly with genetic cluster membership. See some of my previous posts on the subject. Here is the conclusion:
The current study extends the previous findings of Rosenberg et al. (2002) indicating that analysis of population structure using a non-hierarchical clustering algorithm can separate population groups based on DNA polymorphisms. We show here that both continental and sub-continental populations can be readily distinguished and that admixed populations can be examined in the context of the contributions of putative parental populations. These results were robust when k < 7 were examined and were reproducible under many different models. In addition, the findings were not sensitive to exclusion of random groups of individuals, nor inclusion of large numbers of individuals from admixed groups. These findings and implications differ from those suggested by the recent studies of Serre and Paabo in which the microsatellite data set utilized by Rosenberg was reexamined (2004). These investigators suggested that observations of continental grouping in population structure analyses is due to the sampling methods and that there are gradients of human genetic diversity rather than discontinuities between the continents. In contrast, the current study using diallelic AIMs supports the conclusion that the continental population groups are relatively discrete and that such results are not due to limited sampling and exclusion of admixed populations. As discussed subsequently, the robust results observed in the current study may depend on the use of diallelic AIMs.

Also of interest from the paper:
  • Hispanics vary in their admixture proportions. For example, Mexican Americans are more Caucasoid than Mexicans, while Puerto Ricans are more Negroid than other Hispanics.
  • South Asians are assigned their own cluster when k=5, confirming my earlier suggestions and agreeing with uniparental markers, which establish beyond a doubt that South Asian populations did not originate from admixture between Caucasoids and East Asians. There is no need to postulate East Asian grandparents for South Asians. The authors write:
"However, at k = 5 or greater, the STRUCTURE analysis shows the presence of a new cluster (here designated cluster 5) in the South Asian subjects (Fig. 2b, c). This cluster was the predominant group in the South Asian population for each of the South Asian subjects and was present in the other populations only in small percentages. Similarly, the triangle plot descriptions of the different populations (Fig. 3), shows not only the clear separation of each of the continental populations but also the South Asian population from each of the other populations (Fig. 3c). This result was consistently observed in all analyses performed at k=5 using either the linkage or admixture models applied by the STRUCTURE program and was observed for all individuals that included members with diverse language dialects and states of Indian origin (see Materials and methods)."
Human Genetics (online early)

Examination of ancestry and ethnic affiliation using highly informative diallelic DNA markers: application to diverse and admixed populations and implications for clinical epidemiology and forensic medicine

Nan Yang et al.


We and others have identified several hundred ancestry informative markers (AIMs) with large allele frequency differences between different major ancestral groups. For this study, a panel of 199 widely distributed AIMs was used to examine a diverse set of 796 DNA samples including self-identified European Americans, West Africans, East Asians, Amerindians, African Americans, Mexicans, Mexican Americans, Puerto Ricans and South Asians. Analysis using a Bayesian clustering algorithm (STRUCTURE) showed grouping of individuals with similar ethnic identity without any identifier other than the AIMs genotyping and showed admixture proportions that clearly distinguished different individuals of mixed ancestry. Additional analyses showed that, for the majority of samples, the predicted ethnic identity corresponded with the self-identified ethnicity at high probability (P > 0.99). Overall, the study demonstrates that AIMs can provide a useful adjunct to forensic medicine, pharmacogenomics and disease studies in which major ancestry or ethnic affiliation might be linked to specific outcomes.
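To make "large allele frequency differences" concrete, here is a minimal sketch of how a diallelic marker could be scored as ancestry informative. It is not the authors' selection procedure: the function names, the 0.5 cutoff, and the toy genotypes are all assumptions made up for illustration. Genotypes are coded as the number of copies (0, 1 or 2) of one allele.

```python
def allele_frequency(genotypes):
    """Frequency of one allele, given genotypes coded as 0, 1 or 2 copies."""
    if not genotypes:
        raise ValueError("no genotypes supplied")
    # Each diploid individual carries two alleles at the marker.
    return sum(genotypes) / (2 * len(genotypes))


def delta(genotypes_a, genotypes_b):
    """Absolute allele frequency difference between two population samples."""
    return abs(allele_frequency(genotypes_a) - allele_frequency(genotypes_b))


def select_aims(markers, threshold=0.5):
    """Keep markers whose delta meets the threshold (0.5 is an assumed cutoff)."""
    return [name for name, (a, b) in markers.items() if delta(a, b) >= threshold]


# Hypothetical toy data: marker name -> (sample from population A, sample from population B).
toy_markers = {
    "marker_1": ([2, 2, 1, 2], [0, 0, 1, 0]),  # frequencies 0.875 vs 0.125
    "marker_2": ([1, 1, 0, 2], [1, 0, 1, 2]),  # frequencies 0.5 vs 0.5
}

selected = select_aims(toy_markers)
print(selected)
```

In this toy example only marker_1 passes, since its allele is common in one sample and rare in the other. A marker like marker_2, with the same frequency in both samples, carries no information about which population an individual comes from.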


