In the recent African American Lives documentary, geneticists were able to estimate the most probable African origin of Henry Louis Gates, Jr. using an 11,555-SNP array. The paper below reports a study of several human populations genotyped with this array.
It was already well known that an individual's continental origin could be predicted accurately, but by increasing the number of polymorphisms, one can go a step further and detect patterns at the intra-continental level.
The following plot of the first principal components (PCA) is pretty self-explanatory: with this many polymorphisms, very clear clusters corresponding to individual populations emerge.
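For readers curious how such a plot is produced, here is a minimal pure-Python sketch of PCA on a SNP genotype matrix. It is an illustration under stated assumptions, not the authors' pipeline. Genotypes are assumed to be coded as 0/1/2 copies of one allele, and each SNP is standardized by centering on 2p and scaling by sqrt(p(1-p)), a common convention. The leading components are found by power iteration with deflation. The data come from a hypothetical `simulate` helper that builds two toy populations with differing allele frequencies. All function names here are illustrative.

```python
import math
import random


def simulate(n_per_pop=10, n_snps=300, diff=0.3, seed=1):
    """Hypothetical toy data: two populations whose allele frequencies differ by ~diff."""
    rng = random.Random(seed)
    freqs = []
    for _ in range(n_snps):
        pa = rng.uniform(0.1, 0.9)
        pb = min(0.95, max(0.05, pa + rng.choice([-diff, diff])))
        freqs.append((pa, pb))
    rows, labels = [], []
    for label, idx in (("A", 0), ("B", 1)):
        for _ in range(n_per_pop):
            # Genotype = copies (0/1/2) of the counted allele, drawn under Hardy-Weinberg.
            rows.append([sum(rng.random() < f[idx] for _ in range(2)) for f in freqs])
            labels.append(label)
    return rows, labels


def standardize(genotypes):
    """Center each SNP on 2p and scale by sqrt(p(1-p)); drop monomorphic SNPs."""
    n = len(genotypes)
    cols = []
    for j in range(len(genotypes[0])):
        col = [row[j] for row in genotypes]
        p = sum(col) / (2 * n)  # sample allele frequency
        if p in (0.0, 1.0):
            continue  # monomorphic SNP carries no information
        sd = math.sqrt(p * (1 - p))
        cols.append([(g - 2 * p) / sd for g in col])
    return [list(r) for r in zip(*cols)]  # back to individuals x SNPs


def relationship_matrix(x):
    """Individual-by-individual matrix X X^T / m (m = number of SNPs kept)."""
    m = len(x[0])
    return [[sum(a * b for a, b in zip(xi, xk)) / m for xk in x] for xi in x]


def top_components(c, k=2, iters=500, seed=0):
    """Approximate leading k eigenvectors of c by power iteration with deflation.

    Returns PC scores per individual, up to a constant factor.
    """
    rng = random.Random(seed)
    n = len(c)
    c = [row[:] for row in c]  # work on a copy; deflation modifies it
    comps = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        for _ in range(iters):
            w = [sum(cij * vj for cij, vj in zip(ci, v)) for ci in c]
            norm = math.sqrt(sum(t * t for t in w))
            v = [t / norm for t in w]
        lam = sum(vi * sum(cij * vj for cij, vj in zip(ci, v)) for vi, ci in zip(v, c))
        comps.append([math.sqrt(max(lam, 0.0)) * vi for vi in v])
        # Deflate: subtract the component just found so the next one can emerge.
        for i in range(n):
            for j in range(n):
                c[i][j] -= lam * v[i] * v[j]
    return comps


rows, labels = simulate()
pc1, pc2 = top_components(relationship_matrix(standardize(rows)))
```

Plotting `pc1` against `pc2` should show the two toy populations as separate clusters along PC1. Real panels with thousands of SNPs and many individuals would normally use a linear-algebra library rather than pure-Python loops.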
Human Genomics Volume 2, Number 2, June 2005
Large-scale SNP analysis reveals clustered and continuous patterns of human genetic variation
Shriver, Mark D. et al.
Understanding the distribution of human genetic variation is an important foundation for research into the genetics of common diseases. Some of the alleles that modify common disease risk are themselves likely to be common and, thus, amenable to identification using gene-association methods. A problem with this approach is that the large sample sizes required for sufficient statistical power to detect alleles with moderate effect make gene-association studies susceptible to false-positive findings as the result of population stratification.[1,2] Such type I errors can be eliminated by using either family-based association tests or methods that sufficiently adjust for population stratification.[3–5] These methods require the availability of genetic markers that can detect and, thus, control for sources of genetic stratification among populations. In an effort to investigate population stratification and identify appropriate marker panels, we have analysed 11,555 single nucleotide polymorphisms in 203 individuals from 12 diverse human populations. Individuals in each population cluster to the exclusion of individuals from other populations using two clustering methods. Higher-order branching and clustering of the populations are consistent with the geographic origins of populations and with previously published genetic analyses. These data provide a valuable resource for the definition of marker panels to detect and control for population stratification in population-based gene identification studies. Using three US resident populations (European-American, African-American and Puerto Rican), we demonstrate how such studies can proceed, quantifying proportional ancestry levels and detecting significant admixture structure in each of these populations.
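The abstract does not say which estimator was used to quantify proportional ancestry. To make the idea concrete, here is a simple maximum-likelihood sketch for a two-population admixture model. It is not presented as the authors' method. It assumes unlinked SNPs in Hardy-Weinberg equilibrium and known parental allele frequencies `p1` and `p2`. An individual with ancestry proportion q from population 1 then has allele frequency q·p1 + (1 − q)·p2 at each SNP. The function names and the simulated data are hypothetical.

```python
import math
import random


def log_likelihood(q, genotypes, p1, p2):
    """Log-likelihood of ancestry proportion q for one individual.

    Assumes unlinked SNPs under Hardy-Weinberg equilibrium: the genotype
    (0/1/2 copies) at each SNP is Binomial(2, q*p1 + (1-q)*p2).
    """
    ll = 0.0
    for g, a, b in zip(genotypes, p1, p2):
        f = q * a + (1 - q) * b
        f = min(max(f, 1e-9), 1 - 1e-9)  # guard against log(0)
        ll += g * math.log(f) + (2 - g) * math.log(1 - f)
    return ll  # binomial coefficient omitted: it does not depend on q


def estimate_admixture(genotypes, p1, p2, steps=200):
    """Grid-search maximum-likelihood estimate of q on [0, 1]."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda q: log_likelihood(q, genotypes, p1, p2))


def simulate_individual(q, p1, p2, seed=0):
    """Hypothetical admixed individual with true ancestry proportion q."""
    rng = random.Random(seed)
    return [sum(rng.random() < q * a + (1 - q) * b for _ in range(2))
            for a, b in zip(p1, p2)]


# Toy parental allele frequencies (hypothetical, drawn at random).
rng = random.Random(42)
p1 = [rng.uniform(0.05, 0.95) for _ in range(2000)]
p2 = [rng.uniform(0.05, 0.95) for _ in range(2000)]
genotypes = simulate_individual(0.8, p1, p2)
q_hat = estimate_admixture(genotypes, p1, p2)
```

The log-likelihood is concave in q, so a grid search (or any 1-D optimizer) finds the maximum reliably. Precision improves with more SNPs and with larger allele-frequency differences between the parental populations, which is one reason marker panels are chosen for informativeness.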