As expected, the first principal component (PC 1) distinguishes Africans from non-Africans. The next three principal components also characterize continental regions: PC 2 distinguishes East Asians from Africans and Europeans, with South Asians and Mexicans at intermediate values; PC 3 distinguishes South Asians from East Asians; and PC 4 distinguishes Mexicans from non-Mexicans. The subsequent principal components mark within-continent variation. PC 5 reveals a north-to-south cline within Europeans (Figure 3), consistent with existing studies of European substructure. [...] PC 6 distinguishes the African Americans from the HapMap Africans. [...] Principal component 7 (Figure 2D) separates the three East Asian populations: Japan (left), HapMap CHB (center right), and Taiwan (far right). [...] We do not show further results because PC 8 and subsequent PCs display substructure within Africans and African Americans, but do not correspond to any known geographic or population structure among individuals.The American Journal of Human Genetics, doi:10.1016/j.ajhg.2008.08.005
The Population Reference Sample, POPRES: A Resource for Population, Disease, and Pharmacological Genetics Research
Matthew R. Nelson et al.
Technological and scientific advances, stemming in large part from the Human Genome and HapMap projects, have made large-scale, genome-wide investigations feasible and cost effective. These advances have the potential to dramatically impact drug discovery and development by identifying genetic factors that contribute to variation in disease risk as well as drug pharmacokinetics, treatment efficacy, and adverse drug reactions. In spite of the technological advancements, successful application in biomedical research would be limited without access to suitable sample collections. To facilitate exploratory genetics research, we have assembled a DNA resource from a large number of subjects participating in multiple studies throughout the world. This growing resource was initially genotyped with a commercially available genome-wide 500,000 single-nucleotide polymorphism panel. This project includes nearly 6,000 subjects of African-American, East Asian, South Asian, Mexican, and European origin. Seven informative axes of variation identified via principal-component analysis (PCA) of these data confirm the overall integrity of the data and highlight important features of the genetic structure of diverse populations. The potential value of such extensively genotyped collections is illustrated by selection of genetically matched population controls in a genome-wide analysis of abacavir-associated hypersensitivity reaction. We find that matching based on country of origin, identity-by-state distance, and multidimensional PCA do similarly well to control the type I error rate. The genotype and demographic data from this reference sample are freely available through the NCBI database of Genotypes and Phenotypes (dbGaP).