The genetic variation and population history in the Baltic Sea region
Sharp genetic borders within a geographically restricted region are known to exist among the populations around the northern Baltic Sea, on the northern edge of Europe. We studied the population history of this area in greater detail from paternal and maternal perspectives using Y-chromosomal and mitochondrial DNA markers. Over 1,700 DNA samples from Finland, Karelia, Estonia, Latvia, Lithuania and Sweden were genotyped for 18 Y-chromosomal biallelic polymorphisms and 8 microsatellite loci, together with 18 polymorphisms from the coding region of mtDNA and sequencing of HVR1. Y-chromosomal haplogroups from the biallelic data indicate both various phases of gene flow and the existence of genetic barriers within the Baltic region. Haplogroup N3, abundant on the eastern side of the Baltic, differentiates the eastern and western sides of the Baltic Sea, as does R1b, whose frequency pattern is the reverse of that of N3. The typically Scandinavian haplogroup Ia1 reaches frequencies of up to 40%, separating not only Sweden but also Western Finland from the other populations. The frequency of haplogroup R1a1, most characteristic of Slavic peoples, varied substantially across the populations. In addition to the biallelic markers, Y-chromosomal microsatellite loci were analyzed for a more detailed view of the history of the paternal lineages in the region. We also analyzed mtDNA markers, with special interest in sub-haplogroups of H and U, which, among other haplogroups, show substantial variation between the populations (e.g. haplogroups H1, H2, T and J1). In conclusion, our current Y-chromosomal and mtDNA data suggest various incidents of gene flow from different sources, each reaching partly different areas of the Baltic region, which can thus be seen as a meeting point of a culturally as well as genetically diverse set of populations.
Asian Nomads traces in the mitochondrial gene pool of Slavs.
Mitochondrial DNA (mtDNA) variability was studied in a sample of 179 individuals representing the Czech population of West Bohemia. MtDNA analysis revealed that the majority of Czech mtDNAs belong to the common West Eurasian mitochondrial haplogroups. However, about 3 per cent of Czech mtDNAs fall into East Eurasian lineages (A, N9a, D4, M*). Comparative analysis of published data has shown that different Slavonic populations contain a small but marked proportion of East Eurasian mtDNAs (e.g. 1.3 per cent in Eastern Slavs, 1.8 per cent in Western Slavs, and 1.2 per cent in Southern Slavs). Notably, the Baltic populations (Latvians, Lithuanians and Estonians) have largely escaped marked influence from maternal lineages of East Eurasian origin (0.3-0.6 per cent). The two East Eurasian mtDNA haplogroups Z1 and D5 are present in the gene pools of North European Finnic populations (Saami, Finns, and Karelians). Slavonic populations, in contrast, are generally characterized by a heterogeneous mtDNA structure, defined, in addition to Z1 and D5, by haplogroups A, C, D4, G2a, M*, N9a, F and Y. Therefore, two different scenarios of female-mediated East Eurasian genetic influence on Northern and Eastern Europeans should be highlighted: (1) the most ancient, an influx of Asian tribes probably originating in the early Holocene, which brought a few selected East Asian mtDNA haplotypes (such as Z and D5) to Fennoscandia (Tambets et al. 2004); and (2) gradual gene flow in historic times, mostly in the Middle Ages, due to migrations of nomadic peoples (such as the Huns, Avars, Bulgars and Mongols) into Eastern and Central European territories inhabited mainly by Slavonic tribes.
We suggest that the presence of East Eurasian mtDNA haplotypes is not an original feature of the proto-Slavic gene pool, but is mostly a consequence of admixture with Central Asian nomadic tribes who migrated into Central and Eastern Europe in the early Middle Ages.
Use of Forensic Markers in the Assessment of Population Stratification.
Assignment of individuals to population groups is important to genetic case-control association studies, admixture mapping, medical risk assessment, genealogy, and forensic studies. Polymorphic sequences can be used to infer ancestry, but their utility for this application depends on the number of alleles and on the relative frequency differences of these alleles between the population groups under study. Because study designs differ in the numbers and types of polymorphic markers used, with differing levels of informativeness, comparison of studies is difficult. Commercially available, highly informative markers that are used internationally in forensic applications could provide a universal first-tier analysis for assigning individuals to population groups prior to inclusion in association and admixture studies. We evaluated the utility of the 16-marker PowerPlex kit from Promega for this purpose. Multiple population groups, including Africans, Bengalis, Chinese, Japanese, Koreans, Crypto Jews, Sephardic Jews, and Dutch, were genotyped using the PowerPlex kit. The data were analyzed with STRUCTURE (Pritchard et al.) using the admixture model with correlated allele frequencies and 3 clusters. Africans, Asians (Bengalis, Koreans, Chinese and Japanese), and Caucasians (Dutch, Sephardic Jews, and Crypto Jews) were clearly delineated. Admixed individuals were detectable, and their removal resulted in more discrete clustering. An independently collected and genotyped set of Dutch individuals was indistinguishable from the original Dutch group, demonstrating reproducibility across data sets. The sensitivity conferred by the number of markers was assessed by removing markers from the analysis: population groups could still be delineated with 14 markers, although the clusters were noisier, but not with only 8 markers.
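STRUCTURE's MCMC clustering is too involved for a short example. As a simpler illustration of why the number of markers and the allele-frequency differences between groups drive assignment power, the sketch below implements a classical likelihood assignment test (not STRUCTURE): an individual is assigned to the reference population under which its multilocus STR genotype is most probable, assuming Hardy-Weinberg equilibrium and independent loci. The population names, locus names, allele frequencies and the `FLOOR` value for unseen alleles are all hypothetical.

```python
import math
from typing import Dict, Tuple

# Hypothetical reference data: population -> locus -> {allele: frequency}
Freqs = Dict[str, Dict[str, Dict[str, float]]]
# A diploid STR genotype: locus -> (allele1, allele2)
Genotype = Dict[str, Tuple[str, str]]

FLOOR = 1e-3  # assumed frequency for an allele never seen in a reference sample


def genotype_log_likelihood(geno: Genotype, pop_freqs: Dict[str, Dict[str, float]]) -> float:
    """Log probability of a multilocus genotype under HWE, treating loci as independent."""
    total = 0.0
    for locus, (a, b) in geno.items():
        pa = pop_freqs[locus].get(a, FLOOR)
        pb = pop_freqs[locus].get(b, FLOOR)
        # Homozygote: p^2; heterozygote: 2 * p_a * p_b
        total += math.log(pa * pa if a == b else 2 * pa * pb)
    return total


def assign(geno: Genotype, freqs: Freqs) -> Tuple[str, Dict[str, float]]:
    """Return the most likely source population and all per-population log-likelihoods."""
    scores = {pop: genotype_log_likelihood(geno, f) for pop, f in freqs.items()}
    return max(scores, key=scores.get), scores


# Toy example with two hypothetical populations and two hypothetical STR loci
freqs = {
    "popA": {"locus1": {"14": 0.6, "15": 0.4}, "locus2": {"7": 0.7, "9": 0.3}},
    "popB": {"locus1": {"14": 0.1, "15": 0.9}, "locus2": {"7": 0.2, "9": 0.8}},
}
best, scores = assign({"locus1": ("14", "14"), "locus2": ("7", "9")}, freqs)
print(best)  # popA
```

Because each locus adds one log-likelihood term, dropping loci from the genotype typically narrows the gap between populations, which is consistent with the loss of resolution reported when markers were removed.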
The use of forensic markers is a promising strategy for clustering individuals into population groups and will be an inevitable by-product of their forensic use.
Evaluation of Ancestry and Linkage Disequilibrium Sharing in Admixed Population in Mexico
National Institute of Genomic Medicine, Mexico. More than 80% of the Mexican population is considered Mestizo, resulting from the admixture of ethnic groups with Spaniards. To generate an initial estimate of the ancestral contribution (AC) of populations from Europe, Africa and Asia to Mexican Mestizos, we genotyped 104 samples from the states of Sonora (n=20), Yucatan (n=17), Guerrero (n=21), Zacatecas (n=19), Veracruz (n=18) and Guanajuato (n=8) using the Affymetrix 100K SNP array, and used data from the International HapMap Project as parental population information. Of the 3,055 ancestry-informative SNPs reported by Smith et al. and Choudhry et al., we identified 105 present on the 100K array and used them to calculate the AC of each parental population to our sample. To infer AC we used the Structure software under the admixture model. Based on this analysis, the average AC in our samples is 58.96% European, 10.03% African and 31.05% Asian. Sonora shows the highest European contribution (70.63%) and Guerrero the lowest (51.98%); Guerrero also shows the highest Asian contribution (37.17%). African contribution ranges from 7.8% in Sonora to 11.13% in Veracruz. Based on these data, we grouped our samples by European AC (<50% and >70%). We used the Carlson algorithm to derive European tagSNPs from the 100K marker set. To explore Linkage Disequilibrium Sharing (LDS) between Mestizos and Europeans, we calculated the proportion of tagSNP-marker pairs that maintained r² ≥ 0.8 in each evaluated population. In general, LDS between the European and Asian populations is ~73%, whereas LDS between the European and African populations is ~40%. Mestizos from Guerrero show the lowest LDS (74%), whereas those from Sonora show the highest (77%). Similar results are seen in the groups with lower (<50%) and higher (>70%) European ancestry.
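The LDS measure described above can be sketched as follows, assuming the tagSNP-marker pairs have already been selected in the European reference. As an assumption not stated in the abstract, r² is approximated here by the squared Pearson correlation of unphased genotype dosages (0/1/2 copies of an allele) rather than by haplotype-based r². The SNP identifiers and genotypes are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def r_squared(x: Sequence[int], y: Sequence[int]) -> float:
    """Squared Pearson correlation of genotype dosages; an unphased proxy for LD r^2."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic SNP in this sample carries no LD information
    return sxy * sxy / (sxx * syy)


def ld_sharing(pairs: List[Tuple[str, str]], genotypes: Dict[str, Sequence[int]],
               threshold: float = 0.8) -> float:
    """Fraction of (tagSNP, marker) pairs whose r^2 stays >= threshold in a target population."""
    kept = sum(1 for tag, marker in pairs
               if r_squared(genotypes[tag], genotypes[marker]) >= threshold)
    return kept / len(pairs)


# Hypothetical dosages for six individuals of a target population
genotypes = {
    "tag1": [0, 1, 2, 1, 0, 2], "marker1": [0, 1, 2, 1, 0, 2],  # LD preserved
    "tag2": [0, 1, 2, 1, 0, 2], "marker2": [2, 0, 1, 0, 2, 1],  # LD broken down
}
print(ld_sharing([("tag1", "marker1"), ("tag2", "marker2")], genotypes))  # 0.5
```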
Our results suggest that the Mexican Mestizo population shows ancestry-based stratification that will require appropriate corrections to avoid spurious results in association studies. They also show that admixed populations have unique patterns of LD depending on their levels of ancestral contribution.
European mitochondrial haplogroups exhibit differential risk of developing presbycusis.
The genetic basis of human presbycusis (age-related hearing loss) is unknown. This common disorder is characterized by difficulty understanding conversation, particularly against noisy backgrounds. Audiograms of presbycusic individuals show sloping hearing loss, with the greatest deficits at the highest frequencies, and over time an individual’s hearing loss progresses into the lower frequencies that are more important for understanding speech. We investigated the hypothesis that the mitochondrial (mt) genome plays a role in presbycusis. Subjects of European ancestry, all over age 58, were tested using both classical and advanced audiometric measures and then genotyped to determine their mt haplogroups. Subjects belonging to haplogroup H (N=93) had better hearing than other Europeans (N=80), with the greatest differences observed in the right ear at 3 kHz (p=0.017) and 10-14 kHz (p=0.016). The difference at 3 kHz coincides with the common location of the noise notch and thus may indicate a difference in susceptibility to noise damage. Distortion product otoacoustic emissions (DPOAEs) also indicated better hair cell health in haplogroup H subjects at higher frequencies and in the right ear (average DPOAE for 4-6 kHz, p=0.010). These results support the hypothesis that a mitochondrial factor influences susceptibility to the development of presbycusis. We are currently investigating the mt genome for causative mutations linked to the haplogroups.
Estimating the split time of Human and Neanderthal populations
Previous genetic studies of Neanderthal ancestry have used mtDNA and have therefore been limited in their conclusions about the relationship of humans and Neanderthals. We present here the first use of Neanderthal genomic DNA to assess the joint history of human and Neanderthal populations. Our data consist of 37 kb of short fragments of Neanderthal genomic DNA. By studying the degree to which modern human diversity is shared with Neanderthals, we can assess the time at which the human and Neanderthal populations split. We use a flexible simulation-based approach that demonstrates the power of using human variation data in such analyses. We find that the two populations split ~400,000 years ago, predating the emergence of modern humans. Our best-fitting model predicts that the Neanderthal lineage will be an outgroup to the human population ~52% of the time.
The Genetic Structure of Human Populations in Africa.
Africa contains the greatest levels of human genetic variation and is the source of the worldwide range expansion of all modern humans. Knowledge of genetic population boundaries within Africa has important implications for the design and implementation of genetic epidemiologic studies of Africans and African Americans, and for reconstructing modern human origins. A dataset of ~3.7 million genotypes has been generated from the Marshfield panel of 773 microsatellites and 392 in-del polymorphisms. These markers were genotyped in ~3,200 individuals from >100 diverse ethnic populations across Africa, as well as in 118 African Americans and in the CEPH Human Genome Diversity Panel, which consists of 1,048 individuals from 51 globally diverse populations. Preliminary analysis of population structure using the program STRUCTURE [1] indicates considerably more substructure among global populations (estimated number of genetic clusters K = 12) and among African populations (K = 9) than had previously been recognized [2]. Population clusters are correlated with self-described ethnicity and shared cultural and/or linguistic properties (e.g. Pygmies, Khoisan speakers, Bantu speakers). African Americans have predominantly West African Bantu (~80%) and European (~17%) ancestry, although individual admixture levels vary considerably. These results demonstrate the need to include a broad range of geographically and ethnically diverse African populations in studies of human genetic variation.
[1] Pritchard JK, et al. Genetics 155:945-59 (2000).
[2] Rosenberg NA, et al. Science 298:2381-5 (2002).
Patterns of admixture in Latino populations
We examined the diversity of 13 Latino populations from seven countries (Mexico, Guatemala, Costa Rica, Colombia, Chile, Argentina and Brazil) by typing 745 autosomal microsatellite markers in 250 individuals. Estimates of genetic ancestry varied substantially among these populations: Native American ancestry ranged from 19.6% to 70.3%, European ancestry from 26.9% to 70.6%, and African ancestry from 1.1% to 9.8%. Genetic structure analysis provides evidence of genetic continuity between pre- and post-Columbian populations in specific geographic regions; for instance, Chibchan-Paezan ancestry is detectable in Latinos from lower Central America and northwest South America. The spread of individual admixture estimates differs considerably between populations: some Latino populations (e.g. Mexico City) show marked variation in individual admixture, whereas others (e.g. Antioquia and Costa Rica) show little. This likely reflects the admixture history of each region examined: some Latino populations are still undergoing substantial admixture, whereas others underwent admixture mostly in early colonial times. These results have important implications for admixture mapping and association mapping studies in Latino populations.
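Individual admixture estimates of this kind can be illustrated with a deliberately simplified two-population maximum-likelihood sketch; it is not the method used in the study, which the abstract does not specify. Parental allele frequencies are assumed known, each allele copy is taken to derive from population 1 with probability q (so its frequency is q·f1 + (1 - q)·f2), and q is chosen by grid search. A three-way Native American/European/African estimate would extend the grid to a simplex. Locus names and frequencies are hypothetical.

```python
import math
from typing import Dict, Tuple

AlleleFreqs = Dict[str, Dict[str, float]]  # locus -> {allele: frequency}
Genotype = Dict[str, Tuple[str, str]]      # locus -> (allele1, allele2)

FLOOR = 1e-3  # assumed frequency for alleles absent from a parental sample


def admixed_log_lik(q: float, geno: Genotype, f1: AlleleFreqs, f2: AlleleFreqs) -> float:
    """Log-likelihood of a genotype when each allele copy has ancestry 1 with probability q."""
    total = 0.0
    for locus, (a, b) in geno.items():
        pa = q * f1[locus].get(a, FLOOR) + (1 - q) * f2[locus].get(a, FLOOR)
        pb = q * f1[locus].get(b, FLOOR) + (1 - q) * f2[locus].get(b, FLOOR)
        total += math.log(pa * pa if a == b else 2 * pa * pb)
    return total


def estimate_q(geno: Genotype, f1: AlleleFreqs, f2: AlleleFreqs, steps: int = 100) -> float:
    """Grid-search maximum-likelihood estimate of the ancestry-1 proportion q in [0, 1]."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda q: admixed_log_lik(q, geno, f1, f2))


f1 = {"locus1": {"A": 0.9, "B": 0.1}}  # hypothetical parental population 1
f2 = {"locus1": {"A": 0.1, "B": 0.9}}  # hypothetical parental population 2
print(estimate_q({"locus1": ("A", "B")}, f1, f2))  # 0.5
print(estimate_q({"locus1": ("A", "A")}, f1, f2))  # 1.0
```

With real data many loci are combined, and the width of the likelihood around its maximum indicates how precisely each individual's admixture proportion is estimated.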
Genomic diversity and population structure of Native Americans
We examined 745 autosomal microsatellite markers in 432 individuals sampled from 24 indigenous populations in the Americas. These data were analyzed jointly with similar data available for 54 other indigenous populations from across the world (including an additional 5 Native American groups). The populations from the Americas show lower diversity and more differentiation than populations from other continental regions (global Fst = 0.08). Signals of long-range linkage disequilibrium are detectable to a greater extent in Native Americans than in other populations, as are signals of recent bottlenecks followed by population growth. Population diversity is negatively correlated with geographic distance from the Bering Strait, an observation consistent with a north-to-south dispersal of humans upon initial entry into the continent. Diversity is higher in western than in eastern South American populations, potentially reflecting differences in long-term effective population size or in colonization routes within South America. Phylogenetic trees relating Native American populations show marked differentiation between Canadian and other Native populations. Canadian natives also show detectable shared ancestry with contemporary Siberian populations, which is less visible in more southerly Native Americans. Substantial agreement is observed between phylogenetic relatedness and population affiliation according to the linguistic classification of Greenberg.
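The abstract reports a global Fst without naming the estimator. As one common choice, and purely as an assumption, the sketch below computes Nei's G_ST = (H_T - H_S) / H_T from population allele frequencies, combining loci as a ratio of sums and ignoring sample-size corrections (unlike, for example, Weir and Cockerham's estimator). The allele frequencies are hypothetical.

```python
from typing import Dict, List, Tuple

PopFreqs = List[Dict[str, float]]  # one {allele: frequency} dict per population, for one locus


def heterozygosities(pop_freqs: PopFreqs) -> Tuple[float, float]:
    """Return (H_S, H_T): mean within-population and total expected heterozygosity."""
    k = len(pop_freqs)
    h_s = sum(1 - sum(p * p for p in f.values()) for f in pop_freqs) / k
    alleles = set().union(*pop_freqs)  # every allele seen in any population
    mean_freq = [sum(f.get(a, 0.0) for f in pop_freqs) / k for a in alleles]
    h_t = 1 - sum(p * p for p in mean_freq)
    return h_s, h_t


def multilocus_gst(loci: List[PopFreqs]) -> float:
    """Nei's G_ST over loci as sum(H_T - H_S) / sum(H_T), with equal population weights."""
    num = den = 0.0
    for pop_freqs in loci:
        h_s, h_t = heterozygosities(pop_freqs)
        num += h_t - h_s
        den += h_t
    return num / den if den > 0 else 0.0


# Hypothetical example: two populations, two biallelic loci
loci = [
    [{"A": 0.6, "B": 0.4}, {"A": 0.4, "B": 0.6}],  # mild differentiation
    [{"A": 1.0}, {"B": 1.0}],                      # fixed difference
]
print(round(multilocus_gst(loci[:1]), 3))  # 0.04
```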
The rare nonsynonymous SCN5A-S1103Y variant in Caucasians is due to recent African Admixture as revealed by 100k SNP genotyping.
The SCN5A-S1103Y variant is an established and confirmed risk factor, conferring an odds ratio of up to 8.5 for cardiac ventricular arrhythmias and sudden cardiac death (Splawski et al., Science, 2002; Burke et al., Circulation, 2005; Plant et al., J. Clin. Invest., 2006). In Africans it is a common nonsynonymous SNP (MAF = 8%), but it is rarely observed in Caucasians (Chen et al., J. Med. Genet., 2002). In a Bavarian family, apparently of entirely Caucasian descent and affected with long QT syndrome, we detected this variant in the heterozygous state as the only causal nonsynonymous variant found by diagnostic ion channel resequencing. To determine whether the variant in this family was (a) of ancient African descent, (b) due to recent African admixture or (c) a de novo mutation, we analyzed the chromosomal segment on which it resides. Dense SNP genotyping in admixed individuals makes it possible to infer the ethnic origin of chromosomal regions if allele frequencies in the source populations are known. Ethnicity inference for any given locus can be carried out by applying the product rule to a sliding window of neighboring SNPs, or by modeling ancestry with hidden Markov chain Monte Carlo methods (Tang et al., Am. J. Hum. Genet., 2006). By 100k SNP genotyping of the Bavarian family, we demonstrate that the S1103Y variant is due to recent African admixture (b), and we could rule out possibilities (a) and (c). This application demonstrates that inferring the ethnic origin of chromosomal regions by high-density SNP genotyping is a powerful approach, with prospects also for admixture mapping of disease loci and for population stratification correction in genome-wide association mapping of complex disease loci.
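The sliding-window product rule described above can be sketched as follows. Several details are assumptions rather than the authors' procedure: genotypes are unphased allele dosages, SNPs within a window are treated as independent (ignoring LD, which model-based methods handle better), and each window is scored by the log10 likelihood ratio of "one African plus one European chromosome" versus "two European chromosomes". The allele frequencies, window size and clipping constant `EPS` are hypothetical.

```python
import math
from typing import List, Sequence

EPS = 1e-4  # keeps frequencies away from 0 and 1 so the logarithms stay finite


def genotype_prob(dosage: int, p1: float, p2: float) -> float:
    """P(dosage) when chromosome 1 carries the counted allele w.p. p1 and chromosome 2 w.p. p2."""
    p1 = min(max(p1, EPS), 1 - EPS)
    p2 = min(max(p2, EPS), 1 - EPS)
    if dosage == 0:
        return (1 - p1) * (1 - p2)
    if dosage == 1:
        return p1 * (1 - p2) + (1 - p1) * p2
    return p1 * p2


def window_llr(dosages: Sequence[int], p_eur: Sequence[float], p_afr: Sequence[float],
               window: int = 5) -> List[float]:
    """Product rule over each window of consecutive SNPs, returned as log10 likelihood ratios.

    Positive values favour one African-derived segment; negative values favour two
    European-derived segments.
    """
    scores = []
    for start in range(len(dosages) - window + 1):
        llr = 0.0
        for i in range(start, start + window):
            llr += math.log10(genotype_prob(dosages[i], p_afr[i], p_eur[i]))
            llr -= math.log10(genotype_prob(dosages[i], p_eur[i], p_eur[i]))
        scores.append(llr)
    return scores


# Hypothetical SNPs where the counted allele is common in Africans and rare in Europeans
p_eur = [0.1] * 6
p_afr = [0.9] * 6
print(window_llr([1, 1, 1, 1, 1, 0], p_eur, p_afr))
```

Scanning such windows along the chromosome and checking whether the window containing the variant scores strongly positive is one way to ask whether the variant sits on an African-derived segment.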
Allele frequency estimates from DNA pools for 317,000 SNPs for multiple European and worldwide populations and discovery of Ancestry Informative Markers for Europe.
The identification of Ancestry Informative Markers (AIMs) and inference of individual genetic history are useful in many applications, including studies of the geography and evolution of human populations, forensic sciences, pharmacogenomics, admixture mapping and association studies of complex diseases. While many AIMs have been reported that define strong genetic differences between major continents, it is more difficult to identify markers that reflect subtle within-continent diversity, such as the heterogeneous ancestry of European Americans contributed by different populations within Europe. We analyzed DNA pools, each for a different population, on Illumina HumanHap300 BeadArrays to estimate allele frequencies for ~317,000 single nucleotide polymorphisms (SNPs) in 9 European, 6 African, and 2 Amerindian populations from the Human Genome Diversity Project collection. We also evaluated the performance of this method by analyzing three HapMap pools (YRI, CHB, and JPT), for which the true allele frequencies are already known from the International HapMap Project. Allele frequency estimates differed between replicate chips by less than ±5% for 95% of the SNPs, and estimated and true frequencies differed by ±5-10% for 90% of the SNPs. The data for the nine European populations, from the western Caucasus, Scotland, Tuscany, Sardinia, France, Iberia, Russia, Northern Italy, and a Basque region, showed a clear excess of SNPs with large allele frequency differences (e.g. >30%) between most pairs of populations, compared with what would be expected given the sample sizes. These results provide a valuable resource of European AIMs for monitoring within-continent stratification in association studies. We are currently validating the most informative SNPs by individually genotyping the samples that formed the pools, as well as samples from additional European populations.
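The abstract does not detail how allele frequencies were derived from pooled intensities. One common approach, assumed here rather than taken from the study, estimates the frequency of allele A as A / (A + k·B), where A and B are the pool's two allele signal intensities and k is the mean A/B intensity ratio in known heterozygotes (correcting for unequal channel response). The sketch also computes replicate agreement and flags SNPs whose pooled frequencies differ by more than a threshold between two pools, mirroring the ±5% and >30% figures above. All intensities and frequencies are hypothetical.

```python
from statistics import mean
from typing import List, Sequence


def k_correction(het_a: Sequence[float], het_b: Sequence[float]) -> float:
    """Mean A/B intensity ratio in individually genotyped heterozygotes (true frequency 0.5)."""
    return mean(a / b for a, b in zip(het_a, het_b))


def pool_frequency(a: float, b: float, k: float) -> float:
    """k-corrected estimate of the allele-A frequency in a pool from its two signal intensities."""
    return a / (a + k * b)


def replicate_agreement(rep1: Sequence[float], rep2: Sequence[float], tol: float = 0.05) -> float:
    """Fraction of SNPs whose estimates from two replicate chips differ by less than `tol`."""
    return sum(abs(x - y) < tol for x, y in zip(rep1, rep2)) / len(rep1)


def large_difference_snps(freq1: Sequence[float], freq2: Sequence[float],
                          threshold: float = 0.30) -> List[int]:
    """Indices of SNPs whose pooled frequencies differ by more than `threshold` between pools."""
    return [i for i, (f1, f2) in enumerate(zip(freq1, freq2)) if abs(f1 - f2) > threshold]


k = k_correction([1.5, 1.5], [1.0, 1.0])  # hypothetical heterozygote intensities, so k = 1.5
print(pool_frequency(300.0, 100.0, k))     # 300 / (300 + 1.5 * 100)
print(large_difference_snps([0.10, 0.50, 0.80], [0.50, 0.55, 0.45]))  # [0, 2]
```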
Mitochondrial haplogroups are associated with asthma and total serum IgE levels
Maternal history of asthma and/or atopy is a major risk factor for the subsequent development of asthma and allergy in childhood. Although mitochondrial mutations have been implicated in several maternally inherited monogenic disorders, no studies of mitochondrial polymorphisms and asthma have been reported. We evaluated whether common mitochondrial haplogroups are associated with asthma and total serum IgE levels. Eight common mitochondrial single nucleotide polymorphisms (mtSNPs) were genotyped in two cohorts of European ancestry: 512 adult women with incident asthma and 517 matched controls participating in the Nurses’ Health Study (NHS), and 654 children aged 5-12 years with mild to moderate asthma participating in the Childhood Asthma Management Program (CAMP). Genotyping was performed using TaqMan® probe hybridization assays. Ninety-three randomly selected NHS samples were run in duplicate for all assays and showed 100% concordance. In the CAMP Study, genotype data from probands’ mothers were also 100% concordant across all assays. Completion rates in both cohorts were >95% for all markers. mtSNP 9055 was seen at a higher frequency in NHS asthma cases (11.1%) than in controls (8.0%, p = 0.02). Association analysis using haplo.score identified two haplogroups associated with asthma: one at a frequency of 3.83% among cases compared with 1.27% among controls (p = 0.0002), and another at a frequency of 9.97% among cases and 11.3% among controls (p = 0.04). The CAMP Study is a case-only (family-based) cohort, which precludes evaluation of mitochondrial SNP associations with asthma status. However, quantitative analysis of mitochondrial haplogroups identified two haplogroups, of 11.0% and 1.87% frequency, that were associated with log-transformed total serum IgE levels, an important intermediate phenotype in asthma and atopy (p = 0.006 and 0.01, respectively). These data suggest that common mitochondrial haplogroups influence asthma diathesis.
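Comparing the frequency of a single mtSNP in cases and controls amounts to comparing two proportions. The abstract does not state which test produced the single-marker p-value, and haplo.score is a separate, score-based haplotype method. As a generic illustration only, the sketch below implements a pooled two-proportion z-test with a two-sided normal p-value; the counts in the example are hypothetical and are not the NHS data.

```python
import math
from typing import Tuple


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int) -> Tuple[float, float]:
    """Pooled two-proportion z-test; returns (z, two-sided p-value).

    x1/n1: carriers/total among cases; x2/n2: carriers/total among controls.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


# Hypothetical counts: 50 of 100 cases vs 30 of 100 controls carry the variant
z, p = two_proportion_z_test(50, 100, 30, 100)
print(f"z = {z:.2f}, p = {p:.4f}")
```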