Haplogroup H of mitochondrial DNA, a far echo of the West in the heart of Central Asia
Through the millennia, Inner Asia played a pivotal role in shaping the history that greatly added to the cultural, ethnic, and genetic diversity observed throughout present Eurasia. Perhaps the two most significant phenomena witnessed in this part of the world were the ambitious expansion strategy employed by Mongolia’s most prominent personality, Genghis Khan, and the complex network known as the Silk Road that for nearly 3,000 years contributed to the exchange of goods and the transmission of philosophy, art, and science that laid the foundation for the great civilizations of China, India, Egypt, Persia, Arabia, and Rome, and in several respects to the modern world. Over the last few years, through an international collaborative effort, researchers at the Sorenson Molecular Genealogy Foundation were able to collect 2,727 DNA samples, informed consents, and genealogical data in Mongolia, Kyrgyzstan, and Kazakhstan. All the samples were sequenced for the three hypervariable segments of the mitochondrial DNA (mtDNA) control region to assess the genetic composition of the modern population of these countries. We identified ~600 different haplotypes that could be ascribed to more than 30 haplogroups and sub-haplogroups. As expected, most haplogroups are typical of modern East Asian populations, but intriguingly, many different Western Eurasian clades were also identified, with a particularly high incidence of H (~8.0%), the most common haplogroup in Europe. This feature cannot be attributed to genetic drift since different H sub-lineages have also been identified, each of them represented by several different haplotypes. The mtDNA distribution profile in the heart of Central Asia suggests a direct link between this area and Western Eurasia that could be explained by ancient migrations or by more recent historical events, such as Genghis Khan’s conquering efforts and trade or cultural exchanges along the Silk Route.
To discriminate between these two possible scenarios, we are now analyzing a subset of these samples at the highest possible level of resolution - that of complete mtDNA sequences - focusing particularly on those H mtDNAs that seem to be the most informative considering their control-region haplotypes. Our preliminary data seem to favor rather ancient genetic inputs from the West in shaping the peculiar mtDNA gene pool of Inner Asia’s present-day populations.
The following study seems to do precisely what I recently asked for:
However, as the PCA analysis shows, Ashkenazi Jews are distinct from both Europeans and non-Jewish Middle Eastern populations and cannot be viewed as a simple mix of the two; their distinctiveness must be -in part- due to the specific features of the small founder population of that community after it became effectively reproductively semi-isolated from gentiles after Roman times. It would be interesting to see different Jewish communities studied in the context of a broad variety of European and Middle Eastern populations, to determine whether Ashkenazi distinctiveness is specifically Ashkenazi or more generally Jewish distinctiveness; I would bet on a combination of the two.
Abraham's children in the genome era: Major Jewish Diaspora populations comprise distinct genetic clusters with shared Middle Eastern ancestry
Despite residence all over the world, Jewish populations have maintained continuous genetic, cultural, and religious tradition over 4,000 years. The unique ethnic makeup and social practices provide an invaluable opportunity to understand their genetic origins and migrations and to elucidate the genetic basis of complex disorders. To generate a comprehensive HapMap of ethnically diverse, healthy Jewish populations, we used the Affymetrix array 6.0 to genotype 381 samples recruited from 7 Jewish communities with different geographic origins: Eastern European Ashkenazim; Italian, Greek and Turkish Sephardim; Iranian, Iraqi, and Syrian Mizrahim (Middle Easterners). Here, we present population structure results from compiled datasets after merging with the Human Genome Diversity Project and the Population Reference Sample studies, which consisted of 146 non-Jewish Middle Easterners (Druze, Bedouin and Palestinian), 30 northern Africans (Mozabite from Algeria), 1547 Europeans, and 653 individuals from other African, Asian, Latin American, and Oceanian populations. Both principal component analyses and multi-dimensional scaling analysis of pairwise Fst distance show that Jewish populations form a cluster clearly distinct from all major continental populations. The results also reveal a finer population substructure in which each of 7 Jewish populations studied here form distinctive clusters - in each instance within group Fst was smaller than between group, although some groups (Iranian, Iraqi) demonstrated greater within group diversity and even sub-clusters, based on village of origin. By pairwise Fst analysis, the Jewish groups are closest to Southern Europeans (i.e. Tuscan Italians) and to Druze, Bedouins, Palestinians. Interestingly, the distance to the closest Southern European population follows the order from proximal to distal: Ashkenazi, Sephardic, Syrian, Iraqi, and Iranian, which reflects historical admixture with local communities. 
STRUCTURE results show that the Jewish Diaspora groups all demonstrated Middle Eastern ancestry, but varied significantly in the extent of European admixture. There is almost no European ancestry in Iranian and Iraqi Jews, whereas Syrian, Sephardic, and Ashkenazi Jews have European admixture ranging from 30% to 60%. Analysis of identity-by-descent provides further insight on recent and distinct history of such populations. These results demonstrate the shared and distinctive genetic heritage of Jewish Diaspora groups.
So, it seems that there will soon be real genomic data on the source and extent of admixture in Jews. The absence of Greek and Anatolian samples may make it harder to find the sources of such admixture, but the Tuscans, who are reasonably close to them in a pan-European context, should serve well as a substitute. In a recent study (in which Anatolians were not included), the closest populations to Ashkenazi Jews were Italians of mostly southern provenance (Fst=0.0040) and Greeks (Fst=0.0042); Tuscans were also fairly close (Fst=0.0066).
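For readers unfamiliar with how small numbers like Fst=0.0040 arise, here is a minimal sketch of a two-population Fst computed from allele frequencies. It assumes biallelic SNPs, equal weighting of the two populations, and the simple heterozygosity-based definition (HT - HS) / HT combined as a ratio of sums across SNPs. The studies quoted above may well have used a different estimator with sample-size corrections, and the function name `fst_two_populations` is purely illustrative.

```python
def fst_two_populations(freqs_a, freqs_b):
    """Heterozygosity-based Fst between two populations.

    freqs_a, freqs_b: allele frequencies of the same allele at the same
    SNPs in population A and population B (values between 0 and 1).
    Assumes equal population weights and no sample-size correction.
    """
    if len(freqs_a) != len(freqs_b):
        raise ValueError("both populations need the same number of SNPs")

    numerator = 0.0    # sum over SNPs of (HT - HS)
    denominator = 0.0  # sum over SNPs of HT
    for p_a, p_b in zip(freqs_a, freqs_b):
        p_bar = (p_a + p_b) / 2
        h_total = 2 * p_bar * (1 - p_bar)  # expected heterozygosity, pooled
        h_within = (2 * p_a * (1 - p_a) + 2 * p_b * (1 - p_b)) / 2
        numerator += h_total - h_within
        denominator += h_total

    if denominator == 0:
        return 0.0  # every SNP is fixed for the same allele in both groups
    return numerator / denominator


# A single SNP at 40% in one population and 60% in the other
print(fst_two_populations([0.4], [0.6]))
```

Even a 20-point frequency difference at one SNP gives an Fst of only a few percent under this definition, which is why values below 0.01 still separate European populations in a meaningful way.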
The following study seems to demonstrate my recent suggestion of archaic admixture in Africa itself:
It does not, however, tell us that this is because of archaic introgression in Europeans. The culprit could equally well be long-term population structure in Africa, i.e., the presence of "modern" and "archaic" populations in Africa itself.
Deep population structure in sub-Saharan African populations
We analyzed ~500 Kb of resequencing data from 91 different intergenic regions in samples from three sub-Saharan African populations: Mandenka from Senegal, Biaka pygmies from the Central African Republic and San from Namibia. We employed novel methodology to estimate the split times and migration rates between populations. We found strong evidence for split times that predate the exodus of modern humans out of Africa (e.g., > 100 Kya). In addition, we also found evidence of ancient admixture (with unknown ‘archaic’ human groups) in the recent history of both the Biaka and the San.
Analysis of Genomic Admixture in Costa Rica Population
Costa Rica (CR) population is a unique population representing a typical admixture of major continental ancestral populations. 1,301 samples collected from participants in a population-based study conducted in the Guanacaste region of CR were genotyped on a custom Illumina iSelect chip harboring 27,635 SNPs. The SNPs on the chip were selected based on multi-ethnic tagging strategy for three HapMap populations: CEU, YRI and JPT+CHB and cover 1,000 candidate genes/regions for a range of cancers. This data set was sufficiently large for the investigation of population substructure in our CR study and the examination of linkage disequilibrium (LD) patterns. Three HapMap major continental populations and a Native American population from the Illumina iControl DB were used as the reference populations for these analyses. Our preliminary results indicate that the Guanacaste CR population was formed mainly by a three-way admixture with 42.5%, 38.3% and 15.2% Native Indian, European, and African respectively. In addition, 4.0% residual genetic component derived from Asians was observed in our CR samples. Both model based STRUCTURE program and Principal Component Analysis (PCA) revealed consistent substructure pattern for the CR population. The magnitude of LD in the CR population seems to be smaller than all the reference populations except YRI. A more detailed knowledge of the underlying genetic structure of the CR population would be informative to assess its population genetic history and to assist in the interpretation of investigations of complex diseases in the CR or a comparably admixed population.
Analysis of Genetic Substructure of Han Chinese Using Genome-Wide SNP Arrays: Implication for Association Studies.
China will start this year a $30 million effort of genome-wide association studies (GWAS) of common diseases in Chinese populations, which have been largely underrepresented in similar efforts worldwide. A general concern is population stratification (ancestry differences) among subpopulations, which can cause false positive associations. Han Chinese is the largest ethnic group in the world; however, its population substructures are often expected and yet not well characterized. In this study, we examined population substructures in a diverse set of >1,700 Han Chinese samples collected from 26 regions, each genotyped with at least 160K single nucleotide polymorphisms (SNPs). Our results showed that: (a) the Han Chinese population has a complex substructure, with the main observed clusters roughly corresponding to northern Han, central Han and southern Han; (b) Han Chinese samples collected from large cities, such as Shanghai, Beijing and Guangzhou, show diverse sources of ancestry, including the three aforementioned clusters; (c) HapMap samples (CHB & CHD) and HGDP samples (Han & Han-NChina) deliver a limited representation of Han Chinese people. Building on the above insights, we investigated false positive rates and statistical power in various study designs using both empirical and simulated data. We further explored sample collection strategies and public data usage for future association studies.
It will be interesting to see if the authors of the following study estimated gene flow in non-southern European populations as controls, to establish how much excess Sub-Saharan admixture is detected in the three southern European samples, and exactly what "methods that can infer admixture proportions in the absence of accurate ancestral populations" they used. Hopefully they will also extend their linkage disequilibrium analysis to the other populations besides Spaniards.
Characterizing the history of sub-Saharan African gene flow into southern Europe
Recent analyses of whole-genome SNP data sets have suggested a history of sub-Saharan African ancestral contribution into southern Europe but not in northern Europe, consistent with previous analyses based on the Y chromosome and mitochondrial DNA. However, there has been no characterization of the proportion of African admixture in southern Europe, or of its date. Here we analyze data from ~450,000 autosomal SNPs in the Population Reference Sample, ~650,000 SNPs from the Human Genome Diversity Panel, and ~1.5 million SNPs from the HapMap Phase 3 Project, and studied patterns of correlation in allele frequencies across populations to confirm the evidence of African ancestry in many southern European populations but not in northern Europeans. Using methods that can infer admixture proportions in the absence of accurate ancestral populations, we estimated that the proportion of sub-Saharan African ancestry in Spain is 2.4 +/- 0.3%, in Tuscany 1.5 +/- 0.3%, and in Greece 1.9 +/- 0.7% (1 standard error). We also studied the decay of admixture linkage disequilibrium with genetic distance, which provided a preliminary estimate of the date of African gene flow into Spain of roughly 60 generations ago, or about 1,700 years ago assuming 28 years per generation. This date is consistent with the historically known movement of individuals of North African ancestry into Spain, although it is possible that this estimate also reflects a wider range of mixture times.
Genome-wide patterns of population structure and admixture among Hispanic/Latino populations
In order to document genome-wide patterns of variation in Hispanics/Latinos (HL’s) we genotyped individuals from five distinct populations recruited in the US: Mexico, Colombia, Ecuador, Dominican Republic and Puerto Rico. We present population structure results from an extensive genome-wide SNP dataset compiled by merging Affymetrix 500K and Illumina 650K data from these populations together with the Human Genome Diversity Panel, HapMap, Mao et al (2005), and POPRES studies. We apply Principal Component Analysis (PCA) and a clustering method, frappe, to infer admixture and genetic relationships of 262 HL individuals with 467 Africans, 715 Europeans, and 210 Native Americans comprising a total of 88 populations. We observe substructure within Native Americans, and, as expected, find that the admixed HL populations show Native American ancestry derived from local Native American populations. We find striking differences in estimated population-wide mean African, European and Native American ancestry proportions which are consistent with historical admixture and proximity to slave trade routes. The Dominican Republic and Puerto Rico, located on islands along slave trade routes, show high levels of African Ancestry (means 41.7% and 23.6% respectively) with less Native American Ancestry (11.5% and 18.9%). Colombians show a wide range of both African and Native American ancestry, though they have an overall mean of slightly higher Native American ancestry (36.3%) and lower African ancestry (11.7%) than the highly-African Dominicans and Puerto Ricans. Ecuadorians show the highest Native American mean ancestry (54.0%) with low estimated mean African Ancestry (7.3%). Mexico shows the largest range of Native American ancestry (11.0% - 79.0%) with an overall mean of 50.1% Native American ancestry and the lowest African ancestry (5.6%).
Our study shows a broad range in admixture proportions across different HL individuals as well as different admixture patterns across populations. We also compare this genotype data with mtDNA and Y chromosome genotypes and use simulations to estimate ancient male and female sex ratios in each HL population. Lastly, we discuss implications of population structure for genome-wide association studies in admixed populations such as HL’s, especially when recruited in the United States.
A new statistical method to infer population admixture events using genetic variation data
We present a novel statistical method that uses densely-spaced Single-Nucleotide-Polymorphism (SNP) data to identify the major admixture events occurring throughout a population’s history. The model has several advantages over leading available analytical approaches in this area, such as principal-components-analysis and STRUCTURE. In particular it can simultaneously (i) take advantage of the information inherent in patterns of linkage disequilibrium, i.e. non-random associations amongst neighbouring SNPs along a chromosome, (ii) efficiently analyse hundreds of individuals at hundreds of thousands of SNPs genome-wide, and (iii) allow for relatively straight-forward interpretation and direct inference of key historical parameters, such as the proportions and times of major admixture events. Using simulated data matched to currently available human datasets, we show that our model can identify and accurately date admixture events that have occurred between 7 and 150 generations ago. As our technique exploits the rich information in genetic data to infer details of a population’s admixture history, it marks a powerful complement to anthropological research and can help to resolve a number of existing controversies. We present results from applications of our model to two datasets: (1) SNP data from 22 distinct genetic regions for individuals from three chimpanzee populations in Africa; (2) genome-wide 650K SNP data for individuals from 53 world-wide populations of the Human Genome Diversity Panel (Science 319, 1100-1104). We highlight a number of intriguing new insights from these analyses. For example, the chimpanzee analysis showcases the model’s ability to infer the relative divergence among populations. The human analysis identifies several important admixture events, some of which are historically well-established (e.g. identification of recent European genetic influx into the Maya Native American population), others that can be placed into a clear historical context (e.g. an East Asian genetic influx into several Central and South Asian populations dated precisely to the era of the Mongol empire), and some that are to our knowledge novel (e.g. admixture in the Cambodian population between a Central/South Asian source and an East Asian source dated to around the period of the Cambodian Empire).
Bayesian methods of estimating ancestry using whole-genome SNP data
Estimation of the genetic ancestry of an individual is useful for association studies, disease risk prediction, population genetic analyses and is of inherent interest for the individual themselves. We have investigated methods of estimating ancestry using whole-genome SNP data on each individual. We focus on the scenario where the goal is to determine ancestry in relation to a set of genotype or haplotype data that is available from a set of distinct source populations, for example, the HapMap 2, HapMap 3 or 1000 Genomes datasets. Inference in this setting can focus either on the estimation of global ancestry, in which an overall estimate of the proportion of ancestry from the source populations is needed, or local ancestry, which aims to partition an individual genome into distinct segments of ancestry from the source populations. We have compared 2 models based on the estimated allele frequencies in the source populations at a set of unlinked SNPs. Model 1 only models global admixture, whereas Model 2 models both global and local admixture. Using simulated individuals with differing proportions of CEU and YRI admixture (based on HapMap3 data) we find that there is a relatively small difference in the mean square error of the estimates of global admixture from the 2 methods (1.16 × 10^-4 and 8.88 × 10^-5 respectively). Since Model 1 is much faster to fit than Model 2 these results suggest that Model 1 can be used to estimate the level of global ancestry, or at the very least will be useful as an initial estimate for use in Model 2. Further investigation is required to see how these results hold for more genetically similar source populations. In contrast, the mean square error for the estimates of local admixture from the 2 methods is 0.298 and 0.0861 respectively, suggesting that an explicit model of local ancestry is needed to carry out this level of inference.
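The abstract does not spell out its two models, so what follows is only a guess at what a global-only model at unlinked SNPs might look like, not the authors' Model 1. It assumes two source populations with known allele frequencies, treats each genotype as two independent draws at the mixed frequency q*p1 + (1-q)*p2, and finds the ancestry proportion q by a plain grid search over the likelihood. All function names and the simulated frequencies are hypothetical.

```python
import math
import random


def log_likelihood(q, genotypes, freqs_1, freqs_2):
    """Log-likelihood of ancestry proportion q (share from population 1)."""
    total = 0.0
    for g, p1, p2 in zip(genotypes, freqs_1, freqs_2):
        f = q * p1 + (1 - q) * p2  # expected allele frequency for this person
        # Binomial(2, f) likelihood; the constant term is dropped
        total += g * math.log(f) + (2 - g) * math.log(1 - f)
    return total


def estimate_global_ancestry(genotypes, freqs_1, freqs_2, steps=100):
    """Grid-search maximum-likelihood estimate of q on [0, 1]."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda q: log_likelihood(q, genotypes, freqs_1, freqs_2))


def simulate_individual(q, freqs_1, freqs_2, rng):
    """Simulate allele counts (0, 1 or 2) for one admixed individual."""
    genotypes = []
    for p1, p2 in zip(freqs_1, freqs_2):
        f = q * p1 + (1 - q) * p2
        genotypes.append(sum(rng.random() < f for _ in range(2)))
    return genotypes


rng = random.Random(1)
n_snps = 2000
# Made-up source frequencies, kept away from 0 and 1 so the logs are defined
freqs_1 = [rng.uniform(0.05, 0.95) for _ in range(n_snps)]
freqs_2 = [rng.uniform(0.05, 0.95) for _ in range(n_snps)]

true_q = 0.3
genotypes = simulate_individual(true_q, freqs_1, freqs_2, rng)
estimate = estimate_global_ancestry(genotypes, freqs_1, freqs_2)
print("true q:", true_q, "estimated q:", estimate)
```

A model of this kind only ever returns one number per individual, which is consistent with the abstract's point that global ancestry is cheap to estimate while local ancestry needs an explicit model of segments along the chromosome.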
We are also investigating the utility and practicality of using linked SNP data to estimate global and local admixture.
A detailed phylogeography of mtDNA haplogroup C1d: another piece in the Native American puzzle
Recent studies based on complete mitochondrial DNA (mtDNA) sequences revealed that two almost concomitant paths of migration from Beringia led to the dispersal of the first Americans (Paleo-Indians) approximately 15-17 thousand years ago (kya). This first expansion was followed by later more restricted diffusion events from the same dynamically changing Beringian source. Thus, five pan-American (A2, B2, C1, D1, and D4h3a) and four geographically confined (D2, D3, X2a, and C4c) mtDNA haplogroups represent the current female legacy of the ancient migratory events that gave rise to the native populations of the double continent. Regarding haplogroup C1, all its members appear to belong to one of three branches: C1b (characterized by the control-region transition at np 493), C1c, and C1d (with the control-region transition at np 16051). These three sub-haplogroups are found throughout the Americas, thus supporting the scenario that they most likely differentiated at the early stages of the Paleo-Indian southward migration. If considered as three separate founders, C1b, C1c, and C1d would bring the currently known number of native pan-American lineages to seven. As a whole, the C1 haplogroup has an estimated age of 17.0-19.6 ky, while the three individual branches are dated 16.5-17.0 ky, 17.2-17.6 ky, and 7.6-9.7 ky, respectively. The extremely young age estimate of C1d has been attributed, at least for the moment, to a major underrepresentation of C1d mtDNAs (only nine complete sequences published to date) in the current Native American mtDNA phylogeny. We have addressed this issue in the current study by completely sequencing more than 60 novel mtDNAs belonging to haplogroup C1d, which were carefully selected on the basis of both control-region variation and geographic/ethnic origin.
Phylogeographic analyses have provided not only an accurate evaluation of the expansion time of C1d in the Americas, but also a detailed picture of its current distribution in both general mixed and indigenous populations.
Genetic diversity of European population isolates in the context of their geographic neighbors
Mapping traits in population isolates provides an opportunity to simplify the challenges of complex trait mapping because such populations likely have enhanced levels of linkage disequilibrium and reduced genetic heterogeneity for the underlying traits. Here we analyze high-throughput SNP genotyping data to compare genomic-scale patterns of variation in several European population isolates (Adygei, Basque, Orcadian, Roma from Slovakia, Sardinians, and Sorbs) and contrast their patterns of variation to geographically proximal populations. Our results reveal insights for the demographic history of each of these unique populations, suggest substantial variation among these population isolates in patterns of diversity, and highlight the importance of population selection in genome-wide association mapping.
Incompatibility of current Finnish mitochondrial diversity with simulations of assumed settlement history
Traditionally, geneticists studying Finnish population history have assumed a model where Northern and Eastern Finland were mostly uninhabited until the 16th Century A.D. and were then settled by small family groups from South-Western Finland. The reduced genetic diversity and the distinct Finnish disease heritage are seen as consequences of these founder effects. Y-chromosomal diversity is indeed reduced in the present population, especially in the eastern parts of the country. However, mitochondrial diversity is not heavily reduced compared to South-Western Finnish or other European populations. This discrepancy has been explained with the higher mitochondrial mutation rate having restored mitochondrial diversity in these populations since the founder effects.
In our view it seems unlikely that even with high mitochondrial mutation rates mtDNA diversity could be restored over a mere 17 generations after the alleged tight bottlenecks. Archaeological evidence also suggests a different settlement history, e.g. settlement beginning in South-Eastern instead of South-Western Finland.
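The intuition in the paragraph above can be explored with a toy forward simulation. The sketch below is plain Python rather than a dedicated simulation package, and every number in it (founder composition, population size, mutation probability) is a hypothetical placeholder, not a value from the study. It assumes a haploid Wright-Fisher population of constant size in which each new mutation creates a previously unseen haplotype.

```python
import random
from collections import Counter


def haplotype_diversity(population):
    """Nei's haplotype diversity: n/(n-1) * (1 - sum of squared frequencies)."""
    n = len(population)
    counts = Counter(population)
    return n / (n - 1) * (1.0 - sum((c / n) ** 2 for c in counts.values()))


def simulate(founders, size, generations, mu, rng):
    """Haploid Wright-Fisher simulation with infinite-alleles mutation.

    founders: list of integer haplotype labels carried by the founding women
    size: number of women in every later generation (kept constant here)
    mu: per-lineage, per-generation probability of mutating to a new haplotype
    """
    population = list(founders)
    next_label = max(population) + 1
    for _ in range(generations):
        new_population = []
        for _ in range(size):
            haplotype = rng.choice(population)  # pick a mother at random
            if rng.random() < mu:
                haplotype = next_label          # brand-new haplotype
                next_label += 1
            new_population.append(haplotype)
        population = new_population
    return population


# Hypothetical tight bottleneck: ten founders carrying three haplotypes
founders = [0] * 6 + [1] * 3 + [2] * 1
rng = random.Random(42)
descendants = simulate(founders, size=200, generations=17, mu=0.001, rng=rng)

print("founder diversity:   ", haplotype_diversity(founders))
print("descendant diversity:", haplotype_diversity(descendants))
```

Rerunning with different seeds and mutation probabilities gives a feel for how much, or how little, 17 generations can add on top of what the founders brought with them.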
In this study we use simuPOP, a state-of-the-art forward simulation tool, to simulate datasets corresponding to Finnish mitochondrial diversity under the traditional model and compare them with actual present-day Finnish data. We show that current mitochondrial variation is unlikely under this model, increasing the credibility of alternative hypotheses.
On the borderline between the east and the west: the maternal genetic background of Karelians
Introduction: The frontier between Finland and Russia represents one of the most conspicuous socioeconomic gaps in the world. Based on the mean gross national product, there is a ten-fold difference between the Russian Karelian Republic and Finnish Karelia. Otherwise these populations share the same geophysical environment. For these reasons, Karelia has been a very interesting field of research for multifactorial disease studies. However, this area has undergone many demographic incidents, such as wars and famine, which may cause local differences in the gene pool. In this study, we wanted to elucidate the maternal genetic background of Karelians. Materials: Blood samples were collected from healthy unrelated individuals without known foreign background from four Karelian districts: Aunus (n=218), Viena (n=87), Tver (n=61) and Finnish Karelia (n=70). The sample collection was performed according to the Basic Principles of the Declaration of Helsinki. Methods: The entire mitochondrial DNA was sequenced in 32 reactions per sample with the BigDye® Terminator v3.1 Cycle Sequencing Kit on the Applied Biosystems 3730 Genetic Analyzer sequencing machine. Sequence alignments were made with the SeqScape® Software, Version 2.5 (Applied Biosystems). Results: Haplogroup H was very common in all populations. However, H1a is almost absent in Finnish Karelia. U and its subhaplogroups were also common. In particular, U5b1b1 reached over 16% in Viena Karelians. U4 was most common among Tver Karelians. Conclusions: The maternal genetic background seems to be complex in this area. There are clear regional differences. There is also solid evidence of gene flow from various sources. Representation of the clearly Asian haplogroups is strikingly low.
Genetic Landscape of Eurasia Viewed from Large Allele Frequency Differences.
The diversification leading to modern human populations in Eurasia is one of the most important topics in the study of human expansions after leaving Africa. Most studies of Eurasian populations have used either limited markers or involved insufficient population coverage. We chose 68 markers based on large allele frequency differences among a few Eurasian populations and then typed them on 1766 individuals from 34 populations representing all subdivisions of Eurasia. Analyses using the STRUCTURE program showed a clinal east-west division when K=2, with a median border dividing Central Asia along the Ob River, the Kazakh highland, the western side of Pamir Mountains, and the southwestern side of the Himalayas. We fit curves to the STRUCTURE loadings using distances of the population coordinates from the median border. The genetic structure changed dramatically only within 2000km on each side of the border. At higher values of K the western populations of East Asia are the first to be distinguished (at K=3): Mongols, Tibetans, Qiang, and Baima are most distinct from the more eastern populations. At K=4 Southwest and South Asians are distinguished from the Europeans; at K=5 Southeast Asians and at K=6 Central Asians are successively distinguished from eastern East Asians. Several more isolated populations such as Samaritans, Atayals, or Micronesians were distinguished in different independent runs when K=7, providing no clear anthropological information. South Asians were always clustered with Southwest Asians with pronounced similarity to Central Asians. The failure to distinguish South Asians may be due to the selection of the markers with large allele frequency differences specifically between Europeans and East Asians. We also tested for statistical differences in the allele frequencies for all pairs of clusters when K=6.
The results showed significant borders (P < 0.0001), including those between western East Asians and eastern East Asians or Central Asians; however, non-significant borders were observed between Southwest Asians and Southeast Asians or western East Asians, and likewise between Central Asians and eastern East Asians. This indicates substantial gene flow in North Asia between eastern East Asians and Central Asians, and in South Asia between South Asians and Southeast Asians. Using increased population and marker coverage, this study helps to understand the details of genetic diversity and landscape of Eurasians.
Dairy intake associates with the IGF2 rs680 polymorphism to height variation in Greek children. The GENDAI study
Objective: Height is a classic polygenic trait with a number of genes underlying its variation. We evaluated the prospect of gene to diet interactions in a children cohort for the IGF2 rs680 polymorphism and height variation. Methods: We screened 795 peri-adolescent children (424 females) aged 10-11 years old from the Gene and Diet Attica Investigation (GENDAI) paediatric cohort for the IGF2 rs680 polymorphism. Results: Children homozygous for common allele (GG) were taller (148.9 ± 7.9 cm) comparing to those with the A allele (148.1 ± 7.9 cm), after adjusting for age, sex, and dairy intake (β ± SE: 2.1 ± 0.95, p=0.026). A trend for interaction for the IGF2 rs680 × dairy intake is also revealed (p=0.09). Stratification by IGF2 rs680 genotype revealed a positive association between dairy products intake and height only in A allele carriers, adjusted for the same confounders (standardized β=0.111, p=0.014). When dairy intake was classified, based on the median value, into two equal groups of low (1.9 ± 0.7 servings/day) and high dairy products intake (4.4 ± 1.5 servings/day), it was found that in A allele children high dairy eaters were significantly taller (p=0.05) compared with low dairy eaters (148.8 ± 7.9 cm vs 147.4 ± 7.7 cm respectively, adjusted for age and sex). Conclusion: A higher consumption of dairy products associated with increased height depending on the rs680 IGF2 genotype. Thus, exploring height variants and elucidating possible interactions with environmental factors like diet could help us to design
A Non-synonymous HNF4A Variant is Associated with Glycemia During Pregnancy and Offspring Head Circumference in Populations of European Ancestry in the HAPO Study
The Hyperglycemia and Adverse Pregnancy Outcome (HAPO) study is a multicenter, international study, which examined the association of maternal glucose levels with fetal growth and outcome in 25,000 pregnant women from multiple ethnic groups to demonstrate a continuous relationship between maternal glucose measures and birth size throughout the range of glucose concentrations. We hypothesize genetic factors contribute to these phenotypes, and examined 1536 fetal and maternal SNPs in 79 candidate loci previously implicated in insulin secretion or sensitivity to determine associations with maternal glycemia and insulin secretion (fasting glucose and C-peptide and 1-hr glucose from the OGTT) at ~28 weeks gestation and/or offspring size at birth (birth weight, length, head circumference, and sum of skinfolds) for HAPO mothers of European (Belfast and Manchester, UK, and Brisbane and Newcastle, Australia; N=3828) and Asian (Bangkok, Thailand; N=1813) ancestry and their offspring. Associations were assessed through linear regressions with the single trait/outcome under an additive genetic model adjusting for known confounders. Among our strongest signals was rs1800961G>A, which encodes a Thr>Ile amino acid change in exon 4 of HNF4A, recently identified in a GWAS meta-analysis as a variant associated with decreased HDL levels. In the HAPO study, this SNP was strongly associated with increased fetal head circumference (0.5cm [95%CI: 0.3-0.7] per maternal minor allele; P=1.2 × 10^-7) in those of European descent. The maternal minor allele was also weakly associated with 1-hour glucose (4.3mg/dL [95%CI: 0.5-7.9]; P=0.03), birth length (0.7cm [95%CI: 0.2-1.1]; P=0.003), birth weight (52.6g [95%CI: -8.0-113.3]; P=0.09), and sum of skinfolds (0.3cm [95%CI: -0.1-0.6]; P=0.13). This same minor allele in the fetal genome was weakly associated with cord C-peptide (0.1ug/dL [95%CI: 0.01-0.22]; P=0.03), and head circumference (0.2cm [95%CI: -0.1-0.4]; P= 0.08).
The same trends were observed among the Thai, although not significantly probably due to a reduction in power from the low risk allele frequency (<2%).>
In a recent study, Heyer used germline mutation rates to estimate time depth, so I am more inclined to take her dates at face value than those in papers which used "evolutionary" rates. It will be interesting to see which Y-chromosome types the authors associate with both the older and the recent expansions.
Super Y-chromosomes in Eurasia and the impact of social selection and Neolithic transition
Some Y-chromosomal haplotypes have been found at unusually high frequencies in Asian and European human populations. The massive spread of these lineages has been explained by the impact of social selection, i.e. the high reproductive success of some males and their relatives/descendants due to their high social status. The most well-known examples are the “Khan haplotype” and the “Manchou haplotype” in Asia, and the U’Neill haplotype in Ireland. But are these frequent haplotypes always associated with recent events of social selection, or could they be linked to much older processes? To address this question, we have surveyed ~3500 males in 97 populations from Turkey to Japan. We have focused on the 12 most frequently represented haplotypes in Eurasia and tested whether their expansions are linked to a specific factor such as language or subsistence methods. Our results show that both recent and ancient processes are responsible for the expansions of these lineages. The recent expansions (2000-3000 years), likely to be linked to social selection, are prevalent in Altaic-speaking and pastoral populations. This might indicate a recent cultural change in the social organization of these populations. The ancient expansions (8000-10000 years) are over-represented in Indo-European speaking and sedentary farmer populations, and are likely to be the result of the Neolithic transition.
Lactase Persistence; Multiple causal mutations in sub-Saharan pastoralists
Background Milk is the primary source of nutrition for newborn mammals, including humans. The majority of human adults, estimated at approximately 65%, are unable to digest lactose (the main carbohydrate in milk) effectively since lactase expression is down-regulated after weaning, as it is in other mammals. In some humans however, lactase expression persists into adulthood (lactase persistence, LP) allowing adult consumption of milk from other species, and the frequencies of this trait vary throughout the world. A C-T SNP -13910 bases upstream from the lactase gene (LCT) is associated with LP in Europe. The -13910*T is rare in milk drinking groups in Africa although two other variants (-13915*G, -14010*C) have been shown previously to be significantly associated with LP and in an accompanying abstract (Ingram et al) we confirm a third locus (-13907*G) and present a fourth candidate SNP. However some LP individuals have also been identified who carry none of these alleles. Aims To examine the distribution across Africa of these and other allelic variants; to examine other regulatory regions in population groups in which enhancer alleles are lacking. Results The geographic and ethnic distribution of -13907*G, -13910*T, -13915*G, -14009*G, and -14010*C in 10 different countries and 15 distinct ethnic groups across Africa (n=1221 individuals) is presented here. Several other variants in this enhancer region are also described here for the first time. These tightly clustered enhancer variants are more frequent in pastoralist milk drinking groups than agriculturalist populations and are associated with several different LCT core haplotypes. Two further candidate regulatory regions have been sequenced in the same populations including a 1000bp region immediately upstream from LCT where novel variants have been found. 
Conclusions The data support the notion that many different mutations do have a functional role in LP, and that the trait has arisen independently several times, being subject to the positive selection conferred by the increased ability to digest milk lactose by people in pastoralist societies.
Extreme Evolutionary Disparities Seen in Positive Selection Across Seven Complex Diseases
Genome-wide association studies (GWASs) have successfully illuminated disease-associated variation. But whether human evolution is heading towards or away from disease susceptibility remains an open question. We analyzed the seven diseases studied by the Wellcome Trust Case Control Consortium (WTCCC) to calculate the relative selective pressure at every significant locus. Results reveal striking differences between the seven studied diseases. We find evidence of recent positive selection in favor of alleles increasing the risk of Type 1 Diabetes (T1D), Crohn’s Disease (CD), Hypertension (HT), Rheumatoid Arthritis (RA), and Bipolar Disorder (BD). Risk-associated alleles (defined as the allele most strongly associated with disease among associated SNPs) for Type 2 Diabetes (T2D) fall largely within the random neutral region, and Coronary Artery Disease (CAD) shows less positive selection than expected by chance. When only protective alleles are considered (defined as the allele least strongly associated with disease among associated SNPs), we find that only SNPs associated with T1D, CD, and RA appear to exhibit significant signatures of positive selection. There is significant asymmetry in the 96 SNPs strongly associated with T1D (p-value ≤0.005) showing strong signs of positive selection, with 79 SNPs selecting for the risky allele, and only 17 SNPs selecting for the protective allele. Furthermore, selection patterns of CAD fall far below the levels expected by chance, implying stable allele frequencies. Results reveal that the evolutionary trajectories of T1D and CD favor risk alleles, possibly due to their simultaneous role in protection from infectious diseases. These results inform current understanding of disease etiology, thus aiding efforts to discover novel approaches to disease treatment and prevention.
Detecting Natural Selection in the Human Genome from Pilot1 Data in the 1000 Genomes Project
Identifying signatures of natural selection in the human genome is of fundamental importance for the study of population evolution and for biomedical research. The distribution of selection across the genome will provide important functional information. Natural selection modifies the level of variability within and between populations and shapes the pattern of genetic variation in the genome. Genetic variation in the genome is the raw data for detection of natural selection. The 1000 Genomes Project produces whole genome sequencing data and offers a unique opportunity to scan the genome for signatures of natural selection. Five statistics, Tajima’s D, Fu and Li’s F, Achaz’s Y, Fay and Wu’s H and Zeng et al.’s E (based on comparing the site frequency spectrum within a population), and the Fst statistic (based on the measure of population subdivision) were applied to the Pilot 1 data of the 1000 Genomes Project, in which 344 chromosomes from ASI, CEU and YRI were sequenced, to scan the entire genome for selection. More than 20 million variant sites were identified, 4.8 million of them common to the three populations. We calculated seven statistics in 10 kb and 100 kb windows across the genome for each population and obtained their empirical distributions. Results show that the two window sizes lead to similar distributions. The proportional rank of the test statistic in a particular window compared with the overall empirical genomic distribution was taken as the empirical P-value for that window. We identified 3,046 candidate selection regions in the ASI population, 2,015 in CEU, and 2,204 in YRI at the 5% empirical significance level in 10 kb windows using the five statistics based on differences in the frequency spectrum.
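The rank-based empirical P-value described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name `empirical_p_values`, the choice of tail, and the toy per-window Tajima's D values are all assumptions made for illustration.

```python
from bisect import bisect_left, bisect_right


def empirical_p_values(window_stats, tail="lower"):
    """Empirical P-value per window: the fraction of all windows whose
    statistic is at least as extreme as this window's value.

    tail="lower" treats small values as extreme (e.g. strongly negative
    Tajima's D); tail="upper" treats large values as extreme (e.g. Fst).
    """
    ordered = sorted(window_stats)
    n = len(ordered)
    p_values = []
    for value in window_stats:
        if tail == "lower":
            # Number of windows with a statistic <= value.
            count = bisect_right(ordered, value)
        else:
            # Number of windows with a statistic >= value.
            count = n - bisect_left(ordered, value)
        p_values.append(count / n)
    return p_values


# Hypothetical Tajima's D values for ten windows (not real data).
tajima_d = [-2.1, -0.3, 0.4, 1.2, -1.7, 0.0, 0.8, -0.5, 0.1, 2.0]
p_lower = empirical_p_values(tajima_d, tail="lower")

# Windows in the lowest 10% of the empirical distribution.
candidates = [i for i, p in enumerate(p_lower) if p <= 0.10]
```

Because the P-value is only a rank within the genome-wide distribution, a fixed fraction of windows is flagged at any threshold whether or not selection has acted, so flagged windows are candidates rather than confirmed targets.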
Among 457 candidate selection genes reported in PubMed, we detected 102 in ASI, 53 in CEU, and 101 in YRI, with 11 common to the three populations, using the familiar Tajima's D test. By comparison, we obtained 3.9 million SNPs and a whole-genome fixation index of about 0.10-0.11. By comparison with the empirical genome-wide distribution of FST, we identified 5,278 candidate selection regions at an empirical significance level of 2.5% from the 22 autosomal chromosomes. Of 581 selection regions identified by FST in the literature, 294 overlap our results.
Genomic Landscape of Positive Natural Selection in North European Populations
Analysing genetic variation of human populations to detect loci that have been affected by positive natural selection is important for understanding adaptive history and phenotypic variation in humans. In this study, we analysed recent positive selection in Northern Europe from genome-wide datasets of 250 000 and 500 000 single nucleotide polymorphisms in a total of over 1000 individuals from Great Britain, Northern Germany, Eastern and Western Finland, and Sweden. Coalescent simulations were used to demonstrate that the integrated haplotype score (iHS) and long-range haplotype (LRH) statistics have sufficient power in genome-wide datasets of different sample sizes and SNP densities. Furthermore, the behavior of the FST statistic in closely related populations was characterized by allele frequency simulations. In the analysis of the North European dataset, dozens of regions in the genome showed strong signs of recent positive selection. Most of these regions have not been discovered in previous scans, and many contain genes with interesting functions (e.g. RAB38, INFG, NOS1AP, and APOE). In the putatively selected regions, we observed a statistically significant overrepresentation of genetic association to complex disease, which emphasizes the importance of the analysis of positive selection in understanding the evolution of human disease. Altogether, this study demonstrates the potential of genome-wide datasets to discover loci that lie behind evolutionary adaptation in different human populations.
Evidence of Indigenous American specific selection in skin pigmentation genes
Recent studies of selection in human pigmentation genes have focused on Old World populations, neglecting the evolutionary changes that have occurred in Indigenous American populations since their migration into the Americas. Previous research shows correlations between Indigenous American ancestry and skin pigmentation variation, suggesting a genetic role in the determination of skin pigmentation among these populations. However, few genes contributing to these differences have been described. To identify genes that may have undergone Indigenous American specific changes, this work examines signatures of selection in 82 pigmentation candidate genes by genotyping 88 indigenous individuals from Central and South America using the Affymetrix Genomewide Human SNP Array 6.0. The resulting 906,600 single nucleotide polymorphisms (SNPs) were surveyed for signatures of selection in the Indigenous American populations compared to the HapMap Phase I populations. Evidence of selection was identified using four measures selected for the complementarity of their approaches, including the reduction in heterozygosity (lnRH), Locus-Specific Branch Length (LSBL), Tajima’s D, and by examination of the haplotype block structure. When computing lnRH and LSBL as well as when examining changes in haplotype frequency, the East Asian and European HapMap populations were included because they are the most closely related populations available. These analyses differentiate the selective changes that appear to be shared among East Asian and Indigenous American populations from those that are unique to the Indigenous American populations. For each test, the top 5% of the empirical distribution of results was examined and pigmentation genes falling in this tail of the distribution were considered to show statistically significant evidence of selection.
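As an illustration of the Locus-Specific Branch Length mentioned above, the following Python sketch computes LSBL for a focal population from three pairwise FST values at a single SNP, using the standard three-population form LSBL_A = (F_AB + F_AC - F_BC) / 2. The FST estimator here is a deliberately simple heterozygosity-based one with equal population weights, chosen for brevity; it is an assumption and not necessarily the estimator used in the study. The function names and allele frequencies are hypothetical.

```python
def pairwise_fst(p1, p2):
    """Simplified per-SNP Fst between two populations from allele
    frequencies p1 and p2: (H_T - H_S) / H_T with equal weights.
    Assumption: no sample-size correction is applied."""
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)
    if h_total == 0:
        # The SNP is monomorphic in both populations.
        return 0.0
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    return (h_total - h_within) / h_total


def lsbl(p_focal, p_b, p_c):
    """Locus-specific branch length of the focal population in a
    three-population tree built from pairwise Fst distances."""
    f_ab = pairwise_fst(p_focal, p_b)
    f_ac = pairwise_fst(p_focal, p_c)
    f_bc = pairwise_fst(p_b, p_c)
    return (f_ab + f_ac - f_bc) / 2


# Hypothetical allele frequencies at one SNP: Indigenous American
# (focal), East Asian, European.
focal_specific = lsbl(0.9, 0.1, 0.1)  # change unique to the focal branch
shared_change = lsbl(0.9, 0.9, 0.1)   # change shared with East Asians
```

A large LSBL for the focal population flags allele frequency change specific to its branch, whereas a change shared with one reference population is assigned to the other branches. That matches the abstract's stated aim of separating changes shared with East Asians from those unique to Indigenous Americans.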
Based on these analyses, 12 genes - ADAM17, POMC, AP3B1, OPRM1, SILV, OCA2/HERC, PLDN, MYO5A, RAB27A, CYP1A2, ATRN, and ASIP - show evidence of selection unique to the Indigenous American populations. Many of these genes have known functional roles in melanogenesis and suggest potential pathways responsible for the observed differences in skin pigmentation between Indigenous American and Old World populations.
Patterns of correlation between genetic ancestry and facial features suggest selection on females is driving differentiation.
Human facial features show extensive variation within and among populations. By investigating the relationship between dimorphism in facial features and genetic ancestry in different populations, we can explore the roles of sexual and natural selection on the human face. We measured sexual dimorphism in facial traits while controlling for the effects of overall size differences and then tested for interactions between sex and genetic ancestry. The study sample consists of 254 subjects (n=170 females, n=84 males), ages 18-35, showing West African and European genetic ancestry sampled in the United States and Brazil. Maximum likelihood genetic ancestry estimates were determined from 176 ancestry informative markers (AIMs), which allowed for the proportional estimation of genetic ancestry from four parental populations (West African, European, East Asian, and Native American). Three-dimensional photographs of faces were acquired using the 3dMDface imaging system (Atlanta, GA). Twenty-two standard anthropometric landmarks were placed on each image and XYZ coordinates were collected. All 231 possible pairwise inter-landmark distances were calculated and then log transformed. Using the pairwise distances, we tested whether some distances were larger in one sex than the other, having taken size into account, in a) African Americans sampled in the United States, b) Brazilians sampled in Brazil, and c) the combined African American and Brazilian sample. We found that several pairwise distances differed between the sexes. For example, the distance from the brow to nasal bridge was found to be more than 5% larger in females than males. We then tested for an interaction between sex and genetic ancestry by testing for differences in the slopes of the ancestry association between males and females. Although the pattern differed slightly between samples, after Bonferroni correction many correlations were found to be the same in both sexes.
However, females in all three samples had many additional significant correlations that were not seen in males, while males had very few correlations that were not found in females. The results of these analyses suggest that selection on females is driving the differentiation in facial features among populations.
Effect of natural selection on North Asian mitochondrial haplogroup variation
The human mtDNA exhibits striking, region-specific sequence variation. The regional distribution of mtDNA haplogroups has been attributed either to genetic drift assisted by purifying selection (Elson et al., 2004; Kivisild et al., 2006; Ingman, Gyllensten, 2007) or to an adaptation to different climates (Mishmar et al., 2003; Ruiz-Pesini et al., 2004). In an attempt to study the mode of selection in mtDNA variation in human populations we sequenced and analyzed 211 complete mtDNA sequences belonging to haplogroups A, C and D, accounting in total for 49.3% of mtDNA lineages in North Asia. The North Asian haplogroups A, C and D showed a highly significant deviation from the standard neutral model as well as a bell-shaped distribution of pairwise differences consistent with rapid population expansion. To determine the overall importance of selection in shaping human mtDNA variation we calculated the Ka/Ks ratio both for aggregated mtDNAs and for the 13 protein-encoding genes within particular haplogroups (A, C and D). We have found a prevalence of Ks over Ka within haplogroups A, C and D, indicating the influence of negative selection on mtDNA during evolution. Consistent with some previous reports, we have found the Ka/Ks ratio for the ATP6 gene to be the highest among the North Asian sequences, suggesting that this gene has been subject to positive selection. We have also observed a set of genes with a somewhat higher Ka/Ks ratio relative to other mitochondrial genes - CO2 for haplogroup A, ND3 and ND4 for haplogroup C. Meanwhile, the other approach, which takes into account the difference in NS/S ratios between the haplogroup-associated and private substitutions (Elson et al., 2004), shows significant departures from neutrality only for haplogroup D and its subhaplogroup D4. Furthermore, single gene analysis reveals a relatively strong influence of negative selection only in the CYTb gene within haplogroup D (p=0.011, NI=14.1).
In general, our results indicate that there is evidence for both gene-specific and lineage-specific variation in selection acting on North Asian mtDNAs.
Selection for blue eyes in Europe and light skin pigmentation in East Asia at OCA2/HERC2
OCA2 and HERC2 are two genes on chromosome 15 separated by less than 10 kb. Mutations in this region have been shown to have an effect on pigmentation, including causing oculocutaneous albinism type 2. In Europeans, a three SNP haplotype (rs4778138, rs4778241, rs7495174) and three individual SNPs (rs12913832, rs916977, rs1667394) have been associated with blue eyes. We have labeled the three SNP haplotype BEH1. We found that the first individual SNP, rs12913832, was in near complete LD with another SNP (rs1129038). We treat these two SNPs together as a haplotype, BEH2. We also found that the other two individual SNPs were actually in near complete LD with each other and decided to label them BEH3. In East Asians, a SNP (rs1800414) has been identified that is associated with a light skin pigmentation phenotype. We typed these eight SNPs in 64-70 population samples. We then examined the worldwide distribution of the four pigmentation alleles. We saw that the light skin allele was at its highest frequency in eastern East Asia, at midrange frequencies in Southeast Asia, and at lower frequencies in western East Asia. It is virtually absent from the rest of the world. BEH1 and BEH3 show very similar global patterns: low to midrange frequencies in Africa and East Asia, midrange frequencies in India and Eastern Siberia, and midrange to high frequencies in Southwest Asia, Europe, Western Siberia, the Pacific Islands, and the Americas. BEH2 shows a different pattern from the other two. It shows low frequencies in East Africa, India, Eastern Siberia, and the Americas, midrange frequencies in Southwest Asians and Southern Europeans, and high frequencies in Eastern and Northwestern Europe and Western Siberia. We then typed additional SNPs and tested each pigmentation allele for selection using the Relative Extended Haplotype Homozygosity (REHH) test.
We found that the light skin allele of rs1800414 is under selection in East Asia and that the blue eye allele of BEH2 is under selection in Europe and Southwest Asia. We show light skin pigmentation has been selected for in East Asia. This is likely due to lower UV exposure at the higher latitudes (compared to equatorial Africa) and the need for lighter skin for vitamin D production. We also show that blue eyes are selected for in Europe. This is most likely due to sexual selection, though another unknown effect of this particular allele could be selected for, with blue eyes as a side effect.
Ancestry variation along the genome in Latin American populations and implications for recent natural selection
Latin American populations stem from the admixture, starting about 500 years ago, of Europeans, Africans and Native Americans. Extreme deviation in ancestry estimates at certain genome locations (relative to the genome-wide average) could reflect the action of recent natural selection. We evaluated the distribution of ancestry estimates along the genome using 678 microsatellite markers in 249 individuals sampled from 13 admixed populations across Latin America. We found a significant deviation in ancestry, of more than four standard deviations from the genome-wide mean, at two genomic locations: an excess of European ancestry at 14q32 (Z-score = 4.14), and an excess of African ancestry at 6p22 (Z-score = 4.71). These deviations in ancestry were observed in the analysis of the combined dataset as well as in most of the individual populations examined. We showed that our findings are robust to the Native American ancestry populations used. We discussed the implications for recent natural selection in the context of the unique history of the New World, as well as the possibility of artifacts.
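The Z-scores quoted in this abstract express how far the ancestry estimate at one locus lies from the genome-wide mean, in units of the standard deviation across loci. A minimal Python sketch of that calculation follows. The function name, the threshold argument, and the per-locus ancestry values are hypothetical, and the authors' actual estimation procedure may differ.

```python
from statistics import mean, stdev


def ancestry_z_scores(locus_ancestry):
    """Standardize per-locus ancestry proportions against the
    genome-wide mean and standard deviation across loci."""
    mu = mean(locus_ancestry)
    sd = stdev(locus_ancestry)
    return [(x - mu) / sd for x in locus_ancestry]


def flag_outliers(locus_ancestry, threshold=4.0):
    """Indices of loci whose |Z| exceeds the threshold."""
    z = ancestry_z_scores(locus_ancestry)
    return [i for i, score in enumerate(z) if abs(score) > threshold]


# Hypothetical European-ancestry proportions at 51 loci: fifty loci
# near the genome-wide average and one locus with a marked excess.
european_ancestry = [0.49, 0.51] * 25 + [0.70]
outliers = flag_outliers(european_ancestry)
```

With several hundred markers tested, some large deviations are expected by chance alone, which is one reason the abstract also raises the possibility of artifacts.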