September 05, 2009

More ASHG 2009 abstracts

For the first part, see here.

See Part I for another study on Ashkenazi Jews. We will have to look at the details of the study when it comes out, but the fact that Ashkenazi Jews (a) fall between southern Europeans and Near Easterners, and (b) form a distinct cluster of their own at K=3 seems to support my theory that most of the European ancestry in Jews is of ancient origin in southern Europe rather than due to recent admixture with Central/Eastern Europeans. At K=2 the ancestral components are identified, but these components mixed a relatively long time ago, so that after a subsequent period of relative isolation a distinctive pattern formed out of the mixture, which is identified at K=3.

Genome-wide SNP analysis of Ashkenazi Jews reveals unique population substructure
The Ashkenazi Jews (AJ) are a genetic isolate that has been widely utilized in genetic studies of both mendelian and complex disorders. However, the genetic variation and population structure of the AJ have been previously investigated with relatively few individuals and few genetic markers. We have now genotyped a large AJ cohort with the Affymetrix 6.0 genome-wide SNP array. After strict quality control filters, genotype data at 775K SNPs in 466 unrelated AJ individuals were available for analysis. To investigate the genetic structure of the AJ relative to other populations we used principal components analysis (PCA) as well as the frappe clustering algorithm. When merged with the worldwide Human Genome Diversity Project dataset, PCA shows the AJ are distinct from all other groups, including both European and Middle-Eastern populations. Further PCA using AJ genotypes combined with a large European dataset again validates the separation of AJ from European populations. Interestingly, principal component one seems to largely separate European and Middle-Eastern populations geographically according to latitude with the AJ fitting South of Europe and North of the Middle-East. Additional analysis using the frappe population clustering algorithm is consistent with a unique population signature for the AJ. Limiting the frappe clustering to only two population groups, specifying k=2, reveals that AJ cluster more closely to Europeans than Middle-Eastern populations but when allowing three populations, k=3, AJ form a group distinct from both the Middle-East and Europe. Compared to European populations, AJ also show an increase in genome-wide linkage disequilibrium, consistent with possible founder effects. These findings will aid in the design and use of AJ in case-control and association studies and clearly demonstrate the genetic separation of AJ from other populations.

Another paper on the topic, albeit one which uses HLA haplotypes to infer admixture and is limited to Jewish/Central European admixture.

Admixture between Ashkenazi Jews and Central Europeans
When distinct populations inhabit the same geographic space, culture often acts to restrict random mating in our species, while at the same time preventing complete genetic privacy. The residency across Central Europe by the Ashkenazi Jews over the last thousand years is such a case. HLA typing from bone marrow donor registries in Israel, Poland and Germany were utilized to measure admixture between central European host populations and Ashkenazim. Inferred high resolution HLA A-B-DRB1 haplotype frequencies were generated from each population. A total of 1,676 Polish-origin Ashkenazim and 13,556 Polish haplotypes were analyzed, along with a similar sample of ~5 million German haplotypes. The informativeness of HLA haplotypes is shown by the A-B-DRB1 haplotype 0101-0801-0301, the most common haplotype found in northern Europe. HLA B*0801 bearing haplotypes are present in the Near East, but those B*0801 haplotypes carry the HLA C allele Cw*0702 instead of the Cw*0701 found in 0101-0801-0301. The 100 most common haplotypes constituted 53% of the total Ashkenazi, and 45% of the Polish, and 43% of the German samples, reflecting the sizeable total fraction of very rare haplotypes familiar in population samples of the diverse HLA system. The most common Ashkenazi haplotype had a frequency of 6.14% (n = 102.9) and the 100th haplotype was present at 0.29% (n = 4.86). Comparable values for the Polish sample were 5.83% (n = 790.3) and 0.13% (n = 17.6), respectively. Haplotypes from one population compared to those haplotypes in a second could be classified into three categories: less frequent, statistically identical or more frequent. In the graph of the ordered 100 Polish haplotypes, the less frequent Ashkenazi haplotypes supply a possible signature of admixture from the Poles into the Polish Ashkenazim, while the haplotypes more frequent in Ashkenazim than Poles are candidates for movement of genes from the Ashkenazim to the Poles. 
The averaged frequency differences between these categories give an indication of population admixture. The analysis showed that 1.8% of Polish haplotypes may be of Ashkenazi origin and 0.6% of Ashkenazi of Polish origin. The sample from Germany, in which the initial generations of Polish-Ashkenazi history was spent, was useful in demonstrating consistency of haplotype frequencies by rank order. The results show clear evidence of admixture occurring in both directions between two largely HLA-distinct populations.
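
The percentages and n values quoted in the abstract can be cross-checked against the sample totals. The sketch below assumes that each n is an expected haplotype count out of the 1,676 Ashkenazi and 13,556 Polish haplotypes; the abstract does not say so explicitly, and the variable names are mine.

```python
# Totals and n values are taken from the abstract. Treating n as an expected
# haplotype count out of the sample total is an assumption of this sketch.
samples = {
    "Ashkenazi": {"total": 1676, "top": 102.9, "hundredth": 4.86},
    "Polish": {"total": 13556, "top": 790.3, "hundredth": 17.6},
}


def percent(count, total):
    """Frequency of a haplotype as a percentage, rounded to two decimals."""
    return round(100.0 * count / total, 2)


# Map each sample to (frequency of most common, frequency of 100th haplotype).
frequencies = {
    name: (percent(s["top"], s["total"]), percent(s["hundredth"], s["total"]))
    for name, s in samples.items()
}

for name, (top, hundredth) in frequencies.items():
    print(f"{name}: most common {top}%, 100th {hundredth}%")
```

Under that reading the recomputed values match the quoted 6.14% and 0.29% for the Ashkenazi sample and 5.83% and 0.13% for the Polish sample.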

The following study demonstrates a point I have argued several times before with Afrocentrists, namely the intermediate genetic position of Ethiopians between Caucasoids and Sub-Saharan Africans. It also underscores the difference between social and biological classifications: Ethiopians are undoubtedly "socially" black in most other societies, but intermediate between Negroids and Caucasoids anthropologically. This reality was recognized even by early anthropologists, who coined the term Ethiopids to describe them as a separate intermediate category between Caucasoids and Negroids.

The distribution of sex-specific human genetic variation in Ethiopia.
Ethiopia has been proposed as a candidate location for the emergence of anatomically modern humans, and the source region for the expansion out of Africa. It is also a region of substantial cultural diversity as expressed in languages (Nilo-Saharan, Cushitic, Semitic, and Omotic language families), religions (Christians, Jews, Moslems and Animists), ethnic identities (over 80 groups) as well as many marginalised groups socially excluded on grounds of caste-like occupation, supposed origin, or both. The demographic history of Ethiopia over the past several thousand years has involved both sustained migration of Semitic speakers from the Arabian Peninsula as well as internal conquests of lands in the south. To investigate the demographic histories of ethnic groups we analysed a battery of SNPs and microsatellites on the non-recombining portion of the Y chromosome (NRY) and sequence variation in the Hypervariable Segment 1 (HVS1) of mtDNA (5756 samples from 45 ethnic groups). Commonly used summary statistics (gene diversity h, genetic distance Fst) were analysed within the context of non Ethiopian data e.g. West Africa (Igbo, Nigeria) and Europeans. We present preliminary results reporting a wide range of genetic diversity values within ethnic groups (h: NRY = 0.743 - 0.972, HVS1 = 0.962 - 0.996) and pairwise genetic distance values between groups (Fst: NRY = 0.000 - 0.294, HVS1 = 0.000 - 0.035). A clustering of Ethiopian groups was observed when using principal coordinate analyses with genetic distances, appearing midway between a West African Niger-Congo speaking group (Igbo of Nigeria) and an Indo-European speaking group (Greek Cypriots). Some south-western groups (e.g. Anuak) showed greater similarity to West-Africans while the culturally influential Amhara were more similar to Europeans. 
Gene flow between dominant Dawuro agriculturalists and excluded members of the Manja was sex-biased, with many more NRY haplotypes common to the two groups than mtDNA haplotypes, relative to the distribution of the two systems across all the ethnic groups. The marginalised group had a particularly low level of mtDNA HVS1 diversity (h = 0.705). Of particular interest is the extensive sharing of discriminating NRY and mtDNA haplotypes across many ethnic groups, suggesting either a) the creation or preservation of cultural diversity despite substantial inter-group gene flow or b) recent ethnogenesis of the currently extant groups.
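
For readers unfamiliar with the gene diversity statistic h quoted above, here is a minimal sketch. It assumes Nei's unbiased estimator, h = n/(n-1) * (1 - sum of squared haplotype frequencies); the abstract does not say which estimator was used, and the haplotype labels are hypothetical toy data, not values from the study.

```python
from collections import Counter


def gene_diversity(haplotypes):
    """Nei's unbiased gene diversity: h = n/(n-1) * (1 - sum(p_i^2))."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two sampled haplotypes")
    counts = Counter(haplotypes)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)


# Hypothetical toy samples, not data from the study.
diverse = ["H1", "H2", "H3", "H4", "H5", "H6"]  # every haplotype distinct
skewed = ["H1"] * 5 + ["H2"]                    # one haplotype dominates

print(gene_diversity(diverse))
print(gene_diversity(skewed))
```

A sample in which every haplotype is different approaches h = 1, while a sample dominated by one haplotype gives a much lower value, which is the sense in which h = 0.705 for the marginalised group counts as low.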

Yet another study of differences between ancient and modern mtDNA gene pools. I hope the 2012 crowd doesn't follow up on this for its own bizarre purposes...

Genetic Diversity of the Ancient People in Mesoamerica
DNAs were extracted from the human remains buried in the Moon Pyramid at archaeological Teotihuacan site in Mexico. Nucleotide sequences of their mitochondrial D-loop and SNP sites were determined by the PCR-direct sequencing. To reveal the genealogy of mitochondrial DNA sequences of the individuals buried in the Moon Pyramid and assess their positions among Native Americans, we first constructed a network of the mitochondrial DNA from the contemporary Native Americans; the northern Native Americans (Haida, Bella Coola, and Nuu Chah Nulth), the central Native Americans (Huetar, Kuna, and Ngöbé), and the southern Native Americans (Yanomami, Zoro, Gavião, and Xavante), and compared them with those of the individuals from the Moon Pyramid. All of the mitochondrial DNA types from the Moon Pyramid individuals were unique, and clear genetic affinities were not observed between the Moon Pyramid individuals and any of the 10 Native American populations. To investigate genetic diversity among the contemporary central Native American populations, we constructed a phylogenetic tree of their mitochondrial DNA sequences using the neighbor-joining method. There was a major mitochondrial DNA sequence common to these three central Native American populations. However, there were a relatively small number of mitochondrial DNA types in each population, most of which were, moreover, unique to each Native American population. Next we compared the mitochondrial DNA sequences of the Moon Pyramid individuals with those of the ancient Mesoamerican people, ancient Maya people from the classic Copán site. We also used Huetar people as a reference for the contemporary central Native Americans. The distribution of the mitochondrial DNA types found in the ancient Native Americans is greatly different from that found in the contemporary Native Americans. 
These results show that genetic diversity in the ancient Native Americans was not as low as that in the contemporary Native Americans, suggesting an occurrence of bottleneck in the past.

This will be of great interest to Y chromosome enthusiasts.

Improved resolution of the human Y-chromosomal phylogeny using targeted next-generation sequencing

The non-recombining part of the Y chromosome provides unique insights into male-specific aspects of human genetics and history. We are using next-generation Illumina sequencing to fully re-sequence targeted regions of the Y and resolve the Y-chromosomal phylogeny by characterization of additional single nucleotide polymorphisms (SNPs) on lineages of interest. Initially ~6 Mb of Y sequence (NCBI36:Y-chromosome: 12,308,579-18,230,132) is being generated for an African haplogroup A male. The strategy involves sequence enrichment by long template PCR of genomic DNA (10-20 ng/reaction) using overlapping fragments of 5.5 - 6.5 kbp. Currently ~70% of primer pairs work using a standard touchdown PCR protocol. Fragments obtained from a single individual are pooled and used for library preparation and Illumina sequencing. Re-sequencing generates accurate high coverage data; SNP calling and their subsequent validation will be presented. Most SNPs are expected to be rare but some are likely to resolve deep divisions within African populations. Subsequently, we aim to (1) determine the time depth of the human Y phylogeny, (2) resolve multifurcations in the major lineages by discovering additional SNPs on the relevant and (3) discover SNPs that mark any lineage of particular interest. In addition, we will be able to provide a subset of all primers that work well with this protocol to investigators who are interested in Y-chromosomal phylogenies so that comparable standard datasets can be generated for use by the community.

Female to male breeding ratio in the history of modern humans
Was the genetic contribution of men and women to successive generations the same? As a population, did we have fewer fathers than mothers? Was polygyny present among hominid lineages to influence relative divergence rates of autosomes and sex chromosomes? Students of genetic variation of the uniparentally inherited mitochondrial and Y-chromosome DNA confronted these questions; fewer addressed them by looking at the DNA diversity of autosomes and sex chromosomes (Hammer et al. 2009, Keinan et al. 2009) with equivocal results. Our approach is different: we analyzed the ratio of the population recombination rate, ρ, between autosomes and the X chromosome. The chromosome X recombines only in the female meiosis whereas autosomes undergo cross-overs in both male and female germ lines such that their relative ρ reflects changes in the breeding ratio, β. The estimate of β is calculated from the observed chromosomal ρ’s, obtained by InfRec (Lefebvre and Labuda 2008), after their calibration with the average chromosomal recombination rates known from pedigree data. We have tested our approach using coalescent simulations under different input parameters’ values and various demographic scenarios. For the HapMap populations we obtained β of 1.4 in Yoruba from West Africa, 1.2 in European and 1.0 in East Asian samples. This suggests that in the history of modern humans the reproductive variance between men and women did not drastically differ, thus consistent with the prevalence of monogamy or mild polygyny in the human lineage. Known incidences of polygyny may be of recent origin, related to rise of agriculture and shift from hunter-gathering to food producing economies, and therefore not sufficiently common to leave a strong genetic signature in the recombinational record. (Supported by GenomeQuebec/Genome Canada and Canadian Institutes of Health Research).
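
To give a feel for what a breeding ratio of 1.0 to 1.4 means, the sketch below uses the textbook relation between effective population sizes and the numbers of breeding females and males: N_e(autosome) = 4*Nm*Nf/(Nm+Nf) and N_e(X) = 9*Nm*Nf/(4*Nm+2*Nf), so that with β = Nf/Nm the ratio N_e(X)/N_e(A) equals 9(1+β)/(8(2+β)). This is an illustration only and not the authors' procedure: the abstract works from recombination rates ρ estimated by InfRec and calibrated with pedigree data, which is not reproduced here. The function names are mine.

```python
def ne_ratio_x_to_autosome(beta):
    """Expected N_e(X) / N_e(autosome) for breeding ratio beta = Nf / Nm."""
    return 9.0 * (1.0 + beta) / (8.0 * (2.0 + beta))


def breeding_ratio(ne_ratio):
    """Invert the relation above to recover beta from N_e(X) / N_e(A)."""
    # The ratio tends to 9/16 as beta -> 0 and to 9/8 as beta -> infinity.
    if not 0.5625 < ne_ratio < 1.125:
        raise ValueError("ratio outside the range attainable for beta > 0")
    return (16.0 * ne_ratio - 9.0) / (9.0 - 8.0 * ne_ratio)


# Breeding ratios reported in the abstract for the three HapMap samples.
for beta in (1.0, 1.2, 1.4):
    ratio = ne_ratio_x_to_autosome(beta)
    print(f"beta = {beta:.1f} -> N_e(X)/N_e(A) = {ratio:.4f}")
```

With equal numbers of breeding males and females (β = 1) the ratio is the familiar 3/4. The reported values of 1.2 and 1.4 move it only slightly above that, which fits the abstract's reading of monogamy or mild polygyny.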

Accurate inference of individual ancestry geographic coordinates within Europe using small panels of genetic markers

The study of genomewide datasets of thousands of individuals of European ancestry supports the close correspondence between genetic distances and geographic coordinates within Europe, especially when information from hundreds of thousands of genetic markers is used. In fact, Principal Components Analysis (PCA), summarizing genetic variation over the top two principal components (PCs), results in plots that are surprisingly reminiscent of geographic maps of Europe. We set out to discover those markers that are most closely correlated with geographic origin, seeking to predict individual ancestry at a fine level, and even for closely spaced populations. To this end we analyzed a previously described subset of the Population Reference Sample (POPRES). We focused on 12 populations and 1224 individuals for which geographic coordinates (longitude and latitude) of individual origin are given for at least 20 individuals per population. First, we performed a complete leave-one-out crossvalidation experiment using 447,212 SNPs, and a simple nearest neighbors approach to infer geographic coordinates. This resulted in extremely high accuracy, placing individuals within an average longitudinal error of 2.2 degrees, and an average latitudinal error of 0.88 degrees. Next, we applied an algorithm that we have previously described to select the top 5,000 SNPs that correlate well with population structure as captured by PCA. We then filtered highly correlated SNPs using standard linear algebraic algorithms for the column subset selection problem. We thus selected 500 maximally uncorrelated markers, which have a Pearson correlation coefficient of 0.92 with PC 1, and 0.83 with PC 2. We extensively validated the effectiveness of such SNP panels for genetic ancestry testing by once more performing a complete leave-one-out crossvalidation experiment on the 1224 studied individuals (approx. two weeks of CPU time in commodity hardware). 
Using 500 carefully selected SNPs we can place individuals within a few hundred kilometers of their reported origin (average longitudinal and latitudinal error of 4.7 and 1.9 degrees respectively). Finally, we crossvalidated our best panel of 500 SNPs on the HapMap CEPH European individuals, placing them accurately on the Northwestern corner of Europe. Not surprisingly, our SNP panel includes markers that are either within genes reported to be under selective pressure in Europeans, or in high LD with such genes.
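
The leave-one-out nearest neighbors experiment described in the abstract can be sketched as follows. Everything specific here is an assumption on my part: the abstract only says "a simple nearest neighbors approach", so the Manhattan distance on 0/1/2 genotype codes, the choice of k, and the four toy individuals are hypothetical and stand in for the 1224 POPRES individuals.

```python
def genotype_distance(a, b):
    """Manhattan distance between two genotype vectors coded 0/1/2."""
    return sum(abs(x - y) for x, y in zip(a, b))


def predict_coordinates(query, references, k=1):
    """Average the (lon, lat) of the k references closest to `query`."""
    nearest = sorted(references, key=lambda r: genotype_distance(query, r[0]))[:k]
    lon = sum(r[1][0] for r in nearest) / len(nearest)
    lat = sum(r[1][1] for r in nearest) / len(nearest)
    return lon, lat


def leave_one_out_errors(individuals, k=1):
    """Mean absolute longitude and latitude error over held-out individuals."""
    lon_err = lat_err = 0.0
    for i, (genotype, (lon, lat)) in enumerate(individuals):
        # Hold out individual i and predict it from everyone else.
        others = individuals[:i] + individuals[i + 1:]
        pred_lon, pred_lat = predict_coordinates(genotype, others, k)
        lon_err += abs(pred_lon - lon)
        lat_err += abs(pred_lat - lat)
    n = len(individuals)
    return lon_err / n, lat_err / n


# Hypothetical toy data: (genotype vector, (longitude, latitude)).
toy = [
    ([0, 0, 1, 2], (-3.0, 40.0)),
    ([0, 1, 1, 2], (-4.0, 41.0)),
    ([2, 2, 0, 0], (21.0, 52.0)),
    ([2, 1, 0, 0], (19.0, 51.0)),
]
mean_lon_error, mean_lat_error = leave_one_out_errors(toy, k=1)
print(mean_lon_error, mean_lat_error)
```

The two returned numbers play the role of the average longitudinal and latitudinal errors quoted in the abstract (2.2 and 0.88 degrees with all SNPs, 4.7 and 1.9 degrees with the 500-SNP panel).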

Genetic relationships among the ancient Chinese populations viewed from discrete cranial traits

The discrete cranial traits are informative in revealing the genetic relationship of human populations. Given little available knowledge on these traits, especially their underlying genetic determinants, the primary aim of this study is to select a small number of traits that are sufficiently informative to represent genetic differentiation among East Asian populations. We studied overall 51 traits for 1,578 skulls from 19 necropolises, and found that 5 traits could capture the largest variation in East Asian populations studied. They are accessory mandibular foramen, palatine torus, mandibular torus, mastoid foramen extra-sutural, and infraorbital suture. The analysis on these 5 traits resulted in similar population relationships to that using all 51 traits. The study on discrete cranial traits could not only facilitate exploration of the genetic relationship of populations, but could also allow identification of the genes underlying these anthropological traits.

Admixed ancestry and stratification of regional gene pools of Quebec
In Quebec, studies of different molecular polymorphisms have shown that the French Canadian gene pool is as diverse as its source European populations and, contrary to what was previously anticipated, does not display more homogeneity. To better understand the genetic structure of the contemporary population, we analyzed the origins and contribution of 7,798 immigrant founders identified in the genealogical ascendance of a sample of 2,221 subjects representative of the French Canadian population of Quebec. As expected, French founders are the most important in number (n=5,326) in all Quebec regions. They contribute for about 90% of the regional gene pools, except for regions located in the easternmost part of the province (76%), which are characterized by more diverse origins. Although this study supports the French founders’ importance, it also puts in the balance arguments in favor of the heterogeneity of the founding pool. The majority of immigrants landed as single member of their family, originating from all the regions of France. In addition, nearly all subjects have mixed origins, including French and non-French. Taken together, these results put into perspective the idea of the homogeneity of the origins of the French Canadians and of a pan-Quebec founder effect. The differential descent and genetic contribution of immigrant founders across regions points to the stratification of the French Canadian population of Quebec, showing an east-west gradient of diversity. These results will contribute to optimize study design in gene mapping studies relying on the founder effect in the French Canadian population of Quebec.

A nonsynonymous SNP in EDAR is associated with tooth shoveling
Teeth display variations among individuals in the size and the shape of cusps, ridges, grooves, and roots. In addition, there are certain dental characteristics which are predominant in certain human groups, such as tooth shoveling of upper incisors that is major in Asian populations but rare or absent in African and European populations. The common characteristics of dental morphology are thought to be determined mainly by genetic factors. However, genetic polymorphisms associated with dental morphology have not been elucidated yet. In humans, the ectodysplasin A receptor gene (EDAR) as well as the ectodysplasin A gene (EDA) is known to be responsible for hypohidrotic ectodermal dysplasia, a genetic disorder causing abnormal morphogenesis of teeth, hair, and eccrine sweat glands. Human genome diversity data have revealed that the derived allele of a nonsynonymous single nucleotide polymorphism (SNP), rs3827760 that is also called EDAR T1540C, is predominant in East Asian populations but absent in populations of African and European origins. It has recently been reported that the 1540C allele is associated with Asian-specific hair thickness. The aim of this study is to clarify whether the nonsynonymous polymorphism in EDAR is also associated with dental morphology in humans or not. For this purpose, we measured crown diameters and tooth shoveling grades, genotyped EDAR T1540C, and analyzed the correlations between them in Japanese populations. To comprehend individual patterns of dental morphology, we applied a principal component analysis (PCA) to individual-level metric data, the result of which implies that multiple types of factors affect the tooth size. This study clearly demonstrated that the number of the Asian-specific EDAR 1540C allele is strongly correlated with the tooth shoveling grade. The SNP significantly affected PC1 and PC2 in PCA, which denotes overall tooth size and the ratio of mesiodistal diameter to buccolingual diameter, respectively. 
Our study revealed a main genetic determinant of tooth shoveling that has classically received great attention from dental anthropologists. Further studies using powerful DNA technology will lead to clearer understanding about genetic factors for phenotypic variations in tooth morphology such as Carabelli’s tubercle, the numbers of cusps and roots, and the size balances shown in metric measurements.

Direct estimation of the microsatellite mutation rate
Characterizing the behavior of mutations is fundamental to our understanding of genetic variation. Attempts to directly observe DNA mutations arising from germline transmissions are confronted by two challenges: The large amount of DNA sequence that needs to be collected in order to observe a mutation (since the mutation rate in humans is estimated to be ~2x10^-8 per generation), and a poor signal-to-noise ratio, due to the fact that any modern genotyping technology has an error rate far exceeding the mutation rate. Using deCODE Genetics’ database of over 95,000 Icelanders genotyped at over 3,000 microsatellite loci, we directly observed mutations in germline transmissions from pedigrees. Microsatellites are thought to have mutation rates as high as 10^-3 per locus per generation. To overcome the genotyping error rate, which was estimated in this data set to be ≤10^-2 per allele call after appropriate filtering, we carried out two independent analyses: (1) We restricted our analysis to mother-father-child trios, and required the mutated allele to be genotyped at least twice in both the child and in the transmitting parent to confirm mutant transmissions. This identified 2,124 mutant events from 5.62 million instances of parent-child transmissions, yielding a mutation rate estimate of 3.78x10^-4 averaged across the markers that we analyzed. (2) We traced the haplotype affected by the mutation through local pedigrees, requiring that the mutant haplotype is observed in the affected proband’s children, and simultaneously, that the wildtype haplotype is observed in the affected proband’s siblings. This identified 788 mutant events from 1.59 million instances of parent-child transmissions, yielding a mutation rate of 4.96x10^-4. Our collection of mutant events is significantly larger than previous studies. 
This allows for categorical analyses of microsatellite mutation rates partitioned based on the gender and age of the individual transmitting the allele, as well as the repeat type and cytogenetic position.
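
The two rate estimates in the abstract follow directly from the counts it quotes: mutant events divided by parent-child transmissions. A minimal check, with variable names of my own choosing:

```python
def mutation_rate(mutant_events, transmissions):
    """Per-locus, per-generation rate as events divided by transmissions."""
    return mutant_events / transmissions


# Counts as quoted in the abstract.
trio_rate = mutation_rate(2124, 5.62e6)       # analysis (1): trios
haplotype_rate = mutation_rate(788, 1.59e6)   # analysis (2): pedigree tracing

print(f"trio analysis:      {trio_rate:.2e}")
print(f"haplotype analysis: {haplotype_rate:.2e}")
```

Both values sit below the 10^-3 upper figure mentioned for microsatellites and several orders of magnitude above the ~2x10^-8 rate the abstract quotes for DNA sequence.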


Vincent said...

We are using next-generation Illumina sequencing to fully re-sequence targeted regions of the Y and resolve the Y-chromosomal phylogeny by characterization of additional single nucleotide polymorphisms (SNPs) on lineages of interest. Initially ~6 Mb of Y sequence (NCBI36:Y-chromosome: 12,308,579-18,230,132) is being generated for an African haplogroup A male.

This is kind of encouraging. Though by my count 6 Mb of Y-SNPs would have about the same phylogenetic resolution as 75 Y-STRs (in that the sum of mutation rates is about the same), of course the SNPs are much better suited to building a phylogeny. If indeed this approach turns out to be cost-effective, and the goals of applying the same method to other samples and having a GenBank-like database are met, it will be a huge step forward. And of course it might be possible (depending on quality and overlap) to compare the 6 Mb sequence from haplogroup A to existing full sequences from E, I, O, and R to date not only Y-Adam but several other tree nodes as well.


Anonymous said...

I am only commenting on this as someone on another forum has quoted the highlighted statement for Ethiopians from this blog: "The demographic history of Ethiopia over the past several thousand years has involved both sustained migration of Semitic speakers from the Arabian Peninsula as well as internal conquests of lands in the south."

That statement does not comply with the evidence. Semitic speakers in the Arabian Peninsula, populations of over 70% J1 (in Yemen), have a reduced genetic diversity compared to Ethiopians. Ethiopians have over 0.40 Averaged gene diversity (AGD) compared with Omani, who are Peninsula Arabians, and have 0.25 AGD, even less than Tunisians who are about 0.30 AGD. The Sudanese J1 population which is known to be due to Arabian admixture and Arabian migration of males to the Sudan from the Arabian Peninsula are about 0.18 AGD.

It is impossible for Peninsula Arabians speaking Semitic languages albeit of the Old Arabian Semitic language variety, to have significantly contributed to the Ethiopian population.

You have to find another source for the more than 30% J1 in Amharic speaking Ethiopian men, and another source of the variety of Semitic language spoken by Ethiopians than the simple Arabian Peninsula Arabic speaking groups. It is more likely Ethiopia was colonised by another group of Semitic language speakers who were Caucasoid, and at an earlier date than the Arabians who seem to be Johnny come latelies to their region. The Semitic language group developed in Africa and its speakers colonised the Middle East, the Levant first and the Arabian Peninsula much, much later. At any rate, Semitic languages in the Middle East are no older than 5,000 years ago. Ethiopia was probably colonised by Semitic speaking Caucasoids much earlier than that. African speakers of Afro-Asiatic languages, Berber and now Arabic, have more genetically diverse J1 than any Peninsula Arabians. If the statement made by the report was true, it would be the reverse.

The conclusion of the report, as often is the case, is wrong and shows very poor scholarship.