October 27, 2005

HapMap publication explosion

Haplotype map offers new insights into human disease, evolution
Cambridge, MA, Wed., Oct. 26, 2005 – In several papers published this week in Nature, Nature Genetics, PLoS Biology and Genome Research, Broad researchers and an international set of collaborators announce substantial advances in relating human genetic variation to disease and understanding human evolutionary history.

This flurry of high-profile studies is grounded in data described in a significant paper published in the Oct. 27 issue of the journal Nature by an international consortium of more than 200 researchers from Canada, China, Japan, Nigeria, the United Kingdom and the United States. In this paper, the authors describe the common patterns of genetic variation in human DNA samples collected from four sites around the world. The consortium's findings provide overwhelming support for previous scientific work suggesting that genetic variants located physically close to each other are inherited together as groups, called haplotypes. The comprehensive catalog of these haplotype blocks, known as the "HapMap," is now publicly available to the biomedical research community. It has already accelerated the search for gene variants involved in common diseases and brought new insights into the genes involved in human evolution.


In line with the Broad Institute's commitment to building critical resources for the scientific community, HapMap data are freely available in several public databases, including the HapMap Data Coordination Center (www.hapmap.org), the NIH-funded National Center for Biotechnology Information's dbSNP (www.ncbi.nlm.nih.gov/SNP/index.html) and the JSNP Database in Japan (snp.ims.u-tokyo.ac.jp). Further information can also be found at the Broad Institute's web site (www.broad.mit.edu).

Read more:

HapMap Could Yield Genetic Clues to Many Diseases (Forbes)
HapMap Catalogue of Human Genetic Variation Published (Bio-IT World)
Phase I of HapMap complete (The Scientist)
Researchers have released a public database of human genetic variation, designed to help scientists study the effects of small genetic differences on health, reports an international consortium in this week's Nature. The findings suggest that only 260,000 to 470,000 single nucleotide polymorphisms (SNPs) are needed to capture all the common genetic variation in the populations studied, despite the fact that there are an estimated 10 million common SNPs in the human genome.

List of papers, editorials and commentaries (most require a subscription):

Deeper into the genome (Nature)
Genomics: Understanding human diversity (Nature)
A haplotype map of the human genome (Nature)
Inherited genetic variation has a critical but as yet largely uncharacterized role in human disease. Here we report a public database of common variation in the human genome: more than one million single nucleotide polymorphisms (SNPs) for which accurate and complete genotypes have been obtained in 269 DNA samples from four populations, including ten 500-kilobase regions in which essentially all information about common DNA variation has been extracted. These data document the generality of recombination hotspots, a block-like structure of linkage disequilibrium and low haplotype diversity, leading to substantial correlations of SNPs with many of their neighbours. We show how the HapMap resource can guide the design and analysis of genetic association studies, shed light on structural variation and recombination, and identify loci that may have been subject to natural selection during human evolution.
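The "block-like structure of linkage disequilibrium" and the correlations of SNPs with their neighbors are usually measured pairwise with the statistic r². As an illustration only, here is a minimal Python sketch of r² between two biallelic SNPs, assuming phased haplotypes coded as 0/1 vectors. The function name `ld_r2` is hypothetical and is not part of any HapMap tool.

```python
from typing import Sequence


def ld_r2(hap_a: Sequence[int], hap_b: Sequence[int]) -> float:
    """Squared correlation r^2 between two biallelic SNPs.

    Assumption for this sketch: each sequence holds one allele per phased
    haplotype, coded 0 or 1, and position i in both sequences comes from
    the same chromosome.
    """
    if len(hap_a) != len(hap_b) or len(hap_a) == 0:
        raise ValueError("need two equal-length, non-empty haplotype vectors")
    n = len(hap_a)
    p_a = sum(hap_a) / n  # frequency of allele 1 at SNP A
    p_b = sum(hap_b) / n  # frequency of allele 1 at SNP B
    p_ab = sum(a * b for a, b in zip(hap_a, hap_b)) / n  # frequency of the 1-1 haplotype
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0  # a monomorphic SNP carries no LD information
    return d * d / denom


if __name__ == "__main__":
    print(ld_r2([0, 0, 1, 1], [0, 0, 1, 1]))  # perfect proxies
    print(ld_r2([0, 1, 0, 1], [0, 0, 1, 1]))  # no association
```

Values near 1 mean one SNP is an almost perfect proxy for the other; values near 0 mean knowing one tells you little about the other.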

Efficiency and power in genetic association studies (Nature Genetics)
We investigated selection and analysis of tag SNPs for genome-wide association studies by specifically examining the relationship between investment in genotyping and statistical power. Do pairwise or multimarker methods maximize efficiency and power? To what extent is power compromised when tags are selected from an incomplete resource such as HapMap? We addressed these questions using genotype data from the HapMap ENCODE project, association studies simulated under a realistic disease model, and empirical correction for multiple hypothesis testing. We demonstrate a haplotype-based tagging method that uniformly outperforms single-marker tests and methods for prioritization that markedly increase tagging efficiency. Examining all observed haplotypes for association, rather than just those that are proxies for known SNPs, increases power to detect rare causal alleles, at the cost of reduced power to detect common causal alleles. Power is robust to the completeness of the reference panel from which tags are selected. These findings have implications for prioritizing tag SNPs and interpreting association studies.
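Tag SNP selection exploits those correlations. The idea is to choose a subset of SNPs so that every SNP of interest reaches some r² threshold with at least one chosen tag. The sketch below is a simple greedy pairwise tagger in Python, written only to illustrate that idea. It is not the haplotype-based method the abstract describes. The 0.8 threshold, the names `greedy_pairwise_tags` and `ld_r2`, and the toy rs-style identifiers are all hypothetical.

```python
from typing import Dict, List, Sequence


def ld_r2(hap_a: Sequence[int], hap_b: Sequence[int]) -> float:
    """Pairwise r^2 between two biallelic SNPs from 0/1-coded phased haplotypes."""
    n = len(hap_a)
    p_a = sum(hap_a) / n
    p_b = sum(hap_b) / n
    p_ab = sum(a * b for a, b in zip(hap_a, hap_b)) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return 0.0 if denom == 0 else (p_ab - p_a * p_b) ** 2 / denom


def greedy_pairwise_tags(snps: Dict[str, Sequence[int]], r2_min: float = 0.8) -> List[str]:
    """Greedy set cover: repeatedly pick the SNP that captures the most
    still-uncaptured SNPs at r^2 >= r2_min (hypothetical helper)."""
    names = list(snps)
    # covers[t] = SNPs that candidate tag t captures, always including t itself
    covers = {
        t: {s for s in names if s == t or ld_r2(snps[t], snps[s]) >= r2_min}
        for t in names
    }
    uncaptured = set(names)
    tags: List[str] = []
    while uncaptured:
        # max() returns the first best candidate, so ties follow input order
        best = max(names, key=lambda t: len(covers[t] & uncaptured))
        tags.append(best)
        uncaptured -= covers[best]
    return tags


if __name__ == "__main__":
    toy = {
        "rs1": [0, 0, 1, 1, 0, 1],
        "rs2": [0, 0, 1, 1, 0, 1],  # identical to rs1
        "rs3": [1, 1, 0, 0, 1, 0],  # mirror image of rs1, so r^2 = 1
        "rs4": [0, 1, 0, 1, 0, 1],  # weakly correlated with the others
    }
    print(greedy_pairwise_tags(toy))
```

On the toy input, rs2 and rs3 are perfect proxies for rs1, so two tags cover four SNPs. The same intuition, applied genome-wide, explains how a few hundred thousand well-chosen SNPs can stand in for many more.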

Calibrating a coalescent simulation of human genome sequence variation (Genome Research)
Population genetic models play an important role in human genetic research, connecting empirical observations about sequence variation with hypotheses about underlying historical and biological causes. More specifically, models are used to compare empirical measures of sequence variation, linkage disequilibrium (LD), and selection to expectations under a "null" distribution. In the absence of detailed information about human demographic history, and about variation in mutation and recombination rates, simulations have of necessity used arbitrary models, usually simple ones. With the advent of large empirical data sets, it is now possible to calibrate population genetic models with genome-wide data, permitting for the first time the generation of data that are consistent with empirical data across a wide range of characteristics. We present here the first such calibrated model and show that, while still arbitrary, it successfully generates simulated data (for three populations) that closely resemble empirical data in allele frequency, linkage disequilibrium, and population differentiation. No assertion is made about the accuracy of the proposed historical and recombination model, but its ability to generate realistic data meets a long-standing need among geneticists. We anticipate that this model, for which software is publicly available, and others like it will have numerous applications in empirical studies of human genetics.
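For readers new to coalescent simulation, here is a minimal Python sketch of the standard Kingman coalescent with infinite-sites mutation under a constant population size. This is exactly the kind of simple, uncalibrated model the abstract contrasts with its own. It is not the calibrated model or the software from the paper, and all function names are hypothetical. Its output can be checked against Watterson's expectation E[S] = theta * (1 + 1/2 + ... + 1/(n-1)) for the number of segregating sites S in a sample of n sequences.

```python
import random


def poisson_count(mean: float, rng: random.Random) -> int:
    """Poisson(mean) draw: count unit-rate exponential arrivals in [0, mean]."""
    count = 0
    arrival = rng.expovariate(1.0)
    while arrival < mean:
        count += 1
        arrival += rng.expovariate(1.0)
    return count


def simulate_segregating_sites(n: int, theta: float, rng: random.Random) -> int:
    """One replicate of the Kingman coalescent with infinite-sites mutation.

    Time is in units of 2N generations, so with k lineages the waiting time
    to the next coalescence is exponential with rate k(k-1)/2.
    theta = 4*N*mu is the scaled mutation rate for the whole locus.
    """
    total_branch_length = 0.0
    k = n
    while k > 1:
        waiting_time = rng.expovariate(k * (k - 1) / 2)
        total_branch_length += k * waiting_time  # k branches extend during this interval
        k -= 1  # two lineages merge into one
    # Mutations fall on the tree at rate theta/2 per unit branch length;
    # under infinite sites, each one creates a new segregating site.
    return poisson_count(theta / 2 * total_branch_length, rng)


def watterson_expected_sites(n: int, theta: float) -> float:
    """Expected segregating sites: theta * (1 + 1/2 + ... + 1/(n-1))."""
    return theta * sum(1.0 / i for i in range(1, n))


if __name__ == "__main__":
    rng = random.Random(1)
    reps = 20000
    mean_s = sum(simulate_segregating_sites(10, 5.0, rng) for _ in range(reps)) / reps
    print(f"simulated mean S = {mean_s:.2f}, expected = {watterson_expected_sites(10, 5.0):.2f}")
```

Calibration, in the abstract's sense, means replacing the constant-size history and uniform rates assumed here with a demographic and recombination model whose simulated output matches real data.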

The Geographic Spread of the CCR5 Δ32 HIV-Resistance Allele (PLoS Biology)
The Δ32 mutation at the CCR5 locus is a well-studied example of natural selection acting in humans. The mutation is found principally in Europe and western Asia, with higher frequencies generally in the north. Homozygous carriers of the Δ32 mutation are resistant to HIV-1 infection because the mutation prevents functional expression of the CCR5 chemokine receptor normally used by HIV-1 to enter CD4+ T cells. HIV has emerged only recently, but population genetic data strongly suggest Δ32 has been under intense selection for much of its evolutionary history. To understand how selection and dispersal have interacted during the history of the Δ32 allele, we implemented a spatially explicit model of the spread of Δ32. The model includes the effects of sampling, which we show can give rise to local peaks in observed allele frequencies. In addition, we show that with modest gradients in selection intensity, the origin of the Δ32 allele may be relatively far from the current areas of highest allele frequency. The geographic distribution of the Δ32 allele is consistent with previous reports of a strong selective advantage (>10%) for Δ32 carriers and of dispersal over relatively long distances (>100 km/generation). When selection is assumed to be uniform across Europe and western Asia, we find support for a northern European origin and long-range dispersal consistent with the Viking-mediated dispersal of Δ32 proposed by G. Lucotte and G. Mercier. However, when we allow for gradients in selection intensity, we estimate the origin to be outside of northern Europe and selection intensities to be strongest in the northwest. Our results describe the evolutionary history of the Δ32 allele and establish a general methodology for studying the geographic distribution of selected alleles.
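To show how selection and dispersal can interact, here is a one-dimensional stepping-stone model. It places a line of demes, applies selection within each deme, and then lets a fraction m of each deme migrate to each neighbor every generation. This Python sketch is far simpler than the authors' spatially explicit model: it has no sampling effects and no real geography. Every name and parameter value here is hypothetical.

```python
from typing import List, Sequence


def selection_then_migration(p: Sequence[float], s: Sequence[float], m: float) -> List[float]:
    """One generation: genic selection in each deme, then nearest-neighbor migration.

    p[i] is the allele frequency in deme i, s[i] the selective advantage there,
    and m the fraction of each deme sent to each neighbor (0 <= m <= 0.5).
    Edge demes reflect migrants back, so with s = 0 the total frequency is conserved.
    """
    after_sel = [pi * (1 + si) / (1 + si * pi) for pi, si in zip(p, s)]
    n = len(after_sel)
    out = []
    for i in range(n):
        left = after_sel[i - 1] if i > 0 else after_sel[i]
        right = after_sel[i + 1] if i < n - 1 else after_sel[i]
        out.append((1 - 2 * m) * after_sel[i] + m * (left + right))
    return out


def simulate_spread(
    n_demes: int, origin: int, p0: float, s_by_deme: Sequence[float], m: float, generations: int
) -> List[float]:
    """Seed the allele at frequency p0 in one deme and iterate (hypothetical helper)."""
    if len(s_by_deme) != n_demes or not 0.0 <= m <= 0.5:
        raise ValueError("s_by_deme must have n_demes entries and m must lie in [0, 0.5]")
    p = [0.0] * n_demes
    p[origin] = p0
    for _ in range(generations):
        p = selection_then_migration(p, s_by_deme, m)
    return p


if __name__ == "__main__":
    freqs = simulate_spread(50, 10, 0.01, [0.1] * 50, 0.2, 200)
    print([round(x, 3) for x in freqs[::5]])
```

You can make `s_by_deme` vary across demes, for example rising toward one end, as a crude way to explore the selection gradients the abstract discusses. Under such a gradient, the highest frequencies need not sit at the deme where the allele was seeded.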
