January 04, 2013

Deep whole-genome sequencing of 100 Malays

The 1000 Genomes Project is the largest collection of full human genomes currently available, but most of its 2.5k samples have been sequenced at low coverage. One downside of this is that infrequent variants are often missed. If an individual carries a variant allele at some site, then the chance of detecting it increases with the number of reads covering that site. If a number of individuals are sampled, then variants that are common in the population will probably be detected in at least a few individuals even if a low number of reads is used for each of them; but, if they are infrequent, then they are more likely to be missed. Hence, low-coverage sequencing of population samples will tend to find common variants and to miss less common ones, relative to high-coverage sequencing.
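The intuition above can be put into rough numbers with a toy model. This is only a sketch under assumptions of my own, not the method used by the 1000 Genomes Project or by the paper discussed below: genotypes follow Hardy-Weinberg proportions, the number of reads carrying the variant allele is Poisson-distributed (mean depth/2 in a heterozygote, mean depth in a homozygote), and a variant counts as "detected" in a person once at least min_alt_reads reads show it. Real pipelines call variants jointly across samples and model sequencing error, so the function names and the threshold here are purely illustrative.

```python
import math


def p_detect_in_carrier(mean_alt_depth, min_alt_reads=2):
    """Chance of seeing at least `min_alt_reads` variant-bearing reads,
    assuming their count is Poisson with mean `mean_alt_depth`."""
    p_too_few = sum(
        math.exp(-mean_alt_depth) * mean_alt_depth ** j / math.factorial(j)
        for j in range(min_alt_reads)
    )
    return 1.0 - p_too_few


def p_detect_in_sample(allele_freq, n_individuals, depth, min_alt_reads=2):
    """Chance that a variant of population frequency `allele_freq` is
    detected in at least one of `n_individuals` sequenced at mean `depth`.

    Assumes Hardy-Weinberg genotype frequencies and independent individuals.
    """
    p_het = 2 * allele_freq * (1 - allele_freq)
    p_hom = allele_freq ** 2
    # Heterozygotes contribute about half their reads to the variant allele,
    # homozygotes all of them.
    per_person = (
        p_het * p_detect_in_carrier(depth / 2.0, min_alt_reads)
        + p_hom * p_detect_in_carrier(float(depth), min_alt_reads)
    )
    return 1.0 - (1.0 - per_person) ** n_individuals


if __name__ == "__main__":
    # Compare a low-pass depth with a deep one for a rare and a common variant.
    for freq in (0.005, 0.2):
        for depth in (4, 30):
            p = p_detect_in_sample(freq, n_individuals=100, depth=depth)
            print(f"freq={freq:<6} depth={depth:>2}x  P(detected)={p:.3f}")
```

Under this toy model the common variant is found almost surely at either depth, while for the rare one the detection probability is visibly lower at 4x than at 30x. Note that even unlimited depth cannot recover a variant that no sampled individual happens to carry, which is why sample size matters as well as coverage.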

This idea is intuitively correct, but the question of the added power of high-coverage sequencing to detect variants can only be addressed by giving the same individuals both low- and high-coverage sequencing. This is the topic of a new paper in AJHG which creates a useful comparison benchmark for the performance of the two types of sequencing methods. High-coverage sequencing may be needed for things like disease studies (because deleterious alleles tend to be low-frequency), or the study of recent human demography (because recent population growth has resulted in an abundance of low-frequency SNPs that have not had enough time to reach a high population frequency yet).

AJHG dx.doi.org/10.1016/j.ajhg.2012.12.005

Deep Whole-Genome Sequencing of 100 Southeast Asian Malays

Lai-Ping Wong et al.


Whole-genome sequencing across multiple samples in a population provides an unprecedented opportunity for comprehensively characterizing the polymorphic variants in the population. Although the 1000 Genomes Project (1KGP) has offered brief insights into the value of population-level sequencing, the low coverage has compromised the ability to confidently detect rare and low-frequency variants. In addition, the composition of populations in the 1KGP is not complete, despite the fact that the study design has been extended to more than 2,500 samples from more than 20 population groups. The Malays are one of the Austronesian groups predominantly present in Southeast Asia and Oceania, and the Singapore Sequencing Malay Project (SSMP) aims to perform deep whole-genome sequencing of 100 healthy Malays. By sequencing at a minimum of 30× coverage, we have illustrated the higher sensitivity at detecting low-frequency and rare variants and the ability to investigate the presence of hotspots of functional mutations. Compared to the low-pass sequencing in the 1KGP, the deeper coverage allows more functional variants to be identified for each person. A comparison of the fidelity of genotype imputation of Malays indicated that a population-specific reference panel, such as the SSMP, outperforms a cosmopolitan panel with larger number of individuals for common SNPs. For lower-frequency (less than 5%) markers, a larger number of individuals might have to be whole-genome sequenced so that the accuracy currently afforded by the 1KGP can be achieved. The SSMP data are expected to be the benchmark for evaluating the value of deep population-level sequencing versus low-pass sequencing, especially in populations that are poorly represented in population-genetics studies.



