October 31, 2012

A thousand (and ninety two) genomes

This is an open access paper describing the phase 1 data of the 1000 Genomes Project. There is plenty of interest in the paper and supplement, but look at Figure S8 (left). It shows the median shared haplotype length around f2 sites, i.e., sites where the variant occurs exactly twice in the sample, so that it makes sense to speak of the length of haplotype shared by its two carriers.
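To make the f2 idea concrete, here is a minimal Python sketch on toy data. It is not the consortium's method: the function names, the 0/1 haplotype encoding, and the rule of extending outward until the first mismatch are all assumptions made for illustration. The figure in the supplement also groups results by population pair, which this sketch leaves out.

```python
from statistics import median


def f2_sites(haplotypes):
    """Return (site_index, (hap_a, hap_b)) for each site whose variant allele
    (coded 1) is carried by exactly two haplotypes in the sample."""
    n_sites = len(haplotypes[0])
    found = []
    for s in range(n_sites):
        carriers = [h for h, hap in enumerate(haplotypes) if hap[s] == 1]
        if len(carriers) == 2:
            found.append((s, (carriers[0], carriers[1])))
    return found


def shared_length(haplotypes, positions, site, pair):
    """Length in bp of the run of identical sites around `site` for the two
    carrier haplotypes (a simplifying assumption, not the paper's estimator)."""
    a, b = haplotypes[pair[0]], haplotypes[pair[1]]
    left = site
    while left > 0 and a[left - 1] == b[left - 1]:
        left -= 1
    right = site
    while right < len(a) - 1 and a[right + 1] == b[right + 1]:
        right += 1
    return positions[right] - positions[left]


# Hypothetical toy sample: 4 haplotypes, 6 variable sites.
haplotypes = [
    [0, 1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0, 1],
    [1, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 1, 1],
]
positions = [100, 250, 400, 900, 1500, 2100]  # bp coordinates of the sites

sites = f2_sites(haplotypes)
lengths = [shared_length(haplotypes, positions, s, pair) for s, pair in sites]
median_length = median(lengths)
```

In real data the carriers of an f2 variant would be labelled by population, and the median would be taken separately for each pair of populations, which is what allows the within- versus between-population comparisons discussed here.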

The maximum such length is for FIN (Finns) at 140kb. At the other end, it seems fairly obvious visually that the lowest sharing is found in multi-origin populations from the Americas (MXL, CLM, PUR, ASW), in which segments are probably "interrupted" because of admixture. African populations (LWK/YRI) also tend to have low sharing, followed by Europeans and then East Asians.

Some smaller details are also in evidence: for example, sharing between IBS (Iberians from Spain) and Luhya (LWK) seems higher than the European average, consistent with some level of African admixture in Spain that has probably contributed some African haplotypes.

There seems to be a hint of an excess of sharing between Japanese (JPT) and Luhya (LWK). I have to wonder whether this might have something to do with Y-haplogroup D, which links the Japanese with African Y-haplogroup E bearers. An excess of sharing between CHS (Southern Han Chinese) and PUR (Puerto Ricans) also seems to be suggested, for which I have no good hypothesis.

Nature 491, 56–65 (01 November 2012) doi:10.1038/nature11632

An integrated map of genetic variation from 1,092 human genomes

The 1000 Genomes Project Consortium

By characterizing the geographic and functional spectrum of human genetic variation, the 1000 Genomes Project aims to build a resource to help to understand the genetic contribution to disease. Here we describe the genomes of 1,092 individuals from 14 populations, constructed using a combination of low-coverage whole-genome and exome sequencing. By developing methods to integrate information across several algorithms and diverse data sources, we provide a validated haplotype map of 38 million single nucleotide polymorphisms, 1.4 million short insertions and deletions, and more than 14,000 larger deletions. We show that individuals from different populations carry different profiles of rare and common variants, and that low-frequency variants show substantial geographic differentiation, which is further increased by the action of purifying selection. We show that evolutionary conservation and coding consequence are key determinants of the strength of purifying selection, that rare-variant load varies substantially across biological pathways, and that each individual contains hundreds of rare non-coding variants at conserved sites, such as motif-disrupting changes in transcription-factor-binding sites. This resource, which captures up to 98% of accessible single nucleotide polymorphisms at a frequency of 1% in related populations, enables analysis of common and low-frequency variants in individuals from diverse, including admixed, populations.


1 comment:

truth said...

IBS sample has Canarians in it