Showing posts with label Copy Number Variation. Show all posts

July 01, 2009

Common variants and schizophrenia

Until now, the hunt for the genetic etiology of psychiatric disorders had not gone very well. Three new papers in Nature make some progress by showing that a combination of many genes, especially those of the Major Histocompatibility Complex, is predictive of schizophrenia and bipolar disorder risk. From the NIH news release:
Three schizophrenia genetics research consortia, each funded in part by NIMH, report separately on their genome-wide association studies online July 1, 2009, in the journal Nature. However, the SGENE, International Schizophrenia (ISC) and Molecular Genetics of Schizophrenia (MGS) consortia shared their results – making possible meta-analyses of a combined sample totaling 8,014 cases and 19,090 controls.

All three studies implicate an area of Chromosome 6 (6p22.1), which is known to harbor genes involved in immunity and controlling how and when genes turn on and off. This hotspot of association might help to explain how environmental factors affect risk for schizophrenia. For example, there are hints of autoimmune involvement in schizophrenia, such as evidence that offspring of mothers with influenza while pregnant have a higher risk of developing the illness.
From news.com.au:
As well as pinpointing key immune system mutations, complementary discoveries from each consortium showed clearly that many small genetic variations combine in different ways to increase a person's risk of developing schizophrenia.

"If you look at any individual with schizophrenia no single gene is really strong, but put these genes together and you get a meaningful influence," Dr Cairns said.

From the deCODE news release:

"Genetics offers a unique window for better understanding diseases like schizophrenia because the brain and cognition are so little understood and so difficult to study. Discoveries such as these are crucial for teasing out the biology of the disease and making it possible for us to begin to develop drugs targeting the underlying causes and not just the symptoms of the disease. One of the reasons this study was so successful is its unprecendented size. Pooling our resources has yielded spectacular results, which is what the participants from three continents hoped for. At the same time, this study underscores the fact that rare variants may well carry a significant part of the genetic risk of schizophrenia, so our next task is to use the ever more affordable sequencing technologies to find more of them," said Kari Stefansson, CEO of deCODE and corresponding author on the paper.

In the first phase of the study, the deCODE-led SGENE consortium conducted a genome-wide scan of more than 300,000 SNPs in a total of 17,000 patients and controls from England, Finland, Germany, Iceland, Italy and Scotland. The 1500 SNPs with the best signal were then analyzed in 11,000 patients and controls from the International Schizophrenia Consortium (ISC) and the European-American portion of the Molecular Genetics of Schizophrenia studies (MGS). Twenty-five SNPs with strong suggestive correlation were then followed up in more than 20,000 patients and controls from the Netherlands, Denmark, Germany, Hungary, Norway, Russia, Finland and Spain. Bringing together the results of different consortia established the association of a total of seven markers on chromosomes 6, 11, and 18 with increased risk of schizophrenia.

So, while these three studies are a vindication of sorts for common variants, the greater part of the risk remains to be found in rare variants that are not captured by the 300K or so SNPs that were genotyped. Still, by any measure, this is a significant victory in the search for the hidden heritability.

From the Stanford release:

Using commercially available "SNP chips" designed to detect those more-common variants, the investigators looked for differences between the DNA of people with schizophrenia versus the DNA of those without the disease. The scientists required that such differences achieve "genome-wide statistical significance." Here's why: If you flip a million coins, one at a time, you're going to see all kinds of seemingly miraculous events — say, 15 heads in a row — that may seem significant but are typical when you toss even a perfectly balanced coin so many times.

Shi's job was to devise analytical techniques to determine whether the "finding" of a SNP's greater likelihood among schizophrenics was real or spurious. The genomic region on chromosome 6 survived this rigorous statistical test.
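
The coin-flipping argument can be made concrete with a little arithmetic. The sketch below is not taken from any of the three papers: it assumes a round figure of one million independent tests (echoing the release's million coins) and the conventional 0.05 family-wise error rate. It shows how many chance "hits" an uncorrected threshold would let through, and the per-test threshold that a simple Bonferroni correction would demand instead.

```python
# Illustrative only: why a genome-wide scan needs a much stricter
# threshold than the usual p < 0.05. All numbers here are assumptions.

def expected_false_positives(per_test_alpha: float, n_tests: int) -> float:
    """Expected number of null tests that pass the threshold by chance alone."""
    return per_test_alpha * n_tests


def bonferroni_threshold(family_alpha: float, n_tests: int) -> float:
    """Per-test threshold that keeps the family-wise error rate at family_alpha."""
    return family_alpha / n_tests


N_TESTS = 1_000_000  # assumed round number of independent tests
ALPHA = 0.05         # conventional family-wise error rate

chance_hits = expected_false_positives(ALPHA, N_TESTS)
strict_threshold = bonferroni_threshold(ALPHA, N_TESTS)

print(f"chance hits at p < {ALPHA}: about {chance_hits:,.0f}")
print(f"Bonferroni per-test threshold: {strict_threshold:.1e}")
```

Under these assumptions, tens of thousands of SNPs would look "significant" by chance at p < 0.05, which is why only signals that clear a threshold many orders of magnitude smaller are taken seriously.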

"These findings show that our genetic methods are working, and that the genetic underpinnings of schizophrenia can be understood," said Levinson. "Similar methods have produced critical new discoveries in many other common diseases, once very large numbers of people could be studied. Now we see that the same approach works for psychiatric disorders like schizophrenia."



The three papers (in no particular order):

Nature doi:10.1038/nature08185

Common polygenic variation contributes to risk of schizophrenia and bipolar disorder

The International Schizophrenia Consortium

Abstract

Schizophrenia is a severe mental disorder with a lifetime risk of about 1%, characterized by hallucinations, delusions and cognitive deficits, with heritability estimated at up to 80%1, 2. We performed a genome-wide association study of 3,322 European individuals with schizophrenia and 3,587 controls. Here we show, using two analytic approaches, the extent to which common genetic variation underlies the risk of schizophrenia. First, we implicate the major histocompatibility complex. Second, we provide molecular genetic evidence for a substantial polygenic component to the risk of schizophrenia involving thousands of common alleles of very small effect. We show that this component also contributes to the risk of bipolar disorder, but not to several non-psychiatric diseases.

Link

Nature doi:10.1038/nature08186

Common variants conferring risk of schizophrenia

Hreinn Stefansson et al.

Abstract

Schizophrenia is a complex disorder, caused by both genetic and environmental factors and their interactions. Research on pathogenesis has traditionally focused on neurotransmitter systems in the brain, particularly those involving dopamine. Schizophrenia has been considered a separate disease for over a century, but in the absence of clear biological markers, diagnosis has historically been based on signs and symptoms. A fundamental message emerging from genome-wide association studies of copy number variations (CNVs) associated with the disease is that its genetic basis does not necessarily conform to classical nosological disease boundaries. Certain CNVs confer not only high relative risk of schizophrenia but also of other psychiatric disorders1, 2, 3. The structural variations associated with schizophrenia can involve several genes and the phenotypic syndromes, or the 'genomic disorders', have not yet been characterized4. Single nucleotide polymorphism (SNP)-based genome-wide association studies with the potential to implicate individual genes in complex diseases may reveal underlying biological pathways. Here we combined SNP data from several large genome-wide scans and followed up the most significant association signals. We found significant association with several markers spanning the major histocompatibility complex (MHC) region on chromosome 6p21.3-22.1, a marker located upstream of the neurogranin gene (NRGN) on 11q24.2 and a marker in intron four of transcription factor 4 (TCF4) on 18q21.2. Our findings implicating the MHC region are consistent with an immune component to schizophrenia risk, whereas the association with NRGN and TCF4 points to perturbation of pathways involved in brain development, memory and cognition.

Link

Nature doi:10.1038/nature08192

Common variants on chromosome 6p22.1 are associated with schizophrenia

Jianxin Shi et al.

Abstract

Schizophrenia, a devastating psychiatric disorder, has a prevalence of 0.5–1%, with high heritability (80–85%) and complex transmission1. Recent studies implicate rare, large, high-penetrance copy number variants in some cases2, but the genes or biological mechanisms that underlie susceptibility are not known. Here we show that schizophrenia is significantly associated with single nucleotide polymorphisms (SNPs) in the extended major histocompatibility complex region on chromosome 6. We carried out a genome-wide association study of common SNPs in the Molecular Genetics of Schizophrenia (MGS) case-control sample, and then a meta-analysis of data from the MGS, International Schizophrenia Consortium and SGENE data sets. No MGS finding achieved genome-wide statistical significance. In the meta-analysis of European-ancestry subjects (8,008 cases, 19,077 controls), significant association with schizophrenia was observed in a region of linkage disequilibrium on chromosome 6p22.1 (P = 9.54 × 10^-9). This region includes a histone gene cluster and several immunity-related genes, possibly implicating aetiological mechanisms involving chromatin modification, transcriptional regulation, autoimmunity and/or infection. These results demonstrate that common schizophrenia susceptibility alleles can be detected. The characterization of these signals will suggest important directions for research on susceptibility mechanisms.

Link

May 10, 2009

ESHG 2009 abstracts

ESHG 2009 is in two weeks, and there are some very interesting abstracts, including a tantalizing new study on Y-chromosome haplogroup R1b1b2 (R-M269).

Phylogeography of human Y chromosome haplogroup R1b1b2 (R-M269) in Europe
F. Cruciani et al.

The human Y chromosome haplogroup R1b1b2 (R-M269) displays an extremely wide geographic distribution within Europe, with a decreasing frequency cline from Iberia (frequencies up to 90%) towards the Balkans (usually less than 10%). Previous studies have proposed that the observed R1b1b2 frequency cline is due to a population expansion from an Iberian Ice-age refugium after the LGM (Malaspina et al. 1998; Semino et al. 2000).

In this study, we explored the phylogeography of the human Y chromosome haplogroup R1b1b2 by analyzing more than 2,000 males from Europe. The haplogroup-defining marker M269 (Cruciani et al. 2002), and two additional internal markers (U106 and U152, Sims et al 2007) which identify internal branches (R1b1b2g and R1b1b2h) were analyzed. The paragroup R1b1b2*(xR1b1b2g, R1b1b2h) and the haplogroups R1b1b2g and R1b1b2h showed quite different frequency distribution patterns within Europe, with frequency peaks in the Iberian Peninsula, northern Europe and northern Italy/France, respectively. The overall frequency pattern of R1b1b2 haplogroup is suggestive of multiple events of migration and expansion within Europe rather than a single and uniform spread of people from an Iberian Ice-age refugium.

References:

Malaspina et al. (1998) Am J Hum Genet 63:847-860
Semino et al. (2000) Science 290:1155-1159
Cruciani et al. (2002) Am J Hum Genet 70:1197-1214
Sims et al. (2007) Hum Mutat 28:97

Note that in the abstract below, the authors refer to Slavopaionians, not Macedonians.

Y chromosome haplogroup R1a is associated with prostate cancer risk among Macedonian males
D. Plaseska-Karanfilska et al.

Prostate cancer (PC) is one of the most common male-specific cancers. Its incidence varies considerably between populations. Recent surveys suggest that PC is influenced by both genetic and environmental factors, although the etiology of the disease remains unknown in the majority of cases. Certain Y chromosomal lineages have been suggested to predispose individuals to prostate cancer in Japanese population, but no association has been found among Korean and Swedish patients. The aim of this study was to investigate the association between Y chromosomal haplogroups and predisposition to prostate cancer in Macedonian men. We studied 84 PC patients and 126 males from the general population of Macedonian ethnic origin. A total of 28 markers have been studied by multiplex PCR and SNaPshot analysis. Nineteen different Y haplogroups were determined; the most frequent being I1b-P37b, E3b1-M78, R1a-SRY 1532, R1b-P25 and J2b1a-M241. The frequency of R1a was significantly higher in PC patients (20.2%) in comparison with the controls (9.5%) [p=0.027; OR=2.41 (1.09-5.36)]. When stratified according to age, even stronger association was observed between haplogroup R1a and prostate cancer in patients of >65 years of age [p=0.004; OR=3.24 (1.41-7.46)]. Our results suggest that Y chromosome haplogroup R1a is associated with an increased prostate cancer risk in Macedonian men.
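
The odds ratio in this abstract can be checked from the reported percentages. The sketch below assumes the underlying counts were 17 of 84 patients (20.2%) and 12 of 126 controls (9.5%), which are inferred from the percentages rather than stated in the abstract, and uses the standard Woolf (log-based) confidence interval, which may not be the method the authors used.

```python
import math


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Odds ratio and Woolf confidence interval for a 2x2 table.

    a, b: cases with / without the haplogroup
    c, d: controls with / without the haplogroup
    z:    normal quantile (1.96 gives an approximate 95% interval)
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Counts inferred from the abstract's percentages (an assumption):
# 17/84 = 20.2% of patients, 12/126 = 9.5% of controls carry R1a.
cases_r1a, cases_total = 17, 84
controls_r1a, controls_total = 12, 126

or_, lo, hi = odds_ratio_ci(
    cases_r1a, cases_total - cases_r1a,
    controls_r1a, controls_total - controls_r1a,
)
print(f"OR = {or_:.2f}, 95% CI {lo:.2f} to {hi:.2f}")
```

With these inferred counts the odds ratio comes out at about 2.41, matching the abstract, and the interval lands close to the reported one. Note how wide that interval is: with 84 patients, the lower bound sits only just above 1.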


The genetic position of Western Brittany (Finistère, France) in the Celtic Y chromosome landscape
K. Rouault et al.

Brittany, a large peninsula located at the western part of France, is of particular interest because of its historical settlement and its relative geographic and cultural isolation. Brittany was invaded by waves of migration from Britain and Ireland between the 4th and 7th centuries and, therefore, belongs to the Brythonic branch of the Insular Celtic language. We have focused our study on the department of Finistère, the most western territorial unit of Brittany, and its administrative and historical areas. To explore the diversity of the Y-chromosome, we analyzed a total of 348 unrelated males using a combination of 23 biallelic markers and 12 microsatellite loci. The molecular analysis revealed that 82.2% of the Y chromosomes fell into haplogroup R1b, placing Finistère within the Western European landscape. Interestingly, at a microgeographical level, differences were detected by the haplogroup R1a* being confined to the south of the department, while haplogroups E3b, F, G, J2, K and R1a1 were found in the north. Nevertheless, geographical distribution of haplogroups and haplotypes suggested territorial homogeneity inside Finistère. Most of the Y-chromosomal gene pool in Finistère is shared with European, especially British, populations, thus corroborating the historical reports of ancient migrations to Brittany. Finally, the results are consistent with those obtained from classic genetic markers and support the Celtic paternal heritage of the Finistère population.

Mitochondrial Genome Diversity in Tungusic-speaking Populations (Even and Evenki) and Resettlement of Arctic Siberia After the Last Glacial Maximum
I. O. Mazunin et al.

The present study includes the Even/Evenki, hunters and reindeer-breeders, sampled from a few localities scattered across their vast geographic range encompassing low Yana-Indigirka-Kolyma in the west and the Sea of Okhotsk coast in the east. The mtDNA data show a very close affinity of the Even/Evenki with the Yukaghir, typical reindeer hunters, dominating in extreme northeastern Siberia until the middle of 18th century but now being on the brink of extinction. We found that the majority of mtDNA diversity in the Tungusic-speaking populations was accounted for by Siberian-East Eurasian lineages C2, C3, D2, D3, D4-D9 and G1. The similarity in the haplogroup C and D mtDNA intrinsic variation between the Even and Yukaghir populations is pronounced and indicates that the Even/Evenki harbor an essential portion of the ancestral Yukaghir pool. The phylogeography of the D4-D9 point to an early Neolithic phase expansion initiated northward to the northern and eastern perimeters of former Beringia. Concerning unique D2* lineage (Volodko et al. 2008), the network analysis encompassing four complete sequences, three of the Yukaghir from the low Indigirka-Kolyma region and one of the Evenk from the upper reaches of the Aldan River would suggest that the founding haplotype (1935-8683-14905) for D2* originated within western part of former Beringia. In the meanwhile, the core of the Even/Evenki mtDNA pool residing in the midst of the Yukaghir ancient territory would represent a recent amalgamation of the remnants of the Yukaghir and northern Tungusic-speakers (Even/Evenki) originated in the mid-Amur region.

X-chromosomal haplotypes in global human populations
V. A. Stepanov, I. Y. Khitrinskaya;

To reconstruct the origin and evolution of X-chromosomal lineages in global human populations we investigated the genetic diversity in 23 population samples (about 1500 individuals totally) using SNP markers in a single linkage disequilibrium region of ZFX gene. About sixty haplotypes belonging to 3 phylogenetic branches (A, B, and F) originated from the single African root were found in the total sample. Branch A includes mostly African haplotypes, whereas four major haplotypes belonging to different sub-branches of B (haplotype E8) and F (haplotypes H4, I3 and I11) were present in Eurasia. Major haplotype of the older branch B (E8) is almost evenly distributed among Eurasian populations. Haplotypes of the younger phylogenetic branches demonstrates clinal distribution with the sharp frequency changes from East to West. Haplotype H4 is presumably “Eastern-Eurasian”. It reaches the highest frequency in Eastern and South-Eastern Asians. Haplotypes I3 and I11 in the contrary show the clear frequency gradient from West to East with the highest frequency in Europeans, moderate frequency in Central Asia, and the minimal frequency in North-East and South-East Asia. The total level of genetic differentiation of global human populations estimated by the analysis of molecular variance of X-chromosomal haplotypes (Fst = 9.1%) is quite high and roughly corresponds to those measured for most other types of genetic markers except Y-chromosomal haplogroups which are characterized by the much higher level of between-population differences.

Dissecting the genetic make-up of Central Eastern Sardinia using a high density set of sex and autosomal markers

L. M. Pardo

Genetic isolates are valuable for identifying genetic variations underlying complex traits. However, prior knowledge of the genetic structure of the isolate is fundamental for carrying out genome-wide association studies (GWAS) in these populations. The Sardinian population is currently the target of GWAS because of its ancient origin and long-standing isolation. To perform GWAS in Sardinia, we aim to characterize a subpopulation from the archaic area of Central-Eastern Sardinia at the genomic level. We used sex-specific markers (Y-chromosome and mtDNA) to assess the heterogeneity of the founder lineages and the divergence from other populations. In addition, we used a dense set of autosomal markers (SNP 5.0 array, Affymetrix) to investigate genome-wide Linkage Disequilibrium, to construct a Copy Number Variation map and to estimate pair-wise kinship and inbreeding. We first determined Y-chromosome lineages in 256 unrelated Sardinians using biallelic and microsatellite markers. Our analysis showed that the frequency of the major Y haplogroups clearly sets this population apart from other European haplogroups. The analysis of microsatellite markers revealed a high degree of gene diversity. Pairwise kinship and inbreeding were estimated in 113 subjects using 77709 autosomal SNP markers. We found that 16% of the subject pairs shared identical-by-descent alleles more often than expected by chance. Furthermore, 60% of the subjects had low inbreeding coefficient values. Our preliminary results confirm that Sardinia is genetically different from other populations, as shown by Y-chromosome markers. The kinship and inbreeding estimates indicate some degree of relatedness among Sardinians, as expected for an isolated population.

Genetic differences between four European populations

V. Moskvina et al.

Population stratification can distort the results of genome-wide association studies (GWAS). One approach to deal with this inflation of the statistic is to estimate the inflation factor and adjust the detection statistic accordingly. However, evolutionary forces act with different strength in some regions of the human genome, e.g. around the lactase gene (LCT) and the HLA region, making such an adjustment inappropriate.
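
For readers unfamiliar with the adjustment described in this paragraph, here is a minimal sketch of the usual "genomic control" recipe: estimate the inflation factor as the median observed test statistic divided by the median expected under the null, then divide every statistic by it. The abstract does not spell out the procedure, so this is the textbook version, and the test statistics are invented toy values.

```python
import statistics

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def inflation_factor(chi2_stats):
    """Genomic inflation factor: observed median over the null median."""
    return statistics.median(chi2_stats) / CHI2_1DF_MEDIAN


def genomic_control(chi2_stats):
    """Deflate every statistic by one genome-wide factor (never inflate)."""
    lam = max(1.0, inflation_factor(chi2_stats))
    return [stat / lam for stat in chi2_stats]


# Toy 1-df chi-square statistics, made up for illustration.
toy_stats = [0.10, 0.50, 0.91, 2.00, 7.50]

lam = inflation_factor(toy_stats)
adjusted = genomic_control(toy_stats)
print(f"inflation factor: {lam:.2f}")
print("adjusted statistics:", [round(x, 2) for x in adjusted])
```

Because a single factor is applied to every SNP, regions such as LCT or HLA that differ between populations far more than the genome-wide average are under-corrected, which is the objection the abstract raises.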

We examined the population differences in four European populations (Scotland, Ireland, Sweden and Bulgaria) using data from GWAS performed with the Affymetrix 6.0 array at the Broad Institute. We show that there are >20,000 SNPs which are highly (p < 10^-6) significantly stratified between the four populations, after genome wide Bonferroni correction for multiple testing. We then examined the top 20 stratified regions to see what genes might have caused the top differences, using a highly conservative cut-off of p < 10^-40. Some of the loci span genes reported before: hair colour and pigmentation (HERC2, EXOC2), the LCT gene, genes involved in NAD metabolism, and genes involved in immunity (HLA and the Toll-like receptor genes TLR10, TLR1, TLR6). Among the top hits were several genes which have not yet been reported as stratified within European populations, indicating that they might also provide a selective advantage. Some involve other immunity genes (CD99, ILT6), but others show no obvious effect on positive selection: several zinc fingers, and most intriguingly, FOXP2, implicated in speech development. Future GWAS should take into consideration any positive associations with these genes.

Genomic runs of homozygosity: population history and disease

R. McQuillan

Runs of homozygosity (ROH), resulting from the inheritance from both parents of identical haplotypes, are abundant in the human genome. ROH length is determined partly by the number of generations since the common ancestor: offspring of cousin matings have long ROH, while the numerous shorter ROH reflect shared ancestry tens and hundreds of generations ago. In studies of European populations we show that Froh, a multipoint estimate of individual autozygosity derived from genomic ROH, distinguishes clearly between subpopulations classified in terms of demographic history and correlates strongly with pedigree-derived inbreeding coefficients. In a global population dataset, analysis of ROH allows categorisation of individuals into four major groups, inferred to have (a) parental relatedness in the last 150 years (many south and west Asians), (b) shared parental ancestry arising hundreds to thousands of years ago through population isolation and restricted effective population size (Ne), but little recent inbreeding (Oceanians, African hunter-gatherers, some European and south Asian isolates), (c) both ancient and recent parental relatedness (Native Americans), and (d) only the background level of shared ancestry relating to continental Ne (east Asians, urban Europeans; African agriculturalists). Long runs of homozygosity are therefore a widespread and underappreciated characteristic of our genomes which record past consanguinity and population isolation and provide a unique record of individual demographic history. Individual ROH measures also allow quantification of the disease risk arising from polygenic recessive effects. We present preliminary data from a survey of the effects of ROH on quantitative disease-related traits and disease risk.
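
As a rough illustration of what an ROH-based autozygosity estimate looks like, the sketch below computes the fraction of the autosomal genome covered by runs above a minimum length. The abstract does not give the minimum run length or the genome length used, so `min_length_bp`, the segment coordinates and the genome length below are all hypothetical.

```python
def froh(roh_segments, autosome_length_bp, min_length_bp=500_000):
    """Fraction of the autosomal genome lying in runs of homozygosity.

    roh_segments:       iterable of (start_bp, end_bp) pairs, assumed
                        non-overlapping
    autosome_length_bp: total autosomal length covered by the markers
    min_length_bp:      runs shorter than this are ignored (assumed cut-off)
    """
    total_in_roh = sum(
        end - start
        for start, end in roh_segments
        if end - start >= min_length_bp
    )
    return total_in_roh / autosome_length_bp


# Toy individual: two long runs and one short run that falls below the cut-off.
toy_segments = [
    (0, 2_000_000),
    (5_000_000, 5_300_000),    # 0.3 Mb, ignored
    (10_000_000, 13_000_000),
]
toy_genome_length = 100_000_000  # deliberately small toy genome

value = froh(toy_segments, toy_genome_length)
print(f"Froh = {value:.3f}")
```

For comparison, the pedigree inbreeding coefficient expected for the offspring of first cousins is 1/16 (0.0625), which is the kind of pedigree-derived figure the abstract says Froh correlates with.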


European Lactase Persistence Allele is Associated With Increase in Body Mass Index

J. A. Kettunen et al.

The global prevalence of obesity, usually indexed by body mass index (BMI) cut-offs, has increased significantly in the recent decades, mainly due to positive energy balance. However, the impact of a selection for specific genes cannot be excluded. Here we have tested the association between BMI and one of the best known genetic variants showing strong selective pressure: the functional variant in the cis-regulatory element of the lactase gene. We tested this variant since it is presumed to provide nutritional advantage in specific physical and cultural environments. We found that the variant responsible for lactase persistence among Europeans was also associated with higher BMI in a Nordic population sample (p = 1.3 × 10^-5) of 15,209 individuals, the size of the effect being close to that of FTO. We tested the effect of population stratification and concluded that the association was not due to population substructure.

November 06, 2008

In search of the Hidden Heritability

Nature has a very interesting high-level survey of the problem of the "hidden heritability". While many traits such as height, autism, or schizophrenia are known to be significantly heritable, recent genome scans with high-density microarray chips, which look at hundreds of thousands of DNA polymorphisms, have failed to account for most of that heritability.

So, if these traits are in our genes, how come we can't find them there?

The article does a great job at identifying the possible ways to find the "hidden heritability". Here they are, in my own words:

1. Look at more DNA spots

There is a long way between the million or so DNA bases covered by current microarray chips and the whole human genome. Because of linkage disequilibrium, i.e., DNA's propensity to be cut and inherited in large chunks, and not small pieces, you can often tell the value of a marker by looking at nearby markers. But, still, you don't really know until you look. So, denser microarrays, or even whole genome sequences may uncover some of the hidden heritability.

2. Look at more people

Associations between traits and genes are established by statistics. To find a weak association, or an association between a not-so-common variant and the trait in question, you need a large sample. So, if the hidden heritability is tucked away in markers that are beyond your statistical power, you can simply increase this power: sample more people.

3. Look at copy-number variations (CNVs)

Any two individuals don't just have single-letter differences, but also structural changes, where an individual may have more or fewer copies of entire chunks of DNA. So, by looking at single nucleotides you are examining one source of human variation, but missing another chunk of it that may be as important.

4. Study gene-gene interactions

Genes form complex networks of interaction. If you flip a SNP from C to T, you don't always get the same effect on the phenotype. This flip may increase, decrease, or leave unaffected, your risk for a disease, depending on what other genes you have. This epistatic interaction of genes makes it difficult to detect associations. It's a lot easier to study the individual effects of 2N alleles at N genes than it is to study the effects of 2^N possible combinations.

5. Don't trust heritability estimates

What if inherited conditions thought to be genetic aren't really genetic, because of epigenetic modifications of gene expression, or shared environments (e.g., in the womb) that aren't accounted for?

6. Don't trust diagnoses of conditions

If you want to find a correlation between a gene G and a trait T, you'd better be sure what T actually is. If it's a whole set of different behaviors, conveniently bundled into a condition T (such as schizophrenia), then you're in trouble, since each of these conditions may have its own causative agent. Many major diseases may be caused by more than one underlying condition, with a different genetic background. So, if you are seeking to find the common thread between people with trait T, you might not find it because there is no common thread!
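
To put numbers on the gene-gene interaction point above: with two alleles per gene, the count of single-allele effects grows linearly with the number of genes, while the count of allele combinations grows exponentially. The sketch below is just that counting exercise, under the simplifying assumption of one allele per gene (haploid, two alleles each).

```python
from itertools import product


def allele_count(n_genes: int) -> int:
    """Individual alleles to test: two per gene."""
    return 2 * n_genes


def combination_count(n_genes: int) -> int:
    """Distinct multi-gene combinations when each gene has two alleles."""
    return 2 ** n_genes


for n in (3, 10, 20, 30):
    print(f"{n:>2} genes: {allele_count(n):>3} alleles, "
          f"{combination_count(n):,} combinations")

# For a small case the combinations can be listed explicitly.
three_gene_combos = list(product("CT", repeat=3))
print(three_gene_combos[:4], "...", len(three_gene_combos), "in total")
```

Thirty genes already give over a billion combinations, more than any study sample could populate, which is why interaction effects are so much harder to detect than single-allele effects.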

My guess is that the bulk of the missing heritability is to be found in three sources:
  1. Epistasis. Humans are makeshift accidents of evolution, and not well-engineered machines where the effects of individual components have been designed to work well in isolation, shielding other components from their effects. Most things in the human body affect most other things, either directly or indirectly. There are, of course, some master switches which do have individual pronounced effects (e.g., giving you lactose tolerance or breast cancer), but these are the exception. Normal variation is due to how well put together the individual is, and not so much to the individual components.
  2. Gene-Environment interactions. Just as the effect of genes depends on the joint presence of other genes (epistasis), so it depends on the presence of particular environmental influences. Imagine an allele that shows zero association with a particular trait. Does this mean that it has no influence on that trait? No, since zero association is perfectly compatible with even a huge influence, provided that a positive influence under one type of genomic or environmental background is balanced by a negative influence under another.
  3. Very low frequency (family) alleles. Natural selection faces a constant battle against the continuing re-emergence of less-than-optimal alleles. Children are almost certainly on average genetically worse than their parents, since parents have survived and reproduced, while children's ability to do so is yet to be tested (*). While human variation is, in part, due to long-lived alleles that have braved the generations, quite a lot of it is due to recent alleles that arose in families, and have not had the time to spread to many bodies. It is these extremely rare family alleles and allele combinations that population studies can't quite capture.
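
The claim that zero association is compatible with a large influence is easy to demonstrate on a toy population. In the sketch below, every number is invented: an allele raises the trait by one unit on one background and lowers it by one unit on the other, and the two backgrounds are equally common.

```python
from statistics import mean


def trait(allele: int, background: int) -> float:
    """Toy trait: the allele helps on background 1 and hurts on background 0."""
    if allele == 0:
        return 0.0
    return 1.0 if background == 1 else -1.0


# 250 people in each of the four allele/background cells (hypothetical).
population = [(a, b) for a in (0, 1) for b in (0, 1) for _ in range(250)]


def allele_effect(people, background=None):
    """Mean trait of carriers minus non-carriers, optionally within one background."""
    if background is not None:
        people = [(a, b) for a, b in people if b == background]
    carriers = [trait(a, b) for a, b in people if a == 1]
    others = [trait(a, b) for a, b in people if a == 0]
    return mean(carriers) - mean(others)


marginal = allele_effect(population)
on_background_1 = allele_effect(population, background=1)
on_background_0 = allele_effect(population, background=0)

print("effect ignoring background:", marginal)
print("effect on background 1:    ", on_background_1)
print("effect on background 0:    ", on_background_0)
```

A study that ignores the background sees no association at all, even though the allele shifts the trait in every single carrier.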
Read the original story at Nature: Personal genomes: The case of the missing heritability.

Some related posts on the limits of genome-wide association studies: on intelligence, on height and body mass index, and on CNVs.

(*) Incidentally, this is why the population replacement rate is more than 2 children per woman.

September 28, 2008

Integrated detection of SNPs and Copy number variation

While SNPs are single-letter changes in the genetic code, copy number variation (CNV) involves the multiplication (or deletion) of entire chunks of DNA. While in a SNP, the allele is a single letter (e.g., C or T), in CNVs, the allele is an integer number of how many copies of the particular chunk of DNA an individual has. What this paper shows is that most human CNVs don't appear to be "fresh" changes but rather old "frozen" changes that are linked to specific SNPs or combinations of SNPs. Practically, this means that a CNV allele can be inferred fairly accurately by looking at SNPs in the region of the chromosome where it occurs.
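
A small sketch of what "inferring a CNV allele from SNPs" means in practice: measure how strongly a nearby SNP genotype correlates with the integer copy number, then use the SNP genotype as a lookup key. The ten genotypes below are made-up toy data, not values from the paper, and the lookup table is a deliberately naive stand-in for the statistical imputation real studies would use.

```python
from collections import Counter, defaultdict
from statistics import mean


def r_squared(x, y):
    """Squared Pearson correlation between two equal-length numeric lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov * cov / (var_x * var_y)


def tag_table(snp_genotypes, copy_numbers):
    """Most common copy number observed for each SNP genotype."""
    by_genotype = defaultdict(Counter)
    for genotype, copies in zip(snp_genotypes, copy_numbers):
        by_genotype[genotype][copies] += 1
    return {g: counts.most_common(1)[0][0] for g, counts in by_genotype.items()}


# Hypothetical data for ten people: count of T alleles at a SNP (0, 1 or 2)
# and the integer copy number of a nearby DNA segment.
snp = [0, 0, 1, 1, 2, 2, 0, 1, 2, 1]
cnv = [2, 2, 3, 3, 4, 4, 2, 3, 4, 2]

print(f"r^2 between SNP and copy number: {r_squared(snp, cnv):.2f}")
print("predicted copy number by SNP genotype:", tag_table(snp, cnv))
```

In this toy sample the SNP predicts the copy number correctly for nine of ten people; a "fresh" CNV that arose on a random background would show no such relationship.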

Nature Genetics 40, 1166 - 1174 (2008)

Integrated detection and population-genetic analysis of SNPs and copy number variation

Steven A McCarroll et al.

Abstract

Dissecting the genetic basis of disease risk requires measuring all forms of genetic variation, including SNPs and copy number variants (CNVs), and is enabled by accurate maps of their locations, frequencies and population-genetic properties. We designed a hybrid genotyping array (Affymetrix SNP 6.0) to simultaneously measure 906,600 SNPs and copy number at 1.8 million genomic locations. By characterizing 270 HapMap samples, we developed a map of human CNV (at 2-kb breakpoint resolution) informed by integer genotypes for 1,320 copy number polymorphisms (CNPs) that segregate at an allele frequency >1%. More than 80% of the sequence in previously reported CNV regions fell outside our estimated CNV boundaries, indicating that large (>100 kb) CNVs affect much less of the genome than initially reported. Approximately 80% of observed copy number differences between pairs of individuals were due to common CNPs with an allele frequency >5%, and more than 99% derived from inheritance rather than new mutation. Most common, diallelic CNPs were in strong linkage disequilibrium with SNPs, and most low-frequency CNVs segregated on specific SNP haplotypes.

Link

June 18, 2008

Scanning the human genome at kilobase resolution

This is yet another step toward higher-resolution study of human variation. In the current crop of association or population studies, scientists use microarrays to examine a few hundred thousand SNPs or copy number variations. The end goal is to read all bases in a person's genome (full genome sequencing). The cost of these two technologies is at least two orders of magnitude apart. This paper proposes to offer a more thorough scan of the human genome, about an order of magnitude higher than current techniques.

Genome Res. 2008 May;18(5):751-62. Epub 2008 Feb 21.

Scanning the human genome at kilobase resolution.

Chen J, Kim YC, Jung YC, Xuan Z, Dworkin G, Zhang Y, Zhang MQ, Wang SM.

Normal genome variation and pathogenic genome alteration frequently affect small regions in the genome. Identifying those genomic changes remains a technical challenge. We report here the development of the DGS (Ditag Genome Scanning) technique for high-resolution analysis of genome structure. The basic features of DGS include (1) use of high-frequent restriction enzymes to fractionate the genome into small fragments; (2) collection of two tags from two ends of a given DNA fragment to form a ditag to represent the fragment; (3) application of the 454 sequencing system to reach a comprehensive ditag sequence collection; (4) determination of the genome origin of ditags by mapping to reference ditags from known genome sequences; (5) use of ditag sequences directly as the sense and antisense PCR primers to amplify the original DNA fragment. To study the relationship between ditags and genome structure, we performed a computational study by using the human genome reference sequences as a model, and analyzed the ditags experimentally collected from the well-characterized normal human DNA GM15510 and the leukemic human DNA of Kasumi-1 cells. Our studies show that DGS provides a kilobase resolution for studying genome structure with high specificity and high genome coverage. DGS can be applied to validate genome assembly, to compare genome similarity and variation in normal populations, and to identify genomic abnormality including insertion, inversion, deletion, translocation, and amplification in pathological genomes such as cancer genomes.
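Steps (1), (2) and (4) of the abstract can be sketched in silico as a toy. The code below is only an illustration under stated assumptions, not the authors' pipeline: the recognition site `GATC`, the 4-base tag length, the short sequences, and the function names `digest`, `ditag` and `reference_ditags` are all hypothetical choices made for the example, and the cut is simplified to fall immediately before each occurrence of the site.

```python
def digest(sequence, site):
    """Cut `sequence` immediately before every occurrence of `site`."""
    cut_points = []
    start = sequence.find(site)
    while start != -1:
        cut_points.append(start)
        start = sequence.find(site, start + 1)
    # A site at position 0 does not produce an empty leading fragment.
    bounds = [0] + [p for p in cut_points if p != 0] + [len(sequence)]
    return [sequence[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


def ditag(fragment, tag_len):
    """Join the first and last `tag_len` bases of a fragment into one ditag."""
    if len(fragment) < 2 * tag_len:
        return None  # too short to give two non-overlapping tags
    return fragment[:tag_len] + fragment[-tag_len:]


def reference_ditags(reference, site, tag_len):
    """Map each reference ditag to the (start, end) of its source fragments."""
    index = {}
    position = 0
    for fragment in digest(reference, site):
        tag = ditag(fragment, tag_len)
        if tag is not None:
            index.setdefault(tag, []).append((position, position + len(fragment)))
        position += len(fragment)
    return index


# Toy reference and sample sequences (made up for illustration).
reference = "AACCGGTTGATCTTAAGGCCAGATCCCGGAATT"
sample = "AACCGGTTGATCTTAAGGCCAGATCCCGGTTAA"  # differs near the 3' end

index = reference_ditags(reference, "GATC", 4)
sample_tags = [ditag(f, 4) for f in digest(sample, "GATC")]

# Ditags with no match in the reference are candidates for a difference.
unmapped = [t for t in sample_tags if t is not None and t not in index]
print("reference ditags:", index)
print("unmapped sample ditags:", unmapped)
```

In this toy, a sample ditag that fails to map to any reference ditag flags a fragment whose ends differ from the reference. Step (5), which uses the two tags as PCR primers to recover the fragment itself, is not modelled here.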

Link

February 21, 2008

New study on global human variation based on SNPs and CNVs

A new letter in Nature combines data from single nucleotide polymorphisms (SNPs) and copy number variations (CNVs) across 29 human populations. The paper's STRUCTURE results, based on SNPs, haplotypes, and CNVs, are below. Note in particular the green cluster, which was not seen in some previous studies that did not include Oceanian populations; the differentiation between African farmers and hunter-gatherers; and the differentiation between northern and southern Mongoloids evident in the bottom row.



Nature 451, 998-1003 (21 February 2008) | doi:10.1038/nature06742; Received 2 December 2007; Accepted 29 January 2008

Genotype, haplotype and copy-number variation in worldwide human populations

Mattias Jakobsson, Sonja W. Scholz, Paul Scheet, J. Raphael Gibbs, Jenna M. VanLiere, Hon-Chung Fung, Zachary A. Szpiech, James H. Degnan, Kai Wang, Rita Guerreiro, Jose M. Bras, Jennifer C. Schymick, Dena G. Hernandez, Bryan J. Traynor, Javier Simon-Sanchez, Mar Matarin, Angela Britton, Joyce van de Leemput, Ian Rafferty, Maja Bucan, Howard M. Cann, John A. Hardy, Noah A. Rosenberg & Andrew B. Singleton


Genome-wide patterns of variation across individuals provide a powerful source of data for uncovering the history of migration, range expansion, and adaptation of the human species. However, high-resolution surveys of variation in genotype, haplotype and copy number have generally focused on a small number of population groups. Here we report the analysis of high-quality genotypes at 525,910 single-nucleotide polymorphisms (SNPs) and 396 copy-number-variable loci in a worldwide sample of 29 populations. Analysis of SNP genotypes yields strongly supported fine-scale inferences about population structure. Increasing linkage disequilibrium is observed with increasing geographic distance from Africa, as expected under a serial founder effect for the out-of-Africa spread of human populations. New approaches for haplotype analysis produce inferences about population structure that complement results based on unphased SNPs. Despite a difference from SNPs in the frequency spectrum of the copy-number variants (CNVs) detected, including a comparatively large number of CNVs in previously unexamined populations from Oceania and the Americas, the global distribution of CNVs largely accords with population structure analyses for SNP data sets of similar size. Our results produce new inferences about inter-population variation, support the utility of CNVs in human population-genetic research, and serve as a genomic resource for human-genetic studies in diverse worldwide populations.

Link

November 23, 2006

Copy number variation in humans

From the paper:
We obtained the optimal clustering with the assumption of three ancestral populations, with the African, European and Asian populations clearly differentiated

Nature 444, 444-454 (23 November 2006)

Global variation in copy number in the human genome

Richard Redon et al.

Copy number variation (CNV) of DNA sequences is functionally significant but has yet to be fully ascertained. We have constructed a first-generation CNV map of the human genome through the study of 270 individuals from four populations with ancestry in Europe, Africa or Asia (the HapMap collection). DNA from these individuals was screened for CNV using two complementary technologies: single-nucleotide polymorphism (SNP) genotyping arrays, and clone-based comparative genomic hybridization. A total of 1,447 copy number variable regions (CNVRs), which can encompass overlapping or adjacent gains or losses, covering 360 megabases (12% of the genome) were identified in these populations. These CNVRs contained hundreds of genes, disease loci, functional elements and segmental duplications. Notably, the CNVRs encompassed more nucleotide content per genome than SNPs, underscoring the importance of CNV in genetic diversity and evolution. The data obtained delineate linkage disequilibrium patterns for many CNVs, and reveal marked variation in copy number among populations. We also demonstrate the utility of this resource for genetic disease studies.

Link (Free access)

September 01, 2006

Humans have more copies of mystery gene MGC8902 affecting the brain

I have written before about the importance of large-scale genomic differences in human evolution. Now, a new paper in Science presents clear evidence that a particular gene, MGC8902, which is predicted to encode multiple copies of a protein domain of unknown function (DUF1220), shows the most striking human lineage-specific amplification: sequences encoding the domain reach their greatest copy number (212) in humans, compared to other primates. A related story in LiveScience. John Hawks comments.

Science Vol. 313. no. 5791, pp. 1304 - 1307

Human Lineage–Specific Amplification, Selection, and Neuronal Expression of DUF1220 Domains

Magdalena C. Popesco et al.

Extreme gene duplication is a major source of evolutionary novelty. A genome-wide survey of gene copy number variation among human and great ape lineages revealed that the most striking human lineage–specific amplification was due to an unknown gene, MGC8902, which is predicted to encode multiple copies of a protein domain of unknown function (DUF1220). Sequences encoding these domains are virtually all primate-specific, show signs of positive selection, and are increasingly amplified generally as a function of a species' evolutionary proximity to humans, where the greatest number of copies (212) is found. DUF1220 domains are highly expressed in brain regions associated with higher cognitive function, and in brain show neuron-specific expression preferentially in cell bodies and dendrites.

Link

May 19, 2005

The importance of large genomic differences

The published sequence of the human genome is essentially a really long sequence of letters. Individual genomes differ from this sequence, and from each other, because they substitute one letter for another. However, these polymorphisms are quite rare, and hence humans are usually said to be genetically 99.9% the same.

But two genomes may differ in other ways as well. Entire segments of DNA may be duplicated in some, missing in others, or present but written "backwards".

Until recently, it was generally assumed that differences between individuals and populations were due to these really small changes in our genes. But, as reported in Nature, scientists are discovering that large differences, in which big chunks of DNA are duplicated, missing, or inverted, may be even more important for explaining human variation:
Two years ago, a group of researchers led by Michael Wigler at Cold Spring Harbor Laboratory found the first evidence that some of us have more copies of certain genes than do others (R. Lucito et al. Genome Res. 13, 2291−2305; 2003). And at last week's meeting, Evan Eichler of the University of Washington in Seattle reported that this is just the beginning: not only do we carry different copy numbers of parts of our DNA, we also have varying numbers of deletions, insertions and other major rearrangements in our genomes.

In fact, Eichler found at least 297 places in the genome where different individuals have different forms of these major structural variations. At these spots, some of us might carry a major deletion, for example, or an extra hundred bases of DNA.

But do such differences mean anything? Here, too, fresh evidence paints an intriguing picture. In January, scientists at the Iceland-based company deCODE Genetics found a long inversion — a stretch of DNA that is flipped around backwards — that is common in Europeans, but not in Asians and Africans (H. Stefánsson et al. Nature Genet. 37, 129−137; 2005). They also found that women who have this inversion bear more children than those who don't — a classic sign that the inversion confers an evolutionary advantage.

At the Cold Spring Harbor meeting, scientists presented more evidence that structural differences are important in human evolution. Duc-Quang Nguyen, a postdoctoral fellow in Chris Ponting's laboratory at the University of Oxford, UK, reported an analysis of areas where there are different numbers of copies of DNA stretches. Nguyen found that natural selection is actively working on these genes.

What's more, he found that many of these genes belong to groups that seem to help us interact with our environment. For instance, many work in the immune system, and affect how we fight off disease. These are exactly the sort of genes that could explain our diversity — why some of us get asthma when exposed to air pollution, or why some of us can eat plenty of cheeseburgers without gaining weight.

"We knew these variations existed, but this year we're asking, do they matter?" says Ewan Birney, head of bioinformatics for the European Molecular Biology Laboratory, based in Cambridge, UK. "The answer seems to be yes."


April 24, 2005

Spencer Wells responds (again)

Via an e-mail forwarded to the GENEALOGY-DNA-L list:
We will primarily be collecting males from the indigenous populations around the world, which will maximize the number of Y-chromosomes while providing mtDNA as well. It is also easier to study X-chromosome variation in men, since the X is only present in one copy and it is therefore easier to infer haplotypes.

The populations will be chosen through a process of consultation with elders and the people themselves. I have been in Australia for the past few days getting this started, and am off to Singapore and India this week. We will sample both ‘ethnically defined’ (by language, customs, etc.) and ‘geographically defined’ (i.e. if a group, such as the Kazaks of Central Asia, are widespread then we will attempt to sample roughly on a grid) groups.

Sampling from indigenous groups will be through blood draws, which will yield hundreds of micrograms of DNA. This amount of DNA is far more than we need for typing Y and mtDNA, and it will allow us to apply new markers to the study of migratory patterns in the future. Particularly as the HapMap data becomes available, new autosomal haplotype systems should provide great resolution for questions that are unanswerable using Y and mtDNA (remember that these only assess a tiny fraction of your complete genomic ancestry). The DNA will be stored at the regional center that collected it, and will be available for study in the future by all members of the scientific community – effectively a virtual, global biobank. These studies will only take place as collaborations, and the proposed genotyping must follow the guidelines for the study – e.g. only markers that tell us about historical or anthropological information. Also, the actual laboratory work will take place at the regional center(s) – one of our project goals is to build scientific capacity in the less developed countries (Brazil, South Africa, India, etc.) where we have centers. No medical research will ever be conducted using these samples, for reasons having to do with informed consent and intellectual property. We will release all of the anonymous data into the public domain as we analyze it. We feel that this information is part of the ‘commons’ of our species – it belongs to everyone – and no attempt will be made to patent it.

We will be testing every indigenous sample collected for Y and mtDNA. In the case of the former, a multiplex PCR technique will be used to type AT LEAST the 12 STR markers typed by Family Tree DNA. We will probably be typing more – perhaps as many as 20 – in the initial screen. We will also sequence HVR-1 in each individual. Initially, we will also SNP type every Y-chromosome and mtDNA to confirm the haplogroup. Once we have a sufficient database, we will probably be able to predict haplogroup affiliation with a high degree of precision, allowing us to simply type the STRs for most individuals. Over time new markers will be discovered – some perhaps by us, to answer specific questions – but the key will always be having access to the indigenous DNA samples to type these markers. These are the most valuable asset of the project, and we don’t have to limit ourselves to any particular markers – we’ll choose the best ones to answer the questions we are investigating.

The maps shown on the website atlas at the moment demonstrate the routes followed by the markers that will be reported in the public component at the moment. Over time we will add more routes (= subhaplogroups) as the information on them improves. Remember that this is a GLOBAL project of enormous logistical complexity, and therefore that we may not show all of the details of a well-studied region like Europe at this time. We will be improving the level of detail over the coming years, and European users in particular should see their routes become much more detailed. Purchasing a participant’s kit is like purchasing a ‘subscription to your genome’, and you will be able to check back every few months to see what has been updated in your profile.

Finally, the data collected from the public part of the project will allow us to add an enormous number of genotypes to the database, giving us the power to answer some key questions. For instance, at the moment there is no evidence for interbreeding with Neanderthals as modern humans migrated into western Europe, but this is based on only 15-20,000 individuals who have been genotyped. Will we find a rare Neanderthal lineage in the 234,000th sample we type? Also, the public samples will allow us to assess patterns of genetic variation in admixed populations. There are some interesting studies we hope to do with the US census data, comparing Y and mtDNA patterns to that database. So these samples really are part of the project, not simply a way to raise funds - although that is a great aspect as well. Most people I've spoken to love the fact that all of the net proceeds from their kit - slightly more than 20% of the $99.95 price - get plowed back into the research and Legacy project.

Spencer Wells

See the earlier response as well.