SNPs are single-letter changes in the genetic code; copy number variation (CNV), by contrast, involves the multiplication (or deletion) of entire chunks of DNA. For a SNP, the allele is a single letter (e.g., C or T). For a CNV, the allele is an integer: the number of copies of that particular chunk of DNA an individual has. This paper shows that most human CNVs don't appear to be "fresh" mutations but rather old, "frozen" changes that are inherited together with specific SNPs or combinations of SNPs. Practically, this means that a CNV allele can often be inferred fairly accurately from the SNPs in the region of the chromosome where it occurs.
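To make the idea of inferring a CNV from nearby SNPs concrete, here is a minimal sketch in Python. It is not the paper's method: the genotypes are invented toy data, the SNP is hypothetical, and the helper names (`genotype_r2`, `build_predictor`) are made up for illustration. It measures how strongly one SNP tracks copy number using the squared correlation between genotypes, then predicts each person's copy number from their SNP genotype alone.

```python
from collections import Counter, defaultdict


def genotype_r2(snp, cnv):
    """Squared Pearson correlation between SNP genotypes and copy numbers."""
    n = len(snp)
    mean_s = sum(snp) / n
    mean_c = sum(cnv) / n
    cov = sum((s - mean_s) * (c - mean_c) for s, c in zip(snp, cnv))
    var_s = sum((s - mean_s) ** 2 for s in snp)
    var_c = sum((c - mean_c) ** 2 for c in cnv)
    if var_s == 0 or var_c == 0:
        return 0.0  # no variation at one site, so no measurable association
    return cov * cov / (var_s * var_c)


def build_predictor(snp, cnv):
    """Map each SNP genotype to the copy number most often seen with it."""
    by_genotype = defaultdict(Counter)
    for s, c in zip(snp, cnv):
        by_genotype[s][c] += 1
    return {s: counts.most_common(1)[0][0] for s, counts in by_genotype.items()}


# Toy data, invented for illustration (one entry per individual):
# snp = number of T alleles (0, 1 or 2) at a hypothetical SNP near the CNV
# cnv = total copies of the DNA segment that individual carries
snp = [0, 0, 1, 1, 2, 2, 0, 1, 2, 1]
cnv = [2, 2, 1, 1, 0, 0, 2, 1, 0, 2]

r2 = genotype_r2(snp, cnv)
predictor = build_predictor(snp, cnv)

# Fraction of individuals whose copy number is recovered from the SNP alone
accuracy = sum(predictor[s] == c for s, c in zip(snp, cnv)) / len(snp)

print(f"r^2 between SNP and CNV: {r2:.2f}")
print(f"SNP genotype -> predicted copy number: {predictor}")
print(f"Prediction accuracy on the toy data: {accuracy:.0%}")
```

In this toy example each T allele goes along with one fewer copy of the segment, with a single exception, so the SNP works as a good but imperfect proxy. A real analysis would work with phased haplotypes, several SNPs at once, and held-out samples rather than scoring the predictor on the data used to build it.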
Nature Genetics 40, 1166 - 1174 (2008)
Integrated detection and population-genetic analysis of SNPs and copy number variation
Steven A McCarroll et al.
Dissecting the genetic basis of disease risk requires measuring all forms of genetic variation, including SNPs and copy number variants (CNVs), and is enabled by accurate maps of their locations, frequencies and population-genetic properties. We designed a hybrid genotyping array (Affymetrix SNP 6.0) to simultaneously measure 906,600 SNPs and copy number at 1.8 million genomic locations. By characterizing 270 HapMap samples, we developed a map of human CNV (at 2-kb breakpoint resolution) informed by integer genotypes for 1,320 copy number polymorphisms (CNPs) that segregate at an allele frequency >1%. More than 80% of the sequence in previously reported CNV regions fell outside our estimated CNV boundaries, indicating that large (>100 kb) CNVs affect much less of the genome than initially reported. Approximately 80% of observed copy number differences between pairs of individuals were due to common CNPs with an allele frequency >5%, and more than 99% derived from inheritance rather than new mutation. Most common, diallelic CNPs were in strong linkage disequilibrium with SNPs, and most low-frequency CNVs segregated on specific SNP haplotypes.