In addition, the comparison of indels between SJK and YH (Table 4) showed that the two genomes shared the same type of indels by 99.5% on the same genomic loci (SJK and HuRef shared 86.2%, SJK and Watson shared 87.8%, SJK and NA18507 shared 93.6%).
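So, based on indels, the Korean and Chinese individuals are ~24 times less distant from each other than the Korean is from James Watson (of European descent), and ~13 times less distant than the Korean is from NA18507 (a Nigerian). Here "distance" is the percentage of same-locus indels that do not match: 0.5% for SJK and YH, versus 12.2% for SJK and Watson and 6.4% for SJK and NA18507. Table 4 in the paper has all the detailed numbers.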
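The arithmetic behind the "~24 times" and "~13 times" figures can be sketched as follows. This is only a back-of-the-envelope check, assuming that "distance" means the unshared percentage (100 minus the shared percentage quoted above); the variable names are my own, not anything from the paper.

```python
# Percentage of same-locus indels of the same type, SJK vs. each genome,
# as quoted from the paper above.
shared_pct = {"YH": 99.5, "HuRef": 86.2, "Watson": 87.8, "NA18507": 93.6}

# Assumption: treat the unshared percentage as a crude "distance".
distance = {name: 100.0 - pct for name, pct in shared_pct.items()}

# How many times larger is each distance than the SJK-YH distance?
ratio_to_yh = {name: d / distance["YH"] for name, d in distance.items()}

for name, ratio in ratio_to_yh.items():
    print(f"SJK-{name}: {distance[name]:.1f}% unshared, "
          f"{ratio:.1f}x the SJK-YH distance")
```

Because the SJK-YH denominator is only 0.5%, these ratios are very sensitive to small changes in that number, so they are best read as rough orders of magnitude.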
Figure 2 shows the overlap, in number of SNPs, between the various full genomes available.
Consider (E): 1.2 million SNPs are shared by the Korean, Venter, and Watson; ~0.5 million are shared by the Korean and Venter (but not Watson) and by the Korean and Watson (but not Venter), i.e., they transcend racial lines.
However, another ~0.5 million are shared by Venter and Watson but not by the Korean. A subset of these may be shared by accident among these three individuals (i.e., another Korean might also possess some of them). Another subset may be shared by Venter, Watson, and most other Caucasoids; yet another may be shared by Venter and Watson specifically, presumably due to their common Western European ancestry (or other shared minor ancestry), and so on.
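The kind of bookkeeping behind such an overlap diagram can be sketched with plain set operations. This is a toy illustration, not the paper's pipeline: the function name `venn3` and the SNP identifiers are made up, and real comparisons would have to match SNPs by position and allele.

```python
def venn3(a, b, c):
    """Split three sets into the seven regions of a three-way Venn diagram."""
    return {
        "all_three": a & b & c,
        "a_b_only": (a & b) - c,
        "a_c_only": (a & c) - b,
        "b_c_only": (b & c) - a,
        "a_only": a - b - c,
        "b_only": b - a - c,
        "c_only": c - a - b,
    }

# Hypothetical SNP identifiers, for illustration only.
korean = {"snp1", "snp2", "snp3", "snp4"}
venter = {"snp1", "snp2", "snp5", "snp6"}
watson = {"snp1", "snp3", "snp5", "snp7"}

regions = venn3(korean, venter, watson)
counts = {name: len(snps) for name, snps in regions.items()}

for name, n in counts.items():
    print(f"{name}: {n}")
```

With `a` as the Korean, `b` as Venter, and `c` as Watson, the `b_c_only` region corresponds to the SNPs shared by Venter and Watson but not the Korean.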
As we sample more full genomes, we will be able to zero in on the pan-human SNPs, which represent shared human genetic diversity, as well as SNPs limited to races, subraces, ethnic groups, regions, ..., individuals.
This is an open access paper, so you can read it for yourselves.
Genome Research doi:10.1101/gr.092197.109
The first Korean genome sequence and analysis: Full genome sequencing for a socio-ethnic group
Sung-Min Ahn et al.
We present the first Korean individual genome sequence (SJK) and analysis results. The diploid genome of a Korean male was sequenced to 28.95-fold redundancy using the Illumina paired-end sequencing method. SJK covered 99.9% of the NCBI human reference genome. We identified 420,083 novel SNPs that are not in the dbSNP database. Despite a close similarity, significant differences were observed between the Chinese genome (YH), the only other Asian genome available, and SJK: 1) 39.87% (1,371,239 out of 3,439,107) SNPs were SJK-specific (49.51% against Venter's, 46.94% against Watson's, and 44.17% against the Yoruba genomes), 2) 99.5% (22,495 out of 22,605) of short indels (less than 4 bp) discovered on the same loci had the same size and type as YH, and 3) 11.3% (331 out of 2920) deletion structural variants were SJK-specific. Even after attempting to map unmapped reads of SJK to unanchored NCBI scaffolds, HGSV, and available personal genomes, there were still 5.77% SJK reads that could not be mapped. All these findings indicate that the overall genetic differences among individuals from closely related ethnic groups may be significant. Hence, constructing reference genomes for minor socio-ethnic groups will be useful for massive individual genome sequencing.