December 29, 2011

Chinese, Korean, Japanese (genetic edition)

My 2006 post on facial composites of Chinese, Korean, and Japanese women is, surprisingly, the most widely read single entry on this blog. Five years later, people still occasionally guess "who is who" in that post.

As I was going through the list of Dodecad populations, I realized that there are at least 5 participants in each of the Korean, Japanese, and Chinese groups. So it seemed like a simple exercise to check whether the relatively high success rate of people's guesses could be corroborated with the DNA data.

Below is the MDS plot. There are 9 Chinese, 5 Japanese, and 5 Koreans in the Dodecad Project; I have also added 30 HapMap Chinese (CHB) and Japanese (JPT):
Only the first MDS dimension showed a deviation from normality according to a Shapiro-Wilk test. Using MCLUST, that dimension alone was enough (as can be seen from the figure above) to infer the presence of 3 clusters corresponding to the 3 groups, with 100% correct assignments.
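The post used a Shapiro-Wilk test to decide which MDS dimensions carry structure. Shapiro-Wilk coefficients are awkward to compute by hand, so the sketch below swaps in a different and simpler normality test, the Jarque-Bera test, which checks whether sample skewness and kurtosis are close to the normal values of 0 and 3. A dimension that separates well-defined groups tends to have low kurtosis and so fails the test. This is an illustrative sketch, not the author's procedure: the function names `jarque_bera` and `normal_scores` and the toy data are assumptions, and the chi-square p-value is only asymptotic, so it is rough for samples of a few dozen individuals.

```python
import math
from statistics import NormalDist
from typing import List, Tuple


def jarque_bera(xs: List[float]) -> Tuple[float, float]:
    """Jarque-Bera test of normality (a stand-in for Shapiro-Wilk): (statistic, p-value)."""
    n = len(xs)
    if n < 3:
        raise ValueError("need at least 3 values")
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n  # biased central moments
    if m2 == 0:
        raise ValueError("values are constant")
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2  # equals 3 for a normal distribution
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    # Under normality JB is asymptotically chi-square with 2 degrees of freedom,
    # whose survival function is exactly exp(-x / 2).
    return jb, math.exp(-jb / 2.0)


def normal_scores(n: int) -> List[float]:
    """Evenly spaced standard normal quantiles: a deterministic 'normal' sample."""
    return [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]


# Hypothetical MDS dimensions (made-up values, not Dodecad data).
one_group = normal_scores(60)
two_groups = [c + x for c in (-10.0, 10.0) for x in normal_scores(30)]
print(jarque_bera(one_group))   # small statistic, large p-value
print(jarque_bera(two_groups))  # large statistic, small p-value
```

Only a dimension that fails the test (here `two_groups`) would be passed on to the clustering step.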

Interestingly, when I left out the extra HapMap individuals, MCLUST did not split Koreans from Chinese. This shows that the absence of apparent structure does not imply the absence of structure: the extra Chinese and Japanese individuals helped flesh out the existing structure in these East Asian groups.
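MCLUST (the R package mclust) fits Gaussian mixture models by EM and chooses the number of clusters by BIC, which helps explain this sample-size effect: a real split raises the log-likelihood roughly in proportion to the number of individuals in the split groups, while the BIC penalty for an extra component grows only with log n, so a modest but real separation can be rejected when n is small. The sketch below is a much-simplified, one-dimensional stand-in with unequal component variances; it is not mclust itself (which also compares covariance structures and initializes EM from hierarchical clustering). The names `fit_gmm_1d`, `choose_k`, and `classify` and the toy data are hypothetical, and BIC here follows the lower-is-better convention, whereas mclust reports it with the opposite sign.

```python
import math
from typing import Dict, List, Tuple

# A fitted model: (weights, means, variances, log-likelihood)
Fit = Tuple[List[float], List[float], List[float], float]


def _log_normal(x: float, mu: float, var: float) -> float:
    """Log density of N(mu, var) at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)


def _e_step(xs: List[float], w: List[float], mu: List[float],
            var: List[float]) -> Tuple[List[List[float]], float]:
    """Log responsibilities for every point, plus the total log-likelihood."""
    log_r, loglik = [], 0.0
    for x in xs:
        logs = [math.log(wj) + _log_normal(x, mj, vj) for wj, mj, vj in zip(w, mu, var)]
        top = max(logs)
        lse = top + math.log(sum(math.exp(l - top) for l in logs))  # log-sum-exp
        log_r.append([l - lse for l in logs])
        loglik += lse
    return log_r, loglik


def fit_gmm_1d(xs: List[float], k: int, n_iter: int = 500,
               tol: float = 1e-9, min_var: float = 1e-6) -> Fit:
    """EM for a k-component 1-D Gaussian mixture with unequal variances (k <= len(xs))."""
    n, s = len(xs), sorted(xs)
    # Deterministic start: means of k equal-sized chunks of the sorted data.
    mu = []
    for j in range(k):
        chunk = s[j * n // k:(j + 1) * n // k]
        mu.append(sum(chunk) / len(chunk))
    grand = sum(xs) / n
    var = [max(sum((x - grand) ** 2 for x in xs) / n, min_var)] * k
    w = [1.0 / k] * k
    prev = -math.inf
    for _ in range(n_iter):
        log_r, loglik = _e_step(xs, w, mu, var)
        if loglik - prev < tol:  # stop once the log-likelihood stops improving
            break
        prev = loglik
        for j in range(k):  # M-step: re-estimate weight, mean, variance
            r = [math.exp(row[j]) for row in log_r]
            nj = max(sum(r), 1e-12)
            w[j] = nj / n
            mu[j] = sum(ri * x for ri, x in zip(r, xs)) / nj
            var[j] = max(sum(ri * (x - mu[j]) ** 2 for ri, x in zip(r, xs)) / nj, min_var)
    _, loglik = _e_step(xs, w, mu, var)
    return w, mu, var, loglik


def bic(loglik: float, k: int, n: int) -> float:
    """Lower is better; 3k - 1 free parameters (k-1 weights, k means, k variances)."""
    return (3 * k - 1) * math.log(n) - 2.0 * loglik


def choose_k(xs: List[float], max_k: int = 4) -> Tuple[int, Dict[int, float]]:
    """Fit k = 1..max_k components and return the k with the lowest BIC."""
    scores = {k: bic(fit_gmm_1d(xs, k)[3], k, len(xs)) for k in range(1, max_k + 1)}
    return min(scores, key=scores.get), scores


def classify(xs: List[float], fit: Fit) -> List[int]:
    """Assign each point to the component with the highest responsibility."""
    log_r, _ = _e_step(xs, fit[0], fit[1], fit[2])
    return [max(range(len(row)), key=row.__getitem__) for row in log_r]


# Toy 1-D scores for three made-up groups of 10 (not Dodecad data).
devs = [-0.9, -0.6, -0.4, -0.2, -0.1, 0.1, 0.2, 0.4, 0.6, 0.9]
xs = [c + d for c in (-10.0, 0.0, 10.0) for d in devs]
best_k, scores = choose_k(xs, max_k=3)
print(best_k, classify(xs, fit_gmm_1d(xs, best_k)))
```

Shrinking the groups or moving them closer together lowers the likelihood gain from a split relative to the `log(n)` penalty, which is the mechanism by which a small sample can hide structure.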

Below is the list of Dodecad populations with fewer than 5 individuals:

4 individuals: Algerian_D, North_African_Jews_D, Slovenian_D, Mixed_Scandinavian_D, Moroccan_D, Serb_D
3 individuals: East_African_Various_D, Danish_D, Tunisian_D, Austrian_D, Saudi_D, Pakistani_D, Tatar_Various_D, Palestinian_D, Romanian_D
2 individuals: Greek_Italian_D, Swiss_German_D, Szekler_D, Mandaean_D, Azeri_D, Czech_D, Georgian_D, Kazakh_D
1 individual: Belgian_D, Latvian_D, Estonian_D, Bangladesh_D, Yemenese_D, Sri_Lanka_D, Hungarian_D, Basque_D, Udmurt_D, Ukrainian_D, Egyptian_D

If all 4 of your grandparents belong to one of the above groups and you have tested with either 23andMe or Family Finder, you are especially invited to contact me at dodecad@gmail.com about possible inclusion in the project (but do not send data right away!).

For example, in the most recent Clusters Galore analysis, there was a generic "Balkan" cluster. Does this imply that Balkan ethnic groups cannot be distinguished from each other, or simply that sample sizes are not yet sufficient to reveal the existing structure?

14 comments:

Justin said...

Interesting how the Koreans and some CHB differ from the Japanese along the vertical PC. The Japanese and Chinese_D are shifted upward. Assuming that most Chinese_D are Southern Chinese, could this suggest some common ancestry between the Japanese and Southern Chinese?

It would be interesting to see a PCA with more East Asian ethnic groups, including some Tungusic and Mongolic individuals.

Also, a differentiating factor between the Japanese and mainland Northeast Asians is Jomon ancestry. The Jomon Japanese are related to Australoids. Previous admixture runs have shown that the Japanese have some Melanesian or Papuan admixture (small, but more significant than in other East Asians). The Japanese also have some South Asian admixture, evident from the Dodecad V3 study (you have to download the spreadsheet or individual graphs to see it). South Asian admixture is slightly present in some Chinese_D individuals, while it is completely absent among Korean_D individuals.

Justin said...

I'm sure you can get more Korean samples from other studies. The lack of Korean samples may skew the results.

Perhaps: Koreans in genomic context (Jung et al. 2010)

Gene Flow between the Korean Peninsula and Its Neighboring Countries

eurologist said...

At first I was weighing two possibilities: if there is only one significant PC separating the three populations, does it reflect an increasing loss of rice-farmer diversity towards Japan, or increasing native admixture?

However, looking at the graph, perhaps PC1 represents the former and PC2 the latter. After all, why would Japan have PC2 diversity of the same order of magnitude if it were not due to ancient admixture?

Dienekes said...

PC2 doesn't represent anything, since the data points are normally distributed along it.

Justin said...

I wouldn't attribute it to Jomon admixture, since Koreans and Chinese have almost no Y-haplogroup D2 (it has been detected in some samples, but in total it represents only about 1-2% of Y-haplogroup variation in China and Korea). Also, the archaeological record of prehistoric Korea and China has no Jomon-like pottery or artifacts, and North Chinese and Korean skeletons lack a Sundadont dental pattern.

I would say that the first component represents an East Asian component which underwent selection in China, mutated again in Korea, and underwent another mutation in the southern half of the Korean peninsula before it mixed with the Jomon in Japan.

There is no other explanation, because the plot lacks the Tungusic and Southeast Asian groups that could further pin down the relative positions of Chinese, Koreans, and Japanese on such a PCA.

Justin said...

If you compared the French, Swedes and Poles you would get a similar PCA to this.

The French would appear to the left, Swedes in the middle and Poles to the right.

It doesn't account for other variables, such as Southeast Asian, Australoid, or Siberian ancestry, as these populations are missing from the PCA.

terryt said...

"If you compared the French, Swedes and Poles you would get a similar PCA to this".

Yes. In many ways it is just a cline of variation, with Koreans lying between Chinese and Japanese.

princenuadha said...

@Justin

mtDNA evidence implied that the Hokkaido Jomon were largely direct descendants of Paleolithic Siberians, not Southeast Asians. So I don't think the Jomon were Australoids.

Justin said...

@princenuadha

Southeast Asians are not Australoid. Australian Aborigines and Papuans are Australoid.

Also, Jomon are not Hokkaido Jomon. Jomon people are as diverse as the diversity between French and Italians. Japanese Y-DNA evidence suggests that they came from Southeast Asia (the Andaman Islands or nearby) so there may be multiple origins of the Jomon.

Also, the Jomon are not the Ainu, although the Ainu are partially Jomon. The Ainu could be mixed with neighboring populations like Tungusic peoples and Russians.

The South Asian admixture among the Japanese, and the occasional Melanesian admixture (detected in a few admixture runs by genetic bloggers), suggest that Australoid ancestry among the Japanese came from the Jomon. Australoid ancestry is absent among neighboring Northeast Asians such as the Koreans and Nanai/Ulchi, while Y-DNA evidence suggests that haplogroup D (the Jomon-Japanese haplogroup) is largely absent in the Korean peninsula and other parts of Northeast Asia.

princenuadha said...

> Also, Jomon are not Hokkaido Jomon. Jomon people are as diverse as the diversity between French and Italians.

Well, Paleo-Siberians and Australoids together would be an insanely diverse group... too diverse to have come to Japan as one. I think there was a craniometric study suggesting that the Jomon spread through Japan from Hokkaido, so the Hokkaido Jomon are closer to the original Jomon.

I guess you could still be right about southern groups skewing the Japanese away from the other NE Asians, though I had never heard before that they had Australoid ancestry.

terryt said...

"Australoid ancestry is absent among neighboring Northeast Asians such as the Koreans and Nanai/Ulchi, while Y-DNA evidence suggests that haplogroup D (the Jomon-Japanese haplogroup) is largely absent in the Korean peninsula and other parts of Northeast Asia".

Haplogroup D is most certainly not 'Australoid'. Through SE Asia the haplogroup is present (as a very small minority) only in Thailand and Sumatra. It is absent from the remainder of SE Asia including the islands, and from Australia, New Guinea and neighbouring islands.

It is therefore doubtful that Y-DNA D reached Japan by any means other than over land. Its absence in Korea and other parts of Northeast Asia is most probably because an ancient connection between Tibet and Japan has been obscured by the later expansions of haplogroups O, N and C3.

Justin said...

Haplogroup D* is Australoid.

Haplogroup D* has the highest frequency and variation among Andaman Islanders (proto-Australoids), which means haplogroup D* carriers were Australoid before transferring these Y-DNA haplogroups to the Himalayas and Japan.

Haplogroup D2 could have originated in Southeast Asia or Andaman Islands/Thailand. It didn't originate in Northeast Asia. If it originated in Northeast Asia or Tibet, you would at least expect Tibetans to carry haplogroup D2, however D2 is absent in all Northeast Asians, including Tibetans. Only the Japanese have D2.

Actually the Jomon lived in southern and central Japan before being pushed to Hokkaido by the invading Yayoi. Jomon are phenotypically closest to Australoids, with short stature, hirsutism and wide noses.

The Hokkaido Ainu (mixed with Northeast Asians) are genetically different from the Jomon (Australoids). The Australoid phenotype is evident among the common Jomon Japanese, such as Ken Hirai and Shimoji Isamu. They are 100% Japanese by ancestry (although they are assumed to have high Jomon Japanese admixture).

terryt said...

"Actually the Jomon lived in southern and central Japan before being pushed to Hokkaido by the invading Yayoi. Jomon are phenotypically closest to Australoids, with short stature, hirsutism and wide noses".

Perhaps phenotypically Australoid, but certainly not regarding their haplogroup. Mind you, haplogroups, especially Y-DNA, can be replaced.

"Haplogroup D* is Australoid".

On what grounds do you claim that? It is not found anywhere that surviving Australoids live: it is absent from Australia, New Guinea, and Melanesia.

"Haplogroup D* has the highest frequency and variation among Andaman Islanders (proto-Australoids)"

Are you sure the Andaman Islanders are proto-Australoids? Strange that the haplogroup should have been lost in the Australoids proper. To me they are completely independent of 'Australoids', although they may look somewhat like some of them.

"Haplogroup D2 could have originated in Southeast Asia or Andaman Islands/Thailand. It didn't originate in Northeast Asia. If it originated in Northeast Asia or Tibet, you would at least expect Tibetans to carry haplogroup D2, however D2 is absent in all Northeast Asians, including Tibetans. Only the Japanese have D2".

It seems correct that only the Japanese have D2, but surely if D2 originated in Southeast Asia or the Andaman Islands/Thailand we should expect the haplogroup to be found there too. D2 could well have originated in Japan from undifferentiated D. Where D originated is an unsolved question at this stage, although most claim SE Asia. Tibetans evidently carry both D1 and D3, which leaves open the possibility that D originated in Tibet. Andaman Islands D is given as D*, but it is likely to be monophyletic there, because D* is also claimed to be present in Turkic and Mongol speakers. Probably yet another monophyletic D.

terryt said...

http://www.isogg.org/tree/ISOGG_HapgrpD.html

Quote:

"Y-DNA haplogroup D is seen primarily in Central Asia, Southeast Asia, and in Japan and was established approximately 50,000 years ago. Sub-group D1 (D-M15) is seen in Tibet, Mongolia, Central Asia, and Southeast Asia, and the sub-groups D* (D-M174) and D3 (D-P47) are seen in Central Asia. The sub-group D2 (D-M55) is seen almost exclusively in Japan. The high frequency of haplogroup D in Tibet (about 50%) and in Japan (about 35%) implies some early migratory connection between these areas. Examination of the genetic diversity seen in sub-group D2 in Japan implies that this group has been isolated in Japan for between 12,000-20,000 years. The highest frequencies of D2 in Japan are seen among the Ainu and the Ryukyuans. An isolated incidence of haplogroup D has also been seen in the Andaman Islands in the Indian Ocean. This implies that the group may once have had a much greater range, but has subsequently been displaced by more recent population events".