September 25, 2008

Population structure in Japan with 140k SNPs

After the many recent studies on fine-scale genetic ancestry in Europe, a new paper investigates population structure in Japan using 140k SNPs. From the paper:
Our present study has clearly shown, on the basis of analysis of genome-wide SNP genotypes that most Japanese individuals fall into two main clusters: the Hondo cluster and the Ryukyu cluster. Our results also show that local regions in Honshu Island (the largest island of Japan) are still genetically differentiated, even though human migration within Japan has become rather frequent in the past 100 years or so. Our finding that the individuals from Tohoku were less related to Han-Chinese individuals than were the individuals from Kinki and Kyushu suggests that the individuals in Tohoku were less affected by immigrants from the Asian continent than were the individuals in Kinki. The immigrants who came to Japan from the Asian continent through the Korean Peninsula may have entered Japan from northern Kyushu, the Japan Sea side of Kinki or Chugoku.

American Journal of Human Genetics, doi:10.1016/j.ajhg.2008.08.019

Japanese Population Structure, Based on SNP Genotypes from 7003 Individuals Compared to Other Ethnic Groups: Effects on Population-Based Association Studies

Yumi Yamaguchi-Kabata et al.

Abstract

Because population stratification can cause spurious associations in case-control studies, understanding the population structure is important. Here, we examined Japanese population structure by “Eigenanalysis,” using the genotypes for 140,387 SNPs in 7003 Japanese individuals, along with 60 European, 60 African, and 90 East-Asian individuals, in the HapMap project. Most Japanese individuals fell into two main clusters, Hondo and Ryukyu; the Hondo cluster includes most of the individuals from the main islands in Japan, and the Ryukyu cluster includes most of the individuals from Okinawa. The SNPs with the greatest frequency differences between the Hondo and Ryukyu clusters were found in the HLA region in chromosome 6. The nonsynonymous SNPs with the greatest frequency differences between the Hondo and Ryukyu clusters were the Val/Ala polymorphism (rs3827760) in the EDAR gene, associated with hair thickness, and the Gly/Ala polymorphism (rs17822931) in the ABCC11 gene, associated with ear-wax type. Genetic differentiation was observed, even among different regions in Honshu Island, the largest island of Japan. Simulation studies showed that the inclusion of different proportions of individuals from different regions of Japan in case and control groups can lead to an inflated rate of false-positive results when the sample sizes are large.
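The abstract's last point is that sampling regions in different proportions for cases and controls inflates false positives, and more so as samples grow. It can be illustrated with a toy simulation. This is not the authors' simulation design. The two-subpopulation model, the allele-frequency spread, the sample sizes, the allelic chi-square test and all function names below are illustrative assumptions. Every simulated SNP is null: it has no effect on disease, so any "hit" is a false positive.

```python
import math
import random


def allelic_chi2_p(case_alt, case_total, ctrl_alt, ctrl_total):
    """P-value of a 1-df allelic chi-square test on a 2x2 table of allele counts."""
    a, b = case_alt, case_total - case_alt
    c, d = ctrl_alt, ctrl_total - ctrl_alt
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 1.0
    chi2 = n * (a * d - b * c) ** 2 / denom
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


def count_alt_alleles(n_individuals, freq, rng):
    """Alternate-allele count for diploid individuals drawn at a given frequency."""
    return sum(rng.random() < freq for _ in range(2 * n_individuals))


def false_positive_rate(frac_a_cases, frac_a_controls, n_per_group=1000,
                        n_snps=300, alpha=0.05, seed=1):
    """Share of null SNPs reaching p < alpha when cases and controls mix two
    hypothetical subpopulations, A and B, in the given proportions."""
    rng = random.Random(seed)
    n_case_a = round(n_per_group * frac_a_cases)
    n_ctrl_a = round(n_per_group * frac_a_controls)
    total = 2 * n_per_group  # alleles per group
    hits = 0
    for _ in range(n_snps):
        # Hypothetical model: the subpopulations differ slightly at every SNP.
        p_a = rng.uniform(0.1, 0.9)
        p_b = min(0.99, max(0.01, p_a + rng.gauss(0, 0.05)))
        case_alt = (count_alt_alleles(n_case_a, p_a, rng)
                    + count_alt_alleles(n_per_group - n_case_a, p_b, rng))
        ctrl_alt = (count_alt_alleles(n_ctrl_a, p_a, rng)
                    + count_alt_alleles(n_per_group - n_ctrl_a, p_b, rng))
        if allelic_chi2_p(case_alt, total, ctrl_alt, total) < alpha:
            hits += 1
    return hits / n_snps


if __name__ == "__main__":
    print("same mix in cases and controls:", false_positive_rate(0.5, 0.5))
    print("80% A in cases, 20% A in controls:", false_positive_rate(0.8, 0.2))
    print("same skew, 100 per group:", false_positive_rate(0.8, 0.2, n_per_group=100))
```

With identical mixes, the false-positive rate should stay near the nominal 5%. With a skewed mix, it should rise well above that. Shrinking the groups should pull the skewed rate back toward 5%, which matches the abstract's remark about large sample sizes.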

Link

9 comments:

Maju said...

What's the small pink cluster in the left corner?

Ebizur said...

Maju asked,

"What's the small pink cluster in the left corner?"

I would guess that the small, pink cluster in the upper left corner probably represents this study's control group, which seems to have been a sample of Han Chinese.

Also, I would note that the authors' claim, "Our finding that the individuals from Tohoku were less related to Han-Chinese individuals than were the individuals from Kinki and Kyushu suggests that the individuals in Tohoku were less affected by immigrants from the Asian continent than were the individuals in Kinki," is totally unwarranted by the data shown in these graphs. The authors are probably just paying lip service to the proponents of certain archaeological theories. If one actually looks at the graphs, it is obvious that the only Japanese subgroup that produced any individuals whose DNA fell into the control group cluster (probably Han Chinese) was Kanto-Koshinetsu (the Kanto region of southeastern Honshu plus Yamanashi, Nagano, and Niigata Prefectures of east-central Honshu). It contained three individuals who clustered with the outgroup, as well as a good number of individuals who fell into a range intermediate between the outgroup cluster and the main Hondo Japanese cluster. The Kinki sample (central Honshu, including the environs of the historical capitals of Japan, such as Nara and Kyoto) did not contain any individuals who clustered with the outgroup, but it did contain a fair number of individuals who were intermediate between the outgroup cluster and the Hondo Japanese cluster. The Kyushu sample, on the other hand, contained only three individuals who fell into the intermediate range, while all other individuals in the Kyushu sample fit snugly into the Hondo Japanese cluster or the Ryukyu cluster, or else were intermediate between the Hondo Japanese and Ryukyu clusters. In fact, most individuals in the Kyushu sample clustered close together in the lower left quadrant of the range covered by the Hondo Japanese cluster.

In my opinion, as someone who has actually lived in Japan and seen the country's people with my own eyes, I would expect samples from the Chugoku region (westernmost Honshu) to exhibit the greatest amount of overlap with continental East Asian outgroups. The idea, popular in certain archaeological circles, that Kyushu should exhibit some sort of connection with the continent seems absolutely ridiculous to me, because the people of Kyushu are really the most distinctively "Japanese"-looking of all Japanese, excepting, perhaps, some Ryukyuans. Western Honshu Japanese (much more so those in the Chugoku region than those in the Kinki region, in my opinion) are the only Japanese whom I would expect to show a significant deviation in the direction of Chinese or Koreans.

Maju said...

"I would guess that the small, pink cluster in the upper left corner probably represents this study's control group, which seems to have been a sample of Han Chinese."

I guess you are right. Thanks.

"Also, I would note that the authors' claim... is totally unwarranted by the data shown in these graphs."

I can agree that these graphs can tell us only so much about the overall structure (there is no K-means clustering). What they show is that there are two main detected components, maximized at Ryukyu (ev1) and Tohoku (ev2), and that the contrast to both (most negative on both eigenvectors) is China.

Nevertheless, considering that the Chinese control sample is pretty small in comparison (and therefore should not outweigh other major Japanese components), I would also tend to think that there are no other highly relevant components. But we may still be missing some locally important information.

It is curious, anyhow, that Tohoku people appear somewhat more distinct than Hokkaido people. Was Hokkaido more heavily colonized than northern Honshu?

I also agree that the lack of sampling of SW Honshu (Chugoku) is odd and may hide some further information, probably along the lines you mention.

Ebizur said...

Maju said,

"I guess you are right. Thanks."
Well, I was only guessing, so don't blame me if I was wrong. Actually, I am quite certain that the pink cluster is the study's control group, but I am less sure about the identification of this control group as a sample of Han Chinese.

"I can agree that these graphs may tell only so much about the overall structure (no K-means clustering): what they say is that there are two detected main components, maximized at Ryukyu (ev1) and Tohoku (ev2) and that the contrast to both (most negative for both parameters) is China."
Yes, this is correct, but look at where the Kyushu samples fall: they mostly belong to a coherent cluster that, in regard to Eigenvector 1, is neutral and located almost exactly midway between the Okinawa sample and the control sample, and, in regard to Eigenvector 2, deviates from the control sample even more than the Okinawan sample deviates from the control sample. Furthermore, the Kyushu sample is no closer to the control sample along Eigenvector 1 than the Tokai-Hokuriku, Hokkaido, and Tohoku samples are. This distribution cannot be explained by a lesser impact of immigration on Tohoku, Okinawa, etc. in comparison to Kyushu.

"It is curious anyhow that Tohoku people appear somewhat more distinct than Hokkaido people. Was Hokkaido more heavily colonized than northern Honsu?"
There is no need to invoke colonization to explain the variation between Kyushu and Tohoku along Eigenvector 2; Okinawa was, in fact, the closest to the control sample along this dimension, and the position of the Kyushu sample between the Okinawa and Tohoku samples along Eigenvector 2 is best explained by simple isolation-by-distance within the Japanese Archipelago. There does not seem to be any significant difference between the Kyushu and Tohoku samples in regard to Eigenvector 1.

If we assume that the Okinawa sample best represents the "aboriginal inhabitants" and that the position of the Kyushu and Tohoku samples in regard to Eigenvector 1 is due to immigration from the population represented by the control sample, then there is no case for claiming that the Tohoku sample has received less immigration from the outgroup population; in fact, the Kyushu sample shows much more deviation in the direction of the Okinawa sample than the Tohoku sample does.

The reason why Hokkaido appears squarely in the center of the major Hondo Japanese cluster is because the modern population of Hokkaido is descended from recent migrants who originated from all over the Hondo Japanese range (although supposedly with a bias in favor of the Hokuriku region), so the present population of Hokkaido is a homogenized group of Hondo Japanese.

"I also agree that the lack of sampling of SW Honsu (Chugoku) is odd and may hide some further info, probably in the line you mention."
I really wouldn't be surprised if many of the individuals who appeared in the intermediate range (between the control sample and the Hondo Japanese cluster) turned out to have origins in the Chugoku region of westernmost Honshu. However, the Chugoku region is both small and sparsely populated; within Japan, only Hokkaido and parts of the Tohoku region (especially Iwate and Akita Prefectures) are more sparsely populated. This limits the amount of genetic influence the Chugoku region can exert on other parts of Japan.

Ebizur said...

Historically, the most famous part of the Chugoku region has been the province of Izumo (also sometimes romanized as Idumo, following the historical kana spelling). It served as the setting of the "Kuni-biki" (land-pulling) myth and was said to contain the entrance to Yomi, a Japanese analogue of the realm of Hades.

According to Erwin von Bälz, a German physician who is often credited with introducing Western medical practices to Japan, Japanese people could be phenetically classified into two types, the "Satsuma type" and the "Choshu type." (Satsuma refers to Satsuma Province of southwestern Kyushu; Choshu refers to Nagato Province at the western tip of Honshu.) Bälz described the Satsuma type as short-statured and stocky, with large, well-defined eyes, thick lips, and a great breadth of the nose, and said that the majority of Japanese belonged to this type. Bälz described the Choshu type as being tall and oval-faced with thinly drawn eyes and a narrow nose, and he said that this was an elegant look found among some noble families.

From my personal experience, I would say that "Choshu type" Japanese may easily be mistaken for some Chinese people, whereas most "Satsuma type" Japanese are very obviously Japanese.

Maju said...

"...but look at where the Kyushu samples fall: they mostly belong to a coherent cluster that, in regard to Eigenvector 1, is neutral and located almost exactly midway between the Okinawa sample and the control sample, and, in regard to Eigenvector 2, deviates from the control sample even more than the Okinawan sample deviates from the control sample. Furthermore, the Kyushu sample is no closer to the control sample along Eigenvector 1 than the Tokai-Hokuriku, Hokkaido, and Tohoku samples are. This distribution cannot be explained by a lesser impact of immigration on Tohoku, Okinawa, etc. in comparison to Kyushu."

Please consider that what separates the Okinawans from the control is basically ev1; they can be perfectly neutral on ev2 and still have little to do with the control group. Nevertheless, they are not really neutral but do appear to have more "Tohoku" than continental blood, even if both are probably rather low among them.

The same applies to the Kyushu people, but they cluster much closer to the "Tohoku" pole on both vectors.

"There is no need to invoke colonization to explain the variation between Kyushu and Tohoku along Eigenvector 2; Okinawa was, in fact, the closest to the control sample along this dimension, and the position of the Kyushu sample between the Okinawa and Tohoku samples along Eigenvector 2 is best explained by simple isolation-by-distance within the Japanese Archipelago."

As said above, the Okinawans appear relatively neutral on ev2 because of very low levels of either component (Tohoku or continental). If you were right, the main Kyushu cluster would appear right on the diagonal between the Tohoku and Okinawa clusters, but it actually appears further to the left, on a diagonal between the rest of the Japanese (always the main cluster) and the Okinawans. I would even say that the line joining the centers of gravity of the Okinawan and Kyushu clusters falls rather to the left (towards the continent, but still inside the main Japanese cluster).

Of course, there could be a fourth component, or more, that we can't see here.

"If we assume that the Okinawa sample best represents the "aboriginal inhabitants" and that the position of the Kyushu and Tohoku samples in regard to Eigenvector 1 is due to immigration from the population represented by the control sample, then there is no case for claiming that the Tohoku sample has received less immigration from the outgroup population"

You may be right. We would certainly need to know where more or less pure Ainu samples would fall in this graph. They may well represent a fourth component not visible here due to undersampling. Still, the Tohoku sample clearly appears as the "purest" Japanese on ev2 (but all Japanese except the Okinawans appear very much like the Chinese on ev1).

I was assuming that ev2 might represent the Ainu/Jomon component, but it may actually be just the difference between the Yayoi and the Chinese, nothing else.

In that case, you are right about the Okinawans and about ev1 representing the native component best, but it's clear that the Ainu component is missing here.

"The reason why Hokkaido appears squarely in the center of the major Hondo Japanese cluster is because the modern population of Hokkaido is descended from recent migrants who originated from all over the Hondo Japanese range (although supposedly with a bias in favor of the Hokuriku region), so the present population of Hokkaido is a homogenized group of Hondo Japanese."

That's more or less what I imagined. And I therefore wonder whether the ev1 component might not best represent the Jomon/Ainu component, especially because it's more dominant in northern Honshu, where the Ainu are known to have held out for a long time.

"According to Erwin von Bälz, a German physician who is often credited with introducing Western medical practices to Japan, Japanese people could be phenetically classified into two types, the "Satsuma type" and the "Choshu type.""

What about the Jomon/Yayoi distinction some make? I would expect the Yayoi type to be closer to Northern Chinese and other peoples of the East Asian mainland (Koreans, Mongols, Tungus) and the Jomon type to be somewhat more like the Ainu. The two types you mention are both from the southwest, and while the Satsuma type would surely relate to the Okinawans, the northern component (clear in these graphs too) seems to be missing.

ren said...

Maju:
The graphs plot people in relative positions; they do not show how much "Ainu blood" they have. If we compared 10,000 clones of Obama with 10,000 clones of Maju, they would show up as two extremes without the Obamas showing a "negroid" component.

Ebizur:
I'd say the Kyushuites whom you described as uniquely Japanese, in a European way, are a combination of the two archetypes.
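ren's point is that an eigenanalysis plot places individuals relative to the groups that were sampled, rather than measuring a fixed ancestry. That point, and the thread's talk of "intermediate" individuals, can be illustrated with a small simulation. This is a simplified sketch, not the study's own pipeline. It skips per-SNP normalization and computes only the first principal component by power iteration. The population allele frequencies, sample sizes and function names are all hypothetical.

```python
import math
import random


def genotype(freq, rng):
    """Diploid genotype coded as 0, 1 or 2 copies of the alternate allele."""
    return (rng.random() < freq) + (rng.random() < freq)


def pc1_scores(matrix, iterations=100):
    """First principal-component scores for a genotype matrix (rows are
    individuals, columns are SNPs), via power iteration on the
    individuals-by-individuals matrix of column-centered genotypes."""
    n_ind, n_snp = len(matrix), len(matrix[0])
    means = [sum(row[j] for row in matrix) / n_ind for j in range(n_snp)]
    centered = [[row[j] - means[j] for j in range(n_snp)] for row in matrix]
    gram = [[sum(x * y for x, y in zip(r1, r2)) for r2 in centered]
            for r1 in centered]
    # Centered data map a constant vector to zero, so start from a non-constant one.
    vec = [float(i + 1) for i in range(n_ind)]
    eigenvalue = 0.0
    for _ in range(iterations):
        nxt = [sum(g * v for g, v in zip(row, vec)) for row in gram]
        eigenvalue = math.sqrt(sum(x * x for x in nxt))
        vec = [x / eigenvalue for x in nxt]
    # PC scores are the unit eigenvector scaled by sqrt(eigenvalue).
    return [v * math.sqrt(eigenvalue) for v in vec]


def simulate(n_per_pop=20, n_snps=200, seed=7):
    rng = random.Random(seed)
    # Hypothetical allele frequencies for two unrelated source populations.
    freqs = [(rng.uniform(0.05, 0.95), rng.uniform(0.05, 0.95))
             for _ in range(n_snps)]
    pop_a = [[genotype(pa, rng) for pa, _ in freqs] for _ in range(n_per_pop)]
    pop_b = [[genotype(pb, rng) for _, pb in freqs] for _ in range(n_per_pop)]
    # One 50/50 admixed individual: each allele comes from A or B with equal odds.
    admixed = [genotype((pa + pb) / 2, rng) for pa, pb in freqs]
    return pc1_scores(pop_a + pop_b + [admixed])


if __name__ == "__main__":
    scores = simulate()
    print("mean PC1, population A:", sum(scores[:20]) / 20)
    print("mean PC1, population B:", sum(scores[20:40]) / 20)
    print("PC1, admixed individual:", scores[40])
```

In this toy setup, the two simulated populations should land at opposite ends of PC1 and the admixed individual somewhere between them. The axis itself exists only because both populations are in the sample. Remove population B and PC1 would describe some other axis of variation.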

Maju said...

"The graphs are plotting people into relative positions, not showing how much "Ainu blood" they have. If we have 10,000 clones of Obama compared to 10,000 clones of Maju, they will show up as two extremes without the Obama's showing a "negroid" component."

What in my comment makes you think I am considering it to be otherwise? Anyhow, obviously these people are not clones, even if they may be distant relatives.

"I'd say those Kyushuites of whom you said to be uniquely Japanese in a European way are a combination of the two archetypes."

On this, I think I agree (inferring from the graph). They do not appear particularly "Tohokuan" within the Japanese anyhow (though they do when contrasted with continentals, sure).

Unknown said...

I just wish a study like this would be done for my country too!