November 26, 2009

Two papers on Genetic structure of Han Chinese in AJHG

A couple of new papers on population structure in the Han Chinese have just appeared in the American Journal of Human Genetics. My comments will follow once I read the two papers.

UPDATE (on Chen et al.):

The PCA plot on the left from Chen et al. shows clearly the north-south cline of genetic variation in China. While there are no apparent barriers within the Han ethnic group, it is clear that subsets of Han Chinese can be perfectly distinguished from each other by just looking at the first two principal components.

From the paper:
The one-dimensional subpopulation structure of the Han Chinese population (along PC1) showed a close resemblance to their sampling locations on a geographic map, and there is a very high correlation of 0.93 between the mean PC1 values of samples and the median latitudes of the provinces.
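To make the quoted correlation concrete, here is a minimal sketch of how such a number can be computed. It does not use the paper's data: the province latitudes, the per-SNP frequency slopes, and the function name pc1_latitude_correlation are all assumptions chosen for illustration. Genotypes are simulated along a simple latitude cline, PCA is done by SVD, and the mean PC1 score of each province is correlated with its latitude.

```python
import numpy as np

def pc1_latitude_correlation(n_provinces=10, n_per_province=50,
                             n_snps=3000, seed=0):
    """Simulate a north-south cline and return corr(mean PC1, latitude)."""
    rng = np.random.default_rng(seed)

    # Hypothetical province latitudes in degrees north (not the paper's values).
    latitudes = np.linspace(22.0, 42.0, n_provinces)
    centered_lat = latitudes - latitudes.mean()

    # Assumed model: each SNP's allele frequency drifts linearly with latitude.
    base = rng.uniform(0.2, 0.8, n_snps)
    slope = rng.normal(0.0, 0.005, n_snps)  # frequency change per degree
    freqs = np.clip(base[None, :] + centered_lat[:, None] * slope[None, :],
                    0.01, 0.99)

    # Draw 0/1/2 genotypes for every individual in every province.
    genotypes = np.vstack([
        rng.binomial(2, freqs[p], size=(n_per_province, n_snps))
        for p in range(n_provinces)
    ]).astype(float)
    labels = np.repeat(np.arange(n_provinces), n_per_province)

    # Standardize each SNP, then take PC1 from the SVD of the sample matrix.
    genotypes -= genotypes.mean(axis=0)
    std = genotypes.std(axis=0)
    genotypes /= np.where(std > 0, std, 1.0)
    u, s, _ = np.linalg.svd(genotypes, full_matrices=False)
    pc1 = u[:, 0] * s[0]

    # Correlate the per-province mean PC1 with latitude.
    mean_pc1 = np.array([pc1[labels == p].mean() for p in range(n_provinces)])
    return np.corrcoef(mean_pc1, latitudes)[0, 1]

r = pc1_latitude_correlation()
print(f"correlation between mean PC1 and latitude: {r:.3f}")
```

The sign of PC1 is arbitrary, so only the magnitude of the correlation is meaningful. The paper correlates against median province latitudes of real samples; this toy version gives each province a single latitude.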

The results of the STRUCTURE analysis are also very interesting as they show the expected clinality of variation within China rather than sharp distinctions, paralleling the situation in the landmass of Europe. However, at K=3 the major component of the Japanese (JPT) is shown to be a low-level component within the Chinese. It's hard to interpret this, but a first hypothesis could be that the Japanese are descended from an earlier Mongoloid genetic stratum that has since been admixed in the Asian mainland with other Mongoloid groups, but retained its "purity" in the Japanese islands.

The American Journal of Human Genetics, 25 November 2009

Genetic Structure of the Han Chinese Population Revealed by Genome-wide SNP Variation

Jieming Chen et al.


Population stratification is a potential problem for genome-wide association studies (GWAS), confounding results and causing spurious associations. Hence, understanding how allele frequencies vary across geographic regions or among subpopulations is an important prelude to analyzing GWAS data. Using over 350,000 genome-wide autosomal SNPs in over 6000 Han Chinese samples from ten provinces of China, our study revealed a one-dimensional “north-south” population structure and a close correlation between geography and the genetic structure of the Han Chinese. The north-south population structure is consistent with the historical migration pattern of the Han Chinese population. Metropolitan cities in China were, however, more diffused “outliers,” probably because of the impact of modern migration of peoples. At a very local scale within the Guangdong province, we observed evidence of population structure among dialect groups, probably on account of endogamy within these dialects. Via simulation, we show that empirical levels of population structure observed across modern China can cause spurious associations in GWAS if not properly handled. In the Han Chinese, geographic matching is a good proxy for genetic matching, particularly in validation and candidate-gene studies in which population stratification cannot be directly accessed and accounted for because of the lack of genome-wide data, with the exception of the metropolitan cities, where geographical location is no longer a good indicator of ancestral origin. Our findings are important for designing GWAS in the Chinese population, an activity that is expected to intensify greatly in the near future.
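The claim that modest structure can produce spurious associations can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' simulation: two subpopulations ("north" and "south") are drawn from a Balding-Nichols model with an assumed FST of 0.002, every SNP is null (no effect on disease), and the function names and sampling fractions are made up for the example. The only thing that varies is whether cases and controls are geographically matched.

```python
import numpy as np

CHI2_CRIT_1DF_05 = 3.841  # upper 5% point of chi-square with 1 degree of freedom

def allelic_chi2(case_alt, n_case_alleles, ctrl_alt, n_ctrl_alleles):
    """Pearson chi-square for the 2x2 table of alleles by case/control status."""
    a = case_alt.astype(float)
    b = n_case_alleles - a
    c = ctrl_alt.astype(float)
    d = n_ctrl_alleles - c
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    safe = np.where(denom > 0, denom, 1.0)
    return np.where(denom > 0, n * (a * d - b * c) ** 2 / safe, 0.0)

def false_positive_rate(frac_north_cases, frac_north_controls,
                        n_cases=1000, n_controls=1000,
                        n_snps=2000, fst=0.002, seed=1):
    """Fraction of null SNPs passing p < 0.05 in a case-control allelic test."""
    rng = np.random.default_rng(seed)

    # Balding-Nichols model: subpopulation frequencies scatter around an
    # ancestral frequency with variance fst * p * (1 - p).
    p_anc = rng.uniform(0.1, 0.9, n_snps)
    scale = (1.0 - fst) / fst
    p_north = rng.beta(p_anc * scale, (1.0 - p_anc) * scale)
    p_south = rng.beta(p_anc * scale, (1.0 - p_anc) * scale)

    def alt_alleles(n_people, frac_north):
        # Allele counts per SNP, assuming Hardy-Weinberg within each subpopulation.
        n_north = int(round(n_people * frac_north))
        n_south = n_people - n_north
        return (rng.binomial(2 * n_north, p_north)
                + rng.binomial(2 * n_south, p_south))

    case_alt = alt_alleles(n_cases, frac_north_cases)
    ctrl_alt = alt_alleles(n_controls, frac_north_controls)
    chi2 = allelic_chi2(case_alt, 2 * n_cases, ctrl_alt, 2 * n_controls)
    return float(np.mean(chi2 > CHI2_CRIT_1DF_05))

# Cases mostly northern, controls mostly southern (unmatched design).
print("unmatched:", false_positive_rate(0.8, 0.2))
# Cases and controls drawn in the same geographic proportions (matched design).
print("matched:  ", false_positive_rate(0.5, 0.5))
```

In the matched design the false-positive rate should stay near the nominal 5%, while the unmatched design rejects far more null SNPs. This is the sense in which geographic matching serves as a proxy for genetic matching.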


The American Journal of Human Genetics, 25 November 2009

Genomic Dissection of Population Substructure of Han Chinese and Its Implication in Association Studies

Shuhua Xu et al.


To date, most genome-wide association studies (GWAS) and studies of fine-scale population structure have been conducted primarily on Europeans. Han Chinese, the largest ethnic group in the world, composing 20% of the entire global human population, is largely underrepresented in such studies. A well-recognized challenge is the fact that population structure can cause spurious associations in GWAS. In this study, we examined population substructures in a diverse set of over 1700 Han Chinese samples collected from 26 regions across China, each genotyped at ∼160K single-nucleotide polymorphisms (SNPs). Our results showed that the Han Chinese population is intricately substructured, with the main observed clusters corresponding roughly to northern Han, central Han, and southern Han. However, simulated case-control studies showed that genetic differentiation among these clusters, although very small (FST = 0.0002 ∼ 0.0009), is sufficient to lead to an inflated rate of false-positive results even when the sample size is moderate. The top two SNPs with the greatest frequency differences between the northern Han and southern Han clusters (FST > 0.06) were found in the FADS2 gene, which associates with the fatty acid composition in phospholipids, and in the HLA complex P5 gene (HCP5), which associates with HIV infection, psoriasis, and psoriatic arthritis. Ingenuity Pathway Analysis (IPA) showed that most differentiated genes among clusters are involved in cardiac arteriopathy (p < 10^-101). These signals indicating significant differences among Han Chinese subpopulations should be carefully explained in case they are also detected in association studies, especially when sample sources are diverse.
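For a sense of scale of the FST values quoted in this abstract, here is a minimal sketch of Wright's FST for a single biallelic SNP in two equally weighted populations. It is the textbook ratio of between-population variance in allele frequency to total heterozygosity, with no sample-size correction, and is not necessarily the estimator used in either paper. The function name and the example frequencies are illustrative assumptions.

```python
def wright_fst(p1, p2):
    """Wright's FST for one biallelic SNP in two equally weighted populations."""
    p_bar = (p1 + p2) / 2.0
    total = p_bar * (1.0 - p_bar)
    if total == 0.0:
        return 0.0  # SNP is monomorphic overall, so there is no differentiation
    # Variance of the allele frequency between the two populations.
    var_between = ((p1 - p_bar) ** 2 + (p2 - p_bar) ** 2) / 2.0
    return var_between / total

# A 2-point frequency difference (51% vs 49%) gives an FST of about 0.0004,
# inside the genome-wide range quoted in the abstract.
print(wright_fst(0.51, 0.49))

# A 25-point difference (62.5% vs 37.5%) gives an FST of about 0.06,
# the order of magnitude reported for the most differentiated SNPs.
print(wright_fst(0.625, 0.375))
```

Because FST grows with the square of the frequency difference, a genome-wide average below 0.001 is compatible with a handful of SNPs that differ substantially between north and south.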


terryt said...

"our study revealed a one-dimensional 'north-south' population structure".

Years ago I used some books written in 1978 and 1979 as references for part of a research project on Polynesian origins where the authors claimed such a structure in China. In fact that structure has continued to be significant in helping to unravel Polynesian origins.

Dienekes said...

Yes, it would be interesting to see Chinese genetic variation in the context of other Mongoloid groups. Chen et al. show, for example, that the Japanese are closer in admixture proportions to Liaoning and other northern groups at K=2, while becoming distinct at K=3.

Kepler said...

It would be nice to see a map distribution of haplogroups as well.

Unknown said...

Some observations from a westerner who has lived in China and (now) in Korea. (1) It is obvious to westerners that people from the dongbei (literally, east-north, in this study 'Liaoning', which abuts North Korea) are (generally) taller and lighter in complexion than southerners. Hebei ("River [Yangtse] north") is a province that surrounds Beijing and abuts Liaoning; people from this area fall on the cline. (I recall a cladistic study a few years ago which showed that northern chinese are more closely related to caucasians than to southern chinese.) The obvious hypothesis involves introgression from Russia. I do not see any data from 'further north' in China (Heilongjiang and Jilin, much more affiliated with the Russian Far East). Samples from there and from Inner Mongolia would be interesting.
(2) Japan was colonized by people from Korea. This explains their similarity (at K=2) with people from Liaoning. A further study of Korean genetics would be very informative. I predict that they are intermediate between Liaoning and Japan.

Unknown said...

what kepler said

Ebizur said...

Derek said,

"Hebei ("River [Yangtse] north") is a province that surrounds Beijing and abuts Liaoning; people from this area fall on the cline."

As a fellow "Westerner," I am ashamed that you would confuse the he of Hebei, which refers to the Yellow River (Huang He), for jiang, which refers to the Yangtze River (Chang Jiang).

Anonymous said...

The difference between NE Asia (Japan and Korea) and Han Chinese is the ratio of Y haplogroups O2 and D1, which is lower in Han Chinese. Coincidentally, the same clades appear in Tibetans.

Maybe that is the difference at K=2.

It would have been nice if they had extended this study beyond Singapore to Malaysia and Indonesia, where again you can see O2-95.

Unknown said...

Ebizur: You're correct. My mistake. Shoddy research on my part.

Anonymous said...

@ Derek : "I recall a cladistic study a few years ago which showed that northern chinese are more closely related to caucasians than to southern chinese"

Are you sure you're not confusing this with studies of the 500 BC population of Linzi (Shandong, China)?

Unknown said...

waggg: No, that's not it. The cladogram I'm thinking about was in a book (Cavalli-Sforza maybe?) and pertained to modern Chinese populations. I took the point that cladistic analysis is not appropriate if populations still interbreed after they "split". I believe the authors also reached the same conclusion: that significant "caucasian" genetic material had seeped into the northern han gene pool.

aargiedude said...

In the 1st study the FST distance (aka genetic distance) between north and south (of eastern China) is 0.0022:

1st supplementary page
[page 12 of the pdf]

In the 2nd study the average FST distance between eastern Chinese regions seems to be somewhere between 0.0100 and 0.0150:

1st supplementary page
[page 3 of the pdf]

A five-fold difference between the 2 studies. Also notice, in the 2nd pdf, that there's a cluster of results in the right part of the graph. Those are obviously the Xinjiang samples (northwest China). Their FST distances to eastern Chinese regions seem to also average between 0.0100 and 0.0150.

Anonymous said...

The Japanese islands may have been peopled by immigrants from the mainland, today's Korea, but the Ainu have contributed somewhat to Japanese genetics, as did people from the south. The Japanese are not homogeneous unless you mean they have mainly bred with each other exclusively for thousands of years to produce a consistent hybridized or mixed predominantly East Asian.

terryt said...

"The Japanese are not homogeneous unless you mean they have mainly bred with each other exclusively for thousands of years to produce a consistent hybridized or mixed predominantly East Asian".

True of every other 'ethnic' group as well. Although obviously not all are East Asian.

Unknown said...

Instead of peering down microscopes just take a tour of China. Of course, people are different because they are different races. Just like in USA, you have Irish, Italians, Japanese, people from China and all parts of the world. China is a vast region with many people living there since time immemorial.

The Han is not even a race but a label (named after the Han dynasty) created by post-imperial China's leaders to group people for nation-building purposes.

Only those that could be passed off as one 'Han race' were included in that grouping. Others, like Tibetans obviously are a group of their own.

Over time there will be some 'merging' as people intermarry. My dad was Hakka, mum Hokkien, on paper, but their dads and mums intermarried too, and so forth. I am probably a mix of more than a dozen groups, including Hakka, Hokkien, Cantonese, Foochow and, I was told, some Portuguese as well.