December 08, 2010

Genome-wide analysis of population structure in the Finnish Saami

The K=6 ADMIXTURE results from the supplementary material can be seen below:

This is based on ~38k SNPs.

It is unfortunate that they included Native American HGDP populations, but did not include the most relevant published data on Siberians that I first used to study population structure across north Eurasia here and here and here.

Hence, they discover a "Native American"-like component in Saami, which in all likelihood can be further resolved into Siberian-specific components utilizing the Rasmussen et al. dataset.

The "closest approximation" to the East Eurasian component in Saami in the HGDP panel is the Yakuts, but finer-scale analysis (see my previous posts) reveals that the Yakuts are made up almost entirely of an Altaic-specific component tying them to Turkic, Mongol, and Tungusic populations, while the eastern component in European Finns, Vologda Russians, and Chuvashs has relationships with Central Siberians such as Kets, Selkups, and Nganasans, all of whom are missing from this paper.

Hopefully this data will become publicly available online for re-analysis with the relevant populations included.

European Journal of Human Genetics advance online publication 8 December 2010; doi: 10.1038/ejhg.2010.179

A genome-wide analysis of population structure in the Finnish Saami with implications for genetic association studies

Jeroen R Huyghe et al.

The understanding of patterns of genetic variation within and among human populations is a prerequisite for successful genetic association mapping studies of complex diseases and traits. Some populations are more favorable for association mapping studies than others. The Saami from northern Scandinavia and the Kola Peninsula represent a population isolate that, among European populations, has been less extensively sampled, despite some early interest for association mapping studies. In this paper, we report the results of a first genome-wide SNP-based study of genetic population structure in the Finnish Saami. Using data from the HapMap and the human genome diversity project (HGDP-CEPH) and recently developed statistical methods, we studied individual genetic ancestry. We quantified genetic differentiation between the Saami population and the HGDP-CEPH populations by calculating pair-wise FST statistics and by characterizing identity-by-state sharing for pair-wise population comparisons. This study affirms an east Asian contribution to the predominantly European-derived Saami gene pool. Using model-based individual ancestry analysis, the median estimated percentage of the genome with east Asian ancestry was 6% (first and third quartiles: 5 and 8%, respectively). We found that genetic similarity between population pairs roughly correlated with geographic distance. Among the European HGDP-CEPH populations, FST was smallest for the comparison with the Russians (FST=0.0098), and estimates for the other population comparisons ranged from 0.0129 to 0.0263. Our analysis also revealed fine-scale substructure within the Finnish Saami and warns against the confounding effects of both hidden population structure and undocumented relatedness in genetic association studies of isolated populations.
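For readers unfamiliar with pairwise FST values like those quoted in the abstract, here is a minimal sketch of one common estimator, Hudson's FST in the ratio-of-averages form described by Bhatia et al. (2013). The abstract does not say which estimator the authors used, so this choice is an assumption, and the function name `hudson_fst` and its inputs are hypothetical.

```python
def hudson_fst(p1, p2, n1, n2):
    """Hudson's FST between two populations (ratio of averages over SNPs).

    p1, p2 : per-SNP frequencies of the same allele in populations 1 and 2
    n1, n2 : number of sampled chromosomes in each population
             (assumed constant across SNPs to keep the sketch short)
    """
    if len(p1) != len(p2):
        raise ValueError("p1 and p2 must cover the same SNPs")
    if n1 < 2 or n2 < 2:
        raise ValueError("need at least two chromosomes per population")

    num = 0.0  # between-population variance, corrected for sampling noise
    den = 0.0  # probability that two alleles, one from each population, differ
    for a, b in zip(p1, p2):
        num += (a - b) ** 2 - a * (1 - a) / (n1 - 1) - b * (1 - b) / (n2 - 1)
        den += a * (1 - b) + b * (1 - a)

    if den == 0:
        raise ValueError("no SNP is polymorphic across the two populations")
    return num / den
```

Two populations fixed for different alleles at every SNP give a value of 1, while two samples with identical frequencies give a value slightly below 0 because of the sampling correction. Values around 0.01, like those in the abstract, sit near the low end of that range.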

Link

6 comments:

German Dziebel said...

"It is unfortunate that they included Native American HGDP populations, but did not include the most relevant published data on Siberians that I first used to study population structure across north Eurasia here and here and here.

Hence, they discover a "Native American"-like component in Saami, which in all likelihood can be further resolved into Siberian-specific components utilizing the Rasmussen et al. dataset."

If you look at other cluster analyses such as Rosenberg et al. 2002, you'll see a Native American component (purple, especially in Surui and Karitiana, the least admixed of all world populations) in all world populations with declining frequencies from east to west. This study simply shows that in Saami this component is higher than in other Europeans, which is to be expected, as fringe populations retain more of the original genes (compare the recent discovery of a Native American-related C1 mtDNA lineage in Iceland), while the mainland gets swept by later population movements (e.g., mtDNA U and H in Europe). They could've included Siberian data - true - but this wouldn't have changed anything, because Siberian populations tend to have the Native American component at higher frequencies than Europeans or Africans but not as high as Native Americans themselves. If Africans are said to be the first continental population to split off, Native Americans haven't "split off" at all.

Mauri said...

They should have used Norwegian Saamis, because the Saami population in Finland is tiny and heavily mixed with Finns. Norwegian Saamis are more numerous and less mixed.

Anonymous said...

Rather than playing the "which Saami are the least mixed and purest" game, it would have been better to spend some extra money and obtain samples from all the people who call themselves Saami and have some sort of credibility as Saami among Saami. Don't restrict Saami to any one country, whether Norway, Sweden, Finland or Russia. I have seen Norwegian Saami that would make the grade as super Herrenvolk pure Nordics. Least admixed?

Amerindians are close to Europeans, and as close as East Asians, but it would indeed have been better to use Siberian Asians as the check ethnic groups instead of Amerindians. As far as purity goes which Amerindians are the least admixed? I doubt there are enough unmixed Amerindians in the whole of the Western Hemisphere to supply a decent sample of pure Amerindians.

My question is: Why use so few SNPs? It is evident from the differences between Dodecad 23andMe and FTDNA results that the larger sample used for 23andMe and their applicability to mainstream SNP genetic studies makes it more useful and accurate than those oddment SNPs chosen by FTDNA.

Mauri said...

My meaning was not to stress purity but to keep in mind where they live:

Norway 60000-100000
Sweden 15000
Finland 10000
Russia 2000

We have no way to make any "purity check", and it would be a stupid idea anyway. All this science is at a very basic level and driven largely by personal opinions, so I think people should be tested where most of them live in order to get reliable results.

German Dziebel said...

"As far as purity goes which Amerindians are the least admixed? I doubt there are enough unmixed Amerindians in the whole of the Western Hemisphere to supply a decent sample of pure Amerindians."

I wouldn't be so skeptical. Just a couple of weeks ago I visited a Guarani village on the border between Argentina and Brazil, and they told me, anecdotally, that in the case of an outsider "marrying" a Guarani, the couple and their offspring tend not to stay around but move out into the "bigger world." Hence, there's a pretty good chance that whoever stays behind is - with all caveats - unadmixed.

Andrew Oh-Willeke said...

"As far as purity goes which Amerindians are the least admixed? I doubt there are enough unmixed Amerindians in the whole of the Western Hemisphere to supply a decent sample of pure Amerindians."

You don't need a very pure sample, because there is such a large phylogenetic gap between the admixing European and African populations and the Amerindians, and the historical events giving rise to the admixture are documented well enough, that the Amerindian component can be separated statistically with a high degree of confidence. Also, even a small pure Amerindian sample can add a great deal of statistical power to resolving the close calls in the admixed population.

The Ancestral South Indian component in India, which basically doesn't exist in an unadmixed form any more, has been reconstructed in a similar way, with far less clarity about what the admixing Ancestral North Indian component looked like and far less accurate historical documentation of the admixture process itself. In the Americas it is frequently possible to do things like determine the exact ethnic mix of non-Amerindians during the period of peak admixture from historical records.