November 16, 2010

Genomic runs of homozygosity in worldwide populations (Kirin et al. 2010)

This is a very interesting paper about the global distribution of runs of homozygosity. Such runs are typical of recently inbred individuals (who have a greater chance of inheriting the same chunk of DNA from both of their related parents), but they also arise from population history: populations that today number in the millions are descended from a much smaller number of ancestors, so even if one's parents aren't "relatives" in the genealogical sense, they nonetheless share chunks of identical DNA.

"Old" inbreeding manifests itself in small chunks, as DNA chunks of ancestors get cut into ever finer pieces across the generations, while recently inbred individuals may have very long chunks.
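To make the "finer pieces" idea concrete, here is a minimal back-of-the-envelope sketch. It is not taken from the paper and rests on simplifying assumptions. Crossovers are treated as a Poisson process at one per Morgan per meiosis. The shared ancestor sits `g` generations above the individual on each parental side, so the loop from one copy back to the ancestor and down to the other copy spans `2*g` meioses. Under these assumptions the homozygous segments it leaves are exponentially distributed in length, with mean `100/(2*g)` cM. That is roughly the same number of Mb, using the rough human average of about 1 cM per Mb. The function names are hypothetical.

```python
import math

def mean_roh_length_cm(g: int) -> float:
    """Mean length (cM) of homozygous segments whose two copies descend from
    one ancestor g generations above the individual on each parental side.

    Assumption: segments are exponential with rate 2*g per Morgan
    (2*g meioses in the loop, 1 crossover per Morgan per meiosis).
    Counts segments individually, not weighted by length.
    """
    if g < 1:
        raise ValueError("g must be at least 1")
    return 100.0 / (2 * g)

def prob_longer_than(g: int, threshold_cm: float) -> float:
    """P(segment length > threshold_cm) under the same exponential model."""
    return math.exp(-(2 * g) * threshold_cm / 100.0)

# g = 3: the common ancestor is a great-grandparent, as for a first-cousin couple's child.
for label, g in [("first cousins' child", 3), ("~10 generations", 10), ("~100 generations", 100)]:
    print(f"{label:22s} g={g:3d}  mean {mean_roh_length_cm(g):6.2f} cM  "
          f"P(>8 cM) = {prob_longer_than(g, 8):.3g}")
```

The point is qualitative. Each extra generation adds two more chances for recombination to cut the shared stretch, so the expected length shrinks roughly as 1/g. Long runs therefore point to recent shared ancestors. Many short runs point to a small ancestral population many generations back.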

Oceanians and Native Americans, for example, are descended from relatively few founders because of the bottlenecks associated with maritime voyages and the Beringian crossing, respectively. Both have an excess of short runs of homozygosity, but Native Americans also have long ones, suggestive of recent consanguinity.

Raw data can be found in the supplement.

PLoS ONE 5(11): e13996. doi:10.1371/journal.pone.0013996

Genomic Runs of Homozygosity Record Population History and Consanguinity

Mirna Kirin et al.

The human genome is characterised by many runs of homozygous genotypes, where identical haplotypes were inherited from each parent. The length of each run is determined partly by the number of generations since the common ancestor: offspring of cousin marriages have long runs of homozygosity (ROH), while the numerous shorter tracts relate to shared ancestry tens and hundreds of generations ago. Human populations have experienced a wide range of demographic histories and hold diverse cultural attitudes to consanguinity. In a global population dataset, genome-wide analysis of long and shorter ROH allows categorisation of the mainly indigenous populations sampled here into four major groups in which the majority of the population are inferred to have: (a) recent parental relatedness (south and west Asians); (b) shared parental ancestry arising hundreds to thousands of years ago through long term isolation and restricted effective population size (Ne), but little recent inbreeding (Oceanians); (c) both ancient and recent parental relatedness (Native Americans); and (d) only the background level of shared ancestry relating to continental Ne (predominantly urban Europeans and East Asians; lowest of all in sub-Saharan African agriculturalists), and the occasional cryptically inbred individual. Moreover, individuals can be positioned along axes representing this demographic historic space. Long runs of homozygosity are therefore a globally widespread and under-appreciated characteristic of our genomes, which record past consanguinity and population isolation and provide a distinctive record of the demographic history of an individual's ancestors. Individual ROH measures will also allow quantification of the disease risk arising from polygenic recessive effects.
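The post doesn't describe how the runs were called. Studies of this kind usually rely on dedicated tools, and PLINK's `--homozyg` option is one widely used choice that tolerates occasional heterozygous or missing calls within a run. The following is a deliberately simplified sketch of the basic idea, not the paper's pipeline. It scans SNPs along one chromosome and keeps stretches of consecutive homozygous genotypes that meet a minimum SNP count and length. It then sums ROH length per length class. The thresholds (25 SNPs, 0.5 Mb) and the doubling length bins are illustrative assumptions, chosen only to echo the 0.5–1 Mb and >16 Mb ranges discussed in the comments. The function names and toy data are hypothetical.

```python
# Hypothetical length classes in Mb (an assumption, not the paper's definition).
BINS_MB = [(0.5, 1), (1, 2), (2, 4), (4, 8), (8, 16), (16, float("inf"))]

def find_roh(snps, min_snps=25, min_length_mb=0.5):
    """Return (start_bp, end_bp) for runs of consecutive homozygous SNPs.

    snps: (position_bp, genotype) pairs sorted by position on one chromosome;
    genotype is an allele count (0, 1 or 2), so 1 means heterozygous.
    Any heterozygous call ends a run (no error tolerance, unlike real tools).
    """
    runs = []
    run = []  # positions in the current homozygous stretch
    for pos, gt in list(snps) + [(None, 1)]:  # sentinel het call flushes the last run
        if gt in (0, 2):
            run.append(pos)
            continue
        if len(run) >= min_snps and (run[-1] - run[0]) / 1e6 >= min_length_mb:
            runs.append((run[0], run[-1]))
        run = []
    return runs

def total_roh_by_bin(runs):
    """Sum ROH length (Mb) within each length class in BINS_MB."""
    totals = {b: 0.0 for b in BINS_MB}
    for start, end in runs:
        length = (end - start) / 1e6
        for lo, hi in BINS_MB:
            if lo <= length < hi:
                totals[(lo, hi)] += length
                break
    return totals

# Toy chromosome: one SNP every 10 kb; 400 homozygous SNPs, a het call,
# 100 homozygous SNPs, a het call, then a stretch too short to count.
genotypes = [0] * 400 + [1] + [2] * 100 + [1] + [0] * 10
snps = [(i * 10_000, g) for i, g in enumerate(genotypes)]
runs = find_roh(snps)
totals = total_roh_by_bin(runs)
print(runs)
print({f"{lo}-{hi} Mb": round(v, 2) for (lo, hi), v in totals.items() if v})
```

Summing length per class, rather than simply counting runs, mirrors the "total length of ROH" measures quoted in the comments. Comparing the short and long classes is what separates old, population-level shared ancestry from recent parental relatedness.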


14 comments:

Marnie said...

Thanks for both this paper and the Oceania paper.

The theory about age-associated granularity in DNA "chunking" is very interesting.

You've probably noticed that your CEU data doesn't have much variation. It's a very "settled" population.

With that in mind, the unusual combinations of skin color, freckling, eye color and hair color in the British Isles, which are associated with the OCA2 gene, suggest that OCA2 has been subjected to fine "chunking" within this population.

Brown hair, light eyes and fair skin also appear in certain parts of Greece, as you know. Not just in the North. For instance, it seems to be common in people from Kalamata.

Andrew Oh-Willeke said...

Central and South America have a double bottleneck/founder effect. There is the historically distant Beringian founder bottleneck, but there are also the founder effects associated with rapid population growth during the pre-Columbian adoption of agriculture (Olmec/Mayan/Aztec in Central America, Inca in South America).

Andrew Oh-Willeke said...

One more thought. In the Columbian era, there were massive deaths from European diseases across the Americas. Those who survived probably had adaptive mutations that ran in families. Thus, a good share of pre-Columbian genetic diversity may have been removed from the population within the last 500 years.

Anonymous said...

0.5–1 Mb range
So, as expected, Africa, with the most diverse population, has the lowest homozygosity in the 0.5–1 Mb range.

It can be expected that the ROH value for the other populations will depend on the diversity of the founder population. So it makes sense that Central/South/West Asia is next and South America is the highest.

But why is Oceania high and EXACTLY the same as South America? I expected Oceania to be similar to South Asia. Does this mean that they were both founded by the same basic population (identical diversity) and isolation has retained the initial diversity? Is it a coincidence?

>16 Mb range
This is really a measure of inbreeding, but it has to be adjusted in accordance with the population diversity. So, all things being equal, the populations should have the same proportions as in the 0.5–1 Mb range.

Oceania has disproportionately low inbreeding. This is surprising given the isolation of some island populations, but it fits with some known cultural practices, in Australia at least, that carefully restrict breeding practices.

Central/South America seems to be disproportionately high. No idea what that is about.

Central/South/West Asia are also higher than expected.

Europe and East Asia are lower than expected.

I would expect USA/Canada to be higher than the Europe value in this range.

German Dziebel said...

"The Biaka and Mbuti pygmies and !Kung San have on average more than double the total length of ROH between 1–16 Mb compared to the Bantu, Yoruba and Mandenka."

This means that African foragers are less genetically diverse than African agriculturalists and pastoralists. It also means that Pygmies and the San are outliers in Africa and are closer to populations outside of Africa in their degree of homozygosity. It doesn't make sense when people argue that Africans are an older population because they are more diverse: no, the more recent subset of Africans (agriculturalists and pastoralists) is more diverse than the older subset of Africans. The whole argument is, therefore, flawed.

"ROH, like other aspects of our genetic variation including genetic diversity and linkage disequilibrium, demonstrate a strong correlation with distance from East Africa."

How can homozygosity increase with distance from East Africa if Pygmies and San are more homozygous than West Africans? (Presumably, they are also more homozygous than East Africans, although this paper didn't sample East Africans.)

"both ancient and recent parental relatedness (Native Americans); and (d) only the background level of shared ancestry relating to continental Ne (predominantly urban Europeans and East Asians; lowest of all in sub-Saharan African agriculturalists)"

Here we're getting to the thing hidden since the foundation of the world. If we attribute greater antiquity to more heterozygous populations, then we are bound to argue that humans originated in agglomerated, urbanized Sub-Saharan Africa and made their first stop in New York or Tokyo.

Since this is clearly impossible, extreme levels of homozygosity in the New World are likely to be of great antiquity and represent long-term isolation and prolonged tribal (second-cousin marriage) and demic (bilateral first-cousin marriage, e.g., in the Amazon) endogamy. Under a single origin scenario of human evolution, humans evolved from a small group of founders. American Indians appear to have retained this early human demographic pattern, which has been progressively lost as populations colonized the Old World and expanded in size.

American Indians, a continental population that scores the highest mean total ROH in all length categories, are divided into 140 language stocks. This extreme linguistic diversity shows that they cannot be recently derived from a Siberian antecedent. These are, respectively, the genetic and linguistic signatures of an old, Mid-Pleistocene outlier in the human tree.

Andrew Oh-Willeke said...

"This extreme linguistic diversity shows that they cannot be recently derived from a Siberian antecedent."

There are arguments that some peoples of Northern North America have more recent origins in Siberia, but nobody is seriously arguing that indigenous people in Central America and South America have Siberian roots that are any younger than Clovis, and there is continuing academic debate over whether the origins could be a few thousand years older than that.

11,000 or more years is a linguistic depth that makes it difficult to recover any linguistic unity.

Are you arguing that Clovis is "recent"?

German Dziebel said...

"Are you arguing that Clovis is "recent"?"

This is the accepted way to refer to the Clovis timeframe. We're way past Hrdlicka times, Andrew.

"11,000 or more years is a linguistic depth that makes it difficult to recover any linguistic unity."

Yes, no and irrelevant. Yes, because for the majority of language families out there linguists entertain dates between 3,000 and 5,000 years. No, because there are language families that seem to be detectable by the standard comparative method beyond 5,000 years. Examples: Dene-Yeniseian, Austronesian, Austric (Austronesian + Austroasiatic, with some caveats), Afroasiatic. Irrelevant, because the 140 language stocks found in the Americas (compare only 20 in Africa) will coalesce much earlier than 15,000 years ago, by any estimate. If Amerindians all stem from a single small founding population, as this study seems to re-affirm, then it's unlikely it encompassed more than a couple of language stocks.

Andrew Oh-Willeke said...

As far as "recent" goes, the term has very different meanings in different disciplines (e.g., geologists think 65 mya is recent, U.S. historians think 60 ya is recent, journalists think last week is recent, and European historians think 500 ya is recent), so it is hard to know on the internet what one means by the term. I would normally think of "recent" in the linguistic context as events happening in the historical era (e.g. no more than 5500 ya and often less), since that marks the divide roughly between definitively understood parts of the discipline and those where theory is critical to reaching conclusions.

It is also worth noting that one reason Amerindian has so many "language stocks" is a lack of scholarship. There aren't a whole lot of people doing Amerindian linguistics in Central and South America, where most of the language families are located (and the number of language families and isolates is considerably fewer in better studied North America and is falling), and the current linguistic discipline trend is to set the bar for showing linguistic relationships in a single language family very high, even when, as in the Amerindian case, there is every reason to infer from corroborating evidence that there are genetic relationships between languages.

I seriously doubt that there are as many different language families in Latin America as current lists imply, although there may be more linguistic diversity simply for lack of Neolithic sweeps in much of the region to standardize languages, and because most of the languages were not committed to writing in the Pre-Columbian era.

German Dziebel said...

"As far as "recent" goes, the term has very different meanings in different disciplines..."

Let's stick to American archaeology.

"It is also worth noting that one reason Amerindian has so many "language stocks" is a lack of scholarship."

It's kinda ironic: I consider the paucity of archaeological sites in America older than 13,500 years partly a result of poor or non-existent scholarship.

I would disagree with you: it doesn't take much effort to establish a language family. Sir William Jones did it for the bulk of IE languages as early as 1786. It takes much longer to flesh out all the nitty-gritty phonological and other details, but establishing that tens or hundreds of languages are related is a very manageable task. Amerindian linguistics has as much history as Indo-European, Uralic, Bantu or Austronesian linguistics, with enough scholars dedicated to genealogical taxonomy studies.

"the number of language families and isolates is considerably fewer in better studied North America and is falling."

It's true that there are fewer isolates in North America than in South America. But everywhere we go there are differences between cultural properties found in South American and North American areas. For instance, indigenous musical instruments are much more diverse in South America than in North America, but that doesn't mean that South America is understudied musicologically. It's just objective reality for which there are historical causes.

The number of isolates in North America isn't falling. The number of isolates in Asia has just fallen by one, namely Ket, which is now linked with Na-Dene (in North America), while Haida continues to be an isolate.

"the current linguistic discipline trend is to set the bar for showing linguistic relationships in a single language family very high."

Again, ironically, it's exactly what's going on with the standards of "proofiness" in American archaeology. And, again, I don't agree with your assessment. Linguistic methods are pretty objective: you either have sound laws across a large swathe of vocabulary and across grammatical paradigms or you don't. In the Old World, there are many more good candidates for long-distance language relationship (e.g., Austric, Sino-Austronesian, Nilo-Congo) than in the Americas.

One thing that favors Indo-European and some other Old World language families is the existence of ancient written attestations. Without those the kinship of Armenian with French would have been hardly demonstrable. But these are exceptions.

Another argument that was made is that Amerindian languages are grammatically and morphologically structured in such a way as to favor faster phonetic and lexical drift. This is indeed an interesting argument, since grammatically and morphologically Amerindian languages are rather unique. But if we use this argument as a null hypothesis, we may confront another difficulty: why should Amerindian languages be so unique if they separated from a Siberian antecedent relatively recently? We are just going to trade one puzzle for another, plus nobody has actually demonstrated that Amerindian stock diversity is a function of Amerindian grammar.

German Dziebel said...

"I seriously doubt that there are as many different language families in Latin America as current lists imply, although there may be more linguistic diversity simply for lack of Neolithic sweeps in much of the region to standardize languages, and because most of the languages were not committed to writing in the Pre-Columbian era."

Papua New Guinea has more linguistic stocks per unit of geography than the New World. (Although the Amazon still has the largest number of isolates.) This suggests that the New World saw plenty of "neolithic" language replacements. Plus, as I brought up with you earlier, Amerindian languages suffered dramatic extinctions in the past 500 years, more than on any other continent, so their linguistic diversity is in fact lower than it could have been.

terryt said...

"Papua New Guinea has more linguistic stocks per unit of geography than the New World".

That's hardly surprising if the currently accepted dates are correct. New Guinea has been inhabited for at least 40,000 years, and is very mountainous and jungle-clad, giving rise to isolated populations. America may have been settled as recently as 15,000 years ago.

"the Amazon still has the largest number of isolates"

Much the same holds for that region as for New Guinea.

German Dziebel said...

"That's hardly surprising if the currently accepted dates are correct. New Guinea has been inhabited for at least 40,000 years, and is very mountainous and jungle-clad, giving rise to isolated populations. America may have been settled as recently as 15,000 years ago."

By your logic, Terry, Africa was then settled 5,000 years ago, as linguistic diversity there is much lower than in the New World or PNG.

terryt said...

"By your logic, Terry, Africa was then settled 5,000 years ago, as linguistic diversity there is much lower than in the New World or PNG".

But Africa is not near as mountainous or jungle-clad as Africa.

terryt said...

Correction:

'But Africa is not near as mountainous or jungle-clad as New Guinea'.