September 22, 2012

Structural stability and ancient connections between languages

From the press release:

Using a large database and many alternative methods Dediu and Levinson show that both positions are right: there are universal tendencies for some features to be more stable than others, but individual language families have their own distinctive profile. These distinctive profiles can then be used to probe ancient relations between what are today independent language families.  
"Using this technique we find for instance probable connections between the languages of the Americas and those of NE Eurasia, presumably dating back to the peopling of the Americas 12,000 years or more ago," Levinson explains. "We also find likely connections between most of the Eurasian language families, presumably pre-dating the split off of Indo-European around 9000 years ago."

From the paper:
Quite convincing is the evidence that Core Eurasian families (comprising Altaic – or Mongolic + Turkic –, Dravidian, Indo-European, Uralic and the Caucasian families) might form a group (p=0.0013, 5 methods, and p=0.094, 4 methods, when controlling for geography).
The authors were also able to reject the "broad" Afroasiatic group "comprising Afro-Asiatic, Indo-European, Dravidian and Uralic". I think this makes some sense, since Afroasiatic is basically an African language family with a Near Eastern offshoot, so I did not expect it to group with the Eurasian language families.

The Core Eurasian group seems very interesting in light of accumulating evidence about contacts between human groups across Eurasia. Such a group pushes the limits of what can be inferred from linguistic data, but perhaps archaeogenetics will provide evidence that could be used to argue plausibly for such a relatively broad group.

PLoS ONE 7(9): e45198. doi:10.1371/journal.pone.0045198

Abstract Profiles of Structural Stability Point to Universal Tendencies, Family-Specific Factors, and Ancient Connections between Languages

Dan Dediu, Stephen C. Levinson

Language is the best example of a cultural evolutionary system, able to retain a phylogenetic signal over many thousands of years. The temporal stability (conservatism) of basic vocabulary is relatively well understood, but the stability of the structural properties of language (phonology, morphology, syntax) is still unclear. Here we report an extensive Bayesian phylogenetic investigation of the structural stability of numerous features across many language families and we introduce a novel method for analyzing the relationships between the “stability profiles” of language families. We found that there is a strong universal component across language families, suggesting the existence of universal linguistic, cognitive and genetic constraints. Against this background, however, each language family has a distinct stability profile, and these profiles cluster by geographic area and likely deep genealogical relationships. These stability profiles seem to show, for example, the ancient historical relationships between the Siberian and American language families, presumed to be separated by at least 12,000 years, and possible connections between the Eurasian families. We also found preliminary support for the punctuated evolution of structural features of language across families, types of features and geographic areas. Thus, such higher-level properties of language seen as an evolutionary system might allow the investigation of ancient connections between languages and shed light on the peopling of the world.


22 comments:

princenuadha said...

IE looks relatively close to Uralic and far from Afro-asiatic.

eurologist said...

Interesting study.

Wow, the SE Asian / Oceanian (including Papuan and Australian) languages are all over the place. Looks almost like any relatedness is that they are... languages.

Caucasian seems the most remote within "Eurasia;" interesting to see Dravidian so close. Any suggestions as to why?

Belenos said...

Who are these people, and why do they keep publishing nonsense?

You can write off these findings with a one word answer:

Sprachbund.

tew said...

I second Belenos' comment. There is more, though. It isn't just that these non-linguists/pseudo-linguists keep writing nonsense, it's that their PR machine keeps spreading it in non-specialist publications and the Pop media even though linguists are virtually unanimous in rejecting that same nonsense. Not sure what the agenda is. If this happened in biology, geneticists here would be screaming "crackpottery" and "academic corruption" from the rooftops.

This is all very, very strange. Groundbreaking work by well respected linguists nearly always goes unnoticed in the general media; why all of a sudden various works by the same half a dozen non-linguists purporting to "revolutionise" a field that is not their own and whose methodology is completely discredited in linguistics are all over the place?

Would the New York Times and Nature promote Young Earth Creationism? Radical climate skepticism? Would they entirely ignore the negative views of geneticists while reporting on DNA research done by psychologists and others aiming at debunking tried-and-true science?

terryt said...

"IE looks relatively close to Uralic and far from Afro-asiatic".

Yes. But IE is related also to other north Eurasian languages, not just Uralic. Altaic, Dravidian and North-Caucasian. That all makes sense, as does:

"I think this makes some sense, since Afroasiatic is basically an African language family with a Near Eastern offshoot, so I did not expect it to group with the Eurasian language families".

The African languages, apart from San, form a related group.

"Wow, the SE Asian / Oceanian (including Papuan and Australian) languages are all over the place. Looks almost like any relatedness is that they are... languages".

But we knew that already. They are anciently diverged languages. Under the old classification, New Guinea/Melanesia has possibly three separate language families, one of which is Austronesian, the other two being Sepik-Ramu and Trans-New Guinea. Australia has just two (probably): a widespread Pama-Nyungan and non-Pama-Nyungan confined to the northwest. Austronesian, the most widespread SE Asian language family, is not related at all to any earlier New Guinea/Australian families.

"interesting to see Dravidian so close. Any suggestions as to why?"

Dravidian may have been introduced to India with the expansion of agriculture. Some see a relationship between Dravidian and Elamite, so the connection between Dravidian and North-Caucasian makes sense in that context. Another interesting connection is that between Austro-Asiatic and Na-Dene. That makes complete sense if Austro-Asiatic was introduced to SE Asia by Y-DNA O2a. That would make Austro-Asiatic the southern representative of a language sprachbund that included Na-Dene. And Sino-Tibetan seems to be on its own.

Ebizur said...

terryt wrote,

"Another interesting connection is that between Austro-Asiatic and Na-Dene. That makes complete sense if Austro-Asiatic was introduced to SE Asia by Y-DNA O2a. That would make Austro-Asiatic the southern representative of a language sprachbund that included Na-Dene. And Sino-Tibetan seems to be on its own."

How does that make complete sense? Austro-Asiatic-speaking ethnic groups in Southeast Asia and (especially) South Asia seem to be closely associated with patrilineages derived from haplogroup O2a-M95. Na-Dene-speaking ethnic groups in North America, on the other hand, are associated with patrilineages derived from haplogroup C3b-P39 or haplogroup Q-M242(xQ1a3a1-M3). The closest known phylogenetic connection in regard to the Y-DNA of these populations is MNOPS-M526.

Anonymous said...

Curious... at the location of Sino-Tibetan and possible connection to Eurasia.

George said...

Why the absence in this work of the Kartvelian family? The Kartvelian is the closest to IE.

Belenos said...

Just to point out to those here who seem to be taking this paper seriously, you shouldn't, and here's why:

If you apply the same methodology to modern European languages you get the following results:

Slavic splits from the Latin languages early, then all the other Latin languages split from Romanian, remain together for a long time, and then diverge. What actually happened was that Slavic and all Latin languages diverged, then Slavic influenced Romanian as part of a Sprachbund.

Similarly, Germanic and Celtic split, then English splits from Germanic and the Germanic family continues and diverges into Gothic and everything else, then into Scandinavian and West Germanic, then into its various languages. We know this didn't happen, but Celtic either acted on English as a substratum or existed in a sprachbund with it, or both.

We could do similar analyses for Breton, Papiamento, Haitian Creole, Quechua and French that would come up with the wrong answer.

Why should anybody take this form of analysis seriously in deep time when it doesn't even work for the languages whose histories we know?

andrew said...

The most solid methodological gem in the study is this one:

"Atkinson and colleagues have recently shown that the basic vocabulary does not evolve gradually but shows bursts of rapid change following language splits. Essentially, the amount of evolution on the path leading from the root of the tree to a language is positively correlated with the number of nodes (splits) on the path. Using a complex methodology which controls for phylogenetic relatedness and the so-called “node-density” artifact in three language families (Indo-European, Bantu and Austronesian), they find that between 9.5% and 33% of the vocabulary change is due to punctuational bursts around splitting events. . . .

We found that across all language families and datasets, the correlation between path length and number of nodes is very high (range 0.65–0.80, mean = 0.75, sd = 0.046), suggesting that punctuational bursts might explain about 50% of structural change. There are large differences between language families and datasets (Materials S1) with most families showing a positive correlation (range −0.66–0.87, mean = 0.37, sd = 0.32; one-sample t-test comparing to 0: ). We also estimated the strength of punctuated evolution for different categories of linguistic features for the four datasets using Harald Hammarström’s classification and found important punctuational effects for all categories (on average on the order of 25%), and small but significant differences between them ( across all families). Phonology and Morphology show the lowest punctuational effects (on the order of 20%), while Nominal Categories, Word Order and Simple Clauses show the biggest effects (on the order of 35%); . . . When estimating punctuated evolution for each category in each family (Materials S1), we discovered quite extensive variation between categories across families (the interaction between family and category is highly significant,), but all categories tend to show consistent punctuational evolution in all families . . . Interestingly, the strongest punctuation is shown by the largest families and, while this could be entirely an artifact of better sampling and branch length estimation, it might also suggest that large and small families evolve through different processes. Thus, within the limits of this method, our data suggest that structural features also evolve in punctuational bursts around language splits."

As a few outlier cases like Iceland, which has been remarkably conservative of lexicon over a long period of isolation without a language split, suggest, language split and language contact are increasingly shaping up to be the predominant factors in language evolution (social change creating a need for new words might be another), while mere random linguistic drift itself appears to be a decidedly secondary factor.
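
For readers wondering how a correlation of about 0.75 becomes "about 50%" in the quoted passage: one plausible reading is that the squared correlation is taken as the share of variation in root-to-tip path length accounted for by the number of splits on that path. The sketch below only illustrates that arithmetic on made-up numbers. The data, the variable names and the helper `pearson_r` are all hypothetical, and it does not reproduce the paper's actual procedure, which also controls for phylogenetic relatedness and the node-density artifact.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical toy data, one entry per language (a tip of the family tree):
# the number of splits between the root and the language, and the total
# amount of change (path length) accumulated along that same path.
node_counts = [2, 3, 3, 5, 6, 8]
path_lengths = [0.9, 1.4, 1.1, 1.6, 2.3, 2.4]

r = pearson_r(node_counts, path_lengths)
share_explained = r ** 2  # squared correlation for the toy data

# The mean correlation quoted from the paper, squared.
quoted_share = 0.75 ** 2  # 0.5625, i.e. roughly half
```

A positive `r` here means only that languages with more splits in their history also show more accumulated change. It says nothing about whether the splits caused the change, which is the point Belenos raises in the reply below.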

Belenos said...

Andrew: I get the feeling that you geneticists are over-focusing a little on the maths without bothering to think about what it is trying to represent. What you report as a methodological gem is actually just a collection of made-up numbers.

"As a few outlier cases like Iceland, which has been remarkably conservative of lexicon over a long period of isolation without a language split, suggest, language split and language contact are increasingly shaping up to be the predominant factors in language evolution (social change creating a need for new words might be another), while mere random linguistic drift itself appears to be a decidedly secondary factor."

You need to go away and think about what you mean by "language split" because what you've written is effectively "One of the main causes of fire is fire".

terryt said...

"How does that make complete sense? Austro-Asiatic-speaking ethnic groups in Southeast Asia and (especially) South Asia seem to be closely associated with patrilineages derived from haplogroup O2a-M95. Na-Dene-speaking ethnic groups in North America, on the other hand, are associated with patrilineages derived from haplogroup C3b-P39 or haplogroup Q-M242(xQ1a3a1-M3)".

What you say is very true; however, we know that languages are not intimately tied to haplogroups. In fact languages usually spread beyond the spread of any haplogroup. O2a is related to O2b, which is a northern version of the haplogroup O2. So, presumably, members of O2b originally spoke a language similar to Austro-Asiatic (that gave rise to Na-Dene?). Some research has suggested that Na-Dene is related to Ket. The majority of Ket speakers carry haplogroup Q, not O2. Yet I see no problem with a language having expanded beyond the spread of Y-DNA O2b. The majority of Na-Dene speakers may or may not be members of Y-DNA C3, but again the language could easily have spread beyond the spread of any haplogroup. So hopefully you can now see why I think a connection between Austro-Asiatic and Na-Dene makes complete sense.

terryt said...

"Just to point out to those here who seem to be taking this paper seriously, you shouldn't, and here's why"

You are quite possibly correct, but just to explain my comment of yesterday, 'a connection between Austro-Asiatic and Na-Dene makes complete sense':

I agree that it doesn't make sense if you insist on trying to fit the language family's movement as having been from south to north. But if you're prepared to consider that the movement could have been in the other direction all the different pieces of the puzzle fit into place.

Members of O2a haplogroups do speak languages other than Austro-Asiatic, notably Austronesian/Thai and Sino-Tibetan. But these two language groups are usually accepted as overlaying a pre-existing Austro-Asiatic substrate. So the O2a distribution coincides exceptionally well with Austro-Asiatic's distribution. In fact surprisingly well.

The fact that the fit is so unusually close suggests that the expansion of both the haplogroup and the language family is relatively recent, no more than within the last 10-12,000 years. And Austro-Asiatic is a well-accepted language family, which is also an argument in favour of an expansion within the last 10,000 years.

And the panda provides yet more evidence. Pandas were evidently widespread through much of China south of the Tsin Ling Mountains until as recently as perhaps 2000 years ago. That strongly suggests that the region was sparsely inhabited until then.

So Y-DNA O2a and the Austro-Asiatic languages arrived in South China, Southeast Asia and India from somewhere further north. From a region where the ancestral forms of Ket and Na-Dene were spoken.

bmdriver said...

http://www.biomedcentral.com/1471-2148/7/47/figure/F1?highres=y

bmdriver said...

http://www.rdmag.com/uploadedImages/RD/News/2011/11/Genographic1.jpg

German Dziebel said...

@terryt

"What you say is very true; however, we know that languages are not intimately tied to haplogroups. In fact languages usually spread beyond the spread of any haplogroup. O2a is related to O2b, which is a northern version of the haplogroup O2. So, presumably, members of O2b originally spoke a language similar to Austro-Asiatic (that gave rise to Na-Dene?). Some research has suggested that Na-Dene is related to Ket. The majority of Ket speakers carry haplogroup Q, not O2. Yet I see no problem with a language having expanded beyond the spread of Y-DNA O2b. The majority of Na-Dene speakers may or may not be members of Y-DNA C3, but again the language could easily have spread beyond the spread of any haplogroup. So hopefully you can now see why I think a connection between Austro-Asiatic and Na-Dene makes complete sense."

Terry, you embarrass me. We exchanged so many times on languages and genes, I thought. But on a good side of things, as you seem to be armed with rather absurd logic, you don't surprise me anymore when you express your belief in out-of-Africa.

terryt said...

@bmdriver:

That first map showing O2a1 originating in India can only have been made by a very patriotic Indian. It does not take O2b into account and ignores the related haplogroups O1 and O3. The second map, on the other hand, does seem to fit F's whole expansion but misses out Australia and so ignores Y-DNA C.

@German:

"Terry, you embarrass me. We exchanged so many times on languages and genes"

Surely you can accept that languages and genes are in no way intimately connected. Or are you saying there is no connection between Ket and Na-Dene? Or that Na-Dene is not connected to C3 in America?

"But on a good side of things, as you seem to be armed with rather absurd logic, you don't surprise me anymore when you express your belief in out-of-Africa".

Please explain what you mean by my 'absurd logic'. Example?

German Dziebel said...

@Terry

Education is key, Terry. You can browse my website at www.anthropogenesis.kinshipstudies.org to see how I interpret linguistic and genetic patterns in conjunction with each other.

Onur Dincer said...

Education is key, Terry. You can browse my website at www.anthropogenesis.kinshipstudies.org to see how I interpret linguistic and genetic patterns in conjunction with each other.

You first educate thyself, Dziebel. How can you claim to reconcile genetics with linguistics when you fail even at the most basic levels of genetics?

German Dziebel said...

@Onur

"How can you claim to reconcile genetics with linguistics when you fail even at the most basic levels of genetics?"

This is just plain weird. You are a nobody, Onur. I'm a multi-field anthropologist trained in all the relevant fields including population genetics. If this simple fact escapes you, how can you hope to make sense of modern human origins?

terryt said...

"You can browse my website at www.anthropogenesis.kinshipstudies.org to see how I interpret linguistic and genetic patterns in conjunction with each other".

Have you anything on your blog dealing specifically with Austro-Asiatic or Y-DNA O2a?

Onur Dincer said...

This is just plain weird. You are a nobody, Onur.

You are in no position to call me nobody in genetics, Dziebel.

I'm a multi-field anthropologist trained in all the relevant fields including population genetics. If this simple fact escapes you, how can you hope to make sense of modern human origins?

Some training in population genetics does not warrant adequacy in basic level genetics.