November 14, 2011

Splits or Waves? Trees or Webs?

Tree models are used in both linguistics and genetics for inferring population history. The trouble with them is that human populations do not really evolve in a tree-like fashion (either genetically or culturally, as in language), but rather exchange both genes and words.

Linguistic evolution has been mostly described in terms of tree models, but languages are not insulated from each other, and they interact after their initial differentiation. This interaction is facilitated by geographic proximity, and also by linguistic proximity.

Geographic proximity makes it possible for speakers of different languages to talk to each other, learn each other's languages, or develop hybrid languages or a lingua franca. Linguistic proximity facilitates communication: it is fairly easy, for example, for speakers of Germanic languages to interact, and much more difficult for those of, say, English and Chinese.

If speakers of a language become separated by distance or geographical barriers, then lateral exchange between different groups becomes minimal, and language evolution can be well described by a tree model. If, on the other hand, there exists a language continuum across a wide area, effected by a common process (say, the spread of agriculture), then there is room for substantial cross-interaction of different emergent languages at the stage when they can still be thought of as dialects of the parent language.
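One way to make the tree/continuum contrast concrete is the four-point condition: distances that come from a tree have the property that, for any four languages, the two largest of the three pairwise sums are equal. The sketch below is only an illustration of that textbook test, not the method used in the paper; the function names, the four labels A to D, and both toy distance tables are invented for the example.

```python
from itertools import combinations


def symmetric(pairs):
    """Expand {(x, y): distance} into a nested dict readable in both directions."""
    dist = {}
    for (x, y), value in pairs.items():
        dist.setdefault(x, {})[y] = value
        dist.setdefault(y, {})[x] = value
    return dist


def four_point_violation(dist, names):
    """Largest gap between the two biggest pairwise sums over all quartets.

    A result of 0 means the distances fit some tree exactly; larger values
    mean more conflicting (web-like) signal.
    """
    worst = 0.0
    for a, b, c, d in combinations(names, 4):
        sums = sorted([
            dist[a][b] + dist[c][d],
            dist[a][c] + dist[b][d],
            dist[a][d] + dist[b][c],
        ])
        worst = max(worst, sums[2] - sums[1])
    return worst


names = ["A", "B", "C", "D"]  # hypothetical language labels

# Toy "splits" case: tree ((A,B),(C,D)), leaf branches of length 1, inner branch 2.
tree = symmetric({("A", "B"): 2, ("C", "D"): 2,
                  ("A", "C"): 4, ("A", "D"): 4,
                  ("B", "C"): 4, ("B", "D"): 4})

# Toy "wave" case: four dialects in a ring, each in contact with two neighbours.
web = symmetric({("A", "B"): 1, ("B", "C"): 1, ("C", "D"): 1, ("A", "D"): 1,
                 ("A", "C"): 2, ("B", "D"): 2})

print(four_point_violation(tree, names))
print(four_point_violation(web, names))
```

The first table passes the test with no violation, while the ring of mutually interacting dialects does not, which is the kind of conflicting signal a network method draws as a box rather than forcing into a single branch.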

While the current paper's focus is on Germanic languages, the endgame seems to be on the much harder and more vigorously contested field of Indo-European studies.

The author has put up a nice supplementary page online on a first attempt at using NeighborNet with an Indo-European dataset, pictured on the right. A publication on the topic is listed as being in preparation:
The utility of Germanic as a case-study is that it provides a (reasonably) known external history against which to assess our methodological approaches. On the strength of the findings here, a similar logic can now be extended to probing the unknown of how the early divergence history of Indo-European unfolded. In the full exploration in Heggarty (in preparation a), it transpires that even the data underlying figures 1 and 2 here suggest an early divergence along the lines of a dialect continuum. And for all the purported analytical elegance of binary branches, as a real-world demographic scenario it is this Indo-European continuum that offers the more straightforward and economical explanation. A splits-then-borrowing scenario has instead to invoke not just a complex series of divergent migrations, but then later movements to attenuate this by bringing certain groups back into contact again. This in turn entails consequences for which of the main rival hypotheses—the migratory Kurgan ‘horse culture’, or the progressive demic diffusion of agriculture—best fits as the driving force that shaped the pattern of the earliest Indo-European expansion.
Phil. Trans. R. Soc. B 12 December 2010 vol. 365 no. 1559 3829-3843

Splits or waves? Trees or webs? How divergence measures and network analysis can unravel language histories

Paul Heggarty et al.

Linguists have traditionally represented patterns of divergence within a language family in terms of either a ‘splits’ model, corresponding to a branching family tree structure, or the wave model, resulting in a (dialect) continuum. Recent phylogenetic analyses, however, have tended to assume the former as a viable idealization also for the latter. But the contrast matters, for it typically reflects different processes in the real world: speaker populations either separated by migrations, or expanding over continuous territory. Since history often leaves a complex of both patterns within the same language family, ideally we need a single model to capture both, and tease apart the respective contributions of each. The ‘network’ type of phylogenetic method offers this, so we review recent applications to language data. Most have used lexical data, encoded as binary or multi-state characters. We look instead at continuous distance measures of divergence in phonetics. Our output networks combine branch- and continuum-like signals in ways that correspond well to known histories (illustrated for Germanic, and particularly English). We thus challenge the traditional insistence on shared innovations, setting out a new, principled explanation for why complex language histories can emerge correctly from distance measures, despite shared retentions and parallel innovations.


25 comments:

eurologist said...

Great paper. From the Germanic NeighborNet, it seems clear that the projection to two dimensions is a major limiting factor. I think simply going to three dimensions would help tremendously (not sure how easy that would be - or even 4-D, using colors?). For example, in 2-D we can't see that Frisian retains known affinity to the UK branch, and Danish appears further removed from Northern German than the more extreme Scandinavian languages (likely because of its closer association with English).

The IE net is fantastic. It shows a Balkan center better than anything I have ever seen (my pet hypothesis... ;)). Other details: by geographic position, Celtic must have migrated west very early - check. All three have strong ties (west alpine region), but in general, as expected, there is a stronger Italic-Germanic association.

Germanic-Baltic I would expect close based on LBK migrations (first East through Poland, then south along the rivers to the Ukraine/ almost Urals), with lots of contact, and with much of Poland remaining in the (East) Germanic range, or becoming Baltic, long before Slavic expansion.

Greek seems to be removed a lot from the Balkan Sprachbund because of its extreme and (many) islands position, but also because of its old age. Yet, it is again the Balkan axis that provides the Anatolian and beyond outliers. And as expected, there is a huge affinity between Greek and Anatolia, and beyond.

Since Tocharian and Hittite are no further removed from a Balkan center than Albanian or Armenian (thought to be Balkan), and not closely related to the southeastern (i.e., Asian) branches, I can't really make out an Anatolian origin, here.

Ebizur said...

eurologist wrote,

"The IE net is fantastic. It shows a Balkan center better than anything I have ever seen (my pet hypothesis... ;)). Other details: by geographic position, Celtic must have migrated west very early - check. All three have strong ties (west alpine region), but in general, as expected, there is a stronger Italic-Germanic association."

The IE network clearly shows a stronger (and expected) association between Italic and Celtic than between either of those groups and Germanic. Perhaps you have made an error in typing.

eurologist said...

...stronger than Germanic - Celtic

... is what I said (or tried to express). That is, Celtic seems like a derivative that migrated very early and developed in the extreme west.

In this network, as expected, there is an Alpine connection with Italic - but apparently later, and apparently closer to Germanic than Celtic to Germanic.

pconroy said...

Interesting Neighbor Network diagram of Indo-European. I notice that the languages that appear nearest the center point are:
1. Albanian A
2. Albanian B
3. Tocharian
4. Celtic B1
5. Slavic B

Does anyone know which modern or extinct languages these language designations refer to?

Vincent said...

Since the data must be reduced to a 2-dimensional matrix to be input into SplitsTree, there would seem to be a limited utility in calculating a 3D network from that data. But I agree that further advances in the algorithm to allow a more sophisticated analysis would be potentially fascinating.

I didn't see a direct link in Dienekes' post, but the SplitsTree software is freeware and easily used. http://www.splitstree.org/

NeighborNet networks are also one of the best ways of visualizing true phylogenies (e.g. Y-DNA), in which case the reticulations represent uncertainty rather than continual genetic exchange.

Andrew Oh-Willeke said...

For a very sophisticated algorithm it seems to be quite a close fit to the Centum-Satem paradigm, with a fairly basal Indo-Hittite spur.

Andrew Lancaster said...

Concerning your first graphic, interestingly, Dutch dialect from Limburg and German dialect from Cologne (Koeln) are approximately the same language and fairly well mutually comprehensible. They have long been recognized as such by the people who live in these regions, and indeed these regions are neighbouring and were once interconnected by politics and trade. But the tree does not show them even as neighbours. This depends on what level of dialect they have taken. My guess is that they've used a relatively standardized modern Limburgish, leaning away from the older dialect as still spoken by many people, and basically now a variant of Brabant Dutch.

eurologist said...

Since the data must be reduced to a 2-dimensional matrix to be input into SplitsTree...

Clearly, that is something that needs a major algorithmic redesign - but conceptually, it does not seem to be a problem, to me. Seems like it would be worth the money.

German Dziebel said...

Very good paper. The IE network makes a lot of sense. At the very least it deals a blow to the Anatolian homeland theory. It's noteworthy that the two subgroups most divergent geographically - Celtic and Tocharian - are among the closest to the center of the net.

mr. Know When said...

I am not a statistical geek, but matrices are usually 2-dimensional information. In this case it would make no sense to use 3D. You would need a third or fourth variable linked to one of the datasets used.

"Germanic-Baltic I would expect close based on LBK migrations (first East through Poland, then south along the rivers to the Ukraine/ almost Urals), with lots of contact, and with much of Poland remaining in the (East) Germanic range, or becoming Baltic, long before Slavic expansion."

LBK is not a likely candidate for being a carrier of the Germanic languages.
http://www.eupedia.com/forum/showthread.php?26083-new-ancient-DNA-study-in-LBK

Besides LBK was closely linked to Loess soils which are not that common in Poland and Northern Germany.

The oldest U106 found in Europe is from 1000 BC (Lichtenstein Cave), which means it dates to the Late Bronze Age.

pconroy said...

@German,

Re: Celtic and Tocharian

Not surprising at all...

Here's what I wrote over 3 years ago:
http://dienekes.blogspot.com/2008/06/mtdna-of-tarim-mummies_13.html?showComment=1213542540000#c2527361961689493142

pconroy said...
Interestingly, I purchased a book a few years ago called the Mummies of Urumchi, by Elizabeth Wayland Barber, who is a textile expert, and she traced the tartan style cloth worn by the mummies to the North Caucasus region.

As a side note, she theorized that a single group, dressed in such Tartan, from the North Caucasus region started a range expansion, with some going West - to become Celts - and some going East - to become Tocharians.

Of course when I read that recent book, The Horse, the Wheel, and Language: How Bronze-Age Riders from the Eurasian Steppes Shaped the Modern World, I still think a case can be made that Indo-European languages were first spoken by the Kura-Araxes Culture, which seem to have introduced metal working, ox wagons and chariots to the North Caucasus/ Pontic Steppe area.
Sunday, June 15, 2008 6:09:00 PM

German Dziebel said...

That's pretty cool. I didn't know about this textile evidence. Conroy, you should publish a volume of your Collected Posts.

terryt said...

"Since Tocharian and Hittite are no further removed from a Balkan center than Albanian or Armenian (thought to be Balkan), and not closely related to the southeastern (i.e., Asian) branches, I can't really make out an Anatolian origin, here".

Both diagrams fit geography closely. Just the distances between languages don't match closely. But what to me is extremely interesting is the lonely position Hittite takes, both linguistically and presumably geographically. As you say, that suggests Hittite is an immigrant into Anatolia. Anatolia is not the source region for Indo-European.

eurologist said...

"LBK is not a likely candidate for being a carrier of the Germanic languages."

I don't know anyone espousing this, PIE - not Germanic.

And, please, Eupedia is pseudo-scientific garbage.

mr. Know When said...

You mentioned a Germanic-Baltic language. But even this proto-language, based on the distribution of LBK settlements on loess soils, has little to do with Germanic-Baltic languages in my opinion. After the arrival of LBK, the neolithicization of NW Europe (Swifterbant, Ertebölle etc.) took another 400 years before it became visible as being not a hunter-gatherer society.

The main problem here is that we don't even know if the LBK people spoke an Indo-European or PIE language, or if their language survived in some form. For instance, the Basques are not genetically that different from their neighbouring fellow Iberians.


However for now the absence of R-M269 in Central Europe during the neolithic is at least peculiar. All the ancient DNA samples, even those in France, are predominantly G2a.
http://www.buildinghistory.org/distantpast/ancientdna.shtml

Well, time will tell.

terryt said...

"However for now the absence of R-M269 in Central Europe during the neolithic is at least peculiar".

Yes. And I think best explained as replacement by movements into Europe from further east after R1b1a2-M269 had spread as far as the western extremity. That would almost certainly be pre-Indo-European, perhaps as long ago as 10,000 years.

mr. Know When said...

It originated from the east for sure, but I don't think it indicates necessarily something pre-Indo-European in western Europe. On the contrary it might even indicate another wave of farming settlers of M269 people into an already farming Europe during the late neolithic or Bronze age. Whatever language M269 people spoke when they arrived in Western Europe is speculative. Fact is, most of their descendants speak an Indo-European language nowadays, which could lead to the plausible assumption they were Indo-Europeans themselves - (like their predecessors, only with a different haplogroup?).
For instance, in this neighbour-net, East Germanic seems to share similarities equally with its western and northern cousins. So this subclade of M269 could have moved from Poland to Scandinavia and across the plain of Northern Germany.

But again we know nothing yet for sure. The dataset of ancient DNA and its regional coverage is too small to make any conclusive reconstruction/interpretation possible.

terryt said...

"it might even indicate another wave of farming settlers of M269 people into an already farming Europe during the late neolithic or Bronze age".

But it seems unlikely they could have moved through Europe as just a thin ribbon of migration and yet expanded hugely once they reached the far western extremity. That needs explaining.

mr. Know When said...

It needs explaining, I agree, but the answer is not as clear as it seems given the DNA samples we have of neolithic populations.

Not every movement will have produced a DNA trace in modern populations. What some tend to forget is that people could have moved by boat into river systems and along coasts bypassing other tribes.

During the metal ages there will have been an increased overseas exchange of goods with Great Britain and Ireland in order to get tin from the Devon/Cornwall mines and copper and gold from the mines in Ireland. In other words, water is not only a barrier.

pconroy said...

Terry, Know,

I've been saying for a year or more that I think R1b pastoralists may have journeyed across North Africa, cattle in tow, and then up through Iberia to France and the Isles.

Of course they may have just skirted the Southern Mediterranean in boats and followed basically the same path.

Look at the Malagasy, who crossed thousands of miles to end up where they did?!

BTW, I've seen video of fishermen from the Aran Islands loading a bull onto their traditional hide-covered boat, the Currach. Basically they tie the legs tightly, thus immobilize it, then hoist/shove it aboard the floating currach.

Dienekes said...

I've been saying for a year or more that I think R1b pastoralists may have journeyed across North Africa, cattle in tow, and then up through Iberia to France and the Isles.

The Atlantic/Mediterranean main distribution of R1b, coupled with its rapid diminution in Central/South Asia could be consistent with such a scenario.

It would also put the alleged R1b of King Tut in a whole new light, as it would the ancient accounts about the blue-eyed Libyans.

Of course, on the downside, the relatives of R1b seem to be in Asia, and North Africa is not a particularly R1b-rich area today. But, if Europe during the Neolithic looked, apparently, nothing like the Europe of today, I don't see an a priori reason to believe that North Africa of the Neolithic was like the North Africa of today.

It would be nice if we started getting ancient Y-chromosome results from Africa and West Asia, which may, perhaps, be more difficult than is the case for Europe.

pconroy said...

Dienekes,

Right, this will only be solved when aDNA of people like the Garamantes is sampled - http://en.wikipedia.org/wiki/Garamantes - cattle herders from today's North Africa, who have some Mummies?!

Also, Berbers from the Siwa Oasis, who may be related to the Red-Haired Pharaoh Ramesses the Great http://en.wikipedia.org/wiki/Ramesses_II

The Egyptians described the ancient Libyans as the "Red People" - probably on account of their skin color.

BTW, Irish legends say that the ancestors of the Ancient Irish were many different groups, and one of them at least had something to do with Troy, then journeyed through Egypt, and on to Iberia and departed from Galicia for Ireland.

terryt said...

"Whatever language M269 people spoke when they arrived in Western Europe is speculative. Fact is, most of their descendants speak an Indo-European language nowadays, which could lead to the plausible assumption they were Indo-Europeans themselves"

Many Basques are M269, and Basque is generally regarded as pre-Indo-European.

"What some tend to forget is that people could have moved by boat into river systems and along coasts bypassing other tribes".

Possible. Would account for the western concentration. But what movement could they have been part of?

"R1b pastoralists may have journeyed across North Africa, cattle in tow, and then up through Iberia to France and the Isles".

Again, possible.

"Look at the Malagasy, who crossed thousands of miles to end up where they did?!"

But that is not really comparable, as the Malagasy were probably the first to enter an uninhabited Madagascar. Europe, including the Mediterranean islands, was already inhabited.

mr. Know When said...

"....the blue-eyed Libyans."

Some Tuareg and other Berber tribes still have blue eyes.
Maybe some Bell Beaker people settled in Libya long before the Vandals did. The nearest settlements of this Copper Age culture were found on the southern coast of Sicily.
http://de.wikipedia.org/w/index.php?title=Datei:Bellbeaker_map_europe.jpg&filetimestamp=20060728211215

I came across a German documentary in which they suspected Bell Beaker warriors of having killed some Corded Ware people at a site in eastern Germany. Based on a Catalonian site with 200 bodies of people killed by Bell Beaker archers, they assumed the Eulau massacre had a similar ethnic background. That turned out not to be the case - some northern Harz neighbours were to blame.

Nonetheless, the Bell Beaker case studies mentioned were revealing (scroll the bar to 29min:52sec):
http://www.zdf.de/ZDFmediathek/beitrag/video/1118362/Tatort-Eulau#/beitrag/video/1118362/Tatort-Eulau

Like the British Bell Beaker burial site of the man with a knife produced in northern Spain. When they applied isotope analysis to his teeth, it turned out that this individual originally grew up in the upper Rhine region, which illustrates how well connected these people were within a lifetime. But the most intriguing thing was that Stonehenge had been modified after the Bell Beaker people arrived in Britain. It seems to have adopted some features very similar to Egyptian sites, for instance the sun alley, which apparently was new to this part of Europe.

I wonder what kind of Haplogroup these Catalonian victims were... G2a? ;)

mr. Know When said...

"Many Basques are M269, and Basque is generally regarded as pre-Indo-European."

It is regarded as pre-Indo-European, yes, but who knows what kind of language it is. Did it simply survive or is it a newcomer like the Indo-European languages?


"Possible. Would account for the western concentration. But what movement could they have been part of?"

Well, in my opinion the Bell Beakers are good candidates for this western concentration, as mentioned above. They were good seafarers and technologically advanced. http://en.wikipedia.org/wiki/Beaker_culture (=>R1b-L21 (S145)?) http://www.eupedia.com/images/content/Haplogroup-R1b-L21.gif

The Corded Ware culture, on the other hand, could have been linked to a proto-Germanic-Baltic branch http://en.wikipedia.org/wiki/Corded_Ware_culture (U106 + some R1a?).

But again, reality will prove to be far more complex.