August 29, 2005

Haplogroup frequency correlations in Southeastern Europe

I have decided to investigate the correlations between haplogroup frequencies in southeastern Europe and some neighboring populations. So far, I have collected frequency data for the main haplogroups found in the region (E3b, J2, I, R1a, R1b) for 16 populations. Most 3-letter codes should be recognizable, but KAL=Kosovo Albanians, SMA=Slav Macedonians, CAL=Calabrians. I should also note that the frequency of haplogroup I in Bulgarians is interpolated from the frequencies in Romanians, Greeks, Slav Macedonians and Serbians, as it was missing in the original article. Conclusions about Bulgarians are especially weak, both for this reason and because of the small original sample (N=24).

I began by calculating the correlation matrix in my sample.
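For readers who want to repeat this step, here is a minimal sketch of how a correlation matrix between haplogroup frequencies can be computed. It assumes Pearson correlation, which the post does not state explicitly. The population codes P1 to P4 and their frequencies are made-up placeholders, not the data used here, and the function names are my own.

```python
from math import sqrt

HAPLOGROUPS = ["E3b", "J2", "I", "R1a", "R1b"]

# Placeholder frequencies for four hypothetical populations.
# These are NOT the values behind the analysis in this post.
TOY_FREQS = {
    "P1": {"E3b": 0.25, "J2": 0.20, "I": 0.10, "R1a": 0.05, "R1b": 0.20},
    "P2": {"E3b": 0.20, "J2": 0.15, "I": 0.15, "R1a": 0.10, "R1b": 0.15},
    "P3": {"E3b": 0.10, "J2": 0.05, "I": 0.30, "R1a": 0.25, "R1b": 0.10},
    "P4": {"E3b": 0.15, "J2": 0.05, "I": 0.35, "R1a": 0.20, "R1b": 0.05},
}


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def correlation_matrix(freqs, haplogroups):
    """Map each (haplogroup, haplogroup) pair to its correlation across populations."""
    populations = sorted(freqs)
    columns = {h: [freqs[p][h] for p in populations] for h in haplogroups}
    return {(a, b): pearson(columns[a], columns[b])
            for a in haplogroups for b in haplogroups}


matrix = correlation_matrix(TOY_FREQS, HAPLOGROUPS)
for a in HAPLOGROUPS:
    print(a, " ".join(f"{matrix[(a, b)]:+.2f}" for b in HAPLOGROUPS))
```

With only a handful of populations, individual coefficients are unstable, so the signs and rough magnitudes matter more than the exact values.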

A few features strike the eye:

  • The negative correlation between haplogroup R1a and haplogroups E3b, J2, and R1b
  • The negative correlation between haplogroup I and haplogroups J2 and R1b
  • The positive correlation between haplogroup J2 and haplogroup R1b
  • The absence of a substantial correlation between "Neolithic" haplogroups J2 and E3b
As the next analysis will make clear, the variation is explained by the presence of two main groupings: a "continental" group consisting of Slavic speakers and a "coastal" group consisting of all others.

The absence of a correlation between J2 and E3b is significant, because it hints that these haplogroups did not diffuse as a result of a single process. The easternmost populations of our sample, as well as the two Italian populations, show a higher J2/E3b ratio than the "continental" populations.

The second analysis is a dendrogram using Euclidean distance of the normalized haplogroup frequencies. As is apparent, this way of representing the frequency data results in a separation of the two main clusters.
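The clustering behind such a dendrogram can be sketched as follows. I am assuming here that "normalized" means z-scoring each haplogroup across populations and that clusters are joined by average linkage; neither choice is stated above, so treat both as assumptions. The populations P1 to P4 and their frequencies are made-up placeholders, not the data used in this post.

```python
from math import sqrt
from itertools import combinations

# Placeholder rows (E3b, J2, I, R1a, R1b) for hypothetical populations.
# These are NOT the frequencies behind the dendrogram in this post.
TOY_LABELS = ["P1", "P2", "P3", "P4"]
TOY_ROWS = [
    [0.25, 0.20, 0.10, 0.05, 0.20],
    [0.20, 0.15, 0.15, 0.10, 0.15],
    [0.10, 0.05, 0.30, 0.25, 0.10],
    [0.15, 0.05, 0.35, 0.20, 0.05],
]


def zscore_columns(rows):
    """Standardise each column to mean 0 and population standard deviation 1."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    sds = [sqrt(sum((v - m) ** 2 for v in c) / n) for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(labels, rows):
    """Agglomerative clustering; returns merges as (cluster_a, cluster_b, distance)."""
    points = dict(zip(labels, zscore_columns(rows)))

    def dist(c1, c2):
        # Average of all pairwise distances between members of the two clusters.
        total = sum(euclidean(points[a], points[b]) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    clusters = [(label,) for label in labels]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: dist(*pair))
        merges.append((c1, c2, dist(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges


for left, right, d in average_linkage(TOY_LABELS, TOY_ROWS):
    print(left, "+", right, f"at distance {d:.2f}")
```

The merge list read from top to bottom is the dendrogram read from the leaves to the root. A different linkage rule (single, complete, Ward) can change which populations join first, so the choice should be reported along with the tree.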

Finally, a principal components analysis is shown in the following plot. The first two components summarize about 77% of the variance.
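A figure like "the first two components summarize X% of the variance" can be computed along the following lines. This is a sketch only: I assume the analysis is run on standardised frequencies (that is, on the correlation matrix), and I use simple power iteration with deflation in place of a proper eigenvalue routine. The four rows of frequencies are made-up placeholders, so the share they produce says nothing about the 77% quoted above.

```python
from math import sqrt

# Placeholder rows (E3b, J2, I, R1a, R1b) for four hypothetical populations.
# These are NOT the frequencies used in this post.
TOY_ROWS = [
    [0.25, 0.20, 0.10, 0.05, 0.20],
    [0.20, 0.15, 0.15, 0.10, 0.15],
    [0.10, 0.05, 0.30, 0.25, 0.10],
    [0.15, 0.05, 0.35, 0.20, 0.05],
]


def standardise(rows):
    """Z-score each column (population standard deviation)."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    sds = [sqrt(sum((v - m) ** 2 for v in c) / n) for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]


def correlation_of_columns(z):
    """Correlation matrix of already standardised columns."""
    n, k = len(z), len(z[0])
    return [[sum(r[i] * r[j] for r in z) / n for j in range(k)] for i in range(k)]


def mat_vec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def top_eigenvalues(matrix, count, iterations=500):
    """Largest eigenvalues of a symmetric matrix via power iteration and deflation."""
    k = len(matrix)
    m = [row[:] for row in matrix]
    values = []
    for _ in range(count):
        v = [1.0 + 0.1 * i for i in range(k)]  # arbitrary non-zero start vector
        norm = sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iterations):
            w = mat_vec(m, v)
            norm = sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # nothing left to extract
                break
            v = [x / norm for x in w]
        lam = sum(a * b for a, b in zip(v, mat_vec(m, v)))
        values.append(lam)
        # Deflate: remove the component just found before looking for the next.
        m = [[m[i][j] - lam * v[i] * v[j] for j in range(k)] for i in range(k)]
    return values


def explained_share(rows, count=2):
    """Fraction of total variance carried by the first `count` components."""
    corr = correlation_of_columns(standardise(rows))
    total = sum(corr[i][i] for i in range(len(corr)))
    return sum(top_eigenvalues(corr, count)) / total


print(f"share of variance in first two components: {explained_share(TOY_ROWS):.2f}")
```

Running the analysis on raw frequencies (the covariance matrix) instead of standardised ones would give more weight to the most variable haplogroups and generally a different percentage.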

We observe the two main "contrasts" in the data: one between "coastal" J2/R1b and "continental" I1b, and one between "Neolithic" E3b and "Slavic" R1a (*).

Several conclusions can be drawn.

  • The spread of the Neolithic economy into continental Europe involved E3b bearers in a riverine expansion whose northern expression is associated with the Linearbandkeramik. This does not mean that E3b was the only haplogroup associated with these early European farmers, only that it definitely seems to correlate better with this movement compared to the other Neolithic haplogroup (J2).
  • The early diffusion of E3b occurred over a haplogroup I Paleolithic background. It is likely that as groups moved northward the frequency of haplogroup E3b abated, and this is in fact shown in the frequency distribution. This movement is probably associated with the narrow-faced Danubian Mediterranean racial types.
  • This native European population later received an influx of R1a bearers; the frequency of R1a is correlated with latitude. This led to a decrease of the native component in favor of the foreign R1a component (*).
  • The frequency of haplogroup J2 was established by three movements: (i) the initial arrival of J2 from Asia Minor, which did not significantly penetrate into the Western Balkans; (ii) the initial dispersal of J2 into Italy and further west, and around the Black Sea in pre-Greek times, which may be associated with the arrival of gracile Mediterranean racial types into the Ukraine; (iii) the later dispersal of additional J2 as a result of Greek colonization.
It is imperative that the fine-level phylogeography of haplogroup J2 be resolved. The high frequency of this haplogroup around the Black Sea compared to the western Balkans is highly suggestive of Greek colonization, as it is well known that Greek colonization of the Black Sea was much more intensive than Greek activity in the Adriatic. However, archaeological evidence also shows the northward diffusion of agriculturalists from Thrace to Romania, culminating in the Tripolye culture and its steppe offshoots. We must be able to distinguish between this earlier movement and the later maritime arrival of the Greeks.

The critical question would be: what fraction of J2 lineages in the Ukraine can be explained as the result of ancient and recent Greek settlement in the Crimea, and what fraction predates the Greeks?

(*) We should note that these are rough correspondences. If the theory of riverine diffusion of haplogroup E3b into Central and Northern Europe is correct, then it is likely that E3b existed at a low frequency in Proto-Slavs; conversely, R1a had already diffused after the Last Glacial Maximum (LGM), well before its most recent diffusion, which is perhaps associated with Slavic languages.

Update: A reader alerts me to a different study which listed the Hungarian R1a frequency as substantially lower than the one used here (Semino et al. 2000). Unfortunately, that study did not list frequencies of all haplogroups needed for comparison, so it could not be used directly. If the frequency of R1a=20.4% is used, then a slightly different clustering is obtained.


petertuzo said...

your piece requires a succinct translation clarifying for the untutored in anthropology who do not understand your scientific terminology and significance. This would enhance your research objectives and expand its appreciation. Peter Timber

Dr Rob said...

There's one minor but quite significant flaw here. Romanians consistently cluster with Serbs, Bulgarians and Macedonians. Something is amiss in this analysis which places them with Greeks, Albanians and Turks.