July 31, 2008

Expansion of E-V13 explained

E-V13 is the main European clade of haplogroup E. It has been variously interpreted as a signature of early Balkan Bronze Age, or Mesolithic, the Greek colonization of Southern Italy, Greek ancestry in some Pakistanis, or Roman soldiers of Balkan origin in Britain. A proper understanding of its age would help resolve the problem of its origins.

Age, of course, depends on a proper choice of mutation rate, and as I have argued (part I and part II), the proper effective mutation rate is near the germline rate and not 3.6x slower as argued by Zhivotovsky, Underhill, and Feldman (2006). This is especially true for a relatively young haplogroup (very low STR variance compared to other lineages), which is also quite frequent in its area of origin, while much reduced away from it, giving a definite impression of a sudden and relatively recent expansion.

In my previous post, I estimated a Late Bronze Age date for E-V13 in Greece and areas affected by historical Greek colonization. I have now used Ken Nordtvedt's Generations2 program to obtain estimates of the age of E-V13 in three different datasets: the King set, 12-marker data from the E-M35 Phylogeny Project (Haplozone), as well as E-M78 data -most of which should be E-V13- from Bosch et al. (2006). In the latter set, I used two marker sets: all 12 markers common between Generations2 and Bosch, as well as 8 markers common between them, but excluding markers after DYS392 (in the Generations2/FTDNA order).


Population             N    Age (25y/gen)   Age (30y/gen)
Nea Nikomedeia         8    1725 BC         2470 BC
Sesklo/Dimini          20   225 AD          130 BC
Lerna Franchthi        20   1000 BC         1600 BC
Crete                  13   300 AD          40 BC
Haplozone              103  1350 BC         2020 BC
Aromuns (12)           32   225 AD          130 BC
Aromuns (8)            32   175 AD          190 BC
Slavomacedonians (12)  13   725 AD          470 AD
Slavomacedonians (8)   13   525 AD          230 AD
Albanians (12)         9    250 AD          100 BC
Albanians (8)          9    525 AD          230 AD

Both the King et al. E-V13 data and the diverse, mostly European Haplozone E-V13 data agree in placing the expansion of this haplogroup squarely in the Aegean Bronze Age.

Aromuns (Vlachs) coalesce to the Roman era, consistent with the idea that they are Balkan natives who became Latinized linguistically at around that era.

Albanians also coalesce to Roman/Late Antique times, consistent with the idea that their high frequency of haplogroup E-V13 (which reaches very high numbers in e.g. Kosovars) is not associated with high diversity. Founder effects in that time frame are the reason for the high frequency of E-V13 in them.

Finally, Slavomacedonians from the former Yugoslav Republic of Macedonia coalesce well into AD times, at around the time of the first Slavic arrivals in the Balkans. This suggests that E-V13 in them is the result of local founders at around that time who adopted the Slavic language. However, Pericic et al. (2005) (see below) report high (but unspecified) diversity of E-M78α in "Macedonia", so it is possible that a larger number of earlier inhabitants were absorbed.

Pericic et al. (2005) give a 7.3kya estimate for the expansion of E-M78α (almost perfectly equivalent to E-V13) for Southeastern European populations north of Greece. Due to their use of the 3.6x slower mutation rate, this figure needs to be converted to equivalent years. The Nea Nikomedeia time depth was estimated as 9.2kya by King et al. Therefore, the equivalent age for the Pericic et al. (2005) expansion is (7.3/9.2) * 149 generations or 118 generations (1,540-950BC). They note that STR variance is higher in Greece, Macedonia, and Apulia, all areas with well-known historical Greek connections.
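The conversion above is easy to check by hand, but here is a minimal sketch of it in Python. The function names are my own, chosen for illustration. I assume 2008 as the "present" and ignore the missing year zero, so the printed years are only good to the nearest decade or so.

```python
REFERENCE_YEAR = 2008  # assumption: "present" is the year of this post

def rescale_generations(published_kya, yardstick_kya, yardstick_generations):
    """Convert an age published under the slow rate into generations,
    using a dataset dated both ways as the yardstick."""
    return published_kya / yardstick_kya * yardstick_generations

def calendar_year(generations, years_per_generation, present=REFERENCE_YEAR):
    """Calendar year of the event; negative values are years BC."""
    return present - generations * years_per_generation

# Nea Nikomedeia: 9.2 kya in King et al., 149 generations in the text above.
gens = round(rescale_generations(7.3, 9.2, 149))
print(gens)                     # 118
print(calendar_year(gens, 30))  # -1532, i.e. about 1,540 BC
print(calendar_year(gens, 25))  # -942, i.e. about 950 BC
```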

Cruciani et al. (2007) propose that E-V13 arrived in Europe from West Asia and underwent an expansion in Europe at 4-4.7 kya. This age is calculated using effective mutation rates that are 2.4 or 2.8 times slower than the germline rate, which seems to suggest a Late Bronze Age or even later expansion with a rate closer to the germline one.

In the Balkans, it is fairly clear that E-V13 is mostly concentrated south of the Jirecek Line which separated native Greek from Latin speakers. In Italy, the highest frequencies are found in the south, the areas of historical Greek colonization. High frequencies are also attained in Cyprus. Cyprus also has high STR diversity, consistent with an early arrival, suggestive of both early Mycenaean and later colonizations from the Aegean.


The age and distribution of E-V13 chromosomes suggest that expansions of the Greek world in the Bronze and later ages were the major causes of its diffusion.

Who was the E-V13 patriarch in Greece? He was perhaps one of the legendary figures of Greek mythology some of whom are said to have come from abroad. For whatever reason, his progeny grew, and were around to participate in the expansion of the Mycenaean world and the subsequent Greek colonization.

UPDATE (Aug. 1):

An additional piece of evidence is Y-chromosome distribution in Calabria, a Southern Italian region with well-known Greek connections. According to Semino et al. (2004) [Am. J. Hum. Genet. 74:1023–1034, 2004], the Calabrian sample has an E-M78 frequency of 16.3%, whereas "Calabria 2" representing the "Albanian community of the Cosenza province" has only 5.9%. This is consistent with the idea that E-V13 in modern Albanians is to a great degree due to Greek founders (Epirotes or ancient colonists).

Antikythera mechanism and the timing of the Olympiads

Complex clock combines calendars:
The Antikythera Mechanism, a clockwork device made in Greece around 150–100 BC, astounded the world two years ago when scientists deduced how this machine was used to make complex astronomical time-reckonings. Now they say that the instrument, discovered in 1901 in a Mediterranean shipwreck, did much more than that.


Researchers have been trying to decode the mechanism's inscriptions and functions for several years. Their latest findings reveal that it links the technical calendars used by astronomers to the everyday calendars that regulated ancient Greek society — most strikingly, the calendar that set the timing of the Olympic Games.

“The mechanism is full of surprises,” says Alexander Jones of the Institute for the Study of the Ancient World in New York, who is one of the decoding team. “The latest revelations establish its cultural origin for the first time.”


In 2006, Freeth was part of a team that used this and other techniques to figure out much of the mechanism's function, showing it to be an instrument of unparalleled sophistication in antiquity, more or less unrivalled until the clockwork mechanisms of the later Middle Ages [3].

Now they say that the device was even more sophisticated than that — it unites abstruse astronomical determinations of time with the calendar of civic society. Another ancient Greek calendar cycle, called the Metonic cycle, was established to cope with the incommensurability of the lunar cycle and the solar year — the period of Earth's rotation around the Sun, as determined, say, by the time between successive summer solstices. One Metonic period is equal to 235 lunar months, which is almost exactly 19 solar years. The Metonic cycle, thought previously to be used only by astronomers, is represented on a dial on the Antikythera Mechanism. But this dial now turns out to be inscribed with the names of months in a regional calendar used in Corinthian colonies in northwest Greece — providing evidence that the device was used for mundane reckonings, and giving a surprising clue to its origin.


But Freeth and his team now think that the instrument may have come from Syracuse in Sicily, the Corinthian colony where Archimedes devised a planetarium in the third century BC. “Archimedes died at the siege of Syracuse in 212 BC, so we are confident that he did not make the mechanism,” says Freeth. “But it is possible that it came from a heritage of instrument-making that originated with him in Syracuse. It is an attractive idea, but purely speculative at present.”
Nature 454, 614-617 (31 July 2008) | doi:10.1038/nature07130

Calendars with Olympiad display and eclipse prediction on the Antikythera Mechanism

Tony Freeth, Alexander Jones, John M. Steele & Yanis Bitsakis

Previous research on the Antikythera Mechanism established a highly complex ancient Greek geared mechanism with front and back output dials [1-7]. The upper back dial is a 19-year calendar, based on the Metonic cycle, arranged as a five-turn spiral [1, 6, 8]. The lower back dial is a Saros eclipse-prediction dial, arranged as a four-turn spiral of 223 lunar months, with glyphs indicating eclipse predictions [6]. Here we add surprising findings concerning these back dials. Though no month names on the Metonic calendar were previously known, we have now identified all 12 months, which are unexpectedly of Corinthian origin. The Corinthian colonies of northwestern Greece or Syracuse in Sicily are leading contenders, the latter suggesting a heritage going back to Archimedes. Calendars with excluded days to regulate month lengths, described in a first century BC source [9], have hitherto been dismissed as implausible [10, 11]. We demonstrate their existence in the Antikythera calendar, and in the process establish why the Metonic dial has five turns. The upper subsidiary dial is not a 76-year Callippic dial as previously thought [8], but follows the four-year cycle of the Olympiad and its associated Panhellenic Games. Newly identified index letters in each glyph on the Saros dial show that a previous reconstruction needs modification [6]. We explore models for generating the unusual glyph distribution, and show how the eclipse times appear to be contradictory. We explain the four turns of the Saros dial in terms of the full moon cycle and the Exeligmos dial as indicating a necessary correction to the predicted eclipse times. The new results on the Metonic calendar, Olympiad dial and eclipse prediction link the cycles of human institutions with the celestial cycles embedded in the Mechanism's gearwork.
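As a quick check of the Metonic figure quoted above (235 lunar months against 19 solar years), the arithmetic can be sketched as follows. The mean month and year lengths are standard modern values that I am supplying myself; they are not taken from the paper.

```python
SYNODIC_MONTH_DAYS = 29.530589   # mean lunar (synodic) month, modern value
TROPICAL_YEAR_DAYS = 365.24219   # mean solar (tropical) year, modern value

months = 235 * SYNODIC_MONTH_DAYS
years = 19 * TROPICAL_YEAR_DAYS

print(round(months, 2))                  # 6939.69 days
print(round(years, 2))                   # 6939.6 days
print(round((months - years) * 24, 1))   # 2.1 hours of mismatch per cycle
```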


July 29, 2008

Haplogroup sizes and observation selection effects (continued)

This is a continuation of my comments on How Y-STR variance accumulates.

The story so far

In my previous post I showed how the "evolutionary rate" of Zhivotovsky, Underhill, and Feldman (2006) is inappropriate for TMRCA calculations, because:
  • It is not calculated from the time depth of the MRCA, but of an earlier "Patriarch"; more importantly:
  • It is an average over many simulated haplogroups of small size, and not the kinds of haplogroups one is usually interested in dating in population studies

How big are the haplogroups in Z.U.F.-type simulations?

Z.U.F. consider several different demographic models, differing in their choice of m, the population growth constant. The population size increases (stochastically) on average by 100(m-1)% every generation.

I ran N=10,000 simulations for each reported number. The figures reported are the average and the maximum number of descendants over these N simulations.

Constant population size (m=1)

Under this assumption, haplogroup size grows purely due to randomness of the fathering process; there is no overall population growth. This is an important case, because the 3.6x slower evolutionary rate has been derived from it.

Number of Descendants

It is clear that this type of simulation produces very small haplogroup sizes. Even for 320 generations (early Neolithic for Greece) the very largest haplogroup produced had 1,310 descendants, while the average one had the theoretically predicted ~160.
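For readers who want to try this themselves, here is a rough sketch of a Z.U.F.-type size simulation in plain Python. It is not the code behind the numbers above. It uses far fewer lineages and generations (both are illustrative choices of mine) so that it runs quickly, and the helper names are hypothetical. If I have the theory right, for m=1 the average size of a surviving haplogroup should come out somewhere near g/2, the same relation that gives ~160 for 320 generations.

```python
import math
import random

def poisson(lam, rng):
    """Knuth's method for a Poisson draw; adequate for small means."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k

def descendants_after(g, m, rng):
    """Living male-line descendants of one founder after g generations."""
    size = 1
    for _generation in range(g):
        # Every living man fathers a Poisson(m) number of sons.
        size = sum(poisson(m, rng) for _man in range(size))
        if size == 0:   # the line died out
            break
    return size

def surviving_sizes(n_lineages, g, m, seed=0):
    """Keep simulating founders until n_lineages of them have survivors."""
    rng = random.Random(seed)
    sizes = []
    while len(sizes) < n_lineages:
        size = descendants_after(g, m, rng)
        if size > 0:
            sizes.append(size)
    return sizes

sizes = surviving_sizes(n_lineages=200, g=50, m=1.0)
print(sum(sizes) / len(sizes), max(sizes))   # average and maximum size
```

Changing m to 1.01 gives the expanding-population case.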

Small haplogroups => more drift => loss of variance => lower "effective" mutation rate.

So, as I mentioned in my previous post, to calculate the 3.6x slower rate, not only do we average over haplogroups of all sizes, small and large alike, but we are actually missing the relevant observations. But more on this, in the next section.

Expanding population (m=1.01)

Number of Descendants

Predictably, haplogroups end up bigger in an expanding population, but still far short of the sizes of commonly dated real-world haplogroups. The case of m=1.01 is important, because it is the one which yields the maximum effective mutation rate considered by Z.U.F. assuming haplogroups start with one individual.

Thus, even the highest mutation rate considered by Z.U.F. (about 0.55μ over 400 generations) is derived by averaging over haplogroups that are unrealistic (too small). Y-STR variance accumulates at a higher rate in the real world.

Why are Z.U.F.-style simulated haplogroups so small?

It is surprising that these simulated haplogroups end up so small, looking nothing like commonly studied haplogroups even for an expanding population.

The apparent mystery is resolved, once we realize that m is nothing more than the average number of sons a man has. The reason why we see haplogroups so much bigger than the simulated ones is because for individual men, m may be much more, or much less than its population average. In other words, there is reproductive inequality, which could be due either to social advantage or to natural selection.

So, rather than having a uniform m for all men, we can allow m to vary in individual lineages. A man A may have mA<m if he is impoverished or has a faulty Y-chromosome gene, and he may have mA>m if he is a ruler or has an advantageous gene in his Y-chromosome.

The advantage could be slight but long-standing (a small fitness improvement) or brief and intense (a conquest or foundation of a dynasty). Its effect on the lucky lineage is an increase in the number of descendants. Its effect on Y-STR variance is a rate of increase approaching the germline rate.
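To get a feel for how much a small advantage matters, recall that in this kind of branching model the expected number of male-line descendants of one man after g generations is simply his lineage's m raised to the power g. The sketch below tabulates this for a few values of mA. The particular values are hypothetical ones I picked for illustration, and the expectation counts the many lines that die out as zero.

```python
def expected_descendants(m_lineage, generations):
    """Mean number of male-line descendants of one founder
    (extinct lines contribute zero to this average)."""
    return m_lineage ** generations

# Hypothetical per-lineage averages over 320 generations.
for m_a in (1.00, 1.01, 1.05):
    print(m_a, round(expected_descendants(m_a, 320)))
# 1.00 gives 1, 1.01 gives about 24, 1.05 gives roughly 6 million
```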

It is clear, by now, that realistic haplogroup sizes can occur only when there is reproductive inequality. They are not the result of genetic drift, but of natural or social selection. And, effective mutation rates should be calculated over successful haplogroups under conditions of reproductive inequality, and not over all haplogroups under conditions of reproductive equality.(*)

A note on sampling

Consider a lineage of 1,000 men (i.e. ~ the maximum produced with reproductive equality) in a population of 1,000,000 men. Its frequency is thus 0.1%.

We take a sample of 1,000 men from this population; this is a much larger sample than is typically used in population studies, and it is drawn from a smaller population than is typical. We expect on average to find just 1 man from the lineage in question in our sample. You can't do a variance-based age estimate with one man!
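The sampling argument can be made concrete with a short calculation. This is only a sketch: I treat the sample as binomial (sampling with replacement, which is a close approximation at these sizes), and the threshold of 10 carriers for a usable variance estimate is an arbitrary figure of mine, not a standard.

```python
from math import comb

def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return 1.0 - sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k))

lineage, population, sample = 1_000, 1_000_000, 1_000
freq = lineage / population

print(sample * freq)                     # expected carriers in the sample: 1.0
print(prob_at_least(1, sample, freq))    # chance of sampling any carrier at all
print(prob_at_least(10, sample, freq))   # chance of 10 or more (assumed threshold)
```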

Thus, it becomes clear why haplogroups produced by Z.U.F.-style simulations are uninteresting. You just never encounter enough representatives from them in a real population study. You are typically interested in the much larger haplogroups, which could only have proliferated under conditions of reproductive inequality, and which are the only ones that can yield enough representatives in a sample to allow for a variance calculation.


In the previous post I showed that Z.U.F. calculate their effective rate over all simulated observations, but the rate is applied in the literature over a very specific set of observations, i.e. large haplogroups.

In this post, I showed that Z.U.F.-style simulations just don't produce realistic haplogroup sizes. Drift alone can't explain why millions of men share patrilineal ancestry. Large haplogroup sizes require an assumption of reproductive inequality, and Y-STR variance within them accumulates near the germline rate.

(*) Of course, if one studies numerically small populations, it is possible that a slower effective rate may be desired. My concern is with the large human populations (e.g. Greeks or Indians) where real haplogroup sizes exceed greatly those produced by simulations with reproductive equality.

UPDATE (August 8): Continued in On the effective mutation rate for Y-STR variance

July 28, 2008

SLC24A5 in Greeks

Since two subjects were heterozygous for the Thr(111) allele, its overall frequency in Greeks is 99.4%, within the range of 98.7% and 100% reported for European Americans.
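The 99.4% figure follows from simple allele counting, sketched here with the 158 subjects reported in the abstract. The variable names are mine, and I assume, as the abstract implies, that there were no Ala(111) homozygotes.

```python
subjects = 158        # sample size from the abstract
heterozygotes = 2     # Thr(111)/Ala(111) carriers

# Each homozygote carries two Thr alleles, each heterozygote one.
thr_alleles = 2 * (subjects - heterozygotes) + heterozygotes
total_alleles = 2 * subjects
freq = thr_alleles / total_alleles

print(thr_alleles, total_alleles)   # 314 316
print(f"{100 * freq:.1f}%")         # 99.4%
```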

Exp Dermatol. 2008 Jul 7.

A study of a single variant allele (rs1426654) of the pigmentation-related gene SLC24A5 in Greek subjects.

Dimisianos G, Stefanaki I, Nicolaou V, Sypsa V, Antoniou C, Poulou M, Papadopoulos O, Gogas H, Kanavakis E, Nicolaidou E, Katsambas AD, Stratigos AJ.

Department of Medical Genetics, University of Athens, Agia Sophia Children's Hospital, Athens, Greece.

The SLC24A5 gene, the human orthologue of the zebrafish golden gene, has been shown to play a key role in human pigmentation. In this study, we investigate the prevalence of the variant allele rs1426654 in a selected sample of Greek subjects. Allele-specific polymerase chain reaction was performed in peripheral blood samples from 158 attendants of a dermatology outpatient service. The results were correlated with pigmentary traits and MC1R genotype. The vast majority of subjects (99%) were homozygous for the Thr(111) allele. Only two subjects from the control group (1.26%) were heterozygous for the alanine and threonine allele. Both of these Thr(111)/Ala(111) heterozygotes carried a single polymorphism of MC1R (one with the V92M variant and another with the V60L variant). Following reports of the rs1426654 polymorphism reaching fixation in the European population, our study of Greek subjects showed a prevalence of the Thr(111) allele, even among subjects with darker skin pigmentation or phototype.


Ancient mtDNA from Inner Mongolia

Three individuals with mixed Caucasoid-Mongoloid affinities were an adult female (haplogroup C), 25yo male (haplogroup M), and 25-30yo male (haplogroup A). From the paper:
All haplogroups were Asian-specific, the haplotypes of 10 individuals are shared by modern Han Chinese, and the one-step neighbors to another 7 individuals also mainly distribute in modern Han Chinese (Yao et al., 2002). The phylogenetic analysis of the ancient population and extant Eurasian populations showed that the ancient population most closely related to the Han Chinese, especially the northern Han.
American Journal of Physical Anthropology doi: 10.1002/ajpa.20894

Ancient DNA analysis of human remains from the upper capital city of Kublai Khan

Yuqin Fu et al.


Analysis of DNA from human archaeological remains is a powerful tool for reconstructing ancient events in human history. To help understand the origin of the inhabitants of Kublai Khan's Upper Capital in Inner Mongolia, we analyzed mitochondrial DNA (mtDNA) polymorphisms in 21 ancient individuals buried in the Zhenzishan cemetery of the Upper Capital. MtDNA coding and noncoding region polymorphisms identified in the ancient individuals were characteristic of the Asian mtDNA haplogroups A, B, N9a, C, D, Z, M7b, and M. Phylogenetic analysis of the ancient mtDNA sequences, and comparison with extant reference populations, revealed that the maternal lineages of the population buried in the Zhenzishan cemetery are of Asian origin and typical of present-day Han Chinese, despite the presence of typical European morphological features in several of the skeletons.


July 25, 2008

German origin of Transylvanian Saxons

Using Athey's haplogroup predictor, with equal priors and a threshold of 50 and probability of 90%, the following haplogroups were predicted in the 59 males:

5 E1b1b
1 G1
2 G2a
2 H
4 I1
3 I2a(xI2a2)
1 I2a2
1 I2b1
1 J2b
1 N
2 R1a
22 R1b

Rom J Leg Med 12 (4) 247 – 255 (2004)

A study on Y-STR haplotypes in the Saxon population from Transylvania (Siebenbürger Sachsen): is there an evidence for a German origin?

Ligia Barbarii et al.

ABSTRACT: A study on Y-STR haplotypes in the Saxon population from Transylvania (Siebenbürger Sachsen): is there an evidence for a German origin? Y chromosome markers are increasingly used to investigate human population histories, being considered to be sensitive systems for detecting the population movements. In this study we present Y-STR data for a male population of Transylvanian Saxons in comparison with Y-haplotypes from Romanians and other European populations. The Transylvanian Saxons, called like that since medieval times, are representing a western population with unknown origin, settled in the Arch of Romanian Carpathian Mountains in the earliest of the 12th century. Historical and dialectal studies strongly suggest that they do not originate from Saxony, but more probably from the Mosel riversides (Rhine affluent) and also from the Eifel Mountains Valley (present territory of Luxembourg). Living protected by fortified cities in compact communities, they still represent a quite distinct population in Transylvania. For this study, 59 male samples were collected from the Siebenburgen area, subjects being selected by their Saxon surnames and paternal grandfather birthplace. A set of nine STR polymorphic systems mapping on the male-specific region of the human Y chromosome (DYS19, DYS385, DYS389 I/II, DYS390, DYS391, DYS392, DYS393) were typed by means of one or two multiplex PCR reactions and capillary electrophoresis. The typing results reflect high Saxon population haplotype diversity. Furthermore, we present data on the haplotype sharing of the Saxon population with other European populations, especially with Germans as well as with the Romanians and the Transylvanian Szekely.

Link (pdf)

Data on 17 Y-STRs

Int J Legal Med. 2008 Jul 24

Population and segregation data on 17 Y-STRs: results of a GEP-ISFG collaborative study.

Sánchez-Diz P, Alves C, Carvalho E, Carvalho M, Espinheira R, García O, Pinheiro MF, Pontes L, Porto MJ, Santapa O, Silva C, Sumita D, Valente S, Whittle M, Yurrebaso I, Carracedo A, Amorim A, Gusmão L; GEP-ISFG (The Spanish and Portuguese Working Group of the International Society for Forensic Genetics).

A collaborative work was carried out by the Spanish and Portuguese International Society for Forensic Genetics Working Group in order to extend the existing data on Y-short tandem repeat (STR) mutations at the 17 Y chromosome STR loci included in the AmpFlSTR YFiler kit (Applied Biosystems): DYS19, DYS385, DYS389I, DYS389II, DYS390, DYS391, DYS392, DYS393, DYS437, DYS438, DYS439, DYS448, DYS456, DYS458, DYS635, and GATA H4.1. In a sample of 701 father/son pairs, 26 mutations were observed among 11,917 allele transfers across the 17 loci. After summing previously reported mutation data with our sample, mutation rates varied between 4.25 x 10(-4) (95% CI 0.05 x 10(-3)-1.53 x 10(-3)) at DYS438 and 6.36 x 10(-3) (95% CI 2.75 x 10(-3)-12.49 x 10(-3)) at DYS458. All mutations were single step, and mutations in the same father/son pair were found twice.
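As a rough cross-check, the pooled counts in the abstract imply an average per-locus mutation rate. This sketch pools all 17 loci and uses only the 701 pairs of this study, so it is a crude average rather than any of the per-locus rates the authors report.

```python
pairs, loci, mutations = 701, 17, 26

transfers = pairs * loci          # allele transfers observed
rate = mutations / transfers      # average mutations per locus per generation

print(transfers)                  # 11917
print(f"{rate:.2e}")              # 2.18e-03
```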


Ancient Thracian mtDNA

The presentation of the results isn't very clear. From a cursory comparison of the results listed in the text with the Genographic project list of motifs, at least the following seem represented in the ancient Thracian individuals:
  • 1 individual seems to be 16129A 16223T
  • 1 individual seems to be 16145A
  • 1 individual seems to be 16186T 16190C (however, this looks like 16189C in Fig. 4; 186T and 189C are found in haplogroup T1)
  • 1 individual seems to be 16193T 16283C (16193T is found in J2, which also carries 16069T (beyond the region sequenced) and 16126C (in the region sequenced but not found))
  • 1 individual seems to be 16311C
  • 2 individuals seem to be 16362C, which in West Eurasia seems to be found in R0a and R6
Anyway, feel free to comment if you can make better sense of these results.

Rom J Leg Med 12 (4) 239 – 246 (2004)

Paleo-mtDNA analysis and population genetic aspects of old Thracian populations from South-East of Romania

Cardos G. et al.

ABSTRACT: Paleo-mtDNA analysis and population genetic aspects of old Thracian populations from South-East of Romania. We have performed a study of mtDNA polymorphisms (HVR I and HVR II sequences) on the skeletal remains of some old Thracian populations from SE of Romania, dating from the Bronze and Iron Age in order to show their contribution to the foundation of the modern Romanian genetic pool and the degree of their genetic kinships with other old and modern human European populations. For this purpose we have applied and adapted three DNA extraction methods: the phenol/chloroform, the guanidine isotiocianat and silica particles and thirdly the Invisorb Forensic Kit (Invitek)-based DNA extraction method. We amplified by PCR short fragments of HVR I and HVR II and sequenced them by the Sanger method. So far, we have obtained mtDNA from 13 Thracian individuals, which we have compared with several modern mtDNA sequences from 5 European present-day populations. Our results reflect an evident genetic similarity between the old Thracian individuals and the modern populations from SE of Europe.

Link (pdf)

July 24, 2008

Cuban mtDNA and Y chromosomes

A message from this study is that Y chromosome diversity within an already settled territory can indeed be wiped out. Introduction of new pathogens or a technological differential between colonists and natives are just two possible ways to achieve this.

Many technological innovations (e.g. farming, Bronze, Iron) originated in a very small part of the Old World and spread far and wide. I would not be very surprised if this coincided with a massive replacement of Y chromosomes. The legacy of the earlier inhabitants may, of course, endure, via mtDNA, or autosomal DNA.

BMC Evol Biol. 2008 Jul 21;8(1):213. [Epub ahead of print]

Genetic origin, admixture, and asymmetry in maternal and paternal human lineages in Cuba.

Mendizabal I, Sandoval K, Berniell-Lee G, Calafell F, Salas A, Martinez-Fuentes A, Comas D.

ABSTRACT: BACKGROUND: Before the arrival of Europeans to Cuba, the island was inhabited by two Native American groups, the Tainos and the Ciboneys. Most of the present archaeological, linguistic and ancient DNA evidence indicates a South American origin for these populations. In colonial times, Cuban Native American people were replaced by European settlers and slaves from Africa. It is still unknown however, to what extent their genetic pool intermingled with and was 'diluted' by the arrival of newcomers. In order to investigate the demographic processes that gave rise to the current Cuban population, we analyzed the hypervariable region I (HVS-I) and five single nucleotide polymorphisms (SNPs) in the mitochondrial DNA (mtDNA) coding region in 245 individuals, and 40 Y-chromosome SNPs in 132 male individuals. RESULTS: The Native American contribution to present-day Cubans accounted for 33% of the maternal lineages, whereas Africa and Eurasia contributed 45% and 22% of the lineages, respectively. This Native American substrate in Cuba cannot be traced back to a single origin within the American continent, as previously suggested by ancient DNA analyses. Strikingly, no Native American lineages were found for the Y-chromosome, for which the Eurasian and African contributions were around 80% and 20%, respectively. CONCLUSIONS: While the ancestral Native American substrate is still appreciable in the maternal lineages, the extensive process of population admixture in Cuba has left no trace of the paternal Native American lineages, mirroring the strong sexual bias in the admixture processes taking place during colonial times.


Hercules movie in development

Berg to direct 'Hercules':
"Hancock" director Peter Berg is spearheading a fresh take on Hercules for Universal.

Berg will produce and will develop to direct "Hercules: The Thracian Wars," a co-production of Spyglass Entertainment, Berg’s Film 44 and Radical Pictures. Spyglass and Universal will co-finance the film.

Ryan Condal will write the script, based on a five-issue comicbook series by Steve Moore that debuted in May through Radical Publishing.

July 21, 2008

How Y-STR variance accumulates: a comment on Zhivotovsky, Underhill and Feldman (2006)

An important erratum for this post.

Additions to this entry at the bottom (last update July 29)

In recent years, in most population genetics papers, an evolutionary mutation rate for Y chromosome microsatellites (STRs) of 0.00069/locus/generation has been used. This rate was proposed by Zhivotovsky et al. (2004) (pdf), and defended in Zhivotovsky et al. (2005), and especially Zhivotovsky, Underhill and Feldman (2006) (henceforth Z.U.F.)

This mutation rate is smaller than the observed germline mutation rate by a factor of 3-4. The germline mutation rate is observed by counting mutations directly, e.g., in father-son pairs, or in known pedigrees. Zhivotovsky et al. have provided two pieces of evidence in favor of their evolutionary rate:
  • Study of accumulation of STR variation in populations with known founding events, namely Bulgarian Roma and Maori, in their 2004 paper.
  • Simulations indicating a 3.6x discrepancy between the two rates in their 2006 paper, which is due to multiple bottlenecks in a haplogroup's history.
I was always apprehensive about what the "right" mutation rate should be:
We need to obtain good estimates of the mutation rate in order to pinpoint in time the common ancestor of a set of Y chromosomes. A factor of 3, especially for relatively recent events may correspond to a difference between early historical and late Paleolithic events.
Thus, I decided to look into the matter myself to be convinced -one way or another- of what the evolutionary mutation rate must be.


The following assumptions, following Z.U.F., are made:
  • A man has 0, 1, 2, ... sons according to a Poisson process with mean m=1.
  • A step mutation (increase or decrease by 1 repeat) occurs with a mutation rate of µ=0.0025 [1]
  • STR variance of the man's descendants is measured after g generations.
Results are averaged over N men who have descendants after g generations. I will call such men, "Patriarchs". Thus, I generate random family trees for men until I have harvested N=10,000 of them who have living descendants today [2].
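The assumptions above translate fairly directly into code. What follows is a minimal sketch, not my actual program. It takes µ as roughly 0.0025 per locus per generation, uses much smaller N and g so that it finishes quickly (both are illustrative choices), and all function names are hypothetical.

```python
import math
import random

def poisson(lam, rng):
    """Knuth's method for a Poisson draw; adequate for small means."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k

def descendants_alleles(g, m, mu, rng):
    """Repeat-count offsets (founder = 0) of living descendants after g generations."""
    alleles = [0]
    for _generation in range(g):
        sons = []
        for allele in alleles:
            for _son in range(poisson(m, rng)):
                # Single-step mutation, up or down, with probability mu.
                step = rng.choice((-1, 1)) if rng.random() < mu else 0
                sons.append(allele + step)
        alleles = sons
        if not alleles:   # the line died out
            break
    return alleles

def variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def average_variance(n_patriarchs, g, m=1.0, mu=0.0025, seed=0):
    """Mean STR variance over n_patriarchs men whose lines survive g generations."""
    rng = random.Random(seed)
    variances = []
    while len(variances) < n_patriarchs:
        alleles = descendants_alleles(g, m, mu, rng)
        if alleles:
            variances.append(variance(alleles))
    return sum(variances) / len(variances)

g, mu = 50, 0.0025
avg = average_variance(n_patriarchs=200, g=g, mu=mu)
print(avg, avg / (mu * g))   # variance, and the effective rate as a fraction of mu
```

Dividing the average variance by µg expresses the effective rate as a fraction of the germline rate, which is how the figures in this post are quoted.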

Patriarch vs. MRCA

A consequence of the time-forward methodology of simulation is that a Patriarch may not be the Most Recent Common Ancestor (MRCA) of his descendants g generations into the future. Trivially, if a Patriarch has only one son, then, that son -not the Patriarch- is the MRCA of his descendants. But, even if the Patriarch has many sons, and his group of descendants grows, it is possible (due to randomness of the fathering process) that at some generation only 1 descendant will survive.

Suppose that the Patriarch has lived in generation 0, and the MRCA lived in generation i. Thus, STR variance in the descendants at generation g (today) has accumulated over a time span of g-i generations, since, of course, at the generation i (of the MRCA), STR variance is zero.

Now, if we use a time-forward methodology from known foundation events (e.g. the arrival of the Roma in Bulgaria, or the Maori in New Zealand), it is perfectly right to see how STR variance accumulates from the known foundational event. We would then divide the accumulated STR variance by the known time span to determine an effective evolutionary mutation rate, similar to Zhivotovsky et al. (2004).

But, when the foundational event is unknown, when we are trying to estimate its age, then we can only go as far back as the MRCA, since at his time variance is zero. Therefore, by dividing accumulated variance by the evolutionary mutation rate of Z.U.F., we are over-estimating the time to the MRCA.

For example, with g=100, the average STR variance for the descendants of N=10,000 Patriarchs is 0.0755. But, if we average only those Patriarchs who are also the MRCA of their descendants, we obtain a value of 0.0824, or about 9% higher.
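The size of this effect, and the per-generation rates the two averages imply, follow directly from the figures just quoted. A short sketch, with variable names of my own:

```python
g = 100
var_all_patriarchs = 0.0755   # average over all Patriarchs
var_mrca_only = 0.0824        # only Patriarchs who are also the MRCA

print(f"{var_mrca_only / var_all_patriarchs - 1:.1%}")   # 9.1%
print(f"{var_all_patriarchs / g:.6f}")   # 0.000755 variance per generation
print(f"{var_mrca_only / g:.6f}")        # 0.000824 variance per generation
```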

In general, the over-estimate (as a percentage) decreases as g increases: as g increases, the average number of descendants of a Patriarch increases, making them much less susceptible to a variance-reset type of bottleneck described here.

Thus, while the age difference between the MRCA and the Patriarch is real, its effect in the age estimate is not very pronounced. There is, however, a second, and much more serious problem, with the Z.U.F. rates when applied to evolutionary studies.

Prolific vs. Non-Prolific Patriarchs: an Observation Selection effect

Patriarchs starting at generation 0 will have a very variable number of descendants at generation g. By averaging over all of them, we are estimating the average STR variance in the descendants of men who lived g generations ago.

Now, consider how this average changes if we average only over the k most "prolific" men (with the most descendants) out of all the N=10,000 Patriarchs:

[Figure: Average Variance]

It is clear that the STR variance in the descendants of the most "prolific" Patriarchs is much higher than in the descendants of the least "prolific" ones. In fact, for the most prolific Patriarchs, variance accumulates near the germline mutation rate, and not at the lower evolutionary effective rate.

Below is the cumulative percentage of the descendants of the k most prolific Patriarchs, with k from 1 to N.

It can be seen that, for example, 84% of the descendants stem from the most prolific half of the Patriarchs. And this assumes no social inequality in the number of progeny, i.e. each man has the exact same average probability (m=1) of fathering a son. In reality, the more prolific Patriarchs may therefore account for an even larger fraction of the descendants.
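The cumulative percentage described here can be computed from the list of descendant counts alone. The sketch below assumes the counts are already available, one per Patriarch; the function name is illustrative.

```python
def cumulative_share(sizes):
    """Fraction of all descendants stemming from the k most prolific
    Patriarchs, for k = 1..N. `sizes` holds one descendant count per Patriarch."""
    ranked = sorted(sizes, reverse=True)   # most prolific first
    total = sum(ranked)
    shares, running = [], 0
    for size in ranked:
        running += size
        shares.append(running / total)
    return shares
```

The entry at index N/2 - 1 of the returned list is the share attributable to the most prolific half of the Patriarchs.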

Why is this important? Because, in population studies, scientists are likely to observe (in the finite samples they collect) multiple descendants only of the most prolific Patriarchs. For the vast majority of Patriarchs, who have few descendants, we are likely to sample few of their descendants or none at all.

This means that there is an inherent observation selection effect in the types of Patriarchs we are likely to study: they are much more likely to be among the prolific ones. Coupling this observation with the knowledge that STR variance in the descendants of prolific Patriarchs accumulates near the germline mutation rate (0.69µ for the 100 most prolific ones in my experiment), we, once again, conclude that the STR variance in haplogroups likely to be made the object of scientific study accumulates near the germline mutation rate, and at the very least, faster than the evolutionary rate of Z.U.F.

Closing Remarks

Z.U.F. have also proposed two additional demographic scenarios under which a higher effective mutation rate would be observed:
  • A sudden jump in the size of the haplogroup after it appears
  • An expanding population (m>1)
Both factors seem reasonable for post-Holocene human populations. It is well known that -whatever temporary setbacks there were- mankind has overall experienced a substantial population growth in recent millennia. Thus, an expanding population seems like a fair assumption.

Moreover, it is reasonable to assume that in stratified human societies, a few males (leaders or conquerors), or groups of closely related males, may have generated a disproportionate number of descendants in the short term.

In summary:
  • The age difference between the Patriarch and the MRCA indicates that Variance/0.00069 overestimates the age of the MRCA somewhat (but not very much).
  • A prolific Patriarch's descendants are more likely to be sampled by scientists, and tend to have a higher STR variance. Hence, Variance/0.00069 overestimates the age of the MRCA, perhaps substantially.
  • Demographic factors, such as population growth or short-term success by related males, indicate that Variance/0.00069 overestimates the age of the MRCA.
In view of the above, and keeping in mind both the stochastic factors that cause STR variance to fluctuate around its expected value, as well as uncertainties in demographic history, I do believe that ages calculated with the evolutionary mutation rate of 0.00069/locus/generation are significantly overestimated.

1 Z.U.F. used a germline mutation rate of µ=0.001. For the purposes of simulation, this is not an important difference, as they themselves note. I choose the rate of 0.0025 because it is closer to the actual human germline mutation rate for STRs.
2 Z.U.F. generated 50,000 men and then averaged over the men who had descendants. I, on the other hand, generate as many men as it takes to harvest at least N men with descendants, to ensure that I average a substantially large number of such men.

Editorial change (Jul 22): erroneously written "exceeds", in paragraph 2, changed to "is smaller than".

Update (July 23):

To further elucidate how the observation selection effect may make lineages seem older than they really are, I carried out another small experiment (g=110, N=10,000, m=1).

The age of each group is inferred by dividing the accumulated variance by the evolutionary rate of 0.0006944 (=μ/3.6).

The average variance over all N in this experiment is 0.0867, thus, the average inferred age is 125 generations, close to the truth (110 generations), allowing for the correction in age between the Patriarch and the TMRCA.

However, if we calculate the average variance over ten groups of 1,000 lineages each (out of all N=10,000), ordered by number of descendants, we see, as described above, that more "prolific" lineages have accumulated more variance, whereas less "prolific" ones have accumulated less variance than the overall average of 0.0867.

Thus, over the 10% most populous lineages (right of the figure), the average inferred age is 209 generations, or a 90% overestimate of the true age!
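The grouping and the age inference in this update can be sketched as follows, assuming each simulated lineage is summarized as a (number of descendants, variance) pair. The record layout and function names are hypothetical; only the rate µ/3.6 comes from the text.

```python
MU = 0.0025            # germline rate used in these experiments
ZUF_RATE = MU / 3.6    # evolutionary rate, about 0.0006944

def inferred_age(variance, rate=ZUF_RATE):
    """Age in generations inferred as accumulated variance / mutation rate."""
    return variance / rate

def mean_variance_by_group(lineages, groups=10):
    """Sort (n_descendants, variance) records by descendant count, split them
    into `groups` equal-sized groups (least prolific first), and return the
    mean variance of each group. Leftover records, if any, are ignored."""
    ranked = sorted(lineages, key=lambda record: record[0])
    size = len(ranked) // groups
    means = []
    for i in range(groups):
        chunk = ranked[i * size:(i + 1) * size]
        means.append(sum(v for _, v in chunk) / len(chunk))
    return means
```

Applying `inferred_age` to each group mean gives one inferred age per group, and `inferred_age(0.0867)` gives the roughly 125 generations quoted for the overall average.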

But, as I mentioned, it is precisely these populous lineages (which don't just have "some" descendants today, but thousands and millions of them) that are likely to be studied, because they are the only ones that have enough representatives in a sample of 100-1,000 men, typically seen in a population study, to allow for an age estimate via a variance calculation.

Update (July 24): Haplogroup sizes

The number of a Patriarch's descendants after g generations is a random variable which depends on the parameters m (the population growth constant), and g, the number of generations.

Scientists typically look at haplogroups with thousands or millions of existing members. Are such haplogroups produced in the types of simulations performed by Z.U.F.?

I estimate the average size of the haplogroups produced by Z.U.F. for different g=10,20,...,700 and m=1.

It is evident that this number increases linearly with g at a rate estimated to be 0.5/generation. [This was also noted by Z.U.F. who state: "the average size of the surviving haplogroups increased each generation by a value rapidly approaching 0.5"]. However, this means that the average haplogroup at 700 generations has a size of only ~350 men.
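The 0.5/generation slope can also be checked without simulation, under the assumption (not stated above) that each man's number of sons is Poisson-distributed with mean m. In that case the probability that a line is extinct by generation n follows a simple recursion, and the mean size of surviving haplogroups is m^g divided by the survival probability. The function name is illustrative.

```python
import math

def mean_surviving_size(g, m=1.0):
    """Expected number of descendants after g generations, averaged only over
    Patriarchs whose line has not died out (assumes Poisson(m) sons per man)."""
    q = 0.0  # probability that the line is already extinct
    for _ in range(g):
        q = math.exp(m * (q - 1.0))  # Poisson probability generating function
    return m ** g / (1.0 - q)
```

For m=1 the value grows by very nearly 0.5 per generation, which puts the average surviving haplogroup at g=700 in the neighbourhood of 350 men, in line with the estimate above.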

Thus, not only is the average variance estimated by Z.U.F. inappropriate because of an observation selection effect (averaging over small and large haplogroups alike), but it seems to miss the relevant observations altogether, i.e. the really large haplogroups numbering in the hundreds of thousands or millions. Yet it is precisely for such large haplogroups that it has often been used in the literature.

How can we produce "realistic" haplogroup sizes, close to those likely to become an object of scientific study in contemporary human populations? We can either:
  • increase the number of initial representatives, i.e. start with many related men with identical Y chromosomes rather than just 1, or we can
  • increase the population growth constant m to something higher than 1, i.e. a growing population.
Yet, both these changes have the same effect, namely the accumulation of variance at a higher rate than the Z.U.F. rate.

Indeed, Z.U.F. produce some such large haplogroups in some of their simulations (Fig. 1 asterisks, Fig. 2 squares/diamonds), all of which show -predictably- a higher effective rate than their 3.6x slower rate.

They caution against such large haplogroup sizes ["population size exceeds 1 million by generation 1000, which is not realistic for many local tribes."]. Granted, if one looks only at local tribes that never grow to large numbers.

And yet, some or all of the co-authors of Z.U.F. did not limit their use of the 3.6x slower rate to local tribes: Cinnioglu et al. 2004 (pdf), Sengupta et al. (2006), King et al. (2008) all apply the 0.00069 rate for populations (and haplogroups) that have grown to much more than 1 million in less time, thus overestimating severely their age.

Update (July 24): Variance of a large haplogroup

Following the previous observations, naturally, I wanted to see for myself what the STR variance of an ancient lineage with a large number of modern descendants actually looks like. My target size is 1,000,000, which is about 20% of modern Greek males.

I consider two cases:
  • Expansion commencing in the Late Bronze Age (g=120 or 1,600BC with a generation length of 30)
  • Expansion commencing in the early Neolithic (g=300 or 7,000BC)

I harvest N=1,000 haplogroups for each of these cases. I set the growth constant at m=1.100694 for the Bronze Age, and m=1.039122 for the Neolithic. This ensures that enough "large" haplogroups will be generated during simulation. Naturally, the overall population grows at a smaller rate, but the successful lineages will grow much faster than the population average.
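How the two growth constants were chosen is not stated here. As a quick arithmetic check, both of them make the expected number of descendants per Patriarch, m^g averaged over all lines including extinct ones, come out at about 100,000, i.e. one tenth of the target size. The function name below is illustrative.

```python
def expected_descendants(m, g):
    """Mean number of descendants after g generations when every man
    fathers m sons on average (extinct lines included in the mean)."""
    return m ** g

for label, m, g in (("Bronze Age", 1.100694, 120), ("Neolithic", 1.039122, 300)):
    print(label, round(expected_descendants(m, g)))
```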

Note that I harvest only haplogroups whose MRCA lived in the specified time span. Also, I harvest haplogroups whose final size is between 750,000 and 1,250,000 to match my target size of 1,000,000. Indeed, the average size of the harvested haplogroups is 964,327 for the Bronze Age, and 979,693 for the Neolithic.

Here are the results:
  • ~1 million descendants of a Bronze Age (120 generations ago) ancestor have an STR variance of 0.269 +/- 0.087
  • ~1 million descendants of a Neolithic (300 generations ago) ancestor have an STR variance of 0.629 +/- 0.156
If we used the germline mutation rate (μ=0.0025) we would estimate the ages of these haplogroups as:
  • Bronze Age: 107.6 generations, or a 10% underestimate
  • Neolithic: 251.6 generations, or a 16% underestimate
On the other hand, if we used the evolutionary rate of 0.00069 of Z.U.F., our estimates would be:
  • Bronze Age: 389.9 generations, or a 225% overestimate
  • Neolithic: 911.6 generations, or a 204% overestimate
It is clear that the Z.U.F. rate of 0.00069 substantially overestimates the ages of large recent haplogroups, whereas the germline rate underestimates them by a little.
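The arithmetic behind these estimates is simply variance divided by the assumed rate, compared against the true number of generations. A small sketch, with illustrative names:

```python
MU_GERMLINE = 0.0025   # germline rate used in the simulation
ZUF_RATE = 0.00069     # Z.U.F. evolutionary rate

def age_and_error(variance, rate, true_generations):
    """Return (estimated age in generations, signed error in percent)."""
    estimate = variance / rate
    error_pct = 100.0 * (estimate - true_generations) / true_generations
    return estimate, error_pct

for label, variance, g in (("Bronze Age", 0.269, 120), ("Neolithic", 0.629, 300)):
    for rate in (MU_GERMLINE, ZUF_RATE):
        estimate, error_pct = age_and_error(variance, rate, g)
        print(f"{label}, rate {rate}: {estimate:.1f} generations ({error_pct:+.0f}%)")
```

A negative error is an underestimate of the true age and a positive one an overestimate.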

Let's look at some concrete examples of age estimates in the literature, where I compare my own (first) estimates with the published ones. Here is how my estimates are derived:

For a Bronze Age ancestor (g=120) it is: 0.269 ≈ 0.9 μg

For a Neolithic ancestor (g=300) it is: 0.629 ≈ 0.84 μg

Thus, the correction multiplier, if the variance is between 0.269 and 0.629 is between 0.84 and 0.9; I will use the midpoint 0.87. If the variance is less than 0.269, then I use 0.9. If the variance is more than 0.629 then I use 0.84. Of course, the correction factor could be expressed more accurately as a function of the variance.

Note that the generation length preferred by these authors is 25, by me it is 30. All ages are ky BC.
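The correction rule can be written down as a short function. This is a sketch under a few assumptions: variances exactly equal to 0.269 or 0.629 are treated as "between" and get the midpoint, and calendar dates are counted back from 2008, the year of this post. The names are illustrative.

```python
MU = 0.0025  # germline rate, per locus per generation

def correction_multiplier(variance):
    """Piecewise multiplier c such that variance is roughly c * MU * g."""
    if variance < 0.269:
        return 0.90
    if variance > 0.629:
        return 0.84
    return 0.87  # midpoint, used for variances within [0.269, 0.629]

def corrected_age_generations(variance):
    """Age of the MRCA in generations after applying the multiplier."""
    return variance / (correction_multiplier(variance) * MU)

def calendar_year(variance, years_per_generation=30, reference_year=2008):
    """Approximate calendar year of the MRCA; negative values are years BC."""
    return reference_year - corrected_age_generations(variance) * years_per_generation
```

For example, a variance of 0.31 gives roughly 143 generations, which with 30-year generations falls around 2,300BC.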

Cinnioglu et al. (2004)

In this paper, an evolutionary rate of 0.0007 is used.


E-M78 is dated to 400BC, only a couple of centuries after the historical Greek colonization. E-M78 reaches its maximum in the Peloponnese, a major source of Greek colonists.

I-P37 and J-M12 are dated to 1,100BC and 1,200BC, at around the time that e.g. the Phrygians from the Balkans are believed to have migrated to Asia Minor. I-P37 and J-M12 reach their maxima in areas north of Greece where the Phrygians are said to have originated.

Sengupta et al. (2006)

R-M17 (upper caste)

Thus, all the exogenous West Asian lineages in India have post-Neolithic ages, with R-M17 having a suggestive age of 1,500BC coinciding with the suggested date for the Indo-Aryans.

King et al. (2008)

J-M12 (Nea Nikomedeia)
E-V13 (Sesklo/Dimini)
E-V13 (Lerna Franchthi)
J-M92 (Crete)
0.1 AD
J-M319 (Crete)
0.1 AD
E-V13 (Crete)
0.8 AD

These are very localized samples, so they should not be interpreted as reflecting expansion times in Greece itself. However, they do suggest a Bronze Age expansion of E-V13 and a much later arrival of E-V13 in Crete.

Note that for Crete, the 1,000,000-haplogroup size assumption is a substantial overestimate, so my age estimates are also substantial underestimates.

Update (July 25): R-M17 in South Siberia

Derenko et al. (2006) "Contrasting patterns of Y-chromosome variation in South Siberian populations from Baikal and Altai-Sayan regions" calculate the variance of R-M17 chromosomes in South Siberia, using the Z.U.F. rate, arriving at an age of 11.3kya corresponding to a value of 0.31. This corresponds to 2,300BC according to my estimate (see previous update).

Recently Bouakaze et al. (Int J Legal Med (2007) 121:493–499) reported the presence of R-M17 chromosomes in ancient inhabitants of South Siberia and the Andronovo culture (2,500BC-1,500BC).

The Andronovo culture is widely believed to be of Eastern European ultimate origin, reflecting the eastward movement of the Kurgan culture, and is associated by some with the ancestors of the Indo-Iranians.

In the Balkans, again in Z.U.F. years, the age of R-M17 is 15.8kya, which corresponds to a variance of 0.44, or ~4,000BC according to my estimate.

Update (July 25): Baltic Y chromosomes

Lappalainen et al. (2008) use the Z.U.F. rate to estimate the antiquity of lineages in the Baltic region. Dates are ky BC.

Lappalainen Dienekes

1,000BC for I1a in the Baltic region is within the time frame of the emergence of the Germanic people who did experience a strong demographic growth.
1,500BC for N3 shows a rather late time for Finno-Ugrians. However, it must be noted that smaller demographic sizes would impose more drift, and hence a slower accumulation of variance. Therefore, this time is probably underestimated.
1,900BC for R1a1 is consistent with the northern edge of the expansion of R1a1. Once again, reduced variance may also be influenced by smaller population numbers, making this a possible underestimate.

Update (July 25): Southeastern Europe (the Balkans)

Pericic et al. (2005) use the Z.U.F. rate to estimate ages of Y-chromosome lineages in the Balkans. Dates are ky BC.

I1b* (xM26)
J-M241 (without Kosovars) 1

Thus, Balkan haplogroup I seems related to a Bronze Age origin, with R-M17 being substantially older, and deriving perhaps from northern Balkan Neolithic or alternatively intrusive Kurgan populations. J-M241 seems to be quite young, similar to J-M12 in Nea Nikomedeia (see discussion of King et al. (2008) above).

The young ages of J-M12 and J-M241 also explain the striking inverse correlation between it and J-M410, which makes sense if it expanded later. A fairly late expansion also explains its under-representation in Southern Italy and Anatolia: it appears to be a rather young and "Epirotic" clade that was too late in coming to significantly participate in the historical Greek colonization.

Update (July 26): E3b in Cyprus and Southern Italy

Capelli et al. (2005) [Population Structure in the Mediterranean Basin: A Y Chromosome Perspective] study Y-chromosome variation in many Mediterranean populations including Cyprus. I use a mutation rate of 0.0018 for the six markers used in this study (Quintana-Murci et al. AJHG 68(2) pp. 537 - 542 ). Ages are in ky BC.

I come up with an age of 1.4ky BC for E3b in Cyprus, which is consistent with Mycenaean and later Greek settlements on the island.

I also looked at Southern Italian Y chromosomes. I removed those with values other than (13,12) in (DYS19,DYS388), since these values are universal in Greek E-V13, in order to remove possible contamination from non-E-V13 chromosomes. The resulting age is 900BC, once again very close to the historical Greek colonization of Magna Graecia.
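The filtering step can be sketched as below, assuming each haplotype is stored as a mapping from marker name to repeat count. The data layout and function name are hypothetical; only the required values (13 at DYS19, 12 at DYS388) come from the text.

```python
def keep_ev13_like(haplotypes, required=(("DYS19", 13), ("DYS388", 12))):
    """Keep only haplotypes that carry the required repeat count at every
    listed marker; haplotypes missing a marker are dropped as well."""
    return [
        h for h in haplotypes
        if all(h.get(marker) == value for marker, value in required)
    ]
```

The age estimate is then computed on the retained haplotypes only.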

Update (July 26): A more elaborate population growth model

Z.U.F. also propose (Fig. 2 triangles) a more elaborate population growth with:
  • m=1.002 before 400 generations
  • m=1.012 from 400 to 14 generations ago
  • m=1.12 from 14 to 8 generations ago
  • m=1.25 from 8 generations ago to current time
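This schedule can be expressed as a lookup from "generations ago" to the growth constant. How the boundary generations (exactly 400, 14 and 8 generations ago) are assigned is not specified, so the choice below, which gives each boundary to the more recent phase, is an assumption; the function names are illustrative.

```python
def growth_constant(generations_ago):
    """Growth constant m in force for a generation lying the given number
    of generations before the present (boundary handling is an assumption)."""
    if generations_ago > 400:
        return 1.002
    if generations_ago > 14:
        return 1.012
    if generations_ago > 8:
        return 1.12
    return 1.25

def growth_schedule(g=1000):
    """Growth constants for each of g simulated generations, oldest first."""
    return [growth_constant(ago) for ago in range(g, 0, -1)]
```

A time-forward simulation would then draw each generation's sons using the corresponding entry of `growth_schedule(1000)`.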

I ran a simulation (g=1000, N=10,000) with this population growth model. The average size of the descent groups of the MRCAs is 692,982 men. Averaged over all of them, the variance is 1.37. This gives:
  • With the germline mutation rate, an estimate of 549 generations (45% underestimate)
  • With the Z.U.F. evolutionary rate, an estimate of 1,988 generations (99% overestimate)
If we limit ourselves only to the 10, 1000, 5000 most prolific MRCAs (out of the N=10,000), we obtain ages (respectively):
  • With the germline mutation rate: 776, 747, 668 generations
  • With the Z.U.F. evolutionary rate: 2,810, 2,707, 2,419 generations

Thus, one can estimate that STR variance since the time of the MRCA accumulates at a rate of ~0.75μ / generation.

And, yet, the 0.00069 rate has been used to date Paleolithic events, e.g., by Semino et al. (2004) [Am. J. Hum. Genet. 74:1023–1034, 2004], leading to general age overestimates.

Update (July 29)

My discussion is continued in Haplogroup sizes and observation selection effects (continued)

Y chromosomes and mtDNA of Daghestan groups

This is a free paper which establishes the difference between highland Northeast Caucasian speakers and lowland Altaic speakers in Daghestan. The lowland groups show evidence of Mongoloid haplogroups in both Y chromosomes and mtDNA, while the highland groups are dominated by haplogroup J:
The highland Avar, Dargin, and Kubachi exhibit high frequencies of haplogroup J (0.56, 1.00, and 0.67, respectively)
According to Table 2, the Avars possess 0.33 of J2, so, consistent with previous observations, the Northeast Caucasian groups are J1 (or at least J*(xJ2)) exclusive.

Interestingly, haplogroup G occurs in the Avars (0.06) but not in the other highland groups. Haplogroup G is common in the Southern Caucasus. The mountain groups also have little R1*(xR1a1) (0.06 in Avars, 0.08 in Kubachi) and no I, R1a1 or E.

It certainly seems to be the case that the highland Northeast Caucasian speakers are descended from a J1-dominated ancient Near Eastern population which was preserved due to patrilocal endogamy. The relationship -that I wrote about earlier- of these Caucasian J1's to the Arabian J1's, the second major region of J1 dominance remains to be seen.

BMC Genetics 2008, 9:47 doi:10.1186/1471-2156-9-47

Culture creates genetic structure in the Caucasus: Autosomal, mitochondrial, and Y-chromosomal variation in Daghestan

Elizabeth E Marchani, W Scott Watkins, Kazima Bulayeva, Henry C Harpending, Lynn B Jorde



Near the junction of three major continents, the Caucasus region has been an important thoroughfare for human migration. While the Caucasus Mountains have diverted human traffic to the few lowland regions that provide a gateway from north to south between the Caspian and Black Seas, highland populations have been isolated by their remote geographic location and their practice of patrilocal endogamy. We investigate how these cultural and historical differences between highland and lowland populations have affected patterns of genetic diversity. We test 1) whether the highland practice of patrilocal endogamy has generated sex-specific population relationships, and 2) whether the history of migration and military conquest associated with the lowland populations has left Central Asian genes in the Caucasus, by comparing genetic diversity and pairwise population relationships between Daghestani populations and reference populations throughout Europe and Asia for autosomal, mitochondrial, and Y-chromosomal markers.


We found that the highland Daghestani populations had contrasting histories for the mitochondrial DNA and Y-chromosome data sets. Y-chromosomal haplogroup diversity was reduced among highland Daghestani populations when compared to other populations and to highland Daghestani mitochondrial DNA haplogroup diversity. Lowland Daghestani populations showed Turkish and Central Asian affinities for both mitochondrial and Y-chromosomal data sets. Autosomal population histories are strongly correlated to the pattern observed for the mitochondrial DNA data set, while the correlation between the mitochondrial DNA and Y-chromosome distance matrices was weak and not significant.


The reduced Y-chromosomal diversity exhibited by highland Daghestani populations is consistent with genetic drift caused by patrilocal endogamy. Mitochondrial and Y-chromosomal phylogeographic comparisons indicate a common Near Eastern origin of highland populations. Lowland Daghestani populations show varying influence from Near Eastern and Central Asian populations.

Link (pdf)

July 18, 2008

'Ten Commandments' of race and genetics

Via the New Scientist:
Even with the human genome in hand, geneticists are split about how to deal with issues of race, genetics and medicine.

Some favor using genetic markers to sort humans into groups based on ancestral origin – groups that may show meaningful health differences. Others argue that genetic variations across the human species are too gradual to support such divisions and that any categorisation based on genetic differences is arbitrary.

These issues have been discussed in depth by a multidisciplinary group – ranging from geneticists and psychologists to historians and philosophers – led by Sandra Soo-Jin Lee of Stanford University, California.

Now the group has released a set of 10 guiding principles for the scientific community, published as an open letter in this week's Genome Biology.

Here is my commentary on each of the "commandments":
1. All races are created equal

No genetic data has ever shown that one group of people is inherently superior to another. Equality is a moral value central to the idea of human rights; discrimination against any group should never be tolerated.
This is a vague statement that is false for two reasons: (i) for any particular single trait, there is a wealth of evidence that one race may be genetically better than another, e.g., Caucasoids are inherently more likely to get skin cancer than Negroids. (ii) there is no set-in-stone way to rank two groups based on many different traits. But, this is an obvious statement: if someone is beautiful and dumb and another one is ugly and intelligent, then you can't say that one is better than another: it depends on what importance you assign to different traits.
2. An Argentinian and an Australian are more likely to have differences in their DNA than two Argentinians

Groups of human beings have moved around throughout history. Those that share the same culture, language or location tend to have different genetic variations than other groups. This is becoming less true, though, as populations mix.
Correct, although populations are hardly mixing at a very high rate even in our interconnected world, but definitely more so than in pre-Columbian times.
3. A person's history isn't written only in his or her genes

Everyone's genetic material carries a useful, though incomplete, map of his or her ancestors' travels. Studies looking for health disparities between individuals shouldn't rely solely on this identity. They should also consider a person's cultural background.
Essentially correct, since groups and individuals differ both because of genes and because of culture.
4. Members of the same race may have different underlying genetics

Social definitions of what it means to be "Hispanic" or "black" have changed over time. People who claim the same race may actually have very different genetic histories.
Correct in the sense that there is variation within races. Also, in the sense that socially-defined races such as "black" and "Hispanic" do not correspond perfectly to biological races. "Blacks", at least in the United States, are usually thought of as partial Negroids, and "Hispanics" are usually thought of as Spanish speakers who tend to have a variable amount of Caucasoid and American Mongoloid ancestry.
5. Both nature and nurture play important parts in our behaviors and abilities

Trying to use genetic differences between groups to show differences in intelligence, violent behaviors or the ability to throw a ball is an oversimplification of much more complicated interactions between genetics and environment.
Essentially correct. However, this statement is often used to "ease the blow" of the fact that races may indeed have genetic differences that affect outcomes irrespective of environments, or at least in the range of environments that people tend to find themselves in in the 21st century.
6. Researchers should be careful about using racial groups when designing experiments

When scientists decide to divide their subjects into groups based on ethnicity, they need to be clear about why and how these divisions are made to avoid contributing to stereotypes.
No disagreement here.
7. Medicine should focus on the individual, not the race

Although some diseases are connected to genetic markers, these markers tend to be found in many different racial groups. Overemphasising genetics may promote racist views or focus attention on a group when it should be on the individual.

Focusing on the individual is a noble goal for the future. Doctors don't have infinite time and resources to study the individual in all his particulars, so they work by placing him and his condition in a few relevant categories, e.g., "old white male". The category "white" may be of little relevance if one has a broken limb but of greater relevance if one has a skin pathology.

Individuals are real, but we don't really perceive individuals: we perceive a cloud of categories and attributes about individuals, as time, knowledge, and interest allow, and one of these categories -and not an insignificant one- is their race.
8. The study of genetics requires cooperation between experts in many different fields

Human disease is the product of a mishmash of factors: genetic, cultural, economic and behavioral. Interdisciplinary efforts that involve the social sciences are more likely to be successful.


9. Oversimplified science feeds popular misconceptions

Policy makers should be careful about simplifying and politicising scientific data. When presenting science to the public, the media should address the limitations of race-related research.

Scientists should try to make scientific results accessible to the public without fueling misconceptions. A big part of this is being honest about race-related research, something which many scientists holding a politically correct "races don't exist/races are social constructs" view seem unwilling to do.
10. Genetics 101 should include a history of racism

Any high school or college student learning about genetics should also learn about misguided attempts in the past to use science to justify racism. New textbooks should be developed for this purpose.
Genetics 101 should focus on the science of genetics, nothing more and nothing less. It should impart to the student correct notions about the science, and about differences between groups.

UPDATE (July 19):

The Genome Biology open letter on which the New Scientist article is based.

Better mental health of African Americans is not explained by social relationships

Personal Relationships doi: 10.1111/j.1475-6811.2008.00195.x

Race, social relationships, and mental health



Researchers often assume that the extent, quality, and effectiveness of personal relationships explain why African Americans have relatively good mental health despite experiencing high levels of stress. This study tests this assumption using data from the 1990–1992 National Comorbidity Survey. Few racial differences emerge in patterns of social relationships, and the nature and quality of social relationships do not explain African Americans' resiliency on mental health. Several aspects of social relationships benefit African Americans' mental health more than Whites', but these moderating effects are insubstantial. Hence, the data do not support the assumption. If social relationships help explain the lack of racial differences in mental health, their nature and effects must be more adequately conceptualized.


Nasal passage differences between Caucasoids and Negroids

American Journal of Physical Anthropology

Ecogeographic variation in human nasal passages

Todd R. Yokley


Theoretically, individuals whose ancestors evolved in cold and/or dry climates should have greater nasal mucosal surface area relative to air volume of the nasal passages than individuals whose ancestors evolved in warm, humid climates. A high surface-area-to-volume (SA/V) ratio allows relatively more air to come in contact with the mucosa and facilitates more efficient heat and moisture exchange during inspiration and expiration, which would be adaptive in a cold, dry environment. Conversely, a low SA/V ratio is not as efficient at recapturing heat and moisture during expiration and allows for better heat dissipation, which would be adaptive in a warm, humid environment. To test this hypothesis, cross-sectional measurements of the nasal passages that reflect surface area and volume were collected from a sample of CT scans of patients of European and African ancestry. Results indicate that individuals of European descent do have higher SA/V ratios than individuals of African descent, but only when decongested. Otherwise, the two groups show little difference. This pattern of variation may be due to selection for different SA/V configurations during times of physical exertion, which has been shown to elicit decongestion. Relationships between linear measurements of the skeletal nasal aperture and cavity and cross-sectional dimensions were also examined. Contrary to predictions, the nasal index, the ratio of nasal breadth to nasal height, is not strongly correlated with internal dimensions. However, differences between the nasal indices of the two groups are highly significant. These results may be indicative of different adaptive solutions to the same problem.


July 17, 2008

Beauty map of London

This is the kind of quantitative study that I really like. There is so much anecdotal talk and debate about whether people from this region/country/continent/class/religion etc. are more beautiful/attractive/intelligent/etc. but with the exception of IQ and personality traits, I have seen very little quantitative evidence for these assertions.

Like g, where an individual's correlated performance in multiple test items allows us to extract a common underlying intelligence factor, correlated measures of attractiveness across many observers could in principle allow us to extract an individual's BQ (beauty quotient) in a controlled social science experiment.

Personality and Individual Differences doi:10.1016/j.paid.2008.05.005

A beauty-map of London: Ratings of the physical attractiveness of women and men in London’s boroughs

Viren Swami and Eliana G. Hernandez


In 1908, Francis Galton discussed anecdotal data he had collected for the compilation of a ‘beauty-map of the British Isles’. Based on his discussion, the present study attempted to compile a more empirical beauty-map of London. A community sample of 461 Londoners completed a questionnaire in which they rated the physical attractiveness of women and men in London’s 33 boroughs, as well as their familiarity with those boroughs. Results showed a significant interaction between borough and rated sex, with women being rated as more attractive across boroughs, and three boroughs in particular (the City of London, the City of Westminster, and Kensington and Chelsea) being rated high in physical attractiveness. Overall, ratings of attractiveness were significantly positively correlated with familiarity of boroughs, as well as objective measures of borough affluence (specifically, annual gross pay and average house prices) but not of borough health (life expectancy). These results are discussed in relation to the association between wealth and attractiveness, as well as Galton’s original beauty-map.