September 16, 2009

Y chromosome and mtDNA of goats in North Africa

Mol Biol Evol. 2009 Sep 3. [Epub ahead of print]

Tracing the history of goat pastoralism: new clues from mitochondrial and Y chromosome DNA in North Africa.

Pereira F, Queirós S, Gusmão L, Nijman IJ, Cuppen E, Lenstra JA; the Econogene Consortium, Davis SJ, Nejmeddine F, Amorim A.

Valuable insights into the history of human populations have been obtained by studying the genetic composition of their domesticated species. Here we address some of the long-standing questions about the origin and subsequent movements of goat pastoralism in Northern Africa. We present the first study combining results from mitochondrial DNA (mtDNA) and Y chromosome loci for the genetic characterization of a domestic goat population. Our analyses indicate a remarkably high diversity of maternal and paternal lineages in a sample of indigenous goats from the northwestern fringe of the African continent. Median-joining networks and a multidimensional scaling of ours and almost 2000 published mtDNA sequences revealed a considerable genetic affinity between goat populations from the Maghreb (Northwest Africa) and the Near East. It has been previously shown that goats have a weak phylogeographic structure compatible with high levels of gene flow, as demonstrated by the worldwide dispersal of the predominant mtDNA haplogroup A. In contrast, our results revealed a strong correlation between genetic and geographical distances in 20 populations from different regions of the world. The distribution of Y chromosome haplotypes in Maghrebi goats indicates a common origin for goat patrilines in both Mediterranean coastal regions. Taken together, these results suggest that the colonization and subsequent dispersal of domestic goats in Northern Africa was influenced by the maritime diffusion throughout the Mediterranean Sea and its coastal regions of pastoralist societies whose economy included goat herding. Finally, we also detected traces of gene flow between goat populations from the Maghreb and the Iberian Peninsula corroborating evidence of past cultural and commercial contacts across the Strait of Gibraltar.
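The abstract reports a strong correlation between genetic and geographical distances across 20 populations but does not say here how that correlation was tested. A Mantel test is one standard way to assess such a correlation. The sketch below is a minimal pure-Python version under that assumption: it permutes the population labels of one distance matrix to build a null distribution for Pearson's r. The toy matrices are hypothetical and are not the study's data.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def upper_triangle(matrix, order):
    """Entries above the diagonal, after permuting rows and columns by `order`."""
    k = len(order)
    return [matrix[order[i]][order[j]] for i in range(k) for j in range(i + 1, k)]


def mantel(gen_dist, geo_dist, permutations=999, seed=1):
    """Mantel test: Pearson r between two distance matrices, plus a one-sided
    permutation p-value from shuffling the population labels of geo_dist."""
    rng = random.Random(seed)
    identity = list(range(len(gen_dist)))
    gen = upper_triangle(gen_dist, identity)
    r_obs = pearson(gen, upper_triangle(geo_dist, identity))
    at_least_as_large = 1  # the observed labelling counts as one permutation
    for _ in range(permutations):
        perm = identity[:]
        rng.shuffle(perm)
        if pearson(gen, upper_triangle(geo_dist, perm)) >= r_obs:
            at_least_as_large += 1
    return r_obs, at_least_as_large / (permutations + 1)


# Hypothetical toy data: 8 populations along a line, with genetic distance
# exactly proportional to geographic distance.
positions = [0, 1, 3, 4, 7, 9, 12, 15]
geo_toy = [[abs(a - b) for b in positions] for a in positions]
gen_toy = [[0.001 * abs(a - b) for b in positions] for a in positions]
r, p = mantel(gen_toy, geo_toy)
print(f"Mantel r = {r:.3f}, p = {p:.3f}")
```

Permuting whole rows and columns together, rather than individual matrix cells, keeps each permuted matrix a valid distance matrix, which is why the Mantel test is used instead of an ordinary correlation test on the pairwise values.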



argiedude said...

Dienekes, I made an Excel file that calculates FST for any combination of samples from the HGDP dataset. It's very easy to understand. I made it to demonstrate that sample size really does affect an FST estimate. In the recent Tian study of Europeans, where sample sizes were exceptionally small, some of the results were predictably odd and contradictory: Greeks had a distance of 0,0000 to south Italians, while the distances of these two to Spain were 0,0010 and 0,0035, a huge difference within Europe, equivalent to the genetic distance between Spain and the Czech Republic, or between Ireland and Russia. Results like these paint a blurry genetic map, giving the false impression that there is a gradual genetic change between Europe and the Middle East.
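The comment does not say which FST formula the spreadsheet uses. As an assumption, here is a minimal sketch of one widely used two-population estimator, Hudson's FST in the ratio-of-averages form recommended by Bhatia et al. (2013), computed from genotypes coded 0/1/2. The function name and data layout are hypothetical, chosen only for illustration.

```python
def hudson_fst(pop1, pop2):
    """Hudson's FST estimator (ratio of averages over SNPs).

    Each population is a list of individuals; each individual is a list of
    genotypes coded 0/1/2 (copies of the alternate allele), one per SNP.
    """
    n1, n2 = 2 * len(pop1), 2 * len(pop2)  # sampled chromosomes per population
    num = den = 0.0
    for snp in range(len(pop1[0])):
        p1 = sum(ind[snp] for ind in pop1) / n1
        p2 = sum(ind[snp] for ind in pop2) / n2
        # Squared frequency difference, minus its expected sampling inflation.
        num += (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
        den += p1 * (1 - p2) + p2 * (1 - p1)
    if den == 0:
        raise ValueError("no polymorphic SNPs in either population")
    return num / den


# Hypothetical check: two populations fixed for different alleles.
print(hudson_fst([[0, 0], [0, 0]], [[2, 2], [2, 2]]))
```

Summing the numerator and denominator over all SNPs before dividing, instead of averaging per-SNP ratios, keeps rare variants from dominating the estimate. Because the small-sample correction is subtracted, two samples from the same population can give a slightly negative value.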

You need Excel 2007 to run the file. I tried converting it to Excel 2003, but in that version every recalculation takes about ten minutes. The same calculation in Excel 2007 takes just 2 seconds.

Pick any population and put half of its samples in Pop 1 and the other half in Pop 2. In theory, the FST between the two halves should be 0,0000, but this rarely happens.

After loading the samples and obtaining the first FST results, interchange any 2 samples between the 2 columns and recalculate FST. Sometimes the result changes only a little (by 0,0002, say); other times it changes by as much as 0,0050 or even more, all from swapping just 2 samples between the columns.
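The split-half experiment and the sample swap can be sketched in a few lines. This is a minimal sketch under stated assumptions: it uses Hudson's FST estimator (not necessarily the spreadsheet's formula), and instead of HGDP genotypes it draws a hypothetical population of 24 individuals at 4500 SNPs under Hardy-Weinberg equilibrium. `simulate_population` is a hypothetical helper.

```python
import random


def hudson_fst(pop1, pop2):
    """Hudson's FST (ratio of averages); individuals are lists of 0/1/2 genotypes."""
    n1, n2 = 2 * len(pop1), 2 * len(pop2)  # sampled chromosomes
    num = den = 0.0
    for snp in range(len(pop1[0])):
        p1 = sum(ind[snp] for ind in pop1) / n1
        p2 = sum(ind[snp] for ind in pop2) / n2
        num += (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
        den += p1 * (1 - p2) + p2 * (1 - p1)
    if den == 0:
        raise ValueError("no polymorphic SNPs")
    return num / den


def simulate_population(n_individuals, freqs, rng):
    """HYPOTHETICAL data: genotypes drawn under Hardy-Weinberg equilibrium."""
    return [[(rng.random() < p) + (rng.random() < p) for p in freqs]
            for _ in range(n_individuals)]


rng = random.Random(42)
freqs = [rng.uniform(0.05, 0.95) for _ in range(4500)]  # hypothetical allele frequencies
sample = simulate_population(24, freqs, rng)

# Split one population into two columns at random (Pop 1 and Pop 2).
rng.shuffle(sample)
col1, col2 = sample[:12], sample[12:]
print("split-half FST:", round(hudson_fst(col1, col2), 4))

# Interchange one sample from each column, then recalculate.
col1[0], col2[0] = col2[0], col1[0]
print("after one swap:", round(hudson_fst(col1, col2), 4))
```

Rerunning with different seeds shows how much the split-half value moves from sampling alone. Real samples carry relatedness and substructure that this independent-SNP simulation does not model, so real swaps can move the estimate more than simulated ones do.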

When I pointed out that the Greek-south Italian distance in the Tian study was completely unreliable because of the small sample size, you observed that the standard deviation was only 0,0010, so my argument that the real result could be off by as much as 0,0050 was wrong, and the real value was close to the study's estimate after all. But if you look at the standard deviations in the file I uploaded, you'll see that they aren't reliable either. For example, taking the 24 Yoruba samples, 12 in each column, the mean FST of 4 random subsets of 4500 SNPs is 0,0035 (a little high for two halves of the same population, but it happens), with a standard deviation of 0,0004. After interchanging 2 samples and recalculating, the mean of the 4 subsets is 0,0042 and the SD is 0,0003. After another interchange, the mean drops to 0,0004, with an SD of 0,0005. These last 4 results do not even come close to overlapping with the previous 2 sets of 4 results. The SD was very tight in all 3 runs, yet the means differed by roughly ten times that standard deviation. This is a typical example, and it is easy to reproduce with other samples.
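The point here is that an SD taken over random SNP subsets, with the same individuals in each column, measures only SNP-sampling noise; it says nothing about which individuals happened to land in which column. The sketch below makes that distinction concrete under loudly stated assumptions: Hudson's FST estimator stands in for the spreadsheet's formula, and the data are a hypothetical sample of 24 individuals drawn from two slightly diverged cryptic subgroups. That substructure is only one possible source of split-to-split swings and is not a claim about the Yoruba sample. `simulate_structured_sample` and `subset_summary` are hypothetical helpers.

```python
import random
import statistics


def hudson_fst(pop1, pop2):
    """Hudson's FST (ratio of averages); individuals are lists of 0/1/2 genotypes."""
    n1, n2 = 2 * len(pop1), 2 * len(pop2)
    num = den = 0.0
    for snp in range(len(pop1[0])):
        p1 = sum(ind[snp] for ind in pop1) / n1
        p2 = sum(ind[snp] for ind in pop2) / n2
        num += (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
        den += p1 * (1 - p2) + p2 * (1 - p1)
    if den == 0:
        raise ValueError("no polymorphic SNPs")
    return num / den


def simulate_structured_sample(n_per_group, n_snps, divergence, rng):
    """HYPOTHETICAL data: two cryptic subgroups with slightly different allele
    frequencies at every SNP, pooled and treated as one population sample."""
    base = [rng.uniform(0.1, 0.9) for _ in range(n_snps)]
    sample = []
    for _group in range(2):
        freqs = [min(max(p + rng.gauss(0, divergence), 0.01), 0.99) for p in base]
        sample += [[(rng.random() < f) + (rng.random() < f) for f in freqs]
                   for _ in range(n_per_group)]
    return sample


def subset_summary(col1, col2, n_snps, n_subsets, subset_size, rng):
    """Mean and SD of FST over random SNP subsets, for ONE fixed split of individuals."""
    values = []
    for _ in range(n_subsets):
        snps = rng.sample(range(n_snps), subset_size)
        sub1 = [[ind[s] for s in snps] for ind in col1]
        sub2 = [[ind[s] for s in snps] for ind in col2]
        values.append(hudson_fst(sub1, sub2))
    return statistics.mean(values), statistics.stdev(values)


rng = random.Random(7)
N_SNPS = 18000
sample = simulate_structured_sample(12, N_SNPS, 0.05, rng)  # 24 individuals in total

results = []
for run in range(3):
    rng.shuffle(sample)  # a different assignment of individuals to the two columns
    col1, col2 = sample[:12], sample[12:]
    mean, sd = subset_summary(col1, col2, N_SNPS, 4, 4500, rng)
    results.append((mean, sd))
    print(f"run {run + 1}: mean FST = {mean:.4f}, SD over 4 SNP subsets = {sd:.4f}")
```

To capture the individual-level uncertainty, one would instead repeat the calculation over many random splits (or resample individuals) and look at the spread of those results, not the SD within a single split.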

One last note: I recommend disabling automatic recalculation in Excel (go to Excel Options > Formulas, then select Manual). After that, press F9 whenever you want Excel to recalculate all cells.