Showing posts with label Gypsies. Show all posts

December 10, 2012

Roma origins once more (Moorjani et al. 2012)

I had first noticed that this new paper by Moorjani et al. was referenced by Loh et al., and it has now been posted on arXiv. In the last week, a couple of other papers on the same topic (Mendizabal et al. on autosomal DNA and Rai et al. on a Y-chromosome founder lineage) have also appeared.

All three studies appear to converge on NW India as the place of origin of the European Roma, and on a recent admixture between this "Proto-Roma" population and Europeans. It will be interesting to see if there are any substantial differences between Moorjani et al. and Mendizabal et al. in the reconstruction of Roma origins. There is also an appendix on updates to rolloff and other topics of a technical nature that ought to be useful to readers irrespective of their interest in this particular population.

It'll probably take me a while to digest everything in this paper, but I will make one quick observation after (virtually) leafing through the article: the authors infer admixture proportions from the observation that {CEU, ANI} form a clade with Adygei as an outgroup. I recently had a blog post on the differential relationship of ANI to Caucasus populations, in which I showed that D(CEU, Adygei; South Asian, Onge) was positive, and significant in some cases, indicating that CEU is more closely related to ANI (Ancestral North Indians) than Adygei is, whereas the reverse was the case for D(CEU, Georgian/Lezgin; South Asian, Onge).
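
For readers unfamiliar with the notation, here is a minimal sketch of how a statistic of the form D(P1, P2; P3, P4) can be computed from allele frequencies, using the usual numerator/denominator form. The function name and the toy frequencies are hypothetical and made up for illustration; they are not data from any of the papers discussed, and a real analysis would also need a block jackknife to attach a Z-score.

```python
def d_statistic(freqs):
    """D(P1, P2; P3, P4) from per-SNP allele frequencies.

    freqs: iterable of (p1, p2, p3, p4) tuples, one tuple per SNP.
    """
    num = 0.0
    den = 0.0
    for p1, p2, p3, p4 in freqs:
        # Numerator: covariance-like term between (P1 - P2) and (P3 - P4).
        num += (p1 - p2) * (p3 - p4)
        # Denominator: normalisation that keeps D between -1 and 1.
        den += (p1 + p2 - 2 * p1 * p2) * (p3 + p4 - 2 * p3 * p4)
    return num / den


# Hypothetical toy frequencies (P1, P2, P3, P4), invented for illustration only.
toy_snps = [
    (0.8, 0.6, 0.7, 0.3),
    (0.2, 0.4, 0.3, 0.6),
    (0.5, 0.5, 0.4, 0.4),
    (0.9, 0.7, 0.8, 0.5),
]

d = d_statistic(toy_snps)
# Swapping P1 and P2 should flip the sign of the statistic.
d_swapped = d_statistic([(p2, p1, p3, p4) for p1, p2, p3, p4 in toy_snps])
print(d, d_swapped)
```

In this convention a positive value means that P1 shares more alleles with P3 than P2 does, which is the sense in which D(CEU, Adygei; South Asian, Onge) > 0 is read above.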

A second observation was inspired by the following figure:


High IBD sharing with Romanians makes sense, because there is good evidence (e.g., presence of Y-haplogroup E-V13) that the Roma picked up European ancestry in the Balkans. So, I'm fairly sure that we are seeing a real signal that the Roma have Romanian-like recent European ancestors. But we ought to be vigilant, because it is possible that some Romanians have Roma ancestry too! This was the case for a couple of individuals in the Romanian sample of Behar et al. (2010).

This is a more general issue: IBD sharing sometimes reflects strictly (or mostly) unidirectional gene flow. For example, gene flow between European Americans (EA) and African Americans (AA) largely went in the EA->AA direction, so an AA individual sharing a segment with an EA individual more often than not reflects EA->AA gene flow.

But, in other cases, the direction of gene flow is more obscure (e.g., sharing among German, Magyar, and Slavic speakers and Jews in the old Austro-Hungarian Empire). This issue often comes up in the genealogical community, with a typical example being a couple of individuals (let's call them Klaus and Mikolaj) discovering a shared IBD segment, and Klaus thinking he's found a Polish ancestor, and Mikolaj a German one.

In any case, as the authors themselves note, it will be interesting to use more European reference populations. This might indicate whether the Roma picked up European ancestry in one particular region, carrying it with them as they expanded into the Balkans and beyond, or whether they picked it up by interacting with different host populations (e.g., Greek Gypsies with Greeks, Romanian Gypsies with Romanians, and so on).


arXiv:1212.1696 [q-bio.PE]

Reconstructing Roma history from genome-wide data

Priya Moorjani et al.

The Roma people, living throughout Europe, are a diverse population linked by the Romani language and culture. Previous linguistic and genetic studies have suggested that the Roma migrated into Europe from South Asia about 1000-1500 years ago. Genetic inferences about Roma history have mostly focused on the Y chromosome and mitochondrial DNA. To explore what additional information can be learned from genome-wide data, we analyzed data from six Roma groups that we genotyped at hundreds of thousands of single nucleotide polymorphisms (SNPs). We estimate that the Roma harbor about 80% West Eurasian ancestry (deriving from a combination of European and South Asian sources) and that the date of admixture of South Asian and European ancestry was about 850 years ago. We provide evidence for Eastern Europe being a major source of European ancestry, and North-west India being a major source of the South Asian ancestry in the Roma. By computing allele sharing as a measure of linkage disequilibrium, we estimate that the migration of Roma out of the Indian subcontinent was accompanied by a severe founder event, which we hypothesize was followed by a major demographic expansion once the population arrived in Europe.

Link

December 06, 2012

Romani origins and admixture (Mendizabal et al.)

I will comment on the paper in this space after I read it. For the time being here's a link to the press release.
"From a genome-wide perspective, Romani people share a common and unique history that consists of two elements: the roots in northwestern India and the admixture with non-Romani Europeans accumulating with different magnitudes during the out-of-India migration across Europe," Kayser said. "Our study clearly illustrates that understanding the Romani's genetic legacy is necessary to complete the genetic characterization of Europeans as a whole, with implications for various fields, from human evolution to the health sciences."
The results seem to complement a recent Y-chromosome study of the major founder lineage of European Roma, H-M82.

Current Biology dx.doi.org/10.1016/j.cub.2012.10.039

Reconstructing the Population History of European Romani from Genome-wide Data

Isabel Mendizabal et al.

The Romani, the largest European minority group with approximately 11 million people [1], constitute a mosaic of languages, religions, and lifestyles while sharing a distinct social heritage. Linguistic [2] and genetic [3, 4, 5, 6, 7 and 8] studies have located the Romani origins in the Indian subcontinent. However, a genome-wide perspective on Romani origins and population substructure, as well as a detailed reconstruction of their demographic history, has yet to be provided. Our analyses based on genome-wide data from 13 Romani groups collected across Europe suggest that the Romani diaspora constitutes a single initial founder population that originated in north/northwestern India ∼1.5 thousand years ago (kya). Our results further indicate that after a rapid migration with moderate gene flow from the Near or Middle East, the European spread of the Romani people was via the Balkans starting ∼0.9 kya. The strong population substructure and high levels of homozygosity we found in the European Romani are in line with genetic isolation as well as differential gene flow in time and space with non-Romani Europeans. Overall, our genome-wide study sheds new light on the origins and demographic history of European Romani.

Link

November 29, 2012

Pinpointing Roma origins: Out of Northwestern India

Interestingly, besides H-M82, there has been recent evidence that R-Z93 might also represent a second founder haplogroup of the European Roma populations; it will be interesting to study it in the future in order to confirm the scenario presented in this new paper.

From the paper:
This first genetic evidence of this nature allows us to develop a more detailed picture of the paternal genetic history of European Roma, revealing that the ancestors of present scheduled tribes and scheduled caste populations of northern India, traditionally referred to collectively as the Ḍoma, are the likely ancestral populations of modern European Roma. Our findings corroborate the hypothesized cognacy of the terms Rroma and Ḍoma and resolve the controversy about the Gangetic plain and the Punjab in favour of the northwestern portion of the diffuse widespread range of the Ḍoma ancestral population of northern India.
A paper about Roma origins based on autosomal DNA is also apparently in the works, so it will be interesting to see how it might tie in with the Y-chromosome evidence.

PLoS ONE 7(11): e48477. doi:10.1371/journal.pone.0048477

The Phylogeography of Y-Chromosome Haplogroup H1a1a-M82 Reveals the Likely Indian Origin of the European Romani Populations

Niraj Rai et al.

Linguistic and genetic studies on Roma populations inhabited in Europe have unequivocally traced these populations to the Indian subcontinent. However, the exact parental population group and time of the out-of-India dispersal have remained disputed. In the absence of archaeological records and with only scanty historical documentation of the Roma, comparative linguistic studies were the first to identify their Indian origin. Recently, molecular studies on the basis of disease-causing mutations and haploid DNA markers (i.e. mtDNA and Y-chromosome) supported the linguistic view. The presence of Indian-specific Y-chromosome haplogroup H1a1a-M82 and mtDNA haplogroups M5a1, M18 and M35b among Roma has corroborated that their South Asian origins and later admixture with Near Eastern and European populations. However, previous studies have left unanswered questions about the exact parental population groups in South Asia. Here we present a detailed phylogeographical study of Y-chromosomal haplogroup H1a1a-M82 in a data set of more than 10,000 global samples to discern a more precise ancestral source of European Romani populations. The phylogeographical patterns and diversity estimates indicate an early origin of this haplogroup in the Indian subcontinent and its further expansion to other regions. Tellingly, the short tandem repeat (STR) based network of H1a1a-M82 lineages displayed the closest connection of Romani haplotypes with the traditional scheduled caste and scheduled tribe population groups of northwestern India.

Link

November 02, 2012

ALDER paper and software (Loh et al. 2012)

A new paper has appeared on the arXiv that introduces ALDER, a method for testing for admixture and inferring its parameters (when it happened and the proportions of the two mixing populations). You can get the software from here.

I have already tried it and I can confirm two claims in the paper: (i) it's extremely fast, and (ii) it is conservative, in the sense that its test can fail even when an f3 test indicates admixture. In one case where it detected admixture, ASW as CEU+YRI, I got the output on the right, which shows a very clear pattern of exponential decay. I also tried a different experiment using Mozabites as the admixed population. The results are quite interesting:

Test SUCCEEDS (z=10.39, p=2.7e-25) for Mozabite with {CEU30, YRI30} weights

DATA (one field per line):
test status: success (warning: decay rates inconsistent)
p-value: 2.7e-25
test pop: Mozabite
ref A: CEU30
ref B: YRI30
2-ref z-score: 10.39
1-ref z-score A: 6.75
1-ref z-score B: 11.39
max decay diff %: 55%
2-ref decay: 17.45 +/- 1.68
2-ref amp_exp: 0.00037417 +/- 0.00003187
1-ref decay A: 28.63 +/- 3.84
1-ref amp_exp A: 0.00005311 +/- 0.00000787
1-ref decay B: 16.21 +/- 1.42
1-ref amp_exp B: 0.00023789 +/- 0.00001752
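
As a quick sanity check on the "max decay diff %" field, the sketch below takes the three decay estimates from the output above and computes the largest pairwise difference relative to the mean of that pair. This definition is an assumption about how the field could be derived, not something taken from the ALDER documentation; it does happen to reproduce the reported 55%.

```python
from itertools import combinations

# Decay estimates (in generations) copied from the ALDER output above.
decays = {"2-ref": 17.45, "1-ref CEU": 28.63, "1-ref YRI": 16.21}


def max_decay_diff_pct(values):
    """Largest pairwise difference, relative to the mean of that pair, in percent.

    Assumed definition, used only to check consistency with the output above.
    """
    return max(
        abs(a - b) / ((a + b) / 2) * 100
        for a, b in combinations(values, 2)
    )


pct = max_decay_diff_pct(decays.values())
print(f"max decay diff: {pct:.1f}%")
```

The largest gap is between the two 1-reference estimates (CEU and YRI), which is presumably what triggers the "decay rates inconsistent" warning.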

Notice that the 1-reference decay using CEU is 28.63 and with YRI it is 16.21, while the 2-reference (both CEU and YRI) is an intermediate 17.45. I believe that this is capturing the same behavior as Jin et al. (2012), according to which:
There was an almost complete absence of recent gene flow from European populations to the Mozabite gene pool (Figure 6A). For the Sub-Saharan African ancestral component, there were more long CSDAs at the tail of empirical distribution than those in the HI model, which confirmed that recent gene flow from African populations had contributed to the Mozabite gene pool (Figure 6B). 
This is also what ALDER is telling us, since the decay using CEU is more "abrupt" (hence lack of long segments of admixture that might indicate recent admixture), while that using YRI is less so (and hence recent Sub-Saharan admixture has contributed longer segments).
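
To make the link between the decay rate and "abruptness" concrete, here is a minimal sketch that assumes the simplest model of admixture LD, amplitude * exp(-n * d), with n in generations and d in Morgans. It is not ALDER's fitting code (which handles noise and an affine term); the function names and the distance grid are hypothetical choices for illustration.

```python
import math


def weighted_ld(d_morgans, generations, amplitude=1.0):
    """Idealised admixture LD at genetic distance d (Morgans), n generations after mixing."""
    return amplitude * math.exp(-generations * d_morgans)


def fit_decay(distances, values):
    """Recover the decay rate as minus the least-squares slope of log(value) on distance."""
    logs = [math.log(v) for v in values]
    n = len(distances)
    mean_x = sum(distances) / n
    mean_y = sum(logs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(distances, logs))
    sxx = sum((x - mean_x) ** 2 for x in distances)
    return -sxy / sxx


# Distance grid from 0.5 cM to 20 cM, expressed in Morgans (illustrative choice).
distances = [0.005 * i for i in range(1, 41)]

fits = {}
for n_true in (16.21, 28.63):  # the YRI and CEU 1-reference decays quoted above
    curve = [weighted_ld(d, n_true) for d in distances]
    fits[n_true] = fit_decay(distances, curve)
    remaining = weighted_ld(0.05, n_true)  # fraction of LD left at 5 cM
    print(f"n={n_true}: fitted={fits[n_true]:.2f}, LD left at 5 cM={remaining:.2f}")
```

Under this model the curve with the larger decay rate has less LD left at any given distance, which is the sense in which the CEU curve is more "abrupt" than the YRI one.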

In any case, enough with my own preliminary experiments. From the paper itself, there are interesting applications of the new methodology for Sardinians, Japanese, and Central African Pygmies:
Both Central African Pygmy populations in the HGDP, the Mbuti and Biaka, show evidence of admixture (Table 1), about 28 +/- 4 generations (800 years) ago for Mbuti and 38 +/- 4 generations (1100 years) ago for Biaka, estimated using San and Yoruba as reference populations (Figure 2A,C). The intra-population heterogeneity is low, as demonstrated by the negligible affine terms. In each case, we also generated weighted LD curves with the Pygmy population itself as one reference and a variety of second references. We found that using populations French, Han, or Yoruba as the second reference gave very similar amplitudes, but the amplitude was significantly smaller with the other Pygmy population or San as the second reference (Figure 2B,D). Using the amplitudes with Yoruba, we estimated mixture fractions of at least 15.9 +/- 0.9% and 28.8 +/- 1.4% Yoruba-related ancestry for Mbuti and Biaka, respectively. 
For Sardinians:
We detect a very small proportion of Sub-Saharan African ancestry in Sardinians, which our ALDER tests identified as admixed (Table 1; Figure 3A). To investigate further, we computed weighted LD curves with Sardinian as a test population and all pairs of the HapMap CEU, YRI and CHB populations as references (Table 2). We observed an abnormally large amount of shared long-range LD in chromosome 8, likely due to an extended inversion segregating in Europeans (PRICE et al. 2008), so we omitted it from these analyses. The CEU–YRI curve has the largest amplitude, suggesting both that the LD present is due to admixture and that the small non-European ancestry component, for which we estimated a lower bound of 0.6+/-0.2%, is from Africa. The existence of a weighted LD decay curve with CHB and YRI as references provides further evidence that the LD is not simply due to a population bottleneck or other non-admixture sources, as does the fact that our estimated dates from all three reference pairs are roughly consistent at about 40 generations (1200 years). Our findings thus confirm the signal of African ancestry in Sardinians reported in MOORJANI et al. (2011). The date, small mixture proportion, and geography are consistent with a small influx of migrants from North Africa, who themselves traced only a fraction of their ancestry ultimately to Sub-Saharan Africa, consistent with the findings of DUPANLOUP et al. (2004).
Moorjani et al. (2011) had estimated 2.9% admixture in Sardinians occurring at 71 +/- 28 generations, so the new results appear to be different, perhaps on account of the treatment of the chromosome 8 inversion or the ability of ALDER to pick the distance threshold (hard-set at 0.5cM in rolloff) adaptively. Also, note that ALDER is able to estimate admixture proportions based on the amplitude of the weighted LD, whereas in the previous test the proportions were calculated using an F4 ratio test which did not take into account East Eurasian-like gene flow into the CEU population, and considered both CEU and Sardinians as having experienced no Asian-related gene flow.

So it appears that the African admixture in Sardinians is real, but may be both lower and later than previously estimated. In a recent experiment, I "scrubbed" possible segments of African ancestry in Sardinians, and this diminished their African ancestry from 3.1% to 1.8%. If we consider the 1.8% to be the spurious admixture due to Asian-related gene flow into northern Europe, then African admixture in Sardinians will be the remainder 1.3%, and perhaps lower due to the very "intensive" nature of the scrubbing procedure.

globe4 estimates African admixture in Sardinians as 0.8%, with some heterogeneity in its apportionment in 28 different individuals (left), with three individuals appearing as outliers and the remainder randomly distributed around the 0.8% median. The outlier individuals are HGDP01062, HGDP01076, and HGDP01071; the last of these is not included in the curated version of HGDP released by Patterson et al. (2012). ALDER includes a facility for detecting heterogeneity in admixture, but I did not see this particularly discussed in my first scan of the paper. In any case, it now appears that different methods converge on a small African admixture in Sardinians, and the 1200-year old age estimate seems consistent with medieval history.
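
The generation counts quoted above convert to years in a simple way. The sketch below assumes 29 years per generation, a value chosen here because it reproduces the conversions in the quoted passages (28 generations as 800 years, 40 as 1200 years); the constant and function names are hypothetical.

```python
YEARS_PER_GENERATION = 29  # assumption: matches the conversions quoted above

# (label, generations) pairs as quoted in the text above
quoted = [
    ("Mbuti (ALDER)", 28),
    ("Biaka (ALDER)", 38),
    ("Sardinians (ALDER)", 40),
    ("Sardinians (Moorjani et al. 2011)", 71),
]


def generations_to_years(generations, years_per_generation=YEARS_PER_GENERATION):
    """Convert an admixture date in generations to years before present."""
    return generations * years_per_generation


for label, gens in quoted:
    years = generations_to_years(gens)
    # Round to the nearest century, as the quoted passages do.
    print(f"{label}: {gens} generations ~ {round(years, -2)} years ago")
```

With this conversion, the older 71-generation estimate lands roughly two millennia back, whereas the 40-generation ALDER estimate falls in the medieval period.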


The paper also deals with the Japanese: 
Genetic studies have suggested that present-day Japanese are descended from admixture between two waves of settlers, responsible for the Jomon and Yayoi cultures (HAMMER and HORAI 1995; HAMMER et al. 2006; RASTEIRO and CHIKHI 2009). We also observed evidence of admixture in Japanese (Table 1), and while our ability to learn about the history is limited by the absence of a close surrogate for the original Paleolithic mixing population, we were able to take advantage of the one-reference inference capabilities of ALDER. We observed a clear weighted LD curve using HapMap JPT as the test population and JPT–CHB weights (Figure 3B). This curve yields an estimate of 45 +/- 6 generations, or about 1,300 years, as the age of admixture. To our knowledge, this is the first time genome-wide data have been used to date admixture in Japanese. As with previous estimates based on coalescence of Y-chromosome haplotypes (HAMMER et al. 2006), our date is consistent with the archaeologically attested arrival of the Yayoi in Japan roughly 2300 years ago (we suspect that our estimate is from later than the initial arrival because admixture may not have happened immediately). Based on the amplitude of the curve, we also obtain a (likely very conservative) genome-wide lower bound of 41 +/- 3% “Yayoi” ancestry using formula (12) (under the reasonable assumption that Han Chinese are fairly similar to the Yayoi population). It is important to note that observation of a single-reference weighted LD curve is not sufficient evidence to prove that a population is admixed, but we did find a pair of references with which the ALDER test identified Japanese as admixed, which, combined with previous work and the lack of any signal of reduced population size, makes us confident that our inferences are based on true historical admixture.
This is a useful application of the idea that you don't need both reference populations to estimate admixture. If a population A experiences gene flow from another B, then A will become more like B over time, and allele frequency differences between A and B will diminish but will continue to reflect differences between the local and introgressing element. This idea was first used by Pickrell et al. (2012), and a new variation of it is used in the current paper.
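
The intuition can be sketched with a few lines of arithmetic. If a fraction alpha of A's ancestry comes from B, each allele frequency in the mixed population is (1 - alpha) * p_A + alpha * p_B, so its difference from B shrinks by the factor (1 - alpha) but keeps its sign. This is only the underlying idea, not formula (12) of the paper; the frequencies and alpha below are hypothetical toy values.

```python
def admix(p_local, p_source, alpha):
    """Allele frequencies after a fraction alpha of ancestry is replaced by the source."""
    return [(1 - alpha) * a + alpha * b for a, b in zip(p_local, p_source)]


# Hypothetical toy frequencies at four SNPs, invented for illustration only.
p_local = [0.10, 0.50, 0.90, 0.30]   # population A before gene flow
p_source = [0.40, 0.20, 0.70, 0.35]  # introgressing population B
alpha = 0.4                          # assumed mixture fraction

p_mixed = admix(p_local, p_source, alpha)

before = [a - b for a, b in zip(p_local, p_source)]
after = [m - b for m, b in zip(p_mixed, p_source)]
for x, y in zip(before, after):
    # Each difference is scaled by (1 - alpha) and keeps its direction.
    print(f"before={x:+.3f}  after={y:+.3f}  ratio={y / x:.2f}")
```

Because the remaining differences are still proportional to the original local-versus-source differences, weights computed from the admixed population and one reference can stand in for weights from the two true sources.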

According to Wikipedia, Japanese skeletons of the Kofun period resemble those of modern Japanese, so perhaps the age estimate is a little younger than the actual period of admixture. In any case, perhaps admixture between populations carrying varying amounts of Yayoi/Jomon ancestry was not instantaneous, so ALDER is not picking up the beginning of a continuous process that lasted for several centuries.

Finally, there is a reference to another paper currently in submission: "MOORJANI, P., N. PATTERSON, P. LOH, M. LIPSON, and OTHERS, 2012 Reconstructing Roma history from genome-wide data. In submission." Given that the Roma likely possess really old West Eurasian admixture related to "Ancestral North Indians", as well as really recent European admixture after they migrated to Europe, and perhaps even intermediate West/Central Asian admixture as they made their way from India to the west, this seems like a very complicated case, involving admixture at different time scales, and between different but related populations, so it will be interesting to see how it will all fit together.

To conclude, ALDER seems like a very practical tool for studying admixture in human populations, so I'm sure it will prove quite useful in the future.

arXiv:1211.0251 [q-bio.PE]

Inference of Admixture Parameters in Human Populations Using Weighted Linkage Disequilibrium

Po-Ru Loh, Mark Lipson, Nick Patterson, Priya Moorjani, Joseph K. Pickrell, David Reich, Bonnie Berger

Abstract

Long-range migrations and the resulting admixture between populations have been an important force shaping human genetic diversity. Most existing methods for detecting and reconstructing historical admixture events are based on allele frequency divergences or patterns of ancestry segments in chromosomes of admixed individuals. An emerging new approach harnesses the exponential decay of admixture-induced linkage disequilibrium (LD) as a function of genetic distance. Here, we comprehensively develop LD-based inference into a versatile tool for investigating admixture. We present a new weighted LD statistic that can be used to infer mixture proportions as well as dates with fewer constraints on reference populations than previous methods. We define an LD-based three-population test for admixture and identify scenarios in which it can detect admixture that previous formal tests cannot. We further show that we can discover phylogenetic relationships between populations by comparing weighted LD curves obtained using a suite of references. Finally, we describe several improvements to the computation and fitting of weighted LD curves that greatly increase the robustness and speed of the computation. We implement all of these advances in a software package, ALDER, which we validate in simulations and apply to test for admixture among all populations from the Human Genome Diversity Project (HGDP), highlighting insights into the admixture history of Central African Pygmies, Sardinians, and Japanese.

Link

October 31, 2012

Improved phylogenetic resolution within Y-haplogroup R1a1

Here is the table of haplogroup frequencies:

I have written before that I envision R1a1 to have been anciently distributed in the wide arc of "flatlands" north and east of the Caspian sea, complementing R-M269 whose distribution is suggestive of the short arc of "highlands" west and south of it.

The current distribution is strongly geographically bimodal, with peaks in eastern Europe and South Asia. The phylogeny of this group is continuously refined, but one of the problems with the "commercial" studies of haplogroups is that they tend to consist of samples drawn primarily from the groups of people who are likely to have heard of DNA testing, and this excludes large regions within the Eurasian heartland. For example, Z280 is listed as "Central and Eastern Europe Western Asia" in the R1a1a and subclades project, but here makes up 2 of the 9 R-M198-related samples from the Central Asian Uzbek sample. Similarly, M458 is listed as Central Europe, but occurs in 1 of the 9 Uzbek samples. We can't know for sure which SNP occurs where until we test large representative samples.
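
To see how little counts like 2/9 and 1/9 pin down a frequency, here is a minimal sketch of a 95% Wilson score interval for a proportion. The choice of the Wilson interval is made here for illustration and is not from the paper; the function name is hypothetical.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for ~95% coverage)."""
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half, centre + half


# Counts quoted above for the Uzbek R-M198 samples.
for label, k in (("Z280", 2), ("M458", 1)):
    lo, hi = wilson_interval(k, 9)
    print(f"{label}: {k}/9 observed, interval {lo:.1%} to {hi:.1%}")
```

For 2/9 the interval runs from roughly 6% to 55%, so a sample of nine is compatible with Z280 being either rare or the majority lineage among Uzbek R-M198 carriers.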

There are various aspects of the problem that need to be considered: the absence of R1a-related lineages in pre-Copper Age Europe, and of the distant R1b relative, together with the firm rooting of the R1 clade in Asia indicate that the lineage leading to R1a1 traces its ancestry to a migration into Europe. How and when that migration occurred is an open problem. The paucity of SNP diversity in South Asia, at least in the available samples, indicates that a migration into South Asia also occurred. So, I agree with the authors "that an early differentiation zone of R1a1-M198 conceivably occurred somewhere within the Eurasian Steppes or the Middle East and Caucasus region as they lie between South Asia and Eastern Europe."

My working hypothesis is that the bimodal distribution of R1a1-related lineages in Eurasia can be explained on the basis of two expansions involving largely Z283 in Europe and Z93 in Asia. The source of those expansions may have been Central Asia, and the relative scarcity of R1a1 in that region (relative to Europe and South Asia) may be the result of a subsequent movement of East Eurasians into it, at the same time as the expansion of Altaic speakers. The Uzbek sample in this paper gives us a strong hint about the existence of an overlap zone in Central Asia, but the SNP diversity is little studied in the populations of the -stan states, and ancient DNA samples are missing.

The issue of time depth is also relevant, as it will anchor in time the evolutionary relationships between different populations of R1a1 descendants. This can be achieved both by (i) typing ancient samples for the relevant markers, which will provide (assuming a positive result) a terminus ante quem for the appearance of a particular SNP, and (ii) sequencing modern Y-chromosomes to determine their TMRCA.

AJPA DOI: 10.1002/ajpa.22167

Brief communication: New Y-chromosome binary markers improve phylogenetic resolution within haplogroup R1a1

Horolma Pamjav et al.

Abstract

Haplogroup R1a1-M198 is a major clade of Y chromosomal haplogroups which is distributed all across Eurasia. To this date, many efforts have been made to identify large SNP-based subgroups and migration patterns of this haplogroup. The origin and spread of R1a1 chromosomes in Eurasia has, however, remained unknown due to the lack of downstream SNPs within the R1a1 haplogroup. Since the discovery of R1a1-M458, this is the first scientific attempt to divide haplogroup R1a1-M198 into multiple SNP-based sub-haplogroups. We have genotyped 217 R1a1-M198 samples from seven different population groups at M458, as well as the Z280 and Z93 SNPs recently identified from the “1000 Genomes Project”.

The two additional binary markers present an effective tool because now more than 98% of the samples analyzed assign to one of the three sub-haplogroups. R1a1-M458 and R1a1-Z280 were typical for the Hungarian population groups, whereas R1a1-Z93 was typical for Malaysian Indians and the Hungarian Roma. Inner and Central Asia is an overlap zone for the R1a1-Z280 and R1a1-Z93 lineages. This pattern implies that an early differentiation zone of R1a1-M198 conceivably occurred somewhere within the Eurasian Steppes or the Middle East and Caucasus region as they lie between South Asia and Eastern Europe. The detection of the Z93 paternal genetic imprint in the Hungarian Roma gene pool is consistent with South Asian ancestry and amends the view that H1a-M82 is their only discernible paternal lineage of Indian heritage.

Link

May 23, 2012

Y-STR haplotype shared between Roma and South Indians

Gene. 2012 May 17. [Epub ahead of print]

Ancestral modal Y-STR haplotype shared among Romani and South Indian populations.

Regueiro M, Rivera L, Chennakrishnaiah S, Popovic B, Andjus S, Milasin J, Herrera RJ.

Abstract

One of the primary unanswered questions regarding the dispersal of Romani populations concerns the geographical region and/or the Indian caste/tribe that gave rise to the proto-Romani group. To shed light on this matter, 161 Y-chromosomes from Roma, residing in two different provinces of Serbia, were analyzed. Our results indicate that the paternal gene pool of both groups is shaped by several strata, the most prominent of which, H1-M52, comprises almost half of each collection's patrilineages. The high frequency of M52 chromosomes in the two Roma populations examined may suggest that they descend from a single founder that has its origins in the Indian subcontinent. Moreover, when the Y-STR profiles of haplogroup H derived individuals in our Roma populations were compared to those typed in the South Indian emigrants from Malaysia and groups from Madras, Karnataka (Lingayat and Vokkaliga castes) and tribal Soligas, sharing of the two most common haplotypes was observed. These similarities suggest that South India may have been one of the contributors to the proto-Romanis. European genetic signatures (i.e., haplogroups E1b1b1a1b-V13, G2a-P15, I-M258, J2-M172 and R1-M173), on the other hand, were also detected in both groups, but at varying frequencies. The divergent European genetic signals in each collection are likely the result of differential gene flow and/or admixture with the European host populations but may also be attributed to dissimilar endogamous practices following the initial founder effect. Our data also supports the notion that a number of haplogroups including G2a-P15, J2a3b-M67(xM92), I-M258 and E1b1b1-M35 were incorporated into the proto-Romani paternal lineages as migrants moved from northern India through Southwestern Asia, the Middle East and/or Anatolia into the Balkans.

Link

October 25, 2010

More detailed analysis of Eurasian populations (K=10)

I have removed some populations from the previous run (such as Moroccan Jews and Samaritans) that tended to generate mini-clusters due to the presence of close relatives and/or inbreeding in the sample. I have removed some redundant populations to even out the dataset, and I have also added North Kannadi and Gujarati, which helped reveal the gradient of ancestry in South Asia.

ADMIXTURE results:

Admixture proportions:


Some interesting observations:
  • The occurrence of 3.8% South Asian in Romanians may signify its Roma population. Indeed, almost all of this comes from a 25% South Asian individual, almost certainly a Roma.
  • The small African component in Spaniards which was revealed in a previous K=8 run turns out to be East African (0.5%) rather than West African (0.1%). If this holds up in larger sets then it might signify that its origin is from East African admixed populations from the east, rather than Sub-Saharan Africans.
  • The multiplicity of ancestries of the Uygur is made evident, in agreement with the extensive craniometric and genetic data on prehistoric and extant populations from the area.
  • The proportion of the two East Eurasian components in Turkic populations is interesting. It seems that the earliest departures from the Turkic homeland (such as the Chuvash and Yakut) have a predominance of the NE Asian component, the Anatolian Turks are intermediate, and the Uygurs, the only ones to have stayed close to the homeland, have experienced an increase in the E Asian component.
  • The absence of the West African component in Ethiopians is striking. Here are the individual results for Ethiopians, illustrating the variability of the Southwest African vs. East African components. The Ethiopian sample consists of a number of different ethnic groups of the country, some of which (like the Amharas) are of Western Eurasian linguistic origin.

I am currently running K=11 and K=12 on the exact same data to see how the LogLikelihood and Bayes Information Criterion will move and whether new mini-clusters will appear, or if the mega-components (such as the "West Asian", "South European", and "North European") will split informatively. I will update this post with information on what actually happened, and with additional plots -- if I get robust results.

October 04, 2010

Y chromosomes of Vlax Roma

From the paper:
The Gypsies arrived in Europe 900–1100 years ago, when they first appeared in the Balkans. The present-day Gypsy population groups in Europe are the compound product of the early migrations from the Balkans into Europe [1]. The Gypsies came to Hungary from the Balkans in two large migrations. The Carpathian Romanies arrived in the 15th century and the Vlax Romanies came in the 19th century. The Carpathian Gypsies speak Hungarian and the Vlax Romanies speak Hungarian and Romani languages.
Interesting:
A median-joining (MJ) network of haplogroup H1a-M82 has demonstrated the sharing of identical Indian specific Y-chromosomal lineages between all Romani populations including Malaysian Indians as well as the Vlax Romanies (Fig. 2A and B). This common lineage of haplogroup H1a-M82 represents a common descent from a single ancestor providing a strong genetic link to the ancestral geographical origin of the proto-Gypsies [1]. According to Sengupta et al. [24] the age of microsatellite variation within haplogroup H1 in Indian populations is more than 9.7 +/- 4.4 ky. This time was estimated to be 992 years (95%CI 425–3472) in the Romani populations investigated by Gresham et al. [1] suggesting the Indian H1 haplogroup is the ancestral one.
Gresham et al. (pdf) used the genealogical mutation rate. Hence, the discrepancy between the Sengupta et al. age estimates and their own is partly due to the choice of mutation rate. Nonetheless, even allowing for this, the Balkan H1 is still about 3 times younger than the Indian one, and clearly of South Asian origin. Notice also how the Gresham et al. paper gives a large confidence interval for its estimate, in agreement with my observations about the inadequacy of a limited number of Y-STRs, and of Y-STRs in general, to couple tightly with historical events.
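The "3 times younger" comparison can be sketched in a few lines. This is a rough illustration only: it assumes the Sengupta et al. figure was computed with the slower evolutionary rate, that this rate is about 3.6 times slower than the genealogical one (the ratio quoted elsewhere on this blog), and that an age estimate scales inversely with the assumed mutation rate. The function and variable names are made up for the example and are not either paper's method.

```python
# Sketch: put two Y-STR age estimates on the same mutation-rate footing.
# Assumption: an age estimate scales inversely with the assumed mutation rate.

EVOLUTIONARY_SLOWDOWN = 3.6  # assumed ratio of genealogical to evolutionary rate


def to_genealogical_age(age_years, slowdown=EVOLUTIONARY_SLOWDOWN):
    """Rescale an age computed with the slower evolutionary rate."""
    return age_years / slowdown


indian_h1_evolutionary = 9700.0  # Sengupta et al. value quoted above, in years
romani_h1_genealogical = 992.0   # Gresham et al. point estimate, in years

indian_h1_rescaled = to_genealogical_age(indian_h1_evolutionary)
ratio = indian_h1_rescaled / romani_h1_genealogical

print(f"Indian H1 on the genealogical scale: ~{indian_h1_rescaled:.0f} years")
print(f"Indian / Romani age ratio: ~{ratio:.1f}")
```

Under these assumptions the Indian H1 comes out at roughly 2.7 ky on the genealogical scale, a little under 3 times the Romani point estimate. The wide confidence intervals on both numbers are ignored here.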

Nonetheless, if one uses an order-of-magnitude approach, the Gresham et al. estimate is quite compatible with historical knowledge about the arrival of Gypsy founders to the Balkans, just as was the case for Serbian Roma.

The ~1ky estimate for Balkan Gypsy H1 is similar to the ~1ky estimate for the updated J1 Cohen Modal Haplotype. For reasons explained in that post, this is probably an overestimate, and I can envision a scenario according to which the tribal descendants of an H1-man who lived in the 1st millennium AD made their way to Europe at the turn of the millennium, proliferating into the Gypsy communities of today.

Related:

Forensic Science International: Genetics doi:10.1016/j.fsigen.2010.08.017

Paternal genetic history of the Vlax Roma

Andrea Zalán et al.

Romanies constitute the largest minority group belonging to different subgroups in Hungary. Vlax Romanies are one of these Romani subgroups. The Gypsies came to Hungary from the Balkans in two large migrations. The Carpathian Romanies arrived in the 15th century and the Vlax Romanies came in the 19th century. The Carpathian Gypsies speak Hungarian and the Vlax Romanies speak Hungarian and Romani languages.
Only a limited number of genetic studies of Y-chromosomal haplotypes/haplogroups have been done before, moreover most studies did not contain information regarding the investigated Roma populations which subgroups belong to.
In the present study, we analyzed a wide set of Y-chromosomal markers to do comparable studies of the Vlax Roma in eastern Hungarian regions. The results can be compared in the context of previously published data on other Romani groups, Indian and Hungarian reference populations.
Haplogroups H1a-M82 and J2a2-M67 were most common in the investigated population groups. A median-joining network of haplogroup H1a-M82 has demonstrated the sharing of identical Indian specific Y-chromosomal lineages between all Romani populations including Malaysian Indians as well as the Vlax Romanies. This common lineage of haplogroup H1a-M82 represents a common descent from a single ancestor provides a strong genetic link to the ancestral geographical origin of the proto-Gypsies.
The detected haplogroups in the Vlax Romani population groups can be classified into two different Y-chromosomal lineages based on their putative origin. These lineages include ancestral Indian (H1a-M82), present-day Eurasian (J2a2-M67, J2*-M172, E1b1b1a-M78, I1-M253, R1a1-M198 and R1b1-P25) Y-chromosome lineages. Presence of these lineages in the paternal gene pool of the Roma people is illustrative of the Gypsy migration route from India through the Balkan to the Carpathian Basin.

September 28, 2010

Y chromosome study of Serbian Roma

The haplogroups are available as supplementary material. I wonder whether different populations of Roma underwent different levels of admixture, or whether the Roma are themselves originally unrelated groups of wanderers that came to be identified by others as "Gypsies" and eventually believed it.

Both "massive admixture" and the scenario I am entertaining have their problems. In the former, why did one group of Roma admix so heavily while another did not admix at all? In the latter, how did groups of unrelated origin come to share common cultural-linguistic traits? Balkan ethnology is not easy.

Here is an interesting tidbit from the paper which complements my recent enumeration of the genealogical mutation rate's superiority:
For the majority of the populations, time estimates based on Zhivotovsky et al., (2004) and NETWORK using the evolutionary mutation rate are comparable.
On the other hand, time estimates using the genealogical mutation rate (Goedbloed et al., 2009) seem to fit better with historical data of the Romani diaspora.
American Journal of Physical Anthropology DOI: 10.1002/ajpa.21372

Divergent patrilineal signals in three Roma populations

Maria Regueiro et al.

Abstract

Previous studies have revealed that the European Roma share close genetic, linguistic and cultural similarities with Indian populations despite their disparate geographical locations and divergent demographic histories. In this study, we report for the first time Y-chromosome distributions in three Roma collections residing in Belgrade, Vojvodina and Kosovo. Eighty-eight Y-chromosomes were typed for 14 SNPs and 17 STRs. The data were subsequently utilized for phylogenetic comparisons to pertinent reference collections available from the literature. Our results illustrate that the most notable difference among the three Roma populations is in their opposing distributions of haplogroups H and E. Although the Kosovo and Belgrade samples exhibit elevated levels of the Indian-specific haplogroup H-M69, the Vojvodina collection is characterized almost exclusively by haplogroup E-M35 derivatives, most likely the result of subsequent admixture events with surrounding European populations. Overall, the available data from Romani groups points to different levels of gene flow from local populations.

March 21, 2010

Y-chromosomes of Albanian populations (Ferri et al. 2010)

This is a very important study as it shows (for the first time) some detail on Albanian populations. From a first reading of the evidence, we can say that:
  • The Ghegs resemble Kosovar Albanians in having a higher frequency of E1b1b1.
  • Tosks, on the other hand, have a higher frequency of I.
  • The high J2 frequency resembles that of Greeks, with the expected 10 to 1 or so ratio between J2 and J1, and is dissimilar from northwestern Balkan populations. Past studies have shown, however, that J2b is dominant in Albanians, rather than J2a, which is dominant in most Greek populations tested so far (although J2b is also represented).
  • Similar frequencies to Greeks are also found in R1.
  • There is also a relative paucity of G compared to Greeks, and limited introgression of Gypsy chromosomes (H1) in the main Albanian groups (Gheg and Tosk).


International Journal of Legal Medicine DOI: 10.1007/s00414-010-0432-x

Y-STR variation in Albanian populations: implications on the match probabilities and the genetic legacy of the minority claiming an Egyptian descent

Gianmarco Ferri et al.

Y chromosome variation at 12 STR (the Powerplex® Y system core set) and 18 binary markers was investigated in two major (the Ghegs and the Tosks) and two minor (the Gabels and the Jevgs) populations from Albania (Southern Balkans). The large proportion of haplotypes shared within and between groups makes the Powerplex 12-locus set inadequate to ensure a suitable power of discrimination for the forensic practice. At least 85% of Y lineages in the Jevgs, the cultural minority claiming an Egyptian descent, turned out to be of either Roma or Balkan ancestry. They also showed unequivocal signs of a common genetic history with the Gabels, the other Albanian minority practising social and cultural Roma traditions.

Link

September 14, 2008

Y chromosomes of Bayash Romani

Once again, the 0.00069/locus/generation rate is used in this paper, and hence its estimated ages are wrong. The given Y-STR variance for haplogroup H1a in Table 2 is 0.06, which corresponds to an age of ~800 years.
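As a rough illustration of how a Y-STR variance becomes an age, the sketch below uses the simple linear relation in which the age in generations is the per-locus variance divided by the per-locus mutation rate. The germline rate is taken as 3.6 times the 0.00069 rate, which is the ratio quoted in this post, and the two generation lengths are placeholder assumptions. The generation length behind the ~800-year figure is not stated here, so the output need not match it exactly.

```python
# Sketch: age from Y-STR variance under a simple linear model (an assumption),
# where variance accumulates at the mutation rate once per generation.

def age_from_variance(variance, rate_per_generation, years_per_generation):
    """Return an age in years for a given per-locus variance and mutation rate."""
    generations = variance / rate_per_generation
    return generations * years_per_generation


H1A_VARIANCE = 0.06                          # Table 2 value quoted in the post
EVOLUTIONARY_RATE = 0.00069                  # per locus per generation
GERMLINE_RATE = 3.6 * EVOLUTIONARY_RATE      # assumed from the 3.6x ratio

for years_per_generation in (25.0, 30.0):    # placeholder generation lengths
    slow = age_from_variance(H1A_VARIANCE, EVOLUTIONARY_RATE, years_per_generation)
    fast = age_from_variance(H1A_VARIANCE, GERMLINE_RATE, years_per_generation)
    print(f"{years_per_generation:.0f} y/gen: evolutionary ~{slow:.0f} y, germline ~{fast:.0f} y")
```

Whatever generation length is chosen, the two ages differ by the same factor of 3.6, which is the whole of the disagreement between the rates.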

It's interesting though, that Zhivotovsky is a co-author of this paper which states that:
A recent refinement of E1b1b1a-M78 by novel biallelic markers indicates that its subhaplogroup E1b1b1a2-V13 is the most common in Europe (Cruciani et al., 2007). In fact, E1b1b1a2-V13 originated in Western Asia about 11 KYA and expanded in Southeastern Europe about 4.5 KYA, not in connection with the spread of agriculture as traditionally assumed, but rather at the beginning of the Balkan Bronze age, as a consequence of the in situ population increase in the already populated territory (Cruciani et al., 2007).
and he was a co-author of King et al. (2008) which stated that:
The calculated expansion time of haplogroup E3b1a2-V13 in mainland Greece is 8,600 y BP at Nea Nikomedeia and 9,200 y BP at Lerna/Franchthi Cave and is consistent with the late Mesolithic/initial Neolithic horizon. These dates exceed those reported previously for Europe (Cruciani et al., 2007) that date to the Bronze Age. This discrepancy arises mainly because of differences in the choice of mutation rate used.
Peter Underhill was also a co-author of the latter study, and also of the recent paper on Sicily which used germline mutation rates and:
The estimate of Time to Most Recent Common Ancestor is about 2380 years before present, which broadly agrees with the archaeological traces of the Greek classic era.
Mesolithic - Early Bronze Age - classical Greek. Three completely different ages using three different mutation rates: a mutation rate 3.6x slower than the germline rate => Mesolithic. A mutation rate 2.4 to 2.8x slower => Early Bronze Age. A germline mutation rate => classical Greek.

My most recent take. I'll be much surprised if E-V13 turns out to be anything other than 2nd millennium BC in the Balkans.

American Journal of Physical Anthropology doi: 10.1002/ajpa.20933

Dissecting the molecular architecture and origin of Bayash Romani patrilineages: Genetic influences from South-Asia and the Balkans

Irena Martinović Klarić et al.

Abstract

The Bayash are a branch of Romanian speaking Roma living dispersedly in Central, Eastern, and Southeastern Europe. To better understand the molecular architecture and origin of the Croatian Bayash paternal gene pool, 151 Bayash Y chromosomes were analyzed for 16 SNPs and 17 STRs and compared with European Romani and non-Romani majority populations from Europe, Turkey, and South Asia. Two main layers of Bayash paternal gene pool were identified: ancestral (Indian) and recent (European). The reduced diversity and expansion signals of H1a patrilineages imply descent from closely related paternal ancestors who could have settled in the Indian subcontinent, possibly as early as between the eighth and tenth centuries AD. The recent layer of the Bayash paternal pool is dominated by a specific subset of E1b1b1a lineages that are not found in the Balkan majority populations. At least two private mutational events occurred in the Bayash during their migrations from the southern Balkans toward Romania. Additional admixture, evident in the low frequencies of typical European haplogroups, J2, R1a, I1, R1b1b2, G, and I2a, took place primarily during the early Bayash settlement in the Balkans and the Romani bondage in Romania. Our results indicate two phenomena in the Bayash and analyzed Roma: a significant preservation of ancestral H1a haplotypes as a result of considerable, but variable level of endogamy and isolation and differential distribution of less frequent, but typical European lineages due to different patterns of the early demographic history in Europe marked by differential admixture and genetic drift.

Link

May 20, 2008

ESHG 2008 abstracts

The European Society of Human Genetics conference is coming up, and there are some very interesting abstracts.

Note: The ESHG site has been updated with a notice that the abstracts are embargoed until their presentation time. Therefore, I have decided to remove the body of this post until then, although I think it is a bit weird to embargo something that one places on the public web. In any case, you can find the abstracts easily by going to the site above. (June 1): post restored.

The peopling of North Asia: Y and X perspectives
V. A. Stepanov, V. Kharkov, I. Khitrinskaya, O. Medvedeva, M. Spiridonova, A. Marusin, V. Puzyrev;
Institute for Medical Genetics, Tomsk, Russian Federation.
Presentation Number: P07.056
To reconstruct the origin and evolution of human populations in North Asia we investigated the genetic diversity in 50 population samples (about 2000 individuals totally) using Y and X chromosome lineages. Y-chromosomal haplotypes were constructed with unique event polymorphisms (UEP) and STR markers according to Y Chromosome consortium (YCC) classification. SNP markers in a single 60 kb linkage disequilibrium region of ZFX gene was used to trace the X chromosomal population history.
The genetic diversity of Y haplogroups was quite high (0.70 - 0.95) in most populations except few very isolated groups. The proportion of inter-population differences in the total genetic variability measured by Fst statistics is 17% for binary haplogroups and 19% for YSTR. Multidimensional scaling and principal component analysis revealed four major components in North Asian Y gene pool, reflecting the presence of Paleoasiatic (Q), Proto-Uralic (N3, N2), Eastern Asian (O, C), and Western Eurasian (R1, I, J) lineages.
X-chromosomal haplotypes in North Asia are less diverse (gene diversity within populations 0.65 - 0.80) and less differentiated (Fst = 4%) compared to Y lineages.
The population clustering by X and Y gives, to a first approximation, a similar picture, and matrixes of genetic distances between populations for X and Y haplotypes significantly correlates.
The age of genetic diversity generation and time of population differentiation demonstrates the Upper Paleolithic origin of major Y and X lineages and post-glacial population expansions.
This work is supported by RFBR grants ##06-04-48274 and 07-04-01629.
The following seems to be a very important study; in particular the notion that particular Y chromosome/mtDNA haplogroups may be associated with higher or lower fertility may have implications about their distribution.

UPDATE (May 21): I did a quick and dirty analysis of the Y-haplogroup and mtDNA-haplogroup data from Bosch et al. (2006) (Ann Hum Genet. 2006 Jul;70(Pt 4):459-87.), and there is a -0.43 correlation between Y-haplogroup I and mtDNA-haplogroup H and a +0.46 correlation between Y-haplogroup R1 and mtDNA-haplogroup H. While not significant (with only 10 populations), this is definitely in the right direction for a selection effect for/against specific Y-DNA/mtDNA combinations.

... on the other hand, another quick and dirty analysis of 23 populations from Rootsi's survey on Y-haplogroup I and mtDNA frequencies from AJHG Volume 80, Issue 4, April 2007, Pages 759-768 didn't turn up any correlation. Perhaps someone can look at possible correlations between Y-chromosome and mtDNA haplogroups in Europe to see if anything interesting turns up.
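For readers who want to check why the two correlations from the 10-population analysis in the May 21 update fall short of significance, here is a minimal sketch. It assumes they are Pearson correlations tested two-sided against zero with a Student's t statistic; the update does not say which test was actually used, and the function name is illustrative.

```python
import math

# Sketch: t test for a Pearson correlation r computed from n population pairs.
# Assumption: two-sided test of r = 0 at the 5% level.

def t_statistic(r, n):
    """t statistic, on n - 2 degrees of freedom, for a correlation r from n pairs."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


N_POPULATIONS = 10
T_CRIT_DF8 = 2.306  # two-tailed 5% critical value of Student's t, 8 d.f.

# Smallest |r| that would reach significance with this many populations.
r_needed = T_CRIT_DF8 / math.sqrt(T_CRIT_DF8 ** 2 + (N_POPULATIONS - 2))

for label, r in (("Y-I vs mtDNA-H", -0.43), ("Y-R1 vs mtDNA-H", 0.46)):
    t = t_statistic(r, N_POPULATIONS)
    verdict = "significant" if abs(t) > T_CRIT_DF8 else "not significant"
    print(f"{label}: r = {r:+.2f}, t = {t:+.2f}, {verdict} at the 5% level")

print(f"|r| needed with {N_POPULATIONS} populations: ~{r_needed:.2f}")
```

With only 10 populations a correlation has to reach about 0.63 in absolute value before it clears the 5% threshold, so values in the 0.4 to 0.5 range can only be suggestive.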

Male infertility induced by mtDNA/Y unfavorable combination? An association study on human mitochondrial DNA
S. C. Gomes1, S. Fernandes2, R. Gonçalves1, A. T. Fernandes1, A. Barros3, H. Geada4, A. Brehm1;
1Human Genetics Laboratory, University of Madeira, Funchal, Portugal, 2Genetics Department, Faculty of Medicine, University of Porto, Porto, Portugal, 3Centre of Reproductive Genetics A Barros, Porto, Portugal, 4Faculty of Medicine, University of Lisbon, Lisboa, Portugal.
Presentation Number: P07.084
There is growing evidence that certain mtDNA haplogroups determine a genetic susceptibility to various disorders bringing out the interest in the possible role of mtDNA background on the phenotype expression of mitochondrial genetic disorders. An association between haplogroup T and asthenospermia has been reported and several sublineages of haplogroup U were associated with differences in sperm motility and vitality. The deletion of some DAZ copies gene in 10-15% of azoospermic and oligospermic patients has been reported but also present in fertile men belonging to certain Y-haplogroups. The findings of one study have rarely been replicated by studies in other populations and conflicting associations have been reported. Our focus in this case-control study is to investigate the existence of other influences, besides a weak mtDNA background, promoting male infertility. The occurrence of a specific mtDNA variant associated to a certain Y-chromosome haplogroup could represent a vital link that will compromise the sperm function and be responsible for male infertility. A group of 99 infertile men and other one composed by 90 subjects with proven fertility were selected and analysed. The frequency of the combination mtDNA-haplogroup H (especially with the CRS sequence) and Y-haplogroup R was higher in fertile than in infertile men seemingly to be favorable to fertility. On the other hand, a considerable number of infertile men belonging to mtDNA-haplogroup H (CRS) and to Y-haplogroup I, associated to a specific DAZ gene deletion pattern- 2+4d, suggests a non favorable combination to male fertility.
The Bayash Roma: phylogenetic dissection of Eurasian paternal genetic elements
I. Martinovic Klaric, M. Pericic Salihovic, L. Barac Lauc, B. Janicijevic;
Institute for Anthropological Research, Zagreb, Croatia.
Presentation Number: P07.110
The Bayash consist of numerous and small Romani groups speaking different dialects of the Romanian language and living dispersedly in Croatia, Hungary, Bosnia and Herzegovina, Serbia, Romania, Bulgaria, and to the lesser extent in Macedonia, Greece, Ukraine, Slovakia and Slovenia. Larger Bayash groups migrated to Croatia most likely during the 19th century, after abolition of slavery in Romania. Molecular architecture and the origin of the Croatian Bayash paternal gene pool was addressed by analysing 151 Bayash Y chromosomes from two Croatian regions, 332 Y chromosomes from Romani populations across Europe, 814 Y-chromosomes from non-Romani host populations living in Southeastern, Southern and Eastern Europe as well as with 1680 Y-chromosomes from South Asian populations. The Bayash in Croatia represent one population of largely shared paternal genetic history characterized by substantial percentage (44%) of common H1-M82 and E3b1-M78 lineages. Relatively ancient expansion signals and limited diversity of Indian specific H1-M82 lineages imply descent from closely related paternal ancestors who could have been settled in the Indian subcontinent between 7th and 9th centuries AD. Minimal time divergence of the Bayash subpopulations is consistent with their putative migratory split within Romania towards Wallachia and Transilvania. Substantial percentage of E3b1 lineages and high associated microsatellite variance in the Bayash men is a reflection of significant admixture with majority populations from the Vardar-Morava-Danube catchment basin - possibly a common paternal signature of Romani populations in Southeastern Europe. Additional traces of admixture are evident in the modest presence of typical European haplogroups.


Are the Moravian Valachs of Czech Republic the Aromuns of Central Europe? Model population for isolation and admixture
E. Ehler1,2, V. Vančata2;
1Department of Anthropology and Human Genetics, Charles University in Prague, Faculty of Science, Prague, Czech Republic, 2Department of Biology and Ecological Education, Charles University in Prague, Faculty of Education, Prague, Czech Republic.
Presentation Number: P07.129
Moravian Valachs of Czech Republic are one of the most distinct ethnic groups from Central Europe. Related to similar populations in Poland and Slovakia, they emerge at the end of 15th century, as the north-westernmost prominence of migration that started 250 years earlier in northern Romania. Being predominately highland sheep herders and of putative Romanian origin, they represent a Central European analogue of Balkan Aromanian populations. We have gathered Y-chromosomal, linguistic, ethnographic and historical data for this population and compared them with surrounding as well as with east European populations.
Linguistic data show specific parts of shared vocabulary of Romanian origin between several pastoral groups in Central and Eastern Europe. Comparing genetic and linguistic pairwise distance matrices (Mantel test) in these populations did not revealed any significant correlation. Thus we confirmed that plain geographical distance still plays the major role in genetic distances between populations in Europe. From our further analysis it is clear, that the Moravian Valachs, after at least five centuries of admixture, are not overly genetically different from surrounding populations. On the other hand, from the point of view of intra-population diversity, they are much more similar to isolated Balkan populations (e.g. Aromuns) than to Central European populations.


Phylogeography of the human Y chromosome haplogroup E3a
F. Cruciani1, B. Trombetta1, D. Sellitto2, C. Nodale1, R. Scozzari1;
1Sapienza Università di Roma, Rome, Italy, 2Consiglio Nazionale delle Ricerche, Rome, Italy.
Presentation Number: P07.134
The Y chromosome specific biallelic marker DYS271 defines the most common haplogroup (E3a) currently found in sub-Saharan Africa. A sister clade, E3b (E-M215), is rare in sub-Saharan Africa, but very common in northern and eastern Africa. On the whole, these two clades represent more than 70% of the Y chromosomes of the African continent. A third clade belonging to E3 (E3c or E-M329) has been recently reported to be present only in eastern Africa, at low frequencies.
In this study we analyzed more than 1,600 Y chromosomes from 55 African populations, using both new and previously described biallelic markers, in order to refine the phylogeny and the geographic distribution of the E3a haplogroup.
The most common E-DYS271 sub-clades (E-DYS271*, E-M191, E-U209) showed a non uniform distribution across sub-Saharan Africa. Most of the E-DYS271 chromosomes found in northern and western Africa belong to the paragroup E-DYS271*, which is rare in central and southern Africa. In these latter regions, haplogroups E-M191 and E-U209 show similar frequency distributions and coalescence ages (13 and 11 kyr, respectively), suggesting their involvement in the same migratory event/s.
By the use of two new phylogenetically equivalent markers (V38 and V89), the earlier tripartite structure of E3 haplogroup was resolved in favor of a common ancestor for haplogroups E-DYS271 (formerly E3a) and E-M329 (formerly E3c). The new topology of the E3 haplogroup is suggestive of a relatively recent eastern African origin for the majority of the chromosomes presently found in sub-Saharan Africa.
Y-chromosome lineages in Xhosa and Zulu Bantu speaking populations
R. P. A. Gonçalves, H. Spínola, A. Brehm;
Human Genetics Laboratory, Funchal, Portugal.
Presentation Number: P07.137
Y-chromosome Single Nucleotide Polymorphisms have been analysed in Zulu and Xhosa, two southern Africa Bantu speaking populations. These two ethnic groups have their origin on the farmer’s Bantu expansion from Niger-Congo border towards sub-Sahel regions on the southern tip of the continent, during the past 3000 years.
Seven different Y-chromosome haplogroups were found in Zulu contrasting with only two in Xhosa. E3a, a common haplogroup among West sub-Saharans associated to Bantu migration was the most prevalent in both populations (56.9% in Zulu and 90% in Xhosa). The second most common haplogroup was E2 (29.3% in Zulu and 10% in Xhosa), present both in West and East African populations.
The present-day Zulu and Xhosa paternal legacy is essentially of West sub-Saharan origin. Zulu population shows a most diverse genetic influence comparing to Xhosa, revealing some pre-Bantu expansion markers and East African influences. Zulu presents 8.6% Y-chromosome haplogroups (A, B, J1) of non-Bantu influence that could indicate gene flow from other populations, particularly Khoisan.
Human genetic population structure: Patterns and underlying processes
Presentation Time: Tuesday, 9:15 a.m. - 9:45 a.m.
G. Barbujani;
University of Ferrara, Department of Biology and Evolution, Ferrara, Italy.
Presentation Number: S15.2
Classical studies of genetic diversity in humans consistently showed that the largest proportion of human diversity occurs among members of the same population. On average, differences among different populations in the same continent represent 5% of the global human variance, and differences among continents another 10%. Genetic variation is largely discordant across the genome, meaning that different loci show different spatial patterns, and implying that a good description of population structure can only be based on the analysis of multiple loci. Studies of single loci are also unlikely to reasonably identify an individual’s place of origin. A general decline of genetic diversity with distance from Africa, and a parallel increase in linkage disequilibrium, can be accounted for by the effects of a series of founder effects accompanying the spread of anatomically-modern humans from Africa. Recent DNA analyses at the global level show that most allelic variants are cosmopolitan and only a small percentage are continent-specific, whereas a clearer continental structure emerges when considering composite haplotypes. This suggests that, at the global level, gene flow has had a strong impact on genetic diversity, through both directional dispersal and successive short-range migratory exchanges. At the local level, several factors have contributed to genetic differentiation, and, in particular, language barriers have been shown to be associated with small but non-negligible increases of the genetic differences between neighboring populations.

Hierarchical analysis of 28 Y-chromosome SNP’s in the population of the Republic of Macedonia

P. Noveski, S. Trivodalieva, G. D. Efremov, D. Plaseska-Karanfilska;
Macedonian Academy of Sciences and Arts, Research Centre for Genetic Engineering and Biotechnology, Skopje, Macedonia, The Former Yugoslav Republic of.


Presentation Number: P05.211


Analysis of Y-chromosome haplogroups, defined by single nucleotide polymorphisms (SNP’s), has become a standard approach for studying the origin of human populations and measuring the variability among them. Furthermore, Y-SNP’s represent a new forensic tool, because their population specificity may allow to determine the origin of any male sample of interest for forensic purposes. The aim of this study was to develop a strategy for rapid, simple and inexpensive Y-chromosome SNP’s typing in the population of R. Macedonia. We have studied a total of 343 DNA male samples; 211 Macedonians, 111 Albanians and 21 of other ethnic origin (Roma, Serbs and Turks). Methodology included multiplex PCR and single nucleotide extension reaction by SNaPshot multiplex kit. The set of 28 markers has been grouped in 5 multiplexes in order to determine the most frequent haplogroups using only 1 or 2 multiplexes. Twenty different Y haplogroups were determined among 343 male DNA samples. The finding that five haplogroups (E3b1, I1b1, J2b1a, R1a and R1b) comprise more than 70% of the Y chromosomes is consistent with the typical European Y chromosome gene pool. The distribution of the Y-haplogroups differs between Macedonians and Albanians. The most common Y haplogroup among Macedonians is I1b1 (27.5%), followed by three haplogroups present with similar frequencies E3b1 (15.6%), R1a (14.2%) and R1b (11.4%). Among Albanians the most frequent Y haplogroup is E3b1 (28.8%), followed by R1b (18.0%), J2b1a (13.5%) and R1a (12.6%).


The following paper (probably) refers to a recent study, according to which:
One of the most elevated values of 35delG prevalence corresponds to Greece (1/28); the pattern of various 35delG prevalences is interpretated in the present meta-analysis as the result of Ancient Greek colonizations of the "Magna Grecia" in historical times.
Strong linkage disequilibrium for the frequent GJB2 35delG mutation in the Greek population
H. Kokotas1, L. Van Laer2, M. Grigoriadou1, V. Iliadou3, J. Economides4, S. Pomoni1, A. Pampanos1, N. Eleftheriades5, E. Ferekidou6, S. Korres6, A. Giannoulia-Karantana7, G. Van Camp2, M. B. Petersen1;
1Institute of Child Health, Athens, Greece, 2University of Antwerp, Antwerp, Belgium, 3AHEPA Hospital, Thessaloniki, Greece, 4‘Aghia Sophia’ Children’s Hospital, Athens, Greece, 5St. Loukas Hospital, Thessaloniki, Greece, 6Athens University, Athens, Greece, 7Athens University Medical School, Athens, Greece.


Presentation Number: P06.080

Approximately one in 1,000 children is affected by severe or profound hearing loss at birth or during early childhood (prelingual deafness). Up to forty percent of autosomal recessive, congenital, severe to profound hearing impairment cases result from mutations in a single gene, GJB2. The 35delG mutation accounts for the majority of GJB2 mutations detected in Caucasian populations and represents one of the most frequent disease mutations identified so far. Some previous studies have assumed that the high frequency of the 35delG mutation reflects the presence of a mutational hot spot, whilst other studies support the theory of a common founder. Greece is amongst the countries presenting high frequency of the 35delG mutation (3.5%), and a recent study raised the hypothesis of the origin of this mutation in ancient Greece. We genotyped 60 Greek deafness patients homozygous for the 35delG mutation for six single nucleotide polymorphisms (SNPs) and two microsatellite markers, mapping within or flanking the GJB2 gene, as compared to 60 Greek hearing controls. A strong linkage disequilibrium was found between the 35delG mutation and markers inside or flanking the GJB2 gene, at distances of 34 kb on the centromeric and 90 kb on the telomeric side of the gene, respectively. Our study supports the hypothesis of a founder effect and we further propose that ethnic groups of Greek ancestry could have propagated the 35delG mutation, as evidenced by historical data beginning from the 15th century BC.

January 22, 2008

Y chromosomes of Iberian Gypsies

Ann Hum Genet (OnlineEarly Articles).
doi:10.1111/j.1469-1809.2007.00421.x

A Perspective on the History of the Iberian Gypsies Provided by Phylogeographic Analysis of Y-Chromosome Lineages

A. Gusmão et al.

The European Gypsies, commonly referred to as Roma, are represented by a vast number of groups spread across many countries. Although sharing a common origin, the Gypsy groups are highly heterogeneous as a consequence of genetic drift and different levels of admixture with surrounding populations. With this study we aimed at contributing to the knowledge of the Roma history by studying 17 Y-STR and 34 Y-SNP loci in a sample of 126 Portuguese Gypsies. Distinct genetic hallmarks of their past and migration route were detected, namely: an ancestral component, shared by all Roma groups, that reflects their origin in India (H1a-M82; ~17%); an influence from their long permanence in the Balkans/Middle-East region (J2a1b-M67, J2a1b1-M92, I-M170, Q-M242; ~31%); traces of contacts with European populations preceding the entrance in the Iberian Peninsula (R1b1c-M269, J2b1a-M241; ~10%); and a high proportion of admixture with the non-Gypsy population from Iberia (R1b1c-M269, R1-M173/del.M269, J2a-M410, I1b1b-M26, E3b1b-M81; ~37%). Among the Portuguese Gypsies the proportion of introgression from host populations is higher than observed in other groups, a fact which is somewhat unexpected since the arrival of the Roma to Portugal is documented to be more recent than in Central or East Europe.


Link

mtDNA of Slovaks

From the paper:
Recent mtDNA variability study in Czechs, the neighbors of Slovaks, has shown that they are genetically similar with adjacent European populations, but characterized by a small frequency of East Eurasian (2.8%) and Roma-specific (2.8%) mtDNA lineages (Malyarchuk et al. 2006b). Therefore, the aim of the present study was to characterize the mtDNA variation in Slovaks from western and eastern areas of Slovakia, based on variation of the HVS I and HVS II sequences, followed by a hierarchical survey of mtDNA haplogroup-specific restriction fragments length polymorphism (RFLP) markers.
The passage above refers to the earlier Czech study (Malyarchuk et al. 2006b). More on haplogroup M in Slovaks:
However, in contrast to the previously studied Czech population from western Bohemia (Malyarchuk et al. 2006b), samples from Slovakia do not display any East Eurasian mtDNAs. One of the Slovak M-haplotype belongs to subhaplogroup M1b and is identical to M1b1a-haplotypes revealed in Italians and Bedouins from southern Israel (Olivieri et al. 2006) as well as in Saudi Arabs (Abu-Amero et al. 2007). A second M-lineage detected in Slovaks is defined by variants at positions 16129-16223-16230-16233-16304-16344. This lineage is identical to those revealed previously in gene pools of the Bulgarian Roma at frequency of 3.6% (Gresham et al. 2001). Based on the presence of the 16129 variant, Gresham et al. (2001) suggested that this lineage belongs to Indian-specific haplogroup M5. Nevertheless, to determine its exact phylogenetic status we completely sequenced our Slovak sample (Slv227) and compared it with Indian M-haplotypes published by Sun et al. (2006) (Fig. 1). As a result, we have found that our sample belongs to haplogroup M35 due to mutations at positions 199 and 12561. Moreover, it shared transition at 15928 with the South Indian sample T17 (from Andhra Pradesh) that allowed us to define a new Indian/Roma branch called as M35b.
And on a Roma-related J1* lineage:
Previously, we have found that the Polish Roma population is characterized by high incidence (18.8%) of haplogroup J1* lineage, defined by HVS I motif 16069-16126-16145-16222-16235-16261-16271 (Malyarchuk et al. 2006a). This and a similar haplotype, lacking only the 16271 transition, are very rare in European Roma populations, being found only in the Spanish, Bulgarian and Hungarian Roma (Gresham et al. 2001; Egyed et al. 2007). Among Europeans, such haplotypes have been revealed only in French (0.5%; Dubut et al. 2004), Hungarian (0.5%; Egyed et al. 2007) and Czech (about 3%; Vanecek et al. 2004; Malyarchuk et al. 2006b) populations. In the present study, we have found that 2.9% of individuals from eastern Slovakia are characterized by exactly the same J1*-haplotype.
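For readers who want to see what "exactly the same haplotype" versus "lacking only the 16271 transition" means in practice, here is a minimal Python sketch of comparing a sample against the J1* HVS I motif quoted above. The motif positions come from the passage; the function names and the three labels ("exact", "one-step", "different") are my own illustrative choices, not anything used by the authors, and the sketch assumes both haplotypes were scored over the same HVS I range.

```python
# Positions of the J1* HVS I motif, as listed in the quoted passage.
J1_MOTIF = frozenset({16069, 16126, 16145, 16222, 16235, 16261, 16271})


def parse_motif(text):
    """Turn a string such as '16069-16126-16145' into a set of positions.

    En-dashes (U+2013) are treated like hyphens, since copied text often mixes them.
    """
    parts = text.replace("\u2013", "-").split("-")
    return frozenset(int(p) for p in parts if p.strip())


def compare_to_motif(haplotype, motif=J1_MOTIF):
    """Return (missing, extra): motif positions the haplotype lacks,
    and haplotype positions that are not part of the motif."""
    return sorted(motif - haplotype), sorted(haplotype - motif)


def classify(haplotype, motif=J1_MOTIF):
    """Hypothetical three-way label; real studies use phylogenetic context too."""
    missing, extra = compare_to_motif(haplotype, motif)
    if not missing and not extra:
        return "exact"
    if len(missing) == 1 and not extra:
        return "one-step (lacks %d)" % missing[0]
    return "different"


if __name__ == "__main__":
    full = parse_motif("16069-16126-16145-16222-16235-16261-16271")
    shorter = parse_motif("16069-16126-16145-16222-16235-16261")
    print(classify(full))     # exact
    print(classify(shorter))  # one-step (lacks 16271)
```

Set difference is enough here because an HVS I haplotype is conventionally reported as a list of positions that differ from the reference sequence, so two haplotypes match when those lists are identical.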
On differences within Slovakia:
The MDS analysis performed on the basis of pairwise FST values revealed that Slovak populations do not cluster together. Western Slovaks are located together with the Czechs and Austrians (in accordance with their geographic proximity), whereas eastern Slovaks are placed close to Slovenians (Fig. 3).

Ann Hum Genet (OnlineEarly Articles). doi:10.1111/j.1469-1809.2007.00410.x

Mitochondrial DNA Variability in Slovaks, with Application to the Roma Origin
Annals of Human Genetics

B. A. Malyarchuk, M. A. Perkova, M. V. Derenko, T. Vanecek, J. Lazur, P. Gomolcak

To gain insight into the mitochondrial gene pool diversity of European populations, we studied mitochondrial DNA (mtDNA) variability in 207 subjects from western and eastern areas of Slovakia. Sequencing of two hypervariable segments, HVS I and HVS II, in combination with screening of coding region haplogroup-specific RFLP-markers, revealed that the majority of Slovak mtDNAs belong to the common West Eurasian mitochondrial haplogroups (HV, J, T, U, N1, W, and X). However, a few sub-Saharan African (L2a) mtDNAs were detected in a population from eastern part of Slovakia. In addition, about 3% of mtDNAs from eastern Slovakia encompass Roma-specific lineages. By means of complete mtDNA sequencing we demonstrate here that the Roma-specific M-lineages observed in gene pools of different Slavonic populations (Slovaks, Poles and Russians), belong to Indian-specific haplogroups M5a1 and M35. Moreover, we show that haplogroup J lineages found in gene pools of the Roma and some Slavonic populations (Czechs and Slovaks) belong to new subhaplogroup J1a, which is defined by coding region mutation at position 8460.
