Dienekes’ Anthropology Blog: Arabs


September 18, 2013

Genetic structure of Kuwaiti population

PLoS ONE 8(9): e74913. doi:10.1371/journal.pone.0074913

Genetic Substructure of Kuwaiti Population Reveals Migration History

Osama Alsmadi et al.

The State of Kuwait is characterized by settlers from Saudi Arabia, Iran, and other regions of the Arabian Peninsula. The settlements and subsequent admixtures have shaped the genetics of Kuwait. High prevalence of recessive disorders and metabolic syndromes (that increase risk of diabetes) is seen in the peninsula. Understanding the genetic structure of its population will aid studies designed to decipher the underlying causes of these disorders. In this study, we analyzed 572,366 SNP markers from 273 Kuwaiti natives genotyped using the Illumina HumanOmniExpress BeadChip. Model-based clustering identified three genetic subgroups with different levels of admixture. A high level of concordance (Mantel test, p=0.0001 for 9999 repeats) was observed between the derived genetic clusters and the surname-based ancestries. Use of Human Genome Diversity Project (HGDP) data to understand admixtures in each group reveals the following: the first group (Kuwait P) is largely of West Asian ancestry, representing Persians with European admixture; the second group (Kuwait S) is predominantly of city-dwelling Saudi Arabian tribe ancestry, and the third group (Kuwait B) includes most of the tent-dwelling Bedouin surnames and is characterized by the presence of 17% African ancestry. Identity by Descent and Homozygosity analyses find Kuwait’s population to be heterogeneous (placed between populations that have large amounts of runs of homozygosity (ROH) and the ones with low ROH) with Kuwait S as highly endogamous, and Kuwait B as diverse. Population differentiation FST estimates place Kuwait P near Asian populations, Kuwait S near Negev Bedouin tribes, and Kuwait B near the Mozabite population. FST distances between the groups are in the range of 0.005 to 0.008; distances of this magnitude are known to cause false positives in disease association studies. Results of analysis for genetic features such as linkage disequilibrium decay patterns conform to Kuwait’s geographical location at the nexus of Africa, Europe, and Asia.

Link

July 10, 2013

Population history of middle Euphrates valley

HOMO - Journal of Comparative Human Biology Available online 3 July 2013

Population history of the middle Euphrates valley: Dental non-metric traits at Tell Ashara, Tell Masaikh and Jebel Mashtale, Syria

Arkadiusz Sołtysiak, Marta Bialon

Fifty-nine dental non-metric traits were scored using Arizona State University Dental Anthropology System on a sample of teeth from 350 human skeletons excavated at three sites in the lower middle Euphrates valley. The dataset was divided into six chronological subsets: Early Bronze Age, Middle Bronze Age, Early Iron Age with Neo-Assyrian period, Classical/Late Antiquity, Early Islamic (Umayyad and Abbasid) period and Modern period. The matrix of Mean Measure of Divergence values exhibited temporal homogeneity of the sample with only dental non-metric trait scores in the Modern subset differing significantly from most other subsets. Such a result suggests that no major gene flow occurred in the middle Euphrates valley between the 3rd millennium BCE and the early 2nd millennium CE. Only after the Mongol invasion and the large-scale depopulation of northern Mesopotamia in the 13th century CE did a major population change occur, when the area was taken over in the 17th century by Bedouin tribes from the Arabian Peninsula.

Link
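The Mean Measure of Divergence compares trait frequencies between two samples after an angular transformation, then subtracts the divergence expected from sampling error alone. A minimal sketch of one commonly used formulation (Freeman-Tukey transformation with a 1/(n + 1/2) correction per sample); whether Sołtysiak and Bialon used exactly this variant is an assumption, and the counts in the check are made up.

```python
import math

def freeman_tukey(k, n):
    """Freeman-Tukey angular transform of k 'present' scores out of n scored."""
    return 0.5 * (math.asin(1 - 2 * k / (n + 1)) + math.asin(1 - 2 * (k + 1) / (n + 1)))

def mmd(sample1, sample2):
    """Mean Measure of Divergence between two samples.

    Each sample is a list of (k, n) pairs, one per trait, in the same trait order.
    Values near or below zero indicate no divergence beyond sampling noise.
    """
    if len(sample1) != len(sample2):
        raise ValueError("both samples must score the same traits")
    total = 0.0
    for (k1, n1), (k2, n2) in zip(sample1, sample2):
        diff = freeman_tukey(k1, n1) - freeman_tukey(k2, n2)
        # Subtract the squared difference expected from sampling error alone.
        total += diff * diff - (1 / (n1 + 0.5) + 1 / (n2 + 0.5))
    return total / len(sample1)
```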

October 25, 2012

Instantaneous vs. continuous admixture dynamics (Jin et al. 2012)

A new paper in AJHG discusses the distribution of chromosomal segments of distinct ancestry (CSDAs) under three different models of admixture dynamics (left). In the hybrid isolation (HI) model, admixture is instantaneous and results in a hybrid population that evolves with drift and recombination only. In the gradual admixture (GA) model, the hybrid population continues to receive admixture from the unadmixed parental populations. Finally, in the continuous gene flow model (CGF), one of the populations becomes admixed while the other continues to exist unadmixed and to contribute to the admixed one.

In practical terms, the HI model results in the diminution of CSDA length due to recombination over time, and at "present" there is a paucity of long CSDAs. In the GA model there are more long CSDAs for both populations, while in the CGF model there is an asymmetry in the CSDAs donated by Pop1 and Pop2, with those from the "donor" population being longer (because fresh "long" segments are added in every generation).
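To see why the HI and CGF models leave different signatures, here is a toy forward simulation of a single 1-Morgan chromosome in a haploid Wright-Fisher population with Poisson-distributed crossovers. It is only an illustration under those simplifying assumptions (the population size, the 14 generations and the admixture rates are arbitrary choices), not the simulation framework Jin et al. used, and the GA model is left out for brevity. Because the CGF run keeps adding unrecombined donor chromosomes, its donor tracts should come out longer on average than those of the HI run.

```python
import random
from statistics import mean

# A chromosome is a list of (end, label) pairs with increasing `end` (Morgans);
# each segment spans [previous end, end) and carries ancestry `label`
# (0 = recipient / first parental population, 1 = donor / second population).

def crossover_points(length, rng):
    """Crossovers as a Poisson process with rate 1 per Morgan."""
    points, pos = [], rng.expovariate(1.0)
    while pos < length:
        points.append(pos)
        pos += rng.expovariate(1.0)
    return points

def piece(chrom, start, end):
    """Segments of `chrom` overlapping [start, end), clipped at `end`."""
    out, prev = [], 0.0
    for seg_end, label in chrom:
        if seg_end > start and prev < end:
            out.append((min(seg_end, end), label))
        prev = seg_end
        if prev >= end:
            break
    return out

def gamete(a, b, length, rng):
    """Recombine two parental chromosomes, switching parent at each crossover."""
    parents, current = (a, b), rng.randrange(2)
    out, start = [], 0.0
    for cut in crossover_points(length, rng) + [length]:
        for seg_end, label in piece(parents[current], start, cut):
            if out and out[-1][1] == label:
                out[-1] = (seg_end, label)  # extend the previous same-ancestry segment
            else:
                out.append((seg_end, label))
        start, current = cut, 1 - current
    return out

def simulate(model, generations, m, n=300, length=1.0, seed=0):
    """HI: a fraction m of founders carry ancestry 1, then drift + recombination only.
    CGF: founders are all ancestry 0; every generation a fraction m of new
    chromosomes are fresh, unrecombined ancestry-1 (donor) chromosomes."""
    rng = random.Random(seed)
    if model == "HI":
        pop = [[(length, 1 if rng.random() < m else 0)] for _ in range(n)]
    elif model == "CGF":
        pop = [[(length, 0)] for _ in range(n)]
    else:
        raise ValueError("model must be 'HI' or 'CGF'")
    for _ in range(generations):
        new_pop = []
        for _ in range(n):
            if model == "CGF" and rng.random() < m:
                new_pop.append([(length, 1)])  # fresh donor chromosome
            else:
                new_pop.append(gamete(rng.choice(pop), rng.choice(pop), length, rng))
        pop = new_pop
    return pop

def tract_lengths(pop, label):
    """Lengths (Morgans) of all segments carrying ancestry `label`."""
    lengths = []
    for chrom in pop:
        prev = 0.0
        for seg_end, lab in chrom:
            if lab == label:
                lengths.append(seg_end - prev)
            prev = seg_end
    return lengths

hi = tract_lengths(simulate("HI", 14, 0.20), 1)
cgf = tract_lengths(simulate("CGF", 14, 0.02), 1)
print(f"HI  donor tracts: n={len(hi)}, mean={mean(hi):.3f} M")
print(f"CGF donor tracts: n={len(cgf)}, mean={mean(cgf):.3f} M")
```

A GA variant could be added by also injecting fresh ancestry-0 chromosomes into an already admixed population each generation.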

The conclusions of the paper regarding some particular admixture cases are also interesting. For African Americans:

Although the actual population admixture of African Americans might be more complex than what our simulation suggested, the CGF1 model setting at 14 generations was found to be reasonably representative, capturing the main pattern of the population admixture dynamics.

The CGF1 model has Africans as recipients and Europeans as donors. This makes sense, since African Americans are descended from slaves who were transported to the New World, and since the slave trade ended centuries ago, there was little replenishment of the AA population with fresh African-origin individuals. On the other hand, European Americans, due both to social dynamics and to their numerical majority, continued to exist as a distinct population that contributed to the AA population.

I should mention that according to HAPMIX, the admixture time was 7 generations, which is close to the 6 +/- 1 generations inferred by rolloff analysis by Moorjani et al. So, in this case, this admixture time appears to be an "average" of a continuing process of admixture that began 14 generations ago.
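A back-of-envelope way to see how a single-pulse date can land near the middle of a continuous process: if donor ancestry enters at a constant rate m per generation for T generations, and each later pulse dilutes earlier donor ancestry by a factor (1 - m), the ancestry-weighted mean age of donor material approaches (T + 1)/2 as m gets small, i.e. about 7.5 generations for T = 14. This sketch holds only under that toy model; HAPMIX and rolloff estimate dates from local ancestry and LD patterns, not from this average.

```python
def mean_admixture_age(generations, m):
    """Ancestry-weighted mean age (generations) of donor ancestry under a toy
    continuous-gene-flow model: a pulse of size m every generation for
    `generations` generations, each diluted by (1 - m) per later pulse."""
    ages = range(1, generations + 1)                 # pulse g generations ago
    weights = [m * (1 - m) ** (g - 1) for g in ages]  # surviving donor fraction
    return sum(g * w for g, w in zip(ages, weights)) / sum(weights)

print(round(mean_admixture_age(14, 0.01), 2))
```

Larger values of m shift the weight toward recent pulses, so the mean drops below (T + 1)/2.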

Onto Mexicans:

In short, the GA model at 24 generations fit the empirical data best among all these simulated scenarios, as indicated by the distribution of EMDs.

Again, this makes sense, because in Mexico there continued to exist unadmixed populations of Europeans and Amerindians that contributed to the Mestizo population of the country.

On the African admixture in Mozabites:

Comparing the empirical distribution of CSDAs with that simulated, we found that the Mozabite admixture process essentially fit the HI model with 100 generations since admixture. There was an almost complete absence of recent gene flow from European populations to the Mozabite gene pool (Figure 6A). For the Sub-Saharan African ancestral component, there were more long CSDAs at the tail of empirical distribution than those in the HI model, which confirmed that recent gene flow from African populations had contributed to the Mozabite gene pool (Figure 6B).

Again, this makes sense: Berber groups were not replenished from other Caucasoid sources, so their original admixture with native Africans resulted in a blend that persisted largely unaffected by "Europeans", but did find occasion of admixture with Sub-Saharans. Hence, the asymmetry in the presence of long "European" vs. "Sub-Saharan" segments.

A similar pattern was evident for Bedouin, Palestinians, and Druze:

Analyses of European ancestral component in Bedouin and Palestinian populations also showed that the empirical distributions essentially fit the HI model for both populations (Figures 6C and 6E). Although the empirical CSDA distribution of Sub-Saharan African ancestral component also fit the HI model best, both distributions showed a long tail at the right compared with those under the HI model, indicating that recent gene flow from Sub-Saharan Africans also contributed to the two admixed populations (Figures 6D and 6F). ... For Druze, their European component of ancestry fit the HI model very well. However, their African ancestral component contained much shorter CSDAs than those of simulated (Figure S14), which might indicate that previous studies had underestimated the admixture time of Druze. In addition, populations receiving recent gene flow from their parental populations showed higher variation of individual ancestral proportions than those who did not (Figure S13).

The Druze have well-known Egyptian connections, and they may have largely avoided Sub-Saharan African admixture during the Islamic period, principally because of their avoidance of proselytism. Hence, their African admixture may stem from Egyptian adherents who were themselves a product of much earlier Caucasoid/Sub-Saharan admixture during the course of pre-Islamic Egypt.

The American Journal of Human Genetics, 25 October 2012 doi:10.1016/j.ajhg.2012.09.008

Exploring Population Admixture Dynamics via Empirical and Simulated Genome-Wide Distribution of Ancestral Chromosomal Segments

Wenfei Jin et al

Abstract

The processes of genetic admixture determine the haplotype structure and linkage disequilibrium patterns of the admixed population, which is important for medical and evolutionary studies. However, most previous studies do not consider the inherent complexity of admixture processes. Here we proposed two approaches to explore population admixture dynamics, and we demonstrated, by analyzing genome-wide empirical and simulated data, that the approach based on the distribution of chromosomal segments of distinct ancestry (CSDAs) was more powerful than that based on the distribution of individual ancestry proportions. Analysis of 1,890 African Americans showed that a continuous gene flow model, in which the African American population continuously received gene flow from European populations over about 14 generations, best explained the admixture dynamics of African Americans among several putative models. Interestingly, we observed that some African Americans had much more European ancestry than the simulated samples, indicating substructures of local ancestries in African Americans that could have been caused by individuals from some particular lineages having repeatedly admixed with people of European ancestry. In contrast, the admixture dynamics of Mexicans could be explained by a gradual admixture model in which the Mexican population continuously received gene flow from both European and Amerindian populations over about 24 generations. Our results also indicated that recent gene flows from Sub-Saharan Africans have contributed to the gene pool of Middle Eastern populations such as Mozabite, Bedouin, and Palestinian. In summary, this study not only provides approaches to explore population admixture dynamics, but also advances our understanding on population history of African Americans, Mexicans, and Middle Eastern populations.

Link

July 19, 2012

Huge study on Y-chromosome variation in Iran (Grugni et al. 2012)

This is the equivalent of a box of candy for anyone interested in Eurasian (pre-)history. I will have to digest all the goodies within, and will post any of my comments as updates to this post.

UPDATE I: Here is the table of haplogroup frequencies for easy reference:

One of the most interesting finds is the presence of a few IJ-M429* chromosomes in the sample. Haplogroup IJ encompasses the major European I subclade, and the major West Asian J subclade. The discovery of IJ* chromosomes is consistent with the origin of this haplogroup in West Asia; it is widely believed that haplogroup I represents a pre-Neolithic lineage in Europe, although at present there are no Y chromosome-tested pre-Neolithic remains.

There is also a wide assortment of Q and R in Iran. While some of these may be intrusive (e.g., the 42.6% of Q1a2 in Turkmen, likely a legacy of their Central Asian origins), the overall picture appears consistent with a deep presence of these lineages in Iran. This is especially true for haplogroup R where pretty much every paragroup and derived group is present, excepting those likely to have originated recently elsewhere.

UPDATE II: From the paper:

Although accounting only for 25% of the total variance, the first two components (Figure 3) separate populations according to their geographic and ethnic origin and define five main clusters: East-African, North-African and Near Eastern Arab, European, Near Eastern and South Asian. The 1stPC clearly distinguishes the East African groups (showing a high frequency of haplogroup E) from all the others which distribute longitudinally along the axis with a wide overlapping between European and Arab peoples and between Near Eastern and South Asian groups. The 2ndPC separates the North-African and Near Eastern Arabs (characterized by the highest frequency of haplogroup J1) from Europeans (characterized by haplogroups I, R1a and R1b) and the Near Easterners from the South Asians (due to the distribution of haplogroups G, R2 and L). Iranian groups do not cluster all together, occupying intermediate positions among Arab, Near Eastern and Asian clusters. In this scenario, it is worth of noticing the position of three Iranian groups: (i) Khuzestan Arabs (KHU-Ar) who, despite their Arabic origin, are close to the Iranian samples; (ii) Armenians from Tehran (THE-Ar), whose position, in the upper part of the Iranian distribution, indicates a close affinity with the Near Eastern cluster, while their position near Turkey and Caucasus groups, due to the high frequency R1b-M269 and other European markers (eg: I-M170), is in agreement with their Armenia origin; (iii) Sistan Baluchestan (SB-Ba) that clusters with its neighbouring Pakistan.

UPDATE III: There are lots of little details in the haplogroup distribution that make historical sense. For example, C3 exists in Assyrians from Azerbaijan, and C*, C3, and O all exist in Zoroastrians from Yazd. It is often forgotten that before the spread of Islam, and for quite some time thereafter, Inner Asia was teeming with Zoroastrians and Nestorian Christians. It seems quite likely that these outliers represent a legacy of these communities.

UPDATE IV: I have a feeling that Razib will take exception with this statement: "Ancient Persian people were firstly characterized by the Zoroastrianism. After the Islamization, Shi'a became the main doctrine of all Iranian people."

UPDATE V: This confirms my observation from the recent studies in Afghanistan, that there is an inverse relationship of J2a and R1a in Iranian-speaking groups, with an excess of the latter among the eastern Iranians, and of the former among the Persians. From the paper:

Among the different J2a haplogroups, J2a-M530 [46] is the most informative as for ancient dispersal events from the Iranian region. This lineage probably originated in Iran where it displays its highest frequency and variance in Yazd and Mazandaran (Figure 2). Taking into account its microsatellite variation and age estimates along its distribution area (Tables S3 and S7), it is likely that its diffusion could have been triggered by the Euroasiatic climatic amelioration after the Last Glacial Maximum and later increased by agriculture spread from Turkey and Caucasus towards southern Europe. The high variance observed in the Italian Peninsula is probably the result of stratifications of subsequent migrations and/or of the presence of sub-lineages not yet identified. Of interest in the M530 network (Figures 2 and S3) is the presence of a lateral branch that is characterized by a DYS391 repeat number equal to 9. Differently from previous observations [46], this branch is not restricted to Anatolian Greek samples being shared with different eastern Mediterranean coastal populations. The M530 diffusion pattern seems to be also shared by the paragroups J2a-M410* and J2a-PAGE55*. In addition, the variance distribution of the rare R1b-M269* Y chromosomes, displaying decreasing values from Iran, Anatolia and the western Black Sea coastal region, is also suggestive of a westward diffusion from the Iranian plateau, although more complex scenarios can be still envisioned because of its non-star like structure.

Of course, the idea that the diffusion of J2a related lineages ties in with early agricultural expansions has been with us for a long time, but it is time to abandon it. First of all, as we have seen, J2a diminishes greatly as we head towards South Asia; it certainly doesn't look like the lineage of the multitude of agricultural settlements that sprang up along the southeastern vector soon after the invention of agriculture. Second, it is lacking so far in all ancient Y chromosome data from Europe down to 5,000 years ago. It seems much more probable that J2 related lineages spread from the highlands of West Asia much later.

The "age estimates" are the result of using the inappropriate "evolutionary mutation rate", and become even older because of the inclusion of the DYS388 marker that is very stable in many haplogroups but very mutable within haplogroup J. On the left you can see frequency, Y-STR variance, and haplotype network structures for various J-related groups.
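A minimal sketch of a variance-based Y-STR age estimate, to show why both the choice of mutation rate and the inclusion of a fast-mutating locus such as DYS388 matter: the estimate is the mean per-locus repeat variance divided by the per-generation mutation rate, so it scales inversely with the rate, and a single highly variable locus inflates the average. This is a simplified stand-in, not the exact method used in the paper; the rate values, the 25-year generation time and the toy haplotypes are illustrative assumptions. The so-called evolutionary rate (around 6.9e-4 per locus per generation) is several-fold slower than typical pedigree-based rates (on the order of 2e-3), so ages computed with it come out several times older.

```python
from statistics import mean, pvariance

# Illustrative (assumed) mutation rates per locus per generation.
EVOLUTIONARY_RATE = 6.9e-4
PEDIGREE_RATE = 2.1e-3

def ystr_age_years(haplotypes, rate_per_locus_per_gen, years_per_gen=25):
    """Toy variance-based TMRCA: mean per-locus repeat variance / mutation rate.

    haplotypes: equal-length lists of repeat counts, one column per locus.
    Returns (age in years, list of per-locus variances).
    """
    per_locus = [pvariance(column) for column in zip(*haplotypes)]
    generations = mean(per_locus) / rate_per_locus_per_gen
    return generations * years_per_gen, per_locus

# Toy haplotypes; the third locus plays the role of a fast-mutating marker.
haps = [[13, 14, 12], [14, 14, 16], [13, 15, 10], [14, 14, 14]]
evo, per_locus = ystr_age_years(haps, EVOLUTIONARY_RATE)
ped, _ = ystr_age_years(haps, PEDIGREE_RATE)
print(f"evolutionary rate: {evo:.0f} y, pedigree rate: {ped:.0f} y")
```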

It is unfortunate that there is no progress in the phylogeographic assessment of R1a in this paper. There have been substantial discoveries of SNPs within this haplogroup as a result of commercial testing; however there is clearly an ascertainment bias in the newer discoveries, as almost all these SNPs have been detected in Europeans. The new paper confirms the high levels of Y-STR variance in India, Pakistan, and Iran. Together with the cornucopia of related paragroups in Iran, there is little doubt that this haplogroup originated in the general area of Central/South Asia.

Personally, as I have stated before, I would relate this R1a with Neolithic peoples living east of the Caspian, in contrast to the R1b bearers who lived west and south of it. These two populations came under the influence of the Indo-Europeans and spread in different directions. The Indo-Iranians were then initially the mixed descendants of the Indo-Europeans and the R1a old agricultural population, and were formed in the territory of the Bactria-Margiana Archaeological Complex.

This also explains the contrast between Iranian and Armenian groups: the latter mostly lack the R1a lineage, contrasting with all Iranian groups (even their Kurdish neighbors) who possess it. Conversely, Iranian groups, and especially eastern Iranians and Indo-Aryans, lack the R1b lineage. This is due to the fact that neither R1a nor R1b were originally part of the Indo-European community, but their geographical position was such that they came under the influence of the Indo-Europeans when the latter began their expansion.

UPDATE VI: I have created my own dendrogram using the Y-haplogroup frequencies and the hclust package of R (default parameters):

From top to bottom, one can identify some clusters:

Eastern Europe, further broken down into Balkans and Slavic+Hungary
West Asian/Caucasus
Iranian Proper
Arab

These correspond largely to the clusters identified by the authors, with India and the Turkmen sample emerging as the clear outliers. I omitted the Ethiopian samples, since E-M78 was not resolved phylogenetically, causing the Ethiopians to group with the likely E-V13 from the Balkans.
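For readers who want to reproduce this kind of dendrogram outside R, here is a minimal pure-Python sketch of agglomerative clustering with complete linkage on Euclidean distances between haplogroup-frequency profiles. It assumes these match R's defaults (dist() uses Euclidean distance and hclust() uses complete linkage); tie-breaking may differ from R, and the four toy profiles in the check are invented.

```python
import math
from itertools import combinations

def euclid(a, b):
    """Euclidean distance between two frequency vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def complete_linkage(profiles):
    """Agglomerative clustering with complete linkage.

    profiles: dict population -> list of haplogroup frequencies (same order).
    Returns the merge history as (cluster_a, cluster_b, height) tuples,
    where clusters are frozensets of population names.
    """
    def dist(c1, c2):
        # Complete linkage: distance between the two farthest members.
        return max(euclid(profiles[a], profiles[b]) for a in c1 for b in c2)

    clusters = [frozenset([name]) for name in profiles]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: dist(*pair))
        merges.append((c1, c2, dist(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return merges
```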

UPDATE VII: I have also run MCLUST on the MDS representation of the distance matrix computed from the haplogroup frequency data. The largest number of clusters (10) was obtained with 5 MDS dimensions retained. Population assignments to the 10 clusters can be found in the table below:

Population	Cluster
Iran/Azerbaijan_Gharbi+Tehran_(Assyrian)	1
Iran/Lorestan_(Lur)	1
Iran/Tehran_(Armenian)	1
Iran/Azerbaijan_Gharbi_(Azeri)	2
Iran/Hormozgan_(Bandari+Afro-Iranian)	2
Iran/Hormozgan/Qeshmi	2
Iran/Khorasan_(Persian)	2
Iran/Kurdistan_(Kurd)	2
Iran/Sistan_Baluchestan_(Baluch)	2
Pakistan	2
Iran/Fars+Isfahan_(Persian)	3
Iran/Gilan_(Gilak)	3
Iran/Yazd+Tehran_(Zoroastrian)	3
Turkey/Central	3
Turkey/East	3
Turkey/West_	3
Iran/Golestan_(Turkmen)	4
India	4
Iran/Khuzestan_(Arab)	5
Egypt_(Arab)	5
Iraq/Baghdad	5
Oman	5
Saudi_Arabia	5
Tunisia	5
United_Arab_Emirates	5
Iran/Mazandaran_(Mazandarani)	6
Iran/Yazd_(Persian)	6
Balkarian	6
Georgia	6
Albania	7
Greece	7
Bosnia	8
Croatia	8
Slovenia	8
Czech_Republic	9
Hungary	9
Poland	9
Ukraine	9
Iraq_(Marsh_Arab)	10
Qatar	10
Yemen	10

We can ignore cluster #4 which consists of the two outliers (India + Turkmen). The rest of the clusters seem relatively coherent. Notice, for example, the Arabian cluster #10, Balkan cluster #8, Eastern European cluster #9, Greek-Albanian cluster #7, Mixed Arab cluster #5.

PLoS ONE 7(7): e41252. doi:10.1371/journal.pone.0041252

Ancient Migratory Events in the Middle East: New Clues from the Y-Chromosome Variation of Modern Iranians

Viola Grugni et al.

Knowledge of high resolution Y-chromosome haplogroup diversification within Iran provides important geographic context regarding the spread and compartmentalization of male lineages in the Middle East and southwestern Asia. At present, the Iranian population is characterized by an extraordinary mix of different ethnic groups speaking a variety of Indo-Iranian, Semitic and Turkic languages. Despite these features, only few studies have investigated the multiethnic components of the Iranian gene pool. In this survey 938 Iranian male DNAs belonging to 15 ethnic groups from 14 Iranian provinces were analyzed for 84 Y-chromosome biallelic markers and 10 STRs. The results show an autochthonous but non-homogeneous ancient background mainly composed of J2a sub-clades with different external contributions. The phylogeography of the main haplogroups allowed identifying post-glacial and Neolithic expansions toward western Eurasia but also recent movements towards the Iranian region from western Eurasia (R1b-L23), Central Asia (Q-M25), Asia Minor (J2a-M92) and southern Mesopotamia (J1-Page08). In spite of the presence of important geographic barriers (Zagros and Alborz mountain ranges, and the Dasht-e Kavir and Dasht-e Lut deserts) which may have limited gene flow, AMOVA analysis revealed that language, in addition to geography, has played an important role in shaping the present-day Iranian gene pool. Overall, this study provides a portrait of the Y-chromosomal variation in Iran, useful for depicting a more comprehensive history of the peoples of this area as well as for reconstructing ancient migration routes. In addition, our results evidence the important role of the Iranian plateau as source and recipient of gene flow between culturally and genetically distinct populations.

Link

June 27, 2012

Population structure in Qatar

The recent publication of Omberg et al. (2012) has reminded me of the data of Henn et al. (2012) on Qatar which I don't believe I've used yet. I used the K12b calculator on ~20,000 SNPs that are common between it and the Affymetrix chip used.

Below is the population portrait of the Qatari population:

Obviously this isn't a homogeneous population. In order to figure out which ancestral groups are present there, I ran MCLUST over the admixture proportions, which resulted in individuals assigned to five different clusters. Here are the average admixture proportions of these five clusters:

On the basis of the above, I conclude that there are several different groups represented in the Qatari population. I have absolutely no knowledge about the Qatari population, so it would be interesting to see if readers find correspondences between these and known social divisions in Qatar.

For example, I would wager that #5, which is a "Southwest Asian"+"Caucasus" mix, represents a pure Arabian group with little outside influence. #1 and #2 are also Arab-like but with various degrees of admixture. #3 appears to include substantial African descendants and #4 a clear Iranian signal due to the high "Gedrosia" component. Of interest is that the "African" group #3 also scores high in the "South Asian" component.
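The per-cluster averages are simply the mean of each calculator component over the individuals assigned to a cluster. A sketch, assuming the cluster assignments come from a mixture-model fit such as MCLUST (not reproduced here); the component names follow the K12b labels used above, but the individuals and values in the check are invented. A component missing for an individual is treated as 0.

```python
from collections import defaultdict

def cluster_means(proportions, assignments):
    """Average admixture proportions per cluster.

    proportions: dict individual -> dict component -> fraction.
    assignments: dict individual -> cluster id.
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for ind, cluster in assignments.items():
        counts[cluster] += 1
        for comp, value in proportions[ind].items():
            sums[cluster][comp] += value
    return {c: {comp: total / counts[c] for comp, total in sums[c].items()}
            for c in sums}
```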

June 26, 2012

SupportMix

This sounds amazing; hopefully I can give it a try before long. Software link.

BMC Genetics 2012, 13:49 doi:10.1186/1471-2156-13-49

Inferring genome-wide patterns of admixture in Qataris using fifty-five ancestral populations

Larsson Omberg et al.

Abstract (provisional)

Background

Populations of the Arabian Peninsula have a complex genetic structure that reflects waves of migrations including the earliest human migrations from Africa and eastern Asia, migrations along ancient civilization trading routes and colonization history of recent centuries.

Results

Here, we present a study of genome-wide admixture in this region, using 156 genotyped individuals from Qatar, a country located at the crossroads of these migration patterns. Since haplotypes of these individuals could have originated from many different populations across the world, we have developed a machine learning method "SupportMix" to infer loci-specific genomic ancestry when simultaneously analyzing many possible ancestral populations. Simulations show that SupportMix is not only more accurate than other popular admixture discovery tools but is the first admixture inference method that can efficiently scale for simultaneous analysis of 50-100 putative ancestral populations while being independent of prior demographic information.

Conclusions

By simultaneously using the 55 world populations from the Human Genome Diversity Panel, SupportMix was able to extract the fine-scale ancestry of the Qatar population, providing many new observations concerning the ancestry of the region. For example, as well as recapitulating the three major sub-populations in Qatar, composed of mainly Arabic, Persian, and African ancestry, SupportMix additionally identifies the specific ancestry of the Persian group to populations sampled in Greater Persia rather than from China and the ancestry of the African group to sub-Saharan origin and not Southern African Bantu origin as previously thought.

Link

October 05, 2011

Y-chromosomes of Marsh Arabs

What do the Marsh Arabs have to do with ancient Sumer? Nothing that can be determined on the basis of this data. There are plenty of ancient Sumerian skulls, so how about we study them directly?

As far as I can see, the only link between Marsh Arabs and Sumerians presented in this paper comes from dating Y-STR variation of their major J1-Page08 group using the evolutionary mutation rate, with a divergence time of 4.5 +/- 2.6 ky. Even if that mutation rate were correct (it is not) and the assumptions on which the confidence interval is based were exhaustive (they are not), we would still have +/- 2.6 ky of leeway to deal with, which spans not only the Sumerians but plenty more besides.

Not to mention that the evolutionary mutation rate is wrongly applied to every case under the sun, and that Y-STR based age estimation in general has been conclusively shown to be a rather futile exercise.

Nonetheless, the paper does have value in demonstrating the paucity of J2 and R1 in the Marsh Arabs compared to the more cosmopolitan general Iraqi population:

Different from the Iraqi control sample, the Marsh Arab gene pool displays a very scarce input from the northern Middle East (Hgs J2-M172 and derivatives, G-M201 and E-M123), virtually lacks western Eurasian (Hgs R1-M17, R1-M412 and R1-L23) and sub-Saharan African (Hg E-M2) contributions.

Rather than being "Sumerian", the Marsh Arabs seem to have preserved a more pristine Semitic patrilineal gene pool than the cosmopolitan Iraqi samples, which have absorbed pre-Arab and pre-Semitic population elements.

BMC Evolutionary Biology 2011, 11:288 doi:10.1186/1471-2148-11-288

In search of the genetic footprints of Sumerians: a survey of Y-chromosome and mtDNA variation in the Marsh Arabs of Iraq.

Nadia Al-Zahery et al.

Abstract (provisional)

Background
For millennia, the southern part of Mesopotamia has been a wetland region generated by the Tigris and Euphrates rivers before flowing into the Gulf. This area has been occupied by human communities since ancient times and the present-day inhabitants, the Marsh Arabs, are considered the population with the strongest link to ancient Sumerians. Popular tradition, however, considers the Marsh Arabs as a foreign group, of unknown origin, which arrived in the marshlands when the rearing of water buffalo was introduced to the region.

Results
To shed some light on the paternal and maternal origin of this population, Y chromosome and mitochondrial DNA (mtDNA) variation was surveyed in 143 Marsh Arabs and in a large sample of Iraqi controls. Analyses of the haplogroups and sub-haplogroups observed in the Marsh Arabs revealed a prevalent autochthonous Middle Eastern component for both male and female gene pools, with weak South-West Asian and African contributions, more evident in mtDNA. A higher male than female homogeneity is characteristic of the Marsh Arab gene pool, likely due to a strong male genetic drift determined by socio-cultural factors (patrilocality, polygamy, unequal male and female migration rates).

Conclusions
Evidence of genetic stratification ascribable to the Sumerian development was provided by the Y-chromosome data where the J1-Page08 branch reveals a local expansion, almost contemporary with the Sumerian City State period that characterized Southern Mesopotamia. On the other hand, a more ancient background shared with Northern Mesopotamia is revealed by the less represented Y-chromosome lineage J1-M267*. Overall our results indicate that the introduction of water buffalo breeding and rice farming, most likely from the Indian sub-continent, only marginally affected the gene pool of autochthonous people of the region. Furthermore, a prevalent Middle Eastern ancestry of the modern population of the marshes of southern Iraq implies that if the Marsh Arabs are descendants of the ancient Sumerians, also the Sumerians were most likely autochthonous and not of Indian or South Asian ancestry.

Link

June 10, 2011

mtDNA haplogroup HV1 across the Red Sea

American Journal of Physical Anthropology DOI: 10.1002/ajpa.21522

Population history of the Red Sea—genetic exchanges between the Arabian Peninsula and East Africa signaled in the mitochondrial DNA HV1 haplogroup

Eliška Musilová et al.

Archaeological studies have revealed cultural connections between the two sides of the Red Sea dating to prehistory. The issue has still not been properly addressed, however, by archaeogenetics. We focus our attention here on the mitochondrial haplogroup HV1 that is present in both the Arabian Peninsula and East Africa. The internal variation of 38 complete mitochondrial DNA sequences (20 of them presented here for the first time) affiliated into this haplogroup testify to its emergence during the late glacial maximum, most probably in the Near East, with subsequent dispersion via population expansions when climatic conditions improved. Detailed phylogeography of HV1 sequences shows that more recent demographic upheavals likely contributed to their spread from West Arabia to East Africa, a finding concordant with archaeological records suggesting intensive maritime trade in the Red Sea from the sixth millennium BC onwards. Closer genetic exchanges are apparent between the Horn of Africa and Yemen, while Egyptian HV1 haplotypes seem to be more similar to the Near Eastern ones.

Link

April 28, 2011

Comparing five methods of admixture estimation

In my comments on Moorjani et al. (2011), I argued that the admixture proportions presented in the paper are inaccurate, and gave my reasoning behind this claim. Moorjani et al. (2011) also present STRUCTURE 2.2 results.

Naturally, I wanted to see whether independent admixture estimates on some of the same populations had been estimated in the literature. This brought me to Pugach et al. (2011) which introduced a wavelet-based admixture estimation method called StepPCO. In that paper, the authors presented estimates of the extent and timing of admixture for some populations also included in Moorjani et al. (2011). They also compared with HAPMIX, a well-known method using a completely different methodology, and presented their comparative data in this table and in the body of their paper.

Hence, we now have four different estimates of admixture for some populations. To these, I decided to add supervised ADMIXTURE 1.1 results, using CEU and YRI as "West Eurasian" and "Sub-Saharan" references to be consistent with the other methods.

(The ADMIXTURE results were obtained by merging the datasets in PLINK with the --geno 0.001 option, then pruning the combined set for LD with --indep-pairwise 50 5 0.3)
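The LD-pruning step can be illustrated with a simplified pure-Python sketch of the sliding-window idea behind `--indep-pairwise 50 5 0.3`: consider windows of 50 SNPs, shift the window by 5 SNPs at a time, and prune SNP pairs whose r² exceeds 0.3. This is an approximation under stated assumptions, not PLINK's implementation: genotypes are assumed complete and coded as 0/1/2 allele counts, r² is the squared Pearson correlation of those counts, and the later SNP of each offending pair is always the one dropped (PLINK's own choice of which SNP to remove may differ). The function names are hypothetical.

```python
from typing import List, Sequence


def r_squared(a: Sequence[int], b: Sequence[int]) -> float:
    """Squared Pearson correlation between two genotype vectors (0/1/2 counts)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # monomorphic SNP: correlation undefined, treat as unlinked
    return cov * cov / (var_a * var_b)


def prune_ld(genotypes: List[List[int]], window: int = 50, step: int = 5,
             threshold: float = 0.3) -> List[int]:
    """Return indices of SNPs kept after greedy sliding-window LD pruning.

    genotypes[i] holds the allele counts of SNP i across all individuals;
    SNPs are assumed to be sorted by chromosomal position.
    """
    n_snps = len(genotypes)
    removed = [False] * n_snps
    start = 0
    while start < n_snps:
        end = min(start + window, n_snps)
        for i in range(start, end):
            if removed[i]:
                continue
            for j in range(i + 1, end):
                if not removed[j] and r_squared(genotypes[i], genotypes[j]) > threshold:
                    removed[j] = True  # simplification: always drop the later SNP
        if end == n_snps:
            break
        start += step
    return [i for i in range(n_snps) if not removed[i]]
```

Duplicated or perfectly anti-correlated SNPs are removed, while uncorrelated and monomorphic ones survive, which is the intended effect of thinning a dense SNP set before running ADMIXTURE.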

The following table summarizes the estimates.

Note that these are estimates of Sub-Saharan admixture assuming two parental populations; also, Moorjani et al. break up the Bedouin sample into the two distinct groups it is composed of, so I have taken a weighted average of the figures in their paper.
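Combining the two Bedouin subgroups is a sample-size-weighted mean. A minimal sketch follows; the subgroup sizes and percentages are hypothetical placeholders, not the figures from Moorjani et al.

```python
from typing import Sequence


def weighted_mean(values: Sequence[float], weights: Sequence[float]) -> float:
    """Sample-size-weighted mean of per-subgroup estimates."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total


# Hypothetical subgroups: 10% admixture in 20 individuals, 4% in 30 individuals.
combined = weighted_mean([10.0, 4.0], [20, 30])
```

Here the combined value is 6.4%, pulled towards the larger subgroup, which is why a simple unweighted average of the two subgroup figures would misrepresent the pooled sample.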

It is difficult to make meaningful statistical inferences on only a few comparison points, but I do observe that STRUCTURE 2.2 on 13,900 markers gives the highest estimates, followed by Moorjani et al.

The other three methods cannot be ordered: each gives higher estimates in some populations and lower in others. All three, however, give lower estimates than both Moorjani et al. (2011) and STRUCTURE.

As I explained in my earlier post, Moorjani et al. (2011) have higher estimates of admixture because they measure it by comparing populations' shift on the East Eurasian-African axis, ignoring the Asian shift of North Europeans and adding it to the African shift of southern Caucasoids. This leads them to infer a few percentage points of African admixture in populations that have virtually none (such as Sardinians and North Italians, even Swiss French). For populations that do have noticeable African admixture (such as those in the table), their overestimates amount to a few percentage points.

Three of the methods also provide an estimate of time since admixture:

There is no simple relationship between these times, but an obvious pattern is that the dates of Moorjani et al. are younger, perhaps less than half those of the other two methods.
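ROLLOFF, the method used by Moorjani et al., dates admixture from the decay of admixture-related linkage disequilibrium with genetic distance: recombination breaks up ancestry segments every generation, so the signal falls off roughly as exp(-n·d) for n generations and a distance d in Morgans. The sketch below illustrates only that last step, on synthetic noiseless data, with a log-linear least-squares fit. It is not ROLLOFF's implementation (real curves are noisy and may also need a constant offset), the function names are hypothetical, and the 29-year generation interval is an assumed value.

```python
import math
from typing import Sequence


def fit_generations(distances_morgans: Sequence[float],
                    weighted_ld: Sequence[float]) -> float:
    """Estimate generations since admixture from exponentially decaying LD.

    Assumes weighted_ld(d) = A * exp(-n * d) with no constant offset, so
    log(weighted_ld) is a straight line in d whose slope is -n.
    """
    xs = list(distances_morgans)
    ys = [math.log(v) for v in weighted_ld]  # values must be strictly positive
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return -sxy / sxx


def generations_to_years(generations: float, years_per_generation: float = 29.0) -> float:
    """Convert generations to years under an assumed generation interval."""
    return generations * years_per_generation


# Synthetic, noiseless curve for an admixture event 40 generations ago.
ds = [0.005 * i for i in range(1, 41)]  # 0.5 cM to 20 cM, expressed in Morgans
curve = [0.01 * math.exp(-40 * d) for d in ds]
n_hat = fit_generations(ds, curve)
```

Because the date is proportional first to the fitted decay rate and then to the assumed generation interval, any systematic bias in either carries straight through to the final age.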

Conclusion

Clearly, the art of admixture estimation is still in its infancy, and different methods provide different results even with a simple 2-population model. I've argued how the results of one method can be harmonized with those of the others, but I don't have a ready explanation for the substantial age differences. Pugach et al. argue that their method is better than HAPMIX, but the differences between the two seem small compared to the differences between either of them and ROLLOFF (the method of Moorjani et al.'s paper).

The discrepancy is even more interesting considering that HAPMIX and ROLLOFF were developed by many of the same people. Hopefully, someone will be able to figure out its cause. A commenter on my earlier post suggested that ROLLOFF produces younger ages because its age estimation is tied to its inflated admixture proportions. This could be true; however, the discrepancy exists even in populations where the relative difference in admixture proportions is small.

A speculative historical coda

Historical explanations of the circumstances of this admixture should be made with some caution, given the uncertainty about admixture times.

For example, a doubling of the Moorjani et al. age estimates would disentangle the Sub-Saharan element in Levantine Arabs from the Islamic epoch. Doubling the admixture date for Jewish populations presented by Moorjani et al. would bring that admixture's age to the middle of the 2nd millennium BC, a period in which the Hebrews were said to be in Egypt, where they may have collectively acquired a small African admixture.

Hopefully, with time and full genome sequencing, we will get a better idea of what these African signals in some West Eurasian populations represent.

March 29, 2011

The power of Clusters Galore: Iranians and Arabs

The full power of Clusters Galore depends on its ability to infer clusters of arbitrary size, shape, and orientation in a high-dimensional space. It achieves this by using MCLUST over an MDS or PCA representation of dense genomic data.

Nonetheless, we can still get a sense of it even in a simple 2D representation such as the following:

This was produced by applying MDS on 240 individuals (from Behar et al. 2010, HGDP, Xing et al. 2010, and the Dodecad Project).
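The post does not say which MDS variant or distance measure was used, so the following is an illustrative assumption: classical (Torgerson) MDS in pure Python. It squares the pairwise distances, double-centres them, and takes the top eigenvectors scaled by the square roots of their eigenvalues. Eigenvectors are found by power iteration with deflation, which assumes Euclidean-embeddable distances (a positive semi-definite centred matrix); `classical_mds` is a hypothetical name.

```python
import math
import random
from typing import List


def classical_mds(dist: List[List[float]], dims: int = 2,
                  iterations: int = 500, seed: int = 0) -> List[List[float]]:
    """Classical (Torgerson) MDS: embed n points in `dims` dimensions.

    Assumes `dist` is a symmetric n x n matrix of Euclidean-embeddable
    distances, so the double-centred matrix is positive semi-definite.
    """
    n = len(dist)
    sq = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_means = [sum(row) / n for row in sq]  # equal to column means by symmetry
    grand_mean = sum(row_means) / n
    # Double centring: B = -1/2 * J D^2 J, where J = I - (1/n) * ones.
    b = [[-0.5 * (sq[i][j] - row_means[i] - row_means[j] + grand_mean)
          for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        # Power iteration for the largest remaining eigenpair of B.
        v = [rng.random() - 0.5 for _ in range(n)]
        eigenvalue = 0.0
        for _ in range(iterations):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # nothing left to extract in this dimension
                eigenvalue = 0.0
                break
            v = [x / norm for x in w]
            eigenvalue = norm
        scale = math.sqrt(eigenvalue)
        for i in range(n):
            coords[i][k] = scale * v[i]
        # Deflate so that the next pass finds the next-largest eigenpair.
        for i in range(n):
            for j in range(n):
                b[i][j] -= eigenvalue * v[i] * v[j]
    return coords
```

For points that genuinely live in two dimensions, the embedding reproduces the input distances up to rotation and reflection; for genomic data, the 2D picture keeps only the two largest axes of variation, which is why some structure is visible only in higher dimensions.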

One can see that the Behar et al. and Dodecad Iranians form a small cluster on the right, together with the Xing et al. Kurds and the single Dodecad Kurd. Arabs are considerably more variable: Druze extend to the bottom of the figure, and Bedouin form two groups, one similar to other Arabs, the other extending to the left of the figure. There are also a few Arabs stretching to the top.

The variability of the Arabs can be attributed to reproductive isolation, inbreeding, and variable amounts of African admixture. Let's apply MCLUST over these 240 2D points:
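MCLUST is an R package that fits Gaussian mixture models by expectation-maximization (EM) and chooses the number of clusters and the covariance structure by BIC. The sketch below is a stand-in, not MCLUST: a bare EM fit of a K-component mixture with full 2×2 covariances to 2D points, so that clusters can differ in size, shape, and orientation. It uses a deterministic farthest-point initialization, a small hypothetical regularization term `reg`, and no model selection; `gmm_em` is a hypothetical name.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def _log_gaussian(p: Point, mean: Point, cov: List[List[float]]) -> float:
    """Log density of a bivariate normal with a full 2x2 covariance."""
    dx, dy = p[0] - mean[0], p[1] - mean[1]
    a, b, c = cov[0][0], cov[0][1], cov[1][1]
    det = a * c - b * b
    # Mahalanobis term via the closed-form 2x2 inverse [[c, -b], [-b, a]] / det.
    maha = (c * dx * dx - 2.0 * b * dx * dy + a * dy * dy) / det
    return -0.5 * maha - math.log(2.0 * math.pi) - 0.5 * math.log(det)


def gmm_em(points: Sequence[Point], k: int, iterations: int = 100,
           reg: float = 1e-6) -> Tuple[List[int], List[Point]]:
    """Fit a k-component full-covariance Gaussian mixture by EM.

    Returns hard cluster labels (from the last E-step) and component means.
    """
    n = len(points)
    # Deterministic farthest-point initialization of the means.
    means: List[Point] = [points[0]]
    while len(means) < k:
        means.append(max(points, key=lambda p: min(math.dist(p, m) for m in means)))
    covs = [[[1.0, 0.0], [0.0, 1.0]] for _ in range(k)]
    weights = [1.0 / k] * k
    resp: List[List[float]] = []
    for _ in range(iterations):
        # E-step: posterior membership probabilities, computed with log-sum-exp.
        resp = []
        for p in points:
            logs = [math.log(weights[j]) + _log_gaussian(p, means[j], covs[j])
                    for j in range(k)]
            top = max(logs)
            probs = [math.exp(lg - top) for lg in logs]
            total = sum(probs)
            resp.append([q / total for q in probs])
        # M-step: re-estimate mixing weights, means and full covariances.
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            mx = sum(r[j] * p[0] for r, p in zip(resp, points)) / nj
            my = sum(r[j] * p[1] for r, p in zip(resp, points)) / nj
            sxx = sum(r[j] * (p[0] - mx) ** 2 for r, p in zip(resp, points)) / nj + reg
            syy = sum(r[j] * (p[1] - my) ** 2 for r, p in zip(resp, points)) / nj + reg
            sxy = sum(r[j] * (p[0] - mx) * (p[1] - my) for r, p in zip(resp, points)) / nj
            means[j] = (mx, my)
            covs[j] = [[sxx, sxy], [sxy, syy]]
            weights[j] = nj / n
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return labels, means
```

The full covariance is what lets an elongated group, such as African-admixed individuals strung out along one direction, be captured as a single cluster rather than being split into several round ones.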

The above visual representation shows the centroids and shapes of the 5 inferred clusters. Here are the numbers of individuals from each population assigned to each cluster:

Notice cluster #5: it consists of all Kurds, most Behar et al. Iranians and all Dodecad ones, and the single Dodecad Kurd, plus one Lebanese and one Syrian. It is 96% Iranic in composition overall. It is tempting to think that the Lebanese and Syrian members have some link to Iranian peoples, either through Kurdish ancestry or through the Shia form of Islam.

The more variable Arabs are split into multiple clusters: the main, tight cluster #3, which includes most of the Levantine Arabs but also some Saudis and Yemenis; the extremely variable African-admixed cluster #1, dominated by some Yemenis but including a few others; the "Arabian" Saudi-Bedouin-dominated cluster #2; and the Druze-specific cluster #4.

It seems that just as the distinction between Celto-Germans and Balto-Slavs is not only cultural but also genetic, so is the distinction between Iranians and Arabs. In the case of the Arabs, though, religious distinctions (e.g., the Druze), variable African admixture, and quite possibly the Arabization of Levantine populations have resulted in a non-homogeneous array of genetic clusters.

PS: Iranic groups are also not homogeneous if one includes some of those from South Asia, as evidenced by this previous genetic map of West Eurasians which analyzed Kurds and Iranians together with Pathans and Balochis.

March 28, 2011

Relationship between Iran and the Arabian peninsula

J Hum Genet. 2011 Mar;56(3):235-46. Epub 2011 Feb 17.

Mitochondrial DNA and Y-chromosomal stratification in Iran: relationship between Iran and the Arabian Peninsula.

Terreros MC, Rowold DJ, Mirabal S, Herrera RJ.

Abstract
Modern day Iran is strategically located in the tri-continental corridor uniting Africa, Europe and Asia. Several ethnic groups belonging to distinct religions, speaking different languages and claiming divergent ancestries inhabit the region, generating a potentially diverse genetic reservoir. In addition, past pre-historical and historical events such as the out-of-Africa migrations, the Neolithic expansion from the Fertile Crescent, the Indo-Aryan treks from the Central Asian steppes, the westward Mongol expansions and the Muslim invasions may have chiseled their genetic fingerprints within the genealogical substrata of the Persians. On the other hand, the Iranian perimeter is bounded by the Zagros and Albrez mountain ranges, and the Dasht-e Kavir and Dash-e Lut deserts, which may have restricted gene flow from neighboring regions. By utilizing high-resolution mitochondrial DNA (mtDNA) markers and reanalyzing our previously published Y-chromosomal data, we have found a previously unexplored, genetic connection between Iranian populations and the Arabian Peninsula, likely the result of both ancient and recent gene flow. Furthermore, the regional distribution of mtDNA haplogroups J, I, U2 and U7 also provides evidence of barriers to gene flow posed by the two major Iranian deserts and the Zagros mountain range.

Link

March 06, 2011

AAPA 2011 abstracts

A draft of the abstracts from the 80th meeting of the American Association of Physical Anthropologists is online. Some titles of interest:

Cristian Capelli et al.
Early Y chromosome lineages in Africa: the origin and dispersal of Homo sapiens.

The study of Y chromosome variation in extant populations has provided significant insights into the genetic history of Homo sapiens. Focusing on sub-Saharan Africa, demographic events associated with the spread of languages, agriculture and pastoralism have been targeted but little is known on the early history of the continent. The first two branches of the Y chromosome genealogy, namely haplogroup A and B, are African specific, with average continental frequencies of 14-34%, reaching up to 65% in groups of foragers. Despite the potential of such lineages in revealing signatures of the ancient peopling of the continent, an exhaustive investigation of their distribution and variation is currently missing. Here we show that their systematic dissection provides novel insights into the early history of our species. We highlighted complex pattern of populations’ dynamics among hunter-gatherer communities, evidence for the peopling of western and southern Africa, and showed the retention of the very early human Y chromosome lineages in eastern and central but not southern Africa. These results open new perspectives on the early African history of Homo sapiens, with particular attention to areas of the continent where human fossil remains and archaeological data are scanty.

Aslihan Sen et al.

The genetic history of the Karachays: Insights from mtDNA and Y-chromosome evidence

The Karachay-Malkar population of the northwestern Caucasus Mountains has an interesting but unclear history. Oral traditions indicate that they descended from the Alans, ancient Iranian tribes who entered the region starting in the 1st century BC. However, they now speak a Kipchak Turkic language, which was purportedly brought to the Caucasus by the Kumans from the Minusinsk Basin (Yenisei River-Altai Mountains). They are also allegedly related to the Hun-Bulgars, with the name Malkar/Balkar being evidence for this affiliation. Therefore, to elucidate their genetic past, we characterized genetic variation in 106 Karachay individuals using a combination of HVS1/ HVS2 sequencing and SNP analysis for mtDNAs and SNP and STR analysis for Y-chromosomes. We observed a predominance of mtDNA haplogroups H and U in this population, along with a minority of East Eurasian lineages, and mostly Y-chromosome haplogroups G, I, J and R1. The mtDNA data suggest that the Karachay are most similar to the Adygei, among Caucasus populations, and have affinities with eastern Iranians, supporting the hypothesized link to Scythio-Iranians (Alans), although being quite distant to Turkic speaking indigenous Altaians. By contrast, Y-chromosome data point to genetic links with populations from Anatolia, the Near East and the Balkans, as well as the Volga-Ural region, Central Asia and Siberia, the source area for ancient Turkic populations. Using these data and associated genealogical and linguistic evidence, we attempt to reconstruct the history of the Karachay population and assess its genetic relationships to the diverse ethnolinguistic groups of the Caucasus.

Jasem Theyab et al.
The genetic structure of the Kuwaiti population: mitochondrial DNA markers.

In the past few decades, researchers using human mitochondrial DNA (mtDNA) have significantly contributed to our understanding of human evolution and migration. However, little attention has been paid to the Arabian Peninsula which is assumed to be one of the first inhabited regions following the expansion of early Homo sapiens out of Africa. Recently, a number of investigations have started to reconstruct human expansion through the archaeology and the study of the genetic structure of populations of the Arabian Peninsula. Populations of Kuwait, located in the Northeast portion of the Arabian Peninsula, have not been studied from a molecular genetic perspective. This research investigated the mitochondrial DNA (mtDNA) genetic variation in 117 unrelated individuals to determine the genetic structure of the Kuwaiti population and compared the Kuwaiti population to their neighboring populations. Restriction fragment length polymorphism (RFLP) and mtDNA sequencing analyses were used to reconstruct the genetic structure of Kuwait. The result showed that the Kuwaiti population has a high frequency of haplogroup pre-HV (18%) and U (12%) similar to other Arabian populations. In addition, the African influence was detected through the presence of haplogroup L (1.6%). Furthermore, the MDS plot showed that the Kuwaiti population is clustered with neighboring populations, including Iran and Saudi Arabia, but not Iraq.

Kristin L. Young et al.
Paternal genetic history of the Basque population of Spain.

This study examines the genetic variation in Basque Y chromosome lineages using data on 12 Y-STR loci in a sample of 158 males from four Basque provinces of Spain. In agreement with previous studies, the Basques are characterized by high frequencies of haplogroup R1b (83%). Five additional haplogroups were identified in this sample: E1b1b (6%), J2a (3%), I2 (3%), G2a (2%), and L (1%). Only 8% of haplotypes were found in more than one province, and the AMOVA analysis shows only a small amount of variation (1.71%, p = 0.0369) is accounted for between provinces, demonstrating the overall homogeneity of this population. Gene and haplotype diversity levels in the Basques are on the low end of the European distribution (gene diversity: 0.4268; haplotype diversity: 0.9421). Other isolated populations in Europe, including the Swedish Saami, the Roma in Portugal, and Albanians in Kosovo, also exhibit low haplotype diversity levels. Comparison of the Garza-Williamson Index for the Basques and 36 additional European populations shows no significant impact of a recent genetic bottleneck on the continent. A bootstrapped neighbor-joining tree (R2 = 0.922) of Shriver’s genetic distances (DSW) clusters Basque populations with other Atlantic Fringe groups (Galicia, Ireland) and the non-Indo-European Saami. Paleolithic and Neolithic contribution to the paternal Basque gene pool was estimated by measuring the proportion of proposed Paleolithic (R1b, I2a2) and Neolithic haplogroups (E1b1b, G2a, J2a). The Basque provinces show varying degrees of post-Neolithic contribution in the paternal lineages, with 10.9% Neolithic lineages in the combined sample.
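Haplotype diversity figures like those quoted above are commonly computed with Nei's unbiased estimator, H = n/(n-1) · (1 - Σ pᵢ²), where pᵢ are the haplotype frequencies in a sample of n chromosomes. The abstract does not state which estimator or software was used, so this is an assumption; the function name and toy haplotypes (tuples of hypothetical STR repeat counts) are invented for illustration.

```python
from collections import Counter
from typing import Hashable, Sequence


def nei_diversity(haplotypes: Sequence[Hashable]) -> float:
    """Nei's unbiased diversity: n/(n-1) * (1 - sum of squared frequencies)."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two sampled chromosomes")
    counts = Counter(haplotypes)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)


# Toy sample: each haplotype is a tuple of repeat counts at a few STR loci.
sample = [(13, 24, 14), (13, 24, 14), (14, 23, 14), (15, 22, 13)]
h = nei_diversity(sample)
```

A sample in which every haplotype is distinct gives the maximum value of 1, so values like 0.94 for a sample of 158 men mean that a noticeable share of men carry identical 12-locus haplotypes.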

Timothy D. Weaver
Did a short-term event in the Middle Pleistocene give rise to modern humans?

It is often stated that modern humans originated 250,000-150,000 years ago. This statement implies, at least implicitly, that something ‘‘special’’ happened at this point in the Middle Pleistocene, such as a speciation event that was perhaps triggered by, or resulted in, a bottleneck in human population size. Two pieces of evidence are usually said to support this contention: that living human mitochondrial DNA haplotypes coalesce ~200,000 years ago, and that fossil specimens classified as anatomically modern humans begin to appear shortly afterward. Alternatively, modern human origins could have been a lengthy process that lasted from the divergence of the modern human and Neandertal evolutionary lineages ~400,000 years ago to the expansion of modern humans out of Africa ~50,000 years ago, and nothing particularly ‘‘special’’ happened 250,000-150,000 years ago. Because this alternative model does not posit a discrete origins event, it may be better able to explain why >50,000-year-old fossils are arguably only ‘‘near modern’’ in anatomy. Here I use computer simulations based on theory from population and quantitative genetics to show that the alternative lengthy-process model also is consistent with a ~200,000-year-old mitochondrial DNA coalescence time and the appearance shortly afterward of fossil specimens that, at least for some traits, appear to be anatomically modern. I further discuss how these two models differ in their predictions and whether or not it is possible to distinguish between them with current fossil and genetic evidence.

Steven L. Wang

Regional isolation and extinction? The story of mid-Pleistocene hominins in Asia.

Over the past decade, numerous reviews of the Middle Pleistocene record have taken place in light of new fossil discoveries. However, with primary foci on the Euro-African records, much of the rich fossil evidence in Asia was sidelined and overlooked. It is thus unsurprising that in the minds of many, Asia remains terra incognita— and its hominin record exotic. Moreover, the accuracy of the Asian chronology remains problematic, adding another layer of impediment to our understanding of regional evolution and local adaptation. In this context, I bring a synergistic review of the chronology of mid-Pleistocene hominins from East and South Asia, including recent new dates from key sites such as Zhoukoudian Locality 1 and Hathnora. Using 3-D geometric morphometric data, I examine cranial shape changes between H. erectus and mPH (post-erectus, non-Neandertal mid-Pleistocene Homo), as well as both to later Pleistocene hominins. A large number of not-often-discussed specimens are considered (e.g., Hexian, Nanjing 1, Maba, and Ngawi), many of them original fossils. The cranial anatomy from the Asian mid-Pleistocene suggests the existence of at least two distinctive groups in the region. Additionally, a north-south (geographical) shape difference is observed, hinting the presence of paleodemes each evolving in relative isolation. The shape affinity of mPH to extra-Asian fossils is confirmed; however, depending on the fossil in question (Dali or Narmada), the said affinity to Kabwe and Petralona is exclusive. This, coupled with a limited number of good sample, warrants caution against lumping all Asian mPH within the H. heidelbergensis hypodigm.

John Hawks
Deep genealogy, Neandertal ancestors, and our accelerating evolution

Anthropologists have long confused genealogical and behavioral definitions of humanity. At least five out of six living humans have Neandertal ancestors, which comprise an estimated 1 to 4% of their ancestry. Human genes have divergent genealogical histories, representing multiple "archaic" populations inside and outside of Africa. Late Pleistocene populations show comparable technical and symbolic abilities within and outside of Africa. A humanlike vocal-auditory channel had appeared before 600,000 years ago. Yet humans of the last 40,000 years have evolved extremely rapidly, in some instances diversifying; in others paralleling each other. Using new visualization methods, I examine the genealogical patterns of human genes. The impact of our rapid Holocene evolution simplifies some genealogical relationships while partially obscuring earlier ones. The genetic echoes of Neandertals and other archaic populations emerge against a slim network binding all living people. These networks show the impact of adaptive potential in ancient human populations. A broad view of human cultural and technical records suggests that gene-culture interaction may be a fundamental aspect of Pleistocene human evolution.

December 01, 2010

Y-chromosomes of Maronites from Lebanon

The freely available supplementary material contains a real treasure trove of Y-STR haplotypes for different populations of Lebanon and from Iran.

UPDATE: The paper uses the wrong Zhivotovsky et al. "evolutionary" mutation rate, so its age estimates are inflated about 3-fold. Hence, the authors' conclusion that religious differences were superimposed on an already structured population is also wrong, in my opinion.

They write, for example, that:

The Christian–Muslim split dated to 3475 (2000–6025) ybp for pooled Muslims and 3325 (1875–4225) ybp for pooled Christians.

Divide these by 3 and you get roughly 1.1-1.2 ky, which is quite close (given the huge confidence intervals, of course) to the arrival of Islam in the country. Once again, the genealogical mutation rate conforms with history, while the "evolutionary" one suggests a speculative scenario about the supposed long-term maintenance of a population structure on which the Islam-Christian distinction was superimposed.
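The rescaling is simple arithmetic: since an age estimate is inversely proportional to the assumed mutation rate, switching to a rate about three times faster divides the point estimate and both confidence limits by about 3. A minimal sketch, taking the factor of 3 from the text as an approximation (`rescale_date` is a hypothetical helper):

```python
from typing import Tuple

Interval = Tuple[float, float, float]  # (point estimate, lower bound, upper bound), in years


def rescale_date(estimate: Interval, rate_ratio: float = 3.0) -> Interval:
    """Rescale a mutation-rate-based date when switching to a faster rate.

    Ages are inversely proportional to the assumed mutation rate, so a rate
    `rate_ratio` times faster divides every bound by `rate_ratio`.
    """
    point, low, high = estimate
    return (point / rate_ratio, low / rate_ratio, high / rate_ratio)


muslims = rescale_date((3475, 2000, 6025))     # about (1158, 667, 2008) years
christians = rescale_date((3325, 1875, 4225))  # about (1108, 625, 1408) years
```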

European Journal of Human Genetics advance online publication 1 December 2010; doi: 10.1038/ejhg.2010.177

Influences of history, geography, and religion on genetic structure: the Maronites in Lebanon

Marc Haber et al.

Cultural expansions, including of religions, frequently leave genetic traces of differentiation and in-migration. These expansions may be driven by complex doctrinal differentiation, together with major population migrations and gene flow. The aim of this study was to explore the genetic signature of the establishment of religious communities in a region where some of the most influential religions originated, using the Y chromosome as an informative male-lineage marker. A total of 3139 samples were analyzed, including 647 Lebanese and Iranian samples newly genotyped for 28 binary markers and 19 short tandem repeats on the non-recombinant segment of the Y chromosome. Genetic organization was identified by geography and religion across Lebanon in the context of surrounding populations important in the expansions of the major sects of Lebanon, including Italy, Turkey, the Balkans, Syria, and Iran by employing principal component analysis, multidimensional scaling, and AMOVA. Timing of population differentiations was estimated using BATWING, in comparison with dates of historical religious events to determine if these differentiations could be caused by religious conversion, or rather, whether religious conversion was facilitated within already differentiated populations. Our analysis shows that the great religions in Lebanon were adopted within already distinguishable communities. Once religious affiliations were established, subsequent genetic signatures of the older differentiations were reinforced. Post-establishment differentiations are most plausibly explained by migrations of peoples seeking refuge to avoid the turmoil of major historical events.

Link

September 28, 2010

Some ADMIXTURE estimates in Eurasia

(Last Update: Sep 29)

Continuing my exploration of ADMIXTURE, I turned to the HGDP data, which has 660,918 SNPs for a wide assortment of worldwide populations. After pruning 12,086 SNPs with more than 1% missing genotypes, I was still left with ~650k SNPs.
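The missingness filter described here can be sketched as follows. Genotypes are assumed to be coded as 0/1/2 allele counts with `None` for a missing call; both that encoding and the function name are hypothetical choices for illustration.

```python
from typing import List, Optional, Sequence


def filter_by_missingness(genotypes: Sequence[Sequence[Optional[int]]],
                          max_missing: float = 0.01) -> List[int]:
    """Return indices of SNPs whose fraction of missing calls is <= max_missing.

    genotypes[i] holds the calls for SNP i across all individuals, with
    None marking a missing genotype.
    """
    kept = []
    for idx, calls in enumerate(genotypes):
        missing = sum(1 for g in calls if g is None)
        if missing / len(calls) <= max_missing:
            kept.append(idx)
    return kept
```

With a 1% threshold, a SNP missing in exactly 1 of 100 individuals is kept, while one missing in 2 of 100 is removed, matching the rule of excluding SNPs with more than 1% missing genotypes.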

Here are some experiments on this dataset. First, a clustering with K=2 of Han Chinese, Russians, and Orcadians (left to right)

The emergence of 2 clusters (red=Mongoloid, blue=Caucasoid) is as expected, with Russians showing a small participation in the red cluster (7.2%). These northern Russians are believed to have a substantial Finno-Ugric genetic origin, so this is in line with a recent estimate that the eastern component in the westernmost Finno-Ugric speakers is less than 10% (but see below).

Notice a couple of Chinese individuals with a small Caucasoid component: as I've mentioned before, Mongolians, and presumably northern Han, have a small Caucasoid component from early movements of Iranian speakers from the west. That's an advantage of doing your own admixture analysis: you can look at the data in fine detail rather than rely on the published figures.

Next, a clustering of Orcadians, Uygur, and Han Chinese:

The variable admixture in Uygurs is evident (47.2-63.7%, mean: 54.2%)

Next, a clustering of Druze, Bedouin, and Bantu from Kenya.

Druze appear completely Caucasoid (red), Bantu completely Negroid (save for a couple of individuals), while Bedouins show a quite variable minor Negroid component. This variable African contribution (0-17.6%) makes an elongated cluster out of Bedouins in a recent analysis, pulling them away from other Middle Eastern populations in a Sub-Saharan direction.

Finally, I clustered European populations together with Mandenka and Han Chinese:

The populations are in the following order: Han, Mandenka, Orcadian, French Basque, French, North Italian, Tuscan, Sardinian, Russian.

Here are the admixture proportions:

Notice how the eastern component in Russians is now estimated as 10.9%. This probably reflects the inclusion of French Basque and Sardinians, i.e., populations that historically had no opportunity for East Eurasian admixture, at the western end of the spectrum, rather than only Orcadians. This underscores the importance of having appropriate poles in inter-continental admixture estimates (see Appendix I).

Note also that the 100% value for the Han Chinese is not incompatible with the presence of the two aforementioned Caucasoid-admixed individuals, who are present here with an estimated 1.9% and 0.5% such admixture. However, this contributes little to the sample average of 40+ individuals.

The minor (0.1%) Sub-Saharan admixture in Tuscans and Sardinians is also interesting. As you can guess from the figure, this stems from a handful of individuals (green specks) with less than 1% admixture, which is, however, more than the numerical low of 0.001% inferred for most Europeans by the software.

UPDATE I: Eurasian Cline

Below is a run for the following populations (left-to-right: French Basque, Russians, Uygur, Mongolians, Daur, Han Chinese). Notice that the Mongolic speakers (Mongolian and Daur from HGDP) have a small Caucasoid admixture, as I have mentioned before.

APPENDIX I: The importance of choosing poles

The choice of appropriate poles in the estimation of inter-continental admixture is extremely important.

If there is a racial admixture continuum between two major races, such as we observe in Eurasia, then we can express each intermediate population as a weighted sum of populations that live to the east and west of it.

For example, I will use a variable in the interval [0, 1] to represent the position in the continuum, with 0 being pure western and 1 pure eastern.

A population at 0.4 can be expressed as the following weighted sum:

0.4 = 0.6*0 + 0.4*1

i.e., as an admixture of 60% western, and 40% eastern.

But, it can also be expressed as e.g.,

0.4 = 0.612*0.02 + 0.388*1

Notice that the choice of a slightly eastward-tilted "western pole" (at position 0.02 in the continuum) has resulted in a reduction of the inferred eastern component (from 40% to 38.8%).

This is exactly what happened in our example: Russian eastern admixture reduced when we used Orcadians, rather than French Basque as the western pole.
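The pole arithmetic above generalizes directly: if the western and eastern reference populations sit at positions w and e on the continuum, a population at position x has an eastern fraction of (x - w)/(e - w). A sketch of this one-dimensional toy model, using the illustrative positions from the text rather than real estimates (`eastern_fraction` is a hypothetical name):

```python
def eastern_fraction(x: float, west_pole: float = 0.0, east_pole: float = 1.0) -> float:
    """Eastern share of a population at position x, modeled as a two-way
    mixture of the given western and eastern poles."""
    if east_pole == west_pole:
        raise ValueError("poles must differ")
    return (x - west_pole) / (east_pole - west_pole)


print(round(eastern_fraction(0.4), 3))                  # 0.4
print(round(eastern_fraction(0.4, west_pole=0.02), 3))  # 0.388
```

Moving either pole towards the population being estimated shrinks the inferred share of that pole's opposite, which is why the same Russian sample can appear more or less "eastern" depending on which western populations anchor the analysis.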

Note also that this is all done automatically. No one told ADMIXTURE to identify these two poles; it was the presence of unlabeled individuals from different ends of the spectrum that influenced the admixture estimates for the rest.

APPENDIX II: Latent populations

Another important point that needs to be remembered has to do with the possible existence of latent ancestral populations.

For example, it is true that Eurasia (minus South Asia) is economically described as a continuum from the Caucasoids of the Atlantic coast to the Mongoloids of the Pacific, with a transition zone in Central Asia and Siberia, and spillovers on either side. But, we cannot exclude the prehistoric existence of other races in the Eurasian landmass that do not exist today in a relatively unadmixed form.

In Eurasia, the Proto-Uralic race was postulated as such a "third race" with features of its own and not reducible to simple Caucasoid-Mongoloid admixture. It is difficult to determine whether these features are ancestral peculiarities (prior to admixture with Caucasoids and Mongoloids), or if they have arisen in a mixed Caucasoid-Mongoloid population.

It is also important to understand how such latent populations affect genetic continua:

First, if the latent population is equidistant from the two major races, then its admixture has no effect on an individual's position in the continuum between the two races. However, it is possible that the latent population was more related to one of the two major races. In that case, admixture with it will move a population towards that race.

So while the jury is still out about the existence of a Proto-Uralic race in Eurasia, its effects on admixed populations indicate that, if it existed, it was genetically closer to Mongoloids than to Caucasoids.

September 08, 2010

ASHG 2010 abstracts

The 2010 meeting of the American Society of Human Genetics is in November. Here are some interesting abstracts that caught my eye:

It's nice to finally see a genomic study on the Greek population.
P. Paschou et al. Evaluation of the HapMap dataset as reference for the Greek population.

The HapMap project has provided a unique tool for the analysis of human genetic variation, providing reference information for allele frequency and genotype distributions as well as linkage disequilibrium patterns of Single Nucleotide Polymorphisms (SNPs) across the entire genome. The latest release of HapMap phase 3 data provides genotypes for millions of SNPs in 11 populations from around the world, with Europe being represented by the CEU (originating from Northwestern Europe) and the TSI populations (Tuscan Italians from Southern Europe). Although initial studies support the fact that the CEU can be used as reference for the selection of tagging SNPs in other European populations, a critical step in the design of genetic association studies, this hypothesis has not been extensively studied across Europe and in particular in Southern Europe. We set out to explore the extent to which the HapMap populations can be used as reference for a previously unstudied population of South-Eastern Europe, the Greek population. To do so we studied genomic variation in 1,813 SNPs, genotyped by our group in 56 individuals of Greek origin, and compared them to the CEU and TSI genotypes (1,813 SNPs from the CEU HapMap dataset and 1,205 from the TSI dataset). The studied SNPs are spread over 13 autosomal chromosomes and 26 regions, ranging in size from 120Kb to more than 4Mb. Genotype, allele frequency, and pairwise LD measures were compared across all three populations. PCA was used in order to identify those markers that are responsible for the observed inter-sample variance. Tagging SNPs were selected in the CEU and TSI samples and their transferability to the Greek population was tested, using both the r2 metric as well as the efficiency of genotype imputation of the non-selected SNPs. 
Our results demonstrate that, although the CEU population can to some extent be used as reference for the Greek population, it is preferable to use as reference a European population of closer genetic ancestry, like the TSI. These results are applicable in medical genetics, in order to inform the design of genetic association studies, as well as in studies of evolutionary relationships of Southern European populations.

One of the great problems of Eurasian anthropology is whether the Uralic populations are simply variable admixtures of Caucasoids and Mongoloids or they contain a tertium quid in the form of a Proto-Uralic element. The latter need not be distinct from the other two, as it can also be an old or stabilized blend of the two major Eurasian races that later admixed with more recent groups on either side. The abstract does not seem promising in this respect, i.e., in identifying a common core of ancestry among Uralic speakers in addition to their variable east-west admixture, but it would be nice to see if anything like that exists in the paper.

K. Tambets et al. Haploid and autosomal variation within a linguistic continuum of the Uralic-speaking people of Eurasia.

For about last two decades the examination of uniparentally inherited genetic marker systems revealing the variation embedded in mtDNA and Y chromosome has been the main tool in the studies of human genetic origins. Within few recent years the analysis of the genome-wide SNP data of individuals from different populations has started to give promising new insights in the field of human population genetics. The uniparentally inherited markers have shown slightly different demographic scenarios for the maternal and paternal lineages of North Eurasian, particularly of European Uralic-speaking populations. The geographical location of a population has evidently been the most important component that dictates the proportion of western and eastern mtDNA types in the gene pool of Uralic-speakers. Thus, the palette of maternal lineages of the Uralic-speakers resembles that of their geographically close European or Western Siberian Indo-European and/or Altaic-speaking neighbours, respectively. At the same time, the most frequent North Eurasian Y chromosome type N1c, that is also a common link between almost all Uralic-speakers, is with few exceptions rare, if present at all, among Indo-European-speakers of Western and Southern Europe. Here we combine genome-wide high density SNP data (650 000 SNPs, Illumina) with uniparentally inherited mtDNA and Y-chromosome variation of 16 Uralic-speaking populations to assess their place on the genetic landscape of North Eurasia. By the use of principal component and structure-like analysis on the autosomal data we show that the proportions of western and eastern ancestry components among the Uralic-speakers are determined mostly by geographical factors. The westernmost populations from Europe, both Uralic- and Indo-European speakers, are similar in their pattern of ancestry components and show low levels (less than 10%) of the eastern component. 
Conversely, the eastern ancestry component is dominant (60-70%) in the gene pool of the Siberian Uralic-speakers. In general, the genome-wide analyses corroborate the results of mtDNA analysis and do not reflect the common genetic characteristics between western and eastern Uralic-speakers at the level seen in the case of N1c. Interestingly, among the Saami from North Europe, who are often considered „outliers“ in genetic studies, the dominant western component is accompanied by a 30% eastern component, making them more similar to Volga-Uralic populations than to their closest neighbours.
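The abstract estimates western and eastern ancestry proportions with PCA and a structure-like clustering method. As a much simpler illustration of what a two-way proportion means, the sketch below fits a single mixing fraction f to population allele frequencies by least squares, assuming the target's frequency at each locus is (1 - f) times the western frequency plus f times the eastern one. This is not the authors' method, and all frequencies are hypothetical.

```python
# Sketch: least-squares estimate of a two-way admixture proportion from
# population allele frequencies. All numbers below are hypothetical.

def eastern_fraction(p_target, p_west, p_east):
    """Estimate f in p_target ~ (1 - f) * p_west + f * p_east across loci."""
    num = 0.0
    den = 0.0
    for pt, pw, pe in zip(p_target, p_west, p_east):
        diff = pe - pw               # loci where the sources differ carry more weight
        num += (pt - pw) * diff
        den += diff * diff
    if den == 0.0:
        raise ValueError("source populations have identical frequencies")
    # Clamp to [0, 1] because proportions outside that range are meaningless.
    return min(1.0, max(0.0, num / den))


# Hypothetical frequencies at three loci for a western and an eastern source.
west = [0.10, 0.80, 0.50]
east = [0.90, 0.20, 0.10]
# A target population that is an exact 70/30 mixture of the two sources.
target = [0.7 * w + 0.3 * e for w, e in zip(west, east)]
print(round(eastern_fraction(target, west, east), 3))  # 0.3
```

With real data the fit is never exact, because drift since admixture and sampling noise shift each locus independently.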

This seems to validate my thoughts on relics and their importance in age estimation.

U. A. Perego et al. The Initial Peopling Of The Americas: An Ever-Growing Number Of Founding Mitochondrial Genomes From Beringia

Genetic evidence based on mitochondrial DNA (mtDNA) has recently revealed the existence of additional founding lineages that have contributed to the first peopling of America’s double-continent in addition to the more popular five Native American haplogroups (A2, B2, C1, D1 and X2a), and has also demonstrated the need for additional sampling and analysis of some of the already known but poorly characterized lineages. One paradigmatic example is represented by the pan-American haplogroup C1. Two of its sub-branches (C1b and C1c) harbor ages and geographical distributions that are indicative of an early arrival from Beringia about 15-17,000 years ago, concomitantly with the other currently accepted Paleo-Indian founders. However, the estimated age of C1d, the third Native American subset of C1, is only 8-10,000 years, which is suggestive of a much later entry and spread in the Americas. In this study, we shed light on the origin of this enigmatic Native American branch of C1 by completely sequencing a large number of C1d mitochondrial genomes from a wide range of geographically diverse, mixed and indigenous American populations. The revised phylogeny shows that the age previously reported for C1d was heavily underestimated and indicates that C1d is ancient enough to be among the founding Paleo-Indian mtDNA lineages. Moreover, our results reveal that there were two C1d founder genomes for Paleo-Indians that most likely arose early (~16kya), either in the dynamic Beringian gene pool, or at a very initial stage of the Paleo-Indian southward migration. This brings the recognized maternal founding lineages of Native Americans to the unexpected number of 15, and indicates that the overall number of Beringian or Asian founder mitochondrial genomes will probably continue to increase as more Native American haplogroups reach the same level of phylogenetic resolution as we obtained here for C1d.
Additionally, we have confirmed a nearly identical geographic distribution pattern for haplogroup C1d when comparing samples collected in the general mixed population with those from native tribal groups, as it was also reported previously for haplogroups X2a and D4h3. This substantiates the validity of searching large public mtDNA databases (such as the one available through the Sorenson Molecular Genealogy Foundation, www.SMGF.org) for novel founder candidates able to reveal unknown details concerning the ancient human history of the Americas.
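The abstract does not say how the C1d age was computed. One common approach for mtDNA clades is the rho statistic: the average number of mutations from the clade's root to each sampled genome, multiplied by a calibrated time per mutation. The sketch assumes that approach and a hypothetical rate of one mutation per 3,000 years; it only illustrates why sparse sampling of a clade can make it look younger than it is.

```python
from statistics import mean

def rho_age(mutations_from_root, years_per_mutation):
    """Rho-based age estimate: mean number of mutations between the clade's
    root and each sampled genome, times a calibrated time per mutation.
    `years_per_mutation` is a hypothetical input, not a value from the abstract."""
    if not mutations_from_root:
        raise ValueError("need at least one sampled genome")
    return mean(mutations_from_root) * years_per_mutation

RATE = 3000  # hypothetical: one mutation per 3,000 years

# Sparse sampling: only shallow, closely related genomes are known.
print(rho_age([1, 1, 2, 2], RATE))  # 4500.0
# Denser sampling adds deeper, more divergent genomes and raises the estimate.
print(rho_age([1, 1, 2, 2, 5, 7], RATE))
```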

Another interesting abstract. I've written before about the association of Y-chromosome haplogroups with the spread of Semitic speakers and the agreement with language phylogenetics.

N. Al-Zahery et al. The male gene pool of the contemporary Mesopotamia marsh population supports their Semitic origin.

The origin of the modern Mesopotamia marsh people, who are locally called “Ma’dan” or “Marsh’s Arabs”, is a question of great interest. Based on their life-style (living in reed houses, grazing of water buffalo and other aspects) and local archaeological sites, many historians and archaeologists believe they may have Sumerian ancestry. Although little is known about the origin of Sumerians themselves, two main hypotheses have been advanced in this regard. According to the first, Sumerians were a group of populations which migrated from the “South East” following a seashore route through the Arabian Gulf, and settled down in the southern marshes of Iraq. According to the second, the advancement of the Sumerian civilization is the result of migration from the mountainous area of Anatolia to the southern marshes of Iraq where they settled, absorbing previous populations. In order to shed some light on the genetic origin of the Mesopotamia marsh population, we investigated the male gene pool of 145 DNA samples of modern Mesopotamia people, still living in marshes in the south of Iraq. The analyses of Single Nucleotide Polymorphisms (SNPs) and Short Tandem Repeats (STRs) of the paternally transmitted Male Specific region of the Y chromosome (MSY) revealed that more than 80% of marsh Y chromosomes belong to haplogroup (Hg) J1-M267, the autochthonous haplogroup of Middle Eastern/Semitic speakers, with a possible recent expansion and/or founder effect reflected by the reduced STR variability. In particular, 90% of them were assigned to the J1e-M267-PAGE08 sub-haplogroup, which is the predominant Y chromosome lineage among Middle Eastern Arab populations (Yemen, Qatar, UAE, and Levant). Thus, these findings testify to a strong Semitic Arabian component in the contemporary Mesopotamia marsh population, at least on the paternal side, whereas no clear Anatolian and/or South Asian genetic evidence has been detected.
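The abstract links reduced STR variability within J1 to a possible recent expansion or founder effect. A common summary of Y-STR diversity is the per-locus variance in repeat counts, averaged across loci; the sketch below computes it for two made-up sets of haplotypes. Whether the authors used this exact statistic is not stated, so treat it as an assumption.

```python
from statistics import mean, variance

def mean_str_variance(haplotypes):
    """Average, over STR loci, of the sample variance in repeat counts.
    Each haplotype is a tuple of repeat counts in a fixed locus order."""
    if len(haplotypes) < 2:
        raise ValueError("need at least two haplotypes")
    loci = zip(*haplotypes)  # regroup the repeat counts locus by locus
    return mean(variance(locus) for locus in loci)

# Hypothetical three-locus haplotypes: a tight, star-like cluster typical of a
# recent expansion, versus an older, more diverse set of lineages.
star_like = [(14, 12, 16), (14, 13, 16), (15, 12, 16), (14, 12, 16)]
diverse = [(12, 10, 14), (16, 13, 17), (14, 12, 15), (17, 11, 18)]
print(mean_str_variance(star_like) < mean_str_variance(diverse))  # True
```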

The finding of haplogroup I in China is surprising, as I is not generally found that far away from Europe. It would be interesting to see what the actual haplotypes are.

Y. Lu et al. Western Eurasian Y chromosomes found in the Chinese Salar ethnic group

Salar is a small Western-Turkic-speaking population living mostly in Qinghai province of China. The languages most similar to Salar are all spoken far away, in Turkmenistan. Historical records suggested that they may be descendants of the Turkic nomadic tribes in Central Asia. In this study, 141 Salar Y chromosomes were analyzed for 39 SNP and 14 STR markers to investigate the potential imprints of their western ancestors. The most frequent haplogroup (Hg) in this population sample is Hg R, comprising 40% of all Y chromosomes. Most of these Hg R samples belong to R1a1 (M17), which is distributed across a wide geographic region including South Asia, Eastern Europe, Central Asia, and South Siberia. Four other Western Eurasian haplogroups (G 2%, H 5%, I 3%, J 3%) were also found in the Salar Y chromosome gene pool. These paternal lineages of the Salar are absent in their East Asian neighbors but frequent in Central Asia. Y-STR-based analyses also grouped the Salar with Central Asians. On the other hand, the Salar also have low frequencies of the East Asian-specific Hg D and Hg O, suggesting possible gene flow from their neighboring populations. This Y chromosome study demonstrated that the Salar have retained the Western Eurasian paternal lineages of their Central Asian ancestors, although they may have migrated to central China about 800 years ago.
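Haplogroup frequencies from 141 chromosomes carry substantial sampling error, which matters when judging a surprising minority lineage such as I. The sketch computes a Wilson score interval for a proportion; the count of 4 carriers is an assumption, since 3% of 141 is about 4.2 and the abstract only gives rounded percentages.

```python
import math

def wilson_interval(k, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion k/n."""
    if n <= 0 or not 0 <= k <= n:
        raise ValueError("need n > 0 and 0 <= k <= n")
    p = k / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half

# Assumed: about 4 haplogroup I carriers among 141 Salar Y chromosomes.
low, high = wilson_interval(4, 141)
print(f"{low:.3f} to {high:.3f}")
```

For 4 of 141 this gives roughly 1% to 7%, so the frequency of I in the Salar is poorly pinned down by a sample of this size.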

I wish that more "people pairs" would be studied this way, as it would give us some good insight into how migration affects gene pools (allele frequency changes, founder effects, possible social selection, etc.)

M. Davis et al. Ancient and recent demographic events influence mitochondrial DNA diversity in an immigrant Basque population

The Basques are an ancient people, considered by many anthropologists to represent the oldest extant European population. Because of this, they have been the subject of numerous sociological and biological investigations. The Basque Diaspora, a relatively recent demographic expansion of the Basque population, has until now been overlooked in genetic studies. Samples were taken from 53 individuals with Basque ancestry in Boise, Idaho, and the mitochondrial DNA (mtDNA) sequence variation of the first and second hypervariable regions was determined. Thirty-six mtDNA haplotypes were detected in the sample. When the genetic diversity in the Idaho sample was compared with other Basque populations, signatures of founder effects were observed, consistent with both the recent and ancient history of Basque mitochondrial lineages. There has been a marked alteration of haplogroup frequency and diversity, and there is a slight reduction in other measures of diversity in the NW Basque population compared to the native Basque population. We have found a relatively high percentage of the Cambridge Reference Sequence (rCRS) haplotype for hypervariable regions I and II, which is absent in previous studies of Basque mtDNA, and rare in other Spanish populations. The amount of nucleotide diversity is consistent with a sample that is predominantly haplogroup H, which is especially common in the Basque regions of Europe, due to ancient migrations and expansions out of glacial refugia. This is the first report of mtDNA diversity in an immigrant Basque population, and we find that the diversity in NW Basques can be explained by the recent history of migration, as well as the phylogeography and diversity of the major European haplogroups.
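The abstract compares diversity between the Idaho sample and native Basques without naming the statistics used. One standard measure for mtDNA haplotypes is Nei's unbiased haplotype diversity, h = n/(n - 1) × (1 - Σ p_i²), where p_i is the frequency of haplotype i in a sample of n sequences. The sketch below uses hypothetical haplotype labels, not the study's data.

```python
from collections import Counter

def haplotype_diversity(haplotypes):
    """Nei's unbiased haplotype diversity: n/(n-1) * (1 - sum of p_i^2)."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two sequences")
    counts = Counter(haplotypes)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1 - sum_sq)

# Hypothetical sample of 10 sequences; a common rCRS-identical haplotype lowers h.
sample = ["rCRS", "rCRS", "rCRS", "hap2", "hap2",
          "hap3", "hap4", "hap5", "hap6", "hap7"]
print(round(haplotype_diversity(sample), 3))  # 0.911
```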

W. S. Watkins et al. Admixture in New World populations: an analysis of Y-chromosome, mtDNA, and genome-wide microarray data

The first major interaction between Native Americans and Europeans is documented historically and occurred less than 550 years ago. This recent time frame provides an excellent opportunity to investigate the effects of admixture between two populations that were previously separated for hundreds of generations. To characterize European admixture in Native American populations, we sampled and analyzed a group of isolated Totonac agriculturists from tropical Mexico near Veracruz and a group of native Bolivians predominantly from the mountainous region near La Paz, Bolivia. Mitochondrial sequencing of HVS1 showed that all samples had pre-Columbian mtDNA haplogroups (A, B, C, and D). Using a panel of 48 STRs or 12 Y-chromosome SNPs, Totonac Y-chromosome lineages were all assigned to the pre-Columbian haplogroup Q1a3a, and Bolivian Y-chromosome lineages were assigned to haplogroups Q1a3a, R1, and J2. Haplogroups R1 and J2 are common in European populations. Principal components analysis (PCA) using >800K autosomal SNPs typed in 24 Totonacs and 23 Bolivians showed that all Totonacs and 14 Bolivians clustered distinctly from Eurasian individuals. Nine Bolivians, however, were positioned between the New World and European PCA clusters. Admixture analysis showed that these nine samples had 21-33% European admixture using a European reference population. All three observed Y-chromosome haplogroups, including the well-studied pre-Columbian haplogroup Q1a3a, occurred in the admixed individuals. Two of the nine admixed individuals had pre-Columbian mtDNA and Y-chromosome haplogroups but 21-23% European ancestry. This result demonstrates that Y-chromosome and mtDNA haplogroups are only partial indicators of an individual’s complete ancestry.
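The closing point, that Y-chromosome and mtDNA haplogroups are only partial indicators of ancestry, follows from simple genealogical arithmetic: g generations back there are 2^g ancestors, but the Y chromosome and mtDNA each trace only one of them. The sketch works this through for a hypothetical pedigree; the autosomal figure is an expectation, since recombination makes actual shares vary between individuals.

```python
def uniparental_fraction(generation):
    """Share of the 2**generation ancestors at a given generation that sit on
    the direct paternal (Y) or direct maternal (mtDNA) line."""
    if generation < 1:
        raise ValueError("generation must be >= 1")
    return 2 / 2 ** generation

def expected_ancestry(ancestors_from_source, generation):
    """Expected autosomal share contributed by some of the 2**generation
    ancestors at that generation (actual shares vary with recombination)."""
    return ancestors_from_source / 2 ** generation

# Great-grandparents are generation 3: only 2 of 8 are traced by Y and mtDNA.
print(uniparental_fraction(3))  # 0.25
# Hypothetical: two European great-grandparents, neither on the direct lines,
# give about 25% expected European ancestry with pre-Columbian Y and mtDNA.
print(expected_ancestry(2, 3))  # 0.25
```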

Readers of the blog know that I don't agree with the scenario presented in the following abstract. The serial founder effect idea is used by geneticists to explain the overall reduced genetic diversity of our species (that we appear to be young, in evolutionary terms). Personally, I don't see how a smart, expanding species that all of a sudden had access to the resources of the landmass of Eurasia went through these extreme bottlenecks.
I think that the alternative of a larger human population, genetic diversity reduced across the species by ongoing climate- and culture-mediated selection, and admixture within Africa itself (where a particular expanding H. sapiens group must've co-existed with pre-existing hominids, anatomically modern or not) has merit.

J. Long et al. Evidence for archaic admixture in contemporary non-African human populations

Analyses of large-scale genetic data sets show evidence for a series of founder effects that occurred as modern humans left Africa and settled the rest of the world. Nonetheless, research on modern humans has not ruled out the possibility that other processes, such as local gene flow, or mixing between archaic and modern humans, have also contributed to modern human diversity. Recent analyses of the Neanderthal genome make archaic admixture a salient issue because they show evidence for mixing between Neanderthals and out-of-Africa migrants. The present study examines evidence for archaic admixture in genotypes for 619 microsatellite loci collected from over 2,000 individuals from 100 human populations. We obtained these data from the Marshfield Clinic collection. The populations analyzed represent all inhabited continents of the world. In our analysis, we formulate the serial founder effects (SFE) model as a special case of a phylogenetic model promoted by Cavalli-Sforza and his associates. In this light, the SFE process makes four predictions: 1) A tree of descent according to the pattern of fissions. 2) The root of the tree lies in Africa. 3) The length of each branch is proportional to the ratio of evolutionary time to effective population size. 4) The gene identity between all pairs of populations that share the same most recent common ancestor is equal in expectation. Using hypothesis tests based on generalized hierarchical statistical models, we find good agreement between the SFE predictions and diversity within and between African populations, and we find good agreement between the SFE predictions and diversity between non-African populations. However, there is more diversity within the non-African populations than the SFE model can account for. This makes for greater genetic distance between Africans and non-Africans than otherwise expected. How and where did the non-Africans obtain this diversity?
A simple explanation for the finding is that the earliest migrants out-of-Africa mixed with an archaic population such as Neanderthals prior to their expansion throughout Europe and Asia. Coalescent based computer simulations of the SFE model with mixing support our interpretation. The time and place that we detect mixing coincides perfectly with that detected in a recent examination of Neanderthal genome sequences. Our study shows that genomic diversity in modern humans still reflects ancient events and processes.
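The third SFE prediction rests on genetic drift: in a population of N diploid individuals, expected heterozygosity falls by a factor of (1 - 1/(2N)) per generation, so over t generations the loss depends on roughly t/(2N), the ratio of time to effective size. The sketch chains such losses across a series of founder events to show the steady decline in diversity a pure SFE model predicts. It ignores mutation, migration, and admixture, and its parameters are hypothetical; the paper itself uses hierarchical statistical models, not this calculation.

```python
def serial_founder_heterozygosity(h0, founder_sizes, generations_per_step=1):
    """Expected heterozygosity after each step of a serial-founder chain under
    pure drift: H_next = H * (1 - 1 / (2 * N)) ** t. Mutation, migration and
    admixture are deliberately ignored."""
    values = [h0]
    h = h0
    for n in founder_sizes:
        h *= (1 - 1 / (2 * n)) ** generations_per_step
        values.append(h)
    return values

# Hypothetical chain of five founder events, each with N = 50 for 10 generations.
for step, h in enumerate(serial_founder_heterozygosity(0.8, [50] * 5, 10)):
    print(step, round(h, 3))
```

Extra diversity in non-Africans, beyond what such a monotone decline allows, is the signal the authors attribute to archaic admixture.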

C. Flores et al. Using EuroAIMs to measure admixture proportions in atypical European populations: the case of Canary Islanders

Using ancestry informative markers (AIMs) reduces the number of markers needed for population stratification adjustments in association studies. As few as 100 AIMs are sufficient to adjust for the largest European axis of differentiation (i.e. EuroAIMs). However, their use for ancestry inference and adjustment in association studies in atypical European populations such as the Canary Islanders, a recently African-admixed population from Spain, needs to be addressed. We aimed to explore whether EuroAIMs were suitable both for the inference of Spanish and Northwest African admixture proportions and for ancestry adjustments in association studies including samples from Canary Islanders. We analyzed samples from Canary Islanders, mainland Spanish (IBE) and Northwest Africans (NWA) for 93 EuroAIMs and compared the data with CEU and YRI from HapMap, Basques and Mozabite from HGDP, as well as with previously analyzed European samples. The major genetic difference was observed between NWA and all European populations, preserving the northwest-to-southeast differentiation of European populations in the second axis. Analyses revealed that Canary Islanders were intermediate between IBE and NWA, and that direct sub-Saharan African influences were negligible. Assessment of individual admixtures without prior population information clearly identified two subpopulations corresponding to NWA and IBE, while Canary Islanders were admixed with an average of 17.4% Northwest African contribution varying largely among individuals (range 0-95.7%). As few as 23 EuroAIMs correctly estimated population membership to IBE and NWA, while 69 EuroAIMs were required to accurately estimate individual admixture proportions in Canary Islanders. Ancestry estimates based on a subset of 69 EuroAIMs also controlled for significant allele frequency differences between IBE and Canary Islanders.
These data suggest that a handful of EuroAIMs would be useful to control false-positives in association studies performed in Spanish populations. Supported by FUNCIS 23/07 and grants from the Spanish Ministry of Science and Innovation PI081383 and EMER07/001 to CF.
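The abstract estimates individual admixture "without prior population information", in the style of STRUCTURE-like clustering. A simpler, supervised variant, sketched below, treats the Northwest African and Spanish allele frequencies as known and picks the mixing fraction that maximizes the binomial likelihood of one person's AIM genotypes by grid search. The frequencies and genotypes are simulated, and this is not the authors' pipeline.

```python
import math
import random

def estimate_admixture(genotypes, p_a, p_b, steps=1000):
    """Grid-search maximum-likelihood estimate of one individual's ancestry
    fraction from source A. p_a/p_b are the frequencies of a reference allele in
    the two sources; genotypes[i] counts copies (0, 1 or 2) of that allele."""
    best_frac, best_ll = 0.0, -math.inf
    for step in range(steps + 1):
        frac = step / steps
        ll = 0.0
        for g, pa, pb in zip(genotypes, p_a, p_b):
            q = frac * pa + (1 - frac) * pb      # allele frequency in this person
            q = min(max(q, 1e-9), 1 - 1e-9)      # guard against log(0)
            ll += g * math.log(q) + (2 - g) * math.log(1 - q)
        if ll > best_ll:
            best_frac, best_ll = frac, ll
    return best_frac

# Simulated, hypothetical reference frequencies at 300 markers.
rng = random.Random(1)
p_nwa = [rng.uniform(0.05, 0.95) for _ in range(300)]
p_ibe = [1 - p for p in p_nwa]
true_frac = 0.2  # hypothetical Northwest African share for one individual
geno = [sum(rng.random() < true_frac * a + (1 - true_frac) * b for _ in range(2))
        for a, b in zip(p_nwa, p_ibe)]
print(estimate_admixture(geno, p_nwa, p_ibe))
```

Markers whose frequencies differ most between the two sources dominate the likelihood, which is why a few dozen well-chosen AIMs can suffice.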

As I have mentioned before, the Maasai (and, to various degrees, many other East Africans) are intermediate between Negroids and Caucasoids, and hence admixture estimates that use Yoruba Nigerians as the African reference would tend to underestimate the African element. It's important to remember that extant Africans are not uniform, ranging from Caucasoids to Negroids, Pygmies, and Khoi-San, with multiple identifiable clusters within the major Negroid group itself, and all sorts of between-group gene flow on a regional basis. It is always useful (as, e.g., with African Americans) both to use historical knowledge about population sources and to validate historical narratives with the genetic evidence.

R. L. Raaum et al. Autosomal African admixture in Yemeni populations.

Approximately 30% of mtDNA lineages in South Arabian samples are African L haplotypes, whose origin has usually been attributed to migration and assimilation of African females into the Arabian population over approximately the last 2,500 years. In contrast, few Y chromosome lineages of clear recent sub-Saharan African origin have been found in Southern Arabian populations. This bias in maternal and paternal lineages is in accord with historical accounts of the female bias in the Middle Eastern slave trade. In order to evaluate autosomal African ancestry, we collected high-resolution SNP genotype data from a geographically representative set of 62 Yemenis selected from a collection of 552 samples acquired in the spring of 2007. The ancestry of chromosomal segments in the Yemeni population was estimated using a haplotype-based local ancestry estimation method, HAPMIX. The HAPMIX method is based on a two-way admixture model that requires two phased reference populations; we used the HapMap Yoruba in Ibadan, Nigeria (YRI), Luhya in Webuye, Kenya (LWK), Maasai in Kinyawa, Kenya (MKK), and CEPH US residents with ancestry from northern and western Europe (CEU) samples. The three African reference populations include two Bantu-speaking groups (YRI and LWK) and one Nilotic-speaking group (MKK). We estimated local ancestry in the Yemeni sample with all three European-African reference population combinations (CEU-YRI, CEU-LWK, CEU-MKK). The correlations among African ancestry calculated using all three reference population combinations are high (r > 0.98 in all pairwise correlations). Furthermore, there is no significant difference between the average proportion of African ancestry in Yemenis calculated using either of the two Bantu-speaking reference populations: CEU-YRI (mean 0.062, sd 0.044) and CEU-LWK (mean 0.076, sd 0.049) (p=0.13, two-tailed Welch two sample t-test).
However, the average African ancestry calculated using the Maasai reference population (CEU-MKK, mean 0.148, sd 0.060) is significantly greater than that calculated using either the Yoruba or Luhya reference populations (p less than 0.0001 in both comparisons, two-tailed Welch two sample t-test). These data suggest that the source population for the African ancestry of the Yemeni population is more similar to the contemporary Maasai population than either the Luhya or Yoruba.
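For readers who want to check these comparisons, Welch's t statistic and its Welch-Satterthwaite degrees of freedom can be computed directly from the reported means and standard deviations, with n = 62 for each estimate. The sketch stops at t and df; turning them into a p-value needs the t distribution, for instance via SciPy's scipy.stats.ttest_ind_from_stats with equal_var=False.

```python
import math

def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom,
    computed from summary statistics (no equal-variance assumption)."""
    v1 = sd1 ** 2 / n1
    v2 = sd2 ** 2 / n2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Summary statistics quoted in the abstract, n = 62 Yemenis per estimate.
print(welch_t(0.076, 0.049, 62, 0.062, 0.044, 62))  # CEU-LWK vs CEU-YRI
print(welch_t(0.148, 0.060, 62, 0.076, 0.049, 62))  # CEU-MKK vs CEU-LWK
```

The first call gives t of about 1.7 on about 120 degrees of freedom, the second about 7.3. Because the quoted means and SDs are rounded, a p-value derived from them need not match the reported 0.13 exactly.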

The next abstract seems fun; it's always nice to see something that isn't like everything that came before it.

T. Rzeszutek et al. Music as a novel marker in the study of prehistoric human migrations.

The study of prehistoric human population history is often fraught with controversy owing to incongruent evidence among various markers of present-day genetic and cultural diversity. While archaeological evidence can be used to calibrate the conclusions drawn from present-day diversity, the fickle nature of the fossil record leaves some migration histories unresolved. Our work analyzes the potential of music (in particular, vocal music) to serve as a novel migration marker, bolstering established migration work and shedding light on regions of the world whose settlement history is contested. One such migration is the recent expansion of Austronesian-speaking peoples across the Pacific within the last 6000 years. The dominant hypothesis posits a recent origin in Taiwan, with a rapid movement southwards and eastwards to populate Polynesia during the following 3500 years. While this model is strongly supported by both archaeological evidence and the present-day distribution of linguistic diversity, our goal was to analyze whether music could serve as a novel line of evidence in the study of Pacific prehistory. A critical concern regarding any migration marker is its time depth. In order to examine this for music, we analyzed correlations between musical diversity and mitochondrial-DNA diversity in 9 Taiwanese aboriginal tribes for which both types of data were available. A sample of 226 choral songs was analyzed using 39 binary characters representing significant structural features of music (e.g., rhythm, interval size, melodic contour, etc.). The musical samples were restricted to ritual music, which constitutes the most conservative (i.e., slowly changing) component of a culture’s repertoire. Mantel tests showed a significant correlation between musical distance and genetic distance among these 9 tribes, suggesting that music may have a time depth comparable to widely-used genetic markers like mitochondrial DNA.
This work demonstrates that music has the potential to enrich the conclusions drawn from other markers, and establishes methods for employing it as a tool in the study of prehistoric human movements throughout the world. At the same time, we want to capitalize on music’s own unique dynamics of change over time and place, particularly its capacity for admixture. In other words, music might not only be able to support the narratives told by other migration markers but shed new light on the histories of population movement and cultural contact.
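A Mantel test asks whether two distance matrices over the same groups are correlated, using permutations because pairwise distances are not independent of one another. The sketch correlates the upper triangles of two matrices and builds the null distribution by relabelling the groups in one matrix. The abstract does not give the correlation coefficient or the number of permutations, so Pearson's r and 999 permutations are assumptions, and the nine "tribes" below are made up.

```python
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def mantel(dist_a, dist_b, permutations=999, seed=0):
    """One-sided Mantel test on two square, symmetric distance matrices.
    Returns the observed r and a permutation p-value obtained by jointly
    shuffling the rows and columns of dist_b."""
    n = len(dist_a)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    x = [dist_a[i][j] for i, j in pairs]
    r_obs = pearson(x, [dist_b[i][j] for i, j in pairs])
    rng = random.Random(seed)
    order = list(range(n))
    hits = 0
    for _ in range(permutations):
        rng.shuffle(order)
        y = [dist_b[order[i]][order[j]] for i, j in pairs]
        if pearson(x, y) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (permutations + 1)

# Nine hypothetical groups placed along a line; "musical" distances track
# "genetic" ones with a little deterministic noise.
pos = [0, 1, 3, 4, 7, 8, 10, 12, 15]
genetic = [[abs(a - b) for b in pos] for a in pos]
music = [[0 if i == j else 2 * abs(pos[i] - pos[j]) + 0.1 * ((i + j) % 3)
          for j in range(9)] for i in range(9)]
r, p = mantel(genetic, music)
print(round(r, 3), p)
```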

The bolded part in the following abstract makes sense, as it indicates (i) the distinctiveness of Ashkenazi Jews compared to CEU Europeans, and (ii) the fairly recent, widespread formation of admixed individuals (in the last couple of generations), producing individuals who are genomically 1/4, 1/2, and 3/4 AJ.

V. Vacic et al., Admixture in Ashkenazi Jewish cohorts and implications for association studies.

Studies of complex genetic disorders may benefit from focusing on population isolates, such as Ashkenazi Jews (AJ). However, in order to truly exploit the advantages of reduced genetic diversity, the self-declared AJ ancestry of study participants should be independently confirmed with available genetic data. We investigate whether the AJ cohorts display genetic heterogeneity, such as a different rate of admixture in cases and controls, which could potentially confound disease association studies. We applied principal component analysis (PCA) to AJ cohorts ascertained in Israel and the US East Coast with the goal of characterizing population structure. As described previously, when compared to the HapMap samples with CEU, YRI and CHB/JPT ancestry, virtually all AJ samples cluster with the CEU. Similar analysis done on CEU and Jewish HapMap samples from Ashkenazi, Sephardic and Middle Eastern Jewish communities revealed that 97.8% of AJ samples cluster along the AJ-CEU axis, with modes at AJ and CEU cluster centers and at approximately quartile distances between them. We postulate that these groups correspond to 100-0, 75-25, 50-50, 25-75, and 0-100% AJ-CEU admixtures. Notably, only 91.7% of self-reported AJ individuals fall into the reference JHapMap panel AJ cluster, with 1.6, 3.3, 0.5 and 0.7% in the admixed modes ordered by decreasing fraction of AJ ancestry. We also observe admixing with the non-AJ Jewish communities: 0.7% of samples fall within the non-AJ clusters and 1.4% at a subgroup approximately halfway between the AJ and non-AJ cluster centers. In our dataset we found that when compared to the sample as a whole or only to controls, individuals with Crohn’s disease (CD) show significantly more admixing: 78.1, 3.1, 8.5, 2.0 and 0.9% in the 100, 75, 50, 25 and 0% AJ subgroups respectively. Also, CD samples show more admixing with non-AJ groups (2.8 and 1.0% in the 50-50 and 0-100 AJ-non-AJ subgroups).
Isolates typically exhibit a greater amount of cryptic relatedness compared to outbred populations, which motivates an orthogonal method for verifying AJ ancestry based on identity-by-descent (IBD). The high background level of IBD within the Ashkenazi Jewish community can be used to estimate degree of AJ ancestry by averaging the IBD between a sample under study and the AJ individuals in the JHapMap panel. Our preliminary results show that this method recapitulates the high-level results from the PCA analysis and provides better resolution.
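The IBD-based check can be sketched as averaging a sample's IBD with every member of the AJ reference panel. Dividing by the panel's own mean pairwise IBD, so that a fully AJ sample scores near 1, is an assumption added here for readability; the abstract only says the IBD is averaged. The values are hypothetical totals of shared centimorgans.

```python
from itertools import combinations
from statistics import mean

def panel_baseline(panel_ibd):
    """Mean pairwise IBD (e.g. total shared cM) within the reference panel,
    read from the upper triangle of a symmetric matrix."""
    n = len(panel_ibd)
    return mean(panel_ibd[i][j] for i, j in combinations(range(n), 2))

def ibd_ancestry_score(ibd_to_panel, baseline):
    """Average IBD between one sample and every panel member, divided by the
    panel's own background level (a hypothetical scaling, not from the abstract)."""
    return mean(ibd_to_panel) / baseline

# Hypothetical three-member AJ panel and one query sample.
panel = [[0, 40, 44],
         [40, 0, 36],
         [44, 36, 0]]
baseline = panel_baseline(panel)                   # mean of 40, 44, 36 = 40
print(ibd_ancestry_score([22, 18, 20], baseline))  # 0.5
```

Under this scaling, a sample sharing about half the panel's background IBD scores about 0.5, roughly what one would expect for a 50-50 AJ-CEU individual if non-AJ background sharing is negligible.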