I have removed some populations from the previous run (such as Moroccan Jews and Samaritans) that tended to generate mini-clusters due to the presence of close relatives and/or inbreeding in the sample. I have also removed some redundant populations to even out the dataset, and added North Kannadi and Gujarati, which helped reveal the gradient of ancestry in South Asia.
Some interesting observations:
- The 3.8% South Asian component in Romanians may reflect Romania's Roma population. Indeed, almost all of it comes from a single individual with 25% South Asian ancestry, almost certainly a Roma.
- The small African component in Spaniards that was revealed in a previous K=8 run turns out to be East African (0.5%) rather than West African (0.1%). If this holds up in larger samples, it might signify that the component derives from East African-admixed populations from the east, rather than from Sub-Saharan Africans.
- The multiplicity of ancestries of the Uygur is made evident, in agreement with the extensive craniometric and genetic data on prehistoric and extant populations from the area.
- The proportion of the two East Eurasian components in Turkic populations is interesting. It seems that the earliest departures from the Turkic homeland (such as the Chuvash and Yakut) have a predominance of the NE Asian component, the Anatolian Turks are intermediate, and the Uygurs, the only ones to have stayed close to the homeland, have experienced an increase in the E Asian component.
- The absence of the West African component in Ethiopians is striking. Here are the individual results for Ethiopians, illustrating the variability of the Southwest Asian vs. East African components. The Ethiopian sample consists of a number of different ethnic groups of the country, some of which (like the Amharas) are of Western Eurasian linguistic origin.
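As a rough illustration of the Romanian observation above, here is a minimal Python sketch of how a single admixed individual can account for most of a small population-average component. The seven per-individual proportions are hypothetical, chosen only so that their mean is 3.8% with one individual at 25%; they are not the actual sample, and the helper names are made up for this example.

```python
from statistics import mean

# Hypothetical South Asian proportions for seven individuals (NOT real data):
# one individual at 25%, the rest near zero, giving a mean of 3.8%.
south_asian = [0.25, 0.005, 0.0, 0.01, 0.0, 0.0, 0.001]


def outlier_share(props):
    """Fraction of the summed component contributed by the top individual."""
    total = sum(props)
    return max(props) / total if total > 0 else 0.0


def mean_excluding_max(props):
    """Population mean after dropping the single highest individual."""
    rest = sorted(props)[:-1]
    return mean(rest)


pop_mean = mean(south_asian)
print(f"population mean:        {pop_mean:.1%}")
print(f"share from top person:  {outlier_share(south_asian):.1%}")
print(f"mean without top person: {mean_excluding_max(south_asian):.2%}")
```

With these made-up numbers the population average drops to a fraction of a percent once the top individual is set aside, which is the pattern described for the Romanian sample.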
I am currently running K=11 and K=12 on the exact same data to see how the log-likelihood and Bayesian Information Criterion will move and whether new mini-clusters will appear, or if the mega-components (such as the "West Asian", "South European", and "North European") will split informatively. I will update this post with information on what actually happened, and with additional plots if I get robust results.
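For readers unfamiliar with the comparison, here is a minimal sketch of how log-likelihood and BIC can be weighed across values of K, using the standard definition BIC = p ln(n) - 2 ln(L). Everything specific in it is an assumption for illustration: the parameter count (K-1 free ancestry proportions per individual plus one allele frequency per SNP per component), the choice of n as the number of genotypes, and the log-likelihood values, which are invented and not results from these runs.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """Standard BIC: penalty for parameters minus twice the log-likelihood."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


def free_params(n_individuals, n_snps, k):
    """Assumed parameter count for a K-component admixture model."""
    ancestry = n_individuals * (k - 1)  # proportions sum to 1 per individual
    frequencies = n_snps * k            # one allele frequency per SNP per component
    return ancestry + frequencies


# Hypothetical dataset size and log-likelihoods (NOT real results).
n_individuals, n_snps = 100, 1000
n_obs = n_individuals * n_snps  # assumption: one observation per genotype
log_likelihoods = {10: -100000.0, 11: -92000.0, 12: -90000.0}

scores = {
    k: bic(ll, free_params(n_individuals, n_snps, k), n_obs)
    for k, ll in log_likelihoods.items()
}
best_k = min(scores, key=scores.get)  # lower BIC is preferred

for k in sorted(scores):
    print(f"K={k}: logL={log_likelihoods[k]:.0f}  BIC={scores[k]:.1f}")
print("lowest BIC at K =", best_k)
```

The log-likelihood always rises as K grows, so on its own it cannot say when to stop; the BIC penalty makes an extra component worthwhile only if the gain in fit outweighs the added parameters.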