| | a | b | c | d | e |
| b | 1 | | | | |
| c | 2 | 1 | | | |
| d | 1 | 2 | 2 | | |
| e | 3 | 2 | 1 | 2 | |
| f | 7 | 5 | 6 | 6 | 5 |
While haplotypes a-e are all within 1-3 mutations of each other, haplotype f is 5-7 mutations away from every other haplotype. It looks like it "doesn't belong".
Haplotypes such as f present a challenge:
- Are they true outliers? They might be an artifact of lab error, or simply extreme examples of normal variation. In the above example, if more haplotypes had been sampled, many more "pals" of f might have been found, and it would no longer appear to be isolated.
- If they are true outliers, how did they end up in the collection?
Spawn of the shipwrecked sailor
A popular explanation for outliers is that they are of foreign origin, the result of a chance event. According to this explanation, the distinctiveness of the outliers is due to being the product of a rare occurrence: a shipwrecked sailor, a lost explorer, a slave far from home, and so on.
To substantiate this as an explanation, it suffices to show that what is an "outlier" in a certain population X is actually normal in another population Y. It is then easy to see that the outlier may have ultimate origins in Y.
Relic of a bygone age
A different explanation is that outliers are relics of a previous age. Consider a country in which some important technological innovation (say farming, or iron, or the bow) is introduced. Pretty soon, the inhabitants who acquire the innovation may multiply in numbers at the expense of their more isolated neighbors. Fast forward into the future, and the gene pool will be dominated by the closely related haplotypes of the "adopters", while the haplotypes of the "non-adopters" will stand out in the total population as oddities.
Implications for age estimation
Determining the cause of an outlier has important implications for determining the age of the common ancestor of the whole group:
- If the outlier is of foreign origin, then one must reject it, and age the remaining, more homogeneous haplotypes. This will lead to a younger age than if the entire group was used.
- If the outlier is a relic, then one must incorporate it, and downgrade the statistical weight of the larger, more populous group; otherwise the age estimate will be dominated by the recently expanding group. This will lead to an older age than if the entire group was used with equal weights.
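The direction of both effects can be sketched with the six haplotypes from the table above. This is only an illustration, not an age estimation method: it uses the mean pairwise distance as a crude stand-in for "age" (more diversity, older apparent ancestor), and the function name `weighted_mean_distance` and the weights of 0.2 and 1.0 are hypothetical choices made for the example.

```python
from itertools import combinations

# Pairwise mutation distances copied from the table above (lower triangle).
DIST = {
    ("a", "b"): 1,
    ("a", "c"): 2, ("b", "c"): 1,
    ("a", "d"): 1, ("b", "d"): 2, ("c", "d"): 2,
    ("a", "e"): 3, ("b", "e"): 2, ("c", "e"): 1, ("d", "e"): 2,
    ("a", "f"): 7, ("b", "f"): 5, ("c", "f"): 6, ("d", "f"): 6, ("e", "f"): 5,
}


def weighted_mean_distance(weights):
    """Mean pairwise distance; each pair is weighted by the product
    of the weights of its two haplotypes (hypothetical scheme)."""
    total = 0.0
    norm = 0.0
    for x, y in combinations(sorted(weights), 2):
        w = weights[x] * weights[y]
        total += w * DIST[(x, y)]
        norm += w
    return total / norm


# Default practice: every sampled haplotype counts equally.
everyone = weighted_mean_distance({h: 1.0 for h in "abcdef"})

# "Foreign" scenario: f is rejected before estimating.
without_f = weighted_mean_distance({h: 1.0 for h in "abcde"})

# "Relic" scenario: the populous cluster a-e is down-weighted so that
# it counts, in total, as much as the single relic f (assumed weights).
relic = weighted_mean_distance({**{h: 0.2 for h in "abcde"}, "f": 1.0})

print(f"all equal: {everyone:.2f}")
print(f"f rejected: {without_f:.2f}")
print(f"a-e down-weighted: {relic:.2f}")
```

Under these assumptions the diversity drops when f is rejected and rises when the a-e cluster is down-weighted, which is the younger/older pattern described in the two bullets.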
Conclusion
The treatment of outliers in the existing literature is problematic. The default position seems to be not to analyze a haplotype group's substructure, and to use all sampled haplotypes. This may lead to either a substantial overestimation of the age (if foreign outliers are included), or a substantial underestimation (if relic outliers are given equal weight with the more populous main group).
Recommendation
For any collection of haplotypes, the first step should be to calculate the distribution of pairwise distances to detect outliers. Subsequently, a search of public databases or the literature should be performed to see if said outliers appear to be of foreign origin. Depending on this search (*), appropriate correction (inclusion/weighting) should be used in age estimation.
(*) Taking into account that the detection of foreign haplotypes depends on adequate sampling of the source population; hence, the absence of matches in other populations does not imply non-foreign origin.
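The first step of the recommendation can be sketched as follows, using the distances from the table at the top of the post. This is a minimal sketch, not a standard procedure: it flags a haplotype when even its nearest neighbour is far away, and the cut-off `THRESHOLD = 3` and the function names are assumptions chosen for this small example.

```python
# Pairwise mutation distances copied from the table at the top of the post.
DIST = {
    ("a", "b"): 1,
    ("a", "c"): 2, ("b", "c"): 1,
    ("a", "d"): 1, ("b", "d"): 2, ("c", "d"): 2,
    ("a", "e"): 3, ("b", "e"): 2, ("c", "e"): 1, ("d", "e"): 2,
    ("a", "f"): 7, ("b", "f"): 5, ("c", "f"): 6, ("d", "f"): 6, ("e", "f"): 5,
}


def nearest_neighbour_distances(dist):
    """For each haplotype, the distance to its closest other haplotype."""
    names = sorted({h for pair in dist for h in pair})
    return {
        h: min(d for pair, d in dist.items() if h in pair)
        for h in names
    }


def flag_outliers(dist, threshold):
    """Haplotypes whose nearest neighbour is more than `threshold` mutations away."""
    nearest = nearest_neighbour_distances(dist)
    return [h for h, d in nearest.items() if d > threshold]


THRESHOLD = 3  # hypothetical cut-off, chosen by eye for this example

print(nearest_neighbour_distances(DIST))
print(flag_outliers(DIST, THRESHOLD))
```

With this table only f is flagged, since every other haplotype has a neighbour one mutation away. Whether a flagged haplotype is foreign or a relic still has to be settled by the database and literature search described above.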
The USA is one big mess when it comes to DNA genealogy. Most of the white western Europeans came from the British Isles, so that skews the results toward that population. As for my mitochondrial DNA, I can trace it back to colonial times, but it doesn't seem to fall in with typical British. It belongs to a very small (thus far) group within U5b2, under the "11653" mutation and separate from the typically British group with the "4732" mutation. What I want to know is whether there are others in my group with British roots, or whether they have continental roots, say, from Germany, etc.
Interesting, Dienekes.
Just curious. Can we extend this all the way to T, and Mt haplo? How do they show?
Very interesting meditation, Dienekes. I'd say that "relics of the past" are not impossible at all and may actually be a high percentage of the cases. But it's difficult to decide, sure.
To the bypasser: much of the ancestry in the USA is from continental origins (Germany especially), even that dating from colonial times. Dutch and Germans were not rare among "British" colonists. That is especially true of Pennsylvania, whose colonial population was mostly of German origin, but may also be the case in others of the original states (Dutch in New York, Swedes in Delaware, French Huguenots all around, etc.). I would certainly not discard such origins, even if genuinely colonial.
Just to clarify, because it's a very common misconception Europeans have about the US and many Americans have about themselves: German is the #1 ancestry of citizens of the United States, not Irish or English, which are #2 and #3 respectively. Just because the US is a predominantly English-speaking nation does not mean most Americans are English in paternity and/or maternity. They're not. A full 1/4 to 1/3 of US citizens are entirely or partially ethnic German in background. This is from both the 1990s and 2000 census data. Of course, Germany didn't exist as a unified state until the 1870s, but the notion of one German ethnolinguistic identity dates to the late Middle Ages and early Renaissance.
Just to clarify, because it's a very common misconception Europeans have about the US and many Americans have about themselves: German is the #1 ancestry of citizens of the United States, not Irish or English, which are #2 and #3 respectively.
I doubt that. It may very well be that Americans with deep roots going back to colonial times simply put "American" in census forms, whereas those of German descent who migrated in more recent times put their more specific origin.
English is probably the most important single European ethnic group contributing to the American population.