- Clusters Galore
- The Dodecad Oracle
The Dodecad Oracle tries to address this problem by using simple geometry to estimate 2-way mixes between populations. Individuals are projected onto the lines formed by population pairs. An individual X that can be expressed as a mixture of A and B will tend to fall on the line segment AB, or close to it; the distance between X and AB measures the closeness of fit. There are two downsides to this approach:
- The limitation to two populations
- The fact that different "populations" may in fact be different samples from the same population (e.g., the Behar et al. (2010) Ashkenazy_Jews and the Dodecad Project Ashkenazi_D populations)
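The geometric idea behind the Oracle can be illustrated with a small sketch. This is not the Oracle's actual code; the function name `project_onto_segment` and the toy coordinates are hypothetical, and it assumes that individuals and populations are simply points in some low-dimensional space (e.g., a few MDS dimensions or admixture proportions). It projects X onto the segment AB and reports both the implied mixing fraction and the distance to the segment:

```python
import math

def project_onto_segment(x, a, b):
    """Project point x onto the segment from a to b.

    Returns (t, distance), where t in [0, 1] is the implied fraction
    of B in an A/B mix and distance is how far x lies from the segment
    (smaller means a better fit).
    """
    ab = [bi - ai for ai, bi in zip(a, b)]
    ax = [xi - ai for ai, xi in zip(a, x)]
    denom = sum(d * d for d in ab)
    if denom == 0:
        raise ValueError("A and B coincide, so no line is defined")
    # Scalar projection of AX onto AB, clamped so we stay on the segment
    t = sum(p * q for p, q in zip(ax, ab)) / denom
    t = max(0.0, min(1.0, t))
    closest = [ai + t * d for ai, d in zip(a, ab)]
    distance = math.sqrt(sum((xi - ci) ** 2 for xi, ci in zip(x, closest)))
    return t, distance

# Toy example: A at the origin, B ten units away, X slightly off the line
t, distance = project_onto_segment([3.0, 4.0], [0.0, 0.0], [10.0, 0.0])
print(t, distance)  # 0.3 4.0, i.e. roughly 70% A + 30% B, at distance 4
```

Ranking all population pairs by this distance gives the best-fitting 2-way mixes, which also makes the first downside concrete: a single segment can never represent a 3-way mix.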
The DRACOS pipeline
Fine-scale admixture estimation can be achieved by putting together these three ideas. I have called this new technique DRACOS:
- Dimensionality Reduction
- Analysis into COmponents
- Structure estimation
1. Dimensionality Reduction: Use PCA or MDS to convert genotype data into a few principal components or MDS dimensions
2. Analysis into Components: Use MCLUST over the MDS/PCA representation to infer the presence of clusters at a fine scale
3. Identify sets of individuals that clearly belong to each of the clusters; one can use a filter based on posterior probability (e.g., greater than 0.99) and/or distance from the cluster centroid (e.g., the 30 closest individuals)
4. Convert these sets of cluster-typical individuals into zombies for use with ADMIXTURE; alternatively, their allele frequencies themselves can be used, as in DIYDodecad, or any other structure-like analysis.
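Steps 3 and 4 above can be sketched in a few lines. This is only an illustration under stated assumptions, not the code behind any released calculator: the function names are hypothetical, the posterior probabilities are assumed to come from an MCLUST-style fit done beforehand, the centroid is taken as the mean of the individuals that pass the posterior filter, and genotypes are assumed to be coded as 0/1/2 copies of an allele with `None` for missing calls.

```python
import math

def typical_members(posteriors, coords, cluster, min_posterior=0.99, n_closest=30):
    """Pick individuals that clearly belong to `cluster`.

    posteriors[i][k]: posterior probability that individual i is in cluster k
    coords[i]:        PCA/MDS coordinates of individual i
    """
    # Filter 1: keep only individuals assigned with high posterior probability
    candidates = [i for i, p in enumerate(posteriors) if p[cluster] > min_posterior]
    if not candidates:
        return []
    # Assumption: centroid = mean position of the filtered candidates
    dims = len(coords[0])
    centroid = [sum(coords[i][d] for i in candidates) / len(candidates)
                for d in range(dims)]

    def dist(i):
        return math.sqrt(sum((c - m) ** 2 for c, m in zip(coords[i], centroid)))

    # Filter 2: keep the n_closest candidates to the centroid
    return sorted(candidates, key=dist)[:n_closest]

def allele_frequencies(genotypes, members):
    """Per-SNP allele frequency among `members`.

    genotypes[i][s]: 0, 1 or 2 copies of the counted allele, or None if missing
    """
    freqs = []
    for s in range(len(genotypes[0])):
        calls = [genotypes[i][s] for i in members if genotypes[i][s] is not None]
        # Each diploid individual contributes two alleles
        freqs.append(sum(calls) / (2 * len(calls)) if calls else None)
    return freqs
```

The resulting per-cluster allele frequencies are the input that a structure-like analysis would treat as ancestral populations; generating zombies from them is a separate step not shown here.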
The DRACOS approach addresses all the drawbacks of the three individual methods:
- Compared to Clusters Galore, it allows for admixture
- It allows one to create zombies at a fine scale. ADMIXTURE cannot do this, both because of its O(K^2) running time and because it lacks the model-based sophistication of MCLUST as applied over the first few principal components.
- Admixture can be estimated with any number of ancestral populations, not just two
I have a few things running in parallel at this time, but I am pretty sure I will eventually release a DRACOS-based calculator on the Dodecad project page. I anticipate that such a tool, in conjunction with DIYDodecad's "byseg" and "target" modes, may be helpful to genealogists, as it has the potential to infer the geographical origin of segments of DNA at a finer level of detail.