I ran a quick test on the Li et al. version of the HGDP set, which is the one I'm most familiar with.
Here is the output of qp3Pop:
## qp3Pop version: 204
nplist: 1
number of blocks for block jackknife: 553
snps: 640789
Source 1 Source 2 Target f_3 std. err Z SNPs
result: Sardinian Yoruba Mozabite -0.021157 0.000395 -53.598 617103
##end of qp3Pop
So, it discovered that Mozabites are a West Eurasian/Sub-Saharan mix.
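For readers who want to see why a negative f_3 points to admixture, here is a minimal sketch in Python. It is not how qp3Pop is implemented: it uses simulated allele frequencies and the naive estimator (the mean of (c - a)(c - b) over SNPs), and it skips the finite-sample bias correction and the block jackknife. The function name `f3` and all variable names are my own, chosen for illustration.

```python
import random

def f3(target, source1, source2):
    """Naive f3(target; source1, source2): mean over SNPs of (c - a) * (c - b)."""
    products = [(c - a) * (c - b) for c, a, b in zip(target, source1, source2)]
    return sum(products) / len(products)

rng = random.Random(0)
n_snps = 10_000

# Simulated allele frequencies for two unrelated source populations (assumption:
# independent uniform draws, purely for illustration).
src_a = [rng.uniform(0.05, 0.95) for _ in range(n_snps)]
src_b = [rng.uniform(0.05, 0.95) for _ in range(n_snps)]

# A target built as a 25% / 75% mixture of the two sources.
mixed = [0.25 * a + 0.75 * b for a, b in zip(src_a, src_b)]

f3_mixed = f3(mixed, src_a, src_b)    # target really is a mixture: negative
f3_unmixed = f3(src_a, mixed, src_b)  # src_a is not a mix of the others: positive

# Z is just f3 divided by its standard error; using the rounded values
# printed by qp3Pop above gives roughly -53.6.
z_reported = -0.021157 / 0.000395

print(f"f3(mixed; a, b) = {f3_mixed:.5f}")
print(f"f3(a; mixed, b) = {f3_unmixed:.5f}")
print(f"Z from rounded qp3Pop values = {z_reported:.2f}")
```

In this toy setup the mixed target gives (c - a)(c - b) = -0.25 × 0.75 × (a - b)² at every SNP, so the statistic cannot be positive. In real data drift in the target pushes f_3 upward, which is why a strongly negative value like the one above is such clear evidence of mixture.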
And, here's the output of qpF4ratio:
## qpF4ratio version: 300
nplist: 1
  0                  San    5
  1                  Han   44
  2             Mozabite   29
  3            Sardinian   28
  4               Yoruba   21
jackknife block size:     0.050
snps: 640789  indivs: 127
number of blocks for block jackknife: 553
                                                                                                           alpha     std. err  Z (null=0)
 result:        San        Han   Mozabite  Sardinian  :        San        Han     Yoruba  Sardinian     0.265350     0.003628     73.143
## end of run
So, it estimates Mozabites as 26.5% African, which seems like it is in the right ballpark.
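The result line above is a ratio of two f_4 statistics, f4(San, Han; Mozabite, Sardinian) / f4(San, Han; Yoruba, Sardinian). The following Python sketch shows the idea on simulated allele frequencies. It is only an illustration under assumptions I made up: drift is modelled as Gaussian noise, the tree shape and all names (`f4`, `f4_ratio`, `drift`, `ref_y`, `ref_c`, and so on) are hypothetical, and there is no jackknife standard error.

```python
import random

def f4(a, b, c, d):
    """Naive f4(a, b; c, d): mean over SNPs of (a - b) * (c - d)."""
    return sum((pa - pb) * (pc - pd) for pa, pb, pc, pd in zip(a, b, c, d)) / len(a)

def f4_ratio(a, b, x, c, y):
    """alpha = f4(a, b; x, c) / f4(a, b; y, c), the layout of the result line."""
    return f4(a, b, x, c) / f4(a, b, y, c)

def drift(freqs, rng, sd=0.05):
    """Add Gaussian noise as a crude stand-in for drift, clamped to [0, 1]."""
    return [min(1.0, max(0.0, p + rng.gauss(0.0, sd))) for p in freqs]

rng = random.Random(1)
n_snps = 20_000
root = [rng.uniform(0.2, 0.8) for _ in range(n_snps)]

outgroup = drift(root, rng)     # plays the role of San
anc = drift(root, rng)          # ancestor of the remaining populations
ref_y = drift(anc, rng)         # plays the role of Yoruba
non_afr = drift(anc, rng)       # branch shared by the next two populations
pop_b = drift(non_afr, rng)     # plays the role of Han
ref_c = drift(non_afr, rng)     # plays the role of Sardinian

# Simulated admixed population: 25% from ref_y, 75% from ref_c,
# followed by a little drift of its own.
true_alpha = 0.25
mixed = [true_alpha * y + (1 - true_alpha) * c for y, c in zip(ref_y, ref_c)]
mixed = drift(mixed, rng, sd=0.02)

alpha_hat = f4_ratio(outgroup, pop_b, mixed, ref_c, ref_y)
print(f"estimated alpha = {alpha_hat:.3f} (simulated truth {true_alpha})")
```

The drift that the mixed population experiences after admixture is independent of the San/Han contrast, so it averages out in the numerator and the ratio recovers the mixture fraction.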
Hopefully I'll be able to try rolloff soon. According to the README:
"To run the main ROLLOFF program, type the following on a linux machine. Do not run the program locally as it requires a lot of memory."
Fingers crossed that I will have enough.
 
6 comments:
Hello Dienekes, could I ask your advice on something? I'll be working with a few hundred whole-genome sequences (formatted as vcf, bam/sam, or even .fa), and I want to check their ancestry to make sure that people are (roughly) from where they say they are. As a simple example, we will want to determine whether a sample is primarily of European, East Asian, or African descent.
Of course, one could manually check each sample for a small number of markers whose minor allele frequencies differ strongly between populations, but I expect a smarter package already exists for the task. What do you think?
You should convert the VCFs to PLINK format and use ADMIXTURE together with a reference set (e.g., the HGDP set). This ought to be enough for a coarse-grained assessment of ancestry.
An estimate of 26.5% African for Mozabites is not realistic. If we look at the haplogroup distribution, we have:
1) Y-DNA (N=67, Dugoujon 2009):
E1b1b1b (M81): 86.6%
E1b1b1a (M78): 1.5%
E1a: 3%
E1b1ba: 1.5%
J1: 1.5%
G: 1.5%
R1b: 3%
Total Sub-Saharan (E1a + E1b1ba) = 4.5%
2) mtDNA (N=85, Coudray 2009):
Eurasian lineages: 54.1%
North African lineages (U6, M1): 33%
Sub-Saharan lineages (L): 12.9%
So we should get on average about (4.5 + 12.9)/2 = 8-9% Sub-Saharan, which is similar to what 23andMe gets for the Mozabite sample.
There is no reason to calculate the Sub-Saharan admixture in Mozabites on the basis of mtDNA and Y-chromosomes, when we can do so using autosomal DNA.
The HGDP Mozabite sample has 16-25.8% admixture based on 3 different methods:
http://dienekes.blogspot.com/2011/04/comparing-five-methods-of-admixture.html
The 26.5% inferred using the f4 ratio test is in great agreement with the 25.8% inferred by ADMIXTURE.
Yes, but one problem with ADMIXTURE and similar tools is that if the parental populations are not well represented, they can give wrong results. For example, see this comment from Henn:
"Maghrebi or Near Eastern diversity that is not present in the panel populations is more likely to be assigned to the more diverse, Sub-Saharan African ancestry"
http://www.plosgenetics.org/article/info%3Adoi%2F10.1371%2Fjournal.pgen.1002397
"The 26.5% inferred using the f4 ratio test is in great agreement with the 25.8% inferred by ADMIXTURE."
ADMIXTURE and f4 ratio seem to be much more in agreement with each other in their calculations of Negroid admixture in Caucasoids than in their calculations of Mongoloid admixture in Caucasoids. This must be because of the fact that Caucasoids are genetically much closer to Mongoloids than to Negroids and that, unlike f4 ratio, ADMIXTURE is not good at detecting ancient racial admixtures, especially between genetically relatively close races like Caucasoids and Mongoloids.