Due to the quadratic running time of ChromoPainter, I took a random sample of 15 individuals from every included population with more than 15 individuals. The final set included 392 individuals. It appears that a set of ~400 individuals and ~260k SNPs can be processed in about 2 weeks on a single thread.
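The subsampling step and the run-time reasoning can be sketched in Python. This is a minimal illustration, not the script actually used here: the input format (a dict mapping population labels to lists of individual IDs), the function names and the fixed seed are all hypothetical. The run-time helper simply assumes cost grows with the square of the number of individuals while the SNP count stays fixed.

```python
import random

def subsample_populations(pop_members, max_per_pop=15, seed=1):
    """Keep at most max_per_pop randomly chosen individuals per population.

    pop_members is a hypothetical dict: population label -> list of individual IDs.
    Populations at or below the cap are kept whole.
    """
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    sample = {}
    for pop, ids in pop_members.items():
        if len(ids) > max_per_pop:
            # draw without replacement, then sort for a stable output order
            sample[pop] = sorted(rng.sample(ids, max_per_pop))
        else:
            sample[pop] = list(ids)
    return sample

def scaled_runtime(base_time, base_n, new_n):
    """Extrapolate run time, assuming it scales with the square of the number of individuals."""
    return base_time * (new_n / base_n) ** 2
```

Under that quadratic assumption, doubling the sample from ~400 to ~800 individuals would turn about 2 weeks into about 8: `scaled_runtime(14, 400, 800)` returns `56.0` days.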
The raw chunkcounts between all individuals can be obtained from here.
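The population-level chunkcounts in the spreadsheet summarize this individual-by-individual matrix. One way to do that aggregation is sketched below. It assumes the ChromoPainter convention that rows are recipients and columns are donors, and it sums over all member pairs; whether the spreadsheet uses sums or per-pair averages is an assumption here, and the function name is hypothetical.

```python
import numpy as np

def population_chunkcounts(ind_counts, labels):
    """Sum an individual-by-individual chunkcount matrix into population blocks.

    ind_counts[i, j]: chunks that individual i (recipient) copies from individual j (donor).
    labels[i]: population of individual i.
    Returns (pops, pop_counts), where pop_counts[a, b] is the total number of chunks
    that members of population a copy from members of population b.
    """
    ind_counts = np.asarray(ind_counts, dtype=float)
    pops = sorted(set(labels))
    index = {p: k for k, p in enumerate(pops)}
    # one-hot membership matrix: membership[i, k] = 1 if individual i belongs to pops[k]
    membership = np.zeros((len(labels), len(pops)))
    for i, p in enumerate(labels):
        membership[i, index[p]] = 1.0
    # M^T C M adds up every recipient/donor pair within each population block
    pop_counts = membership.T @ ind_counts @ membership
    return pops, pop_counts
```

Dividing each block by the number of recipient individuals would give average chunkcounts per recipient instead, which is easier to compare across populations of different sizes.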
The heatmap can be seen below:
The principal components analysis shows the familiar West-to-South Asia cline:
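The post does not say exactly how the PCA was computed. As an illustration only, here is a PCA of a recipient-by-donor chunkcount matrix via SVD, treating each recipient row as an observation and each donor column as a variable; the column centring and the function name are assumptions.

```python
import numpy as np

def chunkcount_pca(counts, n_components=2):
    """PCA of a recipient-by-donor chunkcount matrix via singular value decomposition.

    Columns (donors) are mean-centred before the decomposition.
    Returns (scores, explained_variance_ratio) for the first n_components.
    """
    X = np.asarray(counts, dtype=float)
    X = X - X.mean(axis=0)  # centre each donor column
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]  # coordinates of each recipient
    var = S ** 2
    return scores, var[:n_components] / var.sum()
```

Plotting `scores[:, 0]` against `scores[:, 1]`, coloured by population, gives the kind of scatter shown above; the sign of each component is arbitrary.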
More information can be found in the spreadsheet, including:
- How many individuals from each population were assigned to each of the 51 clusters
- Individual assignments of all 392 individuals
- Raw chunkcounts between all 33 different populations
- Z scores of the above (by row): scan each row to see which populations (columns) are the bigger donors for that row.
- Z scores of the above (by column): scan each column to see which populations (rows) are the bigger recipients for that column.
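The two Z-score views differ only in the axis along which the chunkcount matrix is standardized. A minimal sketch, assuming rows are recipients and columns are donors, and using the population standard deviation (whether the spreadsheet uses sample or population SD, or excludes the diagonal, is not stated):

```python
import numpy as np

def zscores(counts, axis):
    """Standardize a chunkcount matrix along rows (axis=1) or columns (axis=0).

    axis=1: each row is standardized across its columns, so large values mark
            the main donors to that recipient population.
    axis=0: each column is standardized across its rows, so large values mark
            the main recipients from that donor population.
    Uses the population standard deviation (ddof=0); a constant row or column
    would divide by zero.
    """
    X = np.asarray(counts, dtype=float)
    mean = X.mean(axis=axis, keepdims=True)
    sd = X.std(axis=axis, keepdims=True)
    return (X - mean) / sd
```

For example, `zscores(pop_counts, axis=1)` gives the "by row" table and `zscores(pop_counts, axis=0)` the "by column" one.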
Finally, in the RAR file you can find some plots of Z scores (by row) for the different populations.
For example, here is a list of donors for the Kalash population; the order is slightly different compared to the teaser, but the overall pattern is the same.
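A donor list like this amounts to sorting one row of the row-standardized table. A small sketch follows; the function name, the option to exclude the recipient itself, and the labels in the usage line are hypothetical.

```python
def rank_donors(row_z, populations, exclude=(), top=None):
    """Order donor populations for one recipient by descending row Z score.

    row_z: Z scores for one recipient row, one value per donor population.
    populations: donor labels in the same column order as row_z.
    exclude: labels to drop, e.g. the recipient population itself.
    top: if given, keep only the first `top` donors.
    """
    ranked = sorted(
        ((pop, float(z)) for pop, z in zip(populations, row_z) if pop not in exclude),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return ranked if top is None else ranked[:top]
```

With placeholder labels, `rank_donors([0.5, 2.1, -1.0], ["P1", "P2", "P3"])` returns `[("P2", 2.1), ("P1", 0.5), ("P3", -1.0)]`.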
4 comments:
"More information can be found in the spreadsheet, including..."
Sorry, but I cannot find the spreadsheet.
I've added the link
Dienekes, it's a matter of taste, but the plot literally cries out to be transformed non-specially, to fit the geography of samples :)
How long are the "chunks"?