Because of the quadratic running time of ChromoPainter, I took a random sample of 15 individuals from every included population with more than 15 individuals. The final set included 392 individuals. It appears that a set of ~400 individuals and ~260k SNPs can be processed in about 2 weeks on a single thread.
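The subsampling step can be sketched roughly as follows. This is only an illustration, not the script actually used: the function name `downsample`, the `(individual_id, population)` input format, the fixed seed, and the toy population labels are all assumptions. Populations with 15 or fewer individuals are assumed to be kept whole.

```python
import random
from collections import defaultdict


def downsample(individuals, cap=15, seed=0):
    """Keep at most `cap` randomly chosen individuals per population.

    `individuals` is a list of (individual_id, population) pairs
    (a hypothetical input format chosen for this sketch).
    """
    rng = random.Random(seed)  # fixed seed so the sample is reproducible

    # Group individual IDs by population label.
    by_pop = defaultdict(list)
    for ind_id, pop in individuals:
        by_pop[pop].append(ind_id)

    kept = {}
    for pop, ids in by_pop.items():
        if len(ids) > cap:
            # Larger populations: random sample without replacement.
            kept[pop] = sorted(rng.sample(ids, cap))
        else:
            # Populations at or below the cap are kept in full (assumption).
            kept[pop] = sorted(ids)
    return kept


# Toy data: one population above the cap, one below it.
toy = [(f"A{i}", "PopA") for i in range(40)] + [(f"B{i}", "PopB") for i in range(7)]
sample = downsample(toy)
total_kept = sum(len(ids) for ids in sample.values())
```

With the toy data, `PopA` is cut down to 15 individuals and `PopB` keeps all 7, so `total_kept` is 22.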

The raw chunkcounts between all individuals can be obtained from here.

The heatmap can be seen below:

The principal components analysis shows the familiar West-to-South Asia cline:

More information can be found in the spreadsheet, including:

- How many individuals from each population were assigned to each of 51 clusters
- Individual assignments of all 392 individuals
- Raw chunkcounts between all 33 different populations
- Z scores of the above (by row)
- Z scores of the above (by column)

To read the Z scores:

- by row: scan each row to see which populations (columns) are the bigger donors for that row.
- by column: scan each column to see which populations (rows) are the bigger recipients for that column.
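For readers who want to recompute the two kinds of Z scores from the raw chunkcounts, here is a minimal sketch. It assumes rows are recipients and columns are donors, and it uses the sample standard deviation; the spreadsheet may have been computed differently. The matrix values and function names are made up for illustration.

```python
from statistics import mean, stdev


def zscores_by_row(matrix):
    """Standardize each row: (x - row mean) / row sample standard deviation."""
    out = []
    for row in matrix:
        m, s = mean(row), stdev(row)
        # A constant row has zero spread; report 0.0 rather than divide by zero.
        out.append([(x - m) / s if s > 0 else 0.0 for x in row])
    return out


def zscores_by_column(matrix):
    """Standardize each column by transposing, reusing the row version."""
    transposed = [list(col) for col in zip(*matrix)]
    z_t = zscores_by_row(transposed)
    return [list(row) for row in zip(*z_t)]


# Toy chunkcount matrix (made-up numbers): rows = recipients, columns = donors.
counts = [
    [10.0, 20.0, 30.0],
    [5.0, 5.0, 20.0],
    [1.0, 2.0, 3.0],
]

z_row = zscores_by_row(counts)
z_col = zscores_by_column(counts)

# Biggest donor (column index) for the first recipient (row 0).
top_donor_row0 = max(range(len(z_row[0])), key=lambda j: z_row[0][j])
```

In the toy matrix, the first row has mean 20 and standard deviation 10, so its row Z scores are -1, 0 and 1, and the third column is its biggest donor.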

Finally, in the RAR file you can find some plots of Z scores (by row) for the different populations.

For example, here is a list of donors for the Kalash population; the order is slightly different compared to the teaser, but the overall pattern is the same.

"More information can be found in the spreadsheet, including..."

Sorry, but I cannot find the spreadsheet.

I've added the link.

Dienekes, it's a matter of taste, but the plot literally cries out to be transformed non-specially, to fit the geography of samples :)

How long are the "chunks"?
