March 13, 2012

Auxiliary TreeMix scripts

Joe Pickrell has put up a useful Python script for TreeMix on the software page. It takes PLINK-formatted input and converts it into TreeMix-formatted input. You only need to specify a PLINK cluster file to indicate the correspondence between individuals and populations.
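For readers unsure what a cluster file looks like: PLINK's --within option reads a whitespace-delimited text file with three columns per line (family ID, individual ID, cluster name). Below is a minimal sketch that builds such lines from the first two columns of a .fam file. The cluster_lines helper, the pops mapping, and the sample IDs are all made up for illustration; they are not part of PLINK or of the TreeMix script.

```python
def cluster_lines(fam_lines, pop_of):
    """Return PLINK cluster-file lines (FID IID CLUSTER), one per .fam row.

    fam_lines: iterable of lines from a PLINK .fam file.
    pop_of:    dict mapping individual ID (IID) to a population label
               (hypothetical input you would supply yourself).
    """
    out = []
    for line in fam_lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        fid, iid = fields[0], fields[1]
        out.append(f"{fid} {iid} {pop_of[iid]}")
    return out


# Made-up example data, for illustration only
fam = ["F1 ind1 0 0 1 -9", "F2 ind2 0 0 2 -9", "F3 ind3 0 0 1 -9"]
pops = {"ind1": "PopA", "ind2": "PopA", "ind3": "PopB"}
clust = cluster_lines(fam, pops)
print("\n".join(clust))
```

Each output line pairs one individual with one population label, which is the correspondence the cluster file has to express.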

I was planning to release my own R script that does the same, and which I've used to produce this, but it is now redundant.

But since some people might want to try TreeMix with ADMIXTURE components, I am releasing my script that takes ADMIXTURE output and produces TreeMix input.

To use it, type in R:

ADMIXTUREtoTreeMix(PFILE='data.5.P', QFILE='data.5.Q', NAMES='components.txt', outfile='plink.treemix', GZIP=T)

data.5.P and data.5.Q are produced by an ADMIXTURE run.

This will output a plink.treemix.gz file that is ready for input into TreeMix.

If you specify GZIP=F, the file won't be gzipped, so you can inspect it visually and gzip it later yourself, since TreeMix takes *.gz input.
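Any gzip tool will do for compressing the file afterwards. As one possibility, here is a small Python sketch using only the standard library; the gzip_file helper name is my own and the path you pass in is whatever you used as outfile.

```python
import gzip
import shutil


def gzip_file(path):
    """Compress `path` to `path + '.gz'`, leaving the original file in place."""
    gz_path = path + ".gz"
    with open(path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)  # stream the bytes, no full read into memory
    return gz_path
```

Calling gzip_file('plink.treemix') would then write plink.treemix.gz next to the uncompressed file.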

The 'components.txt' file is a list of names for the components of ADMIXTURE, one per row.
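To give a rough idea of what such a conversion involves, here is a Python sketch. It is not the R script itself and may differ from it. It assumes the usual layouts: the P file has one row per SNP and one allele-frequency column per component, the Q file has one row per individual and one ancestry-proportion column per component, and TreeMix input is a header line of population names followed by one line per SNP of comma-separated allele-count pairs. How to turn frequencies into counts is my own assumption: each component gets a pseudo sample size of twice the sum of its Q column. All function names are hypothetical.

```python
import gzip


def admixture_to_treemix(p_rows, q_rows, names):
    """Build TreeMix-style text lines from ADMIXTURE P and Q matrices.

    p_rows: one row per SNP, one allele frequency per component.
    q_rows: one row per individual, one ancestry proportion per component.
    names:  one label per component (the contents of components.txt).

    ASSUMPTION: component j gets a pseudo sample size (in chromosomes) of
    2 * sum of column j of Q, at least 1. This is an illustrative choice.
    """
    k = len(names)
    sizes = [max(1, round(2 * sum(row[j] for row in q_rows))) for j in range(k)]
    lines = [" ".join(names)]  # header: population names
    for freqs in p_rows:
        pairs = []
        for j in range(k):
            a = round(freqs[j] * sizes[j])      # count of the first allele
            pairs.append(f"{a},{sizes[j] - a}")  # "allele1,allele2"
        lines.append(" ".join(pairs))
    return lines


def write_treemix(lines, outfile, gzip_output=True):
    """Write lines to outfile, or to outfile + '.gz' when gzip_output is true."""
    path = outfile + ".gz" if gzip_output else outfile
    opener = gzip.open if gzip_output else open
    with opener(path, "wt") as fh:
        fh.write("\n".join(lines) + "\n")
    return path


# Made-up toy data: 2 components, 4 individuals, 2 SNPs
q = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [0.5, 0.5]]
p = [[0.25, 0.75], [1.0, 0.0]]
lines = admixture_to_treemix(p, q, ["West", "East"])
print("\n".join(lines))
```

Writing the result with write_treemix(lines, 'plink.treemix') mirrors the GZIP switch described above: gzipped by default, plain text when gzip_output is false.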

1 comment:

Aditi Sharma said...

Could you please specify the structure of cluster file? The cluster file I have (plink output) is somehow not working. I have bed, bim, fam, ped and cluster file for my data but the python script supplied along with treemix is probably unable to recognize it. Requesting tips and pointers on that. Thanks