#acl ChrisSeidel:read,write,delete,revert All:read

--- notes on running MACS ---

For replicates:
{{{
From Ivan Gregoretti (Oct 29 2009, 8:43 am):

Hi Yan,

If you are using Solexa, just concatenate the relevant files.

Make sure you get organised first. Let's say you are doing an analysis that, for your own records, you want to call '123'. The idea is to pool as many records as possible from both treatment and control. So, for example, I do:

# pooling treatment
cat /some_run/s_1_eland_multi.txt /some_other_run/s_1_eland_multi.txt > ./s_t_123_eland_multi.txt

# pooling control
cat /some_run/s_2_eland_multi.txt /some_other_run/s_2_eland_multi.txt > ./s_c_123_eland_multi.txt

Notice that this example assumes your treatments are always in lane 1 and your controls are always in lane 2.

IMPORTANT: Before pooling treatments, make sure both replicates are of comparable quality. Otherwise you may be 'polluting' your best replicate with the 'not so good' one. In my experience, it is far better to have a few good tags than a big sample with a lot of random tags (noise).

Actually, I would like to hear what people normally use to assess sample quality.

Ivan

Ivan Gregoretti, PhD
National Institute of Diabetes and Digestive and Kidney Diseases
National Institutes of Health
5 Memorial Dr, Building 5, Room 205.
Bethesda, MD 20892. USA.
Phone: 1-301-496-1592
Fax: 1-301-496-9878

From Tao Liu (Oct 29 2009, 9:50 am):

Hi Yan,

The script on the MACS website was contributed by one of our users, so it's not a must-do process. You can convert .ma files into BED files first, which should be no problem for concatenation.

For step 2, try Jim Kent's 'randomLines' tool on the command line. It will give you a fixed number of tags randomly picked from your input:

$ randomLines
randomLines - Pick out random lines from file
usage:
   randomLines inFile count outFile
options:
   -seed=N - Set seed used for randomizing, useful for debugging.
   -decomment - remove blank lines and those starting with #

For more detail, check genomewiki.ucsc . If you don't want to compile JK's source code and just want a binary for this tool, send me an email. I have the x86_64 binary for this tool.

Also, I never manually select the same number of control tags to match the IP tags. MACS will calculate the ratio between IP and control.

Before concatenating replicates, I recommend the following process. First, run MACS on each replicate to get wiggle files of the tag pileup. Then calculate the correlation between the replicates' wiggle files. I have some scripts to do that. The idea is very simple: for every few kbp of the genome, calculate the mean/median tag count from the wiggle files. Then pair the means/medians from the replicates at the same position, and use R's functions to calculate correlations and draw some paired scatterplots between replicates.

Best,
Tao
}}}

Balancing Reads from different files:
{{{
perl -wnle 'if (rand(1) < N_SMALL/N_LARGE) {print}' large_file.bed > sampled_large.bed
}}}
If you have a large file and a small file, the line above samples the large file down to roughly the number of reads in the small file. Replace N_SMALL and N_LARGE with the line counts of the two files (e.g. from wc -l). Note that -e must come last in the switch bundle because it takes the code as its argument. Each line is kept independently with probability N_SMALL/N_LARGE, so the result is close to N_SMALL reads but usually not exactly N_SMALL. Alternatively, if you have it installed, Jim Kent's randomLines program from the Kent source tree picks an exact number of lines.
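The Perl one-liner for balancing reads is Bernoulli sampling: it keeps each line of the large file independently with probability N_SMALL/N_LARGE. Here is a minimal Python sketch of the same approach that also counts the lines for you. It assumes plain-text BED files with one read per line. `count_lines`, `bernoulli_sample` and `downsample_to_match` are hypothetical names and are not part of MACS or the Kent tools.

{{{
import random


def count_lines(path):
    """Return the number of lines in a text file."""
    with open(path) as handle:
        return sum(1 for _ in handle)


def bernoulli_sample(lines, p, seed=None):
    """Yield each line independently with probability p, like
    perl -wnle 'print if rand(1) < p'. The number kept is random,
    with expected value p * (number of lines). Input order is preserved."""
    rng = random.Random(seed)  # private, optionally seeded RNG
    for line in lines:
        if rng.random() < p:
            yield line


def downsample_to_match(large_path, small_path, out_path, seed=None):
    """Hypothetical helper: thin large_path down to roughly as many lines
    as small_path has. Returns the number of lines written."""
    n_small = count_lines(small_path)
    n_large = count_lines(large_path)
    if n_large == 0:
        raise ValueError(f"{large_path} is empty")
    p = min(1.0, n_small / n_large)  # never ask for more than every line
    written = 0
    with open(large_path) as src, open(out_path, "w") as dst:
        for line in bernoulli_sample(src, p, seed):
            dst.write(line)
            written += 1
    return written
}}}

For example, downsample_to_match('large_file.bed', 'small_file.bed', 'sampled_large.bed', seed=1) writes roughly as many reads as small_file.bed contains. The file names are placeholders. Passing a seed makes the subsample reproducible, much like randomLines' -seed option.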
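Sometimes you need an exact number of reads, which randomLines provides, rather than an approximate fraction. One single-pass way to do this is reservoir sampling (Algorithm R). It keeps a uniform random sample of fixed size without knowing the file length in advance. The sketch below illustrates that technique and assumes one read per line. It is not claimed to match randomLines' own implementation, and `sample_exact` and `sample_file` are hypothetical names.

{{{
import random


def sample_exact(lines, k, seed=None):
    """Reservoir sampling (Algorithm R): return k items chosen uniformly at
    random from an iterable of unknown length, in one pass. If the input
    has fewer than k items, all of them are returned."""
    if k < 0:
        raise ValueError("k must be non-negative")
    rng = random.Random(seed)
    reservoir = []
    for i, line in enumerate(lines):
        if i < k:
            reservoir.append(line)  # fill the reservoir with the first k lines
        else:
            j = rng.randint(0, i)  # uniform over 0..i inclusive
            if j < k:
                reservoir[j] = line  # line i is kept with probability k/(i+1)
    return reservoir


def sample_file(in_path, k, out_path, seed=None):
    """Write k randomly chosen lines of in_path to out_path (argument order
    mirrors 'randomLines inFile count outFile'). Returns the count written."""
    with open(in_path) as src:
        chosen = sample_exact(src, k, seed)
    with open(out_path, "w") as dst:
        # make sure a final line without a trailing newline stays separate
        dst.writelines(l if l.endswith("\n") else l + "\n" for l in chosen)
    return len(chosen)
}}}

The reservoir holds only k lines in memory, so the whole BED file never has to be loaded. The sampled lines come out in reservoir order, not input order. Sort the output if a downstream step expects sorted input.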
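Tao's replicate check has four steps: bin the genome into windows of a few kbp, summarise each replicate's wiggle pileup per window, pair windows at the same position, and correlate. Below is a rough plain-Python sketch of that idea. It is not Tao's script, which the email does not include. It computes Pearson's r where Tao uses R's correlation functions. Assumptions: the wiggle files are variableStep or fixedStep text, and the 5 kb default window is an arbitrary reading of "every few kbp". All function names are hypothetical.

{{{
import math
import statistics
from collections import defaultdict


def read_wiggle(lines):
    """Yield (chrom, position, value) from variableStep/fixedStep wiggle
    lines. Minimal parser: skips track/browser/comment lines and ignores
    the span attribute (each value is assigned to its start position)."""
    chrom = mode = None
    pos = step = 0
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith(("track", "browser", "#")):
            continue
        if line.startswith(("variableStep", "fixedStep")):
            fields = line.split()
            mode = fields[0]
            attrs = dict(f.split("=", 1) for f in fields[1:])
            chrom = attrs["chrom"]
            if mode == "fixedStep":
                pos, step = int(attrs["start"]), int(attrs["step"])
            continue
        if mode == "variableStep":
            p, v = line.split()[:2]
            yield chrom, int(p), float(v)
        elif mode == "fixedStep":
            yield chrom, pos, float(line)
            pos += step


def window_summaries(records, window=5000, summary=statistics.mean):
    """Group values into fixed windows of `window` bp per chromosome and
    summarise each window (mean by default, or e.g. statistics.median)."""
    bins = defaultdict(list)
    for chrom, pos, value in records:
        bins[(chrom, pos // window)].append(value)
    return {key: summary(values) for key, values in bins.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def replicate_correlation(wig_a, wig_b, window=5000, summary=statistics.mean):
    """Pair per-window summaries of two replicates at the same positions and
    return (r, number_of_shared_windows). Windows present in only one
    replicate are skipped rather than treated as zero."""
    a = window_summaries(read_wiggle(wig_a), window, summary)
    b = window_summaries(read_wiggle(wig_b), window, summary)
    shared = sorted(set(a) & set(b))
    r = pearson([a[k] for k in shared], [b[k] for k in shared])
    return r, len(shared)
}}}

To run it, pass two open file handles, e.g. for placeholder files rep1.wig and rep2.wig, optionally with window=10000 and summary=statistics.median. If the wiggle files are gzip-compressed, open them with gzip.open(path, "rt"). Skipping windows that appear in only one replicate is a design choice. Filling them with zero would also count regions covered by just one replicate.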