#acl ChrisSeidel:read,write,delete,revert All:read

--- notes on running MACS ---

For replicates:

{{{

From Ivan Gregoretti (Oct 29 2009, 8:43 am):

Hi Yan,

If you are using Solexa, just concatenate the relevant files. Make
sure you get organised first.

Let's say you are doing an analysis that for your own records you want
to call '123'. The idea is to pool as many records as possible from
both treatment and control. So, for example, I do:

# pooling treatment
cat /some_run/s_1_eland_multi.txt /some_other_run/s_1_eland_multi.txt \
  > ./s_t_123_eland_multi.txt

# pooling control
cat /some_run/s_2_eland_multi.txt /some_other_run/s_2_eland_multi.txt \
  > ./s_c_123_eland_multi.txt

Notice that in this example the assumption is that your treatments are
always lane 1 and your controls are always lane 2.

IMPORTANT: Before pooling treatments, make sure both replicates are of
comparable quality; otherwise you may be 'polluting' your best
replicate with the 'not so good' one. In my experience, it is far
better to have fewer but good tags than to have a big sample
with a lot of random tags (noise). Actually, I would like to hear what
people normally use to assess sample quality.

Ivan

Ivan Gregoretti, PhD
National Institute of Diabetes and Digestive and Kidney Diseases
National Institutes of Health
5 Memorial Dr, Building 5, Room 205.
Bethesda, MD 20892. USA.
Phone: 1-301-496-1592
Fax: 1-301-496-9878 

From Tao Liu (Oct 29 2009, 9:50 am):

Hi Yan,

The script on the MACS website was contributed by one of our users, so it's
not a must-do process. You can convert .ma files into BED files first,
which should be no problem for concatenation.

For step 2, try Jim Kent's 'randomLines' tool on the command line, which
will give you a fixed number of tags randomly picked from your input:

$randomLines
randomLines - Pick out random lines from file
usage:
    randomLines inFile count outFile
options:
    -seed=N - Set seed used for randomizing, useful for debugging.
    -decomment - remove blank lines and those starting with

For more detail, check genomewiki.ucsc <http://genomewiki.ucsc.edu/index.php/Kent_source_utilities>.
If you don't want to compile JK's source code and just want a
binary for this tool, send an email to me, I have the x86_64 binary
file for this tool.

Also, I never manually select the same number of control tags to match  
the IP tags. MACS will calculate the ratio between IP and control.

Before concatenating replicates, I recommend this process. First, run
MACS on each replicate to get wiggle files for the tag pileup, then
calculate the correlation values between the wiggle files of the
replicates. I have some scripts to do that. The idea is very simple:
for every window of several kbp in the genome, calculate the mean/median
tag count from the wiggle files, then pair the means/medians from
replicates at the same position, then use R's functions to calculate
correlations and draw some paired scatterplots between replicates.

Best,
Tao 

}}}
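
The replicate check Tao describes (bin the pileup, pair the bin means, correlate) can be sketched in shell with awk. This is only a rough sketch under stated assumptions, not Tao's scripts: it assumes each pileup has already been converted to four-column, tab- or space-separated text (chrom, start, end, value, i.e. bedGraph-like) rather than raw wiggle, it assigns each interval to the bin containing its start, it uses means rather than medians, and it computes a plain Pearson correlation in awk instead of R. The function names `bin_means` and `pearson`, the default bin size of 5000 bp, and the file names in the usage note are all made up for illustration.

```shell
#!/bin/sh

# bin_means FILE [BIN_SIZE]
# Print "chrom:bin_index<TAB>mean_value" for every bin that has data.
# BIN_SIZE defaults to 5000 bp (an arbitrary illustrative choice).
bin_means() {
    awk -v bin="${2:-5000}" '
        BEGIN { OFS = "\t" }
        # Skip header/comment lines; expect chrom, start, end, value.
        NF >= 4 && $1 !~ /^(track|browser|#)/ {
            key = $1 ":" int($2 / bin)
            sum[key] += $4
            n[key]++
        }
        END { for (k in sum) print k, sum[k] / n[k] }
    ' "$1" | sort
}

# pearson BINS_A BINS_B
# Pearson correlation over the bins present in both bin_means outputs.
# Prints NA when fewer than two shared bins exist or a variance is zero.
pearson() {
    awk -F '\t' '
        NR == FNR { a[$1] = $2; next }   # first file: remember mean per bin
        ($1 in a) {                      # second file: pair up shared bins
            x = a[$1]; y = $2; n++
            sx += x; sy += y
            sxx += x * x; syy += y * y; sxy += x * y
        }
        END {
            if (n < 2) { print "NA"; exit }
            cov = sxy - sx * sy / n
            vx  = sxx - sx * sx / n
            vy  = syy - sy * sy / n
            if (vx <= 0 || vy <= 0) { print "NA"; exit }
            printf "%.4f\n", cov / sqrt(vx * vy)
        }
    ' "$1" "$2"
}
```

With hypothetical inputs `rep1.bedGraph` and `rep2.bedGraph`, one would run `bin_means rep1.bedGraph > rep1.bins`, the same for replicate 2, and then `pearson rep1.bins rep2.bins`. The two `.bins` files can also be loaded into R for the paired scatterplots mentioned in the email.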

Balancing Reads from different files:
{{{
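# Replace N_SMALL and N_LARGE with the line counts of the small and large files (e.g. from wc -l).
perl -wlne 'if (rand(1) < N_SMALL/N_LARGE) {print}' large_file.bed > sampled_large.bed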
}}}
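If you have a large file and a small file, you can sample the large file down to roughly the number of reads in the smaller file with the line above, substituting the two files' line counts for N_SMALL and N_LARGE. Each line is kept independently with probability N_SMALL/N_LARGE, so the sampled file has approximately, not exactly, N_SMALL lines. Alternatively, you can use Jim Kent's randomLines program from the Kent source tree (if you have it installed), which picks a fixed number of lines.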
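
If an exact match in read count is wanted and randomLines is not installed, the following is a minimal sketch of one alternative. It assumes GNU coreutils `shuf` is available; the function name `sample_to_match` and the file names in the usage note are hypothetical.

```shell
#!/bin/sh

# sample_to_match SMALL LARGE OUT
# Write to OUT a random subset of LARGE's lines, without replacement,
# containing as many lines as SMALL has. Assumes GNU shuf is on the PATH.
sample_to_match() {
    # Count lines with awk so the number carries no padding whitespace.
    n_small=$(awk 'END { print NR }' "$1")
    shuf -n "$n_small" "$2" > "$3"
}
```

For example, `sample_to_match small_file.bed large_file.bed sampled_large.bed`. If the "large" file actually has fewer lines than the small one, `shuf` simply emits all of its lines in random order.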