
--- notes on running MACS ---

For replicates:

From Ivan Gregoretti (Oct 29 2009, 8:43 am):

Hi Yan,

If you are using Solexa, just concatenate the relevant files. Make
sure you get organised first.

Let's say you are doing an analysis that, for your own records, you want
to call '123'. The idea is to pool as many records as possible from
both treatment and control. So, for example, I do:

# pooling treatment
cat /some_run/s_1_eland_multi.txt /some_other_run/s_1_eland_multi.txt > ./s_t_123_eland_multi.txt

# pooling control
cat /some_run/s_2_eland_multi.txt /some_other_run/s_2_eland_multi.txt > ./s_c_123_eland_multi.txt

Notice that in this example the assumption is that your treatments are
always lane 1 and your controls are always lane 2.
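
The cat commands above can also be written as a small Python helper. This is a minimal sketch, not part of the original workflow: the function name pool_files is hypothetical, and the per-file line counts it returns are an illustrative addition that shows how many reads each replicate contributes before you pool them. Unlike cat, it terminates a final line that lacks a trailing newline.

```python
def pool_files(inputs, output):
    """Concatenate the files in `inputs` into `output`, like `cat a b > out`.

    Returns a dict mapping each input path to the number of lines it
    contributed. Unlike cat, a final line without a trailing newline is
    terminated, so it cannot run into the first line of the next file.
    """
    counts = {}
    with open(output, "w") as out:
        for path in inputs:
            n_lines = 0
            with open(path) as src:
                for line in src:
                    if not line.endswith("\n"):
                        line += "\n"
                    out.write(line)
                    n_lines += 1
            counts[str(path)] = n_lines
    return counts
```

For example, pool_files(["/some_run/s_1_eland_multi.txt", "/some_other_run/s_1_eland_multi.txt"], "./s_t_123_eland_multi.txt") reproduces the treatment-pooling command for lane 1.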

IMPORTANT: Before pooling treatments, make sure both replicates are of
comparable quality; otherwise you may be 'polluting' your best
replicate with the 'not so good' one. In my experience, it is far
better to have fewer but good tags than a big sample
with a lot of random tags (noise). Actually, I would like to hear what
people normally use to assess sample quality.

Ivan

Ivan Gregoretti, PhD
National Institute of Diabetes and Digestive and Kidney Diseases
National Institutes of Health
5 Memorial Dr, Building 5, Room 205.
Bethesda, MD 20892. USA.
Phone: 1-301-496-1592
Fax: 1-301-496-9878 

From Tao Liu (Oct 29 2009, 9:50 am):

Hi Yan,

The script on the MACS website was contributed by one of our users, so it's
not a required step. You can convert the .ma files into BED files first;
BED files can then be concatenated without any problem.

For step 2, try Jim Kent's 'randomLines' command-line tool, which
gives you a fixed number of tags randomly picked from your input:

$randomLines
randomLines - Pick out random lines from file
usage:
    randomLines inFile count outFile
options:
    -seed=N - Set seed used for randomizing, useful for debugging.
    -decomment - remove blank lines and those starting with

For more detail, see the UCSC Genome Wiki
<http://genomewiki.ucsc.edu/index.php/Kent_source_utilities>. If you don't
want to compile Jim Kent's source code and just want a binary for this
tool, send me an email; I have an x86_64 binary of it.
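
If a randomLines binary is not at hand, the same goal (exactly count lines drawn at random from the input) can be met with reservoir sampling, which needs only one pass over the file. The sketch below is a stand-in under that assumption, not randomLines' actual implementation: the function name random_lines is hypothetical, and its seed argument only loosely mirrors the -seed=N option.

```python
import random


def random_lines(lines, count, seed=None):
    """Return `count` lines chosen uniformly at random (reservoir sampling).

    Single pass over the input; the sample is returned in input order.
    If the input has fewer than `count` lines, all of them are returned.
    """
    rng = random.Random(seed)  # a fixed seed gives a reproducible sample
    reservoir = []  # (input index, line) pairs
    for i, line in enumerate(lines):
        if i < count:
            reservoir.append((i, line))
        else:
            j = rng.randint(0, i)  # uniform over 0..i, inclusive
            if j < count:
                reservoir[j] = (i, line)
    return [line for _, line in sorted(reservoir)]
```

Passing an open file (for example, a hypothetical tags.bed) as lines streams through it without loading the whole file into memory; only the count sampled lines are held at once.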

Also, I never manually subsample the control to match the number of IP
tags. MACS calculates the ratio between IP and control itself.
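
As a rough illustration of what calculating that ratio amounts to, the sketch below rescales control tag counts linearly by the ratio of total treatment tags to total control tags. This shows the general idea only, not MACS's code; which dataset MACS scales toward and how it uses the scaled values depend on the version and its options, so check the documentation of the release you run. The function name scale_control is hypothetical.

```python
def scale_control(control_counts, n_treatment_total, n_control_total):
    """Linearly rescale control tag counts to the treatment library size.

    Each count is multiplied by n_treatment_total / n_control_total, so a
    control with twice as many total tags is scaled down by half.
    Returns the scaled counts and the ratio used.
    """
    if n_control_total <= 0:
        raise ValueError("control must contain at least one tag")
    ratio = n_treatment_total / n_control_total
    return [c * ratio for c in control_counts], ratio
```

For instance, with 1,000,000 IP tags and 2,000,000 control tags the ratio is 0.5, so control counts of 10 and 20 in two regions become 5.0 and 10.0.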

Before concatenating replicates, I recommend this process. First, run
MACS on each replicate to get wiggle files of the tag pileup, then
calculate the correlation between the replicates' wiggle files. I have
some scripts to do that. The idea is very simple: for every window of
several kbp in the genome, calculate the mean or median tag count from
each wiggle file, pair the means/medians from the replicates at the
same position, and then use R functions to calculate correlations and
draw paired scatterplots between replicates.

Best,
Tao 

Balancing Reads from different files:

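perl -wlne 'if (rand(1) < N_SMALL/N_LARGE) { print }' large_file.bed > sampled_large.bed

If you have a large file and a small file, you can sample the large file down to roughly the number of reads in the smaller file with the line above. Replace N_SMALL and N_LARGE with the read counts of the two files (for example, from wc -l). Because each line is kept independently with probability N_SMALL/N_LARGE, the output contains approximately, not exactly, N_SMALL reads. Alternatively, if you have it installed, you can use Jim Kent's randomLines program from the Kent source tree, which returns an exact number of lines.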
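
For reference, here is the same independent (Bernoulli) sampling as the perl one-liner, written as a minimal Python sketch. The function name bernoulli_sample and its arguments are hypothetical. It keeps lines in input order, and passing a seed makes a run reproducible.

```python
import random


def bernoulli_sample(lines, n_small, n_large, seed=None):
    """Keep each line independently with probability n_small / n_large.

    The expected output size is n_small, but the actual size varies
    from run to run (and from seed to seed).
    """
    if n_large <= 0:
        raise ValueError("n_large must be positive")
    p = n_small / n_large
    rng = random.Random(seed)
    return [line for line in lines if rng.random() < p]
```

The two counts can be obtained in Python with sum(1 for _ in f) on each open file, which mirrors wc -l for newline-terminated files.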

MACS (last edited 2011-06-24 18:07:45 by ChrisSeidel)