
Random sampling of fastq files

Use the ShortRead library from Bioconductor to take random samples of reads from a FASTQ file. This is useful for systematically downsampling data.

The script below reads a directory of gzipped FASTQ files whose names are based on a barcode index. It draws 1 million reads at random from each file and writes them to a new file, named after the input minus its .gz extension, in the current directory.

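library(ShortRead)

fqpath <- "/fastq/C0TD0ACXX/"
# Gzipped FASTQ files whose names end in a barcode base just before ".fastq.gz"
fqfiles <- dir(pattern = "[ACGT]\\.fastq\\.gz$", path = fqpath)

set.seed(1)  # fix the random number generator so the subsample is reproducible
for (f in fqfiles) {
   cat("sampling", f, "\n")
   # Sampler that draws 1 million (1e6) reads at random from the file
   s1 <- FastqSampler(paste0(fqpath, f), n = 1e6)
   # Output name: input name with the trailing ".gz" removed
   newName <- sub("\\.gz$", "", f)
   # compress = FALSE so the plain ".fastq" name matches the file contents
   writeFastq(yield(s1), file = newName, compress = FALSE)
   close(s1)  # release the file connection before the next iteration
}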
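For comparison, the same idea can be sketched in base R without ShortRead. This is a minimal sketch that assumes the whole file fits in memory and that every record is exactly four lines (no wrapped sequence lines). `sample_fastq()` is a hypothetical helper, not part of ShortRead. It reads all lines, draws record indices with `sample.int()`, and writes the chosen records back out in their original order. For full-size sequencing runs, FastqSampler is the better fit, since it is designed to read the file in chunks rather than all at once.

```r
# Hypothetical helper (not part of ShortRead): sample n reads from a FASTQ
# file by loading it into memory. Assumes 4-line records throughout.
sample_fastq <- function(infile, outfile, n, seed = 1) {
  # gzfile() reads both gzip-compressed and plain-text files
  con <- gzfile(infile, "rt")
  lines <- readLines(con)
  close(con)
  # Each record is exactly four lines: @id, sequence, "+", qualities
  stopifnot(length(lines) %% 4 == 0)
  n_reads <- length(lines) %/% 4
  set.seed(seed)  # same seed gives the same subsample
  # Draw record indices without replacement, then keep them in file order
  keep <- sort(sample.int(n_reads, min(n, n_reads)))
  # Expand each record index into its four line numbers
  idx <- as.vector(outer(0:3, (keep - 1) * 4, "+")) + 1
  writeLines(lines[idx], outfile)
  invisible(length(keep))
}
```

Usage would look like `sample_fastq("reads.fastq.gz", "reads_sub.fastq", 1e6)`, where both file names are placeholders.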

R/Fastq (last edited 2013-03-11 18:00:27 by ChrisSeidel)