Random sampling of fastq files

Use the ShortRead library from Bioconductor to take random samples of reads from a fastq file. This can be useful for systematically downsampling data.

The script below reads a directory of gzipped fastq files with names based on a barcode index, takes 1 million reads at random from each file, and writes a new file to the current directory.

library(ShortRead)

fqpath <- "/fastq/C0TD0ACXX/"
# dir() expects a regular expression, not a shell glob
fqfiles <- dir(path=fqpath, pattern="[AGCT]\\.fastq\\.gz$")

for( f in fqfiles){
   cat("sampling",f,"\n")
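   # sample 1e6 (1 million) reads
   s1 <- FastqSampler(file.path(fqpath, f), n=1e6)
   # strip the .gz suffix; compress=FALSE writes plain text to match the new name
   newName <- sub("\\.gz$", "", f)
   writeFastq(yield(s1), file=newName, compress=FALSE)
   close(s1)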
}
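
To make clear what sampling reads means at the file level, here is a minimal base R sketch. It is not how ShortRead implements sampling. It assumes well-formed fastq with exactly four lines per record, and the helper name sample_fastq_lines and the toy reads are made up for illustration. It also shows that fixing the seed gives the same subsample on every run.

```r
# Toy fastq content (made-up data): each record is exactly four lines
# (header, sequence, separator, qualities).
fq <- c(
  "@read1", "ACGT", "+", "IIII",
  "@read2", "TTGA", "+", "IIII",
  "@read3", "GGCA", "+", "IIII",
  "@read4", "CATG", "+", "IIII",
  "@read5", "AACC", "+", "IIII"
)

# Hypothetical helper: keep n whole records chosen at random,
# preserving their original order in the file.
sample_fastq_lines <- function(lines, n) {
  stopifnot(length(lines) %% 4 == 0)
  nrec <- length(lines) / 4
  keep <- sort(sample.int(nrec, min(n, nrec)))    # indices of records to keep
  starts <- (keep - 1) * 4 + 1                    # first line of each kept record
  lines[as.vector(outer(0:3, starts, "+"))]       # all four lines per record
}

set.seed(42)                        # fixed seed: same reads drawn each run
sub1 <- sample_fastq_lines(fq, 3)
set.seed(42)
sub2 <- sample_fastq_lines(fq, 3)
```

Records are sampled as whole units, so the output is still valid fastq, and sub1 and sub2 are identical because the seed is reset before each draw.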