Random sampling of fastq files
Use the ShortRead library from Bioconductor to take random samples of a fastq file. This can be useful for systematically downsampling data.
The script below reads a directory of gzipped fastq files with names based on a barcode index, takes 1 million reads at random from each, and writes a new file to the current directory.
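library(ShortRead)
fqpath <- "/fastq/C0TD0ACXX/"
# gzipped fastq files whose names end in a barcode base (A, C, G or T)
fqfiles <- dir(path=fqpath, pattern="[AGCT]\\.fastq\\.gz$")
for (f in fqfiles) {
  cat("sampling", f, "\n")
  # sampler that draws 1 million reads (1e6) at random from the file
  s1 <- FastqSampler(paste(fqpath, f, sep=""), 1e6)
  # drop the .gz suffix; the new file goes to the current directory
  newName <- sub("\\.gz$", "", f)
  # recent ShortRead versions gzip output by default; compress=FALSE
  # keeps the content plain text to match the uncompressed file name
  writeFastq(yield(s1), file=newName, compress=FALSE)
  close(s1)
}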
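To make the sampling step concrete, here is a minimal base R sketch of the same idea that does not use ShortRead. It is not how FastqSampler works internally. The helper name sample_fastq and the toy input are hypothetical, and the sketch assumes every record is exactly four lines and that the whole file fits in memory, so it only suits small files.

```r
# Hypothetical helper: sample n fastq records from a gzipped file using base R only.
# Assumes every record is exactly four lines and the file fits in memory.
sample_fastq <- function(infile, outfile, n) {
  con <- gzfile(infile, "r")
  lines <- readLines(con)
  close(con)
  n_records <- length(lines) %/% 4
  # pick record indices at random, then sort to keep the original file order
  keep <- sort(sample.int(n_records, min(n, n_records)))
  # expand each record index to its four line numbers, record by record
  idx <- as.vector(outer(1:4, (keep - 1) * 4, "+"))
  writeLines(lines[idx], outfile)
  invisible(length(keep))
}

# Tiny demonstration on a made-up file of 5 records
tmp_in <- tempfile(fileext = ".fastq.gz")
tmp_out <- tempfile(fileext = ".fastq")
recs <- unlist(lapply(1:5, function(i) c(paste0("@read", i), "ACGT", "+", "IIII")))
con <- gzfile(tmp_in, "w")
writeLines(recs, con)
close(con)
set.seed(1)  # fix the random seed so the base R sample is repeatable
n_kept <- sample_fastq(tmp_in, tmp_out, 3)
```

For real sequencing runs the ShortRead script is the better choice, since this sketch reads the entire file into memory before sampling.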