Revision 6 as of 2008-03-16 06:29:09


Using R to evaluate DGE for differential expression

Moderated statistical tests for assessing differences in tag abundance. (2007). Robinson MD, Smyth GK. Bioinformatics. 23(21):2881-7. PMID: 17881408

The authors of this paper develop a statistical test that uses a negative binomial distribution to model the overdispersed noise in tag counts. They release R code (available at bioinf.wehi.edu.au/resources/) to support their treatment, but it is uncommented and does not run as distributed.

Here I apply the code to a sample data set and examine the results.

At the website above you can find a tar archive called msage_0.7.tar.gz. I could not get the package to install in R, so I simply unpacked it and used the .R files directly. In addition, the code sources a file (kepler.R) that is not included in the distribution. It is, however, available here: http://dulci.biostat.duke.edu/sage/sage_analysis_code.r

The msage package contains two files of R code that can be sourced: fitNBP.R and msage.R. The file msage.R contains the statement source("kepler.R"), but kepler.R is not shipped with the package. To get the code to run, download sage_analysis_code.r from the URL above (written by Thomas Kepler at Duke) and rename it kepler.R. It contains a function, glm.poisson.disp(), that msage.R calls.
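The download-and-rename step can be scripted. The following is a minimal sketch, not part of msage: the helper name fetch_kepler is hypothetical, and it assumes the Duke URL quoted above is still reachable.

```r
# Sketch only: fetch Kepler's script and save it under the name msage.R expects.
# fetch_kepler is a hypothetical helper; the default URL is the one quoted above.
fetch_kepler <- function(url = "http://dulci.biostat.duke.edu/sage/sage_analysis_code.r",
                         dest = "kepler.R") {
  # Skip the download when a copy is already in place
  if (!file.exists(dest)) {
    download.file(url, destfile = dest, mode = "wb")
  }
  # Report whether the file is now present
  invisible(file.exists(dest))
}

# Usage, run from the directory holding the unpacked msage .R files:
# fetch_kepler()
# source("fitNBP.R")
# source("msage.R")   # its source("kepler.R") line should now resolve
```

Saving the file directly as kepler.R avoids editing msage.R itself, so the distributed code stays untouched.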

Assemble some data

msage includes a function, read.sage(), that assembles data from individual tab-delimited tag files of the form: tag, count. It takes a vector of file names to read and returns a list with two elements: a data matrix in the first position and a vector of library sizes in the second. The input files simply need to be lists of tags (or gene names) and their associated counts, separated by a tab character. The counts do not have to be normalized to tpm.
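To make the input and output shapes concrete, here is a base-R stand-in that builds the same kind of two-element list from tag files. It is an illustration only, not the msage implementation of read.sage(): the function name assemble_tags and the element names data and lib.size are hypothetical, and the files are assumed to have no header line.

```r
# Illustrative stand-in, NOT the msage implementation of read.sage().
# Reads tab-delimited "tag<TAB>count" files (no header assumed) and returns
# a list: a tag-by-library count matrix first, library sizes second.
assemble_tags <- function(files) {
  libs <- lapply(files, function(f) {
    d <- read.delim(f, header = FALSE, col.names = c("tag", "count"),
                    stringsAsFactors = FALSE)
    tapply(d$count, d$tag, sum)            # collapse any repeated tags
  })
  tags <- sort(unique(unlist(lapply(libs, names))))
  counts <- sapply(libs, function(x) {
    v <- as.numeric(x[match(tags, names(x))])
    v[is.na(v)] <- 0                       # tag absent from this library
    v
  })
  counts <- matrix(counts, nrow = length(tags),
                   dimnames = list(tags, basename(files)))
  # Library size taken as the total count in each file
  list(data = counts, lib.size = colSums(counts))
}

# Small demonstration with two made-up libraries written to temporary files
f1 <- tempfile(); writeLines(c("AAA\t5", "CCC\t2"), f1)
f2 <- tempfile(); writeLines(c("AAA\t1", "GGG\t7"), f2)
demo <- assemble_tags(c(f1, f2))
print(demo$data)
print(demo$lib.size)
```

Tags missing from a library are filled with zero so that every library shares the same rows, which is what a tag-by-library matrix requires.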

However, I already have my data in a table.

# set working directory and load library functions
setwd("E:/cry1/Solexa/Smyth")
source("fitNBP.R")
source("msage.R")
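Because the data here are already in a table, the two-element list described for read.sage() can be put together by hand. This is a sketch under stated assumptions: the table tab and its column names are made up for illustration, the element names data and lib.size are hypothetical, and library sizes are taken as column totals.

```r
# Sketch under assumptions: `tab` is a hypothetical stand-in for a table with
# one row per tag and one column of raw counts per library.
tab <- data.frame(tag  = c("AAA", "CCC", "GGG"),
                  libA = c(5, 2, 0),
                  libB = c(1, 0, 7),
                  stringsAsFactors = FALSE)

# Count matrix with tags as row names
counts <- as.matrix(tab[, -1])
rownames(counts) <- tab$tag

# Library sizes taken as column totals. This is an assumption: it undercounts
# if the table was filtered (e.g. low-count tags dropped) before import.
lib.size <- colSums(counts)

# Same layout as described for read.sage(): matrix first, sizes second
sage <- list(data = counts, lib.size = lib.size)
str(sage)
```

If the true sequencing depth of each library is known, it may be preferable to supply those totals directly rather than the column sums.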
