= References for Solexa sequencing technology =

== Platform Comparison ==
 * '''A comparison of microarray and MPSS technology platforms for expression analysis of Arabidopsis.''' (2007) Chen J, Agrawal V, Rattray M, West MA, St Clair DA, Michelmore RW, Coughlan SJ, Meyers BC. BMC Genomics. 8:414. [http://www.ncbi.nlm.nih.gov/sites/entrez?cmd=retrieve&db=pubmed&uid=17997849 PMID:17997849]
 * '''Analysis of tag-position bias in MPSS technology.''' (2006) Chen J, Rattray M. BMC Genomics. 7:77. [http://www.ncbi.nlm.nih.gov/sites/entrez?cmd=retrieve&db=pubmed&uid=16603069 PMID:16603069]
 * '''Application of Affymetrix array and Massively Parallel Signature Sequencing for identification of genes involved in prostate cancer progression.''' (2005) Oudes AJ, Roach JC, Walashek LS, Eichner LJ, True LD, Vessella RL, Liu AY. BMC Cancer. 5:86. [http://www.ncbi.nlm.nih.gov/sites/entrez?cmd=retrieve&db=pubmed&uid=16042785 PMID:16042785] ~-Comparison of Affymetrix arrays with MPSS (Lynx). For MPSS, anything with one or more tags at the 3' ends of known transcripts was counted as expressed. Tags were summed per locus (Gene ID).-~
 * '''The use of MPSS for whole-genome transcriptional analysis in Arabidopsis.''' (2004) Meyers BC, Tej SS, Vu TH, Haudenschild CD, Agrawal V, Edberg SB, Ghazal H, Decola S. Genome Res. 14(8):1641-53. [http://www.ncbi.nlm.nih.gov/sites/entrez?cmd=retrieve&db=pubmed&uid=15289482 PMID:15289482]
 * '''Analysis of the transcriptional complexity of Arabidopsis thaliana by massively parallel signature sequencing.''' (2004) Meyers BC, Vu TH, Tej SS, Ghazal H, Matvienko M, Agrawal V, Ning J, Haudenschild CD. Nat Biotechnol. (8):1006-11. [http://www.ncbi.nlm.nih.gov/sites/entrez?cmd=retrieve&db=pubmed&uid=15668391 PMID:15668391]

== Statistical Models ==
 * '''Moderated statistical tests for assessing differences in tag abundance.''' (2007) Robinson MD, Smyth GK. Bioinformatics. 23(21):2881-7.
   [http://www.ncbi.nlm.nih.gov/sites/entrez?cmd=retrieve&db=pubmed&uid=17881408 PMID:17881408] ~- Discusses the use of a negative binomial distribution to develop a statistical test for assessing digital gene expression (DGE) data. Points to a package in R. -~
 * '''Statistical analysis of MPSS measurements: application to the study of LPS-activated macrophage gene expression.''' (2005) Stolovitzky GA, Kundaje A, Held GA, Duggar KH, Haudenschild CD, Zhou D, Vasicek TJ, Smith KD, Aderem A, Roach JC. PNAS. 102:1402-7. [http://www.ncbi.nlm.nih.gov/sites/entrez?cmd=retrieve&db=pubmed&uid=15247925 PMID:15247925] ~- Massively Parallel Signature Sequencing (MPSS), a recently developed high-throughput transcription profiling technology, has the ability to profile almost every transcript in a sample without requiring prior knowledge of the sequence of the transcribed genes. As is the case with DNA microarrays, effective data analysis depends crucially on understanding how noise affects measurements. We analyze the sources of noise in MPSS and present a quantitative model describing the variability between replicate MPSS assays. We use this model to construct statistical hypotheses that test whether an observed change in gene expression in a pair-wise comparison is significant. This analysis is then extended to the determination of the significance of changes in expression levels measured over the course of a time series of measurements. We apply these analytic techniques to the study of a time series of MPSS gene expression measurements on LPS-stimulated macrophages. To evaluate our statistical significance metrics, we compare our results with published data on macrophage activation measured by using Affymetrix GeneChips.
   -~

=== Notes ===
 * Tim Burcham, Chapter 4: High-throughput and industrial methods for mRNA expression analysis, Section 4.4.1.2.5: Data handling and calculation of mRNA abundance, page 547, in "Analysing Gene Expression: A Handbook of Methods Possibilities and Pitfalls" (2003), edited by Stefan Lorkowski and Paul M. Cullen. ISBN: 3527304886
 * Because each sample yields millions of reads, and SAGE studies indicate that an average cell contains a few hundred thousand RNA molecules (Velculescu et al., 1995), a suggested method of data normalization is to express the tag count for a gene in a sample as transcripts per million (TPM).
 * Analysis of a large number of samples at Lynx suggests that a large percentage of genes are expressed at 10 to 100 transcripts per million.
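
The transcripts-per-million normalization mentioned in the notes can be sketched in a few lines of Python. This is only an illustration of the arithmetic (tag count divided by the library total, times one million), not the procedure from any of the references above. The function name `tags_per_million`, the gene IDs, and the counts are all made up for the example.

```python
def tags_per_million(tag_counts):
    """Scale raw tag counts for one library so that they sum to one million."""
    total = sum(tag_counts.values())
    if total == 0:
        raise ValueError("library contains no tags")
    return {gene: count * 1_000_000 / total for gene, count in tag_counts.items()}


# Hypothetical raw counts for one library; gene IDs and numbers are invented.
raw_counts = {"geneA": 150, "geneB": 30, "geneC": 2_999_820}

tpm = tags_per_million(raw_counts)
for gene, value in tpm.items():
    print(f"{gene}\t{value:.1f}")
```

With a library of 3,000,000 tags, a gene with 150 tags comes out at 50 transcripts per million, which falls in the 10 to 100 range noted above. Dividing by the library total is what makes counts comparable between samples sequenced to different depths.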