Title: | RNAseq data simulation, differential expression analysis and performance comparison of differential expression methods |
---|---|
Description: | This package provides extensive functionality for comparing results obtained by different methods for differential expression analysis of RNAseq data. It also contains functions for simulating count data. Finally, it provides convenient interfaces to several packages for performing the differential expression analysis. These can also be used as templates for setting up and running a user-defined differential analysis workflow within the framework of the package. |
Authors: | Charlotte Soneson [aut, cre] , Paul Bastide [aut] , Mélina Gallopin [aut] |
Maintainer: | Charlotte Soneson <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.43.0 |
Built: | 2025-01-21 03:26:51 UTC |
Source: | https://github.com/bioc/compcodeR |
RNAseq data simulation, differential expression analysis and performance comparison of differential expression methods
This package provides extensive functionality for comparing results obtained by different methods for differential expression analysis of RNAseq data. It also contains functions for simulating count data and interfaces to several packages for performing the differential expression analysis.
Charlotte Soneson
Check the validity of a compData object
Check the validity of a compData object. An object that passes the check can be used as the input for the differential expression analysis methods interfaced by compcodeR.
check_compData(object)
object |
A |
Charlotte Soneson
mydata <- generateSyntheticData(dataset = "mydata", n.vars = 1000,
                                samples.per.cond = 5, n.diffexp = 100)
check_compData(mydata)
Check the validity of a compData result object
Check the validity of a compData object containing differential expression results. An object that passes the check can be used as the input for the method comparison functions in compcodeR.
check_compData_results(object)
object |
A |
Charlotte Soneson
tmpdir <- normalizePath(tempdir(), winslash = "/")
mydata <- generateSyntheticData(dataset = "mydata", n.vars = 1000,
                                samples.per.cond = 5, n.diffexp = 100,
                                output.file = file.path(tmpdir, "mydata.rds"))
## Check an object without differential expression results
check_compData_results(mydata)
runDiffExp(data.file = file.path(tmpdir, "mydata.rds"),
           result.extent = "voom.limma",
           Rmdfunction = "voom.limma.createRmd",
           output.directory = tmpdir, norm.method = "TMM")
resdata <- readRDS(file.path(tmpdir, "mydata_voom.limma.rds"))
## Check an object containing differential expression results
check_compData_results(resdata)
Check the validity of a phyloCompData object
Check the validity of a phyloCompData object. An object that passes the check can be used as the input for the differential expression analysis methods interfaced by compcodeR.
check_phyloCompData(object)
object |
A |
Charlotte Soneson, Paul Bastide
mydata <- generateSyntheticData(dataset = "mydata", n.vars = 1000,
                                samples.per.cond = 5, n.diffexp = 100,
                                id.species = factor(1:10),
                                tree = ape::rphylo(10, 1, 0),
                                lengths.relmeans = "auto",
                                lengths.dispersions = "auto")
check_phyloCompData(mydata)
Check a list or a compData object for compatibility with the differential expression functions interfaced by compcodeR
Check if a list or a compData object contains the necessary slots for applying the differential expression functions interfaced by the compcodeR package. This function is provided for backward compatibility; see also check_compData and check_compData_results.
checkDataObject(data.obj)
data.obj |
A list containing data and condition information, or a |
Charlotte Soneson
mydata.obj <- generateSyntheticData(dataset = "mydata", n.vars = 1000,
                                    samples.per.cond = 5, n.diffexp = 100)
checkDataObject(mydata.obj)
Check consistency of input table to runComparison
Check that the dataset, nbr.samples, repl and de.methods columns of a data frame are consistent with the information provided in the input files (given in the input.files column of the data frame). If there are inconsistencies or missing information in any of the columns, replace the given information with the information in the input files.
checkTableConsistency(file.table)
file.table |
A data frame with columns named |
Returns a consistent file table defining the result files that will be used as the basis for a method comparison.
Charlotte Soneson
tmpdir <- normalizePath(tempdir(), winslash = "/")
mydata.obj <- generateSyntheticData(dataset = "mydata", n.vars = 1000,
                                    samples.per.cond = 5, n.diffexp = 100,
                                    output.file = file.path(tmpdir, "mydata.rds"))
runDiffExp(data.file = file.path(tmpdir, "mydata.rds"),
           result.extent = "voom.limma",
           Rmdfunction = "voom.limma.createRmd",
           output.directory = tmpdir, norm.method = "TMM")
runDiffExp(data.file = file.path(tmpdir, "mydata.rds"),
           result.extent = "edgeR.exact",
           Rmdfunction = "edgeR.exact.createRmd",
           output.directory = tmpdir, norm.method = "TMM",
           trend.method = "movingave", disp.type = "tagwise")

## A correct table
file.table <- data.frame(input.files = file.path(tmpdir,
                           c("mydata_voom.limma.rds", "mydata_edgeR.exact.rds")),
                         datasets = c("mydata", "mydata"),
                         nbr.samples = c(5, 5),
                         repl = c(1, 1),
                         stringsAsFactors = FALSE)
new.table <- checkTableConsistency(file.table)
new.table

## An incorrect table
file.table <- data.frame(input.files = file.path(tmpdir,
                           c("mydata_voom.limma.rds", "mydata_edgeR.exact.rds")),
                         datasets = c("mydata", "mydata"),
                         nbr.samples = c(5, 3),
                         repl = c(2, 1),
                         stringsAsFactors = FALSE)
new.table <- checkTableConsistency(file.table)
new.table

## A table with missing information
file.table <- data.frame(input.files = file.path(tmpdir,
                           c("mydata_voom.limma.rds", "mydata_edgeR.exact.rds")),
                         stringsAsFactors = FALSE)
new.table <- checkTableConsistency(file.table)
new.table
Create a compData object
The compData class is used to store information about the experiment, such as the count matrix, sample and variable annotations, information regarding the generation of the data and results from applying a differential expression analysis to the data. This constructor function creates a compData object.
compData(
  count.matrix,
  sample.annotations,
  info.parameters,
  variable.annotations = data.frame(),
  filtering = "no info",
  analysis.date = "",
  package.version = "",
  method.names = list(),
  code = "",
  result.table = data.frame()
)
count.matrix |
A count matrix, with genes as rows and observations as columns. |
sample.annotations |
A data frame, containing at least one column named 'condition', encoding the grouping of the observations into two groups. The row names should be the same as the column names of the |
info.parameters |
A list containing information regarding simulation parameters etc. The only mandatory entries are
|
variable.annotations |
A data frame with variable annotations (with number of rows equal to the number of rows in
|
filtering |
A character string containing information about the filtering that has been applied to the data set. |
analysis.date |
If a differential expression analysis has been performed, a character string detailing when it was performed. |
package.version |
If a differential expression analysis has been performed, a character string giving the version of the differential expression packages that were applied. |
method.names |
If a differential expression analysis has been performed, a list with entries |
code |
If a differential expression analysis has been performed, a character string containing the code that was run to perform the analysis. The code should be in R markdown format, and can be written to an HTML file using the |
result.table |
If a differential expression analysis has been performed, a data frame containing the results of the analysis. The number of rows should be equal to the number of rows in
|
A compData object.
Charlotte Soneson
count.matrix <- round(matrix(1000*runif(4000), 1000))
sample.annotations <- data.frame(condition = c(1, 1, 2, 2))
info.parameters <- list(dataset = "mydata", uID = "123456")
cpd <- compData(count.matrix, sample.annotations, info.parameters)
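The argument descriptions ask that the row names of sample.annotations match the column names of count.matrix, and that info.parameters contain the dataset and uID entries. The following is a minimal base R sketch of how such inputs could be prepared and sanity-checked before calling the constructor. The gene and sample names are purely illustrative assumptions, and the checks shown are not the package's own validity method.

```r
# Hypothetical toy inputs: 1000 genes, 4 samples (names are illustrative only)
set.seed(1)
count.matrix <- round(matrix(1000 * runif(4000), nrow = 1000))
rownames(count.matrix) <- paste0("g", seq_len(nrow(count.matrix)))
colnames(count.matrix) <- paste0("sample", seq_len(ncol(count.matrix)))

# Row names of the annotation table are taken from the count matrix columns
sample.annotations <- data.frame(condition = c(1, 1, 2, 2),
                                 row.names = colnames(count.matrix))

info.parameters <- list(dataset = "mydata", uID = "123456")

# Sanity checks mirroring the documented requirements
stopifnot(identical(rownames(sample.annotations), colnames(count.matrix)),
          "condition" %in% colnames(sample.annotations),
          all(c("dataset", "uID") %in% names(info.parameters)))
```

With compcodeR loaded, these three objects could then be passed to compData() in the same way as in the example above.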
The compData class is used to store information about the experiment, such as the count matrix, sample and variable annotations, information regarding the generation of the data and results from applying a differential expression analysis to the data.
count.matrix: The read count matrix, with genes as rows and samples as columns. Class matrix
sample.annotations: A data frame containing sample annotation information for all samples in the data set. Must contain at least a column named condition, encoding the division of the samples into two classes. The row names should be the same as the column names of count.matrix. Class data.frame
info.parameters: A list of parameters detailing the simulation process used to generate the data. Must contain at least two entries, named dataset (an informative name for the data set/simulation setting) and uID (a unique ID for the specific data set instance). Class list
filtering: A character string detailing the filtering process that has been applied to the data. Class character
variable.annotations: Contains information regarding the variables, such as the differential expression status, the true mean, dispersion and effect sizes. If present, the row names should be the same as those of count.matrix. Class data.frame
analysis.date: (If a differential expression analysis has been performed and the results are included in the compData object). Gives the date when the differential expression analysis was performed. Class character
package.version: (If a differential expression analysis has been performed and the results are included in the compData object). Gives the version numbers of the package(s) used for the differential expression analysis. Class character
method.names: (If a differential expression analysis has been performed and the results are included in the compData object). A list, containing the name of the method used for the differential expression analysis. The list should have two entries: full.name and short.name, where the full.name is the full (potentially long) name identifying the method, and short.name may be an abbreviation. Class list
code: (If a differential expression analysis has been performed and the results are included in the compData object). A character string containing the code that was used to run the differential expression analysis. The code should be in R markdown format. Class character
result.table: (If a differential expression analysis has been performed and the results are included in the compData object). Contains the results of the differential expression analysis, in the form of a data frame with one row per gene. Must contain at least one column named score, where a higher value corresponds to 'more strongly differentially expressed genes'. Class data.frame
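To illustrate the requirement on the score column of result.table, here is a minimal base R sketch. It assumes that a method returns per-gene p-values and that the score is defined as one minus the p-value; this is only one possible choice that satisfies "higher means more strongly differentially expressed". The gene names and p-values are made up for illustration.

```r
# Hypothetical per-gene p-values (illustrative values, not real results)
pvalue <- c(g1 = 0.001, g2 = 0.20, g3 = 0.85, g4 = 0.04)

# One possible score: 1 - p-value, so that smaller p-values give higher scores
result.table <- data.frame(pvalue = pvalue,
                           score = 1 - pvalue,
                           row.names = names(pvalue))

# Genes ordered from most to least strongly differentially expressed
ranked <- rownames(result.table)[order(result.table$score, decreasing = TRUE)]
ranked
```

Any other monotone transformation (for example the absolute value of a test statistic) would satisfy the same requirement, as long as larger values indicate stronger evidence.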
signature(x="compData"), signature(x="compData",value="matrix"): Get or set the count matrix in a compData object. value should be a numeric matrix.
signature(x="compData"), signature(x="compData",value="data.frame"): Get or set the sample annotations data frame in a compData object. value should be a data frame with at least a column named 'condition'.
signature(x="compData"), signature(x="compData",value="list"): Get or set the list with info parameters in a compData object. value should be a list with at least elements named 'dataset' and 'uID'.
signature(x="compData"), signature(x="compData",value="character"): Get or set the information about the filtering in a compData object. value should be a character string describing the filtering that has been performed.
signature(x="compData"), signature(x="compData",value="data.frame"): Get or set the variable annotations data frame in a compData object. value should be a data frame.
signature(x="compData"), signature(x="compData",value="character"): Get or set the analysis date in a compData object. value should be a character string describing when the differential expression analysis of the data was performed.
signature(x="compData"), signature(x="compData",value="character"): Get or set the information about the package version in a compData object. value should be a character string detailing which packages and versions were used to perform the differential expression analysis of the data.
signature(x="compData"), signature(x="compData",value="list"): Get or set the method names in a compData object. value should be a list with slots full.name and short.name, giving the full name and an abbreviation for the method that was used to perform the analysis of the data.
signature(x="compData"), signature(x="compData",value="character"): Get or set the code slot in a compData object. value should be a character string in R markdown format, giving the code that was run to obtain the results from the differential expression analysis.
signature(x="compData"), signature(x="compData",value="data.frame"): Get or set the result table in a compData object. value should be a data frame with one row per gene, and at least a column named 'score'.
An object of the class compData can be constructed using the compData function.
Charlotte Soneson
Convert a compData object to a list
Given a compData object, convert it to a list.
convertcompDataToList(cpd)
cpd |
A |
Charlotte Soneson
mydata.obj <- generateSyntheticData(dataset = "mydata", n.vars = 12500,
                                    samples.per.cond = 5, n.diffexp = 1250)
mydata.list <- convertcompDataToList(mydata.obj)
Convert a list with data and results to a compData object
Given a list with data and results (resulting e.g. from compcodeR version 0.1.0), convert it to a compData object.
convertListTocompData(inp.list)
inp.list |
A list with data and results, e.g. generated by |
Charlotte Soneson
convertListTocompData(list(count.matrix = matrix(round(1000*runif(4000)), 1000),
                           sample.annotations = data.frame(condition = c(1,1,2,2)),
                           info.parameters = list(dataset = "mydata",
                                                  uID = "123456")))
Convert a list with data and results to a phyloCompData object
Given a list with data and results (resulting e.g. from compcodeR version 0.1.0), convert it to a phyloCompData object.
convertListTophyloCompData(inp.list)
inp.list |
A list with data and results, e.g. generated by |
Charlotte Soneson, Paul Bastide
tree <- ape::read.tree(
  text = "(((A1:0,A2:0,A3:0):1,B1:1):1,((C1:0,C2:0):1.5,(D1:0,D2:0):1.5):0.5);"
)
count.matrix <- round(matrix(1000*runif(8000), 1000))
sample.annotations <- data.frame(condition = c(1, 1, 1, 1, 2, 2, 2, 2),
                                 id.species = c("A", "A", "A", "B", "C", "C", "D", "D"))
info.parameters <- list(dataset = "mydata", uID = "123456")
length.matrix <- round(matrix(1000*runif(8000), 1000))
colnames(count.matrix) <- colnames(length.matrix) <-
  rownames(sample.annotations) <- tree$tip.label
convertListTophyloCompData(list(count.matrix = count.matrix,
                                sample.annotations = sample.annotations,
                                info.parameters = list(dataset = "mydata",
                                                       uID = "123456"),
                                tree = tree,
                                length.matrix = length.matrix))
Convert a phyloCompData object to a list
Given a phyloCompData object, convert it to a list.
convertphyloCompDataToList(cpd)
cpd |
A |
Charlotte Soneson, Paul Bastide
tree <- ape::read.tree(
  text = "(((A1:0,A2:0,A3:0):1,B1:1):1,((C1:0,C2:0):1.5,(D1:0,D2:0):1.5):0.5);"
)
id.species <- factor(c("A", "A", "A", "B", "C", "C", "D", "D"))
names(id.species) <- tree$tip.label
mydata.obj <- generateSyntheticData(dataset = "mydata", n.vars = 1000,
                                    samples.per.cond = 4, n.diffexp = 100,
                                    tree = tree,
                                    id.species = id.species)
## Use the phyloCompData converter documented on this page
mydata.list <- convertphyloCompDataToList(mydata.obj)
Generate a .Rmd file containing code to perform differential expression analysis with DESeq2
A function to generate code that can be run to perform differential expression analysis of RNAseq data (comparing two conditions) using the DESeq2 package. The code is written to a .Rmd file. This function is generally not called by the user; the main interface for performing differential expression analysis is the runDiffExp function.
DESeq2.createRmd(
  data.path,
  result.path,
  codefile,
  fit.type,
  test,
  beta.prior = TRUE,
  independent.filtering = TRUE,
  cooks.cutoff = TRUE,
  impute.outliers = TRUE,
  nas.as.ones = FALSE
)
data.path |
The path to a .rds file containing the |
result.path |
The path to the file where the result object will be saved. |
codefile |
The path to the file where the code will be written. |
fit.type |
The fitting method used to get the dispersion-mean relationship. Possible values are |
test |
The test to use. Possible values are |
beta.prior |
Whether or not to put a zero-mean normal prior on the non-intercept coefficients. Default is |
independent.filtering |
Whether or not to perform independent filtering of the data. With independent.filtering = TRUE, the adjusted p-values for genes not passing the filter threshold are set to NA. |
cooks.cutoff |
The cutoff value for the Cook's distance to consider a value to be an outlier. Set to Inf or FALSE to disable outlier detection. For genes with detected outliers, the p-value and adjusted p-value will be set to NA. |
impute.outliers |
Whether or not the outliers should be replaced by a trimmed mean and the analysis rerun. |
nas.as.ones |
Whether or not adjusted p values that are returned as |
For more information about the methods and the interpretation of the parameters, see the DESeq2 package and the corresponding publications.
The function generates a .Rmd file containing the code for performing the differential expression analysis. This file can be executed using e.g. the knitr package.
Charlotte Soneson
Anders S and Huber W (2010): Differential expression analysis for sequence count data. Genome Biology 11:R106
try(
if (require(DESeq2)) {
tmpdir <- normalizePath(tempdir(), winslash = "/")
mydata.obj <- generateSyntheticData(dataset = "mydata", n.vars = 1000,
                                    samples.per.cond = 5, n.diffexp = 100,
                                    output.file = file.path(tmpdir, "mydata.rds"))
runDiffExp(data.file = file.path(tmpdir, "mydata.rds"), result.extent = "DESeq2",
           Rmdfunction = "DESeq2.createRmd",
           output.directory = tmpdir, fit.type = "parametric",
           test = "Wald")
})
Generate a .Rmd file containing code to perform differential expression analysis with DESeq2 with custom model matrix
A function to generate code that can be run to perform differential expression analysis of RNAseq data (comparing two conditions) using the DESeq2 package. The code is written to a .Rmd file. This function is generally not called by the user; the main interface for performing differential expression analysis is the runDiffExp function.
DESeq2.length.createRmd(
  data.path,
  result.path,
  codefile,
  fit.type,
  test,
  beta.prior = TRUE,
  independent.filtering = TRUE,
  cooks.cutoff = TRUE,
  impute.outliers = TRUE,
  extra.design.covariates = NULL,
  nas.as.ones = FALSE
)
data.path |
The path to a .rds file containing the |
result.path |
The path to the file where the result object will be saved. |
codefile |
The path to the file where the code will be written. |
fit.type |
The fitting method used to get the dispersion-mean relationship. Possible values are |
test |
The test to use. Possible values are |
beta.prior |
Whether or not to put a zero-mean normal prior on the non-intercept coefficients. Default is |
independent.filtering |
Whether or not to perform independent filtering of the data. With independent.filtering = TRUE, the adjusted p-values for genes not passing the filter threshold are set to NA. |
cooks.cutoff |
The cutoff value for the Cook's distance to consider a value to be an outlier. Set to Inf or FALSE to disable outlier detection. For genes with detected outliers, the p-value and adjusted p-value will be set to NA. |
impute.outliers |
Whether or not the outliers should be replaced by a trimmed mean and the analysis rerun. |
extra.design.covariates |
A vector containing the names of extra control variables to be passed to the design matrix of |
nas.as.ones |
Whether or not adjusted p values that are returned as |
For more information about the methods and the interpretation of the parameters, see the DESeq2 package and the corresponding publications.
The lengths matrix is used as a normalization factor and applied to the DESeq2 model in the way explained in normalizationFactors (see examples of this function). The provided matrix will be multiplied by the default normalization factor obtained through the estimateSizeFactors function.
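The multiplication described above can be sketched in base R as follows. This is only an illustration under stated assumptions: the length matrix and the per-sample size factors are made-up values (in practice the size factors would come from estimateSizeFactors), and the sketch does not reproduce the code actually written to the .Rmd file.

```r
# Hypothetical inputs: 3 genes x 4 samples (values are illustrative only)
length.matrix <- matrix(c(1000, 1200,  900, 1100,
                          2000, 2100, 1900, 2000,
                           500,  450,  550,  500),
                        nrow = 3, byrow = TRUE,
                        dimnames = list(paste0("g", 1:3), paste0("s", 1:4)))

# Assumed per-sample size factors, standing in for estimateSizeFactors output
size.factors <- c(s1 = 0.9, s2 = 1.1, s3 = 1.0, s4 = 1.05)

# Multiply each column (sample) of the length matrix by that sample's size factor
norm.factors <- sweep(length.matrix, 2, size.factors, "*")
norm.factors
```

The result is a gene-by-sample matrix with the same dimensions as the count matrix, which is the shape that gene- and sample-specific normalization factors need to have.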
The design model used in the DESeqDataSetFromMatrix uses the "condition" column of the sample.annotations data frame from the phyloCompData object as well as all the covariates named in extra.design.covariates. For example, if extra.design.covariates = c("var1", "var2"), then sample.annotations must have two columns named "var1" and "var2", and the design formula in the DESeqDataSetFromMatrix function will be: ~ condition + var1 + var2.
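As a rough illustration of how such a design formula relates to the covariate names, the base R sketch below builds the formula with reformulate() and expands it into a model matrix. The sample annotations are hypothetical, and this is not necessarily how the package constructs the formula internally.

```r
# Names of the extra covariates, as in the example above
extra.design.covariates <- c("var1", "var2")

# Build the one-sided formula ~ condition + var1 + var2
design.formula <- reformulate(c("condition", extra.design.covariates))

# Hypothetical sample annotations containing the required columns
sample.annotations <- data.frame(condition = factor(c(1, 1, 2, 2)),
                                 var1 = factor(c("a", "b", "a", "b")),
                                 var2 = c(0.1, -0.3, 1.2, 0.4))

# Every variable in the formula must be a column of sample.annotations
stopifnot(all(all.vars(design.formula) %in% colnames(sample.annotations)))

# Expand the formula into a design matrix (intercept + one column per term here)
design.matrix <- model.matrix(design.formula, data = sample.annotations)
colnames(design.matrix)
```

A missing column in sample.annotations would make the stopifnot() check fail early, which is easier to diagnose than an error raised later inside the analysis.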
The function generates a .Rmd file containing the code for performing the differential expression analysis. This file can be executed using e.g. the knitr package.
Charlotte Soneson, Paul Bastide, Mélina Gallopin
Anders S and Huber W (2010): Differential expression analysis for sequence count data. Genome Biology 11:R106
Love, M.I., Huber, W., Anders, S. (2014) Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biology, 15:550. 10.1186/s13059-014-0550-8.
try(
if (require(DESeq2)) {
tmpdir <- normalizePath(tempdir(), winslash = "/")
## Simulate data
mydata.obj <- generateSyntheticData(dataset = "mydata", n.vars = 1000,
                                    samples.per.cond = 5, n.diffexp = 100,
                                    id.species = 1:10,
                                    lengths.relmeans = rpois(1000, 1000),
                                    lengths.dispersions = rgamma(1000, 1, 1),
                                    output.file = file.path(tmpdir, "mydata.rds"))
## Add covariates
## Model fitted is count.matrix ~ condition + test_factor + test_reg
sample.annotations(mydata.obj)$test_factor <- factor(rep(1:2, each = 5))
sample.annotations(mydata.obj)$test_reg <- rnorm(10, 0, 1)
saveRDS(mydata.obj, file.path(tmpdir, "mydata.rds"))
## Diff Exp
runDiffExp(data.file = file.path(tmpdir, "mydata.rds"), result.extent = "DESeq2",
           Rmdfunction = "DESeq2.length.createRmd",
           output.directory = tmpdir, fit.type = "parametric",
           test = "Wald",
           extra.design.covariates = c("test_factor", "test_reg"))
})
Generate a .Rmd file containing code to perform differential expression analysis with DSS
A function to generate code that can be run to perform differential expression analysis of RNAseq data (comparing two conditions) using the DSS package. The code is written to a .Rmd file. This function is generally not called by the user; the main interface for performing differential expression analysis is the runDiffExp function.
DSS.createRmd(data.path, result.path, codefile, norm.method, disp.trend)
data.path |
The path to a .rds file containing the |
result.path |
The path to the file where the result object will be saved. |
codefile |
The path to the file where the code will be written. |
norm.method |
The between-sample normalization method used to compensate for varying library sizes and composition in the differential expression analysis. Possible values are |
disp.trend |
A logical parameter indicating whether or not to include a trend in the dispersion estimation. |
For more information about the methods and the interpretation of the parameters, see the DSS package and the corresponding publications.
Charlotte Soneson
Wu H, Wang C and Wu Z (2013): A new shrinkage estimator for dispersion improves differential expression detection in RNA-seq data. Biostatistics 14(2), 232-243
try(
if (require(DSS)) {
tmpdir <- normalizePath(tempdir(), winslash = "/")
mydata.obj <- generateSyntheticData(dataset = "mydata", n.vars = 1000,
                                    samples.per.cond = 5, n.diffexp = 100,
                                    output.file = file.path(tmpdir, "mydata.rds"))
runDiffExp(data.file = file.path(tmpdir, "mydata.rds"), result.extent = "DSS",
           Rmdfunction = "DSS.createRmd",
           output.directory = tmpdir, norm.method = "quantile",
           disp.trend = TRUE)
})
A function to generate code that can be run to perform differential expression analysis of RNAseq data (comparing two conditions) using the EBSeq package. The code is written to a .Rmd file. This function is generally not called by the user; the main interface for performing differential expression analysis is the runDiffExp function.
EBSeq.createRmd(data.path, result.path, codefile, norm.method)
data.path |
The path to a .rds file containing the |
result.path |
The path to the file where the result object will be saved. |
codefile |
The path to the file where the code will be written. |
norm.method |
The between-sample normalization method used to compensate for varying library sizes and composition in the differential expression analysis. Possible values are |
For more information about the methods and the meaning of the parameters, see the EBSeq package and the corresponding publications.
The function generates a .Rmd file containing the differential expression code. This file can be executed using e.g. the knitr package.
Charlotte Soneson
Leng N, Dawson JA, Thomson JA, Ruotti V, Rissman AI, Smits BMG, Haag JD, Gould MN, Stewart RM and Kendziorski C (2013): EBSeq: An empirical Bayes hierarchical model for inference in RNA-seq experiments. Bioinformatics
try(
if (require(EBSeq)) {
tmpdir <- normalizePath(tempdir(), winslash = "/")
mydata.obj <- generateSyntheticData(dataset = "mydata", n.vars = 1000,
                                    samples.per.cond = 5, n.diffexp = 100,
                                    output.file = file.path(tmpdir, "mydata.rds"))
runDiffExp(data.file = file.path(tmpdir, "mydata.rds"), result.extent = "EBSeq",
           Rmdfunction = "EBSeq.createRmd",
           output.directory = tmpdir, norm.method = "median")
})
Generate a .Rmd file containing code to perform differential expression analysis with the edgeR exact test
A function to generate code that can be run to perform differential expression analysis of RNAseq data (comparing two conditions) using the exact test functionality from the edgeR package. The code is written to a .Rmd file. This function is generally not called by the user; the main interface for performing differential expression analysis is the runDiffExp function.
edgeR.exact.createRmd( data.path, result.path, codefile, norm.method, trend.method, disp.type )
data.path |
The path to a .rds file containing the |
result.path |
The path to the file where the result object will be saved. |
codefile |
The path to the file where the code will be written. |
norm.method |
The between-sample normalization method used to compensate for varying library sizes and composition in the differential expression analysis. Possible values are |
trend.method |
The method used to estimate the trend in the mean-dispersion relationship. Possible values are |
disp.type |
The type of dispersion estimate used. Possible values are |
For more information about the methods and the interpretation of the parameters, see the edgeR package and the corresponding publications.
The function generates a .Rmd file containing the code for performing the differential expression analysis. This file can be executed using e.g. the knitr package.
Charlotte Soneson
Robinson MD, McCarthy DJ and Smyth GK (2010): edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26, 139-140
tmpdir <- normalizePath(tempdir(), winslash = "/")
mydata.obj <- generateSyntheticData(dataset = "mydata", n.vars = 1000,
                                    samples.per.cond = 5, n.diffexp = 100,
                                    output.file = file.path(tmpdir, "mydata.rds"))
runDiffExp(data.file = file.path(tmpdir, "mydata.rds"),
           result.extent = "edgeR.exact", Rmdfunction = "edgeR.exact.createRmd",
           output.directory = tmpdir, norm.method = "TMM",
           trend.method = "movingave", disp.type = "tagwise")
Generate a .Rmd file containing code to perform differential expression analysis with the edgeR GLM approach
A function to generate code that can be run to perform differential expression analysis of RNAseq data (comparing two conditions) using the GLM functionality from the edgeR package. The code is written to a .Rmd file. This function is generally not called by the user; the main interface for performing differential expression analysis is the runDiffExp function.
edgeR.GLM.createRmd( data.path, result.path, codefile, norm.method, disp.type, disp.method, trended )
data.path |
The path to a .rds file containing the |
result.path |
The path to the file where the result object will be saved. |
codefile |
The path to the file where the code will be written. |
norm.method |
The between-sample normalization method used to compensate for varying library sizes and composition in the differential expression analysis. Possible values are |
disp.type |
The type of dispersion estimate used. Possible values are |
disp.method |
The method used to estimate the dispersion. Possible values are |
trended |
Logical parameter indicating whether or not a trended dispersion estimate should be used. |
For more information about the methods and the interpretation of the parameters, see the edgeR package and the corresponding publications.
The function generates a .Rmd file containing the code for performing the differential expression analysis. This file can be executed using e.g. the knitr package.
Charlotte Soneson
Robinson MD, McCarthy DJ and Smyth GK (2010): edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26, 139-140
tmpdir <- normalizePath(tempdir(), winslash = "/")
mydata.obj <- generateSyntheticData(dataset = "mydata", n.vars = 1000,
                                    samples.per.cond = 5, n.diffexp = 100,
                                    output.file = file.path(tmpdir, "mydata.rds"))
runDiffExp(data.file = file.path(tmpdir, "mydata.rds"),
           result.extent = "edgeR.GLM", Rmdfunction = "edgeR.GLM.createRmd",
           output.directory = tmpdir, norm.method = "TMM",
           disp.type = "tagwise", disp.method = "CoxReid", trended = TRUE)
A function to extract the code used to generate differential expression results from saved compData result objects (typically obtained by runDiffExp), and to write the code to HTML files. This requires that the code was saved as a character string in R markdown format in the code slot of the result object, which is done automatically by runDiffExp. If the differential expression analysis was performed with functions outside compcodeR, the code has to be added manually to the result object.
generateCodeHTMLs(input.files, output.directory)
input.files |
A vector with paths to one or several |
output.directory |
The path to the directory where the code HTML files will be saved. |
Charlotte Soneson
tmpdir <- normalizePath(tempdir(), winslash = "/")
mydata.obj <- generateSyntheticData(dataset = "mydata", n.vars = 1000,
                                    samples.per.cond = 5, n.diffexp = 100,
                                    output.file = file.path(tmpdir, "mydata.rds"))
runDiffExp(data.file = file.path(tmpdir, "mydata.rds"),
           result.extent = "voom.limma", Rmdfunction = "voom.limma.createRmd",
           output.directory = tmpdir, norm.method = "TMM")
generateCodeHTMLs(file.path(tmpdir, "mydata_voom.limma.rds"), tmpdir)
Generate synthetic count data sets, following the simulation strategy detailed in Soneson and Delorenzi (2013).
generateSyntheticData( dataset, n.vars, samples.per.cond, n.diffexp, repl.id = 1, seqdepth = 1e+07, minfact = 0.7, maxfact = 1.4, relmeans = "auto", dispersions = "auto", fraction.upregulated = 1, between.group.diffdisp = FALSE, filter.threshold.total = 1, filter.threshold.mediancpm = 0, fraction.non.overdispersed = 0, random.outlier.high.prob = 0, random.outlier.low.prob = 0, single.outlier.high.prob = 0, single.outlier.low.prob = 0, effect.size = 1.5, output.file = NULL, tree = NULL, prop.var.tree = 1, model.process = c("BM", "OU"), selection.strength = 0, id.condition = NULL, id.species = as.factor(rep(1, 2 * samples.per.cond)), check.id.species = TRUE, lengths.relmeans = NULL, lengths.dispersions = NULL, lengths.phylo = TRUE )
dataset |
A name or identifier for the data set/simulation settings. |
n.vars |
The initial number of genes in the simulated data set. Based on the filtering conditions ( |
samples.per.cond |
The number of samples in each of the two conditions. |
n.diffexp |
The number of genes simulated to be differentially expressed between the two conditions. |
repl.id |
A replicate ID for the specific simulation instance. Useful for example when generating multiple count matrices with the same simulation settings. |
seqdepth |
The base sequencing depth (total number of mapped reads). This number is multiplied by a value drawn uniformly between |
minfact, maxfact |
The minimum and maximum for the uniform distribution used to generate factors that are multiplied with |
relmeans |
A vector of mean values to use in the simulation of data from the Negative Binomial distribution, or |
dispersions |
A vector or matrix of dispersions to use in the simulation of data from the Negative Binomial distribution, or |
fraction.upregulated |
The fraction of the differentially expressed genes that is upregulated in condition 2 compared to condition 1. |
between.group.diffdisp |
Whether or not the dispersion should be allowed to be different between the conditions. Only applicable if |
filter.threshold.total |
The filter threshold on the total count for a gene across all samples. All genes for which the total count across all samples is less than the threshold will be filtered out. |
filter.threshold.mediancpm |
The filter threshold on the median count per million (cpm) for a gene across all samples. All genes for which the median cpm across all samples is less than the threshold will be filtered out. |
fraction.non.overdispersed |
The fraction of the genes that should be simulated according to a Poisson distribution, without overdispersion. The non-overdispersed genes will be divided proportionally between the upregulated, downregulated and non-differentially expressed genes. |
random.outlier.high.prob |
The fraction of 'random' outliers with unusually high counts. |
random.outlier.low.prob |
The fraction of 'random' outliers with unusually low counts. |
single.outlier.high.prob |
The fraction of 'single' outliers with unusually high counts. |
single.outlier.low.prob |
The fraction of 'single' outliers with unusually low counts. |
effect.size |
The strength of the differential expression, i.e., the effect size, between the two conditions. If this is a single number, the effect sizes will be obtained by simulating numbers from an exponential distribution (with rate 1) and adding the results to the |
output.file |
If not |
tree |
A dated phylogenetic tree of class |
prop.var.tree |
The proportion of the common variance explained by the tree for each gene. It can be a scalar, in which case the same parameter is used for all genes. Otherwise it needs to be a vector with length |
model.process |
The process to be used for phylogenetic simulations. One of "BM" or "OU"; defaults to "BM". |
selection.strength |
The selection strength parameter, used if the process is "OU". |
id.condition |
A named vector, indicating which species is in each condition. Defaults to the first 'samples.per.cond' species in condition '1' and the others in condition '2'. |
id.species |
A factor giving the species for each sample. If a tree is used, should be a named vector with names matching the taxa of the tree. Defaults to |
check.id.species |
Should the species vector be checked against the tree lengths (if provided)? If TRUE, the function checks that all the samples that share a factor value in |
lengths.relmeans |
An optional vector of mean values to use in the simulation of lengths from the Negative Binomial distribution. Should be of length n.vars. Defaults to |
lengths.dispersions |
An optional vector of dispersions to use in the simulation of data from the Negative Binomial distribution. Should be of length n.vars. Defaults to |
lengths.phylo |
If TRUE, the lengths are simulated according to a phylogenetic Poisson Log-Normal model on the tree, with a BM process. If FALSE, they are simulated according to an iid negative binomial distribution. In both cases, |
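As a rough illustration of how seqdepth, minfact and maxfact interact with the two filter thresholds, the base R sketch below draws one depth factor per sample and then filters a toy count matrix. It is not the package's implementation: the relative means, the fixed Negative Binomial size parameter and all variable names are made up for the example, and the cpm is assumed to be computed as count / library size * 1e6.

```r
set.seed(1)

seqdepth <- 1e7
minfact <- 0.7
maxfact <- 1.4
n.samples <- 10
n.vars <- 200

# One depth factor per sample, drawn uniformly between minfact and maxfact
depth.factor <- runif(n.samples, min = minfact, max = maxfact)
lib.size <- seqdepth * depth.factor

# Toy relative means (hypothetical), rescaled to sum to 1
relmeans <- rgamma(n.vars, shape = 0.5, rate = 1)
relmeans <- relmeans / sum(relmeans)

# Toy counts: Negative Binomial with mean = relative mean * library size
mu <- outer(relmeans, lib.size)
counts <- matrix(rnbinom(length(mu), mu = mu, size = 5), nrow = n.vars)

# Filters: total count and median cpm across all samples
filter.threshold.total <- 1
filter.threshold.mediancpm <- 0
cpm <- sweep(counts, 2, colSums(counts), "/") * 1e6
keep <- rowSums(counts) >= filter.threshold.total &
  apply(cpm, 1, median) >= filter.threshold.mediancpm
filtered <- counts[keep, , drop = FALSE]
```

Because genes can be removed by the filters, the number of rows in filtered may be smaller than n.vars.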
In the comparison function, only results obtained for data sets with the same value of the dataset parameter will be compared. Hence, it is important to give the same value of this parameter e.g. to different replicates generated with the same simulation settings.
For more detailed information regarding the different types of outliers, see Soneson and Delorenzi (2013).
Mean and dispersion parameters (if relmeans and/or dispersions is set to "auto") are sampled from values estimated from the data sets by Pickrell et al (2010) and Cheung et al (2010). The data sets were downloaded from the ReCount web page (Frazee et al (2011)) and processed as detailed by Soneson and Delorenzi (2013).
To get the actual mean value for the Negative Binomial distribution used for the simulation of counts for a given sample, take the column truemeans.S1 (or truemeans.S2, if the sample is in condition S2) of the variable.annotations slot, divide by the sum of the same column and multiply with the base sequencing depth (provided in the info.parameters list) and the depth factor for the sample (given in the sample.annotations data frame). Thus, if you want a vector of mean values provided as the relmeans argument to be used 'as-is' in the simulation (for condition S1), set the seqdepth argument to the sum of the values in the relmeans vector, and set minfact and maxfact equal to 1.
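The arithmetic described above can be sketched in base R as follows. The vector truemeans.S1 is a made-up stand-in for the corresponding column of the variable.annotations slot, and nb_mean is a hypothetical helper written for this illustration, not a compcodeR function.

```r
# Hypothetical relative means for condition S1 (stand-in for truemeans.S1)
truemeans.S1 <- c(10, 200, 50, 740)

# Mean of the Negative Binomial distribution for one sample:
# relative mean / sum of relative means * base depth * sample depth factor
nb_mean <- function(truemeans, seqdepth, depth.factor) {
  truemeans / sum(truemeans) * seqdepth * depth.factor
}

# General case: base depth 1e7 and a sample with depth factor 1.2
general <- nb_mean(truemeans.S1, seqdepth = 1e7, depth.factor = 1.2)

# 'As-is' case: seqdepth equal to the sum of the means, depth factor 1
# (which is what minfact = maxfact = 1 enforces)
as.is <- nb_mean(truemeans.S1, seqdepth = sum(truemeans.S1), depth.factor = 1)
all.equal(as.is, truemeans.S1)
```

In the general case the means for a sample sum to seqdepth times the depth factor, so the input vector only determines the relative proportions.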
When the tree argument is provided (not NULL), the "phylogenetic Poisson log-Normal" model is used for the simulations, possibly with varying gene lengths across species (both lengths.relmeans and lengths.dispersions must be specified or set to "auto"). Phylogenetic simulations use the rTrait function from package phylolm.
A compData object. If output.file is not NULL, the object is saved in the given output.file (which should have an .rds extension).
Charlotte Soneson
Soneson C and Delorenzi M (2013): A comparison of methods for differential expression analysis of RNA-seq data. BMC Bioinformatics 14:91
Cheung VG, Nayak RR, Wang IX, Elwyn S, Cousins SM, Morley M and Spielman RS (2010): Polymorphic cis- and trans-regulation of human gene expression. PLoS Biology 8(9):e1000480
Frazee AC, Langmead B and Leek JT (2011): ReCount: a multi-experiment resource of analysis-ready RNA-seq gene count datasets. BMC Bioinformatics 12:449
Pickrell JK, Marioni JC, Pai AA, Degner JF, Engelhardt BE, Nkadori E, Veyrieras JB, Stephens M, Gilad Y and Pritchard JK (2010): Understanding mechanisms underlying human gene expression variation with RNA sequencing. Nature 464, 768-772
Robles JA, Qureshi SE, Stephen SJ, Wilson SR, Burden CJ and Taylor JM (2012): Efficient experimental design and analysis strategies for the detection of differential expression using RNA-sequencing. BMC Genomics 13:484
Stern DB and Crandall KA (2018): The Evolution of Gene Expression Underlying Vision Loss in Cave Animals. Molecular Biology and Evolution. 35:2005–2014.
## RNA-Seq data
mydata.obj <- generateSyntheticData(dataset = "mydata", n.vars = 1000,
                                    samples.per.cond = 5, n.diffexp = 100)

## Inter-species RNA-Seq data
library(ape)
tree <- read.tree(text = "(((A1:0,A2:0,A3:0):1,B1:1):1,((C1:0,C2:0):1.5,(D1:0,D2:0):1.5):0.5);")
id.species <- factor(c("A", "A", "A", "B", "C", "C", "D", "D"))
names(id.species) <- tree$tip.label
mydata.obj <- generateSyntheticData(dataset = "mydata", n.vars = 1000,
                                    samples.per.cond = 4, n.diffexp = 100,
                                    tree = tree, id.species = id.species,
                                    lengths.relmeans = "auto",
                                    lengths.dispersions = "auto")
Generate a .Rmd file containing code to perform differential expression analysis with length normalized counts + limma
A function to generate code that can be run to perform differential expression analysis of RNAseq data (comparing two conditions) by applying a length normalizing transformation followed by differential expression analysis with limma. The code is written to a .Rmd file. This function is generally not called by the user; the main interface for performing differential expression analysis is the runDiffExp function.
lengthNorm.limma.createRmd( data.path, result.path, codefile, norm.method, extra.design.covariates = NULL, length.normalization = "RPKM", data.transformation = "log2", trend = FALSE, block.factor = NULL )
data.path |
The path to a .rds file containing the |
result.path |
The path to the file where the result object will be saved. |
codefile |
The path to the file where the code will be written. |
norm.method |
The between-sample normalization method used to compensate for varying library sizes and composition in the differential expression analysis. The normalization factors are calculated using the |
extra.design.covariates |
A vector containing the names of extra control variables to be passed to the design matrix of |
length.normalization |
One of "none" (no length correction), "TPM", or "RPKM" (default). See details. |
data.transformation |
One of "log2", "asin(sqrt)" or "sqrt". Data transformation to apply to the normalized data. |
trend |
Should an intensity-trend be allowed for the prior variance? Defaults to |
block.factor |
Name of the factor specifying a blocking variable, to be passed to |
For more information about the methods and the interpretation of the parameters, see the limma package and the corresponding publications.
The length.matrix field of the phyloCompData object is used to normalize the counts, using one of the following formulas:
length.normalization="none": $CPM_{gi} = \frac{N_{gi} + 0.5}{NF_i \times \sum_{g} N_{gi} + 1} \times 10^6$
length.normalization="TPM": $TPM_{gi} = \frac{(N_{gi} + 0.5) / L_{gi}}{NF_i \times \sum_{g} N_{gi} / L_{gi} + 1} \times 10^6$
length.normalization="RPKM": $RPKM_{gi} = \frac{(N_{gi} + 0.5) / L_{gi}}{NF_i \times \sum_{g} N_{gi} + 1} \times 10^9$
where $N_{gi}$ is the count for gene g and sample i, $L_{gi}$ is the length of gene g in sample i, and $NF_i$ is the normalization factor for sample i, computed using calcNormFactors of the edgeR package.
The function specified by data.transformation is then applied to the normalized count matrix.
The "$+0.5$" and "$+1$" offsets are taken from Law et al 2014, and are dropped from the normalization when the transformation is anything other than log2.
The "$\times 10^6$" and "$\times 10^9$" factors are omitted when the asin(sqrt) transformation is used, as $\arcsin$ can only be applied to values that do not exceed 1.
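A base R sketch of this kind of length normalization is given below. It assumes the usual offset-based definitions: a 0.5 offset on the counts and a 1 offset on the scaled library size (following Law et al 2014) that are only used with the log2 transformation, and scaling factors of 1e6 (no length correction, TPM) or 1e9 (RPKM) that are skipped for asin(sqrt). The function length_normalize and its argument names are hypothetical and not part of compcodeR; the normalization factors are passed in directly instead of being computed with calcNormFactors.

```r
# Hypothetical helper, written for illustration only
length_normalize <- function(counts, lengths, norm.factors,
                             length.normalization = c("RPKM", "TPM", "none"),
                             data.transformation = c("log2", "asin(sqrt)", "sqrt")) {
  length.normalization <- match.arg(length.normalization)
  data.transformation <- match.arg(data.transformation)

  # Offsets are only applied together with the log2 transformation
  use.offset <- data.transformation == "log2"
  count.offset <- if (use.offset) 0.5 else 0
  lib.offset <- if (use.offset) 1 else 0

  # Numerator: counts, optionally divided by the gene lengths
  numer <- switch(length.normalization,
                  none = counts + count.offset,
                  TPM = ,
                  RPKM = (counts + count.offset) / lengths)
  # Per-sample total used in the denominator
  total <- switch(length.normalization,
                  TPM = colSums(counts / lengths),
                  none = ,
                  RPKM = colSums(counts))
  # Scaling factor, skipped for asin(sqrt) so that values stay at or below 1
  scaling <- 1
  if (data.transformation != "asin(sqrt)") {
    scaling <- if (length.normalization == "RPKM") 1e9 else 1e6
  }

  normalized <- sweep(numer, 2, norm.factors * total + lib.offset, "/") * scaling
  switch(data.transformation,
         log2 = log2(normalized),
         "asin(sqrt)" = asin(sqrt(normalized)),
         sqrt = sqrt(normalized))
}

# Toy input: 3 genes, 2 samples, all genes 1000 bp long, unit normalization factors
counts <- matrix(c(10, 20, 30, 40, 50, 60), nrow = 3)
lengths <- matrix(1000, nrow = 3, ncol = 2)
nf <- c(1, 1)

rpkm.log2 <- length_normalize(counts, lengths, nf, "RPKM", "log2")
tpm.sqrt <- length_normalize(counts, lengths, nf, "TPM", "sqrt")
cpm.asin <- length_normalize(counts, lengths, nf, "none", "asin(sqrt)")
```

With the sqrt transformation and no offsets, the squared TPM values in each sample sum to 1e6, which gives a quick sanity check of the sketch.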
The design model used in lmFit uses the "condition" column of the sample.annotations data frame from the phyloCompData object as well as all the covariates named in extra.design.covariates. For example, if extra.design.covariates = c("var1", "var2"), then sample.annotations must have two columns named "var1" and "var2", and the design formula in the lmFit function will be: ~ condition + var1 + var2.
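The following base R sketch shows what a design matrix for the formula ~ condition + var1 + var2 looks like. The sample.annotations data frame is a made-up stand-in for the slot of the same name, and how compcodeR assembles the design internally is not shown in this manual, so the use of reformulate and model.matrix here is only an assumption about one reasonable way to do it.

```r
# Made-up sample annotations: 10 samples, two conditions, two extra covariates
sample.annotations <- data.frame(
  condition = factor(rep(1:2, each = 5)),
  var1 = factor(rep(1:2, times = 5)),
  var2 = seq(-1, 1, length.out = 10)
)
extra.design.covariates <- c("var1", "var2")

# Build the formula ~ condition + var1 + var2 from the covariate names
design.formula <- reformulate(c("condition", extra.design.covariates))

# One row per sample, one column per model coefficient
design <- model.matrix(design.formula, data = sample.annotations)
colnames(design)
```

Each factor covariate contributes one column per non-reference level and each numeric covariate contributes a single column, so every name in extra.design.covariates must match a column of sample.annotations.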
The function generates a .Rmd file containing the code for performing the differential expression analysis. This file can be executed using e.g. the knitr package.
Charlotte Soneson, Paul Bastide, Mélina Gallopin
Smyth GK (2005): Limma: linear models for microarray data. In: 'Bioinformatics and Computational Biology Solutions using R and Bioconductor'. R. Gentleman, V. Carey, S. Dudoit, R. Irizarry, W. Huber (eds), Springer, New York, pages 397-420
Smyth, G. K., Michaud, J., and Scott, H. (2005). The use of within-array replicate spots for assessing differential expression in microarray experiments. Bioinformatics 21(9), 2067-2075.
Law, C.W., Chen, Y., Shi, W. et al. (2014) voom: precision weights unlock linear model analysis tools for RNA-seq read counts. Genome Biol 15, R29.
Musser, JM, Wagner, GP. (2015): Character trees from transcriptome data: Origin and individuation of morphological characters and the so‐called “species signal”. J. Exp. Zool. (Mol. Dev. Evol.) 324B: 588–604.
try(
  if (require(limma)) {
    tmpdir <- normalizePath(tempdir(), winslash = "/")
    ## Simulate data
    mydata.obj <- generateSyntheticData(dataset = "mydata", n.vars = 1000,
                                        samples.per.cond = 5, n.diffexp = 100,
                                        id.species = factor(1:10),
                                        lengths.relmeans = rpois(1000, 1000),
                                        lengths.dispersions = rgamma(1000, 1, 1),
                                        output.file = file.path(tmpdir, "mydata.rds"))
    ## Add covariates
    ## Model fitted is count.matrix ~ condition + test_factor + test_reg
    sample.annotations(mydata.obj)$test_factor <- factor(rep(1:2, each = 5))
    sample.annotations(mydata.obj)$test_reg <- rnorm(10, 0, 1)
    saveRDS(mydata.obj, file.path(tmpdir, "mydata.rds"))
    ## Diff Exp
    runDiffExp(data.file = file.path(tmpdir, "mydata.rds"),
               result.extent = "length.limma",
               Rmdfunction = "lengthNorm.limma.createRmd",
               output.directory = tmpdir, norm.method = "TMM",
               extra.design.covariates = c("test_factor", "test_reg"))
  }
)
Generate a .Rmd file containing code to perform differential expression analysis with length normalized counts + SVA + limma
A function to generate code that can be run to perform differential expression analysis of RNAseq data (comparing two conditions) by applying a length normalizing transformation, followed by a surrogate variable analysis (SVA), and then a differential expression analysis with limma. The code is written to a .Rmd file. This function is generally not called by the user; the main interface for performing differential expression analysis is the runDiffExp function.
lengthNorm.sva.limma.createRmd( data.path, result.path, codefile, norm.method, extra.design.covariates = NULL, length.normalization = "RPKM", data.transformation = "log2", trend = FALSE, n.sv = "auto" )
data.path |
The path to a .rds file containing the |
result.path |
The path to the file where the result object will be saved. |
codefile |
The path to the file where the code will be written. |
norm.method |
The between-sample normalization method used to compensate for varying library sizes and composition in the differential expression analysis. The normalization factors are calculated using the |
extra.design.covariates |
A vector containing the names of extra control variables to be passed to the design matrix of |
length.normalization |
One of "none" (no length correction), "TPM", or "RPKM" (default). See details. |
data.transformation |
One of "log2", "asin(sqrt)" or "sqrt". Data transformation to apply to the normalized data. |
trend |
Should an intensity-trend be allowed for the prior variance? Defaults to |
n.sv |
The number of surrogate variables to estimate (see |
For more information about the methods and the interpretation of the parameters, see the sva and limma packages and the corresponding publications.
See the details section of lengthNorm.limma.createRmd for details on the normalization and the extra design covariates.
The function generates a .Rmd file containing the code for performing the differential expression analysis. This file can be executed using e.g. the knitr package.
Charlotte Soneson, Paul Bastide, Mélina Gallopin
Smyth GK (2005): Limma: linear models for microarray data. In: 'Bioinformatics and Computational Biology Solutions using R and Bioconductor'. R. Gentleman, V. Carey, S. Dudoit, R. Irizarry, W. Huber (eds), Springer, New York, pages 397-420
Smyth, G. K., Michaud, J., and Scott, H. (2005). The use of within-array replicate spots for assessing differential expression in microarray experiments. Bioinformatics 21(9), 2067-2075.
Law, C.W., Chen, Y., Shi, W. et al. (2014) voom: precision weights unlock linear model analysis tools for RNA-seq read counts. Genome Biol 15, R29.
Musser, JM, Wagner, GP. (2015): Character trees from transcriptome data: Origin and individuation of morphological characters and the so‐called “species signal”. J. Exp. Zool. (Mol. Dev. Evol.) 324B: 588–604.
Leek JT, Johnson WE, Parker HS, Jaffe AE, and Storey JD. (2012) The sva package for removing batch effects and other unwanted variation in high-throughput experiments. Bioinformatics DOI:10.1093/bioinformatics/bts034
try(
  if (require(limma) && require(sva)) {
    tmpdir <- normalizePath(tempdir(), winslash = "/")
    ## Simulate data
    mydata.obj <- generateSyntheticData(dataset = "mydata", n.vars = 1000,
                                        samples.per.cond = 5, n.diffexp = 100,
                                        id.species = factor(1:10),
                                        lengths.relmeans = rpois(1000, 1000),
                                        lengths.dispersions = rgamma(1000, 1, 1),
                                        output.file = file.path(tmpdir, "mydata.rds"))
    ## Add covariates
    ## Model fitted is count.matrix ~ condition + test_factor + test_reg
    sample.annotations(mydata.obj)$test_factor <- factor(rep(1:2, each = 5))
    sample.annotations(mydata.obj)$test_reg <- rnorm(10, 0, 1)
    saveRDS(mydata.obj, file.path(tmpdir, "mydata.rds"))
    ## Diff Exp
    runDiffExp(data.file = file.path(tmpdir, "mydata.rds"),
               result.extent = "lengthNorm.sva.limma",
               Rmdfunction = "lengthNorm.sva.limma.createRmd",
               output.directory = tmpdir, norm.method = "TMM",
               extra.design.covariates = c("test_factor", "test_reg"))
  }
)
Print a list of all *.createRmd functions that are available in the search path. These functions can be used together with the runDiffExp function to perform differential expression analysis. Consult the help pages for the respective functions for more information.
listcreateRmd()
Charlotte Soneson
listcreateRmd()
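For readers who want to see how functions following the *.createRmd naming convention can be discovered on the search path, here is a minimal base R sketch using apropos. Whether listcreateRmd works this way internally is an assumption, and myMethod.createRmd is a dummy function defined only for this illustration.

```r
# A dummy user-defined function following the naming convention (hypothetical)
myMethod.createRmd <- function(data.path, result.path, codefile, ...) {
  invisible(NULL)
}

# All functions on the search path whose name ends in ".createRmd"
available <- apropos("\\.createRmd$", mode = "function")
print(available)
```

When run at top level, the dummy function is found because the global environment is part of the search path; with compcodeR attached, the functions it provides would be expected to appear in the same listing.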
Generate a .Rmd file containing code to perform differential expression analysis with limma after log-transforming the counts per million (cpm)
A function to generate code that can be run to perform differential expression analysis of RNAseq data (comparing two conditions) using limma, after preprocessing the counts by computing the counts per million (cpm) and applying a logarithmic transformation. The code is written to a .Rmd file. This function is generally not called by the user; the main interface for performing differential expression analysis is the runDiffExp function.
logcpm.limma.createRmd(data.path, result.path, codefile, norm.method)
data.path |
The path to a .rds file containing the |
result.path |
The path to the file where the result object will be saved. |
codefile |
The path to the file where the code will be written. |
norm.method |
The between-sample normalization method used to compensate for varying library sizes and composition in the differential expression analysis. The normalization factors are calculated using the |
For more information about the methods and the interpretation of the parameters, see the edgeR and limma packages and the corresponding publications.
The function generates a .Rmd file containing the code for performing the differential expression analysis. This file can be executed using e.g. the knitr package.
Charlotte Soneson
Smyth GK (2005): Limma: linear models for microarray data. In: 'Bioinformatics and Computational Biology Solutions using R and Bioconductor'. R. Gentleman, V. Carey, S. Dudoit, R. Irizarry, W. Huber (eds), Springer, New York, pages 397-420
Robinson MD, McCarthy DJ and Smyth GK (2010): edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26, 139-140
Robinson MD and Oshlack A (2010): A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biology 11:R25
tmpdir <- normalizePath(tempdir(), winslash = "/")
mydata.obj <- generateSyntheticData(dataset = "mydata", n.vars = 1000,
                                    samples.per.cond = 5, n.diffexp = 100,
                                    output.file = file.path(tmpdir, "mydata.rds"))
runDiffExp(data.file = file.path(tmpdir, "mydata.rds"),
           result.extent = "logcpm.limma", Rmdfunction = "logcpm.limma.createRmd",
           output.directory = tmpdir, norm.method = "TMM")
Generate a .Rmd file containing code to perform differential expression analysis with NBPSeq
A function to generate code that can be run to perform differential expression analysis of RNAseq data (comparing two conditions) using NBPSeq. The code is written to a .Rmd file. This function is generally not called by the user; the main interface for performing differential expression analysis is the runDiffExp function.
NBPSeq.createRmd(data.path, result.path, codefile, norm.method, disp.method)
data.path |
The path to a .rds file containing the |
result.path |
The path to the file where the result object will be saved. |
codefile |
The path to the file where the code will be written. |
norm.method |
The between-sample normalization method used to compensate for varying library sizes and composition in the differential expression analysis. The normalization factors are calculated using the |
disp.method |
The method to use to estimate the dispersion values. Possible values are |
For more information about the methods and the interpretation of the parameters, see the NBPSeq
and edgeR
packages and the corresponding publications.
The function generates a .Rmd
file containing the code for performing the differential expression analysis. This file can be executed using e.g. the knitr
package.
Charlotte Soneson
Robinson MD, McCarthy DJ and Smyth GK (2010): edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26, 139-140
Robinson MD and Oshlack A (2010): A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biology 11:R25
Di Y, Schafer DW, Cumbie JS, and Chang JH (2011): The NBP Negative Binomial Model for Assessing Differential Gene Expression from RNA-Seq. Statistical Applications in Genetics and Molecular Biology 10(1), 1-28
try(
  if (require(NBPSeq)) {
    tmpdir <- normalizePath(tempdir(), winslash = "/")
    mydata.obj <- generateSyntheticData(dataset = "mydata", n.vars = 1000,
                                        samples.per.cond = 5, n.diffexp = 100,
                                        output.file = file.path(tmpdir, "mydata.rds"))
    runDiffExp(data.file = file.path(tmpdir, "mydata.rds"),
               result.extent = "NBPSeq", Rmdfunction = "NBPSeq.createRmd",
               output.directory = tmpdir, norm.method = "TMM",
               disp.method = "NBP")
  }
)
Generate a .Rmd file containing code to perform differential expression analysis with NOISeq
A function to generate code that can be run to perform differential expression analysis of RNAseq data (comparing two conditions) using NOISeq. The code is written to a .Rmd file. This function is generally not called by the user; the main interface for performing differential expression analysis is the runDiffExp function.
NOISeq.prenorm.createRmd(data.path, result.path, codefile, norm.method)
data.path |
The path to a .rds file containing the |
result.path |
The path to the file where the result object will be saved. |
codefile |
The path to the file where the code will be written. |
norm.method |
The between-sample normalization method used to compensate for varying library sizes and composition in the differential expression analysis. The normalization factors are calculated using the |
For more information about the methods and the interpretation of the parameters, see the NOISeq
package and the corresponding publications.
The function generates a .Rmd
file containing the code for performing the differential expression analysis. This file can be executed using e.g. the knitr
package.
Charlotte Soneson
Robinson MD, McCarthy DJ and Smyth GK (2010): edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26, 139-140
Robinson MD and Oshlack A (2010): A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biology 11:R25
Tarazona S, Furio-Tari P, Ferrer A and Conesa A (2012): NOISeq: Exploratory analysis and differential expression for RNA-seq data. R package
Tarazona S, Garcia-Alcalde F, Dopazo J, Ferrer A and Conesa A (2011): Differential expression in RNA-seq: a matter of depth. Genome Res 21(12), 2213-2223
try(
  if (require(NOISeq)) {
    tmpdir <- normalizePath(tempdir(), winslash = "/")
    mydata.obj <- generateSyntheticData(dataset = "mydata", n.vars = 1000,
                                        samples.per.cond = 5, n.diffexp = 100,
                                        output.file = file.path(tmpdir, "mydata.rds"))
    runDiffExp(data.file = file.path(tmpdir, "mydata.rds"),
               result.extent = "NOISeq",
               Rmdfunction = "NOISeq.prenorm.createRmd",
               output.directory = tmpdir, norm.method = "TMM")
  }
)
Create a phyloCompData object
The phyloCompData class extends the compData class with sequence length and phylogeny related information.
phyloCompData( count.matrix, sample.annotations, info.parameters, variable.annotations = data.frame(), filtering = "no info", analysis.date = "", package.version = "", method.names = list(), code = "", result.table = data.frame(), tree = list(), length.matrix = matrix(NA_integer_, 0, 0) )
count.matrix |
A count matrix, with genes as rows and observations as columns. |
sample.annotations |
A data frame, containing at least one column named 'condition', encoding the grouping of the observations into two groups, and one column named id.species, giving the species for each sample (needed if the tree is specified). |
info.parameters |
A list containing information regarding simulation parameters etc. The only mandatory entries are
|
variable.annotations |
A data frame with variable annotations (with number of rows equal to the number of rows in
|
filtering |
A character string containing information about the filtering that has been applied to the data set. |
analysis.date |
If a differential expression analysis has been performed, a character string detailing when it was performed. |
package.version |
If a differential expression analysis has been performed, a character string giving the version of the differential expression packages that were applied. |
method.names |
If a differential expression analysis has been performed, a list with entries |
code |
If a differential expression analysis has been performed, a character string containing the code that was run to perform the analysis. The code should be in R markdown format, and can be written to an HTML file using the |
result.table |
If a differential expression analysis has been performed, a data frame containing the results of the analysis. The number of rows should be equal to the number of rows in
|
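tree |
The phylogenetic tree describing the relationships between samples. The taxa names of the tree should be the same as the column names of the count.matrix. |
length.matrix |
The length matrix, with genes as rows and samples as columns. The column names of the length.matrix should be the same as the column names of the count.matrix. |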
A phyloCompData
object.
Charlotte Soneson, Paul Bastide
tree <- ape::read.tree(
  text = "(((A1:0,A2:0,A3:0):1,B1:1):1,((C1:0,C2:0):1.5,(D1:0,D2:0):1.5):0.5);"
)
count.matrix <- round(matrix(1000*runif(8000), 1000))
sample.annotations <- data.frame(condition = c(1, 1, 1, 1, 2, 2, 2, 2),
                                 id.species = c("A", "A", "A", "B", "C", "C", "D", "D"))
info.parameters <- list(dataset = "mydata", uID = "123456")
length.matrix <- round(matrix(1000*runif(8000), 1000))
colnames(count.matrix) <- colnames(length.matrix) <-
  rownames(sample.annotations) <- tree$tip.label
cpd <- phyloCompData(count.matrix, sample.annotations, info.parameters,
                     tree = tree, length.matrix = length.matrix)
The phyloCompData
class extends the compData
class
with sequence length and phylogeny related information.
tree: The phylogenetic tree describing the relationships between samples. The taxa names of the tree should be the same as the column names of the count.matrix. Class phylo.
length.matrix: The length matrix, with genes as rows and samples as columns. The column names of the length.matrix should be the same as the column names of the count.matrix. Class matrix.
sample.annotations: In addition to the columns described in the compData class, if the tree is specified, it should contain an extra column named id.species of factors giving the species for each sample. The row names should be the same as the column names of count.matrix. Class data.frame.
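The naming rules stated for these slots can be checked before constructing an object. The sketch below uses base R only; `check_names_consistent` is a hypothetical helper, not a compcodeR function, and the tip labels are passed as a plain character vector (with a real `phylo` object they would come from `tree$tip.label`).

```r
# Hypothetical helper (not part of compcodeR): verifies the naming rules
# described for the tree, length.matrix and sample.annotations slots.
check_names_consistent <- function(count.matrix, length.matrix,
                                   sample.annotations, tip.labels) {
  samples <- colnames(count.matrix)
  c(
    # Taxa names of the tree match the count matrix columns (any order)
    tree = setequal(tip.labels, samples),
    # Column names of the length matrix match those of the count matrix
    length.matrix = identical(colnames(length.matrix), samples),
    # Row names of the annotations match the count matrix columns
    sample.annotations = identical(rownames(sample.annotations), samples),
    # The extra species column is present
    id.species = "id.species" %in% colnames(sample.annotations)
  )
}

samples <- c("A1", "A2", "B1", "C1")
count.matrix <- matrix(1:8, nrow = 2, dimnames = list(NULL, samples))
length.matrix <- matrix(1000L, nrow = 2, ncol = 4,
                        dimnames = list(NULL, samples))
sample.annotations <- data.frame(condition = c(1, 1, 2, 2),
                                 id.species = factor(c("A", "A", "B", "C")),
                                 row.names = samples)

checks <- check_names_consistent(count.matrix, length.matrix,
                                 sample.annotations, tip.labels = samples)
print(checks)
```

Any `FALSE` entry points to the slot whose names need to be fixed; the package's own validity checks remain the authoritative test.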
signature(x="phyloCompData"), signature(x="phyloCompData",value="phylo"): Get or set the tree in a phyloCompData object. value should be a phylo object.
signature(x="phyloCompData"), signature(x="phyloCompData",value="matrix"): Get or set the length matrix in a phyloCompData object. value should be a numeric matrix.
An object of the class phyloCompData
can be constructed using the phyloCompData
function.
Charlotte Soneson, Paul Bastide
Generate a .Rmd file containing code to perform differential expression analysis with phylolm
A function to generate code that can be run to perform differential expression analysis of RNAseq data (comparing two conditions) using the phylolm package. The code is written to a .Rmd file. This function is generally not called by the user; the main interface for performing differential expression analysis is the runDiffExp function.
phylolm.createRmd( data.path, result.path, codefile, norm.method, model = "BM", measurement_error = TRUE, extra.design.covariates = NULL, length.normalization = "RPKM", data.transformation = "log2", ... )
data.path |
The path to a .rds file containing the |
result.path |
The path to the file where the result object will be saved. |
codefile |
The path to the file where the code will be written. |
norm.method |
The between-sample normalization method used to compensate for varying library sizes and composition in the differential expression analysis. The normalization factors are calculated using the calcNormFactors function of the edgeR package. |
model |
The model for trait evolution on the tree. Defaults to "BM". |
measurement_error |
A logical value indicating whether there is measurement error. Defaults to TRUE. |
extra.design.covariates |
A vector containing the names of extra control variables to be passed to the design matrix of |
length.normalization |
One of "none" (no correction), "TPM" or "RPKM" (default). See details. |
data.transformation |
One of "log2", "asin(sqrt)" or "sqrt". Data transformation to apply to the normalized data. |
... |
Further arguments to be passed to function |
For more information about the methods and the interpretation of the parameters, see the phylolm
package and the corresponding publications.
The length.matrix field of the phyloCompData object is used to normalize the counts, using one of the following formulas:
* length.normalization="none": $CPM_{g,i} = \frac{N_{g,i} + 0.5}{NF_i \times \sum_{g} N_{g,i} + 1} \times 10^6$
* length.normalization="TPM": $TPM_{g,i} = \frac{(N_{g,i} + 0.5) / L_{g,i}}{NF_i \times \sum_{g} N_{g,i}/L_{g,i} + 1} \times 10^6$
* length.normalization="RPKM": $RPKM_{g,i} = \frac{(N_{g,i} + 0.5) / L_{g,i}}{NF_i \times \sum_{g} N_{g,i} + 1} \times 10^9$
where $N_{g,i}$ is the count for gene g and sample i, $L_{g,i}$ is the length of gene g in sample i, and $NF_i$ is the normalization factor for sample i, computed using calcNormFactors of the edgeR package.
The function specified by data.transformation is then applied to the normalized count matrix.
The "$+0.5$" and "$+1$" offsets are taken from Law et al. 2014, and are dropped from the normalization when the transformation is anything other than log2.
The "$\times 10^6$" and "$\times 10^9$" factors are omitted when the asin(sqrt) transformation is used, as asin can only be applied to real numbers smaller than 1.
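To make the three length normalizations concrete, here is a minimal base R sketch for the log2 case. It assumes voom-style offsets (0.5 added to each count, 1 added to the scaled library size) and scaling factors of 10^6 (10^9 for RPKM). `normalize_counts` is a hypothetical name, and the normalization factors `nf` are set to 1 here instead of being computed with edgeR. It is not the code that phylolm.createRmd writes to the .Rmd file.

```r
# Hypothetical helper illustrating the length normalization options.
# counts, lengths: genes x samples matrices; nf: one factor per sample.
normalize_counts <- function(counts, lengths, nf = rep(1, ncol(counts)),
                             length.normalization = c("RPKM", "TPM", "none")) {
  length.normalization <- match.arg(length.normalization)
  num <- counts + 0.5                      # offset on the counts
  if (length.normalization != "none") {
    num <- num / lengths                   # per-gene length correction
  }
  lib <- if (length.normalization == "TPM") {
    colSums(counts / lengths)              # length-scaled library size
  } else {
    colSums(counts)                        # raw library size
  }
  scale <- if (length.normalization == "RPKM") 1e9 else 1e6
  # Divide each column by its (normalized library size + 1), then rescale
  sweep(num, 2, nf * lib + 1, "/") * scale
}

counts <- matrix(c(10, 30, 20, 20), nrow = 2)
lengths <- matrix(1000, nrow = 2, ncol = 2)

cpm  <- normalize_counts(counts, lengths, length.normalization = "none")
rpkm <- normalize_counts(counts, lengths, length.normalization = "RPKM")
tpm  <- normalize_counts(counts, lengths, length.normalization = "TPM")

# data.transformation = "log2" applied to the normalized matrix
log2.rpkm <- log2(rpkm)
```

With all gene lengths equal to 1000, the RPKM and CPM values coincide, which gives a quick sanity check of the scaling constants.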
The design model passed to phylolm uses the "condition" column of the sample.annotations data frame from the phyloCompData object, as well as all the covariates named in extra.design.covariates. For example, if extra.design.covariates = c("var1", "var2"), then sample.annotations must have two columns named "var1" and "var2", and the design formula in the phylolm function will be: ~ condition + var1 + var2.
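The way the design formula follows from extra.design.covariates can be sketched in base R. This illustrates the rule above, not the code generated by phylolm.createRmd; the annotation values are made up for the example.

```r
# Build the design formula from "condition" plus the extra covariates
extra.design.covariates <- c("var1", "var2")
design <- reformulate(c("condition", extra.design.covariates))
print(design)

# Illustrative sample annotations (values are invented for this sketch)
sample.annotations <- data.frame(
  condition = factor(c(1, 1, 2, 2)),
  var1 = factor(c(1, 2, 1, 2)),
  var2 = c(0.1, -0.3, 0.8, 0.2)
)

# Every variable in the formula must be a column of sample.annotations
stopifnot(all(all.vars(design) %in% colnames(sample.annotations)))

# Design matrix: intercept, condition, var1 and var2 columns
design.matrix <- model.matrix(design, data = sample.annotations)
```

Checking the column names up front gives a clearer error than a failure inside the generated analysis code.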
The function generates a .Rmd
file containing the code for performing the differential expression analysis. This file can be executed using e.g. the knitr
package.
Charlotte Soneson, Paul Bastide, Mélina Gallopin
Ho, L. S. T. and Ane, C. 2014. "A linear-time algorithm for Gaussian and non-Gaussian trait evolution models". Systematic Biology 63(3):397-408.
Law, C.W., Chen, Y., Shi, W. et al. (2014) voom: precision weights unlock linear model analysis tools for RNA-seq read counts. Genome Biol 15, R29.
Musser, JM, Wagner, GP. (2015): Character trees from transcriptome data: Origin and individuation of morphological characters and the so‐called “species signal”. J. Exp. Zool. (Mol. Dev. Evol.) 324B: 588–604.
try(
  if (require(ape) && require(phylolm)) {
    tmpdir <- normalizePath(tempdir(), winslash = "/")
    set.seed(20200317)
    tree <- rphylo(10, 0.1, 0)
    mydata.obj <- generateSyntheticData(dataset = "mydata", n.vars = 1000,
                                        samples.per.cond = 5, n.diffexp = 100,
                                        tree = tree, id.species = 1:10,
                                        lengths.relmeans = rpois(1000, 1000),
                                        lengths.dispersions = rgamma(1000, 1, 1),
                                        output.file = file.path(tmpdir, "mydata.rds"))
    ## Add covariates
    ## Model fitted is count.matrix ~ condition + test_factor + test_reg
    sample.annotations(mydata.obj)$test_factor <- factor(rep(1:2, each = 5))
    sample.annotations(mydata.obj)$test_reg <- rnorm(10, 0, 1)
    saveRDS(mydata.obj, file.path(tmpdir, "mydata.rds"))
    ## Diff Exp (result.extent only labels the output file)
    runDiffExp(data.file = file.path(tmpdir, "mydata.rds"),
               result.extent = "phylolm", Rmdfunction = "phylolm.createRmd",
               output.directory = tmpdir, norm.method = "TMM",
               extra.design.covariates = c("test_factor", "test_reg"),
               length.normalization = "RPKM")
  }
)
The main function for performing comparisons among differential expression methods and generating a report in HTML format. It is assumed that all differential expression results have been generated in advance (using e.g. the function runDiffExp
) and that the result compData
object for each data set and each differential expression method is saved separately in files with the extension .rds
. Note that the function can also be called via the runComparisonGUI
function, which lets the user set parameters and select input files using a graphical user interface.
runComparison( file.table, parameters, output.directory, check.table = TRUE, out.width = NULL, save.result.table = FALSE, knit.results = TRUE )
file.table |
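A data frame with at least a column named input.files, containing paths to .rds files containing result objects (of the class compData). |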
parameters |
A list containing parameters for the comparison study. The following entries are supported, and used by different comparison methods:
|
output.directory |
The directory where the results should be written. The subdirectory structure will be created automatically. If the directory already exists, it will be overwritten. |
check.table |
Logical, should the input table be checked for consistency? Default TRUE. |
out.width |
The width of the figures in the final report. Will be passed on to |
save.result.table |
Logical, should the intermediate result table be saved for future use? Defaults to FALSE. |
knit.results |
Logical, should the Rmd be generated and knitted? Defaults to TRUE. |
The input to runComparison
is a data frame with at least a column named input.files
, containing paths to .rds
files containing result objects (of the class compData
), such as those generated by runDiffExp
. Other columns that can be included in the data frame are datasets
, nbr.samples
, repl
and de.methods
. They have to match the information contained in the corresponding result objects. If these columns are not present, they will be added to the data frame automatically.
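A minimal file.table needs only the input.files column. The sketch below builds it by listing the .rds files in a results directory. The directory name and the two placeholder files are assumptions made so that the snippet runs on its own; in practice the files would be the compData result objects written by runDiffExp.

```r
# Placeholder directory and files standing in for runDiffExp output
result.dir <- file.path(tempdir(), "results_demo")
dir.create(result.dir, showWarnings = FALSE)
for (f in c("mydata_voom.limma.rds", "mydata_edgeR.exact.rds")) {
  saveRDS(list(), file.path(result.dir, f))  # not real compData objects
}

# One row per result file; the optional columns (datasets, nbr.samples,
# repl, de.methods) are left out here
file.table <- data.frame(
  input.files = list.files(result.dir, pattern = "\\.rds$",
                           full.names = TRUE),
  stringsAsFactors = FALSE
)

# Catch missing files before starting the comparison
stopifnot(all(file.exists(file.table$input.files)))
print(basename(file.table$input.files))
```

Leaving out the optional columns avoids mismatches with the information stored in the result objects.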
If knit.results=TRUE
, the function will create a comparison report, named compcodeR_report<timestamp>.html, in the output.directory
. It will also create subfolders named compcodeR_code
and compcodeR_figure
, where the code used to perform the differential expression analysis and the figures contained in the report, respectively, will be stored. Note that if these directories already exist, they will be overwritten.
If save.result.table=TRUE, the function will also save the result table in a file named compcodeR_result_table_<timestamp>.rds in the output.directory.
Charlotte Soneson
tmpdir <- normalizePath(tempdir(), winslash = "/")
mydata.obj <- generateSyntheticData(dataset = "mydata", n.vars = 1000,
                                    samples.per.cond = 5, n.diffexp = 100,
                                    output.file = file.path(tmpdir, "mydata.rds"))
runDiffExp(data.file = file.path(tmpdir, "mydata.rds"),
           result.extent = "voom.limma", Rmdfunction = "voom.limma.createRmd",
           output.directory = tmpdir, norm.method = "TMM")
runDiffExp(data.file = file.path(tmpdir, "mydata.rds"),
           result.extent = "edgeR.exact", Rmdfunction = "edgeR.exact.createRmd",
           output.directory = tmpdir, norm.method = "TMM",
           trend.method = "movingave", disp.type = "tagwise")
file.table <- data.frame(input.files = file.path(tmpdir,
                         c("mydata_voom.limma.rds", "mydata_edgeR.exact.rds")),
                         stringsAsFactors = FALSE)
parameters <- list(incl.nbr.samples = 5, incl.replicates = 1,
                   incl.dataset = "mydata", incl.de.methods = NULL,
                   fdr.threshold = 0.05, tpr.threshold = 0.05,
                   typeI.threshold = 0.05, ma.threshold = 0.05,
                   fdc.maxvar = 1500, overlap.threshold = 0.05,
                   fracsign.threshold = 0.05, mcc.threshold = 0.05,
                   nbrtpfp.threshold = 0.05,
                   comparisons = c("auc", "fdr", "tpr", "ma", "correlation"))
if (interactive()) {
  runComparison(file.table = file.table, parameters = parameters,
                output.directory = tmpdir)
}
This function provides a GUI to the main function for performing comparisons among differential expression methods and generating a report in HTML format (runComparison
). It is assumed that all differential expression results have been generated in advance (using e.g. the function runDiffExp
) and that the result compData
object for each data set and each differential expression method is saved separately in files with the extension .rds
. The function opens a graphical user interface where the user can set parameter values and choose the files to be used as the basis of the comparison. It is, however, possible to circumvent the GUI and call the comparison function runComparison
directly.
runComparisonGUI( input.directories, output.directory, recursive, out.width = NULL, upper.limits = NULL, lower.limits = NULL )
input.directories |
A list of directories containing the result files ( |
output.directory |
The directory where the results should be written. The subdirectory structure will be created automatically. If the directory already exists, it will be overwritten. |
recursive |
A logical parameter indicating whether or not the search should be extended recursively to subfolders of the input.directories. |
out.width |
The width of the figures in the final report. Will be passed on to |
upper.limits , lower.limits
|
Lists that can be used to manually set upper and lower limits for boxplots of fdr, tpr, auc, mcc, fracsign, nbrtpfp, nbrsign and typeIerror. |
This function requires that the rpanel
package is installed. If this package cannot be installed, please use the runComparison
function directly.
The function will create a comparison report, named compcodeR_report<timestamp>.html, in the output.directory
. It will also create subfolders named compcodeR_code
and compcodeR_figure
, where the code used to perform the differential expression analysis and the figures contained in the report, respectively, will be saved. Note that if these directories already exist they will be overwritten.
Charlotte Soneson
if (interactive()) {
  mydata.obj <- generateSyntheticData(dataset = "mydata", n.vars = 12500,
                                      samples.per.cond = 5, n.diffexp = 1250,
                                      output.file = "mydata.rds")
  runDiffExp(data.file = "mydata.rds", result.extent = "voom.limma",
             Rmdfunction = "voom.limma.createRmd", output.directory = ".",
             norm.method = "TMM")
  runDiffExp(data.file = "mydata.rds", result.extent = "ttest",
             Rmdfunction = "ttest.createRmd", output.directory = ".",
             norm.method = "TMM")
  runComparisonGUI(input.directories = ".", output.directory = ".",
                   recursive = FALSE)
}
This function provides a GUI to the main function for performing comparisons
among differential expression methods and generating a report in HTML format
(runComparison
). It is assumed that all differential
expression results have been generated in advance (using e.g. the function
runDiffExp
) and that the result compData
object for
each data set and each differential expression method is saved separately in
files with the extension .rds
. The function opens a graphical user
interface where the user can set parameter values and choose the files to be
used as the basis of the comparison. It is, however, possible to circumvent
the GUI and call the comparison function runComparison
directly.
runComparisonShiny( input.directories, output.directory, recursive, out.width = NULL, upper.limits = NULL, lower.limits = NULL )
input.directories |
A list of directories containing the result files
( |
output.directory |
The directory where the results should be written. The subdirectory structure will be created automatically. If the directory already exists, it will be overwritten. |
recursive |
A logical parameter indicating whether or not the search should be extended recursively to subfolders of the input.directories. |
out.width |
The width of the figures in the final report. Will be
passed on to |
upper.limits , lower.limits
|
Lists that can be used to manually set upper and lower limits for boxplots of fdr, tpr, auc, mcc, fracsign, nbrtpfp, nbrsign and typeIerror. |
The function will create a comparison report, named
compcodeR_report<timestamp>.html, in the output.directory
.
It will also create subfolders named compcodeR_code
and
compcodeR_figure
, where the code used to perform the differential
expression analysis and the figures contained in the report, respectively,
will be saved. Note that if these directories already exist they will be
overwritten.
Charlotte Soneson
if (interactive()) {
  mydata.obj <- generateSyntheticData(dataset = "mydata", n.vars = 12500,
                                      samples.per.cond = 5, n.diffexp = 1250,
                                      output.file = "mydata.rds")
  runDiffExp(data.file = "mydata.rds", result.extent = "voom.limma",
             Rmdfunction = "voom.limma.createRmd", output.directory = ".",
             norm.method = "TMM")
  runDiffExp(data.file = "mydata.rds", result.extent = "ttest",
             Rmdfunction = "ttest.createRmd", output.directory = ".",
             norm.method = "TMM")
  runComparisonShiny(input.directories = ".", output.directory = ".",
                     recursive = FALSE)
}
The main function for running differential expression analysis (comparing two conditions), using one of the methods interfaced through compcodeR
or a user-defined method. Note that the interface functions are provided for convenience and as templates for other, user-defined workflows, and there is no guarantee that the included differential expression code is kept up-to-date with the latest recommendations and best practices for running each of the interfaced methods, or that the chosen settings are suitable in all situations. The user should make sure that the analysis is performed in the way they intend, and check the code that was run, using e.g. the generateCodeHTMLs()
function.
runDiffExp( data.file, result.extent, Rmdfunction, output.directory = ".", norm.path = TRUE, ... )
data.file |
The path to a |
result.extent |
The extension that will be added to the data file name in order to construct the result file name. This can be for example the differential expression method together with a version number. |
Rmdfunction |
A function that creates an Rmd file containing the code that should be run to perform the differential expression analysis. All functions available through |
output.directory |
The directory in which the result object will be saved. |
norm.path |
Logical, whether to include the full (absolute) path to the output object in the saved code. |
... |
Additional arguments that will be passed to the |
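Judging from the examples in this manual, running runDiffExp on mydata.rds with result.extent = "voom.limma" yields a result file named mydata_voom.limma.rds in the output directory. The base R sketch below reproduces that naming pattern so the expected path can be computed in advance. `expected_result_file` is a hypothetical helper, and the pattern is inferred from the examples rather than taken from the package source.

```r
# Hypothetical helper: <data file stem>_<result.extent>.rds in output.directory
expected_result_file <- function(data.file, result.extent,
                                 output.directory = ".") {
  stem <- tools::file_path_sans_ext(basename(data.file))
  file.path(output.directory, paste0(stem, "_", result.extent, ".rds"))
}

res.file <- expected_result_file("mydata.rds", "voom.limma",
                                 output.directory = "out")
print(res.file)
```

Such a helper is convenient when assembling the input.files column for runComparison from a set of planned analyses.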
Charlotte Soneson
tmpdir <- normalizePath(tempdir(), winslash = "/")
mydata.obj <- generateSyntheticData(dataset = "mydata", n.vars = 1000,
                                    samples.per.cond = 5, n.diffexp = 100,
                                    output.file = file.path(tmpdir, "mydata.rds"))
listcreateRmd()
runDiffExp(data.file = file.path(tmpdir, "mydata.rds"),
           result.extent = "voom.limma", Rmdfunction = "voom.limma.createRmd",
           output.directory = tmpdir, norm.method = "TMM")
if (interactive()) {
  ## The following list covers the currently available
  ## differential expression methods:
  runDiffExp(data.file = "mydata.rds", result.extent = "DESeq2",
             Rmdfunction = "DESeq2.createRmd", output.directory = ".",
             fit.type = "parametric", test = "Wald", beta.prior = TRUE,
             independent.filtering = TRUE, cooks.cutoff = TRUE,
             impute.outliers = TRUE)
  runDiffExp(data.file = "mydata.rds", result.extent = "DSS",
             Rmdfunction = "DSS.createRmd", output.directory = ".",
             norm.method = "quantile", disp.trend = TRUE)
  runDiffExp(data.file = "mydata.rds", result.extent = "EBSeq",
             Rmdfunction = "EBSeq.createRmd", output.directory = ".",
             norm.method = "median")
  runDiffExp(data.file = "mydata.rds", result.extent = "edgeR.exact",
             Rmdfunction = "edgeR.exact.createRmd", output.directory = ".",
             norm.method = "TMM", trend.method = "movingave",
             disp.type = "tagwise")
  runDiffExp(data.file = "mydata.rds", result.extent = "edgeR.GLM",
             Rmdfunction = "edgeR.GLM.createRmd", output.directory = ".",
             norm.method = "TMM", disp.type = "tagwise",
             disp.method = "CoxReid", trended = TRUE)
  runDiffExp(data.file = "mydata.rds", result.extent = "logcpm.limma",
             Rmdfunction = "logcpm.limma.createRmd", output.directory = ".",
             norm.method = "TMM")
  runDiffExp(data.file = "mydata.rds", result.extent = "NBPSeq",
             Rmdfunction = "NBPSeq.createRmd", output.directory = ".",
             norm.method = "TMM", disp.method = "NBP")
  runDiffExp(data.file = "mydata.rds", result.extent = "NOISeq",
             Rmdfunction = "NOISeq.prenorm.createRmd", output.directory = ".",
             norm.method = "TMM")
  runDiffExp(data.file = "mydata.rds", result.extent = "sqrtcpm.limma",
             Rmdfunction = "sqrtcpm.limma.createRmd", output.directory = ".",
             norm.method = "TMM")
  runDiffExp(data.file = "mydata.rds", result.extent = "TCC",
             Rmdfunction = "TCC.createRmd", output.directory = ".",
             norm.method = "tmm", test.method = "edger", iteration = 3,
             normFDR = 0.1, floorPDEG = 0.05)
  runDiffExp(data.file = "mydata.rds", result.extent = "ttest",
             Rmdfunction = "ttest.createRmd", output.directory = ".",
             norm.method = "TMM")
  runDiffExp(data.file = "mydata.rds", result.extent = "voom.limma",
             Rmdfunction = "voom.limma.createRmd", output.directory = ".",
             norm.method = "TMM")
  runDiffExp(data.file = "mydata.rds", result.extent = "voom.ttest",
             Rmdfunction = "voom.ttest.createRmd", output.directory = ".",
             norm.method = "TMM")
}
Show method for compData object

Show method for compData object.
## S4 method for signature 'compData'
show(object)
object |
A compData object. |
Charlotte Soneson
mydata <- generateSyntheticData(dataset = "mydata", n.vars = 12500,
                                samples.per.cond = 5, n.diffexp = 1250)
mydata
Show method for phyloCompData object

Show method for phyloCompData object.
## S4 method for signature 'phyloCompData'
show(object)
object |
A phyloCompData object. |
Charlotte Soneson, Paul Bastide
mydata <- generateSyntheticData(dataset = "mydata", n.vars = 1000,
                                samples.per.cond = 5, n.diffexp = 100,
                                id.species = factor(1:10),
                                tree = ape::rphylo(10, 1, 0),
                                lengths.relmeans = "auto",
                                lengths.dispersions = "auto")
mydata
Generate a .Rmd file containing code to perform differential expression analysis with limma after square root-transforming the counts per million (cpm)

A function to generate code that can be run to perform differential expression analysis of RNAseq data (comparing two conditions) using limma, after preprocessing the counts by computing the counts per million (cpm) and applying a square-root transformation. The code is written to a .Rmd file. This function is generally not called by the user; the main interface for performing differential expression analysis is the runDiffExp function.
sqrtcpm.limma.createRmd(data.path, result.path, codefile, norm.method)
data.path |
The path to a .rds file containing the |
result.path |
The path to the file where the result object will be saved. |
codefile |
The path to the file where the code will be written. |
norm.method |
The between-sample normalization method used to compensate for varying library sizes and composition in the differential expression analysis. The normalization factors are calculated using the |
For more information about the methods and the interpretation of the parameters, see the edgeR and limma packages and the corresponding publications.
The function generates a .Rmd file containing the code for performing the differential expression analysis. This file can be executed using, e.g., the knitr package.
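As a rough illustration of the preprocessing described above, the following base-R sketch computes counts per million and applies the square-root transformation. It is not the code that sqrtcpm.limma.createRmd writes to the .Rmd file: the toy count matrix, the helper name sqrt_cpm, and the default of plain column sums as library sizes (all normalization factors equal to 1) are assumptions made for this example.

```r
# Toy count matrix: 4 genes (rows) x 3 samples (columns); values are made up.
counts <- matrix(c(10, 0, 5, 85,
                   20, 4, 6, 170,
                   15, 1, 4, 80),
                 nrow = 4,
                 dimnames = list(paste0("gene", 1:4), paste0("sample", 1:3)))

# Hypothetical helper (not part of compcodeR): scale each column to counts
# per million using its library size, optionally rescaled by per-sample
# normalization factors, then take the square root.
sqrt_cpm <- function(counts, norm.factors = rep(1, ncol(counts))) {
  lib.sizes <- colSums(counts) * norm.factors
  cpm <- sweep(counts, 2, lib.sizes, "/") * 1e6
  sqrt(cpm)
}

transformed <- sqrt_cpm(counts)
```

With unit normalization factors, the squared values in each column of transformed sum to one million, which is a quick sanity check on the scaling.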
Charlotte Soneson
Smyth GK (2005): Limma: linear models for microarray data. In: 'Bioinformatics and Computational Biology Solutions using R and Bioconductor'. R. Gentleman, V. Carey, S. Dudoit, R. Irizarry, W. Huber (eds), Springer, New York, pages 397-420
Robinson MD, McCarthy DJ and Smyth GK (2010): edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26, 139-140
Robinson MD and Oshlack A (2010): A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biology 11:R25
tmpdir <- normalizePath(tempdir(), winslash = "/")
mydata.obj <- generateSyntheticData(dataset = "mydata", n.vars = 1000,
                                    samples.per.cond = 5, n.diffexp = 100,
                                    output.file = file.path(tmpdir, "mydata.rds"))
runDiffExp(data.file = file.path(tmpdir, "mydata.rds"),
           result.extent = "sqrtcpm.limma",
           Rmdfunction = "sqrtcpm.limma.createRmd",
           output.directory = tmpdir, norm.method = "TMM")
Summarize a synthetic data set (generated by generateSyntheticData) by some diagnostic plots.
summarizeSyntheticDataSet(data.set, output.filename)
data.set |
A data set, either a |
output.filename |
The filename of the resulting html report (including the path). |
Charlotte Soneson
tmpdir <- normalizePath(tempdir(), winslash = "/")
mydata.obj <- generateSyntheticData(dataset = "mydata", n.vars = 1000,
                                    samples.per.cond = 5, n.diffexp = 100,
                                    output.file = file.path(tmpdir, "mydata.rds"))
if (interactive()) {
  summarizeSyntheticDataSet(data.set = file.path(tmpdir, "mydata.rds"),
                            output.filename = file.path(tmpdir, "mydata_check.html"))
}
Generate a .Rmd file containing code to perform differential expression analysis with TCC

A function to generate code that can be run to perform differential expression analysis of RNAseq data (comparing two conditions) using the TCC package. The code is written to a .Rmd file. This function is generally not called by the user; the main interface for performing differential expression analysis is the runDiffExp function.
TCC.createRmd(
  data.path,
  result.path,
  codefile,
  norm.method,
  test.method,
  iteration = 3,
  normFDR = 0.1,
  floorPDEG = 0.05
)
data.path |
The path to a .rds file containing the |
result.path |
The path to the file where the result object will be saved. |
codefile |
The path to the file where the code will be written. |
norm.method |
The between-sample normalization method used to compensate for varying library sizes and composition in the differential expression analysis. Possible values are |
test.method |
The method used in TCC to find differentially expressed genes. Possible values are |
iteration |
The number of iterations used to find the normalization factors. Default value is 3. |
normFDR |
The FDR cutoff for calling differentially expressed genes in the computation of the normalization factors. Default value is 0.1. |
floorPDEG |
The minimum value to be eliminated as potential differentially expressed genes before performing step 3 in the TCC algorithm. Default value is 0.05. |
For more information about the methods and the interpretation of the parameters, see the TCC package and the corresponding publications.
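To give a feel for how iteration, normFDR and floorPDEG interact, here is a deliberately simplified base-R sketch of an iterative normalization loop in the spirit of the one described for TCC. It is not TCC's implementation. The simulated counts, the helper names row_pvalues and iterative_norm_factors, the use of total-count scaling, and the Welch t-test standing in for the test method are all assumptions. In particular, floorPDEG is read here as the minimum fraction of genes set aside as potentially differentially expressed, which is an interpretation of the parameter description above.

```r
set.seed(1)

# Simulated counts (made up): 200 genes x 10 samples, two conditions of five
# samples each. The first 10 genes are three-fold higher in condition 2.
n.genes <- 200
group <- rep(1:2, each = 5)
mu <- matrix(100, nrow = n.genes, ncol = length(group))
mu[1:10, group == 2] <- 300
counts <- matrix(rpois(length(mu), mu), nrow = n.genes)

# Welch t-test p-value for each row of x, comparing the two groups.
row_pvalues <- function(x, group) {
  a <- x[, group == 1, drop = FALSE]
  b <- x[, group == 2, drop = FALSE]
  va <- apply(a, 1, var) / ncol(a)
  vb <- apply(b, 1, var) / ncol(b)
  tstat <- (rowMeans(a) - rowMeans(b)) / sqrt(va + vb)
  df <- (va + vb)^2 / (va^2 / (ncol(a) - 1) + vb^2 / (ncol(b) - 1))
  p <- 2 * pt(-abs(tstat), df)
  p[is.na(p)] <- 1  # rows with zero variance carry no evidence
  p
}

# Hypothetical simplified loop: normalize, flag likely DE genes, then
# recompute the scaling factors from the remaining genes only.
iterative_norm_factors <- function(counts, group, iteration = 3,
                                   normFDR = 0.1, floorPDEG = 0.05) {
  size_factors <- function(keep) {
    s <- colSums(counts[keep, , drop = FALSE])
    s / mean(s)
  }
  factors <- size_factors(rep(TRUE, nrow(counts)))
  n.min <- ceiling(floorPDEG * nrow(counts))
  for (i in seq_len(iteration)) {
    normalized <- log2(sweep(counts, 2, factors, "/") + 1)
    p <- row_pvalues(normalized, group)
    flagged <- p.adjust(p, method = "BH") < normFDR
    # Set aside at least a fraction floorPDEG of the genes.
    if (sum(flagged) < n.min) {
      flagged <- rank(p, ties.method = "first") <= n.min
    }
    factors <- size_factors(!flagged)
  }
  factors
}

factors <- iterative_norm_factors(counts, group)
```

The returned factors are scaled to have mean 1. Comparing them with factors computed from all genes shows how much the flagged genes were contributing to the apparent library-size difference between the conditions.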
Charlotte Soneson
Kadota K, Nishiyama T, and Shimizu K. A normalization strategy for comparing tag count data. Algorithms Mol Biol. 7:5, 2012.
Sun J, Nishiyama T, Shimizu K, and Kadota K. TCC: an R package for comparing tag count data with robust normalization strategies. BMC Bioinformatics 14:219, 2013.
try(
  if (require(TCC)) {
    tmpdir <- normalizePath(tempdir(), winslash = "/")
    mydata.obj <- generateSyntheticData(dataset = "mydata", n.vars = 1000,
                                        samples.per.cond = 5, n.diffexp = 100,
                                        output.file = file.path(tmpdir, "mydata.rds"))
    runDiffExp(data.file = file.path(tmpdir, "mydata.rds"),
               result.extent = "TCC",
               Rmdfunction = "TCC.createRmd",
               output.directory = tmpdir, norm.method = "tmm",
               test.method = "edger")
  }
)
Generate a .Rmd file containing code to perform differential expression analysis with a t-test

A function to generate code that can be run to perform differential expression analysis of RNAseq data (comparing two conditions) using a t-test, applied to the normalized counts. The code is written to a .Rmd file. This function is generally not called by the user; the main interface for performing differential expression analysis is the runDiffExp function.
ttest.createRmd(data.path, result.path, codefile, norm.method)
data.path |
The path to a .rds file containing the |
result.path |
The path to the file where the result object will be saved. |
codefile |
The path to the file where the code will be written. |
norm.method |
The between-sample normalization method used to compensate for varying library sizes and composition in the differential expression analysis. The normalization factors are calculated using the |
For more information about the methods and the interpretation of the parameters, see the edgeR package and the corresponding publications.
The function generates a .Rmd file containing the code for performing the differential expression analysis. This file can be executed using, e.g., the knitr package.
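The idea of a gene-wise t-test on normalized counts can be sketched in base R as follows. This is only an illustration and not the code that ttest.createRmd generates: the simulated counts, the total-count scaling used in place of the normalization factors, and the use of stats::t.test (a Welch test by default) are assumptions made for the example.

```r
set.seed(42)

# Simulated counts (made up): 50 genes x 10 samples, five per condition.
n.genes <- 50
condition <- factor(rep(c("A", "B"), each = 5))
counts <- matrix(rnbinom(n.genes * 10, mu = 200, size = 10), nrow = n.genes)
# Give the first 5 genes a clearly higher level in condition B.
counts[1:5, condition == "B"] <- counts[1:5, condition == "B"] * 5 + 50

# Scale each sample by its relative library size (assumption: plain
# total-count scaling, without additional normalization factors).
lib.sizes <- colSums(counts)
normalized <- sweep(counts, 2, lib.sizes / mean(lib.sizes), "/")

# One two-sample t-test per gene, followed by Benjamini-Hochberg adjustment.
pvalues <- apply(normalized, 1, function(y) {
  t.test(y[condition == "A"], y[condition == "B"])$p.value
})
adjusted <- p.adjust(pvalues, method = "BH")
```

Genes can then be ranked by adjusted, with smaller values indicating stronger evidence of a difference between the two conditions.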
Charlotte Soneson
Robinson MD, McCarthy DJ and Smyth GK (2010): edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26, 139-140
Robinson MD and Oshlack A (2010): A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biology 11:R25
try(
  if (require(genefilter)) {
    tmpdir <- normalizePath(tempdir(), winslash = "/")
    mydata.obj <- generateSyntheticData(dataset = "mydata", n.vars = 1000,
                                        samples.per.cond = 5, n.diffexp = 100,
                                        output.file = file.path(tmpdir, "mydata.rds"))
    runDiffExp(data.file = file.path(tmpdir, "mydata.rds"),
               result.extent = "ttest",
               Rmdfunction = "ttest.createRmd",
               output.directory = tmpdir, norm.method = "TMM")
  }
)
Generate a .Rmd file containing code to perform differential expression analysis with voom+limma

A function to generate code that can be run to perform differential expression analysis of RNAseq data (comparing two conditions) by applying the voom transformation (from the limma package) followed by differential expression analysis with limma. The code is written to a .Rmd file. This function is generally not called by the user; the main interface for performing differential expression analysis is the runDiffExp function.
voom.limma.createRmd(data.path, result.path, codefile, norm.method)
data.path |
The path to a .rds file containing the |
result.path |
The path to the file where the result object will be saved. |
codefile |
The path to the file where the code will be written. |
norm.method |
The between-sample normalization method used to compensate for varying library sizes and composition in the differential expression analysis. The normalization factors are calculated using the |
For more information about the methods and the interpretation of the parameters, see the limma package and the corresponding publications.
The function generates a .Rmd file containing the code for performing the differential expression analysis. This file can be executed using, e.g., the knitr package.
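For readers unfamiliar with the voom transformation, the base-R sketch below illustrates its two main ingredients: a log2-cpm transformation with small offsets, and a smoothed mean-variance trend that is turned into precision weights. It is a simplification and not the limma implementation. The simulated counts are made up, the spread is measured as the raw per-gene standard deviation rather than from a fitted linear model, the mean log2-cpm is used on the x-axis, and the weights are computed per gene rather than per observation.

```r
set.seed(7)

# Simulated counts (made up): 300 genes x 6 samples with gene-specific means.
n.genes <- 300
gene.means <- rexp(n.genes, rate = 1 / 200) + 5
counts <- matrix(rnbinom(n.genes * 6, mu = rep(gene.means, 6), size = 5),
                 nrow = n.genes)

# log2 counts per million, offsetting counts by 0.5 and library sizes by 1.
lib.sizes <- colSums(counts)
logcpm <- log2(sweep(counts + 0.5, 2, lib.sizes + 1, "/") * 1e6)

# Mean-variance trend: square root of the standard deviation against the
# mean log2-cpm, smoothed with lowess.
gene.mean <- rowMeans(logcpm)
gene.sqrt.sd <- sqrt(apply(logcpm, 1, sd))
trend <- lowess(gene.mean, gene.sqrt.sd, f = 0.5)

# Gene-level precision weights: inverse of the predicted variance, i.e. the
# fourth power of the fitted square-root standard deviation.
predicted <- approx(trend$x, trend$y, xout = gene.mean, rule = 2, ties = mean)$y
weights <- 1 / predicted^4
```

In an actual analysis, the transformed values and weights would come from limma itself and be passed on to its linear modelling functions; the sketch only shows where the weights come from conceptually.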
Charlotte Soneson
Smyth GK (2005): Limma: linear models for microarray data. In: 'Bioinformatics and Computational Biology Solutions using R and Bioconductor'. R. Gentleman, V. Carey, S. Dudoit, R. Irizarry, W. Huber (eds), Springer, New York, pages 397-420
Law CW, Chen Y, Shi W and Smyth GK (2014): voom: precision weights unlock linear model analysis tools for RNA-seq read counts. Genome Biology 15, R29
tmpdir <- normalizePath(tempdir(), winslash = "/")
mydata.obj <- generateSyntheticData(dataset = "mydata", n.vars = 1000,
                                    samples.per.cond = 5, n.diffexp = 100,
                                    output.file = file.path(tmpdir, "mydata.rds"))
runDiffExp(data.file = file.path(tmpdir, "mydata.rds"),
           result.extent = "voom.limma",
           Rmdfunction = "voom.limma.createRmd",
           output.directory = tmpdir, norm.method = "TMM")
Generate a .Rmd file containing code to perform differential expression analysis with voom+t-test

A function to generate code that can be run to perform differential expression analysis of RNAseq data (comparing two conditions) by applying the voom transformation (from the limma package) followed by differential expression analysis with a t-test. The code is written to a .Rmd file. This function is generally not called by the user; the main interface for performing differential expression analysis is the runDiffExp function.
voom.ttest.createRmd(data.path, result.path, codefile, norm.method)
data.path |
The path to a .rds file containing the |
result.path |
The path to the file where the result object will be saved. |
codefile |
The path to the file where the code will be written. |
norm.method |
The between-sample normalization method used to compensate for varying library sizes and composition in the differential expression analysis. The normalization factors are calculated using the |
For more information about the methods and the interpretation of the parameters, see the limma and edgeR packages and the corresponding publications.
The function generates a .Rmd file containing the code for performing the differential expression analysis. This file can be executed using, e.g., the knitr package.
Charlotte Soneson
Smyth GK (2005): Limma: linear models for microarray data. In: 'Bioinformatics and Computational Biology Solutions using R and Bioconductor'. R. Gentleman, V. Carey, S. Dudoit, R. Irizarry, W. Huber (eds), Springer, New York, pages 397-420
Law CW, Chen Y, Shi W and Smyth GK (2014): voom: precision weights unlock linear model analysis tools for RNA-seq read counts. Genome Biology 15, R29
try(
  if (require(genefilter)) {
    tmpdir <- normalizePath(tempdir(), winslash = "/")
    mydata.obj <- generateSyntheticData(dataset = "mydata", n.vars = 1000,
                                        samples.per.cond = 5, n.diffexp = 100,
                                        output.file = file.path(tmpdir, "mydata.rds"))
    runDiffExp(data.file = file.path(tmpdir, "mydata.rds"),
               result.extent = "voom.ttest",
               Rmdfunction = "voom.ttest.createRmd",
               output.directory = tmpdir, norm.method = "TMM")
  }
)