Title: | Benchmarks for RNA-seq Quantification Pipelines |
---|---|
Description: | Several quantitative and visualized benchmarks for RNA-seq quantification pipelines. Two-condition quantifications for genes, transcripts, junctions or exons by each pipeline with necessary meta information should be organized into numeric matrices in order to proceed the evaluation. |
Authors: | Mingxiang Teng and Rafael A. Irizarry |
Maintainer: | Mingxiang Teng <[email protected]> |
License: | GPL-3 |
Version: | 1.37.0 |
Built: | 2024-11-18 04:08:37 UTC |
Source: | https://github.com/bioc/rnaseqcomp |
This function always checks if the elements are valid
to create a S4 rnaseqcomp object. Specifically, check if quantData
is a list of matrices, if condInfo
has the correct length and
levels, etc.
check_rnaseqcomp(object)
check_rnaseqcomp(object)
object |
A object of S4 rnaseqcomp class |
TRUE, or character if error happens.
For any compared two replicates in each cell line, the proportion of one transcript for genes that only include two annotated transcripts can be different even flipped. This function estimates and plots the proportion difference stratefied by detrended logsignal. Means of absolute difference will be reported for three levels of detrened logsignals. Average is used when multiple two-replicate comparisons included.
plot2TX(dat, genes, step = 0.5, thresholds = c(1, 6), plotcell = 1, ...)
plot2TX(dat, genes, step = 0.5, thresholds = c(1, 6), plotcell = 1, ...)
dat |
A |
genes |
A vector of gene names corresponding to quantified
transcripts. Note that |
step |
A number specifying the resolution on detrended logsignal for calculation and plotting the proportion difference. (default: 0.5) |
thresholds |
A vector of two numbers define cutoffs for three levels of detreded log signals, where one number summary will be generated. (default: c(1, 6)) |
plotcell |
1 or 2 indicating which cell line will be plotted. If values other than 1 and 2, both cell lines will be plotted. This value won't affect estimation for both cell lines. (default: 1) |
... |
Parameters for base function |
plot |
2TX plots of quantification pipelines for
selected cell line by |
list |
A list of two matrices indicating the mean and standard error of absolute proportion differences. Valuesa are based on average of two cell lines. |
data(simdata) condInfo <- factor(simdata$samp$condition) repInfo <- factor(simdata$samp$replicate) evaluationFeature <- rep(TRUE, nrow(simdata$meta)) calibrationFeature <- simdata$meta$house & simdata$meta$chr == 'chr1' unitReference <- 1 dat <- signalCalibrate(simdata$quant, condInfo, repInfo, evaluationFeature, calibrationFeature, unitReference, calibrationFeature2 = calibrationFeature) plot2TX(dat,genes=simdata$meta$gene)
data(simdata) condInfo <- factor(simdata$samp$condition) repInfo <- factor(simdata$samp$replicate) evaluationFeature <- rep(TRUE, nrow(simdata$meta)) calibrationFeature <- simdata$meta$house & simdata$meta$chr == 'chr1' unitReference <- 1 dat <- signalCalibrate(simdata$quant, condInfo, repInfo, evaluationFeature, calibrationFeature, unitReference, calibrationFeature2 = calibrationFeature) plot2TX(dat,genes=simdata$meta$gene)
For each pipeline, differential expression is estimated by fold change on mean signals across replicates of cell lines. For features that are truely differential expressed, their fold changes levels are summarized based on different levels of detrended logsignals.
plotFC(dat, positive, fcsign, constant = 0.5, loessspan = 1/3, thresholds = c(1, 6), ...)
plotFC(dat, positive, fcsign, constant = 0.5, loessspan = 1/3, thresholds = c(1, 6), ...)
dat |
A |
positive |
A logical vector with length equivalent to row
number of matrices in |
fcsign |
A numeric vector with length equivalent to row
number of matrices in |
constant |
A numeric constant that is added to quantifications before fold changes calculation. (default: 0.5) |
loessspan |
A numeric number indicating span used
for loess smooth. Details see
|
thresholds |
A numeric vector defining cutoffs on fold changes as the points to make threshold averaging on ROC curves. (default: seq(12, 0, len = 300)) |
... |
Parameters for base function |
plot |
Fold change plots for all the quantification pipelines. |
list |
A list of two numeric vectors indicating median and standard error of fold changes in three different levels of detrended logsignals. |
data(simdata) condInfo <- factor(simdata$samp$condition) repInfo <- factor(simdata$samp$replicate) evaluationFeature <- rep(TRUE, nrow(simdata$meta)) calibrationFeature <- simdata$meta$house & simdata$meta$chr == 'chr1' unitReference <- 1 dat <- signalCalibrate(simdata$quant, condInfo, repInfo, evaluationFeature, calibrationFeature, unitReference, calibrationFeature2 = calibrationFeature) ## only select the true differential that have exact fold changes simdata$meta$fcsign[simdata$meta$fcstatus == "off.on"] <- NA plotFC(dat,simdata$meta$positive,simdata$meta$fcsign)
data(simdata) condInfo <- factor(simdata$samp$condition) repInfo <- factor(simdata$samp$replicate) evaluationFeature <- rep(TRUE, nrow(simdata$meta)) calibrationFeature <- simdata$meta$house & simdata$meta$chr == 'chr1' unitReference <- 1 dat <- signalCalibrate(simdata$quant, condInfo, repInfo, evaluationFeature, calibrationFeature, unitReference, calibrationFeature2 = calibrationFeature) ## only select the true differential that have exact fold changes simdata$meta$fcsign[simdata$meta$fcstatus == "off.on"] <- NA plotFC(dat,simdata$meta$positive,simdata$meta$fcsign)
For each cell line, any compared two replicates might have a portion of transcripts that express in one replicate but not the other, depending on what cutoff is used to define non-express. This function estimate and plot the proportion of disagreement using multiple cutoffs. Average is used when multiple two-replicate comparisons included.
plotNE(dat, steps = seq(-0.5, 12, 0.5), Ks = 0:3, pchK = seq_along(Ks) - 1, plotcell = 1, ...)
plotNE(dat, steps = seq(-0.5, 12, 0.5), Ks = 0:3, pchK = seq_along(Ks) - 1, plotcell = 1, ...)
dat |
A |
steps |
A numeric vector specifying log-scale cutoffs to be used for calculation and plotting. (default: seq(-0.5, 12, 0.5)) |
Ks |
A numeric vector specifying which cutoffs to be highlighted and to which the reported proportions to be corresponding. |
pchK |
Plot styles of highlight points corresponding
to |
plotcell |
1 or 2 indicating which cell line will be plotted. If values other than 1 and 2, both cell lines will be plotted. This value won't affect estimation for both cell lines. (default: 1) |
... |
Parameters for base function |
plot |
NE plots of quantification pipelines for
selected cell line by |
NE |
A list of two matrices. The first matrix gives
the proportion of disagreement and the second matrix gives the
proportion of both replicates under (non-express)
correspoinding cutoff |
data(simdata) condInfo <- factor(simdata$samp$condition) repInfo <- factor(simdata$samp$replicate) evaluationFeature <- rep(TRUE, nrow(simdata$meta)) calibrationFeature <- simdata$meta$house & simdata$meta$chr == 'chr1' unitReference <- 1 dat <- signalCalibrate(simdata$quant, condInfo, repInfo, evaluationFeature, calibrationFeature, unitReference, calibrationFeature2 = calibrationFeature) plotNE(dat)
data(simdata) condInfo <- factor(simdata$samp$condition) repInfo <- factor(simdata$samp$replicate) evaluationFeature <- rep(TRUE, nrow(simdata$meta)) calibrationFeature <- simdata$meta$house & simdata$meta$chr == 'chr1' unitReference <- 1 dat <- signalCalibrate(simdata$quant, condInfo, repInfo, evaluationFeature, calibrationFeature, unitReference, calibrationFeature2 = calibrationFeature) plotNE(dat)
For each pipeline, differential expression is first estimated by fold change on 1 vs. 1 comparison between cell lines. ROC curves then are made by comparing fold changes with predefined true differentials. Then, ROC curves from multiple 1 vs. 1 comparisons are averaged using threshold averaging strategy. Standardized partial area under the curve (pAUC) is reported for each pipeline.
plotROC(dat, positive, fcsign, cut = 1, constant = 0.5, thresholds = seq(12, 0, len = 300), arrow = FALSE, ...)
plotROC(dat, positive, fcsign, cut = 1, constant = 0.5, thresholds = seq(12, 0, len = 300), arrow = FALSE, ...)
dat |
A |
positive |
A logical vector with length equivalent to row
number of matrices in |
fcsign |
A numeric vector with length equivalent to row
number of matrices in |
cut |
A numeric cutoff used to decide if fold change should be
estimated. For a 1 vs 1 comparison, if features have signals less than
|
constant |
A numeric constant that is added to quantifications before fold changes calculation. (default: 0.5) |
thresholds |
A numeric vector defining cutoffs on fold changes as the points to make threshold averaging on ROC curves. (default: seq(12, 0, len = 300)) |
arrow |
A logical indicating if error bars should be added to the averaged ROC curves. (default: FALSE) |
... |
Parameters for base function |
plot |
ROC plots for all the quantification pipelines. |
pAUC |
A numeric vector indicating pipeline accuracy. This is standardized partial AUC based on ranges chosen on false positive rate. |
data(simdata) condInfo <- factor(simdata$samp$condition) repInfo <- factor(simdata$samp$replicate) evaluationFeature <- rep(TRUE, nrow(simdata$meta)) calibrationFeature <- simdata$meta$house & simdata$meta$chr == 'chr1' unitReference <- 1 dat <- signalCalibrate(simdata$quant, condInfo, repInfo, evaluationFeature, calibrationFeature, unitReference, calibrationFeature2 = calibrationFeature) plotROC(dat,simdata$meta$positive,simdata$meta$fcsign)
data(simdata) condInfo <- factor(simdata$samp$condition) repInfo <- factor(simdata$samp$replicate) evaluationFeature <- rep(TRUE, nrow(simdata$meta)) calibrationFeature <- simdata$meta$house & simdata$meta$chr == 'chr1' unitReference <- 1 dat <- signalCalibrate(simdata$quant, condInfo, repInfo, evaluationFeature, calibrationFeature, unitReference, calibrationFeature2 = calibrationFeature) plotROC(dat,simdata$meta$positive,simdata$meta$fcsign)
For each cell line in each pipeline, the standard deviation of detrend logsignals are calculated for individual features. Then, loess smooth on standard deviation are plotted stratified by detrended log signals for select cell line. The median of standard deviation at three different levels of detrend logsignals are reported.
plotSD(dat, constant = 0.5, loessspan = 1/3, thresholds = c(1, 6), plotcell = 1, ...)
plotSD(dat, constant = 0.5, loessspan = 1/3, thresholds = c(1, 6), plotcell = 1, ...)
dat |
A |
constant |
A numeric pseudo-constant to be added on all the signals before transferred to log scale. (default: 0.5) |
loessspan |
A numeric number indicating span used
for loess smooth. Details see
|
thresholds |
A vector of two numbers define cutoffs for three levels of detreded log signals. (default: c(1, 6)) |
plotcell |
1 or 2 indicating which cell line will be plotted. If values other than 1 and 2, both cell lines will be plotted. This value won't affect estimation for both cell lines. (default: 1) |
... |
Parameters for base function |
plot |
SD plots of quantification pipelines for
selected cell line by |
list |
A list of two matrices of median and standard error of standard deviations. |
data(simdata) condInfo <- factor(simdata$samp$condition) repInfo <- factor(simdata$samp$replicate) evaluationFeature <- rep(TRUE, nrow(simdata$meta)) calibrationFeature <- simdata$meta$house & simdata$meta$chr == 'chr1' unitReference <- 1 dat <- signalCalibrate(simdata$quant, condInfo, repInfo, evaluationFeature, calibrationFeature, unitReference, calibrationFeature2 = calibrationFeature) plotSD(dat)
data(simdata) condInfo <- factor(simdata$samp$condition) repInfo <- factor(simdata$samp$replicate) evaluationFeature <- rep(TRUE, nrow(simdata$meta)) calibrationFeature <- simdata$meta$house & simdata$meta$chr == 'chr1' unitReference <- 1 dat <- signalCalibrate(simdata$quant, condInfo, repInfo, evaluationFeature, calibrationFeature, unitReference, calibrationFeature2 = calibrationFeature) plotSD(dat)
This is a S4 class to organize data ready for benchmark summarization.
There are 5 S3 objects inside this class. quantData
documents a
list of data matrices ready for evaluation by functions plotSD
,
plotNE
, plot2TX
or plotROC
. condInfo
is a
factor corresponding to columns of quantData
matrices, indicating
to which cell lines each sample belongs. repInfo
is a factor
corresponding to columns of quantData
matrices indicating replicate
information. repInfo
is a legacy from previous versions, and
doesn't have too much meanings in current version. refMed
is the
median log2 signal of calibration references. scaler
is a number
that point to the median log2 signal of reference pipeline. refMed
and scaler
were used to calibrate and generate quantData
.
This is the function to do any pre-filtering or pre-processing analysis for downstream benchmark estimation and visualization. Pre-filtering includes row selection (e.g. protein coding genes) of quantification table; pre-processing includes calculation on a set of rows as calibration reference (e.g. house keeping genes) across different quantification pipelines, calibration of quantifications across all the pipelines based on given cutoffs from selected pipelines.
signalCalibrate(quantData, condInfo, repInfo, evaluationFeature, calibrationFeature, unitReference, unitCutoff = 0, calibrationFeature2 = NULL, fixMedian = 4.776)
signalCalibrate(quantData, condInfo, repInfo, evaluationFeature, calibrationFeature, unitReference, unitCutoff = 0, calibrationFeature2 = NULL, fixMedian = 4.776)
quantData |
A list of quantification matrices each with rows by features (transcripts, genes, junctions or exons) and columns by samples. Names of the list should be provided. The sizes of each element should be the same. Missing data will be set to 0. |
condInfo |
A factor documenting condition information of samples,
corresponding to the columns of each element in |
repInfo |
A factor documenting replicate information of samples,
corresponding to the columns of each element in |
evaluationFeature |
A logical vector corresponding to the
rows of each element in |
calibrationFeature |
A logical vector corresponding to the
rows of each element in |
unitReference |
A numeric number specifying which pipeline will
be selected as reference pipeline, i.e. the index of
one element in |
unitCutoff |
A numeric number for signal cutoff on reference
pipeline specified by |
calibrationFeature2 |
A logical vector corresponding to the
rows of each element in |
fixMedian |
A numeric number specifying the median of detrend
logsignals for features specified by |
In the functions plotSD
and plot2TX
, detrended
signals with value 0 will be at the same level as value 1 for
giving pipeline by unitReference
.
A rnaseqcomp
S4 class object
quantData |
A filtered and calibrated list of quantifications for downstream analysis. |
condInfo |
A factor documenting sample condition information. |
repInfo |
A factor documenting sample replicate information. |
refMed |
A list of numeric vectors giving the log scale medians of calibration features in different pipelines. |
scaler |
A number that was used for scaling quantifications onto reference pipeline. |
data(simdata) condInfo <- factor(simdata$samp$condition) repInfo <- factor(simdata$samp$replicate) evaluationFeature <- rep(TRUE, nrow(simdata$meta)) calibrationFeature <- simdata$meta$house & simdata$meta$chr == 'chr1' unitReference <- 1 dat <- signalCalibrate(simdata$quant, condInfo, repInfo, evaluationFeature, calibrationFeature, unitReference, calibrationFeature2 = calibrationFeature)
data(simdata) condInfo <- factor(simdata$samp$condition) repInfo <- factor(simdata$samp$replicate) evaluationFeature <- rep(TRUE, nrow(simdata$meta)) calibrationFeature <- simdata$meta$house & simdata$meta$chr == 'chr1' unitReference <- 1 dat <- signalCalibrate(simdata$quant, condInfo, repInfo, evaluationFeature, calibrationFeature, unitReference, calibrationFeature2 = calibrationFeature)
This dataset include quantifications on 15776 transcripts on
two cell lines each with 8 replicates. The true differential
expressed transcripts were simulated. Quantifications from
two pipelines (RSEM and FluxCapacitor) are included in this
dataset at simdata$quant
. Meta information of transcripts
is included at simdata$meta
, inlcuding if they belongs
to house keeping genes and their true fold change status.
Sample information is included at simdata$samp
.
A list of objects including list of two 15776*16 quantification matrices, one 15776*6 data frame with meta information and one 16*3 data frame with sample information.