Title: | Statistical analysis of sequins |
---|---|
Description: | The project is intended to support the use of sequins (synthetic sequencing spike-in controls) owned and made available by the Garvan Institute of Medical Research. The goal is to provide a standard open source library for quantitative analysis, modelling and visualization of spike-in controls. |
Authors: | Ted Wong |
Maintainer: | Ted Wong <[email protected]> |
License: | BSD_3_clause + file LICENSE |
Version: | 2.31.0 |
Built: | 2024-11-29 03:23:33 UTC |
Source: | https://github.com/bioc/Anaquin |
Create scatter plot for conjoint sequins.
plotConjoint(seqs, units, x, y, title=NULL, xlab=NULL, ylab=NULL)
seqs |
Sequin names |
units |
Copy units |
x |
Expected copy number on the x-axis |
y |
Measured abundance on the y-axis |
title |
Label of the plot. Defaults to NULL. |
xlab |
Label for the x-axis. Defaults to NULL. |
ylab |
Label for the y-axis. Defaults to NULL. |
This is an experimental function for the conjoint sequins, and thus might not be fully utilized.
This function does not return anything.
Ted Wong [email protected]
Create a linear model for sequins, between input concentration on the x-axis and measurement on the y-axis.
plotLinear(seqs, x, y, std, title, xlab, ylab, showSD, showLOQ, showStats, xBreaks, yBreaks, errors, showLinear, showAxis)
seqs |
Sequin names |
x |
Input concentration on the x-axis |
y |
Measurement on the y-axis |
std |
Standard deviation. (Default to |
title |
Label of the plot. (Default to |
xlab |
Label for the x-axis. (Default to |
ylab |
Label for the y-axis. (Default to |
xBreaks |
Breaks for the x-axis. (Default to |
yBreaks |
Breaks for the y-axis. (Default to |
showSD |
Display vertical standard deviation bars. (Default to |
showLOQ |
Display limit-of-quantification? Default to |
showStats |
Display regression statistics? Default to |
errors |
How error bars should be calculated. |
showLinear |
Display regression line. (Default to |
showAxis |
Display x-axis and y-axis. (Default to |
The plotLinear function plots a scatter plot with input concentration on the x-axis, and measurement on the y-axis. The input concentration is typically the concentration level in ladder mixture, although other measures (such as expected copy number) are also possible. The function builds a linear regression between the two variables, and reports associated statistics (R2, correlation and regression parameters) on the plot.
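The regression statistics named above can be reproduced with base R. The following is a minimal sketch on made-up data, not the package's internal code; the vectors `x` and `y` and the `stats` list are hypothetical names chosen for illustration.

```r
# Hypothetical data: log2 input concentration and log2 measurement
x <- log2(c(1, 2, 4, 8, 16, 32, 64, 128))
y <- 0.95 * x + 0.3 + c(0.10, -0.08, 0.05, -0.04, 0.02, -0.03, 0.01, -0.02)

# Ordinary least squares fit of measurement on input concentration
fit <- lm(y ~ x)

# Collect the statistics usually reported for a ladder
stats <- list(
    slope     = unname(coef(fit)["x"]),
    intercept = unname(coef(fit)["(Intercept)"]),
    R2        = summary(fit)$r.squared,
    r         = cor(x, y)   # Pearson correlation
)
print(stats)
```

For a simple linear regression with one predictor, R2 equals the squared Pearson correlation, which is a quick consistency check on the reported values.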
The function also estimates the limit-of-quantification (LOQ) breakpoint, and reports it on the plot if found. LOQ is defined as the lowest empirical detection limit, a threshold value beyond which stochastic behavior occurs. LOQ is estimated by fitting a segmented linear regression with two segments to the entire data set, while minimizing the total sum of squares of the differences between the variables.
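One way to picture a two-segment fit is a brute-force search over candidate breakpoints that keeps the split with the smallest combined residual sum of squares. The sketch below illustrates that idea on simulated data; `fitTwoSegments` and `minPoints` are hypothetical names, and the package's own estimator may differ in its details.

```r
# Hypothetical helper (not part of Anaquin): brute-force two-segment fit
fitTwoSegments <- function(x, y, minPoints = 3) {
    ord <- order(x)
    x <- x[ord]
    y <- y[ord]
    n <- length(x)
    stopifnot(n >= 2 * minPoints)

    best <- list(sse = Inf, breakpoint = NA_real_)

    # Candidate split k: left segment = points 1..k, right = points (k+1)..n
    for (k in minPoints:(n - minPoints)) {
        left  <- lm(y[1:k] ~ x[1:k])
        right <- lm(y[(k + 1):n] ~ x[(k + 1):n])
        sse <- sum(resid(left)^2) + sum(resid(right)^2)
        if (sse < best$sse) {
            best <- list(sse = sse, breakpoint = x[k + 1])
        }
    }
    best
}

# Simulated data: flat, noisy signal below x = 4 and a linear response above it
set.seed(1)
x <- seq(0, 10, by = 0.5)
y <- ifelse(x < 4, 2 + rnorm(length(x), sd = 0.05), 2 + 1.0 * (x - 4))

loq <- fitTwoSegments(x, y)
print(loq$breakpoint)
```

The search fits each segment independently, so the two lines need not meet at the breakpoint; a continuous segmented model would add that constraint.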
The function prints a scatter plot and returns its LOQ statistics.
Ted Wong [email protected]
library(Anaquin)

#
# Data set generated by Cufflinks and Anaquin, described in Section 5.4.6.3 of
# the user guide.
#
data(UserGuideData_5.4.6.3)

title <- 'Gene Expression'
xlab <- 'Input Concentration (log2)'
ylab <- 'FPKM (log2)'

# Sequin names
seqs <- row.names(UserGuideData_5.4.6.3)

# Input concentration
x <- log2(UserGuideData_5.4.6.3$Input)

# Measured FPKM
y <- log2(UserGuideData_5.4.6.3[,2:4])

plotLinear(seqs, x, y, title=title, xlab=xlab, ylab=ylab, showLOQ=TRUE)
Create Limit-of-Detection Ratio (LOD) plot between measured abundance (x-axis) and p-value probability (y-axis).
plotLOD(measured, pval, ratio, qval, FDR, title, xlab, ylab, legTitle, showConf)
measured |
Measured abundance |
pval |
P-value probability |
ratio |
How to group ROC points |
qval |
Q-value probability. (Default to |
FDR |
Chosen false-discovery-rate. Default to |
title |
Title of the plot. (Default to |
xlab |
Label for the x-axis. (Default to |
ylab |
Label for the y-axis. (Default to |
legTitle |
Title for the legend. (Default to |
showConf |
Display confidence interval. (Default to |
Create a Limit-of-Detection Ratio (LOD) plot between measured abundance (x-axis) and p-value probability (y-axis).
The LOD plot indicates the confidence in measurement relative to the magnitude of the measurement. For example, p-value should converge to zero as the sequencing depth increases.
The function also fits non-parametric curves for each sequin ratio group. The curves are modelled with local regression analysis, and are colored by the sequin group.
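Fitting one local regression curve per ratio group can be sketched with base R's `loess`. The data frame `d`, its columns, and the simulated trend below are assumptions made for illustration only; they are not taken from the package or its data sets.

```r
# Hypothetical data: two ratio groups with different detection behaviour
set.seed(42)
n <- 40
d <- data.frame(
    measured = rep(seq(1, 12, length.out = n), times = 2),  # log2 average counts
    ratio    = rep(c("1", "2"), each = n)
)

# Simulated log10 p-values that fall faster for the larger fold-change group
d$logPval <- ifelse(d$ratio == "2", -1.2, -0.5) * d$measured +
    rnorm(2 * n, sd = 0.5)

# One local regression (loess) fit per ratio group
fits <- lapply(split(d, d$ratio), function(g) {
    loess(logPval ~ measured, data = g)
})

# Smoothed log10 p-value for each group at a measured abundance of 8
pred <- sapply(fits, predict, newdata = data.frame(measured = 8))
print(pred)
```

Each curve can then be drawn in its own color, for example with `lines()` on the fitted values of each group.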
plotLOD is a simplification of the ERCC dashboard R package. Further details on the statistical algorithm are available in the ERCC documentation at https://bioconductor.org/packages/release/bioc/html/erccdashboard.html.
The function prints an LOD plot and returns the associated statistics.
Ted Wong [email protected]
library(Anaquin)

#
# Data set generated by DESeq2 and Anaquin, described in Section 5.6.3.3 of
# the user guide.
#
data(UserGuideData_5.6.3)

xlab <- 'Average Counts'
ylab <- 'P-value'
title <- 'LOD Curves'

# Sequin names
seqs <- row.names(UserGuideData_5.6.3)

# Expected log-fold
group <- UserGuideData_5.6.3$ExpLFC

# Measured average abundance
measured <- UserGuideData_5.6.3$Mean

# P-value
pval <- UserGuideData_5.6.3$Pval

# Q-value
qval <- UserGuideData_5.6.3$Qval

plotLOD(measured, pval, group, qval, xlab=xlab, ylab=ylab, title=title, FDR=0.1)
Create a scatter plot with input concentration on the x-axis, and measured proportion on the y-axis.
plotLogistic(seqs, x, y, title, xlab, ylab, showLOA, threshold)
seqs |
Sequin names |
x |
Expected input concentration on the x-axis |
y |
Measured proportion on the y-axis |
title |
Title of the plot. (Default to NULL). |
xlab |
Label for the x-axis. (Default to NULL). |
ylab |
Label for the y-axis. (Default to NULL). |
showLOA |
Display limit-of-assembly. (Default to TRUE). |
threshold |
Threshold required for limit-of-assembly (LOA). (Default to 0.7). |
The plotLogistic function creates a scatter plot with input concentration on the x-axis, and measured proportion on the y-axis. Common measured statistics include p-value, percentage and sensitivity. The plot builds a logistic regression model between the two variables.
The function also estimates limit-of-assembly (LOA) breakpoint, and reports it on the plot if found. The LOA breakpoint is an empirical detection limit, and also the abundance whereby the fitted logistic curve exceeds a user-defined threshold.
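The threshold crossing described above can be illustrated with a logistic fit in base R. This is a sketch under assumptions: the data are simulated, `glm` with a quasibinomial family is used here as a stand-in, and the source does not state how the package fits its curve.

```r
# Hypothetical data: sensitivity rises with log2 input concentration
x <- seq(-2, 10, by = 1)
y <- plogis(1.2 * (x - 3))   # measured proportion in [0, 1]

# Logistic regression on proportions; quasibinomial avoids the
# non-integer warning that the binomial family would raise
fit <- glm(y ~ x, family = quasibinomial(link = "logit"))

# Solve fitted probability == threshold for x:
#   logit(threshold) = b0 + b1 * x  =>  x = (logit(threshold) - b0) / b1
threshold <- 0.7
b <- coef(fit)
loa <- unname((qlogis(threshold) - b[1]) / b[2])
print(loa)
```

Raising `threshold` moves the estimated breakpoint to a higher input concentration, since the fitted curve is increasing in `x`.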
The function returns the limit of assembly (LOA).
Ted Wong [email protected]
library(Anaquin)

#
# Data set generated by Cufflinks and Anaquin, described in Section 5.4.5.1 of
# the user guide.
#
data(UserGuideData_5.4.5.1)

title <- 'Assembly Plot'
xlab <- 'Input Concentration (log2)'
ylab <- 'Sensitivity'

# Sequin names
seqs <- row.names(UserGuideData_5.4.5.1)

# Input concentration
x <- log2(UserGuideData_5.4.5.1$Input)

# Measured sensitivity
y <- UserGuideData_5.4.5.1$Sn

plotLogistic(seqs, x, y, title=title, xlab=xlab, ylab=ylab, showLOA=TRUE)
Create receiver operating characteristic (ROC) plot at various threshold settings.
plotROC(seqs, score, group, label, refGroup, title, legTitle)
seqs |
Sequin names |
score |
How to rank ROC points |
group |
How to group ROC points |
label |
True-positive (TP) or false positive (FP) |
refGroup |
Reference ratio groups |
title |
Label of the plot. Default to |
legTitle |
Title of the legend. Default to |
Create a receiver operating characteristic (ROC) plot at various threshold settings. The false positive rate (FPR) is plotted on the x-axis and the true positive rate (TPR) is plotted on the y-axis.
The function requires a scoring threshold function, and illustrates the performance of the data as the threshold is varied. Common scoring thresholds include p-value, sequencing depth and allele frequency.
A ROC plot is a useful diagnostic tool; it helps to select possibly optimal models and to discard suboptimal ones. In particular, the AUC statistic indicates the performance of the model relative to a random experiment (AUC 0.5).
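To make the threshold sweep and the AUC concrete, the sketch below computes a ROC curve by hand in base R. The `score` and `label` vectors are made up, and this is not the package's implementation, only an illustration of the general technique.

```r
# Hypothetical scores and labels; a higher score means a more confident call
score <- c(0.99, 0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10)
label <- c("TP", "TP", "TP", "FP", "TP", "TP", "FP", "TP", "FP", "FP")

# Rank calls from most to least confident
ord  <- order(score, decreasing = TRUE)
isTP <- label[ord] == "TP"

# Sweep the threshold from strictest to most lenient
tpr <- c(0, cumsum(isTP) / sum(isTP))
fpr <- c(0, cumsum(!isTP) / sum(!isTP))

# Area under the curve by the trapezoidal rule
auc <- sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)
print(auc)
```

An AUC close to 1 means true positives are ranked ahead of false positives almost everywhere, while 0.5 matches a random ranking.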
The function prints a ROC plot and returns its AUC statistics.
Ted Wong [email protected]
library(Anaquin)

#
# Data set generated by DESeq2 and Anaquin, described in Section 5.6.3.3 of
# the user guide.
#
data(UserGuideData_5.6.3)

# Sequin names
seqs <- row.names(UserGuideData_5.6.3)

# Expected log-fold
group <- abs(UserGuideData_5.6.3$ExpLFC)

# How the ROC curves are ranked
score <- 1-UserGuideData_5.6.3$Pval

# Classified labels (TP/FP)
label <- UserGuideData_5.6.3$Label

plotROC(seqs, score, group, label, title='ROC Plot', refGroup=0)
Individual sequins are combined across a range of precise concentrations to formulate mixtures. By modulating the concentration at which each sequin is present in the mixture, we can emulate quantitative features of genome biology.
This is the mixture A and B in RnaQuin. File name is A.R.6.csv on http://www.sequins.xyz.
data(RnaQuinGeneMixture)
Data frame:
Name: Sequin name
Length: Gene length
MixA: Input concentration for mixture A
MixB: Input concentration for mixture B
Data frame with columns defined in Format.
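Because each sequin has a known concentration in both mixtures, an expected fold change between them can be derived directly from the two columns. The sketch below uses a small made-up data frame laid out like the documented format; the sequin names, the values, and the choice of B over A as the direction of the ratio are all assumptions.

```r
# Hypothetical rows laid out like the documented format (values are made up)
mix <- data.frame(
    Name   = c("R1_1", "R1_2", "R1_3"),
    Length = c(1000, 1500, 2000),
    MixA   = c(1, 4, 16),
    MixB   = c(4, 4, 2)
)

# Expected log2 fold change of mixture B relative to mixture A
mix$ExpLFC <- log2(mix$MixB / mix$MixA)
print(mix)
```

A value of zero marks a sequin held at the same concentration in both mixtures, which makes it a natural negative control for differential analysis.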
Individual sequins are combined across a range of precise concentrations to formulate mixtures. By modulating the concentration at which each sequin is present in the mixture, we can emulate quantitative features of genome biology.
This is the mixture A and B in RnaQuin. File name is A.R.5.csv on http://www.sequins.xyz.
data(RnaQuinIsoformMixture)
Data frame:
Name: Sequin name
Length: Sequin length
MixA: Input concentration for mixture A
MixB: Input concentration for mixture B
Data frame with columns defined in Format.
Assembly sensitivity estimated by Cuffcompare. Section 5.4.5.1 of the Anaquin user guide has details on the data set.
data(UserGuideData_5.4.5.1)
Data frame:
InputConcent: Input concentration in attomol/ul
Sn: Measured sensitivity
Data frame with columns defined in Format.
S.A. Hardwick. Spliced synthetic genes as internal controls in RNA sequencing experiments. Nature Methods, 2016.
Gene expression estimated by Cufflinks. Section 5.4.6.3 of the Anaquin user guide has details on the data set.
data(UserGuideData_5.4.6.3)
Data frame:
InputConcent: Input concentration in attomol/ul
Observed1: Measured FPKM for the first replicate
Observed2: Measured FPKM for the second replicate
Observed3: Measured FPKM for the third replicate
Data frame with columns defined in Format.
S.A. Hardwick. Spliced synthetic genes as internal controls in RNA sequencing experiments. Nature Methods, 2016.
Differential gene expression estimated by DESeq2. Section 5.6.3 of the Anaquin user guide has details on the data set.
data(UserGuideData_5.6.3)
Data frame:
ExpLFC: Expected log-fold change
ObsLFC: Observed log-fold change
SD: Standard deviation of the measurement
Pval: P-value probability
Qval: Q-value probability
Mean: Average counts across the samples
Label: Classified label, true positive (TP) or false positive (FP)
Data frame with columns defined in Format.
S.A. Hardwick. Spliced synthetic genes as internal controls in RNA sequencing experiments. Nature Methods, 2016.