Title: | Conditional quantile normalization |
---|---|
Description: | A normalization tool for RNA-Seq data, implementing the conditional quantile normalization method. |
Authors: | Jean (Zhijin) Wu, Kasper Daniel Hansen |
Maintainer: | Kasper Daniel Hansen <[email protected]> |
License: | Artistic-2.0 |
Version: | 1.53.0 |
Built: | 2024-10-30 05:33:56 UTC |
Source: | https://github.com/bioc/cqn |
This function implements CQN (conditional quantile normalization) for RNA-Seq data.
cqn(counts, x, lengths, sizeFactors = NULL, subindex = NULL,
    tau = 0.5, sqn = TRUE, lengthMethod = c("smooth", "fixed"),
    verbose = FALSE)

## S3 method for class 'cqn'
print(x, ...)
counts |
An object that can be coerced to a |
x |
A covariate whose systematic influence on the counts will be removed, typically GC content. Must have the same length as the number of rows of counts. |
lengths |
The lengths (in bp) of the regions in counts. Has to have the same length as the number of rows of counts. |
sizeFactors |
An optional vector of size factors, i.e. the sequencing effort of the
various samples. If |
subindex |
An optional vector of indices into the rows of |
tau |
This argument is passed to |
sqn |
This argument indicates whether the residuals from the systematic fit are (subset) quantile normalized. The default should only be changed by expert users. |
lengthMethod |
Should length enter the model as a smooth function or not. |
verbose |
Is the function verbose? |
... |
Not used. |
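The length requirements stated for x and lengths can be checked before calling cqn. The following is a minimal sketch in base R; the helper check_cqn_inputs and the toy data are hypothetical and not part of the package, and the check that sizeFactors has one entry per sample is an assumption based on its description as per-sample sequencing effort.

```r
# Hypothetical helper (not part of cqn): validate inputs before calling cqn().
check_cqn_inputs <- function(counts, x, lengths, sizeFactors = NULL) {
  counts <- as.matrix(counts)
  stopifnot(
    is.numeric(counts),
    length(x) == nrow(counts),       # one covariate value per region
    length(lengths) == nrow(counts)  # one length (in bp) per region
  )
  if (!is.null(sizeFactors)) {
    # Assumption: one size factor per sample (column of counts)
    stopifnot(length(sizeFactors) == ncol(counts))
  }
  invisible(TRUE)
}

# Toy data (made up for illustration): 4 regions, 2 samples
toy_counts <- matrix(c(10, 0, 25, 7, 12, 3, 30, 5), nrow = 4,
                     dimnames = list(paste0("gene", 1:4), c("s1", "s2")))
toy_gc  <- c(0.41, 0.55, 0.48, 0.62)
toy_len <- c(1200, 800, 2500, 1500)

inputs_ok <- check_cqn_inputs(toy_counts, toy_gc, toy_len,
                              sizeFactors = colSums(toy_counts))
```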
These functions implement CQN (conditional quantile normalization) for RNA-Seq data. The functions remove a single systematic effect, contained in the argument x, which will typically be GC content. The effect of lengths will either be modelled as a smooth function (which we recommend), if you are using lengthMethod = "smooth", or as an offset (equivalent to modelling using RPKMs), if you are using lengthMethod = "fixed". Length can be completely removed from the model by using lengthMethod = "fixed" and setting all lengths to 1000.
Final corrected values are equal to value$y + value$offset.
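As a small illustration of the last point, the sketch below adds the y and offset components of a result object. The helper cqn_normalized and the toy list are hypothetical stand-ins with made-up values; with a real fit, the object returned by cqn would be passed instead.

```r
# Hypothetical helper: combine the two components named in the text above.
cqn_normalized <- function(fit) {
  stopifnot(all(dim(fit$y) == dim(fit$offset)))
  fit$y + fit$offset
}

# Toy stand-in for a cqn() result (values made up for illustration)
dn <- list(c("g1", "g2"), c("s1", "s2"))
toy_fit <- list(
  y      = matrix(c(5.0, 3.2, 6.1, 4.4), nrow = 2, dimnames = dn),
  offset = matrix(c(-0.3, 0.1, 0.2, -0.4), nrow = 2, dimnames = dn)
)

toy_norm <- cqn_normalized(toy_fit)
```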
A list with the following components:
counts |
The value of argument |
x |
The value of argument |
lengths |
The value of argument |
sizeFactors |
The value of argument |
subindex |
The value of argument |
y |
The dependent value used in the systematic effect fit. Equal to log2 transformed reads per million. |
offset |
The estimated offset. |
offset0 |
A single number used internally for identifiability. |
glm.offset |
An offset useful for supplying to a GLM type model function. It is on the natural log scale and includes correcting for sizeFactors. |
func1 |
The estimated effect of function 1 (argument |
grid1 |
The grid points on which function 1 (argument |
knots1 |
The knots used for function 1 (argument |
func2 |
The estimated effect of function 2 (lengths). This is a matrix of function values on a grid. Columns are samples and rows are grid points. |
grid2 |
The grid points on which function 2 (lengths) was evaluated. |
knots2 |
The knots used for function 2 (lengths). |
call |
The call. |
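To illustrate how an offset on the natural log scale, such as glm.offset, enters a GLM, here is a minimal sketch using base R's glm with a Poisson family for a single gene. All data are simulated, and the vector toy_glm_offset is a hypothetical stand-in for one row of glm.offset; that glm.offset holds one value per region and sample is an assumption, not something stated above.

```r
set.seed(1)

# Toy data for a single gene across 10 samples (simulated for illustration)
group <- factor(rep(c("A", "B"), each = 5))
gene_counts <- rpois(10, lambda = 50)

# Hypothetical stand-in for one row of glm.offset:
# natural-log scale, one value per sample
toy_glm_offset <- log(c(0.9, 1.1, 1.0, 0.95, 1.05, 1.2, 0.8, 1.0, 1.1, 0.9))

# The offset is added to the linear predictor with a fixed coefficient of 1
toy_glm <- glm(gene_counts ~ group, family = poisson(),
               offset = toy_glm_offset)
coef(toy_glm)
```

Count-based packages that accept an offset matrix would take the whole matrix rather than a single row.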
Internally, the function uses a custom implementation of subset quantile normalization, contained in the (not exported) SQN2 function.
Kasper Daniel Hansen, Zhijin Wu
KD Hansen, RA Irizarry, and Z Wu, Removing technical variability in RNA-seq data using conditional quantile normalization. Biostatistics 2012 vol. 13(2) pp. 204-216.
The package vignette.
data(montgomery.subset)
data(sizeFactors.subset)
data(uCovar)
cqn.subset <- cqn(montgomery.subset, lengths = uCovar$length,
                  x = uCovar$gccontent, sizeFactors = sizeFactors.subset,
                  verbose = TRUE)
This function plots the estimated systematic effects that are removed during CQN normalization.
cqnplot(x, n = 1, col = "grey60", ylab = "QR fit", xlab = "", type = "l", lty = 1, ...)
x |
The result of a call to |
n |
Which systematic effect is plotted. |
col |
A vector of colors, as in |
ylab |
y-label as in |
xlab |
x-label as in |
type |
type, as in |
lty |
line type, as in |
... |
These arguments are passed to |
This function is invoked for its side effect.
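For readers who want to see what such a plot looks like without a fitted object, the sketch below draws one curve per sample from a matrix of function values on a grid (rows are grid points, columns are samples), using base R's matplot and rug. It is not cqnplot's implementation; the grid, curves, and knot positions are made up for illustration.

```r
# Toy stand-ins for the grid, function values, and knots of one effect
toy_grid  <- seq(0.3, 0.7, length.out = 50)
toy_func  <- cbind(s1 = 2 * (toy_grid - 0.5)^2,
                   s2 = -1.5 * (toy_grid - 0.5)^2 + 0.1)
toy_knots <- quantile(toy_grid, c(0.025, 0.25, 0.5, 0.75, 0.975))

pdf(NULL)  # null device so the sketch also runs non-interactively
matplot(toy_grid, toy_func, type = "l", lty = 1, col = "grey60",
        xlab = "", ylab = "QR fit")
rug(toy_knots, lwd = 2)  # mark the (made-up) knot positions on the x-axis
invisible(dev.off())
```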
Kasper Daniel Hansen
data(montgomery.subset)
data(sizeFactors.subset)
data(uCovar)
cqn.subset <- cqn(montgomery.subset, lengths = uCovar$length,
                  x = uCovar$gccontent, sizeFactors = sizeFactors.subset,
                  verbose = TRUE)
cqnplot(cqn.subset, n = 1)
A gene by sample count matrix for 10 samples from Montgomery et al. Also included is information about these genes (length and GC content) as well as the sequencing depth of each sample.
data(montgomery.subset)
data(sizeFactors.subset)
data(uCovar)
montgomery.subset is a data frame with 23552 observations on 10 different samples; the column names are the sample IDs. sizeFactors.subset is a named vector of length 10 containing the number of mapped reads for each of the 10 samples. uCovar is a data frame with 23552 observations on 2 different covariates: GC content and genic length in bp.
Gene models are union models based on Ensembl 61. These gene models were constructed using Genominator. Genes that have zero counts in all 10 samples were excluded.
SB Montgomery, M Sammeth, M Gutierrez-Arcelus, RP Lach, C Ingle, J Nisbett, R Guigo, ET Dermitzakis, (2010) “Transcriptome genetics using second generation sequencing in a Caucasian population”. Nature 464(7289), 773-777.