Title: | Mean/Median-balanced quantile normalization |
---|---|
Description: | Modified quantile normalization for omics or other matrix-like data distorted in location and scale. |
Authors: | Eva Brombacher [aut, cre] , Clemens Kreutz [aut, ctb] , Ariane Schad [aut, ctb] |
Maintainer: | Eva Brombacher <[email protected]> |
License: | GPL-3 + file LICENSE |
Version: | 2.19.0 |
Built: | 2024-10-31 06:08:44 UTC |
Source: | https://github.com/bioc/MBQN |
An exemplary matrix of a missing value (MV) pattern extracted from LFQ intensities of the proteinGroups.txt dataset from PXD001584 [1].
example_NApattern
example_NApattern
A matrix of zero and ones with 1264 rows and 18 columns; 0 means MV, 1 means no MV.
Ariane Schad
https://www.ebi.ac.uk/pride/archive/projects/PXD001584
[1] Ramond E, et al. Importance of host cell arginine uptake
in Francisella phagosomal escape and ribosomal protein amounts.
Mol Cell Proteomics. 2015 14(4):870-881
Extract the k largest or smallest values and their indices for each column of a matrix.
getKminmax(x, k, flag = "max")
getKminmax(x, k, flag = "max")
x |
a data matrix or data frame. |
k |
an integer specifying the number of extreme values. Must be
|
flag |
use "min" or "max" (default) to select smallest or largest elements. |
Order the values of each column of x
and determine the
k smallest (flag = "min"
) or largest (flag = "max"
) values and
their indices. NA's in the data are ignored.
List with elements:
ik |
indices of ordered extreme values |
minmax |
ordered extreme values. |
Ariane Schad
Brombacher, E., Schad, A., Kreutz, C. (2020). Tail-Robust Quantile Normalization. BioRxiv.
# Create a data matrix x <- matrix(c(5,2,3,NA,4,1,4,2,3,4,6,NA,1,3,1),ncol=3) # Get indices of the 5 largest values in each column getKminmax(x, k = 5, "max")
# Create a data matrix x <- matrix(c(5,2,3,NA,4,1,4,2,3,4,6,NA,1,3,1),ncol=3) # Get indices of the 5 largest values in each column getKminmax(x, k = 5, "max")
Calculates Pitman-Morgan variance test on two matrices
getPvalue(mtx1, mtx2)
getPvalue(mtx1, mtx2)
mtx1 |
Matrix with samples in columns and features in rows |
mtx2 |
Matrix with samples in columns and features in rows |
Data frame with p values and statistics
set.seed(30) n <- 20 m <- 20 mtx1 <- matrix(rnorm(m * n), m, n) mtx2 <- mbqn(mtx1, FUN = "mean") getPvalue(mtx1, mtx2)
set.seed(30) n <- 20 m <- 20 mtx1 <- matrix(rnorm(m * n), m, n) mtx2 <- mbqn(mtx1, FUN = "mean") getPvalue(mtx1, mtx2)
Modified quantile-normalization (QN) of a matrix, e.g., intensity values from omics data or other data sorted in columns. The modification prevents systematic flattening of features (rows) which are rank invariant (RI) or nearly rank invariant (NRI) across columns, for example features that populate mainly the tails of the intensity distribution or features that separate in intensity.
mbqn( x, FUN = "mean", na.rm = TRUE, method = "limma", offsetmatrix = FALSE, verbose = FALSE )
mbqn( x, FUN = "mean", na.rm = TRUE, method = "limma", offsetmatrix = FALSE, verbose = FALSE )
x |
a data matrix, where rows represent features, e.g. of protein abundance, and columns represent groups or samples, e.g. replicates, treatments, or conditions. |
FUN |
a function like mean, median (default), a user defined function,
or a numeric vector of weights with length |
na.rm |
logical indicating to omit NAs in the computation of feature mean. |
method |
character specifying function for computation of quantile
normalization; "limma" (default) for |
offsetmatrix |
logical indicating if offset matrix should be used instead of offset vector specifying offset for each row |
verbose |
logical indicating to print messages. |
Balance each matrix row by substracting its feature offset computed with FUN, e.g. the median; apply quantile-normalization and add the feature means to the normalized matrix. For further details see [4]. For quantile normalization with the "limma" package see [1,2] and for the preProcessCore package see [3].
Normalized matrix
Ariane Schad
[1] Smyth, G. K., and Speed, T. P. (2003). Normalization of cDNA microarray
data. Methods 31, 265–273.
Ritchie, M.E., Phipson, B., Wu, D., Hu, Y., Law, C.W., Shi, W., and Smyth,
[2] G.K. (2015). limma powers differential expression analyses for
RNA-sequencing and microarray studies. Nucleic Acids Research 43(7), e47.
[3] Bolstad B. M. (2016). preprocessCore: A collection of pre-processing
functions. R package version 1.36.0.
https://github.com/bmbolstad/preprocessCore
[4] Brombacher, E., Schad, A., Kreutz, C. (2020). Tail-Robust Quantile
Normalization. BioRxiv.
mbqnNRI()
, mbqnGetNRIfeatures()
.
## Compute mean and median balanced quantile normalization X <- matrix(c(5,2,3,NA,4,1,4,2,3,4,6,NA,1,3,1),ncol=3) mbqn(X, mean) # Use arithmetic mean to center features mbqn(X, median) # Use median to center features mbqn(X, "median") ## Use user defined array of weights for averaging wt <- c(1,3,1)/5 # Weights for each sample user_array <- apply(X,1,weighted.mean, wt ,na.rm =TRUE) mbqn(X, user_array)
## Compute mean and median balanced quantile normalization X <- matrix(c(5,2,3,NA,4,1,4,2,3,4,6,NA,1,3,1),ncol=3) mbqn(X, mean) # Use arithmetic mean to center features mbqn(X, median) # Use median to center features mbqn(X, "median") ## Use user defined array of weights for averaging wt <- c(1,3,1)/5 # Weights for each sample user_array <- apply(X,1,weighted.mean, wt ,na.rm =TRUE) mbqn(X, user_array)
Create a box-and-whisker plot of a data matrix and plot selected features and/or additional user-defined data on top of it.
mbqnBoxplot(mtx, irow = NULL, vals = NULL, add.leg = TRUE, ...)
mbqnBoxplot(mtx, irow = NULL, vals = NULL, add.leg = TRUE, ...)
mtx |
a matrix or data frame. |
irow |
index or vector of row indices of matrix features to plot on top of the boxplot. |
vals |
numeric, array, matrix, or data frame of features with length
|
add.leg |
add legend to plot. |
... |
additional arguments passed to the plot functions, e.g. xlab, ylab, main, ylim, type, las. |
This function calls graphics::boxplot
.
Groups are represent by matrix columns. Selected rows/features or
user-defined arrays are plot on top of the box plot. Missing values are
ignored.
Figure.
Ariane Schad
Brombacher, E., Schad, A., Kreutz, C. (2020). Tail-Robust Quantile Normalization. BioRxiv.
## Create boxplot of quantile normalized data matrix and plot ## feature from median balanced quantile normalization on top of it. X <- matrix(c(5,2,3,NA,4,1,4,2,3,4,6,NA,1,3,1),ncol=3) # Create data matrix # Quantile normalization qn.dat <- mbqn(x=X,FUN = NULL ,na.rm = TRUE) # Median balanced quantile normalization mbqn.dat <- mbqn(x=X,FUN = median ,na.rm = TRUE) ## Create boxplot: plot.new() mbqnBoxplot(qn.dat,irow = 1, vals = mbqn.dat[1,], type = "b")
## Create boxplot of quantile normalized data matrix and plot ## feature from median balanced quantile normalization on top of it. X <- matrix(c(5,2,3,NA,4,1,4,2,3,4,6,NA,1,3,1),ncol=3) # Create data matrix # Quantile normalization qn.dat <- mbqn(x=X,FUN = NULL ,na.rm = TRUE) # Median balanced quantile normalization mbqn.dat <- mbqn(x=X,FUN = median ,na.rm = TRUE) ## Create boxplot: plot.new() mbqnBoxplot(qn.dat,irow = 1, vals = mbqn.dat[1,], type = "b")
Helper function for mbqnGetThreshold
mbqnGetIntersect(combined_qn, combined_mbqn, threshold, plot = TRUE)
mbqnGetIntersect(combined_qn, combined_mbqn, threshold, plot = TRUE)
combined_qn |
Data frame containing RI, p value and statistic calculated for QN |
combined_mbqn |
Data frame containing RI, p value and statistic calculated for MBQN |
threshold |
Significance threshold for p value of Pitman-Morgan variance test |
plot |
Boolean values if logistic regression curves that are used to calculate intersection point should be plotted |
threshold value
Compute the rank frequency of each feature of a matrix and identify NRI/RI features.
mbqnGetNRIfeatures(x, low_thr = 0.5, method = NULL, verbose = TRUE)
mbqnGetNRIfeatures(x, low_thr = 0.5, method = NULL, verbose = TRUE)
x |
a data matrix. Rows represent features, e.g. protein abundances; columns represent samples. |
low_thr |
a value between [0 1]. Features with RI
frequency >= |
method |
character specifying function for computation of quantile
normalization; "limma" (default) for |
verbose |
logical indicating to print messages. |
Quantile normalize the data matrix and sort ranks. Determine the maximum frequency of equal rank across all columns for each feature. Features with maximum frequency above the user-defined threhold are declared as nearly rank invariant.
A list with elements:
p |
a matrix with the rank invariance frequencies |
max_p |
maximum rank invariance frequency in percent |
ip |
index of feature with maximum rank invariance frequency |
nri |
table of the rank invariance frequencies in percent for each NRI/RI feature |
var0_feature |
indices of features with zero sample variance after QN |
low_thr |
threshold used for NRI/RI detection from RI frequency. |
Ariane Schad
Brombacher, E., Schad, A., Kreutz, C. (2020). Tail-Robust Quantile Normalization. BioRxiv.
mbqnPlotRI()
for visualization of detected NRI/RI features.
## Check data matrix for RI and NRI features set.seed(1234) x <- mbqnSimuData("omics.dep") RI <- mbqnGetNRIfeatures(x, low_thr = 0.5, verbose = FALSE) mbqnPlotRI(RI)
## Check data matrix for RI and NRI features set.seed(1234) x <- mbqnSimuData("omics.dep") RI <- mbqnGetNRIfeatures(x, low_thr = 0.5, verbose = FALSE) mbqnPlotRI(RI)
Calculates the rank invariance threshold from which on MBQN should be used instead of 'classical' QN
mbqnGetThreshold(mtx, meanMedian = "mean", plot = TRUE)
mbqnGetThreshold(mtx, meanMedian = "mean", plot = TRUE)
mtx |
Matrix with samples in columns and features in rows |
meanMedian |
Offset function for the MBQN calculation |
plot |
Boolean values if logistic regression curves that are used to calculate intersection point should be plotted |
threshold value
set.seed(30) n <- 20 m <- 20 mtx <- matrix(rnorm(m * n), m, n) mbqnGetThreshold(mtx)
set.seed(30) n <- 20 m <- 20 mtx <- matrix(rnorm(m * n), m, n) mbqnGetThreshold(mtx)
Quantile normalization of a data matrix where rank invariant (RI)/nearly rank invariant (NRI) rows/features or other user-selected rows are normalized by the mean/median-balanced quantile normalization.
mbqnNRI( x, FUN = "mean", na.rm = TRUE, method = NULL, low_thr = 0.5, index = NULL, offsetmatrix = FALSE, verbose = TRUE )
mbqnNRI( x, FUN = "mean", na.rm = TRUE, method = NULL, low_thr = 0.5, index = NULL, offsetmatrix = FALSE, verbose = TRUE )
x |
a data matrix, where rows represent features, e.g. of protein abundance, and columns represent groups or samples, e.g. replicates, treatments, or conditions. |
FUN |
a function like mean, median (default), a user defined function,
or a numeric vector of weights with length |
na.rm |
logical indicating to omit NAs in the computation of feature mean. |
method |
character specifying function for computation of quantile
normalization; "limma" (default) for |
low_thr |
a value between [0 1]. Features with RI
frequency >= |
index |
an integer or a vector integers specifying the indices of selected rows. |
offsetmatrix |
logical indicating if offset matrix should be used instead of offset vector specifying offset for each row |
verbose |
logical indicating to print messages. |
Selected rows and/or rows with rank invariance frequency
>=threshold
are normalized with the mean/median-balanced quantile
normalization. Remaining rows are quantile normalized without mean balancing.
Normalized matrix
.
Ariane Schad
Brombacher, E., Schad, A., Kreutz, C. (2020). Tail-Robust Quantile Normalization. BioRxiv.
## Quantile normalize a data matrix where ## nearly rank invariant (NRI) features are balanced X <- matrix(c(5,2,3,NA,4,1,4,2,3,4,6,NA,1,3,1),ncol=3) mbqnNRI(X, median,low_thr = 0.5) # Balance NRI features selected by threshold mbqnNRI(X, median, index = c(1,2)) # Balance selected features
## Quantile normalize a data matrix where ## nearly rank invariant (NRI) features are balanced X <- matrix(c(5,2,3,NA,4,1,4,2,3,4,6,NA,1,3,1),ncol=3) mbqnNRI(X, median,low_thr = 0.5) # Balance NRI features selected by threshold mbqnNRI(X, median, index = c(1,2)) # Balance selected features
Check data matrix for rank invariant (RI) and nearly rank invariant (NRI) features/rows across samples and visualize result for different normalizations.
mbqnPlotAll( x, FUN = NULL, low_thr = 0.5, show_nri_only = FALSE, verbose = TRUE, ... )
mbqnPlotAll( x, FUN = NULL, low_thr = 0.5, show_nri_only = FALSE, verbose = TRUE, ... )
x |
a data matrix. Rows represent features, e.g. protein abundances; columns represent samples. |
FUN |
a function like mean, median (default), a user defined function,
or a numeric vector of weights with length |
low_thr |
a value between [0 1]. Features with RI
frequency >= |
show_nri_only |
logical indicating to display only the RI/NRI detection graph. |
verbose |
logical indicating to print messages. |
... |
additional plot arguments passed to |
Rank data and check if lower and upper intensity tails are dominated by few features. Apply quantile normalization without and with mean-balancing and check the standard deviation of normalized features located in the tails.
A set of figures that display the detected RI/NRI features and a list with elements:
p |
a matrix with the rank invariance frequencies |
max_p |
maximum rank invariance frequency in percent |
ip |
index of feature with maximum rank invariance frequency |
nri |
table of the rank invariance frequencies in percent for each NRI/RI feature |
var0_feature |
indices of features with zero sample variance after QN. |
Ariane Schad
Brombacher, E., Schad, A., Kreutz, C. (2020). Tail-Robust Quantile Normalization. BioRxiv.
mbqnPlotRI()
and mbqnBoxplot()
for the generation of figures,
and mbqn()
for normalization.
## Check data matrix for RI and NRI features X <- matrix(c(5,2,3,NA,4,1,4,2,3,4,6,NA,1,3,1),ncol=3) mbqnPlotAll(X, mean, low_thr = 0.5)
## Check data matrix for RI and NRI features X <- matrix(c(5,2,3,NA,4,1,4,2,3,4,6,NA,1,3,1),ncol=3) mbqnPlotAll(X, mean, low_thr = 0.5)
Plot rank invariance frequency and feature coverage of detected RI and NRI features
mbqnPlotRI(obj, verbose = FALSE, ...)
mbqnPlotRI(obj, verbose = FALSE, ...)
obj |
list object of RI frequencies from |
verbose |
logical indicating to run function quietly. |
... |
additional arguments (cex, cex.lab, cex.axis, cex.main) passed to the plot function. |
Graphical output of the NRI/RI identification results from
mbqnGetNRIfeatures()
. For each detected NRI/RI feature, plot the
feature index against the RI frequencies together with the RI frequency
detection threshold and print the sample coverage.
Figure
Ariane Schad
Brombacher, E., Schad, A., Kreutz, C. (2020). Tail-Robust Quantile Normalization. BioRxiv.
mbqnGetNRIfeatures()
for detection of NRI/RI features.
## Check data matrix for RI and NRI features x <- mbqnSimuData("omics.dep") RI <- mbqnGetNRIfeatures(x, low_thr = 0.5, verbose = FALSE) mbqnPlotRI(RI)
## Check data matrix for RI and NRI features x <- mbqnSimuData("omics.dep") RI <- mbqnGetNRIfeatures(x, low_thr = 0.5, verbose = FALSE) mbqnPlotRI(RI)
Generate a random data matrix with or without proteomics, log-transformed feature intensity-like properties.
mbqnSimuData(model = "rand", nrow = NULL, ncol = NULL, show.fig = FALSE)
mbqnSimuData(model = "rand", nrow = NULL, ncol = NULL, show.fig = FALSE)
model |
character indicating one of the three different type of models:
|
nrow |
number of rows of data matrix (only for |
ncol |
number of columns of data matrix
(only for |
show.fig |
logical inidicating whether data properties are plot to
figure (only for |
For model "rand"
, each matrix element is drawn from a
standard normal distribution . For model
"omics"
, the
matrix elements of each row are drawn from a Gaussian distribution
where the mean and standard deviation itself are
drawn Gaussian distributions, i.e.
and
. About 35\
to the missing value pattern present in real protein LFQ
intensities. For model
"omics.dep"
, a single differentially epxressed
RI feature is stacked on top of the matrix from model "omics"
.
matrix
of size nrow x ncol.
Ariane Schad
Brombacher, E., Schad, A., Kreutz, C. (2020). Tail-Robust Quantile Normalization. BioRxiv.
example_NApattern()
for description of missing value pattern.
mbqnSimuData(model = "rand") mbqnSimuData(model = "rand", 2000,6) set.seed(1234) mbqnSimuData(model = "omics") set.seed(1111) mbqnSimuData(model = "omics.dep")
mbqnSimuData(model = "rand") mbqnSimuData(model = "rand", 2000,6) set.seed(1234) mbqnSimuData(model = "omics") set.seed(1111) mbqnSimuData(model = "omics.dep")
mbqnSimuDistortion
adds a random perturbation of mean and
scale to each column of a matrix.
mbqnSimuDistortion(x, s.mean = 0.05, s.scale = 0.01)
mbqnSimuDistortion(x, s.mean = 0.05, s.scale = 0.01)
x |
a matrix or data frame. |
s.mean |
scatter of relative change of mean. |
s.scale |
scatter of realtive change in scale, i.e. 0.01 corresponds to 1 percent. |
Shift and scale the sample mean and standard deviation of a matrix.
The perturbation of center and scale relative to mean and standard deviation
of each sample are drawn from a Gaussian distribution
with
s.mean
and
=
s.scale
, respectively.
List with:
x.mod |
perturbed matrix |
mx.offset |
numeric array of shifts of the sample means |
mx.scale |
numeric array of relative scales of the sample standard deviations. |
Ariane Schad
Brombacher, E., Schad, A., Kreutz, C. (2020). Tail-Robust Quantile Normalization. BioRxiv.
mbqnSimuData()
for data generation.
set.seed(1234) x <- mbqnSimuData("omics.dep") df <- mbqnSimuDistortion(x)
set.seed(1234) x <- mbqnSimuData("omics.dep") df <- mbqnSimuDistortion(x)
Recalulate p value from two-sided to one-sided
oneSidedTest(sign_value, z_value)
oneSidedTest(sign_value, z_value)
sign_value |
P value from two-sided significance test |
z_value |
Z value from two-sided significance test |
P value from one sided significance test
Truncate float to defined number of decimal values
truncateDecimals(x, digits = 2)
truncateDecimals(x, digits = 2)
x |
float |
digits |
Number of decimal values |
Truncated number
x <- 2.567836 truncateDecimals(x, 3)
x <- 2.567836 truncateDecimals(x, 3)