| Title: | Shiny-based interactive data-quality exploration for omics data |
|---|---|
| Description: | Data quality assessment is an integral part of preparatory data analysis to ensure sound biological information retrieval. We present here the MatrixQCvis package, which provides shiny-based interactive visualization of data quality metrics at the per-sample and per-feature level. It is broadly applicable to quantitative omics data types that come in matrix-like format (features x samples). It enables the detection of low-quality samples, drifts, outliers and batch effects in data sets. Visualizations include, amongst others, bar and violin plots of the (count/intensity) values, mean vs standard deviation plots, MA plots, empirical cumulative distribution function (ECDF) plots, visualizations of the distances between samples, and multiple types of dimension reduction plots. Furthermore, MatrixQCvis allows for differential expression analysis based on the limma (moderated t-tests) and proDA (Wald tests) packages. MatrixQCvis builds upon the popular Bioconductor SummarizedExperiment S4 class and thus enables facile integration into existing workflows. The package is especially tailored towards metabolomics and proteomics mass spectrometry data, but can also be used to assess the data quality of other data types that can be represented in a SummarizedExperiment object. |
| Authors: | Thomas Naake [aut, cre] (ORCID: <https://orcid.org/0000-0001-7917-5580>), Wolfgang Huber [aut] (ORCID: <https://orcid.org/0000-0002-0474-2218>) |
| Maintainer: | Thomas Naake <[email protected]> |
| License: | GPL-3 |
| Version: | 1.21.0 |
| Built: | 2026-06-03 07:31:39 UTC |
| Source: | https://github.com/bioc/MatrixQCvis |
barplotSamplesMeasuredMissing plots the number of
measured/missing features of samples as a barplot. The function
takes as input the tbl returned by samplesMeasuredMissing.

```r
barplotSamplesMeasuredMissing(tbl, measured = TRUE)
```

| Argument | Description |
|---|---|
| tbl | tbl as returned by samplesMeasuredMissing |
| measured | logical; if TRUE, the number of measured features is plotted, if FALSE, the number of missing features |

Value: gg object from ggplot2
```r
## create se
a <- matrix(seq_len(100), nrow = 10, ncol = 10,
    dimnames = list(seq_len(10), paste("sample", seq_len(10))))
a[c(1, 5, 8), seq_len(5)] <- NA
set.seed(1)
a <- a + rnorm(100)
cD <- data.frame(name = colnames(a), type = c(rep("1", 5), rep("2", 5)))
rD <- data.frame(spectra = rownames(a))
se <- SummarizedExperiment::SummarizedExperiment(assay = a,
    rowData = rD, colData = cD)

## create the data.frame with information on number of measured/missing
## values
tbl <- samplesMeasuredMissing(se)

## plot number of measured values
barplotSamplesMeasuredMissing(tbl, measured = TRUE)

## plot number of missing values
barplotSamplesMeasuredMissing(tbl, measured = FALSE)
```
The function batchCorrectionAssay removes batch effects from the
(count/intensity) values of a SummarizedExperiment.
It uses either the removeBatchEffect (limma) or ComBat (sva) function,
or applies no batch effect correction (pass-through,
method = "none").

```r
batchCorrectionAssay(
    se,
    method = c("none", "removeBatchEffect (limma)", "ComBat"),
    batch = NULL,
    batch2 = NULL,
    ...
)
```

| Argument | Description |
|---|---|
| se | SummarizedExperiment |
| method | character, one of "none", "removeBatchEffect (limma)" or "ComBat" |
| batch | NULL or character, name of the column in colData(se) that contains the batch identity |
| batch2 | NULL or character, name of a column in colData(se) with a second series of batches (used by "removeBatchEffect (limma)") |
| ... | further arguments passed to |
The column batch in colData(se) contains the information
on the batch identity. For method = "removeBatchEffect (limma)",
batch2 may indicate a second series of batches.
Internal use in shinyQC.
If batch is NULL and method is set to
method = "removeBatchEffect (limma)" or method = "ComBat",
no batch correction will be performed (equivalent to
method = "none").
The method ComBat only performs batch correction on valid
features: (1) at least two observations (no NA) per batch level
and per feature, (2) variance greater than 0 per feature, and (3) more than
two valid features as given by (1) and (2). For non-valid features, values
are taken from assay(se).
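The three validity rules can be expressed as a logical mask over features. The sketch below is a hypothetical base-R helper (validComBatFeatures is not a MatrixQCvis function) that assumes features are rows and samples are columns; it only decides which features would be handed to ComBat, while all other features would keep their values from assay(se).

```r
## Hypothetical helper, not part of MatrixQCvis: returns TRUE for features
## (rows) that satisfy the three ComBat validity rules described above
validComBatFeatures <- function(x, batch) {
    batch <- as.factor(batch)
    ## (1) at least two non-missing observations per batch level
    enough_obs <- apply(x, 1, function(row)
        all(tapply(!is.na(row), batch, sum) >= 2))
    ## (2) variance greater than 0, computed on the non-missing values
    var_ok <- apply(x, 1, function(row) {
        v <- stats::var(row, na.rm = TRUE)
        !is.na(v) && v > 0
    })
    valid <- enough_obs & var_ok
    ## (3) more than two valid features are required, otherwise none is used
    if (sum(valid) <= 2) valid[] <- FALSE
    valid
}

set.seed(1)
x <- matrix(rnorm(50), nrow = 5, ncol = 10)
batch <- rep(c("1", "2"), each = 5)
x[1, seq_len(4)] <- NA  # feature 1: only one observation in batch "1"
x[2, ] <- 3             # feature 2: constant, i.e. variance 0
valid <- validComBatFeatures(x, batch)
valid
```

Rows 3 to 5 pass all checks, so rule (3) is met; dropping one of them (leaving only two valid features) would switch the whole mask to FALSE.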
Value: matrix
```r
## create se
a <- matrix(seq_len(100), nrow = 10, ncol = 10,
    dimnames = list(seq_len(10), paste("sample", seq_len(10))))
a[c(1, 5, 8), seq_len(5)] <- NA
set.seed(1)
a <- a + rnorm(100)
cD <- data.frame(name = colnames(a), type = c(rep("1", 5), rep("2", 5)),
    batch = rep(c(1, 2), 5))
rD <- data.frame(spectra = rownames(a))
se <- SummarizedExperiment::SummarizedExperiment(assay = a,
    rowData = rD, colData = cD)

## method = "removeBatchEffect (limma)"
batchCorrectionAssay(se, method = "removeBatchEffect (limma)",
    batch = "batch", batch2 = NULL)

## method = "ComBat"
batchCorrectionAssay(se, method = "ComBat", batch = "batch", batch2 = NULL)
```
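The function createBoxplot creates a boxplot per sample for the
intensity/count values.

```r
createBoxplot(
    se,
    orderCategory = colnames(colData(se)),
    title = "",
    log = TRUE,
    violin = FALSE
)
```

| Argument | Description |
|---|---|
| se | SummarizedExperiment |
| orderCategory | character, name of the column in colData(se) used to order the samples |
| title | character, title of the plot |
| log | logical; if TRUE, the values are log-transformed |
| violin | logical; if TRUE, violin plots are drawn instead of boxplots |

Internal usage in shinyQC.
Value: gg object from ggplot2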
```r
## create se
a <- matrix(seq_len(100), nrow = 10, ncol = 10,
    dimnames = list(seq_len(10), paste("sample", seq_len(10))))
a[c(1, 5, 8), seq_len(5)] <- NA
set.seed(1)
a <- a + rnorm(100)
cD <- data.frame(name = colnames(a), type = c(rep("1", 5), rep("2", 5)))
rD <- data.frame(spectra = rownames(a))
se <- SummarizedExperiment::SummarizedExperiment(assay = a,
    rowData = rD, colData = cD)
createBoxplot(se, orderCategory = "name", title = "", log = TRUE,
    violin = FALSE)
```
The function createDfFeature takes as input a list of matrices and
returns the row feature of each matrix as a column of a
data.frame. The function createDfFeature provides the input
for the function featurePlot.

```r
createDfFeature(l, feature)
```

| Argument | Description |
|---|---|
| l | list of matrices |
| feature | character, name of the row (feature) to extract from each matrix |

Internal usage in shinyQC.
Value: data.frame
```r
set.seed(1)
x1 <- matrix(rnorm(100), ncol = 10, nrow = 10,
    dimnames = list(paste("feature", seq_len(10)),
        paste("sample", seq_len(10))))
x2 <- x1 + 5
x3 <- x2 + 10
l <- list(x1 = x1, x2 = x2, x3 = x3)
createDfFeature(l, "feature 1")
```
The function cv calculates the coefficient of variation for each column
of a matrix. The coefficients of variation are calculated according to the
formula sd(y) / mean(y) * 100, with y the values of a column; thus,
the function returns the coefficient of variation in percent.
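As a plain base-R illustration of the formula, the hypothetical helper cvColumns below applies sd(y) / mean(y) * 100 to each column. Unlike cv, it returns an unnamed numeric vector instead of a named list, and it assumes that missing values should simply be dropped.

```r
## Hypothetical helper illustrating sd(y) / mean(y) * 100 for each column y
cvColumns <- function(x) {
    apply(x, 2, function(y)
        stats::sd(y, na.rm = TRUE) / mean(y, na.rm = TRUE) * 100)
}

x <- matrix(seq_len(10), ncol = 2)  # column 1 holds 1:5, column 2 holds 6:10
cvColumns(x)                        # about 52.70 and 19.76
```

Both columns have the same standard deviation (sqrt(2.5)), so the difference in the coefficients of variation comes entirely from the different means (3 vs 8).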
```r
cv(x, name = "raw")
```

| Argument | Description |
|---|---|
| x | matrix |
| name | character, name of the entry in the returned list |

The function returns a named list (the name is specified by the
name argument) containing the coefficients of variation of the
columns of x.
Value: list
```r
x <- matrix(seq_len(10), ncol = 2)
cv(x)
```
The function cvFeaturePlot returns a plotly plot of coefficient
of variation values. It creates a violin plot with superimposed points
of the coefficient of variation values per list entry of l.

```r
cvFeaturePlot(l, lines = FALSE)
```

| Argument | Description |
|---|---|
| l | named list; one violin plot is drawn per list entry |
| lines | logical; if TRUE, points belonging to the same feature are connected by lines |

lines = TRUE connects the points belonging to the same feature
with a line. If there are fewer than two features, the violin plot is not
drawn. The violin plots are ordered according to the order in l.
Value: plotly
```r
x1 <- matrix(seq_len(100), ncol = 10, nrow = 10,
    dimnames = list(paste("feature", seq_len(10)),
        paste("sample", seq_len(10))))
x2 <- x1 + 5
x3 <- x2 + 10
l <- list(x1 = x1, x2 = x2, x3 = x3)
cvFeaturePlot(l, lines = FALSE)
```
The function dimensionReduction creates a data.frame
with the coordinates of the projected data (first entry of the returned output).
The function allows for the
following projections:
Principal Component Analysis (PCA), Principal Coordinates
Analysis/Multidimensional Scaling (PCoA), Non-metric Multidimensional
Scaling (NMDS), t-distributed stochastic neighbor embedding (tSNE), and
Uniform Manifold Approximation and Projection (UMAP).
The second list entry contains the object returned from
prcomp (PCA), cmdscale (PCoA), isoMDS (NMDS),
Rtsne (tSNE), or umap (UMAP).
```r
dimensionReduction(
    x,
    type = c("PCA", "PCoA", "NMDS", "tSNE", "UMAP"),
    params = list()
)
```

| Argument | Description |
|---|---|
| x | numeric matrix |
| type | character, one of "PCA", "PCoA", "NMDS", "tSNE" or "UMAP" |
| params | list of parameters for the selected method (see the example) |

The function dimensionReduction is a wrapper around the following
functions: stats::prcomp (PCA), stats::cmdscale (PCoA),
MASS::isoMDS (NMDS), Rtsne::Rtsne (tSNE), and
umap::umap (UMAP). For the function umap::umap,
the method is set to naive.
Value: list; the first entry contains a tbl, the second entry contains
the object returned from prcomp (PCA), cmdscale (PCoA),
isoMDS (NMDS), Rtsne (tSNE), or umap (UMAP)
Author: Thomas Naake
```r
x <- matrix(rnorm(10000), ncol = 100)
rownames(x) <- paste("feature", seq_len(nrow(x)))
colnames(x) <- paste("sample", seq_len(ncol(x)))
params <- list(method = "euclidean", ## dist
    initial_dims = 10, max_iter = 100, dims = 3, perplexity = 3, ## tSNE
    min_dist = 0.1, n_neighbors = 15, spread = 1) ## UMAP
dimensionReduction(x, type = "PCA", params = params)
dimensionReduction(x, type = "PCoA", params = params)
dimensionReduction(x, type = "NMDS", params = params)
dimensionReduction(x, type = "tSNE", params = params)
dimensionReduction(x, type = "UMAP", params = params)
```
The function dimensionReductionPlot creates a dimension reduction plot.
The function takes as input the tbl object obtained
from the dimensionReduction function. The tbl contains the values
transformed by one of the dimension reduction methods.
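```r
dimensionReductionPlot(
    tbl,
    se,
    color = c("none", colnames(se@colData)),
    size = c("none", colnames(se@colData)),
    explainedVar = NULL,
    x_coord,
    y_coord,
    height = 600,
    interactive = TRUE
)
```

| Argument | Description |
|---|---|
| tbl | tbl as returned (first entry) by dimensionReduction |
| se | SummarizedExperiment |
| color | character, "none" or the name of a column in colData(se) used to color the points |
| size | character, "none" or the name of a column in colData(se) used to scale the points |
| explainedVar | NULL or named |
| x_coord | character, name of the column in tbl plotted on the x-axis |
| y_coord | character, name of the column in tbl plotted on the y-axis |
| height | numeric, height of the plot |
| interactive | logical; if TRUE, a plotly object is returned, otherwise a gg object |

The function dimensionReductionPlot is a wrapper for a
ggplot/ggplotly expression.
Value: plotly or gg
Author: Thomas Naake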
```r
library(SummarizedExperiment)

## create se
a <- matrix(seq_len(100), nrow = 10, ncol = 10, byrow = TRUE,
    dimnames = list(seq_len(10), paste("sample", seq_len(10))))
set.seed(1)
a <- a + rnorm(100)
cD <- data.frame(name = colnames(a), type = c(rep("1", 5), rep("2", 5)),
    median_vals = apply(a, 2, median))
rD <- data.frame(spectra = rownames(a))
se <- SummarizedExperiment(assay = a, rowData = rD, colData = cD)

pca <- dimensionReduction(x = assay(se), type = "PCA", params = list())[[1]]
dimensionReductionPlot(tbl = pca, se = se, color = "type",
    size = "median_vals", x_coord = "PC1", y_coord = "PC2")
```
The function distSample creates a heatmap from a distance matrix
created by the function distShiny. The heatmap is annotated by the
column of colData(se) specified by the label argument.

```r
distSample(d, se, label = "name", title = "raw", ...)
```

| Argument | Description |
|---|---|
| d | distance matrix as returned by distShiny |
| se | SummarizedExperiment |
| label | character, name of the column in colData(se) used to annotate the heatmap |
| title | character, title of the heatmap |
| ... | further arguments passed to |

Internal use in shinyQC.
Value: Heatmap object from ComplexHeatmap
```r
## create se
a <- matrix(seq_len(100), nrow = 10, ncol = 10,
    dimnames = list(seq_len(10), paste("sample", seq_len(10))))
a[c(1, 5, 8), seq_len(5)] <- NA
set.seed(1)
a <- a + rnorm(100)
a_i <- imputeAssay(a, method = "MinDet")
cD <- data.frame(name = colnames(a_i), type = c(rep("1", 5), rep("2", 5)))
rD <- data.frame(spectra = rownames(a_i))
se <- SummarizedExperiment::SummarizedExperiment(assay = a_i,
    rowData = rD, colData = cD)

dist <- distShiny(a_i)
distSample(dist, se, label = "type", title = "imputed",
    show_row_names = TRUE)
```
The function distShiny takes as input a numerical matrix or
data.frame and returns the distances between the rows and columns
based on the defined method (e.g. euclidean distance).

```r
distShiny(x, method = "euclidean")
```

| Argument | Description |
|---|---|
| x | numerical matrix or data.frame |
| method | character, distance method (e.g. "euclidean") |

Internal use in shinyQC.
Value: matrix
```r
x <- matrix(seq_len(100), nrow = 10, ncol = 10,
    dimnames = list(seq_len(10), paste("sample", seq_len(10))))
distShiny(x = x)
```
The function driftPlot aggregates the (count/intensity) values from
the assay() slot of a SummarizedExperiment by the median
or sum of the (count/intensity) values. driftPlot then
visualizes these aggregated values and adds a trend line (using either
LOESS or a linear model) from (a subset of) the aggregated values. The
subset is specified by the arguments category and level.
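The aggregation and trend-fitting steps can be sketched in base R. This sketch assumes the samples are already in the desired order (driftPlot itself orders them via orderCategory and draws the result with ggplot2) and mimics category = "type" with level = "1"; it is an illustration, not the package's internal code.

```r
set.seed(1)
a <- matrix(rnorm(1000), nrow = 10, ncol = 100)
type <- rep(c("1", "2"), each = 50)

## aggregate each sample (column) by its median, ignoring missing values
## (aggregation = "sum" would correspond to colSums(a, na.rm = TRUE))
agg <- apply(a, 2, stats::median, na.rm = TRUE)
df <- data.frame(x = seq_along(agg), y = agg, type = type)

## fit the trend on the chosen subset only:
## LOESS (method = "loess") or a linear model (method = "lm")
df_sub <- df[df$type == "1", ]
fit_loess <- stats::loess(y ~ x, data = df_sub)
fit_lm <- stats::lm(y ~ x, data = df_sub)

## trend values for the samples of the subset
trend <- stats::predict(fit_loess, newdata = df_sub)
```

All 100 samples are aggregated, but only the 50 samples of level "1" inform the trend, which is how a drift in, e.g., quality control samples can be judged against the full run.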
```r
driftPlot(
    se,
    aggregation = c("median", "sum"),
    category = colnames(colData(se)),
    orderCategory = colnames(colData(se)),
    level = c("all", unique(colData(se)[, category])),
    method = c("loess", "lm")
)
```

| Argument | Description |
|---|---|
| se | SummarizedExperiment |
| aggregation | character, "median" or "sum" |
| category | character, name of the column in colData(se) that defines the subset |
| orderCategory | character, name of the column in colData(se) used to order the samples |
| level | character, "all" or a level of category; the trend line is fitted on the samples of this level |
| method | character, "loess" or "lm" |

The x-values are sorted according to the orderCategory argument: the
levels of the corresponding column in colData(se) are pasted with the
sample names (in the column names) and factorized.
Internal usage in shinyQC.
Value: gg object from ggplot2
```r
## create se
set.seed(1)
a <- matrix(rnorm(1000), nrow = 10, ncol = 100,
    dimnames = list(seq_len(10), paste("sample", seq_len(100))))
a[c(1, 5, 8), seq_len(5)] <- NA
cD <- data.frame(name = colnames(a), type = c(rep("1", 50), rep("2", 50)))
rD <- data.frame(spectra = rownames(a))
se <- SummarizedExperiment::SummarizedExperiment(assay = a,
    rowData = rD, colData = cD)
driftPlot(se, aggregation = "sum", category = "type",
    orderCategory = "type", level = "1", method = "loess")
```
The function ECDF creates a plot of the empirical cumulative
distribution function of a specified sample and an outgroup (reference).
The reference is specified by the group argument. The row-wise
(feature) mean values of the reference are calculated after excluding
the specified sample.
```r
ECDF(se, sample = colnames(se), group = c("all", colnames(colData(se))))
```

| Argument | Description |
|---|---|
| se | SummarizedExperiment |
| sample | character, name of the sample (one of colnames(se)) |
| group | character, "all" or the name of a column in colData(se) that defines the reference |

Internal use in shinyQC.
The function ECDF uses the ks.test function from stats
to perform a two-sample Kolmogorov-Smirnov test. The Kolmogorov-Smirnov
test is run with the alternative "two.sided"
(the null hypothesis is that the true distribution function of the
sample is equal to the hypothesized distribution function of the
group).
The exact argument in ks.test is set to NULL, meaning
that an exact p-value is computed if the product of the sample sizes of
sample and group is less than 10000. Otherwise, asymptotic
distributions are used, whose approximations might be inaccurate for small
sample sizes.
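The construction of the reference and the test itself can be reproduced with base R. This sketch assumes group = "all" (all other samples form the reference), works on a plain matrix instead of a SummarizedExperiment, and omits the plotting.

```r
set.seed(1)
a <- matrix(rnorm(1000), nrow = 100, ncol = 10,
    dimnames = list(NULL, paste("sample", seq_len(10))))

## values of the sample of interest
sample_vals <- a[, "sample 1"]
## reference: row-wise (feature) means after excluding that sample
ref_vals <- rowMeans(a[, colnames(a) != "sample 1"], na.rm = TRUE)

## empirical cumulative distribution functions of sample and reference
ecdf_sample <- stats::ecdf(sample_vals)
ecdf_ref <- stats::ecdf(ref_vals)

## two-sided two-sample Kolmogorov-Smirnov test; with exact = NULL and
## 100 * 100 = 10000 (not below 10000), the asymptotic distribution is used
ks <- stats::ks.test(sample_vals, ref_vals, alternative = "two.sided",
    exact = NULL)
ks$statistic
```

The test statistic D is the largest vertical gap between the two ECDF curves, which is exactly what the plot makes visible.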
Value: gg object from ggplot2
```r
library(SummarizedExperiment)

## create se
set.seed(1)
a <- matrix(rnorm(1000), nrow = 100, ncol = 10,
    dimnames = list(seq_len(100), paste("sample", seq_len(10))))
a[c(1, 5, 8), seq_len(5)] <- NA
cD <- data.frame(name = colnames(a), type = c(rep("1", 5), rep("2", 5)))
rD <- data.frame(spectra = rownames(a))
se <- SummarizedExperiment(assay = a, rowData = rD, colData = cD)
ECDF(se, sample = "sample 1", group = "all")
```
The function explVar calculates the proportion of explained variance
for each principal component (PC, type = "PCA") and axis
(type = "PCoA").
```r
explVar(d, type = c("PCA", "PCoA"))
```

| Argument | Description |
|---|---|
| d | object returned by prcomp (type = "PCA") or cmdscale (type = "PCoA"), e.g. the second entry returned by dimensionReduction |
| type | character, "PCA" or "PCoA" |

explVar uses the function prcomp from the stats package
to retrieve the explained standard deviation per PC
(type = "PCA") and the function cmdscale from the stats
package to retrieve the explained variation based on eigenvalues per
axis (type = "PCoA").
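For orientation, the same kind of proportions can be derived directly from prcomp and cmdscale. This sketch assumes samples are the rows of t(x) and normalises the PCoA eigenvalues by their sum; both conventions are assumptions and may differ in detail from what explVar does internally.

```r
set.seed(1)
x <- matrix(seq_len(100), nrow = 10, ncol = 10) + rnorm(100)

## PCA: the variance of each PC is its squared standard deviation (sdev^2)
pca <- stats::prcomp(t(x), center = TRUE, scale. = TRUE)
var_pca <- pca$sdev^2 / sum(pca$sdev^2)

## PCoA: cmdscale returns all eigenvalues when eig = TRUE
pcoa <- stats::cmdscale(stats::dist(t(x), method = "euclidean"), k = 2,
    eig = TRUE)
var_pcoa <- pcoa$eig / sum(pcoa$eig)

round(var_pca, 3)
```

Because prcomp returns the standard deviations in decreasing order, the proportions are already sorted from the first to the last PC.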
Value: numeric vector with the proportion of explained variance
for each PC or axis
Author: Thomas Naake
```r
x <- matrix(seq_len(100), nrow = 10, ncol = 10,
    dimnames = list(seq_len(10), paste("sample", seq_len(10))))
set.seed(1)
x <- x + rnorm(100)

## run for PCA
pca <- dimensionReduction(x = x, params = list(center = TRUE, scale = TRUE),
    type = "PCA")[[2]]
explVar(d = pca, type = "PCA")

## run for PCoA
pcoa <- dimensionReduction(x = x, params = list(method = "euclidean"),
    type = "PCoA")[[2]]
explVar(d = pcoa, type = "PCoA")
```
The function extractComb extracts the features that match a
combination, depending on whether the features were measured or missing.
The function returns the sets that match the combination;
thus, the function might be useful when answering questions about which
features are measured/missing under certain combinations (e.g. sample
types or experimental conditions).
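```r
extractComb(se, combination, measured = TRUE, category = "type")
```

| Argument | Description |
|---|---|
| se | SummarizedExperiment |
| combination | character, the combination of sets (levels of category) to extract, e.g. "2" |
| measured | logical; whether features are considered as measured (TRUE) or missing (FALSE) |
| category | character, name of the column in colData(se) that defines the sets |

The function extractComb uses the make_comb_mat function from the
ComplexHeatmap package.
Presence is defined by a feature being measured in at least one sample of a set.
Absence is defined by a feature with only missing values (i.e. no measured values) in a set.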
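The presence/absence rule can be illustrated in base R. The sketch below is a simplified stand-in for extractComb: it works on a plain matrix, and it assumes that a combination means "present in exactly these sets and absent from all others", which we take to correspond to the default distinct mode of make_comb_mat.

```r
## toy data: features 1, 5 and 8 are missing in all samples of type "1"
a <- matrix(seq_len(100), nrow = 10, ncol = 10,
    dimnames = list(paste("feature", seq_len(10)),
        paste("sample", seq_len(10))))
a[c(1, 5, 8), seq_len(5)] <- NA
type <- rep(c("1", "2"), each = 5)

## presence per set: measured in at least one sample of that set
present <- sapply(split(seq_len(ncol(a)), type), function(idx)
    rowSums(!is.na(a[, idx, drop = FALSE])) > 0)

## combination "2": present in set "2" and absent (only NA) in set "1"
comb_2 <- rownames(present)[present[, "2"] & !present[, "1"]]
comb_2
```

The result lists the features that are measured only in samples of type "2", mirroring the question extractComb(se, combination = "2", measured = TRUE, category = "type") is meant to answer.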
Value: character
```r
## create se
a <- matrix(seq_len(100), nrow = 10, ncol = 10,
    dimnames = list(seq_len(10), paste("sample", seq_len(10))))
a[c(1, 5, 8), seq_len(5)] <- NA
set.seed(1)
a <- a + rnorm(100)
cD <- data.frame(name = colnames(a), type = c(rep("1", 5), rep("2", 5)))
rD <- data.frame(spectra = rownames(a))
se <- SummarizedExperiment::SummarizedExperiment(assay = a,
    rowData = rD, colData = cD)
extractComb(se, combination = "2", measured = TRUE, category = "type")
```
The function featurePlot creates a plot of (count/intensity) values
for different data processing steps (referring to columns in the
data.frame) over the different samples (referring to rows in
the data.frame).
```r
featurePlot(df)
```

| Argument | Description |
|---|---|
| df | data.frame as returned by createDfFeature |

Internal usage in shinyQC.
Value: gg object from ggplot2
```r
set.seed(1)
x1 <- matrix(rnorm(100), ncol = 10, nrow = 10,
    dimnames = list(paste("feature", seq_len(10)),
        paste("sample", seq_len(10))))
x2 <- x1 + 5
x3 <- x2 + 10
l <- list(x1 = x1, x2 = x2, x3 = x3)
df <- createDfFeature(l, "feature 1")
featurePlot(df)
```
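hist_sample plots the number of samples per level of a category (e.g. sample types)
as a histogram. It uses the tbl returned by hist_sample_num.

```r
hist_sample(tbl, category = "type")
```

| Argument | Description |
|---|---|
| tbl | tbl as returned by hist_sample_num |
| category | character, name of the category (default "type") |

Value: gg object from ggplot2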
```r
## create se
a <- matrix(seq_len(100), nrow = 10, ncol = 10,
    dimnames = list(seq_len(10), paste("sample", seq_len(10))))
a[c(1, 5, 8), seq_len(5)] <- NA
set.seed(1)
a <- a + rnorm(100)
cD <- data.frame(name = colnames(a), type = c(rep("1", 4), rep("2", 6)))
rD <- data.frame(spectra = rownames(a))
se <- SummarizedExperiment::SummarizedExperiment(assay = a,
    rowData = rD, colData = cD)
tbl <- hist_sample_num(se, category = "type")
hist_sample(tbl)
```
hist_sample_num returns the number of samples per level of a category
(e.g. sample types) as a tbl.
The function first retrieves the column category in colData(se)
and returns a tbl containing the number of samples for each level
of that column.
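The counting step amounts to tabulating the chosen colData column. A base-R sketch, assuming category = "type" and returning a data.frame rather than a tbl:

```r
## values of colData(se)$type as in the hist_sample_num example
type <- c(rep("1", 4), rep("2", 6))

## number of samples per level of the category
counts <- as.data.frame(table(type), stringsAsFactors = FALSE)
counts
```

Each row of counts holds one level of the category and its number of samples (Freq), which is the information hist_sample then draws as bars.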
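```r
hist_sample_num(se, category = "type")
```

| Argument | Description |
|---|---|
| se | SummarizedExperiment |
| category | character, name of the column in colData(se) (default "type") |

Value: tbl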
```r
## create se
a <- matrix(seq_len(100), nrow = 10, ncol = 10,
    dimnames = list(seq_len(10), paste("sample", seq_len(10))))
a[c(1, 5, 8), seq_len(5)] <- NA
set.seed(1)
a <- a + rnorm(100)
cD <- data.frame(name = colnames(a), type = c(rep("1", 4), rep("2", 6)))
rD <- data.frame(spectra = rownames(a))
se <- SummarizedExperiment::SummarizedExperiment(assay = a,
    rowData = rD, colData = cD)
hist_sample_num(se, category = "type")
```
The function histFeature creates a histogram with the number
of measured/missing values per feature.
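The quantity behind this histogram is a per-feature count of non-missing (or missing) entries. A base-R sketch on a small matrix with features as rows, assuming that binwidth = 1 corresponds to bins centred on the integers:

```r
## 3 features (rows) x 6 samples (columns)
x <- matrix(c(c(1, 1, 1), c(1, NA, 1), c(1, NA, 1),
    c(1, 1, 1), c(NA, 1, 1), c(NA, 1, 1)), byrow = FALSE, nrow = 3)

## number of measured (non-NA) and missing (NA) values per feature
n_measured <- rowSums(!is.na(x))
n_missing <- rowSums(is.na(x))

## histogram counts with bins of width 1 centred on 0, 1, ..., ncol(x)
h <- graphics::hist(n_measured, breaks = seq(-0.5, ncol(x) + 0.5, by = 1),
    plot = FALSE)
h$counts
```

Two features are measured in four samples and one feature in all six, so only the bins for 4 and 6 are non-empty.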
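```r
histFeature(x, measured = TRUE, ...)
```

| Argument | Description |
|---|---|
| x | matrix |
| measured | logical; if TRUE, the number of measured values per feature is shown, if FALSE, the number of missing values |
| ... | additional parameters passed to |

Value: plotly object from ggplotly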
```r
x <- matrix(c(c(1, 1, 1), c(1, NA, 1), c(1, NA, 1),
    c(1, 1, 1), c(NA, 1, 1), c(NA, 1, 1)), byrow = FALSE, nrow = 3)
colnames(x) <- c("A_1", "A_2", "A_3", "B_1", "B_2", "B_3")
histFeature(x, binwidth = 1)
```
The function histFeatureCategory creates histogram
plots for each sample type in se.
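```r
histFeatureCategory(se, measured = TRUE, category = "type", ...)
```

| Argument | Description |
|---|---|
| se | SummarizedExperiment |
| measured | logical; if TRUE, the number of measured values is shown, if FALSE, the number of missing values |
| category | character, name of the column in colData(se) that defines the sample types |
| ... | additional parameters passed to |

Value: plotly object from ggplotly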
```r
## create se
a <- matrix(seq_len(100), nrow = 10, ncol = 10,
    dimnames = list(seq_len(10), paste("sample", seq_len(10))))
a[c(1, 5, 8), seq_len(5)] <- NA
set.seed(1)
a <- a + rnorm(100)
cD <- data.frame(name = colnames(a), type = c(rep("1", 5), rep("2", 5)))
rD <- data.frame(spectra = rownames(a))
se <- SummarizedExperiment::SummarizedExperiment(assay = a,
    rowData = rD, colData = cD)
histFeatureCategory(se, measured = TRUE, category = "type")
```
The function hoeffDPlot uses ggplot to create a violin plot per
factor and a jitter plot of the data points, and (optionally) connects the
points with lines. hoeffDPlot uses the plotly package to make the
figure interactive.

```r
hoeffDPlot(df, lines = TRUE)
```

| Argument | Description |
|---|---|
| df | data.frame with one column of Hoeffding's D values per processing step (e.g. raw, normalized) |
| lines | logical; if TRUE, the points are connected by lines |

The function hoeffDPlot creates the violin plot and jitter plot
according to the order given by the colnames of df.
hoeffDPlot thus internally refactors the colnames of the
supplied data.frame according to their order.
Value: gg object from ggplot2
```r
library(SummarizedExperiment)

## create se
set.seed(1)
a <- matrix(rnorm(10000), nrow = 1000, ncol = 10,
    dimnames = list(seq_len(1000), paste("sample", seq_len(10))))
a[c(1, 5, 8), seq_len(5)] <- NA
cD <- data.frame(name = colnames(a), type = c(rep("1", 5), rep("2", 5)))
rD <- data.frame(spectra = rownames(a))
se <- SummarizedExperiment::SummarizedExperiment(assay = a,
    rowData = rD, colData = cD)

tbl <- MAvalues(se, log = FALSE, group = "all")
hd_r <- hoeffDValues(tbl, "raw")

## normalized values
se_n <- se
assay(se_n) <- normalizeAssay(a, "sum")
tbl_n <- MAvalues(se_n, log = FALSE, group = "all")
hd_n <- hoeffDValues(tbl_n, "normalized")

df <- data.frame(raw = hd_r, normalized = hd_n)
hoeffDPlot(df, lines = TRUE)
hoeffDPlot(df, lines = FALSE)
```
The function hoeffDValues creates and returns Hoeffding's D statistics from MA values.
If sample_n is set to a numerical value (e.g. 10000), a
random subset of sample_n features is taken to calculate Hoeffding's D
values to speed up the calculation. If there are fewer features
than sample_n, all features are taken.
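The subsetting rule can be written as a small helper. subsetFeatures is a hypothetical name used only for illustration; according to the text, the package then computes Hoeffding's D with Hmisc::hoeffd on the selected features, which this sketch leaves out.

```r
## Hypothetical helper: indices of the features (rows) used for Hoeffding's D
subsetFeatures <- function(n_features, sample_n = NULL) {
    ## no subsetting requested, or not more features than sample_n: use all
    if (is.null(sample_n) || n_features <= sample_n) {
        return(seq_len(n_features))
    }
    ## otherwise draw sample_n features at random, without replacement
    sort(sample(seq_len(n_features), sample_n))
}

set.seed(1)
idx <- subsetFeatures(n_features = 25000, sample_n = 10000)
length(idx)
```

Sorting the drawn indices is only a convenience that keeps the features in their original order; it does not change which features are used.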
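```r
hoeffDValues(tbl, name = "raw", sample_n = NULL)
```

| Argument | Description |
|---|---|
| tbl | tbl as returned by MAvalues |
| name | character, label of the values (e.g. "raw", "normalized") |
| sample_n | NULL or numeric, number of features randomly sampled for the calculation |

The function uses the function hoeffd from the Hmisc package to
calculate the values.
Value: named list with Hoeffding's D values per sample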
```r
library(SummarizedExperiment)

## create se
a <- matrix(seq_len(100), nrow = 10, ncol = 10,
    dimnames = list(seq_len(10), paste("sample", seq_len(10))))
a[c(1, 5, 8), seq_len(5)] <- NA
set.seed(1)
a <- a + rnorm(100)
cD <- data.frame(name = colnames(a), type = c(rep("1", 5), rep("2", 5)))
rD <- data.frame(spectra = rownames(a))
se <- SummarizedExperiment::SummarizedExperiment(assay = a,
    rowData = rD, colData = cD)

tbl <- MAvalues(se)
hoeffDValues(tbl, "raw")

## normalized values
se_n <- se
assay(se_n) <- normalizeAssay(a, "sum")
tbl_n <- MAvalues(se_n, group = "all")
hoeffDValues(tbl_n, "normalized")

## transformed values
se_t <- se
assay(se_t) <- transformAssay(a, "log")
tbl_t <- MAvalues(se_t, group = "all")
hoeffDValues(tbl_t, "transformed")
```
matrix
The function imputeAssay imputes missing values based on one of the
following principles: Bayesian missing value imputation (BPCA),
k-nearest neighbor averaging (kNN), maximum likelihood-based
imputation using the EM algorithm (MLE), replacement by
the smallest non-missing value
in the data (Min), replacement by the minimal value observed as
the q-th quantile (MinDet, default q = 0.01), and replacement
by random draws from a Gaussian distribution centred on a minimal value
(MinProb).
imputeAssay(a, method = c("BPCA", "kNN", "MLE", "Min", "MinDet", "MinProb", "none"))
Arguments: a, method
BPCA: wrapper for pcaMethods::pca with method = "bpca".
BPCA is a missing at random (MAR) imputation method.
kNN: wrapper for impute::impute.knn with k = 10,
rowmax = 0.5, colmax = 0.5, maxp = 1500. kNN
is a MAR imputation method.
MLE: wrapper for imputeLCMD::impute.MAR with
method = "MLE",
model.selector = 1/imputeLCMD::impute.wrapper.MLE.
MLE is a MAR imputation method.
Min: imputes the missing values by the observed minimal value of
a. Min is a missing not at random (MNAR) imputation method.
MinDet: wrapper for imputeLCMD::impute.MinDet with
q = 0.01. MinDet performs the imputation using a
deterministic minimal value approach: the missing entries are
replaced with a minimal value, estimated from the q-th quantile
of each sample. MinDet is an MNAR imputation method.
MinProb: wrapper for imputeLCMD::impute.MinProb with
q = 0.01 and tune.sigma = 1. MinProb performs the
imputation based on random draws from a Gaussian distribution with the
mean set to the minimal value of a sample. MinProb is an
MNAR imputation method.
none: does not impute values (not available within the shiny
application).
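The two deterministic MNAR strategies, Min and MinDet, can be sketched in base R. These are hypothetical helpers (`impute_min_sketch`, `impute_mindet_sketch`), not the imputeLCMD or MatrixQCvis code. The MinDet variant assumes that "q-th quantile of each sample" means the quantile of the observed values in each column.

```r
# Hypothetical sketches; not the imputeLCMD / MatrixQCvis implementations.
impute_min_sketch <- function(a) {
  # "Min": replace every NA by the smallest observed value in the matrix
  a[is.na(a)] <- min(a, na.rm = TRUE)
  a
}

impute_mindet_sketch <- function(a, q = 0.01) {
  # "MinDet"-like: per sample (column), replace NA by the q-th quantile
  # of that sample's observed values
  for (j in seq_len(ncol(a))) {
    miss <- is.na(a[, j])
    if (any(miss) && any(!miss)) {
      a[miss, j] <- stats::quantile(a[, j], probs = q, na.rm = TRUE, names = FALSE)
    }
  }
  a
}

a <- matrix(seq_len(100), nrow = 10, ncol = 10)
a[c(1, 5, 8), seq_len(5)] <- NA
a_min <- impute_min_sketch(a)
a_det <- impute_mindet_sketch(a)
a_det[c(1, 5, 8), 1]
```

Both rules fill missing entries with low values, which matches the MNAR assumption that values are missing because they fell below the detection limit.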
matrix
a <- matrix(seq_len(100), nrow = 10, ncol = 10,
    dimnames = list(seq_len(10), paste("sample", seq_len(10))))
a[c(1, 5, 8), seq_len(5)] <- NA
imputeAssay(a, method = "kNN")
imputeAssay(a, method = "Min")
imputeAssay(a, method = "MinDet")
imputeAssay(a, method = "MinProb")
The function creates a 2D histogram of M and A values.
MAplot(tbl, group = c("all", colnames(tbl)), plot = c("all", unique(tbl[["name"]])))
Arguments: tbl, group, plot
MAplot returns a 2D hex histogram instead of a classical scatterplot
for computational reasons and for a better visualization of overlapping points.
The argument plot specifies the sample (referring to
colData(se)$name) to be plotted. If plot = "all", MA values
for all samples will be plotted (samples will be plotted in facets).
If the number of features (tbl$Features) is below 1000, points will be
plotted (via geom_point); otherwise hexagons will be plotted
(via geom_hex).
gg object from ggplot2
## create se
set.seed(1)
a <- matrix(rnorm(10000), nrow = 1000, ncol = 10,
    dimnames = list(seq_len(1000), paste("sample", seq_len(10))))
a[c(1, 5, 8), seq_len(5)] <- NA
cD <- data.frame(name = colnames(a), type = c(rep("1", 5), rep("2", 5)))
rD <- data.frame(spectra = rownames(a))
se <- SummarizedExperiment::SummarizedExperiment(assay = a, rowData = rD, colData = cD)
tbl <- MAvalues(se, log2 = FALSE, group = "all")
MAplot(tbl, group = "all", plot = "all")
The function MAvalues creates MA values as input for the functions
MAplot and hoeffDValues.
M and A values are calculated relative to reference samples, which
are determined by the group argument. If group = "all",
all samples except the specified one are used for the reference
calculation. If group is set to one of colnames(colData(se)), the
samples belonging to the same group in that column, except the
specified one, are used.
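The reference logic above can be sketched for a single sample in base R. This is a hypothetical helper (`ma_sketch`), not the MatrixQCvis code. It assumes the reference is the row-wise mean of the reference samples on the log2 scale, with M = sample minus reference and A = the mean of the two. Passing `ref` explicitly mimics restricting the reference to members of one group.

```r
# Hypothetical sketch of MA values for one sample; not the MatrixQCvis code.
# Assumptions: reference = row mean of the reference samples (log2 scale),
# M = sample - reference, A = (sample + reference) / 2.
ma_sketch <- function(x, sample, ref = setdiff(seq_len(ncol(x)), sample),
                      log2_transform = TRUE) {
  if (log2_transform) x <- log2(x)
  s <- x[, sample]
  r <- rowMeans(x[, ref, drop = FALSE], na.rm = TRUE)
  data.frame(Feature = rownames(x), M = s - r, A = (s + r) / 2)
}

x <- matrix(2^(1:12), nrow = 4, ncol = 3,
            dimnames = list(paste0("f", 1:4), paste0("s", 1:3)))
ma_all <- ma_sketch(x, sample = 1)           # all other samples as reference
ma_grp <- ma_sketch(x, sample = 1, ref = 2)  # reference restricted to a "group"
ma_all
```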
MAvalues(se, log2 = TRUE, group = c("all", colnames(colData(se))))
Arguments: se, log2, group
tbl with columns Feature, name (sample name),
A, M and additional columns of colData(se)
## create se
library(SummarizedExperiment)
set.seed(1)
a <- matrix(rnorm(10000), nrow = 1000, ncol = 10,
    dimnames = list(seq_len(1000), paste("sample", seq_len(10))))
a[c(1, 5, 8), seq_len(5)] <- NA
cD <- data.frame(name = colnames(a), type = c(rep("1", 5), rep("2", 5)))
rD <- data.frame(spectra = rownames(a))
se <- SummarizedExperiment(assay = a, rowData = rD, colData = cD)
MAvalues(se, log2 = FALSE, group = "all")
The function measuredCategory creates a tbl with
the number of measured values per feature. A value of 0 means that there
were only missing values (NA) for the feature and sample type.
measuredCategory returns a tbl where the columns are the
unique sample types and the rows are the features as in assay(se).
measuredCategory(se, measured = TRUE, category = "type")
Arguments: se, measured, category
measuredCategory is a helper function.
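The counting behind measuredCategory can be sketched in base R. This is a hypothetical helper (`measured_category_sketch`), not the package function. Unlike the package, which takes a SummarizedExperiment and a colData column name, it takes the matrix and the category vector directly.

```r
# Hypothetical sketch; takes a matrix and a category vector (an assumption),
# not a SummarizedExperiment and a colData column name.
measured_category_sketch <- function(a, category, measured = TRUE) {
  # logical matrix: TRUE where a value is measured (or missing if measured = FALSE)
  hit <- if (measured) !is.na(a) else is.na(a)
  # column indices of the samples belonging to each category level
  groups <- split(seq_len(ncol(a)), category)
  # one column per category: number of hits per feature within that category
  do.call(cbind, lapply(groups, function(idx) rowSums(hit[, idx, drop = FALSE])))
}

a <- matrix(seq_len(100), nrow = 10, ncol = 10,
            dimnames = list(seq_len(10), paste("sample", seq_len(10))))
a[c(1, 5, 8), seq_len(5)] <- NA
type <- c(rep("1", 5), rep("2", 5))
res <- measured_category_sketch(a, type)
res_missing <- measured_category_sketch(a, type, measured = FALSE)
res
```

Rows 1, 5 and 8 show 0 in category "1", meaning those features were never measured in that sample type.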
matrix with number of measured/missing features per
category type
## create se
set.seed(1)
a <- matrix(rnorm(100), nrow = 10, ncol = 10,
    dimnames = list(seq_len(10), paste("sample", seq_len(10))))
a[c(1, 5, 8), seq_len(5)] <- NA
cD <- data.frame(name = colnames(a), type = c(rep("1", 5), rep("2", 5)))
rD <- data.frame(spectra = rownames(a))
se <- SummarizedExperiment::SummarizedExperiment(assay = a, rowData = rD, colData = cD)
measuredCategory(se, measured = TRUE, category = "type")
The function mosaic creates a mosaic plot of two factors from
a SummarizedExperiment object. The columns f1 and f2
are taken from colData(se).
mosaic(se, f1, f2)
Arguments: se, f1, f2
Code partly taken from https://stackoverflow.com/questions/21588096/pass-string-to-facet-grid-ggplot2
gg object from ggplot2
## create se
set.seed(1)
a <- matrix(rnorm(100), nrow = 10, ncol = 10,
    dimnames = list(seq_len(10), paste("sample", seq_len(10))))
a[c(1, 5, 8), seq_len(5)] <- NA
cD <- data.frame(name = colnames(a), type = c(rep("1", 5), rep("2", 5)),
    cell_type = c("A", "B"))
rD <- data.frame(spectra = rownames(a))
se <- SummarizedExperiment::SummarizedExperiment(assay = a, rowData = rD, colData = cD)
mosaic(se, "cell_type", "type")
The function normalizeAssay performs normalization by the sum of the
(count/intensity) values per sample (method = "sum"), by quantile
division per sample (method = "quantile division"),
or by quantile normalization (method = "quantile"), which adjusts the
value distributions so that they become identical in statistical properties.
The value for quantile division (e.g., the 75% quantile) is
specified by the probs argument. Quantile normalization is
performed using the normalizeQuantiles function from limma.
For the methods "sum" and "quantile division", normalization
depends on the multiplyByNormalizationValue parameter.
If set to TRUE, normalization values (e.g. sum or quantile) are
calculated per sample. Next, adjusted normalization values are
calculated for each sample relative to the median normalization
value across all samples. Finally, the values in a are
multiplied by these adjusted normalization values.
If multiplyByNormalizationValue is set to FALSE,
normalization values (e.g. sum or quantile) are
calculated per sample, and the values in a are divided sample-wise by
the normalization values.
normalizeAssay(a, method = c("none", "sum", "quantile division", "quantile"), probs = 0.75, multiplyByNormalizationValue = FALSE)
Arguments: a, method, probs, multiplyByNormalizationValue
Internal usage in shinyQC. If method is set to "none",
the object a is returned as is (pass-through).
If probs is NULL, probs is internally set to 0.75 if
method = "quantile division".
Depending on the values in a, if multiplyByNormalizationValue
is set to TRUE, the returned normalized values will be of the same
order of magnitude as the original values, while if FALSE, the
returned values will be of a smaller order of magnitude.
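The two modes for "sum" and "quantile division" can be sketched in base R. This is a hypothetical helper (`normalize_sketch`), not the package code. It assumes the "adjusted normalization value" of a sample is median(normalization values) / (that sample's normalization value), which is consistent with the description above but not confirmed by it.

```r
# Hypothetical sketch; not the MatrixQCvis implementation.
normalize_sketch <- function(a, method = c("sum", "quantile division"),
                             probs = 0.75, multiply = FALSE) {
  method <- match.arg(method)
  # one normalization value per sample (column)
  norm_vals <- if (method == "sum") {
    colSums(a, na.rm = TRUE)
  } else {
    apply(a, 2, stats::quantile, probs = probs, na.rm = TRUE)
  }
  if (multiply) {
    # assumed adjustment: rescale every sample towards the median normalization value
    sweep(a, 2, stats::median(norm_vals) / norm_vals, `*`)
  } else {
    # plain division by the per-sample normalization value
    sweep(a, 2, norm_vals, `/`)
  }
}

a <- matrix(seq_len(100), nrow = 10, ncol = 10)
colSums(normalize_sketch(a))
colSums(normalize_sketch(a, multiply = TRUE))
```

With division, every column sums to 1. With the multiplicative variant, every column sums to the median column sum (505 here), so the values stay on the original order of magnitude.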
matrix
a <- matrix(seq_len(100), nrow = 10, ncol = 10,
    dimnames = list(seq_len(10), paste("sample", seq_len(10))))
normalizeAssay(a, "sum")
The function permuteExplVar determines the explained variance of the
permuted expression matrix (x). It is used to determine the optimal
number of PCs for tSNE.
permuteExplVar(x, n = 10, center = TRUE, scale = TRUE, sample_n = NULL)
Arguments: x, n, center, scale, sample_n
For the input of tSNE, the initial number of dimensions is typically first
reduced linearly with PCA (used as the initial_dims argument of
the Rtsne function), and the reduced data set is fed
into tSNE. By plotting the percentage of variance explained by the principal
components (PCs), we can estimate how many PCs to keep as input for tSNE.
However, if too many PCs are selected, noise will be included in the input to tSNE;
if too few PCs are selected, important data structures might be lost.
To get a better understanding of how many PCs to include, randomization
is employed and the observed variance is compared to the permuted
variance.
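The randomization idea can be sketched in base R with stats::prcomp. The helpers (`expl_var_sketch`, `perm_expl_var_sketch`) are hypothetical, not the package code. The sketch assumes samples (columns of x) are the PCA observations and permutes each feature's values independently across samples, which destroys between-feature correlation while keeping each feature's distribution.

```r
# Hypothetical sketch; not the MatrixQCvis implementation.
# Percentage of variance explained per PC, samples (columns) as observations.
expl_var_sketch <- function(x, center = TRUE, scale = TRUE) {
  pca <- stats::prcomp(t(x), center = center, scale. = scale)
  100 * pca$sdev^2 / sum(pca$sdev^2)
}

# Repeat n times: permute every feature (row) across samples, redo the PCA.
perm_expl_var_sketch <- function(x, n = 10, center = TRUE, scale = TRUE) {
  n_pc <- min(dim(x))
  vapply(seq_len(n), function(i) {
    x_perm <- t(apply(x, 1, sample))
    expl_var_sketch(x_perm, center, scale)
  }, numeric(n_pc))
}

set.seed(1)
x <- matrix(rnorm(200), nrow = 20, ncol = 10)
var_x <- expl_var_sketch(x)
var_perm <- perm_expl_var_sketch(x, n = 5)   # PCs x permutations
cbind(observed = var_x[1:3], permuted_mean = rowMeans(var_perm)[1:3])
```

PCs whose observed variance clearly exceeds the permuted values are candidates to keep as initial_dims.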
matrix with explained variance
Thomas Naake
x <- matrix(seq_len(100), nrow = 10, ncol = 10,
    dimnames = list(seq_len(10), paste("sample", seq_len(10))))
permuteExplVar(x = x, n = 10, center = TRUE, scale = TRUE, sample_n = NULL)
The function plotCV displays the coefficient of variation values of a
set of values supplied in a data.frame object. The function
creates a plot using the ggplot2 package and shows the values
of the different columns in different colors.
plotCV(df)
Arguments: df
Internal usage in shinyQC.
gg object from ggplot2
x1 <- matrix(seq_len(10), ncol = 2)
x2 <- matrix(seq(11, 20), ncol = 2)
x3 <- matrix(seq(21, 30), ncol = 2)
x4 <- matrix(seq(31, 40), ncol = 2)
## calculate cv values
cv1 <- cv(x1, "x1")
cv2 <- cv(x2, "x2")
cv3 <- cv(x3, "x3")
cv4 <- cv(x4, "x4")
df <- data.frame(cv1, cv2, cv3, cv4)
plotCV(df)
The function plotPCALoadings creates a loadings plot of the features.
plotPCALoadings(tbl, x_coord, y_coord)
Arguments: tbl, x_coord, y_coord
The function takes as input the output of the function
tblPCALoadings. It uses the ggplotly function from
plotly to create an interactive plotly plot.
plotly
Thomas Naake
x <- matrix(rnorm(10000), ncol = 100)
rownames(x) <- paste("feature", seq_len(nrow(x)))
colnames(x) <- paste("sample", seq_len(ncol(x)))
params <- list(method = "euclidean", ## dist
    initial_dims = 10, max_iter = 100, dims = 3, perplexity = 3, ## tSNE
    min_dist = 0.1, n_neighbors = 15, spread = 1) ## UMAP
tbl <- tblPCALoadings(x, params)
plotPCALoadings(tbl, x_coord = "PC1", y_coord = "PC2")
The function plotPCAVar plots the explained variance (in %) on the
y-axis against the principal components for the measured and permuted values.
plotPCAVar(var_x, var_perm = NULL)
Arguments: var_x, var_perm
The argument var_perm is optional and visualization of permuted values
can be omitted by setting var_perm = NULL.
gg object from ggplot2
Thomas Naake
x <- matrix(seq_len(100), ncol = 10)
pca <- dimensionReduction(x = x, params = list(center = TRUE, scale = TRUE),
    type = "PCA")[[2]]
var_x <- explVar(d = pca, type = "PCA")
var_perm <- permuteExplVar(x = x, n = 100, center = TRUE, scale = TRUE)
plotPCAVar(var_x = var_x, var_perm = var_perm)
The function plotPCAVarPvalue plots the p-values for the significance of the
principal components. Using the visual output, the optimal number of
principal components can be selected.
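One common way to turn observed and permuted explained variances into per-PC p-values is an empirical permutation p-value. The sketch below uses that approach and is a hypothetical helper (`perm_pvalues_sketch`). The text does not state how MatrixQCvis computes its p-values, so treat the formula as an assumption. Here, var_x is a vector of observed values per PC and var_perm is a PCs x permutations matrix.

```r
# Hypothetical sketch of empirical permutation p-values; the exact formula
# used by MatrixQCvis is not stated in the text.
perm_pvalues_sketch <- function(var_x, var_perm) {
  n_perm <- ncol(var_perm)
  # count permutations whose explained variance reaches the observed value
  # (var_x is recycled along the rows of var_perm), with a +1 correction
  (rowSums(var_perm >= var_x) + 1) / (n_perm + 1)
}

var_x <- c(50, 30, 20)
var_perm <- matrix(c(40, 35, 25,
                     45, 30, 25,
                     60, 20, 20,
                     30, 40, 30), nrow = 3)   # columns are permutations
perm_pvalues_sketch(var_x, var_perm)
```

The +1 correction keeps p-values strictly positive when no permutation reaches the observed value.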
plotPCAVarPvalue(var_x, var_perm)
Arguments: var_x, var_perm
Internal usage in shinyQC.
gg object from ggplot2
Thomas Naake
x <- matrix(seq_len(100), ncol = 10)
pca <- dimensionReduction(x = x, params = list(center = TRUE, scale = TRUE),
    type = "PCA")[[2]]
var_x <- explVar(d = pca, type = "PCA")
var_perm <- permuteExplVar(x = x, n = 100, center = TRUE, scale = TRUE)
plotPCAVarPvalue(var_x = var_x, var_perm = var_perm)
samplesMeasuredMissing returns a tbl with
the number of measured/missing
features of samples. The function takes as input a
SummarizedExperiment object and accesses its assay() slot.
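The per-sample counts can be reproduced directly on the assay matrix in base R. This sketch is an illustration, not the package function. The column names name/measured/missing are chosen here for readability and are not necessarily those of the returned tbl.

```r
# Sketch of per-sample measured/missing counts on a plain matrix;
# the column names are illustrative, not necessarily the package's.
a <- matrix(seq_len(100), nrow = 10, ncol = 10,
            dimnames = list(seq_len(10), paste("sample", seq_len(10))))
a[c(1, 5, 8), seq_len(5)] <- NA
counts <- data.frame(
  name = colnames(a),
  measured = colSums(!is.na(a)),  # non-missing entries per sample
  missing = colSums(is.na(a)),    # NA entries per sample
  row.names = NULL
)
counts
```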
samplesMeasuredMissing(se)
Arguments: se
tbl with number of measured/missing features per sample
## create se
a <- matrix(seq_len(100), nrow = 10, ncol = 10,
    dimnames = list(seq_len(10), paste("sample", seq_len(10))))
a[c(1, 5, 8), seq_len(5)] <- NA
set.seed(1)
a <- a + rnorm(100)
sample <- data.frame(name = colnames(a), type = c(rep("1", 5), rep("2", 5)))
featData <- data.frame(spectra = rownames(a))
se <- SummarizedExperiment::SummarizedExperiment(assay = a,
    rowData = featData, colData = sample)
## create the data.frame with information on number of measured/missing
## values
samplesMeasuredMissing(se)
The shiny application allows users to explore -omics
data sets, especially with a focus on quality control. shinyQC gives
information on the type of samples included (if this was previously
specified within the SummarizedExperiment object). It gives
information on the number of missing and measured values across features
and across sets (e.g. quality control samples, control, and treatment
groups; only displayed for SummarizedExperiment objects that
contain missing values).
shinyQC includes functionality to display (count/intensity) values
across samples (to detect drifts in intensity values during the
measurement), and to display
mean-sd plots, MA plots, ECDF plots, and distance plots between samples.
shinyQC includes functionality to perform dimensionality reduction
(currently limited to PCA, PCoA, NMDS, tSNE, and UMAP). Additionally,
it includes functionality to perform differential expression analysis
(currently limited to moderated t-tests and the Wald test).
shinyQC(se, app_server = FALSE)
Arguments: se, app_server
rownames(se) should be set to the corresponding name of features,
while colnames(se) should be set to the sample IDs.
rownames(se) and colnames(se) are not allowed to be NULL.
colnames(se), colnames(assay(se)) and
rownames(colData(se)) all have to be identical.
shinyQC allows the user to subset the supplied SummarizedExperiment object.
On exit of the shiny application, the (subsetted) SummarizedExperiment
object is returned with information on the processing steps (normalization,
transformation, batch correction and imputation). The object will
only be returned if app_server = FALSE and if the function call is assigned
to an object, e.g. tmp <- shinyQC(se).
If the se argument is omitted, the app will load an interface that allows
for data upload.
shiny application,
SummarizedExperiment upon exiting the shiny application
Thomas Naake
library(dplyr)
library(SummarizedExperiment)
## create se
set.seed(1)
a <- matrix(rnorm(100, mean = 10, sd = 2), nrow = 10, ncol = 10,
    dimnames = list(seq_len(10), paste("sample", seq_len(10))))
a[c(1, 5, 8), seq_len(5)] <- NA
cD <- data.frame(name = colnames(a), type = c(rep("1", 5), rep("2", 5)))
rD <- data.frame(spectra = rownames(a))
se <- SummarizedExperiment(assay = a, rowData = rD, colData = cD)
shinyQC(se)
The function sumDistSample creates a plot showing, for each sample, the
sum of its distances to all other samples.
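The quantity being plotted can be sketched in base R. This sketch assumes Euclidean distances between samples (columns), computed with stats::dist on the transposed matrix. The distance measure used by distShiny is not stated here, so treat that choice as an assumption.

```r
# Sketch: sum of distances of each sample to all other samples.
# Assumption: Euclidean distance between columns (samples).
a <- matrix(seq_len(100), nrow = 10, ncol = 10,
            dimnames = list(seq_len(10), paste("sample", seq_len(10))))
d <- as.matrix(stats::dist(t(a), method = "euclidean"))
sum_dist <- rowSums(d)  # the diagonal is 0, so this sums over the other samples
sort(sum_dist, decreasing = TRUE)[1:3]
```

In this toy matrix, the outermost samples (sample 1 and sample 10) have the largest summed distances. In real data, a sample with an unusually large sum is a candidate outlier.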
sumDistSample(d, title = "raw")
Arguments: d, title
gg object from ggplot2
a <- matrix(seq_len(100), nrow = 10, ncol = 10,
    dimnames = list(seq_len(10), paste("sample", seq_len(10))))
dist <- distShiny(a)
sumDistSample(dist, title = "raw")
The function tblPCALoadings returns a tibble with loadings
values for the features (row entries) in x.
tblPCALoadings(x, params)
Arguments: x, params
The function tblPCALoadings accesses the list entry rotation
of the prcomp object.
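Where the loadings come from can be sketched with stats::prcomp directly. The sketch assumes samples (columns of x) are the observations, so the rows of `rotation` correspond to the features of x. The centering and scaling settings chosen here are assumptions, not taken from params.

```r
# Sketch: feature loadings from the `rotation` entry of a prcomp object.
# Assumption: samples are observations, so rotation rows = features.
set.seed(1)
x <- matrix(rnorm(1000), nrow = 50, ncol = 20,
            dimnames = list(paste("feature", 1:50), paste("sample", 1:20)))
pca <- stats::prcomp(t(x), center = TRUE, scale. = TRUE)
loadings <- data.frame(feature = rownames(pca$rotation), pca$rotation,
                       row.names = NULL)
head(loadings[, c("feature", "PC1", "PC2")])
```

The columns of `rotation` are orthonormal vectors, one per principal component, so each feature's PC1/PC2 values can be plotted directly as a loadings plot.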
tbl
Thomas Naake
set.seed(1)
x <- matrix(rnorm(10000), ncol = 100)
rownames(x) <- paste("feature", seq_len(nrow(x)))
colnames(x) <- paste("sample", seq_len(ncol(x)))
params <- list(method = "euclidean", ## dist
    initial_dims = 10, max_iter = 100, dims = 3, perplexity = 3, ## tSNE
    min_dist = 0.1, n_neighbors = 15, spread = 1) ## UMAP
tblPCALoadings(x, params)
data.frame,
tbl or matrix
The function transformAssay transforms the (count/intensity) values
of a matrix. It uses either log, log2, log10,
variance stabilizing normalisation (vsn), or no transformation
(pass-through, none). The object
a has the samples in the columns and the features in the rows.
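The log-type options can be sketched in base R. This is a hypothetical helper (`transform_sketch`), not the package function. It assumes `.offset` is added to every value before taking the logarithm, which avoids log(0) for zero counts. vsn is omitted because it requires the vsn package.

```r
# Hypothetical sketch; assumes .offset is added before the log transform.
transform_sketch <- function(a, method = c("none", "log", "log2", "log10"),
                             .offset = 1) {
  method <- match.arg(method)
  switch(method,
         none  = a,                     # pass-through
         log   = log(a + .offset),      # natural logarithm
         log2  = log2(a + .offset),
         log10 = log10(a + .offset))
}

m <- matrix(c(0, 1, 3, 7), nrow = 2)
transform_sketch(m, "log2")
```

With the default offset of 1, the zero entry maps to 0 instead of -Inf.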
transformAssay(a, method = c("none", "log", "log2", "log10", "vsn"), .offset = 1)
Arguments: a, method, .offset
Internal use in shinyQC.
matrix
a <- matrix(seq_len(1000), nrow = 100, ncol = 10,
    dimnames = list(seq_len(100), paste("sample", seq_len(10))))
transformAssay(a, "none")
transformAssay(a, "log")
transformAssay(a, "log2")
transformAssay(a, "vsn")
The function upsetCategory displays the frequency of measured values
per feature with respect to class/sample type to assess differences in
occurrences. Internally, the measured values per sample are obtained via
the measuredCategory function, which returns the number
of measured/missing values per category and feature. From this, a binary
tbl is created specifying whether each feature is present or missing,
which is passed to the upset function from the UpSetR
package.
upsetCategory(se, category = colnames(colData(se)), measured = TRUE)
Arguments: se, category, measured
Presence is defined by a feature being measured in at least one sample of a set.
Absence is defined by a feature having only missing values (i.e. no measured values) in a set.
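The presence/absence rule can be sketched in base R. The sketch builds a binary table with one column per set and then counts how many features share each presence pattern, which are the intersection sizes an upset plot displays. This is an illustration, not the package code. The pattern strings ("01", "11") are an ad-hoc encoding introduced here.

```r
# Sketch of the binary presence table; not the package code.
a <- matrix(seq_len(100), nrow = 10, ncol = 10,
            dimnames = list(seq_len(10), paste("sample", seq_len(10))))
a[c(1, 5, 8), seq_len(5)] <- NA
type <- c(rep("1", 5), rep("2", 5))
groups <- split(seq_len(ncol(a)), type)
# 1 = measured in at least one sample of the set, 0 = only NA in that set
presence <- do.call(cbind, lapply(groups, function(idx)
  as.integer(rowSums(!is.na(a[, idx, drop = FALSE])) > 0)))
rownames(presence) <- rownames(a)
# number of features per presence pattern (columns "1" and "2" pasted together)
pattern_counts <- table(apply(presence, 1, paste, collapse = ""))
pattern_counts
```

Here 3 features are present only in set "2" (pattern "01") and 7 are present in both sets (pattern "11").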
upset plot
## create se
a <- matrix(seq_len(100), nrow = 10, ncol = 10,
    dimnames = list(seq_len(10), paste("sample", seq_len(10))))
a[c(1, 5, 8), seq_len(5)] <- NA
set.seed(1)
a <- a + rnorm(100)
cD <- data.frame(name = colnames(a), type = c(rep("1", 5), rep("2", 5)))
rD <- data.frame(spectra = rownames(a))
se <- SummarizedExperiment::SummarizedExperiment(assay = a, rowData = rD, colData = cD)
upsetCategory(se, category = "type")
The function volcanoPlot creates a volcano plot. On the y-axis the
-log10(p-values) are displayed, while on the x-axis the fold
changes/differences are displayed.
The output of the function differs depending on the
type parameter. For type == "ttest", the fold changes are
plotted; for type == "proDA", the differences are plotted.
volcanoPlot(df, type = c("ttest", "proDA"))
Arguments: df, type
Internal use in shinyQC.
plotly
library(SummarizedExperiment)
library(MatrixQCvis)
## create se
a <- matrix(seq_len(100), nrow = 10, ncol = 10,
    dimnames = list(seq_len(10), paste("sample", seq_len(10))))
a[c(1, 5, 8), seq_len(5)] <- NA
set.seed(1)
a <- a + rnorm(100)
a_i <- imputeAssay(a, method = "MinDet")
cD <- data.frame(sample = colnames(a), type = c(rep("1", 5), rep("2", 5)))
rD <- data.frame(spectra = rownames(a))
se <- SummarizedExperiment::SummarizedExperiment(assay = a, rowData = rD, colData = cD)
se_i <- SummarizedExperiment::SummarizedExperiment(assay = a_i, rowData = rD, colData = cD)
## create model and contrast matrix
modelMatrix_expr <- stats::formula("~ 0 + type")
contrast_expr <- "type1-type2"
modelMatrix <- model.matrix(modelMatrix_expr, data = colData(se))
contrastMatrix <- limma::makeContrasts(contrasts = contrast_expr, levels = modelMatrix)
## ttest
fit <- limma::lmFit(a_i, design = modelMatrix)
fit <- limma::contrasts.fit(fit, contrastMatrix)
fit <- limma::eBayes(fit, trend = TRUE)
df_ttest <- limma::topTable(fit, number = Inf, adjust.method = "fdr", p.value = 0.05)
df_ttest <- cbind(name = rownames(df_ttest), df_ttest)
## plot
volcanoPlot(df_ttest, type = "ttest")
## proDA
fit <- proDA::proDA(a, design = modelMatrix)
df_proDA <- proDA::test_diff(fit = fit, contrast = contrast_expr, sort_by = "adj_pval")
## plot
volcanoPlot(df_proDA, type = "proDA")