Title: | Shiny-based interactive data-quality exploration for omics data |
---|---|
Description: | Data quality assessment is an integral part of preparatory data analysis to ensure sound biological information retrieval. We present here the MatrixQCvis package, which provides shiny-based interactive visualization of data quality metrics at the per-sample and per-feature level. It is broadly applicable to quantitative omics data types that come in matrix-like format (features x samples). It enables the detection of low-quality samples, drifts, outliers and batch effects in data sets. Visualizations include, among others, bar and violin plots of the (count/intensity) values, mean vs standard deviation plots, MA plots, empirical cumulative distribution function (ECDF) plots, visualizations of the distances between samples, and multiple types of dimension reduction plots. Furthermore, MatrixQCvis allows for differential expression analysis based on the limma (moderated t-tests) and proDA (Wald tests) packages. MatrixQCvis builds upon the popular Bioconductor SummarizedExperiment S4 class and thus integrates easily into existing workflows. The package is especially tailored towards metabolomics and proteomics mass spectrometry data, but can also be used to assess the data quality of other data types that can be represented in a SummarizedExperiment object. |
Authors: | Thomas Naake [aut, cre] , Wolfgang Huber [aut] |
Maintainer: | Thomas Naake <[email protected]> |
License: | GPL-3 |
Version: | 1.15.0 |
Built: | 2024-11-14 05:50:15 UTC |
Source: | https://github.com/bioc/MatrixQCvis |
barplotSamplesMeasuredMissing
plots the number of
measured/missing features of samples as a barplot. The function will
take as input the returned tbl
of samplesMeasuredMissing
.
barplotSamplesMeasuredMissing(tbl, measured = TRUE)
tbl |
|
measured |
|
gg
object from ggplot2
## create se
a <- matrix(seq_len(100), nrow = 10, ncol = 10,
    dimnames = list(seq_len(10), paste("sample", seq_len(10))))
a[c(1, 5, 8), seq_len(5)] <- NA
set.seed(1)
a <- a + rnorm(100)
cD <- data.frame(name = colnames(a), type = c(rep("1", 5), rep("2", 5)))
rD <- data.frame(spectra = rownames(a))
se <- SummarizedExperiment::SummarizedExperiment(assay = a,
    rowData = rD, colData = cD)

## create the data.frame with information on number of measured/missing
## values
tbl <- samplesMeasuredMissing(se)

## plot number of measured values
barplotSamplesMeasuredMissing(tbl, measured = TRUE)

## plot number of missing values
barplotSamplesMeasuredMissing(tbl, measured = FALSE)
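The quantities shown in the barplot can be approximated in base R. The following is a minimal sketch, assuming that a feature counts as missing in a sample when its value is NA; the object names (n_measured, n_missing, counts) are illustrative and not part of the package.

```r
## base R sketch: number of measured and missing features per sample
a <- matrix(seq_len(100), nrow = 10, ncol = 10,
    dimnames = list(seq_len(10), paste("sample", seq_len(10))))
a[c(1, 5, 8), seq_len(5)] <- NA

## assumption: "missing" means NA, "measured" means not NA
n_measured <- colSums(!is.na(a))
n_missing <- colSums(is.na(a))

## illustrative summary table, one row per sample
counts <- data.frame(name = colnames(a), measured = n_measured,
    missing = n_missing)
counts
```

The table returned by samplesMeasuredMissing may be laid out differently; the sketch only shows how such counts can be derived.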
SummarizedExperiment
The function batchCorrectionAssay
removes the batch effect of
(count/intensity) values of a SummarizedExperiment
.
It uses either the removeBatchEffect
or ComBat
functions
or no batch effect correction method (pass-through,
none
).
batchCorrectionAssay(
    se,
    method = c("none", "removeBatchEffect (limma)", "ComBat"),
    batch = NULL,
    batch2 = NULL,
    ...
)
se |
|
method |
|
batch |
|
batch2 |
|
... |
further arguments passed to |
The column batch
in colData(se)
contains the information
on the batch identity. For method = "removeBatchEffect (limma)"
,
batch2
may indicate a second series of batches.
Internal use in shinyQC
.
If batch
is NULL and method
is set to
method = "removeBatchEffect (limma)"
or method = "ComBat"
,
no batch correction will be performed (equivalent to
method = "none"
).
The method ComBat
will only perform batch correction on valid
features: (1) at least two observations (no NA
) per level
and per feature, (2) variance greater than 0 per feature, and (3) more than
two valid features as given by (1) and (2). For non-valid features, values
are taken from assay(se)
.
matrix
## create se
a <- matrix(seq_len(100), nrow = 10, ncol = 10,
    dimnames = list(seq_len(10), paste("sample", seq_len(10))))
a[c(1, 5, 8), seq_len(5)] <- NA
set.seed(1)
a <- a + rnorm(100)
cD <- data.frame(name = colnames(a),
    type = c(rep("1", 5), rep("2", 5)), batch = rep(c(1, 2), 5))
rD <- data.frame(spectra = rownames(a))
se <- SummarizedExperiment::SummarizedExperiment(assay = a,
    rowData = rD, colData = cD)

## method = "removeBatchEffect (limma)"
batchCorrectionAssay(se, method = "removeBatchEffect (limma)",
    batch = "batch", batch2 = NULL)

## method = "ComBat"
batchCorrectionAssay(se, method = "ComBat",
    batch = "batch", batch2 = NULL)
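The validity criteria listed for ComBat can be illustrated in base R. This is a sketch of one reading of criteria (1) to (3) and not the package's internal code; the toy matrix, the batch vector and all object names are assumptions made for illustration.

```r
## base R sketch of the feature validity criteria described for ComBat
set.seed(1)
a <- matrix(rnorm(60), nrow = 6, ncol = 10,
    dimnames = list(paste("feature", seq_len(6)), paste("sample", seq_len(10))))
batch <- rep(c("A", "B"), each = 5)

a[1, 1:4] <- NA ## feature 1: only one observation left in batch A
a[2, ] <- 3     ## feature 2: zero variance

## (1) at least two non-NA observations per batch level and feature
n_per_level <- t(apply(a, 1, function(y) tapply(!is.na(y), batch, sum)))
enough_obs <- apply(n_per_level >= 2, 1, all)

## (2) variance greater than 0 per feature
var_positive <- apply(a, 1, var, na.rm = TRUE) > 0

## (3) more than two features fulfil (1) and (2)
valid <- enough_obs & var_positive
enough_valid_features <- sum(valid) > 2
valid
```

Under this reading, only the rows flagged in valid would be passed to the correction, and the remaining rows would keep their values from assay(se).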
The function createBoxplot
creates a boxplot per sample for the
intensity/count values.
createBoxplot(
    se,
    orderCategory = colnames(colData(se)),
    title = "",
    log = TRUE,
    violin = FALSE
)
se |
|
orderCategory |
|
title |
|
log |
|
violin |
|
Internal usage in shinyQC
.
gg
object from ggplot2
## create se
a <- matrix(seq_len(100), nrow = 10, ncol = 10,
    dimnames = list(seq_len(10), paste("sample", seq_len(10))))
a[c(1, 5, 8), seq_len(5)] <- NA
set.seed(1)
a <- a + rnorm(100)
cD <- data.frame(name = colnames(a), type = c(rep("1", 5), rep("2", 5)))
rD <- data.frame(spectra = rownames(a))
se <- SummarizedExperiment::SummarizedExperiment(assay = a,
    rowData = rD, colData = cD)

createBoxplot(se, orderCategory = "name", title = "", log = TRUE,
    violin = FALSE)
The function createDfFeature
takes as input a list of matrices and
returns the row given by feature
of each matrix as a column of a
data.frame
. The function createDfFeature
provides the input
for the function featurePlot
.
createDfFeature(l, feature)
l |
|
feature |
|
Internal usage in shinyQC
data.frame
set.seed(1)
x1 <- matrix(rnorm(100), ncol = 10, nrow = 10,
    dimnames = list(paste("feature", seq_len(10)),
        paste("sample", seq_len(10))))
x2 <- x1 + 5
x3 <- x2 + 10
l <- list(x1 = x1, x2 = x2, x3 = x3)
createDfFeature(l, "feature 1")
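The idea of collecting one feature across several processing steps can be sketched in base R. This is a hedged illustration only: the exact columns returned by createDfFeature may differ, and df_sketch is a hypothetical name.

```r
## base R sketch: one row per sample, one column per matrix in the list
set.seed(1)
x1 <- matrix(rnorm(100), ncol = 10, nrow = 10,
    dimnames = list(paste("feature", seq_len(10)),
        paste("sample", seq_len(10))))
x2 <- x1 + 5
x3 <- x2 + 10
l <- list(x1 = x1, x2 = x2, x3 = x3)
feature <- "feature 1"

## extract the row named by feature from every matrix and bind as columns
df_sketch <- as.data.frame(lapply(l, function(m) m[feature, ]))
df_sketch
```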
The function cv
calculates the coefficient of variation from columns
of a matrix. The coefficients of variation are calculated according to the
formula sd(y) / mean(y) * 100
with y
the column values, thus,
the function returns the coefficient of variation in percentage.
cv(x, name = "raw")
x |
|
name |
|
The function returns a named list
(the name is specified by the
name
argument) containing the coefficient of variation of the
columns of x
.
list
x <- matrix(seq_len(10), ncol = 2)
cv(x)
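The formula sd(y) / mean(y) * 100 can be checked directly in base R. The sketch below applies it column by column to the same matrix; cv_manual is an illustrative name, and the result is a plain numeric vector rather than the named list returned by cv.

```r
## base R sketch: coefficient of variation (in percent) per column
x <- matrix(seq_len(10), ncol = 2)
cv_manual <- apply(x, 2, function(y) sd(y) / mean(y) * 100)
cv_manual
```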
The function cvFeaturePlot
returns a plotly
plot of coefficient
of variation values. It will create a violin plot and superimposed points
of coefficient of variation values per list entry of l
.
cvFeaturePlot(l, lines = FALSE)
l |
|
lines |
|
lines = TRUE
will connect the points belonging to the same feature
with a line. If there are fewer than two features, the violin plot will not be
plotted. The violin plots will be ordered according to the order in l
plotly
x1 <- matrix(seq_len(100), ncol = 10, nrow = 10,
    dimnames = list(paste("feature", seq_len(10)),
        paste("sample", seq_len(10))))
x2 <- x1 + 5
x3 <- x2 + 10
l <- list(x1 = x1, x2 = x2, x3 = x3)
cvFeaturePlot(l, lines = FALSE)
The function dimensionReduction
creates a data.frame
with the coordinates of the projected data (first entry of returned output).
The function allows for the
following projections:
Principal Component Analysis (PCA), Principal Coordinates
Analysis/Multidimensional Scaling (PCoA), Non-metric Multidimensional
scaling (NMDS), t-distributed stochastic neighbor embedding (tSNE), and
Uniform Manifold Approximation and Projection (UMAP).
The second list entry contains the object returned from
prcomp
(PCA), cmdscale
(PCoA), isoMDS
(NMDS),
Rtsne
(tSNE), or umap
(UMAP).
dimensionReduction(
    x,
    type = c("PCA", "PCoA", "NMDS", "tSNE", "UMAP"),
    params = list()
)
x |
|
type |
|
params |
|
The function dimensionReduction
is a wrapper around the following
functions stats::prcomp
(PCA), stats::cmdscale
(PCoA),
MASS::isoMDS
(NMDS), Rtsne::Rtsne
(tSNE), and
umap::umap
(UMAP). For the function umap::umap
the method is set to naive
.
list, first entry contains a tbl
, second entry contains
the object returned from prcomp
(PCA), cmdscale
(PCoA),
isoMDS
(NMDS), Rtsne
(tSNE), or umap
(UMAP)
Thomas Naake
x <- matrix(rnorm(10000), ncol = 100)
rownames(x) <- paste("feature", seq_len(nrow(x)))
colnames(x) <- paste("sample", seq_len(ncol(x)))
params <- list(method = "euclidean", ## dist
    initial_dims = 10, max_iter = 100, dims = 3, perplexity = 3, ## tSNE
    min_dist = 0.1, n_neighbors = 15, spread = 1) ## UMAP
dimensionReduction(x, type = "PCA", params = params)
dimensionReduction(x, type = "PCoA", params = params)
dimensionReduction(x, type = "NMDS", params = params)
dimensionReduction(x, type = "tSNE", params = params)
dimensionReduction(x, type = "UMAP", params = params)
dimensionReduction
values
The function dimensionReductionPlot
creates a dimension reduction plot.
The function takes as input the tbl
object obtained
from the dimensionReduction
function. The tbl
contains
transformed values by one of the dimension reduction methods.
dimensionReductionPlot(
    tbl,
    se,
    color = c("none", colnames(se@colData)),
    size = c("none", colnames(se@colData)),
    explainedVar = NULL,
    x_coord,
    y_coord,
    height = 600,
    interactive = TRUE
)
tbl |
|
se |
|
color |
|
size |
|
explainedVar |
NULL or named |
x_coord |
|
y_coord |
|
height |
|
interactive |
|
The function dimensionReductionPlot
is a wrapper for a
ggplot/ggplotly
expression.
plotly
or gg
Thomas Naake
library(SummarizedExperiment)

## create se
a <- matrix(seq_len(100), nrow = 10, ncol = 10, byrow = TRUE,
    dimnames = list(seq_len(10), paste("sample", seq_len(10))))
set.seed(1)
a <- a + rnorm(100)
cD <- data.frame(name = colnames(a),
    type = c(rep("1", 5), rep("2", 5)),
    median_vals = apply(a, 2, median))
rD <- data.frame(spectra = rownames(a))
se <- SummarizedExperiment(assay = a, rowData = rD, colData = cD)

pca <- dimensionReduction(x = assay(se), type = "PCA",
    params = list())[[1]]
dimensionReductionPlot(tbl = pca, se = se, color = "type",
    size = "median_vals", x_coord = "PC1", y_coord = "PC2")
The function distSample
creates a heatmap from a distance matrix
created by the function distShiny
. The heatmap is annotated by the
column specified by the label
column in colData(se)
.
distSample(d, se, label = "name", title = "raw", ...)
d |
|
se |
|
label |
|
title |
|
... |
further arguments passed to |
Internal use in shinyQC
Heatmap
object from ComplexHeatmap
## create se
a <- matrix(seq_len(100), nrow = 10, ncol = 10,
    dimnames = list(seq_len(10), paste("sample", seq_len(10))))
a[c(1, 5, 8), seq_len(5)] <- NA
set.seed(1)
a <- a + rnorm(100)
a_i <- imputeAssay(a, method = "MinDet")
cD <- data.frame(name = colnames(a_i),
    type = c(rep("1", 5), rep("2", 5)))
rD <- data.frame(spectra = rownames(a_i))
se <- SummarizedExperiment::SummarizedExperiment(assay = a_i,
    rowData = rD, colData = cD)

dist <- distShiny(a_i)
distSample(dist, se, label = "type", title = "imputed",
    show_row_names = TRUE)
The function distShiny
takes as an input a numerical matrix
or
data.frame
and returns the distances between the rows and columns
based on the defined method
(e.g. euclidean distance).
distShiny(x, method = "euclidean")
x |
|
method |
|
Internal use in shinyQC
.
matrix
x <- matrix(seq_len(100), nrow = 10, ncol = 10,
    dimnames = list(seq_len(10), paste("sample", seq_len(10))))
distShiny(x = x)
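A sample-by-sample distance matrix of this kind can be sketched with stats::dist. The sketch assumes that distances are computed between samples, which are in columns, so the matrix is transposed first; this is an assumption about the intended orientation and not a copy of the package code.

```r
## base R sketch: Euclidean distances between samples (columns of x)
x <- matrix(seq_len(100), nrow = 10, ncol = 10,
    dimnames = list(seq_len(10), paste("sample", seq_len(10))))

## dist() works on rows, hence the transpose; as.matrix() gives a full
## symmetric matrix with zeros on the diagonal
d <- as.matrix(stats::dist(t(x), method = "euclidean"))
d[1:3, 1:3]
```

Other values of method accepted by stats::dist (for example "manhattan") can be substituted in the same call.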
The function driftPlot
aggregates the (count/intensity) values from
the assay()
slot of a SummarizedExperiment
by the median
or sum
of the (count/intensity) values. driftPlot
then
visualizes these aggregated values and adds a trend line (using either
LOESS or a linear model) from (a subset of) the aggregated values. The
subset is specified by the arguments category
and level
.
driftPlot(
    se,
    aggregation = c("median", "sum"),
    category = colnames(colData(se)),
    orderCategory = colnames(colData(se)),
    level = c("all", unique(colData(se)[, category])),
    method = c("loess", "lm")
)
se |
|
aggregation |
|
category |
|
orderCategory |
|
level |
|
method |
|
The x-values are sorted according to the orderCategory
argument: The
levels of the corresponding column in colData(se)
are pasted with the
sample names (in the column name
) and factorized.
Internal usage in shinyQC
.
gg
object from ggplot2
## create se
set.seed(1)
a <- matrix(rnorm(1000), nrow = 10, ncol = 100,
    dimnames = list(seq_len(10), paste("sample", seq_len(100))))
a[c(1, 5, 8), seq_len(5)] <- NA
cD <- data.frame(name = colnames(a),
    type = c(rep("1", 50), rep("2", 50)))
rD <- data.frame(spectra = rownames(a))
se <- SummarizedExperiment::SummarizedExperiment(assay = a,
    rowData = rD, colData = cD)

driftPlot(se, aggregation = "sum", category = "type",
    orderCategory = "type", level = "1", method = "loess")
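The aggregation and trend-fitting steps described above can be sketched in base R. The sketch assumes that missing values are dropped when aggregating and that the trend is fitted against the sample order; both are assumptions, and all object names are illustrative.

```r
## base R sketch: aggregate per sample, then fit a trend over sample order
set.seed(1)
a <- matrix(rnorm(1000), nrow = 10, ncol = 100,
    dimnames = list(seq_len(10), paste("sample", seq_len(100))))
a[c(1, 5, 8), seq_len(5)] <- NA

## aggregation by median or sum (assumption: NA values are ignored)
agg_median <- apply(a, 2, median, na.rm = TRUE)
agg_sum <- colSums(a, na.rm = TRUE)

## trend lines over the sample index, using a linear model or LOESS
d <- data.frame(idx = seq_along(agg_sum), value = agg_sum)
fit_lm <- stats::lm(value ~ idx, data = d)
fit_loess <- stats::loess(value ~ idx, data = d)
trend_lm <- stats::predict(fit_lm)
trend_loess <- stats::predict(fit_loess)
head(cbind(d, trend_lm, trend_loess))
```

Restricting the fit to a subset, as done via category and level, would correspond to subsetting d before calling lm or loess.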
The function ECDF
creates a plot of the empirical cumulative
distribution function of a specified sample and an outgroup (reference).
The reference is specified by the group
argument. The row-wise
(feature) mean values of the reference are calculated after excluding
the specified sample
.
ECDF(se, sample = colnames(se), group = c("all", colnames(colData(se))))
se |
|
sample |
|
group |
|
Internal use in shinyQC
.
The function ECDF
uses the ks.test
function from stats
to perform a two-sample Kolmogorov-Smirnov test. The Kolmogorov-Smirnov
test is run with the alternative "two.sided"
(null hypothesis is that the true distribution function of the
sample
is equal to the hypothesized distribution function of the
group
).
The exact
argument in ks.test
is set to NULL
, meaning
that an exact p-value is computed if the product of the sample sizes of
sample
and group
is less than 10000. Otherwise, asymptotic
distributions are used, whose approximations might be inaccurate for small
sample sizes.
gg
object from ggplot2
## create se
set.seed(1)
a <- matrix(rnorm(1000), nrow = 100, ncol = 10,
    dimnames = list(seq_len(100), paste("sample", seq_len(10))))
a[c(1, 5, 8), seq_len(5)] <- NA
cD <- data.frame(name = colnames(a), type = c(rep("1", 5), rep("2", 5)))
rD <- data.frame(spectra = rownames(a))
se <- SummarizedExperiment::SummarizedExperiment(assay = a,
    rowData = rD, colData = cD)

ECDF(se, sample = "sample 1", group = "all")
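The comparison behind the plot can be sketched with base R and stats alone. The sketch below assumes group = "all", so the reference is the row-wise mean over all other samples; the object names are illustrative and the code is not taken from the package.

```r
## base R sketch: sample vs. reference ECDF and two-sample KS test
set.seed(1)
a <- matrix(rnorm(1000), nrow = 100, ncol = 10,
    dimnames = list(seq_len(100), paste("sample", seq_len(10))))
sample_name <- "sample 1"

## values of the sample and feature-wise means of all remaining samples
sample_vals <- a[, sample_name]
reference <- rowMeans(a[, colnames(a) != sample_name, drop = FALSE],
    na.rm = TRUE)

## empirical cumulative distribution functions
ecdf_sample <- stats::ecdf(sample_vals)
ecdf_reference <- stats::ecdf(reference)

## two-sided Kolmogorov-Smirnov test, exact = NULL as described above
ks <- stats::ks.test(sample_vals, reference,
    alternative = "two.sided", exact = NULL)
ks$p.value
```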
The function explVar
calculates the proportion of explained variance
for each principal component (PC, type = "PCA"
) and axis
(type = "PCoA"
).
explVar(d, type = c("PCA", "PCoA"))
d |
|
type |
|
explVar
uses the function prcomp
from the stats
package
to retrieve the explained standard deviation per PC
(type = "PCA"
) and the function cmdscale
from the stats
package to retrieve the explained variation based on eigenvalues per
Axis (type = "PCoA"
).
numeric
vector with the proportion of explained variance
for each PC or Axis
Thomas Naake
x <- matrix(seq_len(100), nrow = 10, ncol = 10,
    dimnames = list(seq_len(10), paste("sample", seq_len(10))))
set.seed(1)
x <- x + rnorm(100)

## run for PCA
pca <- dimensionReduction(x = x,
    params = list(center = TRUE, scale = TRUE), type = "PCA")[[2]]
explVar(d = pca, type = "PCA")

## run for PCoA
pcoa <- dimensionReduction(x = x,
    params = list(method = "euclidean"), type = "PCoA")[[2]]
explVar(d = pcoa, type = "PCoA")
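A common way to derive such proportions from prcomp and cmdscale output is sketched below. It assumes that samples are in columns (hence the transpose), that the PCA proportion is the squared standard deviation of each PC divided by the total, and that the PCoA proportion is each eigenvalue divided by the sum of eigenvalues. These are assumptions about the calculation, not the package's verified implementation.

```r
## base R sketch: proportion of explained variance for PCA and PCoA
x <- matrix(seq_len(100), nrow = 10, ncol = 10,
    dimnames = list(seq_len(10), paste("sample", seq_len(10))))
set.seed(1)
x <- x + rnorm(100)

## PCA on samples: variance per PC is sdev^2
pca <- stats::prcomp(t(x), center = TRUE, scale. = FALSE)
expl_pca <- pca$sdev^2 / sum(pca$sdev^2)

## PCoA on Euclidean distances between samples: use the eigenvalues
pcoa <- stats::cmdscale(stats::dist(t(x), method = "euclidean"),
    k = 2, eig = TRUE)
expl_pcoa <- pcoa$eig / sum(pcoa$eig)

round(expl_pca[1:3], 4)
round(expl_pcoa[1:3], 4)
```

For unscaled data and Euclidean distances the two approaches give matching proportions, since classical scaling on Euclidean distances is equivalent to PCA.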
The function extractComb
extracts the features that match a
combination
depending on whether the features were measured or missing.
The function will return the sets that match the combination
,
thus, the function might be useful when answering questions about which
features are measured/missing under certain combinations (e.g. sample
types or experimental conditions).
extractComb(se, combination, measured = TRUE, category = "type")
se |
|
combination |
|
measured |
|
category |
|
The function extractComb
uses the make_comb_mat
function from
ComplexHeatmap
package.
Presence is defined by a feature being measured in at least one sample of a set.
Absence is defined by a feature with only missing values (i.e. no measured values) of a set.
character
## create se
a <- matrix(seq_len(100), nrow = 10, ncol = 10,
    dimnames = list(seq_len(10), paste("sample", seq_len(10))))
a[c(1, 5, 8), seq_len(5)] <- NA
set.seed(1)
a <- a + rnorm(100)
cD <- data.frame(name = colnames(a), type = c(rep("1", 5), rep("2", 5)))
rD <- data.frame(spectra = rownames(a))
se <- SummarizedExperiment::SummarizedExperiment(assay = a,
    rowData = rD, colData = cD)

extractComb(se, combination = "2", measured = TRUE, category = "type")
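The presence and absence definitions given above can be sketched in base R. The sketch assumes that combination = "2" means "measured in set 2 and in no other set"; that interpretation, the type vector and the object names are assumptions for illustration, and the package itself relies on ComplexHeatmap for this step.

```r
## base R sketch: features present only in sample type "2"
a <- matrix(seq_len(100), nrow = 10, ncol = 10,
    dimnames = list(seq_len(10), paste("sample", seq_len(10))))
a[c(1, 5, 8), seq_len(5)] <- NA
type <- c(rep("1", 5), rep("2", 5))

## presence: measured in at least one sample of a set
## absence: only missing values within a set
present <- sapply(split(seq_len(ncol(a)), type), function(idx) {
    rowSums(!is.na(a[, idx, drop = FALSE])) > 0
})

## features present in set "2" and absent in set "1"
only_in_2 <- rownames(a)[present[, "2"] & !present[, "1"]]
only_in_2
```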
The function featurePlot
creates a plot of (count/intensity) values
for different data processing steps (referring to columns in the
data.frame
) over the different samples (referring to rows in
the data.frame
).
featurePlot(df)
df |
|
Internal usage in shinyQC
.
gg
object from ggplot2
set.seed(1)
x1 <- matrix(rnorm(100), ncol = 10, nrow = 10,
    dimnames = list(paste("feature", seq_len(10)),
        paste("sample", seq_len(10))))
x2 <- x1 + 5
x3 <- x2 + 10
l <- list(x1 = x1, x2 = x2, x3 = x3)
df <- createDfFeature(l, "feature 1")
featurePlot(df)
hist_sample
plots the number of a category (e.g. sample types)
as a histogram. It uses the returned tbl
from hist_sample_num
.
hist_sample(tbl, category = "type")
tbl |
|
category |
|
gg
object from ggplot2
## create se
a <- matrix(seq_len(100), nrow = 10, ncol = 10,
    dimnames = list(seq_len(10), paste("sample", seq_len(10))))
a[c(1, 5, 8), seq_len(5)] <- NA
set.seed(1)
a <- a + rnorm(100)
cD <- data.frame(name = colnames(a), type = c(rep("1", 4), rep("2", 6)))
rD <- data.frame(spectra = rownames(a))
se <- SummarizedExperiment::SummarizedExperiment(assay = a,
    rowData = rD, colData = cD)

tbl <- hist_sample_num(se, category = "type")
hist_sample(tbl)
hist_sample_num
returns the number of a category
(e.g. sample types) as a tbl
.
The function first retrieves the column category
in colData(se)
.
The function will return a tbl
containing the numerical
values of the quantities.
hist_sample_num(se, category = "type")
se |
|
category |
|
tbl
## create se
a <- matrix(seq_len(100), nrow = 10, ncol = 10,
    dimnames = list(seq_len(10), paste("sample", seq_len(10))))
a[c(1, 5, 8), seq_len(5)] <- NA
set.seed(1)
a <- a + rnorm(100)
cD <- data.frame(name = colnames(a), type = c(rep("1", 4), rep("2", 6)))
rD <- data.frame(spectra = rownames(a))
se <- SummarizedExperiment::SummarizedExperiment(assay = a,
    rowData = rD, colData = cD)

hist_sample_num(se, category = "type")
The function histFeature
creates a histogram with the number
of measured/missing values per feature.
histFeature(x, measured = TRUE, ...)
x |
|
measured |
|
... |
additional parameters passed to |
plotly
object from ggplotly
x <- matrix(c(c(1, 1, 1), c(1, NA, 1), c(1, NA, 1),
    c(1, 1, 1), c(NA, 1, 1), c(NA, 1, 1)), byrow = FALSE, nrow = 3)
colnames(x) <- c("A_1", "A_2", "A_3", "B_1", "B_2", "B_3")
histFeature(x, binwidth = 1)
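The values that such a histogram summarizes can be computed in base R. This sketch assumes that features are in rows and that "measured" means not NA; the object names are illustrative.

```r
## base R sketch: number of measured and missing values per feature
x <- matrix(c(c(1, 1, 1), c(1, NA, 1), c(1, NA, 1),
    c(1, 1, 1), c(NA, 1, 1), c(NA, 1, 1)), byrow = FALSE, nrow = 3)
colnames(x) <- c("A_1", "A_2", "A_3", "B_1", "B_2", "B_3")

## counts per feature (row)
n_measured <- rowSums(!is.na(x))
n_missing <- rowSums(is.na(x))

## frequency of each count, i.e. the bar heights for binwidth = 1
table(n_measured)
```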
The function histFeatureCategory
creates histogram
plots for each sample type in se
.
histFeatureCategory(se, measured = TRUE, category = "type", ...)
se |
|
measured |
|
category |
|
... |
additional parameters passed to |
plotly
object from ggplotly
## create se
a <- matrix(seq_len(100), nrow = 10, ncol = 10,
    dimnames = list(seq_len(10), paste("sample", seq_len(10))))
a[c(1, 5, 8), seq_len(5)] <- NA
set.seed(1)
a <- a + rnorm(100)
cD <- data.frame(name = colnames(a), type = c(rep("1", 5), rep("2", 5)))
rD <- data.frame(spectra = rownames(a))
se <- SummarizedExperiment::SummarizedExperiment(assay = a,
    rowData = rD, colData = cD)

histFeatureCategory(se, measured = TRUE, category = "type")
The function hoeffDPlot
creates via ggplot
a violin plot per
factor, a jitter plot of the data points and (optionally) connects the points
via lines. hoeffDPlot
uses the plotly
package to make the
figure interactive.
hoeffDPlot(df, lines = TRUE)
df |
|
lines |
|
The function hoeffDPlot
will create the violin plot and jitter plot
according to the specified order given by the colnames of df
.
hoeffDPlot
will thus internally refactor the colnames
of the
supplied data.frame
according to the order of the colnames
.
gg
object from ggplot2
## attach SummarizedExperiment for the assay<- replacement used below
library(SummarizedExperiment)

## create se
set.seed(1)
a <- matrix(rnorm(10000), nrow = 1000, ncol = 10,
    dimnames = list(seq_len(1000), paste("sample", seq_len(10))))
a[c(1, 5, 8), seq_len(5)] <- NA
cD <- data.frame(name = colnames(a), type = c(rep("1", 5), rep("2", 5)))
rD <- data.frame(spectra = rownames(a))
se <- SummarizedExperiment::SummarizedExperiment(assay = a,
    rowData = rD, colData = cD)

tbl <- MAvalues(se, log = FALSE, group = "all")
hd_r <- hoeffDValues(tbl, "raw")

## normalized values
se_n <- se
assay(se_n) <- normalizeAssay(a, "sum")
tbl_n <- MAvalues(se_n, log = FALSE, group = "all")
hd_n <- hoeffDValues(tbl_n, "normalized")

df <- data.frame(raw = hd_r, normalized = hd_n)
hoeffDPlot(df, lines = TRUE)
hoeffDPlot(df, lines = FALSE)
The function creates and returns Hoeffding's D statistics values from MA values.
In case sample_n
is set to a numerical value (e.g. 10000), a
random subset containing sample_n
features is taken to calculate Hoeffding's D
values to speed up the calculation. In case there are fewer features
than sample_n
, all features are taken.
hoeffDValues(tbl, name = "raw", sample_n = NULL)
tbl |
|
name |
|
sample_n |
|
The function uses the function hoeffd
from the Hmisc
package to
calculate the values.
named list with Hoeffding's D values per sample
## attach SummarizedExperiment for the assay<- replacement used below
library(SummarizedExperiment)

## create se
a <- matrix(seq_len(100), nrow = 10, ncol = 10,
    dimnames = list(seq_len(10), paste("sample", seq_len(10))))
a[c(1, 5, 8), seq_len(5)] <- NA
set.seed(1)
a <- a + rnorm(100)
cD <- data.frame(name = colnames(a), type = c(rep("1", 5), rep("2", 5)))
rD <- data.frame(spectra = rownames(a))
se <- SummarizedExperiment::SummarizedExperiment(assay = a,
    rowData = rD, colData = cD)

tbl <- MAvalues(se)
hoeffDValues(tbl, "raw")

## normalized values
se_n <- se
assay(se_n) <- normalizeAssay(a, "sum")
tbl_n <- MAvalues(se_n, group = "all")
hoeffDValues(tbl_n, "normalized")

## transformed values
se_t <- se
assay(se_t) <- transformAssay(a, "log")
tbl_t <- MAvalues(se_t, group = "all")
hoeffDValues(tbl_t, "transformed")
matrix
The function imputeAssay imputes missing values based on one of the following principles: Bayesian missing value imputation (BPCA), k-nearest neighbor averaging (kNN), maximum likelihood-based imputation using the EM algorithm (MLE), replacement by the smallest non-missing value in the data (Min), replacement by the minimal value observed as the q-th quantile (MinDet, default q = 0.01), and replacement by random draws from a Gaussian distribution centred on a minimal value (MinProb).
imputeAssay( a, method = c("BPCA", "kNN", "MLE", "Min", "MinDet", "MinProb", "none") )
a |
|
method |
|
BPCA: wrapper for pcaMethods::pca with method = "bpca". BPCA is a missing at random (MAR) imputation method.
kNN: wrapper for impute::impute.knn with k = 10, rowmax = 0.5, colmax = 0.5, maxp = 1500. kNN is a MAR imputation method.
MLE: wrapper for imputeLCMD::impute.MAR with method = "MLE", model.selector = 1/imputeLCMD::impute.wrapper.MLE. MLE is a MAR imputation method.
Min: imputes the missing values by the observed minimal value of a. Min is a missing not at random (MNAR) imputation method.
MinDet: wrapper for imputeLCMD::impute.MinDet with q = 0.01. MinDet performs the imputation using a deterministic minimal value approach. The missing entries are replaced with a minimal value, estimated from the q-th quantile from each sample. MinDet is a MNAR imputation method.
MinProb: wrapper for imputeLCMD::impute.MinProb with q = 0.01 and tune.sigma = 1. MinProb performs the imputation based on random draws from a Gaussian distribution with the mean set to the minimal value of a sample. MinProb is a MNAR imputation method.
none: does not impute values (not available within the shiny application).
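The two simplest MNAR strategies described above can be sketched in base R. This is only an illustration under stated assumptions, not the code used by imputeAssay or imputeLCMD: the toy matrix, the object names a_min and a_mindet, and the use of stats::quantile with its default quantile type are all choices made for this example.

```r
## Toy matrix (features x samples) with missing values; hypothetical data
a <- matrix(as.numeric(seq_len(20)), nrow = 5, ncol = 4,
    dimnames = list(paste("feature", seq_len(5)),
        paste("sample", seq_len(4))))
a[c(1, 3), 1:2] <- NA

## "Min"-like: replace every NA by the smallest observed value in a
a_min <- a
a_min[is.na(a_min)] <- min(a, na.rm = TRUE)

## "MinDet"-like: replace the NAs of each sample (column) by the q-th
## quantile of the observed values of that sample
q <- 0.01
a_mindet <- apply(a, 2, function(s) {
    s[is.na(s)] <- stats::quantile(s, probs = q, na.rm = TRUE)
    s
})
```

In both cases the observed values are left untouched and only the NA entries are filled. MinProb differs from MinDet in that it draws the replacement values randomly around the minimal value instead of using one fixed value per sample.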
matrix
a <- matrix(seq_len(100), nrow = 10, ncol = 10,
    dimnames = list(seq_len(10), paste("sample", seq_len(10))))
a[c(1, 5, 8), seq_len(5)] <- NA
imputeAssay(a, method = "kNN")
imputeAssay(a, method = "Min")
imputeAssay(a, method = "MinDet")
imputeAssay(a, method = "MinProb")
The function creates a 2D histogram of M and A values.
MAplot( tbl, group = c("all", colnames(tbl)), plot = c("all", unique(tbl[["name"]])) )
tbl |
|
group |
|
plot |
|
MAplot returns a 2D hex histogram instead of a classical scatterplot, for computational reasons and for better visualization of overlapping points. The argument plot specifies the sample (referring to colData(se)$name) to be plotted. If plot = "all", MA values for all samples will be plotted (samples will be plotted in facets).
If the number of features (tbl$Feature) is below 1000, points will be plotted (via geom_point), otherwise hexagons will be plotted (via geom_hex).
gg
object from ggplot2
## create se
set.seed(1)
a <- matrix(rnorm(10000), nrow = 1000, ncol = 10,
    dimnames = list(seq_len(1000), paste("sample", seq_len(10))))
a[c(1, 5, 8), seq_len(5)] <- NA
cD <- data.frame(name = colnames(a), type = c(rep("1", 5), rep("2", 5)))
rD <- data.frame(spectra = rownames(a))
se <- SummarizedExperiment::SummarizedExperiment(assay = a,
    rowData = rD, colData = cD)
tbl <- MAvalues(se, log2 = FALSE, group = "all")
MAplot(tbl, group = "all", plot = "all")
The function MAvalues will create MA values as input for the functions MAplot and hoeffDValues.
M and A are specified relative to reference samples, which are determined by the group argument. In case of group == "all", all samples (except the specified one) are taken for the reference calculation. In case of group != "all", the samples belonging to the same group, as given by a column in colnames(colData(se)), are used, except the specified one.
MAvalues(se, log2 = TRUE, group = c("all", colnames(colData(se))))
se |
|
log2 |
|
group |
|
tbl
with columns Feature
, name
(sample name),
A
, M
and additional columns of colData(se)
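The following base R sketch illustrates how M and A could be computed for one sample in the group == "all" case. It assumes the classical definitions (M as the difference between the sample and the mean of the reference samples, A as the average of the two, both on the log2 scale); the exact formulas used inside MAvalues are not stated here, and the toy data and the object names a_l, ref and ma are hypothetical.

```r
## Toy intensity matrix (features x samples); hypothetical data
set.seed(1)
a <- matrix(rexp(40, rate = 0.1) + 1, nrow = 10, ncol = 4,
    dimnames = list(paste("feature", seq_len(10)),
        paste("sample", seq_len(4))))
a_l <- log2(a)

## sample of interest and reference (mean of all other samples)
i <- 1
ref <- rowMeans(a_l[, -i, drop = FALSE], na.rm = TRUE)

## assumed classical definitions of M and A
M <- a_l[, i] - ref
A <- (a_l[, i] + ref) / 2

## long-format table with the columns named in the return value
ma <- data.frame(Feature = rownames(a), name = colnames(a)[i],
    A = A, M = M)
```

Repeating the calculation for every sample and binding the results row-wise gives one table with A and M per feature and sample, which is the shape that MAplot and hoeffDValues take as input.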
## create se
set.seed(1)
a <- matrix(rnorm(10000), nrow = 1000, ncol = 10,
    dimnames = list(seq_len(1000), paste("sample", seq_len(10))))
a[c(1, 5, 8), seq_len(5)] <- NA
cD <- data.frame(name = colnames(a), type = c(rep("1", 5), rep("2", 5)))
rD <- data.frame(spectra = rownames(a))
se <- SummarizedExperiment::SummarizedExperiment(assay = a,
    rowData = rD, colData = cD)
MAvalues(se, log2 = FALSE, group = "all")
The function measuredCategory
creates a tbl
with
the number of measured values per feature. 0 means that there were only
missing values (NA
) for the feature and sample type.
measuredCategory
will return a tbl
where columns are the
unique sample types and rows are the features as in assay(se)
.
measuredCategory(se, measured = TRUE, category = "type")
se |
|
measured |
|
category |
|
measuredCategory
is a helper function.
matrix
with number of measured/missing features per
category
type
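The counting step can be illustrated with a few lines of base R. This is a sketch of the idea only, assuming a plain matrix and a character vector of sample types instead of a SummarizedExperiment; the object names type, measured and missing are chosen for this example and the result is a matrix rather than a tbl.

```r
## Toy matrix (features x samples) with missing values in sample type "1"
a <- matrix(as.numeric(seq_len(100)), nrow = 10, ncol = 10,
    dimnames = list(seq_len(10), paste("sample", seq_len(10))))
a[c(1, 5, 8), seq_len(5)] <- NA
type <- c(rep("1", 5), rep("2", 5))

## number of measured (non-NA) values per feature within each sample type
measured <- sapply(unique(type), function(t_i) {
    rowSums(!is.na(a[, type == t_i, drop = FALSE]))
})

## number of missing values, corresponding to measured = FALSE
missing <- sapply(unique(type), function(t_i) {
    rowSums(is.na(a[, type == t_i, drop = FALSE]))
})
```

Features 1, 5 and 8 get a count of 0 measured values for sample type "1", which is the case described above where only missing values exist for a feature and sample type.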
## create se
set.seed(1)
a <- matrix(rnorm(100), nrow = 10, ncol = 10,
    dimnames = list(seq_len(10), paste("sample", seq_len(10))))
a[c(1, 5, 8), seq_len(5)] <- NA
cD <- data.frame(name = colnames(a), type = c(rep("1", 5), rep("2", 5)))
rD <- data.frame(spectra = rownames(a))
se <- SummarizedExperiment::SummarizedExperiment(assay = a,
    rowData = rD, colData = cD)
measuredCategory(se, measured = TRUE, category = "type")
The function mosaic creates a mosaic plot of two factors from a SummarizedExperiment object. The columns f1 and f2 are taken from colData(se).
mosaic(se, f1, f2)
se |
|
f1 |
|
f2 |
|
Code partly taken from https://stackoverflow.com/questions/21588096/pass-string-to-facet-grid-ggplot2
gg
object from ggplot2
## create se
set.seed(1)
a <- matrix(rnorm(100), nrow = 10, ncol = 10,
    dimnames = list(seq_len(10), paste("sample", seq_len(10))))
a[c(1, 5, 8), seq_len(5)] <- NA
cD <- data.frame(name = colnames(a), type = c(rep("1", 5), rep("2", 5)),
    cell_type = c("A", "B"))
rD <- data.frame(spectra = rownames(a))
se <- SummarizedExperiment::SummarizedExperiment(assay = a,
    rowData = rD, colData = cD)
mosaic(se, "cell_type", "type")
The function normalizeAssay performs normalization by sum of the (count/intensity) values per sample (method = "sum"), quantile division per sample (method = "quantile division"), or by quantile normalization (adjusting the value distributions so that they become identical in statistical properties, method = "quantile"). The value for quantile division (e.g., the 75% quantile) is specified by the probs argument. Quantile normalization is performed by using the normalizeQuantiles function from limma.
For the methods "sum" and "quantile division", normalization will be done depending on the multiplyByNormalizationValue parameter.
If set to TRUE, normalization values (e.g. sum or quantile) will be calculated per sample. In a next step, adjusted normalization values will be calculated for each sample in relation to the median normalization value across all samples. Finally, the values in a are multiplied by these adjusted normalization values.
If multiplyByNormalizationValue is set to FALSE, normalization values (e.g. sum or quantile) will be calculated per sample. The values in a are sample-wise divided by the normalization values.
normalizeAssay( a, method = c("none", "sum", "quantile division", "quantile"), probs = 0.75, multiplyByNormalizationValue = FALSE )
a |
|
method |
|
probs |
|
multiplyByNormalizationValue |
|
Internal usage in shinyQC. If method is set to "none", the object a is returned as is (pass-through).
If probs is NULL, probs is internally set to 0.75 if method = "quantile division".
Depending on the values in a, if multiplyByNormalizationValue is set to TRUE the returned normalized values will be in the same order of magnitude as the original values, while if FALSE, the returned values will be in a smaller order of magnitude.
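The difference between the two settings of multiplyByNormalizationValue can be made concrete with a small base R sketch for method = "sum". It is an illustration under assumptions, not the code of normalizeAssay: in particular, the adjusted normalization value is assumed here to be the median normalization value divided by the sample's own normalization value, and the names norm_val, adj, a_div and a_mult are hypothetical.

```r
## Toy matrix (features x samples); hypothetical data
a <- matrix(as.numeric(seq_len(20)), nrow = 5, ncol = 4,
    dimnames = list(paste("feature", seq_len(5)),
        paste("sample", seq_len(4))))

## normalization value per sample: here the column sum
norm_val <- colSums(a, na.rm = TRUE)

## multiplyByNormalizationValue = FALSE: divide each sample by its value
a_div <- sweep(a, MARGIN = 2, STATS = norm_val, FUN = "/")

## multiplyByNormalizationValue = TRUE (assumed form): relate the median
## normalization value to each sample's own value and multiply
adj <- stats::median(norm_val) / norm_val
a_mult <- sweep(a, MARGIN = 2, STATS = adj, FUN = "*")
```

After division every sample sums to 1, so the values shrink by orders of magnitude. After multiplication every sample sums to the median of the original column sums, so the values stay on the original scale. For "quantile division" the same scheme would apply with the per-sample quantile (probs) in place of the column sum.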
matrix
a <- matrix(seq_len(100), nrow = 10, ncol = 10,
    dimnames = list(seq_len(10), paste("sample", seq_len(10))))
normalizeAssay(a, "sum")
The function permuteExplVar
determines the explained variance of the
permuted expression matrix (x
). It is used to determine the optimal
number of PCs for tSNE.
permuteExplVar(x, n = 10, center = TRUE, scale = TRUE, sample_n = NULL)
x |
|
n |
|
center |
|
scale |
|
sample_n |
|
For the input of tSNE, we typically want to reduce the initial number of dimensions linearly with PCA (used as the initial_dims argument in the Rtsne function). The reduced data set is then fed into tSNE. By plotting the percentage of variance explained by the Principal Components (PCs) we can estimate how many PCs to keep as input for tSNE. However, if we select too many PCs, noise will be included as input to tSNE; if we select too few PCs, we might lose important data structures. To get a better understanding of how many PCs to include, randomization will be employed and the observed variance will be compared to the permuted variance.
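The comparison of observed and permuted variance can be sketched in base R as follows. This is a minimal illustration, not the implementation of permuteExplVar: the permutation scheme (shuffling each feature independently across samples), the empirical p-value, and the helper name expl_var are assumptions made for this example.

```r
## Toy matrix (features x samples); hypothetical data
set.seed(1)
x <- matrix(rnorm(200), nrow = 20, ncol = 10,
    dimnames = list(paste("feature", seq_len(20)),
        paste("sample", seq_len(10))))

## hypothetical helper: fraction of variance explained per PC,
## with samples as observations
expl_var <- function(m) {
    pca <- stats::prcomp(t(m), center = TRUE, scale. = TRUE)
    pca$sdev^2 / sum(pca$sdev^2)
}

## observed explained variance
var_obs <- expl_var(x)

## explained variance after permuting each feature across samples,
## repeated n times (rows: permutations, columns: PCs)
n <- 20
var_perm <- t(replicate(n, expl_var(t(apply(x, 1, sample)))))

## empirical p-value per PC: fraction of permutations explaining at
## least as much variance as observed
p_val <- vapply(seq_along(var_obs), function(k) {
    mean(var_perm[, k] >= var_obs[k])
}, numeric(1))
```

PCs whose observed explained variance clearly exceeds the permuted values (small empirical p-value) carry structure beyond noise and are candidates for the input to tSNE. With the random toy data used here, no PC is expected to stand out.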
matrix with explained variance
Thomas Naake
x <- matrix(seq_len(100), nrow = 10, ncol = 10,
    dimnames = list(seq_len(10), paste("sample", seq_len(10))))
permuteExplVar(x = x, n = 10, center = TRUE, scale = TRUE, sample_n = NULL)
The function plotCV displays the coefficient of variation values of a set of values supplied in a data.frame object. The function will create a plot using the ggplot2 package and will print the values in the different columns in different colors.
plotCV(df)
df |
|
Internal usage in shinyQC
.
gg
object from ggplot2
x1 <- matrix(seq_len(10), ncol = 2)
x2 <- matrix(seq(11, 20), ncol = 2)
x3 <- matrix(seq(21, 30), ncol = 2)
x4 <- matrix(seq(31, 40), ncol = 2)

## calculate cv values
cv1 <- cv(x1, "x1")
cv2 <- cv(x2, "x2")
cv3 <- cv(x3, "x3")
cv4 <- cv(x4, "x4")
df <- data.frame(cv1, cv2, cv3, cv4)
plotCV(df)
The function plotPCALoadings
creates a loadings plot of the features.
plotPCALoadings(tbl, x_coord, y_coord)
tbl |
|
x_coord |
|
y_coord |
|
The function takes as input the output of the function tblPCALoadings. It uses the ggplotly function from plotly to create an interactive plotly plot.
plotly
Thomas Naake
x <- matrix(rnorm(seq_len(10000)), ncol = 100)
rownames(x) <- paste("feature", seq_len(nrow(x)))
colnames(x) <- paste("sample", seq_len(ncol(x)))
params <- list(method = "euclidean", ## dist
    initial_dims = 10, max_iter = 100, dims = 3, perplexity = 3, ## tSNE
    min_dist = 0.1, n_neighbors = 15, spread = 1) ## UMAP
tbl <- tblPCALoadings(x, params)
plotPCALoadings(tbl, x_coord = "PC1", y_coord = "PC2")
The function plotPCAVar plots the explained variance (in %) on the y-axis against the principal components for the measured and permuted values.
plotPCAVar(var_x, var_perm = NULL)
var_x |
|
var_perm |
|
The argument var_perm
is optional and visualization of permuted values
can be omitted by setting var_perm = NULL
.
gg object from ggplot2
Thomas Naake
x <- matrix(seq_len(100), ncol = 10)
pca <- dimensionReduction(x = x,
    params = list(center = TRUE, scale = TRUE), type = "PCA")[[2]]
var_x <- explVar(d = pca, type = "PCA")
var_perm <- permuteExplVar(x = x, n = 100, center = TRUE, scale = TRUE)
plotPCAVar(var_x = var_x, var_perm = var_perm)
The function plotPCAVarPvalue
plots the p-values of significances of
principal components. Using the visual output, the optimal number of
principal components can be selected.
plotPCAVarPvalue(var_x, var_perm)
var_x |
|
var_perm |
|
Internal usage in shinyQC
.
gg object from ggplot2
Thomas Naake
x <- matrix(seq_len(100), ncol = 10)
pca <- dimensionReduction(x = x,
    params = list(center = TRUE, scale = TRUE), type = "PCA")[[2]]
var_x <- explVar(d = pca, type = "PCA")
var_perm <- permuteExplVar(x = x, n = 100, center = TRUE, scale = TRUE)
plotPCAVarPvalue(var_x = var_x, var_perm = var_perm)
samplesMeasuredMissing returns a tbl with the number of measured/missing features of samples. The function will take as input a SummarizedExperiment object and will access its assay() slot.
samplesMeasuredMissing(se)
se |
|
tbl
with number of measured/missing features per sample
## create se
a <- matrix(seq_len(100), nrow = 10, ncol = 10,
    dimnames = list(seq_len(10), paste("sample", seq_len(10))))
a[c(1, 5, 8), seq_len(5)] <- NA
set.seed(1)
a <- a + rnorm(100)
sample <- data.frame(name = colnames(a), type = c(rep("1", 5), rep("2", 5)))
featData <- data.frame(spectra = rownames(a))
se <- SummarizedExperiment::SummarizedExperiment(assay = a,
    rowData = featData, colData = sample)

## create the data.frame with information on number of measured/missing
## values
samplesMeasuredMissing(se)
The shiny application allows the exploration of -omics data sets, with a particular focus on quality control. shinyQC gives information on the type of samples included (if this was previously specified within the SummarizedExperiment object). It gives information on the number of missing and measured values across features and across sets (e.g. quality control samples, control, and treatment groups; only displayed for SummarizedExperiment objects that contain missing values).
shinyQC includes functionality to display (count/intensity) values across samples (to detect drifts in intensity values during the measurement), and to display mean-sd plots, MA plots, ECDF plots, and distance plots between samples. shinyQC includes functionality to perform dimensionality reduction (currently limited to PCA, PCoA, NMDS, tSNE, and UMAP). Additionally, it includes functionality to perform differential expression analysis (currently limited to moderated t-tests and the Wald test).
shinyQC(se, app_server = FALSE)
se |
|
app_server |
|
rownames(se) should be set to the corresponding names of the features, while colnames(se) should be set to the sample IDs. rownames(se) and colnames(se) are not allowed to be NULL. colnames(se), colnames(assay(se)) and rownames(colData(se)) all have to be identical.
shinyQC allows subsetting the supplied SummarizedExperiment object.
On exit of the shiny application, the (subsetted) SummarizedExperiment object is returned with information on the processing steps (normalization, transformation, batch correction and imputation). The object will only be returned if app_server = FALSE and if the function call is assigned to an object, e.g. tmp <- shinyQC(se).
If the se argument is omitted, the app will load an interface that allows for data upload.
shiny
application,
SummarizedExperiment
upon exiting the shiny
application
Thomas Naake
library(dplyr)
library(SummarizedExperiment)

## create se
set.seed(1)
a <- matrix(rnorm(100, mean = 10, sd = 2), nrow = 10, ncol = 10,
    dimnames = list(seq_len(10), paste("sample", seq_len(10))))
a[c(1, 5, 8), seq_len(5)] <- NA
cD <- data.frame(name = colnames(a), type = c(rep("1", 5), rep("2", 5)))
rD <- data.frame(spectra = rownames(a))
se <- SummarizedExperiment(assay = a, rowData = rD, colData = cD)
shinyQC(se)
The function sumDistSample creates a plot showing the sum of distances of a sample to all other samples.
sumDistSample(d, title = "raw")
d |
|
title |
|
gg
object from ggplot2
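The quantity that is plotted can be illustrated in base R. This sketch assumes Euclidean distances between samples computed with stats::dist; the distance method used by distShiny is not stated here, and the object names d and sum_dist are chosen for this example.

```r
## Toy matrix (features x samples); hypothetical data
a <- matrix(as.numeric(seq_len(100)), nrow = 10, ncol = 10,
    dimnames = list(seq_len(10), paste("sample", seq_len(10))))

## distance matrix between samples (samples are the columns of a)
d <- as.matrix(stats::dist(t(a), method = "euclidean"))

## sum of distances of each sample to all other samples
## (the diagonal is 0 and does not contribute)
sum_dist <- rowSums(d)
```

Samples with an unusually large sum of distances are far from most other samples and are candidates for outliers. In this toy example the first and last sample have the largest sums because the columns increase monotonically.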
a <- matrix(seq_len(100), nrow = 10, ncol = 10,
    dimnames = list(seq_len(10), paste("sample", seq_len(10))))
dist <- distShiny(a)
sumDistSample(dist, title = "raw")
The function tblPCALoadings
returns a tibble
with loadings
values for the features (row entries) in x
.
tblPCALoadings(x, params)
x |
|
params |
|
The function tblPCALoadings accesses the list entry rotation of the prcomp object.
tbl
Thomas Naake
set.seed(1)
x <- matrix(rnorm(seq_len(10000)), ncol = 100)
rownames(x) <- paste("feature", seq_len(nrow(x)))
colnames(x) <- paste("sample", seq_len(ncol(x)))
params <- list(method = "euclidean", ## dist
    initial_dims = 10, max_iter = 100, dims = 3, perplexity = 3, ## tSNE
    min_dist = 0.1, n_neighbors = 15, spread = 1) ## UMAP
tblPCALoadings(x, params)
data.frame
,
tbl
or matrix
The function transformAssay transforms the (count/intensity) values of a matrix. It uses either log, log2, log10, variance stabilizing normalisation (vsn) or no transformation method (pass-through, none). The object a has the samples in the columns and the features in the rows.
transformAssay( a, method = c("none", "log", "log2", "log10", "vsn"), .offset = 1 )
a |
|
method |
|
.offset |
|
Internal use in shinyQC
.
matrix
a <- matrix(seq_len(1000), nrow = 100, ncol = 10,
    dimnames = list(seq_len(100), paste("sample", seq_len(10))))
transformAssay(a, "none")
transformAssay(a, "log")
transformAssay(a, "log2")
transformAssay(a, "vsn")
The function upsetCategory
displays the frequency of measured values
per feature with respect to class/sample type to assess difference in
occurrences. Internally, the measured values per sample are obtained via
the measuredCategory
function: this function will access the number
of measured/missing values per category and feature. From this, a binary
tbl
will be created specifying if the feature is present/missing,
which will be given to the upset
function from the UpSetR
package.
upsetCategory(se, category = colnames(colData(se)), measured = TRUE)
se |
|
category |
|
measured |
|
Presence is defined by a feature being measured in at least one sample of a set.
Absence is defined by a feature with only missing values (i.e. no measured values) of a set.
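The presence/absence rule above amounts to thresholding the per-category counts of measured values. A minimal base R sketch, assuming a hypothetical count matrix of the kind measuredCategory produces (features in rows, sample types in columns); the object names measured and present are chosen for this example:

```r
## hypothetical counts of measured values per feature and sample type
measured <- cbind("1" = c(0, 5, 5), "2" = c(5, 5, 0))
rownames(measured) <- paste("feature", seq_len(3))

## present (1): measured in at least one sample of the set
## absent (0): only missing values in the set
present <- (measured > 0) * 1L
```

Here feature 1 is present only in type "2", feature 2 in both types, and feature 3 only in type "1". A binary table of this shape is what an UpSet plot uses to count the features per intersection of sets.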
upset
plot
## create se
a <- matrix(seq_len(100), nrow = 10, ncol = 10,
    dimnames = list(seq_len(10), paste("sample", seq_len(10))))
a[c(1, 5, 8), seq_len(5)] <- NA
set.seed(1)
a <- a + rnorm(100)
cD <- data.frame(name = colnames(a), type = c(rep("1", 5), rep("2", 5)))
rD <- data.frame(spectra = rownames(a))
se <- SummarizedExperiment::SummarizedExperiment(assay = a,
    rowData = rD, colData = cD)
upsetCategory(se, category = "type")
The function volcanoPlot creates a volcano plot. On the y-axis the -log10(p-values) are displayed, while on the x-axis the fold changes/differences are displayed. The output of the function differs depending on the type parameter. For type == "ttest", the fold changes are plotted; for type == "proDA", the differences are plotted.
volcanoPlot(df, type = c("ttest", "proDA"))
df |
|
type |
|
Internal use in shinyQC
.
plotly
## create se
a <- matrix(seq_len(100), nrow = 10, ncol = 10,
    dimnames = list(seq_len(10), paste("sample", seq_len(10))))
a[c(1, 5, 8), seq_len(5)] <- NA
set.seed(1)
a <- a + rnorm(100)
a_i <- imputeAssay(a, method = "MinDet")
cD <- data.frame(sample = colnames(a), type = c(rep("1", 5), rep("2", 5)))
rD <- data.frame(spectra = rownames(a))
se <- SummarizedExperiment::SummarizedExperiment(assay = a,
    rowData = rD, colData = cD)
se_i <- SummarizedExperiment::SummarizedExperiment(assay = a_i,
    rowData = rD, colData = cD)

## create model and contrast matrix
modelMatrix_expr <- stats::formula("~ 0 + type")
contrast_expr <- "type1-type2"
modelMatrix <- model.matrix(modelMatrix_expr,
    data = SummarizedExperiment::colData(se))
contrastMatrix <- limma::makeContrasts(contrasts = contrast_expr,
    levels = modelMatrix)

## ttest
fit <- limma::lmFit(a_i, design = modelMatrix)
fit <- limma::contrasts.fit(fit, contrastMatrix)
fit <- limma::eBayes(fit, trend = TRUE)
df_ttest <- limma::topTable(fit, n = Inf, adjust = "fdr", p = 0.05)
df_ttest <- cbind(name = rownames(df_ttest), df_ttest)

## plot
volcanoPlot(df_ttest, type = "ttest")

## proDA
fit <- proDA::proDA(a, design = modelMatrix)
df_proDA <- proDA::test_diff(fit = fit, contrast = contrast_expr,
    sort_by = "adj_pval")

## plot
volcanoPlot(df_proDA, type = "proDA")