Compiled date: 2024-12-27
Last edited: 2023-12-14
License: GPL-3
Run the following code to install the Bioconductor version of the package.
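A minimal sketch of the usual Bioconductor installation route, assuming the package in question is POMA (the package used throughout this vignette):

```r
# Install BiocManager from CRAN if it is not already available
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}

# Install POMA from Bioconductor (package name assumed from the rest of this vignette)
BiocManager::install("POMA")
```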
Let’s create a cleaned SummarizedExperiment object from the sample data st000336 to explore the normalization effects.
library(POMA)

example_data <- st000336 %>%
  PomaImpute() # KNN imputation
2 features removed.
example_data
class: SummarizedExperiment
dim: 29 57
metadata(0):
assays(1): ''
rownames(29): x1_methylhistidine x3_methylhistidine ... pyruvate
succinate
rowData names(0):
colnames(57): 1 2 ... 56 57
colData names(2): group steroids
Here we apply each normalization method that POMA offers to the same SummarizedExperiment object so that they can be compared (Berg et al. 2006).
none <- PomaNorm(example_data, method = "none")
auto_scaling <- PomaNorm(example_data, method = "auto_scaling")
level_scaling <- PomaNorm(example_data, method = "level_scaling")
log_scaling <- PomaNorm(example_data, method = "log_scaling")
log_transformation <- PomaNorm(example_data, method = "log")
vast_scaling <- PomaNorm(example_data, method = "vast_scaling")
log_pareto <- PomaNorm(example_data, method = "log_pareto")
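To make the differences between these methods concrete, the following base R sketch applies the textbook per-feature formulas to a toy matrix. It is not POMA's internal code: the toy values, the helper names (`scale_rows`, `auto_scale`, and so on), the `+ 1` offset before the logarithm, and the reading of "log scaling" and "log Pareto" as a log transformation followed by auto or Pareto scaling are all assumptions made for illustration.

```r
# Toy matrix: 3 features (rows) x 4 samples (columns); values are made up.
x <- matrix(c( 1,  2,  3,  4,
              10, 20, 30, 40,
               5,  5,  6,  8),
            nrow = 3, byrow = TRUE,
            dimnames = list(paste0("feature", 1:3), paste0("sample", 1:4)))

# Apply a function to every feature (row) and keep the features-by-samples layout.
scale_rows <- function(m, f) t(apply(m, 1, f))

# Textbook per-feature definitions (hypothetical helper names).
auto_scale   <- function(v) (v - mean(v)) / sd(v)                         # zero mean, unit variance
level_scale  <- function(v) (v - mean(v)) / mean(v)                       # relative to the feature mean
vast_scale   <- function(v) ((v - mean(v)) / sd(v)) * (mean(v) / sd(v))   # auto scaling weighted by mean / sd
pareto_scale <- function(v) (v - mean(v)) / sqrt(sd(v))                   # divide by the square root of the sd

# The + 1 offset is an assumption that guards against log(0).
log_x <- log(x + 1)

auto_x       <- scale_rows(x, auto_scale)
level_x      <- scale_rows(x, level_scale)
vast_x       <- scale_rows(x, vast_scale)
log_scaled_x <- scale_rows(log_x, auto_scale)
log_pareto_x <- scale_rows(log_x, pareto_scale)

# Each auto-scaled feature now has mean 0 and standard deviation 1.
round(rowMeans(auto_x), 10)
apply(auto_x, 1, sd)
```

Because every formula works feature by feature, none of them changes the shape of the matrix; they differ only in how strongly high-abundance or high-variance features are down-weighted.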
Checking the dimensions of the data after normalization shows that none of the methods changes them. PomaNorm only modifies the data dimensions when the dataset contains all-zero or zero-variance features.
dim(SummarizedExperiment::assay(none))
> [1] 29 57
dim(SummarizedExperiment::assay(auto_scaling))
> [1] 29 57
dim(SummarizedExperiment::assay(level_scaling))
> [1] 29 57
dim(SummarizedExperiment::assay(log_scaling))
> [1] 29 57
dim(SummarizedExperiment::assay(log_transformation))
> [1] 29 57
dim(SummarizedExperiment::assay(vast_scaling))
> [1] 29 57
dim(SummarizedExperiment::assay(log_pareto))
> [1] 29 57
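The case in which the dimensions would change can be illustrated with a small base R sketch. It only shows how all-zero and zero-variance features can be detected and dropped from a features-by-samples matrix; the toy values and variable names are assumptions, and this is not POMA's own implementation.

```r
# Toy matrix: 4 features (rows) x 4 samples (columns); values are made up.
x <- matrix(c(1, 2, 3,  4,
              0, 0, 0,  0,   # all-zero feature
              7, 7, 7,  7,   # constant (zero-variance) feature
              2, 4, 8, 16),
            nrow = 4, byrow = TRUE)

# Per-feature variance; an all-zero feature is a special case of zero variance.
feature_var <- apply(x, 1, var)

# Keep only the features that actually vary across samples.
keep <- feature_var > 0
x_filtered <- x[keep, , drop = FALSE]

dim(x)
dim(x_filtered)
```

Such features have to be handled before scaling because dividing by a standard deviation of zero is undefined.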
Here we can evaluate the normalization effects on samples (Berg et al. 2006).
a <- PomaBoxplots(none,
                  x = "samples") +
  ggplot2::ggtitle("Not Normalized")

b <- PomaBoxplots(auto_scaling,
                  x = "samples",
                  theme_params = list(legend_title = FALSE, legend_position = "none")) +
  ggplot2::ggtitle("Auto Scaling") +
  ggplot2::theme(axis.text.x = ggplot2::element_blank())

c <- PomaBoxplots(level_scaling,
                  x = "samples",
                  theme_params = list(legend_title = FALSE, legend_position = "none")) +
  ggplot2::ggtitle("Level Scaling") +
  ggplot2::theme(axis.text.x = ggplot2::element_blank())

d <- PomaBoxplots(log_scaling,
                  x = "samples",
                  theme_params = list(legend_title = FALSE, legend_position = "none")) +
  ggplot2::ggtitle("Log Scaling") +
  ggplot2::theme(axis.text.x = ggplot2::element_blank())

e <- PomaBoxplots(log_transformation,
                  x = "samples",
                  theme_params = list(legend_title = FALSE, legend_position = "none")) +
  ggplot2::ggtitle("Log Transformation") +
  ggplot2::theme(axis.text.x = ggplot2::element_blank())

f <- PomaBoxplots(vast_scaling,
                  x = "samples",
                  theme_params = list(legend_title = FALSE, legend_position = "none")) +
  ggplot2::ggtitle("Vast Scaling") +
  ggplot2::theme(axis.text.x = ggplot2::element_blank())

g <- PomaBoxplots(log_pareto,
                  x = "samples",
                  theme_params = list(legend_title = FALSE, legend_position = "none")) +
  ggplot2::ggtitle("Log Pareto") +
  ggplot2::theme(axis.text.x = ggplot2::element_blank())
a
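What the per-sample boxplots compare can also be checked numerically. The sketch below uses simulated data rather than st000336, and the variable names are assumptions: it computes the five-number summary of each sample before and after a textbook auto scaling of the features.

```r
set.seed(1)

# Simulated features-by-samples matrix: 5 features on very different scales, 6 samples.
feature_means <- c(1, 10, 100, 1000, 10000)
x <- t(sapply(feature_means, function(m) rnorm(6, mean = m, sd = m / 10)))

# Textbook auto scaling applied to each feature (row).
auto_x <- t(apply(x, 1, function(v) (v - mean(v)) / sd(v)))

# Five-number summary (min, lower hinge, median, upper hinge, max) of each sample (column).
raw_summary  <- apply(x, 2, fivenum)
auto_summary <- apply(auto_x, 2, fivenum)

raw_summary
auto_summary
```

Before scaling, each sample's summary spans several orders of magnitude because the high-abundance features dominate; after scaling, all samples sit on a common scale centred near zero.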
Here we can evaluate the normalization effects on features.
h <- PomaDensity(none,
                 x = "features",
                 theme_params = list(legend_title = FALSE, legend_position = "none")) +
  ggplot2::ggtitle("Not Normalized")

i <- PomaDensity(auto_scaling,
                 x = "features",
                 theme_params = list(legend_title = FALSE, legend_position = "none")) +
  ggplot2::ggtitle("Auto Scaling") +
  ggplot2::theme(axis.title.x = ggplot2::element_blank(),
                 axis.title.y = ggplot2::element_blank())

j <- PomaDensity(level_scaling,
                 x = "features",
                 theme_params = list(legend_title = FALSE, legend_position = "none")) +
  ggplot2::ggtitle("Level Scaling") +
  ggplot2::theme(axis.title.x = ggplot2::element_blank(),
                 axis.title.y = ggplot2::element_blank())

k <- PomaDensity(log_scaling,
                 x = "features",
                 theme_params = list(legend_title = FALSE, legend_position = "none")) +
  ggplot2::ggtitle("Log Scaling") +
  ggplot2::theme(axis.title.x = ggplot2::element_blank(),
                 axis.title.y = ggplot2::element_blank())

l <- PomaDensity(log_transformation,
                 x = "features",
                 theme_params = list(legend_title = FALSE, legend_position = "none")) +
  ggplot2::ggtitle("Log Transformation") +
  ggplot2::theme(axis.title.x = ggplot2::element_blank(),
                 axis.title.y = ggplot2::element_blank())

m <- PomaDensity(vast_scaling,
                 x = "features",
                 theme_params = list(legend_title = FALSE, legend_position = "none")) +
  ggplot2::ggtitle("Vast Scaling") +
  ggplot2::theme(axis.title.x = ggplot2::element_blank(),
                 axis.title.y = ggplot2::element_blank())

n <- PomaDensity(log_pareto,
                 x = "features",
                 theme_params = list(legend_title = FALSE, legend_position = "none")) +
  ggplot2::ggtitle("Log Pareto") +
  ggplot2::theme(axis.title.x = ggplot2::element_blank(),
                 axis.title.y = ggplot2::element_blank())
h
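The effect that the feature density plots are meant to reveal can be reproduced on a single simulated feature. This sketch assumes a right-skewed (log-normal) feature, which is a common but not universal shape for metabolite intensities; the `skewness` helper is a hypothetical convenience function, not part of POMA.

```r
set.seed(42)

# Simulated right-skewed feature measured in 200 samples.
v <- rlnorm(200, meanlog = 2, sdlog = 1)

# Simple moment-based sample skewness (hypothetical helper).
skewness <- function(z) mean((z - mean(z))^3) / sd(z)^3

# Kernel density estimates before and after log transformation.
d_raw <- density(v)
d_log <- density(log(v))

skewness(v)       # strongly positive: long right tail
skewness(log(v))  # close to zero: roughly symmetric

# Uncomment to compare the two curves visually:
# plot(d_raw, main = "Raw"); plot(d_log, main = "Log-transformed")
```

A transformation that brings the skewness close to zero produces the more symmetric, bell-shaped density curves that many downstream statistical methods assume.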
sessionInfo()
> R version 4.4.2 (2024-10-31)
> Platform: x86_64-pc-linux-gnu
> Running under: Ubuntu 24.04.1 LTS
>
> Matrix products: default
> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> time zone: Etc/UTC
> tzcode source: system (glibc)
>
> attached base packages:
> [1] stats4 stats graphics grDevices utils datasets methods
> [8] base
>
> other attached packages:
> [1] SummarizedExperiment_1.37.0 Biobase_2.67.0
> [3] GenomicRanges_1.59.1 GenomeInfoDb_1.43.2
> [5] IRanges_2.41.2 S4Vectors_0.45.2
> [7] BiocGenerics_0.53.3 generics_0.1.3
> [9] MatrixGenerics_1.19.0 matrixStats_1.4.1
> [11] patchwork_1.3.0 ggtext_0.1.2
> [13] POMA_1.17.6 BiocStyle_2.35.0
>
> loaded via a namespace (and not attached):
> [1] gtable_0.3.6 impute_1.81.0 xfun_0.49
> [4] bslib_0.8.0 ggplot2_3.5.1 lattice_0.22-6
> [7] vctrs_0.6.5 tools_4.4.2 tibble_3.2.1
> [10] pkgconfig_2.0.3 Matrix_1.7-1 lifecycle_1.0.4
> [13] GenomeInfoDbData_1.2.13 stringr_1.5.1 compiler_4.4.2
> [16] farver_2.1.2 munsell_0.5.1 htmltools_0.5.8.1
> [19] sys_3.4.3 buildtools_1.0.0 sass_0.4.9
> [22] yaml_2.3.10 pillar_1.10.0 crayon_1.5.3
> [25] jquerylib_0.1.4 tidyr_1.3.1 cachem_1.1.0
> [28] DelayedArray_0.33.3 abind_1.4-8 commonmark_1.9.2
> [31] tidyselect_1.2.1 digest_0.6.37 stringi_1.8.4
> [34] dplyr_1.1.4 purrr_1.0.2 maketools_1.3.1
> [37] labeling_0.4.3 fastmap_1.2.0 grid_4.4.2
> [40] colorspace_2.1-1 cli_3.6.3 SparseArray_1.7.2
> [43] magrittr_2.0.3 S4Arrays_1.7.1 withr_3.0.2
> [46] scales_1.3.0 UCSC.utils_1.3.0 rmarkdown_2.29
> [49] XVector_0.47.1 httr_1.4.7 evaluate_1.0.1
> [52] knitr_1.49 viridisLite_0.4.2 markdown_1.13
> [55] rlang_1.1.4 gridtext_0.1.5 Rcpp_1.0.13-1
> [58] glue_1.8.0 BiocManager_1.30.25 xml2_1.3.6
> [61] jsonlite_1.8.9 R6_2.5.1