The SEtools package is a set of convenience functions for the Bioconductor class SummarizedExperiment. It facilitates merging, melting, and plotting SummarizedExperiment objects.
NOTE that the heatmap-related and melting functions have been moved to a standalone package, sechm. The old sehm function of SEtools should be considered deprecated, and most SEtools functions are conserved for legacy/reproducibility reasons (or until they find a better home).
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("SEtools")
Or, to install the latest development version:
To showcase the main functions, we will use an example object which contains (a subset of) whole-hippocampus RNAseq of mice after different stressors:
suppressPackageStartupMessages({
    library(SummarizedExperiment)
    library(SEtools)
})
data("SE", package="SEtools")
SE
## class: SummarizedExperiment
## dim: 100 20
## metadata(0):
## assays(2): counts logcpm
## rownames(100): Egr1 Nr4a1 ... CH36-200G6.4 Bhlhe22
## rowData names(2): meanCPM meanTPM
## colnames(20): HC.Homecage.1 HC.Homecage.2 ... HC.Swim.4 HC.Swim.5
## colData names(2): Region Condition
This is taken from Floriou-Servou et al., Biol Psychiatry 2018.
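The merged object printed next can be reproduced along the following lines. This is a minimal sketch: it assumes that se1 and se2 are simply the first and last ten samples of SE, which is what the sample names and the se1/se2 prefixes in the output suggest.

```r
suppressPackageStartupMessages({
  library(SummarizedExperiment)
  library(SEtools)
})
data("SE", package="SEtools")

# Assumption: the two objects are the first and last ten samples of SE
se1 <- SE[, 1:10]
se2 <- SE[, 11:20]

# Merge the two objects; the list names are used as dataset prefixes
se3 <- mergeSEs(list(se1=se1, se2=se2))
se3
```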
## class: SummarizedExperiment
## dim: 100 20
## metadata(3): se1 se2 anno_colors
## assays(2): counts logcpm
## rownames(100): AC139063.2 Actr6 ... Zfp667 Zfp930
## rowData names(2): meanCPM meanTPM
## colnames(20): se1.HC.Homecage.1 se1.HC.Homecage.2 ... se2.HC.Swim.4
## se2.HC.Swim.5
## colData names(3): Dataset Region Condition
All assays were merged, along with rowData and colData slots.
By default, row z-scores are calculated for each object when merging. This can be prevented with:
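A minimal sketch of such a call, using the do.scale argument that also appears in the mergeBy example further down. Here se1 and se2 are assumed to be the first and last ten samples of SE:

```r
suppressPackageStartupMessages({
  library(SummarizedExperiment)
  library(SEtools)
})
data("SE", package="SEtools")
se1 <- SE[, 1:10]   # assumption: first ten samples
se2 <- SE[, 11:20]  # assumption: last ten samples

# do.scale=FALSE keeps the assay values as they are instead of row z-scores
se3 <- mergeSEs(list(se1=se1, se2=se2), do.scale=FALSE)
```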
If more than one assay is present, one can specify a different scaling behavior for each assay:
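As a sketch, the call below would leave the counts untouched and scale only logcpm. It assumes that use.assays selects and orders the assays and that the logical vector given to do.scale is matched to them in that order; both points should be checked against ?mergeSEs. As before in this vignette's examples, se1 and se2 are taken to be the first and last ten samples of SE, which is itself an assumption:

```r
suppressPackageStartupMessages({
  library(SummarizedExperiment)
  library(SEtools)
})
data("SE", package="SEtools")
se1 <- SE[, 1:10]   # assumption: first ten samples
se2 <- SE[, 11:20]  # assumption: last ten samples

# One do.scale value per assay: no scaling for counts, z-scores for logcpm
se3 <- mergeSEs(list(se1=se1, se2=se2),
                use.assays=c("counts", "logcpm"),
                do.scale=c(FALSE, TRUE))
```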
Differences from the cbind method include prefixes added to column names, optional scaling, and handling of metadata (e.g. for sechm).
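It is also possible to merge by rowData columns, which are specified through the mergeBy argument. Such merging can involve one-to-many and many-to-many mappings, in which case two behaviors are possible:
By default, all combinations are reported, so the same feature of one object may appear multiple times in the output if it matches multiple features of another object.
If a function is passed through aggFun, the features of each object will be aggregated by mergeBy using this function before merging.
rowData(se1)$metafeature <- sample(LETTERS,nrow(se1),replace = TRUE)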
rowData(se2)$metafeature <- sample(LETTERS,nrow(se2),replace = TRUE)
se3 <- mergeSEs( list(se1=se1, se2=se2), do.scale=FALSE, mergeBy="metafeature", aggFun=median)
## Aggregating the objects by metafeature
## Merging...
A single SE can also be aggregated by using the aggSE function:
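A sketch of a call that could produce output like the one shown next. It assumes that se1 holds the first ten samples of SE and carries a random metafeature column as in the example above; the by argument name follows the aggSE documentation as we understand it and should be checked against ?aggSE:

```r
suppressPackageStartupMessages({
  library(SummarizedExperiment)
  library(SEtools)
})
data("SE", package="SEtools")
se1 <- SE[, 1:10]  # assumption: first ten samples

# Random grouping of the features, as in the mergeBy example
set.seed(1)
rowData(se1)$metafeature <- sample(LETTERS, nrow(se1), replace=TRUE)

# Aggregate the rows sharing the same metafeature value
se1b <- aggSE(se1, by="metafeature")
se1b
```

Because the grouping is random, the number of rows of the result depends on how many distinct letters were drawn.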
## Aggregation methods for each assay:
## counts: sum; logcpm: expsum
## class: SummarizedExperiment
## dim: 25 10
## metadata(0):
## assays(2): counts logcpm
## rownames(25): A B ... X Y
## rowData names(0):
## colnames(10): HC.Homecage.1 HC.Homecage.2 ... HC.Handling.4
## HC.Handling.5
## colData names(2): Region Condition
If the aggregation function(s) are not specified, aggSE will try to guess decent aggregation functions from the assay names. This is similar to scuttle::sumCountsAcrossFeatures, but preserves other SE slots.
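To make the sum aggregation reported above for the counts assay concrete, here is a base-R sketch on a toy matrix. It is independent of SEtools and is not aggSE's actual implementation; the matrix and the grouping vector are made up for illustration:

```r
# Toy count matrix: 4 features (rows) x 2 samples (columns)
counts <- matrix(c(1, 2, 3, 4,
                   5, 6, 7, 8),
                 nrow=4,
                 dimnames=list(c("g1", "g2", "g3", "g4"), c("s1", "s2")))

# Hypothetical grouping: g1 and g3 belong to "A", g2 and g4 to "B"
metafeature <- c("A", "B", "A", "B")

# Sum the rows that share the same group, separately for each sample
agg <- rowsum(counts, group=metafeature)
agg
```

The result has one row per group and keeps the sample columns, which is the shape of the aggregated assays in the aggSE output.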
## R version 4.4.2 (2024-10-31)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.2 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Etc/UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] grid stats4 stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] SEtools_1.21.0 sechm_1.15.0
## [3] ComplexHeatmap_2.23.0 SummarizedExperiment_1.37.0
## [5] Biobase_2.67.0 GenomicRanges_1.59.1
## [7] GenomeInfoDb_1.43.4 IRanges_2.41.3
## [9] S4Vectors_0.45.4 BiocGenerics_0.53.6
## [11] generics_0.1.3 MatrixGenerics_1.19.1
## [13] matrixStats_1.5.0 BiocStyle_2.35.0
##
## loaded via a namespace (and not attached):
## [1] DBI_1.2.3 rlang_1.1.5 magrittr_2.0.3
## [4] clue_0.3-66 GetoptLong_1.0.5 RSQLite_2.3.9
## [7] compiler_4.4.2 mgcv_1.9-1 png_0.1-8
## [10] vctrs_0.6.5 sva_3.55.0 stringr_1.5.1
## [13] pkgconfig_2.0.3 shape_1.4.6.1 crayon_1.5.3
## [16] fastmap_1.2.0 XVector_0.47.2 ca_0.71.1
## [19] rmarkdown_2.29 UCSC.utils_1.3.1 bit_4.5.0.1
## [22] xfun_0.50 cachem_1.1.0 jsonlite_1.8.9
## [25] blob_1.2.4 DelayedArray_0.33.6 BiocParallel_1.41.1
## [28] parallel_4.4.2 cluster_2.1.8 R6_2.6.1
## [31] bslib_0.9.0 stringi_1.8.4 RColorBrewer_1.1-3
## [34] limma_3.63.3 genefilter_1.89.0 jquerylib_0.1.4
## [37] Rcpp_1.0.14 iterators_1.0.14 knitr_1.49
## [40] splines_4.4.2 Matrix_1.7-2 abind_1.4-8
## [43] yaml_2.3.10 TSP_1.2-4 doParallel_1.0.17
## [46] codetools_0.2-20 curl_6.2.0 lattice_0.22-6
## [49] tibble_3.2.1 KEGGREST_1.47.0 evaluate_1.0.3
## [52] Rtsne_0.17 survival_3.8-3 zip_2.3.2
## [55] Biostrings_2.75.3 circlize_0.4.16 pillar_1.10.1
## [58] BiocManager_1.30.25 foreach_1.5.2 ggplot2_3.5.1
## [61] munsell_0.5.1 scales_1.3.0 xtable_1.8-4
## [64] glue_1.8.0 pheatmap_1.0.12 maketools_1.3.2
## [67] tools_4.4.2 sys_3.4.3 data.table_1.16.4
## [70] openxlsx_4.2.8 annotate_1.85.0 locfit_1.5-9.11
## [73] registry_0.5-1 buildtools_1.0.0 XML_3.99-0.18
## [76] seriation_1.5.7 AnnotationDbi_1.69.0 edgeR_4.5.2
## [79] colorspace_2.1-1 nlme_3.1-167 GenomeInfoDbData_1.2.13
## [82] randomcoloR_1.1.0.1 cli_3.6.4 S4Arrays_1.7.3
## [85] V8_6.0.1 gtable_0.3.6 DESeq2_1.47.3
## [88] sass_0.4.9 digest_0.6.37 SparseArray_1.7.5
## [91] rjson_0.2.23 memoise_2.0.1 htmltools_0.5.8.1
## [94] lifecycle_1.0.4 httr_1.4.7 GlobalOptions_0.1.2
## [97] statmod_1.5.0 bit64_4.6.0-1