The following examples demonstrate how to shuttle data between DESeqDataSet and DGEList containers. Before we start, let’s first load the required packages.
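The chunk that loads the packages is not shown in the text. A minimal sketch, assuming the three packages named in this document (DESeq2, edgeR and DEFormats) are already installed:

```r
# Attach the two analysis packages and the converter package.
# Assumption: all three are installed from Bioconductor.
library(DESeq2)
library(edgeR)
library(DEFormats)
```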
We illustrate the functionality of the package on a mock expression data set from an RNA-seq experiment. The sample counts table can be generated with the simulateRnaSeqData function provided by DEFormats.
The resulting object is a matrix with rows corresponding to genomic features and columns to samples.
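A sketch of how such a matrix could be generated and inspected. It assumes that simulateRnaSeqData is called with its default arguments and that the listing of the first rows comes from head; the variable name counts is chosen to match the later text.

```r
library(DEFormats)

# Simulate a mock count matrix (default arguments assumed).
counts <- simulateRnaSeqData()

# Show the first few genomic features across all samples.
head(counts)
```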
##       sample1 sample2 sample3 sample4 sample5 sample6
## gene1      85      76     103     107     140     124
## gene2       1       6      11       6       1       4
## gene3      80      98      39      82      97     113
## gene4      92      83      59      85     100      98
## gene5      36      24      18      50      22      15
## gene6       0       0       1       4       2       3
In order to construct a DGEList object we need to provide, apart from the counts matrix, the sample grouping.
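A minimal sketch of this construction. The grouping vector (three samples in group A followed by three in group B) is inferred from the printed samples table, and the setup lines are repeated so that the snippet runs on its own.

```r
library(edgeR)
library(DEFormats)

counts <- simulateRnaSeqData()

# Sample grouping: first three samples in group A, last three in group B.
group <- rep(c("A", "B"), each = 3)

# Build the edgeR container from the counts and the grouping.
dge <- DGEList(counts, group = group)
dge
```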
## An object of class "DGEList"
## $counts
##       sample1 sample2 sample3 sample4 sample5 sample6
## gene1      85      76     103     107     140     124
## gene2       1       6      11       6       1       4
## gene3      80      98      39      82      97     113
## gene4      92      83      59      85     100      98
## gene5      36      24      18      50      22      15
## 995 more rows ...
##
## $samples
##         group lib.size norm.factors
## sample1     A    42832            1
## sample2     A    40511            1
## sample3     A    39299            1
## sample4     B    43451            1
## sample5     B    40613            1
## sample6     B    43662            1
A DGEList object can be easily converted to a DESeqDataSet object with the help of the function as.DESeqDataSet.
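The conversion can be sketched as follows. The setup that rebuilds dge is repeated only to keep the snippet self-contained, and the variable name dds follows the later text.

```r
library(DESeq2)
library(edgeR)
library(DEFormats)

# Rebuild the DGEList used in the text (grouping assumed as A, A, A, B, B, B).
counts <- simulateRnaSeqData()
dge <- DGEList(counts, group = rep(c("A", "B"), each = 3))

# Coerce the edgeR container to the DESeq2 container.
dds <- as.DESeqDataSet(dge)
dds
```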
## class: DESeqDataSet
## dim: 1000 6
## metadata(1): version
## assays(1): counts
## rownames(1000): gene1 gene2 ... gene999 gene1000
## rowData names(0):
## colnames(6): sample1 sample2 ... sample5 sample6
## colData names(3): group lib.size norm.factors
Just to make sure that the coercions conserve the data and metadata, we now convert dds back to a DGEList and compare the result to the original dge object.
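A sketch of this round-trip check, assuming the comparison is made with identical; the setup is repeated so that the snippet can be run in isolation.

```r
library(DESeq2)
library(edgeR)
library(DEFormats)

counts <- simulateRnaSeqData()
dge <- DGEList(counts, group = rep(c("A", "B"), each = 3))
dds <- as.DESeqDataSet(dge)

# Convert back to a DGEList and compare with the original object.
roundtrip <- as.DGEList(dds)
identical(dge, roundtrip)
```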
## [1] TRUE
Note that, because of the use of reference classes in the SummarizedExperiment class, which DESeqDataSet extends, identical may return FALSE for any two DESeqDataSet instances, even for ones constructed from the same input. In the session recorded here, the comparison nevertheless returns TRUE:
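One way to build two such instances from the same input is sketched here. The use of DESeqDataSetFromMatrix and the column name condition are assumptions, chosen because a character design variable triggers the kind of factor-conversion warning shown in the output.

```r
library(DESeq2)
library(DEFormats)

counts <- simulateRnaSeqData()

# Sample annotation with a character column (assumed name: condition).
coldata <- data.frame(condition = rep(c("A", "B"), each = 3))

# Two objects constructed independently from identical inputs.
dds1 <- DESeqDataSetFromMatrix(counts, coldata, ~ condition)
dds2 <- DESeqDataSetFromMatrix(counts, coldata, ~ condition)

# The result can depend on the installed SummarizedExperiment version.
identical(dds1, dds2)
```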
## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors
## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors
## [1] TRUE
Instead of a count matrix, simulateRnaSeqData can also return an annotated RangedSummarizedExperiment object. The advantage of such an object is that, apart from the counts matrix stored in the assay slot, it also contains the sample description in colData and gene information stored in rowRanges as a GRanges object.
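A minimal sketch of requesting this output type. It assumes the type is selected through the output argument of simulateRnaSeqData; the variable name se follows the later text.

```r
library(DESeq2)     # also attaches SummarizedExperiment
library(DEFormats)

# Ask for an annotated container instead of a plain matrix.
se <- simulateRnaSeqData(output = "RangedSummarizedExperiment")
se

# Sample description and gene ranges travel with the counts.
colData(se)
rowRanges(se)
```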
## class: RangedSummarizedExperiment
## dim: 1000 6
## metadata(1): version
## assays(1): counts
## rownames(1000): gene1 gene2 ... gene999 gene1000
## rowData names(3): trueIntercept trueBeta trueDisp
## colnames(6): sample1 sample2 ... sample5 sample6
## colData names(1): condition
The se object can be readily used to construct a DESeqDataSet object, which can be converted to a DGEList using the familiar method.
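These two steps can be sketched as follows. The design formula ~ condition is an assumption based on the single colData column reported for se.

```r
library(DESeq2)
library(edgeR)
library(DEFormats)

se <- simulateRnaSeqData(output = "RangedSummarizedExperiment")

# Construct the DESeq2 container directly from the annotated object.
dds <- DESeqDataSet(se, design = ~ condition)

# Convert it to the edgeR container.
dge <- as.DGEList(dds)
dge
```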
## An object of class "DGEList"
## $counts
##       sample1 sample2 sample3 sample4 sample5 sample6
## gene1      85      76     103     107     140     124
## gene2       1       6      11       6       1       4
## gene3      80      98      39      82      97     113
## gene4      92      83      59      85     100      98
## gene5      36      24      18      50      22      15
## 995 more rows ...
##
## $samples
##         group lib.size norm.factors
## sample1     A    42832            1
## sample2     A    40511            1
## sample3     A    39299            1
## sample4     B    43451            1
## sample5     B    40613            1
## sample6     B    43662            1
##
## $genes
##       seqnames start end width strand trueIntercept trueBeta  trueDisp
## gene1        1     1 100   100      *      6.525909        0 0.1434076
## gene2        1   101 200   100      *      3.347533        0 0.4929634
## gene3        1   201 300   100      *      6.659599        0 0.1395659
## gene4        1   301 400   100      *      6.544859        0 0.1428412
## gene5        1   401 500   100      *      4.829283        0 0.2407022
## 995 more rows ...
Note the additional genes element on the dge list compared to the object from the previous section.
As with the DESeqDataSet constructor from DESeq2, it is possible to directly use RangedSummarizedExperiment objects as input to the generic DGEList constructor defined in DEFormats. This allows you to use common input objects regardless of whether you are applying DESeq2 or edgeR to perform your analysis, or to easily switch between these two tools in your pipeline.
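A sketch of calling the generic constructor on the annotated object. Renaming the condition column of colData(se) to group is done here so that the sample grouping is picked up; the exact form of the renaming call is an assumption.

```r
library(DESeq2)
library(edgeR)
library(DEFormats)

se <- simulateRnaSeqData(output = "RangedSummarizedExperiment")

# Rename the only colData column (condition) to group.
names(colData(se)) <- "group"

# DGEList dispatches on the RangedSummarizedExperiment input.
dge <- DGEList(se)
dge
```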
## An object of class "DGEList"
## $counts
##       sample1 sample2 sample3 sample4 sample5 sample6
## gene1      85      76     103     107     140     124
## gene2       1       6      11       6       1       4
## gene3      80      98      39      82      97     113
## gene4      92      83      59      85     100      98
## gene5      36      24      18      50      22      15
## 995 more rows ...
##
## $samples
##         group lib.size norm.factors
## sample1     A    42832            1
## sample2     A    40511            1
## sample3     A    39299            1
## sample4     B    43451            1
## sample5     B    40613            1
## sample6     B    43662            1
##
## $genes
##       seqnames start end width strand trueIntercept trueBeta  trueDisp
## gene1        1     1 100   100      *      6.525909        0 0.1434076
## gene2        1   101 200   100      *      3.347533        0 0.4929634
## gene3        1   201 300   100      *      6.659599        0 0.1395659
## gene4        1   301 400   100      *      6.544859        0 0.1428412
## gene5        1   401 500   100      *      4.829283        0 0.2407022
## 995 more rows ...
We renamed the condition column of colData(se) to group in order to allow edgeR to automatically pick up the correct sample grouping. Another way of specifying this is through the argument group to DGEList.
Even though there is no direct method to go from a DGEList to a RangedSummarizedExperiment, the coercion can be easily performed by first converting the object to a DESeqDataSet, which is a subclass of RangedSummarizedExperiment, and then coercing the resulting object to its superclass.
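The two-step coercion described above can be sketched like this. The use of as with the superclass name, and the variable name rse, are assumptions; the setup is repeated so that the snippet is self-contained.

```r
library(DESeq2)
library(edgeR)
library(DEFormats)

# Start from a DGEList that carries gene annotation.
se <- simulateRnaSeqData(output = "RangedSummarizedExperiment")
names(colData(se)) <- "group"
dge <- DGEList(se)

# Step 1: DGEList to DESeqDataSet.
dds <- as.DESeqDataSet(dge)

# Step 2: coerce the DESeqDataSet to its superclass.
rse <- as(dds, "RangedSummarizedExperiment")
rse
```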
## class: RangedSummarizedExperiment
## dim: 1000 6
## metadata(1): version
## assays(1): counts
## rownames(1000): gene1 gene2 ... gene999 gene1000
## rowData names(3): trueIntercept trueBeta trueDisp
## colnames(6): sample1 sample2 ... sample5 sample6
## colData names(3): group lib.size norm.factors
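The listing that follows records the R session in which the examples were run. Such a record is typically produced with the base R function sessionInfo; that this exact call was used here is an assumption.

```r
# Print R version, platform, locale and package versions.
info <- sessionInfo()
info
```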
## R version 4.4.1 (2024-06-14)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Etc/UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] DEFormats_1.35.0 edgeR_4.3.21
## [3] limma_3.61.12 DESeq2_1.45.3
## [5] SummarizedExperiment_1.35.5 Biobase_2.67.0
## [7] MatrixGenerics_1.17.1 matrixStats_1.4.1
## [9] GenomicRanges_1.57.2 GenomeInfoDb_1.41.2
## [11] IRanges_2.39.2 S4Vectors_0.43.2
## [13] BiocGenerics_0.53.0 BiocStyle_2.35.0
##
## loaded via a namespace (and not attached):
## [1] gtable_0.3.6 xfun_0.48 bslib_0.8.0
## [4] ggplot2_3.5.1 lattice_0.22-6 vctrs_0.6.5
## [7] tools_4.4.1 parallel_4.4.1 tibble_3.2.1
## [10] fansi_1.0.6 pkgconfig_2.0.3 Matrix_1.7-1
## [13] checkmate_2.3.2 data.table_1.16.2 lifecycle_1.0.4
## [16] GenomeInfoDbData_1.2.13 compiler_4.4.1 statmod_1.5.0
## [19] munsell_0.5.1 codetools_0.2-20 htmltools_0.5.8.1
## [22] sys_3.4.3 buildtools_1.0.0 sass_0.4.9
## [25] yaml_2.3.10 pillar_1.9.0 crayon_1.5.3
## [28] jquerylib_0.1.4 BiocParallel_1.39.0 DelayedArray_0.31.14
## [31] cachem_1.1.0 abind_1.4-8 locfit_1.5-9.10
## [34] digest_0.6.37 maketools_1.3.1 fastmap_1.2.0
## [37] grid_4.4.1 colorspace_2.1-1 cli_3.6.3
## [40] SparseArray_1.5.45 magrittr_2.0.3 S4Arrays_1.5.11
## [43] utf8_1.2.4 backports_1.5.0 scales_1.3.0
## [46] UCSC.utils_1.1.0 rmarkdown_2.28 XVector_0.45.0
## [49] httr_1.4.7 evaluate_1.0.1 knitr_1.48
## [52] rlang_1.1.4 Rcpp_1.0.13 glue_1.8.0
## [55] BiocManager_1.30.25 jsonlite_1.8.9 R6_2.5.1
## [58] zlibbioc_1.51.2