Brings SummarizedExperiment to the tidyverse!
website: stemangiola.github.io/tidySummarizedExperiment/
tidySummarizedExperiment provides a bridge between Bioconductor SummarizedExperiment (Morgan et al. 2020) and the tidyverse (Wickham et al. 2019). It creates an invisible layer that enables viewing the Bioconductor SummarizedExperiment object as a tidyverse tibble, and provides SummarizedExperiment-compatible dplyr, tidyr, ggplot and plotly functions. This allows users to get the best of both Bioconductor and tidyverse worlds.
SummarizedExperiment-compatible Functions | Description |
---|---|
all | After all, tidySummarizedExperiment is a SummarizedExperiment object, just better |

tidyverse Packages | Description |
---|---|
dplyr | Almost all dplyr APIs like for any tibble |
tidyr | Almost all tidyr APIs like for any tibble |
ggplot2 | ggplot like for any tibble |
plotly | plot_ly like for any tibble |

Utilities | Description |
---|---|
as_tibble | Convert cell-wise information to a tbl_df |
tidySummarizedExperiment, the best of both worlds!
This is a SummarizedExperiment object, but it is evaluated as a tibble, so it is fully compatible with both the SummarizedExperiment and tidyverse APIs.
It looks like a tibble
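As a minimal sketch, the object could be loaded and printed as follows. This assumes the `pasilla` example dataset bundled with the package; the name `pasilla_tidy` matches the one used later in this document.

```r
library(tidySummarizedExperiment)

# Assumption: `pasilla` is the example SummarizedExperiment shipped with the package
pasilla_tidy <- tidySummarizedExperiment::pasilla

# Printing uses the tibble-like abstraction rather than the usual S4 summary
pasilla_tidy
```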
## # A SummarizedExperiment-tibble abstraction: 102,193 × 5
## # Features=14599 | Samples=7 | Assays=counts
## .feature .sample counts condition type
## <chr> <chr> <int> <chr> <chr>
## 1 FBgn0000003 untrt1 0 untreated single_end
## 2 FBgn0000008 untrt1 92 untreated single_end
## 3 FBgn0000014 untrt1 5 untreated single_end
## 4 FBgn0000015 untrt1 0 untreated single_end
## 5 FBgn0000017 untrt1 4664 untreated single_end
## 6 FBgn0000018 untrt1 583 untreated single_end
## 7 FBgn0000022 untrt1 0 untreated single_end
## 8 FBgn0000024 untrt1 10 untreated single_end
## 9 FBgn0000028 untrt1 0 untreated single_end
## 10 FBgn0000032 untrt1 1446 untreated single_end
## # ℹ 40 more rows
But it is a SummarizedExperiment object after all
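Standard SummarizedExperiment accessors should keep working on the same object. The sketch below, again assuming the bundled `pasilla` dataset, lists its assays.

```r
library(tidySummarizedExperiment)

# Assumption: `pasilla` is the example dataset shipped with the package
pasilla_tidy <- tidySummarizedExperiment::pasilla

# assays() is the regular SummarizedExperiment accessor
assays(pasilla_tidy)
```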
## List of length 1
## names(1): counts
We can use tidyverse commands to explore the tidy SummarizedExperiment object.
We can use slice to choose rows by position, for example to choose the first row.
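A minimal sketch of such a call, assuming the bundled `pasilla` dataset and dplyr attached for the pipe (`first_row` is an illustrative name):

```r
library(tidySummarizedExperiment)
library(dplyr)

# Assumption: `pasilla` is the example dataset shipped with the package
pasilla_tidy <- tidySummarizedExperiment::pasilla

# Keep only the first feature/sample row
first_row <- pasilla_tidy %>% slice(1)
first_row
```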
## # A SummarizedExperiment-tibble abstraction: 1 × 5
## # Features=1 | Samples=1 | Assays=counts
## .feature .sample counts condition type
## <chr> <chr> <int> <chr> <chr>
## 1 FBgn0000003 untrt1 0 untreated single_end
We can use filter to choose rows by criteria.
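The sketch below keeps the untreated samples. The exact condition is an assumption inferred from the output that follows, which retains four samples, all labelled untreated.

```r
library(tidySummarizedExperiment)
library(dplyr)

# Assumption: `pasilla` is the example dataset shipped with the package
pasilla_tidy <- tidySummarizedExperiment::pasilla

# Filtering on a sample-level column drops whole samples from the object
untreated <- pasilla_tidy %>% filter(condition == "untreated")
untreated
```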
## # A SummarizedExperiment-tibble abstraction: 58,396 × 5
## # Features=14599 | Samples=4 | Assays=counts
## .feature .sample counts condition type
## <chr> <chr> <int> <chr> <chr>
## 1 FBgn0000003 untrt1 0 untreated single_end
## 2 FBgn0000008 untrt1 92 untreated single_end
## 3 FBgn0000014 untrt1 5 untreated single_end
## 4 FBgn0000015 untrt1 0 untreated single_end
## 5 FBgn0000017 untrt1 4664 untreated single_end
## 6 FBgn0000018 untrt1 583 untreated single_end
## 7 FBgn0000022 untrt1 0 untreated single_end
## 8 FBgn0000024 untrt1 10 untreated single_end
## 9 FBgn0000028 untrt1 0 untreated single_end
## 10 FBgn0000032 untrt1 1446 untreated single_end
## # ℹ 40 more rows
We can use select to choose columns.
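For instance, selecting only the `.sample` column might look like the sketch below (bundled `pasilla` dataset assumed). Because the key columns are no longer all present, the result is a plain tibble rather than a SummarizedExperiment.

```r
library(tidySummarizedExperiment)
library(dplyr)

# Assumption: `pasilla` is the example dataset shipped with the package
pasilla_tidy <- tidySummarizedExperiment::pasilla

# Dropping .feature and counts leaves a plain tibble
samples_only <- pasilla_tidy %>% select(.sample)
samples_only
```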
## # A tibble: 102,193 × 1
## .sample
## <chr>
## 1 untrt1
## 2 untrt1
## 3 untrt1
## 4 untrt1
## 5 untrt1
## 6 untrt1
## 7 untrt1
## 8 untrt1
## 9 untrt1
## 10 untrt1
## # ℹ 102,183 more rows
We can use count to count how many rows we have for each sample.
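A sketch of the counting step, assuming the bundled `pasilla` dataset (`per_sample` is an illustrative name):

```r
library(tidySummarizedExperiment)
library(dplyr)

# Assumption: `pasilla` is the example dataset shipped with the package
pasilla_tidy <- tidySummarizedExperiment::pasilla

# One row per sample, with n = number of features measured in that sample
per_sample <- pasilla_tidy %>% count(.sample)
per_sample
```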
## # A tibble: 7 × 2
## .sample n
## <chr> <int>
## 1 trt1 14599
## 2 trt2 14599
## 3 trt3 14599
## 4 untrt1 14599
## 5 untrt2 14599
## 6 untrt3 14599
## 7 untrt4 14599
We can use distinct to see what distinct sample information we have.
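A sketch under the same assumptions as above. The three column names are taken from the output that follows.

```r
library(tidySummarizedExperiment)
library(dplyr)

# Assumption: `pasilla` is the example dataset shipped with the package
pasilla_tidy <- tidySummarizedExperiment::pasilla

# Collapse the feature-by-sample rows to one row per sample annotation
sample_info <- pasilla_tidy %>% distinct(.sample, condition, type)
sample_info
```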
## # A tibble: 7 × 3
## .sample condition type
## <chr> <chr> <chr>
## 1 untrt1 untreated single_end
## 2 untrt2 untreated single_end
## 3 untrt3 untreated paired_end
## 4 untrt4 untreated paired_end
## 5 trt1 treated single_end
## 6 trt2 treated paired_end
## 7 trt3 treated paired_end
We could use rename to rename a column. For example, to modify the type column name.
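A sketch of the renaming, assuming the bundled `pasilla` dataset. The new name `sequencing` is the one visible in the output that follows.

```r
library(tidySummarizedExperiment)
library(dplyr)

# Assumption: `pasilla` is the example dataset shipped with the package
pasilla_tidy <- tidySummarizedExperiment::pasilla

# rename() uses new_name = old_name
renamed <- pasilla_tidy %>% rename(sequencing = type)
renamed
```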
## # A SummarizedExperiment-tibble abstraction: 102,193 × 5
## # Features=14599 | Samples=7 | Assays=counts
## .feature .sample counts condition sequencing
## <chr> <chr> <int> <chr> <chr>
## 1 FBgn0000003 untrt1 0 untreated single_end
## 2 FBgn0000008 untrt1 92 untreated single_end
## 3 FBgn0000014 untrt1 5 untreated single_end
## 4 FBgn0000015 untrt1 0 untreated single_end
## 5 FBgn0000017 untrt1 4664 untreated single_end
## 6 FBgn0000018 untrt1 583 untreated single_end
## 7 FBgn0000022 untrt1 0 untreated single_end
## 8 FBgn0000024 untrt1 10 untreated single_end
## 9 FBgn0000028 untrt1 0 untreated single_end
## 10 FBgn0000032 untrt1 1446 untreated single_end
## # ℹ 40 more rows
We could use mutate to create a column. For example, we could create a new type column that contains single and paired instead of single_end and paired_end.
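One way to do this is to strip the `_end` suffix with `gsub()`. This is a sketch; the exact expression is an assumption, and any equivalent string replacement would do.

```r
library(tidySummarizedExperiment)
library(dplyr)

# Assumption: `pasilla` is the example dataset shipped with the package
pasilla_tidy <- tidySummarizedExperiment::pasilla

# Overwrite `type` with the suffix removed: single_end -> single, paired_end -> paired
shortened <- pasilla_tidy %>% mutate(type = gsub("_end", "", type))
shortened
```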
## # A SummarizedExperiment-tibble abstraction: 102,193 × 5
## # Features=14599 | Samples=7 | Assays=counts
## .feature .sample counts condition type
## <chr> <chr> <int> <chr> <chr>
## 1 FBgn0000003 untrt1 0 untreated single
## 2 FBgn0000008 untrt1 92 untreated single
## 3 FBgn0000014 untrt1 5 untreated single
## 4 FBgn0000015 untrt1 0 untreated single
## 5 FBgn0000017 untrt1 4664 untreated single
## 6 FBgn0000018 untrt1 583 untreated single
## 7 FBgn0000022 untrt1 0 untreated single
## 8 FBgn0000024 untrt1 10 untreated single
## 9 FBgn0000028 untrt1 0 untreated single
## 10 FBgn0000032 untrt1 1446 untreated single
## # ℹ 40 more rows
We could use unite to combine multiple columns into a single column.
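A sketch that merges `condition` and `type` into a single `group` column, matching the column name in the output that follows. It assumes the bundled `pasilla` dataset and tidyr's default `_` separator.

```r
library(tidySummarizedExperiment)
library(dplyr)
library(tidyr)

# Assumption: `pasilla` is the example dataset shipped with the package
pasilla_tidy <- tidySummarizedExperiment::pasilla

# unite() pastes the columns together and removes the originals by default
united <- pasilla_tidy %>% unite("group", c(condition, type))
united
```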
## # A SummarizedExperiment-tibble abstraction: 102,193 × 4
## # Features=14599 | Samples=7 | Assays=counts
## .feature .sample counts group
## <chr> <chr> <int> <chr>
## 1 FBgn0000003 untrt1 0 untreated_single_end
## 2 FBgn0000008 untrt1 92 untreated_single_end
## 3 FBgn0000014 untrt1 5 untreated_single_end
## 4 FBgn0000015 untrt1 0 untreated_single_end
## 5 FBgn0000017 untrt1 4664 untreated_single_end
## 6 FBgn0000018 untrt1 583 untreated_single_end
## 7 FBgn0000022 untrt1 0 untreated_single_end
## 8 FBgn0000024 untrt1 10 untreated_single_end
## 9 FBgn0000028 untrt1 0 untreated_single_end
## 10 FBgn0000032 untrt1 1446 untreated_single_end
## # ℹ 40 more rows
We can also combine commands with the tidyverse pipe %>%.
For example, we could combine group_by and summarise to get the total counts for each sample.
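A sketch of that pipeline, assuming the bundled `pasilla` dataset. The column name `total_counts` matches the output that follows.

```r
library(tidySummarizedExperiment)
library(dplyr)

# Assumption: `pasilla` is the example dataset shipped with the package
pasilla_tidy <- tidySummarizedExperiment::pasilla

# Group the rows by sample, then sum the counts within each group
totals <- pasilla_tidy %>%
  group_by(.sample) %>%
  summarise(total_counts = sum(counts))
totals
```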
## # A tibble: 7 × 2
## .sample total_counts
## <chr> <int>
## 1 trt1 18670279
## 2 trt2 9571826
## 3 trt3 10343856
## 4 untrt1 13972512
## 5 untrt2 21911438
## 6 untrt3 8358426
## 7 untrt4 9841335
We could combine group_by, mutate and filter to get the transcripts with mean count > 0.
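A sketch of that pipeline under the same assumptions. Grouping by `.feature` makes `mean()` operate per transcript, and the filter then drops transcripts whose mean is zero.

```r
library(tidySummarizedExperiment)
library(dplyr)

# Assumption: `pasilla` is the example dataset shipped with the package
pasilla_tidy <- tidySummarizedExperiment::pasilla

# Per-transcript mean across samples, then keep transcripts with a non-zero mean
expressed <- pasilla_tidy %>%
  group_by(.feature) %>%
  mutate(mean_count = mean(counts)) %>%
  filter(mean_count > 0)
expressed
```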
## # A tibble: 86,513 × 6
## # Groups: .feature [12,359]
## .feature .sample counts condition type mean_count
## <chr> <chr> <int> <chr> <chr> <dbl>
## 1 FBgn0000003 untrt1 0 untreated single_end 0.143
## 2 FBgn0000008 untrt1 92 untreated single_end 99.6
## 3 FBgn0000014 untrt1 5 untreated single_end 1.43
## 4 FBgn0000015 untrt1 0 untreated single_end 0.857
## 5 FBgn0000017 untrt1 4664 untreated single_end 4672.
## 6 FBgn0000018 untrt1 583 untreated single_end 461.
## 7 FBgn0000022 untrt1 0 untreated single_end 0.143
## 8 FBgn0000024 untrt1 10 untreated single_end 7
## 9 FBgn0000028 untrt1 0 untreated single_end 0.429
## 10 FBgn0000032 untrt1 1446 untreated single_end 1085.
## # ℹ 86,503 more rows
library(ggplot2)

# Custom theme elements, stored as a list so they can be added to any ggplot
my_theme <-
  list(
    scale_fill_brewer(palette="Set1"),
    scale_color_brewer(palette="Set1"),
    theme_bw() +
      theme(
        panel.border=element_blank(),
        axis.line=element_line(),
        # linewidth replaces the size argument deprecated in ggplot2 3.4.0
        panel.grid.major=element_line(linewidth=0.2),
        panel.grid.minor=element_line(linewidth=0.1),
        text=element_text(size=12),
        legend.position="bottom",
        aspect.ratio=1,
        strip.background=element_blank(),
        axis.title.x=element_text(margin=margin(t=10, r=10, b=10, l=10)),
        axis.title.y=element_text(margin=margin(t=10, r=10, b=10, l=10))
      )
  )
We can treat pasilla_tidy as a normal tibble for plotting.
Here we plot the distribution of counts per sample.
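A sketch of such a plot, assuming the bundled `pasilla` dataset. The aesthetics (one density line per sample, coloured by `type`, log-scaled x axis with a pseudocount of 1) are illustrative choices, and `theme_bw()` stands in for any custom theme so that the block runs on its own.

```r
library(tidySummarizedExperiment)
library(ggplot2)

# Assumption: `pasilla` is the example dataset shipped with the package
pasilla_tidy <- tidySummarizedExperiment::pasilla

# counts + 1 avoids taking log10(0) for zero counts
p <- ggplot(pasilla_tidy, aes(counts + 1, group = .sample, color = type)) +
  geom_density() +
  scale_x_log10() +
  theme_bw()
p
```

If passing the object directly to `ggplot()` is not supported in a given version, converting it first with `as_tibble()` is an explicit alternative.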
## R version 4.4.2 (2024-10-31)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Etc/UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] tidyr_1.3.1 dplyr_1.1.4
## [3] tidySummarizedExperiment_1.17.0 ttservice_0.4.1
## [5] SummarizedExperiment_1.37.0 Biobase_2.67.0
## [7] GenomicRanges_1.59.1 GenomeInfoDb_1.43.2
## [9] IRanges_2.41.2 S4Vectors_0.45.2
## [11] BiocGenerics_0.53.3 generics_0.1.3
## [13] MatrixGenerics_1.19.0 matrixStats_1.4.1
## [15] ggplot2_3.5.1 knitr_1.49
##
## loaded via a namespace (and not attached):
## [1] gtable_0.3.6 xfun_0.49 bslib_0.8.0
## [4] htmlwidgets_1.6.4 lattice_0.22-6 vctrs_0.6.5
## [7] tools_4.4.2 tibble_3.2.1 fansi_1.0.6
## [10] pkgconfig_2.0.3 Matrix_1.7-1 data.table_1.16.4
## [13] RColorBrewer_1.1-3 lifecycle_1.0.4 GenomeInfoDbData_1.2.13
## [16] farver_2.1.2 compiler_4.4.2 stringr_1.5.1
## [19] munsell_0.5.1 htmltools_0.5.8.1 sys_3.4.3
## [22] buildtools_1.0.0 sass_0.4.9 yaml_2.3.10
## [25] lazyeval_0.2.2 plotly_4.10.4 pillar_1.10.0
## [28] crayon_1.5.3 jquerylib_0.1.4 ellipsis_0.3.2
## [31] cachem_1.1.0 DelayedArray_0.33.3 abind_1.4-8
## [34] tidyselect_1.2.1 digest_0.6.37 stringi_1.8.4
## [37] purrr_1.0.2 labeling_0.4.3 maketools_1.3.1
## [40] fastmap_1.2.0 grid_4.4.2 colorspace_2.1-1
## [43] cli_3.6.3 SparseArray_1.7.2 magrittr_2.0.3
## [46] S4Arrays_1.7.1 utf8_1.2.4 withr_3.0.2
## [49] scales_1.3.0 UCSC.utils_1.3.0 rmarkdown_2.29
## [52] XVector_0.47.0 httr_1.4.7 evaluate_1.0.1
## [55] viridisLite_0.4.2 rlang_1.1.4 glue_1.8.0
## [58] jsonlite_1.8.9 R6_2.5.1 zlibbioc_1.52.0