autonomics
is created to make cross-platform omics
analysis intuitive and enjoyable. It contains import +
preprocess + analyze + visualize one-liners for
RNAseq, MS proteomics, SOMAscan and
Metabolon platforms and a generic importer for data
from any rectangular omics file. With a focus on not only
automation but also customization, these importers have a flexible
formula
/ block
/contrastdefs
interface which allows to define any statistical formula, fit any
general linear model, quantify any contrast, and use random effects or
precision weights if required, building on top of the powerful
limma
platform for statistical analysis. It also offers
exquisite support for analyzing complex designs such as
the innovative contrastogram visualization to unravel
and summarize the differential expression structure in complex designs.
By autonomating repetitive tasks, generifying common steps, and
intuifying complex designs, it makes cross-platform omics data analysis
a fun experience. Try it and enjoy :-).
Autonomics offers import + preprocess + analyze + visualize one-liners for RNAseq, MS proteomics, SOMAscan and Metabolon platforms, as well a generic importer for data from any rectangular omics file. We will discuss each of these in more detail in the following sections, but all of them follow the following general steps:
~0+subgroup
Autonomics provides ready-to-use importers for both count as well as BAM files, which read / preprocess / analyze RNAseq data. Specific to RNAseq is counting reads using Rsubread::featureCounts (for read_rna_bams)), normalizing for library composition (cpm) or gene length (tpm), estimating voom precision weights, and using these in linear modeling. Let’s illustrate both of these on example datasets:
Billing et al. (2019) studied the differentiation of embryonic (E00) into mesenchymal stem cells (E00.8, E01, E02, E05, E15, E30), with similar properties as bone-marrow mesenchymal stem cells (M00). Importing the RNAseq counts:
require(autonomics, quietly = TRUE)
##
## Attaching package: 'autonomics'
## The following objects are masked from 'package:stats':
##
## biplot, loadings
## The following object is masked from 'package:base':
##
## beta
file <- system.file('extdata/billing19.rnacounts.txt', package = 'autonomics')
object <- read_rnaseq_counts(file = file, fit = 'limma', plot = TRUE, label = 'gene')
## Read /tmp/RtmpT5MYSZ/Rinst2fec7ea72981/autonomics/extdata/billing19.rnacounts.txt
## Preprocess
## Keep 24/24 features: count >= 10 (~subgroup)
## counts: add 0.5
## cpm: tmm scale libsizes
## cpm
## voom: ~subgroup
## counts: rm 0.5
## Add PCA
## LinMod
## Code subgroup
## level
## coefficient E00 E00.8 E01 E02 E05 E15 E30 M00
## Intercept 1 . . . . . . .
## E00.8-E00 -1 1 . . . . . .
## E01-E00 -1 . 1 . . . . .
## E02-E00 -1 . . 1 . . . .
## E05-E00 -1 . . . 1 . . .
## E15-E00 -1 . . . . 1 . .
## E30-E00 -1 . . . . . 1 .
## M00-E00 -1 . . . . . . 1
## lmFit(~subgroup, weights = assays(object)$weights)
## coefficient fit downfdr upfdr downp upp
## <fctr> <fctr> <int> <int> <int> <int>
## 1: E00.8-E00 limma 1 2 1 4
## 2: E01-E00 limma 3 2 5 3
## 3: E02-E00 limma 2 2 3 4
## 4: E05-E00 limma 1 2 2 2
## 5: E15-E00 limma 8 9 8 10
## 6: E30-E00 limma 7 10 8 10
## 7: M00-E00 limma 6 7 6 7
## Plot summary
Billing et al. (2016) compared three types of stem cells: embryonic (E), embryonic differentiated into mesenchymal (EM), and bone-marrow mesenchymal (BM). Importing a downsized version of the RNAseq BAM files (with only a subset of all reads):
Proper preprocessing leads to increased power:
file <- system.file('extdata/billing19.rnacounts.txt', package = 'autonomics')
# log2counts
object <- read_rnaseq_counts(file,
cpm = FALSE, voom = FALSE, fit = 'limma', verbose = FALSE, plot = FALSE)
## LinMod
## Code subgroup
## level
## coefficient E00 E00.8 E01 E02 E05 E15 E30 M00
## Intercept 1 . . . . . . .
## E00.8-E00 -1 1 . . . . . .
## E01-E00 -1 . 1 . . . . .
## E02-E00 -1 . . 1 . . . .
## E05-E00 -1 . . . 1 . . .
## E15-E00 -1 . . . . 1 . .
## E30-E00 -1 . . . . . 1 .
## M00-E00 -1 . . . . . . 1
colSums(summarize_fit(fdt(object), 'limma')[-1, c(3,4)])
## downfdr upfdr
## 7 13
# log2cpm
object <- read_rnaseq_counts(file,
cpm = TRUE, voom = FALSE, fit = 'limma', verbose = FALSE, plot = FALSE)
## LinMod
## Code subgroup
## level
## coefficient E00 E00.8 E01 E02 E05 E15 E30 M00
## Intercept 1 . . . . . . .
## E00.8-E00 -1 1 . . . . . .
## E01-E00 -1 . 1 . . . . .
## E02-E00 -1 . . 1 . . . .
## E05-E00 -1 . . . 1 . . .
## E15-E00 -1 . . . . 1 . .
## E30-E00 -1 . . . . . 1 .
## M00-E00 -1 . . . . . . 1
colSums(summarize_fit(fdt(object), 'limma')[-1, c(3,4)])
## downfdr upfdr
## 28 33
# log2cpm + voom
object <- read_rnaseq_counts(file, # log2 cpm + voom
cpm = TRUE, voom = TRUE, fit = 'limma', verbose = FALSE, plot = FALSE)
## LinMod
## Code subgroup
## level
## coefficient E00 E00.8 E01 E02 E05 E15 E30 M00
## Intercept 1 . . . . . . .
## E00.8-E00 -1 1 . . . . . .
## E01-E00 -1 . 1 . . . . .
## E02-E00 -1 . . 1 . . . .
## E05-E00 -1 . . . 1 . . .
## E15-E00 -1 . . . . 1 . .
## E30-E00 -1 . . . . . 1 .
## M00-E00 -1 . . . . . . 1
colSums(summarize_fit(fdt(object), 'limma')[-1, c(3,4)])
## downfdr upfdr
## 27 32
A popular approach in (DDA) MS proteomics data analysis is to use
MaxQuant to produce proteinGroups.txt
and phospho (STY)Sites.txt files. These can be imported
using read_proteingroups
/ read_phosphosites
,
which :
An LFQ intensities example is the Fukuda et al. (2020) dataset, which compares the proteome of 30d zebrafish juveniles versus adults using label-free quantification. In the volcano plot measured values are shown with circles, imputed values as triangles.
file <- system.file('extdata/fukuda20.proteingroups.txt', package = 'autonomics')
object <- read_maxquant_proteingroups(file = file, plot = TRUE)
## Read proteingroups maxlfq /tmp/RtmpT5MYSZ/Rinst2fec7ea72981/autonomics/extdata/fukuda20.proteingroups.txt
## contamin fastahdrs /tmp/RtmpT5MYSZ/Rinst2fec7ea72981/autonomics/extdata/contaminants.tsv
## maxquant fastahdrs
## Annotate Uncollapse 20 proIds into 32 proteins
## Drop REV_ 32 proteins
## Annotate 0 using uniprothdrs
## 1 using contaminanthdrs
## 29 using maxquanthdrs
## 0 using uniprotrestapi
## Filter 32 proteins
## within proId 31 proteins swissprot > trembl
## 25 proteins fullseq > fragment
## 20 proteins protein > transcript > homolog > prediction > uncertain
## Add REV__ 20 proteins
## Recollapse 20 proIds
## SumExp
## Replace 0->NA for 27/120 values (in 11/20 features of 6/6 samples)
## Standardize snames: LFQ intensity 30dpt.R1 -> 30dpt.R1
## Retain 19/20 features: ~reverse == ""
## Retain 18/19 features: ~contaminant == ""
## Add PCA
## LinMod
## Code subgroup
## level
## coefficient X30dpt Adult
## Intercept 1 .
## Adult-X30dpt -1 1
## lmFit(~subgroup)
## coefficient fit downfdr upfdr downp upp
## <fctr> <fctr> <int> <int> <int> <int>
## 1: Adult-X30dpt limma 2 2 2 2
## Plot summary
Normalized ratios were used in the proteomics
profiling of the earlier described Billing et
al. (2019) study, which examined the differentiation of
embryonic stem cells (E00) into mesenchymal stem cells
(E01,E02, E05,
E15, E30), and compared these to
bone-marrow derived mesenchymal stem cells (M00). The
proteomics profiling experiment was performed using MS1 labeling: a
light (L) labeled internal standard was created by
mixing all samples in equal amounts, the subsequent samples were either
medium (M) or heavy (H) labeled. Not
all label combinations were of interest, and the
select_subgroups
argument allows to specifies which
subgroup combinations to retain:
file <- system.file('extdata/billing19.proteingroups.txt', package = 'autonomics')
subgroups <- c('E00_STD', 'E01_STD', 'E02_STD', 'E05_STD', 'E15_STD', 'E30_STD', 'M00_STD')
object <- read_maxquant_proteingroups(file = file, subgroups = subgroups, plot = TRUE)
## Read proteingroups normalizedratio /tmp/RtmpT5MYSZ/Rinst2fec7ea72981/autonomics/extdata/billing19.proteingroups.txt
## contamin fastahdrs /tmp/RtmpT5MYSZ/Rinst2fec7ea72981/autonomics/extdata/contaminants.tsv
## maxquant fastahdrs
## Annotate Uncollapse 25 proIds into 83 proteins
## Drop REV_ 45 proteins
## Annotate 0 using uniprothdrs
## 1 using contaminanthdrs
## 34 using maxquanthdrs
## 0 using uniprotrestapi
## Filter 45 proteins
## within proId 24 proteins swissprot > trembl
## Add REV__ 24 proteins
## Recollapse 25 proIds
## SumExp
## Replace NaN->NA for 175/825 values (in 12/25 features of 33/33 samples)
## Standardize snames: Ratio M/L normalized STD(L).E00(M).E01(H).R1 -> STD(L).E00(M).E01(H).R1{M/L}
## Retain 21/33 samples: ~subgroup %in% subgroups
## Retain 24/25 features: ~reverse == ""
## Retain 23/24 features: ~contaminant == ""
## Retain 22/23 features: non-zero, non-NA, and non-NaN for some sample
## Add PCA
## LinMod
## Code subgroup
## level
## coefficient E00_STD E01_STD E02_STD E05_STD E15_STD E30_STD M00_STD
## Intercept 1 . . . . . .
## E01_STD-E00_STD -1 1 . . . . .
## E02_STD-E00_STD -1 . 1 . . . .
## E05_STD-E00_STD -1 . . 1 . . .
## E15_STD-E00_STD -1 . . . 1 . .
## E30_STD-E00_STD -1 . . . . 1 .
## M00_STD-E00_STD -1 . . . . . 1
## lmFit(~subgroup)
## coefficient fit downfdr upfdr downp upp
## <fctr> <fctr> <int> <int> <int> <int>
## 1: E01_STD-E00_STD limma 0 0 0 1
## 2: E02_STD-E00_STD limma 0 2 1 2
## 3: E05_STD-E00_STD limma 0 0 0 4
## 4: E15_STD-E00_STD limma 2 9 2 10
## 5: E30_STD-E00_STD limma 2 8 2 8
## 6: M00_STD-E00_STD limma 2 6 2 6
## Plot summary
The phosphosites can be read in a similar fashion:
fosfile <- system.file('extdata/billing19.phosphosites.txt', package = 'autonomics')
profile <- system.file('extdata/billing19.proteingroups.txt', package = 'autonomics')
subgroups <- c('E00_STD', 'E01_STD', 'E02_STD', 'E05_STD', 'E15_STD', 'E30_STD', 'M00_STD')
object <- read_maxquant_phosphosites(fosfile = fosfile, profile = profile, subgroups = subgroups, plot = TRUE)
## Read proteingroups normalizedratio /tmp/RtmpT5MYSZ/Rinst2fec7ea72981/autonomics/extdata/billing19.proteingroups.txt
## phosphosites normalizedratio /tmp/RtmpT5MYSZ/Rinst2fec7ea72981/autonomics/extdata/billing19.phosphosites.txt
## Read 21 phosphosites in proteins, contaminants, reverse
## Retain 21 phosphosites in single proteingroup
## Retain 21 phosphosites with signal in some sample
## Keep proteingroup uniprots
## contamin fastahdrs /tmp/RtmpT5MYSZ/Rinst2fec7ea72981/autonomics/extdata/contaminants.tsv
## maxquant fastahdrs
## Annotate Uncollapse 21 fosIds into 53 proteins
## Drop REV_ 28 proteins
## Annotate 0 using uniprothdrs
## 1 using contaminanthdrs
## 20 using maxquanthdrs
## 0 using uniprotrestapi
## Filter 28 proteins
## within fosId 16 proteins swissprot > trembl
## Add REV__ 16 proteins
## Recollapse 21 fosIds
## SumExp
## Replace NaN->NA for 104/693 values (in 6/21 features of 33/33 samples)
## Retain 21/33 samples: ~subgroup %in% subgroups
## Retain 20/21 features: ~reverse == ""
## Retain 19/20 features: ~contaminant == ""
## Add PCA
## LinMod
## Code subgroup
## level
## coefficient E00_STD E01_STD E02_STD E05_STD E15_STD E30_STD M00_STD
## Intercept 1 . . . . . .
## E01_STD-E00_STD -1 1 . . . . .
## E02_STD-E00_STD -1 . 1 . . . .
## E05_STD-E00_STD -1 . . 1 . . .
## E15_STD-E00_STD -1 . . . 1 . .
## E30_STD-E00_STD -1 . . . . 1 .
## M00_STD-E00_STD -1 . . . . . 1
## lmFit(~subgroup)
## coefficient fit downfdr upfdr downp upp
## <fctr> <fctr> <int> <int> <int> <int>
## 1: E01_STD-E00_STD limma 1 0 1 2
## 2: E02_STD-E00_STD limma 2 1 3 2
## 3: E05_STD-E00_STD limma 1 2 1 4
## 4: E15_STD-E00_STD limma 2 6 2 6
## 5: E30_STD-E00_STD limma 1 4 1 5
## 6: M00_STD-E00_STD limma 2 3 2 4
## Plot summary
Metabolon delivers metabolomics results in the form
of an xlsx file, with values of interest likely in the
(second) OrigScale sheet. Features are laid out in
rows, samples in columns, and additional feature/sample data are
self-contained in the file. Metabolon data can be read/analyzed with the
autonomics function read_metabolon
, as illustrated below on
the dataset of Halama and and Atkin (2018), who
investigates the effects of hypoglycemia on a number of
subjects (SUB) at four different time points
(t0, t1, t2,
t3):
file <- system.file('extdata/atkin.metabolon.xlsx', package = 'autonomics')
object <- read_metabolon(file, block = 'Subject', plot = TRUE)
## Read: /tmp/RtmpT5MYSZ/Rinst2fec7ea72981/autonomics/extdata/atkin.metabolon.xlsx
## log2 metabolon
## Add PCA
## LinMod
## Code subgroup
## level
## coefficient t0 t1 t2 t3
## Intercept 1 . . .
## t1-t0 -1 1 . .
## t2-t0 -1 . 1 .
## t3-t0 -1 . . 1
## Dupcor `Subject`
## lmFit(~subgroup | Subject)
## coefficient fit downfdr upfdr downp upp
## <fctr> <fctr> <int> <int> <int> <int>
## 1: t1-t0 limma 8 1 10 1
## 2: t2-t0 limma 7 3 8 3
## 3: t3-t0 limma 1 1 4 2
## Plot summary
Somascan results are returned in a txt file with
extension .adat. Features are laid out in columns and
samples in rows. Rectangles with feature/sample data are also contained
in the file. The autonomics function read_somascan
reads/analyzes SOMAscan files, additionally filtering samples/features
for quality and type. This is illustrated on the above described dataset
from Halama and and Atkin (2018), who investigated the
effects of hypoglycemia on a number of subjects
(SUB) at four different time points
(t0, t1, t2,
t3):
file <- system.file('extdata/atkin.somascan.adat', package = 'autonomics')
object <- read_somascan(file, block = 'Subject', plot = TRUE)
## Read: /tmp/RtmpT5MYSZ/Rinst2fec7ea72981/autonomics/extdata/atkin.somascan.adat
## log2 somascan
## Add PCA
## LinMod
## Code SampleGroup
## level
## coefficient t0 t1 t2 t3
## Intercept 1 . . .
## t1-t0 -1 1 . .
## t2-t0 -1 . 1 .
## t3-t0 -1 . . 1
## Dupcor `Subject`
## lmFit(~SampleGroup | Subject)
## coefficient fit downfdr upfdr downp upp
## <fctr> <fctr> <int> <int> <int> <int>
## 1: t1-t0 limma 0 1 1 3
## 2: t2-t0 limma 2 2 2 4
## 3: t3-t0 limma 0 0 0 0
## Plot summary
sessionInfo()
## R version 4.4.2 (2024-10-31)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Etc/UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] autonomics_1.15.10 BiocStyle_2.35.0
##
## loaded via a namespace (and not attached):
## [1] DBI_1.2.3 gridExtra_2.3
## [3] readxl_1.4.3 rlang_1.1.4
## [5] magrittr_2.0.3 matrixStats_1.4.1
## [7] compiler_4.4.2 RSQLite_2.3.8
## [9] reshape2_1.4.4 vctrs_0.6.5
## [11] stringr_1.5.1 pkgconfig_2.0.3
## [13] crayon_1.5.3 fastmap_1.2.0
## [15] dbplyr_2.5.0 XVector_0.47.0
## [17] labeling_0.4.3 utf8_1.2.4
## [19] rmarkdown_2.29 UCSC.utils_1.3.0
## [21] purrr_1.0.2 bit_4.5.0
## [23] xfun_0.49 MultiAssayExperiment_1.33.0
## [25] zlibbioc_1.52.0 cachem_1.1.0
## [27] GenomeInfoDb_1.43.1 jsonlite_1.8.9
## [29] fractional_0.1.3 blob_1.2.4
## [31] DelayedArray_0.33.2 BiocParallel_1.41.0
## [33] tweenr_2.0.3 codingMatrices_0.4.0
## [35] parallel_4.4.2 cluster_2.1.6
## [37] R6_2.5.1 bslib_0.8.0
## [39] stringi_1.8.4 RColorBrewer_1.1-3
## [41] limma_3.63.2 GenomicRanges_1.59.1
## [43] jquerylib_0.1.4 cellranger_1.1.0
## [45] Rcpp_1.0.13-1 SummarizedExperiment_1.37.0
## [47] knitr_1.49 R.utils_2.12.3
## [49] IRanges_2.41.1 igraph_2.1.1
## [51] Matrix_1.7-1 tidyselect_1.2.1
## [53] abind_1.4-8 yaml_2.3.10
## [55] codetools_0.2-20 curl_6.0.1
## [57] plyr_1.8.9 lattice_0.22-6
## [59] tibble_3.2.1 rARPACK_0.11-0
## [61] Biobase_2.67.0 withr_3.0.2
## [63] evaluate_1.0.1 polyclip_1.10-7
## [65] BiocFileCache_2.15.0 pillar_1.9.0
## [67] BiocManager_1.30.25 filelock_1.0.3
## [69] MatrixGenerics_1.19.0 stats4_4.4.2
## [71] ellipse_0.5.0 generics_0.1.3
## [73] S4Vectors_0.45.2 ggplot2_3.5.1
## [75] munsell_0.5.1 scales_1.3.0
## [77] glue_1.8.0 maketools_1.3.1
## [79] tools_4.4.2 sys_3.4.3
## [81] data.table_1.16.2 RSpectra_0.16-2
## [83] locfit_1.5-9.10 buildtools_1.0.0
## [85] grid_4.4.2 tidyr_1.3.1
## [87] edgeR_4.5.0 colorspace_2.1-1
## [89] GenomeInfoDbData_1.2.13 ggforce_0.4.2
## [91] cli_3.6.3 fansi_1.0.6
## [93] mixOmics_6.31.0 S4Arrays_1.7.1
## [95] rematch_2.0.0 dplyr_1.1.4
## [97] corpcor_1.6.10 pcaMethods_1.99.0
## [99] gtable_0.3.6 R.methodsS3_1.8.2
## [101] sass_0.4.9 digest_0.6.37
## [103] BiocGenerics_0.53.3 SparseArray_1.7.2
## [105] ggrepel_0.9.6 farver_2.1.2
## [107] memoise_2.0.1 htmltools_0.5.8.1
## [109] R.oo_1.27.0 lifecycle_1.0.4
## [111] httr_1.4.7 statmod_1.5.0
## [113] bit64_4.5.2 MASS_7.3-61
A M Billing, S S Dib, A M Bhagwat, I T da Silva, R D Drummond, S Hayat, R Al-Mismar, H Ben-Hamidane, N Goswami, K Engholm-Keller, M R Larsen, K Suhre, A Rafii, J Graummann (2019). Mol Cell Proteomics. 18(10):1950-1966. doi: 10.1074/mcp.RA119.001356.
A M Billing, H Ben Hamidane, S S Dib, R J Cotton, A M Bhagwat, P Kumar, S Hayat, N A Yousri, N Goswami, K Suhre, A Rafii, J Graumann (2016). Comprehensive transcriptomics and proteomics characterization of human mesenchymal stem cells reveals source specific cellular markers. Sci Rep. 9;6:21507. doi: 10.1038/srep21507.
R Fukuda, R Marin-Juez, H El-Sammak, A Beisaw, R Ramadass, C Kuenne, S Guenther, A Konzer, A M Bhagwat, J Graumann, D YR Stainier (2020). EMBO Rep. 21(8): e49752. doi: 10.15252/embr.201949752
A Halama, M Kulinski, S Dib, S B Zaghlool, K S Siveen, A Iskandarani, J Zierer 3, K S Prabhu, N J Satheesh, A M Bhagwat, S Uddin, G Kastenmüller, O Elemento, S S Gross, K Suhre (2018). Accelerated lipid catabolism and autophagy are cancer survival mechanisms under inhibited glutaminolysis. Cancer Lett., 430:133-147. doi:10.1016/j.canlet.2018.05.017
A Halama, H Kahal, A M Bhagwat, J Zierer, T Sathyapalan, J Graumann, K Suhre, S L Atkin (2018). Metabolic and proteomics signatures of hypoglycaemia in type 2 diabetes. Diabetes, obesity and metabolism, https://doi.org/10.1111/dom.13602