| Title: | Preprocessing, analyzing, and reporting of RNA-seq data |
|---|---|
| Description: | Provides classes and functions for quality control, filtering, normalization and differential expression analysis of pre-processed `RNA-seq` data. Data can be imported from `SummarizedExperiment` as well as `matrix` objects and can be annotated from `BioMart`. Filtering for genes without too low expression or containing required annotations, as well as filtering for samples with sufficient correlation to other samples or total number of reads is supported. The standard normalization methods including cpm, rpkm and tpm can be used, and 'DESeq2` as well as voom differential expression analyses are available. |
| Authors: | Daniel Sabanés Bové [aut, cre], Namrata Bhatia [aut], Stefanie Bienert [aut], Benoit Falquet [aut], Haocheng Li [aut], Jeff Luong [aut], Lyndsee Midori Zhang [aut], Alex Richardson [aut], Simona Rossomanno [aut], Tim Treis [aut], Mark Yan [aut], Naomi Chang [aut], Chendi Liao [aut], Carolyn Zhang [aut], Joseph N. Paulson [aut], F. Hoffmann-La Roche AG [cph, fnd] |
| Maintainer: | Daniel Sabanés Bové <[email protected]> |
| License: | Apache License 2.0 |
| Version: | 1.17.0 |
| Built: | 2026-05-30 08:05:12 UTC |
| Source: | https://github.com/bioc/hermes |
hermes Packagehermes facilitates preprocessing, analyzing, and reporting of RNA-seq data.
Maintainer: Daniel Sabanés Bové [email protected]
Authors:
Namrata Bhatia
Stefanie Bienert
Benoit Falquet
Haocheng Li
Jeff Luong
Lyndsee Midori Zhang [email protected]
Alex Richardson
Simona Rossomanno
Tim Treis
Mark Yan
Naomi Chang
Chendi Liao [email protected]
Carolyn Zhang
Joseph N. Paulson [email protected]
Other contributors:
F. Hoffmann-La Roche AG [copyright holder, funder]
Useful links:
Report bugs at https://github.com/insightsengineering/hermes/issues
The function add_quality_flags() adds quality flag information to a AnyHermesData object:
low_expression_flag: for each gene, counts how many samples don't pass a minimum
expression Counts per Million (CPM) threshold. If too many, then it flags this gene
as a "low expression" gene.
tech_failure_flag: first calculates the Pearson correlation matrix of the sample wise
CPM values, resulting in a matrix measuring the correlation between samples.
Then compares the average correlation per sample with a threshold - if it is too low,
then the sample is flagged as a "technical failure" sample.
low_depth_flag: computes the library size (total number of counts) per sample.
If this number is too low, the sample is flagged as a "low depth" sample.
Separate helper functions are internally used to create the flags, and
separate getter functions allow easy access to the quality control flags in an object.
add_quality_flags(object, control = control_quality(), overwrite = FALSE) h_low_expression_flag(object, control = control_quality()) h_low_depth_flag(object, control = control_quality()) h_tech_failure_flag(object, control = control_quality()) get_tech_failure(object) get_low_depth(object) get_low_expression(object)add_quality_flags(object, control = control_quality(), overwrite = FALSE) h_low_expression_flag(object, control = control_quality()) h_low_depth_flag(object, control = control_quality()) h_tech_failure_flag(object, control = control_quality()) get_tech_failure(object) get_low_depth(object) get_low_expression(object)
object |
( |
control |
( |
overwrite |
( |
While object already has the variables mentioned above as part of the
rowData and colData (as this is enforced by the validation
method for AnyHermesData), they are usually still NA after the initial
object creation.
The input object with added quality flags.
h_low_expression_flag(): creates the low expression flag for genes
given control settings.
h_low_depth_flag(): creates the low depth (library size) flag for samples
given control settings.
h_tech_failure_flag(): creates the technical failure flag for samples
given control settings.
get_tech_failure(): get the technical failure flags for all samples.
get_low_depth(): get the low depth failure flags for all samples.
get_low_expression(): get the low expression failure flags for all genes.
control_quality() for the detailed settings specifications;
set_tech_failure() to manually flag samples as technical failures.
# Adding default quality flags to `AnyHermesData` object. object <- hermes_data result <- add_quality_flags(object) which(get_tech_failure(result) != get_tech_failure(object)) head(get_low_expression(result)) head(get_tech_failure(result)) head(get_low_depth(result)) # It is possible to overwrite flags if needed, which will trigger a message. result2 <- add_quality_flags(result, control_quality(min_cpm = 1000), overwrite = TRUE) # Separate calculation of low expression flag. low_expr_flag <- h_low_expression_flag( object, control_quality(min_cpm = 500, min_cpm_prop = 0.9) ) length(low_expr_flag) == nrow(object) head(low_expr_flag) # Separate calculation of low depth flag. low_depth_flag <- h_low_depth_flag(object, control_quality(min_depth = 5)) length(low_depth_flag) == ncol(object) head(low_depth_flag) # Separate calculation of technical failure flag. tech_failure_flag <- h_tech_failure_flag(object, control_quality(min_corr = 0.35)) length(tech_failure_flag) == ncol(object) head(tech_failure_flag) head(get_tech_failure(object)) head(get_low_depth(object)) head(get_low_expression(object))# Adding default quality flags to `AnyHermesData` object. object <- hermes_data result <- add_quality_flags(object) which(get_tech_failure(result) != get_tech_failure(object)) head(get_low_expression(result)) head(get_tech_failure(result)) head(get_low_depth(result)) # It is possible to overwrite flags if needed, which will trigger a message. result2 <- add_quality_flags(result, control_quality(min_cpm = 1000), overwrite = TRUE) # Separate calculation of low expression flag. low_expr_flag <- h_low_expression_flag( object, control_quality(min_cpm = 500, min_cpm_prop = 0.9) ) length(low_expr_flag) == nrow(object) head(low_expr_flag) # Separate calculation of low depth flag. low_depth_flag <- h_low_depth_flag(object, control_quality(min_depth = 5)) length(low_depth_flag) == ncol(object) head(low_depth_flag) # Separate calculation of technical failure flag. tech_failure_flag <- h_tech_failure_flag(object, control_quality(min_corr = 0.35)) length(tech_failure_flag) == ncol(object) head(tech_failure_flag) head(get_tech_failure(object)) head(get_low_depth(object)) head(get_low_expression(object))
Internal function to check whether a whole vector is NA.
all_na(x)all_na(x)
x |
( |
Corresponding flag.
These methods access and set the gene annotations stored in a AnyHermesData object.
## S4 method for signature 'AnyHermesData' annotation(object, ...) .row_data_annotation_cols ## S4 replacement method for signature 'AnyHermesData,DataFrame' annotation(object) <- value## S4 method for signature 'AnyHermesData' annotation(object, ...) .row_data_annotation_cols ## S4 replacement method for signature 'AnyHermesData,DataFrame' annotation(object) <- value
object |
( |
... |
not used. |
value |
( |
The annotation column names are available in the exported
character vector .row_data_annotation_cols.
The S4Vectors::DataFrame with the gene annotations:
symbol
desc
chromosome
size
When trying to replace the required annotations with completely missing
values for any genes, a warning will be given and the corresponding gene
IDs will be saved in the attribute annotation.missing.genes. Note also
that additional annotations beyond the required ones may be supplied and
will be stored.
object <- hermes_data head(annotation(object))object <- hermes_data head(annotation(object))
The documentation to this function lists all the conventional arguments in
additional checkmate assertions.
x |
an object to check. |
null.ok |
( |
.var.name |
( |
add |
( |
info |
( |
label |
( |
assert_that
We provide additional assertion functions which can be used together with
assertthat::assert_that().
We provide additional assertion functions which can be used together with
the checkmate functions. These are described in individual help pages
linked below.
is_class(x, class2) is_hermes_data(x) is_counts_vector(x) is_list_with(x, elements) one_provided(one, two) is_constant(x)is_class(x, class2) is_hermes_data(x) is_counts_vector(x) is_list_with(x, elements) one_provided(one, two) is_constant(x)
x |
an object to check. |
class2 |
( |
elements |
( |
one |
first input. |
two |
second input. |
Depending on the function prefix.
assert_ functions return the object invisibly if successful, and otherwise
throw an error message.
check_ functions return TRUE if successful, otherwise a string with the
error message.
test_ functions just return TRUE or FALSE.
is_class(): checks the class.
is_hermes_data(): checks whether x is an AnyHermesData object.
is_counts_vector(): checks for a vector of counts (positive integers).
is_list_with(): checks for a list containing elements.
one_provided(): checks that exactly one of the two inputs one, two is not NULL.
is_constant(): checks whether the vector x is constant (only supports numeric, factor,
character, logical). NAs are removed first.
# Assert a general class. a <- 5 is_class(a, "character") # Assert a `AnyHermesData` object. is_hermes_data(hermes_data) is_hermes_data(42) # Assert a counts vector. a <- 5L is_counts_vector(a) # Assert a list containing certain elements. b <- list(a = 5, b = 3) is_list_with(b, c("a", "c")) is_list_with(b, c("a", "b")) # Assert that exactly one of two arguments is provided. a <- 10 b <- 10 one_provided(a, b) one_provided(a, NULL) # Assert a constant vector. is_constant(c(1, 2)) is_constant(c(NA, 1)) is_constant(c("a", "a")) is_constant(factor(c("a", "a")))# Assert a general class. a <- 5 is_class(a, "character") # Assert a `AnyHermesData` object. is_hermes_data(hermes_data) is_hermes_data(42) # Assert a counts vector. a <- 5L is_counts_vector(a) # Assert a list containing certain elements. b <- list(a = 5, b = 3) is_list_with(b, c("a", "c")) is_list_with(b, c("a", "b")) # Assert that exactly one of two arguments is provided. a <- 10 b <- 10 one_provided(a, b) one_provided(a, NULL) # Assert a constant vector. is_constant(c(1, 2)) is_constant(c(NA, 1)) is_constant(c("a", "a")) is_constant(factor(c("a", "a")))
This generates all standard plots - histogram and q-q plot of library sizes, density plot of the (log) counts distributions, boxplot of the number of number of non-zero expressed genes per sample, and a stacked barplot of low expression genes by chromosome at default setting.
## S4 method for signature 'AnyHermesData' autoplot(object)## S4 method for signature 'AnyHermesData' autoplot(object)
object |
( |
A list with the ggplot objects from draw_libsize_hist(), draw_libsize_qq(),
draw_libsize_densities(), draw_nonzero_boxplot() and draw_genes_barplot()
functions with default settings.
result <- hermes_data autoplot(result)result <- hermes_data autoplot(result)
The calc_pca() function performs principal components analysis of the gene count
vectors across all samples.
A corresponding autoplot() method then can visualize the results.
calc_pca(object, assay_name = "counts", n_top = NULL)calc_pca(object, assay_name = "counts", n_top = NULL)
object |
( |
assay_name |
( |
n_top |
( |
PCA should be performed after filtering out low quality genes and samples, as well as normalization of counts.
In addition, genes with constant counts across all samples are excluded from
the analysis internally in calc_pca(). Centering and scaling is also applied internally.
Plots can be obtained with the ggplot2::autoplot() function
with the corresponding method from the ggfortify package to plot the
results of a principal components analysis saved in a HermesDataPca
object. See ggfortify::autoplot.prcomp() for details.
A HermesDataPca object which is an extension of the stats::prcomp class.
Afterwards correlations between principal components
and sample variables can be calculated, see pca_cor_samplevar.
object <- hermes_data %>% add_quality_flags() %>% filter() %>% normalize() result <- calc_pca(object, assay_name = "tpm") summary(result) result1 <- calc_pca(object, assay_name = "tpm", n_top = 500) summary(result1) # Plot the results. autoplot(result) autoplot(result, x = 2, y = 3) autoplot(result, variance_percentage = FALSE) autoplot(result, label = TRUE, label.repel = TRUE)object <- hermes_data %>% add_quality_flags() %>% filter() %>% normalize() result <- calc_pca(object, assay_name = "tpm") summary(result) result1 <- calc_pca(object, assay_name = "tpm", n_top = 500) summary(result1) # Plot the results. autoplot(result) autoplot(result, x = 2, y = 3) autoplot(result, variance_percentage = FALSE) autoplot(result, label = TRUE, label.repel = TRUE)
This function concatenates inputs like cat()
and prints them with newline.
cat_with_newline(...)cat_with_newline(...)
... |
inputs to concatenate. |
None, only used for the side effect of producing the concatenated output in the R console.
This is similar to cli::cat_line().
cat_with_newline("hello", "world")cat_with_newline("hello", "world")
AnyHermesData ObjectsThis method combines AnyHermesData objects with the same ranges but different
samples (columns in assays).
... |
( |
The combined AnyHermesData object.
Note that this just inherits
SummarizedExperiment::cbind,SummarizedExperiment-method(). When binding a
AnyHermesData object with a SummarizedExperiment::SummarizedExperiment
object, then the result will be a
SummarizedExperiment::SummarizedExperiment object (the more general
class).
Note that the combined object needs to have unique sample IDs (column names).
rbind to row bind objects.
a <- hermes_data[, 1:10] b <- hermes_data[, 11:20] result <- cbind(a, b) class(result)a <- hermes_data[, 1:10] b <- hermes_data[, 11:20] result <- cbind(a, b) class(result)
Check whether x is a (single) proportion.
check_proportion(x, null.ok = FALSE) assert_proportion( x, null.ok = FALSE, .var.name = checkmate::vname(x), add = NULL ) test_proportion(x, null.ok = FALSE) expect_proportion(x, null.ok = FALSE, info = NULL, label = vname(x))check_proportion(x, null.ok = FALSE) assert_proportion( x, null.ok = FALSE, .var.name = checkmate::vname(x), add = NULL ) test_proportion(x, null.ok = FALSE) expect_proportion(x, null.ok = FALSE, info = NULL, label = vname(x))
x |
an object to check. |
null.ok |
( |
.var.name |
( |
add |
( |
info |
( |
label |
( |
TRUE if successful, otherwise a string with the error message.
assertions for more details.
check_proportion(0.25)check_proportion(0.25)
This obtains the sample variables of a HermesData object together
with selected gene information.
col_data_with_genes(object, assay_name, genes)col_data_with_genes(object, assay_name, genes)
object |
( |
assay_name |
( |
genes |
( |
The combined data set, where the additional attribute gene_cols contains
the names of the columns obtained by extracting the genes information.
The class of the returned data set will depend on the class of colData, so usually
will be S4Vectors::DFrame.
result <- col_data_with_genes(hermes_data, "counts", gene_spec("GeneID:1820")) tail(names(result)) result$GeneID.1820result <- col_data_with_genes(hermes_data, "counts", gene_spec("GeneID:1820")) tail(names(result)) result$GeneID.1820
This helper function returns the Z-score from an assay stored as a matrix.
colMeanZscores(x)colMeanZscores(x)
x |
( |
A numeric vector containing the mean Z-score values for each
column in x.
object <- hermes_data %>% add_quality_flags() %>% filter() %>% normalize() %>% assay("counts") colMeanZscores(object)object <- hermes_data %>% add_quality_flags() %>% filter() %>% normalize() %>% assay("counts") colMeanZscores(object)
This helper function returns the first principal component from an assay
stored as a matrix.
colPrinComp1(x, center = TRUE, scale = TRUE)colPrinComp1(x, center = TRUE, scale = TRUE)
x |
( |
center |
( |
scale |
( |
A numeric vector containing the principal component values for each
column in x.
object <- hermes_data %>% add_quality_flags() %>% filter() %>% normalize() %>% assay("counts") colPrinComp1(object)object <- hermes_data %>% add_quality_flags() %>% filter() %>% normalize() %>% assay("counts") colPrinComp1(object)
BioMart
connect_biomart() creates a connection object of class ConnectionBiomart which contains
the biomaRt object of class biomaRt::Mart and the prefix of the object
which is used downstream for the query.
connect_biomart(prefix = c("ENSG", "GeneID"), version = NULL)connect_biomart(prefix = c("ENSG", "GeneID"), version = NULL)
prefix |
( |
version |
( |
This connects to the Ensembl data base of BioMart for human genes.
A specific version can be optionally chosen to ensure reproducibility of results
once a new release is available, as accessed data might then change.
ConnectionBiomart object.
if (interactive()) { connection <- connect_biomart("ENSG") }if (interactive()) { connection <- connect_biomart("ENSG") }
This control function allows for easy customization of the normalization settings.
control_normalize( log = TRUE, lib_sizes = NULL, prior_count = 1, fit_type = "parametric" )control_normalize( log = TRUE, lib_sizes = NULL, prior_count = 1, fit_type = "parametric" )
log |
( |
lib_sizes |
( |
prior_count |
(non-negative |
fit_type |
( |
List with the above settings used to perform the normalization procedure.
To be used with the normalize() function.
control_normalize() control_normalize(log = FALSE, lib_sizes = rep(1e6L, 20))control_normalize() control_normalize(log = FALSE, lib_sizes = rep(1e6L, 20))
Control function which specifies the quality flag settings. One or more settings can be customized. Not specified settings are left at defaults.
control_quality( min_cpm = 1, min_cpm_prop = 0.25, min_corr = 0.5, min_depth = NULL )control_quality( min_cpm = 1, min_cpm_prop = 0.25, min_corr = 0.5, min_depth = NULL )
min_cpm |
(non-negative |
min_cpm_prop |
( |
min_corr |
( |
min_depth |
(non-negative |
List with the above criteria to flag observations.
To be used with the add_quality_flags() function.
# Default settings. control_quality() # One or more settings can be customized. control_quality(min_cpm = 5, min_cpm_prop = 0.001)# Default settings. control_quality() # One or more settings can be customized. control_quality(min_cpm = 5, min_cpm_prop = 0.001)
New generic function to calculate correlations for one or two objects.
correlate(object, ...)correlate(object, ...)
object |
input of which the class will be used to decide the method. |
... |
additional arguments. |
Corresponding object that contains the correlation results.
pca_cor_samplevar and calc_cor which are the methods included for this generic function.
sample_cors <- correlate(hermes_data) autoplot(sample_cors) pca_sample_var_cors <- correlate(calc_pca(hermes_data), hermes_data) autoplot(pca_sample_var_cors)sample_cors <- correlate(hermes_data) autoplot(sample_cors) pca_sample_var_cors <- correlate(calc_pca(hermes_data), hermes_data) autoplot(pca_sample_var_cors)
AnyHermesData
The correlate() method can calculate the correlation matrix between the sample vectors of
counts from a specified assay. This produces a HermesDataCor object, which is an extension
of a matrix with additional quality flags in the slot flag_data
(containing the tech_failure_flag and low_depth_flag columns describing the original
input samples).
An autoplot() method then afterwards can produce the corresponding heatmap.
## S4 method for signature 'AnyHermesData' correlate(object, assay_name = "counts", method = "pearson", ...) ## S4 method for signature 'HermesDataCor' autoplot( object, flag_colors = c(`FALSE` = "green", `TRUE` = "red"), cor_colors = circlize::colorRamp2(c(0, 0.5, 1), c("red", "yellow", "green")), ... )## S4 method for signature 'AnyHermesData' correlate(object, assay_name = "counts", method = "pearson", ...) ## S4 method for signature 'HermesDataCor' autoplot( object, flag_colors = c(`FALSE` = "green", `TRUE` = "red"), cor_colors = circlize::colorRamp2(c(0, 0.5, 1), c("red", "yellow", "green")), ... )
object |
( |
assay_name |
( |
method |
( |
... |
other arguments to be passed to |
flag_colors |
(named |
cor_colors |
( |
A HermesDataCor object.
autoplot(HermesDataCor): This autoplot() method uses the ComplexHeatmap::Heatmap() function
to plot the correlations between samples saved in a HermesDataCor object.
object <- hermes_data # Calculate the sample correlation matrix. correlate(object) # We can specify another correlation coefficient to be calculated. result <- correlate(object, method = "spearman") # Plot the correlation matrix. autoplot(result) # We can customize the heatmap. autoplot(result, show_column_names = FALSE, show_row_names = FALSE) # Including changing the axis label text size. autoplot( result, row_names_gp = grid::gpar(fontsize = 8), column_names_gp = grid::gpar(fontsize = 8) )object <- hermes_data # Calculate the sample correlation matrix. correlate(object) # We can specify another correlation coefficient to be calculated. result <- correlate(object, method = "spearman") # Plot the correlation matrix. autoplot(result) # We can customize the heatmap. autoplot(result, show_column_names = FALSE, show_row_names = FALSE) # Including changing the axis label text size. autoplot( result, row_names_gp = grid::gpar(fontsize = 8), column_names_gp = grid::gpar(fontsize = 8) )
This correlate() method analyses the correlations (in R2 values) between all sample variables
in a AnyHermesData object and the principal components of the samples.
A corresponding autoplot() method then can visualize the results in a heatmap.
## S4 method for signature 'HermesDataPca' correlate(object, data) ## S4 method for signature 'HermesDataPcaCor' autoplot( object, cor_colors = circlize::colorRamp2(c(-1, 0, 1), c("blue", "white", "red")), ... )## S4 method for signature 'HermesDataPca' correlate(object, data) ## S4 method for signature 'HermesDataPcaCor' autoplot( object, cor_colors = circlize::colorRamp2(c(-1, 0, 1), c("blue", "white", "red")), ... )
object |
( |
data |
( |
cor_colors |
( |
... |
other arguments to be passed to |
A HermesDataPcaCor object with R2 values for all sample variables.
autoplot(HermesDataPcaCor): This plot method uses the ComplexHeatmap::Heatmap() function
to visualize a HermesDataPcaCor object.
h_pca_df_r2_matrix() which is used internally for the details.
object <- hermes_data %>% add_quality_flags() %>% filter() %>% normalize() # Perform PCA and then correlate the prinicipal components with the sample variables. object_pca <- calc_pca(object) result <- correlate(object_pca, object) # Visualize the correlations in a heatmap. autoplot(result) # We can also choose to not reorder the columns. autoplot(result, cluster_columns = FALSE) # We can also choose break-points for color customization. autoplot( result, cor_colors = circlize::colorRamp2( c(-0.5, -0.25, 0, 0.25, 0.5, 0.75, 1), c("blue", "green", "purple", "yellow", "orange", "red", "brown") ) )object <- hermes_data %>% add_quality_flags() %>% filter() %>% normalize() # Perform PCA and then correlate the prinicipal components with the sample variables. object_pca <- calc_pca(object) result <- correlate(object_pca, object) # Visualize the correlations in a heatmap. autoplot(result) # We can also choose to not reorder the columns. autoplot(result, cluster_columns = FALSE) # We can also choose break-points for color customization. autoplot( result, cor_colors = circlize::colorRamp2( c(-0.5, -0.25, 0, 0.25, 0.5, 0.75, 1), c("blue", "green", "purple", "yellow", "orange", "red", "brown") ) )
These methods access and set the counts assay in a AnyHermesData object.
## S4 method for signature 'AnyHermesData' counts(object, ...) ## S4 replacement method for signature 'AnyHermesData,matrix' counts(object, ..., withDimnames = TRUE) <- value## S4 method for signature 'AnyHermesData' counts(object, ...) ## S4 replacement method for signature 'AnyHermesData,matrix' counts(object, ..., withDimnames = TRUE) <- value
object |
( |
... |
not used. |
withDimnames |
( |
value |
( |
The counts assay.
counts(object = AnyHermesData) <- value:
a <- hermes_data result <- counts(a) class(result) head(result) counts(a) <- counts(a) + 100L head(counts(a))a <- hermes_data result <- counts(a) class(result) head(result) counts(a) <- counts(a) + 100L head(counts(a))
This function transforms a numeric vector into a factor corresponding to the quantile intervals. The intervals are left-open and right-closed.
cut_quantile(x, percentiles = c(1/3, 2/3), digits = 0)cut_quantile(x, percentiles = c(1/3, 2/3), digits = 0)
x |
( |
percentiles |
( |
digits |
( |
The factor with a description of the available quantiles as levels.
set.seed(452) x <- runif(10, -10, 10) cut_quantile(x, c(0.33333333, 0.6666666), digits = 4) x[1:4] <- NA cut_quantile(x)set.seed(452) x <- runif(10, -10, 10) cut_quantile(x, c(0.33333333, 0.6666666), digits = 4) x[1:4] <- NA cut_quantile(x)
DataFrame
This utility function converts all eligible character and logical variables in a
S4Vectors::DataFrame to factor variables. All factor variables get amended
with an explicit missing level. This includes both NA and empty strings.
df_cols_to_factor(data, omit_columns = NULL, na_level = "<Missing>")df_cols_to_factor(data, omit_columns = NULL, na_level = "<Missing>")
data |
( |
omit_columns |
( |
na_level |
( |
The modified data.
All required rowData and colData variables cannot be converted
to ensure proper downstream behavior. These are automatically omitted if found in data
and therefore do not need to be specified in omit_columns.
dat <- colData(summarized_experiment) any(vapply(dat, is.character, logical(1))) any(vapply(dat, is.logical, logical(1))) dat_converted <- df_cols_to_factor(dat) any(vapply(dat_converted, function(x) is.character(x) || is.logical(x), logical(1)))dat <- colData(summarized_experiment) any(vapply(dat, is.character, logical(1))) any(vapply(dat, is.logical, logical(1))) dat_converted <- df_cols_to_factor(dat) any(vapply(dat_converted, function(x) is.character(x) || is.logical(x), logical(1)))
The diff_expression() function performs differential expression analysis
using a method of preference.
A corresponding autoplot() method is visualizing the results as a volcano plot.
diff_expression(object, group, method = c("voom", "deseq2"), ...) ## S4 method for signature 'HermesDataDiffExpr' autoplot(object, adj_p_val_thresh = 0.05, log2_fc_thresh = 2.5)diff_expression(object, group, method = c("voom", "deseq2"), ...) ## S4 method for signature 'HermesDataDiffExpr' autoplot(object, adj_p_val_thresh = 0.05, log2_fc_thresh = 2.5)
object |
( |
group |
( |
method |
( |
... |
additional arguments passed to the helper function associated with the selected method. |
adj_p_val_thresh |
( |
log2_fc_thresh |
( |
Possible method choices are:
voom: uses limma::voom(), see h_diff_expr_voom() for details.
deseq2: uses DESeq2::DESeq(), see h_diff_expr_deseq2() for details.
A HermesDataDiffExpr object which is a data frame with the following columns for each gene
in the HermesData object:
log2_fc (the estimate of the log2 fold change between the 2 levels of the
provided factor)
stat (the test statistic, which one depends on the method used)
p_val (the raw p-value)
adj_p_val (the multiplicity adjusted p-value value)
autoplot(HermesDataDiffExpr): generates a volcano plot for a HermesDataDiffExpr object.
We provide the df_cols_to_factor() utility function that makes it easy to convert the
colData() character and logical variables to factors, so that they can be subsequently
used as group inputs. See the example.
In order to avoid a warning when using deseq2, it can be necessary to specify
fitType = "local" as additional argument. This could e.g. be the case when only few samples
are present in which case the default parametric dispersions estimation will not work.
object <- hermes_data %>% add_quality_flags() %>% filter() # Convert character and logical to factor variables in `colData`, # including the below used `group` variable. colData(object) <- df_cols_to_factor(colData(object)) res1 <- diff_expression(object, group = "SEX", method = "voom") head(res1) res2 <- diff_expression(object, group = "SEX", method = "deseq2") head(res2) # Pass method arguments to the internally used helper functions. res3 <- diff_expression(object, group = "SEX", method = "voom", robust = TRUE, trend = TRUE) head(res3) res4 <- diff_expression(object, group = "SEX", method = "deseq2", fitType = "local") head(res4) # Create the corresponding volcano plots. autoplot(res1) autoplot(res3)object <- hermes_data %>% add_quality_flags() %>% filter() # Convert character and logical to factor variables in `colData`, # including the below used `group` variable. colData(object) <- df_cols_to_factor(colData(object)) res1 <- diff_expression(object, group = "SEX", method = "voom") head(res1) res2 <- diff_expression(object, group = "SEX", method = "deseq2") head(res2) # Pass method arguments to the internally used helper functions. res3 <- diff_expression(object, group = "SEX", method = "voom", robust = TRUE, trend = TRUE) head(res3) res4 <- diff_expression(object, group = "SEX", method = "deseq2", fitType = "local") head(res4) # Create the corresponding volcano plots. autoplot(res1) autoplot(res3)
This produces a barplot of the dichotomized gene expression counts into two or three categories based on custom defined percentiles.
draw_barplot( object, assay_name, x_spec, facet_var = NULL, fill_var = NULL, percentiles = c(1/3, 2/3) )draw_barplot( object, assay_name, x_spec, facet_var = NULL, fill_var = NULL, percentiles = c(1/3, 2/3) )
object |
( |
assay_name |
( |
x_spec |
( |
facet_var |
( |
fill_var |
( |
percentiles |
( |
The ggplot barplot.
object <- hermes_data g <- genes(object) draw_barplot( object, assay_name = "counts", x_spec = gene_spec(g[1]), facet_var = "SEX", fill_var = "AGE18" ) draw_barplot( object, assay_name = "counts", x_spec = gene_spec(g[1:3], colMedians, "Median"), facet_var = "SEX", fill_var = "AGE18" ) draw_barplot( object, assay_name = "counts", x_spec = gene_spec(g[1:3], colMeans, "Mean"), facet_var = "SEX", fill_var = "AGE18", percentiles = c(0.1, 0.9) )object <- hermes_data g <- genes(object) draw_barplot( object, assay_name = "counts", x_spec = gene_spec(g[1]), facet_var = "SEX", fill_var = "AGE18" ) draw_barplot( object, assay_name = "counts", x_spec = gene_spec(g[1:3], colMedians, "Median"), facet_var = "SEX", fill_var = "AGE18" ) draw_barplot( object, assay_name = "counts", x_spec = gene_spec(g[1:3], colMeans, "Mean"), facet_var = "SEX", fill_var = "AGE18", percentiles = c(0.1, 0.9) )
This produces boxplots of the gene expression values of a single gene, multiple genes or a gene signature.
draw_boxplot( object, assay_name, genes, x_var = NULL, color_var = NULL, facet_var = NULL, violin = FALSE, jitter = FALSE ) h_draw_boxplot_df(object, assay_name, genes, x_var, color_var, facet_var)draw_boxplot( object, assay_name, genes, x_var = NULL, color_var = NULL, facet_var = NULL, violin = FALSE, jitter = FALSE ) h_draw_boxplot_df(object, assay_name, genes, x_var, color_var, facet_var)
object |
( |
assay_name |
( |
genes |
( |
x_var |
( |
color_var |
( |
facet_var |
( |
violin |
( |
jitter |
( |
The ggplot boxplot.
h_draw_boxplot_df(): Helper function to prepare the data frame required
for plotting.
object <- hermes_data draw_boxplot( object, assay_name = "counts", genes = gene_spec(c(A = genes(object)[1])), violin = TRUE ) object2 <- object %>% add_quality_flags() %>% filter() %>% normalize() draw_boxplot( object2, assay_name = "tpm", x_var = "SEX", genes = gene_spec(setNames(genes(object2)[1:10], 1:10), fun = colMeans), facet_var = "RACE", color_var = "AGE18", jitter = TRUE ) draw_boxplot( object, assay_name = "counts", x_var = "SEX", genes = gene_spec(genes(object)[1:3]), jitter = TRUE, facet_var = "AGE18" ) draw_boxplot( object, assay_name = "counts", genes = gene_spec(c(A = "GeneID:11185", B = "GeneID:10677")), violin = TRUE )object <- hermes_data draw_boxplot( object, assay_name = "counts", genes = gene_spec(c(A = genes(object)[1])), violin = TRUE ) object2 <- object %>% add_quality_flags() %>% filter() %>% normalize() draw_boxplot( object2, assay_name = "tpm", x_var = "SEX", genes = gene_spec(setNames(genes(object2)[1:10], 1:10), fun = colMeans), facet_var = "RACE", color_var = "AGE18", jitter = TRUE ) draw_boxplot( object, assay_name = "counts", x_var = "SEX", genes = gene_spec(genes(object)[1:3]), jitter = TRUE, facet_var = "AGE18" ) draw_boxplot( object, assay_name = "counts", genes = gene_spec(c(A = "GeneID:11185", B = "GeneID:10677")), violin = TRUE )
This creates a barplot of chromosomes for the AnyHermesData object with the proportions of low expression genes.
draw_genes_barplot( object, chromosomes = c(seq_len(22), "X", "Y", "MT"), include_others = TRUE )draw_genes_barplot( object, chromosomes = c(seq_len(22), "X", "Y", "MT"), include_others = TRUE )
object |
( |
chromosomes |
( |
include_others |
( |
The ggplot object with the histogram.
object <- hermes_data # Display chromosomes 1-22, X, Y, and MT. Other chromosomes are displayed in "Others". # To increase readability, we can have flip the coordinate axes. draw_genes_barplot(object) + coord_flip() # Alternatively we can also rotate the x-axis tick labels. draw_genes_barplot(object) + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)) # Display chromosomes 1 and 2. Other chromosomes are displayed in "Others". draw_genes_barplot(object, chromosomes = c("1", "2")) # Display chromosomes 1 and 2 only. draw_genes_barplot(object, chromosomes = c("1", "2"), include_others = FALSE)object <- hermes_data # Display chromosomes 1-22, X, Y, and MT. Other chromosomes are displayed in "Others". # To increase readability, we can have flip the coordinate axes. draw_genes_barplot(object) + coord_flip() # Alternatively we can also rotate the x-axis tick labels. draw_genes_barplot(object) + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)) # Display chromosomes 1 and 2. Other chromosomes are displayed in "Others". draw_genes_barplot(object, chromosomes = c("1", "2")) # Display chromosomes 1 and 2 only. draw_genes_barplot(object, chromosomes = c("1", "2"), include_others = FALSE)
This produces a heatmap of the chosen assay and groups by various sample variables.
draw_heatmap( object, assay_name, color_extremes = c(0.01, 0.99), col_data_annotation = NULL, ... )draw_heatmap( object, assay_name, color_extremes = c(0.01, 0.99), col_data_annotation = NULL, ... )
object |
( |
assay_name |
( |
color_extremes |
( |
col_data_annotation |
( |
... |
additional arguments to pass to |
The ComplexHeatmap::Heatmap heatmap
result <- hermes_data %>% normalize(methods = "voom") %>% add_quality_flags() %>% filter(what = "genes") draw_heatmap( object = result[1:10, ], assay_name = "counts", col_data_annotation = "COUNTRY" ) draw_heatmap( object = result[1:10, ], assay_name = "counts", color_extremes = c(0.001, 0.999), col_data_annotation = "AGEGRP" )result <- hermes_data %>% normalize(methods = "voom") %>% add_quality_flags() %>% filter(what = "genes") draw_heatmap( object = result[1:10, ], assay_name = "counts", col_data_annotation = "COUNTRY" ) draw_heatmap( object = result[1:10, ], assay_name = "counts", color_extremes = c(0.001, 0.999), col_data_annotation = "AGEGRP" )
This creates a density plot of the (log) counts distributions of the AnyHermesData object where each line on the plot corresponds to a sample.
draw_libsize_densities(object, log = TRUE)draw_libsize_densities(object, log = TRUE)
object |
( |
log |
( |
The ggplot object with the density plot.
result <- hermes_data draw_libsize_densities(result) draw_libsize_densities(result, log = FALSE)result <- hermes_data draw_libsize_densities(result) draw_libsize_densities(result, log = FALSE)
This creates a histogram of the library sizes of the AnyHermesData object.
draw_libsize_hist(object, bins = 30L, fill = "darkgrey")draw_libsize_hist(object, bins = 30L, fill = "darkgrey")
object |
( |
bins |
( |
fill |
( |
The ggplot object with the histogram.
result <- hermes_data draw_libsize_hist(result) draw_libsize_hist(result, bins = 10L, fill = "blue")result <- hermes_data draw_libsize_hist(result) draw_libsize_hist(result, bins = 10L, fill = "blue")
This creates a Q-Q plot of the library sizes of the AnyHermesData object.
draw_libsize_qq(object, color = "grey", linetype = "dashed")draw_libsize_qq(object, color = "grey", linetype = "dashed")
object |
( |
color |
( |
linetype |
( |
The ggplot object with the Q-Q Plot.
result <- hermes_data draw_libsize_qq(result) draw_libsize_qq(result, color = "blue", linetype = "solid") # We can also add sample names as labels. library(ggrepel) draw_libsize_qq(result) + geom_text_repel(label = colnames(result), stat = "qq")result <- hermes_data draw_libsize_qq(result) draw_libsize_qq(result, color = "blue", linetype = "solid") # We can also add sample names as labels. library(ggrepel) draw_libsize_qq(result) + geom_text_repel(label = colnames(result), stat = "qq")
This draws a boxplot, with overlaid data points, of the number of non-zero expressed genes per sample.
draw_nonzero_boxplot(object, position = position_jitter(0.2), alpha = 0.25)draw_nonzero_boxplot(object, position = position_jitter(0.2), alpha = 0.25)
object |
( |
position |
( |
alpha |
( |
The ggplot object with the boxplot.
# Default boxplot. result <- hermes_data draw_nonzero_boxplot(result) # Reusing the same position for labeling. library(ggrepel) pos <- position_jitter(0.5) draw_nonzero_boxplot(result, position = pos) + geom_text_repel(aes(label = samples(result)), position = pos)# Default boxplot. result <- hermes_data draw_nonzero_boxplot(result) # Reusing the same position for labeling. library(ggrepel) pos <- position_jitter(0.5) draw_nonzero_boxplot(result, position = pos) + geom_text_repel(aes(label = samples(result)), position = pos)
This produces a scatterplot of two genes or gene signatures.
draw_scatterplot( object, assay_name, x_spec, y_spec, color_var = NULL, facet_var = NULL, smooth_method = c("lm", "loess", "none") )draw_scatterplot( object, assay_name, x_spec, y_spec, color_var = NULL, facet_var = NULL, smooth_method = c("lm", "loess", "none") )
object |
( |
assay_name |
( |
x_spec |
( |
y_spec |
( |
color_var |
( |
facet_var |
( |
smooth_method |
( |
The ggplot scatterplot.
object <- hermes_data g <- genes(object) draw_scatterplot( object, assay_name = "counts", facet_var = NULL, x_spec = gene_spec(c(A = g[1])), y_spec = gene_spec(g[2]), color = "RACE" ) object2 <- object %>% add_quality_flags() %>% filter() %>% normalize() g2 <- genes(object2) draw_scatterplot( object2, assay_name = "tpm", facet_var = "SEX", x_spec = gene_spec(g2[1:10], colMeans, "Mean"), y_spec = gene_spec(g2[11:20], colMedians, "Median"), smooth_method = "loess" )object <- hermes_data g <- genes(object) draw_scatterplot( object, assay_name = "counts", facet_var = NULL, x_spec = gene_spec(c(A = g[1])), y_spec = gene_spec(g[2]), color = "RACE" ) object2 <- object %>% add_quality_flags() %>% filter() %>% normalize() g2 <- genes(object2) draw_scatterplot( object2, assay_name = "tpm", facet_var = "SEX", x_spec = gene_spec(g2[1:10], colMeans, "Mean"), y_spec = gene_spec(g2[11:20], colMedians, "Median"), smooth_method = "loess" )
ExpressionSet DataThis example data can be used to try out conversion of a Biobase::ExpressionSet
object into a HermesData object.
expression_setexpression_set
A Biobase::ExpressionSet object with 20 samples covering 5085
features (Entrez gene IDs).
This is an artificial dataset designed to resemble real data.
SummarizedExperiment::makeSummarizedExperimentFromExpressionSet() to convert into a
SummarizedExperiment::SummarizedExperiment.
summarized_experiment which contains similar data already as a
SummarizedExperiment::SummarizedExperiment.
The methods access the names of the variables in colData() and rowData() of
the object which are not required by design. So these can be additional sample or
patient characteristics, or gene characteristics.
extraColDataNames(x, ...) ## S4 method for signature 'AnyHermesData' extraColDataNames(x, ...) extraRowDataNames(x, ...) ## S4 method for signature 'AnyHermesData' extraRowDataNames(x, ...)extraColDataNames(x, ...) ## S4 method for signature 'AnyHermesData' extraColDataNames(x, ...) extraRowDataNames(x, ...) ## S4 method for signature 'AnyHermesData' extraRowDataNames(x, ...)
x |
( |
... |
not used. |
The character vector with the additional variable names in either
colData() or rowData().
object <- hermes_data extraColDataNames(object) extraRowDataNames(object)object <- hermes_data extraColDataNames(object) extraRowDataNames(object)
AnyHermesData on Subset Passing Default QC FlagsThis filters a AnyHermesData object using the default QC flags and required annotations.
filter(object, ...) ## S4 method for signature 'AnyHermesData' filter(object, what = c("genes", "samples"), annotation_required = "size") ## S4 method for signature 'data.frame' filter(object, ...) ## S4 method for signature 'ts' filter(object, ...)filter(object, ...) ## S4 method for signature 'AnyHermesData' filter(object, what = c("genes", "samples"), annotation_required = "size") ## S4 method for signature 'data.frame' filter(object, ...) ## S4 method for signature 'ts' filter(object, ...)
object |
( |
... |
additional arguments. |
what |
( |
annotation_required |
( |
Only genes without low expression (low_expression_flag) and samples
without low depth (low_depth_flag) or technical failure (tech_failure_flag)
remain in the returned filtered object.
Also required gene annotation columns can be specified, so that genes which are not complete
for these columns are filtered out. By default this is the size column, which is needed
for default normalization of the object.
The filtered AnyHermesData object.
The internal implementation cannot use the subset() method since that
requires non-standard evaluation of arguments.
a <- hermes_data dim(a) # Filter genes and samples on default QC flags. result <- filter(a) dim(result) # Filter only genes without low expression. result <- filter(a, what = "genes") # Filter only samples with low depth and technical failure. result <- filter(a, what = "samples") # Filter only genes, and require certain annotations to be present. result <- filter(a, what = "genes", annotation_required = c("size"))a <- hermes_data dim(a) # Filter genes and samples on default QC flags. result <- filter(a) dim(result) # Filter only genes without low expression. result <- filter(a, what = "genes") # Filter only samples with low depth and technical failure. result <- filter(a, what = "samples") # Filter only genes, and require certain annotations to be present. result <- filter(a, what = "genes", annotation_required = c("size"))
GeneSpec ConstructorCreates a new GeneSpec object.
gene_spec(genes = NULL, fun = NULL, fun_name = deparse(substitute(fun)))gene_spec(genes = NULL, fun = NULL, fun_name = deparse(substitute(fun)))
genes |
(named |
fun |
( |
fun_name |
( |
A new GeneSpec object.
gene_spec("GeneID:11185") gene_spec(c("GeneID:11185", "GeneID:10677", "GeneID:101928428"), fun = colMeans)gene_spec("GeneID:11185") gene_spec(c("GeneID:11185", "GeneID:10677", "GeneID:101928428"), fun = colMeans)
Access the gene IDs, i.e. row names, of a AnyHermesData object with a
nicely named accessor method.
genes(object) ## S4 method for signature 'AnyHermesData' genes(object)genes(object) ## S4 method for signature 'AnyHermesData' genes(object)
object |
( |
The character vector with the gene IDs.
samples() to access the sample IDs.
a <- hermes_data genes(a)a <- hermes_data genes(a)
A GeneSpec consists of the gene IDs (possibly named with labels),
the summary function and the name of the summary function.
new()
Creates a new GeneSpec object.
GeneSpec$new(genes = NULL, fun = NULL, fun_name = deparse(substitute(fun)))
genes(named character or NULL)
the gene IDs, where the names
are used as labels if available.
fun(function or NULL)
summary function. If NULL is
used then multiple genes are not summarized but returned as a matrix from the
extract method.
fun_name(string)
name of the summary function.
A new GeneSpec object.
get_genes()
Returns the genes.
GeneSpec$get_genes()
get_gene_labels()
Returns the gene labels (substituted by gene IDs if not available).
GeneSpec$get_gene_labels(genes = self$get_genes())
genes(character)
for which subset of genes the labels should be returned.
returns_vector()
Predicate whether the extract returns a vector or not.
GeneSpec$returns_vector()
get_label()
Returns a string which can be used e.g. for plot labels.
GeneSpec$get_label(genes = self$get_genes())
genes(character)
for which subset of genes the labels should be returned.
extract()
Extract the gene values from an assay as specified.
GeneSpec$extract(assay)
assay(matrix)
original matrix with rownames containing the
specified genes.
Either a vector with one value per column, or a matrix with multiple genes in the rows.
extract_data_frame()
Extract the gene values as a data.frame.
GeneSpec$extract_data_frame(assay)
assay(matrix)
original matrix with rownames containing the
specified genes.
A data.frame with the genes in the columns and the samples
in the rows.
clone()
The objects of this class are cloneable with this method.
GeneSpec$clone(deep = FALSE)
deepWhether to make a deep clone.
# Minimal specification if only one gene is used. x_spec <- gene_spec("GeneID:1820") # Using multiple genes with a signature. x_spec <- gene_spec(c("GeneID:1820", "GeneID:52"), fun = colMeans) x_spec <- gene_spec(c("GeneID:1820", "GeneID:52"), fun = colPrinComp1) x_spec$returns_vector() x_spec$get_genes() x_spec$get_gene_labels() x_spec$get_label() # Using multiple genes with partial labels, without a signature. x_spec <- gene_spec(c(A = "GeneID:1820", "GeneID:52")) x_spec$returns_vector() x_spec$get_gene_labels() # Use the gene specification to extract genes from a matrix. mat <- matrix( data = rpois(15, 10), nrow = 3, ncol = 5, dimnames = list(c("GeneID:1820", "GeneID:52", "GeneID:523"), NULL) ) x_spec$extract(mat) # We can also extract these as a `data.frame`. x_spec$extract_data_frame(mat)# Minimal specification if only one gene is used. x_spec <- gene_spec("GeneID:1820") # Using multiple genes with a signature. x_spec <- gene_spec(c("GeneID:1820", "GeneID:52"), fun = colMeans) x_spec <- gene_spec(c("GeneID:1820", "GeneID:52"), fun = colPrinComp1) x_spec$returns_vector() x_spec$get_genes() x_spec$get_gene_labels() x_spec$get_label() # Using multiple genes with partial labels, without a signature. x_spec <- gene_spec(c(A = "GeneID:1820", "GeneID:52")) x_spec$returns_vector() x_spec$get_gene_labels() # Use the gene specification to extract genes from a matrix. mat <- matrix( data = rpois(15, 10), nrow = 3, ncol = 5, dimnames = list(c("GeneID:1820", "GeneID:52", "GeneID:523"), NULL) ) x_spec$extract(mat) # We can also extract these as a `data.frame`. x_spec$extract_data_frame(mat)
The difference here to BiocGenerics::duplicated() is that also the first occurrence
of a duplicate is flagged as TRUE.
h_all_duplicated(x)h_all_duplicated(x)
x |
a (generalized, see |
Logical vector flagging all occurrences of duplicate values as TRUE.
h_all_duplicated(c("a", "a", "b")) duplicated(c("a", "a", "b"))h_all_duplicated(c("a", "a", "b")) duplicated(c("a", "a", "b"))
data.frame
This helper function converts all character and logical variables
to factor variables in a data.frame. It also sets an explicit missing data level
for all factor variables that have at least one NA. Empty strings are handled
as NA.
h_df_factors_with_explicit_na(data, na_level = "<Missing>")h_df_factors_with_explicit_na(data, na_level = "<Missing>")
data |
( |
na_level |
( |
The modified data.
dat <- data.frame( a = c(NA, 2), b = c("A", NA), c = c("C", "D"), d = factor(c(NA, "X")), e = factor(c("Y", "Z")) ) h_df_factors_with_explicit_na(dat)dat <- data.frame( a = c(NA, 2), b = c("A", NA), c = c("C", "D"), d = factor(c(NA, "X")), e = factor(c("Y", "Z")) ) h_df_factors_with_explicit_na(dat)
DESeq2 Differential Expression AnalysisThis helper functions performs the differential expression analysis with
DESeq2::DESeq() for a given AnyHermesData input and design matrix.
h_diff_expr_deseq2(object, design, ...)h_diff_expr_deseq2(object, design, ...)
object |
( |
design |
( |
... |
additional arguments internally passed to |
A data frame with columns log2_fc (estimated log2 fold change),
stat (Wald statistic), p_val (raw p-value), adj_p_pval (Benjamini-Hochberg adjusted p-value).
Love MI, Huber W, Anders S (2014). “Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2.” Genome Biology, 15(12), 550. doi:10.1186/s13059-014-0550-8.
object <- hermes_data # Create the design matrix corresponding to the factor of interest. design <- model.matrix(~SEX, colData(object)) # Then perform the `DESeq2` differential expression analysis. result <- h_diff_expr_deseq2(object, design) head(result) # Change of the `fitType` can be required in some cases. result2 <- h_diff_expr_deseq2(object, design, fitType = "local") head(result2)object <- hermes_data # Create the design matrix corresponding to the factor of interest. design <- model.matrix(~SEX, colData(object)) # Then perform the `DESeq2` differential expression analysis. result <- h_diff_expr_deseq2(object, design) head(result) # Change of the `fitType` can be required in some cases. result2 <- h_diff_expr_deseq2(object, design, fitType = "local") head(result2)
limma/voom Differential Expression AnalysisThis helper functions performs the differential expression analysis with the voom
method from the limma package (via limma::voom(), limma::lmFit() and limma::eBayes())
for given counts in a AnyHermesData object and a corresponding design matrix.
h_diff_expr_voom(object, design, ...)h_diff_expr_voom(object, design, ...)
object |
( |
design |
( |
... |
additional arguments internally passed to |
A data frame with columns log2_fc (estimated log2 fold change),
stat (moderated t-statistic), p_val (raw p-value), adj_p_pval (Benjamini-Hochberg adjusted p-value).
Ritchie ME, Phipson B, Wu D, Hu Y, Law CW, Shi W, Smyth GK (2015). “limma powers differential expression analyses for RNA-sequencing and microarray studies.” Nucleic Acids Research, 43(7), e47. doi:10.1093/nar/gkv007.
Law CW, Chen Y, Shi W, Smyth GK (2014). “voom: precision weights unlock linear model analysis tools for RNA-seq read counts.” Genome Biology, 15(2), R29. doi:10.1186/gb-2014-15-2-r29.
object <- hermes_data # Create the design matrix corresponding to the factor of interest. design <- model.matrix(~SEX, colData(object)) # Then perform the differential expression analysis. result <- h_diff_expr_voom(object, design) head(result) # Sometimes we might want to specify method details. result2 <- h_diff_expr_voom(object, design, trend = TRUE, robust = TRUE) head(result2)object <- hermes_data # Create the design matrix corresponding to the factor of interest. design <- model.matrix(~SEX, colData(object)) # Then perform the differential expression analysis. result <- h_diff_expr_voom(object, design) head(result) # Sometimes we might want to specify method details. result2 <- h_diff_expr_voom(object, design, trend = TRUE, robust = TRUE) head(result2)
Ensembl to Entrez Gene IDsThis helper function queries BioMart to translate Ensembl to Entrez Gene IDs.
h_ensembl_to_entrez_ids(gene_ids, mart)h_ensembl_to_entrez_ids(gene_ids, mart)
gene_ids |
( |
mart |
( |
Character vector of Entrez gene IDs.
if (interactive()) { mart <- biomaRt::useMart("ensembl", dataset = "hsapiens_gene_ensembl") h_ensembl_to_entrez_ids(c("ENSG00000135407", "ENSG00000241644"), mart) }if (interactive()) { mart <- biomaRt::useMart("ensembl", dataset = "hsapiens_gene_ensembl") h_ensembl_to_entrez_ids(c("ENSG00000135407", "ENSG00000241644"), mart) }
BioMart
Helper function to query annotations from biomaRt, for cleaned up gene IDs of
a specific ID variable and given biomaRt::Mart.
h_get_annotation_biomart(gene_ids, id_var, mart)h_get_annotation_biomart(gene_ids, id_var, mart)
gene_ids |
( |
id_var |
( |
mart |
( |
A data frame with columns:
id_var (depending on what was used)
hgnc_symbol
entrezgene_description
chromosome_name
size
refseq_mrna
refseq_peptide
if (interactive()) { mart <- biomaRt::useMart("ensembl", dataset = "hsapiens_gene_ensembl") h_get_annotation_biomart(c("11185", "10677"), id_var = "entrezgene_id", mart = mart) }if (interactive()) { mart <- biomaRt::useMart("ensembl", dataset = "hsapiens_gene_ensembl") h_get_annotation_biomart(c("11185", "10677"), id_var = "entrezgene_id", mart = mart) }
BioMart Coordinates into GRanges
This function extracts the chromosome number, the start position and the end position of transcripts
in given data.frame with coordinates as returned by biomaRt::getBM() and converts
them to a GRanges object.
h_get_granges_by_id(coords, id)h_get_granges_by_id(coords, id)
coords |
( |
id |
( |
GRange objects for the respective single gene ID.
if (interactive()) { mart <- biomaRt::useMart("ensembl", dataset = "hsapiens_gene_ensembl") attrs <- c( "ensembl_gene_id", "ensembl_exon_id", "chromosome_name", "exon_chrom_start", "exon_chrom_end" ) coords <- biomaRt::getBM( filters = "entrezgene_id", attributes = attrs, values = c("11185", "10677"), mart = mart ) h_get_granges_by_id(coords, "ENSG00000135407") }if (interactive()) { mart <- biomaRt::useMart("ensembl", dataset = "hsapiens_gene_ensembl") attrs <- c( "ensembl_gene_id", "ensembl_exon_id", "chromosome_name", "exon_chrom_start", "exon_chrom_end" ) coords <- biomaRt::getBM( filters = "entrezgene_id", attributes = attrs, values = c("11185", "10677"), mart = mart ) h_get_granges_by_id(coords, "ENSG00000135407") }
This helper function queries BioMart for lengths of genes by adding up all
exon lengths after reducing overlaps.
h_get_size_biomart(gene_ids, id_var, mart)h_get_size_biomart(gene_ids, id_var, mart)
gene_ids |
( |
id_var |
( |
mart |
( |
Named integer vector indicating the gene lengths.
if (interactive()) { mart <- biomaRt::useMart("ensembl", dataset = "hsapiens_gene_ensembl") h_get_size_biomart("11185", "entrezgene_id", mart) h_get_size_biomart("ENSG00000215417", "ensembl_gene_id", mart) h_get_size_biomart(c("11185", "10677"), "entrezgene_id", mart) h_get_size_biomart(c("ENSG00000135407", "ENSG00000215417"), "ensembl_gene_id", mart) }if (interactive()) { mart <- biomaRt::useMart("ensembl", dataset = "hsapiens_gene_ensembl") h_get_size_biomart("11185", "entrezgene_id", mart) h_get_size_biomart("ENSG00000215417", "ensembl_gene_id", mart) h_get_size_biomart(c("11185", "10677"), "entrezgene_id", mart) h_get_size_biomart(c("ENSG00000135407", "ENSG00000215417"), "ensembl_gene_id", mart) }
This helper function determines for each gene in the object whether all required annotation columns are filled.
h_has_req_annotations(object, annotation_required)h_has_req_annotations(object, annotation_required)
object |
( |
annotation_required |
( |
Named logical vector with one value for each gene in object, which is TRUE if all
required annotation columns are filled, and otherwise FALSE.
filter() where this is used internally.
object <- hermes_data result <- h_has_req_annotations(object, "size") all(result) rowData(object)$size[1] <- NA # nolint which(!h_has_req_annotations(object, "size"))object <- hermes_data result <- h_has_req_annotations(object, "size") all(result) rowData(object)$size[1] <- NA # nolint which(!h_has_req_annotations(object, "size"))
This is used by the rename method. It wraps the assertions and the
matching used several times.
h_map_pos(names, map)h_map_pos(names, map)
names |
( |
map |
(named |
Integer vector of the positions of the map values in the names.
h_map_pos(c("a", "b"), c(d = "b"))h_map_pos(c("a", "b"), c(d = "b"))
This helper function adds parentheses around each element of a character vector.
h_parens(x)h_parens(x)
x |
( |
Character vector with parentheses, except when x is a blank string
in which case it is returned unaltered.
h_parens("bla") h_parens("") h_parens(c("bla", "bli"))h_parens("bla") h_parens("") h_parens(c("bla", "bli"))
This function processes sample variables from AnyHermesData and the
corresponding principal components matrix, and then generates the matrix of R2 values.
h_pca_df_r2_matrix(pca, df)h_pca_df_r2_matrix(pca, df)
pca |
( |
df |
( |
Note that only the df columns which are numeric, character, factor or
logical are included in the resulting matrix, because other variable types are not
supported.
In addition, df columns which are constant, all NA, or character or factor
columns with too many levels are also dropped before the analysis.
A matrix with R2 values for all combinations of sample variables and principal components.
h_pca_var_rsquared() which is used internally to calculate the R2 for one
sample variable.
object <- hermes_data %>% add_quality_flags() %>% filter() %>% normalize() # Obtain the principal components. pca <- calc_pca(object)$x # Obtain the `colData` as a `data.frame`. df <- as.data.frame(colData(object)) # Correlate them. r2_all <- h_pca_df_r2_matrix(pca, df) str(r2_all) # We can see that only about half of the columns from `df` were # used for the correlations. ncol(r2_all) ncol(df)object <- hermes_data %>% add_quality_flags() %>% filter() %>% normalize() # Obtain the principal components. pca <- calc_pca(object)$x # Obtain the `colData` as a `data.frame`. df <- as.data.frame(colData(object)) # Correlate them. r2_all <- h_pca_df_r2_matrix(pca, df) str(r2_all) # We can see that only about half of the columns from `df` were # used for the correlations. ncol(r2_all) ncol(df)
This helper function calculates R2 values between one sample variable from AnyHermesData
and all Principal Components (PCs) separately (one linear model is fit for each PC).
h_pca_var_rsquared(pca, x)h_pca_var_rsquared(pca, x)
pca |
( |
x |
( |
Note that in case there are estimation problems for any of the PCs, then NA will
be returned for those.
A vector with R2 values for each principal component.
object <- hermes_data %>% add_quality_flags() %>% filter() %>% normalize() # Obtain the principal components. pca <- calc_pca(object)$x # Obtain the sample variable. x <- colData(object)$AGE18 # Correlate them. r2 <- h_pca_var_rsquared(pca, x)object <- hermes_data %>% add_quality_flags() %>% filter() %>% normalize() # Obtain the principal components. pca <- calc_pca(object)$x # Obtain the sample variable. x <- colData(object)$AGE18 # Correlate them. r2 <- h_pca_var_rsquared(pca, x)
This helper function makes a short list string, e.g. "a, b, ..., z"
out of a character vector, e.g. letters.
h_short_list(x, sep = ", ", thresh = 3L)h_short_list(x, sep = ", ", thresh = 3L)
x |
( |
sep |
( |
thresh |
( |
String with the short list.
h_short_list(letters) h_short_list(letters[1:3]) h_short_list(LETTERS[1:5], sep = ";", thresh = 5L)h_short_list(letters) h_short_list(letters[1:3]) h_short_list(LETTERS[1:5], sep = ";", thresh = 5L)
This helper function removes the prefix and possible delimiter from a vector of gene IDs, such that only the digits are returned.
h_strip_prefix(gene_ids, prefix)h_strip_prefix(gene_ids, prefix)
gene_ids |
( |
prefix |
( |
Character vector that contains only the digits for each gene ID.
This is currently used to strip away the GeneID prefix from Entrez gene IDs
so that they can be queried from BioMart
h_strip_prefix(c("GeneID:11185", "GeneID:10677"), prefix = "GeneID")h_strip_prefix(c("GeneID:11185", "GeneID:10677"), prefix = "GeneID")
This helper function generates a set of unique labels given unique IDs and not necessarily unique names.
h_unique_labels(ids, nms = NULL)h_unique_labels(ids, nms = NULL)
ids |
( |
nms |
( |
Character vector where empty names are replaced by the IDs and non-unique names are made unique by appending the IDs in parentheses.
h_unique_labels(c("1", "2", "3"), c("A", "B", "A")) h_unique_labels(NULL) h_unique_labels(c("1", "2", "3"))h_unique_labels(c("1", "2", "3"), c("A", "B", "A")) h_unique_labels(NULL) h_unique_labels(c("1", "2", "3"))
HermesData DataThis example HermesData is created from the underlying SummarizedExperiment::SummarizedExperiment
object by renaming descriptors to align with standard specification. It already
contains the required columns in rowData and colData.
hermes_datahermes_data
A HermesData object with 20 samples covering
5085 features (Entrez gene IDs).
This is an artificial dataset designed to resemble real data.
summarized_experiment for the underlying SummarizedExperiment::SummarizedExperiment
object.
HermesData and RangedHermesData
The HermesData class is an extension of SummarizedExperiment::SummarizedExperiment
with additional validation criteria.
HermesData(object) HermesDataFromMatrix(counts, ...)HermesData(object) HermesDataFromMatrix(counts, ...)
object |
( |
counts |
( |
... |
additional arguments, e.g. |
The additional criteria are:
The first assay must be counts containing non-missing, integer, non-negative values.
Note that rename() can be used to edit the assay name to counts if needed.
The following columns must be in rowData:
symbol (also often called HGNC or similar, example: "INMT")
desc (the gene name, example: "indolethylamine N-methyltransferase")
chromosome (the chromosome as string, example: "7")
size (the size of the gene in base pairs, e.g 5468)
low_expression_flag (can be populated with add_quality_flags())
The following columns must be in colData:
low_depth_flag (can be populated with add_quality_flags())
tech_failure_flag (can be populated with add_quality_flags())
The object must have unique row and column names. The row names are the gene names and the column names are the sample names.
Analogously, RangedHermesData is an extension of
SummarizedExperiment::RangedSummarizedExperiment and has the same
additional validation requirements. Methods can be defined for both classes at the
same time with the AnyHermesData signature.
A Biobase::ExpressionSet object can be imported by using the
SummarizedExperiment::makeSummarizedExperimentFromExpressionSet() function to
first convert it to a SummarizedExperiment::SummarizedExperiment object before
converting it again into a HermesData object.
An object of class AnyHermesData (HermesData or RangedHermesData).
prefixcommon prefix of the gene IDs (row names).
Note that we use S4Vectors::setValidity2() to define the validity
method, which allows us to turn off the validity checks in internal
functions where intermediate objects may not be valid within the scope of
the function.
It can be helpful to convert character and logical variables to factors in colData()
(before or after the HermesData creation). We provide the utility function
df_cols_to_factor() to simplify this task, but leave it to the user to allow
for full control of the details.
rename() for renaming columns of the input data.
# Convert an `ExpressionSet` to a `RangedSummarizedExperiment`. ranged_summarized_experiment <- makeSummarizedExperimentFromExpressionSet(expression_set) # Then convert to `RangedHermesData`. HermesData(ranged_summarized_experiment) # Create objects starting from a `SummarizedExperiment`. hermes_data <- HermesData(summarized_experiment) hermes_data # Create objects from a matrix. Note that additional arguments are not required but possible. counts_matrix <- assay(summarized_experiment) counts_hermes_data <- HermesDataFromMatrix(counts_matrix)# Convert an `ExpressionSet` to a `RangedSummarizedExperiment`. ranged_summarized_experiment <- makeSummarizedExperimentFromExpressionSet(expression_set) # Then convert to `RangedHermesData`. HermesData(ranged_summarized_experiment) # Create objects starting from a `SummarizedExperiment`. hermes_data <- HermesData(summarized_experiment) hermes_data # Create objects from a matrix. Note that additional arguments are not required but possible. counts_matrix <- assay(summarized_experiment) counts_hermes_data <- HermesDataFromMatrix(counts_matrix)
This is a useful function when trying to join genetic with CDISC data sets.
inner_join_cdisc( gene_data, cdisc_data, patient_key = "USUBJID", additional_keys = character() )inner_join_cdisc( gene_data, cdisc_data, patient_key = "USUBJID", additional_keys = character() )
gene_data |
( |
cdisc_data |
( |
patient_key |
( |
additional_keys |
( |
A data.frame which contains columns from both data sets merged by the keys.
Columns which are contained in both data sets but are not specified as keys are taken
from gene_data and not from cdisc_data.
gene_data <- col_data_with_genes(hermes_data, "counts", gene_spec("GeneID:1820")) cdisc_data <- data.frame( USUBJID = head(gene_data$USUBJID, 10), extra = 1:10 ) result <- inner_join_cdisc(gene_data, cdisc_data) resultgene_data <- col_data_with_genes(hermes_data, "counts", gene_spec("GeneID:1820")) cdisc_data <- data.frame( USUBJID = head(gene_data$USUBJID, 10), extra = 1:10 ) result <- inner_join_cdisc(gene_data, cdisc_data) result
SummarizedExperiment
This method checks whether a SummarizedExperiment::SummarizedExperiment object is empty.
## S4 method for signature 'SummarizedExperiment' isEmpty(x)## S4 method for signature 'SummarizedExperiment' isEmpty(x)
x |
( |
Flag whether the object is empty.
isEmpty(summarized_experiment) isEmpty(summarized_experiment[NULL, ]) isEmpty(hermes_data)isEmpty(summarized_experiment) isEmpty(summarized_experiment[NULL, ]) isEmpty(hermes_data)
lapply method for MultiAssayExperiment
Apply a function on all experiments in an MAE.
## S4 method for signature 'MultiAssayExperiment' lapply(X, FUN, safe = TRUE, ...)## S4 method for signature 'MultiAssayExperiment' lapply(X, FUN, safe = TRUE, ...)
X |
( |
FUN |
( |
safe |
( |
... |
additional arguments passed to |
MultiAssayExperiment object with specified function applied.
object <- multi_assay_experiment result <- lapply(object, normalize, safe = TRUE) # Similarly, all experiments in an MAE can be converted to HermesData class: result <- lapply(object, HermesData, safe = TRUE)object <- multi_assay_experiment result <- lapply(object, normalize, safe = TRUE) # Similarly, all experiments in an MAE can be converted to HermesData class: result <- lapply(object, HermesData, safe = TRUE)
These methods access or set the metadata in a AnyHermesData object.
x |
( |
value |
( |
The metadata which is a list.
Note that this just inherits S4Vectors::metadata,Annotated-method().
a <- hermes_data metadata(a) metadata(a) <- list(new = "my metadata") metadata(a)a <- hermes_data metadata(a) metadata(a) <- list(new = "my metadata") metadata(a)
MultiAssayExperiment DataThis example MultiAssayExperiment::MultiAssayExperiment can be used as test data.
multi_assay_experimentmulti_assay_experiment
A MultiAssayExperiment::MultiAssayExperiment object with 3 separate HermesData
objects.
The first object contains 5 samples and covers 1000 features (Entrez gene IDs).
The second object contains 9 samples with 2500 features.
The third object contains 6 samples with 1300 features.
This is an artificial dataset designed to resemble real data.
AnyHermesData ObjectsThe normalize() method is normalizing the input AnyHermesData according to one or more
specified normalization methods. The results are saved as additional assays
in the object.
Possible normalization methods (which are implemented with separate helper functions):
cpm: Counts per Million (CPM). Separately by sample, the original counts of the genes
are divided by the library size of this sample, and multiplied by one million. This is the
appropriate normalization for between-sample comparisons.
rpkm: Reads per Kilobase of transcript per Million reads mapped (RPKM). Each gene count is divided by the gene size (in kilobases) and then again divided by the library sizes of each sample (in millions). This allows for within-sample comparisons, as it takes into account the gene sizes - longer genes will always have more counts than shorter genes.
tpm: Transcripts per Million (TPM). This addresses the problem of RPKM being inconsistent across samples (which can be seen that the sum of all RPKM values will vary from sample to sample). Therefore here we divide the RPKM by the sum of all RPKM values for each sample, and multiply by one million.
voom: VOOM normalization. This is essentially just a slight variation of CPM where
a prior_count of 0.5 is combined with lib_sizes increased by 1 for each sample. Note that
this is not required for the corresponding differential expression analysis, but just provided
as a complementary experimental normalization approach here.
vst: Variance stabilizing transformation. This is to transform the normalized
count data for all genes into approximately homoskedastic values (having constant variance).
rlog: The transformation to the log2 scale values with approximately homoskedastic values.
## S4 method for signature 'AnyHermesData' normalize( object, methods = c("cpm", "rpkm", "tpm", "voom", "vst"), control = control_normalize(), ... ) h_cpm(object, control = control_normalize()) h_rpkm(object, control = control_normalize()) h_tpm(object, control = control_normalize()) h_voom(object, control = control_normalize()) h_vst(object, control = control_normalize()) h_rlog(object, control = control_normalize())## S4 method for signature 'AnyHermesData' normalize( object, methods = c("cpm", "rpkm", "tpm", "voom", "vst"), control = control_normalize(), ... ) h_cpm(object, control = control_normalize()) h_rpkm(object, control = control_normalize()) h_tpm(object, control = control_normalize()) h_voom(object, control = control_normalize()) h_vst(object, control = control_normalize()) h_rlog(object, control = control_normalize())
object |
( |
methods |
( |
control |
(named |
... |
not used. |
The AnyHermesData object with additional assays containing the normalized counts.
The control is saved in the metadata of the object for future reference.
h_cpm(): calculates the Counts per Million (CPM) normalized counts.
h_rpkm(): calculates the Reads per Kilobase per Million (RPKM) normalized counts.
h_tpm(): calculates the Transcripts per Million (TPM) normalized counts.
h_vst(): variance stabilizing transformation (vst) from DESeq2 package.
h_rlog(): regularized log transformation (rlog) from DESeq2 package.
control_normalize() to define the normalization method settings.
a <- hermes_data # By default, log values are used with a prior count of 1 added to original counts. result <- normalize(a) assayNames(result) tpm <- assay(result, "tpm") tpm[1:3, 1:3] # We can also work on original scale. result_orig <- normalize(a, control = control_normalize(log = FALSE)) tpm_orig <- assay(result_orig, "tpm") tpm_orig[1:3, 1:3] # Separate calculation of the CPM normalized counts. counts_cpm <- h_cpm(a) str(counts_cpm) # Separate calculation of the RPKM normalized counts. counts_rpkm <- h_rpkm(a) str(counts_rpkm) # Separate calculation of the TPM normalized counts. counts_tpm <- h_tpm(a) str(counts_tpm) # Separate calculation of the VOOM normalized counts. counts_voom <- h_voom(a) str(counts_voom) # Separate calculation of the vst transformation. counts_vst <- h_vst(a) str(counts_vst) # Separate calculation of the rlog transformation. counts_rlog <- h_rlog(a) str(counts_rlog)a <- hermes_data # By default, log values are used with a prior count of 1 added to original counts. result <- normalize(a) assayNames(result) tpm <- assay(result, "tpm") tpm[1:3, 1:3] # We can also work on original scale. result_orig <- normalize(a, control = control_normalize(log = FALSE)) tpm_orig <- assay(result_orig, "tpm") tpm_orig[1:3, 1:3] # Separate calculation of the CPM normalized counts. counts_cpm <- h_cpm(a) str(counts_cpm) # Separate calculation of the RPKM normalized counts. counts_rpkm <- h_rpkm(a) str(counts_rpkm) # Separate calculation of the TPM normalized counts. counts_tpm <- h_tpm(a) str(counts_tpm) # Separate calculation of the VOOM normalized counts. counts_voom <- h_voom(a) str(counts_voom) # Separate calculation of the vst transformation. counts_vst <- h_vst(a) str(counts_vst) # Separate calculation of the rlog transformation. counts_rlog <- h_rlog(a) str(counts_rlog)
Generic function to access the prefix from an object.
prefix(object, ...)prefix(object, ...)
object |
( |
... |
additional arguments. |
The prefix slot contents.
a <- hermes_data prefix(a)a <- hermes_data prefix(a)
The generic function query() is the interface for querying gene annotations from
a data base connection.
query(genes, connection) ## S4 method for signature 'character,ConnectionBiomart' query(genes, connection)query(genes, connection) ## S4 method for signature 'character,ConnectionBiomart' query(genes, connection)
genes |
( |
connection |
(connection class) |
A method is provided for the ConnectionBiomart class. However, the framework
is extensible: It is simple to add new connections and corresponding query methods
for other data bases, e.g. company internal data bases. Please make sure to
follow the required format of the returned value.
The BioMart queries might not return information for all the genes. This can be
due to different versions being used in the gene IDs and the queried Ensembl data base.
A S4Vectors::DataFrame with the gene annotations. It is required that:
The rownames are identical to the input genes.
The colnames are equal to the annotation columns .row_data_annotation_cols.
Therefore, missing information needs to be properly included in the DataFrame
with NA entries.
if (interactive()) { object <- hermes_data connection <- connect_biomart(prefix(object)) result <- query(genes(object), connection) head(result) head(annotation(object)) }if (interactive()) { object <- hermes_data connection <- connect_biomart(prefix(object)) result <- query(genes(object), connection) head(result) head(annotation(object)) }
AnyHermesData ObjectsThis method combines AnyHermesData objects with the same samples but different
features of interest (rows in assays).
... |
( |
The combined AnyHermesData object.
Note that this just inherits
SummarizedExperiment::rbind,SummarizedExperiment-method(). When binding a
AnyHermesData object with a SummarizedExperiment::SummarizedExperiment
object, then the result will be a
SummarizedExperiment::SummarizedExperiment object (the more general
class).
Note that we need to have unique gene IDs (row names) and the same prefix across the combined object.
cbind to column bind objects.
a <- hermes_data[1:2542, ] b <- hermes_data[2543:5085, ] result <- rbind(a, b) class(result)a <- hermes_data[1:2542, ] b <- hermes_data[2543:5085, ] result <- rbind(a, b) class(result)
SummarizedExperiment ObjectsThis method renames columns of the rowData and colData, as well as assays, of
SummarizedExperiment::SummarizedExperiment objects. This increases the flexibility
since renaming can be done before conversion to a HermesData object.
## S4 method for signature 'SummarizedExperiment' rename( x, row_data = character(), col_data = character(), assays = character(), ... ) ## S4 method for signature 'data.frame' rename(x, ...)## S4 method for signature 'SummarizedExperiment' rename( x, row_data = character(), col_data = character(), assays = character(), ... ) ## S4 method for signature 'data.frame' rename(x, ...)
x |
( |
row_data |
(named |
col_data |
(named |
assays |
(named |
... |
additional arguments (not used here). |
The SummarizedExperiment::SummarizedExperiment object with renamed contents.
x <- summarized_experiment # Use deliberately a non-standard assay name in this example. assayNames(x) <- "count" # Rename `HGNC` to `symbol` in the `rowData`. x <- rename(x, row_data = c(symbol = "HGNC")) head(names(rowData(x))) # Rename `LowDepthFlag` to `low_depth_flag` in `colData`. x <- rename(x, col_data = c(low_depth_flag = "LowDepthFlag")) tail(names(colData(x))) # Rename assay `count` to `counts`. x <- rename(x, assays = c(counts = "count")) assayNames(x)x <- summarized_experiment # Use deliberately a non-standard assay name in this example. assayNames(x) <- "count" # Rename `HGNC` to `symbol` in the `rowData`. x <- rename(x, row_data = c(symbol = "HGNC")) head(names(rowData(x))) # Rename `LowDepthFlag` to `low_depth_flag` in `colData`. x <- rename(x, col_data = c(low_depth_flag = "LowDepthFlag")) tail(names(colData(x))) # Rename assay `count` to `counts`. x <- rename(x, assays = c(counts = "count")) assayNames(x)
Access the sample IDs, i.e. col names, of a AnyHermesData object with a
nicely named accessor method.
## S4 method for signature 'AnyHermesData' samples(object)## S4 method for signature 'AnyHermesData' samples(object)
object |
( |
The character vector with the sample IDs.
genes() to access the gene IDs.
a <- hermes_data samples(a)a <- hermes_data samples(a)
Setter function which allows the user to define a sample manually as a technical failure.
set_tech_failure(object, sample_ids)set_tech_failure(object, sample_ids)
object |
( |
sample_ids |
( |
AnyHermesData object with modified technical failure flags.
add_quality_flags() which automatically sets all (gene and sample) quality flags,
including these technical failure flags.
# Manually flag technical failures in a `AnyHermesData` object. object <- hermes_data get_tech_failure(object)["06520101B0017R"] result <- set_tech_failure(object, c("06520101B0017R", "06520047C0017R")) get_tech_failure(result)["06520101B0017R"]# Manually flag technical failures in a `AnyHermesData` object. object <- hermes_data get_tech_failure(object)["06520101B0017R"] result <- set_tech_failure(object, c("06520101B0017R", "06520047C0017R")) get_tech_failure(result)["06520101B0017R"]
AnyHermesData ObjectsA show method that displays high-level information of AnyHermesData objects.
## S4 method for signature 'HermesData' show(object) ## S4 method for signature 'RangedHermesData' show(object)## S4 method for signature 'HermesData' show(object) ## S4 method for signature 'RangedHermesData' show(object)
object |
( |
None (invisible NULL), only used for the side effect of printing to
the console.
The same method is used for both HermesData and RangedHermesData
objects. We need to define this separately to have this method used instead of
the one inherited from SummarizedExperiment::SummarizedExperiment.
object <- hermes_data objectobject <- hermes_data object
AnyHermesData ObjectsThis method subsets AnyHermesData objects, based on expressions involving the
rowData columns and the colData columns.
x |
( |
subset |
( |
select |
( |
The subsetted AnyHermesData object.
Note that this just inherits
SummarizedExperiment::subset,SummarizedExperiment-method().
a <- hermes_data a # Subset both genes and samples. subset(a, subset = low_expression_flag, select = DISCSTUD == "N") # Subset only genes. subset(a, subset = chromosome == "2") # Subset only samples. subset(a, select = AGE > 18)a <- hermes_data a # Subset both genes and samples. subset(a, subset = low_expression_flag, select = DISCSTUD == "N") # Subset only genes. subset(a, subset = chromosome == "2") # Subset only samples. subset(a, select = AGE > 18)
SummarizedExperiment DataThis example SummarizedExperiment::SummarizedExperiment can be used to create a
HermesData object. It already contains the required columns in rowData and colData.
summarized_experimentsummarized_experiment
A SummarizedExperiment::SummarizedExperiment object with 20 samples covering
5085 features (Entrez gene IDs).
This is an artificial dataset designed to resemble real data.
expression_set which contains similar data as a Biobase::ExpressionSet.
AnyHermesData ObjectsProvides a concise summary of the content of AnyHermesData objects.
summary(object, ...) ## S4 method for signature 'AnyHermesData' summary(object) ## S4 method for signature 'HermesDataSummary' show(object)summary(object, ...) ## S4 method for signature 'AnyHermesData' summary(object) ## S4 method for signature 'HermesDataSummary' show(object)
object |
( |
... |
not used. |
An object of the corresponding summary class, here
HermesDataSummary.
summary(AnyHermesData): A summary method for AnyHermesData object that
creates a HermesDataSummary object.
show(HermesDataSummary): A show method prints summary description of HermesDataSummary object
generated by the summary() method.
object <- hermes_data object_summary <- summary(object) # We can access parts of this S4 object with the `slot` function. str(object_summary) slotNames(object_summary) slot(object_summary, "lib_sizes") # Just calling the summary method like this will use the `show()` method. summary(object)object <- hermes_data object_summary <- summary(object) # We can access parts of this S4 object with the `slot` function. str(object_summary) slotNames(object_summary) slot(object_summary, "lib_sizes") # Just calling the summary method like this will use the `show()` method. summary(object)
top_genes() creates a HermesDataTopGenes object, which extends data.frame. It
contains two columns:
expression: containing the statistic values calculated by summary_fun across columns.
name: the gene names.
The corresponding autoplot() method then visualizes the result as a barplot.
top_genes( object, assay_name = "counts", summary_fun = rowMeans, n_top = if (is.null(min_threshold)) 10L else NULL, min_threshold = NULL ) ## S4 method for signature 'HermesDataTopGenes' autoplot( object, x_lab = "HGNC gene names", y_lab = paste0(object@summary_fun_name, "(", object@assay_name, ")"), title = "Top most expressed genes" )top_genes( object, assay_name = "counts", summary_fun = rowMeans, n_top = if (is.null(min_threshold)) 10L else NULL, min_threshold = NULL ) ## S4 method for signature 'HermesDataTopGenes' autoplot( object, x_lab = "HGNC gene names", y_lab = paste0(object@summary_fun_name, "(", object@assay_name, ")"), title = "Top most expressed genes" )
object |
( |
assay_name |
( |
summary_fun |
( |
n_top |
( |
min_threshold |
( |
x_lab |
( |
y_lab |
( |
title |
( |
The data frame is sorted in descending order of expression and only the top
entries according to the selection criteria are included.
Note that exactly one of the arguments n_top and min_threshold must be
provided.
A HermesDataTopGenes object.
autoplot(HermesDataTopGenes): Creates a bar plot from a HermesDataTopGenes object,
where the y axis shows the expression statistics for each of the top genes
on the x-axis.
object <- hermes_data # Default uses average of raw counts across samples to rank genes. top_genes(object) # Instead of showing top 10 genes, can also set a minimum threshold on average counts. top_genes(object, n_top = NULL, min_threshold = 50000) # We can also use the maximum of raw counts across samples, by specifying a different # summary statistics function. result <- top_genes(object, summary_fun = rowMax) # Finally we can produce barplots based on the results. autoplot(result, title = "My top genes") autoplot(result, y_lab = "Counts", title = "My top genes")object <- hermes_data # Default uses average of raw counts across samples to rank genes. top_genes(object) # Instead of showing top 10 genes, can also set a minimum threshold on average counts. top_genes(object, n_top = NULL, min_threshold = 50000) # We can also use the maximum of raw counts across samples, by specifying a different # summary statistics function. result <- top_genes(object, summary_fun = rowMax) # Finally we can produce barplots based on the results. autoplot(result, title = "My top genes") autoplot(result, y_lab = "Counts", title = "My top genes")
AnyHermesData ObjectsThese functions are used internally only and therefore not exported. They work on
SummarizedExperiment::SummarizedExperiment objects, and AnyHermesData objects are
defined by successfully passing these validation checks.
validate_counts(object) validate_cols(required, actual) validate_row_data(object) validate_col_data(object) validate_names(object) validate_prefix(object)validate_counts(object) validate_cols(required, actual) validate_row_data(object) validate_col_data(object) validate_names(object) validate_prefix(object)
object |
( |
required |
( |
actual |
( |
A character vector with the validation failure messages, or NULL in
case validation passes.
validate_counts(): validates that the first assay is counts containing
non-missing, integer, non-negative values.
validate_cols(): validates that required column names are contained in
actual column names.
validate_row_data(): validates that the object contains rowData with
required columns.
validate_col_data(): validates that the object contains colData with
required columns.
validate_names(): validates that the object contains row and column names.
validate_prefix(): validates that the object prefix is a string
and only contains alphabetic characters.
This helper function wraps SummarizedExperiment objects into an
a MultiAssayExperiment (MAE) object.
wrap_in_mae(x, name = deparse(substitute(x)))wrap_in_mae(x, name = deparse(substitute(x)))
x |
( |
name |
( |
The MAE object with the only experiment being x having the given
name.
mae <- wrap_in_mae(summarized_experiment) mae[["summarized_experiment"]]mae <- wrap_in_mae(summarized_experiment) mae[["summarized_experiment"]]