Title: | Interactive Workflow for Discovering Rhythmicity in Biological Data |
---|---|
Description: | Set of functions for estimation of cyclical characteristics, such as period, phase, amplitude, and statistical significance in large temporal datasets. Supporting functions are available for quality control, dimensionality reduction, spectral analysis, and analysis of experimental replicates. Contains an R Shiny web interface to execute all workflow steps. |
Authors: | Matthew Carlucci [aut, cre], Algimantas Kriščiūnas [aut], Haohan Li [aut], Povilas Gibas [aut], Karolis Koncevičius [aut], Art Petronis [aut], Gabriel Oh [aut] |
Maintainer: | Matthew Carlucci <[email protected]> |
License: | GPL-3 |
Version: | 1.23.0 |
Built: | 2024-10-30 05:28:22 UTC |
Source: | https://github.com/bioc/DiscoRhythm |
This launches the web interface to DiscoRhythm containing all analysis tools. The vignette contains details on usage.
discoApp(ncores = 1, port = 3838, local = TRUE)
ncores |
numeric, number of cores to use for parallelized tasks.
Currently, only used in oscillation detection function |
port |
numeric, port to run the shiny application on. Sets shiny.port option. |
local |
logical, set to FALSE for public server mode to reduce file size limits. |
Nothing is returned by this function.
## Not run:
discoApp()
## End(Not run)
Execute the DiscoRhythm workflow with one command to obtain the results
of oscillation detection (discoODAs) and optionally generate an html
report with data visualizations from an Rmarkdown template. See the
DiscoRhythm vignette for more details on the analysis procedures.
discoBatch(
  indata,
  report = NULL,
  outdata = TRUE,
  ncores = 1,
  timeType = "circular",
  main_per = 24,
  cor_threshold = 3,
  cor_method = "pearson",
  cor_threshType = "sd",
  pca_threshold = 3,
  pca_scale = TRUE,
  pca_pcToCut = paste0("PC", seq_len(4)),
  aov_method = "None",
  aov_pcut = 0.05,
  aov_Fcut = 0,
  avg_method = "Median",
  osc_method = NULL,
  osc_period = 24
)
indata |
SummarizedExperiment or data.frame, see the vignette for
the specific formats expected for each of these input types.
|
report |
character, if |
outdata |
logical, whether to return the final discoODAs (note if run
with |
ncores |
numeric, number of cores to use for parallelized tasks.
Currently, only used in oscillation detection function |
timeType |
character, nature of the sample times provided
(one of |
main_per |
numeric, the length of the main hypothesized period
(e.g. 24hr for circadian experiments). Used in |
cor_threshold |
numeric, threshold used in inter-sample correlation
analysis for outlier detection. Either in units of correlation coefficient
or standard deviations from the mean (see |
cor_method |
character, which correlation method to use for outlier removal (see cor for more details). |
cor_threshType |
character, one of "sd" or "value" indicating whether cor_threshold should be set by absolute correlation coefficient or by standard deviations from the mean of all samples. |
pca_threshold |
numeric, the number of standard deviations to set as the threshold for outlier detection in PCA outlier removal. |
pca_scale |
logical, whether to scale the data prior to PCA. |
pca_pcToCut |
character, names of which PCs to use for outlier detection (e.g. "PC1","PC2" etc.). |
aov_method |
character, method to use for ANOVA. One of: "Equal Variance", "Welch", or "None". |
aov_pcut |
numeric, p-value cutoff used to select rows with statistically significant signal-to-noise. |
aov_Fcut |
numeric, F-statistic cutoff used to select rows with high signal-to-noise based on magnitude. |
avg_method |
character, method for averaging technical replicates. One of: "Median","Mean","Random", or "None". |
osc_method |
character, vector of oscillation detection algorithms
to apply to the data. Methods that are determined to be inappropriate for the
experimental design (using the |
osc_period |
numeric, a fixed period to use for oscillation detection using all methods. |
returns the results of discoODAs
discoODAs, discoRepAnalysis, discoPeriodDetection, discoPCAoutliers, discoInterCorOutliers
indata <- discoGetSimu()
# Batch execute (on demo data) to generate a DiscoRhythm_report.html report.
# Returns the results of discoODAs
discoODAres <- discoBatch(indata, report="DiscoRhythm_report.html", osc_method="CS")
Performs various checks and cleaning operations on the input data.
discoCheckInput(se, n_min_sample = 3)
se |
SummarizedExperiment, the main data object used by DiscoRhythm expected to contain se$ID, se$ReplicateID, se$Time sample metadata and non-null rownames. See the vignette for more details. |
n_min_sample |
numeric value specifying minimal number of samples needed to perform analysis. |
Rows containing NAs or all constant values are removed.
If matrix values are character, an attempt is made to convert them
to numeric.
If the input is not a matrix, it is converted using as.matrix().
The user is warned if row IDs contain duplicate entries.
SummarizedExperiment checked for errors and modified as needed
se <- discoGetSimu(TRUE)
se_clean <- discoCheckInput(se)
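The cleaning rules described above can be illustrated on a plain matrix in base R. This is only a sketch of the stated rules (numeric conversion, removal of rows with NAs or constant values, duplicate-ID warning); the toy matrix and variable names are hypothetical, and discoCheckInput itself operates on a SummarizedExperiment and may differ in detail.

```r
# Hypothetical character matrix standing in for the assay data
mat <- rbind(
  geneA = c("1.5", "2.0", "3.1"),
  geneB = c("4", NA, "5"),      # contains an NA
  geneC = c("7", "7", "7"),     # constant row
  geneD = c("0.2", "0.9", "0.4")
)

# Convert character values to numeric, keeping dimnames
storage.mode(mat) <- "numeric"

# Flag rows with any NA and rows whose values are all identical
has_na <- rowSums(is.na(mat)) > 0
is_const <- apply(mat, 1, function(x) length(unique(x)) == 1)

clean <- mat[!has_na & !is_const, , drop = FALSE]

# Warn on duplicated row IDs, as the documentation describes
if (anyDuplicated(rownames(clean)) > 0) {
  warning("Row IDs contain duplicate entries")
}
```

Here only geneA and geneD pass both filters.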
Using sample times and biological sample IDs, constructs a summary table of the total number of samples at each timepoint and additionally summarizes the number of replicates for each biological sample.
discoDesignSummary(Metadata)
Metadata |
data.frame of sample data, usually generated by
using |
A table where the first row summarizes the number of datapoints for each timepoint and other cells indicate the number of technical replicates for a given biological sample.
discoParseMeta
# import example data
Metadata <- SummarizedExperiment::colData(discoGetSimu(TRUE))
# Summarize the experiment design
discoDesignSummary(Metadata)
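To make the shape of such a summary concrete, the following base R sketch builds a comparable table from a small hand-written metadata data.frame. The metadata values are hypothetical, and the layout (a total row followed by per-replicate counts) is an approximation of the description above, not the function's actual implementation.

```r
# Hypothetical metadata: two timepoints, two biological samples per
# timepoint, two technical replicates each
Metadata <- data.frame(
  ID = c("AD", "AS", "AE", "AW", "AB", "AC", "BB", "BC"),
  Time = c(24, 24, 24, 24, 26, 26, 26, 26),
  ReplicateID = c("1", "1", "2", "2", "1", "1", "2", "2"),
  stringsAsFactors = FALSE
)

# Technical replicates per biological sample (ReplicateID within Time)
reps <- table(ReplicateID = Metadata$ReplicateID, Time = Metadata$Time)

# First row: total datapoints per timepoint; remaining rows: replicate counts
design <- rbind(Total = colSums(reps), unclass(reps))
design
```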
Functions to import a data.frame (from the format expected by the
web application discoApp()) as a SummarizedExperiment object or
to export a SummarizedExperiment for use with the web application.
discoDFtoSE(Maindata, Metadata = NULL, shinySession = NULL)
discoSEtoDF(se)
Maindata |
data.frame with the first column containing row IDs and all subsequent columns containing experimental values. Columns should follow the expected naming format described in the vignette. |
Metadata |
data.frame of sample data, usually generated by
using |
shinySession |
shiny session object for use only by the DiscoRhythm
shiny app |
se |
SummarizedExperiment, the main data object used by DiscoRhythm expected to contain se$ID, se$ReplicateID, se$Time sample metadata and non-null rownames. See the vignette for more details. |
discoDFtoSE returns a SummarizedExperiment object with colData containing sample metadata.
discoSEtoDF returns a DiscoRhythm format data.frame.
df <- discoGetSimu()
se <- discoDFtoSE(df)
df <- discoSEtoDF(se)
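The Maindata layout (row IDs in the first column, one column per sample afterwards) can be pictured with a base R round trip between a data.frame and a numeric matrix. This is a sketch only: the toy values and the `IDs` column name are hypothetical, and the real functions additionally build or read the sample metadata held in colData.

```r
# Hypothetical Maindata-style data.frame
Maindata <- data.frame(
  IDs = c("geneA", "geneB"),
  CT0_a_1 = c(1.2, 3.4),
  CT4_b_1 = c(2.1, 3.0),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Import direction: values become a matrix, first column becomes rownames
mat <- as.matrix(Maindata[, -1])
rownames(mat) <- Maindata[[1]]

# Export direction: rownames go back into the first column
df <- data.frame(
  IDs = rownames(mat), mat,
  check.names = FALSE, stringsAsFactors = FALSE, row.names = NULL
)
```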
A convenience function to get the simulated circadian transcriptomic system data file used in DiscoRhythm for various demonstrations and tests.
discoGetSimu(as_se = FALSE)
as_se |
logical, indicates if example data should be returned as a SummarizedExperiment or data.frame. |
The simulated demo dataset used in the DiscoRhythm web application as a data.frame or SummarizedExperiment.
indata <- discoGetSimu()
A small matrix indicating which algorithms should be excluded given certain experimental designs and data types.
discoODAexclusionMatrix
An object of class matrix (inherits from array) with 4 rows and 7 columns.
# Code used to generate discoODAexclusionMatrix
itemNames <- c(
  "missing_value", "with_bio_replicate", "non_integer_interval",
  "uneven_interval", "circular_t", "invalidPeriod", "invalidJTKperiod"
)
# Creating requirements matrix, first assuming all methods are valid
# Then applying exclusion criteria of MetaCycle plus CS criteria
mat <- matrix(TRUE, nrow = 4, ncol = length(itemNames))
rownames(mat) <- c("CS", "JTK", "LS", "ARS")
colnames(mat) <- itemNames
# Exclusion criteria from MetaCycle v1.1, i.e. can algorithm handle XXX
mat[c("ARS", "JTK"), c("non_integer_interval", "uneven_interval")] <- FALSE
mat["ARS", "with_bio_replicate"] <- FALSE
mat["ARS", "missing_value"] <- FALSE
mat["JTK", "invalidJTKperiod"] <- FALSE
# Additional exclusion criteria
mat["ARS", "circular_t"] <- FALSE
mat[c("CS", "JTK", "ARS", "LS"), "invalidPeriod"] <- FALSE
discoODAexclusionMatrix <- mat
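One way such a matrix might be consulted is sketched below: assuming a TRUE cell means the algorithm tolerates the condition, a method stays valid only if it tolerates every condition present in the design. The `design` flags and the lookup logic are hypothetical illustrations; the package's own check (see discoGetODAs) may be implemented differently.

```r
# Rebuild the matrix as in the generating code above
itemNames <- c(
  "missing_value", "with_bio_replicate", "non_integer_interval",
  "uneven_interval", "circular_t", "invalidPeriod", "invalidJTKperiod"
)
mat <- matrix(TRUE, nrow = 4, ncol = length(itemNames),
              dimnames = list(c("CS", "JTK", "LS", "ARS"), itemNames))
mat[c("ARS", "JTK"), c("non_integer_interval", "uneven_interval")] <- FALSE
mat["ARS", c("with_bio_replicate", "missing_value", "circular_t")] <- FALSE
mat["JTK", "invalidJTKperiod"] <- FALSE
mat[, "invalidPeriod"] <- FALSE

# Hypothetical design: biological replicates present, circular time
design <- setNames(rep(FALSE, length(itemNames)), itemNames)
design[c("with_bio_replicate", "circular_t")] <- TRUE

# Keep methods that tolerate every condition present in the design
present <- names(design)[design]
n_violations <- rowSums(!mat[, present, drop = FALSE])
valid <- names(which(n_violations == 0))
valid
```

With these flags only ARS is excluded, because it is the only method marked FALSE for either condition.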
A small named vector mapping oscillation detection algorithm names to a convenient identifier.
discoODAid2name
A named vector, length 4
Identifier
Full names
Runs specified oscillation detection algorithms (ODAs) sequentially to obtain oscillation characteristics for each row of the input data.
discoGetODAs(se, method = NULL, period, circular_t = FALSE)
discoODAs(
  se,
  period = 24,
  method = c("CS", "JTK", "LS", "ARS"),
  circular_t = FALSE,
  ncores = 1
)
se |
SummarizedExperiment, the main data object used by DiscoRhythm expected to contain se$ID, se$ReplicateID, se$Time sample metadata and non-null rownames. See the vignette for more details. |
method |
character, short names of ODAs to use. If length>1 all input method names will be evaluated. |
period |
numeric, the hypothesized period to test for. |
circular_t |
logical, is time circular on some base-cycle (ex. time of day). See the DiscoRhythm vignette for details. |
ncores |
numeric, number of cores to parallelize with (applicable to JTK, ARSER and LS only). If 1, will execute in serial. |
There are currently 4 available algorithms for rhythm detection:
CS = Cosinor (Cornelissen,G. 2014): a.k.a “Harmonic Regression” fits a sinusoid with a free phase parameter.
LS = Lomb-Scargle (Glynn, 2006): an approach using spectral power density.
ARS = ARSER (Yang, 2010): removes linear trends and performs the Cosinor test.
JTK = JTK Cycle (Hughes, 2010): non-parametric test of rhythmicity robust to outliers.
LS, ARS, and JTK results come directly from MetaCycle meta2d() output using the specified fixed period. ARSmle is set to "nomle" and no method integration is used (see meta2d documentation for details).
CS is implemented directly in DiscoRhythm's lmCSmat() as the single-component cosinor described in Cornelissen,G. (2014).
All q-values are calculated by performing p.adjust() on the resulting p-values with method="fdr".
Technical replicates are expected to be merged (likely by discoRepAnalysis) prior to usage of discoODAs.
The discoGetODAs function is called by discoODAs to determine if the selected methods may be used. If any methods are not valid, a warning will be thrown and only valid methods will be computed. discoGetODAs is not typically used directly, however, it may be called by the user to determine if the provided SummarizedExperiment is suitable for use with the specified methods.
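For intuition, a single-component cosinor fit followed by FDR adjustment can be sketched in base R. This is not the package's lmCSmat() code: `cosinor_fit` and the simulated data are hypothetical, and reporting the acrophase as the time of peak in the same units as the sampling times is an assumption made for the example.

```r
# Fit y = M + b1*cos(wt) + b2*sin(wt) and derive amplitude, acrophase, p-value
cosinor_fit <- function(y, times, period = 24) {
  x1 <- cos(2 * pi * times / period)
  x2 <- sin(2 * pi * times / period)
  fit <- lm(y ~ x1 + x2)
  b <- coef(fit)
  f <- summary(fit)$fstatistic
  c(
    amplitude = unname(sqrt(b["x1"]^2 + b["x2"]^2)),
    # time of peak, mapped into [0, period)
    acrophase = unname((atan2(b["x2"], b["x1"]) * period / (2 * pi)) %% period),
    # overall F-test of the sinusoidal terms against a flat mean
    pvalue = unname(pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE))
  )
}

set.seed(42)
times <- seq(0, 46, by = 2)
mat <- rbind(
  rhythmic = 2 * cos(2 * pi * (times - 6) / 24) + rnorm(length(times), sd = 0.3),
  noise = rnorm(length(times))
)

res <- t(apply(mat, 1, cosinor_fit, times = times))

# q-values via p.adjust with method = "fdr", as described above
qvalue <- p.adjust(res[, "pvalue"], method = "fdr")
cbind(res, qvalue)
```

The simulated rhythmic row was generated with amplitude 2 and a peak at time 6, so the fitted estimates should land close to those values.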
A named list of results where each element is a data.frame for the corresponding method with rownames corresponding to the feature identifiers and columns containing estimates for:
acrophase
amplitude
p-value
q-value
Additional columns relevant to each method will be present.
Yang R. and Su Z. (2010). Analyzing circadian expression data by harmonic regression based on autoregressive spectral estimation. Bioinformatics, 26(12), i168–i174.
Hughes M. E., Hogenesch J. B. and Kornacker K. (2010). JTK_CYCLE: an efficient nonparametric algorithm for detecting rhythmic components in genome-scale data sets. Journal of Biological Rhythms, 25(5), 372–380.
Glynn E. F., Chen J. and Mushegian A. R. (2006). Detecting periodic patterns in unevenly spaced gene expression time series using Lomb-Scargle periodograms. Bioinformatics, 22(3), 310–316.
Cornelissen G. (2014). Cosinor-based rhythmometry. Theor. Biol. Med. Model., 11, 16.
# Return valid ODAs for example dataset
discoGetODAs(discoGetSimu(as_se=TRUE),period=24)
# Import the simulated example dataset
se <- discoCheckInput(discoGetSimu(TRUE))
# Use discoRepAnalysis to average technical replicates
se_merged <- discoRepAnalysis(se,aov_pcut=1)$se
# Execute the Cosinor and JTK methods with a 24hr period
discoODAres <- discoODAs(se_merged,method=c("CS","JTK"))
# Get the index of rhythmic features detected by both methods at qvalue<0.05
idx <- which(discoODAres$CS$qvalue<0.05 & discoODAres$JTK$qvalue<0.05)
# Get the identifiers for common rhythmic features
rownames(se_merged)[idx]
Parses the sample metadata from a vector of sample names (often column names of a Maindata format data.frame).
discoParseMeta(samplenames, shinySession = NULL)
samplenames |
character, a list of sample names following the DiscoRhythm naming convention (<prefix><Time>_<UniqueID>_<ReplicateID>). |
shinySession |
shiny session object for use only by the DiscoRhythm
shiny app |
The regular expression used to obtain metadata is "^([[:alpha:]]*)(\-?[0-9]+[\.]?[0-9]*)\_?([[:alnum:]\.]*)\_?([[:alnum:]\.]*)$"
where each parenthesized group is used to construct the final metadata data.frame.
A data.frame containing 3 columns of metadata. ID = unique sample identity. Time = sample collection time. ReplicateID = identifier where Time + ReplicateID indicates a biological sample ID.
discoParseMeta(c("CT24_AD_1","CT24_AS_1","CT24_AE_2","CT24_AW_2",
                 "CT26_AB_1","CT26_AC_1","CT26_BB_2","CT26_BC_2"))
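The naming convention can be explored with base R's regexec() and regmatches(), as in the sketch below. The pattern is written without the redundant escapes and assumes the four groups are contiguous (prefix, time, unique ID, replicate ID); the `groups` data.frame and its column names are hypothetical and show the raw captures, not the exact three-column output of discoParseMeta.

```r
# Four capture groups: prefix, time, unique ID, replicate ID
pattern <- "^([[:alpha:]]*)(-?[0-9]+[.]?[0-9]*)_?([[:alnum:].]*)_?([[:alnum:].]*)$"

samplenames <- c("CT24_AD_1", "CT24_AS_1", "CT26_BB_2")

# regexec returns the full match followed by each group; drop the full match
m <- regmatches(samplenames, regexec(pattern, samplenames))
parsed <- do.call(rbind, lapply(m, function(x) x[-1]))

# Note: names that do not match the pattern would yield empty entries and
# need separate handling; all names in this example match
groups <- data.frame(
  prefix = parsed[, 1],
  Time = as.numeric(parsed[, 2]),
  UniqueID = parsed[, 3],
  ReplicateID = parsed[, 4],
  stringsAsFactors = FALSE
)
groups
```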
Calculates PCA results from prcomp with error handling and outputs
suitable for the DiscoRhythm workflow.
discoPCA(se, scale = TRUE, npcs = 10)
se |
SummarizedExperiment, the main data object used by DiscoRhythm expected to contain se$ID, se$ReplicateID, se$Time sample metadata and non-null rownames. See the vignette for more details. |
scale |
logical, whether or not to scale the data prior to PCA, see prcomp for more details. |
npcs |
numeric, maximum number of principal components to return. |
output from prcomp with an added table summary
se <- discoGetSimu(TRUE)
pca <- discoPCA(se)
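The underlying prcomp call and a variance summary table can be sketched in base R as follows. The random matrix is hypothetical, and treating samples as observations (transposing a rows-by-samples matrix) as well as truncating to `npcs` components are assumptions about how discoPCA arranges its output.

```r
set.seed(1)
# Hypothetical data: 50 features (rows) by 8 samples (columns)
mat <- matrix(rnorm(50 * 8), nrow = 50,
              dimnames = list(paste0("row", 1:50), paste0("s", 1:8)))

# Samples as observations, features as variables
pca <- prcomp(t(mat), scale. = TRUE)

# Keep at most npcs components
npcs <- 10
keep <- seq_len(min(npcs, ncol(pca$x)))
scores <- pca$x[, keep, drop = FALSE]

# Table of standard deviation, proportion and cumulative proportion of variance
var_table <- summary(pca)$importance[, keep, drop = FALSE]
var_table
```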
Detect dataset-wide fits to multiple periodicities
discoPeriodDetection(
  se,
  timeType = c("linear", "circular"),
  main_per = 24,
  test_periods = NULL
)
se |
SummarizedExperiment, the main data object used by DiscoRhythm expected to contain se$ID, se$ReplicateID, se$Time sample metadata and non-null rownames. See the vignette for more details. |
timeType |
character, time is either reported as "linear" or "circular" on some base-cycle (ex. time of day). This determines the periods that will be tested for. |
main_per |
numeric, if |
test_periods |
numeric, a vector of the periods to test.
if |
A data.frame of Rsquared values for each period, for each row of Maindata.
se <- discoGetSimu(TRUE)
# Detect periods
rsqs <- discoPeriodDetection(se)
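The idea of scoring several candidate periods by R-squared can be illustrated for a single row with base R. That the fit is a cosinor model is an assumption of this sketch, and `cosinor_rsq`, the candidate periods and the simulated series are hypothetical.

```r
# R-squared of a single-component cosinor fit at a given period
cosinor_rsq <- function(y, times, period) {
  fit <- lm(y ~ cos(2 * pi * times / period) + sin(2 * pi * times / period))
  summary(fit)$r.squared
}

set.seed(7)
times <- seq(0, 46, by = 2)
# Simulated row with a true 24-hour rhythm plus noise
y <- 2 * cos(2 * pi * (times - 6) / 24) + rnorm(length(times), sd = 0.3)

periods <- c(8, 12, 24)
rsq <- vapply(periods, function(p) cosinor_rsq(y, times, p), numeric(1))
names(rsq) <- as.character(periods)
rsq
```

Applying the same function to every row of a data matrix would give one R-squared value per row and period, which is the shape of result described above.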
Functions for executing outlier detection and row filtering procedures prior to rhythmicity analysis.
discoPCAoutliers(se, threshold = 3, scale = TRUE, pcToCut = seq_len(4))
discoInterCorOutliers(
  se,
  cor_method = c("pearson", "kendall", "spearman"),
  threshold = 3,
  thresh_type = c("sd", "value")
)
discoRepAnalysis(
  se,
  aov_method = c("Equal Variance", "Welch", "None"),
  aov_pcut = 0.05,
  aov_Fcut = 0,
  avg_method = c("Median", "Mean", "Random", "None")
)
se |
SummarizedExperiment, the main data object used by DiscoRhythm expected to contain se$ID, se$ReplicateID, se$Time sample metadata and non-null rownames. See the vignette for more details. |
threshold |
numeric, a threshold determining which samples are outliers (for discoInterCorOutliers, in units of thresh_type, for discoPCAoutliers in units of standard deviations). |
scale |
logical, whether or not to scale the data prior to PCA, see prcomp for more details. |
pcToCut |
numeric, which PCs to use for outlier detection. It is recommended to select the first X PCs based on which PCs explain a significant amount of variance in the data. |
cor_method |
character, method of pairwise correlation (see cor's "method" argument for all options). |
thresh_type |
character indicating threshold type (either standard deviations below the mean, or an absolute correlation value). One of: "sd" or "value". |
aov_method |
character, method to use for ANOVA. One of: "Equal Variance", "Welch", or "None". |
aov_pcut |
numeric, p-value cutoff used to select rows with statistically significant signal-to-noise. |
aov_Fcut |
numeric, F-statistic cutoff used to select rows with high signal-to-noise based on magnitude. |
avg_method |
character, method for averaging technical replicates. One of: "Median","Mean","Random", or "None". |
list containing PCA results and the detected outliers
A list of 3 objects: 1) outliers - named logical indicating if the sample is an outlier 2) meanCor - mean of all pairwise correlations for a given sample 3) corMat - Matrix of all pairwise correlation values
se <- discoGetSimu(TRUE)
PCAres <- discoPCAoutliers(se)
CorRes <- discoInterCorOutliers(se)
ANOVAres <- discoRepAnalysis(se)
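Both outlier rules can be approximated in base R on simulated data, as sketched below. The exact criteria used by the package are not spelled out here, so two assumptions are made: the mean correlation of a sample excludes its self-correlation, and a sample is a PCA outlier if its score on any selected PC lies more than `threshold` standard deviations from the mean. All data and variable names are hypothetical.

```r
set.seed(1)
n_rows <- 200
signal <- rnorm(n_rows)
# 19 samples share a common signal; sample 20 is unrelated noise
mat <- sapply(1:19, function(i) signal + rnorm(n_rows, sd = 0.1))
mat <- cbind(mat, rnorm(n_rows))
colnames(mat) <- paste0("s", 1:20)

threshold <- 3

# Inter-sample correlation rule, "sd" threshold type
corMat <- cor(mat, method = "pearson")
diag(corMat) <- NA                       # assumption: ignore self-correlation
meanCor <- rowMeans(corMat, na.rm = TRUE)
cor_outliers <- meanCor < mean(meanCor) - threshold * sd(meanCor)
# With the "value" type, the comparison would be meanCor < threshold,
# where threshold is itself a correlation coefficient

# PCA rule on the first four PCs
pca <- prcomp(t(mat), scale. = TRUE)
pcToCut <- seq_len(4)
z <- scale(pca$x[, pcToCut, drop = FALSE])  # per-PC z-scores of the samples
pca_outliers <- rowSums(abs(z) > threshold) > 0

names(which(cor_outliers))
names(which(pca_outliers))
```

Note that with n samples a z-score cannot exceed (n - 1) / sqrt(n), so a threshold of 3 standard deviations can only flag samples when there are more than ten of them; the simulation uses 20 for that reason.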