Title: | An R Package for Time Course RNASeq Data Analysis |
---|---|
Description: | Simple and efficient workflow for time-course gene expression data, built on publictly available open-source projects hosted on CRAN and bioconductor. moanin provides helper functions for all the steps required for analysing time-course data using functional data analysis: (1) functional modeling of the timecourse data; (2) differential expression analysis; (3) clustering; (4) downstream analysis. |
Authors: | Elizabeth Purdom [aut] , Nelle Varoquaux [aut, cre] |
Maintainer: | Nelle Varoquaux <[email protected]> |
License: | BSD 3-clause License + file LICENSE |
Version: | 1.15.0 |
Built: | 2024-10-30 08:54:20 UTC |
Source: | https://github.com/bioc/moanin |
Compute consensus matrix from labels
consensus_matrix(labels, scale = TRUE)
consensus_matrix(labels, scale = TRUE)
labels |
a matrix with each column corresponding to a the results of a single clustering routine. Each column should give the cluster assignment FIXME: What is the required format of entries?? |
scale |
boolean, optional, default: TRUE. Whether to rescale the resulting consensus matrix so that entries correspond to proportions. |
a symmetric matrix of size NxN, where N is the number of rows of the
input matrix labels
. Each i,j entry of the matrix corresponds the
number of times the two rows were in the same cluster across the clusterings
(scale=FALSE
) or the proportion of clustering that the two rows are
in the same cluster (scale=TRUE
).
data(exampleData) moanin <- create_moanin_model(data=testData,meta=testMeta) #small function to run splines_kmeans on subsample of 50 genes subsampleCluster<-function(){ ind<-sample(1:nrow(moanin),size=50,replace=FALSE) km<-splines_kmeans(moanin[ind,],n_clusters=3) assign<-splines_kmeans_predict(moanin,km, method="distance") } kmClusters=replicate(10,subsampleCluster()) cm<-consensus_matrix(kmClusters) heatmap(cm)
data(exampleData) moanin <- create_moanin_model(data=testData,meta=testMeta) #small function to run splines_kmeans on subsample of 50 genes subsampleCluster<-function(){ ind<-sample(1:nrow(moanin),size=50,replace=FALSE) km<-splines_kmeans(moanin[ind,],n_clusters=3) assign<-splines_kmeans_predict(moanin,km, method="distance") } kmClusters=replicate(10,subsampleCluster()) cm<-consensus_matrix(kmClusters) heatmap(cm)
Run spline models and test for DE of contrasts.
## S4 method for signature 'Moanin' DE_timecourse( object, contrasts, center = FALSE, statistic = c("ftest", "lrt"), use_voom_weights = TRUE )
## S4 method for signature 'Moanin' DE_timecourse( object, contrasts, center = FALSE, statistic = c("ftest", "lrt"), use_voom_weights = TRUE )
object |
An object of class |
contrasts |
Contrasts, either provided as a vector of strings, or a
matrix of contrasts coefficients obtained using
|
center |
boolean, whether to center the data matrix |
statistic |
Which test statistic to use, a likelihood ratio statistic or a F-test. |
use_voom_weights |
boolean, optional, default: TRUE. Whether to use voom weights. See details. |
The implementation of the spline fit and the calculation of p-values
was based on code from edge
, and expanded to enable
handling of comparisons of groups via contrasts. The code assumes that the Moanin
object was created via either a formula or a basis where a different spline was fit for each group_variable
and thus the contrasts are comparisons of those spline fits. If the Moanin
object was created via user-provided basis matrix or formula, then the user should take a great deal of caution in using this code, as the degrees of freedom for the tests of significance cannot be verified to be correct.
If use_voom_weights=TRUE
, then before fitting splines to each gene,
voom weights are calculated from assay(object)
:
y <- edgeR::DGEList(counts=assay(object)) y <- edgeR::calcNormFactors(y, method="upperquartile") v <- limma::voom(y, design, plot=FALSE) weights <- v$weights
The design matrix for the voom weights is based on the formula
~Group + Timepoint +0
where Group and Timepoint are replaced with the user-defined values where appropriate.
These weights are given to the lm.fit
which fits the spline coefficients.
This workflow assumes that the input to the Moanin
object were counts.
If the user set log_transform=TRUE
in the creation of the
Moanin
object, the splines will be fit to the log of the input data,
and not directly to the input data. This is independent of whether the user
chooses use_voom_weights
.
A data.frame
with two columns for each of the contrasts given
in contrasts
, corresponding to the raw p-value of the contrast for
that gene (_pval
) and the adjusted p-value (_qval
). The
adjusted p-values are FDR-adjusted based on the Benjamini-Hochberg method,
as implemented in p.adjust
. The adjustment is done
across all p-values for all contrasts calculated.
makeContrasts
, create_moanin_model
,
DE_timepoints
, edge
data(exampleData) moanin <- create_moanin_model(data=testData, meta=testMeta) deTimecourse=DE_timecourse(moanin, contrasts="K-C", use_voom_weights=FALSE) head(deTimecourse)
data(exampleData) moanin <- create_moanin_model(data=testData, meta=testMeta) deTimecourse=DE_timecourse(moanin, contrasts="K-C", use_voom_weights=FALSE) head(deTimecourse)
Fit weekly differential expression analysis
Creates pairwise contrasts for all timepoints
## S4 method for signature 'Moanin' DE_timepoints(object, contrasts, add_factors = NULL, use_voom_weights = TRUE) ## S4 method for signature 'Moanin' create_timepoints_contrasts( object, group1, group2 = NULL, type = c("per_timepoint_group_diff", "per_group_timepoint_diff", "group_and_timepoint_diff"), timepoints = sort(unique(time_variable(object))), timepoints_before = head(sort(timepoints), -1), timepoints_after = tail(sort(timepoints), -1), format = c("vector", "data.frame") )
## S4 method for signature 'Moanin' DE_timepoints(object, contrasts, add_factors = NULL, use_voom_weights = TRUE) ## S4 method for signature 'Moanin' create_timepoints_contrasts( object, group1, group2 = NULL, type = c("per_timepoint_group_diff", "per_group_timepoint_diff", "group_and_timepoint_diff"), timepoints = sort(unique(time_variable(object))), timepoints_before = head(sort(timepoints), -1), timepoints_after = tail(sort(timepoints), -1), format = c("vector", "data.frame") )
object |
An object of class |
contrasts |
Contrasts, either provided as a vector of strings, or a
matrix of contrasts coefficients obtained using
|
add_factors |
A character vector of additional variables to add to the design. See details. |
use_voom_weights |
boolean, optional, default: TRUE. Whether to use voom weights. See details. |
group1 |
First group to consider in making contrasts, character value
that must match a value of the grouping variable contained in
|
group2 |
Second group to consider in making contrasts, character value
that must match a value of the grouping variable contained in
|
type |
the type of contrasts that should be created. See details. |
timepoints |
vector of timepoints to compare. Must be contained in the
|
timepoints_before |
for |
timepoints_after |
for |
format |
the choice of "vector" (the default) for
|
By default the formula fitted for each gene is
~ Group*Timepoint +0
If the user gives values to add_factors
, then the vector of character
values given in add_factors
will be added to the default formula.
So that add_factors="Replicate"
will change the formula to
~ Group*Timepoint +0 + Replicate
This allows for a small amount of additional complexity to control for other variables. Users should work directly with limma for more complex models.
If use_voom_weights=TRUE
, the data is given directly to limma
via assay(object)
. The specific series of
calls is:
y <- edgeR::DGEList(counts=assay(object)) y <- edgeR::calcNormFactors(y, method="upperquartile") v <- limma::voom(y, design, plot=FALSE) v <- limma::lmFit(v)
If the user set log_transform=TRUE
in the creation of the
Moanin
object, this will not have an impact in the analysis if
use_voom_weights=TRUE
. Only if use_voom_weights=FALSE
will
this matter, in which case the log of the input data will be given to a
regular call to limma
:
y<-get_log_data(object) v <- limma::lmFit(y, design)
create_timepoints_contrasts
creates the needed contrasts for
comparing groups or timepoints in the format needed for
DE_timepoints
(i.e. makeContrasts
), to which the
contrasts are ultimately passed. The time points and groups are determined
by the levels of the grouping_variable
and the values of
time_variable
in the moanin_object
provided by the user.
Three different types of contrasts are created:
"per_timepoint_group_diff"Contrasts that compare the groups within a timepoint
"per_group_timepoint_diff"Contrasts that compare two timepoints within a group
"group_and_timepoint_diff"Contrasts that compare the
difference between two timepoints between two levels of the
group_variable
of the Moanin
object. These are contrasts in
the form (TP i - TP (i-1))[Group1] - (TP i - TP (i-1))[Group2].
create_timepoints_contrasts
: a character vector with each
element of the vector corresponding to a contrast to be compared.
data(exampleData) moanin <- create_moanin_model(data=testData, meta=testMeta) # compare groups within each timepoint contrasts <- create_timepoints_contrasts(moanin,"C", "K", type="per_timepoint_group_diff") head(contrasts) deTimepoints=DE_timepoints(moanin, contrasts=contrasts, use_voom_weights=FALSE) head(deTimepoints) # Control for replicate variable: deTimepoints=DE_timepoints(moanin, contrasts=contrasts, add_factors="Replicate", use_voom_weights=FALSE) head(deTimepoints) # compare adjacent timepoints within each group contrastsDiff <- create_timepoints_contrasts(moanin,"C", type="per_group_timepoint_diff") deDiffTimepoints=DE_timepoints(moanin, contrasts=contrastsDiff, use_voom_weights=FALSE) # provide the sets of timepoints to compare: contrastsDiff2<-create_timepoints_contrasts(moanin,"C", timepoints_before=c(72,120),timepoints_after=c(168,168), type="per_group_timepoint_diff") deDiffTimepoints2=DE_timepoints(moanin, contrasts=contrastsDiff2, use_voom_weights=FALSE) # Compare selected timepoints across groups. # This time we also return format="data.frame" which helps us keep track of # the meaning of each contrast. contrastsGroupDiff<-create_timepoints_contrasts(moanin,"C", "K", timepoints_before=c(72,120),timepoints_after=c(168,168), type="group_and_timepoint_diff",format="data.frame") head(contrastsGroupDiff) deGroupDiffTimepoints=DE_timepoints(moanin, contrasts=contrastsGroupDiff$contrasts, use_voom_weights=FALSE)
data(exampleData) moanin <- create_moanin_model(data=testData, meta=testMeta) # compare groups within each timepoint contrasts <- create_timepoints_contrasts(moanin,"C", "K", type="per_timepoint_group_diff") head(contrasts) deTimepoints=DE_timepoints(moanin, contrasts=contrasts, use_voom_weights=FALSE) head(deTimepoints) # Control for replicate variable: deTimepoints=DE_timepoints(moanin, contrasts=contrasts, add_factors="Replicate", use_voom_weights=FALSE) head(deTimepoints) # compare adjacent timepoints within each group contrastsDiff <- create_timepoints_contrasts(moanin,"C", type="per_group_timepoint_diff") deDiffTimepoints=DE_timepoints(moanin, contrasts=contrastsDiff, use_voom_weights=FALSE) # provide the sets of timepoints to compare: contrastsDiff2<-create_timepoints_contrasts(moanin,"C", timepoints_before=c(72,120),timepoints_after=c(168,168), type="per_group_timepoint_diff") deDiffTimepoints2=DE_timepoints(moanin, contrasts=contrastsDiff2, use_voom_weights=FALSE) # Compare selected timepoints across groups. # This time we also return format="data.frame" which helps us keep track of # the meaning of each contrast. contrastsGroupDiff<-create_timepoints_contrasts(moanin,"C", "K", timepoints_before=c(72,120),timepoints_after=c(168,168), type="group_and_timepoint_diff",format="data.frame") head(contrastsGroupDiff) deGroupDiffTimepoints=DE_timepoints(moanin, contrasts=contrastsGroupDiff$contrasts, use_voom_weights=FALSE)
Provides set of basis functions on either side of a time point, allowing for a discontinuity in the fitted functions
discont_basis( timepoints, discont_point, knots = NULL, dfPre = NULL, dfPost = dfPre, degree = 3, intercept = TRUE, type = c("ns", "bs") )
discont_basis( timepoints, discont_point, knots = NULL, dfPre = NULL, dfPost = dfPre, degree = 3, intercept = TRUE, type = c("ns", "bs") )
timepoints |
vector of numeric timepoints for which the splines basis will be evaluated |
discont_point |
a single numeric value that represents where the discontinuity should be |
knots |
passed to |
dfPre |
the df for the basis functions defined before the discontinuity point |
dfPost |
the df for the basis functions defined after the discontinuity point |
degree |
passed to |
intercept |
Whether to include an intercept (vector of all 1s) for each side of the discontinuity. Note this is different than the argument |
type |
either "ns" or "bs" indicating which splines basis function should be used. |
x<-seq(0,10,length=100) basis<-discont_basis(x,discont_point=3, dfPre=3, dfPost=4, intercept=TRUE) # Plot of the basis functions par(mfrow=c(3,3)) for(i in 1:ncol(basis)){ plot(x,basis[,i],type="l") abline(v=3,lty=2) } # Use it in a moanin_model object instead of ns/bs: data(exampleData) moanin <- create_moanin_model(data=testData, meta=testMeta, spline_formula=~Group:discont_basis(Timepoint,dfPre=3, dfPost=3,discont=20,intercept=TRUE)+0, degrees_of_freedom=6)
x<-seq(0,10,length=100) basis<-discont_basis(x,discont_point=3, dfPre=3, dfPost=4, intercept=TRUE) # Plot of the basis functions par(mfrow=c(3,3)) for(i in 1:ncol(basis)){ plot(x,basis[,i],type="l") abline(v=3,lty=2) } # Use it in a moanin_model object instead of ns/bs: data(exampleData) moanin <- create_moanin_model(data=testData, meta=testMeta, spline_formula=~Group:discont_basis(Timepoint,dfPre=3, dfPost=3,discont=20,intercept=TRUE)+0, degrees_of_freedom=6)
Estimates log fold change
## S4 method for signature 'Moanin' estimate_log_fold_change( object, contrasts, method = c("timecourse", "sum", "max", "timely", "abs_sum", "abs_squared_sum", "min") )
## S4 method for signature 'Moanin' estimate_log_fold_change( object, contrasts, method = c("timecourse", "sum", "max", "timely", "abs_sum", "abs_squared_sum", "min") )
object |
An object of class |
contrasts |
The contrasts to consider |
method |
method for calculating the log-fold change. See details. |
The following methods exist for calculating the log-fold change between conditions over time (default is "timecourse"):
timely
The log-fold change for each individual timepoint
()
timecourse
The average absolute per-week fold-change,
multiplied by the sign of the average per-week fold-change.
sum
Sum of per-week log fold change, over all timepoints
max
Max of per-week log fold change, over all timepoints
abs_sum
Sum of the absolute value of the per-week log fold
change, over all timepoints
abs_squared_sum
Sum of the square value of the per-week log
fold change, over all timepoint
min
Min of per-week log fold change, over all timepoints
If the user set log_transform=TRUE
in the creation of the
Moanin
object, the data will be log transformed before calculating
the fold-change.
A data.frame giving the estimated log-fold change for each gene
(row). For all methods except for "timely", the data frame will consist of
one column for each value of the argument contrasts
. For "timely"
there will be one column for each timepoint and contrast combination.
data(exampleData) moanin <- create_moanin_model(data=testData,meta=testMeta) estsTimely <- estimate_log_fold_change(moanin, contrasts=c("K-C"), method="timely") head(estsTimely) estsTimecourse <- estimate_log_fold_change(moanin, contrasts=c("K-C"),method="timecourse") head(estsTimecourse)
data(exampleData) moanin <- create_moanin_model(data=testData,meta=testMeta) estsTimely <- estimate_log_fold_change(moanin, contrasts=c("K-C"), method="timely") head(estsTimely) estsTimecourse <- estimate_log_fold_change(moanin, contrasts=c("K-C"),method="timecourse") head(estsTimecourse)
Small data set for running examples
Three objects are loaded, a data frame of expression of 500 genes by
84 samples (testData
), a data frame with meta information on those
84 samples (testMeta
), and a data frame giving the GOID of the genes
in testData
.
This data is a subset of the full time course data available as
shoemaker2015
and is only provided for the
purpose of running examples, and not for biological meaning. Users should
refer to the full data set.
The rownames of testData
are RefSeq.
#code used to create data: ## Not run: library(timecoursedata) data(shoemaker2015) testData<-shoemaker2015$data[1:500,] whSamples<-which(shoemaker2015$meta$Group %in% c("C","K","M")) testData<-testData[,whSamples] testMeta<-droplevels(shoemaker2015$meta[whSamples,]) library(biomaRt) ensembl = useMart("ensembl") ensembl = useMart("ensembl") ensembl = useDataset("mmusculus_gene_ensembl", mart=ensembl) testGenesGO = getBM(attributes=c("go_id", "refseq_mrna"), values=rownames(testData), filters="refseq_mrna", mart=ensembl) save(list=c("testData","testMeta","testGenesGO"),file="data/exampleData.rda") ## End(Not run)
#code used to create data: ## Not run: library(timecoursedata) data(shoemaker2015) testData<-shoemaker2015$data[1:500,] whSamples<-which(shoemaker2015$meta$Group %in% c("C","K","M")) testData<-testData[,whSamples] testMeta<-droplevels(shoemaker2015$meta[whSamples,]) library(biomaRt) ensembl = useMart("ensembl") ensembl = useMart("ensembl") ensembl = useDataset("mmusculus_gene_ensembl", mart=ensembl) testGenesGO = getBM(attributes=c("go_id", "refseq_mrna"), values=rownames(testData), filters="refseq_mrna", mart=ensembl) save(list=c("testData","testMeta","testGenesGO"),file="data/exampleData.rda") ## End(Not run)
Find enriched GO terms
Create the Gene to GO Term mapping
find_enriched_go_terms( assignments, gene_id_to_go, ontology = "BP", weighted = FALSE, node_size = 10 ) create_go_term_mapping(genes, gene_col = "refseq_mrna")
find_enriched_go_terms( assignments, gene_id_to_go, ontology = "BP", weighted = FALSE, node_size = 10 ) create_go_term_mapping(genes, gene_col = "refseq_mrna")
assignments |
boolean named vector determining the gene subset to be tested for enrichment of GO terms. The names of the vector should be the gene names. Elements with TRUE will consist of the gene cluster. |
gene_id_to_go |
List giving the Gene ID to GO object required for topGO
(see |
ontology |
string, optional, default: BP. specficies which ontology to
use (passed to |
weighted |
boolean, optional, default: FALSE. Whether to use the
weighted algorithm or not in |
node_size |
integer, optional, default: 10. Consider only GO terms with
node_size number of genes, passed to |
genes |
dataframe, with two required columns. The first gives the gene
names, with column name by the argument |
gene_col |
the name of the column of the |
find_enriched_go_terms
is a wrapper for running a GO
enrichment analysis via the package topGO
. This function creates a
topGOdata-class
object, runs the function
runTest
to test for enrichment using the
statistic="fisher"
option, and then runs
GenTable
. This function then does some post-processing
of the results, returning only GO terms that satisfy:
BH adjusted p-values less than 0.05 using
p.adjust
GO terms are enriched, i.e. the number of genes from the GO term found in the subset is greater than expected
Returns results in the format of GenTable
.
create_go_term_mapping
returns a list giving the gene to GO id
in the format required by topGOdata-class
.
create_go_term_mapping
, find_enriched_pathway
, GenTable
, runTest
, topGOdata-class
, p.adjust
data(exampleData) head(testGenesGO) #gives the mapping of genes to GO geneId2Go <- create_go_term_mapping(testGenesGO) #create fake assignment of genes to group based on TRUE/FALSE values inGroup=rep(FALSE,nrow(testData)) inGroup[1:10]=TRUE names(inGroup) <- row.names(testData) find_enriched_go_terms(inGroup, geneId2Go)
data(exampleData) head(testGenesGO) #gives the mapping of genes to GO geneId2Go <- create_go_term_mapping(testGenesGO) #create fake assignment of genes to group based on TRUE/FALSE values inGroup=rep(FALSE,nrow(testData)) inGroup[1:10]=TRUE names(inGroup) <- row.names(testData) find_enriched_go_terms(inGroup, geneId2Go)
Moanin
is a class that extends
SummarizedExperiment
and is used to store the additional spline
basis and meta data for timecourse analysis.
In addition to the slots of the SummarizedExperiment
class, the Moanin
object has the additional slots described
in the Slots section.
There are several methods implemented for this class. The most important methods have their own help page. Simple helper methods are described in the Methods section below. For a comprehensive list of methods specific to this class see the Reference Manual.
The constructor create_moanin_model
creates an object of
the class Moanin
.
## S4 method for signature 'DataFrame' create_moanin_model(data, meta, ...) ## S4 method for signature 'data.frame' create_moanin_model(data, ...) ## S4 method for signature 'matrix' create_moanin_model(data, meta, ...) ## S4 method for signature 'SummarizedExperiment' create_moanin_model( data, spline_formula = NULL, basis_matrix = NULL, group_variable_name = "Group", time_variable_name = "Timepoint", degrees_of_freedom = NULL, log_transform = FALSE, drop_levels = TRUE )
## S4 method for signature 'DataFrame' create_moanin_model(data, meta, ...) ## S4 method for signature 'data.frame' create_moanin_model(data, ...) ## S4 method for signature 'matrix' create_moanin_model(data, meta, ...) ## S4 method for signature 'SummarizedExperiment' create_moanin_model( data, spline_formula = NULL, basis_matrix = NULL, group_variable_name = "Group", time_variable_name = "Timepoint", degrees_of_freedom = NULL, log_transform = FALSE, drop_levels = TRUE )
data |
The input data. Can be a |
meta |
Meta data on the samples (columns) of the |
... |
arguments passed from methods to the |
spline_formula |
formula object, optional, default: NUlL. Used to
construct splines from the data in |
basis_matrix |
matrix, optional, default: NULL. A basis matrix, where each row corresponds to the evaluation of a sample on the basis function (thus one column for each basis function). |
group_variable_name |
A character value giving the column that corresponds to the grouping variable to test for DE. By default "Group" |
time_variable_name |
A character value giving the column that corresponds to the time variable. By default "Timepoint". |
degrees_of_freedom |
int, optional. Number of degrees of freedom to use if neither the basis_matrix nor the spline_formula is provided. If not provided by the user, internally will be set to 4 |
log_transform |
whether the data should be log-transformed by certain
methods (see |
drop_levels |
Logical, whether to perform |
If neither spline_formula
nor basis_matrix
is given,
then by default, the function will create a basis matrix based on the
formula:
spline_formula = ~Group:ns(Timepoint, df=4) + Group + 0
Note that the meta data will have levels dropped (via
droplevels
).
Input to data
that is given as a class DataFrame
or
data.frame
will be converted to class matrix
. The reason
for this is that use of a data.frame
creates errors in taking duplicate rows/columns of SummarizedExperiment
, as in bootstrapping.
Users who absolutely want the object to hold a
object that is not a matrix can construct a SummarizedExperiment
object
(which will not convert the input into a matrix
), and
use this as input to create_moanin_model
.
An object of class Moanin
time_variable_name
character value giving the column in colData
that defines the time variable (must be of class numeric
)
group_variable_name
character value giving the column in colData
that defines the grouping variable (must be of class factor
)
basis_matrix
A basis matrix, where each row corresponds to the evaluation of a sample on the basis function (thus one column for each basis function).
spline_formula
a formula. The formula used in creating the basis matrix
degrees_of_freedom
a numeric integer. Number of degrees of freedom used in creating basis matrix. If NULL, degrees of freedom is not known (usually if user provided basis without degrees of freedom)
log_transform
logical, whether to log-transform the data for certain methods
# Load some data data(exampleData) # Use the default options moanin = create_moanin_model(data=testData,meta=testMeta) moanin # Change the number of degrees of freedom moanin = create_moanin_model(data=testData,meta=testMeta, degrees_of_freedom=6) moanin
# Load some data data(exampleData) # Use the default options moanin = create_moanin_model(data=testData,meta=testMeta) moanin # Change the number of degrees of freedom moanin = create_moanin_model(data=testData,meta=testMeta, degrees_of_freedom=6) moanin
This is a collection of helper methods for the Moanin class.
## S4 method for signature 'Moanin' group_variable_name(object) ## S4 replacement method for signature 'Moanin' group_variable_name(object) <- value ## S4 method for signature 'Moanin' time_by_group_variable(object) ## S4 method for signature 'Moanin' group_variable(object) ## S4 replacement method for signature 'Moanin' group_variable(object) <- value ## S4 method for signature 'Moanin' time_variable_name(object) ## S4 replacement method for signature 'Moanin' time_variable_name(object) <- value ## S4 method for signature 'Moanin' time_variable(object) ## S4 replacement method for signature 'Moanin' time_variable(object) <- value ## S4 method for signature 'Moanin' degrees_of_freedom(object) ## S4 method for signature 'Moanin' basis_matrix(object) ## S4 method for signature 'Moanin' spline_formula(object) ## S4 method for signature 'Moanin' show(object) ## S4 method for signature 'Moanin,ANY,character,ANY' x[i, j, ..., drop = TRUE] ## S4 method for signature 'Moanin,ANY,logical,ANY' x[i, j, ..., drop = TRUE] ## S4 method for signature 'Moanin,ANY,numeric,ANY' x[i, j, ..., drop = TRUE] ## S4 method for signature 'Moanin' log_transform(object) ## S4 method for signature 'Moanin' get_log_data(object)
## S4 method for signature 'Moanin' group_variable_name(object) ## S4 replacement method for signature 'Moanin' group_variable_name(object) <- value ## S4 method for signature 'Moanin' time_by_group_variable(object) ## S4 method for signature 'Moanin' group_variable(object) ## S4 replacement method for signature 'Moanin' group_variable(object) <- value ## S4 method for signature 'Moanin' time_variable_name(object) ## S4 replacement method for signature 'Moanin' time_variable_name(object) <- value ## S4 method for signature 'Moanin' time_variable(object) ## S4 replacement method for signature 'Moanin' time_variable(object) <- value ## S4 method for signature 'Moanin' degrees_of_freedom(object) ## S4 method for signature 'Moanin' basis_matrix(object) ## S4 method for signature 'Moanin' spline_formula(object) ## S4 method for signature 'Moanin' show(object) ## S4 method for signature 'Moanin,ANY,character,ANY' x[i, j, ..., drop = TRUE] ## S4 method for signature 'Moanin,ANY,logical,ANY' x[i, j, ..., drop = TRUE] ## S4 method for signature 'Moanin,ANY,numeric,ANY' x[i, j, ..., drop = TRUE] ## S4 method for signature 'Moanin' log_transform(object) ## S4 method for signature 'Moanin' get_log_data(object)
object |
An object of class |
value |
replacement value |
x |
|
i , j
|
A vector of logical or integer subscripts, indicating the rows and
columns to be subsetted for |
... |
arguments passed to subsetting |
drop |
A logical scalar that is ignored. |
Note that when subsetting the data, the dendrogram information and the co-clustering matrix are lost.
group_variable_name
and time_variable_name
return the
name of the column containing the variable. group_variable
and
time_variable
return the actual variable.
# Load some data data(exampleData) moanin = create_moanin_model(data=testData,meta=testMeta) group_variable_name(moanin) time_variable_name(moanin)
# Load some data data(exampleData) moanin = create_moanin_model(data=testData,meta=testMeta) group_variable_name(moanin) time_variable_name(moanin)
Creates barplot of results of per-timepoint comparison
perWeek_barplot( de_results, type = c("qval", "pval"), labels = NULL, threshold = 0.05, xlab = "Timepoint", ylab = "Number of DE genes", main = "", ... )
perWeek_barplot( de_results, type = c("qval", "pval"), labels = NULL, threshold = 0.05, xlab = "Timepoint", ylab = "Number of DE genes", main = "", ... )
de_results |
results from |
type |
type of p-value to count ("qval" or "pval") |
labels |
labels to give each bar |
threshold |
cutoff for counting gene as DE |
xlab |
x-axis label |
ylab |
y-axis label |
main |
title of plot |
... |
arguments passed to |
create_timepoints_contrasts
creates the needed contrasts for
comparing two groups for every timepoint in the format needed for
DE_timepoints
(i.e. makeContrasts
, to which the
contrasts are ultimately passed). The time points are determined by the
meta data in the moanin_object
provided by the user.
This is a plotting function, and returns (invisibly) the results of
barplot
data(exampleData) moanin <- create_moanin_model(data=testData, meta=testMeta) contrasts <- create_timepoints_contrasts(moanin, "C", "K") deTimepoints <- DE_timepoints(moanin, contrasts=contrasts, use_voom_weights=FALSE) perWeek_barplot(deTimepoints)
data(exampleData) moanin <- create_moanin_model(data=testData, meta=testMeta) contrasts <- create_timepoints_contrasts(moanin, "C", "K") deTimepoints <- DE_timepoints(moanin, contrasts=contrasts, use_voom_weights=FALSE) perWeek_barplot(deTimepoints)
Methods for evaluating the consensus between sets of clusterings, usually in the context of subsetting of the data or different numbers of clusters.
plot_cdf_consensus(labels) get_auc_similarity_scores(labels, method = c("consensus", "nmi")) plot_model_explorer(labels, colors = rainbow(length(labels)))
plot_cdf_consensus(labels) get_auc_similarity_scores(labels, method = c("consensus", "nmi")) plot_model_explorer(labels, colors = rainbow(length(labels)))
labels |
a list. Each element of
the list is a matrix that gives the results of a clustering routine in each
column (see |
method |
method for calculation of similarity for the AUC measure, one of "consensus" or "nmi". See details. |
colors |
a vector of colors, of length equal to the length of
|
For each element of the list labels
,
plot_cdf_consensus
calculates the consensus between the clusterings
in the matrix, i.e. the number of times that pairs of rows are in the same
cluster for different clusterings (columns) of the matrix using the
consensus_matrix
function. Then the set of values (the N(N-1)
values in the upper triangle of the matrix), are converted into a cdf
function and plotted.
For each set of clusterings given by labels
(i.e. for each
matrix M
which is an element of the list labels
)
get_auc_similarity_scores
calculates a pairwise measure of
similarity between the columns of M
. These pairwise scores are
plotted against their rank, and the final AUC measure is the area under
this curve.
For method "consensus", the pairwise measure is given by calculating
the consensus matrix using consensus_matrix
with
scale=FALSE
. The consensus matrix is divided by the max of M
.
For method "nmi", the pairwise value is the NMI value between each
pair of columns of the matrix of clusterings using the
NMI
function.
plot_cdf_consensus
invisibily returns list of the upper
triangle values, with the list of same length as that of labels
.
get_auc_similarity_scores
returns a vector, equal to length
of the list labels
, giving the AUC value for each element of
labels
.
This function is a plotting function does not return anything
consensus_matrix
, NMI
,
plot_cdf_consensus
data(exampleData) moanin <- create_moanin_model(data=testData,meta=testMeta) #small function to run splines_kmeans on subsample of 50 genes subsampleCluster<-function(){ ind<-sample(1:nrow(moanin),size=50) km<-splines_kmeans(moanin[ind,],n_clusters=3) assign<-splines_kmeans_score_and_label(moanin, km, proportion_genes_to_label=1.0)$label } kmClusters1=replicate(10,subsampleCluster()) kmClusters2=replicate(10,subsampleCluster()) # Note, because of the small number of replicates (10), # these plots are not representative of what to expect. out<-plot_cdf_consensus(labels=list(kmClusters1,kmClusters2)) get_auc_similarity_scores(list(kmClusters1,kmClusters2)) plot_model_explorer(list(kmClusters1,kmClusters2))
data(exampleData) moanin <- create_moanin_model(data=testData,meta=testMeta) #small function to run splines_kmeans on subsample of 50 genes subsampleCluster<-function(){ ind<-sample(1:nrow(moanin),size=50) km<-splines_kmeans(moanin[ind,],n_clusters=3) assign<-splines_kmeans_score_and_label(moanin, km, proportion_genes_to_label=1.0)$label } kmClusters1=replicate(10,subsampleCluster()) kmClusters2=replicate(10,subsampleCluster()) # Note, because of the small number of replicates (10), # these plots are not representative of what to expect. out<-plot_cdf_consensus(labels=list(kmClusters1,kmClusters2)) get_auc_similarity_scores(list(kmClusters1,kmClusters2)) plot_model_explorer(list(kmClusters1,kmClusters2))
Plotting splines
## S4 method for signature 'Moanin,matrix' plot_splines_data( object, data, colors = NULL, smooth = FALSE, legend = TRUE, legendArgs = NULL, subset_conditions = NULL, subset_data = NULL, simpleY = TRUE, centroid = NULL, scale_centroid = c("toData", "toCentroid", "none"), mar = c(2.5, 2.5, 3, 1), mfrow = NULL, addToPlot = NULL, ylab = "", xaxis = TRUE, yaxis = TRUE, xlab = "Time", ... ) ## S4 method for signature 'Moanin,numeric' plot_splines_data(object, data, ...) ## S4 method for signature 'Moanin,data.frame' plot_splines_data(object, data, ...) ## S4 method for signature 'Moanin,DataFrame' plot_splines_data(object, data, ...) ## S4 method for signature 'Moanin,missing' plot_splines_data(object, data, ...)
## S4 method for signature 'Moanin,matrix' plot_splines_data( object, data, colors = NULL, smooth = FALSE, legend = TRUE, legendArgs = NULL, subset_conditions = NULL, subset_data = NULL, simpleY = TRUE, centroid = NULL, scale_centroid = c("toData", "toCentroid", "none"), mar = c(2.5, 2.5, 3, 1), mfrow = NULL, addToPlot = NULL, ylab = "", xaxis = TRUE, yaxis = TRUE, xlab = "Time", ... ) ## S4 method for signature 'Moanin,numeric' plot_splines_data(object, data, ...) ## S4 method for signature 'Moanin,data.frame' plot_splines_data(object, data, ...) ## S4 method for signature 'Moanin,DataFrame' plot_splines_data(object, data, ...) ## S4 method for signature 'Moanin,missing' plot_splines_data(object, data, ...)
object |
An object of class |
data |
matrix containing the data to be plotted, where each row of the
data provided will be plotted as a separate plot. If missing, will rely on
data in |
colors |
vector, optional, default NULL. Vector of colors |
smooth |
boolean, optional, default: FALSE. Whether to smooth the centroids or not. |
legend |
boolean whether to include a legend (default:TRUE) |
legendArgs |
list of arguments to be passed to legend command (if
|
subset_conditions |
list if provided, only plots the subset of conditions provided. Else, plots all conditions |
subset_data |
list if provided, only plots the subset of data (ie, the rows) provided. Can be any valid vector for subsetting a matrix. See details. |
simpleY |
boolean, if true, will plot all genes on same y-axis and minimize the annotation of the y axis to only label the axis in the exterior plots (the x-axis is always assumed to be the same across all plots and will always be simplified) |
centroid |
numeric vector (or matrix of 1 row) with data to use to fit
the splines. If |
scale_centroid |
determines whether the centroid data given in
|
mar |
vector of margins to set the space around each plot (see
|
mfrow |
a vector of integers of length 2 defining the grid of plots to be
created (see |
addToPlot |
A function that will be called after the plotting, allowing the user to add more to the plot. |
ylab |
label for the y-axis |
xaxis |
Logical, whether to add x-axis labels to plot (if FALSE can be manually created by user with call to addToPlot) |
yaxis |
Logical, whether to add y-axis labels to plot (if FALSE can be manually created by user with call to addToPlot) |
xlab |
label for the x-axis |
... |
arguments to be passed to the individual plot commands (Will be sent to all plot commands) |
If data
is NULL, the data plotted will be from
assay(object)
, after log-transformation if
log_transform(object)=TRUE
.
If centroid
is missing, then splines will be estimated (per
group) for the the data in data
– separately for each row of
data
. If centroid
is provided, this data will be used to plot
a spline function, and this same spline will be plotted for each row of
data
. This is useful, for example, in plotting cluster centroids
over a series of genes.
If the user set log_transform=TRUE
in the creation of the
Moanin
object, the data will be log transformed before plotting and
calculating the spline fits.
This function creates a plot and does not return anything to the user.
# First, load some data and create a moanin model data(exampleData) moanin <- create_moanin_model(data=testData,meta=testMeta, degrees_of_freedom=6) # The moanin model contains all the information for plotting purposes. The # plot_splines_data will automatically fit the splines from the # information contained in the moanin model genes <- c("NM_001042489", "NM_008725") plot_splines_data(moanin, subset_data=genes, mfrow=c(2, 2)) # By default, same axis for all genes. Can change with 'simpleY=FALSE' plot_splines_data(moanin, subset_data=genes, smooth=TRUE, mfrow=c(2,2), simpleY=FALSE) # The splines can also be smoothed plot_splines_data(moanin, subset_data=genes, smooth=TRUE, mfrow=c(2, 2)) # You can provide different data (on same subjects), # instead of data in moanin object # (in which case moanin just provides grouping information) plot_splines_data(moanin, data=1/assay(moanin), subset_data=genes, smooth=TRUE, mfrow=c(2, 2)) # You can also provide data to use for fitting splines to argument # "centroid". This is helpful for overlaying centroids or predicted data # Here we do a silly example, just to demonstrate syntax, # where we use the data from the first gene as our centroid to fit a # spline estimate, but plot data from genes 3-4 plot_splines_data(moanin, centroid=assay(moanin[1,]), subset_data=3:4, smooth=TRUE, mfrow=c(2,2))
# First, load some data and create a moanin model data(exampleData) moanin <- create_moanin_model(data=testData,meta=testMeta, degrees_of_freedom=6) # The moanin model contains all the information for plotting purposes. The # plot_splines_data will automatically fit the splines from the # information contained in the moanin model genes <- c("NM_001042489", "NM_008725") plot_splines_data(moanin, subset_data=genes, mfrow=c(2, 2)) # By default, same axis for all genes. Can change with 'simpleY=FALSE' plot_splines_data(moanin, subset_data=genes, smooth=TRUE, mfrow=c(2,2), simpleY=FALSE) # The splines can also be smoothed plot_splines_data(moanin, subset_data=genes, smooth=TRUE, mfrow=c(2, 2)) # You can provide different data (on same subjects), # instead of data in moanin object # (in which case moanin just provides grouping information) plot_splines_data(moanin, data=1/assay(moanin), subset_data=genes, smooth=TRUE, mfrow=c(2, 2)) # You can also provide data to use for fitting splines to argument # "centroid". This is helpful for overlaying centroids or predicted data # Here we do a silly example, just to demonstrate syntax, # where we use the data from the first gene as our centroid to fit a # spline estimate, but plot data from genes 3-4 plot_splines_data(moanin, centroid=assay(moanin[1,]), subset_data=3:4, smooth=TRUE, mfrow=c(2,2))
Combines all p-values per rows.
pvalues_fisher_method(pvalues)
pvalues_fisher_method(pvalues)
pvalues |
a matrix of pvalues, with columns corresponding to different tests or sources of p-values, and rows corresponding to the genes from which the p-values come. |
a vector of p-values, one for each row of pvalues
, that is the
result of Fisher's combined probability test applied to the p-values in
that row.
data(exampleData) moanin <- create_moanin_model(data=testData,meta=testMeta) contrasts <- create_timepoints_contrasts(moanin,"C", "K") deTimepoints=DE_timepoints(moanin, contrasts=contrasts, use_voom_weights=FALSE) fisherPval=pvalues_fisher_method( deTimepoints[,grep("pval",colnames(deTimepoints))]) head(fisherPval)
data(exampleData) moanin <- create_moanin_model(data=testData,meta=testMeta) contrasts <- create_timepoints_contrasts(moanin,"C", "K") deTimepoints=DE_timepoints(moanin, contrasts=contrasts, use_voom_weights=FALSE) fisherPval=pvalues_fisher_method( deTimepoints[,grep("pval",colnames(deTimepoints))]) head(fisherPval)
Rescales rows of data to be between 0 and 1
## S4 method for signature 'Moanin' rescale_values(object, data = NULL, use_group = FALSE) ## S4 method for signature ''NULL'' rescale_values(object, data) ## S4 method for signature 'missing' rescale_values(object, ...)
## S4 method for signature 'Moanin' rescale_values(object, data = NULL, use_group = FALSE) ## S4 method for signature ''NULL'' rescale_values(object, data) ## S4 method for signature 'missing' rescale_values(object, ...)
object |
a object of class Moanin, only needed if choose to rescale by grouping variable in the moanin object. If NULL, then data will be rescaled jointly across all observations. |
data |
The matrix to rescale by row. If NULL, and |
use_group |
If true, then the data will be rescaled such that, for each
row, all values associated to each group (defined by grouping variable of
|
... |
arguments passed to the matrix or Moanin method. |
If the user set log_transform=TRUE
in the creation of the
Moanin
object, the data will be log transformed before rescaling
rescaled y, such that for each row, the values are comprised between
0 and 1. Note that if use_group=TRUE
and object
is not NULL,
the values associated to the columns of unique values of the grouping
variable of object
will be rescaled separately.
data(exampleData) moanin <- create_moanin_model(data=testData, meta=testMeta) # Can rescale data in Moanin object allData <- rescale_values(moanin) # Or provide different data and/or rescale within grouping variable smallData <- rescale_values(moanin, data=testData[1:10,], use_group=TRUE)
data(exampleData) moanin <- create_moanin_model(data=testData, meta=testMeta) # Can rescale data in Moanin object allData <- rescale_values(moanin) # Or provide different data and/or rescale within grouping variable smallData <- rescale_values(moanin, data=testData[1:10,], use_group=TRUE)
Performs splines clustering using K-means
## S4 method for signature 'Moanin' splines_kmeans( object, n_clusters = 10, init = "kmeans++", n_init = 10, max_iter = 300, random_seed = .Random.seed[1], fit_splines = TRUE, rescale = TRUE )
## S4 method for signature 'Moanin' splines_kmeans( object, n_clusters = 10, init = "kmeans++", n_init = 10, max_iter = 300, random_seed = .Random.seed[1], fit_splines = TRUE, rescale = TRUE )
object |
An object of class |
n_clusters |
int optional, default: 10 |
init |
["kmeans++", "random", "optimal_init"] |
n_init |
int, optional, default: 10 Number of initialization to perform. |
max_iter |
int, optional, default: 300 Maximum number of iteration to perform |
random_seed |
int, optional, default: NULL.
Passed to argument |
fit_splines |
boolean, optional, default: TRUE Whether to fit splines or not. |
rescale |
boolean, optional, default: TRUE Whether to rescale the data or not. |
If Moanin
object's slot has log_transform=TRUE, then the data will be transformed by the function log(x+1) before applying splines and clustering.
A list in the format returned by KMeans_rcpp
,
with the following elements added or changed:
centroids
The centroids are rescaled so that they range from
0-1
fit_splines
Logical, the value of fit_splines
given to the
function
rescale
The value of rescale
given to the function
data(exampleData) # Use the default options moanin <- create_moanin_model(data=testData, meta=testMeta) out <- splines_kmeans( moanin,n_clusters=5) table(out$clusters)
data(exampleData) # Use the default options moanin <- create_moanin_model(data=testData, meta=testMeta) out <- splines_kmeans( moanin,n_clusters=5) table(out$clusters)
Assign score and labels from raw data
## S4 method for signature 'Moanin' splines_kmeans_predict( object, kmeans_clusters, data = NULL, method = c("distance", "goodnessOfFit"), ... ) ## S4 method for signature 'Moanin' splines_kmeans_score_and_label( object, kmeans_clusters, data = NULL, proportion_genes_to_label = 0.5, max_score = NULL, previous_scores = NULL, rescale_separately = FALSE )
## S4 method for signature 'Moanin' splines_kmeans_predict( object, kmeans_clusters, data = NULL, method = c("distance", "goodnessOfFit"), ... ) ## S4 method for signature 'Moanin' splines_kmeans_score_and_label( object, kmeans_clusters, data = NULL, proportion_genes_to_label = 0.5, max_score = NULL, previous_scores = NULL, rescale_separately = FALSE )
object |
the Moanin object that contains the basis functions used in creating the clusters |
kmeans_clusters |
List returned by |
data |
the data to predict. If not given, will use |
method |
If "distance", predicts based on distance of data to kmeans
centroids. If "goodnessOfFit", is a wrapper to
|
... |
arguments passed to |
proportion_genes_to_label |
float, optional, default: 0.5 Percentage of genes to label. If max_score is provided, will label genes that are either in the top 'proportion_genes_to_label' or with a score below 'max_score'. |
max_score |
optional, default: Null When provided, will only label genes below that score. If NULL, ignore this option. |
previous_scores |
matrix of scores, optional. Allows user to give the
matrix scores results from a previous run of
|
rescale_separately |
logical, whether to score separately within grouping variable |
splines_kmeans_predict
returns a vector giving the labels for
the given data.
A list consisting of
labels
the label or cluster assigned to each gene based on the
cluster with the best (i.e. lowest) score, with no label given to genes that
do not have a score lower than a specified quantity
scores
the matrix of size n_cluster x n_genes, containing for
each gene and each cluster, the goodness of fit score
score_cutoff
The required cutoff for a gene receiving an
assignment
data(exampleData) moanin <- create_moanin_model(data=testData, meta=testMeta) # Cluster on a subset of genes kmClusters=splines_kmeans(moanin[1:50,],n_clusters=3) # get scores on all genes scores_and_labels <- splines_kmeans_score_and_label(object=moanin, kmClusters) head(scores_and_labels$scores) head(scores_and_labels$labels) # should be same as above, only just the assignments predictLabels1 <- splines_kmeans_predict(object=moanin, kmClusters, method="goodnessOfFit") # Instead use distance to centroid: predictLabels2 <- splines_kmeans_predict(object=moanin, kmClusters, method="distance")
data(exampleData) moanin <- create_moanin_model(data=testData, meta=testMeta) # Cluster on a subset of genes kmClusters=splines_kmeans(moanin[1:50,],n_clusters=3) # get scores on all genes scores_and_labels <- splines_kmeans_score_and_label(object=moanin, kmClusters) head(scores_and_labels$scores) head(scores_and_labels$labels) # should be same as above, only just the assignments predictLabels1 <- splines_kmeans_predict(object=moanin, kmClusters, method="goodnessOfFit") # Instead use distance to centroid: predictLabels2 <- splines_kmeans_predict(object=moanin, kmClusters, method="distance")