Title: | Time-Course Multi-Omics data integration |
---|---|
Description: | timeOmics is a generic data-driven framework to integrate multi-Omics longitudinal data measured on the same biological samples and select key temporal features with strong associations within the same sample group. The main steps of timeOmics are: 1. Platform and time-specific normalization and filtering steps; 2. Modelling each biological feature into one time expression profile; 3. Clustering features with the same expression profile over time; 4. Post-hoc validation step. |
Authors: | Antoine Bodein [aut, cre], Olivier Chapleur [aut], Kim-Anh Le Cao [aut], Arnaud Droit [aut] |
Maintainer: | Antoine Bodein <[email protected]> |
License: | GPL-3 |
Version: | 1.19.0 |
Built: | 2024-12-03 06:00:02 UTC |
Source: | https://github.com/bioc/timeOmics |
Compute the Spearman dissimilarity distance.
dmatrix.spearman.dissimilarity(X)
X |
A numeric matrix with features in columns |
Return a dissimilarity matrix of size PxP.
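As a rough illustration of what a Spearman dissimilarity looks like, the sketch below uses base R only. It assumes the common definition "1 minus the Spearman correlation between columns"; the helper name `spearman_dissimilarity` and the toy matrix are hypothetical and are not taken from the package source.

```r
# Toy matrix: 6 samples (rows) x 4 features (columns); values are made up.
set.seed(1)
X <- matrix(rnorm(24), nrow = 6,
            dimnames = list(NULL, paste0("feature", 1:4)))

# Assumed definition: dissimilarity = 1 - Spearman correlation between columns.
spearman_dissimilarity <- function(X) {
  1 - cor(X, method = "spearman", use = "pairwise.complete.obs")
}

D <- spearman_dissimilarity(X)
dim(D)   # P x P, where P is the number of columns of X
```

Under this definition the values range from 0 (identical ranking over samples) to 2 (perfectly opposite ranking).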
Generates random data to be used in examples.
get_demo_cluster()
A list containing:
X |
data.frame |
Y |
data.frame |
Z |
data.frame |
pca |
a mixOmics pca result |
spca |
a mixOmics spca result |
pls |
a mixOmics pls result |
spls |
a mixOmics spls result |
block.pls |
a mixOmics block.pls result |
block.spls |
a mixOmics block.spls result |
# Random data could lead to "The SGCCA algorithm did not converge" warning which is not important for a demo
demo <- suppressWarnings(get_demo_cluster())
Get data for silhouette demo
get_demo_silhouette()
A matrix of expression profiles, samples in rows, time in columns.
data <- get_demo_silhouette()
This function returns the cluster associated to each feature from a mixOmics object.
getCluster(X, user.block = NULL, user.cluster = NULL)
X |
an object of the class: |
user.block |
a vector to filter the result and return the features of the specified blocks. |
user.cluster |
a vector to filter the result and return only the features of the specified clusters |
For each feature, the cluster is assigned according to the maximum contribution on a component and the sign of that contribution.
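The assignment rule described above can be sketched in base R on a small loading matrix. This is only an illustration under stated assumptions: the loadings, the column names of the resulting data.frame and the signed integer labels are hypothetical, and the real function works on mixOmics objects rather than on a bare matrix.

```r
# Hypothetical loadings: 5 features (rows) x 2 components (columns).
loadings <- matrix(c( 0.8,  0.1,
                     -0.7,  0.2,
                      0.1, -0.9,
                      0.3,  0.6,
                     -0.2, -0.1),
                   ncol = 2, byrow = TRUE,
                   dimnames = list(paste0("f", 1:5), c("PC1", "PC2")))

# Component with the largest absolute contribution for each feature.
comp <- apply(abs(loadings), 1, which.max)
# Value (and therefore sign) of that contribution.
contrib <- loadings[cbind(seq_len(nrow(loadings)), comp)]
# Signed cluster label: component index times the sign of the contribution.
cluster <- comp * sign(contrib)

data.frame(molecule = rownames(loadings), comp = comp,
           contribution = contrib, cluster = cluster)
```

With two components this yields up to four clusters (1, -1, 2, -2), one per component and sign.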
A data.frame containing the name of feature, its assigned cluster and other information such as selected component, contribution, sign, ...
demo <- suppressWarnings(get_demo_cluster())
pca.cluster <- getCluster(demo$pca)
spca.cluster <- getCluster(demo$spca)
pls.cluster <- getCluster(demo$pls)
spls.cluster <- getCluster(demo$spls)
block.pls.cluster <- getCluster(demo$block.pls)
block.spls.cluster <- getCluster(demo$block.spls)
Compute the average silhouette coefficient for a given set of components on a mixOmics result. For each given ncomp, the mixOmics method is run with the same arguments and the given 'ncomp'. Longitudinal clustering is then performed and the average silhouette coefficient is computed.
getNcomp(object, max.ncomp = NULL, X, Y = NULL, indY = NULL, ...)
object |
A mixOmics object of the class 'pca', 'spca', 'mixo_pls', 'mixo_spls', 'block.pls', 'block.spls' |
max.ncomp |
integer, maximum number of components to include. If no argument is given, 'max.ncomp=object$ncomp' |
X |
a numeric matrix/data.frame or a list of data.frame for |
Y |
(only for |
indY |
(optional and only for |
... |
Other arguments to be passed to methods (pca, pls, block.pls) |
getNcomp returns a list with class "ncomp.tune.silhouette" containing the following components:
ncomp |
a vector containing the tested ncomp |
silhouette |
a vector containing the average silhouette coefficient by ncomp |
dmatrix |
the distance matrix used to compute silhouette coefficient |
getCluster, silhouette, pca, pls, block.pls
# random input data
demo <- suppressWarnings(get_demo_cluster())

# pca
pca.res <- mixOmics::pca(X=demo$X, ncomp = 5)
res.ncomp <- getNcomp(pca.res, max.ncomp = 4, X = demo$X)
plot(res.ncomp)

# pls
pls.res <- mixOmics::pls(X=demo$X, Y=demo$Y)
res.ncomp <- getNcomp(pls.res, max.ncomp = 4, X = demo$X, Y=demo$Y)
plot(res.ncomp)

# block.pls
block.pls.res <- suppressWarnings(mixOmics::block.pls(X=list(X=demo$X, Z=demo$Z), Y=demo$Y))
res.ncomp <- suppressWarnings(getNcomp(block.pls.res, max.ncomp = 4, X=list(X=demo$X, Z=demo$Z), Y=demo$Y))
plot(res.ncomp)
getSilhouette is a generic function that computes the silhouette coefficient for an object of the type pca, spca, pls, spls, block.pls, block.spls.
getSilhouette(object)
object |
a mixOmics object of the class |
This method extracts the component contribution depending on the object, performs the clustering step, and computes the silhouette coefficient.
silhouette coefficient
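For readers unfamiliar with the silhouette coefficient, the following base R sketch computes it from a dissimilarity matrix and a vector of cluster labels. It follows the textbook definition, s(i) = (b - a) / max(a, b), and is not the package's implementation; the helper `silhouette_widths` and the toy data are hypothetical.

```r
# Silhouette width of each item, given a dissimilarity matrix D and labels.
silhouette_widths <- function(D, cluster) {
  vapply(seq_len(nrow(D)), function(i) {
    same <- cluster == cluster[i]
    same[i] <- FALSE
    if (!any(same)) return(0)                 # singleton cluster: width set to 0
    a <- mean(D[i, same])                     # mean distance to own cluster
    others <- setdiff(unique(cluster), cluster[i])
    b <- min(vapply(others,                   # mean distance to nearest other cluster
                    function(k) mean(D[i, cluster == k]), numeric(1)))
    (b - a) / max(a, b)
  }, numeric(1))
}

# Toy example: two well-separated groups on a line.
x <- c(1, 1.2, 0.9, 5, 5.1, 4.8)
cluster <- c(1, 1, 1, 2, 2, 2)
D <- as.matrix(dist(x))

s <- silhouette_widths(D, cluster)
mean(s)   # average silhouette coefficient
```

Values close to 1 indicate compact, well-separated clusters; values near 0 or below suggest overlapping or misassigned items.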
demo <- suppressWarnings(get_demo_cluster())
getSilhouette(object = demo$pca)
getSilhouette(object = demo$spca)
getSilhouette(object = demo$pls)
getSilhouette(object = demo$spls)
getSilhouette(object = demo$block.pls)
getSilhouette(object = demo$block.spls)
Performs a clustering based on the signs of variation between 2 timepoints. Optionally, if the difference between 2 timepoints is lower than a given threshold, the returned difference will be 0.
getUpDownCluster(X, diff_threshold = 0)
X |
a data.frame or a list of data.frames with the same number of rows. |
diff_threshold |
a number (optional, default 0), if the difference between 2 values is lower than the threshold, the returned sign will be 0 (no variation). |
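The sign-based clustering described above can be sketched in base R as follows. This is a minimal illustration, assuming time points in rows, features in columns, and an absolute difference compared against the threshold; the helper `up_down_labels` and its label format are hypothetical and may differ from what the package returns.

```r
# Hypothetical data: 4 time points (rows) x 3 features (columns).
X <- data.frame(a = c(1, 3, 6, 6.2),
                b = c(5, 4, 2, 1),
                c = c(2, 2.1, 5, 3))

up_down_labels <- function(X, diff_threshold = 0) {
  d <- apply(as.matrix(X), 2, diff)      # successive differences per feature
  s <- sign(d)
  s[abs(d) < diff_threshold] <- 0        # assumed: small changes count as no variation
  apply(s, 2, paste, collapse = "_")     # one label per feature
}

up_down_labels(X)
up_down_labels(X, diff_threshold = 0.5)
```

Features sharing the same label follow the same sequence of increases, decreases and plateaus, and can be treated as one cluster.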
demo <- suppressWarnings(get_demo_cluster())
X <- list(X = demo$X, Y = demo$Y, Z = demo$Z)
res <- getUpDownCluster(X)
class(res)
getCluster(res)

X <- demo$X
res <- getUpDownCluster(X)
res <- getUpDownCluster(X, diff_threshold = 15)
res_cluster <- getCluster(res)
This function filters out linear models whose residuals show highly heterogeneous variability. From an "lmms" output, 2 parameters are tested:
lmms.filter.lines( data, lmms.obj, time, homoskedasticity = TRUE, MSE.filter = TRUE, homoskedasticity.cutoff = 0.05 )
data |
a data.frame used in the |
lmms.obj |
a |
time |
a numeric vector containing the sample time point information. |
homoskedasticity |
a logical whether or not to test for homoscedasticity with the Breusch-Pagan test. |
MSE.filter |
whether or not to test for low dispersion with a cutoff on the MSE. |
homoskedasticity.cutoff |
a numeric scalar between 0 and 1, p-value threshold for B-P test. |
* homoscedasticity of the residuals with a Breusch-Pagan test
* low dispersion with a cutoff on the MSE (mean squared error)
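The two checks can be illustrated in base R on a single simulated feature. The sketch below is an assumption-laden stand-in: it fits an ordinary linear model with `lm` instead of an lmms spline model, and it computes Koenker's studentized form of the Breusch-Pagan statistic (n times the R-squared of the auxiliary regression) by hand. The helper `bp_and_mse` and the simulated data are hypothetical.

```r
set.seed(42)
time <- rep(1:8, each = 3)
# Two simulated features: constant noise vs. noise that grows with time.
y_homo   <- 2 + 0.5 * time + rnorm(length(time), sd = 0.3)
y_hetero <- 2 + 0.5 * time + rnorm(length(time), sd = 0.3 * time)

bp_and_mse <- function(y, time) {
  fit  <- lm(y ~ time)
  res2 <- residuals(fit)^2
  aux  <- lm(res2 ~ time)                       # auxiliary regression on squared residuals
  stat <- length(y) * summary(aux)$r.squared    # LM statistic, n * R^2
  c(bp.pvalue = pchisq(stat, df = 1, lower.tail = FALSE),
    mse       = mean(res2))
}

r_homo   <- bp_and_mse(y_homo, time)
r_hetero <- bp_and_mse(y_hetero, time)
rbind(homo = r_homo, hetero = r_hetero)
```

A feature would be kept when its Breusch-Pagan p-value is above the chosen cutoff and its MSE is considered low enough.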
a list containing the following items
filtering.summary |
a data.frame with the different tests per features (passed = TRUE, failed = FALSE) |
to.keep |
features which passed all the tests |
filtered |
the filtered data.frame |
# data and lmms output
data(timeOmics.simdata)
data <- timeOmics.simdata$sim
lmms.output <- timeOmics.simdata$lmms.output
time <- timeOmics.simdata$time

# filter
filter.res <- lmms.filter.lines(data = data, lmms.obj = lmms.output, time = time)
This function provides an expression profile representation over time and by cluster.
plotLong( object, time = NULL, plot = TRUE, center = TRUE, scale = TRUE, title = "Time-course Expression", X.label = NULL, Y.label = NULL, legend = FALSE, legend.title = NULL, legend.block.name = NULL )
object |
a mixOmics result of class (s)pca, (s)pls, block.(s)pls. |
time |
(optional) a numeric vector, the same size as |
plot |
a logical, if TRUE then a plot is produced. Otherwise, the data.frame on which the plot is based is returned. |
center |
a logical value indicating whether the variables should be shifted to be zero centered. |
scale |
a logical value indicating whether the variables should be scaled to have unit variance before the analysis takes place. |
title |
character indicating the plot title. |
X.label |
x axis titles. |
Y.label |
y axis titles. |
legend |
a logical, to display or not the legend. |
legend.title |
if |
legend.block.name |
a character vector whose length equals the number of blocks in the mixOmics object. |
a data.frame (gathered form) containing the following columns:
time |
x axis values |
molecule |
names of features |
value |
y axis values |
cluster |
assigned clusters |
block |
name of 'blocks' |
demo <- suppressWarnings(get_demo_cluster())
X <- demo$X
Y <- demo$Y
Z <- demo$Z

# (s)pca
pca.res <- mixOmics::pca(X, ncomp = 3)
plotLong(pca.res)
spca.res <- mixOmics::spca(X, ncomp =2, keepX = c(15, 10))
plotLong(spca.res)

# (s)pls
pls.res <- mixOmics::pls(X,Y)
plotLong(pls.res)
spls.res <- mixOmics::spls(X,Y, keepX = c(15,10), keepY=c(5,6))
plotLong(spls.res)

# (s)block.spls
block.pls.res <- mixOmics::block.pls(X=list(X=X,Z=Z), Y=Y)
plotLong(block.pls.res)
block.spls.res <- mixOmics::block.spls(X=list(X=X,Z=Z), Y=Y, keepX = list(X = c(15,10), Z = c(5,6)), keepY = c(3,6))
plotLong(block.spls.res)
proportionality is a wrapper that computes the proportionality distance for a clustering result (pca, spca, pls, spls, block.pls, block.spls). It also performs a U-test to compare the median within a cluster to the median of the entire background set.
proportionality(X)
X |
an object of the class: |
Return a list containing the following components:
propr.distance |
Square matrix with proportionality distance between pairs of features |
propr.distance.w.cluster |
distance between pairs with cluster label |
pvalue |
Wilcoxon U-test p-value comparing the medians within clusters and with the entire background set |
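To make the idea concrete, the sketch below computes a proportionality-style distance between features and compares within-cluster distances with the background using a Wilcoxon test, all in base R. It is a sketch under assumptions: it uses the symmetric statistic var(A_i - A_j) / var(A_i + A_j) on centred log-ratio transformed data, which may not be the exact metric used by the wrapper, and the data and cluster labels are made up.

```r
set.seed(3)
# Hypothetical strictly positive data: 8 samples (rows) x 6 features (columns).
# Features 1 to 3 are built to be nearly proportional to a shared trend.
trend <- exp(rnorm(8))
X <- cbind(trend * 2, trend * 5, trend * 0.5,
           exp(rnorm(8)), exp(rnorm(8)), exp(rnorm(8)))
X <- X * exp(rnorm(length(X), sd = 0.01))   # small multiplicative noise
cluster <- c(1, 1, 1, 2, 2, 2)              # made-up cluster labels

# Centred log-ratio transform per sample, then the assumed distance:
# phs(i, j) = var(A_i - A_j) / var(A_i + A_j)
clr <- log(X) - rowMeans(log(X))
p <- ncol(clr)
phs <- matrix(0, p, p)
for (i in seq_len(p)) for (j in seq_len(p)) {
  phs[i, j] <- var(clr[, i] - clr[, j]) / var(clr[, i] + clr[, j])
}

# Compare distances inside cluster 1 with all pairwise distances.
upper      <- upper.tri(phs)
in_cluster <- outer(cluster == 1, cluster == 1, FUN = "&")
within     <- phs[upper & in_cluster]
background <- phs[upper]
pvalue <- wilcox.test(within, background, exact = FALSE)$p.value
```

Small distances inside a cluster relative to the background suggest that the clustered features vary proportionally across samples.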
Lovell, D., Pawlowsky-Glahn, V., Egozcue, J. J., Marguerat, S., Bähler, J. (2015). Proportionality: a valid alternative to correlation for relative data. PLoS Comput. Biol. 11, e1004075. doi: 10.1371/journal.pcbi.1004075
Quinn, T. P., Richardson, M. F., Lovell, D., Crowley, T. M. (2017). propr: an r-package for identifying proportionally abundant features using compositional data analysis. Sci. Rep. 7, 16252. doi: 10.1038/s41598-017-16520-0
demo <- suppressWarnings(get_demo_cluster())

# pca
X <- demo$pca
propr.res <- proportionality(X)
plot(propr.res)

# pls
X <- demo$spls
propr.res <- proportionality(X)
plot(propr.res)

# block.pls
X <- demo$block.spls
propr.res <- proportionality(X)
plot(propr.res)
remove.low.cv removes variables with low variation.
From a matrix/data.frame (samples in rows, features in columns), it computes the coefficient of variation of every feature (column) and returns a filtered data.frame with the features whose coefficient of variation is above a given threshold.
remove.low.cv(X, cutoff = 0.5)
X |
a matrix/data.frame |
cutoff |
a numeric value |
a data.frame/matrix
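The filtering rule can be sketched in base R as below. It assumes the usual definition of the coefficient of variation (standard deviation divided by mean, taken in absolute value); the helper `low_cv_filter` and the toy matrix are hypothetical and are not the package's code.

```r
low_cv_filter <- function(X, cutoff = 0.5) {
  # Coefficient of variation of each column (feature).
  cv <- apply(X, 2, function(v) sd(v) / mean(v))
  # Keep only the features whose CV exceeds the cutoff.
  X[, abs(cv) > cutoff, drop = FALSE]
}

# Toy matrix: one nearly constant feature and one highly variable feature.
mat <- cbind(flat     = c(10, 10.1, 9.9, 10),
             variable = c(1, 5, 10, 20))

kept <- low_cv_filter(mat, cutoff = 0.5)
colnames(kept)
```

Only the `variable` column survives here, since the `flat` column has a coefficient of variation far below the cutoff.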
mat <- matrix(sample(1:3, size = 200, replace = TRUE), ncol=20)
remove.low.cv(mat, 0.4)
This function identifies the number of features to keep per component, and thus per cluster, in mixOmics::block.spls
by optimizing the silhouette coefficient, which assesses the quality of clustering.
tuneCluster.block.spls( X, Y = NULL, indY = NULL, ncomp = 2, test.list.keepX = NULL, test.keepY = NULL, ... )
X |
list of numeric matrix (or data.frame) with features in columns and samples in rows (with samples order matching in all data sets). |
Y |
(optional) numeric matrix (or data.frame) with features in columns and samples in rows (same rows as |
indY |
integer, to supply if Y is missing, indicates the position of the matrix response in the list |
ncomp |
integer, number of components to include in the model |
test.list.keepX |
list of integers with the same size as X. Each entry corresponds to the different keepX value to test for each block of |
test.keepY |
only if Y is provided. Vector of integers containing the different values of keepY to test for block |
... |
other parameters to be included in the spls model (see |
For each component and for each keepX/keepY value, a block.spls model is fitted with these parameters. Then the clustering is performed and the silhouette coefficient is calculated for this clustering.
We then calculate "slopes" where keepX/keepY are the coordinates and the silhouette is the intensity. A z-score is assigned to each slope. We then identify the most significant slope which indicates a drop in the silhouette coefficient and thus a deterioration of the clustering.
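The slope-based selection can be illustrated for a single component and a single block with a few lines of base R. This is a simplified sketch under assumptions: the silhouette values are made up, the slope is taken between successive keepX values only, and the rule "pick the last keepX before the steepest drop" is an assumed reading of the description above rather than the package's exact procedure.

```r
keepX      <- c(5, 10, 15, 20)
silhouette <- c(0.62, 0.60, 0.35, 0.33)   # made-up average silhouette values

# Change in silhouette per additional selected feature.
slopes <- diff(silhouette) / diff(keepX)
# Standardise the slopes to spot the one that stands out.
z <- (slopes - mean(slopes)) / sd(slopes)

drop   <- which.min(z)     # steepest decrease in silhouette
choice <- keepX[drop]      # assumed choice: last value before the drop
choice
```

Here the silhouette falls sharply between keepX = 10 and keepX = 15, so keeping 10 features preserves the clustering quality.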
silhouette |
silhouette coef. computed for every combination of keepX/keepY |
ncomp |
number of components included in the model |
test.keepX |
list of tested keepX |
test.keepY |
list of tested keepY |
block |
names of blocks |
slopes |
"slopes" computed from the silhouette coef. for each keepX and keepY, used to determine the best keepX and keepY |
choice.keepX |
best |
choice.keepY |
best |
block.spls, getCluster, plotLong
demo <- suppressWarnings(get_demo_cluster())
X <- list(X = demo$X, Z = demo$Z)
Y <- demo$Y
test.list.keepX <- list("X" = c(5,10,15,20), "Z" = c(2,4,6,8))
test.keepY <- c(2:5)

# tuning
tune.block.spls <- tuneCluster.block.spls(X= X, Y= Y, test.list.keepX= test.list.keepX, test.keepY= test.keepY, mode= "canonical")
keepX <- tune.block.spls$choice.keepX
keepY <- tune.block.spls$choice.keepY

# final model
block.spls.res <- mixOmics::block.spls(X= X, Y= Y, keepX = keepX, keepY = keepY, ncomp = 2, mode = "canonical")

# get clusters and plot longitudinal profile by cluster
block.spls.cluster <- getCluster(block.spls.res)
This function identifies the number of features to keep per component, and thus per cluster, in mixOmics::spca
by optimizing the silhouette coefficient, which assesses the quality of clustering.
tuneCluster.spca(X, ncomp = 2, test.keepX = rep(ncol(X), ncomp), ...)
X |
numeric matrix (or data.frame) with features in columns and samples in rows |
ncomp |
integer, number of components to include in the model |
test.keepX |
vector of integer containing the different value of keepX to test for block |
... |
other parameters to be included in the spls model (see |
For each component and for each keepX value, an spca model is fitted with these parameters. Then the clustering is performed and the silhouette coefficient is calculated for this clustering.
We then calculate "slopes" where keepX are the coordinates and the silhouette is the intensity. A z-score is assigned to each slope. We then identify the most significant slope which indicates a drop in the silhouette coefficient and thus a deterioration of the clustering.
silhouette |
silhouette coef. computed for every keepX value |
ncomp |
number of components included in the model |
test.keepX |
list of tested keepX |
block |
names of blocks |
slopes |
"slopes" computed from the silhouette coef. for each keepX, used to determine the best keepX |
choice.keepX |
best |
demo <- suppressWarnings(get_demo_cluster())
X <- demo$X

# tuning
tune.spca.res <- tuneCluster.spca(X = X, ncomp = 2, test.keepX = c(2:10))
keepX <- tune.spca.res$choice.keepX
plot(tune.spca.res)

# final model
spca.res <- mixOmics::spca(X=X, ncomp = 2, keepX = keepX)
plotLong(spca.res)
This function identifies the number of features to keep per component, and thus per cluster, in mixOmics::spls
by optimizing the silhouette coefficient, which assesses the quality of clustering.
tuneCluster.spls( X, Y, ncomp = 2, test.keepX = rep(ncol(X), ncomp), test.keepY = rep(ncol(Y), ncomp), ... )
X |
numeric matrix (or data.frame) with features in columns and samples in rows |
Y |
numeric matrix (or data.frame) with features in columns and samples in rows (same rows as |
ncomp |
integer, number of components to include in the model |
test.keepX |
vector of integer containing the different value of keepX to test for block |
test.keepY |
vector of integer containing the different value of keepY to test for block |
... |
other parameters to be included in the spls model (see |
For each component and for each keepX/keepY value, an spls model is fitted with these parameters. Then the clustering is performed and the silhouette coefficient is calculated for this clustering.
We then calculate "slopes" where keepX/keepY are the coordinates and the silhouette is the intensity. A z-score is assigned to each slope. We then identify the most significant slope which indicates a drop in the silhouette coefficient and thus a deterioration of the clustering.
silhouette |
silhouette coef. computed for every combination of keepX/keepY |
ncomp |
number of components included in the model |
test.keepX |
list of tested keepX |
test.keepY |
list of tested keepY |
block |
names of blocks |
slopes |
"slopes" computed from the silhouette coef. for each keepX and keepY, used to determine the best keepX and keepY |
choice.keepX |
best |
choice.keepY |
best |
demo <- suppressWarnings(get_demo_cluster())
X <- demo$X
Y <- demo$Y

# tuning
tune.spls <- tuneCluster.spls(X, Y, ncomp= 2, test.keepX= c(5,10,15,20), test.keepY= c(2,4,6))
keepX <- tune.spls$choice.keepX
keepY <- tune.spls$choice.keepY

# final model
spls.res <- mixOmics::spls(X, Y, ncomp= 2, keepX= keepX, keepY= keepY)

# get clusters and plot longitudinal profile by cluster
spls.cluster <- getCluster(spls.res)
plotLong(spls.res)
unscale is a generic function that unscales and/or uncenters the columns of a matrix generated by the base scale function.
unscale(x)
x |
A numeric matrix. |
unscale uses the attributes "scaled:scale" and "scaled:center" added by the scale function and applies these scaling factors to retrieve the initial matrix.
It first unscales and then uncenters.
Return a matrix, uncentered and unscaled. Attributes "scaled:center" and "scaled:scale" are removed.
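The reversal described above can be reproduced with base R alone. The sketch below is an illustrative re-implementation, not the package's code; the helper name `unscale_sketch` is hypothetical. It reads the two attributes set by `scale`, multiplies each column by its scale factor, adds back its center, and drops the attributes.

```r
unscale_sketch <- function(x) {
  s <- attr(x, "scaled:scale")
  m <- attr(x, "scaled:center")
  if (!is.null(s)) x <- sweep(x, 2, s, `*`)   # undo the scaling first
  if (!is.null(m)) x <- sweep(x, 2, m, `+`)   # then undo the centering
  attr(x, "scaled:scale")  <- NULL
  attr(x, "scaled:center") <- NULL
  x
}

X <- matrix(c(1, 4, 2, 8, 5, 7, 3, 9, 6), ncol = 3)
X.back <- unscale_sketch(scale(X, center = TRUE, scale = TRUE))
all.equal(X, X.back)
```

Comparing with `all.equal` rather than `==` allows for the small floating-point differences introduced by scaling and unscaling.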
X <- matrix(1:9, ncol = 3)
X.scale <- scale(X, center = TRUE, scale = TRUE)
X.unscale <- unscale(X.scale)
all(X == X.unscale)