Title: | Multiple omics data integrative clustering and gene set analysis |
---|---|
Description: | This package provide a method for doing gene set analysis based on multiple omics data. |
Authors: | Chen Meng |
Maintainer: | Chen Meng <[email protected]> |
License: | GPL-2 |
Version: | 1.41.0 |
Built: | 2024-12-19 03:38:50 UTC |
Source: | https://github.com/bioc/mogsa |
Modern "omics" technologies enable quantitative monitoring of the abundance of various biological molecules in a high-throughput manner, accumulating an unprecedented amount of quantitative information on a genomic scale. Gene set analysis is a particularly useful method in high throughput data analysis since it can summarize single gene level information into the biological informative gene set levels. This package provide a method do the gene set analysis based on multiple omics data that describing the same set of observations/samples.
Package: | mogsa |
Type: | Package |
Version: | 1.3.1 |
Date: | 2016-01-19 |
License: | GPL-2 |
Depends: | methods |
The main function in the package is "mogsa", see the function help manu for more details.
Chen Meng Maintainer: Chen Meng <[email protected]>
Chen Meng, Dominic Helm, Martin Frejno, and Bernhard Kuster. moCluster: Identifying Joint Patterns Across Multiple Omics Data Sets. Journal of Proteome Research 2016.
# library(mogsa) # loading gene expression data and supplementary data data(NCI60_4array_supdata) data(NCI60_4arrays) # using a list of data.frame as input mgsa1 <- mogsa(x = NCI60_4arrays, sup=NCI60_4array_supdata, nf=9, proc.row = "center_ssq1", w.data = "inertia", statis = TRUE) # using moa as input ana <- moa(NCI60_4arrays, proc.row = "center_ssq1", w.data = "inertia", statis = TRUE) smoa <- sup.moa(ana, sup=NCI60_4array_supdata, nf=3) mgsa2 <- mogsa(x = ana, sup=NCI60_4array_supdata, nf=9) mgsa3 <- mogsa(x = ana, sup=smoa)
# library(mogsa) # loading gene expression data and supplementary data data(NCI60_4array_supdata) data(NCI60_4arrays) # using a list of data.frame as input mgsa1 <- mogsa(x = NCI60_4arrays, sup=NCI60_4array_supdata, nf=9, proc.row = "center_ssq1", w.data = "inertia", statis = TRUE) # using moa as input ana <- moa(NCI60_4arrays, proc.row = "center_ssq1", w.data = "inertia", statis = TRUE) smoa <- sup.moa(ana, sup=NCI60_4array_supdata, nf=3) mgsa2 <- mogsa(x = ana, sup=NCI60_4array_supdata, nf=9) mgsa3 <- mogsa(x = ana, sup=smoa)
Retrive variables/features (genes) mapped to the annotated data sets in a gene set. Also returns the the information about presence and absence of a feature for a specific data set.
annotate.gs(mgsa, gs)
annotate.gs(mgsa, gs)
mgsa |
An object of class |
gs |
The name of a geneset |
This function returns a data.frame. The first column shows the name of features. The last column is for the count of how many data sets has the corresponding features. Columns in the middle contains logical value indicating whether a feature is presented in a particular data set.
Chen Meng
see GIS
# library(mogsa) # loading gene expression data and supplementary data data(NCI60_4array_supdata) data(NCI60_4arrays) mgsa <- mogsa(x = NCI60_4arrays, sup=NCI60_4array_supdata, nf=9, proc.row = "center_ssq1", w.data = "inertia", statis = TRUE) allgs <- colnames(NCI60_4array_supdata[[1]]) annotate.gs(mgsa, allgs[1])
# library(mogsa) # loading gene expression data and supplementary data data(NCI60_4array_supdata) data(NCI60_4arrays) mgsa <- mogsa(x = NCI60_4arrays, sup=NCI60_4array_supdata, nf=9, proc.row = "center_ssq1", w.data = "inertia", statis = TRUE) allgs <- colnames(NCI60_4array_supdata[[1]]) annotate.gs(mgsa, allgs[1])
An internal function called by mbpca
.
biSoftK(x, maxiter, kp, kt, weight.p, weight.t, pos = FALSE, unit.pb = TRUE, unit.tb = FALSE)
biSoftK(x, maxiter, kp, kt, weight.p, weight.t, pos = FALSE, unit.pb = TRUE, unit.tb = FALSE)
x |
The input matrix, rows are observations, columns are variables |
maxiter |
Number of maximum interation the algorithm can run |
kp |
The number (>=1) or proportion (<1) of variables want to keep. It could be a single value or a vector has the same length as x so the sparsity of individual matrix could be different. |
kt |
The number (>=1) or proportion (<1) of non-zero scores for obvservations. |
weight.p |
The weight of variables. It could be 1) a vector has the same length as x, one value for each table/block; 2) one number, all variables share the same weight or 3) a list of vectors, the length of each vector should be the same with the columns numbers of the corresponding table/block, so every variables has a unique weight. |
weight.t |
The weight for observation. For accepted values or formats, see weight.p. |
pos |
Logical value, if only non-negaitve values in the loading and score vectors. |
unit.pb |
Logical value, whether the length of table/block loading should be unit length. |
unit.tb |
Logical value, whether the length of table/block score should be unit length. |
This function also use the NIPALS algorithm, but it generalized nipalsSoftK from several aspects: 1. Allowing sparsity on both columns and rows of matrices 2. Allowing weights for columns and rows 3. Allowing loading and/or score vectors of blocks to be unit length 4. Allowing only positive number in loading and score vectors
an list
object contains the following elements:
tb
- the block scores
pb
- the block loadings
t
- the global scores
w
- the wegihts of block scores to construct the global score.
Chen Meng
Bootstrap mbpca to estimate the coherence of different data sets and estimate the number of components should be included in an analysis.
bootMbpca(moa, mc.cores = 1, B = 100, replace = TRUE, resample = c("sample", "gene", "total"), log = "y", ncomp = NULL, method = NULL, maxiter = 1000, svd.solver = c("svd", "fast.svd", "propack"), plot = TRUE)
bootMbpca(moa, mc.cores = 1, B = 100, replace = TRUE, resample = c("sample", "gene", "total"), log = "y", ncomp = NULL, method = NULL, maxiter = 1000, svd.solver = c("svd", "fast.svd", "propack"), plot = TRUE)
moa |
|
mc.cores |
Integer; number of cores used in bootstrap. This value is passed to function |
B |
Integer; number of bootstrap |
replace |
Logical; sampling with or without replacement |
resample |
Could be one of "sample", "gene" or "total". "sample" and "gene" means sample-wise and variable-wise resampling, repectively. "total" means total resampling. |
log |
Could be "x", "y" or "xy" for plot log axis |
ncomp |
Passed to function |
method |
Passed to function |
maxiter |
Passed to function |
svd.solver |
Passed to function |
plot |
Logical; whether the result should be plotted. |
Bootstrap method were used to determine the components that are presenting significant concordant structure between datasets.
It returns a matrix, columns are eigenvalues for different components. Each rows is a bootstramp sample.
Chen Meng
# see examples in \code{\link{mbpca}}
# see examples in \code{\link{mbpca}}
bootMbpca
.
An internal function called by bootMbpca
.
bootMbpcaK(data, replace, B = 100, mc.cores = 1, resample = c("sample", "total", "gene"), ncomp, method, k, center = FALSE, scale = FALSE, option = "uniform", maxiter = 1000, svd.solver = c("svd", "fast.svd", "propack"))
bootMbpcaK(data, replace, B = 100, mc.cores = 1, resample = c("sample", "total", "gene"), ncomp, method, k, center = FALSE, scale = FALSE, option = "uniform", maxiter = 1000, svd.solver = c("svd", "fast.svd", "propack"))
data |
A |
replace |
A logical variable to indicate sampling with or without replacement |
B |
Integer; number of bootstrap. |
mc.cores |
Integer; number of cores used in bootstrap. This value is passed to function mclapply |
resample |
Could be one of "sample", "gene" or "total". "sample" and "gene" means sample-wise and variable-wise resampling, repectively. "total" means total resampling. |
ncomp |
passed to |
method |
passed to |
k |
passed to |
center |
passed to |
scale |
passed to |
option |
passed to |
maxiter |
passed to |
svd.solver |
passed to |
A matrix of mbpca eigenvalues resulted from bootstrap samples
Chen Meng
Using bootstrap method to extract the components representing significant concordance structures between datasets from "moa" (returned by function "moa").
bootMoa(moa, proc.row="center_ssq1", w.data="inertia", w.row=NULL, statis=FALSE, mc.cores=1, B = 100, replace=TRUE, resample=c("sample", "gene", "total"), plot=TRUE, log="y", tol = 1e-7)
bootMoa(moa, proc.row="center_ssq1", w.data="inertia", w.row=NULL, statis=FALSE, mc.cores=1, B = 100, replace=TRUE, resample=c("sample", "gene", "total"), plot=TRUE, log="y", tol = 1e-7)
moa |
|
proc.row |
Preprocessing of rows of datasets, should be one of
|
w.data |
The weights of each separate dataset, should be one of
or |
w.row |
If it is not null, it should be a list of positive numerical vectors, the length of which should be the same with the number of rows of each dataset to indicated the weight of rows of datasets. |
statis |
A logical indicates whether STATIS method should be used. See details. |
mc.cores |
Integer; number of cores used in bootstrap. This value is passed to function |
B |
Integer; number of bootstrap |
replace |
Logical; sampling with or without replacement |
resample |
Could be one of "sample", "gene" or "total". "sample" and "gene" means sample-wise and variable-wise resampling, repectively. "total" means total resampling. |
plot |
Logical; whether the result should be plotted. |
log |
Could be "x", "y" or "xy" for plot log axis. |
tol |
The minimum eigenvalues shown in the plot. |
set plot=TRUE to help selecting significant components.
A matrix where columns are components and rows are variance of PCs from bootstrap samples.
Chen Meng
Herve Abdi, Lynne J. Williams, Domininique Valentin and Mohammed Bennani-Dosse. STATIS and DISTATIS: optimum multitable principal component analysis and three way metric multidimensional scaling. WIREs Comput Stat 2012. Volume 4, Issue 2, pages 124-167 Herve Abdi, Lynne J. Williams, Domininique Valentin. Multiple factor analysis: principal component analysis for multitable and multiblock data sets. WIREs Comput Stat 2013
moa
, sup.moa
, mogsa
. More about plot see moa-class
.
# see function moa
# see function moa
boxplot to show the variables (e.g. gene expression) of a gene set across all samples.
box.gs.feature(x, gs, moa = NULL, col = 1, layout = NULL, plot = TRUE, obs.order = NULL, ...)
box.gs.feature(x, gs, moa = NULL, col = 1, layout = NULL, plot = TRUE, obs.order = NULL, ...)
x |
An object of calss |
gs |
Gene set want to be explored |
moa |
An obejct of class |
col |
The coler code for samples |
layout |
The layout control, see examples. |
plot |
A logical indicates whether the result should be ploted. If FALSE, a list of expression matrix of the gene set genes is returned. Otherwise nothing returned. |
obs.order |
Can be used to reorder the martrix, could be used when clustering result is available. |
... |
The arguments passed to |
This is a convenient function used to explore the expression of a set of features/genes
Do not return anything (plot=TRUE) or return a list of matrix (plot=FALSE) depends on plot arugment.
Chen meng
# library(mogsa) # loading gene expression data and supplementary data data(NCI60_4array_supdata) data(NCI60_4arrays) mgsa <- mogsa(x = NCI60_4arrays, sup=NCI60_4array_supdata, nf=9, proc.row = "center_ssq1", w.data = "inertia", statis = TRUE) allgs <- colnames(NCI60_4array_supdata[[1]]) colcode <- as.factor(sapply(strsplit(colnames(NCI60_4arrays$agilent), split="\\."), "[", 1)) a <- box.gs.feature(x=mgsa, gs=allgs[5], type=3, col=colcode, plot=FALSE) box.gs.feature(x=mgsa, gs=allgs[5], type=3, col=colcode, plot=TRUE, layout=matrix(1:4, 2, 2))
# library(mogsa) # loading gene expression data and supplementary data data(NCI60_4array_supdata) data(NCI60_4arrays) mgsa <- mogsa(x = NCI60_4arrays, sup=NCI60_4array_supdata, nf=9, proc.row = "center_ssq1", w.data = "inertia", statis = TRUE) allgs <- colnames(NCI60_4array_supdata[[1]]) colcode <- as.factor(sapply(strsplit(colnames(NCI60_4arrays$agilent), split="\\."), "[", 1)) a <- box.gs.feature(x=mgsa, gs=allgs[5], type=3, col=colcode, plot=FALSE) box.gs.feature(x=mgsa, gs=allgs[5], type=3, col=colcode, plot=TRUE, layout=matrix(1:4, 2, 2))
mgsa
into one.
This function could only be used to combine two "mgsa" objects at present; using "Reduce" function to combine more.
combine(x, y, ...)
combine(x, y, ...)
x |
one mgsa object |
y |
another mgsa object |
... |
ignored. Only two mgsa objects could be combined, using "Reduce" to combine more than two sets. |
A combined object of class mgsa
will be returned.
signature(x = "mgsa", y = "mgsa")
To combine two objects of mgsa
.
This function could only be used to combine two "mgsa" objects; using "Reduce" function to combine more.
# library(mogsa) # loading gene expression data and supplementary data data(NCI60_4array_supdata) data(NCI60_4arrays) # split gene set annotation into two sets. sup1 <- lapply(NCI60_4array_supdata, function(x) x[, 1:10]) sup2 <- lapply(NCI60_4array_supdata, function(x) x[, -(1:10)]) # project two sets of annotation mgsa1 <- mogsa(x = NCI60_4arrays, sup=sup1, nf=9, proc.row = "center_ssq1", w.data = "inertia", statis = TRUE) mgsa2 <- mogsa(x = NCI60_4arrays, sup=sup2, nf=9, proc.row = "center_ssq1", w.data = "inertia", statis = TRUE) # combine two indenpendent mgsa sets mgsa_comb <- combine(mgsa1, mgsa2) dim(getmgsa(mgsa1, "score")) dim(getmgsa(mgsa2, "score")) dim(getmgsa(mgsa_comb, "score"))
# library(mogsa) # loading gene expression data and supplementary data data(NCI60_4array_supdata) data(NCI60_4arrays) # split gene set annotation into two sets. sup1 <- lapply(NCI60_4array_supdata, function(x) x[, 1:10]) sup2 <- lapply(NCI60_4array_supdata, function(x) x[, -(1:10)]) # project two sets of annotation mgsa1 <- mogsa(x = NCI60_4arrays, sup=sup1, nf=9, proc.row = "center_ssq1", w.data = "inertia", statis = TRUE) mgsa2 <- mogsa(x = NCI60_4arrays, sup=sup2, nf=9, proc.row = "center_ssq1", w.data = "inertia", statis = TRUE) # combine two indenpendent mgsa sets mgsa_comb <- combine(mgsa1, mgsa2) dim(getmgsa(mgsa1, "score")) dim(getmgsa(mgsa2, "score")) dim(getmgsa(mgsa_comb, "score"))
Data-wise or PC-wise decomposition of gene set scores (GSS) across all observations. The predefined group/cluster information should be given so that the mean decomposed GSSs for each group are returned and plotted.
decompose.gs.group(x, gs, group, decomp = "data", nf = 2, x.legend = "bottomleft", y.legend = NULL, plot = TRUE, main = NULL, ...)
decompose.gs.group(x, gs, group, decomp = "data", nf = 2, x.legend = "bottomleft", y.legend = NULL, plot = TRUE, main = NULL, ...)
x |
An object of class |
gs |
The gene set want to exam. |
group |
An vector or factor to indicate the group of observations, such as clusters. See examples. |
decomp |
A charater string either "data" or "pc" to indicate how the gene set scores should be decomposed (with respect to data or PC. |
nf |
The number of axes/PCs to be calculated and plotted. |
x.legend |
Used to control the position of legends. |
y.legend |
Used to control the position of legends. |
plot |
A logical indicates if a plot should be drawn. |
main |
The main title of plot. |
... |
Other arguments passed to |
This function could be used when the number of observation is large and there are cluster/group information is available. In this case, the means of decomposed gene set scores over each group is calculated. The vertical bar on the end of each bar indicates the 95% confident interval of the means.
Return nothing or a matrix depends on how argument plot
is set.
Chen Meng
TBA
See Also decompose.gs.ind
# library(mogsa) # loading gene expression data and supplementary data data(NCI60_4array_supdata) data(NCI60_4arrays) # using a list of data.frame as input mgsa <- mogsa(x = NCI60_4arrays, sup=NCI60_4array_supdata, nf=9, proc.row = "center_ssq1", w.data = "inertia", statis = TRUE) colcode <- as.factor(sapply(strsplit(colnames(NCI60_4arrays$agilent), split="\\."), "[", 1)) decompose.gs.group(x = mgsa, gs = 2, group = colcode, decomp = "data", plot = TRUE) decompose.gs.group(x = mgsa, gs = 2, group = colcode, decomp = "pc", nf = 3, plot = TRUE)
# library(mogsa) # loading gene expression data and supplementary data data(NCI60_4array_supdata) data(NCI60_4arrays) # using a list of data.frame as input mgsa <- mogsa(x = NCI60_4arrays, sup=NCI60_4array_supdata, nf=9, proc.row = "center_ssq1", w.data = "inertia", statis = TRUE) colcode <- as.factor(sapply(strsplit(colnames(NCI60_4arrays$agilent), split="\\."), "[", 1)) decompose.gs.group(x = mgsa, gs = 2, group = colcode, decomp = "data", plot = TRUE) decompose.gs.group(x = mgsa, gs = 2, group = colcode, decomp = "pc", nf = 3, plot = TRUE)
Barplot of decomposed gene set scores, either with respect to datasets or axes.
decompose.gs.ind(x, gs, obs, type = 3, nf = 2, plot=TRUE, col.data = NULL, col.pc = NULL, legend = TRUE)
decompose.gs.ind(x, gs, obs, type = 3, nf = 2, plot=TRUE, col.data = NULL, col.pc = NULL, legend = TRUE)
x |
An object of class |
gs |
The gene set want to exam. |
obs |
The observations want to exam. |
type |
Which type of plot. type=1 - the data-pc mode; type=2 - the pc-data mode; type=3 - both. See detail. |
nf |
The number of axes/PCs to be calculated and plotted. |
plot |
A logical indicates if a plot should be drawn |
col.data |
The bar color of datasets |
col.pc |
The bar color of PCs |
legend |
A logical if legend should be shown |
type=1 (the data-pc mode), the axes/PCs are represented as the narrow bars with different colors and the background wide bars behind narrow bars are gene set scores for datasets, which is calculated from the sum of all underlying individual axes/PC scores. When type=2 (the pc-data mode) the interpreation of narrow and wide bars are in the other way around. If type=3, both are shown.
This function could only be used to check the decomposition of gene set scores
of a single observation. So the function is not efficent when the number of
observation is large. Another function decompose.gs.group
, could be
used in this case, particularly when the cluster information of the observation
panel is available.
Return nothing or a matrix depends on how argument plot
is set.
Chen Meng
TBA
See Also as decompose.gs.group
# library(mogsa) # loading gene expression data and supplementary data data(NCI60_4array_supdata) data(NCI60_4arrays) mgsa <- mogsa(x = NCI60_4arrays, sup=NCI60_4array_supdata, nf=9, proc.row = "center_ssq1", w.data = "inertia", statis = TRUE) allgs <- colnames(NCI60_4array_supdata[[1]]) # plot decompose.gs.ind(x=mgsa, gs=allgs[5], obs="BR.MDA_MB_231", type=2, nf=5) # or decompose.gs.ind(x=getmgsa(mgsa, "sup"), gs=allgs[5], obs="BR.MDA_MB_231", type=3, nf=5)
# library(mogsa) # loading gene expression data and supplementary data data(NCI60_4array_supdata) data(NCI60_4arrays) mgsa <- mogsa(x = NCI60_4arrays, sup=NCI60_4array_supdata, nf=9, proc.row = "center_ssq1", w.data = "inertia", statis = TRUE) allgs <- colnames(NCI60_4array_supdata[[1]]) # plot decompose.gs.ind(x=mgsa, gs=allgs[5], obs="BR.MDA_MB_231", type=2, nf=5) # or decompose.gs.ind(x=getmgsa(mgsa, "sup"), gs=allgs[5], obs="BR.MDA_MB_231", type=3, nf=5)
mbpca
An internal function called by mbpca
.
deflat(x, t, tb, pb, method = "globalScore")
deflat(x, t, tb, pb, method = "globalScore")
x |
A |
t |
The global scores returned by |
tb |
The block scores returned by |
pb |
The block loadings returned by |
method |
A charater to specify the deflation strateg, could be one of c("globalScore", "blockLoading", "blockScore"). |
A list
of deflated matrix
Chen Meng
moa-class
.
A convenient function to calculate the distance matrix from
an object of class moa-class
.
distMoa(x, nf = NA, tol = 1e-05, method = "euclidean", diag = FALSE, upper = FALSE, p = 2)
distMoa(x, nf = NA, tol = 1e-05, method = "euclidean", diag = FALSE, upper = FALSE, p = 2)
x |
An object of class |
nf |
Integer; the number of component used to calculate the distance. Default setting (NA) will keep all the axes. |
tol |
Numerical; the tolerance of component with low variance. |
method |
passed to function |
diag |
passed to function |
upper |
passed to function |
p |
passed to function |
An object of class dist
, see function "dist".
Chen Meng
# see examples in \code{\link{mbpca}} data("NCI60_4arrays") moa <- mbpca(NCI60_4arrays, ncomp = 10, k = "all", method = "globalScore", option = "lambda1", center=TRUE, scale=FALSE) dst <- distMoa(moa)
# see examples in \code{\link{mbpca}} data("NCI60_4arrays") moa <- mbpca(NCI60_4arrays, ncomp = 10, k = "all", method = "globalScore", option = "lambda1", center=TRUE, scale=FALSE) dst <- distMoa(moa)
get values/slot in an object of class "mgsa". The "mgsa" consists of two S4 class objects,
moa-class
and moa.sup-class
. This function could extract
values in these two components directly.
getmgsa(mgsa, value)
getmgsa(mgsa, value)
mgsa |
An object of class |
value |
The name of the value want to extract from "mgsa". See detail for options. |
if value in c("call", "moa", "sup"), the function equal function slot
.
if value in c("eig", "tau", "partial.eig", "eig.vec", "loading",
"fac.scr", "partial.fs", "ctr.obs", "ctr.var", "ctr.tab", "RV"), the function extact
corresponding value from moa-class
.
if value in c("data", "coord.sep", "coord.comb", "score",
"score.data", "score.pc", "score.sep", "p.val"), the function extract value from
moa.sup-class
.
The function return the selected value in "mgsa".
Chen Meng
TBA
# library(mogsa) # loading gene expression data and supplementary data data(NCI60_4array_supdata) data(NCI60_4arrays) mgsa <- mogsa(x = NCI60_4arrays, sup=NCI60_4array_supdata, nf=9, proc.row = "center_ssq1", w.data = "inertia", statis = TRUE) part.eig <- getmgsa(mgsa, "partial.eig") barplot(as.matrix(part.eig))
# library(mogsa) # loading gene expression data and supplementary data data(NCI60_4array_supdata) data(NCI60_4arrays) mgsa <- mogsa(x = NCI60_4arrays, sup=NCI60_4array_supdata, nf=9, proc.row = "center_ssq1", w.data = "inertia", statis = TRUE) part.eig <- getmgsa(mgsa, "partial.eig") barplot(as.matrix(part.eig))
Calculate the gene influential score of individual feature to the overall variance of GS score. Using a leave-one-out procedure (See detail).
GIS(x, geneSet, nf=NA, barcol=NA, topN=NA, plot=TRUE, Fvalue=FALSE, ff=NA, cor=FALSE)
GIS(x, geneSet, nf=NA, barcol=NA, topN=NA, plot=TRUE, Fvalue=FALSE, ff=NA, cor=FALSE)
x |
An object of class |
geneSet |
A charater string or number to indicated the gene sets under conserderation. |
nf |
The number of PCs used in the caluculation of gene set scores. The default is NA, which means using all the PCs in the mogsa. This should work for most of the cases. |
barcol |
The color of the bars, which is used to distinguish features/genes from different datasets, so its length should be the same as the number of data sets. |
topN |
An positive integer specify the number of top influencers that should to returned. |
plot |
A logical indicate if the result should be plotted. |
Fvalue |
A logical indicate if the GIS should be calculated in a supervised manner. |
ff |
The vector indicates the group of columns for calculating the F-ratio when Fvalue=TRUE. |
cor |
A logical indicates whether use correlation between reconstructed expression with GSS. This is faster than the standard GIS. |
The evaluation of the importance of a single feature is calculated in the supervised or unsupervised manner.
In the unsupervise manner, the value is calculated by:
log2(var(GS_-i)/var(GS))
where GS is the gene set score, and the GS_-i is a recalculate of gene set score without i'th feature. var() is the variance.
In the supervised manner, the value is caluclated as the F-ratio over a class vector:
log2(F(GS_-i)/F(GS))
Where F() is the calculation of F-ratio. The unsupervised GIS is encouraged since it works better for most of the cases in practice.
An object of class data.frame
contains three columns. The first column is the feature name,
the second columns is the gene influential score. The third columns indicates from where the
feature/gene is selected.
Chen Meng
TBA
see annotate.gs
# library(mogsa) # loading gene expression data and supplementary data data(NCI60_4array_supdata) data(NCI60_4arrays) mgsa <- mogsa(x = NCI60_4arrays, sup=NCI60_4array_supdata, nf=9, proc.row = "center_ssq1", w.data = "inertia", statis = TRUE) allgs <- colnames(NCI60_4array_supdata[[1]]) # unsupervised measurement GIS(mgsa, allgs[1], topN = 5) # supervised measurement tissueType <- as.factor(sapply(strsplit(colnames(NCI60_4arrays$agilent), split="\\."), "[", 1)) GIS(mgsa, allgs[1], topN = 5, Fvalue = TRUE, ff = tissueType) # more PCs to calcualte GIS(mgsa, allgs[1], nf = 20, topN = 5, Fvalue = TRUE, ff = tissueType)
# library(mogsa) # loading gene expression data and supplementary data data(NCI60_4array_supdata) data(NCI60_4arrays) mgsa <- mogsa(x = NCI60_4arrays, sup=NCI60_4array_supdata, nf=9, proc.row = "center_ssq1", w.data = "inertia", statis = TRUE) allgs <- colnames(NCI60_4array_supdata[[1]]) # unsupervised measurement GIS(mgsa, allgs[1], topN = 5) # supervised measurement tissueType <- as.factor(sapply(strsplit(colnames(NCI60_4arrays$agilent), split="\\."), "[", 1)) GIS(mgsa, allgs[1], topN = 5, Fvalue = TRUE, ff = tissueType) # more PCs to calcualte GIS(mgsa, allgs[1], nf = 20, topN = 5, Fvalue = TRUE, ff = tissueType)
the power of a matrix
matpower(x, n, nf = min(dim(x)), tol = 1e-07)
matpower(x, n, nf = min(dim(x)), tol = 1e-07)
x |
a numerical matrix object that the power of which should be calculated |
n |
The matrix to the power of |
nf |
The number of axes kept in the calculation of SVD and reconstruction |
tol |
The tolerance of the axis, singular vectors with singular value lower than tol will be ignored in the reconstruction. |
The power of a matrix is calculated in two steps: decompostion step: x=UDV' and the reconstruction step: x^n=U*D^n*V' In the reconstruction, the singular vectors with a singular value more than tol are kept.
A matrix x^n
Called by the wsvd function.
Chen Meng
See Also wsvd
set.seed(56) m <- matrix(rnorm(15), 5, 3) s <- matpower(m, 2) s <- matpower(m, -2)
set.seed(56) m <- matrix(rnorm(15), 5, 3) s <- matpower(m, 2) s <- matpower(m, -2)
Three approaches are supplied in this function, consensus PCA (CPCA), generalized CCA (GCCA) and multiple co-inertia analsyis (MCIA).
mbpca(x, ncomp, method, k = "all", center = TRUE, scale = FALSE, option = "uniform", maxiter = 1000, moa = TRUE, verbose = TRUE, svd.solver = c("svd", "fast.svd", "propack"), k.obs = "all", w = NA, w.obs = NA, unit.p = FALSE, unit.obs = FALSE, pos = FALSE)
mbpca(x, ncomp, method, k = "all", center = TRUE, scale = FALSE, option = "uniform", maxiter = 1000, moa = TRUE, verbose = TRUE, svd.solver = c("svd", "fast.svd", "propack"), k.obs = "all", w = NA, w.obs = NA, unit.p = FALSE, unit.obs = FALSE, pos = FALSE)
x |
A |
ncomp |
An integer; the number of components to calculate. To calculate more components requires longer computational time. |
method |
A character string could be one of c("globalScore", "blockScore", "blockLoading"). The "globalScore" approach equals consensus PCA; The "blockScore" approach equals generalized canonical correlation analysis (GCCA); The "blockLoading" approach equals multiple co-inertia anaysis (MCIA); |
k |
The absolute number (if k >= 1) or the proportion (if 0<k<1) of non-zero coefficients for the variable loading vectors. It could be a single value or a vector has the same length as x so the sparsity of individual matrix could be different. |
center |
Logical; if the variables should be centered |
scale |
Logical; if the variables should be scaled |
option |
A charater string could be one of c("lambda1", "inertia", "uniform") to indicate how the different matrices should be normalized. If "lambda1", the matrix is divided by its the first singular value, if "inertia", the matrix is divided by its total inertia (sum of square), if "uniform", none of them would be done. |
maxiter |
Integer; Maximum number of iterations in the algorithm |
moa |
Logical; whether the output should be converted to an object of class |
verbose |
Logical; whether the process (# of PC) should be printed |
svd.solver |
A charater string could be one of c("svd", "fast.svd", "propack"). The default "fast.svd " has a good compromise between the robustness and speed. "propack" is the fastest but may failed to converge in practice. |
k.obs |
The absolute number (if k >= 1) or the proportion (if 0<k<1) of non-zero coefficients for the observations. Sparse factor scores for observation are used by sparse concordance analysis. (New arguments from v1.12) |
w |
The weight of variables. It could be given in the following format: 1) NA or a numeric value: all variables have the same weight; 2) A vector of numeric values, the vecter has the same length as x: variables in each block shares the same weight; 3) A list of vector, each vector in the list has the same length as the number of row in the corresponding table/block, then each variable use a different weight. See detail how to select weight. (New arguments from v1.12) |
w.obs |
The weight of observations, see w. (New arguments from v1.12) |
unit.p |
A logical value, whether the loading vectors (for variables) for each table/block should be unit length. |
unit.obs |
A logical value, whether the score vectors (for observations) for each table/block should be unit length. (New arguments from v1.12) |
pos |
A logical value, whether only retain non-negative coefficients in loading and score vectors. (New arguments from v1.12) |
Select of weight for variables: In omics data, it is often true that low intensity variables suffers more noise. Therefore, The variables with higher intensities are more reliable. If we consider this, we can use the total sum intensity of a variable (or a tranform of it) as weight, the model would prefer to select high intensity variables.
An object of class moa-class
(if moa=TRUE
) or
an list
object contains the following elements:
tb
- the block scores
pb
- the block loadings
t
- the global scores
w
- the wegihts of block scores to construct the global scor
no note
Chen Meng
For clustering problem: Meng et al. 2015 moCluster: Identifying Joint Patterns Across Multiple Omics Data Sets. Journal of proteome research.
see moa
for non-iterative algorithms for multi-block PCA.
data("NCI60_4arrays") tumorType <- sapply(strsplit(colnames(NCI60_4arrays$agilent), split="\\."), "[", 1) colcode <- as.factor(tumorType) levels(colcode) <- c("red", "green", "blue", "cyan", "orange", "gray25", "brown", "gray75", "pink") colcode <- as.character(colcode) moa <- mbpca(NCI60_4arrays, ncomp = 10, k = "all", method = "globalScore", option = "lambda1", center=TRUE, scale=FALSE) plot(moa, value="eig", type=2) r <- bootMbpca(moa, mc.cores = 1, B=6, replace = FALSE, resample = "sample") moas <- mbpca(NCI60_4arrays, ncomp = 3, k = 0.1, method = "globalScore", option = "lambda1", center=TRUE, scale=FALSE) scr <- moaScore(moa) scrs <- moaScore(moas) diag(cor(scr[, 1:3], scrs)) layout(matrix(1:2, 1, 2)) plot(scrs[, 1:2], col=colcode, pch=20) legend("topright", legend = unique(tumorType), col=unique(colcode), pch=20) plot(scrs[, 2:3], col=colcode, pch=20) gap <- moGap(moas, K.max = 12, cluster = "hcl") gap$nClust hcl <- hclust(dist(scrs)) cls <- cutree(hcl, k=4) clsColor <- as.factor(cls) levels(clsColor) <- c("red", "blue", "orange", "pink") clsColor <- as.character((clsColor)) heatmap(t(scrs[hcl$order, ]), ColSideColors = colcode[hcl$order], Rowv = NA, Colv=NA) heatmap(t(scrs[hcl$order, ]), ColSideColors = clsColor[hcl$order], Rowv = NA, Colv=NA) genes <- moaCoef(moas) genes$nonZeroCoef$agilent.V1.neg
data("NCI60_4arrays") tumorType <- sapply(strsplit(colnames(NCI60_4arrays$agilent), split="\\."), "[", 1) colcode <- as.factor(tumorType) levels(colcode) <- c("red", "green", "blue", "cyan", "orange", "gray25", "brown", "gray75", "pink") colcode <- as.character(colcode) moa <- mbpca(NCI60_4arrays, ncomp = 10, k = "all", method = "globalScore", option = "lambda1", center=TRUE, scale=FALSE) plot(moa, value="eig", type=2) r <- bootMbpca(moa, mc.cores = 1, B=6, replace = FALSE, resample = "sample") moas <- mbpca(NCI60_4arrays, ncomp = 3, k = 0.1, method = "globalScore", option = "lambda1", center=TRUE, scale=FALSE) scr <- moaScore(moa) scrs <- moaScore(moas) diag(cor(scr[, 1:3], scrs)) layout(matrix(1:2, 1, 2)) plot(scrs[, 1:2], col=colcode, pch=20) legend("topright", legend = unique(tumorType), col=unique(colcode), pch=20) plot(scrs[, 2:3], col=colcode, pch=20) gap <- moGap(moas, K.max = 12, cluster = "hcl") gap$nClust hcl <- hclust(dist(scrs)) cls <- cutree(hcl, k=4) clsColor <- as.factor(cls) levels(clsColor) <- c("red", "blue", "orange", "pink") clsColor <- as.character((clsColor)) heatmap(t(scrs[hcl$order, ]), ColSideColors = colcode[hcl$order], Rowv = NA, Colv=NA) heatmap(t(scrs[hcl$order, ]), ColSideColors = clsColor[hcl$order], Rowv = NA, Colv=NA) genes <- moaCoef(moas) genes$nonZeroCoef$agilent.V1.neg
"mgsa"
mgsa class here.
Objects can be created by calls of the form new("mgsa", ...)
.
call
:call
moa
:Object of class moa
sup
:Object of class moa.sup
signature(x = "mgsa", y = "mgsa")
To combine two objects of class "mgsa" This function could only be used to combine two "mgsa" objects, using "Reduce" function to combine more.
signature(x = "moa", y = "missing")
:
show the "mgsa" result.
Chen Meng
showClass("mgsa") # library(mogsa) # loading gene expression data and supplementary data data(NCI60_4array_supdata) data(NCI60_4arrays) # split gene set annotation into two sets. sup1 <- lapply(NCI60_4array_supdata, function(x) x[, 1:10]) sup2 <- lapply(NCI60_4array_supdata, function(x) x[, -(1:10)]) # project two sets of annotation mgsa1 <- mogsa(x = NCI60_4arrays, sup=sup1, nf=9, proc.row = "center_ssq1", w.data = "inertia", statis = TRUE) mgsa2 <- mogsa(x = NCI60_4arrays, sup=sup2, nf=9, proc.row = "center_ssq1", w.data = "inertia", statis = TRUE) # combine two indenpendent mgsa sets mgsa_comb <- combine(mgsa1, mgsa2) dim(getmgsa(mgsa1, "fac.scr")) dim(getmgsa(mgsa2, "fac.scr")) dim(getmgsa(mgsa_comb, "fac.scr"))
showClass("mgsa") # library(mogsa) # loading gene expression data and supplementary data data(NCI60_4array_supdata) data(NCI60_4arrays) # split gene set annotation into two sets. sup1 <- lapply(NCI60_4array_supdata, function(x) x[, 1:10]) sup2 <- lapply(NCI60_4array_supdata, function(x) x[, -(1:10)]) # project two sets of annotation mgsa1 <- mogsa(x = NCI60_4arrays, sup=sup1, nf=9, proc.row = "center_ssq1", w.data = "inertia", statis = TRUE) mgsa2 <- mogsa(x = NCI60_4arrays, sup=sup2, nf=9, proc.row = "center_ssq1", w.data = "inertia", statis = TRUE) # combine two indenpendent mgsa sets mgsa_comb <- combine(mgsa1, mgsa2) dim(getmgsa(mgsa1, "fac.scr")) dim(getmgsa(mgsa2, "fac.scr")) dim(getmgsa(mgsa_comb, "fac.scr"))
Analysis multiple omics data using MFA or STATIS. The input multiple tables are in a form that columns are samples and rows are variables/features.
moa(data, proc.row="center_ssq1", w.data="inertia", w.row=NULL, statis=FALSE, moa=TRUE)
moa(data, proc.row="center_ssq1", w.data="inertia", w.row=NULL, statis=FALSE, moa=TRUE)
data |
A list of |
proc.row |
Preprocessing of rows of datasets, should be one of
|
w.data |
The weights of each separate dataset, should be one of
or |
w.row |
If it is not null, it should be a list of positive numerical vectors, the length of which should be the same with the number of rows of each dataset to indicated the weight of rows of datasets. |
statis |
A logical indicates whether STATIS method should be used. See details. |
moa |
Logical; whether the output should be converted to an object of class |
Different methods employs different precessing of row and datasets. For multipple factorial analysis (MFA), the rows of each dataset are first centered and scaled, then each dataset is weighted by the reverse of its first eigenvalue (proc.row=center_ssq1, w.data="lambda1"). This algorithm does not have a well defined criterion to be optimized (see reference).
If statis=TRUE, the statis algorithm will be used, that is, each dataset will be further weighted so that datasets closer to the overall structure will receive a higher weight.
An object of class moa-class
.
Chen Meng
Herve Abdi, Lynne J. Williams, Domininique Valentin and Mohammed Bennani-Dosse. STATIS and DISTATIS: optimum multitable principal component analysis and three way metric multidimensional scaling. WIREs Comput Stat 2012. Volume 4, Issue 2, pages 124-167 Herve Abdi, Lynne J. Williams, Domininique Valentin. Multiple factor analysis: principal component analysis for multitable and multiblock data sets. WIREs Comput Stat 2013
sup.moa
, mogsa
. More about plot see moa-class
.
# library(mogsa) # loading data data(NCI60_4arrays) # run analysis ana <- moa(NCI60_4arrays, proc.row = "center_ssq1", w.data = "inertia", statis = TRUE) # plot # plot eigen value plot(ana, value = "eig", type = 2) # plot the normalized (percentage) eigen value plot(ana, value = "tau", type = 2) # ploting the observations colcode <- as.factor(sapply(strsplit(colnames(NCI60_4arrays$agilent), split="\\."), "[", 1)) plot(ana, type = 1, value = "obs", col=colcode) plot(ana, type = 2, value = "obs", col=colcode, data.pch=1:4) # plot variables/features in each data sets plot(ana, value = "var", layout=matrix(1:4, 2, 2)) # plot the RV coefficients for the data sets plot(ana, value = "RV") # to extract the components representing significant concordance structures between datasets bt <- bootMoa(moa = ana, proc.row = "center_ssq1", w.data = "inertia", statis = TRUE, B = 20)
# library(mogsa) # loading data data(NCI60_4arrays) # run analysis ana <- moa(NCI60_4arrays, proc.row = "center_ssq1", w.data = "inertia", statis = TRUE) # plot # plot eigen value plot(ana, value = "eig", type = 2) # plot the normalized (percentage) eigen value plot(ana, value = "tau", type = 2) # ploting the observations colcode <- as.factor(sapply(strsplit(colnames(NCI60_4arrays$agilent), split="\\."), "[", 1)) plot(ana, type = 1, value = "obs", col=colcode) plot(ana, type = 2, value = "obs", col=colcode, data.pch=1:4) # plot variables/features in each data sets plot(ana, value = "var", layout=matrix(1:4, 2, 2)) # plot the RV coefficients for the data sets plot(ana, value = "RV") # to extract the components representing significant concordance structures between datasets bt <- bootMoa(moa = ana, proc.row = "center_ssq1", w.data = "inertia", statis = TRUE, B = 20)
"moa"
moa class object
Objects can be created by calls of the form new("moa", ...)
.
eig
:eigen values
tau
:The percentage of explained variance by each datasets sparately.
partial.eig
:matrix, rows indicate the partial eigenvalues from each data.
eig.vec
:a matrix, eigenvectors.
loading
:the coordinate of variables/features.
fac.scr
:factor score of observations.
partial.fs
:partial factor score.
ctr.obs
:contribution of each observation to the total factor score.
ctr.var
:contribution of each variables to the total variance.
ctr.tab
:contribution of each data to the total variance.
RV
:pairwise RV coefficients
w.row
:weight of rows
w.data
:weight of datasets
data
:the original input data
tab.dim
:the dimension of each input data
call
:call
signature(x = "moa", y = "missing")
:
Argument "value" sould be one of "eig", "tau", "obs", "var" and "RV"
if value = "eig", the eigenvalue would be plotted as scree plot. The following arguments could be set:
type=1 - The type of plot to show eigenvalues. (type=1: the eigenvalue are plotted; type=2: partial eigenvalue shown as concatenated bars; type=3: partial eigenvalue shown as bars side by side; type=4: matplot view of eigenvales, lty need to be set; type=5; the two dimensional plot of partial eigenvalues, axes and pch need to be set in this case.)
axes=NULL - The axes selected to plot
n=NULL - Top n eigenvalues to be drawn
tol=1e-5 - The tolerance of eigenvalue, eigenvalues lower than this value will not be shown.
legend=NULL - legend to put, a character string as calling legend function
col=NULL - The color of partial eigenvalues from each data set
lty=1 - The line type used in the matplot, used when type =4
pch=NULL - the pch to draw 2D partial eigen plot, when type = 5 used
lg.x="topright" - The position of legend
lg.y=NULL - Poistion argument passed to function "legend"
... - other arguemnts passed to functions
if value = "tau", the same with eig, but in the eigenvalues are scaled to 1
if value = "obs", the observation space will be shown, the following argument could be set:
axes=1:2 - Which axes should be draw
type=1 - Which type, see below (for type=1: the center points draw; type=2: the separate factor scores linked by lines; ... will be passed to function "points")
data.pch=20 - the pch of dataset, if type=1, the first one is used
col=1 - the color of observations, recycled used by data.frame
label=FALSE - A logical indicates if labels should be shown
lg.x="topright" - Position of legend
lg.y=NULL - Position of legend
xlim=NULL - The x limit
ylim=NULL - The y limit
label.cex=1 - the cex of text
...
var - the separate gene view, layout can be specified
RV - the heatmap of RV coefficients
signature(x = "moa", y = "missing")
:
show "moa" object
Chen Meng
Herve Abdi, Lynne J. Williams, Domininique Valentin and Mohammed Bennani-Dosse. STATIS and DISTATIS: optimum multitable principal component analysis and three way metric multidimensional scaling. WIREs Comput Stat 2012. Volume 4, Issue 2, pages 124-167
Herve Abdi, Lynne J. Williams, Domininique Valentin. Multiple factor analysis: principal component analysis for multitable and multiblock data sets. WIREs Comput Stat 2013
showClass("moa") # load("R/mogsa/data/NCI60_4arrays.rda") data(NCI60_4arrays) ana <- moa(NCI60_4arrays, proc.row = "center_ssq1", w.data = "inertia", statis = TRUE) plot(ana, value="eig") plot(ana, value="tau", type=2)
showClass("moa") # load("R/mogsa/data/NCI60_4arrays.rda") data(NCI60_4arrays) ana <- moa(NCI60_4arrays, proc.row = "center_ssq1", w.data = "inertia", statis = TRUE) plot(ana, value="eig") plot(ana, value="tau", type=2)
"moa.sup"
moa.sup class desc.
Objects can be created by calls of the form new("moa.sup", ...)
.
sup
:Object of class "list"
, the matrix of supplementary data.
coord.sep
:The projection of geneset infromation on each separate data.
coord.comb
:The projection of geneset infromation on total dataset.
score
:the gene set-sample pathway score
score.data
:the gene set-sample pathway score, data separate
score.pc
:the gene set-sample pathway score, PC separate
score.sep
:the gene set-sample pathway score, separate.
p.val
:the p value matrix have the same dimension with score matrix.
p.val.corrected
:the matrix of corrected p values.
There is no generic function for objects of "moa.sup", but have specific function, including: - decompose.gs.ind - box.gs.feature - plotGS - decompose.gs.group
Chen Meng
objects to See Also as decompose.gs.ind
, box.gs.feature
, plotGS
, decompose.gs.group
.
showClass("moa.sup") data(NCI60_4array_supdata) data(NCI60_4arrays) sapply(NCI60_4array_supdata, dim) ana <- moa(NCI60_4arrays, proc.row = "center_ssq1", w.data = "inertia", statis = TRUE) plot(ana, value="eig") smoa <- sup.moa(ana, sup=NCI60_4array_supdata, nf=5)
showClass("moa.sup") data(NCI60_4array_supdata) data(NCI60_4arrays) sapply(NCI60_4array_supdata, dim) ana <- moa(NCI60_4arrays, proc.row = "center_ssq1", w.data = "inertia", statis = TRUE) plot(ana, value="eig") smoa <- sup.moa(ana, sup=NCI60_4array_supdata, nf=5)
moa-class
.
Extract the loadings/coefficients from an object of class moa-class
.
moaCoef(moa)
moaCoef(moa)
moa |
An object of class |
It returns a list consist of two components:
coefMat
- the loading matrix
nonZeroCoef
- it is a list
of data.frame
to list the non-zero coefficient
variable in each of loading vectors and data sets. The element names are in a format as
"xxxx.yy.zzz"
xxxx - are the data names, tells the data set where a varirable is from
yy - the number of Axes, for example, "V1" indicate the variable has a non-zero coefficient in the first loading vector.
zzz - could be either "pos" (coefficient >0) or "neg" (coefficient < 0)
The data.frame
has two columns, the first column is the ID of a variable the second
column is the coefficient/loading.
Chen Meng
# see examples in \code{\link{mbpca}} data("NCI60_4arrays") moa <- mbpca(NCI60_4arrays, ncomp = 10, k = "all", method = "globalScore", option = "lambda1", center=TRUE, scale=FALSE) genes <- moaCoef(moa) scr <- moaScore(moa)
# see examples in \code{\link{mbpca}} data("NCI60_4arrays") moa <- mbpca(NCI60_4arrays, ncomp = 10, k = "all", method = "globalScore", option = "lambda1", center=TRUE, scale=FALSE) genes <- moaCoef(moa) scr <- moaScore(moa)
moa-class
.
Extract global scores from an object of class moa-class
.
moaScore(moa)
moaScore(moa)
moa |
An object of class |
A matrix of global score
Chen Meng
# see examples in \code{\link{mbpca}} data("NCI60_4arrays") moa <- mbpca(NCI60_4arrays, ncomp = 10, k = "all", method = "globalScore", option = "lambda1", center=TRUE, scale=FALSE) genes <- moaCoef(moa) scr <- moaScore(moa)
# see examples in \code{\link{mbpca}} data("NCI60_4arrays") moa <- mbpca(NCI60_4arrays, ncomp = 10, k = "all", method = "globalScore", option = "lambda1", center=TRUE, scale=FALSE) genes <- moaCoef(moa) scr <- moaScore(moa)
moa-class
.
Gap statitistic is a measurement of goodness of clustering result. This is a convenient function to calculate the gap statistic of clustering "moa".
moGap(x, K.max, B = 100, cluster = c("kmeans", "hclust"), plot = TRUE, dist.method = "euclidean", dist.diag = FALSE, dist.upper = FALSE, dist.p = 2, hcl.method = "complete", hcl.members = NULL, km.iter.max = 10, km.nstart = 10, km.algorithm = c("Hartigan-Wong", "Lloyd", "Forgy", "MacQueen"), km.trace = FALSE)
moGap(x, K.max, B = 100, cluster = c("kmeans", "hclust"), plot = TRUE, dist.method = "euclidean", dist.diag = FALSE, dist.upper = FALSE, dist.p = 2, hcl.method = "complete", hcl.members = NULL, km.iter.max = 10, km.nstart = 10, km.algorithm = c("Hartigan-Wong", "Lloyd", "Forgy", "MacQueen"), km.trace = FALSE)
x |
|
K.max |
The maximum number of clusters to consider, passed to |
B |
The number of bootstrap, passed to |
cluster |
A charater string could be either "kmeans" or "hclust" to specify the clustering algorithm. |
plot |
Logical; whether return the gap statistic plot. |
dist.method |
Distance meaurement, passed to function |
dist.diag |
Passed to function |
dist.upper |
Passed to function |
dist.p |
Passed to function |
hcl.method |
Hierarchical clustering method, passed to |
hcl.members |
Passed to |
km.iter.max |
Maximum number of iteration in kmeans, passed to |
km.nstart |
An integer to specify how many random sets should be chosen. passed to |
km.algorithm |
Kmeans algorithm, passed to |
km.trace |
See function |
It returns a list consists of five components:
"Tab", "n", "B", "FUNcluster" - see clusGap
"nClust" - the estimated number of clusters using different method, see maxSE
Chen Meng
Tibshirani, R., Walther, G. and Hastie, T. (2001). Estimating the number of data clusters via the Gap statistic. Journal of the Royal Statistical Society B, 63, 411-423.
Maechler, M., Rousseeuw, P., Struyf, A., Hubert, M., Hornik, K.(2015). cluster: Cluster Analysis Basics and Extensions. R package version 2.0.1.
Function "clusGap" in "cluster" package Function "dist", "hclust", "kmeans"
# see examples in \code{\link{mbpca}} data("NCI60_4arrays") moa <- mbpca(NCI60_4arrays, ncomp = 10, k = "all", method = "globalScore", option = "lambda1", center=TRUE, scale=FALSE) gap <- moGap(moa, K.max = 12, cluster = "hcl") genes <- moaCoef(moa) scr <- moaScore(moa) moa2 <- moa(NCI60_4arrays, proc.row="center_ssq1", w.data="inertia", w.row=NULL, statis=FALSE) gap2 <- moGap(moa, K.max = 12, cluster = "hcl")
# see examples in \code{\link{mbpca}} data("NCI60_4arrays") moa <- mbpca(NCI60_4arrays, ncomp = 10, k = "all", method = "globalScore", option = "lambda1", center=TRUE, scale=FALSE) gap <- moGap(moa, K.max = 12, cluster = "hcl") genes <- moaCoef(moa) scr <- moaScore(moa) moa2 <- moa(NCI60_4arrays, proc.row="center_ssq1", w.data="inertia", w.row=NULL, statis=FALSE) gap2 <- moGap(moa, K.max = 12, cluster = "hcl")
The main function called by users, omics data analysis and gene set annotation.
A wrapper function of moa
and sup.moa
.
mogsa(x, sup, nf=NULL, factors = NULL, proc.row=NULL, w.data=NULL, w.row=NULL, statis=FALSE, ks.stat=FALSE, ks.B = 1000, ks.cores = NULL, p.adjust.method = "fdr")
mogsa(x, sup, nf=NULL, factors = NULL, proc.row=NULL, w.data=NULL, w.row=NULL, statis=FALSE, ks.stat=FALSE, ks.B = 1000, ks.cores = NULL, p.adjust.method = "fdr")
x |
An object of class |
sup |
An object of class |
nf |
The number of principal components used to reconstruct, only used when x is a an object of |
factors |
The index of principal components used in the projection, used when non-consecutive PC to be included in the analysis. |
proc.row |
Preprocessing of rows. If x is a object of |
w.data |
Weights of datasets. If x is a object of |
w.row |
Weight of row. If x is a object of |
statis |
A logical indicates if statis algrithm should be used. If x is a object of |
ks.stat |
The logical indicates if the p-value should be calculated using K-S statistic (the method used in "ssgsea" in GSVA package).
Default is FALSE, which means using the z-score method. See |
ks.B |
An integer to indicate the number of bootstrapping samples to calculated the p-value of KS statistic. |
ks.cores |
An integer indicate the number of cores to be used in bootstrapping. It is passed to function |
p.adjust.method |
The method of p value adjustment, passed to |
A wrapper function of moa
and sup.moa
.
An object of class mgsa-class
.
This function will be changed to a generic function for "S4-style" programming.
Chen Meng
Preprint: Meng, C., Kuster, B., Peters, B., Culhane, AC., Moghaddas Gholami, A., moGSA: integrative single sample gene-set analysis of multiple omics data. doi: http://dx.doi.org/10.1101/046904 Haenzelmann, S., Castelo, R. and Guinney, J. GSVA: Gene set variation analysis for microarray and RNA-Seq data. BMC Bioinformatics, 14:7, 2013. Barbie, D.A. et al. Systematic RNA interference reveals that oncogenic KRAS-driven cancers require TBK1. Nature, 462(5):108-112, 2009.
# library(mogsa) # loading gene expression data and supplementary data data(NCI60_4array_supdata) data(NCI60_4arrays) # using a list of data.frame as input mgsa1 <- mogsa(x = NCI60_4arrays, sup=NCI60_4array_supdata, nf=9, proc.row = "center_ssq1", w.data = "inertia", statis = TRUE) mgsa1x <- mogsa(x = NCI60_4arrays, sup=NCI60_4array_supdata, factors = c(1,3,6), proc.row = "center_ssq1", w.data = "inertia", statis = TRUE) # using moa as input ana <- moa(NCI60_4arrays, proc.row = "center_ssq1", w.data = "inertia", statis = TRUE) smoa <- sup.moa(ana, sup=NCI60_4array_supdata, nf=3) mgsa2 <- mogsa(x = ana, sup=NCI60_4array_supdata, nf=9) mgsa3 <- mogsa(x = ana, sup=smoa)
# library(mogsa) # loading gene expression data and supplementary data data(NCI60_4array_supdata) data(NCI60_4arrays) # using a list of data.frame as input mgsa1 <- mogsa(x = NCI60_4arrays, sup=NCI60_4array_supdata, nf=9, proc.row = "center_ssq1", w.data = "inertia", statis = TRUE) mgsa1x <- mogsa(x = NCI60_4arrays, sup=NCI60_4array_supdata, factors = c(1,3,6), proc.row = "center_ssq1", w.data = "inertia", statis = TRUE) # using moa as input ana <- moa(NCI60_4arrays, proc.row = "center_ssq1", w.data = "inertia", statis = TRUE) smoa <- sup.moa(ana, sup=NCI60_4array_supdata, nf=3) mgsa2 <- mogsa(x = ana, sup=NCI60_4array_supdata, nf=9) mgsa3 <- mogsa(x = ana, sup=smoa)
mbpca
.
An internal function called by mbpca
. It returns the result comparable
with nipalsSoftK, but way faster since it uses the SVD algorithm. No sparse opertors
in this function.
msvd(x, svd.sol = svd)
msvd(x, svd.sol = svd)
x |
The input matrix, rows are observations, columns are variables |
svd.sol |
A function object to specify the preferred SVD solver, default is |
an list
object contains the following elements:
tb
- the block scores
pb
- the block loadings
t
- the global scores
w
- the wegihts of block scores to construct the global scor
Chen Meng
Supplmentary to NCI60_4arrays.
data(NCI60_4arrays)
data(NCI60_4arrays)
The format is: List of 4 matrix
\$agilent:matrix
containing 300 rows and 60 columns.
300 gene expression log ratio measurements of the NCI60 cell lines, by Agilent
platform.
\$hgu133:matrix
containing 298 rows and 60 columns.
298 gene expression log ratio measurements of the NCI60 cell lines, by H-GU133
platform.
\$hgu133p2:matrix
containing 268 rows and 60 columns.
268 gene expression log ratio measurements of the NCI60 cell lines, by H-GU133
plus 2.0 platform.
\$hgu95:matrix
containing 288 rows and 60 columns.
288 gene expression log ratio measurements of the NCI60 cell lines, by H-GU95
platform.
NCI60_4array_supdata
will be loaded in your working space.
The 60 human tumour cell lines are derived
from patients with leukaemia, melanoma, lung, colon, central
nervous system, ovarian, renal, breast and prostate cancers. The cell line
panel is widely used in anti-cancer drug screen. In this dataset,
a subset of microarray gene expression of the NCI 60 cell lines from
four different platforms are combined in a list, which could be used as
input to mcia
directly.
data(NCI60_4arrays)
data(NCI60_4arrays)
The format is: List of 4 data.frame
s
\$agilent:data.frame
containing 300 rows and 60 columns.
300 gene expression log ratio measurements of the NCI60 cell lines, by Agilent
platform.
\$hgu133:data.frame
containing 298 rows and 60 columns.
298 gene expression log ratio measurements of the NCI60 cell lines, by H-GU133
platform.
\$hgu133p2:data.frame
containing 268 rows and 60 columns.
268 gene expression log ratio measurements of the NCI60 cell lines, by H-GU133
plus 2.0 platform.
\$hgu95:data.frame
containing 288 rows and 60 columns.
288 gene expression log ratio measurements of the NCI60 cell lines, by H-GU95
platform.
NCI60_4arrays
will be loaded in your working space.
Cell Miner http://discover.nci.nih.gov/cellminer/
Reinhold WC, Sunshine M, Liu H, Varma S, Kohn KW, Morris J, Doroshow J, Pommier Y CellMiner: A Web-Based Suite of Genomic and Pharmacologic Tools to Explore Transcript and Drug Patterns in the NCI-60 Cell Line Set. Cancer Research. 2012 Jul, 15;72(14):3499-511
An internal function called by mbpca
.
nipalsSoftK(x, maxiter, k)
nipalsSoftK(x, maxiter, k)
x |
The input matrix, rows are observations, columns are variables |
maxiter |
# of maximum interation the algorithm can run |
k |
The number (>=1) or proportion (<1) of variables want to keep. It could be a single value or a vector has the same length as x so the sparsity of individual matrix could be different. |
an list
object contains the following elements:
tb
- the block scores
pb
- the block loadings
t
- the global scores
w
- the wegihts of block scores to construct the global score.
Chen Meng
Calculating pairwise RV coefficients for a list of matrices or data.frame.
pairwise.rv(data.list, match="col")
pairwise.rv(data.list, match="col")
data.list |
A list of data.frame or matrix, either rows or columns in each data set should be matched. |
match |
Whether columns or rows of data.frame/matrix should be matched. |
The RV coefficient for each pair of matrices is calculated as Rv = trace(XX'YY')/sqrt(trace(XX'XX')*trace(YY'YY'))
The function will return a matrix containing the pairwise RV coefficients.
The variable in matrices are not automatically centered or scaled in this function. So these step may need to be performed before calling this function.
Chen Meng
Robert, P.; Escoufier, Y. (1976). A Unifying Tool for Linear Multivariate Statistical Methods: The RV-Coefficient. Applied Statistics 25 (3): 257-265.
data(NCI60_4arrays) pairwise.rv(NCI60_4arrays)
data(NCI60_4arrays) pairwise.rv(NCI60_4arrays)
plot
Methods for function plot
signature(x = "moa", y = "missing")
plot "moa" object
Argument "value" sould be one of "eig", "tau", "obs", "var" and "RV"\
if value = "eig", the eigenvalue would be plotted as scree plot. The following arguments could be set:\
type=1 - The type of plot to show eigenvalues. (type=1: the eigenvalue are plotted; type=2: partial eigenvalue shown as concatenated bars; type=3: partial eigenvalue shown as bars side by side; type=4: matplot view of eigenvales, lty need to be set; type=5; the two dimensional plot of partial eigenvalues, axes and pch need to be set in this case.) \ axes=NULL - The axes selected to plot \ n=NULL - Top n eigenvalues to be drawn \ tol=1e-5 - The tolerance of eigenvalue, eigenvalues lower than this value will not be shown. \ legend=NULL - legend to put, a character string as calling legend function \ col=NULL - The color of partial eigenvalues from each data set \ lty=1 - The line type used in the matplot, used when type =4 \ pch=NULL - the pch to draw 2D partial eigen plot, when type = 5 used \ lg.x="topright" - The position of legend \ lg.y=NULL - Poistion argument passed to function "legend" \ ... - other arguemnts passed to functions \ \
if value = "tau", the same with eig, but in the eigenvalues are scaled to 1 \
if value = "obs", the observation space will be shown, the following argument could be set:\
axes=1:2 - Which axes should be draw\ type=1 - Which type, see below (for type=1: the center points draw; type=2: the separate factor scores linked by lines; ... will be passed to function "points")\ data.pch=20 - the pch of dataset, if type=1, the first one is used\ col=1 - the color of observations, recycled used by data.frame\ label=FALSE - A logical indicates if labels should be shown\ lg.x="topright" - Position of legend \ lg.y=NULL - Position of legend \ xlim=NULL - The x limit \ ylim=NULL - The y limit \ label.cex=1 - the cex of text \ ... \
var - the separate gene view, layout can be specified \
RV - the heatmap of RV coefficients
Plot the gene set space of objects of "moa" and "mgsa"
plotGS(x, axes=1:2, center.only=FALSE, topN=1, data.pch=20, data.col=1, highlight.col = 2, label=NULL, label.cex=1, layout=NULL, ...)
plotGS(x, axes=1:2, center.only=FALSE, topN=1, data.pch=20, data.col=1, highlight.col = 2, label=NULL, label.cex=1, layout=NULL, ...)
x |
An object of class |
axes |
An integer vector in the length 2 to indicate the axes to be drawn. |
center.only |
A logical to indicate whether the separate gene set spaces from each of the data set should be plotted. Default is FALSE. |
topN |
An integer specify N gene set from the most positive and negative end of axes to be labeled |
data.pch |
The shape for plotting each data set. This argument is passed to points function, so only used when separate gene set spaces are plotted (i.e. center.only = FALSE). |
data.col |
The col for plotting each data set. This argument is passed to points function, so only used when separate gene set spaces are plotted (i.e. center.only = FALSE). |
highlight.col |
The color used to highlight the selected gene sets |
label |
Either a character vector or NULL (default). The character vector should be the name of some gene sets want ot be labeled. |
label.cex |
Passed to |
layout |
A matrix passed to the |
... |
Other arguments passed to |
This is a convenience function to explore the gene set space so not very flexible. For customized plot, please use the object of [email protected]
and [email protected]
.
If assign to variable, A list
of selected/highlighted gene set at
the (positve and negative) end of each axis will be returned.
Chen Meng
# library(mogsa) # loading gene expression data and supplementary data data(NCI60_4array_supdata) data(NCI60_4arrays) mgsa <- mogsa(x = NCI60_4arrays, sup=NCI60_4array_supdata, nf=9, proc.row = "center_ssq1", w.data = "inertia", statis = TRUE) plotGS(mgsa, center.only = TRUE, topN=5) res <- plotGS(mgsa, center.only = FALSE, data.pch=1:4, data.col=1:4) res
# library(mogsa) # loading gene expression data and supplementary data data(NCI60_4array_supdata) data(NCI60_4arrays) mgsa <- mogsa(x = NCI60_4arrays, sup=NCI60_4array_supdata, nf=9, proc.row = "center_ssq1", w.data = "inertia", statis = TRUE) plotGS(mgsa, center.only = TRUE, topN=5) res <- plotGS(mgsa, center.only = FALSE, data.pch=1:4, data.col=1:4) res
Prepare pathway gene sets from "graphite" package, which could be passed to "prepSupMoa" function.
prepGraphite(db, id = c("entrez", "symbol"))
prepGraphite(db, id = c("entrez", "symbol"))
db |
The database to be used, an object of class either 'PathwayList' create by "pathways" function. |
id |
Which identifier for output, either "entrez" or "symbol". |
Only support "entrez" or "symbol" output currently.
This function returns an object of list containing gene set information, which could be further processed by function "prepSupMoa" to convert to the object that can be used as input of "sup.moa" or "mogsa".
Chen Meng
Sales G, Calura E and Romualdi C (2014). graphite: GRAPH Interaction from pathway Topological Environment. R package version 1.10.1.
See Also as prepMsigDB
and prepSupMoa
.
library(graphite) keggdb <- prepGraphite(db = pathways("hsapiens", "kegg")[1:3], id = "entrez")
library(graphite) keggdb <- prepGraphite(db = pathways("hsapiens", "kegg")[1:3], id = "entrez")
Convert a gmt file (Could be downloaded from MSigDB) to a list of gene sets information.
prepMsigDB(file)
prepMsigDB(file)
file |
The directory and file name of the gmt file. |
This function returns an object of list containing gene set information, which could be further processed by function "prepSupMoa" to convert to the object that can be used as input of "sup.moa" or "mogsa".
Chen Meng
See Also as prepGraphite
and prepSupMoa
.
# not run dir <- system.file(package = "mogsa") preGS <- prepMsigDB(file=paste(dir, "/extdata/example_msigdb_data.gmt.gz", sep = ""))
# not run dir <- system.file(package = "mogsa") preGS <- prepMsigDB(file=paste(dir, "/extdata/example_msigdb_data.gmt.gz", sep = ""))
Convert a list of gene set information to a set of sumpplementary tables that can be used as input of function "sup.moa" or "mogsa".
prepSupMoa(X, geneSets, minMatch = 10, maxMatch = 500)
prepSupMoa(X, geneSets, minMatch = 10, maxMatch = 500)
X |
A matrix/data.frame or a list of matrix/data.frame or a list of character vector. If it is a list of matrix/data.frame, row names of matrix/data.frame will be used to create the projection matrix. Otherwise the charater vectors will used to create the supplementary matirx. |
geneSets |
Gene sets list or an object of class "GeneSet" or "GeneSetCollection". A gene set list could be returned by prepGraphite or prepMolsigDB. |
minMatch |
The minimum match of geneset. |
maxMatch |
The maximum match genesets. |
Details here
A list of matrix could used as supplementary tables by "sup.moa" or "mogsa".
Chen Meng
See Also as prepGraphite
and prepMsigDB
.
library(graphite) data(NCI60_4arrays) kegg <- pathways(species = "hsapiens", "kegg") pw <- c("Purine metabolism", "MAPK signaling pathway") gss <- prepGraphite(db = kegg[pw], id="symbol") gss <- lapply(gss, function(x) sub("SYMBOL:", "", x)) sup_data1 <- prepSupMoa(NCI60_4arrays, geneSets=gss) gene_list <- lapply(NCI60_4arrays, rownames) sup_data2 <- prepSupMoa(gene_list, geneSets=gss)
library(graphite) data(NCI60_4arrays) kegg <- pathways(species = "hsapiens", "kegg") pw <- c("Purine metabolism", "MAPK signaling pathway") gss <- prepGraphite(db = kegg[pw], id="symbol") gss <- lapply(gss, function(x) sub("SYMBOL:", "", x)) sup_data1 <- prepSupMoa(NCI60_4arrays, geneSets=gss) gene_list <- lapply(NCI60_4arrays, rownames) sup_data2 <- prepSupMoa(gene_list, geneSets=gss)
print
Methods for function print
signature(object = "moa")
print "moa" class
signature(object = "moa.sup")
print "sup.moa" class
signature(object = "mgsa")
print "mgsa" class
mbpca
.
An internal function called by mbpca
.
processOpt(x, center = TRUE, scale = FALSE, option = c("lambda1", "inertia", "uniform"))
processOpt(x, center = TRUE, scale = FALSE, option = c("lambda1", "inertia", "uniform"))
x |
A list of matrices, rows are observations and columns are variables |
center |
A logical variable indicates whether columns should be centered |
scale |
A logical variable indicates whether columns should be scaled |
option |
A charater string could be one of c("lambda1", "inertia", "uniform") to indicate how the different matrices should be normalized. If "lambda1", the matrix is divided by its the first singular value, if "inertia", the matrix is divided by its total inertia (sum of square), if "uniform", none of them would be done. |
A list
of normalized matrix.
Chen Meng
show
Methods for function show
signature(object = "moa")
show "moa" class
signature(object = "moa.sup")
show "sup.moa" class
signature(object = "mgsa")
show "mgsa" class
Weighted soft-thresholding operator, which is called by mbpca
.
softK(x, k, w = 1, pos = FALSE)
softK(x, k, w = 1, pos = FALSE)
x |
A numerical vector |
k |
Number of non-zero elements want to keep |
w |
weight for each element. The actual thresholding is base on x*w, the default setting equals to ordinary soft thresholding. |
pos |
A logical value, if only positive values are retained. |
A thresholded numerical vector
Chen Meng
v <- rnorm(10) softK(v, k = 2)
v <- rnorm(10) softK(v, k = 2)
summary
Methods for function summary
signature(object = "moa")
summary "moa" class
signature(object = "moa.sup")
summary "sup.moa" class
signature(object = "mgsa")
summary "mgsa" class
moa-class
.
Projecting supplementary tables on moa-class
sup.moa(X, sup, nf = 2, factors = NULL, ks.stat=FALSE, ks.B = 1000, ks.cores = NULL, p.adjust.method = "fdr")
sup.moa(X, sup, nf = 2, factors = NULL, ks.stat=FALSE, ks.B = 1000, ks.cores = NULL, p.adjust.method = "fdr")
X |
An object of class |
sup |
A list of data.frames contains supplementary data. |
nf |
The number of principal components used in the projection. |
factors |
The index of principal components used in the projection, used when non-consecutive PC to be included in the analysis. |
ks.stat |
The logical indicates if the p-value should be calculated using K-S statistic (the method used in "ssgsea" in GSVA package). Default is FALSE, which means using the z-score method. |
ks.B |
An integer to indicate the number of bootstrapping samples to calculated the p-value of KS statistic. |
ks.cores |
An integer indicate the number of cores to be used in bootstrapping. It is passed to function |
p.adjust.method |
The method of p value adjustment, passed to |
Projecting supplementary tables on moa-class
, for details see reference.
An object of class moa.sup-class
.
Chen Meng
Herve Abdi, Lynne J. Williams, Domininique Valentin and Mohammed Bennani-Dosse. STATIS and DISTATIS: optimum multitable principal component analysis and three way metric multidimensional scaling. WIREs Comput Stat 2012. Volume 4, Issue 2, pages 124-167 Haenzelmann, S., Castelo, R. and Guinney, J. GSVA: Gene set variation analysis for microarray and RNA-Seq data. BMC Bioinformatics, 14:7, 2013. Barbie, D.A. et al. Systematic RNA interference reveals that oncogenic KRAS-driven cancers require TBK1. Nature, 462(5):108-112, 2009.
# library(mogsa) # loading gene expression data and supplementary data data(NCI60_4array_supdata) data(NCI60_4arrays) # check the dimension of each supplementary data to see how many gene set annotated the data sapply(NCI60_4array_supdata, dim) # run analysis ana <- moa(NCI60_4arrays, proc.row = "center_ssq1", w.data = "inertia", statis = TRUE) plot(ana, value="eig") # projectin supplementary data smoa <- sup.moa(ana, sup=NCI60_4array_supdata, nf=3) # heatmap visualize the gene set scores heatmap(slot(smoa, "score"))
# library(mogsa) # loading gene expression data and supplementary data data(NCI60_4array_supdata) data(NCI60_4arrays) # check the dimension of each supplementary data to see how many gene set annotated the data sapply(NCI60_4array_supdata, dim) # run analysis ana <- moa(NCI60_4arrays, proc.row = "center_ssq1", w.data = "inertia", statis = TRUE) plot(ana, value="eig") # projectin supplementary data smoa <- sup.moa(ana, sup=NCI60_4array_supdata, nf=3) # heatmap visualize the gene set scores heatmap(slot(smoa, "score"))
The weighted version of singular value decomposition.
wsvd(X, D1 = diag(1, nrow(X)), D2 = diag(1, ncol(X)))
wsvd(X, D1 = diag(1, nrow(X)), D2 = diag(1, ncol(X)))
X |
A numeric matrix whose wSVD decomposition is to be computed. |
D1 |
A square matrix or vector. The left constraint/weight matrix (symmetric and positive in diagonal). The dimension of D1 should be the same with the number of rows in X. A vector input will be converted to a diagnal matrix. |
D2 |
A square matrix or vector. The right constraint/weight matrix (symmetric, positive in diagonal). The dimension of D1 should be the same with the number of columns in X. A vector input will be converted to a diagnal matrix. |
The weighted version of generalized singular value decomposition (SVD) of matrix A = UDV' with the constraints U'D1U = I and V'D2V = I D1 and D2 are two matrices express constraints imposed on the rows and the columns of matrix A.
d - singular values
u - left singular vectors
v - right singular vectors
D1 - the left weight matrix (directly from input)
D2 - the right weight matrix (directly from input)
Chen Meng
Herve Abdi. Singular Value Decomposition (SVD) and Generalized Singular Value Decomposition (GSVD) http://www.utdallas.edu/~herve/Abdi-SVD2007-pretty.pdf
svd
set.seed(56) m <- matrix(rnorm(15), 5, 3) wl <- rnorm(5) wr <- runif(3) s <- wsvd(X=m, D1=wl, D2=wr) # t(s$u) %*% diag(wl) %*% s$u # t(s$v) %*% diag(wr) %*% s$v # all.equal(m, as.matrix(s$u) %*% diag(s$d) %*% t(s$v))
set.seed(56) m <- matrix(rnorm(15), 5, 3) wl <- rnorm(5) wr <- runif(3) s <- wsvd(X=m, D1=wl, D2=wr) # t(s$u) %*% diag(wl) %*% s$u # t(s$v) %*% diag(wr) %*% s$v # all.equal(m, as.matrix(s$u) %*% diag(s$d) %*% t(s$v))