Title: | Multivariate Analysis of Transcriptomic Data |
---|---|
Description: | This package provides the analysis methods fourthcorner and RLQ analysis for large-scale transcriptomic data. |
Authors: | Lara Urban <[email protected]> |
Maintainer: | Lara Urban <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.33.0 |
Built: | 2024-10-30 05:34:27 UTC |
Source: | https://github.com/bioc/covRNA |
covRNA (covariate analysis of RNA-Seq data) is a fast and user-friendly R package which implements fourthcorner analysis and RLQ of transcriptomic data.
Gene expression data usually come with covariates of the samples and of the genes. To analyze associations between sample and gene covariates, the fourthcorner analysis tests their statistical significance by permutation tests, while RLQ visualizes associations within and between the covariates.
The fourthcorner analysis and RLQ implemented in the ade4 package are adapted here for large-scale transcriptomic data: (1) runtime and storage space are significantly reduced, (2) the analysis accounts for transcriptome-specific shapes of the empirical permutation distributions, and (3) the analysis is made user-friendly through automation, simple plot design and unsupervised gene filtering.
To cite covRNA, please use citation("covRNA"). For further details, please refer to the vignette by openVignette("covRNA") and the man pages.
Package: | covRNA |
Type: | Package |
License: | GPL (>=2) |
LazyLoad: | yes |
Lara Urban
Maintainer: Lara Urban <[email protected]>
To be announced soon.
The integrated Baca dataset contains the ExpressionSet Baca; its assayData contains deep-sequenced RNA-Seq data of Bacillus anthracis under four conditions (control plus cold, salt and alcohol stress), with four replicates per condition. The raw sequence reads derive from Passalacqua et al. (2012) and are available at Gene Expression Omnibus (GEO, accession number GSE36506). We have already mapped and counted these reads and normalised the counts with DESeq2. The phenoData assigns the stress condition, i.e. ctrl, cold, salt or alcohol stress, to each sample. The featureData contains COG annotations of the genes.
Baca
ExpressionSet
GEO GSE36506
Passalacqua, K. D., Varadarajan, A., Weist, C., Ondov, B. D., Byrd, B. et al. (2012) Strand-Specific RNA-Seq Reveals Ordered Patterns of Sense and Antisense Transcription in Bacillus anthracis. PLoS ONE, 7(8):e43350.
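data(Baca)
fData(Baca)   # gene covariates (COG annotations)
pData(Baca)   # sample covariates (stress condition)
exprs(Baca)   # normalised expression values, genes x samples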
The RLQ visualises the association between and within sample and gene covariates by ordination. It applies generalized singular value decomposition (GSVD) to the fourthcorner matrix, which contains the associations between the sample and gene covariates. This is realised by eigendecomposition of the covariance matrices of the fourthcorner matrix. The name RLQ refers to the three dataframes R, L and Q to be analyzed. The function 'ord' automates the 'rlq' function of the 'ade4' package.
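To make the GSVD step concrete, here is a minimal base-R sketch of the core computation, assuming that all covariates in R and Q are quantitative and standardised (a normed PCA with unit column weights). Under that assumption the generalized SVD reduces to an ordinary SVD of the cross table D = t(R) P Q, where P is the correspondence-analysis table of L. The toy data are random, and the names eig, variance, l1, c1, li and co only mirror the components of an ord object listed further down; this is an illustration, not the ade4 or covRNA implementation.

```r
set.seed(1)
n <- 50; p <- 8                                    # toy sizes: genes, samples
L <- matrix(rpois(n * p, lambda = 20), n, p)       # toy counts, genes x samples
R <- matrix(rnorm(n * 2), n, 2,
            dimnames = list(NULL, c("r1", "r2")))  # quantitative gene covariates
Q <- matrix(rnorm(p * 3), p, 3,
            dimnames = list(NULL, c("q1", "q2", "q3")))  # quantitative sample covariates

# Correspondence-analysis weights of L
P  <- L / sum(L)
rw <- rowSums(P)   # gene (row) weights
cw <- colSums(P)   # sample (column) weights

# Centre and scale each column to weighted mean 0 and weighted variance 1
wstd <- function(X, w) {
  X  <- as.matrix(X)
  Xc <- sweep(X, 2, colSums(X * w))
  sweep(Xc, 2, sqrt(colSums(Xc^2 * w)), "/")
}
Rz <- wstd(R, rw)
Qz <- wstd(Q, cw)

# Cross table of gene covariates x sample covariates (m x s)
D <- t(Rz) %*% P %*% Qz

# With unit column weights the generalized SVD is an ordinary SVD
nf <- 2
sv <- svd(D, nu = nf, nv = nf)
eig      <- sv$d^2                      # co-inertia carried by each axis
variance <- eig / sum(eig)              # share of total co-inertia per axis
l1 <- sv$u                              # normed scores of the R covariates
c1 <- sv$v                              # normed scores of the Q covariates
li <- sweep(l1, 2, sv$d[1:nf], "*")     # coordinates of the R covariates
co <- sweep(c1, 2, sv$d[1:nf], "*")     # coordinates of the Q covariates
```

Because Rz is centred with the gene weights, using the centred CA table (P minus the outer product of the margins) gives exactly the same D, which is why the raw proportions can be used directly here.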
The input has to be given as dataframe or matrix. Dataframe/matrix L [n x p] contains transcriptomic data of p samples across n genes, dataframe/matrix R [n x m] contains m gene covariates across the n genes and dataframe/matrix Q [p x s] contains s sample covariates across the p samples. Alternatively, objects of the class ExpressionSet (with assayData, phenoData and featureData) can be used as input. If the argument ExprSet is missing, the function will use the dataframes/matrices R, L and Q as input.
Genes can be filtered with respect to their expression variance before analysis (argument exprvar); the function will automatically discard the gene covariates which do not annotate any of the remaining genes.
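The filtering rule can be sketched as follows. This is a hypothetical helper, not covRNA's API: it assumes that exprvar is the fraction of genes to keep, that the gene count is rounded up, and that R is a 0/1 annotation matrix (one column per annotation such as a COG category) so that "does not annotate any remaining gene" means an all-zero column.

```r
# Hypothetical helper (not part of covRNA): keep the most variable genes
# and drop gene covariates that no longer annotate any kept gene
filter_by_variance <- function(L, R, exprvar = 1) {
  v <- apply(L, 1, var)                       # expression variance per gene
  k <- max(1, ceiling(exprvar * nrow(L)))     # genes to keep (rounding is an assumption)
  keep <- order(v, decreasing = TRUE)[seq_len(k)]
  L2 <- L[keep, , drop = FALSE]
  R2 <- R[keep, , drop = FALSE]
  annotated <- colSums(R2 != 0) > 0           # covariates still used by some kept gene
  list(L = L2, R = R2[, annotated, drop = FALSE])
}

L <- rbind(g1 = c(1, 1, 1, 1), g2 = c(0, 10, 0, 10),
           g3 = c(5, 6, 5, 6), g4 = c(0, 20, 0, 20))
R <- cbind(cogA = c(1, 0, 1, 0), cogB = c(0, 1, 0, 0), cogC = c(0, 0, 0, 1))
rownames(R) <- rownames(L)
res <- filter_by_variance(L, R, exprvar = 0.5)
```

With exprvar = 0.5 the two most variable genes (g4, then g2) are kept, and cogA, which only annotates g1 and g3, is dropped.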
Warning: If R and Q are given as matrices, they will be converted to dataframes at the beginning of the function.
Warning: If R or Q is missing, it will be replaced by an identity matrix. A principal component analysis of this matrix is then performed, which might be time-consuming depending on the size of the identity matrix.
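The input rules above (dimension checks, conversion of matrices to dataframes, identity replacement of a missing R or Q) can be summarised in a short sketch. prepare_inputs is a hypothetical name used only for illustration; it is not a covRNA function, and the error messages are invented.

```r
# Hypothetical helper (not part of covRNA) illustrating the input rules
prepare_inputs <- function(L, R = NULL, Q = NULL) {
  L <- as.matrix(L)
  n <- nrow(L); p <- ncol(L)            # n genes, p samples
  if (is.null(R)) R <- diag(n)          # missing R: one indicator column per gene
  if (is.null(Q)) Q <- diag(p)          # missing Q: one indicator column per sample
  R <- as.data.frame(R)                 # matrices are converted to dataframes
  Q <- as.data.frame(Q)
  if (nrow(R) != n) stop("nrow(R) must equal nrow(L): one row per gene")
  if (nrow(Q) != p) stop("nrow(Q) must equal ncol(L): one row per sample")
  list(R = R, L = L, Q = Q)
}

L <- matrix(rpois(12, lambda = 10), nrow = 4)   # 4 genes x 3 samples
inp <- prepare_inputs(L, Q = data.frame(stress = c("ctrl", "cold", "salt")))
```

The identity replacement grows quadratically with the table size: for example, a missing R with 5000 genes becomes a 5000 x 5000 table (25 million cells), which explains the runtime warning.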
ord(ExprSet, R=NULL, L=NULL, Q=NULL, exprvar=1, nf=2)
ExprSet |
An ExpressionSet of the Biobase package. The ExpressionSet is used as default input. If no ExpressionSet is given, the individual dataframes/matrices R, L and Q can be used as input. |
R |
A dataframe/matrix containing information about each gene. The number of rows in R must match the number of rows in L. If R is missing, it will be replaced by an identity matrix [n x n]. |
L |
A dataframe/matrix of gene expression values, with genes as rows and samples as columns. |
Q |
A dataframe/matrix containing information about each sample. The number of rows in Q must match the number of columns in L. If Q is missing, it will be replaced by an identity matrix [p x p]. |
exprvar |
The fraction of most variably expressed genes to take into account. If the functions 'stat' and 'ord' are to be combined, the same value must be used in both analyses. |
nf |
The number of axes to be considered by ordination. |
The function automates the following steps. First, correspondence analysis (CA) is applied to the gene expression table L, and one of principal component analysis (PCA; only quantitative variables), multiple correspondence analysis (MCA; only categorical variables) or Hill-Smith analysis (quantitative and categorical variables) is applied to each of the covariate tables R and Q. Second, RLQ is applied to the results of these ordinations.
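The choice between PCA, MCA and Hill-Smith analysis depends only on the column types of a covariate table. The sketch below encodes that rule; choose_ordination is a hypothetical helper, and the ade4 functions named in the comments (dudi.pca, dudi.acm, dudi.hillsmith) are the usual ade4 counterparts, not a statement of what ord calls internally.

```r
# Hypothetical helper (not part of covRNA): pick the ordination for R or Q
choose_ordination <- function(covariates) {
  covariates <- as.data.frame(covariates)
  is_num <- vapply(covariates, is.numeric, logical(1))
  if (all(is_num)) {
    "PCA"          # only quantitative columns, e.g. ade4::dudi.pca
  } else if (!any(is_num)) {
    "MCA"          # only categorical columns, e.g. ade4::dudi.acm
  } else {
    "Hill-Smith"   # mixed columns, e.g. ade4::dudi.hillsmith
  }
}

choose_ordination(data.frame(length = c(1.2, 3.4, 2.2), gc = c(0.4, 0.5, 0.3)))
choose_ordination(data.frame(stress = factor(c("ctrl", "cold", "salt"))))
choose_ordination(data.frame(stress = c("ctrl", "cold"), temp = c(37, 4)))
```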
The function returns a list of class ord where:
call |
gives the original call of the function. |
rank |
gives the rank. |
nf |
gives the number of axes retained in the ordination. |
RV |
gives the RV coefficient. |
eig |
gives a vector of the eigenvalues. |
variance |
gives the variance explained by the axes. |
lw |
gives the row weights of the fourthcorner table. |
cw |
gives the column weights of the fourthcorner table. |
tab |
gives the fourthcorner table. |
li |
gives the coordinates of the covariates of R. |
l1 |
gives the normed scores of the covariates of R. |
co |
gives the coordinates of the covariates of Q. |
c1 |
gives the normed scores of the covariates of Q. |
lR |
gives the row coordinates of R. |
mR |
gives the normed row scores of R. |
lQ |
gives the row coordinates of Q. |
mQ |
gives the normed row scores of Q. |
aR |
gives the projection of the axes of the analysis of R onto the co-inertia axes. |
aQ |
gives the projection of the axes of the analysis of Q onto the co-inertia axes. |
ngenes |
gives the number of analysed genes. |
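For orientation, the sketch below implements the classical (Escoufier) RV coefficient between two tables measured on the same rows: tr(XX'YY') / sqrt(tr((XX')^2) tr((YY')^2)) after column-centring. It is the general, unweighted definition; ade4 may apply row and column weights when it reports RV for an RLQ, so treat this only as an illustration of what the coefficient measures (0 for unrelated configurations, 1 for configurations identical up to rotation and scaling).

```r
# General RV coefficient between two column-centred tables with the same rows
rv_coef <- function(X, Y) {
  X  <- scale(as.matrix(X), scale = FALSE)   # column-centre X
  Y  <- scale(as.matrix(Y), scale = FALSE)   # column-centre Y
  Sx <- tcrossprod(X)                        # X X' (rows x rows)
  Sy <- tcrossprod(Y)                        # Y Y' (rows x rows)
  # tr(Sx Sy) / sqrt(tr(Sx^2) tr(Sy^2)); the traces are sums of elementwise
  # products because Sx and Sy are symmetric
  sum(Sx * Sy) / sqrt(sum(Sx^2) * sum(Sy^2))
}

set.seed(3)
X <- matrix(rnorm(30), 10, 3)
Y <- matrix(rnorm(20), 10, 2)
rv_coef(X, Y)
```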
Lara Urban
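data(Baca)
ordBaca <- ord(ExprSet = Baca, exprvar = 1, nf = 2)   # RLQ on all genes, 2 axes
ls(ordBaca)                                          # names of the returned components
plot(ordBaca)                                        # default: barplot of explained variance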
The function plot can visualise different features of an ord object by adjusting the argument "feature". By default, a barplot of the variance explained by the axes of the RLQ is plotted (see arguments).
## S3 method for class 'ord'
plot(x, feature="variance", xaxis=1, yaxis=2, cex=1, range=2, ...)
x |
An object of class ord to be visualised by ordination. |
feature |
Defines which features of the object shall be visualised: "columns L", "rows L", "columns R" and "columns Q" show the respective variables in the ordination, "variance" shows a barplot of the variance explained by the axes, and "correlation circle R" and "correlation circle Q" show the projection of the original space onto the ordination space. |
xaxis, yaxis |
Define which axes of ordination shall be shown by x- and y-axis, respectively. |
cex |
Defines the size of the covariate text. |
range |
The range of the axes can be extended or reduced, e.g. if not all covariates are visible with the default setting. |
... |
More plotting parameters can be added. |
Plot of RLQ.
Lara Urban
data(Baca)
ordBaca <- ord(Baca)
plot(ordBaca)
The function plot produces a cross table of the gene and sample covariates of a stat object. Colours indicate positive or negative significance, or absence of significance, of the associations (by default: light grey for non-significant, deep sky blue for positive significant and red for negative significant associations).
## S3 method for class 'stat'
plot(x, col=c("lightgrey","deepskyblue","red"), sig=TRUE, alpha=0.05, show=c("adj","non-adj"), cex=1, ynames, xnames, ytext=1, xtext=1, shiftx=0, shifty=0, ...)
x |
An object of class stat that shall be visualised as a cross table. |
col |
A vector of three colours. The first colour represents non-significant, the second positive significant, the third negative significant associations in the cross table. |
sig |
If TRUE (default), only covariates involved in at least one significant association are plotted. |
alpha |
The significance level. |
show |
'adj' or 'non-adj' indicates whether adjusted or raw p-values are used, respectively. |
cex |
The magnitude of the text in the cross table. |
ynames, xnames |
Row and column names of the cross table. By default, the column names of R and Q are used, respectively. |
ytext, xtext |
Rotation of the row and column names of the cross table. |
shifty, shiftx |
Shift of the row and column names to the right or to the left. |
... |
More plotting parameters can be added. |
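The colour rule and the sig filter described above can be sketched as a function that turns a cross table of statistics and a matching table of adjusted p-values into a table of colours. cross_table_colours is a hypothetical helper: it reproduces only the documented rule (first colour non-significant, second positive, third negative; with sig = TRUE, rows and columns without any significant cell are dropped), not how plot.stat actually draws the table. The result could be passed to a drawing routine such as image() or rect().

```r
# Hypothetical helper (not part of covRNA): colour each cell of the cross table
cross_table_colours <- function(stat, adj_p, alpha = 0.05,
                                col = c("lightgrey", "deepskyblue", "red"),
                                sig = TRUE) {
  is_sig  <- adj_p < alpha
  colours <- matrix(col[1], nrow(stat), ncol(stat), dimnames = dimnames(stat))
  colours[is_sig & stat > 0] <- col[2]   # positive significant association
  colours[is_sig & stat < 0] <- col[3]   # negative significant association
  if (sig) {
    # keep only covariates involved in at least one significant association
    colours <- colours[rowSums(is_sig) > 0, colSums(is_sig) > 0, drop = FALSE]
  }
  colours
}

dn   <- list(c("cogA", "cogB", "cogC"), c("cold", "salt"))
stat <- matrix(c(0.4, -0.3, 0.1, 0.2, 0.05, -0.1), 3, dimnames = dn)
adj  <- matrix(c(0.01, 0.03, 0.6, 0.2, 0.7, 0.9), 3, dimnames = dn)
out  <- cross_table_colours(stat, adj)
```

Here only the cold column contains significant cells, so with sig = TRUE the result shrinks to cogA and cogB against cold.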
Plot of fourthcorner analysis.
Lara Urban
data(Baca)
statBaca <- stat(Baca, nrcor = 2)
plot(statBaca)
The fourthcorner analysis tests for significant associations between each sample covariate and each gene covariate by statistical permutation tests. The sample and gene covariates can be categorical and/or quantitative.
The input has to be given as dataframe or matrix. Dataframe/matrix L [n x p] contains transcriptomic data of p samples across n genes, dataframe/matrix R [n x m] contains m gene covariates across the n genes and dataframe/matrix Q [p x s] contains s sample covariates across the p samples. Alternatively, objects of the class ExpressionSet (with assayData, phenoData and featureData) can be used as input. If the argument ExprSet is missing, the function will use the dataframes/matrices R, L and Q as input.
The number of permutations is set to 9999 by default so that p-values can remain significant after multiple testing correction. As computation time increases with the size of the matrices/dataframes and with the number of permutations, parallelization across multiple cores is highly recommended. By default, all but one of the CPU cores on the current host are used.
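The reason for the large default can be shown with a little arithmetic. Assuming the usual permutation estimator (count + 1) / (B + 1), the smallest p-value a test with B permutations can return is 1 / (B + 1). The sketch takes a hypothetical cross table with m = 100 covariate combinations, one strong association and 99 null ones, and applies the Benjamini-Hochberg adjustment with base R's p.adjust.

```r
# Smallest p-value attainable with B permutations, assuming (count + 1) / (B + 1)
min_perm_p <- function(B) 1 / (B + 1)

m <- 100   # hypothetical number of covariate combinations tested
best_adj <- sapply(c(999, 9999), function(B) {
  p <- c(min_perm_p(B), rep(1, m - 1))   # one strong association, the rest null
  p.adjust(p, method = "BH")[1]          # BH-adjusted value of the smallest p
})
names(best_adj) <- c("999", "9999")
best_adj
```

With 999 permutations the strongest possible association adjusts to 100 / 1000 = 0.1 and can never pass alpha = 0.05; with 9999 permutations it adjusts to 0.01.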
Genes can be filtered with respect to their expression variance before analysis (argument exprvar); the function will automatically discard the gene covariates which do not annotate any of the remaining genes.
Warning: If R and Q are given as matrices, they will be converted to dataframes at the beginning of the function.
Warning: If R or Q is missing, it will be replaced by an identity matrix.
stat(ExprSet, R=NULL, L=NULL, Q=NULL, npermut=9999, padjust="BH", nrcor=detectCores()-1, exprvar=1)
ExprSet |
An ExpressionSet of the Biobase package. The ExpressionSet is used as default input. If no ExpressionSet is given, the individual dataframes/matrices R, L and Q can be used as input. |
R |
A dataframe/matrix containing information about each gene. The number of rows in R must match the number of rows in L. If R is missing, it will be replaced by an identity matrix [n x n]. |
L |
A dataframe/matrix of gene expression values, with genes as rows and samples as columns. |
Q |
A dataframe/matrix containing information about each sample. The number of rows in Q must match the number of columns in L. If Q is missing, it will be replaced by an identity matrix [p x p]. |
npermut |
The number of permutations. |
padjust |
The method of multiple testing adjustment of the p-values; see p.adjust.methods for all methods implemented in R. |
nrcor |
The number of cores to be used. |
exprvar |
The fraction of most variably expressed genes to take into account. If the functions 'stat' and 'ord' are to be combined, the same value must be used in both analyses. |
Depending on the covariate combination, a statistic is calculated by matrix multiplication of the three tables. This statistic is a correlation coefficient for associations between quantitative-quantitative and quantitative-categorical variables, and a Chi2-related statistic for associations between categorical-categorical variables.
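For the quantitative-quantitative case, the matrix product can be sketched in base R: weight every (gene, sample) cell by its share of total expression, standardise R and Q with the resulting margins, and compute t(R) P Q. The result is an m x s table of weighted correlations. The permutation step uses one possible null model (shuffling the sample covariate across samples) and a two-sided (count + 1) / (B + 1) p-value; the permutation scheme covRNA actually uses, and its statistics for categorical covariates, are not reproduced here. The data are random toy values.

```r
set.seed(42)
n <- 40; p <- 6
L <- matrix(rpois(n * p, lambda = 15), n, p)   # toy expression, genes x samples
R <- matrix(rnorm(n), n, 1)                    # one quantitative gene covariate
Q <- matrix(as.numeric(seq_len(p)), p, 1)      # one quantitative sample covariate

# Centre and scale columns to weighted mean 0 and weighted variance 1
wstd <- function(X, w) {
  Xc <- sweep(X, 2, colSums(X * w))
  sweep(Xc, 2, sqrt(colSums(Xc^2 * w)), "/")
}

# Weighted correlation for every pair of columns of R and Q (m x s table)
fourthcorner_r <- function(L, R, Q) {
  P <- L / sum(L)                              # cell weights
  t(wstd(R, rowSums(P))) %*% P %*% wstd(Q, colSums(P))
}

obs <- fourthcorner_r(L, R, Q)[1, 1]

# Two-sided permutation test: shuffle the sample covariate across samples
B <- 999
perm <- replicate(B, fourthcorner_r(L, R, Q[sample(p), , drop = FALSE])[1, 1])
pval <- (sum(abs(perm) >= abs(obs)) + 1) / (B + 1)
```

The statistic equals the Pearson correlation between gene and sample covariate values computed over all (gene, sample) pairs, each weighted by its expression share, which is what makes it comparable across covariate pairs.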
The function returns a list of class stat where:
stat |
is a cross table (m x s) with the values of the original statistical tests per covariate combination. |
pvalue, adj.pvalue |
are cross tables (m x s) which contain the p-values and adjusted p-values, respectively, of the permutation tests per covariate combination. |
adjust.method |
shows the applied multiple testing adjustment method. |
npermut |
gives the number of permutations per permutation test. |
ngenes |
gives the number of analysed genes ("all" in the case of no filtering of the genes). |
call |
gives the original call of the function. |
Lara Urban
data(Baca)
statBaca <- stat(ExprSet = Baca, npermut = 999, padjust = "BH", nrcor = 2, exprvar = 1)
statBaca$adj.pvalue   # BH-adjusted p-values, one per covariate combination
plot(statBaca)
The vis function simultaneously visualizes the results of the functions stat and ord. First, all covariates of R and Q are shown by ordination in one plot; covariates involved in at least one significant association are shown in black, the others in gray. Then, all pairs of covariates that are significantly associated according to stat are connected by lines whose color indicates whether the association is positive or negative.
vis(Stat, Ord=NULL, alpha=0.05, xaxis=1, yaxis=2, col=c("gray", transblue, transred), alphatrans=0.5, cex=1, rangex=2, rangey=2, ...)
Stat |
An object of class stat. |
Ord |
An object of class ord. The stat and ord objects should have the same value of ngenes. |
alpha |
The significance level. |
xaxis, yaxis |
Define which axes of ordination shall be shown by x- and y-axis, respectively. |
col |
A vector of three colors. The first color represents non-significant variables, the second positive significant, the third negative significant associations. |
alphatrans |
Defines the degree of transparency of the second and third colors. |
cex |
The magnitude of the text in the ordination. |
rangex, rangey |
The range of the x axis and y axis can be extended or reduced, e.g. for the case that not all covariates are visible in the default setting. |
... |
More plotting parameters can be added. |
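The line-drawing step of vis depends on one intermediate result: the list of significantly associated covariate pairs together with a colour for each line. The sketch below derives that list from a cross table of statistics and adjusted p-values. significant_links is a hypothetical helper, and the use of grDevices::adjustcolor to apply alphatrans is an assumption about how transparent colours could be built, not a description of vis internals. The lines themselves could then be drawn with segments() between the ordination coordinates of the two covariates.

```r
# Hypothetical helper (not part of covRNA): pairs of covariates to connect
significant_links <- function(stat, adj_p, alpha = 0.05,
                              col = c("gray", "blue", "red"), alphatrans = 0.5) {
  hit <- which(adj_p < alpha, arr.ind = TRUE)   # (row, column) of significant cells
  s   <- sign(stat[hit])                        # direction of each association
  data.frame(
    R_covariate = rownames(stat)[hit[, 1]],
    Q_covariate = colnames(stat)[hit[, 2]],
    sign        = s,
    colour      = ifelse(s > 0,
                         adjustcolor(col[2], alpha.f = alphatrans),   # positive
                         adjustcolor(col[3], alpha.f = alphatrans)),  # negative
    row.names = NULL
  )
}

stat <- matrix(c(0.4, -0.3, 0.05, 0.2), 2,
               dimnames = list(c("cogA", "cogB"), c("cold", "salt")))
adj  <- matrix(c(0.01, 0.02, 0.5, 0.04), 2, dimnames = dimnames(stat))
links <- significant_links(stat, adj)
```

Three of the four cells pass alpha = 0.05, so three lines would be drawn: cogA-cold and cogB-salt in the positive colour, cogB-cold in the negative one.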
Plot of fourthcorner analysis and RLQ.
Lara Urban
data(Baca)
statBaca <- stat(Baca, nrcor = 2)     # fourthcorner analysis on 2 cores
ordBaca <- ord(Baca)                  # RLQ ordination
vis(Stat = statBaca, Ord = ordBaca)
vis(Ord = ordBaca)