Title: | An R package for the linear modeling of high-dimensional designed data based on ASCA/APCA family of methods |
---|---|
Description: | This package provides a method to fit linear models to high-dimensional designed data. limpca applies a GLM (General Linear Model) version of ASCA and APCA to analyse multivariate sample profiles generated by an experimental design. ASCA/APCA provide powerful visualization tools for multivariate structures in the space of each effect of the statistical model linked to the experimental design and, contrary to MANOVA, they can deal with multivariate datasets having more variables than observations. This method can handle unbalanced designs. |
Authors: | Bernadette Govaerts [aut, ths], Sebastien Franceschini [ctb], Robin van Oirbeek [ctb], Michel Thiel [aut], Pascal de Tullio [dtc], Manon Martin [aut, cre], Nadia Benaiche [ctb] |
Maintainer: | Manon Martin <[email protected]> |
License: | Artistic-2.0 |
Version: | 1.3.0 |
Built: | 2024-11-04 06:07:49 UTC |
Source: | https://github.com/bioc/limpca |
Creates the lmpDataList from a SummarizedExperiment or by manually defining the design, the outcomes and the model formula. lmpDataList serves as an input for the lmpModelMatrix function to start the limpca modeling.
data2LmpDataList( se = NULL, assay_name = NULL, outcomes = NULL, design = NULL, formula = NULL, verbose = TRUE )
se |
A |
assay_name |
If not |
outcomes |
If not |
design |
If not |
formula |
If not |
verbose |
If |
Data can be included as a SummarizedExperiment (SE) object or by manually defining one or multiple elements of outcomes, design and formula. If a SE is provided, the outcomes correspond to a transposed assay of the SE (by default the first one), the design corresponds to the colData of the SE and the formula can be provided as a formula element in the S4Vectors::metadata of the SE (metadata(se)$formula).
In the outputted list, the outcomes are structured in a standard statistical fashion, i.e. with observations in rows and the variables (features) in columns. If the outcomes argument is not NULL, it has to be formatted that way (see Arguments).
Note that the outcomes, design and formula arguments take priority if they are not NULL (e.g. if both the se and outcomes arguments are provided, the resulting outcomes matrix will be taken from the outcomes argument). The outcomes and design elements are mandatory.
Multiple checks are performed to ensure that the data are correctly formatted:
the rownames of design and outcomes should match
the names of the model terms in the formula should match column names from the design
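The two checks listed above can be sketched in base R. This is only an illustration: `check_lmp_inputs` is a hypothetical helper, not a limpca function, and it assumes that "match" means identical row names in the same order.

```r
# Hypothetical helper (not part of limpca): mimics the two documented checks.
check_lmp_inputs <- function(outcomes, design, formula) {
  # Check 1 (assumed meaning): same row names, in the same order
  rows_ok <- identical(rownames(outcomes), rownames(design))
  # Check 2: every variable used in the formula must be a column of the design
  terms_needed <- all.vars(stats::as.formula(formula))
  missing_terms <- setdiff(terms_needed, colnames(design))
  list(rows_ok = rows_ok,
       terms_ok = length(missing_terms) == 0,
       missing_terms = missing_terms)
}

# Toy data: 4 observations, 3 response variables, 2 design factors
outcomes <- matrix(rnorm(12), nrow = 4,
                   dimnames = list(paste0("obs", 1:4), paste0("v", 1:3)))
design <- data.frame(Hippurate = factor(c(0, 0, 1, 1)),
                     Citrate = factor(c(0, 1, 0, 1)),
                     row.names = paste0("obs", 1:4))

ok <- check_lmp_inputs(outcomes, design, "~ Hippurate + Citrate")
bad <- check_lmp_inputs(outcomes, design, "~ Hippurate + Time")
bad$missing_terms
```

In the second call the term Time is absent from the toy design, so a check of this kind would flag the formula before any modeling starts.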
A list with the 3 following named elements:
outcomes
A nxm matrix with the m response variables.
design
A nxq data.frame with the experimental design.
formula
A character string with the model formula.
data(UCH)

### create manually the dataset
res <- data2LmpDataList(
    outcomes = UCH$outcomes,
    design = UCH$design[, 1, drop = FALSE],
    formula = "~ Hippurate"
)

### create the dataset from a SummarizedExperiment
library(SummarizedExperiment)
se <- SummarizedExperiment(
    assays = list(
        counts = t(UCH$outcomes),
        counts2 = t(UCH$outcomes * 2)
    ),
    colData = UCH$design,
    metadata = list(formula = "~ Hippurate + Citrate")
)
res <- data2LmpDataList(se, assay_name = "counts2")

# changing the formula:
res <- data2LmpDataList(se,
    assay_name = "counts2",
    formula = "~ Hippurate + Citrate + Time"
)
This package provides a method to fit linear models to high-dimensional designed data. The method handles unbalanced designs. More features should be included in the future (e.g. generalized linear models, random effects, ...).
The core functions of the package are:
data2LmpDataList
Converts data to a lmpDataList, the input argument for lmpModelMatrix.
lmpModelMatrix
Creates the model matrix from the design matrix and the model formula.
lmpEffectMatrices
Estimates the model by OLS based on the outcomes and model matrices provided in the outputs of the lmpModelMatrix function and calculates the estimated effect matrices and residual matrix. It also calculates the type III percentage of variance explained by each effect.
lmpBootstrapTests
Tests the significance of one or a combination of the model effects using bootstrap. This function is based on the outputs of the lmpEffectMatrices function.
lmpPcaEffects
Performs a PCA on each of the effect matrices from the outputs of lmpEffectMatrices. It has an option to choose the method applied: ASCA, APCA or ASCA-E. Combined effects (i.e. linear combinations of original effect matrices) can also be created and decomposed by PCA.
The functions allowing the visualisation of the Linear Models results are:
lmpScreePlot
Provides a barplot of the percentage of variance associated with the PCs of the effect matrices, ordered by importance, based on the outputs of lmpContributions.
lmpContributions
Reports the contribution of each effect to the total variance, as well as the contribution of each PC to the total variance per effect. These contributions are also summarized in a barplot.
lmpScorePlot
Draws the score plots of each effect matrix provided in the lmpPcaEffects function output.
lmpLoading1dPlot or lmpLoading2dPlot
Plots the loadings as a line plot (1D) or in 2D as a scatterplot.
lmpScoreScatterPlotM
Plots the scores of all model effects simultaneously in a scatterplot matrix. By default, the first PC only is kept for each model effect.
lmpEffectPlot
Plots the ASCA scores by effect levels for a given model effect and for one PC at a time. This graph is especially appealing to interpret interactions or combined effects.
Other useful functions to visualise and explore by PCA the multivariate data are:
plotDesign
Provides a graphical representation of the experimental design. It allows visualizing factor levels and checking the design balance.
plotScatter
Produces a plot describing the relationship between two columns of the outcomes matrix. It allows choosing colors and symbols for the levels of the design factors. Ellipses, polygons or segments can be added to group different sets of points on the graph.
plotScatterM
Produces a scatter plot matrix between the selected columns of the outcomes matrix choosing specific colors and symbols for up to four factors from the design on the upper and lower diagonals.
plotMeans
Draws, for a given response variable, a plot of the response means by levels of up to three categorical factors from the design. When the design is balanced, it allows visualizing main effects or interactions for the response of interest. For unbalanced designs, this plot must be used with caution.
plotLine
Generates the response profile of one or more observations i.e. plots of one or more rows of the outcomes matrix on the y-axis against the m response variables on the x-axis. Depending on the response type (spectra, gene expression...), point, line or segment plots can be used.
pcaBySvd
Performs a principal component analysis on the outcome/response matrix by a singular value decomposition. Outputs can be visualised with the functions pcaScorePlot, pcaLoading1dPlot, pcaLoading2dPlot and pcaScreePlot.
pcaScorePlot
Produces score plots from the pcaBySvd output.
pcaLoading1dPlot or pcaLoading2dPlot
Plots the PCA loadings as a line plot (1D) or in 2D as a scatterplot.
pcaScreePlot
Returns a bar plot of the percentage of variance explained by each Principal Component (PC) calculated by pcaBySvd.
Package: | limpca |
Type: | Package |
License: | GPL-2 |
See the package vignettes (vignette(package = "limpca")) for detailed case studies.
Thiel, M., Benaiche, N., Martin, M., Franceschini, S., Van Oirbeek, R., Govaerts, B. (2023). limpca: An R package for the linear modeling of high-dimensional designed data based on ASCA/APCA family of methods. Journal of Chemometrics. e3482. https://doi.org/10.1002/cem.3482
Martin, M. (2020). Uncovering informative content in metabolomics data: from pre-processing of 1H NMR spectra to biomarkers discovery in multifactorial designs. Prom.: Govaerts, B. PhD thesis. Institut de statistique, biostatistique et sciences actuarielles, UCLouvain, Belgium. http://hdl.handle.net/2078.1/227671
Thiel, M., Feraud, B. and Govaerts, B. (2017). ASCA+ and APCA+: Extensions of ASCA and APCA in the analysis of unbalanced multifactorial designs. Journal of Chemometrics. 31:e2895. https://doi.org/10.1002/cem.2895
Tests the significance of the effects from the model using bootstrap. This function is based on the outputs of lmpEffectMatrices. Tests on combined effects are also provided.
lmpBootstrapTests( resLmpEffectMatrices, nboot = 100, nCores = 2, verbose = FALSE )
resLmpEffectMatrices |
A list of 12 from |
nboot |
An integer with the number of bootstrap sample to be drawn. |
nCores |
The number of cores to use for parallel execution. |
verbose |
If |
A list with the following elements:
f.obs
A vector of size F (number of effects in the model) with the F statistics for each model term calculated on the initial data.
f.boot
A b × F matrix with the F statistics calculated on the bootstrap samples, where b is the number of bootstrap samples.
p.values
A vector of size F with the p-value for each model effect.
resultsTable
A 2 × F matrix with the p-value and the percentage of variance for each model effect.
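How f.obs and f.boot could lead to p.values can be sketched with simulated numbers. The rule used here (the share of bootstrap F statistics at least as large as the observed one) is an assumption made for illustration, and the effect names and values are invented; the actual resampling scheme of lmpBootstrapTests is described in the references below.

```r
set.seed(1)
nboot <- 200                       # b, number of bootstrap samples
effects <- c("A", "B")             # hypothetical effect names

# Invented inputs shaped like the documented outputs
f.obs <- c(A = 8.0, B = 0.9)                           # vector of size F
f.boot <- matrix(rf(nboot * 2, df1 = 2, df2 = 20),     # b x F matrix
                 nrow = nboot, dimnames = list(NULL, effects))

# Assumed rule: proportion of bootstrap statistics >= the observed statistic
p.values <- vapply(effects,
                   function(e) mean(f.boot[, e] >= f.obs[[e]]),
                   numeric(1))
p.values
```

With a small nboot the attainable p-values are coarse (multiples of 1/nboot), which is one reason to draw more bootstrap samples than the 10 used in quick examples.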
Thiel, M., Feraud, B. and Govaerts, B. (2017). ASCA+ and APCA+: Extensions of ASCA and APCA in the analysis of unbalanced multifactorial designs. Journal of Chemometrics. 31:e2895. https://doi.org/10.1002/cem.2895
Thiel, M., Benaiche, N., Martin, M., Franceschini, S., Van Oirbeek, R., Govaerts, B. (2023). limpca: An R package for the linear modeling of high-dimensional designed data based on ASCA/APCA family of methods. Journal of Chemometrics. e3482. https://doi.org/10.1002/cem.3482
data("UCH")
resLmpModelMatrix <- lmpModelMatrix(UCH)
resLmpEffectMatrices <- lmpEffectMatrices(resLmpModelMatrix = resLmpModelMatrix)

res <- lmpBootstrapTests(
    resLmpEffectMatrices = resLmpEffectMatrices,
    nboot = 10, nCores = 2, verbose = TRUE
)
Reports the contribution of each effect to the total variance, but also the contribution of each PC to the total variance per effect. These contributions are also summarized in a barplot.
lmpContributions(resLmpPcaEffects, nPC = 5)
resLmpPcaEffects |
A list corresponding to the output value of |
nPC |
The number of Principal Components to display. |
A list of:
totalContribTable
Table of the percentage of contribution of each effect to the total variance.
effectTable
Table of the percentage of variance explained by each principal component in each model effect decomposition.
contribTable
Table of the percentage of variance explained by each principal component of each effect reported to the percentage contribution of the given effect to the total variance.
combinedEffectTable
Equivalent of the effectTable for combined effects.
plotTotal
Plot of the ordered contributions of totalContribTable.
plotContrib
Plot of the ordered contributions of contribTable.
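The relation between the three tables can be illustrated with invented percentages. The sketch assumes that contribTable is obtained by rescaling each row of effectTable by the corresponding effect's share of the total variance; the numbers and effect names are hypothetical.

```r
# Invented percentages, for illustration only
totalContrib <- c(Hippurate = 40, Citrate = 20, Residuals = 40)  # % of total variance per effect
effectTable <- rbind(                                            # % of each effect's variance per PC
  Hippurate = c(PC1 = 90, PC2 = 10),
  Citrate   = c(PC1 = 70, PC2 = 30),
  Residuals = c(PC1 = 50, PC2 = 50)
)

# Assumed relation: each row is rescaled by the effect's contribution to the total
contribTable <- effectTable * totalContrib / 100
contribTable
```

Under this assumption the first PC of the Hippurate effect carries 90% of 40%, i.e. 36% of the total variance, and each row of contribTable sums to the total contribution of its effect.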
data("UCH")
resLmpModelMatrix <- lmpModelMatrix(UCH)
resLmpEffectMatrices <- lmpEffectMatrices(resLmpModelMatrix)
resLmpPcaEffects <- lmpPcaEffects(resLmpEffectMatrices, method = "ASCA-E")

lmpContributions(resLmpPcaEffects)
Estimates the model by OLS based on the outcomes and model matrices provided in the outputs of the lmpModelMatrix function and calculates the estimated effect matrices and the residual matrix. It also calculates the type III percentage of variance explained by each effect.
lmpEffectMatrices(resLmpModelMatrix, SS = TRUE, contrastList = NA)
resLmpModelMatrix |
A list of 5 elements from |
SS |
Logical. If |
contrastList |
A list of contrasts for each parameter. If |
A list with the following elements:
lmpDataList
The initial object: a list with outcomes, design and formula.
modelMatrix
A nxp model matrix specifically encoded for the ASCA-GLM method.
modelMatrixByEffect
A list of F+1 model matrices for each effect.
effectsNamesUnique
A character vector with the F+1 names of the model effects, each repeated once.
effectsNamesAll
A character vector with the p names of the model effects ordered and repeated as the column names of the model matrix.
effectMatrices
A list of F+1 effect matrices for each model effect.
predictedvalues
The nxm matrix of predicted outcome values.
residuals
The nxm matrix of model residuals.
parameters
The pxm matrix of the estimated parameters.
type3SS
A vector with the type III sum of squares for each model effect (If SS = TRUE).
variationPercentages
A vector with the percentage of variance for each model effect (If SS = TRUE).
varPercentagesPlot
A ggplot bar plot of the contributions of each model effect to the total variance (If SS = TRUE).
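The link between parameters, effectMatrices, predictedvalues and residuals can be sketched in base R on toy data. This is a minimal illustration of OLS with a sum-coded model matrix, not the package's implementation; the design, factor names and dimensions are invented.

```r
set.seed(42)
# Toy balanced design: two 2-level factors, 8 observations, 5 responses
design <- expand.grid(A = factor(1:2), B = factor(1:2), rep = 1:2)
Y <- matrix(rnorm(8 * 5), nrow = 8)

# Sum-coded n x p model matrix for ~ A + B (intercept, A, B)
X <- model.matrix(~ A + B, data = design,
                  contrasts.arg = list(A = "contr.sum", B = "contr.sum"))
effect_of_col <- c("Intercept", "A", "B")[attr(X, "assign") + 1]

# OLS estimate of the p x m parameter matrix
B_hat <- solve(crossprod(X), crossprod(X, Y))

# One effect matrix per model term: that term's columns times their parameters
effectMatrices <- lapply(split(seq_len(ncol(X)), effect_of_col), function(idx) {
  X[, idx, drop = FALSE] %*% B_hat[idx, , drop = FALSE]
})

predicted <- X %*% B_hat          # n x m predicted values
residuals <- Y - predicted        # n x m residuals
```

By construction the effect matrices add up to the predicted values, so the outcomes decompose exactly into the sum of the effect matrices plus the residual matrix.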
Thiel, M., Feraud, B. and Govaerts, B. (2017). ASCA+ and APCA+: Extensions of ASCA and APCA in the analysis of unbalanced multifactorial designs. Journal of Chemometrics. 31:e2895. https://doi.org/10.1002/cem.2895
data("UCH")
resLmpModelMatrix <- lmpModelMatrix(UCH)
reslmpEffectMatrices <- lmpEffectMatrices(resLmpModelMatrix)
reslmpEffectMatrices$varPercentagesPlot
Plots the ASCA scores by effect levels for a given model effect and for one PC at a time. This graph is especially appealing to interpret interactions or combined effects. It is a wrapper of plotMeans.
lmpEffectPlot( resASCA, effectName, axes = 1, x, z = NULL, w = NULL, hline = 0, ... )
resASCA |
A list corresponding to the ASCA output value of |
effectName |
Name of the effect to be used to plot the scores. |
axes |
A numerical vector with the Principal Components axes to be drawn. |
x |
A character string giving the |
z |
A character string giving the |
w |
A character string giving the |
hline |
If not |
... |
Additional arguments to be passed to |
lmpEffectPlot is a wrapper of plotMeans.
An effect plot (ggplot).
data("UCH")
resLmpModelMatrix <- lmpModelMatrix(UCH)
resLmpEffectMatrices <- lmpEffectMatrices(resLmpModelMatrix)
resASCA <- lmpPcaEffects(
    resLmpEffectMatrices = resLmpEffectMatrices,
    method = "ASCA",
    combineEffects = list(c("Hippurate", "Time", "Hippurate:Time"))
)

# Effect plot for an interaction effect
lmpEffectPlot(resASCA, effectName = "Hippurate:Time", x = "Hippurate", z = "Time")

# Effect plot for a combined effect
lmpEffectPlot(resASCA,
    effectName = "Hippurate+Time+Hippurate:Time",
    x = "Hippurate", z = "Time"
)
Plots the loading vectors for each effect matrix from the lmpPcaEffects outputs with line plots. This is a wrapper of plotLine.
lmpLoading1dPlot(resLmpPcaEffects, effectNames = NULL, axes = c(1, 2), ...)
resLmpPcaEffects |
A list corresponding to the output value of |
effectNames |
Names of the effects to be plotted. if |
axes |
A numerical vector with the Principal Components axes to be drawn. |
... |
Additional arguments to be passed to |
lmpLoading1dPlot is a wrapper of plotLine. See ?plotLine for more information on the additional arguments.
A list of ggplot objects representing the loading plots.
# Example of "spectral" type loadings (line and numerical x-axis)
data("UCH")
resLmpModelMatrix <- lmpModelMatrix(UCH)
resLmpEffectMatrices <- lmpEffectMatrices(resLmpModelMatrix)
resASCA <- lmpPcaEffects(resLmpEffectMatrices,
    combineEffects = list(c("Time", "Hippurate:Time"))
)
lmpLoading1dPlot(resASCA)
lmpLoading1dPlot(resASCA, effectNames = c("Hippurate", "Citrate"))

# Example of "segment" and discrete type loadings (segments and character x-axis)
data("trout")
resLmpModelMatrix <- lmpModelMatrix(trout)
resLmpEffectMatrices <- lmpEffectMatrices(resLmpModelMatrix)
resASCA <- lmpPcaEffects(resLmpEffectMatrices)
lmpLoading1dPlot(resASCA,
    effectNames = "Day",
    xaxis_type = "character", type = "s", ang_x_axis = 90
)
Draws a 2D loading plot of each effect matrix provided in lmpPcaEffects outputs. As a wrapper of the plotScatter function, it allows the visualization of effect loading matrices for two components at a time with all options available in plotScatter.
lmpLoading2dPlot( resLmpPcaEffects, effectNames = NULL, axes = c(1, 2), addRownames = FALSE, pl_n = 10, metadata = NULL, drawOrigin = TRUE, ... )
resLmpPcaEffects |
A list corresponding to the output value of |
effectNames |
Names of the effects to be plotted. If |
axes |
A numerical vector with the 2 Principal Components axes to be drawn. |
addRownames |
Boolean indicating if the labels should be plotted. By default, uses the column names of the outcome matrix but it can be manually specified with the |
pl_n |
The number of labels that should be plotted, based on the distance measure |
metadata |
A nxk "free encoded" data.frame corresponding to |
drawOrigin |
if |
... |
Additional arguments to be passed to |
lmpLoading2dPlot is a wrapper of plotScatter. See ?plotScatter for more information on the additional arguments.
The distance measure used to rank the variables is computed from the loadings of the two selected Principal Components and their singular values.
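A possible form of such a distance can be sketched in base R. The exact formula is not recoverable from this page, so the expression below (squared loadings weighted by squared singular values, summed over the two selected components, then square-rooted) is an assumption, and all data and variable names are invented.

```r
set.seed(7)
# Toy data: 10 observations, 6 variables, mean-centered
Y <- scale(matrix(rnorm(60), nrow = 10), center = TRUE, scale = FALSE)
colnames(Y) <- paste0("var", 1:6)
sv <- svd(Y)

axes <- c(1, 2)                    # the two selected Principal Components
P <- sv$v[, axes]                  # loadings of the selected components
rownames(P) <- colnames(Y)
lambda <- sv$d[axes]               # their singular values

# Assumed form of the distance: loadings weighted by singular values
d <- sqrt(drop(P^2 %*% lambda^2))

# Variables that would be labelled with, e.g., pl_n = 3
top_labels <- names(sort(d, decreasing = TRUE))[1:3]
top_labels
```

Ranking by a distance of this kind favours variables that lie far from the origin on the components carrying the most variance, which is what one wants when only pl_n labels can be drawn.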
A list of loading plots (ggplot).
data("UCH")
resLmpModelMatrix <- lmpModelMatrix(UCH)
resLmpEffectMatrices <- lmpEffectMatrices(resLmpModelMatrix)
resASCA <- lmpPcaEffects(resLmpEffectMatrices)

lmpLoading2dPlot(resASCA, effectNames = "Hippurate")

# adding color, shape and labels to points
id_hip <- c(seq(126, 156), seq(362, 375))
peaks <- rep("other", ncol(UCH$outcomes))
peaks[id_hip] <- "hip"
metadata <- data.frame(peaks)

lmpLoading2dPlot(resASCA,
    effectNames = "Hippurate",
    metadata = metadata, addRownames = TRUE, color = "peaks",
    shape = "peaks"
)

# changing max.overlaps of ggrepel
options(ggrepel.max.overlaps = 30)
lmpLoading2dPlot(resASCA,
    effectNames = "Hippurate",
    metadata = metadata, addRownames = TRUE, color = "peaks",
    shape = "peaks", pl_n = 20
)
Creates the model matrix X from the design matrix and the model formula.
lmpModelMatrix(lmpDataList)
lmpDataList |
A list containing the outcomes, the experimental design and the formula. |
In a typical ASCA-GLM (ASCA+) analysis, the effects of the GLM model must first be used to transform the design matrix into a model matrix where the design factors are encoded using sum coding, as commonly used in industrial experimental design. Suppose the design matrix is nxk with n observations and k factors. After the transformation, the model matrix will be of size nxp. For a factor with a levels, the sum coding creates a-1 columns in the model matrix with 0 and 1 for the a-1 first levels and -1 for the last one. p is the total number of parameters for each response (outcome) in the ASCA model. More information is available in the article (Thiel et al., 2017). Note that at the moment, only factors can be used as explanatory variables.
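Sum coding can be reproduced with base R's contr.sum and model.matrix. The sketch below uses an invented one-factor design with a = 3 levels; it illustrates the coding scheme only and is not the code used by lmpModelMatrix.

```r
# A design factor with a = 3 levels, n = 6 observations
design <- data.frame(Time = factor(c("T1", "T2", "T3", "T1", "T2", "T3")))

# Sum coding: a - 1 = 2 columns; the first two levels get (1, 0) and (0, 1),
# the last level gets (-1, -1)
contr.sum(3)

# Model matrix: intercept plus the two sum-coded columns (n = 6, p = 3)
X <- model.matrix(~ Time, data = design,
                  contrasts.arg = list(Time = "contr.sum"))
dim(X)
X
```

With a balanced factor each sum-coded column sums to zero, which is what makes the effect estimates deviations from the overall mean.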
A list with the 5 following named elements:
lmpDataList
The initial object: a list with outcomes, design and formula, as outputted by data2LmpDataList.
modelMatrix
A nxp model matrix specifically encoded for the ASCA-GLM method.
modelMatrixByEffect
A list of F+1 model matrices, one for each model effect.
effectsNamesUnique
A character vector with the F+1 names of the model effects, each repeated once.
effectsNamesAll
A character vector with the p names of the model effects ordered and repeated as the column names of the model matrix.
Thiel, M., Feraud, B. and Govaerts, B. (2017). ASCA+ and APCA+: Extensions of ASCA and APCA in the analysis of unbalanced multifactorial designs. Journal of Chemometrics. 31:e2895. https://doi.org/10.1002/cem.2895
data("UCH")
resLmpModelMatrix <- lmpModelMatrix(UCH)
head(resLmpModelMatrix$modelMatrix)
Performs a PCA on each of the effect matrices from the outputs of lmpEffectMatrices. It has an option to choose the method applied: ASCA, APCA or ASCA-E. Combined effects (i.e. linear combinations of original effect matrices) can also be created and decomposed by PCA.
lmpPcaEffects( resLmpEffectMatrices, method = c("ASCA", "APCA", "ASCA-E"), combineEffects = NULL, verbose = FALSE )
resLmpEffectMatrices |
A resLmpEffectMatrices list resulting of |
method |
The method used to compute the PCA. One of |
combineEffects |
If not |
verbose |
If |
The function allows 3 different methods:
ASCA
PCA is applied directly on each pure effect matrix.
ASCA-E
PCA is applied on each pure effect matrix but then the augmented effect matrix is projected in the space of the ASCA components.
APCA
PCA is applied on each augmented effect matrix (the pure effect matrix plus the residual matrix).
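The difference between the three methods can be sketched in base R with an invented effect matrix and residual matrix. The helper `pca` and all object names are hypothetical; this illustrates the general idea under the assumption that the augmented matrix is the pure effect matrix plus the residuals, and it is not the package's code.

```r
set.seed(3)
n <- 12; m <- 6

# Hypothetical pure effect matrix: 3 factor levels x 4 replicates, column-centered
level_means <- matrix(rnorm(3 * m), nrow = 3)
M_f <- scale(level_means[rep(1:3, each = 4), ], center = TRUE, scale = FALSE)
# Hypothetical residual matrix, column-centered
E <- scale(matrix(rnorm(n * m, sd = 0.3), nrow = n), center = TRUE, scale = FALSE)

# PCA by SVD of an already centered matrix
pca <- function(X) {
  s <- svd(X)
  list(scores = s$u %*% diag(s$d), loadings = s$v)
}

asca <- pca(M_f)          # ASCA: PCA of the pure effect matrix
apca <- pca(M_f + E)      # APCA: PCA of the augmented effect matrix

# ASCA-E: keep the ASCA loadings, project the augmented matrix onto them
ascaE_scores <- (M_f + E) %*% asca$loadings
```

In this sketch the ASCA-E scores equal the ASCA scores plus the residuals projected on the ASCA loadings, so the components are defined by the effect alone while the scatter of the points reflects the residual variability.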
A list with, first, the PCA results from pcaBySvd for each effect matrix. Those results contain:
scores
Scores from the PCA for each principal component.
loadings
Loadings from the PCA for each principal component.
eigval
Eigenvalues of each principal component.
singvar
Singular values of each principal component.
var
Explained variances of each principal component.
cumvar
Cumulated explained variances of each principal component.
original.dataset
Original dataset.
There are also other outputs:
lmpDataList
The initial object: a list of outcomes, design and formula.
effectsNamesUnique
A character vector with the F+1 names of the model terms, each repeated once.
method
The dimension reduction method used: c("ASCA","APCA","ASCA-E").
type3SS
A vector with the type III SS for each model term.
variationPercentages
A vector with the percentage of variance explained by each model term.
Thiel, M., Feraud, B. and Govaerts, B. (2017). ASCA+ and APCA+: Extensions of ASCA and APCA in the analysis of unbalanced multifactorial designs. Journal of Chemometrics. 31:e2895. https://doi.org/10.1002/cem.2895
data("UCH")
resLmpModelMatrix <- lmpModelMatrix(UCH)
resLmpEffectMatrices <- lmpEffectMatrices(resLmpModelMatrix)
resLmpPcaEffects <- lmpPcaEffects(resLmpEffectMatrices, method = "ASCA-E")
Draws the score plots of each (augmented) effect matrix provided in lmpPcaEffects. As a wrapper of the plotScatter function, it allows the visualization of the scores of the effect matrices for two components at a time with all the options available in plotScatter.
lmpScorePlot(resLmpPcaEffects, effectNames = NULL, axes = c(1, 2), ...)
resLmpPcaEffects |
A list corresponding to the output value of |
effectNames |
Names of the effects to be plotted. If |
axes |
A numerical vector with the 2 Principal Components axes to be drawn. |
... |
Additional arguments to be passed to |
lmpScorePlot is a wrapper of plotScatter.
A list of score plots (ggplot).
data("UCH")
resLmpModelMatrix <- lmpModelMatrix(UCH)
resLmpEffectMatrices <- lmpEffectMatrices(resLmpModelMatrix)

# PCA decomposition of effect matrices (ASCA)
resASCA <- lmpPcaEffects(resLmpEffectMatrices)
# Score plot of Hippurate effect matrix
lmpScorePlot(resASCA,
    effectNames = "Hippurate",
    color = "Hippurate", shape = "Hippurate"
)

# PCA decomposition of augmented effect matrices (APCA)
resAPCA <- lmpPcaEffects(resLmpEffectMatrices, method = "APCA")
# Score plot of Hippurate augmented effect matrix
lmpScorePlot(resAPCA,
    effectNames = "Hippurate",
    color = "Hippurate", shape = "Hippurate", drawShapes = "ellipse"
)
Plots the scores of all model effects simultaneously in a scatterplot matrix. By default, the first PC only is kept for each model effect and, as a wrapper of plotScatterM, the choice of symbols and colors to distinguish factor levels allows an enriched visualization of the factors' effect on the responses.
lmpScoreScatterPlotM( resLmpPcaEffects, effectNames = NULL, PCdim = NULL, modelAbbrev = FALSE, ... )
resLmpPcaEffects |
A list corresponding to the output value of |
effectNames |
A character vector with the name of the effects to plot. |
PCdim |
A numeric vector with the same length as effectNames, indicating the number of components to plot. |
modelAbbrev |
A logical whether to abbreviate the interaction terms or not. |
... |
Additional arguments to be passed to |
lmpScoreScatterPlotM is a wrapper of plotScatterM.
A matrix of graphs.
data("UCH")
resLmpModelMatrix <- lmpModelMatrix(UCH)
ResLmpEffectMatrices <- lmpEffectMatrices(resLmpModelMatrix)
resLmpPcaEffects <- lmpPcaEffects(ResLmpEffectMatrices, method = "ASCA-E")

lmpScoreScatterPlotM(resLmpPcaEffects,
    varname.colorup = "Citrate",
    varname.pchup = "Hippurate",
    varname.pchdown = "Day",
    varname.colordown = "Time"
)

# advanced setting
lmpScoreScatterPlotM(resLmpPcaEffects,
    modelAbbrev = FALSE,
    effectNames = c("Citrate", "Hippurate", "Hippurate:Citrate"),
    PCdim = c(2, 2, 2),
    varname.colorup = "Citrate",
    vec.colorup = c("red", "blue", "green"),
    varname.pchup = "Hippurate",
    vec.pchup = c(1, 2, 3),
    varname.pchdown = "Day",
    vec.pchdown = c(4, 5),
    varname.colordown = "Time",
    vec.colordown = c("brown", "grey")
)
Provides a barplot of the percentage of variance associated with the PCs of the effect matrices, ordered by importance, based on the outputs of lmpContributions.
lmpScreePlot( resLmpContributions, effectNames = NULL, nPC = 5, theme = theme_bw() )
resLmpContributions |
A resLmpContributions list from the function |
effectNames |
Names of the effects to be plotted. if |
nPC |
An integer with the number of components to plot. |
theme |
|
A scree plot (ggplot).
data("UCH")
resLmpModelMatrix <- lmpModelMatrix(UCH)
resLmpEffectMatrices <- lmpEffectMatrices(resLmpModelMatrix)
resASCAE <- lmpPcaEffects(resLmpEffectMatrices, method = "ASCA-E")
resLmpContributions <- lmpContributions(resASCAE)

lmpScreePlot(resLmpContributions, effectNames = "Hippurate:Citrate", nPC = 4)
Performs a Principal Component Analysis on the Y outcome/response matrix by a Singular Value Decomposition (the pre-processing involves the mean-centering of Y). Outputs are represented with the functions pcaScorePlot, pcaLoading1dPlot, pcaLoading2dPlot and pcaScreePlot.
pcaBySvd(Y = NULL, lmpDataList = NULL, nPC = min(dim(Y)))
Y |
The |
lmpDataList |
A list with outcomes, design and formula, as outputted by |
nPC |
Number of Principal Components to extract. |
A list containing the following elements:
scores
Scores
loadings
Loadings
eigval
Eigenvalues
singvar
Singular values
var
Explained variances
cumvar
Cumulated explained variances
original.dataset
Original dataset
design
Design of the study
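The quantities listed above can be reproduced on toy data with base R's svd. This is a minimal sketch of PCA by SVD after mean-centering, not the code of pcaBySvd; in particular, taking the eigenvalues as the squared singular values is an assumption about scaling, and the object names are chosen only to mirror the documented elements.

```r
set.seed(11)
Y <- matrix(rnorm(20 * 4), nrow = 20)          # toy outcome matrix: 20 observations, 4 variables

Yc <- scale(Y, center = TRUE, scale = FALSE)   # mean-centering of Y
s <- svd(Yc)

singvar  <- s$d                                # singular values
eigval   <- s$d^2                              # assumed here: squared singular values
scores   <- s$u %*% diag(s$d)                  # n x nPC scores
loadings <- s$v                                # m x nPC loadings
var_pct  <- 100 * eigval / sum(eigval)         # explained variance per component (%)
cumvar   <- cumsum(var_pct)                    # cumulated explained variance (%)
```

Multiplying the scores by the transposed loadings gives back the centered matrix, and the scores agree with those of stats::prcomp up to the sign of each component.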
data("UCH")
PCA.res1 <- pcaBySvd(Y = UCH$outcomes)
PCA.res2 <- pcaBySvd(lmpDataList = UCH)
identical(PCA.res1, PCA.res2)
Plots the loading vectors from pcaBySvd output with different available line types.
pcaLoading1dPlot(resPcaBySvd, axes = c(1, 2), title = "PCA loading plot", ...)
resPcaBySvd |
A list corresponding to the output value of |
axes |
A numerical vector of length 2 with the Principal Components axes to be drawn. |
title |
Plot title. |
... |
Additional arguments to be passed to |
pcaLoading1dPlot is a wrapper of plotLine. See ?plotLine for more information on the additional arguments.
A ggplot2 object with the PCA loading plot.
data("UCH")
ResPCA <- pcaBySvd(UCH$outcomes)
pcaLoading1dPlot(
  resPcaBySvd = ResPCA, axes = c(1, 2),
  title = "PCA loading plot UCH", xlab = "ppm", ylab = "Values"
)
Produces 2D loading plots from pcaBySvd output, with the same graphical options as plotScatter since it is a wrapper of that function.
pcaLoading2dPlot( resPcaBySvd, axes = c(1, 2), title = "PCA loading plot", addRownames = FALSE, pl_n = 10, metadata = NULL, drawOrigin = TRUE, ... )
resPcaBySvd |
A list corresponding to the output value of |
axes |
A numerical vector of length 2 with the Principal Components axes to be drawn. |
title |
Plot title. |
addRownames |
Boolean indicating if the labels should be plotted. By default, uses the row names of the loadings matrix but it can be manually specified with the |
pl_n |
The number of labels that should be plotted, based on a distance measure (see Details). |
metadata |
A |
drawOrigin |
if |
... |
Additional arguments to be passed to |
pcaLoading2dPlot is a wrapper of plotScatter. See ?plotScatter for more information on the additional arguments.
The distance measure $d$ that is used to rank the variables is based on the following formula:
$$d = \sqrt{P_{ab}^2 \, \lambda_{ab}^2}$$
where $a$ and $b$ are two selected Principal Components, $P_{ab}$ represents their loadings and $\lambda_{ab}$ their singular values.
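One plausible reading of this ranking is sketched below on simulated data: each variable gets the norm of its loadings on the two selected components, weighted by the corresponding singular values, and the pl_n largest are labelled. The way the two components are combined is an assumption made for illustration, not a statement of how pcaLoading2dPlot computes it.

```r
# Sketch only: rank variables by a singular-value-weighted loading norm.
set.seed(42)
Y <- matrix(rnorm(30 * 8), nrow = 30,
            dimnames = list(NULL, paste0("var", 1:8)))
sv <- svd(scale(Y, center = TRUE, scale = FALSE))

axes <- c(1, 2)              # the two selected Principal Components
P <- sv$v[, axes]            # their loadings (one row per variable)
rownames(P) <- colnames(Y)
lambda <- sv$d[axes]         # their singular values

# Assumed combination: square root of the sum of squared weighted loadings
d <- sqrt(rowSums(sweep(P, 2, lambda, `*`)^2))

pl_n <- 3                    # number of labels to keep
top_vars <- names(sort(d, decreasing = TRUE))[seq_len(pl_n)]
top_vars
```

Weighting by the singular values means that a given loading counts for more on a component that carries more variance.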
A ggplot2 object with the PCA loading plot.
data("UCH")
ResPCA <- pcaBySvd(UCH$outcomes)
pcaLoading2dPlot(
  resPcaBySvd = ResPCA, axes = c(1, 2),
  title = "PCA loading plot UCH"
)

# adding color, shape and labels to points
id_cit <- seq(446, 459)
id_hip <- c(seq(126, 156), seq(362, 375))
peaks <- rep("other", ncol(UCH$outcomes))
peaks[id_hip] <- "hip"
peaks[id_cit] <- "cit"
metadata <- data.frame(peaks)
pcaLoading2dPlot(
  resPcaBySvd = ResPCA, axes = c(1, 2),
  title = "PCA loading plot UCH", metadata = metadata,
  color = "peaks", shape = "peaks", addRownames = TRUE
)

# changing max.overlaps of ggrepel
options(ggrepel.max.overlaps = 30)
pcaLoading2dPlot(
  resPcaBySvd = ResPCA, axes = c(1, 2),
  title = "PCA loading plot UCH", metadata = metadata,
  color = "peaks", shape = "peaks", addRownames = TRUE, pl_n = 35
)
Produces score plots from pcaBySvd output, with the same graphical options as plotScatter since it is a wrapper of that function.
pcaScorePlot( resPcaBySvd, design = NULL, axes = c(1, 2), title = "PCA score plot", points_labs_rn = FALSE, ... )
resPcaBySvd |
A list corresponding to the output value of |
design |
A |
axes |
A numerical vector of length 2 with the Principal Components axes to be drawn. |
title |
Plot title. |
points_labs_rn |
Boolean indicating if the rownames of the scores matrix should be plotted. |
... |
Additional arguments to be passed to |
pcaScorePlot is a wrapper of plotScatter. See ?plotScatter for more information on the additional arguments.
A ggplot2 PCA score plot.
data("UCH")

# design is explicitly defined
ResPCA <- pcaBySvd(Y = UCH$outcomes)
pcaScorePlot(
  resPcaBySvd = ResPCA, axes = c(1, 2),
  title = "PCA score plot UCH",
  design = UCH$design,
  color = "Hippurate", shape = "Citrate"
)

# design is recovered from lmpDataList through pcaBySvd()
ResPCA <- pcaBySvd(lmpDataList = UCH)
pcaScorePlot(
  resPcaBySvd = ResPCA, axes = c(1, 2),
  title = "PCA score plot UCH",
  color = "Hippurate", shape = "Citrate"
)
Returns a bar plot of the percentage of variance explained by each Principal Component calculated by pcaBySvd.
pcaScreePlot( resPcaBySvd, nPC = 5, title = "PCA scree plot", theme = theme_bw() )
resPcaBySvd |
A list corresponding to the output value of |
nPC |
An integer with the number of Principal Components to plot. |
title |
Plot title. |
theme |
|
A ggplot2 PCA scree plot.
data("UCH")
resPCA <- pcaBySvd(UCH$outcomes)
pcaScreePlot(resPCA, nPC = 4)
Provides a graphical representation of the experimental design. It makes it possible to visualize the factor levels and check the design balance.
plotDesign( design = NULL, lmpDataList = NULL, x = NULL, y = NULL, rows = NULL, cols = NULL, title = "Plot of the design", theme = theme_bw() )
design |
A data.frame representing the |
lmpDataList |
If not |
x |
By default, the first column of |
y |
By default, the second column of |
rows |
By default, the fourth column of |
cols |
By default, the third column of |
title |
Plot title. |
theme |
The |
Either design or lmpDataList needs to be defined. If both are given, the priority goes to design.
The default behavior (parameters x, y, cols and rows are NULL) uses the first four columns of design. If at least one of these arguments is not NULL, the function will only display the non-NULL parameters.
A ggplot2 plot of the design matrix.
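The balance that plotDesign shows graphically can also be checked numerically. The sketch below uses only base R on the mtcars data that also appears in the examples; the is_balanced helper variable is introduced here for illustration and is not part of limpca.

```r
# Sketch only: counting observations per factor combination with base R.
data(mtcars)
design <- data.frame(lapply(mtcars[, c("cyl", "vs", "am")], as.factor))

# Number of observations in each cyl x vs cell
counts <- table(design$cyl, design$vs)
counts

# For these two factors, the design is balanced when all cells share one count
is_balanced <- length(unique(as.vector(counts))) == 1
is_balanced
```

Empty or unequal cells in such a table correspond to the missing or uneven groups visible in the design plot.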
### trout data
data(trout)
plotDesign(design = trout$design, x = "Day", y = "Treatment")
# equivalent to:
plotDesign(lmpDataList = trout, x = "Day", y = "Treatment")

### mtcars
data(mtcars)
library(tidyverse)
df <- mtcars %>%
  dplyr::select(cyl, vs, am, gear, carb) %>%
  as.data.frame() %>%
  dplyr::mutate(across(everything(), as.factor))

# Default behavior: display the 4 first factors in the design
plotDesign(design = df)

# 2 factors
plotDesign(design = df, x = "cyl", y = "vs", cols = NULL, rows = NULL)
# 3 factors
plotDesign(design = df, x = "cyl", y = "vs", cols = NULL, rows = c("am"))
# 4 factors
plotDesign(design = df, x = "cyl", y = "vs", cols = c("gear"), rows = c("am"))
# 5 factors
plotDesign(
  design = df, x = "cyl", y = "vs",
  cols = c("gear"), rows = c("am", "carb")
)
plotDesign(
  design = df, x = "cyl", y = "vs",
  cols = c("vs"), rows = c("am", "carb")
)

### UCH
data("UCH")
plotDesign(design = UCH$design, x = "Hippurate", y = "Citrate", rows = "Day")
Generates the response profile of one or more observations i.e. plots of one or more rows of the outcomes matrix on the y-axis against the response variables on the x-axis. Depending on the response type (spectra, gene expression...), point, line or segment plots can be used.
plotLine( Y = NULL, lmpDataList = NULL, rows = 1, type = c("l", "p", "s"), title = "Line plot", xlab = NULL, ylab = NULL, xaxis_type = c("numeric", "character"), stacked = FALSE, ncol = 1, nrow = NULL, facet_label = NULL, hline = 0, size = 0.5, color = NULL, shape = 1, theme = theme_bw(), ang_x_axis = NULL )
Y |
A numerical matrix containing the rows to be drawn. Can be |
lmpDataList |
If not |
rows |
A vector with either the row name(s) of the |
type |
Type of graph to be drawn: |
title |
Plot title. |
xlab |
If not |
ylab |
If not |
xaxis_type |
The data type of the x-axis: either |
stacked |
Logical. If |
ncol |
If |
nrow |
If |
facet_label |
If |
hline |
If not |
size |
Argument of length 1 giving the points size (if |
color |
If not |
shape |
The points shape (default = |
theme |
The |
ang_x_axis |
If not |
Either Y or lmpDataList needs to be defined. If both are given, the priority goes to Y.
A ggplot2 line plot.
data("UCH")
plotLine(Y = UCH$outcomes)
plotLine(lmpDataList = UCH)

# separate plots
plotLine(Y = UCH$outcomes, rows = seq(1, 8), hline = NULL)
plotLine(Y = UCH$outcomes, rows = seq(1, 8), color = 2)
plotLine(Y = UCH$outcomes, rows = seq(1, 8), ncol = 2)
plotLine(Y = UCH$outcomes, type = "p", rows = seq(1, 8), ncol = 2)

# stacked plots
library(ggplot2)
plotLine(
  Y = UCH$outcomes, rows = seq(1, 1),
  stacked = TRUE, color = "rows"
) +
  scale_color_brewer(palette = "Set1")
For a given response variable, draws a plot of the response means by levels of up to three categorical factors from the design. When the design is balanced, it makes it possible to visualize main effects or interactions for the response of interest. For unbalanced designs, this plot must be interpreted with caution.
plotMeans( Y = NULL, design = NULL, lmpDataList = NULL, cols = NULL, x, z = NULL, w = NULL, title = NULL, xlab = NULL, ylab = NULL, color = NULL, shape = NULL, linetype = NULL, size = 2, hline = NULL, theme = theme_bw() )
Y |
A numerical matrix containing the columns to be drawn. Can be |
design |
A |
lmpDataList |
If not |
cols |
A vector with either the column name(s) of the |
x |
A character string giving the |
z |
A character string giving the |
w |
A character string giving the |
title |
Plot title. |
xlab |
If not |
ylab |
If not |
color |
If not |
shape |
If not |
linetype |
If not |
size |
Points size. |
hline |
If not |
theme |
The |
Either Y or lmpDataList needs to be defined. If both are given, the priority goes to Y.
The same rule applies for design or lmpDataList.
A list of ggplot2 means plot(s).
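The quantity displayed by a means plot is the mean response within each combination of factor levels. A minimal base-R sketch of that computation on simulated data follows; the factor names A and B and the response are hypothetical, and the code is not taken from plotMeans.

```r
# Sketch only: cell means of one response by the levels of two factors.
design <- expand.grid(
  A = factor(c("low", "high")),
  B = factor(c("t1", "t2")),
  rep = 1:3                      # three replicates per cell (balanced)
)
set.seed(7)
response <- rnorm(nrow(design), mean = ifelse(design$A == "high", 2, 0))

# One mean per A x B combination
cell_means <- aggregate(response ~ A + B,
                        data = cbind(design, response),
                        FUN = mean)
cell_means
```

In a balanced design each of these means rests on the same number of observations, which is why the plot is easier to interpret in that case.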
data("UCH")

# 1 factor
plotMeans(
  Y = UCH$outcomes, design = UCH$design,
  cols = "4.0628702", x = "Hippurate", color = "blue"
)
# equivalent to:
plotMeans(
  lmpDataList = UCH,
  cols = "4.0628702", x = "Hippurate", color = "blue"
)

# 2 factors
plotMeans(
  Y = UCH$outcomes, design = UCH$design,
  cols = c(364, 365), x = "Hippurate", z = "Time",
  shape = c(15, 1)
)

# 3 factors
plotMeans(
  Y = UCH$outcomes, design = UCH$design,
  cols = c(364, 365), x = "Hippurate", z = "Time", w = "Citrate",
  linetype = c(3, 3)
)
Produces a plot describing the relationship between two columns of the outcomes matrix. Colors and symbols can be chosen for the levels of the design factors. Ellipses, polygons or segments can be added to group different sets of points on the graph.
plotScatter( Y = NULL, design = NULL, lmpDataList = NULL, xy, color = NULL, shape = NULL, points_labs = NULL, title = "Scatter plot", xlab = NULL, ylab = NULL, size = 2, size_lab = 3, drawShapes = c("none", "ellipse", "polygon", "segment"), typeEl = c("norm", "t", "euclid"), levelEl = 0.9, alphaPoly = 0.4, theme = theme_bw(), drawOrigin = FALSE )
Y |
A |
design |
A |
lmpDataList |
If not |
xy |
x- and y-axis values: a vector of length 2 with either the column name(s) of the |
color |
If not |
shape |
If not |
points_labs |
If not |
title |
Plot title. |
xlab |
If not |
ylab |
If not |
size |
The points size, by default |
size_lab |
The size of points labels, by default |
drawShapes |
Multiple shapes can be drawn based on the |
typeEl |
The type of ellipse, either |
levelEl |
The confidence level at which to draw an ellipse, by default |
alphaPoly |
The degree of transparency for polygons, by default |
theme |
The |
drawOrigin |
If |
Either Y or lmpDataList needs to be defined. If both are given, the priority goes to Y.
The same rule applies for design or lmpDataList.
A ggplot2 scatter plot.
data("UCH")

# Without the design info
plotScatter(Y = UCH$outcomes, xy = c(453, 369))
# equivalent to:
plotScatter(lmpDataList = UCH, xy = c(453, 369))

# With color and shape
plotScatter(
  lmpDataList = UCH, xy = c(453, 369),
  color = "Hippurate", shape = "Citrate"
)
# equivalent to:
plotScatter(
  Y = UCH$outcomes, design = UCH$design, xy = c(453, 369),
  color = "Hippurate", shape = "Citrate"
)

# With color and shapes
plotScatter(
  Y = UCH$outcomes, design = UCH$design, xy = c(453, 369),
  color = "Hippurate", drawShapes = "ellipse"
)
plotScatter(
  Y = UCH$outcomes, design = UCH$design, xy = c(453, 369),
  color = "Hippurate", drawShapes = "polygon"
)
plotScatter(
  Y = UCH$outcomes, design = UCH$design, xy = c(453, 369),
  color = "Hippurate", drawShapes = "segment"
)

# With customized shapes
library(ggplot2)
plotScatter(
  Y = UCH$outcomes, design = UCH$design, xy = c(453, 369),
  shape = "Hippurate", size = 3
) +
  scale_discrete_identity(aesthetics = "shape", guide = "legend")
plotScatter(
  Y = UCH$outcomes, design = UCH$design, xy = c(453, 369),
  shape = "Hippurate"
) +
  scale_shape_discrete(solid = FALSE)
plotScatter(
  Y = UCH$outcomes, design = UCH$design, xy = c(453, 369),
  shape = "Hippurate"
) +
  scale_shape_manual(values = c(15, 16, 17))

# With labels
plotScatter(
  Y = UCH$outcomes, design = UCH$design, xy = c(453, 369),
  points_labs = rownames(UCH$design)
)
Produces a scatter plot matrix between the selected columns of the outcomes matrix choosing specific colors and symbols for up to four factors from the design on the upper and lower diagonals.
plotScatterM( Y = NULL, design = NULL, lmpDataList = NULL, cols, labelVector = NULL, title = "Scatterplot matrix", varname.colorup = NULL, varname.colordown = NULL, varname.pchup = NULL, varname.pchdown = NULL, vec.colorup = NULL, vec.colordown = NULL, vec.pchup = NULL, vec.pchdown = NULL )
Y |
|
design |
A |
lmpDataList |
If not |
cols |
A vector with either the column names of the |
labelVector |
Labels to display on the diagonal. If |
title |
Title of the graph. |
varname.colorup |
A character string with the name of the variable used to color the upper triangle. |
varname.colordown |
A character string with the name of the variable used to color the lower triangle. |
varname.pchup |
A character string with the name of the variable used to mark points for the upper triangle. |
varname.pchdown |
A character string with the name of the variable used to mark points for the lower triangle. |
vec.colorup |
A color vector (character or numeric) with a length equal to the number of levels of |
vec.colordown |
A color vector (character or numeric) with a length equal to the number of levels of |
vec.pchup |
A symbol vector (character or numeric) with a length equal to the number of levels of |
vec.pchdown |
A symbol vector (character or numeric) with a length equal to the number of levels of |
Either Y or lmpDataList needs to be defined. If both are given, the priority goes to Y.
The same rule applies for design or lmpDataList.
A matrix of scatter plots.
data("UCH")

# basic usage
plotScatterM(Y = UCH$outcomes, design = UCH$design, cols = c(1:4))
# equivalent to:
plotScatterM(lmpDataList = UCH, cols = c(1:4))

# with optional arguments
plotScatterM(
  Y = UCH$outcomes, design = UCH$design, cols = c(1:4),
  varname.colorup = "Hippurate", varname.colordown = "Citrate",
  varname.pchup = "Time", varname.pchdown = "Day",
  vec.colorup = c(2, 4, 1),
  vec.colordown = c("orange", "purple", "green"),
  vec.pchup = c(1, 2), vec.pchdown = c("a", "b")
)
This dataset comes from the study of the modulation of immunity in rainbow trout (Oncorhynchus mykiss) by exposure to cadmium (Cd) combined with polyunsaturated fatty acids (PUFAs) enriched diets [Cornet et al., 2018].
The responses were quantified by measuring the modification of the expression of 15 immune-related genes (m = 15) by RT-qPCR (reverse transcription quantitative polymerase chain reaction). The experiment was carried out on 72 trout and 3 factors were considered in the experimental design:
Day
Measurements on trout were collected on days 28, 70 and 72
Treatment
Four polyunsaturated fatty acid diets: alpha-linolenic acid (ALA), linoleic acid (LA), eicosapentaenoic acid (EPA) and docosahexaenoic acid (DHA)
Exposure
Trout were exposed (level = 2) or not (level = 0) to high cadmium concentrations.
This gives a 3 × 4 × 2 factorial design. Each of the 24 trials corresponds to a different aquarium. Three fish were analysed (3 replicates) for each condition, giving a total of 72 observations.
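The counts stated above can be checked by rebuilding the layout of the design with base R. This is a sketch built from the description only: the level labels are taken from the text, while their coding in the shipped trout$design is an assumption and may differ.

```r
# Sketch only: a 3 x 4 x 2 factorial layout with 3 fish per condition.
conditions <- expand.grid(
  Day = factor(c(28, 70, 72)),
  Treatment = factor(c("ALA", "LA", "EPA", "DHA")),
  Exposure = factor(c(0, 2))
)
nrow(conditions)   # number of experimental conditions (one aquarium each)

# Repeat every condition three times, one row per fish
design <- conditions[rep(seq_len(nrow(conditions)), each = 3), ]
nrow(design)       # total number of observations
```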
In the limpca vignette, some outliers are first detected and removed from the dataset. The data of each aquarium are then log10-transformed and aggregated by their mean, in order to avoid the use of an aquarium random factor in the statistical model. Data are then centered and scaled by column. The ASCA+/APCA+ analysis is then applied to the transformed data.
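The transformation steps of this pre-processing (log10, mean aggregation per aquarium, column centering and scaling) can be sketched in base R on simulated data. The aquarium identifiers, gene names and dimensions below are hypothetical, outlier removal is left out, and the vignette remains the reference for the actual procedure.

```r
# Sketch only: log10 transform, mean per aquarium, then centre and scale.
set.seed(123)
n_aq <- 6      # hypothetical number of aquaria
n_fish <- 3    # fish per aquarium
n_genes <- 4   # hypothetical number of response variables

aquarium <- factor(rep(paste0("A", seq_len(n_aq)), each = n_fish))
expr <- matrix(rlnorm(n_aq * n_fish * n_genes), ncol = n_genes,
               dimnames = list(NULL, paste0("gene", seq_len(n_genes))))

log_expr <- log10(expr)                             # log10 transformation
# Sum per aquarium divided by the (equal) group size gives the group mean
agg <- rowsum(log_expr, group = aquarium) / n_fish
Y <- scale(agg, center = TRUE, scale = TRUE)        # centre and scale by column
dim(Y)
```

After aggregation there is one row per aquarium, so the aquarium no longer needs to enter the model as a random factor.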
data("trout")
A list of length 3: the experimental design, the outcomes for every observation and the formula considered to analyze the data. Caution: the data must first be aggregated before being analyzed with ASCA+ (see details in the related vignette).
outcomes
A dataset with 72 observations and 15 response variables
formula
The suggested formula to analyze the data
design
The experimental design of 72 observations and 4 explanatory variables
The data must first be aggregated before being analyzed with limpca. This removes the hierarchy in the design and allows a classical fixed-effect general linear model to be applied to the data. See more details in the trout vignette (print(vignette(topic = "Trout", package = "limpca"))).
Cornet, V., Ouaach, A., Mandiki, S., Flamion, E., Ferain, A., Van Larebeke, M., Lemaire, B., Reyes Lopez F., Tort, L., Larondelle, Y. and Kestemont, P. (2018). Environmentally-realistic concentration of cadmium combined with polyunsaturated fatty acids enriched diets modulated non-specific immunity in rainbow trout. Aquatic Toxicology, 196, 104–116. https://doi.org/10.1016/j.aquatox.2018.01.012
Benaiche, N. (2022). Stabilisation of the R package LMWiRe – Linear Models for Wide Responses. Prom. : Govaerts, B. Master thesis. Institut de statistique, biostatistique et sciences actuarielles, UCLouvain, Belgium. http://hdl.handle.net/2078.1/thesis:33996
data("trout")
This dataset comes from a 1H NMR analysis of urine of female rats, with hippuric and citric acid added to the samples in different known concentrations.
data("UCH")
A list of length 3: the experimental design ('design'), the outcomes for every observation ('outcomes') and the formula considered to analyze the data ('formula').
design
A data.frame with the experimental design of 34 observations and 5 explanatory variables:
Hippurate: concentration of hippuric acid;
Citrate: concentration of citric acid;
Dilution: dilution, here all the samples are diluted with a dilution rate of 50 %;
Day: for each medium, the preparation of the mixtures was performed in two series;
Time: each mixture or experimental condition was repeated twice.
outcomes
A numerical matrix with 34 observations and 600 response variables
formula
A character string with the suggested formula to analyze the data
The UCH vignette can be accessed with: print(vignette(topic = "UCH", package = "limpca")).
The database has been experimentally created in order to control the spectral locations of the biomarkers to find (i.e. Hippurate and Citrate). This property makes it possible to evaluate the performance of various statistical methods of data analysis.
This urine experimental database is also designed to explore the influence on spectra of intra-sample 1H NMR replications (Time) and inter-day 1H NMR measurements (Day).
The model formula is:
outcomes = Hippurate + Citrate + Time + Hippurate:Citrate + Time:Hippurate + Time:Citrate + Hippurate:Citrate:Time
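This model contains all main effects and all interactions of the three factors. As a short illustration with base R only, the crossing operator * expands to the same set of seven terms; the compact formula below is written for this sketch and is not the formula object stored in UCH.

```r
# Sketch only: expanding a full three-factor crossing into its terms.
full <- outcomes ~ Hippurate * Citrate * Time

# terms() works symbolically, so no data is needed here
term_labels <- attr(terms(full), "term.labels")
term_labels
length(term_labels)
```

R may print an interaction with its factors in a different order (for instance Hippurate:Time rather than Time:Hippurate); both notations denote the same interaction.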
Martin, M. (2020). Uncovering informative content in metabolomics data: from pre-processing of 1H NMR spectra to biomarkers discovery in multifactorial designs. Prom.: Govaerts, B. PhD thesis. Institut de statistique, biostatistique et sciences actuarielles, UCLouvain, Belgium. http://hdl.handle.net/2078.1/227671
data("UCH")
str(UCH)