Title: | A collection of PCA methods |
---|---|
Description: | Provides Bayesian PCA, Probabilistic PCA, Nipals PCA, Inverse Non-Linear PCA and the conventional SVD PCA. A cluster based method for missing value estimation is included for comparison. BPCA, PPCA and NipalsPCA may be used to perform PCA on incomplete data as well as for accurate missing value estimation. A set of methods for printing and plotting the results is also provided. All PCA methods make use of the same data structure (pcaRes) to provide a common interface to the PCA results. Initiated at the Max-Planck Institute for Molecular Plant Physiology, Golm, Germany. |
Authors: | Wolfram Stacklies, Henning Redestig, Kevin Wright |
Maintainer: | Henning Redestig <[email protected]> |
License: | GPL (>= 3) |
Version: | 1.99.0 |
Built: | 2024-11-29 06:56:02 UTC |
Source: | https://github.com/bioc/pcaMethods |
This function can be used to conveniently replace the expression
matrix in an ExpressionSet
with the completed data from a
pcaRes
object.
asExprSet(object, exprSet)
asExprSet(object, exprSet)
object |
|
exprSet |
|
This is not a standard as
function as pcaRes
object alone not can be converted to an ExpressionSet
(the
pcaRes
object does not hold any phenoData
for
example).
An object without missing values of class ExpressionSet
.
Wolfram Stacklies
CAS-MPG Partner Institute for
Computational Biology, Shanghai, China
Visualize two-components simultaneously
## S3 method for class 'pcaRes' biplot(x, choices = 1:2, scale = 1, pc.biplot = FALSE, ...) ## S4 method for signature 'pcaRes' biplot(x, choices = 1:2, scale = 1, pc.biplot = FALSE, ...)
## S3 method for class 'pcaRes' biplot(x, choices = 1:2, scale = 1, pc.biplot = FALSE, ...) ## S4 method for signature 'pcaRes' biplot(x, choices = 1:2, scale = 1, pc.biplot = FALSE, ...)
x |
a pcaRes object |
choices |
which two pcs to plot |
scale |
The variables are scaled by
|
pc.biplot |
If true, use what Gabriel (1971) refers to as a
"principal component biplot", with |
... |
optional arguments to be passed to
|
This is a method for the generic function 'biplot'. There is
considerable confusion over the precise definitions: those of the
original paper, Gabriel (1971), are followed here. Gabriel and
Odoroff (1990) use the same definitions, but their plots actually
correspond to pc.biplot = TRUE
.
a plot is produced on the current graphics device.
Kevin Wright, Adapted from biplot.prcomp
prcomp
, pca
, princomp
data(iris) pcIr <- pca(iris[,1:4]) biplot(pcIr)
data(iris) pcIr <- pca(iris[,1:4]) biplot(pcIr)
Implements a Bayesian PCA missing value estimator. The script is a port of the Matlab version provided by Shigeyuki OBA. See also http://ishiilab.jp/member/oba/tools/BPCAFill.html. BPCA combines an EM approach for PCA with a Bayesian model. In standard PCA data far from the training set but close to the principal subspace may have the same reconstruction error. BPCA defines a likelihood function such that the likelihood for data far from the training set is much lower, even if they are close to the principal subspace.
bpca(Matrix, nPcs = 2, maxSteps = 100, verbose = interactive(), threshold = 1e-04, ...)
bpca(Matrix, nPcs = 2, maxSteps = 100, verbose = interactive(), threshold = 1e-04, ...)
Matrix |
|
nPcs |
|
maxSteps |
|
verbose |
|
threshold |
convergence threshold |
... |
Reserved for future use. Currently no further parameters are used |
Scores and loadings obtained with Bayesian PCA slightly differ from those obtained with conventional PCA. This is because BPCA was developed especially for missing value estimation. The algorithm does not force orthogonality between factor loadings, as a result factor loadings are not necessarily orthogonal. However, the BPCA authors found that including an orthogonality criterion made the predictions worse.
The authors also state that the difference between real and predicted Eigenvalues becomes larger when the number of observation is smaller, because it reflects the lack of information to accurately determine true factor loadings from the limited and noisy data. As a result, weights of factors to predict missing values are not the same as with conventional PCA, but the missing value estimation is improved.
BPCA works iteratively, the complexity is growing with
because several matrix inversions are
required. The size of the matrices to invert depends on the
number of components used for re-estimation.
Finding the optimal number of components for estimation is not a
trivial task; the best choice depends on the internal structure of
the data. A method called kEstimate
is provided to
estimate the optimal number of components via cross validation.
In general few components are sufficient for reasonable estimation
accuracy. See also the package documentation for further
discussion about on what data PCA-based missing value estimation
makes sense.
It is not recommended to use this function directely but rather to use the pca() wrapper function.
There is a difference with respect the interpretation of rows (observations) and columns (variables) compared to matlab implementation. For estimation of missing values for microarray data, the suggestion in the original bpca is to intepret genes as observations and the samples as variables. In pcaMethods however, genes are interpreted as variables and samples as observations which arguably also is the more natural interpretation. For bpca behavior like in the matlab implementation, simply transpose your input matrix.
Details about the probabilistic model underlying BPCA are found in
Oba et. al 2003. The algorithm uses an expectation maximation
approach together with a Bayesian model to approximate the
principal axes (eigenvectors of the covariance matrix in PCA).
The estimation is done iteratively, the algorithm terminates if
either the maximum number of iterations was reached or if the
estimated increase in precision falls below .
Complexity: The relatively high complexity of the method is a result of several matrix inversions required in each step. Considering the case that the maximum number of iteration steps is needed, the approximate complexity is given by the term
Where is the number of rows
containing missing values and
is the
complexity for inverting a matrix of size
. Components is the number of
components used for re-estimation.
Standard PCA result object used by all PCA-based methods
of this package. Contains scores, loadings, data mean and
more. See pcaRes
for details.
Requires MASS
.
Wolfram Stacklies
Shigeyuki Oba, Masa-aki Sato, Ichiro Takemasa, Morito Monden, Ken-ichi Matsubara and Shin Ishii. A Bayesian missing value estimation method for gene expression profile data. Bioinformatics, 19(16):2088-2096, Nov 2003.
ppca
, svdImpute
,
prcomp
, nipalsPca
,
pca
,
pcaRes
. kEstimate
.
## Load a sample metabolite dataset with 5\% missig values (metaboliteData)e data(metaboliteData) ## Perform Bayesian PCA with 2 components pc <- pca(t(metaboliteData), method="bpca", nPcs=2) ## Get the estimated principal axes (loadings) loadings <- loadings(pc) ## Get the estimated scores scores <- scores(pc) ## Get the estimated complete observations cObs <- completeObs(pc) ## Now make a scores and loadings plot slplot(pc)
## Load a sample metabolite dataset with 5\% missig values (metaboliteData)e data(metaboliteData) ## Perform Bayesian PCA with 2 components pc <- pca(t(metaboliteData), method="bpca", nPcs=2) ## Get the estimated principal axes (loadings) loadings <- loadings(pc) ## Get the estimated scores scores <- scores(pc) ## Get the estimated complete observations cObs <- completeObs(pc) ## Now make a scores and loadings plot slplot(pc)
The function contains the actual implementation of the BPCA component estimation. It performs one step of the BPCA EM algorithm. It is called 'maxStep' times from within the main loop in BPCAestimate.
BPCA_dostep(M, y)
BPCA_dostep(M, y)
M |
Data structure containing all needed information. See the source documentation of BPCA_initmodel for details |
y |
Numeric original data matrix |
This function is NOT intended to be run standalone.
Updated version of the data structure
Wolfram Stacklies
Model initialization for Bayesian PCA. This function is NOT inteded to be run separately!
BPCA_initmodel(y, components)
BPCA_initmodel(y, components)
y |
numeric matrix containing missing values. Missing values are denoted as 'NA' |
components |
Number of components used for estimation |
The function calculates the initial Eigenvectors by use of SVD from the complete rows. The data structure M is created and initial values are assigned.
List containing
rows |
Row number of input matrix |
cols |
Column number of input matrix |
comps |
Number of components to use |
yest |
(working variable) current estimate of complete data |
row_miss |
(Array) Indizes of rows containing missing values |
row_nomiss |
(Array) Indices of complete rows (such with no missing values) |
nans |
Matrix of same size as input data. TRUE if |
mean |
Column wise data mean |
PA |
(d x k) Estimated principal axes (eigenvectors, loadings) The matrix ROWS are the vectors |
tau |
Estimated precision of the residual error |
scores |
Estimated scores |
Further elements are: galpha0, balpha0, alpha, gmu0, btau0, gtau0, SigW. These are working variables or constants.
Wolfram Stacklies
Get the centers of the original variables
center(object, ...)
center(object, ...)
object |
pcaRes object |
... |
Not used |
Vector with the centers
Henning Redestig
Check centering was part of the model
centered(object, ...)
centered(object, ...)
object |
pcaRes object |
... |
Not used |
TRUE if model was centered
Henning Redestig
Check a given data matrix for consistency with the format required for further analysis. The data must be a numeric matrix and not contain:
Inf values
NaN values
Rows or columns that consist of NA only
checkData(data, verbose = FALSE)
checkData(data, verbose = FALSE)
data |
|
verbose |
|
isValid |
|
Wolfram Stacklies
Get the original data with missing values replaced with predicted values.
completeObs(object, ...)
completeObs(object, ...)
object |
object to fetch complete data from |
... |
Not used |
Completed data (matrix)
Henning Redestig
Get cross-validation segments that have (as far as possible) the same ratio of all classes (if classes are present)
cvseg(x, fold = 7, seed = NULL)
cvseg(x, fold = 7, seed = NULL)
x |
a factor, character or numeric vector that describes class membership of a set of items, or, a numeric vector indicating unique indices of items, or, a numeric of length 1 that describes the number of items to segment (without any classes) |
fold |
the desired number of segments |
seed |
randomization seed for reproducibility |
a list where each element is a set of indices that defines the CV segment.
Henning Redestig
the cvsegments
function in the pls
package
seg <- cvseg(iris$Species, 10) sapply(seg, function(s) table(iris$Species[s])) cvseg(20, 10)
seg <- cvseg(iris$Species, 10) sapply(seg, function(s) table(iris$Species[s])) cvseg(20, 10)
).Get cross-validation statistics (e.g. ).
cvstat(object, ...)
cvstat(object, ...)
object |
pcaRes object |
... |
not used |
vector CV statistics
Henning Redestig
Replace a diagonal of elements of a matrix with NA
deletediagonals(x, diagonals = 1)
deletediagonals(x, diagonals = 1)
x |
The matrix |
diagonals |
The diagonal to be replaced, i.e. the first, second and so on when looking at the fat version of the matrix (transposed or not) counting from the bottom. Can be a vector to delete more than one diagonal. |
Used for creating artifical missing values in matrices without causing any full row or column to be completely missing
The original matrix with some values missing
Henning Redestig
Later
derrorHierarchic(nlnet, trainIn, trainOut)
derrorHierarchic(nlnet, trainIn, trainOut)
nlnet |
the nlnet |
trainIn |
training data |
trainOut |
fitted data |
derror
Henning Redestig, Matthias Scholz
Dimensions of a PCA model
## S3 method for class 'pcaRes' dim(x)
## S3 method for class 'pcaRes' dim(x)
x |
a pcaRes object |
Get the dimensions of this PCA model
Henning Redestig
Distance to the model of X-space.
DModX(object, dat, newdata=FALSE, type=c("normalized","absolute"), ...)
DModX(object, dat, newdata=FALSE, type=c("normalized","absolute"), ...)
object |
a pcaRes object |
dat |
the original data, taken from |
newdata |
logical indicating if this data was part of the
training data or not. If it was, it is adjusted by a near one factor
|
type |
if absolute or normalized values should be given. Normalized values are adjusted to the the total RSD of the model. |
... |
Not used |
Measures how well described the observations are, i.e. how well they fit in the mode. High DModX indicate a poor fit. Defined as:
For observation , in a model with
components,
variables and
obserations. SSE is the squared sum
of the residuals.
is 1 if model was centered and 0
otherwise. DModX is claimed to be approximately F-distributed and
can therefore be used to check if an observation is significantly
far away from the PCA model assuming normally distributed data.
Pass original data as an argument if the model was calculated with
completeObs=FALSE
.
A vector with distances from observations to the PCA model
Henning Redestig
Introduction to Multi- and Megavariate Data Analysis using Projection Methods (PCA and PLS), L. Eriksson, E. Johansson, N. Kettaneh-Wold and S. Wold, Umetrics 1999, p. 468
data(iris) pcIr <- pca(iris[,1:4]) with(iris, plot(DModX(pcIr)~Species))
data(iris) pcIr <- pca(iris[,1:4]) with(iris, plot(DModX(pcIr)~Species))
Later
errorHierarchic(nlnet, trainIn, trainOut)
errorHierarchic(nlnet, trainIn, trainOut)
nlnet |
The nlnet |
trainIn |
training data |
trainOut |
fitted data |
error
Henning Redestig, Matthias Scholz
Fitted values of a PCA model
## S3 method for class 'pcaRes' fitted(object, data = NULL, nPcs = nP(object), pre = TRUE, post = TRUE, ...) ## S4 method for signature 'pcaRes' fitted(object, data = NULL, nPcs = nP(object), pre = TRUE, post = TRUE, ...)
## S3 method for class 'pcaRes' fitted(object, data = NULL, nPcs = nP(object), pre = TRUE, post = TRUE, ...) ## S4 method for signature 'pcaRes' fitted(object, data = NULL, nPcs = nP(object), pre = TRUE, post = TRUE, ...)
object |
the |
data |
For standard PCA methods this can safely be left null to get scores x loadings but if set, then the scores are obtained by projecting provided data onto the loadings. If data contains missing values the result will be all NA. Non-linear PCA is an exception, here if data is NULL then data is set to the completeObs and propaged through the network. |
nPcs |
The number of PC's to consider |
pre |
pre-process |
post |
unpre-process the final data (add the center back etc to get the final estimate) |
... |
Not used |
This function extracts the fitted values from a pcaResobject. For PCA methods like SVD, Nipals, PPCA etc this is basically just the scores multipled by the loadings and adjusted for pre-processing. for non-linear PCA the original data is propagated through the network to obtain the approximated data.
A matrix representing the fitted data
Henning Redestig
pc <- pca(iris[,1:4], nPcs=4, center=TRUE, scale="uv") sum( (fitted(pc) - iris[,1:4])^2 )
pc <- pca(iris[,1:4], nPcs=4, center=TRUE, scale="uv") sum( (fitted(pc) - iris[,1:4])^2 )
Complete copy of nlpca net object
forkNlpcaNet(nlnet)
forkNlpcaNet(nlnet)
nlnet |
a nlnet |
A copy of the input nlnet
Henning Redestig
Index in hiearchy
getHierarchicIdx(hierarchicNum)
getHierarchicIdx(hierarchicNum)
hierarchicNum |
A number |
...
Henning Redestig, Matthias Scholz
Simulated data set looking like a helix
data(helix)
data(helix)
A matrix containing 1000 observations (rows) and three variables (columns).
Henning Redestig
Matthias Scholz, Fatma Kaplan, Charles L. Guy, Joachim Kopka and Joachim Selbig. - Non-linear PCA: a missing data approach. Bioinformatics 2005 21(20):3887-3895
Perform cross validation to estimate the optimal number of components for missing value estimation. Cross validation is done for the complete subset of a variable.
kEstimate(Matrix, method = "ppca", evalPcs = 1:3, segs = 3, nruncv = 5, em = "q2", allVariables = FALSE, verbose = interactive(), ...)
kEstimate(Matrix, method = "ppca", evalPcs = 1:3, segs = 3, nruncv = 5, em = "q2", allVariables = FALSE, verbose = interactive(), ...)
Matrix |
|
method |
|
evalPcs |
|
segs |
|
nruncv |
|
em |
|
allVariables |
|
verbose |
|
... |
Further arguments to |
The assumption hereby is that variables that are highly correlated in a distinct region (here the non-missing observations) are also correlated in another (here the missing observations). This also implies that the complete subset must be large enough to be representative. For each incomplete variable, the available values are divided into a user defined number of cv-segments. The segments have equal size, but are chosen from a random equal distribution. The non-missing values of the variable are covered completely. PPCA, BPCA, SVDimpute, Nipals PCA, llsImpute an NLPCA may be used for imputation.
The whole cross validation is repeated several times so, depending on the parameters, the calculations can take very long time. As error measure the NRMSEP (see Feten et. al, 2005) or the Q2 distance is used. The NRMSEP basically normalises the RMSD between original data and estimate by the variable-wise variance. The reason for this is that a higher variance will generally lead to a higher estimation error. If the number of samples is small, the variable - wise variance may become an unstable criterion and the Q2 distance should be used instead. Also if variance normalisation was applied previously.
The method proceeds variable - wise, the NRMSEP / Q2 distance is
calculated for each incomplete variable and averaged
afterwards. This allows to easily see for wich set of variables
missing value imputation makes senes and for wich set no
imputation or something like mean-imputation should be used. Use
kEstimateFast
or Q2
if you are not interested in
variable wise CV performance estimates.
Run time may be very high on large data sets. Especially when used
with complex methods like BPCA or Nipals PCA. For PPCA, BPCA,
Nipals PCA and NLPCA the estimation method is called
times as the error for all numbers of principal
components can be calculated at once. For LLSimpute and SVDimpute
this is not possible, and the method is called
times. This should still be fast for
LLSimpute because the method allows to choose to only do the
estimation for one particular variable. This saves a lot of
iterations. Here,
is the number of
variables showing missing values.
As cross validation is done variable-wise, in this function Q2 is
defined on single variables, not on the entire data set. This is
Q2 is calculated as as , where x is the
currently used variable and xe it's estimate. The values are then
averaged over all variables. The NRMSEP is already defined
variable-wise. For a single variable it is then
, where x is the variable and xe it's
estimate, n is the length of x. The variable wise estimation
errors are returned in parameter variableWiseError.
A list with:
bestNPcs |
number of PCs or k for which the minimal average NRMSEP or the maximal Q2 was obtained. |
eError |
an array of of size length(evalPcs). Contains the average error of the cross validation runs for each number of components. |
variableWiseError |
Matrix of size
|
evalPcs |
The evaluated numbers of components or number of neighbours (the same as the evalPcs input parameter). |
variableIx |
Index of the incomplete variables. This can be used to map the variable wise error to the original data. |
Wolfram Stacklies
kEstimateFast, Q2, pca, nni
.
## Load a sample metabolite dataset with 5\% missing values (metaboliteData) data(metaboliteData) # Do cross validation with ppca for component 2:4 esti <- kEstimate(metaboliteData, method = "ppca", evalPcs = 2:4, nruncv=1, em="nrmsep") # Plot the average NRMSEP barplot(drop(esti$eError), xlab = "Components",ylab = "NRMSEP (1 iterations)") # The best result was obtained for this number of PCs: print(esti$bestNPcs) # Now have a look at the variable wise estimation error barplot(drop(esti$variableWiseError[, which(esti$evalPcs == esti$bestNPcs)]), xlab = "Incomplete variable Index", ylab = "NRMSEP")
## Load a sample metabolite dataset with 5\% missing values (metaboliteData) data(metaboliteData) # Do cross validation with ppca for component 2:4 esti <- kEstimate(metaboliteData, method = "ppca", evalPcs = 2:4, nruncv=1, em="nrmsep") # Plot the average NRMSEP barplot(drop(esti$eError), xlab = "Components",ylab = "NRMSEP (1 iterations)") # The best result was obtained for this number of PCs: print(esti$bestNPcs) # Now have a look at the variable wise estimation error barplot(drop(esti$variableWiseError[, which(esti$evalPcs == esti$bestNPcs)]), xlab = "Incomplete variable Index", ylab = "NRMSEP")
This is a simple estimator for the optimal number of componets
when applying PCA or LLSimpute for missing value estimation. No
cross validation is performed, instead the estimation quality is
defined as Matrix[!missing] - Estimate[!missing]. This will give a
relatively rough estimate, but the number of iterations equals the
length of the parameter evalPcs.
Does not work with LLSimpute!!
As error measure the NRMSEP (see Feten et. al, 2005) or the Q2
distance is used. The NRMSEP basically normalises the RMSD
between original data and estimate by the variable-wise
variance. The reason for this is that a higher variance will
generally lead to a higher estimation error. If the number of
samples is small, the gene - wise variance may become an unstable
criterion and the Q2 distance should be used instead. Also if
variance normalisation was applied previously.
kEstimateFast(Matrix, method = "ppca", evalPcs = 1:3, em = "nrmsep", allVariables = FALSE, verbose = interactive(), ...)
kEstimateFast(Matrix, method = "ppca", evalPcs = 1:3, em = "nrmsep", allVariables = FALSE, verbose = interactive(), ...)
Matrix |
|
method |
|
evalPcs |
|
em |
|
allVariables |
|
verbose |
|
... |
Further arguments to |
list |
Returns a list with the elements:
|
Wolfram Stacklies
data(metaboliteData) # Estimate best number of PCs with ppca for component 2:4 esti <- kEstimateFast(t(metaboliteData), method = "ppca", evalPcs = 2:4, em="nrmsep") barplot(drop(esti$eError), xlab = "Components",ylab = "NRMSEP (1 iterations)") # The best k value is: print(esti$minNPcs)
data(metaboliteData) # Estimate best number of PCs with ppca for component 2:4 esti <- kEstimateFast(t(metaboliteData), method = "ppca", evalPcs = 2:4, em="nrmsep") barplot(drop(esti$eError), xlab = "Components",ylab = "NRMSEP (1 iterations)") # The best k value is: print(esti$minNPcs)
The leverages of PCA model indicate how much influence each observation has on the PCA model. Observations with high leverage has caused the principal components to rotate towards them. It can be used to extract both "unimportant" observations as well as picking potential outliers.
## S4 method for signature 'pcaRes' leverage(object)
## S4 method for signature 'pcaRes' leverage(object)
object |
a |
Defined as
The observation leverages as a numeric vector
Henning Redestig
Introduction to Multi- and Megavariate Data Analysis using Projection Methods (PCA and PLS), L. Eriksson, E. Johansson, N. Kettaneh-Wold and S. Wold, Umetrics 1999, p. 466
data(iris) pcIr <- pca(iris[,1:4]) ## versicolor has the lowest leverage with(iris, plot(leverage(pcIr)~Species))
data(iris) pcIr <- pca(iris[,1:4]) ## versicolor has the lowest leverage with(iris, plot(leverage(pcIr)~Species))
Line search for conjugate gradient
lineSearch(nlnet, dw, e0, ttGuess, trainIn, trainOut, verbose)
lineSearch(nlnet, dw, e0, ttGuess, trainIn, trainOut, verbose)
nlnet |
The nlnet |
dw |
.. |
e0 |
.. |
ttGuess |
.. |
trainIn |
Training data |
trainOut |
Fitted data |
verbose |
logical, print messages |
...
Henning Redestig, Matthias Scholz
Linear kernel
linr(x)
linr(x)
x |
datum |
Input value
Henning Redestig, Matthias Scholz
Vector with current valid PCA methods
listPcaMethods(which = c("all", "linear", "nonlinear"))
listPcaMethods(which = c("all", "linear", "nonlinear"))
which |
the type of methods to get. E.g. only get the PCA methods based on the classical model where the fitted data is a direct multiplication of scores and loadings. |
A character vector with the current methods for doing PCA
Henning Redestig
Missing value estimation using local least squares (LLS). First, k variables (for Microarrya data usually the genes) are selected by pearson, spearman or kendall correlation coefficients. Then missing values are imputed by a linear combination of the k selected variables. The optimal combination is found by LLS regression. The method was first described by Kim et al, Bioinformatics, 21(2),2005.
llsImpute(Matrix, k = 10, center = FALSE, completeObs = TRUE, correlation = "pearson", allVariables = FALSE, maxSteps = 100, xval = NULL, verbose = FALSE, ...)
llsImpute(Matrix, k = 10, center = FALSE, completeObs = TRUE, correlation = "pearson", allVariables = FALSE, maxSteps = 100, xval = NULL, verbose = FALSE, ...)
Matrix |
|
k |
|
center |
|
completeObs |
|
correlation |
|
allVariables |
|
maxSteps |
|
xval |
|
verbose |
|
... |
Reserved for parameters used in future version of the algorithm |
Missing values are denoted as NA
It is not recommended
to use this function directely but rather to use the nni() wrapper
function. The methods provides two ways for missing value
estimation, selected by the allVariables
option. The first
one is to use only complete variables for the regression. This is
preferable when the number of incomplete variables is relatively
small.
The second way is to consider all variables as candidates for the regression. Hereby missing values are initially replaced by the columns wise mean. The method then iterates, using the current estimate as input for the regression until the change between new and old estimate falls below a threshold (0.001).
nniRes |
Standard nni (nearest neighbour
imputation) result object of this package. See
|
Each step the generalized inverse of a miss
x k
matrix is calculated. Where miss
is the number of missing
values in variable j and k
the number of neighbours. This
may be slow for large values of k and / or many missing
values. See also help("ginv").
Wolfram Stacklies
Kim, H. and Golub, G.H. and Park, H. - Missing value estimation for DNA microarray gene expression data: local least squares imputation. Bioinformatics, 2005; 21(2):187-198.
Troyanskaya O. and Cantor M. and Sherlock G. and Brown P. and Hastie T. and Tibshirani R. and Botstein D. and Altman RB. - Missing value estimation methods for DNA microarrays. Bioinformatics. 2001 Jun;17(6):520-525.
## Load a sample metabolite dataset (metaboliteData) with already 5\% of ## data missing data(metaboliteData) ## Perform llsImpute using k = 10 ## Set allVariables TRUE because there are very few complete variables result <- llsImpute(metaboliteData, k = 10, correlation="pearson", allVariables=TRUE) ## Get the estimated complete observations cObs <- completeObs(result)
## Load a sample metabolite dataset (metaboliteData) with already 5\% of ## data missing data(metaboliteData) ## Perform llsImpute using k = 10 ## Set allVariables TRUE because there are very few complete variables result <- llsImpute(metaboliteData, k = 10, correlation="pearson", allVariables=TRUE) ## Get the estimated complete observations cObs <- completeObs(result)
stats
Crude way to unmask the function with the same name from
stats
## S4 method for signature 'ANY' loadings(object, ...)
## S4 method for signature 'ANY' loadings(object, ...)
object |
any object |
... |
not used |
The loadings
Henning Redestig
Get loadings from a pcaRes object
## S4 method for signature 'pcaRes' loadings(object, ...)
## S4 method for signature 'pcaRes' loadings(object, ...)
object |
a pcaRes object |
... |
not used |
The loadings as a matrix
Henning Redestig
Get loadings from a pcaRes object
## S3 method for class 'pcaRes' loadings(object, ...)
## S3 method for class 'pcaRes' loadings(object, ...)
object |
a pcaRes object |
... |
not used |
The loadings as a matrix
Henning Redestig
A incomplete subset from a larger metabolite data set. This is the original, complete data set and can be used to compare estimation results created with the also provided incomplete data (called metaboliteData).
A matrix containing 154 observations (rows) and 52 metabolites (columns). The data contains 5% of artificially created uniformly distributed misssing values. The data was created during an in house Arabidopsis coldstress experiment.
Wolfram Stacklies
Matthias Scholz, Fatma Kaplan, Charles L. Guy, Joachim Kopka and Joachim Selbig. - Non-linear PCA: a missing data approach.Bioinformatics 2005 21(20):3887-3895
A complete subset from a larger metabolite data set. This is the original, complete data set and can be used to compare estimation results created with the also provided incomplete data (called metaboliteData). The data was created during an in house Arabidopsis coldstress experiment.
A matrix containing 154 observations (rows) and 52 metabolites (columns).
Wolfram Stacklies
Matthias Scholz, Fatma Kaplan, Charles L. Guy, Joachim Kopka and Joachim Selbig. - Non-linear PCA: a missing data approach.Bioinformatics 2005 21(20):3887-3895
Get the used PCA method
method(object, ...)
method(object, ...)
object |
pcaRes object |
... |
Not used |
The used pca method
Henning Redestig
PCA by non-linear iterative partial least squares
nipalsPca(Matrix, nPcs = 2, varLimit = 1, maxSteps = 5000, threshold = 1e-06, ...)
nipalsPca(Matrix, nPcs = 2, varLimit = 1, maxSteps = 5000, threshold = 1e-06, ...)
Matrix |
Pre-processed (centered, scaled) numerical matrix samples in rows and variables as columns. |
nPcs |
Number of components that should be extracted. |
varLimit |
Optionally the ratio of variance that should be
explained. |
maxSteps |
Defines how many iterations can be done before algorithm should abort (happens almost exclusively when there were some wrong in the input data). |
threshold |
The limit condition for judging if the algorithm
has converged or not, specifically if a new iteration is done if
|
... |
Only used for passing through arguments. |
Can be used for computing PCA on a numeric matrix using either the NIPALS algorithm which is an iterative approach for estimating the principal components extracting them one at a time. NIPALS can handle a small amount of missing values. It is not recommended to use this function directely but rather to use the pca() wrapper function.
A pcaRes
object.
Henning Redestig
Wold, H. (1966) Estimation of principal components and related models by iterative least squares. In Multivariate Analysis (Ed., P.R. Krishnaiah), Academic Press, NY, 391-420.
prcomp
, princomp
, pca
data(metaboliteData) mat <- prep(t(metaboliteData)) pc <- nipalsPca(mat, nPcs=2) ## better use pca() pc <- pca(t(metaboliteData), method="nipals", nPcs=2)
data(metaboliteData) mat <- prep(t(metaboliteData)) pc <- nipalsPca(mat, nPcs=2) ## better use pca() pc <- pca(t(metaboliteData), method="nipals", nPcs=2)
Neural network based non-linear PCA
nlpca(Matrix, nPcs = 2, maxSteps = 2 * prod(dim(Matrix)), unitsPerLayer = NULL, functionsPerLayer = NULL, weightDecay = 0.001, weights = NULL, verbose = interactive(), ...)
nlpca(Matrix, nPcs = 2, maxSteps = 2 * prod(dim(Matrix)), unitsPerLayer = NULL, functionsPerLayer = NULL, weightDecay = 0.001, weights = NULL, verbose = interactive(), ...)
Matrix |
|
nPcs |
|
maxSteps |
|
unitsPerLayer |
The network units, example: c(2,4,6) for two input units 2feature units (principal components), one hidden layer fornon-linearity and three output units (original amount ofvariables). |
functionsPerLayer |
The function to apply at each layer eg. c("linr", "tanh", "linr") |
weightDecay |
Value between 0 and 1. |
weights |
Starting weights for the network. Defaults to uniform random values but can be set specifically to make algorithm deterministic. |
verbose |
|
... |
Reserved for future use. Not passed on anywhere. |
Artificial Neural Network (MLP) for performing non-linear PCA. Non-linear PCA is conceptually similar to classical PCA but theoretically quite different. Instead of simply decomposing our matrix (X) to scores (T) loadings (P) and an error (E) we train a neural network (our loadings) to find a curve through the multidimensional space of X that describes a much variance as possible. Classical ways of interpreting PCA results are thus not applicable to NLPCA since the loadings are hidden in the network. However, the scores of components that lead to low cross-validation errors can still be interpreted via the score plot. Unfortunately this method depend on slow iterations which currently are implemented in R only making this method extremely slow. Furthermore, the algorithm does not by itself decide when it has converged but simply does 'maxSteps' iterations.
Standard PCA result object used by all PCA-basedmethods of
this package. Contains scores, loadings, data meanand more. See
pcaRes
for details.
Based on a matlab script by Matthias Scholz and ported to R by Henning Redestig
Matthias Scholz, Fatma Kaplan, Charles L Guy, Joachim Kopkaand Joachim Selbig. Non-linear PCA: a missing data approach. Bioinformatics, 21(20):3887-3895, Oct 2005
## Data set with three variables where data points constitute a helix data(helix) helixNA <- helix ## not a single complete observation helixNA <- t(apply(helix, 1, function(x) { x[sample(1:3, 1)] <- NA; x})) ## 50 steps is not enough, for good estimation use 1000 helixNlPca <- pca(helixNA, nPcs=1, method="nlpca", maxSteps=50) fittedData <- fitted(helixNlPca, helixNA) plot(fittedData[which(is.na(helixNA))], helix[which(is.na(helixNA))]) ## compared to solution by Nipals PCA which cannot extract non-linear patterns helixNipPca <- pca(helixNA, nPcs=2) fittedData <- fitted(helixNipPca) plot(fittedData[which(is.na(helixNA))], helix[which(is.na(helixNA))])
## Data set with three variables where data points constitute a helix data(helix) helixNA <- helix ## not a single complete observation helixNA <- t(apply(helix, 1, function(x) { x[sample(1:3, 1)] <- NA; x})) ## 50 steps is not enough, for good estimation use 1000 helixNlPca <- pca(helixNA, nPcs=1, method="nlpca", maxSteps=50) fittedData <- fitted(helixNlPca, helixNA) plot(fittedData[which(is.na(helixNA))], helix[which(is.na(helixNA))]) ## compared to solution by Nipals PCA which cannot extract non-linear patterns helixNipPca <- pca(helixNA, nPcs=2) fittedData <- fitted(helixNipPca) plot(fittedData[which(is.na(helixNA))], helix[which(is.na(helixNA))])
Missing values
nmissing(object, ...)
nmissing(object, ...)
object |
pcaRes object |
... |
Not used |
Get the number of missing values
Henning Redestig
Wrapper function for imputation methods based on nearest neighbour clustering. Currently llsImpute only.
nni(object, method = c("llsImpute"), subset = numeric(), ...)
nni(object, method = c("llsImpute"), subset = numeric(), ...)
object |
Numerical matrix with (or an object coercible to
such) with samples in rows and variables as columns. Also takes
|
method |
For convenience one can pass a large matrix but only use the variable specified as subset. Can be colnames or indices. |
subset |
Currently "llsImpute" only. |
... |
Further arguments to the chosen method. |
This method is wrapper function to llsImpute, See documentation
for link{llsImpute}
.
A clusterRes
object. Or a list containing a
clusterRes object as first and an ExpressionSet object as second
entry if the input was of type ExpressionSet.
Wolfram Stacklies
data(metaboliteData) llsRes <- nni(metaboliteData, k=6, method="llsImpute", allGenes=TRUE)
data(metaboliteData) llsRes <- nni(metaboliteData, k=6, method="llsImpute", allGenes=TRUE)
This is a class representation of nearest neighbour imputation (nni) result
Creating Objectsnew("nniRes", completeObs=[the estimated complete
observations], k=[cluster size], nObs=[amount of observations],
nVar=[amount of variables], centered=[was the data centered befor
running LLSimpute], center=[original means], method=[method used
to perform clustering], missing=[amount of NAs])
Slots
"matrix", the estimated complete observations
"numeric", amount of observations
"numeric", amount of variables
"character", the correlation method used (pearson, kendall or spearman)
"logical", data was centered or not
"numeric", the original variable centers
"numeric", cluster size
"character", the method used to perform the clustering
"numeric", the total amount of missing values in original data
Methods
Print function
Wolfram Stacklies
Get the number of observations used to build the PCA model.
nObs(object, ...)
nObs(object, ...)
object |
pcaRes object |
... |
Not used |
Number of observations
Henning Redestig
Get number of PCs
nP(object, ...)
nP(object, ...)
object |
pcaRes object |
... |
not used |
Number of PCs
Henning Redestig
Get number of PCs.
nPcs(object, ...)
nPcs(object, ...)
object |
pcaRes object |
... |
not used |
Number of PCs
Try to use link{nP}
instead since nPcs
tend to
clash with argument names.
Henning Redestig
Get the number of variables used to build the PCA model.
nVar(object, ...)
nVar(object, ...)
object |
pcaRes object |
... |
Not used |
Number of variables
Henning Redestig
Conjugate gradient optimization
optiAlgCgd(nlnet, trainIn, trainOut, verbose = FALSE)
optiAlgCgd(nlnet, trainIn, trainOut, verbose = FALSE)
nlnet |
The nlnet |
trainIn |
Training data |
trainOut |
fitted data |
verbose |
logical, print messages |
...
Henning Redestig, Matthias Scholz
ONB = orth(mat) is an orthonormal basis for the range of matrix mat. That is, ONB' * ONB = I, the columns of ONB span the same space as the columns of mat, and the number of columns of ONB is the rank of mat.
orth(mat, skipInac = FALSE)
orth(mat, skipInac = FALSE)
mat |
matrix to calculate orthonormal base |
skipInac |
do not include components with precision below .Machine$double.eps if TRUE |
orthonormal basis for the range of matrix
Wolfram Stacklies
Perform PCA on a numeric matrix for visualisation, information extraction and missing value imputation.
pca(object, method, nPcs = 2, scale = c("none", "pareto", "vector", "uv"), center = TRUE, completeObs = TRUE, subset = NULL, cv = c("none", "q2"), ...)
pca(object, method, nPcs = 2, scale = c("none", "pareto", "vector", "uv"), center = TRUE, completeObs = TRUE, subset = NULL, cv = c("none", "q2"), ...)
object |
Numerical matrix with (or an object coercible to
such) with samples in rows and variables as columns. Also takes
|
method |
One of the methods reported by
|
nPcs |
Number of principal components to calculate. |
scale |
Scaling, see |
center |
Centering, see |
completeObs |
Sets the |
subset |
A subset of variables to use for calculating the model. Can be column names or indices. |
cv |
character naming a the type of cross-validation to be performed. |
... |
This method is wrapper function for the following set of pca methods:
Uses classical prcomp
. See
documentation for svdPca
.
An iterative method capable of handling small
amounts of missing values. See documentation for
nipalsPca
.
Same as nipals but implemented in R.
An iterative method using a Bayesian model to handle
missing values. See documentation for bpca
.
An iterative method using a probabilistic model to
handle missing values. See documentation for ppca
.
Uses expectation maximation to perform SVD PCA
on incomplete data. See documentation for
svdImpute
.
Scaling and centering is part of the PCA model and handled by
prep
.
A pcaRes
object.
Wolfram Stacklies, Henning Redestig
Wold, H. (1966) Estimation of principal components and related models by iterative least squares. In Multivariate Analysis (Ed., P.R. Krishnaiah), Academic Press, NY, 391-420.
Shigeyuki Oba, Masa-aki Sato, Ichiro Takemasa, Morito Monden, Ken-ichi Matsubara and Shin Ishii. A Bayesian missing value estimation method for gene expression profile data. Bioinformatics, 19(16):2088-2096, Nov 2003.
Troyanskaya O. and Cantor M. and Sherlock G. and Brown P. and Hastie T. and Tibshirani R. and Botstein D. and Altman RB. - Missing value estimation methods for DNA microarrays. Bioinformatics. 2001 Jun;17(6):520-5.
prcomp
, princomp
,
nipalsPca
, svdPca
data(iris) ## Usually some kind of scaling is appropriate pcIr <- pca(iris, method="svd", nPcs=2) pcIr <- pca(iris, method="nipals", nPcs=3, cv="q2") ## Get a short summary on the calculated model summary(pcIr) plot(pcIr) ## Scores and loadings plot slplot(pcIr, sl=as.character(iris[,5])) ## use an expressionset and ggplot data(sample.ExpressionSet) pc <- pca(sample.ExpressionSet) df <- merge(scores(pc), pData(sample.ExpressionSet), by=0) library(ggplot2) ggplot(df, aes(PC1, PC2, shape=sex, color=type)) + geom_point() + xlab(paste("PC1", pc@R2[1] * 100, "% of the variance")) + ylab(paste("PC2", pc@R2[2] * 100, "% of the variance"))
data(iris) ## Usually some kind of scaling is appropriate pcIr <- pca(iris, method="svd", nPcs=2) pcIr <- pca(iris, method="nipals", nPcs=3, cv="q2") ## Get a short summary on the calculated model summary(pcIr) plot(pcIr) ## Scores and loadings plot slplot(pcIr, sl=as.character(iris[,5])) ## use an expressionset and ggplot data(sample.ExpressionSet) pc <- pca(sample.ExpressionSet) df <- merge(scores(pc), pData(sample.ExpressionSet), by=0) library(ggplot2) ggplot(df, aes(PC1, PC2, shape=sex, color=type)) + geom_point() + xlab(paste("PC1", pc@R2[1] * 100, "% of the variance")) + ylab(paste("PC2", pc@R2[2] * 100, "% of the variance"))
Principal Component Analysis in R
Package: | pcaMethods |
Type: | Package |
Developed since: | 2006 |
License: | GPL (>=3) |
LazyLoad: | yes |
Provides Bayesian PCA, Probabilistic PCA, Nipals PCA, Inverse Non-Linear PCA and the conventional SVD PCA. A cluster based method for missing value estimation is included for comparison. BPCA, PPCA and NipalsPCA may be used to perform PCA on incomplete data as well as for accurate missing value estimation. A set of methods for printing and plotting the results is also provided. All PCA methods make use of the same data structure (pcaRes) to provide a unique interface to the PCA results. Developed at the Max-Planck Institute for Molecular Plant Physiology, Golm, Germany, RIKEN Plant Science Center Yokohama, Japan, and CAS-MPG Partner Institute for Computational Biology (PICB) Shanghai, P.R. China
Wolfram Stacklies, Henning Redestig
Lack of relevance for this plot and the fact that it
can not show cross-validation based diagnostics in the same plot
makes it redundant with the introduction of a dedicated
plot
function for pcaRes
. The new plot only shows
R2cum but the result is pretty much the same.
Henning Redestig
This is a class representation of a non-linear PCA neural
network. The nlpcaNet
class is not meant for user-level
usage.
Creating Objects
new("nlpcaNet", net=[the network structure],
hierarchic=[hierarchic design],
fct=[the functions at each layer], fkt=[the functions used for
forward propagation], weightDecay=[incremental decrease of weight
changes over iterations (between 0 and 1)], featureSorting=[sort
features or not], dataDist=[represents the present values],
inverse=[net is inverse mode or not], fCount=[amount of times
features were sorted], componentLayer=[which layer is the
'bottleneck' (principal components)],
erro=[the used error function], gradient=[the used gradient method],
weights=[the present weights],
maxIter=[the amount of iterations that was done], scalingFactor=[the
scale of the original matrix])
Slots
"matrix", matrix showing the representation of the neural network, e.g. (2,4,6) for a network with two features, a hidden layer and six output neurons (original variables).
"list", the hierarchic design of the network, holds 'idx' (), 'var' () and layer (which layer is the principal component layer).
"character", a vector naming the functions that will be applied on each layer. "linr" is linear (i.e.) standard matrix products and "tanh" means that the arcus tangens is applied on the result of the matrix product (for non-linearity).
"character", same as fct but the functions used during back propagation.
"numeric", the value that is used to incrementally decrease the weight changes to ensure convergence.
"logical", indicates if features will be sorted or not. This is used to make the NLPCA assume properties closer to those of standard PCA were the first component is more important for reconstructing the data than the second component.
"matrix", a matrix of ones and zeroes indicating which values will add to the errror.
"logical", network is inverse mode (currently only inverse is supported) or not. Eg. the case when we have truly missing values and wish to impute them.
"integer", Counter for the amount of times features were really sorted.
"numeric", the index of 'net' that is the component layer.
"function", the used error function. Currently only one
is provided errorHierarchic
.
"function", the used gradient function. Currently
only one is provided derrorHierarchic
"list", A list holding managements of the weights. The list has two functions, weights$current() and weights$set() which access a matrix in the local environment of this object.
"integer", the amount of iterations used to train this network.
"numeric", training the network is best made with 'small' values so the original data is scaled down to a suitable range by division with this number.
Methods
Returns the weights in a matrix representation.
Henning Redestig
This is a class representation of a PCA result
Creating Objectsnew("pcaRes", scores=[the scores], loadings=[the loadings],
nPcs=[amount of PCs], R2cum=[cumulative R2], nObs=[amount of
observations], nVar=[amount of variables], R2=[R2 for each
individual PC], sDev=[stdev for each individual PC],
centered=[was data centered], center=[original means],
varLimit=[what variance limit was exceeded], method=[method used to
calculate PCA], missing=[amount of NAs],
completeObs=[estimated complete observations])
Slots
"matrix", the calculated scores
"matrix", the calculated loadings
"numeric", the cumulative R2 values
"numeric", the individual standard deviations of the score vectors
"numeric", the individual R2 values
"numeric", cross-validation statistics
"numeric", number of observations
"numeric", number of variables
"logical", data was centered or not
"numeric", the original variable centers
"logical", data was scaled or not
"numeric", the original variable scales
"numeric", the exceeded variance limit
"numeric", the number of calculated PCs
"character", the method used to perform PCA
"numeric", the total amount of missing values in original data
"matrix", the estimated complete observations
"nlpcaNet", the network used by non-linear PCA
Methods (not necessarily exhaustive)
Print function
Extract information about PC relevance
Plot a barplot of standard deviations for PCs
Make a side by side score and loadings plot
Get the number of PCs
Get the number of observations
Cross-validation statistics
Get the number of variables
Get the loadings
Get the scores
Get the dimensions (number of observations, number of features)
Get a logical indicating if centering was done as part of the model
Get the averages of the original variables.
Get the imputed data set
Get a string naming the used PCA method
Get the standard deviations of the PCs
Get a logical indicating if scaling was done as part of the model
Get the scales of the original variablesb
Get the cumulative R2
Henning Redestig
Plot the computed diagnostics of PCA model to get an idea of their importance. Note though that the standard screeplot shows the standard deviations for the PCs this method shows the R2 values which empirically shows the importance of the P's and is thus applicable for any PCA method rather than just SVD based PCA.
## S3 method for class 'pcaRes' plot(x, y = NULL, main = deparse(substitute(object)), col = gray(c(0.9, 0.5)), ...)
## S3 method for class 'pcaRes' plot(x, y = NULL, main = deparse(substitute(object)), col = gray(c(0.9, 0.5)), ...)
x |
|
y |
not used |
main |
title of the plot |
col |
Colors of the bars |
... |
further arguments to barplot |
If cross-validation was done for the PCA the plot will also show
the CV based statistics. A common rule-of-thumb for determining
the optimal number of PCs is the PC where the CV diagnostic is at
its maximum but not very far from .
None, used for side effect.
Henning Redestig
data(metaboliteData) pc <- pca(t(metaboliteData), nPcs=5, cv="q2", scale="uv") plot(pc)
data(metaboliteData) pc <- pca(t(metaboliteData), nPcs=5, cv="q2", scale="uv") plot(pc)
A function that can be used to visualise many PCs plotted against each other
plotPcs(object, pcs = 1:nP(object), type = c("scores", "loadings"), sl = NULL, hotelling = 0.95, ...)
plotPcs(object, pcs = 1:nP(object), type = c("scores", "loadings"), sl = NULL, hotelling = 0.95, ...)
object |
|
pcs |
|
type |
|
sl |
|
hotelling |
|
... |
Further arguments to |
Uses pairs
to provide side-by-side plots. Note that
this function only plots scores or loadings but not both in the
same plot.
None, used for side effect.
Henning Redestig
prcomp
, pca
, princomp
, slplot
data(iris) pcIr <- pca(iris[,1:4], nPcs=3, method="svd") plotPcs(pcIr, col=as.integer(iris[,4]) + 1)
data(iris) pcIr <- pca(iris[,1:4], nPcs=3, method="svd") plotPcs(pcIr, col=as.integer(iris[,4]) + 1)
Implementation of probabilistic PCA (PPCA). PPCA allows to perform PCA on incomplete data and may be used for missing value estimation. This script was implemented after the Matlab version provided by Jakob Verbeek ( see http://lear.inrialpes.fr/~verbeek/) and the draft “EM Algorithms for PCA and Sensible PCA” written by Sam Roweis.
ppca(Matrix, nPcs = 2, seed = NA, threshold = 1e-05, maxIterations = 1000, ...)
ppca(Matrix, nPcs = 2, seed = NA, threshold = 1e-05, maxIterations = 1000, ...)
Matrix |
|
nPcs |
|
seed |
|
threshold |
Convergence threshold. |
maxIterations |
the maximum number of allowed iterations |
... |
Reserved for future use. Currently no further parameters are used. |
Probabilistic PCA combines an EM approach for PCA with a probabilistic model. The EM approach is based on the assumption that the latent variables as well as the noise are normal distributed.
In standard PCA data which is far from the training set but close to the principal subspace may have the same reconstruction error. PPCA defines a likelihood function such that the likelihood for data far from the training set is much lower, even if they are close to the principal subspace. This allows to improve the estimation accuracy.
A method called kEstimate
is provided to estimate the
optimal number of components via cross validation. In general few
components are sufficient for reasonable estimation accuracy. See
also the package documentation for further discussion on what kind
of data PCA-based missing value estimation is advisable.
Complexity:
Runtime is linear in the number of data,
number of data dimensions and number of principal components.
Convergence: The threshold indicating convergence was changed from 1e-3 in 1.2.x to 1e-5 in the current version leading to more stable results. For reproducability you can set the seed (parameter seed) of the random number generator. If used for missing value estimation, results may be checked by simply running the algorithm several times with changing seed, if the estimated values show little variance the algorithm converged well.
Standard PCA result object used by all PCA-based methods
of this package. Contains scores, loadings, data mean and
more. See pcaRes
for details.
Requires MASS
. It is not recommended to use this
function directely but rather to use the pca() wrapper function.
Wolfram Stacklies
bpca, svdImpute, prcomp,
nipalsPca, pca, pcaRes
.
## Load a sample metabolite dataset with 5\% missing values (metaboliteData) data(metaboliteData) ## Perform probabilistic PCA using the 3 largest components result <- pca(t(metaboliteData), method="ppca", nPcs=3, seed=123) ## Get the estimated complete observations cObs <- completeObs(result) ## Plot the scores plotPcs(result, type = "scores")
## Load a sample metabolite dataset with 5\% missing values (metaboliteData) data(metaboliteData) ## Perform probabilistic PCA using the 3 largest components result <- pca(t(metaboliteData), method="ppca", nPcs=3, seed=123) ## Get the estimated complete observations cObs <- completeObs(result) ## Plot the scores plotPcs(result, type = "scores")
Predict data using PCA model
## S3 method for class 'pcaRes' predict(object, newdata, pcs = nP(object), pre = TRUE, post = TRUE, ...) ## S4 method for signature 'pcaRes' predict(object, newdata, pcs = nP(object), pre = TRUE, post = TRUE, ...)
## S3 method for class 'pcaRes' predict(object, newdata, pcs = nP(object), pre = TRUE, post = TRUE, ...) ## S4 method for signature 'pcaRes' predict(object, newdata, pcs = nP(object), pre = TRUE, post = TRUE, ...)
object |
|
newdata |
|
pcs |
|
pre |
pre-process |
post |
unpre-process the final data (add the center back etc) |
... |
Not passed on anywhere, included for S3 consistency. |
This function extracts the predict values from a pcaRes object for
the PCA methods SVD, Nipals, PPCA and BPCA. Newdata is first
centered if the PCA model was and then scores () and data
(
) is 'predicted' according to :
. Missing values are
set to zero before matrix multiplication to achieve NIPALS like
treatment of missing values.
A list with the following components:
scores |
The predicted scores |
x |
The predicted data |
Henning Redestig
data(iris) hidden <- sample(nrow(iris), 50) pcIr <- pca(iris[-hidden,1:4]) pcFull <- pca(iris[,1:4]) irisHat <- predict(pcIr, iris[hidden,1:4]) cor(irisHat$scores[,1], scores(pcFull)[hidden,1])
data(iris) hidden <- sample(nrow(iris), 50) pcIr <- pca(iris[-hidden,1:4]) pcFull <- pca(iris[,1:4]) irisHat <- predict(pcIr, iris[hidden,1:4]) cor(irisHat$scores[,1], scores(pcFull)[hidden,1])
Scaling and centering a matrix.
prep(object, scale = c("none", "pareto", "vector", "uv"), center = TRUE, eps = 1e-12, simple = TRUE, reverse = FALSE, ...)
prep(object, scale = c("none", "pareto", "vector", "uv"), center = TRUE, eps = 1e-12, simple = TRUE, reverse = FALSE, ...)
object |
Numerical matrix (or an object coercible to such)
with samples in rows and variables as columns. Also takes
|
scale |
One of "UV" (unit variance |
center |
Either a logical which indicates if the matrix
should be mean centred or not, or a vector with averages which
should be suntracted from the matrix. |
eps |
Minimum variance, variable with lower variance are not scaled and warning is issued instead. |
simple |
Logical indicating if only the data should be returned or a list with the pre-processing statistics as well. |
reverse |
Logical indicating if matrix should be 'post-processed' instead by multiplying each column with its scale and adding the center. In this case, center and scale should be vectors with the statistics (no warning is issued if not, instead output becomes the same as input). |
... |
Only used for passing through arguments. |
Does basically the same as scale
but adds some
alternative scaling options and functionality for treating
pre-processing as part of a model.
A pre-processed matrix or a list with
center |
a vector with the estimated centers |
scale |
a vector with the estimated scales |
data |
the pre (or post) processed data |
Henning Redestig
object <- matrix(rnorm(50), nrow=10) res <- prep(object, scale="uv", center=TRUE, simple=FALSE) obj <- prep(object, scale=res$scale, center=res$center) ## same as original sum((object - prep(obj, scale=res$scale, center=res$center, rev=TRUE))^2)
object <- matrix(rnorm(50), nrow=10) res <- prep(object, scale="uv", center=TRUE, simple=FALSE) obj <- prep(object, scale=res$scale, center=res$center) ## same as original sum((object - prep(obj, scale=res$scale, center=res$center, rev=TRUE))^2)
Internal cross-validation can be used for estimating the level of structure in a data set and to optimise the choice of number of principal components.
Q2(object, originalData = completeObs(object), fold = 5, nruncv = 1, type = c("krzanowski", "impute"), verbose = interactive(), variables = 1:nVar(object), ...)
Q2(object, originalData = completeObs(object), fold = 5, nruncv = 1, type = c("krzanowski", "impute"), verbose = interactive(), variables = 1:nVar(object), ...)
object |
A |
originalData |
The matrix (or ExpressionSet) that used to obtain the pcaRes object. |
fold |
The number of groups to divide the data in. |
nruncv |
The number of times to repeat the whole cross-validation |
type |
krzanowski or imputation type cross-validation |
verbose |
|
variables |
indices of the variables to use during cross-validation calculation. Other variables are kept as they are and do not contribute to the total sum-of-squares. |
... |
Further arguments passed to the |
This method calculates for a PCA model. This is the
cross-validated version of
and can be interpreted as the
ratio of variance that can be predicted independently by the PCA
model. Poor (low)
indicates that the PCA model only
describes noise and that the model is unrelated to the true data
structure. The definition of
is:
for the matrix
which has
rows and
columns. For a given
number of PC's x is estimated as
(T are scores
and P are loadings). Although this defines the leave-one-out
cross-validation this is not what is performed if fold is less
than the number of rows and/or columns. In 'impute' type CV,
diagonal rows of elements in the matrix are deleted and the
re-estimated. In 'krzanowski' type CV, rows are sequentially left
out to build fold PCA models which give the loadings. Then,
columns are sequentially left out to build fold models for
scores. By combining scores and loadings from different models, we
can estimate completely left out values. The two types may seem
similar but can give very different results, krzanowski typically
yields more stable and reliable result for estimating data
structure whereas impute is better for evaluating missing value
imputation performance. Note that since Krzanowski CV operates on
a reduced matrix, it is not possible estimate Q2 for all
components and the result vector may therefore be shorter than
nPcs(object)
.
A matrix or vector with estimates.
Henning Redestig, Ondrej Mikula
Krzanowski, WJ. Cross-validation in principal component analysis. Biometrics. 1987(43):3,575-584
data(iris) x <- iris[,1:4] pcIr <- pca(x, nPcs=3) q2 <- Q2(pcIr, x) barplot(q2, main="Krzanowski CV", xlab="Number of PCs", ylab=expression(Q^2)) ## q2 for a single variable Q2(pcIr, x, variables=2) pcIr <- pca(x, nPcs=3, method="nipals") q2 <- Q2(pcIr, x, type="impute") barplot(q2, main="Imputation CV", xlab="Number of PCs", ylab=expression(Q^2))
data(iris) x <- iris[,1:4] pcIr <- pca(x, nPcs=3) q2 <- Q2(pcIr, x) barplot(q2, main="Krzanowski CV", xlab="Number of PCs", ylab=expression(Q^2)) ## q2 for a single variable Q2(pcIr, x, variables=2) pcIr <- pca(x, nPcs=3, method="nipals") q2 <- Q2(pcIr, x, type="impute") barplot(q2, main="Imputation CV", xlab="Number of PCs", ylab=expression(Q^2))
Cumulative R2 is the total ratio of variance that is being explained by the model
## S4 method for signature 'pcaRes' R2cum(object, ...)
## S4 method for signature 'pcaRes' R2cum(object, ...)
object |
a |
... |
Not used |
Get the cumulative R2
Henning Redestig
Flexible calculation of R2 goodness of fit.
## S4 method for signature 'pcaRes' R2VX(object, direction = c("variables", "observations", "complete"), data = completeObs(object), pcs = nP(object))
## S4 method for signature 'pcaRes' R2VX(object, direction = c("variables", "observations", "complete"), data = completeObs(object), pcs = nP(object))
object |
a PCA model object |
direction |
choose between calculating R2 per variable, per observation or for the entire data with 'variables', 'observations' or 'complete'. |
data |
the data used to fit the model |
pcs |
the number of PCs to use to calculate R2 |
A vector with R2 values
Henning Redestig
R2VX(pca(iris))
R2VX(pca(iris))
This function extracts the residuals values from a pcaRes object for the PCA methods SVD, Nipals, PPCA and BPCA
## S3 method for class 'pcaRes' residuals(object, data = completeObs(object), ...) ## S4 method for signature 'pcaRes' residuals(object, data = completeObs(object), ...) ## S4 method for signature 'pcaRes' resid(object, data = completeObs(object), ...)
## S3 method for class 'pcaRes' residuals(object, data = completeObs(object), ...) ## S4 method for signature 'pcaRes' residuals(object, data = completeObs(object), ...) ## S4 method for signature 'pcaRes' resid(object, data = completeObs(object), ...)
object |
|
data |
|
... |
Passed on to |
A matrix
with the residuals
Henning Redestig
data(iris) pcIr <- pca(iris[,1:4]) head(residuals(pcIr, iris[,1:4]))
data(iris) pcIr <- pca(iris[,1:4]) head(residuals(pcIr, iris[,1:4]))
Creates a large matrix B consisting of an M-by-N tiling of copies of A
repmat(mat, M, N)
repmat(mat, M, N)
mat |
numeric matrix |
M |
number of copies in vertical direction |
N |
number of copies in horizontal direction |
Matrix consiting of M-by-N tiling copies of input matrix
Wolfram Stacklies
PCA by non-linear iterative partial least squares
RnipalsPca(Matrix, nPcs = 2, varLimit = 1, maxSteps = 5000, threshold = 1e-06, verbose = interactive(), ...)
RnipalsPca(Matrix, nPcs = 2, varLimit = 1, maxSteps = 5000, threshold = 1e-06, verbose = interactive(), ...)
Matrix |
Pre-processed (centered, scaled) numerical matrix samples in rows and variables as columns. |
nPcs |
Number of components that should be extracted. |
varLimit |
Optionally the ratio of variance that should be
explained. |
maxSteps |
Defines how many iterations can be done before algorithm should abort (happens almost exclusively when there were some wrong in the input data). |
threshold |
The limit condition for judging if the algorithm
has converged or not, specifically if a new iteration is done if
|
verbose |
Show simple progress information. |
... |
Only used for passing through arguments. |
Can be used for computing PCA on a numeric matrix using either the
NIPALS algorithm which is an iterative approach for estimating the
principal components extracting them one at a time. NIPALS can
handle a small amount of missing values. It is not recommended to
use this function directely but rather to use the pca() wrapper
function. There is a C++ implementation given as nipalsPca
which is faster.
A pcaRes
object.
Henning Redestig
Wold, H. (1966) Estimation of principal components and related models by iterative least squares. In Multivariate Analysis (Ed., P.R. Krishnaiah), Academic Press, NY, 391-420.
prcomp
, princomp
, pca
data(metaboliteData) mat <- prep(t(metaboliteData)) ## c++ version is faster system.time(pc <- RnipalsPca(mat, method="rnipals", nPcs=2)) system.time(pc <- nipalsPca(mat, nPcs=2)) ## better use pca() pc <- pca(t(metaboliteData), method="rnipals", nPcs=2)
data(metaboliteData) mat <- prep(t(metaboliteData)) ## c++ version is faster system.time(pc <- RnipalsPca(mat, method="rnipals", nPcs=2)) system.time(pc <- nipalsPca(mat, nPcs=2)) ## better use pca() pc <- pca(t(metaboliteData), method="rnipals", nPcs=2)
This is a PCA implementation robust to outliers in a data set. It
can also handle missing values, it is however NOT intended to be
used for missing value estimation. As it is based on robustSVD we
will get an accurate estimation for the loadings also for
incomplete data or for data with outliers. The returned scores
are, however, affected by the outliers as they are calculated
inputData X loadings. This also implies that you should look at
the returned R2/R2cum values with caution. If the data show
missing values, scores are caluclated by just setting all NA -
values to zero. This is not expected to produce accurate results.
Please have also a look at the manual page for robustSvd
.
Thus this method should mainly be seen as an attempt to integrate
robustSvd()
into the framework of this package. Use one of
the other methods coming with this package (like PPCA or BPCA) if
you want to do missing value estimation. It is not recommended to
use this function directely but rather to use the pca() wrapper
function.
robustPca(Matrix, nPcs = 2, verbose = interactive(), ...)
robustPca(Matrix, nPcs = 2, verbose = interactive(), ...)
Matrix |
|
nPcs |
|
verbose |
|
... |
Reserved for future use. Currently no further parameters are used |
The method is very similar to the standard prcomp()
function. The main difference is that robustSvd()
is used
instead of the conventional svd()
method.
Standard PCA result object used by all PCA-based methods
of this package. Contains scores, loadings, data mean and
more. See pcaRes
for details. are used.
Wolfram Stacklies
robustSvd, svd, prcomp,
pcaRes
.
## Load a complete sample metabolite data set and mean center the data data(metaboliteDataComplete) mdc <- scale(metaboliteDataComplete, center=TRUE, scale=FALSE) ## Now create 5\% of outliers. cond <- runif(length(mdc)) < 0.05; mdcOut <- mdc mdcOut[cond] <- 10 ## Now we do a conventional PCA and robustPca on the original and the data ## with outliers. ## We use center=FALSE here because the large artificial outliers would ## affect the means and not allow to objectively compare the results. resSvd <- pca(mdc, method="svd", nPcs=10, center=FALSE) resSvdOut <- pca(mdcOut, method="svd", nPcs=10, center=FALSE) resRobPca <- pca(mdcOut, method="robustPca", nPcs=10, center=FALSE) ## Now we plot the results for the original data against those with outliers ## We can see that robustPca is hardly effected by the outliers. plot(loadings(resSvd)[,1], loadings(resSvdOut)[,1]) plot(loadings(resSvd)[,1], loadings(resRobPca)[,1])
## Load a complete sample metabolite data set and mean center the data data(metaboliteDataComplete) mdc <- scale(metaboliteDataComplete, center=TRUE, scale=FALSE) ## Now create 5\% of outliers. cond <- runif(length(mdc)) < 0.05; mdcOut <- mdc mdcOut[cond] <- 10 ## Now we do a conventional PCA and robustPca on the original and the data ## with outliers. ## We use center=FALSE here because the large artificial outliers would ## affect the means and not allow to objectively compare the results. resSvd <- pca(mdc, method="svd", nPcs=10, center=FALSE) resSvdOut <- pca(mdcOut, method="svd", nPcs=10, center=FALSE) resRobPca <- pca(mdcOut, method="robustPca", nPcs=10, center=FALSE) ## Now we plot the results for the original data against those with outliers ## We can see that robustPca is hardly effected by the outliers. plot(loadings(resSvd)[,1], loadings(resSvdOut)[,1]) plot(loadings(resSvd)[,1], loadings(resRobPca)[,1])
A robust approximation to the singular value decomposition of a rectangular matrix is computed using an alternating L1 norm (instead of the more usual least squares L2 norm). As the SVD is a least-squares procedure, it is highly susceptible to outliers and in the extreme case, an individual cell (if sufficiently outlying) can draw even the leading principal component toward itself.
robustSvd(x)
robustSvd(x)
x |
A matrix whose SVD decomposition is to be computed. Missing values are allowed. |
See Hawkins et al (2001) for details on the robust SVD algorithm. Briefly, the idea is to sequentially estimate the left and right eigenvectors using an L1 (absolute value) norm minimization.
Note that the robust SVD is able to accomodate missing values in
the matrix x
, unlike the usual svd
function.
Also note that the eigenvectors returned by the robust SVD algorithm are NOT (in general) orthogonal and the eigenvalues need not be descending in order.
The robust SVD of the matrix is x=u d v'.
d |
A
vector containing the singular values of |
u |
A
matrix whose columns are the left singular vectors of |
v |
A matrix whose columns are the right singular vectors of
|
Two differences from the usual SVD may be noted. One relates to orthogonality. In the conventional SVD, all the eigenvectors are orthogonal even if not explicitly imposed. Those returned by the AL1 algorithm (used here) are (in general) not orthogonal. Another difference is that, in the L2 analysis of the conventional SVD, the successive eigen triples (eigenvalue, left eigenvector, right eigenvector) are found in descending order of eigenvalue. This is not necessarily the case with the AL1 algorithm. Hawkins et al (2001) note that a larger eigen value may follow a smaller one.
Kevin Wright, modifications by Wolfram Stacklies
Hawkins, Douglas M, Li Liu, and S Stanley Young (2001) Robust Singular Value Decomposition, National Institute of Statistical Sciences, Technical Report Number 122. http://www.niss.org/technicalreports/tr122.pdf
svd
, nipals
for
an alternating L2 norm method that also accommodates missing data.
## Load a complete sample metabolite data set and mean center the data data(metaboliteDataComplete) mdc <- prep(metaboliteDataComplete, center=TRUE, scale="none") ## Now create 5% of outliers. cond <- runif(length(mdc)) < 0.05; mdcOut <- mdc mdcOut[cond] <- 10 ## Now we do a conventional SVD and a robustSvd on both, the original and the ## data with outliers. resSvd <- svd(mdc) resSvdOut <- svd(mdcOut) resRobSvd <- robustSvd(mdc) resRobSvdOut <- robustSvd(mdcOut) ## Now we plot the results for the original data against those with outliers ## We can see that robustSvd is hardly affected by the outliers. plot(resSvd$v[,1], resSvdOut$v[,1]) plot(resRobSvd$v[,1], resRobSvdOut$v[,1])
## Load a complete sample metabolite data set and mean center the data data(metaboliteDataComplete) mdc <- prep(metaboliteDataComplete, center=TRUE, scale="none") ## Now create 5% of outliers. cond <- runif(length(mdc)) < 0.05; mdcOut <- mdc mdcOut[cond] <- 10 ## Now we do a conventional SVD and a robustSvd on both, the original and the ## data with outliers. resSvd <- svd(mdc) resSvdOut <- svd(mdcOut) resRobSvd <- robustSvd(mdc) resRobSvdOut <- robustSvd(mdcOut) ## Now we plot the results for the original data against those with outliers ## We can see that robustSvd is hardly affected by the outliers. plot(resSvd$v[,1], resSvdOut$v[,1]) plot(resRobSvd$v[,1], resRobSvdOut$v[,1])
Check if scaling was part of the PCA model
scaled(object, ...)
scaled(object, ...)
object |
pcaRes object |
... |
Not used |
TRUE if scaling was part of the PCA model
Henning Redestig
Get the scales (e.g. standard deviations) of the original variables
scl(object, ...)
scl(object, ...)
object |
pcaRes object |
... |
Not used |
Vector with the scales
Henning Redestig
Get scores from a pcaRes object
## S4 method for signature 'pcaRes' scores(object, ...)
## S4 method for signature 'pcaRes' scores(object, ...)
object |
a pcaRes object |
... |
not used |
The scores as a matrix
Henning Redestig
Get scores from a pcaRes object
## S3 method for class 'pcaRes' scores(object, ...)
## S3 method for class 'pcaRes' scores(object, ...)
object |
a pcaRes object |
... |
not used |
The scores as a matrix
Henning Redestig
Get the standard deviations of the scores (indicates their relevance)
sDev(object, ...)
sDev(object, ...)
object |
pcaRes object |
... |
Not used |
Standard devations of the scores
Henning Redestig
Print basic information about pcaRes object
showPcaRes(x, ...) ## S4 method for signature 'pcaRes' print(x, ...) ## S4 method for signature 'pcaRes' show(object)
showPcaRes(x, ...) ## S4 method for signature 'pcaRes' print(x, ...) ## S4 method for signature 'pcaRes' show(object)
x |
a pcaRes object |
... |
not used |
object |
the object to print information about |
nothing, used for its side effect
Henning Redestig
Print a brief description of nniRes model
showNniRes(x, ...)
showNniRes(x, ...)
x |
An |
... |
Not used |
Nothing, used for side-effect
Henning Redestig
Get a confidence ellipse for uncorrelated bivariate data
simpleEllipse(x, y, alfa = 0.95, len = 200)
simpleEllipse(x, y, alfa = 0.95, len = 200)
x |
first variable |
y |
second variable |
alfa |
confidence level of the circle |
len |
Number of points in the circle |
As described in 'Introduction to multi and megavariate data analysis using PCA and PLS' by Eriksson et al. This produces very similar ellipse as compared to the ellipse function the ellipse package except that this function assumes that and y are uncorrelated (which they of are if they are scores or loadings from a PCA).
A matrix with X and Y coordinates for the circle
Henning Redestig
ellipse
A common way of visualizing two principal components
slplot(object, pcs=c(1,2), scoresLoadings=c(TRUE, TRUE), sl="def", ll="def", hotelling=0.95, rug=TRUE, sub=NULL,...)
slplot(object, pcs=c(1,2), scoresLoadings=c(TRUE, TRUE), sl="def", ll="def", hotelling=0.95, rug=TRUE, sub=NULL,...)
object |
a pcaRes object |
pcs |
which two pcs to plot |
scoresLoadings |
Which should be shown scores and or loadings |
sl |
labels to plot in the scores plot |
ll |
labels to plot in the loadings plot |
hotelling |
confidence interval for ellipse in the score plot |
rug |
logical, rug x axis in score plot or not |
sub |
Subtitle, defaults to annotate with amount of explained variance. |
... |
Further arguments to plot functions. Prefix arguments
to |
This method is meant to be used as a quick way to visualize
results, if you want a more specific plot you probably want to
get the scores, loadings with scores(object)
,
loadings(object)
and then design your own plotting method.
None, used for side effect.
Uses layout instead of par to provide side-by-side so it
works with Sweave (but can not be combined with
par(mfrow=..))
Henning Redestig
data(iris) pcIr <- pca(iris[,1:4], scale="uv") slplot(pcIr, sl=NULL, spch=5) slplot(pcIr, sl=NULL, lcex=1.3, scol=as.integer(iris[,5]))
data(iris) pcIr <- pca(iris[,1:4], scale="uv") slplot(pcIr, sl=NULL, spch=5) slplot(pcIr, sl=NULL, lcex=1.3, scol=as.integer(iris[,5]))
Sort the features of NLPCA object
sortFeatures(nlnet, trainIn, trainOut)
sortFeatures(nlnet, trainIn, trainOut)
nlnet |
The nlnet |
trainIn |
Training data in |
trainOut |
Training data after it passed through the net |
...
Henning Redestig
Print a brief description of the PCA model
## S3 method for class 'pcaRes' summary(object, ...)
## S3 method for class 'pcaRes' summary(object, ...)
object |
a pcaRes object |
... |
Not used |
Nothing, used for side-effect
Henning Redestig
This implements the SVDimpute algorithm as proposed by Troyanskaya
et al, 2001. The idea behind the algorithm is to estimate the
missing values as a linear combination of the k
most
significant eigengenes.
svdImpute(Matrix, nPcs = 2, threshold = 0.01, maxSteps = 100, verbose = interactive(), ...)
svdImpute(Matrix, nPcs = 2, threshold = 0.01, maxSteps = 100, verbose = interactive(), ...)
Matrix |
|
nPcs |
|
threshold |
The iteration stops if the change in the matrix falls below this threshold. |
maxSteps |
Maximum number of iteration steps. |
verbose |
Print some output if TRUE. |
... |
Reserved for parameters used in future version of the algorithm |
Missing values are denoted as NA
. It is not recommended
to use this function directely but rather to use the pca() wrapper
function.
As SVD can only be performed on complete matrices, all missing values are initially replaced by 0 (what is in fact the mean on centred data). The algorithm works iteratively until the change in the estimated solution falls below a certain threshold. Each step the eigengenes of the current estimate are calculated and used to determine a new estimate. Eigengenes denote the loadings if pca is performed considering variable (for Microarray data genes) as observations.
An optimal linear combination is found by regressing the
incomplete variable against the k
most significant
eigengenes. If the value at position j
is missing, the
value of the eigengenes is not used when
determining the regression coefficients.
Standard PCA result object used by all PCA-based methods
of this package. Contains scores, loadings, data mean and
more. See pcaRes
for details.
Each iteration, standard PCA (prcomp
) needs to be
done for each incomplete variable to get the eigengenes. This is
usually fast for small data sets, but complexity may rise if the
data sets become very large.
Wolfram Stacklies
Troyanskaya O. and Cantor M. and Sherlock G. and Brown P. and Hastie T. and Tibshirani R. and Botstein D. and Altman RB. - Missing value estimation methods for DNA microarrays. Bioinformatics. 2001 Jun;17(6):520-5.
## Load a sample metabolite dataset with 5\% missing values data(metaboliteData) ## Perform svdImpute using the 3 largest components result <- pca(metaboliteData, method="svdImpute", nPcs=3, center = TRUE) ## Get the estimated complete observations cObs <- completeObs(result) ## Now plot the scores plotPcs(result, type = "scores")
## Load a sample metabolite dataset with 5\% missing values data(metaboliteData) ## Perform svdImpute using the 3 largest components result <- pca(metaboliteData, method="svdImpute", nPcs=3, center = TRUE) ## Get the estimated complete observations cObs <- completeObs(result) ## Now plot the scores plotPcs(result, type = "scores")
A wrapper function for prcomp
to deliver the result as a
pcaRes
method. Supplied for compatibility with the rest
of the pcaMethods package. It is not recommended to use this
function directely but rather to use the pca()
wrapper
function.
svdPca(Matrix, nPcs = 2, varLimit = 1, verbose = interactive(), ...)
svdPca(Matrix, nPcs = 2, varLimit = 1, verbose = interactive(), ...)
Matrix |
Pre-processed (centered and possibly scaled) numerical matrix samples in rows and variables as columns. No missing values allowed. |
nPcs |
Number of components that should be extracted. |
varLimit |
Optionally the ratio of variance that should be
explained. |
verbose |
Verbose complaints to matrix structure |
... |
Only used for passing through arguments. |
A pcaRes
object.
Henning Redestig
prcomp
, princomp
, pca
data(metaboliteDataComplete) mat <- prep(t(metaboliteDataComplete)) pc <- svdPca(mat, nPcs=2) ## better use pca() pc <- pca(t(metaboliteDataComplete), method="svd", nPcs=2)
data(metaboliteDataComplete) mat <- prep(t(metaboliteDataComplete)) pc <- svdPca(mat, nPcs=2) ## better use pca() pc <- pca(t(metaboliteDataComplete), method="svd", nPcs=2)
Simply replace completely missing rows or cols with zeroes.
tempFixNas(mat)
tempFixNas(mat)
mat |
a matrix |
The original matrix with completely missing rows/cols filled with zeroes.
Henning Redestig
Tranform the vectors of weights to matrix structure
## S4 method for signature 'matrix' vector2matrices(object, net)
## S4 method for signature 'matrix' vector2matrices(object, net)
object |
an nlpcaNet |
net |
the neural network |
weights in matrix structure
Henning Redestig
Tranform the vectors of weights to matrix structure
## S4 method for signature 'nlpcaNet' vector2matrices(object)
## S4 method for signature 'nlpcaNet' vector2matrices(object)
object |
an nlpcaNet |
weights in matrix structure
Henning Redestig
Get a matrix with indicating the elements that were missing in the input data. Convenient for estimating imputation performance.
wasna(object, ...)
wasna(object, ...)
object |
pcaRes object |
... |
Not used |
A matrix with logicals
Henning Redestig
data(metaboliteData) data(metaboliteDataComplete) result <- pca(metaboliteData, nPcs=2) plot(completeObs(result)[wasna(result)], metaboliteDataComplete[wasna(result)])
data(metaboliteData) data(metaboliteDataComplete) result <- pca(metaboliteData, nPcs=2) plot(completeObs(result)[wasna(result)], metaboliteDataComplete[wasna(result)])
Create an object that holds the weights for nlpcaNet. Holds and sets weights in using an environment object.
weightsAccount(w)
weightsAccount(w)
w |
|
A weightsAccound with set
and current
functions.
Henning Redestig