Title: | Classification using generalized partial least squares |
---|---|
Description: | Classification using generalized partial least squares for two-group and multi-group (more than two groups) classification. |
Authors: | Beiying Ding |
Maintainer: | Bioconductor Package Maintainer <[email protected]> |
License: | Artistic-2.0 |
Version: | 1.79.0 |
Built: | 2024-11-30 05:18:24 UTC |
Source: | https://github.com/bioc/gpls |
Fit Iteratively ReWeighted Partial Least Squares (IRWPLS), with an option of Firth's bias reduction procedure (IRWPLSF), for two-group classification
glpls1a(X, y, K.prov = NULL, eps = 0.001, lmax = 100, b.ini = NULL, denom.eps = 1e-20, family = "binomial", link = NULL, br = TRUE)
X | n by p design matrix (with no intercept term) |
y | response vector of 0s and 1s |
K.prov | number of PLS components; default is the rank of X |
eps | tolerance for convergence |
lmax | maximum number of iterations allowed |
b.ini | initial values of the regression coefficients |
denom.eps | small quantity to guarantee a nonzero denominator in deciding convergence |
family | glm family; binomial is the only one relevant here |
link | link function; logit is the only one practically implemented now |
br | TRUE if Firth's bias reduction procedure is used |

coefficients | regression coefficients |
convergence | whether convergence was achieved |
niter | total number of iterations |
bias.reduction | whether Firth's procedure was used |
loading.matrix | the matrix of loadings |
Beiying Ding, Robert Gentleman
Ding, B.Y. and Gentleman, R. (2003) Classification using generalized partial least squares.
Marx, B.D. (1996) Iteratively reweighted partial least squares estimation for generalized linear regression. Technometrics 38(4): 374-381.
See also: glpls1a.mlogit, glpls1a.logit.all, glpls1a.train.test.error, glpls1a.cv.error, glpls1a.mlogit.cv.error
x <- matrix(rnorm(20), ncol = 2)
y <- sample(0:1, 10, TRUE)
## no bias reduction
glpls1a(x, y, br = FALSE)
## no bias reduction and 1 PLS component
glpls1a(x, y, K.prov = 1, br = FALSE)
## bias reduction
glpls1a(x, y, br = TRUE)
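The fitted coefficients can be used directly to compute class probabilities. A minimal sketch, assuming (this is only suggested by the no-intercept design-matrix convention above, not stated) that the returned coefficient vector carries the intercept as its first element:

## fitted probabilities from a glpls1a fit (illustrative sketch;
## assumes fit$coefficients is c(intercept, slopes))
fit <- glpls1a(x, y, br = FALSE)
eta <- cbind(1, x) %*% fit$coefficients   # linear predictor
p.hat <- plogis(eta)                      # inverse logit
table(observed = y, predicted = as.numeric(p.hat > 0.5))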
Leave-one-out cross-validation training set classification error for fitting IRWPLS or IRWPLSF model for two-group classification
glpls1a.cv.error(train.X, train.y, K.prov=NULL, eps=1e-3, lmax=100, family="binomial", link="logit", br=T)
train.X | n by p design matrix (with no intercept term) for training set |
train.y | response vector (0 or 1) for training set |
K.prov | number of PLS components; default is the rank of train.X |
eps | tolerance for convergence |
lmax | maximum number of iterations allowed |
family | glm family; binomial is the only one relevant here |
link | link function; logit is the only one practically implemented now |
br | TRUE if Firth's bias reduction procedure is used |

error | LOOCV training error |
error.obs | indices of the misclassified observations |
Beiying Ding, Robert Gentleman
Ding, B.Y. and Gentleman, R. (2003) Classification using generalized partial least squares.
Marx, B.D. (1996) Iteratively reweighted partial least squares estimation for generalized linear regression. Technometrics 38(4): 374-381.
See also: glpls1a.train.test.error, glpls1a.mlogit.cv.error, glpls1a, glpls1a.mlogit, glpls1a.logit.all
x <- matrix(rnorm(20), ncol = 2)
y <- sample(0:1, 10, TRUE)
## no bias reduction
glpls1a.cv.error(x, y, br = FALSE)
## bias reduction and 1 PLS component
glpls1a.cv.error(x, y, K.prov = 1, br = TRUE)
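Because the function returns a scalar LOOCV error, it is convenient for comparing candidate numbers of PLS components. A small sketch (the grid of K values and the simulated data are illustrative):

## scan candidate numbers of PLS components by LOOCV error
set.seed(1)
x <- matrix(rnorm(200), ncol = 4)
y <- sample(0:1, 50, TRUE)
cv.err <- sapply(1:3, function(k)
    glpls1a.cv.error(x, y, K.prov = k, br = TRUE)$error)
which.min(cv.err)   # candidate K with the smallest LOOCV error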
Apply multi-logit Iteratively ReWeighted Partial Least Squares (MIRWPLS), with an option of Firth's bias reduction procedure (MIRWPLSF), for multi-group (say C+1 classes) classification by fitting logit models for all C classes vs the baseline class separately.
glpls1a.logit.all(X, y, K.prov = NULL, eps = 0.001, lmax = 100, b.ini = NULL, denom.eps = 1e-20, family = "binomial", link = "logit", br = T)
X | n by p design matrix (with no intercept term) |
y | response vector with class labels 1 to C+1 for (C+1)-group classification; the baseline class should be 1 |
K.prov | number of PLS components |
eps | tolerance for convergence |
lmax | maximum number of iterations allowed |
b.ini | initial values of the regression coefficients |
denom.eps | small quantity to guarantee a nonzero denominator in deciding convergence |
family | glm family; binomial is the only one relevant here |
link | link function; logit is the only one practically implemented now |
br | TRUE if Firth's bias reduction procedure is used |

coefficients | regression coefficient matrix |
Beiying Ding, Robert Gentleman
Ding, B.Y. and Gentleman, R. (2003) Classification using generalized partial least squares.
Marx, B.D. (1996) Iteratively reweighted partial least squares estimation for generalized linear regression. Technometrics 38(4): 374-381.
See also: glpls1a.mlogit, glpls1a, glpls1a.mlogit.cv.error, glpls1a.train.test.error, glpls1a.cv.error
x <- matrix(rnorm(20), ncol = 2)
y <- sample(1:3, 10, TRUE)
## no bias reduction
glpls1a.logit.all(x, y, br = FALSE)
## bias reduction
glpls1a.logit.all(x, y, br = TRUE)
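The separate class-versus-baseline fits can be combined into class probabilities with the usual baseline-category identities. A hedged sketch, assuming the returned coefficient matrix has one column per non-baseline class and the intercept in its first row (neither is stated above):

## class probabilities from the C separate logit fits (illustrative;
## assumes fit$coefficients is (p+1) x C, intercept first, class 1 baseline)
fit  <- glpls1a.logit.all(x, y, br = TRUE)
eta  <- cbind(1, x) %*% fit$coefficients              # n x C linear predictors
prob <- cbind(1, exp(eta)) / (1 + rowSums(exp(eta)))  # columns: classes 1..C+1
table(observed = y, predicted = max.col(prob))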
Fit multi-logit Iteratively ReWeighted Partial Least Squares (MIRWPLS), with an option of Firth's bias reduction procedure (MIRWPLSF), for multi-group classification
glpls1a.mlogit(x, y, K.prov = NULL, eps = 0.001, lmax = 100, b.ini = NULL, denom.eps = 1e-20, family = "binomial", link = "logit", br = T)
x | n by p design matrix (with intercept term) |
y | response vector with class labels 1 to C+1 for (C+1)-group classification; the baseline class should be 1 |
K.prov | number of PLS components |
eps | tolerance for convergence |
lmax | maximum number of iterations allowed |
b.ini | initial values of the regression coefficients |
denom.eps | small quantity to guarantee a nonzero denominator in deciding convergence |
family | glm family; binomial is the only one relevant here |
link | link function; logit is the only one practically implemented now |
br | TRUE if Firth's bias reduction procedure is used |

coefficients | regression coefficient matrix |
convergence | whether convergence was achieved |
niter | total number of iterations |
bias.reduction | whether Firth's procedure was used |
Beiying Ding, Robert Gentleman
Ding, B.Y. and Gentleman, R. (2003) Classification using generalized partial least squares.
Marx, B.D. (1996) Iteratively reweighted partial least squares estimation for generalized linear regression. Technometrics 38(4): 374-381.
See also: glpls1a, glpls1a.mlogit.cv.error, glpls1a.train.test.error, glpls1a.cv.error
x <- matrix(rnorm(20), ncol = 2)
y <- sample(1:3, 10, TRUE)
## no bias reduction and 1 PLS component
glpls1a.mlogit(cbind(rep(1, 10), x), y, K.prov = 1, br = FALSE)
## bias reduction
glpls1a.mlogit(cbind(rep(1, 10), x), y, br = TRUE)
Leave-one-out cross-validation training set error for fitting MIRWPLS or MIRWPLSF model for multi-group classification
glpls1a.mlogit.cv.error(train.X, train.y, K.prov = NULL, eps = 0.001, lmax = 100, mlogit = T, br = T)
train.X | n by p design matrix (with no intercept term) for training set |
train.y | response vector with class labels 1 to C+1 for (C+1)-group classification; the baseline class should be 1 |
K.prov | number of PLS components |
eps | tolerance for convergence |
lmax | maximum number of iterations allowed |
mlogit | if TRUE, fit the multinomial logit model (MIRWPLS/MIRWPLSF); otherwise fit separate logit models for each class versus the baseline class, as in glpls1a.logit.all |
br | TRUE if Firth's bias reduction procedure is used |

error | LOOCV training error |
error.obs | indices of the misclassified observations |
Beiying Ding, Robert Gentleman
Ding, B.Y. and Gentleman, R. (2003) Classification using generalized partial least squares.
Marx, B.D. (1996) Iteratively reweighted partial least squares estimation for generalized linear regression. Technometrics 38(4): 374-381.
See also: glpls1a.cv.error, glpls1a.train.test.error, glpls1a, glpls1a.mlogit, glpls1a.logit.all
x <- matrix(rnorm(20), ncol = 2)
y <- sample(1:3, 10, TRUE)
## no bias reduction
glpls1a.mlogit.cv.error(x, y, br = FALSE)
glpls1a.mlogit.cv.error(x, y, mlogit = FALSE, br = FALSE)
## bias reduction
glpls1a.mlogit.cv.error(x, y, br = TRUE)
glpls1a.mlogit.cv.error(x, y, mlogit = FALSE, br = TRUE)
Out-of-sample test set error for fitting IRWPLS or IRWPLSF model on the training set for two-group classification
glpls1a.train.test.error(train.X, train.y, test.X, test.y, K.prov=NULL, eps=1e-3, lmax=100, family="binomial", link="logit", br=T)
train.X | n by p design matrix (with no intercept term) for training set |
train.y | response vector (0 or 1) for training set |
test.X | design matrix (with no intercept term) for test set, with one row per test observation as in the examples below |
test.y | response vector (0 or 1) for test set |
K.prov | number of PLS components; default is the rank of train.X |
eps | tolerance for convergence |
lmax | maximum number of iterations allowed |
family | glm family; binomial is the only one relevant here |
link | link function; logit is the only one practically implemented now |
br | TRUE if Firth's bias reduction procedure is used |

error | out-of-sample test error |
error.obs | indices of the misclassified observations |
predict.test | predicted probabilities for the test set |
Beiying Ding, Robert Gentleman
Ding, B.Y. and Gentleman, R. (2003) Classification using generalized partial least squares.
Marx, B.D. (1996) Iteratively reweighted partial least squares estimation for generalized linear regression. Technometrics 38(4): 374-381.
See also: glpls1a.cv.error, glpls1a.mlogit.cv.error, glpls1a, glpls1a.mlogit, glpls1a.logit.all
x <- matrix(rnorm(20), ncol = 2)
y <- sample(0:1, 10, TRUE)
x1 <- matrix(rnorm(10), ncol = 2)
y1 <- sample(0:1, 5, TRUE)
## no bias reduction
glpls1a.train.test.error(x, y, x1, y1, br = FALSE)
## bias reduction
glpls1a.train.test.error(x, y, x1, y1, br = TRUE)
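The returned list exposes both the error rate and the raw test-set probabilities, so alternative cutoffs can be tried by hand. A short sketch reusing the objects from the example above (the 0.4 cutoff is arbitrary):

out <- glpls1a.train.test.error(x, y, x1, y1, br = TRUE)
out$error                            # misclassification rate at the 0.5 cutoff
out$error.obs                        # indices of the misclassified test cases
as.numeric(out$predict.test > 0.4)   # re-threshold at another cutoff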
Partial least squares is a commonly used dimension reduction technique. The paradigm can be extended to include generalized linear models in several different ways. The code in this function uses the extension proposed by Ding and Gentleman, 2004.
gpls(x, ...)

## Default S3 method:
gpls(x, y, K.prov=NULL, eps=1e-3, lmax=100, b.ini=NULL, denom.eps=1e-20,
     family="binomial", link=NULL, br=TRUE, ...)

## S3 method for class 'formula'
gpls(formula, data, contrasts=NULL, K.prov=NULL, eps=1e-3, lmax=100,
     b.ini=NULL, denom.eps=1e-20, family="binomial", link=NULL, br=TRUE, ...)
x | The matrix of covariates. |
formula | A formula of the form 'y ~ x1 + x2 + ...', where y is the response and the other terms are the predictors. |
y | The vector of responses. |
data | A data.frame in which to resolve the formula, if used. |
K.prov | number of PLS components; default is the rank of X |
eps | tolerance for convergence |
lmax | maximum number of iterations allowed |
b.ini | initial values of the regression coefficients |
denom.eps | small quantity to guarantee a nonzero denominator in deciding convergence |
family | glm family; binomial is the only one relevant here |
link | link function; logit is the only one practically implemented now |
br | TRUE if Firth's bias reduction procedure is used |
... | Additional arguments. |
contrasts | An optional list. See the contrasts.arg argument of model.matrix.default. |
This is a different interface to the functionality provided by glpls1a. The interface is intended to be simpler to use and more consistent with other machine learning code in R. The technology is intended to deal with two-class problems where there are more predictors than cases. If a response variable (y) with more than two levels is used, the behavior may be unusual.
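To illustrate the intended p > n setting, a hedged sketch on simulated data (the sizes and variable names are illustrative):

## two-class problem with more predictors than cases
set.seed(123)
n <- 20; p <- 50
x <- matrix(rnorm(n * p), nrow = n)
y <- sample(0:1, n, replace = TRUE)
fit <- gpls(x, y, K.prov = 2)   # default method, 2 PLS components
fit$convergence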
An object of class gpls with the following components:

coefficients | The estimated coefficients. |
convergence | A boolean indicating whether convergence was achieved. |
niter | The total number of iterations. |
bias.reduction | A boolean indicating whether Firth's procedure was used. |
family | The family used; currently this must be binomial. |
link | The link function used. |
terms | The constructed terms object. |
call | The matched call. |
levs | The factor levels for prediction. |
B. Ding and R. Gentleman
Ding, B.Y. and Gentleman, R. (2003) Classification using generalized partial least squares.
Marx, B.D. (1996) Iteratively reweighted partial least squares estimation for generalized linear regression. Technometrics 38(4): 374-381.
library(MASS)
m1 <- gpls(type ~ ., data = Pima.tr, K.prov = 3)
A simple prediction method for gpls objects.
## S3 method for class 'gpls'
predict(object, newdata, ...)
object | A gpls object. |
newdata | New data, for which predictions are desired. |
... | Other arguments to be passed on. |
The prediction method is straightforward. The estimated coefficients from object are used, together with the new data, to produce predicted values. These are then split according to whether the predicted value is larger or smaller than 0.5, and the resulting class predictions are returned. The code is similar to that in glpls1a.train.test.error, except that in that function the covariates of both the test and train matrices are centered and scaled by the same values (those from the test data set).
A list of length two:

class | The predicted classes; one for each row of newdata. |
predicted | The estimated predictors. |
B. Ding and R. Gentleman
example(gpls)
p1 <- predict(m1)
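A hedged sketch of out-of-sample prediction, assuming the formula-based fit can resolve its predictors from a new data.frame (Pima.te is the companion test set shipped with MASS):

library(MASS)
m1 <- gpls(type ~ ., data = Pima.tr, K.prov = 3)
p2 <- predict(m1, Pima.te)
## p2$class holds one predicted label per row of Pima.te
mean(p2$class != Pima.te$type)   # test-set misclassification rate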