Title: | Fit Penalised Generalised Least Squares models |
---|---|
Description: | Combine generalised least squares methodology from the nlme package for dealing with autocorrelation with penalised least squares methods from the glmnet package to deal with high dimensionality. This pengls packages glues them together through an iterative loop. The resulting method is applicable to high dimensional datasets that exhibit autocorrelation, such as spatial or temporal data. |
Authors: | Stijn Hawinkel [cre, aut] |
Maintainer: | Stijn Hawinkel <[email protected]> |
License: | GPL-2 |
Version: | 1.13.0 |
Built: | 2024-12-29 06:23:10 UTC |
Source: | https://github.com/bioc/pengls |
Extract coefficients from a cv.pengls model
## S3 method for class 'cv.pengls' coef(object, which = "lambda.1se", ...)
## S3 method for class 'cv.pengls' coef(object, which = "lambda.1se", ...)
object |
A cv.pengls object |
which |
a character string, for which lambda should coefficients be returned |
... |
further arguments, currently ignored |
The vector of coefficients
Extract coefficients from a pengls model
## S3 method for class 'pengls' coef(object, ...)
## S3 method for class 'pengls' coef(object, ...)
object |
A pengls object |
... |
further arguments, currently ignored |
The vector of coefficients
Peform cross-validation pengls
cv.pengls( data, glsSt, xNames, outVar, corMat, nfolds, foldid, scale = FALSE, center = FALSE, cvType = "blocked", lambdas, transFun = "identity", transFunArgs = list(), loss = c("R2", "MSE"), ... )
cv.pengls( data, glsSt, xNames, outVar, corMat, nfolds, foldid, scale = FALSE, center = FALSE, cvType = "blocked", lambdas, transFun = "identity", transFunArgs = list(), loss = c("R2", "MSE"), ... )
data |
A data matrix or data frame |
glsSt |
a covariance structure, as supplied to nlme::gls as "correlation" |
xNames |
names of the regressors in data |
outVar |
name of the outcome variable in data |
corMat |
a starting value for the correlation matrix. Taken to be a diagonal matrix if missing |
nfolds |
an integer, the number of folds used in cv.glmnet to find lambda |
foldid |
An optional vector deffining the fold |
scale , center
|
booleans, should regressors be scaled to zero mean and variance 1? Defaults to TRUE |
cvType |
A character vector defining the type of cross-validation. Either "random" or "blocked", ignored if foldid is provided |
lambdas |
an optional lambda sequence |
transFun |
a transformation function to apply to predictions and outcome in the cross-validation |
transFunArgs |
Additional arguments passed onto transFun |
loss |
a character vector, currently either 'R2' or 'MSE' indicating the loss function (although R2 is not a proper loss...) |
... |
passed onto glmnet::glmnet |
A list with components
lambda |
The series of lambdas |
cvm |
The vector of mean R2's |
cvsd |
The standard error of R2 at the maximum |
cvOpt |
The R2 according to the 1 standard error rule |
coefs |
The matrix of coefficients for every lambda value |
bestFit |
The best fitting pengls model according to the 1 standard error rule |
lambda.min |
Lambda value with maximal R2 |
lambda.1se |
Smallest lambda value within 1 standard error from the maximum |
foldid |
The folds |
glsSt |
The nlme correlation object |
loss |
The loss function used |
library(nlme) library(BiocParallel) n <- 20 #Sample size p <- 50 #Number of features g <- 10 #Size of the grid #Generate grid Grid <- expand.grid("x" = seq_len(g), "y" = seq_len(g)) # Sample points from grid without replacement GridSample <- Grid[sample(nrow(Grid), n, replace = FALSE),] #Generate outcome and regressors b <- matrix(rnorm(p*n), n , p) a <- rnorm(n, mean = b %*% rbinom(p, size = 1, p = 0.2)) #20% signal #Compile to a matrix df <- data.frame("a" = a, "b" = b, GridSample) # Define the correlation structure (see ?nlme::gls), with initial nugget 0.5 and range 5 corStruct = corGaus(form = ~ x + y, nugget = TRUE, value = c("range" = 5, "nugget" = 0.5)) #Fit the pengls model, for simplicity for a simple lambda register(MulticoreParam(3)) #Prepare multithereading penglsFitCV = cv.pengls(data = df, outVar = "a", xNames = grep(names(df), pattern = "b", value = TRUE), glsSt = corStruct, nfolds = 5) penglsFitCV$lambda.1se #Lambda for 1 standard error rule penglsFitCV$cvOpt #Corresponding R2 coef(penglsFitCV) penglsFitCV$foldid #The folds used #With MSE as loss function penglsFitCVmse = cv.pengls(data = df, outVar = "a", xNames = grep(names(df), pattern = "b", value =TRUE), glsSt = corStruct, nfolds = 5, loss = "MSE") penglsFitCVmse$lambda.1se #Lambda for 1 standard error rule penglsFitCVmse$cvOpt #Corresponding MSE coef(penglsFitCVmse) predict(penglsFitCVmse)
library(nlme) library(BiocParallel) n <- 20 #Sample size p <- 50 #Number of features g <- 10 #Size of the grid #Generate grid Grid <- expand.grid("x" = seq_len(g), "y" = seq_len(g)) # Sample points from grid without replacement GridSample <- Grid[sample(nrow(Grid), n, replace = FALSE),] #Generate outcome and regressors b <- matrix(rnorm(p*n), n , p) a <- rnorm(n, mean = b %*% rbinom(p, size = 1, p = 0.2)) #20% signal #Compile to a matrix df <- data.frame("a" = a, "b" = b, GridSample) # Define the correlation structure (see ?nlme::gls), with initial nugget 0.5 and range 5 corStruct = corGaus(form = ~ x + y, nugget = TRUE, value = c("range" = 5, "nugget" = 0.5)) #Fit the pengls model, for simplicity for a simple lambda register(MulticoreParam(3)) #Prepare multithereading penglsFitCV = cv.pengls(data = df, outVar = "a", xNames = grep(names(df), pattern = "b", value = TRUE), glsSt = corStruct, nfolds = 5) penglsFitCV$lambda.1se #Lambda for 1 standard error rule penglsFitCV$cvOpt #Corresponding R2 coef(penglsFitCV) penglsFitCV$foldid #The folds used #With MSE as loss function penglsFitCVmse = cv.pengls(data = df, outVar = "a", xNames = grep(names(df), pattern = "b", value =TRUE), glsSt = corStruct, nfolds = 5, loss = "MSE") penglsFitCVmse$lambda.1se #Lambda for 1 standard error rule penglsFitCVmse$cvOpt #Corresponding MSE coef(penglsFitCVmse) predict(penglsFitCVmse)
Get the (square root of the inverse of the) correlation matrix
getCorMat(data, glsSt, Coef = c(coef(glsSt)), control, outVar)
getCorMat(data, glsSt, Coef = c(coef(glsSt)), control, outVar)
data |
The data frame |
glsSt |
The correlation object for gls |
Coef |
optional vector of coefficients to glsSt |
control |
the list of control arguments for gls |
outVar |
the name of the outcome variable |
A list with components
corMat |
The square root of the inverse correlation matrix |
Coef |
The coefficients of the correlation object |
Calculate the loss given predicted and observed values
getLoss(preds, obs, loss)
getLoss(preds, obs, loss)
preds |
Matrix of predicted values |
obs |
vector of observed values |
loss |
a character vector indicating the loss type, see ?cv.pengls |
the evaluated loss
Divide observations into folds
makeFolds(nfolds, data, cvType, coords)
makeFolds(nfolds, data, cvType, coords)
nfolds |
The number of folds |
data |
the dataset |
cvType |
a character vector, indicating the type of cross-validation required, either blocked or random |
coords |
the names of the coordinates in data |
the vector of folds
nfolds <- 10 data <- expand.grid("x" = seq_len(10), "y" = seq_len(10)) randomFolds <- makeFolds(nfolds = nfolds, data, "random", c("x", "y")) blockedFolds <- makeFolds(nfolds = nfolds, data, "blocked", c("x", "y"))
nfolds <- 10 data <- expand.grid("x" = seq_len(10), "y" = seq_len(10)) randomFolds <- makeFolds(nfolds = nfolds, data, "random", c("x", "y")) blockedFolds <- makeFolds(nfolds = nfolds, data, "blocked", c("x", "y"))
Iterative estimation of penalised generalised least squares
pengls( data, glsSt, xNames, outVar, corMat, lambda, foldid, maxIter = 30, tol = 0.05, verbose = FALSE, scale = FALSE, center = FALSE, optControl = lmeControl(opt = "optim", maxIter = 500, msVerbose = verbose, msMaxIter = 500, niterEM = 1000, msMaxEval = 1000), nfolds = 10, penalty.factor = c(0, rep(1, length(xNames))), ... )
pengls( data, glsSt, xNames, outVar, corMat, lambda, foldid, maxIter = 30, tol = 0.05, verbose = FALSE, scale = FALSE, center = FALSE, optControl = lmeControl(opt = "optim", maxIter = 500, msVerbose = verbose, msMaxIter = 500, niterEM = 1000, msMaxEval = 1000), nfolds = 10, penalty.factor = c(0, rep(1, length(xNames))), ... )
data |
A data matrix or data frame |
glsSt |
a covariance structure, as supplied to nlme::gls as "correlation" |
xNames |
names of the regressors in data |
outVar |
name of the outcome variable in data |
corMat |
a starting value for the correlation matrix. Taken to be a diagonal matrix if missing |
lambda |
The penalty value for glmnet. If missing, the optimal value of vanilla glmnet without autocorrelation component is used |
foldid |
An optional vector deffining the fold |
maxIter |
maximum number of iterations between glmnet and gls |
tol |
A convergence tolerance |
verbose |
a boolean, should output be printed? |
scale , center
|
booleans, should regressors be scaled to zero mean and variance 1? Defaults to TRUE |
optControl |
control arguments, passed onto nlme::gls' control argument |
nfolds |
an integer, the number of folds used in cv.glmnet to find lambda |
penalty.factor |
passed onto glmnet:glmnet. The first entry is zero by default for the intercept, which is not shrunk |
... |
passed onto glmnet::glmnet |
A list with components
glmnet |
The glmnet fit, which can be manipulated as such |
gls |
A list with info on the estimated correlation matrix |
iter |
The iterations needed |
conv |
A boolean, indicating whether the iteration between mean model and covariance estimation converged |
xNames , data , glsSt , outVar
|
As provided |
lambda |
The lambda penalty paraneter used |
cv.pengls
### Example 1: spatial data # Define the dimensions of the data library(nlme) n <- 50 #Sample size p <- 100 #Number of features g <- 10 #Size of the grid #Generate grid Grid <- expand.grid("x" = seq_len(g), "y" = seq_len(g)) # Sample points from grid without replacement GridSample <- Grid[sample(nrow(Grid), n, replace = FALSE),] #Generate outcome and regressors b <- matrix(rnorm(p*n), n , p) a <- rnorm(n, mean = b %*% rbinom(p, size = 1, p = 0.2)) #20% signal #Compile to a matrix df <- data.frame("a" = a, "b" = b, GridSample) # Define the correlation structure (see ?nlme::gls), with initial nugget 0.5 and range 5 corStruct <- corGaus(form = ~ x + y, nugget = TRUE, value = c("range" = 5, "nugget" = 0.5)) #Fit the pengls model, for simplicity for a simple lambda penglsFit <- pengls(data = df, outVar = "a", xNames = grep(names(df), pattern = "b", value =TRUE), glsSt = corStruct, nfolds = 5) ### Example 2: timecourse data dfTime <- data.frame("a" = a, "b" = b, "t" = seq_len(n)) dfTime$a[-1] = dfTime$a[-n]*0.25 #Some temporal signal corStructTime <- corAR1(form = ~ t, value = 0.5) penglsFitTime <- pengls(data = dfTime, outVar = "a", xNames = grep(names(dfTime), pattern = "b", value =TRUE), glsSt = corStructTime, nfolds = 5)
### Example 1: spatial data # Define the dimensions of the data library(nlme) n <- 50 #Sample size p <- 100 #Number of features g <- 10 #Size of the grid #Generate grid Grid <- expand.grid("x" = seq_len(g), "y" = seq_len(g)) # Sample points from grid without replacement GridSample <- Grid[sample(nrow(Grid), n, replace = FALSE),] #Generate outcome and regressors b <- matrix(rnorm(p*n), n , p) a <- rnorm(n, mean = b %*% rbinom(p, size = 1, p = 0.2)) #20% signal #Compile to a matrix df <- data.frame("a" = a, "b" = b, GridSample) # Define the correlation structure (see ?nlme::gls), with initial nugget 0.5 and range 5 corStruct <- corGaus(form = ~ x + y, nugget = TRUE, value = c("range" = 5, "nugget" = 0.5)) #Fit the pengls model, for simplicity for a simple lambda penglsFit <- pengls(data = df, outVar = "a", xNames = grep(names(df), pattern = "b", value =TRUE), glsSt = corStruct, nfolds = 5) ### Example 2: timecourse data dfTime <- data.frame("a" = a, "b" = b, "t" = seq_len(n)) dfTime$a[-1] = dfTime$a[-n]*0.25 #Some temporal signal corStructTime <- corAR1(form = ~ t, value = 0.5) penglsFitTime <- pengls(data = dfTime, outVar = "a", xNames = grep(names(dfTime), pattern = "b", value =TRUE), glsSt = corStructTime, nfolds = 5)
Make predictions from a cv.pengls model
## S3 method for class 'cv.pengls' predict(object, ...)
## S3 method for class 'cv.pengls' predict(object, ...)
object |
A cv.pengls object |
... |
further arguments, currently ignored |
A vector with predicted values
Make predictions from a pengls model
## S3 method for class 'pengls' predict(object, newx, ...)
## S3 method for class 'pengls' predict(object, newx, ...)
object |
A pengls object |
newx |
The test data |
... |
further arguments, currently ignored |
A vector with predicted values
Print a summary of a cv.pengls model
## S3 method for class 'cv.pengls' print(x, ...)
## S3 method for class 'cv.pengls' print(x, ...)
x |
A cv.pengls object |
... |
further arguments, currently ignored |
Prints output to console
Print a summary of a pengls model
## S3 method for class 'pengls' print(x, ...)
## S3 method for class 'pengls' print(x, ...)
x |
A pengls object |
... |
further arguments, currently ignored |
Prints output to console