Title: | Identification of SNP Interactions |
---|---|
Description: | Identification of interactions between binary variables using Logic Regression. Can, e.g., be used to find interesting SNP interactions. Also contains a bagging version of logic regression for classification. |
Authors: | Holger Schwender, Tobias Tietz |
Maintainer: | Holger Schwender <[email protected]> |
License: | LGPL (>= 2) |
Version: | 2.27.0 |
Built: | 2024-11-29 06:40:29 UTC |
Source: | https://github.com/bioc/logicFS |
data.logicfs contains two objects: a simulated matrix data.logicfs of 400 observations (rows) and 15 variables (columns) and a vector cl.logicfs of length 400 containing the class labels of the observations.
Each variable is categorical with realizations 1, 2 and 3. The first 200 observations
are cases, the remaining are controls. If one of the following expressions is TRUE,
then the corresponding observation is a case:
SNP1 == 3
SNP2 == 1 AND SNP4 == 3
SNP3 == 3 AND SNP5 == 3 AND SNP6 == 1
where SNP1 is in the first column of data.logicfs, SNP2 in the second, and so on.
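The case rule above can be written as a single logical expression in R. The following is a minimal sketch on a made-up toy matrix (`geno` and its values are hypothetical and are not the shipped data set); it only illustrates how the three expressions combine with a logical OR.

```r
# Toy genotype matrix (hypothetical values, not the shipped data.logicfs):
# 4 observations, 6 SNPs coded 1, 2, 3.
geno <- matrix(c(3, 1, 1, 1, 1, 1,
                 1, 1, 2, 3, 2, 2,
                 2, 2, 3, 1, 3, 1,
                 1, 2, 2, 2, 2, 2),
               nrow = 4, byrow = TRUE,
               dimnames = list(NULL, paste0("SNP", 1:6)))

# An observation is a case if at least one of the three expressions is TRUE.
is_case <- geno[, "SNP1"] == 3 |
  (geno[, "SNP2"] == 1 & geno[, "SNP4"] == 3) |
  (geno[, "SNP3"] == 3 & geno[, "SNP5"] == 3 & geno[, "SNP6"] == 1)

is_case
```

In this toy matrix, each of the first three rows satisfies exactly one of the expressions, and the last row satisfies none.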
Computes the values of prime implicants for observations for which the values of the variables composing the prime implicants are available.
getMatEval(data, vec.primes, check = TRUE)
data |
a data frame in which each row corresponds to an observation, and each column to a binary variable. |
vec.primes |
a character vector naming the prime implicants that
should be evaluated. Each of the variables composing these prime implicants must
be represented by one column of |
check |
should some checks be done before the evaluation is performed? It is
highly recommended not to change the default |
a matrix in which each row corresponds to an observation (the same observations
in the same order as in data), and each column to one of the prime implicants.
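To illustrate what such an evaluation matrix looks like, here is a rough base R sketch. It assumes that prime implicants are written as conjunctions such as "X1 & !X2" that can be parsed as R expressions; that format, the helper name `eval_primes`, and the toy data are assumptions made for illustration, and the helper is not the package's implementation. It also does no handling of missing values.

```r
# Hypothetical helper, not the package's implementation: evaluate each
# conjunction row-wise on a data frame of 0/1 variables.
eval_primes <- function(data, vec.primes) {
  data <- as.data.frame(data)
  out <- sapply(vec.primes, function(p) {
    as.integer(eval(parse(text = p), envir = data))
  })
  # One row per observation, one column per prime implicant.
  matrix(out, nrow = nrow(data), dimnames = list(NULL, vec.primes))
}

toy <- data.frame(X1 = c(1, 1, 0, 0),
                  X2 = c(0, 1, 1, 1),
                  X3 = c(1, 0, 0, 1))
mat_eval <- eval_primes(toy, c("X1 & !X2", "X2 & X3"))
mat_eval
```

Only the first observation satisfies "X1 & !X2", and only the fourth satisfies "X2 & X3".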
Holger Schwender, [email protected]
A bagging and subsampling version of logic regression. Currently available for the
classification, the linear regression, and the logistic regression approach
of logreg. Additionally, an approach based on multinomial logistic regressions as
implemented in mlogreg can be used if the response is categorical.
    ## Default S3 method:
    logic.bagging(x, y, B = 100, useN = TRUE, ntrees = 1, nleaves = 8,
      glm.if.1tree = FALSE, replace = TRUE, sub.frac = 0.632,
      anneal.control = logreg.anneal.control(), oob = TRUE,
      onlyRemove = FALSE, prob.case = 0.5, importance = TRUE,
      score = c("DPO", "Conc", "Brier", "PL"), addMatImp = FALSE,
      fast = FALSE, neighbor = NULL, adjusted = FALSE, ensemble = FALSE,
      rand = NULL, ...)

    ## S3 method for class 'formula'
    logic.bagging(formula, data, recdom = TRUE, ...)
x |
a matrix consisting of 0's and 1's. Each column must correspond to a binary variable and each row to an observation. Missing values are not allowed. |
y |
a numeric vector, a factor, or a vector of class |
B |
an integer specifying the number of iterations. |
useN |
logical specifying if the number of correctly classified out-of-bag observations should
be used in the computation of the importance measure. If |
ntrees |
an integer indicating how many trees should be used. For a binary response: If For a continuous response: A linear regression model with For a categorical response: For a response of class |
nleaves |
a numeric value specifying the maximum number of leaves used
in all trees combined. See the help page of the function |
glm.if.1tree |
if |
replace |
should sampling of the cases be done with replacement? If
|
sub.frac |
a proportion specifying the fraction of the observations that
are used in each iteration to build a classification rule if |
anneal.control |
a list containing the parameters for simulated annealing.
See the help page of |
oob |
should the out-of-bag error rate (classification and logistic regression) or the out-of-bag root mean square prediction error (linear regression), respectively, be computed? |
onlyRemove |
should in the single tree case the multiple tree measure be used? If |
prob.case |
a numeric value between 0 and 1. If the outcome of the
logistic regression, i.e. the class probability, for an observation is
larger than |
importance |
should the measure of importance be computed? |
score |
a character string naming the score that should be used in the computation of the importance measure for a survival time analysis. By default, the distance between predicted outcomes ( |
addMatImp |
should the matrix containing the improvements due to the prime implicants
in each of the iterations be added to the output? (For each of the prime implicants,
the importance is computed by the average over the |
fast |
should a greedy search (as implemented in |
neighbor |
a list consisting of character vectors specifying SNPs that are in LD. If specified, all SNPs need to occur exactly once in this list, and the importance measures are adjusted for LD by considering the SNPs within an LD block as exchangeable. |
adjusted |
logical specifying whether the measures should be adjusted for noise. Often, the interaction actually associated with the response is not exactly found in some iterations of logic bagging, but an interaction is identified that additionally contains one (or, rarely, more) noise SNPs. If |
ensemble |
in the case of a survival outcome, should |
rand |
numeric value. If specified, the random number generator will be set into a reproducible state. |
formula |
an object of class |
data |
a data frame containing the variables in the model. Each row of |
recdom |
a logical value or vector of length |
... |
for the |
logic.bagging returns an object of class logicBagg containing
logreg.model |
a list containing the |
inbagg |
a list specifying the |
vim |
an object of class |
oob.error |
the out-of-bag error (if |
... |
further parameters of the logic regression. |
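The difference between bagging (replace = TRUE) and subsampling (replace = FALSE with sub.frac) can be sketched in base R as follows. This is only an illustration of the two sampling schemes and of what "out-of-bag" means; how logic.bagging draws its samples internally (for example, rounding or stratification) is not specified here, so those details are assumptions of the sketch.

```r
set.seed(1)        # arbitrary seed, only to make the sketch reproducible
n <- 400           # number of observations, as in data.logicfs
sub.frac <- 0.632  # default value from the usage above

# Bagging: draw n observations with replacement.
inbag_boot <- sample(n, n, replace = TRUE)

# Subsampling: draw a fraction sub.frac without replacement
# (rounding up is an assumption of this sketch).
inbag_sub <- sample(n, ceiling(sub.frac * n), replace = FALSE)

# Out-of-bag observations are those not drawn in this iteration.
oob_boot <- setdiff(seq_len(n), inbag_boot)
oob_sub  <- setdiff(seq_len(n), inbag_sub)

length(inbag_sub)
length(oob_sub)
```

With sub.frac = 0.632, the subsample is about as large as the expected number of distinct observations in a bootstrap sample, so both schemes leave a comparable share of observations out-of-bag.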
Holger Schwender, [email protected]; Tobias Tietz, [email protected]
Ruczinski, I., Kooperberg, C., LeBlanc M.L. (2003). Logic Regression. Journal of Computational and Graphical Statistics, 12, 475-511.
Schwender, H., Ickstadt, K. (2007). Identification of SNP Interactions Using Logic Regression. Biostatistics, 9(1), 187-198.
Tietz, T., Selinski, S., Golka, K., Hengstler, J.G., Gripp, S., Ickstadt, K., Ruczinski, I., Schwender, H. (2018). Identification of Interactions of Binary Variables Associated with Survival Time Using survivalFS. Submitted.
predict.logicBagg, plot.logicBagg, logicFS
    ## Not run:
    # Load data.
    data(data.logicfs)

    # For logic regression and hence logic.bagging, the variables must
    # be binary. data.logicfs, however, contains categorical data
    # with realizations 1, 2 and 3. Such data can be transformed
    # into binary data by
    bin.snps <- make.snp.dummy(data.logicfs)

    # To speed up the search for the best logic regression models
    # only a small number of iterations is used in simulated annealing.
    my.anneal <- logreg.anneal.control(start = 2, end = -2, iter = 10000)

    # Bagged logic regression is then performed by
    bagg.out <- logic.bagging(bin.snps, cl.logicfs, B = 20, nleaves = 10,
                              rand = 123, anneal.control = my.anneal)

    # The output of logic.bagging can be printed
    bagg.out

    # By default, also the importances of the interactions are
    # computed
    bagg.out$vim

    # and can be plotted.
    plot(bagg.out)

    # The original variable names are displayed in
    plot(bagg.out, coded = FALSE)

    # New observations (here we assume that these observations are
    # in data.logicfs) are assigned to one of the classes by
    predict(bagg.out, data.logicfs)
    ## End(Not run)
Computes the out-of-bag error of the classification rule comprised by
a logicBagg object.
logic.oob(log.out, prob.case = 0.5)
log.out |
an object of class |
prob.case |
a numeric value between 0 and 1. If the logic regression models
are logistic regression models, i.e. if in |
The out-of-bag error estimate.
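The idea behind an out-of-bag error can be sketched with toy numbers: each observation is classified using only those models for which it was out-of-bag, and the share of votes for class 1 is compared with prob.case. The matrices `votes` and `oob` below are made up, and whether logic.oob aggregates in exactly this way is an assumption of the sketch.

```r
# votes[i, b]: predicted class (0/1) of observation i by model b (toy values).
votes <- matrix(c(1, 1, 0,
                  0, 1, 0,
                  1, 0, 0,
                  0, 0, 0), nrow = 4, byrow = TRUE)

# oob[i, b]: TRUE if observation i was out-of-bag for model b (toy values).
oob <- matrix(c(TRUE,  TRUE,  TRUE,
                TRUE,  FALSE, TRUE,
                FALSE, TRUE,  TRUE,
                TRUE,  TRUE,  FALSE), nrow = 4, byrow = TRUE)

cl <- c(1, 0, 1, 0)   # true class labels
prob.case <- 0.5

# Share of votes for class 1, using only models for which i was out-of-bag.
share <- rowSums(votes * oob) / rowSums(oob)
pred <- as.integer(share > prob.case)

# Proportion of misclassified observations.
oob_error <- mean(pred != cl)
oob_error
```

Here only the third observation is misclassified, so the toy out-of-bag error is 0.25.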
Holger Schwender, [email protected]
Determines the prime implicants contained in the logic regression models
comprised in an object of class logicBagg.
logic.pimp(log.out)
log.out |
an object of class |
Since we are interested in all potentially interesting interactions and not
in a minimum set of them, logic.pimp returns all
prime implicants and not a minimum number of them.
A list consisting of the prime implicants for
each of the B logic regression models of log.out.
Holger Schwender, [email protected]
logic.bagging, logicFS, prime.implicants
Identification of interesting interactions between binary variables
using logic regression. Currently available for the classification, the linear
regression and the logistic regression approach of logreg and for
a multinomial logic regression as implemented in mlogreg.
    ## Default S3 method:
    logicFS(x, y, B = 100, useN = TRUE, ntrees = 1, nleaves = 8,
      glm.if.1tree = FALSE, replace = TRUE, sub.frac = 0.632,
      anneal.control = logreg.anneal.control(), onlyRemove = FALSE,
      prob.case = 0.5, score = c("DPO", "Conc", "Brier", "PL"),
      addMatImp = TRUE, fast = FALSE, neighbor = NULL, adjusted = FALSE,
      ensemble = FALSE, rand = NULL, ...)

    ## S3 method for class 'formula'
    logicFS(formula, data, recdom = TRUE, ...)

    ## S3 method for class 'logicBagg'
    logicFS(x, neighbor = NULL, adjusted = FALSE, prob.case = 0.5,
      score = c("DPO", "Conc", "Brier", "PL"), ensemble = FALSE,
      addMatImp = TRUE, ...)
x |
a matrix consisting of 0's and 1's. Alternatively, |
y |
a numeric vector, a factor, or a vector of class |
B |
an integer specifying the number of iterations. |
useN |
logical specifying if the number of correctly classified out-of-bag observations should
be used in the computation of the importance measure. If |
ntrees |
an integer indicating how many trees should be used. For a binary response: If For a continuous response: A linear regression model with For a categorical response: For a response of class |
nleaves |
a numeric value specifying the maximum number of leaves used
in all trees combined. For details, see the help page of the function |
glm.if.1tree |
if |
replace |
should sampling of the cases be done with replacement? If
|
sub.frac |
a proportion specifying the fraction of the observations that
are used in each iteration to build a classification rule if |
anneal.control |
a list containing the parameters for simulated annealing.
See the help of the function |
onlyRemove |
should in the single tree case the multiple tree measure be used? If |
prob.case |
a numeric value between 0 and 1. If the outcome of the
logistic regression, i.e. the predicted probability, for an observation is
larger than |
score |
a character string naming the score that should be used in the computation of the importance measure for a survival time analysis. By default, the distance between predicted outcomes ( |
addMatImp |
should the matrix containing the improvements due to the prime implicants
in each of the iterations be added to the output? (For each of the prime implicants,
the importance is computed by the average over the |
fast |
should a greedy search (as implemented in |
neighbor |
a list consisting of character vectors specifying SNPs that are in LD. If specified, all SNPs need to occur exactly once in this list, and the importance measures are adjusted for LD by considering the SNPs within an LD block as exchangeable. |
adjusted |
logical specifying whether the measures should be adjusted for noise. Often, the interaction actually associated with the response is not exactly found in some iterations of logic bagging, but an interaction is identified that additionally contains one (or, rarely, more) noise SNPs. If |
ensemble |
in the case of a survival outcome, should |
rand |
numeric value. If specified, the random number generator will be set into a reproducible state. |
formula |
an object of class |
data |
a data frame containing the variables in the model. Each row of |
recdom |
a logical value or vector of length |
... |
for the |
An object of class logicFS containing
primes |
the prime implicants, |
vim |
the importance of the prime implicants, |
prop |
the proportion of logic regression models containing the prime implicants (or the neighbors of the prime implicants, if |
type |
the type of model (1: classification, 2: linear regression, 3: logistic regression, 4: Cox regression), |
param |
further parameters (if |
mat.imp |
either the matrix containing the improvements if |
measure |
the name of the used importance measure, |
neighbor |
|
useN |
the value of |
threshold |
NULL, |
mu |
NULL. |
Holger Schwender, [email protected]; Tobias Tietz, [email protected]
Ruczinski, I., Kooperberg, C., LeBlanc M.L. (2003). Logic Regression. Journal of Computational and Graphical Statistics, 12, 475-511.
Schwender, H., Ickstadt, K. (2007). Identification of SNP Interactions Using Logic Regression. Biostatistics, 9(1), 187-198.
Tietz, T., Selinski, S., Golka, K., Hengstler, J.G., Gripp, S., Ickstadt, K., Ruczinski, I., Schwender, H. (2018). Identification of Interactions of Binary Variables Associated with Survival Time Using survivalFS. Submitted.
    ## Not run:
    # Load data.
    data(data.logicfs)

    # For logic regression and hence logicFS, the variables must
    # be binary. data.logicfs, however, contains categorical data
    # with realizations 1, 2 and 3. Such data can be transformed
    # into binary data by
    bin.snps <- make.snp.dummy(data.logicfs)

    # To speed up the search for the best logic regression models
    # only a small number of iterations is used in simulated annealing.
    my.anneal <- logreg.anneal.control(start = 2, end = -2, iter = 10000)

    # Feature selection using logic regression is then done by
    log.out <- logicFS(bin.snps, cl.logicfs, B = 20, nleaves = 10,
                       rand = 123, anneal.control = my.anneal)

    # The output of logicFS can be printed
    log.out

    # One can specify another number of interactions that should be
    # printed, here, e.g., 15.
    print(log.out, topX = 15)

    # The variable importance can also be plotted.
    plot(log.out)

    # And the original variable names are displayed in
    plot(log.out, coded = FALSE)
    ## End(Not run)
Transforms SNPs into binary dummy variables.
make.snp.dummy(data)
data |
a matrix in which each column corresponds to a SNP and each row to an observation.
The genotypes of all SNPs must be either coded by 1 (for the homozygous reference genotype),
2 (heterozygous), and 3 (homozygous variant) or by 0, 1, 2. Mixing the two schemes, with some SNPs following the 1, 2, 3
coding and others the 0, 1, 2 coding, is not allowed. Missing values are allowed, but please note that
neither |
make.snp.dummy assumes that the homozygous dominant genotype
is coded by 1, the heterozygous genotype by 2, and the homozygous
recessive genotype by 3. Alternatively, the three genotypes can be coded
by the number of minor alleles, i.e. by 0, 1, and 2.
For each SNP, two dummy variables are generated:
At least one of the bases explaining the SNP is of the recessive type.
Both bases are of the recessive type.
A matrix with 2*ncol(data) columns containing 2 dummy variables
for each SNP.
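The two dummy variables per SNP can be illustrated with a small base R sketch for the 1, 2, 3 coding. The helper `snp_to_dummies` and its column suffixes are made up for this illustration and do not reproduce the names or internals of make.snp.dummy.

```r
# Hypothetical helper illustrating the coding for genotypes 1, 2, 3;
# the column names are made up for this sketch.
snp_to_dummies <- function(data) {
  data <- as.matrix(data)
  p <- ncol(data)
  if (is.null(colnames(data))) colnames(data) <- paste0("SNP", seq_len(p))
  d1 <- (data >= 2) * 1   # at least one base of the recessive type
  d2 <- (data == 3) * 1   # both bases of the recessive type
  # Interleave the columns so that the two dummies of a SNP are adjacent.
  ord <- as.vector(rbind(seq_len(p), p + seq_len(p)))
  out <- cbind(d1, d2)[, ord, drop = FALSE]
  colnames(out) <- paste0(rep(colnames(data), each = 2),
                          c("_atleast1", "_both"))
  out
}

snps <- cbind(SNP1 = c(1, 2, 3), SNP2 = c(3, 1, NA))
dummies <- snp_to_dummies(snps)
dummies
```

The result has 2 * ncol(snps) = 4 columns, and the missing genotype of SNP2 stays missing in both of its dummy variables.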
See the R package scrime for more general functions for recoding SNPs.
Holger Schwender, [email protected]
Performs a multinomial logic regression for a nominal response by fitting a logic regression model (with logit as link function) for each of the levels of the response except for the level with the smallest value which is used as reference category.
    ## S3 method for class 'formula'
    mlogreg(formula, data, recdom = TRUE, ...)

    ## Default S3 method:
    mlogreg(x, y, ntrees = 1, nleaves = 8,
      anneal.control = logreg.anneal.control(), select = 1, rand = NA, ...)
formula |
an object of class |
data |
a data frame containing the variables in the model. Each column of |
recdom |
a logical value or vector of length |
x |
a matrix consisting of 0's and 1's. Each column must correspond to a binary variable and each row to an observation. |
y |
either a factor or a numeric or character vector specifying the values of the response.
The length of |
ntrees |
an integer indicating how many trees should be used in the logic regression models.
For details, see |
nleaves |
a numeric value specifying the maximum number of leaves used
in all trees combined. See the help page of the function |
anneal.control |
a list containing the parameters for simulated annealing.
For details, see the help page of |
select |
numeric value. Either 0 for a stepwise greedy selection (corresponds to |
rand |
numeric value. If specified, the random number generator will be set into a reproducible state. |
... |
for the |
An object of class mlogreg
composed of
model |
a list containing the logic regression models, |
data |
a matrix containing the binary predictors, |
cl |
a vector comprising the class labels, |
ntrees |
a numeric value naming the maximum number of trees used in the logic regressions, |
nleaves |
a numeric value comprising the maximum number of leaves used in the logic regressions, |
fast |
a logical value specifying whether the faster search algorithm, i.e. the greedy search, has been used. |
Holger Schwender, [email protected]
Schwender, H., Ruczinski, I., Ickstadt, K. (2011). Testing SNPs and Sets of SNPs for Importance in Association Studies. Biostatistics, 12, 18-32.
predict.mlogreg, logic.bagging, logicFS
Generates a dotchart of the importance of the most important
interactions for an object of class logicFS or logicBagg.
    ## S3 method for class 'logicFS'
    plot(x, topX = 15, cex = 0.9, pch = 16, col = 1, show.prop = FALSE,
      force.topX = FALSE, coded = TRUE, add.thres = TRUE, thres = NULL,
      include0 = TRUE, add.v0 = TRUE, v0.col = "grey50", main = NULL, ...)

    ## S3 method for class 'logicBagg'
    plot(x, topX = 15, cex = 0.9, pch = 16, col = 1, show.prop = FALSE,
      force.topX = FALSE, coded = TRUE, include0 = TRUE, add.v0 = TRUE,
      v0.col = "grey50", main = NULL, ...)
x |
an object of either class |
topX |
integer specifying how many interactions should be shown.
If |
cex |
a numeric value specifying the relative size of the text and symbols. |
pch |
specifies the used symbol. See the help of |
col |
the color of the text and the symbols. See the help of |
show.prop |
if |
force.topX |
if |
coded |
should the coded variable names be displayed? Might be useful
if the actual variable names are pretty long. The coded variable name of
the j-th variable is |
add.thres |
should a vertical line marking the threshold for a prime implicant
to be called important be drawn in the plot? If |
thres |
non-negative numeric value specifying the threshold for a prime implicant
to be called important. If |
include0 |
should the x-axis include zero regardless whether the importances of the shown interactions are much higher than 0? |
add.v0 |
should a vertical line be drawn at |
v0.col |
the color of the vertical line at |
main |
character string naming the title of the plot. If |
... |
Ignored. |
Holger Schwender, [email protected]
Plots predicted survival or cumulative hazard curves of new
observations for an object of class predict.survivalFS.
    ## S3 method for class 'predict.survivalFS'
    plot(x, select_obs, xlab = "time", ylab = NULL, ylim = NULL, type = "l",
      main = NULL, sub = NULL, vec_col = NULL, vec_lty = NULL,
      addLegend = TRUE, ...)
x |
an object of class |
select_obs |
a numeric vector identifying the observations whose
survival curves should be plotted. If |
xlab |
a title for the x axis: see |
ylab |
a title for the y axis: see |
ylim |
a numeric vector of length 2 that sets the limits of the y axis.
If |
type |
character indicating the type of plotting; actually any of
the types as in |
main |
an overall title for the plot: see |
sub |
a sub title for the plot: see |
vec_col |
a numeric or character vector that specifies the plotting colors
of the survival curves (see |
vec_lty |
a numeric or character vector that specifies the line types
of the survival curves (see |
addLegend |
should a legend be added to the plot automatically? |
... |
Ignored. |
Tobias Tietz, [email protected]
Prediction for test data using an object of class logicBagg.
    ## S3 method for class 'logicBagg'
    predict(object, newData, prob.case = 0.5, type = c("class", "prob"),
      score = c("DPO", "Conc", "Brier"), ...)
object |
an object of class |
newData |
a matrix or data frame containing new data. If omitted
|
prob.case |
a numeric value between 0 and 1. A new observation will be
classified as case (or more exactly, as 1) if the class probability, i.e.
the average of the predicted probabilities of the models (if the logistic
regression approach of logic regression has been used), or the percentage of
votes for class 1 (if the classification approach of logic regression has been used)
is larger than |
type |
character vector indicating the type of output. If |
score |
a character string naming the score that should be used to assess the performance of the prediction model in the survival case. By default, the distance between predicted outcomes ( |
... |
Ignored. |
A numeric vector containing the predicted classes (if type = "class") or the
class probabilities (if type = "prob") of the new observations if the classification
or the logistic regression approach of logic regression is used. If the response is quantitative,
the predicted value of the response for all observations in the test data set is returned. If the
response is of class Surv, an object of class predict.survivalFS with a prediction for either
the cumulative hazard function or the survival function of the new observations is returned.
Holger Schwender, [email protected], Tobias Tietz, [email protected]
Prediction for test data using an object of class mlogreg.
    ## S3 method for class 'mlogreg'
    predict(object, newData, type = c("class", "prob"), ...)
object |
an object of class |
newData |
a matrix or data frame containing new data. If omitted
|
type |
character vector indicating the type of output. If |
... |
Ignored. |
A numeric vector containing the predicted classes (if type = "class"), or a matrix composed of the
class probabilities (if type = "prob").
Holger Schwender, [email protected]
Prints an object of class logicFS.
    ## S3 method for class 'logicFS'
    print(x, topX = 5, show.prop = TRUE, coded = FALSE, digits = 2, ...)
x |
an object of either class |
topX |
integer indicating how many interactions should be shown.
Additionally to the |
show.prop |
should the proportions of models containing the interactions of interest also be shown? |
coded |
should the coded variable names be displayed? Might be useful
if the actual variable names are pretty long. The coded variable name of
the j-th variable is |
digits |
number of digits used in the output. |
... |
Ignored. |
Holger Schwender, [email protected]
Identification of interactions of binary variables associated with survival time using logic regression.
    ## Default S3 method:
    survivalFS(x, y, B = 20, replace = FALSE, sub.frac = 0.632,
      score = c("DPO", "Conc", "Brier", "PL"), addMatImp = TRUE,
      adjusted = FALSE, neighbor = NULL, ensemble = FALSE, rand = NULL, ...)

    ## S3 method for class 'formula'
    survivalFS(formula, data, recdom = TRUE, ...)

    ## S3 method for class 'logicBagg'
    survivalFS(x, score = c("DPO", "Conc", "Brier", "PL"), adjusted = FALSE,
      neighbor = NULL, ensemble = FALSE, addMatImp = TRUE, rand = NULL, ...)
x |
a matrix consisting of 0's and 1's. Alternatively, |
y |
a vector of class |
B |
an integer specifying the number of iterations. |
replace |
should sampling of the cases be done with replacement? If
|
sub.frac |
a proportion specifying the fraction of the observations that
are used in each iteration to build a classification rule if |
score |
a character string naming the score that should be used in the computation of the importance measure for a survival time analysis. By default, the distance between predicted outcomes ( |
addMatImp |
should the matrix containing the improvements due to the prime implicants
in each of the iterations be added to the output if |
adjusted |
logical specifying whether the measures should be adjusted for noise. Often, the interaction actually associated with the response is not exactly found in some iterations of logic bagging, but an interaction is identified that additionally contains one (or, rarely, more) noise SNPs. If |
neighbor |
a list consisting of character vectors specifying SNPs that are in LD. If specified, all SNPs need to occur exactly once in this list, and the importance measures are adjusted for LD by considering the SNPs within an LD block as exchangeable. |
ensemble |
in the case of a survival outcome, should |
rand |
numeric value. If specified, the random number generator will be set into a reproducible state. |
formula |
an object of class |
data |
a data frame containing the variables in the model. Each row of |
recdom |
a logical value or vector of length |
... |
further arguments of |
An object of class logicFS containing
primes |
the prime implicants, |
vim |
the importance of the prime implicants, |
prop |
the proportion of logic regression models containing the prime implicants (or the neighbors of the prime implicants, if |
type |
the type of model (1: classification, 2: linear regression, 3: logistic regression, 4: Cox regression), |
param |
further parameters (if |
mat.imp |
either the matrix containing the improvements if |
measure |
the name of the used importance measure, |
neighbor |
|
useN |
the value of |
threshold |
NULL, |
mu |
NULL. |
Tobias Tietz, [email protected]
Tietz, T., Selinski, S., Golka, K., Hengstler, J.G., Gripp, S., Ickstadt, K., Ruczinski, I., Schwender, H. (2018). Identification of Interactions of Binary Variables Associated with Survival Time Using survivalFS. Submitted.
Computes the importances based on an approximation to a t- or F-distribution.
vim.approxPval(object, version = 1, adjust = "bonferroni")
object |
an object of class |
version |
either |
adjust |
character vector naming the method with which the raw permutation based
p-values are adjusted for multiplicity. If |
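As a rough illustration of what adjusting p-values for multiplicity with the default "bonferroni" means, the following sketch applies the base R function p.adjust to made-up raw p-values. Whether vim.approxPval calls p.adjust internally is an assumption here; the sketch only shows the effect of the correction.

```r
# Hypothetical raw p-values for four interactions.
raw_p <- c(0.001, 0.012, 0.030, 0.400)

# Bonferroni: multiply each p-value by the number of tests, capped at 1.
adj_p <- p.adjust(raw_p, method = "bonferroni")
adj_p
```

With four tests, each raw p-value is multiplied by four, so the last one is capped at 1.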
An object of class logicFS containing the same object as object except for
vim |
the values of the importance measure based on an approximation to the t- or F-distribution, |
measure |
the name of the used importance measure, |
threshold |
0.95 if |
Holger Schwender, [email protected]
Schwender, H., Ruczinski, I., Ickstadt, K. (2011). Testing SNPs and Sets of SNPs for Importance in Association Studies. Biostatistics, 12, 18-32.
logic.bagging, logicFS, vim.input, vim.set, vim.permSet
Determining the importance of interactions found by logic.bagging or logicFS
by Pearson's ChiSquare Statistic. Only available for the classification and the logistic
regression approach of logic regression.
vim.chisq(object, data = NULL, cl = NULL)
object |
either an object of class |
data |
a data frame or matrix consisting of 0's and 1's in which each column corresponds
to one of the explanatory variables used in the original analysis with |
cl |
a numeric vector of 0's and 1's specifying the class labels of the observations in |
Currently Pearson's ChiSquare statistic is computed without continuity correction.
Contrary to vim.logicFS
(and vim.norm
and vim.signperm
),
vim.chisq
does neither take the logic regression models into acount nor uses the out-of-bag
observations for computing the importances of the identified interactions. It "just" tests each
of the found interactions on the whole data set by calculating Pearson's ChiSquare statistic for
each of these interactions. It is, therefore, highly recommended to use an independent data set
for specifying the importances of these interactions with vim.chisq
.
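The computation described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: the helper `pearson_chisq`, the toy vectors `x1`, `x2`, `cl`, and the two candidate interactions are hypothetical, and the threshold follows the 1 - 0.05/m quantile listed for the returned object.

```r
# Pearson's chi-square statistic for a 2 x 2 table, without continuity correction.
# (Hypothetical helper; it assumes both levels of x and cl actually occur.)
pearson_chisq <- function(x, cl) {
  observed <- unclass(table(factor(x, levels = 0:1), factor(cl, levels = 0:1)))
  expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)
  sum((observed - expected)^2 / expected)
}

# Hypothetical toy data: two binary variables and 0/1 class labels.
x1 <- c(1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0)
x2 <- c(1, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0)
cl <- c(1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0)

# Two candidate interactions, each coded as a 0/1 vector.
interactions <- list(
  int1 = as.integer(x1 == 1 & x2 == 1),
  int2 = x1
)

# One statistic per interaction, computed on the whole data set.
stats <- sapply(interactions, pearson_chisq, cl = cl)

# Bonferroni-type threshold: 1 - 0.05/m quantile of the chi-square
# distribution with one degree of freedom, m = number of interactions.
m <- length(interactions)
threshold <- qchisq(1 - 0.05 / m, df = 1)

stats > threshold
```

For a 2 x 2 table, the helper should agree with `chisq.test(x, cl, correct = FALSE)$statistic`. Because the statistic is computed on the whole data set, the sketch inherits the caveat above: an independent data set is preferable.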
An object of class logicFS
containing
primes |
the prime implicants |
vim |
the values of Pearson's ChiSquare statistic, |
prop |
NULL, |
type |
NULL, |
param |
further parameters (if |
mat.imp |
NULL, |
measure |
"ChiSquare Based", |
threshold |
the 1 - 0.05/m quantile of the ChiSquare distribution with one degree of freedom, |
mu |
NULL. |
Holger Schwender, [email protected]
logic.bagging
, logicFS
,
vim.logicFS
, vim.norm
, vim.ebam
Determines the importance of interactions found by logic.bagging
or logicFS
by an Empirical Bayes Analysis of Microarrays (EBAM). Only available for the classification
and the logistic regression approach of logic regression.
vim.ebam(object, data = NULL, cl = NULL, storeEBAM = FALSE, ...)
object |
either an object of class |
data |
a data frame or matrix consisting of 0's and 1's in which each column corresponds
to one of the explanatory variables used in the original analysis with |
cl |
a numeric vector of 0's and 1's specifying the class labels of the observations in |
storeEBAM |
logical specifying whether the output of the EBAM analysis should be stored in the
output of |
... |
further arguments of |
For each interaction found by logic.bagging
or logicFS
, the posterior probability
that this interaction is significant is computed using the Empirical Bayes Analysis of Microarrays (EBAM).
These posterior probabilities are used as the EBAM based importances of the interactions.
The test statistic underlying this EBAM analysis is Pearson's ChiSquare statistic. Currently, the value of this statistic is computed without continuity correction.
Contrary to vim.logicFS
(and vim.norm
and vim.signperm
),
vim.ebam
neither takes the logic regression models into account nor uses the out-of-bag
observations for computing the importances of the identified interactions. It "just" tests each
of the found interactions on the whole data set by calculating Pearson's ChiSquare statistic for
each of these interactions and performing an EBAM analysis. It is, therefore, highly recommended
to use an independent data set for specifying the importances of these interactions with vim.ebam
.
An object of class logicFS
containing
primes |
the prime implicants, |
vim |
the posterior probabilities of the interactions, |
prop |
NULL, |
type |
NULL, |
param |
further parameters (if |
mat.imp |
NULL, |
measure |
"EBAM Based", |
threshold |
the value of |
mu |
NULL, |
ebam |
an object of class |
Holger Schwender, [email protected]
Schwender, H. and Ickstadt, K. (2008). Empirical Bayes Analysis of Single Nucleotide Polymorphisms. BMC Bioinformatics, 9:144.
logic.bagging
, logicFS
,
vim.logicFS
, vim.norm
, vim.chisq
Quantifies the importance of each input variable occurring in at least one
of the logic regression models found in the application of logic.bagging
.
vim.input(object, useN = NULL, iter = NULL, prop = TRUE, standardize = NULL, mu = 0, addMatImp = FALSE, prob.case = 0.5, rand = NA)
object |
an object of class |
useN |
logical specifying if the number of correctly classified out-of-bag observations should
be used in the computation of the importance measure. If |
iter |
integer specifying the number of times the values of the considered variable
are permuted in the computation of its importance. If |
prop |
should the proportion of logic regression models containing the respective variable also be computed? |
standardize |
should a standardized version of the importance measure for a set of variables
be returned? By default, |
mu |
a non-negative numeric value. Ignored if |
addMatImp |
should the matrix containing the improvements due to each of the variables in each of the logic regression models be added to the output? |
prob.case |
a numeric value between 0 and 1. If the logistic regression approach of logic
regression has been used in |
rand |
an integer for setting the random number generator in a reproducible state. |
An object of class logicFS
containing
vim |
the importances of the variables, |
prop |
the proportion of logic regression models containing the respective variable
(if |
primes |
the names of the variables, |
type |
the type of model (1: classification, 2: linear regression, 3: logistic regression), |
param |
further parameters (if |
mat.imp |
either a matrix containing the improvements due to the variables for each of the models
(if |
measure |
the name of the used importance measure, |
useN |
the value of |
threshold |
|
mu |
|
iter |
|
Holger Schwender, [email protected]
Schwender, H., Ruczinski, I., Ickstadt, K. (2011). Testing SNPs and Sets of SNPs for Importance in Association Studies. Biostatistics, 12, 18-32.
logic.bagging
, logicFS
,
vim.logicFS
, vim.set
, vim.ebam
, vim.chisq
Computes the value of the single or the multiple tree measure, respectively, for each prime implicant contained in a logic bagging model to specify the importance of the prime implicant for classification, if the response is binary. If the response is quantitative, the importance is specified by a measure based on the log2-transformed mean square prediction error. If the response is a time to an event, performance measures for time-to-event models are employed to determine the importance measures.
vim.logicFS(log.out, neighbor = NULL, adjusted = FALSE, useN = TRUE, onlyRemove = FALSE, prob.case = 0.5, addInfo = FALSE, score = c("DPO", "Conc", "Brier", "PL"), ensemble = FALSE, addMatImp = TRUE)
log.out |
an object of class |
neighbor |
a list consisting of character vectors specifying SNPs that are in LD. If specified, every SNP must occur exactly once in this list, and the importance measures are adjusted for LD by considering the SNPs within an LD block as exchangeable. |
adjusted |
logical specifying whether the measures should be adjusted for noise. Often, the interaction actually associated with the response is not exactly found in some iterations of logic bagging, but an interaction is identified that additionally contains one (or, rarely, more) noise SNPs. If |
useN |
logical specifying if the number of correctly classified out-of-bag observations should
be used in the computation of the importance measure. If |
onlyRemove |
should in the single tree case the multiple tree measure be used? If |
prob.case |
a numeric value between 0 and 1. If the logistic regression approach
of logic regression is used (i.e. if the response is binary, and in |
addInfo |
should further information on the logic regression models be added? |
score |
a character string naming the score that should be used in the computation of the importance measure for a survival time analysis. By default, the distance between predicted outcomes ( |
ensemble |
in the case of a survival outcome, should |
addMatImp |
should the matrix containing the improvements due to the prime implicants
in each of the iterations be added to the output? (For each of the prime implicants,
the importance is computed by the average over the |
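As a small illustration of the `neighbor` argument described above, the following base R sketch builds such a list and checks that every SNP occurs exactly once. The SNP names, the LD blocks, and the helper `check_neighbor` are hypothetical; whether the names must match the SNPs or their binary dummy variables should be checked against the data actually passed to `logic.bagging`.

```r
# Hypothetical SNP names.
snps <- paste0("SNP", 1:6)

# Hypothetical LD blocks: SNP1/SNP2 and SNP4/SNP5/SNP6 are assumed to be in LD.
# SNP3 is not in LD with any other SNP and forms its own block.
neighbor <- list(
  c("SNP1", "SNP2"),
  "SNP3",
  c("SNP4", "SNP5", "SNP6")
)

# Hypothetical helper: TRUE if the list contains only known SNPs
# and each SNP occurs exactly once.
check_neighbor <- function(neighbor, snps) {
  listed <- unlist(neighbor)
  all(listed %in% snps) && all(table(factor(listed, levels = snps)) == 1)
}

check_neighbor(neighbor, snps)
```

A list that omits a SNP or lists it in two blocks fails this check, which mirrors the requirement stated for `neighbor`.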
An object of class logicFS
containing
primes |
the prime implicants, |
vim |
the importance of the prime implicants, |
prop |
the proportion of logic regression models containing the prime implicants (or the neighbors of the prime implicants, if |
type |
the type of model (1: classification, 2: linear regression, 3: logistic regression, 4: Cox regression), |
param |
further parameters (if |
mat.imp |
either the matrix containing the improvements if |
measure |
the name of the used importance measure, |
neighbor |
|
useN |
the value of |
threshold |
NULL, |
mu |
NULL. |
Holger Schwender, [email protected]; Tobias Tietz, [email protected]
Schwender, H., Ickstadt, K. (2007). Identification of SNP Interactions Using Logic Regression. Biostatistics, 9(1), 187-198.
Tietz, T., Selinski, S., Golka, K., Hengstler, J.G., Gripp, S., Ickstadt, K., Ruczinski, I., Schwender, H. (2018). Identification of Interactions of Binary Variables Associated with Survival Time Using survivalFS. Submitted.
logic.bagging
, logicFS
,
vim.norm
, vim.signperm
Computes a standardized or a sign-permutation based version of either the Single Tree Measure, the Quantitative Response Measure, or the Multiple Tree Measure.
vim.norm(object, mu = 0)
vim.signperm(object, mu = 0, n.perm = 10000, n.subset = 1000, version = 1, adjust = "bonferroni", rand = NA)
object |
either the output of |
mu |
a non-negative numeric value against which the importances are tested. See |
n.perm |
the number of sign permutations used in |
n.subset |
an integer specifying how many permutations should be considered at once. |
version |
either |
adjust |
character vector naming the method with which the raw permutation based
p-values are adjusted for multiplicity. If |
rand |
an integer for setting the random number generator in a reproducible state. |
In both vim.norm
and vim.signperm
, a paired t-statistic is computed for each
prime implicant, where the numerator is given by VIM - mu
with VIM being the
single or the multiple tree importance, and the denominator is the corresponding standard
error computed by employing the B
improvements of the considered prime implicant
in the B
logic regression models, where VIM is the mean over these
B
improvements.
Note that in the case of a quantitative response, such a standardization is not necessary.
Thus, vim.norm
returns a warning when the response is quantitative, and vim.signperm
does not divide VIM - mu
by its sample standard error.
Using mu = 0
might lead to calling a prime implicant important, even though it actually
shows only improvements of 1 or 0. When considering the prime implicants, it might therefore
be helpful to set mu
to a value slightly larger than zero.
In vim.norm
, the value of this t-statistic is returned as the standardized importance
of a prime implicant. The larger this value, the more important is the prime implicant. (This applies
to all importance measures – at least for those contained in this package.) Assuming normality,
a possible threshold for a prime implicant to be considered as important is the 1 - 0.05/m quantile
of the t-distribution with B - 1
degrees of freedom, where m
is the number of prime implicants.
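The standardization can be sketched in base R for a single prime implicant. This is a minimal illustration, not the package's code: the improvements, `B`, and `m` are hypothetical values, and the sketch assumes that the numerator is the mean improvement minus `mu` and that the threshold is the 1 - 0.05/m quantile of the t-distribution with B - 1 degrees of freedom.

```r
# Hypothetical improvements of one prime implicant in B = 10 logic regression models.
improvements <- c(3, 5, 0, 4, 2, 6, 1, 3, 4, 2)
B <- length(improvements)
mu <- 0

# VIM is the mean over the B improvements.
vim <- mean(improvements)

# Standard error of the mean, based on the B improvements.
se <- sd(improvements) / sqrt(B)

# Standardized importance: (VIM - mu) divided by its standard error.
t_stat <- (vim - mu) / se

# Assumed threshold for m prime implicants (m = 5 is a hypothetical value).
m <- 5
threshold <- qt(1 - 0.05 / m, df = B - 1)

t_stat > threshold
```

The statistic coincides with the one-sample t-statistic that `t.test(improvements, mu = mu)` reports, which is a convenient cross-check.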
In vim.signperm
, the sign permutation is used to determine n.perm
permuted values of the
one-sample t-statistic, and to compute the raw p-values for each of the prime implicants. Afterwards,
these p-values are adjusted for multiple comparisons using the method specified by adjust
.
The permutation based importance of a prime implicant is then given by 1 minus these adjusted p-values.
Here, a possible threshold for calling a prime implicant important is 0.95.
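The sign-permutation idea can be illustrated in base R as follows. This is a rough sketch under assumptions, not the procedure implemented in `vim.signperm`: the matrix `imp` of improvements and the helper `t_one_sample` are hypothetical, the p-value is taken as the fraction of permuted statistics at least as large as the observed one, and the importance is assumed to be 1 minus the adjusted p-value so that the 0.95 threshold applies.

```r
set.seed(1)  # only to make the sketch reproducible

# Hypothetical improvements: B = 10 models (rows), 3 prime implicants (columns).
imp <- cbind(
  PI1 = c(3, 5, 0, 4, 2, 6, 1, 3, 4, 2),
  PI2 = c(1, -1, 0, 2, -2, 1, 0, -1, 1, 0),
  PI3 = c(2, 3, 1, 4, 2, 3, 2, 1, 3, 2)
)
mu <- 0
n.perm <- 2000

# One-sample t-statistic of a vector of (centered) improvements.
t_one_sample <- function(x) mean(x) / (sd(x) / sqrt(length(x)))

# Raw p-values: randomly flip the signs of the centered improvements
# and compare the permuted t-statistics with the observed one.
raw_p <- apply(imp - mu, 2, function(d) {
  observed <- t_one_sample(d)
  permuted <- replicate(n.perm, {
    signs <- sample(c(-1, 1), length(d), replace = TRUE)
    t_one_sample(signs * d)
  })
  mean(permuted >= observed)
})

# Adjust for multiple comparisons and turn the p-values into importances.
adj_p <- p.adjust(raw_p, method = "bonferroni")
importance <- 1 - adj_p

importance >= 0.95
```

Any method accepted by `p.adjust` could be substituted for `"bonferroni"` in this sketch.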
An object of class logicFS
containing
primes |
the prime implicants, |
vim |
the respective importance of the prime implicants, |
prop |
NULL, |
type |
the type of model (1: classification, 2: linear regression, 3: logistic regression), |
param |
further parameters (if |
mat.imp |
NULL, |
measure |
the name of the used importance measure, |
useN |
the value of |
threshold |
the threshold suggested in |
mu |
|
Holger Schwender, [email protected]
Schwender, H., Ruczinski, I., Ickstadt, K. (2011). Testing SNPs and Sets of SNPs for Importance in Association Studies. Biostatistics, 12, 18-32.
logic.bagging
, logicFS
,
vim.logicFS
, vim.chisq
, vim.ebam
Computes the importances of input variables, SNPs, or sets of SNPs, respectively, based on permutations of the response. Currently only available for the classification and the logistic regression approach of logic regression.
vim.permInput(object, n.perm = NULL, standardize = TRUE, rebuild = FALSE, prob.case = 0.5, useAll = FALSE, version = 1, adjust = "bonferroni", addMatPerm = FALSE, rand = NA)
vim.permSNP(object, n.perm = NULL, standardize = TRUE, rebuild = FALSE, prob.case = 0.5, useAll = FALSE, version = 1, adjust = "bonferroni", addMatPerm = FALSE, rand = NA)
vim.permSet(object, set = NULL, n.perm = NULL, standardize = TRUE, rebuild = FALSE, prob.case = 0.5, useAll = FALSE, version = 1, adjust = "bonferroni", addMatPerm = FALSE, rand = NA)
object |
an object of class |
set |
either a list or a character or numeric vector. If If a character or a numeric vector,
then the length of If |
n.perm |
number of permutations used in the computation of the importances. By default (i.e. if
|
standardize |
should the standardized importance measure be used? |
rebuild |
logical indicating whether the logic regression models should be rebuilt (i.e. the parameters
|
prob.case |
a numeric value between 0 and 1. If the logistic regression approach of logic
regression has been used in |
useAll |
logical indicating whether all |
version |
either |
adjust |
character vector naming the method with which the raw permutation based
p-values are adjusted for multiplicity. If |
addMatPerm |
should the ( |
rand |
an integer for setting the random number generator in a reproducible state. |
An object of class logicFS
containing
vim |
the values of the importance measure for the input variables, the SNPs, or the sets of SNPs, respectively, |
prop |
|
primes |
the names of the inputs, SNPs, or sets of variables, respectively, |
type |
the type of model (1: classification, 3: logistic regression), |
param |
|
mat.imp |
|
measure |
the name of the used importance measure, |
threshold |
0.95, i.e. the suggested threshold for calling an input, SNP or set of SNPs, respectively, important
(this is just used as default value when plotting the importances, see argument |
mu |
|
useN |
|
name |
either |
mat.perm |
if |
Holger Schwender, [email protected]
Schwender, H., Ruczinski, I., Ickstadt, K. (2011). Testing SNPs and Sets of SNPs for Importance in Association Studies. Biostatistics, 12, 18-32.
logic.bagging
, vim.input
, vim.set
, vim.signperm
Quantifies the importances of SNPs or sets of variables, respectively, contained in a logic bagging model.
vim.snp(object, useN = NULL, iter = NULL, standardize = NULL, mu = 0, addMatImp = FALSE, prob.case = 0.5, score = c("DPO", "Conc", "Brier", "PL"), ensemble = FALSE, rand = NULL)
vim.set(object, set = NULL, useN = NULL, iter = NULL, standardize = NULL, mu = 0, addMatImp = FALSE, prob.case = 0.5, score = c("DPO", "Conc", "Brier", "PL"), ensemble = FALSE, rand = NULL)
object |
an object of class |
set |
either a list or a character or numeric vector. If If a character or a numeric vector,
then the length of If |
useN |
logical specifying if the number of correctly classified out-of-bag observations should
be used in the computation of the importance measure. If |
iter |
integer specifying the number of times the values of the variables in the respective set
are permuted in the computation of the importance of this set. If |
standardize |
should a standardized version of the importance measure for a set of variables
be returned? By default, |
mu |
a non-negative numeric value. Ignored if |
addMatImp |
should the matrix containing the improvements due to each of the sets in each
of the logic regression models be added to the output? If |
prob.case |
a numeric value between 0 and 1. If the logistic regression approach of logic
regression has been used in |
score |
a character string naming the score that should be used in the computation of the importance measure for a survival time analysis. By default, the distance between predicted outcomes ( |
ensemble |
in the case of a survival outcome, should |
rand |
an integer for setting the random number generator in a reproducible state. |
An object of class logicFS
containing
vim |
the importances of the sets of variables, |
prop |
|
primes |
the names of the sets of variables, |
type |
the type of model (1: classification, 2: linear regression, 3: logistic regression, 4: Cox regression), |
param |
further parameters (if |
mat.imp |
either a matrix containing the improvements due to the sets of variables for each of the models
(if |
measure |
the name of the used importance measure, |
useN |
the value of |
threshold |
|
mu |
|
iter |
|
name |
|
Holger Schwender, [email protected]; Tobias Tietz, [email protected]
Schwender, H., Ruczinski, I., Ickstadt, K. (2011). Testing SNPs and Sets of SNPs for Importance in Association Studies. Biostatistics, 12, 18-32.
Tietz, T., Selinski, S., Golka, K., Hengstler, J.G., Gripp, S., Ickstadt, K., Ruczinski, I., Schwender, H. (2018). Identification of Interactions of Binary Variables Associated with Survival Time Using survivalFS. Submitted.
logic.bagging
, logicFS
,
vim.logicFS
, vim.input
, vim.ebam
, vim.chisq