| Title: | Machine Learning Interface for RNA-Seq Data |
|---|---|
| Description: | This package applies several machine learning methods, including SVM, bagSVM, Random Forest and CART to RNA-Seq data. |
| Authors: | Gokmen Zararsiz [aut, cre], Dincer Goksuluk [aut], Selcuk Korkmaz [aut], Vahap Eldem [aut], Izzet Parug Duru [ctb], Ahmet Ozturk [aut], Ahmet Ergun Karaagaoglu [aut, ths] |
| Maintainer: | Gokmen Zararsiz <[email protected]> |
| License: | GPL(>=2) |
| Version: | 2.31.0 |
| Built: | 2026-05-30 07:33:03 UTC |
| Source: | https://github.com/bioc/MLSeq |
This package applies machine learning methods, such as Support Vector Machines (SVM), Random Forest (RF),
Classification and Regression Trees (CART), Linear Discriminant Analysis (LDA) and more, to RNA-Seq data. MLSeq combines
well-known differential expression algorithms from Bioconductor packages with functions from caret, a widely used package
that provides a comprehensive set of machine learning algorithms for classification and regression tasks. Although caret has more than 200
built-in classification/regression algorithms, approximately 85 classification algorithms are used in MLSeq for classifying
gene-expression data. See availableMethods() for further information.
Dincer Goksuluk, Gokmen Zararsiz, Selcuk Korkmaz, Vahap Eldem, Ahmet Ozturk and Ahmet Ergun Karaagaoglu
Maintainers:
Dincer Goksuluk [email protected]
Gokmen Zararsiz [email protected]
Selcuk Korkmaz [email protected]
availableMethods, getModelInfo
| Package: | MLSeq |
|---|---|
| Type: | Package |
| License: | GPL (>= 2) |
MLSeq
This function returns a character vector of the classification/regression methods available in MLSeq. These methods
are imported from the caret package. See Details below.
availableMethods(model = NULL, regex = TRUE, ...)
printAvailableMethods()
model |
a character string indicating the name of classification model. If NULL, all the available methods from |
regex |
a logical: should regular expressions be used? If FALSE, a simple match is conducted against the whole name of the model. |
... |
options to pass to |
There are 200+ methods available in caret. Approximately 85 of them, those available for the "classification" task, are imported into MLSeq.
Some of these methods are available for both classification and regression tasks. availableMethods() returns a character vector
of the methods available in MLSeq. These names are passed directly to the classify function through its method argument.
See http://topepo.github.io/caret/available-models.html for a complete list of available methods in caret.
Run printAvailableMethods() to print detailed information about classification methods (prints to R Console).
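The matching rule behind the regex argument can be sketched in base R. This is a minimal illustration, assuming that regex = TRUE treats the model string as a regular-expression pattern (via grepl) and regex = FALSE requires an exact match on the whole name; the helper matchMethods and the short knownMethods vector are hypothetical stand-ins, not MLSeq internals or its actual method list.

```r
# Hypothetical short list of method names (illustration only, not MLSeq's list).
knownMethods <- c("rf", "rpart", "svmRadial", "svmLinear",
                  "voomNSC", "PLDA", "PLDA2", "NBLDA")

# Hypothetical helper mirroring the documented 'regex' semantics.
matchMethods <- function(model = NULL, regex = TRUE, choices = knownMethods) {
  if (is.null(model)) return(choices)      # NULL: return every method
  if (regex) {
    choices[grepl(model, choices)]         # pattern may match anywhere in the name
  } else {
    choices[choices == model]              # must equal the whole name
  }
}

matchMethods("svm")                  # "svmRadial" "svmLinear"
matchMethods("svm", regex = FALSE)   # character(0)
matchMethods("PLDA")                 # "PLDA" "PLDA2"
matchMethods("PLDA", regex = FALSE)  # "PLDA"
```

The last two calls show why regex = FALSE is useful: a pattern such as "PLDA" also matches "PLDA2", while an exact match returns only the single method named.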
a requested or complete character vector of available methods.
Available methods in MLSeq will be updated regularly: some methods may be removed and others may be added in their place.
Please check the available methods before fitting the model. This function is inspired
by the function getModelInfo() in caret, and some of its code chunks and help texts are reused here.
Cervical cancer data measures the expression of 714 miRNAs in human samples. There are 29 tumor and 29 non-tumor cervical samples, and these two groups are treated as two separate classes.
A data frame with 714 rows (miRNAs) and 58 columns (samples).
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2880020/#supplementary-material-sec
Witten, D., et al. (2010) Ultra-high throughput sequencing-based small RNA discovery and discrete statistical biomarker analysis in a collection of cervical tumours and matched controls. BMC Biology, 8:58
## Not run:
data(cervical)
## End(Not run)
This function fits classification algorithms to sequencing data and measures model performance using various statistics.
classify( data, method = "rpart", B = 25, ref = NULL, class.labels = NULL, preProcessing = c("deseq-vst", "deseq-rlog", "deseq-logcpm", "tmm-logcpm", "logcpm"), normalize = c("deseq", "TMM", "none"), control = NULL, ... )
data |
a |
method |
a character string indicating the name of classification method. Methods are implemented from the |
B |
an integer. It is the number of bootstrap samples for bagging classifiers, for example "bagFDA" and "treebag". Default is 25. |
ref |
a character string indicating the user defined reference class. Default is |
class.labels |
a character string indicating the column name of colData(...). Should be given as "character". The column from colData() which matches with given column name is used as class labels of samples. If NULL, first column is used as class labels. Default is NULL. |
preProcessing |
a character string indicating the name of the preprocessing method. This option covers both the normalization and the transformation of the raw sequencing data. Available options are "deseq-vst", "deseq-rlog", "deseq-logcpm", "tmm-logcpm" and "logcpm".
IMPORTANT: See Details for further information. |
normalize |
a character string indicating the type of normalization. Should be one of 'deseq', 'TMM' and 'none'. Default is 'deseq'. This option should be used with discrete and voom-based classifiers, since no transformation is applied to the raw counts. For caret-based classifiers, the argument 'preProcessing' should be used instead. |
control |
a list including all the control parameters passed to the model training process. This argument should be defined using wrapper functions
|
... |
optional arguments passed to selected classifiers. |
MLSeq includes both microarray-based and discrete-based classifiers along with the preprocessing approaches. These approaches include both normalization techniques, i.e. DESeq median ratio (Anders et al., 2010) and trimmed mean of M values (Robinson et al., 2010) normalization methods, and transformation techniques, i.e. variance-stabilizing transformation (vst) (Anders and Huber, 2010), regularized logarithmic transformation (rlog) (Love et al., 2014), logarithm of counts per million reads (log-cpm) (Robinson et al., 2010) and variance modeling at the observational level (voom) (Law et al., 2014). Users can directly upload their raw RNA-Seq count data, preprocess it, build one of the numerous classification models, optimize the model parameters and evaluate model performance.
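Two of the preprocessing quantities named above, DESeq median-ratio size factors and log-cpm values, can be sketched in a few lines of base R. This is an illustrative sketch under stated assumptions, not the code MLSeq runs. It assumes a genes x samples count matrix (as in the cervical data), drops genes with any zero count when computing the median ratios, and uses the log-cpm formula from limma's voom, log2((count + 0.5) / (library size + 1) * 1e6). The helpers medianRatioSizeFactors and logCPM are hypothetical; in practice DESeq2::estimateSizeFactors and limma::voom do this work.

```r
# Toy genes x samples count matrix (hypothetical data). S2 is exactly twice S1
# and S3 is half of S1, except for the last gene, which has a zero in S1.
counts <- cbind(S1 = c(10, 20, 40, 80, 0),
                S2 = c(20, 40, 80, 160, 3),
                S3 = c(5, 10, 20, 40, 1))

# Median-of-ratios size factors (after Anders and Huber, 2010).
medianRatioSizeFactors <- function(counts) {
  logGeoMeans <- rowMeans(log(counts))   # -Inf for any gene containing a zero
  use <- is.finite(logGeoMeans)          # keep genes with a finite geometric mean
  apply(counts[use, , drop = FALSE], 2,
        function(x) exp(median(log(x) - logGeoMeans[use])))
}

# log-cpm as defined in limma::voom.
logCPM <- function(counts) {
  libSize <- colSums(counts)
  log2(t((t(counts) + 0.5) / (libSize + 1)) * 1e6)
}

sf <- medianRatioSizeFactors(counts)
sf   # approximately 1, 2, 0.5
lc <- logCPM(counts)
```

Because S2 is exactly twice S1 and S3 half of it, the size factors come out as 1, 2 and 0.5. The zero-count gene is ignored by the median-ratio step but still receives a finite log-cpm value thanks to the 0.5 offset.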
The MLSeq package includes a variety of algorithms for the classification of RNA-Seq data. These classifiers fall into two categories: i) microarray-based classifiers applied after a proper transformation, and ii) discrete-based classifiers. The first option is to transform the RNA-Seq data to bring it closer to microarray data and then apply microarray-based algorithms. These methods are implemented from the caret package; run availableMethods() for a list of available methods. Note that the voom transformation exports both the transformed gene-expression matrix and a precision weight matrix of the same dimension, so the classifier should take both matrices into account. Zararsiz (2015) presented voom-based diagonal discriminant classifiers and the sparse voom-based nearest shrunken centroids classifier. The second option is to build new discrete-based classifiers for RNA-Seq data. Two such methods are currently available in the literature. Witten (2011) modeled these counts with a Poisson distribution, proposed the sparse Poisson linear discriminant analysis (PLDA) classifier, and suggested a power transformation to deal with overdispersion. Dong et al. (2016) extended this approach into a negative binomial linear discriminant analysis (NBLDA) classifier. More detailed information can be found in the referenced papers.
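To make the Poisson model concrete, the following base-R sketch implements a simplified version of the discriminant rule in Witten (2011). Counts are modeled as Poisson with mean s_i g_j d_kj, the class effects d_kj get a Gamma(beta, beta) prior (beta = 1, matching the beta default of discreteControl), and a new sample is assigned to the class with the highest score sum_j x_j log d_kj - s sum_j g_j d_kj + log pi_k. The sketch deliberately omits the sparsity penalty (rho) and the power transformation (alpha) and uses simple total-count size factors. The functions fitSimplePLDA and predictSimplePLDA are hypothetical; this is not MLSeq's PLDA implementation.

```r
# Simplified Poisson LDA after Witten (2011): no rho shrinkage, no power
# transformation. counts: genes x samples training matrix; y: class factor.
fitSimplePLDA <- function(counts, y, beta = 1) {
  N <- sum(counts)
  s <- colSums(counts) / N               # total-count size factor s_i per sample
  g <- rowSums(counts)                   # gene totals g_j
  classes <- levels(y)
  # Posterior mean of d_kj under a Gamma(beta, beta) prior: genes x classes.
  d <- sapply(classes, function(k) {
    inK <- y == k
    (rowSums(counts[, inK, drop = FALSE]) + beta) / (sum(s[inK]) * g + beta)
  })
  prior <- as.numeric(table(y)) / length(y)   # class proportions pi_k
  list(N = N, g = g, d = d, prior = prior, classes = classes)
}

predictSimplePLDA <- function(fit, newcounts) {
  sNew <- colSums(newcounts) / fit$N
  scores <- sapply(seq_along(fit$classes), function(k) {
    colSums(newcounts * log(fit$d[, k])) -     # sum_j x_j log d_kj
      sNew * sum(fit$g * fit$d[, k]) +         # - s * sum_j g_j d_kj
      log(fit$prior[k])                        # + log pi_k
  })
  scores <- matrix(scores, ncol = length(fit$classes))  # samples x classes
  factor(fit$classes[max.col(scores, ties.method = "first")],
         levels = fit$classes)
}

# Toy data (hypothetical): gene 1 is high in class A, gene 2 in class B.
counts <- cbind(c(50, 5, 20), c(60, 4, 22), c(55, 6, 18),
                c(5, 50, 20), c(4, 60, 21), c(6, 55, 19))
y <- factor(rep(c("A", "B"), each = 3))
fit <- fitSimplePLDA(counts, y)
newCounts <- cbind(c(40, 3, 15), c(2, 45, 16))
pred <- predictSimplePLDA(fit, newCounts)   # expected: A B
```

The full PLDA classifier additionally shrinks each d_kj toward 1 through the tuning parameter rho, which is what makes it sparse; that step is left out here to keep the core scoring rule visible.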
an MLSeq object for trained model.
Dincer Goksuluk, Gokmen Zararsiz, Selcuk Korkmaz, Vahap Eldem, Ahmet Ozturk and Ahmet Ergun Karaagaoglu
Kuhn M (2008). Building predictive models in R using the caret package. Journal of Statistical Software (http://www.jstatsoft.org/v28/i05/)
Anders S, Huber W (2010). Differential expression analysis for sequence count data. Genome Biology, 11:R106.
Witten DM (2011). Classification and clustering of sequencing data using a Poisson model. The Annals of Applied Statistics, 5(4), 2493-2518.
Law et al. (2014). voom: precision weights unlock linear model analysis tools for RNA-Seq read counts. Genome Biology, 15:R29. doi:10.1186/gb-2014-15-2-r29
Witten D, et al. (2010). Ultra-high throughput sequencing-based small RNA discovery and discrete statistical biomarker analysis in a collection of cervical tumours and matched controls. BMC Biology, 8:58.
Robinson MD, Oshlack A (2010). A scaling normalization method for differential expression analysis of RNA-Seq data. Genome Biology, 11:R25. doi:10.1186/gb-2010-11-3-r25
Love MI, Huber W, Anders S (2014). Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biology, 15(12):550. doi:10.1186/s13059-014-0550-8
Dong et al. (2016). NBLDA: negative binomial linear discriminant analysis for RNA-seq data. BMC Bioinformatics, 17(1):369, Sep 2016. doi:10.1186/s12859-016-1208-1
Zararsiz G (2015). Development and Application of Novel Machine Learning Approaches for RNA-Seq Data Classification. PhD thesis, Hacettepe University, Institute of Health Sciences, June 2015.
predictClassify, train, trainControl, voomControl, discreteControl
## Not run:
library(DESeq2)
data(cervical)

# a subset of cervical data with first 150 features.
data <- cervical[c(1:150), ]

# defining sample classes.
class <- data.frame(condition = factor(rep(c("N","T"), c(29, 29))))

n <- ncol(data)  # number of samples
p <- nrow(data)  # number of features

# number of samples for test set (30% test, 70% train).
nTest <- ceiling(n*0.3)
ind <- sample(n, nTest, FALSE)

# train set
data.train <- data[ ,-ind]
data.train <- as.matrix(data.train + 1)
classtr <- data.frame(condition = class[-ind, ])

# train set in S4 class
data.trainS4 <- DESeqDataSetFromMatrix(countData = data.train, colData = classtr, formula(~ 1))

## Number of repeats (repeats) might change model accuracies
## 1. caret-based classifiers:
# Random Forest (RF) Classification
rf <- classify(data = data.trainS4, method = "rf", preProcessing = "deseq-vst", ref = "T", control = trainControl(method = "repeatedcv", number = 5, repeats = 2, classProbs = TRUE))
rf

# 2. Discrete classifiers:
# Poisson Linear Discriminant Analysis
pmodel <- classify(data = data.trainS4, method = "PLDA", ref = "T", class.labels = "condition", normalize = "deseq", control = discreteControl(number = 5, repeats = 2, tuneLength = 10, parallel = TRUE))
pmodel

# 3. voom-based classifiers:
# voom-based Nearest Shrunken Centroids
vmodel <- classify(data = data.trainS4, normalize = "deseq", method = "voomNSC", class.labels = "condition", ref = "T", control = voomControl(number = 5, repeats = 2, tuneLength = 10))
vmodel
## End(Not run)
This slot stores the confusion matrix for the model trained with the classify function.
confusionMat(object)

## S4 method for signature 'MLSeq'
confusionMat(object)

## S4 method for signature 'MLSeqModelInfo'
confusionMat(object)
object |
an |
The confusionMat slot includes the cross-tabulation of observed and predicted classes
and corresponding statistics such as accuracy rate, sensitivity, specificity, etc. The returned object
is of class confusionMatrix from the caret package. See confusionMatrix for details.
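The headline statistics in such a table follow directly from the four cells of a two-class cross-tabulation. The base-R sketch below uses made-up observed and predicted labels (hypothetical data, not MLSeq output), arranges the table with predictions in rows and observed classes in columns, the orientation caret's confusionMatrix prints, and treats "T" as the positive class, as ref = "T" does in the examples.

```r
# Hypothetical observed and predicted labels for 10 samples.
observed  <- factor(c("T","T","T","T","N","N","N","N","N","N"), levels = c("T","N"))
predicted <- factor(c("T","T","T","N","N","N","N","N","T","N"), levels = c("T","N"))

# Predictions in rows, observed (reference) classes in columns.
tab <- table(Predicted = predicted, Observed = observed)

# "T" is the positive class.
TP <- tab["T", "T"]; FN <- tab["N", "T"]
FP <- tab["T", "N"]; TN <- tab["N", "N"]

accuracy    <- (TP + TN) / sum(tab)   # 0.8
sensitivity <- TP / (TP + FN)         # 0.75: share of true "T" samples detected
specificity <- TN / (TN + FP)         # 5/6: share of true "N" samples detected
```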
## Not run:
library(DESeq2)
data(cervical)

# a subset of cervical data with first 150 features.
data <- cervical[c(1:150), ]

# defining sample classes.
class <- data.frame(condition = factor(rep(c("N","T"), c(29, 29))))

n <- ncol(data)  # number of samples
p <- nrow(data)  # number of features

# number of samples for test set (30% test, 70% train).
nTest <- ceiling(n*0.3)
ind <- sample(n, nTest, FALSE)

# train set
data.train <- data[ ,-ind]
data.train <- as.matrix(data.train + 1)
classtr <- data.frame(condition = class[-ind, ])

# train set in S4 class
data.trainS4 <- DESeqDataSetFromMatrix(countData = data.train, colData = classtr, formula(~ 1))

## Number of repeats (repeats) might change model accuracies
# Classification and Regression Tree (CART) Classification
cart <- classify(data = data.trainS4, method = "rpart", ref = "T", preProcessing = "deseq-vst", control = trainControl(method = "repeatedcv", number = 5, repeats = 3, classProbs = TRUE))

confusionMat(cart)
## End(Not run)
This slot stores the control parameters of the selected classification model.
control(object)
control(object) <- value

## S4 method for signature 'MLSeq'
control(object)

## S4 replacement method for signature 'MLSeq,list'
control(object) <- value
object |
an |
value |
a list with elements for controlling trained model. It should be a list returned from one of
|
discreteControl, voomControl, trainControl
## Not run:
library(DESeq2)
data(cervical)

# a subset of cervical data with first 150 features.
data <- cervical[c(1:150), ]

# defining sample classes.
class <- data.frame(condition = factor(rep(c("N","T"), c(29, 29))))

n <- ncol(data)  # number of samples
p <- nrow(data)  # number of features

# number of samples for test set (30% test, 70% train).
nTest <- ceiling(n*0.3)
ind <- sample(n, nTest, FALSE)

# train set
data.train <- data[ ,-ind]
data.train <- as.matrix(data.train + 1)
classtr <- data.frame(condition = class[-ind, ])

# train set in S4 class
data.trainS4 <- DESeqDataSetFromMatrix(countData = data.train, colData = classtr, formula(~ 1))

## Number of repeats (repeats) might change model accuracies
# Classification and Regression Tree (CART) Classification
cart <- classify(data = data.trainS4, method = "rpart", ref = "T", preProcessing = "deseq-vst", control = trainControl(method = "repeatedcv", number = 5, repeats = 3, classProbs = TRUE))

control(cart)
## End(Not run)
discrete.train object
This object is a subclass of the MLSeq.train class. It contains trained model information for discrete
classifiers such as Poisson Linear Discriminant Analysis (PLDA) and Negative Binomial Linear Discriminant Analysis (NBLDA).
inputs: a list with elements used as input for the classification task.
control: a list with control parameters for discrete classifiers, e.g. PLDA, PLDA2 and NBLDA.
crossValidatedModel: a list. It stores the cross-validation results.
finalModel: a list. This is the trained model with optimum parameters.
tuningResults: a list. It stores the tuning results if the selected classifier has one or more parameters to be optimized.
callInfo: a list. Call information for the selected method.
This function sets the control parameters for discrete classifiers (PLDA and NBLDA) while training the model.
discreteControl( method = "repeatedcv", number = 5, repeats = 10, rho = NULL, rhos = NULL, beta = 1, prior = NULL, alpha = NULL, truephi = NULL, foldIdx = NULL, tuneLength = 30, parallel = FALSE, ... )
method |
validation method. Only repeated cross-validation ("repeatedcv") is supported. |
number |
a positive integer. Number of folds. |
repeats |
a positive integer. Number of repeats. |
rho |
a single numeric value. This parameter is used as a tuning parameter in the PLDA classifier. It does not affect the NBLDA classifier. |
rhos |
a numeric vector. If optimum parameter is searched among given values, this option should be used. |
beta |
parameter of Gamma distribution. See PLDA for details. |
prior |
a numeric vector of prior probabilities for each class. |
alpha |
a numeric value between 0 and 1. It is used to apply the power transformation in the PLDA method. |
truephi |
a numeric value. If true value of genewise dispersion is known and constant for all genes, this parameter should be used. |
foldIdx |
a list including the fold indexes. Each element of this list is a vector of indices of the samples used as the test set in that fold. |
tuneLength |
a positive integer. If the classifier has a tuning parameter, this value defines the total number of tuning parameter values to be searched. |
parallel |
if TRUE, parallel computing is performed. |
... |
further arguments. Deprecated. |
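The shape expected for foldIdx can be illustrated with a small base-R sketch that builds test-set indices for repeated k-fold cross-validation: one list element per fold and repeat, each holding the indices of the samples held out in that fold. The helper makeFoldIdx and its element names are hypothetical, and this random assignment is not necessarily how MLSeq creates its own folds when foldIdx = NULL. Here n = 40 matches the training-set size in the examples (58 samples minus ceiling(58 * 0.3) = 18 test samples).

```r
# Hypothetical helper: test-set indices for repeated k-fold cross-validation.
makeFoldIdx <- function(n, number = 5, repeats = 10) {
  folds <- list()
  for (r in seq_len(repeats)) {
    # Randomly assign each sample to one of 'number' folds of (near) equal size.
    assignment <- sample(rep(seq_len(number), length.out = n))
    for (k in seq_len(number)) {
      folds[[sprintf("Fold%d.Rep%d", k, r)]] <- which(assignment == k)
    }
  }
  folds
}

set.seed(1)
idx <- makeFoldIdx(n = 40, number = 5, repeats = 2)
length(idx)   # 10 test sets: 5 folds x 2 repeats
```

Within each repeat the test sets are disjoint and together cover every sample once, which is the property a user-supplied foldIdx would typically be expected to have.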
Dincer Goksuluk, Gokmen Zararsiz, Selcuk Korkmaz, Vahap Eldem, Ahmet Ozturk and Ahmet Ergun Karaagaoglu
classify, trainControl, discreteControl
MLSeq object
The MLSeq package uses the DESeqDataSet structure from the Bioconductor package DESeq2 to store gene
expression data in a comprehensive structure. This object is used as input for the classification task through classify.
The input is stored in the inputObject slot of the MLSeq object.
input(object)

## S4 method for signature 'MLSeq'
input(object)
object |
an |
## Not run:
library(DESeq2)
data(cervical)

# a subset of cervical data with first 150 features.
data <- cervical[c(1:150), ]

# defining sample classes.
class <- data.frame(condition = factor(rep(c("N","T"), c(29, 29))))

n <- ncol(data)  # number of samples
p <- nrow(data)  # number of features

# number of samples for test set (30% test, 70% train).
nTest <- ceiling(n*0.3)
ind <- sample(n, nTest, FALSE)

# train set
data.train <- data[ ,-ind]
data.train <- as.matrix(data.train + 1)
classtr <- data.frame(condition = class[-ind, ])

# train set in S4 class
data.trainS4 <- DESeqDataSetFromMatrix(countData = data.train, colData = classtr, formula(~ 1))

## Number of repeats (repeats) might change model accuracies
# Classification and Regression Tree (CART) Classification
cart <- classify(data = data.trainS4, method = "rpart", ref = "T", preProcessing = "deseq-vst", control = trainControl(method = "repeatedcv", number = 5, repeats = 3, classProbs = TRUE))

input(cart)
## End(Not run)
These functions are used to check whether an MLSeq object has been modified and/or updated. It is possible to update
the classification parameters of an MLSeq object returned by the classify() function.
isUpdated(object)
isUpdated(object) <- value
isModified(object)
isModified(object) <- value

## S4 method for signature 'MLSeq'
isUpdated(object)

## S4 replacement method for signature 'MLSeq,logical'
isUpdated(object) <- value

## S4 method for signature 'MLSeq'
isModified(object)

## S4 replacement method for signature 'MLSeq,logical'
isModified(object) <- value
object |
an |
value |
a logical. Change the state of update info. |
a logical.
## Not run:
library(DESeq2)
data(cervical)

# a subset of cervical data with first 150 features.
data <- cervical[c(1:150), ]

# defining sample classes.
class <- data.frame(condition = factor(rep(c("N","T"), c(29, 29))))

n <- ncol(data)  # number of samples
p <- nrow(data)  # number of features

# number of samples for test set (30% test, 70% train).
nTest <- ceiling(n*0.3)
ind <- sample(n, nTest, FALSE)

# train set
data.train <- data[ ,-ind]
data.train <- as.matrix(data.train + 1)
classtr <- data.frame(condition = class[-ind, ])

# train set in S4 class
data.trainS4 <- DESeqDataSetFromMatrix(countData = data.train, colData = classtr, formula(~ 1))

## Number of repeats (repeats) might change model accuracies
# Classification and Regression Tree (CART) Classification
cart <- classify(data = data.trainS4, method = "rpart", ref = "T", preProcessing = "deseq-vst", control = trainControl(method = "repeatedcv", number = 5, repeats = 3, classProbs = TRUE))

isUpdated(cart)
isModified(cart)
## End(Not run)
MLSeq object
This slot stores the metadata information of an MLSeq object.
metaData(object)

## S4 method for signature 'MLSeq'
metaData(object)
object |
an |
## Not run:
library(DESeq2)
data(cervical)

# a subset of cervical data with first 150 features.
data <- cervical[c(1:150), ]

# defining sample classes.
class <- data.frame(condition = factor(rep(c("N","T"), c(29, 29))))

n <- ncol(data)  # number of samples
p <- nrow(data)  # number of features

# number of samples for test set (30% test, 70% train).
nTest <- ceiling(n*0.3)
ind <- sample(n, nTest, FALSE)

# train set
data.train <- data[ ,-ind]
data.train <- as.matrix(data.train + 1)
classtr <- data.frame(condition = class[-ind, ])

# train set in S4 class
data.trainS4 <- DESeqDataSetFromMatrix(countData = data.train, colData = classtr, formula(~ 1))

## Number of repeats (repeats) might change model accuracies
# Classification and Regression Tree (CART) Classification
cart <- classify(data = data.trainS4, method = "rpart", ref = "T", preProcessing = "deseq-vst", control = trainControl(method = "repeatedcv", number = 5, repeats = 3, classProbs = TRUE))

metaData(cart)
## End(Not run)
This slot stores the name of selected model which is used in classify function.
The trained model is stored in slot trainedModel.
See trained for details.
method(object)
method(object) <- value

## S4 method for signature 'MLSeq'
method(object)

## S4 method for signature 'MLSeqModelInfo'
method(object)

## S4 replacement method for signature 'MLSeq,character'
method(object) <- value
object |
an |
value |
a character string. One of the available classification methods to replace with current method stored in MLSeq object. |
method slot stores the name of the classification method such as "svmRadial" for Radial-based Support Vector Machines, "rf" for Random Forests, "voomNSC" for
voom-based Nearest Shrunken Centroids, etc. For the complete list of available methods, see printAvailableMethods and availableMethods.
## Not run:
library(DESeq2)
data(cervical)

# a subset of cervical data with first 150 features.
data <- cervical[c(1:150), ]

# defining sample classes.
class <- data.frame(condition = factor(rep(c("N","T"), c(29, 29))))

n <- ncol(data)  # number of samples
p <- nrow(data)  # number of features

# number of samples for test set (30% test, 70% train).
nTest <- ceiling(n*0.3)
ind <- sample(n, nTest, FALSE)

# train set
data.train <- data[ ,-ind]
data.train <- as.matrix(data.train + 1)
classtr <- data.frame(condition = class[-ind, ])

# train set in S4 class
data.trainS4 <- DESeqDataSetFromMatrix(countData = data.train, colData = classtr, formula(~ 1))

## Number of repeats (repeats) might change model accuracies
# Classification and Regression Tree (CART) Classification
cart <- classify(data = data.trainS4, method = "rpart", ref = "T", preProcessing = "deseq-vst", control = trainControl(method = "repeatedcv", number = 5, repeats = 3, classProbs = TRUE))

method(cart)
## End(Not run)
MLSeq object
For classification, this is the main class of the MLSeq package. It contains all the information, including the trained model,
selected genes, cross-validation results, etc.
Objects can be created by calls of the form new("MLSeq", ...). Objects of this type
are created as a result of the classify function of the MLSeq package.
They are then used in the predict or predictClassify function to predict the class labels of new samples.
inputObject: stores the data in a DESeqDataSet object.
modelInfo: stores all the information about the classification model. The object is from the subclass MLSeqModelInfo. See MLSeqModelInfo-class for details.
metaData: metadata for the MLSeq object. The object is from the subclass MLSeqMetaData. See MLSeqMetaData-class for details.
An MLSeq object stores the results of the classify function and offers further slots that are populated
during the analysis. The slot inputObject stores the raw and transformed data used throughout the classification. The slot
modelInfo stores all the information about the classification model. These results may include the classification table
and performance measures such as accuracy rate, sensitivity, specificity, and positive and negative predictive values. It also
contains information on the classification method and on the normalization and transformation used in the classification model.
Lastly, the slot metaData stores information about modified or updated slots in the MLSeq object.
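The performance measures listed above can be illustrated with a small base-R sketch. The helper confusionStats() and the toy labels are hypothetical and not part of MLSeq; the sketch covers the two-class case only and treats the reference category (here "T") as the positive class when counting true and false positives and negatives.

```r
# Hypothetical helper (not part of MLSeq), two-class case only:
# `ref` is treated as the positive (case) category.
confusionStats <- function(observed, predicted, ref) {
  lev <- union(levels(factor(observed)), levels(factor(predicted)))
  lev <- c(ref, setdiff(lev, ref))           # reference category first
  tab <- table(Predicted = factor(predicted, levels = lev),
               Observed  = factor(observed,  levels = lev))
  tp <- tab[1, 1]; fn <- tab[2, 1]           # samples observed in `ref`
  fp <- tab[1, 2]; tn <- tab[2, 2]           # samples observed in the other class
  list(table       = tab,
       accuracy    = (tp + tn) / sum(tab),
       sensitivity = tp / (tp + fn),
       specificity = tn / (tn + fp),
       ppv         = tp / (tp + fp),
       npv         = tn / (tn + fn))
}

# Toy labels: 4 tumor ("T") and 4 normal ("N") samples.
observed  <- factor(c("T", "T", "T", "T", "N", "N", "N", "N"))
predicted <- factor(c("T", "T", "T", "T", "N", "N", "T", "T"))
stats <- confusionStats(observed, predicted, ref = "T")
stats$table
unlist(stats[-1])
```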
Dincer Goksuluk, Gokmen Zararsiz, Selcuk Korkmaz, Vahap Eldem, Ahmet Ozturk and Ahmet Ergun Karaagaoglu
MLSeqModelInfo-class, MLSeqMetaData-class
MLSeqMetaData object
This object is a subclass for the MLSeq class. It contains metadata, i.e., information on modified and/or
updated elements, the raw data, etc.
Objects can be created by calls of the form new("MLSeqMetaData", ...). Objects of this type
are created as a result of the classify function of the MLSeq package.
They are then used by the update function to update the given object.
updated, modified: logical flags. See notes for details.
modified.elements: a list containing the modified elements of the MLSeq object.
rawData.DESeqDataSet: the raw data used for classification.
classLabel: a character string indicating the name of the class variable.
The function update is used to re-run the classification task with the modified elements of an MLSeq object. This function is
useful when one wishes to perform the classification task with modified options without running the classify function from the beginning.
The MLSeqMetaData object stores information on updated and/or modified elements of an MLSeq object.
If an MLSeq object is modified, i.e., one or more of its elements are replaced using the related setter functions such as
method, ref, etc., the slot modified becomes TRUE. Similarly, the slot updated records whether the MLSeq object has been
updated (i.e., whether the classification task has been re-run). If the updated slot is FALSE and the modified slot is TRUE, one
should run update to obtain classification results that reflect the modified elements.
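The interplay of the modified and updated flags can be sketched with a toy S4 class. ToyMeta, toyMethod<- and needsUpdate() are hypothetical names that only mimic the behaviour described above; they are not the MLSeqMetaData implementation.

```r
library(methods)

# Hypothetical toy class: it only mimics the two logical flags described above
# and is not the real MLSeqMetaData class.
setClass("ToyMeta",
         slots = c(method = "character", modified = "logical", updated = "logical"),
         prototype = list(method = "rpart", modified = FALSE, updated = FALSE))

# Setter in the spirit of method(object) <- value: it changes an option and
# flags the object as modified but not yet updated.
setGeneric("toyMethod<-", function(object, value) standardGeneric("toyMethod<-"))
setReplaceMethod("toyMethod", "ToyMeta", function(object, value) {
  object@method <- value
  object@modified <- TRUE
  object@updated <- FALSE
  object
})

# Results are stale when an element was modified but the task was not re-run.
needsUpdate <- function(object) object@modified && !object@updated

obj <- new("ToyMeta")
needsUpdate(obj)        # FALSE: nothing has been modified yet
toyMethod(obj) <- "rf"
needsUpdate(obj)        # TRUE: an update would be needed to refresh the results
```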
MLSeqModelInfo object
For classification, this is the subclass for the MLSeq class. This object contains all the information about the classification model.
Objects can be created by calls of the form MLSeqModelInfo(...). Objects of this type
are created as a result of the classify function of the MLSeq package.
They are then used in the predictClassify function to predict the class labels of new samples.
method, transformation, normalization: these slots store the classification method, transformation technique and normalization method, respectively. See notes for details.
preProcessing: See classify for details.
ref: a character string indicating the reference category for cases (diseased subject, tumor sample, etc.)
control: a list with controlling parameters for the classification task.
confusionMat: the confusion table and accuracy measures for the predictions.
trainedModel: an object of MLSeq.train class. It contains the trained model. See notes for details.
trainParameters: a list with training parameters from the final model. These parameters are applied to the test set before predicting class labels.
call: a call object for the classification task.
The method, transformation and normalization slots give information on the classifier and on the transformation and normalization techniques.
Since not all combinations of transformation and normalization are available in practice, the appropriate combination of
transformation and normalization is specified with the preProcessing argument of the classify function. The information on normalization
and transformation is then extracted from the preProcessing argument.
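As a rough illustration of how a single preProcessing string can carry both pieces of information, the hypothetical helper below splits it at the hyphen. The option list is an assumption based on the values accepted by classify in recent MLSeq versions, and reading a single token such as "logcpm" as "no normalization" is also an assumption; check ?classify for the exact mapping.

```r
# Hypothetical helper (not part of MLSeq): split a preProcessing string such as
# "deseq-vst" into its normalization and transformation parts.
splitPreProcessing <- function(preProcessing) {
  parts <- strsplit(preProcessing, "-", fixed = TRUE)[[1]]
  if (length(parts) == 1L) {
    # Assumption: a single token (e.g. "logcpm") means transformation only.
    parts <- c("none", parts)
  }
  c(normalization = parts[1], transformation = parts[2])
}

# Assumed option list; verify against ?classify for your MLSeq version.
opts <- c("deseq-vst", "deseq-rlog", "deseq-logcpm", "tmm-logcpm", "logcpm")
t(sapply(opts, splitPreProcessing))
```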
MLSeq.train is a union class of train from the caret package and voom.train and discrete.train from the MLSeq package. See the related class
manuals for details.
train, voom.train-class, discrete.train-class
MLSeq object
This slot stores all the information about the classification model.
modelInfo(object)
## S4 method for signature 'MLSeq'
modelInfo(object)
| object | an MLSeq object returned from classify. |
|---|---|
## Not run:
library(MLSeq)
library(DESeq2)
data(cervical)
# a subset of cervical data with the first 150 features.
data <- cervical[c(1:150), ]
# defining sample classes.
class <- data.frame(condition = factor(rep(c("N", "T"), c(29, 29))))
n <- ncol(data)  # number of samples
p <- nrow(data)  # number of features
# number of samples for the test set (30% test, 70% train).
nTest <- ceiling(n * 0.3)
set.seed(2128)  # makes the random train/test split reproducible
ind <- sample(n, nTest, FALSE)
# train set
data.train <- data[, -ind]
data.train <- as.matrix(data.train + 1)
classtr <- data.frame(condition = class[-ind, ])
# train set in S4 class
data.trainS4 <- DESeqDataSetFromMatrix(countData = data.train, colData = classtr,
                                       design = formula(~ 1))
## Number of repeats (repeats) might change model accuracies ##
# Classification and Regression Tree (CART) Classification
cart <- classify(data = data.trainS4, method = "rpart",
                 ref = "T", preProcessing = "deseq-vst",
                 control = trainControl(method = "repeatedcv", number = 5,
                                        repeats = 3, classProbs = TRUE))
modelInfo(cart)
## End(Not run)
This slot stores the name of the normalization method used to normalize the count data, such as "deseq", "tmm" or "none".
normalization(object)
normalization(object) <- value
## S4 method for signature 'MLSeq'
normalization(object)
## S4 method for signature 'MLSeqModelInfo'
normalization(object)
## S4 replacement method for signature 'MLSeq,character'
normalization(object) <- value
| object | an MLSeq object returned from classify. |
|---|---|
| value | a character string. One of the available normalization methods for voom-based classifiers. |
## Not run:
library(MLSeq)
library(DESeq2)
data(cervical)
# a subset of cervical data with the first 150 features.
data <- cervical[c(1:150), ]
# defining sample classes.
class <- data.frame(condition = factor(rep(c("N", "T"), c(29, 29))))
n <- ncol(data)  # number of samples
p <- nrow(data)  # number of features
# number of samples for the test set (30% test, 70% train).
nTest <- ceiling(n * 0.3)
set.seed(2128)  # makes the random train/test split reproducible
ind <- sample(n, nTest, FALSE)
# train set
data.train <- data[, -ind]
data.train <- as.matrix(data.train + 1)
classtr <- data.frame(condition = class[-ind, ])
# train set in S4 class
data.trainS4 <- DESeqDataSetFromMatrix(countData = data.train, colData = classtr,
                                       design = formula(~ 1))
## Number of repeats (repeats) might change model accuracies ##
# Classification and Regression Tree (CART) Classification
cart <- classify(data = data.trainS4, method = "rpart",
                 ref = "T", preProcessing = "deseq-vst",
                 control = trainControl(method = "repeatedcv", number = 5,
                                        repeats = 3, classProbs = TRUE))
normalization(cart)
## End(Not run)
This generic function plots the accuracy results of an MLSeq object returned by the
classify function.
## S3 method for class 'MLSeq'
plot(x, y, ...)
## S4 method for signature 'MLSeq,ANY'
plot(x, y, ...)
| x | an MLSeq object returned from classify. |
|---|---|
| y | this parameter is not used. Deprecated. |
| ... | further arguments. Deprecated. |
Dincer Goksuluk, Gokmen Zararsiz, Selcuk Korkmaz, Vahap Eldem, Ahmet Ozturk and Ahmet Ergun Karaagaoglu
classify() object
This function predicts the class labels of test data for a given model.
The predictClassify and predict functions return the predicted class information along with the trained model.
Predicted values are given either as class labels or as estimated probabilities of each class for
each sample. If type = "raw", as in the example below, the predictions are
extracted as raw class labels. To extract estimated class probabilities, follow the steps below:
set classProbs = TRUE within the control argument of classify
set type = "prob" within predictClassify
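When classProbs = TRUE, the raw labels and the class probabilities are two views of the same prediction. Assuming the classifier uses the usual most-probable-class rule, the sketch below recovers labels from a hypothetical probability matrix shaped like the output of type = "prob" (one row per test sample, one column per class).

```r
# Hypothetical class probabilities for three test samples and classes N and T.
prob <- matrix(c(0.80, 0.20,
                 0.35, 0.65,
                 0.10, 0.90),
               ncol = 2, byrow = TRUE,
               dimnames = list(paste0("S", 1:3), c("N", "T")))

# Assumption: the predicted label is the class with the largest probability.
labels <- factor(colnames(prob)[max.col(prob, ties.method = "first")],
                 levels = colnames(prob))
labels
```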
## S3 method for class 'MLSeq'
predict(object, test.data, ...)
predictClassify(object, test.data, ...)
## S4 method for signature 'MLSeq'
predict(object, test.data, ...)
| object | a model of MLSeq class returned by classify. |
|---|---|
| test.data | a DESeqDataSet instance of new observations. |
| ... | further arguments to be passed to or from methods. These arguments are used in |
MLSeqObject: an MLSeq object returned from classify. See details.
Predictions: a data frame or vector including either the predicted class
probabilities or the class labels of the given test data.
The predictClassify(...) function was used in MLSeq up to package version 1.14.x. This function is aliased with the
generic function predict. In upcoming versions of the MLSeq package, the predictClassify function will be omitted. The default
function for predicting new observations will be predict from version 1.16.x onward.
Dincer Goksuluk, Gokmen Zararsiz, Selcuk Korkmaz, Vahap Eldem, Ahmet Ozturk and Ahmet Ergun Karaagaoglu
## Not run:
library(MLSeq)
library(DESeq2)
data(cervical)
# a subset of cervical data with the first 150 features.
data <- cervical[c(1:150), ]
# defining sample classes.
class <- data.frame(condition = factor(rep(c("N", "T"), c(29, 29))))
n <- ncol(data)  # number of samples
p <- nrow(data)  # number of features
# number of samples for the test set (30% test, 70% train).
nTest <- ceiling(n * 0.3)
set.seed(2128)  # makes the random train/test split reproducible
ind <- sample(n, nTest, FALSE)
# train set
data.train <- data[, -ind]
data.train <- as.matrix(data.train + 1)
classtr <- data.frame(condition = class[-ind, ])
# train set in S4 class
data.trainS4 <- DESeqDataSetFromMatrix(countData = data.train, colData = classtr,
                                       design = formula(~ 1))
# test set
data.test <- data[, ind]
data.test <- as.matrix(data.test + 1)
classts <- data.frame(condition = class[ind, ])
data.testS4 <- DESeqDataSetFromMatrix(countData = data.test, colData = classts,
                                      design = formula(~ 1))
## Number of repeats (repeats) might change model accuracies ##
# Classification and Regression Tree (CART) Classification
cart <- classify(data = data.trainS4, method = "rpart",
                 ref = "T", preProcessing = "deseq-vst",
                 control = trainControl(method = "repeatedcv", number = 5,
                                        repeats = 3, classProbs = TRUE))
cart
# predicted classes of test samples for the CART method (class probabilities)
pred.cart <- predictClassify(cart, data.testS4, type = "prob")
pred.cart
# predicted classes of test samples for the CART method (class labels)
pred.cart <- predictClassify(cart, data.testS4, type = "raw")
pred.cart
## End(Not run)
MLSeq object
The MLSeq package uses the DESeqDataSet structure from the Bioconductor package DESeq2 to store gene
expression data in a comprehensive structure. This object is used as input for the classification task through classify.
The input is stored in the inputObject slot of the MLSeq object.
preProcessing(object)
preProcessing(object) <- value
## S4 method for signature 'MLSeq'
preProcessing(object)
## S4 replacement method for signature 'MLSeq,character'
preProcessing(object) <- value
| object | an MLSeq object returned from classify. |
|---|---|
| value | a character string. The preProcessing option that should replace the current one. |
## Not run:
library(MLSeq)
library(DESeq2)
data(cervical)
# a subset of cervical data with the first 150 features.
data <- cervical[c(1:150), ]
# defining sample classes.
class <- data.frame(condition = factor(rep(c("N", "T"), c(29, 29))))
n <- ncol(data)  # number of samples
p <- nrow(data)  # number of features
# number of samples for the test set (30% test, 70% train).
nTest <- ceiling(n * 0.3)
set.seed(2128)  # makes the random train/test split reproducible
ind <- sample(n, nTest, FALSE)
# train set
data.train <- data[, -ind]
data.train <- as.matrix(data.train + 1)
classtr <- data.frame(condition = class[-ind, ])
# train set in S4 class
data.trainS4 <- DESeqDataSetFromMatrix(countData = data.train, colData = classtr,
                                       design = formula(~ 1))
## Number of repeats (repeats) might change model accuracies ##
# Classification and Regression Tree (CART) Classification
cart <- classify(data = data.trainS4, method = "rpart",
                 ref = "T", preProcessing = "deseq-vst",
                 control = trainControl(method = "repeatedcv", number = 5,
                                        repeats = 3, classProbs = TRUE))
preProcessing(cart)
## End(Not run)
This function prints the confusion matrix of the model.
## S3 method for class 'confMat'
print(x, ..., mode = x$mode, digits = max(3, getOption("digits") - 3))
## S4 method for signature 'confMat'
print(x, ..., mode = x$mode, digits = max(3, getOption("digits") - 3))
| x | an object of class confMat. |
|---|---|
| ... | further arguments to be passed to |
| mode | |
| digits | the number of significant digits used when printing. |
This slot stores information about the reference category. The confusion matrix and related statistics are calculated using the user-defined reference category.
ref(object)
ref(object) <- value
## S4 method for signature 'MLSeq'
ref(object)
## S4 method for signature 'MLSeqModelInfo'
ref(object)
## S4 replacement method for signature 'MLSeq,character'
ref(object) <- value
| object | an MLSeq object returned from classify. |
|---|---|
| value | a character string selecting the reference category of the class labels. |
## Not run:
library(MLSeq)
library(DESeq2)
data(cervical)
# a subset of cervical data with the first 150 features.
data <- cervical[c(1:150), ]
# defining sample classes.
class <- data.frame(condition = factor(rep(c("N", "T"), c(29, 29))))
n <- ncol(data)  # number of samples
p <- nrow(data)  # number of features
# number of samples for the test set (30% test, 70% train).
nTest <- ceiling(n * 0.3)
set.seed(2128)  # makes the random train/test split reproducible
ind <- sample(n, nTest, FALSE)
# train set
data.train <- data[, -ind]
data.train <- as.matrix(data.train + 1)
classtr <- data.frame(condition = class[-ind, ])
# train set in S4 class
data.trainS4 <- DESeqDataSetFromMatrix(countData = data.train, colData = classtr,
                                       design = formula(~ 1))
## Number of repeats (repeats) might change model accuracies ##
# Classification and Regression Tree (CART) Classification
cart <- classify(data = data.trainS4, method = "rpart",
                 ref = "T", preProcessing = "deseq-vst",
                 control = trainControl(method = "repeatedcv", number = 5,
                                        repeats = 3, classProbs = TRUE))
ref(cart)
## End(Not run)
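Because sensitivity and specificity are defined relative to the reference category, changing ref swaps their roles in a two-class problem. The base-R sketch below, with made-up labels and a hypothetical sensSpec() helper, shows the effect without calling MLSeq.

```r
# Toy observed and predicted labels (hypothetical data).
observed  <- factor(c("T", "T", "T", "T", "N", "N", "N", "N"))
predicted <- factor(c("T", "T", "T", "N", "N", "N", "T", "T"))

# Hypothetical helper: sensitivity is the share of reference-category samples
# predicted as `ref`; specificity is the share of the other samples predicted
# as something other than `ref`.
sensSpec <- function(observed, predicted, ref) {
  isCase <- observed == ref
  c(sensitivity = mean(predicted[isCase] == ref),
    specificity = mean(predicted[!isCase] != ref))
}

withT <- sensSpec(observed, predicted, ref = "T")
withN <- sensSpec(observed, predicted, ref = "N")
rbind(ref_T = withT, ref_N = withN)
```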
This slot stores the name of selected genes which are used in the classifier.
The trained model is stored in slot trainedModel. See trained for details.
selectedGenes(object)
## S4 method for signature 'MLSeq'
selectedGenes(object)
| object | an MLSeq object returned from classify. |
|---|---|
## Not run:
library(MLSeq)
library(DESeq2)
data(cervical)
# a subset of cervical data with the first 150 features.
data <- cervical[c(1:150), ]
# defining sample classes.
class <- data.frame(condition = factor(rep(c("N", "T"), c(29, 29))))
n <- ncol(data)  # number of samples
p <- nrow(data)  # number of features
# number of samples for the test set (30% test, 70% train).
nTest <- ceiling(n * 0.3)
set.seed(2128)  # makes the random train/test split reproducible
ind <- sample(n, nTest, FALSE)
# train set
data.train <- data[, -ind]
data.train <- as.matrix(data.train + 1)
classtr <- data.frame(condition = class[-ind, ])
# train set in S4 class
data.trainS4 <- DESeqDataSetFromMatrix(countData = data.train, colData = classtr,
                                       design = formula(~ 1))
## Number of repeats (repeats) might change model accuracies ##
# Classification and Regression Tree (CART) Classification
cart <- classify(data = data.trainS4, method = "rpart",
                 ref = "T", preProcessing = "deseq-vst",
                 control = trainControl(method = "repeatedcv", number = 5,
                                        repeats = 3, classProbs = TRUE))
selectedGenes(cart)
## End(Not run)
Prints information about a model trained with the classify function.
show.MLSeq(object)
## S4 method for signature 'MLSeq'
show(object)
## S4 method for signature 'MLSeqModelInfo'
show(object)
## S4 method for signature 'MLSeqMetaData'
show(object)
## S4 method for signature 'voom.train'
show(object)
## S4 method for signature 'discrete.train'
show(object)
| object | an object of class MLSeq, MLSeqModelInfo, MLSeqMetaData, voom.train or discrete.train. |
|---|---|
This slot stores the trained model, i.e., the object returned by the train function of the caret package.
Since trainedModel has the same class as the object returned by train, other caret functions can also
be applied to it. See train for details.
trained(object)
## S4 method for signature 'MLSeq'
trained(object)
## S4 method for signature 'MLSeqModelInfo'
trained(object)
| object | an MLSeq object returned from classify. |
|---|---|
train.default, voom.train-class, discrete.train-class
## Not run:
library(MLSeq)
library(DESeq2)
data(cervical)
# a subset of cervical data with the first 150 features.
data <- cervical[c(1:150), ]
# defining sample classes.
class <- data.frame(condition = factor(rep(c("N", "T"), c(29, 29))))
n <- ncol(data)  # number of samples
p <- nrow(data)  # number of features
# number of samples for the test set (30% test, 70% train).
nTest <- ceiling(n * 0.3)
set.seed(2128)  # makes the random train/test split reproducible
ind <- sample(n, nTest, FALSE)
# train set
data.train <- data[, -ind]
data.train <- as.matrix(data.train + 1)
classtr <- data.frame(condition = class[-ind, ])
# train set in S4 class
data.trainS4 <- DESeqDataSetFromMatrix(countData = data.train, colData = classtr,
                                       design = formula(~ 1))
## Number of repeats (repeats) might change model accuracies ##
# Classification and Regression Tree (CART) Classification
cart <- classify(data = data.trainS4, method = "rpart",
                 ref = "T", preProcessing = "deseq-vst",
                 control = trainControl(method = "repeatedcv", number = 5,
                                        repeats = 3, classProbs = TRUE))
trained(cart)
## End(Not run)
This slot stores the transformation and normalization parameters estimated from the training set. These parameters are used to normalize and transform the test set.
trainParameters(object)
## S4 method for signature 'MLSeq'
trainParameters(object)
## S4 method for signature 'MLSeqModelInfo'
trainParameters(object)
| object | an MLSeq object returned from classify. |
|---|---|
## Not run:
library(MLSeq)
library(DESeq2)
data(cervical)
# a subset of cervical data with the first 150 features.
data <- cervical[c(1:150), ]
# defining sample classes.
class <- data.frame(condition = factor(rep(c("N", "T"), c(29, 29))))
n <- ncol(data)  # number of samples
p <- nrow(data)  # number of features
# number of samples for the test set (30% test, 70% train).
nTest <- ceiling(n * 0.3)
set.seed(2128)  # makes the random train/test split reproducible
ind <- sample(n, nTest, FALSE)
# train set
data.train <- data[, -ind]
data.train <- as.matrix(data.train + 1)
classtr <- data.frame(condition = class[-ind, ])
# train set in S4 class
data.trainS4 <- DESeqDataSetFromMatrix(countData = data.train, colData = classtr,
                                       design = formula(~ 1))
## Number of repeats (repeats) might change model accuracies ##
# Classification and Regression Tree (CART) Classification
cart <- classify(data = data.trainS4, method = "rpart",
                 ref = "T", preProcessing = "deseq-vst",
                 control = trainControl(method = "repeatedcv", number = 5,
                                        repeats = 3, classProbs = TRUE))
trainParameters(cart)
## End(Not run)
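The idea of reusing training-set parameters can be sketched for DESeq-style (median-of-ratios) normalization: per-gene geometric means are computed on the training counts only, and new samples are scaled against those stored means. This is a base-R sketch under the assumption that trainParameters holds parameters of this kind for the "deseq" options; the helper name medianRatioSizeFactors() and the toy data are hypothetical, and MLSeq's internal code may differ.

```r
# Hypothetical count matrices: genes in rows, samples in columns.
set.seed(1)
train <- matrix(rpois(40, lambda = 50), nrow = 10)  # 10 genes x 4 training samples
test  <- 2 * train[, 1:2]                           # 2 new samples at twice the depth

# Training set only: per-gene log geometric means (the stored "parameters").
logGeoMeans <- rowMeans(log(train))

# Median-of-ratios size factor of each sample relative to the stored means.
medianRatioSizeFactors <- function(counts, logGeoMeans) {
  apply(counts, 2, function(cnts) {
    # Use genes with a finite training mean and a positive count in this sample.
    ok <- is.finite(logGeoMeans) & cnts > 0
    exp(median((log(cnts) - logGeoMeans)[ok]))
  })
}

sfTrain <- medianRatioSizeFactors(train, logGeoMeans)
sfTest  <- medianRatioSizeFactors(test, logGeoMeans)  # reuses the training means
normalizedTest <- sweep(test, 2, sfTest, "/")
```

Because the test samples are scaled against the training means rather than against each other, a single new sample can be normalized on its own.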
This slot stores the name of the transformation method used to transform the count data (e.g., "vst", "rlog", etc.).
transformation(object)
## S4 method for signature 'MLSeq'
transformation(object)
## S4 method for signature 'MLSeqModelInfo'
transformation(object)
| object | an MLSeq object returned from classify. |
|---|---|
## Not run:
library(MLSeq)
library(DESeq2)
data(cervical)
# a subset of cervical data with the first 150 features.
data <- cervical[c(1:150), ]
# defining sample classes.
class <- data.frame(condition = factor(rep(c("N", "T"), c(29, 29))))
n <- ncol(data)  # number of samples
p <- nrow(data)  # number of features
# number of samples for the test set (30% test, 70% train).
nTest <- ceiling(n * 0.3)
set.seed(2128)  # makes the random train/test split reproducible
ind <- sample(n, nTest, FALSE)
# train set
data.train <- data[, -ind]
data.train <- as.matrix(data.train + 1)
classtr <- data.frame(condition = class[-ind, ])
# train set in S4 class
data.trainS4 <- DESeqDataSetFromMatrix(countData = data.train, colData = classtr,
                                       design = formula(~ 1))
## Number of repeats (repeats) might change model accuracies ##
# Classification and Regression Tree (CART) Classification
cart <- classify(data = data.trainS4, method = "rpart",
                 ref = "T", preProcessing = "deseq-vst",
                 control = trainControl(method = "repeatedcv", number = 5,
                                        repeats = 3, classProbs = TRUE))
transformation(cart)
## End(Not run)
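For reference, the log counts-per-million transformation used by limma's voom is log2((count + 0.5) / (library size + 1) * 1e6). The sketch below implements that formula in base R on a toy matrix; whether MLSeq's "logcpm" options use exactly the same offsets is an assumption.

```r
# Hypothetical counts (3 genes x 2 samples), filled column by column.
counts <- matrix(c(10, 0, 90,     # sample S1, library size 100
                   20, 5, 175),   # sample S2, library size 200
                 nrow = 3,
                 dimnames = list(paste0("gene", 1:3), c("S1", "S2")))

# voom-style log-CPM: the 0.5 and 1 offsets keep zero counts finite.
logCPM <- function(counts) {
  libSize <- colSums(counts)
  # Transpose so that dividing by libSize recycles over samples correctly.
  t(log2(t(counts + 0.5) / (libSize + 1) * 1e6))
}

lc <- logCPM(counts)
round(lc, 2)
```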
MLSeq objects returned from classify()
This function updates the MLSeq object. If one of the options inside an MLSeq object is changed, the object should be updated so that the change takes effect in the classification results.
## S3 method for class 'MLSeq'
update(object, ..., env = .GlobalEnv)
## S4 method for signature 'MLSeq'
update(object, ..., env = .GlobalEnv)
| object | a model of MLSeq class returned by classify. |
|---|---|
| ... | optional arguments passed to |
| env | an environment. The environment in which the trained model is stored. |
An MLSeq object of the same form as the one returned from classify.
When an MLSeq object is updated, the new results overwrite those stored in the given object, so the results obtained before
the update are lost. To keep them, copy the MLSeq object to a new object in the global environment before running update.
```r
## Not run:
library(MLSeq)
library(DESeq2)
data(cervical)

# a subset of cervical data with first 150 features.
data <- cervical[c(1:150), ]

# defining sample classes.
class <- data.frame(condition = factor(rep(c("N", "T"), c(29, 29))))

n <- ncol(data)  # number of samples
p <- nrow(data)  # number of features

# number of samples for test set (30% test, 70% train).
nTest <- ceiling(n * 0.3)
ind <- sample(n, nTest, FALSE)

# train set
data.train <- data[ , -ind]
data.train <- as.matrix(data.train + 1)
classtr <- data.frame(condition = class[-ind, ])

# train set in S4 class
data.trainS4 <- DESeqDataSetFromMatrix(countData = data.train, colData = classtr,
                                       formula(~ 1))

# test set
data.test <- data[ , ind]
data.test <- as.matrix(data.test + 1)
classts <- data.frame(condition = class[ind, ])
data.testS4 <- DESeqDataSetFromMatrix(countData = data.test, colData = classts,
                                      formula(~ 1))

## Number of repeats (repeats) might change model accuracies ##
# Classification and Regression Tree (CART) Classification
cart <- classify(data = data.trainS4, method = "rpart", ref = "T",
                 preProcessing = "deseq-vst",
                 control = trainControl(method = "repeatedcv", number = 5,
                                        repeats = 3, classProbs = TRUE))
cart

# Change classification model into "Random Forests" (rf)
method(cart) <- "rf"
rf <- update(cart)
rf
## End(Not run)
```
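The train/test split arithmetic in the example can be checked with base R alone, without DESeq2 or MLSeq. This sketch assumes 58 samples, matching the 29 "N" plus 29 "T" labels in the class vector. It also adds `set.seed`, which the original example does not call. Without a fixed seed, the split, and therefore the reported accuracies, will differ from run to run.

```r
# Base-R sketch of the example's 70/30 split, assuming n = 58 samples.
set.seed(123)                         # fix the RNG so the split is reproducible
n <- 58
nTest <- ceiling(n * 0.3)             # ceiling(17.4) = 18 test samples
ind <- sample(n, nTest, FALSE)        # test-set column indices, no repeats
train.idx <- setdiff(seq_len(n), ind) # remaining columns form the train set

c(test = length(ind), train = length(train.idx))  # 18 test, 40 train
```

The split is purely random, not stratified, so the test set is not guaranteed to contain equal numbers of "N" and "T" samples.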
voom.train object
This object is a subclass of the MLSeq.train class. It contains trained model information for voom-based
classifiers, i.e. "voomDLDA", "voomDQDA" and "voomNSC".
weigtedStats: a list of weighted statistics used for training the model. Weights are calculated from the voom transformation.
foldInfo: a list containing information on the cross-validation folds.
control: a list with control parameters for voom-based classifiers.
tuningResults: a list storing the cross-validation results for the tuning parameter(s).
finalModel: a list storing the results of the model trained with the optimal parameters.
callInfo: a list with call information for the related function.
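Slots of an S4 object such as this one are read with `@` or `slot()`, and `slotNames()` lists them. The minimal sketch below defines a hypothetical stand-in class, `voomTrainSketch`, that uses the slot names documented above (spelled exactly as listed, including `weigtedStats`). The real voom.train class is defined inside MLSeq and may differ in its details, so treat this only as an illustration of slot access.

```r
library(methods)

# Hypothetical stand-in mirroring the documented slot names; not MLSeq's definition.
setClass("voomTrainSketch",
         slots = c(weigtedStats = "list", foldInfo = "list", control = "list",
                   tuningResults = "list", finalModel = "list", callInfo = "list"))

# Only `control` is filled in; unset list slots default to an empty list().
obj <- new("voomTrainSketch",
           control = list(method = "repeatedcv", number = 5,
                          repeats = 10, tuneLength = 10))

slotNames(obj)                # the six slots, in definition order
obj@control$number            # direct slot access with `@`
slot(obj, "control")$repeats  # equivalent access by slot name
```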
This function sets the control parameters for voom-based classifiers while training the model.
voomControl(method = "repeatedcv", number = 5, repeats = 10, tuneLength = 10)
method |
validation method. Only repeated cross-validation ("repeatedcv") is supported. |
number |
a positive integer. Number of folds. |
repeats |
a positive integer. Number of repeats. |
tuneLength |
a positive integer. If the classifier has a tuning parameter, this value defines the total number of tuning parameter values to be searched. |
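The sketch below uses base R to illustrate what `number` and `repeats` mean. In each repeat, every sample is assigned to exactly one of `number` folds, and the whole partition is redrawn `repeats` times. `make_repeated_folds` is a hypothetical helper, not MLSeq's internal fold construction, which may differ (for example, by stratifying on class). The sample size of 41 is arbitrary. The fit count assumes a grid search in which every candidate tuning value is evaluated on every held-out fold.

```r
# Hypothetical helper: split n samples into `number` folds, redrawn `repeats` times.
make_repeated_folds <- function(n, number = 5, repeats = 10) {
  lapply(seq_len(repeats), function(r) {
    # balanced fold labels 1..number, randomly permuted across samples
    fold_id <- sample(rep(seq_len(number), length.out = n))
    split(seq_len(n), fold_id)  # list of sample indices, one element per fold
  })
}

set.seed(1)
folds <- make_repeated_folds(n = 41, number = 5, repeats = 10)

length(folds)       # 10 repeats
length(folds[[1]])  # 5 folds per repeat

# Models fitted during tuning with the voomControl() defaults,
# assuming every candidate value is evaluated on every fold.
number <- 5; repeats <- 10; tuneLength <- 10
n_fits <- number * repeats * tuneLength  # 500
```

Permuting a balanced label vector keeps the fold sizes within one sample of each other. With the defaults, increasing `repeats` or `tuneLength` multiplies training time accordingly.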
Dincer Goksuluk, Gokmen Zararsiz, Selcuk Korkmaz, Vahap Eldem, Ahmet Ozturk and Ahmet Ergun Karaagaoglu
classify, trainControl, discreteControl
1L