Package 'biosigner' reference manual

Title:	Signature discovery from omics data
Description:	Feature selection is critical in omics data analysis to extract restricted and meaningful molecular signatures from complex and high-dimension data, and to build robust classifiers. This package implements a new method to assess the relevance of the variables for the prediction performances of the classifier. The approach can be run in parallel with the PLS-DA, Random Forest, and SVM binary classifiers. The signatures and the corresponding 'restricted' models are returned, enabling future predictions on new datasets. A Galaxy implementation of the package is available within the Workflow4metabolomics.org online infrastructure for computational metabolomics.
Authors:	Philippe Rinaudo [aut], Etienne A. Thevenot [aut, cre]
Maintainer:	Etienne A. Thevenot <[email protected]>
License:	CeCILL
Version:	1.35.0
Built:	2025-03-29 06:06:55 UTC
Source:	https://github.com/bioc/biosigner

Molecular signature discovery from omics data

Description

Feature selection is critical in omics data analysis to extract restricted and meaningful molecular signatures from complex and high-dimension data, and to build robust classifiers. This package implements a new method to assess the relevance of the variables for the prediction performances of the classifier. The approach can be run in parallel with the PLS-DA, Random Forest, and SVM binary classifiers. The signatures and the corresponding 'restricted' models are returned, enabling future predictions on new datasets. A Galaxy implementation of the package is available within the Workflow4metabolomics.org online infrastructure for computational metabolomics.

Author(s)

Philippe Rinaudo <[email protected]> and Etienne A. Thevenot <[email protected]>.

Maintainer: Etienne A. Thevenot <[email protected]>

Builds the molecular signature.

Description

Main function of the 'biosigner' package. For each of the available classifiers (PLS-DA, Random Forest, and SVM), the significant features are selected and the corresponding models are built.

Usage

biosign(
  x,
  y,
  methodVc = c("all", "plsda", "randomforest", "svm")[1],
  bootI = 50,
  pvalN = 0.05,
  permI = 1,
  fixRankL = FALSE,
  seedI = 123,
  plotSubC = NA,
  fig.pdfC = c("none", "interactive", "myfile.pdf")[2],
  info.txtC = c("none", "interactive", "myfile.txt")[2]
)

## S4 method for signature 'matrix'
biosign(
  x,
  y,
  methodVc = c("all", "plsda", "randomforest", "svm")[1],
  bootI = 50,
  pvalN = 0.05,
  permI = 1,
  fixRankL = FALSE,
  seedI = 123,
  plotSubC = "",
  fig.pdfC = c("none", "interactive", "myfile.pdf")[2],
  info.txtC = c("none", "interactive", "myfile.txt")[2]
)

## S4 method for signature 'data.frame'
biosign(
  x,
  y,
  methodVc = c("all", "plsda", "randomforest", "svm")[1],
  bootI = 50,
  pvalN = 0.05,
  permI = 1,
  fixRankL = FALSE,
  seedI = 123,
  plotSubC = "",
  fig.pdfC = c("none", "interactive", "myfile.pdf")[2],
  info.txtC = c("none", "interactive", "myfile.txt")[2]
)

## S4 method for signature 'SummarizedExperiment'
biosign(
  x,
  y,
  methodVc = c("all", "plsda", "randomforest", "svm")[1],
  bootI = 50,
  pvalN = 0.05,
  permI = 1,
  fixRankL = FALSE,
  seedI = 123,
  plotSubC = "",
  fig.pdfC = c("none", "interactive", "myfile.pdf")[2],
  info.txtC = c("none", "interactive", "myfile.txt")[2]
)

## S4 method for signature 'MultiAssayExperiment'
biosign(
  x,
  y,
  methodVc = c("all", "plsda", "randomforest", "svm")[1],
  bootI = 50,
  pvalN = 0.05,
  permI = 1,
  fixRankL = FALSE,
  seedI = 123,
  plotSubC = "",
  fig.pdfC = c("none", "interactive", "myfile.pdf")[2],
  info.txtC = c("none", "interactive", "myfile.txt")[2]
)

## S4 method for signature 'ExpressionSet'
biosign(
  x,
  y,
  methodVc = c("all", "plsda", "randomforest", "svm")[1],
  bootI = 50,
  pvalN = 0.05,
  permI = 1,
  fixRankL = FALSE,
  seedI = 123,
  plotSubC = "",
  fig.pdfC = c("none", "interactive", "myfile.pdf")[2],
  info.txtC = c("none", "interactive", "myfile.txt")[2]
)

## S4 method for signature 'MultiDataSet'
biosign(
  x,
  y,
  methodVc = c("all", "plsda", "randomforest", "svm")[1],
  bootI = 50,
  pvalN = 0.05,
  permI = 1,
  fixRankL = FALSE,
  seedI = 123,
  plotSubC = "",
  fig.pdfC = c("none", "interactive", "myfile.pdf")[2],
  info.txtC = c("none", "interactive", "myfile.txt")[2]
)
biosign(
  x,
  y,
  methodVc = c("all", "plsda", "randomforest", "svm")[1],
  bootI = 50,
  pvalN = 0.05,
  permI = 1,
  fixRankL = FALSE,
  seedI = 123,
  plotSubC = NA,
  fig.pdfC = c("none", "interactive", "myfile.pdf")[2],
  info.txtC = c("none", "interactive", "myfile.txt")[2]
)

## S4 method for signature 'matrix'
biosign(
  x,
  y,
  methodVc = c("all", "plsda", "randomforest", "svm")[1],
  bootI = 50,
  pvalN = 0.05,
  permI = 1,
  fixRankL = FALSE,
  seedI = 123,
  plotSubC = "",
  fig.pdfC = c("none", "interactive", "myfile.pdf")[2],
  info.txtC = c("none", "interactive", "myfile.txt")[2]
)

## S4 method for signature 'data.frame'
biosign(
  x,
  y,
  methodVc = c("all", "plsda", "randomforest", "svm")[1],
  bootI = 50,
  pvalN = 0.05,
  permI = 1,
  fixRankL = FALSE,
  seedI = 123,
  plotSubC = "",
  fig.pdfC = c("none", "interactive", "myfile.pdf")[2],
  info.txtC = c("none", "interactive", "myfile.txt")[2]
)

## S4 method for signature 'SummarizedExperiment'
biosign(
  x,
  y,
  methodVc = c("all", "plsda", "randomforest", "svm")[1],
  bootI = 50,
  pvalN = 0.05,
  permI = 1,
  fixRankL = FALSE,
  seedI = 123,
  plotSubC = "",
  fig.pdfC = c("none", "interactive", "myfile.pdf")[2],
  info.txtC = c("none", "interactive", "myfile.txt")[2]
)

## S4 method for signature 'MultiAssayExperiment'
biosign(
  x,
  y,
  methodVc = c("all", "plsda", "randomforest", "svm")[1],
  bootI = 50,
  pvalN = 0.05,
  permI = 1,
  fixRankL = FALSE,
  seedI = 123,
  plotSubC = "",
  fig.pdfC = c("none", "interactive", "myfile.pdf")[2],
  info.txtC = c("none", "interactive", "myfile.txt")[2]
)

## S4 method for signature 'ExpressionSet'
biosign(
  x,
  y,
  methodVc = c("all", "plsda", "randomforest", "svm")[1],
  bootI = 50,
  pvalN = 0.05,
  permI = 1,
  fixRankL = FALSE,
  seedI = 123,
  plotSubC = "",
  fig.pdfC = c("none", "interactive", "myfile.pdf")[2],
  info.txtC = c("none", "interactive", "myfile.txt")[2]
)

## S4 method for signature 'MultiDataSet'
biosign(
  x,
  y,
  methodVc = c("all", "plsda", "randomforest", "svm")[1],
  bootI = 50,
  pvalN = 0.05,
  permI = 1,
  fixRankL = FALSE,
  seedI = 123,
  plotSubC = "",
  fig.pdfC = c("none", "interactive", "myfile.pdf")[2],
  info.txtC = c("none", "interactive", "myfile.txt")[2]
)

Arguments

`x`	Numerical data frame or matrix (observations x variables), or SummarizedExperiment (or ExpressionSet) object ; NAs are allowed for PLS-DA but for SVM, samples with NA will be removed
`y`	Two-level factor corresponding to the class labels, or a character indicating the name of the column of the pData to be used, when x is an ExpressionSet object
`methodVc`	Character vector: Either one or all of the following classifiers: Partial Least Squares Discriminant Analysis ('plsda'), or Random Forest ('randomforest'), or Support Vector Machine ('svm')
`bootI`	Integer: Number of bootstaps for resampling
`pvalN`	Numeric: To speed up the selection, only variables which significantly improve the model up to two times this threshold (to take into account potential fluctuations) are computed
`permI`	Integer: Random permutation are used to assess the significance of each new variable included into the model (forward selection)
`fixRankL`	Logical: Should the initial ranking be computed with the full model only, or as the median of the ranks from the models built on the sampled dataset?
`seedI`	integer: optional seed to obtain exactly the same signature when rerunning biosigner; default is '123'; set to NULL to prevent seed setting
`plotSubC`	Character: Graphic subtitle
`fig.pdfC`	Character: File name with '.pdf' extension for the figure; if 'interactive' (default), figures will be displayed interactively; if 'none', no figure will be generated
`info.txtC`	Character: File name with '.txt' extension for the printed results (call to sink()'); if 'interactive' (default), messages will be printed on the screen; if 'none', no verbose will be generated

Value

An S4 object of class 'biosign' containing the following slots: 1) 'methodVc' character vector: selected classifier(s) ('plsda', 'randomforest', and/or 'svm'), 2) 'accuracyMN' numeric matrix: balanced accuracies for the full models, and the models restricted to the 'S' and 'AS' signatures (predictions are obtained by using the resampling scheme selected with the 'bootI' and 'crossvalI' arguments), 3) 'tierMC' character matrix: contains the tier ('S', 'A', 'B', 'C', 'D', or 'E') of each feature for each classifier (features with tier 'S' have been found significant in all backward selections; features with tier 'A' have been found significant in all but the last selection, and so on), 4) modelLs list: selected classifier(s) trained on the subset restricted to the 'S' features, 5) signatureLs list: 'S' signatures for each classifier; and 6) 'AS' list: 'AS' signatures and corresponding trained classifiers, in addition to the dataset restricted to tiers 'S' and 'A' ('xMN') and the labels ('yFc')

Author(s)

Philippe Rinaudo and Etienne Thevenot (CEA)

Examples


## loading the diaplasma dataset

data(diaplasma)
attach(diaplasma)

## restricting to a smaller dataset for this example

featureSelVl <- variableMetadata[, "mzmed"] >= 490 & variableMetadata[, "mzmed"] < 500
dataMatrix <- dataMatrix[, featureSelVl]
variableMetadata <- variableMetadata[featureSelVl, ]

# signature selection for all 3 classifiers
# a bootI = 5 number of bootstraps is used for this example
# we recommend to keep the default bootI = 50 value for your analyzes

diaSign <- biosign(dataMatrix, sampleMetadata[, "type"], bootI = 5)

## Application to a SummarizedExperiment

diaplasma.se <- SummarizedExperiment::SummarizedExperiment(assays = list(diaplasma = t(dataMatrix)),
                                                           colData = sampleMetadata,
                                                           rowData = variableMetadata)
                                                           
# restricting to the first 100 features to speed up the example

diaplasma.se <- diaplasma.se[1:100, ]

diaplasma.se <- biosign(diaplasma.se, "type", bootI = 5)

head(SummarizedExperiment::rowData(diaplasma.se))

# getting the biosign output

diaplasma_type.biosign <- getBiosign(diaplasma.se)[["type_plsda.forest.svm"]]

getAccuracyMN(diaplasma_type.biosign)

## Application to an ExpressionSet

diaSet <- Biobase::ExpressionSet(assayData = t(dataMatrix), 
                                 phenoData = new("AnnotatedDataFrame", 
                                                data = sampleMetadata), 
                                 featureData = new("AnnotatedDataFrame", 
                                                data = variableMetadata),
                                 experimentData = new("MIAME", 
                                               title = "diaplasma"))
                                             
# restricting to the first 100 features to speed up the example

diaSet <- diaSet[1:100, ]
                                             
diaSign <- biosign(diaSet, "type", bootI = 5)
diaSet <- getEset(diaSign)
head(Biobase::fData(diaSet))

detach(diaplasma)

## Application to a MultiAssayExperiment

data("NCI60", package = "ropls")
nci.mae <- NCI60[["mae"]]
library(MultiAssayExperiment)

# Cancer types

table(nci.mae$cancer)

# Restricting to the 'ME' and 'LE' cancer types and to the 'agilent' and 'hgu95' datasets

nci.mae <- nci.mae[, nci.mae$cancer %in% c("ME", "LE"), c("agilent", "hgu95")]

# Selecting the significant features for PLS-DA, RF, and SVM classifiers

nci.mae <- biosign(nci.mae, "cancer", bootI = 5)

# Getting the tiers

SummarizedExperiment::rowData(nci.mae[["agilent"]])

# Getting the models

mae_biosign.ls <- getBiosign(nci.mae)

# Name of the models stored in the (metadata of) each SummarizedExperiment object

names(mae_biosign.ls[["agilent"]])

# Visualizing the individual results

for (set.c in names(mae_biosign.ls))
  plot(mae_biosign.ls[[set.c]][["cancer_plsda.forest.svm"]],
       typeC = "tier",
       plotSubC = set.c)

## Application to a MultiDataSet

data("NCI60", package = "ropls")
nci.mds <- NCI60[["mds"]]

# Restricting to the "agilent" and "hgu95" datasets

nci.mds <- nci.mds[, c("agilent", "hgu95")]

# Restricting to the 'ME' and 'LE' cancer types

library(Biobase)
sample_names.vc <- Biobase::sampleNames(nci.mds[["agilent"]])
cancer_type.vc <- Biobase::pData(nci.mds[["agilent"]])[, "cancer"]
nci.mds <- nci.mds[sample_names.vc[cancer_type.vc %in% c("ME", "LE")], ]

# Selecting the significant features for PLS-DA, RF, and SVM classifiers

nci_cancer.biosign <- biosign(nci.mds, "cancer", bootI = 5)

# Getting back the updated MultiDataSet

nci.mds <- getMset(nci_cancer.biosign)

## loading the diaplasma dataset

data(diaplasma)
attach(diaplasma)

## restricting to a smaller dataset for this example

featureSelVl <- variableMetadata[, "mzmed"] >= 490 & variableMetadata[, "mzmed"] < 500
dataMatrix <- dataMatrix[, featureSelVl]
variableMetadata <- variableMetadata[featureSelVl, ]

# signature selection for all 3 classifiers
# a bootI = 5 number of bootstraps is used for this example
# we recommend to keep the default bootI = 50 value for your analyzes

diaSign <- biosign(dataMatrix, sampleMetadata[, "type"], bootI = 5)

## Application to a SummarizedExperiment

diaplasma.se <- SummarizedExperiment::SummarizedExperiment(assays = list(diaplasma = t(dataMatrix)),
                                                           colData = sampleMetadata,
                                                           rowData = variableMetadata)
                                                           
# restricting to the first 100 features to speed up the example

diaplasma.se <- diaplasma.se[1:100, ]

diaplasma.se <- biosign(diaplasma.se, "type", bootI = 5)

head(SummarizedExperiment::rowData(diaplasma.se))

# getting the biosign output

diaplasma_type.biosign <- getBiosign(diaplasma.se)[["type_plsda.forest.svm"]]

getAccuracyMN(diaplasma_type.biosign)

## Application to an ExpressionSet

diaSet <- Biobase::ExpressionSet(assayData = t(dataMatrix), 
                                 phenoData = new("AnnotatedDataFrame", 
                                                data = sampleMetadata), 
                                 featureData = new("AnnotatedDataFrame", 
                                                data = variableMetadata),
                                 experimentData = new("MIAME", 
                                               title = "diaplasma"))
                                             
# restricting to the first 100 features to speed up the example

diaSet <- diaSet[1:100, ]
                                             
diaSign <- biosign(diaSet, "type", bootI = 5)
diaSet <- getEset(diaSign)
head(Biobase::fData(diaSet))

detach(diaplasma)

## Application to a MultiAssayExperiment

data("NCI60", package = "ropls")
nci.mae <- NCI60[["mae"]]
library(MultiAssayExperiment)

# Cancer types

table(nci.mae$cancer)

# Restricting to the 'ME' and 'LE' cancer types and to the 'agilent' and 'hgu95' datasets

nci.mae <- nci.mae[, nci.mae$cancer %in% c("ME", "LE"), c("agilent", "hgu95")]

# Selecting the significant features for PLS-DA, RF, and SVM classifiers

nci.mae <- biosign(nci.mae, "cancer", bootI = 5)

# Getting the tiers

SummarizedExperiment::rowData(nci.mae[["agilent"]])

# Getting the models

mae_biosign.ls <- getBiosign(nci.mae)

# Name of the models stored in the (metadata of) each SummarizedExperiment object

names(mae_biosign.ls[["agilent"]])

# Visualizing the individual results

for (set.c in names(mae_biosign.ls))
  plot(mae_biosign.ls[[set.c]][["cancer_plsda.forest.svm"]],
       typeC = "tier",
       plotSubC = set.c)

## Application to a MultiDataSet

data("NCI60", package = "ropls")
nci.mds <- NCI60[["mds"]]

# Restricting to the "agilent" and "hgu95" datasets

nci.mds <- nci.mds[, c("agilent", "hgu95")]

# Restricting to the 'ME' and 'LE' cancer types

library(Biobase)
sample_names.vc <- Biobase::sampleNames(nci.mds[["agilent"]])
cancer_type.vc <- Biobase::pData(nci.mds[["agilent"]])[, "cancer"]
nci.mds <- nci.mds[sample_names.vc[cancer_type.vc %in% c("ME", "LE")], ]

# Selecting the significant features for PLS-DA, RF, and SVM classifiers

nci_cancer.biosign <- biosign(nci.mds, "cancer", bootI = 5)

# Getting back the updated MultiDataSet

nci.mds <- getMset(nci_cancer.biosign)

Class "biosign"

Description

The biosigner object class

Slots

methodVc: character vector: selected classifier(s) ('plsda', 'randomforest', or 'svm')
accuracyMN: numeric matrix: balanced accuracies for the full models, and the models restricted to the 'S' and 'AS' signatures
tierMC: character matrix: contains the tier ('S', 'A', 'B', 'C', 'D', or 'E') of each feature for each classifier
yFc: factor with two levels: response factor
modelLs: list: selected classifier(s) trained on the subset restricted to the 'S' features
signatureLs: list: 'S' signatures for each classifier
xSubMN: matrix: dataset restricted to the 'S' tier
AS: list: 'AS' signatures and corresponding trained classifiers, in addition to the dataset restricted to tiers 'S' and 'A' ('xMN') and the labels ('yFc')
eset: ExpressionSet: when 'biosign' has been applied to an ExpressionSet, the instance with additional columns in fData containing the selected features is stored here

Objects from the Class

Objects can be created by calls of the form new("biosign", ...) or by calling the biosign function

Author(s)

Philippe Rinaudo and Etienne Thevenot (CEA)

Examples


## loading the diaplasma dataset

data(diaplasma)
attach(diaplasma)

## restricting to a smaller dataset for this example

featureSelVl <- variableMetadata[, "mzmed"] >= 490 & variableMetadata[, "mzmed"] < 500
dataMatrix <- dataMatrix[, featureSelVl]
variableMetadata <- variableMetadata[featureSelVl, ]

## signature selection for all 3 classifiers
## a bootI = 5 number of bootstraps is used for this example
## we recommend to keep the default bootI = 50 value for your analyzes

set.seed(123)
diaSign <- biosign(dataMatrix, sampleMetadata[, "type"], bootI = 5)

detach(diaplasma)

## loading the diaplasma dataset

data(diaplasma)
attach(diaplasma)

## restricting to a smaller dataset for this example

featureSelVl <- variableMetadata[, "mzmed"] >= 490 & variableMetadata[, "mzmed"] < 500
dataMatrix <- dataMatrix[, featureSelVl]
variableMetadata <- variableMetadata[featureSelVl, ]

## signature selection for all 3 classifiers
## a bootI = 5 number of bootstraps is used for this example
## we recommend to keep the default bootI = 50 value for your analyzes

set.seed(123)
diaSign <- biosign(dataMatrix, sampleMetadata[, "type"], bootI = 5)

detach(diaplasma)

Class "biosignMultiDataSet"

Description

An S4 class to store the the biosign objects generated by the application of the 'biosign' method to a MultiDataSet instance

Slots

biosignLs: List: List of instances from the 'biosign' class corresponding to the models built on each ExpresssionSet

Objects from the Class

Objects can be created by calls of the form new("biosignMultiDataSet", ...) or by applying the biosign function to a MultiDataSet instance

Examples

# In progress
# In progress

Analysis of plasma from diabetic patients by LC-HRMS

Description

Plasma samples from 69 diabetic patients were analyzed by reversed-phase liquid chromatography coupled to high-resolution mass spectrometry (Orbitrap Exactive) in the negative ionization mode. The raw data were pre-processed with XCMS and CAMERA (5,501 features), corrected for signal drift, log10 transformed, and annotated with an in-house spectral database. The patient's age, body mass index, and diabetic type are recorded. These three clinical covariates are strongly associated, most of the type II patients being older and with a higher bmi than the type I individuals.

Format

A list with the following elements:

dataMatrix: a 69 samples x 5,501 features matrix of numeric type corresponding to the intensity profiles (values have been log10-transformed),
sampleMetadata: a 69 x 3 data frame, with the patients' diabetic type ('type', factor), age ('age', numeric), and body mass index ('bmi', numeric),
variableMetadata: a 5,501 x 8 data frame, with the median m/z ('mzmed', numeric) and the median retention time in seconds ('rtmed', numeric) from XCMS, the 'isotopes' (character), 'adduct' (character) and 'pcgroups' (numeric) annotations from CAMERA, and the names of the m/z and RT matching compounds from an in-house database of pure spectra from commercial metabolites ('spiDb', character).

Value

List containing the 'dataMatrix' matrix (numeric) of data (samples as rows, variables as columns), the 'sampleMetadata' data frame of sample metadata, and the variableMetadata data frame of variable metadata. Row names of 'dataMatrix' and 'sampleMetadata' are identical. Column names of 'dataMatrix' are identical to row names of 'variableMetadata'. For details see the 'Format' section above.

Source

'diaplasma' dataset.

References

Rinaudo P., Boudah S., Junot C. and Thevenot E.A. (2016). biosigner: a new method for the discovery of significant molecular signatures from omics data. Frontiers in Molecular Biosciences 3. doi:10.3389/fmolb.2016.00026

Accuracies of the full model and the models restricted to the signatures

Description

Balanced accuracies for the full models, and the models restricted to the 'S' and 'AS' signatures

Usage

getAccuracyMN(object)

## S4 method for signature 'biosign'
getAccuracyMN(object)
getAccuracyMN(object)

## S4 method for signature 'biosign'
getAccuracyMN(object)

Arguments

object

An S4 object of class biosign, created by the biosign function.

Value

A numeric matrix containing the balanced accuracies for the full models, and the models restricted to the 'S' and 'AS' signatures (predictions are obtained by using the resampling scheme selected with the 'bootI' and 'crossvalI' arguments)

Author(s)

Philippe Rinaudo and Etienne Thevenot (CEA)

Examples


## loading the diaplasma dataset

data(diaplasma)
attach(diaplasma)

## restricting to a smaller dataset for this example

featureSelVl <- variableMetadata[, "mzmed"] >= 490 & variableMetadata[, "mzmed"] < 500
dataMatrix <- dataMatrix[, featureSelVl]
variableMetadata <- variableMetadata[featureSelVl, ]

## signature selection for all 3 classifiers
## a bootI = 5 number of bootstraps is used for this example
## we recommend to keep the default bootI = 50 value for your analyzes

set.seed(123)
diaSign <- biosign(dataMatrix, sampleMetadata[, "type"], bootI = 5)

## individual boxplot of the selected signatures

getAccuracyMN(diaSign)

detach(diaplasma)

## loading the diaplasma dataset

data(diaplasma)
attach(diaplasma)

## restricting to a smaller dataset for this example

featureSelVl <- variableMetadata[, "mzmed"] >= 490 & variableMetadata[, "mzmed"] < 500
dataMatrix <- dataMatrix[, featureSelVl]
variableMetadata <- variableMetadata[featureSelVl, ]

## signature selection for all 3 classifiers
## a bootI = 5 number of bootstraps is used for this example
## we recommend to keep the default bootI = 50 value for your analyzes

set.seed(123)
diaSign <- biosign(dataMatrix, sampleMetadata[, "type"], bootI = 5)

## individual boxplot of the selected signatures

getAccuracyMN(diaSign)

detach(diaplasma)

Getting the biosigner signature from the SummarizedExperiment object

Description

The models are extracted as a list

Usage

getBiosign(object)

## S4 method for signature 'SummarizedExperiment'
getBiosign(object)

## S4 method for signature 'MultiAssayExperiment'
getBiosign(object)
getBiosign(object)

## S4 method for signature 'SummarizedExperiment'
getBiosign(object)

## S4 method for signature 'MultiAssayExperiment'
getBiosign(object)

Arguments

object

An S4 object of class SummarizedExperiment, once processed by the biosign method

Value

List of biosigner outputs contained in the SummarizedExperiment object

Author(s)

Etienne Thevenot, [email protected]

Examples


# Getting the diaplasma data set as a SummarizedExperiment

data(diaplasma)

diaplasma.se <- SummarizedExperiment::SummarizedExperiment(assays = list(diaplasma = t(diaplasma[["dataMatrix"]])),
                                                           colData = diaplasma[["sampleMetadata"]],
                                                           rowData = diaplasma[["variableMetadata"]])

diaplasma.se <- diaplasma.se[1:100, ]

# Selecting the features

diaplasma.se <- biosign(diaplasma.se, "type", bootI = 5, fig.pdfC = "none")

# Getting the signatures

diaplasma.biosign <- getBiosign(diaplasma.se)[["type_plsda.forest.svm"]]

diaplasma.biosign

# Getting the diaplasma data set as a SummarizedExperiment

data(diaplasma)

diaplasma.se <- SummarizedExperiment::SummarizedExperiment(assays = list(diaplasma = t(diaplasma[["dataMatrix"]])),
                                                           colData = diaplasma[["sampleMetadata"]],
                                                           rowData = diaplasma[["variableMetadata"]])

diaplasma.se <- diaplasma.se[1:100, ]

# Selecting the features

diaplasma.se <- biosign(diaplasma.se, "type", bootI = 5, fig.pdfC = "none")

# Getting the signatures

diaplasma.biosign <- getBiosign(diaplasma.se)[["type_plsda.forest.svm"]]

diaplasma.biosign

getEset method

Description

Extracts the complemented ExpressionSet when biosign has been applied to an ExpressionSet

Usage

## S4 method for signature 'biosign'
getEset(object)
## S4 method for signature 'biosign'
getEset(object)

Arguments

object

An S4 object of class biosign, created by biosign function.

Value

An S4 object of class ExpressionSet which contains the dataMatrix (t(exprs(eset))), and the sampleMetadata (pData(eset)) and variableMetadata (fData(eset)) with the additional columns containing the computed tiers for each feature and each classifier.

Author(s)

Etienne Thevenot, [email protected]

Examples


## loading the diaplasma dataset

data(diaplasma)
attach(diaplasma)

## building the ExpresssionSet instance

diaSet <- Biobase::ExpressionSet(assayData = t(dataMatrix), 
                                 phenoData = new("AnnotatedDataFrame", 
                                                 data = sampleMetadata), 
                                 featureData = new("AnnotatedDataFrame", 
                                                   data = variableMetadata),
                                 experimentData = new("MIAME", 
                                                      title = "diaplasma"))

## restricting to a smaller dataset for this example

featureSelVl <- variableMetadata[, "mzmed"] >= 490 & variableMetadata[, "mzmed"] < 500
diaSet <- diaSet[featureSelVl, ]

## signature selection for all 3 classifiers
## a bootI = 5 number of bootstraps is used for this example
## we recommend to keep the default bootI = 50 value for your analyzes

set.seed(123)
diaSign <- biosign(diaSet, "type", bootI = 5)

diaSet <- biosigner::getEset(diaSign)
head(Biobase::pData(diaSet))
head(Biobase::fData(diaSet))

detach(diaplasma)

## loading the diaplasma dataset

data(diaplasma)
attach(diaplasma)

## building the ExpresssionSet instance

diaSet <- Biobase::ExpressionSet(assayData = t(dataMatrix), 
                                 phenoData = new("AnnotatedDataFrame", 
                                                 data = sampleMetadata), 
                                 featureData = new("AnnotatedDataFrame", 
                                                   data = variableMetadata),
                                 experimentData = new("MIAME", 
                                                      title = "diaplasma"))

## restricting to a smaller dataset for this example

featureSelVl <- variableMetadata[, "mzmed"] >= 490 & variableMetadata[, "mzmed"] < 500
diaSet <- diaSet[featureSelVl, ]

## signature selection for all 3 classifiers
## a bootI = 5 number of bootstraps is used for this example
## we recommend to keep the default bootI = 50 value for your analyzes

set.seed(123)
diaSign <- biosign(diaSet, "type", bootI = 5)

diaSet <- biosigner::getEset(diaSign)
head(Biobase::pData(diaSet))
head(Biobase::fData(diaSet))

detach(diaplasma)

getMset method

Description

Extracts the complemented MultiDataSet when biosign has been applied to a MultiDataSet

Usage

## S4 method for signature 'biosignMultiDataSet'
getMset(object)
## S4 method for signature 'biosignMultiDataSet'
getMset(object)

Arguments

object

An S4 object of class biosignMultiDataSet, created by biosign function applied to a MultiDataSet

Value

An S4 object of class MultiDataSet.

Examples

# Loading the 'NCI60_4arrays' from the 'omicade4' package
data("NCI60_4arrays", package = "omicade4")
# Selecting two of the four datasets
setNamesVc <- c("agilent", "hgu95")
# Creating the MultiDataSet instance
nciMset <- MultiDataSet::createMultiDataSet()
# Adding the two datasets as ExpressionSet instances
for (setC in setNamesVc) {
  # Getting the data
  exprMN <- as.matrix(NCI60_4arrays[[setC]])
  pdataDF <- data.frame(row.names = colnames(exprMN),
                        cancer = substr(colnames(exprMN), 1, 2),
                        stringsAsFactors = FALSE)
  fdataDF <- data.frame(row.names = rownames(exprMN),
                        name = rownames(exprMN),
                        stringsAsFactors = FALSE)
  # Building the ExpressionSet
  eset <- Biobase::ExpressionSet(assayData = exprMN,
                                 phenoData = new("AnnotatedDataFrame",
                                                 data = pdataDF),
                                 featureData = new("AnnotatedDataFrame",
                                                   data = fdataDF),
                                 experimentData = new("MIAME",
                                                      title = setC))
  # Adding to the MultiDataSet
  nciMset <- MultiDataSet::add_eset(nciMset, eset, dataset.type = setC,
                                    GRanges = NA, warnings = FALSE)
}
# Restricting to the 'ME' and 'LE' cancer types
sampleNamesVc <- Biobase::sampleNames(nciMset[["agilent"]])
cancerTypeVc <- Biobase::pData(nciMset[["agilent"]])[, "cancer"]
nciMset <- nciMset[sampleNamesVc[cancerTypeVc %in% c("ME", "LE")], ]
# Summary of the MultiDataSet
nciMset
# Selecting the significant features for PLS-DA, RF, and SVM classifiers, and getting back the updated MultiDataSet
nciBiosign <- biosign(nciMset, "cancer")
nciMset <- getMset(nciBiosign)
# In the updated MultiDataSet, the updated featureData now contains the cancer_biosign_'classifier' columns
# indicating the selected features
lapply(Biobase::fData(nciMset), head)
# Loading the 'NCI60_4arrays' from the 'omicade4' package
data("NCI60_4arrays", package = "omicade4")
# Selecting two of the four datasets
setNamesVc <- c("agilent", "hgu95")
# Creating the MultiDataSet instance
nciMset <- MultiDataSet::createMultiDataSet()
# Adding the two datasets as ExpressionSet instances
for (setC in setNamesVc) {
  # Getting the data
  exprMN <- as.matrix(NCI60_4arrays[[setC]])
  pdataDF <- data.frame(row.names = colnames(exprMN),
                        cancer = substr(colnames(exprMN), 1, 2),
                        stringsAsFactors = FALSE)
  fdataDF <- data.frame(row.names = rownames(exprMN),
                        name = rownames(exprMN),
                        stringsAsFactors = FALSE)
  # Building the ExpressionSet
  eset <- Biobase::ExpressionSet(assayData = exprMN,
                                 phenoData = new("AnnotatedDataFrame",
                                                 data = pdataDF),
                                 featureData = new("AnnotatedDataFrame",
                                                   data = fdataDF),
                                 experimentData = new("MIAME",
                                                      title = setC))
  # Adding to the MultiDataSet
  nciMset <- MultiDataSet::add_eset(nciMset, eset, dataset.type = setC,
                                    GRanges = NA, warnings = FALSE)
}
# Restricting to the 'ME' and 'LE' cancer types
sampleNamesVc <- Biobase::sampleNames(nciMset[["agilent"]])
cancerTypeVc <- Biobase::pData(nciMset[["agilent"]])[, "cancer"]
nciMset <- nciMset[sampleNamesVc[cancerTypeVc %in% c("ME", "LE")], ]
# Summary of the MultiDataSet
nciMset
# Selecting the significant features for PLS-DA, RF, and SVM classifiers, and getting back the updated MultiDataSet
nciBiosign <- biosign(nciMset, "cancer")
nciMset <- getMset(nciBiosign)
# In the updated MultiDataSet, the updated featureData now contains the cancer_biosign_'classifier' columns
# indicating the selected features
lapply(Biobase::fData(nciMset), head)

Signatures selected by the models

Description

List of 'S' (or 'S' and 'A') signatures for each classifier

Usage

getSignatureLs(object, tierC = c("S", "AS")[1])

## S4 method for signature 'biosign'
getSignatureLs(object, tierC = c("S", "AS")[1])
getSignatureLs(object, tierC = c("S", "AS")[1])

## S4 method for signature 'biosign'
getSignatureLs(object, tierC = c("S", "AS")[1])

Arguments

`object`	An S4 object of class `biosign`, created by the `biosign` function.
`tierC`	Character: defines whether signatures from the 'S' tier only (default) or the ('S' and 'A') tiers should be returned

Value

List of 'S' (or 'S' and 'A') signatures for each classifier

Author(s)

Philippe Rinaudo and Etienne Thevenot (CEA)

Examples


## loading the diaplasma dataset

data(diaplasma)
attach(diaplasma)

## restricting to a smaller dataset for this example

featureSelVl <- variableMetadata[, "mzmed"] >= 490 & variableMetadata[, "mzmed"] < 500
dataMatrix <- dataMatrix[, featureSelVl]
variableMetadata <- variableMetadata[featureSelVl, ]

## signature selection for all 3 classifiers
## a bootI = 5 number of bootstraps is used for this example
## we recommend to keep the default bootI = 50 value for your analyzes

set.seed(123)
diaSign <- biosign(dataMatrix, sampleMetadata[, "type"], bootI = 5)

## individual boxplot of the selected signatures

getSignatureLs(diaSign)

detach(diaplasma)

## loading the diaplasma dataset

data(diaplasma)
attach(diaplasma)

## restricting to a smaller dataset for this example

featureSelVl <- variableMetadata[, "mzmed"] >= 490 & variableMetadata[, "mzmed"] < 500
dataMatrix <- dataMatrix[, featureSelVl]
variableMetadata <- variableMetadata[featureSelVl, ]

## signature selection for all 3 classifiers
## a bootI = 5 number of bootstraps is used for this example
## we recommend to keep the default bootI = 50 value for your analyzes

set.seed(123)
diaSign <- biosign(dataMatrix, sampleMetadata[, "type"], bootI = 5)

## individual boxplot of the selected signatures

getSignatureLs(diaSign)

detach(diaplasma)

Plot method for 'biosign' signature objects

Description

Displays classifier tiers or individual boxplots from selected features

This function plots signatures obtained by biosign.

Usage

## S4 method for signature 'biosign,ANY'
plot(
  x,
  y,
  tierMaxC = "S",
  typeC = c("tier", "boxplot")[1],
  plotSubC = "",
  fig.pdfC = c("none", "interactive", "myfile.pdf")[2],
  info.txtC = c("none", "interactive", "myfile.txt")[2]
)

## S4 method for signature 'biosignMultiDataSet,ANY'
plot(
  x,
  y,
  tierMaxC = "S",
  typeC = c("tier", "boxplot")[1],
  plotSubC = "",
  fig.pdfC = c("none", "interactive", "myfile.pdf")[2],
  info.txtC = c("none", "interactive", "myfile.txt")[2]
)
## S4 method for signature 'biosign,ANY'
plot(
  x,
  y,
  tierMaxC = "S",
  typeC = c("tier", "boxplot")[1],
  plotSubC = "",
  fig.pdfC = c("none", "interactive", "myfile.pdf")[2],
  info.txtC = c("none", "interactive", "myfile.txt")[2]
)

## S4 method for signature 'biosignMultiDataSet,ANY'
plot(
  x,
  y,
  tierMaxC = "S",
  typeC = c("tier", "boxplot")[1],
  plotSubC = "",
  fig.pdfC = c("none", "interactive", "myfile.pdf")[2],
  info.txtC = c("none", "interactive", "myfile.txt")[2]
)

Arguments

`x`	An S4 object of class `biosign`, created by the `biosign` function.
`y`	Currently not used.
`tierMaxC`	Character: Maximum level of tiers to display: Either 'S' and 'A', (for boxplot), or also 'B', 'C', 'D', and 'E' (for tiers) by decreasing number of selections
`typeC`	Character: Plot type; either 'tier' [default] displaying the comparison of signatures up to the selected 'tierMaxC' or 'boxplot' showing the individual boxplots of the features selected by all the classifiers
`plotSubC`	Character: Graphic subtitle
`fig.pdfC`	Character: File name with '.pdf' extension for the figure; if 'interactive' (default), figures will be displayed interactively; if 'none', no figure will be generated
`info.txtC`	Character: File name with '.txt' extension for the printed results (call to sink()'); if 'interactive' (default), messages will be printed on the screen; if 'none', no verbose will be generated

Value

A plot is created on the current graphics device.

Author(s)

Philippe Rinaudo and Etienne Thevenot (CEA)

Examples


## loading the diaplasma dataset

data(diaplasma)
attach(diaplasma)

## restricting to a smaller dataset for this example

featureSelVl <- variableMetadata[, "mzmed"] >= 490 & variableMetadata[, "mzmed"] < 500
dataMatrix <- dataMatrix[, featureSelVl]
variableMetadata <- variableMetadata[featureSelVl, ]

## signature selection for all 3 classifiers
## a bootI = 5 number of bootstraps is used for this example
## we recommend to keep the default bootI = 50 value for your analyzes

diaSign <- biosign(dataMatrix, sampleMetadata[, "type"], bootI = 5)

## individual boxplot of the selected signatures

plot(diaSign, typeC = "boxplot")

detach(diaplasma)

data("NCI60", package = "ropls")
nci.mds <- NCI60[["mds"]]

# Restricting to the "agilent" and "hgu95" datasets

nci.mds <- nci.mds[, c("agilent", "hgu95")]

# Restricting to the 'ME' and 'LE' cancer types

library(Biobase)
sample_names.vc <- Biobase::sampleNames(nci.mds[["agilent"]])
cancer_type.vc <- Biobase::pData(nci.mds[["agilent"]])[, "cancer"]
nci.mds <- nci.mds[sample_names.vc[cancer_type.vc %in% c("ME", "LE")], ]

# Selecting the significant features for PLS-DA, RF, and SVM classifiers

nci_cancer.biosign <- biosign(nci.mds, "cancer", bootI = 5)
# Plotting the selected signatures
plot(nci_cancer.biosign)
## loading the diaplasma dataset

data(diaplasma)
attach(diaplasma)

## restricting to a smaller dataset for this example

featureSelVl <- variableMetadata[, "mzmed"] >= 490 & variableMetadata[, "mzmed"] < 500
dataMatrix <- dataMatrix[, featureSelVl]
variableMetadata <- variableMetadata[featureSelVl, ]

## signature selection for all 3 classifiers
## a bootI = 5 number of bootstraps is used for this example
## we recommend to keep the default bootI = 50 value for your analyzes

diaSign <- biosign(dataMatrix, sampleMetadata[, "type"], bootI = 5)

## individual boxplot of the selected signatures

plot(diaSign, typeC = "boxplot")

detach(diaplasma)

data("NCI60", package = "ropls")
nci.mds <- NCI60[["mds"]]

# Restricting to the "agilent" and "hgu95" datasets

nci.mds <- nci.mds[, c("agilent", "hgu95")]

# Restricting to the 'ME' and 'LE' cancer types

library(Biobase)
sample_names.vc <- Biobase::sampleNames(nci.mds[["agilent"]])
cancer_type.vc <- Biobase::pData(nci.mds[["agilent"]])[, "cancer"]
nci.mds <- nci.mds[sample_names.vc[cancer_type.vc %in% c("ME", "LE")], ]

# Selecting the significant features for PLS-DA, RF, and SVM classifiers

nci_cancer.biosign <- biosign(nci.mds, "cancer", bootI = 5)
# Plotting the selected signatures
plot(nci_cancer.biosign)

Predict method for 'biosign' signature objects

Description

This function predicts values based upon biosign classifiers trained on the 'S' signature

Usage

## S4 method for signature 'biosign'
predict(object, newdata, tierMaxC = "S")
## S4 method for signature 'biosign'
predict(object, newdata, tierMaxC = "S")

Arguments

`object`	An S4 object of class `biosign`, created by `biosign` function.
`newdata`	Either a data frame or a matrix, containing numeric columns only, with column names identical to the 'x' used for model training with 'biosign'.
`tierMaxC`	Character: Maximum level of tiers to display: Either 'S'or 'A'.

Value

Data frame with the predictions for each classifier as factor columns.

Author(s)

Philippe Rinaudo and Etienne Thevenot (CEA)

Examples


## loading the diaplasma dataset

data(diaplasma)
attach(diaplasma)

## restricting to a smaller dataset for this example

featureSelVl <- variableMetadata[, "mzmed"] >= 490 & variableMetadata[, "mzmed"] < 500
dataMatrix <- dataMatrix[, featureSelVl]
variableMetadata <- variableMetadata[featureSelVl, ]

## training the classifiers
## a bootI = 5 number of bootstraps is used for this example
## we recommend to keep the default bootI = 50 value for your analyzes

set.seed(123)
diaSign <- biosign(dataMatrix, sampleMetadata[, "type"], bootI = 5)

## fitted values (for the subsets restricted to the 'S' signatures)
sFitDF <- predict(diaSign)

## confusion tables
print(lapply(sFitDF, function(predFc) table(actual = sampleMetadata[,
"type"], predicted = predFc)))

## balanced accuracies
sapply(sFitDF, function(predFc) { conf <- table(sampleMetadata[,
"type"], predFc)
conf <- sweep(conf, 1, rowSums(conf), "/")
mean(diag(conf))
})
## note that these values are slightly different from the accuracies
## returned by biosign because the latter are computed by using the
## resampling scheme selected by the bootI or crossvalI arguments
getAccuracyMN(diaSign)["S", ]

detach(diaplasma)

## loading the diaplasma dataset

data(diaplasma)
attach(diaplasma)

## restricting to a smaller dataset for this example

featureSelVl <- variableMetadata[, "mzmed"] >= 490 & variableMetadata[, "mzmed"] < 500
dataMatrix <- dataMatrix[, featureSelVl]
variableMetadata <- variableMetadata[featureSelVl, ]

## training the classifiers
## a bootI = 5 number of bootstraps is used for this example
## we recommend to keep the default bootI = 50 value for your analyzes

set.seed(123)
diaSign <- biosign(dataMatrix, sampleMetadata[, "type"], bootI = 5)

## fitted values (for the subsets restricted to the 'S' signatures)
sFitDF <- predict(diaSign)

## confusion tables
print(lapply(sFitDF, function(predFc) table(actual = sampleMetadata[,
"type"], predicted = predFc)))

## balanced accuracies
sapply(sFitDF, function(predFc) { conf <- table(sampleMetadata[,
"type"], predFc)
conf <- sweep(conf, 1, rowSums(conf), "/")
mean(diag(conf))
})
## note that these values are slightly different from the accuracies
## returned by biosign because the latter are computed by using the
## resampling scheme selected by the bootI or crossvalI arguments
getAccuracyMN(diaSign)["S", ]

detach(diaplasma)

Show method for 'biosign' signature objects

Description

Prints the selected features and the accuracies of the classifiers.

Usage

## S4 method for signature 'biosign'
show(object)
## S4 method for signature 'biosign'
show(object)

Arguments

object

An S4 object of class biosign, created by the biosign function.

Value

Invisible.

Author(s)

Philippe Rinaudo and Etienne Thevenot (CEA)

Examples


## loading the diaplasma dataset

data(diaplasma)
attach(diaplasma)

## restricting to a smaller dataset for this example

featureSelVl <- variableMetadata[, "mzmed"] >= 490 & variableMetadata[, "mzmed"] < 500
dataMatrix <- dataMatrix[, featureSelVl]
variableMetadata <- variableMetadata[featureSelVl, ]

## signature selection for all 3 classifiers
## a bootI = 5 number of bootstraps is used for this example
## we recommend to keep the default bootI = 50 value for your analyzes

set.seed(123)
diaSign <- biosign(dataMatrix, sampleMetadata[, "type"], bootI = 5)

diaSign

detach(diaplasma)

## loading the diaplasma dataset

data(diaplasma)
attach(diaplasma)

## restricting to a smaller dataset for this example

featureSelVl <- variableMetadata[, "mzmed"] >= 490 & variableMetadata[, "mzmed"] < 500
dataMatrix <- dataMatrix[, featureSelVl]
variableMetadata <- variableMetadata[featureSelVl, ]

## signature selection for all 3 classifiers
## a bootI = 5 number of bootstraps is used for this example
## we recommend to keep the default bootI = 50 value for your analyzes

set.seed(123)
diaSign <- biosign(dataMatrix, sampleMetadata[, "type"], bootI = 5)

diaSign

detach(diaplasma)

Package 'biosigner'

Help Index

Molecular signature discovery from omics data

Description

Author(s)

Builds the molecular signature.

Description

Usage

Arguments

Value

Author(s)

See Also

Examples

Class "biosign"

Description

Slots

Objects from the Class

Author(s)

See Also

Examples

Class "biosignMultiDataSet"

Description

Slots

Objects from the Class

See Also

Examples

Analysis of plasma from diabetic patients by LC-HRMS

Description

Format

Value

Source

References

Accuracies of the full model and the models restricted to the signatures

Description

Usage

Arguments

Value

Author(s)

Examples

Getting the biosigner signature from the SummarizedExperiment object

Description

Usage

Arguments

Value

Author(s)

Examples

getEset method

Description

Usage

Arguments

Value

Author(s)

Examples

getMset method

Description

Usage

Arguments

Value

Examples

Signatures selected by the models

Description

Usage

Arguments

Value

Author(s)

Examples

Plot method for 'biosign' signature objects

Description

Usage

Arguments

Value

Author(s)

Examples

Predict method for 'biosign' signature objects

Description

Usage

Arguments

Value

Author(s)

Examples