Title: | XAItest: Enhancing Feature Discovery with eXplainable AI |
---|---|
Description: | XAItest is an R Package that identifies features using eXplainable AI (XAI) methods such as SHAP or LIME. This package allows users to compare these methods with traditional statistical tests like t-tests, empirical Bayes, and Fisher's test. Additionally, it includes a system that enables the comparison of feature importance with p-values by incorporating calibrated simulated data. |
Authors: | Ghislain FIEVET [aut, cre] |
Maintainer: | Ghislain FIEVET <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.99.25 |
Built: | 2025-03-24 03:43:33 UTC |
Source: | https://github.com/bioc/XAItest |
The getFeatImpThresholds function identifies the minimum level of feature importance required to exceed a specified significance threshold, which is determined by the p-value.
getFeatImpThresholds(
  df,
  refPvalColumn = "adjpval",
  featImpColumns = "feat",
  refPval = 0.05
)
df |
A dataframe containing p-value columns and feature importance columns. |
refPvalColumn |
Optional; the name of the column containing the reference p-values. If not provided, the function searches for a column name containing "adjpval", and failing that, a column name containing "pval" (case insensitive). |
featImpColumns |
Optional; a vector of column names containing the feature importance values. If not provided, the function will search for column names containing "feat" (case insensitive). |
refPval |
The reference p-value threshold for filtering features. Defaults to 0.05. |
The reference p-value column can be given by the refPvalColumn argument. If not provided, the function will search for the first df column name containing "pval". The feature importance columns can be given by the featImpColumns argument. If not provided, the function will search for all df column names containing "feat".
It then selects the feature importance values of the features whose p-values fall under the specified threshold and, for each feature importance column, returns the lowest of these values.
This is useful for identifying the most significant features in a dataset based on statistical testing, aiding in the interpretation of machine learning models and exploratory data analysis.
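The selection rule can be sketched in base R. This is a simplified illustration of the logic described above, not the package's exact implementation:

```r
# Sketch: for a feature importance column, keep only the rows whose
# reference p-value passes the cutoff, then return the minimum
# importance among them.
df <- data.frame(adjpval    = c(0.01, 0.03, 0.05, 0.9),
                 feat_imp_1 = c(0.2, 0.3, 0.1, 0.6))
passing <- df$adjpval < 0.05
min(df$feat_imp_1[passing])  # 0.2
```

Repeating this per feature importance column yields the named vector described in the Value section.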
A named vector of minimum feature importance values for the features passing the p-value filter. The names of the vector elements correspond to the feature importance columns in df.
# `df` contains p-value columns (`pval`, `adjPval`) and
# feature importance columns (`feat_imp_1`, `feat_imp_2`)
df <- data.frame(pval = c(0.04, 0.02, 0.06, 0.8),
                 adjPval = c(0.01, 0.03, 0.05, 0.9),
                 feat_imp_1 = c(0.2, 0.3, 0.1, 0.6),
                 feat_imp_2 = c(0.4, 0.5, 0.3, 0.6))
thresholds <- getFeatImpThresholds(df)
print(thresholds)
This method retrieves the metrics table from an ObjXAI object.
getMetricsTable(object)
object |
An ObjXAI object. |
A data frame containing the metrics.
obj <- new("ObjXAI",
           data = data.frame(),
           dataSim = data.frame(),
           metricsTable = data.frame(Metric = c("Accuracy", "Precision"),
                                     Value = c(0.95, 0.89)),
           map = list(),
           models = list(),
           modelPredictions = list(),
           args = list())
getMetricsTable(obj)
The mapPvalImportance function displays a datatable with color-coded cells based on significance thresholds for feature importance and p-value columns.
mapPvalImportance(
  objXAI,
  refPvalColumn = "adjpval",
  featImpColumns = "feat",
  pvalColumns = NULL,
  refPval = 0.05
)
objXAI |
An object of class ObjXAI. |
refPvalColumn |
Optional; the name of the column containing reference p-values for feature importance. If not provided, the function will attempt to auto-detect. |
featImpColumns |
Optional; a vector of column names containing feature importance values. If not provided, the function will attempt to auto-detect. |
pvalColumns |
Optional; a vector of column names containing p-values. If not provided, the function searches for columns containing "pval" (case insensitive). |
refPval |
The reference p-value threshold used for filtering. Defaults to 0.05. |
The function first identifies the relevant p-value and feature importance columns if they are not explicitly provided. It then computes feature importance thresholds based on the specified p-value threshold. Finally, the dataframe is displayed with color-coded cells based on the significance thresholds for the feature importance and p-value columns.
A list containing a dataframe (df) and a datatable object (dt), both with color-coded cells based on significance thresholds for the feature importance and p-value columns.
df <- data.frame(
  feature1 = rnorm(10),
  feature2 = rnorm(10, mean = 5),
  feature3 = runif(10, min = 0, max = 10),
  feature4 = c(rnorm(5), rnorm(5, mean = 5)),
  categ = c(rep("Cat1", 5), rep("Cat2", 5))
)
results <- XAI.test(df, y = "categ", simData = TRUE)
my_map <- mapPvalImportance(results)
my_map$df
my_map$dt
Returns mse, rmse, mae and r2 of regression models or accuracy, precision, recall and f1_score of classification models.
modelsOverview(objXAI, verbose = FALSE)
objXAI |
An object of class ObjXAI. |
verbose |
Logical; if TRUE, prints the model names. |
Returns mse, rmse, mae and r2 of regression models or accuracy, precision, recall and f1_score of classification models.
# Example with a SummarizedExperiment object and a regression dataset.
library(S4Vectors)
library(SummarizedExperiment)
df <- data.frame(
  feature1 = rnorm(100),
  feature2 = rnorm(100, mean = 5),
  feature3 = runif(100, min = 0, max = 10),
  feature4 = c(rnorm(50), rnorm(50, mean = 5)),
  y = 1:100
)
assays <- SimpleList(counts = as.matrix(t(df[, 1:4])))
colData <- DataFrame(y = df[, "y"])
se <- SummarizedExperiment(assays = assays, colData = colData)
resultsRegr <- XAI.test(se, y = "y", verbose = TRUE)
modelsOverview(resultsRegr)

# Example with a dataframe and a classification dataset.
df <- data.frame(
  feature1 = rnorm(100),
  feature2 = rnorm(100, mean = 5),
  feature3 = runif(100, min = 0, max = 10),
  feature4 = c(rnorm(50), rnorm(50, mean = 5)),
  y = c(rep("Cat1", 50), rep("Cat2", 50))
)
resultsClassif <- XAI.test(df, y = "y", verbose = TRUE)
modelsOverview(resultsClassif)
ObjXAI is a class used to store the output values of the XAI.test function.
An ObjXAI object.
obj <- new("ObjXAI",
           data = data.frame(),
           dataSim = data.frame(),
           metricsTable = data.frame(Metric = c("Accuracy", "Precision"),
                                     Value = c(0.95, 0.89)),
           map = list(),
           models = list(),
           modelPredictions = list(),
           args = list())
This function plots a model from the ObjXAI object over the selected features.
plotModel(objXAI, modelName, xFeature, yFeature = "")
objXAI |
The ObjXAI object created with the XAI.test function |
modelName |
The name of the model, can be found in 'names(objXAI@models)' |
xFeature |
The name of the feature plotted on the x-axis |
yFeature |
The name of the feature plotted on the y-axis. Optional; defaults to "" |
A plot
data(iris)
iris <- subset(iris, Species == "setosa" | Species == "versicolor")
iris$Species <- as.character(iris$Species)
objXAI <- XAI.test(iris, y = "Species")
plotModel(objXAI, "RF_feat_imp", "Sepal.Length", "Sepal.Width")
This method sets the metrics table for an ObjXAI object.
setMetricsTable(object, value)
object |
An ObjXAI object. |
value |
A data frame to set as the metrics table. |
The modified ObjXAI object.
obj <- new("ObjXAI",
           data = data.frame(),
           dataSim = data.frame(),
           metricsTable = data.frame(Metric = c("Accuracy", "Precision"),
                                     Value = c(0.95, 0.89)),
           map = list(),
           models = list(),
           modelPredictions = list(),
           args = list())
setMetricsTable(obj, data.frame(Metric = c("Accuracy", "Precision", "Recall"),
                                Value = c(0.95, 0.89, 0.91)))
Prints the first 5 rows of the metrics table from an ObjXAI object.
## S4 method for signature 'ObjXAI'
show(object)
object |
An ObjXAI object. |
The first 5 rows of the metrics table.
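For instance, auto-printing an ObjXAI object at the console invokes this method (using the minimal constructor call shown elsewhere in this documentation):

```r
obj <- new("ObjXAI",
           data = data.frame(), dataSim = data.frame(),
           metricsTable = data.frame(Metric = c("Accuracy", "Precision"),
                                     Value = c(0.95, 0.89)),
           map = list(), models = list(),
           modelPredictions = list(), args = list())
obj  # auto-printing dispatches to show(), displaying the metrics table head
```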
The XAI.test function complements t-test and correlation analyses in feature discovery by integrating eXplainable AI techniques such as feature importance, SHAP, LIME, or custom functions. It can optionally integrate calibrated simulated data to align significance thresholds between p-values and feature importances.
XAI.test(
  data,
  y = "y",
  featImpAgr = "mean",
  simData = FALSE,
  simMethod = "regrnorm",
  simPvalTarget = 0.045,
  adjMethod = "bonferroni",
  customPVals = NULL,
  customFeatImps = NULL,
  modelType = "default",
  corMethod = "pearson",
  defaultMethods = c("ttest", "ebayes", "cor", "lm", "rf", "shap", "lime"),
  caretMethod = "rf",
  caretTrainArgs = NULL,
  verbose = FALSE
)
data |
A SummarizedExperiment or a dataframe containing the data. If a dataframe, rows are samples and columns are features. |
y |
Name of the SummarizedExperiment metadata or of the dataframe column containing the target variable. Defaults to "y". |
featImpAgr |
Can be "mean" or "max_abs". It defines how the feature importance is aggregated. |
simData |
If TRUE, a simulated feature column is added to the dataframe to target a defined p-value that will serve as a benchmark for determining the significance thresholds of feature importances. |
simMethod |
Method used to generate the simulated data. Can be "regrnorm" or "rnorm"; "regrnorm" by default. "regrnorm" creates simulated data points that match specific percentiles within a normal distribution defined by a given mean and standard deviation, while "rnorm" draws data points at random from that normal distribution. "regrnorm" is more accurate in targeting the specified p-value. |
simPvalTarget |
Target p-value for the simulated data. It is used to determine the significance thresholds of feature importances. |
adjMethod |
Method used to adjust the p-values. "bonferroni" by default, can be any other method available in the p.adjust function. |
customPVals |
List of custom functions that compute p-values. The functions must take the dataframe and the target variable as arguments and return a named list with:
|
customFeatImps |
List of custom functions that compute feature importances. The functions must take the dataframe and the target variable as arguments and return a named list with:
|
modelType |
Type of the model. Can be "classification", "regression" or "default". If "default", the function will try to infer the model type from the target variable. If the target variable is a character, the model type will be "classification". If the target variable is numeric, the model type will be "regression". |
corMethod |
Method used to compute the correlation between the features and the target variable. "pearson" by default, can be any other method available in the cor.test function. |
defaultMethods |
List of default p-value and feature importance methods to compute. Defaults to "ttest", "ebayes", "cor", "lm", "rf", "shap", and "lime". |
caretMethod |
Method used by the caret package to train the model. "rf" by default. |
caretTrainArgs |
List of arguments to pass to the caret::train function. Optional. |
verbose |
If TRUE, the function will print messages to the console. |
The XAI.test function is designed to extend the capabilities of conventional statistical analysis methods for feature discovery, such as t-tests and correlation, by incorporating techniques from explainable AI (XAI), such as feature importance, SHAP, LIME, or custom functions. The function aims to identify significant features that influence a given target variable in a dataset, supporting both categorical and numerical targets. A key feature of XAI.test is its ability to automatically incorporate simulated data into the analysis. This simulated data is specifically designed to establish significance thresholds for feature importance values based on the p-values, which reinforces the reliability of feature importance metrics derived from machine learning models by comparing them directly with established statistical significance metrics.
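The idea behind percentile-matched simulation ("regrnorm", as described in the simMethod argument) can be illustrated in base R. This is a sketch of the concept only, not the package's implementation:

```r
# "rnorm": random draws from N(mean, sd); the realized sample varies
# from run to run, so the p-value it produces is noisy.
set.seed(1)
random_sim <- rnorm(10, mean = 0, sd = 1)

# "regrnorm"-style: points placed at evenly spaced percentiles of
# N(mean, sd) via the quantile function, giving a deterministic,
# well-spread sample that makes a target p-value easier to hit.
percentile_sim <- qnorm(ppoints(10), mean = 0, sd = 1)
```

Because the percentile-based sample reproduces the shape of the target distribution exactly, it targets the requested simPvalTarget more accurately than random draws.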
A dataframe containing the p-values and the feature importances of each feature, computed by the different methods.
library(S4Vectors)
library(SummarizedExperiment)

# With a dataframe
data <- data.frame(
  feature1 = rnorm(100),
  feature2 = rnorm(100, mean = 5),
  feature3 = runif(100, min = 0, max = 10),
  feature4 = c(rnorm(50), rnorm(50, mean = 5)),
  y = c(rep("Cat1", 50), rep("Cat2", 50))
)
results <- XAI.test(data, y = "y", verbose = TRUE)
results

# With a SummarizedExperiment
assays <- SimpleList(counts = as.matrix(t(data[, 1:4])))
colData <- DataFrame(y = data[, "y"])
se <- SummarizedExperiment(assays = assays, colData = colData)
results <- XAI.test(se, y = "y", verbose = TRUE)
results