Title: | adverSCarial, generate and analyze the vulnerability of scRNA-seq classifier to adversarial attacks |
---|---|
Description: | adverSCarial is an R Package designed for generating and analyzing the vulnerability of scRNA-seq classifiers to adversarial attacks. The package is versatile and provides a format for integrating any type of classifier. It offers functions for studying and generating two types of attacks, single gene attack and max change attack. The single-gene attack involves making a small modification to the input to alter the classification. The max-change attack involves making a large modification to the input without changing its classification. The package provides a comprehensive solution for evaluating the robustness of scRNA-seq classifiers against adversarial attacks. |
Authors: | Ghislain FIEVET [aut, cre] , Sébastien HERGALANT [aut] |
Maintainer: | Ghislain FIEVET <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.5.0 |
Built: | 2024-11-29 03:09:19 UTC |
Source: | https://github.com/bioc/adverSCarial |
Implementation of the IDG4C algorithm.
advCGD( expr, clusters, target, classifier, genes = NULL, exclNewTargets = NULL, newTarget = NULL, alpha = 0.1, epsilon = 0, slot = NULL, stopAtSwitch = TRUE, verbose = FALSE )
advCGD( expr, clusters, target, classifier, genes = NULL, exclNewTargets = NULL, newTarget = NULL, alpha = 0.1, epsilon = 0, slot = NULL, stopAtSwitch = TRUE, verbose = FALSE )
expr |
a matrix, a data.fram or a Seurat object |
clusters |
a character vector of the clusters to which the cells belong |
target |
the name of the cluster to modify |
classifier |
a classifier in the suitable format.
A classifier function should be formated as follow:
classifier = function(expr, clusters, target)
# Making the classification
c("cell type", score)
|
genes |
the character vector of genes to study |
exclNewTargets |
the character vector of cell types to exclude as new target |
newTarget |
the name of the new cell type cell type target, this will be the new prediction cell type after the attack |
alpha |
the alpha parameter of the IDG4C algorithm |
epsilon |
the epsilon parameter of the IDG4C algorithm |
slot |
the slot to modify in case of Seurat object |
stopAtSwitch |
logical, set to TRUE to stop the attack when the new target set to FALSE to continue the attack until all genes are tested |
verbose |
logical, set to TRUE to activate verbose mode |
this function is an implementation of the IDG4C algorithm which permits to generate adversarial attacks on a classifier on a given cluster. The attack is done by modifying the expression of the genes by gradient descent after having inferred if by the finite difference method. Two parameters alpha and epsilon are used to control the step size of the modifications.
a list containing the modified expression matrix, the list of modified genes, the summary of the attack by gene, the summary of the attack, the new cell types predictions and the original cell types predictions
MyClassifier <- function(expr, clusters, target) { typePredictions <- as.data.frame(matrix(nrow=nrow(expr), ncol=length(unique(clusters)))) colnames(typePredictions) <- unique(clusters) typePredictions[unique(clusters)[1]] <- c(1,0,0,0) typePredictions[unique(clusters)[2]] <- c(0,1,1,1) rownames(typePredictions) <- 1:4 list(prediction="T cell", odd=1, typePredictions=t(typePredictions), cellTypes=c("B cell","T cell","T cell","T cell")) } rna_expression <- data.frame(CD4=c(0,0,0,0), CD8A=c(1,1,1,1), CD8B=c(2,2,3,3)) genes <- c("CD4", "CD8A") clusters_id <- c("B cell","T cell","T cell","T cell") advCGD(rna_expression, clusters_id, "T cell", MyClassifier, genes=genes, verbose=TRUE)
MyClassifier <- function(expr, clusters, target) { typePredictions <- as.data.frame(matrix(nrow=nrow(expr), ncol=length(unique(clusters)))) colnames(typePredictions) <- unique(clusters) typePredictions[unique(clusters)[1]] <- c(1,0,0,0) typePredictions[unique(clusters)[2]] <- c(0,1,1,1) rownames(typePredictions) <- 1:4 list(prediction="T cell", odd=1, typePredictions=t(typePredictions), cellTypes=c("B cell","T cell","T cell","T cell")) } rna_expression <- data.frame(CD4=c(0,0,0,0), CD8A=c(1,1,1,1), CD8B=c(2,2,3,3)) genes <- c("CD4", "CD8A") clusters_id <- c("B cell","T cell","T cell","T cell") advCGD(rna_expression, clusters_id, "T cell", MyClassifier, genes=genes, verbose=TRUE)
advChar
is a class used to store the output values
of the advMaxChange
function. The result can be a vector
of few thousands genes, so a specific show method is associated to
this class to avoid overflooding the R scripts outputs.
A advChar object
MyClassifier <- function(expr, clusters, target) { c("T cell", 0.9) } genes <- paste0("gene_",1:10000) rna_expression <- data.frame(lapply(genes, function(x) numeric(0))) rna_expression <- rbind(rna_expression, rep(1,10000)) rna_expression <- rbind(rna_expression, rep(2,10000)) colnames(rna_expression) <- genes clusters_id <- c("B cell","T cell") max_change_genes <- advMaxChange(rna_expression, clusters_id, "T cell", MyClassifier, advMethod="perc99") max_change_genes
MyClassifier <- function(expr, clusters, target) { c("T cell", 0.9) } genes <- paste0("gene_",1:10000) rna_expression <- data.frame(lapply(genes, function(x) numeric(0))) rna_expression <- rbind(rna_expression, rep(1,10000)) rna_expression <- rbind(rna_expression, rep(2,10000)) colnames(rna_expression) <- genes clusters_id <- c("B cell","T cell") max_change_genes <- advMaxChange(rna_expression, clusters_id, "T cell", MyClassifier, advMethod="perc99") max_change_genes
Grid search of min change adversarial attack. Tries each combination on a cluster, given a list of genes and a list of modifications.
advGridMinChange( exprs, clusters, target, classifier, genes, modifications = list(c("perc1"), c("perc99")), returnFirstFound = FALSE, argForClassif = "data.frame", argForModif = "data.frame", verbose = FALSE, iamsure = FALSE )
advGridMinChange( exprs, clusters, target, classifier, genes, modifications = list(c("perc1"), c("perc99")), returnFirstFound = FALSE, argForClassif = "data.frame", argForModif = "data.frame", verbose = FALSE, iamsure = FALSE )
exprs |
DelayedMatrix of numeric RNA expression, cells are rows and genes are columns - or a SingleCellExperiment object, a matrix or a data.frame. By default matrix and data.frame are converted to DelayedMatrix for memory performance, see 'argForModif' argument for options. |
clusters |
a character vector of the clusters to which the cells belong |
target |
the name of the cluster to modify |
classifier |
a classifier in the suitable format.
A classifier function should be formated as follow:
classifier = function(expr, clusters, target)
# Making the classification
c("cell type", score)
|
genes |
the character vector of genes to study |
modifications |
the list of the modifications to study |
returnFirstFound |
set to TRUE to return result when a the first misclassification is found |
argForClassif |
the type of the first argument to feed to the classifier function. 'data.frame' by default, can be 'SingleCellExperiment' or 'DelayedMatrix'. |
argForModif |
type of matrix during for the modification, 'data.frame' by default. Can be 'DelayedMatrix', which needs less memory but is slower. |
verbose |
logical, set to TRUE to activate verbose mode |
iamsure |
logical, prevents from expansive calculations
when |
This function aims to find the shortest combination of genes allowing to make a min change attack. It will test every possible combination for a given gene list. This function can take a long time to run, and we recommand to use the random walk search advRandWalkMinChange function instead for lists above 10 genes.
You can specify a list of modifications as so, each item of the list should be 1 or 2 length size. The 1 length vector must contain the prerecorded modifications, 'perc1' or 'perc99'. The 2 length vector must have as first item:
'fixed', in this case the second item should be the value to be replaced by.
'full_row_fct', 'target_row_fct', 'target_matrix_fct' or 'full_matrix_fct'. In this case the second item should be a function. Let's say we want to analysis the susceptibility to min change attack for 3 modifications: "perc1", the modification of each value of the cluster by 1000, and a custom modification stored inside a function myFct. Then the 'modification' parameter should be: my_modifications = list(c("perc1"), c("fixed", 1000), c("full_matrix_fct", myFct))
DataFrame results of the classification of all the grid combinations
library(DelayedArray) MyClassifier <- function(expr, clusters, target) { c("T cell", 0.9) } rna_expression <- DelayedArray(data.frame(CD4=c(0,0,0,0), CD8A=c(1,1,1,1), CD8B=c(2,2,3,3))) genes <- c("CD4", "CD8A") clusters_id <- c("B cell","B cell","T cell","T cell") advGridMinChange(rna_expression, clusters_id, "T cell", MyClassifier, genes=genes, modifications = list(c("perc1"), c("perc99"))) myModif = function(x, y){ return(sample(1:10,1)) } my_modifications = list(c("perc1"), c("fixed", 1000), c("full_matrix_fct", myModif)) advGridMinChange(rna_expression, clusters_id, "T cell", MyClassifier, genes=genes, modifications = my_modifications)
library(DelayedArray) MyClassifier <- function(expr, clusters, target) { c("T cell", 0.9) } rna_expression <- DelayedArray(data.frame(CD4=c(0,0,0,0), CD8A=c(1,1,1,1), CD8B=c(2,2,3,3))) genes <- c("CD4", "CD8A") clusters_id <- c("B cell","B cell","T cell","T cell") advGridMinChange(rna_expression, clusters_id, "T cell", MyClassifier, genes=genes, modifications = list(c("perc1"), c("perc99"))) myModif = function(x, y){ return(sample(1:10,1)) } my_modifications = list(c("perc1"), c("fixed", 1000), c("full_matrix_fct", myModif)) advGridMinChange(rna_expression, clusters_id, "T cell", MyClassifier, genes=genes, modifications = my_modifications)
advList
is a class used to store the output values
of the advSingleGene
function. The result can be a list
of few thousands genes:cell_type key/values, so a specific show
method is associated to this class to avoid overflooding the R
scripts outputs.
A advList object
MyClassifier <- function(expr, clusters, target) { c("B cell", 0.9) } rna_expression <- data.frame(CD4=c(0,0,0,0), CD8A=c(1,1,1,1), CD8B=c(2,2,3,3)) genes <- c("CD4", "CD8A") clusters_id <- c("B cell","B cell","T cell","T cell") adv_min_change <- advSingleGene(rna_expression, clusters_id, "T cell", MyClassifier, advMethod="perc99") adv_min_change
MyClassifier <- function(expr, clusters, target) { c("B cell", 0.9) } rna_expression <- data.frame(CD4=c(0,0,0,0), CD8A=c(1,1,1,1), CD8B=c(2,2,3,3)) genes <- c("CD4", "CD8A") clusters_id <- c("B cell","B cell","T cell","T cell") adv_min_change <- advSingleGene(rna_expression, clusters_id, "T cell", MyClassifier, advMethod="perc99") adv_min_change
Find a max change adversarial attack. It finds the longer list of genes you can modify on a cluster without changing its classification.
advMaxChange( exprs, clusters, target, classifier, exclGenes = c(), genes = c(), advMethod = "perc99", advFixedValue = 3, advFct = NULL, maxSplitSize = 1, argForClassif = "data.frame", argForModif = "data.frame", slot = NULL, verbose = FALSE )
advMaxChange( exprs, clusters, target, classifier, exclGenes = c(), genes = c(), advMethod = "perc99", advFixedValue = 3, advFct = NULL, maxSplitSize = 1, argForClassif = "data.frame", argForModif = "data.frame", slot = NULL, verbose = FALSE )
exprs |
DelayedMatrix of numeric RNA expression, cells are rows and genes are columns - or a SingleCellExperiment object, a matrix or a data.frame. By default, these are converted to a data.frame to increase speed performance during modifications. However, this conversion can consume a significant amount of memory, see 'argForModif' argument for options. |
clusters |
a character vector of the clusters to which the cells belong |
target |
the name of the cluster to modify |
classifier |
a classifier in the suitable format.
A classifier function should be formated as follow:
classifier = function(expr, clusters, target)
# Making the classification
c("cell type", score)
|
exclGenes |
a list of genes to exclude from the analysis |
genes |
a list of genes in case you want to limit the attack on a subset of genes |
advMethod |
the name of the method to use |
advFixedValue |
the numeric value to use in case of
advMethod= |
advFct |
the function to use in case advMethod
belongs to the following list: |
maxSplitSize |
max size of dichotomic slices. |
argForClassif |
the type of the first argument to feed to the classifier function. 'data.frame' by default, can be 'SingleCellExperiment' or 'DelayedMatrix'. |
argForModif |
type of matrix during for the modification, 'data.frame' by default. Can be 'DelayedMatrix', which needs less memory but is slower. |
verbose |
logical, set to TRUE to activate verbose mode |
This function aims to get the largest part of the genes that can be modified without altering the classification, considering a given modification. You can refer to the 'advModifications' function documentation to more details on how to define a modification. The search is made by a dichotomic process, on a reccursive function. At each iteration the function splits the genes in two groups. It proceeds to the modification of the RNA gene value of the first group, makes its classification. Then three possible scenarios:
the classification is the same as the target cluster. We concat the genes list to the previous one, make the classification, and it still gives same classification. Then we return the genes list.
the classification is the same as the target cluster. We concat the genes list to the previous one, make the classification, and it gives a different classification. This happens often, you can modify the gene A with a classification of T cell, or modify the gene B with a classification of T cell, but modifying A and B returns another classification. In this case we split the genes list in two and try again.
the classification is not the same as the target cluster. In this case we split the genes list in two and try again. The iteration process stops when the length of the genes list is lower than the value of the 'maxSplitSize' argument. So you should set it to 1 to have the maximum number of genes for the max change attack. This function is used by the 'overMaxChange' function with a default argument value of 100 to increase speed, and still returns significant results.
a character vector of genes you can modify on a cluster without modifying its classification
library(DelayedArray) MyClassifier <- function(expr, clusters, target) { c("T cell", 0.9) } rna_expression <- DelayedArray(data.frame(CD4=c(0,0,0,0), CD8A=c(1,1,1,1), CD8B=c(2,2,3,3))) genes <- c("CD4", "CD8A") clusters_id <- c("B cell","B cell","T cell","T cell") advMaxChange(rna_expression, clusters_id, "T cell", MyClassifier, advMethod="perc99")
library(DelayedArray) MyClassifier <- function(expr, clusters, target) { c("T cell", 0.9) } rna_expression <- DelayedArray(data.frame(CD4=c(0,0,0,0), CD8A=c(1,1,1,1), CD8B=c(2,2,3,3))) genes <- c("CD4", "CD8A") clusters_id <- c("B cell","B cell","T cell","T cell") advMaxChange(rna_expression, clusters_id, "T cell", MyClassifier, advMethod="perc99")
Returns a modified RNA expression DelayedMatrix, or a modified SingleCellExperiment, for a given cluster, for a given modification.
advModifications( exprs, genes, clusters, target, advMethod = "perc99", advFixedValue = 3, advFct = NULL, argForClassif = "DelayedMatrix", argForModif = "data.frame", slot = NULL, verbose = FALSE )
advModifications( exprs, genes, clusters, target, advMethod = "perc99", advFixedValue = 3, advFct = NULL, argForClassif = "DelayedMatrix", argForModif = "data.frame", slot = NULL, verbose = FALSE )
exprs |
DelayedMatrix of numeric RNA expression, cells are rows and genes are columns - or a SingleCellExperiment object, a matrix or a data.frame. By default, these are converted to a data.frame to increase speed performance during modifications. However, this conversion can consume a significant amount of memory, see 'argForModif' argument for options. |
genes |
the character vector of genes to modify |
clusters |
a character vector of the clusters to which the cells belong |
target |
the name of the cluster to modify |
advMethod |
the name of the method to use |
advFixedValue |
the numeric value to use in case of
advMethod= |
advFct |
the function to use in case advMethod
belongs to the following list: |
argForClassif |
the type of the first argument to feed to the classifier function. 'DelayedMatrix' by default, can be 'SingleCellExperiment' or 'data.frame'. |
argForModif |
type of matrix during for the modification, 'DelayedMatrix' by default. Can be 'data.frame', which is faster, but need more memory. |
verbose |
logical, set to TRUE to activate verbose mode |
The motivation for this function is to standardize the modifications we want to study in the attacks. We give as argument a DelayedMatrix of the RNA expression, the gene and the target cells we want to modify. Then we have three arguments allowing to specify what modification we want to apply on these cells. The advMethod contains, a specific prerecorded modification or an indication on how to use the other two arguments. The prerecorded modifications available for the advMethod argument are:
'perc1', replace the value by the whole matrix 1 percentile value of the gene. It is as if we biologically switched off the gene.
'perc99', replace the value by the whole matrix 99 percentile value of the gene. It is as if we biologically switched on the gene to the maximum.
'random', replace the value by from a uniform distribution between min and max of the gene on the dataset
'positive_aberrant' replace value by 10,000 times the max value of the gene on the dataset
'negative_aberrant' replace value by -10,000 times the max value of the gene on the dataset
'decile+X', shifts the gene value by + X deciles.
'decile-X', shifts the gene value by - X deciles. The value of the advMethod argument can also be 'fixed', in this case the modification would be to replace the value of the gene of the wanted cells by the value of the argument 'advFixedValue'. This can be useful to test aberrant values like negative integer, absurdly high values of character values. The value of the advMethod argument can also be 'full_row_fct', 'target_row_fct', 'target_matrix_fct' or 'full_matrix_fct'. They are used when we want to use a custom modification function, with the 'advFct' argument:
'full_row_fct' indicate that the 'advFct' function takes the whole gene values as input.
'target_row_fct' indicate that the 'advFct' function takes target cells gene values as input.
'full_matrix_fct' indicate that the 'advFct' function takes the whole gene expression values as input.
'target_matrix_fct' indicate that the 'advFct' function takes target cells all genes values as input.
the matrix or a data.frame exprs modified on asked genes with the specified modification
library(DelayedArray) rna_expression <- DelayedArray(data.frame(CD4=c(0,0,0,0), CD8A=c(1,1,1,1), CD8B=c(2,2,3,3))) genes <- c("CD4", "CD8A") clusters_id <- c("B cell","B cell","T cell","T cell") advModifications(rna_expression, genes, clusters_id, "T cell", advMethod="perc99")
library(DelayedArray) rna_expression <- DelayedArray(data.frame(CD4=c(0,0,0,0), CD8A=c(1,1,1,1), CD8B=c(2,2,3,3))) genes <- c("CD4", "CD8A") clusters_id <- c("B cell","B cell","T cell","T cell") advModifications(rna_expression, genes, clusters_id, "T cell", advMethod="perc99")
Random walk search of min change adversarial attack.
advRandWalkMinChange( exprs, clusters, target, classifier, genes, modifications = list(c("perc1"), c("perc99")), firstBatch = 100, walkLength = 100, stepChangeRatio = 0.2, whileMaxCount = 10000, changeType = "any", argForClassif = "data.frame", argForModif = "data.frame", verbose = FALSE )
advRandWalkMinChange( exprs, clusters, target, classifier, genes, modifications = list(c("perc1"), c("perc99")), firstBatch = 100, walkLength = 100, stepChangeRatio = 0.2, whileMaxCount = 10000, changeType = "any", argForClassif = "data.frame", argForModif = "data.frame", verbose = FALSE )
exprs |
DelayedMatrix of numeric RNA expression, cells are rows and genes are columns - or a SingleCellExperiment object, a matrix or a data.frame. By default matrix and data.frame are converted to DelayedMatrix for memory performance, see 'argForModif' argument for options. |
clusters |
a character vector of the clusters to which the cells belong |
target |
the name of the cluster to modify |
classifier |
a classifier in the suitable format.
A classifier function should be formated as follow:
classifier = function(expr, clusters, target)
# Making the classification
c("cell type", score)
|
genes |
the character vector of genes to study |
modifications |
the list of the modifications to study |
firstBatch |
the maximum number of try in step 1 |
walkLength |
the maximum number of try in step 2 |
stepChangeRatio |
ratio of parameters change in new walk step |
whileMaxCount |
the maximum number of try when looking for new combination of parameters |
changeType |
|
argForClassif |
the type of the first argument to feed to the classifier function. 'data.frame' by default, can be 'SingleCellExperiment' or 'DelayedMatrix'. |
argForModif |
type of matrix during for the modification, 'data.frame' by default. Can be 'DelayedMatrix', which needs less memory but is slower. |
verbose |
logical, set to TRUE to activate verbose mode |
The parameter search by grid can take a long time, this function aims to make a parameter search faster. We have a function, advSingleGene, looking for one gene attacks. The advRandWalkMinChange function aims to find a multiple genes attack, with the fewer genes possible. At first the user have to provide a list of genes to test, for example by running differential statistics between two cell clusters. The user should also provide a list of modifications to test, to define as so - each item of the list should be 1 or 2 length size. The 1 length vector must contain the prerecorded modifications, 'perc1' or 'perc99'. The 2 length vector must have as first item:
'fixed', in this case the second item should be the value to be replaced by.
'full_row_fct', 'target_row_fct', 'target_matrix_fct' or 'full_matrix_fct'. In this case the second item should be a function. Let's say we want to analysis the susceptibility to min change attack for 3 modifications: "perc1", the modification of each value of the cluster by 1000, and a custom modification stored inside a function myFct. Then the 'modification' parameter should be: my_modifications = list(c("perc1"), c("fixed", 1000), c("full_matrix_fct", myFct))
Then the function will try to find the best combination of these genes and modifications to make the min change attack. Step 1 is to find a seed by trying random combinations of genes and modifications on a cluster until the classification is altered. Step 2 is to perform a random walk search to reduce the number of genes needed to change the classification. The
DataFrame results of the classification of all the grid combinations
library(DelayedArray) MyClassifier <- function(expr, clusters, target) { c("T cell", 0.9) } rna_expression <- DelayedArray(data.frame(CD4=c(0,0,0,0), CD8A=c(1,1,1,1), CD8B=c(2,2,3,3))) genes <- c("CD4", "CD8A") clusters_id <- c("B cell","B cell","T cell","T cell") advRandWalkMinChange(rna_expression, clusters_id, "T cell", MyClassifier, genes=genes, modifications = list(c("perc1"), c("perc99"))) # Stop at first attack discovery, whitout going into the walk # parameter search. advRandWalkMinChange(rna_expression, clusters_id, "T cell", MyClassifier, genes=genes, modifications = list(c("perc1"), c("perc99")), walkLength=0) myModif = function(x, y){ return(sample(1:10,1)) } my_modifications = list(c("perc1"), c("fixed", 1000), c("full_matrix_fct", myModif)) advRandWalkMinChange(rna_expression, clusters_id, "T cell", MyClassifier, genes=genes, modifications = my_modifications)
library(DelayedArray) MyClassifier <- function(expr, clusters, target) { c("T cell", 0.9) } rna_expression <- DelayedArray(data.frame(CD4=c(0,0,0,0), CD8A=c(1,1,1,1), CD8B=c(2,2,3,3))) genes <- c("CD4", "CD8A") clusters_id <- c("B cell","B cell","T cell","T cell") advRandWalkMinChange(rna_expression, clusters_id, "T cell", MyClassifier, genes=genes, modifications = list(c("perc1"), c("perc99"))) # Stop at first attack discovery, whitout going into the walk # parameter search. advRandWalkMinChange(rna_expression, clusters_id, "T cell", MyClassifier, genes=genes, modifications = list(c("perc1"), c("perc99")), walkLength=0) myModif = function(x, y){ return(sample(1:10,1)) } my_modifications = list(c("perc1"), c("fixed", 1000), c("full_matrix_fct", myModif)) advRandWalkMinChange(rna_expression, clusters_id, "T cell", MyClassifier, genes=genes, modifications = my_modifications)
Find a one gene min change adversarial attack list. A one gene min change adversarial attack refers to the modification of a single gene within a cluster, leading to a change in its classification. The function returns a list of genes/new classification.
advSingleGene( exprs, clusters, target, classifier, exclGenes = c(), genes = c(), advMethod = "perc99", advFixedValue = 3, advFct = NULL, firstDichot = 100, maxSplitSize = 1, returnFirstFound = FALSE, changeType = "any", argForClassif = "data.frame", argForModif = "data.frame", slot = NULL, verbose = FALSE )
advSingleGene( exprs, clusters, target, classifier, exclGenes = c(), genes = c(), advMethod = "perc99", advFixedValue = 3, advFct = NULL, firstDichot = 100, maxSplitSize = 1, returnFirstFound = FALSE, changeType = "any", argForClassif = "data.frame", argForModif = "data.frame", slot = NULL, verbose = FALSE )
exprs |
DelayedMatrix of numeric RNA expression, cells are rows and genes are columns - or a SingleCellExperiment object, a matrix or a data.frame. By default, these are converted to a data.frame to increase speed performance during modifications. However, this conversion can consume a significant amount of memory, see 'argForModif' argument for options. |
clusters |
a character vector of the clusters to which the cells belong |
target |
the name of the cluster to modify |
classifier |
a classifier in the suitable format.
A classifier function should be formated as follow:
classifier = function(expr, clusters, target)
# Making the classification
c("cell type", score)
|
exclGenes |
a character vector of genes to exclude from the analysis |
genes |
a character vector of genes in case you want to limit the attack on a subset of genes |
advMethod |
the name of the method to use |
advFixedValue |
the numeric value to use in case of
advMethod= |
advFct |
the function to use in case advMethod
belongs to the following list: |
firstDichot |
the initial number of slices before the dichotomic search |
maxSplitSize |
max size of dichotomic slices |
returnFirstFound |
set to TRUE to return result when a the first misclassification is found |
changeType |
|
argForClassif |
the type of the first argument to feed to the classifier function. 'data.frame' by default, can be 'SingleCellExperiment' or 'DelayedMatrix'. |
argForModif |
type of matrix during for the modification, 'data.frame' by default. Can be 'DelayedMatrix', which needs less memory but is slower. |
verbose |
logical, set to TRUE to activate verbose mode |
This function aims to get all genes that when modified individually can lead to a misclassification. You can refer to the 'advModifications' function documentation to more details on how to define a modification. The function is made as a two step parameter search. The first step is to split the genes in 'firstDichot' sets, 100 by default. Then each set is studied by a dichotomic process in a recursive function. The aim of sarting by a high value of sets, instead of starting directly by the dichotomic research is to avoid the following scenario: we modify 5000 genes, the modification of one gene conpensates the modification of another. The classification remains unchanged, whereas there is a one gene classification modifying inside the 5000. The dichotomic process runs as follow. The function receives a list of genes, make the modification of the whole list and make the classification. Three scenarios possible:
the classification remains the same as the target cluster. The function returns, and the dichotomic process continues.
the classification is changed. There is only one gene in the list, the function returns the gene and the new classification.
the classification is changed. There is more than one gene in the list, the genes list is split in two, and the dichotomic process continues.
a list of genes/new classification tuples
library(DelayedArray) MyClassifier <- function(expr, clusters, target) { c("B cell", 0.9) } rna_expression <- DelayedArray(data.frame(CD4=c(0,0,0,0), CD8A=c(1,1,1,1), CD8B=c(2,2,3,3))) genes <- c("CD4", "CD8A") clusters_id <- c("B cell","B cell","T cell","T cell") advSingleGene(rna_expression, clusters_id, "T cell", MyClassifier, advMethod="perc99")
library(DelayedArray) MyClassifier <- function(expr, clusters, target) { c("B cell", 0.9) } rna_expression <- DelayedArray(data.frame(CD4=c(0,0,0,0), CD8A=c(1,1,1,1), CD8B=c(2,2,3,3))) genes <- c("CD4", "CD8A") clusters_id <- c("B cell","B cell","T cell","T cell") advSingleGene(rna_expression, clusters_id, "T cell", MyClassifier, advMethod="perc99")
The function getSignGenes orders the genes by maximizing the significance of the gene to differentiate the clusters and ensures that they represent at most the variations across all possible pairs of clusters.
getSignGenes(expr, clusters, method = "wilcox", verbose = FALSE)
getSignGenes(expr, clusters, method = "wilcox", verbose = FALSE)
expr |
A matrix of gene expression data. Rows are cells and columns are genes. |
clusters |
a character vector of the clusters to which the cells belong |
method |
the statistical test to use. Either "wilcox" for the Wilcoxon rank sum test or "ttest" for the t-test. Default is "wilcox". |
verbose |
logical, set to TRUE to activate verbose mode |
The function getSignGenes orders the genes by maximizing the significance of the gene to differentiate the clusters and ensures that they represent at most the variations across all possible pairs of clusters.
rna_expression <- data.frame(CD4=c(0,0,0,0), CD8A=c(1,1,1,1), CD8B=c(2,2,3,3)) genes <- c("CD4", "CD8A") clusters_id <- c("B cell","T cell","T cell","T cell") getSignGenes(rna_expression, clusters_id, method="wilcox", verbose=TRUE)
rna_expression <- data.frame(CD4=c(0,0,0,0), CD8A=c(1,1,1,1), CD8B=c(2,2,3,3)) genes <- c("CD4", "CD8A") clusters_id <- c("B cell","T cell","T cell","T cell") getSignGenes(rna_expression, clusters_id, method="wilcox", verbose=TRUE)
Returns the RNA expression matrix from a SingleCellExperiment with unique hgnc gene names in columns
matrixFromSCE(sce)
matrixFromSCE(sce)
sce |
SingleCellExperiment object to convert |
This function retrieves from a SingleCellExperiment object the raw RNA expression value corresponding to the hgnc genes. The resulting matrix can then be used with adverSCarial packages.
the RNA expression matrix from a SingleCellExperiment with unique hgnc gene names in columns
library(TENxPBMCData) pbmc <- TENxPBMCData(dataset = "pbmc3k") mat_rna <- matrixFromSCE(pbmc)
library(TENxPBMCData) pbmc <- TENxPBMCData(dataset = "pbmc3k") mat_rna <- matrixFromSCE(pbmc)
Gives an overview of the susceptibility to max change attacks, for each cell type, for a given list of modifications.
maxChangeOverview( exprs, clusters, classifier, exclGenes = c(), genes = c(), modifications = list(c("perc1"), c("perc99")), advMethod = "perc99", advFixedValue = 3, advFct = NULL, maxSplitSize = 100, argForClassif = "data.frame", argForModif = "data.frame", verbose = FALSE )
maxChangeOverview( exprs, clusters, classifier, exclGenes = c(), genes = c(), modifications = list(c("perc1"), c("perc99")), advMethod = "perc99", advFixedValue = 3, advFct = NULL, maxSplitSize = 100, argForClassif = "data.frame", argForModif = "data.frame", verbose = FALSE )
exprs |
DelayedMatrix of numeric RNA expression, cells are rows and genes are columns - or a SingleCellExperiment object, a matrix or a data.frame. By default, these are converted to a data.frame to increase speed performance during modifications. However, this conversion can consume a significant amount of memory, see 'argForModif' argument for options. |
clusters |
a character vector of the clusters to which the cells belong |
classifier |
a classifier in the suitable format.
A classifier function should be formated as follow:
classifier = function(expr, clusters, target)
# Making the classification
c("cell type", score)
|
exclGenes |
a character vector of genes to exclude from the analysis |
genes |
a character vector of genes in case you want to limit the analysis on a subset of genes |
modifications |
the list of the modifications to study |
advMethod |
the name of the method to use |
advFixedValue |
the numeric value to use in case of
advMethod= |
advFct |
the function to use in case advMethod
belongs to the following list: |
maxSplitSize |
max size of dichotomic slices. |
argForClassif |
the type of the first argument to feed to the classifier function. 'data.frame' by default, can be 'SingleCellExperiment' or 'DelayedMatrix'. |
argForModif |
type of matrix during for the modification, 'data.frame' by default. Can be 'DelayedMatrix', which needs less memory but is slower. |
verbose |
logical, set to TRUE to activate verbose mode |
Running the advMaxChange function for each cell type to see which ones are more vulerable can take a long time. The aim of the maxChangeOverview function is to make this process faster. It uses a default value of 100 for the 'maxSplitSize' parameter. So, the dichotomic process of the advMaxChange function stops as soon as the fold length is lower than 100. You can have more accurate results with maxSplitSize=1, but it will take longer. This function aims also to run the advMaxChange for several given modifications. You can specify a list of modifications as so - each item of the list should be 1 or 2 length size. The 1 length vector must contain the prerecorded modifications, 'perc1' or 'perc99'. The 2 length vector must have as first item:
'fixed', in this case the second item should be the value to be replaced by.
'full_row_fct', 'target_row_fct', 'target_matrix_fct' or 'full_matrix_fct'. In this case the second item should be a function. Let's say we want to analysis the susceptibility to max change attack for 3 modifications: "perc1", the modification of each value of the cluster by 1000, and a custom modification stored inside a function myFct. Then the 'modification' parameter should be: my_modifications = list(c("perc1"), c("fixed", 1000), c("full_matrix_fct", myFct))
The function returns a dataframe with the number of genes of the max change attack for each modification in columns, for each cell type in rows.
a DataFrame storing the number of possible max change attacks for each cell type and each modification.
library(DelayedArray) MyClassifier <- function(expr, clusters, target) { c("T cell", 0.9) } rna_expression <- DelayedArray(data.frame(CD4=c(0,0,0,0), CD8A=c(1,1,1,1), CD8B=c(2,2,3,3))) genes <- c("CD4", "CD8A") clusters_id <- c("B cell","B cell","T cell","T cell") maxChangeOverview(rna_expression, clusters_id, MyClassifier, modifications = list(c("perc1"), c("perc99"))) myModif = function(x, y){ return(sample(1:10,1)) } my_modifications = list(c("perc1"), c("fixed", 1000), c("full_matrix_fct", myModif)) maxChangeOverview(rna_expression, clusters_id, MyClassifier, modifications = my_modifications)
library(DelayedArray) MyClassifier <- function(expr, clusters, target) { c("T cell", 0.9) } rna_expression <- DelayedArray(data.frame(CD4=c(0,0,0,0), CD8A=c(1,1,1,1), CD8B=c(2,2,3,3))) genes <- c("CD4", "CD8A") clusters_id <- c("B cell","B cell","T cell","T cell") maxChangeOverview(rna_expression, clusters_id, MyClassifier, modifications = list(c("perc1"), c("perc99"))) myModif = function(x, y){ return(sample(1:10,1)) } my_modifications = list(c("perc1"), c("fixed", 1000), c("full_matrix_fct", myModif)) maxChangeOverview(rna_expression, clusters_id, MyClassifier, modifications = my_modifications)
Example cell type classifier for pbmc clustered datasets.
MClassifier(exprs, clusters, target)
MClassifier(exprs, clusters, target)
exprs |
DelayedMatrix of numeric RNA expression, cells are rows and genes are columns - or a SingleCellExperiment object, a matrix or a data.frame. |
clusters |
vector of clusters to which each cell belongs |
target |
name of the cell cluster to classify |
This classifier aims at testing the adverSCarial package of real pbmc data. It is a simple marker based classifier. It looks at the average value of a few genes inside a cluster, and returns the associated cell type. Markers where found by differential expressions.
a vector with the classification, and the odd
library(TENxPBMCData) pbmc <- TENxPBMCData(dataset = "pbmc3k") mat_rna <- matrixFromSCE(pbmc) cell_types <- system.file("extdata", "pbmc3k_cell_types.tsv", package = "adverSCarial" ) cell_types <- read.table(cell_types, sep = "\t")$cell_type MClassifier(mat_rna, cell_types, "DC")
library(TENxPBMCData) pbmc <- TENxPBMCData(dataset = "pbmc3k") mat_rna <- matrixFromSCE(pbmc) cell_types <- system.file("extdata", "pbmc3k_cell_types.tsv", package = "adverSCarial" ) cell_types <- read.table(cell_types, sep = "\t")$cell_type MClassifier(mat_rna, cell_types, "DC")
Returns a classification and an odd value from a RNA expression DelayedMatrix or a SingleCellExperiment object, for given genes, for a given cluster, for a given modification.
predictWithNewValue( exprs, genes, clusters, target, classifier, advMethod = "perc99", advFixedValue = 3, advFct = NULL, argForClassif = "data.frame", argForModif = "data.frame", slot = NULL, verbose = FALSE )
predictWithNewValue( exprs, genes, clusters, target, classifier, advMethod = "perc99", advFixedValue = 3, advFct = NULL, argForClassif = "data.frame", argForModif = "data.frame", slot = NULL, verbose = FALSE )
exprs |
DelayedMatrix of numeric RNA expression, cells are rows and genes are columns - or a SingleCellExperiment object, a matrix or a data.frame. |
genes |
the character vector of genes to modify |
clusters |
a character vector of the clusters to which the cells belong |
target |
the name of the cluster to modify |
classifier |
a classifier in the suitable format.
A classifier function should be formated as follow:
classifier = function(expr, clusters, target)
# Making the classification
c("cell type", score)
|
advMethod |
the name of the method to use |
advFixedValue |
the numeric value to use in case of
advMethod= |
advFct |
the function to use in case advMethod
belongs to the following list: |
argForClassif |
the type of the first argument to feed to the classifier function. 'data.frame' by default, can be 'SingleCellExperiment' or 'DelayedMatrix'. |
argForModif |
type of matrix during for the modification, 'data.frame' by default. Can be 'DelayedMatrix', which is slower, but need less memory. |
verbose |
logical, set to TRUE to activate verbose mode |
This function aims to concatenate the following actions:
modify the RNA gene expression
classify the result This is a widely used function in the other functions of the package.
a vector of the classification, and the associated odd
library(DelayedArray) MyClassifier <- function(expr, clusters, target) { c("T cell", 0.9) } rna_expression <- DelayedArray(data.frame(CD4=c(0,0,0,0), CD8A=c(1,1,1,1), CD8B=c(2,2,3,3))) genes <- c("CD4", "CD8A") clusters_id <- c("B cell","B cell","T cell","T cell") predictWithNewValue(rna_expression, genes, clusters_id, "T cell", MyClassifier, advMethod="perc99")
library(DelayedArray) MyClassifier <- function(expr, clusters, target) { c("T cell", 0.9) } rna_expression <- DelayedArray(data.frame(CD4=c(0,0,0,0), CD8A=c(1,1,1,1), CD8B=c(2,2,3,3))) genes <- c("CD4", "CD8A") clusters_id <- c("B cell","B cell","T cell","T cell") predictWithNewValue(rna_expression, genes, clusters_id, "T cell", MyClassifier, advMethod="perc99")
Returns a SingleCellExperiment object keeping unique HGNC gene
sceConvertToHGNC(sce)
sceConvertToHGNC(sce)
sce |
SingleCellExperiment object to convert |
Sometimes classifiers need HGNC instead of ensemble genes to run. This function allows to make the conversion.
the SingleCellExperiment object keeping unique HGNC gene
library(TENxPBMCData) pbmc <- TENxPBMCData(dataset = "pbmc3k") hgnc_pbmc <- sceConvertToHGNC(pbmc)
library(TENxPBMCData) pbmc <- TENxPBMCData(dataset = "pbmc3k") hgnc_pbmc <- sceConvertToHGNC(pbmc)
Gives an overview of the susceptibility to single gene attacks, for each cell type, for a given list of modifications.
singleGeneOverview( exprs, clusters, classifier, exclGenes = c(), genes = c(), modifications = list(c("perc1"), c("perc99")), advMethod = "perc99", advFixedValue = 3, advFct = NULL, firstDichot = 100, maxSplitSize = 100, changeType = "any", argForClassif = "data.frame", argForModif = "data.frame", verbose = FALSE )
singleGeneOverview( exprs, clusters, classifier, exclGenes = c(), genes = c(), modifications = list(c("perc1"), c("perc99")), advMethod = "perc99", advFixedValue = 3, advFct = NULL, firstDichot = 100, maxSplitSize = 100, changeType = "any", argForClassif = "data.frame", argForModif = "data.frame", verbose = FALSE )
exprs |
DelayedMatrix of numeric RNA expression, cells are rows and genes are columns - or a SingleCellExperiment object, a matrix or a data.frame. By default, these are converted to a data.frame to increase speed performance during modifications. However, this conversion can consume a significant amount of memory, see 'argForModif' argument for options. |
clusters |
a character vector of the clusters to which the cells belong |
classifier |
a classifier in the suitable format.
A classifier function should be formated as follow:
classifier = function(expr, clusters, target)
# Making the classification
c("cell type", score)
|
exclGenes |
a character vector of genes to exclude from the analysis |
genes |
a character vector of genes in case you want to limit the analysis on a subset of genes |
modifications |
the list of the modifications to study |
advMethod |
the name of the method to use |
advFixedValue |
the numeric value to use in case of
advMethod= |
advFct |
the function to use in case advMethod
belongs to the following list: |
firstDichot |
the initial number of slices before the dichotomic search |
maxSplitSize |
max size of dichotomic slices. |
changeType |
|
argForClassif |
the type of the first argument to feed to the classifier function. 'data.frame' by default, can be 'SingleCellExperiment' or 'DelayedMatrix'. |
argForModif |
type of matrix during for the modification, 'data.frame' by default. Can be 'DelayedMatrix', which needs less memory but is slower. |
verbose |
logical, set to TRUE to activate verbose mode |
Running the advSingleGene function for each cell type to see which ones are more vulerable can take a long time. The aim of the singleGeneOverview function is to make this process faster. It uses a default value of 100 for the 'maxSplitSize' parameter. So, the dichotomic process of the advSingleGene function stops as soon as the fold length is lower than 100. You can have more accurate results with maxSplitSize=1, but it will take longer. This function aims also to run the advSingleGene for several given modifications. You can specify a list of modifications as so - each item of the list should be 1 or 2 length size. The 1 length vector must contain the prerecorded modifications, 'perc1' or 'perc99'. The 2 length vector must have as first item:
'fixed', in this case the second item should be the value to be replaced by.
'full_row_fct', 'target_row_fct', 'target_matrix_fct' or 'full_matrix_fct'. In this case the second item should be a function. Let's say we want to analysis the susceptibility to single gene attack for 3 modifications: "perc1", the modification of each value of the cluster by 1000, and a custom modification stored inside a function myFct. Then the 'modification' parameter should be: my_modifications = list(c("perc1"), c("fixed", 1000), c("full_matrix_fct", myFct))
The function returns a dataframe with the number of genes of the max change attack for each modification in columns, for each cell type in rows.
a DataFrame storing the number of possible single gene attacks each cell type and each modification.
library(DelayedArray) MyClassifier <- function(expr, clusters, target) { c("T cell", 0.9) } rna_expression <- DelayedArray(data.frame(CD4=c(0,0,0,0), CD8A=c(1,1,1,1), CD8B=c(2,2,3,3))) genes <- c("CD4", "CD8A") clusters_id <- c("B cell","B cell","T cell","T cell") singleGeneOverview(rna_expression, clusters_id, MyClassifier, modifications = list(c("perc1"), c("perc99"))) myModif = function(x, y){ return(sample(1:10,1)) } my_modifications = list(c("perc1"), c("fixed", 1000), c("full_matrix_fct", myModif)) singleGeneOverview(rna_expression, clusters_id, MyClassifier, modifications = my_modifications)
library(DelayedArray) MyClassifier <- function(expr, clusters, target) { c("T cell", 0.9) } rna_expression <- DelayedArray(data.frame(CD4=c(0,0,0,0), CD8A=c(1,1,1,1), CD8B=c(2,2,3,3))) genes <- c("CD4", "CD8A") clusters_id <- c("B cell","B cell","T cell","T cell") singleGeneOverview(rna_expression, clusters_id, MyClassifier, modifications = list(c("perc1"), c("perc99"))) myModif = function(x, y){ return(sample(1:10,1)) } my_modifications = list(c("perc1"), c("fixed", 1000), c("full_matrix_fct", myModif)) singleGeneOverview(rna_expression, clusters_id, MyClassifier, modifications = my_modifications)