Title: | Influential Points Detection for Feature Rankings |
---|---|
Description: | Feature rankings can be distorted by a single case in the context of high-dimensional data. The cases exerts abnormal influence on feature rankings are called influential points (IPs). The package aims at detecting IPs based on case deletion and quantifies their effects by measuring the rank changes (DOI:10.48550/arXiv.2303.10516). The package applies a novel rank comparing measure using the adaptive weights that stress the top-ranked important features and adjust the weights to ranking properties. |
Authors: | Shuo Wang [aut, cre] , Junyan Lu [aut] |
Maintainer: | Shuo Wang <[email protected]> |
License: | GPL-3 |
Version: | 1.3.1 |
Built: | 2024-12-22 03:21:18 UTC |
Source: | https://github.com/bioc/findIPs |
findIPs
employs two important functions: getdrop1ranks
and
sumRanks
. getdrop1ranks
can calculate the original feature
ranking and leave-one-out feature rankings. The outputs are subsequently
taken to sumRanks
, which computes the overall rank changes for
each observation, indicating their influence on feature rankings.
findIPs( X, y, fun, decreasing = FALSE, topN = 100, method = "adaptive", nCores = NULL )
findIPs( X, y, fun, decreasing = FALSE, topN = 100, method = "adaptive", nCores = NULL )
X |
A data matrix, with rows being the variables and columns being samples. |
y |
Groups or survival object (for cox regression). |
fun |
fun can either be a character or a function. fun should be one
of the 't.test', 'cox', 'log2fc', and 'kruskal.test' when it is a character.
fun can also be a function to obtain ranking criteria with x and y being the only input and the ranking criteria, such as p-values being the only output. |
decreasing |
logical. How the rank criteria are ordered? For instance, p-value should be ordered increasingly, while fold-change should be ordered decreasingly. |
topN |
the number of important features included for comparison. |
method |
method to summarize rank changes. It should be one of the 'adaptive', 'weightedSpearman', and 'unweighted'. Both 'adaptive' and 'weightedSpearman' are weighted rank comparison method, but former employs the weight that are adaptive to the distribution of rank changes. 'unweighted' denotes a direct comparison of ranks without considering weights. |
nCores |
the number of CPU cores used for parallel running. If nCores = NULL, a single core is used. |
kappa |
The weight function's shape is controlled by kappa, which ranges from 0 to 1. Weighted rank changes are calculated using kappa, with higher values indicating more weight on top features. |
score |
The influence of each observation on feature rankings, with larger values indicating more influence. |
origRank |
The original ranking. origRank is exactly the input. Here it is re-output for visualization purposes. |
drop1Rank |
The leave-one-out rankings. |
origRankWeighted |
The weighted original ranking |
drop1RankWeighted |
The weighted leave-one-out rankings |
data(miller05) X <- miller05$X y <- miller05$y obj <- findIPs(X, y, fun = 't.test', decreasing = FALSE, topN = 100, method = 'adaptive') par(mfrow = c(1, 3), mar = c(4, 4, 2, 2)) plotRankScatters(obj, top = TRUE) plotAdaptiveWeights(kappa = obj$kappa, n = nrow(obj$drop1Rank), type = 'line', ylim = NULL) plotIPs(obj, topn = 5, ylim = NULL) ## Interop with ExpressionSet class library(Biobase) data(sample.ExpressionSet) design <- phenoData(sample.ExpressionSet)$type IPs <- findIPs(exprs(sample.ExpressionSet), design, fun = "t.test", method = "adaptive") plotIPs(IPs) ## Interop with SummarizedExperiment class library(SummarizedExperiment) ## Make a SummarizedExperiment class sample.SummarizedExperiment <- makeSummarizedExperimentFromExpressionSet( sample.ExpressionSet) design <- colData(sample.SummarizedExperiment)$type IPs <- findIPs(assay(sample.SummarizedExperiment), design, fun = "t.test", method = "adaptive") plotIPs(IPs)
data(miller05) X <- miller05$X y <- miller05$y obj <- findIPs(X, y, fun = 't.test', decreasing = FALSE, topN = 100, method = 'adaptive') par(mfrow = c(1, 3), mar = c(4, 4, 2, 2)) plotRankScatters(obj, top = TRUE) plotAdaptiveWeights(kappa = obj$kappa, n = nrow(obj$drop1Rank), type = 'line', ylim = NULL) plotIPs(obj, topn = 5, ylim = NULL) ## Interop with ExpressionSet class library(Biobase) data(sample.ExpressionSet) design <- phenoData(sample.ExpressionSet)$type IPs <- findIPs(exprs(sample.ExpressionSet), design, fun = "t.test", method = "adaptive") plotIPs(IPs) ## Interop with SummarizedExperiment class library(SummarizedExperiment) ## Make a SummarizedExperiment class sample.SummarizedExperiment <- makeSummarizedExperimentFromExpressionSet( sample.ExpressionSet) design <- colData(sample.SummarizedExperiment)$type IPs <- findIPs(assay(sample.SummarizedExperiment), design, fun = "t.test", method = "adaptive") plotIPs(IPs)
This function calculates the original and leave-one-out feature rankings using a predefined rank method
getdrop1ranks(X, y, fun, decreasing = FALSE, topN = 100, nCores = NULL)
getdrop1ranks(X, y, fun, decreasing = FALSE, topN = 100, nCores = NULL)
X |
A data matrix, with rows being the variables and columns being samples. |
y |
Groups or survival object (for cox regression) |
fun |
fun can either be a character or a function. fun should be one
of the 't.test', 'cox', 'log2fc', and 'kruskal.test' when it is a character.
fun can also be a function to obtain ranking criteria with x and y being the only input and the ranking criteria, such as p-values being the only output. |
decreasing |
logical. How the rank criteria are ordered? For instance, p-value should be ordered increasingly, while fold-change should be ordered decreasingly. |
topN |
the number of important features included for comparison. The top n features in the original ranking list. |
nCores |
the number of CPU cores used for parallel running. If nCores = NULL, a single core is used. |
orig |
vector:,original ranking |
drop1rank |
matrix, Leave-one-out rankings |
data(miller05) X <- miller05$X y <- miller05$y obj <- getdrop1ranks(X, y, fun = 't.test', decreasing = FALSE, topN = 100) rks <- sumRanks(origRank = obj$origRank, drop1Rank = obj$drop1Rank, topN = 100, method = 'adaptive') plotIPs(rks, topn = 5, ylim = NULL)
data(miller05) X <- miller05$X y <- miller05$y obj <- getdrop1ranks(X, y, fun = 't.test', decreasing = FALSE, topN = 100) rks <- sumRanks(origRank = obj$origRank, drop1Rank = obj$drop1Rank, topN = 100, method = 'adaptive') plotIPs(rks, topn = 5, ylim = NULL)
miller05 is gene expression data with 1000 genes randomly sampled from 22283 genes and 236 samples since removing the case with missing response. The data has binary and survival response. The binary response contains 58 case with p53 mutant and 193 wild type mutant. The survival response has a total of 55 events.
data(miller05)
data(miller05)
a list
miller05 data, a list containing 1000 genes and binary and survival response.
Miller, Lance D., et al. 'An expression signature for p53 status in human breast cancer predicts mutation status, transcriptional effects, and patient survival.' Proceedings of the National Academy of Sciences 102.38 (2005): 13550-13555.doi:10.1073pnas.0506230102
data(miller05)
data(miller05)
Plot the weight function for the adaptive weights with given kappa and the list length (n).
plotAdaptiveWeights(kappa, n, type = c("line", "points"), ylim = NULL)
plotAdaptiveWeights(kappa, n, type = c("line", "points"), ylim = NULL)
kappa |
a shape parameter of the weight function. |
n |
the length list. |
type |
draw line or points. Both line and points will be plotted if type = c('line', 'points'). |
ylim |
y coordinates ranges. |
plot based on basic graph
par(mfrow = c(1, 2), mar = c(4, 4, 2, 2)) plotAdaptiveWeights(kappa = 0.01, n = 100, type = 'line', ylim = c(0, 0.025)) plotAdaptiveWeights(kappa = 0.02, n = 100, type = 'line', ylim = c(0, 0.025))
par(mfrow = c(1, 2), mar = c(4, 4, 2, 2)) plotAdaptiveWeights(kappa = 0.01, n = 100, type = 'line', ylim = c(0, 0.025)) plotAdaptiveWeights(kappa = 0.02, n = 100, type = 'line', ylim = c(0, 0.025))
Visualize influential score using lollipop plot. The function uses the output
obtained from rank.compare
or findIPs
function.
plotIPs(obj, topn = 5, ylim = NULL, ...)
plotIPs(obj, topn = 5, ylim = NULL, ...)
obj |
the object obtained from |
topn |
the top n most influential points to be labelled in the plot. |
ylim |
y coordinates ranges |
... |
other arguments |
plot based on basic graph
data(miller05) X <- miller05$X y <- miller05$y obj <- getdrop1ranks(X, y, fun = 't.test', decreasing = FALSE, topN = 100) rks <- sumRanks(origRank = obj$origRank, drop1Rank = obj$drop1Rank, topN = 100, method = 'adaptive') plotIPs(rks, topn = 5, ylim = NULL)
data(miller05) X <- miller05$X y <- miller05$y obj <- getdrop1ranks(X, y, fun = 't.test', decreasing = FALSE, topN = 100) rks <- sumRanks(origRank = obj$origRank, drop1Rank = obj$drop1Rank, topN = 100, method = 'adaptive') plotIPs(rks, topn = 5, ylim = NULL)
Visualize the unweighted rank changes using scatter plot. The plot displays the original ranking and leave-one-out rankings.
plotRankScatters(obj, top = TRUE, points.arg = list(), top.arg = list())
plotRankScatters(obj, top = TRUE, points.arg = list(), top.arg = list())
obj |
the objective obtained from |
top |
logical, whether the most influential case needs to be plot in black |
points.arg |
a list. Arguments in |
top.arg |
a list. Arguments in |
a plot based on basic graphic.
data(miller05) X <- miller05$X y <- miller05$y obj <- getdrop1ranks(X, y, fun = 't.test', decreasing = FALSE, topN = 100) rks <- sumRanks(origRank = obj$origRank, drop1Rank = obj$drop1Rank, topN = 100, method = 'adaptive') plotRankScatters(rks)
data(miller05) X <- miller05$X y <- miller05$y obj <- getdrop1ranks(X, y, fun = 't.test', decreasing = FALSE, topN = 100) rks <- sumRanks(origRank = obj$origRank, drop1Rank = obj$drop1Rank, topN = 100, method = 'adaptive') plotRankScatters(rks)
This function measures the overall rank changes due to case deletion.
A large rank changes indicates more influence of the deleted case on feature
rankings. sumRanks()
provides three methods to compute the overall
rank changes: unweighted, weighted Spearman, and adaptive weights.
sumRanks(origRank, drop1Rank, topN = NULL, method = "adaptive", ...)
sumRanks(origRank, drop1Rank, topN = NULL, method = "adaptive", ...)
origRank |
vectors, reference rankings. For influential observation detection, origRank denotes the original ranking obtained using the whole data. |
drop1Rank |
matrix or data.frame, Each column is a feature list with a case removed. |
topN |
the top n features in origRank will be used for rank comparison. If null, include all features. |
method |
method to summarize rank changes. It should be one of the 'adaptive', 'weightedSpearman', and 'unweighted'. Both 'adaptive' and 'weightedSpearman' are weighted rank comparison method, but former employs the weight that are adaptive to the distribution of rank changes. 'unweighted' denotes a direct comparison of ranks without considering weights. |
... |
other arguments |
kappa |
The weight function's shape is controlled by kappa, which ranges from 0 to 1. Weighted rank changes are calculated using kappa, with higher values indicating more weight on top features. |
score |
The influence of each observation on feature rankings, with larger values indicating more influence. |
origRank |
The original ranking. origRank is exactly the input. Here it is re-output for visualization purposes. |
drop1Rank |
The leave-one-out rankings. |
origRankWeighted |
The weighted original ranking. origRankWeighted will be returned when method = 'adaptive'. |
drop1RankWeighted |
The weighted leave-one-out rankings. drop1RankWeighted will be returned when method = 'adaptive'. |
data(miller05) X <- miller05$X y <- miller05$y obj <- getdrop1ranks(X, y, fun = 't.test', decreasing = FALSE, topN = 100) rks <- sumRanks(origRank = obj$origRank, drop1Rank = obj$drop1Rank, topN = 100, method = 'adaptive') plotIPs(rks, topn = 5, ylim = NULL)
data(miller05) X <- miller05$X y <- miller05$y obj <- getdrop1ranks(X, y, fun = 't.test', decreasing = FALSE, topN = 100) rks <- sumRanks(origRank = obj$origRank, drop1Rank = obj$drop1Rank, topN = 100, method = 'adaptive') plotIPs(rks, topn = 5, ylim = NULL)