Package 'findIPs'

Title: Influential Points Detection for Feature Rankings
Description: Feature rankings can be distorted by a single case in the context of high-dimensional data. Cases that exert abnormal influence on feature rankings are called influential points (IPs). The package detects IPs based on case deletion and quantifies their effects by measuring rank changes (DOI:10.48550/arXiv.2303.10516). It applies a novel rank comparison measure with adaptive weights that stress the top-ranked important features and adjust to the properties of the ranking.
Authors: Shuo Wang [aut, cre], Junyan Lu [aut]
Maintainer: Shuo Wang <[email protected]>
License: GPL-3
Version: 1.1.0
Built: 2024-06-30 02:44:26 UTC
Source: https://github.com/bioc/findIPs


Function to detect influential points for feature rankings

Description

findIPs employs two important functions: getdrop1ranks and sumRanks. getdrop1ranks calculates the original feature ranking and the leave-one-out feature rankings. Its outputs are then passed to sumRanks, which computes the overall rank changes for each observation, indicating its influence on feature rankings.

Usage

findIPs(
  X,
  y,
  fun,
  decreasing = FALSE,
  topN = 100,
  method = "adaptive",
  nCores = NULL
)

Arguments

X

A data matrix, with rows being the variables and columns being samples.

y

Groups or survival object (for Cox regression).

fun

fun can be either a character string or a function. As a character string, fun should be one of 't.test', 'cox', 'log2fc', and 'kruskal.test'. findIPs() incorporates these four widely used ranking criteria: t-test, univariate Cox model, log2 fold change, and Kruskal-Wallis test. Their outputs are p values, except for log2fc, which returns absolute log2 fold changes. The ordering of features is controlled by the argument decreasing. For instance, if fun = 't.test', set decreasing = FALSE so that features are ordered by the t-test p values in increasing order.

fun can also be a function that computes the ranking criterion, with x and y as its only inputs and the ranking criterion, such as p values, as its only output.

decreasing

logical. Whether the ranking criterion is sorted in decreasing order. For instance, p values should be ordered increasingly (decreasing = FALSE), while fold changes should be ordered decreasingly (decreasing = TRUE).

topN

the number of important features included for comparison.

method

method to summarize rank changes. It should be one of 'adaptive', 'weightedSpearman', and 'unweighted'. Both 'adaptive' and 'weightedSpearman' are weighted rank comparison methods, but the former employs weights that adapt to the distribution of rank changes. 'unweighted' denotes a direct comparison of ranks without weights.

nCores

the number of CPU cores used for parallel running. If nCores = NULL, a single core is used.
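
The following is a minimal sketch of a user-defined fun, not code taken from the package. It assumes that fun is applied to one feature at a time, so that x is the numeric vector of a single row of X and y is the grouping variable. The name wilcoxP and the simulated data are hypothetical and serve only as illustration.

```r
# Hypothetical custom criterion: Wilcoxon rank-sum p value for one feature.
# Assumption: x is a numeric vector (one feature), y a two-level factor.
wilcoxP <- function(x, y) {
  wilcox.test(x ~ y, exact = FALSE)$p.value
}

# Check the function on one simulated feature before using it for ranking.
set.seed(42)
x <- c(rnorm(10), rnorm(10, mean = 2))
y <- factor(rep(c("ctrl", "case"), each = 10))
p <- wilcoxP(x, y)
p

# With findIPs installed, the function could then be passed as (not run):
# obj <- findIPs(X, y, fun = wilcoxP, decreasing = FALSE, topN = 100)
```

Because the function returns a p value, smaller values mean more important features, which is why decreasing = FALSE is used in the commented call.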

Value

kappa

The weight function's shape is controlled by kappa, which ranges from 0 to 1. Weighted rank changes are calculated using kappa, with higher values indicating more weight on top features.

score

The influence of each observation on feature rankings, with larger values indicating more influence.

origRank

The original ranking. origRank is exactly the input. Here it is re-output for visualization purposes.

drop1Rank

The leave-one-out rankings.

origRankWeighted

The weighted original ranking

drop1RankWeighted

The weighted leave-one-out rankings

Examples

data(miller05)
X <- miller05$X
y <- miller05$y

obj <- findIPs(X, y,
               fun = 't.test',
               decreasing = FALSE,
               topN = 100,
               method = 'adaptive')

par(mfrow = c(1, 3), mar = c(4, 4, 2, 2))
plotRankScatters(obj, top = TRUE)
plotAdaptiveWeights(kappa = obj$kappa,
                    n = nrow(obj$drop1Rank),
                    type = 'line',
                    ylim = NULL)
plotIPs(obj, topn = 5, ylim = NULL)

## Interop with ExpressionSet class
library(Biobase)
data(sample.ExpressionSet)
design <- phenoData(sample.ExpressionSet)$type
IPs <- findIPs(exprs(sample.ExpressionSet), design, fun = "t.test",
               method = "adaptive")
plotIPs(IPs)

## Interop with SummarizedExperiment class
library(SummarizedExperiment)
## Make a SummarizedExperiment class
sample.SummarizedExperiment <- makeSummarizedExperimentFromExpressionSet(
  sample.ExpressionSet)

design <- colData(sample.SummarizedExperiment)$type
IPs <- findIPs(assay(sample.SummarizedExperiment), design, fun = "t.test",
               method = "adaptive")
plotIPs(IPs)

Derive ranking lists including original and leave-one-out rankings

Description

This function calculates the original and leave-one-out feature rankings using a predefined ranking method.
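
The idea of leave-one-out rankings can be sketched in base R as follows. This is an illustration on simulated data under simple assumptions (t-test p values as the ranking criterion, ties broken by order of appearance), not the implementation used by getdrop1ranks. The helper pvals and all object names are hypothetical.

```r
# Simulated data: 50 features (rows) x 20 samples (columns), two groups.
set.seed(1)
X <- matrix(rnorm(50 * 20), nrow = 50,
            dimnames = list(paste0("gene", 1:50), paste0("s", 1:20)))
y <- factor(rep(c("A", "B"), each = 10))

# Ranking criterion: t-test p value per feature (smaller = more important).
pvals <- function(X, y) apply(X, 1, function(x) t.test(x ~ y)$p.value)

# Original ranking computed on all samples.
orig_rank <- rank(pvals(X, y), ties.method = "first")

# Leave-one-out rankings: column i holds the ranking with sample i deleted.
drop1_rank <- sapply(seq_len(ncol(X)), function(i) {
  rank(pvals(X[, -i, drop = FALSE], y[-i]), ties.method = "first")
})
colnames(drop1_rank) <- colnames(X)

dim(drop1_rank)  # features x samples
```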

Usage

getdrop1ranks(X, y, fun, decreasing = FALSE, topN = 100, nCores = NULL)

Arguments

X

A data matrix, with rows being the variables and columns being samples.

y

Groups or survival object (for Cox regression).

fun

fun can be either a character string or a function. As a character string, fun should be one of 't.test', 'cox', 'log2fc', and 'kruskal.test'. findIPs() incorporates these four widely used ranking criteria: t-test, univariate Cox model, log2 fold change, and Kruskal-Wallis test. Their outputs are p values, except for log2fc, which returns absolute log2 fold changes. The ordering of features is controlled by the argument decreasing. For instance, if fun = 't.test', set decreasing = FALSE so that features are ordered by the t-test p values in increasing order.

fun can also be a function that computes the ranking criterion, with x and y as its only inputs and the ranking criterion, such as p values, as its only output.

decreasing

logical. Whether the ranking criterion is sorted in decreasing order. For instance, p values should be ordered increasingly (decreasing = FALSE), while fold changes should be ordered decreasingly (decreasing = TRUE).

topN

the number of important features included for comparison, i.e. the top n features in the original ranking list.

nCores

the number of CPU cores used for parallel running. If nCores = NULL, a single core is used.
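
The role of the decreasing argument can be illustrated with base R's order() on made-up values. The gene names and numbers below are hypothetical; the sketch only shows why p values and fold changes need opposite settings.

```r
pvals  <- c(geneA = 0.20, geneB = 0.001, geneC = 0.04)
log2fc <- c(geneA = 0.3,  geneB = 2.1,   geneC = 1.2)

# p values: the smallest value is the most important, so decreasing = FALSE.
byP <- names(pvals)[order(pvals, decreasing = FALSE)]

# Absolute log2 fold changes: the largest value is the most important,
# so decreasing = TRUE.
byFC <- names(log2fc)[order(abs(log2fc), decreasing = TRUE)]

byP   # "geneB" "geneC" "geneA"
byFC  # "geneB" "geneC" "geneA"
```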

Value

origRank

vector, the original ranking.

drop1Rank

matrix, the leave-one-out rankings.

Examples

data(miller05)
X <- miller05$X
y <- miller05$y
obj <- getdrop1ranks(X, y,
                     fun = 't.test',
                     decreasing = FALSE,
                     topN = 100)
rks <- sumRanks(origRank = obj$origRank,
                drop1Rank = obj$drop1Rank,
                topN = 100,
                method = 'adaptive')
plotIPs(rks, topn = 5, ylim = NULL)

miller05 data

Description

miller05 is gene expression data with 1000 genes randomly sampled from 22283 genes and 236 samples after removing the cases with missing response. The data has binary and survival responses. The binary response contains 58 p53-mutant cases and 193 wild-type cases. The survival response has a total of 55 events.

Usage

data(miller05)

Format

a list

Value

miller05 data, a list containing 1000 genes and the binary and survival responses.

References

Miller, Lance D., et al. 'An expression signature for p53 status in human breast cancer predicts mutation status, transcriptional effects, and patient survival.' Proceedings of the National Academy of Sciences 102.38 (2005): 13550-13555. doi:10.1073/pnas.0506230102

Examples

data(miller05)

Visualize the weight function for adaptive weights

Description

Plot the weight function for the adaptive weights with given kappa and the list length (n).

Usage

plotAdaptiveWeights(kappa, n, type = c("line", "points"), ylim = NULL)

Arguments

kappa

a shape parameter of the weight function.

n

the length of the list.

type

draw line or points. Both line and points will be plotted if type = c('line', 'points').

ylim

the range of the y-axis.

Value

a plot drawn with base graphics.

Examples

par(mfrow = c(1, 2), mar = c(4, 4, 2, 2))
plotAdaptiveWeights(kappa = 0.01, n = 100, type = 'line', ylim = c(0, 0.025))
plotAdaptiveWeights(kappa = 0.02, n = 100, type = 'line', ylim = c(0, 0.025))

Visualize the influential scores

Description

Visualize the influential scores using a lollipop plot. The function uses the output obtained from the sumRanks or findIPs function.

Usage

plotIPs(obj, topn = 5, ylim = NULL, ...)

Arguments

obj

the object obtained from the sumRanks or findIPs function.

topn

the top n most influential points to be labelled in the plot.

ylim

the range of the y-axis.

...

other arguments

Value

a plot drawn with base graphics.

Examples

data(miller05)
X <- miller05$X
y <- miller05$y
obj <- getdrop1ranks(X, y,
                     fun = 't.test',
                     decreasing = FALSE,
                     topN = 100)
rks <- sumRanks(origRank = obj$origRank,
                drop1Rank = obj$drop1Rank,
                topN = 100,
                method = 'adaptive')
plotIPs(rks, topn = 5, ylim = NULL)

Visualize the unweighted rank changes

Description

Visualize the unweighted rank changes using a scatter plot. The plot displays the original ranking and the leave-one-out rankings.

Usage

plotRankScatters(obj, top = TRUE, points.arg = list(), top.arg = list())

Arguments

obj

the object obtained from the findIPs() or sumRanks() function.

top

logical, whether the most influential case should be plotted in black.

points.arg

a list. Arguments in graphics::points() can be used to define the points.

top.arg

a list. Arguments in graphics::points() can be used to define the top points.

Value

a plot drawn with base graphics.

Examples

data(miller05)
X <- miller05$X
y <- miller05$y

obj <- getdrop1ranks(X, y,
                     fun = 't.test',
                     decreasing = FALSE,
                     topN = 100)
rks <- sumRanks(origRank = obj$origRank,
                drop1Rank = obj$drop1Rank,
                topN = 100,
                method = 'adaptive')
plotRankScatters(rks)

Summarize the weighted rank changes caused by case-deletion

Description

This function measures the overall rank changes due to case deletion. A larger rank change indicates more influence of the deleted case on feature rankings. sumRanks() provides three methods to compute the overall rank changes: unweighted, weighted Spearman, and adaptive weights.
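
The difference between unweighted and weighted summaries can be seen on a toy example. This sketch is not the formula used by sumRanks(): the sum of absolute rank differences and the weights 1/rank are assumptions chosen only to show why weighting the top of the list matters. All object names and values are hypothetical.

```r
# Toy reference ranking of 5 features and three leave-one-out rankings.
origRank <- c(f1 = 1, f2 = 2, f3 = 3, f4 = 4, f5 = 5)
drop1Rank <- cbind(
  case1 = c(1, 2, 3, 4, 5),  # deleting case1 changes nothing
  case2 = c(1, 2, 3, 5, 4),  # swap at the bottom of the list
  case3 = c(2, 1, 3, 4, 5)   # swap at the top of the list
)

# Unweighted summary: every rank change counts equally.
unweighted <- colSums(abs(drop1Rank - origRank))

# Weighted summary (illustrative weights only): changes among the
# top-ranked features count more than changes further down.
w <- 1 / origRank
weighted <- colSums(w * abs(drop1Rank - origRank))

unweighted  # case2 and case3 are indistinguishable
weighted    # case3 scores higher because it disturbs the top of the list
```

In this toy setting the unweighted score treats case2 and case3 alike, while the weighted score singles out case3, which is the behaviour the weighted methods aim for.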

Usage

sumRanks(origRank, drop1Rank, topN = NULL, method = "adaptive", ...)

Arguments

origRank

vector, the reference ranking. For influential observation detection, origRank denotes the original ranking obtained using the whole data.

drop1Rank

matrix or data.frame. Each column is a feature ranking obtained with one case removed.

topN

the top n features in origRank will be used for rank comparison. If NULL, all features are included.

method

method to summarize rank changes. It should be one of 'adaptive', 'weightedSpearman', and 'unweighted'. Both 'adaptive' and 'weightedSpearman' are weighted rank comparison methods, but the former employs weights that adapt to the distribution of rank changes. 'unweighted' denotes a direct comparison of ranks without weights.

...

other arguments

Value

kappa

The weight function's shape is controlled by kappa, which ranges from 0 to 1. Weighted rank changes are calculated using kappa, with higher values indicating more weight on top features.

score

The influence of each observation on feature rankings, with larger values indicating more influence.

origRank

The original ranking. origRank is exactly the input. Here it is re-output for visualization purposes.

drop1Rank

The leave-one-out rankings.

origRankWeighted

The weighted original ranking. origRankWeighted will be returned when method = 'adaptive'.

drop1RankWeighted

The weighted leave-one-out rankings. drop1RankWeighted will be returned when method = 'adaptive'.

Examples

data(miller05)
X <- miller05$X
y <- miller05$y
obj <- getdrop1ranks(X, y,
                     fun = 't.test',
                     decreasing = FALSE,
                     topN = 100)

rks <- sumRanks(origRank = obj$origRank,
                drop1Rank = obj$drop1Rank,
                topN = 100,
                method = 'adaptive')

plotIPs(rks, topn = 5, ylim = NULL)