Package 'BioQC' reference manual

Title:	Detect tissue heterogeneity in expression profiles with gene sets
Description:	BioQC performs quality control of high-throughput expression data based on tissue gene signatures. It can detect tissue heterogeneity in gene expression data. The core algorithm is a Wilcoxon-Mann-Whitney test that is optimised for high performance.
Authors:	Jitao David Zhang [cre, aut], Laura Badi [aut], Gregor Sturm [aut], Roland Ambs [aut], Iakov Davydov [aut]
Maintainer:	Jitao David Zhang <[email protected]>
License:	GPL (>=3) + file LICENSE
Version:	1.35.0
Built:	2025-03-29 03:02:47 UTC
Source:	https://github.com/bioc/BioQC

Subsetting GmtList object into another GmtList object

Description

Subsetting GmtList object into another GmtList object

Usage

## S3 method for class 'GmtList'
x[i, drop = FALSE]
## S3 method for class 'GmtList'
x[i, drop = FALSE]

Arguments

`x`	A GmtList object
`i`	Index to subset
`drop`	In case only one element remains, should a list representing the single geneset returned? Default: FALSE

Examples

myGmtList <- GmtList(list(gs1=letters[1:3], gs2=letters[3:4], gs3=letters[4:5]))
myGmtList[1:2]
myGmtList[1] ## default behaviour: not dropping
myGmtList[1,drop=TRUE] ## force dropping
myGmtList <- GmtList(list(gs1=letters[1:3], gs2=letters[3:4], gs3=letters[4:5]))
myGmtList[1:2]
myGmtList[1] ## default behaviour: not dropping
myGmtList[1,drop=TRUE] ## force dropping

Subsetting GmtList object to fetch one gene-set

Description

Subsetting GmtList object to fetch one gene-set

Usage

## S3 method for class 'GmtList'
x[[i]]
## S3 method for class 'GmtList'
x[[i]]

Arguments

`x`	A GmtList object
`i`	The index to subset

Examples

myGmtList <- GmtList(list(gs1=letters[1:3], gs2=letters[3:4], gs3=letters[4:5]))
myGmtList[[1]]
myGmtList <- GmtList(list(gs1=letters[1:3], gs2=letters[3:4], gs3=letters[4:5]))
myGmtList[[1]]

Absolute base-10 logarithm of p-values

Description

Absolute base-10 logarithm of p-values

Usage

absLog10p(x)
absLog10p(x)

Arguments

`x`	Numeric vector or matrix The function returns the absolute values of base-10 logarithm of p-values.

Details

The logarithm transformation of p-values is commonly used to visualize results from statistical tests. Although it may cause misunderstanding and therefore its use is disapproved by some experts, it helps to visualize and interpret results of statistical tests intuitively.

The function transforms p-values with base-10 logarithm, and returns its absolute value. The choice of base 10 is driven by the simplicity of interpreting the results.

Value

Numeric vector or matrix.

Author(s)

Jitao David Zhang <[email protected]>

Examples

testp <- runif(1000, 0, 1)
testp.al <- absLog10p(testp)

print(head(testp))
print(head(testp.al))

testp <- runif(1000, 0, 1)
testp.al <- absLog10p(testp)

print(head(testp))
print(head(testp.al))

Append a GmtList object to another one

Description

Append a GmtList object to another one

Usage

appendGmtList(gmtList, newGmtList, ...)
appendGmtList(gmtList, newGmtList, ...)

Arguments

`gmtList`	A `GmtList` object
`newGmtList`	Another `GmtList` object to be appended
`...`	Further `GmtList` object to be appended

Value

A new GmtList list, with all elements in the input appended in the given order

Examples

test_gmt_file<- system.file("extdata/test.gmt", package="BioQC")
testGmtList1 <- readGmt(test_gmt_file, namespace="test1")
testGmtList2 <- readGmt(test_gmt_file, namespace="test2")
testGmtList3 <- readGmt(test_gmt_file, namespace="test3")
testGmtAppended <- appendGmtList(testGmtList1, testGmtList2, testGmtList3)
test_gmt_file<- system.file("extdata/test.gmt", package="BioQC")
testGmtList1 <- readGmt(test_gmt_file, namespace="test1")
testGmtList2 <- readGmt(test_gmt_file, namespace="test2")
testGmtList3 <- readGmt(test_gmt_file, namespace="test3")
testGmtAppended <- appendGmtList(testGmtList1, testGmtList2, testGmtList3)

Convert a list of gene symbols into a gmtlist

Description

Convert a list of gene symbols into a gmtlist

Usage

as.GmtList(list, description = NULL, uniqGenes = TRUE, namespace = NULL)
as.GmtList(list, description = NULL, uniqGenes = TRUE, namespace = NULL)

Arguments

`list`	A named list with character vectors of genes. Names will become names of gene sets; character vectors will become genes
`description`	Character, description of gene-sets. The value will be expanded to the same length of the list.
`uniqGenes`	Logical, whether redundant genes should be made unique?
`namespace`	Character or `NULL`, namespace of the gene-set

Examples

testVec <- list(GeneSet1=c("AKT1", "AKT2"),
               GeneSet2=c("MAPK1", "MAPK3"),
               GeneSet3=NULL)
testVecGmtlist <- as.GmtList(testVec)

testVec <- list(GeneSet1=c("AKT1", "AKT2"),
               GeneSet2=c("MAPK1", "MAPK3"),
               GeneSet3=NULL)
testVecGmtlist <- as.GmtList(testVec)

An S4 class to hold a list of indices, with the possibility to specify the offset of the indices. IndexList and SignedIndexList extend this class

Description

An S4 class to hold a list of indices, with the possibility to specify the offset of the indices. IndexList and SignedIndexList extend this class

Slots

offset: An integer specifying the value of first element. Default 1
keepNA: Logical, whether NA is kept during construction
keepDup: Logical, whether duplicated values are kept during construction

Shannon entropy

Description

Shannon entropy

Usage

entropy(vector)
entropy(vector)

Arguments

vector

A vector of numbers, or characters. Discrete probability of each item is calculated and the Shannon entropy is returned.

Value

Shannon entropy

Shannon entropy can be used as measures of gene expression specificity, as well as measures of tissue diversity and specialization. See references below.

We use 2 as base for the entropy calculation, because in this base the unit of entropy is bit.

Author(s)

Jitao David Zhang <[email protected]>

References

Martinez and Reyes-Valdes (2008) Defining diversity, specialization, and gene specificity in transcriptomes through information theory. PNAS 105(28):9709–9714

Examples

myVec0 <- 1:9
entropy(myVec0) ## log2(9)
myVec1 <- rep(1, 9)
entropy(myVec1)

entropy(LETTERS)
entropy(rep(LETTERS, 5))
myVec0 <- 1:9
entropy(myVec0) ## log2(9)
myVec1 <- rep(1, 9)
entropy(myVec1)

entropy(LETTERS)
entropy(rep(LETTERS, 5))

Entropy-based sample diversity

Description

Entropy-based sample diversity

Usage

entropyDiversity(mat, norm = FALSE)
entropyDiversity(mat, norm = FALSE)

Arguments

`mat`	A matrix (usually an expression matrix), with genes (features) in rows and samples in columns.
`norm`	Logical, whether the diversity should be normalized by `log2(nrow(mat))`.

Value

A vector as long as the column number of the input matrix

References

Martinez and Reyes-Valdes (2008) Defining diversity, specialization, and gene specificity in transcriptomes through information theory. PNAS 105(28):9709–9714

Examples

myMat <- rbind(c(3,4,5),c(6,6,6), c(0,2,4))
entropyDiversity(myMat)
entropyDiversity(myMat, norm=TRUE)

myRandomMat <- matrix(runif(1000), ncol=20)
entropyDiversity(myRandomMat)
entropyDiversity(myRandomMat, norm=TRUE)
myMat <- rbind(c(3,4,5),c(6,6,6), c(0,2,4))
entropyDiversity(myMat)
entropyDiversity(myMat, norm=TRUE)

myRandomMat <- matrix(runif(1000), ncol=20)
entropyDiversity(myRandomMat)
entropyDiversity(myRandomMat, norm=TRUE)

Entropy-based gene-expression specificity

Description

Entropy-based gene-expression specificity

Usage

entropySpecificity(mat, norm = FALSE)
entropySpecificity(mat, norm = FALSE)

Arguments

`mat`	A matrix (usually an expression matrix), with genes (features) in rows and samples in columns.
`norm`	Logical, whether the specificity should be normalized by `log2(ncol(mat))`.

Value

A vector of the length of the row number of the input matrix, namely the specificity score of genes.

References

Martinez and Reyes-Valdes (2008) Defining diversity, specialization, and gene specificity in transcriptomes through information theory. PNAS 105(28):9709–9714

Examples

myMat <- rbind(c(3,4,5),c(6,6,6), c(0,2,4))
entropySpecificity(myMat)
entropySpecificity(myMat, norm=TRUE)

myRandomMat <- matrix(runif(1000), ncol=20)
entropySpecificity(myRandomMat)
entropySpecificity(myRandomMat, norm=TRUE)
myMat <- rbind(c(3,4,5),c(6,6,6), c(0,2,4))
entropySpecificity(myMat)
entropySpecificity(myMat, norm=TRUE)

myRandomMat <- matrix(runif(1000), ncol=20)
entropySpecificity(myRandomMat)
entropySpecificity(myRandomMat, norm=TRUE)

Filter a GmtList by size

Description

Filter a GmtList by size

Usage

filterBySize(x, min, max)
filterBySize(x, min, max)

Arguments

`x`	A `GmtList` object
`min`	Numeric, gene-sets with fewer genes than `min` will be removed
`max`	Numeric, gene-sets with more genes than `max` will be removed

Value

A GmtList object with sizes (count of genes) between min and max (inclusive).

Filter rows of p-value matrix under the significance threshold

Description

Filter rows of p-value matrix under the significance threshold

Usage

filterPmat(x, threshold)
filterPmat(x, threshold)

Arguments

`x`	A matrix of p-values. It must be raw p-values and should not be transformed (e.g. logarithmic).
`threshold`	A numeric value, the minimal p-value used to filter rows. If missing, given the values of `NA`, `NULL` or number `0`, no filtering will be done and the input matrix will be returned.

Value

Matrix of p-values. If no line is left, a empty matrix of the same dimension as input will be returned.

Author(s)

Jitao David Zhang <[email protected]>

Examples

set.seed(1235)
testMatrix <- matrix(runif(100,0,1), nrow=10)

## filtering
(testMatrix.filter <- filterPmat(testMatrix, threshold=0.05))
## more strict filtering
(testMatrix.strictfilter <- filterPmat(testMatrix, threshold=0.01))
## no filtering
(testMatrix.nofilter <- filterPmat(testMatrix))
set.seed(1235)
testMatrix <- matrix(runif(100,0,1), nrow=10)

## filtering
(testMatrix.filter <- filterPmat(testMatrix, threshold=0.05))
## more strict filtering
(testMatrix.strictfilter <- filterPmat(testMatrix, threshold=0.01))
## no filtering
(testMatrix.nofilter <- filterPmat(testMatrix))

Getting leading-edge indices from a vector

Description

Getting leading-edge indices from a vector

Usage

getLeadingEdgeIndexFromVector(
  x,
  index,
  comparison = c("greater", "less"),
  reference = c("background", "geneset")
)

getLeadingEdgeIndexFromMatrix(
  x,
  index,
  comparison = c("greater", "less"),
  reference = c("background", "geneset")
)
getLeadingEdgeIndexFromVector(
  x,
  index,
  comparison = c("greater", "less"),
  reference = c("background", "geneset")
)

getLeadingEdgeIndexFromMatrix(
  x,
  index,
  comparison = c("greater", "less"),
  reference = c("background", "geneset")
)

Arguments

`x`	A numeric vector (`getLeadingEdgeIndexFromVector`) or a numeric matrix (`getLeadingEdgeIndexFromMatrix`).
`index`	An integer vector, indicating the indices of genes in a gene-set.
`comparison`	Character string, are values greater than or less than the reference value considered as leading-edge? This depends on the type of value requested by the user in `wmwTest`.
`reference`	Character string, which reference is used? If `background`, genes with expression higher than the median of the background are reported. Otherwise in the case of `geneset`, genes with expression higher than the median of the gene-set is reported. Default is `background`, which is consistent with the results of the Wilcoxon-Mann-Whitney tests.

Value

An integer vector, indicating the indices of leading-edge genes.

Functions

getLeadingEdgeIndexFromMatrix: x is a matrix.

Examples

myProfile <- c(rnorm(5, 3), rnorm(15, -3), rnorm(100, 0))
getLeadingEdgeIndexFromVector(myProfile, 1:20)
getLeadingEdgeIndexFromVector(myProfile, 1:20, comparison="less")
getLeadingEdgeIndexFromVector(myProfile, 1:20, comparison="less", reference="geneset")
myProfile2 <- c(rnorm(15, 3), rnorm(5, -3), rnorm(100, 0))
myProfileMat <- cbind(myProfile, myProfile2)
getLeadingEdgeIndexFromMatrix(myProfileMat, 1:20)
getLeadingEdgeIndexFromMatrix(myProfileMat, 1:20, comparison="less")
getLeadingEdgeIndexFromMatrix(myProfileMat, 1:20, comparison="less", reference="geneset")
myProfile <- c(rnorm(5, 3), rnorm(15, -3), rnorm(100, 0))
getLeadingEdgeIndexFromVector(myProfile, 1:20)
getLeadingEdgeIndexFromVector(myProfile, 1:20, comparison="less")
getLeadingEdgeIndexFromVector(myProfile, 1:20, comparison="less", reference="geneset")
myProfile2 <- c(rnorm(15, 3), rnorm(5, -3), rnorm(100, 0))
myProfileMat <- cbind(myProfile, myProfile2)
getLeadingEdgeIndexFromMatrix(myProfileMat, 1:20)
getLeadingEdgeIndexFromMatrix(myProfileMat, 1:20, comparison="less")
getLeadingEdgeIndexFromMatrix(myProfileMat, 1:20, comparison="less", reference="geneset")

Calculate Gini Index of a numeric vector

Description

Calculate the Gini index of a numeric vector.

Usage

gini(x)
gini(x)

Arguments

`x`	A numeric vector.

Details

The Gini index (Gini coefficient) is a measure of statistical dispersion. A Gini coefficient of zero expresses perfect equality where all values are the same. A Gini coefficient of one expresses maximal inequality among values.

Value

A numeric value between 0 and 1.

Author(s)

Jitao David Zhang <[email protected]>

References

Gini. C. (1912) Variability and Mutability, C. Cuppini, Bologna 156 pages.

Examples

testValues <- runif(100)
gini(testValues)
testValues <- runif(100)
gini(testValues)

Convert a list to a GmtList object

Description

Convert a list to a GmtList object

Usage

GmtList(list)
GmtList(list)

Arguments

list

A list of genesets; each geneset is a list of at least three fields: 'name', 'desc', and 'genes'. 'name' and 'desc' contains one character string ('desc' can be NULL while 'name' cannot), and 'genes' can be either NULL or a character vector. In addition, 'namespace' is accepted to represent the namespace.

For convenience, the function also accepts a list of character vectors, each containing a geneset. In this case, the function works as a wrapper of as.GmtList

Examples

testList <- list(list(name="GS_A", desc=NULL, genes=LETTERS[1:3]),
                 list(name="GS_B", desc="gene set B", genes=LETTERS[1:5]),
                 list(name="GS_C", desc="gene set C", genes=NULL))
testGmt <- GmtList(testList)

# as wrapper of as.GmtList
testGeneList <- list(GS_A=LETTERS[1:3], GS_B=LETTERS[1:5], GS_C=NULL)
testGeneGmt <- GmtList(testGeneList)

testList <- list(list(name="GS_A", desc=NULL, genes=LETTERS[1:3]),
                 list(name="GS_B", desc="gene set B", genes=LETTERS[1:5]),
                 list(name="GS_C", desc="gene set C", genes=NULL))
testGmt <- GmtList(testList)

# as wrapper of as.GmtList
testGeneList <- list(GS_A=LETTERS[1:3], GS_B=LETTERS[1:5], GS_C=NULL)
testGeneGmt <- GmtList(testGeneList)

An S4 class to hold geneset in the GMT file in a list, each item in the list is in in turn a list containing following items: name, desc, and genes.

Description

An S4 class to hold geneset in the GMT file in a list, each item in the list is in in turn a list containing following items: name, desc, and genes.

Convert gmtlist into a list of signed genesets

Description

Convert gmtlist into a list of signed genesets

Usage

gmtlist2signedGenesets(
  gmtlist,
  posPattern = "_UP$",
  negPattern = "_DN$",
  nomatch = c("ignore", "pos", "neg")
)
gmtlist2signedGenesets(
  gmtlist,
  posPattern = "_UP$",
  negPattern = "_DN$",
  nomatch = c("ignore", "pos", "neg")
)

Arguments

`gmtlist`	A gmtlist object, probably read-in by `readGmt`
`posPattern`	Regular expression pattern of positive gene sets. It is trimmed from the original name to get the stem name of the gene set. See examples below.
`negPattern`	Regular expression pattern of negative gene sets. It is trimmed from the original name to get the stem name of the gene set. See examples below.
`nomatch`	Options to deal with gene sets that match neither positive nor negative patterns. ignore: they will be ignored (but not discarded, see details below); pos: they will be counted as positive signs; neg: they will be counted as negative signs

Value

An S4-object of SignedGenesets, which is a list of signed_geneset, each being a two-item list; the first item is 'pos', containing a character vector of positive genes; and the second item is 'neg', containing a character vector of negative genes.

Gene set names are detected whether they are positive or negative. If neither positive nor negative, nomatch will determine how will they be interpreted. In case of pos (or neg), such genesets will be treated as positive (or negative) gene sets.In case nomatch is set to ignore, the gene set will appear in the returned values with both positive and negative sets set to NULL.

Examples

testInputList <- list(list(name="GeneSetA_UP",genes=LETTERS[1:3]),
               list(name="GeneSetA_DN", genes=LETTERS[4:6]),
               list(name="GeneSetB", genes=LETTERS[2:4]),
               list(name="GeneSetC_DN", genes=LETTERS[1:3]),
               list(name="GeneSetD_UP", genes=LETTERS[1:3]))
testOutputList.ignore <- gmtlist2signedGenesets(testInputList, nomatch="ignore")
testOutputList.pos <- gmtlist2signedGenesets(testInputList, nomatch="pos")
testOutputList.neg <- gmtlist2signedGenesets(testInputList, nomatch="neg")
testInputList <- list(list(name="GeneSetA_UP",genes=LETTERS[1:3]),
               list(name="GeneSetA_DN", genes=LETTERS[4:6]),
               list(name="GeneSetB", genes=LETTERS[2:4]),
               list(name="GeneSetC_DN", genes=LETTERS[1:3]),
               list(name="GeneSetD_UP", genes=LETTERS[1:3]))
testOutputList.ignore <- gmtlist2signedGenesets(testInputList, nomatch="ignore")
testOutputList.pos <- gmtlist2signedGenesets(testInputList, nomatch="pos")
testOutputList.neg <- gmtlist2signedGenesets(testInputList, nomatch="neg")

Gene-set descriptions

Description

Gene-set descriptions

Usage

gsDesc(x)
gsDesc(x)

Arguments

`x`	A `GmtList` object

Value

Descriptions as a vector of character strings of the same length as x

Gene-set gene counts

Description

Gene-set gene counts

gsSize is the synonym of gsGeneCount

Usage

gsGeneCount(x, uniqGenes = TRUE)

gsSize(x, uniqGenes = TRUE)
gsGeneCount(x, uniqGenes = TRUE)

gsSize(x, uniqGenes = TRUE)

Arguments

`x`	A `GmtList` or similar object
`uniqGenes`	Logical, whether only unique genes are counted

Value

Gene counts (aka gene-set sizes) as a vector of integer of the same length as x

Gene-set member genes

Description

Gene-set member genes

Usage

gsGenes(x)
gsGenes(x)

Arguments

`x`	A `GmtList` object

Value

A list of genes as character strings of the same length as x

Gene-set names

Description

Gene-set names

Usage

gsName(x)
gsName(x)

Arguments

`x`	A `GmtList` object

Value

Names as a vector of character strings of the same length as x

Gene-set namespaces

Description

Gene-set namespaces

Usage

gsNamespace(x)
gsNamespace(x)

Arguments

`x`	A `GmtList` object

Value

Namespaces as a vector of character strings of the same length as x

gsNamespace<- is the synonym of setGsNamespace

Description

gsNamespace<- is the synonym of setGsNamespace

Usage

gsNamespace(x) <- value
gsNamespace(x) <- value

Arguments

`x`	A `GmtList` object
`value`	`namespace` in `setGsNamespace`. It can be either a function that applies to a `gene-set list` element of the object (for instance `function(x) x$desc` to extract description), or a vector of the same length of `x`, or in the special case `NULL`, which will erase the field namespace.

Whether namespace is set

Description

Whether namespace is set

Usage

hasNamespace(x)
hasNamespace(x)

Arguments

`x`	A `GmtList` object

Value

Logical, whether all gene-sets have the field namespace set

Convert a list to an IndexList object

Description

Convert a list to an IndexList object

Usage

IndexList(object, ..., keepNA = FALSE, keepDup = FALSE, offset = 1L)

## S4 method for signature 'numeric'
IndexList(object, ..., keepNA = FALSE, keepDup = FALSE, offset = 1L)

## S4 method for signature 'logical'
IndexList(object, ..., keepNA = FALSE, keepDup = FALSE, offset = 1L)

## S4 method for signature 'list'
IndexList(object, keepNA = FALSE, keepDup = FALSE, offset = 1L)
IndexList(object, ..., keepNA = FALSE, keepDup = FALSE, offset = 1L)

## S4 method for signature 'numeric'
IndexList(object, ..., keepNA = FALSE, keepDup = FALSE, offset = 1L)

## S4 method for signature 'logical'
IndexList(object, ..., keepNA = FALSE, keepDup = FALSE, offset = 1L)

## S4 method for signature 'list'
IndexList(object, keepNA = FALSE, keepDup = FALSE, offset = 1L)

Arguments

`object`	Either a list of unique integer indices, NULL and logical vectors (of same lengths), or a numerical vector or a logical vector. NA is discarded.
`...`	If `object` isn't a list, additional vectors can go here.
`keepNA`	Logical, whether NA indices should be kept or not. Default: FALSE (removed)
`keepDup`	Logical, whether duplicated indices should be kept or not. Default: FALSE (removed)
`offset`	Integer, the starting index. Default: 1 (as in the convention of R)

Value

The function returns a list of vectors

Examples

testList <- list(GS_A=c(1,2,3,4,3),
                 GS_B=c(2,3,4,5),
                 GS_C=NULL,
                 GS_D=c(1,3,5,NA),
                 GS_E=c(2,4))
testIndexList <- IndexList(testList)
IndexList(c(FALSE, TRUE, TRUE), c(FALSE, FALSE, TRUE), c(TRUE, FALSE, FALSE), offset=0)
IndexList(list(A=1:3, B=4:5, C=7:9))
IndexList(list(A=1:3, B=4:5, C=7:9), offset=0)
testList <- list(GS_A=c(1,2,3,4,3),
                 GS_B=c(2,3,4,5),
                 GS_C=NULL,
                 GS_D=c(1,3,5,NA),
                 GS_E=c(2,4))
testIndexList <- IndexList(testList)
IndexList(c(FALSE, TRUE, TRUE), c(FALSE, FALSE, TRUE), c(TRUE, FALSE, FALSE), offset=0)
IndexList(list(A=1:3, B=4:5, C=7:9))
IndexList(list(A=1:3, B=4:5, C=7:9), offset=0)

An S4 class to hold a list of integers as indices, with the possibility to specify the offset of the indices

Description

An S4 class to hold a list of integers as indices, with the possibility to specify the offset of the indices

Slots

offset: An integer specifying the value of first element. Default 1
keepNA: Logical, whether NA is kept during construction
keepDup: Logical, whether duplicated values are kept during construction

Function to validate a BaseIndexList object

Description

Function to validate a BaseIndexList object

Usage

isValidBaseIndexList(object)
isValidBaseIndexList(object)

Arguments

object

A BaseIndexList object Use setValidity("BaseIndexList", "isValidBaseIndexList") to check integrity of BaseIndexList objects. It can be very slow, therefore the feature is not turned on by default

Function to validate a GmtList object

Description

Function to validate a GmtList object

Usage

isValidGmtList(object)
isValidGmtList(object)

Arguments

object

A GmtList object Use setValidity("GmtList", "isValidGmtList") to check integrity of GmtList objects. It can be very slow, therefore the feature is not turned on by default

Function to validate an IndexList object

Description

Function to validate an IndexList object

Usage

isValidIndexList(object)
isValidIndexList(object)

Arguments

object

an IndexList object Use setValidity("BaseIndexList", "isValidBaseIndexList") to check integrity of IndexList objects. It can be very slow, therefore the feature is not turned on by default

Function to validate a SignedGenesets object

Description

Function to validate a SignedGenesets object

Usage

isValidSignedGenesets(object)
isValidSignedGenesets(object)

Arguments

object

A SignedGenesets object Use setValidity("SignedGenesets", "isValidSignedGenesets") to check integrity of SignedGenesets objects. It can be very slow, therefore the feature is not turned on by default

Function to validate a SignedIndexList object

Description

Function to validate a SignedIndexList object

Usage

isValidSignedIndexList(object)
isValidSignedIndexList(object)

Arguments

object

a SignedIndexList object Use setValidity("SignedIndexList", "isValidSignedIndexList") to check integrity of SignedIndexList objects. It can be very slow, therefore the feature is not turned on by default

Match genes in a list-like object to a vector of genesymbols

Description

Match genes in a list-like object to a vector of genesymbols

Usage

matchGenes(list, object, ...)

## S4 method for signature 'GmtList,character'
matchGenes(list, object)

## S4 method for signature 'GmtList,matrix'
matchGenes(list, object)

## S4 method for signature 'GmtList,eSet'
matchGenes(list, object, col = "GeneSymbol")

## S4 method for signature 'character,character'
matchGenes(list, object)

## S4 method for signature 'character,matrix'
matchGenes(list, object)

## S4 method for signature 'character,eSet'
matchGenes(list, object)

## S4 method for signature 'character,DGEList'
matchGenes(list, object, col = "GeneSymbol")

## S4 method for signature 'GmtList,DGEList'
matchGenes(list, object, col = "GeneSymbol")

## S4 method for signature 'SignedGenesets,character'
matchGenes(list, object)

## S4 method for signature 'SignedGenesets,matrix'
matchGenes(list, object)

## S4 method for signature 'SignedGenesets,eSet'
matchGenes(list, object, col = "GeneSymbol")

## S4 method for signature 'SignedGenesets,DGEList'
matchGenes(list, object, col = "GeneSymbol")
matchGenes(list, object, ...)

## S4 method for signature 'GmtList,character'
matchGenes(list, object)

## S4 method for signature 'GmtList,matrix'
matchGenes(list, object)

## S4 method for signature 'GmtList,eSet'
matchGenes(list, object, col = "GeneSymbol")

## S4 method for signature 'character,character'
matchGenes(list, object)

## S4 method for signature 'character,matrix'
matchGenes(list, object)

## S4 method for signature 'character,eSet'
matchGenes(list, object)

## S4 method for signature 'character,DGEList'
matchGenes(list, object, col = "GeneSymbol")

## S4 method for signature 'GmtList,DGEList'
matchGenes(list, object, col = "GeneSymbol")

## S4 method for signature 'SignedGenesets,character'
matchGenes(list, object)

## S4 method for signature 'SignedGenesets,matrix'
matchGenes(list, object)

## S4 method for signature 'SignedGenesets,eSet'
matchGenes(list, object, col = "GeneSymbol")

## S4 method for signature 'SignedGenesets,DGEList'
matchGenes(list, object, col = "GeneSymbol")

Arguments

`list`	A GmtList, list, character or SignedGenesets object
`object`	Gene symbols to be matched; they can come from a vector of character strings, or a column in the fData of an `eSet` object.
`...`	additional arguments like `col`
`col`	Column name of `fData` in an `eSet` object, or `genes` in an `DGEList` object, to specify where gene symbols are stored. The default value is set to "GeneSymbol"

Value

An IndexList object, which is essentially a list of the same length as input (length of 1 in case characters are used as input), with matching indices.

Examples

## test GmtList, character
testGenes <- sprintf("gene%d", 1:10)
testGeneSets <- GmtList(list(gs1=c("gene1", "gene2"), gs2=c("gene9", "gene10"), gs3=c("gene100")))
matchGenes(testGeneSets, testGenes)

## test GmtList, matrix
testGenes <- sprintf("gene%d", 1:10)
testGeneSets <- GmtList(list(gs1=c("gene1", "gene2"), gs2=c("gene9", "gene10"), gs3=c("gene100")))
testGeneExprs <- matrix(rnorm(100), nrow=10, dimnames=list(testGenes, sprintf("sample%d", 1:10)))
matchGenes(testGeneSets, testGeneExprs)

## test GmtList, eSet
testGenes <- sprintf("gene%d", 1:10)
testGeneSets <- GmtList(list(gs1=c("gene1", "gene2"), gs2=c("gene9", "gene10"), gs3=c("gene100")))
testGeneExprs <- matrix(rnorm(100), nrow=10, dimnames=list(testGenes, sprintf("sample%d", 1:10)))
testFeat <- data.frame(GeneSymbol=rownames(testGeneExprs), row.names=testGenes)
testPheno <- data.frame(SampleId=colnames(testGeneExprs), row.names=colnames(testGeneExprs))
testEset <- ExpressionSet(assayData=testGeneExprs,
    featureData=AnnotatedDataFrame(testFeat),
    phenoData=AnnotatedDataFrame(testPheno))
matchGenes(testGeneSets, testGeneExprs)
## force using row names
matchGenes(testGeneSets, testEset, col=NULL)

 ## test GmtList, DGEList
 if(requireNamespace("edgeR")) {
    mat <- matrix(rnbinom(100, mu=5, size=2), ncol=10)
    rownames(mat) <- sprintf("gene%d", 1:nrow(mat))
    y <- edgeR::DGEList(counts=mat, group=rep(1:2, each=5))

    ## if genes are not set, row names of the count matrix will be used for lookup
    myGeneSet <- GmtList(list(gs1=rownames(mat)[1:2], gs2=rownames(mat)[9:10], gs3="gene100"))
    matchGenes(myGeneSet, y)

    matchGenes(c("gene1", "gene2"), y)
    ## alternatively, use 'col' parameter to specify the column in 'genes'
    y2 <- edgeR::DGEList(counts=mat,
      group=rep(1:2, each=5),
      genes=data.frame(GeneIdentifier=rownames(mat), row.names=rownames(mat)))
    matchGenes(myGeneSet, y2, col="GeneIdentifier")
 }

## test character, character
matchGenes(c("gene1", "gene2"), testGenes)

## test character, matrix
matchGenes(c("gene1", "gene2"), testGeneExprs)

## test character, eset
matchGenes(c("gene1", "gene2"), testEset)
## test GmtList, character
testGenes <- sprintf("gene%d", 1:10)
testGeneSets <- GmtList(list(gs1=c("gene1", "gene2"), gs2=c("gene9", "gene10"), gs3=c("gene100")))
matchGenes(testGeneSets, testGenes)

## test GmtList, matrix
testGenes <- sprintf("gene%d", 1:10)
testGeneSets <- GmtList(list(gs1=c("gene1", "gene2"), gs2=c("gene9", "gene10"), gs3=c("gene100")))
testGeneExprs <- matrix(rnorm(100), nrow=10, dimnames=list(testGenes, sprintf("sample%d", 1:10)))
matchGenes(testGeneSets, testGeneExprs)

## test GmtList, eSet
testGenes <- sprintf("gene%d", 1:10)
testGeneSets <- GmtList(list(gs1=c("gene1", "gene2"), gs2=c("gene9", "gene10"), gs3=c("gene100")))
testGeneExprs <- matrix(rnorm(100), nrow=10, dimnames=list(testGenes, sprintf("sample%d", 1:10)))
testFeat <- data.frame(GeneSymbol=rownames(testGeneExprs), row.names=testGenes)
testPheno <- data.frame(SampleId=colnames(testGeneExprs), row.names=colnames(testGeneExprs))
testEset <- ExpressionSet(assayData=testGeneExprs,
    featureData=AnnotatedDataFrame(testFeat),
    phenoData=AnnotatedDataFrame(testPheno))
matchGenes(testGeneSets, testGeneExprs)
## force using row names
matchGenes(testGeneSets, testEset, col=NULL)

 ## test GmtList, DGEList
 if(requireNamespace("edgeR")) {
    mat <- matrix(rnbinom(100, mu=5, size=2), ncol=10)
    rownames(mat) <- sprintf("gene%d", 1:nrow(mat))
    y <- edgeR::DGEList(counts=mat, group=rep(1:2, each=5))

    ## if genes are not set, row names of the count matrix will be used for lookup
    myGeneSet <- GmtList(list(gs1=rownames(mat)[1:2], gs2=rownames(mat)[9:10], gs3="gene100"))
    matchGenes(myGeneSet, y)

    matchGenes(c("gene1", "gene2"), y)
    ## alternatively, use 'col' parameter to specify the column in 'genes'
    y2 <- edgeR::DGEList(counts=mat,
      group=rep(1:2, each=5),
      genes=data.frame(GeneIdentifier=rownames(mat), row.names=rownames(mat)))
    matchGenes(myGeneSet, y2, col="GeneIdentifier")
 }

## test character, character
matchGenes(c("gene1", "gene2"), testGenes)

## test character, matrix
matchGenes(c("gene1", "gene2"), testGeneExprs)

## test character, eset
matchGenes(c("gene1", "gene2"), testEset)

Get offset from an IndexList object

Description

Get offset from an IndexList object

Usage

offset(object)

## S4 method for signature 'BaseIndexList'
offset(object)
offset(object)

## S4 method for signature 'BaseIndexList'
offset(object)

Arguments

object

An IndexList object

Examples

myIndexList <- IndexList(list(1:5, 2:7, 3:8), offset=1L)
offset(myIndexList)
myIndexList <- IndexList(list(1:5, 2:7, 3:8), offset=1L)
offset(myIndexList)

Set the offset of an `IndexList` or a `SignedIndexList` object

Description

Set the offset of an IndexList or a SignedIndexList object

Usage

`offset<-`(object, value)

## S4 replacement method for signature 'IndexList,numeric'
offset(object) <- value

## S4 replacement method for signature 'SignedIndexList,numeric'
offset(object) <- value
`offset<-`(object, value)

## S4 replacement method for signature 'IndexList,numeric'
offset(object) <- value

## S4 replacement method for signature 'SignedIndexList,numeric'
offset(object) <- value

Arguments

`object`	An `IndexList` or a `SignedIndexList` object
`value`	The value, that the offset of `object` is set too. If it isn't an integer, it's coerced into an integer.

Examples

myIndexList <- IndexList(list(1:5, 2:7, 3:8), offset=1L)
offset(myIndexList)
offset(myIndexList) <- 3
offset(myIndexList)
myIndexList <- IndexList(list(1:5, 2:7, 3:8), offset=1L)
offset(myIndexList)
offset(myIndexList) <- 3
offset(myIndexList)

Prettify default signature names

Description

Prettify default signature names

Usage

prettySigNames(names, includeNamespace = TRUE)
prettySigNames(names, includeNamespace = TRUE)

Arguments

`names`	Character strings, signature names
`includeNamespace`	Logical, whether the namespace of the signatures should be included

Value

Character strings, pretty signature names

Examples

sig <- readCurrentSignatures()
prettyNames <- prettySigNames(names(sig))
sig <- readCurrentSignatures()
prettyNames <- prettySigNames(names(sig))

Load current BioQC signatures

Description

Load current BioQC signatures

Usage

readCurrentSignatures(uniqGenes = TRUE, namespace = NULL)
readCurrentSignatures(uniqGenes = TRUE, namespace = NULL)

Arguments

`uniqGenes`	Logical, whether duplicated genes should be removed, passed to `readGmt`.
`namespace`	Character, namespace of the gene-set, or codeNULL, passed to `readGmt`

Value

A GmtList

Examples

readCurrentSignatures()
readCurrentSignatures()

Read in gene-sets from a GMT file

Description

Read in gene-sets from a GMT file

Usage

readGmt(..., uniqGenes = TRUE, namespace = NULL)
readGmt(..., uniqGenes = TRUE, namespace = NULL)

Arguments

`...`	Named or unnamed characater string vector, giving file names of one or more GMT format files.
`uniqGenes`	Logical, whether duplicated genes should be removed
`namespace`	Character, namespace of the gene-set. It can be used to specify namespace or sources of the gene-sets. If `NULL` is given, so no namespace is used and all gene-sets are assumed to come from the same unspecified namespace. The option can be helpful when gene-sets from multiple namespaces are jointly used.

Value

A GmtList object, which is a S4-class wrapper of a list. Each element in the object is a list of (at least) three items:

gene-set name (field name), character string, accessible with gsName
gene-set description (field desc), character string, accessible with gsDesc
genes (field genes), a vector of character strings, , accessible with gsGenes
namespace (field namespace), accessible with gsNamespace

Note

Currently, when namespace is set as NULL, no namespace is used. This may change in the future, since we may use file base name as the default namespace.

Examples

gmt_file <- system.file("extdata/exp.tissuemark.affy.roche.symbols.gmt", package="BioQC")
gmt_list <- readGmt(gmt_file)
gmt_nonUniqGenes_list <- readGmt(gmt_file, uniqGenes=FALSE)
gmt_namespace_list <- readGmt(gmt_file, uniqGenes=FALSE, namespace="myNamespace")

## suppose we have two lists of gene-sets to read in
test_gmt_file <- system.file("extdata/test.gmt", package="BioQC")
gmt_twons_list <- readGmt(gmt_file, test_gmt_file, namespace=c("BioQC", "test"))
## alternatively
gmt_twons_list <- readGmt(BioQC=gmt_file, test=test_gmt_file)

gmt_file <- system.file("extdata/exp.tissuemark.affy.roche.symbols.gmt", package="BioQC")
gmt_list <- readGmt(gmt_file)
gmt_nonUniqGenes_list <- readGmt(gmt_file, uniqGenes=FALSE)
gmt_namespace_list <- readGmt(gmt_file, uniqGenes=FALSE, namespace="myNamespace")

## suppose we have two lists of gene-sets to read in
test_gmt_file <- system.file("extdata/test.gmt", package="BioQC")
gmt_twons_list <- readGmt(gmt_file, test_gmt_file, namespace=c("BioQC", "test"))
## alternatively
gmt_twons_list <- readGmt(BioQC=gmt_file, test=test_gmt_file)

Read signed GMT files

Description

Read signed GMT files

Usage

readSignedGmt(
  filename,
  posPattern = "_UP$",
  negPattern = "_DN$",
  nomatch = c("ignore", "pos", "neg"),
  uniqGenes = TRUE,
  namespace = NULL
)
readSignedGmt(
  filename,
  posPattern = "_UP$",
  negPattern = "_DN$",
  nomatch = c("ignore", "pos", "neg"),
  uniqGenes = TRUE,
  namespace = NULL
)

Arguments

`filename`	A gmt file
`posPattern`	Pattern of positive gene sets
`negPattern`	Pattern of negative gene sets
`nomatch`	options to deal with gene sets that match to neither posPattern nor negPattern patterns
`uniqGenes`	Logical, whether genes should be made unique
`namespace`	Character string or `NULL`, namespace of gene-sets

Examples

testGmtFile <- system.file("extdata/test.gmt", package="BioQC")
testSignedGenesets.ignore <- readSignedGmt(testGmtFile, nomatch="ignore")
testSignedGenesets.pos <- readSignedGmt(testGmtFile, nomatch="pos")
testSignedGenesets.neg <- readSignedGmt(testGmtFile, nomatch="neg")
testGmtFile <- system.file("extdata/test.gmt", package="BioQC")
testSignedGenesets.ignore <- readSignedGmt(testGmtFile, nomatch="ignore")
testSignedGenesets.pos <- readSignedGmt(testGmtFile, nomatch="pos")
testSignedGenesets.neg <- readSignedGmt(testGmtFile, nomatch="neg")

Entropy-based sample specialization

Description

Entropy-based sample specialization

Usage

sampleSpecialization(mat, norm = TRUE)
sampleSpecialization(mat, norm = TRUE)

Arguments

`mat`	A matrix (usually an expression matrix), with genes (features) in rows and samples in columns.
`norm`	Logical, whether the specialization should be normalized by `log2(ncol(mat))`.

Value

A vector as long as the column number of the input matrix

References

Martinez and Reyes-Valdes (2008) Defining diversity, specialization, and gene specificity in transcriptomes through information theory. PNAS 105(28):9709–9714

Examples

myMat <- rbind(c(3,4,5),c(6,6,6), c(0,2,4))
sampleSpecialization(myMat)
sampleSpecialization(myMat, norm=TRUE)

myRandomMat <- matrix(runif(1000), ncol=20)
sampleSpecialization(myRandomMat)
sampleSpecialization(myRandomMat, norm=TRUE)
myMat <- rbind(c(3,4,5),c(6,6,6), c(0,2,4))
sampleSpecialization(myMat)
sampleSpecialization(myMat, norm=TRUE)

myRandomMat <- matrix(runif(1000), ncol=20)
sampleSpecialization(myRandomMat)
sampleSpecialization(myRandomMat, norm=TRUE)

Set gene-set description as namespace

Description

Set gene-set description as namespace

Usage

setDescAsNamespace(x)
setDescAsNamespace(x)

Arguments

`x`	A `GmtList` object This function wrapps `setNamespace` to set gene-set description as namespace

Set the namespace field in each gene-set within a GmtList

Description

Set the namespace field in each gene-set within a GmtList

Usage

setNamespace(x, namespace)
setNamespace(x, namespace)

Arguments

x

A GmtList object encoding a list of gene-sets

namespace

It can be either a function that applies to a gene-set list element of the object (for instance function(x) x$desc to extract description), or a vector of the same length of x, or in the special case NULL, which will erase the field namespace.

Note that using vectors as namespace leads to poor performance when the input object has many gene-sets.

Examples

myGmtList <- GmtList(list(list(name="GeneSet1", desc="Namespace1", genes=LETTERS[1:3]),
  list(name="GeneSet2", desc="Namespace1", genes=rep(LETTERS[4:6],2)),
  list(name="GeneSet1", desc="Namespace1", genes=LETTERS[4:6]),
  list(name="GeneSet3", desc="Namespace2", genes=LETTERS[1:5])))
hasNamespace(myGmtList)
myGmtList2 <- setNamespace(myGmtList, namespace=function(x) x$desc)
gsNamespace(myGmtList2)
## the function can provide flexible ways to encode the gene-set namespace
myGmtList3 <- setNamespace(myGmtList, namespace=function(x) gsub("Namespace", "C", x$desc))
gsNamespace(myGmtList3)
## using vectors
myGmtList4 <- setNamespace(myGmtList, namespace=c("C1", "C1", "C1", "C2"))
gsNamespace(myGmtList4)
myGmtList2null <- setNamespace(myGmtList2, namespace=NULL)
hasNamespace(myGmtList2null)
myGmtList <- GmtList(list(list(name="GeneSet1", desc="Namespace1", genes=LETTERS[1:3]),
  list(name="GeneSet2", desc="Namespace1", genes=rep(LETTERS[4:6],2)),
  list(name="GeneSet1", desc="Namespace1", genes=LETTERS[4:6]),
  list(name="GeneSet3", desc="Namespace2", genes=LETTERS[1:5])))
hasNamespace(myGmtList)
myGmtList2 <- setNamespace(myGmtList, namespace=function(x) x$desc)
gsNamespace(myGmtList2)
## the function can provide flexible ways to encode the gene-set namespace
myGmtList3 <- setNamespace(myGmtList, namespace=function(x) gsub("Namespace", "C", x$desc))
gsNamespace(myGmtList3)
## using vectors
myGmtList4 <- setNamespace(myGmtList, namespace=c("C1", "C1", "C1", "C2"))
gsNamespace(myGmtList4)
myGmtList2null <- setNamespace(myGmtList2, namespace=NULL)
hasNamespace(myGmtList2null)

Show method for GmtList

Description

Show method for GmtList

Usage

## S4 method for signature 'GmtList'
show(object)
## S4 method for signature 'GmtList'
show(object)

Arguments

object

An object of the class GmtList

Show method for IndexList

Description

Show method for IndexList

Usage

## S4 method for signature 'IndexList'
show(object)
## S4 method for signature 'IndexList'
show(object)

Arguments

object

An object of the class IndexList

Show method for SignedGenesets

Description

Show method for SignedGenesets

Usage

## S4 method for signature 'SignedGenesets'
show(object)
## S4 method for signature 'SignedGenesets'
show(object)

Arguments

object

An object of the class SignedGenesets

Show method for SignedIndexList

Description

Show method for SignedIndexList

Usage

## S4 method for signature 'SignedIndexList'
show(object)
## S4 method for signature 'SignedIndexList'
show(object)

Arguments

object

An object of the class SignedIndexList

Convert a list to a SignedGenesets object

Description

Convert a list to a SignedGenesets object

Usage

SignedGenesets(list)
SignedGenesets(list)

Arguments

list

A list of genesets; each geneset is a list of at least three fields: 'name', 'pos', and 'neg'. 'name' contains one non-null character string, and both 'pos' and 'neg' can be either NULL or a character vector.

Examples

testList <- list(list(name="GS_A", pos=NULL, neg=LETTERS[1:3]),
                 list(name="GS_B", pos=LETTERS[1:5], neg=LETTERS[7:9]),
                 list(name="GS_C", pos=LETTERS[1:5], neg=NULL),
                 list(name="GS_D", pos=NULL, neg=NULL))
testSigndGS <- SignedGenesets(testList)

testList <- list(list(name="GS_A", pos=NULL, neg=LETTERS[1:3]),
                 list(name="GS_B", pos=LETTERS[1:5], neg=LETTERS[7:9]),
                 list(name="GS_C", pos=LETTERS[1:5], neg=NULL),
                 list(name="GS_D", pos=NULL, neg=NULL))
testSigndGS <- SignedGenesets(testList)

An S4 class to hold signed genesets, each item in the list is in in turn a list containing following items: name, pos, and neg.

Description

An S4 class to hold signed genesets, each item in the list is in in turn a list containing following items: name, pos, and neg.

Convert a list into a SignedIndexList

Description

Convert a list into a SignedIndexList

Usage

SignedIndexList(object, ...)

## S4 method for signature 'list'
SignedIndexList(object, keepNA = FALSE, keepDup = FALSE, offset = 1L)
SignedIndexList(object, ...)

## S4 method for signature 'list'
SignedIndexList(object, keepNA = FALSE, keepDup = FALSE, offset = 1L)

Arguments

`object`	A list of lists, each with two elements named 'pos' or 'neg', can be logical vectors or integer indices
`...`	additional arguments, currently ignored
`keepNA`	Logical, whether NA indices should be kept or not. Default: FALSE (removed)
`keepDup`	Logical, whether duplicated indices should be kept or not. Default: FALSE (removed)
`offset`	offset; 1 if missing

Value

A SignedIndexList, a list of lists, containing two vectors named 'positive' and 'negative', which contain the indices of genes that are either positively or negatively associated with a certain phenotype

Examples

myList <- list(a = list(pos = list(1, 2, 2, 4), neg = c(TRUE, FALSE, TRUE)), 
b = list(NA), c = list(pos = c(c(2, 3), c(1, 3))))
SignedIndexList(myList)

## a special case of input is a single list with two elements, \code{pos} and \code{neg}
SignedIndexList(myList[[1]])
myList <- list(a = list(pos = list(1, 2, 2, 4), neg = c(TRUE, FALSE, TRUE)), 
b = list(NA), c = list(pos = c(c(2, 3), c(1, 3))))
SignedIndexList(myList)

## a special case of input is a single list with two elements, \code{pos} and \code{neg}
SignedIndexList(myList[[1]])

An S4 class to hold a list of signed integers as indices, with the possibility to specify the offset of the indices

Description

An S4 class to hold a list of signed integers as indices, with the possibility to specify the offset of the indices

Slots

offset: An integer specifying the value of first element. Default 1
keepNA: Logical, whether NA is kept during construction
keepDup: Logical, whether duplicated values are kept during construction

Simplify matrix in case of single row/columns

Description

Simplify matrix in case of single row/columns

Usage

simplifyMatrix(matrix)
simplifyMatrix(matrix)

Arguments

matrix

A matrix of any dimension

If only one row/column is present, the dimension is dropped and a vector will be returned

Examples

testMatrix <- matrix(round(rnorm(9),2), nrow=3)
simplifyMatrix(testMatrix)
simplifyMatrix(testMatrix[1L,,drop=FALSE])
simplifyMatrix(testMatrix[,1L,drop=FALSE])
testMatrix <- matrix(round(rnorm(9),2), nrow=3)
simplifyMatrix(testMatrix)
simplifyMatrix(testMatrix[1L,,drop=FALSE])
simplifyMatrix(testMatrix[,1L,drop=FALSE])

Make names of gene-sets unique by namespace, and member genes of gene-sets unique

Description

Make names of gene-sets unique by namespace, and member genes of gene-sets unique

Usage

uniqGenesetsByNamespace(gmtList)
uniqGenesetsByNamespace(gmtList)

Arguments

gmtList

A GmtList object, probably from readGmt. The object must have namespaces defined by setNamespace.

The function make sure that

names of gene-sets within each namespace are unique, by merging gene-sets with duplicated names
genes within each gene-set are unique, by removing duplicated genes

Gene-sets with duplicated names and different desc are merged, desc are made unique, and in case of multiple values, concatenated (with | as the collapse character).

Value

A GmtList object, with unique gene-sets and unique gene lists. If not already present, a new item namespace is appended to each list element in the GmtList object, recording the namespace used to make gene-sets unique. The order of the returned GmtList object is given by the unique gene-set name of the input object.

Examples

myGmtList <- GmtList(list(list(name="GeneSet1", desc="Namespace1", genes=LETTERS[1:3]),
  list(name="GeneSet2", desc="Namespace1", genes=rep(LETTERS[4:6],2)),
  list(name="GeneSet1", desc="Namespace1", genes=LETTERS[4:6]),
  list(name="GeneSet3", desc="Namespace2", genes=LETTERS[1:5])))
 
print(myGmtList)
myGmtList <- setNamespace(myGmtList, namespace=function(x) x$desc)
myUniqGmtList <- uniqGenesetsByNamespace(myGmtList)
print(myUniqGmtList)

myGmtList <- GmtList(list(list(name="GeneSet1", desc="Namespace1", genes=LETTERS[1:3]),
  list(name="GeneSet2", desc="Namespace1", genes=rep(LETTERS[4:6],2)),
  list(name="GeneSet1", desc="Namespace1", genes=LETTERS[4:6]),
  list(name="GeneSet3", desc="Namespace2", genes=LETTERS[1:5])))
 
print(myGmtList)
myGmtList <- setNamespace(myGmtList, namespace=function(x) x$desc)
myUniqGmtList <- uniqGenesetsByNamespace(myGmtList)
print(myUniqGmtList)

prints the options of valTypes of wmwTest

Description

prints the options of valTypes of wmwTest

Usage

valTypes()
valTypes()

Identify BioQC leading-edge genes of one gene-set

Description

Identify BioQC leading-edge genes of one gene-set

Usage

wmwLeadingEdge(
  matrix,
  indexVector,
  valType = c("p.greater", "p.less", "p.two.sided", "U", "abs.log10p.greater",
    "log10p.less", "abs.log10p.two.sided", "Q", "r", "f", "U1", "U2"),
  thr = 0.05,
  reference = c("background", "geneset")
)
wmwLeadingEdge(
  matrix,
  indexVector,
  valType = c("p.greater", "p.less", "p.two.sided", "U", "abs.log10p.greater",
    "log10p.less", "abs.log10p.two.sided", "Q", "r", "f", "U1", "U2"),
  thr = 0.05,
  reference = c("background", "geneset")
)

Arguments

`matrix`	A numeric matrix
`indexVector`	An integer vector, giving indices of a gene-set of interest
`valType`	Value type, consistent with the types in `wmwTest`
`thr`	Threshold of the value, greater or less than which the gene-set is considered significantly enriched in one sample
`reference`	Character string, which reference is used? If `background`, genes with expression higher than the median of the background are reported. Otherwise in the case of `geneset`, genes with expression higher than the median of the gene-set is reported. Default is `background`, which is consistent with the results of the Wilcoxon-Mann-Whitney tests.

Value

A list of integer vectors.

BioQC leading-edge genes are defined as those features whose expression is higher than the median expression of the background in a sample. The function identifies leading-edge genes of a given dataset (specified by the index vector) in a number of samples (specified by the matrix, with genes/features in rows and samples in columns) in three steps. The function calls wmwTest to run BioQC and identify samples in which the gene-set is significantly enriched. The enrichment criteria is specified by valType and thr. Then the function identifies genes in the gene-set that have greater or less expresion than the median value of the reference in those samples showing significant enrichment. Finally, it reports either leading-edge genes in individual samples, or the intersection/union of leading-edge genes in multiple samples.

Examples

myProfile <- c(rnorm(5, 3), rnorm(15, -3), rnorm(100, 0))
myProfile2 <- c(rnorm(15, 3), rnorm(5, -3), rnorm(100, 0))
myProfile3 <- c(rnorm(10, 5), rnorm(10, 0), rnorm(100, 0))
myProfileMat <- cbind(myProfile, myProfile2, myProfile3)
wmwLeadingEdge(myProfileMat, 1:20, valType="p.greater")
wmwLeadingEdge(myProfileMat, 1:20, valType="log10p.less")
wmwLeadingEdge(myProfileMat, 1:20, valType="U", reference="geneset")
wmwLeadingEdge(myProfileMat, 1:20, valType="abs.log10p.greater")
myProfile <- c(rnorm(5, 3), rnorm(15, -3), rnorm(100, 0))
myProfile2 <- c(rnorm(15, 3), rnorm(5, -3), rnorm(100, 0))
myProfile3 <- c(rnorm(10, 5), rnorm(10, 0), rnorm(100, 0))
myProfileMat <- cbind(myProfile, myProfile2, myProfile3)
wmwLeadingEdge(myProfileMat, 1:20, valType="p.greater")
wmwLeadingEdge(myProfileMat, 1:20, valType="log10p.less")
wmwLeadingEdge(myProfileMat, 1:20, valType="U", reference="geneset")
wmwLeadingEdge(myProfileMat, 1:20, valType="abs.log10p.greater")

Wilcoxon-Mann-Whitney rank sum test for high-throughput expression profiling data

Description

wmwTest is a highly efficient Wilcoxon-Mann-Whitney rank sum test for high-dimensional data, such as gene expression profiling. For datasets with more than 100 features (genes), the function can be more than 1,000 times faster than its R implementations (wilcox.test in stats, or rankSumTestWithCorrelation in limma).

Usage

wmwTest(
  x,
  indexList,
  col = "GeneSymbol",
  valType = c("p.greater", "p.less", "p.two.sided", "U", "abs.log10p.greater",
    "log10p.less", "abs.log10p.two.sided", "Q", "r", "f", "U1", "U2"),
  simplify = TRUE
)

## S4 method for signature 'matrix,IndexList'
wmwTest(x, indexList, valType = "p.greater", simplify = TRUE)

## S4 method for signature 'numeric,IndexList'
wmwTest(x, indexList, valType = "p.greater", simplify = TRUE)

## S4 method for signature 'matrix,GmtList'
wmwTest(x, indexList, valType = "p.greater", simplify = TRUE)

## S4 method for signature 'eSet,GmtList'
wmwTest(
  x,
  indexList,
  col = "GeneSymbol",
  valType = "p.greater",
  simplify = TRUE
)

## S4 method for signature 'eSet,numeric'
wmwTest(
  x,
  indexList,
  col = "GeneSymbol",
  valType = "p.greater",
  simplify = TRUE
)

## S4 method for signature 'eSet,logical'
wmwTest(
  x,
  indexList,
  col = "GeneSymbol",
  valType = "p.greater",
  simplify = TRUE
)

## S4 method for signature 'eSet,list'
wmwTest(
  x,
  indexList,
  col = "GeneSymbol",
  valType = "p.greater",
  simplify = TRUE
)

## S4 method for signature 'ANY,numeric'
wmwTest(x, indexList, valType = "p.greater", simplify = TRUE)

## S4 method for signature 'ANY,logical'
wmwTest(x, indexList, valType = "p.greater", simplify = TRUE)

## S4 method for signature 'ANY,list'
wmwTest(x, indexList, valType = "p.greater", simplify = TRUE)

## S4 method for signature 'matrix,SignedIndexList'
wmwTest(x, indexList, valType, simplify = TRUE)

## S4 method for signature 'matrix,SignedGenesets'
wmwTest(x, indexList, valType, simplify = TRUE)

## S4 method for signature 'numeric,SignedIndexList'
wmwTest(x, indexList, valType, simplify = TRUE)

## S4 method for signature 'eSet,SignedIndexList'
wmwTest(x, indexList, valType, simplify = TRUE)

## S4 method for signature 'eSet,SignedGenesets'
wmwTest(
  x,
  indexList,
  col = "GeneSymbol",
  valType = c("p.greater", "p.less", "p.two.sided", "U", "abs.log10p.greater",
    "log10p.less", "abs.log10p.two.sided", "Q", "r", "f", "U1", "U2"),
  simplify = TRUE
)
wmwTest(
  x,
  indexList,
  col = "GeneSymbol",
  valType = c("p.greater", "p.less", "p.two.sided", "U", "abs.log10p.greater",
    "log10p.less", "abs.log10p.two.sided", "Q", "r", "f", "U1", "U2"),
  simplify = TRUE
)

## S4 method for signature 'matrix,IndexList'
wmwTest(x, indexList, valType = "p.greater", simplify = TRUE)

## S4 method for signature 'numeric,IndexList'
wmwTest(x, indexList, valType = "p.greater", simplify = TRUE)

## S4 method for signature 'matrix,GmtList'
wmwTest(x, indexList, valType = "p.greater", simplify = TRUE)

## S4 method for signature 'eSet,GmtList'
wmwTest(
  x,
  indexList,
  col = "GeneSymbol",
  valType = "p.greater",
  simplify = TRUE
)

## S4 method for signature 'eSet,numeric'
wmwTest(
  x,
  indexList,
  col = "GeneSymbol",
  valType = "p.greater",
  simplify = TRUE
)

## S4 method for signature 'eSet,logical'
wmwTest(
  x,
  indexList,
  col = "GeneSymbol",
  valType = "p.greater",
  simplify = TRUE
)

## S4 method for signature 'eSet,list'
wmwTest(
  x,
  indexList,
  col = "GeneSymbol",
  valType = "p.greater",
  simplify = TRUE
)

## S4 method for signature 'ANY,numeric'
wmwTest(x, indexList, valType = "p.greater", simplify = TRUE)

## S4 method for signature 'ANY,logical'
wmwTest(x, indexList, valType = "p.greater", simplify = TRUE)

## S4 method for signature 'ANY,list'
wmwTest(x, indexList, valType = "p.greater", simplify = TRUE)

## S4 method for signature 'matrix,SignedIndexList'
wmwTest(x, indexList, valType, simplify = TRUE)

## S4 method for signature 'matrix,SignedGenesets'
wmwTest(x, indexList, valType, simplify = TRUE)

## S4 method for signature 'numeric,SignedIndexList'
wmwTest(x, indexList, valType, simplify = TRUE)

## S4 method for signature 'eSet,SignedIndexList'
wmwTest(x, indexList, valType, simplify = TRUE)

## S4 method for signature 'eSet,SignedGenesets'
wmwTest(
  x,
  indexList,
  col = "GeneSymbol",
  valType = c("p.greater", "p.less", "p.two.sided", "U", "abs.log10p.greater",
    "log10p.less", "abs.log10p.two.sided", "Q", "r", "f", "U1", "U2"),
  simplify = TRUE
)

Arguments

`x`	A numeric matrix. All other data types (e.g. numeric vectors or `ExpressionSet` objects) are coerced into matrix.
`indexList`	A list of integer indices (starting from 1) indicating signature genes. Can be of length zero. Other data types (e.g. a list of numeric or logical vectors, or a numeric or logical vector) are coerced into such a list. See `details` below for a special case using GMT files.
`col`	a string sometimes used with a `eSet`
`valType`	The value type to be returned, allowed values include `p.greater`, `p.less`, `abs.log10p.greater` and `abs.log10p.less` (one-sided tests),`p.two.sided`, and `U` statistic (or more specifically, either `U1` or `U2`), and a few other variants. See details below.
`simplify`	Logical. If not, the returning value is in matrix format; if set to `TRUE`, the results are simplified into vectors when possible (default).

Details

The basic application of the function is to test the enrichment of gene sets in expression profiling data or differentially expressed data (the matrix with feature/gene in rows and samples in columns).

A special case is when x is an eSet object (e.g. ExpressionSet), and indexList is a list returned from readGmt function. In this case, the only requirement is that one column named GeneSymbol in the featureData contain gene symbols used in the GMT file. The same applies to signed Gmt files. See the example below.

Besides the conventional value types such as ‘p.greater’, ‘p.less’, ‘p.two.sided’ , and ‘U’ (the U-statistic), wmwTest (from version 0.99-1) provides further value types: abs.log10p.greater and log10p.less perform log10 transformation on respective p-values and give the transformed value a proper sign (positive for greater than, and negative for less than); abs.log10p.two.sided transforms two-sided p-values to non-negative values; and Q score reports absolute log10-transformation of p-value of the two-side variant, and gives a proper sign to it, depending on whether it is rather greater than (positive) or less than (negative).

From version 1.19.1, the rank-biserial correlation coefficient (‘r’) and the common language effect size (‘f’) are supported value types.

Before version 1.19.3, the ‘U’ statistic returned is in fact ‘U2’. From version 1.19.3, ‘U1’ is returned when ‘U’ is used, and users can specify additional parameter values ‘U1’ and ‘U2’. The sum of ‘U1’ and ‘U2’ is the product of the sizes of two vectors to be compared.

Value

A numeric matrix or vector containing the statistic.

Methods (by class)

x = matrix,indexList = IndexList: x is a matrix and indexList is a IndexList
x = numeric,indexList = IndexList: x is a numeric and indexList is a IndexList
x = matrix,indexList = GmtList: x is a matrix and indexList is a GmtList
x = eSet,indexList = GmtList: x is a eSet and indexList is a GmtList
x = eSet,indexList = numeric: x is a eSet and indexList is a numeric
x = eSet,indexList = logical: x is a eSet and indexList is a logical
x = eSet,indexList = list: x is a eSet and indexList is a list
x = ANY,indexList = numeric: x is ANY and indexList is a numeric
x = ANY,indexList = logical: x is ANY and indexList is a logical
x = ANY,indexList = list: x is ANY and indexList is a list
x = matrix,indexList = SignedIndexList: x is a matrix and indexList is a SignedIndexList
x = matrix,indexList = SignedGenesets: x is a eSet and indexList is a SignedIndexList
x = numeric,indexList = SignedIndexList: x is a numeric and indexList is a SignedIndexList
x = eSet,indexList = SignedIndexList: x is a eSet and indexList is a SignedIndexList
x = eSet,indexList = SignedGenesets: x is a eSet and indexList is a SignedIndexList

Note

The function has been optimized for expression profiling data. It avoids repetitive ranking of data as done by native R implementations and uses efficient C code to increase the performance and control memory use. Simulation studies using expression profiles of 22000 genes in 2000 samples and 200 gene sets suggested that the C implementation can be >1000 times faster than the R implementation. And it is possible to further accelerate by parallel calling the function with mclapply in the multicore package.

Author(s)

Jitao David Zhang <[email protected]>, with critical inputs from Jan Aettig and Iakov Davydov about U statistics.

References

Barry, W.T., Nobel, A.B., and Wright, F.A. (2008). A statistical framework for testing functional namespaces in microarray data. _Annals of Applied Statistics_ 2, 286-315.

Wu, D, and Smyth, GK (2012). Camera: a competitive gene set test accounting for inter-gene correlation. _Nucleic Acids Research_ 40(17):e133

Zar, JH (1999). _Biostatistical Analysis 4th Edition_. Prentice-Hall International, Upper Saddle River, New Jersey.

Examples

## R-native data structures
set.seed(1887)
rd <- rnorm(1000)
rl <- sample(c(TRUE, FALSE), 1000, replace=TRUE)
wmwTest(rd, rl, valType="p.two.sided")
wmwTest(rd, which(rl), valType="p.two.sided")
rd1 <- rd + ifelse(rl, 0.5, 0)
wmwTest(rd1, rl, valType="p.greater")
wmwTest(rd1, rl, valType="U")
rd2 <- rd - ifelse(rl, 0.2, 0)
wmwTest(rd2, rl, valType="p.greater")
wmwTest(rd2, rl, valType="p.two.sided")
wmwTest(rd2, rl, valType="p.less")
wmwTest(rd2, rl, valType="r")
wmwTest(rd2, rl, valType="f")

## matrix forms
rmat <- matrix(c(rd, rd1, rd2), ncol=3, byrow=FALSE)
wmwTest(rmat, rl, valType="p.two.sided")
wmwTest(rmat, rl, valType="p.greater")

wmwTest(rmat, which(rl), valType="p.two.sided")
wmwTest(rmat, which(rl), valType="p.greater")

## other valTypes
wmwTest(rmat, which(rl), valType="U")
wmwTest(rmat, which(rl), valType="abs.log10p.greater")
wmwTest(rmat, which(rl), valType="log10p.less")
wmwTest(rmat, which(rl), valType="abs.log10p.two.sided")
wmwTest(rmat, which(rl), valType="Q")
wmwTest(rmat, which(rl), valType="r")
wmwTest(rmat, which(rl), valType="f")

## using ExpressionSet
data(sample.ExpressionSet)
testSet <- sample.ExpressionSet
fData(testSet)$GeneSymbol <- paste("GENE_",1:nrow(testSet), sep="")
mySig1 <- sample(c(TRUE, FALSE), nrow(testSet), prob=c(0.25, 0.75), replace=TRUE)
wmwTest(testSet, which(mySig1), valType="p.greater")

## using integer
exprs(testSet)[,1L] <- exprs(testSet)[,1L] + ifelse(mySig1, 50, 0)
wmwTest(testSet, which(mySig1), valType="p.greater")

## using lists
mySig2 <- sample(c(TRUE, FALSE), nrow(testSet), prob=c(0.6, 0.4), replace=TRUE)
wmwTest(testSet, list(first=mySig1, second=mySig2))
## using GMT file
gmt_file <- system.file("extdata/exp.tissuemark.affy.roche.symbols.gmt", package="BioQC")
gmt_list <- readGmt(gmt_file)

gss <- sample(unlist(sapply(gmt_list, function(x) x$genes)), 1000)
eset<-new("ExpressionSet",
         exprs=matrix(rnorm(10000), nrow=1000L),
         phenoData=new("AnnotatedDataFrame", data.frame(Sample=LETTERS[1:10])),
         featureData=new("AnnotatedDataFrame",data.frame(GeneSymbol=gss)))
esetWmwRes <- wmwTest(eset ,gmt_list, valType="p.greater")
summary(esetWmwRes)

## using signed GMT file
signed_gmt_file <- system.file("extdata/test.gmt", package="BioQC")
signed_gmt <- readSignedGmt(signed_gmt_file)
esetSignedWmwRes <- wmwTest(eset, signed_gmt, valType="p.greater")

esetMat <- exprs(eset); rownames(esetMat) <- fData(eset)$GeneSymbol
esetSignedWmwRes2 <- wmwTest(esetMat, signed_gmt, valType="p.greater")
## R-native data structures
set.seed(1887)
rd <- rnorm(1000)
rl <- sample(c(TRUE, FALSE), 1000, replace=TRUE)
wmwTest(rd, rl, valType="p.two.sided")
wmwTest(rd, which(rl), valType="p.two.sided")
rd1 <- rd + ifelse(rl, 0.5, 0)
wmwTest(rd1, rl, valType="p.greater")
wmwTest(rd1, rl, valType="U")
rd2 <- rd - ifelse(rl, 0.2, 0)
wmwTest(rd2, rl, valType="p.greater")
wmwTest(rd2, rl, valType="p.two.sided")
wmwTest(rd2, rl, valType="p.less")
wmwTest(rd2, rl, valType="r")
wmwTest(rd2, rl, valType="f")

## matrix forms
rmat <- matrix(c(rd, rd1, rd2), ncol=3, byrow=FALSE)
wmwTest(rmat, rl, valType="p.two.sided")
wmwTest(rmat, rl, valType="p.greater")

wmwTest(rmat, which(rl), valType="p.two.sided")
wmwTest(rmat, which(rl), valType="p.greater")

## other valTypes
wmwTest(rmat, which(rl), valType="U")
wmwTest(rmat, which(rl), valType="abs.log10p.greater")
wmwTest(rmat, which(rl), valType="log10p.less")
wmwTest(rmat, which(rl), valType="abs.log10p.two.sided")
wmwTest(rmat, which(rl), valType="Q")
wmwTest(rmat, which(rl), valType="r")
wmwTest(rmat, which(rl), valType="f")

## using ExpressionSet
data(sample.ExpressionSet)
testSet <- sample.ExpressionSet
fData(testSet)$GeneSymbol <- paste("GENE_",1:nrow(testSet), sep="")
mySig1 <- sample(c(TRUE, FALSE), nrow(testSet), prob=c(0.25, 0.75), replace=TRUE)
wmwTest(testSet, which(mySig1), valType="p.greater")

## using integer
exprs(testSet)[,1L] <- exprs(testSet)[,1L] + ifelse(mySig1, 50, 0)
wmwTest(testSet, which(mySig1), valType="p.greater")

## using lists
mySig2 <- sample(c(TRUE, FALSE), nrow(testSet), prob=c(0.6, 0.4), replace=TRUE)
wmwTest(testSet, list(first=mySig1, second=mySig2))
## using GMT file
gmt_file <- system.file("extdata/exp.tissuemark.affy.roche.symbols.gmt", package="BioQC")
gmt_list <- readGmt(gmt_file)

gss <- sample(unlist(sapply(gmt_list, function(x) x$genes)), 1000)
eset<-new("ExpressionSet",
         exprs=matrix(rnorm(10000), nrow=1000L),
         phenoData=new("AnnotatedDataFrame", data.frame(Sample=LETTERS[1:10])),
         featureData=new("AnnotatedDataFrame",data.frame(GeneSymbol=gss)))
esetWmwRes <- wmwTest(eset ,gmt_list, valType="p.greater")
summary(esetWmwRes)

## using signed GMT file
signed_gmt_file <- system.file("extdata/test.gmt", package="BioQC")
signed_gmt <- readSignedGmt(signed_gmt_file)
esetSignedWmwRes <- wmwTest(eset, signed_gmt, valType="p.greater")

esetMat <- exprs(eset); rownames(esetMat) <- fData(eset)$GeneSymbol
esetSignedWmwRes2 <- wmwTest(esetMat, signed_gmt, valType="p.greater")

Wilcoxon-Mann-Whitney test in R

Description

Wilcoxon-Mann-Whitney test in R

Usage

wmwTestInR(x, sub, valType = c("p.greater", "p.less", "p.two.sided", "W"))
wmwTestInR(x, sub, valType = c("p.greater", "p.less", "p.two.sided", "W"))

Arguments

`x`	A numerical vector
`sub`	A logical vector or integer vector to subset `x`. Numbers in `sub` are compared with numbers out of `sub`
`valType`	Type of retured-value. Supported values: p.greater, p.less, p.two.sided, and W statistic (note it is different from the U statistic)

Examples

testNums <- 1:10
testSub <- rep_len(c(TRUE, FALSE), length.out=length(testNums))
wmwTestInR(testNums, testSub)
wmwTestInR(testNums, testSub, valType="p.two.sided")
wmwTestInR(testNums, testSub, valType="p.less")
wmwTestInR(testNums, testSub, valType="W")
testNums <- 1:10
testSub <- rep_len(c(TRUE, FALSE), length.out=length(testNums))
wmwTestInR(testNums, testSub)
wmwTestInR(testNums, testSub, valType="p.two.sided")
wmwTestInR(testNums, testSub, valType="p.less")
wmwTestInR(testNums, testSub, valType="W")

Package 'BioQC'

Help Index

Subsetting GmtList object into another GmtList object

Description

Usage

Arguments

Examples

Subsetting GmtList object to fetch one gene-set

Description

Usage

Arguments

Examples

Absolute base-10 logarithm of p-values

Description

Usage

Arguments

Details

Value

Author(s)

Examples

Append a GmtList object to another one

Description

Usage

Arguments

Value

Examples

Convert a list of gene symbols into a gmtlist

Description

Usage

Arguments

Examples

An S4 class to hold a list of indices, with the possibility to specify the offset of the indices. IndexList and SignedIndexList extend this class

Description

Slots

Shannon entropy

Description

Usage

Arguments

Value

Author(s)

References

Examples

Entropy-based sample diversity

Description

Usage

Arguments

Value

References

See Also

Examples

Entropy-based gene-expression specificity

Description

Usage

Arguments

Value

References

See Also

Examples

Filter a GmtList by size

Description

Usage

Arguments

Value

Filter rows of p-value matrix under the significance threshold

Description

Usage

Arguments

Value

Author(s)

Examples

Getting leading-edge indices from a vector

Description

Usage

Arguments

Value

Functions

See Also

Examples

Calculate Gini Index of a numeric vector

Description