Package 'adSplit' reference manual

Package 'adSplit'

Title:	Annotation-Driven Clustering
Description:	This package implements clustering of microarray gene expression profiles according to functional annotations. For each term genes are annotated to, splits into two subclasses are computed and a significance of the supporting gene set is determined.
Authors:	Claudio Lottaz, Joern Toedling
Maintainer:	Claudio Lottaz <[email protected]>
License:	GPL (>= 2)
Version:	1.77.0
Built:	2025-03-29 05:19:22 UTC
Source:	https://github.com/bioc/adSplit

Title:

Annotation-Driven Clustering

Description:

This package implements clustering of microarray gene expression profiles according to functional annotations. For each term genes are annotated to, splits into two subclasses are computed and a significance of the supporting gene set is determined.

Authors:

Claudio Lottaz, Joern Toedling

Maintainer:

Claudio Lottaz <[email protected]>

License:

GPL (>= 2)

Version:

1.77.0

Built:

2025-03-29 05:19:22 UTC

Source:

https://github.com/bioc/adSplit

Help Index

Annotation-Driven Splits

Description

This function searches for annotation-driven splits of patients in microarray data. A split is a partitioning of patients into two groups. In order to do so it refers to GO terms and KEGG pathways. In addition, a significance measure can be computed by simulating a random distribution of scores. DLD-scores are used to judge the quality of a split.

Usage

adSplit(mydata, annotation.ids, chip.name, 
        min.probes = 20, max.probes = NULL, 
        B = NULL, min.group.size = 5, ngenes = 50, 
        ignore.genes = 5)
adSplit(mydata, annotation.ids, chip.name, 
        min.probes = 20, max.probes = NULL, 
        B = NULL, min.group.size = 5, ngenes = 50, 
        ignore.genes = 5)

Arguments

`mydata`	either an expression set as defined by the package `Biobase` or a matrix of expression levels (rows=genes, columns=samples).
`annotation.ids`	a vector of GO or KEGG identifiers in the form "GO:..." or "KEGG:..." respectively. The prefix "KEGG:" is removed from the KEGG-identifiers before accessing the chip's "...PATH2PROBES" hash.
`chip.name`	the name of the chip by which the expression set is measured. `adSplit` attempts to load a library of the same name and expects to find a hash called "<chip-name>GO2ALLPROBES" and one called "<chip-name>PATH2PROBES" there.
`min.probes`	annotation identifiers with fewer than this associated genes are skipped.
`max.probes`	annotation identifiers with more than this associated genes are skipped. The default is ten percent of the genes on the chip.
`B`	the number of random gene set samplings to be performed to compute empirical p-values.
`min.group.size`	filter criteria to avoid splits suggesting tiny groups. Splits where one of the two suggested groups are smaller than this number are removed from the split set.
`ngenes`	number of genes used to compute DLD scores.
`ignore.genes`	number of best scoring genes to be ignored when computing DLD scores.

Details

This function applies the same splitting procedure to all annotation identifiers provided. Firstly, the associated genes for one identifier are determined and extracted from the expression data. Then the diana2means function is applied to the restricted data and the different splits generated are collected into a single splitSet object.

As annotation identifiers vectors of identifiers of the KEGG:nnnnn and GO:nnnnnn are valid. In addition, the keywords "KEGG", "GO" and "all" are allowed, representing all terms in the corresponding ontology.

If B is set to a integer number this number of samplings are used to generate a null-distribution of DLD-scores. This distribution is used to compute empirical p-values for each split. If more than one valid split is found, multiple testing is corrected for by applying Benjamini-Hochbergs correction from the multtest package.

Value

Returns an object of class splitSet with the following list elements:

`cuts`	a matrix of split attributions. One row per annotation identifier (GO term or KEGG pathway for which a split has been generated. One column per object in the dataset.
`score`	one score per generated split.
`pvalue`	one empirical p-value per generated split, or `NULL`
`qvalue`	one q-value computed according Benjamini-Hochberg's correction for multiple testing per generated split, or `NULL`

Author(s)

Claudio Lottaz, Joern Toedling

Examples

 
# prepare data
library(golubEsets) 
data(Golub_Merge) 

# generate annotation-driven splits for apoptosis and signal transduction
x <- adSplit(Golub_Merge, "GO:0006915", "hu6800")
x <- adSplit(Golub_Merge, c("GO:0007165","GO:0006915"), "hu6800", max.probes=7000)

# generate a split for alanine, aspartate and glutamate metabolism including 
# an empirical p-value
x <- adSplit(Golub_Merge, "KEGG:00250", "hu6800", B=100)

# generate splits for all KEGG pathways.
x <- adSplit(Golub_Merge, "KEGG", "hu6800")
image(x)
# prepare data
library(golubEsets) 
data(Golub_Merge) 

# generate annotation-driven splits for apoptosis and signal transduction
x <- adSplit(Golub_Merge, "GO:0006915", "hu6800")
x <- adSplit(Golub_Merge, c("GO:0007165","GO:0006915"), "hu6800", max.probes=7000)

# generate a split for alanine, aspartate and glutamate metabolism including 
# an empirical p-value
x <- adSplit(Golub_Merge, "KEGG:00250", "hu6800", B=100)

# generate splits for all KEGG pathways.
x <- adSplit(Golub_Merge, "KEGG", "hu6800")
image(x)

2-Means with Hierarchical Initialization

Description

Split a set of data points into two coherent groups using the k-means algorithm. Instead of random initialization, divisive hierarchical clustering is used to determine initial groups and the corresponding centroids.

Usage

diana2means(mydata, mingroupsize = 5, 
            ngenes = 50, ignore.genes = 5, 
            return.cut = FALSE)
diana2means(mydata, mingroupsize = 5, 
            ngenes = 50, ignore.genes = 5, 
            return.cut = FALSE)

Arguments

`mydata`	either an expression set as defined by the package `Biobase` or a matrix of expression levels (rows=genes, columns=samples).
`mingroupsize`	report only splits where both groups are larger than this size.
`ngenes`	number of genes used to compute cluster quality DLD-score.
`ignore.genes`	number of best scoring genes to be ignored when computing DLD-scores.
`return.cut`	logical, whether to return the attributions of samples to groups.

Details

This function uses divisive hierarchical clustering (diana) to generate a first split of the data. Thereby, each column of the data matrix is considered to represent a data element. From the thus generated temptative groups, centroids are deduced and used to initialize the k-means clustering algorithm.

For the split optimized by k-means the DLD-score is determined using the ngenes and ignore.genes arguments.

Value

If the logical return.cut is set to FALSE (the default), a single number is representing the DLD-score for the generated split is returned. Otherwise an object of class split containing the following elements is returned:

`cut`	one number out of 0 and 1 per column in the original data, specifying the split attribution.
`score`	the DLD-score achieved by the split.

Author(s)

Joern Toedling, Claudio Lottaz

Examples

# get golub data
library(vsn)
library(golubEsets)
data(Golub_Merge)

# use 10% most variable genes
e <- exprs(Golub_Merge)
vars <- apply(e, 1, var)
e <- e[vars > quantile(vars,0.9),]

# use diana2means to get splits and scores
diana2means(e)
diana2means(e, return.cut=TRUE)
# get golub data
library(vsn)
library(golubEsets)
data(Golub_Merge)

# use 10% most variable genes
e <- exprs(Golub_Merge)
vars <- apply(e, 1, var)
e <- e[vars > quantile(vars,0.9),]

# use diana2means to get splits and scores
diana2means(e)
diana2means(e, return.cut=TRUE)

Draw sets of probe-sets

Description

This function draws a given number of probe-sets randomly, such that probe-sets referring to the same are either included or excluded as a whole.

Usage

drawRandomPS(nps, EID2PSenv, allEIDs)
drawRandomPS(nps, EID2PSenv, allEIDs)

Arguments

`nps`	number of probe-sets to be drawn.
`EID2PSenv`	a hash mapping EntrezGene to probe-set identifiers.
`allEIDs`	vector of all EntrezGene identifiers represented on a chip.

Value

A named vector of probe-set identifiers. The names correspond to the EntrezGene identifiers.

Author(s)

Claudio Lottaz

Examples

# draw ten random probe-sets from hu6800
library(hu6800.db)
EID2PSenv <- makeEID2PROBESenv(hu6800ENTREZID)
drawRandomPS(10, EID2PSenv, ls(EID2PSenv))
# draw ten random probe-sets from hu6800
library(hu6800.db)
EID2PSenv <- makeEID2PROBESenv(hu6800ENTREZID)
drawRandomPS(10, EID2PSenv, ls(EID2PSenv))

Examplar splitSet

Description

This is a data object precomputed by adSplit for illustration.

Usage

data(golubKEGGSplits)data(golubKEGGSplits)

Format

Annotation-driven split set holds 70 splits on 72 elements, scores range is: 3.382672 17.31385, empirical p-values range is: 0.005 0.955, q-value range is: 0.1633333 0.955.

Details

This object is generated by the following call:

golubKEGGSplits <- adSplit(golubNorm, "KEGG", "hu6800", B=1000)

where golubNorm is a normalized version of Golub_Merge from the golubEsets package.

Examples

data(golubKEGGSplits)
data(golubKEGGSplits)

Overview Histogram for splitSets

Description

Draws a histogram of empirical p-values and shows the corresponding q-values corrected for multiple testing.

Usage

## S3 method for class 'splitSet'
hist(x, main = "Distribution of p-Values", 
              xlab = "p-values", col = "grey", xlim = c(0, 1), ...)
## S3 method for class 'splitSet'
hist(x, main = "Distribution of p-Values", 
              xlab = "p-values", col = "grey", xlim = c(0, 1), ...)

Arguments

`x`	object of type `splitSet`. Should hold a considerable number of splits.
`main`	main title of the histogram.
`xlab`	legend for the x-axis.
`col`	color for the histogram bars.
`xlim`	limits for the x-axis (p-values).
`...`	further parameters passed on to the default `hist` function.

Details

This function draws a regular histogram of empirical p-values observed in the splitSet at hand. The corresponding q-values, corrected by the method suggested by Benjamini-Hochberg, are plotted into the same graph. The scale for the q-values is shown at the left hand side of the plot.

Author(s)

Claudio Lottaz

Examples

data(golubKEGGSplits)
hist(golubKEGGSplits, col="red")
data(golubKEGGSplits)
hist(golubKEGGSplits, col="red")

Illustrate Split Sets

Description

Draws an image of all splits, one per row, of a splitSet object. Each column corresponds to a patient.

Usage

## S3 method for class 'splitSet'
image(x, filter.fdr = 1, main = "", max.label.length = 50, 
               full.names = TRUE, xlab = NULL, sample.labels = FALSE, 
               col = c("yellow", "red"), invert = FALSE, 
               outfile = NULL, res = 72, pointsize = 7, ...)
## S3 method for class 'splitSet'
image(x, filter.fdr = 1, main = "", max.label.length = 50, 
               full.names = TRUE, xlab = NULL, sample.labels = FALSE, 
               col = c("yellow", "red"), invert = FALSE, 
               outfile = NULL, res = 72, pointsize = 7, ...)

Arguments

`x`	the object of class `splitSet` to be illustrated.
`filter.fdr`	worst acceptable false discovery rate for the shown set of splits. All splits with q-values below this level are dropped from the image.
`main`	a title for the image.
`max.label.length`	Maximal length of the annotations shown to the right of the image. Longer annotations are truncated.
`full.names`	Show full names for annotations instead of their identifiers only.
`xlab`	additional annotation on the x-axis.
`sample.labels`	whether names of samples are to be shown on the x-axis.
`col`	two strings encoding the colors to be used to illustrate to which group a sample is attributed.
`invert`	whether to draw in white on black background.
`outfile`	the filename on which to draw the image in postscript format. The default is `NULL`, meaning to produce the image interactively.
`res`	resolution for bitmap output on postscript.
`pointsize`	size of font.
`...`	further arguments passed to `image`.

Details

The set of splits given is illustrated as an image. Each row corresponds to an annotation, each column to a patient. In position (x,y), the association of patient x to a group with respect to annotation y is coded as colors (yellow and red by default). The image is ordered by hierarchical clustering such that similar patients and similar splits are brought closer together.

Value

Always returns NULL.

Author(s)

Claudio Lottaz

Examples

data(golubKEGGSplits)
image(golubKEGGSplits, filter.fdr=0.5)
data(golubKEGGSplits)
image(golubKEGGSplits, filter.fdr=0.5)

Generate EID2PROBES environment

Description

Make hash containing probe-sets per EntrezGene identifier.

Usage

makeEID2PROBESenv(EIDenv)
makeEID2PROBESenv(EIDenv)

Arguments

EIDenv

an environment containing one entry per probe-set holding all corresponding EntrezGene identifiers.

Value

An environment containing one entry per EntrezGene identifier holding all corresponding probe-sets.

Author(s)

Joern Toedling, Claudio Lottaz

Examples

 
library(hu6800.db)
makeEID2PROBESenv(hu6800ENTREZID) 
library(hu6800.db)
makeEID2PROBESenv(hu6800ENTREZID)

Generate null-distributions of DLD-scores

Description

Draws a number of random sets of probe-sets consisting of the needed size and applies diana2means to compute DLD scores.

Usage

randomDiana2means(nprobes, data, chip, ndraws = 10000, 
                  ngenes = 50, ignore.genes = 5)
randomDiana2means(nprobes, data, chip, ndraws = 10000, 
                  ngenes = 50, ignore.genes = 5)

Arguments

`nprobes`	the size of gene sets.
`data`	a matrix of expression data, rows correspond to genes, columns to samples.
`chip`	the name of the used chip.
`ndraws`	the number of DLD scores computed.
`ngenes`	the number of genes used to compute DLD scores (passed to `diana2means`).
`ignore.genes`	the number of best scoring genes to be ignored when computing DLD scores (passed to `diana2means`)

Details

This function uses drawRandomPS to draw ndraws gene sets. On these it applies diana2means to determine a null-distribution of DLD-scores.

Value

A vector of DLD-scores.

Author(s)

Joern Toedling, Claudio Lottaz

Examples

# prepare data
library(vsn) 
library(golubEsets) 
data(Golub_Merge) 

# generate DLD scores
scores <- randomDiana2means(20, exprs(Golub_Merge), "hu6800", ndraws = 500) 
# prepare data
library(vsn) 
library(golubEsets) 
data(Golub_Merge) 

# generate DLD scores
scores <- randomDiana2means(20, exprs(Golub_Merge), "hu6800", ndraws = 500)

Package 'adSplit'

Help Index

Annotation-Driven Splits

Description

Usage

Arguments

Details

Value

Author(s)

See Also

Examples

2-Means with Hierarchical Initialization

Description

Usage

Arguments

Details

Value

Author(s)

See Also

Examples

Draw sets of probe-sets

Description

Usage

Arguments

Value

Author(s)

Examples

Examplar splitSet

Description

Usage

Format

Details

Examples

Overview Histogram for splitSets

Description

Usage

Arguments

Details

Author(s)

See Also

Examples

Illustrate Split Sets

Description

Usage

Arguments

Details

Value

Author(s)

See Also

Examples

Generate EID2PROBES environment

Description

Usage

Arguments

Value

Author(s)

Examples

Generate null-distributions of DLD-scores

Description

Usage

Arguments

Details

Value

Author(s)

See Also

Examples