Package 'CAFE' reference manual

Title:	Chromosmal Aberrations Finder in Expression data
Description:	Detection and visualizations of gross chromosomal aberrations using Affymetrix expression microarrays as input
Authors:	Sander Bollen
Maintainer:	Sander Bollen <[email protected]>
License:	GPL-3
Version:	1.43.0
Built:	2025-02-17 06:09:34 UTC
Source:	https://github.com/bioc/CAFE

Chromosomal Aberrations Finder in Expression data

Description

CAFE attempts to find chromosomal aberrations in microarray expression (mRNA) data. It contains several plotting functions to aid in visualizing these aberrations. It generally recapitulates the workflow described by Mayshar et al (see references), and implements several algorithms described by Friedrich et al (see references).

Details

Package:	CAFE
Type:	Package
Version:	0.6.9.5
Date:	2013-05-16
License:	GPLv3

Author(s)

Sander Bollen

References

Friedrich, F., Kempe, a, Liebscher, V., & Winkler, G. (2008). Complexity Penalized M-Estimation. Journal of Computational and Graphical Statistics, 17(1), 201-224. doi:10.1198/106186008X285591

Mayshar, Y., Ben-David, U., Lavon, N., Biancotti, J.-C., Yakir, B., Clark, A. T., Plath, K., et al. (2010). Identification and classification of chromosomal aberrations in human induced pluripotent stem cells. Cell stem cell, 7(4), 521-31. doi:10.1016/j.stem.2010.07.017

Examples

## Not run: 
setwd("/some/path/to/cel/files")
data <- ProcessCels() 
# process cel files
samples <- c(1,2) 
# select samples 1 and 2 to compare against the rest
chromosomeStats(data,chromNum="ALL",samples=samples) 
# check for chromosomal gains
chromosomeStats(data,chromNum="ALL",samples=samples,alternative="less") 
# check for chromosomal losses
bandStats(data,chromNum=1,samples=samples) 
# check for band gains in chr1
bandStats(data,chromNum=1,samples=samples,alternative="less") 
# check for band losses in chr1
rawPlot(data,chromNum=1,samples=samples,idiogram=TRUE) 
# plot raw data with an ideogram
slidPlot(data,chromNum=1,samples=samples,idiogram=TRUE,combine=TRUE,k=100) 
# moving average plot with ideogram
discontPlot(data,chromNum=1,samples=samples,idiogram=TRUE) 
# discontinuous plot with ideogram


## End(Not run)
## Not run: 
setwd("/some/path/to/cel/files")
data <- ProcessCels() 
# process cel files
samples <- c(1,2) 
# select samples 1 and 2 to compare against the rest
chromosomeStats(data,chromNum="ALL",samples=samples) 
# check for chromosomal gains
chromosomeStats(data,chromNum="ALL",samples=samples,alternative="less") 
# check for chromosomal losses
bandStats(data,chromNum=1,samples=samples) 
# check for band gains in chr1
bandStats(data,chromNum=1,samples=samples,alternative="less") 
# check for band losses in chr1
rawPlot(data,chromNum=1,samples=samples,idiogram=TRUE) 
# plot raw data with an ideogram
slidPlot(data,chromNum=1,samples=samples,idiogram=TRUE,combine=TRUE,k=100) 
# moving average plot with ideogram
discontPlot(data,chromNum=1,samples=samples,idiogram=TRUE) 
# discontinuous plot with ideogram


## End(Not run)

Find aberrations with chromosome arm resolution

Description

Calculate significant chromosomal arms with various statistical tests

Usage

armStats(datalist, chromNum=1, arm="q",
samples=NULL, select="cli",test="fisher",
bonferroni = TRUE, enrichment = "greater")
armStats(datalist, chromNum=1, arm="q",
samples=NULL, select="cli",test="fisher",
bonferroni = TRUE, enrichment = "greater")

Arguments

`datalist`	The CAFE datalist to be analyzed, i.e. the output of `ProcessCels`.
`chromNum`	The chromosome to be calculated. This can be `"ALL"` to calculate all chromosomes.
`arm`	Select which arm - `"q"` or `"p"` - to analyse
`samples`	A vector containing sample numbers to be analyzed
`select`	Signifies which type of sample selection prompt will be shown, if `samples=NULL`. Currently supported are `"cli"` for a command line interface and `"gui"` for a tcl/tk-based graphical user interface.
`test`	Signifies which statistical test to be used in the final calculation. Must be either `"fisher"` for an exact fisher test or `"chisqr"` for a chi square test.
`bonferroni`	If `bonferroni=TRUE`, will correct the p-values of the enrichment test with a bonferroni method.
`enrichment`	Test for over or underexpression. Can be set to `"greater"` or `"less"`.

Value

A named vector containing p-values.

Note

Technically speaking, the Fisher's exact test is better than the chi-square test; the Fisher's exact test gives an exact p-value, whereas the chi-square test only gives an approximation. However, the Fisher's exact test can get slow for large sample sizes, and the chi-square test becomes better with increasing sample size but does not slow down as much.

Author(s)

Sander Bollen

Examples

data("CAFE_data")
armStats(CAFE_data,chromNum="ALL",samples=c(1,3),arm="p")
data("CAFE_data")
armStats(CAFE_data,chromNum="ALL",samples=c(1,3),arm="p")

Find aberrations with cytoband resolution

Description

Calculate significant chromosome bands with various statistical tests

Usage

bandStats(datalist, chromNum=1, samples=NULL, select="cli", test="fisher",
bonferroni = TRUE, enrichment = "greater")
bandStats(datalist, chromNum=1, samples=NULL, select="cli", test="fisher",
bonferroni = TRUE, enrichment = "greater")

Arguments

`datalist`	The CAFE datalist to be analyzed, i.e. the output of `ProcessCels`.
`chromNum`	The chromosome to be calculated. This can be `"ALL"` to calculate all chromosomes.
`samples`	A vector containing sample numbers to be analyzed
`select`	Signifies which type of sample selection prompt will be shown, if `samples=NULL`. Currently supported are `"cli"` for a command line interface and `"gui"` for a tcl/tk-based graphical user interface.
`test`	Signifies which statistical test to be used in the final calculation. Must be either `"fisher"` for an exact fisher test or `"chisqr"` for a chi square test.
`bonferroni`	If `bonferroni=TRUE`, will correct the p-values of the enrichment test with a bonferroni method.
`enrichment`	Test for over or underexpression. Can be set to `"greater"` or `"less"`.

Value

A named vector containing p-values if testing a single chromosome. If chromNum="ALL", the output will be a two-column data frame, with cytoband names in the first column and p-values in the second column.

Note

Author(s)

Sander Bollen

Examples

data(CAFE_data)
bandStats(CAFE_data,chromNum=17,samples=c(1,3),test="fisher")
data(CAFE_data)
bandStats(CAFE_data,chromNum=17,samples=c(1,3),test="fisher")

CAFE data set

Description

Contains the dataset of GSE6561 and GSE10809 processed by ProcessCels

Usage

data("CAFE_data")data("CAFE_data")

Format

A list containing two lists

whole: A list containing a dataframe for each sample
over: A list containing a dataframe for each sample, but with only those probes that are deemed overexpressed

The dataframes inside the lists contain the following columns:

ID: Affymetrix probe IDs
Sym: Gene symbols
Value: Log2 transformed expression values
LogRel: Log2 transformed relative expression values (to the median)
Loc: Chromosomal locations
Chr: Chromosome identifiers

Source

GSE6561: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE6561

GSE10809: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE10809

Examples

data("CAFE_data")
data("CAFE_data")

Find aberrations with whole-chromosome resolution

Description

Calculate significant chromosomes with various statistical tests

Usage

chromosomeStats(datalist, chromNum=1, samples=NULL, select="cli", test="fisher",
bonferroni = TRUE, enrichment = "greater")
chromosomeStats(datalist, chromNum=1, samples=NULL, select="cli", test="fisher",
bonferroni = TRUE, enrichment = "greater")

Arguments

`datalist`	The CAFE datalist to be analyzed, i.e. the output of `ProcessCels`.
`chromNum`	The chromosome to be calculated. This can be `"ALL"` to calculate all chromosomes.
`samples`	A vector containing sample numbers to be analyzed
`select`	Signifies which type of sample selection prompt will be shown, if `samples=NULL`. Currently supported are `"cli"` for a command line interface and `"gui"` for a tcl/tk-based graphical user interface.
`test`	Signifies which statistical test to be used in the final calculation. Must be either `"fisher"` for an exact fisher test or `"chisqr"` for a chi square test.
`bonferroni`	If `bonferroni=TRUE`, will correct the p-values of the enrichment test with a bonferroni method.
`enrichment`	Test for over or underexpression. Can be set to `"greater"` or `"less"`.

Value

A named vector containing p-values.

Note

Author(s)

Sander Bollen

Examples

data("CAFE_data")
sam <- c(9,11)
chromosomeStats(CAFE_data,chromNum=17,samples=sam,test="fisher")
data("CAFE_data")
sam <- c(9,11)
chromosomeStats(CAFE_data,chromNum=17,samples=sam,test="fisher")

Subset data with a CLI

Description

Provides command line interface for subsetting input datasets

Usage

cliSubset(datalist,alternative)
cliSubset(datalist,alternative)

Arguments

`datalist`	the dataset to be subsetted
`alternative`	"greater" or "less"

Value

subset of input

Author(s)

Sander Bollen

Examples

## Not run: 
datalist <- data("CAFE_data")
sub <- cliSubset(datalist,alternative="greater")

## End(Not run)
## Not run: 
datalist <- data("CAFE_data")
sub <- cliSubset(datalist,alternative="greater")

## End(Not run)

Plot with discontinuous smoother

Description

Plots chromosome plots with a discontinuous smoother

Usage

discontPlot(datalist,samples=c(1,2),chromNum=1,gamma=300,idiogram=FALSE,
file="default")
discontPlot(datalist,samples=c(1,2),chromNum=1,gamma=300,idiogram=FALSE,
file="default")

Arguments

`datalist`	The CAFE datalist to be analyzed, i.e. the output of `ProcessCels`.
`samples`	A vector or sample numbers to be plotted
`chromNum`	the chromosome to be plotted
`gamma`	The `gamma` level can be roughly compared to the sliding window size in a normal continuous smoother. The `gamma` level determines how strict the algorithm functions; a higher level will correspond to fewer jumps. This can not be higher than the total number of probesets on the to-be-analyzed chromosome. Must be a positive integer.
`idiogram`	if `TRUE`, will overlay a chromosome idiogram over the chromosome plot
`file`	Specify a file name to store output png file

Value

Plot to file system; Returns a ggplot2 graph if chromNum!="ALL". When chromNum=="ALL", returns a list of ggplot2 graphs.

Author(s)

Sander Bollen

References

Friedrich, F., Kempe, a, Liebscher, V., & Winkler, G. (2008). Complexity Penalized M-Estimation. Journal of Computational and Graphical Statistics, 17(1), 201-224. doi:10.1198/106186008X285591

Examples

data("CAFE_data")
discontPlot(CAFE_data,samples=9,chromNum=17,gamma=300)
data("CAFE_data")
discontPlot(CAFE_data,samples=9,chromNum=17,gamma=300)

A discontinuous smoother

Description

Calculates discontinuous smoother

Usage

discontSmooth(y,gamma)
discontSmooth(y,gamma)

Arguments

`y`	input vector
`gamma`	The `gamma` level can be roughly compared to the sliding window size in a normal continuous smoother. The `gamma` level determines how strict the algorithm functions; a higher level will correspond to fewer jumps. This cannot be larger than `length(y)`. Must be a positive integer.

Details

Uses the potts filter algorithm described by Friedrich et al.

Value

Vector with same length as input y

Author(s)

Sander Bollen

References

Friedrich, F., Kempe, a, Liebscher, V., & Winkler, G. (2008). Complexity Penalized M-Estimation. Journal of Computational and Graphical Statistics, 17(1), 201-224. doi:10.1198/106186008X285591

Examples

#generate piecewise vector with gaussian noise
y <- 1:450
y[1:150] <- 2
y[151:300] <- 3
y[301:450] <- 1
y <- y + rnorm(450)

#calculate smoother
y_smooth <- discontSmooth(y,20)
#generate piecewise vector with gaussian noise
y <- 1:450
y[1:150] <- 2
y[151:300] <- 3
y[301:450] <- 1
y <- y + rnorm(450)

#calculate smoother
y_smooth <- discontSmooth(y,20)

Plot all chromosomes horizontally next to each other

Description

Plots all chromosomes in horizontal alignment next to each other, with optionally a moving average smoother applied to the data

Usage

 
facetPlot(datalist,samples=c(1,2),slid=FALSE,combine=FALSE,k=1,file="default")
facetPlot(datalist,samples=c(1,2),slid=FALSE,combine=FALSE,k=1,file="default")

Arguments

`datalist`	The CAFE datalist to be analyzed, i.e. the output of `ProcessCels`.
`samples`	A vector or sample numbers to be plotted
`slid`	If `TRUE`, use moving average smoother
`combine`	If `TRUE`, will plot the unaltered raw data in the background
`k`	The sliding window size. Must be a positive integer, smaller than the length of Affy IDs on the chromosome
`file`	Specify a file name to store output png file

Value

Plot to file system. Return a ggplot2 graph

Note

Makes heavy use of the ggplot2 package

Author(s)

Sander Bollen

References

H. Wickham. ggplot2: elegant graphics for data analysis. Springer New York, 2009.

Examples

data("CAFE_data")
facetPlot(CAFE_data,samples=9)
data("CAFE_data")
facetPlot(CAFE_data,samples=9)

Combines pvalues by using Fisher's method

Description

Combines pvalues by using Fisher's method

Usage

fisher.method(pvals)
fisher.method(pvals)

Arguments

pvals

Vector of p values

Value

Combined p value

Author(s)

Sander Bollen

Examples

pvals <- runif(20) #generate 20 pvals
fisher.method(pvals)
pvals <- runif(20) #generate 20 pvals
fisher.method(pvals)

Subset data with a GUI

Description

Provides graphical user interface for subsetting input datasets

Usage

guiSubset(datalist,alternative)
guiSubset(datalist,alternative)

Arguments

`datalist`	the dataset to be subsetted
`alternative`	"greater" or "less"

Value

Subset of input to variable guiSelectedSet in working directory

Author(s)

Sander Bollen

Examples

## Not run: 
data("CAFE_data")
guiSubset(CAFE_data,alternative="greater")

## End(Not run)
## Not run: 
data("CAFE_data")
guiSubset(CAFE_data,alternative="greater")

## End(Not run)

Processing CEL files

Description

Normalizes and computes relative expressions for all CEL files in work directory

Usage

ProcessCels(threshold.over=1.5,threshold.under=(2/3),remove_method=1,
local_file=NULL)
ProcessCels(threshold.over=1.5,threshold.under=(2/3),remove_method=1,
local_file=NULL)

Arguments

`threshold.over`	Determines the threshold, as a multiple of median value, where probes are considered overexpressed. Default is 1.5
`threshold.under`	Determines the threshold, as a fraction of median value, where probes are considered underexpressed. Default is 2/3
`remove_method`	Determines which method is used to remove multiple probesets that are annotated to map to the same gene. The default option, `1`, will keep 1 probeset with the following priority: 1): `nnn_at`; 2): `nnn_a_at`; 3): `nnn_s_at`; 4): `nnn_x_at`; 5): lowest `nnn` if multiple probes still exist If `remove_method=2`, probesets will only be removed if several probesets of the same gene map to the exact same location. In the case that many probesets map to the same location, one probeset will be retained according to the priority of option 1 above. If `remove_method=0`, no multiple probesets will be removed
`local_file`	Use a local - previously downloaded - UCSC file (e.g. http://hgdownload.soe.ucsc.edu/goldenPath/hg19/database/ affyU133Plus2.txt.gz) instead of directly retrieving the file instead.

Details

this function uses the RMA algorithm to normalize *.CEL files in work directory. It then computes relative expressions for every probe on every sample. Locations for probesets are downloaded from UCSC, as the standard BioConductor annotations do not map probeset location (they only map the location to the corresponding gene). Multiple probesets belonging to the same gene are removed as described above. The function then determines which probes are overexpressed and underexpressed relative to the median probeset values across all samples. Finally,the relative expressions are log2-transformed.

Value

list

`$whole`	named list, where each element is a data.frame corresponding to a *.CEL file - containing columns: 1): `"ID"` (Affy ID number); 2): `"Sym"` (gene Symbol); 3): `"Value"` (Expression values); 4): `"LogRel"` (Relative expressions); 5): `"Loc"` (Chromosomal locations); 6): `"Chr"` (Chromosome number); 7): `"Band"` (Cytoband); 8): `"Arm"` (Chromosomal arm)
`$over`	same as $whole, but contains only those probes which are deemed overexpressed
`$under`	same as $whole, but contains only those probes which are deemd underexpressed

Author(s)

Sander Bollen

Examples

## Not run: 
data <- ProcessCels()

## End(Not run)
## Not run: 
data <- ProcessCels()

## End(Not run)

Plot without any smoother

Description

Makes chromosome plot using raw data values

Usage

rawPlot(datalist,samples=c(1,2),chromNum=1,idiogram=FALSE,file="default")
rawPlot(datalist,samples=c(1,2),chromNum=1,idiogram=FALSE,file="default")

Arguments

`datalist`	The CAFE datalist to be analyzed, i.e. the output of `ProcessCels`.
`samples`	A vector or sample numbers to be plotted
`chromNum`	The chromosome to be analyzed
`idiogram`	If `TRUE`, will plot a chromosome idiogram over the plot
`file`	Specify a file name to store output png file

Value

Plot to file system; Returns a ggplot2 graph if chromNum!="ALL". When chromNum=="ALL", returns a list of ggplot2 graphs.

Author(s)

Sander Bollen

Examples

data("CAFE_data")
rawPlot(CAFE_data,samples=8,chromNum=17)
data("CAFE_data")
rawPlot(CAFE_data,samples=8,chromNum=17)

Plot with sliding average smoother

Description

Plots chromosome plots with a moving average smoother

Usage

slidPlot(datalist,samples=c(1,2),chromNum=1,combine=FALSE,k=1,idiogram=FALSE,file="default")
slidPlot(datalist,samples=c(1,2),chromNum=1,combine=FALSE,k=1,idiogram=FALSE,file="default")

Arguments

`datalist`	The CAFE datalist to be analyzed, i.e. the output of `ProcessCels`.
`samples`	A vector of sample numbers to be plotted
`chromNum`	The chromosome to be analyzed
`combine`	If `TRUE`, will plot the unaltered raw data in the background
`k`	The sliding window size. Must be a positive integer, smaller than the total number of probesets on the chromosome
`idiogram`	If `TRUE`, will plot a chromosome idiogram over the plot
`file`	Specify a file name to store output png fileS

Value

Plot to file system; Returns a ggplot2 graph if chromNum!="ALL". When chromNum=="ALL", returns a list of ggplot2 graphs.

Note

Makes heavy use of the ggplot2 package.

Author(s)

Sander Bollen

References

H. Wickham. ggplot2: elegant graphics for data analysis. Springer New York, 2009.

Examples

data("CAFE_data")
slidPlot(CAFE_data,samples=9,chromNum=17,k=50,combine=TRUE)
data("CAFE_data")
slidPlot(CAFE_data,samples=9,chromNum=17,k=50,combine=TRUE)

A moving average smoother

Description

Calculates moving average smoother

Usage

slidSmooth(x,k)
slidSmooth(x,k)

Arguments

`x`	input vector
`k`	The moving average window size. Must be an integer value greater than 0, and no larger than `length(y)`.

Value

Vector with same length as input y

Author(s)

Sander Bollen

Examples

#generate piecewise vector with gaussian noise
y <- 1:450
y[1:150] <- 2
y[151:300] <- 3
y[301:450] <- 1
y <- y + rnorm(450)

#calculate smoother
y_smooth <- slidSmooth(y,20)
#generate piecewise vector with gaussian noise
y <- 1:450
y[1:150] <- 2
y[151:300] <- 3
y[301:450] <- 1
y <- y + rnorm(450)

#calculate smoother
y_smooth <- slidSmooth(y,20)

Package 'CAFE'

Help Index

Chromosomal Aberrations Finder in Expression data

Description

Details

Author(s)

References

Examples

Find aberrations with chromosome arm resolution

Description

Usage

Arguments

Value

Note

Author(s)

See Also

Examples

Find aberrations with cytoband resolution

Description

Usage

Arguments

Value

Note

Author(s)

See Also

Examples

CAFE data set

Description

Usage

Format

Source

Examples

Find aberrations with whole-chromosome resolution

Description

Usage

Arguments

Value

Note

Author(s)

See Also

Examples

Subset data with a CLI

Description

Usage

Arguments

Value

Author(s)

See Also

Examples

Plot with discontinuous smoother

Description

Usage

Arguments

Value

Author(s)

References

See Also

Examples

A discontinuous smoother

Description

Usage

Arguments

Details

Value

Author(s)

References

Examples

Plot all chromosomes horizontally next to each other

Description

Usage

Arguments

Value

Note

Author(s)

References

See Also

Examples

Combines pvalues by using Fisher's method

Description

Usage