Title: | Chromosomal Aberrations Finder in Expression data |
---|---|
Description: | Detection and visualizations of gross chromosomal aberrations using Affymetrix expression microarrays as input |
Authors: | Sander Bollen |
Maintainer: | Sander Bollen <[email protected]> |
License: | GPL-3 |
Version: | 1.43.0 |
Built: | 2024-12-19 03:31:04 UTC |
Source: | https://github.com/bioc/CAFE |
CAFE attempts to find chromosomal aberrations in microarray expression (mRNA) data. It contains several plotting functions to aid in visualizing these aberrations. It generally recapitulates the workflow described by Mayshar et al (see references), and implements several algorithms described by Friedrich et al (see references).
Package: | CAFE |
Type: | Package |
Version: | 0.6.9.5 |
Date: | 2013-05-16 |
License: | GPLv3 |
Sander Bollen
Friedrich, F., Kempe, A., Liebscher, V., & Winkler, G. (2008). Complexity Penalized M-Estimation. Journal of Computational and Graphical Statistics, 17(1), 201-224. doi:10.1198/106186008X285591
Mayshar, Y., Ben-David, U., Lavon, N., Biancotti, J.-C., Yakir, B., Clark, A. T., Plath, K., et al. (2010). Identification and classification of chromosomal aberrations in human induced pluripotent stem cells. Cell Stem Cell, 7(4), 521-31. doi:10.1016/j.stem.2010.07.017
## Not run:
setwd("/some/path/to/cel/files")
data <- ProcessCels() # process cel files
samples <- c(1,2) # select samples 1 and 2 to compare against the rest
chromosomeStats(data,chromNum="ALL",samples=samples) # check for chromosomal gains
chromosomeStats(data,chromNum="ALL",samples=samples,enrichment="less") # check for chromosomal losses
bandStats(data,chromNum=1,samples=samples) # check for band gains in chr1
bandStats(data,chromNum=1,samples=samples,enrichment="less") # check for band losses in chr1
rawPlot(data,chromNum=1,samples=samples,idiogram=TRUE) # plot raw data with an ideogram
slidPlot(data,chromNum=1,samples=samples,idiogram=TRUE,combine=TRUE,k=100) # moving average plot with ideogram
discontPlot(data,chromNum=1,samples=samples,idiogram=TRUE) # discontinuous plot with ideogram
## End(Not run)
Calculate significant chromosomal arms with various statistical tests
armStats(datalist, chromNum=1, arm="q", samples=NULL, select="cli",test="fisher", bonferroni = TRUE, enrichment = "greater")
datalist |
The CAFE datalist to be analyzed, i.e. the output of
|
chromNum |
The chromosome to be calculated. This can be |
arm |
Select which arm - |
samples |
A vector containing sample numbers to be analyzed |
select |
Signifies which type of sample selection prompt will be shown, if
|
test |
Signifies which statistical test to be used in the final calculation. Must be
either |
bonferroni |
If |
enrichment |
Test for over or underexpression. Can be set to |
A named vector containing p-values.
Fisher's exact test gives an exact p-value, whereas the chi-square test gives only an approximation. However, Fisher's exact test can become slow for large sample sizes, while the chi-square approximation improves with increasing sample size and slows down far less.
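The trade-off described above can be seen with base R alone. The following is a minimal sketch on an invented 2x2 contingency table; the counts and the row and column labels are assumptions made for illustration, and this is not necessarily how armStats builds its table internally.

```r
# Hypothetical 2x2 table (counts invented for illustration):
# rows = probes on / off the region of interest,
# cols = probes called overexpressed / not overexpressed.
counts <- matrix(c(30, 70,
                   200, 1700),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(region = c("on", "off"),
                                 status = c("over", "not_over")))

# Exact one-sided test for enrichment of overexpressed probes in the region.
p_fisher <- fisher.test(counts, alternative = "greater")$p.value

# Chi-square approximation (two-sided; R applies the Yates continuity
# correction by default for 2x2 tables).
p_chisq <- chisq.test(counts)$p.value

print(c(fisher = p_fisher, chisq = p_chisq))
```

Note that the two calls are not directly comparable here: fisher.test is run one-sided, while chisq.test has no one-sided option.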
Sander Bollen
data("CAFE_data")
armStats(CAFE_data,chromNum="ALL",samples=c(1,3),arm="p")
Calculate significant chromosome bands with various statistical tests
bandStats(datalist, chromNum=1, samples=NULL, select="cli", test="fisher", bonferroni = TRUE, enrichment = "greater")
datalist |
The CAFE datalist to be analyzed, i.e. the output of
|
chromNum |
The chromosome to be calculated. This can be |
samples |
A vector containing sample numbers to be analyzed |
select |
Signifies which type of sample selection prompt will be shown, if
|
test |
Signifies which statistical test to be used in the final calculation. Must be
either |
bonferroni |
If |
enrichment |
Test for over or underexpression. Can be set to |
A named vector containing p-values if testing a single chromosome. If chromNum="ALL", the output will be a two-column data frame, with cytoband names in the first column and p-values in the second column.
Fisher's exact test gives an exact p-value, whereas the chi-square test gives only an approximation. However, Fisher's exact test can become slow for large sample sizes, while the chi-square approximation improves with increasing sample size and slows down far less.
Sander Bollen
data(CAFE_data)
bandStats(CAFE_data,chromNum=17,samples=c(1,3),test="fisher")
Contains the dataset of GSE6561 and GSE10809 processed by ProcessCels
data("CAFE_data")
A list containing two lists
whole
A list containing a dataframe for each sample
over
A list containing a dataframe for each sample, but with only those probes that are deemed overexpressed
The dataframes inside the lists contain the following columns:
ID
Affymetrix probe IDs
Sym
Gene symbols
Value
Log2 transformed expression values
LogRel
Log2 transformed relative expression values (to the median)
Loc
Chromosomal locations
Chr
Chromosome identifiers
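To make the layout above concrete, here is a small sketch that builds a toy object with the same shape and pulls out one chromosome. All probe IDs, gene symbols, and values are invented, the column types are assumptions, and the 1.5-fold cutoff used to fill the over list is only borrowed from the ProcessCels default for illustration.

```r
# One toy sample with the six documented columns (all values invented).
toy_sample <- data.frame(
  ID     = c("probe_1", "probe_2", "probe_3"),
  Sym    = c("GENE1", "GENE2", "GENE3"),
  Value  = c(7.2, 9.1, 5.4),        # log2 expression
  LogRel = c(0.1, 0.9, -0.3),       # log2 expression relative to the median
  Loc    = c(150000, 2300000, 480000),
  Chr    = c(1, 1, 17),
  stringsAsFactors = FALSE
)

# A list of two lists, each holding one data frame per sample.
toy_datalist <- list(
  whole = list(sampleA = toy_sample),
  over  = list(sampleA = toy_sample[toy_sample$LogRel > log2(1.5), ])
)

# Access pattern: pick a sub-list, then a sample, then filter rows.
whole_a <- toy_datalist$whole$sampleA
chr17 <- whole_a[whole_a$Chr == 17, ]
print(chr17)
```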
GSE6561: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE6561
GSE10809: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE10809
data("CAFE_data")
Calculate significant chromosomes with various statistical tests
chromosomeStats(datalist, chromNum=1, samples=NULL, select="cli", test="fisher", bonferroni = TRUE, enrichment = "greater")
datalist |
The CAFE datalist to be analyzed, i.e. the output of
|
chromNum |
The chromosome to be calculated. This can be |
samples |
A vector containing sample numbers to be analyzed |
select |
Signifies which type of sample selection prompt will be shown, if
|
test |
Signifies which statistical test to be used in the final calculation.
Must be either |
bonferroni |
If |
enrichment |
Test for over or underexpression. Can be set to |
A named vector containing p-values.
Fisher's exact test gives an exact p-value, whereas the chi-square test gives only an approximation. However, Fisher's exact test can become slow for large sample sizes, while the chi-square approximation improves with increasing sample size and slows down far less.
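When many chromosomes are tested at once, a multiple-testing correction matters, which is presumably what the bonferroni argument controls. As a rough illustration, and not CAFE's internal code, a Bonferroni adjustment of a named p-value vector can be sketched with base R's p.adjust; the p-values below are invented.

```r
# Invented per-chromosome p-values, shaped like the named vector
# that the documentation describes as the return value.
raw_p <- c(chr1 = 0.001, chr2 = 0.04, chr3 = 0.20, chr4 = 0.75)

# Bonferroni: multiply each p-value by the number of tests, capped at 1.
adj_p <- p.adjust(raw_p, method = "bonferroni")

print(adj_p)
# chr1 = 0.004, chr2 = 0.16, chr3 = 0.80, chr4 = 1.00
```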
Sander Bollen
data("CAFE_data")
sam <- c(9,11)
chromosomeStats(CAFE_data,chromNum=17,samples=sam,test="fisher")
Provides command line interface for subsetting input datasets
cliSubset(datalist,alternative)
datalist |
the dataset to be subsetted |
alternative |
"greater" or "less" |
subset of input
Sander Bollen
## Not run:
data("CAFE_data")
sub <- cliSubset(CAFE_data,alternative="greater")
## End(Not run)
Plots chromosome plots with a discontinuous smoother
discontPlot(datalist,samples=c(1,2),chromNum=1,gamma=300,idiogram=FALSE, file="default")
datalist |
The CAFE datalist to be analyzed, i.e. the output of
|
samples |
A vector of sample numbers to be plotted |
chromNum |
the chromosome to be plotted |
gamma |
The |
idiogram |
if |
file |
Specify a file name to store output png file |
Plot to file system. Returns a ggplot2 graph if chromNum!="ALL". When chromNum=="ALL", returns a list of ggplot2 graphs.
Sander Bollen
Friedrich, F., Kempe, A., Liebscher, V., & Winkler, G. (2008). Complexity Penalized M-Estimation. Journal of Computational and Graphical Statistics, 17(1), 201-224. doi:10.1198/106186008X285591
data("CAFE_data")
discontPlot(CAFE_data,samples=9,chromNum=17,gamma=300)
Calculates discontinuous smoother
discontSmooth(y,gamma)
y |
input vector |
gamma |
The |
Uses the Potts filter algorithm described by Friedrich et al.
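For readers unfamiliar with the Potts filter, the sketch below shows one common formulation: choose a piecewise-constant fit x that minimizes gamma * (number of jumps in x) + sum((y - x)^2), solved exactly by dynamic programming. The function name potts_sketch is hypothetical, and this is an independent illustration under that assumed objective, not the code behind discontSmooth.

```r
potts_sketch <- function(y, gamma) {
  n <- length(y)
  s1 <- c(0, cumsum(y))     # s1[i + 1] = sum(y[1:i])
  s2 <- c(0, cumsum(y^2))   # s2[i + 1] = sum(y[1:i]^2)
  best <- numeric(n + 1)    # best[r + 1] = optimal cost for y[1:r]
  best[1] <- -gamma         # cancels the penalty charged for the first segment
  last_start <- integer(n)  # start index of the final segment ending at r
  for (r in seq_len(n)) {
    l <- seq_len(r)                         # candidate segment starts
    seg_sum <- s1[r + 1] - s1[l]            # sum(y[l:r]) for every l
    dev <- (s2[r + 1] - s2[l]) - seg_sum^2 / (r - l + 1)  # squared error around the mean
    cost <- best[l] + gamma + dev
    k <- which.min(cost)
    best[r + 1] <- cost[k]
    last_start[r] <- l[k]
  }
  # Backtrack from the end, filling each segment with its mean.
  fit <- numeric(n)
  r <- n
  while (r > 0) {
    l <- last_start[r]
    fit[l:r] <- mean(y[l:r])
    r <- l - 1
  }
  fit
}

# Noisy three-level signal, similar in spirit to the documented example.
set.seed(1)
y <- c(rep(2, 150), rep(3, 150), rep(1, 150)) + rnorm(450)
y_fit <- potts_sketch(y, gamma = 20)
```

Larger values of gamma make jumps more expensive and therefore give fewer, longer segments. This sketch costs O(n^2) time, which is fine for a few thousand points.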
Vector with same length as input y
Sander Bollen
Friedrich, F., Kempe, A., Liebscher, V., & Winkler, G. (2008). Complexity Penalized M-Estimation. Journal of Computational and Graphical Statistics, 17(1), 201-224. doi:10.1198/106186008X285591
#generate piecewise vector with gaussian noise
y <- 1:450
y[1:150] <- 2
y[151:300] <- 3
y[301:450] <- 1
y <- y + rnorm(450)
#calculate smoother
y_smooth <- discontSmooth(y,20)
Plots all chromosomes in horizontal alignment next to each other, with optionally a moving average smoother applied to the data
facetPlot(datalist,samples=c(1,2),slid=FALSE,combine=FALSE,k=1,file="default")
datalist |
The CAFE datalist to be analyzed, i.e. the output of |
samples |
A vector of sample numbers to be plotted |
slid |
If |
combine |
If |
k |
The sliding window size. Must be a positive integer, smaller than the number of Affy IDs on the chromosome |
file |
Specify a file name to store output png file |
Plot to file system. Returns a ggplot2 graph.
Makes heavy use of the ggplot2 package
Sander Bollen
H. Wickham. ggplot2: elegant graphics for data analysis. Springer New York, 2009.
data("CAFE_data")
facetPlot(CAFE_data,samples=9)
Combines p-values using Fisher's method
fisher.method(pvals)
pvals |
Vector of p values |
Combined p value
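Fisher's method itself is short enough to write out. Under the null hypothesis, with k independent p-values, the statistic -2 * sum(log(p)) follows a chi-square distribution with 2k degrees of freedom. The helper name fisher_combine below is hypothetical; it is a sketch of the textbook method and may differ in detail from the package's fisher.method.

```r
fisher_combine <- function(pvals) {
  # Test statistic: -2 times the sum of the log p-values.
  stat <- -2 * sum(log(pvals))
  # Upper tail of a chi-square distribution with 2k degrees of freedom.
  pchisq(stat, df = 2 * length(pvals), lower.tail = FALSE)
}

# Two moderately small p-values combine into a smaller one.
print(fisher_combine(c(0.01, 0.01)))

# A single p-value is returned unchanged (up to floating point).
print(fisher_combine(0.05))
```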
Sander Bollen
pvals <- runif(20) #generate 20 pvals
fisher.method(pvals)
Provides graphical user interface for subsetting input datasets
guiSubset(datalist,alternative)
datalist |
the dataset to be subsetted |
alternative |
"greater" or "less" |
Subset of input to variable guiSelectedSet in working directory
Sander Bollen
## Not run:
data("CAFE_data")
guiSubset(CAFE_data,alternative="greater")
## End(Not run)
Normalizes and computes relative expressions for all CEL files in the working directory
ProcessCels(threshold.over=1.5,threshold.under=(2/3),remove_method=1, local_file=NULL)
threshold.over |
Determines the threshold, as a multiple of median value, where probes are considered overexpressed. Default is 1.5 |
threshold.under |
Determines the threshold, as a fraction of median value, where probes are considered underexpressed. Default is 2/3 |
remove_method |
Determines which method is used to remove multiple probesets that are
annotated to map to the same gene. The default option, If If |
local_file |
Use a local - previously downloaded - UCSC file (e.g. http://hgdownload.soe.ucsc.edu/goldenPath/hg19/database/affyU133Plus2.txt.gz) instead of retrieving the file directly. |
This function uses the RMA algorithm to normalize the *.CEL files in the working directory. It then computes relative expressions for every probe on every sample. Locations for probesets are downloaded from UCSC, as the standard Bioconductor annotations do not map probeset location (they only map the location to the corresponding gene). Multiple probesets belonging to the same gene are removed as described above. The function then determines which probes are overexpressed and underexpressed relative to the median probeset values across all samples. Finally, the relative expressions are log2-transformed.
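The relative-expression and thresholding steps described above can be sketched on a toy matrix. This assumes linear-scale expression values, with rows as probesets and columns as samples; the probe names, sample names, and values are invented, and the normalization and annotation steps are omitted. It illustrates the idea and is not a reproduction of ProcessCels.

```r
# Toy expression matrix on the linear scale (values invented).
expr <- matrix(c(100, 110, 200,
                  50,  48,  20,
                  80,  82,  79),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("probe_a", "probe_b", "probe_c"),
                               c("s1", "s2", "s3")))

# Median of each probeset across all samples.
probe_median <- apply(expr, 1, median)

# Relative expression: each row divided by its own median
# (R recycles the vector down the columns, so row i uses probe_median[i]).
rel <- expr / probe_median

# Thresholds mirroring the documented defaults.
over  <- rel > 1.5      # threshold.over
under <- rel < 2 / 3    # threshold.under

# Final log2 transform of the relative expressions.
log_rel <- log2(rel)

print(which(over, arr.ind = TRUE))
print(which(under, arr.ind = TRUE))
```

In this toy data only probe_a in sample s3 exceeds the overexpression threshold, and only probe_b in sample s3 falls below the underexpression threshold.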
list
$whole |
named list, where each element is a data.frame corresponding to
a *.CEL file - containing columns:
1): |
$over |
same as $whole, but contains only those probes which are deemed overexpressed |
$under |
same as $whole, but contains only those probes which are deemed underexpressed |
Sander Bollen
## Not run:
data <- ProcessCels()
## End(Not run)
Makes chromosome plot using raw data values
rawPlot(datalist,samples=c(1,2),chromNum=1,idiogram=FALSE,file="default")
datalist |
The CAFE datalist to be analyzed, i.e. the output of
|
samples |
A vector of sample numbers to be plotted |
chromNum |
The chromosome to be analyzed |
idiogram |
If |
file |
Specify a file name to store output png file |
Plot to file system. Returns a ggplot2 graph if chromNum!="ALL". When chromNum=="ALL", returns a list of ggplot2 graphs.
Sander Bollen
slidPlot
facetPlot
discontPlot
data("CAFE_data")
rawPlot(CAFE_data,samples=8,chromNum=17)
Plots chromosome plots with a moving average smoother
slidPlot(datalist,samples=c(1,2),chromNum=1,combine=FALSE,k=1,idiogram=FALSE,file="default")
datalist |
The CAFE datalist to be analyzed, i.e. the output of |
samples |
A vector of sample numbers to be plotted |
chromNum |
The chromosome to be analyzed |
combine |
If |
k |
The sliding window size. Must be a positive integer, smaller than the total number of probesets on the chromosome |
idiogram |
If |
file |
Specify a file name to store output png file |
Plot to file system. Returns a ggplot2 graph if chromNum!="ALL". When chromNum=="ALL", returns a list of ggplot2 graphs.
Makes heavy use of the ggplot2 package.
Sander Bollen
H. Wickham. ggplot2: elegant graphics for data analysis. Springer New York, 2009.
data("CAFE_data")
slidPlot(CAFE_data,samples=9,chromNum=17,k=50,combine=TRUE)
Calculates moving average smoother
slidSmooth(x,k)
x |
input vector |
k |
The moving average window size. Must be an integer value greater than 0,
and no larger than |
Vector with same length as input x
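A moving average with window size k can be sketched in base R with stats::filter. The function name moving_average is hypothetical, and the edge handling is an assumption: this sketch uses a centred window and leaves NA where the window does not fit, which may differ from what slidSmooth does at the ends of the vector.

```r
moving_average <- function(x, k) {
  # Window size must be a positive integer no larger than the input length.
  stopifnot(k >= 1, k <= length(x), k == round(k))
  # Equal weights over a centred window; positions without a full
  # window are returned as NA.
  as.numeric(stats::filter(x, rep(1 / k, k), sides = 2))
}

# Noisy three-level signal, similar in spirit to the documented example.
set.seed(1)
y <- c(rep(2, 150), rep(3, 150), rep(1, 150)) + rnorm(450)
y_ma <- moving_average(y, 21)

# The output has the same length as the input.
print(length(y_ma) == length(y))
```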
Sander Bollen
#generate piecewise vector with gaussian noise
y <- 1:450
y[1:150] <- 2
y[151:300] <- 3
y[301:450] <- 1
y <- y + rnorm(450)
#calculate smoother
y_smooth <- slidSmooth(y,20)