Title: | Calling aberrations for array CGH tumor profiles. |
---|---|
Description: | Calls aberrations for array CGH data using a six state mixture model as well as several biological concepts that are ignored by existing algorithms. Visualization of profiles is also provided. |
Authors: | Mark van de Wiel, Sjoerd Vosse |
Maintainer: | Mark van de Wiel <[email protected]> |
License: | GPL (http://www.gnu.org/copyleft/gpl.html) |
Version: | 2.69.0 |
Built: | 2024-11-18 03:08:11 UTC |
Source: | https://github.com/bioc/CGHcall |
Package: | CGHcall |
Type: | Package |
Version: | 2.34.1 |
Date: | 2016-09-12 |
License: | GPL |
Sjoerd Vosse and Mark van de Wiel
Maintainer: Mark van de Wiel <[email protected]>
Mark A. van de Wiel, Kyung In Kim, Sjoerd J. Vosse, Wessel N. van Wieringen, Saskia M. Wilting and Bauke Ylstra. CGHcall: calling aberrations for array CGH tumor profiles. Bioinformatics, 23, 892-894.
Calls aberrations for array CGH data using a six state mixture model.
CGHcall(inputSegmented, prior = "auto", nclass = 5, organism = "human", cellularity=1, robustsig="yes", nsegfit=3000, maxnumseg=100, minlsforfit=0.5, build="GRCh37",ncpus=1)
inputSegmented |
An object of class |
prior |
Options are |
nclass |
The number of levels to be used for calling. Either |
organism |
Either |
cellularity |
A vector of cellularities ranging from 0 to 1 to define the contamination of your sample with healthy cells (1 = no contamination). See details for more information. |
robustsig |
Options are |
nsegfit |
Maximum number of segments used for fitting the mixture model. Posterior probabilities are computed for all segments |
maxnumseg |
Maximum number of segments per profile used for fitting the model |
minlsforfit |
Minimum length of the segment (in Mb) to be used for fitting the model |
build |
Build of the human genome. Either |
ncpus |
Number of CPUs used for parallel calling. Has a large effect on computing time. |
Please read the article and the supplementary information for detailed information on the algorithm.
The parameter prior states how the data are used to determine the prior probabilities. When prior = "all", the probabilities are determined using the entire genome of each sample. When prior = "not all", the probabilities are determined per chromosome for each sample when organism = "other", or per chromosome arm when organism = "human". The chromosome arm information is taken from the March 2006 version of the UCSC database. When prior = "auto", the way the probabilities are determined depends on the sample size: the entire genome is used when the sample size is smaller than 20; otherwise chromosome (arm) information is used.
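The decision rule for prior can be written out as a small function. This is a hypothetical helper, not part of CGHcall; it only restates the rules above (whole genome, per chromosome, or per chromosome arm) so they are easy to check for a given sample size and organism.

```r
# Hypothetical helper (not a CGHcall function): returns the scope over which
# prior probabilities are estimated, following the documented rules.
resolve_prior_scope <- function(prior = c("auto", "all", "not all"),
                                n_samples,
                                organism = c("human", "other")) {
  prior <- match.arg(prior)
  organism <- match.arg(organism)
  if (prior == "auto") {
    # Entire genome for fewer than 20 samples, otherwise chromosome (arm)
    prior <- if (n_samples < 20) "all" else "not all"
  }
  if (prior == "all") {
    return("genome")
  }
  # "not all": per arm for human data, per chromosome otherwise
  if (organism == "human") "chromosome arm" else "chromosome"
}

resolve_prior_scope("auto", n_samples = 5, organism = "human")     # "genome"
resolve_prior_scope("auto", n_samples = 40, organism = "human")    # "chromosome arm"
resolve_prior_scope("not all", n_samples = 5, organism = "other")  # "chromosome"
```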
Please note that CGHcall uses information from all input data to determine the aberration probabilities. When, for example, triploid or tetraploid tumors are observed, we advise running CGHcall separately on those (groups of) samples. Note that robustsig = "yes" forces the standard deviation of the normal segments to be at least half the pooled gain/loss standard deviation. Using nsegfit significantly lowers computing time compared with previous CGHcall versions, without much loss of accuracy. Moreover, maxnumseg decreases the impact of profiles with inferior segmentation results on the fit. Finally, minlsforfit decreases the impact of very small aberrations (potentially CNVs rather than CNAs) on the fit of the model. Note that a result is always produced for all segments. IN MOST CASES, CGHcall SHOULD BE FOLLOWED BY FUNCTION ExpandCGHcall.
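Two of these options are simple enough to sketch in base R. The helpers below are hypothetical and only illustrate the stated rules: the robustsig floor on the normal-segment SD, and excluding segments shorter than minlsforfit Mb from the fit. We assume the pooled gain/loss SD is available as a single number and that segment length is measured as end minus start position in bp; neither detail is specified above.

```r
# Hypothetical helper: with robustsig = "yes", the SD of the normal segments
# is at least half the pooled gain/loss SD (assumed given as one number).
floor_normal_sd <- function(sd_normal, sd_gainloss_pooled, robustsig = "yes") {
  if (robustsig == "yes") max(sd_normal, 0.5 * sd_gainloss_pooled) else sd_normal
}

# Hypothetical helper: keep only segments of at least minlsforfit Mb for the fit.
# Assumption: length = (end - start) in bp, converted to Mb.
segments_for_fit <- function(seg, minlsforfit = 0.5) {
  length_mb <- (seg$end - seg$start) / 1e6
  seg[length_mb >= minlsforfit, , drop = FALSE]
}

floor_normal_sd(0.05, 0.20)  # 0.1
seg <- data.frame(start = c(1e6, 5e6, 20e6), end = c(1.2e6, 9e6, 60e6))
segments_for_fit(seg)        # keeps rows 2 and 3 (4 Mb and 40 Mb)
```

Calls are still produced for the excluded short segments; the filter only affects which segments inform the mixture-model fit.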
This function returns a list with six components:
posteriorfin2 |
Matrix containing call probabilities for each segment. First column denotes profile number, followed by k columns with aberration probabilities for each sample, where k is the number of levels used for calling ( |
nclone |
Number of clones or probes |
nc |
Number of samples |
nclass |
Number of classes used |
regionsprof |
Matrix containing information about the segments, 4 columns: profile, start probe, end probe, segmented value |
params |
Vector containing the parameter values of the mixture model |
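A common first step with posteriorfin2 is to pick, for each segment, the class with the highest call probability. The sketch below uses a small invented matrix in the documented layout (profile number in column 1, then one probability column per class) with nclass = 3; it does not assume anything about which biological label each class index carries.

```r
# Toy stand-in for posteriorfin2 (values invented for illustration):
# column 1 = profile number, columns 2..4 = call probabilities per class.
posterior <- cbind(profile = c(1, 1, 2),
                   p1 = c(0.80, 0.10, 0.05),
                   p2 = c(0.15, 0.85, 0.25),
                   p3 = c(0.05, 0.05, 0.70))

probs <- posterior[, -1, drop = FALSE]

# Index of the most probable class for each segment
most_likely <- max.col(probs, ties.method = "first")
most_likely        # 1 2 3

# Probability attached to each of those calls
call_prob <- probs[cbind(seq_len(nrow(probs)), most_likely)]
call_prob          # 0.80 0.85 0.70
```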
Sjoerd Vosse, Mark van de Wiel, Ilari Scheinin
Mark A. van de Wiel, Kyung In Kim, Sjoerd J. Vosse, Wessel N. van Wieringen, Saskia M. Wilting and Bauke Ylstra. CGHcall: calling aberrations for array CGH tumor profiles. Bioinformatics, 23, 892-894.
data(Wilting)
## Convert to 'cghRaw' object
cgh <- make_cghRaw(Wilting)
print(cgh)
## First preprocess the data
raw.data <- preprocess(cgh)
## Simple global median normalization for samples with 75% tumor cells
normalized.data <- normalize(raw.data)
## Segmentation with slightly relaxed significance level to accept change-points.
## Note that segmentation can take a long time.
## Not run:
segmented.data <- segmentData(normalized.data, alpha = 0.02)
postsegnormalized.data <- postsegnormalize(segmented.data)
## End(Not run)
## Call aberrations
perc.tumor <- rep(0.75, 3)
## Not run:
result <- CGHcall(postsegnormalized.data, cellularity = perc.tumor)
## Expand to CGHcall object
result <- ExpandCGHcall(result, postsegnormalized.data)
## End(Not run)
Expands the result of the CGHcall function to a cghCall object.
ExpandCGHcall(listcall,inputSegmented, digits=3, divide=4, memeff = FALSE, fileoutpre="Callobj_",CellularityCorrectSeg=TRUE)
listcall |
List object; output of function |
inputSegmented |
An object of class |
digits |
Number of decimal digits to be saved in the resulting call object. Allows for saving storage space |
divide |
Number of batches to divide the workload into. Larger values save memory but require more computing time |
memeff |
When set to TRUE, memory efficient mode is used: results are written in batches to multiple external files. If FALSE, one output object is provided. |
fileoutpre |
Only relevant when memeff=TRUE. Define prefix for output file names |
CellularityCorrectSeg |
If TRUE, corrects segmented and normalized values for cellularity as well |
This function is new in version 2.7.0. It allows more memory-efficient handling of large data objects. If R crashes because of memory problems, we advise setting memeff = TRUE and increasing the value of divide. When multiple files are output (in case of memeff = TRUE), the function combine may be used to combine the CGHcall objects.
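The idea behind divide is to process the work in batches so that only part of the result is in memory at once. The text does not say along which dimension ExpandCGHcall splits its work, so the sketch below is only a generic illustration with a hypothetical helper: it splits n work items into divide consecutive batches of near-equal size using base R.

```r
# Hypothetical helper: split n work items into `divide` consecutive batches
# of near-equal size; a memory-saving loop would then handle one batch at a time.
make_batches <- function(n, divide = 4) {
  split(seq_len(n), cut(seq_len(n), breaks = divide, labels = FALSE))
}

batches <- make_batches(10, divide = 4)
lengths(batches)  # batch sizes differ by at most one

for (b in batches) {
  # ... process items b here, then release memory before the next batch
}
```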
An object of class cghCall-class, either as one object (when memeff = FALSE) or as multiple objects stored in .Rdata files in the working directory (when memeff = TRUE).
Sjoerd Vosse & Mark van de Wiel
Mark A. van de Wiel, Kyung In Kim, Sjoerd J. Vosse, Wessel N. van Wieringen, Saskia M. Wilting and Bauke Ylstra. CGHcall: calling aberrations for array CGH tumor profiles. Bioinformatics, 23, 892-894.
data(Wilting)
## Convert to 'cghRaw' object
cgh <- make_cghRaw(Wilting)
print(cgh)
## First preprocess the data
raw.data <- preprocess(cgh)
## Simple global median normalization for samples with 75% tumor cells
perc.tumor <- rep(0.75, 3)
normalized.data <- normalize(raw.data)
## Segmentation with slightly relaxed significance level to accept change-points.
## Note that segmentation can take a long time.
## Not run:
segmented.data <- segmentData(normalized.data, alpha = 0.02)
postsegnormalized.data <- postsegnormalize(segmented.data)
## Call aberrations
result <- CGHcall(postsegnormalized.data, cellularity = perc.tumor)
result <- ExpandCGHcall(result, postsegnormalized.data)
## End(Not run)
This function normalizes arrayCGH data using the global mode or median. It can also adjust for the cellularity of your data.
normalize(input, method = "median", smoothOutliers = TRUE, ...)
input |
Object of class |
method |
Normalization method, either |
smoothOutliers |
Logical. Indicates whether outliers should be smoothed using the |
... |
Arguments for |
The cellularity parameter should be a vector of length n where n is the number of samples in your dataset. The vector is recycled if there are not enough values in it, or truncated if there are too many. For more information on the correction we refer to section 1.6 of the supplementary information for van de Wiel et al. 2006.
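The recycling and truncation rule behaves like base R's rep_len(). The snippet below only illustrates that rule for a hypothetical dataset of five samples (the Wilting data has five sample columns); it makes no claim about how the cellularity correction itself is computed.

```r
# Illustration of the recycle/truncate rule with base R's rep_len().
n_samples <- 5

cell_single    <- rep_len(0.75, n_samples)                           # 0.75 for every sample
cell_recycled  <- rep_len(c(0.6, 0.8), n_samples)                    # 0.6 0.8 0.6 0.8 0.6
cell_truncated <- rep_len(c(0.5, 0.6, 0.7, 0.8, 0.9, 1), n_samples)  # first 5 values kept
```

This is also why rep(0.75, 3) in the examples still assigns 0.75 to all five Wilting samples.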
This function returns a dataframe in the same format as the input with normalized and/or cellularity adjusted log2 ratios.
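Global median normalization centres each sample so that its median log2 ratio becomes 0. The sketch below is a minimal base-R version under that assumption; median_normalize is a hypothetical name, the toy data are invented, and the real normalize function also supports the mode method, outlier smoothing and cellularity adjustment, none of which are reproduced here.

```r
# Minimal sketch, assuming median normalization = subtract each sample's median.
# Annotation columns are left untouched, so the output keeps the input format.
median_normalize <- function(df, sample_cols) {
  df[sample_cols] <- lapply(df[sample_cols],
                            function(x) x - median(x, na.rm = TRUE))
  df
}

toy <- data.frame(Name = c("a", "b", "c"),
                  Chromosome = 1,
                  Position = c(100, 200, 300),
                  S1 = c(0.2, 0.4, 1.0),
                  S2 = c(-0.3, -0.1, 0.5))

normalized <- median_normalize(toy, c("S1", "S2"))
normalized$S1  # approximately -0.2, 0.0, 0.6
```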
Sjoerd Vosse & Mark van de Wiel
data(Wilting)
## Convert to 'cghRaw' object
cgh <- make_cghRaw(Wilting)
## First preprocess the data
raw.data <- preprocess(cgh)
## Simple global median normalization for samples with 75% tumor cells
normalized.data <- normalize(raw.data)
This function normalizes arrayCGH data after segmentation in order to find a better 0-level.
postsegnormalize(segmentData, inter=c(-0.1,0.1))
segmentData |
Object of class |
inter |
Interval in which the function should search for the normal level. |
This function recursively searches for the interval containing the most segmented data, decreasing the interval length in each recursion. The recursive search makes the post-segmentation normalization robust against local maxima. This function is particularly useful for profiles for which, after segmentation, the 0-level does not coincide with many segments. It is more or less harmless to other profiles. We advise keeping the search interval (inter) small, in particular on the positive (gain) side, to avoid setting the 0-level to a common gain level.
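The recursive idea can be sketched as follows. This is a hypothetical illustration, not the package's code: the split rule (two overlapping halves, each three quarters of the current width), the tie-breaking rule and the stopping tolerance are all assumptions. segvals stands for the probe-level segmented values of one profile, so longer segments count more.

```r
# Hypothetical sketch of a recursive search for the interval that holds the
# most segmented values. Assumptions: overlapping halves, ties go left,
# stop when the interval is narrower than tol, return its midpoint.
find_zero_level <- function(segvals, inter = c(-0.1, 0.1), tol = 0.005) {
  width <- diff(inter)
  if (width < tol) return(mean(inter))
  mid <- mean(inter)
  quarter <- width / 4
  left  <- c(inter[1], mid + quarter)   # lower 3/4 of the interval
  right <- c(mid - quarter, inter[2])   # upper 3/4 of the interval
  count <- function(iv) sum(segvals >= iv[1] & segvals <= iv[2])
  if (count(left) >= count(right)) {
    find_zero_level(segvals, left, tol)
  } else {
    find_zero_level(segvals, right, tol)
  }
}

# Toy profile: most probes sit slightly above 0, plus a loss and a gain.
segvals <- c(rep(0.03, 50), rep(-0.4, 10), rep(0.5, 8))
level <- find_zero_level(segvals)
renormalized <- segvals - level  # the dominant level moves close to 0
```

Values outside inter (here the loss at -0.4 and the gain at 0.5) never influence the search, which is why a narrow interval on the gain side protects against shifting the 0-level onto a common gain.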
This function returns a cghSeg object in the same format as the input, with post-segmentation normalized log2 ratios and segmented values.
Mark van de Wiel
data(Wilting)
## Convert to 'cghRaw' object
cgh <- make_cghRaw(Wilting)
## First preprocess the data
raw.data <- preprocess(cgh)
## Simple global median normalization for samples with 75% tumor cells
normalized.data <- normalize(raw.data)
## Segmentation with slightly relaxed significance level to accept change-points.
## Note that segmentation can take a long time.
## Not run:
segmented.data <- segmentData(normalized.data, alpha = 0.02)
postsegnormalized.data <- postsegnormalize(segmented.data, inter = c(-0.1, 0.1))
## End(Not run)
This function preprocesses your aCGH data so it can be processed by other functions without errors.
preprocess(input, maxmiss = 30, nchrom = 23, ...)
input |
Object of class |
maxmiss |
Maximum percentage of missing values per row. |
nchrom |
Number of chromosomes. |
... |
Arguments for |
This function performs the following actions on arrayCGH data:
- Filter out data with missing position information.
- Remove data on chromosomes numbered higher than nchrom.
- Remove rows with more than maxmiss percent missing values.
- Impute missing values using the impute.knn function from the impute package.
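The three filtering steps translate directly into base R. The sketch below is hypothetical (preprocess_sketch is not a CGHcall function) and assumes a data frame whose first three columns are Name, Chromosome and Position, followed by one column per sample. The final imputation step, which CGHcall delegates to impute.knn, is deliberately left out.

```r
# Hypothetical sketch of the filtering steps of preprocess (imputation omitted).
# Assumed layout: Name, Chromosome, Position, then one column per sample.
preprocess_sketch <- function(df, maxmiss = 30, nchrom = 23) {
  # 1. Drop rows without position information
  df <- df[!is.na(df$Chromosome) & !is.na(df$Position), , drop = FALSE]
  # 2. Drop chromosomes numbered higher than nchrom
  df <- df[df$Chromosome <= nchrom, , drop = FALSE]
  # 3. Drop rows with more than maxmiss percent missing sample values
  samples <- df[, -(1:3), drop = FALSE]
  pct_missing <- 100 * rowMeans(is.na(samples))
  df[pct_missing <= maxmiss, , drop = FALSE]
}

toy <- data.frame(Name = paste0("p", 1:5),
                  Chromosome = c(1, 1, NA, 24, 2),
                  Position = c(10, 20, 30, 40, 50),
                  S1 = c(0.1, NA, 0.2, 0.3, 0.4),
                  S2 = c(0.2, NA, 0.1, 0.0, NA),
                  S3 = c(0.0, 0.1, 0.3, 0.2, 0.1),
                  S4 = c(0.1, 0.2, 0.1, 0.3, 0.2))

kept <- preprocess_sketch(toy)
kept$Name  # "p1" "p5": p2 is 50% missing, p3 lacks a position, p4 is on chromosome 24
```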
This function returns a dataframe in the same format as the input with missing values imputed.
Sjoerd Vosse & Mark van de Wiel
Olga Troyanskaya, Michael Cantor, Gavin Sherlock, Pat Brown, Trevor Hastie, Robert Tibshirani, David Botstein, and Russ B. Altman (2001). Missing value estimation methods for DNA microarrays. Bioinformatics, 17, 520-525.
data(WiltingRaw)
preprocessed <- preprocess(WiltingRaw, nchrom = 22)
A wrapper function to run existing breakpoint detection algorithms on arrayCGH data. Currently only DNAcopy is implemented.
segmentData(input, clen=10, relSDlong=3, method = "DNAcopy", ...)
input |
Object of class |
clen |
Boundary for short vs long segments, in number of features |
relSDlong |
Relative undo sd for long segments. See details. |
method |
The method to be used for breakpoint detection. Currently only DNAcopy is supported, which will run the |
... |
Arguments for |
See segment for details on the algorithm. The parameters clen and relSDlong are only relevant when the segment option undo.splits = "sdundo" is set, in combination with the segment option undo.SD. relSDlong determines the undo SD for long segments, which equals undo.SD/relSDlong; undo.SD itself is then used for short segments.
In the example below, short segments are considered to contain at most clen = 10 features. The example undoes splits between two consecutive short segments if these are less than undo.SD = 3 SD apart, while it undoes splits between two long segments if these are less than undo.SD/relSDlong = 3/3 = 1 SD apart. If, of two consecutive segments, one is short and one is long, splits are undone in the same way as for two short segments.
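The rule above reduces to choosing one of two thresholds per pair of neighbouring segments. The hypothetical helper below (not a CGHcall or DNAcopy function) returns the SD distance below which a split would be undone, using the example values clen = 10, undo.SD = 3 and relSDlong = 3 as defaults.

```r
# Hypothetical helper: SD threshold for undoing a split between two
# consecutive segments of len1 and len2 features.
undo_threshold <- function(len1, len2, clen = 10, undo.SD = 3, relSDlong = 3) {
  both_long <- len1 > clen && len2 > clen  # short means at most clen features
  if (both_long) undo.SD / relSDlong else undo.SD
}

undo_threshold(5, 8)    # two short segments: 3
undo_threshold(50, 80)  # two long segments: 1
undo_threshold(5, 80)   # one short, one long: treated as short, 3
undo_threshold(10, 11)  # 10 features still counts as short: 3
```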
This function returns a dataframe in the same format as the input with segmented arrayCGH data.
Sjoerd Vosse & Mark van de Wiel
Venkatraman, A.S., Olshen, A.B. (2007). A faster circular binary segmentation algorithm for the analysis of array CGH data. Bioinformatics, 23, 657-663.
data(WiltingNorm)
## Not run:
segmented.data <- segmentData(WiltingNorm, alpha = 0.02, clen = 10, relSDlong = 3,
                              undo.SD = 3, undo.splits = "sdundo")
## End(Not run)
A dataframe containing 4709 rows and 8 columns with arrayCGH data.
Wilting
A dataframe containing the following 8 columns:
The unique identifiers of array elements.
Chromosome number of each array element.
Chromosomal position in bp of each array element.
Raw log2 ratios for cervical cancer sample AdCA10.
Raw log2 ratios for cervical cancer sample SCC27.
Raw log2 ratios for cervical cancer sample SCC32.
Raw log2 ratios for cervical cancer sample SCC36.
Raw log2 ratios for cervical cancer sample SCC39.
Wilting, S.M., Snijders, P.J., Meijer, G.A., Ylstra, B., van den IJssel, P.R., Snijders, A.M., Albertson, D.G., Coffa, J., Schouten, J.P., van de Wiel, M.A., Meijer, C.J., & Steenbergen, R.D. (2006). Increased gene copy numbers at chromosome 20q are frequent in both squamous cell carcinomas and adenocarcinomas of the cervix. Journal of Pathology, 210, 258-259.