Title: | Gain and Loss Analysis of DNA |
---|---|
Description: | Analysis of array CGH data : detection of breakpoints in genomic profiles and assignment of a status (gain, normal or loss) to each chromosomal regions identified. |
Authors: | Philippe Hupe |
Maintainer: | Philippe Hupe <[email protected]> |
License: | GPL-2 |
Version: | 2.71.0 |
Built: | 2024-10-30 07:18:23 UTC |
Source: | https://github.com/bioc/GLAD |
Bladder cancer data from 3 arrays CGH (Comparative Genomic Hybridyzation). Arrays dimension are 4 blocs per column, 4 blocs per row, 21 columns per bloc and 22 rows by blocs.
data(arrayCGH)
data(arrayCGH)
A data frame composed of the following elements :
Log 2 ratio.
BAC position on the genome.
Chromosome.
Column location on the array.
Row location on the array.
Institut Curie, [email protected].
data(arrayCGH) data <- array1 #array1 to array3
data(arrayCGH) data <- array1 #array1 to array3
Description of the object arrayCGH
.
The object arrayCGH
is a list with at least a
data.frame named arrayValues
and a vector named
arrayDesign
. The data.frame arrayValues
must contain the
following fields :
Col |
Vector of columns coordinates. |
Row |
Vector of rows coordinates. |
... |
Other elements can be added. |
The vector arrayDesign
is composed of 4 values : c(arrayCol,
arrayRow, SpotCol, SpotRow). The array CGH is represented by arrayRow*arrayCol
blocs and each bloc is composed of SpotRow*SpotCol spots.
N.B. : Col takes the values in 1:arrayRow*SpotRow and Row takes the values in 1:arrayCol*SpotCol
People interested in tools dealing with array CGH analysis can visit our web-page http://bioinfo.curie.fr.
Philippe Hupé, [email protected].
glad
.
data(arrayCGH) # object of class arrayCGH array <- list(arrayValues=array2, arrayDesign=c(4,4,21,22)) class(array) <- "arrayCGH"
data(arrayCGH) # object of class arrayCGH array <- list(arrayValues=array2, arrayDesign=c(4,4,21,22)) class(array) <- "arrayCGH"
The function arrayPersp
creates perspective images of shades of gray or colors that correspond to the values of a statistic for each spot on the array. The statistic can be the intensity log-ratio, a spot quality measure (e.g. spot size or shape), or a test statistic. This function can be used to explore whether there are any spatial effects in the data, for example, print-tip or cover-slip effects.
## Default S3 method: arrayPersp(Statistic, Col, Row, ArrCol, ArrRow, SpotCol, SpotRow, mediancenter=FALSE, col=myPalette("green","red","yellow"), zlim=zlim, bar=TRUE, ...) ## S3 method for class 'arrayCGH' arrayPersp(arrayCGH, variable, mediancenter=FALSE, col=myPalette("green","red","yellow"), zlim=zlim, bar=TRUE, ...)
## Default S3 method: arrayPersp(Statistic, Col, Row, ArrCol, ArrRow, SpotCol, SpotRow, mediancenter=FALSE, col=myPalette("green","red","yellow"), zlim=zlim, bar=TRUE, ...) ## S3 method for class 'arrayCGH' arrayPersp(arrayCGH, variable, mediancenter=FALSE, col=myPalette("green","red","yellow"), zlim=zlim, bar=TRUE, ...)
arrayCGH |
Object of class |
variable |
Variable to be plotted |
Statistic |
Statistic to be plotted. |
Col |
Vector of columns coordinates. |
Row |
Vector of rows coordinates. |
ArrCol |
Number of columns for the blocs. |
ArrRow |
Number of rows for the blocs. |
SpotCol |
Number of column for each bloc. |
SpotRow |
Number of rows for each bloc. |
mediancenter |
If |
col |
List of colors such as that generated by |
zlim |
Numerical vector of length 2 giving the extreme values of
|
bar |
If |
... |
Graphical parameters can be given as arguments to function
|
N.B. : Col takes the values in 1:arrayRow*SpotRow and Row takes the values in 1:arrayCol*SpotCol
An image is created on the current graphics device.
People interested in tools dealing with array CGH analysis can visit our web-page http://bioinfo.curie.fr.
Philippe Hupé, [email protected].
## Not run: data(arrayCGH) # object of class arrayCGH array <- list(arrayValues=array2, arrayDesign=c(4,4,21,22)) class(array) <- "arrayCGH" arrayPersp(array,"Log2Rat", main="Perspective image of array CGH", box=FALSE, theta=110, phi=40) ## End(Not run)
## Not run: data(arrayCGH) # object of class arrayCGH array <- list(arrayValues=array2, arrayDesign=c(4,4,21,22)) class(array) <- "arrayCGH" arrayPersp(array,"Log2Rat", main="Perspective image of array CGH", box=FALSE, theta=110, phi=40) ## End(Not run)
The function arrayPlot
creates spatial images of shades of gray or colors that correspond to the values of a statistic for each spot on the array. The statistic can be the intensity log-ratio, a spot quality measure (e.g. spot size or shape), or a test statistic. This function can be used to explore whether there are any spatial effects in the data, for example, print-tip or cover-slip effects.
## Default S3 method: arrayPlot(Statistic, Col, Row, ArrCol, ArrRow, SpotCol, SpotRow, mediancenter=FALSE, col=myPalette("green", "red", "yellow"), contour=FALSE, nlevels=5, zlim=NULL, bar=c("none", "horizontal", "vertical"), layout=TRUE, ...) ## S3 method for class 'arrayCGH' arrayPlot(arrayCGH, variable, mediancenter=FALSE, col=myPalette("green", "red", "yellow"), contour=FALSE, nlevels=5, zlim=NULL, bar=c("none", "horizontal", "vertical"), layout=TRUE, ...)
## Default S3 method: arrayPlot(Statistic, Col, Row, ArrCol, ArrRow, SpotCol, SpotRow, mediancenter=FALSE, col=myPalette("green", "red", "yellow"), contour=FALSE, nlevels=5, zlim=NULL, bar=c("none", "horizontal", "vertical"), layout=TRUE, ...) ## S3 method for class 'arrayCGH' arrayPlot(arrayCGH, variable, mediancenter=FALSE, col=myPalette("green", "red", "yellow"), contour=FALSE, nlevels=5, zlim=NULL, bar=c("none", "horizontal", "vertical"), layout=TRUE, ...)
arrayCGH |
Object of class |
variable |
Variable to be plotted |
Statistic |
Statistic to be plotted. |
Col |
Vector of columns coordinates. |
Row |
Vector of rows coordinates. |
ArrCol |
Number of columns for the blocs. |
ArrRow |
Number of rows for the blocs. |
SpotCol |
Number of column for each bloc. |
SpotRow |
Number of rows for each bloc. |
mediancenter |
If |
col |
List of colors such as that generated by |
contour |
If |
nlevels |
Numbers of levels added by |
zlim |
Numerical vector of length 2 giving the extreme values of
|
bar |
If |
layout |
If |
... |
Graphical parameters can be given as arguments to function
|
N.B. : Col takes the values in 1:arrayRow*SpotRow and Row takes the values in 1:arrayCol*SpotCol
This function is very similar to the
maImage
written by Sandrine Dudoit
(available in marrayPlots package) with
added options zlim
, mediancenter
and layout
.
An image is created on the current graphics device.
People interested in tools dealing with array CGH analysis can visit our web-page http://bioinfo.curie.fr.
Philippe Hupé, [email protected].
image
, contour
, arrayPersp
, myPalette
.
data(arrayCGH) pdf(file="arrayCGH.pdf",height=21/cm(1),width=29.7/cm(1)) arrayPlot(array2$Log2Rat, array2$Col, array2$Row, 4,4,21,22, main="Spatial Image of array CGH") dev.off() # object of class arrayCGH array <- list(arrayValues=array2, arrayDesign=c(4,4,21,22)) class(array) <- "arrayCGH" arrayPlot(array,"Log2Rat", main="Spatial Image of array CGH")
data(arrayCGH) pdf(file="arrayCGH.pdf",height=21/cm(1),width=29.7/cm(1)) arrayPlot(array2$Log2Rat, array2$Col, array2$Row, 4,4,21,22, main="Spatial Image of array CGH") dev.off() # object of class arrayCGH array <- list(arrayValues=array2, arrayDesign=c(4,4,21,22)) class(array) <- "arrayCGH" arrayPlot(array,"Log2Rat", main="Spatial Image of array CGH")
Convert a profileCGH object into a data.frame.
## S3 method for class 'profileCGH' as.data.frame(x, row.names = NULL, optional = FALSE, ...)
## S3 method for class 'profileCGH' as.data.frame(x, row.names = NULL, optional = FALSE, ...)
x |
The object to converted into data.frame. |
row.names |
NULL or a character vector giving the row names for the data frame. Missing values are not allowed. |
optional |
logical. If 'TRUE', setting row names and converting column names (to syntactic names) is optional. |
... |
... |
The attributes profileValues
and profileValuesNA
are binded into a data.frame.
A data.frame object
People interested in tools dealing with array CGH analysis can visit our web-page http://bioinfo.curie.fr.
Philippe Hupé, [email protected]
data(snijders) ### Creation of "profileCGH" object profileCGH <- as.profileCGH(gm13330) ########################################################### ### ### glad function as described in Hupé et al. (2004) ### ########################################################### res <- glad(profileCGH, mediancenter=FALSE, smoothfunc="lawsglad", bandwidth=10, round=2, model="Gaussian", lkern="Exponential", qlambda=0.999, base=FALSE, lambdabreak=8, lambdacluster=8, lambdaclusterGen=40, type="tricubic", param=c(d=6), alpha=0.001, msize=5, method="centroid", nmax=8, verbose=FALSE) res <- as.data.frame(res)
data(snijders) ### Creation of "profileCGH" object profileCGH <- as.profileCGH(gm13330) ########################################################### ### ### glad function as described in Hupé et al. (2004) ### ########################################################### res <- glad(profileCGH, mediancenter=FALSE, smoothfunc="lawsglad", bandwidth=10, round=2, model="Gaussian", lkern="Exponential", qlambda=0.999, base=FALSE, lambdabreak=8, lambdacluster=8, lambdaclusterGen=40, type="tricubic", param=c(d=6), alpha=0.001, msize=5, method="centroid", nmax=8, verbose=FALSE) res <- as.data.frame(res)
Create an object of class profileCGH.
as.profileCGH(object,...) ## S3 method for class 'data.frame' as.profileCGH(object, infaction=c("value","empty"), value=20, keepSmoothing=FALSE, ...)
as.profileCGH(object,...) ## S3 method for class 'data.frame' as.profileCGH(object, infaction=c("value","empty"), value=20, keepSmoothing=FALSE, ...)
object |
A data.frame to be convert into profileCGH. |
infaction |
If "value" then the LogRatio with infinite values
(-Inf, Inf) are replace by + or - |
value |
replace Inf by |
keepSmoothing |
if TRUE the smoothing value in object is kept |
... |
... |
The data.frame to be convert must at least contain the
following fields: LogRatio, PosOrder, and Chromosome. If the field
Chromosome is of mode character, it is automatically converted into a
numeric vector (see ChrNumeric
); a field ChromosomeChar
contains the character labels. The data.frame to be converted into a
profileCGH objet is split into two data.frame: profileValuesNA contains
the rows for which there is at least a missing value for either
LogRatio, PosOrder or Chromosome; profileValues contains the remaining rows.
A list with the following attributes |
|
profileValues |
A data.frame |
profileValuesNA |
A data.frame |
People interested in tools dealing with array CGH analysis can visit our web-page http://bioinfo.curie.fr.
Philippe Hupé, [email protected]
data(snijders) ### Creation of "profileCGH" object profileCGH <- as.profileCGH(gm13330) attributes(profileCGH)
data(snijders) ### Creation of "profileCGH" object profileCGH <- as.profileCGH(gm13330) attributes(profileCGH)
Convert chromosome into numeric values.
ChrNumeric(Chromosome)
ChrNumeric(Chromosome)
Chromosome |
A vector with chromosome labels. |
For sexual chromosome, labels must contains X or Y which are coded by 23 and 24 respectively.
People interested in tools dealing with array CGH analysis can visit our web-page http://bioinfo.curie.fr.
Philippe Hupé, [email protected]
Chromosome <- c("1","X","Y","chr X", "ChrX", "chrX", "Chr Y") ChrNumeric(Chromosome)
Chromosome <- c("1","X","Y","chr X", "ChrX", "chrX", "Chr Y") ChrNumeric(Chromosome)
This function produces a color image (color bar) which can be used for
the legend to another color image obtained from the functions
image
or arrayPlot
.
ColorBar(x, horizontal=TRUE, col=heat.colors(50), scale=1:length(x), k=10, ...)
ColorBar(x, horizontal=TRUE, col=heat.colors(50), scale=1:length(x), k=10, ...)
x |
If "numeric", a vector containing the "z" values in the color image, i.e., the values which are represented in the color image. Otherwise, a "character" vector representing colors. |
horizontal |
If |
col |
Vector of colors such as that generated by
rainbow, heat.colors, topo.colors, terrain.colors, or similar
functions. In addition to these color palette functions, a new
function |
scale |
A "numeric" vector specifying the "z" values in the color
image. This is used when the argument |
k |
Object of class "numeric", for the number of labels displayed on the bar. |
... |
Optional graphical parameters, see |
Sandrine Dudoit, Yee Hwa (Jean) Yang.
par(mfrow=c(3,1)) Rcol <- myPalette(low="white", high="red", k=10) Gcol <- myPalette(low="white", high="green", k=50) RGcol <- myPalette(low="green", high="red", k=100) ColorBar(Rcol) ColorBar(Gcol, scale=c(-5,5)) ColorBar(1:50, col=RGcol) par(mfrow=c(1,3)) x<-seq(-1, 1, by=0.01) ColorBar(x, col=Gcol, horizontal=FALSE, k=11) ColorBar(x, col=Gcol, horizontal=FALSE, k=21) ColorBar(x, col=Gcol, horizontal=FALSE, k=51)
par(mfrow=c(3,1)) Rcol <- myPalette(low="white", high="red", k=10) Gcol <- myPalette(low="white", high="green", k=50) RGcol <- myPalette(low="green", high="red", k=100) ColorBar(Rcol) ColorBar(Gcol, scale=c(-5,5)) ColorBar(1:50, col=RGcol) par(mfrow=c(1,3)) x<-seq(-1, 1, by=0.01) ColorBar(x, col=Gcol, horizontal=FALSE, k=11) ColorBar(x, col=Gcol, horizontal=FALSE, k=21) ColorBar(x, col=Gcol, horizontal=FALSE, k=51)
Cytogenetic banding
data(cytoband)
data(cytoband)
data(cytoband) cytoband
data(cytoband) cytoband
This function allows the detection of breakpoints in genomic profiles obtained by array CGH technology and affects a status (gain, normal or lost) to each clone.
## S3 method for class 'profileCGH' daglad(profileCGH, mediancenter=FALSE, normalrefcenter=FALSE, genomestep=FALSE, OnlySmoothing = FALSE, OnlyOptimCall = FALSE, smoothfunc="lawsglad", lkern="Exponential", model="Gaussian", qlambda=0.999, bandwidth=10, sigma=NULL, base=FALSE, round=2, lambdabreak=8, lambdaclusterGen=40, param=c(d=6), alpha=0.001, msize=2, method="centroid", nmin=1, nmax=8, region.size=2, amplicon=1, deletion=-5, deltaN=0.10, forceGL=c(-0.15,0.15), nbsigma=3, MinBkpWeight=0.35, DelBkpInAmp=TRUE, DelBkpInDel=TRUE, CheckBkpPos=TRUE, assignGNLOut=TRUE, breaksFdrQ = 0.0001, haarStartLevel = 1, haarEndLevel = 5, weights.name = NULL, verbose=FALSE, ...)
## S3 method for class 'profileCGH' daglad(profileCGH, mediancenter=FALSE, normalrefcenter=FALSE, genomestep=FALSE, OnlySmoothing = FALSE, OnlyOptimCall = FALSE, smoothfunc="lawsglad", lkern="Exponential", model="Gaussian", qlambda=0.999, bandwidth=10, sigma=NULL, base=FALSE, round=2, lambdabreak=8, lambdaclusterGen=40, param=c(d=6), alpha=0.001, msize=2, method="centroid", nmin=1, nmax=8, region.size=2, amplicon=1, deletion=-5, deltaN=0.10, forceGL=c(-0.15,0.15), nbsigma=3, MinBkpWeight=0.35, DelBkpInAmp=TRUE, DelBkpInDel=TRUE, CheckBkpPos=TRUE, assignGNLOut=TRUE, breaksFdrQ = 0.0001, haarStartLevel = 1, haarEndLevel = 5, weights.name = NULL, verbose=FALSE, ...)
profileCGH |
Object of class |
mediancenter |
If |
genomestep |
If |
normalrefcenter |
If |
OnlySmoothing |
If |
OnlyOptimCall |
If |
smoothfunc |
Type of algorithm used to smooth |
lkern |
lkern determines the location kernel to be used
(see |
model |
model determines the distribution type of LogRatio
(see |
qlambda |
qlambda determines the scale parameter qlambda for the
stochastic penalty (see |
base |
If TRUE, the position of clone is the physical position onto the chromosome, otherwise the rank position is used. |
sigma |
Value to be passed to either argument |
bandwidth |
Set the maximal bandwidth |
round |
The smoothing results of either |
lambdabreak |
Penalty term ( |
lambdaclusterGen |
Penalty term ( |
param |
Parameter of kernel used in the penalty term. |
alpha |
Risk alpha used for the "Outlier detection" step. |
msize |
The outliers MAD are calculated on regions with a cardinality greater or equal to msize. |
method |
The agglomeration method to be used during the "clustering throughout the genome" steps. |
nmin |
Minimum number of clusters (N*max) allowed during the "clustering throughout the genome" clustering step. |
nmax |
Maximum number of clusters (N*max) allowed during the "clustering throughout the genome" clustering step. |
region.size |
The breakpoints which define regions with a number of probes lower or equal to this value are discared. |
amplicon |
Level (and outliers) with a smoothing value (log-ratio value) greater than this threshold are consider as amplicon. Note that first, the data are centered on the normal reference value computed during the "clustering throughout the genome" step. |
deletion |
Level (and outliers) with a smoothing value (log-ratio value) lower than this threshold are consider as deletion. Note that first, the data are centered on the normal reference value computed during the "clustering throughout the genome" step. |
deltaN |
Region with smoothing values in between the interval [-deltaN,+deltaN] are supposed to be normal. |
forceGL |
Level with smoothing value greater (lower) than
|
nbsigma |
For each breakpoints, a weight is calculated which is a function of absolute value of the Gap between the smoothing values of the two consecutive regions. Weight = 1- kernelpen(abs(Gap),param=c(d=nbsigma*Sigma)) where Sigma is the standard deviation of the LogRatio. |
MinBkpWeight |
Breakpoints which |
DelBkpInAmp |
If TRUE, the breakpoints identified inside amplicon regions are deleted. For amplicon, the log-ratio values are highly variable which lead to identification of false positive breakpoints. |
DelBkpInDel |
If TRUE, the breakpoints identified inside deletion regions are deleted. For deletion, the log-ratio values are highly variable which lead to identification of false positive breakpoints. |
CheckBkpPos |
If |
assignGNLOut |
If |
breaksFdrQ |
breaksFdrQ for HaarSeg algorithm. |
haarStartLevel |
haarStartLevel for HaarSeg algorithm. |
haarEndLevel |
haarEndLevel for HaarSeg algorithm. |
weights.name |
The name of the fields which contains the weights used for the haarseg algorithm. By default, the value is set to NULL meaning that all the observations have the same weights. If provided, the field must contain positive values. |
verbose |
If |
... |
... |
The function daglad
implements a slightly modified
version of the methodology described in the article : Analysis of array CGH data: from signal
ratio to gain and loss of DNA regions (Hupé et al., Bioinformatics,
2004). For smoothing, it is possible
to use either the AWS algorithm (Polzehl and Spokoiny, 2002) or the HaarSeg algorithm (Ben-Yaacov and Eldar, Bioinformatics, 2008).
The daglad
function allows to choose some threshold to help the algorithm to identify the status of the genomic regions. The threshodls are given in the following parameters:
deltaN
forceGL
deletion
amplicon
An object of class "profileCGH" with the following attributes: |
|
profileValues |
is a data.frame with the following information:
|
BkpInfo |
is a data.frame sum up the information for each breakpoint:
|
NormalRef |
If |
People interested in tools dealing with array CGH analysis can visit our web-page http://bioinfo.curie.fr.
Philippe Hupé, [email protected].
Hupé et al. (Bioinformatics, 2004): Analysis of array CGH data: from signal ratio to gain and loss of DNA regions.
Polzehl and Spokoiny (WIAS-Preprint 787, 2002): Local likelihood modelling by adaptive weights smoothing.
Ben-Yaacov and Eldar (Bioinformatics, 2008): A fast and flexible method for the segmentation of aCGH data.
glad
.
data(snijders) gm13330$Clone <- gm13330$BAC profileCGH <- as.profileCGH(gm13330) ########################################################### ### ### daglad function ### ########################################################### res <- daglad(profileCGH, mediancenter=FALSE, normalrefcenter=FALSE, genomestep=FALSE, smoothfunc="lawsglad", lkern="Exponential", model="Gaussian", qlambda=0.999, bandwidth=10, base=FALSE, round=1.5, lambdabreak=8, lambdaclusterGen=40, param=c(d=6), alpha=0.001, msize=2, method="centroid", nmin=1, nmax=8, amplicon=1, deletion=-5, deltaN=0.10, forceGL=c(-0.15,0.15), nbsigma=3, MinBkpWeight=0.35, CheckBkpPos=TRUE) ### data for cytoband data(cytoband) ### Genomic profile on the whole genome plotProfile(res, unit=3, Bkp=TRUE, labels=FALSE, Smoothing="Smoothing", main="Breakpoints detection: DAGLAD analysis", cytoband = cytoband) ###Genomic profile for chromosome 1 plotProfile(res, unit=3, Bkp=TRUE, labels=TRUE, Chromosome=1, Smoothing="Smoothing", main="Chromosome 1: DAGLAD analysis", cytoband = cytoband) ### The standard-deviation of LogRatio are: res$SigmaC ### The list of breakpoints is: res$BkpInfo
data(snijders) gm13330$Clone <- gm13330$BAC profileCGH <- as.profileCGH(gm13330) ########################################################### ### ### daglad function ### ########################################################### res <- daglad(profileCGH, mediancenter=FALSE, normalrefcenter=FALSE, genomestep=FALSE, smoothfunc="lawsglad", lkern="Exponential", model="Gaussian", qlambda=0.999, bandwidth=10, base=FALSE, round=1.5, lambdabreak=8, lambdaclusterGen=40, param=c(d=6), alpha=0.001, msize=2, method="centroid", nmin=1, nmax=8, amplicon=1, deletion=-5, deltaN=0.10, forceGL=c(-0.15,0.15), nbsigma=3, MinBkpWeight=0.35, CheckBkpPos=TRUE) ### data for cytoband data(cytoband) ### Genomic profile on the whole genome plotProfile(res, unit=3, Bkp=TRUE, labels=FALSE, Smoothing="Smoothing", main="Breakpoints detection: DAGLAD analysis", cytoband = cytoband) ###Genomic profile for chromosome 1 plotProfile(res, unit=3, Bkp=TRUE, labels=TRUE, Chromosome=1, Smoothing="Smoothing", main="Chromosome 1: DAGLAD analysis", cytoband = cytoband) ### The standard-deviation of LogRatio are: res$SigmaC ### The list of breakpoints is: res$BkpInfo
This function allows the detection of breakpoints in genomic profiles obtained by array CGH technology and affects a status (gain, normal or lost) to each clone.
## S3 method for class 'profileCGH' glad(profileCGH, mediancenter=FALSE, smoothfunc="lawsglad", bandwidth=10, round=1.5, model="Gaussian", lkern="Exponential", qlambda=0.999, base=FALSE, sigma, lambdabreak=8, lambdacluster=8, lambdaclusterGen=40, type="tricubic", param=c(d=6), alpha=0.001, msize=5, method="centroid", nmax=8, assignGNLOut=TRUE, breaksFdrQ = 0.0001, haarStartLevel = 1, haarEndLevel = 5, verbose=FALSE, ...)
## S3 method for class 'profileCGH' glad(profileCGH, mediancenter=FALSE, smoothfunc="lawsglad", bandwidth=10, round=1.5, model="Gaussian", lkern="Exponential", qlambda=0.999, base=FALSE, sigma, lambdabreak=8, lambdacluster=8, lambdaclusterGen=40, type="tricubic", param=c(d=6), alpha=0.001, msize=5, method="centroid", nmax=8, assignGNLOut=TRUE, breaksFdrQ = 0.0001, haarStartLevel = 1, haarEndLevel = 5, verbose=FALSE, ...)
profileCGH |
Object of class |
mediancenter |
If |
smoothfunc |
Type of algorithm used to smooth |
bandwidth |
Set the maximal bandwidth |
round |
The smoothing results are rounded or not depending on
the |
model |
Determines the distribution type of the LogRatio. Keep
always the model as "Gaussian" (see |
lkern |
Determines the location kernel to be used (see |
qlambda |
Determines the scale parameter for the
stochastic penalty (see |
base |
If |
sigma |
Value to be passed to either argument |
lambdabreak |
Penalty term ( |
lambdacluster |
Penalty term ( |
lambdaclusterGen |
Penalty term ( |
type |
Type of kernel function used in the penalty term during the Optimization of the number of breakpoints step, the MSHR clustering by chromosome step and the HCSR clustering throughout the genome step. |
param |
Parameter of kernel used in the penalty term. |
alpha |
Risk alpha used for the Outlier detection step. |
msize |
The outliers MAD are calculated on regions with a cardinality greater or equal to msize. |
method |
The agglomeration method to be used during the MSHR clustering by chromosome and the HCSR clustering throughout the genome clustering steps. |
nmax |
Maximum number of clusters (N*max) allowed during the the MSHR clustering by chromosome and the HCSR clustering throughout the genome clustering steps. |
assignGNLOut |
If |
breaksFdrQ |
breaksFdrQ for HaarSeg algorithm. |
haarStartLevel |
haarStartLevel for HaarSeg algorithm. |
haarEndLevel |
for HaarSeg algorithm. |
verbose |
If |
... |
... |
The function glad
implements the methodology which
is described in the article: Analysis of array CGH data: from signal
ratio to gain and loss of DNA regions (Hupé et al., Bioinformatics, 2004).
The principles of the GLAD algorithm: First, the detection of breakpoints is based on the estimation of a piecewise constant function with the Adaptive Weights Smoothing (AWS) procedure (Polzehl and Spokoiny, 2002). Alternatively, it is possible to use the HaarSeg algorithm (Ben-Yaacov and Eldar, Bioinformatics, 2008). Then, a procedure based on penalyzed maximum likelihood optimizes the number of breakpoints and removes the undesirable breakpoints. Finally, based on the regions previously identified, a two-step unsupervised classification (MSHR clustering by chromosome and the HCSR clustering throughout the genome) with model selection criteria allows a status to be assigned for each region (gain, loss or normal).
Main parameters to be tuned:
qlambda |
if you want the smoothing to fit some very local effect, choose a smaller qlambda . |
bandwidth |
choose a bandwidth not to small otherwise you will have a lot of little discontinuities. |
lambdabreak |
The higher the parameter is, the higher the number of undesirable breakpoints is. |
lambdacluster |
The higher the parameter is, the higher is the number of the regions within a chromosome which belong to the same cluster. |
lambdaclusterGen |
More the parameter is high more the regions over the whole genome are supposed to belong to the same cluster. |
An object of class "profileCGH" with the following attributes: |
|
profileValues: |
a data.frame with the following added information:
|
BkpInfo: |
the data.frame attribute
|
SigmaC: |
the data.frame attribute
|
People interested in tools dealing with array CGH analysis can visit our web-page http://bioinfo.curie.fr.
Philippe Hupé, [email protected].
Hupé et al. (Bioinformatics, 2004) Analysis of array CGH data: from signal ratio to gain and loss of DNA regions.
Polzehl and Spokoiny (WIAS-Preprint 787, 2002)Local likelihood modelling by adaptive weights smoothing.
Ben-Yaacov and Eldar (Bioinformatics, 2008)A fast and flexible method for the segmentation of aCGH data.
profileCGH
, as.profileCGH
, plotProfile
.
data(snijders) ### Creation of "profileCGH" object gm13330$Clone <- gm13330$BAC profileCGH <- as.profileCGH(gm13330) ########################################################### ### ### glad function as described in Hupé et al. (2004) ### ########################################################### res <- glad(profileCGH, mediancenter=FALSE, smoothfunc="lawsglad", bandwidth=10, round=1.5, model="Gaussian", lkern="Exponential", qlambda=0.999, base=FALSE, lambdabreak=8, lambdacluster=8, lambdaclusterGen=40, type="tricubic", param=c(d=6), alpha=0.001, msize=5, method="centroid", nmax=8, verbose=FALSE) ### cytoband data to plot chromosomes data(cytoband) ### Genomic profile on the whole genome plotProfile(res, unit=3, Bkp=TRUE, labels=FALSE, Smoothing="Smoothing", main="Breakpoints detection: GLAD analysis", cytoband = cytoband) ###Genomic profile for chromosome 1 plotProfile(res, unit=3, Bkp=TRUE, labels=TRUE, Chromosome=1, Smoothing="Smoothing", main="Chromosome 1: GLAD analysis", cytoband = cytoband) ### The standard-deviation of LogRatio are: res$SigmaC ### The list of breakpoints is: res$BkpInfo
data(snijders) ### Creation of "profileCGH" object gm13330$Clone <- gm13330$BAC profileCGH <- as.profileCGH(gm13330) ########################################################### ### ### glad function as described in Hupé et al. (2004) ### ########################################################### res <- glad(profileCGH, mediancenter=FALSE, smoothfunc="lawsglad", bandwidth=10, round=1.5, model="Gaussian", lkern="Exponential", qlambda=0.999, base=FALSE, lambdabreak=8, lambdacluster=8, lambdaclusterGen=40, type="tricubic", param=c(d=6), alpha=0.001, msize=5, method="centroid", nmax=8, verbose=FALSE) ### cytoband data to plot chromosomes data(cytoband) ### Genomic profile on the whole genome plotProfile(res, unit=3, Bkp=TRUE, labels=FALSE, Smoothing="Smoothing", main="Breakpoints detection: GLAD analysis", cytoband = cytoband) ###Genomic profile for chromosome 1 plotProfile(res, unit=3, Bkp=TRUE, labels=TRUE, Chromosome=1, Smoothing="Smoothing", main="Chromosome 1: GLAD analysis", cytoband = cytoband) ### The standard-deviation of LogRatio are: res$SigmaC ### The list of breakpoints is: res$BkpInfo
Hierarchical cluster analysis on a set of dissimilarities and methods for analyzing it.
hclustglad(d, method = "complete", members=NULL)
hclustglad(d, method = "complete", members=NULL)
d |
a dissimilarity structure as produced by |
method |
the agglomeration method to be used. This should
be (an unambiguous abbreviation of) one of
|
members |
|
This function performs a hierarchical cluster analysis
using a set of dissimilarities for the objects being
clustered. Initially, each object is assigned to its own
cluster and then the algorithm proceeds iteratively,
at each stage joining the two most similar clusters,
continuing until there is just a single cluster.
At each stage distances between clusters are recomputed
by the Lance–Williams dissimilarity update formula
according to the particular clustering method being used.
A number of different clustering methods are provided. Ward's minimum variance method aims at finding compact, spherical clusters. The complete linkage method finds similar clusters. The single linkage method (which is closely related to the minimal spanning tree) adopts a ‘friends of friends’ clustering strategy. The other methods can be regarded as aiming for clusters with characteristics somewhere between the single and complete link methods.
If members!=NULL
, then d
is taken to be a
dissimilarity matrix between clusters instead of dissimilarities
between singletons and members
gives the number of observations
per cluster. This way the hierarchical cluster algorithm can be “started
in the middle of the dendrogram”, e.g., in order to reconstruct the
part of the tree above a cut (see examples). Dissimilarities between
clusters can be efficiently computed (i.e., without hclustglad
itself) only for a limited number of distance/linkage combinations,
the simplest one being squared Euclidean distance and centroid
linkage. In this case the dissimilarities between the clusters are
the squared Euclidean distances between cluster means.
In hierarchical cluster displays, a decision is needed at each merge to
specify which subtree should go on the left and which on the right.
Since, for observations there are
merges,
there are
possible orderings for the leaves
in a cluster tree, or dendrogram.
The algorithm used in
hclustglad
is to order the subtree so that
the tighter cluster is on the left (the last, i.e. most recent,
merge of the left subtree is at a lower value than the last
merge of the right subtree).
Single observations are the tightest clusters possible,
and merges involving two observations place them in order by their
observation sequence number.
An object of class hclust which describes the tree produced by the clustering process. The object is a list with components:
merge |
an |
height |
a set of |
order |
a vector giving the permutation of the original
observations suitable for plotting, in the sense that a cluster
plot using this ordering and matrix |
labels |
labels for each of the objects being clustered. |
call |
the call which produced the result. |
method |
the cluster method that has been used. |
dist.method |
the distance that has been used to create |
The hclustglad
function is based an Algorithm
contributed to STATLIB by F. Murtagh.
Everitt, B. (1974). Cluster Analysis. London: Heinemann Educ. Books.
Hartigan, J. A. (1975). Clustering Algorithms. New York: Wiley.
Sneath, P. H. A. and R. R. Sokal (1973). Numerical Taxonomy. San Francisco: Freeman.
Anderberg, M. R. (1973). Cluster Analysis for Applications. Academic Press: New York.
Gordon, A. D. (1999). Classification. Second Edition. London: Chapman and Hall / CRC
Murtagh, F. (1985). “Multidimensional Clustering Algorithms”, in COMPSTAT Lectures 4. Wuerzburg: Physica-Verlag (for algorithmic details of algorithms used).
data(USArrests) hc <- hclustglad(dist(USArrests), "ave") plot(hc) plot(hc, hang = -1) ## Do the same with centroid clustering and squared Euclidean distance, ## cut the tree into ten clusters and reconstruct the upper part of the ## tree from the cluster centers. hc <- hclustglad(dist(USArrests)^2, "cen") memb <- cutree(hc, k = 10) cent <- NULL for(k in 1:10){ cent <- rbind(cent, colMeans(USArrests[memb == k, , drop = FALSE])) } hc1 <- hclustglad(dist(cent)^2, method = "cen", members = table(memb)) opar <- par(mfrow = c(1, 2)) plot(hc, labels = FALSE, hang = -1, main = "Original Tree") plot(hc1, labels = FALSE, hang = -1, main = "Re-start from 10 clusters") par(opar)
data(USArrests) hc <- hclustglad(dist(USArrests), "ave") plot(hc) plot(hc, hang = -1) ## Do the same with centroid clustering and squared Euclidean distance, ## cut the tree into ten clusters and reconstruct the upper part of the ## tree from the cluster centers. hc <- hclustglad(dist(USArrests)^2, "cen") memb <- cutree(hc, k = 10) cent <- NULL for(k in 1:10){ cent <- rbind(cent, colMeans(USArrests[memb == k, , drop = FALSE])) } hc1 <- hclustglad(dist(cent)^2, method = "cen", members = table(memb)) opar <- par(mfrow = c(1, 2)) plot(hc, labels = FALSE, hang = -1, main = "Original Tree") plot(hc1, labels = FALSE, hang = -1, main = "Re-start from 10 clusters") par(opar)
Kernel function used in the penalty term.
kernelpen(x, type="tricubic", param)
kernelpen(x, type="tricubic", param)
x |
Real Value. |
type |
Type of kernelpen to be used |
param |
a named vector. |
The only kernel available is the "tricubic" kernel which takes
the values . The value of d is given by
param=c(d=6)
for example.
People interested in tools dealing with array CGH analysis can visit our web-page http://bioinfo.curie.fr.
Philippe Hupé, [email protected]
This function returns a vector of color names corresponding to a range of colors specified in the arguments.
myPalette(low = "white", high = c("green", "red"), mid=NULL, k =50)
myPalette(low = "white", high = c("green", "red"), mid=NULL, k =50)
low |
Color for the lower end of the color palette, specified using any of the three kinds of R colors, i.e., either a color name (an element of |
high |
Color for the upper end of the color palette, specified
using any of the three kinds of R colors, i.e., either a color name
(an element of |
mid |
Color for the middle portion of the color palette, specified using any of the three kinds of R colors, i.e., either a color name (an element of |
k |
Number of colors in the palette. |
A "character" vector of color names. This can be used to create a user-defined color palette for subsequent graphics by palette
, in a col=
specification in graphics functions, or in par
.
Sandrine Dudoit, Yee Hwa (Jean) Yang.
palette
, rgb
,
colors
, col2rgb
, image
, ColorBar
, arrayPlot
.
par(mfrow=c(1,4)) pal <- myPalette(low="red", high="green") ColorBar(seq(-2,2, 0.2), col=pal, horizontal=FALSE, k=21) pal <- myPalette(low="red", high="green", mid="yellow") ColorBar(seq(-2,2, 0.2), col=pal, horizontal=FALSE, k=21) pal <- myPalette() ColorBar(seq(-2,2, 0.2), col=pal, horizontal=FALSE, k=21) pal <- myPalette(low="purple", high="purple",mid="white") ColorBar(seq(-2,2, 0.2), col=pal, horizontal=FALSE, k=21)
par(mfrow=c(1,4)) pal <- myPalette(low="red", high="green") ColorBar(seq(-2,2, 0.2), col=pal, horizontal=FALSE, k=21) pal <- myPalette(low="red", high="green", mid="yellow") ColorBar(seq(-2,2, 0.2), col=pal, horizontal=FALSE, k=21) pal <- myPalette() ColorBar(seq(-2,2, 0.2), col=pal, horizontal=FALSE, k=21) pal <- myPalette(low="purple", high="purple",mid="white") ColorBar(seq(-2,2, 0.2), col=pal, horizontal=FALSE, k=21)
Plot genomic profile with breakpoints, outliers, smoothing line and cytogenetic banding.
## S3 method for class 'profileCGH' plotProfile(profileCGH, variable="LogRatio", Chromosome=NULL, Smoothing=NULL, GNL="ZoneGNL", Bkp=FALSE, labels=TRUE, plotband=TRUE, unit=0, colDAGLAD=c("black","blue","red","green","yellow"), pchSymbol=c(20,13), colCytoBand=c("white","darkblue"), colCentro="red", text=NULL, cytoband = NULL, main="", ylim=NULL, ...)
## S3 method for class 'profileCGH' plotProfile(profileCGH, variable="LogRatio", Chromosome=NULL, Smoothing=NULL, GNL="ZoneGNL", Bkp=FALSE, labels=TRUE, plotband=TRUE, unit=0, colDAGLAD=c("black","blue","red","green","yellow"), pchSymbol=c(20,13), colCytoBand=c("white","darkblue"), colCentro="red", text=NULL, cytoband = NULL, main="", ylim=NULL, ...)
profileCGH |
Object of class |
variable |
The variable to be plot. |
Chromosome |
A numeric vector with chromosome number to be
plotted. Use 23 and 24 for chromosome X and Y respectively. If
|
Smoothing |
The variable used to plot the smoothing line. If
|
GNL |
The variable used to plot the Gain, Normal and Loss color code. |
Bkp |
If |
labels |
If |
plotband |
If |
unit |
Give the unit of the PosBase. For example if |
colDAGLAD |
Color code to plot Deletion, Amplification, Gain, Lost and Normal status. |
pchSymbol |
A vector of two elements to specify the symbol tu be used for plotting point. pchSymbol[2] is the symbol for outliers. |
colCytoBand |
Color code for cytogenetic banding. |
colCentro |
Color code for centromere. |
text |
A list with the parameters to be passed to the function
|
cytoband |
cytodand data. For human, cytoband data are avaibale using data(cytoband). |
main |
title of the plot. |
ylim |
range of the y-axis |
... |
... |
" "
A plot
People interested in tools dealing with array CGH analysis can visit our web-page http://bioinfo.curie.fr.
Philippe Hupé, [email protected].
" "
### Cytogenetic banding information data(cytoband) ### data(snijders) ### Creation of "profileCGH" object profileCGH <- as.profileCGH(gm13330) ########################################################### ### ### glad function as described in Hupé et al. (2004) ### ########################################################### res <- glad(profileCGH, mediancenter=FALSE, smoothfunc="lawsglad", bandwidth=10, round=2, model="Gaussian", lkern="Exponential", qlambda=0.999, base=FALSE, lambdabreak=8, lambdacluster=8, lambdaclusterGen=40, type="tricubic", param=c(d=6), alpha=0.001, msize=5, method="centroid", nmax=8, verbose=FALSE) ### cytoband data to plot chromosome data(cytoband) ### Genomic profile on the whole genome plotProfile(res, unit=3, Bkp=TRUE, labels=FALSE, Smoothing="Smoothing", plotband=FALSE, cytoband = cytoband) ### Genomic profile on the whole genome and cytogenetic banding plotProfile(res, unit=3, Bkp=TRUE, labels=FALSE, Smoothing="Smoothing", cytoband = cytoband) ### Genomic profile for chromosome 1 text <- list(x=c(90000,200000),y=c(0.15,0.3),labels=c("NORMAL","GAIN"), cex=2) plotProfile(res, unit=3, Bkp=TRUE, labels=TRUE, Chromosome=1, Smoothing="Smoothing", plotband=FALSE, text=text, cytoband = cytoband) ### Genomic profile for chromosome 1 and cytogenetic banding with labels text <- list(x=c(90000,200000),y=c(0.15,0.3),labels=c("NORMAL","GAIN"), cex=2) plotProfile(res, unit=3, Bkp=TRUE, labels=TRUE, Chromosome=1, Smoothing="Smoothing", text=text, main="Chromosome 1", cytoband = cytoband)
### Cytogenetic banding information data(cytoband) ### data(snijders) ### Creation of "profileCGH" object profileCGH <- as.profileCGH(gm13330) ########################################################### ### ### glad function as described in Hupé et al. (2004) ### ########################################################### res <- glad(profileCGH, mediancenter=FALSE, smoothfunc="lawsglad", bandwidth=10, round=2, model="Gaussian", lkern="Exponential", qlambda=0.999, base=FALSE, lambdabreak=8, lambdacluster=8, lambdaclusterGen=40, type="tricubic", param=c(d=6), alpha=0.001, msize=5, method="centroid", nmax=8, verbose=FALSE) ### cytoband data to plot chromosome data(cytoband) ### Genomic profile on the whole genome plotProfile(res, unit=3, Bkp=TRUE, labels=FALSE, Smoothing="Smoothing", plotband=FALSE, cytoband = cytoband) ### Genomic profile on the whole genome and cytogenetic banding plotProfile(res, unit=3, Bkp=TRUE, labels=FALSE, Smoothing="Smoothing", cytoband = cytoband) ### Genomic profile for chromosome 1 text <- list(x=c(90000,200000),y=c(0.15,0.3),labels=c("NORMAL","GAIN"), cex=2) plotProfile(res, unit=3, Bkp=TRUE, labels=TRUE, Chromosome=1, Smoothing="Smoothing", plotband=FALSE, text=text, cytoband = cytoband) ### Genomic profile for chromosome 1 and cytogenetic banding with labels text <- list(x=c(90000,200000),y=c(0.15,0.3),labels=c("NORMAL","GAIN"), cex=2) plotProfile(res, unit=3, Bkp=TRUE, labels=TRUE, Chromosome=1, Smoothing="Smoothing", text=text, main="Chromosome 1", cytoband = cytoband)
Description of the objects profileCGH
and profileChr
. The
last object corresponds to data of only one chromosome.
LogRatio, Chromosome
and PosOrder
are compulsory.
Objects profileCGH
and profileChr
are composed of a list with the first element
profileValues
which is a data.frame
with the following columns
names:
LogRatio |
Test over Reference log-ratio. |
PosOrder |
The rank position of each clone on the genome. |
PosBase |
The base position of each clone on the genome. |
Chromosome |
Chromosome name. |
Clone |
The name of the corresponding clone. |
... |
Other elements can be added. |
People interested in tools dealing with array CGH analysis can visit our web-page http://bioinfo.curie.fr.
Philippe Hupé, [email protected].
data(snijders) gm13330$Clone <- gm13330$BAC profileCGH <- as.profileCGH(gm13330) class(profileCGH) <- "profileCGH" profileChr <- as.profileCGH(gm13330[which(gm13330$Chromosome==1),]) class(profileChr) <- "profileChr"
data(snijders) gm13330$Clone <- gm13330$BAC profileCGH <- as.profileCGH(gm13330) class(profileCGH) <- "profileCGH" profileChr <- as.profileCGH(gm13330[which(gm13330$Chromosome==1),]) class(profileChr) <- "profileChr"
The data consist of 15 human cell strains with known karyotype (12 fibroblast cell strains, 2 chorionic villus cell strains, 1 lymploblast cell strain) from the NIGMS Human Genetics Cell Repository (http://locus.umdnj.edu/nigms). Each cell strain has been hybridized onto a CGH-array of 2276 BAC's spotted in triplicate.
data(snijders)
data(snijders)
http://www.nature.com/ng/journal/v29/n3/suppinfo/ng754_S1.html
A M Snijders, N Nowak, R Segraves, S Blackwood, N Brown, J Conroy, G Hamilton, A K Hindle, B Huey, K Kimura, S Law, K Myambo, J Palmer, B Ylstra, J P Yue, J W Gray, A N Jain, D Pinkel & D G Albertson , Assembly of microarrays for genome-wide measurement of DNA copy number, Nature Genetics 29, pp 263 - 264 (2001) Brief Communications.
data(snijders) array <- gm13330
data(snijders) array <- gm13330
The data consist of 2 bladder cancer tumors obtained by Veltman et al (2003).
data(veltman)
data(veltman)
http://cancerres.aacrjournals.org/cgi/content/full/63/11/2872
Joris A. Veltman, Jane Fridlyand, Sunanda Pejavar, Adam B. Olshen, James E. Korkola, Sandy DeVries, Peter Carroll, Wen-Lin Kuo, Daniel Pinkel, Donna Albertson, Carlos Cordon-Cardo, Ajay N. Jain and Frederic M. Waldman. Array-based Comparative Genomic Hybridization for Genome-Wide Screening of DNA Copy Number in Bladder Tumors. Cancer Research 63, 2872-2880, 2003.
data(veltman) P9
data(veltman) P9