Title: | Copy Number Variant Metrics |
---|---|
Description: | The CNVMetrics package calculates similarity metrics to facilitate copy number variant comparison among samples and/or methods. Similarity metrics can be employed to compare CNV profiles of genetically unrelated samples as well as those with a common genetic background. Some metrics are based on the shared amplified/deleted regions while other metrics rely on the level of amplification/deletion. The data type used as input is a plain text file containing the genomic position of the copy number variations, as well as the status and/or the log2 ratio values. Finally, a visualization tool is provided to explore resulting metrics. |
Authors: | Astrid Deschênes [aut, cre] , Pascal Belleau [aut] , David A. Tuveson [aut] , Alexander Krasnitz [aut] |
Maintainer: | Astrid Deschênes <[email protected]> |
License: | Artistic-2.0 |
Version: | 1.11.0 |
Built: | 2024-10-30 05:17:40 UTC |
Source: | https://github.com/bioc/CNVMetrics |
The CNVMetrics package calculates similarity metrics to facilitate copy number variant comparison among samples and/or methods. Similarity metrics can be employed to compare CNV profiles of genetically unrelated samples as well as those with a common genetic background. Some metrics are based on the shared amplified/deleted regions while other metrics rely on the level of amplification/deletion. The data type used as input is a plain text file containing the genomic position of the copy number variations, as well as the status and/or the log2 ratio values. Finally, a visualization tool is provided to explore resulting metrics.
Astrid Deschênes, Pascal Belleau, David A. Tuveson and Alexander Krasnitz
Maintainer: Astrid Deschênes <[email protected]>
calculateOverlapMetric
for calculating metric
using overlapping amplified/deleted regions
calculateLog2ratioMetric
for calculating metric
using log2ratio values
processSim
for generating simulations
plotMetric
for plotting metrics
This function calculates a specific metric, as specified by the user, using overlapping amplified/deleted regions between to samples. The metric is calculated for the amplified and deleted regions separately. When more than 2 samples are present, the metric is calculated for each sample pair.
calculateLog2ratioMetric( segmentData, method = c("weightedEuclideanDistance"), minThreshold = 0.2, excludedRegions = NULL, nJobs = 1 )
calculateLog2ratioMetric( segmentData, method = c("weightedEuclideanDistance"), minThreshold = 0.2, excludedRegions = NULL, nJobs = 1 )
segmentData |
a |
method |
a |
minThreshold |
a single positive |
excludedRegions |
an optional |
nJobs |
a single positive |
The weighted euclidean distance is
where
x
and y
are the
values of 2 samples for a specific segment i
and nbrBases
the
number of bases of the segment i
.
an object of class "CNVMetric
" which contains the calculated
metric. This object is a list with the following components:
LOG2RATIO
a lower-triangular matrix
with the
results of the selected metric on the log2ratio values for each paired
samples. The value NA
is present when the metric cannot be
calculated. The value NA
is also present in the top-triangular
section, as well as the diagonal, of the matrix.
The object has the following attributes (besides "class" equal to "CNVMetric"):
metric
the metric used for the calculation.
names
the names of the two matrix containing the metrics for
the amplified and deleted regions.
Astrid Deschênes, Pascal Belleau
## Load required package to generate the samples require(GenomicRanges) ## Create a GRangesList object with 3 samples ## The stand of the regions doesn't affect the calculation of the metric demo <- GRangesList() demo[["sample01"]] <- GRanges(seqnames="chr1", ranges=IRanges(start=c(1905048, 4554832, 31686841), end=c(2004603, 4577608, 31695808)), strand="*", log2ratio=c(2.5555, 1.9932, -0.9999)) demo[["sample02"]] <- GRanges(seqnames="chr1", ranges=IRanges(start=c(1995066, 31611222, 31690000), end=c(2204505, 31689898, 31895666)), strand=c("-", "+", "+"), log2ratio=c(0.3422, 0.5454, -1.4444)) ## The amplified region in sample03 is a subset of the amplified regions ## in sample01 demo[["sample03"]] <- GRanges(seqnames="chr1", ranges=IRanges(start=c(1906069, 4558838), end=c(1909505, 4570601)), strand="*", log2ratio=c(3.2222, -1.3232)) ## Calculating Sorensen metric calculateLog2ratioMetric(demo, method="weightedEuclideanDistance", nJobs=1)
## Load required package to generate the samples require(GenomicRanges) ## Create a GRangesList object with 3 samples ## The stand of the regions doesn't affect the calculation of the metric demo <- GRangesList() demo[["sample01"]] <- GRanges(seqnames="chr1", ranges=IRanges(start=c(1905048, 4554832, 31686841), end=c(2004603, 4577608, 31695808)), strand="*", log2ratio=c(2.5555, 1.9932, -0.9999)) demo[["sample02"]] <- GRanges(seqnames="chr1", ranges=IRanges(start=c(1995066, 31611222, 31690000), end=c(2204505, 31689898, 31895666)), strand=c("-", "+", "+"), log2ratio=c(0.3422, 0.5454, -1.4444)) ## The amplified region in sample03 is a subset of the amplified regions ## in sample01 demo[["sample03"]] <- GRanges(seqnames="chr1", ranges=IRanges(start=c(1906069, 4558838), end=c(1909505, 4570601)), strand="*", log2ratio=c(3.2222, -1.3232)) ## Calculating Sorensen metric calculateLog2ratioMetric(demo, method="weightedEuclideanDistance", nJobs=1)
This function calculates a specific metric, as specified by the user, using overlapping regions of specific state between to samples. The metric is calculated for each state separately. When more than 2 samples are present, the metric is calculated for each sample pair. By default, the function is calculating metrics for the AMPLIFICATION and DELETION states. However, the user can specify the list of states to be analyzed.
calculateOverlapMetric( segmentData, states = c("AMPLIFICATION", "DELETION"), method = c("sorensen", "szymkiewicz", "jaccard"), nJobs = 1 )
calculateOverlapMetric( segmentData, states = c("AMPLIFICATION", "DELETION"), method = c("sorensen", "szymkiewicz", "jaccard"), nJobs = 1 )
segmentData |
a |
states |
a |
method |
a |
nJobs |
a single positive |
The two methods each estimate the overlap between paired samples. They use
different metrics, all in the range [0, 1] with 0 indicating no overlap.
The NA
is used when the metric cannot be calculated.
The available metrics are (written for two GRanges):
sorensen
:
This metric is calculated by dividing twice the size of the intersection by the sum of the size of the two sets. With this metric, an overlap metric value of 1 is only obtained when the two samples are identical.
szymkiewicz
:
This metric is calculated by dividing the size of the intersection by the size of the smallest set. With this metric, if one set is a subset of the other set, the overlap metric value is 1.
jaccard
:
This metric is calculated by dividing the size of the intersection by the size of the union of the two sets. With this metric, an overlap metric value of 1 is only obtained when the two samples are identical.
an object of class "CNVMetric
" which contains the calculated
metric. This object is a list where each entry corresponds to one state
specified in the 'states
' parameter. Each entry is a matrix
:
state
a lower-triangular matrix
with the
results of the selected metric on the amplified regions for each paired
samples. The value NA
is present when the metric cannot be
calculated. The value NA
is also present in the top-triangular
section, as well as the diagonal, of the matrix.
The object has the following attributes (besides "class" equal to "CNVMetric"):
metric
the metric used for the calculation.
names
the names of the two matrix containing the metrics for
the amplified and deleted regions.
Astrid Deschênes, Pascal Belleau
Sørensen, Thorvald. n.d. “A Method of Establishing Groups of Equal Amplitude in Plant Sociology Based on Similarity of Species and Its Application to Analyses of the Vegetation on Danish Commons.” Biologiske Skrifter, no. 5: 1–34.
Vijaymeena, M. K, and Kavitha K. 2016. “A Survey on Similarity Measures in Text Mining.” Machine Learning and Applications: An International Journal 3 (1): 19–28. doi: https://doi.org/10.5121/mlaij.2016.3103
Jaccard, P. (1912), The Distribution of the Flora in the Alpine Zone. New Phytologist, 11: 37-50. doi: https://doi.org/10.1111/j.1469-8137.1912.tb05611.x
## Load required package to generate the samples require(GenomicRanges) ## Create a GRangesList object with 3 samples ## The stand of the regions doesn't affect the calculation of the metric demo <- GRangesList() demo[["sample01"]] <- GRanges(seqnames="chr1", ranges=IRanges(start=c(1905048, 4554832, 31686841, 32686222), end=c(2004603, 4577608, 31695808, 32689222)), strand="*", state=c("AMPLIFICATION", "AMPLIFICATION", "DELETION", "LOH")) demo[["sample02"]] <- GRanges(seqnames="chr1", ranges=IRanges(start=c(1995066, 31611222, 31690000, 32006222), end=c(2204505, 31689898, 31895666, 32789233)), strand=c("-", "+", "+", "+"), state=c("AMPLIFICATION", "AMPLIFICATION", "DELETION", "LOH")) ## The amplified region in sample03 is a subset of the amplified regions ## in sample01 demo[["sample03"]] <- GRanges(seqnames="chr1", ranges=IRanges(start=c(1906069, 4558838), end=c(1909505, 4570601)), strand="*", state=c("AMPLIFICATION", "DELETION")) ## Calculating Sorensen metric for both AMPLIFICATION and DELETION calculateOverlapMetric(demo, method="sorensen", nJobs=1) ## Calculating Szymkiewicz-Simpson metric on AMPLIFICATION only calculateOverlapMetric(demo, states="AMPLIFICATION", method="szymkiewicz", nJobs=1) ## Calculating Jaccard metric on LOH only calculateOverlapMetric(demo, states="LOH", method="jaccard", nJobs=1)
## Load required package to generate the samples require(GenomicRanges) ## Create a GRangesList object with 3 samples ## The stand of the regions doesn't affect the calculation of the metric demo <- GRangesList() demo[["sample01"]] <- GRanges(seqnames="chr1", ranges=IRanges(start=c(1905048, 4554832, 31686841, 32686222), end=c(2004603, 4577608, 31695808, 32689222)), strand="*", state=c("AMPLIFICATION", "AMPLIFICATION", "DELETION", "LOH")) demo[["sample02"]] <- GRanges(seqnames="chr1", ranges=IRanges(start=c(1995066, 31611222, 31690000, 32006222), end=c(2204505, 31689898, 31895666, 32789233)), strand=c("-", "+", "+", "+"), state=c("AMPLIFICATION", "AMPLIFICATION", "DELETION", "LOH")) ## The amplified region in sample03 is a subset of the amplified regions ## in sample01 demo[["sample03"]] <- GRanges(seqnames="chr1", ranges=IRanges(start=c(1906069, 4558838), end=c(1909505, 4570601)), strand="*", state=c("AMPLIFICATION", "DELETION")) ## Calculating Sorensen metric for both AMPLIFICATION and DELETION calculateOverlapMetric(demo, method="sorensen", nJobs=1) ## Calculating Szymkiewicz-Simpson metric on AMPLIFICATION only calculateOverlapMetric(demo, states="AMPLIFICATION", method="szymkiewicz", nJobs=1) ## Calculating Jaccard metric on LOH only calculateOverlapMetric(demo, states="LOH", method="jaccard", nJobs=1)
CNVMetric
Functions to test inheritance relationships between an
object and class CNVMetric
.
## S3 method for class 'CNVMetric' is(x, ...)
## S3 method for class 'CNVMetric' is(x, ...)
x |
an object. |
... |
further arguments passed to or from other methods. |
a logical
.
CNVMetric
objectThis function plots one heatmap (or two heatmaps) of the
metrics present in
a CNVMetric
object. For the overlapping metrics, the user can select
to print the heatmap related to amplified or deleted regions or both. The
NA
values present in the metric matrix are transformed into zero for
the creation of the heatmap.
plotMetric( metric, type = "ALL", colorRange = c(c("white", "darkblue")), show_colnames = FALSE, silent = TRUE, ... )
plotMetric( metric, type = "ALL", colorRange = c(c("white", "darkblue")), show_colnames = FALSE, silent = TRUE, ... )
metric |
a |
type |
a single |
colorRange |
a |
show_colnames |
a |
silent |
a |
... |
further arguments passed to
|
a gtable
object containing the heatmap(s) of the specified
metric(s).
Astrid Deschênes
The default method pheatmap::pheatmap()
.
## Load required package to generate the samples require(GenomicRanges) ## Create a GRangesList object with 3 samples ## The stand of the regions doesn't affect the calculation of the metric demo <- GRangesList() demo[["sample01"]] <- GRanges(seqnames="chr1", ranges=IRanges(start=c(1905048, 4554832, 31686841), end=c(2004603, 4577608, 31695808)), strand="*", state=c("AMPLIFICATION", "AMPLIFICATION", "DELETION")) demo[["sample02"]] <- GRanges(seqnames="chr1", ranges=IRanges(start=c(1995066, 31611222, 31690000), end=c(2204505, 31689898, 31895666)), strand=c("-", "+", "+"), state=c("AMPLIFICATION", "AMPLIFICATION", "DELETION")) ## The amplified region in sample03 is a subset of the amplified regions ## in sample01 demo[["sample03"]] <- GRanges(seqnames="chr1", ranges=IRanges(start=c(1906069, 4558838), end=c(1909505, 4570601)), strand="*", state=c("AMPLIFICATION", "DELETION")) ## Calculating Sorensen metric metric <- calculateOverlapMetric(demo, method="sorensen") ## Plot both amplification and deletion metrics plotMetric(metric, type="ALL") ## Extra parameters, used by pheatmap(), can also be passed to the function ## Here, we have the metric values print to the cell while the ## row names and column names are removed plotMetric(metric, type="DELETION", show_rownames=FALSE, show_colnames=FALSE, main="deletion", display_numbers=TRUE, number_format="%.2f")
## Load required package to generate the samples require(GenomicRanges) ## Create a GRangesList object with 3 samples ## The stand of the regions doesn't affect the calculation of the metric demo <- GRangesList() demo[["sample01"]] <- GRanges(seqnames="chr1", ranges=IRanges(start=c(1905048, 4554832, 31686841), end=c(2004603, 4577608, 31695808)), strand="*", state=c("AMPLIFICATION", "AMPLIFICATION", "DELETION")) demo[["sample02"]] <- GRanges(seqnames="chr1", ranges=IRanges(start=c(1995066, 31611222, 31690000), end=c(2204505, 31689898, 31895666)), strand=c("-", "+", "+"), state=c("AMPLIFICATION", "AMPLIFICATION", "DELETION")) ## The amplified region in sample03 is a subset of the amplified regions ## in sample01 demo[["sample03"]] <- GRanges(seqnames="chr1", ranges=IRanges(start=c(1906069, 4558838), end=c(1909505, 4570601)), strand="*", state=c("AMPLIFICATION", "DELETION")) ## Calculating Sorensen metric metric <- calculateOverlapMetric(demo, method="sorensen") ## Plot both amplification and deletion metrics plotMetric(metric, type="ALL") ## Extra parameters, used by pheatmap(), can also be passed to the function ## Here, we have the metric values print to the cell while the ## row names and column names are removed plotMetric(metric, type="DELETION", show_rownames=FALSE, show_colnames=FALSE, main="deletion", display_numbers=TRUE, number_format="%.2f")
Print a CNVMetric
object and returns it
invisibly
.
## S3 method for class 'CNVMetric' print(x, ...)
## S3 method for class 'CNVMetric' print(x, ...)
x |
the output object from |
... |
further arguments passed to or from other methods. |
the argument x
.
The default method print.default
.
The function uses the input sample to simulate new samples. The simulated samples will possess similar sizes of events, proportional to the original chromosome. To generate realistic simulations, the specified sample must contain segments covering the majority of the genome. Most importantly, the NEUTRAL segments should be present.
processSim(curSample, nbSim)
processSim(curSample, nbSim)
curSample |
a |
nbSim |
a single positive |
TODO
a data.frame
containing the segments for each
simulated sample. The data.frame
has 6 columns:
ID
a character
string, the name of the simulated
sample
chr
a character
string, the name fo the chromosome
start
a integer
, the starting position of the
segment
end
a integer
, the ending position of the segment
log2ratio
a numerical
, the log2 copy number
ratio assigned to the segment
state
a character
string, the state of the segment
(ex: DELETION, AMPLIFICATION, NEUTRAL, etc.)
Astrid Deschênes, Pascal Belleau
## Load required package to generate the sample require(GenomicRanges) ## Create one 'demo' genome with 2 chromosomes and few segments ## The stand of the regions doesn't affect the calculation of the metric sample01 <- GRanges(seqnames=c(rep("chr1", 4), rep("chr2", 3)), ranges=IRanges(start=c(1905048, 4554832, 31686841, 32686222, 1, 120331, 725531), end=c(2004603, 4577608, 31695808, 32689222, 117121, 325555, 1225582)), strand="*", state=c("AMPLIFICATION", "NEUTRAL", "DELETION", "LOH", "DELETION", "NEUTRAL", "NEUTRAL"), log2ratio=c(0.5849625, 0, -1, -1, -0.87777, 0, 0)) ## Generates 10 simulated genomes based on the 'demo' genome simRes <- processSim(curSample=sample01, nbSim=10)
## Load required package to generate the sample require(GenomicRanges) ## Create one 'demo' genome with 2 chromosomes and few segments ## The stand of the regions doesn't affect the calculation of the metric sample01 <- GRanges(seqnames=c(rep("chr1", 4), rep("chr2", 3)), ranges=IRanges(start=c(1905048, 4554832, 31686841, 32686222, 1, 120331, 725531), end=c(2004603, 4577608, 31695808, 32689222, 117121, 325555, 1225582)), strand="*", state=c("AMPLIFICATION", "NEUTRAL", "DELETION", "LOH", "DELETION", "NEUTRAL", "NEUTRAL"), log2ratio=c(0.5849625, 0, -1, -1, -0.87777, 0, 0)) ## Generates 10 simulated genomes based on the 'demo' genome simRes <- processSim(curSample=sample01, nbSim=10)