Title: | Analysis of Copy Number Variation in Single-Cell-Sequencing Data |
---|---|
Description: | AneuFinder implements functions for copy-number detection, breakpoint detection, and karyotype and heterogeneity analysis in single-cell whole genome sequencing and strand-seq data. |
Authors: | Aaron Taudt, Bjorn Bakker, David Porubsky |
Maintainer: | Aaron Taudt <[email protected]> |
License: | Artistic-2.0 |
Version: | 1.35.0 |
Built: | 2024-12-31 12:03:49 UTC |
Source: | https://github.com/bioc/AneuFinder |
CNV detection in whole-genome single cell sequencing (WGSCS) and Strand-seq data using a Hidden Markov Model. The package implements CNV detection, commonly used plotting functions, export to BED format for upload to genome browsers, and measures for assessment of karyotype heterogeneity and quality metrics.
The main function of this package is Aneufinder
and produces several plots and browser files. If you want to have more fine-grained control over the different steps (binning, GC-correction, HMM, plotting) check the vignette Introduction to AneuFinder.
Aaron Taudt, David Porubsky
The aneuBiHMM
object is output of the function findCNVs.strandseq
and is basically a list with various entries. The class() attribute of this list was set to "aneuBiHMM". For a given hmm, the entries can be accessed with the list operators 'hmm[[]]' and 'hmm$'.
ID |
An identifier that is used in various AneuFinder functions. |
bins |
A GRanges-class object containing the genomic bin coordinates, their read count and state classification. |
segments |
A GRanges-class object containing regions and their state classification. |
weights |
Weight for each component. |
transitionProbs |
Matrix of transition probabilities from each state (row) into each state (column). |
transitionProbs.initial |
Initial |
startProbs |
Probabilities for the first bin |
startProbs.initial |
Initial |
distributions |
Estimated parameters of the emission distributions. |
distributions.initial |
Distribution parameters at the beginning of the Baum-Welch. |
convergenceInfo |
Contains information about the convergence of the Baum-Welch algorithm. |
convergenceInfo$eps |
Convergence threshold for the Baum-Welch. |
convergenceInfo$loglik |
Final loglikelihood after the last iteration. |
convergenceInfo$loglik.delta |
Change in loglikelihood after the last iteration (should be smaller than |
convergenceInfo$num.iterations |
Number of iterations that the Baum-Welch needed to converge to the desired |
convergenceInfo$time.sec |
Time in seconds that the Baum-Welch needed to converge to the desired |
findCNVs.strandseq
AneuFinder
packageThis function is an easy-to-use wrapper to bin the data, find copy-number-variations, locate breakpoints, plot genomewide heatmaps, distributions, profiles and karyograms.
Aneufinder(inputfolder, outputfolder, configfile = NULL, numCPU = 1, reuse.existing.files = TRUE, binsizes = 1e+06, stepsizes = binsizes, variable.width.reference = NULL, reads.per.bin = NULL, pairedEndReads = FALSE, assembly = NULL, chromosomes = NULL, remove.duplicate.reads = TRUE, min.mapq = 10, blacklist = NULL, use.bamsignals = FALSE, reads.store = FALSE, correction.method = NULL, GC.BSgenome = NULL, method = c("edivisive"), strandseq = FALSE, R = 10, sig.lvl = 0.1, eps = 0.01, max.time = 60, max.iter = 5000, num.trials = 15, states = c("zero-inflation", paste0(0:10, "-somy")), confint = NULL, refine.breakpoints = FALSE, hotspot.bandwidth = NULL, hotspot.pval = 0.05, cluster.plots = TRUE)
Aneufinder(inputfolder, outputfolder, configfile = NULL, numCPU = 1, reuse.existing.files = TRUE, binsizes = 1e+06, stepsizes = binsizes, variable.width.reference = NULL, reads.per.bin = NULL, pairedEndReads = FALSE, assembly = NULL, chromosomes = NULL, remove.duplicate.reads = TRUE, min.mapq = 10, blacklist = NULL, use.bamsignals = FALSE, reads.store = FALSE, correction.method = NULL, GC.BSgenome = NULL, method = c("edivisive"), strandseq = FALSE, R = 10, sig.lvl = 0.1, eps = 0.01, max.time = 60, max.iter = 5000, num.trials = 15, states = c("zero-inflation", paste0(0:10, "-somy")), confint = NULL, refine.breakpoints = FALSE, hotspot.bandwidth = NULL, hotspot.pval = 0.05, cluster.plots = TRUE)
inputfolder |
Folder with either BAM or BED files. |
outputfolder |
Folder to output the results. If it does not exist it will be created. |
configfile |
A file specifying the parameters of this function (without |
numCPU |
The numbers of CPUs that are used. Should not be more than available on your machine. |
reuse.existing.files |
A logical indicating whether or not existing files in |
binsizes |
An integer vector with bin sizes. If more than one value is given, output files will be produced for each bin size. |
stepsizes |
A vector of step sizes the same length as |
variable.width.reference |
A BAM file that is used as reference to produce variable width bins. See |
reads.per.bin |
Approximate number of desired reads per bin. The bin size will be selected accordingly. Output files are produced for each value. |
pairedEndReads |
Set to |
assembly |
Please see |
chromosomes |
If only a subset of the chromosomes should be imported, specify them here. |
remove.duplicate.reads |
A logical indicating whether or not duplicate reads should be removed. |
min.mapq |
Minimum mapping quality when importing from BAM files. Set |
blacklist |
A |
use.bamsignals |
If |
reads.store |
Set |
correction.method |
Correction methods to be used for the binned read counts. Currently only |
GC.BSgenome |
A |
method |
Any combination of |
strandseq |
A logical indicating whether the data comes from Strand-seq experiments. If |
R |
method-edivisive: The maximum number of random permutations to use in each iteration of the permutation test (see |
sig.lvl |
method-edivisive: The level at which to sequentially test if a proposed change point is statistically significant (see |
eps |
method-HMM: Convergence threshold for the Baum-Welch algorithm. |
max.time |
method-HMM: The maximum running time in seconds for the Baum-Welch algorithm. If this time is reached, the Baum-Welch will terminate after the current iteration finishes. Set |
max.iter |
method-HMM: The maximum number of iterations for the Baum-Welch algorithm. Set |
num.trials |
method-HMM: The number of trials to find a fit where state |
states |
method-HMM: A subset or all of |
confint |
Desired confidence interval for breakpoints. Set |
refine.breakpoints |
A logical indicating whether breakpoints from the HMM should be refined with read-level information. |
hotspot.bandwidth |
A vector the same length as |
hotspot.pval |
P-value for breakpoint hotspot detection (see |
cluster.plots |
A logical indicating whether plots should be clustered by similarity. |
NULL
Aaron Taudt
## Not run: ## The following call produces plots and genome browser files for all BAM files in "my-data-folder" Aneufinder(inputfolder="my-data-folder", outputfolder="my-output-folder") ## End(Not run)
## Not run: ## The following call produces plots and genome browser files for all BAM files in "my-data-folder" Aneufinder(inputfolder="my-data-folder", outputfolder="my-output-folder") ## End(Not run)
The aneuHMM
object is output of the function findCNVs
and is basically a list with various entries. The class() attribute of this list was set to "aneuHMM". For a given hmm, the entries can be accessed with the list operators 'hmm[[]]' and 'hmm$'.
ID |
An identifier that is used in various AneuFinder functions. |
bins |
A GRanges-class object containing the genomic bin coordinates, their read count and state classification. |
segments |
A GRanges-class object containing regions and their state classification. |
weights |
Weight for each component. |
transitionProbs |
Matrix of transition probabilities from each state (row) into each state (column). |
transitionProbs.initial |
Initial |
startProbs |
Probabilities for the first bin |
startProbs.initial |
Initial |
distributions |
Estimated parameters of the emission distributions. |
distributions.initial |
Distribution parameters at the beginning of the Baum-Welch. |
convergenceInfo |
Contains information about the convergence of the Baum-Welch algorithm. |
convergenceInfo$eps |
Convergence threshold for the Baum-Welch. |
convergenceInfo$loglik |
Final loglikelihood after the last iteration. |
convergenceInfo$loglik.delta |
Change in loglikelihood after the last iteration (should be smaller than |
convergenceInfo$num.iterations |
Number of iterations that the Baum-Welch needed to converge to the desired |
convergenceInfo$time.sec |
Time in seconds that the Baum-Welch needed to converge to the desired |
findCNVs
Annotate breakpoints as sister-chromatid-exchange (SCE), copy-number-breakpoint (CNB).
annotateBreakpoints(breakpoints)
annotateBreakpoints(breakpoints)
breakpoints |
A |
The input GRanges-class
with additinal column 'type'.
## Get an example BED file with single-cell-sequencing reads bedfile <- system.file("extdata", "KK150311_VI_07.bam.bed.gz", package="AneuFinderData") ## Bin the data into bin size 1Mp readfragments <- binReads(bedfile, assembly='mm10', binsize=1e6, chromosomes=c(1:19,'X','Y'), reads.return=TRUE) binned <- binReads(bedfile, assembly='mm10', binsize=1e6, chromosomes=c(1:19,'X','Y')) ## Fit the Hidden Markov Model model <- findCNVs.strandseq(binned[[1]]) ## Add confidence intervals breakpoints <- getBreakpoints(model, readfragments)
## Get an example BED file with single-cell-sequencing reads bedfile <- system.file("extdata", "KK150311_VI_07.bam.bed.gz", package="AneuFinderData") ## Bin the data into bin size 1Mp readfragments <- binReads(bedfile, assembly='mm10', binsize=1e6, chromosomes=c(1:19,'X','Y'), reads.return=TRUE) binned <- binReads(bedfile, assembly='mm10', binsize=1e6, chromosomes=c(1:19,'X','Y')) ## Fit the Hidden Markov Model model <- findCNVs.strandseq(binned[[1]]) ## Add confidence intervals breakpoints <- getBreakpoints(model, readfragments)
Import aligned reads from a BAM file into a GRanges-class
object.
bam2GRanges(bamfile, bamindex = bamfile, chromosomes = NULL, pairedEndReads = FALSE, remove.duplicate.reads = FALSE, min.mapq = 10, max.fragment.width = 1000, blacklist = NULL, what = "mapq")
bam2GRanges(bamfile, bamindex = bamfile, chromosomes = NULL, pairedEndReads = FALSE, remove.duplicate.reads = FALSE, min.mapq = 10, max.fragment.width = 1000, blacklist = NULL, what = "mapq")
bamfile |
A sorted BAM file. |
bamindex |
BAM index file. Can be specified without the .bai ending. If the index file does not exist it will be created and a warning is issued. |
chromosomes |
If only a subset of the chromosomes should be imported, specify them here. |
pairedEndReads |
Set to |
remove.duplicate.reads |
A logical indicating whether or not duplicate reads should be removed. |
min.mapq |
Minimum mapping quality when importing from BAM files. Set |
max.fragment.width |
Maximum allowed fragment length. This is to filter out erroneously wrong fragments due to mapping errors of paired end reads. |
blacklist |
A |
what |
A character vector of fields that are returned. Uses the |
A GRanges-class
object containing the reads.
## Get an example BAM file with single-cell-sequencing reads bamfile <- system.file("extdata", "BB150803_IV_074.bam", package="AneuFinderData") ## Read the file into a GRanges object reads <- bam2GRanges(bamfile, chromosomes=c(1:19,'X','Y'), pairedEndReads=FALSE, min.mapq=10, remove.duplicate.reads=TRUE) print(reads)
## Get an example BAM file with single-cell-sequencing reads bamfile <- system.file("extdata", "BB150803_IV_074.bam", package="AneuFinderData") ## Read the file into a GRanges object reads <- bam2GRanges(bamfile, chromosomes=c(1:19,'X','Y'), pairedEndReads=FALSE, min.mapq=10, remove.duplicate.reads=TRUE) print(reads)
Import aligned reads from a BED file into a GRanges-class
object.
bed2GRanges(bedfile, assembly, chromosomes = NULL, remove.duplicate.reads = FALSE, min.mapq = 10, max.fragment.width = 1000, blacklist = NULL)
bed2GRanges(bedfile, assembly, chromosomes = NULL, remove.duplicate.reads = FALSE, min.mapq = 10, max.fragment.width = 1000, blacklist = NULL)
bedfile |
A file with aligned reads in BED format. The columns have to be c('chromosome','start','end','description','mapq','strand'). |
assembly |
Please see |
chromosomes |
If only a subset of the chromosomes should be imported, specify them here. |
remove.duplicate.reads |
A logical indicating whether or not duplicate reads should be removed. |
min.mapq |
Minimum mapping quality when importing from BAM files. Set |
max.fragment.width |
Maximum allowed fragment length. This is to filter out erroneously wrong fragments. |
blacklist |
A |
A GRanges-class
object containing the reads.
## Get an example BED file with single-cell-sequencing reads bedfile <- system.file("extdata", "KK150311_VI_07.bam.bed.gz", package="AneuFinderData") ## Read the file into a GRanges object reads <- bed2GRanges(bedfile, assembly='mm10', chromosomes=c(1:19,'X','Y'), min.mapq=10, remove.duplicate.reads=TRUE) print(reads)
## Get an example BED file with single-cell-sequencing reads bedfile <- system.file("extdata", "KK150311_VI_07.bam.bed.gz", package="AneuFinderData") ## Read the file into a GRanges object reads <- bed2GRanges(bedfile, assembly='mm10', chromosomes=c(1:19,'X','Y'), min.mapq=10, remove.duplicate.reads=TRUE) print(reads)
Classify the binned read counts into several states which represent copy-number-variation. The function uses the e.divisive
function to segment the genome.
bi.edivisive.findCNVs(binned.data, ID = NULL, CNgrid.start = 0.5, R = 10, sig.lvl = 0.1)
bi.edivisive.findCNVs(binned.data, ID = NULL, CNgrid.start = 0.5, R = 10, sig.lvl = 0.1)
binned.data |
A GRanges-class object with binned read counts. |
ID |
An identifier that will be used to identify this sample in various downstream functions. Could be the file name of the |
CNgrid.start |
Start parameter for the CNgrid variable. Very empiric. Set to 1.5 for normal data and 0.5 for Strand-seq data. |
R |
method-edivisive: The maximum number of random permutations to use in each iteration of the permutation test (see |
sig.lvl |
method-edivisive: The level at which to sequentially test if a proposed change point is statistically significant (see |
An aneuHMM
object.
biDNAcopy.findCNVs
classifies the binned read counts into several states which represent copy-number-variation using read count information from both strands.
biDNAcopy.findCNVs(binned.data, ID = NULL, CNgrid.start = 0.5)
biDNAcopy.findCNVs(binned.data, ID = NULL, CNgrid.start = 0.5)
binned.data |
A GRanges-class object with binned read counts. |
ID |
An identifier that will be used to identify this sample in various downstream functions. Could be the file name of the |
CNgrid.start |
Start parameter for the CNgrid variable. Very empiric. Set to 1.5 for normal data and 0.5 for Strand-seq data. |
An aneuHMM
object.
biHMM.findCNVs
finds CNVs using read count information from both strands.
biHMM.findCNVs(binned.data, ID = NULL, eps = 0.01, init = "standard", max.time = -1, max.iter = -1, num.trials = 1, eps.try = NULL, num.threads = 1, count.cutoff.quantile = 0.999, states = c("zero-inflation", paste0(0:10, "-somy")), most.frequent.state = "1-somy", algorithm = "EM", initial.params = NULL, verbosity = 1)
biHMM.findCNVs(binned.data, ID = NULL, eps = 0.01, init = "standard", max.time = -1, max.iter = -1, num.trials = 1, eps.try = NULL, num.threads = 1, count.cutoff.quantile = 0.999, states = c("zero-inflation", paste0(0:10, "-somy")), most.frequent.state = "1-somy", algorithm = "EM", initial.params = NULL, verbosity = 1)
binned.data |
A |
ID |
An identifier that will be used to identify this sample in various downstream functions. Could be the file name of the |
eps |
method-HMM: Convergence threshold for the Baum-Welch algorithm. |
init |
method-HMM: One of the following initialization procedures:
|
max.time |
method-HMM: The maximum running time in seconds for the Baum-Welch algorithm. If this time is reached, the Baum-Welch will terminate after the current iteration finishes. Set |
max.iter |
method-HMM: The maximum number of iterations for the Baum-Welch algorithm. Set |
num.trials |
method-HMM: The number of trials to find a fit where state |
eps.try |
method-HMM: If code num.trials is set to greater than 1, |
num.threads |
method-HMM: Number of threads to use. Setting this to >1 may give increased performance. |
count.cutoff.quantile |
method-HMM: A quantile between 0 and 1. Should be near 1. Read counts above this quantile will be set to the read count specified by this quantile. Filtering very high read counts increases the performance of the Baum-Welch fitting procedure. However, if your data contains very few peaks they might be filtered out. Set |
states |
method-HMM: A subset or all of |
most.frequent.state |
method-HMM: One of the states that were given in |
algorithm |
method-HMM: One of |
initial.params |
method-HMM: A |
verbosity |
method-HMM: Integer specifying the verbosity of printed messages. |
An aneuBiHMM
object.
A GRanges-class object which contains binned read counts as meta data column reads
. It is output of the various binning functions.
Please see functions fixedWidthBins
and variableWidthBins
for further details.
Convert aligned reads in .bam or .bed(.gz) format into read counts in equidistant windows.
binReads(file, assembly, ID = basename(file), bamindex = file, chromosomes = NULL, pairedEndReads = FALSE, min.mapq = 10, remove.duplicate.reads = TRUE, max.fragment.width = 1000, blacklist = NULL, outputfolder.binned = "binned_data", binsizes = 1e+06, stepsizes = NULL, reads.per.bin = NULL, reads.per.step = NULL, bins = NULL, variable.width.reference = NULL, save.as.RData = FALSE, calc.complexity = TRUE, call = match.call(), reads.store = FALSE, outputfolder.reads = "data", reads.return = FALSE, reads.overwrite = FALSE, reads.only = FALSE, use.bamsignals = FALSE)
binReads(file, assembly, ID = basename(file), bamindex = file, chromosomes = NULL, pairedEndReads = FALSE, min.mapq = 10, remove.duplicate.reads = TRUE, max.fragment.width = 1000, blacklist = NULL, outputfolder.binned = "binned_data", binsizes = 1e+06, stepsizes = NULL, reads.per.bin = NULL, reads.per.step = NULL, bins = NULL, variable.width.reference = NULL, save.as.RData = FALSE, calc.complexity = TRUE, call = match.call(), reads.store = FALSE, outputfolder.reads = "data", reads.return = FALSE, reads.overwrite = FALSE, reads.only = FALSE, use.bamsignals = FALSE)
file |
A file with aligned reads. Alternatively a |
assembly |
Please see |
ID |
An identifier that will be used to identify the file throughout the workflow and in plotting. |
bamindex |
BAM index file. Can be specified without the .bai ending. If the index file does not exist it will be created and a warning is issued. |
chromosomes |
If only a subset of the chromosomes should be binned, specify them here. |
pairedEndReads |
Set to |
min.mapq |
Minimum mapping quality when importing from BAM files. Set |
remove.duplicate.reads |
A logical indicating whether or not duplicate reads should be removed. |
max.fragment.width |
Maximum allowed fragment length. This is to filter out erroneously wrong fragments due to mapping errors of paired end reads. |
blacklist |
A |
outputfolder.binned |
Folder to which the binned data will be saved. If the specified folder does not exist, it will be created. |
binsizes |
An integer vector with bin sizes. If more than one value is given, output files will be produced for each bin size. |
stepsizes |
A vector of step sizes the same length as |
reads.per.bin |
Approximate number of desired reads per bin. The bin size will be selected accordingly. Output files are produced for each value. |
reads.per.step |
Approximate number of desired reads per step. |
bins |
A named |
variable.width.reference |
A BAM file that is used as reference to produce variable width bins. See |
save.as.RData |
If set to |
calc.complexity |
A logical indicating whether or not to estimate library complexity. |
call |
The |
reads.store |
If |
outputfolder.reads |
Folder to which the read fragments will be saved. If the specified folder does not exist, it will be created. |
reads.return |
If |
reads.overwrite |
Whether or not an existing file with read fragments should be overwritten. |
reads.only |
If |
use.bamsignals |
If |
Convert aligned reads from .bam or .bed(.gz) files into read counts in equidistant windows (bins). This function uses GenomicRanges::countOverlaps
to calculate the read counts.
The function produces a list()
of GRanges-class
or GRangesList
objects with meta data columns 'counts', 'mcounts', 'pcounts' that contain the total, minus and plus read count. This binned data will be either written to file (save.as.RData=FALSE
) or given as return value (save.as.RData=FALSE
).
binning
## Get an example BED file with single-cell-sequencing reads bedfile <- system.file("extdata", "KK150311_VI_07.bam.bed.gz", package="AneuFinderData") ## Bin the BED file into bin size 1Mb binned <- binReads(bedfile, assembly='mm10', binsize=1e6, chromosomes=c(1:19,'X','Y')) print(binned)
## Get an example BED file with single-cell-sequencing reads bedfile <- system.file("extdata", "KK150311_VI_07.bam.bed.gz", package="AneuFinderData") ## Bin the BED file into bin size 1Mb binned <- binReads(bedfile, assembly='mm10', binsize=1e6, chromosomes=c(1:19,'X','Y')) print(binned)
Produce a blacklist of genomic regions with a high ratio of duplicate to unique reads. This blacklist can be used to exclude reads for analysis in Aneufinder
, bam2GRanges
and bed2GRanges
. This function produces a pre-blacklist which has to manually be filtered with a sensible cutoff. See the examples section for details.
blacklist(files, assembly, bins, min.mapq = 10, pairedEndReads = FALSE)
blacklist(files, assembly, bins, min.mapq = 10, pairedEndReads = FALSE)
files |
A character vector of either BAM or BED files. |
assembly |
Please see |
bins |
A list with one |
min.mapq |
Minimum mapping quality when importing from BAM files. Set |
pairedEndReads |
Set to |
A GRanges-class
with the same coordinates as bins
with metadata columns ratio, duplicated counts and deduplicated counts.
## Get an example BAM file with single-cell-sequencing reads bamfile <- system.file("extdata", "BB150803_IV_074.bam", package="AneuFinderData") ## Prepare the blacklist bins <- fixedWidthBins(assembly='mm10', binsizes=1e6, chromosome.format='NCBI') pre.blacklist <- blacklist(bamfile, bins=bins) ## Plot a histogram to decide on a sensible cutoff qplot(pre.blacklist$ratio, binwidth=0.1) ## Make the blacklist with cutoff = 1.9 blacklist <- pre.blacklist[pre.blacklist$ratio > 1.9]
## Get an example BAM file with single-cell-sequencing reads bamfile <- system.file("extdata", "BB150803_IV_074.bam", package="AneuFinderData") ## Prepare the blacklist bins <- fixedWidthBins(assembly='mm10', binsizes=1e6, chromosome.format='NCBI') pre.blacklist <- blacklist(bamfile, bins=bins) ## Plot a histogram to decide on a sensible cutoff qplot(pre.blacklist$ratio, binwidth=0.1) ## Make the blacklist with cutoff = 1.9 blacklist <- pre.blacklist[pre.blacklist$ratio > 1.9]
This function uses the mclust package to cluster the input samples based on various quality measures.
clusterByQuality(hmms, G = 1:9, itmax = c(100, 100), measures = c("spikiness", "entropy", "num.segments", "bhattacharyya", "complexity", "sos"), orderBy = "spikiness", reverseOrder = FALSE)
clusterByQuality(hmms, G = 1:9, itmax = c(100, 100), measures = c("spikiness", "entropy", "num.segments", "bhattacharyya", "complexity", "sos"), orderBy = "spikiness", reverseOrder = FALSE)
hmms |
A list of |
G |
An integer vector specifying the number of clusters that are compared. See |
itmax |
The maximum number of outer and inner iterations for the |
measures |
The quality measures that are used for the clustering. Supported is any combination of |
orderBy |
The quality measure to order the clusters by. Default is |
reverseOrder |
Logical indicating whether the ordering by |
Please see getQC
for a brief description of the quality measures.
A list
with the classification, parameters and the Mclust
fit.
Aaron Taudt
## Get a list of HMMs folder <- system.file("extdata", "primary-lung", "hmms", package="AneuFinderData") files <- list.files(folder, full.names=TRUE) cl <- clusterByQuality(files) ## Plot the clustering and print the parameters plot(cl$Mclust, what='classification') print(cl$parameters) ## Select files from the best 2 clusters for further processing best.files <- unlist(cl$classification[1:2])
## Get a list of HMMs folder <- system.file("extdata", "primary-lung", "hmms", package="AneuFinderData") files <- list.files(folder, full.names=TRUE) cl <- clusterByQuality(files) ## Plot the clustering and print the parameters plot(cl$Mclust, what='classification') print(cl$parameters) ## Select files from the best 2 clusters for further processing best.files <- unlist(cl$classification[1:2])
Cluster a list of aneuHMM
or aneuBiHMM
objects by similarity in their CNV-state.
clusterHMMs(hmms, cluster = TRUE, exclude.regions = NULL)
clusterHMMs(hmms, cluster = TRUE, exclude.regions = NULL)
hmms |
A list of |
cluster |
Either |
exclude.regions |
A |
An list() with ordered ID indices and the hierarchical clustering.
## Get results from a small-cell-lung-cancer lung.folder <- system.file("extdata", "primary-lung", "hmms", package="AneuFinderData") lung.files <- list.files(lung.folder, full.names=TRUE) models <- loadFromFiles(lung.files) ## Not run: # Plot unclustered heatmap heatmapGenomewide(models, cluster=FALSE) ## End(Not run) ## Cluster and reorder the models clust <- clusterHMMs(models) models <- models[clust$IDorder] ## Not run: # Plot re-ordered heatmap heatmapGenomewide(models, cluster=FALSE) ## End(Not run)
## Get results from a small-cell-lung-cancer lung.folder <- system.file("extdata", "primary-lung", "hmms", package="AneuFinderData") lung.files <- list.files(lung.folder, full.names=TRUE) models <- loadFromFiles(lung.files) ## Not run: # Plot unclustered heatmap heatmapGenomewide(models, cluster=FALSE) ## End(Not run) ## Cluster and reorder the models clust <- clusterHMMs(models) models <- models[clust$IDorder] ## Not run: # Plot re-ordered heatmap heatmapGenomewide(models, cluster=FALSE) ## End(Not run)
The function will collapse consecutive bins which have, for example, the same combinatorial state.
collapseBins(data, column2collapseBy = NULL, columns2sumUp = NULL, columns2average = NULL, columns2getMax = NULL, columns2drop = NULL)
collapseBins(data, column2collapseBy = NULL, columns2sumUp = NULL, columns2average = NULL, columns2getMax = NULL, columns2drop = NULL)
data |
A data.frame containing the genomic coordinates in the first three columns. |
column2collapseBy |
The number of the column which will be used to collapse all other inputs. If a set of consecutive bins has the same value in this column, they will be aggregated into one bin with adjusted genomic coordinates. If |
columns2sumUp |
Column numbers that will be summed during the aggregation process. |
columns2average |
Column numbers that will be averaged during the aggregation process. |
columns2getMax |
Column numbers where the maximum will be chosen during the aggregation process. |
columns2drop |
Column numbers that will be dropped after the aggregation process. |
The following tables illustrate the principle of the collapsing:
Input data:
seqnames | start | end | column2collapseBy | moreColumns | columns2sumUp |
chr1 | 0 | 199 | 2 | 1 10 | 1 3 |
chr1 | 200 | 399 | 2 | 2 11 | 0 3 |
chr1 | 400 | 599 | 2 | 3 12 | 1 3 |
chr1 | 600 | 799 | 1 | 4 13 | 0 3 |
chr1 | 800 | 999 | 1 | 5 14 | 1 3 |
Output data:
seqnames | start | end | column2collapseBy | moreColumns | columns2sumUp |
chr1 | 0 | 599 | 2 | 1 10 | 2 9 |
chr1 | 600 | 999 | 1 | 4 13 | 1 6 |
A data.frame.
Aaron Taudt
## Get an example BED file with single-cell-sequencing reads bedfile <- system.file("extdata", "KK150311_VI_07.bam.bed.gz", package="AneuFinderData") ## Bin the BAM file into bin size 1Mp binned <- binReads(bedfile, assembly='mm10', binsize=1e6, chromosomes=c(1:19,'X','Y')) ## Collapse the bins by chromosome and get average, summed and maximum read count df <- as.data.frame(binned[[1]]) # Remove one bin for illustration purposes df <- df[-3,] head(df) collapseBins(df, column2collapseBy='seqnames', columns2sumUp=c('width','counts'), columns2average='counts', columns2getMax='counts', columns2drop=c('mcounts','pcounts')) collapseBins(df, column2collapseBy=NULL, columns2sumUp=c('width','counts'), columns2average='counts', columns2getMax='counts', columns2drop=c('mcounts','pcounts'))
## Get an example BED file with single-cell-sequencing reads bedfile <- system.file("extdata", "KK150311_VI_07.bam.bed.gz", package="AneuFinderData") ## Bin the BAM file into bin size 1Mp binned <- binReads(bedfile, assembly='mm10', binsize=1e6, chromosomes=c(1:19,'X','Y')) ## Collapse the bins by chromosome and get average, summed and maximum read count df <- as.data.frame(binned[[1]]) # Remove one bin for illustration purposes df <- df[-3,] head(df) collapseBins(df, column2collapseBy='seqnames', columns2sumUp=c('width','counts'), columns2average='counts', columns2getMax='counts', columns2drop=c('mcounts','pcounts')) collapseBins(df, column2collapseBy=NULL, columns2sumUp=c('width','counts'), columns2average='counts', columns2getMax='counts', columns2drop=c('mcounts','pcounts'))
Get the color schemes that are used in the AneuFinder plots.
stateColors(states = c("zero-inflation", paste0(0:10, "-somy"), "total")) strandColors(strands = c("+", "-")) breakpointColors(breaktypes = c("CNB", "SCE", "CNB+SCE", "other"))
stateColors(states = c("zero-inflation", paste0(0:10, "-somy"), "total")) strandColors(strands = c("+", "-")) breakpointColors(breaktypes = c("CNB", "SCE", "CNB+SCE", "other"))
states |
A character vector with states whose color should be returned. |
strands |
A character vector with strands whose color should be returned. Any combination of |
breaktypes |
A character vector with breakpoint types whose color should be returned. Any combination of |
A character vector with colors.
stateColors
: Colors that are used for the states.
strandColors
: Colors that are used to distinguish strands.
breakpointColors
: Colors that are used for breakpoint types.
## Make a nice pie chart with the AneuFinder state color scheme statecolors <- stateColors() pie(rep(1,length(statecolors)), labels=names(statecolors), col=statecolors) ## Make a nice pie chart with the AneuFinder strand color scheme strandcolors <- strandColors() pie(rep(1,length(strandcolors)), labels=names(strandcolors), col=strandcolors) ## Make a nice pie chart with the AneuFinder breakpoint-type color scheme breakpointcolors <- breakpointColors() pie(rep(1,length(breakpointcolors)), labels=names(breakpointcolors), col=breakpointcolors)
## Make a nice pie chart with the AneuFinder state color scheme statecolors <- stateColors() pie(rep(1,length(statecolors)), labels=names(statecolors), col=statecolors) ## Make a nice pie chart with the AneuFinder strand color scheme strandcolors <- strandColors() pie(rep(1,length(strandcolors)), labels=names(strandcolors), col=strandcolors) ## Make a nice pie chart with the AneuFinder breakpoint-type color scheme breakpointcolors <- breakpointColors() pie(rep(1,length(breakpointcolors)), labels=names(breakpointcolors), col=breakpointcolors)
Compare two sets of aneuHMM
objects generated by different methods (see option method
of findCNVs
).
compareMethods(models1, models2)
compareMethods(models1, models2)
models1 |
A list of |
models2 |
A list of |
A data.frame with one column 'concordance' which gives the fraction of the genome that is called concordantly between both models.
Aaron Taudt
## Get a list of HMMs folder <- system.file("extdata", "primary-lung", "hmms", package="AneuFinderData") files <- list.files(folder, full.names=TRUE) ## Compare the models with themselves (non-sensical) df <- compareMethods(files, files) head(df)
## Get a list of HMMs folder <- system.file("extdata", "primary-lung", "hmms", package="AneuFinderData") files <- list.files(folder, full.names=TRUE) ## Compare the models with themselves (non-sensical) df <- compareMethods(files, files) head(df)
Compare two aneuHMM
objects. The function computes the fraction of copy number calls that is concordant between both models.
compareModels(model1, model2)
compareModels(model1, model2)
model1 |
An |
model2 |
An |
A numeric.
Aaron Taudt
Make consensus segments from a list of aneuHMM
or aneuBiHMM
objects.
consensusSegments(hmms)
consensusSegments(hmms)
hmms |
A list of |
The function will produce a GRanges-class
object using the GenomicRanges::disjoin
function on all extracted $segment
entries.
## Get results from a small-cell-lung-cancer lung.folder <- system.file("extdata", "primary-lung", "hmms", package="AneuFinderData") lung.files <- list.files(lung.folder, full.names=TRUE) ## Get consensus segments and states consensusSegments(lung.files)
## Get results from a small-cell-lung-cancer lung.folder <- system.file("extdata", "primary-lung", "hmms", package="AneuFinderData") lung.files <- list.files(lung.folder, full.names=TRUE) ## Get consensus segments and states consensusSegments(lung.files)
Correct a list of binned.data
by GC content.
correctGC(binned.data.list, GC.BSgenome, same.binsize = FALSE, method = "loess", return.plot = FALSE, bins = NULL)
correctGC(binned.data.list, GC.BSgenome, same.binsize = FALSE, method = "loess", return.plot = FALSE, bins = NULL)
binned.data.list |
A |
GC.BSgenome |
A |
same.binsize |
If |
method |
One of |
return.plot |
Set to |
bins |
A |
Two methods are available for GC correction: Option method='quadratic'
uses the method described in the Supplementary of citation("AneuFinder")
. Option method='loess'
uses a loess fit to adjust the read count.
A list()
with binned.data
objects with adjusted read counts. Alternatively a list()
with ggplot
objects if return.plot=TRUE
.
Aaron Taudt
## Get a BED file, bin it and run GC correction bedfile <- system.file("extdata", "KK150311_VI_07.bam.bed.gz", package="AneuFinderData") binned <- binReads(bedfile, assembly='mm10', binsize=1e6, chromosomes=c(1:19,'X','Y')) plot(binned[[1]], type=1) if (require(BSgenome.Mmusculus.UCSC.mm10)) { binned.GC <- correctGC(list(binned[[1]]), GC.BSgenome=BSgenome.Mmusculus.UCSC.mm10) plot(binned.GC[[1]], type=1) }
## Get a BED file, bin it and run GC correction bedfile <- system.file("extdata", "KK150311_VI_07.bam.bed.gz", package="AneuFinderData") binned <- binReads(bedfile, assembly='mm10', binsize=1e6, chromosomes=c(1:19,'X','Y')) plot(binned[[1]], type=1) if (require(BSgenome.Mmusculus.UCSC.mm10)) { binned.GC <- correctGC(list(binned[[1]]), GC.BSgenome=BSgenome.Mmusculus.UCSC.mm10) plot(binned.GC[[1]], type=1) }
DNAcopy.findCNVs
classifies the binned read counts into several states which represent copy-number-variation.
DNAcopy.findCNVs(binned.data, ID = NULL, CNgrid.start = 1.5, strand = "*")
DNAcopy.findCNVs(binned.data, ID = NULL, CNgrid.start = 1.5, strand = "*")
binned.data |
A GRanges-class object with binned read counts. |
ID |
An identifier that will be used to identify this sample in various downstream functions. Could be the file name of the |
CNgrid.start |
Start parameter for the CNgrid variable. Very empiric. Set to 1.5 for normal data and 0.5 for Strand-seq data. |
strand |
Find copy-numbers only for the specified strand. One of |
An aneuHMM
object.
Classify the binned read counts into several states which represent copy-number-variation. The function uses the e.divisive
function to segment the genome.
edivisive.findCNVs(binned.data, ID = NULL, CNgrid.start = 1.5, strand = "*", R = 10, sig.lvl = 0.1)
edivisive.findCNVs(binned.data, ID = NULL, CNgrid.start = 1.5, strand = "*", R = 10, sig.lvl = 0.1)
binned.data |
A GRanges-class object with binned read counts. |
ID |
An identifier that will be used to identify this sample in various downstream functions. Could be the file name of the |
CNgrid.start |
Start parameter for the CNgrid variable. Very empiric. Set to 1.5 for normal data and 0.5 for Strand-seq data. |
strand |
Find copy-numbers only for the specified strand. One of |
R |
method-edivisive: The maximum number of random permutations to use in each iteration of the permutation test (see |
sig.lvl |
method-edivisive: The level at which to sequentially test if a proposed change point is statistically significant (see |
An aneuHMM
object.
Estimate library complexity using a very simple "Michaelis-Menten" approach.
estimateComplexity(reads)
estimateComplexity(reads)
reads |
A |
A list
with estimated complexity values and plots.
Export copy-number-variation state or read counts as genome browser viewable file
exportCNVs(hmms, filename, trackname = NULL, cluster = TRUE, export.CNV = TRUE, export.breakpoints = TRUE) exportReadCounts(hmms, filename) exportGRanges(gr, filename, header = TRUE, trackname = NULL, score = NULL, priority = NULL, append = FALSE, chromosome.format = "UCSC", thickStart = NULL, thickEnd = NULL, as.wiggle = FALSE, wiggle.val)
exportCNVs(hmms, filename, trackname = NULL, cluster = TRUE, export.CNV = TRUE, export.breakpoints = TRUE) exportReadCounts(hmms, filename) exportGRanges(gr, filename, header = TRUE, trackname = NULL, score = NULL, priority = NULL, append = FALSE, chromosome.format = "UCSC", thickStart = NULL, thickEnd = NULL, as.wiggle = FALSE, wiggle.val)
hmms |
A list of |
filename |
The name of the file that will be written. The appropriate ending will be appended, either ".bed.gz" for CNV-state or ".wig.gz" for read counts. Any existing file will be overwritten. |
trackname |
The name that will be used as track name and description in the header. |
cluster |
If |
export.CNV |
A logical, indicating whether the CNV-state shall be exported. |
export.breakpoints |
A logical, indicating whether breakpoints shall be exported. |
gr |
A |
header |
A logical indicating whether the output file will have a heading track line ( |
score |
A vector of the same length as |
priority |
Priority of the track for display in the genome browser. |
append |
Append to |
chromosome.format |
A character specifying the format of the chromosomes if |
thickStart , thickEnd
|
A vector of the same length as |
as.wiggle |
A logical indicating whether a variableStep-wiggle file will be exported instead of a BED file. If |
wiggle.val |
A vector of the same length as |
Use exportCNVs
to export the copy-number-variation state from an aneuHMM
object in BED format.
Use exportReadCounts
to export the binned read counts from an aneuHMM
object in WIGGLE format.
Use exportGRanges
to export a GRanges-class
object in BED format.
NULL
exportCNVs
: Export CNV-state as .bed.gz file
exportReadCounts
: Export binned read counts as .wig.gz file
exportGRanges
: Export GRanges-class
object as BED file.
Aaron Taudt
## Not run: ## Get results from a small-cell-lung-cancer folder <- system.file("extdata", "primary-lung", "hmms", package="AneuFinderData") files <- list.files(folder, full.names=TRUE) ## Export the CNV states for upload to the UCSC genome browser exportCNVs(files, filename='upload-me-to-a-genome-browser', cluster=TRUE) ## End(Not run)
## Not run: ## Get results from a small-cell-lung-cancer folder <- system.file("extdata", "primary-lung", "hmms", package="AneuFinderData") files <- list.files(folder, full.names=TRUE) ## Export the CNV states for upload to the UCSC genome browser exportCNVs(files, filename='upload-me-to-a-genome-browser', cluster=TRUE) ## End(Not run)
filterSegments
filters out segments below a specified minimal segment size. This can be useful to get rid of boundary effects from the Hidden Markov approach.
filterSegments(segments, min.seg.width)
filterSegments(segments, min.seg.width)
segments |
A |
min.seg.width |
The minimum segment width in base-pairs. |
The input model
with adjusted segments.
Aaron Taudt
## Load an HMM file <- list.files(system.file("extdata", "primary-lung", "hmms", package="AneuFinderData"), full.names=TRUE) hmm <- loadFromFiles(file)[[1]] ## Check number of segments before and after filtering length(hmm$segments) hmm$segments <- filterSegments(hmm$segments, min.seg.width=2*width(hmm$bins)[1]) length(hmm$segments)
## Load an HMM file <- list.files(system.file("extdata", "primary-lung", "hmms", package="AneuFinderData"), full.names=TRUE) hmm <- loadFromFiles(file)[[1]] ## Check number of segments before and after filtering length(hmm$segments) hmm$segments <- filterSegments(hmm$segments, min.seg.width=2*width(hmm$bins)[1]) length(hmm$segments)
findCNVs
classifies the binned read counts into several states which represent copy-numbers.
findCNVs(binned.data, ID = NULL, method = "edivisive", strand = "*", R = 10, sig.lvl = 0.1, eps = 0.01, init = "standard", max.time = -1, max.iter = 1000, num.trials = 15, eps.try = max(10 * eps, 1), num.threads = 1, count.cutoff.quantile = 0.999, states = c("zero-inflation", paste0(0:10, "-somy")), most.frequent.state = "2-somy", algorithm = "EM", initial.params = NULL, verbosity = 1)
findCNVs(binned.data, ID = NULL, method = "edivisive", strand = "*", R = 10, sig.lvl = 0.1, eps = 0.01, init = "standard", max.time = -1, max.iter = 1000, num.trials = 15, eps.try = max(10 * eps, 1), num.threads = 1, count.cutoff.quantile = 0.999, states = c("zero-inflation", paste0(0:10, "-somy")), most.frequent.state = "2-somy", algorithm = "EM", initial.params = NULL, verbosity = 1)
binned.data |
A GRanges-class object with binned read counts. |
ID |
An identifier that will be used to identify this sample in various downstream functions. Could be the file name of the |
method |
Any combination of |
strand |
Find copy-numbers only for the specified strand. One of |
R |
method-edivisive: The maximum number of random permutations to use in each iteration of the permutation test (see |
sig.lvl |
method-edivisive: The level at which to sequentially test if a proposed change point is statistically significant (see |
eps |
method-HMM: Convergence threshold for the Baum-Welch algorithm. |
init |
method-HMM: One of the following initialization procedures:
|
max.time |
method-HMM: The maximum running time in seconds for the Baum-Welch algorithm. If this time is reached, the Baum-Welch will terminate after the current iteration finishes. Set |
max.iter |
method-HMM: The maximum number of iterations for the Baum-Welch algorithm. Set |
num.trials |
method-HMM: The number of trials to find a fit where state |
eps.try |
method-HMM: If code num.trials is set to greater than 1, |
num.threads |
method-HMM: Number of threads to use. Setting this to >1 may give increased performance. |
count.cutoff.quantile |
method-HMM: A quantile between 0 and 1. Should be near 1. Read counts above this quantile will be set to the read count specified by this quantile. Filtering very high read counts increases the performance of the Baum-Welch fitting procedure. However, if your data contains very few peaks they might be filtered out. Set |
states |
method-HMM: A subset or all of |
most.frequent.state |
method-HMM: One of the states that were given in |
algorithm |
method-HMM: One of |
initial.params |
method-HMM: A |
verbosity |
method-HMM: Integer specifying the verbosity of printed messages. |
An aneuHMM
object.
Aaron Taudt
## Get an example BED file with single-cell-sequencing reads bedfile <- system.file("extdata", "KK150311_VI_07.bam.bed.gz", package="AneuFinderData") ## Bin the data into bin size 1Mp binned <- binReads(bedfile, assembly='mm10', binsize=1e6, chromosomes=c(1:19,'X','Y')) ## Find copy-numbers model <- findCNVs(binned[[1]]) ## Check the fit plot(model, type='histogram')
## Get an example BED file with single-cell-sequencing reads bedfile <- system.file("extdata", "KK150311_VI_07.bam.bed.gz", package="AneuFinderData") ## Bin the data into bin size 1Mp binned <- binReads(bedfile, assembly='mm10', binsize=1e6, chromosomes=c(1:19,'X','Y')) ## Find copy-numbers model <- findCNVs(binned[[1]]) ## Check the fit plot(model, type='histogram')
findCNVs.strandseq
classifies the binned read counts into several states which represent copy-numbers on each strand.
findCNVs.strandseq(binned.data, ID = NULL, R = 10, sig.lvl = 0.1, eps = 0.01, init = "standard", max.time = -1, max.iter = 1000, num.trials = 5, eps.try = max(10 * eps, 1), num.threads = 1, count.cutoff.quantile = 0.999, strand = "*", states = c("zero-inflation", paste0(0:10, "-somy")), most.frequent.state = "1-somy", method = "edivisive", algorithm = "EM", initial.params = NULL)
findCNVs.strandseq(binned.data, ID = NULL, R = 10, sig.lvl = 0.1, eps = 0.01, init = "standard", max.time = -1, max.iter = 1000, num.trials = 5, eps.try = max(10 * eps, 1), num.threads = 1, count.cutoff.quantile = 0.999, strand = "*", states = c("zero-inflation", paste0(0:10, "-somy")), most.frequent.state = "1-somy", method = "edivisive", algorithm = "EM", initial.params = NULL)
binned.data |
A GRanges-class object with binned read counts. |
ID |
An identifier that will be used to identify this sample in various downstream functions. Could be the file name of the |
R |
method-edivisive: The maximum number of random permutations to use in each iteration of the permutation test (see |
sig.lvl |
method-edivisive: The level at which to sequentially test if a proposed change point is statistically significant (see |
eps |
method-HMM: Convergence threshold for the Baum-Welch algorithm. |
init |
method-HMM: One of the following initialization procedures:
|
max.time |
method-HMM: The maximum running time in seconds for the Baum-Welch algorithm. If this time is reached, the Baum-Welch will terminate after the current iteration finishes. Set |
max.iter |
method-HMM: The maximum number of iterations for the Baum-Welch algorithm. Set |
num.trials |
method-HMM: The number of trials to find a fit where state |
eps.try |
method-HMM: If code num.trials is set to greater than 1, |
num.threads |
method-HMM: Number of threads to use. Setting this to >1 may give increased performance. |
count.cutoff.quantile |
method-HMM: A quantile between 0 and 1. Should be near 1. Read counts above this quantile will be set to the read count specified by this quantile. Filtering very high read counts increases the performance of the Baum-Welch fitting procedure. However, if your data contains very few peaks they might be filtered out. Set |
strand |
Find copy-numbers only for the specified strand. One of |
states |
method-HMM: A subset or all of |
most.frequent.state |
method-HMM: One of the states that were given in |
method |
Any combination of |
algorithm |
method-HMM: One of |
initial.params |
method-HMM: A |
An aneuBiHMM
object.
Aaron Taudt
## Get an example BED file with single-cell-sequencing reads bedfile <- system.file("extdata", "KK150311_VI_07.bam.bed.gz", package="AneuFinderData") ## Bin the file into bin size 1Mp binned <- binReads(bedfile, assembly='mm10', binsize=1e6, chromosomes=c(1:19,'X','Y'), pairedEndReads=TRUE) ## Find copy-numbers model <- findCNVs.strandseq(binned[[1]]) ## Check the fit plot(model, type='histogram') plot(model, type='profile')
## Get an example BED file with single-cell-sequencing reads bedfile <- system.file("extdata", "KK150311_VI_07.bam.bed.gz", package="AneuFinderData") ## Bin the file into bin size 1Mp binned <- binReads(bedfile, assembly='mm10', binsize=1e6, chromosomes=c(1:19,'X','Y'), pairedEndReads=TRUE) ## Find copy-numbers model <- findCNVs.strandseq(binned[[1]]) ## Check the fit plot(model, type='histogram') plot(model, type='profile')
Find breakpoint hotspots with kernel density estimation (KDE).
findHotspots(models, bw, pval = 0.05, spacing.bp = 5000, filename = NULL)
findHotspots(models, bw, pval = 0.05, spacing.bp = 5000, filename = NULL)
models |
A list of |
bw |
Bandwidth used for kernel density estimation (see |
pval |
P-value cutoff for hotspots. |
spacing.bp |
Spacing of datapoints for KDE in basepairs. |
filename |
Will write hotspot coordinates and densities to the specified file. Endings "_breakpoint-hotspots.bed.gz" and "_breakpoint-densities.wig.gz" will be appended to |
findHotspots
uses density
to perform a KDE. A p-value is calculated by comparing the density profile of the genomic events with the density profile of a randomly subsampled set of genomic events. Due to this random sampling, the result can vary for each function call, most likely for hotspots whose p-value is close to the specified pval
.
A list of GRanges-class
objects containing 1) coordinates of hotspots and 2) p-values within the hotspot.
Make fixed-width bins based on given bin size.
fixedWidthBins(bamfile = NULL, assembly = NULL, chrom.lengths = NULL, chromosome.format, binsizes = 1e+06, stepsizes = NULL, chromosomes = NULL)
fixedWidthBins(bamfile = NULL, assembly = NULL, chrom.lengths = NULL, chromosome.format, binsizes = 1e+06, stepsizes = NULL, chromosomes = NULL)
bamfile |
A BAM file from which the header is read to determine the chromosome lengths. If a |
assembly |
An assembly from which the chromosome lengths are determined. Please see |
chrom.lengths |
A named character vector with chromosome lengths. Names correspond to chromosomes. |
chromosome.format |
A character specifying the format of the chromosomes if |
binsizes |
A vector of bin sizes in base pairs. |
stepsizes |
A vector of step sizes in base pairs, the same length as |
chromosomes |
A subset of chromosomes for which the bins are generated. |
A list()
of GRanges-class
objects with fixed-width bins. If stepsizes
is specified, a list()
of GRangesList
objects with one entry per step.
Aaron Taudt
## Make fixed-width bins of size 500kb and 1Mb bins <- fixedWidthBins(assembly='mm10', chromosome.format='NCBI', binsizes=c(5e5,1e6)) bins
## Make fixed-width bins of size 500kb and 1Mb bins <- fixedWidthBins(assembly='mm10', chromosome.format='NCBI', binsizes=c(5e5,1e6)) bins
Extract breakpoints with confidence intervals from an aneuHMM
or aneuBiHMM
object.
getBreakpoints(model, fragments = NULL, confint = 0.99)
getBreakpoints(model, fragments = NULL, confint = 0.99)
model |
An |
fragments |
A |
confint |
Desired confidence interval for breakpoints. Set |
Confidence intervals for breakpoints are estimated by going outwards from the breakpoint read by read, and performing a test of getting the observed or a more extreme outcome, given that the reads within the confidence interval belong to the other side of the breakpoint.
A GRanges-class
with breakpoint coordinates and confidence interals if fragments
was specified.
## Get an example BED file with single-cell-sequencing reads bedfile <- system.file("extdata", "KK150311_VI_07.bam.bed.gz", package="AneuFinderData") ## Bin the data into bin size 1Mp readfragments <- binReads(bedfile, assembly='mm10', binsize=1e6, chromosomes=c(1:19,'X','Y'), reads.return=TRUE) binned <- binReads(bedfile, assembly='mm10', binsize=1e6, chromosomes=c(1:19,'X','Y')) ## Fit the Hidden Markov Model model <- findCNVs.strandseq(binned[[1]]) ## Add confidence intervals breakpoints <- getBreakpoints(model, readfragments)
## Get an example BED file with single-cell-sequencing reads bedfile <- system.file("extdata", "KK150311_VI_07.bam.bed.gz", package="AneuFinderData") ## Bin the data into bin size 1Mp readfragments <- binReads(bedfile, assembly='mm10', binsize=1e6, chromosomes=c(1:19,'X','Y'), reads.return=TRUE) binned <- binReads(bedfile, assembly='mm10', binsize=1e6, chromosomes=c(1:19,'X','Y')) ## Fit the Hidden Markov Model model <- findCNVs.strandseq(binned[[1]]) ## Add confidence intervals breakpoints <- getBreakpoints(model, readfragments)
Get a set of distinct colors selected from colors
.
getDistinctColors(n, start.color = "blue4", exclude.colors = c("white", "black", "gray", "grey", "\\<yellow\\>", "yellow1", "lemonchiffon"), exclude.brightness.above = 1, exclude.rgb.above = 210)
getDistinctColors(n, start.color = "blue4", exclude.colors = c("white", "black", "gray", "grey", "\\<yellow\\>", "yellow1", "lemonchiffon"), exclude.brightness.above = 1, exclude.rgb.above = 210)
n |
Number of colors to select. If |
start.color |
Color to start the selection process from. |
exclude.colors |
Character vector with colors that should not be used. |
exclude.brightness.above |
Exclude colors where the 'brightness' value in HSV space is above. This is useful to obtain a matt palette. |
exclude.rgb.above |
Exclude colors where all RGB values are above. This is useful to exclude whitish colors. |
The function computes the euclidian distance between all colors
and iteratively selects those that have the furthest closes distance to the set of already selected colors.
A character vector with colors.
Aaron Taudt
cols <- AneuFinder:::getDistinctColors(5) pie(rep(1,5), labels=cols, col=cols)
cols <- AneuFinder:::getDistinctColors(5) pie(rep(1,5), labels=cols, col=cols)
Obtain a data.frame with quality metrics from a list of aneuHMM
objects or a list of files that contain such objects.
getQC(models)
getQC(models)
models |
A list of |
The employed quality measures are:
total.read.count: Total read count.
avg.binsize: Average binsize.
avg.read.count: Average read count.
spikiness: Bin-to-bin variability of read count.
entropy: Shannon entropy of read counts.
complexity: Library complexity approximated with a Michaelis-Menten curve.
loglik: Loglikelihood of the Hidden Markov Model.
num.segments: Number of copy number segments that have been found.
bhattacharrya distance: Bhattacharyya distance between 1-somy and 2-somy distributions.
sos: Sum-of-squares distance of read counts to the fitted distributions in their respective segments.
A data.frame with columns
Aaron Taudt
## Get a list of HMMs folder <- system.file("extdata", "primary-lung", "hmms", package="AneuFinderData") files <- list.files(folder, full.names=TRUE) df <- getQC(files)
## Get a list of HMMs folder <- system.file("extdata", "primary-lung", "hmms", package="AneuFinderData") files <- list.files(folder, full.names=TRUE) df <- getQC(files)
Extracts the coordinates of a sister chromatid exchanges (SCE) from an aneuBiHMM
object.
getSCEcoordinates(model, resolution = c(3, 6), min.segwidth = 2, fragments = NULL)
getSCEcoordinates(model, resolution = c(3, 6), min.segwidth = 2, fragments = NULL)
model |
An |
resolution |
An integer vector specifying the resolution at bin level at which to scan for SCE events. |
min.segwidth |
Segments below this width will be removed before scanning for SCE events. |
fragments |
A |
A GRanges-class
object containing the SCE coordinates.
Aaron Taudt
## Get an example BED file with single-cell-sequencing reads bedfile <- system.file("extdata", "KK150311_VI_07.bam.bed.gz", package="AneuFinderData") ## Bin the BAM file into bin size 1Mp binned <- binReads(bedfile, assembly='hg19', binsize=1e6, chromosomes=c(1:22,'X','Y'), pairedEndReads=TRUE) ## Fit the Hidden Markov Model ## Find copy-numbers model <- findCNVs.strandseq(binned[[1]]) ## Find sister chromatid exchanges model$sce <- getSCEcoordinates(model) print(model$sce) plot(model)
## Get an example BED file with single-cell-sequencing reads bedfile <- system.file("extdata", "KK150311_VI_07.bam.bed.gz", package="AneuFinderData") ## Bin the BAM file into bin size 1Mp binned <- binReads(bedfile, assembly='hg19', binsize=1e6, chromosomes=c(1:22,'X','Y'), pairedEndReads=TRUE) ## Fit the Hidden Markov Model ## Find copy-numbers model <- findCNVs.strandseq(binned[[1]]) ## Find sister chromatid exchanges model$sce <- getSCEcoordinates(model) print(model$sce) plot(model)
Plot a heatmap of aneuploidy state for multiple samples. Samples can be clustered and the output can be returned as data.frame.
heatmapAneuploidies(hmms, ylabels = NULL, cluster = TRUE, as.data.frame = FALSE)
heatmapAneuploidies(hmms, ylabels = NULL, cluster = TRUE, as.data.frame = FALSE)
hmms |
A list of |
ylabels |
A vector with labels for the y-axis. The vector must have the same length as |
cluster |
If |
as.data.frame |
If |
A ggplot
object or a data.frame, depending on option as.data.frame
.
Aaron Taudt
## Get results from a small-cell-lung-cancer folder <- system.file("extdata", "primary-lung", "hmms", package="AneuFinderData") files <- list.files(folder, full.names=TRUE) ## Plot the ploidy state per chromosome heatmapAneuploidies(files, cluster=FALSE) ## Return the ploidy state as data.frame df <- heatmapAneuploidies(files, cluster=FALSE, as.data.frame=TRUE) head(df)
## Get results from a small-cell-lung-cancer folder <- system.file("extdata", "primary-lung", "hmms", package="AneuFinderData") files <- list.files(folder, full.names=TRUE) ## Plot the ploidy state per chromosome heatmapAneuploidies(files, cluster=FALSE) ## Return the ploidy state as data.frame df <- heatmapAneuploidies(files, cluster=FALSE, as.data.frame=TRUE) head(df)
Plot a genome wide heatmap of copy number variation state. This heatmap is best plotted to file, because in most cases it will be too big for cleanly plotting it to screen.
heatmapGenomewide(hmms, ylabels = NULL, classes = NULL, classes.color = NULL, file = NULL, cluster = TRUE, plot.breakpoints = FALSE, hotspots = NULL, exclude.regions = NULL)
heatmapGenomewide(hmms, ylabels = NULL, classes = NULL, classes.color = NULL, file = NULL, cluster = TRUE, plot.breakpoints = FALSE, hotspots = NULL, exclude.regions = NULL)
hmms |
A list of |
ylabels |
A vector with labels for the y-axis. The vector must have the same length as |
classes |
A character vector with the classification of the elements on the y-axis. The vector must have the same length as |
classes.color |
A (named) vector with colors that are used to distinguish |
file |
A PDF file to which the heatmap will be plotted. |
cluster |
Either |
plot.breakpoints |
Logical indicating whether breakpoints should be plotted. |
hotspots |
A |
exclude.regions |
A |
A ggplot
object or NULL
if a file was specified.
## Get results from a small-cell-lung-cancer lung.folder <- system.file("extdata", "primary-lung", "hmms", package="AneuFinderData") lung.files <- list.files(lung.folder, full.names=TRUE) ## Get results from the liver metastasis of the same patient liver.folder <- system.file("extdata", "metastasis-liver", "hmms", package="AneuFinderData") liver.files <- list.files(liver.folder, full.names=TRUE) ## Plot a clustered heatmap classes <- c(rep('lung', length(lung.files)), rep('liver', length(liver.files))) labels <- c(paste('lung',1:length(lung.files)), paste('liver',1:length(liver.files))) heatmapGenomewide(c(lung.files, liver.files), ylabels=labels, classes=classes, classes.color=c('blue','red'))
## Get results from a small-cell-lung-cancer lung.folder <- system.file("extdata", "primary-lung", "hmms", package="AneuFinderData") lung.files <- list.files(lung.folder, full.names=TRUE) ## Get results from the liver metastasis of the same patient liver.folder <- system.file("extdata", "metastasis-liver", "hmms", package="AneuFinderData") liver.files <- list.files(liver.folder, full.names=TRUE) ## Plot a clustered heatmap classes <- c(rep('lung', length(lung.files)), rep('liver', length(liver.files))) labels <- c(paste('lung',1:length(lung.files)), paste('liver',1:length(liver.files))) heatmapGenomewide(c(lung.files, liver.files), ylabels=labels, classes=classes, classes.color=c('blue','red'))
This function is a convenient wrapper to call heatmapGenomewide
for all clusters after calling clusterByQuality
and plot the heatmaps into one pdf for efficient comparison.
heatmapGenomewideClusters(cl = NULL, cutree = NULL, file = NULL, ...)
heatmapGenomewideClusters(cl = NULL, cutree = NULL, file = NULL, ...)
cl |
The return value of |
cutree |
The return value of |
file |
A character specifying the output file. |
... |
Further parameters passed on to |
A cowplot
object or NULL
if a file was specified.
## Get a list of HMMs and cluster them folder <- system.file("extdata", "primary-lung", "hmms", package="AneuFinderData") files <- list.files(folder, full.names=TRUE) cl <- clusterByQuality(files, G=5) heatmapGenomewideClusters(cl=cl) ## Plot sub-clones of the largest cluster largest.cluster <- which.max(sapply(cl$classification, length)) files <- cl$classification[[largest.cluster]] clust <- clusterHMMs(files) groups <- cutree(tree = clust$hclust, k = 5) heatmapGenomewideClusters(cutree = groups, cluster = FALSE)
## Get a list of HMMs and cluster them folder <- system.file("extdata", "primary-lung", "hmms", package="AneuFinderData") files <- list.files(folder, full.names=TRUE) cl <- clusterByQuality(files, G=5) heatmapGenomewideClusters(cl=cl) ## Plot sub-clones of the largest cluster largest.cluster <- which.max(sapply(cl$classification, length)) files <- cl$classification[[largest.cluster]] clust <- clusterHMMs(files) groups <- cutree(tree = clust$hclust, k = 5) heatmapGenomewideClusters(cutree = groups, cluster = FALSE)
HMM.findCNVs
classifies the binned read counts into several states which represent copy-number-variation.
HMM.findCNVs(binned.data, ID = NULL, eps = 0.01, init = "standard", max.time = -1, max.iter = -1, num.trials = 1, eps.try = NULL, num.threads = 1, count.cutoff.quantile = 0.999, strand = "*", states = c("zero-inflation", paste0(0:10, "-somy")), most.frequent.state = "2-somy", algorithm = "EM", initial.params = NULL, verbosity = 1)
HMM.findCNVs(binned.data, ID = NULL, eps = 0.01, init = "standard", max.time = -1, max.iter = -1, num.trials = 1, eps.try = NULL, num.threads = 1, count.cutoff.quantile = 0.999, strand = "*", states = c("zero-inflation", paste0(0:10, "-somy")), most.frequent.state = "2-somy", algorithm = "EM", initial.params = NULL, verbosity = 1)
binned.data |
A |
ID |
An identifier that will be used to identify this sample in various downstream functions. Could be the file name of the |
eps |
method-HMM: Convergence threshold for the Baum-Welch algorithm. |
init |
method-HMM: One of the following initialization procedures:
|
max.time |
method-HMM: The maximum running time in seconds for the Baum-Welch algorithm. If this time is reached, the Baum-Welch will terminate after the current iteration finishes. Set |
max.iter |
method-HMM: The maximum number of iterations for the Baum-Welch algorithm. Set |
num.trials |
method-HMM: The number of trials to find a fit where state |
eps.try |
method-HMM: If code num.trials is set to greater than 1, |
num.threads |
method-HMM: Number of threads to use. Setting this to >1 may give increased performance. |
count.cutoff.quantile |
method-HMM: A quantile between 0 and 1. Should be near 1. Read counts above this quantile will be set to the read count specified by this quantile. Filtering very high read counts increases the performance of the Baum-Welch fitting procedure. However, if your data contains very few peaks they might be filtered out. Set |
strand |
Find copy-numbers only for the specified strand. One of |
states |
method-HMM: A subset or all of |
most.frequent.state |
method-HMM: One of the states that were given in |
algorithm |
method-HMM: One of |
initial.params |
method-HMM: A |
verbosity |
method-HMM: Integer specifying the verbosity of printed messages. |
An aneuHMM
object.
Find hotspots of genomic events by using kernel density estimation.
hotspotter(breakpoints, bw, pval = 0.05, spacing.bp = 5000)
hotspotter(breakpoints, bw, pval = 0.05, spacing.bp = 5000)
breakpoints |
A list with |
bw |
Bandwidth used for kernel density estimation (see |
pval |
P-value cutoff for hotspots. |
spacing.bp |
Spacing of datapoints for KDE in basepairs. |
The hotspotter uses density
to perform a KDE. A p-value is calculated by comparing the density profile of the genomic events with the density profile of a randomly subsampled set of genomic events (bootstrapping).
A list of GRanges-class
objects containing 1) coordinates of hotspots and 2) p-values within the hotspot.
Aaron Taudt
Find hotspots of genomic events by using kernel density estimation.
hotspotter.variable(breakpoints, confint, pval = 0.05, spacing.bp = 5000)
hotspotter.variable(breakpoints, confint, pval = 0.05, spacing.bp = 5000)
breakpoints |
A list with |
confint |
Confidence interval that was used for breakpoint estimation. |
pval |
P-value cutoff for hotspots. |
spacing.bp |
Spacing of datapoints for KDE in basepairs. |
The hotspotter uses a gaussian kernel with variable bandwidth to perform a KDE. The bandwidth depends on the confidence intervals of the breakpoints. A p-value is calculated by comparing the density profile of the genomic events with the density profile of a randomly subsampled set of genomic events (bootstrapping).
A list of GRanges-class
objects containing 1) coordinates of hotspots and 2) p-values within the hotspot.
Aaron Taudt
This is a simple convenience function to read a bed(.gz)-file into a GRanges-class
object. The bed-file is expected to have the following fields: chromosome, start, end, name, score, strand
.
importBed(bedfile, skip = 0, chromosome.format = "NCBI")
importBed(bedfile, skip = 0, chromosome.format = "NCBI")
bedfile |
Filename of the bed or bed.gz file. |
skip |
Number of lines to skip at the beginning. |
chromosome.format |
Desired format of the chromosomes. Either 'NCBI' for (1,2,3 ...) or 'UCSC' for (chr1,chr2,chr3 ...). |
A GRanges-class
object with the contents of the bed-file.
Aaron Taudt
## Get an example BED file with single-cell-sequencing reads bedfile <- system.file("extdata", "KK150311_VI_07.bam.bed.gz", package="AneuFinderData") ## Import the file and skip the first 10 lines data <- importBed(bedfile, skip=10)
## Get an example BED file with single-cell-sequencing reads bedfile <- system.file("extdata", "KK150311_VI_07.bam.bed.gz", package="AneuFinderData") ## Import the file and skip the first 10 lines data <- importBed(bedfile, skip=10)
Initialize the state factor levels and distributions for the specified states.
initializeStates(states)
initializeStates(states)
states |
A subset of |
A list
with $labels, $distributions and $multiplicity values for the given states.
Computes measures for karyotype heterogeneity. See the Details section for how these measures are defined.
karyotypeMeasures(hmms, normalChromosomeNumbers = NULL, regions = NULL, exclude.regions = NULL)
karyotypeMeasures(hmms, normalChromosomeNumbers = NULL, regions = NULL, exclude.regions = NULL)
hmms |
A list with |
normalChromosomeNumbers |
A named integer vector or matrix with physiological copy numbers, where each element (vector) or column (matrix) corresponds to a chromosome. This is useful to specify male or female samples, e.g. |
regions |
A |
exclude.regions |
A |
We define as the vector of copy number states for each position. The number of HMMs is
. The measures are computed for each bin as follows:
, where P is the physiological number of chromosomes at that position.
A list
with two data.frame
s, containing the karyotype measures $genomewide and $per.chromosome. If region
was specified, a third list entry $regions will contain the regions with karyotype measures.
Aaron Taudt
### Example 1 ### ## Get results from a small-cell-lung-cancer lung.folder <- system.file("extdata", "primary-lung", "hmms", package="AneuFinderData") lung.files <- list.files(lung.folder, full.names=TRUE) ## Get results from the liver metastasis of the same patient liver.folder <- system.file("extdata", "metastasis-liver", "hmms", package="AneuFinderData") liver.files <- list.files(liver.folder, full.names=TRUE) ## Compare karyotype measures between the two cancers normal.chrom.numbers <- rep(2, 23) names(normal.chrom.numbers) <- c(1:22,'X') lung <- karyotypeMeasures(lung.files, normalChromosomeNumbers=normal.chrom.numbers) liver <- karyotypeMeasures(liver.files, normalChromosomeNumbers=normal.chrom.numbers) print(lung$genomewide) print(liver$genomewide) ### Example 2 ### ## Construct a matrix with physiological copy numbers for a mix of 5 male and 5 female samples normal.chrom.numbers <- matrix(2, nrow=10, ncol=24, dimnames=list(sample=c(paste('male', 1:5), paste('female', 6:10)), chromosome=c(1:22,'X','Y'))) normal.chrom.numbers[1:5,c('X','Y')] <- 1 normal.chrom.numbers[6:10,c('Y')] <- 0 print(normal.chrom.numbers) ### Example 3 ### ## Exclude artifact regions with high variance consensus <- consensusSegments(c(lung.files, liver.files)) variance <- apply(consensus$copy.number, 1, var) exclude.regions <- consensus[variance > quantile(variance, 0.999)] ## Compare karyotype measures between the two cancers normal.chrom.numbers <- rep(2, 23) names(normal.chrom.numbers) <- c(1:22,'X') lung <- karyotypeMeasures(lung.files, normalChromosomeNumbers=normal.chrom.numbers, exclude.regions = exclude.regions) liver <- karyotypeMeasures(liver.files, normalChromosomeNumbers=normal.chrom.numbers, exclude.regions = exclude.regions) print(lung$genomewide) print(liver$genomewide)
### Example 1 ### ## Get results from a small-cell-lung-cancer lung.folder <- system.file("extdata", "primary-lung", "hmms", package="AneuFinderData") lung.files <- list.files(lung.folder, full.names=TRUE) ## Get results from the liver metastasis of the same patient liver.folder <- system.file("extdata", "metastasis-liver", "hmms", package="AneuFinderData") liver.files <- list.files(liver.folder, full.names=TRUE) ## Compare karyotype measures between the two cancers normal.chrom.numbers <- rep(2, 23) names(normal.chrom.numbers) <- c(1:22,'X') lung <- karyotypeMeasures(lung.files, normalChromosomeNumbers=normal.chrom.numbers) liver <- karyotypeMeasures(liver.files, normalChromosomeNumbers=normal.chrom.numbers) print(lung$genomewide) print(liver$genomewide) ### Example 2 ### ## Construct a matrix with physiological copy numbers for a mix of 5 male and 5 female samples normal.chrom.numbers <- matrix(2, nrow=10, ncol=24, dimnames=list(sample=c(paste('male', 1:5), paste('female', 6:10)), chromosome=c(1:22,'X','Y'))) normal.chrom.numbers[1:5,c('X','Y')] <- 1 normal.chrom.numbers[6:10,c('Y')] <- 0 print(normal.chrom.numbers) ### Example 3 ### ## Exclude artifact regions with high variance consensus <- consensusSegments(c(lung.files, liver.files)) variance <- apply(consensus$copy.number, 1, var) exclude.regions <- consensus[variance > quantile(variance, 0.999)] ## Compare karyotype measures between the two cancers normal.chrom.numbers <- rep(2, 23) names(normal.chrom.numbers) <- c(1:22,'X') lung <- karyotypeMeasures(lung.files, normalChromosomeNumbers=normal.chrom.numbers, exclude.regions = exclude.regions) liver <- karyotypeMeasures(liver.files, normalChromosomeNumbers=normal.chrom.numbers, exclude.regions = exclude.regions) print(lung$genomewide) print(liver$genomewide)
Wrapper to load AneuFinder objects from file and check the class of the loaded objects.
loadFromFiles(files, check.class = c("GRanges", "GRangesList", "aneuHMM", "aneuBiHMM"))
loadFromFiles(files, check.class = c("GRanges", "GRangesList", "aneuHMM", "aneuBiHMM"))
files |
A list of |
check.class |
Any combination of |
A list of GRanges-class
, GRangesList
, aneuHMM
or aneuBiHMM
objects.
## Get some files that you want to load folder <- system.file("extdata", "primary-lung", "hmms", package="AneuFinderData") files <- list.files(folder, full.names=TRUE) ## Load and plot the first ten hmms <- loadFromFiles(files[1:10]) lapply(hmms, plot, type='profile')
## Get some files that you want to load folder <- system.file("extdata", "primary-lung", "hmms", package="AneuFinderData") files <- list.files(folder, full.names=TRUE) ## Load and plot the first ten hmms <- loadFromFiles(files[1:10]) lapply(hmms, plot, type='profile')
Merge strand libraries to generate a high-coverage Strand-seq library.
mergeStrandseqFiles(files, assembly, chromosomes = NULL, pairedEndReads = FALSE, min.mapq = 10, remove.duplicate.reads = TRUE, max.fragment.width = 1000)
mergeStrandseqFiles(files, assembly, chromosomes = NULL, pairedEndReads = FALSE, min.mapq = 10, remove.duplicate.reads = TRUE, max.fragment.width = 1000)
files |
A character vector with files with aligned reads. |
assembly |
Please see |
chromosomes |
If only a subset of the chromosomes should be imported, specify them here. |
pairedEndReads |
Set to |
min.mapq |
Minimum mapping quality when importing from BAM files. Set |
remove.duplicate.reads |
A logical indicating whether or not duplicate reads should be removed. |
max.fragment.width |
Maximum allowed fragment length. This is to filter out erroneously wrong fragments due to mapping errors of paired end reads. |
A GRanges-class
object with reads.
Perform a PCA for copy number profiles in aneuHMM
objects.
plot_pca(hmms, PC1 = 1, PC2 = 2, colorBy = NULL, plot = TRUE, exclude.regions = NULL)
plot_pca(hmms, PC1 = 1, PC2 = 2, colorBy = NULL, plot = TRUE, exclude.regions = NULL)
hmms |
A list of |
PC1 |
Integer specifying the first of the principal components to plot. |
PC2 |
Integer specifying the second of the principal components to plot. |
colorBy |
A character vector of the same length as |
plot |
Set to |
exclude.regions |
A |
A ggplot
object or a data.frame if plot=FALSE
.
## Get results from a small-cell-lung-cancer lung.folder <- system.file("extdata", "primary-lung", "hmms", package="AneuFinderData") lung.files <- list.files(lung.folder, full.names=TRUE) ## Get results from the liver metastasis of the same patient liver.folder <- system.file("extdata", "metastasis-liver", "hmms", package="AneuFinderData") liver.files <- list.files(liver.folder, full.names=TRUE) ## Plot the PCA classes <- c(rep('lung', length(lung.files)), rep('liver', length(liver.files))) labels <- c(paste('lung',1:length(lung.files)), paste('liver',1:length(liver.files))) plot_pca(c(lung.files, liver.files), colorBy=classes, PC1=2, PC2=4)
## Get results from a small-cell-lung-cancer lung.folder <- system.file("extdata", "primary-lung", "hmms", package="AneuFinderData") lung.files <- list.files(lung.folder, full.names=TRUE) ## Get results from the liver metastasis of the same patient liver.folder <- system.file("extdata", "metastasis-liver", "hmms", package="AneuFinderData") liver.files <- list.files(liver.folder, full.names=TRUE) ## Plot the PCA classes <- c(rep('lung', length(lung.files)), rep('liver', length(liver.files))) labels <- c(paste('lung',1:length(lung.files)), paste('liver',1:length(liver.files))) plot_pca(c(lung.files, liver.files), colorBy=classes, PC1=2, PC2=4)
aneuBiHMM
objectsMake different types of plots for aneuBiHMM
objects.
## S3 method for class 'aneuBiHMM' plot(x, type = "profile", ...)
## S3 method for class 'aneuBiHMM' plot(x, type = "profile", ...)
x |
An |
type |
Type of the plot, one of
|
... |
Additional arguments for the different plot types. |
A ggplot
object.
aneuHMM
objectsMake different types of plots for aneuHMM
objects.
## S3 method for class 'aneuHMM' plot(x, type = "profile", ...)
## S3 method for class 'aneuHMM' plot(x, type = "profile", ...)
x |
An |
type |
Type of the plot, one of
|
... |
Additional arguments for the different plot types. |
A ggplot
object.
Convenience function that loads and plots a AneuFinder object in one step.
## S3 method for class 'character' plot(x, ...)
## S3 method for class 'character' plot(x, ...)
x |
A filename that contains either |
... |
Additional arguments. |
A ggplot
object.
Make plots for binned read counts from binned.data
.
## S3 method for class 'GRanges' plot(x, type = "profile", ...)
## S3 method for class 'GRanges' plot(x, type = "profile", ...)
x |
A |
type |
Type of the plot, one of
|
... |
Additional arguments for the different plot types. |
A ggplot
object.
Make plots for binned read counts (list) from binned.data
.
## S3 method for class 'GRangesList' plot(x, type = "profile", ...)
## S3 method for class 'GRangesList' plot(x, type = "profile", ...)
x |
A |
type |
Type of the plot, one of
|
... |
Additional arguments for the different plot types. |
A ggplot
object.
Make heterogeneity vs. aneuploidy plots using individual chromosomes as datapoints.
plotHeterogeneity(hmms, hmms.list = NULL, normalChromosomeNumbers = NULL, plot = TRUE, regions = NULL, exclude.regions = NULL)
plotHeterogeneity(hmms, hmms.list = NULL, normalChromosomeNumbers = NULL, plot = TRUE, regions = NULL, exclude.regions = NULL)
hmms |
A list of |
hmms.list |
Alternative input for a faceted plot. A named list() of lists of |
normalChromosomeNumbers |
A named integer vector or matrix with physiological copy numbers, where each element (vector) or column (matrix) corresponds to a chromosome. This is useful to specify male or female samples, e.g. |
plot |
A logical indicating whether to plot or to return the underlying data.frame. |
regions |
A |
exclude.regions |
A |
A ggplot
object or a data.frame if plot=FALSE
.
### Example 1: A faceted plot of lung and liver cells ### ## Get results from a small-cell-lung-cancer lung.folder <- system.file("extdata", "primary-lung", "hmms", package="AneuFinderData") lung.files <- list.files(lung.folder, full.names=TRUE) ## Get results from the liver metastasis of the same patient liver.folder <- system.file("extdata", "metastasis-liver", "hmms", package="AneuFinderData") liver.files <- list.files(liver.folder, full.names=TRUE) ## Make heterogeneity plots plotHeterogeneity(hmms.list = list(lung=lung.files, liver=liver.files)) ### Example 2: Plot a mixture sample of male and female cells ### ## Get results from a small-cell-lung-cancer folder <- system.file("extdata", "primary-lung", "hmms", package="AneuFinderData") files <- list.files(lung.folder, full.names=TRUE) ## Construct a matrix with physiological copy numbers for a mix of 48 male and 48 female samples normal.chrom.numbers <- matrix(2, nrow=96, ncol=24, dimnames=list(sample=c(paste('male', 1:48), paste('female', 49:96)), chromosome=c(1:22,'X','Y'))) normal.chrom.numbers[1:48,c('X','Y')] <- 1 normal.chrom.numbers[49:96,c('Y')] <- 0 head(normal.chrom.numbers) ## Make heterogeneity plots plotHeterogeneity(hmms = files, normalChromosomeNumbers = normal.chrom.numbers) ### Example 3: A faceted plot of male lung and female liver cells ### ## Get results from a small-cell-lung-cancer lung.folder <- system.file("extdata", "primary-lung", "hmms", package="AneuFinderData") lung.files <- list.files(lung.folder, full.names=TRUE) ## Specify the physiological copy numbers chrom.numbers.lung <- c(rep(2, 22), 1, 1) names(chrom.numbers.lung) <- c(1:22, 'X', 'Y') print(chrom.numbers.lung) ## Get results from the liver metastasis of the same patient liver.folder <- system.file("extdata", "metastasis-liver", "hmms", package="AneuFinderData") liver.files <- list.files(liver.folder, full.names=TRUE) ## Specify the physiological copy numbers chrom.numbers.liver <- c(rep(2, 22), 2, 0) names(chrom.numbers.liver) <- c(1:22, 'X', 'Y') print(chrom.numbers.liver) ## Make heterogeneity plots plotHeterogeneity(hmms.list = list(lung=lung.files, liver=liver.files), normalChromosomeNumbers = list(chrom.numbers.lung, chrom.numbers.liver)) ### Example 4 ### ## Exclude artifact regions with high variance consensus <- consensusSegments(c(lung.files, liver.files)) variance <- apply(consensus$copy.number, 1, var) exclude.regions <- consensus[variance > quantile(variance, 0.999)] ## Make heterogeneity plots plotHeterogeneity(hmms.list = list(lung=lung.files, liver=liver.files), exclude.regions=exclude.regions)
### Example 1: A faceted plot of lung and liver cells ### ## Get results from a small-cell-lung-cancer lung.folder <- system.file("extdata", "primary-lung", "hmms", package="AneuFinderData") lung.files <- list.files(lung.folder, full.names=TRUE) ## Get results from the liver metastasis of the same patient liver.folder <- system.file("extdata", "metastasis-liver", "hmms", package="AneuFinderData") liver.files <- list.files(liver.folder, full.names=TRUE) ## Make heterogeneity plots plotHeterogeneity(hmms.list = list(lung=lung.files, liver=liver.files)) ### Example 2: Plot a mixture sample of male and female cells ### ## Get results from a small-cell-lung-cancer folder <- system.file("extdata", "primary-lung", "hmms", package="AneuFinderData") files <- list.files(lung.folder, full.names=TRUE) ## Construct a matrix with physiological copy numbers for a mix of 48 male and 48 female samples normal.chrom.numbers <- matrix(2, nrow=96, ncol=24, dimnames=list(sample=c(paste('male', 1:48), paste('female', 49:96)), chromosome=c(1:22,'X','Y'))) normal.chrom.numbers[1:48,c('X','Y')] <- 1 normal.chrom.numbers[49:96,c('Y')] <- 0 head(normal.chrom.numbers) ## Make heterogeneity plots plotHeterogeneity(hmms = files, normalChromosomeNumbers = normal.chrom.numbers) ### Example 3: A faceted plot of male lung and female liver cells ### ## Get results from a small-cell-lung-cancer lung.folder <- system.file("extdata", "primary-lung", "hmms", package="AneuFinderData") lung.files <- list.files(lung.folder, full.names=TRUE) ## Specify the physiological copy numbers chrom.numbers.lung <- c(rep(2, 22), 1, 1) names(chrom.numbers.lung) <- c(1:22, 'X', 'Y') print(chrom.numbers.lung) ## Get results from the liver metastasis of the same patient liver.folder <- system.file("extdata", "metastasis-liver", "hmms", package="AneuFinderData") liver.files <- list.files(liver.folder, full.names=TRUE) ## Specify the physiological copy numbers chrom.numbers.liver <- c(rep(2, 22), 2, 0) names(chrom.numbers.liver) <- c(1:22, 'X', 'Y') print(chrom.numbers.liver) ## Make heterogeneity plots plotHeterogeneity(hmms.list = list(lung=lung.files, liver=liver.files), normalChromosomeNumbers = list(chrom.numbers.lung, chrom.numbers.liver)) ### Example 4 ### ## Exclude artifact regions with high variance consensus <- consensusSegments(c(lung.files, liver.files)) variance <- apply(consensus$copy.number, 1, var) exclude.regions <- consensus[variance > quantile(variance, 0.999)] ## Make heterogeneity plots plotHeterogeneity(hmms.list = list(lung=lung.files, liver=liver.files), exclude.regions=exclude.regions)
Plot a histogram of binned read counts from with fitted mixture distributions from a aneuHMM
object.
plotHistogram(model)
plotHistogram(model)
model |
A |
A ggplot
object.
Plot a karyogram-like chromosome overview with read counts and CNV-state from a aneuHMM
object or binned.data
.
plotKaryogram(model, both.strands = FALSE, plot.breakpoints = TRUE, file = NULL)
plotKaryogram(model, both.strands = FALSE, plot.breakpoints = TRUE, file = NULL)
model |
A |
both.strands |
If |
plot.breakpoints |
Logical indicating whether breakpoints should be plotted. |
file |
A PDF file where the plot will be saved. |
A ggplot
object or NULL
if a file was specified.
Plot a profile with read counts and CNV-state from a aneuHMM
object or binned.data
.
plotProfile(model, both.strands = FALSE, plot.breakpoints = FALSE, file = NULL, normalize.counts = NULL)
plotProfile(model, both.strands = FALSE, plot.breakpoints = FALSE, file = NULL, normalize.counts = NULL)
model |
A |
both.strands |
If |
plot.breakpoints |
Logical indicating whether breakpoints should be plotted. |
file |
A PDF file where the plot will be saved. |
normalize.counts |
An character giving the copy number state to which to normalize the counts, e.g. '1-somy', '2-somy' etc. |
A ggplot
object or NULL
if a file was specified.
Print aneuBiHMM object
## S3 method for class 'aneuBiHMM' print(x, ...)
## S3 method for class 'aneuBiHMM' print(x, ...)
x |
An |
... |
Ignored. |
An invisible NULL
.
Print aneuHMM object
## S3 method for class 'aneuHMM' print(x, ...)
## S3 method for class 'aneuHMM' print(x, ...)
x |
An |
... |
Ignored. |
An invisible NULL
.
Calculate various quality control measures on binned read counts.
qc.spikiness(counts) qc.entropy(counts) qc.bhattacharyya(hmm) qc.sos(hmm)
qc.spikiness(counts) qc.entropy(counts) qc.bhattacharyya(hmm) qc.sos(hmm)
counts |
A vector of binned read counts. |
hmm |
An |
The Shannon entropy is defined as
, where
.
Spikyness is defined as .
A numeric.
qc.spikiness
: Calculate the spikiness of a library
qc.entropy
: Calculate the Shannon entropy of a library
qc.bhattacharyya
: Calculate the Bhattacharyya distance between the '1-somy' and '2-somy' distribution
qc.sos
: Sum-of-squares distance from the read counts to the fitted distributions
Aaron Taudt
Read an AneuFinder configuration file into a list structure. The configuration file has to be specified in INI format. R expressions can be used and will be evaluated.
readConfig(configfile)
readConfig(configfile)
configfile |
Path to the configuration file |
A list
with one entry for each element in configfile
.
Aaron Taudt
Refine breakpoints with confidence intervals from an initial estimate (from getBreakpoints
).
refineBreakpoints(model, fragments, breakpoints = model$breakpoints, confint = 0.99)
refineBreakpoints(model, fragments, breakpoints = model$breakpoints, confint = 0.99)
model |
An |
fragments |
A |
breakpoints |
A |
confint |
Desired confidence interval for breakpoints. |
Breakpoints are refined by shifting the breakpoint within its initial confidence interval read by read and maximizing the probability of observing the left-right read distribution.
An aneuBiHMM
with adjusted breakpoint coordinates and confidence interals, bins and segments.
## Get an example BED file with single-cell-sequencing reads bedfile <- system.file("extdata", "KK150311_VI_07.bam.bed.gz", package="AneuFinderData") ## Bin the data into bin size 1Mp readfragments <- binReads(bedfile, assembly='mm10', binsize=1e6, chromosomes=c(1:19,'X','Y'), reads.return=TRUE) binned <- binReads(bedfile, assembly='mm10', binsize=1e6, chromosomes=c(1:19,'X','Y')) ## Fit the Hidden Markov Model model <- findCNVs.strandseq(binned[[1]]) ## Add confidence intervals breakpoints <- getBreakpoints(model, readfragments) ## Refine breakpoints refined.model <- refineBreakpoints(model, readfragments, breakpoints)
## Get an example BED file with single-cell-sequencing reads bedfile <- system.file("extdata", "KK150311_VI_07.bam.bed.gz", package="AneuFinderData") ## Bin the data into bin size 1Mp readfragments <- binReads(bedfile, assembly='mm10', binsize=1e6, chromosomes=c(1:19,'X','Y'), reads.return=TRUE) binned <- binReads(bedfile, assembly='mm10', binsize=1e6, chromosomes=c(1:19,'X','Y')) ## Fit the Hidden Markov Model model <- findCNVs.strandseq(binned[[1]]) ## Add confidence intervals breakpoints <- getBreakpoints(model, readfragments) ## Refine breakpoints refined.model <- refineBreakpoints(model, readfragments, breakpoints)
Simulate single or paired end reads from any BSgenome-class object. These simulated reads can be mapped to the reference genome using any aligner to produce BAM files that can be used for mappability correction.
simulateReads(bsgenome, readLength, bamfile, file, pairedEndFragmentLength = NULL, every.X.bp = 500)
simulateReads(bsgenome, readLength, bamfile, file, pairedEndFragmentLength = NULL, every.X.bp = 500)
bsgenome |
A BSgenome-class object containing the sequence of the reference genome. |
readLength |
The length in base pairs of the simulated reads that are written to file. |
bamfile |
A BAM file. This file is used to estimate the distribution of Phred quality scores. |
file |
The filename that is written to disk. The ending .fastq.gz will be appended. |
pairedEndFragmentLength |
If this option is specified, paired end reads with length |
every.X.bp |
Stepsize for simulating reads. A read fragment will be simulated every X bp. |
Reads are simulated by splitting the genome into reads with the specified readLength
.
A fastq.gz file is written to disk.
Aaron Taudt
## Get an example BAM file with single-cell-sequencing reads bamfile <- system.file("extdata", "BB150803_IV_074.bam", package="AneuFinderData") ## Simulate 51bp reads for at a distance of every 5000bp if (require(BSgenome.Mmusculus.UCSC.mm10)) { simulateReads(BSgenome.Mmusculus.UCSC.mm10, bamfile=bamfile, readLength=51, file=tempfile(), every.X.bp=5000) }
## Get an example BAM file with single-cell-sequencing reads bamfile <- system.file("extdata", "BB150803_IV_074.bam", package="AneuFinderData") ## Simulate 51bp reads for at a distance of every 5000bp if (require(BSgenome.Mmusculus.UCSC.mm10)) { simulateReads(BSgenome.Mmusculus.UCSC.mm10, bamfile=bamfile, readLength=51, file=tempfile(), every.X.bp=5000) }
Get the IDs of models that have a certain CNV profile. The result will be TRUE
for models that overlap all specified ranges in profile
by at least one base pair with the correct state.
subsetByCNVprofile(hmms, profile)
subsetByCNVprofile(hmms, profile)
hmms |
A list of |
profile |
A |
A named logical vector with TRUE
for all models that are concordant with the given profile
.
## Get results from a small-cell-lung-cancer lung.folder <- system.file("extdata", "primary-lung", "hmms", package="AneuFinderData") lung.files <- list.files(lung.folder, full.names=TRUE) ## Get all files that have a 3-somy on chromosome 1 and 4-somy on chromosome 2 profile <- GRanges(seqnames=c('1','2'), ranges=IRanges(start=c(1,1), end=c(195471971,182113224)), expected.state=c('3-somy','4-somy')) ids <- subsetByCNVprofile(lung.files, profile) print(which(ids))
## Get results from a small-cell-lung-cancer lung.folder <- system.file("extdata", "primary-lung", "hmms", package="AneuFinderData") lung.files <- list.files(lung.folder, full.names=TRUE) ## Get all files that have a 3-somy on chromosome 1 and 4-somy on chromosome 2 profile <- GRanges(seqnames=c('1','2'), ranges=IRanges(start=c(1,1), end=c(195471971,182113224)), expected.state=c('3-somy','4-somy')) ids <- subsetByCNVprofile(lung.files, profile) print(which(ids))
Add two columns with transformed genomic coordinates to the GRanges-class
object. This is useful for making genomewide plots.
transCoord(gr)
transCoord(gr)
gr |
A |
The input GRanges-class
with two additional metadata columns 'start.genome' and 'end.genome'.
Make variable-width bins based on a reference BAM file. This can be a simulated file (produced by simulateReads
and aligned with your favourite aligner) or a real reference.
variableWidthBins(reads, binsizes, stepsizes = NULL, chromosomes = NULL)
variableWidthBins(reads, binsizes, stepsizes = NULL, chromosomes = NULL)
reads |
A |
binsizes |
A vector with binsizes. Resulting bins will be close to the specified binsizes. |
stepsizes |
A vector of step sizes in base pairs, the same length as |
chromosomes |
A subset of chromosomes for which the bins are generated. |
Variable-width bins are produced by first binning the reference BAM file with fixed-width bins and selecting the desired number of reads per bin as the (non-zero) maximum of the histogram. A new set of bins is then generated such that every bin contains the desired number of reads.
A list()
of GRanges-class
objects with variable-width bins. If stepsizes
is specified, a list()
of GRangesList
objects with one entry per step.
Aaron Taudt
## Get an example BED file with single-cell-sequencing reads bedfile <- system.file("extdata", "KK150311_VI_07.bam.bed.gz", package="AneuFinderData") ## Read the file into a GRanges object reads <- bed2GRanges(bedfile, assembly='mm10', chromosomes=c(1:19,'X','Y'), min.mapq=10, remove.duplicate.reads=TRUE) ## Make variable-width bins of size 500kb and 1Mb bins <- variableWidthBins(reads, binsizes=c(5e5,1e6)) ## Plot the distribution of binsizes hist(width(bins[['binsize_1e+06']]), breaks=50)
## Get an example BED file with single-cell-sequencing reads bedfile <- system.file("extdata", "KK150311_VI_07.bam.bed.gz", package="AneuFinderData") ## Read the file into a GRanges object reads <- bed2GRanges(bedfile, assembly='mm10', chromosomes=c(1:19,'X','Y'), min.mapq=10, remove.duplicate.reads=TRUE) ## Make variable-width bins of size 500kb and 1Mb bins <- variableWidthBins(reads, binsizes=c(5e5,1e6)) ## Plot the distribution of binsizes hist(width(bins[['binsize_1e+06']]), breaks=50)
Write an AneuFinder configuration file from a list structure.
writeConfig(conf, configfile)
writeConfig(conf, configfile)
conf |
A list structure with parameter values. Each entry will be written in one line. |
configfile |
Filename of the outputfile. |
NULL
Aaron Taudt
Density, distribution function, quantile function and random
generation for the zero-inflated negative binomial distribution with parameters
w
, size
and prob
.
dzinbinom(x, w, size, prob, mu) pzinbinom(q, w, size, prob, mu, lower.tail = TRUE) qzinbinom(p, w, size, prob, mu, lower.tail = TRUE) rzinbinom(n, w, size, prob, mu)
dzinbinom(x, w, size, prob, mu) pzinbinom(q, w, size, prob, mu, lower.tail = TRUE) qzinbinom(p, w, size, prob, mu, lower.tail = TRUE) rzinbinom(n, w, size, prob, mu)
x |
Vector of (non-negative integer) quantiles. |
w |
Weight of the zero-inflation. |
size |
Target for number of successful trials, or dispersion parameter (the shape parameter of the gamma mixing distribution). Must be strictly positive, need not be integer. |
prob |
Probability of success in each trial. |
mu |
Alternative parametrization via mean: see ‘Details’. |
q |
Vector of quantiles. |
lower.tail |
logical; if TRUE (default), probabilities are |
p |
Vector of probabilities. |
n |
number of observations. If |
The zero-inflated negative binomial distribution with size
and
prob
has density
for ,
,
and
.
for ,
,
and
.
dzinbinom gives the density, pzinbinom gives the distribution function, qzinbinom gives the quantile function, and rzinbinom generates random deviates.
dzinbinom
: gives the density
pzinbinom
: gives the cumulative distribution function
qzinbinom
: gives the quantile function
rzinbinom
: random number generation
Matthias Heinig, Aaron Taudt
Distributions for standard distributions, including
dbinom
for the binomial, dnbinom
for the negative binomial, dpois
for the
Poisson and dgeom
for the geometric distribution, which
is a special case of the negative binomial.