Title: | Mutational Signature Estimation for Single Samples |
---|---|
Description: | Single sample estimation of exposure to mutational signatures. Exposures to known mutational signatures are estimated for single samples, based on quadratic programming algorithms. Bootstrapping the input mutational catalogues provides estimations on the stability of these exposures. The effect of the sequence composition of mutational context can be taken into account by normalising the catalogues. |
Authors: | Franziska Schumann <[email protected]> |
Maintainer: | Franziska Schumann <[email protected]> |
License: | GPL-3 |
Version: | 1.21.0 |
Built: | 2024-11-30 04:51:35 UTC |
Source: | https://github.com/bioc/SigsPack |
Bootstraps a given mutational catalogue by replicating samples from the original catalogue's distribution of mutational features. The output can be input to signature_exposure.
bootstrap_mut_catalogues(n, original, m = NULL)
bootstrap_mut_catalogues(n, original, m = NULL)
n |
Amount of bootstrapped replicates that are created (they will by default have the same amount of mutations as the original catalogue) |
original |
Mutational catalogue (matrix) of a sample that is taken as the distribution from which the replicates are sampled |
m |
Amount of mutations the replicates are supposed to have (e.g. if this differs from the original or if the original is provided as probabilities instead of total counts) |
matrix containing the mutational catalogues of the replicates
data(cosmicSigs) reps <- bootstrap_mut_catalogues(n = 150, original = create_mut_catalogues( 10, 500)[["catalogues"]][,1])
data(cosmicSigs) reps <- bootstrap_mut_catalogues(n = 150, original = create_mut_catalogues( 10, 500)[["catalogues"]][,1])
COSMIC Signature Profiles
cosmicSigs
cosmicSigs
A matrix with 30 rows and 96 columns containing the signature profiles of the 30 consensus signatures as reported on COSMIC.
The profile of signature N representing the relative proportions of mutations generated by each signature based on the trinucleotide frequencies of hte reference human genome version GRCh37.
#'
https://cancer.sanger.ac.uk/cancergenome/assets/signatures_probabilities.txt
Creates mutational catalogues from chosen mutational signature profiles to gain simualted data sampled from a distirbution with with known signature contribution
create_mut_catalogues(n, m, P = get(utils::data("cosmicSigs", package = "SigsPack")), sig_set = NULL, c_exposure = NULL)
create_mut_catalogues(n, m, P = get(utils::data("cosmicSigs", package = "SigsPack")), sig_set = NULL, c_exposure = NULL)
n |
Amount of mutational catalogues that will be created |
m |
Amount of mutations that each catalogues will have |
P |
Matrix (f x k) containing the signature profiles of k signatures which will be used to create the catalogues k is the amount of signature profiles that P contains f is the amount of features that the profiles are defined on (default: COSMIC signature matrix 96 x 30) (optional) |
sig_set |
Numeric vector containing the index of the columns from P (which are the signature profiles) that will be used to create the catalogues (optional). Defaults to all columns of P. |
c_exposure |
Numeric vector specifying the contribution of each signature to the distribution each sample is drawn from. Samples will have an appr. exposure of c_exposure[idx] to signature sig_set[idx] (if sum(c_exposure)==1). (optional) Default: random exposure. |
List containing a matrix (f x n) with the simulated catalogues and a matrix (k x n) detailing the signature exposure of each catalogue
data(cosmicSigs) sim_data <- create_mut_catalogues(10, 300, cosmicSigs, c(2,6,15,27)) sim_data <- create_mut_catalogues(1000, 500) sim_data <- create_mut_catalogues(1, 500, sig_set = c(1,4,29), c_exposure = c(0.25, 0.65, 0.1))
data(cosmicSigs) sim_data <- create_mut_catalogues(10, 300, cosmicSigs, c(2,6,15,27)) sim_data <- create_mut_catalogues(1000, 500) sim_data <- create_mut_catalogues(1, 500, sig_set = c(1,4,29), c_exposure = c(0.25, 0.65, 0.1))
This function is taken from the package 'SignatureEstimation' by Xiaoqing Huang and Damian Wojtowicz (source: https://www.ncbi.nlm.nih.gov/CBBresearch/Przytycka/index.cgi#signatureestimation) The function allows to get the optimal solution by using dual method to solve the quadratic programming problem.
decomposeQP(m, P, ...)
decomposeQP(m, P, ...)
m |
observed tumor profile vector for a single patient/sample, 96 by 1. m is normalized. |
P |
signature profile matrix, 96 by N (N = # signatures, COSMIC: N=30) |
... |
control parameter that can be passed into the solve.QP function |
matrix containing estimated signature exposures
data(cosmicSigs) mut_cat<- (create_mut_catalogues(1,500)[['catalogues']])/500 decomposeQP(mut_cat, cosmicSigs)
data(cosmicSigs) mut_cat<- (create_mut_catalogues(1,500)[['catalogues']])/500 decomposeQP(mut_cat, cosmicSigs)
Extracts the frequencies of the tri-nucleotide contexts in a given region of the genome. These frequencies are needed to normalize a mutational catalogue. The output can be input to normalize().
get_context_freq(genome, region = NULL)
get_context_freq(genome, region = NULL)
genome |
a BSgenome object |
region |
a GRanges object, path, URL, connection or BEDFile object. |
matrix containing the frequencies of the trinucleotide contexts
gr<-GenomicRanges::GRanges(seqnames=c("chr1"), ranges=IRanges::IRanges(start=c(100000),end=c(1000000)), strand=c("+")) get_context_freq(BSgenome.Hsapiens.UCSC.hg19::BSgenome.Hsapiens.UCSC.hg19, gr) get_context_freq(BSgenome.Hsapiens.UCSC.hg19::BSgenome.Hsapiens.UCSC.hg19) ## Not run: get_context_freq(BSgenome.Hsapiens.UCSC.hg19::BSgenome.Hsapiens.UCSC.hg19, 'example.bed') ## End(Not run)
gr<-GenomicRanges::GRanges(seqnames=c("chr1"), ranges=IRanges::IRanges(start=c(100000),end=c(1000000)), strand=c("+")) get_context_freq(BSgenome.Hsapiens.UCSC.hg19::BSgenome.Hsapiens.UCSC.hg19, gr) get_context_freq(BSgenome.Hsapiens.UCSC.hg19::BSgenome.Hsapiens.UCSC.hg19) ## Not run: get_context_freq(BSgenome.Hsapiens.UCSC.hg19::BSgenome.Hsapiens.UCSC.hg19, 'example.bed') ## End(Not run)
Trinuecleotide frequencies of the human reference genome hg19
hg19context_freq
hg19context_freq
A numeric vector with 32 elements
The frequency of a certain trinucleotide context in the genome.
BSgenome.Hsapiens.UCSC.hg19
Normalizes the catalogues to a target distribution (e.g. to match the distribution of the reference signatures).
normalize(mut_cat, source_context, target_context = get(utils::data("hg19context_freq", package = "SigsPack")))
normalize(mut_cat, source_context, target_context = get(utils::data("hg19context_freq", package = "SigsPack")))
mut_cat |
mutational catalogues (96 by n, n being the amounts of catalogues) that will be normalized. The tri-nucleotide contexts are expected to be in the default lexicographical order (see simulated data or cosmicSigs) |
source_context |
Distribution of tri-nucleotides in the source region. |
target_context |
Distribution of tri-nucleotides in the target region. Defaults to the context frequencies of BSgenome.Hsapiens.UCSC.hg19 since that corresponds to the COSMIC signatures |
mutational catalogues (96 by n, n being the amounts of catalogues) normalized to match the target distribution (context)
The output from get_context_freq() can be used as input to this function
# this is a toy example: #create mutational catalogue: sim_data <- create_mut_catalogues(1, 500)[['catalogues']] # get trinucleotide frequencies for the genome: genome_context <- get_context_freq(BSgenome.Hsapiens.UCSC.hg19::BSgenome.Hsapiens.UCSC.hg19) #get trinucleotide frequencies for a specific region: gr<-GenomicRanges::GRanges(seqnames=c("chr1"), ranges=IRanges::IRanges(start=c(100000),end=c(1000000)), strand=c("+")) region_context<-get_context_freq(BSgenome.Hsapiens.UCSC.hg19::BSgenome.Hsapiens.UCSC.hg19, gr) #normalize data: normalized_mut_cat <- normalize(sim_data, region_context, genome_context) ## Not run: # get the tri-nucleotide distribution of an exome region exome_contexts <- get_context_freq(BSgenome.Hsapiens.UCSC.hg19::BSgenome.Hsapiens.UCSC.hg19, 'example_exome.bed') # normalize the mutational catalogue to match the COSMIC signatures normalized_mut_cat <- normalize(mut_cat, exome_contexts, hg19context_freq) ## End(Not run)
# this is a toy example: #create mutational catalogue: sim_data <- create_mut_catalogues(1, 500)[['catalogues']] # get trinucleotide frequencies for the genome: genome_context <- get_context_freq(BSgenome.Hsapiens.UCSC.hg19::BSgenome.Hsapiens.UCSC.hg19) #get trinucleotide frequencies for a specific region: gr<-GenomicRanges::GRanges(seqnames=c("chr1"), ranges=IRanges::IRanges(start=c(100000),end=c(1000000)), strand=c("+")) region_context<-get_context_freq(BSgenome.Hsapiens.UCSC.hg19::BSgenome.Hsapiens.UCSC.hg19, gr) #normalize data: normalized_mut_cat <- normalize(sim_data, region_context, genome_context) ## Not run: # get the tri-nucleotide distribution of an exome region exome_contexts <- get_context_freq(BSgenome.Hsapiens.UCSC.hg19::BSgenome.Hsapiens.UCSC.hg19, 'example_exome.bed') # normalize the mutational catalogue to match the COSMIC signatures normalized_mut_cat <- normalize(mut_cat, exome_contexts, hg19context_freq) ## End(Not run)
Creates a boxplot illustrating the results of the signature estimation for several mutational catalogues (e.g. bootstrapped re-samples or a cohort). The plot shows the distribution of estimated signature exposure for all the catalogues, highlighting the one of the original mutational catalogue if one is provided.
plot_bootstrapped_exposure(bootstrapped_exposure, original_estimation = NULL, title = NULL, box_col = NULL, point_col = NULL, sig_names = NULL, sample_names = NULL)
plot_bootstrapped_exposure(bootstrapped_exposure, original_estimation = NULL, title = NULL, box_col = NULL, point_col = NULL, sig_names = NULL, sample_names = NULL)
bootstrapped_exposure |
matrix (n by k) containing the signature exposure of several mutational catalogues (bootstrapped re-samples) n is the amount of re-samples k is the amount of signature profiles that P contains |
original_estimation |
matrix (n by k) containing the signature exposure of one or several mutational catalogues (typically, the original sample) that will be diplayed on top of the boxplots |
title |
character, title of the plot |
box_col |
color option of the boxplot |
point_col |
color option of the points that indicate the exposure of the original profile(s). Should be a vector, if several original profiles are plotted |
sig_names |
character vector, names of the signatures to be displayed as x-axis labels |
sample_names |
character vector, names of the original profile(s) |
Diplays a boxplot
The function can of course also be used to plot the distribution of estimated signature exposures in a cohort instead of one bootstrapped sample.
# prepare input data(cosmicSigs) mut_cat <- create_mut_catalogues(4,400) exposures <- signature_exposure(bootstrap_mut_catalogues( 1000, mut_cat[['catalogues']][,1]))[['exposures']] original_exposure <- signature_exposure(mut_cat[['catalogues']])[['exposures']] plot_bootstrapped_exposure(exposures, as.matrix(original_exposure[,1])) plot_bootstrapped_exposure(exposures, as.matrix(original_exposure), title='Example', box_col='grey')
# prepare input data(cosmicSigs) mut_cat <- create_mut_catalogues(4,400) exposures <- signature_exposure(bootstrap_mut_catalogues( 1000, mut_cat[['catalogues']][,1]))[['exposures']] original_exposure <- signature_exposure(mut_cat[['catalogues']])[['exposures']] plot_bootstrapped_exposure(exposures, as.matrix(original_exposure[,1])) plot_bootstrapped_exposure(exposures, as.matrix(original_exposure), title='Example', box_col='grey')
Estimates the signature exposure of a mutational catalogue by reconstructing it from a chosen set of signatures, by default, the used method is quadratic programming
signature_exposure(mut_cat, P = get(utils::data("cosmicSigs", package = "SigsPack")), sig_set = NULL, FUN = decomposeQP, ...)
signature_exposure(mut_cat, P = get(utils::data("cosmicSigs", package = "SigsPack")), sig_set = NULL, FUN = decomposeQP, ...)
mut_cat |
matrix (f by n) conatining one or several mutational catalogues of samples whose signature exposures are to be estimated f is the amount of features that the profiles are defined on n is the number of catalogues |
P |
Matrix (f by k) containing the signature profiles of k signatures whose exposure is to be found k is the amount of signature profiles that P contains f is the amount of features that the profiles are defined on (default: COSMIC signature matrix 96 x 30) |
sig_set |
Numeric vector containing the index of the columns from the signature matrix (which are the signature profiles) that will be used reconstruct the mutational catalogue. |
FUN |
Function to estimate the signature exposure. Default: Quadratic programming (P has to be of full rank to use this method) |
... |
control parameters that will be passed to FUN |
List of the estimated signature exposure, the reconstructed profile of the sample, the cosine similarity between the two and the error
data(cosmicSigs) signature_exposure(create_mut_catalogues(10,500)[['catalogues']]) signature_exposure(create_mut_catalogues(10,500)[['catalogues']], sig_set = c(2,7,16,28,30)) signature_exposure(as.matrix(create_mut_catalogues(10,500)[['catalogues']][,1]))
data(cosmicSigs) signature_exposure(create_mut_catalogues(10,500)[['catalogues']]) signature_exposure(create_mut_catalogues(10,500)[['catalogues']], sig_set = c(2,7,16,28,30)) signature_exposure(as.matrix(create_mut_catalogues(10,500)[['catalogues']][,1]))
COSMIC v3 whole genome SBS Signature Profiles
sigProfiler20190522
sigProfiler20190522
A matrix with 96 rows and 67 columns containing the signature profiles of the 67 consensus signatures as reported on COSMIC version 3.
The profile of signature SBS<N> representing the relative proportions of mutations generated by each signature based on the trinucleotide frequencies of the reference human genome version GRCh37. The signature profiles have been normalised for mutational catalogues collected on WGS data.
https://cancer.sanger.ac.uk/cosmic/signatures/SBS/
COSMIC v3 exome SBS Signature Profiles
sigProfilerExome
sigProfilerExome
A matrix with 96 rows and 67 columns containing the signature profiles of the 67 consensus signatures as reported on COSMIC version 3.
The profile of signature SBS<N> representing the relative proportions of mutations generated by each signature based on the trinucleotide frequencies of the reference human genome version GRCh37. The signature profiles have been normalised for mutational catalogues collected on exome datasets.
https://cancer.sanger.ac.uk/cosmic/signatures/SBS/
Bootstrapps a mutational catalogue and details the results about the signature estimation in a table and a boxplot
summarize_exposures(mut_cat, P = get(utils::data("cosmicSigs", package = "SigsPack")), plotting = TRUE, m = NULL)
summarize_exposures(mut_cat, P = get(utils::data("cosmicSigs", package = "SigsPack")), plotting = TRUE, m = NULL)
mut_cat |
mutational catalogue (96 by 1) |
P |
Matrix (f x k) containing the signature profiles of k signatures whose exposure is to be found k is the amount of signature profiles that P contains f is the amount of features that the profiles are defined on (default: COSMIC signature matrix 96 by 30) |
plotting |
boolean, TRUE by default, if True, a boxplot showing the distribution of estimated signature exposure for all the re-samples, highlighting the one of the original mutational catalogue, thus providing insights on the reliability of the estimates |
m |
numeric, amount of mutations in the profile (needed for bootstrapping in case mut_cat doesn't contain absolute counts.) |
table (k by 6) containing statistics for each signature about the estimated exposure over 1000 bootstrapped resamples - original exposure estimated exposure of the original mutational profile - min minimum estimated exposure to this signature - 1. quartile - median - 3. quartile - max maximum estimated exposure to this signature
if the plotting option is chosen, a boxplot will also be displayed
summarize_exposures(create_mut_catalogues(10,500)[['catalogues']][,1], plotting=FALSE)
summarize_exposures(create_mut_catalogues(10,500)[['catalogues']][,1], plotting=FALSE)
Creates a matrix containing the mutational catalogue from a vcf file or object. The result can be input to the analysis functions of this package.
vcf2mut_cat(vcf, genome, name = NULL, seqs = NULL)
vcf2mut_cat(vcf, genome, name = NULL, seqs = NULL)
vcf |
*.vcf file or a vcf object containing variant calling data for one patient |
genome |
a BSgenome object corresponding to the genome the variants were called on |
name |
optional, a sample name |
seqs |
optional, a character vector containing the names of the sequences that are to be included in the mutational profile. If none is given everything will we included |
mutational catalogue (matrix) of a patient containing SNV absolute counts (in the 96 trinucleotide context) format: 1 by 96
The execution can take some time, depending on the size of the vcf
## Not run: vcf2mut_cat('test.vcf', BSgenome.Hsapiens.UCSC.hg19::BSgenome.Hsapiens.UCSC.hg19) ## End(Not run)
## Not run: vcf2mut_cat('test.vcf', BSgenome.Hsapiens.UCSC.hg19::BSgenome.Hsapiens.UCSC.hg19) ## End(Not run)