Title: | Mutational Signature Comprehensive Analysis Toolkit |
---|---|
Description: | Mutational signatures are carcinogenic exposures or aberrant cellular processes that can cause alterations to the genome. We created musicatk (MUtational SIgnature Comprehensive Analysis ToolKit) to address shortcomings in versatility and ease of use in other pre-existing computational tools. Although many different types of mutational data have been generated, current software packages do not have a flexible framework to allow users to mix and match different types of mutations in the mutational signature inference process. Musicatk enables users to count and combine multiple mutation types, including SBS, DBS, and indels. Musicatk calculates replication strand, transcription strand and combinations of these features along with discovery from unique and proprietary genomic feature associated with any mutation type. Musicatk also implements several methods for discovery of new signatures as well as methods to infer exposure given an existing set of signatures. Musicatk provides functions for visualization and downstream exploratory analysis including the ability to compare signatures between cohorts and find matching signatures in COSMIC V2 or COSMIC V3. |
Authors: | Aaron Chevalier [aut] (0000-0002-3968-9250),
Natasha Gurevich [aut] (0000-0002-0747-6840),
Tao Guo [aut] (0009-0005-8960-9203),
Joshua D. Campbell [aut, cre] |
Maintainer: | Joshua D. Campbell <[email protected]> |
License: | LGPL-3 |
Version: | 1.15.1 |
Built: | 2024-07-17 19:44:10 UTC |
Source: | https://github.com/bioc/musicatk |
Uses a genome object to find context and add it to the variant table
add_flank_to_variants( musica, g, flank_start, flank_end, build_table = TRUE, overwrite = FALSE )
add_flank_to_variants( musica, g, flank_start, flank_end, build_table = TRUE, overwrite = FALSE )
musica |
Input samples |
g |
A BSgenome object indicating which genome reference the variants and their coordinates were derived from. |
flank_start |
Start of flank area to add, can be positive or negative |
flank_end |
End of flank area to add, can be positive or negative |
build_table |
Automatically build a table using the annotation and add |
overwrite |
Overwrite existing count table |
None it to the musica
data(musica_sbs96_tiny) g <- select_genome("19") add_flank_to_variants(musica_sbs96_tiny, g, 1, 2) add_flank_to_variants(musica_sbs96_tiny, g, -2, -1)
data(musica_sbs96_tiny) g <- select_genome("19") add_flank_to_variants(musica_sbs96_tiny, g, 1, 2) add_flank_to_variants(musica_sbs96_tiny, g, -2, -1)
Add replication strand annotation to SBS variants based on bedgraph file
annotate_replication_strand(musica, rep_range, build_table = TRUE)
annotate_replication_strand(musica, rep_range, build_table = TRUE)
musica |
A |
rep_range |
A GRanges object with replication timing as metadata |
build_table |
Automatically build a table from this annotation |
None
data(musica) data(rep_range) annotate_replication_strand(musica, rep_range)
data(musica) data(rep_range) annotate_replication_strand(musica, rep_range)
Add transcript strand annotation to SBS variants (defined in genes only)
annotate_transcript_strand(musica, genome_build, build_table = TRUE)
annotate_transcript_strand(musica, genome_build, build_table = TRUE)
musica |
A |
genome_build |
Which genome build to use: hg19, hg38, or a custom TxDb object |
build_table |
Automatically build a table from this annotation |
None
data(musica) annotate_transcript_strand(musica, 19)
data(musica) annotate_transcript_strand(musica, 19)
Adds an annotation to the input musica's variant table with length of each variant
annotate_variant_length(musica)
annotate_variant_length(musica)
musica |
Input samples |
None
data(musica) annotate_variant_length(musica) musica
data(musica) annotate_variant_length(musica) musica
Annotate variants with variant type ("SBS", "INS", "DEl", "DBS")
annotate_variant_type(musica)
annotate_variant_type(musica)
musica |
A |
None
data(musica) annotate_variant_type(musica)
data(musica) annotate_variant_type(musica)
Automatic filtering of signatures for exposure prediction gridded across specific annotation
auto_predict_grid( musica, table_name, signature_res, algorithm, sample_annotation = NULL, min_exists = 0.05, proportion_samples = 0.25, rare_exposure = 0.4, verbose = TRUE, combine_res = TRUE )
auto_predict_grid( musica, table_name, signature_res, algorithm, sample_annotation = NULL, min_exists = 0.05, proportion_samples = 0.25, rare_exposure = 0.4, verbose = TRUE, combine_res = TRUE )
musica |
Input samples to predit signature weights |
table_name |
Name of table used for posterior prediction (e.g. SBS96) |
signature_res |
Signatures to automatically subset from for prediction |
algorithm |
Algorithm to use for prediction. Choose from "lda_posterior", and decompTumor2Sig |
sample_annotation |
Annotation to grid across, if none given, prediction subsetting on all samples together |
min_exists |
Threshold to consider a signature active in a sample |
proportion_samples |
Threshold of samples to consider a signature active in the cohort |
rare_exposure |
A sample will be considered active in the cohort if at least one sample has more than this threshold proportion |
verbose |
Print current annotation value being predicted on |
combine_res |
Automatically combines a list of annotation results into a single result object with zero exposure values for signatures not found in a given annotation's set of samples |
A list of results, one per unique annotation value, if no annotation value is given, returns a single result for all samples, or combines into a single result if combines_res = TRUE
data(musica_annot) data(cosmic_v2_sigs) auto_predict_grid(musica = musica_annot, table_name = "SBS96", signature_res = cosmic_v2_sigs, algorithm = "lda", sample_annotation = "Tumor_Subtypes") auto_predict_grid(musica_annot, "SBS96", cosmic_v2_sigs, "lda")
data(musica_annot) data(cosmic_v2_sigs) auto_predict_grid(musica = musica_annot, table_name = "SBS96", signature_res = cosmic_v2_sigs, algorithm = "lda", sample_annotation = "Tumor_Subtypes") auto_predict_grid(musica_annot, "SBS96", cosmic_v2_sigs, "lda")
Builds a custom table from specified user variants
build_custom_table( musica, variant_annotation, name, description = character(), data_factor = NA, annotation_df = NULL, features = NULL, type = NULL, color_variable = NULL, color_mapping = NULL, return_instead = FALSE, overwrite = FALSE )
build_custom_table( musica, variant_annotation, name, description = character(), data_factor = NA, annotation_df = NULL, features = NULL, type = NULL, color_variable = NULL, color_mapping = NULL, return_instead = FALSE, overwrite = FALSE )
musica |
A |
variant_annotation |
User column to use for building table |
name |
Table name to refer to (must be unique) |
description |
Optional description of the table content |
data_factor |
Full set of table values, in case some are missing from the data. If NA, a superset of all available unique data values will be used |
annotation_df |
A data.frame of annotations to use for plotting |
features |
A data.frame of the input data from which the count table will be built |
type |
The type of data/mutation in each feature as an Rle object |
color_variable |
The name of the column of annotation_df used for the coloring in plots |
color_mapping |
The mapping from the values in the selected color_variable column to color values for plotting |
return_instead |
Instead of adding to musica object, return the created table |
overwrite |
Overwrite existing count table |
If return_instead = TRUE then the created table object is returned, otherwise the table object is automatically added to the musica's count_tables list and nothing is returned
data(musica) annotate_transcript_strand(musica, "19", build_table = FALSE) build_custom_table(musica, "Transcript_Strand", "Transcript_Strand", data_factor = factor(c("T", "U")))
data(musica) annotate_transcript_strand(musica, "19", build_table = FALSE) build_custom_table(musica, "Transcript_Strand", "Transcript_Strand", data_factor = factor(c("T", "U")))
Generates count tables for different mutation type schemas which can be
used as input to the mutational signature discovery or prediction functions.
"SBS96"
generates a table for single base substitutions following the
standard 96 mutation types derived from the trinucleotide context.
"SBS192"
is the 96 mutation type schema with the addition of
transcriptional strand or replication strand information added to each base.
"DBS"
generates a table for the double base substitution schema
used in COSMIC V3. "Indel"
generates a table for insertions and
deletions following the schema used in COSMIC V3.
build_standard_table( musica, g, table_name, strand_type = NULL, overwrite = FALSE, verbose = FALSE )
build_standard_table( musica, g, table_name, strand_type = NULL, overwrite = FALSE, verbose = FALSE )
musica |
A |
g |
A BSgenome object indicating which genome reference the variants and their coordinates were derived from. |
table_name |
Name of standard table to build. One of |
strand_type |
Strand type to use in SBS192 schema. One of
|
overwrite |
If |
verbose |
Show progress bar for processed samples |
No object will be returned. The count tables will be automatically
added to the musica
object.
g <- select_genome("19") data(musica) build_standard_table(musica, g, "SBS96", overwrite = TRUE) data(musica) annotate_transcript_strand(musica, "19") build_standard_table(musica, g, "SBS192", "Transcript_Strand") data(musica) data(rep_range) annotate_replication_strand(musica, rep_range) build_standard_table(musica, g, "SBS192", "Replication_Strand") data(dbs_musica) build_standard_table(dbs_musica, g, "DBS", overwrite = TRUE) data(indel_musica) build_standard_table(indel_musica, g, table_name = "INDEL")
g <- select_genome("19") data(musica) build_standard_table(musica, g, "SBS96", overwrite = TRUE) data(musica) annotate_transcript_strand(musica, "19") build_standard_table(musica, g, "SBS192", "Transcript_Strand") data(musica) data(rep_range) annotate_replication_strand(musica, rep_range) build_standard_table(musica, g, "SBS192", "Replication_Strand") data(dbs_musica) build_standard_table(dbs_musica, g, "DBS", overwrite = TRUE) data(indel_musica) build_standard_table(indel_musica, g, table_name = "INDEL")
The count_tables
contains standard and/or custom
count tables created from variants
built_tables(object) ## S4 method for signature 'musica' built_tables(object) ## S4 method for signature 'musica_result' built_tables(object)
built_tables(object) ## S4 method for signature 'musica' built_tables(object) ## S4 method for signature 'musica_result' built_tables(object)
object |
A |
The names of created count_tables
data(res) built_tables(res)
data(res) built_tables(res)
Proportional sample exposures will be used as input to perform clustering.
cluster_exposure( result, nclust, proportional = TRUE, method = "kmeans", dis.method = "euclidean", hc.method = "ward.D", clara.samples = 5, iter.max = 10, tol = 1e-15 )
cluster_exposure( result, nclust, proportional = TRUE, method = "kmeans", dis.method = "euclidean", hc.method = "ward.D", clara.samples = 5, iter.max = 10, tol = 1e-15 )
result |
A |
nclust |
Pre-defined number of clusters. |
proportional |
Logical, indicating if proportional exposure (default) will be used for clustering. |
method |
Clustering algorithms. Options are "kmeans" (K-means), "hkmeans" (hybrid of hierarchical K-means), "hclust" (hierarchical clustering), "pam" (PAM), and "clara" (Clara). |
dis.method |
Methods to calculate dissimilarity matrix. Options are "euclidean" (default), "manhattan", "jaccard", "cosine", and "canberra". |
hc.method |
Methods to perform hierarchical clustering. Options are "ward.D" (default), "ward.D2", "single", "complete", "average", "mcquitty", "median", and "centroid". |
clara.samples |
Number of samples to be drawn from dataset. Only used when "clara" is selected. Default is 5. |
iter.max |
Maximum number of iterations for k-means clustering. |
tol |
Tolerance level for kmeans clustering level iterations |
A one-column data frame with sample IDs as row names and cluster number for each sample.
set.seed(123) data(res_annot) clust_out <- cluster_exposure(res_annot, nclust = 2)
set.seed(123) data(res_annot) clust_out <- cluster_exposure(res_annot, nclust = 2)
Combines tables into a single table that can be used for discovery/prediction
combine_count_tables( musica, to_comb, name, description = character(), color_variable = character(), color_mapping = character(), overwrite = FALSE )
combine_count_tables( musica, to_comb, name, description = character(), color_variable = character(), color_mapping = character(), overwrite = FALSE )
musica |
A |
to_comb |
A vector of table names to combine. Each table must already exist within the input musica object |
name |
Name of table build, must be a new name |
description |
Description of the new table |
color_variable |
Annotation column to use for coloring plotted motifs, provided by counts table from input result's musica object |
color_mapping |
Mapping from color_variable to color names, provided by counts table from input result's musica object |
overwrite |
Overwrite existing count table |
None
g <- select_genome("19") data(musica) build_standard_table(musica, g, "SBS96", overwrite = TRUE) annotate_transcript_strand(musica, "19") build_standard_table(musica, g, "SBS192", "Transcript_Strand") combine_count_tables(musica, c("SBS96", "SBS192_Trans"), "combo")
g <- select_genome("19") data(musica) build_standard_table(musica, g, "SBS96", overwrite = TRUE) annotate_transcript_strand(musica, "19") build_standard_table(musica, g, "SBS192", "Transcript_Strand") combine_count_tables(musica, c("SBS96", "SBS192_Trans"), "combo")
Combine prediction grid list into a result object. Exposure values are zero for samples in an annotation where that signature was not predicted
combine_predict_grid(grid_list, musica, table_name, signature_res)
combine_predict_grid(grid_list, musica, table_name, signature_res)
grid_list |
A list of result objects from the prediction grid to combine into a single result |
musica |
A |
table_name |
Table name used for prediction |
signature_res |
Signatures to automatically subset from for prediction |
A result object combining all samples and signatures from a prediction grid. Samples have zero exposure value for signatures not found in that annotation type.
data(musica_annot) data(cosmic_v2_sigs) grid <- auto_predict_grid(musica_annot, "SBS96", cosmic_v2_sigs, "lda", "Tumor_Subtypes", combine_res = FALSE) combined <- combine_predict_grid(grid, musica_annot, "SBS96", cosmic_v2_sigs) plot_exposures(combined, group_by = "annotation", annotation="Tumor_Subtypes")
data(musica_annot) data(cosmic_v2_sigs) grid <- auto_predict_grid(musica_annot, "SBS96", cosmic_v2_sigs, "lda", "Tumor_Subtypes", combine_res = FALSE) combined <- combine_predict_grid(grid, musica_annot, "SBS96", cosmic_v2_sigs) plot_exposures(combined, group_by = "annotation", annotation="Tumor_Subtypes")
Compare a result object to COSMIC V2 SBS Signatures (combination whole-exome and whole-genome)
compare_cosmic_v2( result, threshold = 0.9, metric = "cosine", result_name = deparse(substitute(result)), decimals = 2, same_scale = FALSE )
compare_cosmic_v2( result, threshold = 0.9, metric = "cosine", result_name = deparse(substitute(result)), decimals = 2, same_scale = FALSE )
result |
A |
threshold |
threshold for similarity |
metric |
One of |
result_name |
title for plot user result signatures |
decimals |
Specifies rounding for similarity metric displayed. Default
|
same_scale |
If |
Returns the comparisons
data(res) compare_cosmic_v2(res, threshold = 0.7)
data(res) compare_cosmic_v2(res, threshold = 0.7)
Compare a result object to COSMIC V3 Signatures; Select exome or genome for SBS and only genome for DBS or Indel classes
compare_cosmic_v3( result, variant_class, sample_type, metric = "cosine", threshold = 0.9, result_name = deparse(substitute(result)), decimals = 2, same_scale = FALSE )
compare_cosmic_v3( result, variant_class, sample_type, metric = "cosine", threshold = 0.9, result_name = deparse(substitute(result)), decimals = 2, same_scale = FALSE )
result |
A |
variant_class |
Compare to SBS, DBS, or Indel |
sample_type |
exome (SBS only) or genome |
metric |
One of |
threshold |
threshold for similarity |
result_name |
title for plot user result signatures |
decimals |
Specifies rounding for similarity metric displayed. Default
|
same_scale |
If |
Returns the comparisons
data(res) compare_cosmic_v3(res, "SBS", "genome", threshold = 0.8)
data(res) compare_cosmic_v3(res, "SBS", "genome", threshold = 0.8)
Compare two result files to find similar signatures
compare_results( result, other_result, threshold = 0.9, metric = "cosine", result_name = deparse(substitute(result)), other_result_name = deparse(substitute(other_result)), decimals = 2, same_scale = FALSE )
compare_results( result, other_result, threshold = 0.9, metric = "cosine", result_name = deparse(substitute(result)), other_result_name = deparse(substitute(other_result)), decimals = 2, same_scale = FALSE )
result |
A |
other_result |
A second |
threshold |
threshold for similarity |
metric |
One of |
result_name |
title for plot of first result signatures |
other_result_name |
title for plot of second result signatures |
decimals |
Specifies rounding for similarity metric displayed. Default
|
same_scale |
If |
Returns the comparisons
data(res) compare_results(res, res, threshold = 0.8)
data(res) compare_results(res, res, threshold = 0.8)
Data from COSMIC formatted to be used for prediction with individual tumors and cohorts.
data(cosmic_v2_sigs)
data(cosmic_v2_sigs)
An object of class musica_result
See [predict_exposure()].
COSMIC v2, <https://cancer.sanger.ac.uk/cosmic/signatures_v2>
Alexandrov, L., Nik-Zainal, S., Wedge, D. et al. (2013) Signatures of mutational processes in human cancer. Nature 500, 415–421 ([Nature](https://www.ncbi.nlm.nih.gov/pubmed/23945592))
Input a cancer subtype to return a list of related COSMIC signatures
cosmic_v2_subtype_map(tumor_type)
cosmic_v2_subtype_map(tumor_type)
tumor_type |
Cancer subtype to view related signatures |
Returns signatures related to a partial string match
cosmic_v2_subtype_map ("lung")
cosmic_v2_subtype_map ("lung")
Data from COSMIC formatted to be used for prediction with individual tumors and cohorts.
data(cosmic_v3_dbs_sigs)
data(cosmic_v3_dbs_sigs)
An object of class musica_result
.
See [predict_exposure()].
COSMIC v3, <https://cancer.sanger.ac.uk/cosmic/signatures>
Alexandrov, L.B., Kim, J., Haradhvala, N.J. et al. (2020) The repertoire of mutational signatures in human cancer. Nature 578, 94–101 ([Nature](https://doi.org/10.1038/s41586-020-1943-3))
Data from COSMIC formatted to be used for prediction with individual tumors and cohorts.
data(cosmic_v3_indel_sigs)
data(cosmic_v3_indel_sigs)
An object of class musica_result
.
See [predict_exposure()].
COSMIC v3, <https://cancer.sanger.ac.uk/cosmic/signatures>
Alexandrov, L.B., Kim, J., Haradhvala, N.J. et al. (2020) The repertoire of mutational signatures in human cancer. Nature 578, 94–101 ([Nature](https://doi.org/10.1038/s41586-020-1943-3))
Data from COSMIC formatted to be used for prediction with individual tumors and cohorts.
data(cosmic_v3_sbs_sigs)
data(cosmic_v3_sbs_sigs)
An object of class musica_result
.
See [predict_exposure()].
COSMIC v3, <https://cancer.sanger.ac.uk/cosmic/signatures>
Alexandrov, L.B., Kim, J., Haradhvala, N.J. et al. (2020) The repertoire of mutational signatures in human cancer. Nature 578, 94–101 ([Nature](https://doi.org/10.1038/s41586-020-1943-3))
Data from COSMIC formatted to be used for prediction with individual tumors and cohorts.
data(cosmic_v3_sbs_sigs_exome)
data(cosmic_v3_sbs_sigs_exome)
An object of class musica_result
.
See [predict_exposure()].
COSMIC v3, <https://cancer.sanger.ac.uk/cosmic/signatures>
Alexandrov, L.B., Kim, J., Haradhvala, N.J. et al. (2020) The repertoire of mutational signatures in human cancer. Nature 578, 94–101 ([Nature](https://doi.org/10.1038/s41586-020-1943-3))
Object containing the count table matrices, their names and descriptions that we generated by provided and by user functions. These are used to discover and infer signatures and exposures.
name
A name that describes the type of table (e.g. "SBS96")
count_table
An array of counts with samples as the columns and motifs as the rows
annotation
A data.frame of annotations with three columns used for plotting: motif, mutation, and context
features
Original features used to generate the count_table
type
The mutation type of each feature, in case we need to plot or model they differently
color_variable
The variable used for plotting colors, selected from the annotation slot
color_mapping
The mapping of the annotations chosen by color_variable to color values for plotting
description
A summary table of the result objects in result_list a list of lists. The nested lists created combined (rbind) tables, and the tables at the first list level are modelled independantly. Combined tables must be named. list("tableA", comboTable = list("tableC", "tableD"))
This function creates a musica object from a variant
table or matrix. The musica class stores variants information,
variant-level annotations, sample-level annotations, and count tables and
is used as input to the mutational signature discovery and prediction
algorithms. The input variant table or matrix must have columns for
chromosome, start position, end position, reference allele,
alternate allele, and sample names. The column names in the variant table
can be mapped using the chromosome_col
, start_col
,
end_col
, ref_col
, alt_col
, and
sample_col parameters
.
create_musica( x, genome, check_ref_chromosomes = TRUE, check_ref_bases = TRUE, chromosome_col = "chr", start_col = "start", end_col = "end", ref_col = "ref", alt_col = "alt", sample_col = "sample", extra_fields = NULL, standardize_indels = TRUE, convert_dbs = TRUE, verbose = TRUE )
create_musica( x, genome, check_ref_chromosomes = TRUE, check_ref_bases = TRUE, chromosome_col = "chr", start_col = "start", end_col = "end", ref_col = "ref", alt_col = "alt", sample_col = "sample", extra_fields = NULL, standardize_indels = TRUE, convert_dbs = TRUE, verbose = TRUE )
x |
A data.table, matrix, or data.frame that contains columns with the variant information. |
genome |
A BSgenome object indicating which genome reference the variants and their coordinates were derived from. |
check_ref_chromosomes |
Whether to peform a check to ensure that
the chromosomes in the |
check_ref_bases |
Whether to check if the reference bases in the
|
chromosome_col |
The name of the column that contains the chromosome
reference for each variant. Default |
start_col |
The name of the column that contains the start
position for each variant. Default |
end_col |
The name of the column that contains the end
position for each variant. Default |
ref_col |
The name of the column that contains the reference
base(s) for each variant. Default |
alt_col |
The name of the column that contains the alternative
base(s) for each variant. Default |
sample_col |
The name of the column that contains the sample
id for each variant. Default |
extra_fields |
Which additional fields to extract and include in
the musica object. Default |
standardize_indels |
Flag to convert indel style (e.g. 'C > CAT' becomes '- > AT' and 'GCACA > G' becomes 'CACA > -') |
convert_dbs |
Flag to convert adjacent SBS into DBS (original SBS are removed) |
verbose |
Whether to print status messages during error checking.
Default |
Returns a musica object
maf_file <- system.file("extdata", "public_TCGA.LUSC.maf", package = "musicatk") variants <- extract_variants_from_maf_file(maf_file) g <- select_genome("38") musica <- create_musica(x = variants, genome = g)
maf_file <- system.file("extdata", "public_TCGA.LUSC.maf", package = "musicatk") variants <- extract_variants_from_maf_file(maf_file) g <- select_genome("38") musica <- create_musica(x = variants, genome = g)
Proportional sample exposures will be used as input into the
umap
function to generate a two dimensional UMAP.
create_umap(result, n_neighbors = 30, min_dist = 0.75, spread = 1)
create_umap(result, n_neighbors = 30, min_dist = 0.75, spread = 1)
result |
A |
n_neighbors |
The size of local neighborhood used for views of
manifold approximation. Larger values result in more global the manifold,
while smaller values result in more local data being preserved.
If |
min_dist |
The effective minimum distance between embedded points.
Smaller values will result in a more clustered/clumped embedding where
nearby points on the manifold are drawn closer together, while larger
values will result on a more even dispersal of points. Default |
spread |
The effective scale of embedded points. In combination with
‘min_dist’, this determines how clustered/clumped the embedded points are.
Default |
A musica_result
object with a new UMAP
stored in the UMAP
slot.
See plot_umap to display the UMAP and
umap
for more information on the individual parameters
for generating UMAPs.
data(res_annot) create_umap(result = res_annot)
data(res_annot) create_umap(result = res_annot)
A musica created for testing that includes DBS variants
data(dbs_musica)
data(dbs_musica)
An object of class musica
See [create_musica()].
Mutational signatures and exposures will be discovered using
methods such as Latent Dirichlet Allocation (lda) or Non-Negative
Matrix Factorization (nmf). These algorithms will deconvolute a matrix of
counts for mutation types in each sample to two matrices: 1) a "signature"
matrix containing the probability of each mutation type in each sample and
2) an "exposure" matrix containing the estimated counts for each signature
in each sample. Before mutational discovery can be performed,
variants from samples first need to be stored in a
musica
object using the create_musica function
and mutation count tables need to be created using functions such as
build_standard_table.
discover_signatures( musica, table_name, num_signatures, algorithm = "lda", seed = 1, nstart = 10, par_cores = 1 )
discover_signatures( musica, table_name, num_signatures, algorithm = "lda", seed = 1, nstart = 10, par_cores = 1 )
musica |
A |
table_name |
Name of the table to use for signature discovery. Needs to be the same name supplied to the table building functions such as build_standard_table. |
num_signatures |
Number of signatures to discover. |
algorithm |
Method to use for mutational signature discovery. One of
|
seed |
Seed to be used for the random number generators in the
signature discovery algorithms. Default |
nstart |
Number of independent random starts used in the mutational
signature algorithms. Default |
par_cores |
Number of parallel cores to use. Only used if
|
Returns a A musica_result
object containing
signatures and exposures.
data(musica) g <- select_genome("19") build_standard_table(musica, g, "SBS96", overwrite = TRUE) discover_signatures(musica = musica, table_name = "SBS96", num_signatures = 3, algorithm = "lda", seed = 12345, nstart = 1)
data(musica) g <- select_genome("19") build_standard_table(musica, g, "SBS96", overwrite = TRUE) discover_signatures(musica = musica, table_name = "SBS96", num_signatures = 3, algorithm = "lda", seed = 12345, nstart = 1)
Drops a column from the variant table that the user no longer needs
drop_annotation(musica, column_name)
drop_annotation(musica, column_name)
musica |
A |
column_name |
Name of column to drop |
None
data(musica) drop_annotation(musica, "Variant_Type")
data(musica) drop_annotation(musica, "Variant_Type")
exposure_differential_analysis
is used to run
differential analysis on the signature exposures of annotated samples within
the musica_result
object.
exposure_differential_analysis( musica_result, annotation, method = c("wilcox", "kruskal", "glm.nb"), group1 = NULL, group2 = NULL, ... )
exposure_differential_analysis( musica_result, annotation, method = c("wilcox", "kruskal", "glm.nb"), group1 = NULL, group2 = NULL, ... )
musica_result |
A |
annotation |
Column in the sample_annotations table of the
|
method |
Any method in |
group1 |
character vector used in the Wilcox test. Elements in
|
group2 |
character vector used in the Wilcox test. Elements in
|
... |
Additional arguments to be passed to the chosen method |
A matrix containing statistics summarizing the analysis dependent on the chosen method
data("res_annot") exposure_differential_analysis(res_annot, "Tumor_Subtypes", method="wilcox")
data("res_annot") exposure_differential_analysis(res_annot, "Tumor_Subtypes", method="wilcox")
The exposure
matrix contains estimated amount of
each signature for each sample. Rows correspond to each signature and
columns correspond to each sample.
exposures(result) ## S4 method for signature 'musica_result' exposures(result) exposures(result) <- value ## S4 replacement method for signature 'musica_result,matrix' exposures(result) <- value
exposures(result) ## S4 method for signature 'musica_result' exposures(result) exposures(result) <- value ## S4 replacement method for signature 'musica_result,matrix' exposures(result) <- value
result |
A |
value |
A matrix of samples by signature exposures |
A matrix of exposures
data(res) exposures(res) data(res) exposures(res) <- matrix()
data(res) exposures(res) data(res) exposures(res) <- matrix()
Extract count tables list from a musica object
extract_count_tables(musica)
extract_count_tables(musica)
musica |
A |
List of count tables objects
data(musica) extract_count_tables(musica)
data(musica) extract_count_tables(musica)
Chooses the correct function to extract variants from input based on
the class of the object or the file extension. Different types of objects
can be mixed within the list. For example, the list can include VCF files
and maf objects. Certain parameters such as id
and rename
only apply to VCF objects or files and need to be individually specified
for each VCF. Therefore, these parameters should be suppied as a vector
that is the same length as the number of inputs. If other types of
objects are in the input list, then the value of id
and rename
will be ignored for these items.
extract_variants( inputs, id = NULL, rename = NULL, sample_field = NULL, filename_as_id = FALSE, strip_extension = c(".vcf", ".vcf.gz", ".gz"), filter = TRUE, multiallele = c("expand", "exclude"), fix_vcf_errors = TRUE, extra_fields = NULL, chromosome_col = "chr", start_col = "start", end_col = "end", ref_col = "ref", alt_col = "alt", sample_col = "sample", verbose = TRUE )
extract_variants( inputs, id = NULL, rename = NULL, sample_field = NULL, filename_as_id = FALSE, strip_extension = c(".vcf", ".vcf.gz", ".gz"), filter = TRUE, multiallele = c("expand", "exclude"), fix_vcf_errors = TRUE, extra_fields = NULL, chromosome_col = "chr", start_col = "start", end_col = "end", ref_col = "ref", alt_col = "alt", sample_col = "sample", verbose = TRUE )
inputs |
A vector or list of objects or file names. Objects can be
CollapsedVCF, ExpandedVCF, MAF,
an object that inherits from |
id |
A character vector the same length as |
rename |
A character vector the same length as |
sample_field |
Some algoriths will save the name of the
sample in the ##SAMPLE portion of header in the VCF.
See |
filename_as_id |
If set to |
strip_extension |
Only used if |
filter |
Exclude variants that do not have a |
multiallele |
Multialleles are when multiple alternative variants
are listed in the same row in the vcf.
See |
fix_vcf_errors |
Attempt to automatically fix VCF file
formatting errors.
See |
extra_fields |
Optionally extract additional fields from all input
objects. Default |
chromosome_col |
The name of the column that contains the chromosome
reference for each variant. Only used if the input is a matrix or data.frame.
Default |
start_col |
The name of the column that contains the start
position for each variant. Only used if the input is a matrix or data.frame.
Default |
end_col |
The name of the column that contains the end
position for each variant. Only used if the input is a matrix or data.frame.
Default |
ref_col |
The name of the column that contains the reference
base(s) for each variant. Only used if the input is a matrix or data.frame.
Default |
alt_col |
The name of the column that contains the alternative
base(s) for each variant. Only used if the input is a matrix or data.frame.
Default |
sample_col |
The name of the column that contains the sample
id for each variant. Only used if the input is a matrix or data.frame.
Default |
verbose |
Show progress of variant extraction. Default |
Returns a data.table of variants from a vcf
# Get loations of two vcf files and a maf file luad_vcf_file <- system.file("extdata", "public_LUAD_TCGA-97-7938.vcf", package = "musicatk") lusc_maf_file <- system.file("extdata", "public_TCGA.LUSC.maf", package = "musicatk") melanoma_vcfs <- list.files(system.file("extdata", package = "musicatk"), pattern = glob2rx("*SKCM*vcf"), full.names = TRUE) # Read all files in at once inputs <- c(luad_vcf_file, melanoma_vcfs, lusc_maf_file) variants <- extract_variants(inputs = inputs) table(variants$sample) # Run again but renaming samples in first four vcfs new_name <- c(paste0("Sample", 1:4), NA) variants <- extract_variants(inputs = inputs, rename = new_name) table(variants$sample)
# Get loations of two vcf files and a maf file luad_vcf_file <- system.file("extdata", "public_LUAD_TCGA-97-7938.vcf", package = "musicatk") lusc_maf_file <- system.file("extdata", "public_TCGA.LUSC.maf", package = "musicatk") melanoma_vcfs <- list.files(system.file("extdata", package = "musicatk"), pattern = glob2rx("*SKCM*vcf"), full.names = TRUE) # Read all files in at once inputs <- c(luad_vcf_file, melanoma_vcfs, lusc_maf_file) variants <- extract_variants(inputs = inputs) table(variants$sample) # Run again but renaming samples in first four vcfs new_name <- c(paste0("Sample", 1:4), NA) variants <- extract_variants(inputs = inputs, rename = new_name) table(variants$sample)
Add description
extract_variants_from_maf(maf, extra_fields = NULL)
extract_variants_from_maf(maf, extra_fields = NULL)
maf |
MAF object loaded by read.maf() from the 'maftools' package |
extra_fields |
Optionally extract additional columns from the
maf object. Default |
Returns a data.table of variants from a maf which can be used to
create a musica
object.
maf_file <- system.file("extdata", "public_TCGA.LUSC.maf", package = "musicatk") library(maftools) maf <- read.maf(maf_file) variants <- extract_variants_from_maf(maf = maf)
maf_file <- system.file("extdata", "public_TCGA.LUSC.maf", package = "musicatk") library(maftools) maf <- read.maf(maf_file) variants <- extract_variants_from_maf(maf = maf)
Add Description - Aaron
extract_variants_from_maf_file(maf_file, extra_fields = NULL)
extract_variants_from_maf_file(maf_file, extra_fields = NULL)
maf_file |
Location of maf file |
extra_fields |
Optionally extract additional columns from the
object. Default |
Returns a data.table of variants from a maf
maf_file <- system.file("extdata", "public_TCGA.LUSC.maf", package = "musicatk") maf <- extract_variants_from_maf_file(maf_file = maf_file)
maf_file <- system.file("extdata", "public_TCGA.LUSC.maf", package = "musicatk") maf <- extract_variants_from_maf_file(maf_file = maf_file)
Add Description
extract_variants_from_matrix( mat, chromosome_col = "Chromosome", start_col = "Start_Position", end_col = "End_Position", ref_col = "Tumor_Seq_Allele1", alt_col = "Tumor_Seq_Allele2", sample_col = "Tumor_Sample_Barcode", extra_fields = NULL )
extract_variants_from_matrix( mat, chromosome_col = "Chromosome", start_col = "Start_Position", end_col = "End_Position", ref_col = "Tumor_Seq_Allele1", alt_col = "Tumor_Seq_Allele2", sample_col = "Tumor_Sample_Barcode", extra_fields = NULL )
mat |
An object that inherits from classes "matrix" or "data.frame" Examples include a matrix, data.frame, or data.table. |
chromosome_col |
The name of the column that contains the chromosome
reference for each variant. Default |
start_col |
The name of the column that contains the start
position for each variant. Default |
end_col |
The name of the column that contains the end
position for each variant. Default |
ref_col |
The name of the column that contains the reference
base(s) for each variant. Default |
alt_col |
The name of the column that contains the alternative
base(s) for each variant. Default |
sample_col |
The name of the column that contains the sample
id for each variant. Default |
extra_fields |
Optionally extract additional columns from the
object. Default |
Returns a data.table of variants from a maf which can be used to
create a musica
object.
maf_file <- system.file("extdata", "public_TCGA.LUSC.maf", package = "musicatk") library(maftools) maf <- read.maf(maf_file) variants <- extract_variants_from_maf(maf = maf) variants <- extract_variants_from_matrix(mat = variants, chromosome_col = "chr", start_col = "start", end_col = "end", ref_col = "ref", alt_col = "alt", sample_col = "sample")
maf_file <- system.file("extdata", "public_TCGA.LUSC.maf", package = "musicatk") library(maftools) maf <- read.maf(maf_file) variants <- extract_variants_from_maf(maf = maf) variants <- extract_variants_from_matrix(mat = variants, chromosome_col = "chr", start_col = "start", end_col = "end", ref_col = "ref", alt_col = "alt", sample_col = "sample")
Aaron - Need to describe differnce between ID, and name in the header, and rename in terms of naming the sample. Need to describe differences in multiallelic choices. Also need to describe the automatic error fixing
extract_variants_from_vcf( vcf, id = NULL, rename = NULL, sample_field = NULL, filter = TRUE, multiallele = c("expand", "exclude"), extra_fields = NULL )
extract_variants_from_vcf( vcf, id = NULL, rename = NULL, sample_field = NULL, filter = TRUE, multiallele = c("expand", "exclude"), extra_fields = NULL )
vcf |
Location of vcf file |
id |
ID of the sample to select from VCF. If |
rename |
Rename the sample to this value when extracting variants.
If |
sample_field |
Some algoriths will save the name of the
sample in the ##SAMPLE portion of header in the VCF (e.g.
##SAMPLE=<ID=TUMOR,SampleName=TCGA-01-0001>). If the ID is specified via the
|
filter |
Exclude variants that do not have a |
multiallele |
Multialleles are when multiple alternative variants
are listed in the same row in the vcf. One of |
extra_fields |
Optionally extract additional fields from the |
Returns a data.table of variants from a vcf
vcf_file <- system.file("extdata", "public_LUAD_TCGA-97-7938.vcf", package = "musicatk") library(VariantAnnotation) vcf <- readVcf(vcf_file) variants <- extract_variants_from_vcf(vcf = vcf)
vcf_file <- system.file("extdata", "public_LUAD_TCGA-97-7938.vcf", package = "musicatk") library(VariantAnnotation) vcf <- readVcf(vcf_file) variants <- extract_variants_from_vcf(vcf = vcf)
Add Description
extract_variants_from_vcf_file( vcf_file, id = NULL, rename = NULL, sample_field = NULL, filename_as_id = FALSE, strip_extension = c(".vcf", ".vcf.gz", ".gz"), filter = TRUE, multiallele = c("expand", "exclude"), extra_fields = NULL, fix_vcf_errors = TRUE )
extract_variants_from_vcf_file( vcf_file, id = NULL, rename = NULL, sample_field = NULL, filename_as_id = FALSE, strip_extension = c(".vcf", ".vcf.gz", ".gz"), filter = TRUE, multiallele = c("expand", "exclude"), extra_fields = NULL, fix_vcf_errors = TRUE )
vcf_file |
Path to the vcf file |
id |
ID of the sample to select from VCF. If |
rename |
Rename the sample to this value when extracting variants.
If |
sample_field |
Some algoriths will save the name of the
sample in the ##SAMPLE portion of header in the VCF (e.g.
##SAMPLE=<ID=TUMOR,SampleName=TCGA-01-0001>). If the ID is specified via the
|
filename_as_id |
If set to |
strip_extension |
Only used if |
filter |
Exclude variants that do not have a |
multiallele |
Multialleles are when multiple alternative variants
are listed in the same row in the vcf. One of |
extra_fields |
Optionally extract additional fields from the |
fix_vcf_errors |
Attempt to automatically fix VCF file formatting errors. |
Returns a data.table of variants extracted from a vcf
vcf <- system.file("extdata", "public_LUAD_TCGA-97-7938.vcf", package = "musicatk") variants <- extract_variants_from_vcf_file(vcf_file = vcf)
vcf <- system.file("extdata", "public_LUAD_TCGA-97-7938.vcf", package = "musicatk") variants <- extract_variants_from_vcf_file(vcf_file = vcf)
Generate result_grid from musica based on annotation and range of k
generate_result_grid( musica, table_name, algorithm = "lda", annotation = NA, k_start, k_end, n_start = 1, seed = NULL, par_cores = FALSE, verbose = FALSE )
generate_result_grid( musica, table_name, algorithm = "lda", annotation = NA, k_start, k_end, n_start = 1, seed = NULL, par_cores = FALSE, verbose = FALSE )
musica |
A |
table_name |
Name of table used for signature discovery |
algorithm |
Algorithm for signature discovery |
annotation |
Sample annotation to split results into |
k_start |
Lower range of number of signatures for discovery |
k_end |
Upper range of number of signatures for discovery |
n_start |
Number of times to discover signatures and compare based on posterior loglikihood |
seed |
Seed to use for reproducible results, set to null to disable |
par_cores |
Number of parallel cores to use (NMF only) |
verbose |
Whether to output loop iterations |
A result object containing signatures and sample weights
data(musica_sbs96) grid <- generate_result_grid(musica_sbs96, "SBS96", "lda", k_start = 2, k_end = 5)
data(musica_sbs96) grid <- generate_result_grid(musica_sbs96, "SBS96", "lda", k_start = 2, k_end = 5)
The musica
musica contains variants,
count tables, and sample annotations
get_musica(result) ## S4 method for signature 'musica_result' get_musica(result)
get_musica(result) ## S4 method for signature 'musica_result' get_musica(result)
result |
A |
A musica
musica object
data(res) get_musica(res)
data(res) get_musica(res)
A musica created for testing that includes INDEL variants
data(indel_musica)
data(indel_musica)
An object of class musica
See [create_musica()].
To help decide the number of cluster, three different methods are provided: total within cluster sum of squares, average silhouette coefficient, and gap statistics.
k_select( result, method = "wss", clust.method = "kmeans", n = 10, proportional = TRUE )
k_select( result, method = "wss", clust.method = "kmeans", n = 10, proportional = TRUE )
result |
A |
method |
A single character string indicating which statistic to use for plot. Options are "wss" (total within cluster sum of squares), "silhouette" (average silhouette coefficient), and "gap_stat" (gap statistic). Default is "wss". |
clust.method |
A character string indicating clustering method. Options are "kmeans" (default), "hclust" (hierarchical clustering), "hkmeans", "pam", and "clara". |
n |
An integer indicating maximum number of clusters to test. Default is 10. |
proportional |
Logical, indicating if proportional exposure (default) will be used for clustering. |
A ggplot object.
data(res_annot) set.seed(123) #Make an elbow plot k_select(res_annot, method = "wss", n = 6) #Plot average silhouette coefficient against number of clusters k_select(res_annot, method = "silhouette", n = 6) #Plot gap statistics against number of clusters k_select(res_annot, method = "gap_stat", n = 6)
data(res_annot) set.seed(123) #Make an elbow plot k_select(res_annot, method = "wss", n = 6) #Plot average silhouette coefficient against number of clusters k_select(res_annot, method = "silhouette", n = 6) #Plot gap statistics against number of clusters k_select(res_annot, method = "gap_stat", n = 6)
A musica created for testing that includes SBS variants
data(musica)
data(musica)
An object of class musica
See [create_musica()].
A musica created for testing that includes SBS variants and sample annotations
data(musica_annot)
data(musica_annot)
An object of class musica
See [create_musica()].
Object containing the result objects generated from the combination of annotations and a range of k values
grid_params
The parameters the result grid was created using
result_list
A list of result objects with different parameters
grid_table
A summary table of the result objects in result_list
Object containing deconvolved/predicted signatures, sample weights, and the musica object the result was generated from
signatures
A matrix of signatures by mutational motifs
exposures
A matrix of samples by signature weights
table_name
A character vector of table names used to make the result
algorithm
Describes how the signatures/weights were generated
musica
The musica object the results were generated from
umap
List of umap data.frames for plotting and analysis
A musica created for testing that includes SBS variants and a build counts table for them
data(musica_sbs96)
data(musica_sbs96)
An object of class musica
See [build_standard_table()].
A very small musica created for testing that includes SBS variants and a build counts table for them
data(musica_sbs96_tiny)
data(musica_sbs96_tiny)
An object of class musica
See [build_standard_table()].
The primary object that contains variants, count_tables, and samples annotations
variants
data.table
of variants
count_tables
Summary table with per-sample unnormalized motif counts
sample_annotations
Sample-level annotations (e.g. age, sex, primary)
The musicatk Shiny app allows users to perform mutational signature analysis using an interative graphical user interface (GUI)
musicatk(include_version = TRUE, theme = "yeti")
musicatk(include_version = TRUE, theme = "yeti")
include_version |
Include the version number in the header.
Default |
theme |
The theme to use for the GUI. Default |
The shiny app will open. No data will be returned.
## Not run: # Start the app musicatk() ## End(Not run)
## Not run: # Start the app musicatk() ## End(Not run)
Return sample from musica object
name_signatures(result, name_vector)
name_signatures(result, name_vector)
result |
Result object containing signatures and weights |
name_vector |
Vector of user-defined signature names |
Result object with user-defined signatures names
data(res) name_signatures(res, c("smoking", "apobec", "unknown"))
data(res) name_signatures(res, c("smoking", "apobec", "unknown"))
The clustering results can be visualized on a UMAP panel. Three different types of plots can be generated using this function: cluster-by-signature plot, cluster-by-annotation plot, and a single UMAP plot.
plot_cluster( result, clusters, group = "signature", annotation = NULL, plotly = TRUE )
plot_cluster( result, clusters, group = "signature", annotation = NULL, plotly = TRUE )
result |
A |
clusters |
The result generated from cluster_exposure function. |
group |
A single character string indicating the grouping factor. Possible options are: "signature" (columns are signatures in a grid), "annotation" (columns are sample annotation), and "none" (a single UMAP plot). Default is "signature". |
annotation |
Column name of annotation. |
plotly |
If TRUE, the plot will be made interactive using plotly. |
Generate a ggplot or plotly object.
set.seed(123) data(res_annot) #Get clustering result clust_out <- cluster_exposure(result = res_annot, nclust = 2) #UMAP create_umap(result = res_annot) #generate cluster X signature plot plot_cluster(result = res_annot, clusters = clust_out, group = "signature") #generate cluster X annotation plot plot_cluster(result = res_annot, clusters = clust_out, group = "annotation", annotation = "Tumor_Subtypes") #generate a single UMAP plot plot_cluster(result = res_annot, clusters = clust_out, group = "none")
set.seed(123) data(res_annot) #Get clustering result clust_out <- cluster_exposure(result = res_annot, nclust = 2) #UMAP create_umap(result = res_annot) #generate cluster X signature plot plot_cluster(result = res_annot, clusters = clust_out, group = "signature") #generate cluster X annotation plot plot_cluster(result = res_annot, clusters = clust_out, group = "annotation", annotation = "Tumor_Subtypes") #generate a single UMAP plot plot_cluster(result = res_annot, clusters = clust_out, group = "none")
plot_differential_analysis
is used to plot
differential analysis created by exposure_differential_analysis
.
plot_differential_analysis(analysis, analysis_type, samp_num)
plot_differential_analysis(analysis, analysis_type, samp_num)
analysis |
Analysis created by |
analysis_type |
Currently only |
samp_num |
Number of samples that went into the analysis |
Generates a ggplot object
data("res_annot") analysis <- exposure_differential_analysis(res_annot, "Tumor_Subtypes", method="wilcox") plot_differential_analysis(analysis, "glm", 2)
data("res_annot") analysis <- exposure_differential_analysis(res_annot, "Tumor_Subtypes", method="wilcox") plot_differential_analysis(analysis, "glm", 2)
The distributions of mutational signatures can be viewed
with barplots or box/violin plots. Barplots are most useful for viewing
the proportion of signatures within and across samples. The box/violin plots
are most useful for viewing the distributions of signatures with respect to
sample annotations. Samples can be grouped using the group_by
parameter. For barplots, various methods of sorting samples from left
to right can be chosen using the sort_samples
parameter.
plot_exposures( result, plot_type = c("bar", "box", "violin"), proportional = FALSE, group_by = "none", color_by = c("signature", "annotation"), annotation = NULL, num_samples = NULL, sort_samples = "total", threshold = NULL, same_scale = FALSE, add_points = FALSE, point_size = 2, label_x_axis = FALSE, legend = TRUE, plotly = FALSE )
plot_exposures( result, plot_type = c("bar", "box", "violin"), proportional = FALSE, group_by = "none", color_by = c("signature", "annotation"), annotation = NULL, num_samples = NULL, sort_samples = "total", threshold = NULL, same_scale = FALSE, add_points = FALSE, point_size = 2, label_x_axis = FALSE, legend = TRUE, plotly = FALSE )
result |
A |
plot_type |
One of |
proportional |
If |
group_by |
Determines how to group samples into the subplots
(i.e. facets). One of |
color_by |
Determines how to color the bars or box/violins. One of
|
annotation |
Sample annotation used to group the subplots and/or
color the bars, boxes, or violins. Default |
num_samples |
The top number of sorted samples to display. If
|
sort_samples |
This is used to change how samples are sorted in
the barplot from left to right. If set to |
threshold |
Exposures less than this threshold will be set to 0.
This is most useful when more than one signature is supplied to
|
same_scale |
If |
add_points |
If |
point_size |
Size of the points to be plotted on top of the
violin/box plots. Only used when |
label_x_axis |
If |
legend |
If |
plotly |
If |
Generates a ggplot or plotly object
data(res_annot) plot_exposures(res_annot, plot_type = "bar", annotation = "Tumor_Subtypes")
data(res_annot) plot_exposures(res_annot, plot_type = "bar", annotation = "Tumor_Subtypes")
The exposures for different signatures can be
visualized using a heatmap with this function.
Heatmaps make it easier to visualize the data by
representing the magnitude of exposure values
as color in 2-dimensions. The variation in color
intensity can help see if the exposures are clustered
or how they vary over space. Exposures can be
normalized by providing the proportional
argument.
Column annotations can also be seen by passing the col_annot
argument.
plot_heatmap( res_annot, proportional = FALSE, show_column_names = FALSE, show_row_names = TRUE, scale = TRUE, subset_tumor = NULL, subset_signatures = NULL, annotation = NULL, ... )
plot_heatmap( res_annot, proportional = FALSE, show_column_names = FALSE, show_row_names = TRUE, scale = TRUE, subset_tumor = NULL, subset_signatures = NULL, annotation = NULL, ... )
res_annot |
A |
proportional |
If |
show_column_names |
Boolean check. If |
show_row_names |
Boolean check. If |
scale |
Boolean check. If |
subset_tumor |
Users can specify certain tumor types on which they want to subset the exposure matrix for plotting the heatmap. |
subset_signatures |
Users can specify certain signatures on which they want to subset the exposure matrix plotting the heatmap. |
annotation |
Users have the option of plotting the exposure matrix based on their given annotation like Tumor_Subtypes or age. Error given if the user given annotation doesn't exist in the res_annot annotation object. |
... |
Ellipsis used for passing any arguments directly to the ComplexHeatmap's heatmap function. |
Generates a heatmap for using the exposure matrix.
data(res_annot) plot_heatmap(res_annot = res_annot, proportional = TRUE, scale = TRUE, annotation = "Tumor_Subtypes")
data(res_annot) plot_heatmap(res_annot = res_annot, proportional = TRUE, scale = TRUE, annotation = "Tumor_Subtypes")
Displays the proportion of counts for each mutation type across one or more samples.
plot_sample_counts( musica, sample_names, table_name = NULL, text_size = 10, show_x_labels = TRUE, show_y_labels = TRUE, same_scale = TRUE, annotation = NULL )
plot_sample_counts( musica, sample_names, table_name = NULL, text_size = 10, show_x_labels = TRUE, show_y_labels = TRUE, same_scale = TRUE, annotation = NULL )
musica |
A |
sample_names |
Names of the samples to plot. |
table_name |
Name of table used for plotting counts. If |
text_size |
Size of axis text. Default |
show_x_labels |
If |
show_y_labels |
If |
same_scale |
If |
annotation |
Vector of annotations to be displayed in the top right
corner of each sample. Vector length must be equivalent to the number of
samples. Default |
Generates a ggplot object
data(musica_sbs96) plot_sample_counts(musica_sbs96, sample_names = sample_names(musica_sbs96)[1])
data(musica_sbs96) plot_sample_counts(musica_sbs96, sample_names = sample_names(musica_sbs96)[1])
Displays the observed distribution of counts for each mutation type, the distribution of reconstructed counts for each mutation type using the inferred mutational signatures, and the difference between the two distributions.
plot_sample_reconstruction_error(result, sample, plotly = FALSE)
plot_sample_reconstruction_error(result, sample, plotly = FALSE)
result |
A |
sample |
Name of the sample within the
|
plotly |
If |
Generates a ggplot or plotly object
data(res) plot_sample_reconstruction_error(res, "TCGA-ER-A197-06A-32D-A197-08")
data(res) plot_sample_reconstruction_error(res, "TCGA-ER-A197-06A-32D-A197-08")
After mutational signature discovery has been performed, this function
can be used to display the distribution of each mutational signature. The
color_variable
and color_mapping
parameters can be used
to change the default color scheme of the bars.
plot_signatures( result, plotly = FALSE, color_variable = NULL, color_mapping = NULL, text_size = 10, show_x_labels = TRUE, show_y_labels = TRUE, same_scale = TRUE, y_max = NULL, annotation = NULL, percent = TRUE )
plot_signatures( result, plotly = FALSE, color_variable = NULL, color_mapping = NULL, text_size = 10, show_x_labels = TRUE, show_y_labels = TRUE, same_scale = TRUE, y_max = NULL, annotation = NULL, percent = TRUE )
result |
A |
plotly |
If |
color_variable |
Name of the column in the variant annotation data.frame
to use for coloring the mutation type bars. The variant annotation data.frame
can be found within the count table of the |
color_mapping |
A character vector used to map items in the
|
text_size |
Size of axis text. Default |
show_x_labels |
If |
show_y_labels |
If |
same_scale |
If |
y_max |
Vector of maximum y-axis limits for each signature. One value
may also be provided to specify a constant y-axis limit for all signatures.
Vector length must be 1 or equivalent to the number of signatures. Default
|
annotation |
Vector of annotations to be displayed in the top right
corner of each signature. Vector length must be equivalent to the number of
signatures. Default |
percent |
If |
Generates a ggplot or plotly object
data(res) plot_signatures(res)
data(res) plot_signatures(res)
Plots samples on a UMAP scatterplot. Samples can be colored by the levels of mutational signatures or by a annotation variable.
plot_umap( result, color_by = c("signatures", "annotation", "cluster", "none"), proportional = TRUE, annotation = NULL, point_size = 0.7, same_scale = TRUE, add_annotation_labels = FALSE, annotation_label_size = 3, annotation_text_box = TRUE, plotly = FALSE, clust = NULL, legend = TRUE, strip_axes = FALSE )
plot_umap( result, color_by = c("signatures", "annotation", "cluster", "none"), proportional = TRUE, annotation = NULL, point_size = 0.7, same_scale = TRUE, add_annotation_labels = FALSE, annotation_label_size = 3, annotation_text_box = TRUE, plotly = FALSE, clust = NULL, legend = TRUE, strip_axes = FALSE )
result |
A |
color_by |
One of |
proportional |
If |
annotation |
Sample annotation used to color the points. One used
when |
point_size |
Scatter plot point size. Default |
same_scale |
If |
add_annotation_labels |
If |
annotation_label_size |
Size of annotation labels. Only used if
|
annotation_text_box |
Place a white box around the annotation labels
to improve readability. Only used if |
plotly |
If |
clust |
Add cluster labels as annotation |
legend |
Plot legend |
strip_axes |
Remove axes labels for cleaner looking plots |
Generates a ggplot or plotly object
See create_umap to generate a UMAP in a musica result.
data(res_annot) create_umap(res_annot, "Tumor_Subtypes") plot_umap(res_annot, "none")
data(res_annot) create_umap(res_annot, "Tumor_Subtypes") plot_umap(res_annot, "none")
Exposures for samples will be predicted using an existing set
of signatures stored in a musica_result
object.
Algorithms available for prediction include a modify version of "lda"
,
and "decompTumor2Sig"
.
predict_exposure( musica, table_name, signature_res, algorithm = c("lda", "decompTumor2Sig"), signatures_to_use = seq_len(ncol(signatures(signature_res))), verbose = FALSE )
predict_exposure( musica, table_name, signature_res, algorithm = c("lda", "decompTumor2Sig"), signatures_to_use = seq_len(ncol(signatures(signature_res))), verbose = FALSE )
musica |
A |
table_name |
Name of table used for posterior prediction. Must match the table type used to generate the prediction signatures |
signature_res |
Signatures used to predict exposures for the samples
|
algorithm |
Algorithm to use for prediction of exposures. One of
|
signatures_to_use |
Which signatures in the |
verbose |
If |
Returns a A musica_result
object containing
signatures given by the signature_res
parameter and exposures
predicted from these signatures.
data(musica) data(cosmic_v2_sigs) g <- select_genome("19") build_standard_table(musica, g, "SBS96", overwrite = TRUE) result <- predict_exposure(musica = musica, table_name = "SBS96", signature_res = cosmic_v2_sigs, algorithm = "lda") # Predict using LDA-like algorithm with seed set to 1 set.seed(1) predict_exposure(musica = musica, table_name = "SBS96", signature_res = cosmic_v2_sigs, algorithm = "lda")
data(musica) data(cosmic_v2_sigs) g <- select_genome("19") build_standard_table(musica, g, "SBS96", overwrite = TRUE) result <- predict_exposure(musica = musica, table_name = "SBS96", signature_res = cosmic_v2_sigs, algorithm = "lda") # Predict using LDA-like algorithm with seed set to 1 set.seed(1) predict_exposure(musica = musica, table_name = "SBS96", signature_res = cosmic_v2_sigs, algorithm = "lda")
Reverse complement of a string using biostrings
rc(dna)
rc(dna)
dna |
Input DNA string |
Returns the reverse compliment of the input DNA string
rc("ATGC")
rc("ATGC")
Supplementary data converted from bigWig to bedgraph to GRanges, with low RFD indicating the leading strand and high RFD indicating lagging strand and removing uninformative zero RFD intervals. Timing data is 10kb bins from a colon cancer sample.
data(rep_range)
data(rep_range)
An object of class "GRanges"
; see
[annotate_replication_strand()].
GEO, <https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE134225>
Sriramachandran, A. M. et al. (2020) Genome-wide Nucleotide-Resolution Mapping of DNA Replication Patterns, Single-Strand Breaks, and Lesions by GLOE-Seq. ([Molecular Cell] (doi:10.1016/j.molcel.2020.03.027))
A musica result created for testing that includes SBS variants with discovered exposures and signatures
data(res)
data(res)
An object of class musica_result
See [discover_signatures()].
A musica result created for testing that includes SBS variants with annotations and discovered exposures and signatures
data(res_annot)
data(res_annot)
An object of class musica_result
See [discover_signatures()].
Sample annotations can be used to store information about
each sample such as tumor type or treatment status. These are used in
downstream plotting functions such as plot_exposures
or
plot_umap
to group or color samples by a particular annotation.
samp_annot(object) ## S4 method for signature 'musica' samp_annot(object) ## S4 method for signature 'musica_result' samp_annot(object) samp_annot(object, name) <- value ## S4 replacement method for signature 'musica,character,vector' samp_annot(object, name) <- value ## S4 replacement method for signature 'musica_result,character,vector' samp_annot(object, name) <- value
samp_annot(object) ## S4 method for signature 'musica' samp_annot(object) ## S4 method for signature 'musica_result' samp_annot(object) samp_annot(object, name) <- value ## S4 replacement method for signature 'musica,character,vector' samp_annot(object, name) <- value ## S4 replacement method for signature 'musica_result,character,vector' samp_annot(object, name) <- value
object |
A |
name |
The name of the new annotation to add. |
value |
A vector containing the new sample annotations. Needs to be the same length as the number of samples in the object. |
A new object with the sample annotations added to the table in the
sample_annotations
slot.
See sample_names
to get a vector of sample names in
the musica
or musica_result
object.
data(res_annot) samp_annot(res_annot) # Add new annotation samp_annot(res_annot, "New_Annotation") <- rep(c("A", "B"), c(3, 4)) samp_annot(res_annot) data(musica) samp_annot(musica, "example") <- rep("ex", 7)
data(res_annot) samp_annot(res_annot) # Add new annotation samp_annot(res_annot, "New_Annotation") <- rep(c("A", "B"), c(3, 4)) samp_annot(res_annot) data(musica) samp_annot(musica, "example") <- rep("ex", 7)
Sample names were included in the sample
column
in the variant object passed to create_musica
. This returns
a unique list of samples names in the order they are inside the
musica
object.
sample_names(object) ## S4 method for signature 'musica' sample_names(object) ## S4 method for signature 'musica_result' sample_names(object)
sample_names(object) ## S4 method for signature 'musica' sample_names(object) ## S4 method for signature 'musica_result' sample_names(object)
object |
A |
A character vector of sample names
data(res) sample_names(res)
data(res) sample_names(res)
Helper function to load common human or mouse genomes
select_genome(x)
select_genome(x)
x |
Select the hg19 or hg38 human genome or the mm9 or mm10 mouse genome in UCSC format |
Returns BSgenome of given version
g <- select_genome(x = "hg38")
g <- select_genome(x = "hg38")
The signature
matrix contains the probability of
mutation motif in each sample. Rows correspond to each motif and
columns correspond to each signature.
signatures(result) ## S4 method for signature 'musica_result' signatures(result) signatures(result) <- value ## S4 replacement method for signature 'musica_result,matrix' signatures(result) <- value
signatures(result) ## S4 method for signature 'musica_result' signatures(result) signatures(result) <- value ## S4 replacement method for signature 'musica_result,matrix' signatures(result) <- value
result |
A |
value |
A matrix of motifs counts by samples |
A matrix of mutational signatures
data(res) signatures(res) data(res) signatures(res) <- matrix()
data(res) signatures(res) data(res) signatures(res) <- matrix()
Creates a new musica object subsetted to only one value of a sample annotation
subset_musica_by_annotation(musica, annot_col, annot_names)
subset_musica_by_annotation(musica, annot_col, annot_names)
musica |
A |
annot_col |
Annotation class to use for subsetting |
annot_names |
Annotational value to subset to |
Returns a new musica object with sample annotations, count tables, and variants subsetted to only contains samples of the specified annotation type
data(musica_sbs96) annot <- read.table(system.file("extdata", "sample_annotations.txt", package = "musicatk"), sep = "\t", header=TRUE) samp_annot(musica_sbs96, "Tumor_Subtypes") <- annot$Tumor_Subtypes musica_sbs96 <- subset_musica_by_annotation(musica_sbs96, "Tumor_Subtypes", "Lung")
data(musica_sbs96) annot <- read.table(system.file("extdata", "sample_annotations.txt", package = "musicatk"), sep = "\t", header=TRUE) samp_annot(musica_sbs96, "Tumor_Subtypes") <- annot$Tumor_Subtypes musica_sbs96 <- subset_musica_by_annotation(musica_sbs96, "Tumor_Subtypes", "Lung")
Creates a new musica subsetted to only samples with enough variants
subset_musica_by_counts(musica, table_name, num_counts)
subset_musica_by_counts(musica, table_name, num_counts)
musica |
A |
table_name |
Name of table used for subsetting |
num_counts |
Minimum sum count value to drop samples |
Returns a new musica object with sample annotations, count tables, and variants subsetted to only contains samples with the specified minimum number of counts (column sums) in the specified table
data(musica_sbs96) subset_musica_by_counts(musica_sbs96, "SBS96", 20)
data(musica_sbs96) subset_musica_by_counts(musica_sbs96, "SBS96", 20)
Subsets a variant table based on Variant Type
subset_variant_by_type(tab, type)
subset_variant_by_type(tab, type)
tab |
Input variant table |
type |
Variant type to return e.g. "SBS", "INS", "DEL", "DBS" |
Returns the input variant table subsetted to only contain variants of the specified variant type
data(musica) annotate_variant_type(musica) subset_variant_by_type(variants(musica), "SBS")
data(musica) annotate_variant_type(musica) subset_variant_by_type(variants(musica), "SBS")
Return sample from musica_variant object
subset_variants_by_samples(musica, sample_name)
subset_variants_by_samples(musica, sample_name)
musica |
A |
sample_name |
Sample name to subset by |
Returns sample data.frame subset to a single sample
data(musica) subset_variants_by_samples(musica, "TCGA-94-7557-01A-11D-2122-08")
data(musica) subset_variants_by_samples(musica, "TCGA-94-7557-01A-11D-2122-08")
The table name
table_selected(result) ## S4 method for signature 'musica_result' table_selected(result)
table_selected(result) ## S4 method for signature 'musica_result' table_selected(result)
result |
A |
Table name used for plotting
data(res) table_selected(res)
data(res) table_selected(res)
The count_tables
contains standard and/or custom
count tables created from variants
tables(object) ## S4 method for signature 'musica' tables(object) ## S4 method for signature 'musica_result' tables(object) tables(musica) <- value ## S4 replacement method for signature 'musica,list' tables(musica) <- value
tables(object) ## S4 method for signature 'musica' tables(object) ## S4 method for signature 'musica_result' tables(object) tables(musica) <- value ## S4 replacement method for signature 'musica,list' tables(musica) <- value
object |
A |
musica |
A |
value |
A list of |
A list of count_tables
data(res) tables(res) data(musica) tables(musica)
data(res) tables(res) data(musica) tables(musica)
The signature
matrix contains the probability of
mutation motif in each sample. Rows correspond to each motif and
columns correspond to each signature.
umap(result) ## S4 method for signature 'musica_result' umap(result) umap(result) <- value ## S4 replacement method for signature 'musica_result,matrix' umap(result) <- value
umap(result) ## S4 method for signature 'musica_result' umap(result) umap(result) <- value ## S4 replacement method for signature 'musica_result,matrix' umap(result) <- value
result |
A |
value |
A list of umap dataframes |
A list of umap dataframes
data(res) umap(res) data(res) umap(res) <- matrix()
data(res) umap(res) data(res) umap(res) <- matrix()
The variants
data.table
contains the variants
and variant-level annotations
variants(object) ## S4 method for signature 'musica' variants(object) ## S4 method for signature 'musica_result' variants(object) variants(musica) <- value ## S4 replacement method for signature 'musica,data.table' variants(musica) <- value
variants(object) ## S4 method for signature 'musica' variants(object) ## S4 method for signature 'musica_result' variants(object) variants(musica) <- value ## S4 replacement method for signature 'musica,data.table' variants(musica) <- value
object |
A |
musica |
A |
value |
A |
A data.table of variants
data(res) variants(res) data(musica) variants(musica)
data(res) variants(res) data(musica) variants(musica)