Title: | Mutational Signature Comprehensive Analysis Toolkit |
---|---|
Description: | Mutational signatures are carcinogenic exposures or aberrant cellular processes that can cause alterations to the genome. We created musicatk (MUtational SIgnature Comprehensive Analysis ToolKit) to address shortcomings in versatility and ease of use in other pre-existing computational tools. Although many different types of mutational data have been generated, current software packages do not have a flexible framework to allow users to mix and match different types of mutations in the mutational signature inference process. Musicatk enables users to count and combine multiple mutation types, including SBS, DBS, and indels. Musicatk calculates replication strand, transcription strand and combinations of these features along with discovery from unique and proprietary genomic feature associated with any mutation type. Musicatk also implements several methods for discovery of new signatures as well as methods to infer exposure given an existing set of signatures. Musicatk provides functions for visualization and downstream exploratory analysis including the ability to compare signatures between cohorts and find matching signatures in COSMIC V2 or COSMIC V3. |
Authors: | Aaron Chevalier [aut] (0000-0002-3968-9250), Natasha Gurevich [aut] (0000-0002-0747-6840), Tao Guo [aut] (0009-0005-8960-9203), Joshua D. Campbell [aut, cre] |
Maintainer: | Joshua D. Campbell <[email protected]> |
License: | LGPL-3 |
Version: | 2.1.0 |
Built: | 2024-11-29 08:37:20 UTC |
Source: | https://github.com/bioc/musicatk |
Uses a genome object to find context and add it to the variant table
add_flank_to_variants( musica, g, flank_start, flank_end, build_table = TRUE, overwrite = FALSE )
add_flank_to_variants( musica, g, flank_start, flank_end, build_table = TRUE, overwrite = FALSE )
musica |
Input samples |
g |
A BSgenome object indicating which genome reference the variants and their coordinates were derived from. |
flank_start |
Start of flank area to add, can be positive or negative |
flank_end |
End of flank area to add, can be positive or negative |
build_table |
Automatically build a table using the annotation and add |
overwrite |
Overwrite existing count table |
None it to the musica
data(musica_sbs96_tiny) g <- select_genome("19") add_flank_to_variants(musica_sbs96_tiny, g, 1, 2) add_flank_to_variants(musica_sbs96_tiny, g, -2, -1)
data(musica_sbs96_tiny) g <- select_genome("19") add_flank_to_variants(musica_sbs96_tiny, g, 1, 2) add_flank_to_variants(musica_sbs96_tiny, g, -2, -1)
Add replication strand annotation to SBS variants based on bedgraph file
annotate_replication_strand(musica, rep_range, build_table = TRUE)
annotate_replication_strand(musica, rep_range, build_table = TRUE)
musica |
A |
rep_range |
A GRanges object with replication timing as metadata |
build_table |
Automatically build a table from this annotation |
None
data(musica) data(rep_range) annotate_replication_strand(musica, rep_range)
data(musica) data(rep_range) annotate_replication_strand(musica, rep_range)
Add transcript strand annotation to SBS variants (defined in genes only)
annotate_transcript_strand(musica, genome_build, build_table = TRUE)
annotate_transcript_strand(musica, genome_build, build_table = TRUE)
musica |
A |
genome_build |
Which genome build to use: hg19, hg38, or a custom TxDb object |
build_table |
Automatically build a table from this annotation |
None
data(musica) annotate_transcript_strand(musica, 19)
data(musica) annotate_transcript_strand(musica, 19)
Adds an annotation to the input musica's variant table with length of each variant
annotate_variant_length(musica)
annotate_variant_length(musica)
musica |
Input samples |
None
data(musica) annotate_variant_length(musica) musica
data(musica) annotate_variant_length(musica) musica
Annotate variants with variant type ("SBS", "INS", "DEl", "DBS")
annotate_variant_type(musica)
annotate_variant_type(musica)
musica |
A |
None
data(musica) annotate_variant_type(musica)
data(musica) annotate_variant_type(musica)
Automatic filtering of signatures for exposure prediction gridded across specific annotation
auto_predict_grid( musica, modality, signature_res, algorithm, model_id = NULL, result_name = "result", sample_annotation = NULL, min_exists = 0.05, proportion_samples = 0.25, rare_exposure = 0.4, verbose = TRUE, combine_res = TRUE, make_copy = FALSE, table_name = NULL )
auto_predict_grid( musica, modality, signature_res, algorithm, model_id = NULL, result_name = "result", sample_annotation = NULL, min_exists = 0.05, proportion_samples = 0.25, rare_exposure = 0.4, verbose = TRUE, combine_res = TRUE, make_copy = FALSE, table_name = NULL )
musica |
Input samples to predict signature weights |
modality |
Modality used for posterior prediction (e.g. SBS96) |
signature_res |
Signatures to automatically subset from for prediction |
algorithm |
Algorithm to use for prediction. Choose from "lda_posterior", and decompTumor2Sig |
model_id |
Name of model |
result_name |
Name for result_list entry to save the results to. Default
|
sample_annotation |
Annotation to grid across, if none given, prediction subsetting on all samples together |
min_exists |
Threshold to consider a signature active in a sample |
proportion_samples |
Threshold of samples to consider a signature active in the cohort |
rare_exposure |
A sample will be considered active in the cohort if at least one sample has more than this threshold proportion |
verbose |
Print current annotation value being predicted on |
combine_res |
Automatically combines a list of annotation results into a single result object with zero exposure values for signatures not found in a given annotation's set of samples |
make_copy |
If |
table_name |
Use modality instead |
Returns nothing or a new musica
object,
depending on the make_copy
parameter.
data(musica_annot) data(cosmic_v2_sigs) auto_predict_grid( musica = musica_annot, modality = "SBS96", signature_res = cosmic_v2_sigs, algorithm = "lda", sample_annotation = "Tumor_Subtypes" ) auto_predict_grid(musica_annot, "SBS96", cosmic_v2_sigs, "lda")
data(musica_annot) data(cosmic_v2_sigs) auto_predict_grid( musica = musica_annot, modality = "SBS96", signature_res = cosmic_v2_sigs, algorithm = "lda", sample_annotation = "Tumor_Subtypes" ) auto_predict_grid(musica_annot, "SBS96", cosmic_v2_sigs, "lda")
Builds a custom table from specified user variants
build_custom_table( musica, variant_annotation, name, description = character(), data_factor = NA, annotation_df = NULL, features = NULL, type = NULL, color_variable = NULL, color_mapping = NULL, return_instead = FALSE, overwrite = FALSE )
build_custom_table( musica, variant_annotation, name, description = character(), data_factor = NA, annotation_df = NULL, features = NULL, type = NULL, color_variable = NULL, color_mapping = NULL, return_instead = FALSE, overwrite = FALSE )
musica |
A |
variant_annotation |
User column to use for building table |
name |
Table name to refer to (must be unique) |
description |
Optional description of the table content |
data_factor |
Full set of table values, in case some are missing from the data. If NA, a superset of all available unique data values will be used |
annotation_df |
A data.frame of annotations to use for plotting |
features |
A data.frame of the input data from which the count table will be built |
type |
The type of data/mutation in each feature as an Rle object |
color_variable |
The name of the column of annotation_df used for the coloring in plots |
color_mapping |
The mapping from the values in the selected color_variable column to color values for plotting |
return_instead |
Instead of adding to musica object, return the created table |
overwrite |
Overwrite existing count table |
If return_instead = TRUE then the created table object is returned, otherwise the table object is automatically added to the musica's count_tables list and nothing is returned
data(musica) annotate_transcript_strand(musica, "19", build_table = FALSE) build_custom_table(musica, "Transcript_Strand", "Transcript_Strand", data_factor = factor(c("T", "U")) )
data(musica) annotate_transcript_strand(musica, "19", build_table = FALSE) build_custom_table(musica, "Transcript_Strand", "Transcript_Strand", data_factor = factor(c("T", "U")) )
Generates count tables for different mutation type schemas which can be
used as input to the mutational signature discovery or prediction functions.
"SBS96"
generates a table for single base substitutions following the
standard 96 mutation types derived from the trinucleotide context.
"SBS192"
is the 96 mutation type schema with the addition of
transcriptional strand or replication strand information added to each base.
"DBS"
generates a table for the double base substitution schema
used in COSMIC V3. "Indel"
generates a table for insertions and
deletions following the schema used in COSMIC V3.
build_standard_table( musica, g, modality, strand_type = NULL, overwrite = FALSE, verbose = FALSE, table_name = NULL )
build_standard_table( musica, g, modality, strand_type = NULL, overwrite = FALSE, verbose = FALSE, table_name = NULL )
musica |
A |
g |
A BSgenome object indicating which genome reference the variants and their coordinates were derived from. |
modality |
Modality of table to build. One of |
strand_type |
Strand type to use in SBS192 schema. One of
|
overwrite |
If |
verbose |
Show progress bar for processed samples |
table_name |
Use modality instead |
No object will be returned. The count tables will be automatically
added to the musica
object.
g <- select_genome("19") data(musica) build_standard_table(musica, g, "SBS96", overwrite = TRUE) data(musica) annotate_transcript_strand(musica, "19") build_standard_table(musica, g, "SBS192", "Transcript_Strand") data(musica) data(rep_range) annotate_replication_strand(musica, rep_range) build_standard_table(musica, g, "SBS192", "Replication_Strand") data(dbs_musica) build_standard_table(dbs_musica, g, "DBS", overwrite = TRUE) data(indel_musica) build_standard_table(indel_musica, g, modality = "INDEL")
g <- select_genome("19") data(musica) build_standard_table(musica, g, "SBS96", overwrite = TRUE) data(musica) annotate_transcript_strand(musica, "19") build_standard_table(musica, g, "SBS192", "Transcript_Strand") data(musica) data(rep_range) annotate_replication_strand(musica, rep_range) build_standard_table(musica, g, "SBS192", "Replication_Strand") data(dbs_musica) build_standard_table(dbs_musica, g, "DBS", overwrite = TRUE) data(indel_musica) build_standard_table(indel_musica, g, modality = "INDEL")
The count_tables
contains standard and/or custom
count tables created from variants
built_tables(object) ## S4 method for signature 'musica' built_tables(object)
built_tables(object) ## S4 method for signature 'musica' built_tables(object)
object |
A |
The names of created count_tables
data(res) built_tables(res)
data(res) built_tables(res)
Proportional sample exposures will be used as input to perform clustering.
cluster_exposure( musica, model_name, modality = "SBS96", result_name = "result", nclust, proportional = TRUE, method = "kmeans", dis.method = "euclidean", hc.method = "ward.D", clara.samples = 5, iter.max = 10, tol = 1e-15 )
cluster_exposure( musica, model_name, modality = "SBS96", result_name = "result", nclust, proportional = TRUE, method = "kmeans", dis.method = "euclidean", hc.method = "ward.D", clara.samples = 5, iter.max = 10, tol = 1e-15 )
musica |
A |
model_name |
The name of the desired model. |
modality |
The modality of the model. Must be "SBS96", "DBS78", or
"IND83". Default |
result_name |
Name of the result list entry containing desired model.
Default |
nclust |
Pre-defined number of clusters. |
proportional |
Logical, indicating if proportional exposure (default) will be used for clustering. |
method |
Clustering algorithms. Options are "kmeans" (K-means), "hkmeans" (hybrid of hierarchical K-means), "hclust" (hierarchical clustering), "pam" (PAM), and "clara" (Clara). |
dis.method |
Methods to calculate dissimilarity matrix. Options are "euclidean" (default), "manhattan", "jaccard", "cosine", and "canberra". |
hc.method |
Methods to perform hierarchical clustering. Options are "ward.D" (default), "ward.D2", "single", "complete", "average", "mcquitty", "median", and "centroid". |
clara.samples |
Number of samples to be drawn from dataset. Only used when "clara" is selected. Default is 5. |
iter.max |
Maximum number of iterations for k-means clustering. |
tol |
Tolerance level for kmeans clustering level iterations |
A one-column data frame with sample IDs as row names and cluster number for each sample.
set.seed(123) data(res_annot) clust_out <- cluster_exposure(res_annot, model_name = "res_annot", nclust = 2)
set.seed(123) data(res_annot) clust_out <- cluster_exposure(res_annot, model_name = "res_annot", nclust = 2)
Combines tables into a single table that can be used for discovery/prediction
combine_count_tables( musica, to_comb, name, description = character(), color_variable = character(), color_mapping = character(), overwrite = FALSE )
combine_count_tables( musica, to_comb, name, description = character(), color_variable = character(), color_mapping = character(), overwrite = FALSE )
musica |
A |
to_comb |
A vector of table names to combine. Each table must already exist within the input musica object |
name |
Name of table build, must be a new name |
description |
Description of the new table |
color_variable |
Annotation column to use for coloring plotted motifs, provided by counts table from input result's musica object |
color_mapping |
Mapping from color_variable to color names, provided by counts table from input result's musica object |
overwrite |
Overwrite existing count table |
None
g <- select_genome("19") data(musica) build_standard_table(musica, g, "SBS96", overwrite = TRUE) annotate_transcript_strand(musica, "19") build_standard_table(musica, g, "SBS192", "Transcript_Strand") combine_count_tables(musica, c("SBS96", "SBS192_Trans"), "combo")
g <- select_genome("19") data(musica) build_standard_table(musica, g, "SBS96", overwrite = TRUE) annotate_transcript_strand(musica, "19") build_standard_table(musica, g, "SBS192", "Transcript_Strand") combine_count_tables(musica, c("SBS96", "SBS192_Trans"), "combo")
Combine signatures and exposures of different models. Exposure values are zero for samples in an annotation where that signature was not predicted
combine_predict_grid( musica, modality, signature_res, model_ids = NULL, result_name = "result", model_rename = NULL, make_copy = FALSE, table_name = NULL )
combine_predict_grid( musica, modality, signature_res, model_ids = NULL, result_name = "result", model_rename = NULL, make_copy = FALSE, table_name = NULL )
musica |
A |
modality |
Modality used for prediction. |
signature_res |
Signatures to automatically subset from for prediction |
model_ids |
Vector of ids for the models to combine. If null, all models
in the modality and result_list entry will be combined. Default |
result_name |
Name of the result list entry containing the signatures
to plot. Default |
model_rename |
New model identifier. If null, will be combination of
the ids for the models being combined. Deafult |
make_copy |
If |
table_name |
Use modality instead |
Returns nothing or a new musica
object,
depending on the make_copy
parameter.
data(musica_annot) data(cosmic_v2_sigs) grid <- auto_predict_grid(musica_annot, "SBS96", cosmic_v2_sigs, "lda", "Tumor_Subtypes", combine_res = FALSE, make_copy = TRUE ) combined <- combine_predict_grid(grid, "SBS96", cosmic_v2_sigs, make_copy = TRUE)
data(musica_annot) data(cosmic_v2_sigs) grid <- auto_predict_grid(musica_annot, "SBS96", cosmic_v2_sigs, "lda", "Tumor_Subtypes", combine_res = FALSE, make_copy = TRUE ) combined <- combine_predict_grid(grid, "SBS96", cosmic_v2_sigs, make_copy = TRUE)
Compare a result object to COSMIC V2 SBS Signatures (combination whole-exome and whole-genome)
compare_cosmic_v2( musica, model_id, modality = "SBS96", result_name = "result", metric = "cosine", threshold = 0.9, result_rename = deparse(substitute(result)), decimals = 2, same_scale = FALSE )
compare_cosmic_v2( musica, model_id, modality = "SBS96", result_name = "result", metric = "cosine", threshold = 0.9, result_rename = deparse(substitute(result)), decimals = 2, same_scale = FALSE )
musica |
A |
model_id |
The name of the model containing the signatures to compare. |
modality |
Compare to SBS, DBS, or Indel. Default |
result_name |
Name of the result list entry. Default |
metric |
One of |
threshold |
threshold for similarity |
result_rename |
title for plot user result signatures |
decimals |
Specifies rounding for similarity metric displayed. Default
|
same_scale |
If |
Returns the comparisons
data(res) compare_cosmic_v2(res, model_id = "res", threshold = 0.7)
data(res) compare_cosmic_v2(res, model_id = "res", threshold = 0.7)
Compare a result object to COSMIC V3 Signatures; Select exome or genome for SBS and only genome for DBS or Indel classes
compare_cosmic_v3( musica, model_id, sample_type, modality = "SBS96", result_name = "result", metric = "cosine", threshold = 0.9, result_rename = deparse(substitute(model_id)), decimals = 2, same_scale = FALSE )
compare_cosmic_v3( musica, model_id, sample_type, modality = "SBS96", result_name = "result", metric = "cosine", threshold = 0.9, result_rename = deparse(substitute(model_id)), decimals = 2, same_scale = FALSE )
musica |
A |
model_id |
The name of the model containing the signatures to compare. |
sample_type |
exome (SBS only) or genome |
modality |
Compare to SBS, DBS, or Indel. Default |
result_name |
Name of the result list entry. Default |
metric |
One of |
threshold |
threshold for similarity |
result_rename |
title for plot user result signatures |
decimals |
Specifies rounding for similarity metric displayed. Default
|
same_scale |
If |
Returns the comparisons
data(res) compare_cosmic_v3(res, model_id = "res", modality = "SBS96", sample_type = "genome", threshold = 0.8 )
data(res) compare_cosmic_v3(res, model_id = "res", modality = "SBS96", sample_type = "genome", threshold = 0.8 )
Compare the stability and error of various k values to help determine the correct number of signatures (k).
compare_k_vals( musica, modality, reps = 100, min_k = 1, max_k = 10, error_type = "prop", algorithm = "nmf" )
compare_k_vals( musica, modality, reps = 100, min_k = 1, max_k = 10, error_type = "prop", algorithm = "nmf" )
musica |
A |
modality |
The modality to use, either "SBS96", "DBS78", or "IND83". |
reps |
Number of times prediction is performed. For each replicate, the
count table data is perturbed. Multiple replicates allows for stability
analysis by calculating silhouette width on the multiple results. Default
|
min_k |
Lower range of number of signatures for discovery. Default
|
max_k |
Upper range of number of signatures for discovery. Default
|
error_type |
Whether to calculate reconstruction error by proportions
("prop") or raw counts ("raw"). Default |
algorithm |
Algorithm for signature discovery. Default |
a data.frame with stats for each k value tested
data(musica) compare_k_vals(musica, "SBS96", reps = 3, min_k = 1, max_k = 5)
data(musica) compare_k_vals(musica, "SBS96", reps = 3, min_k = 1, max_k = 5)
Compare two result files to find similar signatures
compare_results( musica, model_id, other_model_id, modality = "SBS96", result_name = "result", other_musica = NULL, other_result_name = "result", threshold = 0.9, metric = "cosine", result_rename = deparse(substitute(model_id)), other_result_rename = deparse(substitute(other_model_id)), decimals = 2, same_scale = FALSE )
compare_results( musica, model_id, other_model_id, modality = "SBS96", result_name = "result", other_musica = NULL, other_result_name = "result", threshold = 0.9, metric = "cosine", result_rename = deparse(substitute(model_id)), other_result_rename = deparse(substitute(other_model_id)), decimals = 2, same_scale = FALSE )
musica |
A |
model_id |
The name of the first model to compare. |
other_model_id |
The name of the second model to compare. |
modality |
Modality of results being compared. Default |
result_name |
Name of the result list entry for the first result to
compare. Default |
other_musica |
A second |
other_result_name |
Name of the result list entry for the second result
to compare. Default |
threshold |
threshold for similarity |
metric |
One of |
result_rename |
title for plot of first result signatures |
other_result_rename |
title for plot of second result signatures |
decimals |
Specifies rounding for similarity metric displayed. Default
|
same_scale |
If |
Returns the comparisons
data(res) compare_results(res, model_id = "res", other_model_id = "res", modality = "SBS96", threshold = 0.8 )
data(res) compare_results(res, model_id = "res", other_model_id = "res", modality = "SBS96", threshold = 0.8 )
Data from COSMIC formatted to be used for prediction with individual tumors and cohorts.
data(cosmic_v2_sigs)
data(cosmic_v2_sigs)
An object of class result_model
See [predict_exposure()].
COSMIC v2, <https://cancer.sanger.ac.uk/cosmic/signatures_v2>
Alexandrov, L., Nik-Zainal, S., Wedge, D. et al. (2013) Signatures of mutational processes in human cancer. Nature 500, 415–421 ([Nature](https://www.ncbi.nlm.nih.gov/pubmed/23945592))
Input a cancer subtype to return a list of related COSMIC signatures
cosmic_v2_subtype_map(tumor_type)
cosmic_v2_subtype_map(tumor_type)
tumor_type |
Cancer subtype to view related signatures |
Returns signatures related to a partial string match
cosmic_v2_subtype_map("lung")
cosmic_v2_subtype_map("lung")
Data from COSMIC formatted to be used for prediction with individual tumors and cohorts.
data(cosmic_v3_dbs_sigs)
data(cosmic_v3_dbs_sigs)
An object of class result_model
.
See [predict_exposure()].
COSMIC v3, <https://cancer.sanger.ac.uk/cosmic/signatures>
Alexandrov, L.B., Kim, J., Haradhvala, N.J. et al. (2020) The repertoire of mutational signatures in human cancer. Nature 578, 94–101 ([Nature](https://doi.org/10.1038/s41586-020-1943-3))
Data from COSMIC formatted to be used for prediction with individual tumors and cohorts.
data(cosmic_v3_indel_sigs)
data(cosmic_v3_indel_sigs)
An object of class result_model
.
See [predict_exposure()].
COSMIC v3, <https://cancer.sanger.ac.uk/cosmic/signatures>
Alexandrov, L.B., Kim, J., Haradhvala, N.J. et al. (2020) The repertoire of mutational signatures in human cancer. Nature 578, 94–101 ([Nature](https://doi.org/10.1038/s41586-020-1943-3))
Data from COSMIC formatted to be used for prediction with individual tumors and cohorts.
data(cosmic_v3_sbs_sigs)
data(cosmic_v3_sbs_sigs)
An object of class result_model
.
See [predict_exposure()].
COSMIC v3, <https://cancer.sanger.ac.uk/cosmic/signatures>
Alexandrov, L.B., Kim, J., Haradhvala, N.J. et al. (2020) The repertoire of mutational signatures in human cancer. Nature 578, 94–101 ([Nature](https://doi.org/10.1038/s41586-020-1943-3))
Data from COSMIC formatted to be used for prediction with individual tumors and cohorts.
data(cosmic_v3_sbs_sigs_exome)
data(cosmic_v3_sbs_sigs_exome)
An object of class result_model
.
See [predict_exposure()].
COSMIC v3, <https://cancer.sanger.ac.uk/cosmic/signatures>
Alexandrov, L.B., Kim, J., Haradhvala, N.J. et al. (2020) The repertoire of mutational signatures in human cancer. Nature 578, 94–101 ([Nature](https://doi.org/10.1038/s41586-020-1943-3))
Object containing the count table matrices, their names and descriptions that we generated by provided and by user functions. These are used to discover and infer signatures and exposures.
name
A name that describes the type of table (e.g. "SBS96")
count_table
An array of counts with samples as the columns and motifs as the rows
annotation
A data.frame of annotations with three columns used for plotting: motif, mutation, and context
features
Original features used to generate the count_table
type
The mutation type of each feature, in case we need to plot or model they differently
color_variable
The variable used for plotting colors, selected from the annotation slot
color_mapping
The mapping of the annotations chosen by color_variable to color values for plotting
description
A summary table of the result objects in result_list a list of lists. The nested lists created combined (rbind) tables, and the tables at the first list level are modelled independantly. Combined tables must be named. list("tableA", comboTable = list("tableC", "tableD"))
This function creates a musica object from a mutation count table or matrix. The musica class stores variants information, variant-level annotations, sample-level annotations, and count tables and is used as input to the mutational signature discovery and prediction algorithms.
create_musica_from_counts(x, variant_class)
create_musica_from_counts(x, variant_class)
x |
A data.table, matrix, or data.frame that contains counts of mutation types for each sample, with samples as columns. |
variant_class |
Mutations are SBS, DBS, or Indel. |
Returns a musica object
data(musica) count_table <- get_count_table(extract_count_tables(musica)$SBS96) musica <- create_musica_from_counts(count_table, "SBS96")
data(musica) count_table <- get_count_table(extract_count_tables(musica)$SBS96) musica <- create_musica_from_counts(count_table, "SBS96")
This function creates a musica object from a variant
table or matrix. The musica class stores variants information,
variant-level annotations, sample-level annotations, and count tables and
is used as input to the mutational signature discovery and prediction
algorithms. The input variant table or matrix must have columns for
chromosome, start position, end position, reference allele,
alternate allele, and sample names. The column names in the variant table
can be mapped using the chromosome_col
, start_col
,
end_col
, ref_col
, alt_col
, and
sample_col parameters
.
create_musica_from_variants( x, genome, check_ref_chromosomes = TRUE, check_ref_bases = TRUE, chromosome_col = "chr", start_col = "start", end_col = "end", ref_col = "ref", alt_col = "alt", sample_col = "sample", extra_fields = NULL, standardize_indels = TRUE, convert_dbs = TRUE, verbose = TRUE )
create_musica_from_variants( x, genome, check_ref_chromosomes = TRUE, check_ref_bases = TRUE, chromosome_col = "chr", start_col = "start", end_col = "end", ref_col = "ref", alt_col = "alt", sample_col = "sample", extra_fields = NULL, standardize_indels = TRUE, convert_dbs = TRUE, verbose = TRUE )
x |
A data.table, matrix, or data.frame that contains columns with the variant information. |
genome |
A BSgenome object indicating which genome reference the variants and their coordinates were derived from. |
check_ref_chromosomes |
Whether to peform a check to ensure that
the chromosomes in the |
check_ref_bases |
Whether to check if the reference bases in the
|
chromosome_col |
The name of the column that contains the chromosome
reference for each variant. Default |
start_col |
The name of the column that contains the start
position for each variant. Default |
end_col |
The name of the column that contains the end
position for each variant. Default |
ref_col |
The name of the column that contains the reference
base(s) for each variant. Default |
alt_col |
The name of the column that contains the alternative
base(s) for each variant. Default |
sample_col |
The name of the column that contains the sample
id for each variant. Default |
extra_fields |
Which additional fields to extract and include in
the musica object. Default |
standardize_indels |
Flag to convert indel style (e.g. 'C > CAT' becomes '- > AT' and 'GCACA > G' becomes 'CACA > -') |
convert_dbs |
Flag to convert adjacent SBS into DBS (original SBS are removed) |
verbose |
Whether to print status messages during error checking.
Default |
Returns a musica object
maf_file <- system.file("extdata", "public_TCGA.LUSC.maf", package = "musicatk" ) variants <- extract_variants_from_maf_file(maf_file) g <- select_genome("38") musica <- create_musica_from_variants(x = variants, genome = g)
maf_file <- system.file("extdata", "public_TCGA.LUSC.maf", package = "musicatk" ) variants <- extract_variants_from_maf_file(maf_file) g <- select_genome("38") musica <- create_musica_from_variants(x = variants, genome = g)
This function creates a result_model object from signatures, exposures, and a mutation count table.
create_result_model(signatures, exposures, model_id, modality)
create_result_model(signatures, exposures, model_id, modality)
signatures |
A matrix or data.frame of signatures by mutational motifs |
exposures |
A matrix or data.frame of samples by signature weights |
model_id |
Name of model |
modality |
Modality of the model |
A result_model object
signatures <- signatures(res, "result", "SBS96", "res") exposures <- exposures(res, "result", "SBS96", "res") model <- create_result_model(signatures, exposures, "example_model", "SBS96")
signatures <- signatures(res, "result", "SBS96", "res") exposures <- exposures(res, "result", "SBS96", "res") model <- create_result_model(signatures, exposures, "example_model", "SBS96")
Proportional sample exposures will be used as input into the
umap
function to generate a two dimensional UMAP.
create_umap( musica, model_name, modality = "SBS96", result_name = "result", n_neighbors = 30, min_dist = 0.75, spread = 1 )
create_umap( musica, model_name, modality = "SBS96", result_name = "result", n_neighbors = 30, min_dist = 0.75, spread = 1 )
musica |
A |
model_name |
The name of the desired model. |
modality |
The modality of the model. Must be "SBS96", "DBS78", or
"IND83". Default |
result_name |
Name of the result list entry containing the model.
Default |
n_neighbors |
The size of local neighborhood used for views of
manifold approximation. Larger values result in more global the manifold,
while smaller values result in more local data being preserved.
If |
min_dist |
The effective minimum distance between embedded points.
Smaller values will result in a more clustered/clumped embedding where
nearby points on the manifold are drawn closer together, while larger
values will result on a more even dispersal of points. Default |
spread |
The effective scale of embedded points. In combination with
‘min_dist’, this determines how clustered/clumped the embedded points are.
Default |
A musica
object with a new UMAP
stored in the UMAP
slot of the result_model
object for the model.
See plot_umap to display the UMAP and
umap
for more information on the individual parameters
for generating UMAPs.
data(res_annot) create_umap(res_annot, model_name = "res_annot")
data(res_annot) create_umap(res_annot, model_name = "res_annot")
Credible intervals for the model
credible_intervals(x, ...) ## S4 method for signature 'musica' credible_intervals(x, result, modality, model_id) ## S4 method for signature 'result_collection' credible_intervals(x, modality, model_id) ## S4 method for signature 'result_model' credible_intervals(x) credible_intervals(x, ...) <- value ## S4 replacement method for signature 'musica,matrix' credible_intervals(x, result, modality, model_id) <- value ## S4 replacement method for signature 'result_collection,matrix' credible_intervals(x, modality, model_id) <- value ## S4 replacement method for signature 'result_model,matrix' credible_intervals(x) <- value
credible_intervals(x, ...) ## S4 method for signature 'musica' credible_intervals(x, result, modality, model_id) ## S4 method for signature 'result_collection' credible_intervals(x, modality, model_id) ## S4 method for signature 'result_model' credible_intervals(x) credible_intervals(x, ...) <- value ## S4 replacement method for signature 'musica,matrix' credible_intervals(x, result, modality, model_id) <- value ## S4 replacement method for signature 'result_collection,matrix' credible_intervals(x, modality, model_id) <- value ## S4 replacement method for signature 'result_model,matrix' credible_intervals(x) <- value
x |
A |
... |
Other inputs |
result |
Name of result from result_list to assign the
credible_intervals. Used when |
modality |
Modality to assign the credible_intervals. Used when
|
model_id |
Model identifier to assign the credible_intervals. Used when
|
value |
List of credible intervals |
The credible intervals for the model
data(res) credible_intervals(res, "result", "SBS96", "res")
data(res) credible_intervals(res, "result", "SBS96", "res")
A musica created for testing that includes DBS variants
data(dbs_musica)
data(dbs_musica)
An object of class musica
See [create_musica_from_variants()] and [create_musica_from_counts()].
Mutational signatures and exposures will be discovered using
methods such as Latent Dirichlet Allocation (lda) or Non-Negative
Matrix Factorization (nmf). These algorithms will deconvolute a matrix of
counts for mutation types in each sample to two matrices: 1) a "signature"
matrix containing the probability of each mutation type in each sample and
2) an "exposure" matrix containing the estimated counts for each signature
in each sample. Before mutational discovery can be performed, samples first
need to be stored in a musica
object using the
create_musica_from_variants or create_musica_from_counts
function and mutation count tables need to be created using functions such as
build_standard_table if create_musica_from_counts was not used.
discover_signatures( musica, modality, num_signatures, algorithm = "lda", result_name = "result", model_id = NULL, seed = 1, nstart = 10, par_cores = 1, make_copy = FALSE, table_name = NULL )
discover_signatures( musica, modality, num_signatures, algorithm = "lda", result_name = "result", model_id = NULL, seed = 1, nstart = 10, par_cores = 1, make_copy = FALSE, table_name = NULL )
musica |
A |
modality |
Modality to use for signature discovery. Needs to be the same name supplied to the table building functions such as build_standard_table. |
num_signatures |
Number of signatures to discover. |
algorithm |
Method to use for mutational signature discovery. One of
|
result_name |
Name for result_list entry to save the results to. Default
|
model_id |
Identifier for the result. If |
seed |
Seed to be used for the random number generators in the
signature discovery algorithms. Default |
nstart |
Number of independent random starts used in the mutational
signature algorithms. Default |
par_cores |
Number of parallel cores to use. Only used if
|
make_copy |
If |
table_name |
Use modality instead |
Returns nothing or a new musica
object,
depending on the make_copy
parameter.
data(musica) g <- select_genome("19") build_standard_table(musica, g, "SBS96", overwrite = TRUE) discover_signatures( musica = musica, modality = "SBS96", num_signatures = 3, algorithm = "lda", seed = 12345, nstart = 1 )
data(musica) g <- select_genome("19") build_standard_table(musica, g, "SBS96", overwrite = TRUE) discover_signatures( musica = musica, modality = "SBS96", num_signatures = 3, algorithm = "lda", seed = 12345, nstart = 1 )
Drops a column from the variant table that the user no longer needs
drop_annotation(musica, column_name)
drop_annotation(musica, column_name)
musica |
A |
column_name |
Name of column to drop |
None
data(musica) drop_annotation(musica, "Variant_Type")
data(musica) drop_annotation(musica, "Variant_Type")
exposure_differential_analysis
is used to run
differential analysis on the signature exposures of annotated samples.
exposure_differential_analysis( musica, model_name, annotation, modality = "SBS96", result_name = "result", method = c("wilcox", "kruskal", "glm.nb"), group1 = NULL, group2 = NULL, ... )
exposure_differential_analysis( musica, model_name, annotation, modality = "SBS96", result_name = "result", method = c("wilcox", "kruskal", "glm.nb"), group1 = NULL, group2 = NULL, ... )
musica |
A |
model_name |
The name of the model. |
annotation |
Column in the sample_annotations table of the
|
modality |
The modality. Must be "SBS96", "DBS78", or "IND83". Default
|
result_name |
Name of the result list entry. Default |
method |
Any method in |
group1 |
character vector used in the Wilcox test. Elements in
|
group2 |
character vector used in the Wilcox test. Elements in
|
... |
Additional arguments to be passed to the chosen method |
A matrix containing statistics summarizing the analysis dependent on the chosen method
data("res_annot") exposure_differential_analysis(res_annot, model_name = "res_annot", annotation = "Tumor_Subtypes", method = "wilcox" )
data("res_annot") exposure_differential_analysis(res_annot, model_name = "res_annot", annotation = "Tumor_Subtypes", method = "wilcox" )
The exposure
matrix contains estimated amount of
each signature for each sample. Rows correspond to each signature and
columns correspond to each sample.
exposures(x, ...) ## S4 method for signature 'musica' exposures(x, result, modality, model_id) ## S4 method for signature 'result_collection' exposures(x, modality, model_id) ## S4 method for signature 'result_model' exposures(x) exposures(x, ...) <- value ## S4 replacement method for signature 'musica,matrix' exposures(x, result, modality, model_id) <- value ## S4 replacement method for signature 'result_collection,matrix' exposures(x, modality, model_id) <- value ## S4 replacement method for signature 'result_model,matrix' exposures(x) <- value
exposures(x, ...) ## S4 method for signature 'musica' exposures(x, result, modality, model_id) ## S4 method for signature 'result_collection' exposures(x, modality, model_id) ## S4 method for signature 'result_model' exposures(x) exposures(x, ...) <- value ## S4 replacement method for signature 'musica,matrix' exposures(x, result, modality, model_id) <- value ## S4 replacement method for signature 'result_collection,matrix' exposures(x, modality, model_id) <- value ## S4 replacement method for signature 'result_model,matrix' exposures(x) <- value
x |
A |
... |
Other inputs |
result |
Name of result from result_list to assign the exposures.
Used when |
modality |
Modality to assign the exposures. Used when
|
model_id |
Model identifier to assign the exposures. Used when
|
value |
A matrix of samples by signature exposures |
A matrix of exposures
data(res) exposures(res, "result", "SBS96", "res") data(res) exposures(res, "result", "SBS96", "res") <- matrix()
data(res) exposures(res, "result", "SBS96", "res") data(res) exposures(res, "result", "SBS96", "res") <- matrix()
Extract count tables list from a musica object
extract_count_tables(musica)
extract_count_tables(musica)
musica |
A |
List of count tables objects
data(musica) extract_count_tables(musica)
data(musica) extract_count_tables(musica)
Chooses the correct function to extract variants from input based on
the class of the object or the file extension. Different types of objects
can be mixed within the list. For example, the list can include VCF files
and maf objects. Certain parameters such as id
and rename
only apply to VCF objects or files and need to be individually specified
for each VCF. Therefore, these parameters should be suppied as a vector
that is the same length as the number of inputs. If other types of
objects are in the input list, then the value of id
and rename
will be ignored for these items.
extract_variants( inputs, id = NULL, rename = NULL, sample_field = NULL, filename_as_id = FALSE, strip_extension = c(".vcf", ".vcf.gz", ".gz"), filter = TRUE, multiallele = c("expand", "exclude"), fix_vcf_errors = TRUE, extra_fields = NULL, chromosome_col = "chr", start_col = "start", end_col = "end", ref_col = "ref", alt_col = "alt", sample_col = "sample", verbose = TRUE )
extract_variants( inputs, id = NULL, rename = NULL, sample_field = NULL, filename_as_id = FALSE, strip_extension = c(".vcf", ".vcf.gz", ".gz"), filter = TRUE, multiallele = c("expand", "exclude"), fix_vcf_errors = TRUE, extra_fields = NULL, chromosome_col = "chr", start_col = "start", end_col = "end", ref_col = "ref", alt_col = "alt", sample_col = "sample", verbose = TRUE )
inputs |
A vector or list of objects or file names. Objects can be
CollapsedVCF, ExpandedVCF, MAF,
an object that inherits from |
id |
A character vector the same length as |
rename |
A character vector the same length as |
sample_field |
Some algoriths will save the name of the
sample in the ##SAMPLE portion of header in the VCF.
See |
filename_as_id |
If set to |
strip_extension |
Only used if |
filter |
Exclude variants that do not have a |
multiallele |
Multialleles are when multiple alternative variants
are listed in the same row in the vcf.
See |
fix_vcf_errors |
Attempt to automatically fix VCF file
formatting errors.
See |
extra_fields |
Optionally extract additional fields from all input
objects. Default |
chromosome_col |
The name of the column that contains the chromosome
reference for each variant. Only used if the input is a matrix or data.frame.
Default |
start_col |
The name of the column that contains the start
position for each variant. Only used if the input is a matrix or data.frame.
Default |
end_col |
The name of the column that contains the end
position for each variant. Only used if the input is a matrix or data.frame.
Default |
ref_col |
The name of the column that contains the reference
base(s) for each variant. Only used if the input is a matrix or data.frame.
Default |
alt_col |
The name of the column that contains the alternative
base(s) for each variant. Only used if the input is a matrix or data.frame.
Default |
sample_col |
The name of the column that contains the sample
id for each variant. Only used if the input is a matrix or data.frame.
Default |
verbose |
Show progress of variant extraction. Default |
Returns a data.table of variants from a vcf
# Get loations of two vcf files and a maf file luad_vcf_file <- system.file("extdata", "public_LUAD_TCGA-97-7938.vcf", package = "musicatk" ) lusc_maf_file <- system.file("extdata", "public_TCGA.LUSC.maf", package = "musicatk" ) melanoma_vcfs <- list.files(system.file("extdata", package = "musicatk"), pattern = glob2rx("*SKCM*vcf"), full.names = TRUE ) # Read all files in at once inputs <- c(luad_vcf_file, melanoma_vcfs, lusc_maf_file) variants <- extract_variants(inputs = inputs) table(variants$sample) # Run again but renaming samples in first four vcfs new_name <- c(paste0("Sample", 1:4), NA) variants <- extract_variants(inputs = inputs, rename = new_name) table(variants$sample)
# Get loations of two vcf files and a maf file luad_vcf_file <- system.file("extdata", "public_LUAD_TCGA-97-7938.vcf", package = "musicatk" ) lusc_maf_file <- system.file("extdata", "public_TCGA.LUSC.maf", package = "musicatk" ) melanoma_vcfs <- list.files(system.file("extdata", package = "musicatk"), pattern = glob2rx("*SKCM*vcf"), full.names = TRUE ) # Read all files in at once inputs <- c(luad_vcf_file, melanoma_vcfs, lusc_maf_file) variants <- extract_variants(inputs = inputs) table(variants$sample) # Run again but renaming samples in first four vcfs new_name <- c(paste0("Sample", 1:4), NA) variants <- extract_variants(inputs = inputs, rename = new_name) table(variants$sample)
Add description
extract_variants_from_maf(maf, extra_fields = NULL)
extract_variants_from_maf(maf, extra_fields = NULL)
maf |
MAF object loaded by read.maf() from the 'maftools' package |
extra_fields |
Optionally extract additional columns from the
maf object. Default |
Returns a data.table of variants from a maf which can be used to
create a musica
object.
maf_file <- system.file("extdata", "public_TCGA.LUSC.maf", package = "musicatk" ) library(maftools) maf <- read.maf(maf_file) variants <- extract_variants_from_maf(maf = maf)
maf_file <- system.file("extdata", "public_TCGA.LUSC.maf", package = "musicatk" ) library(maftools) maf <- read.maf(maf_file) variants <- extract_variants_from_maf(maf = maf)
Add Description - Aaron
extract_variants_from_maf_file(maf_file, extra_fields = NULL)
extract_variants_from_maf_file(maf_file, extra_fields = NULL)
maf_file |
Location of maf file |
extra_fields |
Optionally extract additional columns from the
object. Default |
Returns a data.table of variants from a maf
maf_file <- system.file("extdata", "public_TCGA.LUSC.maf", package = "musicatk" ) maf <- extract_variants_from_maf_file(maf_file = maf_file)
maf_file <- system.file("extdata", "public_TCGA.LUSC.maf", package = "musicatk" ) maf <- extract_variants_from_maf_file(maf_file = maf_file)
Add Description
extract_variants_from_matrix( mat, chromosome_col = "Chromosome", start_col = "Start_Position", end_col = "End_Position", ref_col = "Tumor_Seq_Allele1", alt_col = "Tumor_Seq_Allele2", sample_col = "Tumor_Sample_Barcode", extra_fields = NULL )
extract_variants_from_matrix( mat, chromosome_col = "Chromosome", start_col = "Start_Position", end_col = "End_Position", ref_col = "Tumor_Seq_Allele1", alt_col = "Tumor_Seq_Allele2", sample_col = "Tumor_Sample_Barcode", extra_fields = NULL )
mat |
An object that inherits from classes "matrix" or "data.frame" Examples include a matrix, data.frame, or data.table. |
chromosome_col |
The name of the column that contains the chromosome
reference for each variant. Default |
start_col |
The name of the column that contains the start
position for each variant. Default |
end_col |
The name of the column that contains the end
position for each variant. Default |
ref_col |
The name of the column that contains the reference
base(s) for each variant. Default |
alt_col |
The name of the column that contains the alternative
base(s) for each variant. Default |
sample_col |
The name of the column that contains the sample
id for each variant. Default |
extra_fields |
Optionally extract additional columns from the
object. Default |
Returns a data.table of variants from a maf which can be used to
create a musica
object.
maf_file <- system.file("extdata", "public_TCGA.LUSC.maf", package = "musicatk" ) library(maftools) maf <- read.maf(maf_file) variants <- extract_variants_from_maf(maf = maf) variants <- extract_variants_from_matrix( mat = variants, chromosome_col = "chr", start_col = "start", end_col = "end", ref_col = "ref", alt_col = "alt", sample_col = "sample" )
maf_file <- system.file("extdata", "public_TCGA.LUSC.maf", package = "musicatk" ) library(maftools) maf <- read.maf(maf_file) variants <- extract_variants_from_maf(maf = maf) variants <- extract_variants_from_matrix( mat = variants, chromosome_col = "chr", start_col = "start", end_col = "end", ref_col = "ref", alt_col = "alt", sample_col = "sample" )
Aaron - Need to describe differnce between ID, and name in the header, and rename in terms of naming the sample. Need to describe differences in multiallelic choices. Also need to describe the automatic error fixing
extract_variants_from_vcf( vcf, id = NULL, rename = NULL, sample_field = NULL, filter = TRUE, multiallele = c("expand", "exclude"), extra_fields = NULL )
extract_variants_from_vcf( vcf, id = NULL, rename = NULL, sample_field = NULL, filter = TRUE, multiallele = c("expand", "exclude"), extra_fields = NULL )
vcf |
Location of vcf file |
id |
ID of the sample to select from VCF. If |
rename |
Rename the sample to this value when extracting variants.
If |
sample_field |
Some algoriths will save the name of the
sample in the ##SAMPLE portion of header in the VCF (e.g.
##SAMPLE=<ID=TUMOR,SampleName=TCGA-01-0001>). If the ID is specified via the
|
filter |
Exclude variants that do not have a |
multiallele |
Multialleles are when multiple alternative variants
are listed in the same row in the vcf. One of |
extra_fields |
Optionally extract additional fields from the |
Returns a data.table of variants from a vcf
vcf_file <- system.file("extdata", "public_LUAD_TCGA-97-7938.vcf", package = "musicatk" ) library(VariantAnnotation) vcf <- readVcf(vcf_file) variants <- extract_variants_from_vcf(vcf = vcf)
vcf_file <- system.file("extdata", "public_LUAD_TCGA-97-7938.vcf", package = "musicatk" ) library(VariantAnnotation) vcf <- readVcf(vcf_file) variants <- extract_variants_from_vcf(vcf = vcf)
Add Description
extract_variants_from_vcf_file( vcf_file, id = NULL, rename = NULL, sample_field = NULL, filename_as_id = FALSE, strip_extension = c(".vcf", ".vcf.gz", ".gz"), filter = TRUE, multiallele = c("expand", "exclude"), extra_fields = NULL, fix_vcf_errors = TRUE )
extract_variants_from_vcf_file( vcf_file, id = NULL, rename = NULL, sample_field = NULL, filename_as_id = FALSE, strip_extension = c(".vcf", ".vcf.gz", ".gz"), filter = TRUE, multiallele = c("expand", "exclude"), extra_fields = NULL, fix_vcf_errors = TRUE )
vcf_file |
Path to the vcf file |
id |
ID of the sample to select from VCF. If |
rename |
Rename the sample to this value when extracting variants.
If |
sample_field |
Some algoriths will save the name of the
sample in the ##SAMPLE portion of header in the VCF (e.g.
##SAMPLE=<ID=TUMOR,SampleName=TCGA-01-0001>). If the ID is specified via the
|
filename_as_id |
If set to |
strip_extension |
Only used if |
filter |
Exclude variants that do not have a |
multiallele |
Multialleles are when multiple alternative variants
are listed in the same row in the vcf. One of |
extra_fields |
Optionally extract additional fields from the |
fix_vcf_errors |
Attempt to automatically fix VCF file formatting errors. |
Returns a data.table of variants extracted from a vcf
vcf <- system.file("extdata", "public_LUAD_TCGA-97-7938.vcf", package = "musicatk" ) variants <- extract_variants_from_vcf_file(vcf_file = vcf)
vcf <- system.file("extdata", "public_LUAD_TCGA-97-7938.vcf", package = "musicatk" ) variants <- extract_variants_from_vcf_file(vcf_file = vcf)
Generate result_grid from musica based on annotation and range of k
generate_result_grid( musica, modality, algorithm = "lda", annotation = NA, k_start, k_end, result_name = "result_grid", n_start = 1, seed = NULL, par_cores = FALSE, verbose = FALSE, make_copy = FALSE, table_name = NULL )
generate_result_grid( musica, modality, algorithm = "lda", annotation = NA, k_start, k_end, result_name = "result_grid", n_start = 1, seed = NULL, par_cores = FALSE, verbose = FALSE, make_copy = FALSE, table_name = NULL )
musica |
A |
modality |
Modality used for signature discovery |
algorithm |
Algorithm for signature discovery |
annotation |
Sample annotation to split results into |
k_start |
Lower range of number of signatures for discovery |
k_end |
Upper range of number of signatures for discovery |
result_name |
Name for result_list entry to save the results to. Default
|
n_start |
Number of times to discover signatures and compare based on posterior loglikihood |
seed |
Seed to use for reproducible results, set to null to disable |
par_cores |
Number of parallel cores to use (NMF only) |
verbose |
Whether to output loop iterations |
make_copy |
If |
table_name |
Use modality instead |
Returns nothing or a new musica
object,
depending on the make_copy
parameter.
data(musica_sbs96) grid <- generate_result_grid(musica_sbs96, "SBS96", "lda", k_start = 2, k_end = 5 )
data(musica_sbs96) grid <- generate_result_grid(musica_sbs96, "SBS96", "lda", k_start = 2, k_end = 5 )
The count table
get_count_table(count_table)
get_count_table(count_table)
count_table |
A |
The count table
modality
list contains model results for a modality
get_modality(x, ...) ## S4 method for signature 'musica' get_modality(x, result, modality) ## S4 method for signature 'result_collection' get_modality(x, modality)
get_modality(x, ...) ## S4 method for signature 'musica' get_modality(x, result, modality) ## S4 method for signature 'result_collection' get_modality(x, modality)
x |
A |
... |
Other inputs |
result |
The name of the result_list entry. |
modality |
The modality. |
A list of modality which contains result_model objects
data(res) get_modality(res, "result", "SBS96")
data(res) get_modality(res, "result", "SBS96")
Extract the result_model
object from the
musica
or result_collection
object
that contains the model.
get_model(x, ...) ## S4 method for signature 'musica' get_model(x, result, modality, model) ## S4 method for signature 'result_collection' get_model(x, modality, model)
get_model(x, ...) ## S4 method for signature 'musica' get_model(x, result, modality, model) ## S4 method for signature 'result_collection' get_model(x, modality, model)
x |
A |
... |
Other inputs |
result |
The name of the result_list entry. |
modality |
The modality. |
model |
The name of the model. |
A result_model
object
data(res) get_model(res, "result", "SBS96", "res")
data(res) get_model(res, "result", "SBS96", "res")
The result_list
contains results from various runs
get_result_list_entry(object, result_name) ## S4 method for signature 'musica,character' get_result_list_entry(object, result_name)
get_result_list_entry(object, result_name) ## S4 method for signature 'musica,character' get_result_list_entry(object, result_name)
object |
A |
result_name |
The name of the result_list entry. |
A list of results
data(res) get_result_list_entry(res, "result")
data(res) get_result_list_entry(res, "result")
The hyperparameter
contain list of prior and tuning
parameters
hyperparameter(x, ...) ## S4 method for signature 'musica' hyperparameter(x, result) ## S4 method for signature 'result_collection' hyperparameter(x) hyperparameter(x, ...) <- value ## S4 replacement method for signature 'musica,list' hyperparameter(x, result) <- value ## S4 replacement method for signature 'result_collection,list' hyperparameter(x) <- value
hyperparameter(x, ...) ## S4 method for signature 'musica' hyperparameter(x, result) ## S4 method for signature 'result_collection' hyperparameter(x) hyperparameter(x, ...) <- value ## S4 replacement method for signature 'musica,list' hyperparameter(x, result) <- value ## S4 replacement method for signature 'result_collection,list' hyperparameter(x) <- value
x |
A |
... |
Other inputs |
result |
The name of the result_list entry. |
value |
A |
A list of hyperparameters
data(res) hyperparameter(res, "result")
data(res) hyperparameter(res, "result")
A musica created for testing that includes INDEL variants
data(indel_musica)
data(indel_musica)
An object of class musica
See [create_musica_from_variants()] and [create_musica_from_counts()].
To help decide the number of cluster, three different methods are provided: total within cluster sum of squares, average silhouette coefficient, and gap statistics.
k_select( musica, model_name, modality = "SBS96", result_name = "result", method = "wss", clust.method = "kmeans", n = 10, proportional = TRUE )
k_select( musica, model_name, modality = "SBS96", result_name = "result", method = "wss", clust.method = "kmeans", n = 10, proportional = TRUE )
musica |
A |
model_name |
The name of the desired model. |
modality |
The modality of the model. Must be "SBS96", "DBS78", or
"IND83". Default |
result_name |
Name of the result list entry containing desired model.
Default |
method |
A single character string indicating which statistic to use for plot. Options are "wss" (total within cluster sum of squares), "silhouette" (average silhouette coefficient), and "gap_stat" (gap statistic). Default is "wss". |
clust.method |
A character string indicating clustering method. Options are "kmeans" (default), "hclust" (hierarchical clustering), "hkmeans", "pam", and "clara". |
n |
An integer indicating maximum number of clusters to test. Default is 10. |
proportional |
Logical, indicating if proportional exposure (default) will be used for clustering. |
A ggplot object.
data(res_annot) set.seed(123) # Make an elbow plot k_select(res_annot, model_name = "res_annot", method = "wss", n = 6) # Plot average silhouette coefficient against number of clusters k_select(res_annot, model_name = "res_annot", method = "silhouette", n = 6) # Plot gap statistics against number of clusters k_select(res_annot, model_name = "res_annot", method = "gap_stat", n = 6)
data(res_annot) set.seed(123) # Make an elbow plot k_select(res_annot, model_name = "res_annot", method = "wss", n = 6) # Plot average silhouette coefficient against number of clusters k_select(res_annot, model_name = "res_annot", method = "silhouette", n = 6) # Plot gap statistics against number of clusters k_select(res_annot, model_name = "res_annot", method = "gap_stat", n = 6)
Metrics for the model
metrics(x, ...) ## S4 method for signature 'musica' metrics(x, result, modality, model_id) ## S4 method for signature 'result_collection' metrics(x, modality, model_id) ## S4 method for signature 'result_model' metrics(x) metrics(x, ...) <- value ## S4 replacement method for signature 'musica,SimpleList' metrics(x, result, modality, model_id) <- value ## S4 replacement method for signature 'result_collection,SimpleList' metrics(x, modality, model_id) <- value ## S4 replacement method for signature 'result_model,SimpleList' metrics(x) <- value
metrics(x, ...) ## S4 method for signature 'musica' metrics(x, result, modality, model_id) ## S4 method for signature 'result_collection' metrics(x, modality, model_id) ## S4 method for signature 'result_model' metrics(x) metrics(x, ...) <- value ## S4 replacement method for signature 'musica,SimpleList' metrics(x, result, modality, model_id) <- value ## S4 replacement method for signature 'result_collection,SimpleList' metrics(x, modality, model_id) <- value ## S4 replacement method for signature 'result_model,SimpleList' metrics(x) <- value
x |
A |
... |
Other inputs |
result |
Name of result from result_list to assign the
metrics. Used when |
modality |
Modality to assign the metrics. Used when
|
model_id |
Model identifier to assign the metrics. Used when
|
value |
List of metrics |
The metrics for the model
data(res) metrics(res, "result", "SBS96", "res")
data(res) metrics(res, "result", "SBS96", "res")
The modality
modality(x, ...) ## S4 method for signature 'result_collection' modality(x, modality, model_id) ## S4 method for signature 'result_model' modality(x) modality(x, ...) <- value ## S4 replacement method for signature 'musica,matrix' modality(x, result, modality, model_id) <- value ## S4 replacement method for signature 'result_collection,matrix' modality(x, modality, model_id) <- value ## S4 replacement method for signature 'result_model,matrix' modality(x) <- value
modality(x, ...) ## S4 method for signature 'result_collection' modality(x, modality, model_id) ## S4 method for signature 'result_model' modality(x) modality(x, ...) <- value ## S4 replacement method for signature 'musica,matrix' modality(x, result, modality, model_id) <- value ## S4 replacement method for signature 'result_collection,matrix' modality(x, modality, model_id) <- value ## S4 replacement method for signature 'result_model,matrix' modality(x) <- value
x |
A |
... |
Other inputs
|
modality |
Modality to assign the modality. Used when
|
model_id |
Model identifier to assign the modality. Used when
|
value |
A modality |
result |
Name of result from result_list to assign the
modality. Used when |
The modality for the model
data(res) modality(res, "result", "SBS96", "res")
data(res) modality(res, "result", "SBS96", "res")
Model identifier
model_id(x, ...) ## S4 method for signature 'musica' model_id(x, result, modality, model_id) ## S4 method for signature 'result_collection' model_id(x, modality, model_id) ## S4 method for signature 'result_model' model_id(x) model_id(x, ...) <- value ## S4 replacement method for signature 'musica,matrix' model_id(x, result, modality, model_id) <- value ## S4 replacement method for signature 'result_collection,matrix' model_id(x, modality, model_id) <- value ## S4 replacement method for signature 'result_model,matrix' model_id(x) <- value ## S4 method for signature 'musica' modality(x, result, modality, model_id)
model_id(x, ...) ## S4 method for signature 'musica' model_id(x, result, modality, model_id) ## S4 method for signature 'result_collection' model_id(x, modality, model_id) ## S4 method for signature 'result_model' model_id(x) model_id(x, ...) <- value ## S4 replacement method for signature 'musica,matrix' model_id(x, result, modality, model_id) <- value ## S4 replacement method for signature 'result_collection,matrix' model_id(x, modality, model_id) <- value ## S4 replacement method for signature 'result_model,matrix' model_id(x) <- value ## S4 method for signature 'musica' modality(x, result, modality, model_id)
x |
A |
... |
Other inputs |
result |
Name of result from result_list to assign the
model_id. Used when |
modality |
Modality to assign the model_id. Used when
|
model_id |
Model identifier to assign the model_id. Used when
|
value |
Model identifier |
The model_id for the model
data(res) model_id(res, "result", "SBS96", "res")
data(res) model_id(res, "result", "SBS96", "res")
A musica created for testing that includes SBS variants
data(musica)
data(musica)
An object of class musica
See [create_musica_from_variants()] and [create_musica_from_counts()].
A musica created for testing that includes SBS variants and sample annotations
data(musica_annot)
data(musica_annot)
An object of class musica
See [create_musica_from_variants()] and [create_musica_from_counts()].
A musica created for testing that includes SBS variants and a build counts table for them
data(musica_sbs96)
data(musica_sbs96)
An object of class musica
See [build_standard_table()].
A very small musica created for testing that includes SBS variants and a build counts table for them
data(musica_sbs96_tiny)
data(musica_sbs96_tiny)
An object of class musica
See [build_standard_table()].
The primary object that contains variants, count_tables, and samples annotations
variants
data.table
of variants
count_tables
Summary table with per-sample unnormalized motif counts
sample_annotations
Sample-level annotations (e.g. age, sex, primary)
result_list
Results from various algorithms, modalities, and models
The musicatk Shiny app allows users to perform mutational signature analysis using an interative graphical user interface (GUI)
musicatk(include_version = TRUE, theme = "yeti")
musicatk(include_version = TRUE, theme = "yeti")
include_version |
Include the version number in the header.
Default |
theme |
The theme to use for the GUI. Default |
The shiny app will open. No data will be returned.
## Not run: # Start the app musicatk() ## End(Not run)
## Not run: # Start the app musicatk() ## End(Not run)
Rename signatures for a model
name_signatures( musica, model_id, name_vector, modality = "SBS96", result_name = "result" )
name_signatures( musica, model_id, name_vector, modality = "SBS96", result_name = "result" )
musica |
A |
model_id |
The name of the model to rename signatures for. |
name_vector |
Vector of user-defined signature names |
modality |
The modality of the model. Must be "SBS96", "DBS78", or
"IND83". Default |
result_name |
Name of the result list entry containing the model.
Default |
Musica object with user-defined signatures names
data(res) name_signatures(res, model_id = "res", name_vector = c("smoking", "apobec", "unknown") )
data(res) name_signatures(res, model_id = "res", name_vector = c("smoking", "apobec", "unknown") )
The number of signatures in a model
num_signatures(x, ...) ## S4 method for signature 'musica' num_signatures(x, result, modality, model_id) ## S4 method for signature 'result_collection' num_signatures(x, modality, model_id) ## S4 method for signature 'result_model' num_signatures(x) num_signatures(x, ...) <- value ## S4 replacement method for signature 'musica,matrix' num_signatures(x, result, modality, model_id) <- value ## S4 replacement method for signature 'result_collection,matrix' num_signatures(x, modality, model_id) <- value ## S4 replacement method for signature 'result_model,matrix' num_signatures(x) <- value
num_signatures(x, ...) ## S4 method for signature 'musica' num_signatures(x, result, modality, model_id) ## S4 method for signature 'result_collection' num_signatures(x, modality, model_id) ## S4 method for signature 'result_model' num_signatures(x) num_signatures(x, ...) <- value ## S4 replacement method for signature 'musica,matrix' num_signatures(x, result, modality, model_id) <- value ## S4 replacement method for signature 'result_collection,matrix' num_signatures(x, modality, model_id) <- value ## S4 replacement method for signature 'result_model,matrix' num_signatures(x) <- value
x |
A |
... |
Other inputs |
result |
Name of result from result_list to assign the num_signatures.
Used when |
modality |
Modality to assign the num_signatures. Used when
|
model_id |
Model identifier to assign the num_signatures. Used when
|
value |
Number of signatures in the model |
The number of signatures in a model
data(res) num_signatures(res, "result", "SBS96", "res")
data(res) num_signatures(res, "result", "SBS96", "res")
Parameters for the model
other_parameters(x, ...) ## S4 method for signature 'musica' other_parameters(x, result, modality, model_id) ## S4 method for signature 'result_collection' other_parameters(x, modality, model_id) ## S4 method for signature 'result_model' other_parameters(x) other_parameters(x, ...) <- value ## S4 replacement method for signature 'musica,matrix' other_parameters(x, result, modality, model_id) <- value ## S4 replacement method for signature 'result_collection,matrix' other_parameters(x, modality, model_id) <- value ## S4 replacement method for signature 'result_model,matrix' other_parameters(x) <- value
other_parameters(x, ...) ## S4 method for signature 'musica' other_parameters(x, result, modality, model_id) ## S4 method for signature 'result_collection' other_parameters(x, modality, model_id) ## S4 method for signature 'result_model' other_parameters(x) other_parameters(x, ...) <- value ## S4 replacement method for signature 'musica,matrix' other_parameters(x, result, modality, model_id) <- value ## S4 replacement method for signature 'result_collection,matrix' other_parameters(x, modality, model_id) <- value ## S4 replacement method for signature 'result_model,matrix' other_parameters(x) <- value
x |
A |
... |
Other inputs |
result |
Name of result from result_list to assign the other_parameters.
Used when |
modality |
Modality to assign the other_parameters. Used when
|
model_id |
Model identifier to assign the other_parameters. Used when
|
value |
List of other parameters |
The other parameters for the model
data(res) other_parameters(res, "result", "SBS96", "res")
data(res) other_parameters(res, "result", "SBS96", "res")
The parameter
contains input parameters used in the
model
parameter(x, ...) ## S4 method for signature 'musica' parameter(x, result) ## S4 method for signature 'result_collection' parameter(x) parameter(x, ...) <- value ## S4 replacement method for signature 'result_collection,list' parameter(x) <- value ## S4 replacement method for signature 'musica,list' parameter(x, result) <- value
parameter(x, ...) ## S4 method for signature 'musica' parameter(x, result) ## S4 method for signature 'result_collection' parameter(x) parameter(x, ...) <- value ## S4 replacement method for signature 'result_collection,list' parameter(x) <- value ## S4 replacement method for signature 'musica,list' parameter(x, result) <- value
x |
A |
... |
Other inputs |
result |
The name of the result_list entry. |
value |
a list of input parameters |
a list of parameters
data(res) parameter(res, "result")
data(res) parameter(res, "result")
The clustering results can be visualized on a UMAP panel. Three different types of plots can be generated using this function: cluster-by-signature plot, cluster-by-annotation plot, and a single UMAP plot.
plot_cluster( musica, model_name, modality = "SBS96", result_name = "result", clusters, group = "signature", annotation = NULL, plotly = TRUE )
plot_cluster( musica, model_name, modality = "SBS96", result_name = "result", clusters, group = "signature", annotation = NULL, plotly = TRUE )
musica |
A |
model_name |
The name of the desired model. |
modality |
The modality of the model. Must be "SBS96", "DBS78", or
"IND83". Default |
result_name |
Name of the result list entry containing desired model.
Default |
clusters |
The result generated from cluster_exposure function. |
group |
A single character string indicating the grouping factor. Possible options are: "signature" (columns are signatures in a grid), "annotation" (columns are sample annotation), and "none" (a single UMAP plot). Default is "signature". |
annotation |
Column name of annotation. |
plotly |
If TRUE, the plot will be made interactive using plotly. |
Generate a ggplot or plotly object.
set.seed(123) data(res_annot) # Get clustering result clust_out <- cluster_exposure( musica = res_annot, model_name = "res_annot", nclust = 2, iter.max = 15 ) # UMAP create_umap(musica = res_annot, model_name = "res_annot") # generate cluster X signature plot plot_cluster( musica = res_annot, model_name = "res_annot", clusters = clust_out, group = "signature" ) # generate cluster X annotation plot plot_cluster( musica = res_annot, model_name = "res_annot", clusters = clust_out, group = "annotation", annotation = "Tumor_Subtypes" ) # generate a single UMAP plot plot_cluster( musica = res_annot, model_name = "res_annot", clusters = clust_out, group = "none" )
set.seed(123) data(res_annot) # Get clustering result clust_out <- cluster_exposure( musica = res_annot, model_name = "res_annot", nclust = 2, iter.max = 15 ) # UMAP create_umap(musica = res_annot, model_name = "res_annot") # generate cluster X signature plot plot_cluster( musica = res_annot, model_name = "res_annot", clusters = clust_out, group = "signature" ) # generate cluster X annotation plot plot_cluster( musica = res_annot, model_name = "res_annot", clusters = clust_out, group = "annotation", annotation = "Tumor_Subtypes" ) # generate a single UMAP plot plot_cluster( musica = res_annot, model_name = "res_annot", clusters = clust_out, group = "none" )
plot_differential_analysis
is used to plot
differential analysis created by exposure_differential_analysis
.
plot_differential_analysis(analysis, analysis_type, samp_num)
plot_differential_analysis(analysis, analysis_type, samp_num)
analysis |
Analysis created by |
analysis_type |
Currently only |
samp_num |
Number of samples that went into the analysis |
Generates a ggplot object
data("res_annot") analysis <- exposure_differential_analysis(res_annot, model_name = "res_annot", annotation = "Tumor_Subtypes", method = "wilcox" ) plot_differential_analysis(analysis, "glm", 2)
data("res_annot") analysis <- exposure_differential_analysis(res_annot, model_name = "res_annot", annotation = "Tumor_Subtypes", method = "wilcox" ) plot_differential_analysis(analysis, "glm", 2)
The distributions of mutational signatures can be viewed
with barplots or box/violin plots. Barplots are most useful for viewing
the proportion of signatures within and across samples. The box/violin plots
are most useful for viewing the distributions of signatures with respect to
sample annotations. Samples can be grouped using the group_by
parameter. For barplots, various methods of sorting samples from left
to right can be chosen using the sort_samples
parameter.
plot_exposures( musica, model_name, modality = "SBS96", result_name = "result", plot_type = c("bar", "box", "violin"), proportional = FALSE, group_by = "none", color_by = c("signature", "annotation"), annotation = NULL, num_samples = NULL, sort_samples = "total", threshold = NULL, same_scale = FALSE, add_points = FALSE, point_size = 2, label_x_axis = FALSE, legend = TRUE, plotly = FALSE )
plot_exposures( musica, model_name, modality = "SBS96", result_name = "result", plot_type = c("bar", "box", "violin"), proportional = FALSE, group_by = "none", color_by = c("signature", "annotation"), annotation = NULL, num_samples = NULL, sort_samples = "total", threshold = NULL, same_scale = FALSE, add_points = FALSE, point_size = 2, label_x_axis = FALSE, legend = TRUE, plotly = FALSE )
musica |
A |
model_name |
The name of the desired model. |
modality |
The modality of the model. Must be "SBS96", "DBS78", or
"IND83". Default |
result_name |
Name of the result list entry containing desired model.
Default |
plot_type |
One of |
proportional |
If |
group_by |
Determines how to group samples into the subplots
(i.e. facets). One of |
color_by |
Determines how to color the bars or box/violins. One of
|
annotation |
Sample annotation used to group the subplots and/or
color the bars, boxes, or violins. Default |
num_samples |
The top number of sorted samples to display. If
|
sort_samples |
This is used to change how samples are sorted in
the barplot from left to right. If set to |
threshold |
Exposures less than this threshold will be set to 0.
This is most useful when more than one signature is supplied to
|
same_scale |
If |
add_points |
If |
point_size |
Size of the points to be plotted on top of the
violin/box plots. Only used when |
label_x_axis |
If |
legend |
If |
plotly |
If |
Generates a ggplot or plotly object
data(res_annot) plot_exposures(res_annot, model_name = "res_annot", plot_type = "bar", annotation = "Tumor_Subtypes" )
data(res_annot) plot_exposures(res_annot, model_name = "res_annot", plot_type = "bar", annotation = "Tumor_Subtypes" )
The exposures for different signatures can be
visualized using a heatmap with this function.
Heatmaps make it easier to visualize the data by
representing the magnitude of exposure values
as color in 2-dimensions. The variation in color
intensity can help see if the exposures are clustered
or how they vary over space. Exposures can be
normalized by providing the proportional
argument.
Column annotations can also be seen by passing the col_annot
argument.
plot_heatmap( musica, model_name, modality = "SBS96", result_name = "result", proportional = FALSE, show_column_names = FALSE, show_row_names = TRUE, scale = TRUE, subset_tumor = NULL, subset_signatures = NULL, annotation = NULL, ... )
plot_heatmap( musica, model_name, modality = "SBS96", result_name = "result", proportional = FALSE, show_column_names = FALSE, show_row_names = TRUE, scale = TRUE, subset_tumor = NULL, subset_signatures = NULL, annotation = NULL, ... )
musica |
A |
model_name |
The name of the desired model. |
modality |
The modality of the model. Must be "SBS96", "DBS78", or
"IND83". Default |
result_name |
Name of the result list entry containing desired model.
Default |
proportional |
If |
show_column_names |
Boolean check. If |
show_row_names |
Boolean check. If |
scale |
Boolean check. If |
subset_tumor |
Users can specify certain tumor types on which they want to subset the exposure matrix for plotting the heatmap. |
subset_signatures |
Users can specify certain signatures on which they want to subset the exposure matrix plotting the heatmap. |
annotation |
Users have the option of plotting the exposure matrix based on their given annotation like Tumor_Subtypes or age. Error given if the user given annotation doesn't exist in the res_annot annotation object. |
... |
Ellipsis used for passing any arguments directly to the ComplexHeatmap's heatmap function. |
Generates a heatmap for using the exposure matrix.
data(res_annot) plot_heatmap( musica = res_annot, model_name = "res_annot", proportional = TRUE, scale = TRUE, annotation = "Tumor_Subtypes" )
data(res_annot) plot_heatmap( musica = res_annot, model_name = "res_annot", proportional = TRUE, scale = TRUE, annotation = "Tumor_Subtypes" )
Plot the results of comparing k values
plot_k_comparison(k_comparison)
plot_k_comparison(k_comparison)
k_comparison |
data.frame with k value comparisons returned from the
|
a ggplot figure
data(musica) k_comparison <- compare_k_vals(musica, "SBS96", reps = 3, min_k = 1, max_k = 5 ) plot_k_comparison(k_comparison)
data(musica) k_comparison <- compare_k_vals(musica, "SBS96", reps = 3, min_k = 1, max_k = 5 ) plot_k_comparison(k_comparison)
Displays the proportion of counts for each mutation type across one or more samples.
plot_sample_counts( musica, sample_names, modality = "SBS96", text_size = 10, show_x_labels = TRUE, show_y_labels = TRUE, same_scale = TRUE, annotation = NULL, table_name = NULL )
plot_sample_counts( musica, sample_names, modality = "SBS96", text_size = 10, show_x_labels = TRUE, show_y_labels = TRUE, same_scale = TRUE, annotation = NULL, table_name = NULL )
musica |
A |
sample_names |
Names of the samples to plot. |
modality |
Name of table used for plotting counts. Default
|
text_size |
Size of axis text. Default |
show_x_labels |
If |
show_y_labels |
If |
same_scale |
If |
annotation |
Vector of annotations to be displayed in the top right
corner of each sample. Vector length must be equivalent to the number of
samples. Default |
table_name |
Use modality instead |
Generates a ggplot object
data(musica_sbs96) plot_sample_counts(musica_sbs96, sample_names = sample_names(musica_sbs96)[1] )
data(musica_sbs96) plot_sample_counts(musica_sbs96, sample_names = sample_names(musica_sbs96)[1] )
Displays the observed distribution of counts for each mutation type, the distribution of reconstructed counts for each mutation type using the inferred mutational signatures, and the difference between the two distributions.
plot_sample_reconstruction_error( musica, sample, model_id, modality = "SBS96", result_name = "result", plotly = FALSE )
plot_sample_reconstruction_error( musica, sample, model_id, modality = "SBS96", result_name = "result", plotly = FALSE )
musica |
A |
sample |
Name of the sample within the |
model_id |
The name of the desired model. |
modality |
The modality of the model. Must be "SBS96", "DBS78",
or "IND83". Default |
result_name |
Name of the result list entry containing desired model.
Default |
plotly |
If |
Generates a ggplot or plotly object
data(res) plot_sample_reconstruction_error(res, "TCGA-ER-A197-06A-32D-A197-08", model_id = "res")
data(res) plot_sample_reconstruction_error(res, "TCGA-ER-A197-06A-32D-A197-08", model_id = "res")
After mutational signature discovery has been performed, this function
can be used to display the distribution of each mutational signature. The
color_variable
and color_mapping
parameters can be used
to change the default color scheme of the bars.
plot_signatures( musica, model_id, modality = "SBS96", result_name = "result", color_variable = NULL, color_mapping = NULL, text_size = 10, show_x_labels = TRUE, show_y_labels = TRUE, same_scale = FALSE, y_max = NULL, annotation = NULL, percent = TRUE, plotly = FALSE )
plot_signatures( musica, model_id, modality = "SBS96", result_name = "result", color_variable = NULL, color_mapping = NULL, text_size = 10, show_x_labels = TRUE, show_y_labels = TRUE, same_scale = FALSE, y_max = NULL, annotation = NULL, percent = TRUE, plotly = FALSE )
musica |
A |
model_id |
The name of the model to plot. |
modality |
The modality of the signatures to plot. Must be
"SBS96", "DBS78", or "IND83". Default |
result_name |
Name of the result list entry containing the signatures
to plot. Default |
color_variable |
Name of the column in the variant annotation data.frame
to use for coloring the mutation type bars. The variant annotation data.frame
can be found within the count table of the |
color_mapping |
A character vector used to map items in the
|
text_size |
Size of axis text. Default |
show_x_labels |
If |
show_y_labels |
If |
same_scale |
If |
y_max |
Vector of maximum y-axis limits for each signature. One value
may also be provided to specify a constant y-axis limit for all signatures.
Vector length must be 1 or equivalent to the number of signatures. Default
|
annotation |
Vector of annotations to be displayed in the top right
corner of each signature. Vector length must be equivalent to the number of
signatures. Default |
percent |
If |
plotly |
If |
Generates a ggplot or plotly object
data(res) plot_signatures(res, model_id = "res")
data(res) plot_signatures(res, model_id = "res")
Plots samples on a UMAP scatterplot. Samples can be colored by the levels of mutational signatures or by a annotation variable.
plot_umap( musica, model_name, modality = "SBS96", result_name = "result", color_by = c("signatures", "annotation", "cluster", "none"), proportional = TRUE, annotation = NULL, point_size = 0.7, same_scale = TRUE, add_annotation_labels = FALSE, annotation_label_size = 3, annotation_text_box = TRUE, plotly = FALSE, clust = NULL, legend = TRUE, strip_axes = FALSE )
plot_umap( musica, model_name, modality = "SBS96", result_name = "result", color_by = c("signatures", "annotation", "cluster", "none"), proportional = TRUE, annotation = NULL, point_size = 0.7, same_scale = TRUE, add_annotation_labels = FALSE, annotation_label_size = 3, annotation_text_box = TRUE, plotly = FALSE, clust = NULL, legend = TRUE, strip_axes = FALSE )
musica |
A |
model_name |
The name of the desired model. |
modality |
The modality of the model. Must be "SBS96", "DBS78", or
"IND83". Default |
result_name |
Name of the result list entry containing the model.Default
|
color_by |
One of |
proportional |
If |
annotation |
Sample annotation used to color the points. One used
when |
point_size |
Scatter plot point size. Default |
same_scale |
If |
add_annotation_labels |
If |
annotation_label_size |
Size of annotation labels. Only used if
|
annotation_text_box |
Place a white box around the annotation labels
to improve readability. Only used if |
plotly |
If |
clust |
Add cluster labels as annotation |
legend |
Plot legend |
strip_axes |
Remove axes labels for cleaner looking plots |
Generates a ggplot or plotly object
See create_umap to generate a UMAP in a musica result.
data(res_annot) create_umap(res_annot, "res_annot") plot_umap(res_annot, "res_annot", color_by = "none")
data(res_annot) create_umap(res_annot, "res_annot") plot_umap(res_annot, "res_annot", color_by = "none")
Exposures for samples will be predicted using an existing set
of signatures stored in a result_model
object.
Algorithms available for prediction include a modify version of "lda"
,
and "decompTumor2Sig"
.
predict_exposure( musica, modality, signature_res, algorithm = c("lda", "decompTumor2Sig"), result_name = "result", model_id = NULL, signatures_to_use = seq_len(ncol(signatures(signature_res))), verbose = FALSE, make_copy = FALSE, table_name = NULL )
predict_exposure( musica, modality, signature_res, algorithm = c("lda", "decompTumor2Sig"), result_name = "result", model_id = NULL, signatures_to_use = seq_len(ncol(signatures(signature_res))), verbose = FALSE, make_copy = FALSE, table_name = NULL )
musica |
A |
modality |
Modality for posterior prediction. Must match the table type used to generate the prediction signatures |
signature_res |
Signatures used to predict exposures for the samples
|
algorithm |
Algorithm to use for prediction of exposures. One of
|
result_name |
Name for result_list entry to save the results to. Default
|
model_id |
Identifier for the result. If |
signatures_to_use |
Which signatures in the |
verbose |
If |
make_copy |
If |
table_name |
Use modality instead |
Returns nothing or a new musica
object,
depending on the make_copy
parameter.
data(musica) data(cosmic_v2_sigs) g <- select_genome("19") build_standard_table(musica, g, "SBS96", overwrite = TRUE) result <- predict_exposure( musica = musica, modality = "SBS96", signature_res = cosmic_v2_sigs, algorithm = "lda" ) # Predict using LDA-like algorithm with seed set to 1 set.seed(1) predict_exposure( musica = musica, modality = "SBS96", signature_res = cosmic_v2_sigs, algorithm = "lda" )
data(musica) data(cosmic_v2_sigs) g <- select_genome("19") build_standard_table(musica, g, "SBS96", overwrite = TRUE) result <- predict_exposure( musica = musica, modality = "SBS96", signature_res = cosmic_v2_sigs, algorithm = "lda" ) # Predict using LDA-like algorithm with seed set to 1 set.seed(1) predict_exposure( musica = musica, modality = "SBS96", signature_res = cosmic_v2_sigs, algorithm = "lda" )
Reverse complement of a string using biostrings
rc(dna)
rc(dna)
dna |
Input DNA string |
Returns the reverse compliment of the input DNA string
rc("ATGC")
rc("ATGC")
Supplementary data converted from bigWig to bedgraph to GRanges, with low RFD indicating the leading strand and high RFD indicating lagging strand and removing uninformative zero RFD intervals. Timing data is 10kb bins from a colon cancer sample.
data(rep_range)
data(rep_range)
An object of class "GRanges"
; see
[annotate_replication_strand()].
GEO, <https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE134225>
Sriramachandran, A. M. et al. (2020) Genome-wide Nucleotide-Resolution Mapping of DNA Replication Patterns, Single-Strand Breaks, and Lesions by GLOE-Seq. ([Molecular Cell] (doi:10.1016/j.molcel.2020.03.027))
A musica created for testing that includes SBS variants with discovered exposures and signatures
data(res)
data(res)
An object of class musica
See [discover_signatures()].
A musica created for testing that includes SBS variants with annotations and discovered exposures and signatures
data(res_annot)
data(res_annot)
An object of class musica
See [discover_signatures()].
The Result Collection object that contains modality, input parameters, prior hyperparameters
modality
a list contains model results for different modality
parameter
a list contains input parameters
hyperparameter
a list contains prior and tuning parameters
The result_list
contains results from various runs
result_list(object) ## S4 method for signature 'musica' result_list(object) result_list(musica) <- value ## S4 replacement method for signature 'musica,SimpleList' result_list(musica) <- value
result_list(object) ## S4 method for signature 'musica' result_list(object) result_list(musica) <- value ## S4 replacement method for signature 'musica,SimpleList' result_list(musica) <- value
object |
A |
musica |
A |
value |
A list of results |
A list of results
data(res) result_list(res)
data(res) result_list(res)
Object that contains results for a single model
signatures
A matrix of signatures by mutational motifs
exposures
A matrix of samples by signature weights
num_signatures
Number of signatures in the model
other_parameters
Parameters relevant to the model
credible_intervals
Credible intervals for parameters
metrics
Performance metrics for the model
umap
List of umap data.frames for plotting and analysis
model_id
Model identifier
modality
Modality of result (SBS96, DBS78, IND83)
Sample annotations can be used to store information about
each sample such as tumor type or treatment status. These are used in
downstream plotting functions such as plot_exposures
or
plot_umap
to group or color samples by a particular annotation.
samp_annot(object) ## S4 method for signature 'musica' samp_annot(object) samp_annot(object, name) <- value ## S4 replacement method for signature 'musica,character,vector' samp_annot(object, name) <- value
samp_annot(object) ## S4 method for signature 'musica' samp_annot(object) samp_annot(object, name) <- value ## S4 replacement method for signature 'musica,character,vector' samp_annot(object, name) <- value
object |
A |
name |
The name of the new annotation to add. |
value |
A vector containing the new sample annotations. Needs to be the same length as the number of samples in the object. |
A new object with the sample annotations added to the table in the
sample_annotations
slot.
See sample_names
to get a vector of sample names in
the musica
object.
data(res_annot) samp_annot(res_annot) # Add new annotation samp_annot(res_annot, "New_Annotation") <- rep(c("A", "B"), c(3, 4)) samp_annot(res_annot) data(musica) samp_annot(musica, "example") <- rep("ex", 7)
data(res_annot) samp_annot(res_annot) # Add new annotation samp_annot(res_annot, "New_Annotation") <- rep(c("A", "B"), c(3, 4)) samp_annot(res_annot) data(musica) samp_annot(musica, "example") <- rep("ex", 7)
Sample names were included in the sample
column
in the variant object passed to create_musica_from_variants
, or
in the colnames of the count table object passed to
create_musica_from_counts
. This returns a unique list of
samples names in the order they are inside the musica
object.
sample_names(object) ## S4 method for signature 'musica' sample_names(object)
sample_names(object) ## S4 method for signature 'musica' sample_names(object)
object |
A |
A character vector of sample names
data(res) sample_names(res)
data(res) sample_names(res)
Helper function to load common human or mouse genomes
select_genome(x)
select_genome(x)
x |
Select the hg19 or hg38 human genome or the mm9 or mm10 mouse genome in UCSC format |
Returns BSgenome of given version
g <- select_genome(x = "hg38")
g <- select_genome(x = "hg38")
The signatures
matrix contains the probability of
mutation motif in each sample. Rows correspond to each motif and
columns correspond to each signature.
signatures(x, ...) ## S4 method for signature 'musica' signatures(x, result, modality, model_id) ## S4 method for signature 'result_collection' signatures(x, modality, model_id) ## S4 method for signature 'result_model' signatures(x) signatures(x, ...) <- value ## S4 replacement method for signature 'musica,matrix' signatures(x, result, modality, model_id) <- value ## S4 replacement method for signature 'result_collection,matrix' signatures(x, modality, model_id) <- value ## S4 replacement method for signature 'result_model,matrix' signatures(x) <- value
signatures(x, ...) ## S4 method for signature 'musica' signatures(x, result, modality, model_id) ## S4 method for signature 'result_collection' signatures(x, modality, model_id) ## S4 method for signature 'result_model' signatures(x) signatures(x, ...) <- value ## S4 replacement method for signature 'musica,matrix' signatures(x, result, modality, model_id) <- value ## S4 replacement method for signature 'result_collection,matrix' signatures(x, modality, model_id) <- value ## S4 replacement method for signature 'result_model,matrix' signatures(x) <- value
x |
A |
... |
Other inputs |
result |
Name of result from result_list to assign the signatures.
Used when |
modality |
Modality to assign the signatures. Used when
|
model_id |
Model identifier to assign the signatures. Used when
|
value |
A matrix of motifs counts by samples |
A matrix of mutational signatures
data(res) signatures(res, "result", "SBS96", "res") data(res) signatures(res, "result", "SBS96", "res") <- matrix()
data(res) signatures(res, "result", "SBS96", "res") data(res) signatures(res, "result", "SBS96", "res") <- matrix()
Creates a new musica object subsetted to only one value of a sample annotation
subset_musica_by_annotation(musica, annot_col, annot_names)
subset_musica_by_annotation(musica, annot_col, annot_names)
musica |
A |
annot_col |
Annotation class to use for subsetting |
annot_names |
Annotational value to subset to |
Returns a new musica object with sample annotations, count tables, and variants subsetted to only contains samples of the specified annotation type
data(musica_sbs96) annot <- read.table(system.file("extdata", "sample_annotations.txt", package = "musicatk" ), sep = "\t", header = TRUE) samp_annot(musica_sbs96, "Tumor_Subtypes") <- annot$Tumor_Subtypes musica_sbs96 <- subset_musica_by_annotation( musica_sbs96, "Tumor_Subtypes", "Lung" )
data(musica_sbs96) annot <- read.table(system.file("extdata", "sample_annotations.txt", package = "musicatk" ), sep = "\t", header = TRUE) samp_annot(musica_sbs96, "Tumor_Subtypes") <- annot$Tumor_Subtypes musica_sbs96 <- subset_musica_by_annotation( musica_sbs96, "Tumor_Subtypes", "Lung" )
Creates a new musica subsetted to only samples with enough variants
subset_musica_by_counts(musica, table_name, num_counts)
subset_musica_by_counts(musica, table_name, num_counts)
musica |
A |
table_name |
Name of table used for subsetting |
num_counts |
Minimum sum count value to drop samples |
Returns a new musica object with sample annotations, count tables, and variants subsetted to only contains samples with the specified minimum number of counts (column sums) in the specified table
data(musica_sbs96) subset_musica_by_counts(musica_sbs96, "SBS96", 20)
data(musica_sbs96) subset_musica_by_counts(musica_sbs96, "SBS96", 20)
Subsets a variant table based on Variant Type
subset_variant_by_type(tab, type)
subset_variant_by_type(tab, type)
tab |
Input variant table |
type |
Variant type to return e.g. "SBS", "INS", "DEL", "DBS" |
Returns the input variant table subsetted to only contain variants of the specified variant type
data(musica) annotate_variant_type(musica) subset_variant_by_type(variants(musica), "SBS")
data(musica) annotate_variant_type(musica) subset_variant_by_type(variants(musica), "SBS")
Return sample from musica_variant object
subset_variants_by_samples(musica, sample_name)
subset_variants_by_samples(musica, sample_name)
musica |
A |
sample_name |
Sample name to subset by |
Returns sample data.frame subset to a single sample
data(musica) subset_variants_by_samples(musica, "TCGA-94-7557-01A-11D-2122-08")
data(musica) subset_variants_by_samples(musica, "TCGA-94-7557-01A-11D-2122-08")
The table name
table_selected(result) ## S4 method for signature 'result_model' table_selected(result)
table_selected(result) ## S4 method for signature 'result_model' table_selected(result)
result |
A |
Table name used for plotting
data(res) model <- get_model(res, "result", "SBS96", "res") table_selected(model)
data(res) model <- get_model(res, "result", "SBS96", "res") table_selected(model)
The count_tables
contains standard and/or custom
count tables created from variants
tables(object) ## S4 method for signature 'musica' tables(object) tables(musica) <- value ## S4 replacement method for signature 'musica,list' tables(musica) <- value
tables(object) ## S4 method for signature 'musica' tables(object) tables(musica) <- value ## S4 replacement method for signature 'musica,list' tables(musica) <- value
object |
A |
musica |
A |
value |
A list of |
A list of count_tables
data(res) tables(res)
data(res) tables(res)
The umap dataframes for the model
umap(x, ...) ## S4 method for signature 'musica' umap(x, result, modality, model_id) ## S4 method for signature 'result_collection' umap(x, modality, model_id) ## S4 method for signature 'result_model' umap(x) umap(x, ...) <- value ## S4 replacement method for signature 'musica,matrix' umap(x, result, modality, model_id) <- value ## S4 replacement method for signature 'result_collection,matrix' umap(x, modality, model_id) <- value ## S4 replacement method for signature 'result_model,matrix' umap(x) <- value
umap(x, ...) ## S4 method for signature 'musica' umap(x, result, modality, model_id) ## S4 method for signature 'result_collection' umap(x, modality, model_id) ## S4 method for signature 'result_model' umap(x) umap(x, ...) <- value ## S4 replacement method for signature 'musica,matrix' umap(x, result, modality, model_id) <- value ## S4 replacement method for signature 'result_collection,matrix' umap(x, modality, model_id) <- value ## S4 replacement method for signature 'result_model,matrix' umap(x) <- value
x |
A |
... |
Other inputs |
result |
Name of result from result_list to assign the
umap. Used when |
modality |
Modality to assign the umap. Used when
|
model_id |
Model identifier to assign the umap. Used when
|
value |
A list of umap dataframes |
A list of umap dataframes
data(res) umap(res, "result", "SBS96", "res")
data(res) umap(res, "result", "SBS96", "res")
The variants
data.table
contains the variants
and variant-level annotations
variants(object) ## S4 method for signature 'musica' variants(object) variants(musica) <- value ## S4 replacement method for signature 'musica,data.table' variants(musica) <- value
variants(object) ## S4 method for signature 'musica' variants(object) variants(musica) <- value ## S4 replacement method for signature 'musica,data.table' variants(musica) <- value
object |
A |
musica |
A |
value |
A |
A data.table of variants
data(res) variants(res)
data(res) variants(res)