Title: | An R Package for Estimating Copy Number Levels of Viral Genome Segments Using Base-Resolution Read Depth Profile |
---|---|
Description: | Base-resolution copy number analysis of viral genome. Utilizes base-resolution read depth data over viral genome to find copy number segments with two-dimensional segmentation approach. Provides publish-ready figures, including histograms of read depths, coverage line plots over viral genome annotated with copy number change events and viral genes, and heatmaps showing multiple types of data with integrative clustering of samples. |
Authors: | Hyo Young Choi [aut, cph] |
Maintainer: | Jin-Young Lee <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.99.11 |
Built: | 2025-02-22 03:28:22 UTC |
Source: | https://github.com/bioc/ELViS |
Convert coordinate string to grng object
coord_to_grng(coord)
coord_to_grng(coord)
coord |
string in the form of "chr1:123-456" or "chr1:1,234-5,678,912" |
GRanges object corresponding to the input coordinate string
coord_to_grng("chr1:123-456") coord_to_grng("chr1:1,234-5,678,912")
coord_to_grng("chr1:123-456") coord_to_grng("chr1:1,234-5,678,912")
Convert coordinate string to list of chr,start and end
coord_to_lst(coord)
coord_to_lst(coord)
coord |
string in the form of "chr1:123-456" or "chr1:1,234-5,678,912" |
a list of 3 elements. Chromosome name, start position and end position.
coord_to_lst("chr1:123-456") coord_to_lst("chr1:1,234-5,678,912")
coord_to_lst("chr1:123-456") coord_to_lst("chr1:1,234-5,678,912")
Sample filtering threshold examination plot.
depth_hist(mtrx, th = 50, title_txt = NULL, smry_fun = max, ...)
depth_hist(mtrx, th = 50, title_txt = NULL, smry_fun = max, ...)
mtrx |
Matrix or data.frame. Rows are positions and columns are samples. |
th |
Numeric. Sample filtering threshold |
title_txt |
figure title. |
smry_fun |
function to calculate summary metric to apply sample filter threshold to |
... |
additional argument for |
ggplot2 object
data(mtrx_samtools_reticulate) th <- 50 depth_hist(mtrx_samtools_reticulate,th=th,smry_fun=max) depth_hist(mtrx_samtools_reticulate,th=th,smry_fun = quantile,prob=0.95) depth_hist(mtrx_samtools_reticulate,th=th,smry_fun = mean)
data(mtrx_samtools_reticulate) th <- 50 depth_hist(mtrx_samtools_reticulate,th=th,smry_fun=max) depth_hist(mtrx_samtools_reticulate,th=th,smry_fun = quantile,prob=0.95) depth_hist(mtrx_samtools_reticulate,th=th,smry_fun = mean)
Base-resolution copy number analysis of viral genome. Utilizes base-resolution read depth data over viral genome to find copy number segments with two-dimensional segmentation approach. Provides publish-ready figures, including histograms of read depths, coverage line plots over viral genome annotated with copy number change events and viral genes, and heatmaps showing multiple types of data with integrative clustering of samples.
get_depth_matrix : Generate a read depth matrix of positions x samples from input BAM files list.
run_ELViS : Run ELViS using input raw depth matrix.
raw depth matrix.
Maintainer: Jin-Young Lee [email protected] (ORCID) [copyright holder]
Authors:
Hyo Young Choi [email protected] (ORCID) [copyright holder]
Katherine A. Hoadley [email protected] (ORCID)
D. Neil Hayes [email protected] (ORCID) [funder, copyright holder]
Other contributors:
Xiaobei Zhao [email protected] (ORCID) [contributor]
Jeremiah R. Holt [email protected] (ORCID) [contributor]
Useful links:
List containing ELViS run result
data(ELViS_toy_run_result)
data(ELViS_toy_run_result)
ELViS_toy_run_result
A list of
Indicates whether this results is a reduced form
Final base-resolution segmentation output
Indices of samples in which copy number variants were detected
Normalized read depth
Filtering samples based on summary statistic
filt_samples(mtrx, th = 50, smry_fun = max)
filt_samples(mtrx, th = 50, smry_fun = max)
mtrx |
matrix or data.frame. Rows are positions and columns are samples. |
th |
Sample filtering threshold (Default : 50) |
smry_fun |
function to generate summary value of samples, which is used for filtering. (Default : |
matrix or data.frame. data matrix with low depth samples filtered out.
data(mtrx_samtools_reticulate) th<-50 filtered_mtrx <- filt_samples(mtrx_samtools_reticulate,th=th,smry_fun=max)
data(mtrx_samtools_reticulate) th<-50 filtered_mtrx <- filt_samples(mtrx_samtools_reticulate,th=th,smry_fun=max)
Gene Copy Number Heatmap
gene_cn_heatmaps( X_raw, result, gff3_fn, gene_ref, baseline = 1, exclude_genes, col_cn = colorRamp2(c(0.5, 1, 1.5), c(muted("blue"), "white", muted("red"))), heatmap_height = unit(1.5, "in") )
gene_cn_heatmaps( X_raw, result, gff3_fn, gene_ref, baseline = 1, exclude_genes, col_cn = colorRamp2(c(0.5, 1, 1.5), c(muted("blue"), "white", muted("red"))), heatmap_height = unit(1.5, "in") )
X_raw |
Raw depth matrix |
result |
Run result |
gff3_fn |
gene annotation file name |
gene_ref |
The name of the gene to set as reference for relative gene dosage heatmap |
baseline |
Vector of state numbers to use as baseline for each sample. If it is single integer, then the given state number is used for all samples. (Default : |
exclude_genes |
name of genes to exclude from the annotation track (Default : NULL) |
col_cn |
relative gene dosage color palette. (Default : |
heatmap_height |
heatmap height specified using unit function. (Default : |
a ComplexHeatmap Heatmap List object
# gff3 gene model file package_name <- "ELViS" gff3_fn <- system.file("extdata","HPV16REF_PaVE.gff",package = package_name) # loading precalculated depth matrix data(mtrx_samtools_reticulate) # threshold th <- 50 # filtered matrix base_resol_depth <- filt_samples(mtrx_samtools_reticulate,th=th,smry_fun=max) # viral load data data(total_aligned_base__host_and_virus) viral_load <- (10^6)*(apply(base_resol_depth,2,\(x) sum(x)) )/total_aligned_base__host_and_virus # load ELViS run result data(ELViS_toy_run_result) result <- ELViS_toy_run_result # genes to exclude from plotting exclude_genes <- c("E6*","E1^E4","E8^E2") # heatmap of gene dosage gene_ref<-"E7" gene_cn <- gene_cn_heatmaps( X_raw = base_resol_depth, result = result, gff3_fn = gff3_fn, baseline = 1, gene_ref = gene_ref, exclude_genes = exclude_genes ) gene_cn
# gff3 gene model file package_name <- "ELViS" gff3_fn <- system.file("extdata","HPV16REF_PaVE.gff",package = package_name) # loading precalculated depth matrix data(mtrx_samtools_reticulate) # threshold th <- 50 # filtered matrix base_resol_depth <- filt_samples(mtrx_samtools_reticulate,th=th,smry_fun=max) # viral load data data(total_aligned_base__host_and_virus) viral_load <- (10^6)*(apply(base_resol_depth,2,\(x) sum(x)) )/total_aligned_base__host_and_virus # load ELViS run result data(ELViS_toy_run_result) result <- ELViS_toy_run_result # genes to exclude from plotting exclude_genes <- c("E6*","E1^E4","E8^E2") # heatmap of gene dosage gene_ref<-"E7" gene_cn <- gene_cn_heatmaps( X_raw = base_resol_depth, result = result, gff3_fn = gff3_fn, baseline = 1, gene_ref = gene_ref, exclude_genes = exclude_genes ) gene_cn
Generate a read depth matrix of positions x samples from input BAM files list
get_depth_matrix( bam_files, mode = "samtools_basilisk", coord_or_target_virus_name, is_virus = TRUE, N_cores = detectCores(), min_mapq = 30, min_base_quality = 0, max_depth = 1e+05, modules = NULL, envs = NULL, tmpdir = tempdir(), samtools = NULL, condaenv = "env_samtools", condaenv_samtools_version = "1.21" )
get_depth_matrix( bam_files, mode = "samtools_basilisk", coord_or_target_virus_name, is_virus = TRUE, N_cores = detectCores(), min_mapq = 30, min_base_quality = 0, max_depth = 1e+05, modules = NULL, envs = NULL, tmpdir = tempdir(), samtools = NULL, condaenv = "env_samtools", condaenv_samtools_version = "1.21" )
bam_files |
Vector containing bam file names in character |
mode |
Mode of read depth calculation. Either of |
coord_or_target_virus_name |
The name of the target virus. This should be equal to the name of the sequence in the FASTA file reads are aligned to. |
is_virus |
logical indicating if the coord_or_target_virus_name is for viral genome(TRUE) or non-viral genome(FALSE) (default : TRUE) |
N_cores |
Number of cores to use for parallel processing (Default : min(10,available cores)) |
min_mapq |
Minimum MAPQ. (Default : 30) |
min_base_quality |
Minimum basecall quality score (Default : 0) |
max_depth |
(Rsamtools) Maximum read depth. (Default : 1e5) |
modules |
(samtools) Environment modulefile name. (Default : NULL) |
envs |
(samtools) Environmental variables for samtools. (Default : NULL) |
tmpdir |
(samtools) Temporary file directory (Default : |
samtools |
(samtools) Absolute path to samtools executable (Default : NULL) |
condaenv |
(samtools_basilisk) Name of the conda environment in which samtools are installed. If no environment with this name is available, one will be created. (Default : |
condaenv_samtools_version |
(samtools_basilisk) The version of samtools to install in the conda environment using basilisk (Default : "1.21") |
a matrix of positions x samples containing base-resolution raw read depth
package_name <- "ELViS" # The name of the target virus # in the reference sequence FASTA file used for alignment. # Can be check by samtools view -H input.bam target_virus_name <- "gi|333031|lcl|HPV16REF.1|" # get bam file pathes ext_path <- system.file("extdata",package = package_name) bam_files <- list.files(ext_path,full.names = TRUE,pattern = "bam$") # number of threads to use N_cores <- 1L # get read depth matrix tmpdir <- tempdir() mtrx_samtools_basilisk <- get_depth_matrix( bam_files = bam_files,coord_or_target_virus_name = target_virus_name,is_virus = TRUE ,mode = "samtools_basilisk" ,N_cores = N_cores ,min_mapq = 30 ,tmpdir=tempdir() ,condaenv = "env_samtools" )
package_name <- "ELViS" # The name of the target virus # in the reference sequence FASTA file used for alignment. # Can be check by samtools view -H input.bam target_virus_name <- "gi|333031|lcl|HPV16REF.1|" # get bam file pathes ext_path <- system.file("extdata",package = package_name) bam_files <- list.files(ext_path,full.names = TRUE,pattern = "bam$") # number of threads to use N_cores <- 1L # get read depth matrix tmpdir <- tempdir() mtrx_samtools_basilisk <- get_depth_matrix( bam_files = bam_files,coord_or_target_virus_name = target_virus_name,is_virus = TRUE ,mode = "samtools_basilisk" ,N_cores = N_cores ,min_mapq = 30 ,tmpdir=tempdir() ,condaenv = "env_samtools" )
Get new baselines according to criteria user designates
get_new_baseline(result, mode = "longest")
get_new_baseline(result, mode = "longest")
result |
Run result |
mode |
Indicate how new baseline should be set ("longest","shortest") |
a integer vector indicating new baseline index for each sample
# its usage example is given in vignette in detail data(ELViS_toy_run_result) result <- ELViS_toy_run_result get_new_baseline(result,mode="longest")
# its usage example is given in vignette in detail data(ELViS_toy_run_result) result <- ELViS_toy_run_result get_new_baseline(result,mode="longest")
Plot heatmaps based on simple integrative clustering of multiple matrices
integrative_heatmap( X_raw, result, gff3_fn, exclude_genes, col_pal_gene = col_yarrr_info2, col_cn = colorRamp2(c(0.5, 1, 1.5), c(muted("blue"), "white", muted("red"))), col_y = colorRamp2(c(0.5, 1, 2), c(muted("blue"), "white", muted("red"))), col_z = colorRamp2(c(-4, 0, 4), c(muted("blue"), "white", muted("red"))), col_x_scaled = "auto", col_vl = "auto", baseline = 1, matrices_to_plot = "all", matrices_integ_cluster = "all", total_aligned_base__host_and_virus = NULL, return_data_matrices = FALSE )
integrative_heatmap( X_raw, result, gff3_fn, exclude_genes, col_pal_gene = col_yarrr_info2, col_cn = colorRamp2(c(0.5, 1, 1.5), c(muted("blue"), "white", muted("red"))), col_y = colorRamp2(c(0.5, 1, 2), c(muted("blue"), "white", muted("red"))), col_z = colorRamp2(c(-4, 0, 4), c(muted("blue"), "white", muted("red"))), col_x_scaled = "auto", col_vl = "auto", baseline = 1, matrices_to_plot = "all", matrices_integ_cluster = "all", total_aligned_base__host_and_virus = NULL, return_data_matrices = FALSE )
X_raw |
Raw depth matrix |
result |
Run result |
gff3_fn |
gene annotation file name |
exclude_genes |
name of genes to exclude from the annotation track (Default : NULL) |
col_pal_gene |
color palette for gene colors |
col_cn |
Color scheme for copy number heatmap (Default : |
col_y |
Color scheme for normalized read depth(Y) heatmap (Default : |
col_z |
Color scheme for Z-score heatmap (Default : |
col_x_scaled |
Color scheme for scaled raw depth(X) heatmap (Default : |
col_vl |
Color scheme for positional viral load heatmap (Default : |
baseline |
Vector of state numbers to use as baseline for each sample. If it is single integer, then the given state number is used for all samples. (Default : |
matrices_to_plot |
Names and orders of the matrices to show as heatmap. Any permutation of |
matrices_integ_cluster |
Names of the matrices to be used for integrative clustering for column orders. Any combination of |
total_aligned_base__host_and_virus |
Total aligned bases for each sample(i.e. from picard,gatk,qualimap). Used to calculate positional load of viral DNA. Makes sense if regions in host genome are also included in the target panel. Ignored if set to NULL. (Default : NULL) |
return_data_matrices |
boolean whether to return the data matrices used. (Default : |
A ComplexHeatmap Heatmap List object vertically stacked
# gff3 gene model file package_name <- "ELViS" gff3_fn <- system.file("extdata","HPV16REF_PaVE.gff",package = package_name) # loading precalculated depth matrix data(mtrx_samtools_reticulate) # threshold th <- 50 # filtered matrix base_resol_depth <- filt_samples(mtrx_samtools_reticulate,th=th,smry_fun=max) # viral load data data(total_aligned_base__host_and_virus) viral_load <- (10^6)*(apply(base_resol_depth,2,\(x) sum(x)) )/total_aligned_base__host_and_virus # load ELViS run result data(ELViS_toy_run_result) result <- ELViS_toy_run_result # genes to exclude from plotting exclude_genes <- c("E6*","E1^E4","E8^E2") # heatmap based on integrative clustering integ_ht_result <- integrative_heatmap( X_raw = base_resol_depth, result = result, gff3_fn = gff3_fn, exclude_genes = exclude_genes, baseline=1, total_aligned_base__host_and_virus = total_aligned_base__host_and_virus ) integ_ht_result
# gff3 gene model file package_name <- "ELViS" gff3_fn <- system.file("extdata","HPV16REF_PaVE.gff",package = package_name) # loading precalculated depth matrix data(mtrx_samtools_reticulate) # threshold th <- 50 # filtered matrix base_resol_depth <- filt_samples(mtrx_samtools_reticulate,th=th,smry_fun=max) # viral load data data(total_aligned_base__host_and_virus) viral_load <- (10^6)*(apply(base_resol_depth,2,\(x) sum(x)) )/total_aligned_base__host_and_virus # load ELViS run result data(ELViS_toy_run_result) result <- ELViS_toy_run_result # genes to exclude from plotting exclude_genes <- c("E6*","E1^E4","E8^E2") # heatmap based on integrative clustering integ_ht_result <- integrative_heatmap( X_raw = base_resol_depth, result = result, gff3_fn = gff3_fn, exclude_genes = exclude_genes, baseline=1, total_aligned_base__host_and_virus = total_aligned_base__host_and_virus ) integ_ht_result
Base-resolution raw read depth profile over viral genome
data(mtrx_samtools_reticulate)
data(mtrx_samtools_reticulate)
mtrx_samtools_reticulate
A matrix with 7906 rows and 120 columns
Normalization - scaling by certain quantile
norm_fun(x, probs = 0.75)
norm_fun(x, probs = 0.75)
x |
numeric vector to normalize. |
probs |
a single numeric value of probabilities in |
numeric vector of normalized values
norm_fun(seq_len(5)) # [1] 0.25 0.50 0.75 1.00 1.25
norm_fun(seq_len(5)) # [1] 0.25 0.50 0.75 1.00 1.25
Get a list of pile-up plots over multiple samples
plot_pileUp_multisample( result, X_raw, target_indices = NULL, plot_target = "x", gff3_fn, baseline = 1, annot_margin = 0.01, arrow_spacing = 0.05, geme_name_space = 0.5, col_pal = col_yarrr_info2, col_cn_baseline = "#708C98", col_pal_cn = col_yarrr_info2[-5], exclude_genes = NULL, annot_plot_ratio = 0.3 )
plot_pileUp_multisample( result, X_raw, target_indices = NULL, plot_target = "x", gff3_fn, baseline = 1, annot_margin = 0.01, arrow_spacing = 0.05, geme_name_space = 0.5, col_pal = col_yarrr_info2, col_cn_baseline = "#708C98", col_pal_cn = col_yarrr_info2[-5], exclude_genes = NULL, annot_plot_ratio = 0.3 )
result |
analysis result |
X_raw |
input raw depth matrix |
target_indices |
sample indices to plot |
plot_target |
target data type to plot (Default : |
gff3_fn |
gene annotation file name |
baseline |
the state index to set as baseline (Default : |
annot_margin |
minimum of margin between gene annotations allowed. As a fraction of plotting area. (Default : |
arrow_spacing |
gene annotation arrow spacing. As a fraction of plotting area. (Default : |
geme_name_space |
the height of white space reserved for gene names in the annotation. (Default : |
col_pal |
gene color palette |
col_cn_baseline |
color for baseline (Default : |
col_pal_cn |
color palette for non-baseline copy number states |
exclude_genes |
name of genes to exclude from the annotation track (Default : NULL) |
annot_plot_ratio |
ratio of the annotation plot under the pileup plot |
a list of pile-up ggplot object
# gff3 gene model file package_name <- "ELViS" gff3_fn <- system.file("extdata","HPV16REF_PaVE.gff",package = package_name) # loading precalculated depth matrix data(mtrx_samtools_reticulate) # threshold th <- 50 # filtered matrix base_resol_depth <- filt_samples(mtrx_samtools_reticulate,th=th,smry_fun=max) data(ELViS_toy_run_result) result <- ELViS_toy_run_result # get line plots for shape-change samples gg_lst_x <- plot_pileUp_multisample( result = result, X_raw = base_resol_depth, plot_target = "x", gff3 = gff3_fn, baseline=1, exclude_genes = c("E6*","E1^E4","E8^E2"), target_indices = result$final_call$cnv_samples[seq_len(3)] ) gg_lst_x[[1]]
# gff3 gene model file package_name <- "ELViS" gff3_fn <- system.file("extdata","HPV16REF_PaVE.gff",package = package_name) # loading precalculated depth matrix data(mtrx_samtools_reticulate) # threshold th <- 50 # filtered matrix base_resol_depth <- filt_samples(mtrx_samtools_reticulate,th=th,smry_fun=max) data(ELViS_toy_run_result) result <- ELViS_toy_run_result # get line plots for shape-change samples gg_lst_x <- plot_pileUp_multisample( result = result, X_raw = base_resol_depth, plot_target = "x", gff3 = gff3_fn, baseline=1, exclude_genes = c("E6*","E1^E4","E8^E2"), target_indices = result$final_call$cnv_samples[seq_len(3)] ) gg_lst_x[[1]]
Run ELViS using input raw depth matrix
run_ELViS( X, N_cores = min(10L, detectCores()), reduced_output = TRUE, verbose = FALSE, save_intermediate_data = FALSE, save_dir = "save_dir", overwrite = FALSE )
run_ELViS( X, N_cores = min(10L, detectCores()), reduced_output = TRUE, verbose = FALSE, save_intermediate_data = FALSE, save_dir = "save_dir", overwrite = FALSE )
X |
Raw depth matrix of position x samples |
N_cores |
The number of cores to use (Default : |
reduced_output |
logical indicating whether to return only reduced output |
verbose |
logical whether to print out information for debugging |
save_intermediate_data |
logical indicating whether to save intermediate data as rds file. (default : FALSE) |
save_dir |
Name of the directory to save intermediate files in. (default : "save_dir") |
overwrite |
logical indicating whether to overwrite intermediate files. (default : FALSE) |
list containing ELViS run results
data(mtrx_samtools_reticulate) th<-50 filtered_mtrx <- filt_samples(mtrx_samtools_reticulate,th=th,smry_fun=max) result <- run_ELViS(filtered_mtrx[,seq_len(5)],N_cores=1L)
data(mtrx_samtools_reticulate) th<-50 filtered_mtrx <- filt_samples(mtrx_samtools_reticulate,th=th,smry_fun=max) result <- run_ELViS(filtered_mtrx[,seq_len(5)],N_cores=1L)
Total aligned base both to host and viral genome.
data(total_aligned_base__host_and_virus)
data(total_aligned_base__host_and_virus)
total_aligned_base__host_and_virus
A vector of length 120
Metadata of samples in the toy examples
data(toy_example)
data(toy_example)
toy_example
A data frame with 120 rows and 6 columns:
Variant type. Set to control if there is no variant
The number of copies duplicated or deleted
Variant size
Mean read depth