This package contains unCOVERApp, a Shiny graphical application for the clinical assessment of sequence coverage. unCOVERApp allows users:
- to display interactive plots of gene sequence coverage down to base-pair resolution, together with functional and clinical annotations of the sequence positions within coverage gaps (Coverage Analysis page);
- to calculate the maximum credible population allele frequency (AF) to be applied as an AF filtering threshold tailored to the model of the disease of interest, instead of a general AF cut-off (e.g. 1 % or 0.1 %) (Calculate AF by allele frequency app page);
- to calculate the 95 % probability of the binomial distribution to observe at least N variant-supporting reads (N is the number of successes), based on a user-defined allele fraction expected for the variant (the probability of success). This is especially useful to obtain the range of variant-supporting reads most likely to occur at a given depth of coverage (DoC, the number of trials) for somatic variants with low allele fraction (Binomial distribution page).
Install the latest version of uncoverappLib using BiocManager.
uncoverappLib requires:
Alternatively, it can be installed from GitHub using:
When users load uncoverappLib for the first time, the first thing to do is to download the annotation files. The getAnnotationFiles() function downloads the annotation files from Zenodo and parses them for use by the uncoverappLib package. The function does not return an R object; instead, it stores the annotation files (sorted.bed.gz and sorted.bed.gz.tbi) in a cache and shows the cache path. The local cache is managed by the BiocFileCache Bioconductor package. It is sufficient to run getAnnotationFiles(verbose= TRUE) once after installing the uncoverappLib package. Because preprocessing can take a few minutes, users running the vignette can pass vignette= TRUE as a parameter to download example annotation files instead.
All unCOVERApp functionalities rely on a BED-style input file containing, for each position, tab-separated genomic coordinates (chromosome, start position, end position), the coverage value and the counts of each observed nucleotide (in base:count format, separated by semicolons, e.g. A:2;G:71). In the first page, Preprocessing, users can prepare this input file by specifying the genes to be examined and the BAM file(s) to be inspected. Users should provide:
- the list of genes to be examined, in the Load input file box. An example file is included in the extdata folder of the uncoverappLib package;
- the list of BAM file(s) to be inspected, in the Load bam file(s) list box.
Type the following command to load our example:
bam_example <- system.file("extdata", "example_POLG.bam", package = "uncoverappLib")
print(bam_example)
#> [1] "/tmp/RtmpFWFDat/Rinst49572dd0891/uncoverappLib/extdata/example_POLG.bam"
write.table(bam_example, file= "./bam.list", quote= FALSE, row.names = FALSE,
col.names = FALSE)
Then launch the app with the run.uncoverapp(where="browser") command.
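The commands above only prepare the BAM list; the Load input file box also needs a gene list. The package ships an example file in extdata, but a minimal equivalent can also be written by hand. The sketch below assumes, based on the example in this vignette, that the gene list is a plain-text file with one HGNC gene symbol per line; mygene.txt is the file name used later on the Preprocessing page.

```r
# Assumption: one HGNC gene symbol per line, no header
genes <- c("POLG")
writeLines(genes, con = "mygene.txt")

# Check the file content
readLines("mygene.txt")
#> [1] "POLG"
```

More genes can be added to the genes vector; each one ends up on its own line.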
After running run.uncoverapp(where="browser"), the Shiny app appears in the default browser. RStudio users can choose where to launch uncoverapp with the where option:
- browser opens uncoverapp in the default browser;
- viewer opens uncoverapp in the RStudio viewer;
- window opens uncoverapp in an RStudio window.
If the where option is not defined, uncoverapp launches with the default option of R.
In the first page, Preprocessing, users can load mygene.txt in Load input file and bam.list in Load bam file(s) list. More generally, a target BED can be used instead of gene names by selecting the Target Bed option in Choose the type of your input file. Users should also specify the reference genome in the Genome box and the chromosome notation of their BAM file(s) in the Chromosome Notation box. The number option refers to the 1, 2, …, X, M chromosome notation in the BAM file, while the chr option refers to the chr1, chr2, …, chrX, chrM notation. Users can specify the minimum mapping quality (MAPQ) value and the minimum base quality (QUAL) value in the corresponding boxes. The default value for both mapping and base quality is 1. Users can download the Statistical_Summary report to obtain coverage metrics per gene (List of genes name) or per amplicon (Target Bed), according to the uploaded input file. The report summarizes the following information: mean, median, number of positions under 20x and percentage of positions above 20x.
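The Statistical_Summary metrics are per-gene aggregates of the per-position depth. The following sketch recomputes them on made-up depths, assuming a gene label is available for each position and reading "under 20x" as depth < 20 and "above 20x" as depth >= 20; the app's exact boundary handling may differ, and the gene names are hypothetical.

```r
# Made-up per-position depths for two hypothetical genes (not real data)
depths <- data.frame(
  gene  = c(rep("GENE_A", 5), rep("GENE_B", 4)),
  depth = c(10, 25, 30, 5, 40, 22, 18, 60, 20)
)

# Per-gene metrics; "under" as < threshold and "above" as >= threshold
# is an assumption about the boundary
summarize_gene <- function(d, threshold = 20) {
  data.frame(
    mean            = mean(d),
    median          = median(d),
    n_under         = sum(d < threshold),
    pct_at_or_above = 100 * mean(d >= threshold)
  )
}

# One row of metrics per gene
per_gene <- lapply(split(depths$depth, depths$gene), summarize_gene)
report <- do.call(rbind, per_gene)
rownames(report) <- names(per_gene)
report
```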
To run the example, choose the chr chromosome notation and the hg19 reference genome, and leave the minimum mapping and base qualities at their default settings, as shown in the following screenshot of the Preprocessing page:
unCOVERApp input file generation fails if incorrect gene names are specified; in such a case, a table of unrecognized gene name(s) is displayed. Below is a snippet of the unCOVERApp input file generated by the preprocessing step performed for the example:
chr15 89859516 89859516 68 A:68
chr15 89859517 89859517 70 T:70
chr15 89859518 89859518 73 A:2;G:71
chr15 89859519 89859519 73 A:73
chr15 89859520 89859520 74 C:74
chr15 89859521 89859521 75 C:1;T:74
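To make the layout concrete, a minimal sketch reads the six lines above into R and splits the last column into per-base read counts. The column names (chrom, start, end, coverage, counts) and the helper parse_counts are our own, hypothetical choices, not part of uncoverappLib. In this snippet the base counts at each position add up to the coverage value.

```r
# The six example lines shown above, tab-separated
lines <- c(
  "chr15\t89859516\t89859516\t68\tA:68",
  "chr15\t89859517\t89859517\t70\tT:70",
  "chr15\t89859518\t89859518\t73\tA:2;G:71",
  "chr15\t89859519\t89859519\t73\tA:73",
  "chr15\t89859520\t89859520\t74\tC:74",
  "chr15\t89859521\t89859521\t75\tC:1;T:74"
)
bed <- read.table(text = lines, sep = "\t", header = FALSE,
                  stringsAsFactors = FALSE,
                  col.names = c("chrom", "start", "end", "coverage", "counts"))

# Hypothetical helper: turn "A:2;G:71" into c(A = 2L, G = 71L)
parse_counts <- function(x) {
  pairs <- strsplit(strsplit(x, ";", fixed = TRUE)[[1]], ":", fixed = TRUE)
  setNames(as.integer(vapply(pairs, `[`, character(1), 2)),
           vapply(pairs, `[`, character(1), 1))
}

# Total reads over all bases, and fraction carried by the second most
# frequent base (0 when only one base is observed)
bed$count_total <- vapply(bed$counts, function(x) sum(parse_counts(x)), integer(1))
bed$minor_fraction <- vapply(bed$counts, function(x) {
  n <- sort(parse_counts(x), decreasing = TRUE)
  if (length(n) < 2) 0 else n[[2]] / sum(n)
}, numeric(1))
bed
```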
The preprocessing time depends on the size of the BAM file(s) and on the number of genes to investigate. In general, if many (e.g. > 50) genes are to be analyzed, we recommend running the buildInput function in the R console before launching the app. This function also returns a file with .txt extension containing the statistical report of each gene/amplicon. Alternatively, other tools (for instance bedtools, samtools or GATK) do a similar job and can be used to generate the unCOVERApp input file. In this case, users can load the file directly on the Coverage Analysis page in the Select input file box.
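When the input file is produced with another tool, it should follow the same layout as the snippet above. A minimal sketch, assuming the per-position depth and base counts have already been computed elsewhere (the data frame below is made up and the output file name my_input.bed is hypothetical), writes them as a tab-separated file without header, quotes or row names. As in the example, start and end both equal the position, which differs from the zero-based, half-open start used by standard BED files.

```r
# Made-up per-position results from an external tool (not real data)
pileup <- data.frame(
  chrom  = c("chr15", "chr15"),
  pos    = c(89859516L, 89859517L),
  depth  = c(68L, 70L),
  counts = c("A:68", "T:70"),
  stringsAsFactors = FALSE
)

# Same column order as the example: chrom, start, end, coverage, counts,
# with start and end both set to the position
out <- pileup[, c("chrom", "pos", "pos", "depth", "counts")]

out_file <- file.path(tempdir(), "my_input.bed")
write.table(out, file = out_file, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
readLines(out_file)
```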
Once preprocessing is done, users can move to the Coverage Analysis page and push the load prepared input file button.
To assess the sequence coverage of the example, the following input parameters must be specified in the sidebar of the Coverage Analysis section:
- Reference Genome: reference genome (hg19 or hg38); choose hg19
- Gene name: write the HGNC official gene name POLG and push the Apply button
- Coverage threshold: specify the coverage threshold (e.g. 20x)
- Sample: sample name to be analyzed
- Transcript number: transcript number; choose 1
- exon number: to zoom in on a specific exon; choose 10
Other input sections, such as Chromosome, Transcript ID, START genomic position, END genomic position and Region coordinate, are filled dynamically.
unCOVERApp generates the following outputs:
- the unfiltered BED file in bed file and the corresponding filtered dataset in Low-coverage positions;
- information about the POLG gene in the UCSC gene table;
- the UCSC exons table;
- Gene coverage. The plot displays the chromosome ideogram, the genomic location, the gene annotations from Ensembl and the transcript(s) annotation from UCSC. Processing takes a few minutes. A related table shows the number of uncovered positions in each exon given a user-defined transcript number (here transcript number 1) and a user-defined coverage threshold (here 20x). Both the table and the plot show many genomic positions with a low-DoC profile in POLG;
- Exon coverage: push the Make exon button and view the plot in Exon coverage. Processing takes a few minutes. A related table shows the number of low-DoC positions that have a high-impact annotation in ClinVar. For this output to be generated, sorted.bed.gz and sorted.bed.gz.tbi must have been downloaded with the getAnnotationFiles() function. Both the table and the plot show that 21 low-DoC genomic positions have a ClinVar annotation, suggesting that several clinically relevant positions are not adequately represented in this experiment. Users can zoom in to base-pair level by choosing a small interval (20-30 bp) in Region coordinates and moving on to Zoom to sequence;
- Annotations on low-coverage positions. Functional and clinical annotations of all potential non-synonymous single-nucleotide variants across the examined low-DoC sites are made available. Potential changes that have a clinical annotation, a high impact or a deleterious prediction are highlighted in yellow. In the example, a low-DoC site (chr15:89868687) is predicted as pathogenic and could potentially be linked to disease. By clicking on the download button, users can save the table in spreadsheet format, with certain cells colored according to pre-specified thresholds for AF, CADD, M-CAP, SIFT, Polyphen2, ClinVar, OMIM ID, HGVSp and HGVSc, …
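The Low-coverage positions dataset and the per-exon counts of uncovered positions boil down to a threshold filter on the depth column. The sketch below illustrates the idea on made-up depths and two hypothetical exon intervals, and also merges consecutive low-coverage positions into gaps; it is an illustration of the concept, not the app's own code.

```r
# Made-up per-position depth on one chromosome (not real data)
pos_depth <- data.frame(
  pos   = 101:115,
  depth = c(30, 25, 12, 8, 15, 40, 42, 19, 50, 55, 60, 5, 3, 22, 21)
)
threshold <- 20

# Filtered dataset: positions with depth below the threshold
low <- pos_depth[pos_depth$depth < threshold, ]

# Merge consecutive low-coverage positions into gaps (start, end)
run_id <- cumsum(c(1, diff(low$pos) != 1))
gaps <- data.frame(start = as.vector(tapply(low$pos, run_id, min)),
                   end   = as.vector(tapply(low$pos, run_id, max)))

# Count low-coverage positions inside each hypothetical exon interval
exons <- data.frame(exon = 1:2, start = c(101, 109), end = c(106, 115))
exons$n_low <- mapply(function(s, e) sum(low$pos >= s & low$pos <= e),
                      exons$start, exons$end)
gaps
exons
```

Position 108 is below the threshold but lies between the two exons, so it appears as a gap without being counted in either exon.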
On the Calculate maximum credible allele frequency page, users can set allele frequency cut-offs based on specific assumptions about the genetic architecture of the disease. If no cut-off is specified, variants with allele frequency > 5 % are filtered out instead. More details are available here. Moreover, users can click on the "download" button and save the resulting table in spreadsheet format.
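The default rule above amounts to discarding variants whose population AF exceeds 0.05. A minimal sketch on a made-up annotation table follows; the cutoff argument can be replaced by a disease-specific maximum credible AF, and keeping variants with a missing AF is our assumption, not documented app behavior. The function name filter_af is hypothetical.

```r
# Made-up annotation table (not real data)
ann <- data.frame(variant = c("v1", "v2", "v3", "v4"),
                  AF = c(0.0001, 0.02, 0.08, NA),
                  stringsAsFactors = FALSE)

# Keep variants with AF <= cutoff; variants without an AF are kept
# (an assumption)
filter_af <- function(tab, cutoff = 0.05) {
  tab[is.na(tab$AF) | tab$AF <= cutoff, ]
}

filter_af(ann)                 # default 5 % cut-off: drops v3
filter_af(ann, cutoff = 4e-5)  # stricter cut-off; illustrative value only
```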
The Binomial distribution page returns the 95 % binomial probability distribution of the variant-supporting reads at the input genomic position (Genomic position). Users should define the expected allele fraction (the expected fraction of variant reads, i.e. the probability of success) and Variant reads (the minimum number of variant reads required by the user to support variant calling, i.e. the number of successes). The color of the comment changes according to the binomial proportion intervals: if the estimated interval, with 95 % confidence, includes or is higher than the user-defined Variant reads, the comment appears blue; if it is lower, the comment appears red.
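The quantities behind the Binomial distribution page can be reproduced with base R's qbinom and pbinom. The sketch below uses illustrative values; the colour rule is one literal reading of the description above (blue when the required Variant reads does not exceed the upper bound of the 95 % interval) and may not match the app's implementation exactly.

```r
# Illustrative values: depth of coverage (number of trials), expected allele
# fraction (probability of success) and required variant reads
doc <- 100
af <- 0.5
variant_reads <- 45

# Central 95 % range of variant-supporting reads under Binomial(doc, af)
interval <- qbinom(c(0.025, 0.975), size = doc, prob = af)

# Probability of observing at least `variant_reads` supporting reads
p_at_least <- pbinom(variant_reads - 1, size = doc, prob = af,
                     lower.tail = FALSE)

# Assumed colour rule: blue if the interval contains or lies above the
# required number of reads, red if it lies entirely below it
comment_colour <- if (variant_reads <= interval[2]) "blue" else "red"
interval
p_at_least
comment_colour
```

With these values the 95 % range is 40 to 60 reads, so requiring 45 variant reads gives a blue comment. For a low-fraction somatic variant, lowering af (e.g. to 0.05) shows how quickly the expected range shrinks at the same depth.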
sessionInfo()
#> R version 4.4.1 (2024-06-14)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.1 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: Etc/UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] uncoverappLib_1.17.0 BiocStyle_2.35.0
#>
#> loaded via a namespace (and not attached):
#> [1] later_1.3.2
#> [2] BiocIO_1.17.0
#> [3] bitops_1.0-9
#> [4] filelock_1.0.3
#> [5] tibble_3.2.1
#> [6] graph_1.85.0
#> [7] XML_3.99-0.17
#> [8] rpart_4.1.23
#> [9] lifecycle_1.0.4
#> [10] httr2_1.0.5
#> [11] Homo.sapiens_1.3.1
#> [12] processx_3.8.4
#> [13] lattice_0.22-6
#> [14] ensembldb_2.31.0
#> [15] OrganismDbi_1.49.0
#> [16] backports_1.5.0
#> [17] magrittr_2.0.3
#> [18] openxlsx_4.2.7.1
#> [19] Hmisc_5.2-0
#> [20] sass_0.4.9
#> [21] rmarkdown_2.28
#> [22] jquerylib_0.1.4
#> [23] yaml_2.3.10
#> [24] rlist_0.4.6.2
#> [25] shinyBS_0.61.1
#> [26] httpuv_1.6.15
#> [27] zip_2.3.1
#> [28] Gviz_1.51.0
#> [29] DBI_1.2.3
#> [30] buildtools_1.0.0
#> [31] RColorBrewer_1.1-3
#> [32] abind_1.4-8
#> [33] zlibbioc_1.52.0
#> [34] GenomicRanges_1.59.0
#> [35] AnnotationFilter_1.31.0
#> [36] biovizBase_1.55.0
#> [37] BiocGenerics_0.53.1
#> [38] RCurl_1.98-1.16
#> [39] nnet_7.3-19
#> [40] VariantAnnotation_1.52.0
#> [41] rappdirs_0.3.3
#> [42] GenomeInfoDbData_1.2.13
#> [43] IRanges_2.41.0
#> [44] S4Vectors_0.44.0
#> [45] maketools_1.3.1
#> [46] condformat_0.10.1
#> [47] codetools_0.2-20
#> [48] DelayedArray_0.33.1
#> [49] DT_0.33
#> [50] xml2_1.3.6
#> [51] tidyselect_1.2.1
#> [52] UCSC.utils_1.2.0
#> [53] shinyWidgets_0.8.7
#> [54] matrixStats_1.4.1
#> [55] stats4_4.4.1
#> [56] BiocFileCache_2.15.0
#> [57] base64enc_0.1-3
#> [58] GenomicAlignments_1.43.0
#> [59] jsonlite_1.8.9
#> [60] Formula_1.2-5
#> [61] tools_4.4.1
#> [62] progress_1.2.3
#> [63] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2
#> [64] Rcpp_1.0.13
#> [65] glue_1.8.0
#> [66] gridExtra_2.3
#> [67] SparseArray_1.6.0
#> [68] xfun_0.48
#> [69] MatrixGenerics_1.19.0
#> [70] GenomeInfoDb_1.43.0
#> [71] dplyr_1.1.4
#> [72] BiocManager_1.30.25
#> [73] fastmap_1.2.0
#> [74] latticeExtra_0.6-30
#> [75] fansi_1.0.6
#> [76] shinyjs_2.1.0
#> [77] digest_0.6.37
#> [78] R6_2.5.1
#> [79] mime_0.12
#> [80] colorspace_2.1-1
#> [81] GO.db_3.20.0
#> [82] jpeg_0.1-10
#> [83] dichromat_2.0-0.1
#> [84] markdown_1.13
#> [85] biomaRt_2.63.0
#> [86] RSQLite_2.3.7
#> [87] utf8_1.2.4
#> [88] generics_0.1.3
#> [89] data.table_1.16.2
#> [90] rtracklayer_1.66.0
#> [91] prettyunits_1.2.0
#> [92] httr_1.4.7
#> [93] htmlwidgets_1.6.4
#> [94] S4Arrays_1.6.0
#> [95] pkgconfig_2.0.3
#> [96] gtable_0.3.6
#> [97] blob_1.2.4
#> [98] XVector_0.46.0
#> [99] sys_3.4.3
#> [100] htmltools_0.5.8.1
#> [101] RBGL_1.82.0
#> [102] ProtGenerics_1.39.0
#> [103] scales_1.3.0
#> [104] Biobase_2.67.0
#> [105] TxDb.Hsapiens.UCSC.hg38.knownGene_3.20.0
#> [106] png_0.1-8
#> [107] EnsDb.Hsapiens.v75_2.99.0
#> [108] knitr_1.48
#> [109] rstudioapi_0.17.1
#> [110] rjson_0.2.23
#> [111] checkmate_2.3.2
#> [112] curl_5.2.3
#> [113] org.Hs.eg.db_3.20.0
#> [114] cachem_1.1.0
#> [115] stringr_1.5.1
#> [116] shinycssloaders_1.1.0
#> [117] parallel_4.4.1
#> [118] foreign_0.8-87
#> [119] AnnotationDbi_1.69.0
#> [120] restfulr_0.0.15
#> [121] pillar_1.9.0
#> [122] grid_4.4.1
#> [123] vctrs_0.6.5
#> [124] promises_1.3.0
#> [125] dbplyr_2.5.0
#> [126] EnsDb.Hsapiens.v86_2.99.0
#> [127] xtable_1.8-4
#> [128] cluster_2.1.6
#> [129] htmlTable_2.4.3
#> [130] evaluate_1.0.1
#> [131] GenomicFeatures_1.59.0
#> [132] cli_3.6.3
#> [133] compiler_4.4.1
#> [134] Rsamtools_2.22.0
#> [135] rlang_1.1.4
#> [136] crayon_1.5.3
#> [137] interp_1.1-6
#> [138] ps_1.8.1
#> [139] stringi_1.8.4
#> [140] deldir_2.0-4
#> [141] BiocParallel_1.41.0
#> [142] txdbmaker_1.2.0
#> [143] munsell_0.5.1
#> [144] Biostrings_2.75.0
#> [145] lazyeval_0.2.2
#> [146] Matrix_1.7-1
#> [147] BSgenome_1.75.0
#> [148] hms_1.1.3
#> [149] bit64_4.5.2
#> [150] ggplot2_3.5.1
#> [151] KEGGREST_1.47.0
#> [152] shiny_1.9.1
#> [153] highr_0.11
#> [154] SummarizedExperiment_1.36.0
#> [155] memoise_2.0.1
#> [156] bslib_0.8.0
#> [157] bit_4.5.0