lineagespot is a framework written in R that identifies SARS-CoV-2-related mutations from a single variant file, or a list of variant files, in Variant Call Format (VCF). The method facilitates the detection of SARS-CoV-2 lineages in wastewater samples sequenced with next-generation sequencing, and attempts to infer the potential distribution of those lineages.
lineagespot is distributed as a Bioconductor package. It requires R (version 4.1 or later), which can be installed on any operating system from CRAN, and Bioconductor (version 3.14 or later).
To install the lineagespot package, enter the following commands in your R session:
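A minimal sketch of the installation, assuming the standard BiocManager route used for Bioconductor packages (the exact commands intended by the authors may differ):

```r
# Install BiocManager from CRAN if it is not already available
if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
}

# Install lineagespot from Bioconductor
BiocManager::install("lineagespot")
```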
Example FASTQ files are provided through Zenodo. The bioinformatics analysis pipeline used to preprocess them is provided here.
Once lineagespot is successfully installed, it can be loaded as follows:
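Loading uses the usual `library()` call; this sketch assumes the package was installed into the default library path:

```r
# Attach the package so that its functions are available in the session
library(lineagespot)
```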
lineagespot can be run by calling a single function that implements the overall pipeline:
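A hedged sketch of such a call, run on the example data bundled with the package. The argument names (`vcf_folder`, `gff3_path`, `ref_folder`) and the `extdata` paths are assumptions based on the package documentation and should be checked against `?lineagespot` for the installed version:

```r
library(lineagespot)

# Assumed locations of the example inputs shipped with the package
vcf_dir <- system.file("extdata", "vcf-files", package = "lineagespot")
gff3    <- system.file("extdata", "NC_045512.2_annot.gff3", package = "lineagespot")
ref_dir <- system.file("extdata", "ref", package = "lineagespot")

# Run the whole pipeline; argument names are assumptions (see ?lineagespot)
results <- lineagespot(
    vcf_folder = vcf_dir,
    gff3_path  = gff3,
    ref_folder = ref_dir
)

# The result is expected to hold the three tables described in this document
names(results)
```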
The function returns three tables:
# overall table
head(results$variants.table)
#> CHROM POS ID REF ALT DP AD_ref
#> <char> <num> <char> <char> <char> <int> <num>
#> 1: NC_045512.2 328 NC_045512.2;328;ACA;ACCA ACA ACCA 36 34
#> 2: NC_045512.2 355 NC_045512.2;355;C;T C T 42 41
#> 3: NC_045512.2 366 NC_045512.2;366;C;T C T 42 28
#> 4: NC_045512.2 401 NC_045512.2;401;CTTAA;CTAA CTTAA CTAA 37 35
#> 5: NC_045512.2 406 NC_045512.2;406;AGA;AA AGA AA 35 34
#> 6: NC_045512.2 421 NC_045512.2;421;C;A C A 35 34
#> AD_alt Gene_Name Nt_alt AA_alt AF codon_num sample
#> <num> <char> <char> <char> <num> <num> <char>
#> 1: 1 ORF1a 64dupC Q22fs 0.02777778 21 SampleA_freebayes_ann
#> 2: 1 ORF1a 90C>T G30G 0.02380952 30 SampleA_freebayes_ann
#> 3: 14 ORF1a 101C>T S34F 0.33333333 34 SampleA_freebayes_ann
#> 4: 2 ORF1a 138delT D48fs 0.05405405 46 SampleA_freebayes_ann
#> 5: 1 ORF1a 142delG D48fs 0.02857143 47 SampleA_freebayes_ann
#> 6: 1 ORF1a 156C>A G52G 0.02857143 52 SampleA_freebayes_ann
# lineages' hits
head(results$lineage.hits)
#> Gene_Name AA_alt sample DP AD_alt AF lineage
#> <char> <char> <char> <num> <num> <num> <char>
#> 1: M I82T SampleC_freebayes_ann 3984 2770 0.6952811 AY.1
#> 2: N D63G SampleC_freebayes_ann 2180 787 0.3610092 AY.1
#> 3: N R203M SampleC_freebayes_ann 4147 4125 0.9946950 AY.1
#> 4: N G215C SampleC_freebayes_ann 4477 2574 0.5749386 AY.1
#> 5: N D377Y SampleC_freebayes_ann 4271 1623 0.3800047 AY.1
#> 6: ORF1a A1306S SampleC_freebayes_ann 2202 1267 0.5753860 AY.1
# lineagespot report
head(results$lineage.report)
#> lineage sample meanAF meanAF_uniq minAF_uniq_nonzero
#> <char> <char> <num> <num> <num>
#> 1: AY.1 SampleA_freebayes_ann 0.08333333 0.0000000 NA
#> 2: AY.1 SampleB_freebayes_ann 0.08333333 0.0000000 NA
#> 3: AY.1 SampleC_freebayes_ann 0.43162568 0.0000000 NA
#> 4: AY.2 SampleA_freebayes_ann 0.07692308 0.0000000 NA
#> 5: AY.2 SampleB_freebayes_ann 0.07692308 0.0000000 NA
#> 6: AY.2 SampleC_freebayes_ann 0.33117826 0.1198191 0.1594335
#> N lineage N. rules lineage prop.
#> <int> <int> <num>
#> 1: 1 31 0.03225806
#> 2: 1 31 0.03225806
#> 3: 6 31 0.19354839
#> 4: 1 29 0.03448276
#> 5: 1 29 0.03448276
#> 6: 4 29 0.13793103
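As a reading aid for the tables above, the printed numbers are consistent with two simple ratios: `AF` equals `AD_alt / DP` in the variants table, and `lineage prop.` equals `N / lineage N. rules` in the report (reading the wrapped header as the three columns `N`, `lineage N. rules` and `lineage prop.`). This interpretation is inferred from the printed values, not stated by the package. The sketch below checks it on a few rows copied from the output; the variable names are illustrative only.

```r
# Rows 1-3 of results$variants.table, copied from the printed output
dp     <- c(36, 42, 42)
ad_alt <- c(1, 1, 14)
af     <- c(0.02777778, 0.02380952, 0.33333333)

# AF matches the fraction of reads supporting the alternative allele
af_check <- ad_alt / dp
stopifnot(isTRUE(all.equal(af_check, af, tolerance = 1e-6)))

# Rows 1, 3 and 6 of results$lineage.report, copied from the printed output
n_hits  <- c(1, 6, 4)
n_rules <- c(31, 31, 29)
prop    <- c(0.03225806, 0.19354839, 0.13793103)

# The lineage proportion matches detected hits over the number of lineage rules
prop_check <- n_hits / n_rules
stopifnot(isTRUE(all.equal(prop_check, prop, tolerance = 1e-6)))
```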
Here is the output of sessionInfo() on the system on which this document was compiled, running pandoc 3.2.1:
#> R version 4.4.1 (2024-06-14)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.1 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: Etc/UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] lineagespot_1.11.0 RefManageR_1.4.0 BiocStyle_2.35.0
#>
#> loaded via a namespace (and not attached):
#> [1] KEGGREST_1.45.1 SummarizedExperiment_1.35.5
#> [3] rjson_0.2.23 xfun_0.48
#> [5] bslib_0.8.0 Biobase_2.67.0
#> [7] lattice_0.22-6 vctrs_0.6.5
#> [9] tools_4.4.1 bitops_1.0-9
#> [11] generics_0.1.3 curl_5.2.3
#> [13] stats4_4.4.1 parallel_4.4.1
#> [15] AnnotationDbi_1.69.0 RSQLite_2.3.7
#> [17] blob_1.2.4 Matrix_1.7-1
#> [19] BSgenome_1.75.0 data.table_1.16.2
#> [21] S4Vectors_0.43.2 lifecycle_1.0.4
#> [23] GenomeInfoDbData_1.2.13 compiler_4.4.1
#> [25] stringr_1.5.1 Rsamtools_2.21.2
#> [27] Biostrings_2.75.0 codetools_0.2-20
#> [29] GenomeInfoDb_1.41.2 htmltools_0.5.8.1
#> [31] sys_3.4.3 buildtools_1.0.0
#> [33] sass_0.4.9 RCurl_1.98-1.16
#> [35] yaml_2.3.10 crayon_1.5.3
#> [37] jquerylib_0.1.4 BiocParallel_1.41.0
#> [39] cachem_1.1.0 DelayedArray_0.33.1
#> [41] abind_1.4-8 digest_0.6.37
#> [43] stringi_1.8.4 restfulr_0.0.15
#> [45] VariantAnnotation_1.51.2 maketools_1.3.1
#> [47] bibtex_0.5.1 fastmap_1.2.0
#> [49] grid_4.4.1 cli_3.6.3
#> [51] SparseArray_1.5.45 magrittr_2.0.3
#> [53] S4Arrays_1.5.11 GenomicFeatures_1.57.1
#> [55] XML_3.99-0.17 UCSC.utils_1.1.0
#> [57] backports_1.5.0 bit64_4.5.2
#> [59] lubridate_1.9.3 timechange_0.3.0
#> [61] rmarkdown_2.28 XVector_0.45.0
#> [63] httr_1.4.7 matrixStats_1.4.1
#> [65] bit_4.5.0 png_0.1-8
#> [67] memoise_2.0.1 evaluate_1.0.1
#> [69] knitr_1.48 BiocIO_1.17.0
#> [71] GenomicRanges_1.57.2 IRanges_2.39.2
#> [73] rtracklayer_1.65.0 rlang_1.1.4
#> [75] Rcpp_1.0.13 glue_1.8.0
#> [77] DBI_1.2.3 BiocManager_1.30.25
#> [79] xml2_1.3.6 BiocGenerics_0.53.0
#> [81] jsonlite_1.8.9 R6_2.5.1
#> [83] plyr_1.8.9 GenomicAlignments_1.41.0
#> [85] MatrixGenerics_1.17.1 zlibbioc_1.51.2