Title: | Detection of SARS-CoV-2 lineages in wastewater samples using next-generation sequencing |
---|---|
Description: | Lineagespot is a framework written in R that aims to identify SARS-CoV-2 related mutations from one or more variant files in Variant Call Format (VCF). The method can facilitate the detection of SARS-CoV-2 lineages in wastewater samples using next-generation sequencing, and attempts to infer the potential distribution of those lineages. |
Authors: | Nikolaos Pechlivanis [aut, cre] , Maria Tsagiopoulou [aut], Maria Christina Maniou [aut], Anastasis Togkousidis [aut], Evangelia Mouchtaropoulou [aut], Taxiarchis Chassalevris [aut], Serafeim Chaintoutis [aut], Chrysostomos Dovas [aut], Maria Petala [aut], Margaritis Kostoglou [aut], Thodoris Karapantsios [aut], Stamatia Laidou [aut], Elisavet Vlachonikola [aut], Aspasia Orfanou [aut], Styliani-Christina Fragkouli [aut], Sofoklis Keisaris [aut], Anastasia Chatzidimitriou [aut], Agis Papadopoulos [aut], Nikolaos Papaioannou [aut], Anagnostis Argiriou [aut], Fotis E. Psomopoulos [aut] |
Maintainer: | Nikolaos Pechlivanis <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.11.0 |
Built: | 2024-10-30 08:37:53 UTC |
Source: | https://github.com/bioc/lineagespot |
Retrieve information about lineages' variants via outbreak.info's API
get_lineage_report( lineages, base.url = "https://api.outbreak.info/genomics/lineage-mutations?pangolin_lineage=" )
lineages | A character vector containing the names of the lineages of interest |
base.url | The base API URL used to search for lineage reports. The default value is "https://api.outbreak.info/genomics/lineage-mutations?pangolin_lineage=" |
A list of data table elements of lineage reports
get_lineage_report(lineages = c("B.1.1.7", "B.1.617.2"))
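To make the role of base.url concrete, the sketch below builds one request URL per lineage by appending the URL-encoded lineage name to the default base.url. This is only an assumption about the request shape, inferred from the default argument; it is not the package's implementation, and build_lineage_urls is a hypothetical helper.

```r
# Hypothetical helper, not part of lineagespot: shows how a base URL and
# a vector of lineage names could be combined into query URLs.
build_lineage_urls <- function(
    lineages,
    base.url = "https://api.outbreak.info/genomics/lineage-mutations?pangolin_lineage=") {
    stopifnot(is.character(lineages), length(lineages) > 0)
    # Encode reserved characters so each name is safe inside a query string
    encoded <- vapply(lineages, utils::URLencode, character(1), reserved = TRUE)
    urls <- paste0(base.url, encoded)
    names(urls) <- lineages
    urls
}

urls <- build_lineage_urls(c("B.1.1.7", "B.1.617.2"))
print(urls)
```

Passing a different base.url would redirect the same queries to another endpoint, which is presumably why the argument is exposed.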
Identify whether a file is in GFF3 format.
is_gff3(file)
file | Path to GFF3 file. |
TRUE if the input file is in GFF3 format, FALSE otherwise.
gff3_path <- system.file("extdata", "NC_045512.2_annot.gff3",
    package = "lineagespot"
)
is_gff3(gff3_path)
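For readers curious what such a format check can look like, here is a minimal sketch that tests for the `##gff-version 3` pragma, which the GFF3 specification requires on the first line of a file. It is an illustration under that assumption only; looks_like_gff3 is a hypothetical helper and the package's is_gff3 may apply different or additional checks.

```r
# Hypothetical helper, not the package's is_gff3().
looks_like_gff3 <- function(file) {
    if (!file.exists(file) || dir.exists(file)) {
        return(FALSE)
    }
    first_line <- readLines(file, n = 1, warn = FALSE)
    # An empty file yields character(0), so check the length first
    length(first_line) == 1 && grepl("^##gff-version\\s+3", first_line)
}

# Toy files written to a temporary location
good <- tempfile(fileext = ".gff3")
writeLines(
    c(
        "##gff-version 3",
        "NC_045512.2\tRefSeq\tgene\t266\t21555\t.\t+\t.\tID=gene1;Name=ORF1ab"
    ),
    good
)
bad <- tempfile(fileext = ".txt")
writeLines("not a gff file", bad)

looks_like_gff3(good)
looks_like_gff3(bad)
```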
Identify SARS-CoV-2 related mutations based on one or more variant (VCF) files
lineagespot( vcf_fls = NULL, vcf_folder = NULL, gff3_path = NULL, ref_folder = NULL, voc = c("B.1.617.2", "B.1.1.7", "B.1.351", "P.1"), AF_threshold = 0.8 )
vcf_fls | A character vector of paths to VCF files |
vcf_folder | A path to a folder containing all VCF files that will be integrated into a single table |
gff3_path | Path to GFF3 file containing SARS-CoV-2 gene coordinates. |
ref_folder | A path to a folder containing lineage reports |
voc | A character vector containing the names of the lineages of interest |
AF_threshold | The allele frequency (AF) threshold used to identify variants per sample |
A list of three elements:
Variants' table; A data table containing all variants that are included in the input VCF files
Lineage hits; A data table containing identified hits between the input variants and outbreak.info's lineage reports
Lineage report; A data table with computed metrics about the prevalence of the lineage of interest per sample.
results <- lineagespot(
    vcf_folder = system.file("extdata", "vcf-files",
        package = "lineagespot"
    ),
    gff3_path = system.file("extdata", "NC_045512.2_annot.gff3",
        package = "lineagespot"
    ),
    ref_folder = system.file("extdata", "ref",
        package = "lineagespot"
    )
)
head(results$lineage.report)
Find overlapping variants with SARS-CoV-2 reference lineages coming from outbreak.info reports
lineagespot_hits( vcf_table = NULL, ref_folder = NULL, voc = c("B.1.617.2", "B.1.1.7", "B.1.351", "P.1") )
vcf_table | A tab-delimited table containing all variants for all samples. This input is generated by the |
ref_folder | A path to lineages' reports |
voc | A character vector containing the names of the lineages of interest |
A data table containing all identified SARS-CoV-2 variants based on the provided reference files
variants_table <- merge_vcf(
    vcf_folder = system.file("extdata", "vcf-files",
        package = "lineagespot"
    ),
    gff3_path = system.file("extdata", "NC_045512.2_annot.gff3",
        package = "lineagespot"
    )
)

# retrieve lineage reports using outbreak.info's API

# use user-specified references
lineage_hits_table <- lineagespot_hits(
    vcf_table = variants_table,
    ref_folder = system.file("extdata", "ref",
        package = "lineagespot"
    )
)
Check the validity of the input parameters of the lineagespot function.
list_input(vcf_fls = NULL, vcf_folder = NULL, gff3_path = NULL)
vcf_fls | A character vector of paths to VCF files. |
vcf_folder | A path to a folder containing all VCF files that will be integrated into a single table. |
gff3_path | Path to GFF3 file containing SARS-CoV-2 gene coordinates. |
Return a character vector of paths to VCF files.
vcflist <- list_input( vcf_folder = system.file("extdata", "vcf-files", package = "lineagespot" ), gff3_path = system.file("extdata", "NC_045512.2_annot.gff3", package = "lineagespot" ) )
Identify VCF files from a group of files.
list_vcf(vcf_fls = NULL, vcf_folder = NULL, gff3_path = NULL)
vcf_fls | A character vector of paths to VCF files |
vcf_folder | A path to a folder containing all VCF files that will be integrated into a single table |
gff3_path | Path to GFF3 file containing SARS-CoV-2 gene coordinates. |
VCF list; A list where only VCF files are stored.
list_vcf_info <- list_vcf(
    vcf_folder = system.file("extdata", "vcf-files",
        package = "lineagespot"
    ),
    gff3_path = system.file("extdata", "NC_045512.2_annot.gff3",
        package = "lineagespot"
    )
)
print(list_vcf_info)
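One simple way to pick VCF files out of a mixed set of paths is to filter on the file extension. The sketch below does exactly that with base R; it is an assumption about a plausible approach, not the package's list_vcf logic. keep_vcf_files is a hypothetical helper, and accepting compressed `.vcf.gz` files is likewise an assumption of this sketch.

```r
# Hypothetical helper, not the package's list_vcf(): keeps only paths
# whose extension is .vcf or .vcf.gz (case-insensitive).
keep_vcf_files <- function(paths) {
    paths[grepl("\\.vcf(\\.gz)?$", paths, ignore.case = TRUE)]
}

# Invented file names for illustration
candidates <- c(
    "sampleA.vcf", "sampleB.vcf.gz",
    "notes.txt", "NC_045512.2_annot.gff3"
)
vcf_only <- keep_vcf_files(candidates)
print(vcf_only)
```

An extension filter is cheap but says nothing about file contents; a stricter check would also read the header line of each file.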
Merge Variant Calling Format (VCF) files into a single tab-delimited table
merge_vcf(vcf_fls = NULL, vcf_folder = NULL, gff3_path = NULL)
vcf_fls | A list of paths to VCF files |
vcf_folder | A path to a folder containing all VCF files that will be integrated into a single table |
gff3_path | Path to GFF3 file |
A data table containing all variants from each sample of the input VCF files
merge_vcf( vcf_folder = system.file("extdata", "vcf-files", package = "lineagespot" ), gff3_path = system.file("extdata", "NC_045512.2_annot.gff3", package = "lineagespot" ) )
Lineage report for overlapping variants
uniq_variants(hits_table = NULL, AF_threshold = 0.8)
hits_table | A tab-delimited table containing the identified overlaps/hits between the input files and the lineages' reports. This input is generated by the |
AF_threshold | The allele frequency (AF) threshold that is going to be applied in order to identify the presence or absence of a variant. This is used to compute the number of variants in a sample and eventually the proportion of a lineage. |
A data table with metrics assessing the abundance of every lineage in each sample
variants_table <- merge_vcf(
    vcf_folder = system.file("extdata", "vcf-files",
        package = "lineagespot"
    ),
    gff3_path = system.file("extdata", "NC_045512.2_annot.gff3",
        package = "lineagespot"
    )
)

lineage_hits_table <- lineagespot_hits(
    vcf_table = variants_table,
    ref_folder = system.file("extdata", "ref",
        package = "lineagespot"
    )
)

report <- uniq_variants(hits_table = lineage_hits_table)
head(report)
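The effect of AF_threshold can be pictured with a toy calculation: a variant counts as present in a sample when its allele frequency reaches the threshold, and the share of present variants gives a rough per-lineage proportion. The sketch below uses invented data and column names, and assumes a `>=` comparison; the metrics actually computed by uniq_variants may differ.

```r
# Toy hits table: samples, lineages, and AF values are invented
hits <- data.frame(
    sample = c("S1", "S1", "S1", "S1", "S2", "S2"),
    lineage = c(
        "B.1.1.7", "B.1.1.7", "B.1.1.7", "B.1.617.2",
        "B.1.1.7", "B.1.617.2"
    ),
    AF = c(0.95, 0.85, 0.40, 0.10, 0.90, 0.82)
)
AF_threshold <- 0.8

# Assumption: a variant is "present" when its AF reaches the threshold
hits$present <- hits$AF >= AF_threshold

# Number of present variants per sample and lineage
report <- aggregate(present ~ sample + lineage, data = hits, FUN = sum)
names(report)[3] <- "n_present"

# Proportion of the examined lineage variants that are present
report$proportion <- aggregate(
    present ~ sample + lineage,
    data = hits, FUN = mean
)$present

print(report)
```

Lowering the threshold admits low-frequency variants, which matters in wastewater samples where several lineages can be mixed in one sample.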