Title: | FFPE Artificial Chimeric Read Filter for NGS data |
---|---|
Description: | This package finds and filters artificial chimeric reads that are specifically generated during next-generation sequencing (NGS) of formalin-fixed paraffin-embedded (FFPE) tissues. These artificial chimeric reads can lead to a large number of false positive structural variation (SV) calls. The required input is an indexed BAM file of an FFPE sample. |
Authors: | Lanying Wei [aut, cre] |
Maintainer: | Lanying Wei <[email protected]> |
License: | LGPL-3 |
Version: | 1.17.0 |
Built: | 2024-11-19 06:28:35 UTC |
Source: | https://github.com/bioc/FilterFFPE |
Package: | FilterFFPE |
Type: | Package |
Title: | FFPE Artificial Chimeric Read Filter for NGS data |
Version: | 1.17.0 |
Authors@R: | person("Lanying", "Wei", email="[email protected]", role = c("aut", "cre"), comment = c(ORCID = "0000-0002-4281-8017")) |
Description: | This package finds and filters artificial chimeric reads that are specifically generated during next-generation sequencing (NGS) of formalin-fixed paraffin-embedded (FFPE) tissues. These artificial chimeric reads can lead to a large number of false positive structural variation (SV) calls. The required input is an indexed BAM file of an FFPE sample. |
License: | LGPL-3 |
Encoding: | UTF-8 |
Imports: | foreach, doParallel, GenomicRanges, IRanges, Rsamtools, parallel, S4Vectors |
Suggests: | BiocStyle |
biocViews: | StructuralVariation, Sequencing, Alignment, QualityControl, Preprocessing |
Config/pak/sysreqs: | libssl-dev |
Repository: | https://bioc.r-universe.dev |
RemoteUrl: | https://github.com/bioc/FilterFFPE |
RemoteRef: | HEAD |
RemoteSha: | 494b4cf7a855f72659ac0c586d763ac463d0a2c7 |
Author: | Lanying Wei [aut, cre] (<https://orcid.org/0000-0002-4281-8017>) |
Maintainer: | Lanying Wei <[email protected]> |
The next-generation sequencing (NGS) reads from formalin-fixed paraffin-embedded (FFPE) samples contain numerous artifact chimeric reads, which can lead to a large number of false positive structural variation (SV) calls. This package finds and filters these artifact chimeric reads from BAM files of FFPE samples to improve SV calling performance.
Index of help topics:
FFPEReadFilter | Find and filter artifact chimeric reads in BAM file of FFPE sample |
FilterFFPE-package | FFPE Artificial Chimeric Read Filter for NGS data |
filterBamByReadNames | Filter reads from BAM file by read names |
findArtifactChimericReads | Find artifact chimeric reads in BAM file of FFPE sample |
There are three available functions to find and/or filter artifact chimeric reads of FFPE samples:
1. findArtifactChimericReads: Find artifact chimeric reads in BAM file of FFPE sample.
2. filterBamByReadNames: Filter reads from BAM file by read names.
3. FFPEReadFilter: Find and filter artifact chimeric reads in BAM file of FFPE sample.
Lanying Wei <[email protected]>
FilterFFPE, filterBamByReadNames, FFPEReadFilter
library(FilterFFPE)

# Example BAM file shipped with the package
file <- system.file("extdata", "example.bam", package = "FilterFFPE")

# Write all outputs to a temporary folder
outFolder <- tempdir()
FFPEReadsFile <- paste0(outFolder, "/example.FFPEReads.txt")
dupChimFile <- paste0(outFolder, "/example.dupChim.txt")
destination <- paste0(outFolder, "/example.FilterFFPE.bam")

# Find and filter artifact chimeric reads in one step
FFPEReadFilter(file = file, threads = 2, destination = destination,
               overwrite = TRUE, FFPEReadsFile = FFPEReadsFile,
               dupChimFile = dupChimFile)
Artifact chimeric reads are enriched in NGS data of FFPE samples and can lead to a large number of false positive SV calls. This function finds and filters these artifact chimeric reads.
FFPEReadFilter(file, maxReadsOfSameBreak=2, minMapBase=1, threads=1, index=file,
               destination=sub("\\.bam(\\.gz)?", ".FilterFFPE.bam", file),
               overwrite=FALSE,
               FFPEReadsFile=sub("\\.bam(\\.gz)?", ".FFPEReads.txt", file),
               dupChimFile=sub("\\.bam(\\.gz)?", ".dupChim.txt", file),
               filterdupChim=TRUE)
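The default output paths in the usage above are built by applying `sub()` to the input path. The following base R sketch shows what those defaults evaluate to; the input paths are hypothetical and chosen only for illustration.

```r
# Hypothetical input path, used only for illustration.
file <- "/data/sample01.bam"

# Same pattern and replacements as the defaults in the usage above.
destination   <- sub("\\.bam(\\.gz)?", ".FilterFFPE.bam", file)
FFPEReadsFile <- sub("\\.bam(\\.gz)?", ".FFPEReads.txt", file)
dupChimFile   <- sub("\\.bam(\\.gz)?", ".dupChim.txt", file)
print(c(destination, FFPEReadsFile, dupChimFile))

# The pattern is not anchored to the end of the string, so sub() rewrites
# the first ".bam" it meets, which may sit in a directory name.
oddPath <- "/data/run.bam.files/sample01.bam"
oddDestination <- sub("\\.bam(\\.gz)?", ".FilterFFPE.bam", oddPath)
print(oddDestination)
```

When a directory name in the path contains ".bam", passing `destination`, `FFPEReadsFile` and `dupChimFile` explicitly avoids the surprise shown in the second part of the sketch.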
file | Path to the BAM file. |
maxReadsOfSameBreak | The maximum allowed number of artifact chimeric reads sharing a false positive breakpoint. If the number of reads sharing the same breakpoint exceeds this number, these reads are not recognized as artifact chimeric reads. Reads marked as PCR or optical duplicates are excluded from the calculation. For paired-end sequencing, both reads of a pair from an artifact chimeric fragment may contain the artifact breakpoint; therefore, the default is set to 2. |
minMapBase | The minimum required length (bp) of a short complementary mapping for an artifact chimeric read. Artifact chimeric reads are derived from the combination of two single-stranded DNA fragments linked by short reverse complementary regions (SRCR). Reads with an SRCR shorter than this length are not recognized as artifact chimeric reads. Note: sequencing errors and mutations might influence the detection of the existence and length of the SRCR. Suggested range: 0-3. When it is set to 0 or any value below 1, this step is skipped. |
threads | Number of threads to use. Multi-threading can speed up the process. |
index | Path of the index file of the input BAM file. |
destination | Path of the output filtered BAM file. |
overwrite | Boolean value indicating whether the destination can be overwritten if it already exists. |
FFPEReadsFile | Path of the output txt file with artifact chimeric read names. |
dupChimFile | Path of the output txt file with supplementary reads that are marked as PCR or optical duplicates. |
filterdupChim | Filter PCR or optical duplicates of all chimeric reads when set to TRUE. These reads may contain duplicates of artifact chimeric reads; therefore, it is recommended to also remove these reads. |
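The `maxReadsOfSameBreak` rule described above can be pictured with a toy table. This is a base R sketch of the counting rule as the argument description states it, not the package's implementation; the data frame, its column names and the breakpoint labels are all hypothetical.

```r
# Hypothetical chimeric reads: a breakpoint label and a duplicate flag per read.
reads <- data.frame(
  name        = c("r1", "r2", "r3", "r4", "r5", "r6", "r7"),
  breakpoint  = c("chr1:100", "chr1:100", "chr2:500", "chr2:500",
                  "chr2:500", "chr2:500", "chr3:900"),
  isDuplicate = c(FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE),
  stringsAsFactors = FALSE
)
maxReadsOfSameBreak <- 2

# Duplicates are left out of the count, as the argument description states.
counted <- reads[!reads$isDuplicate, ]
readsPerBreak <- table(counted$breakpoint)

# Breakpoints supported by at most maxReadsOfSameBreak reads stay candidates;
# breakpoints shared by more reads are not treated as artifacts.
keep <- as.vector(readsPerBreak <= maxReadsOfSameBreak)
candidateBreaks <- names(readsPerBreak)[keep]
candidates <- counted$name[counted$breakpoint %in% candidateBreaks]
print(candidates)
```

In this toy data, the breakpoint "chr2:500" is shared by three non-duplicate reads, which exceeds the limit of 2, so its reads are not kept as candidates.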
The next-generation sequencing (NGS) reads from formalin-fixed paraffin-embedded (FFPE) samples contain numerous artifact chimeric reads, which can lead to a large number of false positive structural variation (SV) calls. This function finds and filters these artifact chimeric reads. An index file is also generated for the created filtered BAM file.
The file name of the created destination file.
Lanying Wei <[email protected]>
FilterFFPE, findArtifactChimericReads, filterBamByReadNames
library(FilterFFPE)

# Example BAM file shipped with the package
file <- system.file("extdata", "example.bam", package = "FilterFFPE")

# Write all outputs to a temporary folder
outFolder <- tempdir()
FFPEReadsFile <- paste0(outFolder, "/example.FFPEReads.txt")
dupChimFile <- paste0(outFolder, "/example.dupChim.txt")
destination <- paste0(outFolder, "/example.FilterFFPE.bam")

# Find and filter artifact chimeric reads in one step
FFPEReadFilter(file = file, threads = 2, destination = destination,
               overwrite = TRUE, FFPEReadsFile = FFPEReadsFile,
               dupChimFile = dupChimFile)
Generate filtered BAM file that does not contain reads with the input read names.
filterBamByReadNames(file, readsToFilter, index=file,
                     destination=sub("\\.bam(\\.gz)?", ".FilterFFPE.bam", file),
                     overwrite=FALSE)
file | Path to the input BAM file. |
readsToFilter | A character vector of read names to filter. |
index | Path of the index file of the input BAM file. |
destination | Path of the output filtered BAM file. |
overwrite | Boolean value indicating whether the destination can be overwritten if it already exists. |
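A `readsToFilter` vector can be assembled from text files of read names. The base R sketch below uses toy files in place of real output and assumes one read name per line, which is an assumption about the file layout rather than something stated in this manual; the file names and read names are hypothetical.

```r
# Toy stand-ins for two text files of read names (assumed: one name per line).
outFolder <- tempdir()
FFPEReadsFile <- file.path(outFolder, "toy.FFPEReads.txt")
dupChimFile   <- file.path(outFolder, "toy.dupChim.txt")
writeLines(c("readA", "readB", "readC"), FFPEReadsFile)
writeLines(c("readC", "readD"), dupChimFile)

# Read both lists, remove repeated names, and drop any empty lines.
readsToFilter <- unique(c(readLines(FFPEReadsFile), readLines(dupChimFile)))
readsToFilter <- readsToFilter[nzchar(readsToFilter)]

# filterBamByReadNames expects a character vector.
stopifnot(is.character(readsToFilter))
print(readsToFilter)
```

The resulting vector can then be passed as `readsToFilter = readsToFilter` together with the input BAM file.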
Generates a filtered BAM file that does not contain reads with the input read names. An index file is also created.
The file name of the created destination file.
Lanying Wei <[email protected]>
FilterFFPE, findArtifactChimericReads, FFPEReadFilter
library(FilterFFPE)

# Example BAM file shipped with the package
file <- system.file("extdata", "example.bam", package = "FilterFFPE")

# Write all outputs to a temporary folder
outFolder <- tempdir()
FFPEReadsFile <- paste0(outFolder, "/example.FFPEReads.txt")
dupChimFile <- paste0(outFolder, "/example.dupChim.txt")
destination <- paste0(outFolder, "/example.FilterFFPE.bam")

# Step 1: find the artifact chimeric read names
artifactReads <- findArtifactChimericReads(file = file, threads = 2,
                                           FFPEReadsFile = FFPEReadsFile,
                                           dupChimFile = dupChimFile)

# Step 2: add the duplicates of chimeric reads and filter both sets
dupChim <- readLines(dupChimFile)
readsToFilter <- c(artifactReads, dupChim)
filterBamByReadNames(file = file, readsToFilter = readsToFilter,
                     destination = destination, overwrite = TRUE)
Artifact chimeric reads are enriched in NGS data of FFPE samples and can lead to a large number of false positive SV calls. This function finds these artifact chimeric reads.
findArtifactChimericReads(file, maxReadsOfSameBreak=2, minMapBase=1, threads=1,
                          FFPEReadsFile=sub("\\.bam(\\.gz)?", ".FFPEReads.txt", file),
                          dupChimFile=sub("\\.bam(\\.gz)?", ".dupChim.txt", file))
file | Path to the BAM file. |
maxReadsOfSameBreak | The maximum allowed number of artifact chimeric reads sharing a false positive breakpoint. If the number of reads sharing the same breakpoint exceeds this number, these reads are not recognized as artifact chimeric reads. Reads marked as PCR or optical duplicates are excluded from the calculation. For paired-end sequencing, both reads of a pair from an artifact chimeric fragment may contain the artifact breakpoint; therefore, the default is set to 2. |
minMapBase | The minimum required length (bp) of a short complementary mapping for an artifact chimeric read. Artifact chimeric reads are derived from the combination of two single-stranded DNA fragments linked by short reverse complementary regions (SRCR). Reads with an SRCR shorter than this length are not recognized as artifact chimeric reads. Note: sequencing errors and mutations might influence the detection of the existence and length of the SRCR. Suggested range: 0-3. When it is set to 0 or any value below 1, this step is skipped. |
threads | Number of threads to use. Multi-threading can speed up the process. |
FFPEReadsFile | Path of the output txt file with artifact chimeric read names. |
dupChimFile | Path of the output txt file with read names of PCR or optical duplicates of all chimeric reads. |
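A value for `threads` can be derived from the machine at hand instead of being hard-coded. The sketch below is one possible convention (leave one core free), not a recommendation from the package; it uses only `parallel::detectCores()`, which ships with R.

```r
# detectCores() can return NA on some platforms, so fall back to 1.
available <- parallel::detectCores()
if (is.na(available)) available <- 1L

# Leave one core free, but never go below a single thread.
threads <- max(1L, available - 1L)
print(threads)
```

The result can then be passed as `threads = threads` to findArtifactChimericReads or FFPEReadFilter.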
The next-generation sequencing (NGS) reads from formalin-fixed paraffin-embedded (FFPE) samples contain numerous artifact chimeric reads, which can lead to a large number of false positive structural variation (SV) calls. This function finds the read names of these artifact chimeric reads. To further filter these reads, filterBamByReadNames can be applied.
A character vector of artifact chimeric read names.
Lanying Wei <[email protected]>
FilterFFPE, filterBamByReadNames, FFPEReadFilter
library(FilterFFPE)

# Example BAM file shipped with the package
file <- system.file("extdata", "example.bam", package = "FilterFFPE")

# Write the read-name lists to a temporary folder
outFolder <- tempdir()
FFPEReadsFile <- paste0(outFolder, "/example.FFPEReads.txt")
dupChimFile <- paste0(outFolder, "/example.dupChim.txt")

# Find the artifact chimeric read names and inspect the first few
artifactReads <- findArtifactChimericReads(file = file, threads = 2,
                                           FFPEReadsFile = FFPEReadsFile,
                                           dupChimFile = dupChimFile)
head(artifactReads)