The Rfastp package provides an R interface to fastp (Chen et al. 2018), an all-in-one preprocessing toolkit for FASTQ files.
Use the BiocManager package to download and install the Rfastp package from Bioconductor as follows:
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("Rfastp")
If required, the latest development version of the package can also be installed from GitHub.
Once the package is installed, load it into your R session:
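The load step is the standard attach call; a minimal sketch, assuming the installation above succeeded:

```r
# Attach Rfastp so that rfastp(), catfastq(), qcSummary(),
# trimSummary() and curvePlot() are available in this session.
library(Rfastp)
```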
The package ships with three example FASTQ files: one single-end file and a pair of paired-end files.
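# Paths to the example FASTQ files bundled with the package.
se_read1 <- system.file("extdata", "Fox3_Std_small.fq.gz", package = "Rfastp")
pe_read1 <- system.file("extdata", "reads1.fastq.gz", package = "Rfastp")
pe_read2 <- system.file("extdata", "reads2.fastq.gz", package = "Rfastp")
# Prefix for output files, placed in the session's temporary directory.
outputPrefix <- tempfile(tmpdir = tempdir())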
Rfastp supports multithreading; set the number of threads with the thread parameter.
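As an illustration of the thread parameter, the sketch below runs fastp on the bundled single-end file with four threads, then on the paired-end files with the default thread count. The value 4, the output suffixes, and the report variable names are arbitrary choices made for this example.

```r
library(Rfastp)

# Example files bundled with the package.
se_read1 <- system.file("extdata", "Fox3_Std_small.fq.gz", package = "Rfastp")
pe_read1 <- system.file("extdata", "reads1.fastq.gz", package = "Rfastp")
pe_read2 <- system.file("extdata", "reads2.fastq.gz", package = "Rfastp")
outputPrefix <- tempfile(tmpdir = tempdir())

# Single-end run using 4 threads (4 is an illustrative value).
se_json_report <- rfastp(read1 = se_read1,
                         outputFastq = paste0(outputPrefix, "_se"),
                         thread = 4)

# Paired-end run: pass the second file through read2.
pe_json_report <- rfastp(read1 = pe_read1, read2 = pe_read2,
                         outputFastq = paste0(outputPrefix, "_pe"))
```

Each call returns the parsed fastp report as an R list, which can be passed to the package's summary functions.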
When a UMI prefix is supplied, the prefix string is added before the UMI sequence in the sequence name, with an “_” between the prefix string and the UMI sequence. The UMI sequence is inserted into the sequence name before the first space.
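A minimal sketch of a UMI run with a prefix, assuming the UMI occupies the first 16 bases of read1 in the bundled paired-end files. The prefix string "UMI", the UMI length, and the output suffix are illustrative values, not recommendations.

```r
library(Rfastp)

pe_read1 <- system.file("extdata", "reads1.fastq.gz", package = "Rfastp")
pe_read2 <- system.file("extdata", "reads2.fastq.gz", package = "Rfastp")
outputPrefix <- tempfile(tmpdir = tempdir())

# Extract a 16-base UMI from the start of read1 and tag read names with it.
# umiPrefix is a hypothetical value chosen for illustration; with the default
# settings an "_" separates the prefix from the UMI sequence.
umi_json_report <- rfastp(read1 = pe_read1, read2 = pe_read2,
                          outputFastq = paste0(outputPrefix, "_umi"),
                          umi = TRUE,
                          umiLoc = "read1",
                          umiLength = 16,
                          umiPrefix = "UMI")
```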
Trim poor-quality bases from the 3’ end one base at a time (window size 1) until a base with quality of at least 5 is reached; trim the 5’ end with a 29 bp window until the window's mean quality is at least 20; disable polyG trimming; discard reads shorter than 29 bp; and specify the adapter sequence for read1.
clipr_json_report <- rfastp(read1 = se_read1,
    outputFastq = paste0(outputPrefix, '_clipr'),
    disableTrimPolyG = TRUE,
    cutLowQualFront = TRUE,
    cutFrontWindowSize = 29,
    cutFrontMeanQual = 20,
    cutLowQualTail = TRUE,
    cutTailWindowSize = 1,
    cutTailMeanQual = 5,
    minReadLength = 29,
    adapterSequenceRead1 = 'GTGTCAGTCACTTCCAGCGG'
)
rfastp can accept multiple input files; it concatenates them into a single file and then runs fastp.
# Four read 1 files bundled with the package.
pe001_read1 <- system.file("extdata", "splited_001_R1.fastq.gz",
    package = "Rfastp")
pe002_read1 <- system.file("extdata", "splited_002_R1.fastq.gz",
    package = "Rfastp")
pe003_read1 <- system.file("extdata", "splited_003_R1.fastq.gz",
    package = "Rfastp")
pe004_read1 <- system.file("extdata", "splited_004_R1.fastq.gz",
    package = "Rfastp")
inputfiles <- c(pe001_read1, pe002_read1, pe003_read1, pe004_read1)
# Pass the vector of files as read1; they are concatenated before fastp runs.
cat_rjson_report <- rfastp(read1 = inputfiles,
    outputFastq = paste0(outputPrefix, "_merged1"))
# The matching read 2 files.
pe001_read2 <- system.file("extdata", "splited_001_R2.fastq.gz",
    package = "Rfastp")
pe002_read2 <- system.file("extdata", "splited_002_R2.fastq.gz",
    package = "Rfastp")
pe003_read2 <- system.file("extdata", "splited_003_R2.fastq.gz",
    package = "Rfastp")
pe004_read2 <- system.file("extdata", "splited_004_R2.fastq.gz",
    package = "Rfastp")
inputR2files <- c(pe001_read2, pe002_read2, pe003_read2, pe004_read2)
# catfastq only concatenates the input files into one output file.
catfastq(output = paste0(outputPrefix, "_merged2_R2.fastq.gz"),
    inputFiles = inputR2files)
Each exported function has a help page describing its usage: rfastp, catfastq, qcSummary, trimSummary and curvePlot. For example, enter ?rfastp at the R prompt.
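The reporting functions listed above all take the report returned by rfastp. The sketch below is one way they might be combined, assuming the bundled example files; the variable names are chosen for illustration.

```r
library(Rfastp)

se_read1 <- system.file("extdata", "Fox3_Std_small.fq.gz", package = "Rfastp")
pe_read1 <- system.file("extdata", "reads1.fastq.gz", package = "Rfastp")
pe_read2 <- system.file("extdata", "reads2.fastq.gz", package = "Rfastp")
outputPrefix <- tempfile(tmpdir = tempdir())

# Run fastp once on single-end and once on paired-end input.
se_json_report <- rfastp(read1 = se_read1,
                         outputFastq = paste0(outputPrefix, "_se"))
pe_json_report <- rfastp(read1 = pe_read1, read2 = pe_read2,
                         outputFastq = paste0(outputPrefix, "_pe"))

# Before/after QC statistics.
qc_table <- qcSummary(pe_json_report)

# Summary of trimming and filtering results.
trim_table <- trimSummary(pe_json_report)

# Curves from the report, drawn with default settings.
quality_plot <- curvePlot(se_json_report)

# Help pages document the full argument lists.
help("rfastp", package = "Rfastp")
```

Printing quality_plot in an interactive session displays the figure.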
Thank you to Ji-Dung Luo for testing, vignette review and critical feedback; Doug Barrows for critical feedback and vignette review; and Ziwei Liang for their support.

# Session info
## R version 4.4.2 (2024-10-31)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8
## [4] LC_COLLATE=C LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C
## [10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Etc/UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] Rfastp_1.17.0 BiocStyle_2.35.0
##
## loaded via a namespace (and not attached):
## [1] gtable_0.3.6 jsonlite_1.8.9 rjson_0.2.23 compiler_4.4.2
## [5] BiocManager_1.30.25 Rcpp_1.0.13-1 stringr_1.5.1 jquerylib_0.1.4
## [9] scales_1.3.0 yaml_2.3.10 fastmap_1.2.0 ggplot2_3.5.1
## [13] R6_2.5.1 plyr_1.8.9 labeling_0.4.3 knitr_1.49
## [17] tibble_3.2.1 maketools_1.3.1 munsell_0.5.1 bslib_0.8.0
## [21] pillar_1.9.0 rlang_1.1.4 utf8_1.2.4 cachem_1.1.0
## [25] stringi_1.8.4 xfun_0.49 sass_0.4.9 sys_3.4.3
## [29] cli_3.6.3 withr_3.0.2 magrittr_2.0.3 digest_0.6.37
## [33] grid_4.4.2 lifecycle_1.0.4 vctrs_0.6.5 evaluate_1.0.1
## [37] glue_1.8.0 farver_2.1.2 buildtools_1.0.0 fansi_1.0.6
## [41] colorspace_2.1-1 reshape2_1.4.4 rmarkdown_2.29 tools_4.4.2
## [45] pkgconfig_2.0.3 htmltools_0.5.8.1