Package 'FastqCleaner' reference manual

Title:	A Shiny Application for Quality Control, Filtering and Trimming of FASTQ Files
Description:	An interactive web application for quality control, filtering and trimming of FASTQ files. This user-friendly tool combines a pipeline for data processing based on Biostrings and ShortRead infrastructure, with a cutting-edge visual environment. Single-Read and Paired-End files can be locally processed. Diagnostic interactive plots (CG content, per-base sequence quality, etc.) are provided for both the input and output files.
Authors:	Leandro Roser [aut, cre], Fernán Agüero [aut], Daniel Sánchez [aut]
Maintainer:	Leandro Roser <[email protected]>
License:	MIT + file LICENSE
Version:	1.25.0
Built:	2025-03-29 06:19:16 UTC
Source:	https://github.com/bioc/FastqCleaner

Remove full and partial adapters from a ShortReadQ object

Description

This function removes adapters and partial adapters from the 3' and 5' ends of reads, using Biostrings functions that include trimLRPatterns. It extends the methodology of the trimLRPatterns function of Biostrings: it can also remove adapters present within reads, and it offers additional options (e.g., a threshold for the minimum number of bases required for trimming). For a given position in the read, the Biostrings functions return TRUE when a substring of the read matches the adapter. Like trimLRPatterns, adapter_filter takes as the best match the region that extends to the end of the sequence in the corresponding flank. The default error rate is 0.2. If several valid matches are found, the function removes the largest subsequence. Adapters can be anchored or not. When indels are allowed, matching uses the 'edit distance' between the subsequences and the adapter.

Usage

adapter_filter(
  input,
  Lpattern = "",
  Rpattern = "",
  rc.L = FALSE,
  rc.R = FALSE,
  first = c("R", "L"),
  with_indels = FALSE,
  error_rate = 0.2,
  anchored = TRUE,
  fixed = "subject",
  remove_zero = TRUE,
  checks = TRUE,
  min_match_flank = 3L,
  ...
)

Arguments

`input`	`ShortReadQ` object
`Lpattern`	5' pattern (character or `DNAString` object)
`Rpattern`	3' pattern (character or `DNAString` object)
`rc.L`	Reverse complement Lpattern? Default FALSE
`rc.R`	Reverse complement Rpattern? Default FALSE
`first`	Side of the sequences to trim first, right ('R') or left ('L'), when both Lpattern and Rpattern are passed
`with_indels`	Allow indels? This feature is available only when the error_rate is not null
`error_rate`	Error rate (value in the range [0, 1]). The error rate is the proportion of mismatches allowed between the adapter and the aligned portion of the subject. For a given adapter A, the number of allowed mismatches between each subsequence s of A and the subject is computed as: error_rate * L_s, where L_s is the length of the subsequence s
`anchored`	Search for the adapter or partial adapter within the sequence (anchored = FALSE) or only at the 3' and 5' terminals (anchored = TRUE, the default shown in Usage)?
`fixed`	Parameter passed to `trimLRPatterns`. Default 'subject': only ambiguities in the pattern are interpreted as wildcards. See the argument fixed in `trimLRPatterns`
`remove_zero`	Remove zero-length sequences? Default TRUE
`checks`	Perform checks? Default TRUE
`min_match_flank`	Do not trim in the flanks of the subject if the match has length min_match_flank or less. Default 3L, as shown in Usage (with a value of 1L, only flank matches with >= 2 coincidences are trimmed)
`...`	additional parameters passed to `trimLRPatterns`
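
The `error_rate` rule above can be illustrated with a small base R sketch. The helper `allowed_mismatches` is hypothetical and not part of FastqCleaner; it only tabulates error_rate * L_s for every subsequence length of an example adapter. Whether the package rounds this value is not stated in the manual, so the sketch leaves it unrounded.

```r
# Hypothetical helper (not a FastqCleaner function): tabulate the number of
# allowed mismatches, error_rate * L_s, for each subsequence length L_s.
allowed_mismatches <- function(adapter, error_rate = 0.2) {
  stopifnot(error_rate >= 0, error_rate <= 1)
  L_s <- seq_len(nchar(adapter))          # possible subsequence lengths
  data.frame(L_s = L_s, allowed = error_rate * L_s)
}

# Example adapter chosen for illustration only
res <- allowed_mismatches("ATCGACT", error_rate = 0.2)
print(res)
```

With the default error rate of 0.2, a partial adapter match of fewer than 5 bases allows less than one mismatch under this formula.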

Value

Edited DNAString or DNAStringSet object

Filtered ShortReadQ object

`input`	`ShortReadQ` object
`threshold`	Threshold value, computed as the ratio between the entropy (H) of the sequences and the reference entropy. Default is 0.5
`referenceEntropy`	Reference entropy. By default, the program uses a value of 3.908, that corresponds to the entropy of the human genome in bits
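
As a rough illustration of how `threshold` and `referenceEntropy` relate, the base R sketch below computes a Shannon entropy in bits and divides it by the reference value. It assumes the entropy is taken over dinucleotide frequencies and that sequences at or above the threshold are kept; the manual states neither, and `dinuc_entropy` is a hypothetical helper, not a package function.

```r
# Shannon entropy (bits) of the dinucleotide frequencies of one sequence.
# ASSUMPTION: dinucleotides are the unit of the entropy calculation.
dinuc_entropy <- function(s) {
  n <- nchar(s)
  if (n < 2) return(0)
  starts <- seq_len(n - 1)
  dinucs <- substring(s, starts, starts + 1)   # overlapping dinucleotides
  p <- as.numeric(table(dinucs)) / length(dinucs)
  -sum(p * log2(p))
}

reference_entropy <- 3.908   # value quoted in the argument table
threshold <- 0.5             # default quoted in the argument table

seqs <- c(low = "AAAAAAAAAAAAAAAAAAAA", high = "ACGTTGCAAGCTTCAGGATCCATG")
ratio <- vapply(seqs, dinuc_entropy, numeric(1)) / reference_entropy
keep <- ratio >= threshold   # ASSUMPTION: keep sequences at or above threshold
print(keep)
```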

`input`	`ShortReadQ` object
`trim3`	Number of bases to remove from 3'
`trim5`	Number of bases to remove from 5'
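
The effect of `trim3` and `trim5` can be sketched on plain character vectors. This is a base R illustration only: `trim_fixed` is a hypothetical helper, and the package itself operates on `ShortReadQ` objects, trimming qualities as well as bases.

```r
# Hypothetical helper: drop trim5 bases from the 5' end and trim3 bases
# from the 3' end of each sequence.
trim_fixed <- function(seqs, trim3 = 0L, trim5 = 0L) {
  substr(seqs, trim5 + 1L, nchar(seqs) - trim3)
}

trimmed <- trim_fixed(c("ACGTACGTAC", "TTGGCCAATT"), trim3 = 2L, trim5 = 3L)
print(trimmed)
```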

`my_seq`	character vector with sequences to inject
`how_many_seqs`	How many sequences to pick for letter injection. An interval [min_s, max_s], with min_s the minimum and max_s the maximum number of sequences, can be passed. In this case, a value is picked from the interval. If NULL, a random value within the interval [1, length(my_seq)] is picked.
`how_many_letters`	How many times to inject the letter in each sequence i selected for injection. An interval [min_i, max_i] can be passed. In this case, a value is randomly picked for each sequence i. This value represents the number of times that the letter will be injected in the sequence i. If NULL, a random value within the interval [1, width(my_seq[i])] is picked for each sequence i.
`letter`	Letter to inject. Default: 'N'

`launch.browser`	Launch in browser? Default TRUE
`...`	Additional parameters passed to `runApp`

`input`	`ShortReadQ` object
`rm.min`	Threshold value for the minimum number of bases
`rm.max`	Threshold value for the maximum number of bases

`input`	`ShortReadQ` object
`rm.N`	Threshold value of Ns to remove a sequence from the output (sequences with a number of Ns > threshold are removed). For example, if rm.N is 3, all the sequences with a number of Ns > 3 (Ns >= 4) will be removed
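
The `rm.N` rule (remove sequences with more than rm.N Ns) can be checked with a short base R sketch on plain strings. `count_n` is a hypothetical helper used for illustration; the package applies the rule to `ShortReadQ` objects.

```r
# Hypothetical helper: count the Ns in each sequence.
count_n <- function(seqs) {
  lengths(regmatches(seqs, gregexpr("N", seqs, fixed = TRUE)))
}

seqs <- c("ACGTNNNN", "ACNNNGTA", "ACGTACGT")   # 4, 3 and 0 Ns
rm.N <- 3
kept <- seqs[count_n(seqs) <= rm.N]             # sequences with Ns > rm.N are dropped
print(kept)
```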

`input`	`ShortReadQ` object
`minq`	Quality threshold
`q_format`	Quality format used for the file, as returned by check_encoding
`check.encod`	Check the encoding of the sequence? This argument is incompatible with q_format
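
A minimal base R sketch of filtering by average quality follows. It assumes Sanger (Phred+33) quality strings and that reads whose mean quality is at least `minq` are kept; the manual does not say whether the comparison is strict. `phred_scores` is a hypothetical helper.

```r
# Hypothetical helper: decode a quality string into Phred scores.
# ASSUMPTION: Sanger encoding, i.e. ASCII offset 33.
phred_scores <- function(q, offset = 33L) utf8ToInt(q) - offset

quals <- c(read1 = "IIIIIIII", read2 = "IIII####", read3 = "########")
minq <- 30

mean_q <- vapply(quals, function(q) mean(phred_scores(q)), numeric(1))
keep <- mean_q >= minq   # ASSUMPTION: reads at or above minq are kept
print(mean_q)
print(keep)
```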

`n`	number of sequences
`widths`	width of the sequences
`random_widths`	Pick widths at random from the parameter 'widths', treating its value as an interval from which any integer can be picked? Default TRUE. Otherwise, widths are picked only from the vector passed.
`replace`	sample widths with replacement? Default TRUE.
`len_prob`	vector with probabilities for each width value. Default NULL (equiprobability)
`seq_prob`	A vector of four probability values to set the frequency of the nucleotides 'A', 'C', 'G', 'T', for DNA, or 'A', 'C', 'G', 'U', for RNA. For example: c(0.25, 0.25, 0.5, 0). Default: c(0.25, 0.25, 0.25, 0.25) (equiprobability for the 4 bases). If the sum of the probabilities is > 1, the values will be normalized to the range [0, 1].
`q_prob`	A vector of range = range(qual), with probabilities to set the frequency of each quality value. Default is equiprobability. If the sum of the probabilities is > 1, the values will be normalized to the range [0, 1].
`nuc`	Create sequences of DNA (nucleotides = c('A', 'C', 'G', 'T')) or RNA (nucleotides = c('A', 'C', 'G', 'U'))? Default: 'DNA'
`qual`	Quality range for the sequences. It must be a range included in the selected encoding: 'Sanger' = [0, 40]; 'Illumina1.8' = [0, 41]; 'Illumina1.5' = [0, 40]; 'Illumina1.3' = [3, 40]; 'Solexa' = [-5, 40]. Example: for a range from 20 to 30 in Sanger encoding, pass c(20, 30)
`encod`	sequence encoding
`base_name`	Base name for strings
`sep`	Character separating base names and the read number. Default: '_'
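
The `seq_prob` behaviour can be sketched in base R. The sketch assumes that normalization means dividing each value by the sum of the vector, which the manual does not spell out, and `normalize_prob` is a hypothetical helper rather than a package function.

```r
# Hypothetical helper. ASSUMPTION: probabilities are normalized by their sum.
normalize_prob <- function(p) p / sum(p)

bases <- c("A", "C", "G", "T")
seq_prob <- normalize_prob(c(1, 1, 2, 0))   # sum is 4, so values are rescaled

set.seed(1)
one_seq <- paste(sample(bases, 30, replace = TRUE, prob = seq_prob),
                 collapse = "")
print(seq_prob)
print(one_seq)
```

Because the last probability is 0, no 'T' appears in the sampled sequence.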

`slength`	number of sequences
`swidth`	width of the sequences
`qual`	Quality range for the sequences. It must be a range included in the selected encoding: 'Sanger' = [0, 40]; 'Illumina1.8' = [0, 41]; 'Illumina1.5' = [0, 40]; 'Illumina1.3' = [3, 40]; 'Solexa' = [-5, 40]. Example: for a range from 20 to 30 in Sanger encoding, pass c(20, 30)
`encod`	Sequence encoding
`prob`	A vector of range = range(qual), with probabilities to set the frequency of each quality value. Default is equiprobability. If the sum of the probabilities is > 1, the values will be normalized to the range [0, 1].

`input`	`ShortReadQ` object
`rm.seq`	Character vector with sequences to remove

`n`	Number of reads
`base_name`	Base name for strings
`sep`	Character separating base names and the read number. Default: '_'

`input`	`ShortReadQ` object
`rm.3qual`	Quality threshold for 3' tails
`q_format`	Quality format used for the file, as returned by check_encoding
`check.encod`	Check the encoding of the sequence? This argument is incompatible with q_format. Default TRUE
`remove_zero`	Remove zero-length sequences?
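
One possible reading of 3' quality trimming is sketched below in base R: bases are removed from the 3' end until a base with quality at or above `rm.3qual` is reached. This is an assumption about the rule, not the package's documented algorithm; the helper `trim3q` is hypothetical and Sanger (Phred+33) encoding is assumed.

```r
# Hypothetical helper. ASSUMPTIONS: Phred+33 encoding; the 3' tail of bases
# with quality below rm.3qual is removed, up to the last base that passes.
trim3q <- function(seq, qual, rm.3qual, offset = 33L) {
  scores <- utf8ToInt(qual) - offset
  ok <- which(scores >= rm.3qual)
  keep_to <- if (length(ok)) max(ok) else 0L   # 0 yields a zero-length read
  list(seq = substr(seq, 1L, keep_to), qual = substr(qual, 1L, keep_to))
}

res <- trim3q("ACGTACGT", "IIIII###", rm.3qual = 20)
empty <- trim3q("ACGT", "####", rm.3qual = 20)
print(res)
```

A read whose bases all fall below the threshold becomes zero-length, which is the case the `remove_zero` argument addresses.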

Package 'FastqCleaner'

Help Index

Remove full and partial adapters from a ShortReadQ object

Check quality encoding

Remove sequences with low complexity

Remove a fixed number of bases of a ShortReadQ object from 3' or 5'

Inject a letter in a set of sequences at random positions

Launch FastqCleaner application

Filter sequences of a FASTQ file by length

Remove sequences with non-identified bases (Ns) from a ShortReadQ object

Filter sequences by their average quality

Create a named object with random sequences and qualities

Create random qualities for a given encoding

Create random sequences