Title: | Load FastqQC reports and other NGS related files |
---|---|
Description: | This package provides methods and object classes for parsing FastQC reports and output summaries from other NGS tools into R. As well as parsing files, multiple plotting methods have been implemented for visualising the parsed data. Plots can be generated as static ggplot objects or interactive plotly objects. |
Authors: | Stevie Pederson [aut, cre] , Christopher Ward [aut], Thu-Hien To [aut] |
Maintainer: | Stevie Pederson <[email protected]> |
License: | LGPL-3 |
Version: | 2.9.0 |
Built: | 2024-11-14 05:46:46 UTC |
Source: | https://github.com/bioc/ngsReports |
Extract elements from FastqcDataList Object
## S4 method for signature 'FastqcDataList,numeric,missing' x[i, j, ..., drop = TRUE] ## S4 method for signature 'FastqcDataList,character,missing' x[i, j, ..., drop = TRUE] ## S4 method for signature 'FastqcDataList,logical,missing' x[i, j, ..., drop = TRUE] ## S4 method for signature 'FastqcDataList,ANY,missing' x[i, j, ..., drop = TRUE] ## S4 method for signature 'FastpDataList,numeric,missing' x[i, j, ..., drop = TRUE] ## S4 method for signature 'FastpDataList,character,missing' x[i, j, ..., drop = TRUE] ## S4 method for signature 'FastpDataList,logical,missing' x[i, j, ..., drop = TRUE] ## S4 method for signature 'FastpDataList,ANY,missing' x[i, j, ..., drop = TRUE]
## S4 method for signature 'FastqcDataList,numeric,missing' x[i, j, ..., drop = TRUE] ## S4 method for signature 'FastqcDataList,character,missing' x[i, j, ..., drop = TRUE] ## S4 method for signature 'FastqcDataList,logical,missing' x[i, j, ..., drop = TRUE] ## S4 method for signature 'FastqcDataList,ANY,missing' x[i, j, ..., drop = TRUE] ## S4 method for signature 'FastpDataList,numeric,missing' x[i, j, ..., drop = TRUE] ## S4 method for signature 'FastpDataList,character,missing' x[i, j, ..., drop = TRUE] ## S4 method for signature 'FastpDataList,logical,missing' x[i, j, ..., drop = TRUE] ## S4 method for signature 'FastpDataList,ANY,missing' x[i, j, ..., drop = TRUE]
x |
A FastqcDataList or FastpDataList |
i |
character, logical or integer vector |
j |
not used |
... |
not used |
drop |
not used |
Extract elements in a consistent manner with R conventions
Will return a subset of the original object following the standard rules for subsetting objects
# Get the files included with the package packageDir <- system.file("extdata", package = "ngsReports") fl <- list.files(packageDir, pattern = "fastqc.zip", full.names = TRUE) # Load the FASTQC data as a FastqcDataList object fdl <- FastqcDataList(fl) # Subsetting using the standard methods fdl[1] fdl[[1]]
# Get the files included with the package packageDir <- system.file("extdata", package = "ngsReports") fl <- list.files(packageDir, pattern = "fastqc.zip", full.names = TRUE) # Load the FASTQC data as a FastqcDataList object fdl <- FastqcDataList(fl) # Subsetting using the standard methods fdl[1] fdl[[1]]
Generate a GC content distribution from sequences for a given read length and fragment length
estGcDistn(x, n = 1e+06, rl = 100, fl = 200, fragSd = 30, bins = 101, ...) ## S4 method for signature 'ANY' estGcDistn(x, n = 1e+06, rl = 100, fl = 200, fragSd = 30, bins = 101, ...) ## S4 method for signature 'character' estGcDistn(x, n = 1e+06, rl = 100, fl = 200, fragSd = 30, bins = 101, ...) ## S4 method for signature 'DNAStringSet' estGcDistn(x, n = 1e+06, rl = 100, fl = 200, fragSd = 30, bins = 101, ...)
estGcDistn(x, n = 1e+06, rl = 100, fl = 200, fragSd = 30, bins = 101, ...) ## S4 method for signature 'ANY' estGcDistn(x, n = 1e+06, rl = 100, fl = 200, fragSd = 30, bins = 101, ...) ## S4 method for signature 'character' estGcDistn(x, n = 1e+06, rl = 100, fl = 200, fragSd = 30, bins = 101, ...) ## S4 method for signature 'DNAStringSet' estGcDistn(x, n = 1e+06, rl = 100, fl = 200, fragSd = 30, bins = 101, ...)
x |
|
n |
The number of reads to sample |
rl |
Read Lengths to sample |
fl |
The mean of the fragment lengths sequenced |
fragSd |
The standard deviation of the fragment lengths being sequenced |
bins |
The number of bins to estimate |
... |
Not used |
The function takes the supplied object and returns the theoretical GC content distribution. Using a fixed read length essentially leads to a discrete distribution so the bins argument is used to define the number of bins returned. This defaults to 101 for 0 to 100% inclusive.
The returned values are obtained by interpolating the values obtained during sampling. This avoids returned distributions with gaps and jumps as would be obtained setting readLengths at values not in multiples of 100.
Based heavily on https://github.com/mikelove/fastqcTheoreticalGC
A tibble
with two columns: GC_Content
and Freq
denoting the proportion of GC and frequency of occurence reqpectively
faDir <- system.file("extdata", package = "ngsReports") faFile <- list.files(faDir, pattern = "fasta", full.names = TRUE) df <- estGcDistn(faFile, n = 200)
faDir <- system.file("extdata", package = "ngsReports") faFile <- list.files(faDir, pattern = "fasta", full.names = TRUE) df <- estGcDistn(faFile, n = 200)
FastpData(x)
FastpData(x)
x |
Path to a single zip archive or extracted folder for a individual fastp report. |
This object class is the main object required for generating plots
and tables. Instantiation will first check for a .json file with the correct
data structure, and will then parse all the data into R as a FastpData
object. Fastp modules are contained as individual slots, which can be viewed
using slotNames
. Sub-modules are also contained within many larger modules
with modules being based on the sections within a fastp html report
Individual modules can be returned using the function getModule()
and specifying which module/sub-module is required. See getModule()
for
more details.
An object of class FastpData
Summary
Contains three submodules 1) Before_filtering, 2) After_filtering and 3) Filtering_result. All values presented in the initial table for individual fastp reports are contained in other sections of the report
Adapters
Contains a tibble with all data from this module
Duplication
Contains a tibble with all duplication results
Insert_size
Contains a tibble with all insert size estimates
Before_filtering,After_filtering
The modules can be selected for either Read1 or Read2
paired
logical(1) indicating whether the file is from paired-end sequencing
command
character(1) with the executed command
version
character(1) with the fastp version being used
path
Path to the Fastp report
The FastpDataList Object Class
FastpDataList(x)
FastpDataList(x)
x |
Character vector of file paths specifying paths to fastp.json.gz output |
An object of class FastpDataList
...
this can either be a single character vector of paths to fastp files, or several instances of .FastpFile objects
FastqcData(x)
FastqcData(x)
x |
Path to a single zip archive or extracted folder for a individual FastQC report. |
This object class is the main object required for generating plots
and tables. Instantiation will first test for a compressed file (or
extracted directory) with the correct data structure, and will then parse
all the data into R as a FastqcData
object. FastQC modules are
contained as individual slots, which can be viewed using slotNames
.
Individual modules can be returned using the function getModule()
and specifying which module is required. See getModule()
for
more details.
An object of class FastqcData
Summary
Summary of PASS/WARN/FAIL status for each module
Basic_Statistics
The Basic_Statstics table from the top of a FastQC html report
Per_base_sequence_quality
The underlying data from the Per_base_sequence_quality module
Per_sequence_quality_scores
The underlying data from the Per_sequence_quality_scores module
Per_base_sequence_content
The underlying data from the Per_base_sequence_content module
Per_sequence_GC_content
The underlying data from the Per_sequence_GC_content module
Per_base_N_content
The underlying data from the Per_base_N_content module
Sequence_Length_Distribution
The underlying data from the Sequence_Length_Distribution module
Sequence_Duplication_Levels
The underlying data from the Sequence_Duplication_Levels module
Overrepresented_sequences
The underlying data from the Overrepresented_sequences module
Adapter_Content
The underlying data from the Adapter_Content module
Kmer_Content
The underlying data from the Kmer_Content module
Total_Deduplicated_Percentage
Estimate taken from the plot data for Sequence_Duplication_Levels. Only included in later versions of FastQC
version
The version of FastQC used for generation of the report (if available)
path
Path to the FastQC report
# Get the files included with the package packageDir <- system.file("extdata", package = "ngsReports") fl <- list.files(packageDir, pattern = "fastqc.zip", full.names = TRUE)[1] # Load the FASTQC data as a FastqcData object fd <- FastqcData(fl) fd
# Get the files included with the package packageDir <- system.file("extdata", package = "ngsReports") fl <- list.files(packageDir, pattern = "fastqc.zip", full.names = TRUE)[1] # Load the FASTQC data as a FastqcData object fd <- FastqcData(fl) fd
The FastqcDataList Object Class
FastqcDataList(x)
FastqcDataList(x)
x |
Character vector of file paths specifying paths to FastQC reports |
An object of class FastqcDataList
...
this can either be a single character vector of paths to FASTQC files, or several instances of .FastqcFile objects
# Get the files included with the package packageDir <- system.file("extdata", package = "ngsReports") fl <- list.files(packageDir, pattern = "fastqc.zip", full.names = TRUE) # Load the FASTQC data as a FastqcDataList object fdl <- FastqcDataList(fl) fdl
# Get the files included with the package packageDir <- system.file("extdata", package = "ngsReports") fl <- list.files(packageDir, pattern = "fastqc.zip", full.names = TRUE) # Load the FASTQC data as a FastqcDataList object fdl <- FastqcDataList(fl) fdl
Get the FASTQC version used to generate the intial files
## S4 method for signature 'FastqcData' fqcVersion(object) ## S4 method for signature 'FastqcDataList' fqcVersion(object) ## S4 method for signature 'ANY' fqcVersion(object)
## S4 method for signature 'FastqcData' fqcVersion(object) ## S4 method for signature 'FastqcDataList' fqcVersion(object) ## S4 method for signature 'ANY' fqcVersion(object)
object |
An object of class |
A character vector (FastqcData), or tibble (FastqcDataList)
# Get the files included with the package packageDir <- system.file("extdata", package = "ngsReports") fl <- list.files(packageDir, pattern = "fastqc.zip", full.names = TRUE) # Load the FASTQC data as a FastqcDataList object fdl <- FastqcDataList(fl) # Get the FASTQC version fqcVersion(fdl)
# Get the files included with the package packageDir <- system.file("extdata", package = "ngsReports") fl <- list.files(packageDir, pattern = "fastqc.zip", full.names = TRUE) # Load the FASTQC data as a FastqcDataList object fdl <- FastqcDataList(fl) # Get the FASTQC version fqcVersion(fdl)
Return the Underlying Fastq File Names from Fastqc/Fastp Objects
fqName(object) ## S4 method for signature 'ANY' fqName(object) ## S4 method for signature 'FastqcData' fqName(object) ## S4 method for signature 'FastqcDataList' fqName(object) fqName(object) <- value ## S4 replacement method for signature 'FastqcData' fqName(object) <- value ## S4 replacement method for signature 'FastqcDataList' fqName(object) <- value ## S4 method for signature 'FastpData' fqName(object) ## S4 method for signature 'FastpDataList' fqName(object)
fqName(object) ## S4 method for signature 'ANY' fqName(object) ## S4 method for signature 'FastqcData' fqName(object) ## S4 method for signature 'FastqcDataList' fqName(object) fqName(object) <- value ## S4 replacement method for signature 'FastqcData' fqName(object) <- value ## S4 replacement method for signature 'FastqcDataList' fqName(object) <- value ## S4 method for signature 'FastpData' fqName(object) ## S4 method for signature 'FastpDataList' fqName(object)
object |
An object able to extract an Fastq name from |
value |
Replacement value for fqName |
Returns the names of the Fastq files the FastQC report was generated from, without any preceding directories.
# Get the files included with the package packageDir <- system.file("extdata", package = "ngsReports") fl <- list.files(packageDir, pattern = "fastqc.zip", full.names = TRUE) # Load the FASTQC data as a FastqcDataList object fdl <- FastqcDataList(fl) fqName(fdl) nm <- paste0(letters[seq_along(fdl)], ".fq") fqName(fdl) <- nm fqName(fdl)
# Get the files included with the package packageDir <- system.file("extdata", package = "ngsReports") fl <- list.files(packageDir, pattern = "fastqc.zip", full.names = TRUE) # Load the FASTQC data as a FastqcDataList object fdl <- FastqcDataList(fl) fqName(fdl) nm <- paste0(letters[seq_along(fdl)], ".fq") fqName(fdl) <- nm fqName(fdl)
List available genomes or transcriptomes in a TheoreticalGC object
gcAvail(object, type) ## S4 method for signature 'TheoreticalGC' gcAvail(object, type)
gcAvail(object, type) ## S4 method for signature 'TheoreticalGC' gcAvail(object, type)
object |
An object of class TheoreticalGC |
type |
character indicating either Genome or Transcriptome |
An object of class TheoreticalGC can hold the theoretical GC content for one
or more species, for either the genome or transriptome.
This function checks which species are available in the given object, for
either the genome or transcriptome, as supplied to the parameter type
.
A tibble
object
gcAvail(gcTheoretical, "Genome")
gcAvail(gcTheoretical, "Genome")
This object contains the theoretical GC content for each provided species, for both the genome and transcriptome, where available.
gcTheoretical
gcTheoretical
An object of class TheoreticalGC
of length 1.
The object is defined with the S4 class TheoreticalGC
.
Species for which information is available can be found using
the command gcAvail(gcTheoretical)
and selecting the appropriate type.
Metadata is accessible using mData(gcTheoretical)
.
All GC content was calculated using code from https://github.com/mikelove/fastqcTheoreticalGC using BSgenome packages. This provides a default set of GC content data for common organisms generated using 100bp reads/fragments and 1e6 reads.
gcAvail
## Check which genomes are included gcAvail(gcTheoretical, "Genome") ## Check which transcriptomes are included gcAvail(gcTheoretical, "Transcriptome")
## Check which genomes are included gcAvail(gcTheoretical, "Genome") ## Check which transcriptomes are included gcAvail(gcTheoretical, "Transcriptome")
Get and modify colours from objects of class PwfCols
## S4 method for signature 'PwfCols' getColours(object) ## S4 method for signature 'PwfCols' setColours(object, PASS, WARN, FAIL, MAX) ## S4 method for signature 'PwfCols' setAlpha(object, alpha)
## S4 method for signature 'PwfCols' getColours(object) ## S4 method for signature 'PwfCols' setColours(object, PASS, WARN, FAIL, MAX) ## S4 method for signature 'PwfCols' setAlpha(object, alpha)
object |
An object of class PwfCols |
PASS |
The colour denoting PASS on all plots, in rgb format |
WARN |
The colour denoting WARN on all plots, in rgb format |
FAIL |
The colour denoting FAIL on all plots, in rgb format |
MAX |
The colour denoting the limit of values in rgb format |
alpha |
Numeric(1). Ranges from 0 to 1 by default, but can also be on the range 0 to 255. |
Use getColours
to obtain the colours in an object of class PwfCols.
These can be modified using the functions setColours
and
setAlpha
getColours will return a character vector of colours coresponding to PASS/WARN/FAIL
setColours will return an object of class PwfCols
setAlpha will return an object of class PwfCols
getColours(pwf) # How to add transparency pwf2 <- setAlpha(pwf, 0.1) getColours(pwf2)
getColours(pwf) # How to add transparency pwf2 <- setAlpha(pwf, 0.1) getColours(pwf2)
Get the GC content data from a TheoreticalGC object
getGC(object, name, type) ## S4 method for signature 'ANY' getGC(object, type) ## S4 method for signature 'TheoreticalGC' getGC(object, name, type)
getGC(object, name, type) ## S4 method for signature 'ANY' getGC(object, type) ## S4 method for signature 'TheoreticalGC' getGC(object, name, type)
object |
An object of class Theoretical GC |
name |
The Name of the species in 'Gspecies' format, e.g. Hsapiens |
type |
The type of GC content. Can only be either "Genome" or "Transcriptome" |
A tibble
object
getGC(gcTheoretical, name = "Hsapiens", type = "Genome")
getGC(gcTheoretical, name = "Hsapiens", type = "Genome")
Retrieve a specific module from a Fastqc* object as a data.frame
## S4 method for signature 'FastqcData' getModule(object, module) ## S4 method for signature 'FastqcDataList' getModule(object, module) ## S4 method for signature 'ANY' getModule(object, module) ## S4 method for signature 'FastpData' getModule(object, module) ## S4 method for signature 'FastpDataList' getModule(object, module)
## S4 method for signature 'FastqcData' getModule(object, module) ## S4 method for signature 'FastqcDataList' getModule(object, module) ## S4 method for signature 'ANY' getModule(object, module) ## S4 method for signature 'FastpData' getModule(object, module) ## S4 method for signature 'FastpDataList' getModule(object, module)
object |
Can be a |
module |
The requested module as contained in a FastQC report. Possible
values are |
This function will return a given module from a Fastqc* object as a data.frame. Note that each module will be it's own unique structure, although all will return a data.frame
A single tibble
containing module-level information
from all FastQC reports contained in the Fastqc* object.
# Get the files included with the package packageDir <- system.file("extdata", package = "ngsReports") fl <- list.files(packageDir, pattern = "fastqc.zip", full.names = TRUE) # Load the FASTQC data as a FastqcDataList object fdl <- FastqcDataList(fl) # Extract the Summary module, which corresponds to the PASS/WARN/FAIL flags getModule(fdl, "Summary") # The Basic_Statistics module corresponds to the table at the top of each # FastQC report getModule(fdl, "Basic_Statistics")
# Get the files included with the package packageDir <- system.file("extdata", package = "ngsReports") fl <- list.files(packageDir, pattern = "fastqc.zip", full.names = TRUE) # Load the FASTQC data as a FastqcDataList object fdl <- FastqcDataList(fl) # Extract the Summary module, which corresponds to the PASS/WARN/FAIL flags getModule(fdl, "Summary") # The Basic_Statistics module corresponds to the table at the top of each # FastQC report getModule(fdl, "Basic_Statistics")
Read the information from the summary.txt
files in each
.FastqcFile
## S4 method for signature '.FastqcFile' getSummary(object) ## S4 method for signature 'ANY' getSummary(object) ## S4 method for signature 'FastqcData' getSummary(object) ## S4 method for signature 'FastqcDataList' getSummary(object)
## S4 method for signature '.FastqcFile' getSummary(object) ## S4 method for signature 'ANY' getSummary(object) ## S4 method for signature 'FastqcData' getSummary(object) ## S4 method for signature 'FastqcDataList' getSummary(object)
object |
Can be a |
This simply extracts the summary of PASS/WARN/FAIL status for every module as defined by the tool FastQC for each supplied file.
A tibble
containing the PASS/WARN/FAIL status for each
module, as defined in a FastQC report.
# Get the files included with the package packageDir <- system.file("extdata", package = "ngsReports") fl <- list.files(packageDir, pattern = "fastqc.zip", full.names = TRUE) # Load the FASTQC data as a FastqcDataList object fdl <- FastqcDataList(fl) # Return a tibble/tibble with the raw information getSummary(fdl)
# Get the files included with the package packageDir <- system.file("extdata", package = "ngsReports") fl <- list.files(packageDir, pattern = "fastqc.zip", full.names = TRUE) # Load the FASTQC data as a FastqcDataList object fdl <- FastqcDataList(fl) # Return a tibble/tibble with the raw information getSummary(fdl)
Imports NGS-related log files such as those generated from stderr.
importNgsLogs(x, type = "auto", which, stripPaths = TRUE)
importNgsLogs(x, type = "auto", which, stripPaths = TRUE)
x |
|
type |
|
which |
Which element of the parsed object to return. Ignored in all
file types except when |
stripPaths |
logical(1). Remove paths from the Filename column |
Imports one or more log files as output by tools such as:
bowtie
, bowtie2
, featureCounts
, Hisat2
, STAR
, salmon
picard MarkDuplicates
, cutadapt
, flagstat
, macs2Callpeak
,
Adapter Removal
, trimmomatic
, rnaseqcMetrics
, quast
or busco
.
autoDetect
can be used to detect the log type by parsing the file.
The featureCounts log file corresponds to the counts.out.summary
,
not the main counts.out
file.
Whilst most log files return a single tibble, some are more complex with multiple modules.
adapterRemoval
can return one of four modules (which = 1:4),.
When calling by name, the possible values are sequences, settings,
statistics or distribution.
Partial matching is implemented.
cutadapt
can return one of five modules (which = 1:5).
When calling by name the possible modules are summary, adapter1, adapter2,
adapter3 or overview.
Note that adapter2/3 may be missing from these files depending on the nature
of your data.
If cutadapt log files are obtained using report=minimal
, all supplied
log files must be of this format and no modules can be returned.
duplicationMetrics
will return either the metrics of histogram.
These can be requested by setting which as 1 or 2, or naming either module.
A tibble
.
Column names are broadly similar to the text in supplied files,
but have been modified for easier handling under R naming conventions.
f <- c("bowtiePE.txt", "bowtieSE.txt") bowtieLogs <- system.file("extdata", f, package = "ngsReports") df <- importNgsLogs(bowtieLogs, type = "bowtie")
f <- c("bowtiePE.txt", "bowtieSE.txt") bowtieLogs <- system.file("extdata", f, package = "ngsReports") df <- importNgsLogs(bowtieLogs, type = "bowtie")
Import the SJ.out.tab files produced by STAR
importSJ(x, stripPaths = TRUE)
importSJ(x, stripPaths = TRUE)
x |
vector of file paths to SJ.out.tab files |
stripPaths |
logical(1) Remove directory prefixes from the file paths in x |
Imports one or more splice-junction output files as produced by STAR. If all are located in separated directories with identical names, be sure to set the argument stripPaths = FALSE
All co-ordinates are 1-based, in keeping with the STAR manual
A tibble
Stephen Pederson [email protected]
sjFiles <- system.file("extdata", "SJ.out.tab", package = "ngsReports") # Import leaving the complete file path in the column Filename # The argument srtipPaths is set as TRUE by default df <- importSJ(sjFiles, stripPaths = FALSE)
sjFiles <- system.file("extdata", "SJ.out.tab", package = "ngsReports") # Import leaving the complete file path in the column Filename # The argument srtipPaths is set as TRUE by default df <- importSJ(sjFiles, stripPaths = FALSE)
Check to see if a file, or vector of files is compressed
isCompressed(path, type = c("zip", "gzip"), verbose = FALSE)
isCompressed(path, type = c("zip", "gzip"), verbose = FALSE)
path |
The path to one or more files |
type |
The type of compression to check for. Currently only ZIP/GZIP files have been implemented. |
verbose |
logical/integer Determine the level of output to show as messages |
Reads the first four bytes from the local file header.
If the file is a .ZIP file, this should match the magic number
PK\003\004
.
This function assumes that the first thing in a zip archive is the .ZIP entry with the local file header signature. ZIP files containing a self-extracting archive may not exhibit this structure and will return FALSE
A logical
vector
# Get the files included with the package fileDir <- system.file("extdata", package = "ngsReports") allFiles <- list.files(fileDir, pattern = "zip$", full.names = TRUE) isCompressed(allFiles)
# Get the files included with the package fileDir <- system.file("extdata", package = "ngsReports") allFiles <- list.files(fileDir, pattern = "zip$", full.names = TRUE) isCompressed(allFiles)
Get the maximum Adapter Content across one or more FASTQC reports
maxAdapterContent(x, asPercent = TRUE)
maxAdapterContent(x, asPercent = TRUE)
x |
Can be a |
asPercent |
|
This will extract the Adapter_Content
module from the
supplied object, and provide a tibble
with the final value for each
file.
A tibble
object containing the percent of reads with each
adapter type at the final position
# Get the files included with the package packageDir <- system.file("extdata", package = "ngsReports") fl <- list.files(packageDir, pattern = "fastqc.zip", full.names = TRUE) # Load the FASTQC data as a FastqcDataList object fdl <- FastqcDataList(fl) # Get the maxAdapterContent maxAdapterContent(fdl)
# Get the files included with the package packageDir <- system.file("extdata", package = "ngsReports") fl <- list.files(packageDir, pattern = "fastqc.zip", full.names = TRUE) # Load the FASTQC data as a FastqcDataList object fdl <- FastqcDataList(fl) # Get the maxAdapterContent maxAdapterContent(fdl)
Extract Metadata for TheoreticalGC objects
mData(object) ## S4 method for signature 'TheoreticalGC' mData(object)
mData(object) ## S4 method for signature 'TheoreticalGC' mData(object)
object |
An object of class Theoretical GC |
A tibble
object
mData(gcTheoretical)
mData(gcTheoretical)
Output overrepresented sequences to disk in fasta format.
overRep2Fasta(x, path, n = 10, labels, noAdapters = TRUE, ...) ## S4 method for signature 'ANY' overRep2Fasta(x, path, n = 10, labels, noAdapters = TRUE, ...) ## S4 method for signature 'FastqcData' overRep2Fasta(x, path, n = 10, labels, noAdapters = TRUE, ...) ## S4 method for signature 'FastqcDataList' overRep2Fasta(x, path, n = 10, labels, noAdapters = TRUE, ...)
overRep2Fasta(x, path, n = 10, labels, noAdapters = TRUE, ...) ## S4 method for signature 'ANY' overRep2Fasta(x, path, n = 10, labels, noAdapters = TRUE, ...) ## S4 method for signature 'FastqcData' overRep2Fasta(x, path, n = 10, labels, noAdapters = TRUE, ...) ## S4 method for signature 'FastqcDataList' overRep2Fasta(x, path, n = 10, labels, noAdapters = TRUE, ...)
x |
Can be a |
path |
Path to export the fasta file to. Reverts to a default in the working directory if not supplied |
n |
The number of sequences to output |
labels |
An optional named factor of labels for the file names. All filenames must be present in the names. File extensions are dropped by default. |
noAdapters |
logical. Remove any sequences identified as possible adapters or primers by FastQC |
... |
Used to pass any alternative patterns to remove from the end of filenames |
Fasta will contain Filename
, Possible Source
,
Percent of total reads
Exports to a fasta file, and returns the fasta information invisibly
# Get the files included with the package packageDir <- system.file("extdata", package = "ngsReports") fl <- list.files(packageDir, pattern = "fastqc.zip", full.names = TRUE) # Load the FASTQC data as a FastqcDataList object fdl <- FastqcDataList(fl) # Export the top10 Overrepresented Sequences as a single fasta file faOut <- file.path(tempdir(), "top10.fa") overRep2Fasta(fdl, path = faOut)
# Get the files included with the package packageDir <- system.file("extdata", package = "ngsReports") fl <- list.files(packageDir, pattern = "fastqc.zip", full.names = TRUE) # Load the FASTQC data as a FastqcDataList object fdl <- FastqcDataList(fl) # Export the top10 Overrepresented Sequences as a single fasta file faOut <- file.path(tempdir(), "top10.fa") overRep2Fasta(fdl, path = faOut)
Return the File Paths from an object
## S4 method for signature '.FastqcFile' path(object) ## S4 method for signature 'FastqcData' path(object) ## S4 method for signature 'FastqcDataList' path(object) ## S4 method for signature '.FastpFile' path(object) ## S4 method for signature 'FastpData' path(object) ## S4 method for signature 'FastpDataList' path(object)
## S4 method for signature '.FastqcFile' path(object) ## S4 method for signature 'FastqcData' path(object) ## S4 method for signature 'FastqcDataList' path(object) ## S4 method for signature '.FastpFile' path(object) ## S4 method for signature 'FastpData' path(object) ## S4 method for signature 'FastpDataList' path(object)
object |
An object of class .FastqcFile |
Obtains the file.path for objects of multiple classes
A character vector of the file paths to the underlying FastQC reports
# Get the files included with the package packageDir <- system.file("extdata", package = "ngsReports") fl <- list.files(packageDir, pattern = "fastqc.zip", full.names = TRUE) # Load the FASTQC data as a FastqcDataList object fdl <- FastqcDataList(fl) path(fdl)
# Get the files included with the package packageDir <- system.file("extdata", package = "ngsReports") fl <- list.files(packageDir, pattern = "fastqc.zip", full.names = TRUE) # Load the FASTQC data as a FastqcDataList object fdl <- FastqcDataList(fl) path(fdl)
Draw an Adapter Content Plot across one or more FASTQC reports
plotAdapterContent( x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", ... ) ## S4 method for signature 'ANY' plotAdapterContent( x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", ... ) ## S4 method for signature 'FastqcData' plotAdapterContent( x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", pwfCols, showPwf = TRUE, warn = 5, fail = 10, scaleColour = NULL, plotlyLegend = FALSE, ... ) ## S4 method for signature 'FastqcDataList' plotAdapterContent( x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", pwfCols, showPwf = TRUE, warn = 5, fail = 10, plotType = c("heatmap", "line"), adapterType = "Total", cluster = FALSE, dendrogram = FALSE, heat_w = 8L, scaleFill = NULL, scaleColour = NULL, plotlyLegend = FALSE, ... ) ## S4 method for signature 'FastpData' plotAdapterContent( x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", scaleFill = NULL, plotlyLegend = FALSE, plotTheme = theme_get(), ... ) ## S4 method for signature 'FastpDataList' plotAdapterContent( x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", pwfCols, showPwf = FALSE, warn = 5, fail = 10, cluster = FALSE, dendrogram = FALSE, scaleFill = NULL, plotTheme = theme_get(), heat_w = 8L, ... )
plotAdapterContent( x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", ... ) ## S4 method for signature 'ANY' plotAdapterContent( x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", ... ) ## S4 method for signature 'FastqcData' plotAdapterContent( x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", pwfCols, showPwf = TRUE, warn = 5, fail = 10, scaleColour = NULL, plotlyLegend = FALSE, ... ) ## S4 method for signature 'FastqcDataList' plotAdapterContent( x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", pwfCols, showPwf = TRUE, warn = 5, fail = 10, plotType = c("heatmap", "line"), adapterType = "Total", cluster = FALSE, dendrogram = FALSE, heat_w = 8L, scaleFill = NULL, scaleColour = NULL, plotlyLegend = FALSE, ... ) ## S4 method for signature 'FastpData' plotAdapterContent( x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", scaleFill = NULL, plotlyLegend = FALSE, plotTheme = theme_get(), ... ) ## S4 method for signature 'FastpDataList' plotAdapterContent( x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", pwfCols, showPwf = FALSE, warn = 5, fail = 10, cluster = FALSE, dendrogram = FALSE, scaleFill = NULL, plotTheme = theme_get(), heat_w = 8L, ... )
x |
Can be a |
usePlotly |
|
labels |
An optional named vector of labels for the file names. All filenames must be present in the names. |
pattern |
regex used to trim the ends of all filenames for plotting |
... |
Used to pass additional attributes to theme() for FastQC objects and geoms for Fastp objects |
pwfCols |
Object of class |
showPwf |
logical(1) Show PASS/WARN/FAIL status as would be included in a standard FastQC report |
warn , fail
|
The default values for warn and fail are 5 and 10 respectively (i.e. percentages) |
plotlyLegend |
logical(1) Show legend when choosing interactive plots. Ignored for heatmaps |
plotType |
|
adapterType |
A regular expression matching the adapter(s) to be
plotted. To plot all adapters summed, specify |
cluster |
|
dendrogram |
|
heat_w |
Width of the heatmap relative to other plot components |
scaleFill , scaleColour
|
scale_fill\* and scale_colour_\* objects |
plotTheme |
Set theme elements by passing a theme |
This extracts the Adapter_Content module from the supplied object and generates a ggplot2 object, with a set of minimal defaults. The output of this function can be further modified using the standard ggplot2 methods.
When x
is a single or FastqcData object line plots will always be
drawn for all adapters.
Otherwise, users can select line plots or heatmaps.
When plotting more than one fastqc file, any undetected adapters will not be
shown.
An interactive version of the plot can be made by setting usePlotly
as TRUE
A standard ggplot2 object, or an interactive plotly object
# Get the files included with the package packageDir <- system.file("extdata", package = "ngsReports") fl <- list.files(packageDir, pattern = "fastqc.zip", full.names = TRUE) # Load the FASTQC data as a FastqcDataList object fdl <- FastqcDataList(fl) # The default plot plotAdapterContent(fdl) # Also subset the reads to just the R1 files r1 <- grepl("R1", fqName(fdl)) plotAdapterContent(fdl[r1]) # Plot just the Universal Adapter # and change the y-axis using ggplot2::scale_y_continuous plotAdapterContent(fdl, adapterType ="Illumina_Universal", plotType = "line") + facet_wrap(~Filename) + guides(colour = "none") # For FastpData object, the plots are slightly different fp <- FastpData(system.file("extdata/fastp.json.gz", package = "ngsReports")) plotAdapterContent(fp, scaleFill = scale_fill_brewer(palette = "Set1"))
# Get the files included with the package packageDir <- system.file("extdata", package = "ngsReports") fl <- list.files(packageDir, pattern = "fastqc.zip", full.names = TRUE) # Load the FASTQC data as a FastqcDataList object fdl <- FastqcDataList(fl) # The default plot plotAdapterContent(fdl) # Also subset the reads to just the R1 files r1 <- grepl("R1", fqName(fdl)) plotAdapterContent(fdl[r1]) # Plot just the Universal Adapter # and change the y-axis using ggplot2::scale_y_continuous plotAdapterContent(fdl, adapterType ="Illumina_Universal", plotType = "line") + facet_wrap(~Filename) + guides(colour = "none") # For FastpData object, the plots are slightly different fp <- FastpData(system.file("extdata/fastp.json.gz", package = "ngsReports")) plotAdapterContent(fp, scaleFill = scale_fill_brewer(palette = "Set1"))
Plot a summary of alignments from a set of log files
plotAlignmentSummary( x, type = c("star", "bowtie", "bowtie2", "hisat2"), usePlotly = FALSE, stripPaths = TRUE, asPercent = FALSE, ..., fill = c("red", "yellow", "blue", rgb(0, 0.5, 1)) )
plotAlignmentSummary( x, type = c("star", "bowtie", "bowtie2", "hisat2"), usePlotly = FALSE, stripPaths = TRUE, asPercent = FALSE, ..., fill = c("red", "yellow", "blue", rgb(0, 0.5, 1)) )
x |
Paths to one or more alignment log files |
type |
The aligner used. Can be one of star, bowtie, bowtie2 or hisat2 |
usePlotly |
logical. If TRUE an interactive plot will be generated. |
stripPaths |
logical(1). Remove paths from the Filename column |
asPercent |
Show alignments as percentages, with the alternative (FALSE) being the total number of reads If FALSE a ggplot object will be output |
... |
Used to pass additional attributes to theme() and between methods |
fill |
Colours used to fill the bars. Passed to scale_fill_manual. |
Loads a set of alignment log files and creates a default plot.
Implemented aligners are bowtie
, bowtie2
, Hisat2
and
STAR
.
A ggplot2 object, or a plotly object
f <- c("bowtie2PE.txt", "bowtie2SE.txt") bowtie2Logs <- system.file("extdata", f, package = "ngsReports") plotAlignmentSummary(bowtie2Logs, "bowtie2")
f <- c("bowtie2PE.txt", "bowtie2SE.txt") bowtie2Logs <- system.file("extdata", f, package = "ngsReports") plotAlignmentSummary(bowtie2Logs, "bowtie2")
Plot a summary of assembly stats from a set of log files
plotAssemblyStats( x, type = c("quast", "busco"), usePlotly = FALSE, plotType = c("bar", "paracoord"), ... )
plotAssemblyStats( x, type = c("quast", "busco"), usePlotly = FALSE, plotType = c("bar", "paracoord"), ... )
x |
Paths to one or more log files |
type |
The tool used. Can be one of quast or busco |
usePlotly |
logical. If TRUE an interactive plot will be generated. If FALSE a ggplot object will be output |
plotType |
|
... |
Used to pass additional attributes to theme() and between methods |
Loads a set of assembly log files and creates a default plot.
Implemented tools are quast
and BUSCO
.
quast will plot a parralel coordinate plot of some assembly statistics
BUSCO will plot a stacked barplot of completeness statistics
A ggplot2 object, or a plotly object
#getquast log filenames quastFiles <- system.file("extdata", c("quast1.tsv", "quast2.tsv"), package = "ngsReports") # The default plot plotAssemblyStats(quastFiles)
#getquast log filenames quastFiles <- system.file("extdata", c("quast1.tsv", "quast2.tsv"), package = "ngsReports") # The default plot plotAssemblyStats(quastFiles)
Plot the Base Qualities for each file as separate plots
plotBaseQuals(x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", ...) ## S4 method for signature 'ANY' plotBaseQuals(x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", ...) ## S4 method for signature 'FastqcData' plotBaseQuals( x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", pwfCols, warn = 25, fail = 20, boxWidth = 0.8, showPwf = TRUE, plotlyLegend = FALSE, ... ) ## S4 method for signature 'FastqcDataList' plotBaseQuals( x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", pwfCols, warn = 25, fail = 20, showPwf = TRUE, boxWidth = 0.8, plotType = c("heatmap", "boxplot"), plotValue = "Mean", cluster = FALSE, dendrogram = FALSE, nc = 2, heat_w = 8L, ... ) ## S4 method for signature 'FastpData' plotBaseQuals( x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", pwfCols, warn = 25, fail = 20, showPwf = FALSE, module = c("Before_filtering", "After_filtering"), reads = c("read1", "read2"), readsBy = c("facet", "linetype"), bases = c("A", "T", "C", "G", "mean"), scaleColour = NULL, plotTheme = theme_get(), plotlyLegend = FALSE, ... ) ## S4 method for signature 'FastpDataList' plotBaseQuals( x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", pwfCols, warn = 25, fail = 20, showPwf = FALSE, module = c("Before_filtering", "After_filtering"), plotType = "heatmap", plotValue = c("mean", "A", "T", "C", "G"), scaleFill = NULL, plotTheme = theme_get(), cluster = FALSE, dendrogram = FALSE, heat_w = 8L, ... )
plotBaseQuals(x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", ...) ## S4 method for signature 'ANY' plotBaseQuals(x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", ...) ## S4 method for signature 'FastqcData' plotBaseQuals( x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", pwfCols, warn = 25, fail = 20, boxWidth = 0.8, showPwf = TRUE, plotlyLegend = FALSE, ... ) ## S4 method for signature 'FastqcDataList' plotBaseQuals( x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", pwfCols, warn = 25, fail = 20, showPwf = TRUE, boxWidth = 0.8, plotType = c("heatmap", "boxplot"), plotValue = "Mean", cluster = FALSE, dendrogram = FALSE, nc = 2, heat_w = 8L, ... ) ## S4 method for signature 'FastpData' plotBaseQuals( x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", pwfCols, warn = 25, fail = 20, showPwf = FALSE, module = c("Before_filtering", "After_filtering"), reads = c("read1", "read2"), readsBy = c("facet", "linetype"), bases = c("A", "T", "C", "G", "mean"), scaleColour = NULL, plotTheme = theme_get(), plotlyLegend = FALSE, ... ) ## S4 method for signature 'FastpDataList' plotBaseQuals( x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", pwfCols, warn = 25, fail = 20, showPwf = FALSE, module = c("Before_filtering", "After_filtering"), plotType = "heatmap", plotValue = c("mean", "A", "T", "C", "G"), scaleFill = NULL, plotTheme = theme_get(), cluster = FALSE, dendrogram = FALSE, heat_w = 8L, ... )
x |
Can be a |
usePlotly |
|
labels |
An optional named vector of labels for the file names. All filenames must be present in the names. |
pattern |
Regex to remove from the end of the Fastp report and Fastq file names |
... |
Used to pass additional attributes to theme() and between methods |
pwfCols |
Object of class |
warn , fail
|
The default values for warn and fail are 30 and 20 respectively (i.e. percentages) |
boxWidth |
set the width of boxes when using a boxplot |
showPwf |
Include the Pwf status colours |
plotlyLegend |
logical(1) Show legend for interactive plots. Only called when drawing line plots |
plotType |
|
plotValue |
|
cluster |
|
dendrogram |
|
nc |
|
heat_w |
Relative width of any heatmap plot components |
module |
Select Before and After filtering when using a FastpDataList |
reads |
Create plots for read1, read2 or all when using a FastpDataList |
readsBy |
If paired reads are present, separate using either linetype or by facet |
bases |
Which bases to include on the plot |
scaleColour |
ggplot discrete colour scale, passed to lines |
plotTheme |
theme object |
scaleFill |
ggplot2 continuous scale. Passed to heatmap cells |
When acting on a FastqcDataList
, this defaults to a heatmap
using the mean Per_base_sequence_quality score. A set of plots which
replicate those obtained through a standard FastQC html report can be
obtained by setting plotType = "boxplot"
, which uses facet_wrap
to provide the layout as a single ggplot object.
When acting an a FastqcData
object, this replicates the
Per base sequence quality
plots from FastQC with no faceting.
For large datasets, subsetting by R1 or R2 reads may be helpful.
An interactive plot can be obtained by setting usePlotly = TRUE
.
A standard ggplot2 object or an interactive plotly object
# Get the files included with the package packageDir <- system.file("extdata", package = "ngsReports") fl <- list.files(packageDir, pattern = "fastqc.zip", full.names = TRUE) # Load the FASTQC data as a FastqcDataList object fdl <- FastqcDataList(fl) # The default plot for multiple libraries is a heatmap plotBaseQuals(fdl) # The default plot for a single library is the standard boxplot plotBaseQuals(fdl[[1]]) # FastpData objects have qyalities by base fp <- FastpData(system.file("extdata/fastp.json.gz", package = "ngsReports")) plotBaseQuals( fp, plotTheme = theme(plot.title = element_text(hjust = 0.5)) )
# Get the files included with the package packageDir <- system.file("extdata", package = "ngsReports") fl <- list.files(packageDir, pattern = "fastqc.zip", full.names = TRUE) # Load the FASTQC data as a FastqcDataList object fdl <- FastqcDataList(fl) # The default plot for multiple libraries is a heatmap plotBaseQuals(fdl) # The default plot for a single library is the standard boxplot plotBaseQuals(fdl[[1]]) # FastpData objects have qyalities by base fp <- FastpData(system.file("extdata/fastp.json.gz", package = "ngsReports")) plotBaseQuals( fp, plotTheme = theme(plot.title = element_text(hjust = 0.5)) )
Plot the Sequence_Duplication_Levels information for a set of FASTQC reports
plotDupLevels(x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", ...) ## S4 method for signature 'ANY' plotDupLevels(x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", ...) ## S4 method for signature 'FastqcData' plotDupLevels( x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", pwfCols, warn = 20, fail = 50, showPwf = TRUE, plotlyLegend = FALSE, lineCol = c("red", "blue"), lineWidth = 1, ... ) ## S4 method for signature 'FastqcDataList' plotDupLevels( x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", pwfCols, warn = 20, fail = 50, showPwf = TRUE, plotlyLegend = FALSE, deduplication = c("pre", "post"), plotType = c("heatmap", "line"), cluster = FALSE, dendrogram = FALSE, heatCol = hcl.colors(50, "inferno"), heat_w = 8, ... ) ## S4 method for signature 'FastpData' plotDupLevels( x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", pwfCols, warn = 20, fail = 50, showPwf = FALSE, maxLevel = 10, lineCol = "red", barFill = "dodgerblue4", barCol = barFill, plotlyLegend = FALSE, plotTheme = theme_get(), ... ) ## S4 method for signature 'FastpDataList' plotDupLevels( x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", pwfCols, warn = 20, fail = 50, showPwf = FALSE, plotlyLegend = FALSE, plotType = c("bar", "heatmap"), barFill = "blue", barCol = "blue", cluster = FALSE, dendrogram = FALSE, scaleFill = NULL, plotTheme = theme_get(), heat_w = 8, maxLevel = 10, ... )
plotDupLevels(x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", ...) ## S4 method for signature 'ANY' plotDupLevels(x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", ...) ## S4 method for signature 'FastqcData' plotDupLevels( x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", pwfCols, warn = 20, fail = 50, showPwf = TRUE, plotlyLegend = FALSE, lineCol = c("red", "blue"), lineWidth = 1, ... ) ## S4 method for signature 'FastqcDataList' plotDupLevels( x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", pwfCols, warn = 20, fail = 50, showPwf = TRUE, plotlyLegend = FALSE, deduplication = c("pre", "post"), plotType = c("heatmap", "line"), cluster = FALSE, dendrogram = FALSE, heatCol = hcl.colors(50, "inferno"), heat_w = 8, ... ) ## S4 method for signature 'FastpData' plotDupLevels( x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", pwfCols, warn = 20, fail = 50, showPwf = FALSE, maxLevel = 10, lineCol = "red", barFill = "dodgerblue4", barCol = barFill, plotlyLegend = FALSE, plotTheme = theme_get(), ... ) ## S4 method for signature 'FastpDataList' plotDupLevels( x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", pwfCols, warn = 20, fail = 50, showPwf = FALSE, plotlyLegend = FALSE, plotType = c("bar", "heatmap"), barFill = "blue", barCol = "blue", cluster = FALSE, dendrogram = FALSE, scaleFill = NULL, plotTheme = theme_get(), heat_w = 8, maxLevel = 10, ... )
x |
Can be a |
usePlotly |
|
labels |
An optional named vector of labels for the file names. All filenames must be present in the names. File extensions are dropped by default. |
pattern |
regex to remove from the end of fastp & fastq file names |
... |
Used to pass additional attributes to theme() and between methods |
pwfCols |
Object of class |
warn , fail
|
The default values for warn and fail are 20 and 50 respectively (i.e. percentages) |
showPwf |
logical(1) Show PWF rectangles in the background |
plotlyLegend |
logical(1) Show legend for line plots when using interactive plots |
lineCol , lineWidth
|
Colours and width of lines drawn |
deduplication |
Plot Duplication levels 'pre' or 'post' deduplication. Can only take values "pre" and "post" |
plotType |
Choose between "heatmap" and "line" |
cluster |
|
dendrogram |
|
heatCol |
Colour palette used for the heatmap |
heat_w |
Relative width of the heatmap relative to other plot components |
maxLevel |
The maximum duplication level to plot. Beyond this level, all values will be summed |
barFill , barCol
|
Colours for bars when calling geom_col() |
plotTheme |
theme object. Applied after a call to theme_bw() |
scaleFill |
Discrete scale used to fill heatmap cells |
This extracts the Sequence_Duplication_Levels from the supplied object and generates a ggplot2 object, with a set of minimal defaults. For multiple reports, this defaults to a heatmap with block sizes proportional to the percentage of reads belonging to that duplication category.
If setting usePlotly = FALSE
, the output of this function can be
further modified using standard ggplot2 syntax. If setting
usePlotly = TRUE
an interactive plotly object will be produced.
A standard ggplot2 or plotly object
# Get the files included with the package packageDir <- system.file("extdata", package = "ngsReports") fl <- list.files(packageDir, pattern = "fastqc.zip", full.names = TRUE) # Load the FASTQC data as a FastqcDataList object fdl <- FastqcDataList(fl) # Draw the default plot for a single file plotDupLevels(fdl[[1]]) plotDupLevels(fdl)
# Get the files included with the package packageDir <- system.file("extdata", package = "ngsReports") fl <- list.files(packageDir, pattern = "fastqc.zip", full.names = TRUE) # Load the FASTQC data as a FastqcDataList object fdl <- FastqcDataList(fl) # Draw the default plot for a single file plotDupLevels(fdl[[1]]) plotDupLevels(fdl)
Draw a PCA plot for Fast QC modules across multiple samples
plotFastqcPCA( x, module = "Per_sequence_GC_content", usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", sz = 4, groups, ... ) ## S4 method for signature 'ANY' plotFastqcPCA( x, module = "Per_sequence_GC_content", usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", sz = 4, groups, ... ) ## S4 method for signature 'character' plotFastqcPCA( x, module = "Per_sequence_GC_content", usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", sz = 4, groups, ... ) ## S4 method for signature 'FastqcDataList' plotFastqcPCA( x, module = "Per_sequence_GC_content", usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", sz = 4, groups, pc = c(1, 2), ... )
plotFastqcPCA( x, module = "Per_sequence_GC_content", usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", sz = 4, groups, ... ) ## S4 method for signature 'ANY' plotFastqcPCA( x, module = "Per_sequence_GC_content", usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", sz = 4, groups, ... ) ## S4 method for signature 'character' plotFastqcPCA( x, module = "Per_sequence_GC_content", usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", sz = 4, groups, ... ) ## S4 method for signature 'FastqcDataList' plotFastqcPCA( x, module = "Per_sequence_GC_content", usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", sz = 4, groups, pc = c(1, 2), ... )
x |
Can be a |
module |
|
usePlotly |
|
labels |
An optional named vector of labels for the file names. All file names must be present in the names of the vector. |
pattern |
Regex to remove from the end of any filenames |
sz |
The size of the text labels |
groups |
Optional factor of the same length as x. If provided, the plot will be coloured using this factor as the defined groups. Ellipses will also be added to the final plot. |
... |
Used to pass additional attributes to theme() and between methods |
pc |
The two components to be plotted |
This carries out PCA on a single FastQC module and plots the output using either ggplot or plotly. Current modules for PCA are Per_base_sequence_quality, Per_sequence_quality_scores, Per_sequence_GC_content, Per_base_sequence_content, and Sequence_Length_Distribution.
If a factor is provided in the groups argument, this will be applied to the
plotting colours and ellipses will be drawn using these groups.
Only the labels will be plotted using geom_text()
A standard ggplot2 object, or an interactive plotly object
# Get the files included with the package packageDir <- system.file("extdata", package = "ngsReports") fl <- list.files(packageDir, pattern = "fastqc.zip", full.names = TRUE) # Load the FASTQC data as a FastqcDataList object fdl <- FastqcDataList(fl) grp <- as.factor(gsub(".+(R[12]).*", "\\1", fqName(fdl))) plotFastqcPCA(fdl, module = "Per_sequence_GC_content", groups = grp)
# Get the files included with the package packageDir <- system.file("extdata", package = "ngsReports") fl <- list.files(packageDir, pattern = "fastqc.zip", full.names = TRUE) # Load the FASTQC data as a FastqcDataList object fdl <- FastqcDataList(fl) grp <- as.factor(gsub(".+(R[12]).*", "\\1", fqName(fdl))) plotFastqcPCA(fdl, module = "Per_sequence_GC_content", groups = grp)
Plot the Per Sequence GC Content for a set of FASTQC files
plotGcContent(x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", ...) ## S4 method for signature 'ANY' plotGcContent(x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", ...) ## S4 method for signature 'FastqcData' plotGcContent( x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", theoreticalGC = TRUE, gcType = c("Genome", "Transcriptome"), species = "Hsapiens", GCobject, plotlyLegend = FALSE, Fastafile, n = 1e+06, counts = FALSE, scaleColour = NULL, lineCols = c("red3", "black"), linetype = 1, linewidth = 0.5, ... ) ## S4 method for signature 'FastqcDataList' plotGcContent( x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", theoreticalGC = TRUE, gcType = c("Genome", "Transcriptome"), species = "Hsapiens", GCobject, Fastafile, n = 1e+06, plotType = c("heatmap", "line", "cdf"), cluster = FALSE, dendrogram = FALSE, heat_w = 8, pwfCols, showPwf = TRUE, scaleFill = NULL, scaleColour = NULL, plotlyLegend = FALSE, lineCols = RColorBrewer::brewer.pal(12, "Paired"), linetype = 1, linewidth = 0.5, ... ) ## S4 method for signature 'FastpData' plotGcContent( x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", theoreticalGC = TRUE, gcType = c("Genome", "Transcriptome"), species = "Hsapiens", GCobject, Fastafile, n = 1e+06, plotType = "bar", scaleFill = NULL, plotlyLegend = FALSE, plotTheme = theme_get(), ... ) ## S4 method for signature 'FastpDataList' plotGcContent( x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", theoreticalGC = TRUE, gcType = c("Genome", "Transcriptome"), species = "Hsapiens", GCobject, Fastafile, n = 1e+06, plotType = "bar", scaleFill = NULL, plotTheme = theme_get(), plotlyLegend = FALSE, ... )
plotGcContent(x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", ...) ## S4 method for signature 'ANY' plotGcContent(x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", ...) ## S4 method for signature 'FastqcData' plotGcContent( x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", theoreticalGC = TRUE, gcType = c("Genome", "Transcriptome"), species = "Hsapiens", GCobject, plotlyLegend = FALSE, Fastafile, n = 1e+06, counts = FALSE, scaleColour = NULL, lineCols = c("red3", "black"), linetype = 1, linewidth = 0.5, ... ) ## S4 method for signature 'FastqcDataList' plotGcContent( x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", theoreticalGC = TRUE, gcType = c("Genome", "Transcriptome"), species = "Hsapiens", GCobject, Fastafile, n = 1e+06, plotType = c("heatmap", "line", "cdf"), cluster = FALSE, dendrogram = FALSE, heat_w = 8, pwfCols, showPwf = TRUE, scaleFill = NULL, scaleColour = NULL, plotlyLegend = FALSE, lineCols = RColorBrewer::brewer.pal(12, "Paired"), linetype = 1, linewidth = 0.5, ... ) ## S4 method for signature 'FastpData' plotGcContent( x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", theoreticalGC = TRUE, gcType = c("Genome", "Transcriptome"), species = "Hsapiens", GCobject, Fastafile, n = 1e+06, plotType = "bar", scaleFill = NULL, plotlyLegend = FALSE, plotTheme = theme_get(), ... ) ## S4 method for signature 'FastpDataList' plotGcContent( x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", theoreticalGC = TRUE, gcType = c("Genome", "Transcriptome"), species = "Hsapiens", GCobject, Fastafile, n = 1e+06, plotType = "bar", scaleFill = NULL, plotTheme = theme_get(), plotlyLegend = FALSE, ... )
x |
Can be a |
usePlotly |
|
labels |
An optional named vector of labels for the file names. |
pattern |
Pattern to remove from the end of filenames |
... |
Used to pass various potting parameters to themes and geoms. |
theoreticalGC |
|
gcType |
|
species |
|
GCobject |
an object of class GCTheoretical. Defaults to the gcTheoretical object supplied with the package |
plotlyLegend |
logical(1) Show legend on interactive line plots |
Fastafile |
a fasta file contains DNA sequences to generate theoretical GC content |
n |
number of simulated reads to generate theoretical GC content from
|
counts |
|
scaleColour |
ggplot2 scale for line colours |
lineCols , linetype , linewidth
|
Line colour type and width for observed and theoretical GC lines |
plotType |
Takes values "line", "heatmap" or "cdf" |
cluster |
|
dendrogram |
|
heat_w |
Relative width of any heatmap plot components |
pwfCols |
Object of class |
showPwf |
logical(1) Show Pwf Status on the plot |
scaleFill |
ggplot2 scale for filling heatmap cells or bars |
plotTheme |
theme object |
Makes plots for GC_Content. When applied to a single FastqcData object a simple line plot will be drawn, with Theoretical GC content overlaid if desired.
When applied to multiple FastQC reports, the density at each GC content bin
can be shown as a heatmap by setting theoreticalGC = FALSE
. By
default the difference in observed and expected theoretical GC is shown.
Species and genome/transcriptome should also be set if utilising the
theoretical GC content.
As an alternative to a heatmap, a series of overlaid distributions can be
shown by setting plotType = "line"
.
Can produce a static ggplot2 object or an interactive plotly object.
A ggplot2 or plotly object
# Get the files included with the package packageDir <- system.file("extdata", package = "ngsReports") fl <- list.files(packageDir, pattern = "fastqc.zip", full.names = TRUE) # Load the FASTQC data as a FastqcDataList object fdl <- FastqcDataList(fl) # The default plot for a FastqcDataList plotGcContent(fdl) # Plot a single FastqcData object plotGcContent(fdl[[1]])
# Get the files included with the package packageDir <- system.file("extdata", package = "ngsReports") fl <- list.files(packageDir, pattern = "fastqc.zip", full.names = TRUE) # Load the FASTQC data as a FastqcDataList object fdl <- FastqcDataList(fl) # The default plot for a FastqcDataList plotGcContent(fdl) # Plot a single FastqcData object plotGcContent(fdl[[1]])
Plot the insert size distribution from one of Fastp reports
plotInsertSize(x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", ...) ## S4 method for signature 'FastpData' plotInsertSize( x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", plotType = c("histogram", "cumulative"), counts = FALSE, plotTheme = theme_get(), expand.x = 0.01, expand.y = c(0, 0.05), ... ) ## S4 method for signature 'FastpDataList' plotInsertSize( x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", plotType = c("heatmap", "line", "cumulative"), plotTheme = theme_get(), scaleFill = NULL, scaleColour = NULL, cluster = FALSE, dendrogram = FALSE, heat_w = 8, ... )
plotInsertSize(x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", ...) ## S4 method for signature 'FastpData' plotInsertSize( x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", plotType = c("histogram", "cumulative"), counts = FALSE, plotTheme = theme_get(), expand.x = 0.01, expand.y = c(0, 0.05), ... ) ## S4 method for signature 'FastpDataList' plotInsertSize( x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", plotType = c("heatmap", "line", "cumulative"), plotTheme = theme_get(), scaleFill = NULL, scaleColour = NULL, cluster = FALSE, dendrogram = FALSE, heat_w = 8, ... )
x |
A FastpData or FastpDataList object |
usePlotly |
|
labels |
An optional named vector of labels for the file names. All file names must be present in the names of the vector. |
pattern |
Regex to remove from the end of any filenames |
... |
Passed to |
plotType |
Determine the plot type. Options vary with the input structure |
counts |
logical(1) Plot read counts, or percentages (default) |
plotTheme |
a theme object |
expand.x , expand.y
|
Axis expansions |
scaleFill |
Continuous scale used to fill heatmap cells. Defaults to the "inferno" palette |
scaleColour |
Discrete scale for adding line colours |
cluster |
|
dendrogram |
|
heat_w |
Width of the heatmap relative to other plot components |
Takes a Fastp os a set of Fastp reports and plot insert size distributions. Plots can be drawn as cumulative totals or the default histograms for a single report, and as boxplots or heatmaps for a set of reports
A ggplot or plotly object
# Get the files included with the package packageDir <- system.file("extdata", package = "ngsReports") fl <- list.files(packageDir, pattern = "fastp.json.gz", full.names = TRUE) fp <- FastpData(fl) plotInsertSize( fp, counts = TRUE, fill = "steelblue4", plotTheme = theme(plot.title = element_text(hjust = 0.5)) ) plotInsertSize(fp, plotType = "cumulative")
# Get the files included with the package packageDir <- system.file("extdata", package = "ngsReports") fl <- list.files(packageDir, pattern = "fastp.json.gz", full.names = TRUE) fp <- FastpData(fl) plotInsertSize( fp, counts = TRUE, fill = "steelblue4", plotTheme = theme(plot.title = element_text(hjust = 0.5)) ) plotInsertSize(fp, plotType = "cumulative")
Plot Overrepresented Kmers
plotKmers(x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", ...) ## S4 method for signature 'ANY' plotKmers(x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", ...) ## S4 method for signature 'FastqcData' plotKmers( x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", n = 6, linewidth = 0.5, plotlyLegend = FALSE, scaleColour = NULL, pal = c("red", "blue", "green", "black", "magenta", "yellow"), ... ) ## S4 method for signature 'FastqcDataList' plotKmers( x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", cluster = FALSE, dendrogram = FALSE, pwfCols, showPwf = TRUE, scaleFill = NULL, heatCol = hcl.colors(50, "inferno"), heat_w = 8, ... ) ## S4 method for signature 'FastpData' plotKmers( x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", module = c("Before_filtering", "After_filtering"), reads = c("read1", "read2"), readsBy = c("facet", "mean", "diff"), trans = "log2", scaleFill = NULL, plotTheme = theme_get(), plotlyLegend = FALSE, ... )
plotKmers(x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", ...) ## S4 method for signature 'ANY' plotKmers(x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", ...) ## S4 method for signature 'FastqcData' plotKmers( x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", n = 6, linewidth = 0.5, plotlyLegend = FALSE, scaleColour = NULL, pal = c("red", "blue", "green", "black", "magenta", "yellow"), ... ) ## S4 method for signature 'FastqcDataList' plotKmers( x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", cluster = FALSE, dendrogram = FALSE, pwfCols, showPwf = TRUE, scaleFill = NULL, heatCol = hcl.colors(50, "inferno"), heat_w = 8, ... ) ## S4 method for signature 'FastpData' plotKmers( x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", module = c("Before_filtering", "After_filtering"), reads = c("read1", "read2"), readsBy = c("facet", "mean", "diff"), trans = "log2", scaleFill = NULL, plotTheme = theme_get(), plotlyLegend = FALSE, ... )
x |
Can be a |
usePlotly |
|
labels |
An optional named vector of labels for the file names. All filenames must be present in the names. |
pattern |
regex to drop from the end of filenames |
... |
Used to pass parameters to theme for FastqcData objects and to geoms for FastpData objects |
n |
|
linewidth |
Passed to |
plotlyLegend |
Show legend for interactive plots |
pal |
The colour palette. If the vector supplied is less than n,
|
cluster |
|
dendrogram |
|
pwfCols |
Object of class |
showPwf |
Show the PASS/WARN/FAIL status |
scaleFill , scaleColour
|
ggplot2 scales to be used for colour palettes |
heatCol |
Colour palette used for the heatmap. Default is |
heat_w |
Relative width of any heatmap plot components |
module |
The module to obtain data from when using a FastpData object |
reads |
Either read1 or read2. Only used when using a FastpData object |
readsBy |
Strategy for visualising both read1 and read2. Can be set to show each set of reads by facet, or within the same plot taking the mean of the enrichment above mean, or the difference in the enrichment above mean |
trans |
Function for transforming the count/mean ratio. Set as NULL to use the ratio without transformation |
plotTheme |
theme object |
As the Kmer Content module present in FastQC reports is relatively uninformative, and omitted by default in later versions of FastQC, these are rudimentary plots.
Plots for FastqcData
objects replicate those contained in a FastQC
report, whilst the heatmap generated from FastqcDataList
objects
simply show the location and abundance of over-represented Kmers.
A standard ggplot2 object or an interactive plotly object
# Get the files included with the package packageDir <- system.file("extdata", package = "ngsReports") fl <- list.files(packageDir, pattern = "fastqc.zip", full.names = TRUE) # Load the FASTQC data as a FastqcDataList object fdl <- FastqcDataList(fl) plotKmers(fdl[[1]]) # Use a FastpData object fl <- system.file("extdata", "fastp.json.gz", package = "ngsReports") fp <- FastpData(fl) plotKmers(fp, size = 2) plotKmers( fp, reads = "read1", size = 2, trans = NULL, scaleFill = scale_fill_gradient(low = "white", high = "black") )
# Get the files included with the package packageDir <- system.file("extdata", package = "ngsReports") fl <- list.files(packageDir, pattern = "fastqc.zip", full.names = TRUE) # Load the FASTQC data as a FastqcDataList object fdl <- FastqcDataList(fl) plotKmers(fdl[[1]]) # Use a FastpData object fl <- system.file("extdata", "fastp.json.gz", package = "ngsReports") fp <- FastpData(fl) plotKmers(fp, size = 2) plotKmers( fp, reads = "read1", size = 2, trans = NULL, scaleFill = scale_fill_gradient(low = "white", high = "black") )
Draw an N Content Plot across one or more FastQC reports
plotNContent(x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", ...) ## S4 method for signature 'ANY' plotNContent(x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", ...) ## S4 method for signature 'FastqcData' plotNContent( x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", pwfCols, warn = 5, fail = 20, showPwf = TRUE, ..., lineCol = "red" ) ## S4 method for signature 'FastqcDataList' plotNContent( x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", pwfCols, warn = 5, fail = 20, showPwf = TRUE, cluster = FALSE, dendrogram = FALSE, heat_w = 8, scaleFill = NULL, ... ) ## S4 method for signature 'FastpData' plotNContent( x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", module = c("Before_filtering", "After_filtering"), moduleBy = c("facet", "colour", "linetype"), reads = c("read1", "read2"), readsBy = c("facet", "colour", "linetype"), scaleColour = NULL, scaleLine = NULL, plotTheme = theme_get(), plotlyLegend = FALSE, ... ) ## S4 method for signature 'FastpDataList' plotNContent( x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", module = c("Before_filtering", "After_filtering"), reads = c("read1", "read2"), scaleFill = NULL, plotTheme = theme_get(), cluster = FALSE, dendrogram = FALSE, heat_w = 8, ... )
plotNContent(x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", ...) ## S4 method for signature 'ANY' plotNContent(x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", ...) ## S4 method for signature 'FastqcData' plotNContent( x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", pwfCols, warn = 5, fail = 20, showPwf = TRUE, ..., lineCol = "red" ) ## S4 method for signature 'FastqcDataList' plotNContent( x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", pwfCols, warn = 5, fail = 20, showPwf = TRUE, cluster = FALSE, dendrogram = FALSE, heat_w = 8, scaleFill = NULL, ... ) ## S4 method for signature 'FastpData' plotNContent( x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", module = c("Before_filtering", "After_filtering"), moduleBy = c("facet", "colour", "linetype"), reads = c("read1", "read2"), readsBy = c("facet", "colour", "linetype"), scaleColour = NULL, scaleLine = NULL, plotTheme = theme_get(), plotlyLegend = FALSE, ... ) ## S4 method for signature 'FastpDataList' plotNContent( x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", module = c("Before_filtering", "After_filtering"), reads = c("read1", "read2"), scaleFill = NULL, plotTheme = theme_get(), cluster = FALSE, dendrogram = FALSE, heat_w = 8, ... )
x |
Can be a |
usePlotly |
|
labels |
An optional named vector of labels for the file names. All filenames must be present in the names. |
pattern |
Regex used to trim the end of filenames |
... |
Used to pass additional attributes to theme() for FastqcData objects and to geom* calls for FastpData-based objects |
pwfCols |
Object of class |
warn , fail
|
The default values for warn and fail are 5 and 10 respectively (i.e. percentages) |
showPwf |
logical(1) Show the PASS/WARN/FAIL status |
lineCol |
Line colours |
cluster |
|
dendrogram |
|
heat_w |
Relative width of any heatmap plot components |
scaleFill , scaleColour , scaleLine
|
ggplot2 scale objects |
module |
Used for Fastp* structures to show results before or after filtering |
moduleBy , readsBy
|
How to show each module or set of reads on the plot |
reads |
Show plots for read1, read2 or both. |
plotTheme |
theme object |
plotlyLegend |
logical(1) Show legend on interactive plots |
This extracts the N_Content from the supplied object and generates a ggplot2 object, with a set of minimal defaults. The output of this function can be further modified using the standard ggplot2 methods.
When x
is a single FastqcData object line plots will always be drawn
for all Ns.
Otherwise, users can select line plots or heatmaps.
A standard ggplot2 object, or an interactive plotly object
## Using a Fastp Data object fl <- system.file("extdata/fastp.json.gz", package = "ngsReports") fp <- FastpData(fl) plotNContent(fp) plotNContent( fp, pattern = "_001.+", moduleBy = "colour", scaleColour = scale_colour_brewer(palette = "Set1"), plotTheme = theme( legend.position = 'inside', legend.position.inside = c(0.99, 0.99), legend.justification = c(1, 1), plot.title = element_text(hjust = 0.5) ) )
## Using a Fastp Data object fl <- system.file("extdata/fastp.json.gz", package = "ngsReports") fp <- FastpData(fl) plotNContent(fp) plotNContent( fp, pattern = "_001.+", moduleBy = "colour", scaleColour = scale_colour_brewer(palette = "Set1"), plotTheme = theme( legend.position = 'inside', legend.position.inside = c(0.99, 0.99), legend.justification = c(1, 1), plot.title = element_text(hjust = 0.5) ) )
Plot a summary of Over-represented Sequences for a set of FASTQC reports
plotOverrep( x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", pwfCols, ... ) ## S4 method for signature 'ANY' plotOverrep( x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", pwfCols, ... ) ## S4 method for signature 'character' plotOverrep( x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", pwfCols, ... ) ## S4 method for signature 'FastqcData' plotOverrep( x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", pwfCols, n = 10, expand.x = c(0, 0, 0.05, 0), expand.y = c(0, 0.6, 0, 0.6), plotlyLegend = FALSE, ... ) ## S4 method for signature 'FastqcDataList' plotOverrep( x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", pwfCols, showPwf = TRUE, cluster = FALSE, dendrogram = FALSE, scaleFill = NULL, paletteName = "Set1", panel_w = 8, expand.x = c(0, 0, 0.05, 0), expand.y = rep(0, 4), ... )
plotOverrep( x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", pwfCols, ... ) ## S4 method for signature 'ANY' plotOverrep( x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", pwfCols, ... ) ## S4 method for signature 'character' plotOverrep( x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", pwfCols, ... ) ## S4 method for signature 'FastqcData' plotOverrep( x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", pwfCols, n = 10, expand.x = c(0, 0, 0.05, 0), expand.y = c(0, 0.6, 0, 0.6), plotlyLegend = FALSE, ... ) ## S4 method for signature 'FastqcDataList' plotOverrep( x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", pwfCols, showPwf = TRUE, cluster = FALSE, dendrogram = FALSE, scaleFill = NULL, paletteName = "Set1", panel_w = 8, expand.x = c(0, 0, 0.05, 0), expand.y = rep(0, 4), ... )
x |
Can be a |
usePlotly |
|
labels |
An optional named factor of labels for the file names. All filenames must be present in the names. |
pattern |
Regex to remove from the end of any filenames |
pwfCols |
Object of class |
... |
Used to pass additional attributes to theme() and between methods |
n |
The number of sequences to plot from an individual file |
expand.x , expand.y
|
Output from |
plotlyLegend |
Show legend on interactive plots |
showPwf |
Show PASS/WARN/FAIL status on the plot |
cluster |
|
dendrogram |
|
scaleFill |
ggplot scale object |
paletteName |
Name of the palette for colouring the possible sources
of the overrepresented sequences. Must be a palette name from
|
panel_w |
Width of main panel on output |
Percentages are obtained by simply summing those within a report. Any possible double counting by FastQC is ignored for the purposes of a simple approximation.
Plots generated from a FastqcData
object will show the top n
sequences grouped by their predicted source & coloured by whether the
individual sequence would cause a WARN/FAIL.
Plots generated from a FastqcDataList
group sequences by predicted
source and summarise as a percentage of the total reads.
A standard ggplot2 object
# Get the files included with the package packageDir <- system.file("extdata", package = "ngsReports") fl <- list.files(packageDir, pattern = "fastqc.zip", full.names = TRUE) # Load the FASTQC data as a FastqcDataList object fdl <- FastqcDataList(fl) # A brief summary across all FastQC reports plotOverrep(fdl)
# Get the files included with the package packageDir <- system.file("extdata", package = "ngsReports") fl <- list.files(packageDir, pattern = "fastqc.zip", full.names = TRUE) # Load the FASTQC data as a FastqcDataList object fdl <- FastqcDataList(fl) # A brief summary across all FastQC reports plotOverrep(fdl)
Draw a barplot of read totals
plotReadTotals(x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", ...) ## S4 method for signature 'ANY' plotReadTotals(x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", ...) ## S4 method for signature 'FastqcDataList' plotReadTotals( x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", duplicated = TRUE, bars = c("stacked", "adjacent"), vertBars = TRUE, divBy = 1, barCols = c("red", "blue"), expand.y = c(0, 0.02), plotlyLegend = FALSE, ... ) ## S4 method for signature 'FastpDataList' plotReadTotals( x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", adjPaired = TRUE, divBy = 1e+06, scaleFill = NULL, labMin = 0.05, status = TRUE, labelVJ = 0.5, labelFill = "white", plotTheme = theme_get(), vertBars = FALSE, plotlyLegend = FALSE, expand.y = c(0, 0.05), ... )
plotReadTotals(x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", ...) ## S4 method for signature 'ANY' plotReadTotals(x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", ...) ## S4 method for signature 'FastqcDataList' plotReadTotals( x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", duplicated = TRUE, bars = c("stacked", "adjacent"), vertBars = TRUE, divBy = 1, barCols = c("red", "blue"), expand.y = c(0, 0.02), plotlyLegend = FALSE, ... ) ## S4 method for signature 'FastpDataList' plotReadTotals( x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", adjPaired = TRUE, divBy = 1e+06, scaleFill = NULL, labMin = 0.05, status = TRUE, labelVJ = 0.5, labelFill = "white", plotTheme = theme_get(), vertBars = FALSE, plotlyLegend = FALSE, expand.y = c(0, 0.05), ... )
x |
Can be a |
usePlotly |
|
labels |
An optional named vector of labels for the file names. All filenames must be present in the names. |
pattern |
Regex used to trim the end of filenames |
... |
Used to pass additional attributes to theme() |
duplicated |
logical(1). Include deduplicated read total estimates to plot charts |
bars |
If |
vertBars |
logical(1) Show bars as vertical or horizontal |
divBy |
Scale read totals by this value. The default shows the y-axis in millions for FastpDataList objects, but does not scale FastQC objects, for the sake of backwards compatability |
barCols |
Colours for duplicated and unique reads. |
expand.y |
Passed to |
plotlyLegend |
logical(1) Show legend on interactive plots |
adjPaired |
Scale read totals by 0.5 when paired |
scaleFill |
ScaleDiscrete function to be applied to the plot |
labMin |
Only show labels for filtering categories higher than this values as a proportion of reads. Set to any number > 1 to turn off labels |
status |
logical(1) Include read status in the plot |
labelVJ |
Relative vertical position to labels within each bar. |
labelFill |
Passed to geom_label |
plotTheme |
theme to be added to the plot |
Draw a barplot of read totals using the standard ggplot2 syntax.
The raw data from readTotals()
can otherwise be used to manually
create a plot.
Duplication levels are based on the value shown on FASTQC reports at the
top of the DeDuplicatedTotals plot, which is known to be inaccurate.
As it still gives a good guide as to sequence diversity it is included as
the default. This can be turned off by setting duplicated = FALSE
.
For FastpDataList objects, duplication statistics are not part of the default module containing ReadTotals. However, the status of reads and the reason for being retained or filtered is, and as such these are shown instead of duplication statistics.
Returns a ggplot or plotly object
# Get the files included with the package packageDir <- system.file("extdata", package = "ngsReports") fl <- list.files(packageDir, pattern = "fastqc.zip", full.names = TRUE) # Load the FASTQC data as a FastqcDataList object fdl <- FastqcDataList(fl) # Plot the Read Totals showing estimated duplicates plotReadTotals(fdl) # Plot the Read Totals without estimated duplicates plotReadTotals(fdl, duplicated = FALSE)
# Get the files included with the package packageDir <- system.file("extdata", package = "ngsReports") fl <- list.files(packageDir, pattern = "fastqc.zip", full.names = TRUE) # Load the FASTQC data as a FastqcDataList object fdl <- FastqcDataList(fl) # Plot the Read Totals showing estimated duplicates plotReadTotals(fdl) # Plot the Read Totals without estimated duplicates plotReadTotals(fdl, duplicated = FALSE)
Plot the Per Base content for a set of FASTQC files.
plotSeqContent(x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", ...) ## S4 method for signature 'ANY' plotSeqContent(x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", ...) ## S4 method for signature 'FastqcData' plotSeqContent( x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", bases = c("A", "T", "C", "G"), scaleColour = NULL, plotTheme = theme_get(), plotlyLegend = FALSE, expand.x = 0.02, expand.y = c(0, 0.05), ... ) ## S4 method for signature 'FastqcDataList' plotSeqContent( x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", pwfCols, showPwf = TRUE, plotType = c("heatmap", "line", "residuals"), scaleColour = NULL, plotTheme = theme_get(), cluster = FALSE, dendrogram = FALSE, heat_w = 8, plotlyLegend = FALSE, nc = 2, ... ) ## S4 method for signature 'FastpData' plotSeqContent( x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", module = c("Before_filtering", "After_filtering"), reads = c("read1", "read2"), readsBy = c("facet", "linetype"), moduleBy = c("facet", "linetype"), bases = c("A", "T", "C", "G", "N", "GC"), scaleColour = NULL, scaleLine = NULL, plotlyLegend = FALSE, plotTheme = theme_get(), expand.x = 0.02, expand.y = c(0, 0.05), ... ) ## S4 method for signature 'FastpDataList' plotSeqContent( x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", module = c("Before_filtering", "After_filtering"), moduleBy = c("facet", "linetype"), reads = c("read1", "read2"), readsBy = c("facet", "linetype"), bases = c("A", "T", "C", "G", "N", "GC"), showPwf = FALSE, pwfCols, warn = 10, fail = 20, plotType = c("heatmap", "line", "residuals"), plotlyLegend = FALSE, scaleColour = NULL, scaleLine = NULL, plotTheme = theme_get(), cluster = FALSE, dendrogram = FALSE, heat_w = 8, expand.x = c(0.01), expand.y = c(0, 0.05), nc = 2, ... )
plotSeqContent(x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", ...) ## S4 method for signature 'ANY' plotSeqContent(x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", ...) ## S4 method for signature 'FastqcData' plotSeqContent( x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", bases = c("A", "T", "C", "G"), scaleColour = NULL, plotTheme = theme_get(), plotlyLegend = FALSE, expand.x = 0.02, expand.y = c(0, 0.05), ... ) ## S4 method for signature 'FastqcDataList' plotSeqContent( x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", pwfCols, showPwf = TRUE, plotType = c("heatmap", "line", "residuals"), scaleColour = NULL, plotTheme = theme_get(), cluster = FALSE, dendrogram = FALSE, heat_w = 8, plotlyLegend = FALSE, nc = 2, ... ) ## S4 method for signature 'FastpData' plotSeqContent( x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", module = c("Before_filtering", "After_filtering"), reads = c("read1", "read2"), readsBy = c("facet", "linetype"), moduleBy = c("facet", "linetype"), bases = c("A", "T", "C", "G", "N", "GC"), scaleColour = NULL, scaleLine = NULL, plotlyLegend = FALSE, plotTheme = theme_get(), expand.x = 0.02, expand.y = c(0, 0.05), ... ) ## S4 method for signature 'FastpDataList' plotSeqContent( x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", module = c("Before_filtering", "After_filtering"), moduleBy = c("facet", "linetype"), reads = c("read1", "read2"), readsBy = c("facet", "linetype"), bases = c("A", "T", "C", "G", "N", "GC"), showPwf = FALSE, pwfCols, warn = 10, fail = 20, plotType = c("heatmap", "line", "residuals"), plotlyLegend = FALSE, scaleColour = NULL, scaleLine = NULL, plotTheme = theme_get(), cluster = FALSE, dendrogram = FALSE, heat_w = 8, expand.x = c(0.01), expand.y = c(0, 0.05), nc = 2, ... )
x |
Can be a |
usePlotly |
|
labels |
An optional named vector of labels for the file names. All file names must be present in the names of the vector. |
pattern |
Regex to remove from the end of any filenames |
... |
Used to pass additional attributes to plotting geoms |
bases |
Which bases to draw on the plot. Also becomes the default plotting order by setting these as factor levels |
scaleColour |
Discrete colour scale as a ggplot ScaleDiscrete object If not provided, will default to scale_colour_manual |
plotTheme |
theme object to be applied. Note that all plots will have theme_bw theme applied by default, as well as any additional themes supplied here |
plotlyLegend |
logical(1) Show legends for interactive plots. Ignored for heatmaps |
expand.x , expand.y
|
Passed to expansion in the x- and y-axis scales respectively |
pwfCols |
Object of class |
showPwf |
Show PASS/WARN/FAIL categories as would be defined in a FastQC report |
plotType |
|
cluster |
|
dendrogram |
|
heat_w |
Relative width of any heatmap plot components |
nc |
Specify the number of columns if plotting a FastqcDataList as line plots. Passed to facet_wrap. |
module |
Fastp Module to show. Can only be Before/After_filtering |
reads |
Which set of reads to show |
readsBy , moduleBy
|
When plotting both R1 & R2 and both modules, separate by either linetype or linetype |
scaleLine |
Discrete scale_linetype object. Only relevant if plotting values by linetype |
warn , fail
|
Default values for WARN and FAIL based on FastQC reports. Only applied to heatmaps for FastpDataList objects |
Per base sequence content (%A, %T, %G, %C), is shown as four overlaid
heatmap colours when plotting from multiple reports. The individual line
plots are able to be generated by setting plotType = "line"
, and the
layout is determined by facet_wrap
from ggplot2.
Individual line plots are also generated when plotting from a single
FastqcData
object.
If setting usePlotly = TRUE
for a large number of reports, the plot
can be slow to render.
An alternative may be to produce a plot of residuals for each base, produced
by taking the position-specific mean for each base.
A ggplot2 object or an interactive plotly object
# Get the files included with the package packageDir <- system.file("extdata", package = "ngsReports") fl <- list.files(packageDir, pattern = "fastqc.zip", full.names = TRUE) # Load the FASTQC data as a FastqcDataList object fdl <- FastqcDataList(fl) # The default plot plotSeqContent(fdl) fp <- FastpData(system.file("extdata/fastp.json.gz", package = "ngsReports")) plotSeqContent(fp) plotSeqContent(fp, moduleBy = "linetype", bases = c("A", "C", "G", "T"))
# Get the files included with the package packageDir <- system.file("extdata", package = "ngsReports") fl <- list.files(packageDir, pattern = "fastqc.zip", full.names = TRUE) # Load the FASTQC data as a FastqcDataList object fdl <- FastqcDataList(fl) # The default plot plotSeqContent(fdl) fp <- FastpData(system.file("extdata/fastp.json.gz", package = "ngsReports")) plotSeqContent(fp) plotSeqContent(fp, moduleBy = "linetype", bases = c("A", "C", "G", "T"))
Plot the Sequence Length Distribution across one or more FASTQC reports
plotSeqLengthDistn( x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", ... ) ## S4 method for signature 'ANY' plotSeqLengthDistn( x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", ... ) ## S4 method for signature 'character' plotSeqLengthDistn( x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", ... ) ## S4 method for signature 'FastqcData' plotSeqLengthDistn( x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", counts = TRUE, plotType = c("line", "cdf"), expand.x = c(0, 0.2, 0, 0.2), plotlyLegend = FALSE, colour = "red", ... ) ## S4 method for signature 'FastqcDataList' plotSeqLengthDistn( x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", counts = FALSE, plotType = c("heatmap", "line", "cdf"), cluster = FALSE, dendrogram = FALSE, heat_w = 8, pwfCols, showPwf = TRUE, scaleFill = NULL, scaleColour = NULL, heatCol = hcl.colors(50, "inferno"), plotlyLegend = FALSE, ... )
plotSeqLengthDistn( x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", ... ) ## S4 method for signature 'ANY' plotSeqLengthDistn( x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", ... ) ## S4 method for signature 'character' plotSeqLengthDistn( x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", ... ) ## S4 method for signature 'FastqcData' plotSeqLengthDistn( x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", counts = TRUE, plotType = c("line", "cdf"), expand.x = c(0, 0.2, 0, 0.2), plotlyLegend = FALSE, colour = "red", ... ) ## S4 method for signature 'FastqcDataList' plotSeqLengthDistn( x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", counts = FALSE, plotType = c("heatmap", "line", "cdf"), cluster = FALSE, dendrogram = FALSE, heat_w = 8, pwfCols, showPwf = TRUE, scaleFill = NULL, scaleColour = NULL, heatCol = hcl.colors(50, "inferno"), plotlyLegend = FALSE, ... )
x |
Can be a |
usePlotly |
|
labels |
An optional named vector of labels for the file names. All filenames must be present in the names. |
pattern |
Regex to remove from the end of any filenames |
... |
Used to pass additional attributes to theme() |
counts |
|
plotType |
|
expand.x |
Output from |
plotlyLegend |
logical(1) Show legend for interactive line plots |
colour |
Line colour |
cluster |
|
dendrogram |
|
heat_w |
Relative width of any heatmap plot components |
pwfCols |
Object of class |
showPwf |
logical(1) Show PASS/WARN/FAIL status |
scaleFill , scaleColour
|
Optional ggplot scale objects |
heatCol |
The colour scheme for the heatmap |
This extracts the Sequence Length Distribution from the supplied object and generates a ggplot2 object, with a set of minimal defaults. The output of this function can be further modified using the standard ggplot2 methods.
A cdf plot can also be generated to provide guidance for minimum
read length in some NGS workflows, by setting plotType = "cdf"
.
If all libraries have reads of identical lengths, these plots may be less
informative.
An alternative interactive plot is available by setting the argument
usePlotly = TRUE
.
A standard ggplot2 object, or an interactive plotly object
# Get the files included with the package packageDir <- system.file("extdata", package = "ngsReports") fl <- list.files(packageDir, pattern = "fastqc.zip", full.names = TRUE) # Load the FASTQC data as a FastqcDataList object fdl <- FastqcDataList(fl) # Plot as a frequency plot using lines plotSeqLengthDistn(fdl) # Or plot the cdf plotSeqLengthDistn(fdl, plotType = "cdf")
# Get the files included with the package packageDir <- system.file("extdata", package = "ngsReports") fl <- list.files(packageDir, pattern = "fastqc.zip", full.names = TRUE) # Load the FASTQC data as a FastqcDataList object fdl <- FastqcDataList(fl) # Plot as a frequency plot using lines plotSeqLengthDistn(fdl) # Or plot the cdf plotSeqLengthDistn(fdl, plotType = "cdf")
Plot the Per Sequence Quality Scores for a set of FASTQC reports
plotSeqQuals( x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", pwfCols, ... ) ## S4 method for signature 'ANY' plotSeqQuals( x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", pwfCols, ... ) ## S4 method for signature 'character' plotSeqQuals( x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", pwfCols, ... ) ## S4 method for signature 'FastqcData' plotSeqQuals( x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", pwfCols, showPwf = TRUE, counts = FALSE, alpha = 0.1, warn = 30, fail = 20, colour = "red", plotlyLegend = FALSE, ... ) ## S4 method for signature 'FastqcDataList' plotSeqQuals( x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", pwfCols, counts = FALSE, alpha = 0.1, warn = 30, fail = 20, showPwf = TRUE, plotType = c("heatmap", "line"), dendrogram = FALSE, cluster = FALSE, scaleFill = NULL, heatCols = hcl.colors(100, "inferno"), heat_w = 8, scaleColour = NULL, plotlyLegend = FALSE, ... )
plotSeqQuals( x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", pwfCols, ... ) ## S4 method for signature 'ANY' plotSeqQuals( x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", pwfCols, ... ) ## S4 method for signature 'character' plotSeqQuals( x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", pwfCols, ... ) ## S4 method for signature 'FastqcData' plotSeqQuals( x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", pwfCols, showPwf = TRUE, counts = FALSE, alpha = 0.1, warn = 30, fail = 20, colour = "red", plotlyLegend = FALSE, ... ) ## S4 method for signature 'FastqcDataList' plotSeqQuals( x, usePlotly = FALSE, labels, pattern = ".(fast|fq|bam).*", pwfCols, counts = FALSE, alpha = 0.1, warn = 30, fail = 20, showPwf = TRUE, plotType = c("heatmap", "line"), dendrogram = FALSE, cluster = FALSE, scaleFill = NULL, heatCols = hcl.colors(100, "inferno"), heat_w = 8, scaleColour = NULL, plotlyLegend = FALSE, ... )
x |
Can be a |
usePlotly |
|
labels |
An optional named vector of labels for the file names. All file names must be present in the names of the vector. |
pattern |
Regex to remove from the end of any filenames |
pwfCols |
Object of class |
... |
Used to pass various potting parameters to theme. Can also be used to set size and colour for box outlines. |
showPwf |
logical(1) Show PASS/WARN/FAIL status |
counts |
|
alpha |
set alpha for line graph bounds |
warn , fail
|
The default values for warn and fail are 5 and 10 respectively (i.e. percentages) |
colour |
Colour for single line plots |
plotlyLegend |
logical(1) Show legend for interactive line plots |
plotType |
|
dendrogram |
|
cluster |
|
scaleFill , scaleColour
|
ggplot2 scales |
heatCols |
Colour palette for the heatmap |
heat_w |
Relative width of any heatmap plot components |
Plots the distribution of average sequence quality scores across the
set of files. Values can be plotted either as counts (counts = TRUE
)
or as frequencies (counts = FALSE
).
Any faceting or scale adjustment can be performed after generation of the initial plot, using the standard methods of ggplot2 as desired.
A standard ggplot2 object, or an interactive plotly object
# Get the files included with the package packageDir <- system.file("extdata", package = "ngsReports") fl <- list.files(packageDir, pattern = "fastqc.zip", full.names = TRUE) # Load the FASTQC data as a FastqcDataList object fdl <- FastqcDataList(fl) # The default plot plotSeqQuals(fdl) # Also subset the reads to just the R1 files r1 <- grepl("R1", fqName(fdl)) plotSeqQuals(fdl[r1])
# Get the files included with the package packageDir <- system.file("extdata", package = "ngsReports") fl <- list.files(packageDir, pattern = "fastqc.zip", full.names = TRUE) # Load the FASTQC data as a FastqcDataList object fdl <- FastqcDataList(fl) # The default plot plotSeqQuals(fdl) # Also subset the reads to just the R1 files r1 <- grepl("R1", fqName(fdl)) plotSeqQuals(fdl[r1])
Extract the PASS/WARN/FAIL summaries and plot them
plotSummary( x, usePlotly = FALSE, labels, pwfCols, cluster = FALSE, dendrogram = FALSE, ... ) ## S4 method for signature 'ANY' plotSummary( x, usePlotly = FALSE, labels, pwfCols, cluster = FALSE, dendrogram = FALSE, ... ) ## S4 method for signature 'character' plotSummary( x, usePlotly = FALSE, labels, pwfCols, cluster = FALSE, dendrogram = FALSE, ... ) ## S4 method for signature 'FastqcDataList' plotSummary( x, usePlotly = FALSE, labels, pwfCols, cluster = FALSE, dendrogram = FALSE, ..., gridlineWidth = 0.2, gridlineCol = "grey20" )
plotSummary( x, usePlotly = FALSE, labels, pwfCols, cluster = FALSE, dendrogram = FALSE, ... ) ## S4 method for signature 'ANY' plotSummary( x, usePlotly = FALSE, labels, pwfCols, cluster = FALSE, dendrogram = FALSE, ... ) ## S4 method for signature 'character' plotSummary( x, usePlotly = FALSE, labels, pwfCols, cluster = FALSE, dendrogram = FALSE, ... ) ## S4 method for signature 'FastqcDataList' plotSummary( x, usePlotly = FALSE, labels, pwfCols, cluster = FALSE, dendrogram = FALSE, ..., gridlineWidth = 0.2, gridlineCol = "grey20" )
x |
Can be a |
usePlotly |
|
labels |
An optional named vector of labels for the file names. All filenames must be present in the names. File extensions are dropped by default. |
pwfCols |
Object of class |
cluster |
|
dendrogram |
|
... |
Used to pass various potting parameters to theme. |
gridlineWidth , gridlineCol
|
Passed to geom_hline and geom_vline to determine width and colour of gridlines |
This uses the standard ggplot2 syntax to create a three colour plot. The output of this function can be further modified using the standard ggplot2 methods if required.
A ggplot2 object (usePlotly = FALSE
)
or an interactive plotly object (usePlotly = TRUE
)
# Get the files included with the package packageDir <- system.file("extdata", package = "ngsReports") fl <- list.files(packageDir, pattern = "fastqc.zip", full.names = TRUE) # Load the FASTQC data as a FastqcDataList object fdl <- FastqcDataList(fl) # Check the overall PASS/WARN/FAIL status plotSummary(fdl)
# Get the files included with the package packageDir <- system.file("extdata", package = "ngsReports") fl <- list.files(packageDir, pattern = "fastqc.zip", full.names = TRUE) # Load the FASTQC data as a FastqcDataList object fdl <- FastqcDataList(fl) # Check the overall PASS/WARN/FAIL status plotSummary(fdl)
Default colours for PASS/WARN/FAIL
pwf
pwf
An object of class PwfCols
of length 1.
pwf
is an object of class PwfCols supplied with the package
and used as the default colouring.
Colours correspond approximately to PASS, WARN and FAIL from the FASTQC
reports, with the additional colour (MAX) included to indicate an extreme
FAIL.
In order, these colours in the default vector are green
(rgb(0, 0.8, 0)
), yellow (rgb(0.9, 0.9, 0.2)
), red
(rgb(0.8, 0.2, 0.2)
) and white (rgb(1, 1, 1)
)
# Make a pie chart showing the default colours pie(rep(1,4), labels = names(pwf), col = getColours(pwf))
# Make a pie chart showing the default colours pie(rep(1,4), labels = names(pwf), col = getColours(pwf))
Define the PwfCols class and associated methods
This is an object of with four colours in components named PASS, WARN, FAIL and MAX. Used to indicate these categories as defined on the standard plots from fastqc.
An S4 object of class PwfCols
PASS
A vector of length 1, defining the colour for PASS in rgb format. Defaults to rgb(0, 0.8, 0)
WARN
A vector of length 1, defining the colour for WARN in rgb format. Defaults to rgb(0.9, 0.9, 0.2)
FAIL
A vector of length 1, defining the colour for FAIL in rgb format. Defaults to rgb(0.8, 0.2, 0.2)
MAX
A vector of length 1, defining the colour for an extreme FAIL or NA in rgb format. Defaults to rgb(1, 1, 1)
Get the read totals from one or more FASTQC reports
readTotals(x)
readTotals(x)
x |
Can be a |
A tibble
with the columns Filename
and
Total_Sequences
# Get the files included with the package packageDir <- system.file("extdata", package = "ngsReports") fl <- list.files(packageDir, pattern = "fastqc.zip", full.names = TRUE) # Load the FASTQC data as a FastqcDataList object fdl <- FastqcDataList(fl) # Print the read totals readTotals(fdl)
# Get the files included with the package packageDir <- system.file("extdata", package = "ngsReports") fl <- list.files(packageDir, pattern = "fastqc.zip", full.names = TRUE) # Load the FASTQC data as a FastqcDataList object fdl <- FastqcDataList(fl) # Print the read totals readTotals(fdl)
Summarise the Overrepresented sequences found in one or more QC files
summariseOverrep(x, ...) ## S4 method for signature 'FastpData' summariseOverrep(x, step = c("Before", "After"), min_count = 0, ...) ## S4 method for signature 'FastpDataList' summariseOverrep( x, min_count = 0, step = c("Before", "After"), vals = c("count", "rate"), fn = c("mean", "sum", "max"), by = c("reads", "sequence"), ... ) ## S4 method for signature 'FastqcDataList' summariseOverrep( x, min_count = 0, vals = c("Count", "Percentage"), fn = c("mean", "sum", "max"), pattern = ".*", ... ) ## S4 method for signature 'FastqcData' summariseOverrep( x, min_count = 0, vals = c("Count", "Percentage"), fn = c("mean", "sum", "max"), pattern = ".*", by = "Filename", ... )
summariseOverrep(x, ...) ## S4 method for signature 'FastpData' summariseOverrep(x, step = c("Before", "After"), min_count = 0, ...) ## S4 method for signature 'FastpDataList' summariseOverrep( x, min_count = 0, step = c("Before", "After"), vals = c("count", "rate"), fn = c("mean", "sum", "max"), by = c("reads", "sequence"), ... ) ## S4 method for signature 'FastqcDataList' summariseOverrep( x, min_count = 0, vals = c("Count", "Percentage"), fn = c("mean", "sum", "max"), pattern = ".*", ... ) ## S4 method for signature 'FastqcData' summariseOverrep( x, min_count = 0, vals = c("Count", "Percentage"), fn = c("mean", "sum", "max"), pattern = ".*", by = "Filename", ... )
x |
An object of a suitable class |
... |
Not used |
step |
Can be 'Before', 'After' or both to obtain data from the Before_filtering or After_filtering modules |
min_count |
Filter sequences with counts less than this value, both before and after filtering |
vals |
Values to use for creating summaries across multiple files. For FastpDataList objects these can be "count" and/or "rate", whilst for FastqcDataList objects these values can be "Count" and/or "Percentage" |
fn |
Functions to use when summarising values across multiple files |
by |
character vector of columns to summarise by. See dplyr::summarise |
pattern |
Regular expression to filter the Possible_Source column by |
This function prepares a useful summary of all over-represented sequences as reported by either fastp or FastQC
A tibble
Tibble columns will vary between Fastp*, FastqcDataList and FastqcData objects. Calling this function on list-type objects will attempt to summarise the presence each over-represented sequence across all files.
In particular, FastqcData objects will provide the requested summary statistics across all sequences within a file
## For operations on a FastpData object f <- system.file("extdata/fastp.json.gz", package = "ngsReports") fp <- FastpData(f) summariseOverrep(fp, min_count = 100) ## Applying the function to a FastqcDataList packageDir <- system.file("extdata", package = "ngsReports") fl <- list.files(packageDir, pattern = "fastqc.zip", full.names = TRUE) fdl <- FastqcDataList(fl) summariseOverrep(fdl) # An alternative viewpoint can be obtained using fdl |> lapply(summariseOverrep) |> dplyr::bind_rows()
## For operations on a FastpData object f <- system.file("extdata/fastp.json.gz", package = "ngsReports") fp <- FastpData(f) summariseOverrep(fp, min_count = 100) ## Applying the function to a FastqcDataList packageDir <- system.file("extdata", package = "ngsReports") fl <- list.files(packageDir, pattern = "fastqc.zip", full.names = TRUE) fdl <- FastqcDataList(fl) summariseOverrep(fdl) # An alternative viewpoint can be obtained using fdl |> lapply(summariseOverrep) |> dplyr::bind_rows()
Contains Theoretical GC content for a selection of species
Estimates are able to be retained for genomic and transcriptomic sequences. Values are stored as frequencies.
An object of class TheoreticalGC
Genome
A data.frame
containing theoretical GC content for
genomic sequences
Transcriptome
A data.frame
containing theoretical GC content
for transcriptomic sequences
mData
A data.frame
containing metadata about all species in the
object
## How to form an object using your own fasta file faDir <- system.file("extdata", package = "ngsReports") faFile <- list.files(faDir, pattern = "fasta", full.names = TRUE) gen_df <- estGcDistn(faFile, n = 200) gen_df <- dplyr::rename(gen_df, Athaliana = Freq) mData_df <- data.frame(Name = "Athaliana", Genome = TRUE, Transcriptome = FALSE) tr_df <- data.frame() myGC <- new( "TheoreticalGC", Genome = gen_df, Transcriptome = tr_df, mData = mData_df)
## How to form an object using your own fasta file faDir <- system.file("extdata", package = "ngsReports") faFile <- list.files(faDir, pattern = "fasta", full.names = TRUE) gen_df <- estGcDistn(faFile, n = 200) gen_df <- dplyr::rename(gen_df, Athaliana = Freq) mData_df <- data.frame(Name = "Athaliana", Genome = TRUE, Transcriptome = FALSE) tr_df <- data.frame() myGC <- new( "TheoreticalGC", Genome = gen_df, Transcriptome = tr_df, mData = mData_df)
Compiles an HTML report using a supplied template
writeHtmlReport( fastqcDir, template, outDir, usePlotly = TRUE, species = "Hsapiens", gcType = c("Genome", "Transcriptome"), nOver = 30, targetsDF, overwrite = FALSE, quiet = TRUE )
writeHtmlReport( fastqcDir, template, outDir, usePlotly = TRUE, species = "Hsapiens", gcType = c("Genome", "Transcriptome"), nOver = 30, targetsDF, overwrite = FALSE, quiet = TRUE )
fastqcDir |
A directory containing zipped, or extracted FastQC reports |
template |
The template file which will be copied into |
outDir |
The directory to write the compiled document to |
usePlotly |
Generate interactive plots? |
species |
Species/closely related species of sequenced samples |
gcType |
Is the data "Transcriptomic" or "Genomic" in nature? |
nOver |
The maximum number of Overrepresented Sequences to show |
targetsDF |
A data.frame with at least two columns named
|
overwrite |
|
quiet |
|
This will take a user supplied template, or the file supplied with the package and create an HTML summary of all standard FASTQC plots for all files in the supplied directory.
Silently returns TRUE
and will output a compiled HTML file from the
supplied Rmarkdown template file
## Not run: packageDir <- system.file("extdata", package = "ngsReports") fileList <- list.files(packageDir, pattern = "fastqc.zip", full.names= TRUE) # Copy these files to tempdir() to avoid overwriting # any files in the package directory file.copy(fileList, tempdir(), overwrite = TRUE) writeHtmlReport(fastqcDir = tempdir()) ## End(Not run)
## Not run: packageDir <- system.file("extdata", package = "ngsReports") fileList <- list.files(packageDir, pattern = "fastqc.zip", full.names= TRUE) # Copy these files to tempdir() to avoid overwriting # any files in the package directory file.copy(fileList, tempdir(), overwrite = TRUE) writeHtmlReport(fastqcDir = tempdir()) ## End(Not run)