Package 'countsimQC' reference manual

Title:	Compare Characteristic Features of Count Data Sets
Description:	countsimQC provides functionality to create a comprehensive report comparing a broad range of characteristics across a collection of count matrices. One important use case is the comparison of one or more synthetic count matrices to a real count matrix, possibly the one underlying the simulations. However, any collection of count matrices can be compared.
Authors:	Charlotte Soneson [aut, cre]
Maintainer:	Charlotte Soneson <[email protected]>
License:	GPL (>=2)
Version:	1.25.0
Built:	2025-01-21 06:15:02 UTC
Source:	https://github.com/bioc/countsimQC

Example list with three count data sets

Description

A named list with three elements, each corresponding to a (real or simulated) count data set.

Usage

countsimExample
countsimExample

Format

A named list with three elements, each corresponding to a (real or simulated) count data set.

Details

The Original data set represents a subset of 10,000 genes and 11 cells from the GSE74596 single-cell RNA-seq data set, obtained from the conquer repository (http://imlspenticton.uzh.ch:3838/conquer/). The Sim1 and Sim2 data sets similarly represent subsets of scRNA-seq data sets simulated with two different simulation methods, using the real GSE74596 data set as the basis for parameter estimation. Each data set is represented as a DESeqDataSet object.

Value

A named list with three elements, each corresponding to a (real or simulated) count data set.

Example list with three count data sets in different formats

Description

A named list with three elements, each corresponding to a (real or simulated) count data set. One of them is provided as a DESeqDataset, one as a count data frame and one as a count matrix.

Usage

countsimExample_dfmat
countsimExample_dfmat

Format

A named list with three elements, each corresponding to a (real or simulated) count data set.

Details

Value

A named list with three elements, each corresponding to a (real or simulated) count data set.

countsimQC

Description

countsimQC

Generate countsimQC report

Description

Generate a report comparing a range of characteristics across a collection of one or more count data sets.

Usage

countsimQCReport(
  ddsList,
  outputFile,
  outputDir = "./",
  outputFormat = NULL,
  showCode = FALSE,
  rmdTemplate = NULL,
  forceOverwrite = FALSE,
  savePlots = FALSE,
  description = NULL,
  maxNForCorr = 500,
  maxNForDisp = Inf,
  calculateStatistics = TRUE,
  subsampleSize = 500,
  kfrac = 0.01,
  kmin = 5,
  permutationPvalues = FALSE,
  nPermutations = NULL,
  knitrProgress = FALSE,
  quiet = FALSE,
  ignorePandoc = FALSE,
  useRAGG = FALSE,
  dpi = 96,
  ...
)
countsimQCReport(
  ddsList,
  outputFile,
  outputDir = "./",
  outputFormat = NULL,
  showCode = FALSE,
  rmdTemplate = NULL,
  forceOverwrite = FALSE,
  savePlots = FALSE,
  description = NULL,
  maxNForCorr = 500,
  maxNForDisp = Inf,
  calculateStatistics = TRUE,
  subsampleSize = 500,
  kfrac = 0.01,
  kmin = 5,
  permutationPvalues = FALSE,
  nPermutations = NULL,
  knitrProgress = FALSE,
  quiet = FALSE,
  ignorePandoc = FALSE,
  useRAGG = FALSE,
  dpi = 96,
  ...
)

Arguments

`ddsList`	Named list of `DESeqDataSets` or count matrices to compare. See the `DESeq2` Bioconductor package (http://bioconductor.org/packages/release/bioc/html/DESeq2.html) for more information about the `DESeqDataSet` class. Each `DESeqDataSet` object in the list should contain a count matrix, a data frame with sample information and a design formula. The sample information and design formula will be used to calculate dispersions appropriately. If count matrices are provided, it is assumed that all columns represent replicate samples, and the design formula ~1 will be used.
`outputFile`	The file name of the final report. The extension must match the selected `outputFormat` (i.e., either .html or .pdf).
`outputDir`	The directory where the final report should be saved.
`outputFormat`	The output format of the report. If set to NULL or "html_document", an html report will be generated. If set to "pdf_document", a pdf report will be generated.
`showCode`	Whether or not to include the code in the final report.
`rmdTemplate`	The Rmarkdown (.Rmd) file that will be used as the template for generating the report. If set to NULL (default), the template provided with the countsimQC package will be used. See Details for more information.
`forceOverwrite`	Whether to force overwrite existing output files when saving the generated report and figures.
`savePlots`	Whether to save the ggplot objects for all the output figures, to allow additional fine-tuning and generation of individual plots. Note that the resulting file can be quite large, especially when many and/or large data sets are compared.
`description`	A string (of arbitrary length) describing the content of the generated report. This will be included in the beginning of the report. If set to NULL, a default description listing the number and names of the included data sets will be used.
`maxNForCorr`	The maximal number of samples (features) for which pairwise correlation coefficients will be calculated. If the number of samples (features) exceeds this number, they will be randomly subsampled.
`maxNForDisp`	The maximal number of samples that will be used to estimate dispersions. By default, all samples are used. This can be lowered to speed up calculations (and obtain approximate results) for large data sets.
`calculateStatistics`	Whether to calculate quantitative pairwise statistics for comparing data sets in addition to generating the plots.
`subsampleSize`	The number of randomly selected observations (samples, features or pairs of samples or features) for which certain (time-consuming) statistics will be calculated. Only used if `calculateStatistics` = TRUE.
`kmin`, `kfrac`	For statistics that require the extraction of the k nearest neighbors of a given point, the number of neighbors will be max(kmin, kfrac * nrow(df))
`permutationPvalues`	Whether to calculate permutation p-values for selected pairwise data set comparison statistics.
`nPermutations`	The number of permutations to perform when calculating permutation p-values for data set comparison statistics. Only used if `permutationPvalues` = TRUE.
`knitrProgress`	Whether to show the progress bar when the report is generated.
`quiet`	Whether to suppress warnings and progress messages when the report is generated.
`ignorePandoc`	Determines what to do if pandoc or pandoc-citeproc is missing (if Sys.which("pandoc") or Sys.which("pandoc-citeproc") is ""). If ignorePandoc is TRUE, only a warning is given. The figures will be generated, but not the final report. If ignorePandoc is FALSE (default), the execution stops immediately.
`useRAGG`	Logical scalar, indicating whether to use ragg_png as the graphics device in the report rather than the default png.
`dpi`	Numeric scalar, setting the dpi of the generated plots. Only used if `useRAGG` is `TRUE`.
`...`	Other arguments that will be passed to `rmarkdown::render`.

Details

When the function is called, the template file (specified by rmdTemplate) will be copied into the output folder, and rmarkdown::render will be called to generate the final report. If there is already a .Rmd file with the same name in the output folder, the function will raise an error and stop, to avoid overwriting the existing file. The reason for this behaviour is that the copied template in the output folder will be deleted once the report is generated.

Value

No value is returned, but a report is generated in the outputDir directory.

Author(s)

Charlotte Soneson

Examples

## Load example data
data(countsimExample)
## Not run: 
## Generate report
countsimQCReport(countsimExample, outputDir = "./",
                 outputFile = "example.html")

## End(Not run)

## Load example data
data(countsimExample)
## Not run: 
## Generate report
countsimQCReport(countsimExample, outputDir = "./",
                 outputFile = "example.html")

## End(Not run)

Generate individual plots from countsimQCReport output

Description

Generate separate plots for all evaluation criteria using the collection of ggplot objects that can be saved when generating a countsimQC report (by setting savePlots = TRUE).

Usage

generateIndividualPlots(
  ggplotsRds,
  device = "png",
  outputDir = "./",
  nDatasets = 2
)
generateIndividualPlots(
  ggplotsRds,
  device = "png",
  outputDir = "./",
  nDatasets = 2
)

Arguments

`ggplotsRds`	The path to a .rds file generated by `countsimQCReport` by setting `savePlots = TRUE`, or the list of plots stored in this file.
`device`	One of "eps", "ps", "tex" (pictex), "pdf", "jpeg", "tiff", "png", "bmp", "svg" or "wmf" (windows only) (will be provided to the `ggsave` function from the `ggplot2` package).
`outputDir`	The output directory where the plots should be generated.
`nDatasets`	The number of data sets that are compared in the figures. This is needed to set the size of the plots correctly.

Value

Nothing is returned, but plots are generated in the designated output directory.

Author(s)

Charlotte Soneson

Examples

## Load example data
data(countsimExample)
## Not run: 
## Generate report
countsimQCReport(countsimExample, outputDir = "./",
                 outputFile = "example.html", savePlots = TRUE)
## Generate individual plots
generateIndividualPlots("example_ggplots.rds", nDatasets = 3)

## End(Not run)

## Load example data
data(countsimExample)
## Not run: 
## Generate report
countsimQCReport(countsimExample, outputDir = "./",
                 outputFile = "example.html", savePlots = TRUE)
## Generate individual plots
generateIndividualPlots("example_ggplots.rds", nDatasets = 3)

## End(Not run)

Package 'countsimQC'

Help Index

Example list with three count data sets

Description

Usage

Format

Details

Value

Example list with three count data sets in different formats

Description

Usage

Format

Details

Value

countsimQC

Description

Generate countsimQC report

Description

Usage

Arguments

Details

Value

Author(s)

Examples

Generate individual plots from countsimQCReport output

Description

Usage

Arguments

Value

Author(s)

Examples