Package 'systemPipeR' reference manual

Title:	systemPipeR: Workflow Environment for Data Analysis and Report Generation
Description:	systemPipeR is a multipurpose data analysis workflow environment that unifies R with command-line tools. It enables scientists to analyze many types of large- or small-scale data on local or distributed computer systems with a high level of reproducibility, scalability and portability. At its core is a command-line interface (CLI) that adopts the Common Workflow Language (CWL). This design allows users to choose for each analysis step the optimal R or command-line software. It supports both end-to-end and partial execution of workflows with built-in restart functionalities. Efficient management of complex analysis tasks is accomplished by a flexible workflow control container class. Handling of large numbers of input samples and experimental designs is facilitated by consistent sample annotation mechanisms. As a multi-purpose workflow toolkit, systemPipeR enables users to run existing workflows, customize them or design entirely new ones while taking advantage of widely adopted data structures within the Bioconductor ecosystem. Another important core functionality is the generation of reproducible scientific analysis and technical reports. For result interpretation, systemPipeR offers a wide range of plotting functionality, while an associated Shiny App offers many useful functionalities for interactive result exploration. The vignettes linked from this page include (1) a general introduction, (2) a description of technical details, and (3) a collection of workflow templates.
Authors:	Thomas Girke
Maintainer:	Thomas Girke <[email protected]>
License:	Artistic-2.0
Version:	2.13.0
Built:	2025-03-27 05:23:14 UTC
Source:	https://github.com/bioc/systemPipeR

systemPipeR package for Workflow Environment

Description

The systemPipeR package provides a suite of R/Bioconductor for designing, building and running end-to-end analysis workflows on local machines, HPC clusters and cloud systems, while generating at the same time publication quality analysis reports.

For detailed information on usage, see the package vignette, by typing vignette("systemPipeR"), or more information on the project here: https://systempipe.org/spr

All software-related questions should be posted to the Bioconductor Support Site: https://support.bioconductor.org

The code can be viewed at the GitHub repository: https://github.com/tgirke/systemPipeR

Author(s)

Daniela Cassol, Tyler Backman, Thomas Girke

References

Backman TWH, Girke T (2016) systemPipeR: NGS workflow and report generation environment. BMC Bioinformatics 17 (1). https://doi.org/10.1186/s12859-016-1241-0

Extension accessor methods for SummarizedExperiment object

Description

Accessors for adding new data to the 'assay' and 'metadata' slot of a SummarizedExperiment object

Usage

addAssay(x, ...)
addMetadata(x, ...)
addAssay(x, ...)
addMetadata(x, ...)

Arguments

`x`	Object of class `SummarizedExperiment`.
`...`	dots, name of the object.

Methods

addAssay: signature(x = "SummarizedExperiment"): add new dataset to assays slot
addMetadata: signature(x = "SummarizedExperiment"): add new dataset to metadata slot

Author(s)

Daniela Cassol

Alignment statistics

Description

Generate data frame containing important read alignment statistics such as the total number of reads in the FASTQ files, the number of total alignments, as well as the number of primary alignments in the corresponding BAM files.

Usage

alignStats(args, fqpaths, pairEnd = TRUE, output_index = 1, subset="FileName1")
alignStats(args, fqpaths, pairEnd = TRUE, output_index = 1, subset="FileName1")

Arguments

`args`	Object of class `SYSargs` or `SYSargs2` or `named character vector` with BAM files PATH and the elements names should be the sampleID.
`fqpaths`	`named character vector` with raw FASTQ files PATH and the names should be the sampleID. Required when `class(args)` is "character".
`pairEnd`	logical. For pair-end libraries, select `TRUE`.
`output_index`	A numeric index positions of the file in `SYSargs2` object, slot `output`. Default is `output_index=1`.
`subset`	`subset` are the variables defined in the `param.yml` file, for example "FileName1".

Value

data.frame with alignment statistics.

Author(s)

Thomas Girke

Examples

##########################################
## Examples with \code{SYSargs2} object ##
##########################################
## Construct SYSargs2 object from CWl param, CWL input, and targets files 
targetspath <- system.file("extdata", "targets.txt", package="systemPipeR")
dir_path <- system.file("extdata/cwl", package="systemPipeR")
WF <- loadWorkflow(targets=targetspath, wf_file="hisat2/hisat2-mapping-se.cwl", 
                  input_file="hisat2/hisat2-mapping-se.yml", dir_path=dir_path)
WF <- renderWF(WF, inputvars=c(FileName="_FASTQ_PATH1_", SampleName="_SampleName_"))
WF

names(WF); modules(WF); targets(WF)[1]; cmdlist(WF)[1:2]; output(WF)

## Not run: 
## Execute SYSargs2 on single machine
WF <- runCommandline(args=WF, make_bam=TRUE)

## Alignment stats
read_statsDF <- alignStats(WF, subset="FileName") 
write.table(read_statsDF, "results/alignStats.xls", row.names=FALSE, quote=FALSE, sep="\t")

## End(Not run)

#########################################
## Examples with \code{SYSargs} object ##
#########################################
## Construct SYSargs object from param and targets files 
param <- system.file("extdata", "hisat2.param", package="systemPipeR")
targets <- system.file("extdata", "targets.txt", package="systemPipeR")
args <- systemArgs(sysma=param, mytargets=targets)
args
names(args); modules(args); cores(args); outpaths(args); sysargs(args)

## Not run: 
## Execute SYSargs on single machine
runCommandline(args=args)

## Alignment stats
read_statsDF <- alignStats(args) 
write.table(read_statsDF, "results/alignStats.xls", row.names=FALSE, quote=FALSE, sep="\t")

## End(Not run)
##########################################
## Examples with \code{SYSargs2} object ##
##########################################
## Construct SYSargs2 object from CWl param, CWL input, and targets files 
targetspath <- system.file("extdata", "targets.txt", package="systemPipeR")
dir_path <- system.file("extdata/cwl", package="systemPipeR")
WF <- loadWorkflow(targets=targetspath, wf_file="hisat2/hisat2-mapping-se.cwl", 
                  input_file="hisat2/hisat2-mapping-se.yml", dir_path=dir_path)
WF <- renderWF(WF, inputvars=c(FileName="_FASTQ_PATH1_", SampleName="_SampleName_"))
WF

names(WF); modules(WF); targets(WF)[1]; cmdlist(WF)[1:2]; output(WF)

## Not run: 
## Execute SYSargs2 on single machine
WF <- runCommandline(args=WF, make_bam=TRUE)

## Alignment stats
read_statsDF <- alignStats(WF, subset="FileName") 
write.table(read_statsDF, "results/alignStats.xls", row.names=FALSE, quote=FALSE, sep="\t")

## End(Not run)

#########################################
## Examples with \code{SYSargs} object ##
#########################################
## Construct SYSargs object from param and targets files 
param <- system.file("extdata", "hisat2.param", package="systemPipeR")
targets <- system.file("extdata", "targets.txt", package="systemPipeR")
args <- systemArgs(sysma=param, mytargets=targets)
args
names(args); modules(args); cores(args); outpaths(args); sysargs(args)

## Not run: 
## Execute SYSargs on single machine
runCommandline(args=args)

## Alignment stats
read_statsDF <- alignStats(args) 
write.table(read_statsDF, "results/alignStats.xls", row.names=FALSE, quote=FALSE, sep="\t")

## End(Not run)

Class `"catDB"`

Description

Container for storing mappings of genes to annotation categories such as gene ontologies (GO), pathways or conserved sequence domains. The catmap slot stores a list of data.frames providing the direct assignments of genes to annotation categories (e.g. gene-to-GO mappings); catlist is a list of lists of all direct and indirect associations to the annotation categories (e.g. genes mapped to a pathway); and idconv allows to store a lookup-table for converting identifiers (e.g. array feature ids to gene ids).

Objects from the Class

Objects can be created by calls of the form new("catDB", ...).

Slots

catmap:: Object of class "list" list of data.frames
catlist:: Object of class "list" list of lists
idconv:: Object of class "ANY" list of data.frames

Methods

catlist: signature(x = "catDB"): extracts data from catlist slot
catmap: signature(x = "catDB"): extracts data from catmap slot
coerce: signature(from = "list", to = "catDB"): as(list, "catDB")
idconv: signature(x = "catDB"): extracts data from idconv slot
names: signature(x = "catDB"): extracts slot names
show: signature(object = "catDB"): summary view of catDB objects

Author(s)

Thomas Girke

Examples

showClass("catDB")
## Not run: 
## Obtain annotations from BioMart
library("biomaRt")
listMarts()  # To choose BioMart database
listMarts(host = "plants.ensembl.org")
m <- useMart("plants_mart", host = "plants.ensembl.org")
listDatasets(m)
m <- useMart("plants_mart", dataset = "athaliana_eg_gene", host = "plants.ensembl.org")
listAttributes(m)  # Choose data types you want to download
go <- getBM(attributes = c("go_id", "tair_locus", "namespace_1003"), mart = m)
go <- go[go[, 3] != "", ]
go[, 3] <- as.character(go[, 3])
go[go[, 3] == "molecular_function", 3] <- "F"
go[go[, 3] == "biological_process", 3] <- "P"
go[go[, 3] == "cellular_component", 3] <- "C"
go[1:4, ]

dir.create("./data/GO", recursive = TRUE)
write.table(go, "data/GO/GOannotationsBiomart_mod.txt", quote = FALSE, row.names = FALSE,
            col.names = FALSE, sep = "\t")
            
## Create catDB instance (takes a while but needs to be done only once)
catdb <- makeCATdb(myfile = "data/GO/GOannotationsBiomart_mod.txt", lib = NULL, org = "",
                   colno = c(1, 2, 3), idconv = NULL)
catdb
save(catdb, file = "data/GO/catdb.RData")
load("data/GO/catdb.RData")

## End(Not run)
showClass("catDB")
## Not run: 
## Obtain annotations from BioMart
library("biomaRt")
listMarts()  # To choose BioMart database
listMarts(host = "plants.ensembl.org")
m <- useMart("plants_mart", host = "plants.ensembl.org")
listDatasets(m)
m <- useMart("plants_mart", dataset = "athaliana_eg_gene", host = "plants.ensembl.org")
listAttributes(m)  # Choose data types you want to download
go <- getBM(attributes = c("go_id", "tair_locus", "namespace_1003"), mart = m)
go <- go[go[, 3] != "", ]
go[, 3] <- as.character(go[, 3])
go[go[, 3] == "molecular_function", 3] <- "F"
go[go[, 3] == "biological_process", 3] <- "P"
go[go[, 3] == "cellular_component", 3] <- "C"
go[1:4, ]

dir.create("./data/GO", recursive = TRUE)
write.table(go, "data/GO/GOannotationsBiomart_mod.txt", quote = FALSE, row.names = FALSE,
            col.names = FALSE, sep = "\t")
            
## Create catDB instance (takes a while but needs to be done only once)
catdb <- makeCATdb(myfile = "data/GO/GOannotationsBiomart_mod.txt", lib = NULL, org = "",
                   colno = c(1, 2, 3), idconv = NULL)
catdb
save(catdb, file = "data/GO/catdb.RData")
load("data/GO/catdb.RData")

## End(Not run)

catDB accessor methods

Description

Methods to access information from catDB object.

Usage

	catmap(x)
catmap(x)

Arguments

`x`	object of class `catDB`

Value

various outputs

Author(s)

Thomas Girke

Examples

## Not run: 
## Obtain annotations from BioMart
m <- useMart("ENSEMBL_MART_PLANT"); listDatasets(m) 
m <- useMart("ENSEMBL_MART_PLANT", dataset="athaliana_eg_gene")
listAttributes(m) # Choose data types you want to download
go <- getBM(attributes=c("go_accession", "tair_locus", 
                            "go_namespace_1003"), mart=m)
go <- go[go[,3]!="",]; go[,3] <- as.character(go[,3])
write.table(go, "GOannotationsBiomart_mod.txt", quote=FALSE, 
            row.names=FALSE, col.names=FALSE, sep="\t")

## Create catDB instance (takes a while but needs to be done only once)
catdb <- makeCATdb(myfile="GOannotationsBiomart_mod.txt", lib=NULL, 
                    org="", colno=c(1,2,3), idconv=NULL)
catdb

## Access methods for catDB
catmap(catdb)$D_MF[1:4,]
catlist(catdb)$L_MF[1:4]
idconv(catdb)


## End(Not run)
## Not run: 
## Obtain annotations from BioMart
m <- useMart("ENSEMBL_MART_PLANT"); listDatasets(m) 
m <- useMart("ENSEMBL_MART_PLANT", dataset="athaliana_eg_gene")
listAttributes(m) # Choose data types you want to download
go <- getBM(attributes=c("go_accession", "tair_locus", 
                            "go_namespace_1003"), mart=m)
go <- go[go[,3]!="",]; go[,3] <- as.character(go[,3])
write.table(go, "GOannotationsBiomart_mod.txt", quote=FALSE, 
            row.names=FALSE, col.names=FALSE, sep="\t")

## Create catDB instance (takes a while but needs to be done only once)
catdb <- makeCATdb(myfile="GOannotationsBiomart_mod.txt", lib=NULL, 
                    org="", colno=c(1,2,3), idconv=NULL)
catdb

## Access methods for catDB
catmap(catdb)$D_MF[1:4,]
catlist(catdb)$L_MF[1:4]
idconv(catdb)


## End(Not run)

Checking if the outfiles files exist

Description

This function returns a data.frame indicating the number of existing files and how many files are missing.

Usage

check.output(sysargs, type="data.frame")
check.outfiles(sysargs, type="data.frame")
check.output(sysargs, type="data.frame")
check.outfiles(sysargs, type="data.frame")

Arguments

`sysargs`	object of class `SYSargs2` or `SYSargsList`.
`type`	return object option. It can be `data.frame` or `list`.

Value

data.frame or list containing all the outfiles file information.

Author(s)

Daniela Cassol and Thomas Girke

Examples

## Construct SYSargs2 object 
targets <- system.file("extdata", "targets.txt", package="systemPipeR")
dir_path <- system.file("extdata/cwl", package="systemPipeR")
WF <- loadWorkflow(targets=targets, wf_file="hisat2/hisat2-mapping-se.cwl", 
                  input_file="hisat2/hisat2-mapping-se.yml", dir_path=dir_path)
WF <- renderWF(WF, inputvars=c(FileName="_FASTQ_PATH1_", SampleName="_SampleName_"))
WF
## Check output
check.output(WF)
check.output(WF, "list")

## Construct SYSargsList object 
sal <- SPRproject(overwrite=TRUE) 
targetspath <- system.file("extdata/cwl/example/targets_example.txt", package="systemPipeR")
appendStep(sal) <- SYSargsList(step_name = "echo", 
                      targets=targetspath, dir=TRUE,
                      wf_file="example/workflow_example.cwl", input_file="example/example.yml", 
                      dir_path = system.file("extdata/cwl", package="systemPipeR"),
                      inputvars = c(Message = "_STRING_", SampleName = "_SAMPLE_"))
## Check outfiles
check.outfiles(sal)
## Construct SYSargs2 object 
targets <- system.file("extdata", "targets.txt", package="systemPipeR")
dir_path <- system.file("extdata/cwl", package="systemPipeR")
WF <- loadWorkflow(targets=targets, wf_file="hisat2/hisat2-mapping-se.cwl", 
                  input_file="hisat2/hisat2-mapping-se.yml", dir_path=dir_path)
WF <- renderWF(WF, inputvars=c(FileName="_FASTQ_PATH1_", SampleName="_SampleName_"))
WF
## Check output
check.output(WF)
check.output(WF, "list")

## Construct SYSargsList object 
sal <- SPRproject(overwrite=TRUE) 
targetspath <- system.file("extdata/cwl/example/targets_example.txt", package="systemPipeR")
appendStep(sal) <- SYSargsList(step_name = "echo", 
                      targets=targetspath, dir=TRUE,
                      wf_file="example/workflow_example.cwl", input_file="example/example.yml", 
                      dir_path = system.file("extdata/cwl", package="systemPipeR"),
                      inputvars = c(Message = "_STRING_", SampleName = "_SAMPLE_"))
## Check outfiles
check.outfiles(sal)

Submit command-line tools to cluster

Description

Submits non-R command-line software to queueing/scheduling systems of compute clusters using run specifications defined by functions similar to runCommandline. clusterRun can be used with most queueing systems since it is based on utilities from the batchtools package which supports the use of template files (*.tmpl) for defining the run parameters of the different schedulers. The path to the *.tmpl file needs to be specified in a conf file provided under the conffile argument.

Usage

clusterRun(args, 
            FUN = runCommandline, 
            more.args = list(args = args, make_bam = TRUE), 
            conffile = ".batchtools.conf.R", 
            template = "batchtools.slurm.tmpl", 
            Njobs, 
            runid = "01", 
            resourceList)
clusterRun(args, 
            FUN = runCommandline, 
            more.args = list(args = args, make_bam = TRUE), 
            conffile = ".batchtools.conf.R", 
            template = "batchtools.slurm.tmpl", 
            Njobs, 
            runid = "01", 
            resourceList)

Arguments

`args`	Object of class `SYSargs` or `SYSargs2`.
`FUN`	Accepts functions such as `runCommandline(args, ...)` where the `args` argument is mandatory and needs to be of class `SYSargs` or `SYSargs2`.
`more.args`	Object of class `list`, which provides the arguments that control the `FUN` function.
`conffile`	Path to conf file (default location `./.batchtools.conf.R`). This file contains in its simplest form just one command, such as this line for the Slurm scheduler: `cluster.functions <- makeClusterFunctionsSlurm(template="batchtools.slurm.tmpl")`. For more detailed information visit this page: https://mllg.github.io/batchtools/index.html
`template`	The template files for a specific queueing/scheduling systems can be downloaded from here: https://github.com/mllg/batchtools/tree/master/inst/templates. Slurm, PBS/Torque, and Sun Grid Engine (SGE) templates are provided.
`Njobs`	Interger defining the number of cluster jobs. For instance, if `args` contains 18 command-line jobs and `Njobs=9`, then the function will distribute them accross 9 cluster jobs each running 2 command-line jobs. To increase the number of CPU cores used by each process, one can do this under the corresonding argument of the command-line tool, e.g. `-p` argument for Tophat.
`runid`	Run identifier used for log file to track system call commands. Default is `"01"`.
`resourceList`	`List` for reserving for each cluster job sufficient computing resources including memory (Megabyte), number of nodes, CPU cores, walltime (minutes), etc. For more details, one can consult the template file for each queueing/scheduling system.

Value

Object of class Registry, as well as files and directories created by the executed command-line tools.

Author(s)

Daniela Cassol and Thomas Girke

References

For more details on batchtools, please consult the following page: https://github.com/mllg/batchtools/

Examples

#########################################
## Examples with \code{SYSargs} object ##
#########################################
## Construct SYSargs object from param and targets files 
param <- system.file("extdata", "hisat2.param", package="systemPipeR")
targets <- system.file("extdata", "targets.txt", package="systemPipeR")
args <- systemArgs(sysma=param, mytargets=targets)
args
names(args); modules(args); cores(args); outpaths(args); sysargs(args)

## Not run: 
## Execute SYSargs on multiple machines of a compute cluster. The following
## example uses the conf and template files for the Slurm scheduler. Please 
## read the instructions on how to obtain the corresponding files for other schedulers. 
file.copy(system.file("extdata", ".batchtools.conf.R", package="systemPipeR"), ".")
file.copy(system.file("extdata", "batchtools.slurm.tmpl", package="systemPipeR"), ".")
resources <- list(walltime=120, ntasks=1, ncpus=cores(args), memory=1024) 
reg <- clusterRun(args, FUN = runCommandline, 
                    more.args = list(args = args, make_bam = TRUE), 
                    conffile=".batchtools.conf.R", 
                    template="batchtools.slurm.tmpl", 
                    Njobs=18, runid="01", 
                    resourceList=resources)

## Monitor progress of submitted jobs
getStatus(reg=reg)
file.exists(outpaths(args))

## End(Not run)

##########################################
## Examples with \code{SYSargs2} object ##
##########################################
## Construct SYSargs2 object from CWl param, CWL input, and targets files 
targets <- system.file("extdata", "targets.txt", package="systemPipeR")
dir_path <- system.file("extdata/cwl", package="systemPipeR")
WF <- loadWorkflow(targets=targets, wf_file="hisat2/hisat2-mapping-se.cwl", 
                  input_file="hisat2/hisat2-mapping-se.yml", dir_path=dir_path)
WF <- renderWF(WF, inputvars=c(FileName="_FASTQ_PATH1_", SampleName="_SampleName_"))
WF
names(WF); modules(WF); targets(WF)[1]; cmdlist(WF)[1:2]; output(WF)

## Not run: 
## Execute SYSargs2 on multiple machines of a compute cluster. The following
## example uses the conf and template files for the Slurm scheduler. Please 
## read the instructions on how to obtain the corresponding files for other schedulers.  
file.copy(system.file("extdata", ".batchtools.conf.R", package="systemPipeR"), ".")
file.copy(system.file("extdata", "batchtools.slurm.tmpl", package="systemPipeR"), ".")
resources <- list(walltime=120, ntasks=1, ncpus=4, memory=1024) 
reg <- clusterRun(WF, FUN = runCommandline, 
                    more.args = list(args = WF, make_bam = TRUE),
                    conffile=".batchtools.conf.R", 
                    template="batchtools.slurm.tmpl",
                    Njobs=18, runid="01", resourceList=resources)

## Monitor progress of submitted jobs
getStatus(reg=reg)

## Updates the path in the object \code{output(WF)}
WF <- output_update(WF, dir=FALSE, replace=TRUE, extension=c(".sam", ".bam"))

## Alignment stats
read_statsDF <- alignStats(WF) 
read_statsDF <- cbind(read_statsDF[targets$FileName,], targets)
write.table(read_statsDF, "results/alignStats.xls", row.names=FALSE, 
                quote=FALSE, sep="\t")

## End(Not run)
#########################################
## Examples with \code{SYSargs} object ##
#########################################
## Construct SYSargs object from param and targets files 
param <- system.file("extdata", "hisat2.param", package="systemPipeR")
targets <- system.file("extdata", "targets.txt", package="systemPipeR")
args <- systemArgs(sysma=param, mytargets=targets)
args
names(args); modules(args); cores(args); outpaths(args); sysargs(args)

## Not run: 
## Execute SYSargs on multiple machines of a compute cluster. The following
## example uses the conf and template files for the Slurm scheduler. Please 
## read the instructions on how to obtain the corresponding files for other schedulers. 
file.copy(system.file("extdata", ".batchtools.conf.R", package="systemPipeR"), ".")
file.copy(system.file("extdata", "batchtools.slurm.tmpl", package="systemPipeR"), ".")
resources <- list(walltime=120, ntasks=1, ncpus=cores(args), memory=1024) 
reg <- clusterRun(args, FUN = runCommandline, 
                    more.args = list(args = args, make_bam = TRUE), 
                    conffile=".batchtools.conf.R", 
                    template="batchtools.slurm.tmpl", 
                    Njobs=18, runid="01", 
                    resourceList=resources)

## Monitor progress of submitted jobs
getStatus(reg=reg)
file.exists(outpaths(args))

## End(Not run)

##########################################
## Examples with \code{SYSargs2} object ##
##########################################
## Construct SYSargs2 object from CWl param, CWL input, and targets files 
targets <- system.file("extdata", "targets.txt", package="systemPipeR")
dir_path <- system.file("extdata/cwl", package="systemPipeR")
WF <- loadWorkflow(targets=targets, wf_file="hisat2/hisat2-mapping-se.cwl", 
                  input_file="hisat2/hisat2-mapping-se.yml", dir_path=dir_path)
WF <- renderWF(WF, inputvars=c(FileName="_FASTQ_PATH1_", SampleName="_SampleName_"))
WF
names(WF); modules(WF); targets(WF)[1]; cmdlist(WF)[1:2]; output(WF)

## Not run: 
## Execute SYSargs2 on multiple machines of a compute cluster. The following
## example uses the conf and template files for the Slurm scheduler. Please 
## read the instructions on how to obtain the corresponding files for other schedulers.  
file.copy(system.file("extdata", ".batchtools.conf.R", package="systemPipeR"), ".")
file.copy(system.file("extdata", "batchtools.slurm.tmpl", package="systemPipeR"), ".")
resources <- list(walltime=120, ntasks=1, ncpus=4, memory=1024) 
reg <- clusterRun(WF, FUN = runCommandline, 
                    more.args = list(args = WF, make_bam = TRUE),
                    conffile=".batchtools.conf.R", 
                    template="batchtools.slurm.tmpl",
                    Njobs=18, runid="01", resourceList=resources)

## Monitor progress of submitted jobs
getStatus(reg=reg)

## Updates the path in the object \code{output(WF)}
WF <- output_update(WF, dir=FALSE, replace=TRUE, extension=c(".sam", ".bam"))

## Alignment stats
read_statsDF <- alignStats(WF) 
read_statsDF <- cbind(read_statsDF[targets$FileName,], targets)
write.table(read_statsDF, "results/alignStats.xls", row.names=FALSE, 
                quote=FALSE, sep="\t")

## End(Not run)

Adding param file

Description

Replace or adding a input configuration setting at "YML" param file

Usage

config.param(input_file = NULL, param, file = "default", silent = FALSE)
config.param(input_file = NULL, param, file = "default", silent = FALSE)

Arguments

`input_file`	a `list` of parameters, `param file` path, or `SYSargs2` object.
`param`	object of class `list`, expressing the values and names to be added or replace at the command-line definition or the configuration files.
`file`	name and path of the new file. If set to `default`, the name of the write file will have the pattern: `<Date-Time>_<OriginalName>.yml`. If set to `append`, the `param` information it will be append on the same file.
`silent`	if set to TRUE, all messages returned by the function will be suppressed.

Author(s)

Daniela Cassol

Examples

## Not run: 
input_file <- system.file("extdata", "cwl/hisat2/hisat2-mapping-se.yml", package="systemPipeR")
param <- list(thread=10, fq=list(class="File", path="./results2"))
input <- config.param(input_file=input_file, param, file="default")
 
## End(Not run)
## Not run: 
input_file <- system.file("extdata", "cwl/hisat2/hisat2-mapping-se.yml", package="systemPipeR")
param <- list(thread=10, fq=list(class="File", path="./results2"))
input <- config.param(input_file=input_file, param, file="default")
 
## End(Not run)

Workflow Steps Selection

Description

This function allows the user to control of which step of the workflow will be run and the generation of the new RMarkdown.

Usage

configWF(x, input_steps = "ALL", exclude_steps = NULL, silent = FALSE, ...)
configWF(x, input_steps = "ALL", exclude_steps = NULL, silent = FALSE, ...)

Arguments

`x`	object of class `SYSargsList`.
`input_steps`	a character vector of all steps desires to preserve on the output file. Default is `ALL`. It can be used the symbol ":" to select many steps in sequence, for example, input_steps=1:5.2, from step 1 to step 5.2. The symbol "." represents the substeps and symbol "," is used to separate selections. Jump from a major step to sub-step is supported, but if a major step is selected/excluded, all sub-steps of this major step will be selected/excluded. Repeatedly selected steps will only result in a unique step.It is recommended to put major steps in `input_steps`, like `1:4, 6:8, 10`; and unwanted sub-steps in `exclude_step`, like `1.1, 3.1.1-3.1.3, 6.5`. Reverse selecting is supported e.g. `10:1`.
`exclude_steps`	a character vector of all steps desires to remove on the output file.
`silent`	if set to TRUE, all messages returned by the function will be suppressed.
`...`	Additional arguments to pass on to `configWF()`.

Author(s)

Daniela Cassol

Examples

## Not run: 
library(systemPipeRdata)
targets <- system.file("extdata", "targets.txt", package="systemPipeR")
script <- system.file("extdata/workflows/rnaseq", "systemPipeRNAseq.Rmd", package="systemPipeRdata")
SYSconfig <- initProject(projPath="./", targets=targets, script=script, overwrite=TRUE, silent=TRUE)

sysargslist <- configWF(x=sysargslist)
 
## End(Not run)
## Not run: 
library(systemPipeRdata)
targets <- system.file("extdata", "targets.txt", package="systemPipeR")
script <- system.file("extdata/workflows/rnaseq", "systemPipeRNAseq.Rmd", package="systemPipeRdata")
SYSconfig <- initProject(projPath="./", targets=targets, script=script, overwrite=TRUE, silent=TRUE)

sysargslist <- configWF(x=sysargslist)
 
## End(Not run)

Read counting for several range sets

Description

Convenience function to perform read counting iteratively for serveral range sets, e.g. peak range sets or feature types. Internally, the read counting is performed with the summarizeOverlaps function from the GenomicAlignments package. The resulting count tables are directly saved to files.

Usage

countRangeset(bfl, args, outfiles=NULL, format="tabular", ...)
countRangeset(bfl, args, outfiles=NULL, format="tabular", ...)

Arguments

`bfl`	`BamFileList` object containing paths to one or more BAM files.
`args`	An instance of `SYSargs` or `SYSargs2` constructed from a `targets` file where the first column (`targetsin(args)` or `targets.as.df(targets(args))`) contains the paths to the tabular range data files (e.g. peak ranges) used for counting. Another possibily is `named character vector` with the paths to the tabular range data files and the elements names should be the sampleID.
`outfiles`	Default is `NULL`. When `args` is an object of `named character vector` class, `outfile` argument is required. `Named character vector` with the paths to the resulting count tables and the elements names should be the sampleID.
`format`	Format of input range files. Currently, supported are `tabular` or `bed`. If `tabular` is selected then the input range files need to contain the proper column titles to coerce with `as(..., "GRanges")` to `GRanges` objects after importing them with `read.delim`. The latter is the case for the peak files (`*peaks.xls`) generated by the MACS2 software.
`...`	Arguments to be passed on to internally used `summarizeOverlaps` function.

Value

Named character vector containing the paths from outpaths(args) to the resulting count table files.

Author(s)

Thomas Girke

Examples

## Paths to BAM files
param <- system.file("extdata", "bowtieSE.param", package="systemPipeR")
targets <- system.file("extdata", "targets.txt", package="systemPipeR")
args_bam <- systemArgs(sysma=param, mytargets=targets)
bfl <- BamFileList(outpaths(args_bam), yieldSize=50000, index=character())

## Not run: 
##########################################
## Examples with \code{SYSargs2} object ##
##########################################
## Construct SYSargs2 object 
## SYSargs2 with paths to range data and count files
dir_path <- system.file("extdata/cwl/count_rangesets", package="systemPipeR")
args <- loadWF(targets = "targets_macs.txt", wf_file = "count_rangesets.cwl", 
    input_file = "count_rangesets.yml", dir_path = dir_path)
args <- renderWF(args, inputvars = c(FileName = "_FASTQ_PATH1_", SampleName = "_SampleName_"))

## Iterative read counting
countDFnames <- countRangeset(bfl, args, mode="Union", ignore.strand=TRUE)

##########################################
## Examples with \code{SYSargs} object ##
##########################################
## Construct SYSargs object 
## SYSargs with paths to range data and count files
args <- systemArgs(sysma="param/count_rangesets.param", mytargets="targets_macs.txt")

## Iterative read counting
countDFnames <- countRangeset(bfl, args, mode="Union", ignore.strand=TRUE)
writeTargetsout(x=args, file="targets_countDF.txt", overwrite=TRUE)

## End(Not run)
## Paths to BAM files
param <- system.file("extdata", "bowtieSE.param", package="systemPipeR")
targets <- system.file("extdata", "targets.txt", package="systemPipeR")
args_bam <- systemArgs(sysma=param, mytargets=targets)
bfl <- BamFileList(outpaths(args_bam), yieldSize=50000, index=character())

## Not run: 
##########################################
## Examples with \code{SYSargs2} object ##
##########################################
## Construct SYSargs2 object 
## SYSargs2 with paths to range data and count files
dir_path <- system.file("extdata/cwl/count_rangesets", package="systemPipeR")
args <- loadWF(targets = "targets_macs.txt", wf_file = "count_rangesets.cwl", 
    input_file = "count_rangesets.yml", dir_path = dir_path)
args <- renderWF(args, inputvars = c(FileName = "_FASTQ_PATH1_", SampleName = "_SampleName_"))

## Iterative read counting
countDFnames <- countRangeset(bfl, args, mode="Union", ignore.strand=TRUE)

##########################################
## Examples with \code{SYSargs} object ##
##########################################
## Construct SYSargs object 
## SYSargs with paths to range data and count files
args <- systemArgs(sysma="param/count_rangesets.param", mytargets="targets_macs.txt")

## Iterative read counting
countDFnames <- countRangeset(bfl, args, mode="Union", ignore.strand=TRUE)
writeTargetsout(x=args, file="targets_countDF.txt", overwrite=TRUE)

## End(Not run)

createParam

Description

The constructor function creates an SYSargs2 S4 class object from command-line string. Also, the function creates and saves the CWL param files. The latest storages all the parameters required for running command-line software, following the standard and specification defined on Common Workflow Language (CWL).

Usage

createParamFiles(commandline, cwlVersion = "v1.1", class = "CommandLineTool",
                  results_path = "./results", module_load = "baseCommand",
                  file = "default", syntaxVersion = "v1",
                  writeParamFiles = TRUE, confirm = FALSE,
                  overwrite = FALSE, silent = FALSE)

writeParamFiles(sysargs, file = "default", overwrite = TRUE, silent = FALSE,
                syntaxVersion = "v1")
createParamFiles(commandline, cwlVersion = "v1.1", class = "CommandLineTool",
                  results_path = "./results", module_load = "baseCommand",
                  file = "default", syntaxVersion = "v1",
                  writeParamFiles = TRUE, confirm = FALSE,
                  overwrite = FALSE, silent = FALSE)

writeParamFiles(sysargs, file = "default", overwrite = TRUE, silent = FALSE,
                syntaxVersion = "v1")

Arguments

`commandline`	string. Original command-line to create the CWL files from. Please see `Details` for more information.
`cwlVersion`	string. The version of the Common Workflow Language. More information here: https://www.commonwl.org/.
`class`	character. Name of Common Workflow Language Description class. The following is accepted: `CommandLineTool`.
`results_path`	Path to the results folder. Default is `results`.
`module_load`	string, Name of software to load by Environment Modules system. Default is "baseCommand", which creates a subfolder and two files: .cwl and .yml at `./param/cwl/`.
`file`	character. Name and path of output files. If set to "default" then the name of the output files will have the pattern `<software>.cwl` and `<software>.yml`, where <software> will be what `baseCommand(x)` returns, when `x` is an object of class `SYSargs2`. Also, it creates a subfolder at `./param/cwl/` with name <software>.
`syntaxVersion`	character. One of `"v1"` or `"v2"`, what CWL parsing syntax version to use. Default is to use the old version. The V2 version comes with more feature support, but has more syntax restrictions. See details.
`writeParamFiles`	logical. If set to TRUE, it will write to file the content of the CWL files:`.cwl` and `.yml`. Default is `TRUE`.
`confirm`	If set to `FALSE` and in an interactive section, it will prompt a question to proceed or not.
`overwrite`	logical. If set to TRUE, existing files of the same name will be overwritten. Default is `FALSE`.
`silent`	logical. If set to TRUE, all messages returned by the function will be suppressed. Default is `FALSE`.
`sysargs`	Object of class `SYSargs2`. Output from the `createParamFiles` function.

Details

Version 1 syntax

- First line of the command-line object will be treated as the baseCommand;

- For argument lines (starting from the second line), any word before the first space with leading '-' or '–' in each will be treated as a prefix, like -S or --min. Any line without this first word will be treated as no prefix;

- All defaults are placed inside <...>;

- First argument is the input argument type. F for "File", int for integer, string for string;

- Optional: use the keyword out followed the type with a , comma separation to indicate if this argument is also a CWL output;

- Then, use : to separate keywords and default values, any non-space value after the : will be treated as the default value;

- If any argument has no default value, just a flag, like --verbose, there no need to add any <...>.

- The \ is not required, however for consistency it is recommended to add.

Version 2 syntax

- First line of the command-line object will be treated as the baseCommand;

- Each line specifies one argument and its default value.

- Each line is composed with exact 2 ; to seprate 3 columns.

- Text before first ; will be will used as prefix/names. If it starts with keyword "p:", anything after "p:" and before the first ; will be used as prefix, and the name of this position will be the prefix but with leading dash(s) "-", "-" removed. If there is any duplication, a number index will be added to the end. If there is no keyword "p:" before first ;, all text before first ; will be the name.

- If there is keyword "p:" before first ; but nothing before and after the second ;, this position will be treated as CWL argument instead of input.

- Text between first and second ; is type. Must be one of File, Directory, string, int, double, float, long, boolean.

- Text after second ; and before \ or end of the line is the default value. If it starts with keyword "out" or "stdout", this position will also be added to outputs or standard output.

- There is only 1 position with "stdout" allowed and usually it is the last position arguement.

- Ending with "\" is recommended but not required.

Value

SYSargs2 object

Author(s)

Le Zhang and Daniela Cassol

References

For more details on CWL, please consult the following page: https://www.commonwl.org/

Examples

## syntax version 1  example
command <- "
hisat2 \
    -S <F, out: ./results/M1A.sam> \
    -x <F: ./data/tair10.fasta> \
    -k <int: 1> \
    -min-intronlen <int: 30> \
    -max-intronlen <int: 3000> \
    -threads <int: 4> \
    -U <F: ./data/SRR446027_1.fastq.gz> \
    --verbose
"
cmd <- createParam(command, writeParamFiles=FALSE)
cmdlist(cmd)

## syntax version 2  example
command2 <- '
mycmd2 \
    p: -s; File; sample1.txt \
    p: -s; File; sample2.txt \
    p: --c; ; \
    p: -o; File; out: myout.txt \
    ref_genome; File; a.fasta \
    p: --nn; int; 12 \
    mystdout; File; stdout: abc.txt
'
cmd2 <- createParam(command2, syntaxVersion = "v2", writeParamFiles=FALSE)
cmdlist(cmd2)

## syntax version 1  example
command <- "
hisat2 \
    -S <F, out: ./results/M1A.sam> \
    -x <F: ./data/tair10.fasta> \
    -k <int: 1> \
    -min-intronlen <int: 30> \
    -max-intronlen <int: 3000> \
    -threads <int: 4> \
    -U <F: ./data/SRR446027_1.fastq.gz> \
    --verbose
"
cmd <- createParam(command, writeParamFiles=FALSE)
cmdlist(cmd)

## syntax version 2  example
command2 <- '
mycmd2 \
    p: -s; File; sample1.txt \
    p: -s; File; sample2.txt \
    p: --c; ; \
    p: -o; File; out: myout.txt \
    ref_genome; File; a.fasta \
    p: --nn; int; 12 \
    mystdout; File; stdout: abc.txt
'
cmd2 <- createParam(command2, syntaxVersion = "v2", writeParamFiles=FALSE)
cmdlist(cmd2)

Update CWL description files

Description

Function to sync and download the most updated CWL description files from the systemPipeR package.

Usage

cwlFilesUpdate(destdir, force = FALSE, verbose = TRUE)
cwlFilesUpdate(destdir, force = FALSE, verbose = TRUE)

Arguments

`destdir`	a character string with the directory path where the param are stored.
`force`	logical. Force the download of the CWL files.
`verbose`	logical. The setting `FALSE` suppresses all print messages.

Author(s)

Daniela Cassol

Examples

## Not run: 
destdir <- "param/"
cwlFilesUpdate(destdir)

## End(Not run)
## Not run: 
destdir <- "param/"
cwlFilesUpdate(destdir)

## End(Not run)

Class `"EnvModules"`

Description

The function module enables use of the Environment Modules system (http://modules.sourceforge.net/) from within the R environment. By default the user's login shell environment (ie. bash -l) will be used to initialize the current session. The module function can also; load or unload specific software, list all the loaded software within the current session, and list all the applications available for loading from the module system. Lastly, the module function can remove all loaded software from the current session.

Objects from the Class

Objects can be created by calls of the form new("EnvModules", ...).

Slots

available_modules:: Object of class "list" ~~
loaded_modules:: Object of class "list" ~~
default_modules:: Object of class "list" ~~
modulecmd:: Object of class "character" ~~

Methods

[: signature(x = "EnvModules"): ...
[[: signature(x = "EnvModules", i = "ANY", j = "missing"): ...
[[<-: signature(x = "EnvModules"): ...
$: signature(x = "EnvModules"): ...
available_modules: signature(x = "EnvModules"): ...
coerce: signature(from = "EnvModules", to = "list"): ...
coerce: signature(from = "list", to = "EnvModules"): ...
default_modules: signature(x = "EnvModules"): ...
EnvModules: signature(x = "EnvModules"): ...
loaded_modules: signature(x = "EnvModules"): ...
modulecmd: signature(x = "EnvModules"): ...
names: signature(x = "EnvModules"): ...
show: signature(object = "EnvModules"): ...

Author(s)

Jordan Hayes and Daniela Cassol

Examples

showClass("EnvModules")
showClass("EnvModules")

Toggles option `eval` on the RMarkdown files

Description

Function to evaluate (eval=TRUE) or not evaluate (eval=FALSE) R chunk codes in the Rmarkdown file. This function does not run the code, just toggles between TRUE or FALSE the option eval and writes out a new file with the chosen option.

Usage

evalCode(infile, eval = TRUE, output)
evalCode(infile, eval = TRUE, output)

Arguments

`infile`	name and path of the infile file, format `Rmd`.
`eval`	whether to evaluate the code and include its results. The default is `TRUE`.
`output`	name and path of the output file. File format `Rmd`.

Value

Writes Rmarkdown file containing the exact copy of the infile file with the option choose on the eval argument. It will be easy to toggle between run all the R chunk codes or not.

Author(s)

Daniela Cassol

Examples

library(systemPipeRdata)
file <- system.file("extdata/workflows/rnaseq", "systemPipeRNAseq.Rmd", package="systemPipeRdata")
evalCode(infile=file, eval=FALSE, output=file.path(tempdir(), "test.Rmd"))
library(systemPipeRdata)
file <- system.file("extdata/workflows/rnaseq", "systemPipeRNAseq.Rmd", package="systemPipeRdata")
evalCode(infile=file, eval=FALSE, output=file.path(tempdir(), "test.Rmd"))

Genome read coverage by transcript models

Description

Computes read coverage along single and multi component features based on genomic alignments. The coverage segments of component features are spliced to continuous ranges, such as exons to transcripts or CDSs to ORFs. The results can be obtained with single nucleotide resolution (e.g. around start and stop codons) or as mean coverage of relative bin sizes, such as 100 bins for each feature. The latter allows comparisons of coverage trends among transcripts of variable length. The results can be obtained for single or many features (e.g. any number of transcritpts) at once. Visualization of the coverage results is facilitated by a downstream plotfeatureCoverage function.

Usage

featureCoverage(bfl, grl, resizereads = NULL, readlengthrange = NULL, Nbins = 20, 
                method = mean, fixedmatrix, resizefeatures, upstream, downstream, 
                outfile, overwrite = FALSE)

featureCoverage(bfl, grl, resizereads = NULL, readlengthrange = NULL, Nbins = 20, 
                method = mean, fixedmatrix, resizefeatures, upstream, downstream, 
                outfile, overwrite = FALSE)

Arguments

`bfl`	Paths to BAM files provided as `BamFileList` object. The name slot of the BAM files will be used for naming samples in the results.
`grl`	Genomic ranges provided as `GRangesList` typically generated form `txdb` instances with operations like: `cdsBy(txdb, "tx")` or `exonsBy(txdb, "tx")`. Single component features will be processed the same way as multi component features.
`resizereads`	Positive integer defining the length alignments should be resized to prior to the coverage calculation. `NULL` will omit the resizing step.
`readlengthrange`	Positive integer of length 2 determining the read length range to use for the coverage calculation. Reads falling outside of the specified length range will be excluded from the coverage calculation. For instance, `readlengthrange=c(30:40)` will base the coverage calculation on reads between 30 to 40 bps. Assigning `NULL` will skip this filtering step.
`Nbins`	Single positive integer defining the number of segments the coverage of each feature should be binned into in order to obtain coverage summaries of constant length, e.g. for plotting purposes.
`method`	Defines the summary statistics to use for binning. The default is `method=mean`.
`fixedmatrix`	If set to `TRUE`, a coverage matrix with single nucleotide resolution will be returned for any number of transcripts centered around precise anchor points in a genome annotation, such a stop/start codons or transcription start sites. For instance, a matrix with coverage information 20bps upstream and downstream of the stop/start codons can be obtained with `fixedmatrix=TRUE, upstream=20, downstream=20` along with a `grl` instance containing the CDS exon ranges required for this operation, e.g. generated with `cdsBy(txdb, "tx")`.
`resizefeatures`	Needs to be set to `TRUE` when `fixedmatrix=TRUE`. Internally, this will use the `systemPipeR::.resizeFeature` function to extend single and multi component features at their most left and most right end coordinates. The corresponding extension values are specified under the `upstream` and `downstream` arguments.
`upstream`	Single positive integer specifying the upstream extension length relative to the orientation of each feature in the genome. More details are given above.
`downstream`	Single positive integer specifying the downstream extension length relative to the orientation of each feature in the genome. More details are given above.
`outfile`	Default `NULL` omits writing of the results to a file. If a file name is specified then the results are written to a tabular file. If `bfl` contains the paths to several BAM files then the results will be appended to the same file where the first column specifies the sample labels. Redirecting the results to file is particularly useful when processing large files of many sample where computation times can be significant.
`overwrite`	If set to `TRUE` any existing file assigned to `outfile` will be overwritten.

Value

The function allows to return the following four distinct outputs. The settings to return these instances are illustrated below in the example section.

`(A)`	`data.frame` containing binned coverage where rows are features and columns coverage bins. The first four columns contain (i) the sample names, (ii) the number of total aligned reads in the corresponding BAM files (useful for normalization), (iii) the feature IDs, (iv) strand of the coverage. All following columns are numeric and contain the actual coverage data for the sense and antisense strand of each feature.
`(B)`	`data.frame` containing coverage with single nucleotide resolution around anchor points such as start and stop codons. The two matrix components are appended column-wise. To clearly distinguish the two data components, they are separated by a specialty column containing pipe characters. The first four columns are the same as described under (A). The column title for the anchor point is 0. For instance, if the features are CDSs then the first 0 corresponds to the first nucleotide of the start codon and the second 0 to the last nucleotide of the stop codon. Upstream and downstream positions are indicated by negative and positive column numbers, respectively.
`(C)`	`data.frame` containing combined results of (A) and (B) where the first set of columns contains to the coverage around the start codons, the second one the binned coverage of the CDSs and the third one the coverage around the stop codons separated by the same pipe columns mentioned under (B).
`(D)`	`Rle` list containing the nucleotide level coverage of each feature

Author(s)

Thomas Girke

Examples

## Construct SYSargs2 object from param and targets files 
targetspath <- system.file("extdata", "targets.txt", package="systemPipeR")
dir_path <- system.file("extdata/cwl", package="systemPipeR")
args <- loadWorkflow(targets=targetspath, wf_file="hisat2/hisat2-mapping-se.cwl", 
                  input_file="hisat2/hisat2-mapping-se.yml", dir_path=dir_path)
args <- renderWF(args, inputvars=c(FileName="_FASTQ_PATH1_", SampleName="_SampleName_"))
args

## Not run: 
## Features from sample data of systemPipeRdata package
library(txdbmaker)
file <- system.file("extdata/annotation", "tair10.gff", package="systemPipeRdata")
txdb <- makeTxDbFromGFF(file=file, format="gff3", organism="Arabidopsis")

targetspath <- system.file("extdata", "targets.txt", package="systemPipeR")
dir_path <- system.file("extdata/cwl", package="systemPipeR")
args <- loadWorkflow(targets=targetspath, wf_file="hisat2/hisat2-mapping-se.cwl", 
                  input_file="hisat2/hisat2-mapping-se.yml", dir_path=dir_path)
args <- renderWF(args, inputvars=c(FileName="_FASTQ_PATH1_", SampleName="_SampleName_"))
args <- runCommandline(args, make_bam = TRUE, dir = TRUE)
outpaths <- subsetWF(args , slot="output", subset=1, index=1)
file.exists(outpaths)

## (A) Generate binned coverage for two BAM files and 4 transcripts
grl <- cdsBy(txdb, "tx", use.names=TRUE)
fcov <- featureCoverage(bfl=BamFileList(outpaths[1:2]), grl=grl[1:4], resizereads=NULL, 
                    readlengthrange=NULL, Nbins=20, method=mean, fixedmatrix=FALSE, 
                    resizefeatures=TRUE, upstream=20, downstream=20,
                    outfile="results/featureCoverage.xls", overwrite=TRUE)
plotfeatureCoverage(covMA=fcov, method=mean, scales="fixed", scale_count_val=10^6)

## (B) Coverage matrix upstream and downstream of start/stop codons
fcov <- featureCoverage(bfl=BamFileList(outpaths[1:2]), grl=grl[1:4], resizereads=NULL, 
                    readlengthrange=NULL, Nbins=NULL, method=mean, fixedmatrix=TRUE, 
                    resizefeatures=TRUE, upstream=20, downstream=20, 
                    outfile="results/featureCoverage_up_down.xls", overwrite=TRUE)
plotfeatureCoverage(covMA=fcov, method=mean, scales="fixed", scale_count_val=10^6)

## (C) Combined matrix for both binned and start/stop codon 
fcov <- featureCoverage(bfl=BamFileList(outpaths[1:2]), grl=grl[1:4], resizereads=NULL, 
                    readlengthrange=NULL, Nbins=20, method=mean, fixedmatrix=TRUE, 
                    resizefeatures=TRUE, upstream=20, downstream=20, 
                    outfile="results/featureCoverage_binned.xls", overwrite=TRUE)
plotfeatureCoverage(covMA=fcov, method=mean, scales="fixed", scale_count_val=10^6)

## (D) Rle coverage objects one for each query feature
fcov <- featureCoverage(bfl=BamFileList(outpaths[1:2]), grl=grl[1:4], resizereads=NULL, 
                    readlengthrange=NULL, Nbins=NULL, method=mean, fixedmatrix=FALSE, 
                    resizefeatures=TRUE, upstream=20, downstream=20, 
                    outfile="results/featureCoverage_query.xls", overwrite=TRUE)

## End(Not run)
## Construct SYSargs2 object from param and targets files 
targetspath <- system.file("extdata", "targets.txt", package="systemPipeR")
dir_path <- system.file("extdata/cwl", package="systemPipeR")
args <- loadWorkflow(targets=targetspath, wf_file="hisat2/hisat2-mapping-se.cwl", 
                  input_file="hisat2/hisat2-mapping-se.yml", dir_path=dir_path)
args <- renderWF(args, inputvars=c(FileName="_FASTQ_PATH1_", SampleName="_SampleName_"))
args

## Not run: 
## Features from sample data of systemPipeRdata package
library(txdbmaker)
file <- system.file("extdata/annotation", "tair10.gff", package="systemPipeRdata")
txdb <- makeTxDbFromGFF(file=file, format="gff3", organism="Arabidopsis")

targetspath <- system.file("extdata", "targets.txt", package="systemPipeR")
dir_path <- system.file("extdata/cwl", package="systemPipeR")
args <- loadWorkflow(targets=targetspath, wf_file="hisat2/hisat2-mapping-se.cwl", 
                  input_file="hisat2/hisat2-mapping-se.yml", dir_path=dir_path)
args <- renderWF(args, inputvars=c(FileName="_FASTQ_PATH1_", SampleName="_SampleName_"))
args <- runCommandline(args, make_bam = TRUE, dir = TRUE)
outpaths <- subsetWF(args , slot="output", subset=1, index=1)
file.exists(outpaths)

## (A) Generate binned coverage for two BAM files and 4 transcripts
grl <- cdsBy(txdb, "tx", use.names=TRUE)
fcov <- featureCoverage(bfl=BamFileList(outpaths[1:2]), grl=grl[1:4], resizereads=NULL, 
                    readlengthrange=NULL, Nbins=20, method=mean, fixedmatrix=FALSE, 
                    resizefeatures=TRUE, upstream=20, downstream=20,
                    outfile="results/featureCoverage.xls", overwrite=TRUE)
plotfeatureCoverage(covMA=fcov, method=mean, scales="fixed", scale_count_val=10^6)

## (B) Coverage matrix upstream and downstream of start/stop codons
fcov <- featureCoverage(bfl=BamFileList(outpaths[1:2]), grl=grl[1:4], resizereads=NULL, 
                    readlengthrange=NULL, Nbins=NULL, method=mean, fixedmatrix=TRUE, 
                    resizefeatures=TRUE, upstream=20, downstream=20, 
                    outfile="results/featureCoverage_up_down.xls", overwrite=TRUE)
plotfeatureCoverage(covMA=fcov, method=mean, scales="fixed", scale_count_val=10^6)

## (C) Combined matrix for both binned and start/stop codon 
fcov <- featureCoverage(bfl=BamFileList(outpaths[1:2]), grl=grl[1:4], resizereads=NULL, 
                    readlengthrange=NULL, Nbins=20, method=mean, fixedmatrix=TRUE, 
                    resizefeatures=TRUE, upstream=20, downstream=20, 
                    outfile="results/featureCoverage_binned.xls", overwrite=TRUE)
plotfeatureCoverage(covMA=fcov, method=mean, scales="fixed", scale_count_val=10^6)

## (D) Rle coverage objects one for each query feature
fcov <- featureCoverage(bfl=BamFileList(outpaths[1:2]), grl=grl[1:4], resizereads=NULL, 
                    readlengthrange=NULL, Nbins=NULL, method=mean, fixedmatrix=FALSE, 
                    resizefeatures=TRUE, upstream=20, downstream=20, 
                    outfile="results/featureCoverage_query.xls", overwrite=TRUE)

## End(Not run)

Plot read distribution across genomic features

Description

Counts how many reads in short read alignment files (BAM format) overlap with entire annotation categories. This utility is useful for analyzing the distribution of the read mappings across feature types, e.g. coding versus non-coding genes. By default the read counts are reported for the sense and antisense strand of each feature type separately. To minimize memory consumption, the BAM files are processed in a stream using utilities from the Rsamtools and GenomicAlignment packages. The counts can be reported for each read length separately or as a single value for reads of any length. Subsequently, the counting results can be plotted with the associated plotfeaturetypeCounts function.

Usage

featuretypeCounts(bfl, grl, singleEnd = TRUE, readlength = NULL, type = "data.frame")
featuretypeCounts(bfl, grl, singleEnd = TRUE, readlength = NULL, type = "data.frame")

Arguments

`bfl`	`BamFileList` object containing the paths to the target BAM files stored on disk. Note, memory-efficient processing is achieved by streaming through BAM files rather than reading entire files into memory at once. The maximum number of alignments to process in each iteration is determined by the `yieldSize` value passed on to the `BamFileList` function. For details see `?BamFileList`.
`grl`	`GRangesList` object containing in each list component the range set of a feature type. Typically, this object is generated with the `genFeatures` function. For details see the example section below and `?genFeatures`.
`singleEnd`	Specifies whether the targets BAM files contain alignments for single-end (SE) or paired-end read data. `TRUE` is for SE and `FALSE` for PE data.
`readlength`	Integer vector specifying the read length values for which to report counts separately. If `readlength=NULL` the length of the reads will be ignored resulting in a single value for each feature type and strand. Note, for PE data the two reads in a pair may differ in length. In those cases the length of the two reads is averaged and then assigned to the corresponding length category after rounding the mean length to the closest integer. This is not an ideal solution but a reasonable compromise for the purpose of the summary statistics generated by `featuretypeCounts`.
`type`	Determines whether the results are returned as `data.frame` (`type="data.frame"`) or as `list` (`type="list"`). Each list component contains the counting results for one BAM file and is named after the corresponding sample. The `data.frame` result contains this sample assignment information in a separate column.

Value

The results are returned as data.frame or list of data.frames. For details see above under types argument. The result data.frames contain the following columns in the given order:

`SampleName`	Sample names obtained from `BamFileList` object.
`Strand`	Sense or antisense strand of read mappings.
`Featuretype`	Name of feature type provided by `GRangesList` object. Note, the total number of aligned reads is reported under the special feature type 'N_total_aligned'. This value is useful for scaling/normalization purposes in plots, e.g. counts per million reads.
`Featuretypelength`	Total genomic length of each reduced feature type in bases. This value is useful to normalize the read counts by genomic length units, e.g. in plots.
`Subsequent columns`	Counts for reads of any length or for individual read lengths.

Author(s)

Thomas Girke

Examples

## Construct SYSargs2 object from param and targets files 
targets <- system.file("extdata", "targets.txt", package="systemPipeR")
dir_path <- system.file("extdata/cwl", package="systemPipeR")
args <- loadWorkflow(targets=targets, wf_file="hisat2/hisat2-mapping-se.cwl", 
                  input_file="hisat2/hisat2-mapping-se.yml", dir_path=dir_path)
args <- renderWF(args, inputvars=c(FileName="_FASTQ_PATH1_", SampleName="_SampleName_"))
args

## Not run: 
## Run alignments
args <- runCommandline(args, dir = FALSE, make_bam = TRUE)
outpaths <- subsetWF(args, slot = "output", subset = 1, index = 1)

## Features from sample data of systemPipeRdata package
library(txdbmaker)
file <- system.file("extdata/annotation", "tair10.gff", package="systemPipeRdata")
txdb <- makeTxDbFromGFF(file=file, format="gff3", organism="Arabidopsis")
feat <- genFeatures(txdb, featuretype="all", reduce_ranges=TRUE, upstream=1000, downstream=0, verbose=TRUE)

## Generate and plot feature counts for specific read lengths
fc <- featuretypeCounts(bfl=BamFileList(outpaths, yieldSize=50000), grl=feat, singleEnd=TRUE, readlength=c(74:76,99:102), type="data.frame")
p <- plotfeaturetypeCounts(x=fc, graphicsfile="featureCounts.pdf", graphicsformat="pdf", scales="fixed", anyreadlength=FALSE)

## Generate and plot feature counts for any read length  
fc2 <- featuretypeCounts(bfl=BamFileList(outpaths, yieldSize=50000), grl=feat, singleEnd=TRUE, readlength=NULL, type="data.frame")
p2 <- plotfeaturetypeCounts(x=fc2, graphicsfile="featureCounts2.pdf", graphicsformat="pdf", scales="fixed", anyreadlength=TRUE)

## End(Not run)
## Construct SYSargs2 object from param and targets files 
targets <- system.file("extdata", "targets.txt", package="systemPipeR")
dir_path <- system.file("extdata/cwl", package="systemPipeR")
args <- loadWorkflow(targets=targets, wf_file="hisat2/hisat2-mapping-se.cwl", 
                  input_file="hisat2/hisat2-mapping-se.yml", dir_path=dir_path)
args <- renderWF(args, inputvars=c(FileName="_FASTQ_PATH1_", SampleName="_SampleName_"))
args

## Not run: 
## Run alignments
args <- runCommandline(args, dir = FALSE, make_bam = TRUE)
outpaths <- subsetWF(args, slot = "output", subset = 1, index = 1)

## Features from sample data of systemPipeRdata package
library(txdbmaker)
file <- system.file("extdata/annotation", "tair10.gff", package="systemPipeRdata")
txdb <- makeTxDbFromGFF(file=file, format="gff3", organism="Arabidopsis")
feat <- genFeatures(txdb, featuretype="all", reduce_ranges=TRUE, upstream=1000, downstream=0, verbose=TRUE)

## Generate and plot feature counts for specific read lengths
fc <- featuretypeCounts(bfl=BamFileList(outpaths, yieldSize=50000), grl=feat, singleEnd=TRUE, readlength=c(74:76,99:102), type="data.frame")
p <- plotfeaturetypeCounts(x=fc, graphicsfile="featureCounts.pdf", graphicsformat="pdf", scales="fixed", anyreadlength=FALSE)

## Generate and plot feature counts for any read length  
fc2 <- featuretypeCounts(bfl=BamFileList(outpaths, yieldSize=50000), grl=feat, singleEnd=TRUE, readlength=NULL, type="data.frame")
p2 <- plotfeaturetypeCounts(x=fc2, graphicsfile="featureCounts2.pdf", graphicsformat="pdf", scales="fixed", anyreadlength=TRUE)

## End(Not run)

Filter and plot DEG results

Description

Filters and plots DEG results for a given set of sample comparisons. The gene idenifiers of all (i) Up_or_Down, (ii) Up and (iii) Down regulated genes are stored as separate list components, while the corresponding summary statistics, stored in a fourth list component, is plotted in form of a stacked bar plot.

Usage

filterDEGs(degDF, filter, plot = TRUE)
filterDEGs(degDF, filter, plot = TRUE)

Arguments

`degDF`	`data.frame` generated by `run_edgeR`
`filter`	Named vector with filter cutoffs of format `c(Fold=2, FDR=1)` where `Fold` refers to the fold change cutoff (unlogged) and `FDR` to the p-value cutoff.
`plot`	Allows to turn plotting behavior on and off with default set to `TRUE`.

Details

Currently, there is no community standard available how to calculate fold changes (here logFC) of genomic ranges, such as gene or feature ranges, to unambiguously refer to them as features with increased or decreased read abundandce; or in case of gene expression experiments to up or down regulated genes, respectively. To be consistent within systemPipeR, the corresponding functions, such as filterDEGs, use here the following definition. Genomic ranges with positive logFC values are classified as up and those with negative logFC values as down. This means if a comparison among two samples a and b is specified in the corresponding targets file as a-b then the feature with a positive logFC has a higher _normalized_ read count value in sample a than in sample b, and vice versa. To inverse this assignment, users want to change the specification of their chosen sample comparison(s) in the targets file accordingly, e.g. change a-b to b-a. Alternatively, one can swap the column order of the matrix assigned to the cmp argument of the run_edgeR or run_DESeq2 functions. Users should also be aware that for logFC values close to zero (noise range), the direction of the fold change (sign of logFC) can be very sensitive to minor differences in the normalization method, while this assignment is much more robust for more pronounced changes or higher absolute logFC values.

Value

Returns list with four components

`UporDown`	List of up or down regulated gene/transcript indentifiers meeting the chosen filter settings for all comparisons defined in data frames `pval` and `log2FC`.
`Up`	Same as above but only for up regulated genes/transcript.
`Down`	Same as above but only for down regulated genes/transcript.

Author(s)

Thomas Girke

Examples

targetspath <- system.file("extdata", "targets.txt", package="systemPipeR")
targets <- read.delim(targetspath, comment.char = "#")
cmp <- readComp(file=targetspath, format="matrix", delim="-")
countfile <- system.file("extdata", "countDFeByg.xls", package="systemPipeR")
countDF <- read.delim(countfile, row.names=1)
edgeDF <- run_edgeR(countDF=countDF, targets=targets, cmp=cmp[[1]], independent=FALSE, mdsplot="")
pval <- edgeDF[, grep("_FDR$", colnames(edgeDF)), drop=FALSE]
fold <- edgeDF[, grep("_logFC$", colnames(edgeDF)), drop=FALSE]
DEG_list <- filterDEGs(degDF=edgeDF, filter=c(Fold=2, FDR=10))
names(DEG_list)
DEG_list$Summary
targetspath <- system.file("extdata", "targets.txt", package="systemPipeR")
targets <- read.delim(targetspath, comment.char = "#")
cmp <- readComp(file=targetspath, format="matrix", delim="-")
countfile <- system.file("extdata", "countDFeByg.xls", package="systemPipeR")
countDF <- read.delim(countfile, row.names=1)
edgeDF <- run_edgeR(countDF=countDF, targets=targets, cmp=cmp[[1]], independent=FALSE, mdsplot="")
pval <- edgeDF[, grep("_FDR$", colnames(edgeDF)), drop=FALSE]
fold <- edgeDF[, grep("_logFC$", colnames(edgeDF)), drop=FALSE]
DEG_list <- filterDEGs(degDF=edgeDF, filter=c(Fold=2, FDR=10))
names(DEG_list)
DEG_list$Summary

Filter VCF files

Description

Convenience function for filtering VCF files based on user definable quality parameters. The function imports each VCF file into R, applies the filtering on an internally generated VRanges object and then writes the results to a new VCF file.

Usage

filterVars(files, filter, varcaller="gatk", organism,
            out_dir="results")
filterVars(files, filter, varcaller="gatk", organism,
            out_dir="results")

Arguments

`files`	`named character vector` with paths of the input VCF files.
`filter`	Character vector of length one specifying the filter syntax that will be applied to the internally created `VRanges` object.
`varcaller`	Character vector of length one specifying the variant caller used for generating the input VCFs. Currently, this argument can be assigned 'gatk', 'bcftools' or 'vartools'.
`organism`	Character vector specifying the organism name of the reference genome.
`out_dir`	Character vector of a `results` directory name. Default is `results`.

Value

Output files in VCF format. Their paths can be obtained with outpaths(args) or output(args).

Author(s)

Thomas Girke

Examples

## Alignment with BWA (sequentially on single machine)
param <- system.file("extdata", "bwa.param", package="systemPipeR")
targets <- system.file("extdata", "targets.txt", package="systemPipeR")
args <- systemArgs(sysma=param, mytargets=targets)
sysargs(args)[1]

## Not run: 
library(VariantAnnotation)
system("bwa index -a bwtsw ./data/tair10.fasta")
bampaths <- runCommandline(args=args)

## Alignment with BWA (parallelized on compute cluster)
resources <- list(walltime="20:00:00", nodes=paste0("1:ppn=", cores(args)), memory="10gb")
reg <- clusterRun(args, conffile=".BatchJobs.R", template="torque.tmpl", Njobs=18, runid="01",
                  resourceList=resources)

## Variant calling with GATK
## The following creates in the inital step a new targets file
## (targets_bam.txt). The first column of this file gives the paths to
## the BAM files created in the alignment step. The new targets file and the
## parameter file gatk.param are used to create a new SYSargs
## instance for running GATK. Since GATK involves many processing steps, it is
## executed by a bash script gatk_run.sh where the user can specify the
## detailed run parameters. All three files are expected to be located in the
## current working directory. Samples files for gatk.param and
## gatk_run.sh are available in the subdirectory ./inst/extdata/ of the 
## source file of the systemPipeR package. 
writeTargetsout(x=args, file="targets_bam.txt")
system("java -jar CreateSequenceDictionary.jar R=./data/tair10.fasta O=./data/tair10.dict")
# system("java -jar /opt/picard/1.81/CreateSequenceDictionary.jar R=./data/tair10.fasta O=./data/tair10.dict")
args <- systemArgs(sysma="gatk.param", mytargets="targets_bam.txt")
resources <- list(walltime="20:00:00", nodes=paste0("1:ppn=", 1), memory="10gb")
reg <- clusterRun(args, conffile=".BatchJobs.R", template="torque.tmpl", Njobs=18, runid="01",
                  resourceList=resources)
writeTargetsout(x=args, file="targets_gatk.txt")

## Variant calling with BCFtools
## The following runs the variant calling with BCFtools. This step requires in
## the current working directory the parameter file sambcf.param and the 
## bash script sambcf_run.sh.
args <- systemArgs(sysma="sambcf.param", mytargets="targets_bam.txt")
resources <- list(walltime="20:00:00", nodes=paste0("1:ppn=", 1), memory="10gb")
reg <- clusterRun(args, conffile=".BatchJobs.R", template="torque.tmpl", Njobs=18, runid="01",
                  resourceList=resources)
writeTargetsout(x=args, file="targets_sambcf.txt")

## Filtering of VCF files generated by GATK
args <- systemArgs(sysma="filter_gatk.param", mytargets="targets_gatk.txt")
filter <- "totalDepth(vr) >= 2 & (altDepth(vr) / totalDepth(vr) >= 0.8) & rowSums(softFilterMatrix(vr))==4"
# filter <- "totalDepth(vr) >= 20 & (altDepth(vr) / totalDepth(vr) >= 0.8) & rowSums(softFilterMatrix(vr))==6"
filterVars(args, filter, varcaller="gatk", organism="A. thaliana")
writeTargetsout(x=args, file="targets_gatk_filtered.txt")

## Filtering of VCF files generated by BCFtools
args <- systemArgs(sysma="filter_sambcf.param", mytargets="targets_sambcf.txt")
filter <- "rowSums(vr) >= 2 & (rowSums(vr[,3:4])/rowSums(vr[,1:4]) >= 0.8)"
# filter <- "rowSums(vr) >= 20 & (rowSums(vr[,3:4])/rowSums(vr[,1:4]) >= 0.8)"
filterVars(args, filter, varcaller="bcftools", organism="A. thaliana")
writeTargetsout(x=args, file="targets_sambcf_filtered.txt")

## Annotate filtered variants from GATK
args <- systemArgs(sysma="annotate_vars.param", mytargets="targets_gatk_filtered.txt")
txdb <- loadDb("./data/tair10.sqlite")
fa <- FaFile(systemPipeR::reference(args))
variantReport(args=args, txdb=txdb, fa=fa, organism="A. thaliana")

## Annotate filtered variants from BCFtools
args <- systemArgs(sysma="annotate_vars.param", mytargets="targets_sambcf_filtered.txt")
txdb <- loadDb("./data/tair10.sqlite")
fa <- FaFile(systemPipeR::reference(args))
variantReport(args=args, txdb=txdb, fa=fa, organism="A. thaliana")

## Combine results from GATK
args <- systemArgs(sysma="annotate_vars.param", mytargets="targets_gatk_filtered.txt")
combineDF <- combineVarReports(args, filtercol=c(Consequence="nonsynonymous"))
write.table(combineDF, "./results/combineDF_nonsyn_gatk.xls", quote=FALSE, row.names=FALSE, sep="\t")

## Combine results from BCFtools
args <- systemArgs(sysma="annotate_vars.param", mytargets="targets_sambcf_filtered.txt")
combineDF <- combineVarReports(args, filtercol=c(Consequence="nonsynonymous"))
write.table(combineDF, "./results/combineDF_nonsyn_sambcf.xls", quote=FALSE, row.names=FALSE, sep="\t")

## Summary for GATK
args <- systemArgs(sysma="annotate_vars.param", mytargets="targets_gatk_filtered.txt")
write.table(varSummary(args), "./results/variantStats_gatk.xls", quote=FALSE, col.names = NA, sep="\t")

## Summary for BCFtools
args <- systemArgs(sysma="annotate_vars.param", mytargets="targets_sambcf_filtered.txt")
write.table(varSummary(args), "./results/variantStats_sambcf.xls", quote=FALSE, col.names = NA, sep="\t")

## Venn diagram of variants
args <- systemArgs(sysma="annotate_vars.param", mytargets="targets_gatk_filtered.txt")
varlist <- sapply(names(outpaths(args))[1:4], function(x) as.character(read.delim(outpaths(args)[x])$VARID))
vennset_gatk <- overLapper(varlist, type="vennsets")
args <- systemArgs(sysma="annotate_vars.param", mytargets="targets_sambcf_filtered.txt")
varlist <- sapply(names(outpaths(args))[1:4], function(x) as.character(read.delim(outpaths(args)[x])$VARID))
vennset_bcf <- overLapper(varlist, type="vennsets")
vennPlot(list(vennset_gatk, vennset_bcf), mymain="", mysub="GATK: red; BCFtools: blue", colmode=2, ccol=c("blue", "red"))

## End(Not run)
## Alignment with BWA (sequentially on single machine)
param <- system.file("extdata", "bwa.param", package="systemPipeR")
targets <- system.file("extdata", "targets.txt", package="systemPipeR")
args <- systemArgs(sysma=param, mytargets=targets)
sysargs(args)[1]

## Not run: 
library(VariantAnnotation)
system("bwa index -a bwtsw ./data/tair10.fasta")
bampaths <- runCommandline(args=args)

## Alignment with BWA (parallelized on compute cluster)
resources <- list(walltime="20:00:00", nodes=paste0("1:ppn=", cores(args)), memory="10gb")
reg <- clusterRun(args, conffile=".BatchJobs.R", template="torque.tmpl", Njobs=18, runid="01",
                  resourceList=resources)

## Variant calling with GATK
## The following creates in the inital step a new targets file
## (targets_bam.txt). The first column of this file gives the paths to
## the BAM files created in the alignment step. The new targets file and the
## parameter file gatk.param are used to create a new SYSargs
## instance for running GATK. Since GATK involves many processing steps, it is
## executed by a bash script gatk_run.sh where the user can specify the
## detailed run parameters. All three files are expected to be located in the
## current working directory. Samples files for gatk.param and
## gatk_run.sh are available in the subdirectory ./inst/extdata/ of the 
## source file of the systemPipeR package. 
writeTargetsout(x=args, file="targets_bam.txt")
system("java -jar CreateSequenceDictionary.jar R=./data/tair10.fasta O=./data/tair10.dict")
# system("java -jar /opt/picard/1.81/CreateSequenceDictionary.jar R=./data/tair10.fasta O=./data/tair10.dict")
args <- systemArgs(sysma="gatk.param", mytargets="targets_bam.txt")
resources <- list(walltime="20:00:00", nodes=paste0("1:ppn=", 1), memory="10gb")
reg <- clusterRun(args, conffile=".BatchJobs.R", template="torque.tmpl", Njobs=18, runid="01",
                  resourceList=resources)
writeTargetsout(x=args, file="targets_gatk.txt")

## Variant calling with BCFtools
## The following runs the variant calling with BCFtools. This step requires in
## the current working directory the parameter file sambcf.param and the 
## bash script sambcf_run.sh.
args <- systemArgs(sysma="sambcf.param", mytargets="targets_bam.txt")
resources <- list(walltime="20:00:00", nodes=paste0("1:ppn=", 1), memory="10gb")
reg <- clusterRun(args, conffile=".BatchJobs.R", template="torque.tmpl", Njobs=18, runid="01",
                  resourceList=resources)
writeTargetsout(x=args, file="targets_sambcf.txt")

## Filtering of VCF files generated by GATK
args <- systemArgs(sysma="filter_gatk.param", mytargets="targets_gatk.txt")
filter <- "totalDepth(vr) >= 2 & (altDepth(vr) / totalDepth(vr) >= 0.8) & rowSums(softFilterMatrix(vr))==4"
# filter <- "totalDepth(vr) >= 20 & (altDepth(vr) / totalDepth(vr) >= 0.8) & rowSums(softFilterMatrix(vr))==6"
filterVars(args, filter, varcaller="gatk", organism="A. thaliana")
writeTargetsout(x=args, file="targets_gatk_filtered.txt")

## Filtering of VCF files generated by BCFtools
args <- systemArgs(sysma="filter_sambcf.param", mytargets="targets_sambcf.txt")
filter <- "rowSums(vr) >= 2 & (rowSums(vr[,3:4])/rowSums(vr[,1:4]) >= 0.8)"
# filter <- "rowSums(vr) >= 20 & (rowSums(vr[,3:4])/rowSums(vr[,1:4]) >= 0.8)"
filterVars(args, filter, varcaller="bcftools", organism="A. thaliana")
writeTargetsout(x=args, file="targets_sambcf_filtered.txt")

## Annotate filtered variants from GATK
args <- systemArgs(sysma="annotate_vars.param", mytargets="targets_gatk_filtered.txt")
txdb <- loadDb("./data/tair10.sqlite")
fa <- FaFile(systemPipeR::reference(args))
variantReport(args=args, txdb=txdb, fa=fa, organism="A. thaliana")

## Annotate filtered variants from BCFtools
args <- systemArgs(sysma="annotate_vars.param", mytargets="targets_sambcf_filtered.txt")
txdb <- loadDb("./data/tair10.sqlite")
fa <- FaFile(systemPipeR::reference(args))
variantReport(args=args, txdb=txdb, fa=fa, organism="A. thaliana")

## Combine results from GATK
args <- systemArgs(sysma="annotate_vars.param", mytargets="targets_gatk_filtered.txt")
combineDF <- combineVarReports(args, filtercol=c(Consequence="nonsynonymous"))
write.table(combineDF, "./results/combineDF_nonsyn_gatk.xls", quote=FALSE, row.names=FALSE, sep="\t")

## Combine results from BCFtools
args <- systemArgs(sysma="annotate_vars.param", mytargets="targets_sambcf_filtered.txt")
combineDF <- combineVarReports(args, filtercol=c(Consequence="nonsynonymous"))
write.table(combineDF, "./results/combineDF_nonsyn_sambcf.xls", quote=FALSE, row.names=FALSE, sep="\t")

## Summary for GATK
args <- systemArgs(sysma="annotate_vars.param", mytargets="targets_gatk_filtered.txt")
write.table(varSummary(args), "./results/variantStats_gatk.xls", quote=FALSE, col.names = NA, sep="\t")

## Summary for BCFtools
args <- systemArgs(sysma="annotate_vars.param", mytargets="targets_sambcf_filtered.txt")
write.table(varSummary(args), "./results/variantStats_sambcf.xls", quote=FALSE, col.names = NA, sep="\t")

## Venn diagram of variants
args <- systemArgs(sysma="annotate_vars.param", mytargets="targets_gatk_filtered.txt")
varlist <- sapply(names(outpaths(args))[1:4], function(x) as.character(read.delim(outpaths(args)[x])$VARID))
vennset_gatk <- overLapper(varlist, type="vennsets")
args <- systemArgs(sysma="annotate_vars.param", mytargets="targets_sambcf_filtered.txt")
varlist <- sapply(names(outpaths(args))[1:4], function(x) as.character(read.delim(outpaths(args)[x])$VARID))
vennset_bcf <- overLapper(varlist, type="vennsets")
vennPlot(list(vennset_gatk, vennset_bcf), mymain="", mysub="GATK: red; BCFtools: blue", colmode=2, ccol=c("blue", "red"))

## End(Not run)

Generate feature ranges from TxDb

Description

Function to generate a variety of feature types from TxDb objects using utilities provided by the GenomicFeatures package. The feature types are organized per gene and can be returned on that level in their non-reduced or reduced form.

Currently, supported features include intergenic, promoter, intron, exon, cds, 5'/3'UTR and different transcript types. The latter contains as many transcript types as available in the tx_type column when extracting transcripts from TxDb objects as follows: transcripts(txdb, c("tx_name", "gene_id", "tx_type"))

Usage

genFeatures(txdb, featuretype = "all", reduce_ranges, upstream = 1000, downstream = 0, verbose = TRUE)
genFeatures(txdb, featuretype = "all", reduce_ranges, upstream = 1000, downstream = 0, verbose = TRUE)

Arguments

`txdb`	`TxDb` object
`featuretype`	Feature types can be specified by assigning a `character` vector containing any of the following: c("tx_type", "promoter", "intron", "exon", "cds", "fiveUTR", "threeUTR", "intergenic"). The default `all` is a shorthand to select all supported features.
`reduce_ranges`	If set to `TRUE` the feature ranges will be reduced on the gene level. As a result overlapping feature components of the same type and from the same gene will be merged to a single range, e.g. two overlapping exons from the same gene are merged to one. Intergenic ranges are not affected by this setting. Note, all reduced feature types are labeled with the suffix '_red'.
`upstream`	Defines for promoter features the number of bases upstream from the transcription start site.
`downstream`	Defines for promoter features the number of bases downstream from the transcription start site.
`verbose`	`verbose=FALSE` turns off all print messages.

Value

The results are returned as a GRangesList where each component is a GRanges object containing the range set of each feature type. Intergenic ranges are assigned unique identifiers and recorded in the featuretype_id column of the metadata block. For this the ids of their adjacent genes are concatenated with two underscores as separator. If the adjacent genes overlap with other genes then their identifiers are included in the id string as well and separated by a single underscore.

Author(s)

Thomas Girke

Examples

## Sample from txdbmaker package
library(txdbmaker)
gffFile <- system.file("extdata", "GFF3_files", "a.gff3", package="txdbmaker")
txdb <- makeTxDbFromGFF(file=gffFile, format="gff3", organism="Solanum lycopersicum")
feat <- genFeatures(txdb, featuretype="all", reduce_ranges=FALSE, upstream=1000, downstream=0)

## List extracted feature types
names(feat)

## Obtain feature lists by genes, here for promoter
split(feat$promoter, unlist(mcols(feat$promoter)$feature_by))

## Return all features in single GRanges object
unlist(feat)

## Not run: 
## Sample from systemPipeRdata package
file <- system.file("extdata/annotation", "tair10.gff", package="systemPipeRdata")
txdb <- makeTxDbFromGFF(file=file, format="gff3", organism="Arabidopsis")
feat <- genFeatures(txdb, featuretype="all", reduce_ranges=TRUE, upstream=1000, downstream=0)

## End(Not run)
## Sample from txdbmaker package
library(txdbmaker)
gffFile <- system.file("extdata", "GFF3_files", "a.gff3", package="txdbmaker")
txdb <- makeTxDbFromGFF(file=gffFile, format="gff3", organism="Solanum lycopersicum")
feat <- genFeatures(txdb, featuretype="all", reduce_ranges=FALSE, upstream=1000, downstream=0)

## List extracted feature types
names(feat)

## Obtain feature lists by genes, here for promoter
split(feat$promoter, unlist(mcols(feat$promoter)$feature_by))

## Return all features in single GRanges object
unlist(feat)

## Not run: 
## Sample from systemPipeRdata package
file <- system.file("extdata/annotation", "tair10.gff", package="systemPipeRdata")
txdb <- makeTxDbFromGFF(file=file, format="gff3", organism="Arabidopsis")
feat <- genFeatures(txdb, featuretype="all", reduce_ranges=TRUE, upstream=1000, downstream=0)

## End(Not run)

GO term enrichment analysis for large numbers of gene sets

Description

To test a sample population of genes for over-representation of GO terms, the core function GOHyperGAll computes for all nodes in the three GO networks (BP, CC and MF) an enrichment test based on the hypergeometric distribution and returns the corresponding raw and Bonferroni corrected p-values. Subsequently, a filter function supports GO Slim analyses using default or custom GO Slim categories. Several convenience functions are provided to process large numbers of gene sets (e.g. clusters from partitioning results) and to visualize the results.

Note: GOHyperGAll provides similar utilities as the GOHyperG function in the GOstats package. The main difference is that GOHyperGAll simplifies processing of large numbers of gene sets, as well as the usage of custom array-to-gene and gene-to-GO mappings.

Usage

## Generate gene-to-GO mappings and store as catDB object 
makeCATdb(myfile, lib = NULL, org = "", colno = c(1, 2, 3), idconv = NULL, 
            rootUK=FALSE)

## Enrichment function
GOHyperGAll(catdb, gocat = "MF", sample, Nannot = 2)

## GO slim analysis
GOHyperGAll_Subset(catdb, GOHyperGAll_result, sample = test_sample, 
            type = "goSlim", myslimv)

## Reduce GO term redundancy 
GOHyperGAll_Simplify(GOHyperGAll_result, gocat = "MF", cutoff = 0.001, correct = TRUE)

## Batch analysis of many gene sets
GOCluster_Report(catdb, setlist, id_type = "affy", method = "all", CLSZ = 10, 
            cutoff = 0.001, gocats = c("MF", "BP", "CC"), myslimv = "default",
            correct = TRUE, recordSpecGO = NULL, ...)

## Bar plot of GOCluster_Report results
goBarplot(GOBatchResult, gocat)
## Generate gene-to-GO mappings and store as catDB object 
makeCATdb(myfile, lib = NULL, org = "", colno = c(1, 2, 3), idconv = NULL, 
            rootUK=FALSE)

## Enrichment function
GOHyperGAll(catdb, gocat = "MF", sample, Nannot = 2)

## GO slim analysis
GOHyperGAll_Subset(catdb, GOHyperGAll_result, sample = test_sample, 
            type = "goSlim", myslimv)

## Reduce GO term redundancy 
GOHyperGAll_Simplify(GOHyperGAll_result, gocat = "MF", cutoff = 0.001, correct = TRUE)

## Batch analysis of many gene sets
GOCluster_Report(catdb, setlist, id_type = "affy", method = "all", CLSZ = 10, 
            cutoff = 0.001, gocats = c("MF", "BP", "CC"), myslimv = "default",
            correct = TRUE, recordSpecGO = NULL, ...)

## Bar plot of GOCluster_Report results
goBarplot(GOBatchResult, gocat)

Arguments

`myfile`	File with gene-to-GO mappings. Sample files can be downloaded from geneontology.org (http://geneontology.org/GO.downloads.annotations.shtml) or from BioMart as shown in example below.
`colno`	Column numbers referencing in `myfile` the three target columns containing GOID, GeneID and GOCAT, in that order.
`org`	Optional argument. Currently, the only valid option is `org="Arabidopsis"` to get rid of transcript duplications in this particular annotation.
`lib`	If the gene-to-GO mappings are obtained from a `*.db` package from Bioconductor then the package name can be specified under the `lib` argument of the `sampleDFgene2GO` function.
`idconv`	Optional id conversion `data.frame`
`catdb`	`catdb` object storing mappings of genes to annotation categories. For details, see ?"SYSargs-class".
`rootUK`	If the argument `rootUK` is set to `TRUE` then the root nodes are treated as terminal nodes to account for the new unknown terms.
`sample`	`character vector` containing the test set of gene identifiers
`Nannot`	Defines the minimum number of direct annotations per GO node from the sample set to determine the number of tested hypotheses for the p-value adjustment.
`gocat`	Specifies the GO type, can be assigned one of the following character values: "MF", "BP" and "CC".
`GOHyperGAll_result`	`data.frame` generated by `GOHyperGAll`
`type`	The function `GOHyperGAll_Subset` subsets the `GOHyperGAll` results by directly assigned GO nodes or custom `goSlim` categories. The argument `type` can be assigned the values `goSlim` or `assigned`.
`myslimv`	optional argument to provide custom `goSlim` vector
`cutoff`	p-value cutoff for GO terms to show in result `data.frame`
`correct`	If `TRUE` the function will favor the selection of terminal (informationich) GO terms that have at the same time a large number of sample matches.
`setlist`	`list` of `character vectors` containing gene IDs (or array feature IDs). The names of the `list` components correspond to the set labels, e.g. DEG comparisons or cluster IDs.
`id_type`	specifies type of IDs in input, can be assigned `gene` or `affy`
`method`	Specifies analysis type. Current options are `all` for `GOHyperGAll`, `slim` for `GOHyperGAll_Subset` or `simplify` for `GOHyperGAll_Simplify`.
`CLSZ`	minimum gene set (cluster) size to consider. Gene sets below this cutoff will be ignored.
`gocats`	Specifies GO type, can be assigned the values "MF", "BP" and "CC".
`recordSpecGO`	argument to report in the result `data.frame` specific GO IDs for any of the 3 ontologies disregarding whether they meet the specified p-value `cutoff`, e.g: `recordSpecGO=c("GO:0003674", "GO:0008150", "GO:0005575")`
`GOBatchResult`	`data.frame` generated by `GOCluster_Report`
`...`	additional arguments to pass on

Details

GOHyperGAll_Simplify: The result data frame from GOHyperGAll will often contain several connected GO terms with significant scores which can complicate the interpretation of large sample sets. To reduce this redundancy, the function GOHyperGAll_Simplify subsets the data frame by a user specified p-value cutoff and removes from it all GO nodes with overlapping children sets (OFFSPRING), while the best scoring nodes are retained in the result data.frame.

GOCluster_Report: performs the three types of GO term enrichment analyses in batch mode: GOHyperGAll, GOHyperGAll_Subset or GOHyperGAll_Simplify. It processes many gene sets (e.g. gene expression clusters) and returns the results conveniently organized in a single result data frame.

Value

makeCATdb generates catDB object from file.

Author(s)

Thomas Girke

References

This workflow has been published in Plant Physiol (2008) 147, 41-57.

Examples

## Not run: 

## Obtain annotations from BioMart
listMarts() # To choose BioMart database
m <- useMart("ENSEMBL_MART_PLANT"); listDatasets(m) 
m <- useMart("ENSEMBL_MART_PLANT", dataset="athaliana_eg_gene")
listAttributes(m) # Choose data types you want to download
go <- getBM(attributes=c("go_accession", "tair_locus", 
            "go_namespace_1003"), mart=m)
go <- go[go[,3]!="",]; go[,3] <- as.character(go[,3])
write.table(go, "GOannotationsBiomart_mod.txt", quote=FALSE, 
            row.names=FALSE, col.names=FALSE, sep="\t")

## Create catDB instance (takes a while but needs to be done only once)
catdb <- makeCATdb(myfile="GOannotationsBiomart_mod.txt", lib=NULL, org="",
                    colno=c(1,2,3), idconv=NULL)
catdb

## Create catDB from Bioconductor annotation package
# catdb <- makeCATdb(myfile=NULL, lib="ath1121501.db", org="", 
                    colno=c(1,2,3), idconv=NULL)

## AffyID-to-GeneID mappings when working with AffyIDs 
# affy2locusDF <- systemPipeR:::.AffyID2GeneID(map = "ftp://ftp.arabidopsis.org/home/tair/Microarrays/Affymetrix/affy_ATH1_array_elements-2010-12-20.txt",
                                                download=TRUE)
# catdb_conv <- makeCATdb(myfile="GOannotationsBiomart_mod.txt", lib=NULL, org="", 
                            colno=c(1,2,3), idconv=list(affy=affy2locusDF))
# systemPipeR:::.AffyID2GeneID(catdb=catdb_conv, 
                                affyIDs=c("244901_at", "244902_at"))

## Next time catDB can be loaded from file
save(catdb, file="catdb.RData") 
load("catdb.RData")

## Perform enrichment test on single gene set
test_sample <- unique(as.character(catmap(catdb)$D_MF[1:100,"GeneID"]))
GOHyperGAll(catdb=catdb, gocat="MF", sample=test_sample, Nannot=2)[1:20,]

## GO Slim analysis by subsetting results accordingly
GOHyperGAll_result <- GOHyperGAll(catdb=catdb, gocat="MF", sample=test_sample, Nannot=2)
GOHyperGAll_Subset(catdb, GOHyperGAll_result, sample=test_sample, type="goSlim") 

## Reduce GO term redundancy in 'GOHyperGAll_results'
simplifyDF <- GOHyperGAll_Simplify(GOHyperGAll_result, gocat="MF", 
                                    cutoff=0.001, correct=T)
# Returns the redundancy reduced data set. 
data.frame(GOHyperGAll_result[GOHyperGAll_result[,1] 

## Batch Analysis of Gene Clusters
testlist <- list(Set1=test_sample)
GOBatchResult <- GOCluster_Report(catdb=catdb, setlist=testlist, method="all", 
                                    id_type="gene", CLSZ=10, cutoff=0.001, 
                                    gocats=c("MF", "BP", "CC"), 
                                    recordSpecGO=c("GO:0003674", "GO:0008150", "GO:0005575"))

## Plot 'GOBatchResult' as bar plot
goBarplot(GOBatchResult, gocat="MF")


## End(Not run)
## Not run: 

## Obtain annotations from BioMart
listMarts() # To choose BioMart database
m <- useMart("ENSEMBL_MART_PLANT"); listDatasets(m) 
m <- useMart("ENSEMBL_MART_PLANT", dataset="athaliana_eg_gene")
listAttributes(m) # Choose data types you want to download
go <- getBM(attributes=c("go_accession", "tair_locus", 
            "go_namespace_1003"), mart=m)
go <- go[go[,3]!="",]; go[,3] <- as.character(go[,3])
write.table(go, "GOannotationsBiomart_mod.txt", quote=FALSE, 
            row.names=FALSE, col.names=FALSE, sep="\t")

## Create catDB instance (takes a while but needs to be done only once)
catdb <- makeCATdb(myfile="GOannotationsBiomart_mod.txt", lib=NULL, org="",
                    colno=c(1,2,3), idconv=NULL)
catdb

## Create catDB from Bioconductor annotation package
# catdb <- makeCATdb(myfile=NULL, lib="ath1121501.db", org="", 
                    colno=c(1,2,3), idconv=NULL)

## AffyID-to-GeneID mappings when working with AffyIDs 
# affy2locusDF <- systemPipeR:::.AffyID2GeneID(map = "ftp://ftp.arabidopsis.org/home/tair/Microarrays/Affymetrix/affy_ATH1_array_elements-2010-12-20.txt",
                                                download=TRUE)
# catdb_conv <- makeCATdb(myfile="GOannotationsBiomart_mod.txt", lib=NULL, org="", 
                            colno=c(1,2,3), idconv=list(affy=affy2locusDF))
# systemPipeR:::.AffyID2GeneID(catdb=catdb_conv, 
                                affyIDs=c("244901_at", "244902_at"))

## Next time catDB can be loaded from file
save(catdb, file="catdb.RData") 
load("catdb.RData")

## Perform enrichment test on single gene set
test_sample <- unique(as.character(catmap(catdb)$D_MF[1:100,"GeneID"]))
GOHyperGAll(catdb=catdb, gocat="MF", sample=test_sample, Nannot=2)[1:20,]

## GO Slim analysis by subsetting results accordingly
GOHyperGAll_result <- GOHyperGAll(catdb=catdb, gocat="MF", sample=test_sample, Nannot=2)
GOHyperGAll_Subset(catdb, GOHyperGAll_result, sample=test_sample, type="goSlim") 

## Reduce GO term redundancy in 'GOHyperGAll_results'
simplifyDF <- GOHyperGAll_Simplify(GOHyperGAll_result, gocat="MF", 
                                    cutoff=0.001, correct=T)
# Returns the redundancy reduced data set. 
data.frame(GOHyperGAll_result[GOHyperGAll_result[,1] 

## Batch Analysis of Gene Clusters
testlist <- list(Set1=test_sample)
GOBatchResult <- GOCluster_Report(catdb=catdb, setlist=testlist, method="all", 
                                    id_type="gene", CLSZ=10, cutoff=0.001, 
                                    gocats=c("MF", "BP", "CC"), 
                                    recordSpecGO=c("GO:0003674", "GO:0008150", "GO:0005575"))

## Plot 'GOBatchResult' as bar plot
goBarplot(GOBatchResult, gocat="MF")


## End(Not run)

Import R Markdown file as workflow

Description

Import R Markdown file as workflow. Each R code chunk will be set as a step in the workflow. This operation requires a few extra settings on the R Markdown chunk options, to include a particular code chunk in the workflow analysis. Please check Details.

Usage

importWF(
    sysargs, file_path, ignore_eval = TRUE,
    update = FALSE, confirm = FALSE,
    check_tool = !update,
    check_module = check_tool,
    verbose = TRUE
)
importWF(
    sysargs, file_path, ignore_eval = TRUE,
    update = FALSE, confirm = FALSE,
    check_tool = !update,
    check_module = check_tool,
    verbose = TRUE
)

Arguments

`sysargs`	`SYSargsList` empty object. More information on `Details`.
`file_path`	string, file path of the workflow file.
`ignore_eval`	logical, treat all R chunks' `eval` option as `TRUE` in workflow `Rmd` file even if some chunks have `eval=FALSE`.
`update`	logical, If you have ever changed the template and want to sync new changes, turn `TRUE` to update the workflow. This function will find the difference between old template and this new template, update line number records, update preprocessing code, and try to import new steps. The updated template is useful in `renderReport()` function.
`confirm`	logical, Only useful when you combine `update = TRUE`, some questions will be asked during update. Changing this to `TRUE` would skip these questions and directly say "Yes" to all answers. Useful in non-interactive mode. Default is to say "No" in non-interactive mode.
`check_tool`	logical, whether to check all required tools by this workflow are in PATH (callable). It uses the `listCmdTools` function to perform the check. The default is the reverse of `update`. It means if it is the first importing the workflow, tools and modules will be checked. If it is updating the existing workflow, tools will not be checked.
`check_module`	logical, whether to check all required modules are available. To do so, a modular system has be to installed. If no modular system, this check will be skipped, even if `check_module = TRUE`. It uses the `listCmdModules` function to perform the check. Check the help of `module` to know more about modular system.
`verbose`	logical, print out verbose message while function running.

Details

To include a particular code chunk from the R Markdown file in the workflow analysis, please use the following code chunk options:

- spr = 'r': for code chunks with R code lines; - spr = 'sysargs': for code chunks with an 'SYSargsList' object; - spr.dep = <StepName>: for specify the previous dependency. If this options is not found, it will automaticly add the previous step.

For spr = 'sysargs', the last object assigned needs to be the SYSargsList. If the spr flag is not found, the R chunk will not be included in the workflow.

It is required to start a project using SPRproject() function, and use the object to populate the steps from R Markdown file.

Value

importWF will return an SYSargsList object.

Author(s)

Le Zhang and Daniela Cassol

Examples

file_path <- system.file("extdata/spr_simple_lw.Rmd", package="systemPipeR")
sal <- SPRproject(overwrite = TRUE)
sal <- importWF(sal, file_path)
file_path <- system.file("extdata/spr_simple_lw.Rmd", package="systemPipeR")
sal <- SPRproject(overwrite = TRUE)
sal <- importWF(sal, file_path)

Class `"INTERSECTset"`

Description

Container for storing standard intersect results created by the overLapper function. The setlist slot stores the original label sets as vectors in a list; intersectmatrix organizes the label sets in a present-absent matrix; complexitylevels represents the number of comparisons considered for each comparison set as vector of integers; and intersectlist contains the standard intersect vectors.

Objects from the Class

Objects can be created by calls of the form new("INTERSECTset", ...).

Slots

setlist:: Object of class "list": list of vectors
intersectmatrix:: Object of class "matrix": binary matrix
complexitylevels:: Object of class "integer": vector of integers
intersectlist:: Object of class "list": list of vectors

Methods

as.list: signature(x = "INTERSECTset"): coerces INTERSECTset to list
coerce: signature(from = "list", to = "INTERSECTset"): as(list, "INTERSECTset")
complexitylevels: signature(x = "INTERSECTset"): extracts data from complexitylevels slot
intersectlist: signature(x = "INTERSECTset"): extracts data from intersectlist slot
intersectmatrix: signature(x = "INTERSECTset"): extracts data from intersectmatrix slot
length: signature(x = "INTERSECTset"): returns number of original label sets
names: signature(x = "INTERSECTset"): extracts slot names
setlist: signature(x = "INTERSECTset"): extracts data from setlist slot
show: signature(object = "INTERSECTset"): summary view of INTERSECTset objects

Author(s)

Thomas Girke

Examples

showClass("INTERSECTset")

## Sample data
setlist <- list(A=sample(letters, 18), B=sample(letters, 16),
                C=sample(letters, 20), D=sample(letters, 22),
                E=sample(letters, 18), F=sample(letters, 22))

## Create VENNset
interset <- overLapper(setlist[1:5], type="intersects")
class(interset)

## Accessor methods for VENNset/INTERSECTset objects
names(interset)
setlist(interset)
intersectmatrix(interset)
complexitylevels(interset)
intersectlist(interset)

## Coerce VENNset/INTERSECTset object to list
as.list(interset)
showClass("INTERSECTset")

## Sample data
setlist <- list(A=sample(letters, 18), B=sample(letters, 16),
                C=sample(letters, 20), D=sample(letters, 22),
                E=sample(letters, 18), F=sample(letters, 22))

## Create VENNset
interset <- overLapper(setlist[1:5], type="intersects")
class(interset)

## Accessor methods for VENNset/INTERSECTset objects
names(interset)
setlist(interset)
intersectmatrix(interset)
complexitylevels(interset)
intersectlist(interset)

## Coerce VENNset/INTERSECTset object to list
as.list(interset)

Class `"LineWise"`

Description

S4 class container for storing R-based code for a workflow step. LineWise class instances are constructed by the LineWise function, based on the R-based code, step name, and dependency tree. When the container is built from the R Markdown, using importWF function, two other slots are populated: codeChunkStart and rmdPath. codeChunkStart will store the first line of each R chunk, and rmdPath will store the R Markdown file path.

Usage

## Constructor

LineWise(code, step_name = "default", codeChunkStart = integer(), 
            rmdPath = character(), dependency="", 
            run_step = "mandatory",
            run_session = "management", 
            run_remote_resources = NULL) 
        
## Constructor

LineWise(code, step_name = "default", codeChunkStart = integer(), 
            rmdPath = character(), dependency="", 
            run_step = "mandatory",
            run_session = "management", 
            run_remote_resources = NULL)

Arguments

`code`	R code separated either by a semi-colon (`\;`), or by a newline, and enclosed by braces (`{ }`).
`step_name`	character. Step name needs to be unique and is required when appending this step to the workflow.
`codeChunkStart`	integer. R Markdown code chunk line start. This element will be populated when the object is built by `importWF`.
`rmdPath`	character. Path of R Markdown file used by `importWF`.
`dependency`	character. Dependency tree, required when appending this step to the workflow. Character name of a previous step in the workflow. Default is empty string `""`.
`run_step`	character. If the step has "mandatory" or "optional" flag for the execution.
`run_session`	character. If the step has "management" or "compute" flag for the execution.
`run_remote_resources`	`List` for reserving for each cluster job sufficient computing resources including memory (Megabyte), number of nodes, CPU cores, walltime (minutes), etc. It is necessary two aditional files: `conffile` and `template`. `conffile` is the path to conf file (default location `./.batchtools.conf.R`). This file contains in its simplest form just one command, such as this line for the Slurm scheduler: `cluster.functions <- makeClusterFunctionsSlurm(template="batchtools.slurm.tmpl")`. For more detailed information visit this page: https://mllg.github.io/batchtools/index.html `template` The template files for a specific queueing/scheduling systems can be downloaded from here: https://github.com/mllg/batchtools/tree/master/inst/templates. Slurm, PBS/Torque, and Sun Grid Engine (SGE) templates are provided

Objects from the Class

Objects can be created by calls of the form new("LineWise", ...).

Slots

codeLine:: Object of class "expression" storing R-based code.
codeChunkStart:: Object of class "interger" storing start line from the rmdPath file, when the "LineWise" is built from R Markdown.
stepName:: Object of class "character" storing step name.
dependency:: Object of class "list" storing dependency tree.
status:: Object of class "list" storing status steps.
files:: Object of class "list" storing file for R Markdown file and the file containing stdout and stderr after running the R-based code.
runInfo:: Object of class "list" storing all the runInfo information of the workflow

Functions and Methods

See 'Usage' for details on invocation.

Constructor:

LineWise:: Returns a LineWise object.

Accessors:

codeLine: Printing method for the CodeLine slot.
codeChunkStart: Extract start line of the R Markdown R chunk.
rmdPath: Extract Rmarkdown file path.
stepName: Extract the step name.
dependency: Extract the dependency tree.
status: Extract status of the step.
files: Extract log file path storing stdout and stderr after running step.
appendCodeLine<-: Replacement method for append a R code line.
replaceCodeLine<-: Replacement method for replace a R code line.

Methods:

[: Return a new LineWise object made of the selected R code lines.
[[: Extract the slot information from LineWine object.
[[<-: Replacement method for LineWine slots.
$: Extract slots elements by name.
length: Extract number of R-based code lines.
names: Extract slot names.
show: Summary view of LineWise elements.
coerce: signature(from = "LineWise", to = "list")as(LineWise, "list")
coerce: signature(from = "list", to = "LineWise")as(list, "LineWise")
linewise: Coerce back to list as(LineWise, "list")

Author(s)

Daniela Cassol

Examples

showClass("LineWise")
lw <- LineWise(code = {
              log_out <- log(10)
              },
              step_name = "R_log")

codeLine(lw)

## ImportWF option
file_path <- system.file("extdata/spr_simple_lw.Rmd", package="systemPipeR")
sal <- SPRproject(overwrite = TRUE)
# file_path <- "../inst/extdata/spr_simple_lw.Rmd"
sal <- importWF(sal, file_path)
sal <- runWF(sal)
lw2 <- sal$stepsWF[[2]]
lw2
names(lw2)
length(lw2)

## Accessors
codeLine(lw2)
codeChunkStart(lw2)
rmdPath(lw2)
stepName(lw2)
dependency(lw2)
status(lw2)
files(lw2)

## Replacement
appendCodeLine(lw2, after = 0) <- "log <- log(10)"
codeLine(lw2)
replaceCodeLine(lw2, 1) <- "plot(iris)"
codeLine(lw2)

## Coerce
lw2 <- linewise(lw2) ## OR lw2 <- as(lw2, "list") 
lw2 <- as(lw2, "LineWise")
showClass("LineWise")
lw <- LineWise(code = {
              log_out <- log(10)
              },
              step_name = "R_log")

codeLine(lw)

## ImportWF option
file_path <- system.file("extdata/spr_simple_lw.Rmd", package="systemPipeR")
sal <- SPRproject(overwrite = TRUE)
# file_path <- "../inst/extdata/spr_simple_lw.Rmd"
sal <- importWF(sal, file_path)
sal <- runWF(sal)
lw2 <- sal$stepsWF[[2]]
lw2
names(lw2)
length(lw2)

## Accessors
codeLine(lw2)
codeChunkStart(lw2)
rmdPath(lw2)
stepName(lw2)
dependency(lw2)
status(lw2)
files(lw2)

## Replacement
appendCodeLine(lw2, after = 0) <- "log <- log(10)"
codeLine(lw2)
replaceCodeLine(lw2, 1) <- "plot(iris)"
codeLine(lw2)

## Coerce
lw2 <- linewise(lw2) ## OR lw2 <- as(lw2, "list") 
lw2 <- as(lw2, "LineWise")

List/check the existence of command-line tools of a workflow

Description

These functions list/check whether required command-line tools/modules are installed in the PATH and are callable.

Usage

listCmdTools(sal, check_path = FALSE, check_module = FALSE)

listCmdModules(sal, check_module = FALSE)
listCmdTools(sal, check_path = FALSE, check_module = FALSE)

listCmdModules(sal, check_module = FALSE)

Arguments

`sal`	SPR workflow object in `SYSargsList` class.
`check_path`	logical, whether to check if the required tools are in PATH.
`check_module`	logical, whether to check if the required modules are installed.

Details

Both functions by default will not check the existence of tools or modules. The default is to list the requirement.

Value

Both functions print out the list/check results as dataframe. The first column is workflow step names that require certain tools/modules. The second column is the tool/module names. The third column is logical, TRUE for the existence of the tool in PATH/modular system, if check_path = TRUE or check_module = TRUE. Otherwise, the third column will be NA.

In the case of both check_path = TRUE, check_module = TRUE for listCmdTools, the returned dataframe is still results for tool PATH checking but not module checking results. If one wish to obtain the module checking results, please use listCmdModules.

When the current workflow has no command-line (SYSargs) step, or there is no module required, or there is no modular system installed, the return will be NULL.

These two functions are automatically performed when importWF is called.

Author(s)

Le Zhang

Examples

# See examples of `importWF`
# See examples of `importWF`

Constructs SYSargs2 object from CWL param and targets files

Description

The constructor functions create an SYSargs2 S4 class object from three input files: a CWL param and input files, and one simple tabular or yml file, a targets file. The latter is optional for workflow steps lacking input files. TheCWL param provides all the parameters required for running command-line software, following the standard and specification defined on Common Workflow Language (CWL). The input file provides additional information for the command-line, allowing each sample level input/outfile operation uses its own SYSargs2 instance. In the targets file users could provide the paths to the initial sample input files (e.g. FASTQ) along with sample labels, and if appropriate biological replicate and contrast information for controlling differential abundance analyses.

The renderWF function populates all the command-line for each sample in each step of the particular workflow. Each sample level input/outfile operation uses its own SYSargs2 instance. The output of SYSargs2 define all the expected output files for each step in the workflow, which usually it is the sample input for the next step in an SYSargs2 instance. By chaining several SYSargs2 steps together one can construct complex workflows involving many sample-level input/output file operations with any combination of command-line or R-based software. Between different instances, this connectivity is established by `appendStep<-` method. Please check more details from SYSargsList-class class.

Usage

loadWorkflow(targets = NULL, wf_file, input_file, dir_path = "param/cwl", id = "SampleName")

renderWF(WF, inputvars = NULL)

updateWF(WF, write.yaml=FALSE, name.yaml="default", new_targets=NULL,
                     new_targetsheader=NULL, inputvars=NULL, silent=FALSE)
loadWorkflow(targets = NULL, wf_file, input_file, dir_path = "param/cwl", id = "SampleName")

renderWF(WF, inputvars = NULL)

updateWF(WF, write.yaml=FALSE, name.yaml="default", new_targets=NULL,
                     new_targetsheader=NULL, inputvars=NULL, silent=FALSE)

Arguments

`targets`	either the path to `targets` file or an object of `SummarizedExperiment` class. The `targets` file can be either a simple tabular or yml file. Also, it is possible to assign `NULL` to run the pipeline without the 'targets' file. This can be useful for running specific workflows that do not require input files.
`wf_file`	name and path to `CWL parameters` file.
`input_file`	name and path to `input parameters` file.
`dir_path`	path to the `parameters` directory with the `wf_file` and `input_file` files. It is recommended to keep both files in the same directory.
`id`	A column from `targets` file, which will be used as an id for each one of the samples. It is required to be unique.
`WF`	Object of class `SYSargs2`, generated by `loadWF` .
`inputvars`	named character vector. Variables defined in the `input` file that matches the column names defined in the `targets` file.
`write.yaml`	logical. If set to `TRUE`, it will write to file the content of the CWL files: `*.yml`. Default is `FALSE`.
`name.yaml`	name and path to `input parameters` file, if `write.yaml` is set to `TRUE`. Default value will write a file at the same directory of `dir_path` appending to the file name the current date.
`new_targets`	new targets files as list. 'targets' `data.frame` can be converted by `targets.as.list` function. Default is `NULL`, and it will maintain the original.
`new_targetsheader`	character. New header/comment lines of targets file. Default is `NULL`, and it will maintain the original.
`silent`	If set to `TRUE`, all messages returned by the function will be suppressed.

Value

SYSargs2 object.

Author(s)

Daniela Cassol and Thomas Girke

Examples

## Construct SYSargs2 object from CWl param, CWL input, and targets files 
targets <- system.file("extdata", "targets.txt", package="systemPipeR")
dir_path <- system.file("extdata/cwl", package="systemPipeR")
WF <- loadWorkflow(targets=targets, wf_file="hisat2/hisat2-mapping-se.cwl", 
                  input_file="hisat2/hisat2-mapping-se.yml", dir_path=dir_path)
WF <- renderWF(WF, inputvars=c(FileName="_FASTQ_PATH1_", SampleName="_SampleName_"))
WF

## If required to update the object
yamlinput(WF, "thread")  <- 6L
WF <- updateWF(WF)
cmdlist(WF)[1]
yamlinput(WF)$thread
## Construct SYSargs2 object from CWl param, CWL input, and targets files 
targets <- system.file("extdata", "targets.txt", package="systemPipeR")
dir_path <- system.file("extdata/cwl", package="systemPipeR")
WF <- loadWorkflow(targets=targets, wf_file="hisat2/hisat2-mapping-se.cwl", 
                  input_file="hisat2/hisat2-mapping-se.yml", dir_path=dir_path)
WF <- renderWF(WF, inputvars=c(FileName="_FASTQ_PATH1_", SampleName="_SampleName_"))
WF

## If required to update the object
yamlinput(WF, "thread")  <- 6L
WF <- updateWF(WF)
cmdlist(WF)[1]
yamlinput(WF)$thread

Merge BAM files based on factor

Description

Merges BAM files based on sample groupings provided by a factor using internally the mergeBam function from the Rsamtools package. The function also returns an updated SYSargs or SYSargs2 object containing the paths to the merged BAM files as well as to the unmerged BAM files if there are any. All rows of merged parent samples are removed. When a named character vector is provided as input, a data.frame with a target containing the paths to the merged BAM files as output.

The functionality provided by mergeBamByFactor is useful for experiments where pooling of replicates is advantageous to maximize the depth of read coverage, such as prior to peak calling in ChIP-Seq or miRNA gene prediction experiments.

Usage

mergeBamByFactor(args, targetsDF = NULL, mergefactor = "Factor",
                out_dir = file.path("results", "merge_bam"),
                overwrite = FALSE, silent = FALSE, ...)
mergeBamByFactor(args, targetsDF = NULL, mergefactor = "Factor",
                out_dir = file.path("results", "merge_bam"),
                overwrite = FALSE, silent = FALSE, ...)

Arguments

`args`	An instance of `SYSargs` or `SYSargs2` constructed from a `targets` file where the first column (`targetsin(args)` or `targets.as.df(targets(args))`) contains the paths to the BAM files along with the column title `FileName`. Another possibily is `named character vector` with BAM files PATH and the elements names should be the sampleID.
`targetsDF`	This argument is required when `named character vector` is provided as input. Default is `NULL`. Object of class `DFrame`, and it can be obtained with `targetsWF(<SYSargsList>)`.
`mergefactor`	`factor` containing the grouping information required for merging the BAM files referenced in the first column of `targetsin(args)` or `targets.as.df(targets(args))`. The default uses `Factor` column from the `targets` files as factor. The latter merges BAM files for which replicates are specified in the `Factor` column.
`out_dir`	The directory path to store merged bam files. Default uses `"merge_bam"` directory inside the results directory. directory not existing before running the function is alllowed. It will be created while running.
`overwrite`	If `overwrite=FALSE` existing BAM files of the same name will not be overwritten.
`silent`	If `silent=TRUE` print statements will be suppressed.
`...`	To pass on additional arguments to the internally used `mergeBam` function from `Rsamtools`.

Value

The merged BAM files will be written to output files with the following naming convention: <first_BAM_file_name>_<grouping_label_of_factor>.<bam>. In addition, the function returns an updated SYSargs or SYSargs2 object where all output file paths contain the paths to the merged BAM files. When a named character vector is provided as input, a data.frame with a target containing the paths to the merged BAM files as output. The rows of the merged parent samples are removed and the rows of the unmerged samples remain unchanged.

Author(s)

Thomas Girke

Examples

## Construct initial SYSargs object
targetspath <- system.file("extdata", "targets_chip.txt", package="systemPipeR")
parampath <- system.file("extdata", "bowtieSE.param", package="systemPipeR")
args <- systemArgs(sysma=parampath, mytargets=targetspath)

## Not run: 
## After running alignmets (e.g. with Bowtie2) generate targets file
## for the corresponding BAM files. The alignment step is skipped here.
writeTargetsout(x=args, file="targets_bam.txt", overwrite=TRUE)
args <- systemArgs(sysma=NULL, mytargets="targets_bam.txt")

## Merge BAM files and return updated SYSargs object
args_merge <- mergeBamByFactor(args, overwrite=TRUE, silent=FALSE)

## Export modified targets file
writeTargetsout(x=args_merge, file="targets_mergeBamByFactor.txt", overwrite=TRUE)

## End(Not run)
## Construct initial SYSargs object
targetspath <- system.file("extdata", "targets_chip.txt", package="systemPipeR")
parampath <- system.file("extdata", "bowtieSE.param", package="systemPipeR")
args <- systemArgs(sysma=parampath, mytargets=targetspath)

## Not run: 
## After running alignmets (e.g. with Bowtie2) generate targets file
## for the corresponding BAM files. The alignment step is skipped here.
writeTargetsout(x=args, file="targets_bam.txt", overwrite=TRUE)
args <- systemArgs(sysma=NULL, mytargets="targets_bam.txt")

## Merge BAM files and return updated SYSargs object
args_merge <- mergeBamByFactor(args, overwrite=TRUE, silent=FALSE)

## Export modified targets file
writeTargetsout(x=args_merge, file="targets_mergeBamByFactor.txt", overwrite=TRUE)

## End(Not run)

Interface to allow full use of the Environment Modules system for Unix

Description

The function module enables use of the Environment Modules system (http://modules.sourceforge.net/) from within the R environment. The user's login shell environment (i.e. bash -l) will be used to initialize the current session. The module function can also; load or unload specific software, list all the loaded software within the current session, and list all the applications available for loading from the module system. Lastly, the module function can remove all loaded software from the current session.

Usage

module(action_type, module_name = NULL)
moduleload(module_name)
moduleUnload(module_name)
modulelist()
moduleAvail()
moduleClear()
moduleInit()
module(action_type, module_name = NULL)
moduleload(module_name)
moduleUnload(module_name)
modulelist()
moduleAvail()
moduleClear()
moduleInit()

Arguments

`action_type`	Name of the action to be executed as character vector. The following switches are accepted: `avail`, `list`, `init`, `load`, `unload`, and `clear`.
`module_name`	Name of software to load as character vector. Examples: `"hisat2"`, `"hisat2/2.1.0"`, `c("hisat2", "samtools")`.

Details

Partial failure would also result 'FALSE', e.g. "load" two modules, one successful and the other failed, then the return is 'FALSE'. For "unload" action will always return 'TRUE' even if the module is not loaded at all or not found.

Author(s)

Tyler Backman, Jordan Hayes and Thomas Girke

Examples

## Not run: 
## List all available software from the module system
avail <- moduleAvail()

## List loaded software in the current session
modulelist()

## Example for loading a software into the shell environment
moduleload("hisat2") 
moduleload("hisat2/2.2.1")

## Example for removing software from the shell environment
moduleUnload("hisat2")

## Clear all of the software from the shell's initialization files
moduleClear()

## List and load all the software loaded in users default login shell into the 
current session (default)
moduleInit()

## End(Not run)
## Not run: 
## List all available software from the module system
avail <- moduleAvail()

## List loaded software in the current session
modulelist()

## Example for loading a software into the shell environment
moduleload("hisat2") 
moduleload("hisat2/2.2.1")

## Example for removing software from the shell environment
moduleUnload("hisat2")

## Clear all of the software from the shell's initialization files
moduleClear()

## List and load all the software loaded in users default login shell into the 
current session (default)
moduleInit()

## End(Not run)

Bar plot for intersect sets

Description

Generates bar plots of the intersect counts of VENNset and INTERSECTset objects generated by the overLapper function. It is an alternative to Venn diagrames (e.g. vennPlot) that scales to larger numbers of label sets. By default the bars in the plot are colored and grouped by complexity levels of the intersect sets.

Usage

olBarplot(x, mincount = 0, complexity="default", myxlabel = "default", myylabel="Counts", mytitle = "default", ...)
olBarplot(x, mincount = 0, complexity="default", myxlabel = "default", myylabel="Counts", mytitle = "default", ...)

Arguments

`x`	Object of class `VENNset` or `INTERSECTset`.
`mincount`	Sets minimum number of counts to consider in the bar plot. Default `mincount=0` considers all counts.
`complexity`	Allows user to limit the bar plot to specific complexity levels of intersects by specifying the chosen ones with an integer vector. Default `complexity="default"` considers all complexity levels.
`myxlabel`	Defines label of x-axis.
`myylabel`	Defines label of y-axis.
`mytitle`	Defines main title of plot.
`...`	Allows to pass on additional arguments to `geom_bar` from `ggplot2`. For instance, `fill=seq(along=vennlist(x))` or `fill=seq(along=intersectlist(x))` will assign a different color to each bar, or `fill="blue"` will color all of them blue. The default bar coloring is by complexity levels of the intersect sets.

Value

Bar plot.

Note

The functions provided here are an extension of the Venn diagram resources on this site: http://manuals.bioinformatics.ucr.edu/home/R_BioCondManual#TOC-Venn-Diagrams

Author(s)

Thomas Girke

Examples

## Sample data: list of vectors with object labels
setlist <- list(A=sample(letters, 18), B=sample(letters, 16),
                C=sample(letters, 20), D=sample(letters, 22),
                E=sample(letters, 18), F=sample(letters, 22))

## 2-way Venn diagram
vennset <- overLapper(setlist[1:2], type="vennsets")
vennPlot(vennset)

## 3-way Venn diagram
vennset <- overLapper(setlist[1:3], type="vennsets")
vennPlot(vennset)

## 4-way Venn diagram
vennset <- overLapper(setlist[1:4], type="vennsets")
vennPlot(list(vennset, vennset))

## Pseudo 4-way Venn diagram with circles
vennPlot(vennset, type="circle")

## 5-way Venn diagram
vennset <- overLapper(setlist[1:5], type="vennsets")
vennPlot(vennset)

## Alternative Venn count input to vennPlot (not recommended!)
counts <- sapply(vennlist(vennset), length)
vennPlot(counts)

## 6-way Venn comparison as bar plot
vennset <- overLapper(setlist[1:6], type="vennsets")
olBarplot(vennset, mincount=1)

## Bar plot of standard intersect counts
interset <- overLapper(setlist, type="intersects")
olBarplot(interset, mincount=1)

## Accessor methods for VENNset/INTERSECTset objects
names(vennset)
names(interset)
setlist(vennset)
intersectmatrix(vennset)
complexitylevels(vennset)
vennlist(vennset)
intersectlist(interset)

## Coerce VENNset/INTERSECTset object to list
as.list(vennset)
as.list(interset)

## Pairwise intersect matrix and heatmap
olMA <- sapply(names(setlist), 
		function(x) sapply(names(setlist), 
		function(y) sum(setlist[[x]] %in% setlist[[y]])))
olMA
heatmap(olMA, Rowv=NA, Colv=NA)

## Presence-absence matrices for large numbers of sample sets
interset <- overLapper(setlist=setlist, type="intersects", complexity=2)
(paMA <- intersectmatrix(interset))
heatmap(paMA, Rowv=NA, Colv=NA, col=c("white", "gray")) 
## Sample data: list of vectors with object labels
setlist <- list(A=sample(letters, 18), B=sample(letters, 16),
                C=sample(letters, 20), D=sample(letters, 22),
                E=sample(letters, 18), F=sample(letters, 22))

## 2-way Venn diagram
vennset <- overLapper(setlist[1:2], type="vennsets")
vennPlot(vennset)

## 3-way Venn diagram
vennset <- overLapper(setlist[1:3], type="vennsets")
vennPlot(vennset)

## 4-way Venn diagram
vennset <- overLapper(setlist[1:4], type="vennsets")
vennPlot(list(vennset, vennset))

## Pseudo 4-way Venn diagram with circles
vennPlot(vennset, type="circle")

## 5-way Venn diagram
vennset <- overLapper(setlist[1:5], type="vennsets")
vennPlot(vennset)

## Alternative Venn count input to vennPlot (not recommended!)
counts <- sapply(vennlist(vennset), length)
vennPlot(counts)

## 6-way Venn comparison as bar plot
vennset <- overLapper(setlist[1:6], type="vennsets")
olBarplot(vennset, mincount=1)

## Bar plot of standard intersect counts
interset <- overLapper(setlist, type="intersects")
olBarplot(interset, mincount=1)

## Accessor methods for VENNset/INTERSECTset objects
names(vennset)
names(interset)
setlist(vennset)
intersectmatrix(vennset)
complexitylevels(vennset)
vennlist(vennset)
intersectlist(interset)

## Coerce VENNset/INTERSECTset object to list
as.list(vennset)
as.list(interset)

## Pairwise intersect matrix and heatmap
olMA <- sapply(names(setlist), 
		function(x) sapply(names(setlist), 
		function(y) sum(setlist[[x]] %in% setlist[[y]])))
olMA
heatmap(olMA, Rowv=NA, Colv=NA)

## Presence-absence matrices for large numbers of sample sets
interset <- overLapper(setlist=setlist, type="intersects", complexity=2)
(paMA <- intersectmatrix(interset))
heatmap(paMA, Rowv=NA, Colv=NA, col=c("white", "gray"))

Identify Range Overlaps for IRanges and GRanges Object

Description

Function for identifying consensus peak among two peaks sets sharing a minimum relative overlap.

Usage

olRanges(query, subject, output = "gr")
olRanges(query, subject, output = "gr")

Arguments

`query`	Object of class `GRanges`, which is a vector of genomic locations and associated annotations.
`subject`	Object of class `GRanges`.
`output`	By default `"gr"` returns any overlap with OL length information in an object of class `GRanges`. Also, can returns an object of class `data.frame` with `"df"`.

Author(s)

Thomas Girke

Examples

## Sample Data Sets
grq <- GRanges(seqnames = Rle(c("chr1", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)), ranges = IRanges::IRanges(seq(1, 100, by=10), end = seq(30, 120, by=10)), strand = Rle(strand(c("-", "+", "-")), c(1, 7, 2)))
grs <- shift(grq[c(2,5,6)], 5)
## Run olRanges function
olRanges(query=grq, subject=grs, output="df") 
olRanges(query=grq, subject=grs, output="gr") 
## Sample Data Sets
grq <- GRanges(seqnames = Rle(c("chr1", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)), ranges = IRanges::IRanges(seq(1, 100, by=10), end = seq(30, 120, by=10)), strand = Rle(strand(c("-", "+", "-")), c(1, 7, 2)))
grs <- shift(grq[c(2,5,6)], 5)
## Run olRanges function
olRanges(query=grq, subject=grs, output="df") 
olRanges(query=grq, subject=grs, output="gr")

Updates the output files paths in the `SYSargs2` object

Description

After executing all the command-lines by the runCommadline function, the output files can be created in specific directories rather then results in a particular directory. Also, the runCommadline function allows converting the SAM file outputs to sorted and indexed BAM files. Thus, the output_update function allows updating the location of these files in the output of the SYSargs2 object.

Usage

output_update(args, dir = FALSE, dir.name = NULL, replace = FALSE, extension = NULL, make_bam=FALSE, del_sam=TRUE)
output_update(args, dir = FALSE, dir.name = NULL, replace = FALSE, extension = NULL, make_bam=FALSE, del_sam=TRUE)

Arguments

`args`	object of class `SYSargs2`.
`dir`	assign `TRUE` to update the location of the output files in the `args` object accordingly with the workflow name directory. Default is `dir=FALSE`.
`dir.name`	if the results directory name is not specified in the `input` file, it is possible to specify here the name. This argument is required if the path name return `NULL` from the `input` file. Default is `dir.name=NULL`.
`replace`	replace the extension for selected output files in the `args` object. Default is `replace=FALSE`.
`extension`	object of class `"character"` storing the current extension of the files and the respective replacement. For example, `runCommandline` function by default autodetects SAM file outputs in the `args` object and create the BAM files. In order to update the output of `args` object, the extension argument should be set: extension = c(`".sam"`, `".bam"`).
`make_bam`	Auto detects SAM file outputs and update them on the `SYSargs2` object for sorted and indexed BAM files. Default is `make_bam=FALSE`. This argument should be used in integration with `runCommandline` function.
`del_sam`	This option allows deleting the SAM files created when the `make_BAM` converts the SAM files to sorted and indexed BAM files. Default is `del_sam=TRUE`.

Value

SYSargs2 object with output location files updated.

Author(s)

Daniela Cassol and Thomas Girke

Examples

 
## Construct SYSargs2 object from CWl param, CWL input, and targets files 
targets <- system.file("extdata", "targets.txt", package="systemPipeR")
dir_path <- system.file("extdata/cwl", package="systemPipeR")
WF <- loadWorkflow(targets=targets, wf_file="hisat2/hisat2-mapping-se.cwl", 
                  input_file="hisat2/hisat2-mapping-se.yml", dir_path=dir_path)
WF <- renderWF(WF, inputvars=c(FileName="_FASTQ_PATH1_", SampleName="_SampleName_"))
WF
output(WF)

## Not run: 
runCommandline(args=WF, make_bam=TRUE)
## Output paths update
WF <- output_update(WF, dir=FALSE, replace=TRUE, extension=c(".sam", ".bam"))

runCommandline(args=WF, make_bam=TRUE, dir=TRUE)
## Output paths update
WF <- output_update(WF, dir=TRUE, replace=TRUE, extension=c(".sam", ".bam"))

## End(Not run)
## Construct SYSargs2 object from CWl param, CWL input, and targets files 
targets <- system.file("extdata", "targets.txt", package="systemPipeR")
dir_path <- system.file("extdata/cwl", package="systemPipeR")
WF <- loadWorkflow(targets=targets, wf_file="hisat2/hisat2-mapping-se.cwl", 
                  input_file="hisat2/hisat2-mapping-se.yml", dir_path=dir_path)
WF <- renderWF(WF, inputvars=c(FileName="_FASTQ_PATH1_", SampleName="_SampleName_"))
WF
output(WF)

## Not run: 
runCommandline(args=WF, make_bam=TRUE)
## Output paths update
WF <- output_update(WF, dir=FALSE, replace=TRUE, extension=c(".sam", ".bam"))

runCommandline(args=WF, make_bam=TRUE, dir=TRUE)
## Output paths update
WF <- output_update(WF, dir=TRUE, replace=TRUE, extension=c(".sam", ".bam"))

## End(Not run)

Set Intersect and Venn Diagram Functions

Description

Function for computing Venn intersects or standard intersects among large numbers of label sets provided as list of vectors. The resulting intersect objects can be used for plotting 2-5 way Venn diagrams or intersect bar plots using the functions vennPlot or olBarplot, respectively. The overLapper function scales to 2-20 or more label vectors for Venn intersect calculations and to much larger sample numbers for standard intersects. The different intersect types are explained below under the definition of the type argument. The upper Venn limit around 20 label sets is unavoidable because the complexity of Venn intersects increases exponentially with the label set number n according to this relationship: 2^n - 1. The current implementation of the plotting function vennPlot supports Venn diagrams for 2-5 label sets. To visually analyze larger numbers of label sets, a variety of intersect methods are introduced in the olBarplot help file. These methods are much more scalable than Venn diagrams, but lack their restrictive intersect logic.

Usage

overLapper(setlist, complexity = "default", sep = "_", cleanup = FALSE, keepdups = FALSE, type)
overLapper(setlist, complexity = "default", sep = "_", cleanup = FALSE, keepdups = FALSE, type)

Arguments

`setlist`	Object of class `list` where each list component stores a label set as `vector` and the name of each label set is stored in the name slot of each list component. The names are used for naming the label sets in all downstream analysis steps and plots.
`complexity`	Complexity level of intersects specified as integer vector. For Venn intersects it needs to be assigned `1:length(setlist)` (default). If `complexity=2` the function returns all pairwise intersects.
`sep`	Character used to separate set labels.
`cleanup`	If set to `TRUE` then all characters of the label sets are set to upper case, and leading and trailing spaces are removed. The default `cleanup=FALSE` omits this step.
`keepdups`	By default all duplicates are removed from the label sets. The setting `keepdups=TRUE` will retain duplicates by appending a counter to each entry.
`type`	With the default setting `type="vennsets"` the overLapper function computes the typical Venn intersects for the label sets provided under `setlist`. With the setting `type="intersects"` the function will compute pairwise intersects (not compatible with Venn diagrams). Venn intersects follow the typical 'only in' intersect logic of Venn comparisons, such as: labels present only in set A, labels present only in the intersect of A & B, etc. Due to this restrictive intersect logic, the combined Venn sets contain no duplicates. In contrast to this, regular intersects follow this logic: labels present in the intersect of A & B, labels present in the intersect of A & B & C, etc. This approach results usually in many duplications of labels among the intersect sets.

Details

Additional Venn diagram resources are provided by the packages limma, gplots, vennerable, eVenn and VennDiagram, or online resources such as shapes, Venn Diagram Generator and Venny.

Value

overLapper returns standard intersect and Venn intersect results as INTERSECTset or VENNset objects, respectively. These S4 objects contain the following components:

`setlist`	Original label sets accessible with `setlist()`.
`intersectmatrix`	Present-absent matrix accessible with `intersectmatrix()`, where each overlap set in the `vennlist` data component is labeled according to the label set names provided under `setlist`. For instance, the composite name 'ABC' indicates that the entries are restricted to A, B and C. The seperator used for naming the intersect sets can be specified under the `sep` argument.
`complexitylevels`	Complexity levels accessible with `complexitylevels()`.
`vennlist`	Venn intersects for `VENNset` objects accessible with `vennlist()`.
`intersectlist`	Standard intersects for `INTERSECTset` objects accessible with `intersectlist()`.

Note

The functions provided here are an extension of the Venn diagram resources on this site: http://manuals.bioinformatics.ucr.edu/home/R_BioCondManual#TOC-Venn-Diagrams

Author(s)

Thomas Girke

References

See examples in 'The Electronic Journal of Combinatorics': http://www.combinatorics.org/files/Surveys/ds5/VennSymmExamples.html

Examples

## Sample data
setlist <- list(A=sample(letters, 18), B=sample(letters, 16),
                C=sample(letters, 20), D=sample(letters, 22),
                E=sample(letters, 18), F=sample(letters, 22))

## 2-way Venn diagram
vennset <- overLapper(setlist[1:2], type="vennsets")
vennPlot(vennset)

## 3-way Venn diagram
vennset <- overLapper(setlist[1:3], type="vennsets")
vennPlot(vennset)

## 4-way Venn diagram
vennset <- overLapper(setlist[1:4], type="vennsets")
vennPlot(list(vennset, vennset))

## Pseudo 4-way Venn diagram with circles
vennPlot(vennset, type="circle")

## 5-way Venn diagram
vennset <- overLapper(setlist[1:5], type="vennsets")
vennPlot(vennset)

## Alternative Venn count input to vennPlot (not recommended!)
counts <- sapply(vennlist(vennset), length)
vennPlot(counts)

## 6-way Venn comparison as bar plot
vennset <- overLapper(setlist[1:6], type="vennsets")
olBarplot(vennset, mincount=1)

## Bar plot of standard intersect counts
interset <- overLapper(setlist, type="intersects")
olBarplot(interset, mincount=1)

## Accessor methods for VENNset/INTERSECTset objects
names(vennset)
names(interset)
setlist(vennset)
intersectmatrix(vennset)
complexitylevels(vennset)
vennlist(vennset)
intersectlist(interset)

## Coerce VENNset/INTERSECTset object to list
as.list(vennset)
as.list(interset)

## Pairwise intersect matrix and heatmap
olMA <- sapply(names(setlist), 
		function(x) sapply(names(setlist), 
		function(y) sum(setlist[[x]] %in% setlist[[y]])))
olMA
heatmap(olMA, Rowv=NA, Colv=NA)

## Presence-absence matrices for large numbers of sample sets
interset <- overLapper(setlist=setlist, type="intersects", complexity=2)
(paMA <- intersectmatrix(interset))
heatmap(paMA, Rowv=NA, Colv=NA, col=c("white", "gray")) 
## Sample data
setlist <- list(A=sample(letters, 18), B=sample(letters, 16),
                C=sample(letters, 20), D=sample(letters, 22),
                E=sample(letters, 18), F=sample(letters, 22))

## 2-way Venn diagram
vennset <- overLapper(setlist[1:2], type="vennsets")
vennPlot(vennset)

## 3-way Venn diagram
vennset <- overLapper(setlist[1:3], type="vennsets")
vennPlot(vennset)

## 4-way Venn diagram
vennset <- overLapper(setlist[1:4], type="vennsets")
vennPlot(list(vennset, vennset))

## Pseudo 4-way Venn diagram with circles
vennPlot(vennset, type="circle")

## 5-way Venn diagram
vennset <- overLapper(setlist[1:5], type="vennsets")
vennPlot(vennset)

## Alternative Venn count input to vennPlot (not recommended!)
counts <- sapply(vennlist(vennset), length)
vennPlot(counts)

## 6-way Venn comparison as bar plot
vennset <- overLapper(setlist[1:6], type="vennsets")
olBarplot(vennset, mincount=1)

## Bar plot of standard intersect counts
interset <- overLapper(setlist, type="intersects")
olBarplot(interset, mincount=1)

## Accessor methods for VENNset/INTERSECTset objects
names(vennset)
names(interset)
setlist(vennset)
intersectmatrix(vennset)
complexitylevels(vennset)
vennlist(vennset)
intersectlist(interset)

## Coerce VENNset/INTERSECTset object to list
as.list(vennset)
as.list(interset)

## Pairwise intersect matrix and heatmap
olMA <- sapply(names(setlist), 
		function(x) sapply(names(setlist), 
		function(y) sum(setlist[[x]] %in% setlist[[y]])))
olMA
heatmap(olMA, Rowv=NA, Colv=NA)

## Presence-absence matrices for large numbers of sample sets
interset <- overLapper(setlist=setlist, type="intersects", complexity=2)
(paMA <- intersectmatrix(interset))
heatmap(paMA, Rowv=NA, Colv=NA, col=c("white", "gray"))

Plot feature coverage results

Description

Plots the 3 tabular data types (A-C) generated by the featureCoverage function. It accepts data from single or many features (e.g. CDSs) and samples (BAM files). The coverage from multiple features will be summarized using methods such as mean, while the data from multiple samples will be plotted in separate panels.

Usage

plotfeatureCoverage(covMA, method = mean, scales = "fixed", extendylim=2, 
                    scale_count_val = 10^6)
plotfeatureCoverage(covMA, method = mean, scales = "fixed", extendylim=2, 
                    scale_count_val = 10^6)

Arguments

`covMA`	Object of class `data.frame` generated by `featureCoverage` function.
`method`	Defines the summary statistics to use when `covMA` contains coverage data from multiple features (e.g. transcripts). The default calculates the mean coverage for each positon and/or bin of the corresponding coverage vectors.
`scales`	Scales setting passed on to the `facet_wrap` function of `ggplot2`. For details see `ggplot2::facet_wrap`. The default `fixed` assures a constant scale across all bar plot panels, while `free` uses the optimum scale within each bar plot panel. To evaluate plots in all their details, it may be necessary to generate two graphics files one for each scaling option.
`extendylim`	Allows to extend the upper limit of the y axis when `scales=fixed`. Internally, the function identifies the maximum value in the data and then multiplies this maximum value by the value provided under `extendylim`. The default is set to `extendylim=2`.
`scale_count_val`	Scales (normalizes) the read counts to a fixed value of aligned reads in each sample such as counts per million aligned reads (default is 10^6). For this calculation the `N_total_aligned` values are used that are reported in the input `data.frame` generated by the upstream `featureCoverage` function. Assign `NULL` to turn off scaling.

Value

Currently, the function returns ggplot2 bar plot graphics.

Author(s)

Thomas Girke

Examples

## Construct SYSargs2 object from param and targets files 
targets <- system.file("extdata", "targets.txt", package="systemPipeR")
dir_path <- system.file("extdata/cwl", package="systemPipeR")
args <- loadWorkflow(targets=targets, wf_file="hisat2/hisat2-mapping-se.cwl", 
                  input_file="hisat2/hisat2-mapping-se.yml", dir_path=dir_path)
args <- renderWF(args, inputvars=c(FileName="_FASTQ_PATH1_", SampleName="_SampleName_"))
args

## Not run: 
## Run alignments
args <- runCommandline(args, dir = FALSE, make_bam = TRUE)
outpaths <- subsetWF(args, slot = "output", subset = 1, index = 1)

## Features from sample data of systemPipeRdata package
library(txdbmaker)
file <- system.file("extdata/annotation", "tair10.gff", package="systemPipeRdata")
txdb <- makeTxDbFromGFF(file=file, format="gff3", organism="Arabidopsis")

## (A) Generate binned coverage for two BAM files and 4 transcripts
grl <- cdsBy(txdb, "tx", use.names=TRUE)
fcov <- featureCoverage(bfl=BamFileList(outpaths[1:2]), grl=grl[1:4], resizereads=NULL, 
                    readlengthrange=NULL, Nbins=20, method=mean, fixedmatrix=FALSE, 
                    resizefeatures=TRUE, upstream=20, downstream=20, 
                    outfile="results/featureCoverage.xls")
plotfeatureCoverage(covMA=fcov, method=mean, scales="fixed", scale_count_val=10^6)

## (B) Coverage matrix upstream and downstream of start/stop codons
fcov <- featureCoverage(bfl=BamFileList(outpaths[1:2]), grl=grl[1:4], resizereads=NULL, 
                    readlengthrange=NULL, Nbins=NULL, method=mean, fixedmatrix=TRUE, 
                    resizefeatures=TRUE, upstream=20, downstream=20, 
                    outfile="results/featureCoverage_UpDown.xls")
plotfeatureCoverage(covMA=fcov, method=mean, scales="fixed", scale_count_val=10^6)

## (C) Combined matrix for both binned and start/stop codon 
fcov <- featureCoverage(bfl=BamFileList(outpaths[1:2]), grl=grl[1:4], resizereads=NULL, 
                    readlengthrange=NULL, Nbins=20, method=mean, fixedmatrix=TRUE, 
                    resizefeatures=TRUE, upstream=20, downstream=20, 
                    outfile="results/test.xls")
plotfeatureCoverage(covMA=fcov, method=mean, scales="fixed", scale_count_val=10^6)

## (D) Rle coverage objects one for each query feature
fcov <- featureCoverage(bfl=BamFileList(outpaths[1:2]), grl=grl[1:4], resizereads=NULL, 
                    readlengthrange=NULL, Nbins=NULL, method=mean, fixedmatrix=FALSE, 
                    resizefeatures=TRUE, upstream=20, downstream=20, 
                    outfile="results/RleCoverage.xls")

## End(Not run)
## Construct SYSargs2 object from param and targets files 
targets <- system.file("extdata", "targets.txt", package="systemPipeR")
dir_path <- system.file("extdata/cwl", package="systemPipeR")
args <- loadWorkflow(targets=targets, wf_file="hisat2/hisat2-mapping-se.cwl", 
                  input_file="hisat2/hisat2-mapping-se.yml", dir_path=dir_path)
args <- renderWF(args, inputvars=c(FileName="_FASTQ_PATH1_", SampleName="_SampleName_"))
args

## Not run: 
## Run alignments
args <- runCommandline(args, dir = FALSE, make_bam = TRUE)
outpaths <- subsetWF(args, slot = "output", subset = 1, index = 1)

## Features from sample data of systemPipeRdata package
library(txdbmaker)
file <- system.file("extdata/annotation", "tair10.gff", package="systemPipeRdata")
txdb <- makeTxDbFromGFF(file=file, format="gff3", organism="Arabidopsis")

## (A) Generate binned coverage for two BAM files and 4 transcripts
grl <- cdsBy(txdb, "tx", use.names=TRUE)
fcov <- featureCoverage(bfl=BamFileList(outpaths[1:2]), grl=grl[1:4], resizereads=NULL, 
                    readlengthrange=NULL, Nbins=20, method=mean, fixedmatrix=FALSE, 
                    resizefeatures=TRUE, upstream=20, downstream=20, 
                    outfile="results/featureCoverage.xls")
plotfeatureCoverage(covMA=fcov, method=mean, scales="fixed", scale_count_val=10^6)

## (B) Coverage matrix upstream and downstream of start/stop codons
fcov <- featureCoverage(bfl=BamFileList(outpaths[1:2]), grl=grl[1:4], resizereads=NULL, 
                    readlengthrange=NULL, Nbins=NULL, method=mean, fixedmatrix=TRUE, 
                    resizefeatures=TRUE, upstream=20, downstream=20, 
                    outfile="results/featureCoverage_UpDown.xls")
plotfeatureCoverage(covMA=fcov, method=mean, scales="fixed", scale_count_val=10^6)

## (C) Combined matrix for both binned and start/stop codon 
fcov <- featureCoverage(bfl=BamFileList(outpaths[1:2]), grl=grl[1:4], resizereads=NULL, 
                    readlengthrange=NULL, Nbins=20, method=mean, fixedmatrix=TRUE, 
                    resizefeatures=TRUE, upstream=20, downstream=20, 
                    outfile="results/test.xls")
plotfeatureCoverage(covMA=fcov, method=mean, scales="fixed", scale_count_val=10^6)

## (D) Rle coverage objects one for each query feature
fcov <- featureCoverage(bfl=BamFileList(outpaths[1:2]), grl=grl[1:4], resizereads=NULL, 
                    readlengthrange=NULL, Nbins=NULL, method=mean, fixedmatrix=FALSE, 
                    resizefeatures=TRUE, upstream=20, downstream=20, 
                    outfile="results/RleCoverage.xls")

## End(Not run)

Plot read distribution across genomic features

Description

Function to visualize the distribution of reads across different feature types for many alignment files in parallel. The plots are stacked bar plots representing the raw or normalized read counts for the sense and antisense strand of each feature. The graphics results are generated with ggplot2. Typically, the expected input is generated with the affiliated featuretypeCounts function.

Usage

plotfeaturetypeCounts(x, graphicsfile, graphicsformat = "pdf", scales = "fixed", anyreadlength = FALSE, 
                      drop_N_total_aligned = TRUE, scale_count_val = 10^6, scale_length_val = NULL)
plotfeaturetypeCounts(x, graphicsfile, graphicsformat = "pdf", scales = "fixed", anyreadlength = FALSE, 
                      drop_N_total_aligned = TRUE, scale_count_val = 10^6, scale_length_val = NULL)

Arguments

`x`	`data.frame` with feature counts generated by the `featuretypeCounts` function.
`graphicsfile`	Path to file where to write the output graphics. Note, the function returns the graphics instructions from `ggplot2` for interactive plotting in R. However, due to the complexity of the graphics generated here, the finished results are written to a file directly.
`graphicsformat`	Graphics file format. Currently, supported formats are: pdf, png or jpeg. Argument accepts one of them as character string.
`scales`	Scales setting passed on to the `facet_wrap` function of `ggplot2`. For details see `ggplot2::facet_wrap`. The default `fixed` assures a constant scale across all bar plot panels, while `free` uses the optimum scale within each bar plot panel. To evaluate plots in all their details, it may be necessary to generate two graphics files one for each scaling option.
`anyreadlength`	If set to `TRUE` read length specific read counts will be summed up to a single count value to plot read counts for any read length. Otherwise the bar plots will show the counts for each read length value.
`drop_N_total_aligned`	If set to `TRUE` the special feature count `N_total_aligned` will not be included as a separate feature in the plots. However, the information will still be used internally for scaling the read counts to a fixed value if this option is requested under the `scale_count_val` argument.
`scale_count_val`	Scales (normalizes) the read counts to a fixed value of aligned reads in each sample such as counts per million aligned reads (default is 10^6). For this calculation the `N_total_aligned` values are used that are reported in the input `data.frame` generated by the upstream `featuretypeCounds` function. Assign `NULL` to turn off scaling by aligned reads.
`scale_length_val`	Allows to adjust the raw or scaled read counts to a constant length interval (e.g. `scale_length_val=10^3` in bps) considering the total genomic length of the corresponding feature type. The required genomic length information for each feature type is obtained from the `Featuretypelength` column of the input `data.frame` generated by the `featuretypeCount` function. To turn off feature length adjustment, assign `NULL` (default).

Value

The function returns bar plot graphics for aligned read counts with read length resolution if the input contains this information and argument anyreadlength is set to FALSE. If the input contains counts for any read length and/or anyreadlength=TRUE then there will be only one bar per feature and sample. Due to the complexity of the plots, the results are directly written to file in the chosen graphics format. However, the function also returns the plotting instructions returned by ggplot2 to display the result components using R's plotting device.

Author(s)

Thomas Girke

Examples

## Construct SYSargs2 object from param and targets files 
targets <- system.file("extdata", "targets.txt", package="systemPipeR")
dir_path <- system.file("extdata/cwl", package="systemPipeR")
args <- loadWorkflow(targets=targets, wf_file="hisat2/hisat2-mapping-se.cwl", 
                  input_file="hisat2/hisat2-mapping-se.yml", dir_path=dir_path)
args <- renderWF(args, inputvars=c(FileName="_FASTQ_PATH1_", SampleName="_SampleName_"))
args

## Not run: 
## Run alignments
args <- runCommandline(args, dir = FALSE, make_bam = TRUE)
outpaths <- subsetWF(args, slot = "output", subset = 1, index = 1)

## Features from sample data of systemPipeRdata package
library(txdbmaker)
file <- system.file("extdata/annotation", "tair10.gff", package="systemPipeRdata")
txdb <- makeTxDbFromGFF(file=file, format="gff3", organism="Arabidopsis")
feat <- genFeatures(txdb, featuretype="all", reduce_ranges=TRUE, upstream=1000, downstream=0, verbose=TRUE)

## Generate and plot feature counts for specific read lengths
fc <- featuretypeCounts(bfl=BamFileList(outpaths, yieldSize=50000), grl=feat, singleEnd=TRUE, readlength=c(74:76,99:102), type="data.frame")
p <- plotfeaturetypeCounts(x=fc, graphicsfile="featureCounts.pdf", graphicsformat="pdf", scales="fixed", anyreadlength=FALSE)

## Generate and plot feature counts for any read length  
fc2 <- featuretypeCounts(bfl=BamFileList(outpaths, yieldSize=50000), grl=feat, singleEnd=TRUE, readlength=NULL, type="data.frame")
p2 <- plotfeaturetypeCounts(x=fc2, graphicsfile="featureCounts2.pdf", graphicsformat="pdf", scales="fixed", anyreadlength=TRUE)

## End(Not run)
## Construct SYSargs2 object from param and targets files 
targets <- system.file("extdata", "targets.txt", package="systemPipeR")
dir_path <- system.file("extdata/cwl", package="systemPipeR")
args <- loadWorkflow(targets=targets, wf_file="hisat2/hisat2-mapping-se.cwl", 
                  input_file="hisat2/hisat2-mapping-se.yml", dir_path=dir_path)
args <- renderWF(args, inputvars=c(FileName="_FASTQ_PATH1_", SampleName="_SampleName_"))
args

## Not run: 
## Run alignments
args <- runCommandline(args, dir = FALSE, make_bam = TRUE)
outpaths <- subsetWF(args, slot = "output", subset = 1, index = 1)

## Features from sample data of systemPipeRdata package
library(txdbmaker)
file <- system.file("extdata/annotation", "tair10.gff", package="systemPipeRdata")
txdb <- makeTxDbFromGFF(file=file, format="gff3", organism="Arabidopsis")
feat <- genFeatures(txdb, featuretype="all", reduce_ranges=TRUE, upstream=1000, downstream=0, verbose=TRUE)

## Generate and plot feature counts for specific read lengths
fc <- featuretypeCounts(bfl=BamFileList(outpaths, yieldSize=50000), grl=feat, singleEnd=TRUE, readlength=c(74:76,99:102), type="data.frame")
p <- plotfeaturetypeCounts(x=fc, graphicsfile="featureCounts.pdf", graphicsformat="pdf", scales="fixed", anyreadlength=FALSE)

## Generate and plot feature counts for any read length  
fc2 <- featuretypeCounts(bfl=BamFileList(outpaths, yieldSize=50000), grl=feat, singleEnd=TRUE, readlength=NULL, type="data.frame")
p2 <- plotfeaturetypeCounts(x=fc2, graphicsfile="featureCounts2.pdf", graphicsformat="pdf", scales="fixed", anyreadlength=TRUE)

## End(Not run)

Visualize SPR workflow and status

Description

Visualize SPR workflow and status. plotWF is the general function that creates the plot. plotwfOutput and renderPlotwf are used in Shiny UI and server respectively, similar to plotOutput and renderPlot.

Usage

plotWF(
  sysargs,
  width = NULL, height = NULL,
  elementId = NULL,
  responsive = TRUE,
  branch_method = "auto",
  branch_no = NULL,
  layout = "compact",
  no_plot = FALSE,
  plot_method = "svg",
  out_format = "plot",
  out_path = NULL,
  show_legend = TRUE,
  mark_main_branch = FALSE,
  rstudio = FALSE,
  in_log = FALSE,
  rmarkdown = "detect",
  verbose = FALSE,
  show_warns = FALSE,
  plot_ctr = TRUE,
  pan_zoom = FALSE,
  exit_point = 0
)

plotwfOutput(
  outputId,
  width = '100%',
  height = '400px'
)

renderPlotwf(
  expr,
  env = parent.frame(),
  quoted = FALSE
)
plotWF(
  sysargs,
  width = NULL, height = NULL,
  elementId = NULL,
  responsive = TRUE,
  branch_method = "auto",
  branch_no = NULL,
  layout = "compact",
  no_plot = FALSE,
  plot_method = "svg",
  out_format = "plot",
  out_path = NULL,
  show_legend = TRUE,
  mark_main_branch = FALSE,
  rstudio = FALSE,
  in_log = FALSE,
  rmarkdown = "detect",
  verbose = FALSE,
  show_warns = FALSE,
  plot_ctr = TRUE,
  pan_zoom = FALSE,
  exit_point = 0
)

plotwfOutput(
  outputId,
  width = '100%',
  height = '400px'
)

renderPlotwf(
  expr,
  env = parent.frame(),
  quoted = FALSE
)

Arguments

`sysargs`	object of class `SYSargsList`.
`width`	string, a valid CSS string for width, like "500px", "100%".
`height`	string, a valid CSS string for height, like "500px", "100%".
`elementId`	string, optional ID value for the plot.
`responsive`	bool, should the plot be responsive? useful in Rstudio built-in viewer, Rmarkdown, Shiny or embed it into other web pages.
`branch_method`	string, one of "auto", "choose". How to determine the main branch of the workflow. "auto" will be determined by internal alrgothrim: Branches connecting the frist and last step and/or the longest will be favored. "choose" will list all possible branches and you can make a choice.
`branch_no`	numeric, only works if `branch_method == "choose"`. Specify a branch number to be the main branch instead of choosing from the prompt. This option can be good if you are in a non-interactive mode, e.g. rendering Rmd.
`layout`	string, one of "compact", "vertical", "horizontal", "execution".
`no_plot`	bool, if you want to assgin the plot to a variable and do not want to see it interactively, change this to `FALSE`.
`plot_method`	string, one of "svg", "png", how to make plot, use svg or png to embed the plot.
`out_format`	string, one of "plot", "html", "dot", "dot_print" plot: directly open your viewer or browser of the plot html: save the plot to a html file dot: save the plot in DOT language, need a dot engine to remake the plot dot_print: directly cat the DOT code on console See `details` section if one wish to generate other output format, such as `jpg` or `png`.
`out_path`	string, if the `out_format` is not "plot" or "dot_print", provide a path of where to save the plot.
`show_legend`	bool, show plot legend?
`mark_main_branch`	bool, color the main branch on the plot?
`rstudio`	bool, if you are using Rstudio, open the built-in viewer to see the plot? Default is no, open the browser tab to see it plot. The default viewer is too small to see the full plot clearly, so we recommend to use the browser tab. However, if you are using this plot in Shiny apps, always turn `rstudio = TRUE`.
`in_log`	bool, is this plot been made in a SPR log file? If `TRUE` will add links of steps to the corresponding log sections.
`rmarkdown`	are you rendering this plot in a Rmarkdown document? default value is "detect", this function will determine based on current R environment, or you can force it to be `TRUE` or `FALSE`.
`verbose`	bool, turn on verbose mode will give you more information.
`show_warns`	bool, print the warning messages on the plot?.
`plot_ctr`	bool, add the plot control panel to the plot? This requires you to have internet connection. It will download some additional javascript libraries, and allow you to save the plot as png, jpg, svg, pdf or graphviz directly from the browser.
`pan_zoom`	bool, allow panning and zooming of the plot? Use mouse wheel or touch pad to zoom in and out of the plot. You need to have internet connection, additional javascript libraries will be loaded automatically online. Cannot be used with `responsive = TRUE` together. If both `TRUE`, `responsive` will be automatically set to `FALSE`.
`exit_point`	numeric, for advanced debugging only, see details
`outputId`	string, shiny output ID
`expr`	An expression that generates a plotwf, like `plotWF(sal)`
`env`	The environment in which to evaluate `expr`.
`quoted`	Is `expr` a quoted expression (with `quote()`)? This is useful if you want to save an expression in a variable.

Details

layout

compact: try to plot steps as close as possible.
vertical: main branch will be placed vertically and side branches will be placed on the same horizontal level and sub steps of side branches will be placed vertically.
horizontal: main branch is placed horizontally and side branches and sub steps will be placed vertically.
execution: a linear plot to show the workflow execution order of all steps.

exit_point

return intermediate results at different points and exit the function

0: no early exit
1: after all branches are found, return tree
2: after the new tree has been built, return new nodes
3: after dot translation, return graph string

Rmarkdown

Rmarkdown will change some of the format and cause conflicts. If the plot can be rendered outside Rmd but cannot within Rmd, try to turn this option on. Some additional javascript processing will be performed to avoid the conflict but may cause unknown issues.

Other output formats

The plot rendering uses htmlwidgets, which generates an interactive HTML page. Saving these plots directly to standard image files, such as png, is not possible. However, a few workarounds exist to save to these image formats:

1: use webshot2::webshot function.
2: use the interactive panel located on the top-left corner to download as an image after the plot is rendered.
3: use plotWF(sal, plot_method = "png") to embed the plot as png, and then right-click to save the image.

Please see our website for examples: https://systempipe.org/sp/spr/sp_run/step_vis/

Shiny

When the plot is rendered in a Shiny app, the rstudio option must be turned on, plotWF(sal, rstudio = TRUE, ...).

Value

see out_format and exit_point

Predict ORFs

Description

Predicts open reading frames (ORFs) and coding sequences (CDSs) in DNA sequences provided as DNAString or DNAStringSet objects.

Usage

predORF(x, n = 1, type = "grl", mode = "orf", strand = "sense", longest_disjoint=FALSE, startcodon = "ATG", stopcodon = c("TAA", "TAG", "TGA"))
predORF(x, n = 1, type = "grl", mode = "orf", strand = "sense", longest_disjoint=FALSE, startcodon = "ATG", stopcodon = c("TAA", "TAG", "TGA"))

Arguments

`x`	DNA query sequence(s) provided as `DNAString` or `DNAStringSet` object.
`n`	Defines the maximum number of ORFs to return for each input sequence. The ORFs identified are sorted decreasingly by their length. For instance, `n=1` (default) returns the longest ORF, `n=2` the two longest ones, and so on.
`type`	One of three options provided as character values: `'df'` returns results as `data.frame`, while `'gr'` and `'grl'` (default) return them as `GRanges` or `GRangesList` objects, respectively.
`mode`	The setting `mode='ORF'` returns a continuous reading frame that begins with a start codon and ends with a stop codon. The setting `mode='CDS'` return continuous reading frames that do not need to begin or end with start or stop codons, respectively.
`strand`	One of three options passed on as character vector of length one: `'sense'` performs the predictions only for the sense strand of the query sequence(s), `'antisense'` does it only for the antisense strand and `'both'` does it for both strands.
`longest_disjoint`	If set to `TRUE` and `n='all'`, the results will be subsetted to non-overlapping ORF set containing longest ORF.
`startcodon`	Defines the start codon(s) for ORF predictions. The default is set to the standard start codon 'ATG'. Any custom set of triplet DNA sequences can be assigned here.
`stopcodon`	Defines the stop codon(s) for ORF predictions. The default is set to the three standard stop codons 'TAA', 'TAG' and 'TGA'. Any custom set of triplet DNA sequences can be assigned here.

Value

Returns ORF/CDS ranges identified in query sequences as GRanges or data.frame object. The type argument defines which one of them will be returned. The objects contain the following columns:

seqnames: names of query sequences
subject_id: identified ORF/CDS ranges numbered by query
start/end: start and end positions of ORF/CDS ranges
strand: strand of query sequence used for prediction
width: length of subject range in bases
inframe2end: frame of identified ORF/CDS relative to 3' end of query sequence. This can be important if the query sequence was extracted directly upstream of an ORF (e.g. 5' UTR upstream of main ORF). The value 1 stands for in-frame with downstream ORF, while 2 or 3 indicates a shift of one or two bases, respectively.

Author(s)

Thomas Girke

Examples

## Load DNA sample data set from Biostrings package
file <- system.file("extdata", "someORF.fa", package="Biostrings")
dna <- readDNAStringSet(file)

## Predict longest ORF for sense strand in each query sequence
(orf <- predORF(dna[1:4], n=1, type="gr", mode="orf", strand="sense"))

## Not run: 
## Usage for more complex example
library(txdbmaker); library(systemPipeRdata)
gff <- system.file("extdata/annotation", "tair10.gff", package="systemPipeRdata")
txdb <- makeTxDbFromGFF(file=gff, format="gff3", organism="Arabidopsis")
futr <- fiveUTRsByTranscript(txdb, use.names=TRUE)
genome <- system.file("extdata/annotation", "tair10.fasta", package="systemPipeRdata")
dna <- extractTranscriptSeqs(FaFile(genome), futr)
uorf <- predORF(dna, n="all", mode="orf", longest_disjoint=TRUE, strand="sense")
grl_scaled <- scaleRanges(subject=futr, query=uorf, type="uORF", verbose=TRUE)
export.gff3(unlist(grl_scaled), "uorf.gff")

## End(Not run)
## Load DNA sample data set from Biostrings package
file <- system.file("extdata", "someORF.fa", package="Biostrings")
dna <- readDNAStringSet(file)

## Predict longest ORF for sense strand in each query sequence
(orf <- predORF(dna[1:4], n=1, type="gr", mode="orf", strand="sense"))

## Not run: 
## Usage for more complex example
library(txdbmaker); library(systemPipeRdata)
gff <- system.file("extdata/annotation", "tair10.gff", package="systemPipeRdata")
txdb <- makeTxDbFromGFF(file=gff, format="gff3", organism="Arabidopsis")
futr <- fiveUTRsByTranscript(txdb, use.names=TRUE)
genome <- system.file("extdata/annotation", "tair10.fasta", package="systemPipeRdata")
dna <- extractTranscriptSeqs(FaFile(genome), futr)
uorf <- predORF(dna, n="all", mode="orf", longest_disjoint=TRUE, strand="sense")
grl_scaled <- scaleRanges(subject=futr, query=uorf, type="uORF", verbose=TRUE)
export.gff3(unlist(grl_scaled), "uorf.gff")

## End(Not run)

Run custom read preprocessing functions

Description

Applies custom read preprocessing functions to single-end or paired-end FASTQ files. The function uses the FastqStreamer function from the ShortRead package to stream through large files in a memory-efficient manner.

Usage

preprocessReads(args = NULL,
                FileName1 = NULL, FileName2 = NULL, 
                outfile1 = NULL, outfile2 = NULL, 
                Fct, batchsize = 100000, overwrite = TRUE, ...)
preprocessReads(args = NULL,
                FileName1 = NULL, FileName2 = NULL, 
                outfile1 = NULL, outfile2 = NULL, 
                Fct, batchsize = 100000, overwrite = TRUE, ...)

Arguments

`args`	Object of class `SYSargs` or `SYSargs2`.
`FileName1`	Path to input forward fastq file. Defaul is `NULL`.
`FileName2`	Path to input reverse fastq file. Defaul is `NULL`
`outfile1`	Path to output forward fastq file. Defaul is `NULL`
`outfile2`	Path to output reverse fastq file. Defaul is `NULL`
`Fct`	`character` string of custom read preprocessing function call where both the input and output needs to be an object of class `ShortReadQ`. The name of the input `ShortReadQ` object needs to be `fq`.
`batchsize`	Number of reads to process in each iteration by the internally used `FastqStreamer` function.
`overwrite`	If `TRUE` existing file will be overwritten.
`...`	To pass on additional arguments to the internally used `writeFastq` function.

Value

Writes to files in FASTQ format. Their names are specified by outpaths(args).

Author(s)

Thomas Girke

Examples

## Preprocessing of single-end reads
dir_path <- system.file("extdata/cwl/preprocessReads/trim-se", package="systemPipeR")
targetspath <- system.file("extdata", "targets.txt", package="systemPipeR")
trim <- loadWorkflow(targets=targetspath, wf_file="trim-se.cwl", input_file="trim-se.yml", dir_path=dir_path)
trim <- renderWF(trim, inputvars=c(FileName="_FASTQ_PATH1_", SampleName="_SampleName_"))
## Not run: 
preprocessReads(args=trim[1], Fct="trimLRPatterns(Rpattern='GCCCGGGTAA', subject=fq)", batchsize=100000, overwrite=TRUE, compress=TRUE)

## End(Not run)

## Preprocessing of paired-end reads
dir_path <- system.file("extdata/cwl/preprocessReads/trim-pe", package="systemPipeR")
targetspath <- system.file("extdata", "targetsPE.txt", package="systemPipeR")
trim <- loadWorkflow(targets=targetspath, wf_file="trim-pe.cwl", input_file="trim-pe.yml", dir_path=dir_path)
trim <- renderWF(trim, inputvars=c(FileName1="_FASTQ_PATH1_", FileName2="_FASTQ_PATH2_", SampleName="_SampleName_"))
trim
## Not run: 
preprocessReads(args=trim[1], Fct="trimLRPatterns(Rpattern='GCCCGGGTAA', subject=fq)", batchsize=100000, overwrite=TRUE, compress=TRUE)

## End(Not run)
## Preprocessing of single-end reads
dir_path <- system.file("extdata/cwl/preprocessReads/trim-se", package="systemPipeR")
targetspath <- system.file("extdata", "targets.txt", package="systemPipeR")
trim <- loadWorkflow(targets=targetspath, wf_file="trim-se.cwl", input_file="trim-se.yml", dir_path=dir_path)
trim <- renderWF(trim, inputvars=c(FileName="_FASTQ_PATH1_", SampleName="_SampleName_"))
## Not run: 
preprocessReads(args=trim[1], Fct="trimLRPatterns(Rpattern='GCCCGGGTAA', subject=fq)", batchsize=100000, overwrite=TRUE, compress=TRUE)

## End(Not run)

## Preprocessing of paired-end reads
dir_path <- system.file("extdata/cwl/preprocessReads/trim-pe", package="systemPipeR")
targetspath <- system.file("extdata", "targetsPE.txt", package="systemPipeR")
trim <- loadWorkflow(targets=targetspath, wf_file="trim-pe.cwl", input_file="trim-pe.yml", dir_path=dir_path)
trim <- renderWF(trim, inputvars=c(FileName1="_FASTQ_PATH1_", FileName2="_FASTQ_PATH2_", SampleName="_SampleName_"))
trim
## Not run: 
preprocessReads(args=trim[1], Fct="trimLRPatterns(Rpattern='GCCCGGGTAA', subject=fq)", batchsize=100000, overwrite=TRUE, compress=TRUE)

## End(Not run)

Accessories function to modify the Command-line Version 1

Description

Accessories function to modify the Command-line Version 1

Usage

printParam(sysargs, position, index = NULL)
subsetParam(sysargs, position, index = NULL, trim = TRUE, mute = FALSE)
replaceParam(sysargs, position, index = NULL, replace, mute = FALSE)
renameParam(sysargs, position, index = FALSE, rename, mute = FALSE)
appendParam(sysargs, position, index = NULL, append, after, mute = FALSE)
printParam(sysargs, position, index = NULL)
subsetParam(sysargs, position, index = NULL, trim = TRUE, mute = FALSE)
replaceParam(sysargs, position, index = NULL, replace, mute = FALSE)
renameParam(sysargs, position, index = FALSE, rename, mute = FALSE)
appendParam(sysargs, position, index = NULL, append, after, mute = FALSE)

Arguments

`sysargs`	Object of class `SYSargs2`. Output from the `createParamFiles` function.
`position`	string, one of `baseCommand`, `inputs`, `outputs` to view or apply a modification.
`index`	numeric or character vector, index to view or change a single item in `baseCommand`, `inputs`, `outputs`.
`trim`	logical, only keep arguments specified by `index`. Default is "TRUE".
`replace`	named list, replace arguments in different positions. Replace list length must be the same as `index`. Different positions will have different requirements.
`rename`	character vector, rename items in different positions. `rename` vector length must be the same as `index`.
`append`	named list, same requirements as `replace`, however it cannot append `baseCommand`.
`after`	a subscript, after which the values are to be appended. If `NULL` will be after the last argument or specify a numeric integer.
`mute`	logical, print the raw command-line string and output after replacing or rename.

Details

- printParam: prints its arguments defined by position and index.

- subsetParam: returns subsets of command-line, keeping the arguments defined by position and index.

- replaceParam: replaces the values in command-line with indices given in list by those given in values

- renameParam: rename the names of the arguments.

- appendParam: Add arguments to the original command line.

Value

SYSargs2 object

Author(s)

Le Zhang and Daniela Cassol

References

For more details on CWL, please consult the following page: https://www.commonwl.org/

Examples

command <- "
hisat2 \
    -S <F, out: ./results/M1A.sam> \
    -x <F: ./data/tair10.fasta> \
    -k <int: 1> \
    -min-intronlen <int: 30> \
    -max-intronlen <int: 3000> \
    -threads <int: 4> \
    -U <F: ./data/SRR446027_1.fastq.gz> \
    --verbose
"
cmd <- createParamFiles(command)
cmdlist(cmd)
command <- "
hisat2 \
    -S <F, out: ./results/M1A.sam> \
    -x <F: ./data/tair10.fasta> \
    -k <int: 1> \
    -min-intronlen <int: 30> \
    -max-intronlen <int: 3000> \
    -threads <int: 4> \
    -U <F: ./data/SRR446027_1.fastq.gz> \
    --verbose
"
cmd <- createParamFiles(command)
cmdlist(cmd)

Accessories function to modify the Command-line Version 2

Description

Accessories function to modify the Command-line Version 2

Usage

printParam2(sysargs, base = FALSE, args = FALSE, inputs = FALSE,
        outputs = FALSE, stdout = FALSE, raw_cmd = FALSE, all = TRUE)
appendParam2(sysargs, x, position = c("inputs", "args", "outputs"),
        after = NULL, verbose = FALSE)
replaceParam2(sysargs, x, index=NULL,
              position = c("inputs", "baseCommand", "args", "outputs", "stdout"),
              verbose = FALSE)
removeParam2(sysargs, index=NULL, position = c("inputs", "args", "outputs", "stdout"),
             verbose = FALSE)
renameParam2(sysargs, index=NULL, new_names,
             position = c("inputs", "args", "outputs", "stdout"), verbose = FALSE)
printParam2(sysargs, base = FALSE, args = FALSE, inputs = FALSE,
        outputs = FALSE, stdout = FALSE, raw_cmd = FALSE, all = TRUE)
appendParam2(sysargs, x, position = c("inputs", "args", "outputs"),
        after = NULL, verbose = FALSE)
replaceParam2(sysargs, x, index=NULL,
              position = c("inputs", "baseCommand", "args", "outputs", "stdout"),
              verbose = FALSE)
removeParam2(sysargs, index=NULL, position = c("inputs", "args", "outputs", "stdout"),
             verbose = FALSE)
renameParam2(sysargs, index=NULL, new_names,
             position = c("inputs", "args", "outputs", "stdout"), verbose = FALSE)

Arguments

`sysargs`	Object of class `SYSargs2`. Output from the `createParamFiles` function.
`base`	logical, print out base command information?
`args`	logical, print out arguments information?
`inputs`	logical, print out inputs information?
`outputs`	logical, print out outputs information?
`stdout`	logical, print out stdout information?
`raw_cmd`	logical, print out parsed raw command information?
`all`	logical, print out all base command, arguments, inputs, outputs, and raw command information? Turn this to `TRUE` will overwrite all above to `TRUE`. If you need to print out selected positions, turn this to `FALSE` and turn other positions to `TRUE`.
`position`	string, one of the positions to apply a modification. For `appendParam2`: "inputs", "args", "outputs", for `replaceParam2`: "inputs", "baseCommand", "args", "outputs", "stdout", for `removeParam2`, `renameParam2`: "inputs", "args", "outputs", "stdout".
`index`	numeric or character vector, index of items to remove or rename item(s) inside the position you choose, in `removeParam2`, `renameParam2`, `replaceParam2`.
`after`	integer, in `appendParam2`, after which current item you want to append the new items in the position you have selected? For example, if you want the new item to be the first, then use `0`, if last, then use `999` to make sure it goes to the last one.
`x`	named list or string, new items to replace or append in different positions. Replace list length must be the same as `index`. Different positions will have different requirements. See details of the `x` format requirements.
`new_names`	character vector, new names that you wish to replace the old names. `new_names` vector length must be the same as `index` in `renameParam2`.
`verbose`	logical, show addtional information during/after operation? for example, print the new changes.

Details

- printParam2: prints its arguments defined by position and index.

- removeParam2: removes items in certain positions you select.

- replaceParam2: replaces the values in command-line with indices given in list by those given in values

- renameParam2: rename the names of items in certain position.

- appendParam2: Add arguments to the original command line. Adding new basecommand or standard out is not allowed.

x format

- If x is a character, it requires exact 3 semi-colons ; to separate the string in to 4 columns. Values before the third column is the same as createParam inputs, first column: prefix/argument name, second column: type, third column: default value. The fourth column (new): numeric, index of the new item, this will be translated into position entries in CWL.

- If x is a list, it must be named. Following items must be included in list: preF, type, value, index. They refer to prefix, param type, default value, and position index correspondingly.

Value

SYSargs2 object

Author(s)

Le Zhang and Daniela Cassol

References

For more details on CWL, please consult the following page: https://www.commonwl.org/

Examples

command2 <- '
mycmd2 \
    p: -s; File; sample1.txt \
    p: -s; File; sample2.txt \
    p: --c; ; \
    p: -o; File; out: myout.txt \
    ref_genome; File; a.fasta \
    p: --nn; int; 12 \
    mystdout; File; stdout: abc.txt
'
cmd2 <- createParam(command2, syntaxVersion = "v2", writeParamFiles=FALSE)
# string format
new_cmd <- 'p: -abc; string; abc; 7'
cmd2 <- appendParam2(cmd2, x = new_cmd, position = "inputs")
printParam2(cmd2, all = FALSE, inputs = TRUE, raw_cmd = TRUE)
# list format
new_cmd <- list(name = "new_arg", preF = "--foo", index = "8")
cmd2 <- appendParam2(cmd2, x = new_cmd, position = "args")
printParam2(cmd2, all = FALSE, args = TRUE, raw_cmd = TRUE)
# rename
cmd2 <- renameParam2(cmd2, "new_name_arg", index = "new_arg", position = "args")
printParam2(cmd2, all = FALSE, args = TRUE, raw_cmd = TRUE)
# remove
cmd2 <- removeParam2(cmd2, index = "new_name_arg", position = "args")
printParam2(cmd2, all = FALSE, args = TRUE, raw_cmd = TRUE)
command2 <- '
mycmd2 \
    p: -s; File; sample1.txt \
    p: -s; File; sample2.txt \
    p: --c; ; \
    p: -o; File; out: myout.txt \
    ref_genome; File; a.fasta \
    p: --nn; int; 12 \
    mystdout; File; stdout: abc.txt
'
cmd2 <- createParam(command2, syntaxVersion = "v2", writeParamFiles=FALSE)
# string format
new_cmd <- 'p: -abc; string; abc; 7'
cmd2 <- appendParam2(cmd2, x = new_cmd, position = "inputs")
printParam2(cmd2, all = FALSE, inputs = TRUE, raw_cmd = TRUE)
# list format
new_cmd <- list(name = "new_arg", preF = "--foo", index = "8")
cmd2 <- appendParam2(cmd2, x = new_cmd, position = "args")
printParam2(cmd2, all = FALSE, args = TRUE, raw_cmd = TRUE)
# rename
cmd2 <- renameParam2(cmd2, "new_name_arg", index = "new_arg", position = "args")
printParam2(cmd2, all = FALSE, args = TRUE, raw_cmd = TRUE)
# remove
cmd2 <- removeParam2(cmd2, index = "new_name_arg", position = "args")
printParam2(cmd2, all = FALSE, args = TRUE, raw_cmd = TRUE)

Import sample comparisons from targets file

Description

Parses sample comparisons specified in <CMP> line(s) of targets file or in targetsheader slot of SYSargs object. All possible comparisons can be specified with 'CMPset: ALL'.

Usage

readComp(file, format = "vector", delim = "-")
readComp(file, format = "vector", delim = "-")

Arguments

`file`	Path to targets file. Alternatively, a `SYSargs` or `SYSargs2` object can be assigned.
`format`	Object type to return: `vector` or `matrix`.
`delim`	Delimiter to use when sample comparisons are returned as `vector`.

Value

list where each component is named according to the name(s) used in the <CMP> line(s) of the targets file. The list will contain as many sample comparisons sets (list components) as there are sample comparisons lines in the corresponding targets file.

Author(s)

Thomas Girke

Examples

## Return comparisons from targets file
targetspath <- system.file("extdata", "targets.txt", package="systemPipeR")
read.delim(targetspath, comment.char = "#")
readComp(file=targetspath, format="vector", delim="-")

## Return comparisons from SYSargs2 object
targets <- system.file("extdata", "targets.txt", package="systemPipeR")
dir_path <- system.file("extdata/cwl", package="systemPipeR")
args <- loadWorkflow(targets=targets, wf_file="hisat2/hisat2-mapping-se.cwl", 
                  input_file="hisat2/hisat2-mapping-se.yml", dir_path=dir_path)
args <- renderWF(args, inputvars=c(FileName="_FASTQ_PATH1_", SampleName="_SampleName_"))
args
readComp(args, format = "vector", delim = "-")
## Return comparisons from targets file
targetspath <- system.file("extdata", "targets.txt", package="systemPipeR")
read.delim(targetspath, comment.char = "#")
readComp(file=targetspath, format="vector", delim="-")

## Return comparisons from SYSargs2 object
targets <- system.file("extdata", "targets.txt", package="systemPipeR")
dir_path <- system.file("extdata/cwl", package="systemPipeR")
args <- loadWorkflow(targets=targets, wf_file="hisat2/hisat2-mapping-se.cwl", 
                  input_file="hisat2/hisat2-mapping-se.yml", dir_path=dir_path)
args <- renderWF(args, inputvars=c(FileName="_FASTQ_PATH1_", SampleName="_SampleName_"))
args
readComp(args, format = "vector", delim = "-")

Render RMarkdown Logs Report

Description

Render the logs report file to the specified output format using pandoc.

Usage

renderLogs(sysargs, type = c("html_document", "pdf_document"), 
            fileName = "default", quiet = FALSE, 
            open_file = TRUE)
renderLogs(sysargs, type = c("html_document", "pdf_document"), 
            fileName = "default", quiet = FALSE, 
            open_file = TRUE)

Arguments

`sysargs`	object of class `SYSargsList`.
`type`	The R Markdown output format to convert to. The option can be the name of a format (e.g. `"pdf_document"` or `"html_document"`).
`fileName`	character string naming a file output. Default is `"logs_<date>.Rmd"`.
`quiet`	If set to `TRUE`, all messages returned by the function will be suppressed.
`open_file`	Default is `TRUE`.

Value

It will return an SYSargsList updated.

Author(s)

Daniela Cassol

Examples

## Construct SYSargsList object from Rmd file
sal <- SPRproject(overwrite=TRUE)
targetspath <- system.file("extdata/cwl/example/targets_example.txt", package="systemPipeR")

## Constructor and `appendStep<-`
appendStep(sal) <- SYSargsList(step_name = "echo", 
                               targets=targetspath, dir=TRUE,
                               wf_file="example/workflow_example.cwl", input_file="example/example.yml", 
                               dir_path = system.file("extdata/cwl", package="systemPipeR"),
                               inputvars = c(Message = "_STRING_", SampleName = "_SAMPLE_"))
appendStep(sal) <- LineWise(code = {
                            hello <- lapply(getColumn(sal, step=1, 'outfiles'), function(x) yaml::read_yaml(x))
                            }, 
                            step_name = "R_read", 
                            dependency = "echo")
sal <- runWF(sal)
sal <- renderLogs(sal, open_file = FALSE)
## Construct SYSargsList object from Rmd file
sal <- SPRproject(overwrite=TRUE)
targetspath <- system.file("extdata/cwl/example/targets_example.txt", package="systemPipeR")

## Constructor and `appendStep<-`
appendStep(sal) <- SYSargsList(step_name = "echo", 
                               targets=targetspath, dir=TRUE,
                               wf_file="example/workflow_example.cwl", input_file="example/example.yml", 
                               dir_path = system.file("extdata/cwl", package="systemPipeR"),
                               inputvars = c(Message = "_STRING_", SampleName = "_SAMPLE_"))
appendStep(sal) <- LineWise(code = {
                            hello <- lapply(getColumn(sal, step=1, 'outfiles'), function(x) yaml::read_yaml(x))
                            }, 
                            step_name = "R_read", 
                            dependency = "echo")
sal <- runWF(sal)
sal <- renderLogs(sal, open_file = FALSE)

Render RMarkdown Report

Description

Render the technical report file to the specified output format using pandoc.

Usage

renderReport(sysargs, fileName ="SPR_Report",
              rmd_title = "SPR workflow Template - Report",
              rmd_author = "Author",
              rmd_date= "Last update: `r format(Sys.time(), '%d %B, %Y')`",
              type = c("html_document"),
              desc = "This is a workflow template.",
              quiet = FALSE, open_file = TRUE)
renderReport(sysargs, fileName ="SPR_Report",
              rmd_title = "SPR workflow Template - Report",
              rmd_author = "Author",
              rmd_date= "Last update: `r format(Sys.time(), '%d %B, %Y')`",
              type = c("html_document"),
              desc = "This is a workflow template.",
              quiet = FALSE, open_file = TRUE)

Arguments

`sysargs`	object of class `SYSargsList`.
`fileName`	character string naming a file output. Default is `"spr_report.Rmd"`.
`rmd_title`	string, title of the Rmd.
`rmd_author`	string, author(s) of the Rmd, put all authors in a single character string.
`rmd_date`	string, date header of Rmd.
`type`	The R Markdown output format to convert to. The option can be the name of a format (e.g. `"pdf_document"` or `"html_document"`).
`desc`	string, or character vector of strings, some description text in format Rmarkdown that will be added to the document before the workflow steps start. It can be a single line or multiple lines by providing a character vector, each item is one line.
`quiet`	If set to `TRUE`, all messages returned by the function will be suppressed.
`open_file`	Default is `TRUE`.

Value

It will return an SYSargsList updated, with the file path location.

Author(s)

Daniela Cassol

Examples

sal <- SPRproject(overwrite = TRUE)
file_path <- system.file("extdata", "spr_simple_wf.Rmd", package = "systemPipeR")
sal <- importWF(sal, file_path = file_path, verbose = FALSE)
targetspath <- system.file("extdata/cwl/example/targets_example.txt", package = "systemPipeR")
appendStep(sal) <- SYSargsList(step_name = "echo", 
                              targets = targetspath, dir = TRUE,
                              wf_file = "example/workflow_example.cwl", input_file = "example/example.yml", 
                              dir_path = system.file("extdata/cwl", package = "systemPipeR"),
                              inputvars = c(Message = "_STRING_", SampleName = "_SAMPLE_"))
sal <- renderReport(sal, open_file = FALSE)
sal <- SPRproject(overwrite = TRUE)
file_path <- system.file("extdata", "spr_simple_wf.Rmd", package = "systemPipeR")
sal <- importWF(sal, file_path = file_path, verbose = FALSE)
targetspath <- system.file("extdata/cwl/example/targets_example.txt", package = "systemPipeR")
appendStep(sal) <- SYSargsList(step_name = "echo", 
                              targets = targetspath, dir = TRUE,
                              wf_file = "example/workflow_example.cwl", input_file = "example/example.yml", 
                              dir_path = system.file("extdata/cwl", package = "systemPipeR"),
                              inputvars = c(Message = "_STRING_", SampleName = "_SAMPLE_"))
sal <- renderReport(sal, open_file = FALSE)

RPKM Normalization

Description

Converts read counts to RPKM normalized values.

Usage

returnRPKM(counts, ranges)
returnRPKM(counts, ranges)

Arguments

`counts`	Count data frame, e.g. from an RNA-Seq experiment.
`ranges`	`GRangesList` object, e.g. generated by `exonsBy(txdb, by="gene")`.

Value

data.frame

Author(s)

Thomas Girke

Examples

## Not run: 
countDFrpkm <- apply(countDF, 2, function(x) returnRPKM(counts=x, gffsub=eByg))

## End(Not run)
## Not run: 
countDFrpkm <- apply(countDF, 2, function(x) returnRPKM(counts=x, gffsub=eByg))

## End(Not run)

Runs DESeq2

Description

Convenience wrapper function to identify differentially expressed genes (DEGs) in batch mode with DESeq2 for any number of pairwise sample comparisons specified under the cmp argument. Users are strongly encouraged to consult the DESeq2 vignette for more detailed information on this topic and how to properly run DESeq2 on data sets with more complex experimental designs.

Usage

run_DESeq2(countDF, targets, cmp, independent = FALSE, lfcShrink=FALSE, type="normal")
run_DESeq2(countDF, targets, cmp, independent = FALSE, lfcShrink=FALSE, type="normal")

Arguments

`countDF`	`date.frame` containing raw read counts
`targets`	targets `data.frame`
`cmp`	`character matrix` where comparisons are defined in two columns. This matrix should be generated with the `readComp()` function from the targets file. Values used for comparisons need to match those in the `Factor` column of the targets file.
`independent`	If `independent=TRUE` then the `countDF` will be subsetted for each comparison. This behavior can be useful when working with samples from unrelated studies. For samples from the same or comparable studies, the setting `independent=FALSE` is usually preferred.
`lfcShrink`	logiacal. If `TRUE` adds shrunken log2 fold changes (LFC) to the object.
`type`	please check `DESeq2::lfcShrink()` documentation. Available `character` alternatives: "apeglm"; "ashr"; "normal".

Value

data.frame containing DESeq2 results from all comparisons. Comparison labels are appended to column titles for tracking.

Author(s)

Thomas Girke

References

Please properly cite the DESeq2 papers when using this function: http://www.bioconductor.org/packages/devel/bioc/html/DESeq2.html

Examples

targetspath <- system.file("extdata", "targets.txt", package="systemPipeR")
targets <- read.delim(targetspath, comment.char = "#")
cmp <- readComp(file=targetspath, format="matrix", delim="-")
countfile <- system.file("extdata", "countDFeByg.xls", package="systemPipeR")
countDF <- read.delim(countfile, row.names=1)
degseqDF <- run_DESeq2(countDF=countDF, targets=targets, cmp=cmp[[1]], independent=FALSE)
pval <- degseqDF[, grep("_FDR$", colnames(degseqDF)), drop=FALSE]
fold <- degseqDF[, grep("_logFC$", colnames(degseqDF)), drop=FALSE]
DEG_list <- filterDEGs(degDF=degseqDF, filter=c(Fold=2, FDR=10))
names(DEG_list)
DEG_list$Summary
targetspath <- system.file("extdata", "targets.txt", package="systemPipeR")
targets <- read.delim(targetspath, comment.char = "#")
cmp <- readComp(file=targetspath, format="matrix", delim="-")
countfile <- system.file("extdata", "countDFeByg.xls", package="systemPipeR")
countDF <- read.delim(countfile, row.names=1)
degseqDF <- run_DESeq2(countDF=countDF, targets=targets, cmp=cmp[[1]], independent=FALSE)
pval <- degseqDF[, grep("_FDR$", colnames(degseqDF)), drop=FALSE]
fold <- degseqDF[, grep("_logFC$", colnames(degseqDF)), drop=FALSE]
DEG_list <- filterDEGs(degDF=degseqDF, filter=c(Fold=2, FDR=10))
names(DEG_list)
DEG_list$Summary

Runs edgeR

Description

Convenience wrapper function to identify differentially expressed genes (DEGs) in batch mode with the edgeR GML method for any number of pairwise sample comparisons specified under the cmp argument. Users are strongly encouraged to consult the edgeR vignette for more detailed information on this topic and how to properly run edgeR on data sets with more complex experimental designs.

Usage

run_edgeR(countDF, targets, cmp, independent = TRUE, paired = NULL, mdsplot = "")
run_edgeR(countDF, targets, cmp, independent = TRUE, paired = NULL, mdsplot = "")

Arguments

`countDF`	`date.frame` containing raw read counts
`targets`	targets `data.frame`
`cmp`	`character matrix` where comparisons are defined in two columns. This matrix should be generated with `readComp()` from the targets file. Values used for comparisons need to match those in the `Factor` column of the targets file.
`independent`	If `independent=TRUE` then the `countDF` will be subsetted for each comparison. This behavior can be useful when working with samples from unrelated studies. For samples from the same or comparable studies, the setting `independent=FALSE` is usually preferred.
`paired`	Defines pairs (`character vector`) for paired analysis. Default is unpaired (`paired=NULL`).
`mdsplot`	Directory where `plotMDS` should be written to. Default setting `mdsplot=""` will omit the plotting step.

Value

data.frame containing edgeR results from all comparisons. Comparison labels are appended to column titles for tracking.

Author(s)

Thomas Girke

References

Please properly cite the edgeR papers when using this function: http://www.bioconductor.org/packages/devel/bioc/html/edgeR.html

Examples

targetspath <- system.file("extdata", "targets.txt", package="systemPipeR")
targets <- read.delim(targetspath, comment.char = "#")
cmp <- readComp(file=targetspath, format="matrix", delim="-")
countfile <- system.file("extdata", "countDFeByg.xls", package="systemPipeR")
countDF <- read.delim(countfile, row.names=1)
edgeDF <- run_edgeR(countDF=countDF, targets=targets, cmp=cmp[[1]], independent=FALSE, mdsplot="")
pval <- edgeDF[, grep("_FDR$", colnames(edgeDF)), drop=FALSE]
fold <- edgeDF[, grep("_logFC$", colnames(edgeDF)), drop=FALSE]
DEG_list <- filterDEGs(degDF=edgeDF, filter=c(Fold=2, FDR=10))
names(DEG_list)
DEG_list$Summary
targetspath <- system.file("extdata", "targets.txt", package="systemPipeR")
targets <- read.delim(targetspath, comment.char = "#")
cmp <- readComp(file=targetspath, format="matrix", delim="-")
countfile <- system.file("extdata", "countDFeByg.xls", package="systemPipeR")
countDF <- read.delim(countfile, row.names=1)
edgeDF <- run_edgeR(countDF=countDF, targets=targets, cmp=cmp[[1]], independent=FALSE, mdsplot="")
pval <- edgeDF[, grep("_FDR$", colnames(edgeDF)), drop=FALSE]
fold <- edgeDF[, grep("_logFC$", colnames(edgeDF)), drop=FALSE]
DEG_list <- filterDEGs(degDF=edgeDF, filter=c(Fold=2, FDR=10))
names(DEG_list)
DEG_list$Summary

Execute SYSargs and SYSargs2

Description

Function to execute system parameters specified in SYSargs and SYSargs2 object.

Usage

runCommandline(args, runid = "01", make_bam = FALSE, del_sam=TRUE, dir = TRUE, 
                dir.name = NULL, force=FALSE, input_targets = NULL, ...)
runCommandline(args, runid = "01", make_bam = FALSE, del_sam=TRUE, dir = TRUE, 
                dir.name = NULL, force=FALSE, input_targets = NULL, ...)

Arguments

`args`	object of class `SYSargs` or `SYSargs2`.
`runid`	Run identifier used for log file to track system call commands. Default is `"01"`.
`make_bam`	Auto-detects SAM file outputs and converts them to sorted and indexed BAM files. Default is `make_bam=FALSE`.
`del_sam`	This option allows deleting the SAM files created when the `make_BAM` converts the SAM files to sorted and indexed BAM files. Default is `del_sam=TRUE`.
`dir`	This option allows creating an exclusive results folder for each step in the workflow and a sub-folder for each sample defined in the `targets` file. All the outputs and log files for the particular step will be created in the respective folders. Default is `dir=TRUE`. Option available only for an object of class `SYSargs2`.
`dir.name`	Name of the workflow directory. Default is `dir.name=FALSE`. Note: This argument is required when the `dir=TRUE`.
`force`	Internally, the function checks if the expected `output` files exist, and it skips the command lines when the respective files exist. If the argument `force` is set to `TRUE`, the command line will be executed and the files overwrite. Default is `force=FALSE`.
`input_targets`	This option allows selecting which targets file and, by consequence which command line will be executed. Default is `NULL`, in which all command lines will be executed.
`...`	Additional arguments to pass on to `runCommandline()`.

Value

Output files, their paths can be obtained with outpaths() from SYSargs container or output() from SYSargs2. In addition, a character vector is returned containing the same paths.

Author(s)

Daniela Cassol and Thomas Girke

Examples

##########################################
## Examples with \code{SYSargs2} object ##
##########################################
## Construct SYSargs2 object from CWl param, CWL input, and targets files 
targets <- system.file("extdata", "targets.txt", package="systemPipeR")
dir_path <- system.file("extdata/cwl", package="systemPipeR")
WF <- loadWorkflow(targets=targets, wf_file="hisat2/hisat2-mapping-se.cwl", 
                  input_file="hisat2/hisat2-mapping-se.yml", dir_path=dir_path)
WF <- renderWF(WF, inputvars=c(FileName="_FASTQ_PATH1_", SampleName="_SampleName_"))
WF
names(WF); modules(WF); targets(WF)[1]; cmdlist(WF)[1:2]; output(WF)

## Not run: 
## Execute SYSargs2 on single machine
WF <- runCommandline(args=WF)

## Execute SYSargs on multiple machines of a compute cluster. 
file.copy(system.file("extdata", ".batchtools.conf.R", 
            package="systemPipeR"), ".")
file.copy(system.file("extdata", "batchtools.slurm.tmpl", 
            package="systemPipeR"), ".")
resources <- list(walltime=120, ntasks=1, ncpus=4, memory=1024) 
reg <- clusterRun(WF, FUN = runCommandline, 
        more.args = list(args = WF, make_bam = TRUE), 
        conffile=".batchtools.conf.R", template="batchtools.slurm.tmpl",
        Njobs=18, runid="01", resourceList=resources)

## Monitor progress of submitted jobs
getStatus(reg=reg)

## Updates the path in the object \code{output(WF)}
WF <- output_update(WF, dir=FALSE, replace=TRUE, extension=c(".sam", ".bam"))

## Alignment stats
read_statsDF <- alignStats(WF) 
read_statsDF <- cbind(read_statsDF[targets$FileName,], targets)
write.table(read_statsDF, "results/alignStats.xls",
            row.names=FALSE, quote=FALSE, sep="\t")

## End(Not run)

#########################################
## Examples with \code{SYSargs} object ##
#########################################
## Construct SYSargs object from param and targets files 
param <- system.file("extdata", "hisat2.param", package="systemPipeR")
targets <- system.file("extdata", "targets.txt", package="systemPipeR")
args <- systemArgs(sysma=param, mytargets=targets)
args
names(args); modules(args); cores(args); outpaths(args); sysargs(args)

## Not run: 
## Execute SYSargs on single machine
runCommandline(args=args)

## Execute SYSargs on multiple machines of a compute cluster. 
file.copy(system.file("extdata", ".batchtools.conf.R", package="systemPipeR"), ".")
file.copy(system.file("extdata", "batchtools.slurm.tmpl", package="systemPipeR"), ".")
resources <- list(walltime=120, ntasks=1, ncpus=cores(args), memory=1024) 
reg <- clusterRun(args, FUN = runCommandline, conffile=".batchtools.conf.R", 
                    template="batchtools.slurm.tmpl", Njobs=18, 
                    runid="01", resourceList=resources)

## Monitor progress of submitted jobs
getStatus(reg=reg)
file.exists(outpaths(args))

## Alignment stats
read_statsDF <- alignStats(args) 
read_statsDF <- cbind(read_statsDF[targets$FileName,], targets)
write.table(read_statsDF, "results/alignStats.xls", row.names=FALSE, 
            quote=FALSE, sep="\t")

## End(Not run)
##########################################
## Examples with \code{SYSargs2} object ##
##########################################
## Construct SYSargs2 object from CWl param, CWL input, and targets files 
targets <- system.file("extdata", "targets.txt", package="systemPipeR")
dir_path <- system.file("extdata/cwl", package="systemPipeR")
WF <- loadWorkflow(targets=targets, wf_file="hisat2/hisat2-mapping-se.cwl", 
                  input_file="hisat2/hisat2-mapping-se.yml", dir_path=dir_path)
WF <- renderWF(WF, inputvars=c(FileName="_FASTQ_PATH1_", SampleName="_SampleName_"))
WF
names(WF); modules(WF); targets(WF)[1]; cmdlist(WF)[1:2]; output(WF)

## Not run: 
## Execute SYSargs2 on single machine
WF <- runCommandline(args=WF)

## Execute SYSargs on multiple machines of a compute cluster. 
file.copy(system.file("extdata", ".batchtools.conf.R", 
            package="systemPipeR"), ".")
file.copy(system.file("extdata", "batchtools.slurm.tmpl", 
            package="systemPipeR"), ".")
resources <- list(walltime=120, ntasks=1, ncpus=4, memory=1024) 
reg <- clusterRun(WF, FUN = runCommandline, 
        more.args = list(args = WF, make_bam = TRUE), 
        conffile=".batchtools.conf.R", template="batchtools.slurm.tmpl",
        Njobs=18, runid="01", resourceList=resources)

## Monitor progress of submitted jobs
getStatus(reg=reg)

## Updates the path in the object \code{output(WF)}
WF <- output_update(WF, dir=FALSE, replace=TRUE, extension=c(".sam", ".bam"))

## Alignment stats
read_statsDF <- alignStats(WF) 
read_statsDF <- cbind(read_statsDF[targets$FileName,], targets)
write.table(read_statsDF, "results/alignStats.xls",
            row.names=FALSE, quote=FALSE, sep="\t")

## End(Not run)

#########################################
## Examples with \code{SYSargs} object ##
#########################################
## Construct SYSargs object from param and targets files 
param <- system.file("extdata", "hisat2.param", package="systemPipeR")
targets <- system.file("extdata", "targets.txt", package="systemPipeR")
args <- systemArgs(sysma=param, mytargets=targets)
args
names(args); modules(args); cores(args); outpaths(args); sysargs(args)

## Not run: 
## Execute SYSargs on single machine
runCommandline(args=args)

## Execute SYSargs on multiple machines of a compute cluster. 
file.copy(system.file("extdata", ".batchtools.conf.R", package="systemPipeR"), ".")
file.copy(system.file("extdata", "batchtools.slurm.tmpl", package="systemPipeR"), ".")
resources <- list(walltime=120, ntasks=1, ncpus=cores(args), memory=1024) 
reg <- clusterRun(args, FUN = runCommandline, conffile=".batchtools.conf.R", 
                    template="batchtools.slurm.tmpl", Njobs=18, 
                    runid="01", resourceList=resources)

## Monitor progress of submitted jobs
getStatus(reg=reg)
file.exists(outpaths(args))

## Alignment stats
read_statsDF <- alignStats(args) 
read_statsDF <- cbind(read_statsDF[targets$FileName,], targets)
write.table(read_statsDF, "results/alignStats.xls", row.names=FALSE, 
            quote=FALSE, sep="\t")

## End(Not run)

Differential abundance analysis for many range sets

Description

Convenience wrapper function for run_edgeR and run_DESeq2 to perform differential expression or abundance analysis iteratively for several count tables. The latter can be peak calling results for several samples or counts generated for different genomic feature types. The function also returns the filtering results and plots from filterDEGs.

Usage

runDiff(args, outfiles=NULL, diffFct, targets, cmp, dbrfilter, ...)
runDiff(args, outfiles=NULL, diffFct, targets, cmp, dbrfilter, ...)

Arguments

`args`	An instance of `SYSargs` or `SYSargs2` constructed from a `targets` file where the first column (`targetsin(args)` or `targets.as.df(targets(args))`) contains the paths to the tabular read count data files. Another possibily is `named character vector` with the paths to the tabular range data files and the elements names should be the sampleID.
`outfiles`	Default is `NULL`. When `args` is an object of `named character vector` class, `outfile` argument is required. `Named character vector` with the paths to the resulting count tables and the elements names should be the sampleID.
`diffFct`	Defines which function should be used for the differential abundance analysis. Can be `diffFct=run_edgeR` or `diffFct=run_DESeq2`.
`targets`	targets `data.frame`
`cmp`	`character matrix` where comparisons are defined in two columns. This matrix should be generated with `readComp()` from the targets file. Values used for comparisons need to match those in the `Factor` column of the targets file.
`dbrfilter`	Named vector with filter cutoffs of format `c(Fold=2, FDR=1)` where `Fold` refers to the fold change cutoff (unlogged) and `FDR` to the p-value cutoff. Those values are passed on to the `filterDEGs` function.
`...`	Arguments to be passed on to the internally used `run_edgeR` or `run_DESeq2` function.

Value

Returns list containing the filterDEGs results for each count table. Each result set is a list with four components which are described under ?filterDEGs. The result files contain the edgeR or DESeq2 results from the comparisons specified under cmp. The base names of the result files are the same as the corresponding input files specified under countfiles and the value of extension appended.

Author(s)

Thomas Girke

Examples

## Paths to BAM files
param <- system.file("extdata", "bowtieSE.param", package="systemPipeR")
targets <- system.file("extdata", "targets.txt", package="systemPipeR")
args_bam <- systemArgs(sysma=param, mytargets=targets)
bfl <- BamFileList(outpaths(args_bam), yieldSize=50000, index=character())

## Not run: 
## SYSargs with paths to range data and count files
args <- systemArgs(sysma="param/count_rangesets.param", mytargets="targets_macs.txt")

## Iterative read counting
countDFnames <- countRangeset(bfl, args, mode="Union", ignore.strand=TRUE)
writeTargetsout(x=args, file="targets_countDF.txt", overwrite=TRUE)

## Run differential abundance analysis
cmp <- readComp(file=args_bam, format="matrix")
args_diff <- systemArgs(sysma="param/rundiff.param", mytargets="targets_countDF.txt")
dbrlist <- runDiff(args, diffFct=run_edgeR, targets=targetsin(args_bam), cmp=cmp[[1]], independent=TRUE, dbrfilter=c(Fold=2, FDR=1))
writeTargetsout(x=args_diff, file="targets_rundiff.txt", overwrite=TRUE)

## End(Not run)
## Paths to BAM files
param <- system.file("extdata", "bowtieSE.param", package="systemPipeR")
targets <- system.file("extdata", "targets.txt", package="systemPipeR")
args_bam <- systemArgs(sysma=param, mytargets=targets)
bfl <- BamFileList(outpaths(args_bam), yieldSize=50000, index=character())

## Not run: 
## SYSargs with paths to range data and count files
args <- systemArgs(sysma="param/count_rangesets.param", mytargets="targets_macs.txt")

## Iterative read counting
countDFnames <- countRangeset(bfl, args, mode="Union", ignore.strand=TRUE)
writeTargetsout(x=args, file="targets_countDF.txt", overwrite=TRUE)

## Run differential abundance analysis
cmp <- readComp(file=args_bam, format="matrix")
args_diff <- systemArgs(sysma="param/rundiff.param", mytargets="targets_countDF.txt")
dbrlist <- runDiff(args, diffFct=run_edgeR, targets=targetsin(args_bam), cmp=cmp[[1]], independent=TRUE, dbrfilter=c(Fold=2, FDR=1))
writeTargetsout(x=args_diff, file="targets_rundiff.txt", overwrite=TRUE)

## End(Not run)

Execute SYSargsList

Description

Function to execute all the code list specified in SYSargsList object.

Usage

runWF(sysargs, steps = NULL, targets = NULL,
                  force = FALSE, saveEnv = TRUE,
                  run_step = "ALL", ignore.dep = FALSE,
                  warning.stop = FALSE, error.stop = TRUE, silent = FALSE, ...)
runWF(sysargs, steps = NULL, targets = NULL,
                  force = FALSE, saveEnv = TRUE,
                  run_step = "ALL", ignore.dep = FALSE,
                  warning.stop = FALSE, error.stop = TRUE, silent = FALSE, ...)

Arguments

`sysargs`	object of class `SYSargsList`.
`steps`	character or numeric. Step name or index. If `NULL`, all the steps will be executed.
`targets`	This option allows selecting which targets file and, by consequence which command line will be executed for each `SYSargs2` class step. Default is `NULL`, in which all command lines will be executed.
`force`	Internally, the option checks if the expected `output` files exist, and it skips the command lines when the respective files exist. If the argument `force` is set to `TRUE`, the command line will be executed and the files overwrite. Default is `force=FALSE`.
`saveEnv`	If set to `TRUE`, the environment will be saved to an RDS file. To check the RDS file location, please use `projectInfo(sysargs)[["envir"]]`.
`run_step`	character. If the step has "mandatory" or "optional" flag for the execution. When `ALL`, all the steps will be executed.
`ignore.dep`	logical. This option allow to igonore the dependency tree, when `TRUE`.
`warning.stop`	If set to `TRUE`, the process will be interrupted when a warning is detected.
`error.stop`	If set to `TRUE`, the process will be interrupted when a error is detected.
`silent`	If set to `TRUE`, all messages returned by the function will be suppressed.
`...`	Additional arguments to pass on from `runCommandline()`.

Value

It will return an SYSargsList updated.

Author(s)

Daniela Cassol and Thomas Girke

Examples

## Construct SYSargsList object from Rmd file
sal <- SPRproject(overwrite=TRUE)
targetspath <- system.file("extdata/cwl/example/targets_example.txt", package="systemPipeR")

## Constructor and `appendStep<-`
appendStep(sal) <- SYSargsList(step_name = "echo", 
                               targets=targetspath, dir=TRUE,
                               wf_file="example/workflow_example.cwl", input_file="example/example.yml", 
                               dir_path = system.file("extdata/cwl", package="systemPipeR"),
                               inputvars = c(Message = "_STRING_", SampleName = "_SAMPLE_"))
appendStep(sal) <- LineWise(code = {
                            hello <- lapply(getColumn(sal, step=1, 'outfiles'), function(x) yaml::read_yaml(x))
                            }, 
                            step_name = "R_read", 
                            dependency = "echo")
## Not run: 
sal <- runWF(sal)

## End(Not run)
## Construct SYSargsList object from Rmd file
sal <- SPRproject(overwrite=TRUE)
targetspath <- system.file("extdata/cwl/example/targets_example.txt", package="systemPipeR")

## Constructor and `appendStep<-`
appendStep(sal) <- SYSargsList(step_name = "echo", 
                               targets=targetspath, dir=TRUE,
                               wf_file="example/workflow_example.cwl", input_file="example/example.yml", 
                               dir_path = system.file("extdata/cwl", package="systemPipeR"),
                               inputvars = c(Message = "_STRING_", SampleName = "_SAMPLE_"))
appendStep(sal) <- LineWise(code = {
                            hello <- lapply(getColumn(sal, step=1, 'outfiles'), function(x) yaml::read_yaml(x))
                            }, 
                            step_name = "R_read", 
                            dependency = "echo")
## Not run: 
sal <- runWF(sal)

## End(Not run)

Translate SYSargsList back to a bash workflow

Description

This function takes a SYSargsList object and translate it to an executable bash script, so one can run the workflow without loading SPR or using an R console.

Usage

sal2bash(sal, out_dir = ".", bash_path = "/bin/bash", stop_on_error = TRUE)
sal2bash(sal, out_dir = ".", bash_path = "/bin/bash", stop_on_error = TRUE)

Arguments

`sal`	`SYSargsList` object.
`out_dir`	string, a relative or absolute path to a directory. If the directory does not exist, this function will first try to create it.
`bash_path`	string, the path to the bash executable program
`stop_on_error`	bool, should the bash script stop if any error happens in running

Details

## out files

1. The main executable bash file will be created to the root of 'out_dir'

2. All R steps will be stored as R scripts and along with other supporting files inside a folder called 'spr_bash' under 'out_dir'

3. Not all R steps will have an individual file. This function will "collapse" adjacent R steps into one file as much as possible. Namely, if there is no sysArgs steps in between, R steps will be merged into one file, otherwise they will be in different files.

## R steps

Similarly as running the workflow in R console, all R steps will share the same environment variables and loaded packages. This is done by loading and saving the R environment into a file 'spr_wf.RData' before and after the R script execution. Therefore, it will be good to keep all R steps bundle together as much as possible to avoid the package and environment loading/saving overhead time.

Initially, this environment only contains the SYSargsList object that was used to create the bash script. Note: the SYSargsList object name will be the same as what you pass to 'sal2bash'. If you have 'sal2bash(my_sal)', then in the 'spr_wf.RData' there will be an object called 'my_sal' saved there. It is important to keep using the same name for the SYSargsList object managing the workflow and in workflow running. For example, if you have an R step that requires to query a column from the outfiles of the SAL, 'getColumn(sal, "FileName1")', but you pass 'my_sal' to 'sal2bash(my_sal)', this will cause this R step cannot find the SAL object when run the workflow from bash.

## Execution

This way of execution is not able to handle complex dependency graphs. The original step dependencies from SAL object will be ignored, so all steps will be executed in a linear manner. It is recommended to adjust the workflow order before using this function.

Value

no return

Author(s)

Le Zhang and Daniela Cassol

Examples

file_path <- system.file("extdata/spr_simple_wf.Rmd", package="systemPipeR")
sal <- SPRproject(overwrite = TRUE)
sal <- importWF(sal, file_path)
sal2bash(sal)
file_path <- system.file("extdata/spr_simple_wf.Rmd", package="systemPipeR")
sal <- SPRproject(overwrite = TRUE)
sal <- importWF(sal, file_path)
sal2bash(sal)

Translate SYSargsList back to a workflow template Rmarkdown file

Description

This function takes a SYSargsList object and translate it to SPR workflow template Rmarkdown format.

Usage

sal2rmd(sal, out_path = "spr_template.Rmd", rmd_title = "SPR workflow template", 
        rmd_author = "my name", 
        rmd_date = "Last update: `r format(Sys.time(), '%d %B, %Y')`", 
        rmd_output = "html_document", 
        desc = "This is a workflow template.", verbose = TRUE)
sal2rmd(sal, out_path = "spr_template.Rmd", rmd_title = "SPR workflow template", 
        rmd_author = "my name", 
        rmd_date = "Last update: `r format(Sys.time(), '%d %B, %Y')`", 
        rmd_output = "html_document", 
        desc = "This is a workflow template.", verbose = TRUE)

Arguments

`sal`	`SYSargsList` object.
`out_path`	string, output file name.
`rmd_title`	string, title of the Rmd.
`rmd_author`	string, author(s) of the Rmd, put all authors in a single character string.
`rmd_date`	string, date header of Rmd.
`rmd_output`	string, output format of Rmd, used in header.
`desc`	string, or character vector of strings, some description text in format Rmarkdown that will be added to the document before the workflow steps start. It can be a single line or multiple lines by providing a character vector, each item is one line.
`verbose`	logical. If `TRUE` will show you more information as the function runs.

Value

no return

Author(s)

Le Zhang and Daniela Cassol

Examples

file_path <- system.file("extdata/spr_simple_wf.Rmd", package="systemPipeR")
sal <- SPRproject(overwrite = TRUE)
sal <- importWF(sal, file_path)
sal2rmd(sal)
file_path <- system.file("extdata/spr_simple_wf.Rmd", package="systemPipeR")
sal <- SPRproject(overwrite = TRUE)
sal <- importWF(sal, file_path)
sal2rmd(sal)

Scale spliced ranges to genome coordinates

Description

Function to scale mappings of spliced features (query ranges) to their corresponding genome coordinates (subject ranges). The method accounts for introns in the subject ranges that are absent in the query ranges. A use case example are uORFs predicted in the 5' UTRs sequences using predORF. These query ranges are given relative to the 5' UTR sequence. The scaleRanges function will scale them to the corresponding genome coordinates. This way they can be used in RNA-Seq expression experiments like other gene ranges.

Usage

scaleRanges(subject, query, type = "custom", verbose = TRUE)
scaleRanges(subject, query, type = "custom", verbose = TRUE)

Arguments

`subject`	Genomic ranges provided as `GRangesList` object. Their name and length requirements are described under `query`.
`query`	Feature level ranges provided as `GRangesList` object. The names of the query ranges need to match the names of the `GRangesList` object assigened to the `subject` argument. In addition, the length of each `query` range cannot exceed the total length of the corresponding `subject` range set.
`type`	Feature name to use in `type` column of `GRangesList` result.
`verbose`	The setting `verbose=FALSE` suppresses all print messages.

Value

Object of class GRangesList

Author(s)

Thomas Girke

Examples

library(IRanges)
## Usage for simple example
subject <- GRanges(seqnames="Chr1", IRanges(c(5,15,30),c(10,25,40)), strand="+") 
query <- GRanges(seqnames="myseq", IRanges(1, 9), strand="+")
scaleRanges(GRangesList(myid1=subject), GRangesList(myid1=query), type="test")

## Not run: 
## Usage for more complex example
library(txdbmaker); library(systemPipeRdata)
gff <- system.file("extdata/annotation", "tair10.gff", package="systemPipeRdata")
txdb <- makeTxDbFromGFF(file=gff, format="gff3", organism="Arabidopsis")
futr <- fiveUTRsByTranscript(txdb, use.names=TRUE)
genome <- system.file("extdata/annotation", "tair10.fasta", package="systemPipeRdata")
dna <- extractTranscriptSeqs(FaFile(genome), futr)
uorf <- predORF(dna, n="all", mode="orf", longest_disjoint=TRUE, strand="sense")
grl_scaled <- scaleRanges(subject=futr, query=uorf, type="uORF", verbose=TRUE)
export.gff3(unlist(grl_scaled), "uorf.gff")

## End(Not run)
library(IRanges)
## Usage for simple example
subject <- GRanges(seqnames="Chr1", IRanges(c(5,15,30),c(10,25,40)), strand="+") 
query <- GRanges(seqnames="myseq", IRanges(1, 9), strand="+")
scaleRanges(GRangesList(myid1=subject), GRangesList(myid1=query), type="test")

## Not run: 
## Usage for more complex example
library(txdbmaker); library(systemPipeRdata)
gff <- system.file("extdata/annotation", "tair10.gff", package="systemPipeRdata")
txdb <- makeTxDbFromGFF(file=gff, format="gff3", organism="Arabidopsis")
futr <- fiveUTRsByTranscript(txdb, use.names=TRUE)
genome <- system.file("extdata/annotation", "tair10.fasta", package="systemPipeRdata")
dna <- extractTranscriptSeqs(FaFile(genome), futr)
uorf <- predORF(dna, n="all", mode="orf", longest_disjoint=TRUE, strand="sense")
grl_scaled <- scaleRanges(subject=futr, query=uorf, type="uORF", verbose=TRUE)
export.gff3(unlist(grl_scaled), "uorf.gff")

## End(Not run)

Quality reports for FASTQ files

Description

The following seeFastq and seeFastqPlot functions generate and plot a series of useful quality statistics for a set of FASTQ files including per cycle quality box plots, base proportions, base-level quality trends, relative k-mer diversity, length and occurrence distribution of reads, number of reads above quality cutoffs and mean quality distribution. The functions allow processing of reads with variable length, but most plots are only meaningful if the read positions in the FASTQ file are aligned with the sequencing cycles. For instance, constant length clipping of the reads on either end or variable length clipping on the 3' end maintains this relationship, while variable length clipping on the 5' end without reversing the reads erases it.

The function seeFastq computes the summary stats and stores them in a relatively small list object that can be saved to disk with save() and reloaded with load() for later plotting. The argument 'klength' specifies the k-mer length and 'batchsize' the number of reads to random sample from each fastq file.

Usage

seeFastq(fastq, batchsize, klength = 8)

seeFastqPlot(fqlist, arrange = c(1, 2, 3, 4, 5, 8, 6, 7), ...)
seeFastq(fastq, batchsize, klength = 8)

seeFastqPlot(fqlist, arrange = c(1, 2, 3, 4, 5, 8, 6, 7), ...)

Arguments

`fastq`	Named character vector containing paths to FASTQ file in the data fields and sample labels in the name slots.
`batchsize`	Number of reads to random sample from each FASTQ file that will be considered in the QC analysis. Smaller numbers reduce the memory footprint and compute time.
`klength`	Specifies the k-mer length in the plot for the relative k-mer diversity.
`fqlist`	`list` object returned by `seeFastq()`.
`arrange`	Integer vector from 1 to 7 specifying the row order of the QC plot. Dropping numbers eliminates the corresponding plots.
`...`	Additional plotting arguments to pass on to `seeFastqPlot()`.

Value

The function seeFastq returns the summary stats in a list containing all information required for the quality plots. The function seeFastqPlot plots the information generated by seeFastq using ggplot2.

Author(s)

Thomas Girke

Examples

## Not run: 
targets <- system.file("extdata", "targets.txt", package="systemPipeR")
dir_path <- system.file("extdata/cwl", package="systemPipeR")
args <- loadWorkflow(targets=targets, wf_file="hisat2/hisat2-mapping-se.cwl", 
                  input_file="hisat2/hisat2-mapping-se.yml", dir_path=dir_path)
args <- renderWF(args, inputvars=c(FileName="_FASTQ_PATH1_", SampleName="_SampleName_"))
fqlist <- seeFastq(fastq=infile1(args), batchsize=10000, klength=8)
pdf("fastqReport.pdf", height=18, width=4*length(fastq))
seeFastqPlot(fqlist)
dev.off()


## End(Not run)
## Not run: 
targets <- system.file("extdata", "targets.txt", package="systemPipeR")
dir_path <- system.file("extdata/cwl", package="systemPipeR")
args <- loadWorkflow(targets=targets, wf_file="hisat2/hisat2-mapping-se.cwl", 
                  input_file="hisat2/hisat2-mapping-se.yml", dir_path=dir_path)
args <- renderWF(args, inputvars=c(FileName="_FASTQ_PATH1_", SampleName="_SampleName_"))
fqlist <- seeFastq(fastq=infile1(args), batchsize=10000, klength=8)
pdf("fastqReport.pdf", height=18, width=4*length(fastq))
seeFastqPlot(fqlist)
dev.off()


## End(Not run)

Create an HTML table using DT package with fixed columns

Description

Create an HTML table using DT package with fixed columns

Usage

showDF(data, ...)
showDF(data, ...)

Arguments

`data`	data object (either a matrix or a data frame).
`...`	Additional arguments used by dDT::atatable() function.

Value

returns an object of datatables and htmlwidget.

Examples

showDF(iris)
showDF(iris)

Workflow Project Initiation

Description

Function to construct SYSargsList workflow control environment (S4 object). This function creates and checks the directory structure. If the expected directories are not available, it is possible to create those. The project directory default structure expected is:

SPRproject/
- data/
- param/
- results/

The project working directory names can be modified, but users need to edit the code accordingly.

Usage

SPRproject(projPath = getwd(), data = "data", param = "param", results = "results",
                      logs.dir= ".SPRproject", sys.file="SYSargsList.yml", 
                      envir = new.env(), 
                      restart = FALSE, resume=FALSE, 
                      load.envir = FALSE,
                      overwrite = FALSE, silent = FALSE)
SPRproject(projPath = getwd(), data = "data", param = "param", results = "results",
                      logs.dir= ".SPRproject", sys.file="SYSargsList.yml", 
                      envir = new.env(), 
                      restart = FALSE, resume=FALSE, 
                      load.envir = FALSE,
                      overwrite = FALSE, silent = FALSE)

Arguments

`projPath`	a character vector of a full project path name. Default is the current path.
`data`	a character vector of a `data` directory name. Default is `data`. This subdirectory in the project stores all the raw data, reference, and annotation files.
`param`	a character vector of a `param` directory name. Default is `param`. This subdirectory in the project stores all the parameter and configuration files.
`results`	a character vector of a `results` directory name. Default is `results`. This subdirectory in the project stores all the analysis results, including but not limited: alignment, variant, and peak files (BAM, VCF, BED); tabular result files; and image/plot files.
`logs.dir`	a character vector of a `logs` directory name. Default is `.SPRproject`.
`sys.file`	a character vector of `SYSargsList` file name which will store all the project configuration information. Default is `SYSargsList.yml` and it will be save in the `logs.dir` folder.
`envir`	the environment in which workflow will be evaluated. Default will create a `new.env()`.
`restart`	if set to `TRUE`, existing `SYSargsList` object saved in the `logs.dir` directory will be used to restart the workflow. This option will delete all the log files.
`resume`	if set to `TRUE`, existing `SYSargsList` object saved in the `logs.dir` directory will be used to resume the workflow.
`load.envir`	This argument allows to load the environment and recover all the objects saved during the workflow execution (please check `runWF` function for more details). This argument can be set as `TRUE` when `restart = TRUE` or `resume = TRUE` are used.
`overwrite`	if set to `TRUE`, existing `logs.dir` directory and all the content, as logs files and `SYSargsList` file will be removed, and a new and empty `SYSargsList` object will be created. This option should be used with caution!
`silent`	if set to TRUE, all messages returned by the function will be suppressed.

Details

If an SYSargsList instance was created before or independent of the project initialization, it is possible to append this instance after the project is created. Please see check appendStep<- function.

Value

SPRproject will return a SYSargsList object.

Author(s)

Daniela Cassol

Examples

sal <- SPRproject(projPath = tempdir())
sal
sal <- SPRproject(projPath = tempdir())
sal

Subsetting SYSargs2 class slots

Description

Return subsets of character for the input, output or the list of command-line for each workflow step.

Usage

subsetWF(args, slot, subset=NULL, index=NULL, delete=FALSE)
subsetWF(args, slot, subset=NULL, index=NULL, delete=FALSE)

Arguments

`args`	object of class `SYSargs2`.
`slot`	three options available: `type="input"` returns input slot from `SYSargs2` object; `type="output"` returns output slot from `SYSargs2` object; and `type="step"` returns all the command-line for each workflow step from `SYSargs2` object.
`subset`	name or numeric position of the values to be subsetting in the `slot`. If `slot="input"`, the `subset` are the variables defined in the `param.yml` file. If `slot="step"`, the `subset` is the command line defined on the `SYSargs2` object for all the steps of the workflow. If `slot="output"`, the `subset` is the path for the expected output files for all the steps in the workflow. Default is `subset=NULL`
`index`	A numeric index positions of the file in `SYSargs2` object, slot `output`. It requires a `subset` to be defined. Default is `index=NULL`.
`delete`	allows to delete a subset of files in the case of `slot="output"`. Default is `delete=NULL`.

Author(s)

Daniela Cassol and Thomas Girke

Examples

## Construct SYSargs2 object 
targets <- system.file("extdata", "targets.txt", package="systemPipeR")
dir_path <- system.file("extdata/cwl", package="systemPipeR")
WF <- loadWorkflow(targets=targets, wf_file="hisat2/hisat2-mapping-se.cwl", 
                  input_file="hisat2/hisat2-mapping-se.yml", dir_path=dir_path)
WF <- renderWF(WF, inputvars=c(FileName="_FASTQ_PATH1_", SampleName="_SampleName_"))
WF

## Testing subset_wf function
input <- subsetWF(WF, slot="input", subset='FileName')
output <- subsetWF(WF, slot="output", subset=1, index=1)
step.cmd <- subsetWF(WF, slot="step", subset=1) ## subset all the HISAT2 commandline 
# subsetWF(WF, slot="output", subset=1, index=1, delete=TRUE) ## in order to delete the subset files list
## Construct SYSargs2 object 
targets <- system.file("extdata", "targets.txt", package="systemPipeR")
dir_path <- system.file("extdata/cwl", package="systemPipeR")
WF <- loadWorkflow(targets=targets, wf_file="hisat2/hisat2-mapping-se.cwl", 
                  input_file="hisat2/hisat2-mapping-se.yml", dir_path=dir_path)
WF <- renderWF(WF, inputvars=c(FileName="_FASTQ_PATH1_", SampleName="_SampleName_"))
WF

## Testing subset_wf function
input <- subsetWF(WF, slot="input", subset='FileName')
output <- subsetWF(WF, slot="output", subset=1, index=1)
step.cmd <- subsetWF(WF, slot="step", subset=1) ## subset all the HISAT2 commandline 
# subsetWF(WF, slot="output", subset=1, index=1, delete=TRUE) ## in order to delete the subset files list

Symbolic links for IGV

Description

Function for creating symbolic links to view BAM files in a genome browser such as IGV.

Usage

symLink2bam(sysargs, command="ln -s", htmldir, ext = c(".bam", ".bai"), urlbase, urlfile)
symLink2bam(sysargs, command="ln -s", htmldir, ext = c(".bam", ".bai"), urlbase, urlfile)

Arguments

`sysargs`	Object of class `SYSargs` or `SYSargs2` or `named character vector` with BAM files PATH and the elements names should be the sampleID.
`command`	Shell command, defaults to "ln -s"
`htmldir`	Path to HTML directory with http access.
`ext`	File name extensions to use for BAM and index files.
`urlbase`	The base URL structure to use in URL file.
`urlfile`	Name and path of URL file.

Value

symbolic links and url file

Author(s)

Thomas Girke

Examples

## Construct SYSargs2 object from param and targets files 
targets <- system.file("extdata", "targets.txt", package="systemPipeR")
dir_path <- system.file("extdata/cwl", package="systemPipeR")
args <- loadWorkflow(targets=targets, wf_file="hisat2/hisat2-mapping-se.cwl", 
                  input_file="hisat2/hisat2-mapping-se.yml", dir_path=dir_path)
args <- renderWF(args, inputvars=c(FileName="_FASTQ_PATH1_", SampleName="_SampleName_"))
args

## Not run: 
## Run alignments
args <- runCommandline(args, dir = FALSE, make_bam = TRUE)

## Create sym links and URL file for IGV
symLink2bam(sysargs=args, command="ln -s", htmldir=c("~/.html/", "somedir/"), ext=c(".bam", ".bai"), urlbase="http://myserver.edu/~username/", urlfile="IGVurl.txt")

## End(Not run)
## Construct SYSargs2 object from param and targets files 
targets <- system.file("extdata", "targets.txt", package="systemPipeR")
dir_path <- system.file("extdata/cwl", package="systemPipeR")
args <- loadWorkflow(targets=targets, wf_file="hisat2/hisat2-mapping-se.cwl", 
                  input_file="hisat2/hisat2-mapping-se.yml", dir_path=dir_path)
args <- renderWF(args, inputvars=c(FileName="_FASTQ_PATH1_", SampleName="_SampleName_"))
args

## Not run: 
## Run alignments
args <- runCommandline(args, dir = FALSE, make_bam = TRUE)

## Create sym links and URL file for IGV
symLink2bam(sysargs=args, command="ln -s", htmldir=c("~/.html/", "somedir/"), ext=c(".bam", ".bai"), urlbase="http://myserver.edu/~username/", urlfile="IGVurl.txt")

## End(Not run)

SYSargs accessor methods

Description

Methods to access information from SYSargs object.

Usage

	sysargs(x)
sysargs(x)

Arguments

`x`	object of class `SYSargs`

Value

various outputs

Author(s)

Thomas Girke

Examples

## Construct SYSargs object from param and targets files 
param <- system.file("extdata", "hisat2.param", package="systemPipeR")
targets <- system.file("extdata", "targets.txt", package="systemPipeR")
args <- systemArgs(sysma=param, mytargets=targets)
args
names(args); modules(args); cores(args); outpaths(args); sysargs(args)
## Construct SYSargs object from param and targets files 
param <- system.file("extdata", "hisat2.param", package="systemPipeR")
targets <- system.file("extdata", "targets.txt", package="systemPipeR")
args <- systemArgs(sysma=param, mytargets=targets)
args
names(args); modules(args); cores(args); outpaths(args); sysargs(args)

Class `"SYSargs"`

Description

S4 class container for storing parameters of command-line- or R-based software. SYSargs instances are constructed by the systemArgs function from two simple tabular files: a targets file and a param file. The latter is optional for workflow steps lacking command-line software. Typically, a SYSargs instance stores all sample-level inputs as well as the paths to the corresponding outputs generated by command-line- or R-based software generating sample-level output files. Each sample level input/outfile operation uses its own SYSargs instance. The outpaths of SYSargs usually define the sample inputs for the next SYSargs instance. This connectivity is achieved by writing the outpaths with the writeTargetsout function to a new targets file that serves as input to the next systemArgs call. By chaining several SYSargs steps together one can construct complex workflows involving many sample-level input/output file operations with any combination of command-line or R-based software.

Objects from the Class

Objects can be created by calls of the form new("SYSargs", ...).

Slots

targetsin:: Object of class "data.frame" storing tabular data from targets input file
targetsout:: Object of class "data.frame" storing tabular data from targets output file
targetsheader:: Object of class "character" storing header/comment lines of targets file
modules:: Object of class "character" storing software versions from module system
software:: Object of class "character" name of executable of command-line software
cores:: Object of class "numeric" number of CPU cores to use
other:: Object of class "character" additional arguments
reference:: Object of class "character" path to reference genome file
results:: Object of class "character" path to results directory
infile1:: Object of class "character" paths to first FASTQ file
infile2:: Object of class "character" paths to second FASTQ file if data is PE
outfile1:: Object of class "character" paths to output files generated by command-line software
sysargs:: Object of class "character" full commands used to execute external software
outpaths:: Object of class "character" paths to final outputs including postprocessing by Rsamtools

Methods

SampleName: signature(x = "SYSargs"): extracts sample names
[: signature(x = "SYSargs"): subsetting of class with bracket operator
coerce: signature(from = "list", to = "SYSargs"): as(list, "SYSargs")
cores: signature(x = "SYSargs"): extracts data from cores slot
infile1: signature(x = "SYSargs"): extracts data from infile1 slot
infile2: signature(x = "SYSargs"): extracts data from infile2 slot
modules: signature(x = "SYSargs"): extracts data from modules slot
names: signature(x = "SYSargs"): extracts slot names
length: signature(x = "SYSargs"): extracts number of samples
other: signature(x = "SYSargs"): extracts data from other slot
outfile1: signature(x = "SYSargs"): extracts data from outfile1 slot
outpaths: signature(x = "SYSargs"): extracts data from outpath slot
reference: signature(x = "SYSargs"): extracts data from reference slot
results: signature(x = "SYSargs"): extracts data from results slot
show: signature(object = "SYSargs"): summary view of SYSargs objects
software: signature(x = "SYSargs"): extracts data from software slot
targetsheader: signature(x = "SYSargs"): extracts data from targetsheader slot
targetsin: signature(x = "SYSargs"): extracts data from targetsin slot
targetsout: signature(x = "SYSargs"): extracts data from targetsout slot

Author(s)

Thomas Girke

Examples

showClass("SYSargs")
## Construct SYSargs object from param and targets files 
param <- system.file("extdata", "tophat.param", package="systemPipeR")
targets <- system.file("extdata", "targets.txt", package="systemPipeR")
args <- systemArgs(sysma=param, mytargets=targets)
args
names(args); targetsin(args); targetsout(args); targetsheader(args); 
software(args); modules(args); cores(args); outpaths(args) 
sysargs(args); other(args); reference(args); results(args); infile1(args)
infile2(args); outfile1(args); SampleName(args)

## Return sample comparisons
readComp(args, format = "vector", delim = "-")

## The subsetting operator '[' allows to select specific samples
args[1:4]

## Not run: 
## Execute SYSargs on single machine
runCommandline(args=args)

## Execute SYSargs on multiple machines  
qsubargs <- getQsubargs(queue="batch", Nnodes="nodes=1", cores=cores(args), 
                        memory="mem=10gb", time="walltime=20:00:00")
qsubRun(appfct="runCommandline(args=args)", appargs=args, qsubargs=qsubargs, 
        Nqsubs=1, submitdir="results", package="systemPipeR")

## Write outpaths to new targets file for next SYSargs step
writeTargetsout(x=args, file="default")

## End(Not run)
showClass("SYSargs")
## Construct SYSargs object from param and targets files 
param <- system.file("extdata", "tophat.param", package="systemPipeR")
targets <- system.file("extdata", "targets.txt", package="systemPipeR")
args <- systemArgs(sysma=param, mytargets=targets)
args
names(args); targetsin(args); targetsout(args); targetsheader(args); 
software(args); modules(args); cores(args); outpaths(args) 
sysargs(args); other(args); reference(args); results(args); infile1(args)
infile2(args); outfile1(args); SampleName(args)

## Return sample comparisons
readComp(args, format = "vector", delim = "-")

## The subsetting operator '[' allows to select specific samples
args[1:4]

## Not run: 
## Execute SYSargs on single machine
runCommandline(args=args)

## Execute SYSargs on multiple machines  
qsubargs <- getQsubargs(queue="batch", Nnodes="nodes=1", cores=cores(args), 
                        memory="mem=10gb", time="walltime=20:00:00")
qsubRun(appfct="runCommandline(args=args)", appargs=args, qsubargs=qsubargs, 
        Nqsubs=1, submitdir="results", package="systemPipeR")

## Write outpaths to new targets file for next SYSargs step
writeTargetsout(x=args, file="default")

## End(Not run)

Class `"SYSargs2"`

Description

SYSargs2 class stores all the information and instructions needed for processing a set of input files with a specific command-line or a series of command-line within a workflow. The SYSargs2 S4 class object is created from the loadWF and renderWF function, which populates all the command-line for each sample in each step of the particular workflow. Each sample level input/outfile operation uses its own SYSargs2 instance. The output of SYSargs2 define all the expected output files for each step in the workflow, which usually it is the sample input for the next step in an SYSargs2 instance. By chaining several SYSargs2 steps together one can construct complex workflows involving many sample-level input/output file operations with any combination of command-line or R-based software.

Objects from the Class

Objects can be created by calls of the form new("SYSargs2", ...).

Slots

targets:: Object of class "list" storing data from each sample from targets file
targetsheader:: Object of class "list" storing header/comment lines of targets file
modules:: Object of class "list" storing software versions from module system
wf:: Object of class "list" storing data from Workflow CWL parameters file
clt:: Object of class "list" storing data from each CommandLineTool substep in the Workflow or the single CommandLineTool CWL parameters file
yamlinput:: Object of class "list" storing data from input (*.yml) file
cmdlist:: Object of class "list" storing all command-line used to execute external software
input:: Object of class "list" storing data from each target defined in inputvars
output:: Object of class "list" paths to final outputs files
files:: Object of class "list" paths to input and CWL parameters files
inputvars:: Object of class "list" storing data from each inputvars
cmdToCwl:: Object of class "list" storing data from each cmdToCwl
status:: Object of class "list" storing data from each status
internal_outfiles:: Object of class "list" storing raw data from each output

Methods

[: Subsetting of class with bracket operator.
[[: Subsetting of class with bracket operator.
[[<-: Replacement method for "SYSargs2" class.
$: Extracting slots elements by name.
length: Extracts number of samples.
names: Extracts slot names.
show: Summary view of SYSargs2 objects.
coerce: signature(from = "list", to = "SYSargs2"): as(list, "SYSargs2")
coerce: signature(from = "SYSargs2", to = "list")as(SYSargs2, "list")
coerce: signature(from = "SYSargs2", to = "DataFrame"): as(x, "DataFrame"); for targets slot.
sysargs2: Coerce back to list as(SYSargs2, "list")
targets: Extract data from targets slot.
targetsheader: Extracts data from targetsheader slot.
modules: Extracts data from modules slot.
wf: Extracts data from wf slot.
clt: Extracts data from clt slot.
yamlinput: Extracts data from yamlinput slot.
cmdlist: Extracts data from cmdlist slot.
input: Extracts data from input slot.
output: Extracts data from cmdlist slot.
files: Extracts data from files slot.
inputvars: Extracts data from inputvars slot.
cmdToCwl: Extracts data from cmdToCwl slot.
status: Extracts data from status slot.
infile1: extracting paths to first FASTQ file.
infile2: extracting paths to second FASTQ file if data is PE.
baseCommand: Extracts baseCommand from command-line used to execute external software.
SampleName: Extracts all samples names.
yamlinput<-: Replacement method for yamlinput slot input.

Author(s)

Daniela Cassol and Thomas Girke

Examples

showClass("SYSargs2")

## Construct SYSargs2 object from CWl param, CWL input, and targets files 
targetspath <- system.file("extdata/cwl/example/targets_example.txt", package="systemPipeR")
dir_path <- system.file("extdata/cwl", package="systemPipeR")
WF <- loadWorkflow(targets=targetspath, 
                  wf_file="example/workflow_example.cwl", 
                  input_file="example/example.yml", 
                  dir_path=dir_path)
WF <- renderWF(WF, inputvars=c(Message = "_STRING_", SampleName = "_SAMPLE_"))
WF

## Methods
names(WF)
length(WF)
baseCommand(WF)
SampleName(WF)

## Accessors
targets(WF)
targetsheader(WF)
modules(WF)
yamlinput(WF)
cmdlist(WF)
input(WF)
output(WF)
files(WF)
inputvars(WF)
cmdToCwl(WF)
status(WF)

## The subsetting operator '[' allows to select specific command-line/sample
WF2 <- WF[1:2]

## Not run: 
## Execute SYSargs2 on single machine
WF2 <- runCommandline(WF2)

## End(Not run)
## Not run: 
## Execute SYSargs2 on multiple machines of a compute cluster. The following
## example uses the conf and template files for the Slurm scheduler. Please 
## read the instructions on how to obtain the corresponding files for other schedulers.  
file.copy(system.file("extdata", ".batchtools.conf.R", package="systemPipeR"), ".")
file.copy(system.file("extdata", "batchtools.slurm.tmpl", package="systemPipeR"), ".")
resources <- list(walltime=120, ntasks=1, ncpus=4, memory=1024) 
reg <- clusterRun(WF, FUN = runCommandline, conffile=".batchtools.conf.R", 
                  template="batchtools.slurm.tmpl", Njobs=2, runid="01", resourceList=resources)

## Monitor progress of submitted jobs
getStatus(reg=reg)


## End(Not run)
showClass("SYSargs2")

## Construct SYSargs2 object from CWl param, CWL input, and targets files 
targetspath <- system.file("extdata/cwl/example/targets_example.txt", package="systemPipeR")
dir_path <- system.file("extdata/cwl", package="systemPipeR")
WF <- loadWorkflow(targets=targetspath, 
                  wf_file="example/workflow_example.cwl", 
                  input_file="example/example.yml", 
                  dir_path=dir_path)
WF <- renderWF(WF, inputvars=c(Message = "_STRING_", SampleName = "_SAMPLE_"))
WF

## Methods
names(WF)
length(WF)
baseCommand(WF)
SampleName(WF)

## Accessors
targets(WF)
targetsheader(WF)
modules(WF)
yamlinput(WF)
cmdlist(WF)
input(WF)
output(WF)
files(WF)
inputvars(WF)
cmdToCwl(WF)
status(WF)

## The subsetting operator '[' allows to select specific command-line/sample
WF2 <- WF[1:2]

## Not run: 
## Execute SYSargs2 on single machine
WF2 <- runCommandline(WF2)

## End(Not run)
## Not run: 
## Execute SYSargs2 on multiple machines of a compute cluster. The following
## example uses the conf and template files for the Slurm scheduler. Please 
## read the instructions on how to obtain the corresponding files for other schedulers.  
file.copy(system.file("extdata", ".batchtools.conf.R", package="systemPipeR"), ".")
file.copy(system.file("extdata", "batchtools.slurm.tmpl", package="systemPipeR"), ".")
resources <- list(walltime=120, ntasks=1, ncpus=4, memory=1024) 
reg <- clusterRun(WF, FUN = runCommandline, conffile=".batchtools.conf.R", 
                  template="batchtools.slurm.tmpl", Njobs=2, runid="01", resourceList=resources)

## Monitor progress of submitted jobs
getStatus(reg=reg)


## End(Not run)

SYSargsList Constructor

Description

SYSargsList instances are constructed by the SYSargsList function.

Usage

SYSargsList(sysargs = NULL, step_name = "default", 
            targets = NULL, wf_file = NULL, input_file = NULL, dir_path = ".", 
            id = "SampleName",
            inputvars = NULL, rm_targets_col = NULL, 
            dir = TRUE, dependency = NA, 
            run_step = "mandatory",
            run_session = "management",
            run_remote_resources = NULL,
            silent = FALSE, 
            projPath = getOption("projPath", getwd()))
SYSargsList(sysargs = NULL, step_name = "default", 
            targets = NULL, wf_file = NULL, input_file = NULL, dir_path = ".", 
            id = "SampleName",
            inputvars = NULL, rm_targets_col = NULL, 
            dir = TRUE, dependency = NA, 
            run_step = "mandatory",
            run_session = "management",
            run_remote_resources = NULL,
            silent = FALSE, 
            projPath = getOption("projPath", getwd()))

Arguments

`sysargs`	`SYSargs2` object. If the object already exists, it can be used to construct the `SYSargsList` object.
`step_name`	character with the step index name.
`targets`	the path to `targets` file. The `targets` file can be either a simple tabular or yml file. Also, it is possible to assign `NULL` to run the pipeline without the 'targets' file. This can be useful for running specific workflows that do not require input files.
`wf_file`	name and path to `CWL parameters` file.
`input_file`	name and path to `input parameters` file.
`dir_path`	full path to the directory with the `CWL parameters` and `input` files.
`id`	A column from `targets` file, which will be used as an id for each one of the samples. It is required to be unique.
`inputvars`	Each vector element is required to be defined in the `input` file, and the names of the elements are needed to match the column names defined in the `targets` file.
`rm_targets_col`	targets file colunms to be removed.
`dir`	This option allows creating an exclusive results folder for each step in the workflow. All the outfiles and log files for the particular step will be created in the respective folders. Default is `dir=TRUE`.
`dependency`	character. Dependency tree, required when appending this step to the workflow. Character name of a previous step in the workflow. Default is `NA`.
`run_step`	character. If the step has "mandatory" or "optional" flag for the execution.
`run_session`	character. If the step has "management" or "compute" flag for the execution.
`run_remote_resources`	`List` for reserving for each cluster job sufficient computing resources including memory (Megabyte), number of nodes (Int), CPU cores, walltime (Minutes), etc. It is necessary two additional files: `conffile` and `template`. `conffile` is the path to configuration file (default location `./.batchtools.conf.R`). This file contains in its simplest form just one command, such as this line for the scheduler: `cluster.functions <- makeClusterFunctionsSlurm(template="batchtools.slurm.tmpl")`. For more detailed information visit this page: https://mllg.github.io/batchtools/index.html `template` The template files for a specific queueing/scheduling systems can be downloaded from here: https://github.com/mllg/batchtools/tree/master/inst/templates. Slurm, PBS/Torque, and Sun Grid Engine (SGE) templates are provided within the package demo data.
`silent`	If set to `TRUE`, all messages returned by the function will be suppressed.
`projPath`	a character vector of a full project path name. Default is the current path.

Author(s)

Daniela Cassol

Examples

sal <- SPRproject(overwrite=TRUE)
targetspath <- system.file("extdata/cwl/example/targets_example.txt", 
                            package="systemPipeR")

## Constructor and `appendStep<-`
appendStep(sal) <- SYSargsList(step_name = "echo", 
                               targets=targetspath, dir=TRUE,
                               wf_file="example/workflow_example.cwl", 
                               input_file="example/example.yml", 
                               dir_path = system.file("extdata/cwl", 
                               package="systemPipeR"),
                               inputvars = c(Message = "_STRING_",
                               SampleName = "_SAMPLE_"))
                               
appendStep(sal) <- LineWise(code = {
                            hello <- lapply(getColumn(sal, step=1, 'outfiles'), function(x) yaml::read_yaml(x))
                            }, 
                            step_name = "R_read", 
                            dependency = "echo")
                            
sal
sal <- SPRproject(overwrite=TRUE)
targetspath <- system.file("extdata/cwl/example/targets_example.txt", 
                            package="systemPipeR")

## Constructor and `appendStep<-`
appendStep(sal) <- SYSargsList(step_name = "echo", 
                               targets=targetspath, dir=TRUE,
                               wf_file="example/workflow_example.cwl", 
                               input_file="example/example.yml", 
                               dir_path = system.file("extdata/cwl", 
                               package="systemPipeR"),
                               inputvars = c(Message = "_STRING_",
                               SampleName = "_SAMPLE_"))
                               
appendStep(sal) <- LineWise(code = {
                            hello <- lapply(getColumn(sal, step=1, 'outfiles'), function(x) yaml::read_yaml(x))
                            }, 
                            step_name = "R_read", 
                            dependency = "echo")
                            
sal

Class `"SYSargsList"`

Description

SYSargsList S4 class is a list-like container where each instance stores all the input/output paths and parameter components required for a particular data analysis step based on command-line- or R-based software.

SYSargsList instances are constructed by the SYSargsList function.

Usage

## Accessors

stepsWF(x)
statusWF(x)
targetsWF(x)
outfiles(x)
SE(x, ...)
dependency(x)
projectInfo(x)
runInfo(x)

## Methods

cmdlist(x, ...)
codeLine(x, ...)
SampleName(x, ...)
stepName(x)
baseCommand(x, ...)
targetsheader(x, ...)
yamlinput(x, ...)
viewEnvir(x, silent = FALSE)
copyEnvir(x, list = character(), new.env = globalenv(), silent = FALSE)
addResources(x, step, resources)

## Subset Methods

subset(x, ...)
getColumn(x, step, position = c("outfiles", "targetsWF"), column = 1, 
          names = SampleName(x, step))

## Replacement

appendStep(x, after = length(x), ...) <- value
yamlinput(x, paramName, ...) <- value
replaceStep(x, step, step_name = "default") <- value
renameStep(x, step, ...) <- value
dependency(x, step, ...) <- value
appendCodeLine(x, after = length(x), ...) <- value
replaceCodeLine(x, line, ...) <- value
updateColumn(x, step, position = c("outfiles", "targetsWF")) <- value
## Accessors

stepsWF(x)
statusWF(x)
targetsWF(x)
outfiles(x)
SE(x, ...)
dependency(x)
projectInfo(x)
runInfo(x)

## Methods

cmdlist(x, ...)
codeLine(x, ...)
SampleName(x, ...)
stepName(x)
baseCommand(x, ...)
targetsheader(x, ...)
yamlinput(x, ...)
viewEnvir(x, silent = FALSE)
copyEnvir(x, list = character(), new.env = globalenv(), silent = FALSE)
addResources(x, step, resources)

## Subset Methods

subset(x, ...)
getColumn(x, step, position = c("outfiles", "targetsWF"), column = 1, 
          names = SampleName(x, step))

## Replacement

appendStep(x, after = length(x), ...) <- value
yamlinput(x, paramName, ...) <- value
replaceStep(x, step, step_name = "default") <- value
renameStep(x, step, ...) <- value
dependency(x, step, ...) <- value
appendCodeLine(x, after = length(x), ...) <- value
replaceCodeLine(x, line, ...) <- value
updateColumn(x, step, position = c("outfiles", "targetsWF")) <- value

Arguments

`x`	An instance of class `SYSargsList`.
`step`	character or numeric. Workflow step name or position index.
`silent`	If set to `TRUE`, all messages returned by the function will be suppressed.
`list`	a character vector naming objects to be copyied from the enviroment.
`new.env`	An environment to copy to. Default is `globalenv()`.
`resources`	`List` for reserving for each cluster job sufficient computing resources including memory (Megabyte), number of nodes (Int), CPU cores, walltime (Minutes), etc. It is necessary two additional files: `conffile` and `template`. `conffile` is the path to configuration file (default location `./.batchtools.conf.R`). This file contains in its simplest form just one command, such as this line for the scheduler: `cluster.functions <- makeClusterFunctionsSlurm(template="batchtools.slurm.tmpl")`. For more detailed information visit this page: https://mllg.github.io/batchtools/index.html. `template` The template files for a specific queueing/scheduling systems can be downloaded from here: https://github.com/mllg/batchtools/tree/master/inst/templates. Slurm, PBS/Torque, and Sun Grid Engine (SGE) templates are provided within the package demo data.
`position`	character. Options are `"outfiles" or "targetsWF"` slots.
`column`	character or numeric. Which column will be subset from the `position` argument.
`names`	character vector. Names of the workflow step.
`after`	A subscript, after which the values are to be appended.
`paramName`	character. Input name from `yamlinput(x)`, which value should be replaced.
`step_name`	character with the new step name. Default value will automatically give a name: `step_<step index position>`.
`line`	numeric. Index position of the code line to be added or replaced.
`value`	object containing the values to be replaced to `SYSargsList`. Values may be of the same class as the orginal values. For `updateColumn<-` a `DataFrame` must have the same rows as the original `DataFrame`. However, if there is no column/rows in the original `DataFrame`, the new `DataFrame` will replace the empty one. If there is a non-empty `DataFrame`, any existing columns with the same name as this new `DataFrame` will be replaced. Any columns that do not exist in the original `DataFrame` will be added to the original `DataFrame`.
`...`	Further arguments to be passed to or from other methods.

Objects from the Class

Objects can be created by calls of the form new("SYSargsList", ...).

Slots

stepsWF:: Object of class "list" storing all the steps objects of the workflow. Each step can either be SYSargs2 or LineWise.
statusWF:: Object of class "list" storing all the success and failure of each step in the workflow.
targetsWF:: Object of class "list" storing all the targets DataFrame for each step in the workflow. For the LineWise steps, a DataFrame with 0 rows and 0 columns will be displayed.
outfiles:: Object of class "list" storing all the output DataFrame for each step in the workflow. For the LineWise steps, a DataFrame with 0 rows and 0 columns will be displayed.
SE:: Object of class "list" storing all the SummarizedExperiment objects in the workflow.
dependency:: Object of class "list" storing all the dependency graphs in the workflow.
projectInfo:: Object of class "list" storing all the projectInfo information of the workflow.
runInfo:: Object of class "list" storing all the runInfo information of each step in the workflow.
targets_connection:: Object of class "list" storing all targets files connection in the workflow.

Methods

[: signature(x = "SYSargsList", i = "ANY", j = "ANY", drop = "ANY"): subsetting of class with bracket operator
[[: signature(x = "SYSargsList", i = "ANY", j = "ANY"): subsetting of class with bracket operator
$: signature(x = "SYSargsList"): extracting slots elements by name
coerce: signature(from = "list", to = "SYSargsList"): as(list, "SYSargsList")
coerce: signature(from = "SYSargsList", to = "list"): as(SYSargsList, "list")
sysargslist: signature(x = "SYSargsList"): Coerce back to list as(SYSargsList, "list")
length: signature(x = "SYSargsList"): extracts number of SYSargsList steps
names: signature(x = "SYSargsList"): extracts slot names
show: signature(object = "SYSargsList"): summary view of SYSargsList steps
stepsWF: signature(x = "SYSargsList"): extract data from stepsWF slot
statusWF: signature(x = "SYSargsList"): extract data from statusWF slot
targetsWF: signature(x = "SYSargsList"): extract data from targetsWF slot
outfiles: signature(x = "SYSargsList"): extract data from outfiles slot
SE: signature(x = "SYSargsList"): extract data from SE slot
dependency: signature(x = "SYSargsList"): extract data from dependency slot
projectInfo: signature(x = "SYSargsList"): extract data from projectInfo slot
runInfo: signature(x = "SYSargsList"): extract data from runInfo slot
cmdlist: signature(x = "SYSargsList", ...): extracts data from cmdlist slot for each SYSargs2 step
codeLine: signature(x = "SYSargsList", step): extracts data from codeLine slot for LineWise step
SampleName: signature(x = "SYSargsList", step): extracts Sample ID from SYSargs2 instance step
stepName: signature(x = "SYSargsList"): extracts steps names from workflow instance
baseCommand: signature(x = "SYSargsList", step): extracts baseCommand from SYSargs2 instance step
targetsheader: signature(x = "SYSargsList", step): extracts targetsheader from SYSargs2 instance step
yamlinput: signature(x = "SYSargsList", step): extracts data from yamlinput slot for each SYSargs2 step
viewEnvir: signature(x = "SYSargsList", silent = FALSE): return a vector of character strings giving the names of the objects in the SYSargsList environment
copyEnvir: signature(x = "SYSargsList", list = character(), new.env = globalenv(), silent = FALSE): copy of the contents or select objects from SYSargsList environment and place them into new.env
addResources: signature(x = "SYSargsList", step, resources): Adds the computing resources for one or multiple steps in the workflow. If the particular step(s) is set to be executed "management section," when the resources is added, the step(s) will be executed on the "compute section."
getColumn: signature(x = "SYSargsList", step, position = c("outfiles", "targetsWF"), column = 1, names = SampleName(x, step)): extracts the information for targetsWF or outfiles slots. The information can be used in an R code downstream
[[<-: signature(x = "SYSargsList", i = "ANY", j = "ANY", value = "ANY"): replacement method for SYSargsList class
appendStep<-: signature(x = "SYSargsList", after = length(x)): insert the SYSargsList or LineWise object onto x at the position given by after
yamlinput<-: signature(x = "SYSargsList", step, paramName ): replace a value in the yamlinput slot for a specific step instance
replaceStep<-: signature(x = "SYSargsList", step, step_name = "default"): replace a specific step in the workflow instance
renameStep<-: signature(x = "SYSargsList"): rename a stepName in the workflow instance
dependency<-: signature(x = "SYSargsList", step): replace dependency graph for a specific step instance
appendCodeLine<-: signature(x = "SYSargsList", step, after = length(x)): insert the R code in a specific step at the position given by after
replaceCodeLine<-: signature(x = "SYSargsList", step, line): replace the R code in a specific step at the position given by line
updateColumn<-: signature(x = "SYSargsList", step, position = c("outfiles", "targetsWF")): update or add a new column in targetsWF or outfiles slots

Author(s)

Daniela Cassol and Thomas Girke

Examples

sal <- SPRproject(overwrite=TRUE)
targetspath <- system.file("extdata/cwl/example/targets_example.txt", 
                            package="systemPipeR")

## Constructor and `appendStep<-`
appendStep(sal) <- SYSargsList(step_name = "echo", 
                               targets=targetspath, dir=TRUE,
                               wf_file="example/workflow_example.cwl", 
                               input_file="example/example.yml", 
                               dir_path = system.file("extdata/cwl", 
                               package="systemPipeR"),
                               inputvars = c(Message = "_STRING_",
                               SampleName = "_SAMPLE_"))
                               
appendStep(sal) <- LineWise(code = {
                            hello <- lapply(getColumn(sal, step=1, 'outfiles'), function(x) yaml::read_yaml(x))
                            }, 
                            step_name = "R_read", 
                            dependency = "echo")
sal 

length(sal)
names(sal)

## Accessors

stepsWF(sal)
statusWF(sal)
targetsWF(sal)
outfiles(sal)
SE(sal)
dependency(sal)
projectInfo(sal)
runInfo(sal)

## Methods

cmdlist(sal, step=1, targets=1:2) ## SYSargs2 step
codeLine(sal, step=2) ## LineWise step
SampleName(sal, step="echo")
stepName(sal)
baseCommand(sal, 1) ## SYSargs2 step
targetsheader(sal, step=1) ## SYSargs2 step
yamlinput(sal, step=1) ## SYSargs2 step
viewEnvir(sal)
copyEnvir(sal, list = character(), new.env = globalenv())

resources <- list(conffile= system.file("extdata/.batchtools.conf.R", 
                            package="systemPipeR"),
                  template= system.file("extdata/batchtools.slurm.tmpl", 
                            package="systemPipeR"),
                  Njobs=3, ## Usually, the samples number
                  walltime=60, ## minutes
                  ntasks=1,
                  ncpus=4, 
                  memory=1024 ## Mb
)
addResources(sal, 1, resources= resources)

## Subset Methods

sal_sub <- subset(sal, subset_steps=1, input_targets=1:2, keep_steps = TRUE)
sal_sub
targetsIn <- getColumn(sal, step=1, position = c("outfiles"))
targetsIn

## Replacement
renameStep(sal, step=1) <- "new_echo"
dependency(sal, step=2) <- "new_echo"
updateColumn(sal, step=2, position = c("targetsWF")) <- data.frame(targetsIn)
targetsWF(sal)

replaceStep(sal, step=2) <- LineWise(code = {
                            hello <- "Printing a new message"
                            }, 
                            step_name = "R_hello", 
                            dependency = "new_echo")
codeLine(sal)
yamlinput(sal, step=1, paramName="results_path") <- list(results_path=list(
                                            class="Directory", path="./data"))
cmdlist(sal, step = 1, targets = 1)
appendCodeLine(sal, step=2, after = 0) <- "log <- log(10)"
codeLine(sal, 2)
replaceCodeLine(sal, step=2, line=1) <- LineWise(code = {
                            log <- log(50)
                            })
codeLine(sal, 2)
sal <- SPRproject(overwrite=TRUE)
targetspath <- system.file("extdata/cwl/example/targets_example.txt", 
                            package="systemPipeR")

## Constructor and `appendStep<-`
appendStep(sal) <- SYSargsList(step_name = "echo", 
                               targets=targetspath, dir=TRUE,
                               wf_file="example/workflow_example.cwl", 
                               input_file="example/example.yml", 
                               dir_path = system.file("extdata/cwl", 
                               package="systemPipeR"),
                               inputvars = c(Message = "_STRING_",
                               SampleName = "_SAMPLE_"))
                               
appendStep(sal) <- LineWise(code = {
                            hello <- lapply(getColumn(sal, step=1, 'outfiles'), function(x) yaml::read_yaml(x))
                            }, 
                            step_name = "R_read", 
                            dependency = "echo")
sal 

length(sal)
names(sal)

## Accessors

stepsWF(sal)
statusWF(sal)
targetsWF(sal)
outfiles(sal)
SE(sal)
dependency(sal)
projectInfo(sal)
runInfo(sal)

## Methods

cmdlist(sal, step=1, targets=1:2) ## SYSargs2 step
codeLine(sal, step=2) ## LineWise step
SampleName(sal, step="echo")
stepName(sal)
baseCommand(sal, 1) ## SYSargs2 step
targetsheader(sal, step=1) ## SYSargs2 step
yamlinput(sal, step=1) ## SYSargs2 step
viewEnvir(sal)
copyEnvir(sal, list = character(), new.env = globalenv())

resources <- list(conffile= system.file("extdata/.batchtools.conf.R", 
                            package="systemPipeR"),
                  template= system.file("extdata/batchtools.slurm.tmpl", 
                            package="systemPipeR"),
                  Njobs=3, ## Usually, the samples number
                  walltime=60, ## minutes
                  ntasks=1,
                  ncpus=4, 
                  memory=1024 ## Mb
)
addResources(sal, 1, resources= resources)

## Subset Methods

sal_sub <- subset(sal, subset_steps=1, input_targets=1:2, keep_steps = TRUE)
sal_sub
targetsIn <- getColumn(sal, step=1, position = c("outfiles"))
targetsIn

## Replacement
renameStep(sal, step=1) <- "new_echo"
dependency(sal, step=2) <- "new_echo"
updateColumn(sal, step=2, position = c("targetsWF")) <- data.frame(targetsIn)
targetsWF(sal)

replaceStep(sal, step=2) <- LineWise(code = {
                            hello <- "Printing a new message"
                            }, 
                            step_name = "R_hello", 
                            dependency = "new_echo")
codeLine(sal)
yamlinput(sal, step=1, paramName="results_path") <- list(results_path=list(
                                            class="Directory", path="./data"))
cmdlist(sal, step = 1, targets = 1)
appendCodeLine(sal, step=2, after = 0) <- "log <- log(10)"
codeLine(sal, 2)
replaceCodeLine(sal, step=2, line=1) <- LineWise(code = {
                            log <- log(50)
                            })
codeLine(sal, 2)

Constructs SYSargs object from param and targets files

Description

Constructs SYSargs S4 class objects from two simple tablular files: a targets file and a param file. The latter is optional for workflow steps lacking command-line software. Typically, a SYSargs instance stores all sample-level inputs as well as the paths to the corresponding outputs generated by command-line- or R-based software generating sample-level output files. Each sample level input/outfile operation uses its own SYSargs instance. The outpaths of SYSargs usually define the sample inputs for the next SYSargs instance. This connectivity is established by writing the outpaths with the writeTargetsout function to a new targets file that serves as input to the next systemArgs call. By chaining several SYSargs steps together one can construct complex workflows involving many sample-level input/output file operations with any combinaton of command-line or R-based software.

Usage

systemArgs(sysma, mytargets, type = "SYSargs")
systemArgs(sysma, mytargets, type = "SYSargs")

Arguments

`sysma`	path to 'param' file; file structure follows a simple name/value syntax that converted into JSON format; for details about the file structure see sample files provided by package. Assign `NULL` to run the pipeline without 'param' file. This can be useful for running partial workflows, e.g. with pregenerated BAM files.
`mytargets`	path to targets file
`type`	`type="SYSargs"` returns `SYSargs`, `type="json"` returns param file content in JSON format (requires `rjson` library)

Value

SYSargs object or character string in JSON format

Author(s)

Thomas Girke

Examples

## Construct SYSargs object from param and targets files 
param <- system.file("extdata", "tophat.param", package="systemPipeR")
targets <- system.file("extdata", "targets.txt", package="systemPipeR")
args <- systemArgs(sysma=param, mytargets=targets)
args
names(args); modules(args); cores(args); outpaths(args); sysargs(args)

## Not run: 
## Execute SYSargs on single machine
runCommandline(args=args)

## Execute SYSargs on multiple machines of a compute cluster
resources <- list(walltime=120, ntasks=1, ncpus=cores(args), memory=1024) 
reg <- clusterRun(args, conffile=".batchtools.conf.R", template="batchtools.slurm.tmpl", Njobs=18, runid="01", resourceList=resources)

## Monitor progress of submitted jobs
getStatus(reg=reg)
file.exists(outpaths(args))
sapply(1:length(args), function(x) loadResult(reg, x)) # Works once all jobs have completed successfully.

## Alignment stats
read_statsDF <- alignStats(args) 
write.table(read_statsDF, "results/alignStats.xls", row.names=FALSE, quote=FALSE, sep="\t")

## Write outpaths to new targets file for next SYSargs step
writeTargetsout(x=args, file="default")

## End(Not run)
## Construct SYSargs object from param and targets files 
param <- system.file("extdata", "tophat.param", package="systemPipeR")
targets <- system.file("extdata", "targets.txt", package="systemPipeR")
args <- systemArgs(sysma=param, mytargets=targets)
args
names(args); modules(args); cores(args); outpaths(args); sysargs(args)

## Not run: 
## Execute SYSargs on single machine
runCommandline(args=args)

## Execute SYSargs on multiple machines of a compute cluster
resources <- list(walltime=120, ntasks=1, ncpus=cores(args), memory=1024) 
reg <- clusterRun(args, conffile=".batchtools.conf.R", template="batchtools.slurm.tmpl", Njobs=18, runid="01", resourceList=resources)

## Monitor progress of submitted jobs
getStatus(reg=reg)
file.exists(outpaths(args))
sapply(1:length(args), function(x) loadResult(reg, x)) # Works once all jobs have completed successfully.

## Alignment stats
read_statsDF <- alignStats(args) 
write.table(read_statsDF, "results/alignStats.xls", row.names=FALSE, quote=FALSE, sep="\t")

## Write outpaths to new targets file for next SYSargs step
writeTargetsout(x=args, file="default")

## End(Not run)

Convert targets to list or data.frame

Description

Convert targets files to list or data.frame object.

Usage

targets.as.df(x)
targets.as.list(x, id="SampleName")
targets.as.df(x)
targets.as.list(x, id="SampleName")

Arguments

`x`	An object of the class `"list"` or `"data.frame"` that stores data from each target file, as `targets(WF)`.
`id`	A column from `targets` file, which will be used as an id for each one of the samples. It is required to be unique.

Value

data.frame or list containing all the targets file information.

Author(s)

Daniela Cassol

Examples

targetspath <- system.file("extdata", "targets.txt", package="systemPipeR")
targets <- read.delim(targetspath, comment.char = "#")
targetslist <- targets.as.list(x=targets)
targets.as.df(x=targetslist)
targetspath <- system.file("extdata", "targets.txt", package="systemPipeR")
targets <- read.delim(targetspath, comment.char = "#")
targetslist <- targets.as.list(x=targets)
targets.as.df(x=targetslist)

Genome read coverage by transcript models

Description

## Trims adaptors hierarchically from longest to shortest match from right end of read. ## If 'internalmatch=TRUE' then internal matches will trigger the same behavior. The ## argument minpatternlength defines shortest adaptor match to consider for reads ## containing only partial adaptors at the right end.

Usage

trimbatch(fq, pattern, internalmatch=FALSE, minpatternlength=8, 
                            Nnumber=1, polyhomo=100, minreadlength=18, 
                            maxreadlength)
trimbatch(fq, pattern, internalmatch=FALSE, minpatternlength=8, 
                            Nnumber=1, polyhomo=100, minreadlength=18, 
                            maxreadlength)

Arguments

`fq`	`character` path to fastq file that contains the target sequences.
`pattern`	`character` pattern used to trim the sequence.
`internalmatch`	The default is `FALSE`. Trims adaptors hierarchically from longest to shortest match from right end of read. If 'internalmatch=TRUE' then internal matches will trigger the same behavior.
`minpatternlength`	It defines shortest adaptor match to consider for reads containing only partial adaptors at the right end.
`Nnumber`	A numeric value representing a minimum criterion for the filter. It selects elements with fewer than `Nnumber` 'N' symbols in each element.
`polyhomo`	A numeric value representing a maximum criterion for the filter. It selects elements with fewer than threshold copies of any nucleotide.
`minreadlength`	`numeric` value representing minimum read length.
`maxreadlength`	`numeric` value representing maximun read length.

Author(s)

Thomas Girke

Examples

## Preprocessing of paired-end reads
dir_path <- system.file("extdata/cwl/preprocessReads/trim-pe", package="systemPipeR")
targetspath <- system.file("extdata", "targetsPE.txt", package="systemPipeR")
trim <- loadWorkflow(targets=targetspath, wf_file="trim-pe.cwl", input_file="trim-pe.yml", dir_path=dir_path)
trim <- renderWF(trim, inputvars=c(FileName1="_FASTQ_PATH1_", FileName2="_FASTQ_PATH2_", SampleName="_SampleName_"))
trim
## Not run: 
iterTrim <- "trimbatch(fq, pattern='ACACGTCT', internalmatch=FALSE, minpatternlength=6, Nnumber=1, polyhomo=50, minreadlength=16, maxreadlength=101)" 
preprocessReads(args=trim[1], Fct=iterTrim, batchsize=100000, overwrite=TRUE, compress=TRUE)

## End(Not run)
## Preprocessing of paired-end reads
dir_path <- system.file("extdata/cwl/preprocessReads/trim-pe", package="systemPipeR")
targetspath <- system.file("extdata", "targetsPE.txt", package="systemPipeR")
trim <- loadWorkflow(targets=targetspath, wf_file="trim-pe.cwl", input_file="trim-pe.yml", dir_path=dir_path)
trim <- renderWF(trim, inputvars=c(FileName1="_FASTQ_PATH1_", FileName2="_FASTQ_PATH2_", SampleName="_SampleName_"))
trim
## Not run: 
iterTrim <- "trimbatch(fq, pattern='ACACGTCT', internalmatch=FALSE, minpatternlength=6, Nnumber=1, polyhomo=50, minreadlength=16, maxreadlength=101)" 
preprocessReads(args=trim[1], Fct=iterTrim, batchsize=100000, overwrite=TRUE, compress=TRUE)

## End(Not run)

Collect information about the third-party software

Description

Function to check if third-party software or utility is installed and set in the PATH.

Usage

tryCMD(command, silent = FALSE)
tryCMD(command, silent = FALSE)

Arguments

`command`	a character vector containing the command line name to be tested.
`silent`	If set to `TRUE`, all messages returned by the function will be suppressed.

Value

It will return a positive message if the software is set on the PATH or an error message if the software is not set it.

Note

Please note that not necessary the software is not installed if the message indicates an error, but it has not been exported on the current PATH.

Author(s)

Danela Cassol

Examples

## Not run: 
tryCMD(command="R") 
tryCMD(command="blastp") 
tryCMD(command="hisat2") 
  
## End(Not run)
## Not run: 
tryCMD(command="R") 
tryCMD(command="blastp") 
tryCMD(command="hisat2") 
  
## End(Not run)

Validation of the files or directories

Description

Function to check if the full path (file or directory) exists.

Usage

tryPath(path)
tryPath(path)

Arguments

path

a character vector of full path name.

Details

This function produces a character vector of the file or directory name defined on the path argument.

Value

A character vector containing the name of the file or directory. If the path does not exist, it will return an error message.

Author(s)

Daniela Cassol

Examples

file <- system.file("extdata/", "targets.txt", package="systemPipeR")
tryPath(path=file)
file <- system.file("extdata/", "targets.txt", package="systemPipeR")
tryPath(path=file)

Generate Variant Report

Description

Functions for generating tabular variant reports including genomic context annotations and confidence statistics of variants. The annotations are obtained with utilities provided by the VariantAnnotation package and the variant statistics are retrieved from the input VCF files.

Usage

## Variant report
variantReport(files, txdb, fa, organism, out_dir = "results")

## Combine variant reports
combineVarReports(files, filtercol, ncol = 15)

## Create summary statistics of variants
varSummary(files)
## Variant report
variantReport(files, txdb, fa, organism, out_dir = "results")

## Combine variant reports
combineVarReports(files, filtercol, ncol = 15)

## Create summary statistics of variants
varSummary(files)

Arguments

`files`	`named character vector` with paths of the input VCF files.
`txdb`	Annotation data stored as `TranscriptDb` object, which can be obtained from GFF/GTF files, BioMart, Bioc Annotation packages, UCSC, etc. For details see the vignette of the `GenomicFeatures` package. It is important to use here matching versions of the `txdb` and `fa` objects. The latter is the genome sequence used for read mapping and variant calling.
`fa`	`FaFile` object pointing to the sequence file of the corresponding r eference genome stored in FASTA format or a `BSgenome` instance.
`organism`	Character vector specifying the organism name of the reference genome.
`filtercol`	Named character vector containing in the name field the column titles to filter on, and in the data field the corresponding values to include in the report. For instance, the setting `filtercol=c(Consequence="nonsynonymous")` will include only nonsysynonymous variances listed in the `Consequence` column. To omit the filtering step, one can use the setting `filtercol="All"`.
`ncol`	Integer specifying the number of columns in the tabular input files. Default is set to 15.
`out_dir`	Character vector of a `results` directory name. Default is `results`.

Value

Tabular output files.

Author(s)

Thomas Girke

Examples

## Alignment with BWA (sequentially on single machine)
param <- system.file("extdata", "bwa.param", package="systemPipeR")
targets <- system.file("extdata", "targets.txt", package="systemPipeR")
args <- systemArgs(sysma=param, mytargets=targets)
sysargs(args)[1]

## Not run: 
library(VariantAnnotation)
system("bwa index -a bwtsw ./data/tair10.fasta")
bampaths <- runCommandline(args=args)

## Alignment with BWA (parallelized on compute cluster)
resources <- list(walltime=120, ntasks=1, ncpus=cores(args), memory=1024) 
reg <- clusterRun(args, conffile=".batchtools.conf.R", template="batchtools.slurm.tmpl", Njobs=18, runid="01", resourceList=resources)

## Variant calling with GATK
## The following creates in the inital step a new targets file
## (targets_bam.txt). The first column of this file gives the paths to
## the BAM files created in the alignment step. The new targets file and the
## parameter file gatk.param are used to create a new SYSargs
## instance for running GATK. Since GATK involves many processing steps, it is
## executed by a bash script gatk_run.sh where the user can specify the
## detailed run parameters. All three files are expected to be located in the
## current working directory. Samples files for gatk.param and
## gatk_run.sh are available in the subdirectory ./inst/extdata/ of the 
## source file of the systemPipeR package. 
writeTargetsout(x=args, file="targets_bam.txt")
system("java -jar CreateSequenceDictionary.jar R=./data/tair10.fasta O=./data/tair10.dict")
# system("java -jar /opt/picard/1.81/CreateSequenceDictionary.jar R=./data/tair10.fasta O=./data/tair10.dict")
args <- systemArgs(sysma="gatk.param", mytargets="targets_bam.txt")
resources <- list(walltime=120, ntasks=1, ncpus=cores(args), memory=1024) 
reg <- clusterRun(args, conffile=".batchtools.conf.R", template="batchtools.slurm.tmpl", Njobs=18, runid="01", resourceList=resources)
writeTargetsout(x=args, file="targets_gatk.txt")

## Variant calling with BCFtools
## The following runs the variant calling with BCFtools. This step requires in
## the current working directory the parameter file sambcf.param and the 
## bash script sambcf_run.sh.
args <- systemArgs(sysma="sambcf.param", mytargets="targets_bam.txt")
resources <- list(walltime=120, ntasks=1, ncpus=cores(args), memory=1024) 
reg <- clusterRun(args, conffile=".batchtools.conf.R", template="batchtools.slurm.tmpl", Njobs=18, runid="01", resourceList=resources)
writeTargetsout(x=args, file="targets_sambcf.txt")

## Filtering of VCF files generated by GATK
args <- systemArgs(sysma="filter_gatk.param", mytargets="targets_gatk.txt")
filter <- "totalDepth(vr) >= 2 & (altDepth(vr) / totalDepth(vr) >= 0.8) & rowSums(softFilterMatrix(vr))==4"
# filter <- "totalDepth(vr) >= 20 & (altDepth(vr) / totalDepth(vr) >= 0.8) & rowSums(softFilterMatrix(vr))==6"
filterVars(args, filter, varcaller="gatk", organism="A. thaliana")
writeTargetsout(x=args, file="targets_gatk_filtered.txt")

## Filtering of VCF files generated by BCFtools
args <- systemArgs(sysma="filter_sambcf.param", mytargets="targets_sambcf.txt")
filter <- "rowSums(vr) >= 2 & (rowSums(vr[,3:4])/rowSums(vr[,1:4]) >= 0.8)"
# filter <- "rowSums(vr) >= 20 & (rowSums(vr[,3:4])/rowSums(vr[,1:4]) >= 0.8)"
filterVars(args, filter, varcaller="bcftools", organism="A. thaliana")
writeTargetsout(x=args, file="targets_sambcf_filtered.txt")

## Annotate filtered variants from GATK
args <- systemArgs(sysma="annotate_vars.param", mytargets="targets_gatk_filtered.txt")
txdb <- loadDb("./data/tair10.sqlite")
fa <- FaFile(systemPipeR::reference(args))
variantReport(args=args, txdb=txdb, fa=fa, organism="A. thaliana")

## Annotate filtered variants from BCFtools
args <- systemArgs(sysma="annotate_vars.param", mytargets="targets_sambcf_filtered.txt")
txdb <- loadDb("./data/tair10.sqlite")
fa <- FaFile(systemPipeR::reference(args))
variantReport(args=args, txdb=txdb, fa=fa, organism="A. thaliana")

## Combine results from GATK
args <- systemArgs(sysma="annotate_vars.param", mytargets="targets_gatk_filtered.txt")
combineDF <- combineVarReports(args, filtercol=c(Consequence="nonsynonymous"))
write.table(combineDF, "./results/combineDF_nonsyn_gatk.xls", quote=FALSE, row.names=FALSE, sep="\t")

## Combine results from BCFtools
args <- systemArgs(sysma="annotate_vars.param", mytargets="targets_sambcf_filtered.txt")
combineDF <- combineVarReports(args, filtercol=c(Consequence="nonsynonymous"))
write.table(combineDF, "./results/combineDF_nonsyn_sambcf.xls", quote=FALSE, row.names=FALSE, sep="\t")

## Summary for GATK
args <- systemArgs(sysma="annotate_vars.param", mytargets="targets_gatk_filtered.txt")
write.table(varSummary(args), "./results/variantStats_gatk.xls", quote=FALSE, col.names = NA, sep="\t")

## Summary for BCFtools
args <- systemArgs(sysma="annotate_vars.param", mytargets="targets_sambcf_filtered.txt")
write.table(varSummary(args), "./results/variantStats_sambcf.xls", quote=FALSE, col.names = NA, sep="\t")

## Venn diagram of variants
args <- systemArgs(sysma="annotate_vars.param", mytargets="targets_gatk_filtered.txt")
varlist <- sapply(names(outpaths(args))[1:4], function(x) as.character(read.delim(outpaths(args)[x])$VARID))
vennset_gatk <- overLapper(varlist, type="vennsets")
args <- systemArgs(sysma="annotate_vars.param", mytargets="targets_sambcf_filtered.txt")
varlist <- sapply(names(outpaths(args))[1:4], function(x) as.character(read.delim(outpaths(args)[x])$VARID))
vennset_bcf <- overLapper(varlist, type="vennsets")
vennPlot(list(vennset_gatk, vennset_bcf), mymain="", mysub="GATK: red; BCFtools: blue", colmode=2, ccol=c("blue", "red"))

## End(Not run)
## Alignment with BWA (sequentially on single machine)
param <- system.file("extdata", "bwa.param", package="systemPipeR")
targets <- system.file("extdata", "targets.txt", package="systemPipeR")
args <- systemArgs(sysma=param, mytargets=targets)
sysargs(args)[1]

## Not run: 
library(VariantAnnotation)
system("bwa index -a bwtsw ./data/tair10.fasta")
bampaths <- runCommandline(args=args)

## Alignment with BWA (parallelized on compute cluster)
resources <- list(walltime=120, ntasks=1, ncpus=cores(args), memory=1024) 
reg <- clusterRun(args, conffile=".batchtools.conf.R", template="batchtools.slurm.tmpl", Njobs=18, runid="01", resourceList=resources)

## Variant calling with GATK
## The following creates in the inital step a new targets file
## (targets_bam.txt). The first column of this file gives the paths to
## the BAM files created in the alignment step. The new targets file and the
## parameter file gatk.param are used to create a new SYSargs
## instance for running GATK. Since GATK involves many processing steps, it is
## executed by a bash script gatk_run.sh where the user can specify the
## detailed run parameters. All three files are expected to be located in the
## current working directory. Samples files for gatk.param and
## gatk_run.sh are available in the subdirectory ./inst/extdata/ of the 
## source file of the systemPipeR package. 
writeTargetsout(x=args, file="targets_bam.txt")
system("java -jar CreateSequenceDictionary.jar R=./data/tair10.fasta O=./data/tair10.dict")
# system("java -jar /opt/picard/1.81/CreateSequenceDictionary.jar R=./data/tair10.fasta O=./data/tair10.dict")
args <- systemArgs(sysma="gatk.param", mytargets="targets_bam.txt")
resources <- list(walltime=120, ntasks=1, ncpus=cores(args), memory=1024) 
reg <- clusterRun(args, conffile=".batchtools.conf.R", template="batchtools.slurm.tmpl", Njobs=18, runid="01", resourceList=resources)
writeTargetsout(x=args, file="targets_gatk.txt")

## Variant calling with BCFtools
## The following runs the variant calling with BCFtools. This step requires in
## the current working directory the parameter file sambcf.param and the 
## bash script sambcf_run.sh.
args <- systemArgs(sysma="sambcf.param", mytargets="targets_bam.txt")
resources <- list(walltime=120, ntasks=1, ncpus=cores(args), memory=1024) 
reg <- clusterRun(args, conffile=".batchtools.conf.R", template="batchtools.slurm.tmpl", Njobs=18, runid="01", resourceList=resources)
writeTargetsout(x=args, file="targets_sambcf.txt")

## Filtering of VCF files generated by GATK
args <- systemArgs(sysma="filter_gatk.param", mytargets="targets_gatk.txt")
filter <- "totalDepth(vr) >= 2 & (altDepth(vr) / totalDepth(vr) >= 0.8) & rowSums(softFilterMatrix(vr))==4"
# filter <- "totalDepth(vr) >= 20 & (altDepth(vr) / totalDepth(vr) >= 0.8) & rowSums(softFilterMatrix(vr))==6"
filterVars(args, filter, varcaller="gatk", organism="A. thaliana")
writeTargetsout(x=args, file="targets_gatk_filtered.txt")

## Filtering of VCF files generated by BCFtools
args <- systemArgs(sysma="filter_sambcf.param", mytargets="targets_sambcf.txt")
filter <- "rowSums(vr) >= 2 & (rowSums(vr[,3:4])/rowSums(vr[,1:4]) >= 0.8)"
# filter <- "rowSums(vr) >= 20 & (rowSums(vr[,3:4])/rowSums(vr[,1:4]) >= 0.8)"
filterVars(args, filter, varcaller="bcftools", organism="A. thaliana")
writeTargetsout(x=args, file="targets_sambcf_filtered.txt")

## Annotate filtered variants from GATK
args <- systemArgs(sysma="annotate_vars.param", mytargets="targets_gatk_filtered.txt")
txdb <- loadDb("./data/tair10.sqlite")
fa <- FaFile(systemPipeR::reference(args))
variantReport(args=args, txdb=txdb, fa=fa, organism="A. thaliana")

## Annotate filtered variants from BCFtools
args <- systemArgs(sysma="annotate_vars.param", mytargets="targets_sambcf_filtered.txt")
txdb <- loadDb("./data/tair10.sqlite")
fa <- FaFile(systemPipeR::reference(args))
variantReport(args=args, txdb=txdb, fa=fa, organism="A. thaliana")

## Combine results from GATK
args <- systemArgs(sysma="annotate_vars.param", mytargets="targets_gatk_filtered.txt")
combineDF <- combineVarReports(args, filtercol=c(Consequence="nonsynonymous"))
write.table(combineDF, "./results/combineDF_nonsyn_gatk.xls", quote=FALSE, row.names=FALSE, sep="\t")

## Combine results from BCFtools
args <- systemArgs(sysma="annotate_vars.param", mytargets="targets_sambcf_filtered.txt")
combineDF <- combineVarReports(args, filtercol=c(Consequence="nonsynonymous"))
write.table(combineDF, "./results/combineDF_nonsyn_sambcf.xls", quote=FALSE, row.names=FALSE, sep="\t")

## Summary for GATK
args <- systemArgs(sysma="annotate_vars.param", mytargets="targets_gatk_filtered.txt")
write.table(varSummary(args), "./results/variantStats_gatk.xls", quote=FALSE, col.names = NA, sep="\t")

## Summary for BCFtools
args <- systemArgs(sysma="annotate_vars.param", mytargets="targets_sambcf_filtered.txt")
write.table(varSummary(args), "./results/variantStats_sambcf.xls", quote=FALSE, col.names = NA, sep="\t")

## Venn diagram of variants
args <- systemArgs(sysma="annotate_vars.param", mytargets="targets_gatk_filtered.txt")
varlist <- sapply(names(outpaths(args))[1:4], function(x) as.character(read.delim(outpaths(args)[x])$VARID))
vennset_gatk <- overLapper(varlist, type="vennsets")
args <- systemArgs(sysma="annotate_vars.param", mytargets="targets_sambcf_filtered.txt")
varlist <- sapply(names(outpaths(args))[1:4], function(x) as.character(read.delim(outpaths(args)[x])$VARID))
vennset_bcf <- overLapper(varlist, type="vennsets")
vennPlot(list(vennset_gatk, vennset_bcf), mymain="", mysub="GATK: red; BCFtools: blue", colmode=2, ccol=c("blue", "red"))

## End(Not run)

Plot 2-5 way Venn diagrams

Description

Ploting function of 2-5 way Venn diagrams from 'VENNset' objects or count set vectors. A useful feature is the possiblity to combine the counts from several Venn comparisons with the same number of label sets in a single Venn diagram.

Usage

vennPlot(x, mymain = "Venn Diagram", mysub = "default", setlabels = "default", yoffset = seq(0, 10, by = 0.34), ccol = rep(1, 31), colmode = 1, lcol = c("#FF0000", "#008B00", "#0000FF", "#FF00FF", "#CD8500"), lines = c("#FF0000", "#008B00", "#0000FF", "#FF00FF", "#CD8500"), mylwd = 3, diacol = 1, type = "ellipse", ccex = 1, lcex = 1, sepsplit = "_", ...)
vennPlot(x, mymain = "Venn Diagram", mysub = "default", setlabels = "default", yoffset = seq(0, 10, by = 0.34), ccol = rep(1, 31), colmode = 1, lcol = c("#FF0000", "#008B00", "#0000FF", "#FF00FF", "#CD8500"), lines = c("#FF0000", "#008B00", "#0000FF", "#FF00FF", "#CD8500"), mylwd = 3, diacol = 1, type = "ellipse", ccex = 1, lcex = 1, sepsplit = "_", ...)

Arguments

`x`	`VENNset` or list of `VENNset` objects. Alternatively, a `vector` of Venn counts or a `list` of vectors of Venn counts can be provided as input. If several Venn comparisons are provided in a `list` then their results are combined in a single Venn diagram, where the count sets are organized above each other.
`mymain`	Main title of plot.
`mysub`	Subtitle of plot. Default `mysub="default"` reports the number of unique items in all sets, as well as the number of unique items in each individual set, respectively.
`setlabels`	The argument `setlabels` allows to provide a vector of custom sample labels. However, assigning the proper names in the name slots of the initial `setlist` is preferred for tracking purposes.
`yoffset`	The results from several Venn comparisons can be combined in a single Venn diagram by assigning to `x` a `list` with several `VENNsets` or count vectors. The positonal offset of the count sets in the plot can be controlled with the `yoffset` argument. The argument setting `colmode` allows to assign different colors to each count set. For instance, with `colmode=2` one can assign to `ccol` a color vector or a list, such as `ccol=c("blue", "red")` or `ccol=list(1:8, 8:1)`.
`ccol`	Character or numeric `vector` to define colors of count values, e.g. `ccol=c("black","black","red")`.
`colmode`	See argument `yoffset`.
`lcol`	Character or numeric `vector` to define colors of set labels, e.g. `lcol=c("red", "green")`
`lines`	Character or numeric `vector` to define colors of lines in plot.
`mylwd`	Defines line width of shapes used in plot.
`diacol`	See argument `type`.
`type`	Defines shapes used to plot 4-way Venn diagram. Default `type="ellipse"` uses ellipses. The setting `type="circle"` returns an incomplete 4-way Venn diagram as circles. This representation misses two overlap sectors, but is sometimes easier to navigate than the default ellipse version. The missing Venn intersects are reported below the Venn diagram. Their font color can be controled with the argument `diacol`.
`ccex`	Controls font size for count values.
`lcex`	Controls font size for set labels.
`sepsplit`	Character used to separate sample labels in Venn counts.
`...`	Additional arguments to pass on.

Value

Venn diagram plot.

Note

The functions provided here are an extension of the Venn diagram resources on this site: http://manuals.bioinformatics.ucr.edu/home/R_BioCondManual#TOC-Venn-Diagrams

Author(s)

Thomas Girke

References

See examples in 'The Electronic Journal of Combinatorics': http://www.combinatorics.org/files/Surveys/ds5/VennSymmExamples.html

Examples

## Sample data
setlist <- list(A=sample(letters, 18), B=sample(letters, 16),
                C=sample(letters, 20), D=sample(letters, 22),
                E=sample(letters, 18), F=sample(letters, 22))

## 2-way Venn diagram
vennset <- overLapper(setlist[1:2], type="vennsets")
vennPlot(vennset)

## 3-way Venn diagram
vennset <- overLapper(setlist[1:3], type="vennsets")
vennPlot(vennset)

## 4-way Venn diagram
vennset <- overLapper(setlist[1:4], type="vennsets")
vennPlot(list(vennset, vennset))

## Pseudo 4-way Venn diagram with circles
vennPlot(vennset, type="circle")

## 5-way Venn diagram
vennset <- overLapper(setlist[1:5], type="vennsets")
vennPlot(vennset)

## Alternative Venn count input to vennPlot (not recommended!)
counts <- sapply(vennlist(vennset), length)
vennPlot(counts)

## 6-way Venn comparison as bar plot
vennset <- overLapper(setlist[1:6], type="vennsets")
olBarplot(vennset, mincount=1)

## Bar plot of standard intersect counts
interset <- overLapper(setlist, type="intersects")
olBarplot(interset, mincount=1)

## Accessor methods for VENNset/INTERSECTset objects
names(vennset)
names(interset)
setlist(vennset)
intersectmatrix(vennset)
complexitylevels(vennset)
vennlist(vennset)
intersectlist(interset)

## Coerce VENNset/INTERSECTset object to list
as.list(vennset)
as.list(interset)

## Pairwise intersect matrix and heatmap
olMA <- sapply(names(setlist), 
		function(x) sapply(names(setlist), 
		function(y) sum(setlist[[x]] %in% setlist[[y]])))
olMA
heatmap(olMA, Rowv=NA, Colv=NA)

## Presence-absence matrices for large numbers of sample sets
interset <- overLapper(setlist=setlist, type="intersects", complexity=2)
(paMA <- intersectmatrix(interset))
heatmap(paMA, Rowv=NA, Colv=NA, col=c("white", "gray")) 
## Sample data
setlist <- list(A=sample(letters, 18), B=sample(letters, 16),
                C=sample(letters, 20), D=sample(letters, 22),
                E=sample(letters, 18), F=sample(letters, 22))

## 2-way Venn diagram
vennset <- overLapper(setlist[1:2], type="vennsets")
vennPlot(vennset)

## 3-way Venn diagram
vennset <- overLapper(setlist[1:3], type="vennsets")
vennPlot(vennset)

## 4-way Venn diagram
vennset <- overLapper(setlist[1:4], type="vennsets")
vennPlot(list(vennset, vennset))

## Pseudo 4-way Venn diagram with circles
vennPlot(vennset, type="circle")

## 5-way Venn diagram
vennset <- overLapper(setlist[1:5], type="vennsets")
vennPlot(vennset)

## Alternative Venn count input to vennPlot (not recommended!)
counts <- sapply(vennlist(vennset), length)
vennPlot(counts)

## 6-way Venn comparison as bar plot
vennset <- overLapper(setlist[1:6], type="vennsets")
olBarplot(vennset, mincount=1)

## Bar plot of standard intersect counts
interset <- overLapper(setlist, type="intersects")
olBarplot(interset, mincount=1)

## Accessor methods for VENNset/INTERSECTset objects
names(vennset)
names(interset)
setlist(vennset)
intersectmatrix(vennset)
complexitylevels(vennset)
vennlist(vennset)
intersectlist(interset)

## Coerce VENNset/INTERSECTset object to list
as.list(vennset)
as.list(interset)

## Pairwise intersect matrix and heatmap
olMA <- sapply(names(setlist), 
		function(x) sapply(names(setlist), 
		function(y) sum(setlist[[x]] %in% setlist[[y]])))
olMA
heatmap(olMA, Rowv=NA, Colv=NA)

## Presence-absence matrices for large numbers of sample sets
interset <- overLapper(setlist=setlist, type="intersects", complexity=2)
(paMA <- intersectmatrix(interset))
heatmap(paMA, Rowv=NA, Colv=NA, col=c("white", "gray"))

Class `"VENNset"`

Description

Container for storing Venn intersect results created by the overLapper function. The setlist slot stores the original label sets as vectors in a list; intersectmatrix organizes the label sets in a present-absent matrix; complexitylevels represents the number of comparisons considered for each comparison set as vector of integers; and vennlist contains the Venn intersect vectors.

Objects from the Class

Objects can be created by calls of the form new("VENNset", ...).

Slots

setlist:: Object of class "list": list of vectors
intersectmatrix:: Object of class "matrix": binary matrix
complexitylevels:: Object of class "integer": vector of integers
vennlist:: Object of class "list": list of vectors

Methods

as.list: signature(x = "VENNset"): coerces VENNset to list
coerce: signature(from = "list", to = "VENNset"): as(list, "VENNset")
complexitylevels: signature(x = "VENNset"): extracts data from complexitylevels slot
intersectmatrix: signature(x = "VENNset"): extracts data from intersectmatrix slot
length: signature(x = "VENNset"): returns number of original label sets
names: signature(x = "VENNset"): extracts slot names
setlist: signature(x = "VENNset"): extracts data from setlist slot
show: signature(object = "VENNset"): summary view of VENNset objects
vennlist: signature(x = "VENNset"): extracts data from vennset slot

Author(s)

Thomas Girke

Examples

showClass("VENNset")

## Sample data
setlist <- list(A=sample(letters, 18), B=sample(letters, 16),
                C=sample(letters, 20), D=sample(letters, 22),
                E=sample(letters, 18), F=sample(letters, 22))

## Create VENNset
vennset <- overLapper(setlist[1:5], type="vennsets")
class(vennset)

## Accessor methods for VENNset/INTERSECTset objects
names(vennset)
setlist(vennset)
intersectmatrix(vennset)
complexitylevels(vennset)
vennlist(vennset)

## Coerce VENNset/INTERSECTset object to list
as.list(vennset)
showClass("VENNset")

## Sample data
setlist <- list(A=sample(letters, 18), B=sample(letters, 16),
                C=sample(letters, 20), D=sample(letters, 22),
                E=sample(letters, 18), F=sample(letters, 22))

## Create VENNset
vennset <- overLapper(setlist[1:5], type="vennsets")
class(vennset)

## Accessor methods for VENNset/INTERSECTset objects
names(vennset)
setlist(vennset)
intersectmatrix(vennset)
complexitylevels(vennset)
vennlist(vennset)

## Coerce VENNset/INTERSECTset object to list
as.list(vennset)

Writeout SYSargsList object

Description

Function to writeout SYSargsList workflow control environment (S4 object) object.

Usage

write_SYSargsList(sysargs, sys.file=".SPRproject/SYSargsList.yml", silent=TRUE)
write_SYSargsList(sysargs, sys.file=".SPRproject/SYSargsList.yml", silent=TRUE)

Arguments

`sysargs`	object of class `SYSargsList`.
`sys.file`	name and path of the SYSargsList file which will store all the project configuration information. Default is `.SPRproject/SYSargsList.yml`.
`silent`	if set to TRUE, all messages returned by the function will be suppressed.

Value

write_SYSargsList will return a sys.file path.

Author(s)

Daniela Cassol

Examples

## Construct SYSargsList object from Rmd file
sal <- SPRproject(overwrite=TRUE)
targetspath <- system.file("extdata/cwl/example/targets_example.txt", package="systemPipeR")

## Constructor and `appendStep<-`
appendStep(sal) <- SYSargsList(step_name = "echo", 
                               targets=targetspath, dir=TRUE,
                               wf_file="example/workflow_example.cwl", input_file="example/example.yml", 
                               dir_path = system.file("extdata/cwl", package="systemPipeR"),
                               inputvars = c(Message = "_STRING_", SampleName = "_SAMPLE_"))
appendStep(sal) <- LineWise(code = {
                            hello <- lapply(getColumn(sal, step=1, 'outfiles'), function(x) yaml::read_yaml(x))
                            }, 
                            step_name = "R_read", 
                            dependency = "echo")

sal <- write_SYSargsList(sal)
sal
## Construct SYSargsList object from Rmd file
sal <- SPRproject(overwrite=TRUE)
targetspath <- system.file("extdata/cwl/example/targets_example.txt", package="systemPipeR")

## Constructor and `appendStep<-`
appendStep(sal) <- SYSargsList(step_name = "echo", 
                               targets=targetspath, dir=TRUE,
                               wf_file="example/workflow_example.cwl", input_file="example/example.yml", 
                               dir_path = system.file("extdata/cwl", package="systemPipeR"),
                               inputvars = c(Message = "_STRING_", SampleName = "_SAMPLE_"))
appendStep(sal) <- LineWise(code = {
                            hello <- lapply(getColumn(sal, step=1, 'outfiles'), function(x) yaml::read_yaml(x))
                            }, 
                            step_name = "R_read", 
                            dependency = "echo")

sal <- write_SYSargsList(sal)
sal

Write updated targets out to file from `SYSargsList`

Description

Convenience write function for generating targets files containing the paths to files generated by input processes. These processes can be commandline- or R-based software. Typically, the paths to the inputs are stored in the targets infile targetsWF(sysargs) for SYSargsList objects. Note: by default the function cannot overwrite any existing files. If a file exists then the user has to explicitly remove it or set overwrite=TRUE.

Usage

writeTargets(sysargs, step, file = "default", silent = FALSE, overwrite = FALSE)
writeTargets(sysargs, step, file = "default", silent = FALSE, overwrite = FALSE)

Arguments

`sysargs`	Object of class `SYSargsList`.
`step`	character with the step name. To check all the names, please use `stepName(sysargs)`.
`file`	Name and path of the output file. If set to "default" then the name of the output file will have the pattern 'targets_<stepName>.txt'.
`silent`	If set to `TRUE`, all messages returned by the function will be suppressed.
`overwrite`	If set to `TRUE`, existing files of same name will be overwritten.

Value

Writes tabular targes files containing the header/comment lines from stepsWF(sysargs)[[step]][["targetsheader"]] and the columns from targetsWF(sysargs)[[step]].

Author(s)

Daniela Cassol

Examples

## Construct SYSargsList object from Rmd file
sal <- SPRproject(overwrite=TRUE)
targetspath <- system.file("extdata/cwl/example/targets_example.txt", package="systemPipeR")

## Constructor and `appendStep<-`
appendStep(sal) <- SYSargsList(step_name = "echo", 
                               targets=targetspath, dir=TRUE,
                               wf_file="example/workflow_example.cwl", input_file="example/example.yml", 
                               dir_path = system.file("extdata/cwl", package="systemPipeR"),
                               inputvars = c(Message = "_STRING_", SampleName = "_SAMPLE_"))
appendStep(sal) <- LineWise(code = {
                            hello <- lapply(getColumn(sal, step=1, 'outfiles'), function(x) yaml::read_yaml(x))
                            }, 
                            step_name = "R_read", 
                            dependency = "echo")
writeTargets(sal, "echo")
## Construct SYSargsList object from Rmd file
sal <- SPRproject(overwrite=TRUE)
targetspath <- system.file("extdata/cwl/example/targets_example.txt", package="systemPipeR")

## Constructor and `appendStep<-`
appendStep(sal) <- SYSargsList(step_name = "echo", 
                               targets=targetspath, dir=TRUE,
                               wf_file="example/workflow_example.cwl", input_file="example/example.yml", 
                               dir_path = system.file("extdata/cwl", package="systemPipeR"),
                               inputvars = c(Message = "_STRING_", SampleName = "_SAMPLE_"))
appendStep(sal) <- LineWise(code = {
                            hello <- lapply(getColumn(sal, step=1, 'outfiles'), function(x) yaml::read_yaml(x))
                            }, 
                            step_name = "R_read", 
                            dependency = "echo")
writeTargets(sal, "echo")

Write updated targets out to file

Description

Convenience write function for generating targets files with updated FileName columns containing the paths to files generated by input/output processes. These processes can be commandline- or R-based software. Typically, the paths to the inputs are stored in the targets infile (targetsin(args) for SYSargs objects or targets.as.df(targets(WF)) for SYSargs2 objects) and the outputs are stored in the targets outfile (targetsout(args) for SYSargs objects or output(WF)) for SYSargs2 objects. Note: by default the function cannot overwrite any existing files. If a file exists then the user has to explicitly remove it or set overwrite=TRUE.

Usage

writeTargetsout(x, file = "default", silent = FALSE, overwrite = FALSE, step = NULL, new_col=NULL, new_col_output_index=NULL, remove=FALSE, ...)
writeTargetsout(x, file = "default", silent = FALSE, overwrite = FALSE, step = NULL, new_col=NULL, new_col_output_index=NULL, remove=FALSE, ...)

Arguments

`x`	Object of class `SYSargs` or `SYSargs2`.
`file`	Name and path of the output file. If set to "default" then the name of the output file will have the pattern 'targets_<software>.txt', where <software> will be what `software(x)` returns, when `x` is an object of class `SYSargs`. For an object of class `SYSargs2`, the output file will have the pattern 'targets_<software>_<step>.txt', where <software> will be the workflow name (`files(x)$cwl`) and <step> will be the step chosen in the argument `step`.
`silent`	If set to `TRUE`, all messages returned by the function will be suppressed.
`overwrite`	If set to `TRUE`, existing files of same name will be overwritten.
`step`	Name or numeric position of the step in the workflow to write the output files. The names can be check running `names(x$clt)`.
`new_col`	A vector of character strings of the new column name to add to target file.
`new_col_output_index`	A vector of numeric index positions of the file in `SYSargs2` class `output`. It requires `new_col` definition.
`remove`	To remove the original columns. Default is `FALSE`.
`...`	To pass on additional arguments.

Value

Writes tabular targes files containing the header/comment lines from targetsheader(x) and the columns from targetsout(x).

Author(s)

Daniela Cassol and Thomas Girke

Examples

##########################################
## Examples with \code{SYSargs2} object ##
##########################################
## Construct SYSargs2 object 
targets <- system.file("extdata", "targets.txt", package="systemPipeR")
dir_path <- system.file("extdata/cwl", package="systemPipeR")
WF <- loadWorkflow(targets=targets, wf_file="hisat2/hisat2-mapping-se.cwl", 
                  input_file="hisat2/hisat2-mapping-se.yml", dir_path=dir_path)
WF <- renderWF(WF, inputvars=c(FileName="_FASTQ_PATH1_", SampleName="_SampleName_"))
WF
## Write targets out file
names(WF$clt)
writeTargetsout(x=WF, file="default", step=1, new_col=c("sam_file"), new_col_output_index=c(1))

##########################################
## Examples with \code{SYSargs2} object ##
##########################################
## Construct SYSargs2 object 
targets <- system.file("extdata", "targets.txt", package="systemPipeR")
dir_path <- system.file("extdata/cwl", package="systemPipeR")
WF <- loadWorkflow(targets=targets, wf_file="hisat2/hisat2-mapping-se.cwl", 
                  input_file="hisat2/hisat2-mapping-se.yml", dir_path=dir_path)
WF <- renderWF(WF, inputvars=c(FileName="_FASTQ_PATH1_", SampleName="_SampleName_"))
WF
## Write targets out file
names(WF$clt)
writeTargetsout(x=WF, file="default", step=1, new_col=c("sam_file"), new_col_output_index=c(1))

Generate targets file with reference

Description

Generates targets file with sample-wise reference as required for some NGS applications, such as ChIP-Seq containing input samples. The reference sample information needs to be provided in the input file in a column called SampleReference where the values reference the labels used in the SampleName column. Sample rows without reference assignments will be removed automatically.

Usage

writeTargetsRef(infile, outfile, silent = FALSE, overwrite = FALSE, ...)
writeTargetsRef(infile, outfile, silent = FALSE, overwrite = FALSE, ...)

Arguments

`infile`	Path to input targets file.
`outfile`	Path to output targets file.
`silent`	If set to `TRUE`, all messages returned by the function will be suppressed.
`overwrite`	If set to `TRUE`, existing files of same name will be overwritten.
`...`	To pass on additional arguments.

Value

Generates modified targets file with the paths to the reference samples in the second column named FileName2. Note, sample rows not assigned reference samples are removed automatically.

Author(s)

Thomas Girke

Examples

## Path to input targets file
targets <- system.file("extdata", "targets_chip.txt", package="systemPipeR")

## Not run: 
## Write modified targets file with reference (e.g. input) sample
writeTargetsRef(infile=targets, outfile="~/targets_refsample.txt", silent=FALSE, overwrite=FALSE)

## End(Not run)
## Path to input targets file
targets <- system.file("extdata", "targets_chip.txt", package="systemPipeR")

## Not run: 
## Write modified targets file with reference (e.g. input) sample
writeTargetsRef(infile=targets, outfile="~/targets_refsample.txt", silent=FALSE, overwrite=FALSE)

## End(Not run)

Package 'systemPipeR'

Help Index

systemPipeR package for Workflow Environment

Description

Author(s)

References

Extension accessor methods for SummarizedExperiment object

Description

Usage

Arguments

Methods

Author(s)

Alignment statistics

Description

Usage

Arguments

Value

Author(s)

See Also

Examples

Class "catDB"

Description

Objects from the Class

Slots

Methods

Author(s)

See Also

Examples

catDB accessor methods

Description

Usage

Arguments

Value

Author(s)

Examples

Checking if the outfiles files exist

Description

Usage

Arguments

Value

Author(s)

See Also

Examples

Submit command-line tools to cluster

Description

Usage

Arguments

Value

Author(s)

References

See Also

Examples

Adding param file

Description

Usage

Arguments

Author(s)

Examples

Workflow Steps Selection

Description

Usage

Arguments

Author(s)

Examples

Read counting for several range sets

Description

Usage

Arguments

Value

Author(s)

See Also

Examples

createParam

Description

Usage

Arguments

Details

Version 1 syntax

Version 2 syntax

Value

Class `"catDB"`

Class `"EnvModules"`

Toggles option `eval` on the RMarkdown files