Title: | systemPipeR: Workflow Environment for Data Analysis and Report Generation |
---|---|
Description: | systemPipeR is a multipurpose data analysis workflow environment that unifies R with command-line tools. It enables scientists to analyze many types of large- or small-scale data on local or distributed computer systems with a high level of reproducibility, scalability and portability. At its core is a command-line interface (CLI) that adopts the Common Workflow Language (CWL). This design allows users to choose for each analysis step the optimal R or command-line software. It supports both end-to-end and partial execution of workflows with built-in restart functionalities. Efficient management of complex analysis tasks is accomplished by a flexible workflow control container class. Handling of large numbers of input samples and experimental designs is facilitated by consistent sample annotation mechanisms. As a multi-purpose workflow toolkit, systemPipeR enables users to run existing workflows, customize them or design entirely new ones while taking advantage of widely adopted data structures within the Bioconductor ecosystem. Another important core functionality is the generation of reproducible scientific analysis and technical reports. For result interpretation, systemPipeR offers a wide range of plotting functionality, while an associated Shiny App offers many useful functionalities for interactive result exploration. The vignettes linked from this page include (1) a general introduction, (2) a description of technical details, and (3) a collection of workflow templates. |
Authors: | Thomas Girke |
Maintainer: | Thomas Girke <[email protected]> |
License: | Artistic-2.0 |
Version: | 2.13.0 |
Built: | 2024-11-27 05:13:56 UTC |
Source: | https://github.com/bioc/systemPipeR |
The systemPipeR package provides a suite of R/Bioconductor tools for designing, building and running end-to-end analysis workflows on local machines, HPC clusters and cloud systems, while at the same time generating publication-quality analysis reports.
For detailed information on usage, see the package vignette by typing vignette("systemPipeR"), or find more information on the project website: https://systempipe.org/spr
All software-related questions should be posted to the Bioconductor Support Site: https://support.bioconductor.org
The code can be viewed at the GitHub repository: https://github.com/tgirke/systemPipeR
Daniela Cassol, Tyler Backman, Thomas Girke
Backman TWH, Girke T (2016) systemPipeR: NGS workflow and report generation environment. BMC Bioinformatics 17 (1). https://doi.org/10.1186/s12859-016-1241-0
Accessors for adding new data to the 'assay' and 'metadata' slots of a SummarizedExperiment object
addAssay(x, ...) addMetadata(x, ...)
x |
Object of class |
... |
dots, name of the object. |
signature(x = "SummarizedExperiment"): add new dataset to the assays slot
signature(x = "SummarizedExperiment"): add new dataset to the metadata slot
Daniela Cassol
Generates a data frame containing important read alignment statistics, such as the total number of reads in the FASTQ files, the total number of alignments, and the number of primary alignments in the corresponding BAM files.
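As a rough illustration of how these statistics relate to each other, the base-R sketch below turns hypothetical read and alignment counts into percentages. The column names (Nreads2x, Perc_Aligned, and so on) and the doubling of read counts for paired-end data are assumptions meant to mirror what alignStats reports; the helper summarize_alignments() is hypothetical and does not read any FASTQ or BAM files.

```r
# Hypothetical per-sample counts (toy values, not real data):
# reads in the FASTQ file, all alignments and primary alignments in the BAM file.
counts <- data.frame(
  FileName       = c("M1A", "M1B"),
  Nreads         = c(1000, 2000),
  Nalign         = c(950, 1900),
  Nalign_Primary = c(900, 1800),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for paired-end data the read count of one mate file is
# doubled (assumption), then alignments are expressed as a percentage of reads.
summarize_alignments <- function(df, pairEnd = TRUE) {
  nreads <- if (pairEnd) 2 * df$Nreads else df$Nreads
  data.frame(
    FileName             = df$FileName,
    Nreads2x             = nreads,
    Nalign               = df$Nalign,
    Perc_Aligned         = 100 * df$Nalign / nreads,
    Nalign_Primary       = df$Nalign_Primary,
    Perc_Aligned_Primary = 100 * df$Nalign_Primary / nreads,
    stringsAsFactors = FALSE
  )
}

stats <- summarize_alignments(counts, pairEnd = FALSE)
print(stats)
```

Reporting primary alignments separately matters because multi-mapping reads can produce several alignments each, so Nalign alone can overstate how many reads actually aligned.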
alignStats(args, fqpaths, pairEnd = TRUE, output_index = 1, subset="FileName1")
args |
Object of class |
fqpaths |
|
pairEnd |
logical. For paired-end libraries, select |
output_index |
A numeric index giving the position of the file in |
subset |
|
data.frame with alignment statistics.
Thomas Girke
clusterRun, runCommandline and output_update
##########################################
## Examples with SYSargs2 object ##
##########################################
## Construct SYSargs2 object from CWL param, CWL input, and targets files
targetspath <- system.file("extdata", "targets.txt", package="systemPipeR")
dir_path <- system.file("extdata/cwl", package="systemPipeR")
WF <- loadWorkflow(targets=targetspath, wf_file="hisat2/hisat2-mapping-se.cwl",
                   input_file="hisat2/hisat2-mapping-se.yml", dir_path=dir_path)
WF <- renderWF(WF, inputvars=c(FileName="_FASTQ_PATH1_", SampleName="_SampleName_"))
WF
names(WF); modules(WF); targets(WF)[1]; cmdlist(WF)[1:2]; output(WF)
## Not run:
## Execute SYSargs2 on a single machine
WF <- runCommandline(args=WF, make_bam=TRUE)
## Alignment stats
read_statsDF <- alignStats(WF, subset="FileName")
write.table(read_statsDF, "results/alignStats.xls", row.names=FALSE, quote=FALSE, sep="\t")
## End(Not run)
#########################################
## Examples with SYSargs object ##
#########################################
## Construct SYSargs object from param and targets files
param <- system.file("extdata", "hisat2.param", package="systemPipeR")
targets <- system.file("extdata", "targets.txt", package="systemPipeR")
args <- systemArgs(sysma=param, mytargets=targets)
args
names(args); modules(args); cores(args); outpaths(args); sysargs(args)
## Not run:
## Execute SYSargs on a single machine
runCommandline(args=args)
## Alignment stats
read_statsDF <- alignStats(args)
write.table(read_statsDF, "results/alignStats.xls", row.names=FALSE, quote=FALSE, sep="\t")
## End(Not run)
"catDB"
Container for storing mappings of genes to annotation categories such as gene ontologies (GO), pathways or conserved sequence domains. The catmap slot stores a list of data.frames providing the direct assignments of genes to annotation categories (e.g. gene-to-GO mappings); catlist is a list of lists of all direct and indirect associations to the annotation categories (e.g. genes mapped to a pathway); and idconv allows storing a lookup table for converting identifiers (e.g. array feature IDs to gene IDs).
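To make the difference between direct and indirect associations concrete, the sketch below builds both kinds of mapping from toy data in base R. The gene and category identifiers, the ancestors list, and the helper build_catlist() are hypothetical; makeCATdb derives the real mappings from GO annotation files and the GO hierarchy, and its implementation may differ.

```r
# Hypothetical direct gene-to-category assignments (toy data),
# analogous to one data.frame in the catmap slot.
direct <- data.frame(
  GOID   = c("GO:B", "GO:C", "GO:C"),
  GeneID = c("g1", "g2", "g3"),
  stringsAsFactors = FALSE
)

# Hypothetical ontology: ancestors of each category (GO:A is the root).
ancestors <- list("GO:A" = character(0), "GO:B" = "GO:A", "GO:C" = c("GO:B", "GO:A"))

# Direct associations only: genes grouped by the category they are annotated to.
direct_list <- split(direct$GeneID, direct$GOID)

# Direct plus indirect associations: each gene is also assigned to every
# ancestor of its category (analogous to an entry of the catlist slot).
build_catlist <- function(direct, ancestors) {
  out <- setNames(vector("list", length(ancestors)), names(ancestors))
  for (i in seq_len(nrow(direct))) {
    categories <- c(direct$GOID[i], ancestors[[direct$GOID[i]]])
    for (category in categories) {
      out[[category]] <- union(out[[category]], direct$GeneID[i])
    }
  }
  out
}

catlist <- build_catlist(direct, ancestors)
str(catlist)
```

Here the root category GO:A has no direct annotations, yet it collects all three genes through its descendants, which is the kind of association catlist is meant to hold.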
Objects can be created by calls of the form new("catDB", ...).
catmap: Object of class "list"; a list of data.frames
catlist: Object of class "list"; a list of lists
idconv: Object of class "ANY"; a list of data.frames
signature(x = "catDB"): extracts data from the catlist slot
signature(x = "catDB"): extracts data from the catmap slot
signature(from = "list", to = "catDB"): as(list, "catDB")
signature(x = "catDB"): extracts data from the idconv slot
signature(x = "catDB"): extracts slot names
signature(object = "catDB"): summary view of catDB objects
Thomas Girke
makeCATdb, GOHyperGAll, GOHyperGAll_Subset, GOHyperGAll_Simplify, GOCluster_Report, goBarplot
showClass("catDB")
## Not run:
## Obtain annotations from BioMart
library("biomaRt")
listMarts() # To choose BioMart database
listMarts(host = "plants.ensembl.org")
m <- useMart("plants_mart", host = "plants.ensembl.org")
listDatasets(m)
m <- useMart("plants_mart", dataset = "athaliana_eg_gene", host = "plants.ensembl.org")
listAttributes(m) # Choose data types you want to download
go <- getBM(attributes = c("go_id", "tair_locus", "namespace_1003"), mart = m)
go <- go[go[, 3] != "", ]
go[, 3] <- as.character(go[, 3])
go[go[, 3] == "molecular_function", 3] <- "F"
go[go[, 3] == "biological_process", 3] <- "P"
go[go[, 3] == "cellular_component", 3] <- "C"
go[1:4, ]
dir.create("./data/GO", recursive = TRUE)
write.table(go, "data/GO/GOannotationsBiomart_mod.txt", quote = FALSE, row.names = FALSE, col.names = FALSE, sep = "\t")
## Create catDB instance (takes a while but needs to be done only once)
catdb <- makeCATdb(myfile = "data/GO/GOannotationsBiomart_mod.txt", lib = NULL, org = "", colno = c(1, 2, 3), idconv = NULL)
catdb
save(catdb, file = "data/GO/catdb.RData")
load("data/GO/catdb.RData")
## End(Not run)
Methods to access information from a catDB object.
catmap(x)
x |
object of class |
various outputs
Thomas Girke
## Not run:
## Obtain annotations from BioMart
library("biomaRt")
m <- useMart("plants_mart", host = "plants.ensembl.org"); listDatasets(m)
m <- useMart("plants_mart", dataset = "athaliana_eg_gene", host = "plants.ensembl.org")
listAttributes(m) # Choose data types you want to download
go <- getBM(attributes = c("go_id", "tair_locus", "namespace_1003"), mart = m)
go <- go[go[, 3] != "", ]; go[, 3] <- as.character(go[, 3])
go[go[, 3] == "molecular_function", 3] <- "F"
go[go[, 3] == "biological_process", 3] <- "P"
go[go[, 3] == "cellular_component", 3] <- "C"
write.table(go, "GOannotationsBiomart_mod.txt", quote=FALSE, row.names=FALSE, col.names=FALSE, sep="\t")
## Create catDB instance (takes a while but needs to be done only once)
catdb <- makeCATdb(myfile="GOannotationsBiomart_mod.txt", lib=NULL, org="", colno=c(1,2,3), idconv=NULL)
catdb
## Access methods for catDB
catmap(catdb)$D_MF[1:4,]
catlist(catdb)$L_MF[1:4]
idconv(catdb)
## End(Not run)
This function returns a data.frame indicating the number of existing files and how many files are missing.
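A minimal sketch of this kind of check, assuming a named list of expected output paths per sample: each path is tested with file.exists() and the results are tallied. The helper check_outputs() and its column names are illustrative assumptions, not the check.output implementation.

```r
# Hypothetical helper: count expected output files that exist or are missing.
# 'outfiles' is assumed to be a named list of file paths, one entry per sample.
check_outputs <- function(outfiles) {
  exists_per_sample <- lapply(outfiles, file.exists)
  data.frame(
    Sample         = names(outfiles),
    Total_Files    = vapply(exists_per_sample, length, integer(1)),
    Existing_Files = vapply(exists_per_sample, sum, integer(1)),
    Missing_Files  = vapply(exists_per_sample, function(x) sum(!x), integer(1)),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

# Toy example: only one of four expected files is created in a temporary directory.
outdir <- tempfile("outdir")
dir.create(outdir)
invisible(file.create(file.path(outdir, "M1A.bam")))
outfiles <- list(
  M1A = file.path(outdir, c("M1A.bam", "M1A.bam.bai")),
  M1B = file.path(outdir, c("M1B.bam", "M1B.bam.bai"))
)
res <- check_outputs(outfiles)
print(res)
```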
check.output(sysargs, type="data.frame") check.outfiles(sysargs, type="data.frame")
sysargs |
object of class |
type |
return object option. It can be |
data.frame or list containing the file information for all output files.
Daniela Cassol and Thomas Girke
## Construct SYSargs2 object
targets <- system.file("extdata", "targets.txt", package="systemPipeR")
dir_path <- system.file("extdata/cwl", package="systemPipeR")
WF <- loadWorkflow(targets=targets, wf_file="hisat2/hisat2-mapping-se.cwl",
                   input_file="hisat2/hisat2-mapping-se.yml", dir_path=dir_path)
WF <- renderWF(WF, inputvars=c(FileName="_FASTQ_PATH1_", SampleName="_SampleName_"))
WF
## Check output
check.output(WF)
check.output(WF, "list")
## Construct SYSargsList object
sal <- SPRproject(overwrite=TRUE)
targetspath <- system.file("extdata/cwl/example/targets_example.txt", package="systemPipeR")
appendStep(sal) <- SYSargsList(step_name = "echo", targets=targetspath, dir=TRUE,
                               wf_file="example/workflow_example.cwl", input_file="example/example.yml",
                               dir_path = system.file("extdata/cwl", package="systemPipeR"),
                               inputvars = c(Message = "_STRING_", SampleName = "_SAMPLE_"))
## Check outfiles
check.outfiles(sal)
Submits non-R command-line software to queueing/scheduling systems of compute clusters using run specifications defined by functions similar to runCommandline. clusterRun can be used with most queueing systems since it is based on utilities from the batchtools package, which supports the use of template files (*.tmpl) for defining the run parameters of the different schedulers. The path to the *.tmpl file needs to be specified in a conf file provided under the conffile argument.
clusterRun(args, FUN = runCommandline, more.args = list(args = args, make_bam = TRUE), conffile = ".batchtools.conf.R", template = "batchtools.slurm.tmpl", Njobs, runid = "01", resourceList)
args |
Object of class |
FUN |
Accepts functions such as |
more.args |
Object of class |
conffile |
Path to conf file (default location |
template |
Template files for specific queueing/scheduling systems can be downloaded from: https://github.com/mllg/batchtools/tree/master/inst/templates. Slurm, PBS/Torque, and Sun Grid Engine (SGE) templates are provided. |
Njobs |
Integer defining the number of cluster jobs. For instance, if |
runid |
Run identifier used in the log file that tracks system call commands.
Default is |
resourceList |
|
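The Njobs argument determines how many cluster jobs the input samples are distributed across. The sketch below shows one simple way to split sample indices into near-equal, contiguous groups in base R; the helper chunk_samples() is hypothetical, and batchtools performs its own chunking internally, which may assign samples differently. Under this scheme, if the targets file lists 18 samples, Njobs=18 (as in the examples) would give one sample per job.

```r
# Hypothetical helper: split sample indices into at most Njobs contiguous
# groups whose sizes differ by at most one.
chunk_samples <- function(n_samples, Njobs) {
  Njobs <- min(Njobs, n_samples)                 # never more jobs than samples
  job_id <- sort(rep_len(seq_len(Njobs), n_samples))
  split(seq_len(n_samples), job_id)
}

chunks <- chunk_samples(18, 4)
lengths(chunks)
```

Fewer jobs than samples means each job processes several samples sequentially, trading queue overhead against wall time per job.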
Object of class Registry, as well as files and directories created by the executed command-line tools.
Daniela Cassol and Thomas Girke
For more details on batchtools, please consult the following page: https://github.com/mllg/batchtools/
clusterRun replaces the older functions getQsubargs and qsubRun.
#########################################
## Examples with SYSargs object ##
#########################################
## Construct SYSargs object from param and targets files
param <- system.file("extdata", "hisat2.param", package="systemPipeR")
targets <- system.file("extdata", "targets.txt", package="systemPipeR")
args <- systemArgs(sysma=param, mytargets=targets)
args
names(args); modules(args); cores(args); outpaths(args); sysargs(args)
## Not run:
## Execute SYSargs on multiple machines of a compute cluster. The following
## example uses the conf and template files for the Slurm scheduler. Please
## read the instructions on how to obtain the corresponding files for other schedulers.
file.copy(system.file("extdata", ".batchtools.conf.R", package="systemPipeR"), ".")
file.copy(system.file("extdata", "batchtools.slurm.tmpl", package="systemPipeR"), ".")
resources <- list(walltime=120, ntasks=1, ncpus=cores(args), memory=1024)
reg <- clusterRun(args, FUN = runCommandline, more.args = list(args = args, make_bam = TRUE),
                  conffile=".batchtools.conf.R", template="batchtools.slurm.tmpl",
                  Njobs=18, runid="01", resourceList=resources)
## Monitor progress of submitted jobs
getStatus(reg=reg)
file.exists(outpaths(args))
## End(Not run)
##########################################
## Examples with SYSargs2 object ##
##########################################
## Construct SYSargs2 object from CWL param, CWL input, and targets files
targets <- system.file("extdata", "targets.txt", package="systemPipeR")
dir_path <- system.file("extdata/cwl", package="systemPipeR")
WF <- loadWorkflow(targets=targets, wf_file="hisat2/hisat2-mapping-se.cwl",
                   input_file="hisat2/hisat2-mapping-se.yml", dir_path=dir_path)
WF <- renderWF(WF, inputvars=c(FileName="_FASTQ_PATH1_", SampleName="_SampleName_"))
WF
names(WF); modules(WF); targets(WF)[1]; cmdlist(WF)[1:2]; output(WF)
## Not run:
## Execute SYSargs2 on multiple machines of a compute cluster. The following
## example uses the conf and template files for the Slurm scheduler. Please
## read the instructions on how to obtain the corresponding files for other schedulers.
file.copy(system.file("extdata", ".batchtools.conf.R", package="systemPipeR"), ".")
file.copy(system.file("extdata", "batchtools.slurm.tmpl", package="systemPipeR"), ".")
resources <- list(walltime=120, ntasks=1, ncpus=4, memory=1024)
reg <- clusterRun(WF, FUN = runCommandline, more.args = list(args = WF, make_bam = TRUE),
                  conffile=".batchtools.conf.R", template="batchtools.slurm.tmpl",
                  Njobs=18, runid="01", resourceList=resources)
## Monitor progress of submitted jobs
getStatus(reg=reg)
## Update the paths in output(WF)
WF <- output_update(WF, dir=FALSE, replace=TRUE, extension=c(".sam", ".bam"))
## Alignment stats
read_statsDF <- alignStats(WF)
## 'targets' holds a file path, so read the targets file into a data.frame before merging
targets_df <- read.delim(targets, comment.char = "#")
read_statsDF <- cbind(read_statsDF[targets_df$FileName, ], targets_df)
write.table(read_statsDF, "results/alignStats.xls", row.names=FALSE, quote=FALSE, sep="\t")
## End(Not run)
Replaces or adds an input configuration setting in a YML param file.
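Conceptually, the update is a recursive merge of new settings into the parsed YML content. A minimal sketch with base R's utils::modifyList(), assuming the YML file has already been read into a nested list (for example with the yaml package, not shown): existing keys are replaced, nested entries are merged, and new keys are added. The input list below is hypothetical, and config.param may handle details such as writing the file differently.

```r
# Hypothetical content of a parsed YML input file (e.g. as returned by yaml::read_yaml()).
input <- list(
  thread = 4L,
  fq = list(class = "File", path = "./data/SRR446027_1.fastq.gz"),
  results_path = list(class = "Directory", path = "./results")
)

# New settings, mirroring the config.param example: replace 'thread' and the 'fq' path.
param <- list(thread = 10L, fq = list(class = "File", path = "./results2"))

# modifyList() merges recursively: listed keys are replaced, all others are kept.
updated <- modifyList(input, param)

# Keys that are not yet present are added.
updated <- modifyList(updated, list(verbose = TRUE))
str(updated)
```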
config.param(input_file = NULL, param, file = "default", silent = FALSE)
input_file |
a |
param |
object of class |
file |
name and path of the new file. If set to |
silent |
if set to TRUE, all messages returned by the function will be suppressed. |
Daniela Cassol
## Not run:
input_file <- system.file("extdata", "cwl/hisat2/hisat2-mapping-se.yml", package="systemPipeR")
param <- list(thread=10, fq=list(class="File", path="./results2"))
input <- config.param(input_file=input_file, param, file="default")
## End(Not run)
This function allows the user to control which steps of the workflow will be run and the generation of the new R Markdown file.
configWF(x, input_steps = "ALL", exclude_steps = NULL, silent = FALSE, ...)
x |
object of class |
input_steps |
a character vector of all steps to preserve in the output file.
Default is |
exclude_steps |
a character vector of all steps to remove from the output file. |
silent |
if set to TRUE, all messages returned by the function will be suppressed. |
... |
Additional arguments to pass on to |
Daniela Cassol
## Not run:
library(systemPipeRdata)
targets <- system.file("extdata", "targets.txt", package="systemPipeR")
script <- system.file("extdata/workflows/rnaseq", "systemPipeRNAseq.Rmd", package="systemPipeRdata")
SYSconfig <- initProject(projPath="./", targets=targets, script=script, overwrite=TRUE, silent=TRUE)
## Assumes 'sysargslist' is an existing SYSargsList object created beforehand
sysargslist <- configWF(x=sysargslist)
## End(Not run)
Convenience function to perform read counting iteratively for several range sets, e.g. peak range sets or feature types. Internally, the read counting is performed with the summarizeOverlaps function from the GenomicAlignments package. The resulting count tables are directly saved to files.
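The iterate-count-save pattern can be sketched in base R as follows. This version replaces summarizeOverlaps with a naive position-in-interval count on a single chromosome, and the read positions, range sets and helper count_rangesets() are all hypothetical; it only illustrates the loop structure and the kind of named vector of output paths that countRangeset returns.

```r
# Hypothetical read positions on a single chromosome (toy data).
reads <- c(5, 12, 18, 25, 40, 41, 42, 90)

# Hypothetical range sets, e.g. peaks from two different peak callers.
rangesets <- list(
  peaksA = data.frame(id = c("p1", "p2"), start = c(1, 35), end = c(20, 50)),
  peaksB = data.frame(id = "q1", start = 80, end = 100)
)

# For each range set: count reads falling inside each range (inclusive
# coordinates), write the count table to a file, and record its path by name.
count_rangesets <- function(reads, rangesets, outdir = tempdir()) {
  paths <- character(0)
  for (nm in names(rangesets)) {
    rs <- rangesets[[nm]]
    counts <- vapply(seq_len(nrow(rs)),
                     function(i) sum(reads >= rs$start[i] & reads <= rs$end[i]),
                     integer(1))
    outfile <- file.path(outdir, paste0("countDF_", nm, ".txt"))
    write.table(data.frame(id = rs$id, count = counts), outfile,
                quote = FALSE, sep = "\t", row.names = FALSE)
    paths[nm] <- outfile
  }
  paths
}

countDFnames <- count_rangesets(reads, rangesets)
countDFnames
```

Writing each table to disk inside the loop, rather than returning all counts in memory, keeps memory use bounded when there are many large range sets.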
countRangeset(bfl, args, outfiles=NULL, format="tabular", ...)
bfl |
|
args |
An instance of |
outfiles |
Default is |
format |
Format of input range files. Currently, supported are |
... |
Arguments to be passed on to internally used |
Named character vector containing the paths from outpaths(args) to the resulting count table files.
Thomas Girke
summarizeOverlaps
## Paths to BAM files
param <- system.file("extdata", "bowtieSE.param", package="systemPipeR")
targets <- system.file("extdata", "targets.txt", package="systemPipeR")
args_bam <- systemArgs(sysma=param, mytargets=targets)
bfl <- BamFileList(outpaths(args_bam), yieldSize=50000, index=character())
## Not run:
##########################################
## Examples with SYSargs2 object ##
##########################################
## Construct SYSargs2 object
## SYSargs2 with paths to range data and count files
dir_path <- system.file("extdata/cwl/count_rangesets", package="systemPipeR")
args <- loadWF(targets = "targets_macs.txt", wf_file = "count_rangesets.cwl",
               input_file = "count_rangesets.yml", dir_path = dir_path)
args <- renderWF(args, inputvars = c(FileName = "_FASTQ_PATH1_", SampleName = "_SampleName_"))
## Iterative read counting
countDFnames <- countRangeset(bfl, args, mode="Union", ignore.strand=TRUE)
##########################################
## Examples with SYSargs object ##
##########################################
## Construct SYSargs object
## SYSargs with paths to range data and count files
args <- systemArgs(sysma="param/count_rangesets.param", mytargets="targets_macs.txt")
## Iterative read counting
countDFnames <- countRangeset(bfl, args, mode="Union", ignore.strand=TRUE)
writeTargetsout(x=args, file="targets_countDF.txt", overwrite=TRUE)
## End(Not run)
The constructor function creates a SYSargs2 S4 class object from a command-line string. The function also creates and saves the CWL param files. The SYSargs2 object stores all the parameters required for running command-line software, following the standard and specification defined by the Common Workflow Language (CWL).
createParamFiles(commandline, cwlVersion = "v1.1", class = "CommandLineTool", results_path = "./results", module_load = "baseCommand", file = "default", syntaxVersion = "v1", writeParamFiles = TRUE, confirm = FALSE, overwrite = FALSE, silent = FALSE) writeParamFiles(sysargs, file = "default", overwrite = TRUE, silent = FALSE, syntaxVersion = "v1")
commandline |
string. Original command-line to create the CWL files from. Please see
|
cwlVersion |
string. The version of the Common Workflow Language. More information here: https://www.commonwl.org/. |
class |
character. Name of Common Workflow Language Description class. The following
is accepted: |
results_path |
Path to the results folder. Default is |
module_load |
string. Name of software to load via the Environment Modules system. Default is
"baseCommand", which creates a subfolder and two files: *.cwl and *.yml
at |
file |
character. Name and path of output files. If set to "default" then the name of
the output files will have the pattern |
syntaxVersion |
character. One of |
writeParamFiles |
logical. If set to TRUE, it will write to file the content of the CWL
files: |
confirm |
If set to |
overwrite |
logical. If set to TRUE, existing files of the same name will be overwritten.
Default is |
silent |
logical. If set to TRUE, all messages returned by the function will be
suppressed. Default is |
sysargs |
Object of class |
- The first line of the command-line object will be treated as the baseCommand;
- For argument lines (starting from the second line), any word before the first space with a leading '-' or '--' will be treated as a prefix, like -S or --min. Any line without this first word will be treated as having no prefix;
- All defaults are placed inside <...>;
- The first argument is the input argument type: F for "File", int for integer, string for string;
- Optional: add the keyword out after the type, separated by a comma, to indicate that this argument is also a CWL output;
- Then, use : to separate keywords and default values; any non-space value after the : will be treated as the default value;
- If an argument has no default value and is just a flag, like --verbose, there is no need to add any <...>;
- The trailing \ is not required; however, it is recommended for consistency.
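The following sketch parses a single v1-style argument line into its prefix, type, output flag and default value, following the rules listed above. It is a hypothetical helper written for illustration, not the createParam parser; in particular, treating a bare flag such as --verbose as a boolean is an assumption.

```r
# Hypothetical parser for one v1-style argument line (illustration only).
parse_v1_arg <- function(line) {
  line <- trimws(sub("\\\\$", "", trimws(line)))  # drop an optional trailing "\"
  prefix <- ""
  if (startsWith(line, "-")) {
    prefix <- sub("\\s.*$", "", line)             # first word with leading '-'
    line <- trimws(sub("^\\S+", "", line))
  }
  if (!grepl("^<.*>$", line)) {
    # No <...>: a bare flag such as --verbose (treated as boolean, an assumption).
    return(list(prefix = prefix, type = "boolean", output = FALSE, default = NA))
  }
  inner <- sub("^<(.*)>$", "\\1", line)
  # Keywords (type, optional 'out') come before the first ':'.
  keywords <- trimws(strsplit(sub(":.*$", "", inner), ",", fixed = TRUE)[[1]])
  # The default value is everything after the first ':', if present.
  default <- if (grepl(":", inner, fixed = TRUE)) trimws(sub("^[^:]*:", "", inner)) else NA
  types <- c(F = "File", int = "int", string = "string")
  list(prefix = prefix, type = types[[keywords[1]]],
       output = "out" %in% keywords, default = default)
}

str(parse_v1_arg("-S <F, out: ./results/M1A.sam> \\"))
```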
- The first line of the command-line object will be treated as the baseCommand;
- Each line specifies one argument and its default value.
- Each line is composed of exactly two ; separating three columns.
- Text before the first ; will be used as the prefix/name. If it starts with the keyword "p:", anything after "p:" and before the first ; will be used as the prefix, and the name of this position will be the prefix with its leading dashes ("-" or "--") removed. If there is any duplication, a numeric index will be added to the end. If there is no keyword "p:" before the first ;, all text before the first ; will be the name.
- If there is the keyword "p:" before the first ; but nothing between the first and second ; and nothing after the second ;, this position will be treated as a CWL argument instead of an input.
- Text between the first and second ; is the type. It must be one of File, Directory, string, int, double, float, long, boolean.
- Text after the second ; and before \ or the end of the line is the default value. If it starts with the keyword "out" or "stdout", this position will also be added to the outputs or to standard output.
- Only one position with "stdout" is allowed, and it is usually the last positional argument.
- Ending each line with "\" is recommended but not required.
SYSargs2 object
Le Zhang and Daniela Cassol
For more details on CWL, please consult the following page: https://www.commonwl.org/
writeParamFiles
printParam
subsetParam
replaceParam
renameParam
appendParam
loadWorkflow
renderWF
showClass("SYSargs2")
## syntax version 1 example
command <- "
 hisat2 \
    -S <F, out: ./results/M1A.sam> \
    -x <F: ./data/tair10.fasta> \
    -k <int: 1> \
    --min-intronlen <int: 30> \
    --max-intronlen <int: 3000> \
    --threads <int: 4> \
    -U <F: ./data/SRR446027_1.fastq.gz> \
    --verbose
"
cmd <- createParam(command, writeParamFiles=FALSE)
cmdlist(cmd)
## syntax version 2 example
command2 <- '
mycmd2 \
    p: -s; File; sample1.txt \
    p: -s; File; sample2.txt \
    p: --c; ; \
    p: -o; File; out: myout.txt \
    ref_genome; File; a.fasta \
    p: --nn; int; 12 \
    mystdout; File; stdout: abc.txt
'
cmd2 <- createParam(command2, syntaxVersion = "v2", writeParamFiles=FALSE)
cmdlist(cmd2)
Function to sync and download the most recent CWL description files from the systemPipeR package.
cwlFilesUpdate(destdir, force = FALSE, verbose = TRUE)
destdir |
a character string with the path of the directory where the param files are stored. |
force |
logical. Force the download of the CWL files. |
verbose |
logical. The setting |
Daniela Cassol
## Not run: 
destdir <- "param/"
cwlFilesUpdate(destdir)
## End(Not run)
"EnvModules"
The function module enables use of the Environment Modules system (http://modules.sourceforge.net/) from within the R environment. By default, the user's login shell environment (i.e. bash -l) will be used to initialize the current session. The module function can also load or unload specific software, list all the loaded software within the current session, and list all the applications available for loading from the module system. Lastly, the module function can remove all loaded software from the current session.
Objects can be created by calls of the form new("EnvModules", ...).
available_modules: Object of class "list"
loaded_modules: Object of class "list"
default_modules: Object of class "list"
modulecmd: Object of class "character"
signature(x = "EnvModules"): ...
signature(x = "EnvModules", i = "ANY", j = "missing"): ...
signature(x = "EnvModules"): ...
signature(x = "EnvModules"): ...
signature(x = "EnvModules"): ...
signature(from = "EnvModules", to = "list"): ...
signature(from = "list", to = "EnvModules"): ...
signature(x = "EnvModules"): ...
signature(x = "EnvModules"): ...
signature(x = "EnvModules"): ...
signature(x = "EnvModules"): ...
signature(x = "EnvModules"): ...
signature(object = "EnvModules"): ...
Jordan Hayes and Daniela Cassol
showClass("EnvModules")
eval on the RMarkdown files
Function to set the chunk option eval to TRUE (evaluate) or FALSE (do not evaluate) for the R code chunks in an R Markdown file. This function does not run the code; it only toggles the option eval between TRUE and FALSE and writes out a new file with the chosen setting.
evalCode(infile, eval = TRUE, output)
infile |
name and path of the infile file, format |
eval |
whether to evaluate the code and include its results. The default is |
output |
name and path of the output file. File format |
Writes an R Markdown file containing an exact copy of infile, with the option chosen in the eval argument. This makes it easy to toggle between running all R code chunks or none of them.
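The kind of rewrite described above can be pictured with a small base-R sketch that operates on the lines of an R Markdown document held as a character vector. The helper toggle_eval is hypothetical and is not evalCode's implementation; in particular, appending eval to chunk headers that lack the option is an assumption made for illustration.

```r
# Hypothetical helper (not evalCode's implementation): set eval=TRUE/FALSE in
# every R chunk header of an R Markdown document held as a character vector.
toggle_eval <- function(lines, eval = TRUE) {
  value <- if (eval) "TRUE" else "FALSE"
  fence <- strrep("`", 3)
  is_header <- startsWith(lines, paste0(fence, "{r"))   # R chunk headers only
  hdr <- lines[is_header]
  pattern <- "eval[[:space:]]*=[[:space:]]*(TRUE|FALSE|T|F)"
  has_opt <- grepl(pattern, hdr)
  # overwrite an existing eval option ...
  hdr[has_opt] <- sub(pattern, paste0("eval=", value), hdr[has_opt])
  # ... or append one just before the closing brace (assumed behaviour)
  hdr[!has_opt] <- sub("[}][[:space:]]*$", paste0(", eval=", value, "}"), hdr[!has_opt])
  lines[is_header] <- hdr
  lines
}

# e.g. writeLines(toggle_eval(readLines("in.Rmd"), eval = FALSE), "out.Rmd")
```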
Daniela Cassol
library(systemPipeRdata)
file <- system.file("extdata/workflows/rnaseq", "systemPipeRNAseq.Rmd", package="systemPipeRdata")
evalCode(infile=file, eval=FALSE, output=file.path(tempdir(), "test.Rmd"))
Computes read coverage along single- and multi-component features based on genomic alignments. The coverage segments of component features are spliced to continuous ranges, such as exons to transcripts or CDSs to ORFs. The results can be obtained with single nucleotide resolution (e.g. around start and stop codons) or as mean coverage of relative bin sizes, such as 100 bins for each feature. The latter allows comparisons of coverage trends among transcripts of variable length. The results can be obtained for single or many features (e.g. any number of transcripts) at once. Visualization of the coverage results is facilitated by a downstream plotfeatureCoverage function.
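The binned summary mentioned above (a fixed number of bins per feature, each summarized with a function such as mean) can be illustrated on a plain per-base coverage vector. The helper bin_coverage is hypothetical; featureCoverage itself works on spliced Rle coverage, and its exact bin boundaries may differ from the near-equal split used here.

```r
# Hypothetical helper: summarize the per-base coverage of one feature
# (ordered 5' -> 3') into Nbins consecutive, near-equal segments.
bin_coverage <- function(cov, Nbins = 20, method = mean) {
  stopifnot(length(cov) >= Nbins)
  # assign each position to one of Nbins equal-width intervals
  bin_id <- cut(seq_along(cov), breaks = Nbins, labels = FALSE)
  # apply the summary function per bin; unname() drops the bin labels
  unname(vapply(split(cov, bin_id), method, numeric(1)))
}

# features of different lengths yield vectors of the same length
length(bin_coverage(runif(731), Nbins = 20))
length(bin_coverage(runif(2450), Nbins = 20))
```

Because every feature is reduced to Nbins values, coverage trends of transcripts with different lengths can be compared position by position.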
featureCoverage(bfl, grl, resizereads = NULL, readlengthrange = NULL, Nbins = 20, method = mean, fixedmatrix, resizefeatures, upstream, downstream, outfile, overwrite = FALSE)
bfl |
Paths to BAM files provided as |
grl |
Genomic ranges provided as |
resizereads |
Positive integer defining the length alignments should be resized to prior to the coverage calculation. |
readlengthrange |
Positive integer vector of length 2 determining the read length range to use for the coverage calculation. Reads falling outside of the specified length range will be excluded from the coverage calculation. For instance, |
Nbins |
Single positive integer defining the number of segments the coverage of each feature should be binned into in order to obtain coverage summaries of constant length, e.g. for plotting purposes. |
method |
Defines the summary statistics to use for binning. The default is |
fixedmatrix |
If set to |
resizefeatures |
Needs to be set to |
upstream |
Single positive integer specifying the upstream extension length relative to the orientation of each feature in the genome. More details are given above. |
downstream |
Single positive integer specifying the downstream extension length relative to the orientation of each feature in the genome. More details are given above. |
outfile |
Default |
overwrite |
If set to |
The function can return the following four distinct outputs. The settings that return each of them are illustrated in the example section below.
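(A) |
Binned coverage for each feature, returned with Nbins set (e.g. 20) and fixedmatrix=FALSE. |
(B) |
Coverage matrix upstream and downstream of start/stop codons, returned with Nbins=NULL and fixedmatrix=TRUE. |
(C) |
Combined matrix for both binned and start/stop codon coverage, returned with Nbins set and fixedmatrix=TRUE. |
(D) |
Rle coverage objects, one for each query feature, returned with Nbins=NULL and fixedmatrix=FALSE. |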
Thomas Girke
plotfeatureCoverage
## Construct SYSargs2 object from param and targets files
targetspath <- system.file("extdata", "targets.txt", package="systemPipeR")
dir_path <- system.file("extdata/cwl", package="systemPipeR")
args <- loadWorkflow(targets=targetspath, wf_file="hisat2/hisat2-mapping-se.cwl",
                     input_file="hisat2/hisat2-mapping-se.yml", dir_path=dir_path)
args <- renderWF(args, inputvars=c(FileName="_FASTQ_PATH1_", SampleName="_SampleName_"))
args
## Not run: 
## Features from sample data of systemPipeRdata package
library(txdbmaker)
file <- system.file("extdata/annotation", "tair10.gff", package="systemPipeRdata")
txdb <- makeTxDbFromGFF(file=file, format="gff3", organism="Arabidopsis")
targetspath <- system.file("extdata", "targets.txt", package="systemPipeR")
dir_path <- system.file("extdata/cwl", package="systemPipeR")
args <- loadWorkflow(targets=targetspath, wf_file="hisat2/hisat2-mapping-se.cwl",
                     input_file="hisat2/hisat2-mapping-se.yml", dir_path=dir_path)
args <- renderWF(args, inputvars=c(FileName="_FASTQ_PATH1_", SampleName="_SampleName_"))
args <- runCommandline(args, make_bam = TRUE, dir = TRUE)
outpaths <- subsetWF(args , slot="output", subset=1, index=1)
file.exists(outpaths)
## (A) Generate binned coverage for two BAM files and 4 transcripts
grl <- cdsBy(txdb, "tx", use.names=TRUE)
fcov <- featureCoverage(bfl=BamFileList(outpaths[1:2]), grl=grl[1:4], resizereads=NULL,
                        readlengthrange=NULL, Nbins=20, method=mean, fixedmatrix=FALSE,
                        resizefeatures=TRUE, upstream=20, downstream=20,
                        outfile="results/featureCoverage.xls", overwrite=TRUE)
plotfeatureCoverage(covMA=fcov, method=mean, scales="fixed", scale_count_val=10^6)
## (B) Coverage matrix upstream and downstream of start/stop codons
fcov <- featureCoverage(bfl=BamFileList(outpaths[1:2]), grl=grl[1:4], resizereads=NULL,
                        readlengthrange=NULL, Nbins=NULL, method=mean, fixedmatrix=TRUE,
                        resizefeatures=TRUE, upstream=20, downstream=20,
                        outfile="results/featureCoverage_up_down.xls", overwrite=TRUE)
plotfeatureCoverage(covMA=fcov, method=mean, scales="fixed", scale_count_val=10^6)
## (C) Combined matrix for both binned and start/stop codon
fcov <- featureCoverage(bfl=BamFileList(outpaths[1:2]), grl=grl[1:4], resizereads=NULL,
                        readlengthrange=NULL, Nbins=20, method=mean, fixedmatrix=TRUE,
                        resizefeatures=TRUE, upstream=20, downstream=20,
                        outfile="results/featureCoverage_binned.xls", overwrite=TRUE)
plotfeatureCoverage(covMA=fcov, method=mean, scales="fixed", scale_count_val=10^6)
## (D) Rle coverage objects one for each query feature
fcov <- featureCoverage(bfl=BamFileList(outpaths[1:2]), grl=grl[1:4], resizereads=NULL,
                        readlengthrange=NULL, Nbins=NULL, method=mean, fixedmatrix=FALSE,
                        resizefeatures=TRUE, upstream=20, downstream=20,
                        outfile="results/featureCoverage_query.xls", overwrite=TRUE)
## End(Not run)
Counts how many reads in short read alignment files (BAM format) overlap with entire annotation categories. This utility is useful for analyzing the distribution of the read mappings across feature types, e.g. coding versus non-coding genes. By default, the read counts are reported for the sense and antisense strand of each feature type separately. To minimize memory consumption, the BAM files are processed in a stream using utilities from the Rsamtools and GenomicAlignments packages. The counts can be reported for each read length separately or as a single value for reads of any length. Subsequently, the counting results can be plotted with the associated plotfeaturetypeCounts function.
featuretypeCounts(bfl, grl, singleEnd = TRUE, readlength = NULL, type = "data.frame")
bfl |
|
grl |
|
singleEnd |
Specifies whether the targets BAM files contain alignments for single-end (SE) or paired-end read data. |
readlength |
Integer vector specifying the read length values for which to report counts separately. If |
type |
Determines whether the results are returned as |
The results are returned as a data.frame or a list of data.frames. For details, see the type argument above. The result data.frames contain the following columns in the given order:
SampleName |
Sample names obtained from |
Strand |
Sense or antisense strand of read mappings. |
Featuretype |
Name of feature type provided by |
Featuretypelength |
Total genomic length of each reduced feature type in bases. This value is useful to normalize the read counts by genomic length units, e.g. in plots. |
Subsequent columns |
Counts for reads of any length or for individual read lengths. |
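Because Featuretypelength is reported next to the counts, length-normalized values can be derived directly from the returned data.frame. The sketch below uses a made-up toy table that only mimics the documented column order; the count column name Counts and the strand labels are placeholders, since the actual count columns depend on the readlength setting.

```r
# Toy table mimicking the documented column order; all values are invented
# and the column name 'Counts' is a placeholder assumption.
fc <- data.frame(
  SampleName = "M1A",
  Strand = c("sense", "antisense", "sense"),
  Featuretype = c("cds", "cds", "intergenic"),
  Featuretypelength = c(2000, 2000, 5000),
  Counts = c(400, 20, 100)
)
# reads per kilobase of reduced feature-type length
fc$CountsPerKb <- fc$Counts / (fc$Featuretypelength / 1000)
fc
```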
Thomas Girke
plotfeaturetypeCounts, genFeatures
## Construct SYSargs2 object from param and targets files
targets <- system.file("extdata", "targets.txt", package="systemPipeR")
dir_path <- system.file("extdata/cwl", package="systemPipeR")
args <- loadWorkflow(targets=targets, wf_file="hisat2/hisat2-mapping-se.cwl",
                     input_file="hisat2/hisat2-mapping-se.yml", dir_path=dir_path)
args <- renderWF(args, inputvars=c(FileName="_FASTQ_PATH1_", SampleName="_SampleName_"))
args
## Not run: 
## Run alignments
args <- runCommandline(args, dir = FALSE, make_bam = TRUE)
outpaths <- subsetWF(args, slot = "output", subset = 1, index = 1)
## Features from sample data of systemPipeRdata package
library(txdbmaker)
file <- system.file("extdata/annotation", "tair10.gff", package="systemPipeRdata")
txdb <- makeTxDbFromGFF(file=file, format="gff3", organism="Arabidopsis")
feat <- genFeatures(txdb, featuretype="all", reduce_ranges=TRUE, upstream=1000, downstream=0,
                    verbose=TRUE)
## Generate and plot feature counts for specific read lengths
fc <- featuretypeCounts(bfl=BamFileList(outpaths, yieldSize=50000), grl=feat, singleEnd=TRUE,
                        readlength=c(74:76,99:102), type="data.frame")
p <- plotfeaturetypeCounts(x=fc, graphicsfile="featureCounts.pdf", graphicsformat="pdf",
                           scales="fixed", anyreadlength=FALSE)
## Generate and plot feature counts for any read length
fc2 <- featuretypeCounts(bfl=BamFileList(outpaths, yieldSize=50000), grl=feat, singleEnd=TRUE,
                         readlength=NULL, type="data.frame")
p2 <- plotfeaturetypeCounts(x=fc2, graphicsfile="featureCounts2.pdf", graphicsformat="pdf",
                            scales="fixed", anyreadlength=TRUE)
## End(Not run)
Filters and plots DEG results for a given set of sample comparisons. The gene identifiers of all (i) Up_or_Down, (ii) Up and (iii) Down regulated genes are stored as separate list components, while the corresponding summary statistics, stored in a fourth list component, are plotted as a stacked bar plot.
filterDEGs(degDF, filter, plot = TRUE)
degDF |
|
filter |
Named vector with filter cutoffs of format |
plot |
Allows to turn plotting behavior on and off with default set to |
Currently, there is no community standard for how to calculate fold changes (here logFC) of genomic ranges, such as gene or feature ranges, to unambiguously refer to them as features with increased or decreased read abundance, or, in the case of gene expression experiments, as up- or down-regulated genes, respectively. To be consistent within systemPipeR, the corresponding functions, such as filterDEGs, use the following definition. Genomic ranges with positive logFC values are classified as up and those with negative logFC values as down. This means that if a comparison between two samples a and b is specified in the corresponding targets file as a-b, then a feature with a positive logFC has a higher _normalized_ read count value in sample a than in sample b, and vice versa. To invert this assignment, users can change the specification of their chosen sample comparison(s) in the targets file accordingly, e.g. change a-b to b-a. Alternatively, one can swap the column order of the matrix assigned to the cmp argument of the run_edgeR or run_DESeq2 functions. Users should also be aware that for logFC values close to zero (noise range), the direction of the fold change (sign of logFC) can be very sensitive to minor differences in the normalization method, while this assignment is much more robust for more pronounced changes or higher absolute logFC values.
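The sign convention above, combined with the cutoffs passed to filter, can be sketched in base R. The helper classify_degs is hypothetical and is not the source of filterDEGs. It assumes, as the example call filter=c(Fold=2, FDR=10) suggests, that Fold is a linear fold change (compared against |logFC| on the log2 scale) and that FDR is given in percent.

```r
# Hypothetical sketch of the up/down classification (not filterDEGs' source).
# Assumed semantics: |logFC| >= log2(Fold) and FDR <= FDR% / 100.
classify_degs <- function(logFC, FDR, filter = c(Fold = 2, FDR = 10)) {
  pass <- FDR <= filter[["FDR"]] / 100 & abs(logFC) >= log2(filter[["Fold"]])
  pass[is.na(pass)] <- FALSE          # missing statistics never pass
  ids <- names(logFC)
  list(UporDown = ids[pass],
       Up = ids[pass & logFC > 0],    # positive logFC: higher in sample a of "a-b"
       Down = ids[pass & logFC < 0])  # negative logFC: higher in sample b
}

classify_degs(logFC = c(g1 = 2.5, g2 = -1.2, g3 = 0.3),
              FDR = c(g1 = 0.01, g2 = 0.05, g3 = 0.001))
```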
Returns a list with four components
UporDown |
List of up or down regulated gene/transcript identifiers meeting the chosen filter settings for all comparisons defined in data frames |
Up |
Same as above but only for up regulated genes/transcripts. |
Down |
Same as above but only for down regulated genes/transcripts. |
Thomas Girke
run_edgeR
targetspath <- system.file("extdata", "targets.txt", package="systemPipeR")
targets <- read.delim(targetspath, comment.char = "#")
cmp <- readComp(file=targetspath, format="matrix", delim="-")
countfile <- system.file("extdata", "countDFeByg.xls", package="systemPipeR")
countDF <- read.delim(countfile, row.names=1)
edgeDF <- run_edgeR(countDF=countDF, targets=targets, cmp=cmp[[1]], independent=FALSE, mdsplot="")
pval <- edgeDF[, grep("_FDR$", colnames(edgeDF)), drop=FALSE]
fold <- edgeDF[, grep("_logFC$", colnames(edgeDF)), drop=FALSE]
DEG_list <- filterDEGs(degDF=edgeDF, filter=c(Fold=2, FDR=10))
names(DEG_list)
DEG_list$Summary
Convenience function for filtering VCF files based on user-definable quality parameters. The function imports each VCF file into R, applies the filtering on an internally generated VRanges object and then writes the results to a new VCF file.
filterVars(files, filter, varcaller="gatk", organism, out_dir="results")
files |
|
filter |
Character vector of length one specifying the filter syntax that will be applied to the internally created |
varcaller |
Character vector of length one specifying the variant caller used for generating the input VCFs. Currently, this argument can be assigned 'gatk', 'bcftools' or 'vartools'. |
organism |
Character vector specifying the organism name of the reference genome. |
out_dir |
Character vector of a |
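The filter argument is a string of R code. In the BCFtools example below, vr appears to hold four read counts per variant (ref-forward, ref-reverse, alt-forward, alt-reverse, as in the DP4 field). The sketch evaluates that exact filter string on a toy matrix; that filterVars evaluates the string with eval(parse(text = filter)) against such a matrix named vr is an assumption made for illustration, and the counts are invented.

```r
# Toy DP4-like count matrix (ref-fwd, ref-rev, alt-fwd, alt-rev); invented values.
# Assumption: filterVars evaluates the filter string with vr bound like this.
vr <- rbind(
  var1 = c(0, 0, 5, 4),   # 9 reads, all supporting the alternative allele
  var2 = c(3, 2, 1, 1),   # mostly reference reads
  var3 = c(0, 1, 0, 0)    # only one read in total
)
filter <- "rowSums(vr) >= 2 & (rowSums(vr[,3:4])/rowSums(vr[,1:4]) >= 0.8)"
keep <- eval(parse(text = filter))   # at least 2 reads and >= 80% alt reads
rownames(vr)[keep]
```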
Output files in VCF format. Their paths can be obtained with outpaths(args) or output(args).
Thomas Girke
variantReport, combineVarReports, varSummary
## Alignment with BWA (sequentially on single machine)
param <- system.file("extdata", "bwa.param", package="systemPipeR")
targets <- system.file("extdata", "targets.txt", package="systemPipeR")
args <- systemArgs(sysma=param, mytargets=targets)
sysargs(args)[1]
## Not run: 
library(VariantAnnotation)
system("bwa index -a bwtsw ./data/tair10.fasta")
bampaths <- runCommandline(args=args)
## Alignment with BWA (parallelized on compute cluster)
resources <- list(walltime="20:00:00", nodes=paste0("1:ppn=", cores(args)), memory="10gb")
reg <- clusterRun(args, conffile=".BatchJobs.R", template="torque.tmpl", Njobs=18, runid="01",
                  resourceList=resources)
## Variant calling with GATK
## The following creates in the initial step a new targets file
## (targets_bam.txt). The first column of this file gives the paths to
## the BAM files created in the alignment step. The new targets file and the
## parameter file gatk.param are used to create a new SYSargs
## instance for running GATK. Since GATK involves many processing steps, it is
## executed by a bash script gatk_run.sh where the user can specify the
## detailed run parameters. All three files are expected to be located in the
## current working directory. Sample files for gatk.param and
## gatk_run.sh are available in the subdirectory ./inst/extdata/ of the
## source file of the systemPipeR package.
writeTargetsout(x=args, file="targets_bam.txt")
system("java -jar CreateSequenceDictionary.jar R=./data/tair10.fasta O=./data/tair10.dict")
# system("java -jar /opt/picard/1.81/CreateSequenceDictionary.jar R=./data/tair10.fasta O=./data/tair10.dict")
args <- systemArgs(sysma="gatk.param", mytargets="targets_bam.txt")
resources <- list(walltime="20:00:00", nodes=paste0("1:ppn=", 1), memory="10gb")
reg <- clusterRun(args, conffile=".BatchJobs.R", template="torque.tmpl", Njobs=18, runid="01",
                  resourceList=resources)
writeTargetsout(x=args, file="targets_gatk.txt")
## Variant calling with BCFtools
## The following runs the variant calling with BCFtools. This step requires in
## the current working directory the parameter file sambcf.param and the
## bash script sambcf_run.sh.
args <- systemArgs(sysma="sambcf.param", mytargets="targets_bam.txt")
resources <- list(walltime="20:00:00", nodes=paste0("1:ppn=", 1), memory="10gb")
reg <- clusterRun(args, conffile=".BatchJobs.R", template="torque.tmpl", Njobs=18, runid="01",
                  resourceList=resources)
writeTargetsout(x=args, file="targets_sambcf.txt")
## Filtering of VCF files generated by GATK
args <- systemArgs(sysma="filter_gatk.param", mytargets="targets_gatk.txt")
filter <- "totalDepth(vr) >= 2 & (altDepth(vr) / totalDepth(vr) >= 0.8) & rowSums(softFilterMatrix(vr))==4"
# filter <- "totalDepth(vr) >= 20 & (altDepth(vr) / totalDepth(vr) >= 0.8) & rowSums(softFilterMatrix(vr))==6"
filterVars(args, filter, varcaller="gatk", organism="A. thaliana")
writeTargetsout(x=args, file="targets_gatk_filtered.txt")
## Filtering of VCF files generated by BCFtools
args <- systemArgs(sysma="filter_sambcf.param", mytargets="targets_sambcf.txt")
filter <- "rowSums(vr) >= 2 & (rowSums(vr[,3:4])/rowSums(vr[,1:4]) >= 0.8)"
# filter <- "rowSums(vr) >= 20 & (rowSums(vr[,3:4])/rowSums(vr[,1:4]) >= 0.8)"
filterVars(args, filter, varcaller="bcftools", organism="A. thaliana")
writeTargetsout(x=args, file="targets_sambcf_filtered.txt")
## Annotate filtered variants from GATK
args <- systemArgs(sysma="annotate_vars.param", mytargets="targets_gatk_filtered.txt")
txdb <- loadDb("./data/tair10.sqlite")
fa <- FaFile(systemPipeR::reference(args))
variantReport(args=args, txdb=txdb, fa=fa, organism="A. thaliana")
## Annotate filtered variants from BCFtools
args <- systemArgs(sysma="annotate_vars.param", mytargets="targets_sambcf_filtered.txt")
txdb <- loadDb("./data/tair10.sqlite")
fa <- FaFile(systemPipeR::reference(args))
variantReport(args=args, txdb=txdb, fa=fa, organism="A. thaliana")
## Combine results from GATK
args <- systemArgs(sysma="annotate_vars.param", mytargets="targets_gatk_filtered.txt")
combineDF <- combineVarReports(args, filtercol=c(Consequence="nonsynonymous"))
write.table(combineDF, "./results/combineDF_nonsyn_gatk.xls", quote=FALSE, row.names=FALSE, sep="\t")
## Combine results from BCFtools
args <- systemArgs(sysma="annotate_vars.param", mytargets="targets_sambcf_filtered.txt")
combineDF <- combineVarReports(args, filtercol=c(Consequence="nonsynonymous"))
write.table(combineDF, "./results/combineDF_nonsyn_sambcf.xls", quote=FALSE, row.names=FALSE, sep="\t")
## Summary for GATK
args <- systemArgs(sysma="annotate_vars.param", mytargets="targets_gatk_filtered.txt")
write.table(varSummary(args), "./results/variantStats_gatk.xls", quote=FALSE, col.names = NA, sep="\t")
## Summary for BCFtools
args <- systemArgs(sysma="annotate_vars.param", mytargets="targets_sambcf_filtered.txt")
write.table(varSummary(args), "./results/variantStats_sambcf.xls", quote=FALSE, col.names = NA, sep="\t")
## Venn diagram of variants
args <- systemArgs(sysma="annotate_vars.param", mytargets="targets_gatk_filtered.txt")
varlist <- sapply(names(outpaths(args))[1:4], function(x) as.character(read.delim(outpaths(args)[x])$VARID))
vennset_gatk <- overLapper(varlist, type="vennsets")
args <- systemArgs(sysma="annotate_vars.param", mytargets="targets_sambcf_filtered.txt")
varlist <- sapply(names(outpaths(args))[1:4], function(x) as.character(read.delim(outpaths(args)[x])$VARID))
vennset_bcf <- overLapper(varlist, type="vennsets")
vennPlot(list(vennset_gatk, vennset_bcf), mymain="", mysub="GATK: red; BCFtools: blue", colmode=2,
         ccol=c("blue", "red"))
## End(Not run)
## Alignment with BWA (sequentially on single machine) param <- system.file("extdata", "bwa.param", package="systemPipeR") targets <- system.file("extdata", "targets.txt", package="systemPipeR") args <- systemArgs(sysma=param, mytargets=targets) sysargs(args)[1] ## Not run: library(VariantAnnotation) system("bwa index -a bwtsw ./data/tair10.fasta") bampaths <- runCommandline(args=args) ## Alignment with BWA (parallelized on compute cluster) resources <- list(walltime="20:00:00", nodes=paste0("1:ppn=", cores(args)), memory="10gb") reg <- clusterRun(args, conffile=".BatchJobs.R", template="torque.tmpl", Njobs=18, runid="01", resourceList=resources) ## Variant calling with GATK ## The following creates in the inital step a new targets file ## (targets_bam.txt). The first column of this file gives the paths to ## the BAM files created in the alignment step. The new targets file and the ## parameter file gatk.param are used to create a new SYSargs ## instance for running GATK. Since GATK involves many processing steps, it is ## executed by a bash script gatk_run.sh where the user can specify the ## detailed run parameters. All three files are expected to be located in the ## current working directory. Samples files for gatk.param and ## gatk_run.sh are available in the subdirectory ./inst/extdata/ of the ## source file of the systemPipeR package. 
writeTargetsout(x=args, file="targets_bam.txt")
system("java -jar CreateSequenceDictionary.jar R=./data/tair10.fasta O=./data/tair10.dict")
# system("java -jar /opt/picard/1.81/CreateSequenceDictionary.jar R=./data/tair10.fasta O=./data/tair10.dict")
args <- systemArgs(sysma="gatk.param", mytargets="targets_bam.txt")
resources <- list(walltime="20:00:00", nodes=paste0("1:ppn=", 1), memory="10gb")
reg <- clusterRun(args, conffile=".BatchJobs.R", template="torque.tmpl", Njobs=18, runid="01", resourceList=resources)
writeTargetsout(x=args, file="targets_gatk.txt")

## Variant calling with BCFtools
## The following runs the variant calling with BCFtools. This step requires in
## the current working directory the parameter file sambcf.param and the
## bash script sambcf_run.sh.
args <- systemArgs(sysma="sambcf.param", mytargets="targets_bam.txt")
resources <- list(walltime="20:00:00", nodes=paste0("1:ppn=", 1), memory="10gb")
reg <- clusterRun(args, conffile=".BatchJobs.R", template="torque.tmpl", Njobs=18, runid="01", resourceList=resources)
writeTargetsout(x=args, file="targets_sambcf.txt")

## Filtering of VCF files generated by GATK
args <- systemArgs(sysma="filter_gatk.param", mytargets="targets_gatk.txt")
filter <- "totalDepth(vr) >= 2 & (altDepth(vr) / totalDepth(vr) >= 0.8) & rowSums(softFilterMatrix(vr))==4"
# filter <- "totalDepth(vr) >= 20 & (altDepth(vr) / totalDepth(vr) >= 0.8) & rowSums(softFilterMatrix(vr))==6"
filterVars(args, filter, varcaller="gatk", organism="A. thaliana")
writeTargetsout(x=args, file="targets_gatk_filtered.txt")

## Filtering of VCF files generated by BCFtools
args <- systemArgs(sysma="filter_sambcf.param", mytargets="targets_sambcf.txt")
filter <- "rowSums(vr) >= 2 & (rowSums(vr[,3:4])/rowSums(vr[,1:4]) >= 0.8)"
# filter <- "rowSums(vr) >= 20 & (rowSums(vr[,3:4])/rowSums(vr[,1:4]) >= 0.8)"
filterVars(args, filter, varcaller="bcftools", organism="A. thaliana")
writeTargetsout(x=args, file="targets_sambcf_filtered.txt")

## Annotate filtered variants from GATK
args <- systemArgs(sysma="annotate_vars.param", mytargets="targets_gatk_filtered.txt")
txdb <- loadDb("./data/tair10.sqlite")
fa <- FaFile(systemPipeR::reference(args))
variantReport(args=args, txdb=txdb, fa=fa, organism="A. thaliana")

## Annotate filtered variants from BCFtools
args <- systemArgs(sysma="annotate_vars.param", mytargets="targets_sambcf_filtered.txt")
txdb <- loadDb("./data/tair10.sqlite")
fa <- FaFile(systemPipeR::reference(args))
variantReport(args=args, txdb=txdb, fa=fa, organism="A. thaliana")

## Combine results from GATK
args <- systemArgs(sysma="annotate_vars.param", mytargets="targets_gatk_filtered.txt")
combineDF <- combineVarReports(args, filtercol=c(Consequence="nonsynonymous"))
write.table(combineDF, "./results/combineDF_nonsyn_gatk.xls", quote=FALSE, row.names=FALSE, sep="\t")

## Combine results from BCFtools
args <- systemArgs(sysma="annotate_vars.param", mytargets="targets_sambcf_filtered.txt")
combineDF <- combineVarReports(args, filtercol=c(Consequence="nonsynonymous"))
write.table(combineDF, "./results/combineDF_nonsyn_sambcf.xls", quote=FALSE, row.names=FALSE, sep="\t")

## Summary for GATK
args <- systemArgs(sysma="annotate_vars.param", mytargets="targets_gatk_filtered.txt")
write.table(varSummary(args), "./results/variantStats_gatk.xls", quote=FALSE, col.names = NA, sep="\t")

## Summary for BCFtools
args <- systemArgs(sysma="annotate_vars.param", mytargets="targets_sambcf_filtered.txt")
write.table(varSummary(args), "./results/variantStats_sambcf.xls", quote=FALSE, col.names = NA, sep="\t")

## Venn diagram of variants
args <- systemArgs(sysma="annotate_vars.param", mytargets="targets_gatk_filtered.txt")
varlist <- sapply(names(outpaths(args))[1:4], function(x) as.character(read.delim(outpaths(args)[x])$VARID))
vennset_gatk <- overLapper(varlist, type="vennsets")
args <- systemArgs(sysma="annotate_vars.param", mytargets="targets_sambcf_filtered.txt")
varlist <- sapply(names(outpaths(args))[1:4], function(x) as.character(read.delim(outpaths(args)[x])$VARID))
vennset_bcf <- overLapper(varlist, type="vennsets")
vennPlot(list(vennset_gatk, vennset_bcf), mymain="", mysub="GATK: red; BCFtools: blue", colmode=2, ccol=c("blue", "red"))
## End(Not run)
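The BCFtools filter above is an R expression stored as a string. The sketch below shows how such a string selects rows from a count matrix. It assumes that `vr` is a four-column matrix of DP4-style counts (ref-forward, ref-reverse, alt-forward, alt-reverse), which is what the column indices `1:4` and `3:4` in the filter suggest. The counts are made up, and `filterVars` may evaluate the string differently internally.

```r
# Toy DP4-style count matrix (made-up data): one row per variant with
# ref-forward, ref-reverse, alt-forward and alt-reverse read counts
vr <- matrix(c(0, 0, 5, 4,   # 9 alt reads, 0 ref reads
               3, 2, 1, 1,   # mostly reference reads
               0, 1, 0, 0),  # low coverage, no alt reads
             ncol = 4, byrow = TRUE,
             dimnames = list(paste0("var", 1:3),
                             c("ref_fwd", "ref_rev", "alt_fwd", "alt_rev")))

# Same filter string as used for BCFtools in the example above:
# at least 2 reads in total and an alt-read fraction of at least 80%
filter <- "rowSums(vr) >= 2 & (rowSums(vr[,3:4])/rowSums(vr[,1:4]) >= 0.8)"

# Parse the string and evaluate it with 'vr' bound to the toy matrix
keep <- eval(parse(text = filter), envir = list(vr = vr))
vr[keep, , drop = FALSE]
```

Here only `var1` passes. `var2` fails the alt-fraction test (2/7), and `var3` has fewer than 2 reads.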
Function to generate a variety of feature types from TxDb objects using utilities provided by the GenomicFeatures package. The feature types are organized per gene and can be returned on that level in their non-reduced or reduced form.
Currently supported features include intergenic, promoter, intron, exon, cds, 5'/3'UTR and different transcript types. The latter include as many transcript types as are available in the tx_type column when extracting transcripts from TxDb objects as follows:
transcripts(txdb, c("tx_name", "gene_id", "tx_type"))
genFeatures(txdb, featuretype = "all", reduce_ranges, upstream = 1000, downstream = 0, verbose = TRUE)
txdb | |
featuretype | Feature types can be specified by assigning a |
reduce_ranges | If set to |
upstream | Defines for promoter features the number of bases upstream from the transcription start site. |
downstream | Defines for promoter features the number of bases downstream from the transcription start site. |
verbose | |
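The `upstream` and `downstream` arguments define promoter intervals around the transcription start site (TSS). The base-R sketch below shows the arithmetic. It assumes the promoter features follow the `GenomicRanges::promoters()` convention: 1-based closed intervals, with the TSS at the start coordinate on the plus strand and at the end coordinate on the minus strand. `promoter_range` is a hypothetical helper and, unlike GenomicRanges, does not trim coordinates at sequence boundaries.

```r
# Hypothetical helper: promoter interval of one transcript
# (assumes the GenomicRanges::promoters() convention)
promoter_range <- function(start, end, strand, upstream = 1000, downstream = 0) {
  if (strand == "-") {
    # Minus strand: the TSS is the transcript end coordinate
    c(start = end - downstream + 1, end = end + upstream)
  } else {
    # Plus strand: the TSS is the transcript start coordinate
    c(start = start - upstream, end = start + downstream - 1)
  }
}

promoter_range(5000, 6000, "+")  # start 4000, end 4999
promoter_range(5000, 6000, "-")  # start 6001, end 7000
```

With the defaults (`upstream = 1000`, `downstream = 0`), each promoter covers the 1000 bases immediately upstream of the TSS and none of the transcript body.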
The results are returned as a GRangesList where each component is a GRanges object containing the range set of one feature type. Intergenic ranges are assigned unique identifiers, which are recorded in the featuretype_id column of the metadata block. These identifiers are built by concatenating the IDs of the two adjacent genes with two underscores as separator. If the adjacent genes overlap with other genes, then the identifiers of those genes are included in the ID string as well, separated by a single underscore.
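The naming rule for intergenic ranges can be illustrated with plain string operations. In the sketch below, `intergenic_id` is a hypothetical helper and the gene IDs are only examples. The order in which overlapping gene IDs appear within one flank is an assumption.

```r
# Hypothetical helper: build an intergenic identifier from the genes that
# flank the region. Genes within one flank (the adjacent gene plus any genes
# overlapping it) are joined with "_"; the two flanks are joined with "__".
intergenic_id <- function(left_genes, right_genes) {
  paste(paste(left_genes, collapse = "_"),
        paste(right_genes, collapse = "_"),
        sep = "__")
}

intergenic_id("AT1G01010", "AT1G01020")
intergenic_id(c("AT1G01040", "AT1G01046"), "AT1G01050")
```

The first call yields `"AT1G01010__AT1G01020"`. The second yields `"AT1G01040_AT1G01046__AT1G01050"`, where the left flank includes an overlapping gene.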
Thomas Girke
transcripts and associated TxDb accessor functions from the GenomicFeatures package.
## Sample from txdbmaker package
library(txdbmaker)
gffFile <- system.file("extdata", "GFF3_files", "a.gff3", package="txdbmaker")
txdb <- makeTxDbFromGFF(file=gffFile, format="gff3", organism="Solanum lycopersicum")
feat <- genFeatures(txdb, featuretype="all", reduce_ranges=FALSE, upstream=1000, downstream=0)
## List extracted feature types
names(feat)
## Obtain feature lists by genes, here for promoter
split(feat$promoter, unlist(mcols(feat$promoter)$feature_by))
## Return all features in single GRanges object
unlist(feat)
## Not run:
## Sample from systemPipeRdata package
file <- system.file("extdata/annotation", "tair10.gff", package="systemPipeRdata")
txdb <- makeTxDbFromGFF(file=file, format="gff3", organism="Arabidopsis")
feat <- genFeatures(txdb, featuretype="all", reduce_ranges=TRUE, upstream=1000, downstream=0)
## End(Not run)
To test a sample population of genes for over-representation of GO terms, the core function GOHyperGAll computes an enrichment test based on the hypergeometric distribution for all nodes in the three GO networks (BP, CC and MF). It returns the corresponding raw and Bonferroni-corrected p-values. Subsequently, a filter function supports GO Slim analyses using default or custom GO Slim categories. Several convenience functions are provided to process large numbers of gene sets (e.g. clusters from partitioning results) and to visualize the results.
Note: GOHyperGAll provides similar utilities to the GOHyperG function in the GOstats package. The main difference is that GOHyperGAll simplifies the processing of large numbers of gene sets, as well as the usage of custom array-to-gene and gene-to-GO mappings.
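The per-node test described above can be sketched with base R's `phyper`. All counts below are made up. The number of tested hypotheses follows the `Nannot` description in the argument table: only nodes with at least `Nannot` direct sample annotations are counted. This is an illustration of the statistic, not the internal code of `GOHyperGAll`.

```r
# Hypothetical counts for one GO node (made up for illustration)
N <- 20000   # annotated genes in the universe
m <- 150     # universe genes annotated to this GO node
n <- 300     # genes in the sample set
k <- 12      # sample genes annotated to this GO node

# Over-representation p-value: P(X >= k) for X ~ Hypergeometric(m, N - m, n)
p_raw <- phyper(k - 1, m, N - m, n, lower.tail = FALSE)

# Number of tested hypotheses: GO nodes with at least Nannot direct
# annotations from the sample set (toy counts for six nodes)
Nannot <- 2
direct_counts <- c(12, 3, 1, 0, 5, 2)
n_tests <- sum(direct_counts >= Nannot)   # 4 nodes pass

# Bonferroni correction, capped at 1
p_bonf <- min(1, p_raw * n_tests)
c(raw = p_raw, bonferroni = p_bonf)
```

The expected number of annotated sample genes here is n * m / N = 2.25. Observing 12 therefore gives a very small raw p-value.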
## Generate gene-to-GO mappings and store as catDB object
makeCATdb(myfile, lib = NULL, org = "", colno = c(1, 2, 3), idconv = NULL, rootUK=FALSE)
## Enrichment function
GOHyperGAll(catdb, gocat = "MF", sample, Nannot = 2)
## GO slim analysis
GOHyperGAll_Subset(catdb, GOHyperGAll_result, sample = test_sample, type = "goSlim", myslimv)
## Reduce GO term redundancy
GOHyperGAll_Simplify(GOHyperGAll_result, gocat = "MF", cutoff = 0.001, correct = TRUE)
## Batch analysis of many gene sets
GOCluster_Report(catdb, setlist, id_type = "affy", method = "all", CLSZ = 10, cutoff = 0.001, gocats = c("MF", "BP", "CC"), myslimv = "default", correct = TRUE, recordSpecGO = NULL, ...)
## Bar plot of GOCluster_Report results
goBarplot(GOBatchResult, gocat)
myfile | File with gene-to-GO mappings. Sample files can be downloaded from geneontology.org (http://geneontology.org/GO.downloads.annotations.shtml) or from BioMart as shown in the example below. |
colno | Column numbers referencing in |
org | Optional argument. Currently, the only valid option is |
lib | If the gene-to-GO mappings are obtained from a |
idconv | Optional id conversion |
catdb | |
rootUK | If the argument |
sample | |
Nannot | Defines the minimum number of direct annotations per GO node from the sample set to determine the number of tested hypotheses for the p-value adjustment. |
gocat | Specifies the GO type; can be assigned one of the following character values: "MF", "BP" and "CC". |
GOHyperGAll_result | |
type | The function |
myslimv | Optional argument to provide custom |
cutoff | P-value cutoff for GO terms to show in the result |
correct | If |
setlist | |
id_type | Specifies the type of IDs in the input; can be assigned |
method | Specifies analysis type. Current options are |
CLSZ | Minimum gene set (cluster) size to consider. Gene sets below this cutoff will be ignored. |
gocats | Specifies the GO type; can be assigned the values "MF", "BP" and "CC". |
recordSpecGO | Argument to report in the result |
GOBatchResult | |
... | Additional arguments to pass on |
GOHyperGAll_Simplify: The result data frame from GOHyperGAll will often contain several connected GO terms with significant scores, which can complicate the interpretation of large sample sets. To reduce this redundancy, the function GOHyperGAll_Simplify subsets the data frame by a user-specified p-value cutoff and removes from it all GO nodes with overlapping children sets (OFFSPRING), while the best scoring nodes are retained in the result data.frame.
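The redundancy filter can be pictured as a greedy pass over the significant nodes. The sketch below is one plausible reading of the description, not the actual implementation, which works on the GO.db OFFSPRING mappings. It applies the p-value cutoff, then walks the nodes from best to worst score and drops any node whose offspring set overlaps that of an already retained node. The GO IDs, p-values and offspring sets are made up, and `simplify_go` is a hypothetical helper.

```r
# Toy enrichment result (GO IDs and p-values are made up)
res <- data.frame(GOID = c("GO:A", "GO:B", "GO:C", "GO:D"),
                  Padj = c(1e-6, 1e-5, 1e-4, 0.01),
                  stringsAsFactors = FALSE)
# Toy OFFSPRING sets, i.e. the descendants of each node in the GO graph
offspring <- list("GO:A" = c("GO:x1", "GO:x2", "GO:x3"),
                  "GO:B" = c("GO:x2", "GO:x4"),
                  "GO:C" = "GO:x5",
                  "GO:D" = "GO:x6")

# Hypothetical helper: after the cutoff, visit nodes from best to worst
# score and drop any node whose offspring overlap a node already kept
simplify_go <- function(res, offspring, cutoff) {
  res <- res[res$Padj <= cutoff, , drop = FALSE]   # p-value cutoff
  res <- res[order(res$Padj), , drop = FALSE]      # best scores first
  kept <- character(0)
  for (id in res$GOID) {
    overlaps <- vapply(kept, function(k)
      length(intersect(offspring[[id]], offspring[[k]])) > 0, logical(1))
    if (!any(overlaps)) kept <- c(kept, id)
  }
  res[res$GOID %in% kept, , drop = FALSE]
}

simplify_go(res, offspring, cutoff = 0.001)
```

`GO:D` is removed by the cutoff. `GO:B` is dropped because it shares an offspring node with the better-scoring `GO:A`.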
GOCluster_Report: performs the three types of GO term enrichment analyses in batch mode: GOHyperGAll, GOHyperGAll_Subset or GOHyperGAll_Simplify. It processes many gene sets (e.g. gene expression clusters) and returns the results conveniently organized in a single result data frame.
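Batch mode follows a simple pattern. Gene sets smaller than `CLSZ` are dropped, the same test is applied to each remaining set, and the per-set results are stacked into one data frame. The sketch below shows this pattern for a single made-up GO node. `batch_enrichment` and its output column names are hypothetical and do not match the columns returned by `GOCluster_Report`.

```r
# Toy data (made up): one GO node annotated to g1..g40 in a 1000-gene universe
universe <- paste0("g", 1:1000)
go_genes <- paste0("g", 1:40)

setlist <- list(
  clusterA = paste0("g", 1:25),     # strongly enriched for the node
  clusterB = paste0("g", 500:530),  # not enriched
  clusterC = paste0("g", 1:5)       # below the size cutoff
)

# Hypothetical helper: filter sets by size, test each, combine results
batch_enrichment <- function(setlist, go_genes, universe, CLSZ = 10) {
  setlist <- setlist[lengths(setlist) >= CLSZ]   # ignore small gene sets
  rows <- lapply(names(setlist), function(nm) {
    s <- setlist[[nm]]
    k <- length(intersect(s, go_genes))
    p <- phyper(k - 1, length(go_genes), length(universe) - length(go_genes),
                length(s), lower.tail = FALSE)
    data.frame(set = nm, size = length(s), hits = k, pvalue = p,
               stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

batch_res <- batch_enrichment(setlist, go_genes, universe, CLSZ = 10)
batch_res
```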
makeCATdb generates a catDB object from a file.
Thomas Girke
This workflow has been published in Plant Physiol (2008) 147, 41-57.
GOHyperGAll_Subset, GOHyperGAll_Simplify, GOCluster_Report, goBarplot
## Not run:
## Obtain annotations from BioMart
library(biomaRt)
listMarts() # To choose BioMart database
m <- useMart("ENSEMBL_MART_PLANT"); listDatasets(m)
m <- useMart("ENSEMBL_MART_PLANT", dataset="athaliana_eg_gene")
listAttributes(m) # Choose data types you want to download
go <- getBM(attributes=c("go_accession", "tair_locus", "go_namespace_1003"), mart=m)
go <- go[go[,3]!="",]; go[,3] <- as.character(go[,3])
write.table(go, "GOannotationsBiomart_mod.txt", quote=FALSE, row.names=FALSE, col.names=FALSE, sep="\t")
## Create catDB instance (takes a while but needs to be done only once)
catdb <- makeCATdb(myfile="GOannotationsBiomart_mod.txt", lib=NULL, org="", colno=c(1,2,3), idconv=NULL)
catdb
## Create catDB from Bioconductor annotation package
# catdb <- makeCATdb(myfile=NULL, lib="ath1121501.db", org="", colno=c(1,2,3), idconv=NULL)
## AffyID-to-GeneID mappings when working with AffyIDs
# affy2locusDF <- systemPipeR:::.AffyID2GeneID(map = "ftp://ftp.arabidopsis.org/home/tair/Microarrays/Affymetrix/affy_ATH1_array_elements-2010-12-20.txt", download=TRUE)
# catdb_conv <- makeCATdb(myfile="GOannotationsBiomart_mod.txt", lib=NULL, org="", colno=c(1,2,3), idconv=list(affy=affy2locusDF))
# systemPipeR:::.AffyID2GeneID(catdb=catdb_conv, affyIDs=c("244901_at", "244902_at"))
## Next time catDB can be loaded from file
save(catdb, file="catdb.RData")
load("catdb.RData")
## Perform enrichment test on single gene set
test_sample <- unique(as.character(catmap(catdb)$D_MF[1:100,"GeneID"]))
GOHyperGAll(catdb=catdb, gocat="MF", sample=test_sample, Nannot=2)[1:20,]
## GO Slim analysis by subsetting results accordingly
GOHyperGAll_result <- GOHyperGAll(catdb=catdb, gocat="MF", sample=test_sample, Nannot=2)
GOHyperGAll_Subset(catdb, GOHyperGAll_result, sample=test_sample, type="goSlim")
## Reduce GO term redundancy in 'GOHyperGAll_result'
simplifyDF <- GOHyperGAll_Simplify(GOHyperGAll_result, gocat="MF", cutoff=0.001, correct=TRUE)
# Returns the redundancy reduced data set.
data.frame(GOHyperGAll_result[GOHyperGAll_result[,1]
## Batch Analysis of Gene Clusters
testlist <- list(Set1=test_sample)
GOBatchResult <- GOCluster_Report(catdb=catdb, setlist=testlist, method="all", id_type="gene", CLSZ=10, cutoff=0.001, gocats=c("MF", "BP", "CC"), recordSpecGO=c("GO:0003674", "GO:0008150", "GO:0005575"))
## Plot 'GOBatchResult' as bar plot
goBarplot(GOBatchResult, gocat="MF")
## End(Not run)
Imports an R Markdown file as a workflow. Each R code chunk that carries the required chunk options becomes a step in the workflow. Including a particular code chunk in the workflow analysis requires a few extra settings in its R Markdown chunk options; please check Details.
importWF( sysargs, file_path, ignore_eval = TRUE, update = FALSE, confirm = FALSE, check_tool = !update, check_module = check_tool, verbose = TRUE )
sysargs | |
file_path | string, file path of the workflow file. |
ignore_eval | logical, treat all R chunks' |
update | logical, if you have changed the template and want to sync the new changes, turn |
confirm | logical, only useful when you combine |
check_tool | logical, whether to check that all tools required by this workflow are in PATH (callable). It uses the |
check_module | logical, whether to check that all required modules are available. To do so, a modular system has to be installed. If no modular system is installed, this check will be skipped, even if |
verbose | logical, print verbose messages while the function is running. |
To include a particular code chunk from the R Markdown file in the workflow analysis, please use the following code chunk options:
- spr = 'r': for code chunks with R code lines;
- spr = 'sysargs': for code chunks with a 'SYSargsList' object;
- spr.dep = <StepName>: to specify the previous dependency. If this option is not found, the previous step is added automatically as the dependency.
For spr = 'sysargs', the last object assigned needs to be the SYSargsList.
If the spr flag is not found, the R chunk will not be included in the workflow.
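Parsing these options from a chunk header is a small string-processing task. The sketch below extracts `spr` and `spr.dep` from a header such as `{r trim_step, spr='r', spr.dep='load_step'}` (shown without the opening backticks). `parse_spr_options` is a hypothetical helper and not the parser used by `importWF`. It only handles simple `key = 'value'` pairs without commas inside the values.

```r
# Hypothetical helper: read spr and spr.dep from an R Markdown chunk header
parse_spr_options <- function(header) {
  inner <- sub("^\\{r\\s*(.*)\\}\\s*$", "\\1", header)  # drop "{r" and "}"
  parts <- trimws(strsplit(inner, ",")[[1]])             # split on commas
  kv <- parts[grepl("=", parts, fixed = TRUE)]           # keep key=value pairs
  keys <- trimws(sub("=.*$", "", kv))
  vals <- trimws(sub("^[^=]*=", "", kv))
  vals <- gsub("^['\"]|['\"]$", "", vals)                # strip quotes
  opts <- setNames(as.list(vals), keys)
  # [[ on a list returns NULL for a missing name, i.e. "flag not found"
  list(spr = opts[["spr"]], spr.dep = opts[["spr.dep"]])
}

parse_spr_options("{r trim_step, spr='r', spr.dep='load_step'}")
parse_spr_options("{r plain_chunk, echo=FALSE}")  # no spr flag
```

A chunk whose `spr` entry comes back `NULL` would be skipped, matching the rule that chunks without the flag are not included in the workflow.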
A project must first be started with the SPRproject() function, and the resulting object is then used to populate the steps from the R Markdown file.
importWF returns an SYSargsList object.
Le Zhang and Daniela Cassol
file_path <- system.file("extdata/spr_simple_lw.Rmd", package="systemPipeR")
sal <- SPRproject(overwrite = TRUE)
sal <- importWF(sal, file_path)
"INTERSECTset"
Container for storing standard intersect results created by the overLapper function. The setlist slot stores the original label sets as vectors in a list; intersectmatrix organizes the label sets in a present-absent matrix; complexitylevels represents the number of comparisons considered for each comparison set as a vector of integers; and intersectlist contains the standard intersect vectors.
Objects can be created by calls of the form new("INTERSECTset", ...).
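The four slots can be related to each other with a small base-R computation. The sketch below builds standard (inclusive) intersects for every combination of label sets, a present-absent matrix with one row per combination, and a complexity level counting the sets in each combination. These are standard intersects, not the exclusive regions of a Venn diagram stored in a VENNset. The label sets, the `"-"` separator in combination names, and `standard_intersects` are hypothetical. The sketch assumes `complexitylevels` equals the number of sets in a combination.

```r
# Toy label sets (made up)
sets <- list(A = c("a", "b", "c", "d"),
             B = c("b", "c", "e"),
             C = c("c", "d", "e", "f"))

# Hypothetical helper: standard intersects for all combinations of sets,
# from single sets up to all sets at once
standard_intersects <- function(sets) {
  out <- list()
  for (lvl in seq_along(sets)) {
    for (cmb in combn(names(sets), lvl, simplify = FALSE)) {
      out[[paste(cmb, collapse = "-")]] <- Reduce(intersect, sets[cmb])
    }
  }
  out
}

intersect_list <- standard_intersects(sets)
# Present-absent matrix: one row per combination, one column per set
intersect_matrix <- t(vapply(strsplit(names(intersect_list), "-", fixed = TRUE),
                             function(x) as.integer(names(sets) %in% x),
                             integer(length(sets))))
dimnames(intersect_matrix) <- list(names(intersect_list), names(sets))
# Complexity level: number of sets taking part in each intersect
complexity <- rowSums(intersect_matrix)
lengths(intersect_list)
```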
setlist: Object of class "list": list of vectors
intersectmatrix: Object of class "matrix": binary matrix
complexitylevels: Object of class "integer": vector of integers
intersectlist: Object of class "list": list of vectors
signature(x = "INTERSECTset"): coerces INTERSECTset to list
signature(from = "list", to = "INTERSECTset"): as(list, "INTERSECTset")
signature(x = "INTERSECTset"): extracts data from complexitylevels slot
signature(x = "INTERSECTset"): extracts data from intersectlist slot
signature(x = "INTERSECTset"): extracts data from intersectmatrix slot
signature(x = "INTERSECTset"): returns number of original label sets
signature(x = "INTERSECTset"): extracts slot names
signature(x = "INTERSECTset"): extracts data from setlist slot
signature(object = "INTERSECTset"): summary view of INTERSECTset objects
Thomas Girke
overLapper, vennPlot, olBarplot, VENNset-class
showClass("INTERSECTset")
## Sample data
setlist <- list(A=sample(letters, 18), B=sample(letters, 16), C=sample(letters, 20), D=sample(letters, 22), E=sample(letters, 18), F=sample(letters, 22))
## Create INTERSECTset
interset <- overLapper(setlist[1:5], type="intersects")
class(interset)
## Accessor methods for VENNset/INTERSECTset objects
names(interset)
setlist(interset)
intersectmatrix(interset)
complexitylevels(interset)
intersectlist(interset)
## Coerce VENNset/INTERSECTset object to list
as.list(interset)
"LineWise"
S4 class container for storing R-based code for a workflow step. LineWise class instances are constructed by the LineWise function from the R-based code, step name, and dependency tree. When the container is built from an R Markdown file with the importWF function, two additional slots are populated: codeChunkStart and rmdPath. codeChunkStart stores the first line of each R chunk, and rmdPath stores the R Markdown file path.
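The `codeLine` slot stores code as an R `expression`. The base-R sketch below shows one way to turn an unevaluated `{ ... }` block into one expression per code line and evaluate those lines later in their own environment. `capture_code_lines` is a hypothetical helper that illustrates the idea; it is not how `LineWise` is implemented.

```r
# Hypothetical helper: capture an unevaluated { ... } block and split it
# into one expression per code line
capture_code_lines <- function(code) {
  expr <- substitute(code)
  if (is.call(expr) && identical(expr[[1]], as.name("{"))) {
    as.expression(as.list(expr)[-1])  # drop the `{` call itself
  } else {
    as.expression(list(expr))         # single expression, no braces
  }
}

code_lines <- capture_code_lines({
  log_out <- log(10)
  rounded <- round(log_out, 2)
})
length(code_lines)  # 2 code lines

# Evaluate the stored lines later, in a dedicated environment
env <- new.env()
for (i in seq_along(code_lines)) eval(code_lines[[i]], envir = env)
env$rounded
```

Keeping the code unevaluated until run time means a workflow step can be inspected, edited line by line, or re-run without re-parsing the source.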
## Constructor
LineWise(code, step_name = "default", codeChunkStart = integer(), rmdPath = character(), dependency="", run_step = "mandatory", run_session = "management", run_remote_resources = NULL)
code | R code separated either by a semi-colon ( |
step_name | character. Step name needs to be unique and is required when appending this step to the workflow. |
codeChunkStart | integer. R Markdown code chunk line start. This element will be populated when the object is built by |
rmdPath | character. Path of R Markdown file used by |
dependency | character. Dependency tree, required when appending this step to the workflow. Character name of a previous step in the workflow. Default is empty string |
run_step | character. Whether the step has the "mandatory" or "optional" flag for execution. |
run_session | character. Whether the step has the "management" or "compute" flag for execution. |
run_remote_resources | |
Objects can be created by calls of the form new("LineWise", ...).
codeLine: Object of class "expression" storing R-based code.
codeChunkStart: Object of class "integer" storing the start line in the rmdPath file, when the "LineWise" object is built from R Markdown.
stepName: Object of class "character" storing the step name.
dependency: Object of class "list" storing the dependency tree.
status: Object of class "list" storing the status of steps.
files: Object of class "list" storing the paths of the R Markdown file and of the file containing stdout and stderr after running the R-based code.
runInfo: Object of class "list" storing all the runInfo information of the workflow.
See 'Usage' for details on invocation.
Constructor:
Returns a LineWise object.
Accessors:
Printing method for the codeLine slot.
Extract start line of the R Markdown R chunk.
Extract R Markdown file path.
Extract the step name.
Extract the dependency tree.
Extract status of the step.
Extract log file path storing stdout and stderr after running step.
Replacement method for appending an R code line.
Replacement method for replacing an R code line.
Methods:
Return a new LineWise object made of the selected R code lines.
Extract the slot information from a LineWise object.
Replacement method for LineWise slots.
Extract slots elements by name.
Extract number of R-based code lines.
Extract slot names.
Summary view of LineWise elements.
signature(from = "LineWise", to = "list"): as(LineWise, "list")
signature(from = "list", to = "LineWise"): as(list, "LineWise")
Coerce back to list as(LineWise, "list")
Daniela Cassol
showClass("LineWise")
lw <- LineWise(code = {
  log_out <- log(10)
}, step_name = "R_log")
codeLine(lw)
## ImportWF option
file_path <- system.file("extdata/spr_simple_lw.Rmd", package="systemPipeR")
sal <- SPRproject(overwrite = TRUE)
# file_path <- "../inst/extdata/spr_simple_lw.Rmd"
sal <- importWF(sal, file_path)
sal <- runWF(sal)
lw2 <- sal$stepsWF[[2]]
lw2
names(lw2)
length(lw2)
## Accessors
codeLine(lw2)
codeChunkStart(lw2)
rmdPath(lw2)
stepName(lw2)
dependency(lw2)
status(lw2)
files(lw2)
## Replacement
appendCodeLine(lw2, after = 0) <- "log <- log(10)"
codeLine(lw2)
replaceCodeLine(lw2, 1) <- "plot(iris)"
codeLine(lw2)
## Coerce
lw2 <- linewise(lw2)
## OR
lw2 <- as(lw2, "list")
lw2 <- as(lw2, "LineWise")
These functions list/check whether required command-line tools/modules are installed in the PATH and are callable.
listCmdTools(sal, check_path = FALSE, check_module = FALSE)
listCmdModules(sal, check_module = FALSE)
sal | SPR workflow object in |
check_path | logical, whether to check if the required tools are in PATH. |
check_module | logical, whether to check if the required modules are installed. |
By default, neither function checks the existence of tools or modules; they only list the requirements.
Both functions print the list/check results as a data frame. The first column contains the names of the workflow steps that require certain tools/modules, and the second column contains the tool/module names. The third column is logical: if check_path = TRUE or check_module = TRUE, it is TRUE when the tool exists in PATH or in the modular system. Otherwise, the third column will be NA.
When both check_path = TRUE and check_module = TRUE are set for listCmdTools, the returned data frame still contains only the tool PATH checking results, not the module checking results. To obtain the module checking results, please use listCmdModules.
When the current workflow has no command-line (SYSargs) step, requires no module, or no modular system is installed, the return value will be NULL.
Both functions are run automatically when importWF is called.
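The PATH check can be approximated in base R with `Sys.which`, which returns an empty string for commands it cannot find. The sketch below mirrors the three-column layout described above: step name, tool name, and a logical column that stays `NA` unless checking is requested. The requirement table and `check_tools_in_path` are hypothetical, and this is not the code behind `listCmdTools`.

```r
# Hypothetical requirement table: workflow step -> command-line tool
requirements <- data.frame(
  step_name = c("hisat2_mapping", "fastqc", "custom_step"),
  tool      = c("hisat2", "fastqc", "no_such_tool_xyz"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: list requirements, optionally checking PATH
check_tools_in_path <- function(req, check_path = FALSE) {
  # Sys.which() returns "" for commands that are not found in PATH
  req$in_path <- if (check_path) unname(nzchar(Sys.which(req$tool))) else NA
  req
}

check_tools_in_path(requirements)                     # in_path column is NA
check_tools_in_path(requirements, check_path = TRUE)  # TRUE/FALSE per tool
```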
Le Zhang
importWF, module
# See examples of `importWF`
The constructor functions create an SYSargs2 S4 class object from three input files: a CWL param file, a CWL input file, and a simple tabular or yml targets file. The targets file is optional for workflow steps lacking input files. The CWL param file provides all the parameters required for running command-line software, following the standard and specification defined by the Common Workflow Language (CWL). The input file provides additional information for the command line, allowing each sample-level input/output file operation to use its own SYSargs2 instance. In the targets file, users can provide the paths to the initial sample input files (e.g. FASTQ) along with sample labels and, if appropriate, biological replicate and contrast information for controlling differential abundance analyses.
The renderWF function populates all the command lines for each sample in each step of the particular workflow. Each sample-level input/output file operation uses its own SYSargs2 instance. The output of an SYSargs2 object defines all the expected output files for each step in the workflow, which usually serve as the sample input for the next step in an SYSargs2 instance.
By chaining several SYSargs2 steps together, one can construct complex workflows involving many sample-level input/output file operations with any combination of command-line or R-based software. This connectivity between instances is established by the `appendStep<-` method. Please see SYSargsList-class for more details.
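A simplified view of rendering is string substitution. Each placeholder named in `inputvars` is replaced by the value of the corresponding targets column, producing one command line per sample. In the real workflow, the values are filled into the CWL input (yml) file and the CWL runner builds the command. The sketch below skips that layer. The template string, index name and file paths are made up, and `render_commands` is a hypothetical helper.

```r
# Made-up targets table and command template with input-variable placeholders
targets <- data.frame(
  SampleName = c("M1A", "M1B"),
  FileName   = c("data/SRR446027_1.fastq.gz", "data/SRR446028_1.fastq.gz"),
  stringsAsFactors = FALSE
)
template  <- "hisat2 -x ref_index -U _FASTQ_PATH1_ -S results/_SampleName_.sam"
inputvars <- c(FileName = "_FASTQ_PATH1_", SampleName = "_SampleName_")

# Hypothetical helper: one rendered command line per targets row
render_commands <- function(template, targets, inputvars) {
  cmds <- vapply(seq_len(nrow(targets)), function(i) {
    cmd <- template
    for (col in names(inputvars)) {
      # Replace the placeholder literally (fixed = TRUE, no regex)
      cmd <- gsub(inputvars[[col]], targets[[col]][i], cmd, fixed = TRUE)
    }
    cmd
  }, character(1))
  setNames(cmds, targets$SampleName)
}

cmds <- render_commands(template, targets, inputvars)
cmds
```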
loadWorkflow(targets = NULL, wf_file, input_file, dir_path = "param/cwl", id = "SampleName")
renderWF(WF, inputvars = NULL)
updateWF(WF, write.yaml=FALSE, name.yaml="default", new_targets=NULL, new_targetsheader=NULL, inputvars=NULL, silent=FALSE)
targets | either the path to |
wf_file | name and path to |
input_file | name and path to |
dir_path | path to the |
id | A column from |
WF | Object of class |
inputvars | named character vector. Variables defined in the |
write.yaml | logical. If set to |
name.yaml | name and path to |
new_targets | new targets files as list. 'targets' |
new_targetsheader | character. New header/comment lines of targets file. Default is |
silent | If set to |
SYSargs2 object.
Daniela Cassol and Thomas Girke
showClass("SYSargs2")
## Construct SYSargs2 object from CWL param, CWL input, and targets files
targets <- system.file("extdata", "targets.txt", package="systemPipeR")
dir_path <- system.file("extdata/cwl", package="systemPipeR")
WF <- loadWorkflow(targets=targets, wf_file="hisat2/hisat2-mapping-se.cwl", input_file="hisat2/hisat2-mapping-se.yml", dir_path=dir_path)
WF <- renderWF(WF, inputvars=c(FileName="_FASTQ_PATH1_", SampleName="_SampleName_"))
WF
## If required to update the object
yamlinput(WF, "thread") <- 6L
WF <- updateWF(WF)
cmdlist(WF)[1]
yamlinput(WF)$thread
Merges BAM files based on sample groupings provided by a factor, internally using the mergeBam function from the Rsamtools package. The function also returns an updated SYSargs or SYSargs2 object containing the paths to the merged BAM files as well as to the unmerged BAM files, if there are any. All rows of merged parent samples are removed. When a named character vector is provided as input, a targets-style data.frame containing the paths to the merged BAM files is returned instead.
The functionality provided by mergeBamByFactor is useful for experiments where pooling of replicates is advantageous to maximize the depth of read coverage, such as prior to peak calling in ChIP-Seq or miRNA gene prediction experiments.
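The grouping and naming logic can be previewed without touching any files. The sketch below groups BAM paths by a factor and applies the naming convention documented under the return value, <first_BAM_file_name>_<grouping_label_of_factor>.bam, placing merged files in `out_dir`. Singleton groups keep their original paths, matching the statement that unmerged samples remain unchanged. The paths and factor values are made up, and `plan_bam_merge` is a hypothetical helper that does not call `Rsamtools::mergeBam`.

```r
# Made-up BAM paths and their grouping factor (e.g. the Factor column of targets)
bam_files   <- c("results/M1A.bam", "results/M1B.bam", "results/A1A.bam", "results/V12A.bam")
mergefactor <- c("M1", "M1", "A1", "V12")

# Hypothetical helper: plan output paths for merging, one per group
plan_bam_merge <- function(bam_files, mergefactor,
                           out_dir = file.path("results", "merge_bam")) {
  groups <- split(bam_files, factor(mergefactor, levels = unique(mergefactor)))
  vapply(names(groups), function(label) {
    files <- groups[[label]]
    if (length(files) == 1) return(files)   # unmerged samples stay unchanged
    first <- sub("\\.bam$", "", basename(files[1]))
    file.path(out_dir, paste0(first, "_", label, ".bam"))
  }, character(1))
}

merge_plan <- plan_bam_merge(bam_files, mergefactor)
merge_plan
```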
mergeBamByFactor(args, targetsDF = NULL, mergefactor = "Factor", out_dir = file.path("results", "merge_bam"), overwrite = FALSE, silent = FALSE, ...)
args | An instance of |
targetsDF | This argument is required when |
mergefactor | |
out_dir | The directory path to store merged bam files. Default uses |
overwrite | If |
silent | If |
... | To pass on additional arguments to the internally used |
The merged BAM files will be written to output files with the following naming convention: <first_BAM_file_name>_<grouping_label_of_factor>.<bam>. In addition, the function returns an updated SYSargs or SYSargs2 object where all output file paths contain the paths to the merged BAM files. When a named character vector is provided as input, a targets-style data.frame containing the paths to the merged BAM files is returned instead. The rows of the merged parent samples are removed, and the rows of the unmerged samples remain unchanged.
Thomas Girke
writeTargetsout, writeTargetsRef
## Construct initial SYSargs object
targetspath <- system.file("extdata", "targets_chip.txt", package="systemPipeR")
parampath <- system.file("extdata", "bowtieSE.param", package="systemPipeR")
args <- systemArgs(sysma=parampath, mytargets=targetspath)
## Not run:
## After running alignments (e.g. with Bowtie2) generate targets file
## for the corresponding BAM files. The alignment step is skipped here.
writeTargetsout(x=args, file="targets_bam.txt", overwrite=TRUE)
args <- systemArgs(sysma=NULL, mytargets="targets_bam.txt")
## Merge BAM files and return updated SYSargs object
args_merge <- mergeBamByFactor(args, overwrite=TRUE, silent=FALSE)
## Export modified targets file
writeTargetsout(x=args_merge, file="targets_mergeBamByFactor.txt", overwrite=TRUE)
## End(Not run)
The function module enables use of the Environment Modules system (http://modules.sourceforge.net/) from within the R environment. The user's login shell environment (i.e. bash -l) will be used to initialize the current session. The module function can also load or unload specific software, list all the loaded software within the current session, and list all the applications available for loading from the module system. Lastly, the module function can remove all loaded software from the current session.
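Environment Modules records the currently loaded modules in the `LOADEDMODULES` environment variable as a colon-separated list. The base-R sketch below reads that variable to list loaded software in the current R session. `loaded_modules` is a hypothetical helper, not how `modulelist()` is implemented. A `module load` run in a child process through `system()` would not change the R process's environment, which is why loading has to update R's own environment.

```r
# Hypothetical helper: list modules recorded in LOADEDMODULES
loaded_modules <- function() {
  x <- Sys.getenv("LOADEDMODULES", unset = "")
  if (!nzchar(x)) return(character(0))      # nothing loaded or no module system
  strsplit(x, ":", fixed = TRUE)[[1]]       # colon-separated module names
}

loaded_modules()
```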
module(action_type, module_name = NULL)
moduleload(module_name)
moduleUnload(module_name)
modulelist()
moduleAvail()
moduleClear()
moduleInit()
action_type | Name of the action to be executed as character vector. The following switches are accepted: |
module_name | Name of software to load as character vector. Examples: |
Partial failure also results in 'FALSE'. For example, when "load" is called for two modules and one succeeds while the other fails, the return value is 'FALSE'. The "unload" action always returns 'TRUE', even if the module is not loaded at all or not found.
Tyler Backman, Jordan Hayes and Thomas Girke
## Not run:
## List all available software from the module system
avail <- moduleAvail()
## List loaded software in the current session
modulelist()
## Example for loading a software into the shell environment
moduleload("hisat2")
moduleload("hisat2/2.2.1")
## Example for removing software from the shell environment
moduleUnload("hisat2")
## Clear all of the software from the shell's initialization files
moduleClear()
## List and load all the software loaded in users default login shell into the current session (default)
moduleInit()
## End(Not run)
Generates bar plots of the intersect counts of VENNset and INTERSECTset objects generated by the overLapper function. It is an alternative to Venn diagrams (e.g. vennPlot) that scales to larger numbers of label sets. By default, the bars in the plot are colored and grouped by the complexity levels of the intersect sets.
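To illustrate grouping by complexity level, the following base R sketch takes toy intersect counts whose names join set labels with "_" (the default sep of overLapper), derives each intersect's complexity level from the number of labels, drops bars below a mincount threshold and draws a bar plot colored by level. The counts are made up, treating mincount as an inclusive lower bound is an assumption, and olBarplot's own implementation may differ.

```r
# Toy intersect counts keyed by set-label combinations joined with "_"
# (made-up input; overLapper() would normally produce these).
counts <- c(A = 5, B = 3, C = 0, A_B = 4, A_C = 2, B_C = 1, A_B_C = 6)

# Complexity level = number of sets participating in each intersect
complexity <- lengths(strsplit(names(counts), "_", fixed = TRUE))

# Assumed mincount semantics: keep intersects with at least this many items
mincount <- 1
keep <- counts >= mincount
counts <- counts[keep]
complexity <- complexity[keep]

# Order bars by complexity level (order() is stable) and color by level
ord <- order(complexity)
pal <- c("grey70", "steelblue", "firebrick")
barplot(counts[ord], col = pal[complexity[ord]],
        xlab = "Intersect", ylab = "Counts", las = 2)
```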
olBarplot(x, mincount = 0, complexity="default", myxlabel = "default", myylabel="Counts", mytitle = "default", ...)
x |
Object of class |
mincount |
Sets minimum number of counts to consider in the bar plot. Default |
complexity |
Allows user to limit the bar plot to specific complexity levels of intersects
by specifying the chosen ones with an integer vector. Default
|
myxlabel |
Defines label of x-axis. |
myylabel |
Defines label of y-axis. |
mytitle |
Defines main title of plot. |
... |
Allows passing additional arguments to |
Bar plot.
The functions provided here are an extension of the Venn diagram resources on this site: http://manuals.bioinformatics.ucr.edu/home/R_BioCondManual#TOC-Venn-Diagrams
Thomas Girke
overLapper, vennPlot
## Sample data: list of vectors with object labels
setlist <- list(A=sample(letters, 18), B=sample(letters, 16),
                C=sample(letters, 20), D=sample(letters, 22),
                E=sample(letters, 18), F=sample(letters, 22))
## 2-way Venn diagram
vennset <- overLapper(setlist[1:2], type="vennsets")
vennPlot(vennset)
## 3-way Venn diagram
vennset <- overLapper(setlist[1:3], type="vennsets")
vennPlot(vennset)
## 4-way Venn diagram
vennset <- overLapper(setlist[1:4], type="vennsets")
vennPlot(list(vennset, vennset))
## Pseudo 4-way Venn diagram with circles
vennPlot(vennset, type="circle")
## 5-way Venn diagram
vennset <- overLapper(setlist[1:5], type="vennsets")
vennPlot(vennset)
## Alternative Venn count input to vennPlot (not recommended!)
counts <- sapply(vennlist(vennset), length)
vennPlot(counts)
## 6-way Venn comparison as bar plot
vennset <- overLapper(setlist[1:6], type="vennsets")
olBarplot(vennset, mincount=1)
## Bar plot of standard intersect counts
interset <- overLapper(setlist, type="intersects")
olBarplot(interset, mincount=1)
## Accessor methods for VENNset/INTERSECTset objects
names(vennset)
names(interset)
setlist(vennset)
intersectmatrix(vennset)
complexitylevels(vennset)
vennlist(vennset)
intersectlist(interset)
## Coerce VENNset/INTERSECTset object to list
as.list(vennset)
as.list(interset)
## Pairwise intersect matrix and heatmap
olMA <- sapply(names(setlist), function(x)
    sapply(names(setlist), function(y) sum(setlist[[x]] %in% setlist[[y]])))
olMA
heatmap(olMA, Rowv=NA, Colv=NA)
## Presence-absence matrices for large numbers of sample sets
interset <- overLapper(setlist=setlist, type="intersects", complexity=2)
(paMA <- intersectmatrix(interset))
heatmap(paMA, Rowv=NA, Colv=NA, col=c("white", "gray"))
Function for identifying consensus peaks between two peak sets that share a minimum relative overlap.
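The notion of relative overlap can be made concrete with a small base R sketch for a single pair of intervals. `rel_overlap()` is a hypothetical helper; defining relative overlap as the shared width divided by the width of each interval, and using 0.5 as a consensus threshold, are assumptions for illustration rather than the documented behavior of olRanges. The numbers mirror the olRanges example, in which subject ranges are query ranges of width 30 shifted by 5 bases.

```r
# Hypothetical helper: overlap statistics for two closed intervals [start, end]
rel_overlap <- function(q_start, q_end, s_start, s_end) {
  # Number of shared bases (0 if the intervals do not overlap)
  ol <- max(0, min(q_end, s_end) - max(q_start, s_start) + 1)
  c(overlap = ol,
    frac_query = ol / (q_end - q_start + 1),
    frac_subject = ol / (s_end - s_start + 1))
}

# Query 1-30 vs. subject shifted by 5 bases (6-35): 25 shared bases
r <- rel_overlap(1, 30, 6, 35)
r
# Assumed consensus rule: both fractions reach a chosen threshold
all(r[c("frac_query", "frac_subject")] >= 0.5)
```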
olRanges(query, subject, output = "gr")
query |
Object of class |
subject |
Object of class |
output |
By default |
Thomas Girke
## Sample Data Sets
grq <- GRanges(seqnames = Rle(c("chr1", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)),
               ranges = IRanges::IRanges(seq(1, 100, by=10), end = seq(30, 120, by=10)),
               strand = Rle(strand(c("-", "+", "-")), c(1, 7, 2)))
grs <- shift(grq[c(2,5,6)], 5)
## Run olRanges function
olRanges(query=grq, subject=grs, output="df")
olRanges(query=grq, subject=grs, output="gr")
SYSargs2
object
After all command lines have been executed with the runCommandline function, the output files may have been written to step-specific directories rather than to a single results directory. In addition, the runCommandline function can convert SAM file outputs to sorted and indexed BAM files. The output_update function therefore updates the locations of these files in the output of the SYSargs2 object.
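The extension update that output_update applies to SAM outputs can be pictured with a plain character vector. This base R sketch is only an illustration: `swap_extension()` is a hypothetical helper operating on file paths, whereas output_update works on a SYSargs2 object. Paths without the old extension are left unchanged, and the example file names are made up.

```r
# Hypothetical helper: replace a trailing file extension, e.g. ".sam" -> ".bam"
swap_extension <- function(paths, from, to) {
  hit <- endsWith(paths, from)           # literal suffix match, no regex
  if (any(hit)) {
    stem <- substr(paths[hit], 1, nchar(paths[hit]) - nchar(from))
    paths[hit] <- paste0(stem, to)
  }
  paths
}

outs <- c("results/M1A.sam", "results/M1B.sam", "results/M1A.log")
swap_extension(outs, ".sam", ".bam")
```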
output_update(args, dir = FALSE, dir.name = NULL, replace = FALSE, extension = NULL, make_bam=FALSE, del_sam=TRUE)
args |
object of class |
dir |
assign |
dir.name |
if the results directory name is not specified in the |
replace |
replace the extension for selected output files in the |
extension |
object of class |
make_bam |
Auto-detects SAM file outputs and updates them on the |
del_sam |
This option allows deleting the SAM files created when the |
SYSargs2 object with updated output file locations.
Daniela Cassol and Thomas Girke
To check the results directory name in the input file, use yamlinput(WF)$results_path$path.
## Construct SYSargs2 object from CWL param, CWL input, and targets files
targets <- system.file("extdata", "targets.txt", package="systemPipeR")
dir_path <- system.file("extdata/cwl", package="systemPipeR")
WF <- loadWorkflow(targets=targets, wf_file="hisat2/hisat2-mapping-se.cwl",
                   input_file="hisat2/hisat2-mapping-se.yml", dir_path=dir_path)
WF <- renderWF(WF, inputvars=c(FileName="_FASTQ_PATH1_", SampleName="_SampleName_"))
WF
output(WF)
## Not run:
runCommandline(args=WF, make_bam=TRUE)
## Output paths update
WF <- output_update(WF, dir=FALSE, replace=TRUE, extension=c(".sam", ".bam"))
runCommandline(args=WF, make_bam=TRUE, dir=TRUE)
## Output paths update
WF <- output_update(WF, dir=TRUE, replace=TRUE, extension=c(".sam", ".bam"))
## End(Not run)
Function for computing Venn intersects or standard intersects among large numbers of label sets provided as a list of vectors. The resulting intersect objects can be used for plotting 2-5 way Venn diagrams or intersect bar plots using the functions vennPlot or olBarplot, respectively.

The overLapper function scales to 2-20 or more label vectors for Venn intersect calculations and to much larger sample numbers for standard intersects. The different intersect types are explained below under the definition of the type argument. The upper Venn limit of around 20 label sets is unavoidable because the number of Venn intersects increases exponentially with the number of label sets n according to the relationship 2^n - 1. The current implementation of the plotting function vennPlot supports Venn diagrams for 2-5 label sets. To visually analyze larger numbers of label sets, a variety of intersect methods are introduced in the olBarplot help file. These methods are much more scalable than Venn diagrams, but lack their restrictive intersect logic.
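The 2^n - 1 relationship follows from the fact that each item in a Venn analysis falls into exactly one region, defined by the non-empty combination of sets that contain it. The base R sketch below computes such exclusive Venn regions for a small setlist. `venn_regions()` is a hypothetical helper written for illustration, not the overLapper implementation, and it assumes named sets whose labels are joined with "_" as with the default sep.

```r
# Hypothetical helper: exclusive Venn regions for a named list of label sets
venn_regions <- function(setlist, sep = "_") {
  setlist <- lapply(setlist, unique)                     # drop duplicates
  items <- unique(unlist(setlist, use.names = FALSE))
  # Presence-absence matrix: rows = items, columns = sets
  pa <- do.call(cbind, lapply(setlist, function(s) items %in% s))
  # Each item belongs to exactly one region: the sets that contain it
  region <- apply(pa, 1, function(r) paste(colnames(pa)[r], collapse = sep))
  # All 2^n - 1 non-empty set combinations, ordered by complexity level
  n <- length(setlist)
  combos <- unlist(lapply(seq_len(n), function(k)
    apply(combn(names(setlist), k), 2, paste, collapse = sep)))
  split(items, factor(region, levels = combos))          # keeps empty regions
}

set.seed(1)
setlist <- list(A = sample(letters, 18), B = sample(letters, 16), C = sample(letters, 20))
vl <- venn_regions(setlist)
length(vl)            # 2^3 - 1 = 7 regions
sapply(vl, length)    # items per region
2^c(5, 10, 20) - 1    # 31, 1023 and 1048575 regions
```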
overLapper(setlist, complexity = "default", sep = "_", cleanup = FALSE, keepdups = FALSE, type)
setlist |
Object of class |
complexity |
Complexity level of intersects specified as integer vector. For Venn intersects
it needs to be assigned |
sep |
Character used to separate set labels. |
cleanup |
If set to |
keepdups |
By default all duplicates are removed from the label sets. The setting
|
type |
With the default setting |
Additional Venn diagram resources are provided by the packages limma, gplots, vennerable, eVenn and VennDiagram, or by online resources such as shapes, Venn Diagram Generator and Venny.
overLapper returns standard intersect and Venn intersect results as INTERSECTset or VENNset objects, respectively. These S4 objects contain the following components:
setlist |
Original label sets accessible with |
intersectmatrix |
Presence-absence matrix accessible with |
complexitylevels |
Complexity levels accessible with |
vennlist |
Venn intersects for |
intersectlist |
Standard intersects for |
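In contrast to the exclusive Venn regions, standard intersects keep every item shared by a given combination of sets, regardless of membership in the remaining sets. This base R sketch builds a presence-absence matrix (the kind of information stored in intersectmatrix) and the complexity-2 standard intersects for three toy sets. All object names are hypothetical, and the layout of the package's own matrices may differ.

```r
# Toy label sets (made-up data)
setlist <- list(A = c("a", "b", "c", "d"), B = c("b", "c", "e"), C = c("c", "d", "e"))

# Presence-absence matrix: 1 if an item occurs in a set, 0 otherwise
items <- sort(unique(unlist(setlist, use.names = FALSE)))
pa <- sapply(setlist, function(s) as.integer(items %in% s))
rownames(pa) <- items
pa

# Standard intersects at complexity 2: items shared by each pair of sets,
# ignoring whether they also occur in the third set
pairs <- combn(names(setlist), 2)
std2 <- setNames(
  lapply(seq_len(ncol(pairs)), function(j) Reduce(intersect, setlist[pairs[, j]])),
  apply(pairs, 2, paste, collapse = "_"))
std2
```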
The functions provided here are an extension of the Venn diagram resources on this site: http://manuals.bioinformatics.ucr.edu/home/R_BioCondManual#TOC-Venn-Diagrams
Thomas Girke
See examples in 'The Electronic Journal of Combinatorics': http://www.combinatorics.org/files/Surveys/ds5/VennSymmExamples.html
vennPlot, olBarplot
## Sample data
setlist <- list(A=sample(letters, 18), B=sample(letters, 16),
                C=sample(letters, 20), D=sample(letters, 22),
                E=sample(letters, 18), F=sample(letters, 22))
## 2-way Venn diagram
vennset <- overLapper(setlist[1:2], type="vennsets")
vennPlot(vennset)
## 3-way Venn diagram
vennset <- overLapper(setlist[1:3], type="vennsets")
vennPlot(vennset)
## 4-way Venn diagram
vennset <- overLapper(setlist[1:4], type="vennsets")
vennPlot(list(vennset, vennset))
## Pseudo 4-way Venn diagram with circles
vennPlot(vennset, type="circle")
## 5-way Venn diagram
vennset <- overLapper(setlist[1:5], type="vennsets")
vennPlot(vennset)
## Alternative Venn count input to vennPlot (not recommended!)
counts <- sapply(vennlist(vennset), length)
vennPlot(counts)
## 6-way Venn comparison as bar plot
vennset <- overLapper(setlist[1:6], type="vennsets")
olBarplot(vennset, mincount=1)
## Bar plot of standard intersect counts
interset <- overLapper(setlist, type="intersects")
olBarplot(interset, mincount=1)
## Accessor methods for VENNset/INTERSECTset objects
names(vennset)
names(interset)
setlist(vennset)
intersectmatrix(vennset)
complexitylevels(vennset)
vennlist(vennset)
intersectlist(interset)
## Coerce VENNset/INTERSECTset object to list
as.list(vennset)
as.list(interset)
## Pairwise intersect matrix and heatmap
olMA <- sapply(names(setlist), function(x)
    sapply(names(setlist), function(y) sum(setlist[[x]] %in% setlist[[y]])))
olMA
heatmap(olMA, Rowv=NA, Colv=NA)
## Presence-absence matrices for large numbers of sample sets
interset <- overLapper(setlist=setlist, type="intersects", complexity=2)
(paMA <- intersectmatrix(interset))
heatmap(paMA, Rowv=NA, Colv=NA, col=c("white", "gray"))
Plots the three tabular data types (A-C) generated by the featureCoverage function. It accepts data from single or many features (e.g. CDSs) and samples (BAM files). The coverage from multiple features is summarized using methods such as mean, while the data from multiple samples are plotted in separate panels.
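The summarization and scaling steps can be sketched in base R on a toy matrix of binned coverage (rows are features, columns are bins). The numbers and the total of aligned reads are made up, and computing normalized values as count divided by the total aligned reads times scale_count_val is an assumption based on the scale_count_val description; featureCoverage and plotfeatureCoverage may organize their data differently.

```r
# Toy binned coverage for one sample: rows = features, columns = bins
cov <- matrix(c(10, 20, 30, 40,
                 0, 10, 10, 20,
                 5,  5, 15, 15), nrow = 3, byrow = TRUE,
              dimnames = list(c("tx1", "tx2", "tx3"), paste0("bin", 1:4)))
total_aligned <- 2e6    # hypothetical number of aligned reads in this sample
scale_count_val <- 1e6  # counts per million aligned reads

# Summarize across features with a user-chosen function (here mean)
profile <- apply(cov, 2, mean)
# Normalize to a fixed number of aligned reads
profile_cpm <- profile / total_aligned * scale_count_val
profile_cpm
```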
plotfeatureCoverage(covMA, method = mean, scales = "fixed", extendylim=2, scale_count_val = 10^6)
covMA |
Object of class |
method |
Defines the summary statistics to use when |
scales |
Scales setting passed on to the |
extendylim |
Extends the upper limit of the y-axis when |
scale_count_val |
Scales (normalizes) the read counts to a fixed value of aligned reads
in each sample such as counts per million aligned reads (default is 10^6).
For this calculation the |
Currently, the function returns ggplot2 bar plot graphics.
Thomas Girke
featureCoverage
## Construct SYSargs2 object from param and targets files
targets <- system.file("extdata", "targets.txt", package="systemPipeR")
dir_path <- system.file("extdata/cwl", package="systemPipeR")
args <- loadWorkflow(targets=targets, wf_file="hisat2/hisat2-mapping-se.cwl",
                     input_file="hisat2/hisat2-mapping-se.yml", dir_path=dir_path)
args <- renderWF(args, inputvars=c(FileName="_FASTQ_PATH1_", SampleName="_SampleName_"))
args
## Not run:
## Run alignments
args <- runCommandline(args, dir = FALSE, make_bam = TRUE)
outpaths <- subsetWF(args, slot = "output", subset = 1, index = 1)
## Features from sample data of systemPipeRdata package
library(txdbmaker)
file <- system.file("extdata/annotation", "tair10.gff", package="systemPipeRdata")
txdb <- makeTxDbFromGFF(file=file, format="gff3", organism="Arabidopsis")
## (A) Generate binned coverage for two BAM files and 4 transcripts
grl <- cdsBy(txdb, "tx", use.names=TRUE)
fcov <- featureCoverage(bfl=BamFileList(outpaths[1:2]), grl=grl[1:4], resizereads=NULL,
                        readlengthrange=NULL, Nbins=20, method=mean, fixedmatrix=FALSE,
                        resizefeatures=TRUE, upstream=20, downstream=20,
                        outfile="results/featureCoverage.xls")
plotfeatureCoverage(covMA=fcov, method=mean, scales="fixed", scale_count_val=10^6)
## (B) Coverage matrix upstream and downstream of start/stop codons
fcov <- featureCoverage(bfl=BamFileList(outpaths[1:2]), grl=grl[1:4], resizereads=NULL,
                        readlengthrange=NULL, Nbins=NULL, method=mean, fixedmatrix=TRUE,
                        resizefeatures=TRUE, upstream=20, downstream=20,
                        outfile="results/featureCoverage_UpDown.xls")
plotfeatureCoverage(covMA=fcov, method=mean, scales="fixed", scale_count_val=10^6)
## (C) Combined matrix for both binned and start/stop codon
fcov <- featureCoverage(bfl=BamFileList(outpaths[1:2]), grl=grl[1:4], resizereads=NULL,
                        readlengthrange=NULL, Nbins=20, method=mean, fixedmatrix=TRUE,
                        resizefeatures=TRUE, upstream=20, downstream=20,
                        outfile="results/test.xls")
plotfeatureCoverage(covMA=fcov, method=mean, scales="fixed", scale_count_val=10^6)
## (D) Rle coverage objects one for each query feature
fcov <- featureCoverage(bfl=BamFileList(outpaths[1:2]), grl=grl[1:4], resizereads=NULL,
                        readlengthrange=NULL, Nbins=NULL, method=mean, fixedmatrix=FALSE,
                        resizefeatures=TRUE, upstream=20, downstream=20,
                        outfile="results/RleCoverage.xls")
## End(Not run)
Function to visualize the distribution of reads across different feature types for many alignment files in parallel. The plots are stacked bar plots representing the raw or normalized read counts for the sense and antisense strand of each feature. The graphics are generated with ggplot2. Typically, the expected input is generated with the affiliated featuretypeCounts function.
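The stacked layout can be approximated in base graphics with a two-row matrix (sense and antisense) per feature type. The counts and the aligned-read total below are toy values, the counts-per-million scaling assumes that scale_count_val divides by the sample's aligned reads, and plotfeaturetypeCounts itself uses ggplot2 rather than barplot(), so this is only a sketch of the idea.

```r
# Toy read counts per feature type and strand (made-up numbers)
counts <- rbind(sense     = c(promoter = 120, cds = 900, intron = 150, utr5 = 60),
                antisense = c(promoter = 100, cds =  40, intron = 130, utr5 = 10))
total_aligned <- 5e5   # hypothetical number of aligned reads for this sample

# Normalize to counts per million aligned reads
cpm <- counts / total_aligned * 1e6

# A matrix input makes barplot() stack the two strands within each feature
barplot(cpm, legend.text = rownames(cpm), ylab = "Reads per million aligned",
        col = c("steelblue", "grey70"))
```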
plotfeaturetypeCounts(x, graphicsfile, graphicsformat = "pdf", scales = "fixed", anyreadlength = FALSE, drop_N_total_aligned = TRUE, scale_count_val = 10^6, scale_length_val = NULL)
x |
|
graphicsfile |
Path to the file where the output graphics are written. Note, the function also returns
the graphics instructions from |
graphicsformat |
Graphics file format. Currently supported formats are: pdf, png or jpeg. The argument accepts one of them as a character string. |
scales |
Scales setting passed on to the |
anyreadlength |
If set to |
drop_N_total_aligned |
If set to |
scale_count_val |
Scales (normalizes) the read counts to a fixed value of aligned reads
in each sample such as counts per million aligned reads (default is 10^6).
For this calculation the |
scale_length_val |
Adjusts the raw or scaled read counts to a constant length interval
(e.g. |
The function returns bar plot graphics for aligned read counts with read-length resolution if the input contains this information and the argument anyreadlength is set to FALSE. If the input contains counts for any read length and/or anyreadlength=TRUE, there will be only one bar per feature and sample. Due to the complexity of the plots, the results are written directly to file in the chosen graphics format. However, the function also returns the plotting instructions generated by ggplot2 for displaying the result components in R's plotting device.
Thomas Girke
featuretypeCounts, genFeatures
## Construct SYSargs2 object from param and targets files
targets <- system.file("extdata", "targets.txt", package="systemPipeR")
dir_path <- system.file("extdata/cwl", package="systemPipeR")
args <- loadWorkflow(targets=targets, wf_file="hisat2/hisat2-mapping-se.cwl",
                     input_file="hisat2/hisat2-mapping-se.yml", dir_path=dir_path)
args <- renderWF(args, inputvars=c(FileName="_FASTQ_PATH1_", SampleName="_SampleName_"))
args
## Not run:
## Run alignments
args <- runCommandline(args, dir = FALSE, make_bam = TRUE)
outpaths <- subsetWF(args, slot = "output", subset = 1, index = 1)
## Features from sample data of systemPipeRdata package
library(txdbmaker)
file <- system.file("extdata/annotation", "tair10.gff", package="systemPipeRdata")
txdb <- makeTxDbFromGFF(file=file, format="gff3", organism="Arabidopsis")
feat <- genFeatures(txdb, featuretype="all", reduce_ranges=TRUE, upstream=1000,
                    downstream=0, verbose=TRUE)
## Generate and plot feature counts for specific read lengths
fc <- featuretypeCounts(bfl=BamFileList(outpaths, yieldSize=50000), grl=feat,
                        singleEnd=TRUE, readlength=c(74:76,99:102), type="data.frame")
p <- plotfeaturetypeCounts(x=fc, graphicsfile="featureCounts.pdf", graphicsformat="pdf",
                           scales="fixed", anyreadlength=FALSE)
## Generate and plot feature counts for any read length
fc2 <- featuretypeCounts(bfl=BamFileList(outpaths, yieldSize=50000), grl=feat,
                         singleEnd=TRUE, readlength=NULL, type="data.frame")
p2 <- plotfeaturetypeCounts(x=fc2, graphicsfile="featureCounts2.pdf", graphicsformat="pdf",
                            scales="fixed", anyreadlength=TRUE)
## End(Not run)
Visualize SPR workflow and status. plotWF is the general function that creates the plot. plotwfOutput and renderPlotwf are used in the Shiny UI and server, respectively, similar to plotOutput and renderPlot.
plotWF(
  sysargs, width = NULL, height = NULL, elementId = NULL, responsive = TRUE,
  branch_method = "auto", branch_no = NULL, layout = "compact", no_plot = FALSE,
  plot_method = "svg", out_format = "plot", out_path = NULL, show_legend = TRUE,
  mark_main_branch = FALSE, rstudio = FALSE, in_log = FALSE, rmarkdown = "detect",
  verbose = FALSE, show_warns = FALSE, plot_ctr = TRUE, pan_zoom = FALSE,
  exit_point = 0
)
plotwfOutput(outputId, width = '100%', height = '400px')
renderPlotwf(expr, env = parent.frame(), quoted = FALSE)
sysargs |
object of class |
width |
string, a valid CSS string for width, like "500px", "100%". |
height |
string, a valid CSS string for height, like "500px", "100%". |
elementId |
string, optional ID value for the plot. |
responsive |
bool, should the plot be responsive? Useful in the RStudio built-in viewer, R Markdown, Shiny, or when embedding the plot into other web pages. |
branch_method |
string, one of "auto", "choose". How to determine the main branch of the workflow. With "auto", an internal algorithm decides: branches connecting the first and last step and/or the longest branches are favored. With "choose", all possible branches are listed so you can make a choice. |
branch_no |
numeric, only works if |
layout |
string, one of "compact", "vertical", "horizontal", "execution". |
no_plot |
bool, if you want to assign the plot to a variable and do not want
to see it interactively, change this to |
plot_method |
string, one of "svg", "png"; whether the plot is embedded as SVG or PNG. |
out_format |
string, one of "plot", "html", "dot", "dot_print".
See |
out_path |
string, if the |
show_legend |
bool, show plot legend? |
mark_main_branch |
bool, color the main branch on the plot? |
rstudio |
bool, if you are using RStudio, open the plot in the built-in viewer? The default is no,
which opens a browser tab instead. The default viewer is too small to show the full plot
clearly, so we recommend using the browser tab.
However, if you are using this plot in Shiny apps, always turn |
in_log |
bool, is this plot being made in an SPR log file? If |
rmarkdown |
are you rendering this plot in an R Markdown document? The default value is
"detect", in which case the function decides based on the current R environment, or you
can force it to be |
verbose |
bool, turning on verbose mode gives you more information. |
show_warns |
bool, print the warning messages on the plot? |
plot_ctr |
bool, add the plot control panel to the plot? This requires an internet connection. It downloads some additional JavaScript libraries and allows you to save the plot as png, jpg, svg, pdf or graphviz directly from the browser. |
pan_zoom |
bool, allow panning and zooming of the plot? Use the mouse wheel
or touch pad to zoom in and out of the plot. An internet connection is
required; additional JavaScript libraries are loaded automatically online.
Cannot be used with |
exit_point |
numeric, for advanced debugging only, see details |
outputId |
string, shiny output ID |
expr |
An expression that generates a plotwf, like |
env |
The environment in which to evaluate |
quoted |
Is |
compact: try to plot steps as close as possible.
vertical: main branch will be placed vertically and side branches will be placed on the same horizontal level and sub steps of side branches will be placed vertically.
horizontal: main branch is placed horizontally and side branches and sub steps will be placed vertically.
execution: a linear plot to show the workflow execution order of all steps.
return intermediate results at different points and exit the function
0: no early exit
1: after all branches are found, return tree
2: after the new tree has been built, return new nodes
3: after dot translation, return graph string
R Markdown may change some of the formatting and cause conflicts. If the plot renders outside an Rmd document but not within one, try turning this option on. Some additional JavaScript processing will then be performed to avoid the conflict, but this may cause unknown issues.
The plot rendering uses htmlwidgets, which generates an interactive HTML page. Saving these plots directly to standard image files, such as png, is not possible. However, a few workarounds exist to save to these image formats:
1: use the webshot2::webshot function.
2: use the interactive panel located in the top-left corner to download the plot as an image after it is rendered.
3: use plotWF(sal, plot_method = "png") to embed the plot as png, and then right-click to save the image.
Please see our website for examples: https://systempipe.org/sp/spr/sp_run/step_vis/
When the plot is rendered in a Shiny app, the rstudio option must be turned on: plotWF(sal, rstudio = TRUE, ...).
See out_format and exit_point.
Predicts open reading frames (ORFs) and coding sequences (CDSs) in DNA sequences provided as DNAString or DNAStringSet objects.
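The core idea of ORF prediction on the sense strand can be illustrated with a base R scan over the three reading frames of a plain character string. `longest_orf()` is a hypothetical helper, not the predORF implementation; it assumes that the reported range includes the stop codon and that open reading frames lacking a stop codon are ignored. The default start and stop codons match those listed for predORF.

```r
# Hypothetical helper: longest start-to-stop ORF on the given (sense) strand
longest_orf <- function(dna, start_codons = "ATG",
                        stop_codons = c("TAA", "TAG", "TGA")) {
  dna <- toupper(dna)
  n <- nchar(dna)
  best <- c(start = NA, end = NA, width = 0)
  for (frame in 0:2) {
    if (n - 2 < 1 + frame) next              # no complete codon in this frame
    pos <- seq(1 + frame, n - 2, by = 3)     # first base of each codon
    codons <- substring(dna, pos, pos + 2)
    open <- NA                               # start of the ORF being extended
    for (i in seq_along(codons)) {
      if (is.na(open)) {
        if (codons[i] %in% start_codons) open <- pos[i]
      } else if (codons[i] %in% stop_codons) {
        end <- pos[i] + 2                    # stop codon included in the ORF
        if (end - open + 1 > best[["width"]]) {
          best <- c(start = open, end = end, width = end - open + 1)
        }
        open <- NA                           # look for the next start codon
      }
    }
  }
  best
}

longest_orf("ccATGAAATTTTAGgATGTAAc")   # start 3, end 14, width 12
```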
predORF(x, n = 1, type = "grl", mode = "orf", strand = "sense", longest_disjoint=FALSE, startcodon = "ATG", stopcodon = c("TAA", "TAG", "TGA"))
x |
DNA query sequence(s) provided as |
n |
Defines the maximum number of ORFs to return for each input sequence. The identified ORFs are sorted by length in decreasing order. For instance, |
type |
One of three options provided as character values: |
mode |
The setting |
strand |
One of three options passed on as a character vector of length one: |
longest_disjoint |
If set to |
startcodon |
Defines the start codon(s) for ORF predictions. The default is set to the standard start codon 'ATG'. Any custom set of triplet DNA sequences can be assigned here. |
stopcodon |
Defines the stop codon(s) for ORF predictions. The default is set to the three standard stop codons 'TAA', 'TAG' and 'TGA'. Any custom set of triplet DNA sequences can be assigned here. |
Returns ORF/CDS ranges identified in query sequences as a GRanges or data.frame object. The type argument defines which one of them will be returned. The objects contain the following columns:
seqnames: names of query sequences
subject_id: identified ORF/CDS ranges numbered by query
start/end: start and end positions of ORF/CDS ranges
strand: strand of query sequence used for prediction
width: length of subject range in bases
inframe2end: frame of identified ORF/CDS relative to the 3' end of the query sequence. This can be important if the query sequence was extracted directly upstream of an ORF (e.g. 5' UTR upstream of main ORF). The value 1 stands for in-frame with the downstream ORF, while 2 or 3 indicates a shift of one or two bases, respectively.
Thomas Girke
scaleRanges
## Load DNA sample data set from Biostrings package
file <- system.file("extdata", "someORF.fa", package="Biostrings")
dna <- readDNAStringSet(file)
## Predict longest ORF for sense strand in each query sequence
(orf <- predORF(dna[1:4], n=1, type="gr", mode="orf", strand="sense"))
## Not run:
## Usage for more complex example
library(txdbmaker); library(systemPipeRdata)
gff <- system.file("extdata/annotation", "tair10.gff", package="systemPipeRdata")
txdb <- makeTxDbFromGFF(file=gff, format="gff3", organism="Arabidopsis")
futr <- fiveUTRsByTranscript(txdb, use.names=TRUE)
genome <- system.file("extdata/annotation", "tair10.fasta", package="systemPipeRdata")
dna <- extractTranscriptSeqs(FaFile(genome), futr)
uorf <- predORF(dna, n="all", mode="orf", longest_disjoint=TRUE, strand="sense")
grl_scaled <- scaleRanges(subject=futr, query=uorf, type="uORF", verbose=TRUE)
export.gff3(unlist(grl_scaled), "uorf.gff")
## End(Not run)
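The core search behind the sense-strand, mode = "orf" case can be sketched in base R. find_orfs() is a hypothetical helper, not the predORF implementation: it assumes an ORF spans a start codon through the first in-frame stop codon (stop included), drops start codons without a downstream in-frame stop, ignores the antisense strand and longest_disjoint, and mimics n by returning the n longest hits.

```r
# Hypothetical helper, not predORF: scan the sense strand of one DNA string.
# Assumption: an ORF spans a start codon through the first in-frame stop codon
# (stop codon included); starts without a downstream in-frame stop are dropped.
find_orfs <- function(seq, n = 1, startcodon = "ATG",
                      stopcodon = c("TAA", "TAG", "TGA")) {
  seq <- toupper(seq)
  len <- nchar(seq)
  empty <- data.frame(start = integer(0), end = integer(0), width = integer(0))
  if (len < 6) return(empty)  # shortest possible ORF: start + stop codon
  pos <- seq_len(len - 2)                 # every possible codon start
  codons <- substring(seq, pos, pos + 2)  # overlapping triplets
  starts <- pos[codons %in% startcodon]
  stops <- pos[codons %in% stopcodon]
  # For each start codon, take the first stop codon in the same reading frame
  ends <- vapply(starts, function(s) {
    inframe <- stops[stops > s & (stops - s) %% 3 == 0]
    if (length(inframe) == 0) NA_integer_ else min(inframe) + 2L
  }, integer(1))
  keep <- !is.na(ends)
  if (!any(keep)) return(empty)
  orfs <- data.frame(start = starts[keep], end = ends[keep])
  orfs$width <- orfs$end - orfs$start + 1L
  # Sort by length, longest first, and keep the n longest
  orfs <- orfs[order(orfs$width, decreasing = TRUE), , drop = FALSE]
  rownames(orfs) <- NULL
  head(orfs, n)
}

find_orfs("ATGTAAATGAAACCCTGA", n = 2)
# two ORFs: positions 7-18 (width 12), then 1-6 (width 6)
```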
Applies custom read preprocessing functions to single-end or paired-end FASTQ files. The function uses the FastqStreamer function from the ShortRead package to stream through large files in a memory-efficient manner.
preprocessReads(args = NULL, FileName1 = NULL, FileName2 = NULL, outfile1 = NULL, outfile2 = NULL, Fct, batchsize = 100000, overwrite = TRUE, ...)
args |
Object of class |
FileName1 |
Path to input forward fastq file. Default is |
FileName2 |
Path to input reverse fastq file. Default is |
outfile1 |
Path to output forward fastq file. Default is |
outfile2 |
Path to output reverse fastq file. Default is |
Fct |
|
batchsize |
Number of reads to process in each iteration by the internally used |
overwrite |
If |
... |
To pass on additional arguments to the internally used |
Writes to files in FASTQ format. Their names are specified by outpaths(args).
Thomas Girke
FastqStreamer
## Preprocessing of single-end reads
dir_path <- system.file("extdata/cwl/preprocessReads/trim-se", package="systemPipeR")
targetspath <- system.file("extdata", "targets.txt", package="systemPipeR")
trim <- loadWorkflow(targets=targetspath, wf_file="trim-se.cwl", input_file="trim-se.yml", dir_path=dir_path)
trim <- renderWF(trim, inputvars=c(FileName="_FASTQ_PATH1_", SampleName="_SampleName_"))
## Not run:
preprocessReads(args=trim[1], Fct="trimLRPatterns(Rpattern='GCCCGGGTAA', subject=fq)", batchsize=100000, overwrite=TRUE, compress=TRUE)
## End(Not run)
## Preprocessing of paired-end reads
dir_path <- system.file("extdata/cwl/preprocessReads/trim-pe", package="systemPipeR")
targetspath <- system.file("extdata", "targetsPE.txt", package="systemPipeR")
trim <- loadWorkflow(targets=targetspath, wf_file="trim-pe.cwl", input_file="trim-pe.yml", dir_path=dir_path)
trim <- renderWF(trim, inputvars=c(FileName1="_FASTQ_PATH1_", FileName2="_FASTQ_PATH2_", SampleName="_SampleName_"))
trim
## Not run:
preprocessReads(args=trim[1], Fct="trimLRPatterns(Rpattern='GCCCGGGTAA', subject=fq)", batchsize=100000, overwrite=TRUE, compress=TRUE)
## End(Not run)
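The batch-wise processing that preprocessReads delegates to FastqStreamer can be pictured with plain R connections: read a fixed number of records, transform them, append them to the output, and repeat until the input is exhausted. stream_fastq() and trim_to_5() below are hypothetical helpers; they handle only uncompressed, four-line FASTQ records and are not how ShortRead reads files.

```r
# Hypothetical helper (not preprocessReads): stream a plain-text FASTQ file in
# batches of 'batchsize' reads and apply 'fct' to each batch.
# Assumptions: uncompressed input, exactly four lines per record, no wrapping.
stream_fastq <- function(infile, outfile, fct, batchsize = 100000) {
  con_in <- file(infile, open = "r")
  con_out <- file(outfile, open = "w")
  on.exit({
    close(con_in)
    close(con_out)
  })
  n_written <- 0
  repeat {
    lines <- readLines(con_in, n = 4 * batchsize)
    if (length(lines) == 0) break
    # One column per read: header, sequence, separator, quality
    rec <- matrix(lines, nrow = 4)
    rec <- fct(rec)
    writeLines(as.vector(rec), con_out)
    n_written <- n_written + ncol(rec)
  }
  n_written
}

# Example batch function: keep only the first 5 bases and their quality scores
trim_to_5 <- function(rec) {
  rec[2, ] <- substr(rec[2, ], 1, 5)
  rec[4, ] <- substr(rec[4, ], 1, 5)
  rec
}

fq_in <- tempfile(fileext = ".fastq")
fq_out <- tempfile(fileext = ".fastq")
writeLines(c("@r1", "ACGTACGTAA", "+", "IIIIIIIIII",
             "@r2", "GGGGCCCCTT", "+", "IIIIIIIIII",
             "@r3", "TTTTAAAACC", "+", "IIIIIIIIII"), fq_in)
stream_fastq(fq_in, fq_out, trim_to_5, batchsize = 2)  # 3 reads in 2 batches
```

Only one batch is held in memory at a time, which is the point of choosing a batchsize well below the total read count.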
Accessory functions to modify the command line (version 1)
printParam(sysargs, position, index = NULL)
subsetParam(sysargs, position, index = NULL, trim = TRUE, mute = FALSE)
replaceParam(sysargs, position, index = NULL, replace, mute = FALSE)
renameParam(sysargs, position, index = FALSE, rename, mute = FALSE)
appendParam(sysargs, position, index = NULL, append, after, mute = FALSE)
sysargs |
Object of class |
position |
string, one of |
index |
numeric or character vector, index to view or change a single item in |
trim |
logical, only keep arguments specified by |
replace |
named list, replace arguments in different positions. Replace list length must be the same as |
rename |
character vector, rename items in different positions. |
append |
named list, same requirements as |
after |
a subscript, after which the values are to be appended. If |
mute |
logical, print the raw command-line string and output after replacing or renaming. |
- printParam: prints its arguments defined by position and index.
- subsetParam: returns subsets of the command line, keeping the arguments defined by position and index.
- replaceParam: replaces the values in the command line at the indices given in the list with those given in values.
- renameParam: renames the arguments.
- appendParam: adds arguments to the original command line.
SYSargs2 object
Le Zhang and Daniela Cassol
For more details on CWL, please consult the following page: https://www.commonwl.org/
writeParamFiles
createParamFiles
loadWorkflow
renderWF
showClass("SYSargs2")
command <- "
    hisat2 \
    -S <F, out: ./results/M1A.sam> \
    -x <F: ./data/tair10.fasta> \
    -k <int: 1> \
    -min-intronlen <int: 30> \
    -max-intronlen <int: 3000> \
    -threads <int: 4> \
    -U <F: ./data/SRR446027_1.fastq.gz> \
    --verbose
"
cmd <- createParamFiles(command)
cmdlist(cmd)
Accessory functions to modify the command line (version 2)
printParam2(sysargs, base = FALSE, args = FALSE, inputs = FALSE, outputs = FALSE, stdout = FALSE, raw_cmd = FALSE, all = TRUE)
appendParam2(sysargs, x, position = c("inputs", "args", "outputs"), after = NULL, verbose = FALSE)
replaceParam2(sysargs, x, index=NULL, position = c("inputs", "baseCommand", "args", "outputs", "stdout"), verbose = FALSE)
removeParam2(sysargs, index=NULL, position = c("inputs", "args", "outputs", "stdout"), verbose = FALSE)
renameParam2(sysargs, index=NULL, new_names, position = c("inputs", "args", "outputs", "stdout"), verbose = FALSE)
sysargs |
Object of class |
base |
logical, print out base command information? |
args |
logical, print out arguments information? |
inputs |
logical, print out inputs information? |
outputs |
logical, print out outputs information? |
stdout |
logical, print out stdout information? |
raw_cmd |
logical, print out parsed raw command information? |
all |
logical, print out all base command, arguments, inputs, outputs, and raw command information? Turn this to |
position |
string, one of the positions to apply a modification. For |
index |
numeric or character vector, index of the item(s) to remove or rename inside the chosen position, in |
after |
integer, in |
x |
named list or string, new items to replace or append in different positions. Replace list length must be the same as |
new_names |
character vector of new names to replace the old names. |
verbose |
logical, show additional information during/after the operation? For example, print the new changes. |
- printParam2: prints its arguments defined by position and index.
- removeParam2: removes items from the selected position.
- replaceParam2: replaces the values in the command line at the indices given in the list with those given in values.
- renameParam2: renames items in the selected position.
- appendParam2: adds arguments to the original command line. Adding a new base command or standard output is not allowed.
- If x is a character string, it must contain exactly three semicolons (;) separating it into four columns. The first three columns follow the same format as createParam inputs: first column, prefix/argument name; second column, type; third column, default value. The fourth column (new) is a numeric index of the new item, which is translated into position entries in CWL.
- If x is a list, it must be named. The following items must be included in the list: preF, type, value, and index. They refer to the prefix, parameter type, default value, and position index, respectively.
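The string rule for x (exactly three semicolons, four columns) can be checked in plain R. parse_new_item() below is a hypothetical helper, not part of systemPipeR: it only enforces the semicolon count and splits the columns; it keeps the first column as written (including any "p:" marker) and does not validate types the way appendParam2 may.

```r
# Hypothetical helper (not part of systemPipeR): split an appendParam2-style
# string into its four semicolon-separated columns.
parse_new_item <- function(x) {
  n_semi <- lengths(regmatches(x, gregexpr(";", x, fixed = TRUE)))
  if (n_semi != 3) stop("expected exactly 3 semicolons, found ", n_semi)
  fields <- trimws(strsplit(x, ";", fixed = TRUE)[[1]])
  list(prefix = fields[1],             # column 1: prefix/argument name, as written
       type = fields[2],               # column 2: type
       value = fields[3],              # column 3: default value
       index = as.numeric(fields[4]))  # column 4: position index
}

str(parse_new_item('p: -abc; string; abc; 7'))
```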
SYSargs2 object
Le Zhang and Daniela Cassol
For more details on CWL, please consult the following page: https://www.commonwl.org/
writeParamFiles
createParamFiles
loadWorkflow
renderWF
showClass("SYSargs2")
command2 <- '
    mycmd2 \
    p: -s; File; sample1.txt \
    p: -s; File; sample2.txt \
    p: --c; ; \
    p: -o; File; out: myout.txt \
    ref_genome; File; a.fasta \
    p: --nn; int; 12 \
    mystdout; File; stdout: abc.txt
'
cmd2 <- createParam(command2, syntaxVersion = "v2", writeParamFiles=FALSE)
# string format
new_cmd <- 'p: -abc; string; abc; 7'
cmd2 <- appendParam2(cmd2, x = new_cmd, position = "inputs")
printParam2(cmd2, all = FALSE, inputs = TRUE, raw_cmd = TRUE)
# list format
new_cmd <- list(name = "new_arg", preF = "--foo", index = "8")
cmd2 <- appendParam2(cmd2, x = new_cmd, position = "args")
printParam2(cmd2, all = FALSE, args = TRUE, raw_cmd = TRUE)
# rename
cmd2 <- renameParam2(cmd2, "new_name_arg", index = "new_arg", position = "args")
printParam2(cmd2, all = FALSE, args = TRUE, raw_cmd = TRUE)
# remove
cmd2 <- removeParam2(cmd2, index = "new_name_arg", position = "args")
printParam2(cmd2, all = FALSE, args = TRUE, raw_cmd = TRUE)
Parses sample comparisons specified in <CMP> line(s) of a targets file or in the targetsheader slot of a SYSargs object. All possible comparisons can be specified with 'CMPset: ALL'.
readComp(file, format = "vector", delim = "-")
file |
Path to targets file. Alternatively, a |
format |
Object type to return: |
delim |
Delimiter to use when sample comparisons are returned as |
A list where each component is named according to the name(s) used in the <CMP> line(s) of the targets file. The list contains as many sample comparison sets (list components) as there are sample comparison lines in the corresponding targets file.
Thomas Girke
## Return comparisons from targets file
targetspath <- system.file("extdata", "targets.txt", package="systemPipeR")
read.delim(targetspath, comment.char = "#")
readComp(file=targetspath, format="vector", delim="-")
## Return comparisons from SYSargs2 object
targets <- system.file("extdata", "targets.txt", package="systemPipeR")
dir_path <- system.file("extdata/cwl", package="systemPipeR")
args <- loadWorkflow(targets=targets, wf_file="hisat2/hisat2-mapping-se.cwl", input_file="hisat2/hisat2-mapping-se.yml", dir_path=dir_path)
args <- renderWF(args, inputvars=c(FileName="_FASTQ_PATH1_", SampleName="_SampleName_"))
args
readComp(args, format = "vector", delim = "-")
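'CMPset: ALL' amounts to every pairwise combination of the sample groups, so n groups give n(n-1)/2 comparisons. The hypothetical all_comparisons() below sketches that with base R's combn(); the group labels are toy values, and the order and orientation of the pairs that readComp produces may differ.

```r
# Hypothetical helper (not readComp): all pairwise comparisons between the
# distinct groups, returned as "A-B" strings or as a two-column matrix.
all_comparisons <- function(groups, format = c("vector", "matrix"), delim = "-") {
  format <- match.arg(format)
  lev <- unique(groups)
  if (length(lev) < 2) stop("need at least two distinct groups")
  pairs <- t(combn(lev, 2))  # one row per comparison
  if (format == "matrix") return(pairs)
  paste(pairs[, 1], pairs[, 2], sep = delim)
}

all_comparisons(c("M1", "M1", "A1", "A1", "V1"))
# "M1-A1" "M1-V1" "A1-V1"
```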
Render the logs report file to the specified output format using pandoc.
renderLogs(sysargs, type = c("html_document", "pdf_document"), fileName = "default", quiet = FALSE, open_file = TRUE)
sysargs |
object of class |
type |
The R Markdown output format to convert to. The option can be the name of a format (e.g. |
fileName |
character string naming a file output. Default is |
quiet |
If set to |
open_file |
Default is |
Returns an updated SYSargsList object.
Daniela Cassol
See also SYSargsList-class.
## Construct SYSargsList object from Rmd file
sal <- SPRproject(overwrite=TRUE)
targetspath <- system.file("extdata/cwl/example/targets_example.txt", package="systemPipeR")
## Constructor and `appendStep<-`
appendStep(sal) <- SYSargsList(step_name = "echo", targets=targetspath, dir=TRUE, wf_file="example/workflow_example.cwl", input_file="example/example.yml", dir_path = system.file("extdata/cwl", package="systemPipeR"), inputvars = c(Message = "_STRING_", SampleName = "_SAMPLE_"))
appendStep(sal) <- LineWise(code = {
    hello <- lapply(getColumn(sal, step=1, 'outfiles'), function(x) yaml::read_yaml(x))
}, step_name = "R_read", dependency = "echo")
sal <- runWF(sal)
sal <- renderLogs(sal, open_file = FALSE)
Render the technical report file to the specified output format using pandoc.
renderReport(sysargs, fileName ="SPR_Report", rmd_title = "SPR workflow Template - Report", rmd_author = "Author", rmd_date= "Last update: `r format(Sys.time(), '%d %B, %Y')`", type = c("html_document"), desc = "This is a workflow template.", quiet = FALSE, open_file = TRUE)
sysargs |
object of class |
fileName |
character string naming a file output. Default is |
rmd_title |
string, title of the Rmd. |
rmd_author |
string, author(s) of the Rmd, put all authors in a single character string. |
rmd_date |
string, date header of Rmd. |
type |
The R Markdown output format to convert to. The option can be the name of a format (e.g. |
desc |
string or character vector of strings: description text in R Markdown format that is added to the document before the workflow steps start. It can be a single line, or multiple lines supplied as a character vector with one line per item. |
quiet |
If set to |
open_file |
Default is |
Returns an updated SYSargsList object, including the location of the report file.
Daniela Cassol
See also SYSargsList-class.
sal <- SPRproject(overwrite = TRUE)
file_path <- system.file("extdata", "spr_simple_wf.Rmd", package = "systemPipeR")
sal <- importWF(sal, file_path = file_path, verbose = FALSE)
targetspath <- system.file("extdata/cwl/example/targets_example.txt", package = "systemPipeR")
appendStep(sal) <- SYSargsList(step_name = "echo", targets = targetspath, dir = TRUE, wf_file = "example/workflow_example.cwl", input_file = "example/example.yml", dir_path = system.file("extdata/cwl", package = "systemPipeR"), inputvars = c(Message = "_STRING_", SampleName = "_SAMPLE_"))
sal <- renderReport(sal, open_file = FALSE)
Converts read counts to RPKM normalized values.
returnRPKM(counts, ranges)
counts |
Count data frame, e.g. from an RNA-Seq experiment. |
ranges |
|
data.frame
Thomas Girke
## Not run:
countDFrpkm <- apply(countDF, 2, function(x) returnRPKM(counts=x, ranges=eByg))
## End(Not run)
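RPKM scales each count by feature length in kilobases and by library size in millions of mapped reads: RPKM = count / ((length / 10^3) * (total counts / 10^6)). The base-R sketch below follows that standard definition. rpkm_sketch() and the toy data are hypothetical; the sketch takes feature lengths directly as numbers, whereas returnRPKM takes ranges, and how it derives lengths from them (for example for overlapping exons) is not reproduced here.

```r
# Minimal RPKM sketch: reads per kilobase of feature per million mapped reads.
# Assumption: 'lengths' holds one feature length (in bases) per count.
rpkm_sketch <- function(counts, lengths) {
  stopifnot(length(counts) == length(lengths), all(lengths > 0))
  millions_mapped <- sum(counts) / 1e6  # library size in millions of reads
  rpm <- counts / millions_mapped       # reads per million
  rpm / (lengths / 1e3)                 # ... per kilobase of feature
}

feature_lengths <- c(geneA = 1000, geneB = 2000)  # toy lengths in bases
counts_df <- data.frame(S1 = c(100, 300), S2 = c(50, 50),
                        row.names = names(feature_lengths))
# Column-wise, mirroring the apply() pattern in the example above
rpkm_mat <- apply(counts_df, 2, rpkm_sketch, lengths = feature_lengths)
rpkm_mat
# S1: geneA 250000, geneB 375000; S2: geneA 500000, geneB 250000
```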
Convenience wrapper function to identify differentially expressed genes (DEGs) in batch mode with DESeq2 for any number of pairwise sample comparisons specified under the cmp argument. Users are strongly encouraged to consult the DESeq2 vignette for more detailed information on this topic and how to properly run DESeq2 on data sets with more complex experimental designs.
run_DESeq2(countDF, targets, cmp, independent = FALSE, lfcShrink=FALSE, type="normal")
countDF |
|
targets |
targets |
cmp |
|
independent |
If |
lfcShrink |
logical. If |
type |
please check |
data.frame containing DESeq2 results from all comparisons. Comparison labels are appended to column titles for tracking.
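The column layout of the returned data.frame can be pictured with a toy example. The example code selects statistics with patterns such as "_FDR$" and "_logFC$", which suggests column names of the form <comparison>_<statistic>. The hypothetical combine_results() below assumes that layout and a shared row order across comparisons; the numbers are made up, not DESeq2 output.

```r
# Hypothetical helper: column-bind per-comparison result tables, labelling each
# column with its comparison. Assumes all tables share the same row order.
combine_results <- function(res_list) {
  labelled <- lapply(names(res_list), function(cmp) {
    df <- res_list[[cmp]]
    colnames(df) <- paste(cmp, colnames(df), sep = "_")
    df
  })
  do.call(cbind, labelled)  # cbind.data.frame keeps names such as "M1-A1_FDR"
}

res <- list(
  "M1-A1" = data.frame(logFC = c(1.5, -0.2), FDR = c(0.01, 0.80), row.names = c("g1", "g2")),
  "M1-V1" = data.frame(logFC = c(-2.1, 0.4), FDR = c(0.03, 0.50), row.names = c("g1", "g2"))
)
degDF <- combine_results(res)
colnames(degDF)
# "M1-A1_logFC" "M1-A1_FDR" "M1-V1_logFC" "M1-V1_FDR"
degDF[, grep("_FDR$", colnames(degDF)), drop = FALSE]
```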
Thomas Girke
Please properly cite the DESeq2 papers when using this function: http://www.bioconductor.org/packages/devel/bioc/html/DESeq2.html
run_edgeR, readComp and DESeq2 vignette
targetspath <- system.file("extdata", "targets.txt", package="systemPipeR")
targets <- read.delim(targetspath, comment.char = "#")
cmp <- readComp(file=targetspath, format="matrix", delim="-")
countfile <- system.file("extdata", "countDFeByg.xls", package="systemPipeR")
countDF <- read.delim(countfile, row.names=1)
degseqDF <- run_DESeq2(countDF=countDF, targets=targets, cmp=cmp[[1]], independent=FALSE)
pval <- degseqDF[, grep("_FDR$", colnames(degseqDF)), drop=FALSE]
fold <- degseqDF[, grep("_logFC$", colnames(degseqDF)), drop=FALSE]
DEG_list <- filterDEGs(degDF=degseqDF, filter=c(Fold=2, FDR=10))
names(DEG_list)
DEG_list$Summary
Convenience wrapper function to identify differentially expressed genes (DEGs) in batch mode with the edgeR GLM method for any number of pairwise sample comparisons specified under the cmp argument. Users are strongly encouraged to consult the edgeR vignette for more detailed information on this topic and how to properly run edgeR on data sets with more complex experimental designs.
run_edgeR(countDF, targets, cmp, independent = TRUE, paired = NULL, mdsplot = "")
countDF |
|
targets |
targets |
cmp |
|
independent |
If |
paired |
Defines pairs ( |
mdsplot |
Directory where |
data.frame containing edgeR results from all comparisons. Comparison labels are appended to column titles for tracking.
Thomas Girke
Please properly cite the edgeR papers when using this function: http://www.bioconductor.org/packages/devel/bioc/html/edgeR.html
run_DESeq2, readComp and edgeR vignette
targetspath <- system.file("extdata", "targets.txt", package="systemPipeR")
targets <- read.delim(targetspath, comment.char = "#")
cmp <- readComp(file=targetspath, format="matrix", delim="-")
countfile <- system.file("extdata", "countDFeByg.xls", package="systemPipeR")
countDF <- read.delim(countfile, row.names=1)
edgeDF <- run_edgeR(countDF=countDF, targets=targets, cmp=cmp[[1]], independent=FALSE, mdsplot="")
pval <- edgeDF[, grep("_FDR$", colnames(edgeDF)), drop=FALSE]
fold <- edgeDF[, grep("_logFC$", colnames(edgeDF)), drop=FALSE]
DEG_list <- filterDEGs(degDF=edgeDF, filter=c(Fold=2, FDR=10))
names(DEG_list)
DEG_list$Summary
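Once the per-comparison columns are available, as in the pval and fold objects extracted in the example, counting up- and down-regulated genes combines a fold-change cutoff with an FDR cutoff. count_degs() below is a hypothetical helper that only approximates the idea behind filterDEGs. Here fdr is a proportion (0.1), whereas the examples pass FDR=10 to filterDEGs, so check ?filterDEGs for how it interprets that value.

```r
# Hypothetical helper (not filterDEGs): count up/down-regulated genes per
# comparison, assuming <comparison>_logFC and <comparison>_FDR column pairs.
count_degs <- function(degDF, fold = 2, fdr = 0.1) {
  fc <- as.matrix(degDF[, grep("_logFC$", colnames(degDF)), drop = FALSE])
  pv <- as.matrix(degDF[, grep("_FDR$", colnames(degDF)), drop = FALSE])
  # logFC and FDR columns must refer to the same comparisons, in the same order
  stopifnot(identical(sub("_logFC$", "", colnames(fc)),
                      sub("_FDR$", "", colnames(pv))))
  sig <- pv <= fdr
  data.frame(Comparisons = sub("_logFC$", "", colnames(fc)),
             Up = unname(colSums(sig & fc >= log2(fold), na.rm = TRUE)),
             Down = unname(colSums(sig & fc <= -log2(fold), na.rm = TRUE)))
}

toy <- data.frame("M1-A1_logFC" = c(2.0, -1.5, 0.3),
                  "M1-A1_FDR" = c(0.01, 0.04, 0.01),
                  check.names = FALSE)
count_degs(toy, fold = 2, fdr = 0.1)
```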
Function to execute system parameters specified in SYSargs and SYSargs2 objects.
runCommandline(args, runid = "01", make_bam = FALSE, del_sam=TRUE, dir = TRUE, dir.name = NULL, force=FALSE, input_targets = NULL, ...)
args |
object of class |
runid |
Run identifier used for log file to track system call commands. Default is |
make_bam |
Auto-detects SAM file outputs and converts them to sorted and indexed BAM files. Default is |
del_sam |
This option allows deleting the SAM files created when the |
dir |
This option allows creating an exclusive results folder for each step in the workflow and a sub-folder for each sample defined in the |
dir.name |
Name of the workflow directory. Default is |
force |
Internally, the function checks if the expected |
input_targets |
This option allows selecting which targets file and, consequently, which command line will be executed. Default is |
... |
Additional arguments to pass on to |
Output files. Their paths can be obtained with outpaths() from a SYSargs container or output() from a SYSargs2 object. In addition, a character vector containing the same paths is returned.
Daniela Cassol and Thomas Girke
##########################################
## Examples with SYSargs2 object ##
##########################################
## Construct SYSargs2 object from CWL param, CWL input, and targets files
targets <- system.file("extdata", "targets.txt", package="systemPipeR")
dir_path <- system.file("extdata/cwl", package="systemPipeR")
WF <- loadWorkflow(targets=targets, wf_file="hisat2/hisat2-mapping-se.cwl", input_file="hisat2/hisat2-mapping-se.yml", dir_path=dir_path)
WF <- renderWF(WF, inputvars=c(FileName="_FASTQ_PATH1_", SampleName="_SampleName_"))
WF
names(WF); modules(WF); targets(WF)[1]; cmdlist(WF)[1:2]; output(WF)
## Not run:
## Execute SYSargs2 on single machine
WF <- runCommandline(args=WF)
## Execute SYSargs2 on multiple machines of a compute cluster.
file.copy(system.file("extdata", ".batchtools.conf.R", package="systemPipeR"), ".")
file.copy(system.file("extdata", "batchtools.slurm.tmpl", package="systemPipeR"), ".")
resources <- list(walltime=120, ntasks=1, ncpus=4, memory=1024)
reg <- clusterRun(WF, FUN = runCommandline, more.args = list(args = WF, make_bam = TRUE), conffile=".batchtools.conf.R", template="batchtools.slurm.tmpl", Njobs=18, runid="01", resourceList=resources)
## Monitor progress of submitted jobs
getStatus(reg=reg)
## Updates the path in the object output(WF)
WF <- output_update(WF, dir=FALSE, replace=TRUE, extension=c(".sam", ".bam"))
## Alignment stats
read_statsDF <- alignStats(WF)
read_statsDF <- cbind(read_statsDF[targets$FileName,], targets)
write.table(read_statsDF, "results/alignStats.xls", row.names=FALSE, quote=FALSE, sep="\t")
## End(Not run)
#########################################
## Examples with SYSargs object ##
#########################################
## Construct SYSargs object from param and targets files
param <- system.file("extdata", "hisat2.param", package="systemPipeR")
targets <- system.file("extdata", "targets.txt", package="systemPipeR")
args <- systemArgs(sysma=param, mytargets=targets)
args
names(args); modules(args); cores(args); outpaths(args); sysargs(args)
## Not run:
## Execute SYSargs on single machine
runCommandline(args=args)
## Execute SYSargs on multiple machines of a compute cluster.
file.copy(system.file("extdata", ".batchtools.conf.R", package="systemPipeR"), ".")
file.copy(system.file("extdata", "batchtools.slurm.tmpl", package="systemPipeR"), ".")
resources <- list(walltime=120, ntasks=1, ncpus=cores(args), memory=1024)
reg <- clusterRun(args, FUN = runCommandline, conffile=".batchtools.conf.R", template="batchtools.slurm.tmpl", Njobs=18, runid="01", resourceList=resources)
## Monitor progress of submitted jobs
getStatus(reg=reg)
file.exists(outpaths(args))
## Alignment stats
read_statsDF <- alignStats(args)
read_statsDF <- cbind(read_statsDF[targets$FileName,], targets)
write.table(read_statsDF, "results/alignStats.xls", row.names=FALSE, quote=FALSE, sep="\t")
## End(Not run)
Convenience wrapper function for run_edgeR and run_DESeq2 to perform differential expression or abundance analysis iteratively for several count tables. The count tables can be peak calling results for several samples or counts generated for different genomic feature types. The function also returns the filtering results and plots from filterDEGs.
runDiff(args, outfiles=NULL, diffFct, targets, cmp, dbrfilter, ...)
args |
An instance of |
outfiles |
Default is |
diffFct |
Defines which function should be used for the differential abundance analysis.
Can be |
targets |
targets |
cmp |
|
dbrfilter |
Named vector with filter cutoffs of format |
... |
Arguments to be passed on to the internally used |
Returns a list containing the filterDEGs results for each count table. Each result set is a list with four components, which are described under ?filterDEGs. The result files contain the edgeR or DESeq2 results from the comparisons specified under cmp. The base names of the result files are the same as those of the corresponding input files specified under countfiles, with the value of extension appended.
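To give a feel for what the Fold and FDR cutoffs in dbrfilter do, the sketch below applies them to plain vectors of log2 fold changes and FDR values in base R. The helper apply_dbrfilter is hypothetical (not part of systemPipeR), and it assumes that Fold is an unlogged fold change and that FDR is given in percent (so FDR=1 means 1%); please check ?filterDEGs before relying on this interpretation.

```r
# Hypothetical helper (not part of systemPipeR): flags up- and down-regulated
# features, ASSUMING Fold is an unlogged fold change and FDR is in percent,
# as in dbrfilter = c(Fold = 2, FDR = 1).
apply_dbrfilter <- function(log2FC, fdr, dbrfilter = c(Fold = 2, FDR = 1)) {
    # Compare log2 fold changes against the log2 of the unlogged cutoff
    lfc_cut <- log2(dbrfilter[["Fold"]])
    # Convert the percent-based FDR cutoff to a proportion
    fdr_cut <- dbrfilter[["FDR"]] / 100
    significant <- !is.na(fdr) & fdr <= fdr_cut
    data.frame(
        Up   = significant & log2FC >= lfc_cut,
        Down = significant & log2FC <= -lfc_cut
    )
}

# Toy values: only the first two features pass both cutoffs
res <- apply_dbrfilter(log2FC = c(1.5, -2, 0.5, 3),
                       fdr    = c(0.001, 0.005, 0.001, 0.2))
colSums(res)
```

Working on vectors keeps the sketch independent of the column naming used in the actual edgeR/DESeq2 result tables.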
Thomas Girke
run_edgeR, run_DESeq2, filterDEGs
## Paths to BAM files
param <- system.file("extdata", "bowtieSE.param", package="systemPipeR")
targets <- system.file("extdata", "targets.txt", package="systemPipeR")
args_bam <- systemArgs(sysma=param, mytargets=targets)
bfl <- BamFileList(outpaths(args_bam), yieldSize=50000, index=character())
## Not run:
## SYSargs with paths to range data and count files
args <- systemArgs(sysma="param/count_rangesets.param", mytargets="targets_macs.txt")
## Iterative read counting
countDFnames <- countRangeset(bfl, args, mode="Union", ignore.strand=TRUE)
writeTargetsout(x=args, file="targets_countDF.txt", overwrite=TRUE)
## Run differential abundance analysis on the count tables listed in targets_countDF.txt
cmp <- readComp(file=args_bam, format="matrix")
args_diff <- systemArgs(sysma="param/rundiff.param", mytargets="targets_countDF.txt")
dbrlist <- runDiff(args_diff, diffFct=run_edgeR, targets=targetsin(args_bam),
                   cmp=cmp[[1]], independent=TRUE, dbrfilter=c(Fold=2, FDR=1))
writeTargetsout(x=args_diff, file="targets_rundiff.txt", overwrite=TRUE)
## End(Not run)
Function to execute all code and command-line steps specified in a SYSargsList object.
runWF(sysargs, steps = NULL, targets = NULL, force = FALSE, saveEnv = TRUE, run_step = "ALL", ignore.dep = FALSE, warning.stop = FALSE, error.stop = TRUE, silent = FALSE, ...)
sysargs |
object of class |
steps |
character or numeric. Step name or index. If |
targets |
This option allows selecting which targets, and consequently which command
lines, will be executed for each |
force |
Internally, the option checks if the expected |
saveEnv |
If set to |
run_step |
character. Selects steps by their "mandatory" or "optional" execution flag.
When |
ignore.dep |
logical. This option allows ignoring the dependency tree when |
warning.stop |
If set to |
error.stop |
If set to |
silent |
If set to |
... |
Additional arguments to pass on from |
Returns the updated SYSargsList object.
Daniela Cassol and Thomas Girke
See also SYSargsList-class.
## Construct a SYSargsList object step by step
sal <- SPRproject(overwrite=TRUE)
targetspath <- system.file("extdata/cwl/example/targets_example.txt", package="systemPipeR")
## Constructor and `appendStep<-`
appendStep(sal) <- SYSargsList(step_name = "echo", targets=targetspath, dir=TRUE,
    wf_file="example/workflow_example.cwl", input_file="example/example.yml",
    dir_path = system.file("extdata/cwl", package="systemPipeR"),
    inputvars = c(Message = "_STRING_", SampleName = "_SAMPLE_"))
appendStep(sal) <- LineWise(code = {
    hello <- lapply(getColumn(sal, step=1, 'outfiles'), function(x) yaml::read_yaml(x))
}, step_name = "R_read", dependency = "echo")
## Not run:
sal <- runWF(sal)
## End(Not run)
This function takes a SYSargsList object and translates it into an executable bash script, so that the workflow can be run without loading systemPipeR (SPR) or using an R console.
sal2bash(sal, out_dir = ".", bash_path = "/bin/bash", stop_on_error = TRUE)
sal |
|
out_dir |
string, a relative or absolute path to a directory. If the directory does not exist, this function will first try to create it. |
bash_path |
string, the path to the bash executable program |
stop_on_error |
logical, whether the bash script should stop if any error occurs while running |
## Output files
1. The main executable bash file is created in the root of 'out_dir'.
2. All R steps are stored as R scripts, along with other supporting files, inside a folder called 'spr_bash' under 'out_dir'.
3. Not every R step gets its own file. The function collapses adjacent R steps into one file wherever possible: if no sysArgs steps lie between them, R steps are merged into one file; otherwise they are written to separate files.
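The collapsing rule for adjacent R steps can be pictured as grouping runs of consecutive R steps. The base R sketch below is only an illustration of that rule under the assumption that each step is labeled either "sysargs" or "R"; it is not the sal2bash implementation, and the step labels are made up.

```r
# Hypothetical sketch (not sal2bash itself): group runs of consecutive R steps,
# so each run would share one R script file.
step_type <- c("sysargs", "R", "R", "sysargs", "R", "R", "R")
# Start a new group whenever the step type changes from the previous step
group_id <- cumsum(c(TRUE, step_type[-1] != step_type[-length(step_type)]))
# Keep only the R steps and split their positions by group
r_groups <- split(which(step_type == "R"), group_id[step_type == "R"])
r_groups
```

Here steps 2-3 form one R script and steps 5-7 another, because the sysargs step 4 separates them.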
## R steps
As when running the workflow in an R console, all R steps share the same environment variables and loaded packages. This is achieved by loading the R environment from the file 'spr_wf.RData' before each R script runs and saving it back afterwards. Therefore, it is best to keep R steps bundled together as much as possible to minimize the overhead of loading and saving packages and the environment.
Initially, this environment contains only the SYSargsList object that was used to create the bash script. Note: the SYSargsList object keeps the name under which it was passed to 'sal2bash'. If you call 'sal2bash(my_sal)', then 'spr_wf.RData' will contain an object called 'my_sal'. It is therefore important to use the same name for the SYSargsList object both when managing the workflow and inside the workflow steps. For example, if an R step queries a column from the outfiles of the SAL with 'getColumn(sal, "FileName1")', but you call 'sal2bash(my_sal)', this R step will not find the SAL object when the workflow is run from bash.
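The naming pitfall follows from how base R's save() and load() work: objects are restored under the names they were saved with. The sketch below reproduces it with a plain list standing in for a SYSargsList object and a temporary file standing in for 'spr_wf.RData'; both substitutions are assumptions made for illustration.

```r
# save() stores objects under their names; load() restores those same names.
env_file <- tempfile(fileext = ".RData")   # stand-in for 'spr_wf.RData'
my_sal <- list(step = "example")           # stand-in for a SYSargsList object
save(my_sal, file = env_file)

wf_env <- new.env()
load(env_file, envir = wf_env)
exists("my_sal", envir = wf_env, inherits = FALSE)  # TRUE
exists("sal", envir = wf_env, inherits = FALSE)     # FALSE: code using 'sal' fails
```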
## Execution
This mode of execution cannot handle complex dependency graphs. The original step dependencies of the SAL object are ignored, and all steps are executed linearly. It is recommended to adjust the workflow step order before using this function.
No return value.
Le Zhang and Daniela Cassol
file_path <- system.file("extdata/spr_simple_wf.Rmd", package="systemPipeR")
sal <- SPRproject(overwrite = TRUE)
sal <- importWF(sal, file_path)
sal2bash(sal)
This function takes a SYSargsList object and translates it into an SPR workflow template in R Markdown format.
sal2rmd(sal, out_path = "spr_template.Rmd", rmd_title = "SPR workflow template", rmd_author = "my name", rmd_date = "Last update: `r format(Sys.time(), '%d %B, %Y')`", rmd_output = "html_document", desc = "This is a workflow template.", verbose = TRUE)
sal |
|
out_path |
string, output file name. |
rmd_title |
string, title of the Rmd. |
rmd_author |
string, author(s) of the Rmd, put all authors in a single character string. |
rmd_date |
string, date header of Rmd. |
rmd_output |
string, output format of Rmd, used in header. |
desc |
string, or character vector of strings, description text in R Markdown format that will be added to the document before the workflow steps start. It can be a single line, or multiple lines given as a character vector with one item per line. |
verbose |
logical. If |
No return value.
Le Zhang and Daniela Cassol
file_path <- system.file("extdata/spr_simple_wf.Rmd", package="systemPipeR")
sal <- SPRproject(overwrite = TRUE)
sal <- importWF(sal, file_path)
sal2rmd(sal)
Function to scale mappings of spliced features (query ranges) to their corresponding genome coordinates (subject ranges). The method accounts for introns in the subject ranges that are absent in the query ranges. An example use case is uORFs predicted in 5' UTR sequences with predORF. These query ranges are given relative to the 5' UTR sequence. The scaleRanges function scales them to the corresponding genome coordinates, so that they can be used in RNA-Seq expression experiments like other gene ranges.
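The core idea of scaling a spliced coordinate back onto the genome can be sketched in base R for a single plus-strand feature: positions are counted along the concatenated exons and then shifted by each exon's genomic start. The helper map_to_genome is hypothetical and handles only the plus strand; it is not the scaleRanges implementation, which works on GRangesList objects and also deals with strand.

```r
# Hypothetical base-R sketch (plus strand only, not the scaleRanges code):
# map a query interval in spliced-transcript coordinates onto genomic exons.
map_to_genome <- function(exon_start, exon_end, q_start, q_end) {
    widths   <- exon_end - exon_start + 1
    tx_end   <- cumsum(widths)          # last transcript position of each exon
    tx_start <- tx_end - widths + 1     # first transcript position of each exon
    # Exons that overlap the query in transcript space
    keep <- tx_end >= q_start & tx_start <= q_end
    # Clip the overlap in transcript space, then shift it into genome space
    s <- pmax(tx_start[keep], q_start)
    e <- pmin(tx_end[keep], q_end)
    offset <- exon_start[keep] - tx_start[keep]
    data.frame(start = s + offset, end = e + offset)
}

# Toy data from the simple scaleRanges example: exons 5-10, 15-25, 30-40; query 1-9
mapped <- map_to_genome(c(5, 15, 30), c(10, 25, 40), 1, 9)
mapped
```

In this sketch the 9-nt query is split across the intron into the genomic segments 5-10 and 15-17.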
scaleRanges(subject, query, type = "custom", verbose = TRUE)
subject |
Genomic ranges provided as |
query |
Feature level ranges provided as |
type |
Feature name to use in |
verbose |
The setting |
Object of class GRangesList
Thomas Girke
predORF
library(GenomicRanges)
## Usage for simple example
subject <- GRanges(seqnames="Chr1", IRanges(c(5,15,30),c(10,25,40)), strand="+")
query <- GRanges(seqnames="myseq", IRanges(1, 9), strand="+")
scaleRanges(GRangesList(myid1=subject), GRangesList(myid1=query), type="test")
## Not run:
## Usage for more complex example
library(txdbmaker); library(systemPipeRdata)
library(GenomicFeatures); library(Rsamtools); library(rtracklayer)
gff <- system.file("extdata/annotation", "tair10.gff", package="systemPipeRdata")
txdb <- makeTxDbFromGFF(file=gff, format="gff3", organism="Arabidopsis")
futr <- fiveUTRsByTranscript(txdb, use.names=TRUE)
genome <- system.file("extdata/annotation", "tair10.fasta", package="systemPipeRdata")
dna <- extractTranscriptSeqs(FaFile(genome), futr)
uorf <- predORF(dna, n="all", mode="orf", longest_disjoint=TRUE, strand="sense")
grl_scaled <- scaleRanges(subject=futr, query=uorf, type="uORF", verbose=TRUE)
export.gff3(unlist(grl_scaled), "uorf.gff")
## End(Not run)
The seeFastq and seeFastqPlot functions generate and plot a series of useful quality statistics for a set of FASTQ files, including per-cycle quality box plots, base proportions, base-level quality trends, relative k-mer diversity, length and occurrence distribution of reads, number of reads above quality cutoffs, and mean quality distribution. The functions can process reads of variable length, but most plots are only meaningful if the read positions in the FASTQ file are aligned with the sequencing cycles. For instance, constant-length clipping of the reads on either end, or variable-length clipping on the 3' end, maintains this relationship, while variable-length clipping on the 5' end without reversing the reads erases it.
The seeFastq function computes the summary stats and stores them in a relatively small list object that can be saved to disk with save() and reloaded with load() for later plotting. The argument 'klength' specifies the k-mer length and 'batchsize' the number of reads to randomly sample from each FASTQ file.
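Why read positions must line up with sequencing cycles can be seen in a per-cycle mean quality computation: column i of the score matrix is only "cycle i" if every read starts at cycle 1. The base R sketch below assumes Phred+33 encoded quality strings and pads shorter reads with NA; the helper per_cycle_mean_quality is hypothetical and is not how seeFastq computes its statistics.

```r
# Hypothetical sketch (not the seeFastq code): per-cycle mean Phred quality for
# reads of variable length, ASSUMING Phred+33 encoded quality strings.
per_cycle_mean_quality <- function(quals) {
    # Convert each quality string to integer Phred scores
    scores <- lapply(quals, function(q) utf8ToInt(q) - 33L)
    max_len <- max(lengths(scores))
    # Pad shorter reads with NA so that column i always means sequencing cycle i
    mat <- t(vapply(scores,
                    function(s) c(s, rep(NA_integer_, max_len - length(s))),
                    integer(max_len)))
    colMeans(mat, na.rm = TRUE)
}

# 'I' encodes Phred 40 and '5' encodes Phred 20
per_cycle_mean_quality(c("IIII", "II5"))
```

Clipping a variable number of bases from the 5' end would shift scores into the wrong columns, which is the situation the paragraph above warns about.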
seeFastq(fastq, batchsize, klength = 8) seeFastqPlot(fqlist, arrange = c(1, 2, 3, 4, 5, 8, 6, 7), ...)
fastq |
Named character vector containing paths to FASTQ files in the data fields and sample labels in the name slots. |
batchsize |
Number of reads to randomly sample from each FASTQ file for the QC analysis. Smaller numbers reduce the memory footprint and compute time. |
klength |
Specifies the k-mer length in the plot for the relative k-mer diversity. |
fqlist |
|
arrange |
Integer vector from 1 to 8 specifying the row order of the QC plot. Dropping numbers eliminates the corresponding plots. |
... |
Additional plotting arguments to pass on to |
The function seeFastq returns the summary stats in a list containing all information required for the quality plots. The function seeFastqPlot plots the information generated by seeFastq using ggplot2.
Thomas Girke
## Not run:
targets <- system.file("extdata", "targets.txt", package="systemPipeR")
dir_path <- system.file("extdata/cwl", package="systemPipeR")
args <- loadWorkflow(targets=targets, wf_file="hisat2/hisat2-mapping-se.cwl",
                     input_file="hisat2/hisat2-mapping-se.yml", dir_path=dir_path)
args <- renderWF(args, inputvars=c(FileName="_FASTQ_PATH1_", SampleName="_SampleName_"))
fqlist <- seeFastq(fastq=infile1(args), batchsize=10000, klength=8)
## Scale the PDF width to the number of FASTQ files
pdf("fastqReport.pdf", height=18, width=4*length(fqlist))
seeFastqPlot(fqlist)
dev.off()
## End(Not run)
Create an HTML table with fixed columns using the DT package.
showDF(data, ...)
data |
data object (either a matrix or a data frame). |
... |
Additional arguments passed to the DT::datatable() function. |
Returns an object of class datatables and htmlwidget.
showDF(iris)
Function to construct a SYSargsList workflow control environment (S4 object). This function creates and checks the directory structure. If the expected directories do not exist, they can be created.
The default project directory structure is:
SPRproject/
    data/
    param/
    results/
The project working directory names can be modified, but users need to edit the code accordingly.
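The check-and-create behavior for the default layout can be sketched with base R file utilities. The helper init_project_dirs is hypothetical and only mirrors the data/, param/ and results/ layout described above; it is not how SPRproject is implemented and does not create the log directory or the SYSargsList object.

```r
# Hypothetical helper (not SPRproject itself): create the default data/,
# param/ and results/ subdirectories if they are missing.
init_project_dirs <- function(projPath, dirs = c("data", "param", "results")) {
    paths <- file.path(projPath, dirs)
    # Only create directories that do not exist yet
    for (p in paths[!dir.exists(paths)]) dir.create(p, recursive = TRUE)
    dir.exists(paths)
}

proj <- file.path(tempdir(), "SPRproject")
init_project_dirs(proj)
```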
SPRproject(projPath = getwd(), data = "data", param = "param", results = "results", logs.dir= ".SPRproject", sys.file="SYSargsList.yml", envir = new.env(), restart = FALSE, resume=FALSE, load.envir = FALSE, overwrite = FALSE, silent = FALSE)
projPath |
a character vector of a full project path name. Default is the current path. |
data |
a character vector of a |
param |
a character vector of a |
results |
a character vector of a |
logs.dir |
a character vector of a |
sys.file |
a character vector of |
envir |
the environment in which the workflow will be evaluated. By default, this
creates a |
restart |
if set to |
resume |
if set to |
load.envir |
This argument allows loading the environment and recovering all objects saved
during workflow execution (please check |
overwrite |
if set to |
silent |
if set to TRUE, all messages returned by the function will be suppressed. |
If a SYSargsList instance was created before, or independently of, the project initialization, it can be appended after the project is created. Please see the appendStep<- function.
SPRproject returns a SYSargsList object.
Daniela Cassol
See also SYSargsList-class.
sal <- SPRproject(projPath = tempdir())
sal
Returns character subsets of the input, the output, or the list of command lines for each workflow step.
subsetWF(args, slot, subset=NULL, index=NULL, delete=FALSE)
args |
object of class |
slot |
three options available: |
subset |
name or numeric position of the values to subset in the |
index |
Numeric index position(s) of the files in |
delete |
allows deleting a subset of files in the case of |
Daniela Cassol and Thomas Girke
loadWorkflow, renderWF
## Construct SYSargs2 object
targets <- system.file("extdata", "targets.txt", package="systemPipeR")
dir_path <- system.file("extdata/cwl", package="systemPipeR")
WF <- loadWorkflow(targets=targets, wf_file="hisat2/hisat2-mapping-se.cwl",
                   input_file="hisat2/hisat2-mapping-se.yml", dir_path=dir_path)
WF <- renderWF(WF, inputvars=c(FileName="_FASTQ_PATH1_", SampleName="_SampleName_"))
WF
## Testing the subsetWF function
input <- subsetWF(WF, slot="input", subset='FileName')
output <- subsetWF(WF, slot="output", subset=1, index=1)
step.cmd <- subsetWF(WF, slot="step", subset=1) ## subset all the HISAT2 command lines
# subsetWF(WF, slot="output", subset=1, index=1, delete=TRUE) ## in order to delete the subset files list
Function for creating symbolic links to view BAM files in a genome browser such as IGV.
symLink2bam(sysargs, command="ln -s", htmldir, ext = c(".bam", ".bai"), urlbase, urlfile)
sysargs |
Object of class |
command |
Shell command, defaults to "ln -s" |
htmldir |
Path to HTML directory with http access. |
ext |
File name extensions to use for BAM and index files. |
urlbase |
The base URL structure to use in URL file. |
urlfile |
Name and path of URL file. |
Symbolic links and a URL file.
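The two side effects, symbolic links in a web-served directory and a text file of URLs, can be sketched with base R. The helper link_bams is hypothetical: it assumes a Unix-like system where file.symlink() works, takes a single HTML directory instead of the two-part htmldir used by symLink2bam, and the URL layout (urlbase plus file name) is an illustrative guess rather than the exact format written by symLink2bam.

```r
# Hypothetical sketch (not the symLink2bam code): link BAM files into a
# web-served directory and write one URL per file for loading into IGV.
# ASSUMES a Unix-like system where file.symlink() is supported.
link_bams <- function(bams, htmldir, urlbase, urlfile) {
    dir.create(htmldir, showWarnings = FALSE, recursive = TRUE)
    links <- file.path(htmldir, basename(bams))
    # Link to absolute paths so the links stay valid from the HTML directory
    ok <- file.symlink(normalizePath(bams), links)
    writeLines(paste0(urlbase, basename(bams)), urlfile)
    ok
}

tmp <- tempfile("igv_demo")
dir.create(tmp)
bam <- file.path(tmp, "sample1.bam")
file.create(bam)  # empty placeholder standing in for a real BAM file
link_bams(bam, file.path(tmp, "html"), "http://myserver.edu/~username/",
          file.path(tmp, "IGVurl.txt"))
```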
Thomas Girke
## Construct SYSargs2 object from CWL and targets files
targets <- system.file("extdata", "targets.txt", package="systemPipeR")
dir_path <- system.file("extdata/cwl", package="systemPipeR")
args <- loadWorkflow(targets=targets, wf_file="hisat2/hisat2-mapping-se.cwl",
                     input_file="hisat2/hisat2-mapping-se.yml", dir_path=dir_path)
args <- renderWF(args, inputvars=c(FileName="_FASTQ_PATH1_", SampleName="_SampleName_"))
args
## Not run:
## Run alignments
args <- runCommandline(args, dir = FALSE, make_bam = TRUE)
## Create sym links and URL file for IGV
symLink2bam(sysargs=args, command="ln -s", htmldir=c("~/.html/", "somedir/"),
            ext=c(".bam", ".bai"), urlbase="http://myserver.edu/~username/",
            urlfile="IGVurl.txt")
## End(Not run)
Methods to access information from a SYSargs object.
sysargs(x)
x |
object of class |
various outputs
Thomas Girke
## Construct SYSargs object from param and targets files
param <- system.file("extdata", "hisat2.param", package="systemPipeR")
targets <- system.file("extdata", "targets.txt", package="systemPipeR")
args <- systemArgs(sysma=param, mytargets=targets)
args
names(args); modules(args); cores(args); outpaths(args); sysargs(args)
"SYSargs"
S4 class container for storing parameters of command-line- or R-based software. SYSargs instances are constructed by the systemArgs function from two simple tabular files: a targets file and a param file. The latter is optional for workflow steps lacking command-line software. Typically, a SYSargs instance stores all sample-level inputs as well as the paths to the corresponding outputs generated by command-line- or R-based software that produces sample-level output files. Each sample-level input/output file operation uses its own SYSargs instance. The outpaths of a SYSargs instance usually define the sample inputs for the next SYSargs instance. This connectivity is achieved by writing the outpaths with the writeTargetsout function to a new targets file that serves as input to the next systemArgs call. By chaining several SYSargs steps together, one can construct complex workflows involving many sample-level input/output file operations with any combination of command-line or R-based software.
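The chaining idea (outputs of one step become the FileName column of the next targets file) can be sketched with plain data frames and a tab-delimited file. This is a hypothetical illustration, not writeTargetsout: the sample names, file paths and the assumption that step 1 writes one BAM file per sample to results/ are all made up.

```r
# Hypothetical sketch of the chaining idea (not writeTargetsout itself).
step1_targets <- data.frame(FileName = c("data/s1.fastq", "data/s2.fastq"),
                            SampleName = c("s1", "s2"))
# ASSUMPTION: step 1 writes one BAM file per sample into results/
step1_outpaths <- file.path("results", paste0(step1_targets$SampleName, ".bam"))

# The outputs of step 1 become the FileName column of the next targets file
step2_targets <- step1_targets
step2_targets$FileName <- step1_outpaths
targets_file <- tempfile(fileext = ".txt")
write.table(step2_targets, targets_file, sep = "\t", quote = FALSE, row.names = FALSE)
read.delim(targets_file)
```

Keeping the SampleName column unchanged is what lets later steps relate each output back to its original sample.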
Objects can be created by calls of the form new("SYSargs", ...).
targetsin: Object of class "data.frame" storing tabular data from targets input file
targetsout: Object of class "data.frame" storing tabular data from targets output file
targetsheader: Object of class "character" storing header/comment lines of targets file
modules: Object of class "character" storing software versions from module system
software: Object of class "character"; name of executable of command-line software
cores: Object of class "numeric"; number of CPU cores to use
other: Object of class "character"; additional arguments
reference: Object of class "character"; path to reference genome file
results: Object of class "character"; path to results directory
infile1: Object of class "character"; paths to first FASTQ file
infile2: Object of class "character"; paths to second FASTQ file if data is PE
outfile1: Object of class "character"; paths to output files generated by command-line software
sysargs: Object of class "character"; full commands used to execute external software
outpaths: Object of class "character"; paths to final outputs including postprocessing by Rsamtools
signature(x = "SYSargs"): extracts sample names
signature(x = "SYSargs"): subsetting of class with bracket operator
signature(from = "list", to = "SYSargs"): as(list, "SYSargs")
signature(x = "SYSargs"): extracts data from cores slot
signature(x = "SYSargs"): extracts data from infile1 slot
signature(x = "SYSargs"): extracts data from infile2 slot
signature(x = "SYSargs"): extracts data from modules slot
signature(x = "SYSargs"): extracts slot names
signature(x = "SYSargs"): extracts number of samples
signature(x = "SYSargs"): extracts data from other slot
signature(x = "SYSargs"): extracts data from outfile1 slot
signature(x = "SYSargs"): extracts data from outpath slot
signature(x = "SYSargs"): extracts data from reference slot
signature(x = "SYSargs"): extracts data from results slot
signature(object = "SYSargs"): summary view of SYSargs objects
signature(x = "SYSargs"): extracts data from software slot
signature(x = "SYSargs"): extracts data from targetsheader slot
signature(x = "SYSargs"): extracts data from targetsin slot
signature(x = "SYSargs"): extracts data from targetsout slot
Thomas Girke
systemArgs and runCommandline
showClass("SYSargs")
## Construct SYSargs object from param and targets files
param <- system.file("extdata", "tophat.param", package="systemPipeR")
targets <- system.file("extdata", "targets.txt", package="systemPipeR")
args <- systemArgs(sysma=param, mytargets=targets)
args
names(args); targetsin(args); targetsout(args); targetsheader(args); software(args)
modules(args); cores(args); outpaths(args)
sysargs(args); other(args); reference(args); results(args); infile1(args)
infile2(args); outfile1(args); SampleName(args)
## Return sample comparisons
readComp(args, format = "vector", delim = "-")
## The subsetting operator '[' selects specific samples
args[1:4]
## Not run:
## Execute SYSargs on a single machine
runCommandline(args=args)
## Execute SYSargs on multiple machines
qsubargs <- getQsubargs(queue="batch", Nnodes="nodes=1", cores=cores(args),
                        memory="mem=10gb", time="walltime=20:00:00")
qsubRun(appfct="runCommandline(args=args)", appargs=args, qsubargs=qsubargs,
        Nqsubs=1, submitdir="results", package="systemPipeR")
## Write outpaths to new targets file for next SYSargs step
writeTargetsout(x=args, file="default")
## End(Not run)
"SYSargs2"
The SYSargs2 class stores all the information and instructions needed for processing a set of input files with a specific command-line tool or a series of command-line tools within a workflow. The SYSargs2 S4 class object is created with the loadWF and renderWF functions, which populate all the command lines for each sample in each step of the particular workflow. Each sample-level input/output file operation uses its own SYSargs2 instance. The output of SYSargs2 defines all the expected output files for each step in the workflow, which usually serve as the sample inputs for the next step in a SYSargs2 instance. By chaining several SYSargs2 steps together, one can construct complex workflows involving many sample-level input/output file operations with any combination of command-line or R-based software.
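The way inputvars connects targets columns to placeholders such as "_FASTQ_PATH1_" can be pictured as string substitution into a command template, one command per sample. The sketch below is a simplified, hypothetical illustration: renderWF actually works on CWL and YAML input definitions, and the template, file names and helper render_cmds are assumptions made for this example.

```r
# Hypothetical sketch (not renderWF internals, which operate on CWL/YAML
# inputs): fill placeholders in a command template from targets columns.
targets <- data.frame(FileName = c("data/a.fq", "data/b.fq"),
                      SampleName = c("A", "B"))
inputvars <- c(FileName = "_FASTQ_PATH1_", SampleName = "_SampleName_")
template <- "hisat2 -U _FASTQ_PATH1_ -S results/_SampleName_.sam"

render_cmds <- function(template, targets, inputvars) {
    vapply(seq_len(nrow(targets)), function(i) {
        cmd <- template
        # Replace each placeholder with the value from the mapped targets column
        for (col in names(inputvars)) {
            cmd <- gsub(inputvars[[col]], targets[[col]][i], cmd, fixed = TRUE)
        }
        cmd
    }, character(1))
}
render_cmds(template, targets, inputvars)
```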
Objects can be created by calls of the form new("SYSargs2", ...).
targets: Object of class "list" storing data from each sample from targets file
targetsheader: Object of class "list" storing header/comment lines of targets file
modules: Object of class "list" storing software versions from module system
wf: Object of class "list" storing data from Workflow CWL parameters file
clt: Object of class "list" storing data from each CommandLineTool substep in the Workflow or the single CommandLineTool CWL parameters file
yamlinput: Object of class "list" storing data from input (*.yml) file
cmdlist: Object of class "list" storing all command lines used to execute external software
input: Object of class "list" storing data from each target defined in inputvars
output: Object of class "list"; paths to final output files
files: Object of class "list"; paths to input and CWL parameters files
inputvars: Object of class "list" storing data from each inputvars
cmdToCwl: Object of class "list" storing data from each cmdToCwl
status: Object of class "list" storing data from each status
internal_outfiles: Object of class "list" storing raw data from each output
Subsetting of class with bracket operator.
Subsetting of class with bracket operator.
Replacement method for "SYSargs2" class.
Extracts slot elements by name.
Extracts number of samples.
Extracts slot names.
Summary view of SYSargs2 objects.
signature(from = "list", to = "SYSargs2"): as(list, "SYSargs2")
signature(from = "SYSargs2", to = "list"): as(SYSargs2, "list")
signature(from = "SYSargs2", to = "DataFrame"): as(x, "DataFrame"); for targets slot.
Coerces back to a list: as(SYSargs2, "list").
Extracts data from targets slot.
Extracts data from targetsheader slot.
Extracts data from modules slot.
Extracts data from wf slot.
Extracts data from clt slot.
Extracts data from yamlinput slot.
Extracts data from cmdlist slot.
Extracts data from input slot.
Extracts data from cmdlist slot.
Extracts data from files slot.
Extracts data from inputvars slot.
Extracts data from cmdToCwl slot.
Extracts data from status slot.
Extracts paths to the first FASTQ file.
Extracts paths to the second FASTQ file if data is PE.
Extracts baseCommand from the command line used to execute external software.
Extracts all sample names.
Replacement method for yamlinput slot input.
Daniela Cassol and Thomas Girke
loadWF, renderWF, runCommandline, and clusterRun
showClass("SYSargs2")
## Construct SYSargs2 object from CWL param, CWL input, and targets files
targetspath <- system.file("extdata/cwl/example/targets_example.txt", package="systemPipeR")
dir_path <- system.file("extdata/cwl", package="systemPipeR")
WF <- loadWorkflow(targets=targetspath, wf_file="example/workflow_example.cwl",
                   input_file="example/example.yml", dir_path=dir_path)
WF <- renderWF(WF, inputvars=c(Message = "_STRING_", SampleName = "_SAMPLE_"))
WF
## Methods
names(WF)
length(WF)
baseCommand(WF)
SampleName(WF)
## Accessors
targets(WF)
targetsheader(WF)
modules(WF)
yamlinput(WF)
cmdlist(WF)
input(WF)
output(WF)
files(WF)
inputvars(WF)
cmdToCwl(WF)
status(WF)
## The subsetting operator '[' selects specific command lines/samples
WF2 <- WF[1:2]
## Not run:
## Execute SYSargs2 on a single machine
WF2 <- runCommandline(WF2)
## End(Not run)
## Not run:
## Execute SYSargs2 on multiple machines of a compute cluster. The following
## example uses the conf and template files for the Slurm scheduler. Please
## read the instructions on how to obtain the corresponding files for other schedulers.
file.copy(system.file("extdata", ".batchtools.conf.R", package="systemPipeR"), ".")
file.copy(system.file("extdata", "batchtools.slurm.tmpl", package="systemPipeR"), ".")
resources <- list(walltime=120, ntasks=1, ncpus=4, memory=1024)
reg <- clusterRun(WF, FUN = runCommandline, conffile=".batchtools.conf.R",
                  template="batchtools.slurm.tmpl", Njobs=2, runid="01",
                  resourceList=resources)
## Monitor progress of submitted jobs
getStatus(reg=reg)
## End(Not run)
SYSargsList instances are constructed by the SYSargsList function.
SYSargsList(sysargs = NULL, step_name = "default", targets = NULL, wf_file = NULL, input_file = NULL, dir_path = ".", id = "SampleName", inputvars = NULL, rm_targets_col = NULL, dir = TRUE, dependency = NA, run_step = "mandatory", run_session = "management", run_remote_resources = NULL, silent = FALSE, projPath = getOption("projPath", getwd()))
sysargs | |
step_name | character with the step index name. |
targets | the path to |
wf_file | name and path to |
input_file | name and path to |
dir_path | full path to the directory with the |
id | A column from |
inputvars | Each vector element is required to be defined in the |
rm_targets_col | targets file columns to be removed. |
dir | This option allows creating an exclusive results folder for each step in the workflow. All the outfiles and log files for the particular step will be created in the respective folders. Default is |
dependency | character. Dependency tree, required when appending this step to the workflow. Character name of a previous step in the workflow. Default is |
run_step | character. Whether the step has the "mandatory" or "optional" flag for execution. |
run_session | character. Whether the step has the "management" or "compute" flag for execution. |
run_remote_resources | |
silent | If set to |
projPath | a character vector of a full project path name. Default is the current path. |
Daniela Cassol
SYSargs2, LineWise, and SPRproject
sal <- SPRproject(overwrite=TRUE)
targetspath <- system.file("extdata/cwl/example/targets_example.txt", package="systemPipeR")
## Constructor and `appendStep<-`
appendStep(sal) <- SYSargsList(step_name = "echo", targets=targetspath, dir=TRUE,
                               wf_file="example/workflow_example.cwl", input_file="example/example.yml",
                               dir_path = system.file("extdata/cwl", package="systemPipeR"),
                               inputvars = c(Message = "_STRING_", SampleName = "_SAMPLE_"))
appendStep(sal) <- LineWise(code = {
                              hello <- lapply(getColumn(sal, step=1, 'outfiles'), function(x) yaml::read_yaml(x))
                            },
                            step_name = "R_read",
                            dependency = "echo")
sal
"SYSargsList"
The SYSargsList S4 class is a list-like container where each instance stores all the input/output paths and parameter components required for a particular data analysis step based on command-line- or R-based software. SYSargsList instances are constructed by the SYSargsList function.
## Accessors
stepsWF(x)
statusWF(x)
targetsWF(x)
outfiles(x)
SE(x, ...)
dependency(x)
projectInfo(x)
runInfo(x)
## Methods
cmdlist(x, ...)
codeLine(x, ...)
SampleName(x, ...)
stepName(x)
baseCommand(x, ...)
targetsheader(x, ...)
yamlinput(x, ...)
viewEnvir(x, silent = FALSE)
copyEnvir(x, list = character(), new.env = globalenv(), silent = FALSE)
addResources(x, step, resources)
## Subset Methods
subset(x, ...)
getColumn(x, step, position = c("outfiles", "targetsWF"), column = 1, names = SampleName(x, step))
## Replacement
appendStep(x, after = length(x), ...) <- value
yamlinput(x, paramName, ...) <- value
replaceStep(x, step, step_name = "default") <- value
renameStep(x, step, ...) <- value
dependency(x, step, ...) <- value
appendCodeLine(x, after = length(x), ...) <- value
replaceCodeLine(x, line, ...) <- value
updateColumn(x, step, position = c("outfiles", "targetsWF")) <- value
x | An instance of class |
step | character or numeric. Workflow step name or position index. |
silent | If set to |
list | a character vector naming objects to be copied from the environment. |
new.env | An environment to copy to. Default is |
resources | |
position | character. Options are |
column | character or numeric. Which column will be subset from the |
names | character vector. Names of the workflow step. |
after | A subscript, after which the values are to be appended. |
paramName | character. Input name from |
step_name | character with the new step name. Default value will automatically give a name: |
line | numeric. Index position of the code line to be added or replaced. |
value | object containing the values to be replaced to |
... | Further arguments to be passed to or from other methods. |
Objects can be created by calls of the form new("SYSargsList", ...).
stepsWF: Object of class "list" storing all the step objects of the workflow. Each step can be either SYSargs2 or LineWise.
statusWF: Object of class "list" storing the success and failure status of each step in the workflow.
targetsWF: Object of class "list" storing the targets DataFrame for each step in the workflow. For LineWise steps, a DataFrame with 0 rows and 0 columns is displayed.
outfiles: Object of class "list" storing the output DataFrame for each step in the workflow. For LineWise steps, a DataFrame with 0 rows and 0 columns is displayed.
SE: Object of class "list" storing all the SummarizedExperiment objects in the workflow.
dependency: Object of class "list" storing all the dependency graphs in the workflow.
projectInfo: Object of class "list" storing the projectInfo information of the workflow.
runInfo: Object of class "list" storing the runInfo information of each step in the workflow.
targets_connection: Object of class "list" storing all targets file connections in the workflow.
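The dependency slot records, for each step, the step(s) it depends on (for example, dependency = "echo" for the R_read step in the examples). To illustrate how such a dependency graph constrains the execution order, here is a minimal base-R topological sort. order_steps() and the R_summary step are hypothetical; systemPipeR's own scheduling logic may differ.

```r
# Hypothetical helper (not part of systemPipeR): returns step names in an order
# where every step comes after the steps it depends on (a topological sort).
# deps: named list mapping each step name to its prerequisite step names (NA = none).
order_steps <- function(deps) {
  deps <- lapply(deps, function(d) d[!is.na(d)])
  done <- character(0)
  while (length(done) < length(deps)) {
    pending <- !(names(deps) %in% done)
    satisfied <- vapply(deps, function(d) all(d %in% done), logical(1))
    ready <- names(deps)[pending & satisfied]
    # nothing can be scheduled: a cycle or a reference to a missing step
    if (length(ready) == 0) stop("cyclic or unresolved dependency")
    done <- c(done, ready)
  }
  done
}

deps <- list(R_read = "echo", echo = NA, R_summary = c("echo", "R_read"))
steps_in_order <- order_steps(deps)
print(steps_in_order)
```

Steps whose prerequisites are all satisfied are scheduled together in each pass, so independent steps keep their list order within a pass.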
signature(x = "SYSargsList", i = "ANY", j = "ANY", drop = "ANY"): subsetting of class with bracket operator
signature(x = "SYSargsList", i = "ANY", j = "ANY"): subsetting of class with bracket operator
signature(x = "SYSargsList"): extracting slot elements by name
signature(from = "list", to = "SYSargsList"): as(list, "SYSargsList")
signature(from = "SYSargsList", to = "list"): as(SYSargsList, "list")
signature(x = "SYSargsList"): coerce back to list: as(SYSargsList, "list")
signature(x = "SYSargsList"): extracts number of SYSargsList steps
signature(x = "SYSargsList"): extracts slot names
signature(object = "SYSargsList"): summary view of SYSargsList steps
signature(x = "SYSargsList"): extract data from stepsWF slot
signature(x = "SYSargsList"): extract data from statusWF slot
signature(x = "SYSargsList"): extract data from targetsWF slot
signature(x = "SYSargsList"): extract data from outfiles slot
signature(x = "SYSargsList"): extract data from SE slot
signature(x = "SYSargsList"): extract data from dependency slot
signature(x = "SYSargsList"): extract data from projectInfo slot
signature(x = "SYSargsList"): extract data from runInfo slot
signature(x = "SYSargsList", ...): extracts data from the cmdlist slot for each SYSargs2 step
signature(x = "SYSargsList", step): extracts data from the codeLine slot for a LineWise step
signature(x = "SYSargsList", step): extracts Sample IDs from a SYSargs2 instance step
signature(x = "SYSargsList"): extracts step names from the workflow instance
signature(x = "SYSargsList", step): extracts baseCommand from a SYSargs2 instance step
signature(x = "SYSargsList", step): extracts targetsheader from a SYSargs2 instance step
signature(x = "SYSargsList", step): extracts data from the yamlinput slot for each SYSargs2 step
signature(x = "SYSargsList", silent = FALSE): returns a vector of character strings giving the names of the objects in the SYSargsList environment
signature(x = "SYSargsList", list = character(), new.env = globalenv(), silent = FALSE): copies the contents, or selected objects, of the SYSargsList environment into new.env
signature(x = "SYSargsList", step, resources): adds the computing resources for one or multiple steps in the workflow. If the step(s) are set to be executed in the "management section", adding the resources causes them to be executed in the "compute section".
signature(x = "SYSargsList", step, position = c("outfiles", "targetsWF"), column = 1, names = SampleName(x, step)): extracts the information from the targetsWF or outfiles slots. The information can be used in downstream R code.
signature(x = "SYSargsList", i = "ANY", j = "ANY", value = "ANY"): replacement method for SYSargsList class
signature(x = "SYSargsList", after = length(x)): inserts the SYSargsList or LineWise object into x at the position given by after
signature(x = "SYSargsList", step, paramName): replaces a value in the yamlinput slot for a specific step instance
signature(x = "SYSargsList", step, step_name = "default"): replaces a specific step in the workflow instance
signature(x = "SYSargsList"): renames a stepName in the workflow instance
signature(x = "SYSargsList", step): replaces the dependency graph for a specific step instance
signature(x = "SYSargsList", step, after = length(x)): inserts R code in a specific step at the position given by after
signature(x = "SYSargsList", step, line): replaces the R code in a specific step at the position given by line
signature(x = "SYSargsList", step, position = c("outfiles", "targetsWF")): updates or adds a new column in the targetsWF or outfiles slots
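getColumn() returns one column of a step's targetsWF or outfiles table, named by sample; the examples pass its output to yaml::read_yaml(). The base-R sketch below only illustrates the shape of that result on a plain data.frame. get_column_sketch() and the column name echo_file are hypothetical stand-ins, not the systemPipeR implementation.

```r
# Hypothetical stand-in for the idea behind getColumn(): pull one column of a
# per-step outfiles table and name the values by sample.
get_column_sketch <- function(tbl, column = 1, names = tbl$SampleName) {
  vals <- tbl[[column]]  # column may be a name or a position
  stats::setNames(as.character(vals), names)
}

outfiles_tbl <- data.frame(
  echo_file = c("results/echo/M1.txt", "results/echo/M2.txt"),  # made-up paths
  SampleName = c("M1", "M2"),
  stringsAsFactors = FALSE
)
paths <- get_column_sketch(outfiles_tbl, column = "echo_file")
print(paths)
```

Returning a named vector keeps the sample identity attached to each path, so downstream lapply() calls produce results named by sample.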
Daniela Cassol and Thomas Girke
SYSargs2, LineWise, and SPRproject
sal <- SPRproject(overwrite=TRUE)
targetspath <- system.file("extdata/cwl/example/targets_example.txt", package="systemPipeR")
## Constructor and `appendStep<-`
appendStep(sal) <- SYSargsList(step_name = "echo", targets=targetspath, dir=TRUE,
                               wf_file="example/workflow_example.cwl", input_file="example/example.yml",
                               dir_path = system.file("extdata/cwl", package="systemPipeR"),
                               inputvars = c(Message = "_STRING_", SampleName = "_SAMPLE_"))
appendStep(sal) <- LineWise(code = {
                              hello <- lapply(getColumn(sal, step=1, 'outfiles'), function(x) yaml::read_yaml(x))
                            },
                            step_name = "R_read",
                            dependency = "echo")
sal
length(sal)
names(sal)
## Accessors
stepsWF(sal)
statusWF(sal)
targetsWF(sal)
outfiles(sal)
SE(sal)
dependency(sal)
projectInfo(sal)
runInfo(sal)
## Methods
cmdlist(sal, step=1, targets=1:2) ## SYSargs2 step
codeLine(sal, step=2) ## LineWise step
SampleName(sal, step="echo")
stepName(sal)
baseCommand(sal, 1) ## SYSargs2 step
targetsheader(sal, step=1) ## SYSargs2 step
yamlinput(sal, step=1) ## SYSargs2 step
viewEnvir(sal)
copyEnvir(sal, list = character(), new.env = globalenv())
resources <- list(conffile = system.file("extdata/.batchtools.conf.R", package="systemPipeR"),
                  template = system.file("extdata/batchtools.slurm.tmpl", package="systemPipeR"),
                  Njobs = 3, ## Usually, the number of samples
                  walltime = 60, ## minutes
                  ntasks = 1,
                  ncpus = 4,
                  memory = 1024 ## Mb
                  )
addResources(sal, 1, resources = resources)
## Subset Methods
sal_sub <- subset(sal, subset_steps=1, input_targets=1:2, keep_steps = TRUE)
sal_sub
targetsIn <- getColumn(sal, step=1, position = c("outfiles"))
targetsIn
## Replacement
renameStep(sal, step=1) <- "new_echo"
dependency(sal, step=2) <- "new_echo"
updateColumn(sal, step=2, position = c("targetsWF")) <- data.frame(targetsIn)
targetsWF(sal)
replaceStep(sal, step=2) <- LineWise(code = {
                                       hello <- "Printing a new message"
                                     },
                                     step_name = "R_hello",
                                     dependency = "new_echo")
codeLine(sal)
yamlinput(sal, step=1, paramName="results_path") <- list(results_path=list(
  class="Directory", path="./data"))
cmdlist(sal, step = 1, targets = 1)
appendCodeLine(sal, step=2, after = 0) <- "log <- log(10)"
codeLine(sal, 2)
replaceCodeLine(sal, step=2, line=1) <- LineWise(code = {
                                                   log <- log(50)
                                                 })
codeLine(sal, 2)
Constructs SYSargs S4 class objects from two simple tabular files: a targets file and a param file. The latter is optional for workflow steps lacking command-line software. Typically, a SYSargs instance stores all sample-level inputs as well as the paths to the corresponding outputs generated by command-line- or R-based software generating sample-level output files. Each sample-level input/output operation uses its own SYSargs instance. The outpaths of SYSargs usually define the sample inputs for the next SYSargs instance. This connectivity is established by writing the outpaths with the writeTargetsout function to a new targets file that serves as input to the next systemArgs call. By chaining several SYSargs steps together one can construct complex workflows involving many sample-level input/output file operations with any combination of command-line or R-based software.
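The chaining described above can be pictured with a small base-R sketch: the output paths of one step replace the FileName column of the current targets table, and the result is written as the next step's targets file. write_next_targets() and the file names are hypothetical; this is not the implementation of writeTargetsout(), and it ignores the header/comment lines that real targets files may carry.

```r
# Hypothetical sketch of chaining steps through targets files: the output paths
# of one step become the FileName column of the next step's targets file.
write_next_targets <- function(targets, outpaths, file) {
  stopifnot(length(outpaths) == nrow(targets))
  next_targets <- targets
  next_targets$FileName <- outpaths  # outputs of this step = inputs of the next
  write.table(next_targets, file, sep = "\t", quote = FALSE, row.names = FALSE)
  invisible(next_targets)
}

targets <- data.frame(FileName = c("data/s1.fastq", "data/s2.fastq"),
                      SampleName = c("s1", "s2"), stringsAsFactors = FALSE)
outpaths <- c("results/s1.bam", "results/s2.bam")  # made-up step outputs
tmp <- tempfile(fileext = ".txt")
write_next_targets(targets, outpaths, tmp)
next_targets <- read.delim(tmp, stringsAsFactors = FALSE)
print(next_targets)
```

Sample annotation columns such as SampleName are carried over unchanged, so each downstream step still knows which sample an input belongs to.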
systemArgs(sysma, mytargets, type = "SYSargs")
sysma | path to 'param' file; the file structure follows a simple name/value syntax that is converted into JSON format; for details about the file structure see the sample files provided by the package. Assign |
mytargets | path to targets file |
type | |
SYSargs object or character string in JSON format
Thomas Girke
showClass("SYSargs")
## Construct SYSargs object from param and targets files
param <- system.file("extdata", "tophat.param", package="systemPipeR")
targets <- system.file("extdata", "targets.txt", package="systemPipeR")
args <- systemArgs(sysma=param, mytargets=targets)
args
names(args); modules(args); cores(args); outpaths(args); sysargs(args)
## Not run:
## Execute SYSargs on single machine
runCommandline(args=args)
## Execute SYSargs on multiple machines of a compute cluster
resources <- list(walltime=120, ntasks=1, ncpus=cores(args), memory=1024)
reg <- clusterRun(args, conffile=".batchtools.conf.R", template="batchtools.slurm.tmpl",
                  Njobs=18, runid="01", resourceList=resources)
## Monitor progress of submitted jobs
getStatus(reg=reg)
file.exists(outpaths(args))
sapply(1:length(args), function(x) loadResult(reg, x)) # Works once all jobs have completed successfully.
## Alignment stats
read_statsDF <- alignStats(args)
write.table(read_statsDF, "results/alignStats.xls", row.names=FALSE, quote=FALSE, sep="\t")
## Write outpaths to new targets file for next SYSargs step
writeTargetsout(x=args, file="default")
## End(Not run)
Convert targets files to a list or data.frame object.
targets.as.df(x)
targets.as.list(x, id="SampleName")
x | An object of the class |
id | A column from |
A data.frame or list containing all the targets file information.
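The conversion goes in both directions: a targets data.frame becomes a list with one element per sample, keyed by the id column, and such a list can be turned back into a data.frame. The base-R sketch below shows that round trip under the assumption that each list element holds one row's values; df_to_list() and list_to_df() are hypothetical helpers, not the systemPipeR code behind targets.as.list() and targets.as.df().

```r
# Hypothetical re-implementation of the idea (not the systemPipeR code):
# data.frame -> list of per-sample lists keyed by an id column, and back.
df_to_list <- function(x, id = "SampleName") {
  rows <- lapply(seq_len(nrow(x)), function(i) as.list(x[i, , drop = FALSE]))
  names(rows) <- x[[id]]
  rows
}
list_to_df <- function(x) {
  # each element becomes a one-row data.frame; stack them back together
  df <- do.call(rbind, lapply(x, as.data.frame, stringsAsFactors = FALSE))
  rownames(df) <- NULL
  df
}

targets <- data.frame(FileName = c("a.fq", "b.fq"), SampleName = c("A", "B"),
                      Factor = c("ctrl", "trt"), stringsAsFactors = FALSE)
tl <- df_to_list(targets)
str(tl$A)
back <- list_to_df(tl)
print(back)
```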
Daniela Cassol
showClass("SYSargs2")
targetspath <- system.file("extdata", "targets.txt", package="systemPipeR")
targets <- read.delim(targetspath, comment.char = "#")
targetslist <- targets.as.list(x=targets)
targets.as.df(x=targetslist)
## Trims adaptors hierarchically from longest to shortest match from right end of read.
## If 'internalmatch=TRUE' then internal matches will trigger the same behavior. The
## argument minpatternlength defines shortest adaptor match to consider for reads
## containing only partial adaptors at the right end.
trimbatch(fq, pattern, internalmatch=FALSE, minpatternlength=8, Nnumber=1, polyhomo=100, minreadlength=18, maxreadlength)
fq | |
pattern | |
internalmatch | The default is |
minpatternlength | It defines the shortest adaptor match to consider for reads containing only partial adaptors at the right end. |
Nnumber | A numeric value representing a minimum criterion for the filter. It selects elements with fewer than |
polyhomo | A numeric value representing a maximum criterion for the filter. It selects elements with fewer than threshold copies of any nucleotide. |
minreadlength | |
maxreadlength | |
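To make the "longest to shortest match from the right end" rule concrete, here is a simplified, exact-match sketch on plain character strings using base R only. It is an illustration of the described logic under stated assumptions, not trimbatch() itself: the real function works on ShortReadQ objects, and the filter semantics below (fewer than Nnumber Ns, fewer than polyhomo copies of each base, at least minreadlength bases) are my reading of the argument descriptions. trim_right_adaptor(), filter_reads() and the reads are hypothetical.

```r
# Simplified, exact-match sketch of hierarchical right-end adaptor trimming on
# plain character reads (base R only; not the trimbatch() implementation).
trim_right_adaptor <- function(reads, pattern, internalmatch = FALSE,
                               minpatternlength = 8) {
  vapply(reads, function(r) {
    if (internalmatch) {
      hit <- regexpr(pattern, r, fixed = TRUE)
      # a full adaptor match anywhere in the read: cut from its start
      if (hit > 0) return(substr(r, 1, hit - 1))
    }
    kmax <- min(nchar(pattern), nchar(r))
    if (kmax >= minpatternlength) {
      # longest to shortest: does the read end with the first k adaptor bases?
      for (k in kmax:minpatternlength) {
        if (endsWith(r, substr(pattern, 1, k))) return(substr(r, 1, nchar(r) - k))
      }
    }
    r
  }, character(1), USE.NAMES = FALSE)
}

# Assumed filter semantics: fewer than Nnumber Ns, fewer than polyhomo copies of
# each base, and at least minreadlength bases.
filter_reads <- function(reads, Nnumber = 1, polyhomo = 100, minreadlength = 18) {
  count <- function(r, ch) sum(strsplit(r, "")[[1]] == ch)
  keep <- vapply(reads, function(r) {
    count(r, "N") < Nnumber &&
      all(vapply(c("A", "C", "G", "T"), function(b) count(r, b), integer(1)) < polyhomo) &&
      nchar(r) >= minreadlength
  }, logical(1), USE.NAMES = FALSE)
  reads[keep]
}

reads <- c("ACGTACGTACGTACGTACGTACACGTCT",  # full adaptor at the right end
           "ACGTACGTACGTACGTACGTACACGT",    # partial adaptor (first 6 bases)
           "ACGTACGTNCGTACGTACGTAC")        # no adaptor, contains an N
trimmed <- trim_right_adaptor(reads, pattern = "ACACGTCT", minpatternlength = 6)
kept <- filter_reads(trimmed, Nnumber = 1, polyhomo = 50, minreadlength = 16)
print(kept)
```

Checking the longest adaptor prefix first matters: a shorter prefix could otherwise match by chance and leave adaptor bases in the read.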
Thomas Girke
## Preprocessing of paired-end reads
dir_path <- system.file("extdata/cwl/preprocessReads/trim-pe", package="systemPipeR")
targetspath <- system.file("extdata", "targetsPE.txt", package="systemPipeR")
trim <- loadWorkflow(targets=targetspath, wf_file="trim-pe.cwl", input_file="trim-pe.yml", dir_path=dir_path)
trim <- renderWF(trim, inputvars=c(FileName1="_FASTQ_PATH1_", FileName2="_FASTQ_PATH2_", SampleName="_SampleName_"))
trim
## Not run:
iterTrim <- "trimbatch(fq, pattern='ACACGTCT', internalmatch=FALSE, minpatternlength=6, Nnumber=1, polyhomo=50, minreadlength=16, maxreadlength=101)"
preprocessReads(args=trim[1], Fct=iterTrim, batchsize=100000, overwrite=TRUE, compress=TRUE)
## End(Not run)
Function to check whether a third-party software tool or utility is installed and set in the PATH.
tryCMD(command, silent = FALSE)
command | a character vector containing the command line name to be tested. |
silent | If set to |
It returns a positive message if the software is set in the PATH, or an error message if it is not.
Note that an error message does not necessarily mean that the software is not installed; it may only mean that it has not been exported to the current PATH.
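A comparable PATH check can be written with base R's Sys.which(), which returns an empty string for commands it cannot find on the PATH. The sketch below returns a named logical vector instead of printing a message; on_path() is a hypothetical helper and is not how tryCMD() is implemented. As with tryCMD(), FALSE only means the command is not on the current PATH, not that it is uninstalled.

```r
# Hypothetical base-R sketch of a PATH check (not the tryCMD() implementation).
# Sys.which() returns "" for commands that are not found on the PATH.
on_path <- function(command) {
  found <- nzchar(Sys.which(command))
  names(found) <- command
  found
}

found_cmds <- on_path(c("R", "hisat2", "blastp"))
print(found_cmds)
```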
Daniela Cassol
## Not run:
tryCMD(command="R")
tryCMD(command="blastp")
tryCMD(command="hisat2")
## End(Not run)
Function to check if the full path (file or directory) exists.
tryPath(path)
path | a character vector of a full path name. |
This function produces a character vector of the file or directory name defined in the path argument.
A character vector containing the name of the file or directory. If the path does not exist, it will return an error message.
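The described behaviour (return the file or directory name, or fail if the path does not exist) can be approximated in base R with file.exists() and basename(). check_path() is a hypothetical helper; that tryPath() returns exactly basename(path) is an assumption drawn from the description above, not from its source code.

```r
# Hypothetical base-R approximation of the described behaviour (not tryPath() itself):
# stop with an error if the path is missing, otherwise return its file/directory name.
check_path <- function(path) {
  if (!file.exists(path)) stop("path does not exist: ", path)
  basename(path)
}

tmp <- tempfile(fileext = ".txt")
writeLines("SampleName\tFileName", tmp)  # create a small file so the path exists
check_path(tmp)
```

file.exists() is also TRUE for directories, so the same helper works for both cases mentioned in the description.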
Daniela Cassol
file <- system.file("extdata/", "targets.txt", package="systemPipeR")
tryPath(path=file)
Functions for generating tabular variant reports including genomic context annotations and confidence statistics of variants. The annotations are obtained with utilities provided by the VariantAnnotation package and the variant statistics are retrieved from the input VCF files.
## Variant report
variantReport(files, txdb, fa, organism, out_dir = "results")
## Combine variant reports
combineVarReports(files, filtercol, ncol = 15)
## Create summary statistics of variants
varSummary(files)
files | |
txdb | Annotation data stored as |
fa | |
organism | Character vector specifying the organism name of the reference genome. |
filtercol | Named character vector containing in the name field the column titles to filter on, and in the data field the corresponding values to include in the report. For instance, the setting |
ncol | Integer specifying the number of columns in the tabular input files. Default is set to 15. |
out_dir | Character vector of a |
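The filtercol argument pairs column titles (names) with the values to keep, as in filtercol=c(Consequence="nonsynonymous") in the example below. The base-R sketch shows that filtering rule applied to made-up per-sample tables that are then combined with a Sample column. filter_by_cols() and the data are hypothetical; this is not the combineVarReports() implementation, which reads tabular report files.

```r
# Hypothetical helper (not the combineVarReports() implementation): keep rows
# whose value in each named column is among the requested values.
filter_by_cols <- function(df, filtercol) {
  keep <- rep(TRUE, nrow(df))
  for (col in names(filtercol)) keep <- keep & df[[col]] %in% filtercol[[col]]
  df[keep, , drop = FALSE]
}

# Made-up per-sample variant tables standing in for tabular report files
reports <- list(
  s1 = data.frame(VARID = c("v1", "v2"), Consequence = c("nonsynonymous", "synonymous"),
                  stringsAsFactors = FALSE),
  s2 = data.frame(VARID = "v3", Consequence = "nonsynonymous", stringsAsFactors = FALSE)
)
combined <- do.call(rbind, lapply(names(reports), function(n) {
  hits <- filter_by_cols(reports[[n]], c(Consequence = "nonsynonymous"))
  if (nrow(hits) == 0) return(NULL)  # rbind() skips NULL entries
  data.frame(Sample = n, hits, stringsAsFactors = FALSE)
}))
print(combined)
```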
Tabular output files.
Thomas Girke
filterVars
## Alignment with BWA (sequentially on single machine) param <- system.file("extdata", "bwa.param", package="systemPipeR") targets <- system.file("extdata", "targets.txt", package="systemPipeR") args <- systemArgs(sysma=param, mytargets=targets) sysargs(args)[1] ## Not run: library(VariantAnnotation) system("bwa index -a bwtsw ./data/tair10.fasta") bampaths <- runCommandline(args=args) ## Alignment with BWA (parallelized on compute cluster) resources <- list(walltime=120, ntasks=1, ncpus=cores(args), memory=1024) reg <- clusterRun(args, conffile=".batchtools.conf.R", template="batchtools.slurm.tmpl", Njobs=18, runid="01", resourceList=resources) ## Variant calling with GATK ## The following creates in the inital step a new targets file ## (targets_bam.txt). The first column of this file gives the paths to ## the BAM files created in the alignment step. The new targets file and the ## parameter file gatk.param are used to create a new SYSargs ## instance for running GATK. Since GATK involves many processing steps, it is ## executed by a bash script gatk_run.sh where the user can specify the ## detailed run parameters. All three files are expected to be located in the ## current working directory. Samples files for gatk.param and ## gatk_run.sh are available in the subdirectory ./inst/extdata/ of the ## source file of the systemPipeR package. 
writeTargetsout(x=args, file="targets_bam.txt")
system("java -jar CreateSequenceDictionary.jar R=./data/tair10.fasta O=./data/tair10.dict")
# system("java -jar /opt/picard/1.81/CreateSequenceDictionary.jar R=./data/tair10.fasta O=./data/tair10.dict")
args <- systemArgs(sysma="gatk.param", mytargets="targets_bam.txt")
resources <- list(walltime=120, ntasks=1, ncpus=cores(args), memory=1024)
reg <- clusterRun(args, conffile=".batchtools.conf.R", template="batchtools.slurm.tmpl", Njobs=18, runid="01", resourceList=resources)
writeTargetsout(x=args, file="targets_gatk.txt")

## Variant calling with BCFtools
## The following runs the variant calling with BCFtools. This step requires in
## the current working directory the parameter file sambcf.param and the
## bash script sambcf_run.sh.
args <- systemArgs(sysma="sambcf.param", mytargets="targets_bam.txt")
resources <- list(walltime=120, ntasks=1, ncpus=cores(args), memory=1024)
reg <- clusterRun(args, conffile=".batchtools.conf.R", template="batchtools.slurm.tmpl", Njobs=18, runid="01", resourceList=resources)
writeTargetsout(x=args, file="targets_sambcf.txt")

## Filtering of VCF files generated by GATK
args <- systemArgs(sysma="filter_gatk.param", mytargets="targets_gatk.txt")
filter <- "totalDepth(vr) >= 2 & (altDepth(vr) / totalDepth(vr) >= 0.8) & rowSums(softFilterMatrix(vr))==4"
# filter <- "totalDepth(vr) >= 20 & (altDepth(vr) / totalDepth(vr) >= 0.8) & rowSums(softFilterMatrix(vr))==6"
filterVars(args, filter, varcaller="gatk", organism="A. thaliana")
writeTargetsout(x=args, file="targets_gatk_filtered.txt")

## Filtering of VCF files generated by BCFtools
args <- systemArgs(sysma="filter_sambcf.param", mytargets="targets_sambcf.txt")
filter <- "rowSums(vr) >= 2 & (rowSums(vr[,3:4])/rowSums(vr[,1:4]) >= 0.8)"
# filter <- "rowSums(vr) >= 20 & (rowSums(vr[,3:4])/rowSums(vr[,1:4]) >= 0.8)"
filterVars(args, filter, varcaller="bcftools", organism="A. thaliana")
writeTargetsout(x=args, file="targets_sambcf_filtered.txt")

## Annotate filtered variants from GATK
args <- systemArgs(sysma="annotate_vars.param", mytargets="targets_gatk_filtered.txt")
txdb <- loadDb("./data/tair10.sqlite")
fa <- FaFile(systemPipeR::reference(args))
variantReport(args=args, txdb=txdb, fa=fa, organism="A. thaliana")

## Annotate filtered variants from BCFtools
args <- systemArgs(sysma="annotate_vars.param", mytargets="targets_sambcf_filtered.txt")
txdb <- loadDb("./data/tair10.sqlite")
fa <- FaFile(systemPipeR::reference(args))
variantReport(args=args, txdb=txdb, fa=fa, organism="A. thaliana")

## Combine results from GATK
args <- systemArgs(sysma="annotate_vars.param", mytargets="targets_gatk_filtered.txt")
combineDF <- combineVarReports(args, filtercol=c(Consequence="nonsynonymous"))
write.table(combineDF, "./results/combineDF_nonsyn_gatk.xls", quote=FALSE, row.names=FALSE, sep="\t")

## Combine results from BCFtools
args <- systemArgs(sysma="annotate_vars.param", mytargets="targets_sambcf_filtered.txt")
combineDF <- combineVarReports(args, filtercol=c(Consequence="nonsynonymous"))
write.table(combineDF, "./results/combineDF_nonsyn_sambcf.xls", quote=FALSE, row.names=FALSE, sep="\t")

## Summary for GATK
args <- systemArgs(sysma="annotate_vars.param", mytargets="targets_gatk_filtered.txt")
write.table(varSummary(args), "./results/variantStats_gatk.xls", quote=FALSE, col.names = NA, sep="\t")

## Summary for BCFtools
args <- systemArgs(sysma="annotate_vars.param", mytargets="targets_sambcf_filtered.txt")
write.table(varSummary(args), "./results/variantStats_sambcf.xls", quote=FALSE, col.names = NA, sep="\t")

## Venn diagram of variants
## simplify=FALSE keeps a named list even if all VARID vectors have the same length
args <- systemArgs(sysma="annotate_vars.param", mytargets="targets_gatk_filtered.txt")
varlist <- sapply(names(outpaths(args))[1:4], function(x) as.character(read.delim(outpaths(args)[x])$VARID), simplify=FALSE)
vennset_gatk <- overLapper(varlist, type="vennsets")
args <- systemArgs(sysma="annotate_vars.param", mytargets="targets_sambcf_filtered.txt")
varlist <- sapply(names(outpaths(args))[1:4], function(x) as.character(read.delim(outpaths(args)[x])$VARID), simplify=FALSE)
vennset_bcf <- overLapper(varlist, type="vennsets")
vennPlot(list(vennset_gatk, vennset_bcf), mymain="", mysub="GATK: red; BCFtools: blue", colmode=2, ccol=c("blue", "red"))
## End(Not run)
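The filter arguments above are R expressions stored as character strings. A minimal base-R sketch of how such a string can be evaluated against a table of per-variant read depths, assuming plain data frame columns named totalDepth and altDepth stand in for the VRanges accessors; the variant table is made up and apply_filter_string() is hypothetical, not the filterVars() implementation.

```r
# Hypothetical per-variant depth table (made-up values for illustration).
vars <- data.frame(
  VARID      = c("chr1:100_A/G", "chr1:250_C/T", "chr2:75_G/A", "chr3:10_T/C"),
  totalDepth = c(12, 3, 1, 30),
  altDepth   = c(11, 1, 1, 27),
  stringsAsFactors = FALSE
)

# Filter string in the style used above, simplified to plain column names.
filter <- "totalDepth >= 2 & (altDepth / totalDepth >= 0.8)"

# Hypothetical helper: evaluate the string with the columns in scope and
# keep the rows for which the expression is TRUE (NA counts as failing).
apply_filter_string <- function(df, filter) {
  keep <- eval(parse(text = filter), envir = df, enclos = baseenv())
  df[keep & !is.na(keep), , drop = FALSE]
}

passed <- apply_filter_string(vars, filter)
passed$VARID  # variants with depth >= 2 and alternative allele fraction >= 0.8
```

Keeping the filter as a string lets the same call site accept caller-specific criteria (GATK vs. BCFtools) without changing code.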
Plotting function for 2- to 5-way Venn diagrams from 'VENNset' objects or count set vectors. A useful feature is the possibility to combine the counts from several Venn comparisons with the same number of label sets in a single Venn diagram.
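A Venn count vector holds, for each combination of sets, the number of items that occur in exactly those sets and in no other set. A minimal base-R sketch of that computation, assuming combinations are labeled by joining set names with "_" (the default sepsplit); venn_counts() is a hypothetical helper, not overLapper(), and its output order is not claimed to match the package's.

```r
# Hypothetical helper (not part of systemPipeR): count the items that occur in
# exactly each combination of sets, i.e. the values a Venn diagram displays.
venn_counts <- function(setlist, sep = "_") {
  nms <- names(setlist)
  items <- unique(unlist(setlist, use.names = FALSE))
  # Membership label per item, e.g. "A_B" for an item found only in A and B.
  membership <- vapply(items, function(it) {
    paste(nms[vapply(setlist, function(s) it %in% s, logical(1))], collapse = sep)
  }, character(1))
  # All non-empty set combinations, so that empty Venn regions get a count of 0.
  combos <- unlist(lapply(seq_along(nms), function(k)
    apply(combn(nms, k), 2, paste, collapse = sep)))
  counts <- table(factor(membership, levels = combos))
  setNames(as.integer(counts), combos)
}

sets <- list(A = c("a", "b", "c", "d"), B = c("c", "d", "e"), C = c("d", "e", "f"))
counts <- venn_counts(sets)
counts  # A=2, B=0, C=1, A_B=1, A_C=0, B_C=1, A_B_C=1
```

The counts always sum to the number of unique items, because each item falls into exactly one Venn region.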
vennPlot(x, mymain = "Venn Diagram", mysub = "default", setlabels = "default", yoffset = seq(0, 10, by = 0.34), ccol = rep(1, 31), colmode = 1, lcol = c("#FF0000", "#008B00", "#0000FF", "#FF00FF", "#CD8500"), lines = c("#FF0000", "#008B00", "#0000FF", "#FF00FF", "#CD8500"), mylwd = 3, diacol = 1, type = "ellipse", ccex = 1, lcex = 1, sepsplit = "_", ...)
x |
A VENNset object, a list of VENNset objects, or a named vector of Venn counts (see examples). |
mymain |
Main title of plot. |
mysub |
Subtitle of plot. Default is "default". |
setlabels |
Vector of custom labels for the sets. Default is "default". |
yoffset |
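The results from several Venn comparisons can be combined in a single Venn diagram by assigning to x a list of VENNset objects or count vectors; yoffset defines the vertical offsets of the count values from the different comparisons. |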
ccol |
Character or numeric vector defining the colors of the count values. |
colmode |
See argument |
lcol |
Character or numeric vector defining the colors of the set labels. |
lines |
Character or numeric vector defining the line colors of the shapes. |
mylwd |
Defines the line width of the shapes used in the plot. |
diacol |
See argument |
type |
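Defines the shapes used to plot 4-way Venn diagrams. Default is "ellipse"; type="circle" uses circles, which results in a pseudo 4-way Venn diagram (see examples). |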
ccex |
Controls font size for count values. |
lcex |
Controls font size for set labels. |
sepsplit |
Character used to separate sample labels in Venn counts. |
... |
Additional arguments to pass on. |
Venn diagram plot.
The functions provided here are an extension of the Venn diagram resources on this site: http://manuals.bioinformatics.ucr.edu/home/R_BioCondManual#TOC-Venn-Diagrams
Thomas Girke
See examples in 'The Electronic Journal of Combinatorics': http://www.combinatorics.org/files/Surveys/ds5/VennSymmExamples.html
overLapper, olBarplot
## Sample data
setlist <- list(A=sample(letters, 18), B=sample(letters, 16), C=sample(letters, 20), D=sample(letters, 22), E=sample(letters, 18), F=sample(letters, 22))
## 2-way Venn diagram
vennset <- overLapper(setlist[1:2], type="vennsets")
vennPlot(vennset)
## 3-way Venn diagram
vennset <- overLapper(setlist[1:3], type="vennsets")
vennPlot(vennset)
## 4-way Venn diagram
vennset <- overLapper(setlist[1:4], type="vennsets")
vennPlot(list(vennset, vennset))
## Pseudo 4-way Venn diagram with circles
vennPlot(vennset, type="circle")
## 5-way Venn diagram
vennset <- overLapper(setlist[1:5], type="vennsets")
vennPlot(vennset)
## Alternative Venn count input to vennPlot (not recommended!)
counts <- sapply(vennlist(vennset), length)
vennPlot(counts)
## 6-way Venn comparison as bar plot
vennset <- overLapper(setlist[1:6], type="vennsets")
olBarplot(vennset, mincount=1)
## Bar plot of standard intersect counts
interset <- overLapper(setlist, type="intersects")
olBarplot(interset, mincount=1)
## Accessor methods for VENNset/INTERSECTset objects
names(vennset)
names(interset)
setlist(vennset)
intersectmatrix(vennset)
complexitylevels(vennset)
vennlist(vennset)
intersectlist(interset)
## Coerce VENNset/INTERSECTset object to list
as.list(vennset)
as.list(interset)
## Pairwise intersect matrix and heatmap
olMA <- sapply(names(setlist), function(x) sapply(names(setlist), function(y) sum(setlist[[x]] %in% setlist[[y]])))
olMA
heatmap(olMA, Rowv=NA, Colv=NA)
## Presence-absence matrices for large numbers of sample sets
interset <- overLapper(setlist=setlist, type="intersects", complexity=2)
(paMA <- intersectmatrix(interset))
heatmap(paMA, Rowv=NA, Colv=NA, col=c("white", "gray"))
"VENNset"
Container for storing Venn intersect results created by the overLapper function. The setlist slot stores the original label sets as vectors in a list; intersectmatrix organizes the label sets in a presence-absence matrix; complexitylevels stores, as a vector of integers, the number of sets considered in each comparison; and vennlist contains the Venn intersect vectors.
Objects can be created by calls of the form new("VENNset", ...).
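The slot descriptions above can be made concrete with a small base-R sketch that derives the same four pieces of information from a named list of sets. build_venn_slots() is hypothetical and only mirrors the slot descriptions; the "_" combination labels and the orientation of the matrix (combinations as rows, sets as columns) are assumptions, not a description of overLapper() internals.

```r
# Hypothetical sketch (not the overLapper() implementation): derive the four
# kinds of information a VENNset holds from a named list of label sets.
build_venn_slots <- function(setlist, sep = "_") {
  nms <- names(setlist)
  # All non-empty combinations of set names: "A", "B", ..., "A_B", ..., "A_B_C".
  combos <- unlist(lapply(seq_along(nms), function(k) combn(nms, k, simplify = FALSE)),
                   recursive = FALSE)
  labels <- vapply(combos, paste, character(1), collapse = sep)
  # Binary presence-absence matrix: one row per combination, one column per set.
  intersectmatrix <- t(vapply(combos, function(cmb) as.integer(nms %in% cmb),
                              integer(length(nms))))
  dimnames(intersectmatrix) <- list(labels, nms)
  # Complexity level: number of sets taking part in each combination.
  complexitylevels <- as.integer(rowSums(intersectmatrix))
  # Venn intersect: items shared by all sets in the combination and absent from the rest.
  vennlist <- lapply(combos, function(cmb) {
    shared <- Reduce(intersect, setlist[cmb])
    others <- unlist(setlist[setdiff(nms, cmb)], use.names = FALSE)
    setdiff(shared, others)
  })
  names(vennlist) <- labels
  list(setlist = setlist, intersectmatrix = intersectmatrix,
       complexitylevels = complexitylevels, vennlist = vennlist)
}

sets <- list(A = c("a", "b", "c", "d"), B = c("c", "d", "e"), C = c("d", "e", "f"))
slots <- build_venn_slots(sets)
slots$complexitylevels  # 1 1 1 2 2 2 3
slots$vennlist$A_B_C    # "d"
```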
setlist: Object of class "list": list of vectors
intersectmatrix: Object of class "matrix": binary matrix
complexitylevels: Object of class "integer": vector of integers
vennlist: Object of class "list": list of vectors
as.list signature(x = "VENNset"): coerces VENNset to list
coerce signature(from = "list", to = "VENNset"): as(list, "VENNset")
complexitylevels signature(x = "VENNset"): extracts data from complexitylevels slot
intersectmatrix signature(x = "VENNset"): extracts data from intersectmatrix slot
length signature(x = "VENNset"): returns number of original label sets
names signature(x = "VENNset"): extracts slot names
setlist signature(x = "VENNset"): extracts data from setlist slot
show signature(object = "VENNset"): summary view of VENNset objects
vennlist signature(x = "VENNset"): extracts data from vennlist slot
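The methods listed above follow the usual S4 accessor pattern. A minimal sketch of that pattern, using a hypothetical class VennSketch with the same slot layout as VENNset and a hypothetical accessor regions(); this is an illustration of the technique, not the package's class definition.

```r
library(methods)

# Hypothetical class with the same slot layout as VENNset (illustration only).
setClass("VennSketch", slots = c(setlist = "list", intersectmatrix = "matrix",
                                 complexitylevels = "integer", vennlist = "list"))

# Slot accessor as a generic plus a method; each accessor listed above could be
# written the same way (named regions() here to avoid clashing with vennlist()).
setGeneric("regions", function(x) standardGeneric("regions"))
setMethod("regions", "VennSketch", function(x) x@vennlist)

# length(): number of original label sets.
setMethod("length", "VennSketch", function(x) length(x@setlist))

# names(): slot names of the object.
setMethod("names", "VennSketch", function(x) slotNames(x))

# as.list(): coerce to a plain named list of slot contents.
setMethod("as.list", "VennSketch", function(x, ...) {
  lapply(setNames(slotNames(x), slotNames(x)), function(s) slot(x, s))
})

# show(): short summary when the object is printed.
setMethod("show", "VennSketch", function(object) {
  cat("VennSketch:", length(object@setlist), "sets,",
      length(object@vennlist), "Venn regions\n")
})

vs <- new("VennSketch",
  setlist = list(A = c("a", "b", "c"), B = c("b", "c", "d")),
  intersectmatrix = matrix(c(1L, 0L, 1L, 0L, 1L, 1L), ncol = 2,
                           dimnames = list(c("A", "B", "A_B"), c("A", "B"))),
  complexitylevels = c(1L, 1L, 2L),
  vennlist = list(A = "a", B = "d", A_B = c("b", "c")))
length(vs)       # 2
regions(vs)$A_B  # "b" "c"
vs               # VennSketch: 2 sets, 3 Venn regions
```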
Thomas Girke
overLapper, vennPlot, olBarplot, INTERSECTset-class
showClass("VENNset")
## Sample data
setlist <- list(A=sample(letters, 18), B=sample(letters, 16), C=sample(letters, 20), D=sample(letters, 22), E=sample(letters, 18), F=sample(letters, 22))
## Create VENNset
vennset <- overLapper(setlist[1:5], type="vennsets")
class(vennset)
## Accessor methods for VENNset/INTERSECTset objects
names(vennset)
setlist(vennset)
intersectmatrix(vennset)
complexitylevels(vennset)
vennlist(vennset)
## Coerce VENNset/INTERSECTset object to list
as.list(vennset)
Function to write a SYSargsList object (workflow control environment, S4 object) to a file.
write_SYSargsList(sysargs, sys.file=".SPRproject/SYSargsList.yml", silent=TRUE)
sysargs |
object of class SYSargsList. |
sys.file |
name and path of the SYSargsList file which will store all the project configuration information. Default is ".SPRproject/SYSargsList.yml". |
silent |
if set to TRUE, all messages returned by the function will be suppressed. |
write_SYSargsList will return a sys.file path.
Daniela Cassol
See also SYSargsList-class.
## Construct SYSargsList object from Rmd file
sal <- SPRproject(overwrite=TRUE)
targetspath <- system.file("extdata/cwl/example/targets_example.txt", package="systemPipeR")
## Constructor and `appendStep<-`
appendStep(sal) <- SYSargsList(step_name = "echo", targets=targetspath, dir=TRUE,
    wf_file="example/workflow_example.cwl", input_file="example/example.yml",
    dir_path = system.file("extdata/cwl", package="systemPipeR"),
    inputvars = c(Message = "_STRING_", SampleName = "_SAMPLE_"))
appendStep(sal) <- LineWise(code = {
    hello <- lapply(getColumn(sal, step=1, 'outfiles'), function(x) yaml::read_yaml(x))
  }, step_name = "R_read", dependency = "echo")
sal <- write_SYSargsList(sal)
sal
SYSargsList
Convenience write function for generating targets files containing the paths to files generated by input processes. These processes can be command-line or R-based software. Typically, the paths to the inputs are stored in the targets infile targetsWF(sysargs) for SYSargsList objects.
Note: by default the function cannot overwrite existing files. If a file exists, the user has to either remove it explicitly or set overwrite=TRUE.
writeTargets(sysargs, step, file = "default", silent = FALSE, overwrite = FALSE)
sysargs |
Object of class SYSargsList. |
step |
character with the step name. To check all the names, please
use |
file |
Name and path of the output file. If set to "default" then the name of the output file will have the pattern 'targets_<stepName>.txt'. |
silent |
If set to TRUE, all messages returned by the function will be suppressed. |
overwrite |
If set to TRUE, an existing file with the same name will be overwritten. |
Writes tabular targets files containing the header/comment lines from stepsWF(sysargs)[[step]][["targetsheader"]] and the columns from targetsWF(sysargs)[[step]].
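The output layout described above (header/comment lines followed by tab-delimited columns) and the default refusal to overwrite can be sketched in base R. write_targets_sketch() and the sample values are hypothetical; this is not the writeTargets() implementation.

```r
# Hypothetical writer (not the systemPipeR function): write header/comment
# lines followed by the tab-delimited targets columns; refuse to overwrite
# an existing file unless overwrite = TRUE.
write_targets_sketch <- function(targets, header, path, overwrite = FALSE) {
  if (file.exists(path) && !overwrite) {
    stop("File ", path, " exists. Remove it or set overwrite=TRUE.")
  }
  con <- file(path, open = "w")
  on.exit(close(con))
  writeLines(header, con)  # header/comment lines first
  write.table(targets, con, sep = "\t", quote = FALSE, row.names = FALSE)
  invisible(path)
}

targets <- data.frame(FileName = c("results/M1A.bam", "results/M1B.bam"),
                      SampleName = c("M1A", "M1B"), stringsAsFactors = FALSE)
out <- tempfile(fileext = ".txt")
write_targets_sketch(targets, header = "# Project: example", path = out)
readLines(out)
```

Because the header lines start with "#", the table can be read back with read.delim(out, comment.char = "#").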
Daniela Cassol
## Construct SYSargsList object from Rmd file
sal <- SPRproject(overwrite=TRUE)
targetspath <- system.file("extdata/cwl/example/targets_example.txt", package="systemPipeR")
## Constructor and `appendStep<-`
appendStep(sal) <- SYSargsList(step_name = "echo", targets=targetspath, dir=TRUE,
    wf_file="example/workflow_example.cwl", input_file="example/example.yml",
    dir_path = system.file("extdata/cwl", package="systemPipeR"),
    inputvars = c(Message = "_STRING_", SampleName = "_SAMPLE_"))
appendStep(sal) <- LineWise(code = {
    hello <- lapply(getColumn(sal, step=1, 'outfiles'), function(x) yaml::read_yaml(x))
  }, step_name = "R_read", dependency = "echo")
writeTargets(sal, "echo")
Convenience write function for generating targets files with updated FileName columns containing the paths to files generated by input/output processes. These processes can be command-line or R-based software. Typically, the paths to the inputs are stored in the targets infile (targetsin(args) for SYSargs objects or targets.as.df(targets(WF)) for SYSargs2 objects) and the outputs are stored in the targets outfile (targetsout(args) for SYSargs objects or output(WF) for SYSargs2 objects). Note: by default the function cannot overwrite existing files. If a file exists, the user has to either remove it explicitly or set overwrite=TRUE.
writeTargetsout(x, file = "default", silent = FALSE, overwrite = FALSE, step = NULL, new_col=NULL, new_col_output_index=NULL, remove=FALSE, ...)
x |
Object of class SYSargs or SYSargs2. |
file |
Name and path of the output file. If set to "default" then the name of the output file will have the pattern 'targets_<software>.txt', where <software> will be what |
silent |
If set to TRUE, all messages returned by the function will be suppressed. |
overwrite |
If set to TRUE, an existing file with the same name will be overwritten. |
step |
Name or numeric position of the step in the workflow whose output files are written. The names can be checked by running names(WF$clt) (see examples). |
new_col |
A character vector with the names of the new columns to add to the targets file. |
new_col_output_index |
A vector of numeric index positions of the file in |
remove |
If set to TRUE, the original columns are removed. Default is FALSE. |
... |
To pass on additional arguments. |
Writes tabular targets files containing the header/comment lines from targetsheader(x) and the columns from targetsout(x).
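The new_col, new_col_output_index and remove arguments add columns with output file paths to the targets table. A base-R sketch of that idea, assuming each sample's outputs are available as a named list of file paths; add_output_columns() and the file names are hypothetical, and treating remove=TRUE as keeping only SampleName plus the new columns is an assumption of this sketch, not documented behavior.

```r
# Hypothetical sketch: add columns taken from each sample's output files.
add_output_columns <- function(targets, outputs, new_col, new_col_output_index,
                               remove = FALSE) {
  stopifnot(length(new_col) == length(new_col_output_index),
            identical(names(outputs), targets$SampleName))
  out <- targets
  for (i in seq_along(new_col)) {
    # Take the n-th output file of every sample for this new column.
    out[[new_col[i]]] <- unname(vapply(outputs,
      function(files) files[[new_col_output_index[i]]], character(1)))
  }
  # Assumption of this sketch: remove = TRUE keeps SampleName and the new columns.
  if (remove) out <- out[, c("SampleName", new_col), drop = FALSE]
  out
}

targets <- data.frame(FileName = c("data/M1A.fastq", "data/M1B.fastq"),
                      SampleName = c("M1A", "M1B"), stringsAsFactors = FALSE)
outputs <- list(M1A = c("results/M1A.sam", "results/M1A.log"),
                M1B = c("results/M1B.sam", "results/M1B.log"))
res <- add_output_columns(targets, outputs, new_col = "sam_file", new_col_output_index = 1)
res2 <- add_output_columns(targets, outputs, "sam_file", 1, remove = TRUE)
res
```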
Daniela Cassol and Thomas Girke
writeTargetsRef
##########################################
## Examples with SYSargs2 object        ##
##########################################
## Construct SYSargs2 object
targets <- system.file("extdata", "targets.txt", package="systemPipeR")
dir_path <- system.file("extdata/cwl", package="systemPipeR")
WF <- loadWorkflow(targets=targets, wf_file="hisat2/hisat2-mapping-se.cwl", input_file="hisat2/hisat2-mapping-se.yml", dir_path=dir_path)
WF <- renderWF(WF, inputvars=c(FileName="_FASTQ_PATH1_", SampleName="_SampleName_"))
WF
## Write targets out file
names(WF$clt)
writeTargetsout(x=WF, file="default", step=1, new_col=c("sam_file"), new_col_output_index=c(1))
Generates a targets file with sample-wise references, as required by some NGS applications such as ChIP-Seq experiments with input samples. The reference sample information needs to be provided in the input file in a column called SampleReference, where the values refer to the labels used in the SampleName column. Sample rows without reference assignments are removed automatically.
writeTargetsRef(infile, outfile, silent = FALSE, overwrite = FALSE, ...)
infile |
Path to input targets file. |
outfile |
Path to output targets file. |
silent |
If set to TRUE, all messages returned by the function will be suppressed. |
overwrite |
If set to TRUE, an existing file with the same name will be overwritten. |
... |
To pass on additional arguments. |
Generates a modified targets file with the paths to the reference samples in the second column, named FileName2.
Note that sample rows without assigned reference samples are removed automatically.
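The reference pairing described above can be sketched in base R: rows without a SampleReference value are dropped, and the FileName of the referenced sample is placed in a second column, FileName2. add_reference_files() and the sample table are hypothetical; this is an illustration of the described behavior, not the writeTargetsRef() implementation, and it does not write a file.

```r
# Hypothetical sketch (not the package function): pair each sample with the
# FileName of its reference sample and drop rows without a reference.
add_reference_files <- function(targets) {
  ref <- targets$SampleReference
  has_ref <- !is.na(ref) & nzchar(ref)
  out <- targets[has_ref, , drop = FALSE]
  # Look up the reference sample's FileName through the SampleName column.
  file2 <- targets$FileName[match(out$SampleReference, targets$SampleName)]
  # Place FileName2 directly after FileName, i.e. as the second column.
  out <- data.frame(out[, "FileName", drop = FALSE], FileName2 = file2,
                    out[, setdiff(names(out), "FileName"), drop = FALSE],
                    stringsAsFactors = FALSE, check.names = FALSE)
  rownames(out) <- NULL
  out
}

targets <- data.frame(FileName = c("data/M1A.fastq", "data/M1B.fastq", "data/Input.fastq"),
                      SampleName = c("M1A", "M1B", "Input"),
                      SampleReference = c("Input", "Input", ""),
                      stringsAsFactors = FALSE)
res <- add_reference_files(targets)
res  # two rows (M1A, M1B), both with FileName2 = "data/Input.fastq"
```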
Thomas Girke
writeTargetsout, mergeBamByFactor
## Path to input targets file
targets <- system.file("extdata", "targets_chip.txt", package="systemPipeR")
## Not run:
## Write modified targets file with reference (e.g. input) sample
writeTargetsRef(infile=targets, outfile="~/targets_refsample.txt", silent=FALSE, overwrite=FALSE)
## End(Not run)