Title: | Pipeline framework for bioinformatics in R |
---|---|
Description: | pipeFrame is an R package for building a componentized bioinformatics pipeline. Each step in this pipeline is wrapped in the framework, so the connection among steps is created seamlessly and automatically. Users could focus more on fine-tuning arguments rather than spending a lot of time on transforming file format, passing task outputs to task inputs or installing the dependencies. Componentized step elements can be customized into other new pipelines flexibly as well. This pipeline can be split into several important functional steps, so it is much easier for users to understand the complex arguments from each step rather than parameter combination from the whole pipeline. At the same time, componentized pipeline can restart at the breakpoint and avoid rerunning the whole pipeline, which may save a lot of time for users on pipeline tuning or such issues as power off or process other interrupts. |
Authors: | Zheng Wei, Shining Ma |
Maintainer: | Zheng Wei <[email protected]> |
License: | GPL-3 |
Version: | 1.23.0 |
Built: | 2024-12-30 03:21:31 UTC |
Source: | https://github.com/bioc/pipeFrame |
Obtain all of objects in the specific pipeline
getObjsInPipe(pipeName = "pipe")
getObjsInPipe(pipeName = "pipe")
pipeName |
|
List
scalar. A list containing all objects that belongs to the pipe name.
getObjsInPipe("pipe")
getObjsInPipe("pipe")
The step relations are managed and restricted to directed acyclic graph. The direction of data flow is from upstream to downstream. So when users create a new step object, restricting its relation with existing steps is necessary.
addEdges(edges, argOrder) getPrevSteps(stepType, argOrder) getAttachedStep(stepType) regAttachedStep(newStepType, stepType) getNextSteps(stepType, argOrder) printMap(stepType = NULL, display = TRUE, ...)
addEdges(edges, argOrder) getPrevSteps(stepType, argOrder) getAttachedStep(stepType) regAttachedStep(newStepType, stepType) getNextSteps(stepType, argOrder) printMap(stepType = NULL, display = TRUE, ...)
edges |
|
argOrder |
|
stepType |
|
newStepType |
|
display |
|
... |
Additional arguments, currently used. |
addEdges |
No value will be returned. |
getPrevSteps |
Names of previous steps |
getAttachedStep |
get the step that is generated from |
regAttachedStep |
Add different step type for exist step |
getNextSteps |
Names of next steps |
printMap |
Print the flow map for the pipeline. |
addEdges(edges = c("RandomRegionOnGenome", "OverlappedRandomRegion"),argOrder = 1) printMap() getPrevSteps("OverlappedRandomRegion",1)
addEdges(edges = c("RandomRegionOnGenome", "OverlappedRandomRegion"),argOrder = 1) printMap() getPrevSteps("OverlappedRandomRegion",1)
ignore checking input and output file (for developer)
ignoreCheck(ignore = TRUE)
ignoreCheck(ignore = TRUE)
ignore |
|
ignoreCheck |
No value will be returned |
ignoreCheck(FALSE)
ignoreCheck(FALSE)
This function should be called first in R terminal for general users. And it should be used in .onLoad() function for package developers. In this function, several parameters need to be defined and configured, including genome, job name, reference directory, temporary directory, check and install function, threads number, reference list, etc.
initPipeFrame( defaultJobName, availableGenome = c("hg19", "hg38", "mm9", "mm10", "danRer10", "galGal5", "galGal4", "rheMac3", "rheMac8", "panTro4", "rn5", "rn6", "sacCer2", "sacCer3", "susScr3", "testgenome"), defaultCheckAndInstallFunc = NULL, defaultThreads = 2, defaultTmpDir = getwd(), defaultRefDir = file.path(getwd(), "refdir"), defaultReference = list(test = list(file = "fileName", rc = "obj")) )
initPipeFrame( defaultJobName, availableGenome = c("hg19", "hg38", "mm9", "mm10", "danRer10", "galGal5", "galGal4", "rheMac3", "rheMac8", "panTro4", "rn5", "rn6", "sacCer2", "sacCer3", "susScr3", "testgenome"), defaultCheckAndInstallFunc = NULL, defaultThreads = 2, defaultTmpDir = getwd(), defaultRefDir = file.path(getwd(), "refdir"), defaultReference = list(test = list(file = "fileName", rc = "obj")) )
defaultJobName |
|
availableGenome |
|
defaultCheckAndInstallFunc |
|
defaultThreads |
|
defaultTmpDir |
|
defaultRefDir |
|
defaultReference |
|
No value will be returned.
initPipeFrame(availableGenome = c("hg19", "hg38", "mm9", "mm10"), defaultJobName = paste0("pkgname","-pipeline") )
initPipeFrame(availableGenome = c("hg19", "hg38", "mm9", "mm10"), defaultJobName = paste0("pkgname","-pipeline") )
load configure from file
loadConfig(configFile) saveConfig(configFile) configRegName()
loadConfig(configFile) saveConfig(configFile) configRegName()
configFile |
|
loadConfig |
No value will be returned |
saveConfig |
save the configuration into a RDS file |
configRegName |
charactor vector, registered configuration name |
configRegName() saveConfig("test.rds") loadConfig("test.rds")
configRegName() saveConfig("test.rds") loadConfig("test.rds")
load PipeFrame Step (or its inherit class) object from rds file
loadStep(rdsfile, regClass = TRUE)
loadStep(rdsfile, regClass = TRUE)
rdsfile |
|
regClass |
|
Step (or its inherit class) object
objrds <- system.file(package = "pipeFrame", "extdata","pipeFrame.obj.rds") obj <- loadStep(objrds)
objrds <- system.file(package = "pipeFrame", "extdata","pipeFrame.obj.rds") obj <- loadStep(objrds)
Install dependent data or software with finishing check
runWithFinishCheck(func, refName, refFilePath = NULL, genome = NULL) checkAndInstallBSgenome(refFilePath, genome = getGenome()) checkAndInstallOrgDb(refFilePath, genome = getGenome()) checkAndInstallTxDb(refFilePath, genome = getGenome()) checkAndInstallGenomeFa(refFilePath)
runWithFinishCheck(func, refName, refFilePath = NULL, genome = NULL) checkAndInstallBSgenome(refFilePath, genome = getGenome()) checkAndInstallOrgDb(refFilePath, genome = getGenome()) checkAndInstallTxDb(refFilePath, genome = getGenome()) checkAndInstallGenomeFa(refFilePath)
func |
|
refName |
|
refFilePath |
|
genome |
|
runWithFinishCheck |
No value will be returned |
checkAndInstallBSgenome |
check if there is the BSgenome package installed for curent genome and install it if not. No value will be returned. |
checkAndInstallOrgDb |
check if there is the OrgDb package installed for curent genome and install it if not. No value will be returned. |
checkAndInstallTxDb |
check if there is the TxDb package installed for curent genome and install it if not. Nothing will be returned. |
checkAndInstallGenomeFa |
check if genome FASTA file exist and install if not. No value will be returned |
checkAndInstall <- function(){ runWithFinishCheck(func = checkAndInstallBSgenome,refName = "bsgenome") runWithFinishCheck(func = checkAndInstallGenomeFa,refName = "fasta", refFilePath = paste0(getGenome(),".fa")) } initPipeFrame(availableGenome = c("hg19", "hg38","mm9","mm10","testgenome"), defaultJobName = paste0("pkgname","-pipeline") ) setGenome("hg19")
checkAndInstall <- function(){ runWithFinishCheck(func = checkAndInstallBSgenome,refName = "bsgenome") runWithFinishCheck(func = checkAndInstallGenomeFa,refName = "fasta", refFilePath = paste0(getGenome(),".fa")) } initPipeFrame(availableGenome = c("hg19", "hg38","mm9","mm10","testgenome"), defaultJobName = paste0("pkgname","-pipeline") ) setGenome("hg19")
Configure the reference genome assembly for all steps.
getValidGenome() setGenome(genome) getGenome()
getValidGenome() setGenome(genome) getGenome()
genome |
|
getValidGenome |
|
setGenome |
All packages and dependencies are configured and installed. No value will be returned. |
getGenome |
|
getValidGenome() setGenome("hg19") getGenome()
getValidGenome() setGenome("hg19") getGenome()
Configure the job name for following steps.
setJobName(jobName) getJobName() getJobDir()
setJobName(jobName) getJobName() getJobDir()
jobName |
|
setJobName |
No value will be returned |
getJobName |
Set a job name for following steps. |
getJobDir |
get the job directory |
setJobName("testJobName") getJobName() getJobDir()
setJobName("testJobName") getJobName() getJobDir()
Configure the pipe name for following steps.
setPipeName(pipeName) getPipeName(all = FALSE)
setPipeName(pipeName) getPipeName(all = FALSE)
pipeName |
|
all |
|
setPipeName |
No value will be returned |
getPipeName |
Set a pipeline name for following steps. |
setPipeName("pipe") getPipeName()
setPipeName("pipe") getPipeName()
Set the reference directory
setRefDir(refdir, createDir = TRUE) getRefDir() getRef(refName) getRefFiles(refName) getRefRc(refName)
setRefDir(refdir, createDir = TRUE) getRefDir() getRef(refName) getRefFiles(refName) getRefRc(refName)
refdir |
|
createDir |
|
refName |
|
setRefDir |
No value will be returned |
getRefDir |
|
getRef |
|
getRefFiles |
|
getRefRc |
|
setRefDir("./refdir") getRefDir() getRef("test") getRefFiles("test") getRefRc("test")
setRefDir("./refdir") getRefDir() getRef("test") getRefFiles("test") getRefRc("test")
Configure the maximum number of threads for all steps
setThreads(threads = detectCores()) getThreads()
setThreads(threads = detectCores()) getThreads()
threads |
|
setThreads |
No value will be returned |
getThreads |
|
setThreads() getThreads()
setThreads() getThreads()
Configure the directory for intermediate results of all steps
setTmpDir(tmpDir = getwd()) getTmpDir()
setTmpDir(tmpDir = getwd()) getTmpDir()
tmpDir |
|
setTmpDir |
No value will be returned |
getTmpDir |
|
setTmpDir() getTmpDir()
setTmpDir() getTmpDir()
Users can call Step object operation methods below to obtain information in objects.
## S4 method for signature 'Step' init(.Object, prevSteps = list(), ...) ## S4 method for signature 'Step' stepName(.Object, ...) ## S4 method for signature 'Step' stepType(.Object, attachedTypes = TRUE, ...) ## S4 method for signature 'Step' pipeName(.Object, ...) ## S4 method for signature 'Step' input(.Object) ## S4 replacement method for signature 'Step' input(.Object) <- value ## S4 method for signature 'Step' output(.Object) ## S4 replacement method for signature 'Step' output(.Object) <- value ## S4 method for signature 'Step' param(.Object) ## S4 replacement method for signature 'Step' param(.Object) <- value ## S4 method for signature 'Step' property(.Object, ..., pipeName = NULL) ## S4 replacement method for signature 'Step' property(.Object, pipeName = NULL) <- value ## S4 method for signature 'Step' report(.Object) ## S4 replacement method for signature 'Step' report(.Object) <- value ## S4 method for signature 'Step' argv(.Object) ## S4 method for signature 'Step' x$name ## S4 replacement method for signature 'Step' x$name <- value ## S4 method for signature 'Step' getParam(.Object, item, type = c("input", "output", "other"), ...) ## S4 method for signature 'Step' getParamItems(.Object, type = c("input", "output", "other"), ...) ## S4 method for signature 'Step' isReady(.Object, ...) ## S4 method for signature 'Step' clearStepCache(.Object, ...) ## S4 method for signature 'Step' getAutoPath(.Object, originPath, regexSuffixName, suffix, ...) ## S4 method for signature 'Step' checkRequireParam(.Object, ...) ## S4 method for signature 'Step' checkAllPath(.Object, ...) ## S4 method for signature 'Step' getParamMD5Path(.Object, ...) ## S4 method for signature 'Step' getStepWorkDir(.Object, filename = NULL, ...) ## S4 method for signature 'Step' stepID(.Object, ...) ## S4 method for signature 'Step' writeLog( .Object, msg, ..., isWarnning = FALSE, appendLog = TRUE, showMsg = TRUE ) processing(.Object, ...) genReport(.Object, ...)
## S4 method for signature 'Step' init(.Object, prevSteps = list(), ...) ## S4 method for signature 'Step' stepName(.Object, ...) ## S4 method for signature 'Step' stepType(.Object, attachedTypes = TRUE, ...) ## S4 method for signature 'Step' pipeName(.Object, ...) ## S4 method for signature 'Step' input(.Object) ## S4 replacement method for signature 'Step' input(.Object) <- value ## S4 method for signature 'Step' output(.Object) ## S4 replacement method for signature 'Step' output(.Object) <- value ## S4 method for signature 'Step' param(.Object) ## S4 replacement method for signature 'Step' param(.Object) <- value ## S4 method for signature 'Step' property(.Object, ..., pipeName = NULL) ## S4 replacement method for signature 'Step' property(.Object, pipeName = NULL) <- value ## S4 method for signature 'Step' report(.Object) ## S4 replacement method for signature 'Step' report(.Object) <- value ## S4 method for signature 'Step' argv(.Object) ## S4 method for signature 'Step' x$name ## S4 replacement method for signature 'Step' x$name <- value ## S4 method for signature 'Step' getParam(.Object, item, type = c("input", "output", "other"), ...) ## S4 method for signature 'Step' getParamItems(.Object, type = c("input", "output", "other"), ...) ## S4 method for signature 'Step' isReady(.Object, ...) ## S4 method for signature 'Step' clearStepCache(.Object, ...) ## S4 method for signature 'Step' getAutoPath(.Object, originPath, regexSuffixName, suffix, ...) ## S4 method for signature 'Step' checkRequireParam(.Object, ...) ## S4 method for signature 'Step' checkAllPath(.Object, ...) ## S4 method for signature 'Step' getParamMD5Path(.Object, ...) ## S4 method for signature 'Step' getStepWorkDir(.Object, filename = NULL, ...) ## S4 method for signature 'Step' stepID(.Object, ...) ## S4 method for signature 'Step' writeLog( .Object, msg, ..., isWarnning = FALSE, appendLog = TRUE, showMsg = TRUE ) processing(.Object, ...) genReport(.Object, ...)
.Object |
|
prevSteps |
|
... |
Additional arguments, currently unused. |
attachedTypes |
|
value |
any type scalar. The value to be set for corresponding item in a list. |
pipeName |
|
x |
|
name |
|
item |
|
type |
|
originPath |
|
regexSuffixName |
|
suffix |
|
filename |
|
msg |
|
isWarnning |
|
appendLog |
|
showMsg |
|
Step
is a S4 class for generating Step S4 objects.
All Step objects generated by child classes inherit from Step.
To generate new Step objects,
a function wrapper with fixed arguments needs to be implemented.
Users use this function to generate new Step functions rather
than Step S4 class to generate objects.
the function and result of functions:
init |
(For package developer only) A Step child class object with initialized input, output and other parameters |
stepName |
get Step object Character name |
stepType |
get Step object Character type name (class name) |
pipeName |
get Step object pipe name |
input |
get input list |
input<- |
set input list |
output |
get output list |
output<- |
set output list |
param |
get other parameters list |
param<- |
set other parameters list |
property |
get property list |
property<- |
set property list |
report |
get report list |
report<- |
set report list |
argv |
get arguments list |
$ |
get inputList, outputList, paramList, allList, propList or any item value in inputList, outputList or paramList |
$<- |
set inputList, outputList, paramList, allList, propList or any item value in inputList, outputList or paramList |
getParam |
Get parameter value set by process function.
See |
getParamItems |
Get parameter name list |
isReady |
Is the process ready for downstream process |
clearStepCache |
Clear cache of Step object |
getAutoPath |
(For package developer) Developer can use this method to generate new file name based on exist input file name |
checkRequireParam |
(For package developer) Check required inputs or parameters are filled. |
checkRequireParam |
(For package developer) Check required inputs are filled. |
getParamMD5Path |
The Step object storage directory |
getStepWorkDir |
Get the step work directory of this object |
stepID |
Get the step ID |
writeLog |
(For package developer) write log. |
processing |
(For package developer) Run pipeline step |
genReport |
(For package developer) Generate report list |
Zheng Wei
library(BSgenome) library(rtracklayer) library(magrittr) # generate new Step : RandomRegionOnGenome setClass(Class = "RandomRegionOnGenome", contains = "Step" ) setMethod( f = "init", signature = "RandomRegionOnGenome", definition = function(.Object,prevSteps = list(),...){ # All arguments in function randomRegionOnGenome # will be passed from "..." # so get the arguments from "..." first. allparam <- list(...) sampleNumb <- allparam[["sampleNumb"]] regionLen <- allparam[["regionLen"]] genome <- allparam[["genome"]] outputBed <- allparam[["outputBed"]] # no previous steps for this step so ingnore the "prevSteps" # begin to set input parameters # no input for this step # begin to set output parameters if(is.null(outputBed)){ output(.Object)$outputBed <- getStepWorkDir(.Object,"random.bed") }else{ output(.Object)$outputBed <- outputBed } # begin to set other parameters param(.Object)$regionLen <- regionLen param(.Object)$sampleNumb <- sampleNumb if(is.null(genome)){ param(.Object)$bsgenome <- getBSgenome(getGenome()) }else{ param(.Object)$bsgenome <- getBSgenome(genome) } # don't forget to return .Object .Object } ) setMethod( f = "processing", signature = "RandomRegionOnGenome", definition = function(.Object,...){ # All arguments are set in .Object # so we can get them from .Object allparam <- list(...) sampleNumb <- getParam(.Object,"sampleNumb") regionLen <- getParam(.Object,"regionLen") bsgenome <- getParam(.Object,"bsgenome") outputBed <- getParam(.Object,"outputBed") # begin the calculation chrlens <-seqlengths(bsgenome) selchr <- grep("_|M",names(chrlens),invert=TRUE) chrlens <- chrlens[selchr] startchrlens <- chrlens - regionLen spchrs <- sample(x = names(startchrlens), size = sampleNumb, replace = TRUE, prob = startchrlens / sum(startchrlens)) gr <- GRanges() for(chr in names(startchrlens)){ startpt <- sample(x = 1:startchrlens[chr], size = sum(spchrs == chr),replace = FALSE) gr <- c(gr,GRanges(seqnames = chr, ranges = IRanges(start = startpt, width = 1000))) } result <- sort(gr,ignore.strand=TRUE) rtracklayer::export.bed(object = result, con = outputBed) # don't forget to return .Object .Object } ) setMethod( f = "genReport", signature = "RandomRegionOnGenome", definition = function(.Object, ...){ .Object } ) # This function is exported in NAMESPACE for user to use randomRegionOnGenome <- function(sampleNumb, regionLen = 1000, genome = NULL, outputBed = NULL, ...){ allpara <- c(list(Class = "RandomRegionOnGenome", prevSteps = list()), as.list(environment()),list(...)) step <- do.call(new,allpara) invisible(step) } # generate another new Step : OverlappedRandomRegion setClass(Class = "OverlappedRandomRegion", contains = "Step" ) setMethod( f = "init", signature = "OverlappedRandomRegion", definition = function(.Object,prevSteps = list(),...){ # All arguments in function overlappedRandomRegion and # runOerlappedRandomRegion will be passed from "..." # so get the arguments from "..." first. allparam <- list(...) inputBed <- allparam[["inputBed"]] randomBed <- allparam[["randomBed"]] outputBed <- allparam[["outputBed"]] # inputBed can obtain from previous step object when running # runOerlappedRandomRegion if(length(prevSteps)>0){ prevStep <- prevSteps[[1]] input(.Object)$randomBed <- getParam(prevStep,"outputBed") } # begin to set input parameters if(!is.null(inputBed)){ input(.Object)$inputBed <- inputBed } if(!is.null(randomBed)){ input(.Object)$randomBed <- randomBed } # begin to set output parameters # the output is recemended to set under the step work directory if(!is.null(outputBed)){ output(.Object)$outputBed <- outputBed }else{ output(.Object)$outputBed <- getAutoPath(.Object, getParam(.Object, "inputBed"), "bed", suffix = "bed") # the path can also be generate in this way # ib <- getParam(.Object,"inputBed") # output(.Object)$outputBed <- # file.path(getStepWorkDir(.Object), # paste0(substring(ib,1,nchar(ib)-3), "bed")) } # begin to set other parameters # no other parameters # don't forget to return .Object .Object } ) setMethod( f = "processing", signature = "OverlappedRandomRegion", definition = function(.Object,...){ # All arguments are set in .Object # so we can get them from .Object allparam <- list(...) inputBed <- getParam(.Object,"inputBed") randomBed <- getParam(.Object,"randomBed") outputBed <- getParam(.Object,"outputBed") # begin the calculation gr1 <- import.bed(con = inputBed) gr2 <- import.bed(con = randomBed) gr <- second(findOverlapPairs(gr1,gr2)) export.bed(gr,con = outputBed) # don't forget to return .Object .Object } ) setMethod( f = "genReport", signature = "OverlappedRandomRegion", definition = function(.Object, ...){ .Object } ) # This function is exported in NAMESPACE for user to use overlappedRandomRegion <- function(inputBed, randomBed, outputBed = NULL, ...){ allpara <- c(list(Class = "OverlappedRandomRegion", prevSteps = list()),as.list(environment()),list(...)) step <- do.call(new,allpara) invisible(step) } setGeneric("runOverlappedRandomRegion", function(prevStep, inputBed, randomBed = NULL, outputBed = NULL, ...) standardGeneric("runOverlappedRandomRegion")) setMethod( f = "runOverlappedRandomRegion", signature = "Step", definition = function(prevStep, inputBed, randomBed = NULL, outputBed = NULL, ...){ allpara <- c(list(Class = "OverlappedRandomRegion", prevSteps = list(prevStep)),as.list(environment()),list(...)) step <- do.call(new,allpara) invisible(step) } ) # add to graph addEdges(edges = c("RandomRegionOnGenome","OverlappedRandomRegion"), argOrder = 1) # begin to test pipeline setGenome("hg19") # generate test BED file test_bed <- file.path(tempdir(),"test.bed") library(rtracklayer) export.bed(GRanges("chr7:1-127473000"),test_bed) rd <- randomRegionOnGenome(10000) overlap <- runOverlappedRandomRegion(rd, inputBed = test_bed) randombed <- getParam(rd,"outputBed") randombed overlap1 <- overlappedRandomRegion(inputBed = test_bed, randomBed = randombed) clearStepCache(overlap1) overlap1 <- overlappedRandomRegion(inputBed = test_bed, randomBed = randombed) clearStepCache(rd) clearStepCache(overlap1) rd <- randomRegionOnGenome(10000) %>% runOverlappedRandomRegion(inputBed = test_bed) stepName(rd) stepID(rd) isReady(rd)
library(BSgenome) library(rtracklayer) library(magrittr) # generate new Step : RandomRegionOnGenome setClass(Class = "RandomRegionOnGenome", contains = "Step" ) setMethod( f = "init", signature = "RandomRegionOnGenome", definition = function(.Object,prevSteps = list(),...){ # All arguments in function randomRegionOnGenome # will be passed from "..." # so get the arguments from "..." first. allparam <- list(...) sampleNumb <- allparam[["sampleNumb"]] regionLen <- allparam[["regionLen"]] genome <- allparam[["genome"]] outputBed <- allparam[["outputBed"]] # no previous steps for this step so ingnore the "prevSteps" # begin to set input parameters # no input for this step # begin to set output parameters if(is.null(outputBed)){ output(.Object)$outputBed <- getStepWorkDir(.Object,"random.bed") }else{ output(.Object)$outputBed <- outputBed } # begin to set other parameters param(.Object)$regionLen <- regionLen param(.Object)$sampleNumb <- sampleNumb if(is.null(genome)){ param(.Object)$bsgenome <- getBSgenome(getGenome()) }else{ param(.Object)$bsgenome <- getBSgenome(genome) } # don't forget to return .Object .Object } ) setMethod( f = "processing", signature = "RandomRegionOnGenome", definition = function(.Object,...){ # All arguments are set in .Object # so we can get them from .Object allparam <- list(...) sampleNumb <- getParam(.Object,"sampleNumb") regionLen <- getParam(.Object,"regionLen") bsgenome <- getParam(.Object,"bsgenome") outputBed <- getParam(.Object,"outputBed") # begin the calculation chrlens <-seqlengths(bsgenome) selchr <- grep("_|M",names(chrlens),invert=TRUE) chrlens <- chrlens[selchr] startchrlens <- chrlens - regionLen spchrs <- sample(x = names(startchrlens), size = sampleNumb, replace = TRUE, prob = startchrlens / sum(startchrlens)) gr <- GRanges() for(chr in names(startchrlens)){ startpt <- sample(x = 1:startchrlens[chr], size = sum(spchrs == chr),replace = FALSE) gr <- c(gr,GRanges(seqnames = chr, ranges = IRanges(start = startpt, width = 1000))) } result <- sort(gr,ignore.strand=TRUE) rtracklayer::export.bed(object = result, con = outputBed) # don't forget to return .Object .Object } ) setMethod( f = "genReport", signature = "RandomRegionOnGenome", definition = function(.Object, ...){ .Object } ) # This function is exported in NAMESPACE for user to use randomRegionOnGenome <- function(sampleNumb, regionLen = 1000, genome = NULL, outputBed = NULL, ...){ allpara <- c(list(Class = "RandomRegionOnGenome", prevSteps = list()), as.list(environment()),list(...)) step <- do.call(new,allpara) invisible(step) } # generate another new Step : OverlappedRandomRegion setClass(Class = "OverlappedRandomRegion", contains = "Step" ) setMethod( f = "init", signature = "OverlappedRandomRegion", definition = function(.Object,prevSteps = list(),...){ # All arguments in function overlappedRandomRegion and # runOerlappedRandomRegion will be passed from "..." # so get the arguments from "..." first. allparam <- list(...) inputBed <- allparam[["inputBed"]] randomBed <- allparam[["randomBed"]] outputBed <- allparam[["outputBed"]] # inputBed can obtain from previous step object when running # runOerlappedRandomRegion if(length(prevSteps)>0){ prevStep <- prevSteps[[1]] input(.Object)$randomBed <- getParam(prevStep,"outputBed") } # begin to set input parameters if(!is.null(inputBed)){ input(.Object)$inputBed <- inputBed } if(!is.null(randomBed)){ input(.Object)$randomBed <- randomBed } # begin to set output parameters # the output is recemended to set under the step work directory if(!is.null(outputBed)){ output(.Object)$outputBed <- outputBed }else{ output(.Object)$outputBed <- getAutoPath(.Object, getParam(.Object, "inputBed"), "bed", suffix = "bed") # the path can also be generate in this way # ib <- getParam(.Object,"inputBed") # output(.Object)$outputBed <- # file.path(getStepWorkDir(.Object), # paste0(substring(ib,1,nchar(ib)-3), "bed")) } # begin to set other parameters # no other parameters # don't forget to return .Object .Object } ) setMethod( f = "processing", signature = "OverlappedRandomRegion", definition = function(.Object,...){ # All arguments are set in .Object # so we can get them from .Object allparam <- list(...) inputBed <- getParam(.Object,"inputBed") randomBed <- getParam(.Object,"randomBed") outputBed <- getParam(.Object,"outputBed") # begin the calculation gr1 <- import.bed(con = inputBed) gr2 <- import.bed(con = randomBed) gr <- second(findOverlapPairs(gr1,gr2)) export.bed(gr,con = outputBed) # don't forget to return .Object .Object } ) setMethod( f = "genReport", signature = "OverlappedRandomRegion", definition = function(.Object, ...){ .Object } ) # This function is exported in NAMESPACE for user to use overlappedRandomRegion <- function(inputBed, randomBed, outputBed = NULL, ...){ allpara <- c(list(Class = "OverlappedRandomRegion", prevSteps = list()),as.list(environment()),list(...)) step <- do.call(new,allpara) invisible(step) } setGeneric("runOverlappedRandomRegion", function(prevStep, inputBed, randomBed = NULL, outputBed = NULL, ...) standardGeneric("runOverlappedRandomRegion")) setMethod( f = "runOverlappedRandomRegion", signature = "Step", definition = function(prevStep, inputBed, randomBed = NULL, outputBed = NULL, ...){ allpara <- c(list(Class = "OverlappedRandomRegion", prevSteps = list(prevStep)),as.list(environment()),list(...)) step <- do.call(new,allpara) invisible(step) } ) # add to graph addEdges(edges = c("RandomRegionOnGenome","OverlappedRandomRegion"), argOrder = 1) # begin to test pipeline setGenome("hg19") # generate test BED file test_bed <- file.path(tempdir(),"test.bed") library(rtracklayer) export.bed(GRanges("chr7:1-127473000"),test_bed) rd <- randomRegionOnGenome(10000) overlap <- runOverlappedRandomRegion(rd, inputBed = test_bed) randombed <- getParam(rd,"outputBed") randombed overlap1 <- overlappedRandomRegion(inputBed = test_bed, randomBed = randombed) clearStepCache(overlap1) overlap1 <- overlappedRandomRegion(inputBed = test_bed, randomBed = randombed) clearStepCache(rd) clearStepCache(overlap1) rd <- randomRegionOnGenome(10000) %>% runOverlappedRandomRegion(inputBed = test_bed) stepName(rd) stepID(rd) isReady(rd)
Functions for directory operations
getBasenamePrefix(filepath, words, ...) getPathPrefix(filepath, words, ...) checkFileExist(filePath, ...) checkPathExist(filePath, ...) checkFileCreatable(filePath, ...) addFileSuffix(filePath, suffix, ...)
getBasenamePrefix(filepath, words, ...) getPathPrefix(filepath, words, ...) checkFileExist(filePath, ...) checkPathExist(filePath, ...) checkFileCreatable(filePath, ...) addFileSuffix(filePath, suffix, ...)
filepath |
|
words |
|
... |
Additional arguments, currently unused |
filePath |
|
suffix |
|
getBasenamePrefix |
Get the filepath basename with removed suffix |
getPathPrefix |
Get the filepath with removed suffix |
checkFileExist |
(For package developer) Check file is exist. |
checkPathExist |
(For package developer) Check directory is exist. |
checkFileCreatable |
(For package developer) Check file creatable. |
addFileSuffix |
(For package developer) Check if file suffix existed and add suffix |
getBasenamePrefix("aaa/bbb.ccc.ddd","cCc") getBasenamePrefix("aaa/bbb.ccc.ddd","ddd") getPathPrefix("aaa/bbb.ccc.ddd","dDd") getPathPrefix("aaa/bbb.ccc.ddd","ccc") file.create("test.bed") checkFileExist("test.bed") tryCatch({checkFileExist("test.bed1")},error = function(e) e) dir.create("testdir") checkPathExist(file.path(getwd(),"testdir")) tryCatch({checkPathExist(file.path(dirname(getwd()), "notexistfolder","testdir"))},error = function(e) e) checkFileCreatable("aaa.bed") tryCatch({checkFileCreatable("testdir1/aaa.bed")},error = function(e) e)
getBasenamePrefix("aaa/bbb.ccc.ddd","cCc") getBasenamePrefix("aaa/bbb.ccc.ddd","ddd") getPathPrefix("aaa/bbb.ccc.ddd","dDd") getPathPrefix("aaa/bbb.ccc.ddd","ccc") file.create("test.bed") checkFileExist("test.bed") tryCatch({checkFileExist("test.bed1")},error = function(e) e) dir.create("testdir") checkPathExist(file.path(getwd(),"testdir")) tryCatch({checkPathExist(file.path(dirname(getwd()), "notexistfolder","testdir"))},error = function(e) e) checkFileCreatable("aaa.bed") tryCatch({checkFileCreatable("testdir1/aaa.bed")},error = function(e) e)