Title: | An Easy-to-use Systematic pipeline for ATACseq data analysis |
---|---|
Description: | This package provides a framework and complete preset pipeline for quantification and analysis of ATAC-seq Reads. It covers raw sequencing reads preprocessing (FASTQ files), reads alignment (Rbowtie2), aligned reads file operations (SAM, BAM, and BED files), peak calling (F-seq), genome annotations (Motif, GO, SNP analysis) and quality control report. The package is managed by dataflow graph. It is easy for user to pass variables seamlessly between processes and understand the workflow. Users can process FASTQ files through end-to-end preset pipeline which produces a pretty HTML report for quality control and preliminary statistical results, or customize workflow starting from any intermediate stages with esATAC functions easily and flexibly. |
Authors: | Zheng Wei, Wei Zhang |
Maintainer: | Zheng Wei <[email protected]> |
License: | GPL-3 | file LICENSE |
Version: | 1.29.0 |
Built: | 2024-11-29 08:46:20 UTC |
Source: | https://github.com/bioc/esATAC |
This package provides a framework and complete preset pipeline for the quantification and analysis of ATAC-seq Reads. It covers raw sequencing reads preprocessing (FASTQ files), reads alignment (Rbowtie2), aligned reads file operation (SAM, BAM, and BED files), peak calling (fseq), genome annotations (Motif, GO, SNP analysis) and quality control report. The package is managed by dataflow graph. It is easy for user to pass variables seamlessly between processes and understand the workflow. Users can process FASTQ files through end-to-end preset pipeline which produces a pretty HTML report for quality control and preliminary statistical results, or customize workflow starting from any intermediate stages with esATAC functions easily and flexibly.
Preset pipeline for single replicate case study is shown below.
For multi-replicates case study, see atacRepsPipe
.
For single replicate case-control study, see atacPipe2
.
For multi-replicates case-control study, see atacRepsPipe2
.
NOTE: Build bowtie index in the function may take some time. If you already have bowtie2 index files or you want to download(ftp://ftp.ccb.jhu.edu/pub/data/bowtie2_indexes) instead of building, you can let esATAC skip the steps by renaming them following the format (genome+suffix) and put them in reference installation path (refdir). Example: hg19 bowtie2 index files
hg19.1.bt2
hg19.2.bt2
hg19.3.bt2
hg19.4.bt2
hg19.rev.1.bt2
hg19.rev.2.bt2
For single end reads FASTQ files, The required parameters are fastqInput1 and adapter1. For paired end reads non-interleaved FASTQ files (interleave=FALSE,defualt), The required parameters are fastqInput1 and fastqInput2. Otherwise, parameter fastqInput2 is not required (interleave=TRUE)
The paths of sequencing data replicates can be a Character
vector.
For example:
fastqInput1=c("file_1.rep1.fastq","file_1.rep2.fastq")
fastqInput2=c("file_2.rep1.fastq","file_2.rep2.fastq")
The result will be return by the function. An HTML report file will be created for paired end reads. Intermediate files will be save at tmpdir path (default is ./)
atacPipe( genome, fastqInput1, fastqInput2 = NULL, tmpdir = file.path(getwd(), "esATAC-pipeline"), refdir = file.path(tmpdir, "refdir"), threads = 2, adapter1 = NULL, adapter2 = NULL, interleave = FALSE, basicAnalysis = FALSE, createReport = TRUE, motifs = NULL, pipelineName = "pipe", chr = c(1:22, "X", "Y"), p.cutoff = 1e-06, ... )
atacPipe( genome, fastqInput1, fastqInput2 = NULL, tmpdir = file.path(getwd(), "esATAC-pipeline"), refdir = file.path(tmpdir, "refdir"), threads = 2, adapter1 = NULL, adapter2 = NULL, interleave = FALSE, basicAnalysis = FALSE, createReport = TRUE, motifs = NULL, pipelineName = "pipe", chr = c(1:22, "X", "Y"), p.cutoff = 1e-06, ... )
genome |
|
fastqInput1 |
|
fastqInput2 |
|
tmpdir |
|
refdir |
|
threads |
|
adapter1 |
|
adapter2 |
|
interleave |
|
basicAnalysis |
|
createReport |
|
motifs |
either |
pipelineName |
|
chr |
Which chromatin the program will processing. It must be identical with the filename of cut site information files or subset of . Default:c(1:22, "X", "Y"). |
p.cutoff |
p-value cutoff for returning motifs, default: 1e-6. |
... |
Additional arguments, currently unused. |
See packageDescription('esATAC') for package details.
List
scalar. It is a list that save the result of the pipeline.
Slot "filelist": the input file paths.
Slot "wholesummary": a dataframe that for quality control summary
Slot "atacProcs": ATACProc-class
objects generated by each process in the pipeline.
Slot "filtstat": a dataframe that summary the reads filted in each process.
Zheng Wei and Wei Zhang
printMap
,
atacPipe2
,
atacRenamer
,
atacRemoveAdapter
,
atacBowtie2Mapping
,
atacPeakCalling
,
atacMotifScan
,
atacRepsPipe
,
atacRepsPipe2
## Not run: ## These codes are time consuming so they will not be run and ## checked by bioconductor checker. # call pipeline # for a quick example(only CTCF and BATF3 will be processing) conclusion <- atacPipe( # MODIFY: Change these paths to your own case files! # e.g. fastqInput1 = "your/own/data/path.fastq" fastqInput1 = system.file(package="esATAC", "extdata", "chr20_1.1.fq.gz"), fastqInput2 = system.file(package="esATAC", "extdata", "chr20_2.1.fq.gz"), # MODIFY: Set the genome for your data genome = "hg19", motifs = getMotifInfo(motif.file = system.file("extdata", "CustomizedMotif.txt", package="esATAC"))) # call pipeline # for overall example(all vertebrates motif in JASPAR will be processed) conclusion <- atacPipe( # MODIFY: Change these paths to your own case files! # e.g. fastqInput1 = "your/own/data/path.fastq" fastqInput1 = system.file(package="esATAC", "extdata", "chr20_1.1.fq.gz"), fastqInput2 = system.file(package="esATAC", "extdata", "chr20_2.1.fq.gz"), # MODIFY: Set the genome for your data genome = "hg19") ## End(Not run)
## Not run: ## These codes are time consuming so they will not be run and ## checked by bioconductor checker. # call pipeline # for a quick example(only CTCF and BATF3 will be processing) conclusion <- atacPipe( # MODIFY: Change these paths to your own case files! # e.g. fastqInput1 = "your/own/data/path.fastq" fastqInput1 = system.file(package="esATAC", "extdata", "chr20_1.1.fq.gz"), fastqInput2 = system.file(package="esATAC", "extdata", "chr20_2.1.fq.gz"), # MODIFY: Set the genome for your data genome = "hg19", motifs = getMotifInfo(motif.file = system.file("extdata", "CustomizedMotif.txt", package="esATAC"))) # call pipeline # for overall example(all vertebrates motif in JASPAR will be processed) conclusion <- atacPipe( # MODIFY: Change these paths to your own case files! # e.g. fastqInput1 = "your/own/data/path.fastq" fastqInput1 = system.file(package="esATAC", "extdata", "chr20_1.1.fq.gz"), fastqInput2 = system.file(package="esATAC", "extdata", "chr20_2.1.fq.gz"), # MODIFY: Set the genome for your data genome = "hg19") ## End(Not run)
The preset pipeline to process case control study sequencing data. An HTML report file, result files(e.g. BED, BAM files) and conclusion list will generated. See detail for usage.
atacPipe2( genome, case = list(fastqInput1 = "paths/To/fastq1", fastqInput2 = "paths/To/fastq2", adapter1 = NULL, adapter2 = NULL), control = list(fastqInput1 = "paths/To/fastq1", fastqInput2 = "paths/To/fastq2", adapter1 = NULL, adapter2 = NULL), tmpdir = file.path(getwd(), "esATAC-pipeline"), refdir = file.path(tmpdir, "refdir"), threads = 2, interleave = FALSE, createReport = TRUE, motifs = NULL, chr = c(1:22, "X", "Y"), p.cutoff = 1e-06, ... )
atacPipe2( genome, case = list(fastqInput1 = "paths/To/fastq1", fastqInput2 = "paths/To/fastq2", adapter1 = NULL, adapter2 = NULL), control = list(fastqInput1 = "paths/To/fastq1", fastqInput2 = "paths/To/fastq2", adapter1 = NULL, adapter2 = NULL), tmpdir = file.path(getwd(), "esATAC-pipeline"), refdir = file.path(tmpdir, "refdir"), threads = 2, interleave = FALSE, createReport = TRUE, motifs = NULL, chr = c(1:22, "X", "Y"), p.cutoff = 1e-06, ... )
genome |
|
case |
|
control |
|
tmpdir |
|
refdir |
|
threads |
|
interleave |
|
createReport |
|
motifs |
either |
chr |
Which chromatin the program will processing. It must be identical with the filename of cut site information files or subset of . Default:c(1:22, "X", "Y"). |
p.cutoff |
p-value cutoff for returning motifs, default: 1e-6. |
... |
Additional arguments, currently unused. |
NOTE: Build bowtie index in this function may take some time. If you already have bowtie2 index files or you want to download(ftp://ftp.ccb.jhu.edu/pub/data/bowtie2_indexes) instead of building, you can let esATAC skip the steps by renaming them following the format (genome+suffix) and put them in reference installation path (refdir). Example: hg19 bowtie2 index files
hg19.1.bt2
hg19.2.bt2
hg19.3.bt2
hg19.4.bt2
hg19.rev.1.bt2
hg19.rev.2.bt2
For single end reads FASTQ files, The required parameters are fastqInput1 and adapter1. For paired end reads non-interleaved FASTQ files (interleave=FALSE,defualt), The required parameters are fastqInput1 and fastqInput2. Otherwise, parameter fastqInput2 is not required (interleave=TRUE)
The paths of sequencing data replicates can be a Character
vector.
For example:
fastqInput1=c("file_1.rep1.fastq","file_1.rep2.fastq")
fastqInput2=c("file_2.rep1.fastq","file_2.rep2.fastq")
The result will be return by the function. An HTML report file will be created for paired end reads. Intermediate files will be save at tmpdir path (default is ./)
List
scalar. It is a list that save the result of the pipeline.
Slot "wholesummary": a dataframe for quality control summary of case and control data
Slot "caselist" and "ctrlist": Each of them is a list that save the result for case or control data.
Slots of "caselist" and "ctrllist":
Slot "filelist": the input file paths.
Slot "wholesummary": a dataframe for quality control summary of case or control data
Slot "atacProcs": ATACProc-class
objects generated by each process in the pipeline.
Slot "filtstat": a dataframe that summary the reads filted in each process.
Zheng Wei and Wei Zhang
## Not run: ## These codes are time consuming so they will not be run and ## checked by bioconductor checker. # call pipeline # for a quick example(only CTCF and BATF3 will be processed) conclusion <- atacPipe2( # MODIFY: Change these paths to your own case files! # e.g. fastqInput1 = "your/own/data/path.fastq" case=list(fastqInput1 = system.file(package="esATAC", "extdata", "chr20_1.1.fq.gz"), fastqInput2 = system.file(package="esATAC", "extdata", "chr20_2.1.fq.gz")), # MODIFY: Change these paths to your own control files! # e.g. fastqInput1 = "your/own/data/path.fastq" control=list(fastqInput1 = system.file(package="esATAC", "extdata", "chr20_1.2.fq.bz2"), fastqInput2 = system.file(package="esATAC", "extdata", "chr20_2.2.fq.bz2")), # MODIFY: Set the genome for your data genome = "hg19", motifs = getMotifInfo(motif.file = system.file("extdata", "CustomizedMotif.txt", package="esATAC"))) # call pipeline # for overall example(all vertebrates motif in JASPAR will be processed) conclusion <- atacPipe2( # MODIFY: Change these paths to your own case files! # e.g. fastqInput1 = "your/own/data/path.fastq" case=list(fastqInput1 = system.file(package="esATAC", "extdata", "chr20_1.1.fq.gz"), fastqInput2 = system.file(package="esATAC", "extdata", "chr20_2.1.fq.gz")), # MODIFY: Change these paths to your own control files! # e.g. fastqInput1 = "your/own/data/path.fastq" control=list(fastqInput1 = system.file(package="esATAC", "extdata", "chr20_1.2.fq.bz2"), fastqInput2 = system.file(package="esATAC", "extdata", "chr20_2.2.fq.bz2")), # MODIFY: Set the genome for your data genome = "hg19") ## End(Not run)
## Not run: ## These codes are time consuming so they will not be run and ## checked by bioconductor checker. # call pipeline # for a quick example(only CTCF and BATF3 will be processed) conclusion <- atacPipe2( # MODIFY: Change these paths to your own case files! # e.g. fastqInput1 = "your/own/data/path.fastq" case=list(fastqInput1 = system.file(package="esATAC", "extdata", "chr20_1.1.fq.gz"), fastqInput2 = system.file(package="esATAC", "extdata", "chr20_2.1.fq.gz")), # MODIFY: Change these paths to your own control files! # e.g. fastqInput1 = "your/own/data/path.fastq" control=list(fastqInput1 = system.file(package="esATAC", "extdata", "chr20_1.2.fq.bz2"), fastqInput2 = system.file(package="esATAC", "extdata", "chr20_2.2.fq.bz2")), # MODIFY: Set the genome for your data genome = "hg19", motifs = getMotifInfo(motif.file = system.file("extdata", "CustomizedMotif.txt", package="esATAC"))) # call pipeline # for overall example(all vertebrates motif in JASPAR will be processed) conclusion <- atacPipe2( # MODIFY: Change these paths to your own case files! # e.g. fastqInput1 = "your/own/data/path.fastq" case=list(fastqInput1 = system.file(package="esATAC", "extdata", "chr20_1.1.fq.gz"), fastqInput2 = system.file(package="esATAC", "extdata", "chr20_2.1.fq.gz")), # MODIFY: Change these paths to your own control files! # e.g. fastqInput1 = "your/own/data/path.fastq" control=list(fastqInput1 = system.file(package="esATAC", "extdata", "chr20_1.2.fq.bz2"), fastqInput2 = system.file(package="esATAC", "extdata", "chr20_2.2.fq.bz2")), # MODIFY: Set the genome for your data genome = "hg19") ## End(Not run)
This class is inherit from Step
in pipeFrame package,
no more method is extended or override. Please see Step
class for detail.
The preset pipeline to process multi-replicates case study sequencing data. HTML report files, result files(e.g. BED, BAM files) and conclusion list will generated. See detail for usage.
atacRepsPipe( genome, fastqInput1, fastqInput2 = NULL, refdir = NULL, tmpdir = NULL, threads = 2, adapter1 = NULL, adapter2 = NULL, interleave = FALSE, createReport = TRUE, motifs = NULL, prefix = NULL, chr = c(1:22, "X", "Y"), p.cutoff = 1e-06, ... )
atacRepsPipe( genome, fastqInput1, fastqInput2 = NULL, refdir = NULL, tmpdir = NULL, threads = 2, adapter1 = NULL, adapter2 = NULL, interleave = FALSE, createReport = TRUE, motifs = NULL, prefix = NULL, chr = c(1:22, "X", "Y"), p.cutoff = 1e-06, ... )
genome |
|
fastqInput1 |
|
fastqInput2 |
|
refdir |
|
tmpdir |
|
threads |
|
adapter1 |
|
adapter2 |
|
interleave |
|
createReport |
|
motifs |
either |
prefix |
|
chr |
Which chromatin the program will processing. It must be identical with the filename of cut site information files or subset of . Default:c(1:22, "X", "Y"). |
p.cutoff |
p-value cutoff for returning motifs, default: 1e-6. |
... |
Additional arguments, currently unused. |
List
scalar. It is a list that save the result of the pipeline.
Slot "filelist": the input file paths.
Slot "wholesummary": a dataframe that for quality control summary
Slot "atacProcs": ATACProc-class
objects generated by each process in the pipeline.
Slot "filtstat": a dataframe that summary the reads filted in each process.
Zheng Wei and Wei Zhang
printMap
,
atacPipe2
,
atacRenamer
,
atacRemoveAdapter
,
atacBowtie2Mapping
,
atacPeakCalling
,
atacMotifScan
## Not run: ## These codes are time consuming so they will not be run and ## checked by bioconductor checker. # call pipeline # for a quick example(only CTCF and BATF3 will be processing) conclusion <- atacRepsPipe( # MODIFY: Change these paths to your own case files! # e.g. fastqInput1 = "your/own/data/path.fastq" fastqInput1 = list(system.file(package="esATAC", "extdata", "chr20_1.1.fq.gz"), system.file(package="esATAC", "extdata", "chr20_1.2.fq.bz2")), fastqInput2 = list(system.file(package="esATAC", "extdata", "chr20_2.1.fq.gz"), system.file(package="esATAC", "extdata", "chr20_2.2.fq.bz2")), # MODIFY: Set the genome for your data genome = "hg19", motifs = getMotifInfo(motif.file = system.file("extdata", "CustomizedMotif.txt", package="esATAC"))) # call pipeline # for overall example(all vertebrates motif in JASPAR will be processed) conclusion <- atacRepsPipe( # MODIFY: Change these paths to your own case files! # e.g. fastqInput1 = "your/own/data/path.fastq" fastqInput1 = list(system.file(package="esATAC", "extdata", "chr20_1.1.fq.gz"), system.file(package="esATAC", "extdata", "chr20_1.2.fq.bz2")), fastqInput2 = list(system.file(package="esATAC", "extdata", "chr20_2.1.fq.gz"), system.file(package="esATAC", "extdata", "chr20_2.2.fq.bz2")), # MODIFY: Set the genome for your data genome = "hg19") ## End(Not run)
## Not run: ## These codes are time consuming so they will not be run and ## checked by bioconductor checker. # call pipeline # for a quick example(only CTCF and BATF3 will be processing) conclusion <- atacRepsPipe( # MODIFY: Change these paths to your own case files! # e.g. fastqInput1 = "your/own/data/path.fastq" fastqInput1 = list(system.file(package="esATAC", "extdata", "chr20_1.1.fq.gz"), system.file(package="esATAC", "extdata", "chr20_1.2.fq.bz2")), fastqInput2 = list(system.file(package="esATAC", "extdata", "chr20_2.1.fq.gz"), system.file(package="esATAC", "extdata", "chr20_2.2.fq.bz2")), # MODIFY: Set the genome for your data genome = "hg19", motifs = getMotifInfo(motif.file = system.file("extdata", "CustomizedMotif.txt", package="esATAC"))) # call pipeline # for overall example(all vertebrates motif in JASPAR will be processed) conclusion <- atacRepsPipe( # MODIFY: Change these paths to your own case files! # e.g. fastqInput1 = "your/own/data/path.fastq" fastqInput1 = list(system.file(package="esATAC", "extdata", "chr20_1.1.fq.gz"), system.file(package="esATAC", "extdata", "chr20_1.2.fq.bz2")), fastqInput2 = list(system.file(package="esATAC", "extdata", "chr20_2.1.fq.gz"), system.file(package="esATAC", "extdata", "chr20_2.2.fq.bz2")), # MODIFY: Set the genome for your data genome = "hg19") ## End(Not run)
The preset pipeline to process multi-replicates case control study sequencing data. HTML report files, result files(e.g. BED, BAM files) and conclusion list will generated. See detail for usage.
atacRepsPipe2( genome, caseFastqInput1, caseFastqInput2, ctrlFastqInput1, ctrlFastqInput2, caseAdapter1 = NULL, caseAdapter2 = NULL, ctrlAdapter1 = NULL, ctrlAdapter2 = NULL, refdir = NULL, tmpdir = NULL, threads = 2, interleave = FALSE, createReport = TRUE, motifs = NULL, chr = c(1:22, "X", "Y"), p.cutoff = 1e-06, ... )
atacRepsPipe2( genome, caseFastqInput1, caseFastqInput2, ctrlFastqInput1, ctrlFastqInput2, caseAdapter1 = NULL, caseAdapter2 = NULL, ctrlAdapter1 = NULL, ctrlAdapter2 = NULL, refdir = NULL, tmpdir = NULL, threads = 2, interleave = FALSE, createReport = TRUE, motifs = NULL, chr = c(1:22, "X", "Y"), p.cutoff = 1e-06, ... )
genome |
|
caseFastqInput1 |
|
caseFastqInput2 |
|
ctrlFastqInput1 |
|
ctrlFastqInput2 |
|
caseAdapter1 |
|
caseAdapter2 |
|
ctrlAdapter1 |
|
ctrlAdapter2 |
|
refdir |
|
tmpdir |
|
threads |
|
interleave |
|
createReport |
|
motifs |
either |
chr |
Which chromatin the program will processing. It must be identical with the filename of cut site information files or subset of . Default:c(1:22, "X", "Y"). |
p.cutoff |
p-value cutoff for returning motifs, default: 1e-6. |
... |
Additional arguments, currently unused. |
NOTE: Build bowtie index in this function may take some time. If you already have bowtie2 index files or you want to download(ftp://ftp.ccb.jhu.edu/pub/data/bowtie2_indexes) instead of building, you can let esATAC skip the steps by renaming them following the format (genome+suffix) and put them in reference installation path (refdir). Example: hg19 bowtie2 index files
hg19.1.bt2
hg19.2.bt2
hg19.3.bt2
hg19.4.bt2
hg19.rev.1.bt2
hg19.rev.2.bt2
For single end reads FASTQ files, The required parameters are fastqInput1 and adapter1. For paired end reads non-interleaved FASTQ files (interleave=FALSE,defualt), The required parameters are fastqInput1 and fastqInput2. Otherwise, parameter fastqInput2 is not required (interleave=TRUE)
The paths of sequencing data replicates can be a Character
vector.
For example:
fastqInput1=c("file_1.rep1.fastq","file_1.rep2.fastq")
fastqInput2=c("file_2.rep1.fastq","file_2.rep2.fastq")
The result will be return by the function. An HTML report file will be created for paired end reads. Intermediate files will be save at tmpdir path (default is ./)
List
scalar. It is a list that save the result of the pipeline.
Slot "caselist" and "ctrlist": Each of them is a list that save the result for case or control data.
Slot "comp_result": compare analysis result for case and control data
Zheng Wei and Wei Zhang
## Not run: ## These codes are time consuming so they will not be run and ## checked by bioconductor checker. # call pipeline # for a quick example(only CTCF will be processed) conclusion <- atacRepsPipe2( # MODIFY: Change these paths to your own case files! # e.g. fastqInput1 = "your/own/data/path.fastq" caseFastqInput1=list(system.file(package="esATAC", "extdata", "chr20_1.1.fq.gz"), system.file(package="esATAC", "extdata", "chr20_1.1.fq.gz")), # MODIFY: Change these paths to your own case files! # e.g. fastqInput1 = "your/own/data/path.fastq" caseFastqInput2=list(system.file(package="esATAC", "extdata", "chr20_2.1.fq.gz"), system.file(package="esATAC", "extdata", "chr20_2.1.fq.gz")), # MODIFY: Change these paths to your own control files! # e.g. fastqInput1 = "your/own/data/path.fastq" ctrlFastqInput1=list(system.file(package="esATAC", "extdata", "chr20_1.2.fq.bz2"), system.file(package="esATAC", "extdata", "chr20_1.2.fq.bz2")), # MODIFY: Change these paths to your own control files! # e.g. fastqInput1 = "your/own/data/path.fastq" ctrlFastqInput2=list(system.file(package="esATAC", "extdata", "chr20_2.2.fq.bz2"), system.file(package="esATAC", "extdata", "chr20_2.2.fq.bz2")), # MODIFY: Set the genome for your data genome = "hg19", motifs = getMotifInfo(motif.file = system.file("extdata", "CustomizedMotif.txt", package="esATAC"))) # call pipeline # for overall example(all human motif in JASPAR will be processed) conclusion <- atacRepsPipe2( # MODIFY: Change these paths to your own case files! # e.g. fastqInput1 = "your/own/data/path.fastq" caseFastqInput1=list(system.file(package="esATAC", "extdata", "chr20_1.1.fq.gz"), system.file(package="esATAC", "extdata", "chr20_1.1.fq.gz")), # MODIFY: Change these paths to your own case files! # e.g. fastqInput1 = "your/own/data/path.fastq" caseFastqInput2=list(system.file(package="esATAC", "extdata", "chr20_2.1.fq.gz"), system.file(package="esATAC", "extdata", "chr20_2.1.fq.gz")), # MODIFY: Change these paths to your own control files! # e.g. fastqInput1 = "your/own/data/path.fastq" ctrlFastqInput1=list(system.file(package="esATAC", "extdata", "chr20_1.2.fq.bz2"), system.file(package="esATAC", "extdata", "chr20_1.2.fq.bz2")), # MODIFY: Change these paths to your own control files! # e.g. fastqInput1 = "your/own/data/path.fastq" ctrlFastqInput2=list(system.file(package="esATAC", "extdata", "chr20_2.2.fq.bz2"), system.file(package="esATAC", "extdata", "chr20_2.2.fq.bz2")), # MODIFY: Set the genome for your data genome = "hg19" ) ## End(Not run)
## Not run: ## These codes are time consuming so they will not be run and ## checked by bioconductor checker. # call pipeline # for a quick example(only CTCF will be processed) conclusion <- atacRepsPipe2( # MODIFY: Change these paths to your own case files! # e.g. fastqInput1 = "your/own/data/path.fastq" caseFastqInput1=list(system.file(package="esATAC", "extdata", "chr20_1.1.fq.gz"), system.file(package="esATAC", "extdata", "chr20_1.1.fq.gz")), # MODIFY: Change these paths to your own case files! # e.g. fastqInput1 = "your/own/data/path.fastq" caseFastqInput2=list(system.file(package="esATAC", "extdata", "chr20_2.1.fq.gz"), system.file(package="esATAC", "extdata", "chr20_2.1.fq.gz")), # MODIFY: Change these paths to your own control files! # e.g. fastqInput1 = "your/own/data/path.fastq" ctrlFastqInput1=list(system.file(package="esATAC", "extdata", "chr20_1.2.fq.bz2"), system.file(package="esATAC", "extdata", "chr20_1.2.fq.bz2")), # MODIFY: Change these paths to your own control files! # e.g. fastqInput1 = "your/own/data/path.fastq" ctrlFastqInput2=list(system.file(package="esATAC", "extdata", "chr20_2.2.fq.bz2"), system.file(package="esATAC", "extdata", "chr20_2.2.fq.bz2")), # MODIFY: Set the genome for your data genome = "hg19", motifs = getMotifInfo(motif.file = system.file("extdata", "CustomizedMotif.txt", package="esATAC"))) # call pipeline # for overall example(all human motif in JASPAR will be processed) conclusion <- atacRepsPipe2( # MODIFY: Change these paths to your own case files! # e.g. fastqInput1 = "your/own/data/path.fastq" caseFastqInput1=list(system.file(package="esATAC", "extdata", "chr20_1.1.fq.gz"), system.file(package="esATAC", "extdata", "chr20_1.1.fq.gz")), # MODIFY: Change these paths to your own case files! # e.g. fastqInput1 = "your/own/data/path.fastq" caseFastqInput2=list(system.file(package="esATAC", "extdata", "chr20_2.1.fq.gz"), system.file(package="esATAC", "extdata", "chr20_2.1.fq.gz")), # MODIFY: Change these paths to your own control files! # e.g. fastqInput1 = "your/own/data/path.fastq" ctrlFastqInput1=list(system.file(package="esATAC", "extdata", "chr20_1.2.fq.bz2"), system.file(package="esATAC", "extdata", "chr20_1.2.fq.bz2")), # MODIFY: Change these paths to your own control files! # e.g. fastqInput1 = "your/own/data/path.fastq" ctrlFastqInput2=list(system.file(package="esATAC", "extdata", "chr20_2.2.fq.bz2"), system.file(package="esATAC", "extdata", "chr20_2.2.fq.bz2")), # MODIFY: Set the genome for your data genome = "hg19" ) ## End(Not run)
This function is used to convert SAM file to BED file and merge interleave paired end reads, shift reads, filter reads according to chromosome, filter reads according to fragment size, sort, remove duplicates reads before generating BED file.
atacBam2Bed( atacProc, bamInput = NULL, bedOutput = NULL, reportOutput = NULL, bsgenome = NULL, mergePairIntoFrag = c("auto", "yes", "no"), posOffset = +4, negOffset = -5, chrFilterList = "chrM|_", sortBed = TRUE, rmMultiMap = TRUE, minFragLen = 0, maxFragLen = 2000, saveExtLen = FALSE, uniqueBed = c("auto", "yes", "no"), ... ) ## S4 method for signature 'ATACProc' atacBam2Bed( atacProc, bamInput = NULL, bedOutput = NULL, reportOutput = NULL, bsgenome = NULL, mergePairIntoFrag = c("auto", "yes", "no"), posOffset = +4, negOffset = -5, chrFilterList = "chrM|_", sortBed = TRUE, rmMultiMap = TRUE, minFragLen = 0, maxFragLen = 2000, saveExtLen = FALSE, uniqueBed = c("auto", "yes", "no"), ... ) bam2bed( bamInput, bedOutput = NULL, reportOutput = NULL, bsgenome = NULL, mergePairIntoFrag = c("auto", "yes", "no"), posOffset = +4, negOffset = -5, chrFilterList = "chrM|_", sortBed = TRUE, rmMultiMap = TRUE, minFragLen = 0, maxFragLen = 2000, saveExtLen = FALSE, uniqueBed = c("auto", "yes", "no"), ... )
atacBam2Bed( atacProc, bamInput = NULL, bedOutput = NULL, reportOutput = NULL, bsgenome = NULL, mergePairIntoFrag = c("auto", "yes", "no"), posOffset = +4, negOffset = -5, chrFilterList = "chrM|_", sortBed = TRUE, rmMultiMap = TRUE, minFragLen = 0, maxFragLen = 2000, saveExtLen = FALSE, uniqueBed = c("auto", "yes", "no"), ... ) ## S4 method for signature 'ATACProc' atacBam2Bed( atacProc, bamInput = NULL, bedOutput = NULL, reportOutput = NULL, bsgenome = NULL, mergePairIntoFrag = c("auto", "yes", "no"), posOffset = +4, negOffset = -5, chrFilterList = "chrM|_", sortBed = TRUE, rmMultiMap = TRUE, minFragLen = 0, maxFragLen = 2000, saveExtLen = FALSE, uniqueBed = c("auto", "yes", "no"), ... ) bam2bed( bamInput, bedOutput = NULL, reportOutput = NULL, bsgenome = NULL, mergePairIntoFrag = c("auto", "yes", "no"), posOffset = +4, negOffset = -5, chrFilterList = "chrM|_", sortBed = TRUE, rmMultiMap = TRUE, minFragLen = 0, maxFragLen = 2000, saveExtLen = FALSE, uniqueBed = c("auto", "yes", "no"), ... )
atacProc |
|
bamInput |
|
bedOutput |
|
reportOutput |
|
bsgenome |
|
mergePairIntoFrag |
|
posOffset |
|
negOffset |
|
chrFilterList |
|
sortBed |
|
rmMultiMap |
|
minFragLen |
|
maxFragLen |
|
saveExtLen |
|
uniqueBed |
|
... |
Additional arguments, currently unused. |
The bam file wiil be automatically obtained from
object(atacProc
) or input by hand. Output can be ignored.
An invisible ATACProc-class
object scalar for
downstream analysis.
Zheng Wei, Wei Zhang
library(Rsamtools) # change dataset !! # ex1_file <- system.file("extdata", "ex1.bam", package="Rsamtools") # bam2bed(bamInput = ex1_file)
library(Rsamtools) # change dataset !! # ex1_file <- system.file("extdata", "ex1.bam", package="Rsamtools") # bam2bed(bamInput = ex1_file)
This function is used to generate BigWig file from BED reads file. The BigWig file can be shown reads coverage on genome browser.
atacBedToBigWig( atacProc, bedInput = NULL, bsgenome = NULL, bwOutput = NULL, toWig = FALSE, ... ) ## S4 method for signature 'ATACProc' atacBedToBigWig( atacProc, bedInput = NULL, bsgenome = NULL, bwOutput = NULL, toWig = FALSE, ... ) bedToBigWig(bedInput, bsgenome = NULL, bwOutput = NULL, toWig = FALSE, ...)
atacBedToBigWig( atacProc, bedInput = NULL, bsgenome = NULL, bwOutput = NULL, toWig = FALSE, ... ) ## S4 method for signature 'ATACProc' atacBedToBigWig( atacProc, bedInput = NULL, bsgenome = NULL, bwOutput = NULL, toWig = FALSE, ... ) bedToBigWig(bedInput, bsgenome = NULL, bwOutput = NULL, toWig = FALSE, ...)
atacProc |
|
bedInput |
|
bsgenome |
|
bwOutput |
|
toWig |
|
... |
Additional arguments, currently unused. Save as wig file instead of binary BigWig file |
The parameter related to input and output file path
will be automatically
obtained from ATACProc-class
object(atacProc
) or
generated based on known parameters
if their values are default(e.g. NULL
).
Otherwise, the generated values will be overwrited.
If you want to use this function independently,
you can use bedToBigWig
instead.
An invisible ATACProc-class
object scalar for downstream analysis.
Zheng Wei
atacSamToBed
samToBed
atacBedUtils
bedUtils
library(R.utils) td <- tempdir() setTmpDir(td) bedbzfile <- system.file(package="esATAC", "extdata", "chr20.50000.bed.bz2") bedfile <- file.path(td,"chr20.50000.bed") ## Not run: bunzip2(bedbzfile,destname=bedfile,overwrite=TRUE,remove=FALSE) library(BSgenome.Hsapiens.UCSC.hg19) bedToBigWig(bedfile, BSgenome.Hsapiens.UCSC.hg19) dir(td) ## End(Not run)
library(R.utils) td <- tempdir() setTmpDir(td) bedbzfile <- system.file(package="esATAC", "extdata", "chr20.50000.bed.bz2") bedfile <- file.path(td,"chr20.50000.bed") ## Not run: bunzip2(bedbzfile,destname=bedfile,overwrite=TRUE,remove=FALSE) library(BSgenome.Hsapiens.UCSC.hg19) bedToBigWig(bedfile, BSgenome.Hsapiens.UCSC.hg19) dir(td) ## End(Not run)
This function is used to merge interleave paired end reads in bed, downsample bed reads, shift bed reads, filter bed reads according to chromosome, filter bed reads according to fragment size, sort bed, remove duplicates reads in bed.
atacBedUtils( atacProc, bedInput = NULL, bedOutput = NULL, mergePair = FALSE, downSample = NULL, posOffset = 0L, negOffset = 0L, chrFilterList = c("chrM"), select = FALSE, sortBed = FALSE, uniqueBed = FALSE, minFragLen = 0, maxFragLen = 2e+09, newStepType = "BedUtils", ... ) ## S4 method for signature 'ATACProc' atacBedUtils( atacProc, bedInput = NULL, bedOutput = NULL, mergePair = FALSE, downSample = NULL, posOffset = 0L, negOffset = 0L, chrFilterList = c("chrM"), select = FALSE, sortBed = FALSE, uniqueBed = FALSE, minFragLen = 0, maxFragLen = 2e+09, newStepType = "BedUtils", ... ) bedUtils( bedInput, bedOutput = NULL, mergePair = FALSE, downSample = NULL, reportOutput = NULL, posOffset = 0L, negOffset = 0L, chrFilterList = c("chrM"), select = FALSE, sortBed = FALSE, uniqueBed = FALSE, minFragLen = 0, maxFragLen = 2e+09, newStepType = "BedUtils", ... )
atacBedUtils( atacProc, bedInput = NULL, bedOutput = NULL, mergePair = FALSE, downSample = NULL, posOffset = 0L, negOffset = 0L, chrFilterList = c("chrM"), select = FALSE, sortBed = FALSE, uniqueBed = FALSE, minFragLen = 0, maxFragLen = 2e+09, newStepType = "BedUtils", ... ) ## S4 method for signature 'ATACProc' atacBedUtils( atacProc, bedInput = NULL, bedOutput = NULL, mergePair = FALSE, downSample = NULL, posOffset = 0L, negOffset = 0L, chrFilterList = c("chrM"), select = FALSE, sortBed = FALSE, uniqueBed = FALSE, minFragLen = 0, maxFragLen = 2e+09, newStepType = "BedUtils", ... ) bedUtils( bedInput, bedOutput = NULL, mergePair = FALSE, downSample = NULL, reportOutput = NULL, posOffset = 0L, negOffset = 0L, chrFilterList = c("chrM"), select = FALSE, sortBed = FALSE, uniqueBed = FALSE, minFragLen = 0, maxFragLen = 2e+09, newStepType = "BedUtils", ... )
atacProc |
|
bedInput |
|
bedOutput |
|
mergePair |
|
downSample |
|
posOffset |
|
negOffset |
|
chrFilterList |
|
select |
|
sortBed |
|
uniqueBed |
|
minFragLen |
|
maxFragLen |
|
newStepType |
|
... |
Additional arguments, currently unused. |
reportOutput |
|
The parameter related to input and output file path
will be automatically
obtained from ATACProc-class
object(atacProc
) or
generated based on known parameters
if their values are default(e.g. NULL
).
Otherwise, the generated values will be overwrited.
If you want to use this function independently,
you can use bedUtils
instead.
An invisible ATACProc-class
object scalar for downstream analysis.
Zheng Wei
atacBam2Bed
bam2bed
atacSamToBed
samToBed
atacFragLenDistr
atacExtractCutSite
atacPeakCalling
atacTSSQC
atacBedToBigWig
library(R.utils) library(magrittr) td <- tempdir() setTmpDir(td) sambzfile <- system.file(package="esATAC", "extdata", "Example.sam.bz2") samfile <- file.path(td,"Example.sam") bunzip2(sambzfile,destname=samfile,overwrite=TRUE,remove=FALSE) atacproc<-samToBed(samInput = samfile) %>% atacBedUtils(maxFragLen = 100, chrFilterList = NULL)
library(R.utils) library(magrittr) td <- tempdir() setTmpDir(td) sambzfile <- system.file(package="esATAC", "extdata", "Example.sam.bz2") samfile <- file.path(td,"Example.sam") bunzip2(sambzfile,destname=samfile,overwrite=TRUE,remove=FALSE) atacproc<-samToBed(samInput = samfile) %>% atacBedUtils(maxFragLen = 100, chrFilterList = NULL)
Use bowtie2 aligner to map reads to reference genome
atacBowtie2Mapping( atacProc, samOutput = NULL, reportOutput = NULL, bt2Idx = NULL, fastqInput1 = NULL, fastqInput2 = NULL, interleave = FALSE, threads = getThreads(), paramList = "--no-discordant --no-unal --no-mixed -X 2000", ... ) ## S4 method for signature 'ATACProc' atacBowtie2Mapping( atacProc, samOutput = NULL, reportOutput = NULL, bt2Idx = NULL, fastqInput1 = NULL, fastqInput2 = NULL, interleave = FALSE, threads = getThreads(), paramList = "--no-discordant --no-unal --no-mixed -X 2000", ... ) bowtie2Mapping( fastqInput1, fastqInput2 = NULL, samOutput = NULL, reportOutput = NULL, bt2Idx = NULL, interleave = FALSE, threads = getThreads(), paramList = "--no-discordant --no-unal --no-mixed -X 2000", ... )
atacBowtie2Mapping( atacProc, samOutput = NULL, reportOutput = NULL, bt2Idx = NULL, fastqInput1 = NULL, fastqInput2 = NULL, interleave = FALSE, threads = getThreads(), paramList = "--no-discordant --no-unal --no-mixed -X 2000", ... ) ## S4 method for signature 'ATACProc' atacBowtie2Mapping( atacProc, samOutput = NULL, reportOutput = NULL, bt2Idx = NULL, fastqInput1 = NULL, fastqInput2 = NULL, interleave = FALSE, threads = getThreads(), paramList = "--no-discordant --no-unal --no-mixed -X 2000", ... ) bowtie2Mapping( fastqInput1, fastqInput2 = NULL, samOutput = NULL, reportOutput = NULL, bt2Idx = NULL, interleave = FALSE, threads = getThreads(), paramList = "--no-discordant --no-unal --no-mixed -X 2000", ... )
atacProc |
|
samOutput |
|
reportOutput |
|
bt2Idx |
|
fastqInput1 |
|
fastqInput2 |
|
interleave |
|
threads |
|
paramList |
Additional arguments to be passed on to the binaries. See below for details. |
... |
Additional arguments, currently unused. |
The parameter related to input and output file path
will be automatically
obtained from ATACProc-class
object(atacProc
) or
generated based on known parameters
if their values are default(e.g. NULL
).
Otherwise, the generated values will be overwrited.
If you want to use this function independently,
you can use bowtie2Mapping
instead.
additional parameters to be passed on to
bowtie2. You can put all aditional
arguments in one Character
(e.g. "–threads 8 –no-mixed")
with white space splited just like command line,
or put them as Character
vector
(e.g. c("–threads","8","–no-mixed")). Note that some
arguments("-x","–interleaved","-U","-1","-2","-S","threads") to the
bowtie2 are invalid if they are already handled as explicit
function arguments. See the output of
bowtie2_usage()
for details about available parameters.
An invisible ATACProc-class
object scalar for downstream analysis.
Zheng Wei
atacRemoveAdapter
removeAdapter
bowtie2
bowtie2_build
bowtie2_usage
atacSam2Bam
atacSamToBed
atacLibComplexQC
td <- tempdir() setTmpDir(td) ## Building a bowtie2 index library("Rbowtie2") refs <- dir(system.file(package="esATAC", "extdata", "bt2","refs"), full=TRUE) bowtie2_build(references=refs, bt2Index=file.path(td, "lambda_virus"), "--threads 4 --quiet",overwrite=TRUE) ## Alignments reads_1 <- system.file(package="esATAC", "extdata", "bt2", "reads", "reads_1.fastq") reads_2 <- system.file(package="esATAC", "extdata", "bt2", "reads", "reads_2.fastq") if(file.exists(file.path(td, "lambda_virus.1.bt2"))){ (bowtie2Mapping(bt2Idx = file.path(td, "lambda_virus"), samOutput = file.path(td, "result.sam"), fastqInput1=reads_1,fastqInput2=reads_2,threads=3)) head(readLines(file.path(td, "result.sam"))) }
td <- tempdir() setTmpDir(td) ## Building a bowtie2 index library("Rbowtie2") refs <- dir(system.file(package="esATAC", "extdata", "bt2","refs"), full=TRUE) bowtie2_build(references=refs, bt2Index=file.path(td, "lambda_virus"), "--threads 4 --quiet",overwrite=TRUE) ## Alignments reads_1 <- system.file(package="esATAC", "extdata", "bt2", "reads", "reads_1.fastq") reads_2 <- system.file(package="esATAC", "extdata", "bt2", "reads", "reads_2.fastq") if(file.exists(file.path(td, "lambda_virus.1.bt2"))){ (bowtie2Mapping(bt2Idx = file.path(td, "lambda_virus"), samOutput = file.path(td, "result.sam"), fastqInput1=reads_1,fastqInput2=reads_2,threads=3)) head(readLines(file.path(td, "result.sam"))) }
This function is used to count cut site number in given motif
regions and plot footprint. Multi-motif is supported.
NOTE: The input parameter is a a little bit complex,
atacExtractCutSite
and atacMotifScan
is recommended to use which
makes the entire procedure easier.
atacCutSiteCount( atacProcCutSite, atacProcMotifScan = NULL, csInput = NULL, motif_info = NULL, chr = c(1:22, "X", "Y"), matrixOutput = NULL, strandLength = 100, FootPrint = TRUE, prefix = NULL, ... ) ## S4 method for signature 'ATACProc' atacCutSiteCount( atacProcCutSite, atacProcMotifScan = NULL, csInput = NULL, motif_info = NULL, chr = c(1:22, "X", "Y"), matrixOutput = NULL, strandLength = 100, FootPrint = TRUE, prefix = NULL, ... ) cutsitecount( csInput = NULL, motif_info = NULL, chr = c(1:22, "X", "Y"), matrixOutput = NULL, strandLength = 100, FootPrint = TRUE, prefix = NULL, ... )
atacCutSiteCount( atacProcCutSite, atacProcMotifScan = NULL, csInput = NULL, motif_info = NULL, chr = c(1:22, "X", "Y"), matrixOutput = NULL, strandLength = 100, FootPrint = TRUE, prefix = NULL, ... ) ## S4 method for signature 'ATACProc' atacCutSiteCount( atacProcCutSite, atacProcMotifScan = NULL, csInput = NULL, motif_info = NULL, chr = c(1:22, "X", "Y"), matrixOutput = NULL, strandLength = 100, FootPrint = TRUE, prefix = NULL, ... ) cutsitecount( csInput = NULL, motif_info = NULL, chr = c(1:22, "X", "Y"), matrixOutput = NULL, strandLength = 100, FootPrint = TRUE, prefix = NULL, ... )
atacProcCutSite |
|
atacProcMotifScan |
|
csInput |
Your cut site information file(from atacExtractCutSite function, separated by chromatin name and all cut site are sorted) path with prefix. e.g. "/your_cut_site_information_path/prefix". |
motif_info |
A rds file from function |
chr |
Which chromatin the program will processing. It must be identical with the filename of cut site information files or subset of . Default:c(1:22, "X", "Y"). |
matrixOutput |
The output directory, where to save your cut site count of every motif position. an empty folder would be great. Default:tmpdir/Footprint |
strandLength |
How many bp(base pair) do you want to count up/downstream of the motif. default:100. |
FootPrint |
TRUE or FALSE, plot footprint or not. |
prefix |
prefix for the pdf file. |
... |
Additional arguments, currently unused. |
The parameter is simplified because of too many input file.
parameter atacProcCutSite
and atacProcMotifScan
contains all
input information so function atacExtractCutSite
and
atacMotifScan
is recommended to use together. For instance,
if you want footprint of 3 TFs (transcription factor) of human in
chr1-22, X, Y, then you need 24 chromatin cut site files, 3 motif position
files as well as 3 integers of the motif. Function atacExtractCutSite
and
atacMotifScan
will do all this, you just specify which motif you want.
Therefore, atacExtractCutSite
and atacMotifScan
is
recommended to use together.
An invisible ATACProc-class
object scalar.
Wei Zhang
atacExtractCutSite
atacMotifScan
library(R.utils) library(BSgenome.Hsapiens.UCSC.hg19) ## processing bed file fra_path <- system.file("extdata", "chr20.50000.bed.bz2", package="esATAC") frag <- as.vector(bunzip2(filename = fra_path, destname = file.path(getwd(), "chr20.50000.bed"), ext="bz2", FUN=bzfile, overwrite=TRUE, remove = FALSE)) cs.data <- extractcutsite(bedInput = frag, prefix = "ATAC") ## find motif position p1bz <- system.file("extdata", "Example_peak1.bed.bz2", package="esATAC") peak1_path <- as.vector(bunzip2(filename = p1bz, destname = file.path(getwd(), "Example_peak1.bed"), ext="bz2", FUN = bzfile, overwrite=TRUE, remove = FALSE)) # motif <- readRDS(system.file("extdata", "MotifPFM.rds", package="esATAC")) # motif.data <- motifscan(peak = peak1_path, genome = BSgenome.Hsapiens.UCSC.hg19, motifs = motif) ## plot footprint # atacCutSiteCount(atacProcCutSite = cs.data, atacProcMotifScan = motif.data)
library(R.utils) library(BSgenome.Hsapiens.UCSC.hg19) ## processing bed file fra_path <- system.file("extdata", "chr20.50000.bed.bz2", package="esATAC") frag <- as.vector(bunzip2(filename = fra_path, destname = file.path(getwd(), "chr20.50000.bed"), ext="bz2", FUN=bzfile, overwrite=TRUE, remove = FALSE)) cs.data <- extractcutsite(bedInput = frag, prefix = "ATAC") ## find motif position p1bz <- system.file("extdata", "Example_peak1.bed.bz2", package="esATAC") peak1_path <- as.vector(bunzip2(filename = p1bz, destname = file.path(getwd(), "Example_peak1.bed"), ext="bz2", FUN = bzfile, overwrite=TRUE, remove = FALSE)) # motif <- readRDS(system.file("extdata", "MotifPFM.rds", package="esATAC")) # motif.data <- motifscan(peak = peak1_path, genome = BSgenome.Hsapiens.UCSC.hg19, motifs = motif) ## plot footprint # atacCutSiteCount(atacProcCutSite = cs.data, atacProcMotifScan = motif.data)
Extract cutting site from ATAC-seq fangment bed file
(from atacSamToBed
).
atacExtractCutSite( atacProc, bedInput = NULL, csOutput.dir = NULL, prefix = NULL, ... ) ## S4 method for signature 'ATACProc' atacExtractCutSite( atacProc, bedInput = NULL, csOutput.dir = NULL, prefix = NULL, ... ) extractcutsite(bedInput, csOutput.dir = NULL, prefix = NULL, ...)
atacExtractCutSite( atacProc, bedInput = NULL, csOutput.dir = NULL, prefix = NULL, ... ) ## S4 method for signature 'ATACProc' atacExtractCutSite( atacProc, bedInput = NULL, csOutput.dir = NULL, prefix = NULL, ... ) extractcutsite(bedInput, csOutput.dir = NULL, prefix = NULL, ...)
atacProc |
|
bedInput |
|
csOutput.dir |
|
prefix |
|
... |
Additional arguments, currently unused. |
In ATAC-seq data, every line in merged bed file
(from atacSamToBed
, the first 3 column is chr, start, end)
means a DNA fragment, the cutting site is start+1 and end, this function
extract and sort this information for the next step
(atacCutSiteCount
).
An invisible ATACProc-class
object scalar for downstream
analysis.
Wei Zhang
library(R.utils) fra_path <- system.file("extdata", "chr20.50000.bed.bz2", package="esATAC") frag <- as.vector(bunzip2(filename = fra_path, destname = file.path(getwd(), "chr20.50000.bed"), ext="bz2", FUN=bzfile, overwrite=TRUE, remove = FALSE)) extractcutsite(bedInput = frag, prefix = "ATAC")
library(R.utils) fra_path <- system.file("extdata", "chr20.50000.bed.bz2", package="esATAC") frag <- as.vector(bunzip2(filename = fra_path, destname = file.path(getwd(), "chr20.50000.bed"), ext="bz2", FUN=bzfile, overwrite=TRUE, remove = FALSE)) extractcutsite(bedInput = frag, prefix = "ATAC")
Generate quality control plots from fastq of ATAC-seq data.
atacQCReport(atacProc, input_file = NULL, output_file = NULL, ...) ## S4 method for signature 'ATACProc' atacQCReport(atacProc, input_file = NULL, output_file = NULL, ...) qcreport(input_file, output_file = NULL, ...)
atacQCReport(atacProc, input_file = NULL, output_file = NULL, ...) ## S4 method for signature 'ATACProc' atacQCReport(atacProc, input_file = NULL, output_file = NULL, ...) qcreport(input_file, output_file = NULL, ...)
atacProc |
|
input_file |
|
output_file |
|
... |
Additional arguments, currently unused. |
Every highthroughput sequencing need quality control analysis, this function provide QC for ATAC-seq, such as GC content.
An invisible ATACProc-class
object scalar for downstream
analysis.
Wei Zhang
atacUnzipAndMerge
,
atacRenamer
library(R.utils) fra_path <- system.file("extdata", "chr20_1.2.fq.bz2", package="esATAC") fq1 <- as.vector(bunzip2(filename = fra_path, destname = file.path(getwd(), "chr20_1.fq"), ext="bz2", FUN=bzfile, overwrite=TRUE, remove = FALSE)) fra_path <- system.file("extdata", "chr20_2.2.fq.bz2", package="esATAC") fq2 <- as.vector(bunzip2(filename = fra_path, destname = file.path(getwd(), "chr20_2.fq"), ext="bz2", FUN=bzfile, overwrite=TRUE, remove = FALSE)) ## Not run: qcreport(input_file = c(fq1, fq2)) ## End(Not run)
library(R.utils) fra_path <- system.file("extdata", "chr20_1.2.fq.bz2", package="esATAC") fq1 <- as.vector(bunzip2(filename = fra_path, destname = file.path(getwd(), "chr20_1.fq"), ext="bz2", FUN=bzfile, overwrite=TRUE, remove = FALSE)) fra_path <- system.file("extdata", "chr20_2.2.fq.bz2", package="esATAC") fq2 <- as.vector(bunzip2(filename = fra_path, destname = file.path(getwd(), "chr20_2.fq"), ext="bz2", FUN=bzfile, overwrite=TRUE, remove = FALSE)) ## Not run: qcreport(input_file = c(fq1, fq2)) ## End(Not run)
Use AdapterRemoval to identify adapters for paired end data
atacFindAdapter( atacProc, fastqInput1 = NULL, fastqInput2 = NULL, reportPrefix = NULL, interleave = FALSE, findParamList = NULL, threads = getThreads(), ... ) ## S4 method for signature 'ATACProc' atacFindAdapter( atacProc, fastqInput1 = NULL, fastqInput2 = NULL, reportPrefix = NULL, interleave = FALSE, findParamList = NULL, threads = getThreads(), ... ) findAdapter( fastqInput1, fastqInput2 = NULL, reportPrefix = NULL, interleave = FALSE, findParamList = NULL, threads = getThreads(), ... )
atacFindAdapter( atacProc, fastqInput1 = NULL, fastqInput2 = NULL, reportPrefix = NULL, interleave = FALSE, findParamList = NULL, threads = getThreads(), ... ) ## S4 method for signature 'ATACProc' atacFindAdapter( atacProc, fastqInput1 = NULL, fastqInput2 = NULL, reportPrefix = NULL, interleave = FALSE, findParamList = NULL, threads = getThreads(), ... ) findAdapter( fastqInput1, fastqInput2 = NULL, reportPrefix = NULL, interleave = FALSE, findParamList = NULL, threads = getThreads(), ... )
atacProc |
|
fastqInput1 |
|
fastqInput2 |
|
reportPrefix |
|
interleave |
|
findParamList |
Additional arguments to be passed on to the binaries for identifying adapter. See below for details. |
threads |
The number of threads used in this step. |
... |
Additional arguments, currently unused. |
The parameter related to input and output file path
will be automatically
obtained from ATACProc-class
object or
generated based on known parameters
if their values are default(e.g. NULL
).
Otherwise, the generated values will be overwrited.
If you want to use this function independently,
you can use findAdapter
instead.
You can put all aditional
arguments in one Character
(e.g. "–threads 8") with white space
splited just like command line,
or put them in Character
vector(e.g. c("–threads","8")).
Note that some arguments(
"–file1","–file2","–adapter1","–adapter2","–output1","–output2",
"–basename","–interleaved","thread") to the
findParamList are invalid if they are already handled as explicit
function arguments. See the output of
adapterremoval_usage()
for details about available parameters.
An invisible ATACProc-class
object scalar for downstream analysis.
Zheng Wei
atacRenamer
renamer
atacUnzipAndMerge
unzipAndMerge
atacBowtie2Mapping
library(magrittr) td <- tempdir() setTmpDir(td) # Identify adapters prefix<-system.file(package="esATAC", "extdata", "uzmg") (reads_1 <-file.path(prefix,"m1",dir(file.path(prefix,"m1")))) (reads_2 <-file.path(prefix,"m2",dir(file.path(prefix,"m2")))) reads_merged_1 <- file.path(td,"reads1.fastq") reads_merged_2 <- file.path(td,"reads2.fastq") atacproc <- atacUnzipAndMerge(fastqInput1 = reads_1,fastqInput2 = reads_2) %>% atacRenamer %>% atacFindAdapter dir(td)
library(magrittr) td <- tempdir() setTmpDir(td) # Identify adapters prefix<-system.file(package="esATAC", "extdata", "uzmg") (reads_1 <-file.path(prefix,"m1",dir(file.path(prefix,"m1")))) (reads_2 <-file.path(prefix,"m2",dir(file.path(prefix,"m2")))) reads_merged_1 <- file.path(td,"reads1.fastq") reads_merged_2 <- file.path(td,"reads2.fastq") atacproc <- atacUnzipAndMerge(fastqInput1 = reads_1,fastqInput2 = reads_2) %>% atacRenamer %>% atacFindAdapter dir(td)
These functions are used to generate fragment distribution plot. The fourier transform of fragment distribution will be calculated. Strength distribution around period at 10.4bp and 180bp will be shown in another two plots.
atacFragLenDistr(atacProc, reportPrefix = NULL, bedInput = NULL, ...) ## S4 method for signature 'ATACProc' atacFragLenDistr(atacProc, reportPrefix = NULL, bedInput = NULL, ...) fragLenDistr(bedInput, reportPrefix = NULL, ...)
atacFragLenDistr(atacProc, reportPrefix = NULL, bedInput = NULL, ...) ## S4 method for signature 'ATACProc' atacFragLenDistr(atacProc, reportPrefix = NULL, bedInput = NULL, ...) fragLenDistr(bedInput, reportPrefix = NULL, ...)
atacProc |
|
reportPrefix |
|
bedInput |
|
... |
Additional arguments, currently unused. |
The parameter related to input and output file path
will be automatically
obtained from ATACProc-class
object(atacProc
) or
generated based on known parameters
if their values are default(e.g. NULL
).
Otherwise, the generated values will be overwrited.
If you want to use this function independently,
you can use fragLenDistr
instead.
An invisible ATACProc-class
object scalar for downstream analysis.
Zheng Wei
atacSamToBed
samToBed
atacBedUtils
bedUtils
library(R.utils) td <- tempdir() setTmpDir(td) bedbzfile <- system.file(package="esATAC", "extdata", "chr20.50000.bed.bz2") bedfile <- file.path(td,"chr20.50000.bed") ## Not run: bunzip2(bedbzfile,destname=bedfile,overwrite=TRUE,remove=FALSE) fragLenDistr(bedfile) ## End(Not run) dir(td)
library(R.utils) td <- tempdir() setTmpDir(td) bedbzfile <- system.file(package="esATAC", "extdata", "chr20.50000.bed.bz2") bedfile <- file.path(td,"chr20.50000.bed") ## Not run: bunzip2(bedbzfile,destname=bedfile,overwrite=TRUE,remove=FALSE) fragLenDistr(bedfile) ## End(Not run) dir(td)
Calculate the fraction of reads falling within peak regions
atacFripQC( atacProc, atacProcPeak = NULL, bsgenome = NULL, reportOutput = NULL, readsBedInput = NULL, peakBedInput = NULL, ... ) ## S4 method for signature 'ATACProc' atacFripQC( atacProc, atacProcPeak = NULL, bsgenome = NULL, reportOutput = NULL, readsBedInput = NULL, peakBedInput = NULL, ... ) fripQC(readsBedInput, peakBedInput, bsgenome = NULL, reportOutput = NULL, ...)
atacFripQC( atacProc, atacProcPeak = NULL, bsgenome = NULL, reportOutput = NULL, readsBedInput = NULL, peakBedInput = NULL, ... ) ## S4 method for signature 'ATACProc' atacFripQC( atacProc, atacProcPeak = NULL, bsgenome = NULL, reportOutput = NULL, readsBedInput = NULL, peakBedInput = NULL, ... ) fripQC(readsBedInput, peakBedInput, bsgenome = NULL, reportOutput = NULL, ...)
atacProc |
|
atacProcPeak |
|
bsgenome |
|
reportOutput |
|
readsBedInput |
|
peakBedInput |
|
... |
Additional arguments, currently unused. |
The parameter related to input and output file path
will be automatically
obtained from ATACProc-class
object(atacProc
) or
generated based on known parameters
if their values are default(e.g. NULL
).
Otherwise, the generated values will be overwrited.
If you want to use this function independently,
or you can use fripQC
instead.
An invisible fripQC
object scalar for downstream analysis.
Zheng Wei
library(R.utils) library(BSgenome.Hsapiens.UCSC.hg19) library(magrittr) td <- tempdir() setTmpDir(td) bedbzfile <- system.file(package="esATAC", "extdata", "chr20.50000.bed.bz2") bedfile <- file.path(td,"chr20.50000.bed") bunzip2(bedbzfile,destname=bedfile,overwrite=TRUE,remove=FALSE) bedUtils(bedInput = bedfile,maxFragLen = 100, chrFilterList = NULL) %>% atacPeakCalling %>% atacFripQC(bsgenome=BSgenome.Hsapiens.UCSC.hg19) dir(td)
library(R.utils) library(BSgenome.Hsapiens.UCSC.hg19) library(magrittr) td <- tempdir() setTmpDir(td) bedbzfile <- system.file(package="esATAC", "extdata", "chr20.50000.bed.bz2") bedfile <- file.path(td,"chr20.50000.bed") bunzip2(bedbzfile,destname=bedfile,overwrite=TRUE,remove=FALSE) bedUtils(bedInput = bedfile,maxFragLen = 100, chrFilterList = NULL) %>% atacPeakCalling %>% atacFripQC(bsgenome=BSgenome.Hsapiens.UCSC.hg19) dir(td)
atacMotifScan and atacMotifScanPair accept PFM in a list
, this
function convert JASPAR PFM file to PFMatrix
or PFMatrixList
.
getMotifInfo(motif.file = NULL)
getMotifInfo(motif.file = NULL)
motif.file |
Motif PFM file downloaded from JASPAR. |
Generate PFMatrix
or PFMatrixList
.
PFMatrix
or PFMatrixList
.
Wei Zhang
motif_file <- system.file("extdata", "CustomizedMotif.txt", package="esATAC") pfm <- getMotifInfo(motif.file = motif_file)
motif_file <- system.file("extdata", "CustomizedMotif.txt", package="esATAC") pfm <- getMotifInfo(motif.file = motif_file)
The function calculate the nonredundant fraction of reads (NRF). Its definition is number of distinct uniquely mapping reads (i.e. after removing duplicates) / Total number of reads. The function also Calculate PCR Bottlenecking Coefficient 1 (PBC1) and PCR Bottlenecking Coefficient 2 (PBC2). PBC1=M1/M_DISTINCT and PBC2=M1/M2, where M1: number of genomic locations where exactly one read maps uniquely, M2: number of genomic locations where two reads map uniquely M_DISTINCT: number of distinct genomic locations to which some read maps uniquely.
atacLibComplexQC( atacProc, reportOutput = NULL, samInput = NULL, singleEnd = FALSE, subsampleSize = Inf, ... ) ## S4 method for signature 'ATACProc' atacLibComplexQC( atacProc, reportOutput = NULL, samInput = NULL, singleEnd = FALSE, subsampleSize = Inf, ... ) libComplexQC( samInput, reportOutput = NULL, singleEnd = FALSE, subsampleSize = Inf, ... )
atacLibComplexQC( atacProc, reportOutput = NULL, samInput = NULL, singleEnd = FALSE, subsampleSize = Inf, ... ) ## S4 method for signature 'ATACProc' atacLibComplexQC( atacProc, reportOutput = NULL, samInput = NULL, singleEnd = FALSE, subsampleSize = Inf, ... ) libComplexQC( samInput, reportOutput = NULL, singleEnd = FALSE, subsampleSize = Inf, ... )
atacProc |
|
reportOutput |
|
samInput |
|
singleEnd |
|
subsampleSize |
|
... |
Additional arguments, currently unused. |
The parameter related to input and output file path
will be automatically
obtained from ATACProc-class
object(atacProc
) or
generated based on known parameters
if their values are default(e.g. NULL
).
Otherwise, the generated values will be overwrited.
If you want to use this function independently,
you can use libComplexQC
instead.
An invisible libComplexQC
object scalar for downstream analysis.
Zheng Wei
atacBowtie2Mapping
bowtie2Mapping
library(R.utils) td <- tempdir() setTmpDir(td) sambzfile <- system.file(package="esATAC", "extdata", "Example.sam.bz2") samfile <- file.path(td,"Example.sam") bunzip2(sambzfile,destname=samfile,overwrite=TRUE,remove=FALSE) atacproc<-libComplexQC(samInput = samfile)
library(R.utils) td <- tempdir() setTmpDir(td) sambzfile <- system.file(package="esATAC", "extdata", "Example.sam.bz2") samfile <- file.path(td,"Example.sam") bunzip2(sambzfile,destname=samfile,overwrite=TRUE,remove=FALSE) atacproc<-libComplexQC(samInput = samfile)
Use F-seq to call peak
atacPeakCalling( atacProc, bedInput = NULL, background = NULL, genomicReadsCount = NULL, fragmentSize = 0, featureLength = NULL, bedOutput = NULL, ploidyDir = NULL, fileformat = c("bed", "wig", "npf"), wiggleTrackStep = NULL, threshold = NULL, verbose = TRUE, wgThresholdSet = NULL, ... ) ## S4 method for signature 'ATACProc' atacPeakCalling( atacProc, bedInput = NULL, background = NULL, genomicReadsCount = NULL, fragmentSize = 0, featureLength = NULL, bedOutput = NULL, ploidyDir = NULL, fileformat = c("bed", "wig", "npf"), wiggleTrackStep = NULL, threshold = NULL, verbose = TRUE, wgThresholdSet = NULL, ... ) peakCalling( bedInput, background = NULL, genomicReadsCount = NULL, fragmentSize = 0, featureLength = NULL, bedOutput = NULL, ploidyDir = NULL, fileformat = c("bed", "wig", "npf"), wiggleTrackStep = NULL, threshold = NULL, verbose = TRUE, wgThresholdSet = NULL, ... )
atacPeakCalling( atacProc, bedInput = NULL, background = NULL, genomicReadsCount = NULL, fragmentSize = 0, featureLength = NULL, bedOutput = NULL, ploidyDir = NULL, fileformat = c("bed", "wig", "npf"), wiggleTrackStep = NULL, threshold = NULL, verbose = TRUE, wgThresholdSet = NULL, ... ) ## S4 method for signature 'ATACProc' atacPeakCalling( atacProc, bedInput = NULL, background = NULL, genomicReadsCount = NULL, fragmentSize = 0, featureLength = NULL, bedOutput = NULL, ploidyDir = NULL, fileformat = c("bed", "wig", "npf"), wiggleTrackStep = NULL, threshold = NULL, verbose = TRUE, wgThresholdSet = NULL, ... ) peakCalling( bedInput, background = NULL, genomicReadsCount = NULL, fragmentSize = 0, featureLength = NULL, bedOutput = NULL, ploidyDir = NULL, fileformat = c("bed", "wig", "npf"), wiggleTrackStep = NULL, threshold = NULL, verbose = TRUE, wgThresholdSet = NULL, ... )
atacProc |
|
bedInput |
|
background |
|
genomicReadsCount |
|
fragmentSize |
|
featureLength |
|
bedOutput |
|
ploidyDir |
|
fileformat |
|
wiggleTrackStep |
|
threshold |
|
verbose |
|
wgThresholdSet |
|
... |
Additional arguments, currently unused. |
The parameter related to input and output file path
will be automatically
obtained from ATACProc-class
object(atacProc
) or
generated based on known parameters
if their values are default(e.g. NULL
).
Otherwise, the generated values will be overwrited.
If you want to use this function independently,
you can use peakCalling
instead.
An invisible ATACProc-class
object scalar for downstream analysis.
Zheng Wei
atacSamToBed
samToBed
atacBedUtils
bedUtils
library(R.utils) library(magrittr) td <- tempdir() setTmpDir(td) bedbzfile <- system.file(package="esATAC", "extdata", "chr20.50000.bed.bz2") bedfile <- file.path(td,"chr20.50000.bed") bunzip2(bedbzfile,destname=bedfile,overwrite=TRUE,remove=FALSE) bedUtils(bedInput = bedfile,maxFragLen = 100, chrFilterList = NULL) %>% atacPeakCalling dir(td)
library(R.utils) library(magrittr) td <- tempdir() setTmpDir(td) bedbzfile <- system.file(package="esATAC", "extdata", "chr20.50000.bed.bz2") bedfile <- file.path(td,"chr20.50000.bed") bunzip2(bedbzfile,destname=bedfile,overwrite=TRUE,remove=FALSE) bedUtils(bedInput = bedfile,maxFragLen = 100, chrFilterList = NULL) %>% atacPeakCalling dir(td)
Use MACS2 installed by to call peak
atacPeakCallingMACS2( atacProc, bedInput = NULL, background = NULL, outputPrefix = NULL, genomeSize = NULL, pvalueThreshold = 0.01, extsize = 150, shift = -round(extsize/2), ... ) ## S4 method for signature 'ATACProc' atacPeakCallingMACS2( atacProc, bedInput = NULL, background = NULL, outputPrefix = NULL, genomeSize = NULL, pvalueThreshold = 0.01, extsize = 150, shift = -round(extsize/2), ... ) peakCallingMACS2( bedInput, background = NULL, outputPrefix = NULL, genomeSize = NULL, pvalueThreshold = 0.01, extsize = 150, shift = -round(extsize/2), ... ) testPeakCallingMACS2()
atacPeakCallingMACS2( atacProc, bedInput = NULL, background = NULL, outputPrefix = NULL, genomeSize = NULL, pvalueThreshold = 0.01, extsize = 150, shift = -round(extsize/2), ... ) ## S4 method for signature 'ATACProc' atacPeakCallingMACS2( atacProc, bedInput = NULL, background = NULL, outputPrefix = NULL, genomeSize = NULL, pvalueThreshold = 0.01, extsize = 150, shift = -round(extsize/2), ... ) peakCallingMACS2( bedInput, background = NULL, outputPrefix = NULL, genomeSize = NULL, pvalueThreshold = 0.01, extsize = 150, shift = -round(extsize/2), ... ) testPeakCallingMACS2()
atacProc |
|
bedInput |
|
background |
|
outputPrefix |
|
extsize |
|
shift |
|
... |
Additional arguments, currently unused. |
genomeSize |
|
pvalueThreshold |
|
The parameter related to input and output file path
will be automatically
obtained from ATACProc-class
object(atacProc
) or
generated based on known parameters
if their values are default(e.g. NULL
).
Otherwise, the generated values will be overwrited.
If you want to use this function independently,
you can use peakCalling
instead.
An invisible ATACProc-class
object scalar for downstream analysis.
Zheng Wei
atacSamToBed
samToBed
atacBedUtils
bedUtils
library(R.utils) library(magrittr) td <- tempdir() setTmpDir(td) bedbzfile <- system.file(package="esATAC", "extdata", "chr20.50000.bed.bz2") bedfile <- file.path(td,"chr20.50000.bed") bunzip2(bedbzfile,destname=bedfile,overwrite=TRUE,remove=FALSE) bedUtils(bedInput = bedfile,maxFragLen = 100, chrFilterList = NULL) %>% atacPeakCalling dir(td)
library(R.utils) library(magrittr) td <- tempdir() setTmpDir(td) bedbzfile <- system.file(package="esATAC", "extdata", "chr20.50000.bed.bz2") bedfile <- file.path(td,"chr20.50000.bed") bunzip2(bedbzfile,destname=bedfile,overwrite=TRUE,remove=FALSE) bedUtils(bedInput = bedfile,maxFragLen = 100, chrFilterList = NULL) %>% atacPeakCalling dir(td)
These functions are used to calculate the overlap ratio in specific quality control rigion. Blacklist and DHS region are provided. You can also set your own BED file as quality control rigion.
atacPeakQC( atacProc, bsgenome = NULL, reportOutput = NULL, qcbedInput = c("DHS", "blacklist", "path/to/bed"), bedInput = NULL, newStepType = "PeakQC", ... ) ## S4 method for signature 'ATACProc' atacPeakQC( atacProc, bsgenome = NULL, reportOutput = NULL, qcbedInput = c("DHS", "blacklist", "path/to/bed"), bedInput = NULL, newStepType = "PeakQC", ... ) peakQC( bedInput, bsgenome = NULL, reportOutput = NULL, qcbedInput = c("DHS", "blacklist", "path/to/bed"), newStepType = "PeakQC", ... )
atacPeakQC( atacProc, bsgenome = NULL, reportOutput = NULL, qcbedInput = c("DHS", "blacklist", "path/to/bed"), bedInput = NULL, newStepType = "PeakQC", ... ) ## S4 method for signature 'ATACProc' atacPeakQC( atacProc, bsgenome = NULL, reportOutput = NULL, qcbedInput = c("DHS", "blacklist", "path/to/bed"), bedInput = NULL, newStepType = "PeakQC", ... ) peakQC( bedInput, bsgenome = NULL, reportOutput = NULL, qcbedInput = c("DHS", "blacklist", "path/to/bed"), newStepType = "PeakQC", ... )
atacProc |
|
bsgenome |
|
reportOutput |
|
qcbedInput |
|
bedInput |
|
newStepType |
|
... |
Additional arguments, currently unused. |
The parameter related to input and output file path
will be automatically
obtained from ATACProc-class
object or
generated based on known parameters
if their values are default(e.g. NULL
).
Otherwise, the generated values will be overwrited.
If you want to use this function independently,
you can use peakQC
instead.
An invisible ATACProc-class
object scalar for downstream analysis.
Zheng Wei
library(R.utils) library(magrittr) td <- tempdir() setTmpDir(td) bedbzfile <- system.file(package="esATAC", "extdata", "chr20.50000.bed.bz2") bedfile <- file.path(td,"chr20.50000.bed") bunzip2(bedbzfile,destname=bedfile,overwrite=TRUE,remove=FALSE) blacklistfile <- system.file(package="esATAC", "extdata", "hg19.blacklist.bed") library(BSgenome.Hsapiens.UCSC.hg19) bedUtils(bedInput = bedfile,maxFragLen = 100, chrFilterList = NULL) %>% atacPeakCalling %>% atacPeakQC(qcbedInput = blacklistfile, bsgenome = BSgenome.Hsapiens.UCSC.hg19) dir(td)
library(R.utils) library(magrittr) td <- tempdir() setTmpDir(td) bedbzfile <- system.file(package="esATAC", "extdata", "chr20.50000.bed.bz2") bedfile <- file.path(td,"chr20.50000.bed") bunzip2(bedbzfile,destname=bedfile,overwrite=TRUE,remove=FALSE) blacklistfile <- system.file(package="esATAC", "extdata", "hg19.blacklist.bed") library(BSgenome.Hsapiens.UCSC.hg19) bedUtils(bedInput = bedfile,maxFragLen = 100, chrFilterList = NULL) %>% atacPeakCalling %>% atacPeakQC(qcbedInput = blacklistfile, bsgenome = BSgenome.Hsapiens.UCSC.hg19) dir(td)
Use AdapterRemoval to remove adapters
atacRemoveAdapter( atacProc, adapter1 = NULL, adapter2 = NULL, fastqOutput1 = NULL, reportPrefix = NULL, fastqOutput2 = NULL, fastqInput1 = NULL, fastqInput2 = NULL, interleave = FALSE, threads = getThreads(), paramList = NULL, findParamList = NULL, ... ) ## S4 method for signature 'ATACProc' atacRemoveAdapter( atacProc, adapter1 = NULL, adapter2 = NULL, fastqOutput1 = NULL, reportPrefix = NULL, fastqOutput2 = NULL, fastqInput1 = NULL, fastqInput2 = NULL, interleave = FALSE, threads = getThreads(), paramList = NULL, findParamList = NULL, ... ) removeAdapter( fastqInput1, fastqInput2, adapter1, adapter2, fastqOutput1 = NULL, reportPrefix = NULL, fastqOutput2 = NULL, interleave = FALSE, threads = getThreads(), paramList = NULL, findParamList = NULL, ... )
atacRemoveAdapter( atacProc, adapter1 = NULL, adapter2 = NULL, fastqOutput1 = NULL, reportPrefix = NULL, fastqOutput2 = NULL, fastqInput1 = NULL, fastqInput2 = NULL, interleave = FALSE, threads = getThreads(), paramList = NULL, findParamList = NULL, ... ) ## S4 method for signature 'ATACProc' atacRemoveAdapter( atacProc, adapter1 = NULL, adapter2 = NULL, fastqOutput1 = NULL, reportPrefix = NULL, fastqOutput2 = NULL, fastqInput1 = NULL, fastqInput2 = NULL, interleave = FALSE, threads = getThreads(), paramList = NULL, findParamList = NULL, ... ) removeAdapter( fastqInput1, fastqInput2, adapter1, adapter2, fastqOutput1 = NULL, reportPrefix = NULL, fastqOutput2 = NULL, interleave = FALSE, threads = getThreads(), paramList = NULL, findParamList = NULL, ... )
atacProc |
|
adapter1 |
|
adapter2 |
|
fastqOutput1 |
|
reportPrefix |
|
fastqOutput2 |
|
fastqInput1 |
|
fastqInput2 |
|
interleave |
|
threads |
|
paramList |
Additional arguments to be passed on to the binaries for removing adapter. See below for details. |
findParamList |
Additional arguments to be passed on to the binaries for identifying adapter. See below for details. |
... |
Additional arguments, currently unused. |
The parameter related to input and output file path
will be automatically
obtained from ATACProc-class
object or
generated based on known parameters
if their values are default(e.g. NULL
).
Otherwise, the generated values will be overwrited.
If you want to use this function independently,
you can use removeAdapter
instead.
You can put all aditional
arguments in one Character
(e.g. "–threads 8") with white space
splited just like command line,
or put them in Character
vector(e.g. c("–threads","8")).
Note that some arguments(
"–file1","–file2","–adapter1","–adapter2","–output1","–output2",
"–basename","–interleaved","thread") to the
paramList and findParamList are invalid if they are already handled as explicit
function arguments. See the output of
adapterremoval_usage()
for details about available parameters.
An invisible ATACProc-class
object scalar for downstream analysis.
Zheng Wei
atacRenamer
renamer
atacUnzipAndMerge
unzipAndMerge
atacBowtie2Mapping
library(magrittr) td <- tempdir() setTmpDir(td) # Identify adapters prefix<-system.file(package="esATAC", "extdata", "uzmg") (reads_1 <-file.path(prefix,"m1",dir(file.path(prefix,"m1")))) (reads_2 <-file.path(prefix,"m2",dir(file.path(prefix,"m2")))) reads_merged_1 <- file.path(td,"reads1.fastq") reads_merged_2 <- file.path(td,"reads2.fastq") atacproc <- atacUnzipAndMerge(fastqInput1 = reads_1,fastqInput2 = reads_2) %>% atacRenamer %>% atacFindAdapter %>% atacRemoveAdapter dir(td)
library(magrittr) td <- tempdir() setTmpDir(td) # Identify adapters prefix<-system.file(package="esATAC", "extdata", "uzmg") (reads_1 <-file.path(prefix,"m1",dir(file.path(prefix,"m1")))) (reads_2 <-file.path(prefix,"m2",dir(file.path(prefix,"m2")))) reads_merged_1 <- file.path(td,"reads1.fastq") reads_merged_2 <- file.path(td,"reads2.fastq") atacproc <- atacUnzipAndMerge(fastqInput1 = reads_1,fastqInput2 = reads_2) %>% atacRenamer %>% atacFindAdapter %>% atacRemoveAdapter dir(td)
Rename reads name in fastq with increasing integer
atacRenamer( atacProc, fastqOutput1 = NULL, fastqOutput2 = NULL, fastqInput1 = NULL, fastqInput2 = NULL, interleave = FALSE, threads = getThreads(), ... ) ## S4 method for signature 'ATACProc' atacRenamer( atacProc, fastqOutput1 = NULL, fastqOutput2 = NULL, fastqInput1 = NULL, fastqInput2 = NULL, interleave = FALSE, threads = getThreads(), ... ) renamer( fastqInput1 = NULL, fastqInput2 = NULL, fastqOutput1 = NULL, fastqOutput2 = NULL, interleave = FALSE, threads = getThreads(), ... )
atacRenamer( atacProc, fastqOutput1 = NULL, fastqOutput2 = NULL, fastqInput1 = NULL, fastqInput2 = NULL, interleave = FALSE, threads = getThreads(), ... ) ## S4 method for signature 'ATACProc' atacRenamer( atacProc, fastqOutput1 = NULL, fastqOutput2 = NULL, fastqInput1 = NULL, fastqInput2 = NULL, interleave = FALSE, threads = getThreads(), ... ) renamer( fastqInput1 = NULL, fastqInput2 = NULL, fastqOutput1 = NULL, fastqOutput2 = NULL, interleave = FALSE, threads = getThreads(), ... )
atacProc |
|
fastqOutput1 |
|
fastqOutput2 |
|
fastqInput1 |
|
fastqInput2 |
|
interleave |
|
threads |
|
... |
Additional arguments, currently unused. |
The parameter related to input and output file path
will be automatically
obtained from ATACProc-class
object(atacProc
) or
generated based on known parameters
if their values are default(e.g. NULL
).
Otherwise, the generated values will be overwrited.
If you want to use this function independently,
you can use renamer
instead.
An invisible ATACProc-class
object scalar for downstream analysis.
Zheng Wei
atacUnzipAndMerge
unzipAndMerge
atacQCReport
atacRemoveAdapter
ignoreCheck() # warnning: run this for fast test only library(magrittr) td <- tempdir() setTmpDir(td) # Identify adapters prefix<-system.file(package="esATAC", "extdata", "uzmg") (reads_1 <-file.path(prefix,"m1",dir(file.path(prefix,"m1")))) (reads_2 <-file.path(prefix,"m2",dir(file.path(prefix,"m2")))) reads_merged_1 <- file.path(td,"reads1.fastq") reads_merged_2 <- file.path(td,"reads2.fastq") atacproc <- atacUnzipAndMerge(fastqInput1 = reads_1,fastqInput2 = reads_2) %>% atacRenamer dir(td)
ignoreCheck() # warnning: run this for fast test only library(magrittr) td <- tempdir() setTmpDir(td) # Identify adapters prefix<-system.file(package="esATAC", "extdata", "uzmg") (reads_1 <-file.path(prefix,"m1",dir(file.path(prefix,"m1")))) (reads_2 <-file.path(prefix,"m2",dir(file.path(prefix,"m2")))) reads_merged_1 <- file.path(td,"reads1.fastq") reads_merged_2 <- file.path(td,"reads2.fastq") atacproc <- atacUnzipAndMerge(fastqInput1 = reads_1,fastqInput2 = reads_2) %>% atacRenamer dir(td)
Ranking functional groups based on a set of genes. For more information, please see enrichGO.
atacGOAnalysis( atacProc, gene = NULL, OrgDb = NULL, keytype = "ENTREZID", ont = "MF", pvalueCutoff = 0.05, pAdjustMethod = "BH", universe = NULL, qvalueCutoff = 0.2, readable = FALSE, pool = FALSE, goOutput = NULL, ... ) ## S4 method for signature 'ATACProc' atacGOAnalysis( atacProc, gene = NULL, OrgDb = NULL, keytype = "ENTREZID", ont = "MF", pvalueCutoff = 0.05, pAdjustMethod = "BH", universe = NULL, qvalueCutoff = 0.2, readable = FALSE, pool = FALSE, goOutput = NULL, ... ) goanalysis( gene, OrgDb = NULL, keytype = "ENTREZID", ont = "MF", pvalueCutoff = 0.05, pAdjustMethod = "BH", universe = NULL, qvalueCutoff = 0.2, readable = FALSE, pool = FALSE, goOutput = NULL, ... )
atacGOAnalysis( atacProc, gene = NULL, OrgDb = NULL, keytype = "ENTREZID", ont = "MF", pvalueCutoff = 0.05, pAdjustMethod = "BH", universe = NULL, qvalueCutoff = 0.2, readable = FALSE, pool = FALSE, goOutput = NULL, ... ) ## S4 method for signature 'ATACProc' atacGOAnalysis( atacProc, gene = NULL, OrgDb = NULL, keytype = "ENTREZID", ont = "MF", pvalueCutoff = 0.05, pAdjustMethod = "BH", universe = NULL, qvalueCutoff = 0.2, readable = FALSE, pool = FALSE, goOutput = NULL, ... ) goanalysis( gene, OrgDb = NULL, keytype = "ENTREZID", ont = "MF", pvalueCutoff = 0.05, pAdjustMethod = "BH", universe = NULL, qvalueCutoff = 0.2, readable = FALSE, pool = FALSE, goOutput = NULL, ... )
atacProc |
|
gene |
A vector of entrez gene id. |
OrgDb |
Genome wide annotation databese. |
keytype |
Keytype of input gene. |
ont |
One of "MF", "BP", and "CC" subontologies. "MF" for molecular function, "BP" for biological process, "CC" for cellular component. |
pvalueCutoff |
pvalueCutoff. |
pAdjustMethod |
One of "holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr", "none". |
universe |
Background genes. |
qvalueCutoff |
qvalue cutoff. |
readable |
whether mapping gene ID to gene Name. |
pool |
If ont=’ALL’, whether pool 3 GO sub-ontologies. |
goOutput |
|
... |
Additional arguments, currently unused. |
This function using enrichGO to do GO
analysis but fixed some parameters. If atacProc is not NULL, it will read
the gene ID from the output of atacPeakAnno
.
An invisible ATACProc-class
object scalar.
Wei Zhang
Guangchuang Yu., Li-Gen Wang, Yanyan Han, Qing-Yu He. clusterProfiler: an R package for comparing biological themes among gene clusters. OMICS: A Journal of Integrative Biology. 2012, 16(5):284-287
atacPeakAnno
enrichGO function enrichGO in package
"clusterProfiler"
## Not run: library(org.Hs.eg.db) # generate simulated geneID geneId <- as.character(sample(seq(10000), 100)) goanalysis(gene = geneId, OrgDb = 'org.Hs.eg.db') ## End(Not run)
## Not run: library(org.Hs.eg.db) # generate simulated geneID geneId <- as.character(sample(seq(10000), 100)) goanalysis(gene = geneId, OrgDb = 'org.Hs.eg.db') ## End(Not run)
Search motif position in genome according thr given motif and peak information.
atacMotifScan( atacProc, peak = NULL, genome = NULL, motifs = NULL, p.cutoff = 1e-06, scanO.dir = NULL, prefix = NULL, ... ) ## S4 method for signature 'ATACProc' atacMotifScan( atacProc, peak = NULL, genome = NULL, motifs = NULL, p.cutoff = 1e-06, scanO.dir = NULL, prefix = NULL, ... ) motifscan( peak = NULL, genome = NULL, motifs = NULL, p.cutoff = 1e-06, scanO.dir = NULL, prefix = NULL, ... )
atacMotifScan( atacProc, peak = NULL, genome = NULL, motifs = NULL, p.cutoff = 1e-06, scanO.dir = NULL, prefix = NULL, ... ) ## S4 method for signature 'ATACProc' atacMotifScan( atacProc, peak = NULL, genome = NULL, motifs = NULL, p.cutoff = 1e-06, scanO.dir = NULL, prefix = NULL, ... ) motifscan( peak = NULL, genome = NULL, motifs = NULL, p.cutoff = 1e-06, scanO.dir = NULL, prefix = NULL, ... )
atacProc |
|
peak |
|
genome |
BSgenome object, Default: from |
motifs |
either |
p.cutoff |
p-value cutoff for returning motifs. |
scanO.dir |
|
prefix |
prefix for Output file. |
... |
Additional arguments, currently unused. |
This function scan motif position in a given genome regions.
An invisible ATACProc-class
object scalar for
downstream analysis.
Wei Zhang
atacPeakCalling
atacCutSiteCount
## Not run: library(R.utils) library(BSgenome.Hsapiens.UCSC.hg19) peak.path <- system.file("extdata", "Example_peak1.bed.bz2", package="esATAC") peak.path <- as.vector(bunzip2(filename = peak.path, destname = file.path(getwd(), "Example_peak1.bed"), ext="bz2", FUN=bzfile, overwrite=TRUE , remove = FALSE)) motif <- readRDS(system.file("extdata", "MotifPFM.rds", package="esATAC")) motifscan(peak = peak.path, genome = BSgenome.Hsapiens.UCSC.hg19, motifs = motif) ## End(Not run)
## Not run: library(R.utils) library(BSgenome.Hsapiens.UCSC.hg19) peak.path <- system.file("extdata", "Example_peak1.bed.bz2", package="esATAC") peak.path <- as.vector(bunzip2(filename = peak.path, destname = file.path(getwd(), "Example_peak1.bed"), ext="bz2", FUN=bzfile, overwrite=TRUE , remove = FALSE)) motif <- readRDS(system.file("extdata", "MotifPFM.rds", package="esATAC")) motifscan(peak = peak.path, genome = BSgenome.Hsapiens.UCSC.hg19, motifs = motif) ## End(Not run)
Search motif position in genome according thr given motif and peak information.
atacMotifScanPair( atacProc, peak1 = NULL, peak2 = NULL, background = NULL, genome = NULL, motifs = NULL, p.cutoff = 1e-04, scanO.dir = NULL, prefix = NULL, ... ) ## S4 method for signature 'ATACProc' atacMotifScanPair( atacProc, peak1 = NULL, peak2 = NULL, background = NULL, genome = NULL, motifs = NULL, p.cutoff = 1e-04, scanO.dir = NULL, prefix = NULL, ... ) motifscanpair( peak1 = NULL, peak2 = NULL, background = NULL, genome = NULL, motifs = NULL, p.cutoff = 1e-04, scanO.dir = NULL, prefix = NULL, ... )
atacMotifScanPair( atacProc, peak1 = NULL, peak2 = NULL, background = NULL, genome = NULL, motifs = NULL, p.cutoff = 1e-04, scanO.dir = NULL, prefix = NULL, ... ) ## S4 method for signature 'ATACProc' atacMotifScanPair( atacProc, peak1 = NULL, peak2 = NULL, background = NULL, genome = NULL, motifs = NULL, p.cutoff = 1e-04, scanO.dir = NULL, prefix = NULL, ... ) motifscanpair( peak1 = NULL, peak2 = NULL, background = NULL, genome = NULL, motifs = NULL, p.cutoff = 1e-04, scanO.dir = NULL, prefix = NULL, ... )
atacProc |
|
peak1 |
peak file path. |
peak2 |
peak file path. |
background |
background peak file path. |
genome |
BSgenome object, Default: from |
motifs |
either |
p.cutoff |
p-value cutoff for returning motifs. |
scanO.dir |
|
prefix |
prefix for Output file. Order: peak1, peak2, backgroud. |
... |
Additional arguments, currently unused. |
This function scan motif position in a given genome regions.
An invisible ATACProc-class
object scalar for
downstream analysis.
Wei Zhang
## Not run: library(R.utils) library(BSgenome.Hsapiens.UCSC.hg19) p1bz <- system.file("extdata", "Example_peak1.bed.bz2", package="esATAC") p2bz <- system.file("extdata", "Example_peak2.bed.bz2", package="esATAC") peak1_path <- as.vector(bunzip2(filename = p1bz, destname = file.path(getwd(), "Example_peak1.bed"), ext="bz2", FUN=bzfile, overwrite=TRUE , remove = FALSE)) peak2_path <- as.vector(bunzip2(filename = p2bz, destname = file.path(getwd(), "Example_peak2.bed"), ext="bz2", FUN=bzfile, overwrite=TRUE, remove = FALSE)) peakcom.output <- peakcomp(bedInput1 = peak1_path, bedInput2 = peak2_path, olap.rate = 0.1) motif <- readRDS(system.file("extdata", "MotifPFM.rds", package="esATAC")) output <- atacMotifScanPair(atacProc = peakcom.output, genome = BSgenome.Hsapiens.UCSC.hg19, motifs = motif) ## End(Not run)
## Not run: library(R.utils) library(BSgenome.Hsapiens.UCSC.hg19) p1bz <- system.file("extdata", "Example_peak1.bed.bz2", package="esATAC") p2bz <- system.file("extdata", "Example_peak2.bed.bz2", package="esATAC") peak1_path <- as.vector(bunzip2(filename = p1bz, destname = file.path(getwd(), "Example_peak1.bed"), ext="bz2", FUN=bzfile, overwrite=TRUE , remove = FALSE)) peak2_path <- as.vector(bunzip2(filename = p2bz, destname = file.path(getwd(), "Example_peak2.bed"), ext="bz2", FUN=bzfile, overwrite=TRUE, remove = FALSE)) peakcom.output <- peakcomp(bedInput1 = peak1_path, bedInput2 = peak2_path, olap.rate = 0.1) motif <- readRDS(system.file("extdata", "MotifPFM.rds", package="esATAC")) output <- atacMotifScanPair(atacProc = peakcom.output, genome = BSgenome.Hsapiens.UCSC.hg19, motifs = motif) ## End(Not run)
This function annotates ATAC-seq peak by a given annotation database.
For more information, please see annotatePeak
.
atacPeakAnno( atacProc, peakInput = NULL, tssRegion = c(-1000, 1000), TxDb = NULL, level = "transcript", genomicAnnotationPriority = c("Promoter", "5UTR", "3UTR", "Exon", "Intron", "Downstream", "Intergenic"), annoDb = NULL, addFlankGeneInfo = FALSE, flankDistance = 5000, sameStrand = FALSE, ignoreOverlap = FALSE, ignoreUpstream = FALSE, ignoreDownstream = FALSE, overlap = "TSS", annoOutput = NULL, ... ) ## S4 method for signature 'ATACProc' atacPeakAnno( atacProc, peakInput = NULL, tssRegion = c(-1000, 1000), TxDb = NULL, level = "transcript", genomicAnnotationPriority = c("Promoter", "5UTR", "3UTR", "Exon", "Intron", "Downstream", "Intergenic"), annoDb = NULL, addFlankGeneInfo = FALSE, flankDistance = 5000, sameStrand = FALSE, ignoreOverlap = FALSE, ignoreUpstream = FALSE, ignoreDownstream = FALSE, overlap = "TSS", annoOutput = NULL, ... ) peakanno( peakInput, tssRegion = c(-1000, 1000), TxDb = NULL, level = "transcript", genomicAnnotationPriority = c("Promoter", "5UTR", "3UTR", "Exon", "Intron", "Downstream", "Intergenic"), annoDb = NULL, addFlankGeneInfo = FALSE, flankDistance = 5000, sameStrand = FALSE, ignoreOverlap = FALSE, ignoreUpstream = FALSE, ignoreDownstream = FALSE, overlap = "TSS", annoOutput = NULL, ... )
atacPeakAnno( atacProc, peakInput = NULL, tssRegion = c(-1000, 1000), TxDb = NULL, level = "transcript", genomicAnnotationPriority = c("Promoter", "5UTR", "3UTR", "Exon", "Intron", "Downstream", "Intergenic"), annoDb = NULL, addFlankGeneInfo = FALSE, flankDistance = 5000, sameStrand = FALSE, ignoreOverlap = FALSE, ignoreUpstream = FALSE, ignoreDownstream = FALSE, overlap = "TSS", annoOutput = NULL, ... ) ## S4 method for signature 'ATACProc' atacPeakAnno( atacProc, peakInput = NULL, tssRegion = c(-1000, 1000), TxDb = NULL, level = "transcript", genomicAnnotationPriority = c("Promoter", "5UTR", "3UTR", "Exon", "Intron", "Downstream", "Intergenic"), annoDb = NULL, addFlankGeneInfo = FALSE, flankDistance = 5000, sameStrand = FALSE, ignoreOverlap = FALSE, ignoreUpstream = FALSE, ignoreDownstream = FALSE, overlap = "TSS", annoOutput = NULL, ... ) peakanno( peakInput, tssRegion = c(-1000, 1000), TxDb = NULL, level = "transcript", genomicAnnotationPriority = c("Promoter", "5UTR", "3UTR", "Exon", "Intron", "Downstream", "Intergenic"), annoDb = NULL, addFlankGeneInfo = FALSE, flankDistance = 5000, sameStrand = FALSE, ignoreOverlap = FALSE, ignoreUpstream = FALSE, ignoreDownstream = FALSE, overlap = "TSS", annoOutput = NULL, ... )
atacProc |
|
peakInput |
|
tssRegion |
Region range of TSS, default:c(-1000, 1000). |
TxDb |
TxDb object, annotation database. |
level |
"transcript" or "gene". |
genomicAnnotationPriority |
genomic annotation priority. |
annoDb |
Gene annotation database. |
addFlankGeneInfo |
logical, add flanking gene information from the peaks. |
flankDistance |
distance of flanking sequence. |
sameStrand |
logical, whether find nearest/overlap gene in the same strand. |
ignoreOverlap |
logical, whether ignore overlap of TSS with peak. |
ignoreUpstream |
logical, if True only annotate gene at the 3' of the peak. |
ignoreDownstream |
logical, if True only annotate gene at the 5' of the peak. |
overlap |
one of 'TSS' or 'all', if overlap="all", then gene overlap with peak will be reported as nearest gene, no matter the overlap is at TSS region or not. |
annoOutput |
|
... |
Additional arguments, currently unused. |
An invisible ATACProc-class
object scalar for
downstream analysis.
Wei Zhang
Guangchuang Yu, Li-Gen Wang, Qing-Yu He. ChIPseeker: an R/Bioconductor package for ChIP peak annotation, comparison and visualization. Bioinformatics 2015, 31(14):2382-2383
atacPeakCalling
atacGOAnalysis
library(R.utils) library(TxDb.Hsapiens.UCSC.hg19.knownGene) p1bz <- system.file("extdata", "Example_peak1.bed.bz2", package="esATAC") peak1_path <- as.vector(bunzip2(filename = p1bz, destname = file.path(getwd(), "Example_peak1.bed"), ext="bz2", FUN=bzfile, overwrite=TRUE, remove = FALSE)) #peakanno(peakInput = peak1_path, TxDb = TxDb.Hsapiens.UCSC.hg19.knownGene, #annoDb = 'org.Hs.eg.db')
library(R.utils) library(TxDb.Hsapiens.UCSC.hg19.knownGene) p1bz <- system.file("extdata", "Example_peak1.bed.bz2", package="esATAC") peak1_path <- as.vector(bunzip2(filename = p1bz, destname = file.path(getwd(), "Example_peak1.bed"), ext="bz2", FUN=bzfile, overwrite=TRUE, remove = FALSE)) #peakanno(peakInput = peak1_path, TxDb = TxDb.Hsapiens.UCSC.hg19.knownGene, #annoDb = 'org.Hs.eg.db')
This function compares two peak file and report overlap or differential peaks according to the parameter "operation".
atacPeakComp( atacProcPeak1, atacProcPeak2, bedInput1 = NULL, bedInput2 = NULL, bedOutput = NULL, olap.rate = 0.2, ... ) ## S4 method for signature 'ATACProc' atacPeakComp( atacProcPeak1, atacProcPeak2, bedInput1 = NULL, bedInput2 = NULL, bedOutput = NULL, olap.rate = 0.2, ... ) peakcomp( bedInput1 = NULL, bedInput2 = NULL, bedOutput = NULL, olap.rate = 0.2, ... )
atacPeakComp( atacProcPeak1, atacProcPeak2, bedInput1 = NULL, bedInput2 = NULL, bedOutput = NULL, olap.rate = 0.2, ... ) ## S4 method for signature 'ATACProc' atacPeakComp( atacProcPeak1, atacProcPeak2, bedInput1 = NULL, bedInput2 = NULL, bedOutput = NULL, olap.rate = 0.2, ... ) peakcomp( bedInput1 = NULL, bedInput2 = NULL, bedOutput = NULL, olap.rate = 0.2, ... )
atacProcPeak1 |
|
atacProcPeak2 |
|
bedInput1 |
|
bedInput2 |
|
bedOutput |
The output file path for overlap peaks. |
olap.rate |
Overlap rate, if the overlap region between 2 peak is more than this rate of the short peak, these two peak are considered to be overlap and will be merged to a bigger peak. Default: 0.2. NOTICE: multi-peak will be merged together! |
... |
Additional arguments, currently unused. |
An invisible ATACProc-class
object scalar for
downstream analysis.
Wei Zhang
library(R.utils) p1bz <- system.file("extdata", "Example_peak1.bed.bz2", package="esATAC") p2bz <- system.file("extdata", "Example_peak2.bed.bz2", package="esATAC") ## Not run: peak1_path <- as.vector(bunzip2(filename = p1bz, destname = file.path(getwd(), "Example_peak1.bed"), ext="bz2", FUN=bzfile, overwrite=TRUE , remove = FALSE)) peak2_path <- as.vector(bunzip2(filename = p2bz, destname = file.path(getwd(), "Example_peak2.bed"), ext="bz2", FUN=bzfile, overwrite=TRUE, remove = FALSE)) output <- peakcomp(bedInput1 = peak1_path, bedInput2 = peak2_path, olap.rate = 0.1) ## End(Not run)
library(R.utils) p1bz <- system.file("extdata", "Example_peak1.bed.bz2", package="esATAC") p2bz <- system.file("extdata", "Example_peak2.bed.bz2", package="esATAC") ## Not run: peak1_path <- as.vector(bunzip2(filename = p1bz, destname = file.path(getwd(), "Example_peak1.bed"), ext="bz2", FUN=bzfile, overwrite=TRUE , remove = FALSE)) peak2_path <- as.vector(bunzip2(filename = p2bz, destname = file.path(getwd(), "Example_peak2.bed"), ext="bz2", FUN=bzfile, overwrite=TRUE, remove = FALSE)) output <- peakcomp(bedInput1 = peak1_path, bedInput2 = peak2_path, olap.rate = 0.1) ## End(Not run)
Find snps(user providing) in given regions. This function do not consider strand.
atacSNPAnno( atacProc, snp.info = NULL, region.info = NULL, annoOutput = NULL, ... ) ## S4 method for signature 'ATACProc' atacSNPAnno( atacProc, snp.info = NULL, region.info = NULL, annoOutput = NULL, ... ) snpanno(snp.info = NULL, region.info = NULL, annoOutput = NULL, ...)
atacSNPAnno( atacProc, snp.info = NULL, region.info = NULL, annoOutput = NULL, ... ) ## S4 method for signature 'ATACProc' atacSNPAnno( atacProc, snp.info = NULL, region.info = NULL, annoOutput = NULL, ... ) snpanno(snp.info = NULL, region.info = NULL, annoOutput = NULL, ...)
atacProc |
|
snp.info |
|
region.info |
|
annoOutput |
|
... |
withend Your snp data has only one position column or 2. |
An invisible ATACProc-class
object scalar.
Wei Zhang
library(R.utils) p1bz <- system.file("extdata", "Example_peak1.bed.bz2", package="esATAC") peak1_path <- as.vector(bunzip2(filename = p1bz, destname = file.path(getwd(), "Example_peak1.bed"), ext="bz2", FUN=bzfile, overwrite=TRUE, remove = FALSE)) snps <- system.file("extdata", "snp_info", package="esATAC") #snpanno(snp.info = snps, region.info = peak1_path)
library(R.utils) p1bz <- system.file("extdata", "Example_peak1.bed.bz2", package="esATAC") peak1_path <- as.vector(bunzip2(filename = p1bz, destname = file.path(getwd(), "Example_peak1.bed"), ext="bz2", FUN=bzfile, overwrite=TRUE, remove = FALSE)) snps <- system.file("extdata", "snp_info", package="esATAC") #snpanno(snp.info = snps, region.info = peak1_path)
Sort bamfile and build index.
atacBamSort(atacProc, bamInput = NULL, bamOutput = NULL, ...) ## S4 method for signature 'ATACProc' atacBamSort(atacProc, bamInput = NULL, bamOutput = NULL, ...) bamsort(bamInput = NULL, bamOutput = NULL, ...)
atacBamSort(atacProc, bamInput = NULL, bamOutput = NULL, ...) ## S4 method for signature 'ATACProc' atacBamSort(atacProc, bamInput = NULL, bamOutput = NULL, ...) bamsort(bamInput = NULL, bamOutput = NULL, ...)
atacProc |
|
bamInput |
|
bamOutput |
|
... |
Additional arguments, currently unused. |
An invisible ATACProc-class
object scalar for
downstream analysis.
Wei Zhang
library(Rsamtools) ex1_file <- system.file("extdata", "ex1.bam", package="Rsamtools") bamsort(bamInput = ex1_file)
library(Rsamtools) ex1_file <- system.file("extdata", "ex1.bam", package="Rsamtools") bamsort(bamInput = ex1_file)
This function convert a sam file into a bam file.
atacSam2Bam(atacProc, samInput = NULL, bamOutput = NULL, isSort = TRUE, ...) ## S4 method for signature 'ATACProc' atacSam2Bam(atacProc, samInput = NULL, bamOutput = NULL, isSort = TRUE, ...) sam2bam(samInput, bamOutput = NULL, isSort = TRUE, ...)
atacSam2Bam(atacProc, samInput = NULL, bamOutput = NULL, isSort = TRUE, ...) ## S4 method for signature 'ATACProc' atacSam2Bam(atacProc, samInput = NULL, bamOutput = NULL, isSort = TRUE, ...) sam2bam(samInput, bamOutput = NULL, isSort = TRUE, ...)
atacProc |
|
samInput |
|
bamOutput |
|
isSort |
|
... |
Additional arguments, currently unused. |
The parameter related to input and output file path
will be automatically
obtained from ATACProc-class
object(atacProc
) or
generated based on known parameters
if their values are default(e.g. NULL
).
Otherwise, the generated values will be overwrited.
If you want to use this function independently,
you can use bamToBed
instead.
An invisible ATACProc-class
object scalar for
downstream analysis.
Wei Zhang
atacBowtie2Mapping
atacBam2Bed
atacBamSort
library(R.utils) sam_bz <- system.file("extdata", "Example.sam.bz2", package="esATAC") sam_path <- as.vector(bunzip2(filename = sam_bz, destname = file.path(getwd(), "Example.sam"), ext="bz2", FUN=bzfile, remove = FALSE)) sam2bam(samInput = sam_path)
library(R.utils) sam_bz <- system.file("extdata", "Example.sam.bz2", package="esATAC") sam_path <- as.vector(bunzip2(filename = sam_bz, destname = file.path(getwd(), "Example.sam"), ext="bz2", FUN=bzfile, remove = FALSE)) sam2bam(samInput = sam_path)
This function is used to convert SAM file to BED file and merge interleave paired end reads, shift reads, filter reads according to chromosome, filter reads according to fragment size, sort, remove duplicates reads before generating BED file.
atacSamToBed( atacProc, reportOutput = NULL, merge = c("auto", "yes", "no"), posOffset = +4, negOffset = -5, chrFilterList = "chrM", samInput = NULL, bedOutput = NULL, sortBed = TRUE, minFragLen = 0, maxFragLen = 100, saveExtLen = FALSE, uniqueBed = TRUE, ... ) ## S4 method for signature 'ATACProc' atacSamToBed( atacProc, reportOutput = NULL, merge = c("auto", "yes", "no"), posOffset = +4, negOffset = -5, chrFilterList = "chrM", samInput = NULL, bedOutput = NULL, sortBed = TRUE, minFragLen = 0, maxFragLen = 100, saveExtLen = FALSE, uniqueBed = TRUE, ... ) samToBed( samInput, reportOutput = NULL, merge = c("auto", "yes", "no"), posOffset = +4, negOffset = -5, chrFilterList = "chrM", bedOutput = NULL, sortBed = TRUE, minFragLen = 0, maxFragLen = 100, saveExtLen = FALSE, uniqueBed = TRUE, ... )
atacSamToBed( atacProc, reportOutput = NULL, merge = c("auto", "yes", "no"), posOffset = +4, negOffset = -5, chrFilterList = "chrM", samInput = NULL, bedOutput = NULL, sortBed = TRUE, minFragLen = 0, maxFragLen = 100, saveExtLen = FALSE, uniqueBed = TRUE, ... ) ## S4 method for signature 'ATACProc' atacSamToBed( atacProc, reportOutput = NULL, merge = c("auto", "yes", "no"), posOffset = +4, negOffset = -5, chrFilterList = "chrM", samInput = NULL, bedOutput = NULL, sortBed = TRUE, minFragLen = 0, maxFragLen = 100, saveExtLen = FALSE, uniqueBed = TRUE, ... ) samToBed( samInput, reportOutput = NULL, merge = c("auto", "yes", "no"), posOffset = +4, negOffset = -5, chrFilterList = "chrM", bedOutput = NULL, sortBed = TRUE, minFragLen = 0, maxFragLen = 100, saveExtLen = FALSE, uniqueBed = TRUE, ... )
atacProc |
|
reportOutput |
|
merge |
|
posOffset |
|
negOffset |
|
chrFilterList |
|
samInput |
|
bedOutput |
|
sortBed |
|
minFragLen |
|
maxFragLen |
|
saveExtLen |
|
uniqueBed |
|
... |
Additional arguments, currently unused. |
The parameter related to input and output file path
will be automatically
obtained from ATACProc-class
object(atacProc
) or
generated based on known parameters
if their values are default(e.g. NULL
).
Otherwise, the generated values will be overwrited.
If you want to use this function independently,
you can use samToBed
instead.
An invisible ATACProc-class
object scalar for downstream analysis.
Zheng Wei
atacBowtie2Mapping
bowtie2Mapping
atacFragLenDistr
atacExtractCutSite
atacPeakCalling
atacBedUtils
atacTSSQC
atacBedToBigWig
library(R.utils) library(magrittr) td <- tempdir() setTmpDir(td) sambzfile <- system.file(package="esATAC", "extdata", "Example.sam.bz2") samfile <- file.path(td,"Example.sam") bunzip2(sambzfile,destname=samfile,overwrite=TRUE,remove=FALSE) samToBed(samInput = samfile)
library(R.utils) library(magrittr) td <- tempdir() setTmpDir(td) sambzfile <- system.file(package="esATAC", "extdata", "Example.sam.bz2") samfile <- file.path(td,"Example.sam") bunzip2(sambzfile,destname=samfile,overwrite=TRUE,remove=FALSE) samToBed(samInput = samfile)
When user call all steps in the pipeline, the final report can be generated.
atacSingleRepReport(prevStep, htmlOutput = NULL, createHTML = TRUE, ...) ## S4 method for signature 'Step' atacSingleRepReport(prevStep, htmlOutput = NULL, createHTML = TRUE, ...)
atacSingleRepReport(prevStep, htmlOutput = NULL, createHTML = TRUE, ...) ## S4 method for signature 'Step' atacSingleRepReport(prevStep, htmlOutput = NULL, createHTML = TRUE, ...)
prevStep |
|
htmlOutput |
|
createHTML |
|
... |
Additional arguments, currently unused. |
The report is HTML format. All link in HTML file is the relative directory in report step folder and other step folder If user want to move HTML file and keep all link access available, they should move the whole pipeline folder at the same time.
An invisible ATACProc-class
object (Step-class
based) scalar for downstream analysis.
Zheng Wei
These functions are used to generate the reads coverage plot around TSS.
atacTSSQC( atacProc, txdbKnownGene = NULL, bsgenome = NULL, reportPrefix = NULL, bedInput = NULL, fragLenRange = c(0, 2000), tssUpdownstream = 1000, newStepType = "TSSQC", ... ) ## S4 method for signature 'ATACProc' atacTSSQC( atacProc, txdbKnownGene = NULL, bsgenome = NULL, reportPrefix = NULL, bedInput = NULL, fragLenRange = c(0, 2000), tssUpdownstream = 1000, newStepType = "TSSQC", ... ) tssQC( bedInput, txdbKnownGene = NULL, bsgenome = NULL, reportPrefix = NULL, fragLenRange = c(0, 2000), tssUpdownstream = 1000, newStepType = "TSSQC", ... )
atacTSSQC( atacProc, txdbKnownGene = NULL, bsgenome = NULL, reportPrefix = NULL, bedInput = NULL, fragLenRange = c(0, 2000), tssUpdownstream = 1000, newStepType = "TSSQC", ... ) ## S4 method for signature 'ATACProc' atacTSSQC( atacProc, txdbKnownGene = NULL, bsgenome = NULL, reportPrefix = NULL, bedInput = NULL, fragLenRange = c(0, 2000), tssUpdownstream = 1000, newStepType = "TSSQC", ... ) tssQC( bedInput, txdbKnownGene = NULL, bsgenome = NULL, reportPrefix = NULL, fragLenRange = c(0, 2000), tssUpdownstream = 1000, newStepType = "TSSQC", ... )
atacProc |
|
txdbKnownGene |
|
bsgenome |
|
reportPrefix |
|
bedInput |
|
fragLenRange |
|
tssUpdownstream |
|
newStepType |
|
... |
Additional arguments, currently unused. |
The parameter related to input and output file path
will be automatically
obtained from ATACProc-class
object(atacProc
) or
generated based on known parameters
if their values are default(e.g. NULL
).
Otherwise, the generated values will be overwrited.
If you want to use this function independently,
atacProc
should be set NULL
or you can use tssQC
instead.
An invisible ATACProc-class
object scalar for downstream analysis.
Zheng Wei
atacSamToBed
samToBed
atacBedUtils
bedUtils
library(R.utils) td <- tempdir() setTmpDir(td) bedbzfile <- system.file(package="esATAC", "extdata", "chr20.50000.bed.bz2") bedfile <- file.path(td,"chr20.50000.bed") bunzip2(bedbzfile,destname=bedfile,overwrite=TRUE,remove=FALSE) library(TxDb.Hsapiens.UCSC.hg19.knownGene) library(BSgenome.Hsapiens.UCSC.hg19) tssQC(bedfile,TxDb.Hsapiens.UCSC.hg19.knownGene,BSgenome.Hsapiens.UCSC.hg19,fragLenRange=c(180,247)) dir(td)
library(R.utils) td <- tempdir() setTmpDir(td) bedbzfile <- system.file(package="esATAC", "extdata", "chr20.50000.bed.bz2") bedfile <- file.path(td,"chr20.50000.bed") bunzip2(bedbzfile,destname=bedfile,overwrite=TRUE,remove=FALSE) library(TxDb.Hsapiens.UCSC.hg19.knownGene) library(BSgenome.Hsapiens.UCSC.hg19) tssQC(bedfile,TxDb.Hsapiens.UCSC.hg19.knownGene,BSgenome.Hsapiens.UCSC.hg19,fragLenRange=c(180,247)) dir(td)
Unzip and merge fastq files that are in format of bzip, gzip or fastq
atacUnzipAndMerge( fastqInput1, fastqInput2 = NULL, fastqOutput1 = NULL, fastqOutput2 = NULL, interleave = FALSE, ... ) unzipAndMerge( fastqInput1, fastqInput2 = NULL, fastqOutput1 = NULL, fastqOutput2 = NULL, interleave = FALSE, ... )
atacUnzipAndMerge( fastqInput1, fastqInput2 = NULL, fastqOutput1 = NULL, fastqOutput2 = NULL, interleave = FALSE, ... ) unzipAndMerge( fastqInput1, fastqInput2 = NULL, fastqOutput1 = NULL, fastqOutput2 = NULL, interleave = FALSE, ... )
fastqInput1 |
|
fastqInput2 |
|
fastqOutput1 |
|
fastqOutput2 |
|
interleave |
|
... |
Additional arguments, currently unused. |
An invisible ATACProc-class
object scalar for downstream analysis.
Zheng Wei
ignoreCheck() # warnning: run this for fast test only td<-tempdir() setTmpDir(td) # Identify adapters prefix<-system.file(package="esATAC", "extdata", "uzmg") (reads_1 <-file.path(prefix,"m1",dir(file.path(prefix,"m1")))) (reads_2 <-file.path(prefix,"m2",dir(file.path(prefix,"m2")))) reads_merged_1 <- file.path(td,"reads_1.fq") reads_merged_2 <- file.path(td,"reads_2.fq") atacproc <- atacUnzipAndMerge(fastqInput1 = reads_1,fastqInput2 = reads_2) dir(td)
ignoreCheck() # warnning: run this for fast test only td<-tempdir() setTmpDir(td) # Identify adapters prefix<-system.file(package="esATAC", "extdata", "uzmg") (reads_1 <-file.path(prefix,"m1",dir(file.path(prefix,"m1")))) (reads_2 <-file.path(prefix,"m2",dir(file.path(prefix,"m2")))) reads_merged_1 <- file.path(td,"reads_1.fq") reads_merged_2 <- file.path(td,"reads_2.fq") atacproc <- atacUnzipAndMerge(fastqInput1 = reads_1,fastqInput2 = reads_2) dir(td)