| Title: | The R package to curate and merge enriched genomic regions into consensus peak sets |
|---|---|
| Description: | peakCombiner is a fully R-based, user-friendly, transparent, and customizable tool that allows even novice R users to create a high-quality consensus peak list. The modularity of its functions makes it easy to optimize input and output data. A broad range of input data formats is accepted, and the resulting consensus peak set can be exported to a file or used as the starting point for most downstream peak analyses. |
| Authors: | Markus Muckenhuber [aut, cre] (ORCID: <https://orcid.org/0000-0003-1897-2329>), Charlotte Soneson [aut] (ORCID: <https://orcid.org/0000-0003-3833-2169>), Michael Stadler [aut] (ORCID: <https://orcid.org/0000-0002-2269-4934>), Kathleen Sprouffske [aut] (ORCID: <https://orcid.org/0000-0001-7081-2598>), Novartis Biomedical Research [cph] |
| Maintainer: | Markus Muckenhuber <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 1.3.0 |
| Built: | 2026-05-30 07:18:56 UTC |
| Source: | https://github.com/bioc/peakCombiner |
centerExpandRegions() is an optional step that re-defines the
genomic regions by expanding them from their center. The center information
has to be stored in the input data column center, while the information for
the expansion can either be user provided or input data derived. The accepted
input is a data frame created from prepareInputRegions().
Please see prepareInputRegions() for more details.
centerExpandRegions(
  data,
  centerBy = "center_column",
  expandBy = NULL,
  genome = NA,
  trim_start = TRUE,
  outputFormat = "GenomicRanges",
  showMessages = TRUE
)
data |
PeakCombiner data frame structure with required columns
named |
centerBy |
Allowed values are 'center_column' (default) or 'midpoint'.
|
expandBy |
Allowed values are a numeric vector of length 1 or 2, or 'NULL' (default).
|
genome |
Character value to define the matching genome reference to
the input data. Default value is NA. Allowed values are
based on GenomicRanges supported genomes like "GRCh38",
"GRCh38.p13", "Amel_HAv3.1", "WBcel235", "TAIR10.1",
"hg38", "mm10", "rn6", "bosTau9", "canFam3", "musFur1",
"galGal6","dm6", "ce11", and "sacCer3". Please see also
help for |
trim_start |
Logical value of TRUE or FALSE (default). If TRUE, and
no valid reference genome are provided in |
outputFormat |
Character value to define format of output object. Accepted values are "GenomicRanges" (default), "tibble" or "data.frame". |
showMessages |
Logical value of TRUE (default) or FALSE. Defines if info messages are displayed or not. |
This is an optional function that resizes the genomic regions based on the
input peakCombiner standard data frame and the options you select. An
expected input data frame contains the following columns with the names:
chrom, start, end, name, score, strand, center, sample_name.
Such a data frame is created by the function
prepareInputRegions. This step is useful if you want all of
your peaks to be the same size for your downstream analyses. In addition, if
you want to use the "summit" information, typically reported by some peak
callers (e.g., MACS2), this function allows you to automatically center your
regions of interest on these summits. This enables you to capture
information about the most important region within a genomic region (e.g.,
TF-binding site or highest peak) and put that region in the center of your
downstream analyses (e.g., applicable to motif-finding or "heatmaps"
summarizing multiple genomic regions).
There are two concepts that are relevant for centerExpandRegions: how to define the center, and how much to expand from the center.
We recommend preparing your input regions with the function
prepareInputRegions provided by this package. It pre-populates
the center column with the absolute genomic coordinate of the
center of the peak region. You can either define the center by
using pre-defined summit information (e.g., obtained from a peak caller like
MACS2) or re-compute the arithmetic center of each region and save that
value in the column center. (For details, see the help for
prepareInputRegions().)
You can choose to expand the genomic region from the center either symmetrically or asymmetrically (different lengths before and after the center position).
In the symmetrical case, if you want to choose the size of your genomic
regions based on the input data, this function can calculate the median
peak size across all of your genomic regions and use that value (expandBy
= NULL). Alternatively, you can provide a numeric vector to
define the expansion. A numeric vector with one value expands
symmetrically, while a vector with two values expands
asymmetrically.
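The expansion logic described above can be sketched in base R. This is only an illustration, not the package's implementation: `expand_from_center()` is a hypothetical helper, the toy coordinates are made up, and two details are assumptions that this help page does not state, namely that a single value is applied on each side of the center and that starts are clamped to 1.

```r
# Toy regions with a precomputed absolute center (values are made up).
regions <- data.frame(
  chrom  = c("chr1", "chr1", "chr2"),
  start  = c(100, 500, 1000),
  end    = c(300, 900, 1100),
  center = c(180, 700, 1050)
)

# Hypothetical helper, not part of peakCombiner.
expand_from_center <- function(df, expand_by = NULL) {
  if (is.null(expand_by)) {
    # Data-driven case: use the median region width like a length-1 value.
    expand_by <- stats::median(df$end - df$start + 1)
  }
  if (length(expand_by) == 1) {
    # Symmetric case: same distance before and after the center (assumption).
    expand_by <- rep(expand_by, 2)
  }
  # Asymmetric case: first value before, second value after the center.
  df$start <- pmax(df$center - expand_by[1], 1) # clamp to 1 (assumption)
  df$end   <- df$center + expand_by[2]
  df
}

sym  <- expand_from_center(regions, expand_by = 250)
asym <- expand_from_center(regions, expand_by = c(100, 600))
med  <- expand_from_center(regions)
```

After expansion with a single value, every region that was not clamped at position 1 has the same width, which is the property that makes the output convenient for heatmaps or motif searches.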
A tibble with the columns chrom, start, end, name, score,
strand, center, sample_name. The definitions of these columns are
described in full in the prepareInputRegions Details.
Use as input for functions filterRegions() and
combineRegions().
# Load and prepare an accepted tibble
utils::data(syn_data_bed)

# Prepare input data
data_prepared <- prepareInputRegions(
  data = syn_data_bed,
  outputFormat = "tibble",
  showMessages = TRUE
)

# Run center and expand
data_center_expand <- centerExpandRegions(
  data = data_prepared,
  centerBy = "center_column",
  expandBy = NULL,
  outputFormat = "tibble",
  showMessages = TRUE
)

data_center_expand

# You can choose to use the midpoint and predefined values to expand
data_center_expand <- centerExpandRegions(
  data = data_prepared,
  centerBy = "midpoint",
  expandBy = c(100, 600),
  outputFormat = "tibble",
  showMessages = FALSE
)

data_center_expand
This is a general helper function for the package peakCombiner. The aim of
this function is to check a data frame for the correct column names and
column classes to ensure that it is an accepted input for the functions
centerExpandRegions(), filterRegions() and
combineRegions().
checkDataStructure(data, showMessages = TRUE)
data |
A tibble with the columns |
showMessages |
Logical value of TRUE (default) or FALSE. Defines if info messages are displayed or not. |
A tibble with the columns chrom, start, end, name, score,
strand, center, sample_name. The definitions of these columns are
described in full in the Details below. Use as input for functions
centerExpandRegions(), filterRegions() and
combineRegions().
Helper function for main function prepareInputRegions.
Input data is checked for multiple entries of the same genomic
region. This can occur when using called peak files as multiple summits can
be annotated within the same genomic regions (defined by chrom, start
and end). To avoid multiple entries, this function checks the input for
multiple summits within the same region and keeps only the most strongly
enriched one (based on the values in the column score). This step is mandatory
to guarantee an optimal result.
For details see the details for prepareInputRegions.
collapseSummits(data)
data |
A tibble with the columns |
A tibble with the columns chrom, start, end, name, score,
strand, center, sample_name. The definitions of these columns are
described in full in the Details below. Use as input for functions
centerExpandRegions(), filterRegions() and
combineRegions().
combineRegions is the main function of this package and combines overlapping genomic regions from different samples to create a single set of consensus genomic regions.
The accepted input is the PeakCombiner data frame created by the function prepareInputRegions, which may optionally already have been centered and expanded and/or filtered using centerExpandRegions and filterRegions, respectively. Please see prepareInputRegions for more details.
combineRegions(
  data,
  foundInSamples = 2,
  combinedCenter = "nearest",
  removeFlankOverlaps = TRUE,
  annotateWithInputNames = FALSE,
  combinedSampleName = NULL,
  outputFormat = "GenomicRanges",
  showMessages = TRUE
)
data |
PeakCombiner data frame structure with required columns
named |
foundInSamples |
Only include genomic regions that are found
in at least |
combinedCenter |
Defines how the column 'center' will be
populated for each genomic region in the output
data. Allowed options are
* |
removeFlankOverlaps |
TRUE (default) / FALSE. If TRUE, the combined regions are checked for an overlap with an input summit. Regions without such an overlap are considered false positive regions caused by an artificial overlap of neighboring regions due to the expansion step. If FALSE, this step will be skipped. |
annotateWithInputNames |
TRUE / FALSE (default). If TRUE, a new column named 'input_names' is created in the output data that is populated for each combined genomic region with the 'name's of all contributing input regions. If the column 'input_names' already exists, it will be overwritten. |
combinedSampleName |
Optionally defines how the column 'sample_name' is populated for the output data. If not used, then the default is to simply concatenate all input sample_names into a single comma-separated string |
outputFormat |
Character value to define format of output object. Accepted values are "GenomicRanges" (default), "tibble" or "data.frame". |
showMessages |
Logical value of TRUE (default) or FALSE. Defines if info messages are displayed or not. |
combineRegions creates a set of consensus genomic regions by combining overlapping genomic regions from different samples. The general steps within this function are:
Identify overlapping genomic regions from the input samples
Retain overlapping genomic regions that are found in at least
foundInSamples samples. In this way, you can remove rare or
sample-specific regions.
Note that overlapping genomic regions must contain at least one 'center' from its input sample regions to be considered a valid genomic region.
As you can use the output data from this step again (e.g., to center and expand the new set of consensus regions), we must define the 'center', 'score', 'sample_name', and 'name' values for the new genomic regions. We do this as follows:
'center' is defined by the combinedCenter parameter, which has three
options.
* middle - the mathematical center of the new region
* strongest - the 'center' of the input region that has
the highest 'score' of all overlapping input
regions
* nearest - the 'center' of the input region that is closest
to the mean of the 'center's of all overlapping
input regions (default)
'score' is the score of the genomic region from the sample whose
'center' was used, or the mean of the 'score's if middle was selected
for the combinedCenter parameter.
'sample_name' can be user defined (combinedSampleName) or is a
concatenated string of all input 'sample_names' (default).
'name' is created by combining 'sample_name' and row number to create a unique identifier for each newly created genomic region.
Note, the output data.frame columns sample_name, name and score
will be updated.
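The three combinedCenter options can be illustrated with a small base R sketch. It is an approximation written from the description above, not the package's code: `pick_center()` is a hypothetical helper, the coordinates and scores are made up, and rounding the middle down with `floor()` is an assumption.

```r
# Centers and scores of the input regions that merged into one consensus
# region (made-up values).
overlapping <- data.frame(
  center = c(120, 150, 170),
  score  = c(40, 60, 95)
)
consensus_start <- 100
consensus_end   <- 260

# Hypothetical helper, not part of peakCombiner.
pick_center <- function(inputs, start, end, combined_center = "nearest") {
  switch(combined_center,
    middle = list(
      # Mathematical center of the new region; score is the mean score.
      center = floor((start + end) / 2),
      score  = mean(inputs$score)
    ),
    strongest = {
      # Center of the input region with the highest score.
      i <- which.max(inputs$score)
      list(center = inputs$center[i], score = inputs$score[i])
    },
    nearest = {
      # Center of the input region closest to the mean of all centers.
      i <- which.min(abs(inputs$center - mean(inputs$center)))
      list(center = inputs$center[i], score = inputs$score[i])
    },
    stop("unknown option: ", combined_center)
  )
}

mid_res  <- pick_center(overlapping, consensus_start, consensus_end, "middle")
str_res  <- pick_center(overlapping, consensus_start, consensus_end, "strongest")
near_res <- pick_center(overlapping, consensus_start, consensus_end, "nearest")
```

In this toy case the three options give three different centers, which shows why the choice matters if the consensus regions are later re-centered and expanded.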
A tibble with the columns chrom, start, end, name, score,
strand, center, sample_name, and optionally input_names.
The definitions of these columns are
described in full in the Details below. Use as input for functions
centerExpandRegions and filterRegions.
# Load and prepare an accepted tibble
utils::data(syn_data_bed)

data_prepared <- prepareInputRegions(
  data = syn_data_bed,
  outputFormat = "tibble",
  showMessages = FALSE
)

# Let's combine the input data by defining all potential options
combineRegions(
  data = data_prepared,
  foundInSamples = 2,
  combinedCenter = "nearest",
  annotateWithInputNames = TRUE,
  combinedSampleName = "consensus",
  outputFormat = "tibble",
  showMessages = TRUE
)
Helper function for the main function combineRegions. Requires an in-memory data frame in the standard accepted format for the peakCombiner package. For details, see the Details for combineRegions.
crAddSummit(
  data,
  input,
  combinedCenter = "nearest",
  annotateWithInputNames = FALSE,
  combinedSampleName = NULL
)
data |
PeakCombiner data frame structure with required columns
named |
input |
The original input file from |
combinedCenter |
Defines how the column 'center' will be
populated for each genomic region in the output
data. Allowed options are
* |
annotateWithInputNames |
TRUE / FALSE (default). If TRUE, a new column named 'input_names' is created in the output data that is populated for each combined genomic region with the 'name's of all contributing input regions. If the column 'input_names' already exists, it will be overwritten. |
combinedSampleName |
Optionally defines how the column 'sample_name' is populated for the output data. If not used, then the default is to simply concatenate all input sample_names into a single comma-separated string |
As you can use the output data from this step again (e.g., to center and expand the new set of consensus regions), we must define the 'center', 'score', 'sample_name', and 'name' values for the new genomic regions. We do this as follows:
'center' is defined by the combinedCenter parameter, which has three
options.
* middle - the mathematical center of the new region
* strongest - the 'center' of the input region that has
the highest 'score' of all overlapping input
regions
* nearest - the 'center' of the input region that is closest
to the mean of the 'center's of all overlapping
input regions (default)
'score' is the score of the genomic region from the sample whose
'center' was used, or the mean of the 'score's if middle was selected
for the combinedCenter parameter.
'sample_name' is a concatenated string of all input sample_names
In addition, the output data.frame columns sample_name, name and score
will be updated.
A tibble with the following columns: chrom, start, end, name,
score, strand, center, sample_name.
Helper function for the main function combineRegions. Requires an in-memory data frame in the standard accepted format for the peakCombiner package. For details, see the Details for combineRegions.
crDisjoinFilter(data, foundInSamples)
data |
PeakCombiner data frame structure with required columns
named |
foundInSamples |
Only include genomic regions that are found
in at least |
Retain overlapping genomic regions that are found in at least
foundInSamples samples. In this way, you can remove rare or
sample-specific regions.
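The idea of keeping only regions supported by at least foundInSamples samples can be sketched in base R for a single chromosome. This is a simplified illustration under assumptions, not the package's code: the intervals are made up, coordinates are treated as 1-based and closed, and the disjoint segments are built by hand rather than with a genomic ranges library.

```r
# Made-up regions on one chromosome, from three samples.
regions <- data.frame(
  start       = c(100, 150, 400, 180),
  end         = c(200, 260, 500, 300),
  sample_name = c("s1", "s2", "s1", "s3")
)
found_in_samples <- 2

# Every start and every (end + 1) is a breakpoint between disjoint segments.
breaks <- sort(unique(c(regions$start, regions$end + 1)))
segments <- data.frame(
  start = head(breaks, -1),
  end   = tail(breaks, -1) - 1
)

# Count the distinct samples whose regions fully cover each segment.
segments$n_samples <- vapply(seq_len(nrow(segments)), function(i) {
  hit <- regions$start <= segments$start[i] & regions$end >= segments$end[i]
  length(unique(regions$sample_name[hit]))
}, integer(1))

# Keep segments supported by enough samples.
kept <- segments[segments$n_samples >= found_in_samples, ]
```

Here the kept segments are adjacent, so merging them again would give one consensus region spanning positions 150 to 260.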
A tibble with the following columns: chrom, start, end,
width, strand, revmap, ranking_comb_ref, name, rowname_disjoin.
Helper function for the main function combineRegions. Requires an in-memory data frame in the standard accepted format for the peakCombiner package. For details, see the Details for combineRegions.
crOverlapWithSummits(data, input, removeFlankOverlaps = TRUE)
data |
PeakCombiner data frame structure with required columns
named |
input |
The original input file from |
removeFlankOverlaps |
TRUE (default) / FALSE. If TRUE, the combined regions are checked for an overlap with an input summit. Regions without such an overlap are considered false positive regions caused by an artificial overlap of neighboring regions due to the expansion step. If FALSE, this step will be skipped. |
Overlapping genomic regions must contain at least one 'center' from their input sample regions to be considered valid genomic regions. Regions without such an overlap might be a consequence of the expansion parameter and are likely to be false positives.
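A minimal base R sketch of this validity check is shown below. It assumes a single chromosome and made-up coordinates, and it is not the package's implementation; the object names are hypothetical.

```r
# Candidate consensus regions and the centers of all input regions
# (made-up values, one chromosome).
consensus     <- data.frame(start = c(150, 700), end = c(260, 760))
input_centers <- c(160, 220, 650, 800)

# A region is kept only if at least one input center falls inside it.
has_center <- vapply(seq_len(nrow(consensus)), function(i) {
  any(input_centers >= consensus$start[i] & input_centers <= consensus$end[i])
}, logical(1))

valid <- consensus[has_center, ]
```

The second region contains no input center, so it would be treated as an artifact of overlapping flanks and dropped.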
A tibble with the following columns: chrom, start, end,
width, strand, name.
Helper function for the main function combineRegions. Requires an in-memory data frame in the standard accepted format for the peakCombiner package. For details, see the Details for combineRegions.
crReduce(data)
data |
PeakCombiner data frame structure with required columns
named |
Recombine filtered genomic regions from disjoin function to create the consensus regions.
A tibble with the following columns: chrom, start, end,
width, strand, name.
Calculates the parameter expandBy when it was set to 'NULL' in the main
function. 'NULL' allows for data-driven definition of the expandBy value.
It calculates the median genomic region size of the input data and uses this
value like a length 1 numeric vector for expansion.
defineExpansion(data = data, expandBy = expandBy)
data |
PeakCombiner data frame structure with required columns
named |
expandBy |
Allowed values are a numeric vector of length 1 or 2, or 'NULL' (default).
|
A vector of length 1 to define region expansion.
filterRegions is an optional step that allows inclusion or exclusion of genomic regions based on 4 different criteria:
Include regions by their chromosome names (optional).
Exclude blacklisted regions (optional).
Include regions above a given score (optional).
Include top n regions per sample, ranked from highest to lowest score (optional).
The accepted input is the PeakCombiner data frame created by the function prepareInputRegions. Please see prepareInputRegions for more details.
filterRegions can be used multiple times on the same data set, which allows a user to optimize the selection criteria for regions of interest step by step.
filterRegions(
  data,
  includeByChromosomeName = NULL,
  excludeByBlacklist = NULL,
  includeAboveScoreCutoff = NULL,
  includeTopNScoring = NULL,
  outputFormat = "GenomicRanges",
  showMessages = TRUE
)
data |
PeakCombiner data frame structure with required columns
named |
includeByChromosomeName |
|
excludeByBlacklist |
|
includeAboveScoreCutoff |
|
includeTopNScoring |
|
outputFormat |
Character value to define format of output object. Accepted values are "GenomicRanges" (default), "tibble" or "data.frame". |
showMessages |
Logical value of TRUE (default) or FALSE. Defines if info messages are displayed or not. |
This is an optional step which enables commonly-needed filtering steps to focus in on the key genomic regions of interest. This can be useful when there are many genomic regions identified in your peak-caller or input BED files.
filterRegions can be used multiple times on the same data set, allowing a user to select regions of interest using a step-wise optimization approach.
includeByChromosomeName - Retains only chromosomes that are in the
provided vector. By not including
mitochondrial, sex, or non-classical
chromosomes, genomic regions found on
these chromosomes can be removed. If set
to 'NULL' (default), this step will be
skipped (optional).
excludeByBlacklist - A GenomicRanges object, data frame or tibble
can be provided listing the genomic
regions to remove (having the column names chrom
(seqnames for GenomicRanges), start,
and end). If set to 'NULL'
(default), this step will be skipped
(optional).
Please note that if there are no matching
entries in the 'chrom' columns of input
and blacklist, an information message is
displayed. This can happen and does not
cause any problems.
includeAboveScoreCutoff - Single numeric value that defines the
score threshold above which all genomic
regions will be retained. The score
column in the peakCombiner input data
should be non-zero for this parameter to
be used. It is populated by
prepareInputRegions, and
by default takes the value of -log10(FDR)
if possible (e.g., using a .narrowPeak
file from MACS2 as input). Importantly,
applying this filter retains a variable
number of genomic regions per sample, all
having a score greater than the
includeAboveScoreCutoff parameter. If
set to 'NULL' (default), this step will
be skipped (optional).
includeTopNScoring - Single numeric value that defines how many
of the top scoring genomic regions (using
the column score) are retained. All
other genomic regions are discarded.
Importantly, applying this filter retains
includeTopNScoring regions per
sample, which means that the minimum
enrichment levels may vary between
samples. Note that if multiple genomic
regions have the same score cutoff
value, then all of those genomic regions
are included. In this case, the number of
resulting regions retained may be a bit
higher than the input parameter. If set to
'NULL' (default), this step will be
skipped (optional).
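The difference between the two score-based filters can be sketched in base R. This is an illustration only: `top_n_with_ties()` is a hypothetical helper, the scores are made up, and using a strict `>` for the cutoff is an assumption based on the wording "greater than" above.

```r
# Made-up scores for two samples.
peaks <- data.frame(
  sample_name = c("a", "a", "a", "a", "b", "b", "b"),
  score       = c(50, 30, 30, 10, 80, 20, 5)
)

# Score cutoff: one threshold for all samples, so the number of retained
# regions can differ between samples.
above <- peaks[peaks$score > 15, ]

# Top n per sample: the n-th highest score of each sample becomes that
# sample's cutoff, and ties at the cutoff are all kept.
top_n_with_ties <- function(df, n) {
  keep <- ave(df$score, df$sample_name, FUN = function(s) {
    cutoff <- sort(s, decreasing = TRUE)[min(n, length(s))]
    s >= cutoff
  })
  df[keep == 1, ]
}

top2 <- top_n_with_ties(peaks, 2)
```

Sample "a" ends up with three regions for n = 2 because two regions share the cutoff score, which matches the note above that the retained count can be slightly higher than requested.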
A tibble with the columns chrom, start, end, name, score,
strand, center, sample_name. The definitions of these columns are
described in full in the prepareInputRegions Details.
Use as input for functions centerExpandRegions and
combineRegions.
# Load and prepare an accepted tibble
utils::data(syn_data_bed)

data_prepared <- prepareInputRegions(
  data = syn_data_bed,
  outputFormat = "tibble",
  showMessages = TRUE
)

# Here use options for all four filtering methods.
filterRegions(
  data = data_prepared,
  includeByChromosomeName = c("chr1", "chr2", "chr4"),
  excludeByBlacklist = NULL,
  includeAboveScoreCutoff = 10,
  includeTopNScoring = 100,
  outputFormat = "tibble",
  showMessages = TRUE
)
prepareInputRegions prepares the input data in the format needed for all of the following steps within peakCombiner. It accepts the following formats:
in memory data frame listing each sample's peak file location,
in memory data frame listing the peaks themselves that are found in each sample, or
in memory GRanges object listing the peaks themselves that are found in each sample.
prepareInputRegions(
  data,
  outputFormat = "GenomicRanges",
  genome = NA,
  startsAreBased = 1,
  showMessages = TRUE
)
data |
Data frame or GRanges object with the input data. Several formats are accepted, which are described in full in the Details below.
|
outputFormat |
Character value to define format of output object. Accepted values are "GenomicRanges" (default), "tibble" or "data.frame". |
genome |
Character value to define the matching genome reference to
the input data. Default value is NA. Allowed values are
based on GenomicRanges supported genomes like "GRCh38",
"GRCh38.p13", "Amel_HAv3.1", "WBcel235", "TAIR10.1",
"hg38", "mm10", "rn6", "bosTau9", "canFam3", "musFur1",
"galGal6","dm6", "ce11", and "sacCer3". Please see also
help for |
startsAreBased |
Either 0, 1 (default), or NA. Defines whether the provided input data is 0-based or 1-based. Only if the parameter is NA are GenomicRanges objects, tibbles and data frames considered 1-based, while data loaded from a sample sheet is considered 0-based (a BED file is expected). |
showMessages |
Logical value of TRUE (default) or FALSE. Defines if info messages are displayed or not. |
Accepted inputs are one of the three following options:
In memory data frame listing each sample's peak file location
sample_name - Unique name for each sample
(required).
file_path - Path to the file in which the genomic regions are
stored. For example, the path to a bed file or
.narrowPeak file (required).
file_format - The expected file format. Needed to correctly label the
columns of the input. Acceptable values are:
bed, narrowPeak, and broadPeak (required).
score_colname - Either the name or the number of the column holding
the metric used to rank peak importance, where bigger
values are more important. Entries have to be identical;
multiple entries are not supported. If not provided,
column 9 will be used for .narrowPeak or
.broadPeak file formats. Column 9 corresponds to
the qValue as described in the UCSC documentation.
Other alternatives for narrowPeak or broadPeak
could be columns 7 or 8, which correspond to
signalValue or pValue (optional).
In memory data frame listing the peaks themselves that are found in each sample. The columns can be provided in any order and have the following names. Note that additional columns will be dropped.
chrom - chromosome name (required).
start - start coordinate of range (1-based coordinate system,
NOT like bed files which are 0-based) (required).
end - end coordinate of range (required).
sample_name - unique identifier for a sample. No restrictions on
characters (required).
score - the metric used to rank peak importance, where bigger values
are more important. For example, qValue from MACS2,
-log10FDR from another method, or fold enrichment over
background computed from your favorite method. If not
provided, defaults to 0 (optional).
strand - values are '+', '-', or '.'. If not provided, defaults to '.'
(optional).
summit - distance of the strongest signal ("summit") of the peak
region from the start coordinate (optional).
In memory GRanges object listing the peaks themselves that are found in
each sample. This object is very similar to the data frame above,
except that chrom, start, and end are instead described using
the GRanges nomenclature. Note that additional columns will be dropped.
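As a rough illustration of the first two input options, the sketch below builds a sample sheet and an in-memory peak table in base R. The file paths, sample names, and all numeric values are hypothetical placeholders, and the column names follow the lists above.

```r
# Option 1: sample sheet pointing at peak files (paths are hypothetical).
sample_sheet <- data.frame(
  sample_name   = c("control-rep1", "treatment-rep1"),
  file_path     = c("path/to/control_rep1.narrowPeak",
                    "path/to/treatment_rep1.narrowPeak"),
  file_format   = "narrowPeak",
  score_colname = 9 # same score column for every sample
)

# Option 2: the peaks themselves, 1-based coordinates, made-up values.
peaks <- data.frame(
  chrom       = c("chr1", "chr1", "chr2"),
  start       = c(101, 501, 1001),
  end         = c(300, 900, 1100),
  sample_name = c("control-rep1", "control-rep1", "treatment-rep1"),
  score       = c(35.2, 12.8, 50.1),  # optional
  summit      = c(80, 150, 40)        # optional, offset from start
)
```

Either object would then be passed as the data argument of prepareInputRegions().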
This function parses the inputs provided and returns a data frame having the columns listed below.
chrom - chromosome name
start - start coordinate of range (1-based coordinate system,
NOT like bed files which are 0-based)
end - end coordinate of range
name - unique identifier for a region. auto-generated by this
function
score - the metric used to rank peak importance, where bigger
values are more important. For example, qValue from MACS2,
-log10FDR from another method, or fold enrichment over
background computed from your favorite method
strand - values are '+', '-', or '.'. Chromatin data are typically
non-stranded so will have a '.'.
center - absolute genomic coordinate of the nucleotide at the
center of the peak region, or alternatively the strongest
signal ("summit") of the peak region. If no value is
provided by the user, center defaults to the arithmetic
center of the peak region.
sample_name - unique identifier for a sample. No restrictions on
characters
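How the center column and the 1-based start coordinates might be derived can be sketched in base R. The exact formulas used by the package are not given here, so the two derivations below are assumptions for illustration: the absolute center is taken as start plus the summit offset, falling back to the rounded-down midpoint, and a 0-based BED start is converted by adding 1.

```r
# Made-up 1-based regions; NA marks a region without summit information.
peaks <- data.frame(
  start  = c(101, 501),
  end    = c(300, 900),
  summit = c(80, NA)
)

# Absolute center: summit offset if available, otherwise the midpoint
# (both formulas are assumptions, see the note above).
midpoint     <- floor((peaks$start + peaks$end) / 2)
peaks$center <- ifelse(is.na(peaks$summit),
                       midpoint,
                       peaks$start + peaks$summit)

# BED files are 0-based and half-open, so only the start shifts by one
# when converting to 1-based closed coordinates.
bed       <- data.frame(start = c(100, 500), end = c(300, 900))
one_based <- transform(bed, start = start + 1)
```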
In addition, input data is checked for multiple entries of the same genomic
region. This can occur when using called peak files, as multiple summits can
be annotated within the same genomic region (defined by chrom, start
and end). To avoid multiple entries, this function checks the input for
multiple summits within the same region and keeps only the most strongly
enriched one (based on the values in the column score). This step is mandatory
to guarantee an optimal result.
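The collapsing of duplicate regions can be sketched in base R as follows. This is not the package's implementation, only one simple way to keep the highest-scoring entry per region, and the values are made up.

```r
# Two summits were called inside chr1:100-300 (made-up values).
peaks <- data.frame(
  chrom  = c("chr1", "chr1", "chr1", "chr2"),
  start  = c(100, 100, 400, 100),
  end    = c(300, 300, 600, 300),
  score  = c(12, 30, 8, 5),
  center = c(150, 220, 500, 200)
)

# Sort by descending score, then keep the first row of each region.
ordered   <- peaks[order(-peaks$score), ]
region_id <- ordered[, c("chrom", "start", "end")]
collapsed <- ordered[!duplicated(region_id), ]

# Restore genomic order for readability.
collapsed <- collapsed[order(collapsed$chrom, collapsed$start), ]
```

The region chr1:100-300 survives once, with the score and center of its stronger summit.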
Optionally, you can already provide a genome here (see the genome argument
for details) and keep this information for the function
centerExpandRegions().
A tibble with the columns chrom, start, end, name, score,
strand, center, sample_name. The definitions of these columns are
described in full in the Details below. Use as input for functions
centerExpandRegions(), filterRegions() and
combineRegions().
# Load and prepare an accepted tibble
utils::data(syn_data_tibble)

data_prepared <- prepareInputRegions(
  data = syn_data_tibble,
  outputFormat = "tibble",
  showMessages = TRUE
)

data_prepared

# Or a pre-loaded tibble with genomic regions and named columns.
utils::data(syn_data_control01)
utils::data(syn_data_treatment01)

combined_input <- syn_data_control01 |>
  dplyr::mutate(sample_name = "control-rep1") |>
  rbind(syn_data_treatment01 |>
    dplyr::mutate(sample_name = "treatment-rep1"))

prepareInputRegions(
  data = combined_input,
  outputFormat = "tibble",
  showMessages = FALSE
)
Synthetic example blacklisted regions file as tibble with columns "chrom", "start", and "end".
data(syn_blacklist)
syn_blacklist A tibble with 2 rows and 3 columns:
Created for R package peakCombiner.
Synthetic example data set as minimal required input file with columns "chrom", "start", "end", "name", "score", "strand", "signalValue", "pValue", "qValue" and "peak".
data(syn_control_rep1_narrowPeak)
syn_control_rep1_narrowPeak A tibble with 11 rows and 6 columns:
Created for R package peakCombiner.
Synthetic example data set as minimal required input file with columns "chrom", "start", "end", and "sample_name".
data(syn_data_bed)
syn_data_bed A tibble with 55 rows and 4 columns:
Created for R package peakCombiner.
Synthetic example data set as minimal required input file with columns "chrom", "start", "end", "score", "strand", and "center".
data(syn_data_control01)
syn_data_control01 A tibble with 11 rows and 6 columns:
Created for R package peakCombiner.
Synthetic example data set from GenomicRanges object with columns "seqnames", "start", "end", "width", "strand", "score", "center", and "sample_name".
data(syn_data_granges)
syn_data_granges A data frame with 55 rows and 8 columns:
Created for R package peakCombiner.
Synthetic example data set as tibble with columns "chrom", "start", "end", "name", "score", "strand" , "center", and "sample_name".
data(syn_data_tibble)
syn_data_tibble A tibble with 55 rows and 8 columns:
Created for R package peakCombiner.
Synthetic example data set as minimal required input file with columns "chrom", "start", "end", "score", "strand", and "center".
data(syn_data_treatment01)
syn_data_treatment01 A tibble with 10 rows and 6 columns:
Created for R package peakCombiner.
Synthetic example sample sheet as tibble with columns "sample_name", "file_path", "file_format", and "score_colname".
data(syn_sample_sheet)
syn_sample_sheet A tibble with 6 rows and 4 columns.
Created for R package peakCombiner.
Synthetic example data set as minimal required input file with columns "chrom", "start", "end", "name", "score", "strand", "signalValue", "pValue", "qValue" and "peak".
data(syn_treatment_rep1_narrowPeak)
syn_treatment_rep1_narrowPeak A tibble with 11 rows and 6 columns:
Created for R package peakCombiner.