Title: | A/B compartment detection and differential analysis |
---|---|
Description: | HiCDOC normalizes intrachromosomal Hi-C matrices, uses unsupervised learning to predict A/B compartments from multiple replicates, and detects significant compartment changes between experiment conditions. It provides a collection of functions assembled into a pipeline to filter and normalize the data, predict the compartments and visualize the results. It accepts several type of data: tabular `.tsv` files, Cooler `.cool` or `.mcool` files, Juicer `.hic` files or HiC-Pro `.matrix` and `.bed` files. |
Authors: | Kurylo Cyril [aut], Zytnicki Matthias [aut], Foissac Sylvain [aut], Maigné Élise [aut, cre] |
Maintainer: | Maigné Élise <[email protected]> |
License: | LGPL-3 |
Version: | 1.9.2 |
Built: | 2024-12-13 03:17:08 UTC |
Source: | https://github.com/bioc/HiCDOC |
HiCDOC normalizes intrachromosomal Hi-C matrices, uses unsupervised learning to predict A/B compartments from multiple replicates, and detects significant compartment changes between experiment conditions. It provides a collection of functions assembled into a pipeline to filter and normalize the data, predict the compartments and visualize the results. It accepts several type of data: tabular '.tsv' files, Cooler '.cool' or '.mcool' files, Juicer '.hic' files or HiC-Pro '.matrix' and '.bed' files.
Maintainer: Maigné Élise [email protected]
Authors:
Kurylo Cyril [email protected]
Zytnicki Matthias [email protected]
Foissac Sylvain [email protected]
Useful links:
Detects compartments for each genomic position in each condition, and computes p-values for compartment differences between conditions.
detectCompartments( object, parallel = FALSE, kMeansDelta = NULL, kMeansIterations = NULL, kMeansRestarts = NULL, PC1CheckThreshold = NULL )
detectCompartments( object, parallel = FALSE, kMeansDelta = NULL, kMeansIterations = NULL, kMeansRestarts = NULL, PC1CheckThreshold = NULL )
object |
|
parallel |
Whether or not to parallelize the processing. Defaults to FALSE See 'Details'. |
kMeansDelta |
The convergence stop criterion for the clustering. When the centroids'
distances between two iterations is lower than this value, the clustering
stops. Defaults to |
kMeansIterations |
The maximum number of iterations during clustering. Defaults to
|
kMeansRestarts |
The amount of times the clustering is restarted. For each restart, the
clustering iterates until convergence or reaching the maximum number of
iterations. The clustering that minimizes inner-cluster variance is selected.
Defaults to |
PC1CheckThreshold |
The minimum percentage of variance that should be explained by
the first principal component of centroids to pass sanity check.
Defaults to |
To clusterize genomic positions, the algorithm follows these steps:
For each chromosome and condition, get the interaction vectors of each genomic position. Each genomic position can have multiple interaction vectors, corresponding to the multiple replicates in that condition.
For each chromosome and condition, use constrained K-means to clusterize the interaction vectors, forcing replicate interaction vectors into the same cluster. The euclidean distance between interaction vectors determines their similarity.
For each interaction vector, compute its concordance, which is the confidence in its assigned cluster. Mathematically, it is the log ratio of its distance to each centroid, normalized by the distance between both centroids, and min-maxed to a [-1,1] interval.
For each chromosome, compute the distance between all centroids and the centroids of the first condition. The cross-condition clusters whose centroids are closest are given the same cluster label. This results in two clusters per chromosome, spanning all conditions.
To match each cluster with an A or B compartment, the algorithm follows these steps:
For each genomic position, compute its self interaction ratio, which is the difference between its self interaction and the median of its other interactions.
For each chromosome, for each cluster, get the median self interaction ratio of the genomic positions in that cluster.
For each chromosome, the cluster with the smallest median self interaction ratio is matched with compartment A, and the cluster with the greatest median self interaction ratio is matched with compartment B. Compartment A being open, there are more overall interactions between distant genomic positions, so it is assumed that the difference between self interactions and other interactions is lower than in compartment B.
To find significant compartment differences across conditions, and compute their p-values, the algorithm follows three steps:
For each pair of replicates in different conditions, for each genomic position, compute the absolute difference between its concordances.
For each pair of conditions, for each genomic position, compute the median of its concordance differences.
For each pair of conditions, for each genomic position whose assigned compartment switches, rank its median against the empirical cumulative distribution of medians of all non-switching positions in that condition pair. Adjust the resulting p-value with the Benjamini–Hochberg procedure.
The parallel version of detectCompartments uses the
bpmapply
function. Before to call the
function in parallel you should specify the parallel parameters such as:
On Linux:
multiParam <- BiocParallel::MulticoreParam(workers = 10)
On Windows:
multiParam <- BiocParallel::SnowParam(workers = 10)
And then you can register the parameters to be used by BiocParallel:
BiocParallel::register(multiParam, default = TRUE)
You should be aware that using MulticoreParam, reproducibility of the detectCompartments function using a RNGseed may not work. See this issue for more details.
A HiCDOCDataSet
, with compartments, concordances, distances,
centroids, and differences.
data(exampleHiCDOCDataSet) ## Run all filtering and normalization steps (not run for timing reasons) # object <- filterSmallChromosomes(exampleHiCDOCDataSet) # object <- filterSparseReplicates(object) # object <- filterWeakPositions(object) # object <- normalizeTechnicalBiases(object) # object <- normalizeBiologicalBiases(object) # object <- normalizeDistanceEffect(object) # Detect compartments and differences across conditions object <- detectCompartments(exampleHiCDOCDataSet)
data(exampleHiCDOCDataSet) ## Run all filtering and normalization steps (not run for timing reasons) # object <- filterSmallChromosomes(exampleHiCDOCDataSet) # object <- filterSparseReplicates(object) # object <- filterWeakPositions(object) # object <- normalizeTechnicalBiases(object) # object <- normalizeBiologicalBiases(object) # object <- normalizeDistanceEffect(object) # Detect compartments and differences across conditions object <- detectCompartments(exampleHiCDOCDataSet)
A S4 HiCDOCDataSet object with 4 chromosomes, 3 conditions and 3 replicates.
data(exampleHiCDOCDataSet)
data(exampleHiCDOCDataSet)
S4 HiCDOCDataSet object with the following characteristics:
4 chromosomes: W, X, Y, Z
3 conditions: 1, 2, 3
3 replicates: R1, R2, R3
A resolution of 137 bases
data(exampleHiCDOCDataSet) exampleHiCDOCDataSet
data(exampleHiCDOCDataSet) exampleHiCDOCDataSet
A S4 HiCDOCDataSet object with 3 chromosomes, 3 conditions and 3 replicates.
Can be retrieved by running :
data(exampleHiCDOCDataSet);
set.seed(123);
exampleHiCDOCDataSetProcessed <- HiCDOC(exampleHiCDOCDataSet)
data(exampleHiCDOCDataSetProcessed)
data(exampleHiCDOCDataSetProcessed)
S4 HiCDOCDataSet object with the following characteristics:
4 chromosomes: X, Y, Z
3 conditions: 1, 2, 3
3 replicates: R1, R2, R3
A resolution of 137 bases
A HiCDOCDataSet
, already filtered and normalized.
data(exampleHiCDOCDataSetProcessed) exampleHiCDOCDataSetProcessed
data(exampleHiCDOCDataSetProcessed) exampleHiCDOCDataSetProcessed
Removes chromosomes whose length (in number of positions) is smaller than the threshold.
filterSmallChromosomes(object, threshold = NULL)
filterSmallChromosomes(object, threshold = NULL)
object |
|
threshold |
The minimum length (number of positions) for a chromosome to be kept.
Defaults to |
A filtered HiCDOCDataSet
.
filterSparseReplicates
,
filterWeakPositions
,
HiCDOC
data(exampleHiCDOCDataSet) object <- exampleHiCDOCDataSet chromosomes(object) object <- filterSmallChromosomes(object) chromosomes(object)
data(exampleHiCDOCDataSet) object <- exampleHiCDOCDataSet chromosomes(object) object <- filterSmallChromosomes(object) chromosomes(object)
Removes chromosome replicates whose percentage of non-zero interactions is smaller than the threshold.
filterSparseReplicates(object, threshold = NULL)
filterSparseReplicates(object, threshold = NULL)
object |
|
threshold |
The minimum percentage of non-zero interactions for a chromosome replicate to
be kept. If a chromosome replicate's percentage of non-zero interactions is
lower than this value, it is removed. Defaults to
|
A filtered HiCDOCDataSet
.
filterSmallChromosomes
,
filterWeakPositions
,
HiCDOC
data(exampleHiCDOCDataSet) object <- exampleHiCDOCDataSet object <- filterSparseReplicates(object)
data(exampleHiCDOCDataSet) object <- exampleHiCDOCDataSet object <- filterSparseReplicates(object)
Removes weak genomic positions whose interactions average is lower than the threshold.
filterWeakPositions(object, threshold = NULL)
filterWeakPositions(object, threshold = NULL)
object |
|
threshold |
The minimum average interaction for a position to be kept. If a position's
average interaction with the entire chromosome is lower than this value in
any of the replicates, it is removed from all replicates and conditions.
Defaults to |
Detects weak genomic positions in each replicate, and removes them from all
replicates to guarantee comparability across conditions when calling
detectCompartments
.
A filtered HiCDOCDataSet
.
filterSmallChromosomes
,
filterSparseReplicates
,
HiCDOC
data(exampleHiCDOCDataSet) object <- exampleHiCDOCDataSet object <- filterWeakPositions(object)
data(exampleHiCDOCDataSet) object <- exampleHiCDOCDataSet object <- filterWeakPositions(object)
Runs the default filtering, normalization, and computational steps on a
HiCDOCDataSet
. To learn more about HiCDOC, browse the vignette:
browseVignettes(package = "HiCDOC")
.
HiCDOC(object, parallel = FALSE)
HiCDOC(object, parallel = FALSE)
object |
A |
parallel |
Whether or not to parallelize each step. Defaults to FALSE. |
HiCDOC
pipelineThe HiCDOC pipeline has seven steps: \
Three filtering steps:
filterSmallChromosomes
to filter out
small chromosomes
filterWeakPositions
to filter out weak
positions with very few interactions
filterSparseReplicates
to filter out
sparse replicates with many null interactions
Three normalization steps:
normalizeTechnicalBiases
to normalize
technical biases in each replicates
normalizeBiologicalBiases
to normalize
biological biases in each replicate
normalizeDistanceEffect
to normalize the
distance effect in each chromosome
One computational step:
detectCompartments
to detect compartments
in each condition and find significant changes between
conditions.
The parallel version of HiCDOC uses the
BiocParallel
package. Before to call the
function in parallel you should specify the parallel parameters such as:
On Linux:
multiParam <- BiocParallel::MulticoreParam(workers = 10)
On Windows:
multiParam <- BiocParallel::SnowParam(workers = 10)
And then you can register the parameters to be used by BiocParallel: \
BiocParallel::register(multiParam, default = TRUE)
You should be aware that using MulticoreParam, reproducibility of the detectCompartments function using a RNGseed may not work. See this issue for more details.
A HiCDOCDataSet with all slots filled.
HiCDOCDataSet
, filterSmallChromosomes
,
filterWeakPositions
, filterSparseReplicates
,
normalizeTechnicalBiases
,
normalizeBiologicalBiases
,
normalizeDistanceEffect
,
detectCompartments
data(exampleHiCDOCDataSet) # Default HiCDOC pipeline # Not printing loess warnings for example purpose. # Results should be inspected if there is any. suppressWarnings( object <- HiCDOC(exampleHiCDOCDataSet) ) # Equivalent to if(FALSE){ object <- filterSmallChromosomes(exampleHiCDOCDataSet) object <- filterSparseReplicates(object) object <- filterWeakPositions(object) object <- normalizeTechnicalBiases(object) object <- normalizeBiologicalBiases(object) object <- normalizeDistanceEffect(object) object <- detectCompartments(object) }
data(exampleHiCDOCDataSet) # Default HiCDOC pipeline # Not printing loess warnings for example purpose. # Results should be inspected if there is any. suppressWarnings( object <- HiCDOC(exampleHiCDOCDataSet) ) # Equivalent to if(FALSE){ object <- filterSmallChromosomes(exampleHiCDOCDataSet) object <- filterSparseReplicates(object) object <- filterWeakPositions(object) object <- normalizeTechnicalBiases(object) object <- normalizeBiologicalBiases(object) object <- normalizeDistanceEffect(object) object <- detectCompartments(object) }
HiCDOCDataSet
S4 class.Data structure for a Hi-C experiment.
An instance of HiCDOCDataSet
describes a Hi-C experiment with slots
for path(s) to input file(s), interactions, pipeline parameters defaulting to
defaultHiCDOCParameters
, and computation results. It can be
constructed from 4 different types of data:
Tabular files: see HiCDOCDataSetFromTabular
(m)Cool files: see HiCDOCDataSetFromCool
HiC files: see HiCDOCDataSetFromHiC
HiC-Pro matrices and bed files: see HiCDOCDataSetFromHiCPro
An example HiCDOCDataSet
is also available, see
exampleHiCDOCDataSet
.
The HiCDOCDataSet
object can be explored using the appropriate
accessors.
The accessors for a HiCDOCDataset object are the following:
chromosomes
to retrieve the vector of chromosome names.
sampleConditions
to retrieve the vector of condition names,
one for each sample.
sampleReplicates
to retrieve the vector of replicate names,
one for each sample.
After the detection of compartments you can use this accessors:
compartments
returns a GenomicRange of the compartment
of every position in every condition.
concordances
returns a GenomicRange of the significant
compartment differences between conditions, and their p-values.
differences
returns a GenomicRange of the concordance
(confidence in assigned compartment) of every position in every replicate.
See the HiCDOCDataSet-methods man page for more details on methods and accessors.
HiCDOC
,
exampleHiCDOCDataSet
,
HiCDOCDataSetFromTabular
,
HiCDOCDataSetFromCool
,
HiCDOCDataSetFromHiC
,
HiCDOCDataSetFromHiCPro
HiCDOCDataSet
components.Retrieve information and results from a HiCDOCDataSet
.
## S4 method for signature 'HiCDOCDataSet' chromosomes(object) ## S4 method for signature 'HiCDOCDataSet' sampleConditions(object) ## S4 method for signature 'HiCDOCDataSet' sampleReplicates(object) ## S4 method for signature 'HiCDOCDataSet' compartments(object, passChecks = TRUE) ## S4 method for signature 'HiCDOCDataSet' differences(object, threshold = NULL) ## S4 method for signature 'HiCDOCDataSet' concordances(object, passChecks = TRUE) ## S4 method for signature 'HiCDOCDataSet' show(object)
## S4 method for signature 'HiCDOCDataSet' chromosomes(object) ## S4 method for signature 'HiCDOCDataSet' sampleConditions(object) ## S4 method for signature 'HiCDOCDataSet' sampleReplicates(object) ## S4 method for signature 'HiCDOCDataSet' compartments(object, passChecks = TRUE) ## S4 method for signature 'HiCDOCDataSet' differences(object, threshold = NULL) ## S4 method for signature 'HiCDOCDataSet' concordances(object, passChecks = TRUE) ## S4 method for signature 'HiCDOCDataSet' show(object)
object |
a HiCDOCDataSet object |
passChecks |
logical. Display only the concordances/compartments for the chromosomes passing sanity checks. |
threshold |
a numeric value between 0 and 1. If no threshold, all the differences will
be printed even the non significant ones. Otherwise the differences printed
are filtered to show the ones with an adjusted p-value <= |
A character vector (for chromosomes
, sampleConditions
,
sampleReplicates
),
or a GRanges object
(for compartments
, concordances
, differences
).
chromosomes()
: Retrieves the vector of chromosome names.
sampleConditions()
: Retrieves the vector of condition names, one for each sample.
sampleReplicates()
: Retrieves the vector of replicate names, one for each sample.
compartments()
: Retrieves a GenomicRange
of the compartment of every position
in every condition.
differences()
: Retrieves a GenomicRange
of the significant compartment differences
between conditions, and their p-values.
concordances()
: Retrieves a GenomicRange
of the concordance (confidence in assigned
compartment) of every position in every replicate.
# Load an example dataset already processed # (i.e. after the detection of compartments) data(exampleHiCDOCDataSetProcessed) exampleHiCDOCDataSetProcessed chromosomes(exampleHiCDOCDataSetProcessed) sampleConditions(exampleHiCDOCDataSetProcessed) sampleReplicates(exampleHiCDOCDataSetProcessed) compartments(exampleHiCDOCDataSetProcessed) differences(exampleHiCDOCDataSetProcessed) concordances(exampleHiCDOCDataSetProcessed)
# Load an example dataset already processed # (i.e. after the detection of compartments) data(exampleHiCDOCDataSetProcessed) exampleHiCDOCDataSetProcessed chromosomes(exampleHiCDOCDataSetProcessed) sampleConditions(exampleHiCDOCDataSetProcessed) sampleReplicates(exampleHiCDOCDataSetProcessed) compartments(exampleHiCDOCDataSetProcessed) differences(exampleHiCDOCDataSetProcessed) concordances(exampleHiCDOCDataSetProcessed)
HiCDOCDataSet
.Retrieves or sets parameters used for filtering, normalization, and prediciton of compartments.
defaultHiCDOCParameters ## S4 method for signature 'HiCDOCDataSet' parameters(object) ## S4 replacement method for signature 'HiCDOCDataSet' parameters(object) <- value
defaultHiCDOCParameters ## S4 method for signature 'HiCDOCDataSet' parameters(object) ## S4 replacement method for signature 'HiCDOCDataSet' parameters(object) <- value
object |
|
value |
a named list containing the names and valued of the parameters to change (see Details). |
An object of class list
of length 9.
A HiCDOCDataSet
's parameters are automatically set to default
values retrieved from defaultHiCDOCParameters
. They are
accessed by filtering, normalization, and compartment detection functions.
If those functions are called with custom arguments, the object's
parameters are updated to record the actual parameters used. If the
object's parameters are customized before calling the functions, the
custom parameters will be used.
See
filterSmallChromosomes
,
filterSparseReplicates
,
filterWeakPositions
,
normalizeDistanceEffect
, and
detectCompartments
,
for details on how these parameters are used.
smallChromosomeThreshold
The minimum length (number of positions) for a chromosome to be
kept when filtering with filterSmallChromosomes
.
Defaults to
defaultHiCDOCParameters$smallChromosomeThreshold
= 100.
sparseReplicateThreshold
The minimum percentage of non-zero interactions for a chromosome
replicate to be kept when filtering with
filterSparseReplicates
. If a chromosome replicate's
percentage of non-zero interactions is lower than this value, it
is removed. Defaults to
defaultHiCDOCParameters$smallChromosomeThreshold
= 30
weakPositionThreshold
The minimum average interaction for a position to be kept when
filtering with filterWeakPositions
. If a position's
average interaction with the entire chromosome is lower than this
value in any of the replicates, it is removed from all replicates
and conditions. Defaults to
defaultHiCDOCParameters$smallChromosomeThreshold
= 1.
cyclicLoessSpan
The span for cyclic loess normalization used in
normalizeTechnicalBiases
. This value is passed to
multiHiCcompare::cyclic_loess
.
Defaults to NA indicating that span will be automatically
calculated using generalized cross validation.
For large dataset, it is highly recommended to set this value
to reduce computing time and necessary memory.
loessSampleSize
The number of positions used as a sample to estimate the effect
of distance on proportion of interactions when normalizing with
normalizeDistanceEffect
. Defaults to
defaultHiCDOCParameters$loessSampleSize
= 20000.
kMeansDelta
The convergence stop criterion for the clustering when detecting
compartments with detectCompartments
. When the
centroids' distances between two iterations is lower than this
value, the clustering stops. Defaults to
defaultHiCDOCParameters$kMeansDelta
= 0.0001.
kMeansIterations
The maximum number of iterations during clustering when detecting
compartments with detectCompartments
. Defaults to
defaultHiCDOCParameters$kMeansIterations
= 50.
kMeansRestarts
The amount of times the clustering is restarted when detecting
compartments with detectCompartments
. For each
restart, the clustering iterates until convergence or reaching
the maximum number of iterations. The clustering that minimizes
inner-cluster variance is selected. Defaults to
defaultHiCDOCParameters$kMeansRestarts
= 20.
PC1CheckThreshold
The minimum percentage of variance that should be explained by
the first principal component of centroids to pass sanity check.
Defaults to
defaultHiCDOCParameters$PC1CheckThreshold
= 0.75
data(exampleHiCDOCDataSet) # Retrieve parameters parameters(exampleHiCDOCDataSet) # Set parameters parameters(exampleHiCDOCDataSet) <- list("smallChromosomeThreshold" = 50) parameters(exampleHiCDOCDataSet) <- list( "weakPositionThreshold" = 10, "kMeansRestarts" = 30 )
data(exampleHiCDOCDataSet) # Retrieve parameters parameters(exampleHiCDOCDataSet) # Set parameters parameters(exampleHiCDOCDataSet) <- list("smallChromosomeThreshold" = 50) parameters(exampleHiCDOCDataSet) <- list( "weakPositionThreshold" = 10, "kMeansRestarts" = 30 )
HiCDOCDataSet
constructor from Cool files.Constructs a HiCDOCDataSet
from a set of .cool
or
.mcool
files.
HiCDOCDataSetFromCool(paths, replicates, conditions, binSize = NA)
HiCDOCDataSetFromCool(paths, replicates, conditions, binSize = NA)
paths |
A vector of paths to |
replicates |
A vector of replicate names repeated along the conditions. |
conditions |
A vector of condition names repeated along the replicates. |
binSize |
The resolution (span of each position in number of bases). Optionally
provided to select the appropriate resolution in |
## Not run: # Path to each file paths = c( 'path/to/condition-1.replicate-1.cool', 'path/to/condition-1.replicate-2.cool', 'path/to/condition-2.replicate-1.cool', 'path/to/condition-2.replicate-2.cool', 'path/to/condition-3.replicate-1.cool' ) # Replicate and condition of each file. Can be names instead of numbers. replicates <- c(1, 2, 1, 2, 1) conditions <- c(1, 1, 2, 2, 3) # Resolution to select in .mcool files binSize = 500000 # Instantiation of data set object <- HiCDOCDataSetFromCool( paths, replicates = replicates, conditions = conditions, binSize = binSize # Specified for .mcool files. ) ## End(Not run)
## Not run: # Path to each file paths = c( 'path/to/condition-1.replicate-1.cool', 'path/to/condition-1.replicate-2.cool', 'path/to/condition-2.replicate-1.cool', 'path/to/condition-2.replicate-2.cool', 'path/to/condition-3.replicate-1.cool' ) # Replicate and condition of each file. Can be names instead of numbers. replicates <- c(1, 2, 1, 2, 1) conditions <- c(1, 1, 2, 2, 3) # Resolution to select in .mcool files binSize = 500000 # Instantiation of data set object <- HiCDOCDataSetFromCool( paths, replicates = replicates, conditions = conditions, binSize = binSize # Specified for .mcool files. ) ## End(Not run)
HiCDOCDataSet
constructor from HiC files.Constructs a HiCDOCDataSet
from a set of
.hic
files.
HiCDOCDataSetFromHiC(paths, replicates, conditions, binSize)
HiCDOCDataSetFromHiC(paths, replicates, conditions, binSize)
paths |
A vector of paths to |
replicates |
A vector of replicate names repeated along the conditions. |
conditions |
A vector of condition names repeated along the replicates. |
binSize |
The resolution (span of each position in number of bases) to select within
the |
## Not run: #' # Path to each file paths = c( 'path/to/condition-1.replicate-1.hic', 'path/to/condition-1.replicate-2.hic', 'path/to/condition-2.replicate-1.hic', 'path/to/condition-2.replicate-2.hic', 'path/to/condition-3.replicate-1.hic' ) # Replicate and condition of each file. Can be names instead of numbers. replicates <- c(1, 2, 1, 2, 1) conditions <- c(1, 1, 2, 2, 3) # Resolution to select binSize <- 500000 # Instantiation of data set hic.experiment <- HiCDOCDataSetFromHiC( paths, replicates = replicates, conditions = conditions, binSize = binSize ) ## End(Not run)
## Not run: #' # Path to each file paths = c( 'path/to/condition-1.replicate-1.hic', 'path/to/condition-1.replicate-2.hic', 'path/to/condition-2.replicate-1.hic', 'path/to/condition-2.replicate-2.hic', 'path/to/condition-3.replicate-1.hic' ) # Replicate and condition of each file. Can be names instead of numbers. replicates <- c(1, 2, 1, 2, 1) conditions <- c(1, 1, 2, 2, 3) # Resolution to select binSize <- 500000 # Instantiation of data set hic.experiment <- HiCDOCDataSetFromHiC( paths, replicates = replicates, conditions = conditions, binSize = binSize ) ## End(Not run)
HiCDOCDataSet
constructor from HiC-Pro files.Constructs a HiCDOCDataSet
from a set of HiC-Pro generated
files.
HiCDOCDataSetFromHiCPro(matrixPaths, bedPaths, replicates, conditions)
HiCDOCDataSetFromHiCPro(matrixPaths, bedPaths, replicates, conditions)
matrixPaths |
A vector of paths to HiC-Pro matrix files. |
bedPaths |
A vector of paths to HiC-Pro bed files. |
replicates |
A vector of replicate names repeated along the conditions. |
conditions |
A vector of condition names repeated along the replicates. |
## Not run: # Path to each matrix file matrixPaths = c( 'path/to/condition-1.replicate-1.matrix', 'path/to/condition-1.replicate-2.matrix', 'path/to/condition-2.replicate-1.matrix', 'path/to/condition-2.replicate-2.matrix', 'path/to/condition-3.replicate-1.matrix' ) # Path to each bed file bedPaths = c( 'path/to/condition-1.replicate-1.bed', 'path/to/condition-1.replicate-2.bed', 'path/to/condition-2.replicate-1.bed', 'path/to/condition-2.replicate-2.bed', 'path/to/condition-3.replicate-1.bed' ) # Replicate and condition of each file. Can be names instead of numbers. replicates <- c(1, 2, 1, 2, 1) conditions <- c(1, 1, 2, 2, 3) # Instantiation of data set hic.experiment <- HiCDOCDataSetFromHiCPro( matrixPaths = matrixPaths, bedPaths = bedPaths, replicates = replicates, conditions = conditions ) ## End(Not run)
## Not run: # Path to each matrix file matrixPaths = c( 'path/to/condition-1.replicate-1.matrix', 'path/to/condition-1.replicate-2.matrix', 'path/to/condition-2.replicate-1.matrix', 'path/to/condition-2.replicate-2.matrix', 'path/to/condition-3.replicate-1.matrix' ) # Path to each bed file bedPaths = c( 'path/to/condition-1.replicate-1.bed', 'path/to/condition-1.replicate-2.bed', 'path/to/condition-2.replicate-1.bed', 'path/to/condition-2.replicate-2.bed', 'path/to/condition-3.replicate-1.bed' ) # Replicate and condition of each file. Can be names instead of numbers. replicates <- c(1, 2, 1, 2, 1) conditions <- c(1, 1, 2, 2, 3) # Instantiation of data set hic.experiment <- HiCDOCDataSetFromHiCPro( matrixPaths = matrixPaths, bedPaths = bedPaths, replicates = replicates, conditions = conditions ) ## End(Not run)
HiCDOCDataSet
constructor from a tabular file.Constructs a HiCDOCDataSet
from a tabular file.
HiCDOCDataSetFromTabular(path, sep = '\t')
HiCDOCDataSetFromTabular(path, sep = '\t')
path |
A path to a tabular file. |
sep |
The separator of the tabular file. Default to tabulation. |
Accepts a tabular file with chromosome
, position 1
,
position 2
, and multiple replicate columns listing interaction counts.
Null interactions do not have to be listed. Values must be separated by
tabulations. The header must be
chromosome position 1 position 2 x.y x.y x.y ...
with x
replaced by condition names and y
replaced by replicate names.
path <- system.file("extdata", "liver_18_10M_500000.tsv", package = "HiCDOC") object <- HiCDOCDataSetFromTabular(path, sep = '\t')
path <- system.file("extdata", "liver_18_10M_500000.tsv", package = "HiCDOC") object <- HiCDOCDataSetFromTabular(path, sep = '\t')
Normalizes biological biases such as GC content and repeated regions. Uses the Knight-Ruiz balancing algorithm to transform interaction matrices into doubly stochastic matrices, with sum of each row and sum of each column equal to 1.
normalizeBiologicalBiases(object, parallel = FALSE)
normalizeBiologicalBiases(object, parallel = FALSE)
object |
|
parallel |
Should the normalization be run in parallel mode? Default to FALSE. |
A HiCDOCDataSet
with normalized interactions.
filterSparseReplicates
,
filterWeakPositions
,
normalizeTechnicalBiases
,
normalizeDistanceEffect
,
HiCDOC
data(exampleHiCDOCDataSet) object <- exampleHiCDOCDataSet object <- filterSparseReplicates(object) object <- filterWeakPositions(object) object <- normalizeBiologicalBiases(object)
data(exampleHiCDOCDataSet) object <- exampleHiCDOCDataSet object <- filterSparseReplicates(object) object <- filterWeakPositions(object) object <- normalizeBiologicalBiases(object)
Normalizes interactions by their "expected" value relative to the distance that separates their positions. The "expected" values are estimated with a loess regression on the proportion of interactions for each distance.
normalizeDistanceEffect(object, loessSampleSize = NULL, parallel = FALSE)
normalizeDistanceEffect(object, loessSampleSize = NULL, parallel = FALSE)
object |
|
loessSampleSize |
The number of positions used as a sample to estimate the effect of distance
on proportion of interactions. Defaults to
|
parallel |
Should the normalization be run in parallel mode? Default to FALSE. |
A HiCDOCDataSet
with normalized interactions.
normalizeTechnicalBiases
,
normalizeBiologicalBiases
,
HiCDOC
data(exampleHiCDOCDataSet) object <- normalizeDistanceEffect(exampleHiCDOCDataSet)
data(exampleHiCDOCDataSet) object <- normalizeDistanceEffect(exampleHiCDOCDataSet)
Normalizes technical biases such as sequencing depth by using a cyclic loess
to recursively normalize each pair of interaction matrices. Depends on
multiHiCcompare
.
normalizeTechnicalBiases(object, parallel = FALSE, cyclicLoessSpan = NULL)
normalizeTechnicalBiases(object, parallel = FALSE, cyclicLoessSpan = NULL)
object |
|
parallel |
Logical. Whether or not to parallelize the processing. Defaults to FALSE |
cyclicLoessSpan |
A numeric value in between 0 and 1. The span for cyclic loess normalization.
This value is passed to |
If parallel = TRUE
, the function
cyclic_loess
is launched in parallel mode, using bplapply
function. Before to call the function in parallel you should specify
the parallel parameters such as:
On Linux:
multiParam <- BiocParallel::MulticoreParam(workers = 10)
On Windows:
multiParam <- BiocParallel::SnowParam(workers = 10)
And then you can register the parameters to be used by BiocParallel:
BiocParallel::register(multiParam, default = TRUE)
A HiCDOCDataSet
with normalized interactions.
filterSparseReplicates
,
filterWeakPositions
,
normalizeBiologicalBiases
,
normalizeDistanceEffect
,
HiCDOC
data(exampleHiCDOCDataSet) object <- filterSmallChromosomes(exampleHiCDOCDataSet) object <- filterSparseReplicates(object) object <- filterWeakPositions(object) # Not printing loess warnings for example purpose. # Results should be inspected if there is any. suppressWarnings( object <- normalizeTechnicalBiases(object) )
data(exampleHiCDOCDataSet) object <- filterSmallChromosomes(exampleHiCDOCDataSet) object <- filterSparseReplicates(object) object <- filterWeakPositions(object) # Not printing loess warnings for example purpose. # Results should be inspected if there is any. suppressWarnings( object <- normalizeTechnicalBiases(object) )
Plots the result of the PCA on the compartments' centroids.
plotCentroids(object, chromosome, size = 2, checks = TRUE)
plotCentroids(object, chromosome, size = 2, checks = TRUE)
object |
|
chromosome |
A chromosome name or index in |
size |
Size of each point. Defaults to 2. |
checks |
Whether or not to add sanity checks messages on centroids. Default to TRUE. |
A ggplot
.
data(exampleHiCDOCDataSetProcessed) plotCentroids(exampleHiCDOCDataSetProcessed, chromosome = 1)
data(exampleHiCDOCDataSetProcessed) plotCentroids(exampleHiCDOCDataSetProcessed, chromosome = 1)
Plots the predicted compartments, along with their concordance in each replicate, and significant changes between experiment conditions.
plotCompartmentChanges( object, chromosome, threshold = 0.05, xlim = NULL, points = FALSE, checks = TRUE, colour = "gray90" )
plotCompartmentChanges( object, chromosome, threshold = 0.05, xlim = NULL, points = FALSE, checks = TRUE, colour = "gray90" )
object |
|
chromosome |
A chromosome name or index in |
threshold |
Significance threshold for the compartment changes. Defaults to 0.05. |
xlim |
A vector of the minimum and maximum positions to display. If NULL, displays all positions. Defaults to NULL. |
points |
Whether or not to add points to the concordances. Defaults to FALSE. |
checks |
Whether or not to add sanity checks messages. Default to TRUE. |
colour |
Border color for the compartments. Default to 'gray90'. 'NA' means no border. |
A ggplot
.
data(exampleHiCDOCDataSetProcessed) plotCompartmentChanges(exampleHiCDOCDataSetProcessed, chromosome = 1)
data(exampleHiCDOCDataSetProcessed) plotCompartmentChanges(exampleHiCDOCDataSetProcessed, chromosome = 1)
Plots the predicted compartments in each experiment condition.
plotCompartments(object, chromosome, xlim = NULL, colour = "gray90")
plotCompartments(object, chromosome, xlim = NULL, colour = "gray90")
object |
|
chromosome |
A chromosome name or index in |
xlim |
A vector of the minimum and maximum positions to display. If NULL, displays all positions. Defaults to NULL. |
colour |
Border color for the compartments. Default to 'gray90'. 'NA' means no border. |
A ggplot
.
data(exampleHiCDOCDataSetProcessed) plotCompartments(exampleHiCDOCDataSetProcessed, chromosome = 1)
data(exampleHiCDOCDataSetProcessed) plotCompartments(exampleHiCDOCDataSetProcessed, chromosome = 1)
Plots the distribution of concordance differences, which are the differences between concordances of each pair of replicates from different conditions. A concordance can be understood as a confidence in a genomic position's assigned compartment. Mathematically, it is the log ratio of a genomic position's distance to each compartment's centroid, normalized by the distance between both centroids, and min-maxed to a [-1,1] interval.
plotConcordanceDifferences(object)
plotConcordanceDifferences(object)
object |
A ggplot
.
data(exampleHiCDOCDataSetProcessed) plotConcordanceDifferences(exampleHiCDOCDataSetProcessed)
data(exampleHiCDOCDataSetProcessed) plotConcordanceDifferences(exampleHiCDOCDataSetProcessed)
Plots the concordances of each replicate in each experiment condition. A concordance can be understood as a confidence in a genomic position's assigned compartment. Mathematically, it is the log ratio of a genomic position's distance to each compartment's centroid, normalized by the distance between both centroids, and min-maxed to a [-1,1] interval.
plotConcordances( object, chromosome, xlim = NULL, threshold = 0.05, points = FALSE )
plotConcordances( object, chromosome, xlim = NULL, threshold = 0.05, points = FALSE )
object |
|
chromosome |
A chromosome name or index in |
xlim |
A vector of the minimum and maximum positions to display. If NULL, displays all positions. Defaults to NULL. |
threshold |
Significance threshold for the compartment changes. Defaults to 0.05. |
points |
Whether or not to add points to the concordances. Defaults to FALSE. |
A ggplot
.
data(exampleHiCDOCDataSetProcessed) plotConcordances(exampleHiCDOCDataSetProcessed, chromosome = 1)
data(exampleHiCDOCDataSetProcessed) plotConcordances(exampleHiCDOCDataSetProcessed, chromosome = 1)
Plots the distance effect on proportion of interactions. Each point is a cell in the interaction matrix, such that the x-axis is the distance with respect to the diagonal, the y-axis is number of counts. Dots are binned.
plotDistanceEffect( object, chromosome = NULL, transformX = "identity", transformY = "identity" )
plotDistanceEffect( object, chromosome = NULL, transformX = "identity", transformY = "identity" )
object |
|
chromosome |
Name (character) or index of the chromosome, if the plot should be restricted to only one chromosome. Default to NULL. |
transformX |
Transformation of the X axis. Default to "identity". See
|
transformY |
Transformation of the Y axis. Default to "identity". See
|
A ggplot
.
data(exampleHiCDOCDataSet) plotDistanceEffect(exampleHiCDOCDataSet)
data(exampleHiCDOCDataSet) plotDistanceEffect(exampleHiCDOCDataSet)
Plots the interaction matrices as heatmaps.
plotInteractions( object, chromosome, transform = NULL, colours = c(low = "#2c7bb6", mid = "#ffffbf", high = "#d7191c"), midpoint = 0 )
plotInteractions( object, chromosome, transform = NULL, colours = c(low = "#2c7bb6", mid = "#ffffbf", high = "#d7191c"), midpoint = 0 )
object |
|
chromosome |
A chromosome name or index in |
transform |
Transformation of the color scale. Default to NULL (no transformation). See
|
colours |
A character vector colours of length 3 to use for the gradient. See
|
midpoint |
midpoint value to be passed to |
A ggplot
.
data(exampleHiCDOCDataSet) plotInteractions(exampleHiCDOCDataSet, chromosome = 1)
data(exampleHiCDOCDataSet) plotInteractions(exampleHiCDOCDataSet, chromosome = 1)
Plots the boxplots of self interaction ratios, which are the differences between self interaction and median of other interactions for each genomic position. Since the A compartment is open with more interactions overall, it is assumed that self interaction ratios in compartment A are smaller than in compartment B.
plotSelfInteractionRatios(object, chromosome, checks = TRUE)
plotSelfInteractionRatios(object, chromosome, checks = TRUE)
object |
|
chromosome |
A chromosome name or index in |
checks |
Logical. Should sanity checks messages be printed on plot ? Default to TRUE. |
A ggplot
.
data(exampleHiCDOCDataSetProcessed) plotSelfInteractionRatios(exampleHiCDOCDataSetProcessed, chromosome = 1)
data(exampleHiCDOCDataSetProcessed) plotSelfInteractionRatios(exampleHiCDOCDataSetProcessed, chromosome = 1)
HiCDOCDataSet
.Reduces a HiCDOCDataSet
by keeping only given chromosomes,
conditions, or replicates.
reduceHiCDOCDataSet( object, chromosomes = NULL, conditions = NULL, replicates = NULL, dropLevels = TRUE )
reduceHiCDOCDataSet( object, chromosomes = NULL, conditions = NULL, replicates = NULL, dropLevels = TRUE )
object |
|
chromosomes |
The chromosome names or indices in |
conditions |
The condition names in |
replicates |
The replicate names in |
dropLevels |
Whether or not to also remove unused factor levels after filtering. Should be set to FALSE if the reduced objects are meant to be re-combined later. Defaults to TRUE. |
A reduced HiCDOCDataSet
.
data(exampleHiCDOCDataSet) reduced <- reduceHiCDOCDataSet(exampleHiCDOCDataSet, chromosomes = c(1, 2))
data(exampleHiCDOCDataSet) reduced <- reduceHiCDOCDataSet(exampleHiCDOCDataSet, chromosomes = c(1, 2))