Title: | Computational Chromosome Conformation Capture by Correlation of ChIP-seq at CTCF motifs |
---|---|
Description: | Chromatin looping is an essential feature of eukaryotic genomes and can bring regulatory sequences, such as enhancers or transcription factor binding sites, in the close physical proximity of regulated target genes. Here, we provide sevenC, an R package that uses protein binding signals from ChIP-seq and sequence motif information to predict chromatin looping events. Cross-linking of proteins that bind close to loop anchors result in ChIP-seq signals at both anchor loci. These signals are used at CTCF motif pairs together with their distance and orientation to each other to predict whether they interact or not. The resulting chromatin loops might be used to associate enhancers or transcription factor binding sites (e.g., ChIP-seq peaks) to regulated target genes. |
Authors: | Jonas Ibn-Salem [aut, cre] |
Maintainer: | Jonas Ibn-Salem <[email protected]> |
License: | GPL-3 |
Version: | 1.27.0 |
Built: | 2024-10-31 05:24:01 UTC |
Source: | https://github.com/bioc/sevenC |
This function first adds ChIP-seq signals along all regions of motif location
using the function addCovToGR
. Than it calculates the
correlation of coverage for each input pair using the function
addCovCor
. The Pearson correlation coefficient is added as new
metadata column to the input interactions. Note, this function does not work
on windows because reading of bigWig files is currently not supported on
windows.
addCor(gi, bwFile, name = "chip", window = 1000, binSize = 1)
addCor(gi, bwFile, name = "chip", window = 1000, binSize = 1)
gi |
|
bwFile |
File path or connection to BigWig file with ChIP-seq signals. |
name |
Character indicating the sample name. |
window |
Numeric scalar for window size around the center of ranges in
|
binSize |
Integer scalar as size of bins to which the coverage values are combined. |
An GInteractions
object like gi
with a new metadata column colname
holding Pearson correlation
coefficient of ChIP-seq signals for each anchor pair.
if (.Platform$OS.type != "windows") { # use example bigWig file of ChIP-seq signals on human chromosome 22 exampleBigWig <- system.file("extdata", "GM12878_Stat1.chr22_1-30000000.bigWig", package = "sevenC") # use example CTCF moitf location on human chromosome 22 motifGR <- sevenC::motif.hg19.CTCF.chr22 # build candidate interactions gi <- prepareCisPairs(motifGR) # add ChIP-seq signals correlation gi <- addCor(gi, exampleBigWig) # use an alternative metadata column name for ChIP-seq correlation gi <- addCor(gi, exampleBigWig, name = "Stat1") # add ChIP-seq correlation for signals signals in windows of 500bp around # motif centers gi <- addCor(gi, exampleBigWig, window = 500) # add ChIP-seq correlation for signals in bins of 10 bp gi <- addCor(gi, exampleBigWig, binSize = 10) }
if (.Platform$OS.type != "windows") { # use example bigWig file of ChIP-seq signals on human chromosome 22 exampleBigWig <- system.file("extdata", "GM12878_Stat1.chr22_1-30000000.bigWig", package = "sevenC") # use example CTCF moitf location on human chromosome 22 motifGR <- sevenC::motif.hg19.CTCF.chr22 # build candidate interactions gi <- prepareCisPairs(motifGR) # add ChIP-seq signals correlation gi <- addCor(gi, exampleBigWig) # use an alternative metadata column name for ChIP-seq correlation gi <- addCor(gi, exampleBigWig, name = "Stat1") # add ChIP-seq correlation for signals signals in windows of 500bp around # motif centers gi <- addCor(gi, exampleBigWig, window = 500) # add ChIP-seq correlation for signals in bins of 10 bp gi <- addCor(gi, exampleBigWig, binSize = 10) }
This function adds a vector with correlation values for each input
interaction. Only works for input interaction within the given maxDist
distance. Note, this function does not work on windows because reading of
bigWig files is currently not supported on windows.
addCovCor(gi, datacol = "chip", colname = "cor_chip", maxDist = NULL, use = "everything", method = "pearson")
addCovCor(gi, datacol = "chip", colname = "cor_chip", maxDist = NULL, use = "everything", method = "pearson")
gi |
A sorted |
datacol |
a string matching an annotation column in |
colname |
A string that is used as columnname for the new column in
|
maxDist |
maximal distance of pairs in bp as numeric. If maxDist=NULL,
the maximal distance is computed from input interactions gi by
|
use |
an optional character string giving a method for computing
covariances in the presence of missing values. See |
method |
a character string indicating which correlation coefficient (or
covariance) is to be computed. One of "pearson" (default), "kendall", or
"spearman": can be abbreviated. See |
A GInteractions
similar to gi
just with an additional column added.
if (.Platform$OS.type != "windows") { # use internal motif data on chromosome 22 motifGR <- sevenC::motif.hg19.CTCF.chr22 # use example bigWig file exampleBigWig <- system.file("extdata", "GM12878_Stat1.chr22_1-30000000.bigWig", package = "sevenC") # add coverage from bigWig file motifGR <- addCovToGR(motifGR, exampleBigWig) # get all pairs within 1Mb gi <- getCisPairs(motifGR, 1e5) # compute correaltion of coverge for each pair gi <- addCovCor(gi) # addCovCor adds a new metadata column: mcols(gi) }
if (.Platform$OS.type != "windows") { # use internal motif data on chromosome 22 motifGR <- sevenC::motif.hg19.CTCF.chr22 # use example bigWig file exampleBigWig <- system.file("extdata", "GM12878_Stat1.chr22_1-30000000.bigWig", package = "sevenC") # add coverage from bigWig file motifGR <- addCovToGR(motifGR, exampleBigWig) # get all pairs within 1Mb gi <- getCisPairs(motifGR, 1e5) # compute correaltion of coverge for each pair gi <- addCovCor(gi) # addCovCor adds a new metadata column: mcols(gi) }
GRanges
object.This function adds a NumericList
of coverage (or any
other signal in the input bigWig file) to each range in a
GRanges
object. The coverage is reported for a
fixed-sized window around the region center. For regions with negative strand,
the coverage vector is reversed. The coverage signal is added as new metadata
column holding a NumericList
object. Note, this
function does not work on windows because reading of bigWig files is currently
not supported on windows.
addCovToGR(gr, bwFile, window = 1000, binSize = 1, colname = "chip")
addCovToGR(gr, bwFile, window = 1000, binSize = 1, colname = "chip")
gr |
|
bwFile |
File path or connection to BigWig or wig file with coverage to parse from. |
window |
Numeric scalar for window size around the center of ranges in
|
binSize |
Integer scalar as size of bins to which the coverage values are combined. |
colname |
Character as name of the new column that is created in
|
GRanges
as input but with an additional
meta column containing the coverage values for each region as
NumericList
.
if (.Platform$OS.type != "windows") { # use example bigWig file of ChIP-seq signals on human chromosome 22 exampleBigWig <- system.file("extdata", "GM12878_Stat1.chr22_1-30000000.bigWig", package = "sevenC") # use example CTCF moitf location on human chromosome 22 motifGR <- sevenC::motif.hg19.CTCF.chr22 # add ChIP-seq signals to motif regions motifGR <- addCovToGR(motifGR, exampleBigWig) # add ChIP-seq signals as column named "Stat1" motifGR <- addCovToGR(motifGR, exampleBigWig, colname = "Stat1") # add ChIP-seq signals in windows of 500bp around motif centers motifGR <- addCovToGR(motifGR, exampleBigWig, window = 500) # add ChIP-seq signals in bins of 10 bp motifGR <- addCovToGR(motifGR, exampleBigWig, binSize = 10) }
if (.Platform$OS.type != "windows") { # use example bigWig file of ChIP-seq signals on human chromosome 22 exampleBigWig <- system.file("extdata", "GM12878_Stat1.chr22_1-30000000.bigWig", package = "sevenC") # use example CTCF moitf location on human chromosome 22 motifGR <- sevenC::motif.hg19.CTCF.chr22 # add ChIP-seq signals to motif regions motifGR <- addCovToGR(motifGR, exampleBigWig) # add ChIP-seq signals as column named "Stat1" motifGR <- addCovToGR(motifGR, exampleBigWig, colname = "Stat1") # add ChIP-seq signals in windows of 500bp around motif centers motifGR <- addCovToGR(motifGR, exampleBigWig, window = 500) # add ChIP-seq signals in bins of 10 bp motifGR <- addCovToGR(motifGR, exampleBigWig, binSize = 10) }
GInteractions
with overlap support.See overlap methods in InteractionSet
package for more details
on the overlap calculations: ?overlapsAny
addInteractionSupport(gi, subject, colname = "loop", ...)
addInteractionSupport(gi, subject, colname = "loop", ...)
gi |
|
subject |
another |
colname |
name of the new annotation column in |
... |
additional arguments passed to |
InteractionSet
gi
as input but with additional
annotation column colname
indicating whether each interaction
is supported by subject
or not.
# build example GRanges as anchors anchorGR <- GRanges( rep("chr1", 4), IRanges( c(1, 5, 20, 14), c(4, 8, 23, 17) ), strand = c("+", "+", "+", "-"), score = c(5, 4, 6, 7) ) # build example GIntreaction object gi <- GInteractions( c(1, 2, 2), c(4, 3, 4), anchorGR, mode = "strict" ) # build exapple support GInteractions object exampleSupport <- GInteractions( GRanges("chr1", IRanges(1, 4)), GRanges("chr1", IRanges(15, 20)) ) # add support gi <- addInteractionSupport(gi, subject = exampleSupport) # Use colname argument to add support to differnt metadata column name gi <- addInteractionSupport(gi, subject = exampleSupport, colname = "example")
# build example GRanges as anchors anchorGR <- GRanges( rep("chr1", 4), IRanges( c(1, 5, 20, 14), c(4, 8, 23, 17) ), strand = c("+", "+", "+", "-"), score = c(5, 4, 6, 7) ) # build example GIntreaction object gi <- GInteractions( c(1, 2, 2), c(4, 3, 4), anchorGR, mode = "strict" ) # build exapple support GInteractions object exampleSupport <- GInteractions( GRanges("chr1", IRanges(1, 4)), GRanges("chr1", IRanges(15, 20)) ) # add support gi <- addInteractionSupport(gi, subject = exampleSupport) # Use colname argument to add support to differnt metadata column name gi <- addInteractionSupport(gi, subject = exampleSupport, colname = "example")
If each anchor region (motif) has a score as annotation column, this function adds two new columns named "score_1" and "score_2" with the scores of the first and the second anchor region, respectively. Additionally, a column named "score_min" is added with holds for each interaction the minimum of "score_1" and "score_2".
addMotifScore(gi, scoreColname = "score")
addMotifScore(gi, scoreColname = "score")
gi |
|
scoreColname |
Character as name the metadata column in with motif score. |
The same GInteractions
as gi
but with three
additional annotation columns.
# build example GRanges as anchors anchorGR <- GRanges( rep("chr1", 4), IRanges( c(1, 5, 20, 14), c(4, 8, 23, 17) ), strand = c("+", "+", "+", "-"), score = c(5, 4, 6, 7) ) # build example GIntreaction object gi <- GInteractions( c(1, 2, 2), c(4, 3, 4), anchorGR, mode = "strict" ) # add add motif score gi <- addMotifScore(gi, scoreColname = "score")
# build example GRanges as anchors anchorGR <- GRanges( rep("chr1", 4), IRanges( c(1, 5, 20, 14), c(4, 8, 23, 17) ), strand = c("+", "+", "+", "-"), score = c(5, 4, 6, 7) ) # build example GIntreaction object gi <- GInteractions( c(1, 2, 2), c(4, 3, 4), anchorGR, mode = "strict" ) # add add motif score gi <- addMotifScore(gi, scoreColname = "score")
Each anchor region has a strand that is '+'
or '-'
. Therefore,
the each interaction between two regions has one of the following strand
combinations: "forward", "reverse", "convergent", or "divergent". Unstranded
ranges, indicated by *
, are treated as positive strand.
addStrandCombination(gi, colname = "strandOrientation")
addStrandCombination(gi, colname = "strandOrientation")
gi |
|
colname |
name of the new column that is created in |
The same GInteractions
as gi
but with an
additional column indicating the four possible combinations of strands
"forward", "reverse", "convergent", or "divergent".
# build example GRanges as anchors anchorGR <- GRanges( rep("chr1", 4), IRanges( c(1, 5, 20, 14), c(4, 8, 23, 17) ), strand = c("+", "+", "+", "-"), score = c(5, 4, 6, 7) ) # build example GIntreaction object gi <- GInteractions( c(1, 2, 2), c(4, 3, 4), anchorGR, mode = "strict" ) # add combination of anchor strands as new metadata column gi <- addStrandCombination(gi) # build small matrix to check strand combination cbind( as.character(strand(anchors(gi, "first"))), as.character(strand(anchors(gi, "second"))), mcols(gi)[, "strandOrientation"] )
# build example GRanges as anchors anchorGR <- GRanges( rep("chr1", 4), IRanges( c(1, 5, 20, 14), c(4, 8, 23, 17) ), strand = c("+", "+", "+", "-"), score = c(5, 4, 6, 7) ) # build example GIntreaction object gi <- GInteractions( c(1, 2, 2), c(4, 3, 4), anchorGR, mode = "strict" ) # add combination of anchor strands as new metadata column gi <- addStrandCombination(gi) # build small matrix to check strand combination cbind( as.character(strand(anchors(gi, "first"))), as.character(strand(anchors(gi, "second"))), mcols(gi)[, "strandOrientation"] )
This value is the average optimal cutoff value on the 10 best performing TF
ChIP-seq data sets. It is used as default cutoff value on the logistic
regression response score in predLoops
function See
?'cutoffByTF'
for more details.
cutoffBest10
cutoffBest10
An object of class numeric
of length 1.
This dataset contains optimal cutoff scores for the response value of logistic regression models. The cutoff is based on optimal F1-scores. A separate model was trained For each of 124 TF ChIP-seq datasets in human GM12878 cells. The model performance was calculated with Hi-C and ChIA-PET interactions using 10-fold cross-validation.
cutoffByTF
cutoffByTF
An object of class tbl_df
with 121 rows and 3 columns:
Transcription factor name
The optimal cutoff on the logistic regression response value
The optimal f1-score associated to the max_cutoff
value
modelBest10Avg
and TFspecificModels
GInteractions
object with all pairs of input
GRanges
within a given distance.Distance is calculated from the center of input regions.
getCisPairs(inGR, maxDist = 1e+06)
getCisPairs(inGR, maxDist = 1e+06)
inGR |
|
maxDist |
maximal distance in base-pairs between pairs of ranges as single numeric value. |
A GInteractions
object with all pairs
within the given distance.
# build example GRanges as input inGR <- GRanges( rep("chr1", 5), IRanges( c(10, 20, 30, 100, 1000), c(15, 25, 35, 105, 1005) ) ) # get all pairs within 50 bp gi <- getCisPairs(inGR, maxDist = 50) # getCisPiars returns a StrictGInteractions object class(gi) # The input regions are accessibly via regions() regions(gi)
# build example GRanges as input inGR <- GRanges( rep("chr1", 5), IRanges( c(10, 20, 30, 100, 1000), c(15, 25, 35, 105, 1005) ) ) # get all pairs within 50 bp gi <- getCisPairs(inGR, maxDist = 50) # getCisPiars returns a StrictGInteractions object class(gi) # The input regions are accessibly via regions() regions(gi)
Get out of chromosomal bound ranges.
getOutOfBound(gr)
getOutOfBound(gr)
gr |
A |
A data.frame
with rows for each range in gr
that
extends out of chromosomes. The first column holds the index of the range in
gr
, the second the size of the overlap to the left of the chromosome
and the third the size of the overlap to the right of the chromosome.
This dataset contains term names and estimates for logistic regression model to predict chromatin looping interactions. The estimate represent an average of the 10 best performing models out of 124 transcription factor ChIP-seq data sets from ENCODE.
modelBest10Avg
modelBest10Avg
An object of class data.frame
with 7 rows and 2 columns
holding the term name and estimate.
The intercept of the logistic regression model.
The genomic distance between the centers of motifs in base pairs (bp).
Orientation of motif pairs. 1 if divergent 0 if not.
Orientation of motif pairs. 1 if forward 0 if not.
Orientation of motif pairs. 1 if reverse 0 if not.
Minimum of motif hit score between both motifs in pair. The motif score is defined as -log_10 of the p-value of the motif hit as reported by JASPAR motif tracks. The unit is -log_10(p) where p is the p-value of the motif hit.
Pearson correlation coefficient of ChIP-seq signals across +/- 500 bp around CTCF motif centers.
Each of 124 transcription factor (TF) ChIP-seq data sets from ENCODE in
GM12878 cells were used to train a logistic regression model. All CTCF motifs
in motif.hg19.CTCF
within a distance of 1 Mb were used as
candidates. A given pair was labled as true loop interactions, if it has
interaction support based on loops from Hi-C in human GM12878 cells from Rao
et al. 2014 or ChIA-PET loops from Tang et al. 2015 in the same cell type.
The 10 best performing models were selected based on the average area under
the precision-recall-curve in 10-fold cross-validation.
The parameters were than averaged across the 10 best performig models.
Suhas S.P. Rao, Miriam H. Huntley, Neva C. Durand, Elena K. Stamenova, Ivan D. Bochkov, James T. Robinson, Adrian L. Sanborn, Ido Machol, Arina D. Omer, Eric S. Lander, Erez Lieberman Aiden, A 3D Map of the Human Genome at Kilobase Resolution Reveals #' Principles of Chromatin Looping, Cell, Volume 159, Issue 7, 18 December 2014, Pages 1665-1680, ISSN 0092-8674, https://doi.org/10.1016/j.cell.2014.11.021.
Zhonghui Tang, Oscar Junhong Luo, Xingwang Li, Meizhen Zheng, Jacqueline Jufen Zhu, Przemyslaw Szalaj, Pawel Trzaskoma, Adriana Magalska, Jakub Wlodarczyk, Blazej Ruszczycki, Paul Michalski, Emaly Piecuch, Ping Wang, Danjuan Wang, Simon Zhongyuan Tian, May Penrad-Mobayed, Laurent M. Sachs, Xiaoan Ruan, Chia-Lin Wei, Edison T. Liu, Grzegorz M. Wilczynski, Dariusz Plewczynski, Guoliang Li, Yijun Ruan, CTCF-Mediated Human 3D Genome Architecture Reveals Chromatin Topology for Transcription, Cell, Volume 163, Issue 7, 17 December 2015, Pages 1611-1627, ISSN 0092-8674, https://doi.org/10.1016/j.cell.2015.11.024.
cutoffBest10
and TFspecificModels
A dataset containing the motif hits of the CTCF recognition motif from JASPAR database (MA0139.1, http://jaspar.genereg.net/matrix/MA0139.1/) in human genome assembly hg19.
motif.hg19.CTCF
motif.hg19.CTCF
GRanges
object with 38774 ranges on
positive and negative strand with 1 meta column:
The significance socre of the motif hit, defined as -log_10(p-value).
The dataset was downloaded from JASPAR 2018 motif tracks from the following URL: http://expdata.cmmt.ubc.ca/JASPAR/downloads/UCSC_tracks/2018/hg19/tsv/MA0139.1.tsv.gz
Motif locations were filtered to contain only motif hists with p-value
.
The p-value is the motif hit significance as repoted from the
motif scanning alogrithim used during construction of the JASPAR motif
tracks. More information on the JASPAR motif track pipeline can be found
here:
https://github.com/wassermanlab/JASPAR-UCSC-tracks.
http://expdata.cmmt.ubc.ca/JASPAR/downloads/UCSC_tracks/2018/hg19/tsv/MA0139.1.tsv.gz
A dataset containing the motif hits of the CTCF recognition motif from JASPAR
database (MA0139.1) in human genome assembly hg19. Only motifs with a p-value
on chromosome 22 are reported.
motif.hg19.CTCF.chr22
motif.hg19.CTCF.chr22
An object of class GRanges
of length 917.
See '?motif.hg19.CTCF' for a more details and the full data set.
This dataset is the same as motif.hg19.CTCF.chr22
but with one
additional metadata colum, named "chip", holding ChIP-seq signals for all
motifs in a windows of 1000 bp around the motif center as
NumericList
. The data is from a ChIP-seq experiment
for STAT1 in human GM12878 cells. The full bigWig file can be downloaded from
ENCODE (Dunham et al. 2012)
http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeSydhTfbs/wgEncodeSydhTfbsGm12878Stat1StdSig.bigWig.
See motif.hg19.CTCF
and motif.hg19.CTCF.chr22
for
a more details and the motif data set.
motif.hg19.CTCF.chr22.cov
motif.hg19.CTCF.chr22.cov
An object of class GRanges
of length 917.
motif.hg19.CTCF
, motif.hg19.CTCF.chr22
returns indices of columns with non-zero variance
noZeroVar(dat)
noZeroVar(dat)
dat |
data.frame or matrix |
column indices of columns with non-zero variance
GInteractions
.Parse chromatin loops from Rao et al. 2014 as strict
GInteractions
.
parseLoopsRao(inFile, ...)
parseLoopsRao(inFile, ...)
inFile |
input file with loops |
... |
additional arguments, that will be passed to
|
GInteractions
with loops from input file.
# use example loop file exampleLoopFile <- system.file("extdata", "GM12878_HiCCUPS.chr22_1-30000000.loop.txt", package = "sevenC") # read loops form example file: gi <- parseLoopsRao(exampleLoopFile) # read loops with custom seqinfo object: customSeqInfo <- Seqinfo(seqnames = c("chr1", "chr22"), seqlengths = c(10^8, 10^8), isCircular = c(FALSE, FALSE), genome = "custom") gi <- parseLoopsRao(exampleLoopFile, seqinfo = customSeqInfo)
# use example loop file exampleLoopFile <- system.file("extdata", "GM12878_HiCCUPS.chr22_1-30000000.loop.txt", package = "sevenC") # read loops form example file: gi <- parseLoopsRao(exampleLoopFile) # read loops with custom seqinfo object: customSeqInfo <- Seqinfo(seqnames = c("chr1", "chr22"), seqlengths = c(10^8, 10^8), isCircular = c(FALSE, FALSE), genome = "custom") gi <- parseLoopsRao(exampleLoopFile, seqinfo = customSeqInfo)
GInteractions
.Reads pairwise ChIA-PET interaction from an input file.
parseLoopsTang(inFile, ...)
parseLoopsTang(inFile, ...)
inFile |
input file with loops |
... |
additional arguments, that will be passed to
|
It reads files with the following tab-delimited format:
chr12 | 48160351 | 48161634 | chr12 | 48230665 | 48232848 | 27 |
chr7 | 77284664 | 77285815 | chr7 | 77388242 | 77388928 | 7 |
chr4 | 128459961 | 128460166 | chr4 | 128508304 | 128509082 | 4 |
This file format was used for ChIA-PET interaction data by Tang et al. 2015 http://dx.doi.org/10.1016/j.cell.2015.11.024. The last column of input file is added as annotation column with colname "score".
An GInteractions
with loops from input file.
exampleLoopTang2015File <- system.file("extdata", "ChIA-PET_GM12878_Tang2015.chr22_1-30000000.clusters.txt", package = "sevenC") gi <- parseLoopsTang(exampleLoopTang2015File) # read loops with custom seqinfo object: customSeqInfo <- Seqinfo(seqnames = c("chr1", "chr22"), seqlengths = c(10^8, 10^8), isCircular = c(FALSE, FALSE), genome = "custom") gi <- parseLoopsTang(exampleLoopTang2015File, seqinfo = customSeqInfo)
exampleLoopTang2015File <- system.file("extdata", "ChIA-PET_GM12878_Tang2015.chr22_1-30000000.clusters.txt", package = "sevenC") gi <- parseLoopsTang(exampleLoopTang2015File) # read loops with custom seqinfo object: customSeqInfo <- Seqinfo(seqnames = c("chr1", "chr22"), seqlengths = c(10^8, 10^8), isCircular = c(FALSE, FALSE), genome = "custom") gi <- parseLoopsTang(exampleLoopTang2015File, seqinfo = customSeqInfo)
Predict interaction probability using logistic regression model.
predLogit(data, formula, betas)
predLogit(data, formula, betas)
data |
A data.frame like object with predictor variables. |
formula |
A |
betas |
A vector with parameter estimates for predictor variables. They
should be in the same order as variables in |
A numeric vector with interaction probabilities for each observation
in df
. NAs are produced for NAs in df
.
This function takes a GInteractions
object with
candidate looping interactions. It should be annotated with features in
metadata columns. A logistic regression model is applied to predict looping
interaction probabilities.
predLoops(gi, formula = NULL, betas = NULL, colname = "pred", cutoff = get("cutoffBest10"))
predLoops(gi, formula = NULL, betas = NULL, colname = "pred", cutoff = get("cutoffBest10"))
gi |
A |
formula |
A |
betas |
A vector with parameter estimates for predictor variables. They
should be in the same order as variables in |
colname |
A |
cutoff |
Numeric cutoff on prediction score. Only interactions with
interaction probability >= |
A GInteractions
as gi
with an
additional metadata column holding the predicted looping probability.
# use example CTCF moitf location on human chromosome 22 with chip coverage motifGR <- sevenC::motif.hg19.CTCF.chr22.cov # build candidate interactions gi <- prepareCisPairs(motifGR) # add ChIP-seq signals correlation gi <- addCovCor(gi) # predict chromatin looping interactions loops <- predLoops(gi) # add prediction score for all candidates without filter gi <- predLoops(gi, cutof = NULL) # add prediction score using custom column name gi <- predLoops(gi, cutof = NULL, colname = "my_colname") # Filter loop predictions on custom cutoff loops <- predLoops(gi, cutoff = 0.4) # predict chromatin looping interactions using custom model parameters myParams <- c(-4, -5, -2, -1, -1, 5, 3) loops <- predLoops(gi, betas = myParams) # predict chromatin loops using custom model formula and params myFormula <- ~ dist + score_min # define parameters for intercept, dist and motif_min myParams <- c(-5, -4, 6) loops <- predLoops(gi, formula = myFormula, betas = myParams)
# use example CTCF moitf location on human chromosome 22 with chip coverage motifGR <- sevenC::motif.hg19.CTCF.chr22.cov # build candidate interactions gi <- prepareCisPairs(motifGR) # add ChIP-seq signals correlation gi <- addCovCor(gi) # predict chromatin looping interactions loops <- predLoops(gi) # add prediction score for all candidates without filter gi <- predLoops(gi, cutof = NULL) # add prediction score using custom column name gi <- predLoops(gi, cutof = NULL, colname = "my_colname") # Filter loop predictions on custom cutoff loops <- predLoops(gi, cutoff = 0.4) # predict chromatin looping interactions using custom model parameters myParams <- c(-4, -5, -2, -1, -1, 5, 3) loops <- predLoops(gi, betas = myParams) # predict chromatin loops using custom model formula and params myFormula <- ~ dist + score_min # define parameters for intercept, dist and motif_min myParams <- c(-5, -4, 6) loops <- predLoops(gi, formula = myFormula, betas = myParams)
GInteractions
and add
genomic features.Prepares motif pairs as GInteractions
and add
genomic features.
prepareCisPairs(motifs, maxDist = 1e+06, scoreColname = "score")
prepareCisPairs(motifs, maxDist = 1e+06, scoreColname = "score")
motifs |
|
maxDist |
maximal distance in base-pairs between pairs of ranges as single numeric value. |
scoreColname |
Character as name the metadata column in with motif score. |
An GInteractions
object with motif
pairs and annotations of distance, strand orientation, and motif scores.
# build example GRanges as anchors anchorGR <- GRanges( rep("chr1", 4), IRanges( c(1, 5, 20, 14), c(4, 8, 23, 17) ), strand = c("+", "+", "+", "-"), score = c(5, 4, 6, 7) ) # prepare candidates gi <- prepareCisPairs(anchorGR) # prepare candidates using a mimial distance of 10 bp gi <- prepareCisPairs(anchorGR, maxDist = 10) # prepare candidates using an alternative score value in anchors anchorGR$myScore <- rnorm(length(anchorGR)) gi <- prepareCisPairs(anchorGR, scoreColname = "myScore")
# build example GRanges as anchors anchorGR <- GRanges( rep("chr1", 4), IRanges( c(1, 5, 20, 14), c(4, 8, 23, 17) ), strand = c("+", "+", "+", "-"), score = c(5, 4, 6, 7) ) # prepare candidates gi <- prepareCisPairs(anchorGR) # prepare candidates using a mimial distance of 10 bp gi <- prepareCisPairs(anchorGR, maxDist = 10) # prepare candidates using an alternative score value in anchors anchorGR$myScore <- rnorm(length(anchorGR)) gi <- prepareCisPairs(anchorGR, scoreColname = "myScore")
Chromatin looping is an essential feature of eukaryotic genomes and can bring regulatory sequences, such as enhancers or transcription factor binding sites, in the close physical proximity of regulated target genes. Here, we provide sevenC, an R package that uses protein binding signals from ChIP-seq and sequence motif information to predict chromatin looping events. Cross-linking of proteins that bind close to loop anchors result in ChIP-seq signals at both anchor loci. These signals are used at CTCF motif pairs together with their distance and orientation to each other to predict whether they interact or not. The resulting chromatin loops might be used to associate enhancers or transcription factor binding sites (e.g., ChIP-seq peaks) to regulated target genes.
Source: http://stats.stackexchange.com/questions/3051/mean-of-a-sliding-window-in-r
slideMean(x, k)
slideMean(x, k)
x |
numeric vector |
k |
interval size |
numeric vector of length length(x) / k
.
sevenC was trained on 124 TF ChIP-seq data sets from ENCODE. Specific parameters are provided in this data set.
TFspecificModels
TFspecificModels
A data.frame
with 868 rows and 7 columns.
TF name used in ChIP-seq experiment.
File accession ID from ENCODE project
Model term name. See modelBest10Avg
for more
detials.
Mean parameter estimate in 10-fold cross-validation
Median parameter estimate in 10-fold cross-validation
Standard deviation of parameter estimate in 10-fold cross-validation