DegCre can be installed from Bioconductor as follows:
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("DegCre")
Alternatively, DegCre can be installed from GitHub. With the
devtools
pacakge installed, run:
DegCre associates differentially expressed genes (DEGs) with
cis-regulatory elements (CREs) in a probabilistic, non-parametric
approach. DegCre is intended to be applied on differential expression
and regulatory signal data derived from responses to perturbations such
as drug or natural ligand treatmnents. As an example used here, we have
obtained data from McDowell et al. (McDowell et
al. 2018) which was generated by treating A549 cells with
dexamethasone and measuring RNA-seq and ChIP-seq data at several time
points. Data from RNA-seq and NR3C1 ChIP-seq at four hours versus
control is stored in the list DexNR3C1
.
DegCre uses the GenomicRanges
framework for handling genomic regions and some calculations. As one
input, DegGR
, users generate differential expression
statistics for genes with methods such as DESeq2
or edgeR.
These values should then be associated with gene TSSs such as those
available from EPDNew (Dreos et al. 2015; Meylan
et al. 2020) in a GRanges
. The second input,
CreGR
, is differential regulatory signal data (in the
example, NR3C1 ChIP-seq data) associated with genomic regions in a
GRanges
such as those generated by csaw.
A complete description of the mathematical basis of the DegCre core
algorithms is provided in (Roberts et al.
2024). DegCre generates a Hits
object of all
associations between DegGR
and CreGR
within a
specified maximum distance. Associations are then binned by TSS-to-CRE
distance according to an algorithm that balances resolution (many bins
with few members) versus minimization of the deviance of each bin’s CRE
p-value distribution from the global distribution, selecting an optimal
bin size.
Next, DegCre applies a non-parametric algorithm to find concordance
between DEG and CRE differential effects within bins and derives an
association probability. For all association probabilities involving one
given CRE, the probabilities are adjusted to favor associations across
shorter distances. An FDR of the association probability is then
estimated. Results are returned in list containing a Hits
object and both input GRanges
.
To begin, we load the necessary packages:
library(GenomicRanges)
library(InteractionSet)
library(plotgardener)
library(TxDb.Hsapiens.UCSC.hg38.knownGene)
library(org.Hs.eg.db)
library(DegCre)
library(magick)
DegCre is run on the two input GRanges
,
DegGR
and CreGR
, and differential p-values
associated with each. To require the effect directions be concordant
between the changes in gene expression and CRE signal, users must also
supply log fold-changes for each and set
reqEffectDirConcord=TRUE
(default value). As example data,
we use DexNR3C1
derived from McDowell et al. (McDowell et al. 2018) using DESeq2
and csaw:
data(DexNR3C1)
lapply(DexNR3C1,head)
#> $DegGR
#> GRanges object with 6 ranges and 7 metadata columns:
#> seqnames ranges strand | promGeneName GeneSymb
#> <Rle> <IRanges> <Rle> | <character> <character>
#> [1] chr1 959255-959256 - | NOC2L_1 NOC2L
#> [2] chr1 960632-960633 + | KLHL17_1 KLHL17
#> [3] chr1 966481-966482 + | PLEKHN1_1 PLEKHN1
#> [4] chr1 976680-976681 - | PERM1_1 PERM1
#> [5] chr1 1000096-1000097 - | HES4_1 HES4
#> [6] chr1 1000510-1000511 + | ISG15_2 ISG15
#> GeneID baseMean logFC pVal pAdj
#> <character> <numeric> <numeric> <numeric> <numeric>
#> [1] ENSG00000188976.9 1404.01811 -0.0148658 0.89202335 0.9804131
#> [2] ENSG00000187961.12 182.81254 -0.5189710 0.00183145 0.0235433
#> [3] ENSG00000187583.9 56.95208 -0.1598313 0.50313987 0.8747718
#> [4] ENSG00000187642.8 1.41588 -1.0589853 0.49900530 NA
#> [5] ENSG00000188290.9 76.33487 -0.7547663 0.00184199 0.0236628
#> [6] ENSG00000187608.7 97.50041 -0.0390943 0.88895013 0.9803767
#> -------
#> seqinfo: 23 sequences from an unspecified genome; no seqlengths
#>
#> $CreGR
#> GRanges object with 6 ranges and 3 metadata columns:
#> seqnames ranges strand | logFC pVal pAdj
#> <Rle> <IRanges> <Rle> | <numeric> <numeric> <numeric>
#> [1] chr1 941905-941960 * | 1.875569 0.3827979 0.947530
#> [2] chr1 943129-943526 * | 3.713827 0.0154015 0.120363
#> [3] chr1 1686385-1686476 * | 2.030911 0.4654492 1.000000
#> [4] chr1 1741105-1741250 * | 2.061714 0.2426531 0.732045
#> [5] chr1 1777591-1777610 * | -0.366914 0.6429142 1.000000
#> [6] chr1 1777645-1777700 * | 0.241421 1.0000000 1.000000
#> -------
#> seqinfo: 23 sequences from an unspecified genome
We now run DegCre on this data with default values (subsetting input to “chr1” for expediency):
subDegGR <- DexNR3C1$DegGR[which(seqnames(DexNR3C1$DegGR)=="chr1")]
subCreGR <- DexNR3C1$CreGR[which(seqnames(DexNR3C1$CreGR)=="chr1")]
myDegCreResList <- runDegCre(DegGR=subDegGR,
DegP=subDegGR$pVal,
DegLfc=subDegGR$logFC,
CreGR=subCreGR,
CreP=subCreGR$pVal,
CreLfc=subCreGR$logFC,
reqEffectDirConcord=TRUE,
verbose=FALSE)
DegCre produces a list object with several items. These include the
inputs DegGR
and CreGR
. It also contains a
Hits
object degCreHits
. The
queryHits
of degCreHits
index
DegGR
and the subjectHits
index
CreGR
. degCreHits
also has several metadata
columns that include the key resuls of the algorithm. These include
assocProb
which is the probabilty of the association being
true, and assocProbFDR
which is the FDR of
assocProb
being greater than a null model based on TSS to
CRE distance alone. Also, the results of the distance bin optimization
heuristic, binHeurOutputs
, and the input DEG significance
threshold alphaVal
are returned.
names(myDegCreResList)
#> [1] "degCreHits" "binHeurOutputs" "alphaVal" "DegGR"
#> [5] "CreGR"
head(myDegCreResList$degCreHits)
#> Hits object with 6 hits and 10 metadata columns:
#> queryHits subjectHits | assocDist assocProb assocProbFDR rawAssocProb
#> <integer> <integer> | <integer> <numeric> <numeric> <numeric>
#> [1] 1 5 | 818334 0.0000000 1.00000e+00 0.226259
#> [2] 2 5 | 816957 0.0980146 1.21683e-03 0.226259
#> [3] 3 5 | 811108 0.0000000 1.00000e+00 0.226259
#> [4] 4 5 | 800909 0.0000000 1.00000e+00 0.226259
#> [5] 5 5 | 777493 0.1276152 5.57757e-11 0.212164
#> [6] 6 5 | 777079 0.0000000 1.00000e+00 0.212164
#> CreP DegP DegPadj binAssocDist numObs distBinId
#> <numeric> <numeric> <numeric> <numeric> <numeric> <integer>
#> [1] 0.642914 0.89202335 1.0000000 860869 2277 13
#> [2] 0.642914 0.00183145 0.0905258 860869 2277 13
#> [3] 0.642914 0.50313987 1.0000000 860869 2277 13
#> [4] 0.642914 0.49900530 1.0000000 860869 2277 13
#> [5] 0.642914 0.00184199 0.0909295 788305 2277 12
#> [6] 0.642914 0.88895013 1.0000000 788305 2277 12
#> -------
#> queryLength: 2791 / subjectLength: 2515
To find associations driven by non-random changes in CRE signal, we
subset the degCreHits
by assocProbFDR
. We can
also further subset by assocProb
to find those also more
likely to confirm in a suitable validation experiment:
passFDR <- which(mcols(myDegCreResList$degCreHits)$assocProbFDR <= 0.05)
passProb <- which(mcols(myDegCreResList$degCreHits)$assocProb >= 0.25)
keepDegCreHits <- myDegCreResList$degCreHits[intersect(passFDR,passProb)]
keepDegCreHits
#> Hits object with 1454 hits and 10 metadata columns:
#> queryHits subjectHits | assocDist assocProb assocProbFDR rawAssocProb
#> <integer> <integer> | <integer> <numeric> <numeric> <numeric>
#> [1] 106 12 | 21906 0.618061 0.00000e+00 0.618097
#> [2] 106 11 | 197028 0.263720 1.11022e-16 0.263736
#> [3] 110 13 | 133 0.496307 0.00000e+00 0.496308
#> [4] 111 13 | 1110 0.496307 0.00000e+00 0.496308
#> [5] 112 14 | 26733 0.506025 8.46745e-12 0.653970
#> ... ... ... . ... ... ... ...
#> [1450] 2774 2509 | 736 0.603130 0 0.623462
#> [1451] 2774 2508 | 12166 0.418739 0 0.432855
#> [1452] 2774 2510 | 25936 0.469207 0 0.485025
#> [1453] 2774 2511 | 34810 0.387866 0 0.400941
#> [1454] 2774 2512 | 34882 0.663104 0 0.685458
#> CreP DegP DegPadj binAssocDist numObs distBinId
#> <numeric> <numeric> <numeric> <numeric> <numeric> <integer>
#> [1] 0.000925164 3.15116e-07 5.75016e-05 48582 2277 1
#> [2] 0.015637885 3.15116e-07 5.75016e-05 220204 2277 4
#> [3] 0.026711376 9.18081e-10 8.97545e-07 48582 2277 1
#> [4] 0.026711376 9.18081e-10 8.97545e-07 48582 2277 1
#> [5] 0.000177546 6.40784e-03 2.26227e-01 48582 2277 1
#> ... ... ... ... ... ... ...
#> [1450] 5.56120e-04 0.000456518 0.0326117 48582 2277 1
#> [1451] 7.08808e-01 0.000456518 0.0326117 48582 2277 1
#> [1452] 5.06296e-02 0.000456518 0.0326117 48582 2277 1
#> [1453] 9.04117e-01 0.000456518 0.0326117 48582 2277 1
#> [1454] 9.74797e-05 0.000456518 0.0326117 48582 2277 1
#> -------
#> queryLength: 2791 / subjectLength: 2515
We can then view the subsets of DegGR
and
CreGR
that comprise the passing associations:
myDegCreResList$DegGR[queryHits(keepDegCreHits)]
#> GRanges object with 1454 ranges and 7 metadata columns:
#> seqnames ranges strand | promGeneName GeneSymb
#> <Rle> <IRanges> <Rle> | <character> <character>
#> [1] chr1 6180123-6180124 - | CHD5_1 CHD5
#> [2] chr1 6180123-6180124 - | CHD5_1 CHD5
#> [3] chr1 6259438-6259439 - | GPR153_2 GPR153
#> [4] chr1 6261063-6261064 - | GPR153_1 GPR153
#> [5] chr1 6358928-6358929 - | ACOT7_4 ACOT7
#> ... ... ... ... . ... ...
#> [1450] chr1 246582721-246582722 + | CNST_3 CNST
#> [1451] chr1 246582721-246582722 + | CNST_3 CNST
#> [1452] chr1 246582721-246582722 + | CNST_3 CNST
#> [1453] chr1 246582721-246582722 + | CNST_3 CNST
#> [1454] chr1 246582721-246582722 + | CNST_3 CNST
#> GeneID baseMean logFC pVal pAdj
#> <character> <numeric> <numeric> <numeric> <numeric>
#> [1] ENSG00000116254.16 52.5996 1.503528 3.15116e-07 5.75016e-05
#> [2] ENSG00000116254.16 52.5996 1.503528 3.15116e-07 5.75016e-05
#> [3] ENSG00000158292.6 98.8953 1.869499 9.18081e-10 8.97545e-07
#> [4] ENSG00000158292.6 98.8953 1.869499 9.18081e-10 8.97545e-07
#> [5] ENSG00000097021.18 430.7171 0.354015 6.40784e-03 2.26227e-01
#> ... ... ... ... ... ...
#> [1450] ENSG00000162852.12 1165.88 0.48361 0.000456518 0.0326117
#> [1451] ENSG00000162852.12 1165.88 0.48361 0.000456518 0.0326117
#> [1452] ENSG00000162852.12 1165.88 0.48361 0.000456518 0.0326117
#> [1453] ENSG00000162852.12 1165.88 0.48361 0.000456518 0.0326117
#> [1454] ENSG00000162852.12 1165.88 0.48361 0.000456518 0.0326117
#> -------
#> seqinfo: 23 sequences from an unspecified genome; no seqlengths
myDegCreResList$CreGR[subjectHits(keepDegCreHits)]
#> GRanges object with 1454 ranges and 3 metadata columns:
#> seqnames ranges strand | logFC pVal pAdj
#> <Rle> <IRanges> <Rle> | <numeric> <numeric> <numeric>
#> [1] chr1 6157801-6158216 * | 6.06528 0.000925164 0.02104324
#> [2] chr1 5982949-5983094 * | 5.02309 0.015637885 0.12155357
#> [3] chr1 6259573-6259952 * | 5.14967 0.026711376 0.17136537
#> [4] chr1 6259573-6259952 * | 5.14967 0.026711376 0.17136537
#> [5] chr1 6385663-6386114 * | 6.95045 0.000177546 0.00954392
#> ... ... ... ... . ... ... ...
#> [1450] chr1 246583459-246584018 * | 5.095578 5.56120e-04 0.0172137
#> [1451] chr1 246570517-246570554 * | 0.737347 7.08808e-01 1.0000000
#> [1452] chr1 246608659-246609254 * | 3.355731 5.06296e-02 0.2651076
#> [1453] chr1 246617533-246617570 * | 0.822883 9.04117e-01 1.0000000
#> [1454] chr1 246617605-246618290 * | 5.616396 9.74797e-05 0.0067073
#> -------
#> seqinfo: 23 sequences from an unspecified genome
The choice of the threshold for DEG significance (DEG α) is important in DegCre calculations. Accordingly, DegCre can be run over a set of thresholds to determine an optimal value. Here we select DEG α by finding the value that maximizes a normalized Precision-Recall Area Under the Curve (AUC):
alphaOptList <- optimizeAlphaDegCre(DegGR=subDegGR,
DegP=subDegGR$pVal,
DegLfc=subDegGR$logFC,
CreGR=subCreGR,
CreP=subCreGR$pVal,
CreLfc=subCreGR$logFC,
reqEffectDirConcord=TRUE,
verbose=FALSE)
alphaOptList$alphaPRMat
#> alphaVal AUC deltaAUC normDeltaAUC
#> alpha_0.005 0.005 0.02226499 0.01486418 0.01497501
#> alpha_0.01 0.010 0.02358739 0.01570337 0.01582816
#> alpha_0.02 0.020 0.02805818 0.01909147 0.01926421
#> alpha_0.03 0.030 0.02897679 0.01955400 0.01974000
#> alpha_0.05 0.050 0.02764595 0.01865236 0.01882163
#> alpha_0.1 0.100 0.02628054 0.01779331 0.01794562
bestAlphaId <- which.max(alphaOptList$alphaPRMat[,4])
bestDegCreResList <- alphaOptList$degCreResListsByAlpha[[bestAlphaId]]
With an optimal DegCre result obtained, we can plot the association probabilities versus binned association distance to see the relationship between the two:
par(mai=c(0.8,1.6,0.4,0.1))
par(mgp=c(1.7,0.5,0))
par(cex.axis=1)
par(cex.lab=1.4)
par(bty="l")
par(lwd=2)
binStats <- plotDegCreAssocProbVsDist(degCreResList=bestDegCreResList,
assocProbFDRThresh=0.05)
The total number of associations passing a reasonable FDR (0.05 in the figure) drops fairly quickly as bin TSS to CRE distance increases (top panel). Similarly, the median association probability (black line in lower panel) drops with distance as well. The blue range is the interquartile range of the association probabilities. The red line in the above plot represents the null association probability, calculated as the probability if all CRE p-values in a given bin were identical.
The binning of the associations by distance is also important to the
operation of the DegCre algorithm. The choice of the bin size (number of
associations per bin) is done by an optimization heuristic that tries a
wide range of bin sizes and calculates the KS test statistic for each
bin’s CRE p-value distribution compared against the global (unbinned)
CRE p-value distribution. The algorithm finds the smallest bin size for
which the median KS-statistic is within a specified fraction,
fracMinKsMedianThresh
, of the range of that for all tested
bin sizes. We visualize the results of this process:
par(mai=c(0.8,0.6,0.4,0.1))
par(mgp=c(1.7,0.5,0))
par(cex.axis=1)
par(cex.lab=1.4)
par(bty="l")
par(lwd=1.5)
plotDegCreBinHeuristic(degCreResList=bestDegCreResList)
The red dot indicates the chosen bin size run with the default value of ‘fracMinKsMedianThresh = 0.2’.
Another way to evaluate the performance of the DegCre on a data set is to make a Precision-Recall (PR) curve. To calculate these values, we define as a perfect set of associations, i.e. one that would generate a PR Area Under the Curve (AUC) of 1, as containing associations to all significant DEGs with uniform association probabilities of 1. Such a set will not result from real data, but it serves as a useful benchmark. Comparison of relative values of PR AUCs for DegCre results will be most informative. We plot the PR curve for the example data:
par(mai=c(0.8,0.6,0.4,0.1))
par(mgp=c(1.7,0.5,0))
par(cex.axis=1)
par(cex.lab=1.4)
par(bty="l")
par(lwd=1.5)
degCrePRAUC(degCreResList=bestDegCreResList)
The black line is the performance of the DegCre associations and the red region is values obtained from random shuffles.
Since DegCre calculates association probabilities, one can estimate
the number of expected associations per significant DEG by summing the
total associations that pass a chosen FDR. DEGs with many expected
associations are more likely to have true associations. This can be done
with the getExpectAssocPerDEG()
function:
expectAssocPerDegDf <- getExpectAssocPerDEG(degCreResList=bestDegCreResList,
geneNameColName="GeneSymb",
assocAlpha=0.05)
The result is a DataFrame
with one entry per gene,
sorted by the expected number of DEGs, `expectAssocs’.
head(expectAssocPerDegDf)
#> DataFrame with 6 rows and 11 columns
#> promGeneName GeneSymb GeneID baseMean logFC pVal
#> <character> <character> <character> <numeric> <numeric> <numeric>
#> 1 ZBTB7B_2 ZBTB7B ENSG00000160685.12 442.688 0.956975 9.65209e-10
#> 2 NFIA_4 NFIA ENSG00000162599.14 2914.410 0.882264 8.26113e-12
#> 3 TGFBR3_3 TGFBR3 ENSG00000069702.9 1235.361 0.931604 1.59219e-17
#> 4 COA7_1 COA7 ENSG00000162377.5 1191.281 0.442608 7.74760e-06
#> 5 ZC3H12A_2 ZC3H12A ENSG00000163874.8 567.067 1.528620 1.71215e-34
#> 6 SHC1_2 SHC1 ENSG00000160691.17 7353.804 0.623721 1.76309e-08
#> pAdj expectAssocs nAssocs assocAlpha degAlpha
#> <numeric> <numeric> <numeric> <numeric> <numeric>
#> 1 2.46032e-06 10.31544 33 0.05 0.03
#> 2 2.10576e-08 8.92411 36 0.05 0.03
#> 3 4.05850e-14 6.89431 24 0.05 0.03
#> 4 1.97486e-02 6.83219 25 0.05 0.03
#> 5 4.36426e-31 6.72306 18 0.05 0.03
#> 6 4.49412e-05 6.55873 21 0.05 0.03
This result can be plotted as a histogram by the function
plotExpectedAssocsPerDeg()
:
par(mai=c(0.8,0.6,0.4,0.1))
par(mgp=c(1.7,0.5,0))
par(cex.axis=1)
par(cex.lab=1.4)
par(bty="l")
plotExpectedAssocsPerDeg(expectAssocPerDegDf=expectAssocPerDegDf)
The dashed line indicates the median expected associations per DEG.
One of the genes with a high number of expected associations and also
a known target of NR3C1 (Glucocorticoid receptor) is ERRFI1.
Our test data DexNR3C1
is derived from ChIP-seq of NR3C1 in
cells treated with the powerful glucocorticoid, dexamethasone, so it is
not surprising to find it with many expected associations. To visualize
these associations, we can make a browser plot of the associations and
underlying data with plotBrowserDegCre()
which relies on
the plotgardener
package.
browserOuts <- plotBrowserDegCre(degCreResList=bestDegCreResList,
geneName="ERRFI1",
geneNameColName="GeneSymb",
CreSignalName="NR3C1",
panelTitleFontSize=8,
geneLabelFontsize=10,
plotXbegin=0.9,
plotWidth=5)
We now want to zoom in closer to the TSS of ERRFI1. To do
this we specify the region we want to plot as a GRanges
of
length 1:
We then supply the GRanges
to the
plotRegionGR
argument of
plotBrowserDegCre()
:
zoomBrowserOuts <- plotBrowserDegCre(degCreResList=bestDegCreResList,
plotRegionGR=zoomGR,
geneNameColName="GeneSymb",
CreSignalName="NR3C1",
panelTitleFontSize=8,
geneLabelFontsize=10,
plotXbegin=0.9,
plotWidth=5)
Since we did not specify geneName
it is not highlighted.
We can force highlights by supplying a DataFrame
of gene
names and colors to the geneHighlightDf
argument. This
DataFrame
should conform to the requirements of
plotgardener::plotGenes()
.
geneHighDf <- data.frame(gene=bestDegCreResList$DegGR$GeneSymb,
col="gray")
geneHighDf[which(geneHighDf$gene=="ERRFI1"),2] <- "black"
Then geneHighDf
is supplied to the
geneHighlightDf
argument to produce:
The list of results that DegCre()
produces has many
useful features for downstream analysis. However, conversion to other
formats will be useful in some situations. The function
convertdegCreResListToGInteraction()
will convert the
DegCre list to a GInteractions
object from the InteractionSet
package:
degCreGInter <-
convertdegCreResListToGInteraction(degCreResList=bestDegCreResList,
assocAlpha=0.05)
degCreGInter
#> GInteractions object with 713 interactions and 20 metadata columns:
#> seqnames1 ranges1 seqnames2 ranges2 |
#> <Rle> <IRanges> <Rle> <IRanges> |
#> [1] chr1 6180123-6180124 --- chr1 6157801-6158216 |
#> [2] chr1 6259438-6259439 --- chr1 6259573-6259952 |
#> [3] chr1 6261063-6261064 --- chr1 6259573-6259952 |
#> [4] chr1 6419918-6419919 --- chr1 6412897-6413132 |
#> [5] chr1 6419918-6419919 --- chr1 6410395-6410720 |
#> ... ... ... ... ... ... .
#> [709] chr1 229626055-229626056 --- chr1 229585357-229585538 |
#> [710] chr1 229626055-229626056 --- chr1 229952647-229953008 |
#> [711] chr1 229626055-229626056 --- chr1 229273795-229274192 |
#> [712] chr1 229626055-229626056 --- chr1 230002525-230002670 |
#> [713] chr1 229626055-229626056 --- chr1 229221055-229222064 |
#> assocDist assocProb assocProbFDR rawAssocProb CreP DegP
#> <integer> <numeric> <numeric> <numeric> <numeric> <numeric>
#> [1] 21906 0.390791 3.72280e-12 0.390791 0.000925164 3.15116e-07
#> [2] 133 0.294180 7.15281e-09 0.294180 0.026711376 9.18081e-10
#> [3] 1110 0.294180 7.15281e-09 0.294180 0.026711376 9.18081e-10
#> [4] 6785 0.294180 6.34136e-09 0.294180 0.025518445 1.34403e-09
#> [5] 9197 0.256254 4.24725e-06 0.256254 0.276057988 1.34403e-09
#> ... ... ... ... ... ... ...
#> [709] 40516 0.263008 2.03990e-06 0.263008 0.131849573 1.50454e-06
#> [710] 326590 0.146415 4.08013e-02 0.146415 0.000816414 1.50454e-06
#> [711] 351862 0.130408 3.63957e-02 0.130408 0.017252454 1.50454e-06
#> [712] 376468 0.125265 2.68111e-02 0.125265 0.020953953 1.50454e-06
#> [713] 403990 0.140241 2.89873e-02 0.140241 0.000275504 1.50454e-06
#> DegPadj binAssocDist numObs distBinId Deg_promGeneName
#> <numeric> <numeric> <numeric> <integer> <character>
#> [1] 8.03231e-04 48582 2277 1 CHD5_1
#> [2] 2.34019e-06 48582 2277 1 GPR153_2
#> [3] 2.34019e-06 48582 2277 1 GPR153_1
#> [4] 3.42593e-06 48582 2277 1 HES2_1
#> [5] 3.42593e-06 48582 2277 1 HES2_1
#> ... ... ... ... ... ...
#> [709] 0.00383506 48582 2277 1 TAF5L_1
#> [710] 0.00383506 356289 2277 6 TAF5L_1
#> [711] 0.00383506 356289 2277 6 TAF5L_1
#> [712] 0.00383506 427723 2277 7 TAF5L_1
#> [713] 0.00383506 427723 2277 7 TAF5L_1
#> Deg_GeneSymb Deg_GeneID Deg_baseMean Deg_logFC Deg_pVal
#> <character> <character> <numeric> <numeric> <numeric>
#> [1] CHD5 ENSG00000116254.16 52.5996 1.50353 3.15116e-07
#> [2] GPR153 ENSG00000158292.6 98.8953 1.86950 9.18081e-10
#> [3] GPR153 ENSG00000158292.6 98.8953 1.86950 9.18081e-10
#> [4] HES2 ENSG00000069812.10 34.9662 2.05545 1.34403e-09
#> [5] HES2 ENSG00000069812.10 34.9662 2.05545 1.34403e-09
#> ... ... ... ... ... ...
#> [709] TAF5L ENSG00000135801.8 1120.88 0.459666 1.50454e-06
#> [710] TAF5L ENSG00000135801.8 1120.88 0.459666 1.50454e-06
#> [711] TAF5L ENSG00000135801.8 1120.88 0.459666 1.50454e-06
#> [712] TAF5L ENSG00000135801.8 1120.88 0.459666 1.50454e-06
#> [713] TAF5L ENSG00000135801.8 1120.88 0.459666 1.50454e-06
#> Deg_pAdj Cre_logFC Cre_pVal Cre_pAdj
#> <numeric> <numeric> <numeric> <numeric>
#> [1] 8.03231e-04 6.06528 0.000925164 0.0210432
#> [2] 2.34019e-06 5.14967 0.026711376 0.1713654
#> [3] 2.34019e-06 5.14967 0.026711376 0.1713654
#> [4] 3.42593e-06 3.27381 0.025518445 0.1662881
#> [5] 3.42593e-06 2.20441 0.276057988 0.7825750
#> ... ... ... ... ...
#> [709] 0.00383506 2.75821 0.131849573 0.5002293
#> [710] 0.00383506 6.04432 0.000816414 0.0202765
#> [711] 0.00383506 3.64238 0.017252454 0.1291889
#> [712] 0.00383506 4.93818 0.020953953 0.1462453
#> [713] 0.00383506 7.08167 0.000275504 0.0119885
#> -------
#> regions: 529 ranges and 0 metadata columns
#> seqinfo: 23 sequences from an unspecified genome
The DegCre results list can can also be converted to a
DataFrame
in roughly a BEDPE format. This is useful for
writing out DegCre results as text files:
degCreDf <- convertDegCreDataFrame(degCreResList=bestDegCreResList,
assocAlpha=0.05)
head(degCreDf)
#> Deg_chr Deg_start Deg_end Deg_strand Cre_chr Cre_start Cre_end Cre_strand
#> 1 chr1 6180123 6180124 - chr1 6157801 6158216 *
#> 2 chr1 6259438 6259439 - chr1 6259573 6259952 *
#> 3 chr1 6261063 6261064 - chr1 6259573 6259952 *
#> 4 chr1 6419918 6419919 - chr1 6412897 6413132 *
#> 5 chr1 6419918 6419919 - chr1 6410395 6410720 *
#> 6 chr1 6419918 6419919 - chr1 6385663 6386114 *
#> assocDist assocProb assocProbFDR rawAssocProb binAssocDist numObs distBinId
#> 1 21906 0.3907914 3.722800e-12 0.3907914 48582 2277 1
#> 2 133 0.2941803 7.152814e-09 0.2941803 48582 2277 1
#> 3 1110 0.2941803 7.152814e-09 0.2941803 48582 2277 1
#> 4 6785 0.2941803 6.341361e-09 0.2941803 48582 2277 1
#> 5 9197 0.2562537 4.247251e-06 0.2562537 48582 2277 1
#> 6 33803 0.4509649 1.902668e-09 0.4509649 48582 2277 1
#> Deg_promGeneName Deg_GeneSymb Deg_GeneID Deg_baseMean Deg_logFC
#> 1 CHD5_1 CHD5 ENSG00000116254.16 52.59957 1.503528
#> 2 GPR153_2 GPR153 ENSG00000158292.6 98.89534 1.869499
#> 3 GPR153_1 GPR153 ENSG00000158292.6 98.89534 1.869499
#> 4 HES2_1 HES2 ENSG00000069812.10 34.96616 2.055450
#> 5 HES2_1 HES2 ENSG00000069812.10 34.96616 2.055450
#> 6 HES2_1 HES2 ENSG00000069812.10 34.96616 2.055450
#> Deg_pVal Deg_pAdj Cre_logFC Cre_pVal Cre_pAdj
#> 1 3.151162e-07 8.032311e-04 6.065277 0.0009251642 0.021043243
#> 2 9.180810e-10 2.340188e-06 5.149673 0.0267113763 0.171365372
#> 3 9.180810e-10 2.340188e-06 5.149673 0.0267113763 0.171365372
#> 4 1.344031e-09 3.425935e-06 3.273808 0.0255184446 0.166288137
#> 5 1.344031e-09 3.425935e-06 2.204414 0.2760579875 0.782574964
#> 6 1.344031e-09 3.425935e-06 6.950452 0.0001775463 0.009543921
sessionInfo()
#> R version 4.4.2 (2024-10-31)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.1 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: Etc/UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats4 stats graphics grDevices utils datasets methods
#> [8] base
#>
#> other attached packages:
#> [1] magick_2.8.5
#> [2] DegCre_1.3.0
#> [3] org.Hs.eg.db_3.20.0
#> [4] TxDb.Hsapiens.UCSC.hg38.knownGene_3.20.0
#> [5] GenomicFeatures_1.59.1
#> [6] AnnotationDbi_1.69.0
#> [7] plotgardener_1.13.0
#> [8] InteractionSet_1.35.0
#> [9] SummarizedExperiment_1.37.0
#> [10] Biobase_2.67.0
#> [11] MatrixGenerics_1.19.0
#> [12] matrixStats_1.4.1
#> [13] GenomicRanges_1.59.1
#> [14] GenomeInfoDb_1.43.2
#> [15] IRanges_2.41.2
#> [16] S4Vectors_0.45.2
#> [17] BiocGenerics_0.53.3
#> [18] generics_0.1.3
#> [19] BiocStyle_2.35.0
#>
#> loaded via a namespace (and not attached):
#> [1] DBI_1.2.3 bitops_1.0-9 rlang_1.1.4
#> [4] magrittr_2.0.3 compiler_4.4.2 RSQLite_2.3.9
#> [7] reshape2_1.4.4 png_0.1-8 vctrs_0.6.5
#> [10] stringr_1.5.1 pkgconfig_2.0.3 crayon_1.5.3
#> [13] fastmap_1.2.0 XVector_0.47.0 utf8_1.2.4
#> [16] Rsamtools_2.23.1 rmarkdown_2.29 UCSC.utils_1.3.0
#> [19] strawr_0.0.92 purrr_1.0.2 bit_4.5.0.1
#> [22] xfun_0.49 zlibbioc_1.52.0 cachem_1.1.0
#> [25] jsonlite_1.8.9 blob_1.2.4 rhdf5filters_1.19.0
#> [28] DelayedArray_0.33.3 Rhdf5lib_1.29.0 BiocParallel_1.41.0
#> [31] parallel_4.4.2 R6_2.5.1 plyranges_1.27.0
#> [34] stringi_1.8.4 bslib_0.8.0 RColorBrewer_1.1-3
#> [37] rtracklayer_1.67.0 jquerylib_0.1.4 Rcpp_1.0.13-1
#> [40] knitr_1.49 splines_4.4.2 Matrix_1.7-1
#> [43] tidyselect_1.2.1 qvalue_2.39.0 abind_1.4-8
#> [46] yaml_2.3.10 codetools_0.2-20 curl_6.0.1
#> [49] plyr_1.8.9 lattice_0.22-6 tibble_3.2.1
#> [52] withr_3.0.2 KEGGREST_1.47.0 evaluate_1.0.1
#> [55] gridGraphics_0.5-1 Biostrings_2.75.2 pillar_1.9.0
#> [58] BiocManager_1.30.25 RCurl_1.98-1.16 ggplot2_3.5.1
#> [61] munsell_0.5.1 scales_1.3.0 glue_1.8.0
#> [64] maketools_1.3.1 tools_4.4.2 BiocIO_1.17.1
#> [67] sys_3.4.3 data.table_1.16.4 GenomicAlignments_1.43.0
#> [70] buildtools_1.0.0 fs_1.6.5 XML_3.99-0.17
#> [73] rhdf5_2.51.1 grid_4.4.2 colorspace_2.1-1
#> [76] GenomeInfoDbData_1.2.13 restfulr_0.0.15 cli_3.6.3
#> [79] fansi_1.0.6 S4Arrays_1.7.1 dplyr_1.1.4
#> [82] gtable_0.3.6 yulab.utils_0.1.8 sass_0.4.9
#> [85] digest_0.6.37 SparseArray_1.7.2 ggplotify_0.1.2
#> [88] rjson_0.2.23 memoise_2.0.1 htmltools_0.5.8.1
#> [91] lifecycle_1.0.4 httr_1.4.7 bit64_4.5.2