Title: | Probabilistic association of DEGs to CREs from differential data |
---|---|
Description: | DegCre generates associations between differentially expressed genes (DEGs) and cis-regulatory elements (CREs) based on non-parametric concordance between differential data. The user provides GRanges of DEG TSS and CRE regions with differential p-value and optionally log-fold changes and DegCre returns an annotated Hits object with associations and their calculated probabilities. Additionally, the package provides functionality for visualization and conversion to other formats. |
Authors: | Brian S. Roberts [aut, cre] |
Maintainer: | Brian S. Roberts <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.3.0 |
Built: | 2024-12-14 03:08:29 UTC |
Source: | https://github.com/bioc/DegCre |
Given a DegCre results list, this function calculates the odds-ratio of the association probability.
calcAssocProbOR(degCreResList, type = "adj")
calcAssocProbOR(degCreResList, type = "adj")
degCreResList |
List of DegCre results. |
type |
A character of either |
This function is similar to calcRawAssocProbOR and will mimic its
function when type = "raw"
. When type = "raw"
the
calculation operates on the "rawAssocProb"
metadata.
When type = "adj"
the calculation operates on the "assocProb"
metadata.
The OR is calculated relative to the distance bin null association
probability, which would happen if all CRE p-values were identical.
Thus it is a measure of the increase in association probability due to
CRE p-value information content over what would occur by random chance.
A numeric of the association probability odds-ratios
Brian S. Roberts
#Load required packages. library(GenomicRanges) #Load test data. data(DexNR3C1) subDegGR <- DexNR3C1$DegGR[which(GenomeInfoDb::seqnames(DexNR3C1$DegGR)=="chr1")] subCreGR <- DexNR3C1$CreGR[which(GenomeInfoDb::seqnames(DexNR3C1$CreGR)=="chr1")] #Generate DegCre results. degCreResListDexNR3C1 <- runDegCre(DegGR=subDegGR, DegP=subDegGR$pVal, DegLfc=subDegGR$logFC, CreGR=subCreGR, CreP=subCreGR$pVal, CreLfc=subCreGR$logFC) #Calculate odds-ratio. calcOR <- calcAssocProbOR(degCreResListDexNR3C1)
#Load required packages. library(GenomicRanges) #Load test data. data(DexNR3C1) subDegGR <- DexNR3C1$DegGR[which(GenomeInfoDb::seqnames(DexNR3C1$DegGR)=="chr1")] subCreGR <- DexNR3C1$CreGR[which(GenomeInfoDb::seqnames(DexNR3C1$CreGR)=="chr1")] #Generate DegCre results. degCreResListDexNR3C1 <- runDegCre(DegGR=subDegGR, DegP=subDegGR$pVal, DegLfc=subDegGR$logFC, CreGR=subCreGR, CreP=subCreGR$pVal, CreLfc=subCreGR$logFC) #Calculate odds-ratio. calcOR <- calcAssocProbOR(degCreResListDexNR3C1)
Given a DegCre results list, this function calculates the raw association probability odds ratio (OR) for each association.
calcRawAssocProbOR(degCreResList)
calcRawAssocProbOR(degCreResList)
degCreResList |
List of DegCre results. |
This function calculates the raw association probability odds ratio (OR) for each association in a DegCre analysis output. The OR is calculated relative to the distance bin null association probability, which would happen if all CRE p-values were identical. Thus it is a measure of the increase in association probability due to CRE p-value information content over what would occur by random chance.
A numeric vector of raw association probability odds ratios (OR) for each association.
Brian S. Roberts
#Load required packages. library(GenomicRanges) #Load test data. data(DexNR3C1) subDegGR <- DexNR3C1$DegGR[which(GenomeInfoDb::seqnames(DexNR3C1$DegGR)=="chr1")] subCreGR <- DexNR3C1$CreGR[which(GenomeInfoDb::seqnames(DexNR3C1$CreGR)=="chr1")] #Generate DegCre results. degCreResListDexNR3C1 <- runDegCre(DegGR=subDegGR, DegP=subDegGR$pVal, DegLfc=subDegGR$logFC, CreGR=subCreGR, CreP=subCreGR$pVal, CreLfc=subCreGR$logFC) #Calculate raw odds ratio. ORvec <- calcRawAssocProbOR(degCreResListDexNR3C1)
#Load required packages. library(GenomicRanges) #Load test data. data(DexNR3C1) subDegGR <- DexNR3C1$DegGR[which(GenomeInfoDb::seqnames(DexNR3C1$DegGR)=="chr1")] subCreGR <- DexNR3C1$CreGR[which(GenomeInfoDb::seqnames(DexNR3C1$CreGR)=="chr1")] #Generate DegCre results. degCreResListDexNR3C1 <- runDegCre(DegGR=subDegGR, DegP=subDegGR$pVal, DegLfc=subDegGR$logFC, CreGR=subCreGR, CreP=subCreGR$pVal, CreLfc=subCreGR$logFC) #Calculate raw odds ratio. ORvec <- calcRawAssocProbOR(degCreResListDexNR3C1)
Given a DegCre results list, this function finds associations between the same CRE and multiple TSSs of the same gene and keeps the nearest TSS only.
collapseDegCreToGene( degCreResList, method = "nearest", geneColname = "GeneSymb" )
collapseDegCreToGene( degCreResList, method = "nearest", geneColname = "GeneSymb" )
degCreResList |
List of DegCre results. |
method |
Method for choosing between multiple TSS. Currently only
supported for "nearest". (Default: |
geneColname |
The name of the metadata column in DegGR within
|
Often, the DegGR input to DegCre will contain multiple TSS's for a given gene. DegCre will create associations to all of them. Often, downstream analyses need to only have on CRE to TSS association per gene. This function modifies the DegCre Hits to keep the shortest (minimum) genomic distance association.
A degCreResList
with only the shortest TSS to CRE
association per gene.
Brian S. Roberts
#Load required packages. library(GenomicRanges) #Load test data. data(DexNR3C1) subDegGR <- DexNR3C1$DegGR[which(GenomeInfoDb::seqnames(DexNR3C1$DegGR)=="chr1")] subCreGR <- DexNR3C1$CreGR[which(GenomeInfoDb::seqnames(DexNR3C1$CreGR)=="chr1")] #Generate DegCre results. degCreResListDexNR3C1 <- runDegCre(DegGR=subDegGR, DegP=subDegGR$pVal, DegLfc=subDegGR$logFC, CreGR=subCreGR, CreP=subCreGR$pVal, CreLfc=subCreGR$logFC) #Convert to single TSS per association. degCreResListUniqTSS <- collapseDegCreToGene(degCreResListDexNR3C1, method = "nearest", geneColname = "GeneSymb")
#Load required packages. library(GenomicRanges) #Load test data. data(DexNR3C1) subDegGR <- DexNR3C1$DegGR[which(GenomeInfoDb::seqnames(DexNR3C1$DegGR)=="chr1")] subCreGR <- DexNR3C1$CreGR[which(GenomeInfoDb::seqnames(DexNR3C1$CreGR)=="chr1")] #Generate DegCre results. degCreResListDexNR3C1 <- runDegCre(DegGR=subDegGR, DegP=subDegGR$pVal, DegLfc=subDegGR$logFC, CreGR=subCreGR, CreP=subCreGR$pVal, CreLfc=subCreGR$logFC) #Convert to single TSS per association. degCreResListUniqTSS <- collapseDegCreToGene(degCreResListDexNR3C1, method = "nearest", geneColname = "GeneSymb")
Converts a degCreResList to a hit indexed GRanges with metadata columns of predicted gene target ("predictGene") and the association score ("score").
convDegCreResListToCreGeneScoreGR( degCreResList, scoreType = "assocProb", geneColname = "GeneSymb", onlyDEGs = TRUE, DEgAlpha = NULL, degPadjColname = "pAdj" )
convDegCreResListToCreGeneScoreGR( degCreResList, scoreType = "assocProb", geneColname = "GeneSymb", onlyDEGs = TRUE, DEgAlpha = NULL, degPadjColname = "pAdj" )
degCreResList |
A |
scoreType |
Character of one of |
geneColname |
The name of the metadata column in |
onlyDEGs |
Logical. If |
DEgAlpha |
Significance threshold for DEGs in associations to report.
(Default: |
degPadjColname |
The metadata column name in |
This function simplifies a degResList
to a
GRanges with metadata of the predicted associated gene
and a score of that prediction (predictScore
).
If scoreType = "assocProb"
, the score is the DegCre association
probability. By setting scoreType = "adjOR"
the odds-ratio of the
association probability is calculated with calcAssocProbOR and
reported. This can be useful to emphasize associations
that are greatly above background and will tend to weight longer distance
associations more heavily. By setting scoreType = "rawAssocProb"
the
raw association probability is used. This is the probability of association
without considering the probability that the involved DEG is truly
differential. This score may be useful when comparing to CRISPRi validation
in the absence of a perturbation.
A GRanges of CRE regions with metadata columns
of predictGene
and predictScore
.
Brian S. Roberts
#Load required packages. library(GenomicRanges) #Load test data. data(DexNR3C1) subDegGR <- DexNR3C1$DegGR[which(GenomeInfoDb::seqnames(DexNR3C1$DegGR)=="chr1")] subCreGR <- DexNR3C1$CreGR[which(GenomeInfoDb::seqnames(DexNR3C1$CreGR)=="chr1")] #Generate DegCre results. degCreResListDexNR3C1 <- runDegCre(DegGR=subDegGR, DegP=subDegGR$pVal, DegLfc=subDegGR$logFC, CreGR=subCreGR, CreP=subCreGR$pVal, CreLfc=subCreGR$logFC) #Convert to GR with assocProb as score. creToGeneGR <- convDegCreResListToCreGeneScoreGR(degCreResListDexNR3C1, scoreType="assocProb", geneColname="GeneSymb", onlyDEGs=TRUE, DEgAlpha=NULL, degPadjColname="pAdj")
#Load required packages. library(GenomicRanges) #Load test data. data(DexNR3C1) subDegGR <- DexNR3C1$DegGR[which(GenomeInfoDb::seqnames(DexNR3C1$DegGR)=="chr1")] subCreGR <- DexNR3C1$CreGR[which(GenomeInfoDb::seqnames(DexNR3C1$CreGR)=="chr1")] #Generate DegCre results. degCreResListDexNR3C1 <- runDegCre(DegGR=subDegGR, DegP=subDegGR$pVal, DegLfc=subDegGR$logFC, CreGR=subCreGR, CreP=subCreGR$pVal, CreLfc=subCreGR$logFC) #Convert to GR with assocProb as score. creToGeneGR <- convDegCreResListToCreGeneScoreGR(degCreResListDexNR3C1, scoreType="assocProb", geneColname="GeneSymb", onlyDEGs=TRUE, DEgAlpha=NULL, degPadjColname="pAdj")
Given a DegCre results list, this function converts it into a DataFrame for further analysis and export.
convertDegCreDataFrame(degCreResList, assocAlpha = 0.05)
convertDegCreDataFrame(degCreResList, assocAlpha = 0.05)
degCreResList |
List of DegCre results. |
assocAlpha |
The significance threshold for associations to be included
in the output (Default: |
This function takes a DegCre results list as input and extracts the
significant associations based on the adjusted p-values assocProbFDR
compared to the specified significance threshold assocAlpha
.
It then creates a DataFrame with the genomic coordinates
of the significant associations from both the DegGR
and CreGR
components of the input list.
These are marked as Deg_
or Cre_
with chr
, start
,
end
, and strand
.
The coordinates are followed by the metadata of the Hits
DataFramed by runDegCre.
These are then followed by all metadata columns in the input DegGR
or
CreGR
proceeded by either Deg_
or Cre_
in the colname.
If no associations pass the significance threshold,
the function returns NA
.
A DataFrame containing the significant associations that pass the specified significance threshold. It is roughly in BEDPE format.
Brian S. Roberts
#Load required packages. library(GenomicRanges) #Load test data. data(DexNR3C1) subDegGR <- DexNR3C1$DegGR[which(GenomeInfoDb::seqnames(DexNR3C1$DegGR)=="chr1")] subCreGR <- DexNR3C1$CreGR[which(GenomeInfoDb::seqnames(DexNR3C1$CreGR)=="chr1")] #Generate DegCre results. degCreResListDexNR3C1 <- runDegCre(DegGR=subDegGR, DegP=subDegGR$pVal, DegLfc=subDegGR$logFC, CreGR=subCreGR, CreP=subCreGR$pVal, CreLfc=subCreGR$logFC) #Create DataFrame. outDf <- convertDegCreDataFrame(degCreResList=degCreResListDexNR3C1, assocAlpha = 0.05) #Write out as text file. degCreDfFile <- tempfile(pattern="myDegCreResults",fileext=".tsv") write.table(outDf,file=degCreDfFile[1],sep="\t",row.names=FALSE,quote=FALSE) unlink(degCreDfFile[1])
#Load required packages. library(GenomicRanges) #Load test data. data(DexNR3C1) subDegGR <- DexNR3C1$DegGR[which(GenomeInfoDb::seqnames(DexNR3C1$DegGR)=="chr1")] subCreGR <- DexNR3C1$CreGR[which(GenomeInfoDb::seqnames(DexNR3C1$CreGR)=="chr1")] #Generate DegCre results. degCreResListDexNR3C1 <- runDegCre(DegGR=subDegGR, DegP=subDegGR$pVal, DegLfc=subDegGR$logFC, CreGR=subCreGR, CreP=subCreGR$pVal, CreLfc=subCreGR$logFC) #Create DataFrame. outDf <- convertDegCreDataFrame(degCreResList=degCreResListDexNR3C1, assocAlpha = 0.05) #Write out as text file. degCreDfFile <- tempfile(pattern="myDegCreResults",fileext=".tsv") write.table(outDf,file=degCreDfFile[1],sep="\t",row.names=FALSE,quote=FALSE) unlink(degCreDfFile[1])
Given a DegCre results list, this function converts it into a GInteractions object.
convertdegCreResListToGInteraction(degCreResList, assocAlpha = 0.05)
convertdegCreResListToGInteraction(degCreResList, assocAlpha = 0.05)
degCreResList |
List of DegCre results. |
assocAlpha |
The significance threshold for associations to be included
in the output (Default: |
This function takes a DegCre results list as input and extracts the
significant associations based on the assocProbFDR
compared to the
specified significance threshold assocAlpha
. It then creates a
GInteractions object metadata columns from the input
list.
If no associations pass the significance threshold, the function returns NA' and prints a message.
A GInteractions object containing the associations that pass the specified significance threshold.
The GInteractions object has same metadata columns as
the Hits returned from runDegCre with additional
columns.
These additional columns are every metadata column in the input DegGR
or CreGR
proceeded by either Deg_
or Cre_
in the
colname.
Brian S. Roberts
#Load required packages. library(GenomicRanges) #Load test data. data(DexNR3C1) subDegGR <- DexNR3C1$DegGR[which(GenomeInfoDb::seqnames(DexNR3C1$DegGR)=="chr1")] subCreGR <- DexNR3C1$CreGR[which(GenomeInfoDb::seqnames(DexNR3C1$CreGR)=="chr1")] #Generate DegCre results. degCreResListDexNR3C1 <- runDegCre(DegGR=subDegGR, DegP=subDegGR$pVal, DegLfc=subDegGR$logFC, CreGR=subCreGR, CreP=subCreGR$pVal, CreLfc=subCreGR$logFC) #Create GInteractions object. gInteractions <- convertdegCreResListToGInteraction(degCreResList=degCreResListDexNR3C1, assocAlpha = 0.01)
#Load required packages. library(GenomicRanges) #Load test data. data(DexNR3C1) subDegGR <- DexNR3C1$DegGR[which(GenomeInfoDb::seqnames(DexNR3C1$DegGR)=="chr1")] subCreGR <- DexNR3C1$CreGR[which(GenomeInfoDb::seqnames(DexNR3C1$CreGR)=="chr1")] #Generate DegCre results. degCreResListDexNR3C1 <- runDegCre(DegGR=subDegGR, DegP=subDegGR$pVal, DegLfc=subDegGR$logFC, CreGR=subCreGR, CreP=subCreGR$pVal, CreLfc=subCreGR$logFC) #Create GInteractions object. gInteractions <- convertdegCreResListToGInteraction(degCreResList=degCreResListDexNR3C1, assocAlpha = 0.01)
Probabilistic association of DEGs to CREs from differential data.
Maintainer: Brian S. Roberts [email protected] (ORCID)
Useful links:
Report bugs at https://github.com/brianSroberts/DegCre/issues
This function calculates the Precision-Recall Area Under the Curve (AUC) from a DegCre results list.
degCrePRAUC( degCreResList, makePlot = TRUE, nShuff = 100, alphaVal = degCreResList$alphaVal, nThresh = 200 )
degCrePRAUC( degCreResList, makePlot = TRUE, nShuff = 100, alphaVal = degCreResList$alphaVal, nThresh = 200 )
degCreResList |
A list of DegCre results. |
makePlot |
Logical indicating whether to generate a plot of the
Precision-Recall curve. (Default: |
nShuff |
Integer number of shuffles for no-skill curve.
(Default: |
alphaVal |
Numeric from 0 to 1 threshold alpha value of DEG
significance. (Default: alpha value from |
nThresh |
Integer number of threshold values for the Precision-Recall
curve. (Default: |
This function calculates the Precision-Recall curve and AUC based on the provided DegCre results. It also estimates the statistical significance of the AUC by shuffling the associations and calculating AUC for shuffled data. Note that the PR AUCs tend to be small (0.05-0.2). Under the calculation framework, a PR AUC of 1 could only be achieved from DegCre results in which every association involves a significant DEG and has an association probability of 1. This situation will never actually occur but serves as a theoretical optimum for comparison.
Invisibly, a list containing:
A matrix of actual True Positive Rate (TPR) and apparent Positive Predictive Value (PPV).
A matrix of shuffled TPR quantiles.
A matrix of shuffled PPV quantiles.
Numeric of the total Area Under the Curve (AUC) for the Precision-Recall curve.
Numeric of the difference in AUC between the actual curve and shuffled curves.
Numeric of the normalized difference in AUC.
Brian S. Roberts
#Load required packages. library(GenomicRanges) #Load sample data. data(DexNR3C1) subDegGR <- DexNR3C1$DegGR[which(GenomeInfoDb::seqnames(DexNR3C1$DegGR)=="chr1")] subCreGR <- DexNR3C1$CreGR[which(GenomeInfoDb::seqnames(DexNR3C1$CreGR)=="chr1")] #Generate DegCre results. degCreResListDexNR3C1 <- runDegCre(DegGR=subDegGR, DegP=subDegGR$pVal, DegLfc=subDegGR$logFC, CreGR=subCreGR, CreP=subCreGR$pVal, CreLfc=subCreGR$logFC) #Plot PR curve. degCrePRAUC(degCreResList=degCreResListDexNR3C1) #Get PR results with out plotting. prAUCList <- degCrePRAUC(degCreResList=degCreResListDexNR3C1, makePlot=FALSE)
#Load required packages. library(GenomicRanges) #Load sample data. data(DexNR3C1) subDegGR <- DexNR3C1$DegGR[which(GenomeInfoDb::seqnames(DexNR3C1$DegGR)=="chr1")] subCreGR <- DexNR3C1$CreGR[which(GenomeInfoDb::seqnames(DexNR3C1$CreGR)=="chr1")] #Generate DegCre results. degCreResListDexNR3C1 <- runDegCre(DegGR=subDegGR, DegP=subDegGR$pVal, DegLfc=subDegGR$logFC, CreGR=subCreGR, CreP=subCreGR$pVal, CreLfc=subCreGR$logFC) #Plot PR curve. degCrePRAUC(degCreResList=degCreResListDexNR3C1) #Get PR results with out plotting. prAUCList <- degCrePRAUC(degCreResList=degCreResListDexNR3C1, makePlot=FALSE)
DegCre input data for examples.
A named list with two slots: DegGR and CreGR.
GRanges of RNA-seq data. The coordinates reference TSS sites. It has the following mcols:
GRanges of differntial CRE data. The coordinates reference signal regions. It has the following mcols:
This is a list with two slots: DegGR and CreGR. This data was derived from work by McDowell et al. in which they generated RNA-seq and ChIP-seq data by treating A549 cells with dexamethasone at several time points. Specifically this is RNA-seq and NR3C1 ChIP-seq at four hours versus control.
Brian S. Roberts
https://genome.cshlp.org/content/28/9/1272
This function finds associations and distances between two sets of genomic regions.
getAssocDistHits(DegGR, CreGR, maxDist = 1e+06)
getAssocDistHits(DegGR, CreGR, maxDist = 1e+06)
DegGR |
A GRanges object representing DEG TSSs. |
CreGR |
A GRanges object representing the CREs. |
maxDist |
Integer value representing the maximum distance allowed for
associations. Regions further apart than this threshold will not be
associated. (Default: |
This function identifies associations between genomic regions from two
GenomicRanges objects (DegGR and CreGR) based on their spatial overlap
within a specified maximum distance threshold using
findOverlaps. It calculates the distances with
distance between associated regions and stores them
in the metadata of the Hits object.
Large values maxDist
will require more computational resources.
A Hits object containing associations and their distances between the genomic regions represented by DegGR and CreGR.
Brian S. Roberts
#Load sample data. data(DexNR3C1) # Get hits with association distances. hits <- getAssocDistHits(DegGR = DexNR3C1$DegGR, CreGR = DexNR3C1$CreGR, maxDist = 1e6)
#Load sample data. data(DexNR3C1) # Get hits with association distances. hits <- getAssocDistHits(DegGR = DexNR3C1$DegGR, CreGR = DexNR3C1$CreGR, maxDist = 1e6)
Calculates the null association probability for each distance bin in the DegCre analysis.
getDistBinNullAssocProb(degCreResList)
getDistBinNullAssocProb(degCreResList)
degCreResList |
A list of DegCre results. |
This function takes the results of the DegCre analysis and computes the null association probability for each unique distance bin. The null association probability represents the expected proportion of differentially expressed genes (DEGs) in each distance bin under the null hypothesis.
A matrix with these columns:
Numeric value representing the distance bin (TSS to CRE) in base pairs.
Numeric value representing the null association probability of the bin.
Brian S. Roberts
#Load required packages. library(GenomicRanges) #Load example data. data(DexNR3C1) subDegGR <- DexNR3C1$DegGR[which(GenomeInfoDb::seqnames(DexNR3C1$DegGR)=="chr1")] subCreGR <- DexNR3C1$CreGR[which(GenomeInfoDb::seqnames(DexNR3C1$CreGR)=="chr1")] #Generate DegCre results. degCreResListDexNR3C1 <- runDegCre(DegGR=subDegGR, DegP=subDegGR$pVal, DegLfc=subDegGR$logFC, CreGR=subCreGR, CreP=subCreGR$pVal, CreLfc=subCreGR$logFC) # Calculate null association probabilities. outNullMat <- getDistBinNullAssocProb(degCreResList = degCreResListDexNR3C1)
#Load required packages. library(GenomicRanges) #Load example data. data(DexNR3C1) subDegGR <- DexNR3C1$DegGR[which(GenomeInfoDb::seqnames(DexNR3C1$DegGR)=="chr1")] subCreGR <- DexNR3C1$CreGR[which(GenomeInfoDb::seqnames(DexNR3C1$CreGR)=="chr1")] #Generate DegCre results. degCreResListDexNR3C1 <- runDegCre(DegGR=subDegGR, DegP=subDegGR$pVal, DegLfc=subDegGR$logFC, CreGR=subCreGR, CreP=subCreGR$pVal, CreLfc=subCreGR$logFC) # Calculate null association probabilities. outNullMat <- getDistBinNullAssocProb(degCreResList = degCreResListDexNR3C1)
Calculates the expected associations per DEG (Differentially Expressed Gene).
getExpectAssocPerDEG(degCreResList, geneNameColName = NULL, assocAlpha = 0.05)
getExpectAssocPerDEG(degCreResList, geneNameColName = NULL, assocAlpha = 0.05)
degCreResList |
A list of DegCre results. |
geneNameColName |
Character value of the name of the column in DegGR
(that was inputted to runDegCre) that contains gene names. If NULL,
the function will attempt to automatically find the gene name column.
(Default: |
assocAlpha |
Numeric value from 0 to 1 specifying the significance
threshold for associations. (Default: |
This function calculates the expected associations per DEG based on DegCre
analysis results. It first filters significant associations based on the
provided association significance threshold (assocAlpha
) and then
computes the expected associations per gene. The function returns a
DataFrame with gene-level information, including expected
associations, number of associations, and significance thresholds.
A DataFrame with the all data in the input DegGR with these columns added:
Character values of gene names extracted from
geneNameColName
column (or column found if
geneNameColName = NULL
) in DegGR.
Numeric values of the expected associations per gene.
Integer value of the number of associations passing
assocAlpha
per gene.
Numeric value from 0 to 1 of input assocAlpha
Numeric value from 0 to 1 of the significance threshold
for DEGs. Obtained from degCreResList
Brian S. Roberts
#Load required packages. library(GenomicRanges) #Load example data. data(DexNR3C1) subDegGR <- DexNR3C1$DegGR[which(GenomeInfoDb::seqnames(DexNR3C1$DegGR)=="chr1")] subCreGR <- DexNR3C1$CreGR[which(GenomeInfoDb::seqnames(DexNR3C1$CreGR)=="chr1")] #Generate DegCre results. degCreResListDexNR3C1 <- runDegCre(DegGR=subDegGR, DegP=subDegGR$pVal, DegLfc=subDegGR$logFC, CreGR=subCreGR, CreP=subCreGR$pVal, CreLfc=subCreGR$logFC) # Get expected associations per DEG expectAssocsDf <- getExpectAssocPerDEG(degCreResList = degCreResListDexNR3C1, geneNameColName = "GeneSymb", assocAlpha = 0.05) head(expectAssocsDf)
#Load required packages. library(GenomicRanges) #Load example data. data(DexNR3C1) subDegGR <- DexNR3C1$DegGR[which(GenomeInfoDb::seqnames(DexNR3C1$DegGR)=="chr1")] subCreGR <- DexNR3C1$CreGR[which(GenomeInfoDb::seqnames(DexNR3C1$CreGR)=="chr1")] #Generate DegCre results. degCreResListDexNR3C1 <- runDegCre(DegGR=subDegGR, DegP=subDegGR$pVal, DegLfc=subDegGR$logFC, CreGR=subCreGR, CreP=subCreGR$pVal, CreLfc=subCreGR$logFC) # Get expected associations per DEG expectAssocsDf <- getExpectAssocPerDEG(degCreResList = degCreResListDexNR3C1, geneNameColName = "GeneSymb", assocAlpha = 0.05) head(expectAssocsDf)
Runs DegCre across a set of DEG alpha thesholds to find optimal performance.
optimizeAlphaDegCre( DegGR, DegP, DegLfc = NULL, CreGR, CreP, CreLfc = NULL, reqEffectDirConcord = TRUE, padjMethod = "bonferroni", maxDist = 1e+06, verbose = FALSE, smallestTestBinSize = 100, fracMinKsMedianThresh = 0.2, testedAlphaVals = c(0.005, 0.01, 0.02, 0.03, 0.05, 0.1), minNDegs = 5 )
optimizeAlphaDegCre( DegGR, DegP, DegLfc = NULL, CreGR, CreP, CreLfc = NULL, reqEffectDirConcord = TRUE, padjMethod = "bonferroni", maxDist = 1e+06, verbose = FALSE, smallestTestBinSize = 100, fracMinKsMedianThresh = 0.2, testedAlphaVals = c(0.005, 0.01, 0.02, 0.03, 0.05, 0.1), minNDegs = 5 )
DegGR |
A GRanges object of gene TSSs. Multiple TSSs per gene are allowed. |
DegP |
A numeric vector of differential expression p-values for genes
in |
DegLfc |
A numeric vector of log fold-change values of differential
expression for gene in |
CreGR |
A GRanges object of CRE regions. |
CreP |
A numeric vector differential signal p-values for regions in
|
CreLfc |
A numeric vector log fold-change values of differential
signal for regions in |
reqEffectDirConcord |
A logical whether to require concordance between
the effect direction between DEG and CRE differential values.
(Default: |
padjMethod |
A character value indicating the method for p-value
adjustment. Do not change from default under most circumstances. Can be any
method name accepted by |
maxDist |
An integer value specifying the maximum distance for
probability calculation of TSS to CRE associations. (Default: |
verbose |
A logical indicating whether to print messages of step
completion and algorithm results. (Default: |
smallestTestBinSize |
An integer value specifying the size
(number of elements) of the smallest distance bin to be considered in the
optimization algorithm. (Default: |
fracMinKsMedianThresh |
A numeric value between 0 and 1 specifying the
optimization criterion for the distance bin size algorithm (See Details).
(Default: |
testedAlphaVals |
A numeric vector of DEG alpha values to test
(Default: |
minNDegs |
An integer specifying minimum number of DEGs that pass
the lowest |
This function runs runDegCre for each value in testedAlphaVals
.
The performance at each tested alpha is evaluated with degCrePRAUC.
which generates a Precision-Recall curve based on the recovery rate of DEGs
by associations.
Various AUCs are calculated as performance metrics. Using the alpha with
the highest value of
normDeltaAUC
is recommended (see Examples).
A named list containing:
A matrix of Precision-Recall Area Under the Curve (AUC) values.
Named list of DegCre results lists indexed
by the testedAlphaVals
.
The columns of alphaPRMat
are:
Numeric vector of tested DEG alpha value.
Numeric vector of Area under the curve of a Precision-Recall (PR) curve based on associations recovering significant DEGs.
Numeric vector of PR AUC minus the AUC of the no-skill line.
Numeric vector of the value of deltaAUC
divided by one minus the no-skill AUC.
Brian S. Roberts
#Load required packages. library(GenomicRanges) #Load sample data. data(DexNR3C1) subDegGR <- DexNR3C1$DegGR[which(GenomeInfoDb::seqnames(DexNR3C1$DegGR)=="chr1")] subCreGR <- DexNR3C1$CreGR[which(GenomeInfoDb::seqnames(DexNR3C1$CreGR)=="chr1")] # Run DegCre over range of alpha values: alphaOptList <- optimizeAlphaDegCre(DegGR = subDegGR, DegP = subDegGR$pVal, DegLfc = subDegGR$logFC, CreGR = subCreGR, CreP = subCreGR$pVal, CreLfc = subCreGR$logFC) bestAlphaId <- which.max(alphaOptList$alphaPRMat[,4]) bestDegCreResList <- alphaOptList$degCreResListsByAlpha[[bestAlphaId]]
#Load required packages. library(GenomicRanges) #Load sample data. data(DexNR3C1) subDegGR <- DexNR3C1$DegGR[which(GenomeInfoDb::seqnames(DexNR3C1$DegGR)=="chr1")] subCreGR <- DexNR3C1$CreGR[which(GenomeInfoDb::seqnames(DexNR3C1$CreGR)=="chr1")] # Run DegCre over range of alpha values: alphaOptList <- optimizeAlphaDegCre(DegGR = subDegGR, DegP = subDegGR$pVal, DegLfc = subDegGR$logFC, CreGR = subCreGR, CreP = subCreGR$pVal, CreLfc = subCreGR$logFC) bestAlphaId <- which.max(alphaOptList$alphaPRMat[,4]) bestDegCreResList <- alphaOptList$degCreResListsByAlpha[[bestAlphaId]]
Creates browser plots of specified genomic regions or gene regions based on the provided DegCre analysis results.
plotBrowserDegCre( degCreResList, assocAlpha = 0.05, browserWinPad = 1000, geneName = NULL, plotRegionGR = NULL, CreSignalName = "CRE", assembly = "hg38", plotWidth = grDevices::dev.size("in")[1], plotHeight = grDevices::dev.size("in")[2], plotXbegin = 0.9, mergeGenePromotersDist = 1000, sigPlotMaxY = 4, assocColorRange = NULL, lowAssocColor = "#88CCEE", hiAssocColor = "#CC6677", signalColor = "#DDCC77", geneLabelFontsize = 8, axisFontSize = 6, panelTitleFontSize = 7, geneNameColName = NULL, geneHighlightDf = NULL, dePrioritizeSmallRNA = TRUE, useLogFC = TRUE, creSignalBinRes = 100 )
plotBrowserDegCre( degCreResList, assocAlpha = 0.05, browserWinPad = 1000, geneName = NULL, plotRegionGR = NULL, CreSignalName = "CRE", assembly = "hg38", plotWidth = grDevices::dev.size("in")[1], plotHeight = grDevices::dev.size("in")[2], plotXbegin = 0.9, mergeGenePromotersDist = 1000, sigPlotMaxY = 4, assocColorRange = NULL, lowAssocColor = "#88CCEE", hiAssocColor = "#CC6677", signalColor = "#DDCC77", geneLabelFontsize = 8, axisFontSize = 6, panelTitleFontSize = 7, geneNameColName = NULL, geneHighlightDf = NULL, dePrioritizeSmallRNA = TRUE, useLogFC = TRUE, creSignalBinRes = 100 )
degCreResList |
List of DegCre results. |
assocAlpha |
Numeric value from 0 to 1 of significance threshold for
associations. (Default: |
browserWinPad |
Numeric value of the padding size (in base pairs) to
extend the plotting region. (Default: |
geneName |
Character of name of the gene of interest. If specified,
the function will plot the region associated with this gene.
(Default: |
plotRegionGR |
GRanges of length 1 specifying the
region to plot. If provided, geneName is ignored. (Default: |
CreSignalName |
Character of name of the differential CRE signal track.
For plot labeling purposes only (Default: |
assembly |
Character of genome assembly name, e.g., "hg38". Must be one
of accepted inputs to |
plotWidth |
Numeric value of width of the browser plot in inches.
(Default: |
plotHeight |
Numeric value of height of the browser plot in inches.
(Default: |
plotXbegin |
Numeric value of the width of the left margin (where
track annotations will be) in inches. (Default: |
mergeGenePromotersDist |
Maximum distance (in base pairs) for merging
promoters of the same gene in plot. (Default: |
sigPlotMaxY |
Numeric value of maximum value for the CRE differential
signal plot (Y-axis). (Default: |
assocColorRange |
Numeric vector of values from 0 to 1 of length 2.
These values specify the lower and upper values of DegCre association
probabilities for color saturation for arch color. (Default: |
lowAssocColor |
Character color for low saturation point of association
probabilities in arches plot. (Default: |
hiAssocColor |
Character color for high saturation point of association
probabilities in arches plot. (Default: |
signalColor |
Character color for the CRE differential signal plot.
(Default: |
geneLabelFontsize |
Numeric of font size (as implemented in
plotgardener)
for gene labels. (Default: |
axisFontSize |
Numeric of font size for axis labels and tick marks.
(Default: |
panelTitleFontSize |
Numeric of font size for panel titles.
(Default: |
geneNameColName |
Character of name of the column in DegGR metadata
that was inputted to runDegCre that contains gene names to query by
|
geneHighlightDf |
DataFrame specifying genes to
highlight in the plot as accepted by plotGenes argument
|
dePrioritizeSmallRNA |
Logical, indicating whether small RNA genes
should be deprioritized in plotting. (Default: |
useLogFC |
Logical, indicating whether to use log-fold change values
for the CRE differential signal. (Default: |
creSignalBinRes |
Bin resolution in base pairs for the CRE signal
track. Only used for initial calculation and will likely differ from display
resolution. (Default: |
This function uses
plotgardener
functionality to generate browser plots for visualizing DegCre analysis
results in specified regions. The user can input genomic regions or gene
names.
The output plot consists of an arches plot made with
plotPairsArches of DegCre associations colored by
association probability.
Below is a signal track plot made via plotSignal of the
data in CreGR
that was inputted to runDegCre.
This plot displays the signed negative log p-value, meaning the
multiplied by the sign of the the CRE log
fold-change.
Beneath this panel genomic coordinates via
plotGenomeLabel and gene models via
plotGenes are displayed.
Invisibly, a named list containing:
GRanges of the plotted region.
GRanges of the CRE signal (signed negative log CRE p-value) across the plotted region.
GInteractions of the DegCre associations in the plotted region.
Brian S. Roberts
#Load required packages. library(GenomicRanges) #Load example data. data(DexNR3C1) subDegGR <- DexNR3C1$DegGR[which(GenomeInfoDb::seqnames(DexNR3C1$DegGR)=="chr1")] subCreGR <- DexNR3C1$CreGR[which(GenomeInfoDb::seqnames(DexNR3C1$CreGR)=="chr1")] #Generate DegCre results. degCreResListDexNR3C1 <- runDegCre(DegGR=subDegGR, DegP=subDegGR$pVal, DegLfc=subDegGR$logFC, CreGR=subCreGR, CreP=subCreGR$pVal, CreLfc=subCreGR$logFC) #Make browser plot from specified gene name. browserOuts <- plotBrowserDegCre(degCreResList=degCreResListDexNR3C1, geneName="ERRFI1", geneNameColName="GeneSymb", CreSignalName="NR3C1") dev.off() #Make plot of specified region. zoomGR <- GenomicRanges::GRanges(seqnames="chr1", ranges=IRanges(start=7900e3,end=8400e3)) zoomedBrowserOuts <- plotBrowserDegCre(degCreResList=degCreResListDexNR3C1, plotRegionGR=zoomGR, geneNameColName="GeneSymb", CreSignalName="NR3C1") dev.off()
#Load required packages. library(GenomicRanges) #Load example data. data(DexNR3C1) subDegGR <- DexNR3C1$DegGR[which(GenomeInfoDb::seqnames(DexNR3C1$DegGR)=="chr1")] subCreGR <- DexNR3C1$CreGR[which(GenomeInfoDb::seqnames(DexNR3C1$CreGR)=="chr1")] #Generate DegCre results. degCreResListDexNR3C1 <- runDegCre(DegGR=subDegGR, DegP=subDegGR$pVal, DegLfc=subDegGR$logFC, CreGR=subCreGR, CreP=subCreGR$pVal, CreLfc=subCreGR$logFC) #Make browser plot from specified gene name. browserOuts <- plotBrowserDegCre(degCreResList=degCreResListDexNR3C1, geneName="ERRFI1", geneNameColName="GeneSymb", CreSignalName="NR3C1") dev.off() #Make plot of specified region. zoomGR <- GenomicRanges::GRanges(seqnames="chr1", ranges=IRanges(start=7900e3,end=8400e3)) zoomedBrowserOuts <- plotBrowserDegCre(degCreResList=degCreResListDexNR3C1, plotRegionGR=zoomGR, geneNameColName="GeneSymb", CreSignalName="NR3C1") dev.off()
Plots the DegCre association probability against binned genomic distance and highlights the quantile range.
plotDegCreAssocProbVsDist( degCreResList, assocProbFDRThresh = 0.05, plotQRange = c(0.25, 0.75), hiYLim = NULL, loYLim = NULL, qRangeFillColor = "#88CCEE", nullLineColor = "#CC6677" )
plotDegCreAssocProbVsDist( degCreResList, assocProbFDRThresh = 0.05, plotQRange = c(0.25, 0.75), hiYLim = NULL, loYLim = NULL, qRangeFillColor = "#88CCEE", nullLineColor = "#CC6677" )
degCreResList |
A list of DegCre results. |
assocProbFDRThresh |
Numeric value from 0 to 1 specifying the FDR
threshold for association probability. (Default: |
plotQRange |
Numeric vector of quantile range for plotting
(e.g., |
hiYLim |
Numeric value specifying the upper limit of the y-axis.
(Default: |
loYLim |
Numeric value specifying the lower limit of the y-axis.
(Default: |
qRangeFillColor |
Color for filling the quantile range polygon.
(Default: |
nullLineColor |
Color for the null association probability line.
(Default: |
This function takes the results of the DegCre analysis, including genomic
distances and association probabilities, and creates a plot of association
probabilities against binned genomic distances. It highlights the quantile
range (e.g., interquartile range) and includes a line for null association
probabilities.
The top panel shows the number of associations passing
assocProbFDRThresh
. The bottom panel shows the median FDR-passing
association probability as a black line, with the specified quantile range
(defaults to interquartile) plotted as qRangeFillColor
region.
The nullLineColor
colored line is the null association probability,
that is the association probability for a bin with uniform CRE p-values.
Invisibly, a matrix with these columns:
Numeric value of the midpoint distance of the bin (TSS to CRE) in kb.
plotQRange[1] x 100
>Numeric value of lower bound of the highlight region.
Numeric value of plotted line.
plotQRange[2] x 100
>Numeric value of upper bound of the highlight region.
Numeric of null association probability of the bin.
Brian S. Roberts
#Load required packages. library(GenomicRanges) #Load example data. data(DexNR3C1) subDegGR <- DexNR3C1$DegGR[which(GenomeInfoDb::seqnames(DexNR3C1$DegGR)=="chr1")] subCreGR <- DexNR3C1$CreGR[which(GenomeInfoDb::seqnames(DexNR3C1$CreGR)=="chr1")] #Generate DegCre results. degCreResListDexNR3C1 <- runDegCre(DegGR=subDegGR, DegP=subDegGR$pVal, DegLfc=subDegGR$logFC, CreGR=subCreGR, CreP=subCreGR$pVal, CreLfc=subCreGR$logFC) #Plot association probability versus binned genomic distance. outProbVsDistMat <- plotDegCreAssocProbVsDist(degCreResList=degCreResListDexNR3C1)
#Load required packages. library(GenomicRanges) #Load example data. data(DexNR3C1) subDegGR <- DexNR3C1$DegGR[which(GenomeInfoDb::seqnames(DexNR3C1$DegGR)=="chr1")] subCreGR <- DexNR3C1$CreGR[which(GenomeInfoDb::seqnames(DexNR3C1$CreGR)=="chr1")] #Generate DegCre results. degCreResListDexNR3C1 <- runDegCre(DegGR=subDegGR, DegP=subDegGR$pVal, DegLfc=subDegGR$logFC, CreGR=subCreGR, CreP=subCreGR$pVal, CreLfc=subCreGR$logFC) #Plot association probability versus binned genomic distance. outProbVsDistMat <- plotDegCreAssocProbVsDist(degCreResList=degCreResListDexNR3C1)
Plots the DegCre distance bin optimization statistic against different bin sizes, highlighting the optimal bin size.
plotDegCreBinHeuristic(degCreResList)
plotDegCreBinHeuristic(degCreResList)
degCreResList |
A list of DegCre results. |
This function takes a DegCre results list and plots the bin heuristic statistics against different bin sizes. It also highlights the optimal bin size chosen based on the analysis. The y-axis of the plot is the median KS statistic of all bins versus the global CRE p-value distribution.
Invisibly, the picked optimal bin size.
Brian S. Roberts
distBinHeuristic for calculating the DEG-CRE bin heuristic.
#Load required packages. library(GenomicRanges) #Load example data. data(DexNR3C1) subDegGR <- DexNR3C1$DegGR[which(GenomeInfoDb::seqnames(DexNR3C1$DegGR)=="chr1")] subCreGR <- DexNR3C1$CreGR[which(GenomeInfoDb::seqnames(DexNR3C1$CreGR)=="chr1")] #Generate DegCre results. degCreResListDexNR3C1 <- runDegCre(DegGR=subDegGR, DegP=subDegGR$pVal, DegLfc=subDegGR$logFC, CreGR=subCreGR, CreP=subCreGR$pVal, CreLfc=subCreGR$logFC) #Plot distance bin median KS statistic curve. plotDegCreBinHeuristic(degCreResList=degCreResListDexNR3C1)
#Load required packages. library(GenomicRanges) #Load example data. data(DexNR3C1) subDegGR <- DexNR3C1$DegGR[which(GenomeInfoDb::seqnames(DexNR3C1$DegGR)=="chr1")] subCreGR <- DexNR3C1$CreGR[which(GenomeInfoDb::seqnames(DexNR3C1$CreGR)=="chr1")] #Generate DegCre results. degCreResListDexNR3C1 <- runDegCre(DegGR=subDegGR, DegP=subDegGR$pVal, DegLfc=subDegGR$logFC, CreGR=subCreGR, CreP=subCreGR$pVal, CreLfc=subCreGR$logFC) #Plot distance bin median KS statistic curve. plotDegCreBinHeuristic(degCreResList=degCreResListDexNR3C1)
Plots a histogram of the expected number of associations per DEG (Differentially Expressed Gene) based on DegCre analysis.
plotExpectedAssocsPerDeg( expectAssocPerDegDf, barOutlineColor = "#88CCEE", barFillColor = NULL, extraText = FALSE )
plotExpectedAssocsPerDeg( expectAssocPerDegDf, barOutlineColor = "#88CCEE", barFillColor = NULL, extraText = FALSE )
expectAssocPerDegDf |
DataFrame output of getExpectAssocPerDEG. |
barOutlineColor |
Color for the outline of the histogram bars.
(Default: |
barFillColor |
Fill color for the histogram bars. If |
extraText |
Logical, indicating whether additional text information (Details) should be added to the plot. |
This function generates a histogram of the expected number of associations per DEG and optionally adds additional text information to the plot, such as DEG FDR, association FDR, and the fraction of DEGs with at least one association. Plot displays a dashed line a value indicating the median expected associations per DEG.
Invisibly, the median expected associations per DEG.
Brian S. Roberts
#Load required packages. library(GenomicRanges) #Load example data. data(DexNR3C1) subDegGR <- DexNR3C1$DegGR[which(GenomeInfoDb::seqnames(DexNR3C1$DegGR)=="chr1")] subCreGR <- DexNR3C1$CreGR[which(GenomeInfoDb::seqnames(DexNR3C1$CreGR)=="chr1")] #Generate DegCre results. degCreResListDexNR3C1 <- runDegCre(DegGR=subDegGR, DegP=subDegGR$pVal, DegLfc=subDegGR$logFC, CreGR=subCreGR, CreP=subCreGR$pVal, CreLfc=subCreGR$logFC) # Generate data frame of expected associations per DEG expectAssocPerDegDf <- getExpectAssocPerDEG(degCreResList = degCreResListDexNR3C1, geneNameColName = "GeneSymb", assocAlpha = 0.05) # Plot histogram of expected associations per DEG medianExpAssocs <- plotExpectedAssocsPerDeg(expectAssocPerDegDf, barOutlineColor = "blue", extraText = TRUE)
#Load required packages. library(GenomicRanges) #Load example data. data(DexNR3C1) subDegGR <- DexNR3C1$DegGR[which(GenomeInfoDb::seqnames(DexNR3C1$DegGR)=="chr1")] subCreGR <- DexNR3C1$CreGR[which(GenomeInfoDb::seqnames(DexNR3C1$CreGR)=="chr1")] #Generate DegCre results. degCreResListDexNR3C1 <- runDegCre(DegGR=subDegGR, DegP=subDegGR$pVal, DegLfc=subDegGR$logFC, CreGR=subCreGR, CreP=subCreGR$pVal, CreLfc=subCreGR$logFC) # Generate data frame of expected associations per DEG expectAssocPerDegDf <- getExpectAssocPerDEG(degCreResList = degCreResListDexNR3C1, geneNameColName = "GeneSymb", assocAlpha = 0.05) # Plot histogram of expected associations per DEG medianExpAssocs <- plotExpectedAssocsPerDeg(expectAssocPerDegDf, barOutlineColor = "blue", extraText = TRUE)
Create DEG to CRE associations from differential data.
runDegCre( DegGR, DegP, DegLfc = NULL, CreGR, CreP, CreLfc = NULL, reqEffectDirConcord = TRUE, padjMethod = "qvalue", maxDist = 1e+06, verbose = TRUE, smallestTestBinSize = 100, fracMinKsMedianThresh = 0.2, alphaVal = 0.01, binNOverride = NULL )
runDegCre( DegGR, DegP, DegLfc = NULL, CreGR, CreP, CreLfc = NULL, reqEffectDirConcord = TRUE, padjMethod = "qvalue", maxDist = 1e+06, verbose = TRUE, smallestTestBinSize = 100, fracMinKsMedianThresh = 0.2, alphaVal = 0.01, binNOverride = NULL )
DegGR |
A GRanges object of gene TSSs. Multiple TSSs per gene are allowed. |
DegP |
A numeric vector of differential expression p-values for genes
in |
DegLfc |
A numeric vector of log fold-change values of differential
expression for gene in |
CreGR |
A GRanges object of CRE regions. |
CreP |
A numeric vector differential signal p-values for regions in
|
CreLfc |
A numeric vector log fold-change values of differential signal
for regions in |
reqEffectDirConcord |
A logical whether to require concordance between
the effect direction between DEG and CRE differential values.
(Default: |
padjMethod |
A character value indicating the method for p-value
adjustment. Do not change from default under most circumstances. Can be any
method name accepted by p.adjust (Default: |
maxDist |
An integer value specifying the maximum distance for
probability calculation of TSS to CRE associations. (Default: |
verbose |
A logical indicating whether to print messages of step
completion and algorithm results. (Default: |
smallestTestBinSize |
An integer value specifying the size
(number of elements) of the smallest distance bin to be considered in the
optimization algorithm. (Default: |
fracMinKsMedianThresh |
A numeric value between 0 and 1 specifying the
optimization criterion for the distance bin size algorithm (See Details).
(Default: |
alphaVal |
A numeric value between 0 and 1 specifying the alpha value
for DEG significance. (Default: |
binNOverride |
An integer value specifying the number of elements per
distance bin. When specified, overrides distance bin size optimization
(Not recommended). (Default: |
The DegCre algorithm considers experimental data from a perturbation experiment and produces associations between cis-regulatory elements (CREs) and differentially expressed genes (DEGs). The user provides differential expression data such as RNA-seq, and differential regulatory signal data such as ATAC-seq, DNase Hypersensitivity, and ChIP-seq. For RNA-seq analysis, we suggest methods such as DESeq2 or edgeR. For the analysis of differential regulatory data we recommend csaw. As an example experiment, we use data from McDowell et al. (PMID = 30097539) in which A549 cells were treated with dexamethasone and control. RNA-seq and ChIP-seq data were collected at various time points.
A complete description of the mathematical basis of the DegCre core
algorithms is provided in
DegCre bioRxiv.
DegCre takes two inputs. The first is a GRanges of p-values and optionally
log fold-changes associated with DEG TSSs.
The second input is a GRanges of differential signal p-values and optionally
log fold-changes for CRE regions.
DegCre generates a Hits object of all associations between
DEG TSSs and CREs within maxDist
.
Associations are then binned by TSS-to-CRE distance according to an
algorithm that balances resolution (many bins with few members)
versus minimization of the deviance of each bin's CRE p-value distribution
from the global distribution, seleting an optimal bin size.
Next, DegCre applies a non-parametric algorithm to find concordance between and CRE differential effects within bins and derives an association probability. For all association probabilities involving one given CRE, the probabilities are adjusted to favor associations across shorter distances. An FDR of the association probability is then estimated. Results are returned in list containing a Hits object and both input GRanges.
A named list containing:
A Hits object with metadata.
The queryHits of
degCreHits
reference DegGR
.
The subjectHits of
degCreHits
reference CreGR
List of outputs from the distance binning algorithm.
Numeric alpha value used for DEG significance threshold.
GRanges of input DegGR
with
added metadata columns "pVal", "pAdj",and possibly "logFC"
if reqEffectDirConcord==TRUE
. Will overwrite existing metadata
with same colnames.
GRanges of input CreGR
with
added metadata columns "pVal", "pAdj",and possibly "logFC" if
reqEffectDirConcord==TRUE
. Will overwrite existing metadata with
same colnames.
The degCreHits Hits object metadata has these columns:
Integer of distance in base pairs between the TSS and CRE for the association.
Numeric from 0 to 1 of association probability.
Numeric from 0 to 1 of False discovery rate of the association probability exceeding distance only null.
Numeric from 0 to 1 of association probability not adjusted for DEG significance or shorter associations involving this CRE.
Numeric of differential p-value of the CRE.
Numeric of differential p-value of the DEG.
Numeric of differential adjusted p-value of the DEG.
Integer of the maximum association distance cutoff for the bin containing the association.
Integer number of associations in the distance bin containing the association.
Integer that uniquely identifies the distance containing the association.
Brian S. Roberts
#Load required packages. library(GenomicRanges) #Load sample data. data(DexNR3C1) subDegGR <- DexNR3C1$DegGR[which(GenomeInfoDb::seqnames(DexNR3C1$DegGR)=="chr1")] subCreGR <- DexNR3C1$CreGR[which(GenomeInfoDb::seqnames(DexNR3C1$CreGR)=="chr1")] #With defaults. degCreResListDexNR3C1 <- runDegCre(DegGR=subDegGR, DegP=subDegGR$pVal, DegLfc=subDegGR$logFC, CreGR=subCreGR, CreP=subCreGR$pVal, CreLfc=subCreGR$logFC) #With custom settings. modDegCreResList <- runDegCre(DegGR=subDegGR, DegP=subDegGR$pVal, CreGR=subCreGR, CreP=subCreGR$pVal, reqEffectDirConcord=FALSE, maxDist=1e5, alphaVal=0.001)
#Load required packages. library(GenomicRanges) #Load sample data. data(DexNR3C1) subDegGR <- DexNR3C1$DegGR[which(GenomeInfoDb::seqnames(DexNR3C1$DegGR)=="chr1")] subCreGR <- DexNR3C1$CreGR[which(GenomeInfoDb::seqnames(DexNR3C1$CreGR)=="chr1")] #With defaults. degCreResListDexNR3C1 <- runDegCre(DegGR=subDegGR, DegP=subDegGR$pVal, DegLfc=subDegGR$logFC, CreGR=subCreGR, CreP=subCreGR$pVal, CreLfc=subCreGR$logFC) #With custom settings. modDegCreResList <- runDegCre(DegGR=subDegGR, DegP=subDegGR$pVal, CreGR=subCreGR, CreP=subCreGR$pVal, reqEffectDirConcord=FALSE, maxDist=1e5, alphaVal=0.001)