Title: | Search for correlation between epigenetic signals and gene expression in TADs |
---|---|
Description: | The package is focused on the detection of correlation between expressed genes and selected epigenomic signals (i.e. enhancers obtained from ChIP-seq data) either within topologically associated domains (TADs) or between chromatin contact loop anchors. Various parameters can be controlled to investigate the influence of external factors and visualization plots are available for each analysis step. |
Authors: | Konstantin Okonechnikov, Serap Erkek, Lukas Chavez |
Maintainer: | Konstantin Okonechnikov <[email protected]> |
License: | GPL (>=2) |
Version: | 1.27.0 |
Built: | 2024-10-30 07:34:13 UTC |
Source: | https://github.com/bioc/InTAD |
This function combines signals and genes inside Topologically Associated Domains (TADs)
combineInTAD(object, tadGR, selMaxTadOvlp = TRUE, closestGene = TRUE)
object | InTADSig object |
tadGR | TAD genomic regions |
selMaxTadOvlp | If a signal overlaps two or more TADs, by default only the single TAD with the maximum overlap is selected. All overlaps can be included by deactivating this option. |
closestGene | By default, the genes closest to the TAD are selected based on TSS location. Deactivate this option to use only genes lying within the TAD. |
Each signal is checked to determine whether it lies inside a TAD. Signals outside of TADs are ignored. The genomic regions representing gene coordinates are converted to TSS. By default, the closest genes are assigned to the TAD. If this option is deactivated, only genes lying within the TAD are collected. The result is a list of signals connected to tables with gene details.
Updated InTADSig object containing genes connected to each signal
# create sigInTAD object
inTadSig <- newSigInTAD(enhSel, enhSelGR, rpkmCountsSel, txsSel)
# combine signals and genes in TAD
inTadSig <- combineInTAD(inTadSig, tadGR)
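The maximum-overlap rule described for selMaxTadOvlp can be illustrated with a minimal base R sketch. It is not the package's implementation: the helper overlapWidth and the toy coordinates are hypothetical, and plain numeric intervals on a single chromosome stand in for GRanges objects.

```r
# Hypothetical helper (not part of InTAD): width of the overlap between one
# closed interval [qStart, qEnd] and a set of closed intervals.
overlapWidth <- function(qStart, qEnd, starts, ends) {
  pmax(0, pmin(qEnd, ends) - pmax(qStart, starts) + 1)
}

# Toy TADs on a single chromosome (made-up coordinates)
tads <- data.frame(name  = c("tadA", "tadB", "tadC"),
                   start = c(1, 1001, 2001),
                   end   = c(1000, 2000, 3000))

# A signal that straddles the boundary between tadA and tadB
sigStart <- 901
sigEnd   <- 1300

widths <- overlapWidth(sigStart, sigEnd, tads$start, tads$end)
# widths: tadA = 100 (901..1000), tadB = 300 (1001..1300), tadC = 0

# Analogue of selMaxTadOvlp = FALSE: keep every overlapping TAD
allOverlaps <- tads$name[widths > 0]

# Analogue of selMaxTadOvlp = TRUE: keep only the TAD with the largest overlap
maxOverlap <- tads$name[which.max(widths)]
```

In this toy case the signal would be assigned to tadB only when the single best TAD is kept, and to both tadA and tadB otherwise.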
This function combines signals and genes using loops obtained from Hi-C data analysis
combineWithLoops(object, loopsInitDf, fragmentLength = 0, tssWidth = 2000, extSize = 0)
object | InTADSig object |
loopsInitDf | Data frame with loops. By default, a 6-column format (chr1, start1, end1, chr2, start2, end2) is expected. |
fragmentLength | If the input is in 4-column format (chr1, middlePos1, chr2, middlePos2), the fragment length should be provided to extend the corresponding loci into loop start and end positions. |
tssWidth | The transcription start site width is used to control overlaps with loop anchors. Default is 2000 base pairs. |
extSize | The loop ends can be extended upstream and downstream by the provided size in base pairs. |
The expected input is a data.frame of loops that is used to find connections between signals and genes. This data.frame can be in one of two formats: either 6-column (chr1, start1, end1, chr2, start2, end2) or 4-column (chr1, middlePos1, chr2, middlePos2) together with the fragment size.
Updated InTADSig object containing genes connected to signals via loops
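The relation between the two loop formats can be sketched in base R. This is only an illustration under a stated assumption: each anchor is taken to span fragmentLength base pairs centred on its middle position. The documentation does not say how the package extends the loci, so the actual rule may differ. The toy coordinates are made up.

```r
# Toy loops in the 4-column format (chr1, middlePos1, chr2, middlePos2)
loops4 <- data.frame(chr1       = c("chr15", "chr15"),
                     middlePos1 = c(25000000, 26000000),
                     chr2       = c("chr15", "chr15"),
                     middlePos2 = c(25500000, 27000000))

fragmentLength <- 5000

# ASSUMPTION: symmetric extension by half the fragment length on each side
half <- fragmentLength / 2

# Equivalent table in the 6-column format
loops6 <- data.frame(chr1   = loops4$chr1,
                     start1 = loops4$middlePos1 - half,
                     end1   = loops4$middlePos1 + half,
                     chr2   = loops4$chr2,
                     start2 = loops4$middlePos2 - half,
                     end2   = loops4$middlePos2 + half)
```

When the loops are already available in the 6-column format, no such conversion is needed and fragmentLength can stay at its default of 0.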
This data.frame contains normalized signals of 65 enhancers selected in chr15, a subset from 25 medulloblastoma samples.
enhSel
a data.frame instance
NULL, but makes available the dataframe
This GRanges object contains the coordinates of 65 medulloblastoma enhancer signals in chr15 target region
enhSelGR
a GRanges object
NULL, but makes available the dataset
This function returns the gene expression counts table
## S4 method for signature 'InTADSig'
exprs(object)
object | InTADSig object with signals and genes |
Gene expression table
inTadSig <- newSigInTAD(enhSel, enhSelGR, rpkmCountsSel, txsSel)
head(exprs(inTadSig))
This function performs filtering of gene expression counts based on various parameters
filterGeneExpr(obj, cutVal = 0, geneType = NA, checkExprDistr = FALSE, plotExprDistr = FALSE)
obj | InTADSig object |
cutVal | Exclude genes whose maximum expression across all samples is less than or equal to this value. Default: 0 |
geneType | Type of gene to select for filtering, e.g. "protein_coding". Default: NA |
checkExprDistr | Adjust cutVal based on the gene expression distribution |
plotExprDistr | Visualize the distribution |
The function helps to stabilize the functional activity of the genes under analysis. By default, all non-expressed genes are filtered out. It is also possible to restrict the analysis to one gene type, e.g. "protein_coding" only; this option requires the additional metadata column "transcript_type". In addition, a special filtering option based on the mclust library analyzes the distribution of counts and adjusts the cut value to exclude lowly expressed genes.
InTADSig object with filtered counts table
## perform analysis on test data
inTadSig <- newSigInTAD(enhSel, enhSelGR, rpkmCountsSel, txsSel)
## default filtering
inTadSig <- filterGeneExpr(inTadSig)
## filter based on gene type
inTadSig <- filterGeneExpr(inTadSig, geneType = "protein_coding")
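The cutVal rule described above (drop genes whose maximum expression over all samples does not exceed the cut value) can be sketched in base R on a toy matrix. The helper filterByMax is hypothetical and only mimics the documented rule; it is not the code used by filterGeneExpr.

```r
# Toy expression matrix: rows are genes, columns are samples
expr <- matrix(c(0,   0,   0,
                 1.5, 0,   2.0,
                 0,   0.2, 0),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("geneA", "geneB", "geneC"),
                               c("s1", "s2", "s3")))

# Hypothetical helper: keep genes whose maximum across samples is
# strictly greater than cutVal
filterByMax <- function(exprMat, cutVal = 0) {
  keep <- apply(exprMat, 1, max) > cutVal
  exprMat[keep, , drop = FALSE]
}

# Default cut value: only the never-expressed geneA is removed
filteredDefault <- filterByMax(expr)

# A stricter cut value also removes the weakly expressed geneC
filteredStrict <- filterByMax(expr, cutVal = 0.5)
```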
This function combines genes and signals using obtained loop connections.
findCorFromLoops(object, method = "pearson", adj.pval = FALSE)
object | InTADSig object with signals and genes combined via loops |
method | Correlation method: "pearson" (default), "kendall", "spearman" |
adj.pval | Perform p-value adjustment and include q-values in the result |
A table with correlation values for signal-gene pairs, including the correlation p-value and Euclidean distance.
This function combines genes and signals inside TADs
findCorrelation(object, method = "pearson", adj.pval = FALSE, plot.proportions = FALSE)
object | InTADSig object with signals and genes combined in TADs |
method | Correlation method: "pearson" (default), "kendall", "spearman" |
adj.pval | Perform p-value adjustment and include q-values in the result |
plot.proportions | Plot proportions of signals and genes in correlation |
A table with correlation values for signal-gene pairs, including the correlation p-value, Euclidean distance and rank.
## perform analysis on test data
inTadSig <- newSigInTAD(enhSel, enhSelGR, rpkmCountsSel, txsSel)
inTadSig <- filterGeneExpr(inTadSig, geneType = "protein_coding")
inTadSig <- combineInTAD(inTadSig, tadGR)
corData <- findCorrelation(inTadSig, method="pearson")
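To make the columns of the result table more concrete, the following base R sketch computes a correlation, its p-value and a Euclidean distance for one signal against two genes, then adjusts the p-values. It is an illustration only: the data are made up, the Benjamini-Hochberg adjustment via p.adjust is an assumption (the package may use a different q-value procedure), and the way the distance is defined here is likewise assumed.

```r
# Toy data: one signal and two genes measured across six samples
signal <- c(1.2, 2.3, 2.9, 4.1, 5.0, 6.2)
genes <- list(geneA = c(0.9, 2.1, 3.2, 3.9, 5.3, 6.0),
              geneB = c(3.1, 0.2, 4.4, 1.0, 2.7, 0.5))

# One row per signal-gene pair
res <- do.call(rbind, lapply(names(genes), function(g) {
  ct <- cor.test(signal, genes[[g]], method = "pearson")
  data.frame(gene    = g,
             cor     = unname(ct$estimate),
             pvalue  = ct$p.value,
             # ASSUMPTION: distance between the raw signal and gene vectors
             eucDist = sqrt(sum((signal - genes[[g]])^2)))
}))

# ASSUMPTION: Benjamini-Hochberg as a stand-in for the adj.pval option
res$qvalue <- p.adjust(res$pvalue, method = "BH")
```

Passing method = "spearman" or "kendall" to cor.test would give the rank-based alternatives listed for the method argument.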
This function collects all genes for a signal's genomic region inside Topologically Associated Domains (TADs)
fnSE(id, sigList, tadGR, tss, pickMaxOvlp, nearestTad)
id | Id of the signal from the list |
sigList | List of signal GRs and their names |
tadGR | TAD genomic regions |
tss | Gene transcription start sites |
pickMaxOvlp | Use the TAD with the maximum overlap |
nearestTad | The table listing the TADs nearest to each TSS |
The signal is checked to determine whether it lies inside a TAD. Then all genes in this TAD are collected.
A data.frame containing the genes connected to the signal
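The two steps mentioned here, reducing genes to their TSS and collecting the genes of one TAD, can be sketched in base R. The sketch assumes the usual strand convention (the TSS is the start coordinate on the plus strand and the end coordinate on the minus strand) and uses made-up coordinates on a single chromosome; it does not reproduce the internals of fnSE.

```r
# Toy gene annotation
genes <- data.frame(name   = c("g1", "g2", "g3"),
                    start  = c(100, 1500, 2600),
                    end    = c(400, 1900, 2950),
                    strand = c("+", "-", "+"))

# Strand-aware TSS: start for "+" genes, end for "-" genes
genes$tss <- ifelse(genes$strand == "-", genes$end, genes$start)

# One toy TAD that is assumed to contain the signal of interest
tadStart <- 1000
tadEnd   <- 2000

# Genes whose TSS falls inside that TAD
genesInTad <- genes$name[genes$tss >= tadStart & genes$tss <= tadEnd]
```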
This function returns the gene GRanges
geneCoords(object)
## S4 method for signature 'InTADSig'
geneCoords(object)
object | InTADSig object with signals and genes |
Gene GRanges
inTadSig <- newSigInTAD(enhSel, enhSelGR, rpkmCountsSel, txsSel)
head(geneCoords(inTadSig))
This function uses mclust package to analyze gene expression distribution
get.enr.bg.normfit(x)
x | Full gene expression vector |
The function adjusts the filtering cut value based on the mclust library to exclude lowly expressed genes. It is part of the filtering procedure.
Distribution properties: mean and standard deviation
The InTADSig object stores signals and gene expression data for the samples.
It uses a MultiAssayExperiment object to store information. Key slots to access are listed below.
sigMAE: "MultiAssayExperiment", MultiAssayExperiment object containing signals and gene counts
signalConnections: "list", The list of signals representing gene data frames in the same TAD
loopsDf: "data.frame", The data.frame containing details of provided input loops
loopConnections: "list", The list of connections between signals and genes via loops
ncore: "numeric", Number of cores to use for parallel computing
The function loads the data tables to create an object that contains the signals and gene expression data.frames along with their genomic coordinates for further processing.
loadSigInTAD(signalsFile, countsFile, gtfFile, annFile = "", performLog = TRUE, logExprsOffset = 1, ncores = 1)
signalsFile | Tab-separated data table containing signals, with their coordinates as row.names |
countsFile | Tab-separated counts table |
gtfFile | GTF file containing all gene coordinates |
annFile | Tab-delimited phenotype annotation of samples |
performLog | Perform log2 conversion of expression values. Default: TRUE. |
logExprsOffset | Offset x for log2 gene expression, i.e. log2(value + x). Default: 1 |
ncores | Number of cores to use for parallel computing |
The function loads data from the input files and creates an object that stores matrices of signals and gene expression values along with coordinates. The order of samples and the column names should match in both tables. Gene IDs are expected to be used in the validation of the counts table.
New InTADSig object
# create sigInTAD object
inTadSig <- newSigInTAD(enhSel, enhSelGR, rpkmCountsSel, txsSel)
The table contains genomic coordinates of chromatin loops in 6-column format derived from IMR90 cell line (focus : chr15)
loopsDfSel
a data.frame object
NULL, but makes available the dataset
The table includes additional information about MB tumour samples (subgroup, gender, age, histology and M.Stage)
mbAnnData
a data.frame object
NULL, but makes available the dataset
The function generates an object that contains the signals and gene expression data.frames along with their genomic coordinates for further processing.
newSigInTAD(signalData = NULL, signalRegions = NULL, countsData = NULL, geneRegions = NULL, sampleInfo = NULL, performLog = TRUE, logExprsOffset = 1, ncores = 1)
signalData | data frame containing signals |
signalRegions | genomic regions of the signals |
countsData | data matrix containing count expression values |
geneRegions | gene coordinates |
sampleInfo | data frame containing additional sample info |
performLog | Perform log2 conversion of expression values. Default: TRUE. |
logExprsOffset | Offset x for log2 gene expression, i.e. log2(value + x). Default: 1 |
ncores | Number of cores to use for parallel computing |
The InTADSig object stores matrices of signals and gene expression values along with coordinates. The order of samples and the column names should match in both datasets. For the gene coordinates GRanges, "gene_id" and "gene_name" are required in the metadata. These are typical gene attributes in the GTF annotation format.
New InTADSig object
## create sigInTAD object
inTadSig <- newSigInTAD(enhSel, enhSelGR, rpkmCountsSel, txsSel)
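Two of the input conventions described above, matching sample columns and the log2(value + x) transformation, can be checked on toy data with base R before building the object. This is a hedged sketch with made-up values, not the code run inside newSigInTAD.

```r
# Toy inputs: two signals and two genes measured in the same three samples
signalData <- data.frame(s1 = c(5.1, 2.0), s2 = c(4.8, 2.2), s3 = c(0.3, 1.9),
                         row.names = c("chr15:100-200", "chr15:900-1000"))
countsData <- data.frame(s1 = c(0, 7), s2 = c(1, 3), s3 = c(3, 15),
                         row.names = c("geneA", "geneB"))

# Sample names and their order are expected to agree between the two tables
sameSamples <- identical(colnames(signalData), colnames(countsData))

# log2(value + offset), mirroring performLog = TRUE with logExprsOffset = 1
logExprsOffset <- 1
logCounts <- log2(as.matrix(countsData) + logExprsOffset)
```

The offset keeps zero counts finite: a count of 0 maps to log2(1) = 0 instead of -Inf.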
This function creates a plot of correlation strength in a target genomic region from the result table. The X-coordinates represent signals, the Y-coordinates represent genes, and each dot represents the -log10(P-value) from the correlation test. Additionally, all TAD boundaries can be visualized.
plotCorAcrossRef(obj, corRes, targetRegion, showCorVals = FALSE, symmetric = FALSE, tads = NULL)
obj | InTADSig object with signals and genes combined in TADs |
corRes | Correlation result table created by the function findCorrelation() |
targetRegion | Target genomic region to visualise. |
showCorVals | Use this option to visualize positive correlation values instead of correlation strength |
symmetric | Activate mirror symmetry for gene-signal connections |
tads | TAD regions to visualize. By default (NULL value), only the TADs present in the correlation result table are used. |
A ggplot object for visualization or customization.
inTadSig <- newSigInTAD(enhSel, enhSelGR, rpkmCountsSel, txsSel)
inTadSig <- combineInTAD(inTadSig, tadGR)
corData <- findCorrelation(inTadSig, method="pearson")
plotCorAcrossRef(inTadSig, corData, GRanges("chr15:25000000-28000000"))
This function creates a plot of a selected signal-gene pair
plotCorrelation(obj, sId, geneName, xLabel = "Gene expression", yLabel = "Signal enrichment", colByPhenotype = "", corMethod = "pearson")
obj | InTADSig object with signals and genes combined in TADs |
sId | Signal id based on genomic coordinates, i.e. "chr:start-end" |
geneName | Gene name to select. Based on the "gene_name" attribute. |
xLabel | The label for the X-axis. Default: "Gene expression" |
yLabel | The label for the Y-axis. Default: "Signal enrichment" |
colByPhenotype | The pheno data column, e.g. tumour type, that can be used for colour |
corMethod | Correlation method. Default: Pearson |
A ggplot object for visualization or customization.
inTadSig <- newSigInTAD(enhSel, enhSelGR, rpkmCountsSel, txsSel)
inTadSig <- combineInTAD(inTadSig, tadGR)
plotCorrelation(inTadSig, "chr15:26372163-26398073", "GABRA5")
This data.frame contains RPKM gene expression values from chr15 for a subset from 25 medulloblastoma samples.
rpkmCountsSel
a data.frame instance
NULL, but makes available the dataframe
This function returns the signal GRanges
sigCoords(object)
## S4 method for signature 'InTADSig'
sigCoords(object)
object | InTADSig object with signals and genes |
Signal GRanges
inTadSig <- newSigInTAD(enhSel, enhSelGR, rpkmCountsSel, txsSel)
head(sigCoords(inTadSig))
This function returns the signal values table
signals(object)
## S4 method for signature 'InTADSig'
signals(object)
object | InTADSig object with signals and genes |
Signals table
inTadSig <- newSigInTAD(enhSel, enhSelGR, rpkmCountsSel, txsSel)
head(signals(inTadSig))
This GRanges object contains the coordinates of TADs revealed from IMR90 cell line (extracted from 0-indexed .bed file)
tadGR
a GRanges object
NULL, but makes available the dataset
This GRanges object contains the coordinates of genes subset from chr15
txsSel
a GRanges object
NULL, but makes available the dataset