Package 'MethReg' reference manual

Title:	Assessing the regulatory potential of DNA methylation regions or sites on gene transcription
Description:	Epigenome-wide association studies (EWAS) detects a large number of DNA methylation differences, often hundreds of differentially methylated regions and thousands of CpGs, that are significantly associated with a disease, many are located in non-coding regions. Therefore, there is a critical need to better understand the functional impact of these CpG methylations and to further prioritize the significant changes. MethReg is an R package for integrative modeling of DNA methylation, target gene expression and transcription factor binding sites data, to systematically identify and rank functional CpG methylations. MethReg evaluates, prioritizes and annotates CpG sites with high regulatory potential using matched methylation and gene expression data, along with external TF-target interaction databases based on manually curation, ChIP-seq experiments or gene regulatory network analysis.
Authors:	Tiago Silva [aut, cre] , Lily Wang [aut]
Maintainer:	Tiago Silva <[email protected]>
License:	GPL-3
Version:	1.17.0
Built:	2025-03-19 04:12:54 UTC
Source:	https://github.com/bioc/MethReg

TCGA-COAD clinical matrix for 38 samples retrieved from GDC using TCGAbiolinks

Description

TCGA-COAD clinical matrix for 38 samples retrieved from GDC using TCGAbiolinks

Usage

clinical
clinical

Format

A matrix: 38 samples (rows) and variables (columns) patient, sample, gender and sample_type

Evaluate correlation of DNA methylation region and target gene expression

Description

This function evaluate the correlation of the DNA methylation and target gene expression using spearman rank correlation test. Note that genes with RNA expression equal to 0 for all samples will not be evaluated.

Usage

cor_dnam_target_gene(
  pair.dnam.target,
  dnam,
  exp,
  filter.results = TRUE,
  min.cor.pval = 0.05,
  min.cor.estimate = 0,
  cores = 1
)
cor_dnam_target_gene(
  pair.dnam.target,
  dnam,
  exp,
  filter.results = TRUE,
  min.cor.pval = 0.05,
  min.cor.estimate = 0,
  cores = 1
)

Arguments

`pair.dnam.target`	A dataframe with the following columns: regionID (DNA methylation) and target (target gene)
`dnam`	DNA methylation matrix or SummarizedExperiment object with regions/cpgs in rows and samples in columns are samples. Samples should be in the same order as gene expression matrix (exp).
`exp`	Gene expression matrix or SummarizedExperiment object (rows are genes, columns are samples) log2-normalized (log2(exp + 1)). Samples should be in the same order as the DNA methylation matrix.
`filter.results`	Filter results using min.cor.pval and min.cor.estimate thresholds
`min.cor.pval`	P-value threshold filter (default: 0.05)
`min.cor.estimate`	Correlation estimate threshold filter (default: not applied)
`cores`	Number of CPU cores to be used. Default 1.

Value

A data frame with the following information: regionID, target gene, correlation pvalue and estimate between DNA methylation and target gene expression, FDR corrected p-values.

Examples

dnam <- t(matrix(sort(c(runif(20))), ncol = 1))
rownames(dnam) <- c("chr3:203727581-203728580")
colnames(dnam) <- paste0("Samples",1:20)
exp <- dnam
rownames(exp) <- c("ENSG00000232886")
colnames(exp) <- paste0("Samples",1:20)

pair.dnam.target <- data.frame(
   "regionID" =  c("chr3:203727581-203728580"),
   "target" = "ENSG00000232886"
)

# Correlated DNAm and gene expression, display only significant associations
results.cor.pos <- cor_dnam_target_gene(
   pair.dnam.target = pair.dnam.target,
   dnam = dnam,
   exp = exp,
   filter.results = TRUE,
   min.cor.pval = 0.05,
   min.cor.estimate = 0.0
)
dnam <- t(matrix(sort(c(runif(20))), ncol = 1))
rownames(dnam) <- c("chr3:203727581-203728580")
colnames(dnam) <- paste0("Samples",1:20)
exp <- dnam
rownames(exp) <- c("ENSG00000232886")
colnames(exp) <- paste0("Samples",1:20)

pair.dnam.target <- data.frame(
   "regionID" =  c("chr3:203727581-203728580"),
   "target" = "ENSG00000232886"
)

# Correlated DNAm and gene expression, display only significant associations
results.cor.pos <- cor_dnam_target_gene(
   pair.dnam.target = pair.dnam.target,
   dnam = dnam,
   exp = exp,
   filter.results = TRUE,
   min.cor.pval = 0.05,
   min.cor.estimate = 0.0
)

Evaluate correlation of TF expression and target gene expression

Description

This function evaluate the correlation of a TF and target gene expression using spearman rank correlation test. Note that genes with RNA expression equal to 0 for all samples will not be evaluated.

Usage

cor_tf_target_gene(
  pair.tf.target,
  exp,
  tf.activity.es = NULL,
  cores = 1,
  verbose = FALSE
)
cor_tf_target_gene(
  pair.tf.target,
  exp,
  tf.activity.es = NULL,
  cores = 1,
  verbose = FALSE
)

Arguments

`pair.tf.target`	A dataframe with the following columns: TF and target (target gene)
`exp`	Gene expression matrix or SummarizedExperiment object (rows are genes, columns are samples) log2-normalized (log2(exp + 1)). Samples should be in the same order as the tf.activity.es matrix
`tf.activity.es`	A matrix with normalized enrichment scores for each TF across all samples to be used in linear models instead of TF gene expression. See `get_tf_ES`.
`cores`	Number of CPU cores to be used. Default 1.
`verbose`	Show messages ?

Value

A data frame with the following information: TF, target gene, correlation p-value and estimate between TF and target gene expression, FDR corrected p-values.

Examples

exp <- t(matrix(sort(c(runif(40))), ncol = 2))
rownames(exp) <- c("ENSG00000232886","ENSG00000232889")
colnames(exp) <- paste0("Samples",1:20)

pair.tf.target <- data.frame(
   "TF" = "ENSG00000232889",
   "target" = "ENSG00000232886"
)

# Correlated TF and gene expression
results.cor.pos <- cor_tf_target_gene(
   pair.tf.target = pair.tf.target,
   exp = exp,
)
# Correlated TF and gene expression
results.cor.pos <- cor_tf_target_gene(
   pair.tf.target = pair.tf.target,
   exp = exp,
   tf.activity.es = exp
)
exp <- t(matrix(sort(c(runif(40))), ncol = 2))
rownames(exp) <- c("ENSG00000232886","ENSG00000232889")
colnames(exp) <- paste0("Samples",1:20)

pair.tf.target <- data.frame(
   "TF" = "ENSG00000232889",
   "target" = "ENSG00000232886"
)

# Correlated TF and gene expression
results.cor.pos <- cor_tf_target_gene(
   pair.tf.target = pair.tf.target,
   exp = exp,
)
# Correlated TF and gene expression
results.cor.pos <- cor_tf_target_gene(
   pair.tf.target = pair.tf.target,
   exp = exp,
   tf.activity.es = exp
)

Map DNAm to target genes using distance approaches, and TF to the DNAm region using JASPAR2024 TFBS.

Description

This function wraps two other functions get_region_target_gene and get_tf_in_region from the package. This function will map a region to a target gene using three methods (mapping to the closest gene, mapping to any gene within a given window of distance, or mapping to a fixed number of nearby genes upstream or downstream). To find TFs binding to the region, JASPAR2024 is used.

Usage

create_triplet_distance_based(
  region,
  genome = c("hg38", "hg19"),
  target.method = c("genes.promoter.overlap", "window", "nearby.genes", "closest.gene"),
  target.window.size = 500 * 10^3,
  target.num.flanking.genes = 5,
  target.promoter.upstream.dist.tss = 2000,
  target.promoter.downstream.dist.tss = 2000,
  target.rm.promoter.regions.from.distal.linking = TRUE,
  motif.search.window.size = 0,
  motif.search.p.cutoff = 1e-08,
  TF.peaks.gr = NULL,
  max.distance.region.target = 10^6,
  cores = 1
)
create_triplet_distance_based(
  region,
  genome = c("hg38", "hg19"),
  target.method = c("genes.promoter.overlap", "window", "nearby.genes", "closest.gene"),
  target.window.size = 500 * 10^3,
  target.num.flanking.genes = 5,
  target.promoter.upstream.dist.tss = 2000,
  target.promoter.downstream.dist.tss = 2000,
  target.rm.promoter.regions.from.distal.linking = TRUE,
  motif.search.window.size = 0,
  motif.search.p.cutoff = 1e-08,
  TF.peaks.gr = NULL,
  max.distance.region.target = 10^6,
  cores = 1
)

Arguments

`region`	A Granges or a named vector with regions (i.e "chr21:100002-1004000")
`genome`	Human genome reference "hg38" or "hg19"
`target.method`	How genes are mapped to regions: regions overlapping gene promoter ("genes.promoter.overlap"); genes within a window around the region ("window"); or fixed number of nearby genes upstream and downstream from the region
`target.window.size`	When `method = "window"`, number of base pairs to extend the region (+- window.size/2). Default is 500kbp (or +/- 250kbp, i.e. 250k bp from start or end of the region)
`target.num.flanking.genes`	Number of flanking genes upstream and downstream to search. For example, if `target.num.flanking.genes = 5`, it will return the 5 genes upstream and 5 genes downstream
`target.promoter.upstream.dist.tss`	Number of base pairs (bp) upstream of TSS to consider as promoter regions. Defaults to 2000 bp.
`target.promoter.downstream.dist.tss`	Number of base pairs (bp) downstream of TSS to consider as promoter regions. Defaults to 2000 bp.
`target.rm.promoter.regions.from.distal.linking`	When performing distal linking with method = "windows" or method = "nearby.genes", or "closest.gene.tss", if set to TRUE (default), probes in promoter regions will be removed from the input.
`motif.search.window.size`	Integer value to extend the regions. For example, a value of 50 will extend 25 bp upstream and 25 downstream the region. Default is no increase
`motif.search.p.cutoff`	motifmatchr pvalue cut-off. Default 1e-8.
`TF.peaks.gr`	A granges with TF peaks to be overlaped with input region Metadata column expected "id" with TF name. Default NULL. Note that Remap catalog can be used as shown in the examples.
`max.distance.region.target`	Max distance between region and target gene. Default 1Mbp.
`cores`	Number of CPU cores to be used. Default 1.

Value

A data frame with TF, target and RegionID information.

Examples

regions.names <- c("chr3:189631389-189632889","chr4:43162098-43163498")
triplet <- create_triplet_distance_based(
   region = regions.names,
   motif.search.window.size = 500,
   target.method = "closest.gene"
)
regions.names <- c("chr3:189631389-189632889","chr4:43162098-43163498")
triplet <- create_triplet_distance_based(
   region = regions.names,
   motif.search.window.size = 500,
   target.method = "closest.gene"
)

Map TF and target genes using regulon databases or any user provided target-tf. Maps TF to the DNAm region with TFBS using JASPAR2020 TFBS.

Description

This function wraps two other functions get_region_target_gene and get_tf_in_region from the package.

Usage

create_triplet_regulon_based(
  region,
  genome = c("hg38", "hg19"),
  regulons.min.confidence = "B",
  motif.search.window.size = 0,
  motif.search.p.cutoff = 1e-08,
  cores = 1,
  tf.target,
  TF.peaks.gr = NULL,
  max.distance.region.target = 10^6
)
create_triplet_regulon_based(
  region,
  genome = c("hg38", "hg19"),
  regulons.min.confidence = "B",
  motif.search.window.size = 0,
  motif.search.p.cutoff = 1e-08,
  cores = 1,
  tf.target,
  TF.peaks.gr = NULL,
  max.distance.region.target = 10^6
)

Arguments

`region`	A Granges or a named vector with regions (i.e "chr21:100002-1004000")
`genome`	Human genome reference "hg38" or "hg19"
`regulons.min.confidence`	Minimun confidence score ("A", "B","C","D", "E") classifying regulons based on their quality from Human DoRothEA database dorothea_hs. The default minimun confidence score is "B".
`motif.search.window.size`	Integer value to extend the regions. For example, a value of 50 will extend 25 bp upstream and 25 downstream the region. Default is no increase
`motif.search.p.cutoff`	motifmatchr pvalue cut-off. Default 1e-8.
`cores`	Number of CPU cores to be used. Default 1.
`tf.target`	A dataframe with tf and target columns. If not provided, dorothea_hs will be used.
`TF.peaks.gr`	A granges with TF peaks to be overlaped with input region Metadata column expected "id" with TF name. Default NULL. Note that Remap catalog can be used as shown in the examples.
`max.distance.region.target`	Max distance between region and target gene. Default 1Mbp.

Value

A data frame with TF, target and RegionID information.

Examples

triplet <- create_triplet_regulon_based(
   region = c("chr1:69591-69592", "chr1:898803-898804"),
   motif.search.window.size = 50,
   regulons.min.confidence = "B",
     motif.search.p.cutoff = 0.05
)
triplet <- create_triplet_regulon_based(
   region = c("chr1:69591-69592", "chr1:898803-898804"),
   motif.search.window.size = 50,
   regulons.min.confidence = "B",
     motif.search.p.cutoff = 0.05
)

TCGA-COAD DNA methylation matrix (beta-values) for 38 samples (only chr21) retrieved from GDC using TCGAbiolinks

Description

TCGA-COAD DNA methylation matrix (beta-values) for 38 samples (only chr21) retrieved from GDC using TCGAbiolinks

Usage

dna.met.chr21
dna.met.chr21

Format

A beta-value matrix with 38 samples, includes CpG IDs in the rows and TCGA sample identifiers in the columns

Format MethReg results table and export to XLSX file

Description

Receives a methReg results table and create a formatted XLSX file to easier readability and interpretation of the results

Usage

export_results_to_table(results, file = "MethReg_results.xlsx")
export_results_to_table(results, file = "MethReg_results.xlsx")

Arguments

`results`	MethReg results
`file`	xlsx filename used to save

Value

A summarized Experiment object

Examples

library(dplyr)
dnam <- runif(20,min = 0,max = 1) %>%
  matrix(ncol = 1) %>%  t
rownames(dnam) <- c("chr3:203727581-203728580")
colnames(dnam) <- paste0("Samples",1:20)

exp.target <-  runif(20,min = 0,max = 10) %>%
  matrix(ncol = 1) %>%  t
rownames(exp.target) <- c("ENSG00000252982")
colnames(exp.target) <- paste0("Samples",1:20)

exp.tf <- runif(20,min = 0,max = 10) %>%
  matrix(ncol = 1) %>%  t
rownames(exp.tf) <- c("ENSG00000083937")
colnames(exp.tf) <- paste0("Samples",1:20)

exp <- rbind(exp.tf, exp.target)

triplet <- data.frame(
   "regionID" =  c("chr3:203727581-203728580"),
   "target" = "ENSG00000252982",
   "TF" = "ENSG00000083937"
)
results <- interaction_model(
   triplet = triplet, 
   dnam = dnam, 
   exp = exp, 
    dnam.group.threshold = 0.25,
   stage.wise.analysis = FALSE, 
   sig.threshold = 1,
   filter.correlated.tf.exp.dnam = FALSE,
   filter.correlated.target.exp.dnam = FALSE,
   filter.triplet.by.sig.term = FALSE
)
results <- results %>% stratified_model( dnam = dnam,  exp = exp)
export_results_to_table(results = results, file = "MethReg_results.xlsx")


results$`RLM_DNAmGroup:TF_region_stage_wise_adj_pvalue` <- results$`RLM_DNAmGroup:TF_fdr`
results$`RLM_DNAmGroup:TF_triplet_stage_wise_adj_pvalue` <- results$`RLM_DNAmGroup:TF_fdr`
results$`RLM_DNAmGroup:TF_fdr` <- NULL
export_results_to_table(results = results, file = "MethReg_results_stage_wise.xlsx")

library(dplyr)
dnam <- runif(20,min = 0,max = 1) %>%
  matrix(ncol = 1) %>%  t
rownames(dnam) <- c("chr3:203727581-203728580")
colnames(dnam) <- paste0("Samples",1:20)

exp.target <-  runif(20,min = 0,max = 10) %>%
  matrix(ncol = 1) %>%  t
rownames(exp.target) <- c("ENSG00000252982")
colnames(exp.target) <- paste0("Samples",1:20)

exp.tf <- runif(20,min = 0,max = 10) %>%
  matrix(ncol = 1) %>%  t
rownames(exp.tf) <- c("ENSG00000083937")
colnames(exp.tf) <- paste0("Samples",1:20)

exp <- rbind(exp.tf, exp.target)

triplet <- data.frame(
   "regionID" =  c("chr3:203727581-203728580"),
   "target" = "ENSG00000252982",
   "TF" = "ENSG00000083937"
)
results <- interaction_model(
   triplet = triplet, 
   dnam = dnam, 
   exp = exp, 
    dnam.group.threshold = 0.25,
   stage.wise.analysis = FALSE, 
   sig.threshold = 1,
   filter.correlated.tf.exp.dnam = FALSE,
   filter.correlated.target.exp.dnam = FALSE,
   filter.triplet.by.sig.term = FALSE
)
results <- results %>% stratified_model( dnam = dnam,  exp = exp)
export_results_to_table(results = results, file = "MethReg_results.xlsx")


results$`RLM_DNAmGroup:TF_region_stage_wise_adj_pvalue` <- results$`RLM_DNAmGroup:TF_fdr`
results$`RLM_DNAmGroup:TF_triplet_stage_wise_adj_pvalue` <- results$`RLM_DNAmGroup:TF_fdr`
results$`RLM_DNAmGroup:TF_fdr` <- NULL
export_results_to_table(results = results, file = "MethReg_results_stage_wise.xlsx")

Select regions with variations in DNA methylation levels above a threshold

Description

For each region, computes the interquartile range (IQR) of the DNA methylation (DNAm) levels and requires the IQR to be above a threshold

Usage

filter_dnam_by_quant_diff(dnam, min.IQR.threshold = 0.2, cores = 1)
filter_dnam_by_quant_diff(dnam, min.IQR.threshold = 0.2, cores = 1)

Arguments

`dnam`	DNA methylation matrix or SummarizedExperiment object
`min.IQR.threshold`	Threshold for minimal interquantile range (difference between the 75th and 25th percentiles) of the DNAm
`cores`	Number of CPU cores to be used in the analysis. Default: 1

Value

A subset of the original matrix only with the rows passing the filter threshold.

Examples

data("dna.met.chr21")
dna.met.chr21.filtered <- filter_dnam_by_quant_diff(
  dna.met.chr21
)
data("dna.met.chr21")
dna.met.chr21.filtered <- filter_dnam_by_quant_diff(
  dna.met.chr21
)

Select genes with variations above a threshold

Description

For each gene, compares the mean gene expression levels in samples in high expression (Q4) vs. samples with low gene expression (Q1), and requires the fold change to be above a certain threshold.

Usage

filter_exp_by_quant_mean_FC(exp, fold.change = 1.5, cores = 1)
filter_exp_by_quant_mean_FC(exp, fold.change = 1.5, cores = 1)

Arguments

`exp`	Gene expression matrix or SumarizedExperiment object
`fold.change`	Threshold for fold change of mean gene expression levels in samples with high (Q4) and low (Q1) gene expression levels. Defaults to 1.5.
`cores`	Number of CPU cores to be used in the analysis. Default: 1

Value

A subset of the original matrix only with the rows passing the filter threshold.

Examples

data("gene.exp.chr21.log2")
gene.exp.chr21.log2.filtered <- filter_exp_by_quant_mean_FC(
  gene.exp.chr21.log2
)
data("gene.exp.chr21.log2")
gene.exp.chr21.log2.filtered <- filter_exp_by_quant_mean_FC(
  gene.exp.chr21.log2
)

Remove genes with gene expression level equal to 0 in a substantial percentage of the samples

Description

Remove genes with gene expression level equal to 0 in a substantial percentage of the samples

Usage

filter_genes_zero_expression(exp, max.samples.percentage = 0.25)
filter_genes_zero_expression(exp, max.samples.percentage = 0.25)

Arguments

`exp`	Gene expression matrix or SumarizedExperiment object
`max.samples.percentage`	Max percentage of samples with gene expression as 0, for genes to be selected. If max.samples.percentage 100, remove genes with 0 for 100% samples. If max.samples.percentage 25, remove genes with 0 for more than 25% of the samples.

Value

A subset of the original matrix only with the rows passing the filter threshold.

TCGA-COAD gene expression matrix (log2 (FPKM-UQ + 1)) for 38 samples (only chromosome 21) retrieved from GDC using TCGAbiolinks

Description

TCGA-COAD gene expression matrix (log2 (FPKM-UQ + 1)) for 38 samples (only chromosome 21) retrieved from GDC using TCGAbiolinks

Usage

gene.exp.chr21.log2
gene.exp.chr21.log2

Format

A log2 (FPKM-UQ + 1) gene expression matrix with 38 samples, includes Ensembl IDs in the rows and TCGA sample identifiers in the columns

Access human TF from Lambert et al 2018

Description

Access human TF from Lambert et al 2018 (PMID: 29425488)

Usage

get_human_tfs()
get_human_tfs()

Value

A dataframe with Human TF

Examples

 human.tfs <- get_human_tfs()
human.tfs <- get_human_tfs()

Get HM450/EPIC manifest files from Sesame package

Description

Returns a data frame with HM450/EPIC manifest information files from Sesame package

Usage

get_met_probes_info(genome = c("hg38", "hg19"), arrayType = c("450k", "EPIC"))
get_met_probes_info(genome = c("hg38", "hg19"), arrayType = c("450k", "EPIC"))

Arguments

`genome`	Human genome of reference hg38 or hg19
`arrayType`	"450k" or "EPIC" array

Value

A Granges with the DNAm array manifest

Examples

regions.names <- c("chr22:18267969-18268249","chr23:18267969-18268249")
regions.gr <- make_granges_from_names(regions.names)
make_names_from_granges(regions.gr)
regions.names <- c("chr22:18267969-18268249","chr23:18267969-18268249")
regions.gr <- make_granges_from_names(regions.names)
make_names_from_granges(regions.gr)

Summarize promoter DNA methylation beta values by mean.

Description

First, identify gene promoter regions (default +-2Kkb around TSS). Then, for each promoter region calculate the mean DNA methylation of probes overlapping the region.

Usage

get_promoter_avg(
  dnam,
  genome,
  arrayType,
  cores = 1,
  upstream.dist.tss = 2000,
  downstream.dist.tss = 2000,
  verbose = FALSE
)
get_promoter_avg(
  dnam,
  genome,
  arrayType,
  cores = 1,
  upstream.dist.tss = 2000,
  downstream.dist.tss = 2000,
  verbose = FALSE
)

Arguments

`dnam`	A DNA methylation matrix or a SummarizedExperiment object
`genome`	Human genome of reference hg38 or hg19
`arrayType`	DNA methylation array type (450k or EPIC)
`cores`	A integer number to use multiple cores. Default 1 core.
`upstream.dist.tss`	Number of base pairs (bp) upstream of TSS to consider as promoter regions
`downstream.dist.tss`	Number of base pairs (bp) downstream of TSS to consider as promoter regions
`verbose`	A logical argument indicating if messages output should be provided.

Value

A RangedSummarizedExperiment with promoter region and mean beta-values of CpGs within it. Metadata will provide the promoter gene region and gene informations.

Examples

## Not run: 
  data("dna.met.chr21")
  promoter.avg <- get_promoter_avg(
     dnam = dna.met.chr21,
     genome = "hg19",
     arrayType = "450k"
)

## End(Not run)
## Not run: 
  data("dna.met.chr21")
  promoter.avg <- get_promoter_avg(
     dnam = dna.met.chr21,
     genome = "hg19",
     arrayType = "450k"
)

## End(Not run)

Obtain target genes of input regions based on distance

Description

To map an input region to genes there are three options: 1) map region to closest gene tss 2) map region to all genes within a window around the region (default window.size = 500kbp (i.e. +/- 250kbp from start or end of the region)). 3) map region to a fixed number of nearby genes (upstream/downstream)

Usage

get_region_target_gene(
  regions.gr,
  genome = c("hg38", "hg19"),
  method = c("genes.promoter.overlap", "window", "nearby.genes", "closest.gene.tss"),
  promoter.upstream.dist.tss = 2000,
  promoter.downstream.dist.tss = 2000,
  window.size = 500 * 10^3,
  num.flanking.genes = 5,
  rm.promoter.regions.from.distal.linking = TRUE
)
get_region_target_gene(
  regions.gr,
  genome = c("hg38", "hg19"),
  method = c("genes.promoter.overlap", "window", "nearby.genes", "closest.gene.tss"),
  promoter.upstream.dist.tss = 2000,
  promoter.downstream.dist.tss = 2000,
  window.size = 500 * 10^3,
  num.flanking.genes = 5,
  rm.promoter.regions.from.distal.linking = TRUE
)

Arguments

`regions.gr`	A Genomic Ranges object (GRanges) or a SummarizedExperiment object (rowRanges will be used)
`genome`	Human genome of reference "hg38" or "hg19"
`method`	How genes are mapped to regions: region overlapping gene promoter ("genes.promoter.overlap"); or genes within a window around the region ("window"); or a fixed number genes upstream and downstream of the region ("nearby.genes"); or closest gene tss to the region ("closest.gene.tss")
`promoter.upstream.dist.tss`	Number of base pairs (bp) upstream of TSS to consider as promoter regions. Defaults to 2000 bp.
`promoter.downstream.dist.tss`	Number of base pairs (bp) downstream of TSS to consider as promoter regions. Defaults to 2000 bp.
`window.size`	When `method = "window"`, number of base pairs to extend the region (+- window.size/2). Default is 500kbp (or +/- 250kbp, i.e. 250k bp from start or end of the region)
`num.flanking.genes`	When `method = "nearby.genes"`, set the number of flanking genes upstream and downstream to search.Defaults to 5. For example, if num.flanking.genes = 5, it will return the 5 genes upstream and 5 genes downstream of the given region.
`rm.promoter.regions.from.distal.linking`	When performing distal linking with method = "windows", "nearby.genes" or "closest.gene.tss", if set to TRUE (default), probes in promoter regions will be removed from the input.

Details

For the analysis of probes in promoter regions (promoter analysis), we recommend setting method = "genes.promoter.overlap".

For the analysis of probes in distal regions (distal analysis), we recommend setting either method = "window" or method = "nearby.genes".

Note that because method = "window" or method = "nearby.genes" are mainly used for analyzing distal probes, by default rm.promoter.regions.from.distal.linking = TRUE to remove probes in promoter regions.

Value

A data frame with the following information: regionID, Target symbol, Target ensembl ID

Examples

library(GenomicRanges)
library(dplyr)

# Create example region
regions.gr <- data.frame(
       chrom = c("chr22", "chr22", "chr22", "chr22", "chr22"),
       start = c("39377790", "50987294", "19746156", "42470063", "43817258"),
       end   = c("39377930", "50987527", "19746368", "42470223", "43817384"),
       stringsAsFactors = FALSE)  %>%
     makeGRangesFromDataFrame

 # map to closest gene tss
 region.genes.promoter.overlaps <- get_region_target_gene(
                      regions.gr = regions.gr,
                      genome = "hg19",
                      method = "genes.promoter.overlap"
 )

 # map to all gene within region +- 250kbp
 region.window.genes <- get_region_target_gene(
                      regions.gr = regions.gr,
                      genome = "hg19",
                      method = "window",
                      window.size = 500 * 10^3
 )

 # map regions to n upstream and n downstream genes
 region.nearby.genes <- get_region_target_gene(
                      regions.gr = regions.gr,
                      genome = "hg19",
                      method = "nearby.genes",
                      num.flanking.genes = 5
 )
library(GenomicRanges)
library(dplyr)

# Create example region
regions.gr <- data.frame(
       chrom = c("chr22", "chr22", "chr22", "chr22", "chr22"),
       start = c("39377790", "50987294", "19746156", "42470063", "43817258"),
       end   = c("39377930", "50987527", "19746368", "42470223", "43817384"),
       stringsAsFactors = FALSE)  %>%
     makeGRangesFromDataFrame

 # map to closest gene tss
 region.genes.promoter.overlaps <- get_region_target_gene(
                      regions.gr = regions.gr,
                      genome = "hg19",
                      method = "genes.promoter.overlap"
 )

 # map to all gene within region +- 250kbp
 region.window.genes <- get_region_target_gene(
                      regions.gr = regions.gr,
                      genome = "hg19",
                      method = "window",
                      window.size = 500 * 10^3
 )

 # map regions to n upstream and n downstream genes
 region.nearby.genes <- get_region_target_gene(
                      regions.gr = regions.gr,
                      genome = "hg19",
                      method = "nearby.genes",
                      num.flanking.genes = 5
 )

Get residuals from regression model

Description

Compute studentized residuals from fitting linear regression models to expression values in a data matrix

Usage

get_residuals(data, metadata.samples = NULL, metadata.genes = NULL, cores = 1)
get_residuals(data, metadata.samples = NULL, metadata.genes = NULL, cores = 1)

Arguments

`data`	A matrix or SummarizedExperiment object with samples as columns and features (gene, probes) as rows. Note that expression values should typically be log2(expx + 1) transformed before fitting linear regression models.
`metadata.samples`	A data frame with samples as rows and columns the covariates. No NA values are allowed, otherwise residual of the corresponding sample will be NA.
`metadata.genes`	A data frame with genes (covariates) as rows and samples as columns. For each evaluated gene, each column (e.g. CNA) that corresponds to the same gene will be set as a single covariate variable. This can be used to correct copy number alterations for each gene.
`cores`	Number of CPU cores to be used. Defaults to 1.

Details

When only metadata.samples are provided, this function computes residuals for expression values in a data matrix by fitting model

features ~ Sample_covariate1 + Sample_covariate2 ... + Sample_covariateN where N is the index of the columns in the metadata provided, features are (typically log transformed) expression values.

When the user additionally provide metadata.genes, that is, gene metadata (e.g. gene_covariate = copy number variations/alterations) residuals are computed by fitting the following model:

features ~ Sample_covariate1 + Sample_covariate2 ... + Sample_covariateN + gene_covariate

Value

A residuals matrix with samples as columns and features (gene, probes) as rows

Examples

data("gene.exp.chr21.log2")

data("clinical")
metadata <- clinical[,c( "gender", "sample_type")]

cnv <- matrix(
   sample(x = c(-2,-1,0,1,2),
   size = ncol(gene.exp.chr21.log2) * nrow(gene.exp.chr21.log2),replace = TRUE),
   nrow = nrow(gene.exp.chr21.log2),
   ncol = ncol(gene.exp.chr21.log2)
)
rownames(cnv) <- rownames(gene.exp.chr21.log2)
colnames(cnv) <- colnames(gene.exp.chr21.log2)

gene.exp.residuals <- get_residuals(
   data = gene.exp.chr21.log2[1:3,],
   metadata.samples = metadata,
   metadata.genes = cnv
)
gene.exp.residuals <- get_residuals(
   data = gene.exp.chr21.log2[1:3,],
   metadata.samples = metadata,
   metadata.genes = cnv[1:2,]
)
gene.exp.residuals <- get_residuals(
   data = gene.exp.chr21.log2[1:3,],
   metadata.samples = metadata
)
data("gene.exp.chr21.log2")

data("clinical")
metadata <- clinical[,c( "gender", "sample_type")]

cnv <- matrix(
   sample(x = c(-2,-1,0,1,2),
   size = ncol(gene.exp.chr21.log2) * nrow(gene.exp.chr21.log2),replace = TRUE),
   nrow = nrow(gene.exp.chr21.log2),
   ncol = ncol(gene.exp.chr21.log2)
)
rownames(cnv) <- rownames(gene.exp.chr21.log2)
colnames(cnv) <- colnames(gene.exp.chr21.log2)

gene.exp.residuals <- get_residuals(
   data = gene.exp.chr21.log2[1:3,],
   metadata.samples = metadata,
   metadata.genes = cnv
)
gene.exp.residuals <- get_residuals(
   data = gene.exp.chr21.log2[1:3,],
   metadata.samples = metadata,
   metadata.genes = cnv[1:2,]
)
gene.exp.residuals <- get_residuals(
   data = gene.exp.chr21.log2[1:3,],
   metadata.samples = metadata
)

Calculate enrichment scores for each TF across all samples using dorothea and viper.

Description

Calculate enrichment scores for each TF across all samples using dorothea and viper.

Usage

get_tf_ES(exp, min.confidence = "B", regulons)
get_tf_ES(exp, min.confidence = "B", regulons)

Arguments

`exp`	Gene expression matrix with gene expression counts, row as ENSG gene IDS and column as samples
`min.confidence`	Minimun confidence score ("A", "B","C","D", "E") classifying regulons based on their quality from Human DoRothEA database. The default minimun confidence score is "B"
`regulons`	DoRothEA regulons in table format. Same as run_viper. If not specified Bioconductor (human) dorothea regulons besed on GTEx will be. used dorothea_hs.

Value

A matrix of normalized enrichment scores for each TF across all samples

Examples

gene.exp.chr21.log2 <- get(data("gene.exp.chr21.log2"))
tf_es <- get_tf_ES(gene.exp.chr21.log2)
gene.exp.chr21.log2 <- get(data("gene.exp.chr21.log2"))
tf_es <- get_tf_ES(gene.exp.chr21.log2)

Get human TFs for regions by either scanning it with motifmatchr using JASPAR 2024 database or overlapping with TF chip-seq from user input

Description

Given a genomic region, this function maps TF in regions using two methods: 1) using motifmatchr nd JASPAR 2024 to scan the region for 554 human transcription factors binding sites. There is also an option (argument window.size) to extend the scanning region before performing the search, which by default is 0 (do not extend). 2) Using user input TF chip-seq to check for overlaps between region and TF peaks.

Usage

get_tf_in_region(
  region,
  window.size = 0,
  genome = c("hg19", "hg38"),
  p.cutoff = 1e-08,
  cores = 1,
  TF.peaks.gr = NULL,
  verbose = FALSE
)
get_tf_in_region(
  region,
  window.size = 0,
  genome = c("hg19", "hg38"),
  p.cutoff = 1e-08,
  cores = 1,
  TF.peaks.gr = NULL,
  verbose = FALSE
)

Arguments

`region`	A vector of region names or GRanges object with the DNA methylation regions to be scanned for the motifs
`window.size`	Integer value to extend the regions. For example, a value of 50 will extend 25 bp upstream and 25 bp downstream the region. The default is not to increase the scanned region.
`genome`	Human genome of reference "hg38" or "hg19".
`p.cutoff`	motifmatchr p.cutoff. Default 1e-8.
`cores`	Number of CPU cores to be used. Default 1.
`TF.peaks.gr`	A granges with TF peaks to be overlaped with input region Metadata column expected "id" with TF name. Default NULL. Note that Remap catalog can be used as shown in the examples.
`verbose`	A logical argument indicating if messages output should be provided.

Value

A data frame with the following information: regionID, TF symbol, TF ensembl ID

Examples

 regions.names <- c("chr3:189631389-189632889","chr4:43162098-43163498")
 region.tf <- get_tf_in_region(
                 region = regions.names,
                 genome = "hg38"
 )

## Not run: 
   library(ReMapEnrich)
   demo.dir <- "~/ReMapEnrich_demo"
   dir.create(demo.dir, showWarnings = FALSE, recursive = TRUE)
   # Use the function DowloadRemapCatalog
   remapCatalog2018hg38 <- downloadRemapCatalog(demo.dir, assembly = "hg38")
   # Load the ReMap catalogue and convert it to Genomic Ranges
   remapCatalog <- bedToGranges(remapCatalog2018hg38)
   regions.names <- c("chr3:189631389-189632889","chr4:43162098-43163498")
   region.tf.remap <- get_tf_in_region(
                   region = regions.names,
                   genome = "hg38",
                   TF.peaks.gr = remapCatalog
   )
 
## End(Not run)
regions.names <- c("chr3:189631389-189632889","chr4:43162098-43163498")
 region.tf <- get_tf_in_region(
                 region = regions.names,
                 genome = "hg38"
 )

## Not run: 
   library(ReMapEnrich)
   demo.dir <- "~/ReMapEnrich_demo"
   dir.create(demo.dir, showWarnings = FALSE, recursive = TRUE)
   # Use the function DowloadRemapCatalog
   remapCatalog2018hg38 <- downloadRemapCatalog(demo.dir, assembly = "hg38")
   # Load the ReMap catalogue and convert it to Genomic Ranges
   remapCatalog <- bedToGranges(remapCatalog2018hg38)
   regions.names <- c("chr3:189631389-189632889","chr4:43162098-43163498")
   region.tf.remap <- get_tf_in_region(
                   region = regions.names,
                   genome = "hg38",
                   TF.peaks.gr = remapCatalog
   )
 
## End(Not run)

Fits linear models with interaction to triplet data (Target, TF, DNAm), where DNAm is a binary variable (samples in Q1 or Q4)

Description

Evaluates regulatory potential of DNA methylation (DNAm) on gene expression, by fitting robust linear model or zero inflated negative binomial model to triplet data. These models consist of terms to model direct effect of DNAm on target gene expression, direct effect of TF on gene expression, as well as an interaction term that evaluates the synergistic effect of DNAm and TF on gene expression.

Usage

interaction_model(
  triplet,
  dnam,
  exp,
  dnam.group.threshold = 0.25,
  cores = 1,
  tf.activity.es = NULL,
  sig.threshold = 0.05,
  fdr = TRUE,
  filter.correlated.tf.exp.dnam = TRUE,
  filter.correlated.target.exp.dnam = TRUE,
  filter.triplet.by.sig.term = TRUE,
  stage.wise.analysis = TRUE,
  verbose = FALSE
)
interaction_model(
  triplet,
  dnam,
  exp,
  dnam.group.threshold = 0.25,
  cores = 1,
  tf.activity.es = NULL,
  sig.threshold = 0.05,
  fdr = TRUE,
  filter.correlated.tf.exp.dnam = TRUE,
  filter.correlated.target.exp.dnam = TRUE,
  filter.triplet.by.sig.term = TRUE,
  stage.wise.analysis = TRUE,
  verbose = FALSE
)

Arguments

`triplet`	Data frame with columns for DNA methylation region (regionID), TF (TF), and target gene (target)
`dnam`	DNA methylation matrix or SummarizedExperiment object (columns: samples in the same order as `exp` matrix, rows: regions/probes)
`exp`	A matrix or SummarizedExperiment object object (columns: samples in the same order as `dnam`, rows: genes represented by ensembl IDs (e.g. ENSG00000239415))
`dnam.group.threshold`	DNA methylation threshold percentage to define samples in the low methylated group and high methylated group. For example, setting the threshold to 0.3 (30%) will assign samples with the lowest 30% methylation in the low group and the highest 30% methylation in the high group. Default is 0.25 (25%), accepted threshold range (0.0,0.5].
`cores`	Number of CPU cores to be used. Default 1.
`tf.activity.es`	A matrix with normalized enrichment scores for each TF across all samples to be used in linear models instead of TF gene expression. See `get_tf_ES`.
`sig.threshold`	Threshold to filter significant triplets. Select if interaction.pval < 0.05 or pval.dnam < 0.05 or pval.tf < 0.05 in binary model
`fdr`	Uses fdr when using sig.threshold. Select if interaction.fdr < 0.05 or fdr.dnam < 0.05 or fdr.tf < 0.05 in binary model
`filter.correlated.tf.exp.dnam`	If wilcoxon test of TF expression Q1 and Q4 is significant (pvalue < 0.05), triplet will be removed.
`filter.correlated.target.exp.dnam`	If wilcoxon test of target expression Q1 and Q4 is not significant (pvalue > 0.05), triplet will be removed.
`filter.triplet.by.sig.term`	Filter significant triplets ? Select if interaction.pval < 0.05 or pval.dnam <0.05 or pval.tf < 0.05 in binary model
`stage.wise.analysis`	A boolean indicating if stagewise analysis should be performed to correct for multiple comparisons. If set to FALSE FDR analysis is performed.
`verbose`	A logical argument indicating if messages output should be provided.

Details

This function fits the linear model

log2(RNA target) ~ log2(TF) + DNAm + log2(TF) * DNAm

to triplet data as follow:

Model by considering DNAm as a binary variable - we defined a binary group for DNA methylation values (high = 1, low = 0). That is, samples with the highest DNAm levels (top 25 percent) has high = 1, samples with lowest DNAm levels (bottom 25 percent) has high = 0. Note that in this implementation, only samples with DNAm values in the first and last quartiles are considered.

In these models, the term log2(TF) evaluates direct effect of TF on target gene expression, DNAm evaluates direct effect of DNAm on target gene expression, and log2(TF)*DNAm evaluates synergistic effect of DNAm and TF, that is, if TF regulatory activity is modified by DNAm.

There are two implementations of these models, depending on whether there are an excessive amount (i.e. more than 25 percent) of samples with zero counts in RNAseq data:

When percent of zeros in RNAseq data is less than 25 percent, robust linear models are implemented using rlm function from MASS package. This gives outlier gene expression values reduced weight. We used "psi.bisqure" option in function rlm (bisquare weighting, https://stats.idre.ucla.edu/r/dae/robust-regression/).
When percent of zeros in RNAseq data is more than 25 percent, zero inflated negative binomial models are implemented using zeroinfl function from pscl package. This assumes there are two processes that generated zeros (1) one where the counts are always zero (2) another where the count follows a negative binomial distribution.

To account for confounding effects from covariate variables, first use the get_residuals function to obtain RNA or DNAm residual values which have covariate effects removed, then fit interaction model. Note that no log2 transformation is needed when interaction_model is applied to residuals data.

Note that only triplets with TF expression not significantly different in high vs. low methylation groups will be evaluated (Wilcoxon test, p > 0.05).

Value

A dataframe with Region, TF, target, TF_symbo, target_symbol, estimates and P-values, after fitting robust linear models or zero-inflated negative binomial models (see Details above).

Model considering DNAm values as a binary variable generates quant_pval_metGrp, quant_pval_rna.tf, quant_pval_metGrp.rna.tf, quant_estimates_metGrp, quant_estimates_rna.tf, quant_estimates_metGrp.rna.tf.

Model.interaction indicates which model (robust linear model or zero inflated model) was used to fit Model 1, and Model.quantile indicates which model(robust linear model or zero inflated model) was used to fit Model 2.

Examples

library(dplyr)
dnam <- runif(20,min = 0,max = 1) %>%
  matrix(ncol = 1) %>%  t
rownames(dnam) <- c("chr3:203727581-203728580")
colnames(dnam) <- paste0("Samples",1:20)

exp.target <-  runif(20,min = 0,max = 10) %>%
  matrix(ncol = 1) %>%  t
rownames(exp.target) <- c("ENSG00000252982")
colnames(exp.target) <- paste0("Samples",1:20)

exp.tf <- runif(20,min = 0,max = 10) %>%
  matrix(ncol = 1) %>%  t
rownames(exp.tf) <- c("ENSG00000083937")
colnames(exp.tf) <- paste0("Samples",1:20)

exp <- rbind(exp.tf, exp.target)

triplet <- data.frame(
   "regionID" =  c("chr3:203727581-203728580"),
   "target" = "ENSG00000252982",
   "TF" = "ENSG00000083937"
)
results <- interaction_model(
   triplet = triplet, 
   dnam = dnam, 
   exp = exp, 
    dnam.group.threshold = 0.25,
   stage.wise.analysis = FALSE, 
   sig.threshold = 1,
   filter.correlated.tf.exp.dnam = FALSE,
   filter.correlated.target.exp.dnam = FALSE,
   filter.triplet.by.sig.term = FALSE
)
library(dplyr)
dnam <- runif(20,min = 0,max = 1) %>%
  matrix(ncol = 1) %>%  t
rownames(dnam) <- c("chr3:203727581-203728580")
colnames(dnam) <- paste0("Samples",1:20)

exp.target <-  runif(20,min = 0,max = 10) %>%
  matrix(ncol = 1) %>%  t
rownames(exp.target) <- c("ENSG00000252982")
colnames(exp.target) <- paste0("Samples",1:20)

exp.tf <- runif(20,min = 0,max = 10) %>%
  matrix(ncol = 1) %>%  t
rownames(exp.tf) <- c("ENSG00000083937")
colnames(exp.tf) <- paste0("Samples",1:20)

exp <- rbind(exp.tf, exp.target)

triplet <- data.frame(
   "regionID" =  c("chr3:203727581-203728580"),
   "target" = "ENSG00000252982",
   "TF" = "ENSG00000083937"
)
results <- interaction_model(
   triplet = triplet, 
   dnam = dnam, 
   exp = exp, 
    dnam.group.threshold = 0.25,
   stage.wise.analysis = FALSE, 
   sig.threshold = 1,
   filter.correlated.tf.exp.dnam = FALSE,
   filter.correlated.target.exp.dnam = FALSE,
   filter.triplet.by.sig.term = FALSE
)

Transform DNA methylation array into a summarized Experiment object

Description

Transform DNA methylation array into a summarized Experiment object

Usage

make_dnam_se(
  dnam,
  genome = c("hg38", "hg19"),
  arrayType = c("450k", "EPIC"),
  betaToM = FALSE,
  verbose = FALSE
)
make_dnam_se(
  dnam,
  genome = c("hg38", "hg19"),
  arrayType = c("450k", "EPIC"),
  betaToM = FALSE,
  verbose = FALSE
)

Arguments

`dnam`	DNA methylation matrix with beta-values or m-values as data, row as cpgs "cg07946458" or regions ("chr1:232:245") and column as samples
`genome`	Human genome of reference: hg38 or hg19
`arrayType`	DNA methylation array type (450k or EPIC)
`betaToM`	indicates if converting methylation beta values to mvalues
`verbose`	A logical argument indicating if messages output should be provided.

Value

A summarized Experiment object with DNA methylation probes mapped to genomic regions

Examples

library(dplyr)
dnam <- runif(20, min = 0,max = 1) %>% sort %>%
  matrix(ncol = 1) %>%  t
rownames(dnam) <- c("chr3:203727581-203728580")
colnames(dnam) <- paste0("Samples",1:20)
 se <- make_dnam_se(dnam)

library(dplyr)
dnam <- runif(20, min = 0,max = 1) %>% sort %>%
  matrix(ncol = 1) %>%  t
rownames(dnam) <- c("chr3:203727581-203728580")
colnames(dnam) <- paste0("Samples",1:20)
 se <- make_dnam_se(dnam)

Transform gene expression matrix into a Summarized Experiment object

Description

Transform gene expression matrix into a Summarized Experiment object

Usage

make_exp_se(exp, genome = c("hg38", "hg19"), verbose = FALSE)
make_exp_se(exp, genome = c("hg38", "hg19"), verbose = FALSE)

Arguments

`exp`	Gene expression matrix with gene expression counts, row as ENSG gene IDS and column as samples
`genome`	Human genome of reference: hg38 or hg19
`verbose`	A logical argument indicating if messages output should be provided.

Value

A summarized Experiment object

Examples

gene.exp.chr21.log2 <- get(data("gene.exp.chr21.log2"))
gene.exp.chr21.log2.se <- make_exp_se(gene.exp.chr21.log2)
gene.exp.chr21.log2 <- get(data("gene.exp.chr21.log2"))
gene.exp.chr21.log2.se <- make_exp_se(gene.exp.chr21.log2)

Create a Granges object from a genmic region string

Description

Given a region name such as chr22:18267969-18268249, we will create a Granges object

Usage

make_granges_from_names(names)
make_granges_from_names(names)

Arguments

names

A region name as "chr22:18267969-18268249" or a vector of region names.

Value

A GRanges

Examples

regions.names <- c("chr22:18267969-18268249","chr23:18267969-18268249")
regions.gr <- make_granges_from_names(regions.names)
regions.names <- c("chr22:18267969-18268249","chr23:18267969-18268249")
regions.gr <- make_granges_from_names(regions.names)

Create region name from Granges

Description

Given a GRanges returns region name such as chr22:18267969-18268249

Usage

make_names_from_granges(region)
make_names_from_granges(region)

Arguments

region

A GenomicRanges object

Value

A string

Examples

regions.names <- c("chr22:18267969-18268249","chr23:18267969-18268249")
regions.gr <- make_granges_from_names(regions.names)
make_names_from_granges(regions.gr)
regions.names <- c("chr22:18267969-18268249","chr23:18267969-18268249")
regions.gr <- make_granges_from_names(regions.names)
make_names_from_granges(regions.gr)

Wrapper for MethReg functions

Description

Wrapper for the following MethReg functions: 1) DNAm vs Target gene spearman correlation 2) TF vs Target gene spearman correlation 3) interaction_model 4) stratified model

Usage

methReg_analysis(
  triplet,
  dnam,
  exp,
  tf.activity.es = NULL,
  dnam.group.percent.threshold = 0.25,
  perform.correlation.analaysis = TRUE,
  remove.nonsig.correlated.dnam.target.gene = FALSE,
  remove.nonsig.correlated.dnam.target.gene.threshold.pvalue = 0.01,
  remove.nonsig.correlated.dnam.target.gene.threshold.estimate = 0.2,
  remove.sig.correlated.tf.exp.dnam = TRUE,
  filter.triplet.by.sig.term = TRUE,
  filter.triplet.by.sig.term.using.fdr = TRUE,
  filter.triplet.by.sig.term.pvalue.threshold = 0.05,
  multiple.correction.by.stage.wise.analysis = TRUE,
  tf.dnam.classifier.pval.threshold = 0.001,
  verbose = FALSE,
  cores = 1
)
methReg_analysis(
  triplet,
  dnam,
  exp,
  tf.activity.es = NULL,
  dnam.group.percent.threshold = 0.25,
  perform.correlation.analaysis = TRUE,
  remove.nonsig.correlated.dnam.target.gene = FALSE,
  remove.nonsig.correlated.dnam.target.gene.threshold.pvalue = 0.01,
  remove.nonsig.correlated.dnam.target.gene.threshold.estimate = 0.2,
  remove.sig.correlated.tf.exp.dnam = TRUE,
  filter.triplet.by.sig.term = TRUE,
  filter.triplet.by.sig.term.using.fdr = TRUE,
  filter.triplet.by.sig.term.pvalue.threshold = 0.05,
  multiple.correction.by.stage.wise.analysis = TRUE,
  tf.dnam.classifier.pval.threshold = 0.001,
  verbose = FALSE,
  cores = 1
)

Arguments

`triplet`	Data frame with columns for DNA methylation region (regionID), TF (TF), and target gene (target)
`dnam`	DNA methylation matrix or SummarizedExperiment object (columns: samples in the same order as `exp` matrix, rows: regions/probes)
`exp`	A matrix or SummarizedExperiment object object (columns: samples in the same order as `dnam`, rows: genes represented by ensembl IDs (e.g. ENSG00000239415))
`tf.activity.es`	A matrix with normalized enrichment scores for each TF across all samples to be used in linear models instead of TF gene expression. See `get_tf_ES`.
`dnam.group.percent.threshold`	DNA methylation threshold percentage to define samples in the low methylated group and high methylated group. For example, setting the threshold to 0.3 (30%) will assign samples with the lowest 30% methylation in the low group and the highest 30% methylation in the high group. Default is 0.25 (25%), accepted threshold range (0.0,0.5].
`perform.correlation.analaysis`	Perform correlation analysis ?
`remove.nonsig.correlated.dnam.target.gene`	If spearman correlation of target expression and DNAm for all samples is not significant (pvalue > 0.05), triplet will be removed If wilcoxon test of target expression Q1 and Q4 is not significant (pvalue > 0.05), triplet will be removed.
`remove.nonsig.correlated.dnam.target.gene.threshold.pvalue`	Cut-off for remove.nonsig.correlated.dnam.target.gene in the spearman test
`remove.nonsig.correlated.dnam.target.gene.threshold.estimate`	Cut-off for remove.nonsig.correlated.dnam.target.gene in the spearman test
`remove.sig.correlated.tf.exp.dnam`	If wilcoxon test of TF expression Q1 and Q4 is significant (pvalue < 0.05), triplet will be removed.
`filter.triplet.by.sig.term`	Filter significant triplets ? Select triplets if any term is significant 1) interaction (TF x DNAm) p-value < 0.05 or 2) DNAm p-value < 0.05 or 3) TF p-value < 0.05 in binary model
`filter.triplet.by.sig.term.using.fdr`	Uses FRD instead of p-value when using filter.triplet.by.sig.term.
`filter.triplet.by.sig.term.pvalue.threshold`	P-values/FDR Threshold to filter significant triplets.
`multiple.correction.by.stage.wise.analysis`	A boolean indicating if stagewise analysis should be performed to correct for multiple comparisons. If set to FALSE then FDR analysis is performed.
`tf.dnam.classifier.pval.threshold`	P-value threshold to consider a linear model significant of not. Default 0.001. This will be used to classify the TF role and DNAm effect.
`verbose`	A logical argument indicating if messages output should be provided.
`cores`	Number of CPU cores to be used. Default 1.

Details

This function fits the linear model

log2(RNA target) ~ log2(TF) + DNAm + log2(TF) * DNAm

to triplet data as follow:

There are two implementations of these models, depending on whether there are an excessive amount (i.e. more than 25 percent) of samples with zero counts in RNAseq data:

When percent of zeros in RNAseq data is less than 25 percent, robust linear models are implemented using rlm function from MASS package. This gives outlier gene expression values reduced weight. We used "psi.bisqure" option in function rlm (bisquare weighting, https://stats.idre.ucla.edu/r/dae/robust-regression/).
When percent of zeros in RNAseq data is more than 25 percent, zero inflated negative binomial models are implemented using zeroinfl function from pscl package. This assumes there are two processes that generated zeros (1) one where the counts are always zero (2) another where the count follows a negative binomial distribution.

Note that only triplets with TF expression not significantly different in high vs. low methylation groups will be evaluated (Wilcoxon test, p > 0.05).

Value

A dataframe with Region, TF, target, TF_symbo, target_symbol, estimates and P-values, after fitting robust linear models or zero-inflated negative binomial models (see Details above).

Plot interaction model results

Description

Create several plots to show interaction data TF expression with target gene interaction using a linear model

$log2(RNA target) = log2(TF) + DNAm + log2(TF) * DNAm$

To consider covariates, RNA can also be the residuals.

$log2(RNA target residuals) = log2(TF residual) + DNAm + log2(TF residual) * DNAm$

Usage

plot_interaction_model(
  triplet.results,
  dnam,
  exp,
  metadata,
  tf.activity.es = NULL,
  tf.dnam.classifier.pval.thld = 0.001,
  dnam.group.threshold = 0.25,
  label.dnam = "beta-value",
  label.exp = "expression",
  genome = "hg38",
  add.tf.vs.exp.scatter.plot = FALSE
)
plot_interaction_model(
  triplet.results,
  dnam,
  exp,
  metadata,
  tf.activity.es = NULL,
  tf.dnam.classifier.pval.thld = 0.001,
  dnam.group.threshold = 0.25,
  label.dnam = "beta-value",
  label.exp = "expression",
  genome = "hg38",
  add.tf.vs.exp.scatter.plot = FALSE
)

Arguments

`triplet.results`	Output from function interaction_model with Region ID, TF (column name: TF), and target gene (column name: target), p-values and estimates of interaction
`dnam`	DNA methylation matrix or SummarizedExperiment object (columns: samples same order as met, rows: regions/probes)
`exp`	gene expression matrix or a SummarizedExperiment object (columns: samples same order as met, rows: genes)
`metadata`	A data frame with samples as rownames and one columns that will be used to color the samples
`tf.activity.es`	A matrix with normalized enrichment scores for each TF across all samples to be used in linear models instead of TF gene expression.
`tf.dnam.classifier.pval.thld`	P-value threshold to consider a linear model significant of not. Default 0.001. This will be used to classify the TF role and DNAm effect.
`dnam.group.threshold`	DNA methylation threshold percentage to define samples in the low methylated group and high methylated group. For example, setting the threshold to 0.3 (30%) will assign samples with the lowest 30% methylation in the low group and the highest 30% methylation in the high group. Default is 0.25 (25%), accepted threshold range (0.0,0.5].
`label.dnam`	Used for label text. Option "beta-value" and "residuals"
`label.exp`	Used for label text. Option "expression" and "residuals"
`genome`	Genome of reference to be added to the plot as text
`add.tf.vs.exp.scatter.plot`	Add another row to the figure if the target gene expression vs TF expression stratified by DNA methylation groups (DNAmLow - low quartile, DNAmHigh - high quartile)

Value

A ggplot object, includes a table with results from fitting interaction model, and the the following scatter plots: 1) TF vs DNAm, 2) Target vs DNAm, 3) Target vs TF, 4) Target vs TF for samples in Q1 and Q4 for DNA methylation, 5) Target vs DNAm for samples in Q1 and Q4 for the TF

Examples

library(dplyr)
dnam <- runif(20,min = 0,max = 1) %>%
  matrix(ncol = 1) %>%  t
rownames(dnam) <- c("chr3:203727581-203728580")
colnames(dnam) <- paste0("Samples",1:20)

exp.target <-  runif(20,min = 0,max = 10) %>%
  matrix(ncol = 1) %>%  t
rownames(exp.target) <- c("ENSG00000252982")
colnames(exp.target) <- paste0("Samples",1:20)

exp.tf <- runif(20,min = 0,max = 10) %>%
  matrix(ncol = 1) %>%  t
rownames(exp.tf) <- c("ENSG00000083937")
colnames(exp.tf) <- paste0("Samples",1:20)

exp <- rbind(exp.tf, exp.target)

triplet <- data.frame(
   "regionID" =  c("chr3:203727581-203728580"),
   "target" = "ENSG00000252982",
   "TF" = "ENSG00000083937"
)
results <- interaction_model(
   triplet = triplet, 
   dnam = dnam, 
   exp = exp, 
    dnam.group.threshold = 0.25,
   stage.wise.analysis = FALSE, 
   sig.threshold = 1,
   filter.correlated.tf.exp.dnam = FALSE,
   filter.correlated.target.exp.dnam = FALSE,
   filter.triplet.by.sig.term = FALSE
)
plots <- plot_interaction_model(
    triplet.results = results,
    dnam = dnam,
    exp = exp
)
library(dplyr)
dnam <- runif(20,min = 0,max = 1) %>%
  matrix(ncol = 1) %>%  t
rownames(dnam) <- c("chr3:203727581-203728580")
colnames(dnam) <- paste0("Samples",1:20)

exp.target <-  runif(20,min = 0,max = 10) %>%
  matrix(ncol = 1) %>%  t
rownames(exp.target) <- c("ENSG00000252982")
colnames(exp.target) <- paste0("Samples",1:20)

exp.tf <- runif(20,min = 0,max = 10) %>%
  matrix(ncol = 1) %>%  t
rownames(exp.tf) <- c("ENSG00000083937")
colnames(exp.tf) <- paste0("Samples",1:20)

exp <- rbind(exp.tf, exp.target)

triplet <- data.frame(
   "regionID" =  c("chr3:203727581-203728580"),
   "target" = "ENSG00000252982",
   "TF" = "ENSG00000083937"
)
results <- interaction_model(
   triplet = triplet, 
   dnam = dnam, 
   exp = exp, 
    dnam.group.threshold = 0.25,
   stage.wise.analysis = FALSE, 
   sig.threshold = 1,
   filter.correlated.tf.exp.dnam = FALSE,
   filter.correlated.target.exp.dnam = FALSE,
   filter.triplet.by.sig.term = FALSE
)
plots <- plot_interaction_model(
    triplet.results = results,
    dnam = dnam,
    exp = exp
)

Plot stratified model results

Description

Create several plots to show interaction data TF expression with target gene interaction using a linear model

$log2(RNA target) ~ log2(TF)$

to samples with highest DNAm values (top 25 percent) and lowest DNAm values (bottom 25 percent), separately.

Usage

plot_stratified_model(
  triplet.results,
  dnam,
  exp,
  metadata,
  label.dnam = "beta-value",
  label.exp = "expression",
  tf.activity.es = NULL,
  dnam.group.threshold = 0.25
)
plot_stratified_model(
  triplet.results,
  dnam,
  exp,
  metadata,
  label.dnam = "beta-value",
  label.exp = "expression",
  tf.activity.es = NULL,
  dnam.group.threshold = 0.25
)

Arguments

`triplet.results`	Output from function stratified_model with Region ID, TF (column name: TF), and target gene (column name: target), p-values and estimates of interaction
`dnam`	DNA methylation matrix or SummarizedExperiment object (columns: samples same order as met, rows: regions/probes)
`exp`	A gene expression matrix or SummarizedExperiment object (columns: samples same order as met, rows: genes)
`metadata`	A data frame with samples as row names and one columns that will be used to color the samples
`label.dnam`	Used for label text. Option "beta-value" and "residuals"
`label.exp`	Used for label text. Option "expression" and "residuals"
`tf.activity.es`	A matrix with normalized enrichment scores for each TF across all samples to be used in linear models instead of TF gene expression.
`dnam.group.threshold`	DNA methylation threshold percentage to define samples in the low methylated group and high methylated group. For example, setting the threshold to 0.3 (30%) will assign samples with the lowest 30% methylation in the low group and the highest 30% methylation in the high group. Default is 0.25 (25%), accepted threshold range (0.0,0.5].

Value

A ggplot object, includes a table with results from fitting stratified model, and the following scatter plots: 1) TF vs DNAm, 2) Target vs DNAm, 3) Target vs TF, 4) Target vs TF for samples in Q1 and Q4 for DNA methylation, 5) Target vs DNAm for samples in Q1 and Q4 for the TF

Access REMAP2022 non-redundant peaks

Description

Access REMAP2022 non-redundant peaks

Usage

readRemap2022(cell_line)
readRemap2022(cell_line)

Arguments

cell_line

filter peaks using cell line description field

Fits linear models to triplet data (Target, TF, DNAm) for samples with high DNAm or low DNAm separately, and annotates TF (activator/repressor) and DNam effect over TF activity (attenuate, enhance).

Description

Should be used after fitting interaction_model, and only for triplet data with significant TF*DNAm interaction. This analysis examines in more details on how TF activities differ in samples with high DNAm or low DNAm values.

Usage

stratified_model(
  triplet,
  dnam,
  exp,
  cores = 1,
  tf.activity.es = NULL,
  tf.dnam.classifier.pval.thld = 0.001,
  dnam.group.threshold = 0.25
)
stratified_model(
  triplet,
  dnam,
  exp,
  cores = 1,
  tf.activity.es = NULL,
  tf.dnam.classifier.pval.thld = 0.001,
  dnam.group.threshold = 0.25
)

Arguments

`triplet`	Data frame with columns for DNA methylation region (regionID), TF (TF), and target gene (target)
`dnam`	DNA methylation matrix or SummarizedExperiment (columns: samples in the same order as `exp` matrix, rows: regions/probes)
`exp`	A matrix or SummarizedExperiment (columns: samples in the same order as `dnam` matrix, rows: genes represented by ensembl IDs (e.g. ENSG00000239415))
`cores`	Number of CPU cores to be used. Default 1.
`tf.activity.es`	A matrix with normalized enrichment scores for each TF across all samples to be used in linear models instead of TF gene expression.
`tf.dnam.classifier.pval.thld`	P-value threshold to consider a linear model significant of not. Default 0.001. This will be used to classify the TF role and DNAm effect.
`dnam.group.threshold`	DNA methylation threshold percentage to define samples in the low methylated group and high methylated group. For example, setting the threshold to 0.3 (30%) will assign samples with the lowest 30% methylation in the low group and the highest 30% methylation in the high group. Default is 0.25 (25%), accepted threshold range (0.0,0.5].

Details

This function fits linear model log2(RNA target) = log2(TF)

to samples with highest DNAm values (top 25 percent) or lowest DNAm values (bottom 25 percent), separately.

There are two implementations of these models, depending on whether there are an excessive amount (i.e. more than 25 percent) of samples with zero counts in RNAseq data:

When percent of zeros in RNAseq data is less than 25 percent, robust linear models are implemented using rlm function from MASS package. This gives outlier gene expression values reduced weight. We used "psi.bisqure" option in function rlm (bisquare weighting, https://stats.idre.ucla.edu/r/dae/robust-regression/).
When percent of zeros in RNAseq data is more than 25 percent, zero inflated negative binomial models are implemented using zeroinfl function from pscl package. This assumes there are two processes that generated zeros (1) one where the counts are always zero (2) another where the count follows a negative binomial distribution.

To account for confounding effects from covariate variables, first use the get_residuals function to obtain RNA residual values which have covariate effects removed, then fit interaction model. Note that no log2 transformation is needed when interaction_model is applied to residuals data.

This function also provides annotations for TFs. A TF is annotated as activator if increasing amount of TF (higher TF gene expression) corresponds to increased target gene expression. A TF is annotated as repressor if increasing amount of TF (higher TF gene expression) corresponds to decrease in target gene expression. A TF is annotated as dual if in the Q1 methylation group increasing amount of TF (higher TF gene expression) corresponds to increase in target gene expression, while in Q4 methylation group increasing amount of TF (higher TF gene expression) corresponds to decrease in target gene expression (or the same but changing Q1 and Q4 in the previous sentence).

In addition, a region/CpG is annotated as enhancing if more TF regulation on gene transcription is observed in samples with high DNAm. That is, DNA methylation enhances TF regulation on target gene expression. On the other hand, a region/CpG is annotated as attenuating if more TF regulation on gene transcription is observed in samples with low DNAm. That is, DNA methylation reduces TF regulation on target gene expression.

Value

A data frame with Region, TF, target, TF_symbol target_symbol, results for fitting linear models to samples with low methylation (DNAmlow_pval_rna.tf, DNAmlow_estimate_rna.tf), or samples with high methylation (DNAmhigh_pval_rna.tf, DNAmhigh_pval_rna.tf.1), annotations for TF (class.TF) and (class.TF.DNAm).

Examples

library(dplyr)
dnam <- runif (20,min = 0,max = 1) %>%
  matrix(ncol = 1) %>%  t
rownames(dnam) <- c("chr3:203727581-203728580")
colnames(dnam) <- paste0("Samples",1:20)

exp.target <-  runif (20,min = 0,max = 10) %>%
  matrix(ncol = 1) %>%  t
rownames(exp.target) <- c("ENSG00000232886")
colnames(exp.target) <- paste0("Samples",1:20)

exp.tf <- runif (20,min = 0,max = 10) %>%
  matrix(ncol = 1) %>%  t
rownames(exp.tf) <- c("ENSG00000232888")
colnames(exp.tf) <- paste0("Samples",1:20)

exp <- rbind(exp.tf, exp.target)

triplet <- data.frame(
   "regionID" =  c("chr3:203727581-203728580"),
   "target" = "ENSG00000232886",
   "TF" = "ENSG00000232888"
)

results <- stratified_model(
  triplet = triplet,
  dnam = dnam,
  exp = exp
)
library(dplyr)
dnam <- runif (20,min = 0,max = 1) %>%
  matrix(ncol = 1) %>%  t
rownames(dnam) <- c("chr3:203727581-203728580")
colnames(dnam) <- paste0("Samples",1:20)

exp.target <-  runif (20,min = 0,max = 10) %>%
  matrix(ncol = 1) %>%  t
rownames(exp.target) <- c("ENSG00000232886")
colnames(exp.target) <- paste0("Samples",1:20)

exp.tf <- runif (20,min = 0,max = 10) %>%
  matrix(ncol = 1) %>%  t
rownames(exp.tf) <- c("ENSG00000232888")
colnames(exp.tf) <- paste0("Samples",1:20)

exp <- rbind(exp.tf, exp.target)

triplet <- data.frame(
   "regionID" =  c("chr3:203727581-203728580"),
   "target" = "ENSG00000232886",
   "TF" = "ENSG00000232888"
)

results <- stratified_model(
  triplet = triplet,
  dnam = dnam,
  exp = exp
)

Package 'MethReg'

Help Index

TCGA-COAD clinical matrix for 38 samples retrieved from GDC using TCGAbiolinks

Description

Usage

Format

Evaluate correlation of DNA methylation region and target gene expression

Description

Usage

Arguments

Value

Examples

Evaluate correlation of TF expression and target gene expression

Description

Usage

Arguments

Value

Examples

Map DNAm to target genes using distance approaches, and TF to the DNAm region using JASPAR2024 TFBS.

Description

Usage

Arguments

Value

Examples

Map TF and target genes using regulon databases or any user provided target-tf. Maps TF to the DNAm region with TFBS using JASPAR2020 TFBS.

Description

Usage

Arguments

Value

Examples

TCGA-COAD DNA methylation matrix (beta-values) for 38 samples (only chr21) retrieved from GDC using TCGAbiolinks

Description

Usage

Format

Format MethReg results table and export to XLSX file

Description

Usage

Arguments

Value

Examples

Select regions with variations in DNA methylation levels above a threshold

Description

Usage

Arguments

Value

Examples

Select genes with variations above a threshold

Description

Usage

Arguments

Value

Examples

Remove genes with gene expression level equal to 0 in a substantial percentage of the samples

Description

Usage

Arguments

Value

TCGA-COAD gene expression matrix (log2 (FPKM-UQ + 1)) for 38 samples (only chromosome 21) retrieved from GDC using TCGAbiolinks

Description

Usage

Format

Access human TF from Lambert et al 2018

Description

Usage

Value

Examples

Get HM450/EPIC manifest files from Sesame package

Description

Usage

Arguments

Value

Examples

Summarize promoter DNA methylation beta values by mean.

Description

Usage

Arguments

Value

Examples

Obtain target genes of input regions based on distance

Description