DELocal

DELocal

Introduction

Exploration of genetically modified organisms, developmental processes, diseases or responses to various treatments require accurate measurement of changes in gene expression. This can be done for thousands of genes using high throughput technologies such as microarray and RNAseq. However, identification of differentially expressed (DE) genes poses technical challenges due to limited sample size, few replicates, or simply very small changes in expression levels. Consequently, several methods have been developed to determine DE genes, such as Limma, edgeR, and DESeq2. These methods identify DE genes based on the expression levels alone. As genomic co-localization of genes is generally not linked to co-expression, we deduced that DE genes could be detected with the help of genes from chromosomal neighbourhood. Here, we present a new method, DELocal, which identifies DE genes by comparing their expression changes to changes in adjacent genes in their chromosomal regions.

neighbor
In the above figure it can be seen that Sostdc1 is differentially expressed in developing tooth tissues (E13 and E14). DELocal helps in identifying similar genes.

Installation

if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("delocal")

To install from github

if (!requireNamespace("devtools")) {
  install.packages("devtools")
}
devtools::install_github("dasroy/delocal")

How to run

This is a basic example which shows you how to use DELocal:

First a SummarizedExperiment object will be configured with gene expression count matrix and gene location info.

Read the raw count values

library(DELocal)
count_matrix <- as.matrix(read.table(file = system.file("extdata", 
                                              "tooth_RNASeq_counts.txt", 
                                              package = "DELocal")))
colData <- data.frame(condition=gsub("\\..*",x=colnames(count_matrix),
                                     replacement = ""))

Getting gene chromosomal location

Example of required gene location information

gene_location <- read.table(file = system.file("extdata", 
                                              "gene_location.txt", 
                                              package = "DELocal"))
head(gene_location)
#>                       ensembl_gene_id start_position chromosome_name
#> ENSMUSG00000000001 ENSMUSG00000000001      108107280               3
#> ENSMUSG00000000003 ENSMUSG00000000003       77837901               X
#> ENSMUSG00000000028 ENSMUSG00000000028       18780447              16
#> ENSMUSG00000000031 ENSMUSG00000000031      142575529               7
#> ENSMUSG00000000037 ENSMUSG00000000037      161082525               X
#> ENSMUSG00000000049 ENSMUSG00000000049      108343354              11

Example code to get gene location information like above

library(biomaRt)
gene_attributes <- c("ensembl_gene_id", "start_position", "chromosome_name")
ensembl_ms_mart <- useMart(biomart="ENSEMBL_MART_ENSEMBL",
                           dataset="mmusculus_gene_ensembl", host="www.ensembl.org")
#> Warning: Ensembl will soon enforce the use of https.
#> Ensure the 'host' argument includes "https://"
gene_location_sample <- getBM(attributes=gene_attributes, mart=ensembl_ms_mart,
                       verbose = FALSE)
rownames(gene_location_sample) <- gene_location_sample$ensembl_gene_id

Integrating gene expression and location into a single object.

library(SummarizedExperiment)
smrExpt <- SummarizedExperiment(assays=list(counts=count_matrix),
                                                      rowData = gene_location, 
                                                      colData=colData)
smrExpt
#> class: SummarizedExperiment 
#> dim: 52183 14 
#> metadata(0):
#> assays(1): counts
#> rownames(52183): ENSMUSG00000000001 ENSMUSG00000000003 ...
#>   ENSMUSG00000114967 ENSMUSG00000114968
#> rowData names(3): ensembl_gene_id start_position chromosome_name
#> colnames(14): ME14.E1M1R ME14.E2M1R ... ME13.E9M1R ME13.EXM1L
#> colData names(1): condition

Final results

These may take long time to run the whole data therefore here we will analyse genes only from X chromosome. Here in this example DELocal compares each gene with 5 ‘nearest_neighbours’ and returns only genes whose adjusted p-value is less than pValue_cut.

library(dplyr)
x_genes <- SummarizedExperiment::rowData(smrExpt) %>% 
    as.data.frame() %>% 
    filter(chromosome_name=="X") %>% rownames() 

DELocal_result <- DELocal(pSmrExpt = smrExpt[x_genes,],
                         nearest_neighbours = 5,pDesign = ~ condition,
                         pValue_cut = 0.05)

head(round(DELocal_result,digits = 9))
#>                    relative.logFC  P.Value   adj.P.Val         B
#> ENSMUSG00000037217       508.8688 0.00e+00 0.000000091 15.033054
#> ENSMUSG00000062184      -694.6147 4.07e-07 0.000267921  4.970833
#> ENSMUSG00000059401       140.0683 4.35e-07 0.000267921  4.899633
#> ENSMUSG00000016319       955.2564 4.85e-07 0.000267921  4.781442
#> ENSMUSG00000045103       417.2448 5.18e-07 0.000267921  4.711055
#> ENSMUSG00000031103       370.1563 6.98e-07 0.000300492  4.392911

The results are already sorted in ascending order of adjusted p-value (adj.P.Val)

Plot expression pattern of a neighbourhood of a gene

plotNeighbourhood function can be used to plot median expressions of different ‘condition’ for a gene of interest and its pNearest_neighbours genes.

DELocal::plotNeighbourhood(pSmrExpt = smrExpt, pGene_id = "ENSMUSG00000059401", 
                           pNearest_neighbours=5, pDesign = ~ condition)$plot

Dynamic neighbour

In previous example 1 Mb chromosomal area around each gene to define its neighbourhood. The choice of 1Mb window is obviously somewhat arbitrary. However it is also possible to use different size of neighbourhood for each gene. For that user can provide “neighbors_start” and “neighbors_end” for each gene in the rowData.
To demonstrate this, TADs (topologically associating domains) boundaries which are different for each gene are used next.

gene_location_dynamicNeighbourhood <- read.csv(system.file("extdata", "Mouse_TAD_boundaries.csv",
                             package = "DELocal"))

rownames(gene_location_dynamicNeighbourhood) <- 
    gene_location_dynamicNeighbourhood$ensembl_gene_id

# rename the columns as required by DELocal
colnames(gene_location_dynamicNeighbourhood)[4:5] <- c("neighbors_start",
                                                       "neighbors_end")
common_genes <- intersect(rownames(count_matrix),
                          rownames(gene_location_dynamicNeighbourhood) )

smrExpt_dynamicNeighbour <-
    SummarizedExperiment::SummarizedExperiment(
        assays = list(counts = count_matrix[common_genes,]),
        rowData = gene_location_dynamicNeighbourhood[common_genes, ],
        colData = colData
    )

## Selecting only chromosome 1 genes to reduce the runtime                                                      
# one_genes <- SummarizedExperiment::rowData(smrExpt_dynamicNeighbour) %>% 
#     as.data.frame() %>% 
#     filter(chromosome_name=="1") %>% rownames() 

DELocal_result_tad <- DELocal(pSmrExpt = smrExpt_dynamicNeighbour[x_genes,], 
                         nearest_neighbours = 5,pDesign = ~ condition,
                         pValue_cut = 0.05, pLogFold_cut = 0)

head(DELocal_result_tad)
#>                    relative.logFC      P.Value    adj.P.Val         B
#> ENSMUSG00000037217       508.8688 3.520588e-11 9.097200e-08 15.036315
#> ENSMUSG00000062184      -694.6147 4.068015e-07 2.679164e-04  4.974179
#> ENSMUSG00000059401       140.0683 4.347483e-07 2.679164e-04  4.902978
#> ENSMUSG00000016319       955.2564 4.854489e-07 2.679164e-04  4.784788
#> ENSMUSG00000045103       417.2448 5.184140e-07 2.679164e-04  4.714401
#> ENSMUSG00000031103       370.1563 6.977232e-07 3.004861e-04  4.396258

SessionInfo

#> R version 4.4.1 (2024-06-14)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.1 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: Etc/UTC
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats4    stats     graphics  grDevices utils     datasets  methods  
#> [8] base     
#> 
#> other attached packages:
#>  [1] dplyr_1.1.4                 SummarizedExperiment_1.35.5
#>  [3] Biobase_2.67.0              GenomicRanges_1.57.2       
#>  [5] GenomeInfoDb_1.41.2         IRanges_2.39.2             
#>  [7] S4Vectors_0.43.2            BiocGenerics_0.53.0        
#>  [9] MatrixGenerics_1.17.1       matrixStats_1.4.1          
#> [11] biomaRt_2.63.0              DELocal_1.7.0              
#> [13] BiocStyle_2.35.0           
#> 
#> loaded via a namespace (and not attached):
#>  [1] tidyselect_1.2.1        farver_2.1.2            blob_1.2.4             
#>  [4] filelock_1.0.3          Biostrings_2.75.0       fastmap_1.2.0          
#>  [7] BiocFileCache_2.15.0    digest_0.6.37           lifecycle_1.0.4        
#> [10] statmod_1.5.0           KEGGREST_1.45.1         RSQLite_2.3.7          
#> [13] magrittr_2.0.3          compiler_4.4.1          rlang_1.1.4            
#> [16] sass_0.4.9              progress_1.2.3          tools_4.4.1            
#> [19] utf8_1.2.4              yaml_2.3.10             knitr_1.48             
#> [22] labeling_0.4.3          prettyunits_1.2.0       S4Arrays_1.5.11        
#> [25] curl_5.2.3              bit_4.5.0               DelayedArray_0.31.14   
#> [28] xml2_1.3.6              plyr_1.8.9              abind_1.4-8            
#> [31] BiocParallel_1.39.0     withr_3.0.2             purrr_1.0.2            
#> [34] sys_3.4.3               grid_4.4.1              fansi_1.0.6            
#> [37] colorspace_2.1-1        ggplot2_3.5.1           scales_1.3.0           
#> [40] cli_3.6.3               rmarkdown_2.28          crayon_1.5.3           
#> [43] generics_0.1.3          httr_1.4.7              reshape2_1.4.4         
#> [46] DBI_1.2.3               cachem_1.1.0            stringr_1.5.1          
#> [49] zlibbioc_1.51.2         parallel_4.4.1          AnnotationDbi_1.69.0   
#> [52] BiocManager_1.30.25     XVector_0.45.0          vctrs_0.6.5            
#> [55] Matrix_1.7-1            jsonlite_1.8.9          hms_1.1.3              
#> [58] bit64_4.5.2             maketools_1.3.1         locfit_1.5-9.10        
#> [61] limma_3.61.12           jquerylib_0.1.4         glue_1.8.0             
#> [64] codetools_0.2-20        stringi_1.8.4           gtable_0.3.6           
#> [67] UCSC.utils_1.1.0        munsell_0.5.1           tibble_3.2.1           
#> [70] pillar_1.9.0            rappdirs_0.3.3          htmltools_0.5.8.1      
#> [73] GenomeInfoDbData_1.2.13 dbplyr_2.5.0            httr2_1.0.5            
#> [76] R6_2.5.1                evaluate_1.0.1          lattice_0.22-6         
#> [79] highr_0.11              png_0.1-8               memoise_2.0.1          
#> [82] bslib_0.8.0             Rcpp_1.0.13             SparseArray_1.5.45     
#> [85] DESeq2_1.45.3           xfun_0.48               buildtools_1.0.0       
#> [88] pkgconfig_2.0.3