This study explores expression profiles of basal stem-cell enriched cells (B) and committed luminal cells (L) of the mammary gland of virgin, pregnant and lactating mice [1]. Six groups are compared corresponding to a combination of cell type and mouse status. Each group contains two biological replicates. Sequences and counts datasets are publicly available from the Gene Expression Omnibus (GEO) with the serie accession number GSE60450. The RNA-Seq data is analysed by edgeR development team [2].
As described in edgeR user guide, we tested for significant differential expression in gene, using the QL F-test [3]. Here, we focused in gene expression analysis in only one type of cell, i.e. luminal cells for the three developmental status (virgin, pregnant and lactating mice). Among the 15,804 expressed genes, we obtained 7,302 significantly differentially expressed (DE) genes for the comparison virgin versus pregnant, 7,699 for the comparison pregnant versus lactate, and 9,583 for the comparison virgin versus lactate, with Benjamini-Hochberg correction to control the false discovery rate at 5%.
Vignette build convenience (for less build time and package size) need that data were pre-calculated (provided by the package), and that illustrations were not interactive.
We load examples files from ViSEAGO
package using system.file
from the locally installed
package. We read the gene identifiers for the background (= expressed
genes) file and the three lists of DE genes of the study. Here, gene
identifiers are GeneID from EntrezGene database.
# load genes identifiants (GeneID,ENS...) background (expressed genes)
background<-scan(
system.file(
"extdata/data/input",
"background_L.txt",
package = "ViSEAGO"
),
quiet=TRUE,
what=""
)
# load Differentialy Expressed (DE) gene identifiants from lists
PregnantvsLactateDE<-scan(
system.file(
"extdata/data/input",
"pregnantvslactateDE.txt",
package = "ViSEAGO"
),
quiet=TRUE,
what=""
)
VirginvsLactateDE<-scan(
system.file(
"extdata/data/input",
"virginvslactateDE.txt",
package = "ViSEAGO"
),
quiet=TRUE,
what=""
)
VirginvsPregnantDE<-scan(
system.file(
"extdata/data/input",
"virginvspregnantDE.txt",
package = "ViSEAGO"
),
quiet=TRUE,
what=""
)
Here, we display the first 6 GeneID from the PregnantvsLactateDE list.
In this study, we build a myGENE2GO
object using the
Bioconductor org.Mm.eg.db
database package for the mouse species. This object contains all
available GO annotations for categories Molecular Function (MF),
Biological Process (BP), and Cellular Component (CC).
NB: Don’t forget to check if the last current annotation
database version is installed in your R session! See
ViSEAGO::available_organisms(Bioconductor)
.
# connect to Bioconductor
Bioconductor<-ViSEAGO::Bioconductor2GO()
# load GO annotations from Bioconductor
myGENE2GO<-ViSEAGO::annotate(
"org.Mm.eg.db",
Bioconductor
)
- object class: gene2GO
- database: Bioconductor
- stamp/version: 2019-Jul10
- organism id: org.Mm.eg.db
GO annotations:
- Molecular Function (MF): 22707 annotated genes with 91986 terms (4121 unique terms)
- Biological Process (BP): 23210 annotated genes with 164825 terms (12224 unique terms)
- Cellular Component (CC): 23436 annotated genes with 107852 terms (1723 unique terms)
We perform a functional Gene Ontology (GO) enrichment analysis from
differentially expressed (DE) genes of luminal cells in the mammary
gland. The enriched Biological process (BP) are
obtained using a Fisher’s exact test with elim
algorithm
developped in topGO
package.
First, we create three topGOdata
objects, using
ViSEAGO::create_topGOdata
method, corresponding to the
three DE genes lists for the comparison virgin versus pregnant, pregnant
versus lactate, and virgin versus lactate. The gene background
corresponding to expressed genes in luminal cells of the mammary gland
and GO annotations are also provided.
# create topGOdata for BP for each list of DE genes
BP_PregnantvsLactate<-ViSEAGO::create_topGOdata(
geneSel=PregnantvsLactateDE,
allGenes=background,
gene2GO=myGENE2GO,
ont="BP",
nodeSize=5
)
BP_VirginvsLactate<-ViSEAGO::create_topGOdata(
geneSel=VirginvsLactateDE,
allGenes=background,
gene2GO=myGENE2GO,
ont="BP",
nodeSize=5
)
BP_VirginvsPregnant<-ViSEAGO::create_topGOdata(
geneSel=VirginvsPregnantDE,
allGenes=background,
gene2GO=myGENE2GO,
ont="BP",
nodeSize=5
)
Now, we perform the GO enrichment tests for BP category with Fisher’s
exact test and elim algorithm using topGO::runTest
method.
NB: p-values of enriched GO terms are not adjusted and considered significant if below 0.01.
# perform topGO tests
elim_BP_PregnantvsLactate<-topGO::runTest(
BP_PregnantvsLactate,
algorithm ="elim",
statistic = "fisher",
cutOff=0.01
)
elim_BP_VirginvsLactate<-topGO::runTest(
BP_VirginvsLactate,
algorithm ="elim",
statistic = "fisher",
cutOff=0.01
)
elim_BP_VirginvsPregnant<-topGO::runTest(
BP_VirginvsPregnant,
algorithm ="elim",
statistic = "fisher",
cutOff=0.01
)
We combine the results of the three enrichment tests into an object
using ViSEAGO::merge_enrich_terms
method. A table of
enriched GO terms in at least one comparison is displayed in interactive
mode, or printed in a file using ViSEAGO::show_table
method. The printed table contains for each enriched GO terms,
additional columns including the list of significant genes and frequency
(ratio between the number of significant genes and number of background
genes in a specific GO tag) evaluated by comparison.
# merge topGO results
BP_sResults<-ViSEAGO::merge_enrich_terms(
cutoff=0.01,
Input=list(
PregnantvsLactate=c(
"BP_PregnantvsLactate",
"elim_BP_PregnantvsLactate"
),
VirginvsLactate=c(
"BP_VirginvsLactate",
"elim_BP_VirginvsLactate"
),
VirginvsPregnant=c(
"BP_VirginvsPregnant",
"elim_BP_VirginvsPregnant"
)
)
)
- object class: enrich_GO_terms
- ontology: BP
- method: topGO
- summary:PregnantvsLactate
BP_PregnantvsLactate
description: Bioconductor org.Mm.eg.db 2019-Jul10
available_genes: 15804
available_genes_significant: 7699
feasible_genes: 14091
feasible_genes_significant: 7044
genes_nodeSize: 5
nodes_number: 8463
edges_number: 19543
elim_BP_PregnantvsLactate
description: Bioconductor org.Mm.eg.db 2019-Jul10
test_name: fisher p<0.01
algorithm_name: elim
GO_scored: 8463
GO_significant: 199
feasible_genes: 14091
feasible_genes_significant: 7044
genes_nodeSize: 5
Nontrivial_nodes: 8433
VirginvsLactate
BP_VirginvsLactate
description: Bioconductor org.Mm.eg.db 2019-Jul10
available_genes: 15804
available_genes_significant: 9583
feasible_genes: 14091
feasible_genes_significant: 8734
genes_nodeSize: 5
nodes_number: 8463
edges_number: 19543
elim_BP_VirginvsLactate
description: Bioconductor org.Mm.eg.db 2019-Jul10
test_name: fisher p<0.01
algorithm_name: elim
GO_scored: 8463
GO_significant: 152
feasible_genes: 14091
feasible_genes_significant: 8734
genes_nodeSize: 5
Nontrivial_nodes: 8457
VirginvsPregnant
BP_VirginvsPregnant
description: Bioconductor org.Mm.eg.db 2019-Jul10
available_genes: 15804
available_genes_significant: 7302
feasible_genes: 14091
feasible_genes_significant: 6733
genes_nodeSize: 5
nodes_number: 8463
edges_number: 19543
elim_BP_VirginvsPregnant
description: Bioconductor org.Mm.eg.db 2019-Jul10
test_name: fisher p<0.01
algorithm_name: elim
GO_scored: 8463
GO_significant: 243
feasible_genes: 14091
feasible_genes_significant: 6733
genes_nodeSize: 5
Nontrivial_nodes: 8413
- enrichment pvalue cutoff:
PregnantvsLactate : 0.01
VirginvsLactate : 0.01
VirginvsPregnant : 0.01
- enrich GOs (in at least one list): 521 GO terms of 3 conditions.
PregnantvsLactate : 199 terms
VirginvsLactate : 152 terms
VirginvsPregnant : 243 terms
Several graphs summarize important features. An interactive barchart
showing the number of GO terms enriched or not in each comparison, using
ViSEAGO::GOcount
method. Intersections of lists of enriched
GO terms between comparisons are displayed in an Upset plot using
ViSEAGO::Upset
method.
GO annotations of genes created at a previous step and enriched GO
terms are combined using ViSEAGO::build_GO_SS
method. The
Semantic Similarity (SS) between enriched GO terms are calculated using
ViSEAGO::compute_SS_distances
method which is a wrapper of
functions implemented in the GOSemSim
package [4]. Here, we choose Wang
method based on the topology of GO graph structure. The built object
myGOs
contains all informations of enriched GO terms and
the SS distances between them.
# create GO_SS-class object
myGOs<-ViSEAGO::build_GO_SS(
gene2GO=myGENE2GO,
enrich_GO_terms=BP_sResults
)
- object class: GO_SS
- ontology: BP
- method: topGO
- summary:
PregnantvsLactate
BP_PregnantvsLactate
description: Bioconductor org.Mm.eg.db 2019-Jul10
available_genes: 15804
available_genes_significant: 7699
feasible_genes: 14091
feasible_genes_significant: 7044
genes_nodeSize: 5
nodes_number: 8463
edges_number: 19543
elim_BP_PregnantvsLactate
description: Bioconductor org.Mm.eg.db 2019-Jul10
test_name: fisher p<0.01
algorithm_name: elim
GO_scored: 8463
GO_significant: 199
feasible_genes: 14091
feasible_genes_significant: 7044
genes_nodeSize: 5
Nontrivial_nodes: 8433
VirginvsLactate
BP_VirginvsLactate
description: Bioconductor org.Mm.eg.db 2019-Jul10
available_genes: 15804
available_genes_significant: 9583
feasible_genes: 14091
feasible_genes_significant: 8734
genes_nodeSize: 5
nodes_number: 8463
edges_number: 19543
elim_BP_VirginvsLactate
description: Bioconductor org.Mm.eg.db 2019-Jul10
test_name: fisher p<0.01
algorithm_name: elim
GO_scored: 8463
GO_significant: 152
feasible_genes: 14091
feasible_genes_significant: 8734
genes_nodeSize: 5
Nontrivial_nodes: 8457
VirginvsPregnant
BP_VirginvsPregnant
description: Bioconductor org.Mm.eg.db 2019-Jul10
available_genes: 15804
available_genes_significant: 7302
feasible_genes: 14091
feasible_genes_significant: 6733
genes_nodeSize: 5
nodes_number: 8463
edges_number: 19543
elim_BP_VirginvsPregnant
description: Bioconductor org.Mm.eg.db 2019-Jul10
test_name: fisher p<0.01
algorithm_name: elim
GO_scored: 8463
GO_significant: 243
feasible_genes: 14091
feasible_genes_significant: 6733
genes_nodeSize: 5
Nontrivial_nodes: 8413
- enrichment pvalue cutoff:
PregnantvsLactate : 0.01
VirginvsLactate : 0.01
VirginvsPregnant : 0.01
- enrich GOs (in at least one list): 521 GO terms of 3 conditions.
PregnantvsLactate : 199 terms
VirginvsLactate : 152 terms
VirginvsPregnant : 243 terms
- terms distances: Wang
A Multi Dimensional Scale (MDS) plot with
ViSEAGO::MDSplot
method provides a representation of
distances among a set of enriched GO terms on the two first dimensions.
Some patterns could appear at this time and could be investigated in the
interactive mode. The plot could also be printed in a png file.
To fully explore the results of this functional analysis, a
hierarchical clustering using ViSEAGO::GOterms_heatmap
is
performed based on Wang
SS distance between enriched GO
terms and ward.D2
aggregation criteria.
Clusters of enriched GO terms are obtained by cutting branches off the dendrogram. Here, we choose a dynamic branch cutting method based on the shape of clusters using dynamicTreeCut [5,6]. Here, we use parameters to detect small clusters in agreement with the dendrogram structure.
Enriched GO terms are ranked in a dendrogram and colored depending on
their cluster assignation. Additional illustrations are displayed: the
GO description of terms (trimmed if more than 50 characters), a heatmap
of -log10(pvalue) of enrichment test for each comparison, and optionally
the Information Content (IC). The IC of a GO term is computed by the
negative log probability of the term occurring in the GO annotations
database of the studied species. A rarely used term contains a greater
amount of IC. The illustration is displayed with
ViSEAGO::show_heatmap
method.
# Create GOterms heatmap
Wang_clusters_wardD2<-ViSEAGO::GOterms_heatmap(
myGOs,
showIC=FALSE,
showGOlabels =FALSE,
GO.tree=list(
tree=list(
distance="Wang",
aggreg.method="ward.D2"
),
cut=list(
dynamic=list(
pamStage=TRUE,
pamRespectsDendro=TRUE,
deepSplit=2,
minClusterSize =2
)
)
),
samples.tree=NULL
)
A table of enriched GO terms in at least one of the comparison is
displayed in interactive mode, or printed in a file using
ViSEAGO::show_table
method. The columns GO cluster number
and IC value are added to the previous table of enriched terms table.
The printed table contains for each enriched GO terms, additional
columns including the list of significant genes and frequency (ratio
between the number of significant genes and number of background genes
in a specific GO tag) evaluated by comparison.
NB: For this vignette, this illustration is not interactive.
We also display a colored Multi Dimensional Scale (MDS) plot showing
the overlay of GO terms clusters from Wang_clusters_wardD2
object with ViSEAGO::MDSplot
method. It is a way to check
the coherence of GO terms clusters on the MDS plot.
This part is dedicated to explore relationships between clusters of
GO terms defined by the last hierarchical clustering with
ViSEAGO::GOterms_heatmap
method. This analysis should be
conducted if the number of clusters of GO terms is large enough and
therefore difficult to investigate.
A Semantic Similarity (SS) between clusters of GO terms, previously
defined, is calculated using ViSEAGO::compute_SS_distances
method. Here, we choose the Best-Match Average, also known as
BMA method implemented in the GOSemSim
package [4]. It calculates the average of
all maximum similarities between two clusters of GO terms.
A colored Multi Dimensional Scale (MDS) plot with
ViSEAGO::MDSplot
method provides a representation of
distances between the clusters of GO terms. Each circle represents a
cluster of GO terms and its size depends on the number of GO terms that
it contains. Clusters of GO terms that are close should share a
functional coherence.
A new hierarchical clustering using
ViSEAGO::GOclusters_heatmap
is performed based on the
BMA
SS distance between the clusters of GO terms,
previously computed, and the ward.D2
aggregation
criteria.
Clusters of GO terms are ranked in a dendrogram and colored depending
on their cluster assignation. Additional illustrations are displayed:
the definition of the cluster of GO terms corresponds to the first
common GO term ancestor followed by the cluster label in brackets, and
the heatmap of GO terms count in the corresponding cluster. The
illustration is displayed with ViSEAGO::show_heatmap
method.
# GOclusters heatmap
Wang_clusters_wardD2<-ViSEAGO::GOclusters_heatmap(
Wang_clusters_wardD2,
tree=list(
distance="BMA",
aggreg.method="ward.D2"
)
)
An history of the enrichment functional analysis is reported.
- object class: GO_clusters
- ontology: BP
- method: topGO
- summary:
PregnantvsLactate
BP_PregnantvsLactate
description: Bioconductor org.Mm.eg.db 2019-Jul10
available_genes: 15804
available_genes_significant: 7699
feasible_genes: 14091
feasible_genes_significant: 7044
genes_nodeSize: 5
nodes_number: 8463
edges_number: 19543
elim_BP_PregnantvsLactate
description: Bioconductor org.Mm.eg.db 2019-Jul10
test_name: fisher p<0.01
algorithm_name: elim
GO_scored: 8463
GO_significant: 199
feasible_genes: 14091
feasible_genes_significant: 7044
genes_nodeSize: 5
Nontrivial_nodes: 8433
VirginvsLactate
BP_VirginvsLactate
description: Bioconductor org.Mm.eg.db 2019-Jul10
available_genes: 15804
available_genes_significant: 9583
feasible_genes: 14091
feasible_genes_significant: 8734
genes_nodeSize: 5
nodes_number: 8463
edges_number: 19543
elim_BP_VirginvsLactate
description: Bioconductor org.Mm.eg.db 2019-Jul10
test_name: fisher p<0.01
algorithm_name: elim
GO_scored: 8463
GO_significant: 152
feasible_genes: 14091
feasible_genes_significant: 8734
genes_nodeSize: 5
Nontrivial_nodes: 8457
VirginvsPregnant
BP_VirginvsPregnant
description: Bioconductor org.Mm.eg.db 2019-Jul10
available_genes: 15804
available_genes_significant: 7302
feasible_genes: 14091
feasible_genes_significant: 6733
genes_nodeSize: 5
nodes_number: 8463
edges_number: 19543
elim_BP_VirginvsPregnant
description: Bioconductor org.Mm.eg.db 2019-Jul10
test_name: fisher p<0.01
algorithm_name: elim
GO_scored: 8463
GO_significant: 243
feasible_genes: 14091
feasible_genes_significant: 6733
genes_nodeSize: 5
Nontrivial_nodes: 8413
- enrichment pvalue cutoff:
PregnantvsLactate : 0.01
VirginvsLactate : 0.01
VirginvsPregnant : 0.01
- enrich GOs (in at least one list): 521 GO terms of 3 conditions.
PregnantvsLactate : 199 terms
VirginvsLactate : 152 terms
VirginvsPregnant : 243 terms
- terms distances: Wang
- clusters distances: BMA
- Heatmap:
* GOterms: TRUE
- GO.tree:
tree.distance: Wang
tree.aggreg.method: ward.D2
cut.dynamic.pamStage: TRUE
cut.dynamic.pamRespectsDendro: TRUE
cut.dynamic.deepSplit: 2
cut.dynamic.minClusterSize: 2
number of clusters: 62
clusters min size: 2
clusters mean size: 8
clusters max size: 32
- sample.tree: FALSE
* GOclusters: TRUE
- tree:
distance: BMA
aggreg.method: ward.D2
A functional Gene Ontology (GO) enrichment analysis was performed using ViSEAGO package, from differentially expressed (DE) genes in luminal cells of the mouse mammary gland. It facilitates the interpretation of biological processes involved between these three comparisons of virgin, pregnant and lactating mice. It provides both a synthetic and detailed view using interactive functionalities respecting the GO graph structure and ensuring functional coherence.
R version 4.4.2 (2024-10-31)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 24.04.1 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
time zone: Etc/UTC
tzcode source: system (glibc)
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] ViSEAGO_1.21.0 BiocStyle_2.35.0
loaded via a namespace (and not attached):
[1] RColorBrewer_1.1-3 sys_3.4.3 jsonlite_1.8.9
[4] shape_1.4.6.1 magrittr_2.0.3 rmarkdown_2.29
[7] GlobalOptions_0.1.2 fs_1.6.5 zlibbioc_1.52.0
[10] vctrs_0.6.5 memoise_2.0.1 RCurl_1.98-1.16
[13] webshot_0.5.5 htmltools_0.5.8.1 progress_1.2.3
[16] dynamicTreeCut_1.63-1 curl_6.0.1 sass_0.4.9
[19] bslib_0.8.0 htmlwidgets_1.6.4 plyr_1.8.9
[22] httr2_1.0.7 plotly_4.10.4 cachem_1.1.0
[25] buildtools_1.0.0 igraph_2.1.2 lifecycle_1.0.4
[28] iterators_1.0.14 pkgconfig_2.0.3 Matrix_1.7-1
[31] R6_2.5.1 fastmap_1.2.0 GenomeInfoDbData_1.2.13
[34] clue_0.3-66 digest_0.6.37 colorspace_2.1-1
[37] AnnotationDbi_1.69.0 S4Vectors_0.45.2 RSQLite_2.3.9
[40] seriation_1.5.7 filelock_1.0.3 httr_1.4.7
[43] compiler_4.4.2 bit64_4.5.2 doParallel_1.0.17
[46] BiocParallel_1.41.0 viridis_0.6.5 DBI_1.2.3
[49] UpSetR_1.4.0 heatmaply_1.5.0 dendextend_1.19.0
[52] R.utils_2.12.3 biomaRt_2.63.0 rappdirs_0.3.3
[55] rjson_0.2.23 tools_4.4.2 R.oo_1.27.0
[58] glue_1.8.0 DiagrammeR_1.0.11 GOSemSim_2.33.0
[61] grid_4.4.2 cluster_2.1.8 fgsea_1.33.0
[64] generics_0.1.3 gtable_0.3.6 ca_0.71.1
[67] R.methodsS3_1.8.2 tidyr_1.3.1 data.table_1.16.4
[70] hms_1.1.3 xml2_1.3.6 XVector_0.47.0
[73] BiocGenerics_0.53.3 foreach_1.5.2 pillar_1.10.0
[76] stringr_1.5.1 yulab.utils_0.1.8 circlize_0.4.16
[79] dplyr_1.1.4 BiocFileCache_2.15.0 lattice_0.22-6
[82] bit_4.5.0.1 SparseM_1.84-2 tidyselect_1.2.1
[85] registry_0.5-1 topGO_2.59.0 GO.db_3.20.0
[88] ComplexHeatmap_2.23.0 maketools_1.3.1 Biostrings_2.75.3
[91] knitr_1.49 gridExtra_2.3 IRanges_2.41.2
[94] stats4_4.4.2 xfun_0.49 Biobase_2.67.0
[97] matrixStats_1.4.1 DT_0.33 visNetwork_2.1.2
[100] stringi_1.8.4 UCSC.utils_1.3.0 lazyeval_0.2.2
[103] yaml_2.3.10 evaluate_1.0.1 codetools_0.2-20
[106] tibble_3.2.1 BiocManager_1.30.25 graph_1.85.0
[109] cli_3.6.3 munsell_0.5.1 jquerylib_0.1.4
[112] Rcpp_1.0.13-1 GenomeInfoDb_1.43.2 dbplyr_2.5.0
[115] png_0.1-8 XML_3.99-0.17 parallel_4.4.2
[118] ggplot2_3.5.1 assertthat_0.2.1 blob_1.2.4
[121] prettyunits_1.2.0 AnnotationForge_1.49.0 bitops_1.0-9
[124] viridisLite_0.4.2 scales_1.3.0 purrr_1.0.2
[127] crayon_1.5.3 GetoptLong_1.0.5 rlang_1.1.4
[130] TSP_1.2-4 cowplot_1.1.3 fastmatch_1.1-4
[133] KEGGREST_1.47.0