Package 'sSNAPPY' reference manual

Title:	Single Sample directioNAl Pathway Perturbation analYsis
Description:	A single sample pathway perturbation testing method for RNA-seq data. The method propagates changes in gene expression down gene-set topologies to compute single-sample directional pathway perturbation scores that reflect potential direction of change. Perturbation scores can be used to test significance of pathway perturbation at both individual-sample and treatment levels.
Authors:	Wenjun Liu [aut, cre] , Stephen Pederson [aut]
Maintainer:	Wenjun Liu <[email protected]>
License:	GPL-3
Version:	1.11.0
Built:	2025-03-20 20:45:28 UTC
Source:	https://github.com/bioc/sSNAPPY

Permute sample labels to simulate null distribution of perturbation scores

Description

Simulate null distributions of perturbation scores for each pathway through sample permutation.

Usage

generate_permuted_scores(
  expreMatrix,
  NB = NULL,
  testScore = NULL,
  gsTopology,
  weight,
  drop = TRUE
)

## S4 method for signature 'matrix'
generate_permuted_scores(
  expreMatrix,
  NB = NULL,
  testScore = NULL,
  gsTopology,
  weight,
  drop = TRUE
)

## S4 method for signature 'data.frame'
generate_permuted_scores(
  expreMatrix,
  NB = NULL,
  testScore = NULL,
  gsTopology,
  weight,
  drop = TRUE
)

## S4 method for signature 'DGEList'
generate_permuted_scores(
  expreMatrix,
  NB = NULL,
  testScore = NULL,
  gsTopology,
  weight,
  drop = TRUE
)

## S4 method for signature 'SummarizedExperiment'
generate_permuted_scores(
  expreMatrix,
  NB = NULL,
  testScore = NULL,
  gsTopology,
  weight,
  drop = TRUE
)
generate_permuted_scores(
  expreMatrix,
  NB = NULL,
  testScore = NULL,
  gsTopology,
  weight,
  drop = TRUE
)

## S4 method for signature 'matrix'
generate_permuted_scores(
  expreMatrix,
  NB = NULL,
  testScore = NULL,
  gsTopology,
  weight,
  drop = TRUE
)

## S4 method for signature 'data.frame'
generate_permuted_scores(
  expreMatrix,
  NB = NULL,
  testScore = NULL,
  gsTopology,
  weight,
  drop = TRUE
)

## S4 method for signature 'DGEList'
generate_permuted_scores(
  expreMatrix,
  NB = NULL,
  testScore = NULL,
  gsTopology,
  weight,
  drop = TRUE
)

## S4 method for signature 'SummarizedExperiment'
generate_permuted_scores(
  expreMatrix,
  NB = NULL,
  testScore = NULL,
  gsTopology,
  weight,
  drop = TRUE
)

Arguments

`expreMatrix`	`matrix` and `data.frame` of logCPM, or `DGEList`/`SummarizedExperiment` storing gene expression counts. Feature names need to be in entrez IDs
`NB`	Number of permuted sample pairs to compute permuted logFCs for. Default to be the maximum number of possibilities (See details).
`testScore`	Optional. Output of `pathway_pert()` for restricting the permutation only to pathways with non-zero test scores in at least one sample.
`gsTopology`	List of pathway topology matrices generated using function `retrieve_topology`
`weight`	A vector of gene-wise weights derived from function `weight_ss_fc`
`drop`	logic(1). Whether to drop pathways with all zero scores

Details

This generate_permuted_scores function firstly randomly permute sample labels to form permuted pairs and generate permuted logFCs, which are then used to compute permuted perturbation scores for each pathway.

The function outputs a list that is of the same length as the list storing pathway topology matrices (i.e. gsTopology). Each element of the output list corresponds to one pathway and contains a vector of permuted perturbation scores. The permuted perturbation scores will be used to approximate the null distributions of perturbation scores and compute permuted p-values.

If the input is S4 object of DGEList or SummarizedExperiment, gene expression matrix will be extracted and converted to a logCPM matrix.

The sample permutation is operated by randomly choosing combination of 2 samples from the column names of the expreMatrix. Hence, sample metadata is not a required input. The number of maximum unique permutations in a dataset with N samples is N x (N-1). By default, permuted logFCs will be computed for all N x (N-1) permuted pairs. However, this can be overwritten by the NB parameter. If the NB specified is smaller than N x (N-1), NB possibilities will be randomly sampled from all the possible permutation. If the NB specified is larger than the maximum number of permutation possible, the parameter will be ignored. Since the smallest achievable permutation p-value is 1/NB, accurate estimation of small p-value requires a large number of permutations that is only feasible in data with a large sample size.

Value

A list where each element is a vector of perturbation scores for a pathway.

Examples

#compute weighted single sample logFCs
data(metadata_example)
data(logCPM_example)
metadata_example <- dplyr::mutate(metadata_example, treatment = factor(
   treatment, levels = c("Vehicle", "E2+R5020", "R5020")))
ls <- weight_ss_fc(logCPM_example, metadata = metadata_example,
 groupBy = "patient", treatColumn = "treatment", sampleColumn = "sample")
## Not run: 
load(system.file("extdata", "gsTopology.rda", package = "sSNAPPY"))

# simulate the null distribution of scores through sample permutation
permutedScore <- generate_permuted_scores(logCPM_example,
gsTopology = gsTopology, weight = ls$weight)

 
## End(Not run)
#compute weighted single sample logFCs
data(metadata_example)
data(logCPM_example)
metadata_example <- dplyr::mutate(metadata_example, treatment = factor(
   treatment, levels = c("Vehicle", "E2+R5020", "R5020")))
ls <- weight_ss_fc(logCPM_example, metadata = metadata_example,
 groupBy = "patient", treatColumn = "treatment", sampleColumn = "sample")
## Not run: 
load(system.file("extdata", "gsTopology.rda", package = "sSNAPPY"))

# simulate the null distribution of scores through sample permutation
permutedScore <- generate_permuted_scores(logCPM_example,
gsTopology = gsTopology, weight = ls$weight)

 
## End(Not run)

gsAnnotation_df: Categorization of KEGG pathways used for community annotation

Description

gsAnnotation_df: Categorization of KEGG pathways used for community annotation

Usage

data(gsAnnotation_df)
data(gsAnnotation_df)

Format

A data.frame with 549 rows and 2 columns containing categorization of 549 KEGG pathways

gs_name: Gene-set name
vategory: Category

Source

https://www.genome.jp/kegg/

logCPM_example: Normalised logCPM of patient-derived explant models obtained from 5 ER-positive primamry breast cancer tumours (GSE80098)

Description

This data was adopted from a study by Singhal H, et al., which was published as Genomic agonism and phenotypic antagonism between estrogen and progesterone receptors in breast cancer in 2016.

Usage

data(logCPM_example)
data(logCPM_example)

Format

A data.frame with 7672 rows and 15 columns

Details

In this study, 12 primary malignant breast tissues (8PR+ and 4 PR-) were developed into patient-derived explants and treated with Vehicle, E2, E2+R5020, or R5020 for 24 or 48 hrs.

Raw data for 48-hr Vehicle-, R5020-treated and E2+R5020-treated samples were retrieved from GEO (GSE80098) and pre-processed into raw count. Filtration was sequentially performed to remove undetectable genes and the filtered counts were normalised using conditional quantile normalisation to offset effects of systematic artefacts, such as gene length and GC contents.

To reduce computing time, we randomly sampled half of the genes after filtration and used their logCPM value as the example data.

Source

https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE80098

metadata_example: Sample metadata for malignant breast cancer tumours PDE from 5 ER+ breast cancer tumour (GSE80098)

Description

metadata_example: Sample metadata for malignant breast cancer tumours PDE from 5 ER+ breast cancer tumour (GSE80098)

Usage

data(metadata_example)
data(metadata_example)

Format

A data.frame with 15 rows and 4 columns

patient: patient N2-3, P4-6
treatment: treatment: Vehicle, R5020 or E2+R5020
PR: progesterone receptor status
sample: sample name, corresponding to column names of the logCPM matrix

Source

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4928895/

Normalise test perturbation scores by permutation result and compute permutation p-values

Description

Normalise test perturbation scores by permutation result and compute permutation p-values

Usage

normalise_by_permu(
  permutedScore,
  testScore,
  pAdj_method = "fdr",
  sortBy = c("gs_name", "sample", "pvalue")
)
normalise_by_permu(
  permutedScore,
  testScore,
  pAdj_method = "fdr",
  sortBy = c("gs_name", "sample", "pvalue")
)

Arguments

`permutedScore`	A list. Output of `generate_permuted_scores`
`testScore`	A `data.frame`. Output of `pathway_pert`
`pAdj_method`	Method for adjusting p-values for multiple comparisons. See `?p.adjust` for methods available. Default to FDR.
`sortBy`	Sort the output by p-value, gene-set name or sample names.

Details

Normalise the test perturbation scores generated by weight_ss_fc() through the permuted perturbation scores derived from the generate_permuted_scores() function. The mean absolute deviation(MAD) and median of perturbation scores for each pathway are firstly derived from the permuted perturbation scores. The test perturbation scores are then converted to robust z-scores using MADs and medians calculated.

Additionally, by assessing the proportion of permuted scores that are more extreme than the test perturbation score within each pathway, the permuted p-value of individual test perturbation scores will be computed.

Value

A data.frame

Examples

## Not run: 
load(system.file("extdata", "gsTopology.rda", package = "sSNAPPY"))
data(metadata_example)
data(logCPM_example)
metadata_example <- dplyr::mutate(metadata_example, treatment = factor(
   treatment, levels = c("Vehicle", "E2+R5020", "R5020")))
ls <- weight_ss_fc(logCPM_example, metadata = metadata_example,
groupBy = "patient", sampleColumn = "sample", treatColumn = "treatment")

# compute raw gene-wise perturbation scores
genePertScore <- raw_gene_pert(ls$weighted_logFC, gsTopology)

# sum gene-wise perturbation scores to derive the pathway-level
# single-sample perturbation scores
pathwayPertScore <- pathway_pert(genePertScore, ls$weighted_logFC)

# simulate the null distribution of scores through sample permutation
permutedScore <- generate_permuted_scores(logCPM_example,
gsTopology = gsTopology, weight = ls$weight)

# normlise the test perturbation scores using the permutation results
normalisedScores <- normalise_by_permu(permutedScore, pathwayPertScore,
sortBy = "pvalue")
 
## End(Not run)
## Not run: 
load(system.file("extdata", "gsTopology.rda", package = "sSNAPPY"))
data(metadata_example)
data(logCPM_example)
metadata_example <- dplyr::mutate(metadata_example, treatment = factor(
   treatment, levels = c("Vehicle", "E2+R5020", "R5020")))
ls <- weight_ss_fc(logCPM_example, metadata = metadata_example,
groupBy = "patient", sampleColumn = "sample", treatColumn = "treatment")

# compute raw gene-wise perturbation scores
genePertScore <- raw_gene_pert(ls$weighted_logFC, gsTopology)

# sum gene-wise perturbation scores to derive the pathway-level
# single-sample perturbation scores
pathwayPertScore <- pathway_pert(genePertScore, ls$weighted_logFC)

# simulate the null distribution of scores through sample permutation
permutedScore <- generate_permuted_scores(logCPM_example,
gsTopology = gsTopology, weight = ls$weight)

# normlise the test perturbation scores using the permutation results
normalisedScores <- normalise_by_permu(permutedScore, pathwayPertScore,
sortBy = "pvalue")
 
## End(Not run)

Compute Single-sample Pathway-level Perturbation Score

Description

Substract ssFC from the raw gene-level perturbation scores within each sample and sum gene-wise raw perturbation scores to derive single-sample perturbation scores for each pathway.

Usage

pathway_pert(genePertScore, weightedFC, drop = TRUE)
pathway_pert(genePertScore, weightedFC, drop = TRUE)

Arguments

`genePertScore`	List of gene-wise raw perturbation score matrices generated using function `raw_gene_pert()`
`weightedFC`	A matrix of weighted ssFC generated using function `weight_ss_fc`
`drop`	logic(1). Whether to drop pathways with all zero scores

Value

A data.frame with 3 columns: score (single-sample pathway-level perturbation score), sample, and gs_name (gene-set name)

References

Tarca AL, Draghici S, Khatri P, Hassan SS, Mittal P, Kim JS, Kim CJ, Kusanovic JP, Romero R. A novel signaling pathway impact analysis. Bioinformatics. 2009 Jan 1;25(1):75-82.

Examples

#compute weighted single sample logFCs
data(metadata_example)
data(logCPM_example)
metadata_example <- dplyr::mutate(metadata_example, treatment = factor(
   treatment, levels = c("Vehicle", "E2+R5020", "R5020")))
ls <- weight_ss_fc(logCPM_example, metadata = metadata_example,
groupBy = "patient", treatColumn = "treatment", sampleColumn = "sample")
# extract all the KEGG pathways
gsTopology <- retrieve_topology(database = "kegg", species = "hsapiens")
# compute raw gene-wise perturbation scores
genePertScore <- raw_gene_pert(ls$weighted_logFC, gsTopology)
# sum gene-wise perturbation scores to derive the pathway-level single-sample
# perturbation scores
pathwayPertScore <- pathway_pert(genePertScore, ls$weighted_logFC)
#compute weighted single sample logFCs
data(metadata_example)
data(logCPM_example)
metadata_example <- dplyr::mutate(metadata_example, treatment = factor(
   treatment, levels = c("Vehicle", "E2+R5020", "R5020")))
ls <- weight_ss_fc(logCPM_example, metadata = metadata_example,
groupBy = "patient", treatColumn = "treatment", sampleColumn = "sample")
# extract all the KEGG pathways
gsTopology <- retrieve_topology(database = "kegg", species = "hsapiens")
# compute raw gene-wise perturbation scores
genePertScore <- raw_gene_pert(ls$weighted_logFC, gsTopology)
# sum gene-wise perturbation scores to derive the pathway-level single-sample
# perturbation scores
pathwayPertScore <- pathway_pert(genePertScore, ls$weighted_logFC)

Visualise the community structure in significantly perturbed gene-set network

Description

Visualise the community structure in significantly perturbed gene-set network

Usage

plot_community(
  normalisedScores,
  gsTopology,
  gsAnnotation = NULL,
  colorBy = "community",
  communityMethod = c("louvain", "walktrap", "spinglass", "leading_eigen",
    "edge_betweenness", "fast_greedy", "label_prop", "leiden"),
  foldGSname = TRUE,
  foldafter = 2,
  labelFun = .rm_prefix,
  layout = c("fr", "dh", "gem", "graphopt", "kk", "lgl", "mds", "sugiyama"),
  markCommunity = "ellipse",
  markAlpha = 0.2,
  color_lg_title = NULL,
  edgeAlpha = 0.8,
  scale_edgeWidth = c(0.5, 3),
  edgeLegend = FALSE,
  scale_nodeSize = c(3, 6),
  nodeShape = 16,
  lb_size = 3,
  lb_color = "black",
  plotIsolated = FALSE,
  ...
)
plot_community(
  normalisedScores,
  gsTopology,
  gsAnnotation = NULL,
  colorBy = "community",
  communityMethod = c("louvain", "walktrap", "spinglass", "leading_eigen",
    "edge_betweenness", "fast_greedy", "label_prop", "leiden"),
  foldGSname = TRUE,
  foldafter = 2,
  labelFun = .rm_prefix,
  layout = c("fr", "dh", "gem", "graphopt", "kk", "lgl", "mds", "sugiyama"),
  markCommunity = "ellipse",
  markAlpha = 0.2,
  color_lg_title = NULL,
  edgeAlpha = 0.8,
  scale_edgeWidth = c(0.5, 3),
  edgeLegend = FALSE,
  scale_nodeSize = c(3, 6),
  nodeShape = 16,
  lb_size = 3,
  lb_color = "black",
  plotIsolated = FALSE,
  ...
)

Arguments

`normalisedScores`	A `data.frame` derived from normalise_by_permu
`gsTopology`	List of pathway topology matrices generated using retrieve_topology
`gsAnnotation`	A `data.frame` containing gene-sets categories for pathway annotation. Must contain the two columns: `c("gs_name", "category")`, where `gs_name` denotes gene-sets names that are matched to names of pathway topology matrices, and `category` records a higher level category for each pathway. If customized annotation is not provided, it will be assumed that the pathways were obtained from the KEGG database and inbuilt KEGG pathway annotation information will be used
`colorBy`	Can be any column with in the `normalisedScores` object, or the additional value "community".
`communityMethod`	A community detection method supported by `igraph`. See details for all methods available.
`foldGSname`	`logical`. Should long gene-set names be folded into two lines
`foldafter`	The number of words after which gene-set names should be folded. Defaults to 2
`labelFun`	function to manipulate or modify gene-set labels. By default, any database will be stripped from the prefix using a regex pattern
`layout`	The layout algorithm to apply. Accepted layouts are `⁠"fr", "dh", "gem", "graphopt", "kk", "lgl", "mds" and "sugiyama"⁠`
`markCommunity`	`character` A `⁠geom_mark_⁠` method supported by `ggforce` to annotate sets of nodes belonging to the same community. Either `⁠NULL, ellipse, circle, hull, rect*⁠`
`markAlpha`	Transparency of annotation areas.
`color_lg_title`	Title for the color legend
`edgeAlpha`	Transparency of edges.
`scale_edgeWidth`	A numerical vector of length 2 to be provided to `ggraph::scale_edge_width_continuous()` for specifying the minimum and maximum edge widths after transformation.
`edgeLegend`	`logical` Should edge weight legend be shown
`scale_nodeSize`	A numerical vector of length 2 to be provided to `ggplot2::scale_size()` for specifying the minimum and maximum node sizes after transformation.
`nodeShape`	The shape to use for nodes
`lb_size`	Size of node text labels
`lb_color`	Color of node text labels
`plotIsolated`	`logical(1)` Should nodes not connected to any other nodes be plotted. Defaults to FALSE
`...`	Used to pass various potting parameters to `⁠ggforce::geom_mark_*()⁠`

Details

A community detection strategy specified by communityMethod will be applied to the pathway-pathway network, and communities will be annotated with the pathway category that had the highest number of occurrence, denoting the main biological processes perturbed in that community.

At the moment, only KEGG pathway categories are provided with the package, so if the provided normalisedScores contains perturbation scores of pathways derived from other databases, annotation of communities will not be performed unless pathway information is provided through the gsAnnotation object. The category information needs to be provided in a data.frame containing gs_name (gene-set names) and category (categorising the given pathways).

Plotting parameters accepted by ⁠geom_mark_*⁠ could be passed to the function to adjust the annotation area or the annotation label. See geom_mark_ellipse for more details.

Value

A ggplot2 object

Examples

load(system.file("extdata", "gsTopology.rda", package = "sSNAPPY"))
load(system.file("extdata", "normalisedScores.rda", package = "sSNAPPY"))
#Subset the first 10 rows of the normalisedScores data.frame as an example
subset <- normalisedScores[1:15,]
subset$status <- ifelse(subset$robustZ > 0, "Activated", "Inhibited")
# Color network plot nodes by the community they were assigned to and mark
# nodes belonging to the same community by ellipses
plot_community(subset, gsTopology, colorBy = "community",layout = "kk",
color_lg_title = "Community")

# Color network plot nodes by pathways' directions of changes and mark nodes
# belonging to the same community by ellipses
plot_community(subset, gsTopology, colorBy = "status",layout = "kk",
color_lg_title = "Direction of pathway perturbation")

# To change the colour and fill of `geom_mark_*` annotation, use any
# `scale_fill_*` and/or `scale_color_*`
# functions supported by `ggplot2`. For example:
p <- plot_community(subset, gsTopology, colorBy = "status",layout = "kk",
markCommunity = "rect",color_lg_title = "Direction of pathway perturbation")
p + ggplot2::scale_color_ordinal() + ggplot2::scale_fill_ordinal()

load(system.file("extdata", "gsTopology.rda", package = "sSNAPPY"))
load(system.file("extdata", "normalisedScores.rda", package = "sSNAPPY"))
#Subset the first 10 rows of the normalisedScores data.frame as an example
subset <- normalisedScores[1:15,]
subset$status <- ifelse(subset$robustZ > 0, "Activated", "Inhibited")
# Color network plot nodes by the community they were assigned to and mark
# nodes belonging to the same community by ellipses
plot_community(subset, gsTopology, colorBy = "community",layout = "kk",
color_lg_title = "Community")

# Color network plot nodes by pathways' directions of changes and mark nodes
# belonging to the same community by ellipses
plot_community(subset, gsTopology, colorBy = "status",layout = "kk",
color_lg_title = "Direction of pathway perturbation")

# To change the colour and fill of `geom_mark_*` annotation, use any
# `scale_fill_*` and/or `scale_color_*`
# functions supported by `ggplot2`. For example:
p <- plot_community(subset, gsTopology, colorBy = "status",layout = "kk",
markCommunity = "rect",color_lg_title = "Direction of pathway perturbation")
p + ggplot2::scale_color_ordinal() + ggplot2::scale_fill_ordinal()

Plot genes' contribution to a pathway's perturbation as a heatmap

Description

Plot individual genes' contributions to the pathway-level perturbation score

Usage

plot_gene_contribution(
  genePertMatr,
  mapEntrezID = NULL,
  topGene = 10,
  filterBy = c("mean", "sd", "max.abs"),
  tieMethod = "min",
  annotation_df = NULL,
  ...
)
plot_gene_contribution(
  genePertMatr,
  mapEntrezID = NULL,
  topGene = 10,
  filterBy = c("mean", "sd", "max.abs"),
  tieMethod = "min",
  annotation_df = NULL,
  ...
)

Arguments

`genePertMatr`	A matrix of gene-wise perturbation scores corresponding to a pathway. An element of the output generated using function `raw_gene_pert()`
`mapEntrezID`	Optional. A `data.frame` matching genes' entrez IDs to another identifier with preferred labels. Must contain the columns: `"entrezid"` and `"mapTo"`
`topGene`	Numeric(1). The number of top genes to plot
`filterBy`	Filter top genes by the mean, variability (sd), maximum value, or maximum absolute values
`tieMethod`	Method for handling ties in ranking (i.e. in values returned by `filterBy`, two or many genes share the same value). Default to "min". See `?rank` for other options.
`annotation_df`	A `data.frame` for annotating heatmap columns. Must contain a "sample" column with sample names matching to the column names of the `genePertMatr`
`...`	Used to pass various potting parameters to `pheatmap::pheatmap()`

Details

The single-sample pathway-level perturbation score for a given pathway is derived from aggregating all the gene-wise perturbation scores of genes in that pathway. This function visualises individual pathway genes' perturbation scores as a heatmap to demonstrate pathway genes' contribution to a pathway perturbation.

Plotting of the heatmap is done through pheatmap::pheatmap() so all plotting parameters accepted by pheatmap::pheatmap() could also be passed to this function.

References

Kolde R (2019). pheatmap: Pretty Heatmaps. R package version 1.0.12, https://CRAN.R-project.org/package=pheatmap.

Examples

#compute weighted single sample logFCs
data(metadata_example)
data(logCPM_example)
metadata_example <- dplyr::mutate(metadata_example, treatment = factor(
   treatment, levels = c("Vehicle", "E2+R5020", "R5020")))
# compute single-sample logFCs for all treated samples
ls <- weight_ss_fc(logCPM_example, metadata = metadata_example,
groupBy = "patient", treatColumn = "treatment", sampleColumn = "sample")

# extract all the KEGG pathways
gsTopology <- retrieve_topology(database = "kegg", species = "hsapiens")

# compute raw gene-wise perturbation scores
genePertScore <- raw_gene_pert(ls$weighted_logFC, gsTopology)
# sum gene-wise perturbation scores to derive the pathway-level single-sample perturbation scores
pathwayPertScore <- pathway_pert(genePertScore, ls$weighted_logFC)

# Genes with top 10 mean absolute gene-wise perturbation scores in the
# Estrogen signaling pathway was visualised.
plot_gene_contribution(genePertScore$`kegg.Estrogen signaling pathway`,
filterBy = "mean", topGene = 10)

# Columns of the heatmap could be annotated by the pathway-level perturbation
# and treatments. Firstly, create a `data.frame` with the two annotation
# attributes and sample names matching the column names of the perturbation
# score matrix.
annotation_df <- dplyr::select(metadata_example, sample, treatment)
pathwayLevel <- dplyr::filter(pathwayPertScore,
gs_name == "kegg.Estrogen signaling pathway")
pathwayLevel$`pathway-level` <- ifelse(
pathwayLevel$score > 0, "Activated", "Inhibited")
annotation_df <- dplyr::left_join(
dplyr::select(pathwayLevel, sample, `pathway-level`),
annotation_df, unmatched = "drop")
# To make the gene labels more informative, also map genes' entrez id
# to chosen identifiers.
load(system.file("extdata", "entrez2name.rda", package = "sSNAPPY"))
plot_gene_contribution(genePertScore$`kegg.Estrogen signaling pathway`,
topGene = 10, filterBy = "mean", annotation_df = annotation_df,
mapEntrezID = entrez2name)

# Plotting parameters accepted by `pheatmap::pheatmap()` could be passed to
# this function to customise the plot. For example, changin the color of annotations
plot_gene_contribution(genePertScore$`kegg.Estrogen signaling pathway`,
topGene = 10, filterBy = "mean", annotation_df = annotation_df,
mapEntrezID = entrez2name, annotation_colors = list(
treatment = c(R5020 = "black", `E2+R5020` = "white"),
`pathway-level` = c(Activated = "darkred", Inhibited = "lightskyblue")))
#compute weighted single sample logFCs
data(metadata_example)
data(logCPM_example)
metadata_example <- dplyr::mutate(metadata_example, treatment = factor(
   treatment, levels = c("Vehicle", "E2+R5020", "R5020")))
# compute single-sample logFCs for all treated samples
ls <- weight_ss_fc(logCPM_example, metadata = metadata_example,
groupBy = "patient", treatColumn = "treatment", sampleColumn = "sample")

# extract all the KEGG pathways
gsTopology <- retrieve_topology(database = "kegg", species = "hsapiens")

# compute raw gene-wise perturbation scores
genePertScore <- raw_gene_pert(ls$weighted_logFC, gsTopology)
# sum gene-wise perturbation scores to derive the pathway-level single-sample perturbation scores
pathwayPertScore <- pathway_pert(genePertScore, ls$weighted_logFC)

# Genes with top 10 mean absolute gene-wise perturbation scores in the
# Estrogen signaling pathway was visualised.
plot_gene_contribution(genePertScore$`kegg.Estrogen signaling pathway`,
filterBy = "mean", topGene = 10)

# Columns of the heatmap could be annotated by the pathway-level perturbation
# and treatments. Firstly, create a `data.frame` with the two annotation
# attributes and sample names matching the column names of the perturbation
# score matrix.
annotation_df <- dplyr::select(metadata_example, sample, treatment)
pathwayLevel <- dplyr::filter(pathwayPertScore,
gs_name == "kegg.Estrogen signaling pathway")
pathwayLevel$`pathway-level` <- ifelse(
pathwayLevel$score > 0, "Activated", "Inhibited")
annotation_df <- dplyr::left_join(
dplyr::select(pathwayLevel, sample, `pathway-level`),
annotation_df, unmatched = "drop")
# To make the gene labels more informative, also map genes' entrez id
# to chosen identifiers.
load(system.file("extdata", "entrez2name.rda", package = "sSNAPPY"))
plot_gene_contribution(genePertScore$`kegg.Estrogen signaling pathway`,
topGene = 10, filterBy = "mean", annotation_df = annotation_df,
mapEntrezID = entrez2name)

# Plotting parameters accepted by `pheatmap::pheatmap()` could be passed to
# this function to customise the plot. For example, changin the color of annotations
plot_gene_contribution(genePertScore$`kegg.Estrogen signaling pathway`,
topGene = 10, filterBy = "mean", annotation_df = annotation_df,
mapEntrezID = entrez2name, annotation_colors = list(
treatment = c(R5020 = "black", `E2+R5020` = "white"),
`pathway-level` = c(Activated = "darkred", Inhibited = "lightskyblue")))

Plot significantly perturbed gene-sets as a network

Description

Plot significantly perturbed gene-sets as a network

Usage

plot_gs_network(
  normalisedScores,
  gsTopology,
  colorBy = NULL,
  foldGSname = TRUE,
  foldafter = 2,
  labelFun = .rm_prefix,
  layout = c("fr", "dh", "gem", "graphopt", "kk", "lgl", "mds", "sugiyama"),
  edgeAlpha = 0.8,
  edgeWidthScale = c(0.5, 3),
  edgeLegend = FALSE,
  nodeSizeScale = c(3, 6),
  nodeShape = 16,
  showLegend = TRUE,
  gsLegTitle = NULL,
  gsNameSize = 3,
  gsNameColor = "black",
  plotIsolated = FALSE,
  maxOverlaps = 10,
  ...
)
plot_gs_network(
  normalisedScores,
  gsTopology,
  colorBy = NULL,
  foldGSname = TRUE,
  foldafter = 2,
  labelFun = .rm_prefix,
  layout = c("fr", "dh", "gem", "graphopt", "kk", "lgl", "mds", "sugiyama"),
  edgeAlpha = 0.8,
  edgeWidthScale = c(0.5, 3),
  edgeLegend = FALSE,
  nodeSizeScale = c(3, 6),
  nodeShape = 16,
  showLegend = TRUE,
  gsLegTitle = NULL,
  gsNameSize = 3,
  gsNameColor = "black",
  plotIsolated = FALSE,
  maxOverlaps = 10,
  ...
)

Arguments

`normalisedScores`	A `data.frame` of pathway perturbation scores derived from the `normalise_by_permu()` function
`gsTopology`	List of pathway topology matrices generated using function `retrieve_topology()`
`colorBy`	Choose to color nodes either by robustZ or pvalue. A column must exist in the `normalisedScores` `data.frame` for the chosen parameter
`foldGSname`	logical(1) Should long gene-set names fold across multiple lines
`foldafter`	The number of words after which gene-set names should be folded. Defaults to 2
`labelFun`	function to manipulate or modify gene-set labels. By default, any database will be stripped from the prefix using a regex pattern
`layout`	The layout algorithm to apply. Accept all layout supported by `igraph`
`edgeAlpha`	numeric(1) Transparency of edges. Default to 0.8
`edgeWidthScale`	A numerical vector of length 2 to be provided to `ggraph::scale_edge_width_continuous()` for specifying the minimum and maximum edge widths after transformation. Defaults to `c(0.5, 3)`
`edgeLegend`	logical(1) Should edge weight legend be shown
`nodeSizeScale`	A numeric vector of length 2 to be provided to `ggplot2::scale_size()` for specifying the minimum and maximum node sizes after transformation. Defaulted to `c(3,6)`
`nodeShape`	Shape of nodes
`showLegend`	logical(1) Should color legend be shown
`gsLegTitle`	Optional title for the color legend
`gsNameSize`	Size of node text labels
`gsNameColor`	Color of node text labels
`plotIsolated`	logical(1) Should nodes not connected to any other nodes be plotted. Defaults to FALSE
`maxOverlaps`	passed to geom_node_text
`...`	Not used

Value

A ggplot2 object

Examples

load(system.file("extdata", "gsTopology.rda", package = "sSNAPPY"))
load(system.file("extdata", "normalisedScores.rda", package = "sSNAPPY"))

# Subset pathways significantly perturbed in sample R5020_N2_48
subset <- dplyr::filter(
  normalisedScores, adjPvalue < 0.05, sample == "R5020_N2_48"
)
subset[["status"]] <- ifelse(subset[["robustZ"]]> 0, "Activated", "Inhibited")


# Color network plot nodes by status
plot_gs_network(subset, gsTopology,
colorBy = "status", layout = "dh",
gsLegTitle = "Direction of pathway Perturbation")

# Color network plot nodes by p-values
plot_gs_network(subset, gsTopology, layout = "dh",
colorBy = "pvalue", gsLegTitle = "P-value")

load(system.file("extdata", "gsTopology.rda", package = "sSNAPPY"))
load(system.file("extdata", "normalisedScores.rda", package = "sSNAPPY"))

# Subset pathways significantly perturbed in sample R5020_N2_48
subset <- dplyr::filter(
  normalisedScores, adjPvalue < 0.05, sample == "R5020_N2_48"
)
subset[["status"]] <- ifelse(subset[["robustZ"]]> 0, "Activated", "Inhibited")


# Color network plot nodes by status
plot_gs_network(subset, gsTopology,
colorBy = "status", layout = "dh",
gsLegTitle = "Direction of pathway Perturbation")

# Color network plot nodes by p-values
plot_gs_network(subset, gsTopology, layout = "dh",
colorBy = "pvalue", gsLegTitle = "P-value")

Plot pathways and genes contained in them as a network

Description

Plot pathways and genes contained in them as a network

Usage

plot_gs2gene(
  normalisedScores,
  gsTopology,
  geneFC = NULL,
  mapEntrezID = NULL,
  colorGsBy = NULL,
  foldGSname = TRUE,
  foldafter = 2,
  labelFun = .rm_prefix,
  filterGeneBy = 2,
  layout = c("fr", "dh", "gem", "graphopt", "kk", "lgl", "mds", "sugiyama"),
  edgeColor = "darkgrey",
  edgeAlpha = 0.8,
  edgeArc = 0.5,
  geneNodeSize = 3,
  geneNodeShape = 17,
  geneNameFace = c("italic", "plain", "bold", "bold-italic"),
  geneNameColor = "grey30",
  geneNameSize = 3,
  labelGene = TRUE,
  gsNodeSize = 2,
  gsNodeShape = 21,
  gsNodeStroke = 0.5,
  gsNodeOutline = "white",
  gsNameSize = 6,
  gsNameColor = "black",
  geneLegTitle = "Mean logFC",
  gsLegTitle = colorGsBy,
  maxOverlaps = 10,
  ...
)
plot_gs2gene(
  normalisedScores,
  gsTopology,
  geneFC = NULL,
  mapEntrezID = NULL,
  colorGsBy = NULL,
  foldGSname = TRUE,
  foldafter = 2,
  labelFun = .rm_prefix,
  filterGeneBy = 2,
  layout = c("fr", "dh", "gem", "graphopt", "kk", "lgl", "mds", "sugiyama"),
  edgeColor = "darkgrey",
  edgeAlpha = 0.8,
  edgeArc = 0.5,
  geneNodeSize = 3,
  geneNodeShape = 17,
  geneNameFace = c("italic", "plain", "bold", "bold-italic"),
  geneNameColor = "grey30",
  geneNameSize = 3,
  labelGene = TRUE,
  gsNodeSize = 2,
  gsNodeShape = 21,
  gsNodeStroke = 0.5,
  gsNodeOutline = "white",
  gsNameSize = 6,
  gsNameColor = "black",
  geneLegTitle = "Mean logFC",
  gsLegTitle = colorGsBy,
  maxOverlaps = 10,
  ...
)

Arguments

`normalisedScores`	A `data.frame` derived from the `normalise_by_permu()` function. Only gene-sets of interest should be included
`gsTopology`	List of pathway topology matrices generated using function `retrieve_topology()`
`geneFC`	An optional named vector of pathways' fold changes
`mapEntrezID`	Optional. A `data.frame` matching genes' entrez IDs to another identifier with preferred labels. Must contain the columns: `"entrezid"` and `"mapTo"`
`colorGsBy`	Column within `normalisedScores` to color gene-set/pathway nodes by
`foldGSname`	`logical`. Should long gene-set names be folded into two lines
`foldafter`	The number of words after which gene-set names should be folded.
`labelFun`	function to manipulate or modify gene-set labels. By default, any database will be stripped from the prefix using a regex pattern
`filterGeneBy`	Filtration cut-off applied to genes' connectivity (ie. how many pathways was a gene involved in).
`layout`	The layout algorithm to apply. Accepts all layout supported by `igraph`.
`edgeColor`, `edgeAlpha`	Color and transparency of edges
`edgeArc`	The bend of edges. 1 approximates a semi-circle whilst 0 will give a straight line.
`geneNodeSize`, `geneNodeShape`	Size and shape for gene nodes
`geneNameSize`, `geneNameColor`, `geneNameFace`	Size, color and fontface to use for gene labels
`labelGene`	`logical(1)` Should the gene names be included
`gsNodeSize`	Size for gene-set/pathway nodes
`gsNodeShape`	Shape for gene-set/pathway nodes. Should be a shape with a fill parameter, such as 21:25
`gsNodeStroke`, `gsNodeOutline`	Border thickness and color for gene-set/pathway nodes
`gsNameSize`, `gsNameColor`	Size and color of gene-set/pathway labels
`geneLegTitle`	`character(1)`. Legend title for gene nodes
`gsLegTitle`	`character(1)` Legend title for gene-set/pathway nodes
`maxOverlaps`	passed to geom_node_text
`...`	Not used

Details

Taking the perturbation scores of a list of gene-sets derived from normalise_by_permu() as input, this function matches gene-sets to their associated genes by utilizing information from pathway topology matrices.

If providing logFC values as a named vector, the names must be entrezgene IDs in the format of "ENTREZID:XXXX" for compatibility with the values returned by retrieve_topology(). If not providing this vector, only genes associated with two or more pathways will be added to the plot, however, it should be noted that if omitting this vector, network plots can easily become unmanageable.

Users can also choose to provide a mapEntrezID data.frame to match entrezgene IDs to their chosen identifiers. The data.frame should contain the columns: "entrezid" and "mapTo".

If geneFC is provided, gene nodes will be colored by values provided, otherwise all gene nodes will drawn in grey.

Since some gene-sets could can contain hundreds of genes, it is not recommended to plot all genes. If mapEntrezID data.frame is provided, only genes included in that data.frame will be used in the plot.

It is strongly recommended to filter genes using some criteria, such as those with the largest magnitude of change. If all pathway genes are desired, please consider setting labelGene to FALSE to remove gene names.

Value

A ggplot2 object

Examples


load(system.file("extdata", "gsTopology.rda", package = "sSNAPPY"))
load(system.file("extdata", "normalisedScores.rda", package = "sSNAPPY"))

# Subset pathways significantly perturbed in sample R5020_N2_48
subset <- dplyr::filter(normalisedScores, adjPvalue < 0.05, sample == "R5020_N2_48")
subset$response <- ifelse(subset$robustZ > 0, "Activated", "Inhibited")

# Color gene-sets nodes by robust z-scores.
plot_gs2gene(
  subset, gsTopology, colorGsBy = "robustZ", labelGene = FALSE, geneNodeSize = 1,
  gsNodeSize = 4
) + scale_fill_gradient2()
# When fold-changes are not provided, gene nodes are colored grey.

# To color genes by their direction of change, firstly compute single-sample logFC
data(logCPM_example)
data(metadata_example)
metadata_example <- dplyr::mutate(metadata_example, treatment = factor(
   treatment, levels = c("Vehicle", "E2+R5020", "R5020")))
ls <- weight_ss_fc(
  logCPM_example, metadata = metadata_example,
  groupBy = "patient", treatColumn = "treatment",
  sampleColumn = "sample"
)
# Provide fold-changes of sample R5020_N2_48 as a named vector
plot_gs2gene(
  subset, gsTopology, geneFC = ls$weighted_logFC[,"R5020_N2_48"],
  colorGsBy = "response", labelGene = FALSE
) + scale_colour_gradient2()

# By default, the function only include genes involved in at least 2 pathways,
# which can be overwritten by the `filterGeneBy` parameter. But there are still
# a large number of genes, making the plot cumbersome. Instead, only include
# fold-changes of genes within the top 500 absolute values for fold-change
top500 <- rank(1/abs(ls$weighted_logFC[,"R5020_N2_48"])) <= 500
fcByDir <- ifelse(ls$weighted_logFC[top500,"R5020_N2_48"] > 0, "Up-Regulated", "Down-Regulated")
plot_gs2gene(subset, gsTopology, geneFC = fcByDir, colorGsBy = "response") +
  scale_fill_manual(values = c("darkred", "lightskyblue")) +
  scale_colour_manual(values = c("red", "blue"))

# To make the gene labels more informative, map genes' entrez id to chosen identifiers.
load(system.file("extdata", "entrez2name.rda", package = "sSNAPPY"))
plot_gs2gene(
  subset, gsTopology, geneFC = fcByDir, mapEntrezID = entrez2name,
  colorGsBy = "response", gsNodeSize = 4
) +
  scale_fill_manual(values = c("darkred", "lightskyblue"), name = "Pathway") +
  scale_colour_manual(values = c("blue", "red"), name = "Gene\nDirection")

load(system.file("extdata", "gsTopology.rda", package = "sSNAPPY"))
load(system.file("extdata", "normalisedScores.rda", package = "sSNAPPY"))

# Subset pathways significantly perturbed in sample R5020_N2_48
subset <- dplyr::filter(normalisedScores, adjPvalue < 0.05, sample == "R5020_N2_48")
subset$response <- ifelse(subset$robustZ > 0, "Activated", "Inhibited")

# Color gene-sets nodes by robust z-scores.
plot_gs2gene(
  subset, gsTopology, colorGsBy = "robustZ", labelGene = FALSE, geneNodeSize = 1,
  gsNodeSize = 4
) + scale_fill_gradient2()
# When fold-changes are not provided, gene nodes are colored grey.

# To color genes by their direction of change, firstly compute single-sample logFC
data(logCPM_example)
data(metadata_example)
metadata_example <- dplyr::mutate(metadata_example, treatment = factor(
   treatment, levels = c("Vehicle", "E2+R5020", "R5020")))
ls <- weight_ss_fc(
  logCPM_example, metadata = metadata_example,
  groupBy = "patient", treatColumn = "treatment",
  sampleColumn = "sample"
)
# Provide fold-changes of sample R5020_N2_48 as a named vector
plot_gs2gene(
  subset, gsTopology, geneFC = ls$weighted_logFC[,"R5020_N2_48"],
  colorGsBy = "response", labelGene = FALSE
) + scale_colour_gradient2()

# By default, the function only include genes involved in at least 2 pathways,
# which can be overwritten by the `filterGeneBy` parameter. But there are still
# a large number of genes, making the plot cumbersome. Instead, only include
# fold-changes of genes within the top 500 absolute values for fold-change
top500 <- rank(1/abs(ls$weighted_logFC[,"R5020_N2_48"])) <= 500
fcByDir <- ifelse(ls$weighted_logFC[top500,"R5020_N2_48"] > 0, "Up-Regulated", "Down-Regulated")
plot_gs2gene(subset, gsTopology, geneFC = fcByDir, colorGsBy = "response") +
  scale_fill_manual(values = c("darkred", "lightskyblue")) +
  scale_colour_manual(values = c("red", "blue"))

# To make the gene labels more informative, map genes' entrez id to chosen identifiers.
load(system.file("extdata", "entrez2name.rda", package = "sSNAPPY"))
plot_gs2gene(
  subset, gsTopology, geneFC = fcByDir, mapEntrezID = entrez2name,
  colorGsBy = "response", gsNodeSize = 4
) +
  scale_fill_manual(values = c("darkred", "lightskyblue"), name = "Pathway") +
  scale_colour_manual(values = c("blue", "red"), name = "Gene\nDirection")

Compute Gene-wise Perturbation Score

Description

Propagate weighted single sample logFCs down the pathway topologies to compute gene-wise perturbation score per gene per sample per pathway

Usage

raw_gene_pert(weightedFC, gsTopology)
raw_gene_pert(weightedFC, gsTopology)

Arguments

`weightedFC`	A matrix of weighted single sample logFCs derived from function `weight_ss_fc()`
`gsTopology`	List of pathway topology matrices generated using function `retrieve_topology()`

Details

This function use the algorithm adopted from SPIA (see citation) to integrate genes' changes in expression and gene-gene interaction to compute gene-wise perturbation score per gene per sample per pathway. The rownames of the weighted single sample logFC matrix and the pathway topology matrices must use the same type of gene identifier (ie. entrez ID).

Pathways with zero perturbation scores across all genes and samples will be dropped from the output.

Value

A list where each element is a matrix corresponding to a pathway. Each column of an element corresponds to a sample, and each row corresponds to a pathway gene.

References

Tarca AL, Draghici S, Khatri P, Hassan SS, Mittal P, Kim JS, Kim CJ, Kusanovic JP, Romero R. A novel signaling pathway impact analysis. Bioinformatics. 2009 Jan 1;25(1):75-82.

Examples

#compute weighted single sample logFCs
data(metadata_example)
data(logCPM_example)
metadata_example <- dplyr::mutate(metadata_example, treatment = factor(
   treatment, levels = c("Vehicle", "E2+R5020", "R5020")))
ls <- weight_ss_fc(logCPM_example, metadata = metadata_example,
groupBy = "patient", treatColumn = "treatment", sampleColumn = "sample")
# extract all the KEGG pathways
gsTopology <- retrieve_topology(database = "kegg", species = "hsapiens")
# compute raw gene-wise perturbation scores
genePertScore <- raw_gene_pert(ls$weighted_logFC, gsTopology)
#compute weighted single sample logFCs
data(metadata_example)
data(logCPM_example)
metadata_example <- dplyr::mutate(metadata_example, treatment = factor(
   treatment, levels = c("Vehicle", "E2+R5020", "R5020")))
ls <- weight_ss_fc(logCPM_example, metadata = metadata_example,
groupBy = "patient", treatColumn = "treatment", sampleColumn = "sample")
# extract all the KEGG pathways
gsTopology <- retrieve_topology(database = "kegg", species = "hsapiens")
# compute raw gene-wise perturbation scores
genePertScore <- raw_gene_pert(ls$weighted_logFC, gsTopology)

Retrieve pathway topology as weighted adjacency matrices

Description

Retrieve pathway topology matrices and convert them to normalized weighted directed adjacency matrices describing gene signaling networks.

Usage

retrieve_topology(
  database = c("kegg", "wikipathways", "reactome"),
  species = c("hsapiens", "athaliana", "btaurus", "celegans", "cfamiliaris",
    "dmelanogaster", "drerio", "ecoli", "ggallus", "mmusculus", "rnorvegicus",
    "scerevisiae", "sscrofa", "xlaevis"),
  keyword = NULL,
  beta = NULL
)
retrieve_topology(
  database = c("kegg", "wikipathways", "reactome"),
  species = c("hsapiens", "athaliana", "btaurus", "celegans", "cfamiliaris",
    "dmelanogaster", "drerio", "ecoli", "ggallus", "mmusculus", "rnorvegicus",
    "scerevisiae", "sscrofa", "xlaevis"),
  keyword = NULL,
  beta = NULL
)

Arguments

`database`	A character vector of supported databases.
`species`	One of the supported species. Currently support `c("hsapiens", "athaliana", "btaurus", "celegans", "cfamiliaris", "dmelanogaster", "drerio", "ecoli", "ggallus", "mmusculus","rnorvegicus", "scerevisiae", "sscrofa", "xlaevis")`
`keyword`	Optional. Areas of interests as a character vector.
`beta`	Optional. A named numeric vector of weights to be assigned to each type of gene/protein relation type.See details for more information.

Details

This function takes pathway topology information retrieved using graphite and converts these to normalized weighted directed adjacency matrices describing the gene signaling network, which can be used to compute gene-wise and pathway-level perturbation score through the scoring algorithm derived from the SPIA algorithm. See cited document for more details.

The database parameter may specify multiple databases but only one species can be provided to the species parameter.

Users can provide areas of interests as keywords, which will be matched to pathway names from chosen databases to subset pathways. Cases will be ignored in string matching.

The beta parameter specifies weights to be assigned to each type of gene-gene interaction. It should be a named numeric vector of length 25, whose names must be: c("activation","compound","binding/association", "expression","inhibition","activation_phosphorylation","phosphorylation", "indirect","inhibition_phosphorylation","dephosphorylation_inhibition", "dissociation","dephosphorylation","activation_dephosphorylation", "state","activation_indirect","inhibition_ubiquination","ubiquination", "expression_indirect","indirect_inhibition","repression", "binding/association_phosphorylation","dissociation_phosphorylation", "indirect_phosphorylation").

If unspecified, beta will be set as an integer vector with: a) values of 1 for interactions which match 'expression' or 'activation'; b) values of -1 for interactions which match 'repression' or 'inhibition'; and c) 0 elsewhere.

The retrieved topology matrices will be processed as decribed by SPIA to: 1) scale gene-gene interactions by the number of downstream genes and 2) subtract an identity matrix of the same size from the topology matrix.

The converted weighted adjacent matrices will be stored in a list. We recommend users to store the returned list as a file so this step only needs to be performed once for each database.

Value

A list where each element is a weighted directed adjacency matrix corresponding to a pathway

References

Tarca AL, Draghici S, Khatri P, Hassan SS, Mittal P, Kim JS, Kim CJ, Kusanovic JP, Romero R. A novel signaling pathway impact analysis. Bioinformatics. 2009 Jan 1;25(1):75-82. Sales, G., Calura, E., Cavalieri, D. et al. graphite - a Bioconductor package to convert pathway topology to gene network. BMC Bioinformatics 13, 20 (2012).

Examples


# retrieve pathway topology matrices of all KEGG pathway
gsTopology <- retrieve_topology(database = "kegg", species = "hsapiens")

# If only interested in selected pathways, specify the areas of interest as
# keywords
gsTopology <- retrieve_topology(database = "kegg",
keyword = c("metabolism", "signaling"), species = "hsapiens")

# retrieve pathway topology matrices of all KEGG pathway
gsTopology <- retrieve_topology(database = "kegg", species = "hsapiens")

# If only interested in selected pathways, specify the areas of interest as
# keywords
gsTopology <- retrieve_topology(database = "kegg",
keyword = c("metabolism", "signaling"), species = "hsapiens")

sSNAPPY: A package for testing directional single sample pathway perturbation

Description

A package for testing directional single sample pathway perturbation

Compute weighted single sample LogFC from normalised logCPM

Description

Compute weighted single sample logFC for each treated samples using normalized logCPM values. Fit a lowess curve on variances ~ mean of logCPM values, and use it to predict gene-wise weights. The weighted single sample logFC are ready to be used for computing perturbation scores.

Usage

weight_ss_fc(
  expreMatrix,
  metadata = NULL,
  sampleColumn,
  treatColumn,
  groupBy,
  prefix = "ENTREZID:"
)

## S4 method for signature 'matrix'
weight_ss_fc(
  expreMatrix,
  metadata = NULL,
  sampleColumn,
  treatColumn,
  groupBy,
  prefix = "ENTREZID:"
)

## S4 method for signature 'data.frame'
weight_ss_fc(
  expreMatrix,
  metadata = NULL,
  sampleColumn,
  treatColumn,
  groupBy,
  prefix = "ENTREZID:"
)

## S4 method for signature 'DGEList'
weight_ss_fc(
  expreMatrix,
  metadata = NULL,
  sampleColumn,
  treatColumn,
  groupBy,
  prefix = "ENTREZID:"
)

## S4 method for signature 'SummarizedExperiment'
weight_ss_fc(
  expreMatrix,
  metadata = NULL,
  sampleColumn,
  treatColumn,
  groupBy,
  prefix = "ENTREZID:"
)
weight_ss_fc(
  expreMatrix,
  metadata = NULL,
  sampleColumn,
  treatColumn,
  groupBy,
  prefix = "ENTREZID:"
)

## S4 method for signature 'matrix'
weight_ss_fc(
  expreMatrix,
  metadata = NULL,
  sampleColumn,
  treatColumn,
  groupBy,
  prefix = "ENTREZID:"
)

## S4 method for signature 'data.frame'
weight_ss_fc(
  expreMatrix,
  metadata = NULL,
  sampleColumn,
  treatColumn,
  groupBy,
  prefix = "ENTREZID:"
)

## S4 method for signature 'DGEList'
weight_ss_fc(
  expreMatrix,
  metadata = NULL,
  sampleColumn,
  treatColumn,
  groupBy,
  prefix = "ENTREZID:"
)

## S4 method for signature 'SummarizedExperiment'
weight_ss_fc(
  expreMatrix,
  metadata = NULL,
  sampleColumn,
  treatColumn,
  groupBy,
  prefix = "ENTREZID:"
)

Arguments

`expreMatrix`	`matrix` or `data.frame` of logCPM, or `DGEList`/ `SummarizedExperiment` storing gene expression counts and sample metadata. Feature names need to be in entrez IDs, and column names need to be sample names
`metadata`	Sample metadata `data.frame` as described in the details section.
`sampleColumn`	Name of the column in the `metadata` containing column names of the `expreMatrix`
`treatColumn`	Name of the column in the `metadata` containing treatment information. The column must be a factor with the reference level set to be the control treatment.
`groupBy`	Name of the column in the `metadata` containing information for how samples are matched in pairs (eg. patient).
`prefix`	Character(1). Prefix to be add to the `rownames` of the `expreMatrix`. Default to `⁠ENTREZID:⁠` to align with the format that topology matrices will be retrieved in, but can be adjusted according to users' needs.

Details

This function computes weighted single-sample logFC (ssFC) from normalised logCPM values, used for computing single-sample perturbation scores.

Genes with low expression tend to have larger variances, which will introduce biases to the ssFC computed. Therefore, a lowess curve will be fitted to estimate the relationship between the variances and the mean of logCPM, and the relationship will be used to estimate the variance of each gene-wise mean logCPM value. Gene-wise weights, which are defined to be the inverse of predicted variances that are scaled to have a sum of 1, will then be multiplied to ssFC to down-weight genes with low counts.

It is assumed that the genes with extremely low counts have been removed and the count matrix has been normalised prior to the logCPM matrix was derived. Row names of the matrix must be in genes' entrez IDs to align with the fomrat of pathway topology matrices.

If a S4 object of DGEList or SummarizedExperiment is provided as input to expreMatrix, the gene expression matrix will be extracted from it and converted to a logCPM matrix. Sample metadata will also be extracted from the same S4 object unless otherwise specified.

Provided sample metadata should have the same number of rows as the number of columns in the logCPM matrix and must contain the a column for treatment, one for sample names and a column for how samples should be matched into pairs.

The treatment column in the sample metadata must be a factor with the reference level set to be the control condition.

Value

A list with two elements: $weight gene-wise weights; $weighted_logFC weighted single sample logFC matrix

Examples

# Inspect metadata data frame to make sure it has treatment, sample
# and patient columns
data(metadata_example)
data(logCPM_example)
# Set the treatment column to be a factor where the reference is the control
#treatment
metadata_example <- dplyr::mutate(metadata_example, treatment = factor(
   treatment, levels = c("Vehicle", "E2+R5020", "R5020")))
ls <- weight_ss_fc(logCPM_example, metadata = metadata_example,
 sampleColumn = "sample", groupBy = "patient", treatColumn = "treatment")
# Inspect metadata data frame to make sure it has treatment, sample
# and patient columns
data(metadata_example)
data(logCPM_example)
# Set the treatment column to be a factor where the reference is the control
#treatment
metadata_example <- dplyr::mutate(metadata_example, treatment = factor(
   treatment, levels = c("Vehicle", "E2+R5020", "R5020")))
ls <- weight_ss_fc(logCPM_example, metadata = metadata_example,
 sampleColumn = "sample", groupBy = "patient", treatColumn = "treatment")

Package 'sSNAPPY'

Help Index

Permute sample labels to simulate null distribution of perturbation scores

Description

Usage

Arguments

Details

Value

Examples

gsAnnotation_df: Categorization of KEGG pathways used for community annotation

Description

Usage

Format

Source

logCPM_example: Normalised logCPM of patient-derived explant models obtained from 5 ER-positive primamry breast cancer tumours (GSE80098)

Description

Usage

Format

Details

Source

metadata_example: Sample metadata for malignant breast cancer tumours PDE from 5 ER+ breast cancer tumour (GSE80098)

Description

Usage

Format

Source

Normalise test perturbation scores by permutation result and compute permutation p-values

Description

Usage

Arguments

Details

Value

Examples

Compute Single-sample Pathway-level Perturbation Score

Description

Usage

Arguments

Value

References

Examples

Visualise the community structure in significantly perturbed gene-set network

Description

Usage

Arguments

Details

Value

Examples

Plot genes' contribution to a pathway's perturbation as a heatmap

Description

Usage

Arguments

Details

References

Examples

Plot significantly perturbed gene-sets as a network

Description

Usage

Arguments

Value

Examples

Plot pathways and genes contained in them as a network

Description

Usage

Arguments

Details

Value

Examples

Compute Gene-wise Perturbation Score

Description

Usage

Arguments

Details

Value

References

Examples

Retrieve pathway topology as weighted adjacency matrices

Description

Usage

Arguments

Details

Value