Package 'magrene' reference manual

Title:	Motif Analysis In Gene Regulatory Networks
Description:	magrene allows the identification and analysis of graph motifs in (duplicated) gene regulatory networks (GRNs), including lambda, V, PPI V, delta, and bifan motifs. GRNs can be tested for motif enrichment by comparing motif frequencies to a null distribution generated from degree-preserving simulated GRNs. Motif frequencies can be analyzed in the context of gene duplications to explore the impact of small-scale and whole-genome duplications on gene regulatory networks. Finally, users can calculate interaction similarity for gene pairs based on the Sorensen-Dice similarity index.
Authors:	Fabrício Almeida-Silva [aut, cre], Yves Van de Peer [aut]
Maintainer:	Fabrício Almeida-Silva
License:	GPL-3
Version:	1.7.0
Built:	2024-06-30 03:12:21 UTC
Source:	https://github.com/bioc/magrene

Calculate Z-score for motif frequencies

Description

Calculate Z-score for motif frequencies

Usage

calculate_Z(observed = NULL, nulls = NULL)

Arguments

`observed`	A list of observed motif frequencies for each motif type. List elements must be named 'lambda', 'bifan', 'V', 'PPI_V', and 'delta' (not necessarily in that order).
`nulls`	A list of null distributions for each motif type as returned by `generate_nulls`.

Value

A numeric vector with the Z-score for each motif type.

Examples

# Simulating it for test purposes
null <- rnorm(1000, mean = 5, sd = 1)
nulls <- list(
    lambda = null, V = null, PPI_V = null, delta = null, bifan = null
)
observed <- list(lambda = 7, bifan = 13, delta = 9, V = 5, PPI_V = 10)
z <- calculate_Z(observed, nulls)
# Check for motif enrichment (Z > 5)
z[which(z > 5)]
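
A Z-score expresses how many standard deviations an observed count lies above the mean of its null distribution. The sketch below assumes the usual definition, Z = (observed - mean(null)) / sd(null), applied to each motif type separately; it is an illustration in base R, not the source of calculate_Z, and z_score_sketch is a hypothetical name.

```r
# Hypothetical helper: Z-score of each observed motif count against its null.
# Assumes Z = (observed - mean(null)) / sd(null); calculate_Z may differ.
z_score_sketch <- function(observed, nulls) {
  motifs <- names(nulls)
  # Look up observed values by name, so list order does not matter
  vapply(motifs, function(m) {
    (observed[[m]] - mean(nulls[[m]])) / sd(nulls[[m]])
  }, numeric(1))
}

toy_null <- c(4, 5, 6)  # mean 5, standard deviation 1
nulls <- list(lambda = toy_null, bifan = toy_null)
observed <- list(bifan = 13, lambda = 7)
z <- z_score_sketch(observed, nulls)
z[z > 5]  # only bifan (Z = 8) passes the threshold
```

Here lambda gets Z = 2 and bifan gets Z = 8. A null distribution with zero variance makes sd() return 0, so the ratio becomes Inf or NaN; real data may need a guard for that case.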

Find bifan motifs

Description

Find bifan motifs

Usage

find_bifan(
  edgelist = NULL,
  paralogs = NULL,
  lambda_vec = NULL,
  count_only = FALSE
)

Arguments

`edgelist`	A 2-column data frame with regulators in column 1 and targets in column 2. It can be omitted if lambda motifs are passed to lambda_vec (recommended).
`paralogs`	A 2-column data frame with gene IDs for each paralog in the paralog pair.
`lambda_vec`	A character vector of lambda motifs as returned by `find_lambda()`. If this is NULL, this function first finds lambda motifs from edgelist and paralogs. Passing previously identified lambda motifs makes this function much faster.
`count_only`	Logical indicating whether the function should return only motif counts as a numeric scalar. If FALSE, it will return a character vector of motifs. Default: FALSE.

Value

A character vector with bifan motifs represented in the format regulator1, regulator2->target1, target2.

Examples

data(gma_grn)
data(gma_paralogs)
edgelist <- gma_grn[1:50000, 1:2]
paralogs <- gma_paralogs[gma_paralogs$type == "WGD", 1:2]
paralogs <- rbind(
    paralogs,
    data.frame(duplicate1 = "Glyma.01G177200",
               duplicate2 = "Glyma.08G116700")
)
lambda_vec <- find_lambda(edgelist, paralogs)
bifan <- find_bifan(paralogs = paralogs, lambda_vec = lambda_vec)
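
Passing lambda_vec avoids recomputing lambda motifs. Assuming a bifan motif consists of two paralogous regulators that both appear in lambda motifs with the same target pair, bifan motifs can be assembled by grouping lambda motifs (target1<-regulator->target2) by target pair. A base R sketch under that assumption; bifan_sketch and is_paralog_pair are hypothetical helpers, not part of magrene.

```r
# Hypothetical helper: TRUE if genes a and b form a paralog pair (either order)
is_paralog_pair <- function(a, b, paralogs) {
  any((paralogs[[1]] == a & paralogs[[2]] == b) |
      (paralogs[[1]] == b & paralogs[[2]] == a))
}

# Hypothetical sketch: build bifan motifs from lambda motifs
# ("target1<-regulator->target2") that share the same target pair.
bifan_sketch <- function(lambda_vec, paralogs) {
  if (length(lambda_vec) == 0) return(character(0))
  parts <- do.call(rbind, strsplit(lambda_vec, "<-|->"))
  # Sort each target pair so that "T1<-R->T2" and "T2<-R->T1" match
  pair <- paste(pmin(parts[, 1], parts[, 3]), pmax(parts[, 1], parts[, 3]),
                sep = ", ")
  reg <- parts[, 2]
  motifs <- character(0)
  for (p in unique(pair)) {
    regs <- sort(unique(reg[pair == p]))
    if (length(regs) < 2) next
    combs <- combn(regs, 2)
    for (k in seq_len(ncol(combs))) {
      if (is_paralog_pair(combs[1, k], combs[2, k], paralogs)) {
        # Format: regulator1, regulator2->target1, target2
        motifs <- c(motifs, paste0(combs[1, k], ", ", combs[2, k], "->", p))
      }
    }
  }
  motifs
}

lambda_vec <- c("T1<-R1->T2", "T2<-R2->T1", "T1<-R3->T2")
paralogs <- data.frame(duplicate1 = c("T1", "R1"), duplicate2 = c("T2", "R2"),
                       stringsAsFactors = FALSE)
bifan <- bifan_sketch(lambda_vec, paralogs)
bifan  # "R1, R2->T1, T2"
```

R3 also regulates T1 and T2, but it is not paralogous to R1 or R2, so it does not form a bifan motif in this toy example.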

Find delta motifs

Description

Find delta motifs

Usage

find_delta(
  edgelist = NULL,
  paralogs = NULL,
  edgelist_ppi = NULL,
  lambda_vec = NULL,
  count_only = FALSE
)

Arguments

`edgelist`	A 2-column data frame with regulators in column 1 and targets in column 2. It can be omitted if lambda motifs are passed to lambda_vec (recommended).
`paralogs`	A 2-column data frame with gene IDs for each paralog in the paralog pair. It can be omitted if lambda motifs are passed to lambda_vec (recommended).
`edgelist_ppi`	A 2-column data frame with IDs of genes that encode each protein in the interacting pair.
`lambda_vec`	A character vector of lambda motifs as returned by `find_lambda()`. If this is NULL, this function first finds lambda motifs from edgelist and paralogs. Passing previously identified lambda motifs makes this function much faster.
`count_only`	Logical indicating whether the function should return only motif counts as a numeric scalar. If FALSE, it will return a character vector of motifs. Default: FALSE.

Value

A character vector with delta motifs represented in the format target1<-regulator->target2.

Examples

data(gma_grn)
data(gma_paralogs)
data(gma_ppi)
edgelist <- gma_grn[1:10000, 1:2] # reducing for test purposes
paralogs <- gma_paralogs[gma_paralogs$type == "WGD", 1:2]
edgelist_ppi <- gma_ppi
lambda_vec <- find_lambda(edgelist, paralogs)
motifs <- find_delta(edgelist_ppi = edgelist_ppi, lambda_vec = lambda_vec)
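
Since find_delta can start from lambda_vec, one natural reading is that a delta motif is a lambda motif whose two targets also interact in the PPI network. That is an assumption about the motif definition, not a description of magrene's implementation. A base R sketch under that assumption (delta_sketch is a hypothetical helper) keeps the lambda strings whose targets are linked by a PPI edge in either orientation:

```r
# Hypothetical sketch: keep lambda motifs ("target1<-regulator->target2")
# whose two targets are connected in the PPI edge list (undirected).
delta_sketch <- function(lambda_vec, edgelist_ppi) {
  if (length(lambda_vec) == 0) return(character(0))
  parts <- do.call(rbind, strsplit(lambda_vec, "<-|->"))
  # Encode each PPI edge in both orientations so direction does not matter
  ppi_keys <- c(paste(edgelist_ppi[[1]], edgelist_ppi[[2]], sep = "|"),
                paste(edgelist_ppi[[2]], edgelist_ppi[[1]], sep = "|"))
  keep <- paste(parts[, 1], parts[, 3], sep = "|") %in% ppi_keys
  lambda_vec[keep]
}

lambda_vec <- c("T1<-R1->T2", "T3<-R1->T4")
edgelist_ppi <- data.frame(protein1 = "T2", protein2 = "T1",
                           stringsAsFactors = FALSE)
delta <- delta_sketch(lambda_vec, edgelist_ppi)
delta  # "T1<-R1->T2"
```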

Find lambda motifs

Description

Find lambda motifs

Usage

find_lambda(edgelist = NULL, paralogs = NULL, count_only = FALSE)

Arguments

`edgelist`	A 2-column data frame with regulators in column 1 and targets in column 2.
`paralogs`	A 2-column data frame with gene IDs for each paralog in the paralog pair.
`count_only`	Logical indicating whether the function should return only motif counts as a numeric scalar. If FALSE, it will return a character vector of motifs. Default: FALSE.

Value

A character vector with lambda motifs represented in the format target1<-regulator->target2.

Examples

data(gma_grn)
data(gma_paralogs)
edgelist <- gma_grn[500:1000, 1:2] # reducing for test purposes
paralogs <- gma_paralogs[gma_paralogs$type == "WGD", 1:2]
motifs <- find_lambda(edgelist, paralogs)
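
The return format target1<-regulator->target2 describes one regulator with edges to two targets. Assuming the two targets must be members of a paralog pair, the search can be sketched in base R as follows (lambda_sketch is a hypothetical helper; magrene's own implementation may be organized differently):

```r
# Hypothetical sketch: lambda motifs = one regulator with edges to both
# members of a paralog pair, reported as "target1<-regulator->target2".
lambda_sketch <- function(edgelist, paralogs) {
  motifs <- character(0)
  for (i in seq_len(nrow(paralogs))) {
    p1 <- paralogs[[1]][i]
    p2 <- paralogs[[2]][i]
    # Regulators (column 1) that have each paralog as a target (column 2)
    regs1 <- edgelist[[1]][edgelist[[2]] == p1]
    regs2 <- edgelist[[1]][edgelist[[2]] == p2]
    shared <- intersect(regs1, regs2)
    if (length(shared) > 0) {
      motifs <- c(motifs, paste0(p1, "<-", shared, "->", p2))
    }
  }
  motifs
}

edgelist <- data.frame(
  regulator = c("R1", "R1", "R2", "R2"),
  target    = c("T1", "T2", "T1", "T3"),
  stringsAsFactors = FALSE
)
paralogs <- data.frame(duplicate1 = "T1", duplicate2 = "T2",
                       stringsAsFactors = FALSE)
lambda <- lambda_sketch(edgelist, paralogs)
lambda  # "T1<-R1->T2"
```

R2 regulates T1 but not T2, so only R1 yields a lambda motif here.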

Find V motifs in protein-protein interactions

Description

Find V motifs in protein-protein interactions

Usage

find_ppi_v(edgelist = NULL, paralogs = NULL, count_only = FALSE)

Arguments

`edgelist`	A 2-column data frame with protein 1 in column 1 and protein 2 in column 2.
`paralogs`	A 2-column data frame with gene IDs for each paralog in the paralog pair.
`count_only`	Logical indicating whether the function should return only motif counts as a numeric scalar. If FALSE, it will return a character vector of motifs. Default: FALSE.

Details

This function identifies paralogous gene pairs that share an interaction partner. Set count_only = TRUE to return only the number of such motifs.

Value

A character vector with V motifs represented in the format paralog1-partner-paralog2.

Examples

data(gma_ppi)
data(gma_paralogs)
edgelist <- gma_ppi
paralogs <- gma_paralogs[gma_paralogs$type == "WGD", 1:2]
motifs <- find_ppi_v(edgelist, paralogs)
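
A PPI V motif is a paralog pair that shares an interaction partner. PPI edges have no direction, so partners have to be collected from both columns of edgelist. A base R sketch of that search (ppi_v_sketch is a hypothetical helper, not magrene's implementation):

```r
# Hypothetical sketch: paralog pairs that share an interaction partner,
# reported as "paralog1-partner-paralog2".
ppi_v_sketch <- function(edgelist, paralogs) {
  # PPI edges are undirected, so look in both columns
  partners <- function(p) {
    unique(c(edgelist[[2]][edgelist[[1]] == p],
             edgelist[[1]][edgelist[[2]] == p]))
  }
  motifs <- character(0)
  for (i in seq_len(nrow(paralogs))) {
    p1 <- paralogs[[1]][i]
    p2 <- paralogs[[2]][i]
    # Shared partners, excluding the paralogs themselves
    shared <- setdiff(intersect(partners(p1), partners(p2)), c(p1, p2))
    if (length(shared) > 0) {
      motifs <- c(motifs, paste(p1, shared, p2, sep = "-"))
    }
  }
  motifs
}

edgelist <- data.frame(protein1 = c("P1", "X", "P2"),
                       protein2 = c("X", "P2", "Y"),
                       stringsAsFactors = FALSE)
paralogs <- data.frame(duplicate1 = "P1", duplicate2 = "P2",
                       stringsAsFactors = FALSE)
ppi_v <- ppi_v_sketch(edgelist, paralogs)
ppi_v  # "P1-X-P2"
```

X appears in column 2 for P1 and in column 1 for P2; treating edges as undirected is what lets it be recognized as a shared partner.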

Find V motifs

Description

Find V motifs

Usage

find_v(edgelist = NULL, paralogs = NULL, count_only = FALSE)

Arguments

`edgelist`	A 2-column data frame with regulators in column 1 and targets in column 2.
`paralogs`	A 2-column data frame with gene IDs for each paralog in the paralog pair.
`count_only`	Logical indicating whether the function should return only motif counts as a numeric scalar. If FALSE, it will return a character vector of motifs. Default: FALSE.

Value

A character vector with V motifs represented in the format regulator1->target<-regulator2.

Examples

data(gma_grn)
data(gma_paralogs)
edgelist <- gma_grn[2000:4000, 1:2] # reducing for test purposes
paralogs <- gma_paralogs[gma_paralogs$type == "WGD", 1:2]
motifs <- find_v(edgelist, paralogs)
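
In the format regulator1->target<-regulator2, a V motif is a target shared by two regulators. Assuming the two regulators must form a paralog pair, the search can be sketched in base R as follows (v_sketch is a hypothetical helper, not magrene's implementation):

```r
# Hypothetical sketch: V motifs = a target shared by both members of a
# paralogous regulator pair, reported as "regulator1->target<-regulator2".
v_sketch <- function(edgelist, paralogs) {
  motifs <- character(0)
  for (i in seq_len(nrow(paralogs))) {
    r1 <- paralogs[[1]][i]
    r2 <- paralogs[[2]][i]
    # Targets (column 2) of each regulator (column 1)
    shared <- intersect(edgelist[[2]][edgelist[[1]] == r1],
                        edgelist[[2]][edgelist[[1]] == r2])
    if (length(shared) > 0) {
      motifs <- c(motifs, paste0(r1, "->", shared, "<-", r2))
    }
  }
  motifs
}

edgelist <- data.frame(
  regulator = c("R1", "R2", "R1", "R2"),
  target    = c("T1", "T1", "T2", "T3"),
  stringsAsFactors = FALSE
)
paralogs <- data.frame(duplicate1 = "R1", duplicate2 = "R2",
                       stringsAsFactors = FALSE)
v <- v_sketch(edgelist, paralogs)
v  # "R1->T1<-R2"
```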

Generate null distributions of motif counts for each motif type

Description

Generate null distributions of motif counts for each motif type

Usage

generate_nulls(
  edgelist = NULL,
  paralogs = NULL,
  edgelist_ppi = NULL,
  n = 1000,
  bp_param = BiocParallel::SerialParam()
)

Arguments

`edgelist`	A 2-column data frame with regulators in column 1 and targets in column 2.
`paralogs`	A 2-column data frame with gene IDs for each paralog in the paralog pair.
`edgelist_ppi`	A 2-column data frame with IDs of genes that encode each protein in the interacting pair.
`n`	Number of degree-preserving simulated networks to generate. Default: 1000.
`bp_param`	BiocParallel back-end to be used. Default: BiocParallel::SerialParam().

Value

A list of numeric vectors named lambda, delta, V, PPI_V, and bifan, containing the null distribution of motif counts for each motif type.

Examples

set.seed(123)
data(gma_grn)
data(gma_paralogs)
data(gma_ppi)
edgelist <- gma_grn[500:1000, 1:2] # reducing for test purposes
paralogs <- gma_paralogs[gma_paralogs$type == "WGD", 1:2]
edgelist_ppi <- gma_ppi
n <- 2 # small n for demonstration purposes
generate_nulls(edgelist, paralogs, edgelist_ppi, n)
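
generate_nulls compares motif counts against degree-preserving simulated networks. One common way to build such a network is repeated edge swapping: two directed edges a->b and c->d become a->d and c->b, so every node keeps its out-degree and in-degree. The sketch below illustrates that technique in base R. It is not necessarily the method generate_nulls uses, and swap_edges is a hypothetical helper.

```r
# Hypothetical sketch: degree-preserving randomization of a directed edge list
# by edge swapping (a->b, c->d  becomes  a->d, c->b).
swap_edges <- function(edgelist, n_swaps = 100) {
  from <- as.character(edgelist[[1]])
  to <- as.character(edgelist[[2]])
  m <- length(from)
  if (m >= 2) {
    for (s in seq_len(n_swaps)) {
      idx <- sample.int(m, 2)
      i <- idx[1]
      j <- idx[2]
      # Reject swaps that would create self-loops
      if (from[i] == to[j] || from[j] == to[i]) next
      # Reject swaps that would duplicate an existing edge
      existing <- paste(from, to, sep = "->")
      if (paste(from[i], to[j], sep = "->") %in% existing ||
          paste(from[j], to[i], sep = "->") %in% existing) next
      # Exchange targets; out-degrees (from) and in-degrees (to) are unchanged
      tmp <- to[i]
      to[i] <- to[j]
      to[j] <- tmp
    }
  }
  data.frame(from = from, to = to, stringsAsFactors = FALSE)
}

set.seed(123)
edgelist <- data.frame(
  regulator = c("R1", "R1", "R2", "R3", "R3"),
  target    = c("T1", "T2", "T3", "T1", "T4"),
  stringsAsFactors = FALSE
)
simulated <- swap_edges(edgelist, n_swaps = 50)
```

Counting motifs in each of n such randomized networks gives one null distribution per motif type, which is the kind of input calculate_Z compares observed counts against.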

Sample soybean GRN

Description

The GRN was inferred with BioNERO using expression data from Libault et al., 2010, and Severin et al., 2010.

Usage

data(gma_grn)

Format

A 3-column data frame with node1, node2, and edge weight.

References

Severin, A. J., Woody, J. L., Bolon, Y. T., Joseph, B., Diers, B. W., Farmer, A. D., ... & Shoemaker, R. C. (2010). RNA-Seq Atlas of Glycine max: a guide to the soybean transcriptome. BMC Plant Biology, 10(1), 1-16.

Libault, M., Farmer, A., Joshi, T., Takahashi, K., Langley, R. J., Franklin, L. D., ... & Stacey, G. (2010). An integrated transcriptome atlas of the crop model Glycine max, and its use in comparative analyses in plants. The Plant Journal, 63(1), 86-99.

Examples

data(gma_grn)

Soybean (Glycine max) duplicated genes

Description

The repertoire of soybean paralogs was retrieved from Almeida-Silva et al., 2020.

Usage

data(gma_paralogs)

Format

A 3-column data frame with columns duplicate1, duplicate2, and type (duplication type).

References

Almeida-Silva, F., Moharana, K. C., Machado, F. B., & Venancio, T. M. (2020). Exploring the complexity of soybean (Glycine max) transcriptional regulation using global gene co-expression networks. Planta, 252(6), 1-12.

Examples

data(gma_paralogs)

Sample soybean PPI network

Description

Protein-protein interactions (PPIs) were retrieved from the STRING database and filtered to keep only medium-confidence edges and only nodes present in the GRN.

Usage

data(gma_ppi)

Format

A 2-column data frame with node1 and node2.

Examples

data(gma_ppi)

Null distribution of motif frequencies for vignette data set

Description

Data were filtered exactly as demonstrated in the vignette. Briefly, the top 30,000 edges from the GRN were kept, and only WGD-derived gene pairs were used.

Usage

data(nulls)

Format

A list of numeric vectors with the motif frequencies in each simulated network. List elements are named lambda, delta, V, PPI_V, and bifan, and each element has length 100.

Examples

data(nulls)

Calculate Sorensen-Dice similarity between paralogous gene pairs

Description

Calculate Sorensen-Dice similarity between paralogous gene pairs

Usage

sd_similarity(edgelist = NULL, paralogs = NULL)

Arguments

`edgelist`	A 2-column data frame with regulators in column 1 and targets in column 2.
`paralogs`	A 2-column data frame with gene IDs for each paralog in the paralog pair.

Value

A data frame containing the paralogous gene pairs and their Sorensen-Dice similarity scores.

Examples

data(gma_ppi)
data(gma_paralogs)
edgelist <- gma_ppi
paralogs <- gma_paralogs[, 1:2]
sim <- sd_similarity(edgelist, paralogs)
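
The Sorensen-Dice index of two sets A and B is 2|A ∩ B| / (|A| + |B|), ranging from 0 (no shared partners) to 1 (identical partner sets). A base R sketch that applies it to the neighbor sets of each paralog, assuming neighbors are collected from both columns of edgelist as suits the undirected PPI example; sd_sketch is a hypothetical helper, and sd_similarity may define neighbor sets differently (for example, as targets only in a GRN).

```r
# Hypothetical sketch: Sorensen-Dice similarity of the neighbor sets of
# each paralog pair, SD = 2 * |A intersect B| / (|A| + |B|).
sd_sketch <- function(edgelist, paralogs) {
  # Neighbors of a gene, collected from both columns (undirected edges assumed)
  neighbors <- function(g) {
    unique(c(edgelist[[2]][edgelist[[1]] == g],
             edgelist[[1]][edgelist[[2]] == g]))
  }
  scores <- vapply(seq_len(nrow(paralogs)), function(i) {
    a <- neighbors(paralogs[[1]][i])
    b <- neighbors(paralogs[[2]][i])
    # Undefined when neither gene has any neighbor
    if (length(a) + length(b) == 0) return(NA_real_)
    2 * length(intersect(a, b)) / (length(a) + length(b))
  }, numeric(1))
  # Column names are illustrative, not necessarily those of sd_similarity()
  data.frame(duplicate1 = paralogs[[1]], duplicate2 = paralogs[[2]],
             sd = scores, stringsAsFactors = FALSE)
}

edgelist <- data.frame(node1 = c("G1", "G1", "G2", "G2"),
                       node2 = c("X", "Y", "Y", "Z"),
                       stringsAsFactors = FALSE)
paralogs <- data.frame(duplicate1 = "G1", duplicate2 = "G2",
                       stringsAsFactors = FALSE)
sim_sketch <- sd_sketch(edgelist, paralogs)
sim_sketch$sd  # 0.5: one shared partner (Y) out of 2 + 2 partners
```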
