Package 'csdR' reference manual

Title:	Differential gene co-expression
Description:	This package contains functionality to run differential gene co-expression analysis across two different conditions. The algorithm is inspired by Voigt et al. 2017 and finds Conserved, Specific and Differentiated genes (hence the name CSD). The package includes efficient variance calculation by bootstrapping and Welford's algorithm.
Authors:	Jakob Peder Pettersen [aut, cre]
Maintainer:	Jakob Peder Pettersen <[email protected]>
License:	GPL-3
Version:	1.13.4
Built:	2025-03-27 03:36:17 UTC
Source:	https://github.com/bioc/csdR

Sample expression matrices for CSD

Description

Sample expression matrices of thyroid gland tissue for thyroid cancer patients and healthy individuals. These datasets were pre-processed by Gulla et al. (2019). Due to size requirements, only 1000 randomly selected genes are provided in the dataset. The healthy controls comprise 399 samples and the sick samples comprise 504.

Usage

data(normal_expression)
data(sick_expression)

Format

Numeric matrices of normalized gene expression. Genes are in columns, whereas samples are in rows.
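Many expression files store genes in rows rather than columns. The following is a minimal sketch, using a made-up toy matrix (`genes_in_rows` and its values are hypothetical), of how such data could be brought into the orientation described above with base R:

```r
# Hypothetical genes-by-samples matrix, as found in many expression files
genes_in_rows <- matrix(
    c(5.1, 2.3, 4.8, 2.9, 6.0, 3.1),
    nrow = 2,
    dimnames = list(c("geneA", "geneB"), c("s1", "s2", "s3"))
)

# Transpose so that genes are in columns and samples are in rows
expr <- t(genes_in_rows)

# The gene names now serve as column names
colnames(expr)
dim(expr)
```
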

Source

The expression matrix for healthy individuals is from Genotype-Tissue Expression (GTEx) V7. The data for the thyroid cancer patients are obtained from the Thyroid Cancer project (THCA) of The Cancer Genome Atlas (TCGA).

References

Gulla, Almaas, Eivind, & Voigt, André (2019). An integrated systems biology approach to investigate transcriptomic data of thyroid carcinoma. NTNU. http://hdl.handle.net/11250/2621725

Extract indices corresponding to the largest elements

Description

Extracts the indices of the n_elements largest elements of the input. The result is equivalent to order(x, decreasing = TRUE)[1:n_elements], but is computed much faster because the discarded elements are never sorted. This function is useful for extracting the rows of a data frame that have the largest values in one of the columns.
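The base-R expression mentioned in the description can serve as a reference when checking results. The sketch below only uses `order()`; the helper name `reference_argsort` is hypothetical and not part of csdR:

```r
# Hypothetical reference helper: full sort, then keep the first n indices.
# This is the slow baseline that partial_argsort is described as matching.
reference_argsort <- function(x, n_elements) {
    order(x, decreasing = TRUE)[seq_len(n_elements)]
}

x <- c(10L, 5L, -2L, 12L, 15L)
ref_idx <- reference_argsort(x, 3L)

# Indices of the three largest values, largest first
ref_idx
x[ref_idx]
```
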

Usage

partial_argsort(x, n_elements)

Arguments

`x`	Numeric vector, the vector containing the numbers to sort.
`n_elements`	Integer scalar, the number of indices to return.

Value

Numeric vector, the indices of the n_elements largest elements of x, ordered from the largest element downwards.

Examples

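x <- c(10L,5L,-2L,12L,15L)
max_indices <- partial_argsort(x,3L)
max_indices
x[max_indices]
# Reference result obtained with a full sort
order(x, decreasing = TRUE)[1:3]
# Rows of mtcars with the five highest horsepower values
mtcars[partial_argsort(mtcars$hp,5L),]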

Run bootstrapping of Spearman correlations within a dataset

Description

This function provides the low-level functionality of bootstrapping the Spearman correlations between the columns of a dataset. Use it only if you need a low-level interface; for a complete CSD analysis, run_csd provides a more streamlined approach.

Usage

run_cor_bootstrap(
  x,
  n_it = 20L,
  nThreads = 1L,
  verbose = TRUE,
  iterations_gap = 1L
)

Arguments

`x`	Numeric matrix, the gene expression matrix to analyse. Genes are in columns, samples are in rows.
`n_it`	Integer, number of bootstrap iterations
`nThreads`	Integer, number of threads to use for computations
`verbose`	Logical, should progress be printed?
`iterations_gap`	Integer, number of iterations between each status message (default 1). Only has an effect if `verbose=TRUE`.

Value

A list with two fields:

rho: Numeric matrix containing the bootstrapped mean of the Spearman correlation between each pair of columns
var: Numeric matrix containing the bootstrapped variance of the Spearman correlation between each pair of columns
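To illustrate how a bootstrapped mean and variance can be accumulated in a single pass, here is a minimal base-R sketch that combines row resampling with Welford's online algorithm. It is not the package's implementation: the function name `bootstrap_spearman_sketch` is hypothetical, the toy data are random, and the use of the sample variance (division by n_it - 1) is an assumption.

```r
# Hypothetical helper, not part of csdR: bootstrap the Spearman correlations
# and accumulate their mean and variance with Welford's online algorithm.
bootstrap_spearman_sketch <- function(x, n_it = 20L) {
    stopifnot(is.matrix(x), is.numeric(x), n_it >= 2L)
    p <- ncol(x)
    mean_rho <- matrix(0, p, p, dimnames = list(colnames(x), colnames(x)))
    m2 <- mean_rho # running sum of squared deviations from the mean
    for (k in seq_len(n_it)) {
        # Resample the samples (rows) with replacement
        idx <- sample.int(nrow(x), replace = TRUE)
        rho_k <- cor(x[idx, , drop = FALSE], method = "spearman")
        # Welford update: no need to store the individual iterations
        delta <- rho_k - mean_rho
        mean_rho <- mean_rho + delta / k
        m2 <- m2 + delta * (rho_k - mean_rho)
    }
    # Assumption: sample variance with n_it - 1 in the denominator
    list(rho = mean_rho, var = m2 / (n_it - 1))
}

set.seed(1)
toy <- matrix(rnorm(200), nrow = 50, dimnames = list(NULL, paste0("g", 1:4)))
res <- bootstrap_spearman_sketch(toy, n_it = 30L)
str(res)
```

Because only the running mean and the running sum of squared deviations are kept, memory use does not grow with the number of iterations.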

Examples

data("normal_expression")
cor_res <- run_cor_bootstrap(
    x = normal_expression,
    n_it = 100, nThreads = 2L
)

Run CSD analysis

Description

This function implements the CSD algorithm based on the one presented by Voigt et al. 2017. Within each condition, the Spearman correlation of every pair of genes and the variance of that correlation are estimated by bootstrapping. The results for the two conditions are then compared, and C-, S- and D-values are computed and returned.

Usage

run_csd(
  x_1,
  x_2,
  n_it = 20L,
  nThreads = 1L,
  verbose = TRUE,
  iterations_gap = 1L
)

Arguments

`x_1`	Numeric matrix, the gene expression matrix for the first condition. Genes are in columns, samples are in rows. The columns must be named with the name of the genes. Missing values are not allowed.
`x_2`	Numeric matrix, the gene expression matrix for the second condition.
`n_it`	Integer, number of bootstrap iterations
`nThreads`	Integer, number of threads to use for computations
`verbose`	Logical, should progress be printed?
`iterations_gap`	Integer, number of iterations between each status message (default 1). Only has an effect if `verbose=TRUE`.

Details

The gene names in x_1 and x_2 do not need to be in the same order, but must be in the same namespace. Only genes present in both datasets are considered in the analysis. The parallelism gained by nThreads applies to the computations within a single iteration. The iterations are run in serial in order to reduce the memory footprint.
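The effect of matching genes by name can be previewed before running the analysis. The following base-R sketch, with made-up toy matrices, shows which genes two datasets share; it illustrates the idea only and is not how the package performs the matching internally:

```r
set.seed(42)
# Toy matrices with partly overlapping gene names in different order
x_1 <- matrix(rnorm(12), nrow = 3, dimnames = list(NULL, c("A", "B", "C", "D")))
x_2 <- matrix(rnorm(12), nrow = 4, dimnames = list(NULL, c("D", "B", "E")))

# Genes present in both datasets
common_genes <- intersect(colnames(x_1), colnames(x_2))

# Restrict both matrices to the shared genes, in the same column order
x_1_common <- x_1[, common_genes, drop = FALSE]
x_2_common <- x_2[, common_genes, drop = FALSE]
common_genes
```
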

Value

A data.frame with the additional class attribute csd_res with the results of the CSD analysis. This frame has a row for each pair of genes and has the following columns:

Gene1: Character, the name of the first gene
Gene2: Character, the name of the second gene
rho1: Mean correlation of the two genes in the first condition
rho2: Mean correlation of the two genes in the second condition
var1: The estimated variance of rho1 determined by bootstrapping
var2: The estimated variance of rho2 determined by bootstrapping
cVal: Numeric, the conserved score. A high value indicates that the co-expression of the two genes has the same sign in both conditions.
sVal: Numeric, the specific score. A high value indicates that the two genes have a high degree of co-expression in one condition, but not in the other.
dVal: Numeric, the differentiated score. A high value indicates that the two genes have a high degree of co-expression in both conditions, but that the sign of the co-expression differs.
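The three scores can be sketched from the rho and var columns as follows. The formulas reflect our reading of the definitions in Voigt et al. 2017 and should be treated as an assumption; the package's own computation may differ in detail. The function name `csd_scores_sketch` and the input values are hypothetical.

```r
# Hypothetical helper, not part of csdR. Each score is a difference measure
# between the two correlations, scaled by their combined standard error.
csd_scores_sketch <- function(rho1, rho2, var1, var2) {
    denom <- sqrt(var1 + var2)
    data.frame(
        # Conserved: correlations of the same sign reinforce each other
        cVal = abs(rho1 + rho2) / denom,
        # Specific: strong in one condition, weak in the other
        sVal = abs(abs(rho1) - abs(rho2)) / denom,
        # Differentiated: strong in both conditions, opposite signs
        dVal = (abs(rho1) + abs(rho2) - abs(rho1 + rho2)) / denom
    )
}

# Three made-up gene pairs: conserved, specific and differentiated
scores <- csd_scores_sketch(
    rho1 = c(0.8, 0.7, 0.6),
    rho2 = c(0.8, 0.0, -0.6),
    var1 = rep(0.005, 3),
    var2 = rep(0.005, 3)
)
scores
```

In this toy input, the first pair scores highest on cVal, the second on sVal and the third on dVal.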

References

Voigt A, Nowick K, Almaas E (2017). A composite network of conserved and tissue specific gene interactions reveals possible genetic interactions in glioma. PLOS Computational Biology 13(9): e1005739. https://doi.org/10.1371/journal.pcbi.1005739

Examples

data("sick_expression")
data("normal_expression")
cor_res <- run_csd(
    x_1 = sick_expression, x_2 = normal_expression,
    n_it = 100, nThreads = 2L
)
c_max <- max(cor_res$cVal)
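A common follow-up step is to keep only the gene pairs with the highest scores. The sketch below uses a made-up stand-in for a run_csd result (`toy_res` and its values are hypothetical) and base R's `order()`; on a real result with many gene pairs, partial_argsort is documented in this manual as a faster equivalent of the `order()` call.

```r
# Toy stand-in for a run_csd() result; the values are made up
toy_res <- data.frame(
    Gene1 = c("A", "A", "B", "B"),
    Gene2 = c("B", "C", "C", "D"),
    cVal = c(2.0, 9.5, 0.3, 7.1),
    stringsAsFactors = FALSE
)

# Keep the n_pairs rows with the largest conserved score
n_pairs <- 2L
top_rows <- order(toy_res$cVal, decreasing = TRUE)[seq_len(n_pairs)]
top_c <- toy_res[top_rows, ]
top_c
```
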
