Title: | Differential gene co-expression |
---|---|
Description: | This package contains functionality to run differential gene co-expression analysis across two different conditions. The algorithm is inspired by Voigt et al. 2017 and finds Conserved, Specific and Differentiated genes (hence the name CSD). This package includes efficient calculation of the mean and variance of the correlations by bootstrapping and Welford's algorithm. |
Authors: | Jakob Peder Pettersen [aut, cre] |
Maintainer: | Jakob Peder Pettersen <[email protected]> |
License: | GPL-3 |
Version: | 1.13.3 |
Built: | 2024-11-08 03:23:37 UTC |
Source: | https://github.com/bioc/csdR |
Sample expression matrices of thyroid gland tissue for thyroid cancer patients and healthy individuals. These datasets were pre-processed by Gulla et al. (2019). Due to size requirements, only 1000 randomly selected genes are provided in the dataset. The numbers of samples are 399 for the healthy controls and 504 for the sick patients.
data(normal_expression)
data(sick_expression)
Numeric matrices of normalized gene expression. Genes are in columns, whereas samples are in rows.
The expression matrix for healthy individuals was obtained from the Genotype-Tissue Expression (GTEx) project, V7. For the thyroid cancer patients, the data were obtained from the Thyroid Cancer project (THCA) of The Cancer Genome Atlas (TCGA).
Gulla, Almaas, Eivind, & Voigt, André (2019). An integrated systems biology approach to investigate transcriptomic data of thyroid carcinoma. NTNU. http://hdl.handle.net/11250/2621725
Extracts the indices of the largest elements of the input.
This procedure is equivalent to order(x, decreasing = TRUE)[1:n_elements], but is faster, especially when n_elements is small relative to length(x), because it avoids sorting the discarded elements.
This function is useful for extracting the rows in a data frame having the
largest values in one of the columns.
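To illustrate the idea of a partial argsort, the same result can be obtained in base R with a partial sort: sort.int(x, partial = k) guarantees only that position k holds its sorted value, which here yields the n_elements-th largest value as a threshold. The helper partial_argsort_sketch below is hypothetical (not part of csdR and not necessarily how csdR implements partial_argsort) and assumes x contains no missing values and 1 <= n_elements <= length(x).

```r
# Hypothetical base-R sketch of a partial argsort (not the csdR implementation).
# Assumes x has no missing values and 1 <= n_elements <= length(x).
partial_argsort_sketch <- function(x, n_elements) {
  n_elements <- as.integer(n_elements)
  k <- length(x) - n_elements + 1L
  # Partial sort: only position k is guaranteed to hold its sorted value,
  # i.e. the n_elements-th largest element of x.
  threshold <- sort.int(x, partial = k)[k]
  # Candidates are all elements at least as large as the threshold
  # (there can be more than n_elements if the threshold value is tied).
  candidates <- which(x >= threshold)
  # Fully sort only the small candidate set, largest first.
  candidates[order(x[candidates], decreasing = TRUE)][seq_len(n_elements)]
}

x <- c(10L, 5L, -2L, 12L, 15L)
partial_argsort_sketch(x, 3L)  # 5 4 1
```

Only the candidate set is fully sorted, so the expensive ordering step scales with n_elements rather than with length(x).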
partial_argsort(x, n_elements)
x |
Numeric vector, the vector containing the numbers to sort. |
n_elements |
Integer scalar, the number of indices to return. |
Numeric vector, the indices of the largest elements in x, in sorted (decreasing) order.
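x <- c(10L, 5L, -2L, 12L, 15L)
max_indices <- partial_argsort(x, 3L)
max_indices     # 5 4 1
x[max_indices]  # 15 12 10
order(x, decreasing = TRUE)[1:3]  # same indices as partial_argsort(x, 3L)
mtcars[partial_argsort(mtcars$hp, 5L), ]  # the five cars with the highest horsepower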
This function provides lower-level functionality for bootstrapping the Spearman correlations between the columns of a dataset.
Use it only if you need a low-level interface; for a complete CSD analysis, run_csd
provides a more streamlined approach.
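The bootstrapping of Spearman correlations can be sketched in base R as follows. bootstrap_spearman_sketch is a hypothetical helper, not the csdR implementation. It assumes that each bootstrap iteration resamples the samples (rows) with replacement, computes the Spearman correlation matrix with base R's cor(), and folds it into a running mean and sum of squared deviations using Welford's online update, so memory use does not grow with n_it. Reporting the sample variance (denominator n_it - 1) is also an assumption; csdR may normalise differently.

```r
# Hypothetical sketch (not the csdR implementation): bootstrap the Spearman
# correlation matrix of the columns of x and accumulate its element-wise
# mean and variance with Welford's online algorithm.
bootstrap_spearman_sketch <- function(x, n_it = 20L) {
  stopifnot(is.matrix(x), n_it >= 2L)
  n <- nrow(x)
  p <- ncol(x)
  rho_mean <- matrix(0, p, p)
  m2 <- matrix(0, p, p)  # running sum of squared deviations from the mean
  for (k in seq_len(n_it)) {
    # Resample the samples (rows) with replacement.
    idx <- sample.int(n, n, replace = TRUE)
    rho <- cor(x[idx, , drop = FALSE], method = "spearman")
    # Welford update: only the current mean and m2 are kept in memory,
    # never the full stack of n_it correlation matrices.
    delta <- rho - rho_mean
    rho_mean <- rho_mean + delta / k
    m2 <- m2 + delta * (rho - rho_mean)
  }
  # Sample variance with denominator n_it - 1 (an assumption).
  list(rho_mean = rho_mean, rho_var = m2 / (n_it - 1))
}

set.seed(42)
expr <- matrix(rnorm(200), nrow = 50, ncol = 4)  # toy data: 50 samples, 4 genes
res <- bootstrap_spearman_sketch(expr, n_it = 10L)
```

Welford's update avoids the numerical cancellation of the naive "mean of squares minus square of mean" formula while still needing only a single pass over the iterations.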
run_cor_bootstrap( x, n_it = 20L, nThreads = 1L, verbose = TRUE, iterations_gap = 1L )
x |
Numeric matrix, the gene expression matrix to analyse. Genes are in columns, samples are in rows. |
n_it |
Integer, number of bootstrap iterations |
nThreads |
Integer, number of threads to use for computations |
verbose |
Logical, should progress be printed? |
iterations_gap |
Integer, the number of iterations between status messages
(default: 1). Only used when verbose = TRUE. |
A list with two fields
Numeric matrix containing the bootstrapped mean of the Spearman correlation between each pair of columns
Numeric matrix containing the bootstrapped variance of the Spearman correlation between each pair of columns
data("normal_expression")
cor_res <- run_cor_bootstrap(
    x = normal_expression,
    n_it = 100,
    nThreads = 2L
)
This function implements the CSD algorithm, based on the one presented by Voigt et al. (2017). First, all pairs of genes are compared within each condition using the Spearman correlation, and the mean correlation and its variance are estimated by bootstrapping. Then the results for the two conditions are compared, and the C-, S- and D-values are computed and returned.
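Because every pair of genes is compared, the number of gene pairs, and hence of rows in the result, grows quadratically with the number of genes p. Assuming each unordered pair is reported once, and assuming all 1000 genes of the sample datasets are shared by both conditions, the count works out as:

```latex
N_{\text{pairs}} = \binom{p}{2} = \frac{p(p-1)}{2}, \qquad
p = 1000 \;\Rightarrow\; N_{\text{pairs}} = \frac{1000 \cdot 999}{2} = 499\,500
```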
run_csd( x_1, x_2, n_it = 20L, nThreads = 1L, verbose = TRUE, iterations_gap = 1L )
x_1 |
Numeric matrix, the gene expression matrix for the first condition. Genes are in columns, samples are in rows. The columns must be named with the gene names. Missing values are not allowed. |
x_2 |
Numeric matrix, the gene expression matrix for the second condition. |
n_it |
Integer, number of bootstrap iterations |
nThreads |
Integer, number of threads to use for computations |
verbose |
Logical, should progress be printed? |
iterations_gap |
Integer, the number of iterations between status messages
(default: 1). Only used when verbose = TRUE. |
The gene names in x_1 and x_2 do not need to be in the same order,
but they must use the same naming scheme (for example, the same type of gene identifier).
Only genes present in both datasets are considered in the analysis.
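The gene matching described above can be illustrated in base R. align_common_genes is a hypothetical helper, not part of csdR: it only sketches the idea of keeping the genes (columns) shared by both matrices, placed in a common order.

```r
# Hypothetical sketch of the gene matching: keep only genes (columns)
# present in both matrices, in the same order in both.
align_common_genes <- function(x_1, x_2) {
  common <- intersect(colnames(x_1), colnames(x_2))
  list(
    x_1 = x_1[, common, drop = FALSE],
    x_2 = x_2[, common, drop = FALSE]
  )
}

# Toy matrices: samples in rows, genes in columns, different gene order.
a <- matrix(1:6, nrow = 2, dimnames = list(NULL, c("g1", "g2", "g3")))
b <- matrix(1:6, nrow = 3, dimnames = list(NULL, c("g3", "g1")))
aligned <- align_common_genes(a, b)
colnames(aligned$x_1)  # "g1" "g3"
colnames(aligned$x_2)  # "g1" "g3"
```

Matching by name rather than by position is what allows the two matrices to list their genes in different orders; it also means that identifiers from different naming schemes would silently fail to match.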
The parallelism controlled by nThreads
applies to the computations
within a single iteration. The iterations themselves are run serially in order
to reduce the memory footprint.
A data.frame with the additional class csd_res, containing the
results of the CSD analysis.
The data frame has a row for each pair of genes and the
following columns:
Gene1
Character, the name of the first gene
Gene2
Character, the name of the second gene
rho1
Mean correlation of the two genes in the first condition
rho2
Mean correlation of the two genes in the second condition
var1
The estimated variance of rho1
determined by bootstrapping
var2
The estimated variance of rho2
determined by bootstrapping
cVal
Numeric, the conserved score. A high value indicates that the two genes are strongly co-expressed with the same sign in both conditions.
sVal
Numeric, the specific score. A high value indicates that the two genes are strongly co-expressed in one condition, but not in the other.
dVal
Numeric, the differentiated score. A high value indicates that the two genes are strongly co-expressed in both conditions, but with opposite signs.
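The three scores can be expressed in terms of the columns rho1, rho2, var1 and var2. The sketch below assumes the definitions from Voigt et al. 2017 as we read them: C = |rho1 + rho2| / sqrt(var1 + var2), S = ||rho1| - |rho2|| / sqrt(var1 + var2) and D = (|rho1| + |rho2| - |rho1 + rho2|) / sqrt(var1 + var2). csd_scores_sketch is a hypothetical helper, and csdR's own implementation may differ in details.

```r
# Hypothetical sketch of the C, S and D scores, assuming the definitions of
# Voigt et al. 2017 (csdR's internal implementation may differ in details).
# Arguments are numeric vectors, one entry per gene pair (or recycled scalars).
csd_scores_sketch <- function(rho1, rho2, var1, var2) {
  denom <- sqrt(var1 + var2)  # combined bootstrap standard deviation
  data.frame(
    cVal = abs(rho1 + rho2) / denom,                            # same sign, strong in both
    sVal = abs(abs(rho1) - abs(rho2)) / denom,                  # strong in one condition only
    dVal = (abs(rho1) + abs(rho2) - abs(rho1 + rho2)) / denom   # strong in both, opposite signs
  )
}

# Three toy gene pairs: conserved, differentiated, and specific to condition 1.
csd_scores_sketch(
  rho1 = c(0.8, 0.8, 0.8),
  rho2 = c(0.8, -0.8, 0.0),
  var1 = 0.01, var2 = 0.01
)
```

For the toy pairs, cVal dominates for the first pair and dVal for the second, while the third pair gets equal cVal and sVal, since |rho1 + 0| = ||rho1| - 0|. Dividing by the bootstrap standard deviation down-weights pairs whose correlations are poorly determined.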
Voigt A, Nowick K, Almaas E (2017). A composite network of conserved and tissue specific gene interactions reveals possible genetic interactions in glioma. PLOS Computational Biology 13(9): e1005739. https://doi.org/10.1371/journal.pcbi.1005739
data("sick_expression")
data("normal_expression")
cor_res <- run_csd(
    x_1 = sick_expression,
    x_2 = normal_expression,
    n_it = 100,
    nThreads = 2L
)
c_max <- max(cor_res$cVal)