Title: | Mixture Nested Effects Models |
---|---|
Description: | Mixture Nested Effects Models (mnem) is an extension of Nested Effects Models and allows for the analysis of single cell perturbation data provided by methods like Perturb-Seq (Dixit et al., 2016) or Crop-Seq (Datlinger et al., 2017). In those experiments each of many cells is perturbed by a knock-down of a specific gene, i.e. several cells are perturbed by a knock-down of gene A, several by a knock-down of gene B, ... and so forth. The observed read-out has to be multi-trait and in the case of the Perturb-/Crop-Seq gene are expression profiles for each cell. mnem uses a mixture model to simultaneously cluster the cell population into k clusters and and infer k networks causally linking the perturbed genes for each cluster. The mixture components are inferred via an expectation maximization algorithm. |
Authors: | Martin Pirkl [aut, cre] |
Maintainer: | Martin Pirkl <[email protected]> |
License: | GPL-3 |
Version: | 1.23.0 |
Built: | 2024-12-05 03:47:36 UTC |
Source: | https://github.com/bioc/mnem |
Example data: mnem results for the Dixit et al., 2016 and Datlinger et al., pooled CRISPR screens. For details see the vignette or function createApp().
app
app
Datlinger, P., Rendeiro, A., Schmidl, C., Krausgruber, T., Traxler, P., Klughammer, J., Schuster, L. C., Kuchler, A., Alpar, D., and Bock, C. (2017). Pooled crispr screening with single-cell transcriptome readout. Nature Methods, 14, 297-301.
Dixit, A., Parnas, O., Li, B., Chen, J., Fulco, C. P., Jerby-Arnon, L., Marjanovic, N. D., Dionne, D., Burks, T., Raychowdhury, R., Adamson, B., Norman, T. M., Lander, E. S., Weissman, J. S., Friedman, N., and Regev, A. (2016). Perturb-seq: Dissecting molecular circuits with scalable single-cell rna profiling of pooled genetic screens. Cell, 167(7), 1853-1866.e17.
data(app)
data(app)
Run bootstrap simulations on the components (phi) of an object of class mnem.
bootstrap(x, size = 1000, p = 1, logtype = 2, complete = FALSE, ...)
bootstrap(x, size = 1000, p = 1, logtype = 2, complete = FALSE, ...)
x |
mnem object |
size |
size of the booststrap simulations |
p |
percentage of samples (e.g. for 100 E-genes p=0.5 means sampling 50) |
logtype |
logarithm type of the data (e.g. 2 for log2 data or exp(1) for natural) |
complete |
if TRUE, complete data log likelihood is considered (for very large data sets, e.g. 1000 cells and 1000 E-genes) |
... |
additional parameters for the nem function |
returns bootstrap support for each edge in each component (phi); list of adjacency matrices
Martin Pirkl
sim <- simData(Sgenes = 3, Egenes = 2, Nems = 2, mw = c(0.4,0.6)) data <- (sim$data - 0.5)/0.5 data <- data + rnorm(length(data), 0, 1) result <- mnem(data, k = 2, starts = 1) boot <- bootstrap(result, size = 2)
sim <- simData(Sgenes = 3, Egenes = 2, Nems = 2, mw = c(0.4,0.6)) data <- (sim$data - 0.5)/0.5 data <- data + rnorm(length(data), 0, 1) result <- mnem(data, k = 2, starts = 1) boot <- bootstrap(result, size = 2)
This function clusters the data and performs standard nem on each cluster.
clustNEM( data, k = 2:10, cluster = NULL, starts = 1, logtype = 2, nem = TRUE, getprobspars = list(), getaffinitypars = list(), Rho = NULL, ... )
clustNEM( data, k = 2:10, cluster = NULL, starts = 1, logtype = 2, nem = TRUE, getprobspars = list(), getaffinitypars = list(), Rho = NULL, ... )
data |
data of log ratios with cells in columns and features in rows |
k |
number of clusters to check |
cluster |
given clustering has to correspond to the columns of data |
starts |
number of random starts for the kmeans algorithm |
logtype |
logarithm type of the data |
nem |
if FALSE only clusters the data |
getprobspars |
list of parameters for the getProbs function |
getaffinitypars |
list of parameters for the getAffinity function |
Rho |
perturbation matrix with dimensions nxl with n S-genes and l samples; either as probabilities with the sum of probabilities for a sample less or equal to 1 or discrete with 1s and 0s |
... |
additional arguments for standard nem function |
family of nems; the first k list entries hold full information of the standard nem search
comp |
list of all adjacency matrices phi |
mw |
vector of mixture weights |
probs |
fake cell probabilities (see mw: mixture weights) |
Martin Pirkl
sim <- simData(Sgenes = 3, Egenes = 2, Nems = 2, mw = c(0.4,0.6)) data <- (sim$data - 0.5)/0.5 data <- data + rnorm(length(data), 0, 1) resulst <- clustNEM(data, k = 2:3)
sim <- simData(Sgenes = 3, Egenes = 2, Nems = 2, mw = c(0.4,0.6)) data <- (sim$data - 0.5)/0.5 data <- data + rnorm(length(data), 0, 1) resulst <- clustNEM(data, k = 2:3)
This function is for the reproduction of the application results in the vignette and publication. See the publication Pirkl & Beerenwinkel (2018) on how to download the data files: GSE92872_CROP-seq_Jurkat_TCR.digital_expression.csv k562_both_filt.txt GSM2396861_k562_ccycle_cbc_gbc_dict.csv GSM2396858_k562_tfs_7_cbc_gbc_dict.csv
createApp( sets = seq_len(3), m = NULL, n = NULL, o = NULL, maxk = 5, parallel = NULL, path = "", types = c("data", "lods", "mnem"), allcrop = FALSE, multi = FALSE, file = NULL, ... )
createApp( sets = seq_len(3), m = NULL, n = NULL, o = NULL, maxk = 5, parallel = NULL, path = "", types = c("data", "lods", "mnem"), allcrop = FALSE, multi = FALSE, file = NULL, ... )
sets |
numeric vector with the data sets: 1 (CROPseq), 2, 3 (both PERTURBseq); default is all three |
m |
number of Sgenes (for testing) |
n |
number of most variable E-genes (for testing) |
o |
number of samples per S-gene (for testing) |
maxk |
maximum number of component in mnem inference (default: 5) |
parallel |
number of threads for parallelisation |
path |
path to the data files path/file.csv: "path/" |
types |
types of data/analysis; "data" creates the gene expression matrix, "lods" includes the log odds, "mnem" additionally performes the mixture nem analysis; default c("data", "lods", "mnem") |
allcrop |
if TRUE, does not restrict and uses the full CROPseq dataset |
multi |
if TRUE, includes cells with more than one perturbed gene |
file |
path and filename of the rda file with the raw data from the command "data <- createApp(..., types = "data")" |
... |
additional parameters for the mixture nem function |
app data object
Martin Pirkl
## recreate the app data object (takes very long, i.e. days) ## Not run: createApp() ## End(Not run) data(app)
## recreate the app data object (takes very long, i.e. days) ## Not run: createApp() ## End(Not run) data(app)
Computes the accuracy of the fit between simulated and inferred mixture.
fitacc(x, y, strict = FALSE, unique = TRUE, type = "ham")
fitacc(x, y, strict = FALSE, unique = TRUE, type = "ham")
x |
mnem object |
y |
simulation object or another mnem object |
strict |
if TRUE, accounts for over/underfitting, i.e. the number of components |
unique |
if TRUE, phis of x and y are made unique each (FALSE if strict is TRUE) |
type |
type of accuracy. "ham" for hamming, "sens" for sensitivity and "spec for Specificity" |
plot of EM convergence
Martin Pirkl
sim <- simData(Sgenes = 3, Egenes = 2, Nems = 2, mw = c(0.4,0.6)) data <- (sim$data - 0.5)/0.5 data <- data + rnorm(length(data), 0, 1) result <- mnem(data, k = 2, starts = 1) fitacc(result, sim) fitacc(result, sim, type = "sens") fitacc(result, sim, type = "spec") fitacc(result, sim, strict = TRUE, type = "sens") fitacc(result, sim, strict = TRUE, type = "spec")
sim <- simData(Sgenes = 3, Egenes = 2, Nems = 2, mw = c(0.4,0.6)) data <- (sim$data - 0.5)/0.5 data <- data + rnorm(length(data), 0, 1) result <- mnem(data, k = 2, starts = 1) fitacc(result, sim) fitacc(result, sim, type = "sens") fitacc(result, sim, type = "spec") fitacc(result, sim, strict = TRUE, type = "sens") fitacc(result, sim, strict = TRUE, type = "spec")
Calculates responsibilities and mixture weights based on the ground truth and noisy data.
fuzzyindex(x, data, logtype = 2, complete = FALSE, marginal = FALSE, ...)
fuzzyindex(x, data, logtype = 2, complete = FALSE, marginal = FALSE, ...)
x |
mnem_sim object |
data |
noisy data matrix |
logtype |
logarithm type of the data |
complete |
if TRUE, complete data log likelihood is considered (for very large data sets, e.g. 1000 cells and 1000 E-genes) |
marginal |
logical to compute the marginal likelihood (TRUE) |
... |
additional parameters for the function getAffinity |
list with cell log odds mixture weights and log likelihood
Martin Pirkl
sim <- simData(Sgenes = 3, Egenes = 2, Nems = 2, mw = c(0.4,0.6)) data <- sim$data data[which(sim$data == 1)] <- rnorm(sum(sim$data == 1), 1, 1) data[which(sim$data == 0)] <- rnorm(sum(sim$data == 0), -1, 1) fuzzy <- fuzzyindex(sim, data)
sim <- simData(Sgenes = 3, Egenes = 2, Nems = 2, mw = c(0.4,0.6)) data <- sim$data data[which(sim$data == 1)] <- rnorm(sum(sim$data == 1), 1, 1) data[which(sim$data == 0)] <- rnorm(sum(sim$data == 0), -1, 1) fuzzy <- fuzzyindex(sim, data)
This function calculates the responsibilities of each component for all cells from the expected log distribution of the hidden data.
getAffinity( x, affinity = 0, norm = TRUE, logtype = 2, mw = NULL, data = matrix(0, 2, ncol(x)), complete = FALSE )
getAffinity( x, affinity = 0, norm = TRUE, logtype = 2, mw = NULL, data = matrix(0, 2, ncol(x)), complete = FALSE )
x |
log odds for l cells and k components as a kxl matrix |
affinity |
0 for standard soft clustering, 1 for hard clustering during inference (not recommended) |
norm |
if TRUE normalises to probabilities (recommended) |
logtype |
logarithm type of the data (e.g. 2 for log2 data or exp(1) for natural) |
mw |
mixture weights of the components |
data |
data in log odds |
complete |
if TRUE, complete data log likelihood is considered (for very large data sets, e.g. 1000 cells and 1000 E-genes) |
responsibilities as a kxl matrix (k components, l cells)
Martin Pirkl
sim <- simData(Sgenes = 3, Egenes = 2, Nems = 2, mw = c(0.4,0.6)) data <- (sim$data - 0.5)/0.5 data <- data + rnorm(length(data), 0, 1) result <- mnem(data, k = 2, starts = 1) resp <- getAffinity(result$probs, mw = result$mw, data = data)
sim <- simData(Sgenes = 3, Egenes = 2, Nems = 2, mw = c(0.4,0.6)) data <- (sim$data - 0.5)/0.5 data <- data + rnorm(length(data), 0, 1) result <- mnem(data, k = 2, starts = 1) resp <- getAffinity(result$probs, mw = result$mw, data = data)
This function calculates a negative penalized log likelihood given a object of class mnem. This penalized likelihood is based on the normal likelihood and penalizes complexity of the mixture components (i.e. the networks).
getIC( x, man = FALSE, degree = 4, logtype = 2, pen = 2, useF = FALSE, Fnorm = FALSE )
getIC( x, man = FALSE, degree = 4, logtype = 2, pen = 2, useF = FALSE, Fnorm = FALSE )
x |
mnem object |
man |
logical. manual data penalty, e.g. man=TRUE and pen=2 for an approximation of the Akaike Information Criterion |
degree |
different degree of penalty for complexity: positive entries of transitively reduced phis or phi^r (degree=0), phi^r and mixture components minus one k-1 (1), phi^r, k-1 and positive entries of thetas (2), positive entries of transitively closed phis or phi^t, k-1 (3), phi^t, theta, k-1 (4, default), all entries of phis, thetas and k-1 (5) |
logtype |
logarithm type of the data (e.g. 2 for log2 data or exp(1) for natural) |
pen |
penalty weight for the data (e.g. pen=2 for approximate Akaike Information Criterion) |
useF |
use F (see publication) as complexity instead of phi and theta |
Fnorm |
normalize complexity of F, i.e. if two components have the same entry in F, it is only counted once |
penalized log likelihood
Martin Pirkl
sim <- simData(Sgenes = 3, Egenes = 2, Nems = 2, mw = c(0.4,0.6)) data <- (sim$data - 0.5)/0.5 data <- data + rnorm(length(data), 0, 1) pen <- numeric(3) result <- list() for (k in seq_len(2)) { result[[k]] <- mnem(data, k = k, starts = 1) pen[k] <- getIC(result[[k]]) } print(pen)
sim <- simData(Sgenes = 3, Egenes = 2, Nems = 2, mw = c(0.4,0.6)) data <- (sim$data - 0.5)/0.5 data <- data + rnorm(length(data), 0, 1) pen <- numeric(3) result <- list() for (k in seq_len(2)) { result[[k]] <- mnem(data, k = k, starts = 1) pen[k] <- getIC(result[[k]]) } print(pen)
This function uses the hamming distance to calculate an accuracy for two networks (phi).
hamSim(a, b, diag = 1, symmetric = TRUE)
hamSim(a, b, diag = 1, symmetric = TRUE)
a |
adjacency matrix (phi) |
b |
adjacency matrix (phi) |
diag |
if 1 includes diagonal in distance, if 0 not |
symmetric |
comparing a to b is asymmetrical, if TRUE includes comparison b to a |
normalized hamming accuracy for a and b
Martin Pirkl
sim <- simData(Sgenes = 3, Egenes = 2, Nems = 2, mw = c(0.4,0.6)) similarity <- hamSim(sim$Nem[[1]], sim$Nem[[2]])
sim <- simData(Sgenes = 3, Egenes = 2, Nems = 2, mw = c(0.4,0.6)) similarity <- hamSim(sim$Nem[[1]], sim$Nem[[2]])
This function simultaneously learns a mixture of causal networks and clusters of a cell population from single cell perturbation data (e.g. log odds of fold change) with a multi-trait readout. E.g. Pooled CRISPR scRNA-Seq data (Perturb-Seq. Dixit et al., 2016, Crop-Seq. Datlinger et al., 2017).
mnem( D, inference = "em", search = "greedy", phi = NULL, theta = NULL, mw = NULL, method = "llr", marginal = FALSE, parallel = NULL, reduce = FALSE, runs = 1, starts = 3, type = "networks", complete = FALSE, p = NULL, k = NULL, kmax = 10, verbose = FALSE, max_iter = 100, parallel2 = NULL, converged = -Inf, redSpace = NULL, affinity = 0, evolution = FALSE, lambda = 1, subtopoX = NULL, ratio = TRUE, logtype = 2, domean = TRUE, modulesize = 5, compress = FALSE, increase = TRUE, fpfn = c(0.1, 0.1), Rho = NULL, ksel = c("kmeans", "silhouette", "cor"), nullcomp = FALSE, tree = FALSE, burnin = 10, hastings = TRUE, nodeswitch = TRUE, postgaps = 10, penalized = FALSE, accept_range = 1, ... )
mnem( D, inference = "em", search = "greedy", phi = NULL, theta = NULL, mw = NULL, method = "llr", marginal = FALSE, parallel = NULL, reduce = FALSE, runs = 1, starts = 3, type = "networks", complete = FALSE, p = NULL, k = NULL, kmax = 10, verbose = FALSE, max_iter = 100, parallel2 = NULL, converged = -Inf, redSpace = NULL, affinity = 0, evolution = FALSE, lambda = 1, subtopoX = NULL, ratio = TRUE, logtype = 2, domean = TRUE, modulesize = 5, compress = FALSE, increase = TRUE, fpfn = c(0.1, 0.1), Rho = NULL, ksel = c("kmeans", "silhouette", "cor"), nullcomp = FALSE, tree = FALSE, burnin = 10, hastings = TRUE, nodeswitch = TRUE, postgaps = 10, penalized = FALSE, accept_range = 1, ... )
D |
data with cells indexing the columns and features (E-genes) indexing the rows |
inference |
inference method "em" for expectation maximization or "mcmc" for markov chain monte carlo sampling |
search |
search method for single network inference "greedy", "exhaustive" or "modules" (also possible: "small", which is greedy with only one edge change per M-step to make for a smooth convergence) |
phi |
a list of n lists of k networks for n starts of the EM and k components |
theta |
a list of n lists of k attachment vector for the E-genes for n starts of the EM and k components |
mw |
mixture weights; if NULL estimated or uniform |
method |
"llr" for log ratios or foldchanges as input (see ratio) |
marginal |
logical to compute the marginal likelihood (TRUE) |
parallel |
number of threads for parallelization of the number of em runs |
reduce |
logical - reduce search space for exhaustive search to unique networks |
runs |
number of runs for greedy search |
starts |
number of starts for the em or mcmc |
type |
initialize with responsibilities either by "random", "cluster" (each S-gene is clustered and the different S-gene clustered differently combined for several starts), "cluster2" (clustNEM is used to infer reasonable phis, which are then used as a start for one EM run), "cluster3" (global clustering as a start), or "networks" (initialize with random phis), inference='mcmc' only supports 'networks' and 'empty' for unconncected networks phi |
complete |
if TRUE, optimizes the expected complete log likelihood of the model, otherwise the log likelihood of the observed data |
p |
initial probabilities as a k (components) times l (cells) matrix |
k |
number of components |
kmax |
maximum number of components when k=NULL is inferred |
verbose |
verbose output |
max_iter |
maximum iterations (moves for inference='mcmc'. adjust parameter burnin) |
parallel2 |
if parallel=NULL, number of threads for single component optimization |
converged |
absolute distance for convergence between new and old log likelihood; if set to -Inf, the EM stops if neither the phis nor thetas were changed in the most recent iteration |
redSpace |
space for "exhaustive" search |
affinity |
0 is default for soft clustering, 1 is for hard clustering |
evolution |
logical. If TRUE components are penelized for being different from each other. |
lambda |
smoothness value for the prior put on the components, if evolution set to TRUE |
subtopoX |
hard prior on theta as a vector with entry i equal to j, if E-gene i is attached to S-gene j |
ratio |
logical, if true data is log ratios, if false foldchanges |
logtype |
logarithm type of the data (e.g. 2 for log2 data or exp(1) for natural) |
domean |
average the data, when calculating a single NEM (speed improvment) |
modulesize |
max number of S-genes per module in module search |
compress |
compress networks after search (warning: penelized likelihood not interpretable) |
increase |
if set to FALSE, the algorithm will not stop if the likelihood decreases |
fpfn |
numeric vector of length two with false positive and false negative rates for discrete data |
Rho |
perturbation matrix with dimensions nxl with n S-genes and l samples; either as probabilities with the sum of probabilities for a sample less or equal to 1 or discrete with 1s and 0s |
ksel |
character vector of methods for the inference of k; can combine as the first two vlues "hc" (hierarchical clustering) or "kmeans" with "silhouette", "BIC" or "AIC"; the third value is either "cor" for correlation distance or any method accepted by the function 'dist' |
nullcomp |
if TRUE, adds a null component (k+1) |
tree |
if TRUE, restrict inference on trees (MCMC not included) |
burnin |
number of iterations to be discarded prior to analyzing the posterior distribution of the mcmc |
hastings |
if set to TRUE, the Hastings ratio is calculated |
nodeswitch |
if set to TRUE, node switching is allowed as a move, additional to the edge moves |
postgaps |
can be set to numeric. Determines after how many iterations the next Phi mixture is added to the Phi edge Frequency tracker in the mcmc |
penalized |
if set to TRUE, the penalized likelihood will be used for the mcmc. Per default this is FALSE, since no component learning is involved and sparcity is hence not enforced |
accept_range |
the random probability the acceptance probability is compared to (default: 1) |
... |
arguments to function nem |
object of class mnem
comp |
list of the component with each component being a list of the causal network phi and the E-gene attachment theta |
data |
input data matrix |
limits |
list of results for all indpendent searches |
ll |
log likelihood of the best model |
lls |
log likelihood ascent of the best model search |
mw |
vector with mixture weights |
probs |
kxl matrix containing the cell log likelihoods of the model |
Martin Pirkl
sim <- simData(Sgenes = 3, Egenes = 2, Nems = 2, mw = c(0.4,0.6)) data <- (sim$data - 0.5)/0.5 data <- data + rnorm(length(data), 0, 1) result <- mnem(data, k = 2, starts = 1)
sim <- simData(Sgenes = 3, Egenes = 2, Nems = 2, mw = c(0.4,0.6)) data <- (sim$data - 0.5)/0.5 data <- data + rnorm(length(data), 0, 1) result <- mnem(data, k = 2, starts = 1)
This function does a hierarchical mixture. That means it uses the approximate BIC to check, if there are more than one component. It recursively splits the data if there is evidence for k > 1 components.
mnemh(data, k = 2, logtype = 2, getprobspars = list(), ...)
mnemh(data, k = 2, logtype = 2, getprobspars = list(), ...)
data |
data matrix either binary or log odds |
k |
number of maximal components for each hierarchy leaf |
logtype |
log type of the data |
getprobspars |
list of parameters for the getProbs function |
... |
additional parameters for the mnem function |
object of class mnem
Martin Pirkl
sim <- simData(Sgenes = 3, Egenes = 2, Nems = 2, mw = c(0.4,0.6)) data <- (sim$data - 0.5)/0.5 data <- data + rnorm(length(data), 0, 1) result <- mnemh(data, starts = 1, k = 1)
sim <- simData(Sgenes = 3, Egenes = 2, Nems = 2, mw = c(0.4,0.6)) data <- (sim$data - 0.5)/0.5 data <- data + rnorm(length(data), 0, 1) result <- mnemh(data, starts = 1, k = 1)
High level function for learning the number of components k, if unknown.
mnemk( D, ks = seq_len(5), man = FALSE, degree = 4, logtype = 2, pen = 2, useF = FALSE, Fnorm = FALSE, ... )
mnemk( D, ks = seq_len(5), man = FALSE, degree = 4, logtype = 2, pen = 2, useF = FALSE, Fnorm = FALSE, ... )
D |
data with cells indexing the columns and features (E-genes) indexing the rows |
ks |
vector of number of components k to test |
man |
logical. manual data penalty, e.g. man=TRUE and pen=2 for an approximation of the Akaike Information Criterion |
degree |
different degree of penalty for complexity: positive entries of transitively reduced phis or phi^r (degree=0), phi^r and mixture components minus one k-1 (1), phi^r, k-1 and positive entries of thetas (2), positive entries of transitively closed phis or phi^t, k-1 (3), phi^t, theta, k-1 (4, default), all entries of phis, thetas and k-1 (5) |
logtype |
logarithm type of the data (e.g. 2 for log2 data or exp(1) for natural) |
pen |
penalty weight for the data (e.g. pen=2 for approximate Akaike Information Criterion) |
useF |
use F (see publication) as complexity instead of phi and theta |
Fnorm |
normalize complexity of F, i.e. if two components have the same entry in F, it is only counted once |
... |
additional parameters for the mnem main function |
list containing the result of the best k as an mnem object and the raw and penalized log likelihoods
Martin Pirkl
sim <- simData(Sgenes = 3, Egenes = 2, Nems = 2, mw = c(0.4,0.6)) data <- (sim$data - 0.5)/0.5 data <- data + rnorm(length(data), 0, 1) result <- mnemk(data, ks = seq_len(2), starts = 1)
sim <- simData(Sgenes = 3, Egenes = 2, Nems = 2, mw = c(0.4,0.6)) data <- (sim$data - 0.5)/0.5 data <- data + rnorm(length(data), 0, 1) result <- mnemk(data, ks = seq_len(2), starts = 1)
Plots a boxplots plus x-axis randomised scatter and mirrored densities to visualise a distribution.
moreboxplot( x, box = TRUE, dens = TRUE, scatter = "no", polygon = TRUE, sd = 0.1, dcol = NULL, scol = NULL, dlty = 1, dlwd = 1, spch = 1, gcol = rgb(0, 0, 0, 0.5), glty = 2, glen = 2001, gmin = -100, gmax = 100, ... )
moreboxplot( x, box = TRUE, dens = TRUE, scatter = "no", polygon = TRUE, sd = 0.1, dcol = NULL, scol = NULL, dlty = 1, dlwd = 1, spch = 1, gcol = rgb(0, 0, 0, 0.5), glty = 2, glen = 2001, gmin = -100, gmax = 100, ... )
x |
list, matrix or data.frame |
box |
if TRUE, draws boxes |
dens |
if TRUE, draws densities |
scatter |
if set to "random", draws x-axis randomised scatter points |
polygon |
if TRUE, filles the densities |
sd |
standard deviation of the scatter |
dcol |
color of the densities |
scol |
color of the scatter points |
dlty |
line type of the densities |
dlwd |
line width of the densities |
spch |
type of scatter points |
gcol |
color of the grid |
glty |
line type of the grid |
glen |
length of the grid |
gmin |
minimal point of the grid |
gmax |
maximal point of the grid |
... |
optional parameters for boxplot or plot |
transitively closed matrix or graphNEL
Martin Pirkl
D <- matrix(rnorm(100*3), 100, 3) moreboxplot(D)
D <- matrix(rnorm(100*3), 100, 3) moreboxplot(D)
Infers a signalling pathway from perturbation experiments.
nem( D, search = "greedy", start = NULL, method = "llr", marginal = FALSE, parallel = NULL, reduce = FALSE, weights = NULL, runs = 1, verbose = FALSE, redSpace = NULL, trans.close = TRUE, subtopo = NULL, prior = NULL, ratio = TRUE, domean = TRUE, modulesize = 5, fpfn = c(0.1, 0.1), Rho = NULL, logtype = 2, modified = FALSE, tree = FALSE, learnRates = FALSE, stepSize = 0.01, ... )
nem( D, search = "greedy", start = NULL, method = "llr", marginal = FALSE, parallel = NULL, reduce = FALSE, weights = NULL, runs = 1, verbose = FALSE, redSpace = NULL, trans.close = TRUE, subtopo = NULL, prior = NULL, ratio = TRUE, domean = TRUE, modulesize = 5, fpfn = c(0.1, 0.1), Rho = NULL, logtype = 2, modified = FALSE, tree = FALSE, learnRates = FALSE, stepSize = 0.01, ... )
D |
data matrix with observed genes as rows and knock-down experiments as columns |
search |
either "greedy", "modules" or "exhaustive" (not recommended for more than five S-genes) |
start |
either NULL ("null") or a specific network to start the greedy |
method |
"llr" for log odds or p-values densities or "disc" for binary data |
marginal |
logical to compute the marginal likelihood (TRUE) |
parallel |
NULL for no parallel optimization or an integer for the number of threads |
reduce |
reduce search space (TRUE) for exhaustive search |
weights |
a numeric vector of weights for the columns of D |
runs |
the number of runs for the greedy search |
verbose |
for verbose output (TRUE) |
redSpace |
reduced search space for exhaustive search; see result of exhaustive search with reduce = TRUE |
trans.close |
if TRUE uses the transitive closure of adj |
subtopo |
optional matrix with the subtopology theta as adjacency matrix |
prior |
a prior network matrix for adj |
ratio |
if FALSE uses alternative distance for the model score |
domean |
if TRUE summarizes duplicate columns |
modulesize |
the max number of S-genes included in one module for search = "modules" |
fpfn |
numeric vector of length two with false positive and false negative rates |
Rho |
optional perturbation matrix |
logtype |
log base of the log odds |
modified |
if TRUE, assumes a preprocessed data matrix |
tree |
if TRUE forces tree; does not allow converging edges |
learnRates |
if TRUE learns rates for false positives/negatives |
stepSize |
numerical step size for learning rates |
... |
optional parameters for future search methods |
transitively closed matrix or graphNEL
Martin Pirkl
D <- matrix(rnorm(100*3), 100, 3) colnames(D) <- 1:3 rownames(D) <- 1:100 adj <- diag(3) colnames(adj) <- rownames(adj) <- 1:3 scoreAdj(D, adj)
D <- matrix(rnorm(100*3), 100, 3) colnames(D) <- 1:3 rownames(D) <- 1:100 adj <- diag(3) colnames(adj) <- rownames(adj) <- 1:3 scoreAdj(D, adj)
Plot bootstrap mnem result.
## S3 method for class 'bootmnem' plot(x, reduce = TRUE, ...)
## S3 method for class 'bootmnem' plot(x, reduce = TRUE, ...)
x |
bootmnem object |
reduce |
if TRUE transitively reduces the graphs |
... |
additional parameters for the plotting function plotDNF |
visualization of bootstrap mnem result with Rgraphviz
Martin Pirkl
sim <- simData(Sgenes = 3, Egenes = 2, Nems = 2, mw = c(0.4,0.6)) data <- (sim$data - 0.5)/0.5 data <- data + rnorm(length(data), 0, 1) result <- mnem(data, k = 2, starts = 1) boot <- bootstrap(result, size = 2) plot(boot)
sim <- simData(Sgenes = 3, Egenes = 2, Nems = 2, mw = c(0.4,0.6)) data <- (sim$data - 0.5)/0.5 data <- data + rnorm(length(data), 0, 1) result <- mnem(data, k = 2, starts = 1) boot <- bootstrap(result, size = 2) plot(boot)
Plot mnem result.
## S3 method for class 'mnem' plot( x, oma = c(3, 1, 1, 3), main = "M&NEM", anno = TRUE, cexAnno = 1, scale = NULL, global = TRUE, egenes = TRUE, sep = FALSE, tsne = FALSE, affinity = 0, logtype = 2, cells = TRUE, pch = ".", legend = FALSE, showdata = FALSE, bestCell = TRUE, showprobs = FALSE, shownull = TRUE, ratio = TRUE, method = "llr", marginal = FALSE, showweights = TRUE, ... )
## S3 method for class 'mnem' plot( x, oma = c(3, 1, 1, 3), main = "M&NEM", anno = TRUE, cexAnno = 1, scale = NULL, global = TRUE, egenes = TRUE, sep = FALSE, tsne = FALSE, affinity = 0, logtype = 2, cells = TRUE, pch = ".", legend = FALSE, showdata = FALSE, bestCell = TRUE, showprobs = FALSE, shownull = TRUE, ratio = TRUE, method = "llr", marginal = FALSE, showweights = TRUE, ... )
x |
mnem object |
oma |
outer margin |
main |
main text |
anno |
annotate cells by their perturbed gene |
cexAnno |
text size of the cell annotations |
scale |
scale cells to show relative and not absolute distances |
global |
if TRUE clusters all cells, if FALSE clusters cells within a component |
egenes |
show egene attachments, i.e. number of E-genes assigned to each S-gene |
sep |
seperate clusters and not put them on top of each other for better visualization |
tsne |
if TRUE use tsne instead of pca |
affinity |
use hard clustering if TRUE |
logtype |
logarithm type of the data (e.g. 2 for log2 data or exp(1) for natural) |
cells |
show cell attachments, .i.e how many cells are assigned to each S-gene |
pch |
cell symbol |
legend |
show legend |
showdata |
show data if TRUE |
bestCell |
show probability of best fitting cell for each S-gene |
showprobs |
if TRUE, shows responsibilities for all cells and components |
shownull |
if TRUE, shows the null node |
ratio |
use log ratios (TRUE) or foldchanges (FALSE) |
method |
"llr" for ratios |
marginal |
logical to compute the marginal likelihood (TRUE) |
showweights |
if TRUE, shows mixture weights for all components |
... |
additional parameters |
visualization of mnem result with Rgraphviz
Martin Pirkl
sim <- simData(Sgenes = 3, Egenes = 2, Nems = 2, mw = c(0.4,0.6)) data <- (sim$data - 0.5)/0.5 data <- data + rnorm(length(data), 0, 1) result <- mnem(data, k = 2, starts = 1) plot(result)
sim <- simData(Sgenes = 3, Egenes = 2, Nems = 2, mw = c(0.4,0.6)) data <- (sim$data - 0.5)/0.5 data <- data + rnorm(length(data), 0, 1) result <- mnem(data, k = 2, starts = 1) plot(result)
Plot mnem_mcmc result.
## S3 method for class 'mnem_mcmc' plot(x, starts = NULL, burnin = 0, ...)
## S3 method for class 'mnem_mcmc' plot(x, starts = NULL, burnin = 0, ...)
x |
mnem_mcmc object |
starts |
restarts of mcmc as used in mnem function |
burnin |
number of iteration to start from |
... |
parameters for function ggplot2 |
visualization of mcmc result with Rgraphviz
Viktoria Brunner
sim <- simData(Sgenes = 3, Egenes = 2, Nems = 2, mw = c(0.4,0.6)) data <- (sim$data - 0.5)/0.5 data <- data + rnorm(length(data), 0, 1) result <- mnem(data, k = 2, starts = 1) plot(result)
sim <- simData(Sgenes = 3, Egenes = 2, Nems = 2, mw = c(0.4,0.6)) data <- (sim$data - 0.5)/0.5 data <- data + rnorm(length(data), 0, 1) result <- mnem(data, k = 2, starts = 1) plot(result)
Plot simulated mixture.
## S3 method for class 'mnem_sim' plot(x, data = NULL, logtype = 2, fuzzypars = list(), ...)
## S3 method for class 'mnem_sim' plot(x, data = NULL, logtype = 2, fuzzypars = list(), ...)
x |
mnem_sim object |
data |
noisy data matrix (optional) |
logtype |
logarithm type of the data |
fuzzypars |
list of parameters for the function fuzzyindex |
... |
additional parameters for the plotting function plotDNF |
visualization of simulated mixture with Rgraphviz
Martin Pirkl
sim <- simData(Sgenes = 3, Egenes = 2, Nems = 2, mw = c(0.4,0.6)) plot(sim)
sim <- simData(Sgenes = 3, Egenes = 2, Nems = 2, mw = c(0.4,0.6)) plot(sim)
Generic function plotting convergence diagnostics for different methods.
plotConvergence(x, ...)
plotConvergence(x, ...)
x |
object with convergence statistics |
... |
additional parameters for the specific object type |
plot of EM convergence
Martin Pirkl
sim <- simData(Sgenes = 3, Egenes = 2, Nems = 2, mw = c(0.4,0.6)) data <- (sim$data - 0.5)/0.5 data <- data + rnorm(length(data), 0, 1) result <- mnem(data, k = 2, starts = 1) par(mfrow=c(2,2)) plotConvergence(result)
sim <- simData(Sgenes = 3, Egenes = 2, Nems = 2, mw = c(0.4,0.6)) data <- (sim$data - 0.5)/0.5 data <- data + rnorm(length(data), 0, 1) result <- mnem(data, k = 2, starts = 1) par(mfrow=c(2,2)) plotConvergence(result)
This function plots the convergence of the different EM iterations (four figures, e.g. par(mfrow=(2,2))).
## S3 method for class 'mnem' plotConvergence(x, col = NULL, type = "b", convergence = 0.1, ...)
## S3 method for class 'mnem' plotConvergence(x, col = NULL, type = "b", convergence = 0.1, ...)
x |
mnem object |
col |
vector of colors for the iterations |
type |
see ?plot.default |
convergence |
difference of when two log likelihoods are considered equal; see also convergence for the function mnem() |
... |
additional parameters for the plots/lines functions |
plot of EM convergence
Martin Pirkl
sim <- simData(Sgenes = 3, Egenes = 2, Nems = 2, mw = c(0.4,0.6)) data <- (sim$data - 0.5)/0.5 data <- data + rnorm(length(data), 0, 1) result <- mnem(data, k = 2, starts = 1) par(mfrow=c(2,2)) plotConvergence(result)
sim <- simData(Sgenes = 3, Egenes = 2, Nems = 2, mw = c(0.4,0.6)) data <- (sim$data - 0.5)/0.5 data <- data + rnorm(length(data), 0, 1) result <- mnem(data, k = 2, starts = 1) par(mfrow=c(2,2)) plotConvergence(result)
This function visualizes a graph encoded as a disjunctive nromal form. See the graphviz documentation for possible input arguments, like edgehead/tail: https://graphviz.org/docs/attr-types/arrowType/
plotDnf( dnf = NULL, freq = NULL, stimuli = c(), signals = c(), inhibitors = c(), connected = TRUE, CNOlist = NULL, cex = NULL, fontsize = NULL, labelsize = NULL, type = 2, lwd = 1, edgelwd = 1, legend = 0, x = 0, y = 0, xjust = 0, yjust = 0, width = 1, height = 1, layout = "dot", main = "", sub = "", cex.main = 1.5, cex.sub = 1, col.sub = "grey", fontcolor = NULL, nodestates = NULL, simulate = NULL, edgecol = NULL, labels = NULL, labelcol = "blue", nodelabel = NULL, nodecol = NULL, bordercol = NULL, nodeshape = NULL, verbose = FALSE, edgestyle = NULL, nodeheight = NULL, nodewidth = NULL, edgewidth = NULL, lty = NULL, hierarchy = NULL, showall = FALSE, edgehead = NULL, edgelabel = NULL, edgetail = NULL, bool = TRUE, draw = TRUE, ... )
plotDnf( dnf = NULL, freq = NULL, stimuli = c(), signals = c(), inhibitors = c(), connected = TRUE, CNOlist = NULL, cex = NULL, fontsize = NULL, labelsize = NULL, type = 2, lwd = 1, edgelwd = 1, legend = 0, x = 0, y = 0, xjust = 0, yjust = 0, width = 1, height = 1, layout = "dot", main = "", sub = "", cex.main = 1.5, cex.sub = 1, col.sub = "grey", fontcolor = NULL, nodestates = NULL, simulate = NULL, edgecol = NULL, labels = NULL, labelcol = "blue", nodelabel = NULL, nodecol = NULL, bordercol = NULL, nodeshape = NULL, verbose = FALSE, edgestyle = NULL, nodeheight = NULL, nodewidth = NULL, edgewidth = NULL, lty = NULL, hierarchy = NULL, showall = FALSE, edgehead = NULL, edgelabel = NULL, edgetail = NULL, bool = TRUE, draw = TRUE, ... )
dnf |
Hyper-graph in disjunctive normal form, e.g. c("A=B", "A=C+D", "E=!B") with the child on the left and the parents on the right of the equation with "A=C+D" for A = C AND D. Alternatively, dnf can be an adjacency matrix, which is converted on the fly to a disjunctive normal form. |
freq |
Frequency of hyper-edges which are placed on the edges. |
stimuli |
Highlights vertices which can be stimulated. |
signals |
Highlights vertices which regulate E-genes. |
inhibitors |
Highlights vertices which can be inhibited. |
connected |
If TRUE, only includes vertices which are connected to other vertices. |
CNOlist |
CNOlist object. Optional instead of stimuli, inhibitors or signals. See package CellNOptR. |
cex |
Global font size. |
fontsize |
Vertice label size. |
labelsize |
Edge label size. |
type |
Different plot types. 2 for Rgraphviz and 1 for graph. |
lwd |
Line width of nodeborder. |
edgelwd |
Global edgeline width. |
legend |
0 shows no legend. 1 shows legend as a graph. 2 shows legend in a standard box. |
x |
x coordinate of box legend. |
y |
y coordinate of box legend. |
xjust |
Justification of legend box left, right or center (-1,1,0). |
yjust |
Justification of legend box top, bottom or middle (-1,1,0). |
width |
Vertice width. |
height |
Vertice height. |
layout |
Graph layout. See graphvizCapabilities()$layoutTypes. |
main |
Main title. |
sub |
Subtitle. |
cex.main |
Main title font size. |
cex.sub |
Subtitle font size. |
col.sub |
Font color of subtitle. |
fontcolor |
Global font color. |
nodestates |
Binary state of each vertice. |
simulate |
Simulate stimulation and inhibition of a list of vertices. E.g. simulate = list(stimuli = c("A", "B"), inhibitors = c("C", "D")). |
edgecol |
Vector with colors for every edge of the graph (not hyper-graph). E.g. an AND gate consists of three distinct edges. |
labels |
Vector with labels for the edges. |
labelcol |
Vector with label colors for the edges. |
nodelabel |
List of vertices with labels as input. E.g. labels = list(A="test", B="label for B"). |
nodecol |
List of vertices with colors as input. |
bordercol |
List of vertices with colors as input. |
nodeshape |
List of vertices with shapes (diamond, box, square,...). |
verbose |
Verbose output. |
edgestyle |
set the edge style like dashed, can be numerical |
nodeheight |
List of vertices with height as input. |
nodewidth |
List of vertices with width as input. |
edgewidth |
Vector with edge widths for individual edges. |
lty |
Vector with edge styles (line, dotted,...). |
hierarchy |
List with the hierarchy of the vertices. E.g. list(top = c("A", "B"), bottom = c("C", "D")). |
showall |
See "connected" above. |
edgehead |
Vector with edge heads. |
edgelabel |
Vector with edge labels. |
edgetail |
Vector with edge tails. |
bool |
If TRUE, only shows normal graph and no AND gates. |
draw |
Do not plot the graph and only output the graphNEL object. |
... |
additional arguments |
Rgraphviz object
Martin Pirkl
g <- c("!A+B+C=G", "C=G", "!D=G") plotDnf(g)
g <- c("!A+B+C=G", "C=G", "!D=G") plotDnf(g)
Computes the fit (score of a network) of the data given a network matrix
scoreAdj( D, adj, method = "llr", marginal = FALSE, logtype = 2, weights = NULL, trans.close = TRUE, subtopo = NULL, prior = NULL, ratio = TRUE, fpfn = c(0.1, 0.1), Rho = NULL, dotopo = FALSE, P = NULL, oldadj = NULL, modified = TRUE )
scoreAdj( D, adj, method = "llr", marginal = FALSE, logtype = 2, weights = NULL, trans.close = TRUE, subtopo = NULL, prior = NULL, ratio = TRUE, fpfn = c(0.1, 0.1), Rho = NULL, dotopo = FALSE, P = NULL, oldadj = NULL, modified = TRUE )
D |
data matrix; use modified = FALSE |
adj |
adjacency matrix of the network phi |
method |
either llr if D consists of log odds or disc, if D is binary |
marginal |
logical to compute the marginal likelihood (TRUE) |
logtype |
log base of the log odds |
weights |
a numeric vector of weights for the columns of D |
trans.close |
if TRUE uses the transitive closure of adj |
subtopo |
optional matrix with the subtopology theta as adjacency matrix |
prior |
a prior network matrix for adj |
ratio |
if FALSE uses alternative distance for the model score |
fpfn |
numeric vector of length two with false positive and false negative rates |
Rho |
optional perturbation matrix |
dotopo |
if TRUE computes and returns the subtopology theta (optional) |
P |
previous score matrix (only used internally) |
oldadj |
previous adjacency matrix (only used internally) |
modified |
if TRUE, assumes a prepocessed data matrix |
transitively closed matrix or graphNEL
Martin Pirkl
D <- matrix(rnorm(100*3), 100, 3) colnames(D) <- 1:3 rownames(D) <- 1:100 adj <- diag(3) colnames(adj) <- rownames(adj) <- 1:3 scoreAdj(D, adj)
D <- matrix(rnorm(100*3), 100, 3) colnames(D) <- 1:3 rownames(D) <- 1:100 adj <- diag(3) colnames(adj) <- rownames(adj) <- 1:3 scoreAdj(D, adj)
This function simulates single cell data from a random mixture of networks.
simData( Sgenes = 5, Egenes = 1, Nems = 2, reps = NULL, mw = NULL, evolution = FALSE, nCells = 1000, uninform = 0, unitheta = FALSE, edgeprob = c(0, 1), multi = FALSE, subsample = 1, scalefree = FALSE, badCells = 0, exactProb = TRUE, tree = FALSE, ... )
simData( Sgenes = 5, Egenes = 1, Nems = 2, reps = NULL, mw = NULL, evolution = FALSE, nCells = 1000, uninform = 0, unitheta = FALSE, edgeprob = c(0, 1), multi = FALSE, subsample = 1, scalefree = FALSE, badCells = 0, exactProb = TRUE, tree = FALSE, ... )
Sgenes |
number of Sgenes |
Egenes |
number of Egenes |
Nems |
number of components |
reps |
number of replicates, if set (not realistic for cells) |
mw |
mixture weights (has to be vector of length Nems) |
evolution |
evolving and not purely random network, if set to TRUE |
nCells |
number of cells |
uninform |
number of uninformative Egenes |
unitheta |
uniform theta, if TRUE |
edgeprob |
edge probability, value between 0 and 1 for sparse or dense networks or a range c(l,u) with lower and upper bound |
multi |
a vector with the percentages of cell with multiple perturbations, e.g. c(0.2,0.1,0) for 20 no quadruple knock-downs |
subsample |
range to subsample data. 1 means the full simulated data is used |
scalefree |
if TRUE, graph is scale free |
badCells |
number of cells, which are just noise and not connected to the ground truth network |
exactProb |
logical; if TRUE generates random network with exact fraction of edges provided by edgeprob |
tree |
if TRUE, restricts dag to a tree |
... |
additional parameters for the scale free network sampler (see 'nem' package) |
simulation object with meta information and data
Nem |
list of adjacency matrixes generatign the data |
theta |
E-gene attachaments |
data |
data matrix |
index |
index for which Nem generated which cell (data column) |
mw |
vector of input mixture weights |
Martin Pirkl
sim <- simData(Sgenes = 3, Egenes = 2, Nems = 2, mw = c(0.4,0.6))
sim <- simData(Sgenes = 3, Egenes = 2, Nems = 2, mw = c(0.4,0.6))
Computes the transitive closure of a dag or only of a deletion/addition of an edge
transitive.closure(g, u = NULL, v = NULL)
transitive.closure(g, u = NULL, v = NULL)
g |
graph as matrix or graphNEL object |
u |
index of the parent of an edge (optional) |
v |
index of the child of an edge (optional) |
transitively closed matrix or graphNEL
Martin Pirkl
g <- matrix(c(0,0,0,1,0,0,0,1,0), 3) transitive.closure(g)
g <- matrix(c(0,0,0,1,0,0,0,1,0), 3) transitive.closure(g)
Computes the transitive reduction of an adjacency matrix or graphNEL object. Originally imported from the package 'nem'.
transitive.reduction(g)
transitive.reduction(g)
g |
adjacency matrix or graphNEL object |
transitively reduced adjacency matrix
Holger Froehlich
R. Sedgewick, Algorithms, Pearson, 2002.
g <- matrix(c(0,0,0,1,0,0,0,1,0), 3) rownames(g) <- colnames(g) <- seq_len(3) g.tr <- transitive.reduction(g)
g <- matrix(c(0,0,0,1,0,0,0,1,0), 3) rownames(g) <- colnames(g) <- seq_len(3) g.tr <- transitive.reduction(g)