Title: | Network Centrality Metrics for Elastic-Net Regularized Models |
---|---|
Description: | glmSparseNet is an R-package that generalizes sparse regression models when the features (e.g. genes) have a graph structure (e.g. protein-protein interactions), by including network-based regularizers. glmSparseNet uses the glmnet R-package, by including centrality measures of the network as penalty weights in the regularization. The current version implements regularization based on node degree, i.e. the strength and/or number of its associated edges, either by promoting hubs in the solution or orphan genes in the solution. All the glmnet distribution families are supported, namely "gaussian", "poisson", "binomial", "multinomial", "cox", and "mgaussian". |
Authors: | André Veríssimo [aut, cre] , Susana Vinga [aut], Eunice Carrasquinha [ctb], Marta Lopes [ctb] |
Maintainer: | André Veríssimo <[email protected]> |
License: | GPL-3 |
Version: | 1.25.0 |
Built: | 2024-12-29 06:11:39 UTC |
Source: | https://github.com/bioc/glmSparseNet |
Change base dir for '.runCache
.baseDir(path = NULL)
.baseDir(path = NULL)
path |
to base directory where cache is saved |
the new path
glmSparseNet:::.baseDir("/tmp/cache")
glmSparseNet:::.baseDir("/tmp/cache")
Common call to biomaRt to avoid repetitive code
.biomartLoad(attributes, filters, values, useCache, verbose)
.biomartLoad(attributes, filters, values, useCache, verbose)
attributes |
Attributes you want to retrieve. A possible list of attributes can be retrieved using the function biomaRt::listAttributes. |
filters |
Filters (one or more) that should be used in the query. A possible list of filters can be retrieved using the function biomaRt::listFilters. |
values |
Values of the filter, e.g. vector of affy IDs. If multiple filters are specified then the argument should be a list of vectors of which the position of each vector corresponds to the position of the filters in the filters argument |
useCache |
Boolean indicating if biomaRt cache should be used |
verbose |
When using biomaRt in webservice mode and setting verbose to TRUE, the XML query to the webservice will be printed. |
data.frame with attributes as columns and values translated to them
geneNames
ensemblGeneNames
protein2EnsemblGeneNames
biomaRt::getBM()
biomaRt::useEnsembl()
glmSparseNet:::.biomartLoad( attributes = c("external_gene_name", "ensembl_gene_id"), filters = "external_gene_name", values = c("MOB1A", "RFLNB", "SPIC", "TP53"), useCache = TRUE, verbose = FALSE )
glmSparseNet:::.biomartLoad( attributes = c("external_gene_name", "ensembl_gene_id"), filters = "external_gene_name", values = c("MOB1A", "RFLNB", "SPIC", "TP53"), useCache = TRUE, verbose = FALSE )
Build digest of function from the actual code
.buildFunctionDigest(fun)
.buildFunctionDigest(fun)
fun |
function call name |
a digest
glmSparseNet:::.buildFunctionDigest(sum) glmSparseNet:::.buildFunctionDigest(c)
glmSparseNet:::.buildFunctionDigest(sum) glmSparseNet:::.buildFunctionDigest(c)
Change cache.compression for run_cache
.cacheCompression(compression = NULL)
.cacheCompression(compression = NULL)
compression |
see compression parameter in save function |
the new compression
glmSparseNet:::.cacheCompression("bzip2")
glmSparseNet:::.cacheCompression("bzip2")
Internal method to calculate the network using data-dependant methods
.calcPenalty(xdata, penaltyType, options = networkOptions())
.calcPenalty(xdata, penaltyType, options = networkOptions())
xdata |
input data |
penaltyType |
which method to use |
options |
options to be used |
vector with penalty weights
xdata <- matrix(rnorm(1000), ncol = 200) glmSparseNet:::.calcPenalty(xdata, "none") glmSparseNet:::.calcPenalty( xdata, "correlation", networkOptions(cutoff = .6) ) glmSparseNet:::.calcPenalty(xdata, "correlation") glmSparseNet:::.calcPenalty( xdata, "covariance", networkOptions(cutoff = .6) ) glmSparseNet:::.calcPenalty(xdata, "covariance")
xdata <- matrix(rnorm(1000), ncol = 200) glmSparseNet:::.calcPenalty(xdata, "none") glmSparseNet:::.calcPenalty( xdata, "correlation", networkOptions(cutoff = .6) ) glmSparseNet:::.calcPenalty(xdata, "correlation") glmSparseNet:::.calcPenalty( xdata, "covariance", networkOptions(cutoff = .6) ) glmSparseNet:::.calcPenalty(xdata, "covariance")
This is where the actual work is done
.calculateResult(path, compression, forceRecalc, showMessage, fun, ...)
.calculateResult(path, compression, forceRecalc, showMessage, fun, ...)
path |
path to save cache |
compression |
compression used in save |
forceRecalc |
force to recalculate cache |
showMessage |
boolean to show messages |
fun |
function to be called |
... |
arguments to said function , |
result of fun(...)
glmSparseNet:::.calculateResult( file.path(tempdir(), "calculate_result.Rdata"), "gzip", FALSE, TRUE, sum, 1, 2, 3 )
glmSparseNet:::.calculateResult( file.path(tempdir(), "calculate_result.Rdata"), "gzip", FALSE, TRUE, sum, 1, 2, 3 )
Please note that all the interactions have duplicates as it's a two way interaction (score(ProteinA-Protein) == score(ProteinB, PorteinA))
.combinedScore(allInteractions, scoreThreshold, removeText)
.combinedScore(allInteractions, scoreThreshold, removeText)
allInteractions |
table with score of all interactions |
scoreThreshold |
threshold to keep interactions |
removeText |
remove text-based interactions |
To better understand how the score is calculated, please see: https://string-db.org/help/faq/#how-are-the-scores-computed
table with combined score
Create directories for cache
.createDirectoryForCache(baseDir, parentPath)
.createDirectoryForCache(baseDir, parentPath)
baseDir |
tentative base dir to create. |
parentPath |
first 4 characters of digest that will become parent directory for the actual cache file (this reduces number of files per folder) |
a list of updated baseDir and parentDir
glmSparseNet:::.createDirectoryForCache(tempdir(), "abcd") glmSparseNet:::.createDirectoryForCache( file.path(getwd(), "run-cache"), "abcd" )
glmSparseNet:::.createDirectoryForCache(tempdir(), "abcd") glmSparseNet:::.createDirectoryForCache( file.path(getwd(), "run-cache"), "abcd" )
Should be solved in issue #39, will test to remove it.
.curlWorkaround(expr)
.curlWorkaround(expr)
expr |
expression |
result of expression
glmSparseNet:::.curlWorkaround({ biomaRt::useEnsembl( biomart = "genes", dataset = "hsapiens_gene_ensembl" ) })
glmSparseNet:::.curlWorkaround({ biomaRt::useEnsembl( biomart = "genes", dataset = "hsapiens_gene_ensembl" ) })
The assumption to use this function is that the network represented by a matrix is symetric and without any connection the node and itself.
.degreeGeneric( fun = stats::cor, funPrefix = "operator", xdata, cutoff = 0, considerUnweighted = FALSE, chunks = 1000, forceRecalcDegree = FALSE, forceRecalcNetwork = FALSE, nCores = 1, ... )
.degreeGeneric( fun = stats::cor, funPrefix = "operator", xdata, cutoff = 0, considerUnweighted = FALSE, chunks = 1000, forceRecalcDegree = FALSE, forceRecalcNetwork = FALSE, nCores = 1, ... )
fun |
function that will calculate the edge weight between 2 nodes |
funPrefix |
used to store low-level information on network as it can become to large to be stored in memory |
xdata |
calculate correlation matrix on each column |
cutoff |
positive value that determines a cutoff value |
considerUnweighted |
consider all edges as 1 if they are greater than 0 |
chunks |
calculate function at batches of this value (default is 1000) |
forceRecalcDegree |
force recalculation of penalty weights (but not the network), instead of going to cache |
forceRecalcNetwork |
force recalculation of network and penalty weights, instead of going to cache |
nCores |
number of cores to be used |
... |
extra parameters for fun |
a vector of the degrees
Sets a default caching algorithm to use with .runCache
.digestCache(val)
.digestCache(val)
val |
object to calculate hash over |
a hash of the sha256
glmSparseNet:::.digestCache(c(1, 2, 3, 4, 5)) glmSparseNet:::.digestCache("some example")
glmSparseNet:::.digestCache(c(1, 2, 3, 4, 5)) glmSparseNet:::.digestCache("some example")
Calculate the upper triu of the matrix
.networkGenericParallel( fun, funPrefix, xdata, buildOutput = "matrix", nCores = 1, forceRecalcNetwork = FALSE, showMessage = FALSE, ... )
.networkGenericParallel( fun, funPrefix, xdata, buildOutput = "matrix", nCores = 1, forceRecalcNetwork = FALSE, showMessage = FALSE, ... )
fun |
function that will calculate the edge weight between 2 nodes |
funPrefix |
used to store low-level information on network as it can become to large to be stored in memory |
xdata |
base data to calculate network |
buildOutput |
if output returns a 'matrix', 'vector' of the upper triu without the diagonal or NULL with any other argument |
nCores |
number of cores to be used |
forceRecalcNetwork |
force recalculation, instead of going to cache |
showMessage |
shows cache operation messages |
... |
extra parameters for fun |
depends on buildOutput parameter
Note that it assumes it does not calculate for index below and equal to ixI
.networkWorker(fun, xdata, ixI, ...)
.networkWorker(fun, xdata, ixI, ...)
fun |
function to be used, can be cor, cov or any other defined function |
xdata |
original data to calculate the function over |
ixI |
starting index, this can be used to save ony upper triu |
... |
extra parameters for fun |
a vector with size ncol(xdata) - ixI
This method saves the function that's being called
.runCache( fun, ..., seed = NULL, baseDir = NULL, cachePrefix = "generic_cache", cacheDigest = list(), showMessage = NULL, forceRecalc = FALSE, addToHash = NULL ) ## S4 method for signature 'function' .runCache( fun, ..., seed = NULL, baseDir = NULL, cachePrefix = "generic_cache", cacheDigest = list(), showMessage = NULL, forceRecalc = FALSE, addToHash = NULL )
.runCache( fun, ..., seed = NULL, baseDir = NULL, cachePrefix = "generic_cache", cacheDigest = list(), showMessage = NULL, forceRecalc = FALSE, addToHash = NULL ) ## S4 method for signature 'function' .runCache( fun, ..., seed = NULL, baseDir = NULL, cachePrefix = "generic_cache", cacheDigest = list(), showMessage = NULL, forceRecalc = FALSE, addToHash = NULL )
fun |
function call name |
... |
parameters for function call |
seed |
when function call is random, this allows to set seed beforehand |
baseDir |
directory where data is stored |
cachePrefix |
prefix for file name to be generated from parameters (...) |
cacheDigest |
cache of the digest for one or more of the parameters |
showMessage |
show message that data is being retrieved from cache |
forceRecalc |
force the recalculation of the values |
addToHash |
something to add to the filename generation |
the result of fun(...)
.runCache(`function`)
: accepts function as first argument and save cache
# [optional] save cache in a temporary directory # glmSparseNet:::.baseDir(tempdir()) glmSparseNet:::.runCache(c, 1, 2, 3, 4) # # next three should use the same cache # note, the middle call should be a little faster as digest is not # calculated # for the first argument glmSparseNet:::.runCache(c, 1, 2, 3, 4) glmSparseNet:::.runCache(c, a = 1, 2, c = 3, 4) # Using a local folder # glmSparseNet:::.runCache(c, 1, 2, 3, 4, baseDir = "runcache")
# [optional] save cache in a temporary directory # glmSparseNet:::.baseDir(tempdir()) glmSparseNet:::.runCache(c, 1, 2, 3, 4) # # next three should use the same cache # note, the middle call should be a little faster as digest is not # calculated # for the first argument glmSparseNet:::.runCache(c, 1, 2, 3, 4) glmSparseNet:::.runCache(c, a = 1, 2, c = 3, 4) # Using a local folder # glmSparseNet:::.runCache(c, 1, 2, 3, 4, baseDir = "runcache")
Saving the cache
.saveRunCache(result, path, compression, showMessage)
.saveRunCache(result, path, compression, showMessage)
result |
main result to save |
path |
path to the file to save |
compression |
compression method to be used |
showMessage |
TRUE to show messages, FALSE otherwise |
result of save operation
glmSparseNet:::.saveRunCache( 35, file.path(tempdir(), "save_run_cache.Rdata"), FALSE, TRUE )
glmSparseNet:::.saveRunCache( 35, file.path(tempdir(), "save_run_cache.Rdata"), FALSE, TRUE )
Show messages option in .runCache
.showMessage(showMessage = NULL)
.showMessage(showMessage = NULL)
showMessage |
boolean indicating to show messages or not |
the show.message option
glmSparseNet:::.showMessage(FALSE)
glmSparseNet:::.showMessage(FALSE)
Temporary directory for runCache
.tempdirCache()
.tempdirCache()
a path to a temporary directory used by runCache
Write a file in run-cache directory to explain the origin
.writeReadme(baseDir)
.writeReadme(baseDir)
baseDir |
directory where to build this file |
the path to the file it has written
glmSparseNet:::.writeReadme(tempdir())
glmSparseNet:::.writeReadme(tempdir())
Create balanced folds for cross validation using stratified sampling
balancedCvFolds(..., nfolds = 10) # deprecated, please use balancedCvFolds() balanced.cv.folds(..., nfolds = 10)
balancedCvFolds(..., nfolds = 10) # deprecated, please use balancedCvFolds() balanced.cv.folds(..., nfolds = 10)
... |
vectors representing data |
nfolds |
number of folds to be created |
list with given input, nfolds and result. The result is a list matching the input with foldid attributed to each position.
balancedCvFolds(seq(10), seq(11, 15), nfolds = 2) # will give a warning balancedCvFolds(seq(10), seq(11, 13), nfolds = 10) balancedCvFolds(seq(100), seq(101, 133), nfolds = 10)
balancedCvFolds(seq(10), seq(11, 15), nfolds = 2) # will give a warning balancedCvFolds(seq(10), seq(11, 13), nfolds = 10) balancedCvFolds(seq(100), seq(101, 133), nfolds = 10)
Auxiliary function to generate suitable lambda parameters
buildLambda( lambdaLargest = NULL, xdata = NULL, ydata = NULL, family = NULL, ordersOfMagnitudeSmaller = 3, lambdaPerOrderMagnitude = 150, lambda.largest = deprecated(), orders.of.magnitude.smaller = deprecated(), lambda.per.order.magnitude = deprecated() )
buildLambda( lambdaLargest = NULL, xdata = NULL, ydata = NULL, family = NULL, ordersOfMagnitudeSmaller = 3, lambdaPerOrderMagnitude = 150, lambda.largest = deprecated(), orders.of.magnitude.smaller = deprecated(), lambda.per.order.magnitude = deprecated() )
lambdaLargest |
numeric value for largest number of lambda to consider (usually with a target of 1 selected variable) |
xdata |
X parameter for glmnet function |
ydata |
Y parameter for glmnet function |
family |
family parameter to glmnet function |
ordersOfMagnitudeSmaller |
minimum value for lambda (lambda.largest / 10^orders.of.magnitude.smaller) |
lambdaPerOrderMagnitude |
how many lambdas to create for each order of magnitude |
lambda.largest |
|
orders.of.magnitude.smaller |
|
lambda.per.order.magnitude |
a numeric vector with suitable lambdas
buildLambda(5.4)
buildLambda(5.4)
This can reduce the dimension of the original network, as there may not be a mapping between peptide and gene id
buildStringNetwork( stringTbl, useNames = c("protein", "ensembl", "external"), string.tbl = deprecated(), use.names = deprecated() )
buildStringNetwork( stringTbl, useNames = c("protein", "ensembl", "external"), string.tbl = deprecated(), use.names = deprecated() )
stringTbl |
|
useNames |
|
string.tbl |
|
use.names |
a new matrix with gene ids instead of peptide ids. The size of matrix can be different as there may not be a mapping or a peptide mapping can have multiple genes.
interactions <- stringDBhomoSapiens(scoreThreshold = 100) string_network <- buildStringNetwork(interactions) # number of edges sum(string_network != 0)
interactions <- stringDBhomoSapiens(scoreThreshold = 100) string_network <- buildStringNetwork(interactions) # number of edges sum(string_network != 0)
network parameter accepts:
cv.glmDegree( xdata, ydata, network, options = networkOptions(), experiment = NULL, network.options = deprecated(), experiment.name = deprecated(), ... ) cv.glmHub( xdata, ydata, network, options = networkOptions(), experiment = NULL, network.options = deprecated(), experiment.name = deprecated(), ... ) cv.glmOrphan( xdata, ydata, network, options = networkOptions(), experiment = NULL, network.options = deprecated(), experiment.name = deprecated(), ... ) cv.glmSparseNet( xdata, ydata, network, options = networkOptions(), experiment = NULL, network.options = deprecated(), experiment.name = deprecated(), ... )
cv.glmDegree( xdata, ydata, network, options = networkOptions(), experiment = NULL, network.options = deprecated(), experiment.name = deprecated(), ... ) cv.glmHub( xdata, ydata, network, options = networkOptions(), experiment = NULL, network.options = deprecated(), experiment.name = deprecated(), ... ) cv.glmOrphan( xdata, ydata, network, options = networkOptions(), experiment = NULL, network.options = deprecated(), experiment.name = deprecated(), ... ) cv.glmSparseNet( xdata, ydata, network, options = networkOptions(), experiment = NULL, network.options = deprecated(), experiment.name = deprecated(), ... )
xdata |
input data, can be a matrix or MultiAssayExperiment. |
ydata |
response data compatible with glmnet. |
network |
type of network, see below. |
options |
options to calculate network. |
experiment |
name of experiment to use as input in MultiAssayExperiment object (only if xdata is an object of this class). |
network.options |
|
experiment.name |
|
... |
parameters that |
string to calculate network based on data (correlation, covariance)
matrix representing the network
vector with already calculated penalty weights (can also be used directly glmnet)
an object just as cv.glmnet
cv.glmDegree()
: penalizes nodes with small degree
(inversion penalization h(x) = 1 / x
).
cv.glmHub()
: penalizes nodes with small degree
(normalized heuristic that promotes nodes with many edges).
cv.glmOrphan()
: penalizes nodes with high degree
(normalized heuristic that promotes nodes with few edges).
Model with the same penalizations glmSparseNet()
.
# Degree penalization xdata <- matrix(rnorm(100), ncol = 5) cv.glmDegree( xdata, rnorm(nrow(xdata)), "correlation", family = "gaussian", nfolds = 5, options = networkOptions(minDegree = .2) ) # Hub penalization xdata <- matrix(rnorm(100), ncol = 5) cv.glmHub( xdata, rnorm(nrow(xdata)), "correlation", family = "gaussian", nfolds = 5, options = networkOptions(minDegree = .2) ) # Orphan penalization xdata <- matrix(rnorm(100), ncol = 5) cv.glmOrphan( xdata, rnorm(nrow(xdata)), "correlation", family = "gaussian", nfolds = 5, options = networkOptions(minDegree = .2) ) # Gaussian model xdata <- matrix(rnorm(500), ncol = 5) cv.glmSparseNet( xdata, rnorm(nrow(xdata)), "correlation", family = "gaussian" ) cv.glmSparseNet( xdata, rnorm(nrow(xdata)), "covariance", family = "gaussian" ) # # # Using MultiAssayExperiment with survival model library(MultiAssayExperiment) data("miniACC", package = "MultiAssayExperiment") xdata <- miniACC # # build valid data with days of last follow up or to event event.ix <- which(!is.na(xdata$days_to_death)) cens.ix <- which(!is.na(xdata$days_to_last_followup)) xdata$surv_event_time <- array(NA, nrow(colData(xdata))) xdata$surv_event_time[event.ix] <- xdata$days_to_death[event.ix] xdata$surv_event_time[cens.ix] <- xdata$days_to_last_followup[cens.ix] # # Keep only valid individuals valid.ix <- as.vector(!is.na(xdata$surv_event_time) & !is.na(xdata$vital_status) & xdata$surv_event_time > 0) xdata.valid <- xdata[, rownames(colData(xdata))[valid.ix]] ydata.valid <- colData(xdata.valid)[, c("surv_event_time", "vital_status")] colnames(ydata.valid) <- c("time", "status") # cv.glmSparseNet( xdata.valid, ydata.valid, nfolds = 5, family = "cox", network = "correlation", experiment = "RNASeq2GeneNorm" )
# Degree penalization xdata <- matrix(rnorm(100), ncol = 5) cv.glmDegree( xdata, rnorm(nrow(xdata)), "correlation", family = "gaussian", nfolds = 5, options = networkOptions(minDegree = .2) ) # Hub penalization xdata <- matrix(rnorm(100), ncol = 5) cv.glmHub( xdata, rnorm(nrow(xdata)), "correlation", family = "gaussian", nfolds = 5, options = networkOptions(minDegree = .2) ) # Orphan penalization xdata <- matrix(rnorm(100), ncol = 5) cv.glmOrphan( xdata, rnorm(nrow(xdata)), "correlation", family = "gaussian", nfolds = 5, options = networkOptions(minDegree = .2) ) # Gaussian model xdata <- matrix(rnorm(500), ncol = 5) cv.glmSparseNet( xdata, rnorm(nrow(xdata)), "correlation", family = "gaussian" ) cv.glmSparseNet( xdata, rnorm(nrow(xdata)), "covariance", family = "gaussian" ) # # # Using MultiAssayExperiment with survival model library(MultiAssayExperiment) data("miniACC", package = "MultiAssayExperiment") xdata <- miniACC # # build valid data with days of last follow up or to event event.ix <- which(!is.na(xdata$days_to_death)) cens.ix <- which(!is.na(xdata$days_to_last_followup)) xdata$surv_event_time <- array(NA, nrow(colData(xdata))) xdata$surv_event_time[event.ix] <- xdata$days_to_death[event.ix] xdata$surv_event_time[cens.ix] <- xdata$days_to_last_followup[cens.ix] # # Keep only valid individuals valid.ix <- as.vector(!is.na(xdata$surv_event_time) & !is.na(xdata$vital_status) & xdata$surv_event_time > 0) xdata.valid <- xdata[, rownames(colData(xdata))[valid.ix]] ydata.valid <- colData(xdata.valid)[, c("surv_event_time", "vital_status")] colnames(ydata.valid) <- c("time", "status") # cv.glmSparseNet( xdata.valid, ydata.valid, nfolds = 5, family = "cox", network = "correlation", experiment = "RNASeq2GeneNorm" )
Calculate the degree of the correlation network based on xdata
degreeCor( xdata, cutoff = 0, considerUnweighted = FALSE, forceRecalcDegree = FALSE, forceRecalcNetwork = FALSE, nCores = 1, ..., consider.unweighted = deprecated(), force.recalc.degree = deprecated(), force.recalc.network = deprecated(), n.cores = deprecated() )
degreeCor( xdata, cutoff = 0, considerUnweighted = FALSE, forceRecalcDegree = FALSE, forceRecalcNetwork = FALSE, nCores = 1, ..., consider.unweighted = deprecated(), force.recalc.degree = deprecated(), force.recalc.network = deprecated(), n.cores = deprecated() )
xdata |
calculate correlation matrix on each column. |
cutoff |
positive value that determines a cutoff value. |
considerUnweighted |
consider all edges as 1 if they are greater than 0. |
forceRecalcDegree |
force recalculation of penalty weights (but not the network), instead of going to cache. |
forceRecalcNetwork |
force recalculation of network and penalty weights, instead of going to cache. |
nCores |
number of cores to be used. |
... |
extra parameters for cor function. |
consider.unweighted |
|
force.recalc.degree |
|
force.recalc.network |
|
n.cores |
a vector of the degrees.
n.col <- 6 xdata <- matrix(rnorm(n.col * 4), ncol = n.col) degreeCor(xdata) degreeCor(xdata, cutoff = .5) degreeCor(xdata, cutoff = .5, considerUnweighted = TRUE)
n.col <- 6 xdata <- matrix(rnorm(n.col * 4), ncol = n.col) degreeCor(xdata) degreeCor(xdata, cutoff = .5) degreeCor(xdata, cutoff = .5, considerUnweighted = TRUE)
Calculate the degree of the covariance network based on xdata
degreeCov( xdata, cutoff = 0, considerUnweighted = FALSE, forceRecalcDegree = FALSE, forceRecalcNetwork = FALSE, nCores = 1, ..., consider.unweighted = deprecated(), force.recalc.degree = deprecated(), force.recalc.network = deprecated(), n.cores = deprecated() )
degreeCov( xdata, cutoff = 0, considerUnweighted = FALSE, forceRecalcDegree = FALSE, forceRecalcNetwork = FALSE, nCores = 1, ..., consider.unweighted = deprecated(), force.recalc.degree = deprecated(), force.recalc.network = deprecated(), n.cores = deprecated() )
xdata |
calculate correlation matrix on each column. |
cutoff |
positive value that determines a cutoff value. |
considerUnweighted |
consider all edges as 1 if they are greater than 0. |
forceRecalcDegree |
force recalculation of penalty weights (but not the network), instead of going to cache. |
forceRecalcNetwork |
force recalculation of network and penalty weights, instead of going to cache. |
nCores |
number of cores to be used. |
... |
extra parameters for cov function. |
consider.unweighted |
|
force.recalc.degree |
|
force.recalc.network |
|
n.cores |
a vector of the degrees
n.col <- 6 xdata <- matrix(rnorm(n.col * 4), ncol = n.col) degreeCov(xdata) degreeCov(xdata, cutoff = .5) degreeCov(xdata, cutoff = .5, considerUnweighted = TRUE)
n.col <- 6 xdata <- matrix(rnorm(n.col * 4), ncol = n.col) degreeCov(xdata) degreeCov(xdata, cutoff = .5) degreeCov(xdata, cutoff = .5, considerUnweighted = TRUE)
In case of new call it uses the temporary cache instead of downloading again.
downloadFileLocal(urlStr, oD = tempdir())
downloadFileLocal(urlStr, oD = tempdir())
urlStr |
url of file to download |
oD |
temporary directory to store file |
Inspired by STRINGdb Bioconductor package, but using curl as file may be too big to handle.
path to file
glmSparseNet:::downloadFileLocal( "https://string-db.org/api/tsv-no-header/version" )
glmSparseNet:::downloadFileLocal( "https://string-db.org/api/tsv-no-header/version" )
Retrieve ensembl gene names from biomaRt
ensemblGeneNames( geneId, useCache = TRUE, verbose = FALSE, gene.id = deprecated(), use.cache = deprecated() )
ensemblGeneNames( geneId, useCache = TRUE, verbose = FALSE, gene.id = deprecated(), use.cache = deprecated() )
geneId |
character vector with gene names |
useCache |
Boolean indicating if biomaRt cache should be used |
verbose |
When using biomaRt in webservice mode and setting verbose to TRUE, the XML query to the webservice will be printed. |
gene.id |
|
use.cache |
a dataframe with external gene names, ensembl_id
ensemblGeneNames(c("MOB1A", "RFLNB", "SPIC", "TP53"))
ensemblGeneNames(c("MOB1A", "RFLNB", "SPIC", "TP53"))
Retrieve gene names from biomaRt
geneNames( ensemblGenes, useCache = TRUE, verbose = FALSE, ensembl.genes = deprecated(), use.cache = deprecated() )
geneNames( ensemblGenes, useCache = TRUE, verbose = FALSE, ensembl.genes = deprecated(), use.cache = deprecated() )
ensemblGenes |
character vector with gene names in ensembl_id format |
useCache |
Boolean indicating if biomaRt cache should be used |
verbose |
When using biomaRt in webservice mode and setting verbose to TRUE, the XML query to the webservice will be printed. |
ensembl.genes |
|
use.cache |
a dataframe with external gene names, ensembl_id
geneNames(c("ENSG00000114978", "ENSG00000166211", "ENSG00000183688"))
geneNames(c("ENSG00000114978", "ENSG00000166211", "ENSG00000183688"))
network parameter accepts:
string to calculate network based on data (correlation, covariance)
matrix representing the network
vector with already calculated penalty weights (can also be used directly with glmnet)
glmSparseNet( xdata, ydata, network, options = networkOptions(), experiment = NULL, network.options = deprecated(), experiment.name = deprecated(), ... ) glmDegree( xdata, ydata, network, options = networkOptions(), experiment = NULL, network.options = deprecated(), experiment.name = deprecated(), ... ) glmHub( xdata, ydata, network, options = networkOptions(), experiment = NULL, network.options = deprecated(), experiment.name = deprecated(), ... ) glmOrphan( xdata, ydata, network, options = networkOptions(), experiment = NULL, network.options = deprecated(), experiment.name = deprecated(), ... )
glmSparseNet( xdata, ydata, network, options = networkOptions(), experiment = NULL, network.options = deprecated(), experiment.name = deprecated(), ... ) glmDegree( xdata, ydata, network, options = networkOptions(), experiment = NULL, network.options = deprecated(), experiment.name = deprecated(), ... ) glmHub( xdata, ydata, network, options = networkOptions(), experiment = NULL, network.options = deprecated(), experiment.name = deprecated(), ... ) glmOrphan( xdata, ydata, network, options = networkOptions(), experiment = NULL, network.options = deprecated(), experiment.name = deprecated(), ... )
xdata |
input data, can be a matrix or MultiAssayExperiment. |
ydata |
response data compatible with glmnet. |
network |
type of network, see below. |
options |
options to calculate network. |
experiment |
name of experiment to use as input in MultiAssayExperiment object (only if xdata is an object of this class). |
network.options |
|
experiment.name |
|
... |
parameters that |
an object just as glmnet
glmDegree()
: penalizes nodes with small degree
(inversion penalization h(x) = 1 / x
).
glmHub()
: Penalizes nodes with small degree
(normalized heuristic that promotes nodes with many edges).
glmOrphan()
: Penalizes nodes with high degree
(normalized heuristic that promotes nodes with few edges).
Cross-validation functions cv.glmSparseNet()
.
xdata <- matrix(rnorm(100), ncol = 20) glmSparseNet(xdata, rnorm(nrow(xdata)), "correlation", family = "gaussian") glmSparseNet(xdata, rnorm(nrow(xdata)), "covariance", family = "gaussian") # # # Using MultiAssayExperiment # load data library(MultiAssayExperiment) data("miniACC", package = "MultiAssayExperiment") xdata <- miniACC # TODO aking out x individuals missing values # build valid data with days of last follow up or to event event.ix <- which(!is.na(xdata$days_to_death)) cens.ix <- which(!is.na(xdata$days_to_last_followup)) xdata$surv_event_time <- array(NA, nrow(colData(xdata))) xdata$surv_event_time[event.ix] <- xdata$days_to_death[event.ix] xdata$surv_event_time[cens.ix] <- xdata$days_to_last_followup[cens.ix] # Keep only valid individuals valid.ix <- as.vector(!is.na(xdata$surv_event_time) & !is.na(xdata$vital_status) & xdata$surv_event_time > 0) xdata.valid <- xdata[, rownames(colData(xdata))[valid.ix]] ydata.valid <- colData(xdata.valid)[, c("surv_event_time", "vital_status")] colnames(ydata.valid) <- c("time", "status") glmSparseNet( xdata.valid, ydata.valid, family = "cox", network = "correlation", experiment = "RNASeq2GeneNorm" ) # Degree penalization xdata <- matrix(rnorm(100), ncol = 5) glmDegree( xdata, rnorm(nrow(xdata)), "correlation", family = "gaussian", options = networkOptions(minDegree = .2) ) xdata <- matrix(rnorm(100), ncol = 5) glmHub( xdata, rnorm(nrow(xdata)), "correlation", family = "gaussian", options = networkOptions(minDegree = .2) ) # Orphan penalization xdata <- matrix(rnorm(100), ncol = 5) glmOrphan( xdata, rnorm(nrow(xdata)), "correlation", family = "gaussian", options = networkOptions(minDegree = .2) )
xdata <- matrix(rnorm(100), ncol = 20) glmSparseNet(xdata, rnorm(nrow(xdata)), "correlation", family = "gaussian") glmSparseNet(xdata, rnorm(nrow(xdata)), "covariance", family = "gaussian") # # # Using MultiAssayExperiment # load data library(MultiAssayExperiment) data("miniACC", package = "MultiAssayExperiment") xdata <- miniACC # TODO aking out x individuals missing values # build valid data with days of last follow up or to event event.ix <- which(!is.na(xdata$days_to_death)) cens.ix <- which(!is.na(xdata$days_to_last_followup)) xdata$surv_event_time <- array(NA, nrow(colData(xdata))) xdata$surv_event_time[event.ix] <- xdata$days_to_death[event.ix] xdata$surv_event_time[cens.ix] <- xdata$days_to_last_followup[cens.ix] # Keep only valid individuals valid.ix <- as.vector(!is.na(xdata$surv_event_time) & !is.na(xdata$vital_status) & xdata$surv_event_time > 0) xdata.valid <- xdata[, rownames(colData(xdata))[valid.ix]] ydata.valid <- colData(xdata.valid)[, c("surv_event_time", "vital_status")] colnames(ydata.valid) <- c("time", "status") glmSparseNet( xdata.valid, ydata.valid, family = "cox", network = "correlation", experiment = "RNASeq2GeneNorm" ) # Degree penalization xdata <- matrix(rnorm(100), ncol = 5) glmDegree( xdata, rnorm(nrow(xdata)), "correlation", family = "gaussian", options = networkOptions(minDegree = .2) ) xdata <- matrix(rnorm(100), ncol = 5) glmHub( xdata, rnorm(nrow(xdata)), "correlation", family = "gaussian", options = networkOptions(minDegree = .2) ) # Orphan penalization xdata <- matrix(rnorm(100), ncol = 5) glmOrphan( xdata, rnorm(nrow(xdata)), "correlation", family = "gaussian", options = networkOptions(minDegree = .2) )
The API has been removed and this function is no longer available.
hallmarks( genes, metric = "count", hierarchy = "full", generate.plot = TRUE, show.message = FALSE )
hallmarks( genes, metric = "count", hierarchy = "full", generate.plot = TRUE, show.message = FALSE )
genes |
gene names |
metric |
see below |
hierarchy |
see below |
generate.plot |
flag to indicate if return object has a ggplot2 object |
show.message |
flag to indicate if run_cache method shows messages |
data.frame with choosen metric and hierarchy It also returns a vector with genes that do not have any hallmarks.
See http://chat.lionproject.net/api for more details on the metric and hallmarks parameters
To standardize the colors in the gradient you can use scale_fill_gradientn(limits=c(0,1), colours=topo.colors(3)) to limit between 0 and 1 for cprob and -1 and 1 for npmi
Heuristic function to use in high dimensions
heuristicScale( x, subExp10 = -1, expMult = -1, subExp = -1, sub.exp10 = deprecated(), exp.mult = deprecated(), sub.exp = deprecated() )
heuristicScale( x, subExp10 = -1, expMult = -1, subExp = -1, sub.exp10 = deprecated(), exp.mult = deprecated(), sub.exp = deprecated() )
x |
vector of values to scale |
subExp10 |
value to subtract to base 10 exponential, for example:
|
expMult |
parameter to multiply exponential, i.e. to have a negative exponential or positive |
subExp |
value to subtract for exponentional, for example if x = 0,
|
sub.exp10 |
|
exp.mult |
|
sub.exp |
a vector of scaled values
heuristicScale(rnorm(1:10))
heuristicScale(rnorm(1:10))
Heuristic function to penalize nodes with low degree
hubHeuristic(x)
hubHeuristic(x)
x |
single value of vector |
transformed
hubHeuristic(rnorm(1:10))
hubHeuristic(rnorm(1:10))
Custom pallete of colors
myColors(ix = NULL) # deprecated, please use myColors() my.colors(ix = NULL)
myColors(ix = NULL) # deprecated, please use myColors() my.colors(ix = NULL)
ix |
index for a color |
a color
myColors() myColors(5)
myColors() myColors(5)
Custom pallete of symbols in plots
mySymbols(ix = NULL) # deprecated, please use mySymbols() my.symbols(ix = NULL)
mySymbols(ix = NULL) # deprecated, please use mySymbols() my.symbols(ix = NULL)
ix |
index for symbol |
a symbol
mySymbols() mySymbols(2)
mySymbols() mySymbols(2)
Calculates the correlation network
networkCorParallel( xdata, buildOutput = "matrix", nCores = 1, forceRecalcNetwork = FALSE, showMessage = FALSE, ..., build.output = deprecated(), n.cores = deprecated(), force.recalc.network = deprecated(), show.message = deprecated() )
networkCorParallel( xdata, buildOutput = "matrix", nCores = 1, forceRecalcNetwork = FALSE, showMessage = FALSE, ..., build.output = deprecated(), n.cores = deprecated(), force.recalc.network = deprecated(), show.message = deprecated() )
xdata |
base data to calculate network |
buildOutput |
if output returns a 'matrix', 'vector' of the upper triu without the diagonal or NULL with any other argument |
nCores |
number of cores to be used |
forceRecalcNetwork |
force recalculation, instead of going to cache |
showMessage |
shows cache operation messages |
... |
extra parameters for fun |
build.output |
lifecycle::badge("deprecated") without the diagonal or NULL with any other argument |
n.cores |
lifecycle::badge("deprecated") |
force.recalc.network |
lifecycle::badge("deprecated") |
show.message |
lifecycle::badge("deprecated") |
depends on build.output parameter
n_col <- 6 xdata <- matrix(rnorm(n_col * 4), ncol = n_col) networkCorParallel(xdata)
n_col <- 6 xdata <- matrix(rnorm(n_col * 4), ncol = n_col) networkCorParallel(xdata)
Calculates the covariance network
networkCovParallel( xdata, buildOutput = "matrix", nCores = 1, forceRecalcNetwork = FALSE, showMessage = FALSE, ..., build.output = deprecated(), n.cores = deprecated(), force.recalc.network = deprecated(), show.message = deprecated() )
networkCovParallel( xdata, buildOutput = "matrix", nCores = 1, forceRecalcNetwork = FALSE, showMessage = FALSE, ..., build.output = deprecated(), n.cores = deprecated(), force.recalc.network = deprecated(), show.message = deprecated() )
xdata |
base data to calculate network |
buildOutput |
if output returns a 'matrix', 'vector' of the upper triu without the diagonal or NULL with any other argument |
nCores |
number of cores to be used |
forceRecalcNetwork |
force recalculation, instead of going to cache |
showMessage |
shows cache operation messages |
... |
extra parameters for fun |
build.output |
lifecycle::badge("deprecated") without the diagonal or NULL with any other argument |
n.cores |
lifecycle::badge("deprecated") |
force.recalc.network |
lifecycle::badge("deprecated") |
show.message |
lifecycle::badge("deprecated") |
depends on build.output parameter
n.col <- 6 xdata <- matrix(rnorm(n.col * 4), ncol = n.col) networkCovParallel(xdata)
n.col <- 6 xdata <- matrix(rnorm(n.col * 4), ncol = n.col) networkCovParallel(xdata)
Setup network options, such as using weighted or unweighted degree, which centrality measure to use
networkOptions( method = "pearson", unweighted = TRUE, cutoff = 0, centrality = "degree", minDegree = 0, nCores = 1, transFun = function(x) x, min.degree = deprecated(), n.cores = deprecated(), trans.fun = deprecated() )
networkOptions( method = "pearson", unweighted = TRUE, cutoff = 0, centrality = "degree", minDegree = 0, nCores = 1, transFun = function(x) x, min.degree = deprecated(), n.cores = deprecated(), trans.fun = deprecated() )
method |
in case of correlation and covariance, which method to use. |
unweighted |
calculate degree using unweighted network. |
cutoff |
cuttoff value in network edges to trim the network. |
centrality |
centrality measure to use, currently only supports degree. |
minDegree |
minimum value that individual penalty weight can take. |
nCores |
number of cores to use, default to 1. |
transFun |
See details below. |
min.degree |
|
n.cores |
|
trans.fun |
The |
a list of options
networkOptions(unweighted = FALSE)
networkOptions(unweighted = FALSE)
Heuristic function to penalize nodes with high degree
orphanHeuristic(x)
orphanHeuristic(x)
x |
single value of vector |
transformed
orphanHeuristic(rnorm(1:10))
orphanHeuristic(rnorm(1:10))
Retrieve ensembl gene ids from proteins
protein2EnsemblGeneNames( ensemblProteins, useCache = TRUE, verbose = FALSE, ensembl.proteins = deprecated(), use.cache = deprecated() )
protein2EnsemblGeneNames( ensemblProteins, useCache = TRUE, verbose = FALSE, ensembl.proteins = deprecated(), use.cache = deprecated() )
ensemblProteins |
character vector with gene names in ensembl_peptide_id format |
useCache |
Boolean indicating if biomaRt cache should be used |
verbose |
When using biomaRt in webservice mode and setting verbose to TRUE, the XML query to the webservice will be printed. |
ensembl.proteins |
|
use.cache |
a dataframe with external gene names, ensembl_peptide_id
protein2EnsemblGeneNames(c( "ENSP00000235382", "ENSP00000233944", "ENSP00000216911" ))
protein2EnsemblGeneNames(c( "ENSP00000235382", "ENSP00000233944", "ENSP00000216911" ))
Draws multiple kaplan meyer survival curves (or just 1) and calculates logrank test
separate2GroupsCox( chosenBetas, xdata, ydata, probs = c(0.5, 0.5), noPlot = FALSE, plotTitle = "SurvivalCurves", xlim = NULL, ylim = NULL, expandYZero = FALSE, legendOutside = FALSE, stopWhenOverlap = TRUE, ..., chosen.btas = deprecated(), no.plot = deprecated(), plot.title = deprecated(), expand.yzero = deprecated(), legend.outside = deprecated(), stop.when.overlap = deprecated() )
separate2GroupsCox( chosenBetas, xdata, ydata, probs = c(0.5, 0.5), noPlot = FALSE, plotTitle = "SurvivalCurves", xlim = NULL, ylim = NULL, expandYZero = FALSE, legendOutside = FALSE, stopWhenOverlap = TRUE, ..., chosen.btas = deprecated(), no.plot = deprecated(), plot.title = deprecated(), expand.yzero = deprecated(), legend.outside = deprecated(), stop.when.overlap = deprecated() )
chosenBetas |
list of testing coefficients to calculate prognostic
indexes, for example |
xdata |
n x m matrix with n observations and m variables. |
ydata |
Survival object. |
probs |
How to separate high and low risk patients |
noPlot |
Only calculate p-value and do not generate survival curve plot. |
plotTitle |
Name of file if. |
xlim |
Optional argument to limit the x-axis view. |
ylim |
Optional argument to limit the y-axis view. |
expandYZero |
expand to y = 0. |
legendOutside |
If TRUE legend will be outside plot, otherwise inside. |
stopWhenOverlap |
when probs vector allows for overlapping of samples in both groups, then stop. |
... |
additional parameters to survminer::ggsurvplot |
chosen.btas |
|
no.plot |
|
plot.title |
|
expand.yzero |
|
legend.outside |
|
stop.when.overlap |
Otherwise it will calculate with duplicate samples, i.e. simply adding them to xdata and ydata (in a different group). |
object with logrank test and kaplan-meier survival plot
A list with plot, p-value and kaplan-meier object. The plot was drawn from survminer::ggsurvplot with only the palette, data and fit arguments being defined and keeping all other defaults that can be customized as additional parameters to this function.
xdata <- survival::ovarian[, c("age", "resid.ds")] ydata <- data.frame( time = survival::ovarian$futime, status = survival::ovarian$fustat ) separate2GroupsCox(c(age = 1, 0), xdata, ydata) separate2GroupsCox(c(age = 1, 0.5), xdata, ydata) separate2GroupsCox( c(age = 1), c(1, 0, 1, 0, 1, 0), data.frame(time = runif(6), status = rbinom(6, 1, .5)) ) separate2GroupsCox(list( aa = c(age = 1, 0.5), bb = c(age = 0, 1.5) ), xdata, ydata)
xdata <- survival::ovarian[, c("age", "resid.ds")] ydata <- data.frame( time = survival::ovarian$futime, status = survival::ovarian$fustat ) separate2GroupsCox(c(age = 1, 0), xdata, ydata) separate2GroupsCox(c(age = 1, 0.5), xdata, ydata) separate2GroupsCox( c(age = 1), c(1, 0, 1, 0, 1, 0), data.frame(time = runif(6), status = rbinom(6, 1, .5)) ) separate2GroupsCox(list( aa = c(age = 1, 0.5), bb = c(age = 0, 1.5) ), xdata, ydata)
It was filtered with combined_scores and individual scores below 700 without text-based scores
data('string.network.700.cache', package = 'glmSparseNet')
data('string.network.700.cache', package = 'glmSparseNet')
An object of class dgCMatrix
with 11033 rows and 11033 columns.
Download protein-protein interactions from STRING DB
stringDBhomoSapiens( version = "11.0", scoreThreshold = 0, removeText = TRUE, score_threshold = deprecated(), remove.text = deprecated() )
stringDBhomoSapiens( version = "11.0", scoreThreshold = 0, removeText = TRUE, score_threshold = deprecated(), remove.text = deprecated() )
version |
version of the database to use |
scoreThreshold |
remove scores below threshold |
removeText |
remove text mining-based scores |
score_threshold |
|
remove.text |
a data.frame with rows representing an interaction between two proteins, and columns the count of scores above the given score_threshold
stringDBhomoSapiens(scoreThreshold = 800)
stringDBhomoSapiens(scoreThreshold = 800)