Title: | Rank-based single-sample gene set scoring method |
---|---|
Description: | A simple single-sample gene signature scoring method that uses rank-based statistics to analyze the sample's gene expression profile. It scores the expression activities of gene sets at a single-sample level. |
Authors: | Dharmesh D. Bhuva [aut] , Ruqian Lyu [aut, ctb], Momeneh Foroutan [aut, ctb] , Malvika Kharbanda [aut, cre] |
Maintainer: | Malvika Kharbanda <[email protected]> |
License: | GPL-3 |
Version: | 1.27.0 |
Built: | 2024-11-30 04:50:31 UTC |
Source: | https://github.com/bioc/singscore |
This function generates a number of random gene sets that have the same
number of genes as the scored gene set. It scores each random gene set and
returns a matrix of scores for all samples. The empirical scores are used to
calculate the empirical p-values and plot the null distribution. The
implementation uses BiocParallel::bplapply() for easy access to parallel
backends. Note that one should pass the same values to the upSet, downSet,
centerScore and knownDirection arguments as those provided to the
simpleScore() function to generate a proper null distribution.
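The idea of a permutation null can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: the toy rank matrix, the hypothetical scoreSet() helper, and the orientation of the result (permutations in rows, samples in columns) are all made up for the example.

```r
set.seed(1)  # fixed seed so the sketch is reproducible

# Toy rank matrix: 50 genes by 3 samples, each column a permutation of 1:50
nGenes <- 50
rankData <- sapply(1:3, function(i) sample(nGenes))
dimnames(rankData) <- list(paste0("g", 1:nGenes), paste0("s", 1:3))

# Hypothetical scorer: centred, normalised mean rank of a gene set per sample
scoreSet <- function(rankData, genes) {
  n <- length(genes)
  meanRank <- colMeans(rankData[genes, , drop = FALSE])
  minMean <- (n + 1) / 2                      # lowest possible mean rank
  maxMean <- (2 * nrow(rankData) - n + 1) / 2 # highest possible mean rank
  (meanRank - minMean) / (maxMean - minMean) - 0.5
}

B <- 100      # number of random gene sets
setSize <- 5  # must match the size of the gene set being tested

# Score B random gene sets of the same size; one row per random set
nullScores <- t(replicate(B, {
  randomSet <- sample(rownames(rankData), setSize)
  scoreSet(rankData, randomSet)
}))
dim(nullScores)
```

Drawing random sets of the same size as the tested signature is what keeps the null scores comparable to the observed score.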
generateNull(upSet, downSet = NULL, rankData, subSamples = NULL,
  centerScore = TRUE, knownDirection = TRUE, B = 1000, ncores = 1,
  seed = sample.int(1e+06, 1), useBPPARAM = NULL)

## S4 method for signature 'vector,ANY'
generateNull(upSet, downSet = NULL, rankData, subSamples = NULL,
  centerScore = TRUE, knownDirection = TRUE, B = 1000, ncores = 1,
  seed = sample.int(1e+06, 1), useBPPARAM = NULL)

## S4 method for signature 'GeneSet,ANY'
generateNull(upSet, downSet = NULL, rankData, subSamples = NULL,
  centerScore = TRUE, knownDirection = TRUE, B = 1000, ncores = 1,
  seed = sample.int(1e+06, 1), useBPPARAM = NULL)

## S4 method for signature 'vector,vector'
generateNull(upSet, downSet = NULL, rankData, subSamples = NULL,
  centerScore = TRUE, knownDirection = TRUE, B = 1000, ncores = 1,
  seed = sample.int(1e+06, 1), useBPPARAM = NULL)

## S4 method for signature 'GeneSet,GeneSet'
generateNull(upSet, downSet = NULL, rankData, subSamples = NULL,
  centerScore = TRUE, knownDirection = TRUE, B = 1000, ncores = 1,
  seed = sample.int(1e+06, 1), useBPPARAM = NULL)
upSet |
A GeneSet object or character vector of gene IDs of up-regulated gene set or a gene set where the nature of genes is not known |
downSet |
A GeneSet object or character vector of gene IDs of down-regulated gene set or NULL where only a single gene set is provided |
rankData |
A matrix object, ranked gene expression matrix data generated
using the |
subSamples |
A vector of sample labels/indices that will be used to subset the rankData matrix. All samples will be scored if not provided |
centerScore |
A Boolean, specifying whether scores should be centered
around 0, default as TRUE. Note: scores never centered if |
knownDirection |
A boolean, determining whether the gene set should be considered to be directional or not. A gene set is directional if the type of genes in it are known i.e. up- or down-regulated. This should be set to TRUE if the gene set is composed of both up- AND down-regulated genes. Defaults to TRUE. This parameter becomes irrelevant when both upSet(Colc) and downSet(Colc) are provided. |
B |
integer, the number of permutation repeats or the number of random gene sets to be generated, default as 1000 |
ncores |
integer, the number of CPU cores the function can use |
seed |
integer, set the seed for randomisation |
useBPPARAM |
the backend the function uses, if NULL is provided, the
function uses the default parallel backend which is the first on the list
returned by |
A matrix of empirical scores for all samples
Ruqian Lyu
Post about BiocParallel
browseVignettes("BiocParallel")
ranked <- rankGenes(toy_expr_se)
scoredf <- simpleScore(ranked, upSet = toy_gs_up, downSet = toy_gs_dn)

# find out what backends can be registered on your machine
BiocParallel::registered()
# the first one is the default backend
# ncores = ncores <- parallel::detectCores() - 2
permuteResult = generateNull(upSet = toy_gs_up, downSet = toy_gs_dn, ranked,
  centerScore = TRUE, B = 10, seed = 1, ncores = 1)
With null distributions estimated using the generateNull()
function, p-values are estimated using a one-tailed test. A minimum p-value
of 1/B can be achieved with B permutations.
getPvals(permuteResult, scoredf, subSamples = NULL)
permuteResult |
A matrix, null distributions for each sample generated
using the |
scoredf |
A dataframe, the scored results of samples under test
generated using the |
subSamples |
A vector of sample labels/indices that will be used to subset the score matrix. All samples will be scored if not provided |
Estimated p-values for enrichment of the signature in each sample. A p-value of 1/B indicates that the estimated p-value is less than or equal to 1/B.
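A one-tailed empirical p-value with a floor of 1/B can be sketched in base R as follows. The null matrix and observed scores are made-up values, and the exact comparison used by the package (for example strict versus non-strict inequality) is an assumption here.

```r
set.seed(42)
B <- 1000

# Hypothetical null scores for two samples (rows = permutations)
nullScores <- cbind(s1 = rnorm(B, sd = 0.1), s2 = rnorm(B, sd = 0.1))
observed <- c(s1 = 0.02, s2 = 0.9)  # made-up observed scores

# One-tailed test: fraction of null scores at least as large as the observed
pvals <- colMeans(sweep(nullScores, 2, observed, FUN = ">="))

# Zero exceedances only tell us p <= 1/B, so floor the estimate at 1/B
pvals <- pmax(pvals, 1 / B)
pvals
```

Sample s2 lies far outside its null distribution, so its estimate is floored at 1/B, which is why increasing B is the only way to report smaller p-values.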
ranked <- rankGenes(toy_expr_se)
scoredf <- simpleScore(ranked, upSet = toy_gs_up, downSet = toy_gs_dn)

# find out what backends can be registered on your machine
BiocParallel::registered()
# the first one is the default backend, and it can be changed explicitly.
# See vignette for more details
permuteResult = generateNull(upSet = toy_gs_up, downSet = toy_gs_dn, ranked,
  B = 10, seed = 1, useBPPARAM = NULL)

# call the permutation function to generate the empirical scores
# for B times.
pvals <- getPvals(permuteResult, scoredf)
Get a list of genes that are stably expressed in cancer and normal solid tissue.
getStableGenes( n_stable, type = c("carcinoma", "blood", "protein"), id = c("geneid", "ensembl") )
n_stable |
numeric, number of stable genes to retrieve |
type |
character, type of stable genes requested, stable genes in "carcinoma" or stable genes in "blood" |
id |
character, gene identifier required. This can be either "geneid" for symbols or "ensembl" for Ensembl IDs |
a character vector with gene IDs sorted by their expected expression levels in the requested tissue
getStableGenes(5)
getStableGenes(5, id = 'ensembl')
getStableGenes(5, type = 'blood')
This function computes 'singscores' using a ranked gene expression matrix
obtained from the rankGenes() function and a GeneSetCollection object or a
list of GeneSet objects. It returns a list of two matrices containing the
scores and dispersions. This function should be used when scoring needs to be
performed for multiple signatures. It is faster than applying simpleScore()
across the different signatures independently.
multiScore(rankData, upSetColc, downSetColc, subSamples = NULL,
  centerScore = TRUE, dispersionFun = mad, knownDirection = TRUE)

## S4 method for signature 'matrix,GeneSetCollection,missing'
multiScore(rankData, upSetColc, downSetColc, subSamples = NULL,
  centerScore = TRUE, dispersionFun = mad, knownDirection = TRUE)

## S4 method for signature 'matrix,GeneSetCollection,GeneSetCollection'
multiScore(rankData, upSetColc, downSetColc, subSamples = NULL,
  centerScore = TRUE, dispersionFun = mad, knownDirection = TRUE)

## S4 method for signature 'matrix,list,missing'
multiScore(rankData, upSetColc, downSetColc, subSamples = NULL,
  centerScore = TRUE, dispersionFun = mad, knownDirection = TRUE)

## S4 method for signature 'matrix,list,list'
multiScore(rankData, upSetColc, downSetColc, subSamples = NULL,
  centerScore = TRUE, dispersionFun = mad, knownDirection = TRUE)
rankData |
A matrix object, ranked gene expression matrix data generated
using the |
upSetColc |
A GeneSetCollection object, a list of GeneSet objects, or a
list of character vectors of up-regulated (or mixed, see
|
downSetColc |
A GeneSetCollection object, a list of GeneSet objects, or a list of character vectors of down-regulated gene sets. NULL otherwise. Names of gene sets within this collection/list should be the same as those of the upSetColc |
subSamples |
A vector of sample labels/indices that will be used to subset the rankData matrix. All samples will be scored if not provided |
centerScore |
A Boolean, specifying whether scores should be centered
around 0, default as TRUE. Note: scores never centered if |
dispersionFun |
A function, dispersion function with default being |
knownDirection |
A boolean, determining whether the gene set should be considered to be directional or not. A gene set is directional if the type of genes in it are known i.e. up- or down-regulated. This should be set to TRUE if the gene set is composed of both up- AND down-regulated genes. Defaults to TRUE. This parameter becomes irrelevant when both upSet(Colc) and downSet(Colc) are provided. |
A list of two matrices containing the scores and dispersions
ranked <- rankGenes(toy_expr_se)
GSEABase::setName(toy_gs_up) = "toy_gs_up"
GSEABase::setName(toy_gs_dn) = "toy_gs_dn"
gslist <- list(toy_gs_up, toy_gs_dn)

gscolc <- GSEABase::GeneSetCollection(gslist)
scoredf <- multiScore(ranked, upSetColc = gscolc)
This function takes the output from the simpleScore() function and generates
scatter plots of score vs. dispersion for the total score, the up score and
the down score of samples. If you wish to use the plotting function with
customized inputs (instead of outputs from the simpleScore function), you need
to make sure the formats are the same. To be specific, you need to have column
names "TotalScore", "TotalDispersion", "UpScore", "UpDispersion", "DownScore",
"DownDispersion" and row names as samples.
plotDispersion( scoredf, annot = NULL, annot_name = "", sampleLabels = NULL, alpha = 1, size = 1, textSize = 1.2, isInteractive = FALSE )
scoredf |
data.frame, generated using the |
annot |
any numeric, character or factor annotation provided by the user that needs to be plotted. Alternatively, this can be a character specifying the column of scoredf holding the annotation. Annotations must be ordered in the same way as the scores |
annot_name |
character, legend title for the annotation |
sampleLabels |
vector of character, sample names to display, ordered in the same way as samples are ordered in the 'scoredf' data.frame and with labels for all samples. Samples whose labels should not be displayed should be left as empty strings or NAs. Default as NULL which means the projected points are not labelled. |
alpha |
numeric, set the transparency of points |
size |
numeric, set the size of each point |
textSize |
numeric, relative text sizes for title, labels, and axis values |
isInteractive |
Boolean, determine whether the plot is interactive |
A ggplot object
ranked <- rankGenes(toy_expr_se)
scoredf <- simpleScore(ranked, upSet = toy_gs_up, downSet = toy_gs_dn)
plotDispersion(scoredf)
plotDispersion(scoredf, isInteractive = TRUE)
This function takes the results from the generateNull() function and plots the
density curves of permuted scores for the samples provided via the sampleNames
parameter. It can plot null distribution(s) for a single sample or multiple
samples.
plotNull( permuteResult, scoredf, pvals, sampleNames = NULL, cutoff = 0.01, textSize = 2, labelSize = 5 )
permuteResult |
A matrix, null distributions for each sample generated
using the |
scoredf |
A dataframe, singscores generated using the |
pvals |
A vector, estimated p-values using the |
sampleNames |
A character vector, sample IDs for which null distributions will be plotted |
cutoff |
numeric, the cutoff value for determining significance |
textSize |
numeric, size of axes labels, axes values and title |
labelSize |
numeric, size of label texts |
a ggplot object
Ruqian Lyu
ranked <- rankGenes(toy_expr_se)
scoredf <- simpleScore(ranked, upSet = toy_gs_up, downSet = toy_gs_dn)

# find out what backends can be registered on your machine
BiocParallel::registered()
# the first one is the default backend, and it can be changed explicitly.
permuteResult = generateNull(upSet = toy_gs_up, downSet = toy_gs_dn, ranked,
  B = 10, seed = 1, useBPPARAM = NULL)

# call the permutation function to generate the empirical scores
# for B times.
pvals <- getPvals(permuteResult, scoredf)

# plot for all samples
plotNull(permuteResult, scoredf, pvals, sampleNames = names(pvals))

# plot for the first sample
plotNull(permuteResult, scoredf, pvals, sampleNames = names(pvals)[1])
This function takes a single-column data frame, which is a single-column
subset of the ranked matrix data generated using the rankGenes() function, and
the gene sets of interest as inputs. It plots the density of ranks for genes
in the gene set and overlays a barcode plot of these ranks. Ranks are
normalized by dividing them by the maximum rank. Densities are estimated using
KDE.
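The normalisation and density estimation described above can be sketched with base R graphics. The ranks and the gene set below are made up, and stats::density() with rug() stands in for whatever plotting code the package actually uses.

```r
set.seed(7)

# Toy single-sample ranks for 200 genes (a permutation of 1:200)
ranks <- setNames(sample(200), paste0("g", 1:200))

# Hypothetical gene set biased towards highly ranked genes
geneSet <- names(sort(ranks, decreasing = TRUE))[seq(1, 60, by = 3)]

# Normalise ranks to (0, 1] by dividing by the maximum rank
normRanks <- ranks[geneSet] / max(ranks)

# Kernel density estimate of the gene set's normalised ranks
dens <- density(normRanks, from = 0, to = 1)
plot(dens, main = "Rank density (sketch)", xlab = "Normalised rank")
rug(normRanks)  # barcode-style ticks marking each gene's rank
```

A density that piles up near 1 suggests the gene set is highly expressed in that sample; a flat density suggests no coordinated expression.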
plotRankDensity(rankData, upSet, downSet = NULL, isInteractive = FALSE,
  textSize = 1.5)

## S4 method for signature 'ANY,vector,missing'
plotRankDensity(rankData, upSet, downSet = NULL, isInteractive = FALSE,
  textSize = 1.5)

## S4 method for signature 'ANY,GeneSet,missing'
plotRankDensity(rankData, upSet, downSet = NULL, isInteractive = FALSE,
  textSize = 1.5)

## S4 method for signature 'ANY,vector,vector'
plotRankDensity(rankData, upSet, downSet = NULL, isInteractive = FALSE,
  textSize = 1.5)

## S4 method for signature 'ANY,GeneSet,GeneSet'
plotRankDensity(rankData, upSet, downSet = NULL, isInteractive = FALSE,
  textSize = 1.5)
rankData |
one column of the ranked gene expression matrix obtained from
the |
upSet |
GeneSet object or a vector of gene Ids, up-regulated gene set |
downSet |
GeneSet object or a vector of gene Ids, down-regulated gene set |
isInteractive |
Boolean, determine whether the returned plot is interactive |
textSize |
numeric, set the size of text on the plot |
A ggplot object (or a plotly object) with a rank density plot overlayed with a barcode plot
ranked <- rankGenes(toy_expr_se)
plotRankDensity(ranked[, 2, drop = FALSE], upSet = toy_gs_up)
This function takes two data frames which are outputs from the simpleScore()
function and plots the relationship between the two gene set scores for
samples in the gene expression matrix. scoredf1 and scoredf2 are two scoring
results of the same set of samples against two different gene signatures. If
you wish to use the plotting function with customized inputs (instead of
outputs from the simpleScore function), you need to make sure the formats are
the same. To be specific, you need to have column names "TotalScore",
"TotalDispersion", "UpScore", "UpDispersion", "DownScore", "DownDispersion"
and row names as samples.
plotScoreLandscape( scoredf1, scoredf2, scorenames = c(), textSize = 1.2, isInteractive = FALSE, hexMin = 100 )
scoredf1 |
data.frame, result of the simpleScore() function which scores the gene expression matrix against a gene set of interest |
scoredf2 |
data.frame, result of the simpleScore() function which scores the gene expression matrix against another gene set of interest |
scorenames |
character vector of length 2, names for the two scored gene set/signatures stored in scoredf1 and scoredf2 |
textSize |
numeric, set the text size for the plot, default as 1.2 |
isInteractive |
boolean, whether the plot is interactive default as FALSE |
hexMin |
integer, the threshold which decides whether hex bin plot or scatter plot is displayed, default as 100 |
A ggplot object, a scatter plot, demonstrating the relationship between scores from two signatures on the same set of samples.
ranked <- rankGenes(toy_expr_se)
scoredf <- simpleScore(ranked, upSet = toy_gs_up, downSet = toy_gs_dn)
scoredf2 <- simpleScore(ranked, upSet = toy_gs_up)
plotScoreLandscape(scoredf, scoredf2)
plotScoreLandscape()
This function takes the output (ggplot object) of the function
plotScoreLandscape()
and a new dataset. It projects the new data
points onto the landscape plot and returns a new ggplot object with
projected data points.
projectScoreLandscape( plotObj = NULL, scoredf1, scoredf2, annot = NULL, annot_name = NULL, subSamples = NULL, sampleLabels = NULL, isInteractive = FALSE )
plotObj |
a ggplot object, resulted from |
scoredf1 |
data.frame, result of the simpleScore() function which scores the gene expression matrix against a gene set of interest |
scoredf2 |
data.frame, result of the simpleScore() function which scores
the gene expression matrix against another gene set of interest. Scores in
scoredf1 and scoredf2 consist of the new data points that will be projected
on the |
annot |
any numeric, character or factor annotation provided by the user that needs to be plotted. Alternatively, this can be a character specifying the column of scoredf1 holding the annotation. Annotations must be ordered in the same way as the scores |
annot_name |
character, legend title for the annotation |
subSamples |
vector of character or indices for subsetting the scoredfs,
default as NULL and all samples in scoredfs will be plotted. The subsetted
samples are projected onto the landscape plot of |
sampleLabels |
vector of character, sample names to display, ordered in the same way as samples are ordered in the 'scoredfs' data.frames and with labels for all samples. Samples whose labels should not be displayed should be left as empty strings or NAs. Default as NULL which means the projected points are not labelled. |
isInteractive |
boolean, whether the plot is interactive default as FALSE |
New data points on the already plotted ggplot object from plotScoreLandscape()
plotScoreLandscape()
@examples
ranked <- rankGenes(toy_expr_se)
scoredf1 <- simpleScore(ranked, upSet = toy_gs_up, downSet = toy_gs_dn)
scoredf2 <- simpleScore(ranked, upSet = toy_gs_up)
psl <- plotScoreLandscape(scoredf1, scoredf2)
projectScoreLandscape(psl,scoredf1, scoredf2)
The rankGenes function is a generic function that can deal with multiple types
of inputs. Given a matrix of gene expression that has samples in columns,
genes in rows, and values being gene expression intensity, rankGenes ranks
gene expression intensities in each sample. It can also work with S4 objects
that have a gene expression matrix as a component (i.e. ExpressionSet,
DGEList, SummarizedExperiment), in which case it extracts the gene expression
matrix from the object and ranks the genes. It calls the rank function in the
base package, which ranks the gene expression matrix by its absolute
expression level. The default 'tiesMethod' is set to 'min'.
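Per-sample ranking with the 'min' ties method can be illustrated with base R alone. This is a sketch of the idea on a made-up matrix, not the package's code; it only shows what column-wise use of base::rank() produces.

```r
# Toy expression matrix: genes in rows, samples in columns (made-up values)
expr <- matrix(
  c(5.1, 2.0, 2.0, 9.3,
    1.2, 7.7, 3.4, 3.4),
  nrow = 4,
  dimnames = list(paste0("gene", 1:4), c("sampleA", "sampleB"))
)

# Rank each column independently; tied values share the smallest rank ("min")
ranked <- apply(expr, 2, rank, ties.method = "min")
ranked
```

In sampleA the two genes with expression 2.0 both receive rank 1, and the next gene receives rank 3, which is the behaviour the 'min' setting selects.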
rankGenes(expreMatrix, tiesMethod = "min", stableGenes = NULL)

## S4 method for signature 'matrix'
rankGenes(expreMatrix, tiesMethod = "min", stableGenes = NULL)

## S4 method for signature 'data.frame'
rankGenes(expreMatrix, tiesMethod = "min", stableGenes = NULL)

## S4 method for signature 'DGEList'
rankGenes(expreMatrix, tiesMethod = "min", stableGenes = NULL)

## S4 method for signature 'ExpressionSet'
rankGenes(expreMatrix, tiesMethod = "min", stableGenes = NULL)

## S4 method for signature 'SummarizedExperiment'
rankGenes(expreMatrix, tiesMethod = "min", stableGenes = NULL)
expreMatrix |
matrix, data.frame, ExpressionSet, DGEList or SummarizedExperiment storing gene expression measurements |
tiesMethod |
character, indicating what method to use when dealing with ties |
stableGenes |
character, containing a list of stable genes to be used to
rank genes using expression of stable genes. This is required when using the
stable genes dependent version of singscore (see details in |
The ranked gene expression matrix that has samples in columns and genes in rows. Unit normalised ranks are returned if data is ranked using stable genes
getStableGenes, simpleScore, rank, "ExpressionSet", "SummarizedExperiment", "DGEList"
rankGenes(toy_expr_se) # toy_expr_se is a gene expression dataset

# ExpressionSet object
emat <- SummarizedExperiment::assay(toy_expr_se)
e <- Biobase::ExpressionSet(assayData = as.matrix(emat))
rankGenes(e)

# scoring using the stable version of singscore
rankGenes(e, stableGenes = c('2', '20', '25'))

## Not run:
# for real cancer or blood datasets, use getStableGenes()
rankGenes(cancer_expr, stableGenes = getStableGenes(5))
rankGenes(blood_expr, stableGenes = getStableGenes(5, type = 'blood'))

## End(Not run)
This data.frame stores pre-computed scores of the CCLE dataset (Barretina et
al.) calculated using the simpleScore() function against the epithelial gene
signature from Tan, Tuan Zea et al. The data.frame has scores for 55 samples.
Please refer to the vignettes for instructions on how to obtain the full
datasets.
scoredf_ccle_epi
An object of class data.frame with 55 rows and 2 columns.
Barretina, Jordi, Giordano Caponigro, Nicolas Stransky, Kavitha Venkatesan, Adam A Margolin, Sungjoon Kim, Christopher J Wilson, et al. 2012. “The Cancer Cell Line Encyclopedia Enables Predictive Modelling of Anticancer Drug Sensitivity.” Nature 483 (7391): 603–7.
Tan, Tuan Zea, Qing Hao Miow, Yoshio Miki, Tetsuo Noda, Seiichi Mori, Ruby Yun-Ju Huang, and Jean Paul Thiery. 2014. “Epithelial-Mesenchymal Transition Spectrum Quantification and Its Efficacy in Deciphering Survival and Drug Responses of Cancer Patients.” EMBO Molecular Medicine 6 (10). Oxford, UK: BlackWell Publishing Ltd: 1279–93. doi:10.15252/emmm.201404208.
This data.frame stores pre-computed scores of the CCLE dataset (Barretina et
al.) calculated using the simpleScore() function against the mesenchymal gene
signature from Tan, Tuan Zea et al. The data.frame has scores for 55 samples.
Please refer to the vignettes for instructions on how to obtain the full
datasets.
scoredf_ccle_mes
An object of class data.frame with 55 rows and 2 columns.
Barretina, Jordi, Giordano Caponigro, Nicolas Stransky, Kavitha Venkatesan, Adam A Margolin, Sungjoon Kim, Christopher J Wilson, et al. 2012. “The Cancer Cell Line Encyclopedia Enables Predictive Modelling of Anticancer Drug Sensitivity.” Nature 483 (7391): 603–7.
Tan, Tuan Zea, Qing Hao Miow, Yoshio Miki, Tetsuo Noda, Seiichi Mori, Ruby Yun-Ju Huang, and Jean Paul Thiery. 2014. “Epithelial-Mesenchymal Transition Spectrum Quantification and Its Efficacy in Deciphering Survival and Drug Responses of Cancer Patients.” EMBO Molecular Medicine 6 (10). Oxford, UK: BlackWell Publishing Ltd: 1279–93. doi:10.15252/emmm.201404208.
This data.frame stores pre-computed scores of the TCGA dataset calculated
using the simpleScore() function against the epithelial gene signature from
Tan, Tuan Zea et al. Please refer to the vignettes for instructions on how to
obtain the full datasets.
scoredf_tcga_epi
An object of class data.frame with 1119 rows and 2 columns.
Tan, Tuan Zea, Qing Hao Miow, Yoshio Miki, Tetsuo Noda, Seiichi Mori, Ruby Yun-Ju Huang, and Jean Paul Thiery. 2014. “Epithelial-Mesenchymal Transition Spectrum Quantification and Its Efficacy in Deciphering Survival and Drug Responses of Cancer Patients.” EMBO Molecular Medicine 6 (10). Oxford, UK: BlackWell Publishing Ltd: 1279–93. doi:10.15252/emmm.201404208.
This data.frame stores pre-computed scores of the TCGA dataset calculated
using the simpleScore() function against the mesenchymal gene signature from
Tan, Tuan Zea et al. Please refer to the vignettes for instructions on how to
obtain the full datasets.
scoredf_tcga_mes
An object of class data.frame with 1119 rows and 2 columns.
Tan, Tuan Zea, Qing Hao Miow, Yoshio Miki, Tetsuo Noda, Seiichi Mori, Ruby Yun-Ju Huang, and Jean Paul Thiery. 2014. “Epithelial-Mesenchymal Transition Spectrum Quantification and Its Efficacy in Deciphering Survival and Drug Responses of Cancer Patients.” EMBO Molecular Medicine 6 (10). Oxford, UK: BlackWell Publishing Ltd: 1279–93. doi:10.15252/emmm.201404208.
This function computes 'singscores' using an unmodified ranked gene expression
matrix obtained from the rankGenes() function and a gene set or a pair of
up-regulated and down-regulated gene sets. It returns a data.frame of scores
and dispersions for each sample. The gene sets can be in vector format or as
GeneSet objects (from the GSEABase package). If samples need to be scored
against a single gene set, the upSet argument should be used to pass the gene
set while the downSet argument is set to NULL. This setting is ideal for gene
sets representing gene ontologies where the nature of the genes is unknown
(up- or down-regulated).
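A rank-based score for a single up-regulated gene set can be sketched in base R as below. The helper sketchScore() is hypothetical, and the normalisation (mean rank rescaled between its smallest and largest attainable values, then shifted by 0.5 when centred) reflects a general reading of the rank-normalisation idea, assuming untied ranks 1..N; the package's exact computation, including how dispersion is scaled, may differ.

```r
# Hypothetical helper: score one sample from its ranks (not the package's code)
sketchScore <- function(ranks, geneSet, centerScore = TRUE) {
  nTotal <- length(ranks)                       # genes measured in the sample
  setRanks <- ranks[names(ranks) %in% geneSet]  # ranks of genes in the set
  nSet <- length(setRanks)
  # Smallest and largest mean rank a set of nSet genes could attain
  minMean <- (nSet + 1) / 2
  maxMean <- (2 * nTotal - nSet + 1) / 2
  score <- (mean(setRanks) - minMean) / (maxMean - minMean)
  if (centerScore) score <- score - 0.5
  # Dispersion taken here as the median absolute deviation of the raw ranks
  c(TotalScore = score, TotalDispersion = mad(setRanks))
}

ranks <- c(g1 = 1, g2 = 2, g3 = 3, g4 = 4, g5 = 5, g6 = 6)
top <- sketchScore(ranks, c("g5", "g6"))     # set at the top of the ranking
bottom <- sketchScore(ranks, c("g1", "g2"))  # set at the bottom of the ranking
rbind(top, bottom)
```

With centring, a set occupying the highest ranks scores 0.5 and one occupying the lowest ranks scores -0.5, so scores near 0 indicate no enrichment in either direction.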
simpleScore(rankData, upSet, downSet = NULL, subSamples = NULL,
  centerScore = TRUE, dispersionFun = mad, knownDirection = TRUE)

## S4 method for signature 'matrix,vector,missing'
simpleScore(rankData, upSet, downSet = NULL, subSamples = NULL,
  centerScore = TRUE, dispersionFun = mad, knownDirection = TRUE)

## S4 method for signature 'matrix,GeneSet,missing'
simpleScore(rankData, upSet, downSet = NULL, subSamples = NULL,
  centerScore = TRUE, dispersionFun = mad, knownDirection = TRUE)

## S4 method for signature 'matrix,vector,vector'
simpleScore(rankData, upSet, downSet = NULL, subSamples = NULL,
  centerScore = TRUE, dispersionFun = mad, knownDirection = TRUE)

## S4 method for signature 'matrix,GeneSet,GeneSet'
simpleScore(rankData, upSet, downSet = NULL, subSamples = NULL,
  centerScore = TRUE, dispersionFun = mad, knownDirection = TRUE)
rankData |
A matrix object, ranked gene expression matrix data generated
using the |
upSet |
A GeneSet object or character vector of gene IDs of the up-regulated gene set, or of a gene set where the direction of the genes is not known |
downSet |
A GeneSet object or character vector of gene IDs of the down-regulated gene set, or NULL where only a single gene set is provided |
subSamples |
A vector of sample labels/indices that will be used to subset the rankData matrix. All samples will be scored if not provided |
centerScore |
A Boolean, specifying whether scores should be centered
around 0, default as TRUE. Note: scores never centered if |
dispersionFun |
A function, dispersion function with default being mad |
knownDirection |
A Boolean, determining whether the gene set should be considered directional or not. A gene set is directional if the type of genes in it is known, i.e. up- or down-regulated. This should be set to TRUE if the gene set is composed of both up- AND down-regulated genes. Defaults to TRUE. This parameter becomes irrelevant when both upSet(Colc) and downSet(Colc) are provided. |
Signature scores can be computed using transcriptome-wide measurements or
using a smaller set of measurements. If ranks are computed using the default
invocation of rankGenes, the former method is applied, where the rank of each
gene in the signature is computed relative to all other genes in the dataset.
The accuracy of this approximation of the relative expression of a gene
improves if all or most transcripts are measured in the experiment. This was
the approach proposed in the original singscore manuscript (Foroutan M,
Bhuva DD, et al 2018).
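The transcriptome-wide idea described above can be illustrated with a minimal base R sketch. It is not the package's implementation: the helper `score_one_set()`, the toy expression values and the gene names are all hypothetical, and the normalisation (mean rank rescaled between its theoretical minimum and maximum, then shifted by 0.5 when centered) is an assumption based on the description in the cited paper.

```r
# Simplified single-sample, single-set score (illustrative, not package code).
# `ranks` is a named vector of ranks for one sample; higher rank = higher expression.
score_one_set <- function(ranks, gene_set, center = TRUE) {
  genes <- intersect(gene_set, names(ranks))  # ignore genes that were not measured
  n_total <- length(ranks)
  n_set <- length(genes)
  stopifnot(n_set > 0, n_set < n_total)

  mean_rank <- mean(ranks[genes])
  # Theoretical bounds of the mean rank for a set of this size
  min_mean <- (n_set + 1) / 2
  max_mean <- (2 * n_total - n_set + 1) / 2

  score <- (mean_rank - min_mean) / (max_mean - min_mean)
  if (center) score - 0.5 else score
}

# Hypothetical expression values for one sample with 10 genes
expr <- c(g1 = 9.1, g2 = 2.3, g3 = 7.8, g4 = 1.0, g5 = 5.5,
          g6 = 3.2, g7 = 8.4, g8 = 4.1, g9 = 6.6, g10 = 0.5)
ranks <- rank(expr)

high_set <- c("g1", "g7", "g3")  # the three most highly expressed genes
low_set <- c("g10", "g4", "g2")  # the three most lowly expressed genes

score_high <- score_one_set(ranks, high_set)
score_low <- score_one_set(ranks, low_set)
```

In this sketch a set made up of the top-ranked genes reaches the upper bound of the centered score and a set of the bottom-ranked genes reaches the lower bound, which is why measuring most of the transcriptome gives a more meaningful reference for each gene's rank.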
If instead a selected panel of genes is measured (such as from NanoString or
RT-qPCR), a different rank approximation method using a small set of stable
genes can be used. This approach only requires measurements of the genes in
the signature and a small set of stable genes. It can be invoked by producing
a rank matrix with the stableGenes argument of rankGenes. Stable genes in solid
cancers and in blood can be retrieved using getStableGenes. Upon providing a
set of stable genes, rankGenes automatically ranks all genes relative to these
stable genes. When simpleScore is provided with a rank matrix constructed using
stable genes, it automatically computes scores using a new approach. Details of
the set of stable genes, the new rank estimation approach and the new scoring
approach will soon be published (manuscript in preparation).
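Since the details of the stable-gene approach are not given here, the following base R sketch only illustrates one plausible way of ranking genes relative to a reference panel: each gene's rank in a sample is taken as the number of stable genes expressed below it. The helper `rank_vs_stable()`, the toy matrix and the gene names are hypothetical, and this counting rule is an assumption rather than the method used by rankGenes.

```r
# Illustrative only: rank each gene by how many stable genes it exceeds.
# `expr` is a genes x samples matrix with gene names as row names.
rank_vs_stable <- function(expr, stable_genes) {
  stable <- intersect(stable_genes, rownames(expr))
  stopifnot(length(stable) > 0)
  # For every sample (column), count the stable genes expressed below each gene
  apply(expr, 2, function(x) {
    vapply(x, function(v) sum(x[stable] < v), numeric(1))
  })
}

# Hypothetical panel: two signature genes (a, b) and two stable genes (s1, s2)
expr <- matrix(
  c(5, 1, 3, 4,
    2, 6, 3, 1),
  nrow = 4,
  dimnames = list(c("a", "b", "s1", "s2"), c("S1", "S2"))
)

stable_ranks <- rank_vs_stable(expr, stable_genes = c("s1", "s2"))
```

Under this assumption a gene's rank depends only on the stable genes measured in the same sample, so no transcriptome-wide measurements are needed.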
A data.frame consisting of singscores and dispersions for all samples
Foroutan, M., Bhuva, D. D., Lyu, R., Horan, K., Cursons, J., & Davis, M. J. (2018). Single sample scoring of molecular phenotypes. BMC bioinformatics, 19(1), 1-10.
rankGenes, getStableGenes, rank, "GeneSet"
ranked <- rankGenes(toy_expr_se)
scoredf <- simpleScore(ranked, upSet = toy_gs_up, downSet = toy_gs_dn)
# toy_gs_up is a GeneSet object, alternatively a vector of gene ids may also
# be supplied.
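The single gene set case described for simpleScore, where only upSet is supplied and the direction of the genes is unknown, might look like the sketch below. It assumes the singscore package is installed and that the bundled toy objects toy_expr_se and toy_gs_up are available once the package is loaded; the variable names `ranked` and `single_score` are illustrative.

```r
library(singscore)

# Rank genes within each sample of the bundled toy dataset
ranked <- rankGenes(toy_expr_se)

# Score a single gene set: downSet is left as NULL, and knownDirection = FALSE
# marks the set as non-directional (direction of the genes is unknown)
single_score <- simpleScore(ranked, upSet = toy_gs_up, knownDirection = FALSE)

# One row of scores and dispersions per sample
print(single_score)
```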
The package provides functions for calculating gene-set enrichment scores at a single-sample level using gene expression data. It includes functions to perform hypothesis testing and provides visualisations to enable diagnosis of scores and gene sets along with visualisations to enable exploration of results.
A microarray gene expression dataset originally obtained from the integrated
TGFb-EMT data published by Foroutan et al. (2017), using ComBat-corrected
values. tgfb_expr_10 is a subset of the integrated TGFb-EMT data consisting of
10 samples (4 TGFb-treated and 6 controls), each with expression values for
11900 genes.
tgfb_expr_10_se
A SummarizedExperiment object
Foroutan, Momeneh, Joseph Cursons, Soroor Hediyeh-Zadeh, Erik W Thompson, and Melissa J Davis. 2017. “A Transcriptional Program for Detecting Tgfbeta-Induced Emt in Cancer.” Molecular Cancer Research. American Association for Cancer Research. doi:10.1158/1541-7786.MCR-16-0313.
A GeneSet object that contains the down-regulated genes of the TGFb-induced EMT gene signature derived by Foroutan et al. (2017) using two meta-analysis techniques. The gene signature contains an up-regulated gene set (up-set) and a down-regulated gene set (down-set). Please refer to the vignettes for the steps to acquire the exact data object.
tgfb_gs_dn
A GeneSet object
Foroutan, Momeneh, Joseph Cursons, Soroor Hediyeh-Zadeh, Erik W Thompson, and Melissa J Davis. 2017. “A Transcriptional Program for Detecting Tgfbeta-Induced Emt in Cancer.” Molecular Cancer Research. American Association for Cancer Research. doi:10.1158/1541-7786.MCR-16-0313.
A GeneSet object that contains the up-regulated genes of the TGFb-induced EMT gene signature derived by Foroutan et al. (2017) using two meta-analysis techniques. The gene signature contains an up-regulated gene set (up-set) and a down-regulated gene set (down-set). Please refer to the vignettes for the steps to acquire the exact data object.
tgfb_gs_up
A GeneSet object
Foroutan, Momeneh, Joseph Cursons, Soroor Hediyeh-Zadeh, Erik W Thompson, and Melissa J Davis. 2017. “A Transcriptional Program for Detecting Tgfbeta-Induced Emt in Cancer.” Molecular Cancer Research. American Association for Cancer Research. doi:10.1158/1541-7786.MCR-16-0313.
A toy dataset consisting of 2 samples with the expression values of 20 genes. The data was created by sampling 2 samples and 20 genes from the dataset of Foroutan et al. (2017).
toy_expr_se
A SummarizedExperiment of 2 samples each with 20 genes
a control sample
a TGFb-treated sample
Foroutan, Momeneh, Joseph Cursons, Soroor Hediyeh-Zadeh, Erik W Thompson, and Melissa J Davis. 2017. “A Transcriptional Program for Detecting Tgfbeta-Induced Emt in Cancer.” Molecular Cancer Research. American Association for Cancer Research. doi:10.1158/1541-7786.MCR-16-0313.
A GeneSet object with 5 genes randomly selected from the toy dataset. These genes are independent of those in toy_gs_up
toy_gs_dn
A GSEABase::GeneSet object with 5 genes
"GeneSet", toy_expr_se, toy_gs_up
A GeneSet object with 5 genes randomly selected from the toy dataset. These genes are independent of those in toy_gs_dn
toy_gs_up
A GeneSet object with 5 genes
"GeneSet", toy_expr_se, toy_gs_dn