Title: | Interpretation of RNA-seq experiments through robust, efficient comparison to public databases |
---|---|
Description: | This package provides a novel method for interpreting new transcriptomic datasets through near-instantaneous comparison to public archives without high-performance computing requirements. Through the pre-computed index, users can identify public resources associated with their dataset, such as gene sets, MeSH terms, and publications. Functions to identify interpretable annotations and intuitive visualization options are implemented in this package. |
Authors: | Sehyun Oh [aut, cre], Levi Waldron [aut], Sean Davis [aut] |
Maintainer: | Sehyun Oh <[email protected]> |
License: | Artistic-2.0 |
Version: | 1.15.0 |
Built: | 2024-10-30 07:28:44 UTC |
Source: | https://github.com/bioc/GenomicSuperSignature |
This function finds the RAV with the highest validation score (including RAVs with negative silhouette width) for specified PC of the dataset and returns the top enriched pathways.
annotatePC( PCnum, val_all, RAVmodel, n = 5, scoreCutoff = 0.5, nesCutoff = NULL, simplify = TRUE, abs = FALSE, trimed_pathway_len = 45 )
PCnum |
A numeric vector. PC number of your dataset to retrieve
annotation results for. The vector can contain any integer number among
|
val_all |
The output from |
RAVmodel |
The RAVmodel used to generate the input for the argument,
|
n |
An integer. Default is 5. The number of top enriched pathways
to print out. If fewer than n pathways pass the cutoff, it will
print out |
scoreCutoff |
A numeric value for the minimum correlation between loadings of the dataset principal component and the RAV. Default is 0.5. |
nesCutoff |
A numeric value for the minimum Normalized Enrichment Score
(NES) for the enrichment analysis. Default is |
simplify |
A logical. Under default ( |
abs |
Default is |
trimed_pathway_len |
A positive integer giving the display width of pathway names. Default is 45. |
A data frame or a list, depending on the simplify argument. Check the output details above.
data(miniRAVmodel)
library(bcellViper)
data(bcellViper)
val_all <- validate(dset, miniRAVmodel)
annotatePC(2, val_all, miniRAVmodel)
Search the top enriched pathways for RAV
annotateRAV(RAVmodel, ind, n = 5, abs = FALSE)
RAVmodel |
PCAGenomicSignatures object. |
ind |
An integer. The index of the RAV whose enriched pathways you want to check. |
n |
The number of top enriched pathways to output. Default is 5. |
abs |
Default is |
A data frame with n rows and 4 columns: Description, NES, pvalue, and qvalues.
data(miniRAVmodel)
annotateRAV(miniRAVmodel, ind = 695)
List the available RAVmodels
availableRAVmodel(simplify = TRUE)
simplify |
Default is |
Under the default, this function will return a data frame with four columns: prior, version, update, and pkg_version.
prior : Different gene sets used for RAVmodel annotation. Currently, two are available: C2 for MSigDB C2 (curated gene sets), and PLIERpriors for bloodCellMarkersIRISDMAP, svmMarkers, and canonicalPathways
version : RAVmodel's version, which can be an input for the version argument of the getModel function
update : Date the RAVmodel was updated
pkg_version : Compatible version of GenomicSuperSignature
availableRAVmodel()
Calculate average loadings of each cluster
buildAvgLoading(dat, k, n = 20, cluster = NULL, study = TRUE)
dat |
A data frame. Each row represents a principal component from one of the training datasets. Columns are the genes used for PCA. |
k |
The number of clusters used for hierarchical clustering |
n |
The number of top principal components from each dataset used for model building. Default is 20. |
cluster |
Provide pre-defined cluster membership of your data. |
study |
Under default ( |
A named list of 6 elements is returned. It contains:
cluster : A numeric vector of the cluster membership of PCs
size : An integer vector of the cluster sizes
avgLoading : A matrix of average loadings. Columns are clusters and rows are genes
k : The number of clusters
n : The number of top PCs used for clustering
studies : A list of character vectors containing the studies in each cluster
data(miniAllZ)
data(res_hcut)
res <- buildAvgLoading(miniAllZ, k = 40, cluster = res_hcut$cluster)
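The core idea described above, averaging the loadings of all PCs that fall into the same cluster, can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: the toy matrix dat_toy, the choice of hclust with Euclidean distance, and every variable name are hypothetical.

```r
set.seed(1)

# Toy input (hypothetical): 6 PCs in rows, 4 genes in columns, as described for `dat`
dat_toy <- matrix(
  rnorm(24), nrow = 6,
  dimnames = list(paste0("study", rep(1:3, each = 2), ".PC", rep(1:2, 3)),
                  paste0("gene", 1:4))
)

# Assumption: cluster the PCs hierarchically and cut the tree into k clusters
k <- 2
cl <- cutree(hclust(dist(dat_toy)), k = k)

# Average the loadings of the PCs in each cluster: genes in rows, clusters in columns
avg_loading <- sapply(seq_len(k), function(i) {
  colMeans(dat_toy[cl == i, , drop = FALSE])
})
colnames(avg_loading) <- paste0("cluster", seq_len(k))

dim(avg_loading)  # genes x clusters
table(cl)         # number of PCs per cluster
```

Passing a pre-defined membership vector through the cluster argument, as in the example above, would correspond to skipping the clustering step and going straight to the averaging.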
Calculate the validation score for a new dataset
calculateScore(dataset, RAVmodel, rescale.after = TRUE)
dataset |
A gene expression profile to be validated. Different classes of objects can be used including ExpressionSet, SummarizedExperiment, RangedSummarizedExperiment, or matrix. Rownames (genes) should be in symbol format. If it is a matrix, genes should be in rows and samples in columns. |
RAVmodel |
PCAGenomicSignatures object. A matrix of average loadings, an
output from |
rescale.after |
Under the default ( |
A list containing the score matrices for input datasets. Scores are assigned to each sample (row) for each cluster (column).
data(miniRAVmodel)
library(bcellViper)
data(bcellViper)
score <- calculateScore(dset, miniRAVmodel)
data(miniTCGA)
score <- calculateScore(miniTCGA, miniRAVmodel)
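To make the shape of the returned score matrix concrete (samples in rows, RAVs in columns), the following base R sketch projects a toy expression matrix onto toy loadings over their shared genes. The scaling step and the plain matrix product are assumptions chosen for illustration; the package's actual scoring and rescaling may differ, and all object names are hypothetical.

```r
set.seed(2)

# Toy expression matrix (hypothetical): genes in rows, samples in columns
expr <- matrix(rnorm(50), nrow = 10,
               dimnames = list(paste0("gene", 1:10), paste0("sample", 1:5)))

# Toy average loadings (hypothetical): genes in rows, RAVs in columns
loadings <- matrix(rnorm(24), nrow = 8,
                   dimnames = list(paste0("gene", 3:10), paste0("RAV", 1:3)))

# Restrict both matrices to the genes they share
common <- intersect(rownames(expr), rownames(loadings))

# Assumption: center and scale each gene across samples before projecting
expr_scaled <- t(scale(t(expr[common, , drop = FALSE])))

# One score per sample (row) and RAV (column)
score_toy <- t(expr_scaled) %*% loadings[common, , drop = FALSE]

dim(score_toy)  # samples x RAVs
```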
Plot a word cloud using the remaining MeSH terms in the selected RAV after user-defined filtering.
drawWordcloud( RAVmodel, ind, rm.noise = NULL, scale = c(3, 0.5), weighted = TRUE, drop = NULL, filterMessage = TRUE )
RAVmodel |
PCAGenomicSignatures object |
ind |
The index of the RAV for which you want to draw a word cloud. |
rm.noise |
An integer. Under the default ( |
scale |
A |
weighted |
A logical. If |
drop |
A character vector containing MeSH terms to be excluded from word
cloud. Under the default ( |
filterMessage |
A logical. Under the default |
A word cloud with the MeSH terms associated with the given cluster.
data(miniRAVmodel)
drawWordcloud(miniRAVmodel, 1139)
MeSH terms to be excluded in drawWordcloud function
droplist
A character vector containing MeSH terms to be excluded.
Sehyun Oh [email protected]
Performs a principal components analysis on the given data matrix and returns the results as an object of class prcomp.
extractPC(x)
x |
a numeric or complex matrix (or data frame) which provides the gene expression data for the principal components analysis. Genes in the rows and samples in the columns. |
A prcomp object.
m = matrix(rnorm(100),ncol=5)
extractPC(m)
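For readers unfamiliar with prcomp objects, here is a small base R sketch of a PCA on a genes-by-samples matrix. It calls stats::prcomp directly; transposing the input so that samples become observations is an assumption about orientation, and any preprocessing extractPC may apply is not reproduced here.

```r
set.seed(3)

# Toy expression matrix (hypothetical): 20 genes in rows, 5 samples in columns
m_toy <- matrix(rnorm(100), ncol = 5,
                dimnames = list(paste0("gene", 1:20), paste0("sample", 1:5)))

# prcomp() treats rows as observations, so transpose to samples x genes (assumption)
pca <- prcomp(t(m_toy), center = TRUE, scale. = FALSE)

class(pca)         # a prcomp object
dim(pca$rotation)  # gene loadings: genes x PCs
summary(pca)$importance["Proportion of Variance", ]
```

The rotation element holds the per-gene loadings, which is the quantity compared against RAVs elsewhere in this manual.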
RAVs that will be flagged with quality-control messages in the output
filterList
A named list with four elements - "Cluster_Size_filter", "GSEA_C2_filter", "GSEA_PLIERpriors_filter", and "Redundancy_filter".
Sehyun Oh [email protected]
Once you provide the RAVmodel, the keyword you're searching for, and the RAV number to this function, it will give you the abs(NES)-based rank of your keyword in the enriched pathways of the target RAV. It can be useful to find out how uniquely your keyword-containing pathways are represented.
findKeywordInRAV(RAVmodel, keyword, ind, n = NULL, includeTotal = TRUE)
RAVmodel |
PCAGenomicSignatures-object. |
keyword |
A character vector. If you are searching for multiple keywords
at the same time, use |
ind |
An integer. The RAV number you want to check. |
n |
An integer. The number of top enriched pathways (based on abs(NES))
to search. Under default ( |
includeTotal |
Under the default condition ( |
A character containing the rank of keyword-containing pathways (separated by |), followed by the total number of enriched pathways in parenthesis.
data(miniRAVmodel)
findKeywordInRAV(miniRAVmodel, "Bcell", ind = 695)
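The abs(NES)-based ranking described above can be sketched in base R on a toy enrichment table. The pathway names, NES values, and the exact layout of the output string are hypothetical; only the idea of ranking by abs(NES) and reporting the ranks of keyword-containing pathways comes from the description.

```r
# Toy enrichment table (hypothetical pathway names and NES values)
gsea_toy <- data.frame(
  Description = c("PATHWAY_BCELL_A", "PATHWAY_TCELL", "PATHWAY_BCELL_B", "PATHWAY_OTHER"),
  NES = c(-2.5, 3.1, 1.2, -0.4),
  stringsAsFactors = FALSE
)

# Order pathways by decreasing abs(NES), then find the ranks that match the keyword
ranked <- gsea_toy[order(abs(gsea_toy$NES), decreasing = TRUE), ]
hits <- grep("Bcell", ranked$Description, ignore.case = TRUE)

# Ranks separated by "|", followed by the total number of pathways in parentheses
rank_string <- paste0(paste(hits, collapse = "|"), " (out of ", nrow(ranked), ")")
rank_string
```

Low rank numbers concentrated at the front of the string would indicate that the keyword-containing pathways are among the most strongly enriched for that RAV.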
This function finds RAVs containing the keyword you provide. If you provide "the number of keyword-containing pathways per RAV" in argument k, it will give you the RAV number.
findSignature(RAVmodel, keyword, n = 5, k = NULL)
RAVmodel |
PCAGenomicSignatures-object |
keyword |
A character vector. If you are searching for multiple keywords
at the same time, use |
n |
The number of top-ranked (based on abs(NES)) pathways in which to search for your keyword |
k |
The number of keyword-containing pathways you want to get the RAV
number. Under default ( |
A data frame or integer vector depending on the parameter k.
data(miniRAVmodel)
findSignature(miniRAVmodel, "Bcell")
findSignature(miniRAVmodel, "Bcell", k = 5)
Find the studies contributing to each RAV
findStudiesInCluster(RAVmodel, ind = NULL, studyTitle = FALSE)
RAVmodel |
PCAGenomicSignatures object. |
ind |
A numeric vector containing the RAV indexes. Under the default
( |
studyTitle |
Default is |
A list of character vectors. Under the default condition (ind = NULL), all the RAVs will be checked for their contributing studies and the length of the list will be the same as the number of RAVs (= metadata(x)$k). If you provide the ind argument, only the studies associated with the specified RAVs will be returned.
Mainly used for model building, within buildAvgLoading.
data(miniRAVmodel)
findStudiesInCluster(miniRAVmodel, 1076)
GenomicSignatures is a virtual class inherited from SummarizedExperiment that hosts GenomicSignatures models built from different dimension reduction methods. Currently, a PCA-based model, called PCAGenomicSignatures, is available.
x |
A |
value |
See details. |
GenomicSignatures
GenomicSignatures object
The default contents of the GenomicSignatures object, with a set of getter and setter generic functions, which extract either the assay, colData, or metadata slots of a GenomicSignatures-class object. When you create this object, colData$studies should be populated before adding any information in the trainingData slot.
## S4 method for signature 'GenomicSignatures'
RAVindex(x)
## S4 method for signature 'GenomicSignatures'
geneSets(x)
## S4 method for signature 'GenomicSignatures'
updateNote(x)
## S4 method for signature 'GenomicSignatures'
version(x)
## S4 replacement method for signature 'GenomicSignatures'
geneSets(x) <- value
## S4 replacement method for signature 'GenomicSignatures'
updateNote(x) <- value
assay(x) : RAVindex (= avgLoadings) containing genes x RAVs
metadata(x) : Metadata associated with RAVindex building process
colData(x) : Information on RAVs
A GenomicSignatures object for the constructor
Setter method values (i.e., function(x) <- value):
metadata<- : Assign metadata
colData<- : Assign extra information associated with RAVs
geneSets<- : A character vector containing the name of gene sets used to annotate average loadings
updateNote<- : A character vector. Describes the main feature of a model construction
RAVindex : Equivalent to assays(x)$RAVindex
geneSets : Access the metadata(x)$geneSets slot
updateNote : Access the metadata(x)$updateNote slot
version : Access the metadata(x)$version slot
data(miniRAVmodel)
miniRAVmodel
Download a PCAGenomicSignatures model
getModel(prior = c("C2", "PLIERpriors"), version = "latest", load = TRUE)
prior |
The name of gene sets used to annotate PCAGenomicSignatures. Currently there are two available options.
|
version |
Default is |
load |
Default is |
File cache location or PCAGenomicSignatures object loaded from it.
z = getModel("C2")
Extract information on a specific RAV
getRAVInfo(RAVmodel, ind)
RAVmodel |
A PCAGenomicSignatures object |
ind |
An index of RAV |
A list with four elements: clusterSize, silhouetteWidth, enrichedPathways (the number of enriched pathways), and members. The 'members' is the summary table of PCs in RAV, containing three columns: studyName, PC, and Variance explained (
data(miniRAVmodel)
getRAVInfo(miniRAVmodel, ind = 438)
Extract information on a specific training dataset
getStudyInfo(RAVmodel, study)
RAVmodel |
A PCAGenomicSignatures object |
study |
A character for SRA study accession. |
A list with three elements: studyTitle, studySize (the number of samples from this study used in the RAVmodel building), and RAVs. 'RAVs' is a data frame with three columns - PC (1 to 20), RAV (RAV that the given PC belongs to), and Variance explained ( miniRAVmodel, which doesn't have all the PCA summary information, so the example will return only the two PCs of the study instead of all twenty.
data(miniRAVmodel)
getStudyInfo(miniRAVmodel, "SRP028155")
This function subsets validate outputs with different criteria and visualizes them in a heatmap-like table.
heatmapTable( val_all, RAVmodel, ind = NULL, num.out = 5, scoreCutoff = NULL, swCutoff = NULL, clsizeCutoff = NULL, breaks = c(0, 0.5, 1), colors = c("white", "white smoke", "red"), column_title = NULL, row_title = NULL, whichPC = NULL, filterMessage = TRUE, ... )
val_all |
An output matrix from |
RAVmodel |
PCAGenomicSignatures-class object. RAVmodel used to prepare
|
ind |
An integer vector. If this parameter is provided, the other
parameters, |
num.out |
A number of highly validated RAVs to output. Default is 5.
If any of the cutoff parameters are provided, |
scoreCutoff |
A numeric value for the minimum correlation (exclusive).
If |
swCutoff |
A numeric value for the minimum average silhouette width. |
clsizeCutoff |
An integer value for the minimum cluster size. |
breaks |
A numeric vector of length 3. Number represents the values
assigned to three colors. Default is |
colors |
A character vector of length 3. Each represents the color
assigned to three breaks. Default is |
column_title |
A character string. Provide the column title. |
row_title |
A character string. Provide the row title. |
whichPC |
An integer value between 1 and 8. PC number of your data to
check the validated signatures with. Under the default ( |
filterMessage |
A logical. Under the default |
... |
any additional argument for |
A heatmap displaying the subset of the validation result that met the given cutoff criteria. If the val_all input is from a single dataset, the output heatmap will contain both the score and the average silhouette width for each cluster.
If the val_all input is from multiple studies, the output heatmap's rows will represent each study and the columns will be RAVs that meet scoreCutoff for any of the input studies.
data(miniRAVmodel)
library(bcellViper)
data(bcellViper)

## Single dataset
val_all <- validate(dset, miniRAVmodel)
heatmapTable(val_all, miniRAVmodel, swCutoff = 0)

## A list of datasets
val_all2 <- validate(miniTCGA, miniRAVmodel)
heatmapTable(val_all2, miniRAVmodel)
Build a two-column word/frequency table
meshTable( RAVmodel, ind, rm.noise = NULL, weighted = TRUE, filterMessage = TRUE )
RAVmodel |
A PCAGenomicSignatures object |
ind |
An index of RAV |
rm.noise |
An integer. Under the default ( |
weighted |
A logical. If |
filterMessage |
A logical. Under the default |
A table with two columns, word and freq. MeSH terms in the defined RAV (by the ind argument) are ordered based on their frequency.
data(miniRAVmodel)
meshTable(miniRAVmodel,1139)
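The two-column word/freq layout can be illustrated with a base R sketch that counts toy MeSH terms and orders them by frequency. The terms are hypothetical, and the sketch uses plain counts only; whatever weighting the weighted argument applies is not reproduced.

```r
# Toy MeSH terms collected from the studies in one RAV (hypothetical)
terms <- c("Neoplasms", "B-Lymphocytes", "Neoplasms", "Humans", "Neoplasms", "Humans")

# Count each term and sort by decreasing frequency
counts <- sort(table(terms), decreasing = TRUE)

# Two-column table with the same column names as described above
mesh_toy <- data.frame(word = names(counts), freq = as.integer(counts),
                       stringsAsFactors = FALSE)
mesh_toy
```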
Eight colorectal cancer microarray datasets were used to build RAVmodel and the intermediate file containing genes and top PCs from each dataset is named allZ. The hierarchical clustering result of allZ is saved as res_hcut. For demonstration, we subset the allZ matrix to the first 100 genes, which is named miniAllZ.
miniAllZ
A matrix with 100 genes and 160 PCs from 8 training datasets.
Sehyun Oh [email protected]
https://github.com/shbrief/model_building/tree/main/RAVmodel_8CRC
An object providing a miniature version of RAVmodel_C2 (PCAGenomicSignatures object constructed from 536 studies and annotated with MSigDB C2).
miniRAVmodel
PCAGenomicSignatures
Sehyun Oh [email protected]
TCGA-COAD and TCGA-BRCA RNA sequencing data were acquired using GSEABenchmarkeR::loadEData and log-transformed. Conversion from EntrezID to gene symbol was done with EnrichmentBrowser::idMap. Only 8 samples from each dataset are kept.
miniTCGA
A list containing two SummarizedExperiment objects.
Sehyun Oh [email protected]
PCAGenomicSignatures object
The default contents of the PCAGenomicSignatures object, with a set of accessor and setter generic functions, which extract either the assay, colData, metadata, or trainingData slots of a PCAGenomicSignatures-class object. When you create this object, colData$studies should be populated before adding any information in the trainingData slot.
PCAGenomicSignatures(..., trainingData)
... |
Additional arguments for supporting functions. |
trainingData |
A |
RAVindex(x) : RAVindex (= avgLoadings) containing genes x RAVs
metadata(x)$cluster : A vector of integers (from 1:k) indicating the cluster to which each point is allocated.
metadata(x)$size : The number of PCs in each cluster.
metadata(x)$k : The number of RAVs.
metadata(x)$n : The number of top PCs from each dataset.
metadata(x)$geneSets : Name of the prior gene sets used to annotate average loadings.
colData(x)$studies : A list of character vectors containing studies contributing to each PC cluster.
colData(x)$silhouetteWidth : A numeric array of average silhouette widths of each cluster
colData(x)$gsea : A list of data frames. Each element is a subset of outputs from the clusterProfiler::GSEA function.
PCAGenomicSignatures object with multiple setters or accessors
trainingData : A DataFrame class object for metadata associated with training data
Setter method values (i.e., function(x) <- value):
geneSets<- : A character vector containing the name of gene sets used to annotate average loadings
studies<- : A list of character vectors containing studies contributing to each PC cluster
gsea<- : A list of data frames. Each element is a subset of output from gseaResult objects.
metadata<- : A list object of metadata
'$<-' : A vector to replace the indicated column in colData
All the accessors inherited from SummarizedExperiment are available and the additional accessors for PCAGenomicSignatures-specific data are listed below.
RAVindex : Equivalent to assay(x)
geneSets : Access the metadata(x)$geneSets slot
studies : Access the colData(x)$studies slot
gsea : Access the colData(x)$gsea
'$' : Access a column in colData
trainingData : Access the trainingData slot
mesh : Access the trainingData(x)$MeSH slot
PCAsummary : Access the trainingData(x)$PCAsummary slot
data(miniRAVmodel)
miniRAVmodel
PCA-based GenomicSignatures-class.
x |
A |
value |
See details. |
PCAGenomicSignatures
trainingData : A DataFrame class object for metadata associated with training data
data(miniRAVmodel)
miniRAVmodel
PCAGenomicSignatures object
The default contents of the PCAGenomicSignatures object, with a set of accessor and setter generic functions, which extract either the assay, colData, metadata, or trainingData slots of a PCAGenomicSignatures-class object. When you create this object, colData$studies should be populated before adding any information in the trainingData slot.
## S4 replacement method for signature 'PCAGenomicSignatures'
studies(x) <- value
## S4 replacement method for signature 'PCAGenomicSignatures'
silhouetteWidth(x) <- value
## S4 replacement method for signature 'PCAGenomicSignatures'
gsea(x) <- value
## S4 replacement method for signature 'PCAGenomicSignatures'
trainingData(x) <- value
## S4 replacement method for signature 'PCAGenomicSignatures'
mesh(x) <- value
## S4 replacement method for signature 'PCAGenomicSignatures'
PCAsummary(x) <- value
## S4 method for signature 'PCAGenomicSignatures'
studies(x)
## S4 method for signature 'PCAGenomicSignatures'
silhouetteWidth(x)
## S4 method for signature 'PCAGenomicSignatures'
gsea(x)
## S4 method for signature 'PCAGenomicSignatures'
trainingData(x)
## S4 method for signature 'PCAGenomicSignatures'
mesh(x)
## S4 method for signature 'PCAGenomicSignatures'
PCAsummary(x)
## S4 method for signature 'PCAGenomicSignatures'
show(object)
value |
See details. |
object , x
|
A |
RAVindex(x) : RAVindex (= avgLoadings) containing genes x RAVs
metadata(x)$cluster : A vector of integers (from 1:k) indicating the cluster to which each PC is allocated.
metadata(x)$size : The number of PCs in each cluster.
metadata(x)$k : The number of RAVs.
metadata(x)$n : The number of top PCs from each dataset.
metadata(x)$geneSets : Name of the prior gene sets used to annotate average loadings.
colData(x)$studies : A list of character vectors containing studies contributing to each PC cluster.
colData(x)$gsea : A list of data frames. Each element is a subset of outputs from the clusterProfiler::GSEA function.
PCAGenomicSignatures object with multiple setters or accessors
trainingData : A DataFrame class object for metadata associated with training data
Setter method values (i.e., function(x) <- value):
geneSets<- : A character vector containing the name of gene sets used to annotate average loadings
studies<- : A list of character vectors containing studies contributing to each PC cluster
gsea<- : A list of gseaResult objects.
metadata<- : A list object of metadata
'$<-' : A vector to replace the indicated column in colData
All the accessors inherited from SummarizedExperiment are available and the additional accessors for PCAGenomicSignatures-specific data are listed below.
RAVindex : Equivalent to assay(x)
geneSets : Access the metadata(x)$geneSets slot
studies : Access the colData(x)$studies slot
gsea : Access the colData(x)$gsea
'$' : Access a column in colData
trainingData : Access the trainingData slot
mesh : Access the trainingData(x)$MeSH slot
PCAsummary : Access the trainingData(x)$PCAsummary slot
data(miniRAVmodel)
miniRAVmodel
An RAV model contains clusters of PCs from individual studies. This function extracts the names of the original PCs from the RAV model, given the index of the RAV.
PCinRAV(RAVmodel, ind)
RAVmodel |
A PCAGenomicSignatures object |
ind |
An index of RAV |
A character vector of PC/study names
data(miniRAVmodel)
PCinRAV(miniRAVmodel,695)
Two-dimensional PCA plot with the PC annotation
plotAnnotatedPCA( dataset, RAVmodel, PCnum, val_all = NULL, scoreCutoff = 0.5, nesCutoff = NULL, color_by = NULL, color_lab = NULL, trimed_pathway_len = 45 )
dataset |
A gene expression profile to be validated. Different classes of objects can be used including ExpressionSet, SummarizedExperiment, RangedSummarizedExperiment, or matrix. Rownames (genes) should be in symbol format. If it is a matrix, genes should be in rows and samples in columns. |
RAVmodel |
PCAGenomicSignatures-class object |
PCnum |
A numeric vector of length 2. The values should be between 1 and 8. |
val_all |
The output from |
scoreCutoff |
A numeric value for the minimum correlation. Default 0.5. |
nesCutoff |
A numeric value for the minimum NES. Default is |
color_by |
A named vector with the feature you want to color by. Names should match the sample names of the dataset. |
color_lab |
A name for color legend. If this argument is not provided, the color legend will be labeled as "Color By" by default. |
trimed_pathway_len |
A positive integer giving the display width of pathway names. Default is 45. |
A scatter plot and a table with the annotation. If an enriched pathway didn't pass the scoreCutoff, the table will be labeled as "No significant pathways". If any enriched pathway didn't pass the nesCutoff, it will be labeled as NA.
data(miniRAVmodel)
library(bcellViper)
data(bcellViper)
## Not run:
plotAnnotatedPCA(exprs(dset), miniRAVmodel, PCnum = c(1,2))
## End(Not run)
The graph conveys four main pieces of information:
x-axis : Pearson correlation coefficient. A higher value means that the test dataset and the RAV are more tightly associated.
y-axis : Silhouette width representing the quality of RAVs.
size : The number of studies in each RAV (= cluster size).
color : The PC number of the test dataset that validates each RAV. Because the top 8 PCs of the test dataset are used, there are 8 categories.
plotValidate( val_all, minClusterSize = 2, swFilter = FALSE, minSilhouetteWidth = 0, interactive = FALSE, minClSize = NULL, maxClSize = NULL, colorPalette = "Dark2" )
val_all |
Output from validate function. |
minClusterSize |
The minimum size of clusters to be included in the plotting. Default value is 2, so any single-element clusters are excluded. |
swFilter |
If |
minSilhouetteWidth |
A minimum average silhouette width to be plotted.
Only effective under |
interactive |
If set to |
minClSize |
The minimum number of PCs in the clusters you want. |
maxClSize |
The maximum number of PCs in the clusters you want. |
colorPalette |
Default is |
a ggplot object
data(miniRAVmodel)
library(bcellViper)
data(bcellViper)
val_all <- validate(dset, miniRAVmodel)
plotValidate(val_all)
Eight colorectal cancer microarray datasets were used to build RAVmodel and the intermediate file containing genes and top PCs from each dataset is named allZ. The hierarchical clustering result of allZ is saved as res_hcut.
res_hcut
hclust object from the factoextra::hcut function.
Sehyun Oh [email protected]
Remove rows with missing and Inf values from a matrix
rmNaInf(x)
x |
A numeric matrix. |
The updated input matrix where rows with NA and Inf values are removed.
m = matrix(rnorm(100),ncol=10)
m[1,1] = NA
m1 = rmNaInf(m)
dim(m1)
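The row filtering described above is easy to reproduce in base R, which may help when checking what the function is expected to drop. This is a sketch of the technique rather than the package's code; the variable names are hypothetical.

```r
set.seed(4)
m_toy <- matrix(rnorm(100), ncol = 10)
m_toy[1, 1] <- NA
m_toy[2, 3] <- Inf

# is.finite() is FALSE for NA, NaN, Inf, and -Inf, so a row is kept only
# when it has no non-finite entry
keep <- rowSums(!is.finite(m_toy)) == 0
m_clean <- m_toy[keep, , drop = FALSE]

dim(m_clean)
```

Using drop = FALSE keeps the result a matrix even if only one row survives the filter.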
Plot heatmap of the sample scores
sampleScoreHeatmap( score, dataName, modelName, cluster_rows = TRUE, cluster_columns = TRUE, show_row_names = TRUE, show_column_names = TRUE, row_names_gp = 0.7, column_names_gp = 5, ... )
score |
An output from |
dataName |
Title on the row. The name of the dataset to be scored. |
modelName |
Title on the column. The RAVmodel used for scoring. |
cluster_rows |
A logical. Under the default ( |
cluster_columns |
A logical. Under the default ( |
show_row_names |
Whether to show row names. Default is |
show_column_names |
Whether to show column names. Default is |
row_names_gp |
Graphic parameters for row names. The default is 0.7. |
column_names_gp |
Graphic parameters for column names. The default is 5. |
... |
Any additional argument for |
A heatmap of the sample score. Rows represent samples and columns represent RAVs.
data(miniRAVmodel)
library(bcellViper)
data(bcellViper)
score <- calculateScore(dset, miniRAVmodel)
sampleScoreHeatmap(score, dataName="bcellViper", modelName="miniRAVmodel")
Subset enriched pathways of RAV
subsetEnrichedPathways( RAVmodel, ind = NULL, n = 10, both = FALSE, include_nes = FALSE )
RAVmodel |
PCAGenomicSignatures object. Also an output from
|
ind |
A numeric vector containing the RAV number you want to check enriched pathways. If not specified, this function returns results from all the RAVs. |
n |
The number of top and bottom pathways to be selected based on normalized enrichment score (NES). |
both |
Default is |
include_nes |
Default is |
A DataFrame with the top and bottom n pathways from the enrichment results.
data(miniRAVmodel)
# all RAVs in model
subsetEnrichedPathways(miniRAVmodel,n=5)
# only a specific RAV (note the colnames above)
subsetEnrichedPathways(miniRAVmodel,ind=695,n=5)
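Selecting the top and bottom n pathways by NES can be sketched in base R as follows. The toy table and its values are hypothetical, and the sketch says nothing about how the package lays out its DataFrame columns.

```r
# Toy enrichment table (hypothetical pathway names and NES values)
gsea_toy <- data.frame(
  Description = paste0("PATHWAY_", LETTERS[1:6]),
  NES = c(2.1, -1.8, 0.7, -2.6, 1.4, -0.3),
  stringsAsFactors = FALSE
)

n <- 2
ord <- order(gsea_toy$NES, decreasing = TRUE)

# Highest NES first
top_n <- gsea_toy[head(ord, n), ]

# Lowest NES first (most negative at the top)
bottom_n <- gsea_toy[rev(tail(ord, n)), ]

top_n$Description
bottom_n$Description
```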
Validate new datasets
validate( dataset, RAVmodel, method = "pearson", maxFrom = "PC", level = "max", scale = FALSE )
dataset |
Single or a named list of SummarizedExperiment (RangedSummarizedExperiment, ExpressionSet or matrix) object(s). Gene names should be in 'symbol' format. Currently, each dataset should have at least 8 samples. |
RAVmodel |
PCAGenomicSignatures object. |
method |
A character string indicating which correlation coefficient is to be computed. One of "pearson" (default), "kendall", or "spearman": can be abbreviated. |
maxFrom |
Select whether to display the maximum value from dataset's PCs
or avgLoadings. Under the default ( |
level |
Output format of validated result. Two options are available:
|
scale |
Default is |
A data frame containing the maximum Pearson correlation coefficient between the top 8 PCs of the dataset and the pre-calculated average loadings (in rows) of the training datasets (score column). It also contains other metadata associated with each RAV: PC for the one of the top 8 PCs of the dataset that results in the given score, sw for the average silhouette width of the RAV, and cl_size for the size of each RAV.
If the input for the dataset argument is a list of different datasets, each row of the output represents a new dataset for test, and each column represents a cluster from the training datasets. If level = "all", the output is a list containing the matrices of the Pearson correlation coefficients between all top 8 PCs of the datasets and avgLoading.
data(miniRAVmodel)
library(bcellViper)
data(bcellViper)
validate(dset, miniRAVmodel)
validate(dset, miniRAVmodel, maxFrom = "avgLoading")
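The score and PC columns described above can be illustrated with a base R sketch that correlates toy PC loadings with toy average loadings and keeps, for each RAV, the best-matching PC. Taking the absolute value of the correlation is an assumption made here, and all object names are hypothetical; the sw and cl_size columns come from the model and are not computed in this sketch.

```r
set.seed(5)
genes <- paste0("gene", 1:50)

# Toy loadings of the top 8 PCs of a new dataset (hypothetical): genes x PCs
pc_loadings <- matrix(rnorm(50 * 8), nrow = 50,
                      dimnames = list(genes, paste0("PC", 1:8)))

# Toy average loadings of 4 RAVs (hypothetical): genes x RAVs
avg_loadings <- matrix(rnorm(50 * 4), nrow = 50,
                       dimnames = list(genes, paste0("RAV", 1:4)))

# Pearson correlation between every PC (rows) and every RAV (columns);
# the absolute value is an assumption of this sketch
cor_mat <- abs(cor(pc_loadings, avg_loadings, method = "pearson"))

# For each RAV, keep the highest correlation and the PC that produced it
val_toy <- data.frame(
  score = apply(cor_mat, 2, max),
  PC = apply(cor_mat, 2, which.max),
  row.names = colnames(cor_mat)
)
val_toy
```

With level = "all", the analogue of the full cor_mat would be returned instead of the per-RAV maximum.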
Validation result in data frame
validatedSignatures( val_all, RAVmodel, num.out = 5, scoreCutoff = NULL, swCutoff = NULL, clsizeCutoff = NULL, indexOnly = FALSE, whichPC = NULL, filterMessage = TRUE )
val_all |
An output matrix from |
RAVmodel |
PCAGenomicSignatures-class object. RAVmodel used to prepare
|
num.out |
A number of highly validated RAVs to output. Default is 5.
If any of the cutoff parameters are provided, |
scoreCutoff |
A numeric value for the minimum correlation. For multi-studies case, the default is 0.7. |
swCutoff |
A numeric value for the minimum average silhouette width. |
clsizeCutoff |
An integer value for the minimum cluster size. |
indexOnly |
A logical. Under the default (= |
whichPC |
An integer value between 1 and 8. PC number of your data to
check the validated signatures with. Under the default ( |
filterMessage |
A logical. Under the default |
A subset of the input matrix, which meets the given condition.
data(miniRAVmodel)
library(bcellViper)
data(bcellViper)
val_all <- validate(dset, miniRAVmodel)
validatedSignatures(val_all, miniRAVmodel, num.out = 3, scoreCutoff = 0)
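Subsetting a validation table by the three cutoffs can be sketched in base R. The toy values are hypothetical, the column names follow the validate output described earlier (score, PC, sw, cl_size), and whether each cutoff is strict or inclusive is an assumption of this sketch.

```r
# Toy validation result (hypothetical values), one row per RAV
val_toy <- data.frame(
  score = c(0.82, 0.35, 0.67, 0.91),
  PC = c(1, 3, 2, 1),
  sw = c(0.12, -0.05, 0.30, -0.02),
  cl_size = c(5, 2, 9, 3),
  row.names = paste0("RAV", 1:4)
)

# Hypothetical cutoffs standing in for scoreCutoff, swCutoff, and clsizeCutoff
score_cutoff <- 0.5
sw_cutoff <- 0
clsize_cutoff <- 3

# Keep the RAVs that pass every cutoff, then order by decreasing score
keep <- val_toy$score > score_cutoff &
  val_toy$sw > sw_cutoff &
  val_toy$cl_size >= clsize_cutoff
res <- val_toy[keep, ]
res <- res[order(res$score, decreasing = TRUE), ]
res
```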