Title: | Integrative Pathway Analysis with Modern PCA Methodology and Gene Selection |
---|---|
Description: | pathwayPCA is an integrative analysis tool that implements the principal component analysis (PCA) based pathway analysis approaches described in Chen et al. (2008), Chen et al. (2010), and Chen (2011). pathwayPCA allows users to: (1) Test pathway association with binary, continuous, or survival phenotypes. (2) Extract relevant genes in the pathways using the SuperPCA and AES-PCA approaches. (3) Compute principal components (PCs) based on the selected genes. These estimated latent variables represent pathway activities for individual subjects, which can then be used to perform integrative pathway analysis, such as multi-omics analysis. (4) Extract relevant genes that drive pathway significance as well as data corresponding to these relevant genes for additional in-depth analysis. (5) Perform analyses with enhanced computational efficiency with parallel computing and enhanced data safety with S4-class data objects. (6) Analyze studies with complex experimental designs, with multiple covariates, and with interaction effects, e.g., testing whether pathway association with clinical phenotype is different between male and female subjects. Citations: Chen et al. (2008) <https://doi.org/10.1093/bioinformatics/btn458>; Chen et al. (2010) <https://doi.org/10.1002/gepi.20532>; and Chen (2011) <https://doi.org/10.2202/1544-6115.1697>. |
Authors: | Gabriel Odom [aut, cre], James Ban [aut], Lizhong Liu [aut], Lily Wang [aut], Steven Chen [aut] |
Maintainer: | Gabriel Odom <[email protected]> |
License: | GPL-3 |
Version: | 1.23.0 |
Built: | 2024-11-29 06:59:16 UTC |
Source: | https://github.com/bioc/pathwayPCA |
A function to perform adaptive, elastic-net, sparse principal component analysis (AES-PCA).
aespca(X, d = 1, max.iter = 10, eps.conv = 0.001, adaptive = TRUE, para = NULL)
X |
A pathway design matrix: the data matrix should be |
d |
The number of principal components (PCs) to extract from the pathway. Defaults to 1. |
max.iter |
The maximum number of times an internal |
eps.conv |
A numerical convergence threshold for the same |
adaptive |
Internal argument of the |
para |
Internal argument of the |
This function calculates the loadings and reduced-dimension predictor matrix using both the Singular Value Decomposition and AES-PCA Decomposition (as described in Efron et al. (2003)) of the data matrix. Note that, if the number of features in the pathway exceeds the number of samples, this decomposition will be an approximation; also, the internal lars.lsa function may require more computing time than usual to converge (which is one of the reasons why, in practice, we usually remove pathways that have more than 200-300 features). See https://web.stanford.edu/~hastie/Papers/LARS/LeastAngle_2002.pdf.
For potential enhancement details, see the comment in the "Details" section of normalize.
A list of four elements containing the loadings and projected predictors:
aesLoad: A projection matrix of the AES-PCs.
oldLoad: A projection matrix of the PCs from the singular value decomposition (SVD).
aesScore: A predictor matrix: the original observations loaded onto the AES-PCs.
oldScore: A predictor matrix: the original observations loaded onto the SVD-PCs.
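The SVD half of this output can be illustrated with base R alone. The sketch below is not the package's implementation: it assumes a small simulated matrix (`X_mat`, a hypothetical name) with samples in rows, and it only reproduces the idea behind `oldLoad` and `oldScore`. The AES-PCA step itself (the `lars.lsa` iteration) is not shown.

```r
# Simulated pathway design matrix: 20 samples (rows) by 5 features (columns).
# All object names here are illustrative and not part of pathwayPCA.
set.seed(1)
X_mat <- matrix(rnorm(20 * 5), nrow = 20, ncol = 5)
colnames(X_mat) <- paste0("gene", 1:5)

# Centre each feature before decomposing
X_centred <- scale(X_mat, center = TRUE, scale = FALSE)

d <- 1  # number of PCs to keep, mirroring the `d` argument
svd_res <- svd(X_centred)

# Loadings: the first d right singular vectors (features x d)
oldLoad_sketch <- svd_res$v[, seq_len(d), drop = FALSE]
rownames(oldLoad_sketch) <- colnames(X_mat)

# Scores: the observations projected onto those loadings (samples x d)
oldScore_sketch <- X_centred %*% oldLoad_sketch

dim(oldLoad_sketch)
dim(oldScore_sketch)
```

The loading matrix has one row per feature and the score matrix one row per sample, which is the shape relationship the four list elements above describe.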
normalize; lars.lsa; ExtractAESPCs; AESPCA_pVals
# DO NOT CALL THIS FUNCTION DIRECTLY.
# Call this function through AESPCA_pVals() instead.
## Not run:
data("colonSurv_df")
aespca(as.matrix(colonSurv_df[, 5:50]))
## End(Not run)
Given a supervised OmicsPath object (one of OmicsSurv, OmicsReg, or OmicsCateg), extract the first adaptive, elastic-net, sparse principal components (PCs) from each pathway-subset of the features in the -Omics assay design matrix, test their association with the response matrix, and return a data frame of the adjusted p-values for each pathway.
AESPCA_pVals(
  object,
  numPCs = 1,
  numReps = 0L,
  parallel = FALSE,
  numCores = NULL,
  asPCA = FALSE,
  adjustpValues = TRUE,
  adjustment = c("Bonferroni", "Holm", "Hochberg", "SidakSS", "SidakSD", "BH", "BY", "ABH", "TSBH"),
  ...
)

## S4 method for signature 'OmicsPathway'
AESPCA_pVals(
  object,
  numPCs = 1,
  numReps = 1000,
  parallel = FALSE,
  numCores = NULL,
  asPCA = FALSE,
  adjustpValues = TRUE,
  adjustment = c("Bonferroni", "Holm", "Hochberg", "SidakSS", "SidakSD", "BH", "BY", "ABH", "TSBH"),
  ...
)
object |
An object of class |
numPCs |
The number of PCs to extract from each pathway. Defaults to 1. |
numReps |
How many permutations to estimate the |
parallel |
Should the computation be completed in parallel? Defaults to
|
numCores |
If |
asPCA |
Should the computation return the eigenvectors and eigenvalues
instead of the adaptive, elastic-net, sparse principal components and their
corresponding loadings. Defaults to |
adjustpValues |
Should you adjust the |
adjustment |
Character vector of procedures. The returned data frame
will be sorted in ascending order by the first procedure in this vector,
with ties broken by the unadjusted |
... |
Dots for additional internal arguments. |
This is a wrapper function for the ExtractAESPCs, PermTestSurv, PermTestReg, and PermTestCateg functions.
Please see our Quickstart Guide for this package: https://gabrielodom.github.io/pathwayPCA/articles/Supplement1-Quickstart_Guide.html
A results list with class aespcOut. This list has three components: a data frame of pathway details, pathway p-values, and potential adjustments to those values (pVals_df); a list of the first numPCs score vectors for each pathway (PCs_ls); and a list of the first numPCs feature loading vectors for each pathway (loadings_ls). The p-value data frame has columns:
pathways: The names of the pathways in the Omics* object (given in object@trimPathwayCollection$pathways).
setsize: The number of genes in each of the original pathways (given in the object@trimPathwayCollection$setsize object).
n_tested: The number of genes in each of the trimmed pathways (given in the object@trimPathwayCollection$n_tested object).
terms: The pathway description, as given in the object@trimPathwayCollection$TERMS object.
rawp: The unadjusted p-values of each pathway.
...: Additional columns of adjusted p-values as specified through the adjustment argument.
The data frame will be sorted in ascending order by the method specified first in the adjustment argument. If adjustpValues = FALSE, then the data frame will be sorted by the raw p-values. If you have the suggested tidyverse package suite loaded, then this data frame will print as a tibble. Otherwise, it will print as a data frame.
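The sorting rule described above (ascending by the first adjustment, ties broken by the raw p-values) can be sketched with a toy table. This is only an illustration: the p-values are made up, and base R's `p.adjust()` is used as a stand-in for the adjustment procedures, which may differ from what the package calls internally.

```r
# Toy raw p-values for four pathways (made-up numbers for illustration only)
pVals_sketch <- data.frame(
  pathways = paste0("pathway", 1:4),
  rawp = c(0.04, 0.001, 0.20, 0.012),
  stringsAsFactors = FALSE
)

# Base R stand-ins for two of the adjustment procedures
pVals_sketch$Hochberg <- p.adjust(pVals_sketch$rawp, method = "hochberg")
pVals_sketch$BH <- p.adjust(pVals_sketch$rawp, method = "BH")

# Sort by the first adjustment, breaking ties with the raw p-values
pVals_sketch <- pVals_sketch[order(pVals_sketch$Hochberg, pVals_sketch$rawp), ]
pVals_sketch
```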
CreateOmics; ExtractAESPCs; PermTestSurv; PermTestReg; PermTestCateg; TabulatepValues; clusterApply
### Load the Example Data ###
data("colonSurv_df")
data("colon_pathwayCollection")

### Create an OmicsSurv Object ###
colon_Omics <- CreateOmics(
  assayData_df = colonSurv_df[, -(2:3)],
  pathwayCollection_ls = colon_pathwayCollection,
  response = colonSurv_df[, 1:3],
  respType = "surv"
)

### Calculate Pathway p-Values ###
colonSurv_aespc <- AESPCA_pVals(
  object = colon_Omics,
  numReps = 0,
  parallel = TRUE,
  numCores = 2,
  adjustpValues = TRUE,
  adjustment = c("Hoch", "SidakSD")
)
An example Canonical Pathways Gene Subset from the Broad Institute: File: c2.cp.v6.0.symbols.gmt.
data(colon_pathwayCollection)
A pathwayCollection list of two elements:
pathways: A list of 15 character vectors. Each vector contains the names of the individual genes within that pathway as a vector of character strings.
TERMS: A character vector of length 15 containing the names of the gene pathways.
This is a subset of 15 pathways from the Broad Institute pathways list. This subset contains seven pathways which are related to the response information in the colonSurv_df data file.
http://software.broadinstitute.org/gsea/msigdb/collections.jsp
Subset of a colon cancer survival data set, with subject response and assay values.
data(colonSurv_df)
A subset of a data frame containing 656 of 2022 genes measured on 250 subjects. The first column holds the sample IDs (sampleID); the next two columns are the Overall Survival time (OS_time) and death indicator (OS_event).
GEO GSE17538 https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE17538
Check if any or all of the elements of a short atomic vector are contained within a supplied long atomic vector.
Contains(long, short, matches = c("any", "all"), partial = FALSE)
long |
A vector possibly containing any or all elements of
|
short |
A short vector or scalar, some elements of which may be
contained in |
matches |
Should partial set matching of |
partial |
Should partial string matching be allowed? Defaults to
|
This is a helper function to find out if a gene symbol or some similar character string (or character vector) is contained in a pathway. Currently, this function uses base R, but we can write it in a compiled language (such as C++) to increase speed later.
For partial matching (partial = TRUE), long must be an atomic vector of type character, short must be an atomic scalar (a vector with length of 1) of type character, and matches should be set to "any". Because this function is designed to match gene symbols or CpG locations, we care if the symbol or location starts with the string supplied. For example, if we set short = "PIK", then we want to find if any of the gene symbols in the supplied long vector belong to the PIK gene family. We don't care if this string appears elsewhere in a gene symbol.
A logical scalar. If matches = "any", this indicates if any of the elements of short are contained in long. If matches = "all", this indicates if all of the elements of short are contained in long. If partial = TRUE, the returned logical indicates whether or not any of the character strings in long start with the character scalar supplied to short.
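The matching rules above can be mimicked in a few lines of base R. The helper below (`contains_sketch`, a hypothetical name) is an illustration of the documented behaviour under the stated assumptions, not the package's source code; it uses `%in%` for whole-element matching and `startsWith()` for prefix matching.

```r
# Hypothetical helper that mirrors the documented behaviour of Contains();
# it is not the package's source code.
contains_sketch <- function(long, short, matches = c("any", "all"),
                            partial = FALSE) {
  matches <- match.arg(matches)
  if (partial) {
    # Prefix match only: does any element of `long` start with `short`?
    stopifnot(is.character(long), is.character(short), length(short) == 1)
    return(any(startsWith(long, short)))
  }
  hits <- short %in% long
  if (matches == "any") any(hits) else all(hits)
}

genes_sketch <- c("PI4KA", "PIK3CA", "PIKFYVE", "PITPNB")
contains_sketch(genes_sketch, "PIK3", partial = TRUE)
contains_sketch(LETTERS, c("A", "!"), matches = "all")
# "K3C" occurs inside "PIK3CA" but not at its start, so it is not a match
contains_sketch(genes_sketch, "K3C", partial = TRUE)
```

Anchoring the match at the start of the string is what keeps a query such as "PIK" from matching symbols that merely contain those letters elsewhere.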
Contains(1:10, 8)
Contains(LETTERS, c("A", "!"), matches = "any")
Contains(LETTERS, c("A", "!"), matches = "all")

genesPI <- c(
  "PI4K2A", "PI4K2B", "PI4KA", "PI4KB", "PIK3C2A", "PIK3C2B", "PIK3C2G",
  "PIK3C3", "PIK3CA", "PIK3CB", "PIK3CD", "PIK3CG", "PIK3R1", "PIK3R2",
  "PIK3R3", "PIK3R4", "PIK3R5", "PIK3R6", "PIKFYVE", "PIP4K2A", "PIP4K2B",
  "PIP5K1B", "PIP5K1C", "PITPNB"
)
Contains(genesPI, "PIK3", partial = TRUE)
-Omics*-class objects
This function calls the CreateOmicsPath, CreateOmicsSurv, CreateOmicsReg, and CreateOmicsCateg functions to create valid objects of the classes OmicsPathway, OmicsSurv, OmicsReg, or OmicsCateg, respectively.
CreateOmics(
  assayData_df,
  pathwayCollection_ls,
  response = NULL,
  respType = c("none", "survival", "regression", "categorical"),
  centerScale = c(TRUE, TRUE),
  minPathSize = 3,
  ...
)
assayData_df |
An |
pathwayCollection_ls |
A
If your gene pathways list is stored in a |
response |
An optional response object. See "Details" for more
information. Defaults to |
respType |
What type of response has been supplied. Options are
|
centerScale |
Should the values in |
minPathSize |
What is the smallest number of genes allowed in each pathway? Defaults to 3. |
... |
Dots for additional arguments passed to the internal
|
This function is a wrapper around the four CreateOmics* functions. The values supplied to the response function argument can be in a list, data frame, matrix, vector, Surv object, or any class which extends these. Because this function makes "best guess" type conversions based on the respType argument, this argument is mandatory if response is non-NULL. Further, it is the responsibility of the user to ensure that the coerced response contained in the resulting Omics object accurately reflects the supplied response.
For respType = "survival", response is assumed to be ordered by event time, then event indicator. For example, if the response is a data frame or matrix, this function assumes that the first column is the time and the second column the death indicator. If the response is a list, then this function assumes that the first entry in the list is the event time and the second entry the death indicator. The death indicator must be a logical or binary (0-1) vector, where 1 or TRUE represents a death and 0 or FALSE represents right-censoring.
Some of the pathways in the supplied pathways list will be removed, or "trimmed", during object creation. For the pathway-testing methods, these trimmed pathways will have p-values given as NA. For an explanation of pathway trimming, see the documentation for the IntersectOmicsPwyCollct function.
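As a rough picture of what trimming involves, the base R sketch below intersects each pathway with the measured features and drops pathways that fall under a minimum size. It is an assumption-laden illustration with made-up gene names, not the logic of IntersectOmicsPwyCollct itself; only the argument name `minPathSize` and the column names `setsize` and `n_tested` are taken from this documentation.

```r
# Toy inputs; every gene and pathway name below is illustrative
assay_genes <- c("geneA", "geneB", "geneC", "geneD", "geneE")
pathways_sketch <- list(
  pathway1 = c("geneA", "geneB", "geneC", "geneX"),
  pathway2 = c("geneD", "geneY", "geneZ"),
  pathway3 = c("geneA", "geneC", "geneD", "geneE")
)
minPathSize <- 3

# Keep only the genes that were actually measured in the assay
trimmed <- lapply(pathways_sketch, intersect, assay_genes)

# Pathway sizes before and after trimming
setsize <- lengths(pathways_sketch)
n_tested <- lengths(trimmed)

# Pathways that fall below the minimum size would not be tested
kept <- trimmed[n_tested >= minPathSize]
names(kept)
```

In this toy case the second pathway retains a single measured gene, so it would be the one reported with a missing p-value.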
A valid object of class OmicsPathway, OmicsSurv, OmicsReg, or OmicsCateg.
OmicsPathway, CreateOmicsPath, OmicsSurv, CreateOmicsSurv, OmicsCateg, CreateOmicsCateg, OmicsReg, CreateOmicsReg, CheckAssay, CheckPwyColl, and IntersectOmicsPwyCollct
### Load the Example Data ###
data("colonSurv_df")
data("colon_pathwayCollection")

### Create an OmicsPathway Object ###
colon_OmicsPath <- CreateOmics(
  assayData_df = colonSurv_df[, -(2:3)],
  pathwayCollection_ls = colon_pathwayCollection
)

### Create an OmicsSurv Object ###
colon_OmicsSurv <- CreateOmics(
  assayData_df = colonSurv_df[, -(2:3)],
  pathwayCollection_ls = colon_pathwayCollection,
  response = colonSurv_df[, 1:3],
  respType = "surv"
)

### Create an OmicsReg Object ###
colon_OmicsReg <- CreateOmics(
  assayData_df = colonSurv_df[, -(2:3)],
  pathwayCollection_ls = colon_pathwayCollection,
  response = colonSurv_df[, 1:2],
  respType = "reg"
)

### Create an OmicsCateg Object ###
colon_OmicsCateg <- CreateOmics(
  assayData_df = colonSurv_df[, -(2:3)],
  pathwayCollection_ls = colon_pathwayCollection,
  response = colonSurv_df[, c(1,3)],
  respType = "cat"
)
-Omics*-class objects
These functions create valid objects of class OmicsPathway, OmicsSurv, OmicsReg, or OmicsCateg.
CreateOmicsPath(assayData_df, sampleIDs_char, pathwayCollection_ls)

CreateOmicsSurv(
  assayData_df,
  sampleIDs_char,
  pathwayCollection_ls,
  eventTime_num,
  eventObserved_lgl
)

CreateOmicsReg(
  assayData_df,
  sampleIDs_char,
  pathwayCollection_ls,
  response_num
)

CreateOmicsCateg(
  assayData_df,
  sampleIDs_char,
  pathwayCollection_ls,
  response_fact
)
assayData_df |
An |
sampleIDs_char |
A character vector with the N sample names. |
pathwayCollection_ls |
A
|
eventTime_num |
A |
eventObserved_lgl |
A |
response_num |
A |
response_fact |
A |
Please note that the classes of the parameters are not flexible. The -Omics assay data must be or extend the class data.frame, and the response values (for a survival-, regression-, or categorical-response object) must match their expected classes exactly. The reason for this is to encourage the end user to pay attention to the quality and format of their input data. Because the functions internal to this package have only been tested on the classes described in the Arguments section, these class checks prevent unexpected errors (or worse, incorrect computational results without an error). These draconian input class restrictions protect the accuracy of your data analysis.
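Because of these strict requirements, it can help to check input classes explicitly before building an object. The base R sketch below uses made-up data and hypothetical object names; the expected classes are inferred from the argument-name suffixes (`_df`, `_num`, `_lgl`) and the example calls, so treat them as an assumption rather than a statement of the package's internal checks.

```r
# Toy inputs; names and values are illustrative only
assay_sketch <- data.frame(
  geneA = c(0.2, -1.1, 0.7),
  geneB = c(1.3, 0.4, -0.5)
)
eventTime_sketch <- c(12.5, 30.1, 7.8)
eventObserved_sketch <- c(1, 0, 1)  # stored as numeric 0/1, not logical

# Check classes up front rather than relying on silent coercion
is.data.frame(assay_sketch)
is.numeric(eventTime_sketch)
is.logical(eventObserved_sketch)  # FALSE: needs an explicit conversion

# Convert explicitly, then confirm
eventObserved_sketch <- as.logical(eventObserved_sketch)
stopifnot(is.logical(eventObserved_sketch))
```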
A valid object of class OmicsPathway, OmicsSurv, OmicsReg, or OmicsCateg.
Valid OmicsPathway objects will have no response information, just the mass spectrometry or bio-assay ("design") matrix and the pathway list. OmicsPathway objects should be created only when unsupervised pathway extraction is needed (not possible with Supervised PCA). Because of the missing response, no pathway testing can be performed on an OmicsPathway object.
Valid OmicsSurv objects will have two response vectors: a vector of the most recently recorded follow-up times and a logical vector indicating whether that time marks an event (TRUE: observed event; FALSE: right-censored observation).
Valid OmicsReg and OmicsCateg objects will have one response vector of continuous (numeric) or categorical (factor) observations, respectively.
OmicsPathway, OmicsSurv, OmicsReg, and OmicsCateg
# DO NOT CALL THESE FUNCTIONS DIRECTLY. USE CreateOmics() INSTEAD.
data("colon_pathwayCollection")
data("colonSurv_df")

## Not run:
CreateOmicsPath(
  assayData_df = colonSurv_df[, -(1:3)],
  sampleIDs_char = colonSurv_df$sampleID,
  pathwayCollection_ls = colon_pathwayCollection
)

CreateOmicsSurv(
  assayData_df = colonSurv_df[, -(1:3)],
  sampleIDs_char = colonSurv_df$sampleID,
  pathwayCollection_ls = colon_pathwayCollection,
  eventTime_num = colonSurv_df$OS_time,
  eventObserved_lgl = as.logical(colonSurv_df$OS_event)
)

CreateOmicsReg(
  assayData_df = colonSurv_df[, -(1:3)],
  sampleIDs_char = colonSurv_df$sampleID,
  pathwayCollection_ls = colon_pathwayCollection,
  response_num = colonSurv_df$OS_time
)

CreateOmicsCateg(
  assayData_df = colonSurv_df[, -(1:3)],
  sampleIDs_char = colonSurv_df$sampleID,
  pathwayCollection_ls = colon_pathwayCollection,
  response_fact = as.factor(colonSurv_df$OS_event)
)
## End(Not run)
pathwayCollection-class Object.
Manually create a pathwayCollection list similar to the output of the read_gmt function.
CreatePathwayCollection(
  sets_ls,
  TERMS,
  setType = c("pathways", "genes", "regions"),
  ...
)
sets_ls |
A named list of character vectors. Each vector should contain
the names of the individual genes, proteins, sites, or CpGs within that
set as a vector of character strings. If you create this pathway
collection to integrate with data of class |
TERMS |
A character vector the same length as the |
setType |
What is the type of the set: pathway set of genes, gene sites
in RNA or DNA, or regions of CpGs. Defaults to |
... |
Additional vectors or data components related to the
|
This function checks the set list and set term inputs and then creates a pathwayCollection object from them. Pass additional list elements (such as the description of each set) using the form tag = value through the ... argument (as in the list function). Because some functions in the pathwayPCA package add and edit elements of pathwayCollection objects, please do not create pathwayCollection list items named setsize or n_tested.
A list object with class pathwayCollection.
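The layout of such a collection can be pictured with a plain list. The sketch below uses made-up set names and only illustrates the kind of input checks described above (named sets, matching lengths, no reserved element names); it builds an ordinary list and does not assign the pathwayCollection class, which is left to CreatePathwayCollection().

```r
# Hypothetical gene sets; names and members are made up for illustration
sets_sketch <- list(
  set1 = c("geneA", "geneB", "geneC"),
  set2 = c("geneB", "geneD")
)
terms_sketch <- c("First toy pathway", "Second toy pathway")
extras_sketch <- list(description = c("toy set one", "toy set two"))

# Checks similar in spirit to those described above
stopifnot(
  is.list(sets_sketch),
  !is.null(names(sets_sketch)),
  all(vapply(sets_sketch, is.character, logical(1))),
  length(terms_sketch) == length(sets_sketch),
  !any(names(extras_sketch) %in% c("setsize", "n_tested"))
)

# A plain list with the same layout as a pathway collection
collection_sketch <- c(
  list(pathways = sets_sketch, TERMS = terms_sketch),
  extras_sketch
)
names(collection_sketch)
```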
data("colon_pathwayCollection")

CreatePathwayCollection(
  sets_ls = colon_pathwayCollection$pathways,
  TERMS = colon_pathwayCollection$TERMS
)
superpcOut- or aespcOut-class Object.
Given an object of class aespcOut or superpcOut, as returned by the functions AESPCA_pVals or SuperPCA_pVals, respectively, and the name or unique ID of a pathway, return a data frame of the principal components and a data frame of the loading vectors corresponding to that pathway.
getPathPCLs(pcOut, pathway_char, ...)

## S3 method for class 'superpcOut'
getPathPCLs(pcOut, pathway_char, ...)

## S3 method for class 'aespcOut'
getPathPCLs(pcOut, pathway_char, ...)
pcOut |
An object of classes |
pathway_char |
A character string of the name or unique identifier of a pathway |
... |
Dots for additional arguments (currently unused). |
Match the supplied pathway character string to either the pathways or terms columns of the pVals_df data frame within the pcOut object. Then, subset the loadings_ls and PCs_ls lists for their entries which match the supplied pathway. Finally, return a list of the PCs, loadings, and the pathway ID and name.
A list of four elements:
PCs: A data frame of the principal components
Loadings: A matrix of the loading vectors with features in the row names
pathway: The unique pathway identifier for the pcOut object
term: The name of the pathway
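The lookup described above can be sketched on a toy object. Everything in this example is made up: `pcOut_sketch` is a hypothetical stand-in for a real aespcOut or superpcOut object, and the sketch assumes that PCs_ls and loadings_ls are named by pathway ID, which this documentation does not state explicitly.

```r
# Toy stand-in for a pcOut object; the real object is built by the package
pcOut_sketch <- list(
  pVals_df = data.frame(
    pathways = c("path1", "path2"),
    terms = c("TOY_PATHWAY_A", "TOY_PATHWAY_B"),
    stringsAsFactors = FALSE
  ),
  PCs_ls = list(
    path1 = data.frame(V1 = c(0.3, -0.1, 0.5)),
    path2 = data.frame(V1 = c(-0.7, 0.2, 0.4))
  ),
  loadings_ls = list(
    path1 = matrix(c(0.8, 0.6), ncol = 1,
                   dimnames = list(c("geneA", "geneB"), "PC1")),
    path2 = matrix(c(0.6, -0.8), ncol = 1,
                   dimnames = list(c("geneC", "geneD"), "PC1"))
  )
)

pathway_char <- "TOY_PATHWAY_B"

# Match against either the ID column or the name column
df <- pcOut_sketch$pVals_df
row_idx <- which(df$pathways == pathway_char | df$terms == pathway_char)
path_id <- df$pathways[row_idx]

# Assemble the four documented elements
result_sketch <- list(
  PCs = pcOut_sketch$PCs_ls[[path_id]],
  Loadings = pcOut_sketch$loadings_ls[[path_id]],
  pathway = path_id,
  term = df$terms[row_idx]
)
str(result_sketch)
```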
### Load Data ###
data("colonSurv_df")
data("colon_pathwayCollection")

### Create -Omics Container ###
colon_Omics <- CreateOmics(
  assayData_df = colonSurv_df[, -(2:3)],
  pathwayCollection_ls = colon_pathwayCollection,
  response = colonSurv_df[, 1:3],
  respType = "survival"
)

### Calculate Supervised PCA Pathway p-Values ###
colon_superpc <- SuperPCA_pVals(
  colon_Omics,
  numPCs = 2,
  parallel = TRUE,
  numCores = 2,
  adjustment = "BH"
)

### Extract PCs and Loadings ###
getPathPCLs(
  colon_superpc,
  "KEGG_PENTOSE_PHOSPHATE_PATHWAY"
)
p-values from a superpcOut- or aespcOut-class Object.
Given an object of class aespcOut or superpcOut, as returned by the functions AESPCA_pVals or SuperPCA_pVals, respectively, return a data frame of the p-values for the top pathways.
getPathpVals(pcOut, score = FALSE, numPaths = 20L, alpha = NULL, ...)

## S3 method for class 'superpcOut'
getPathpVals(pcOut, score = FALSE, numPaths = 20L, alpha = NULL, ...)

## S3 method for class 'aespcOut'
getPathpVals(pcOut, score = FALSE, numPaths = 20L, alpha = NULL, ...)
pcOut |
An object of classes |
score |
Should the unadjusted |
numPaths |
The number of top pathways by raw |
alpha |
The significance threshold for raw |
... |
Dots for additional arguments (currently unused). |
Row-subset the pVals_df entry of an object of class aespcOut or superpcOut by the number of pathways requested (via the numPaths argument) or by the unadjusted significance level for each pathway (via the alpha argument). Return a data frame of the pathway names, FDR-adjusted significance levels (if available), and the raw score (negative natural logarithm of the p-values) of each pathway.
A data frame with the following columns:
terms: The pathway name, as given in the object@trimPathwayCollection$TERMS object.
description: (OPTIONAL) The pathway description, as given in the object@trimPathwayCollection$description object, if supplied.
rawp: The unadjusted p-values of each pathway. Included if score = FALSE.
...: Additional columns of FDR-adjusted p-values as specified through the adjustment argument of the SuperPCA_pVals or AESPCA_pVals functions.
score: The negative natural logarithm of the unadjusted p-values of each pathway. Included if score = TRUE.
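The two subsetting modes and the score transformation can be sketched on a toy table in base R. The values and object names below are made up; the sketch only mirrors the behaviour described above and is not the package's code.

```r
# Toy p-value table (made-up values)
pVals_sketch <- data.frame(
  terms = c("TOY_A", "TOY_B", "TOY_C", "TOY_D"),
  rawp = c(0.20, 0.003, 0.04, 0.0007),
  stringsAsFactors = FALSE
)
pVals_sketch <- pVals_sketch[order(pVals_sketch$rawp), ]

# Option 1: keep the top `numPaths` pathways by raw p-value
numPaths <- 2L
top_df <- head(pVals_sketch, numPaths)

# Option 2: keep pathways with raw p-value below `alpha`
alpha <- 0.05
signif_df <- pVals_sketch[pVals_sketch$rawp < alpha, ]

# With score = TRUE, report -log(p) in place of the raw p-value
score_df <- data.frame(
  terms = top_df$terms,
  score = -log(top_df$rawp),
  stringsAsFactors = FALSE
)
score_df
```

Because the score is a negative logarithm, smaller p-values map to larger scores, which can be easier to plot or rank visually.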
### Load Data ###
data("colonSurv_df")
data("colon_pathwayCollection")

### Create -Omics Container ###
colon_Omics <- CreateOmics(
  assayData_df = colonSurv_df[, -(2:3)],
  pathwayCollection_ls = colon_pathwayCollection,
  response = colonSurv_df[, 1:3],
  respType = "survival"
)

### Calculate Supervised PCA Pathway p-Values ###
colon_superpc <- SuperPCA_pVals(
  colon_Omics,
  numPCs = 2,
  parallel = TRUE,
  numCores = 2,
  adjustment = "BH"
)

### Extract Table of p-Values ###
# Top 5 Pathways
getPathpVals(
  colon_superpc,
  numPaths = 5
)

# Pathways with Unadjusted p-Values < 0.01
getPathpVals(
  colon_superpc,
  alpha = 0.01
)
Given a list of loading vectors from a training data set, calculate the PCs of the test data set.
LoadOntoPCs(design_df, loadings_ls, sampleID = c("firstCol", "rowNames"))
design_df |
A test data frame with rows as samples and named features as columns |
loadings_ls |
A list of |
sampleID |
Are the sample IDs in the first column of |
This function takes in a list of loadings and a training-centered test data set, applies over the list of loadings, subsets the columns of the test data by the row names of the loading vectors, right-multiplies the test-data subset matrix by the loading vector / matrix, and returns a data frame of the test-data PCs for each loading vector.
A data frame with the PCs from each pathway concatenated by column. If you have the tidyverse loaded, this object will display as a tibble.
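The projection step can be written out in a few lines of base R. This sketch uses made-up test data and loadings with hypothetical names, and assumes the sample IDs sit in the first column; it shows the subset-then-multiply idea only and is not the package's implementation.

```r
# Toy test data: sample IDs in the first column, features afterwards
test_df <- data.frame(
  sampleID = c("s1", "s2", "s3"),
  geneA = c(1, 0, -1),
  geneB = c(0, 2, 1),
  geneC = c(3, 1, 0),
  stringsAsFactors = FALSE
)

# One loading matrix per pathway; row names identify the features used
loadings_sketch <- list(
  path1 = matrix(c(0.6, 0.8), ncol = 1,
                 dimnames = list(c("geneA", "geneB"), "PC1")),
  path2 = matrix(c(1, 0), ncol = 1,
                 dimnames = list(c("geneB", "geneC"), "PC1"))
)

# Subset the test columns by each loading's row names, then right-multiply
pcs_ls <- lapply(loadings_sketch, function(load_mat) {
  as.matrix(test_df[, rownames(load_mat), drop = FALSE]) %*% load_mat
})

# Concatenate the per-pathway PCs by column
pcs_df <- data.frame(sampleID = test_df$sampleID, do.call(cbind, pcs_ls))
names(pcs_df)[-1] <- names(loadings_sketch)
pcs_df
```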
### Load the Data ###
data("colonSurv_df")
data("colon_pathwayCollection")

### Create -Omics Container ###
colon_Omics <- CreateOmics(
  assayData_df = colonSurv_df[, -(2:3)],
  pathwayCollection_ls = colon_pathwayCollection,
  response = colonSurv_df[, 1:3],
  respType = "survival"
)

### Extract AESPCs ###
colonSurv_aespc <- AESPCA_pVals(
  object = colon_Omics,
  numReps = 0,
  parallel = TRUE,
  numCores = 2,
  adjustpValues = TRUE,
  adjustment = c("Hoch", "SidakSD")
)

### Project Data onto Pathway First PCs ###
LoadOntoPCs(
  design_df = colonSurv_df,
  loadings_ls = colonSurv_aespc$loadings_ls
)
OmicsPathway object
This creates the OmicsCateg class which extends the OmicsPathway master class.
assayData_df
A data frame with named columns.
pathwayCollection
A list of known gene pathways with three or four elements:
pathways: A named list of character vectors. Each vector contains the names of the individual genes within that pathway as a vector of character strings. The names contained in these vectors must have non-empty overlap with the column names of the assayData_df data frame. The names of the pathways (the list elements themselves) should be a shorthand representation of the full pathway name.
TERMS: A character vector the same length as the pathways list with the proper names of the pathways.
description: An optional character vector the same length as the pathways list with additional information about the pathways.
setsize: A named integer vector the same length as the pathways list with the number of genes in each pathway. This list item is calculated during the creation step of a CreateOmics function call.
response
A factor vector of length N (the number of samples): the dependent variable of a generalized linear regression exercise. Currently, we support binary factors only. We expect to extend support to n-ary responses in the next package version.
An S4 class for mass spectrometry or bio-assay data and gene pathway lists
assayData_df
An data frame with named columns.
sampleIDs_char
A character vector with the N sample names.
pathwayCollection
A list of known gene pathways with three or four elements:
pathways: A named list of character vectors. Each vector contains the names of the individual genes within that pathway as a vector of character strings. The names contained in these vectors must have non-empty overlap with the column names of the assayData_df data frame. The names of the pathways (the list elements themselves) should be a shorthand representation of the full pathway name.
TERMS: A character vector the same length as the pathways list with the proper names of the pathways.
description: An optional character vector the same length as the pathways list with additional information about the pathways.
setsize: A named integer vector the same length as the pathways list with the number of genes in each pathway. This list item is calculated during the creation step of a CreateOmics function call.
trimPathwayCollection
A subset of the list stored in the pathwayCollection slot. This list will have pathways that only contain genes that are present in the assay data frame.
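The relationship between the pathwayCollection and trimPathwayCollection slots can be sketched in base R. This is a minimal illustration with made-up gene names, not the package's internal trimming routine; toy_assay, toy_pathways, trimmed, and trimmed_size are hypothetical names introduced only for this example.

```r
# Hypothetical assay: 3 samples (rows) by 4 named gene columns
toy_assay <- data.frame(
  geneA = c(1.2, 0.7, 3.1),
  geneB = c(0.4, 2.2, 1.9),
  geneC = c(5.0, 4.1, 3.3),
  geneD = c(0.1, 0.3, 0.2)
)

# Hypothetical pathways; geneX and geneY were not measured in the assay
toy_pathways <- list(
  path1 = c("geneA", "geneB", "geneX"),
  path2 = c("geneC", "geneY"),
  path3 = c("geneX", "geneY")
)

# Keep only the genes recorded in the assay, then drop empty pathways
trimmed <- lapply(toy_pathways, intersect, colnames(toy_assay))
trimmed <- trimmed[lengths(trimmed) > 0]

# Number of genes remaining in each trimmed pathway
trimmed_size <- lengths(trimmed)
```

In this sketch, a pathway with no measured genes disappears from the trimmed list entirely, which is why looking up such a pathway in a trimmed collection can fail.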
OmicsPathway object
This creates the OmicsReg class, which extends the OmicsPathway master class.
assayData_df
An N × p data frame with named columns.
pathwayCollection
A list of known gene pathways with three or four elements:
pathways: A named list of character vectors. Each vector contains the names of the individual genes within that pathway as a vector of character strings. The names contained in these vectors must have non-empty overlap with the column names of the assayData_df data frame. The names of the pathways (the list elements themselves) should be a shorthand representation of the full pathway name.
TERMS: A character vector the same length as the pathways list with the proper names of the pathways.
description: An optional character vector the same length as the pathways list with additional information about the pathways.
setsize: A named integer vector the same length as the pathways list with the number of genes in each pathway. This list item is calculated during the creation step of a CreateOmics function call.
response
A numeric vector of length N: the dependent variable in a regression exercise.
OmicsPathway object
This creates the OmicsSurv class, which extends the OmicsPathway master class.
assayData_df
An N × p data frame with named columns.
pathwayCollection
A list of known gene pathways with three or four elements:
pathways: A named list of character vectors. Each vector contains the names of the individual genes within that pathway as a vector of character strings. The names contained in these vectors must have non-empty overlap with the column names of the assayData_df data frame. The names of the pathways (the list elements themselves) should be a shorthand representation of the full pathway name.
TERMS: A character vector the same length as the pathways list with the proper names of the pathways.
description: An optional character vector the same length as the pathways list with additional information about the pathways.
setsize: A named integer vector the same length as the pathways list with the number of genes in each pathway. This list item is calculated during the creation step of a CreateOmics function call.
eventTime
A numeric vector with N observations corresponding to the last observed time of follow up.
eventObserved
A logical vector with N observations indicating right-censoring. The values will be FALSE if the observation was censored (i.e., we did not observe an event).
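The encoding of the two survival slots can be illustrated with a small base R sketch. The follow-up times and status codes below are made up, and the 0/1 status coding is an assumption about how the raw data might arrive; only the logical convention (FALSE means censored) comes from the slot description.

```r
# Hypothetical follow-up times (in months) and status codes
# (assumed coding: 1 = event observed, 0 = censored)
follow_up <- c(12.5, 30.0, 7.2, 44.1)
status    <- c(1, 0, 1, 0)

eventTime     <- as.numeric(follow_up)
eventObserved <- as.logical(status)   # TRUE = event observed, FALSE = censored

# Both vectors need one entry per sample
stopifnot(length(eventTime) == length(eventObserved))

# Count the right-censored observations
n_censored <- sum(!eventObserved)
```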
For an introduction to this package, please see our "Integrative Pathway Analysis" vignette: https://gabrielodom.github.io/pathwayPCA/articles//Introduction_to_pathwayPCA.html.
The pathwayPCA package has four main components:
Import and Tidy Data: https://gabrielodom.github.io/pathwayPCA/articles/Supplement2-Importing_Data.html
Create Omics Data Objects: https://gabrielodom.github.io/pathwayPCA/articles/Supplement3-Create_Omics_Objects.html
Test Pathway Significance: https://gabrielodom.github.io/pathwayPCA/articles/Supplement4-Methods_Walkthrough.html
Analyze and Visualize Results: https://gabrielodom.github.io/pathwayPCA/articles/Supplement5-Analyse_Results.html
For an overview of these four topics in context, please see our Quickstart Guide: https://gabrielodom.github.io/pathwayPCA/articles/Supplement1-Quickstart_Guide.html
Read a .gmt file in as a pathwayCollection object
Read a set list file in Gene Matrix Transposed (.gmt) format, with special performance consideration for large files. Present this object as a pathwayCollection object.
read_gmt(
  file,
  setType = c("pathways", "genes", "regions"),
  description = FALSE,
  nChars = 1e+07,
  delim = "\t"
)
file |
A path to a file or a connection. This file must be a |
setType |
What is the type of the set: pathway set of gene, gene sites
in RNA or DNA, or regions of CpGs. Defaults to |
description |
Should the "description" field (the second field in the
|
nChars |
The number of characters to read from a connection. The largest
|
delim |
The |
This function uses R's readChar function to improve character input performance over readLines (and far improve input performance over scan).
See the Broad Institute's "Data Formats" page for a description of the Gene Matrix Transposed file format: https://software.broadinstitute.org/cancer/software/gsea/wiki/index.php/Data_formats#GMT:_Gene_Matrix_Transposed_file_format_.28.2A.gmt.29
A pathwayCollection list of sets. This list has three elements:
'setType': A named list of character vectors. Each vector contains the names of the individual genes, sites, or CpGs within that set as a vector of character strings. The name of this list entry is equal to the value specified in setType.
TERMS: A character vector the same length as the 'setType' list with the proper names of the sets.
description: (OPTIONAL) A character vector the same length as the 'setType' list with a note on that set (for the .gmt file included with this package, this field contains hyperlinks to the MSigDB description card for that pathway). This field is included when description = TRUE.
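To make the file layout and the returned structure concrete, here is a minimal base R sketch that reads a tiny tab-delimited .gmt file with readChar and splits it into the three elements listed above. It is an illustration under stated assumptions, not the implementation of read_gmt: the two gene sets are made up, toy_sets and the other object names are hypothetical, and the sketch leaves the pathways list unnamed.

```r
# Two hypothetical gene sets in .gmt layout: name, description, then members
gmt_text <- paste(
  "SET_ONE\tfirst made-up set\tgeneA\tgeneB\tgeneC",
  "SET_TWO\tsecond made-up set\tgeneB\tgeneD",
  sep = "\n"
)
tmp_file <- tempfile(fileext = ".gmt")
writeLines(gmt_text, tmp_file)

# Read the whole file as one string, then split into lines and fields
raw_char   <- readChar(tmp_file, nchars = file.size(tmp_file))
lines_char <- strsplit(raw_char, split = "\r?\n")[[1]]
lines_char <- lines_char[nzchar(lines_char)]
fields_ls  <- strsplit(lines_char, split = "\t", fixed = TRUE)

# First field is the set name, second is the description, the rest are members
toy_sets <- list(
  pathways    = lapply(fields_ls, function(x) x[-(1:2)]),
  TERMS       = vapply(fields_ls, `[`, character(1), 1),
  description = vapply(fields_ls, `[`, character(1), 2)
)
unlink(tmp_file)
```

Reading the file in a single readChar call and splitting afterwards is the design choice the Details section alludes to: one large read followed by in-memory string operations, rather than line-by-line input.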
print.pathwayCollection; write_gmt
# If you have installed the package:
data_path <- system.file(
  "extdata", "c2.cp.v6.0.symbols.gmt",
  package = "pathwayPCA", mustWork = TRUE
)
geneset_ls <- read_gmt(data_path, description = TRUE)

# # If you are using the development version from GitHub:
# geneset_ls <- read_gmt(
#   "inst/extdata/c2.cp.v6.0.symbols.gmt",
#   description = TRUE
# )
Extract the assay information from a SummarizedExperiment-class object, transpose it, and return it as a tidy data frame that contains assay measurements, feature names, and sample IDs.
SE2Tidy(summExperiment, whichAssay = 1)
summExperiment |
A
|
whichAssay |
Because |
This function is designed to extract and transpose "tall" assay data frames (where genes or proteins are the rows and patient or tumour samples are the columns) from a SummarizedExperiment object. This function also transposes the row (feature) names to column names and the column (sample) names to row names via the TransposeAssay function.
NOTE: if this function stops working (again), please add a comment here: https://github.com/gabrielodom/pathwayPCA/issues/83
The transposition of the assay in summExperiment to tidy form, with the column data (from the colData slot of the object) appended as the first columns of the data frame.
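The shape of the returned data frame can be sketched without the SummarizedExperiment package. The example below is a base R analogy only: assay_mat and col_data are hypothetical stand-ins for the assay and colData slots, and the sketch does not call SE2Tidy or TransposeAssay.

```r
# Hypothetical "tall" assay: 3 features (rows) by 2 samples (columns)
assay_mat <- matrix(
  c(5, 8, 2, 7, 1, 9),
  nrow = 3,
  dimnames = list(c("gene1", "gene2", "gene3"), c("s1", "s2"))
)

# Hypothetical sample-level annotations, one row per sample
col_data <- data.frame(
  treatment = c("control", "treated"),
  row.names = c("s1", "s2")
)

# Transpose so samples are rows, then place the annotations first
tidy_df <- cbind(col_data, as.data.frame(t(assay_mat)))
```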
# THIS REQUIRES THE SummarizedExperiment PACKAGE.
library(SummarizedExperiment)
data(airway, package = "airway")
airway_df <- SE2Tidy(airway)
pathwayCollection Values in Omics* Objects
"Get" or "Set" the values of the assayData_df, sampleIDs_char, or pathwayCollection slots of an object of class OmicsPathway or a class that extends this class (OmicsSurv, OmicsReg, or OmicsCateg).
getAssay(object, ...)
getAssay(object) <- value
getSampleIDs(object, ...)
getSampleIDs(object) <- value
getPathwayCollection(object, ...)
getPathwayCollection(object) <- value
getTrimPathwayCollection(object, ...)

## S4 method for signature 'OmicsPathway'
getAssay(object, ...)

## S4 replacement method for signature 'OmicsPathway'
getAssay(object) <- value

## S4 method for signature 'OmicsPathway'
getSampleIDs(object, ...)

## S4 replacement method for signature 'OmicsPathway'
getSampleIDs(object) <- value

## S4 method for signature 'OmicsPathway'
getPathwayCollection(object, ...)

## S4 replacement method for signature 'OmicsPathway'
getPathwayCollection(object) <- value

## S4 method for signature 'OmicsPathway'
getTrimPathwayCollection(object, ...)
object |
An object of or extending |
... |
Dots for additional internal arguments (currently unused). |
value |
The replacement object to be assigned to the specified slot. |
These functions can be useful to set or extract the assay data or pathways list from an Omics*-class object. However, we recommend that users simply create a new, valid Omics* object instead of modifying an existing one. The validity of edited objects is checked with the ValidOmicsSurv, ValidOmicsCateg, or ValidOmicsReg functions.
Further, because the pathwayPCA methods require a cleaned (trimmed) pathway collection, the trimPathwayCollection slot is read-only. Users may only edit this slot by updating the pathway collection provided to the pathwayCollection slot. Despite this functionality, we strongly recommend that users create a new object with the updated pathway collection, rather than attempting to overwrite the slots within an existing object. See IntersectOmicsPwyCollct for details on trimmed pathway collection.
The "get" functions return the objects in the slots specified: getAssay returns the assayData_df data frame object, getSampleIDs returns the sampleIDs_char character vector, getPathwayCollection returns the pathwayCollection list object, and getTrimPathwayCollection returns the trimPathwayCollection. These functions can extract these values from any valid OmicsPathway, OmicsSurv, OmicsReg, or OmicsCateg object.
The "set" functions enable the user to edit or replace objects in the assayData_df, sampleIDs_char, or pathwayCollection slots for any OmicsPathway, OmicsSurv, OmicsReg, or OmicsCateg objects, provided that the new values do not violate the validity checks of their respective objects. Because the slot for trimPathwayCollection is filled upon object creation, and to ensure that this pathway collection is "clean", there is no "set" function for the trimmed pathway collection slot. Instead, users can update the pathway collection, and the trimmed pathway collection will be updated automatically. See "Details" for more information on the "set" functions.
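The "get"/"set" pattern with a validity check can be sketched with a toy S4 class using only the methods package that ships with R. Everything named here (the ToyOmics class, the toyAssay generics, the validity rule) is hypothetical and chosen for illustration; it is not the package's class definition or its validity functions.

```r
library(methods)

# Hypothetical toy class; not the package's OmicsPathway definition
setClass(
  "ToyOmics",
  slots = c(assayData_df = "data.frame", sampleIDs_char = "character"),
  validity = function(object) {
    if (nrow(object@assayData_df) != length(object@sampleIDs_char)) {
      return("Need one sample ID per assay row.")
    }
    TRUE
  }
)

# "Get" generic and method
setGeneric("toyAssay", function(object, ...) standardGeneric("toyAssay"))
setMethod("toyAssay", "ToyOmics", function(object, ...) object@assayData_df)

# "Set" generic and method; validObject() rejects invalid replacements
setGeneric("toyAssay<-", function(object, value) standardGeneric("toyAssay<-"))
setReplaceMethod("toyAssay", "ToyOmics", function(object, value) {
  object@assayData_df <- value
  validObject(object)
  object
})

toy <- new(
  "ToyOmics",
  assayData_df = data.frame(geneA = c(1, 2)),
  sampleIDs_char = c("s1", "s2")
)
toyAssay(toy) <- data.frame(geneA = c(3, 4))

# A replacement with the wrong number of rows fails the validity check
bad_edit <- tryCatch(
  {
    toyAssay(toy) <- data.frame(geneA = 1:3)
    "accepted"
  },
  error = function(e) "rejected"
)
```

Because the replacement method validates before returning, a failed edit leaves the original object untouched, which is one reason creating a fresh object is usually the simpler route.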
data("colonSurv_df")
data("colon_pathwayCollection")

colon_Omics <- CreateOmics(
  assayData_df = colonSurv_df[, -(2:3)],
  pathwayCollection_ls = colon_pathwayCollection
)

getAssay(colon_Omics)
getPathwayCollection(colon_Omics)
OmicsReg or OmicsCateg Object
"Get" or "Set" the values of the response_num or response_fact slots of an object of class OmicsReg or OmicsCateg, respectively.
getResponse(object, ...)
getResponse(object) <- value

## S4 method for signature 'OmicsPathway'
getResponse(object, ...)

## S4 replacement method for signature 'OmicsPathway'
getResponse(object) <- value
object |
An object of class |
... |
Dots for additional internal arguments (currently unused). |
value |
The replacement object to be assigned to the |
These functions can be useful to set or extract the response vector from an object of class OmicsReg or OmicsCateg. However, we recommend that users simply create a new, valid object instead of modifying an existing one. The validity of edited objects is checked with their respective ValidOmicsCateg or ValidOmicsReg function. Because both classes have a response slot, we set this method for the parent class, OmicsPathway-class.
The "get" functions return the objects in the slots specified: getResponse returns the response_num vector from objects of class OmicsReg and the response_fact vector from objects of class OmicsCateg. These functions can extract these values from any valid object of those classes.
The "set" functions enable the user to edit or replace the object in the response_num slot for any OmicsReg object or response_fact slot for any OmicsCateg object, provided that the new values do not violate the validity check of such an object. See "Details" for more information.
data("colonSurv_df")
data("colon_pathwayCollection")

colon_Omics <- CreateOmics(
  assayData_df = colonSurv_df[, -(2:3)],
  pathwayCollection_ls = colon_pathwayCollection,
  response = colonSurv_df[, c(1, 2)],
  respType = "reg"
)

getResponse(colon_Omics)
OmicsSurv Object
"Get" or "Set" the values of the eventTime_num or eventObserved_lgl slots of an object of class OmicsSurv.
getEventTime(object, ...)
getEventTime(object) <- value
getEvent(object, ...)
getEvent(object) <- value

## S4 method for signature 'OmicsSurv'
getEventTime(object, ...)

## S4 replacement method for signature 'OmicsSurv'
getEventTime(object) <- value

## S4 method for signature 'OmicsSurv'
getEvent(object, ...)

## S4 replacement method for signature 'OmicsSurv'
getEvent(object) <- value
object |
An object of class |
... |
Dots for additional internal arguments (currently unused). |
value |
The replacement object to be assigned to the specified slot. |
These functions can be useful to set or extract the event time or death indicator from an OmicsSurv object. However, we recommend that users simply create a new, valid OmicsSurv object instead of modifying an existing one. The validity of edited objects is checked with the ValidOmicsSurv function.
The "get" functions return the objects in the slots specified: getEventTime returns the eventTime_num vector object and getEvent returns the eventObserved_lgl vector object. These functions can extract these values from any valid OmicsSurv object.
The "set" functions enable the user to edit or replace objects in the eventTime_num or eventObserved_lgl slots for any OmicsSurv object, provided that the new values do not violate the validity check of an OmicsSurv object. See "Details" for more information.
data("colonSurv_df")
data("colon_pathwayCollection")

colon_Omics <- CreateOmics(
  assayData_df = colonSurv_df[, -(2:3)],
  pathwayCollection_ls = colon_pathwayCollection,
  response = colonSurv_df[, 1:3],
  respType = "survival"
)

getEventTime(colon_Omics)
getEvent(colon_Omics)
Subset a pathwayCollection-class Object by Pathway.
The subset method for pathways lists as returned by the read_gmt function.
## S3 method for class 'pathwayCollection'
x[[name_char]]
x |
An object of class |
name_char |
The name of a pathway in the collection or its unique ID. |
This function finds the index matching the name_char argument to the TERMS field of the pathwayCollection-class object, then subsets the pathways list, TERMS vector, description vector, and setsize vector by this index. If you subset a trimmed pathwayCollection object, and the function errors with "Pathway not found.", then the pathway specified has been trimmed from the pathway collection.
Also, this function does not allow users to overwrite any portion of a pathway collection. These objects should rarely, if ever, be changed. If you absolutely must change the components of a pathwayCollection object, then create a new one with the CreatePathwayCollection function.
A list of the pathway name (Term), unique ID (pathID), contents (IDs), description (description), and number of features (Size).
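The lookup-then-subset behaviour described above can be sketched as an S3 method on a toy class. This is a hedged illustration: the toyCollection class, its contents, and the method body are hypothetical, and the package's own method may differ in how it matches names and which fields it returns.

```r
# Hypothetical toy collection with its own class name
toy_collection <- structure(
  list(
    pathways = list(p1 = c("geneA", "geneB"), p2 = c("geneC")),
    TERMS = c(p1 = "TOY_PATH_ONE", p2 = "TOY_PATH_TWO")
  ),
  class = c("toyCollection", "list")
)

# Look the name up among the TERMS or the pathway IDs, then subset by index
`[[.toyCollection` <- function(x, name_char) {
  parts <- unclass(x)
  idx <- which(parts$TERMS == name_char | names(parts$pathways) == name_char)
  if (length(idx) == 0L) stop("Pathway not found.")
  list(
    Term   = unname(parts$TERMS[idx]),
    pathID = names(parts$pathways)[idx],
    IDs    = parts$pathways[[idx]],
    Size   = length(parts$pathways[[idx]])
  )
}

toy_match <- toy_collection[["TOY_PATH_TWO"]]
```

The sketch defines no replacement method, mirroring the read-only intent stated in the Details.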
data("colon_pathwayCollection")
colon_pathwayCollection[["KEGG_RETINOL_METABOLISM"]]
Given an Omics object and the name of a pathway, return the -omes in the assay and the response as a (tibble) data frame.
SubsetPathwayData(object, pathName, ...)

## S4 method for signature 'OmicsPathway'
SubsetPathwayData(object, pathName, ...)
object |
An object of class |
pathName |
The name of a pathway contained in the pathway collection in the object. |
... |
Dots for additional internal arguments (currently unused). |
This function subsets the assay by the matching gene symbols or IDs in the specified pathway.
A data frame of the columns of the assay in the Omics object which are listed in the specified pathway, with a leading column for sample IDs. If the Omics object has response information, these are also included as the first column(s) of the data frame, after the sample IDs. If you have the suggested tidyverse package suite loaded, then this data frame will print as a tibble. Otherwise, it will print as a data frame.
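The column layout of the returned data frame (sample IDs, then response, then the pathway's features) can be sketched in base R. The inputs below are made up and the object names are hypothetical; the sketch does not call SubsetPathwayData.

```r
# Hypothetical inputs: sample IDs, a response, an assay, and one pathway
sample_ids <- c("s1", "s2", "s3")
response   <- c(0.5, 1.7, 2.4)
toy_assay  <- data.frame(
  geneA = c(1, 2, 3),
  geneB = c(4, 5, 6),
  geneC = c(7, 8, 9)
)
toy_pathway <- c("geneA", "geneC", "geneZ")   # geneZ is not in the assay

# Keep the assay columns named in the pathway
keep_cols <- intersect(toy_pathway, colnames(toy_assay))

# Sample IDs first, then the response, then the pathway's features
pathway_df <- data.frame(
  sampleID = sample_ids,
  response = response,
  toy_assay[, keep_cols, drop = FALSE]
)
```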
data("colonSurv_df")
data("colon_pathwayCollection")

colon_Omics <- CreateOmics(
  assayData_df = colonSurv_df[, -(2:3)],
  pathwayCollection_ls = colon_pathwayCollection,
  response = colonSurv_df[, 1:3],
  respType = "survival"
)

SubsetPathwayData(
  colon_Omics,
  "KEGG_RETINOL_METABOLISM"
)
Given a supervised OmicsPath object (one of OmicsSurv, OmicsReg, or OmicsCateg), extract the first principal components (PCs) from each pathway-subset of the -Omics assay design matrix, test their association with the response matrix, and return a data frame of the adjusted p-values for each pathway.
SuperPCA_pVals(
  object,
  n.threshold = 20,
  numPCs = 1,
  parallel = FALSE,
  numCores = NULL,
  adjustpValues = TRUE,
  adjustment = c("Bonferroni", "Holm", "Hochberg", "SidakSS", "SidakSD",
    "BH", "BY", "ABH", "TSBH"),
  ...
)

## S4 method for signature 'OmicsPathway'
SuperPCA_pVals(
  object,
  n.threshold = 20,
  numPCs = 1,
  parallel = FALSE,
  numCores = NULL,
  adjustpValues = TRUE,
  adjustment = c("Bonferroni", "Holm", "Hochberg", "SidakSS", "SidakSD",
    "BH", "BY", "ABH", "TSBH"),
  ...
)
object |
An object of superclass |
n.threshold |
The number of bins into which to split the feature scores
in the fit object returned internally by the |
numPCs |
The number of PCs to extract from each pathway. Defaults to 1. |
parallel |
Should the computation be completed in parallel? Defaults to
|
numCores |
If |
adjustpValues |
Should you adjust the |
adjustment |
Character vector of procedures. The returned data frame
will be sorted in ascending order by the first procedure in this vector,
with ties broken by the unadjusted |
... |
Dots for additional internal arguments. |
This is a wrapper function for the pathway_tScores, pathway_tControl, OptimGumbelMixParams, GumbelMixpValues, and TabulatepValues functions.
Please see our Quickstart Guide for this package: https://gabrielodom.github.io/pathwayPCA/articles/Supplement1-Quickstart_Guide.html
A data frame with columns:
pathways: The names of the pathways in the Omics* object (given in object@trimPathwayCollection$pathways).
setsize: The number of genes in each of the original pathways (given in the object@trimPathwayCollection$setsize object).
terms: The pathway description, as given in the object@trimPathwayCollection$TERMS object.
rawp: The unadjusted p-values of each pathway.
...: Additional columns as specified through the adjustment argument.
The data frame will be sorted in ascending order by the method specified first in the adjustment argument. If adjustpValues = FALSE, then the data frame will be sorted by the raw p-values. If you have the suggested tidyverse package suite loaded, then this data frame will print as a tibble. Otherwise, it will print as a data frame.
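The sorting rule for the returned table (ascending by the first adjustment, ties broken by the raw p-values) can be illustrated with a small base R sketch. The p-values are made up, and stats::p.adjust() is used here as a stand-in for the package's own adjustment procedures, so the method names in this sketch are those of p.adjust rather than of the adjustment argument.

```r
# Hypothetical raw p-values for four pathways
toy_pvals <- data.frame(
  terms = c("pathA", "pathB", "pathC", "pathD"),
  rawp  = c(0.040, 0.001, 0.300, 0.012)
)

# stats::p.adjust() stands in for the package's adjustment procedures
toy_pvals$hochberg <- p.adjust(toy_pvals$rawp, method = "hochberg")
toy_pvals$BH       <- p.adjust(toy_pvals$rawp, method = "BH")

# Sort by the first adjustment; break ties with the raw p-values
toy_sorted <- toy_pvals[order(toy_pvals$hochberg, toy_pvals$rawp), ]
```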
CreateOmics; TabulatepValues; pathway_tScores; pathway_tControl; OptimGumbelMixParams; GumbelMixpValues; clusterApply
### Load the Example Data ###
data("colonSurv_df")
data("colon_pathwayCollection")

### Create an OmicsSurv Object ###
colon_OmicsSurv <- CreateOmics(
  assayData_df = colonSurv_df[, -(2:3)],
  pathwayCollection_ls = colon_pathwayCollection,
  response = colonSurv_df[, 1:3],
  respType = "surv"
)

### Calculate Pathway p-Values ###
colonSurv_superpc <- SuperPCA_pVals(
  object = colon_OmicsSurv,
  parallel = TRUE,
  numCores = 2,
  adjustpValues = TRUE,
  adjustment = c("Hoch", "SidakSD")
)
Transpose an object of class data.frame that contains assay measurements while preserving row (feature) and column (sample) names.
TransposeAssay(
  assay_df,
  omeNames = c("firstCol", "rowNames"),
  stringsAsFactors = FALSE
)
assay_df |
A data frame with numeric values to transpose |
omeNames |
Are the data feature names in the first column or in the row
names of |
stringsAsFactors |
Should columns containing string information be
coerced to factors? Defaults to |
This function is designed to transpose "tall" assay data frames (where genes or proteins are the rows and patient or tumour samples are the columns). This function also transposes the row (feature) names to column names and the column (sample) names to row names. Notice that all rows and columns (other than the feature name column, as applicable) are numeric.
Recall that data frames require all elements of a single column to have the same class. Therefore, sample IDs of a "tall" data frame must be stored as the column names rather than in the first row.
The transposition of assay_df, with row and column names preserved and reversed.
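For the case where feature names sit in the first column, the transposition can be sketched in base R as follows. This is an illustration of the idea only; tall_df and the other names are hypothetical, and the sketch does not reproduce the internals of TransposeAssay.

```r
# Hypothetical "tall" assay: feature names in the first column, samples after
tall_df <- data.frame(
  feature  = c("gene1", "gene2", "gene3"),
  sample_1 = c(0.5, 1.5, 2.5),
  sample_2 = c(3.0, 4.0, 5.0)
)

# Transpose only the numeric block so the values stay numeric
wide_mat <- t(as.matrix(tall_df[, -1]))
colnames(wide_mat) <- tall_df$feature

# Samples are now the rows and features are the columns
wide_df <- as.data.frame(wide_mat)
```

Dropping the feature-name column before calling as.matrix matters: transposing the full data frame would coerce every value to character.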
x_mat <- matrix(rnorm(5000), ncol = 20, nrow = 250)
rownames(x_mat) <- paste0("gene_", 1:250)
colnames(x_mat) <- paste0("sample_", 1:20)
x_df <- as.data.frame(x_mat, row.names = rownames(x_mat))

TransposeAssay(x_df, omeNames = "rowNames")
Filter a pathwayCollection-class Object by Symbol.
The filter-subset method for pathways lists as returned by the read_gmt function. This function returns the subset of pathways which contain the set of symbols requested.
WhichPathways(x, symbols_char, ...)
x |
An object of class |
symbols_char |
A character vector or scalar of gene symbols or regions |
... |
Additional arguments passed to the |
This function finds the index of each set that contains the symbols supplied, then returns those sets as a new pathwayCollection object. Find pathways that contain geneA OR geneB by passing the argument matches = "any" through ... to Contains (this is the default value). Find pathways that contain geneA AND geneB by changing this argument to matches = "all". Find all genes in a specified family by passing in one value to short and setting partial = TRUE.
An object of class pathwayCollection, but containing only the sets which contain the symbols supplied to symbols_char. If no sets are found to contain the symbols supplied, this function returns NULL and prints a warning.
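The three matching modes described above ("any", "all", and partial matching) can be sketched with base R set operations. The pathways and the query below are illustrative, and this sketch uses %in% and grepl directly rather than the package's Contains function.

```r
# Hypothetical pathways with illustrative gene symbols
toy_pathways <- list(
  pathA = c("MAPK1", "RELA", "TP53"),
  pathB = c("MAP4K5", "RELA"),
  pathC = c("BRCA1", "TP53")
)
query <- c("RELA", "TP53")

# "any": the pathway contains at least one queried symbol
has_any <- vapply(toy_pathways, function(p) any(query %in% p), logical(1))

# "all": the pathway contains every queried symbol
has_all <- vapply(toy_pathways, function(p) all(query %in% p), logical(1))

# Partial matching: some symbol in the pathway contains the fragment "MAP"
has_map <- vapply(
  toy_pathways,
  function(p) any(grepl("MAP", p, fixed = TRUE)),
  logical(1)
)

# Names of the pathways that contain both queried symbols
all_match_names <- names(toy_pathways)[has_all]
```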
data("colon_pathwayCollection")
WhichPathways(colon_pathwayCollection, "MAP", partial = TRUE)

WhichPathways(
  colon_pathwayCollection,
  c("MAP4K5", "RELA"),
  matches = "all"
)
A pathwayCollection object containing the Homo sapiens pathways list from Wikipathways (https://www.wikipathways.org/).
data(wikipwsHS_Entrez_pathwayCollection)
A pathwayCollection list of three elements:
pathways: A named list of 443 character vectors. Each vector contains the Entrez Gene IDs of the individual genes within that pathway as a vector of character strings. The names are the shorthand pathway names.
TERMS: A character vector of length 443 containing the shorthand names of the gene pathways.
description: A character vector of length 443 containing the full names of the gene pathways.
This pathwayCollection was sent to us from Dr. Alexander Pico at the Gladstone Institute (https://gladstone.org/our-science/people/alexander-pico).
Dr. Alexander Pico, Wikipathways
A pathwayCollection object containing the Homo sapiens pathways list from Wikipathways (https://www.wikipathways.org/).
data(wikipwsHS_Symbol_pathwayCollection)
A pathwayCollection list of three elements:
pathways: A named list of 457 character vectors. Each vector contains the Gene Symbols of the individual genes within that pathway as a vector of character strings. The names are the shorthand pathway names.
TERMS: A character vector of length 457 containing the shorthand names of the gene pathways.
description: A character vector of length 457 containing the full names of the gene pathways.
This pathwayCollection was sent to us from Dr. Alexander Pico at the Gladstone Institute (https://gladstone.org/our-science/people/alexander-pico). This pathway collection was translated from EntrezIDs to HGNC Symbols with the script convert_EntrezID_to_HGNC_Ensembl.R in scripts.
Dr. Alexander Pico, Wikipathways
Write a pathwayCollection Object to a .gmt File
Write a pathwayCollection object as a pathways list file in Gene Matrix Transposed (.gmt) format.
write_gmt(pathwayCollection, file, setType = c("pathways", "genes", "regions"))
pathwayCollection |
A
|
file |
Either a character string naming a file or a connection open for
writing. File names should end in |
setType |
What is the type of the set: pathway set of gene, gene sites
in RNA or DNA, or regions of CpGs. Defaults to |
See the Broad Institute's "Data Formats" page for a description of the Gene Matrix Transposed file format: https://software.broadinstitute.org/cancer/software/gsea/wiki/index.php/Data_formats#GMT:_Gene_Matrix_Transposed_file_format_.28.2A.gmt.29
NULL. Output written to the file path specified.
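The file layout that this function targets can be sketched in base R: one tab-delimited line per set, holding the set name, its description, and then its members. The collection below is made up and the object names are hypothetical; the sketch writes to a temporary file with writeLines and does not call write_gmt.

```r
# Hypothetical collection with pathways, TERMS, and description elements
toy_collection <- list(
  pathways = list(c("geneA", "geneB"), c("geneC", "geneD", "geneE")),
  TERMS = c("TOY_SET_ONE", "TOY_SET_TWO"),
  description = c("first made-up set", "second made-up set")
)

# One tab-delimited line per set: name, description, then the members
gmt_lines <- vapply(
  seq_along(toy_collection$pathways),
  function(i) {
    paste(
      c(
        toy_collection$TERMS[i],
        toy_collection$description[i],
        toy_collection$pathways[[i]]
      ),
      collapse = "\t"
    )
  },
  character(1)
)

# Write to a temporary file, read it back, and clean up
out_file <- tempfile(fileext = ".gmt")
writeLines(gmt_lines, out_file)
written <- readLines(out_file)
unlink(out_file)
```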
print.pathwayCollection; read_gmt
# Toy pathway set
toy_pathwayCollection <- list(
  pathways = list(
    c("C1orf27", "NR5A1", "BLOC1S4", "C4orf50"),
    c("TARS2", "DUSP5", "GPR88"),
    c("TRX-CAT3-1", "LINC01333", "LINC01499", "LINC01046", "LINC01149")
  ),
  TERMS = c("C-or-f_paths", "randomPath2", "randomLINCs"),
  description = c("these are", "totally made up", "pathways")
)
class(toy_pathwayCollection) <- c("pathwayCollection", "list")
toy_pathwayCollection

# write_gmt(toy_pathwayCollection, file = "example_pathway.gmt")