Package 'pathwayPCA'

Title: Integrative Pathway Analysis with Modern PCA Methodology and Gene Selection
Description: pathwayPCA is an integrative analysis tool that implements the principal component analysis (PCA) based pathway analysis approaches described in Chen et al. (2008), Chen et al. (2010), and Chen (2011). pathwayPCA allows users to: (1) Test pathway association with binary, continuous, or survival phenotypes. (2) Extract relevant genes in the pathways using the SuperPCA and AES-PCA approaches. (3) Compute principal components (PCs) based on the selected genes. These estimated latent variables represent pathway activities for individual subjects, which can then be used to perform integrative pathway analysis, such as multi-omics analysis. (4) Extract relevant genes that drive pathway significance as well as data corresponding to these relevant genes for additional in-depth analysis. (5) Perform analyses with enhanced computational efficiency with parallel computing and enhanced data safety with S4-class data objects. (6) Analyze studies with complex experimental designs, with multiple covariates, and with interaction effects, e.g., testing whether pathway association with clinical phenotype is different between male and female subjects. Citations: Chen et al. (2008) <https://doi.org/10.1093/bioinformatics/btn458>; Chen et al. (2010) <https://doi.org/10.1002/gepi.20532>; and Chen (2011) <https://doi.org/10.2202/1544-6115.1697>.
Authors: Gabriel Odom [aut, cre], James Ban [aut], Lizhong Liu [aut], Lily Wang [aut], Steven Chen [aut]
Maintainer: Gabriel Odom
License: GPL-3
Version: 1.23.0
Built: 2024-11-29 06:59:16 UTC
Source: https://github.com/bioc/pathwayPCA

Help Index


Adaptive, elastic-net, sparse principal component analysis

Description

A function to perform adaptive, elastic-net, sparse principal component analysis (AES-PCA).

Usage

aespca(X, d = 1, max.iter = 10, eps.conv = 0.001, adaptive = TRUE, para = NULL)

Arguments

X

A pathway design matrix: the data matrix should be n × p, where n is the sample size and p is the number of variables included in the pathway.

d

The number of principal components (PCs) to extract from the pathway. Defaults to 1.

max.iter

The maximum number of times an internal while() loop can make calls to the lars.lsa() function. Defaults to 10.

eps.conv

A numerical convergence threshold for the same while() loop. Defaults to 0.001.

adaptive

Internal argument of the lars.lsa() function. Defaults to TRUE.

para

Internal argument of the lars.lsa() function. Defaults to NULL.

Details

This function calculates the loadings and reduced-dimension predictor matrix using both the singular value decomposition (SVD) and the AES-PCA decomposition (as described in Efron et al. (2003)) of the data matrix. Note that if the number of features in the pathway exceeds the number of samples, this decomposition will be an approximation. The internal lars.lsa function may also require more computing time than usual to converge, which is one reason why, in practice, we usually remove pathways that have more than 200-300 features.

See https://web.stanford.edu/~hastie/Papers/LARS/LeastAngle_2002.pdf.

For potential enhancement details, see the comment in the "Details" section of normalize.

Value

A list of four elements containing the loadings and projected predictors:

  • aesLoad : A d × p projection matrix of the d AES-PCs.

  • oldLoad : A d × p projection matrix of the d PCs from the singular value decomposition (SVD).

  • aesScore : An n × d predictor matrix: the original n observations loaded onto the d AES-PCs.

  • oldScore : An n × d predictor matrix: the original n observations loaded onto the d SVD-PCs.
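To make the shapes of oldLoad and oldScore concrete, the sketch below computes ordinary SVD-based PCs of a centred and scaled pathway matrix with base R's svd(). It is a minimal illustration only: svd_pcs is a hypothetical helper, it does not reproduce the adaptive, elastic-net, sparse (aesLoad/aesScore) part of aespca(), and it assumes the d × p loading layout documented above.

```r
# Hypothetical helper (not the package's aespca()): ordinary SVD-based PCs
# of a centred and scaled n x p pathway matrix.
svd_pcs <- function(X, d = 1) {
  X  <- scale(X, center = TRUE, scale = TRUE)
  sv <- svd(X, nu = 0, nv = d)
  V  <- sv$v[, seq_len(d), drop = FALSE]  # p x d right singular vectors
  list(
    oldLoad  = t(V),     # d x p, following the layout documented above
    oldScore = X %*% V   # n x d scores: observations projected onto the PCs
  )
}

set.seed(1)
X   <- matrix(rnorm(50 * 6), nrow = 50, ncol = 6)
res <- svd_pcs(X, d = 2)
dim(res$oldLoad)   # 2 6
dim(res$oldScore)  # 50 2
```

The loading rows are orthonormal, so each score column is simply the centred data projected onto one unit-length direction.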

See Also

normalize; lars.lsa; ExtractAESPCs; AESPCA_pVals

Examples

# DO NOT CALL THIS FUNCTION DIRECTLY.
  # Call this function through AESPCA_pVals() instead.

## Not run: 
  data("colonSurv_df")
  aespca(as.matrix(colonSurv_df[, 5:50]))

## End(Not run)

Test pathway association with AES-PCA

Description

Given a supervised OmicsPath object (one of OmicsSurv, OmicsReg, or OmicsCateg), extract the first k adaptive, elastic-net, sparse principal components (PCs) from each pathway-subset of the features in the -Omics assay design matrix, test their association with the response matrix, and return a data frame of the adjusted p-values for each pathway.

Usage

AESPCA_pVals(
  object,
  numPCs = 1,
  numReps = 0L,
  parallel = FALSE,
  numCores = NULL,
  asPCA = FALSE,
  adjustpValues = TRUE,
  adjustment = c("Bonferroni", "Holm", "Hochberg", "SidakSS", "SidakSD", "BH", "BY",
    "ABH", "TSBH"),
  ...
)

## S4 method for signature 'OmicsPathway'
AESPCA_pVals(
  object,
  numPCs = 1,
  numReps = 1000,
  parallel = FALSE,
  numCores = NULL,
  asPCA = FALSE,
  adjustpValues = TRUE,
  adjustment = c("Bonferroni", "Holm", "Hochberg", "SidakSS", "SidakSD", "BH", "BY",
    "ABH", "TSBH"),
  ...
)

Arguments

object

An object of class OmicsPathway with a response matrix or vector.

numPCs

The number of PCs to extract from each pathway. Defaults to 1.

numReps

How many permutations should be used to estimate the p-value? Defaults to 0 (that is, the p-value is estimated parametrically). If numReps > 0, then the non-parametric permutation p-value will be returned, based on the specified number of random permutations.

parallel

Should the computation be completed in parallel? Defaults to FALSE.

numCores

If parallel = TRUE, how many cores should be used for computation? Internally defaults to the number of available cores minus 1.

asPCA

Should the computation return the eigenvectors and eigenvalues instead of the adaptive, elastic-net, sparse principal components and their corresponding loadings? Defaults to FALSE; this option should be used for diagnostic or comparative purposes only.

adjustpValues

Should you adjust the p-values for multiple comparisons? Defaults to TRUE.

adjustment

Character vector of procedures. The returned data frame will be sorted in ascending order by the first procedure in this vector, with ties broken by the unadjusted p-value. If only one procedure is selected, then it is necessarily the first procedure. See the documentation for the ControlFDR function for the adjustment procedure definitions and citations.

...

Dots for additional internal arguments.

Details

This is a wrapper function for the ExtractAESPCs, PermTestSurv, PermTestReg, and PermTestCateg functions.

Please see our Quickstart Guide for this package: https://gabrielodom.github.io/pathwayPCA/articles/Supplement1-Quickstart_Guide.html
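The permutation option (numReps > 0) can be illustrated with a generic permutation test. The sketch below is a minimal base-R illustration that assumes a single PC score, a continuous response, and absolute Pearson correlation as the test statistic; perm_pval is a hypothetical name, and this is not the statistic used inside PermTestReg or the other PermTest* functions.

```r
# Hypothetical sketch, not PermTestReg(): permutation p-value for the
# association between one pathway PC score and a continuous response.
perm_pval <- function(pc, y, numReps = 1000) {
  obs  <- abs(cor(pc, y))
  # Shuffling y breaks any pc-y association, giving a null distribution
  perm <- replicate(numReps, abs(cor(pc, sample(y))))
  # Add-one correction keeps the estimate strictly positive
  (1 + sum(perm >= obs)) / (1 + numReps)
}

set.seed(2)
pc     <- rnorm(100)
y_sig  <- pc + rnorm(100, sd = 0.5)  # response driven by the PC
y_null <- rnorm(100)                 # unrelated response
p_sig  <- perm_pval(pc, y_sig, numReps = 200)
p_null <- perm_pval(pc, y_null, numReps = 200)
```

With 200 permutations the smallest attainable value is 1/201, which is why larger numReps values give finer resolution at the cost of computing time.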

Value

A results list with class aespcOut. This list has three components: a data frame of pathway details, pathway p-values, and potential adjustments to those values (pVals_df); a list of the first numPCs score vectors for each pathway (PCs_ls); and a list of the first numPCs feature loading vectors for each pathway (loadings_ls). The p-value data frame has columns:

  • pathways : The names of the pathways in the Omics* object (given in object@trimPathwayCollection$pathways).

  • setsize : The number of genes in each of the original pathways (given in the object@trimPathwayCollection$setsize object).

  • n_tested : The number of genes in each of the trimmed pathways (given in the object@trimPathwayCollection$n_tested object).

  • terms : The pathway description, as given in the object@trimPathwayCollection$TERMS object.

  • rawp : The unadjusted p-values of each pathway.

  • ... : Additional columns of adjusted p-values as specified through the adjustment argument.

The data frame will be sorted in ascending order by the method specified first in the adjustment argument. If adjustpValues = FALSE, then the data frame will be sorted by the raw p-values. If you have the suggested tidyverse package suite loaded, then this data frame will print as a tibble. Otherwise, it will print as a data frame.
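A comparable sorted table can be sketched with base R's stats::p.adjust, using a few made-up raw p-values. This is an illustration, not ControlFDR: p.adjust covers the Holm, Hochberg, BH, and BY procedures but not SidakSS, SidakSD, ABH, or TSBH, and the column names below are illustrative assumptions.

```r
# Made-up raw p-values for four hypothetical pathways
rawp <- c(pathA = 0.001, pathB = 0.04, pathC = 0.02, pathD = 0.30)

pVals_df <- data.frame(
  pathways = names(rawp),
  rawp     = unname(rawp),
  Holm     = p.adjust(unname(rawp), method = "holm"),
  BH       = p.adjust(unname(rawp), method = "BH"),
  stringsAsFactors = FALSE
)

# Sort by the first adjustment column, breaking ties with the raw p-value
pVals_df <- pVals_df[order(pVals_df$Holm, pVals_df$rawp), ]
pVals_df$pathways  # "pathA" "pathC" "pathB" "pathD"
```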

See Also

CreateOmics; ExtractAESPCs; PermTestSurv; PermTestReg; PermTestCateg; TabulatepValues; clusterApply

Examples

###  Load the Example Data  ###
  data("colonSurv_df")
  data("colon_pathwayCollection")

  ###  Create an OmicsSurv Object  ###
  colon_Omics <- CreateOmics(
    assayData_df = colonSurv_df[, -(2:3)],
    pathwayCollection_ls = colon_pathwayCollection,
    response = colonSurv_df[, 1:3],
    respType = "surv"
  )

  ###  Calculate Pathway p-Values  ###
  colonSurv_aespc <- AESPCA_pVals(
    object = colon_Omics,
    numReps = 0,
    parallel = TRUE,
    numCores = 2,
    adjustpValues = TRUE,
    adjustment = c("Hoch", "SidakSD")
  )

Gene Pathway Subset

Description

An example Canonical Pathways Gene Subset from the Broad Institute: File: c2.cp.v6.0.symbols.gmt.

Usage

data(colon_pathwayCollection)

Format

A pathwayCollection list of two elements:

  • pathways : A list of 15 character vectors. Each vector contains the names of the individual genes within that pathway as a vector of character strings.

  • TERMS : A character vector of length 15 containing the names of the gene pathways.

Details

This is a subset of 15 pathways from the Broad Institute pathways list. This subset contains seven pathways which are related to the response information in the colonSurv_df data file.

Source

http://software.broadinstitute.org/gsea/msigdb/collections.jsp


Colon Cancer -Omics Data

Description

Subset of a colon cancer survival data set, with subject response and assay values.

Usage

data(colonSurv_df)

Format

A subset of a data frame containing 656 of 2022 genes measured on 250 subjects. The first column contains the sample IDs (sampleID); the next two columns are the Overall Survival time (OS_time) and death indicator (OS_event).

Source

GEO GSE17538 https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE17538


Check if a long atomic vector contains a short atomic vector

Description

Check if any or all of the elements of a short atomic vector are contained within a supplied long atomic vector.

Usage

Contains(long, short, matches = c("any", "all"), partial = FALSE)

Arguments

long

A vector possibly containing any or all elements of short.

short

A short vector or scalar, some elements of which may be contained in long

matches

Should partial set matching of short be allowed? Defaults to "any", signifying that the function should return TRUE if any of the elements of short are contained in long. The other option is "all".

partial

Should partial string matching be allowed? Defaults to FALSE. Partial string matching means that the character string starts with the supplied value.

Details

This is a helper function to find out if a gene symbol or some similar character string (or character vector) is contained in a pathway. Currently, this function uses base R, but we can write it in a compiled language (such as C++) to increase speed later.

For partial matching (partial = TRUE), long must be an atomic vector of type character, short must be an atomic scalar (a vector with length of 1) of type character, and matches should be set to "any". Because this function is designed to match gene symbols or CpG locations, we care if the symbol or location starts with the string supplied. For example, if we set short = "PIK", then we want to find if any of the gene symbols in the supplied long vector belong to the PIK gene family. We don't care if this string appears elsewhere in a gene symbol.

Value

A logical scalar. If matches = "any", this indicates if any of the elements of short are contained in long. If matches = "all", this indicates if all of the elements of short are contained in long. If partial = TRUE, the returned logical indicates whether or not any of the character strings in long start with the character scalar supplied to short.
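The matching rules above can be sketched in a few lines of base R. contains_sketch is a hypothetical re-implementation for illustration only, not the package source; it assumes that exact matching is element-wise set membership (%in%) and that partial matching means prefix matching via startsWith().

```r
# Hypothetical re-implementation for illustration; not the package source.
contains_sketch <- function(long, short, matches = c("any", "all"),
                            partial = FALSE) {
  matches <- match.arg(matches)
  if (partial) {
    # Prefix matching: does any element of `long` start with `short`?
    return(any(startsWith(long, short)))
  }
  hits <- short %in% long  # exact, element-wise membership
  if (matches == "any") any(hits) else all(hits)
}

contains_sketch(LETTERS, c("A", "!"), matches = "any")          # TRUE
contains_sketch(LETTERS, c("A", "!"), matches = "all")          # FALSE
contains_sketch(c("PIK3CA", "PI4KA"), "PIK3", partial = TRUE)   # TRUE
```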

Examples

Contains(1:10, 8)
   Contains(LETTERS, c("A", "!"), matches = "any")
   Contains(LETTERS, c("A", "!"), matches = "all")
   
   genesPI <- c(
     "PI4K2A", "PI4K2B", "PI4KA", "PI4KB", "PIK3C2A", "PIK3C2B", "PIK3C2G",
     "PIK3C3", "PIK3CA", "PIK3CB", "PIK3CD", "PIK3CG", "PIK3R1", "PIK3R2",
     "PIK3R3", "PIK3R4", "PIK3R5", "PIK3R6", "PIKFYVE", "PIP4K2A",
     "PIP4K2B", "PIP5K1B", "PIP5K1C", "PITPNB"
   )
   Contains(genesPI, "PIK3", partial = TRUE)

Generation Wrapper function for -Omics*-class objects

Description

This function calls the CreateOmicsPath, CreateOmicsSurv, CreateOmicsReg, and CreateOmicsCateg functions to create valid objects of the classes OmicsPathway, OmicsSurv, OmicsReg, or OmicsCateg, respectively.

Usage

CreateOmics(
  assayData_df,
  pathwayCollection_ls,
  response = NULL,
  respType = c("none", "survival", "regression", "categorical"),
  centerScale = c(TRUE, TRUE),
  minPathSize = 3,
  ...
)

Arguments

assayData_df

An N × p data frame with named columns.

pathwayCollection_ls

A pathwayCollection list of known gene pathways with two or three elements:

  • pathways : A named list of character vectors. Each vector contains the names of the individual genes within that pathway as a vector of character strings. The names contained in these vectors must have non-empty overlap with the column names of the assayData_df data frame. The names of the pathways (the list elements themselves) should be a shorthand representation of the full pathway name.

  • TERMS: A character vector the same length as the pathways list with the proper names of the pathways.

  • description : An optional character vector the same length as the pathways list with additional information about the pathways.

If your gene pathways list is stored in a .gmt file, use the read_gmt function to import your pathways list as a pathwayCollection list object.

response

An optional response object. See "Details" for more information. Defaults to NULL.

respType

What type of response has been supplied? Options are "none", "survival", "regression", and "categorical". Defaults to "none" to match the default response = NULL value.

centerScale

Should the values in assayData_df be centered and scaled? Defaults to TRUE for centering and scaling, respectively. See scale for more information.

minPathSize

What is the smallest number of genes allowed in each pathway? Defaults to 3.

...

Dots for additional arguments passed to the internal CheckAssay function.

Details

This function is a wrapper around the four CreateOmics* functions. The values supplied to the response function argument can be in a list, data frame, matrix, vector, Surv object, or any class which extends these. Because this function makes "best guess" type conversions based on the respType argument, this argument is mandatory if response is non-NULL. Further, it is the responsibility of the user to ensure that the coerced response contained in the resulting Omics object accurately reflects the supplied response.

For respType = "survival", response is assumed to be ordered by event time, then event indicator. For example, if the response is a data frame or matrix, this function assumes that the first column is the time and the second column the death indicator. If the response is a list, then this function assumes that the first entry in the list is the event time and the second entry the death indicator. The death indicator must be a logical or binary (0-1) vector, where 1 or TRUE represents a death and 0 or FALSE represents right-censoring.

Some of the pathways in the supplied pathways list will be removed, or "trimmed", during object creation. For the pathway-testing methods, these trimmed pathways will have p-values given as NA. For an explanation of pathway trimming, see the documentation for the IntersectOmicsPwyCollct function.
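A simplified picture of trimming is sketched below, assuming that each pathway keeps only the genes present in the assay and that pathways with fewer than minPathSize remaining genes are dropped. trim_pathways is a hypothetical helper; the package's actual rules live in IntersectOmicsPwyCollct and may differ in detail.

```r
# Hypothetical sketch of pathway trimming; IntersectOmicsPwyCollct may differ.
trim_pathways <- function(pathways, assayNames, minPathSize = 3) {
  kept     <- lapply(pathways, intersect, assayNames)  # genes measured in the assay
  n_tested <- lengths(kept)
  list(
    pathways = kept[n_tested >= minPathSize],  # drop pathways that are too small
    setsize  = lengths(pathways),              # genes in each original pathway
    n_tested = n_tested                        # genes in each trimmed pathway
  )
}

pathways <- list(
  p1 = c("A", "B", "C", "D"),
  p2 = c("A", "X", "Y"),
  p3 = c("B", "C", "E", "Z")
)
res <- trim_pathways(pathways, assayNames = c("A", "B", "C", "D", "E"))
names(res$pathways)  # "p1" "p3"
```

Here p2 keeps only gene A, so it falls below minPathSize = 3 and would receive an NA p-value in the testing step.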

Value

A valid object of class OmicsPathway, OmicsSurv, OmicsReg, or OmicsCateg.

See Also

OmicsPathway, CreateOmicsPath, OmicsSurv, CreateOmicsSurv, OmicsCateg, CreateOmicsCateg, OmicsReg, CreateOmicsReg, CheckAssay, CheckPwyColl, and IntersectOmicsPwyCollct

Examples

###  Load the Example Data  ###
  data("colonSurv_df")
  data("colon_pathwayCollection")

  ###  Create an OmicsPathway Object  ###
  colon_OmicsPath <- CreateOmics(
    assayData_df = colonSurv_df[, -(2:3)],
    pathwayCollection_ls = colon_pathwayCollection
  )

  ###  Create an OmicsSurv Object  ###
  colon_OmicsSurv <- CreateOmics(
    assayData_df = colonSurv_df[, -(2:3)],
    pathwayCollection_ls = colon_pathwayCollection,
    response = colonSurv_df[, 1:3],
    respType = "surv"
  )

  ###  Create an OmicsReg Object  ###
  colon_OmicsReg <- CreateOmics(
    assayData_df = colonSurv_df[, -(2:3)],
    pathwayCollection_ls = colon_pathwayCollection,
    response = colonSurv_df[, 1:2],
    respType = "reg"
  )

  ###  Create an OmicsCateg Object  ###
  colon_OmicsCateg <- CreateOmics(
    assayData_df = colonSurv_df[, -(2:3)],
    pathwayCollection_ls = colon_pathwayCollection,
    response = colonSurv_df[, c(1,3)],
    respType = "cat"
  )

Generation functions for -Omics*-class objects

Description

These functions create valid objects of class OmicsPathway, OmicsSurv, OmicsReg, or OmicsCateg.

Usage

CreateOmicsPath(assayData_df, sampleIDs_char, pathwayCollection_ls)

CreateOmicsSurv(
  assayData_df,
  sampleIDs_char,
  pathwayCollection_ls,
  eventTime_num,
  eventObserved_lgl
)

CreateOmicsReg(
  assayData_df,
  sampleIDs_char,
  pathwayCollection_ls,
  response_num
)

CreateOmicsCateg(
  assayData_df,
  sampleIDs_char,
  pathwayCollection_ls,
  response_fact
)

Arguments

assayData_df

An N × p data frame with named columns.

sampleIDs_char

A character vector with the N sample names.

pathwayCollection_ls

A pathwayCollection list of known gene pathways with two or three elements:

  • pathways : A named list of character vectors. Each vector contains the names of the individual genes within that pathway as a vector of character strings. The names contained in these vectors must have non-empty overlap with the column names of the assayData_df data frame. The names of the pathways (the list elements themselves) should be a shorthand representation of the full pathway name.

  • TERMS: A character vector the same length as the pathways list with the proper names of the pathways.

  • description : An optional character vector the same length as the pathways list with additional information about the pathways.

eventTime_num

A numeric vector with N observations corresponding to the last observed time of follow up.

eventObserved_lgl

A logical vector with N observations indicating right-censoring. The values will be FALSE if the observation was censored (i.e., we did not observe an event).

response_num

A numeric vector of length N: the dependent variable in an ordinary regression exercise.

response_fact

A factor vector of length N: the dependent variable of a generalized linear regression exercise.

Details

Please note that the classes of the parameters are not flexible. The -Omics assay data must be or extend the class data.frame, and the response values (for a survival-, regression-, or categorical-response object) must match their expected classes exactly. The reason for this is to encourage the end user to pay attention to the quality and format of their input data. Because the functions internal to this package have only been tested on the classes described in the Arguments section, these class checks prevent unexpected errors (or worse, incorrect computational results without an error). These draconian input class restrictions protect the accuracy of your data analysis.
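The kind of strict class checking described above can be sketched in base R. check_surv_inputs is a hypothetical helper, not the package's validity method; it assumes the survival-response argument classes listed in the Arguments section (a data frame, a numeric time vector, and a logical event vector).

```r
# Hypothetical helper illustrating strict input checks; not the package's code.
check_surv_inputs <- function(assayData_df, eventTime_num, eventObserved_lgl) {
  stopifnot(
    inherits(assayData_df, "data.frame"),  # must be or extend data.frame
    is.numeric(eventTime_num),             # follow-up times
    is.logical(eventObserved_lgl),         # TRUE = event, FALSE = censored
    length(eventTime_num) == nrow(assayData_df),
    length(eventObserved_lgl) == nrow(assayData_df)
  )
  invisible(TRUE)
}

assay <- data.frame(g1 = c(0.2, -1.1, 0.5), g2 = c(1.3, 0.4, -0.7))
check_surv_inputs(assay, c(5, 8, 12), c(TRUE, FALSE, TRUE))
# A 0/1 event vector fails the check until it is converted with as.logical()
try(check_surv_inputs(assay, c(5, 8, 12), c(1L, 0L, 1L)))
check_surv_inputs(assay, c(5, 8, 12), as.logical(c(1L, 0L, 1L)))
```

This is why the examples below wrap OS_event in as.logical() before passing it as eventObserved_lgl.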

Value

A valid object of class OmicsPathway, OmicsSurv, OmicsReg, or OmicsCateg.

OmicsPathway

Valid OmicsPathway objects will have no response information, just the mass spectrometry or bio-assay ("design") matrix and the pathway list. OmicsPathway objects should be created only when unsupervised pathway extraction is needed (not possible with Supervised PCA). Because of the missing response, no pathway testing can be performed on an OmicsPathway object.

OmicsSurv

Valid OmicsSurv objects will have two response vectors: a vector of the most recently recorded follow-up times and a logical vector indicating whether each time marks an event (TRUE: observed event; FALSE: right-censored observation).

OmicsReg and OmicsCateg

Valid OmicsReg and OmicsCateg objects will have one response vector of continuous (numeric) or categorical (factor) observations, respectively.

See Also

OmicsPathway, OmicsSurv, OmicsReg, and OmicsCateg

Examples

# DO NOT CALL THESE FUNCTIONS DIRECTLY. USE CreateOmics() INSTEAD.

  data("colon_pathwayCollection")
  data("colonSurv_df")

## Not run: 
  CreateOmicsPath(
    assayData_df = colonSurv_df[, -(1:3)],
    sampleIDs_char = colonSurv_df$sampleID,
    pathwayCollection_ls = colon_pathwayCollection
  )

  CreateOmicsSurv(
    assayData_df = colonSurv_df[, -(1:3)],
    sampleIDs_char = colonSurv_df$sampleID,
    pathwayCollection_ls = colon_pathwayCollection,
    eventTime_num = colonSurv_df$OS_time,
    eventObserved_lgl = as.logical(colonSurv_df$OS_event)
  )

  CreateOmicsReg(
    assayData_df = colonSurv_df[, -(1:3)],
    sampleIDs_char = colonSurv_df$sampleID,
    pathwayCollection_ls = colon_pathwayCollection,
    response_num = colonSurv_df$OS_time
  )

  CreateOmicsCateg(
    assayData_df = colonSurv_df[, -(1:3)],
    sampleIDs_char = colonSurv_df$sampleID,
    pathwayCollection_ls = colon_pathwayCollection,
    response_fact = as.factor(colonSurv_df$OS_event)
  )

## End(Not run)

Manually Create a pathwayCollection-class Object.

Description

Manually create a pathwayCollection list similar to the output of the read_gmt function.

Usage

CreatePathwayCollection(
  sets_ls,
  TERMS,
  setType = c("pathways", "genes", "regions"),
  ...
)

Arguments

sets_ls

A named list of character vectors. Each vector should contain the names of the individual genes, proteins, sites, or CpGs within that set as a vector of character strings. If you create this pathway collection to integrate with data of class Omics*, the names contained in these vectors should have non-empty overlap with the feature names of the assay data frame that will be paired with this list in the subsequent analysis.

TERMS

A character vector the same length as the sets_ls list with the proper names of the sets.

setType

What type of set is this: a pathway set of genes, gene sites in RNA or DNA, or regions of CpGs? Defaults to "pathways".

...

Additional vectors or data components related to the sets_ls list. These values should be passed as a name-value pair. See "Details" for more information.

Details

This function checks the set list and set term inputs and then creates a pathwayCollection object from them. Pass additional list elements (such as the description of each set) using the form tag = value through the ... argument (as in the list function). Because some functions in the pathwayPCA package add and edit elements of pathwayCollection objects, please do not create pathwayCollection list items named setsize or n_tested.
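The checks described above can be sketched with a plain R list. make_collection is a hypothetical stand-in, not pathwayPCA::CreatePathwayCollection; it assumes the class vector c("pathwayCollection", "list"), always stores the sets under the name pathways, and rejects the reserved element names setsize and n_tested.

```r
# Hypothetical stand-in that mimics the documented shape; it does not call
# pathwayPCA::CreatePathwayCollection.
make_collection <- function(sets_ls, TERMS, ...) {
  stopifnot(is.list(sets_ls), is.character(TERMS),
            length(sets_ls) == length(TERMS))
  extras   <- list(...)  # e.g. description = c(...)
  reserved <- intersect(names(extras), c("setsize", "n_tested"))
  if (length(reserved) > 0) {
    stop("Reserved element name(s): ", paste(reserved, collapse = ", "))
  }
  structure(c(list(pathways = sets_ls, TERMS = TERMS), extras),
            class = c("pathwayCollection", "list"))
}

coll <- make_collection(
  sets_ls     = list(setA = c("TP53", "MDM2"), setB = c("EGFR", "KRAS", "BRAF")),
  TERMS       = c("Set A", "Set B"),
  description = c("first set", "second set")
)
names(coll)  # "pathways" "TERMS" "description"
```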

Value

A list object with class pathwayCollection.

See Also

read_gmt

Examples

data("colon_pathwayCollection")

  CreatePathwayCollection(
    sets_ls = colon_pathwayCollection$pathways,
    TERMS = colon_pathwayCollection$TERMS
  )

Extract PCs and Loadings from a superpcOut- or aespcOut-class Object.

Description

Given an object of class aespcOut or superpcOut, as returned by the functions AESPCA_pVals or SuperPCA_pVals, respectively, and the name or unique ID of a pathway, return a data frame of the principal components and a data frame of the loading vectors corresponding to that pathway.

Usage

getPathPCLs(pcOut, pathway_char, ...)

## S3 method for class 'superpcOut'
getPathPCLs(pcOut, pathway_char, ...)

## S3 method for class 'aespcOut'
getPathPCLs(pcOut, pathway_char, ...)

Arguments

pcOut

An object of classes superpcOut or aespcOut as returned by the SuperPCA_pVals or AESPCA_pVals functions, respectively.

pathway_char

A character string of the name or unique identifier of a pathway

...

Dots for additional arguments (currently unused).

Details

Match the supplied pathway character string to either the pathways or terms columns of the pVals_df data frame within the pcOut object. Then, subset the loadings_ls and PCs_ls lists for their entries which match the supplied pathway. Finally, return a list of the PCs, loadings, and the pathway ID and name.
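The three steps above (match, subset, return) can be sketched against a toy list. Both pcOut here and get_path_sketch are hypothetical: the toy object only mimics the documented pVals_df, PCs_ls, and loadings_ls components, assumes the two lists are named by pathway ID, and is not a real aespcOut or superpcOut object.

```r
# Toy stand-in for an aespcOut / superpcOut list (hypothetical contents)
pcOut <- list(
  pVals_df = data.frame(
    pathways = c("path1", "path2"),
    terms    = c("KEGG_A", "KEGG_B"),
    stringsAsFactors = FALSE
  ),
  PCs_ls = list(
    path1 = matrix(c(0.1, -0.2, 0.3), ncol = 1, dimnames = list(NULL, "V1")),
    path2 = matrix(c(1.1, 0.4, -0.9), ncol = 1, dimnames = list(NULL, "V1"))
  ),
  loadings_ls = list(
    path1 = matrix(c(0.6, 0.8), ncol = 1, dimnames = list(c("g1", "g2"), "PC1")),
    path2 = matrix(c(1, 0), ncol = 1, dimnames = list(c("g3", "g4"), "PC1"))
  )
)

# Hypothetical helper mirroring the steps described above
get_path_sketch <- function(pcOut, pathway_char) {
  df  <- pcOut$pVals_df
  # Match against either the pathway ID or the pathway name
  row <- which(df$pathways == pathway_char | df$terms == pathway_char)
  if (length(row) != 1L) stop("Pathway not found: ", pathway_char)
  id <- df$pathways[row]
  list(
    PCs      = as.data.frame(pcOut$PCs_ls[[id]]),
    Loadings = pcOut$loadings_ls[[id]],
    pathway  = id,
    term     = df$terms[row]
  )
}

res <- get_path_sketch(pcOut, "KEGG_B")  # match by term name
res$pathway                              # "path2"
```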

Value

A list of four elements:

  • PCs : A data frame of the principal components

  • Loadings : A matrix of the loading vectors with features in the row names

  • pathway : The unique pathway identifier for the pcOut object

  • term : The name of the pathway


Examples

###  Load Data  ###
  data("colonSurv_df")
  data("colon_pathwayCollection")

  ###  Create -Omics Container  ###
  colon_Omics <- CreateOmics(
    assayData_df = colonSurv_df[, -(2:3)],
    pathwayCollection_ls = colon_pathwayCollection,
    response = colonSurv_df[, 1:3],
    respType = "survival"
  )

  ###  Calculate Supervised PCA Pathway p-Values  ###
  colon_superpc <- SuperPCA_pVals(
    colon_Omics,
    numPCs = 2,
    parallel = TRUE,
    numCores = 2,
    adjustment = "BH"
  )

  ###  Extract PCs and Loadings  ###
  getPathPCLs(
    colon_superpc,
    "KEGG_PENTOSE_PHOSPHATE_PATHWAY"
  )

Extract Table of p-values from a superpcOut- or aespcOut-class Object.

Description

Given an object of class aespcOut or superpcOut, as returned by the functions AESPCA_pVals or SuperPCA_pVals, respectively, return a data frame of the p-values for the top pathways.

Usage

getPathpVals(pcOut, score = FALSE, numPaths = 20L, alpha = NULL, ...)

## S3 method for class 'superpcOut'
getPathpVals(pcOut, score = FALSE, numPaths = 20L, alpha = NULL, ...)

## S3 method for class 'aespcOut'
getPathpVals(pcOut, score = FALSE, numPaths = 20L, alpha = NULL, ...)

Arguments

pcOut

An object of classes superpcOut or aespcOut as returned by the SuperPCA_pVals or AESPCA_pVals functions, respectively.

score

Should the unadjusted p-values be returned transformed to negative natural logarithm scores or left as is? Defaults to FALSE; that is, the raw p-values are returned instead of the transformed p-values.

numPaths

The number of top pathways, ranked by raw p-value, to return. Defaults to the top 20 pathways. We do not permit users to specify numPaths and alpha concurrently.

alpha

The significance threshold for raw p-values. Defaults to NULL. If alpha is given, then numPaths will be ignored.

...

Dots for additional arguments (currently unused).

Details

Row-subset the pVals_df entry of an object of class aespcOut or superpcOut by the number of pathways requested (via the numPaths argument) or by the unadjusted significance level for each pathway (via the alpha argument). Return a data frame of the pathway names, FDR-adjusted significance levels (if available), and the raw score (negative natural logarithm of the p-values) of each pathway.
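The row-subsetting and score transformation can be sketched on a toy data frame. top_paths_sketch is a hypothetical helper, not the package method; it assumes "significant" means a raw p-value strictly below alpha and that score = TRUE replaces rawp with its negative natural logarithm.

```r
# Hypothetical helper; not the getPathpVals() method itself.
top_paths_sketch <- function(pVals_df, numPaths = 20L, alpha = NULL,
                             score = FALSE) {
  df <- pVals_df[order(pVals_df$rawp), , drop = FALSE]  # best pathways first
  keep <- if (is.null(alpha)) {
    seq_len(min(numPaths, nrow(df)))  # top numPaths rows
  } else {
    which(df$rawp < alpha)            # alpha overrides numPaths
  }
  out <- df[keep, , drop = FALSE]
  if (score) {
    out$score <- -log(out$rawp)       # negative natural logarithm
    out$rawp  <- NULL
  }
  out
}

pVals_df <- data.frame(terms = c("P1", "P2", "P3", "P4"),
                       rawp  = c(0.001, 0.02, 0.2, 0.004),
                       stringsAsFactors = FALSE)
top_paths_sketch(pVals_df, numPaths = 2)$terms   # "P1" "P4"
top_paths_sketch(pVals_df, alpha = 0.05)$terms   # "P1" "P4" "P2"
```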

Value

A data frame with the following columns:

  • terms : The pathway name, as given in the object@trimPathwayCollection$TERMS object.

  • description : (OPTIONAL) The pathway description, as given in the object@trimPathwayCollection$description object, if supplied.

  • rawp : The unadjusted p-values of each pathway. Included if score = FALSE.

  • ... : Additional columns of FDR-adjusted p-values as specified through the adjustment argument of the SuperPCA_pVals or AESPCA_pVals functions.

  • score : The negative natural logarithm of the unadjusted p-values of each pathway. Included if score = TRUE.


Examples

###  Load Data  ###
  data("colonSurv_df")
  data("colon_pathwayCollection")

  ###  Create -Omics Container  ###
  colon_Omics <- CreateOmics(
    assayData_df = colonSurv_df[, -(2:3)],
    pathwayCollection_ls = colon_pathwayCollection,
    response = colonSurv_df[, 1:3],
    respType = "survival"
  )

  ###  Calculate Supervised PCA Pathway p-Values  ###
  colon_superpc <- SuperPCA_pVals(
    colon_Omics,
    numPCs = 2,
    parallel = TRUE,
    numCores = 2,
    adjustment = "BH"
  )

  ###  Extract Table of p-Values  ###
  # Top 5 Pathways
  getPathpVals(
    colon_superpc,
    numPaths = 5
  )
  
  # Pathways with Unadjusted p-Values < 0.01
  getPathpVals(
    colon_superpc,
    alpha = 0.01
  )

Calculate Test Data PCs from Training-Data Estimated Loadings

Description

Given a list of loading vectors from a training data set, calculate the PCs of the test data set.

Usage

LoadOntoPCs(design_df, loadings_ls, sampleID = c("firstCol", "rowNames"))

Arguments

design_df

A test data frame with rows as samples and named features as columns

loadings_ls

A list of p × d loading vectors or matrices as returned by the SuperPCA_pVals, AESPCA_pVals, or ExtractAESPCs functions. These loadings will have feature names as their row names. These feature names must exactly match a subset of the column names of design_df, because pathway-specific test-data subsetting is performed by column name.

sampleID

Are the sample IDs in the first column of design_df or accessible by rownames(design_df)? Defaults to the first column. If your data does not have sample IDs, set this to rowNames.

Details

This function takes in a list of loadings and a training-centered test data set, applies over the list of loadings, subsets the columns of the test data by the row names of the loading vectors, right-multiplies the test-data subset matrix by the loading vector / matrix, and returns a data frame of the test-data PCs for each loading vector.
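The subset-then-multiply step can be sketched in base R. project_sketch is a hypothetical helper, not LoadOntoPCs; it assumes the test data are already centred (and scaled) with training-set statistics, ignores any sample-ID column, and names the output columns pathway_PC by assumption.

```r
# Hypothetical helper; not LoadOntoPCs() itself. Assumes design_df is
# already centred/scaled using the training data.
project_sketch <- function(design_df, loadings_ls) {
  pcs <- lapply(names(loadings_ls), function(path) {
    L <- loadings_ls[[path]]                                # p x d, genes in rownames
    X <- as.matrix(design_df[, rownames(L), drop = FALSE])  # subset by column name
    scores <- X %*% L                                       # n x d test-data PCs
    colnames(scores) <- paste(path, colnames(L), sep = "_")
    scores
  })
  as.data.frame(do.call(cbind, pcs))  # concatenate pathways by column
}

design_df <- data.frame(g1 = c(1, 2), g2 = c(3, 4), g3 = c(5, 6))
loadings_ls <- list(
  path1 = matrix(c(1, 0), ncol = 1, dimnames = list(c("g1", "g2"), "PC1")),
  path2 = matrix(c(0.5, 0.5), ncol = 1, dimnames = list(c("g2", "g3"), "PC1"))
)
out <- project_sketch(design_df, loadings_ls)
```

Because columns are selected by name, any loading row name missing from design_df would raise an error here, which mirrors the exact-match requirement on loadings_ls described above.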

Value

A data frame with the PCs from each pathway concatenated by column. If you have the tidyverse loaded, this object will display as a tibble.

Examples

###  Load the Data  ###
  data("colonSurv_df")
  data("colon_pathwayCollection")

  ###  Create -Omics Container  ###
  colon_Omics <- CreateOmics(
    assayData_df = colonSurv_df[, -(2:3)],
    pathwayCollection_ls = colon_pathwayCollection,
    response = colonSurv_df[, 1:3],
    respType = "survival"
  )

  ###  Extract AESPCs  ###
  colonSurv_aespc <- AESPCA_pVals(
    object = colon_Omics,
    numReps = 0,
    parallel = TRUE,
    numCores = 2,
    adjustpValues = TRUE,
    adjustment = c("Hoch", "SidakSD")
  )

  ###  Project Data onto Pathway First PCs  ###
  LoadOntoPCs(
    design_df = colonSurv_df,
    loadings_ls = colonSurv_aespc$loadings_ls
  )

An S4 class for categorical responses within an OmicsPathway object

Description

This creates the OmicsCateg class which extends the OmicsPathway master class.

Slots

assayData_df

An N × p data frame with named columns.

pathwayCollection

A list of known gene pathways with three or four elements:

  • pathways : A named list of character vectors. Each vector contains the names of the individual genes within that pathway as a vector of character strings. The names contained in these vectors must have non-empty overlap with the column names of the assayData_df data frame. The names of the pathways (the list elements themselves) should be a shorthand representation of the full pathway name.

  • TERMS : A character vector the same length as the pathways list with the proper names of the pathways.

  • description : An optional character vector the same length as the pathways list with additional information about the pathways.

  • setsize : A named integer vector the same length as the pathways list with the number of genes in each pathway. This list item is calculated during the creation step of a CreateOmics function call.

response

A factor vector of length N: the dependent variable of a generalized linear regression exercise. Currently, we support binary factors only. We expect to extend support to n-ary responses in the next package version.

See Also

OmicsPathway, CreateOmics


An S4 class for mass spectrometry or bio-assay data and gene pathway lists

Description

An S4 class for mass spectrometry or bio-assay data and gene pathway lists

Slots

assayData_df

An N × p data frame with named columns.

sampleIDs_char

A character vector with the N sample names.

pathwayCollection

A list of known gene pathways with three or four elements:

  • pathways : A named list of character vectors. Each vector contains the names of the individual genes within that pathway as a vector of character strings. The names contained in these vectors must have non-empty overlap with the column names of the assayData_df data frame. The names of the pathways (the list elements themselves) should be a shorthand representation of the full pathway name.

  • TERMS : A character vector the same length as the pathways list with the proper names of the pathways.

  • description : An optional character vector the same length as the pathways list with additional information about the pathways.

  • setsize : A named integer vector the same length as the pathways list with the number of genes in each pathway. This list item is calculated during the creation step of a CreateOmics function call.

trimPathwayCollection

A subset of the list stored in the pathwayCollection slot. This list will have pathways that only contain genes that are present in the assay data frame.
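The idea behind the trimmed collection can be sketched in base R: intersect each pathway with the genes measured in the assay. The gene names are made up, and dropping pathways left empty is an assumption of this sketch; the package's own trimming (see IntersectOmicsPwyCollct) may apply additional filters.

```r
# Hypothetical example: gene names measured in the assay (its column names)
assay_genes <- c("TP53", "MDM2", "EGFR")

# Hypothetical pathway collection: one partially and one non-overlapping set
pathways <- list(
  pathA = c("TP53", "MDM2", "CDKN1A"),
  pathB = c("KRAS", "BRAF")
)

# Keep only the genes present in the assay, then drop pathways left empty
trimmed <- lapply(pathways, intersect, assay_genes)
trimmed <- trimmed[lengths(trimmed) > 0]
trimmed
```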

See Also

CreateOmics


An S4 class for continuous responses within an OmicsPathway object

Description

This creates the OmicsReg class which extends the OmicsPathway master class.

Slots

assayData_df

An N × p data frame with named columns.

pathwayCollection

A list of known gene pathways with three or four elements:

  • pathways : A named list of character vectors. Each vector contains the names of the individual genes within that pathway as a vector of character strings. The names contained in these vectors must have non-empty overlap with the column names of the assayData_df data frame. The names of the pathways (the list elements themselves) should be a shorthand representation of the full pathway name.

  • TERMS : A character vector the same length as the pathways list with the proper names of the pathways.

  • description : An optional character vector the same length as the pathways list with additional information about the pathways.

  • setsize : A named integer vector the same length as the pathways list with the number of genes in each pathway. This list item is calculated during the creation step of a CreateOmics function call.
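As a small illustration of the list layout above, the base-R sketch below builds a toy pathwayCollection-like list and fills setsize with the number of genes per pathway. The pathway names and genes are hypothetical, and this is not the code CreateOmics runs; it only shows the shape of the result.

```r
# Hypothetical toy collection; gene symbols chosen only for illustration
toy_pc <- list(
  pathways = list(
    pathA = c("TP53", "MDM2", "CDKN1A"),
    pathB = c("EGFR", "KRAS")
  ),
  TERMS = c("Toy p53 pathway", "Toy EGFR pathway")
)

# Named integer vector: the number of genes in each pathway
toy_pc$setsize <- lengths(toy_pc$pathways)
toy_pc$setsize
```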

response

A numeric vector of length N: the dependent variable in a regression exercise.

See Also

OmicsPathway, CreateOmics


An S4 class for survival responses within an OmicsPathway object

Description

This creates the OmicsSurv class which extends the OmicsPathway master class.

Slots

assayData_df

An N × p data frame with named columns.

pathwayCollection

A list of known gene pathways with three or four elements:

  • pathways : A named list of character vectors. Each vector contains the names of the individual genes within that pathway as a vector of character strings. The names contained in these vectors must have non-empty overlap with the column names of the assayData_df data frame. The names of the pathways (the list elements themselves) should be a shorthand representation of the full pathway name.

  • TERMS : A character vector the same length as the pathways list with the proper names of the pathways.

  • description : An optional character vector the same length as the pathways list with additional information about the pathways.

  • setsize : A named integer vector the same length as the pathways list with the number of genes in each pathway. This list item is calculated during the creation step of a CreateOmics function call.

eventTime

A numeric vector with N observations corresponding to the last observed time of follow up.

eventObserved

A logical vector with N observations indicating right-censoring. The values will be FALSE if the observation was censored (i.e., we did not observe an event).
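The censoring convention for these two slots can be illustrated with a short base-R sketch. The event times and indicators are made-up values, and the optional survival::Surv() call assumes the survival package is installed; neither is part of the OmicsSurv class itself.

```r
# Made-up follow-up times and event indicators for four subjects
eventTime <- c(12.0, 30.5, 7.2, 45.0)
eventObserved <- c(TRUE, FALSE, TRUE, FALSE)  # FALSE = right-censored

n_events <- sum(eventObserved)     # subjects whose event was observed
n_censored <- sum(!eventObserved)  # subjects censored at last follow-up

# Optional: the same information as a survival::Surv object, if installed
if (requireNamespace("survival", quietly = TRUE)) {
  surv_obj <- survival::Surv(time = eventTime, event = eventObserved)
  print(surv_obj)  # censored times are printed with a trailing "+"
}
```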

See Also

OmicsPathway, CreateOmics


Read a .gmt file in as a pathwayCollection object

Description

Read a set list file in Gene Matrix Transposed (.gmt) format, with special attention to performance for large files, and return it as a pathwayCollection object.

Usage

read_gmt(
  file,
  setType = c("pathways", "genes", "regions"),
  description = FALSE,
  nChars = 1e+07,
  delim = "\t"
)

Arguments

file

A path to a file or a connection. This file must be a .gmt file, otherwise input will likely be nonsense. See the "Details" section for more information.

setType

The type of set: "pathways" (sets of genes), "genes" (gene sites in RNA or DNA), or "regions" (regions of CpGs). Defaults to "pathways".

description

Should the "description" field (the second field in the .gmt file on each line) be included in the output? Defaults to FALSE.

nChars

The number of characters to read from a connection. The largest .gmt file we have encountered is the full C5 pathway collection from MSigDB (5917 pathways), which has roughly 5 million characters in UTF-8 encoding. Therefore, this argument defaults to 10,000,000, roughly twice the size of the largest pathway collection we have seen so far.

delim

The .gmt delimiter. As proper .gmt files are tab delimited, this defaults to "\t".

Details

This function uses R's readChar function, which reads character input faster than readLines (and much faster than scan).

See the Broad Institute's "Data Formats" page for a description of the Gene Matrix Transposed file format: https://software.broadinstitute.org/cancer/software/gsea/wiki/index.php/Data_formats#GMT:_Gene_Matrix_Transposed_file_format_.28.2A.gmt.29
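To show how the .gmt layout maps onto a pathwayCollection, here is a minimal base-R sketch that reads a two-line toy file with readChar and splits it on tabs: field 1 is the set name, field 2 the description, and the remaining fields the members. The file contents and URLs are hypothetical, and this is not the read_gmt source.

```r
# Write a two-line toy .gmt file (hypothetical set names and URLs)
gmt_lines <- c(
  "PATH_A\thttp://example.org/A\tTP53\tMDM2\tCDKN1A",
  "PATH_B\thttp://example.org/B\tEGFR\tKRAS"
)
tmp <- tempfile(fileext = ".gmt")
writeLines(gmt_lines, tmp)

# Read the whole file as one string, then split into lines and fields
raw_chr <- readChar(tmp, nchars = file.info(tmp)$size)
lines_chr <- strsplit(raw_chr, "\n", fixed = TRUE)[[1]]
lines_chr <- sub("\r$", "", lines_chr)  # drop carriage returns on Windows
fields_ls <- strsplit(lines_chr, "\t", fixed = TRUE)

# Field 1 = set name, field 2 = description, fields 3+ = set members
toy_collection <- list(
  pathways = lapply(fields_ls, function(f) f[-(1:2)]),
  TERMS = vapply(fields_ls, `[`, character(1), 1),
  description = vapply(fields_ls, `[`, character(1), 2)
)
names(toy_collection$pathways) <- toy_collection$TERMS
```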

Value

A pathwayCollection list of sets. This list has two or three elements:

  • 'setType' : A named list of character vectors. Each vector contains the names of the individual genes, sites, or CpGs within that set as a vector of character strings. The name of this list entry is equal to the value specified in setType.

  • TERMS : A character vector the same length as the 'setType' list with the proper names of the sets.

  • description : (OPTIONAL) A character vector the same length as the 'setType' list with a note on that set (for the .gmt file included with this package, this field contains hyperlinks to the MSigDB description card for that pathway). This field is included when description = TRUE.

See Also

print.pathwayCollection; write_gmt

Examples

# If you have installed the package:
  data_path <- system.file(
    "extdata", "c2.cp.v6.0.symbols.gmt",
    package = "pathwayPCA", mustWork = TRUE
  )
  geneset_ls <- read_gmt(data_path, description = TRUE)

  # # If you are using the development version from GitHub:
  # geneset_ls <- read_gmt(
  #   "inst/extdata/c2.cp.v6.0.symbols.gmt",
  #   description = TRUE
  # )

Tidy a SummarizedExperiment Assay

Description

Extract the assay information from a SummarizedExperiment-class object, transpose it, and return it as a tidy data frame that contains assay measurements, feature names, and sample IDs.

Usage

SE2Tidy(summExperiment, whichAssay = 1)

Arguments

summExperiment

A SummarizedExperiment-class object

whichAssay

Because SummarizedExperiment objects can store multiple related assays, which assay will be paired with a given pathway collection to create an Omics*-class data container? Defaults to 1, for the first assay in the object.

Details

This function is designed to extract and transpose a "tall" assay data frame (where genes or proteins are the rows and patient or tumour samples are the columns) from a SummarizedExperiment object. It also transposes the row (feature) names to column names and the column (sample) names to row names via the TransposeAssay function.

NOTE: if this function stops working (again), please add a comment here: https://github.com/gabrielodom/pathwayPCA/issues/83

Value

The transposition of the assay in summExperiment to tidy form, with the column data (from the colData slot of the object) appended as the first columns of the data frame.
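As a rough illustration of this reshaping, the base-R sketch below uses a plain matrix and data frame as hypothetical stand-ins for assay(summExperiment, 1) and colData(summExperiment). It does not call SummarizedExperiment and is not the SE2Tidy source; whether SE2Tidy also adds an explicit sample ID column is not shown here.

```r
# Hypothetical stand-ins for an assay (features x samples) and its colData
assay_mat <- matrix(
  1:6, nrow = 3,
  dimnames = list(c("geneA", "geneB", "geneC"), c("s1", "s2"))
)
col_df <- data.frame(treatment = c("ctrl", "trt"), row.names = c("s1", "s2"))

# Transpose to samples x features and put the sample-level columns first
tidy_df <- cbind(col_df[colnames(assay_mat), , drop = FALSE], t(assay_mat))
tidy_df
```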

Examples

# THIS REQUIRES THE SummarizedExperiment PACKAGE.
   library(SummarizedExperiment)
   data(airway, package = "airway")
   
   airway_df <- SE2Tidy(airway)

Access and Edit Assay or pathwayCollection Values in Omics* Objects

Description

"Get" or "Set" the values of the assayData_df, sampleIDs_char, or pathwayCollection slots of an object of class OmicsPathway or a class that extends this class (OmicsSurv, OmicsReg, or OmicsCateg).

Usage

getAssay(object, ...)

getAssay(object) <- value

getSampleIDs(object, ...)

getSampleIDs(object) <- value

getPathwayCollection(object, ...)

getPathwayCollection(object) <- value

getTrimPathwayCollection(object, ...)

## S4 method for signature 'OmicsPathway'
getAssay(object, ...)

## S4 replacement method for signature 'OmicsPathway'
getAssay(object) <- value

## S4 method for signature 'OmicsPathway'
getSampleIDs(object, ...)

## S4 replacement method for signature 'OmicsPathway'
getSampleIDs(object) <- value

## S4 method for signature 'OmicsPathway'
getPathwayCollection(object, ...)

## S4 replacement method for signature 'OmicsPathway'
getPathwayCollection(object) <- value

## S4 method for signature 'OmicsPathway'
getTrimPathwayCollection(object, ...)

Arguments

object

An object of or extending OmicsPathway-class: that class, OmicsSurv-class, OmicsReg-class, or OmicsCateg-class.

...

Dots for additional internal arguments (currently unused).

value

The replacement object to be assigned to the specified slot.

Details

These functions can be useful to set or extract the assay data or pathways list from an Omics*-class object. However, we recommend that users simply create a new, valid Omics* object instead of modifying an existing one. The validity of edited objects is checked with the ValidOmicsSurv, ValidOmicsCateg, or ValidOmicsReg functions.

Further, because the pathwayPCA methods require a cleaned (trimmed) pathway collection, the trimPathwayCollection slot is read-only. Users may only edit this slot by updating the pathway collection provided to the pathwayCollection slot. Even so, we strongly recommend that users create a new object with the updated pathway collection rather than overwrite the slots within an existing object. See IntersectOmicsPwyCollct for details on trimmed pathway collections.
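The general S4 "get"/"set" pattern described above, where a replacement method re-runs the class validity check, can be sketched with the methods package. The class ToyOmics and the generics toyAssay and toyAssay<- are hypothetical names chosen to avoid clashing with pathwayPCA; this is not the package's implementation.

```r
library(methods)

# Hypothetical toy class, NOT pathwayPCA's OmicsPathway class
setClass(
  "ToyOmics",
  slots = c(assayData_df = "data.frame", sampleIDs_char = "character"),
  validity = function(object) {
    if (nrow(object@assayData_df) != length(object@sampleIDs_char)) {
      return("assay rows and sample IDs must have the same length")
    }
    TRUE
  }
)

# "Get" generic and method
setGeneric("toyAssay", function(object, ...) standardGeneric("toyAssay"))
setMethod("toyAssay", "ToyOmics", function(object, ...) object@assayData_df)

# "Set" generic and replacement method, which re-checks validity
setGeneric("toyAssay<-", function(object, value) standardGeneric("toyAssay<-"))
setReplaceMethod("toyAssay", "ToyOmics", function(object, value) {
  object@assayData_df <- value
  validObject(object)  # errors if the edited object is no longer valid
  object
})

x <- new("ToyOmics", assayData_df = data.frame(g1 = 1:3),
         sampleIDs_char = c("a", "b", "c"))
toyAssay(x) <- data.frame(g1 = 4:6)  # valid: three rows for three samples
res <- tryCatch({
  toyAssay(x) <- data.frame(g1 = 1:2)  # invalid: two rows for three samples
  "accepted"
}, error = function(e) "rejected")
```

Because the invalid edit errors before the assignment completes, x keeps its last valid assay.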

Value

The "get" functions return the objects in the slots specified: getAssay returns the assayData_df data frame object, getSampleIDs returns the sampleIDs_char character vector, getPathwayCollection returns the pathwayCollection list object, and getTrimPathwayCollection returns the trimPathwayCollection. These functions can extract these values from any valid OmicsPathway, OmicsSurv, OmicsReg, or OmicsCateg object.

The "set" functions enable the user to edit or replace objects in the assayData_df, sampleIDs_char, or pathwayCollection slots for any OmicsPathway, OmicsSurv, OmicsReg, or OmicsCateg objects, provided that the new values do not violate the validity checks of their respective objects. Because the slot for trimPathwayCollection is filled upon object creation, and to ensure that this pathway collection is "clean", there is no "set" function for the trimmed pathway collection slot. Instead, users can update the pathway collection, and the trimmed pathway collection will be updated automatically. See "Details" for more information on the "set" functions.

See Also

CreateOmics

Examples

data("colonSurv_df")
  data("colon_pathwayCollection")

  colon_Omics <- CreateOmics(
    assayData_df = colonSurv_df[, -(2:3)],
    pathwayCollection_ls = colon_pathwayCollection
  )

  getAssay(colon_Omics)
  getPathwayCollection(colon_Omics)

Access and Edit Response of an OmicsReg or OmicsCateg Object

Description

"Get" or "Set" the values of the response_num or response_fact slots of an object of class OmicsReg or OmicsReg, respectively.

Usage

getResponse(object, ...)

getResponse(object) <- value

## S4 method for signature 'OmicsPathway'
getResponse(object, ...)

## S4 replacement method for signature 'OmicsPathway'
getResponse(object) <- value

Arguments

object

An object of class OmicsReg-class or OmicsCateg-class.

...

Dots for additional internal arguments (currently unused).

value

The replacement object to be assigned to the response slot.

Details

These functions can be useful to set or extract the response vector from an object of class OmicsReg or OmicsCateg. However, we recommend that users simply create a new, valid object instead of modifying an existing one. The validity of edited objects is checked with their respective ValidOmicsCateg or ValidOmicsReg function. Because both classes have a response slot, we set this method for the parent class, OmicsPathway-class.

Value

The "get" functions return the objects in the slots specified: getResponse returns the response_num vector from objects of class OmicsReg and the response_fact vector from objects of class OmicsCateg. These functions can extract these values from any valid object of those classes.

The "set" functions enable the user to edit or replace the object in the response_num slot for any OmicsReg object or response_fact slot for any OmicsCateg object, provided that the new values do not violate the validity check of such an object. See "Details" for more information.

See Also

CreateOmics

Examples

data("colonSurv_df")
  data("colon_pathwayCollection")

  colon_Omics <- CreateOmics(
    assayData_df = colonSurv_df[, -(2:3)],
    pathwayCollection_ls = colon_pathwayCollection,
    response = colonSurv_df[, c(1, 2)],
    respType = "reg"
  )

  getResponse(colon_Omics)

Access and Edit Event Time or Indicator in an OmicsSurv Object

Description

"Get" or "Set" the values of the eventTime_num or eventObserved_lgl slots of an object of class OmicsSurv.

Usage

getEventTime(object, ...)

getEventTime(object) <- value

getEvent(object, ...)

getEvent(object) <- value

## S4 method for signature 'OmicsSurv'
getEventTime(object, ...)

## S4 replacement method for signature 'OmicsSurv'
getEventTime(object) <- value

## S4 method for signature 'OmicsSurv'
getEvent(object, ...)

## S4 replacement method for signature 'OmicsSurv'
getEvent(object) <- value

Arguments

object

An object of class OmicsSurv-class.

...

Dots for additional internal arguments (currently unused).

value

The replacement object to be assigned to the specified slot.

Details

These functions can be useful to set or extract the event time or death indicator from an OmicsSurv object. However, we recommend that users simply create a new, valid OmicsSurv object instead of modifying an existing one. The validity of edited objects is checked with the ValidOmicsSurv function.

Value

The "get" functions return the objects in the slots specified: getEventTime returns the eventTime_num vector object and getEvent returns the eventObserved_lgl vector object. These functions can extract these values from any valid OmicsSurv object.

The "set" functions enable the user to edit or replace objects in the eventTime_num or eventObserved_lgl slots for any OmicsSurv object, provided that the new values do not violate the validity check of an OmicsSurv object. See "Details" for more information.

See Also

CreateOmics

Examples

data("colonSurv_df")
  data("colon_pathwayCollection")

  colon_Omics <- CreateOmics(
    assayData_df = colonSurv_df[, -(2:3)],
    pathwayCollection_ls = colon_pathwayCollection,
    response = colonSurv_df[, 1:3],
    respType = "survival"
  )

  getEventTime(colon_Omics)
  getEvent(colon_Omics)

Subset a pathwayCollection-class Object by Pathway.

Description

The subset method for pathways lists as returned by the read_gmt function.

Usage

## S3 method for class 'pathwayCollection'
x[[name_char]]

Arguments

x

An object of class pathwayCollection.

name_char

The name of a pathway in the collection or its unique ID.

Details

This function finds the index matching the name_char argument to the TERMS field of the pathwayCollection-class Object, then subsets the pathways list, TERMS vector, description vector, and setsize vector by this index. If you subset a trimmed pathwayCollection object, and the function errors with "Pathway not found.", then the pathway specified has been trimmed from the pathway collection.

Also, this function does not allow users to overwrite any portion of a pathway collection. These objects should rarely, if ever, be changed. If you absolutely must change the components of a pathwayCollection object, then create a new one with the CreatePathwayCollection function.

Value

A list of the pathway name (Term), unique ID (pathID), contents (IDs), description (description), and number of features (Size).
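The lookup described in "Details" can be sketched in base R: match the name against TERMS, fall back to the pathway ID, and stop with "Pathway not found." otherwise. The helper subset_pathway() and the toy collection are hypothetical; this mirrors the documented behaviour and returned fields, not the package source.

```r
# Hypothetical toy collection (not package data)
toy_pc <- list(
  pathways = list(p1 = c("A", "B"), p2 = c("C", "D", "E")),
  TERMS = c("PATH_ONE", "PATH_TWO"),
  description = c("first toy set", "second toy set"),
  setsize = c(p1 = 2L, p2 = 3L)
)

# Hypothetical helper: look up a set by its TERMS entry or its list name (ID)
subset_pathway <- function(pc, name_char) {
  idx <- match(name_char, pc$TERMS)
  if (is.na(idx)) idx <- match(name_char, names(pc$pathways))
  if (is.na(idx)) stop("Pathway not found.")
  list(
    Term = pc$TERMS[idx],
    pathID = names(pc$pathways)[idx],
    IDs = pc$pathways[[idx]],
    description = pc$description[idx],
    Size = unname(pc$setsize[idx])
  )
}

subset_pathway(toy_pc, "PATH_TWO")
```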

Examples

data("colon_pathwayCollection")
  colon_pathwayCollection[["KEGG_RETINOL_METABOLISM"]]

Subset Pathway-Specific Data

Description

Given an Omics object and the name of a pathway, return the -omes in the assay and the response as a (tibble) data frame.

Usage

SubsetPathwayData(object, pathName, ...)

## S4 method for signature 'OmicsPathway'
SubsetPathwayData(object, pathName, ...)

Arguments

object

An object of class OmicsPathway, or an object extending this class.

pathName

The name of a pathway contained in the pathway collection in the object.

...

Dots for additional internal arguments (currently unused).

Details

This function subsets the assay by the matching gene symbols or IDs in the specified pathway.

Value

A data frame of the columns of the assay in the Omics object which are listed in the specified pathway, with a leading column for sample IDs. If the Omics object has response information, these are also included as the first column(s) of the data frame, after the sample IDs. If you have the suggested tidyverse package suite loaded, then this data frame will print as a tibble. Otherwise, it will print as a data frame.
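The column selection behind this function can be sketched with a plain data frame: keep the sample ID and response columns, then the pathway genes that were actually measured. The column names and values are made up, and this is not the SubsetPathwayData source.

```r
# Hypothetical toy assay: sample IDs, a response column, and four genes
toy_df <- data.frame(
  sampleID = c("s1", "s2", "s3"),
  time = c(5.1, 3.2, 8.0),
  TP53 = c(0.2, 1.4, -0.3),
  MDM2 = c(1.1, 0.0, 0.7),
  EGFR = c(-0.5, 0.9, 2.2),
  KRAS = c(0.3, -1.2, 0.4)
)
pathway_genes <- c("TP53", "MDM2", "CDKN1A")  # CDKN1A was not measured

# Keep ID and response columns, then the pathway genes present in the assay
keep_genes <- intersect(pathway_genes, colnames(toy_df))
path_df <- toy_df[, c("sampleID", "time", keep_genes)]
path_df
```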

Examples

data("colonSurv_df")
  data("colon_pathwayCollection")

  colon_Omics <- CreateOmics(
    assayData_df = colonSurv_df[, -(2:3)],
    pathwayCollection_ls = colon_pathwayCollection,
    response = colonSurv_df[, 1:3],
    respType = "survival"
  )

  SubsetPathwayData(
    colon_Omics,
    "KEGG_RETINOL_METABOLISM"
  )

Test pathways with Supervised PCA

Description

Given a supervised OmicsPathway object (one of OmicsSurv, OmicsReg, or OmicsCateg), extract the first k principal components (PCs) from each pathway-subset of the -Omics assay design matrix, test their association with the response matrix, and return a data frame of the adjusted p-values for each pathway.

Usage

SuperPCA_pVals(
  object,
  n.threshold = 20,
  numPCs = 1,
  parallel = FALSE,
  numCores = NULL,
  adjustpValues = TRUE,
  adjustment = c("Bonferroni", "Holm", "Hochberg", "SidakSS", "SidakSD", "BH", "BY",
    "ABH", "TSBH"),
  ...
)

## S4 method for signature 'OmicsPathway'
SuperPCA_pVals(
  object,
  n.threshold = 20,
  numPCs = 1,
  parallel = FALSE,
  numCores = NULL,
  adjustpValues = TRUE,
  adjustment = c("Bonferroni", "Holm", "Hochberg", "SidakSS", "SidakSD", "BH", "BY",
    "ABH", "TSBH"),
  ...
)

Arguments

object

An object of a class extending OmicsPathway (OmicsSurv, OmicsReg, or OmicsCateg) with a response matrix or vector.

n.threshold

The number of bins into which to split the feature scores in the fit object returned internally by the superpc.train function to the pathway_tScores and pathway_tControl functions. Defaults to 20. Smaller values may result in less accurate pathway p-values while larger values increase computation time.

numPCs

The number of PCs to extract from each pathway. Defaults to 1.

parallel

Should the computation be completed in parallel? Defaults to FALSE.

numCores

If parallel = TRUE, how many cores should be used for computation? Internally defaults to the number of available cores minus 1.

adjustpValues

Should you adjust the p-values for multiple comparisons? Defaults to TRUE.

adjustment

Character vector of procedures. The returned data frame will be sorted in ascending order by the first procedure in this vector, with ties broken by the unadjusted p-value. If only one procedure is selected, then it is necessarily the first procedure. See the documentation for the ControlFDR function for the adjustment procedure definitions and citations.

...

Dots for additional internal arguments.

Details

This is a wrapper function for the pathway_tScores, pathway_tControl, OptimGumbelMixParams, GumbelMixpValues, and TabulatepValues functions.

Please see our Quickstart Guide for this package: https://gabrielodom.github.io/pathwayPCA/articles/Supplement1-Quickstart_Guide.html
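For intuition only, the sketch below walks through the core idea of supervised PCA on simulated data with a continuous response: score each feature against the response, keep features above a threshold, and take the first PC of the retained features. The simulated data, the |t| > 2 threshold, and the final lm() test are illustrative assumptions; this is not the pathwayPCA implementation, which chooses thresholds via n.threshold and computes p-values with the Gumbel mixture functions listed above.

```r
set.seed(1)
n <- 40
p <- 10
X <- matrix(rnorm(n * p), n, p, dimnames = list(NULL, paste0("g", 1:p)))
y <- X[, 1] + X[, 2] + rnorm(n, sd = 0.5)  # only g1 and g2 carry signal

# 1. Univariate screening: t-statistic of each feature's slope on y
t_scores <- apply(X, 2, function(x) {
  coef(summary(lm(y ~ x)))["x", "t value"]
})

# 2. Keep features whose |t| exceeds an (arbitrary) threshold
keep <- abs(t_scores) > 2

# 3. First PC of the retained, centred and scaled features
pc1 <- prcomp(X[, keep, drop = FALSE], center = TRUE, scale. = TRUE)$x[, 1]

# 4. Naive association test between the supervised PC and the response
naive_p <- coef(summary(lm(y ~ pc1)))["pc1", "Pr(>|t|)"]
```

Because the naive p-value in step 4 ignores the screening in steps 1 and 2, it is likely to be optimistic; a dedicated null calibration is needed for valid pathway p-values.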

Value

A data frame with columns:

  • pathways : The names of the pathways in the Omics* object (given in object@trimPathwayCollection$pathways).

  • setsize : The number of genes in each of the original pathways (given in the object@trimPathwayCollection$setsize object).

  • terms : The pathway description, as given in the object@trimPathwayCollection$TERMS object.

  • rawp : The unadjusted p-values of each pathway.

  • ... : Additional columns as specified through the adjustment argument.

The data frame will be sorted in ascending order by the method specified first in the adjustment argument. If adjustpValues = FALSE, then the data frame will be sorted by the raw p-values. If you have the suggested tidyverse package suite loaded, then this data frame will print as a tibble. Otherwise, it will print as a data frame.
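The sorting rule (first adjustment procedure ascending, ties broken by the raw p-value) can be sketched with made-up p-values. Here stats::p.adjust supplies the Hochberg step-up adjustment and the single-step Šidák adjustment is computed directly; these are stand-ins for the package's ControlFDR function, whose output is not reproduced here.

```r
# Made-up raw p-values for four hypothetical pathways
raw_p <- c(pathA = 0.001, pathB = 0.04, pathC = 0.03, pathD = 0.5)
m <- length(raw_p)

pVals_df <- data.frame(
  pathways = names(raw_p),
  rawp = unname(raw_p),
  Hochberg = unname(p.adjust(raw_p, method = "hochberg")),
  SidakSS = unname(1 - (1 - raw_p)^m),  # single-step Sidak adjustment
  stringsAsFactors = FALSE
)

# Sort by the first procedure, breaking ties with the raw p-value
pVals_df <- pVals_df[order(pVals_df$Hochberg, pVals_df$rawp), ]
pVals_df
```

Here pathB and pathC share the same Hochberg-adjusted value, so pathC (smaller raw p-value) is listed first.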

See Also

CreateOmics; TabulatepValues; pathway_tScores; pathway_tControl; OptimGumbelMixParams; GumbelMixpValues; clusterApply

Examples

###  Load the Example Data  ###
  data("colonSurv_df")
  data("colon_pathwayCollection")

  ###  Create an OmicsSurv Object  ###
  colon_OmicsSurv <- CreateOmics(
    assayData_df = colonSurv_df[, -(2:3)],
    pathwayCollection_ls = colon_pathwayCollection,
    response = colonSurv_df[, 1:3],
    respType = "surv"
  )

  ###  Calculate Pathway p-Values  ###
  colonSurv_superpc <- SuperPCA_pVals(
    object = colon_OmicsSurv,
    parallel = TRUE,
    numCores = 2,
    adjustpValues = TRUE,
    adjustment = c("Hoch", "SidakSD")
  )

Transpose an Assay (Data Frame)

Description

Transpose an object of class data.frame that contains assay measurements while preserving row (feature) and column (sample) names.

Usage

TransposeAssay(
  assay_df,
  omeNames = c("firstCol", "rowNames"),
  stringsAsFactors = FALSE
)

Arguments

assay_df

A data frame with numeric values to transpose

omeNames

Are the data feature names in the first column or in the row names of assay_df? Defaults to the first column. If the feature names are in the row names, this function assumes that these names are accessible by calling the rownames function on assay_df.

stringsAsFactors

Should columns containing string information be coerced to factors? Defaults to FALSE.

Details

This function is designed to transpose "tall" assay data frames (where genes or proteins are the rows and patient or tumour samples are the columns). It also transposes the row (feature) names to column names and the column (sample) names to row names. This function assumes that all rows and columns (other than the feature name column, as applicable) are numeric.

Recall that data frames require all elements of a single column to have the same class. Therefore, the sample IDs of a "tall" data frame must be stored as the column names rather than in the first row.
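The package example below covers omeNames = "rowNames"; for the default "firstCol" layout, the reshaping can be sketched in base R as follows. The data are made up, and this is not the TransposeAssay source.

```r
# A "tall" toy assay with feature names stored in the first column
tall_df <- data.frame(
  gene = c("geneA", "geneB", "geneC"),
  sample_1 = c(1.5, 2.0, 0.3),
  sample_2 = c(0.7, 1.1, 2.4)
)

# Move the feature names out of the data, transpose the numeric block,
# and reuse the old column names (sample IDs) as row names
num_mat <- as.matrix(tall_df[, -1])
rownames(num_mat) <- tall_df$gene
wide_df <- as.data.frame(t(num_mat))
wide_df
```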

Value

The transposition of assay_df, with row and column names preserved and swapped.

Examples

x_mat <- matrix(rnorm(5000), ncol = 20, nrow = 250)
   rownames(x_mat) <- paste0("gene_", 1:250)
   colnames(x_mat) <- paste0("sample_", 1:20)
   x_df <- as.data.frame(x_mat, row.names = rownames(x_mat))

   TransposeAssay(x_df, omeNames = "rowNames")

Filter and Subset a pathwayCollection-class Object by Symbol.

Description

The filter-subset method for pathways lists as returned by the read_gmt function. This function returns the subset of pathways that contain the requested symbols.

Usage

WhichPathways(x, symbols_char, ...)

Arguments

x

An object of class pathwayCollection.

symbols_char

A character vector or scalar of gene symbols or regions

...

Additional arguments passed to the Contains function

Details

This function finds the index of each set that contains the symbols supplied, then returns those sets as a new pathwayCollection object. Find pathways that contain geneA OR geneB by passing the argument matches = "any" through ... to Contains (this is the default value). Find pathways that contain geneA AND geneB by changing this argument to matches = "all". Find pathways that contain any member of a gene family by supplying a single partial symbol (passed to the short argument of Contains) and setting partial = TRUE.
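The "any", "all", and partial-matching rules can be sketched in base R. The helpers contains_symbols() and which_sets() are hypothetical stand-ins that follow the behaviour described above; they are not the package's Contains or WhichPathways functions, and partial matching is assumed here to be a fixed substring search.

```r
toy_pathways <- list(
  path1 = c("MAP4K5", "RELA", "TP53"),
  path2 = c("MAPK1", "EGFR"),
  path3 = c("RELA", "NFKB1")
)

# Hypothetical stand-in for the matching rule described above
contains_symbols <- function(long, short, matches = c("any", "all"),
                             partial = FALSE) {
  matches <- match.arg(matches)
  hits <- if (partial) {
    # each requested string may appear anywhere inside a member symbol
    vapply(short, function(s) any(grepl(s, long, fixed = TRUE)), logical(1))
  } else {
    short %in% long
  }
  if (matches == "any") any(hits) else all(hits)
}

# Return the names of the sets that satisfy the rule
which_sets <- function(sets, symbols, ...) {
  names(Filter(function(set) contains_symbols(set, symbols, ...), sets))
}

which_sets(toy_pathways, c("MAP4K5", "RELA"))                   # any
which_sets(toy_pathways, c("MAP4K5", "RELA"), matches = "all")  # all
which_sets(toy_pathways, "MAP", partial = TRUE)                 # family
```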

Value

An object of class pathwayCollection, but containing only the sets which contain the symbols supplied to symbols_char. If no sets are found to contain the symbols supplied, this function returns NULL and prints a warning.

Examples

data("colon_pathwayCollection")
  
  WhichPathways(colon_pathwayCollection, "MAP", partial = TRUE)
  
  WhichPathways(
    colon_pathwayCollection,
    c("MAP4K5", "RELA"),
    matches = "all"
  )

Wikipathways Homosapiens EntrezIDs

Description

A pathwayCollection object containing the Homo sapiens pathways list from Wikipathways (https://www.wikipathways.org/).

Usage

data(wikipwsHS_Entrez_pathwayCollection)

Format

A pathwayCollection list of three elements:

  • pathways : A named list of 443 character vectors. Each vector contains the Entrez Gene IDs of the individual genes within that pathway as a vector of character strings. The names are the shorthand pathway names.

  • TERMS : A character vector of length 443 containing the shorthand names of the gene pathways.

  • description : A character vector of length 443 containing the full names of the gene pathways.

Details

This pathwayCollection was sent to us from Dr. Alexander Pico at the Gladstone Institute (https://gladstone.org/our-science/people/alexander-pico).

Source

Dr. Alexander Pico, Wikipathways


Wikipathways Homosapiens Gene Symbols

Description

A pathwayCollection object containing the Homo sapiens pathways list from Wikipathways (https://www.wikipathways.org/).

Usage

data(wikipwsHS_Symbol_pathwayCollection)

Format

A pathwayCollection list of three elements:

  • pathways : A named list of 457 character vectors. Each vector contains the Gene Symbols of the individual genes within that pathway as a vector of character strings. The names are the shorthand pathway names.

  • TERMS : A character vector of length 457 containing the shorthand names of the gene pathways.

  • description : A character vector of length 457 containing the full names of the gene pathways.

Details

This pathwayCollection was sent to us from Dr. Alexander Pico at the Gladstone Institute (https://gladstone.org/our-science/people/alexander-pico).

This pathway collection was translated from EntrezIDs to HGNC Symbols with the script convert_EntrezID_to_HGNC_Ensembl.R in scripts.

Source

Dr. Alexander Pico, Wikipathways


Write a pathwayCollection Object to a .gmt File

Description

Write a pathwayCollection object as a pathways list file in Gene Matrix Transposed (.gmt) format.

Usage

write_gmt(pathwayCollection, file, setType = c("pathways", "genes", "regions"))

Arguments

pathwayCollection

A pathwayCollection list of sets. This list contains the following two or three elements:

  • 'setType' : A named list of character vectors. Each vector contains the names of the individual genes, sites, or CpGs within that set as a vector of character strings. If you are using genes, these genes can be represented by HGNC gene symbols, Entrez IDs, Ensembl IDs, GO terms, etc.

  • TERMS : A character vector the same length as the 'setType' list with the proper names of the sets.

  • description : An optional character vector the same length as the 'setType' list with a note on that set (such as a URL to the description if the set is a pathway). If this element of the pathwayCollection is NULL, then the file will be written with "" (the empty character string) as its second field in each line.

file

Either a character string naming a file or a connection open for writing. File names should end in .gmt for clarity.

setType

The type of set: "pathways" (sets of genes), "genes" (gene sites in RNA or DNA), or "regions" (regions of CpGs). Defaults to "pathways".

Details

See the Broad Institute's "Data Formats" page for a description of the Gene Matrix Transposed file format: https://software.broadinstitute.org/cancer/software/gsea/wiki/index.php/Data_formats#GMT:_Gene_Matrix_Transposed_file_format_.28.2A.gmt.29
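The line layout that write_gmt produces can be sketched in base R: each set becomes one tab-delimited line of name, description (or "" when description is NULL), and members. The toy collection reuses names from the example below, and this is not the write_gmt source.

```r
# Toy collection without a description element (description is NULL)
toy_pc <- list(
  pathways = list(c("C1orf27", "NR5A1"), c("TARS2", "DUSP5", "GPR88")),
  TERMS = c("C-or-f_paths", "randomPath2"),
  description = NULL
)

# Use "" as the second field when no description is supplied
desc <- if (is.null(toy_pc$description)) {
  rep("", length(toy_pc$TERMS))
} else {
  toy_pc$description
}

# One line per set: name, description, then the tab-separated members
gmt_lines <- paste(
  toy_pc$TERMS, desc,
  vapply(toy_pc$pathways, paste, character(1), collapse = "\t"),
  sep = "\t"
)
out_file <- tempfile(fileext = ".gmt")
writeLines(gmt_lines, out_file)
```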

Value

NULL. Output written to the file path specified.

See Also

print.pathwayCollection; read_gmt

Examples

# Toy pathway set
  toy_pathwayCollection <- list(
    pathways = list(
      c("C1orf27", "NR5A1", "BLOC1S4", "C4orf50"),
      c("TARS2", "DUSP5", "GPR88"),
      c("TRX-CAT3-1", "LINC01333", "LINC01499", "LINC01046", "LINC01149")
    ),
    TERMS = c("C-or-f_paths", "randomPath2", "randomLINCs"),
    description = c("these are", "totally made up", "pathways")
  )
  class(toy_pathwayCollection) <- c("pathwayCollection", "list")
  toy_pathwayCollection

  # write_gmt(toy_pathwayCollection, file = "example_pathway.gmt")