Molecular subtyping for ovarian cancer

Introduction

consensusOV is a package for molecular subtyping for ovarian cancer. It is intended for whole-transcriptome gene expression datasets from patients with high-grade serous ovarian carcinoma. This package includes implementations of four previously published subtype classifiers (Helland et al., 2011; Bentink et al., 2012; Verhaak et al., 2013; Konecny et al., 2014) and a consensus random forest classifier (Chen et al., 2018).

The get.subtypes() function is a wrapper for the other package subtyping functions get.consensus.subtypes(), get.konecny.subtypes(), get.verhaak.subtypes(), get.bentink.subtypes(), get.helland.subtypes(). It can take as input either a matrix of gene expression values and a vector of Entrez IDs; or an ExpressionSet from the Biobase package. If expression.dataset is a matrix, it should be formatted with genes as rows and patients as columns; and entrez.ids should be a vector with length the same as nrow(expression.dataset). The method argument specifies which of the five subtype classifiers to use.

Load Data

library(consensusOV)
## Warning: multiple methods tables found for 'setequal'
## Warning: replacing previous import 'BiocGenerics::setequal' by
## 'S4Vectors::setequal' when loading 'DelayedArray'
## Warning: replacing previous import 'BiocGenerics::setequal' by
## 'S4Vectors::setequal' when loading 'IRanges'
## Warning: replacing previous import 'BiocGenerics::setequal' by
## 'S4Vectors::setequal' when loading 'S4Arrays'
## Warning: replacing previous import 'BiocGenerics::setequal' by
## 'S4Vectors::setequal' when loading 'SparseArray'
## Warning: replacing previous import 'BiocGenerics::setequal' by
## 'S4Vectors::setequal' when loading 'XVector'
## Warning: replacing previous import 'BiocGenerics::setequal' by
## 'S4Vectors::setequal' when loading 'AnnotationDbi'
## Warning: replacing previous import 'BiocGenerics::setequal' by
## 'S4Vectors::setequal' when loading 'Biostrings'
## Warning: replacing previous import 'BiocGenerics::setequal' by
## 'S4Vectors::setequal' when loading 'GenomeInfoDb'
## Warning: multiple methods tables found for 'setequal'
## Warning: replacing previous import 'BiocGenerics::setequal' by
## 'S4Vectors::setequal' when loading 'HDF5Array'
## Warning: replacing previous import 'S4Arrays::read_block' by
## 'DelayedArray::read_block' when loading 'HDF5Array'
## Warning: replacing previous import 'BiocGenerics::setequal' by
## 'S4Vectors::setequal' when loading 'SummarizedExperiment'
## Warning: replacing previous import 'BiocGenerics::setequal' by
## 'S4Vectors::setequal' when loading 'GenomicRanges'
## Warning: replacing previous import 'S4Arrays::read_block' by
## 'DelayedArray::read_block' when loading 'SummarizedExperiment'
library(Biobase)
library(genefu)
## Warning: replacing previous import 'limma::logsumexp' by 'mclust::logsumexp'
## when loading 'genefu'

The package contains a subset of the ovarian cancer microarray dataset GSE14764 as example data.

data(GSE14764.eset)
dim(GSE14764.eset)
## Features  Samples 
##     1175       41
GSE14764.expression.matrix <- exprs(GSE14764.eset)
GSE14764.expression.matrix[1:5,1:5]
##              GSE14764_GSM368661 GSE14764_GSM368662 GSE14764_GSM368664
## geneid.10397          10.856712          10.445412          11.976560
## geneid.65108          10.856441          10.312760          12.499419
## geneid.8655           11.518799          11.897707          11.782895
## geneid.22919           8.608944           8.756986           9.170513
## geneid.3925            7.658680           6.698586           7.159795
##              GSE14764_GSM368665 GSE14764_GSM368668
## geneid.10397          11.651318          10.907453
## geneid.65108          11.377340          11.088542
## geneid.8655           11.799197          11.958500
## geneid.22919           8.627511           8.849757
## geneid.3925            7.466107           6.566558
GSE14764.entrez.ids <- fData(GSE14764.eset)$EntrezGene.ID
head(GSE14764.entrez.ids)
## [1] "10397" "65108" "8655"  "22919" "3925"  "1718"

Subtyping

bentink.subtypes <- get.subtypes(GSE14764.eset, method = "Bentink")
bentink.subtypes$Bentink.subtypes
##  [1] "Angiogenic"    "nonAngiogenic" "nonAngiogenic" "Angiogenic"   
##  [5] "Angiogenic"    "nonAngiogenic" "Angiogenic"    "nonAngiogenic"
##  [9] "nonAngiogenic" "Angiogenic"    "nonAngiogenic" "nonAngiogenic"
## [13] "Angiogenic"    "nonAngiogenic" "nonAngiogenic" "nonAngiogenic"
## [17] "nonAngiogenic" "Angiogenic"    "nonAngiogenic" "nonAngiogenic"
## [21] "Angiogenic"    "Angiogenic"    "Angiogenic"    "nonAngiogenic"
## [25] "nonAngiogenic" "Angiogenic"    "nonAngiogenic" "nonAngiogenic"
## [29] "nonAngiogenic" "nonAngiogenic" "nonAngiogenic" "Angiogenic"   
## [33] "nonAngiogenic" "nonAngiogenic" "nonAngiogenic" "nonAngiogenic"
## [37] "nonAngiogenic" "Angiogenic"    "nonAngiogenic" "nonAngiogenic"
## [41] "nonAngiogenic"
konecny.subtypes <- get.subtypes(GSE14764.eset, method = "Konecny")
konecny.subtypes$Konecny.subtypes
##  [1] C3_profL C1_immL  C2_diffL C4_mescL C1_immL  C1_immL  C4_mescL C3_profL
##  [9] C3_profL C1_immL  C2_diffL C2_diffL C4_mescL C2_diffL C3_profL C2_diffL
## [17] C1_immL  C4_mescL C1_immL  C2_diffL C4_mescL C4_mescL C4_mescL C1_immL 
## [25] C3_profL C3_profL C2_diffL C2_diffL C3_profL C2_diffL C3_profL C1_immL 
## [33] C1_immL  C2_diffL C1_immL  C2_diffL C3_profL C3_profL C2_diffL C3_profL
## [41] C2_diffL
## Levels: C1_immL C2_diffL C3_profL C4_mescL
helland.subtypes <- get.subtypes(GSE14764.eset, method = "Helland")
helland.subtypes$Helland.subtypes
##  [1] C1 C2 C5 C1 C1 C2 C4 C1 C5 C2 C4 C4 C1 C5 C4 C4 C2 C1 C4 C5 C1 C1 C1 C2 C5
## [26] C1 C5 C2 C5 C4 C5 C2 C2 C4 C2 C4 C1 C1 C4 C1 C4
## Levels: C2 C4 C5 C1
# to align with the Verhaak subtypes, we need to remove the "geneid." in the rownames
temp_eset <- GSE14764.eset
rownames(temp_eset) <- gsub("geneid.", "", rownames(temp_eset))

verhaak.subtypes <- get.subtypes(temp_eset, method = "Verhaak")
## ℹ GSVA version 2.1.1
## ℹ Using a MulticoreParam parallel back-end with 2 workers
## ℹ Calculating  ssGSEA scores for 4 gene sets
## ℹ Calculating ranks
## ℹ Calculating rank weights
## ✔ Calculations finished
verhaak.subtypes$Verhaak.subtypes
##  [1] MES IMR DIF DIF MES IMR DIF MES PRO IMR DIF DIF MES DIF DIF DIF DIF MES DIF
## [20] PRO MES DIF DIF DIF MES MES DIF DIF PRO DIF PRO IMR IMR DIF DIF DIF PRO MES
## [39] DIF MES DIF
## Levels: IMR DIF PRO MES
consensus.subtypes <- get.subtypes(GSE14764.eset, method = "consensusOV")
## Loading training data
## Training Random Forest...
consensus.subtypes$consensusOV.subtypes
##  [1] MES_consensus IMR_consensus PRO_consensus IMR_consensus IMR_consensus
##  [6] IMR_consensus MES_consensus MES_consensus PRO_consensus IMR_consensus
## [11] DIF_consensus DIF_consensus MES_consensus PRO_consensus DIF_consensus
## [16] DIF_consensus IMR_consensus MES_consensus IMR_consensus PRO_consensus
## [21] MES_consensus MES_consensus MES_consensus DIF_consensus PRO_consensus
## [26] MES_consensus DIF_consensus IMR_consensus PRO_consensus DIF_consensus
## [31] PRO_consensus IMR_consensus IMR_consensus DIF_consensus DIF_consensus
## [36] DIF_consensus PRO_consensus PRO_consensus DIF_consensus MES_consensus
## [41] DIF_consensus
## Levels: IMR_consensus DIF_consensus PRO_consensus MES_consensus
## Alternatively, e.g.
data(sigOvcAngiogenic)
bentink.subtypes <- get.subtypes(GSE14764.expression.matrix, GSE14764.entrez.ids, method = "Bentink")

Each subtyping function outputs a list with two values. The first value is a factor of subtype labels. The second is an classifier-specific values. For the Konecny, Helland, Verhaak, and Consensus classifiers, this object is a dataframe with subtype specific scores. For the Bentink classifier, the object is the output of the genefu function call.

Subtype classifiers can alternatively be called using inner functions.

bentink.subtypes <- get.bentink.subtypes(GSE14764.expression.matrix, GSE14764.entrez.ids)

Subtype Scores

The Konecny, Helland, Verhaak, and Consensus classifiers produce real-valued subtype scores. We can use these in various ways - for example, here, we compute correlations between correspinding subtype scores.

We can compare the subtype scores between the Verhaak and Helland classifiers:

vest <- verhaak.subtypes$gsva.out
vest <- vest[,c("IMR", "DIF", "PRO", "MES")]
hest <- helland.subtypes$subtype.scores
hest <- hest[, c("C2", "C4", "C5", "C1")]
dat <- data.frame(
    as.vector(vest), 
    rep(colnames(vest), each=nrow(vest)),
    as.vector(hest), 
    rep(colnames(hest), each=nrow(hest)))
colnames(dat) <- c("Verhaak", "vsc", "Helland", "hsc")
## plot
library(ggplot2)
## 
## Attaching package: 'ggplot2'
## The following object is masked from 'package:consensusOV':
## 
##     margin
ggplot(dat, aes(Verhaak, Helland)) + geom_point() + facet_wrap(vsc~hsc, nrow = 2, ncol = 2)

Corresponding correlation values are 0.81, 0.85, 0.45, and 0.89.

Likewise, we can compare the subtype scores between the Konecny and Helland classifier:

kost <- konecny.subtypes$spearman.cc.vals
hest <- helland.subtypes$subtype.scores
hest <- hest[, c("C2", "C4", "C5", "C1")]
dat <- data.frame(
    as.vector(kost), 
    rep(colnames(kost), each=nrow(kost)),
    as.vector(hest), 
    rep(colnames(hest), each=nrow(hest)))
colnames(dat) <- c("Konecny", "ksc", "Helland", "hsc")
## plot
ggplot(dat, aes(Konecny, Helland)) + geom_point() + facet_wrap(ksc~hsc, nrow = 2, ncol = 2)

Corresponding correlation values are 0.95, 0.84, 0.7, and 0.95.