Title: | Subtype Identification with Survival Data |
---|---|
Description: | Subtypes are defined as groups of samples that have distinct molecular and clinical features. Genomic data can be analyzed together with clinical data, especially survival information, to discover patient subtypes. This package aims to identify subtypes that are both clinically relevant and biologically meaningful. |
Authors: | Dongmin Jung |
Maintainer: | Dongmin Jung <[email protected]> |
License: | Artistic-2.0 |
Version: | 1.23.0 |
Built: | 2024-12-30 04:51:46 UTC |
Source: | https://github.com/bioc/survtype |
To discover subtypes of samples that are both clinically relevant and biologically meaningful, the Cox regression model and hierarchical clustering are combined.
Exprs.survtype(surv.data, time, status, exprs.data, K = 2, num.genes = 100, gene.sel = FALSE, gene.sel.opt = list(verbose = FALSE), ...)
surv.data |
survival data |
time |
survival time |
status |
status indicator |
exprs.data |
expression data |
K |
the number of clusters (default: 2) |
num.genes |
the number of top genes based on the Cox score, before variable selection (default: 100) |
gene.sel |
a logical value indicating whether or not gene selection for clustering is applied (default: FALSE) |
gene.sel.opt |
a list of options for the gene selection function "clustvarsel". "verbose" is set to FALSE as default. |
... |
additional parameters for the "pheatmap" |
n |
the number of subjects in each group |
obs |
the weighted observed number of events in each group |
exp |
the weighted expected number of events in each group |
chisq |
the chi-squared statistic for a test of equality |
call |
the matched call |
fit |
fitted survival curves |
cluster |
a vector of integers indicating the cluster to which each sample is assigned |
time |
survival time |
status |
status indicator |
surv.data |
survival data |
exprs.data |
expression data |
Dongmin Jung
Bair, E., & Tibshirani, R. (2004). Semi-supervised methods to predict patient survival from gene expression data. PLoS biology, 2(4), e108.
survival::Surv, survival::survfit, survival::survdiff, survival::coxph, clustvarsel::clustvarsel, pheatmap::pheatmap
set.seed(1)
nrows <- 5
ncols <- nrow(ovarian)
counts <- matrix(runif(nrows * ncols, 1, 1e4), nrows)
colnames(counts) <- paste("X", 1:ncols, sep = "")
rownames(ovarian) <- paste("X", 1:ncols, sep = "")
SE <- SummarizedExperiment(assays = SimpleList(counts = counts))
ovarian.survtype <- Exprs.survtype(ovarian, time = "futime", status = "fustat",
                                   assay(SE), num.genes = 2, scale = "row",
                                   clustering_method = "ward.D2")
plot(ovarian.survtype, pval = TRUE)
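The combination of Cox regression and hierarchical clustering used by Exprs.survtype can be sketched with the survival package and base R alone. This is a minimal illustration on simulated data, not the package's implementation: the absolute Wald statistic from a univariate Cox fit stands in for the Cox score, and the names `expr`, `cox_score`, and `top_genes` are assumptions made for the example.

```r
library(survival)

set.seed(1)
n_samples <- 30
n_genes <- 20

# Simulated expression matrix (genes in rows, samples in columns)
expr <- matrix(rnorm(n_genes * n_samples), nrow = n_genes,
               dimnames = list(paste0("gene", 1:n_genes),
                               paste0("X", 1:n_samples)))
time <- rexp(n_samples, rate = 0.1)       # simulated survival times
status <- rbinom(n_samples, 1, 0.7)       # 1 = event, 0 = censored

# Step 1: score each gene with a univariate Cox model
# (assumption: absolute Wald z statistic used as the score)
cox_score <- vapply(rownames(expr), function(g) {
  fit <- coxph(Surv(time, status) ~ expr[g, ])
  unname(abs(coef(fit) / sqrt(diag(vcov(fit)))))
}, numeric(1))

# Step 2: keep the top-scoring genes (analogous to num.genes)
top_genes <- names(sort(cox_score, decreasing = TRUE))[1:5]

# Step 3: cluster the samples on those genes and cut into K groups
hc <- hclust(dist(t(expr[top_genes, ])), method = "ward.D2")
cluster <- cutree(hc, k = 2)

# Step 4: compare survival between the resulting subtypes (log-rank test)
survdiff(Surv(time, status) ~ cluster)
```

Selecting genes by their association with survival before clustering is what makes the resulting subtypes clinically relevant rather than driven by arbitrary expression variation.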
Heatmaps of clustered genes for subtypes of samples can be drawn.
gene.clust(object, K, ...)
object |
the result of "Exprs.survtype" |
K |
the number of clusters |
... |
additional parameters for the "pheatmap" |
Heatmap for each cluster
Dongmin Jung
pheatmap::pheatmap
set.seed(1)
nrows <- 5
ncols <- nrow(ovarian)
counts <- matrix(runif(nrows * ncols, 1, 1e4), nrows)
colnames(counts) <- paste("X", 1:ncols, sep = "")
rownames(ovarian) <- paste("X", 1:ncols, sep = "")
SE <- SummarizedExperiment(assays = SimpleList(counts = counts))
ovarian.survtype <- Exprs.survtype(ovarian, time = "futime", status = "fustat",
                                   assay(SE), num.genes = 5, scale = "row",
                                   clustering_method = "ward.D2")
plot(ovarian.survtype, pval = TRUE)
gene.clust(ovarian.survtype, 2, scale = "row", clustering_method = "ward.D2")
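As a generic illustration of grouping genes into K clusters before drawing heatmaps, the following base R sketch row-scales a simulated matrix, clusters the genes hierarchically, and splits them into groups. How gene.clust does this internally is not documented here, so the steps and the names `scaled`, `gene_cluster`, and `gene_groups` are assumptions.

```r
set.seed(1)

# Simulated expression matrix (genes in rows, samples in columns)
expr <- matrix(rnorm(8 * 12), nrow = 8,
               dimnames = list(paste0("gene", 1:8), paste0("X", 1:12)))

# Scale each gene (row) to mean 0 and unit variance, as scale = "row" would
scaled <- t(scale(t(expr)))

# Cluster genes hierarchically and cut the tree into K = 2 groups
gene_hc <- hclust(dist(scaled), method = "ward.D2")
gene_cluster <- cutree(gene_hc, k = 2)

# Gene names per cluster; each subset could then be drawn as its own heatmap,
# for example with pheatmap::pheatmap(scaled[gene_groups[[1]], ])
gene_groups <- split(rownames(expr), gene_cluster)
gene_groups
```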
Groups of patients are identified according to whether variants exist on a single gene. The survival difference between patients with and without mutations is then compared.
MAF.survgroup(surv.data, time, status, maf, variants = NULL, sample.name = "Tumor_Sample_Barcode", gene.name = "Hugo_Symbol", variant.type = "Variant_Classification", num.genes = 10, siginificant.genes = 1, ...)
surv.data |
survival data |
time |
survival time |
status |
status indicator |
maf |
a MAF file |
variants |
types of variants on a single gene that define mutated samples. By default, samples with any mutation are treated as mutated samples. |
sample.name |
the column name containing sample names (default: "Tumor_Sample_Barcode") |
gene.name |
the column name containing gene names (default: "Hugo_Symbol") |
variant.type |
the column name containing variant types (default: "Variant_Classification") |
num.genes |
the number of top genes based on the number of mutated genes (default: 10) |
siginificant.genes |
the number of top genes based on the statistical significance of mutated genes (default: 1) |
... |
additional parameters for the "ggsurvplot" for the statistically significant genes |
time |
survival time |
status |
status indicator |
surv.data |
survival data |
maf.matrix |
a mutation matrix |
summary |
a list of number of samples with variants, chi-squared statistics and p-values |
cluster |
a vector of integers indicating the cluster to which each sample is assigned, for the most significant gene |
fit |
fitted survival curves, for the most significant gene |
Dongmin Jung
survival::Surv, survival::survfit, survival::survdiff, survminer::ggsurvplot
library(maftools)
laml.maf <- system.file('extdata', 'tcga_laml.maf.gz', package = 'maftools', mustWork = TRUE)
laml.clin <- system.file('extdata', 'tcga_laml_annot.tsv', package = 'maftools', mustWork = TRUE)
laml.maf <- read.csv(laml.maf, sep = "\t")
laml.clinical.data <- read.csv(laml.clin, sep = "\t", row.names = 1)
index <- which(laml.clinical.data$days_to_last_followup == -Inf)
laml.clinical.data <- laml.clinical.data[-index,]
laml.clinical.data <- data.frame(laml.clinical.data)
laml.survgroup <- MAF.survgroup(laml.clinical.data, time = "days_to_last_followup",
                                status = "Overall_Survival_Status", laml.maf,
                                num.genes = 3, siginificant.genes = 1, pval = TRUE)
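The comparison that MAF.survgroup performs for a single gene can be pictured as a log-rank test between mutated and non-mutated patients. The sketch below uses only the survival package and simulated data; the mutation indicator and the group labels "Mutated" and "Wild-type" are assumptions made for the example, not output of the package.

```r
library(survival)

set.seed(1)
n <- 40

# Simulated survival data and a mutation indicator for one gene
surv <- data.frame(time = rexp(n, rate = 0.05),
                   status = rbinom(n, 1, 0.7))
surv$group <- factor(ifelse(rbinom(n, 1, 0.4) == 1, "Mutated", "Wild-type"))

# Log-rank test for a survival difference between the two groups
test <- survdiff(Surv(time, status) ~ group, data = surv)
p_value <- pchisq(test$chisq, df = 1, lower.tail = FALSE)

# Kaplan-Meier curves per group, which could be passed to a plotting function
fit <- survfit(Surv(time, status) ~ group, data = surv)

c(chisq = test$chisq, p = p_value)
```

Repeating this test for each of the most frequently mutated genes and ranking by p-value corresponds to the roles of num.genes and siginificant.genes described in the arguments.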
Create a mutation matrix from a MAF file using variant types.
maf2matrix(maf, surv.data = NULL, sample.name = "Tumor_Sample_Barcode", gene.name = "Hugo_Symbol", variant.type = "Variant_Classification")
maf |
a MAF file |
surv.data |
survival data for sample names (default: NULL) |
sample.name |
the column name containing sample names (default: "Tumor_Sample_Barcode") |
gene.name |
the column name containing gene names (default: "Hugo_Symbol") |
variant.type |
the column name containing variant types (default: "Variant_Classification") |
a mutation matrix
Dongmin Jung
laml.maf <- system.file("extdata", "tcga_laml.maf.gz", package = "maftools")
laml.maf <- read.csv(laml.maf, sep = "\t")
laml.mat <- maf2matrix(laml.maf)
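To make the idea of a mutation matrix concrete, the following base R sketch builds a gene-by-sample matrix of variant types from a small MAF-like data frame. The exact layout returned by maf2matrix is not specified above, so the orientation (genes in rows, samples in columns), the empty string for "no variant", and the ";" separator for multiple variants are all assumptions.

```r
# Toy MAF-like table using the default column names
maf <- data.frame(
  Tumor_Sample_Barcode = c("S1", "S1", "S2", "S3", "S3"),
  Hugo_Symbol = c("TP53", "FLT3", "TP53", "DNMT3A", "DNMT3A"),
  Variant_Classification = c("Missense_Mutation", "In_Frame_Ins",
                             "Nonsense_Mutation", "Missense_Mutation",
                             "Frame_Shift_Del"),
  stringsAsFactors = FALSE
)

genes <- sort(unique(maf$Hugo_Symbol))
samples <- sort(unique(maf$Tumor_Sample_Barcode))

# Start with an empty matrix: "" means no variant recorded
mat <- matrix("", nrow = length(genes), ncol = length(samples),
              dimnames = list(genes, samples))

# Fill in variant types, joining multiple variants per cell with ";"
for (i in seq_len(nrow(maf))) {
  g <- maf$Hugo_Symbol[i]
  s <- maf$Tumor_Sample_Barcode[i]
  v <- maf$Variant_Classification[i]
  mat[g, s] <- if (mat[g, s] == "") v else paste(mat[g, s], v, sep = ";")
}

mat
```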
Survival curves for subtypes of samples can be drawn.
## S3 method for class 'survtype'
plot(object, ...)
object |
object of class "survtype" |
... |
additional parameters for the "ggsurvplot" |
Survival curves
Dongmin Jung
survminer::ggsurvplot
data(ovarian)
ovarian.survtype <- Surv.survtype(ovarian, time = "futime", status = "fustat")
plot(ovarian.survtype, pval = TRUE)
Normalize expression data using quantile normalization.
quantile_normalization(x)
x |
an expression profile |
a normalized matrix
Dongmin Jung
set.seed(1)
nrows <- 10
ncols <- 5
counts <- matrix(runif(nrows * ncols, 1, 1e4), nrows)
normalized.matrix <- quantile_normalization(counts)
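For readers unfamiliar with the technique, the standard quantile normalization procedure can be sketched in a few lines of base R: sort each column, average the sorted values across columns to get a reference distribution, and map each value back by its rank. The function name `quantile_normalize_sketch` is hypothetical, and this is not necessarily how quantile_normalization is implemented in the package (for example, tie handling may differ).

```r
quantile_normalize_sketch <- function(x) {
  x <- as.matrix(x)
  # Rank of each value within its column (ties broken by order of appearance)
  ranks <- apply(x, 2, rank, ties.method = "first")
  # Reference distribution: mean of the k-th smallest values across columns
  reference <- rowMeans(apply(x, 2, sort))
  # Replace each value with the reference value of the same rank
  out <- apply(ranks, 2, function(r) reference[r])
  dimnames(out) <- dimnames(x)
  out
}

set.seed(1)
counts <- matrix(runif(10 * 5, 1, 1e4), nrow = 10)
normalized <- quantile_normalize_sketch(counts)

# After normalization every column has the same set of values
apply(normalized, 2, sort)[, 1:2]
```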
All midpoints between values of the expression level (or other real-valued statistic) are evaluated as candidate thresholds, and the one giving the most significant survival difference between the two groups is selected as the best cutoff. Patients whose value is greater than the best cutoff are assigned to the "high-score" class; all others belong to the "low-score" class.
Single.survgroup(surv.data, time, status, single.gene, intermediate = FALSE, group.names = c("High", "Intermediate", "Low"))
surv.data |
survival data |
time |
survival time |
status |
status indicator |
single.gene |
expression level of a single gene |
intermediate |
a logical value indicating whether or not the intermediate class is considered (default: FALSE) |
group.names |
the names of clusters for "high-score", "intermediate-score", and "low-score" classes (default: "High", "Intermediate", "Low") |
time |
survival time |
status |
status indicator |
surv.data |
survival data |
score |
a vector of scores |
summary |
a list of thresholds, chi-squared statistics and p-values |
cluster |
a vector of clusters to which samples are assigned |
fit |
fitted survival curves |
Dongmin Jung
survival::Surv, survival::survfit, survival::survdiff
set.seed(1)
nrows <- 1
ncols <- nrow(ovarian)
counts <- matrix(runif(nrows * ncols, 1, 1e4), nrows)
colnames(counts) <- paste("X", 1:ncols, sep = "")
rownames(ovarian) <- paste("X", 1:ncols, sep = "")
Single.ovarian <- Single.survgroup(ovarian, time = "futime", status = "fustat", counts[1,])
plot(Single.ovarian, pval = TRUE)
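The midpoint search described for Single.survgroup can be sketched as follows, using the survival package and simulated data. This is an illustration of the two-group case only (no intermediate class); the variable names and the use of the log-rank chi-squared statistic as the selection criterion are assumptions for the example.

```r
library(survival)

set.seed(1)
n <- 40
time <- rexp(n, rate = 0.05)        # simulated survival times
status <- rbinom(n, 1, 0.7)         # 1 = event, 0 = censored
score <- rnorm(n)                   # expression level of a single gene

# Candidate thresholds: midpoints between consecutive sorted values
v <- sort(unique(score))
midpoints <- (head(v, -1) + tail(v, -1)) / 2

# Log-rank chi-squared statistic for each candidate split
chisq <- vapply(midpoints, function(cutoff) {
  high <- score > cutoff
  survdiff(Surv(time, status) ~ high)$chisq
}, numeric(1))

# Best cutoff: the split with the largest statistic (smallest p-value)
best <- which.max(chisq)
best_cutoff <- midpoints[best]
cluster <- ifelse(score > best_cutoff, "High", "Low")
p_value <- pchisq(chisq[best], df = 1, lower.tail = FALSE)

c(cutoff = best_cutoff, chisq = chisq[best], p = p_value)
```

Because the cutoff is chosen to maximize the test statistic, the resulting p-value is optimistic and is best read as a ranking criterion rather than a formal significance level.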
Any patient who lived longer than the median was considered to be a "low-risk" patient, and any patient who lived less than the median was considered to be a "high-risk" patient. In this manner, a class label is assigned to each observation. For censored data, the probabilities that each censored observation belongs to the "low-risk" and "high-risk" classes are estimated.
Surv.survtype(surv.data, time, status)
surv.data |
survival data |
time |
survival time |
status |
status indicator |
n |
the number of subjects in each group |
obs |
the weighted observed number of events in each group |
exp |
the weighted expected number of events in each group |
chisq |
the chi-squared statistic for a test of equality |
call |
the matched call |
cluster |
a vector of clusters to which samples are assigned |
time |
survival time |
status |
status indicator |
surv.data |
survival data |
fit |
fitted survival curves |
Dongmin Jung
Bair, E., & Tibshirani, R. (2004). Semi-supervised methods to predict patient survival from gene expression data. PLoS biology, 2(4), e108.
survival::Surv, survival::survfit, survival::survdiff
data(ovarian)
ovarian.survtype <- Surv.survtype(ovarian, time = "futime", status = "fustat")
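The median-based labeling described for Surv.survtype can be sketched with a Kaplan-Meier estimate. The example below is a simplified illustration on simulated data, not the package's code. It assumes the cutoff is the median of the recorded follow-up times, and that for a patient censored at time t before the cutoff the probability of being "low-risk" is estimated as S(cutoff) / S(t), where S is the Kaplan-Meier survival function.

```r
library(survival)

set.seed(1)
n <- 30
time <- rexp(n, rate = 0.01)        # simulated follow-up times
status <- rbinom(n, 1, 0.6)         # 1 = death observed, 0 = censored

cutoff <- median(time)              # assumed definition of "the median"

# Kaplan-Meier estimate as a right-continuous step function S(t)
km <- survfit(Surv(time, status) ~ 1)
S <- stepfun(km$time, c(1, km$surv))

# Probability of belonging to the "low-risk" class:
#   followed up to the cutoff or beyond   -> 1
#   died before the cutoff                -> 0
#   censored before the cutoff            -> S(cutoff) / S(t)
p_low <- ifelse(time >= cutoff, 1,
                ifelse(status == 1, 0, S(cutoff) / S(time)))

label <- ifelse(p_low >= 0.5, "low-risk", "high-risk")
table(label)
```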