Package 'survtype' reference manual

Title:	Subtype Identification with Survival Data
Description:	Subtypes are defined as groups of samples that have distinct molecular and clinical features. Genomic data can be analyzed to discover patient subtypes associated with clinical data, especially survival information. This package aims to identify subtypes that are both clinically relevant and biologically meaningful.
Authors:	Dongmin Jung
Maintainer:	Dongmin Jung <[email protected]>
License:	Artistic-2.0
Version:	1.23.0
Built:	2025-03-30 06:13:26 UTC
Source:	https://github.com/bioc/survtype

Sample subtype identification via survival information and gene expression data

Description

To discover subtypes of samples that are both clinically relevant and biologically meaningful, the Cox regression model and hierarchical clustering are combined.
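The combination described above can be sketched with the survival package and base R alone. This is an illustration of the general idea, not the internals of Exprs.survtype: the simulated expression matrix, the use of the absolute Wald z-statistic as a stand-in for the Cox score, and the choice of five top genes are all assumptions.

```r
library(survival)

set.seed(1)
surv.data <- survival::ovarian            # 26 patients: futime, fustat
n.genes <- 20                             # hypothetical number of genes
exprs <- matrix(rnorm(n.genes * nrow(surv.data)), nrow = n.genes,
                dimnames = list(paste0("gene", seq_len(n.genes)),
                                paste0("X", seq_len(nrow(surv.data)))))

# Step 1: univariate Cox model per gene; score = |Wald z| (an assumption)
cox.score <- apply(exprs, 1, function(g) {
  fit <- coxph(Surv(futime, fustat) ~ g, data = surv.data)
  abs(summary(fit)$coefficients[, "z"])
})

# Step 2: keep the genes most associated with survival
top <- names(sort(cox.score, decreasing = TRUE))[1:5]

# Step 3: hierarchical clustering of samples on the selected genes
hc <- hclust(dist(t(exprs[top, ])), method = "ward.D2")
cluster <- cutree(hc, k = 2)

# Step 4: log-rank test for a survival difference between the clusters
survdiff(Surv(futime, fustat) ~ cluster, data = surv.data)
```

Selecting genes by their association with survival before clustering is what ties the resulting clusters to the clinical outcome; clustering on all genes would ignore the survival information.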

Usage

Exprs.survtype(surv.data, time, status, exprs.data, K = 2, num.genes = 100,
               gene.sel = FALSE, gene.sel.opt = list(verbose = FALSE), ...)

Arguments

`surv.data`	survival data
`time`	survival time
`status`	status indicator
`exprs.data`	expression data
`K`	the number of clusters (default: 2)
`num.genes`	the number of top genes based on the Cox score, before variable selection (default: 100)
`gene.sel`	a logical value indicating whether or not gene selection for clustering is applied (default: FALSE)
`gene.sel.opt`	a list of options for the gene selection function "clustvarsel". "verbose" is set to FALSE as default.
`...`	additional parameters for the "pheatmap"

Value

`n`	the number of subjects in each group
`obs`	the weighted observed number of events in each group
`exp`	the weighted expected number of events in each group
`chisq`	the chi-squared statistic for a test of equality
`call`	the matched call
`fit`	fitted survival curves
`cluster`	a vector of integers indicating the cluster to which each sample is assigned
`time`	survival time
`status`	status indicator
`surv.data`	survival data
`exprs.data`	expression data

Author(s)

Dongmin Jung

References

Bair, E., & Tibshirani, R. (2004). Semi-supervised methods to predict patient survival from gene expression data. PLoS biology, 2(4), e108.

Examples

set.seed(1)
nrows <- 5
ncols <- nrow(ovarian)
counts <- matrix(runif(nrows * ncols, 1, 1e4), nrows)
colnames(counts) <- paste("X", 1:ncols, sep = "")
rownames(ovarian) <- paste("X", 1:ncols, sep = "")
SE <- SummarizedExperiment(assays = SimpleList(counts = counts))
ovarian.survtype <- Exprs.survtype(ovarian, time = "futime", status = "fustat",
                                 assay(SE), num.genes = 2, scale = "row",
                                 clustering_method = "ward.D2")
plot(ovarian.survtype, pval = TRUE)

Plots of the heatmap for each cluster of expression data

Description

Heatmaps of clustered genes for subtypes of samples can be drawn.

Usage

gene.clust(object, K, ...)

Arguments

`object`	the result of "Exprs.survtype"
`K`	the number of clusters
`...`	additional parameters for the "pheatmap"

Value

Heatmap for each cluster

Author(s)

Dongmin Jung

Examples

set.seed(1)
nrows <- 5
ncols <- nrow(ovarian)
counts <- matrix(runif(nrows * ncols, 1, 1e4), nrows)
colnames(counts) <- paste("X", 1:ncols, sep = "")
rownames(ovarian) <- paste("X", 1:ncols, sep = "")
SE <- SummarizedExperiment(assays = SimpleList(counts = counts))
ovarian.survtype <- Exprs.survtype(ovarian, time = "futime", status = "fustat",
                                 assay(SE), num.genes = 5, scale = "row",
                                 clustering_method = "ward.D2")
plot(ovarian.survtype, pval = TRUE)
gene.clust(ovarian.survtype, 2, scale = "row", clustering_method = "ward.D2")

Patient group identification via survival data and mutation annotation information

Description

Patients are grouped according to whether variants exist on a single gene. Survival is then compared between patients with and without mutations.
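The comparison for one gene can be sketched as a log-rank test on a mutated versus non-mutated indicator. The mutation indicator below is made up for illustration, and the use of survdiff is an assumption about how such a comparison might be done rather than a statement about the internals of MAF.survgroup.

```r
library(survival)

surv.data <- survival::ovarian

# Hypothetical mutation status for a single gene (every third patient mutated)
mutated <- rep(c(TRUE, FALSE, FALSE), length.out = nrow(surv.data))
group <- factor(ifelse(mutated, "Mutated", "Wild-type"))

# Log-rank test comparing survival between the two groups
test <- survdiff(Surv(futime, fustat) ~ group, data = surv.data)
p.value <- pchisq(test$chisq, df = 1, lower.tail = FALSE)

test$chisq
p.value
```

Repeating this test for each candidate gene and ranking by p-value gives an ordering of genes by the strength of their association with survival.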

Usage

MAF.survgroup(surv.data, time, status, maf, variants = NULL,
              sample.name = "Tumor_Sample_Barcode", gene.name = "Hugo_Symbol",
              variant.type = "Variant_Classification", num.genes = 10,
              siginificant.genes = 1, ...)

Arguments

`surv.data`	survival data
`time`	survival time
`status`	status indicator
`maf`	a MAF file
`variants`	types of variants on a single gene that define mutated samples. By default, samples with any mutation are defined as mutated samples.
`sample.name`	the column name containing sample names (default: "Tumor_Sample_Barcode")
`gene.name`	the column name containing gene names (default: "Hugo_Symbol")
`variant.type`	the column name containing variant types (default: "Variant_Classification")
`num.genes`	the number of top genes based on the number of mutated genes (default: 10)
`siginificant.genes`	the number of top genes based on the statistical significance of mutated genes (default: 1)
`...`	additional parameters for the "ggsurvplot" for the statistically significant genes

Value

`time`	survival time
`status`	status indicator
`surv.data`	survival data
`maf.matrix`	a mutation matrix
`summary`	a list of number of samples with variants, chi-squared statistics and p-values
`cluster`	a vector of integers indicating the cluster to which each sample is assigned, for the most significant gene
`fit`	fitted survival curves, for the most significant gene

Author(s)

Dongmin Jung

Examples

library(maftools)
laml.maf <- system.file('extdata', 'tcga_laml.maf.gz', package = 'maftools', mustWork = TRUE)
laml.clin <- system.file('extdata', 'tcga_laml_annot.tsv', package = 'maftools', mustWork = TRUE)
laml.maf <- read.csv(laml.maf, sep = "\t")
laml.clinical.data <- read.csv(laml.clin, sep = "\t", row.names = 1)
index <- which(laml.clinical.data$days_to_last_followup == -Inf)
laml.clinical.data <- laml.clinical.data[-index,]
laml.clinical.data <- data.frame(laml.clinical.data)
laml.survgroup <- MAF.survgroup(laml.clinical.data, time = "days_to_last_followup",
                                status = "Overall_Survival_Status", laml.maf,
                                num.genes = 3, siginificant.genes = 1,
                                pval = TRUE)

Convert a MAF file to a mutation matrix

Description

Create a mutation matrix using variant types
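As a rough illustration of what a mutation matrix built from variant types can look like, the sketch below pivots a small, made-up MAF-like data frame into a gene-by-sample matrix in base R. The layout (genes in rows, samples in columns, variant types as entries, empty string for no mutation) is an assumption; the matrix returned by maf2matrix may be organized differently.

```r
# Toy MAF-like table using the default column names
maf <- data.frame(
  Tumor_Sample_Barcode = c("S1", "S1", "S2", "S3", "S3"),
  Hugo_Symbol = c("TP53", "FLT3", "TP53", "DNMT3A", "FLT3"),
  Variant_Classification = c("Missense_Mutation", "In_Frame_Ins",
                             "Nonsense_Mutation", "Missense_Mutation",
                             "Missense_Mutation"),
  stringsAsFactors = FALSE
)

# One cell per (gene, sample); multiple variant types are joined with ";"
mut.matrix <- tapply(maf$Variant_Classification,
                     list(maf$Hugo_Symbol, maf$Tumor_Sample_Barcode),
                     function(v) paste(unique(v), collapse = ";"))

# Gene/sample pairs without any variant are left empty
mut.matrix[is.na(mut.matrix)] <- ""
mut.matrix
```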

Usage

maf2matrix(maf, surv.data = NULL, sample.name = "Tumor_Sample_Barcode",
           gene.name = "Hugo_Symbol", variant.type = "Variant_Classification")

Arguments

`maf`	a MAF file
`surv.data`	survival data for sample names (default: NULL)
`sample.name`	the column name containing sample names (default: "Tumor_Sample_Barcode")
`gene.name`	the column name containing gene names (default: "Hugo_Symbol")
`variant.type`	the column name containing variant types (default: "Variant_Classification")

Value

a mutation matrix

Author(s)

Dongmin Jung

Examples

laml.maf <- system.file("extdata", "tcga_laml.maf.gz", package = "maftools")
laml.maf <- read.csv(laml.maf, sep = "\t")
laml.mat <- maf2matrix(laml.maf)

Plot of survival curves of sample subtypes

Description

Survival curves for subtypes of samples can be drawn.

Usage

## S3 method for class 'survtype'
plot(object, ...)

Arguments

`object`	object of class "survtype"
`...`	additional parameters for the "ggsurvplot"

Value

Survival curves

Author(s)

Dongmin Jung

Examples

data(ovarian)
ovarian.survtype <- Surv.survtype(ovarian, time = "futime", status = "fustat")
plot(ovarian.survtype, pval = TRUE)

Normalize a gene expression profile

Description

Normalize expression data using quantile normalization
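For readers unfamiliar with the technique, the following is a minimal base R sketch of standard quantile normalization across the columns of a matrix. The helper name qn_sketch is hypothetical, ties are broken by order of appearance, and this is not necessarily how quantile_normalization is implemented in the package.

```r
# Hypothetical helper: force every column to share the same distribution
qn_sketch <- function(x) {
  x <- as.matrix(x)
  ranks  <- apply(x, 2, rank, ties.method = "first")  # rank within each column
  sorted <- apply(x, 2, sort)                         # sort each column
  ref    <- rowMeans(sorted)                          # mean of each quantile
  out    <- apply(ranks, 2, function(r) ref[r])       # map ranks to reference
  dimnames(out) <- dimnames(x)
  out
}

set.seed(1)
counts <- matrix(runif(10 * 5, 1, 1e4), nrow = 10)
normalized <- qn_sketch(counts)

# After normalization, all columns contain the same set of values
apply(normalized, 2, sort)
```

Each value is replaced by the average of the values that hold the same rank across all columns, so the within-column ordering is preserved while the column distributions become identical.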

Usage

quantile_normalization(x)

Arguments

`x`	an expression profile

Value

a normalized matrix

Author(s)

Dongmin Jung

Examples

set.seed(1)
nrows <- 10
ncols <- 5
counts <- matrix(runif(nrows * ncols, 1, 1e4), nrows)
normalized.matrix <- quantile_normalization(counts)

Patient group identification via survival information and expression of a single gene

Description

All midpoints of the expression level or real-valued statistic are examined to find the threshold that gives the most significant difference between two groups. Patients with a value greater than the best cutoff are assigned to the "high-score" class; all other patients are assigned to the "low-score" class.
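The threshold search can be sketched as a scan over candidate cutoffs, scoring each one with a log-rank test. The simulated scores, the definition of midpoints as averages of consecutive sorted values, and the use of the survdiff chi-squared statistic as the selection criterion are assumptions made for illustration; Single.survgroup may differ in its details.

```r
library(survival)

set.seed(1)
surv.data <- survival::ovarian
score <- runif(nrow(surv.data), 1, 1e4)   # hypothetical single-gene expression

# Candidate cutoffs: midpoints between consecutive sorted unique values
s <- sort(unique(score))
cutoffs <- (head(s, -1) + tail(s, -1)) / 2

# Log-rank chi-squared statistic for the split induced by each cutoff
chisq <- vapply(cutoffs, function(cut) {
  group <- score > cut
  survdiff(Surv(futime, fustat) ~ group, data = surv.data)$chisq
}, numeric(1))

# Best cutoff = largest statistic (equivalently, smallest p-value)
best <- cutoffs[which.max(chisq)]
cluster <- ifelse(score > best, "High", "Low")
table(cluster)
```

Because the cutoff is chosen to maximize the statistic, the p-value at the selected threshold is optimistic and is best treated as exploratory.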

Usage

Single.survgroup(surv.data, time, status, single.gene, intermediate = FALSE,
                 group.names = c("High", "Intermediate", "Low"))

Arguments

`surv.data`	survival data
`time`	survival time
`status`	status indicator
`single.gene`	expression level of a single gene
`intermediate`	a logical value indicating whether or not the intermediate class is considered (default: FALSE)
`group.names`	the names of clusters for "high-score", "intermediate-score", and "low-score" classes (default: "High", "Intermediate", "Low")

Value

`time`	survival time
`status`	status indicator
`surv.data`	survival data
`score`	a vector of scores
`summary`	a list of thresholds, chi-squared statistics and p-values
`cluster`	a vector of clusters to which samples are assigned
`fit`	fitted survival curves

Author(s)

Dongmin Jung

Examples

set.seed(1)
nrows <- 1
ncols <- nrow(ovarian)
counts <- matrix(runif(nrows * ncols, 1, 1e4), nrows)
colnames(counts) <- paste("X", 1:ncols, sep = "")
rownames(ovarian) <- paste("X", 1:ncols, sep = "")
Single.ovarian <- Single.survgroup(ovarian, time = "futime", status = "fustat", counts[1,])
plot(Single.ovarian, pval = TRUE)

Sample subtype identification via survival information

Description

Any patient who lived longer than the median is considered a "low-risk" patient, and any patient who lived less than the median is considered a "high-risk" patient. In this manner, a class label is assigned to each observation. For censored data, the probability that each censored observation belongs to the "low-risk" and "high-risk" classes, respectively, is estimated.
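One way to picture this labeling, following the general approach in the Bair and Tibshirani reference, is sketched below. It is an illustration under stated assumptions, not the code of Surv.survtype: the cutoff is taken as the median of the observed follow-up times, the probability for a patient censored before the cutoff is estimated from the Kaplan-Meier curve as S(cutoff) / S(t), and a 0.5 threshold turns probabilities into labels.

```r
library(survival)

surv.data <- survival::ovarian
cutoff <- median(surv.data$futime)        # assumed definition of "the median"

# Kaplan-Meier estimate as a right-continuous step function S(t)
km <- survfit(Surv(futime, fustat) ~ 1, data = surv.data)
S <- stepfun(km$time, c(1, km$surv))

# Probability of being "low-risk", i.e. of surviving past the cutoff:
#   followed past the cutoff      -> 1
#   died at or before the cutoff  -> 0
#   censored before the cutoff    -> S(cutoff) / S(censoring time)
p.low <- with(surv.data, ifelse(
  futime > cutoff, 1,
  ifelse(fustat == 1, 0, S(cutoff) / S(futime))
))

cluster <- ifelse(p.low >= 0.5, "low-risk", "high-risk")
table(cluster)
```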

Usage

Surv.survtype(surv.data, time, status)

Arguments

`surv.data`	survival data
`time`	survival time
`status`	status indicator

Value

`n`	the number of subjects in each group
`obs`	the weighted observed number of events in each group
`exp`	the weighted expected number of events in each group
`chisq`	the chi-squared statistic for a test of equality
`call`	the matched call
`cluster`	a vector of clusters to which samples are assigned
`time`	survival time
`status`	status indicator
`surv.data`	survival data
`fit`	fitted survival curves

Author(s)

Dongmin Jung

References

Bair, E., & Tibshirani, R. (2004). Semi-supervised methods to predict patient survival from gene expression data. PLoS biology, 2(4), e108.

Examples

data(ovarian)
ovarian.survtype <- Surv.survtype(ovarian, time = "futime", status = "fustat")
