| Title: | Cohort-aware methods for intrinsic molecular subtyping of breast cancer |
|---|---|
| Description: | BreastSubtypeR provides an assumption-aware, multi-method framework for intrinsic molecular subtyping of breast cancer. The package harmonizes several published nearest-centroid (NC) and single-sample predictor (SSP) classifiers, supplies method-specific preprocessing and robust probe-to-gene mapping, and implements a cohort-aware AUTO mode that selectively enables classifiers compatible with the cohort composition. A local Shiny app (iBreastSubtypeR) is included for interactive analyses and to support users without programming experience. |
| Authors: | Qiao Yang [aut, cre] (ORCID: <https://orcid.org/0000-0002-4098-3246>), Emmanouil G. Sifakis [aut] (ORCID: <https://orcid.org/0000-0001-9919-4471>) |
| Maintainer: | Qiao Yang |
| License: | GPL-3 |
| Version: | 1.5.0 |
| Built: | 2026-05-22 09:38:41 UTC |
| Source: | https://github.com/bioc/BreastSubtypeR |
Model definition for AIMS consisting of 100 pairwise rules and a Naive Bayes classifier (via e1071) as described by Paquet & Hallett (2015).
data("AIMSmodel")
An object of class list of length 4.
The 100 rules are of the form “EntrezID gene A < EntrezID gene B”. A subset
of k rules (typically 20) is used within a Naive Bayes classifier to
assign subtypes (Basal-like, HER2-enriched, LumA, LumB, Normal-like) on a
per-sample basis.
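The rule format above can be made concrete with a small base-R sketch. It assumes an expression matrix with Entrez IDs as row names and rules written as strings such as "2099 < 2064"; the helper evaluate_rules() is hypothetical and only illustrates how a 0/1 rules matrix (rows = rules, columns = samples) could be derived, not how AIMS or BreastSubtypeR implement it.

```r
# Hypothetical helper (not part of BreastSubtypeR): evaluate pairwise rules
# written as "<EntrezID A> < <EntrezID B>" on a genes x samples matrix whose
# row names are Entrez IDs.
evaluate_rules <- function(expr, rules) {
  parts  <- strsplit(rules, "<", fixed = TRUE)
  gene_a <- trimws(vapply(parts, `[`, character(1), 1))
  gene_b <- trimws(vapply(parts, `[`, character(1), 2))
  # 1 when gene A is lower than gene B within the same sample, otherwise 0
  out <- (expr[gene_a, , drop = FALSE] < expr[gene_b, , drop = FALSE]) * 1L
  rownames(out) <- rules
  out
}

# Toy data (values filled column by column): three genes, two samples
expr <- matrix(c(5, 8, 2,   # S1
                 7, 3, 9),  # S2
               nrow = 3,
               dimnames = list(c("2099", "2064", "7153"), c("S1", "S2")))
rules <- c("2099 < 2064", "7153 < 2099")
evaluate_rules(expr, rules)  # both rules hold in S1 (1) and fail in S2 (0)
```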
| all.pairs | Character vector of the 100 AIMS rules (EntrezID comparisons). |
|---|---|
| k | Integer; number of optimal rules (commonly 20). |
| one.vs.all.tsp | Naive Bayes classifier object used with the rules. |
| selected.pairs.list | Rules ranked by discriminative power per subtype. |
Paquet ER, Hallett MT. Absolute assignment of breast cancer intrinsic molecular subtype. J Natl Cancer Inst. 2015;107(1):dju357. https://doi.org/10.1093/jnci/dju357
library(BreastSubtypeR)
data("AIMSmodel")
BreastSubtypeR is an R/Bioconductor package that unifies multiple published intrinsic subtyping (IS) methods for breast cancer into a single, reproducible framework. It supports both nearest-centroid (NC-based) and single-sample predictor (SSP-based) classifiers and introduces an assumption-aware AUTO mode that dynamically selects methods compatible with the input cohort.
By standardizing input handling, applying method-specific normalization, and providing optimised probe-to-gene mapping, BreastSubtypeR reduces inconsistencies across platforms and improves reproducibility in translational research. A companion Shiny app (iBreastSubtypeR) offers an intuitive GUI for non-programmers while preserving data privacy.
Data Input: Supply a gene expression dataset as a SummarizedExperiment.
Supported inputs include raw RNA-seq counts (with gene lengths),
log2(FPKM+1) RNA-seq, or log2-normalized microarray/nCounter data.
Gene Mapping: Prepare expression data with Mapping(),
including Entrez ID-based resolution of duplicates.
Subtyping: Apply multiple classifiers simultaneously using
BS_Multi(), or enable AUTO mode for
cohort-aware method selection.
Visualization: Summarise and interpret subtyping results with
Vis_Multi().
Multi-method framework: Ten published NC- and SSP-based classifiers, harmonised under one interface.
AUTO mode: Evaluates cohort composition (e.g., ER/HER2 prevalence, subtype purity, subgroup sizes) and disables classifiers whose assumptions are violated, which improves accuracy, Cohen’s kappa, and IHC concordance.
Standardized normalization: Upper-quartile log2-CPM for NC-based methods; FPKM for SSP-based methods.
Optimised gene mapping: Entrez ID-based mapping with conflict resolution.
Dual accessibility: A Bioconductor-compliant R API and a local Shiny app (iBreastSubtypeR).
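As a rough illustration of what upper-quartile log2-CPM means, the sketch below scales each sample by the 75th percentile of its non-zero counts and then takes log2 counts per million. It is modelled loosely on the upper-quartile normalization in edgeR and on voom-style offsets; uq_log2cpm() is a hypothetical helper, and BreastSubtypeR's own implementation may differ in offsets and filtering.

```r
# Hypothetical helper: upper-quartile-scaled log2-CPM (one common formulation).
uq_log2cpm <- function(counts) {
  lib_size <- colSums(counts)
  # 75th percentile of the non-zero counts in each sample, relative to library size
  uq <- apply(counts, 2, function(x) stats::quantile(x[x > 0], 0.75, names = FALSE))
  f <- uq / lib_size
  f <- f / exp(mean(log(f)))   # rescale factors to a geometric mean of 1
  eff_lib <- lib_size * f      # effective library sizes
  # voom-style offsets keep zero counts finite on the log2 scale
  log2(sweep(counts + 0.5, 2, eff_lib + 1, "/") * 1e6)
}

set.seed(1)
counts <- matrix(rpois(400, lambda = 200), nrow = 100,
                 dimnames = list(paste0("g", 1:100), paste0("S", 1:4)))
counts[, 4] <- counts[, 1] * 2   # S4 is S1 sequenced twice as deeply
lc <- uq_log2cpm(counts)
```

Because S4 is simply S1 at twice the depth, the two columns end up nearly identical after normalization.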
Maintainer: Qiao Yang (ORCID: https://orcid.org/0000-0002-4098-3246)
Authors:
Emmanouil G. Sifakis (ORCID: https://orcid.org/0000-0001-9919-4471)
List of reference resources required by nearest-centroid (NC) subtyping methods: platform medians, centroids, signatures, subgroup quantiles, and metadata from the UNC232 training cohort.
data("BreastSubtypeRobj")
A list with:
medians: Matrix/data frame of platform-specific medians
for 11 expression/sequencing platforms, derived as described in
Picornell et al. (2019). Platform columns include:
nCounter,
totalRNA.FFPE.20151111, RNAseq.Freeze.20120907, RNAseq.V2, RNAseq.V1,
GC.4x44Kcustom, Agilent_244K, commercial_1x44k_postMeanCollapse_WashU, commercial_4x44k_postMeanCollapse_WashU_v2,
htp1.5_WU_update, arrayTrain_postMeanCollapse.
centroid: PAM50 centroids used by parker.original.
genes.sig50: Data frame of the 50 PAM50 genes with a proliferation flag.
ssBC.subgroupQuantile: Subgroup-specific quantiles used by ssBC.
genes.signature: Marker genes used across NC- and SSP-based methods.
UNC232: Summary data for the UNC232 training cohort.
platform.UNC232: Platform annotation for UNC232.
Parker JS, Mullins M, Cheung MCU, Leung S, Voduc D, et al. Supervised risk predictor of breast cancer based on intrinsic subtypes. J Clin Oncol. 2009;27(8):1160–1167. https://doi.org/10.1200/JCO.2008.18.1370
Zhao X, Rodland EA, Tibshirani R, Plevritis S. Molecular subtyping for clinically defined breast cancer subgroups. Breast Cancer Res. 2015;17(1):29. https://doi.org/10.1186/s13058-015-0520-4
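Fernandez-Martinez A, Krop IE, Hillman DW, Polley MY, Parker JS, Huebner L, et al. Survival, pathologic response, and genomics in CALGB 40601 (Alliance), a neoadjuvant Phase III trial of paclitaxel–trastuzumab with or without lapatinib in HER2-positive breast cancer. J Clin Oncol. 2020;38(36):4184–4197. https://doi.org/10.1200/JCO.20.01276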
Picornell AC, Echavarria I, Alvarez E, López-Tarruella S, Jerez Y, Hoadley K, et al. Breast cancer PAM50 signature: correlation and concordance between RNA-seq and digital multiplexed gene expression technologies in a TNBC series. BMC Genomics. 2019;20(1):452. https://doi.org/10.1186/s12864-019-5849-0
library(BreastSubtypeR)
data("BreastSubtypeRobj")
Implements the AIMS (Absolute Assignment of Intrinsic Molecular Subtype) method for breast cancer intrinsic subtyping. Unlike nearest-centroid (NC) approaches, AIMS is a single-sample predictor (SSP): it assigns subtypes independently for each sample using within-sample, pairwise gene expression rules. This makes it robust to cohort composition and scaling.
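The robustness claim can be illustrated with toy numbers (all values below are invented). A within-sample comparison such as "gene A < gene B" does not change under per-sample monotone transformations or when other samples are removed, whereas a gene's median-centred value, as used before nearest-centroid assignment, depends on which samples make up the cohort. The helpers are hypothetical.

```r
# Toy illustration with invented values: two genes (A, B), four samples.
cohort <- matrix(c(2, 5,   # S1
                   3, 1,   # S2
                   8, 9,   # S3
                   7, 4),  # S4
                 nrow = 2,
                 dimnames = list(c("A", "B"), paste0("S", 1:4)))

# Within-sample rule "A < B": 1 if true, 0 otherwise
pairwise_call <- function(x) unname(as.integer(x["A"] < x["B"]))

s1 <- cohort[, "S1"]
pairwise_call(s1)             # 1
pairwise_call(log2(10 * s1))  # still 1: a monotone rescaling keeps the order

# Median-centred value of gene A in S1 depends on the cohort
center_gene_A <- function(m) m["A", "S1"] - median(m["A", ])
center_gene_A(cohort)                   # -3   (median of A over S1-S4 is 5)
center_gene_A(cohort[, c("S1", "S2")])  # -0.5 (median of A over S1-S2 is 2.5)
```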
BS_AIMS(se_obj)
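| se_obj | A SummarizedExperiment object prepared for SSP-based methods (e.g., the se_SSP element returned by Mapping(), as in the example below). |
|---|---|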
A list with the following elements:
cl: Character vector of AIMS subtype calls per sample.
One of "Basal", "Her2", "LumA", "LumB", or "Normal".
prob: Numeric vector of posterior probabilities corresponding
to the assigned subtype in cl (one value per sample).
all.probs: Matrix of posterior probabilities for all samples
and all subtypes (rows = samples, columns = subtypes).
rules.matrix: 0/1 matrix of the 100 AIMS rules used for assignment
(rows = rules, columns = samples); 1 indicates the rule evaluated to TRUE.
data.used: Expression values actually used to evaluate the rules
(filtered/ordered subset aligned to the AIMS gene set).
EntrezID.used: Character vector of Entrez IDs used by AIMS.
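Under the usual maximum-posterior rule, cl and prob can be read off all.probs. The sketch below assumes that rule and uses a made-up probability matrix; it illustrates the relationship between the three elements rather than reproducing BS_AIMS() internals.

```r
# Invented posterior probabilities: rows = samples, columns = subtypes
all.probs <- matrix(c(0.70, 0.10, 0.10, 0.05, 0.05,
                      0.10, 0.60, 0.20, 0.05, 0.05,
                      0.05, 0.05, 0.10, 0.30, 0.50),
                    nrow = 3, byrow = TRUE,
                    dimnames = list(c("S1", "S2", "S3"),
                                    c("Basal", "Her2", "LumA", "LumB", "Normal")))

# Assumed maximum-posterior rule: the assigned subtype is the column with the
# largest probability, and prob is that probability
idx  <- max.col(all.probs, ties.method = "first")
cl   <- setNames(colnames(all.probs)[idx], rownames(all.probs))
prob <- setNames(all.probs[cbind(seq_len(nrow(all.probs)), idx)], rownames(all.probs))
cl    # S1 "Basal", S2 "Her2", S3 "Normal"
prob  # 0.70, 0.60, 0.50
```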
Paquet ER, Hallett MT. Absolute assignment of breast cancer intrinsic molecular subtype. J Natl Cancer Inst. 2015;107(1):dju357. https://doi.org/10.1093/jnci/dju357
## Example using SummarizedExperiment input
data("OSLO2EMIT0obj")
res <- BS_AIMS(
  se_obj = OSLO2EMIT0obj$data_input$se_SSP
)
Implements the conventional immunohistochemistry-based (cIHC) intrinsic subtyping approach, which balances the cohort by estrogen receptor (ER) status before applying gene-expression-based subtyping. This method is useful for ER-skewed cohorts, in which the assumptions of nearest-centroid classifiers are violated.
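A minimal sketch of ER-balanced median centring, assuming ER status is already available as "ER+"/"ER-" labels: equal numbers of ER+ and ER- samples are drawn, gene medians are computed on that balanced subset, and all samples are centred on those medians before a nearest-centroid step. The helper er_balanced_center() and its use of seed = 118 (mirroring the default in the usage line) are illustrative assumptions, not the BS_cIHC() code.

```r
# Hypothetical helper: ER-balanced median centring.
# x: genes x samples (log2 scale); er: "ER+"/"ER-" label per sample.
er_balanced_center <- function(x, er, seed = 118) {
  stopifnot(ncol(x) == length(er), all(er %in% c("ER+", "ER-")))
  set.seed(seed)  # note: resets the global RNG state
  pos <- which(er == "ER+")
  neg <- which(er == "ER-")
  n   <- min(length(pos), length(neg))
  stopifnot(n > 0)
  # draw n samples from each ER group
  keep <- c(pos[sample.int(length(pos), n)], neg[sample.int(length(neg), n)])
  # gene medians from the balanced subset, applied to every sample
  med <- apply(x[, keep, drop = FALSE], 1, median)
  sweep(x, 1, med, "-")
}

set.seed(2)
x  <- matrix(rnorm(5 * 8), nrow = 5)   # 5 genes, 8 samples
er <- c(rep("ER+", 6), rep("ER-", 2))  # ER-skewed toy cohort
xc <- er_balanced_center(x, er)        # medians from 2 ER+ and 2 ER- samples
```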
BS_cIHC(se_obj, Subtype = FALSE, hasClinical = FALSE, seed = 118)
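| se_obj | A SummarizedExperiment object prepared for NC-based methods (e.g., the se_NC element returned by Mapping(), as in the example below). |
|---|---|
| Subtype | Logical. If TRUE, the Normal-like class is excluded and four subtypes are predicted. |
| hasClinical | Logical. If TRUE, clinical variables stored in the colData of se_obj (e.g., ER and HER2 status) are used. |
| seed | Integer. Random seed for reproducibility of ER-balancing. |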
A data.frame containing intrinsic subtype assignments estimated
using the conventional IHC (cIHC) approach.
Ciriello G, Gatza ML, Beck AH, Wilkerson MD, Rhie SK, Pastore A, et al. Comprehensive molecular portraits of invasive lobular breast cancer. Cell. 2015;163(2):506–519. https://doi.org/10.1016/j.cell.2015.09.033
data("OSLO2EMIT0obj")
res <- BS_cIHC(
  se_obj = OSLO2EMIT0obj$data_input$se_NC,
  Subtype = FALSE,
  hasClinical = FALSE
)
Implements an iterative version of the conventional IHC-based intrinsic subtyping approach. This method repeatedly balances samples by estrogen receptor (ER) status across multiple iterations, allowing refinement of subtype calls in ER-skewed cohorts. Users can customise the ER+/ER– ratio to match specific cohort assumptions (e.g., training distribution).
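Two pieces of such an iterative scheme can be sketched under explicit assumptions: drawing a subset whose ER+/ER- ratio matches ratio (default 54/64, as in the usage line), and summarising the per-iteration calls as the most frequent subtype together with the fraction of iterations that support it. Both helpers are hypothetical, and using that fraction as the confidence is an assumption rather than the documented definition.

```r
# Hypothetical helper: indices of a subset whose ER+/ER- ratio is close to `ratio`
ratio_subset <- function(er, ratio = 54 / 64) {
  pos <- which(er == "ER+")
  neg <- which(er == "ER-")
  stopifnot(length(pos) > 0, length(neg) > 0)
  if (length(pos) / length(neg) > ratio) {
    # too many ER+: downsample ER+ to ratio * n(ER-)
    pos <- pos[sample.int(length(pos), round(ratio * length(neg)))]
  } else {
    # too many ER-: downsample ER- to n(ER+) / ratio
    neg <- neg[sample.int(length(neg), round(length(pos) / ratio))]
  }
  sort(c(pos, neg))
}

# Hypothetical helper: summarise a samples x iterations matrix of calls
summarise_iterations <- function(calls) {
  best <- apply(calls, 1, function(v) names(which.max(table(v))))
  data.frame(subtype    = best,
             confidence = rowMeans(calls == best),  # share of iterations agreeing
             row.names  = rownames(calls))
}

set.seed(118)
er  <- c(rep("ER+", 100), rep("ER-", 20))
idx <- ratio_subset(er)  # 17 ER+ (round(0.84375 * 20)) and 20 ER-
table(er[idx])

calls <- rbind(S1 = c("LumA", "LumA", "LumB", "LumA"),
               S2 = c("Basal", "Basal", "Basal", "Basal"))
summarise_iterations(calls)  # S1: LumA, 0.75; S2: Basal, 1
```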
BS_cIHC.itr(
  se_obj,
  iteration = 100,
  ratio = 54/64,
  Subtype = FALSE,
  hasClinical = FALSE,
  seed = 118
)
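| se_obj | A SummarizedExperiment object prepared for NC-based methods (e.g., the se_NC element returned by Mapping(), as in the example below). |
|---|---|
| iteration | Integer. Number of iterations for the ER-balancing procedure. Default: 100. |
| ratio | Numeric. Target ER+/ER– ratio for balancing. Default: 54/64. |
| Subtype | Logical. If TRUE, the Normal-like class is excluded and four subtypes are predicted. |
| hasClinical | Logical. If TRUE, clinical variables stored in the colData of se_obj (e.g., ER and HER2 status) are used. |
| seed | Integer. Random seed for reproducibility. |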
A list containing:
subtypes: Intrinsic subtype predictions across iterations.
confidence: Confidence estimates for each assigned subtype.
ER_balance: Proportions of ER+ and ER– subsets observed across iterations.
Curtis C, Shah SP, Chin SF, Turashvili G, Rueda OM, Dunning MJ, et al. The genomic and transcriptomic architecture of 2,000 breast tumors reveals novel subgroups. Nature. 2012;486(7403):346–352. https://doi.org/10.1038/nature10983
data("OSLO2EMIT0obj")
res <- BS_cIHC.itr(
  se_obj = OSLO2EMIT0obj$data_input$se_NC,
  iteration = 10, ## for final analysis, use iteration = 100
  Subtype = FALSE,
  hasClinical = FALSE
)
Executes multiple intrinsic molecular subtyping methods in parallel. Users can either specify a set of classifiers directly, or enable the AUTO mode, which dynamically selects methods based on cohort composition (e.g., ER/HER2 distribution, subtype purity, subgroup size). AUTO reduces misclassification in skewed or subtype-specific cohorts by disabling methods whose assumptions are violated, but does not perform consensus voting—subtypes are still returned per method.
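The cohort properties that AUTO mode is described as checking can be tabulated directly from clinical annotations. The sketch below assumes a data frame with ER and HER2 columns in the canonical "ER+"/"ER-" and "HER2+"/"HER2-" encodings produced by Mapping(); cohort_composition() is a hypothetical helper that only computes the composition summaries and does not reproduce the rules AUTO applies to them.

```r
# Hypothetical helper: composition summaries for a cohort's clinical table
cohort_composition <- function(pheno) {
  er   <- pheno$ER
  her2 <- pheno$HER2
  list(
    n                 = nrow(pheno),
    ER_pos_fraction   = mean(er == "ER+", na.rm = TRUE),
    HER2_pos_fraction = mean(her2 == "HER2+", na.rm = TRUE),
    subgroup_sizes    = table(ER = er, HER2 = her2)  # ER x HER2 counts
  )
}

# Invented toy cohort
pheno <- data.frame(
  PatientID = paste0("P", 1:6),
  ER        = c("ER+", "ER+", "ER+", "ER+", "ER-", "ER-"),
  HER2      = c("HER2-", "HER2-", "HER2+", "HER2-", "HER2-", "HER2+")
)
cohort_composition(pheno)
```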
BS_Multi(data_input, methods = "AUTO", Subtype = FALSE, hasClinical = FALSE)
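| data_input | The output of the Mapping() function (a list containing se_NC and se_SSP). |
|---|---|
| methods | Character vector specifying the subtyping methods to run, for example "parker.original", "genefu.scale", and "genefu.robust" (see the example below), or "AUTO" (default) for cohort-aware method selection. |
| Subtype | Logical. If TRUE, the Normal-like class is excluded and four subtypes are predicted. |
| hasClinical | Logical. If TRUE, clinical variables stored in the colData of the input (e.g., ER and HER2 status) are used. |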
A list containing per-method subtype assignments for each sample.
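Because BS_Multi() returns calls per method rather than a consensus, agreement between any two methods can be quantified with Cohen's kappa, one of the metrics cited for AUTO mode. The call vectors below are invented; in practice they could be taken from per-method results such as res$results$genefu.robust$BS.all$BS, the field used in the visualization examples. cohen_kappa() is a plain, hypothetical implementation of kappa = (p_o - p_e) / (1 - p_e).

```r
# Cohen's kappa between two vectors of subtype calls for the same samples
cohen_kappa <- function(a, b) {
  lev <- union(a, b)
  tab <- table(factor(a, levels = lev), factor(b, levels = lev))
  n   <- sum(tab)
  p_o <- sum(diag(tab)) / n                      # observed agreement
  p_e <- sum(rowSums(tab) * colSums(tab)) / n^2  # agreement expected by chance
  (p_o - p_e) / (1 - p_e)
}

m1 <- c("LumA", "LumA", "LumB", "Basal", "Her2", "LumA")
m2 <- c("LumA", "LumB", "LumB", "Basal", "Her2", "LumA")
cohen_kappa(m1, m2)
```

For these toy vectors, observed agreement is 5/6 and chance agreement is 10/36, giving kappa = 20/26 (about 0.77).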
Yang Q, Hartman J, Sifakis EG. BreastSubtypeR: A Unified R/Bioconductor Package for Intrinsic Molecular Subtyping in Breast Cancer Research. NAR Genomics and Bioinformatics. 2025. https://doi.org/10.1093/nargab/lqaf131. Selected as Editor’s Choice.
Parker JS, Mullins M, Cheung MCU, Leung S, Voduc D, et al. Supervised risk predictor of breast cancer based on intrinsic subtypes. J Clin Oncol. 2009;27(8):1160–1167. https://doi.org/10.1200/JCO.2008.18.1370
Gendoo DMA, Ratanasirigulchai N, Schröder MS, Paré L, Parker JS, Prat A, et al. Genefu: An R/Bioconductor package for computation of gene expression-based signatures in breast cancer. Bioinformatics. 2016;32(7):1097–1099. https://doi.org/10.1093/bioinformatics/btv693
Ciriello G, Gatza ML, Beck AH, Wilkerson MD, Rhie SK, Pastore A, et al. Comprehensive molecular portraits of invasive lobular breast cancer. Cell. 2015;163(2):506–519. https://doi.org/10.1016/j.cell.2015.09.033
Curtis C, Shah SP, Chin SF, Turashvili G, Rueda OM, Dunning MJ, et al. The genomic and transcriptomic architecture of 2,000 breast tumors reveals novel subgroups. Nature. 2012;486(7403):346–352. https://doi.org/10.1038/nature10983
Zhao X, Rodland EA, Tibshirani R, Plevritis S. Molecular subtyping for clinically defined breast cancer subgroups. Breast Cancer Res. 2015;17(1):29. https://doi.org/10.1186/s13058-015-0520-4
Fernandez-Martinez A, Krop IE, Hillman DW, Polley MY, Parker JS, Huebner L, et al. Survival, pathologic response, and genomics in CALGB 40601 (Alliance), a neoadjuvant Phase III trial of paclitaxel–trastuzumab with or without lapatinib in HER2-positive breast cancer. J Clin Oncol. 2020;38(36):4184–4197. https://doi.org/10.1200/JCO.20.01276
Paquet ER, Hallett MT. Absolute assignment of breast cancer intrinsic molecular subtype. J Natl Cancer Inst. 2015;107(1):dju357. https://doi.org/10.1093/jnci/dju357
Staaf J, Häkkinen J, Hegardt C, Saal LH, Kimbung S, Hedenfalk I, et al. RNA sequencing-based single sample predictors of molecular subtype and risk of recurrence for clinical assessment of early-stage breast cancer. NPJ Breast Cancer. 2022;8(1):27. https://doi.org/10.1038/s41523-022-00465-3
## Example: run multiple methods
data("OSLO2EMIT0obj")
methods <- c("parker.original", "genefu.scale", "genefu.robust")
res.test <- BS_Multi(
  data_input = OSLO2EMIT0obj$data_input,
  methods = methods,
  Subtype = FALSE,
  hasClinical = FALSE
)
Implements the original PAM50 nearest-centroid classifier as described by Parker et al. (2009), along with supported calibration strategies and variations. This function assigns intrinsic breast cancer subtypes (Luminal A, Luminal B, HER2-enriched, Basal-like, and optionally Normal-like).
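The core nearest-centroid step can be sketched in a few lines: each sample is correlated (Spearman) with every subtype centroid over the shared genes and assigned to the centroid with the highest correlation, which is the assignment rule of the original PAM50 classifier. The centroids here are toy values (the real PAM50 centroids are part of BreastSubtypeRobj), and the helper nc_assign() is hypothetical.

```r
# Hypothetical helper: Spearman nearest-centroid assignment.
# x: genes x samples; centroids: genes x subtypes (shared row names).
nc_assign <- function(x, centroids) {
  genes <- intersect(rownames(x), rownames(centroids))
  # correlate every sample (column of x) with every centroid: samples x subtypes
  r <- cor(x[genes, , drop = FALSE], centroids[genes, , drop = FALSE],
           method = "spearman")
  best <- max.col(r, ties.method = "first")
  data.frame(sample  = colnames(x),
             subtype = colnames(centroids)[best],
             max_cor = r[cbind(seq_len(nrow(r)), best)])
}

# Toy centroids and samples (invented values)
centroids <- cbind(Basal = c(2, 1, -1, -2), LumA = c(-2, -1, 1, 2))
rownames(centroids) <- paste0("g", 1:4)
x <- cbind(S1 = c(5, 4, 2, 1), S2 = c(0, 1, 3, 9))
rownames(x) <- paste0("g", 1:4)
nc_assign(x, centroids)  # S1 -> Basal, S2 -> LumA, both with correlation 1
```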
BS_parker(
  se_obj,
  calibration = "None",
  internal = NA,
  external = NA,
  medians = NA,
  Subtype = FALSE,
  hasClinical = FALSE
)
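| se_obj | A SummarizedExperiment object prepared for NC-based methods (e.g., the se_NC element returned by Mapping(), as in the example below). |
|---|---|
| calibration | Character. One of "None" (default), "Internal", or "External". |
| internal | Internal calibration method used when calibration = "Internal"; NA is equivalent to "medianCtr" (see the example below). |
| external | Character string specifying the external calibration source. |
| medians | A matrix or data.frame of user-provided medians, required when calibrating against user-supplied medians. |
| Subtype | Logical. If TRUE, the Normal-like class is excluded and four subtypes are predicted. |
| hasClinical | Logical. If TRUE, clinical variables stored in the colData of se_obj (e.g., ER and HER2 status) are used. |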
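The difference between internal and external calibration can be sketched as two centring functions. Internal median centring (the "medianCtr" option) subtracts each gene's median across the study cohort, while an external calibration is assumed here to subtract a fixed per-gene median from a reference platform, such as the platform medians stored in BreastSubtypeRobj. Both helper names are hypothetical.

```r
# Internal calibration ("medianCtr"-style): subtract each gene's cohort median.
center_internal <- function(x) sweep(x, 1, apply(x, 1, median), "-")

# External calibration (assumed form): subtract a fixed reference median per gene.
# medians: named numeric vector with one value per gene (names = rownames(x)).
center_external <- function(x, medians) {
  stopifnot(all(rownames(x) %in% names(medians)))
  sweep(x, 1, medians[rownames(x)], "-")
}

x <- matrix(c(1, 10, 2, 20, 6, 60), nrow = 2,
            dimnames = list(c("g1", "g2"), c("S1", "S2", "S3")))
center_internal(x)                      # g1: -1 0 4;  g2: -10 0 40
center_external(x, c(g2 = 15, g1 = 3))  # g1: -2 -1 3; g2: -5 5 45
```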
A list containing PAM50 intrinsic subtype calls using the Parker classifier and selected calibration strategy.
Parker JS, Mullins M, Cheung MCU, Leung S, Voduc D, et al. Supervised risk predictor of breast cancer based on intrinsic subtypes. J Clin Oncol. 2009;27(8):1160–1167. https://doi.org/10.1200/JCO.2008.18.1370
Gendoo DMA, Ratanasirigulchai N, Schröder MS, Paré L, Parker JS, Prat A, et al. Genefu: An R/Bioconductor package for computation of gene expression-based signatures in breast cancer. Bioinformatics. 2016;32(7):1097–1099. https://doi.org/10.1093/bioinformatics/btv693
data("OSLO2EMIT0obj")
res <- BS_parker(
  se_obj = OSLO2EMIT0obj$data_input$se_NC,
  calibration = "Internal",
  internal = NA, # NA is equal to "medianCtr"
  Subtype = FALSE,
  hasClinical = FALSE
)
Implements the PCA-PAM50 method, which integrates Principal Component Analysis (PCA) of ESR1 expression to adjust for estrogen receptor (ER) imbalance prior to applying the PAM50 nearest-centroid classifier. This approach improves subtype consistency, particularly in ER-skewed cohorts.
BS_PCAPAM50(se_obj, Subtype = FALSE, hasClinical = FALSE, seed = 118)
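| se_obj | A SummarizedExperiment object prepared for NC-based methods (e.g., the se_NC element returned by Mapping(), as in the example below). |
|---|---|
| Subtype | Logical. If TRUE, the Normal-like class is excluded and four subtypes are predicted. |
| hasClinical | Logical. If TRUE, clinical variables stored in the colData of se_obj (e.g., ER and HER2 status) are used. |
| seed | Integer. Random seed for reproducibility. |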
A character vector of intrinsic subtype predictions assigned to each sample using the PCA-PAM50 method.
Raj-Kumar PK, Liu J, Hooke JA, Kovatich AJ, Kvecher L, Shriver CD, et al. PCA-PAM50 improves consistency between breast cancer intrinsic and clinical subtyping, reclassifying a subset of luminal A tumors as luminal B. Scientific Reports. 2019;9(1):1–12. https://doi.org/10.1038/s41598-019-44339-4
data("OSLO2EMIT0obj")
res <- BS_PCAPAM50(
  se_obj = OSLO2EMIT0obj$data_input$se_NC,
  Subtype = FALSE,
  hasClinical = FALSE
)
Implements the subgroup-specific gene-centering (ssBC) method for breast cancer intrinsic subtyping. The ssBC approach applies precomputed, subgroup-specific centering values to adjust PAM50 nearest-centroid classification when the study cohort is skewed relative to the original training cohort (e.g., ER-selected, HER2-enriched, or triple-negative cohorts).
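Subgroup-specific centring can be sketched as follows, assuming (as described for ssBC) that each gene g is centred on its q_g-th quantile in the study cohort, where q_g comes from a precomputed subgroup-specific table such as the ssBC.subgroupQuantile element of BreastSubtypeRobj. The quantile values in the example are invented, and ssbc_center() is a hypothetical helper.

```r
# Hypothetical helper: centre each gene on a subgroup-specific quantile of its
# own expression in the study cohort. q: named vector of probabilities per gene.
ssbc_center <- function(x, q) {
  stopifnot(all(rownames(x) %in% names(q)))
  centre <- vapply(rownames(x),
                   function(g) stats::quantile(x[g, ], probs = q[[g]], names = FALSE),
                   numeric(1))
  sweep(x, 1, centre, "-")
}

x <- rbind(g1 = 1:5, g2 = c(10, 20, 30, 40, 50))
q <- c(g1 = 0.5, g2 = 0.25)  # invented subgroup-specific quantiles
ssbc_center(x, q)            # g1 centred on 3, g2 centred on 20
```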
BS_ssBC(se_obj, s, Subtype = FALSE, hasClinical = FALSE)
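| se_obj | A SummarizedExperiment object prepared for NC-based methods (e.g., the se_NC element returned by Mapping(), as in the example below). |
|---|---|
| s | Character. Specifies which subgroup-specific quantiles to use (e.g., "ER.v2", the updated ER quantiles used in the example below). |
| Subtype | Logical. If TRUE, the Normal-like class is excluded and four subtypes are predicted. |
| hasClinical | Logical. If TRUE, clinical variables stored in the colData of se_obj (e.g., ER and HER2 status) are used. |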
A character vector of intrinsic subtype predictions assigned to each sample using the ssBC method.
Zhao X, Rodland EA, Tibshirani R, Plevritis S. Molecular subtyping for clinically defined breast cancer subgroups. Breast Cancer Res. 2015;17(1):29. https://doi.org/10.1186/s13058-015-0520-4
Fernandez-Martinez A, Krop IE, Hillman DW, Polley MY, Parker JS, Huebner L, et al. Survival, pathologic response, and genomics in CALGB 40601 (Alliance), a neoadjuvant Phase III trial of paclitaxel–trastuzumab with or without lapatinib in HER2-positive breast cancer. J Clin Oncol. 2020;38(36):4184–4197. https://doi.org/10.1200/JCO.20.01276
## Example: Updated subgroup-specific quantiles (ER.v2)
data("OSLO2EMIT0obj")
res <- BS_ssBC(
  se_obj = OSLO2EMIT0obj$data_input$se_NC,
  s = "ER.v2",
  Subtype = FALSE,
  hasClinical = FALSE
)
Implements SSPBC (Single Sample Predictor for Breast Cancer), a refinement of the original AIMS methodology trained on the large, population-based SCAN-B RNA-seq cohort. SSPBC provides robust single-sample predictions, independent of cohort composition, and supports multiple model variants for different applications.
BS_sspbc(se_obj, ssp.name = "ssp.pam50")
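| se_obj | A SummarizedExperiment object prepared for SSP-based methods (e.g., the se_SSP element returned by Mapping(), as in the example below). |
|---|---|
| ssp.name | Character. Specifies the SSPBC model to use (default "ssp.pam50"). |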
A list with the following elements:
cl: Molecular class identified by the sspbc models for each sample
(one of "Basal", "Her2", "LumA", "LumB", with or without "Normal").
prob: Numeric vector of posterior probabilities corresponding
to the assigned subtype in cl (one value per sample).
all.probs: Matrix of posterior probability values for all samples
and all subtypes (rows = samples, columns = subtypes).
rules.matrix: Binary (0/1) matrix of the pairwise gene-expression
rules (gene A < gene B) used for assignment (rows = rules, columns = samples);
1 indicates the rule evaluated to TRUE for that sample.
data.used: Expression values actually used to evaluate the simple rules.
EntrezID.used: Character vector of EntrezGene IDs used for rule evaluation.
Staaf J, Häkkinen J, Hegardt C, Saal LH, Kimbung S, Hedenfalk I, et al. RNA sequencing-based single sample predictors of molecular subtype and risk of recurrence for clinical assessment of early-stage breast cancer. NPJ Breast Cancer. 2022;8(1):27. https://doi.org/10.1038/s41523-022-00465-3
## Example using SSPBC with the PAM50 model
data("OSLO2EMIT0obj")
res <- BS_sspbc(
  se_obj = OSLO2EMIT0obj$data_input$se_SSP,
  ssp.name = "ssp.pam50"
)
Annotation table for GENCODE Human Release 27 genes (Gene.ID) used by
StringTie summarisation. Includes HGNC, EntrezGene, and RefSeq identifiers
derived from GENCODE v27 metadata.
data("Gene.ID.ann")
An object of class data.frame with 19675 rows and 6 columns.
Used by internal SSP application functions to translate identifiers prior to classification with SSP models.
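Identifier translation of this kind amounts to a lookup with match(). In the sketch below, the column names Gene.ID and EntrezGene and the toy IDs are assumptions made for illustration (check names(Gene.ID.ann) for the actual columns); translate_ids() is hypothetical, genes without an Entrez ID are dropped, and duplicated Entrez IDs are not resolved here.

```r
# Hypothetical helper; the column names "Gene.ID" and "EntrezGene" are assumed.
translate_ids <- function(expr, ann, from = "Gene.ID", to = "EntrezGene") {
  target <- ann[[to]][match(rownames(expr), ann[[from]])]
  keep <- !is.na(target)  # drop genes without an Entrez ID
  out <- expr[keep, , drop = FALSE]
  rownames(out) <- as.character(target[keep])
  out
}

# Toy inputs with made-up identifiers
ann  <- data.frame(Gene.ID = c("G1", "G3"), EntrezGene = c("2099", "7153"))
expr <- matrix(1:6, nrow = 3, dimnames = list(c("G1", "G2", "G3"), c("S1", "S2")))
translate_ids(expr, ann)  # rows "2099" and "7153"; G2 is dropped
```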
| Gene.ID.ann | Data frame of annotations for GENCODE v27 genes. |
|---|---|
Staaf J, Häkkinen J, Hegardt C, Saal LH, Kimbung S, Hedenfalk I, et al. RNA sequencing-based single sample predictors of molecular subtype and risk of recurrence for clinical assessment of early-stage breast cancer. NPJ Breast Cancer. 2022;8(1):27. https://doi.org/10.1038/s41523-022-00465-3
library(BreastSubtypeR)
data("Gene.ID.ann")
Starts the Shiny UI bundled with the BreastSubtypeR package.
The launcher can optionally attach shiny and bslib so that the UI and server code can call
unqualified functions such as tags, icon, and fileInput.
iBreastSubtypeR(
  attach = c("shiny", "bslib"),
  attach_tidyverse = FALSE,
  max_upload_mb = 1000
)
| attach | Character vector of packages to attach before launch. Defaults to c("shiny","bslib"). Set to character(0) to skip attaching. |
|---|---|
| attach_tidyverse | Logical; if TRUE and tidyverse is installed, it is attached quietly for the session (purely optional). |
| max_upload_mb | Numeric; Shiny upload size limit (in MB). Default 1000. |
The value returned by shiny::runApp() (usually invisible(NULL)).
if (interactive()) {
  iBreastSubtypeR()
  iBreastSubtypeR(attach = character(0))
}
Preprocesses and maps gene expression input to prepare for intrinsic subtyping workflows (NC- and SSP-based).
Mapping(
  se_obj,
  RawCounts = FALSE,
  method = c("max", "mean", "median", "iqr", "stdev"),
  impute = TRUE,
  verbose = TRUE
)
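| se_obj | A SummarizedExperiment object containing the expression data to be mapped (see the supported input types below). |
|---|---|
| RawCounts | Logical. If TRUE, se_obj contains raw RNA-seq counts (with gene lengths); if FALSE, it contains pre-normalized log2-scale data. |
| method | Strategy for resolving duplicate probes/genes. One of "max", "mean", "median", "iqr", or "stdev". |
| impute | Logical. If TRUE, missing values are imputed. |
| verbose | Logical. If TRUE, progress messages are printed. |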
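Duplicate resolution can be sketched as a per-gene reduction. The helper below is hypothetical and its reading of the options is an assumption: "mean" and "median" average the duplicated rows, "iqr" and "stdev" keep the row with the largest IQR or standard deviation, and "max" keeps the row with the highest mean expression. Mapping() may define these options differently.

```r
# Hypothetical helper; the meaning given to each option is an assumption.
collapse_duplicates <- function(x, ids,
                                method = c("max", "mean", "median", "iqr", "stdev")) {
  method <- match.arg(method)
  reduce <- function(sub) {
    if (method == "mean")   return(colMeans(sub))
    if (method == "median") return(apply(sub, 2, median))
    # otherwise keep the single row with the highest score
    score <- switch(method,
                    max   = rowMeans(sub),       # highest mean expression
                    iqr   = apply(sub, 1, IQR),  # largest interquartile range
                    stdev = apply(sub, 1, sd))   # largest standard deviation
    sub[which.max(score), ]
  }
  groups <- split(seq_len(nrow(x)), ids)
  out <- t(vapply(groups, function(i) reduce(x[i, , drop = FALSE]),
                  numeric(ncol(x))))
  colnames(out) <- colnames(x)
  out
}

x   <- rbind(p1 = c(1, 2, 3), p2 = c(10, 10, 10), p3 = c(5, 6, 7))
ids <- c("A", "A", "B")               # p1 and p2 map to the same gene
collapse_duplicates(x, ids, "mean")   # A: 5.5 6 6.5; B: 5 6 7
collapse_duplicates(x, ids, "stdev")  # A keeps p1 (the only varying row)
```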
Mapping() supports multiple input types:
Raw RNA-seq counts (with gene lengths): normalized to CPM (NC) or FPKM (SSP).
Precomputed log2(FPKM+1): used directly for NC; back-transformed for SSP.
log2-normalized microarray/nCounter data: used directly for NC; back-transformed for SSP.
This design allows users to supply a single expression format, while BreastSubtypeR automatically applies method-specific preprocessing.
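The scale conversions in this list can be written in a few lines of base R. The sketch assumes the standard FPKM definition (counts divided by gene length in kilobases and by library size in millions) and a log2(FPKM + 1) input; the helper names are hypothetical and are not Mapping() internals.

```r
# FPKM = counts / (gene length in kb) / (library size in millions)
counts_to_fpkm <- function(counts, length_bp) {
  lib_millions <- colSums(counts) / 1e6
  per_kb <- counts / (length_bp / 1e3)  # length_bp has one entry per gene (row)
  sweep(per_kb, 2, lib_millions, "/")
}

# Back-transform log2(FPKM + offset) to linear-scale FPKM
log2fpkm_to_fpkm <- function(x, offset = 1) 2^x - offset

counts <- matrix(c(100, 300, 200, 600), nrow = 2,
                 dimnames = list(c("g1", "g2"), c("S1", "S2")))
fpkm <- counts_to_fpkm(counts, length_bp = c(1000, 3000))
fpkm                              # every entry is 250000
log2fpkm_to_fpkm(log2(fpkm + 1))  # recovers fpkm
```

For data stored as log2(FPKM+0.1), such as the OSLO2-EMIT0 example object, the corresponding offset would be 0.1.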
Phenodata normalization: If receptor fields are present, they are coerced to canonical encodings
(ER -> {ER+, ER-}, HER2 -> {HER2+, HER2-},
TN -> {TN, nonTN}). Ambiguous values (e.g., HER2 == "2+")
are left unchanged and emit a warning.
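A sketch of this coercion for ER and HER2, under the assumption that a small set of synonyms (for example "Positive", "neg", or "ER+") should be recognised. The synonym lists and the helper canonicalise_status() are illustrative only; unrecognised values such as HER2 "2+" are returned unchanged with a warning, matching the behaviour described above.

```r
# Hypothetical helper; the synonym lists are illustrative assumptions.
canonicalise_status <- function(v, marker) {
  pos <- c("positive", "pos", "+", tolower(paste0(marker, "+")))
  neg <- c("negative", "neg", "-", tolower(paste0(marker, "-")))
  key <- tolower(trimws(as.character(v)))
  out <- as.character(v)
  out[key %in% pos] <- paste0(marker, "+")
  out[key %in% neg] <- paste0(marker, "-")
  # anything else that is not missing stays as-is, with a warning
  unmapped <- !is.na(v) & !(key %in% c(pos, neg))
  if (any(unmapped)) {
    warning(marker, " values left unchanged: ",
            paste(unique(out[unmapped]), collapse = ", "))
  }
  out
}

canonicalise_status(c("Positive", "neg", "ER+", NA), "ER")  # "ER+" "ER-" "ER+" NA
canonicalise_status(c("pos", "2+", "Negative"), "HER2")     # warns; "2+" is kept
```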
A named list with:
se_NC: SummarizedExperiment holding log2-transformed data prepared for NC-based methods
(assay name: counts).
se_SSP: SummarizedExperiment holding linear-scale data prepared for SSP-based methods
(assay name: counts).
Yang Q, Hartman J, Sifakis EG. BreastSubtypeR: A Unified R/Bioconductor Package for Intrinsic Molecular Subtyping in Breast Cancer Research. NAR Genomics and Bioinformatics. 2025. https://doi.org/10.1093/nargab/lqaf131. Selected as Editor’s Choice.
if (requireNamespace("SummarizedExperiment", quietly = TRUE)) {
  # Using example raw RNA-seq counts (with gene lengths)
  data("TCGABRCAobj")
  se_obj_counts <- TCGABRCAobj$se_obj[, 1:3] # tiny subset to keep checks fast
  res <- Mapping(se_obj_counts, RawCounts = TRUE)

  # Using example pre-normalized log2(FPKM+0.1)
  data("OSLO2EMIT0obj")
  se_obj_fpkm <- OSLO2EMIT0obj$se_obj[, 1:3] # tiny subset to keep checks fast
  res <- Mapping(se_obj_fpkm, RawCounts = FALSE)
}
Example object derived from the OSLO2-EMIT0 cohort (Staaf et al., 2022).
Includes a subset of normalized expression data, clinical metadata, feature
annotations, and example outputs from Mapping() and BS_Multi().
data("OSLO2EMIT0obj")
A list with:
se_obj: A SummarizedExperiment containing a subset of the
log2-transformed, normalized expression matrix (log2(FPKM+0.1)) with colData clinical
metadata and row-level feature annotations.
data_input: Example output structure produced by Mapping().
res: Example results from BS_Multi() run in AUTO mode.
Staaf J, Häkkinen J, Hegardt C, Saal LH, Kimbung S, Hedenfalk I, et al. RNA sequencing-based single sample predictors of molecular subtype and risk of recurrence for clinical assessment of early-stage breast cancer. NPJ Breast Cancer. 2022;8(1):27. https://doi.org/10.1038/s41523-022-00465-3
library(BreastSubtypeR)
data("OSLO2EMIT0obj")
List of 11 single-sample predictor (SSP) models from Staaf et al. (2022),
indexed by short names used by sspbc.
data("sspbc.models")
An object of class list of length 11.
Names correspond to short model identifiers. The contents are identical to
sspbc.models.fullname, which uses full model names.
| sspbc.models | Named list of 11 SSP models used by BS_sspbc(). |
|---|---|
Staaf J, Häkkinen J, Hegardt C, Saal LH, Kimbung S, Hedenfalk I, et al. RNA sequencing-based single sample predictors of molecular subtype and risk of recurrence for clinical assessment of early-stage breast cancer. NPJ Breast Cancer. 2022;8(1):27. https://doi.org/10.1038/s41523-022-00465-3
library(BreastSubtypeR)
data("sspbc.models")
List of the same 11 SSP models (Staaf et al., 2022) indexed by full model names.
data("sspbc.models.fullname")
An object of class list of length 11.
Identical content to sspbc.models but with full model names as list keys.
| sspbc.models.fullname | Named list of 11 SSP models used by BS_sspbc(), keyed by full model names. |
|---|---|
Staaf J, Häkkinen J, Hegardt C, Saal LH, Kimbung S, Hedenfalk I, et al. RNA sequencing-based single sample predictors of molecular subtype and risk of recurrence for clinical assessment of early-stage breast cancer. NPJ Breast Cancer. 2022;8(1):27. https://doi.org/10.1038/s41523-022-00465-3
library(BreastSubtypeR)
data("sspbc.models.fullname")
Example object derived from TCGA-BRCA. Includes a subset of raw counts with clinical metadata
(as a SummarizedExperiment), and example outputs from Mapping()
and BS_Multi() to facilitate runnable examples.
data("TCGABRCAobj")
A list with:
se_obj: A SummarizedExperiment containing the integer raw-count matrix
(top 5,000 variable genes), rowData with probe, SYMBOL, ENTREZID,
Length, and colData with PatientID, ER, PR, HER2.
data_input: Example Mapping() output created from se_obj.
res: Example BS_Multi() results (e.g., run in AUTO mode).
The Cancer Genome Atlas (TCGA) BRCA via GDC; counts summarized with recount3;
clinical data retrieved with TCGAbiolinks.
The Cancer Genome Atlas Network. Comprehensive molecular portraits of human breast tumors. Nature. 2012;490(7418):61–70. https://doi.org/10.1038/nature11412
Colaprico A, Silva TC, Olsen C, Garofano L, Cava C, Garolini D, et al. TCGAbiolinks: an R/Bioconductor package for integrative analysis of TCGA data. Nucleic Acids Res. 2016;44(8):e71. https://doi.org/10.1093/nar/gkv1507
Collado-Torres L, Nellore A, Kammers K, Ellis SE, Taub MA, Hansen KD, et al. Reproducible RNA-seq analysis using recount2. Nat Biotechnol. 2017;35(4):319–321. https://doi.org/10.1038/nbt.3838
library(BreastSubtypeR)
data("TCGABRCAobj")
names(TCGABRCAobj)
# str(TCGABRCAobj$se_obj); head(colData(TCGABRCAobj$se_obj))
This function generates a boxplot to visualize the correlation distribution between different subtypes of breast cancer, based on the provided correlation table and subtype information.
Vis_boxplot(out, correlations)
| out | A data frame containing the columns PatientID and Subtype (see the example below). |
|---|---|
| correlations | A data frame or matrix containing the correlation values computed from NC-based methods. |
A ggplot object representing the boxplot visualization of the
correlation distributions across the different subtypes.
data("OSLO2EMIT0obj")
res <- OSLO2EMIT0obj$res

# Prepare data: Subtype information and correlation matrix
out <- data.frame(
  PatientID = res$results$genefu.robust$BS.all$PatientID,
  Subtype = res$results$genefu.robust$BS.all$BS
)
correlations <- res$results$genefu.robust$outList$distances

# Generate the boxplot
p <- Vis_boxplot(out, correlations)
plot(p)
This function generates a heatmap to visualize gene expression patterns across breast cancer subtypes, based on the provided gene expression matrix and subtype information.
Vis_heatmap(x, out)
| x | A gene expression matrix, where genes are rows and samples are columns. The data should be log2 transformed. |
|---|---|
| out | A data frame containing two columns: PatientID and Subtype (see the example below). |
A ggplot or heatmap object (depending on implementation)
representing the heatmap of gene expression across different subtypes.
library(SummarizedExperiment)
data("OSLO2EMIT0obj")
res <- OSLO2EMIT0obj$res

# Prepare data: Gene expression matrix and subtype information
x <- assay(OSLO2EMIT0obj$data_input$se_NC)
out <- data.frame(
  PatientID = res$results$genefu.robust$BS.all$PatientID,
  Subtype = res$results$genefu.robust$BS.all$BS
)

# Generate the heatmap
p <- Vis_heatmap(x, out)
plot(p)
This function generates a heatmap to visualize breast cancer subtypes classified by multiple subtyping methods. It helps users compare how different methods assign subtypes to the same set of samples.
Vis_Multi(data)
| data | Output of the BS_Multi() function (the res_subtypes element, as in the example below). |
|---|---|
Returns a heatmap visualizing the subtype classifications across multiple methods.
data("OSLO2EMIT0obj")
# Assuming `OSLO2EMIT0obj$res$res_subtypes` contains multi-method subtype results
p <- Vis_Multi(OSLO2EMIT0obj$res$res_subtypes)
plot(p)
This function generates a PCA plot to visualize the principal components of gene expression data, colored by the assigned subtypes. Optionally, it can display a scree plot of eigenvalues to evaluate the explained variance.
Vis_PCA(x, out, Eigen = FALSE)
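| x | A gene expression matrix, where genes are rows and samples are columns. The data should be log2 transformed. |
|---|---|
| out | A data frame containing two columns: PatientID and Subtype (see the example below). |
| Eigen | Logical. If TRUE, a scree plot of the eigenvalues is also displayed. Default: FALSE. |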
A ggplot object representing the PCA plot, colored by subtype. If
Eigen is set to TRUE, a scree plot of the eigenvalues is also included.
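The quantities behind such a plot can be computed with stats::prcomp(): samples become rows, the scores on the first two components give the PCA coordinates, and the squared standard deviations are the eigenvalues shown in a scree plot. Whether Vis_PCA() centres or scales the data in the same way is an assumption, and pca_summary() is a hypothetical helper.

```r
# Hypothetical helper: PCA scores and eigenvalues for a genes x samples matrix.
pca_summary <- function(x) {
  pc  <- prcomp(t(x), center = TRUE, scale. = FALSE)  # samples as rows
  eig <- pc$sdev^2                                     # eigenvalues (variances)
  list(scores      = pc$x[, 1:2, drop = FALSE],        # PC1/PC2 coordinates per sample
       eigenvalues = eig,
       prop_var    = eig / sum(eig))                   # values for a scree plot
}

set.seed(3)
x <- matrix(rnorm(50 * 10), nrow = 50)  # 50 genes, 10 samples (invented)
s <- pca_summary(x)
round(s$prop_var, 3)
```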
library(SummarizedExperiment)
data("OSLO2EMIT0obj")
res <- OSLO2EMIT0obj$res

# Prepare data: Gene expression matrix and subtype information
x <- assay(OSLO2EMIT0obj$data_input$se_NC)
out <- data.frame(
  PatientID = res$results$genefu.robust$BS.all$PatientID,
  Subtype = res$results$genefu.robust$BS.all$BS
)

# Generate the PCA plot
p <- Vis_PCA(x = x, out = out)
plot(p)

# Generate PCA plot with scree plot of eigenvalues
p_with_eigen <- Vis_PCA(x = x, out = out, Eigen = TRUE)
plot(p_with_eigen)
This function generates a pie chart to visualize the
distribution of breast cancer subtypes in a cohort, based on the provided
Subtype data.
Vis_pie(out)
| out | A data frame containing two columns: PatientID and Subtype (see the example below). |
|---|---|
A ggplot object representing a pie chart showing the proportion of
each subtype in the dataset.
data("OSLO2EMIT0obj")
res <- OSLO2EMIT0obj$res

# Prepare data: Subtype information
out <- data.frame(
  PatientID = res$results$genefu.robust$BS.all$PatientID,
  Subtype = res$results$genefu.robust$BS.all$BS
)

# Generate the pie chart
p <- Vis_pie(out = out)
plot(p)