Title: | AIMS : Absolute Assignment of Breast Cancer Intrinsic Molecular Subtype |
---|---|
Description: | This package contains the AIMS implementation. It provides the functions needed to assign the five intrinsic molecular subtypes (Luminal A, Luminal B, Her2-enriched, Basal-like, Normal-like). Assignments can be made on individual samples as well as on datasets of gene expression data. |
Authors: | Eric R. Paquet, Michael T. Hallett |
Maintainer: | Eric R Paquet <[email protected]> |
License: | Artistic-2.0 |
Version: | 1.39.0 |
Built: | 2024-11-25 05:56:10 UTC |
Source: | https://github.com/bioc/AIMS |
This is the model definition for AIMS. It contains the naive Bayes classifier built from the 100 rules described in Paquet et al. "Absolute assignment of breast cancer intrinsic molecular subtype" (under review at JNCI).
AIMSmodel
This is the AIMS model, defined using 100 simple rules of the form gene A < gene B combined within a naive Bayes classifier from the e1071 package (Paquet et al. under review at JNCI).
Briefly, using a suitably large training dataset (~5,000 breast cancer gene expression profiles), the approach identifies a small set of simple binary rules (~20) that examine the raw expression measurements for pairs of genes from a single breast cancer patient, and only that patient. The binary rules are of the form "if the expression of gene x is greater than that of gene y, then tend to assign subtype z to that patient". Subtypes can be Basal, Her2, LumA, LumB, or Normal. The binary rules are combined into a single estimate of a patient's subtype through one probabilistic model, built with naiveBayes from e1071. Because only the expression levels of genes within a single patient are considered, the method is a promising approach to eliminate the instability caused by relativistic approaches, whose assignments depend on the other samples in the dataset (Paquet et al. in review at JNCI).
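To make the combination step more concrete, here is a minimal base-R sketch of how 0/1 rule outcomes for one patient could be turned into subtype posterior probabilities with a naive Bayes model. It is only an illustration under stated assumptions, not the AIMS model itself (which relies on `naiveBayes` from e1071): the priors, the per-rule probabilities, and the helper name `nb_posterior` are all hypothetical toy values chosen for readability.

```r
# Toy illustration with made-up numbers (not the AIMS model): combining
# binary rule outcomes into subtype posteriors with naive Bayes.
subtypes <- c("Basal", "Her2", "LumA", "LumB", "Normal")

# Hypothetical prior probability of each subtype
prior <- c(Basal = 0.20, Her2 = 0.15, LumA = 0.35, LumB = 0.20, Normal = 0.10)

# Hypothetical P(rule is true | subtype): 3 rules (rows) x 5 subtypes (columns)
p_true <- matrix(c(0.90, 0.30, 0.10, 0.20, 0.40,
                   0.20, 0.85, 0.10, 0.30, 0.10,
                   0.10, 0.20, 0.80, 0.60, 0.50),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(paste0("rule", 1:3), subtypes))

# Hypothetical helper: posterior over subtypes for one sample's 0/1 rule
# outcomes, assuming the rules are independent given the subtype.
nb_posterior <- function(rules, p_true, prior) {
  # Per-rule likelihood: p if the rule is true, 1 - p if it is false.
  # The length-3 'rules' vector recycles down each column of p_true.
  lik <- p_true * rules + (1 - p_true) * (1 - rules)
  # Sum log-likelihoods over rules and add the log prior
  log_post <- colSums(log(lik)) + log(prior[colnames(p_true)])
  # Normalize in a numerically stable way so the posteriors sum to 1
  post <- exp(log_post - max(log_post))
  post / sum(post)
}

# One sample: rule 1 is true, rules 2 and 3 are false
post <- nb_posterior(c(1, 0, 0), p_true, prior)
round(post, 3)
names(which.max(post))  # "Basal"
```

Working in log space and subtracting the maximum before exponentiating keeps the products of many small probabilities from underflowing, which matters once the number of rules grows beyond a handful.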
Component | Description |
---|---|
all.pairs | The 100 rules in AIMS, in the form EntrezID of gene A < EntrezID of gene B. |
k | The selected number of optimal rules. For AIMS, we have shown it to be 20. |
one.vs.all.tsp | The naive Bayes classifier used in combination with the 100 rules. |
selected.pairs.list | The list of rules, sorted from the most to the least discriminating rule and subdivided by subtype. |
Eric R. Paquet ([email protected])
applyAIMS, mcgillExample, naiveBayes
library(AIMS)

## Load the AIMS model
data(AIMSmodel)

## List the top-scoring rules for the individual subtypes
AIMSmodel$selected.pairs.list

## List the probability tables of the classifier stored as element 20 of one.vs.all.tsp
AIMSmodel$one.vs.all.tsp[[20]]$table
Given a gene expression matrix D, in which rows correspond to genes and columns to samples, and a list of Entrez gene IDs, this function assigns the breast cancer molecular subtype using Absolute PAM50 (AIMS) (Paquet et al. under review at JNCI).
applyAIMS (eset, EntrezID)
Argument | Description |
---|---|
eset | An ExpressionSet (Biobase) or a gene expression matrix D. Rows are genes and columns are samples (tumors from breast cancer patients). |
EntrezID | A character vector of the Entrez IDs of the genes (rows) of matrix D. |
We defined Absolute assignment of breast cancer Intrinsic Molecular Subtypes (AIMS) to stabilize the current PAM50 assignments. The idea of the approach is to use simple rules of the form "if gene A expression < gene B expression, assign the sample to subtype X". Using these simple rules, we can assign a subtype using only the expression values of one patient. We have shown that AIMS recapitulates PAM50 subtype assignments and preserves the prognostic value of the subtypes. This function returns the subtype assignment as well as the posterior probabilities for all the subtypes.
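Why a single-sample rule is more stable than a cohort-relative step can be shown with toy numbers. The sketch below is an illustration under assumptions, not AIMS code: the gene names, the values, and the helpers `center` and `rule` are hypothetical, and per-gene median-centering stands in for the kind of cohort-dependent normalization that centroid-based classifiers typically require.

```r
# Toy data (hypothetical): the same tumor "s1" profiled in two cohorts
# with different companion samples. Rows are genes, columns are samples.
cohort1 <- cbind(s1 = c(5, 8), s2 = c(2, 3), s3 = c(4, 1))
cohort2 <- cbind(s1 = c(5, 8), t2 = c(20, 30), t3 = c(15, 25))
rownames(cohort1) <- c("geneA", "geneB")
rownames(cohort2) <- c("geneA", "geneB")

# Cohort-relative step: subtract each gene's median across the cohort
center <- function(m) sweep(m, 1, apply(m, 1, median))

# Within-sample rule "geneA < geneB": 1 if true, 0 if false, per sample
rule <- function(m) ifelse(m["geneA", ] < m["geneB", ], 1L, 0L)

centered1 <- center(cohort1)[, "s1"]  # geneA = 1, geneB = 5
centered2 <- center(cohort2)[, "s1"]  # geneA = -10, geneB = -17
rule(cohort1)[["s1"]]                 # 1
rule(cohort2)[["s1"]]                 # 1
```

The centered values for `s1` change with its companions, while the rule outcome for `s1` reads only its own column and stays the same in both cohorts.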
Component | Description |
---|---|
cl | Subtypes identified by AIMS: "Basal", "Her2", "LumA", "LumB" or "Normal". |
prob | A vector of the posterior probabilities of the subtypes in cl. |
all.probs | A matrix of the posterior probabilities for all samples and all subtypes. |
rules.matrix | The matrix of the 100 rules used by AIMS to assign the breast cancer subtypes. Rows correspond to rules and columns to samples. This is a 0/1 matrix in which 1 means the rule is true and 0 means the rule is false. |
data.used | The expression values used to evaluate the simple rules. |
EntrezID.used | The list of Entrez IDs used by AIMS. |
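The layout of rules.matrix (rules in rows, samples in columns, 0/1 entries) can be illustrated with a small base-R sketch that evaluates rules of the form gene A < gene B on every sample at once. This is an assumption-based illustration, not the code inside applyAIMS: the function `evaluate_rules`, the encoding of rules as a two-column matrix of Entrez IDs, and the placeholder IDs and expression values are all hypothetical.

```r
# Hypothetical sketch (not AIMS internals): evaluate rules "gene A < gene B"
# for all samples, giving a 0/1 matrix with rules in rows, samples in columns.
# Rules are assumed to be a two-column character matrix of Entrez IDs.
evaluate_rules <- function(D, EntrezID, rule_pairs) {
  rownames(D) <- EntrezID  # with duplicated IDs, the first matching row is used
  missing <- setdiff(c(rule_pairs[, 1], rule_pairs[, 2]), EntrezID)
  if (length(missing) > 0) {
    stop("Genes missing from D: ", paste(missing, collapse = ", "))
  }
  # Compare the gene A rows with the gene B rows across all samples at once
  out <- (D[rule_pairs[, 1], , drop = FALSE] <
          D[rule_pairs[, 2], , drop = FALSE]) * 1L
  rownames(out) <- paste(rule_pairs[, 1], rule_pairs[, 2], sep = "<")
  out
}

# Toy data: 3 genes (placeholder IDs) x 2 tumors, filled column by column:
# tumor1 = (1, 5, 3), tumor2 = (4, 2, 6)
D <- matrix(c(1, 5, 3,
              4, 2, 6), nrow = 3,
            dimnames = list(NULL, c("tumor1", "tumor2")))
rule_pairs <- rbind(c("101", "102"), c("102", "103"))
res <- evaluate_rules(D, c("101", "102", "103"), rule_pairs)
res  # rule 101<102: tumor1 = 1, tumor2 = 0; rule 102<103: tumor1 = 0, tumor2 = 1
```

Because each column is evaluated on its own values only, subsetting the samples before or after calling the function gives the same entries, which mirrors the batch versus single-sample checks in the examples.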
Eric R. Paquet ([email protected])
Joel S. Parker, Michael Mullins, Maggie C.U. Cheang, Samuel Leung, David Voduc, Tammi Vickery, Sherri Davies, Christiane Fauron, Xiaping He, Zhiyuan Hu, John F. Quackenbush, Inge J. Stijleman, Juan Palazzo, J.S. Marron, Andrew B. Nobel, Elaine Mardis, Torsten O. Nielsen, Matthew J. Ellis, Charles M. Perou and Philip S. Bernard (2009) "Supervised Risk Predictor of Breast Cancer Based on Intrinsic Subtypes", Journal of Clinical Oncology, 27(8):1160–1167.
Donald Geman, Christian d'Avignon, Daniel Q. Naiman and Raimond L. Winslow (2004) "Classifying Gene Expression Profiles from Pairwise mRNA Comparisons", Statistical Applications in Genetics and Molecular Biology, 3: Article 19.
library(AIMS)
library(Biobase)

## Load the McGill dataset used in the paper
data(mcgillExample)

## Convert the expression matrix to an ExpressionSet
## (an expression matrix can also be passed directly)
mcgillExample$D <- ExpressionSet(assayData = mcgillExample$D)

## Assign AIMS subtypes on the McGill dataset
mcgill.AIMS.subtypes.batch <- applyAIMS(mcgillExample$D, mcgillExample$EntrezID)

## Print a summary of all the subtypes in the dataset
table(mcgill.AIMS.subtypes.batch$cl)

## We can do the same thing using only one sample
mcgill.AIMS.subtypes.first <- applyAIMS(mcgillExample$D[, 1, drop = FALSE],
                                        mcgillExample$EntrezID)
table(mcgill.AIMS.subtypes.first$cl)
if (mcgill.AIMS.subtypes.batch$cl[1] == mcgill.AIMS.subtypes.first$cl[1]) {
  message("Identical assignment batch and first sample")
} else {
  message("Different assignment batch and first sample")
}

## We can do the same thing for the first 20 samples
mcgill.AIMS.subtypes.first20 <- applyAIMS(mcgillExample$D[, 1:20, drop = FALSE],
                                          mcgillExample$EntrezID)
table(mcgill.AIMS.subtypes.first20$cl)
if (all(mcgill.AIMS.subtypes.batch$cl[1:20] == mcgill.AIMS.subtypes.first20$cl)) {
  message("Identical assignment batch and first 20")
} else {
  message("Different assignment batch and first 20")
}

## We can do the same thing using 50 randomly selected samples;
## no set.seed() is needed because each assignment should not depend
## on which other samples are selected
sel.rand.50 <- sample(1:ncol(mcgillExample$D), 50)
mcgill.AIMS.subtypes.rand50 <- applyAIMS(mcgillExample$D[, sel.rand.50, drop = FALSE],
                                         mcgillExample$EntrezID)
table(mcgill.AIMS.subtypes.rand50$cl)
if (all(mcgill.AIMS.subtypes.batch$cl[sel.rand.50] == mcgill.AIMS.subtypes.rand50$cl)) {
  message("Identical assignment batch and random 50")
} else {
  message("Different assignment batch and random 50")
}
This is a breast cancer gene expression dataset generated at McGill University using the Affymetrix Gene ST platform. It is the validation dataset used in the AIMS paper and in the examples of the AIMS package.
mcgillExample
A sample of a breast cancer gene expression dataset being generated at McGill University on the Affymetrix Gene ST platform.
Component | Description |
---|---|
D | A matrix of gene expression values; rows are genes and columns are samples. |
EntrezID | A vector of Entrez IDs corresponding to the genes (rows) of D. |
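Preparing your own data in the same shape as mcgillExample (a numeric matrix D with genes in rows, plus one Entrez ID per row) can be sketched with a small base-R check. The helper `check_aims_input` is hypothetical and not part of AIMS; it assumes numeric IDs should be converted to plain digit strings, since `applyAIMS` documents `EntrezID` as a character vector and `as.character()` would render an ID such as 100000 as "1e+05".

```r
# Hypothetical helper (not part of AIMS): check that an expression matrix
# and its Entrez IDs have the layout described for mcgillExample.
check_aims_input <- function(D, EntrezID) {
  if (!is.matrix(D) || !is.numeric(D)) stop("D must be a numeric matrix")
  if (length(EntrezID) != nrow(D)) stop("EntrezID needs one entry per row of D")
  if (is.numeric(EntrezID)) {
    # sprintf avoids scientific notation, e.g. as.character(100000) is "1e+05"
    EntrezID <- sprintf("%.0f", as.numeric(EntrezID))
  } else {
    EntrezID <- as.character(EntrezID)
  }
  dups <- unique(EntrezID[duplicated(EntrezID)])
  if (length(dups) > 0) {
    warning("Duplicated Entrez IDs: ", paste(dups, collapse = ", "))
  }
  list(D = D, EntrezID = EntrezID)
}

# Toy input: 3 genes x 2 samples with numeric placeholder Entrez IDs
toy <- check_aims_input(matrix(runif(6), nrow = 3), c(101, 102, 100000))
toy$EntrezID  # "101" "102" "100000"
```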
Eric R. Paquet ([email protected])
library(AIMS)

## Load a sample of the McGill dataset used in the paper
data(mcgillExample)

## Dimensions of the gene expression matrix
dim(mcgillExample$D)

## Number of Entrez IDs
length(mcgillExample$EntrezID)