Title: | A DNA methylation classifier tool for the identification of pediatric brain tumor subtypes |
---|---|
Description: | Classification of pediatric tumors into biologically defined subtypes is challenging, and multifaceted approaches are needed. For this aim, we developed a diagnostic classifier based on DNA methylation profiles. We offer MethPed as an easy-to-use toolbox that allows researchers and clinical diagnosticians to test single samples as well as large cohorts for subclass prediction of pediatric brain tumors. The current version of MethPed can classify the following tumor diagnoses/subgroups: Diffuse Intrinsic Pontine Glioma (DIPG), Ependymoma, Embryonal tumors with multilayered rosettes (ETMR), Glioblastoma (GBM), Medulloblastoma (MB) - Group 3 (MB_Gr3), Group 4 (MB_Gr4), Group WNT (MB_WNT), Group SHH (MB_SHH) and Pilocytic Astrocytoma (PiloAstro). |
Authors: | Mohammad Tanvir Ahamed [aut, trl], Anna Danielsson [aut], Szilárd Nemes [aut, trl], Helena Carén [aut, cre, cph] |
Maintainer: | Helena Carén <[email protected]> |
License: | GPL-2 |
Version: | 1.35.0 |
Built: | 2024-11-23 06:19:43 UTC |
Source: | https://github.com/bioc/MethPed |
Check missing values in dataset
checkNA(data)
data |
Object in class data.frame or matrix or ExpressionSet. |
#################### Loading and view sample data
data(MethPed_sample)
head(MethPed_sample, 10)
#################### Check missing value
checkNA(MethPed_sample)
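As a rough illustration of what a missing-value check on beta-value data involves, the base R sketch below flags probes and samples with `NA` entries. It is not the implementation of `checkNA`; the toy data frame and the names `toy`, `na_probes` and `na_per_sample` are hypothetical.

```r
# Hypothetical toy input: a "TargetID" probe column plus one column per sample.
toy <- data.frame(
  TargetID = c("cg0001", "cg0002", "cg0003", "cg0004"),
  Tumor_A  = c(0.12, NA, 0.87, 0.45),
  Tumor_B  = c(0.10, 0.55, NA, 0.40),
  stringsAsFactors = FALSE
)

# Numeric matrix of beta-values only (drop the probe-name column).
beta <- as.matrix(toy[, -1])

# Probes (rows) with a missing beta-value in at least one sample.
has_na    <- rowSums(is.na(beta)) > 0
na_probes <- toy$TargetID[has_na]

# Number of missing values per sample (column).
na_per_sample <- colSums(is.na(beta))

print(na_probes)
print(na_per_sample)
```

Running a check like this before classification shows whether imputation or probe removal is needed.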
The current version of MethPed can classify the following tumor diagnoses/subgroups: Diffuse Intrinsic Pontine Glioma (DIPG), Ependymoma, Embryonal tumors with multilayered rosettes (ETMR), Glioblastoma (GBM), Medulloblastoma (MB) - Group 3 (MB_Gr3), Group 4 (MB_Gr4), Group WNT (MB_WNT), Group SHH (MB_SHH) and Pilocytic Astrocytoma (PiloAstro).
MethPed(data, TargetID="TargetID", prob = TRUE,...)
data |
Data for classification. Data can be in class "ExpressionSet", "matrix" or "data.frame". See MethPed vignette for details. |
TargetID |
Name of the "Probe" column in data |
prob |
If 'TRUE' (the default), the return value contains, for each sample, the conditional probability of belonging to each tumor group. If 'FALSE', only the maximum conditional probability per sample is returned and all other values are set to zero. |
... |
Additional arguments. |
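The effect described for `prob = FALSE` can be pictured with a small base R sketch. This only illustrates the documented behaviour on a hypothetical probability matrix; `keep_max` and `probs` are made-up names and the code is not taken from the package.

```r
# Hypothetical conditional probabilities: rows are samples, columns are subtypes.
probs <- matrix(
  c(0.70, 0.20, 0.10,
    0.05, 0.15, 0.80),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("Tumor_A", "Tumor_B"), c("DIPG", "GBM", "MB_WNT"))
)

# Keep only the largest probability in each row and set the rest to zero.
keep_max <- function(p) {
  top <- max.col(p, ties.method = "first")   # column index of each row maximum
  out <- matrix(0, nrow(p), ncol(p), dimnames = dimnames(p))
  idx <- cbind(seq_len(nrow(p)), top)        # (row, column) pairs of the maxima
  out[idx] <- p[idx]
  out
}

masked <- keep_max(probs)
print(masked)
```

With `prob = TRUE` the full matrix is what one would inspect; the masked form is convenient when only a single call per sample is wanted.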
Classification of pediatric tumors into biologically defined subtypes is challenging, and multifaceted approaches are needed. For this aim, we developed a diagnostic classifier based on DNA methylation profiles. We offer MethPed as an easy-to-use toolbox that allows researchers and clinical diagnosticians to test single samples as well as large cohorts for subclass prediction of pediatric brain tumors.
The MethPed classifier uses the Random Forest (RF) algorithm to classify unknown pediatric brain tumor samples into sub-types. The classification proceeds with the selection of the beta values needed for the classification.
The computational process proceeds in two stages. The first stage reduces the probe pool to build the training probe dataset for classification. We call this dataset “predictors”. The second stage applies the RF algorithm to classify the probe data of interest based on the training probe dataset (predictors).
For the construction of the training probe pool (predictors), methylation data generated by the Illumina Infinium HumanMethylation 450 BeadChip arrays were downloaded from the Gene Expression Omnibus (GEO). Four hundred seventy-two cases were available, representing several brain tumor diagnoses (DIPG, glioblastoma, ETMR, medulloblastoma, ependymoma, pilocytic astrocytoma) and their further subgroups.
The data sets were merged, and probes that did not appear in all data sets were filtered out. In addition, about 190,000 CpGs were removed due to SNPs, repeats and multiple mapping sites. The final data set contained 206,823 unique probes and nine tumor classes, including the medulloblastoma subgroups. K-nearest-neighbor imputation was used for missing probe data.
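The merge-and-filter step can be sketched in base R as follows. This is a minimal illustration on hypothetical matrices (`set1`, `set2`, `set3`), assuming probes are stored as row names; it does not reproduce the authors' actual pipeline, the SNP/repeat filtering, or the imputation.

```r
# Hypothetical data sets: probes in rows (row names), samples in columns.
set1 <- matrix(c(0.1, 0.2, 0.3, 0.4), nrow = 4,
               dimnames = list(c("cg01", "cg02", "cg03", "cg04"), "S1"))
set2 <- matrix(c(0.5, 0.6, 0.7), nrow = 3,
               dimnames = list(c("cg02", "cg03", "cg05"), "S2"))
set3 <- matrix(c(0.8, 0.9, 0.15), nrow = 3,
               dimnames = list(c("cg03", "cg02", "cg06"), "S3"))

datasets <- list(set1, set2, set3)

# Probes that appear in every data set.
common <- Reduce(intersect, lapply(datasets, rownames))

# Subset each data set to the shared probes (same order) and bind the samples.
merged <- do.call(cbind, lapply(datasets, function(d) d[common, , drop = FALSE]))
print(merged)
```

Indexing by probe name rather than by position keeps the rows aligned even when the data sets list their probes in different orders.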
After that, a large number of regression analyses were performed to select the 100 probes per tumor class that had the highest predictive power (AUC values). Based on the identified 900 methylation sites, the nine pediatric brain tumor types could be accurately classified using the multiclass classification algorithm MethPed.
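To make the idea of ranking probes by AUC concrete, the sketch below scores each probe for a one-vs-rest split using the rank-sum identity for the AUC. This swaps in a direct rank-based AUC for the regression analyses described above, and scoring by `max(AUC, 1 - AUC)` is an assumption made here so that both hypo- and hypermethylated probes count; `auc_one`, `top_probes` and the toy data are hypothetical.

```r
# AUC of one probe for a one-vs-rest split, via the rank-sum (Mann-Whitney) identity.
auc_one <- function(x, is_class) {
  r  <- rank(x)
  n1 <- sum(is_class)
  n0 <- sum(!is_class)
  (sum(r[is_class]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Names of the `n_top` probes that best separate `class` from all other samples.
top_probes <- function(beta, labels, class, n_top) {
  is_class <- labels == class
  auc   <- apply(beta, 1, auc_one, is_class = is_class)
  score <- pmax(auc, 1 - auc)   # assumption: direction of the difference is ignored
  names(sort(score, decreasing = TRUE))[seq_len(n_top)]
}

# Hypothetical toy data: 3 probes x 6 samples, two classes.
beta <- rbind(
  cg01 = c(0.90, 0.80, 0.85, 0.10, 0.20, 0.15),  # separates the classes
  cg02 = c(0.50, 0.40, 0.60, 0.45, 0.55, 0.50),  # uninformative
  cg03 = c(0.30, 0.70, 0.20, 0.60, 0.40, 0.50)   # weak signal
)
labels <- c("GBM", "GBM", "GBM", "DIPG", "DIPG", "DIPG")

best <- top_probes(beta, labels, class = "GBM", n_top = 1)
print(best)
```

Repeating such a selection with `n_top = 100` for each of the nine classes would give a 900-probe predictor set of the kind described above.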
The output of the algorithm is partitioned into six parts:
Probe names in the main data set
Number of probes
Number of samples
Names of the probes that were included in the original classifier but are missing from the data at hand.
If a large number of probes are missing, this could lead to a drop in the predictive accuracy of the model. Note that the out-of-bag error of the original MethPed classifier is 1.7 percent.
The last and most interesting output contains the predictions. We chose not to assign each tumor to a single group but to give the probabilities of its belonging to each subtype. See the vignette for details.
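Because the predictions are probabilities rather than hard labels, a user who needs one call per sample has to derive it. The base R sketch below shows one possible way on a hypothetical matrix shaped like the predictions output; the 0.5 cut-off is an arbitrary illustrative value, not a recommendation from the package.

```r
# Hypothetical prediction matrix: samples in rows, subtypes in columns.
predictions <- matrix(
  c(0.92, 0.05, 0.03,
    0.40, 0.35, 0.25),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("Tumor_A", "Tumor_B"),
                  c("Ependymoma", "MB_SHH", "PiloAstro"))
)

# Column index of the most probable subtype for each sample.
best <- max.col(predictions, ties.method = "first")

calls <- data.frame(
  sample      = rownames(predictions),
  subtype     = colnames(predictions)[best],
  probability = predictions[cbind(seq_len(nrow(predictions)), best)],
  stringsAsFactors = FALSE
)

# Flag calls whose top probability falls below an arbitrary cut-off (assumption).
calls$uncertain <- calls$probability < 0.5
print(calls)
```

Keeping the probability next to the call makes it easy to see which samples deserve a closer look.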
Anna Danielsson, Mohammad Tanvir Ahamed, Szilárd Nemes and Helena Carén, University of Gothenburg, Sweden.
[1] Anna Danielsson, Szilárd Nemes, Magnus Tisell, Birgitta Lannering, Claes Nordborg, Magnus Sabel, and Helena Carén. "MethPed: A DNA Methylation Classifier Tool for the Identification of Pediatric Brain Tumor Subtypes". Clinical Epigenetics 7:62, 2015.
[2] Breiman, Leo (2001). "Random Forests". Machine Learning 45 (1): 5-32. doi:10.1023/A:1010933404324
[3] Troyanskaya, O., M. Cantor, G. Sherlock, P. Brown, T. Hastie, R. Tibshirani, D. Botstein, and R. B. Altman. "Missing Value Estimation Methods for DNA Microarrays." Bioinformatics 17.6 (2001): 520-25.
See http://www.clinicalepigeneticsjournal.com/content/7/1/62 for more details.
#################### Loading and view sample data
data(MethPed_sample)
head(MethPed_sample, 10)
#################### Check dimension of sample data
dim(MethPed_sample) # Check number of probes and samples in data
#################### Checking missing value in the data
missingIndex <- checkNA(MethPed_sample)
#################### Apply MethPed to sample data (Probability for all tumor group)
myClassification <- MethPed(MethPed_sample)
myClassification <- MethPed(MethPed_sample, prob = TRUE)
#################### Apply MethPed to sample data (Only maximum probability expected)
myClassification_max <- MethPed(MethPed_sample, prob = FALSE)
#################### Summary of results
summary(myClassification)
#################### Barplot of conditional prediction probability on different samples
par(mai = c(1, 1, 1, 2), xpd = TRUE)
mat <- t(myClassification$predictions)
mycols <- c("green", rainbow(nrow(mat), start = 0, end = 1)[nrow(mat):1], "red")
barplot(mat, col = mycols, beside = FALSE, axisnames = TRUE,
        ylim = c(0, 1), xlab = "Sample", ylab = "Probability")
legend(ncol(mat) + 0.5, 1, legend = rownames(mat), fill = mycols, xpd = TRUE, cex = 0.6)
## Generic function to plot
plot(myClassification) # myClassification should be an object in "methped" class
List of 900 probes in the predictor (Training data for Random Forest model)
data(MethPed_900probes)
Data frame of 900 probes.
For the construction of the training probe pool (predictors), methylation data generated by the Illumina Infinium HumanMethylation 450 BeadChip arrays were downloaded from the Gene Expression Omnibus (GEO). Four hundred seventy-two cases were available, representing several brain tumor diagnoses (DIPG, glioblastoma, ETMR, medulloblastoma, ependymoma, pilocytic astrocytoma) and their further subgroups.
The data sets were merged, and probes that did not appear in all data sets were filtered out. In addition, about 190,000 CpGs were removed due to SNPs, repeats and multiple mapping sites. The final data set contained 206,823 unique probes and nine tumor classes, including the medulloblastoma subgroups. K-nearest-neighbor imputation was used for missing probe data.
After that, a large number of regression analyses were performed to select the 100 probes per tumor class that had the highest predictive power (AUC values). Based on the identified 900 methylation sites, the nine pediatric brain tumor types could be accurately classified using the multiclass classification algorithm MethPed.
450k methylation array probe name
[1] Anna Danielsson, Szilárd Nemes, Magnus Tisell, Birgitta Lannering, Claes Nordborg, Magnus Sabel, and Helena Carén. "MethPed: A DNA Methylation Classifier Tool for the Identification of Pediatric Brain Tumor Subtypes". Clinical Epigenetics 7:62, 2015.
See http://www.clinicalepigeneticsjournal.com/content/7/1/62 for more details.
#################### Loading and view sample data
data(MethPed_900probes)
head(MethPed_900probes)
Methylation beta-values generated with the Infinium HumanMethylation450 BeadChips (Illumina).
data(MethPed_sample)
A data frame with 468821 probes and 2 tumor samples
DNA Methylation beta-values
[1] Anna Danielsson, Szilárd Nemes, Magnus Tisell, Birgitta Lannering, Claes Nordborg, Magnus Sabel, and Helena Carén. "MethPed: A DNA Methylation Classifier Tool for the Identification of Pediatric Brain Tumor Subtypes". Clinical Epigenetics 7:62, 2015.
See http://www.clinicalepigeneticsjournal.com/content/7/1/62 for more details.
#################### Loading and view sample data
data(MethPed_sample)
head(MethPed_sample)
#################### Check dimension of sample data
dim(MethPed_sample) # Check number of probes and samples in data
#################### Checking missing value in the data
missingIndex <- checkNA(MethPed_sample)
Plot the conditional probabilities of samples belonging to different tumor subtypes.
## S3 method for class 'methped'
plot(x, ...)
x |
Object in "methped" class. Output of function MethPed. |
... |
More arguments from function barplot. |
Object in "methped" class. Output of function MethPed.
#################### Loading sample data
data(MethPed_sample)
#################### Applying MethPed to sample data
res <- MethPed(MethPed_sample)
#################### Plot conditional probability
plot(res)
Missing probe names from training data compared with input samples.
probeMis(x)
x |
Object in "methped" class. Output of function MethPed. |
Object in "methped" class. Output of function MethPed.
#################### Loading sample data
data(MethPed_sample)
#################### Applying MethPed to sample data
res <- MethPed(MethPed_sample)
#################### The names of the probes that were included in the original
# classifier but are missing from the data at hand
probeMis(res)
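Conceptually, the missing-probe report is a set difference between the classifier's probes and the probes present in the input. The base R sketch below illustrates that idea with hypothetical probe names; it is not the code behind `probeMis`.

```r
# Hypothetical probe names: classifier probes vs. probes present in the input data.
predictor_probes <- c("cg01", "cg02", "cg03", "cg04", "cg05")
input_probes     <- c("cg02", "cg03", "cg05", "cg99")

# Probes the classifier expects but the input does not provide.
missing_probes <- setdiff(predictor_probes, input_probes)

# Share of classifier probes that are missing; a large share may reduce accuracy.
missing_fraction <- length(missing_probes) / length(predictor_probes)

print(missing_probes)
print(missing_fraction)
```

Checking the missing share before interpreting predictions helps judge how far the reported out-of-bag error still applies.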
Summary of conditional probability or binary classification of samples that belong to different tumor subtypes.
## S3 method for class 'methped'
summary(object, ...)
object |
Object in "methped" class. Output of function MethPed. |
... |
Additional arguments affecting the summary produced |
Object in "methped" class. Output of function MethPed.
#################### Loading sample data
data(MethPed_sample)
#################### Applying MethPed to sample data
res <- MethPed(MethPed_sample)
#################### Summary function of MethPed output
summary(res)