Title: | Identify oncogenes and tumor suppressor genes from omics data |
---|---|
Description: | The understanding of cancer mechanism requires the identification of genes playing a role in the development of the pathology and the characterization of their role (notably oncogenes and tumor suppressors). We present an updated version of the R/bioconductor package called MoonlightR, namely Moonlight2R, which returns a list of candidate driver genes for specific cancer types on the basis of omics data integration. The Moonlight framework contains a primary layer where gene expression data and information about biological processes are integrated to predict genes called oncogenic mediators, divided into putative tumor suppressors and putative oncogenes. This is done through functional enrichment analyses, gene regulatory networks and upstream regulator analyses to score the importance of well-known biological processes with respect to the studied cancer type. By evaluating the effect of the oncogenic mediators on biological processes or through random forests, the primary layer predicts two putative roles for the oncogenic mediators: i) tumor suppressor genes (TSGs) and ii) oncogenes (OCGs). As gene expression data alone is not enough to explain the deregulation of the genes, a second layer of evidence is needed. We have automated the integration of a secondary mutational layer through new functionalities in Moonlight2R. These functionalities analyze mutations in the cancer cohort and classifies these into driver and passenger mutations using the driver mutation prediction tool, CScape-somatic. Those oncogenic mediators with at least one driver mutation are retained as the driver genes. As a consequence, this methodology does not only identify genes playing a dual role (e.g. TSG in one cancer type and OCG in another) but also helps in elucidating the biological processes underlying their specific roles. In particular, Moonlight2R can be used to discover OCGs and TSGs in the same cancer type. This may for instance help in answering the question whether some genes change role between early stages (I, II) and late stages (III, IV). In the future, this analysis could be useful to determine the causes of different resistances to chemotherapeutic treatments. An additional mechanistic layer evaluates if there are mutations affecting the protein stability of the transcription factors (TFs) of the TSGs and OCGs, as that may have an effect on the expression of the genes. |
Authors: | Mona Nourbakhsh [aut], Astrid Saksager [aut], Nikola Tom [aut], Katrine Meldgård [aut], Anna Melidi [aut], Xi Steven Chen [aut], Antonio Colaprico [aut], Catharina Olsen [aut], Matteo Tiberti [cre, aut], Elena Papaleo [aut] |
Maintainer: | Matteo Tiberti <[email protected]> |
License: | GPL-3 |
Version: | 1.5.2 |
Built: | 2024-12-19 03:12:16 UTC |
Source: | https://github.com/bioc/Moonlight2R |
This function annotated a confidence level to the score
confidence(s, type)
confidence(s, type)
s |
the score |
type |
coding or noncoding |
returns a confidence level or remark/error message
remark <- confidence(0.8, type='Coding')
remark <- confidence(0.8, type='Coding')
Output from DMA. This contains the cscape-somatic annotations for all differentially expressed genes
data(cscape_somatic_output)
data(cscape_somatic_output)
A 645x7 matrix.
A 645x7 matrix.
The predicted driver genes, which have at least one driver mutation.
data(dataDMA)
data(dataDMA)
A list of two.
A list of two, containing 0 tumor-suppressor and 1 oncogene.
The output of the FEA function which does enrichment analysis
data(dataFEA)
data(dataFEA)
A dataframe of dimension 101x7
The input to the FEA is the differentially expressed genes.
A dataframe of dimension 101x7
A matrix that provides processed gene expression data (obtained from RNA seq) from the TCGA-LUAD project
data(dataFilt)
data(dataFilt)
A 3000x20 matrix
The matrix contains the genes in rows and samples in columns. The data has been downloaded and processed using TCGAbiolinks.
A 3000x20 matrix
A tibble containing results of literature search where predicted driver genes stored in dataDMA were queried for their role as drivers in PubMed
data(dataGLS)
data(dataGLS)
A 13x8 tibble.
The tibble contains PubMed IDs, doi, title, abstract, year of publication, keywords, and total number of publications for the genes.
A 13x8 tibble.
The predicted driver genes based on methylation evidence
data(dataGMA)
data(dataGMA)
A list of length two
The data is a list of two elements where each element represents predicted oncogenes and tumor suppressors
A list of length two
The output of the GRN function which finds connections between genes.
data(dataGRN)
data(dataGRN)
A list of 2 elements where the first element is a 23x613 matrix and the second element is a vector of length 23
The input to the GRN is the differentially expressed genes and the gene expression data.
A list of 2 elements where the first element is a 23x613 matrix and the second element is a vector of length 23
The output of the GRN function which finds connections between genes where the noise is set to 0 for testing reproducibility purposes.
data(dataGRN_no_noise)
data(dataGRN_no_noise)
A list of 2 elements where the first element is a 23x613 matrix and the second element is a vector of length 23
The input to the GRN is the differentially expressed genes and the gene expression data.
A list of 2 elements where the first element is a 23x613 matrix and the second element is a vector of length 23
An examplary MAF file from TCGA on lung cancer LUAD. It contains 500 randomly selected mutations.
data(dataMAF)
data(dataMAF)
A 500x141 matrix.
A 500x141 matrix.
A data matrix containing methylation data from TCGA-LUAD where CpG probes are in rows and samples are in columns.
data(dataMethyl)
data(dataMethyl)
A 73x27 matrix.
The CpG probes are in rows and samples are in columns.
A 73x27 matrix.
The predicted TSGs and OCGs and their moonlight gene z-score based on the small sample TCGA-LUAD data. The PRA() were run with expert-based approach with apoptosis and proliferation of cells.
data(dataPRA)
data(dataPRA)
A list of two.
A list of two.
The output of the URA function which carries out the upstream regulator analysis
data(dataURA)
data(dataURA)
A 23x2 matrix
The input to URA is the output of GRN and a list of biological processes and the differentially expressed genes
A 23x2 matrix
The output of the URA function which carries out the upstream regulator analysis
data(dataURA_plot)
data(dataURA_plot)
A 12x2 matrix
This URA data is used to showcase some of the visualization functions
A 12x2 matrix
Output file from running GMA function which is a summary of DEGs and associated CpG probes
data(DEG_Methylation_Annotations)
data(DEG_Methylation_Annotations)
A 3435x35 tibble
The data is a table where each row is a CpG probe in a DEG. Various annotations such as start/end site of CpG probe, promoter/enhancer annotations, NCG annotations are included in the table.
A 3435x35 tibble
Output from DMA. This contains the differentially expressed genes's mutations and all annotations generated in DMA() on the TCGA-LUAD project.
data(DEG_Mutations_Annotations)
data(DEG_Mutations_Annotations)
A 3561x173 matrix.
A 3561x173 matrix.
A matrix containing differentially expressed genes between lung cancer and normal samples found using TCGA-LUAD data and TCGAbiolinks.
data(DEGsmatrix)
data(DEGsmatrix)
A 3390x5 matrix
The matrix contains the differentially expressed genes in rows and log2 fold change and FDR values in columns.
A 3390x5 matrix
A dataset containing information about 101 cancer-related biological processes.
data(DiseaseList)
data(DiseaseList)
A list of 101 elements
The dataset contains a list of the 101 biological processes which includes genes playing a role in each biological processes including literature findings of the genes' function in the biological processes.
A list of 101 elements
This function carries out the driver mutation analysis.
DMA( dataMAF, dataDEGs, dataPRA, runCscape = TRUE, coding_file, noncoding_file, results_folder = "./DMAresults" )
DMA( dataMAF, dataDEGs, dataPRA, runCscape = TRUE, coding_file, noncoding_file, results_folder = "./DMAresults" )
dataMAF |
A MAF file rda object. The MAF file must at least contain the following columns:
|
dataDEGs |
Output DEA function. |
dataPRA |
Output PRA function. |
runCscape |
Bolean. If |
coding_file |
A character string. Path to and name of CScape-somatic coding file. Can be downloaded at http://cscape-somatic.biocompute.org.uk/#download. The .tbi file must be placed in the same folder. |
noncoding_file |
A charcter string. Path to and name of CScape-somatic noncoding file. Can be downloaded at http://cscape-somatic.biocompute.org.uk/#download. The .tbi file must be placed in the same folder. |
results_folder |
A character string. Path to the results generated by this function. |
For more information about the different annotations added to the mutations please see the documentation as followes:
data(NCG)
, data(EncodePromoters)
, data(LOC_protein)
data(LOC_transcription)
and data(LOC_translation)
.
List of two, containing TSGs and OCGs with at least one driver mutation. Additionally files are saved in results_folder
.
All output files are in compressed .rda format.
All differentially expressed genes’ mutations and their annotations. These annotations include e.g. Cscape-somatic assessment, Level of Consequence, overlab with promoter sites and information from Network of Cancer Genes (NCG 7.0). All information from MAF and DEA is contained.
All oncogenic mediators and an summarisation of their mutation based on CScape-somatic assessment, Level of Consequences and total number of mutations. If a gene as previously been assessed as a driver in Network of Cancer Genes (7.0), it is annotated in a separate column.
The file contain the cscape-somatic assessment for every mutation found in the differentially expressed genes. It is formatted exactly as the output of cscape-somatic, as if it was run in the terminal, except it is saved as .rda instead of csv.
DMA(dataMAF = dataMAF, dataDEGs = DEGsmatrix, dataPRA = dataPRA, coding_file = "path/css_coding.vcf.gz", noncoding_file = "path/css_noncoding.vcf.gz", results_folder = "path/results") #If the cscape-somatic file have already been created cscape_somatic_output <- read.csv("./results/Cscape_somatic_output.csv") save(cscape_somatic_output, file = "./results/Cscape_somatic_output.rda") DMA(dataMAF = dataMAF, dataDEGs = DEGsmatrix, dataPRA = dataPRA, runCscape = FALSE, results_folder = "./results")
DMA(dataMAF = dataMAF, dataDEGs = DEGsmatrix, dataPRA = dataPRA, coding_file = "path/css_coding.vcf.gz", noncoding_file = "path/css_noncoding.vcf.gz", results_folder = "path/results") #If the cscape-somatic file have already been created cscape_somatic_output <- read.csv("./results/Cscape_somatic_output.csv") save(cscape_somatic_output, file = "./results/Cscape_somatic_output.rda") DMA(dataMAF = dataMAF, dataDEGs = DEGsmatrix, dataPRA = dataPRA, runCscape = FALSE, results_folder = "./results")
A matrix containing information about 20038 genes including their gene description, location and family
data(EAGenes)
data(EAGenes)
A 20038x5 matrix
The matrix contains the genes in rows and description, location and family in columns.
A 20038x5 matrix
Experimentially verified promoter sites by J. Michael Cherry, Stanford. Downloaded from the ENCODE identifier ENCSR294YNI. It contains chromosome, start and end sites of promoters.
data(EncodePromoters)
data(EncodePromoters)
A tibble with no columnnames or rownames.
The first column is chromosome eg. chr1
The second column is start position eg. 10451
The third column is end position eg. 10563
A 84738x6 table
https://www.encodeproject.org/
ENCODE identifier: ENCSR294YNI
Luo Y, Hitz BC, Gabdank I, Hilton JA, Kagda MS, Lam B, Myers Z, Sud P, Jou J, Lin K, Baymuradov UK, Graham K, Litton C, Miyasato SR, Strattan JS, Jolanki O, Lee JW, Tanaka FY, Adenekan P, O'Neill E, Cherry JM. New developments on the Encyclopedia of DNA Elements (ENCODE) data portal. Nucleic Acids Res. 2020 Jan 8;48(D1):D882-D889. doi: 10.1093/nar/gkz1062. PMID: 31713622; PMCID: PMC7061942.
The object, a list, that was returned from running the EpiMix function and is one of the outputs from the GMA function.
data(EpiMix_Results_Regular)
data(EpiMix_Results_Regular)
A list of length nine
The data is a list of nine elements which is outputted from the EpiMix function
A list of length nine
This function carries out the functional enrichment analysis (FEA)
FEA(BPname = NULL, DEGsmatrix)
FEA(BPname = NULL, DEGsmatrix)
BPname |
BPname biological process such as "proliferation of cells", "ALL" (default) if FEA should be carried out for all 101 biological processes |
DEGsmatrix |
DEGsmatrix output from DEA such as dataDEGs |
matrix from FEA
data(DEGsmatrix) data(DiseaseList) data(EAGenes) DEGsmatrix <- DEGsmatrix[seq.int(2), ] dataFEA <- FEA(DEGsmatrix = DEGsmatrix, BPname = "apoptosis")
data(DEGsmatrix) data(DiseaseList) data(EAGenes) DEGsmatrix <- DEGsmatrix[seq.int(2), ] dataFEA <- FEA(DEGsmatrix = DEGsmatrix, BPname = "apoptosis")
A matrix that provides the GEO dataset matched to one of 18 TCGA cancer types
data(GEO_TCGAtab)
data(GEO_TCGAtab)
A 18x12 matrix
The matrix contains the cancer types in rows and information about sample type from both TCGA and GEO in columns.
A 18x12 matrix
This function retrieves and prepares GEO data
getDataGEO(GEOobject = "GSE39004", platform = "GPL6244", TCGAtumor = NULL)
getDataGEO(GEOobject = "GSE39004", platform = "GPL6244", TCGAtumor = NULL)
GEOobject |
GEOobject |
platform |
platform |
TCGAtumor |
tumor name |
return GEO gset
data(GEO_TCGAtab) dataGEO <- getDataGEO(GEOobject = "GSE15641", platform = "GPL96")
data(GEO_TCGAtab) dataGEO <- getDataGEO(GEOobject = "GSE15641", platform = "GPL96")
GLS This function carries out gene literature search.
GLS(genes, query_string = "AND cancer AND driver", max_records = 20)
GLS(genes, query_string = "AND cancer AND driver", max_records = 20)
genes |
A character string containing the genes to search in PubMed database |
query_string |
A character string containing words in query to follow the gene of interest. Default is "AND cancer AND driver" resulting in a final query of "Gene AND cancer AND driver". Standard PubMed syntax can be used in the query. For example Boolean operators AND, OR, NOT can be applied and tags such as [AU], [TITLE/ABSTRACT], [Affiliation] can be used. |
max_records |
An integer containing the maximum number of records to be fetched from PubMed. |
A tibble containing results of literature search where PubMed was queried for information of input genes. Each row in the tibble contains a PubMed ID matching the query, doi, title, abstract, year of publication, keywords, and total number of PubMed publications, resulting in a total of eight columns.
genes_query <- "BRCA1" dataGLS <- GLS(genes = genes_query, query_string = "AND cancer AND driver", max_records = 2)
genes_query <- "BRCA1" dataGLS <- GLS(genes = genes_query, query_string = "AND cancer AND driver", max_records = 2)
GMA This function carries out Gene Methylation Analysis
GMA( dataMET, dataEXP, dataPRA, dataDEGs, sample_info, met_platform = "HM450", prevalence_filter = NULL, output_dir = "./GMAresults", cores = 1, roadmap.epigenome.ids = NULL, roadmap.epigenome.groups = NULL )
GMA( dataMET, dataEXP, dataPRA, dataDEGs, sample_info, met_platform = "HM450", prevalence_filter = NULL, output_dir = "./GMAresults", cores = 1, roadmap.epigenome.ids = NULL, roadmap.epigenome.groups = NULL )
dataMET |
A data matrix containing the methylation data where the CpG probes are in the rows and samples are in the columns |
dataEXP |
A data matrix containing the gene expression data where the genes are in the rows and the samples are in the columns |
dataPRA |
A table containing the output of the PRA function |
dataDEGs |
A table containing the output of a DEA where gene names are rownames |
sample_info |
A table containing information on the samples. This table needs to contain two columns called primary and sample.type. The primary column contains sample names which should be the same as the column names in dataMET. The sample.type column indicates for each sample if it is a Cancer or Normal sample. |
met_platform |
A character string representing the microarray type that was used to collect the methylation data. This can either be HM27, HM450 or EPIC. Default is HM450. |
prevalence_filter |
A float or NULL representing if a prevalence filter should be applied or not. Default is NULL, meaning a prevalence filter will not be applied. If a float is specified, a prevalence filter will be applied where methylation states of probes will be altered depending on the threshold of prevalence supplied as prevalence_filter. For example, if prevalence_filter = 20, it means that if the prevalence of the hyper- or hypomethylated CpG probe exceeds 20, the methylation state will be unchanged but if the prevalence is lower than 20 the methylation state will be changed to NA, meaning no methylation state was detected. In case of dual methylated probes, the methylation state will stay dual if both the prevalence of hyper- and hypomethylations exceed 20, but if only one of the prevalences exceed 20 the dual state will be changed to the state exceeding 20. If none of the prevalences exceed 20, the dual state will be changed to NA. |
output_dir |
Path to where the results will be stored. If this directory does not exist, it will be created by the function. Default is ./GMAresults. |
cores |
Number of cores to be used. Default is 1. |
roadmap.epigenome.ids |
A character string representing the epigenome ID that will be used to select enhancers. Since enhancers are tissue-specific, the tissue type needs to be specified in EpiMix. The enhancers are found from the RoadmapEpigenome project and the IDs can be found from Figure 2 in the publication with doi: 10.1038/nature14248. Default is NULL. |
roadmap.epigenome.groups |
A character string representing the epigenome group that will be used to select enhancers. Details are provided above with the roadmap.epigenome.ids parameter. Default is NULL. |
List of two elements, containing predicted oncogenes and tumor suppressors. Additionally, various output files are saved in the specified output directory: DEG_Methylation_Annotations.rda, Oncogenic_mediators_methylation_summary.rda, EpiMix_Results_Enhancer.rds, EpiMix_Results_Regular.rds, FunctionalPairs_Enhancer.csv, FunctionalPairs_Regular.csv, FunctionalProbes_Regular.rds
data("dataMethyl") data("dataFilt") data("dataPRA") data("DEGsmatrix") data("LUAD_sample_anno") data("NCG") data("EncodePromoters") data("MetEvidenceDriver") pattern <- "^(.{4}-.{2}-.{4}-.{2}).*" colnames(dataFilt) <- sub(pattern, "\\1", colnames(dataFilt)) dataGMA <- GMA(dataMET = dataMethyl, dataEXP = dataFilt, dataPRA = dataPRA, dataDEGs = DEGsmatrix, sample_info = LUAD_sample_anno, met_platform = "HM450", prevalence_filter = NULL, output_dir = "./GMAresults", cores = 1, roadmap.epigenome.ids = "E096", roadmap.epigenome.groups = NULL)
data("dataMethyl") data("dataFilt") data("dataPRA") data("DEGsmatrix") data("LUAD_sample_anno") data("NCG") data("EncodePromoters") data("MetEvidenceDriver") pattern <- "^(.{4}-.{2}-.{4}-.{2}).*" colnames(dataFilt) <- sub(pattern, "\\1", colnames(dataFilt)) dataGMA <- GMA(dataMET = dataMethyl, dataEXP = dataFilt, dataPRA = dataPRA, dataDEGs = DEGsmatrix, sample_info = LUAD_sample_anno, met_platform = "HM450", prevalence_filter = NULL, output_dir = "./GMAresults", cores = 1, roadmap.epigenome.ids = "E096", roadmap.epigenome.groups = NULL)
This function carries out the gene regulatory network inference using parmigene
GRN( TFs, DEGsmatrix, DiffGenes = FALSE, normCounts, kNearest = 3, nGenesPerm = 2000, nBoot = 400, noise_mi = 1e-12 )
GRN( TFs, DEGsmatrix, DiffGenes = FALSE, normCounts, kNearest = 3, nGenesPerm = 2000, nBoot = 400, noise_mi = 1e-12 )
TFs |
a vector of genes. |
DEGsmatrix |
DEGsmatrix output from DEA such as dataDEGs |
DiffGenes |
if TRUE consider only diff.expr genes in GRN |
normCounts |
is a matrix of gene expression with genes in rows and samples in columns. |
kNearest |
the number of nearest neighbors to consider to estimate the mutual information. Must be less than the number of columns of normCounts. |
nGenesPerm |
nGenesPerm |
nBoot |
nBoot |
noise_mi |
noise in knnmi.cross function. Default is 1e-12. |
an adjacent matrix
data('DEGsmatrix') data('dataFilt') dataGRN <- GRN(TFs = sample(rownames(DEGsmatrix), 30), DEGsmatrix = DEGsmatrix, DiffGenes = TRUE, normCounts = dataFilt, nGenesPerm = 2, nBoot = 2)
data('DEGsmatrix') data('dataFilt') dataGRN <- GRN(TFs = sample(rownames(DEGsmatrix), 30), DEGsmatrix = DEGsmatrix, DiffGenes = TRUE, normCounts = dataFilt, nGenesPerm = 2, nBoot = 2)
This function carries out the GSEA enrichment analysis.
GSEA(DEGsmatrix, top, plot = FALSE)
GSEA(DEGsmatrix, top, plot = FALSE)
DEGsmatrix |
DEGsmatrix output from DEA such as dataDEGs |
top |
is the number of top BP to plot |
plot |
if TRUE return a GSEA's plot |
return GSEA result
data("DEGsmatrix") DEGsmatrix_example <- DEGsmatrix[1:2,] dataFEA <- GSEA(DEGsmatrix = DEGsmatrix_example)
data("DEGsmatrix") DEGsmatrix_example <- DEGsmatrix[1:2,] dataFEA <- GSEA(DEGsmatrix = DEGsmatrix_example)
A list of known cancer driver genes from COSMIC
data(knownDriverGenes)
data(knownDriverGenes)
A list containing two elements where the first element is a character vector of 55 and the second element is a character vector of #' 84
The list contains two elements: a vector of known tumor #' suppressors and a vector of known oncogenes
A list containing two elements where the first element is a character vector of 55 and the second element is a character vector of #' 84
This function lifts a MAF file to a different genomic build.
LiftMAF(Infile, Current_Build)
LiftMAF(Infile, Current_Build)
Infile |
A tibble of MAF. |
Current_Build |
A charcter string, either |
MAF tibble with positions lifted to another build
data(dataMAF) dataMAF_example <- dataMAF[1,] LiftMAF(dataMAF_example, Current_Build = 'GRCh38')
data(dataMAF) dataMAF_example <- dataMAF[1,] LiftMAF(dataMAF_example, Current_Build = 'GRCh38')
A list of oncogenic mediators of 5 TCGA cancer types: BLCA, BRCA, LUAD, READ and STAD
data(listMoonlight)
data(listMoonlight)
A list containing 5 elements where each element contains differentially expressed genes and output from the URA and PRA functions of 5 TCGA cancer types
Each element in the list contains differentially expressed genes and output from the URA and PRA functions
A list containing 5 elements where each element contains differentially expressed genes and output from the URA and PRA functions of 5 TCGA cancer types
This function loads the MAVISp database from the directory specified by the user.
loadMAVISp( mavispDB = NULL, proteins_of_interest = NULL, mode = "simple", ensemble = "md" )
loadMAVISp( mavispDB = NULL, proteins_of_interest = NULL, mode = "simple", ensemble = "md" )
mavispDB |
path to the MAVISp database as a string. The database can be downloaded from an OSF repository (https://osf.io/ufpzm/) and has the following structure. |
proteins_of_interest |
vector containing specific proteins of interest in HUGO format |
mode |
string determining whether to use simple or ensemble mode of mavisp. Default is simple mode. Takes values:
|
ensemble |
string specifying the ensemble to use. The available ensembles can be found in the MAVISp database. This is ignored for simple mode. |
returns a list of tibbles each containing the MAVISp entry of one protein. Each entry will contain an extra column specifying what data the stability classification is based on.
mavisp_db_location <- system.file('extdata', 'mavisp_db', package='Moonlight2R') mavisp_data <- loadMAVISp(mavispDB = mavisp_db_location, proteins_of_interest = c('TP53'), mode = 'ensemble')
mavisp_db_location <- system.file('extdata', 'mavisp_db', package='Moonlight2R') mavisp_data <- loadMAVISp(mavispDB = mavisp_db_location, proteins_of_interest = c('TP53'), mode = 'ensemble')
A dataset binary dataset describing if a mutation of a certain class and type possibly have an effect on protein structure or function.
data(LOC_protein)
data(LOC_protein)
A 18x7 table
The values are binary: 0 no effect is possible, 1 an effect is possible.
See supplementary material for details.
A 18x7 table
paper
A dataset describing if a mutation of a certain class and type possibly have an effect on transcript level.
data(LOC_transcription)
data(LOC_transcription)
A 18x7 table
The values are binary: 0 no effect is possible, 1 an effect is possible.
See supplementary material for details.
A 18x7 table
paper
A dataset describing if a mutation of a certain class and type possibly have an effect on peptide level.
data(LOC_translation)
data(LOC_translation)
A 18x7 table
The values are binary: 0 no effect is possible, 1 an effect is possible.
See supplementary material for details.
A 18x7 table
paper
This function carries out the literature phenotype analysis (LPA)
LPA(dataDEGs, BP, BPlist)
LPA(dataDEGs, BP, BPlist)
dataDEGs |
is output from DEA |
BP |
is biological process |
BPlist |
is list of genes annotated in BP |
table with number of pubmed that affects, increase or decrase genes annotated in BP
data('DEGsmatrix') data('DiseaseList') BPselected <- c("apoptosis") BPannotations <- DiseaseList[[match(BPselected, names(DiseaseList))]]$ID
data('DEGsmatrix') data('DiseaseList') BPselected <- c("apoptosis") BPannotations <- DiseaseList[[match(BPselected, names(DiseaseList))]]$ID
A matrix that annotates LUAD samples as either cancer or normal
data(LUAD_sample_anno)
data(LUAD_sample_anno)
A 23x2 matrix
The matrix contains two columns: "primary" which contains patient barcodes of TCGA-LUAD and "sample.type" which denotes if the sample is either a "Cancer" or "Normal" sample
A 23x2 matrix
This function extracts columns from a MAF tibble to fit CScape input format
MAFtoCscape(MAF)
MAFtoCscape(MAF)
MAF |
tibble of MAF |
tibble of cscape-somatic input
data(dataMAF) MAFtoCscape(dataMAF[seq.int(2),])
data(dataMAF) MAFtoCscape(dataMAF[seq.int(2),])
A tibble containing combinations of methylation states of probes used to define driver genes
data(MetEvidenceDriver)
data(MetEvidenceDriver)
A 30x6 tibble.
The tibble contains a value of 1 if a probe is found that is either hypo-, hyper-, dualmethylated or not methylated. This is compared with Moonlight's predictions of role of oncogenic mediators to define driver genes based on methylation evidence.
A 30x6 tibble.
moonlight is a tool for identification of cancer driver genes. This function wraps the different steps of the complete analysis workflow.
moonlight( dataDEGs, dataFilt, BPname = NULL, Genelist = NULL, kNearest = 3, nGenesPerm = 2000, DiffGenes = FALSE, nBoot = 400, nTF = NULL, thres.role = 0, dataMAF, path_cscape_coding, path_cscape_noncoding )
moonlight( dataDEGs, dataFilt, BPname = NULL, Genelist = NULL, kNearest = 3, nGenesPerm = 2000, DiffGenes = FALSE, nBoot = 400, nTF = NULL, thres.role = 0, dataMAF, path_cscape_coding, path_cscape_noncoding )
dataDEGs |
table of differentially expressed genes |
dataFilt |
matrix of gene expression data with genes in rows and samples in columns |
BPname |
biological processes to use, if NULL: all processes will be used in analysis, RF for candidate; if not NULL the candidates for these processes will be determined (no learning) |
Genelist |
Genelist |
kNearest |
kNearest |
nGenesPerm |
nGenesPerm |
DiffGenes |
DiffGenes |
nBoot |
nBoot |
nTF |
nTF |
thres.role |
thres.role |
dataMAF |
A MAF file rda object for DMA |
path_cscape_coding |
A character string to path of CScape-somatic coding file |
path_cscape_noncoding |
A character string to path of CScape-somatic non-coding file |
table with cancer driver genes TSG and OCG.
drivers <- moonlight(dataDEGs = DEGsmatrix, dataFilt = dataFilt, BPname = c("apoptosis", "proliferation of cells"), dataMAF = dataMAF, path_cscape_coding = "css_coding.vcf.gz", path_cscape_noncoding = "css_noncoding.vcf.gz")
drivers <- moonlight(dataDEGs = DEGsmatrix, dataFilt = dataFilt, BPname = c("apoptosis", "proliferation of cells"), dataMAF = dataMAF, path_cscape_coding = "css_coding.vcf.gz", path_cscape_noncoding = "css_noncoding.vcf.gz")
A dataset retrived from Network of Cancer Genes 7.0
data(NCG)
data(NCG)
The format have been rearranged from the original. <symbold>|<NCG_driver>|<NCG_cgc_annotation>|<NCG_vogelstein_annotation>| <NCG_saito_annotation>|<NCG_pubmed_id>
The NCG_driver is reported as a OCG or TSG when at least one of three three databases have documented it. These are cosmic gene census (cgc), vogelstein et al. 2013 or saito et al. 2020. The NCG_driver is reported as a candidate, when literature support the gene as a cancer driver.
A 3347x7 table
Comparative assessment of genes driving cancer and somatic evolution in non-cancer tissues: an update of the Network of Cancer Genes (NCG) resource. Dressler L., Bortolomeazzi M., Keddar M.R., Misetic H., Sartini G., Acha-Sagredo A., Montorsi L., Wijewardhane N., Repana D., Nulsen J., Goldman J., Pollit M., Davis P., Strange A., Ambrose K. and Ciccarelli F.D.
Output file from running the GMA function which is a summary of the oncogenic mediators and their sum of methylated CpG probes together with the evidence level of their role as driver gene.
data(Oncogenic_mediators_methylation_summary)
data(Oncogenic_mediators_methylation_summary)
A 25x7 tibble
The data is a table where each row is an oncogenic mediator and the columns represent the predicted driver role and the sum of hypo-, hyper-, and dualmethylated CpG sites.
A 25x7 tibble
Output from DMA. This contains the oncogenic mediator from the TCGA-LUAD project, and their mutation assesments summarised based on CSCape-somatic and Level of Consequence.
data(Oncogenic_mediators_mutation_summary)
data(Oncogenic_mediators_mutation_summary)
A 12x15 matrix.
A 12x15 matrix.
This function visualize the plotCircos
plotCircos( listMoonlight, listMutation = NULL, additionalFilename = NULL, intensityColOCG = 0.5, intensityColTSG = 0.5, intensityColDual = 0.5, fontSize = 1 )
plotCircos( listMoonlight, listMutation = NULL, additionalFilename = NULL, intensityColOCG = 0.5, intensityColTSG = 0.5, intensityColDual = 0.5, fontSize = 1 )
listMoonlight |
output Moonlight function |
listMutation |
listMutation |
additionalFilename |
additionalFilename |
intensityColOCG |
intensityColOCG |
intensityColTSG |
intensityColTSG |
intensityColDual |
intensityColDual |
fontSize |
fontSize |
no return value, plot is saved
data('listMoonlight') plotCircos(listMoonlight = listMoonlight, additionalFilename = "_ncancer5")
data('listMoonlight') plotCircos(listMoonlight = listMoonlight, additionalFilename = "_ncancer5")
This function creates one or more heatmaps on the output from DMA. It visualises the CScape-Somatic annotations per oncogenic mediator either in a single heatmap or split into several different ones. It is also possible to provide a personalised genelist to visualise.
plotDMA( DEG_Mutations_Annotations, Oncogenic_mediators_mutation_summary, type = "split", genelist = c(), additionalFilename = "" )
plotDMA( DEG_Mutations_Annotations, Oncogenic_mediators_mutation_summary, type = "split", genelist = c(), additionalFilename = "" )
DEG_Mutations_Annotations |
A tibble, output file from DMA. |
Oncogenic_mediators_mutation_summary |
A tibble, output file from DMA. |
type |
A character string. It can take the values
|
genelist |
A character vector containing HUGO symbols.
A single heatmap will be created with only these genes.
The heatmap will be hierarchically clustered. This will overwrite |
additionalFilename |
A character string. Adds prefix or filepath to the filename of the pdf. |
No return value. DMA results are plotted.
data('DEG_Mutations_Annotations') data('Oncogenic_mediators_mutation_summary') plotDMA(DEG_Mutations_Annotations, Oncogenic_mediators_mutation_summary, genelist = c("ACSS2", "AFAP1L1"), additionalFilename = "myplots_")
data('DEG_Mutations_Annotations') data('Oncogenic_mediators_mutation_summary') plotDMA(DEG_Mutations_Annotations, Oncogenic_mediators_mutation_summary, genelist = c("ACSS2", "AFAP1L1"), additionalFilename = "myplots_")
This function visualize the functional enrichment analysis (FEA)'s barplot
plotFEA( dataFEA, topBP = 10, additionalFilename = NULL, height, width, offsetValue = 5, angle = 90, xleg = 35, yleg = 5, titleMain = "", minY = -5, maxY = 10, mycols = c("#8DD3C7", "#FFFFB3", "#BEBADA") )
plotFEA( dataFEA, topBP = 10, additionalFilename = NULL, height, width, offsetValue = 5, angle = 90, xleg = 35, yleg = 5, titleMain = "", minY = -5, maxY = 10, mycols = c("#8DD3C7", "#FFFFB3", "#BEBADA") )
dataFEA |
dataFEA |
topBP |
topBP |
additionalFilename |
additionalFilename |
height |
Figure height |
width |
Figure width |
offsetValue |
offsetValue |
angle |
angle |
xleg |
xleg |
yleg |
yleg |
titleMain |
title of the plot |
minY |
minY |
maxY |
maxY |
mycols |
colors to use for the plot |
no return value, FEA result is plotted
data(DEGsmatrix) data(DiseaseList) data(EAGenes) data(dataFEA) plotFEA(dataFEA = dataFEA[1:10,], additionalFilename = "_example",height = 20,width = 10)
data(DEGsmatrix) data(DiseaseList) data(EAGenes) data(dataFEA) plotFEA(dataFEA = dataFEA[1:10,], additionalFilename = "_example",height = 20,width = 10)
plotGMA This function plots results of the Gene Methylation Analysis. It visualizes the number of hypo/hyper/dual methylated CpG sites in oncogenic mediators or in a user-supplied gene list. The results are visualized either in a single heatmap or split into different ones which is specified in the function's three modes: split, complete and genelist.
plotGMA( DEG_Methylation_Annotations, Oncogenic_mediators_methylation_summary, type = "split", genelist = NULL, additionalFilename = "" )
plotGMA( DEG_Methylation_Annotations, Oncogenic_mediators_methylation_summary, type = "split", genelist = NULL, additionalFilename = "" )
DEG_Methylation_Annotations |
A tibble which is outputted from the GMA function. |
Oncogenic_mediators_methylation_summary |
A tibble which is outputted from the GMA function. |
type |
A character string which can either be split, complete or genelist. If type is set to split, the entire dataset is split into groups of 40 genes and individual heatmaps of groups each containing 40 genes will be created and subsequently merged into one pdf where each page in the pdf is an individual heatmap. The genes will be sorted alphabetically. If type is set to complete, a single heatmap is created where the number of differentially methylated CpG sites are shown for all oncogenic mediators. If type is set to genelist, a single heatmap will be created for genes supplied by the user in the genelist parameter. Default is split. |
genelist |
A character string containing HUGO symbols of genes to be visualized in a single heatmap. Default is NULL. |
additionalFilename |
A character string that can be used to add a prefix or filepath to the filename of the pdf visualizing the heatmap. Default is an empty string. |
No value is returned. Visualizations in form of heatmaps are saved.
data("DEG_Methylation_Annotations") data("Oncogenic_mediators_methylation_summary") genes <- c("ACAN", "ACE2", "ADAM19", "AFAP1L1") plotGMA(DEG_Methylation_Annotations = DEG_Methylation_Annotations, Oncogenic_mediators_methylation_summary = Oncogenic_mediators_methylation_summary, type = "genelist", genelist = genes, additionalFilename = "./GMAresults/")
data("DEG_Methylation_Annotations") data("Oncogenic_mediators_methylation_summary") genes <- c("ACAN", "ACE2", "ADAM19", "AFAP1L1") plotGMA(DEG_Methylation_Annotations = DEG_Methylation_Annotations, Oncogenic_mediators_methylation_summary = Oncogenic_mediators_methylation_summary, type = "genelist", genelist = genes, additionalFilename = "./GMAresults/")
This function creates a unclustered heatmap from the inputted data tibble and saves it
plotHeatmap(df)
plotHeatmap(df)
df |
a tibble |
The name of the alphabeatically first gene in the tibble
plotMetExp This function visualizes results of EpiMix.
plotMetExp( EpiMix_results, probe_name, dataMET, dataEXP, gene_of_interest, additionalFilename = "" )
plotMetExp( EpiMix_results, probe_name, dataMET, dataEXP, gene_of_interest, additionalFilename = "" )
EpiMix_results |
The object, a list, that was returned from running the EpiMix function and is one of the outputs from the GMA function. |
probe_name |
A character string containing the name of the CpG probe that will be plotted. |
dataMET |
A data matrix containing the methylation data where the CpG probes are in the rows and samples are in the columns |
dataEXP |
A data matrix containing the gene expression data where the genes are in the rows and the samples are in the columns |
gene_of_interest |
A character string containing the name of the gene that will be plotted. |
additionalFilename |
A character string that can be used to add a prefix or filepath to the filename of the pdf visualizing the heatmap. Default is an empty string. |
No value is returned. Visualizations are saved.
data("EpiMix_Results_Regular") data("dataMethyl") data("dataFilt") pattern <- "^(.{4}-.{2}-.{4}-.{2}).*" colnames(dataFilt) <- sub(pattern, "\\1", colnames(dataFilt)) plotMetExp(EpiMix_results = EpiMix_Results_Regular, probe_name = "cg03035704", dataMET = dataMethyl, dataEXP = dataFilt, gene_of_interest = "ACVRL1", additionalFilename = "./GMAresults/")
data("EpiMix_Results_Regular") data("dataMethyl") data("dataFilt") pattern <- "^(.{4}-.{2}-.{4}-.{2}).*" colnames(dataFilt) <- sub(pattern, "\\1", colnames(dataFilt)) plotMetExp(EpiMix_results = EpiMix_Results_Regular, probe_name = "cg03035704", dataMET = dataMethyl, dataEXP = dataFilt, gene_of_interest = "ACVRL1", additionalFilename = "./GMAresults/")
This function creates a heatmap of Moonlight gene z-scores for selected genes.
plotMoonlight( DEG_Mutations_Annotations, Oncogenic_mediators_mutation_summary, dataURA, gene_type = "drivers", n = 50, genelist = c(), BPlist = c(), additionalFilename = "" )
plotMoonlight( DEG_Mutations_Annotations, Oncogenic_mediators_mutation_summary, dataURA, gene_type = "drivers", n = 50, genelist = c(), BPlist = c(), additionalFilename = "" )
DEG_Mutations_Annotations |
A tibble, output file from DMA. |
Oncogenic_mediators_mutation_summary |
A tibble, output file from DMA. |
dataURA |
Output URA function. |
gene_type |
A character string either
|
n |
Numeric. The top number of genes to be plotted. If |
genelist |
A vector of strings containing Hugo Symbols of genes.
Overwrites |
BPlist |
A vector of strings. Selection of the biological processes to visualise. If left empty defaults to every BP provided in the URA file. |
additionalFilename |
A character string. Adds prefix or filepath to the filename of the pdf. |
No return value. Moonlight scores are plotted for selected genes.
data(DEG_Mutations_Annotations) data(Oncogenic_mediators_mutation_summary) data(dataURA_plot) plotMoonlight(DEG_Mutations_Annotations, Oncogenic_mediators_mutation_summary, dataURA_plot, genelist = c("AFAP1L1", "ABCG2"), additionalFilename = "myplot_")
data(DEG_Mutations_Annotations) data(Oncogenic_mediators_mutation_summary) data(dataURA_plot) plotMoonlight(DEG_Mutations_Annotations, Oncogenic_mediators_mutation_summary, dataURA_plot, genelist = c("AFAP1L1", "ABCG2"), additionalFilename = "myplot_")
plotMoonlightMet This function visualizes the effect of genes on biological processes and total number of hypo/hyper/dual methylated CpG sites in genes.
plotMoonlightMet( DEG_Methylation_Annotations, Oncogenic_mediators_methylation_summary, dataURA, genes, additionalFilename = "" )
plotMoonlightMet( DEG_Methylation_Annotations, Oncogenic_mediators_methylation_summary, dataURA, genes, additionalFilename = "" )
DEG_Methylation_Annotations |
A tibble which is outputted from the GMA function. |
Oncogenic_mediators_methylation_summary |
A tibble which is outputted from the GMA function. |
dataURA |
Output of the URA function: a table containing the effect of oncogenic mediators on biological processes. This effect is quantified through Moonlight Gene Z-scores. |
genes |
A character string containing the genes to be visualized. |
additionalFilename |
A character string that can be used to add a prefix or filepath to the filename of the pdf visualizing the heatmap. Default is an empty string. |
No value is returned. Visualization in form of a heatmap is saved.
data("DEG_Methylation_Annotations") data("Oncogenic_mediators_methylation_summary") data("dataURA_plot") genes <- c("ACAN", "ACE2", "ADAM19", "AFAP1L1") plotMoonlightMet(DEG_Methylation_Annotations = DEG_Methylation_Annotations, Oncogenic_mediators_methylation_summary = Oncogenic_mediators_methylation_summary, dataURA = dataURA_plot, genes = genes, additionalFilename = "./GMAresults/")
data("DEG_Methylation_Annotations") data("Oncogenic_mediators_methylation_summary") data("dataURA_plot") genes <- c("ACAN", "ACE2", "ADAM19", "AFAP1L1") plotMoonlightMet(DEG_Methylation_Annotations = DEG_Methylation_Annotations, Oncogenic_mediators_methylation_summary = Oncogenic_mediators_methylation_summary, dataURA = dataURA_plot, genes = genes, additionalFilename = "./GMAresults/")
This function visualizes the GRN as a hive plot
plotNetworkHive(dataGRN, namesGenes, thres, additionalFilename = NULL)
plotNetworkHive(dataGRN, namesGenes, thres, additionalFilename = NULL)
dataGRN |
output GRN function |
namesGenes |
list TSG and OCG to define axes |
thres |
threshold of edges to be included |
additionalFilename |
additionalFilename |
no results Hive plot is executed
data(knownDriverGenes) data(dataGRN) plotNetworkHive(dataGRN = dataGRN, namesGenes = knownDriverGenes, thres = 0.55)
data(knownDriverGenes) data(dataGRN) plotNetworkHive(dataGRN = dataGRN, namesGenes = knownDriverGenes, thres = 0.55)
This function visualizes the URA in a heatmap
plotURA(dataURA, additionalFilename = "URAplot")
plotURA(dataURA, additionalFilename = "URAplot")
dataURA |
output URA function |
additionalFilename |
figure name |
heatmap
data(dataURA) data(DiseaseList) data(tabGrowBlock) data(knownDriverGenes) dataDual <- PRA(dataURA = dataURA, BPname = c("apoptosis","proliferation of cells"), thres.role = 0) TSGs_genes <- names(dataDual$TSG) OCGs_genes <- names(dataDual$OCG) plotURA(dataURA = dataURA[c(TSGs_genes, OCGs_genes),],additionalFilename = "_example")
data(dataURA) data(DiseaseList) data(tabGrowBlock) data(knownDriverGenes) dataDual <- PRA(dataURA = dataURA, BPname = c("apoptosis","proliferation of cells"), thres.role = 0) TSGs_genes <- names(dataDual$TSG) OCGs_genes <- names(dataDual$OCG) plotURA(dataURA = dataURA[c(TSGs_genes, OCGs_genes),],additionalFilename = "_example")
This function carries out the pattern recognition analysis
PRA(dataURA, BPname, thres.role = 0)
PRA(dataURA, BPname, thres.role = 0)
dataURA |
output URA function |
BPname |
BPname |
thres.role |
thres.role |
returns list of TSGs and OCGs when biological processes are provided, otherwise a randomForest based classifier that can be used on new data
data(dataURA) data(DiseaseList) data(tabGrowBlock) data(knownDriverGenes) dataPRA <- PRA(dataURA = dataURA[seq.int(2),], BPname = c("apoptosis","proliferation of cells"), thres.role = 0)
data(dataURA) data(DiseaseList) data(tabGrowBlock) data(knownDriverGenes) dataPRA <- PRA(dataURA = dataURA[seq.int(2),], BPname = c("apoptosis","proliferation of cells"), thres.role = 0)
This function changes the PRA output to tibble format
PRAtoTibble(dataPRA)
PRAtoTibble(dataPRA)
dataPRA |
RDA object (list of two) from PRA |
tibble with drivers
data('dataPRA') PRAtoTibble(dataPRA)
data('dataPRA') PRAtoTibble(dataPRA)
This function retrive cscape-scores to SNPs
RunCscape_somatic(input, coding_file, noncoding_file)
RunCscape_somatic(input, coding_file, noncoding_file)
input |
Input matching cscape input |
coding_file |
cscape_table with coding scores |
noncoding_file |
cscape_table with noncoding scores |
returns a tibble with a score and remark for each SNP
cscape_out <- RunCscape_somatic(input, coding_file, noncoding_file)
cscape_out <- RunCscape_somatic(input, coding_file, noncoding_file)
A matrix with biological processes in rows and the cancer #' growing or blocking effect of the process in columns
data(tabGrowBlock)
data(tabGrowBlock)
A 101x3 matrix
For each biological processes the cancer growing/blocking effect is indicated
A 101x3 matrix
This function finds mutations in the transcription factors (TF) of the DEGs that have a TF in the database.
TFinfluence( dataPRA, dataDEGs, dataTRRUST, dataMAF, dataMAVISp, stabClassMAVISp = "rosetta" )
TFinfluence( dataPRA, dataDEGs, dataTRRUST, dataMAF, dataMAVISp, stabClassMAVISp = "rosetta" )
dataPRA |
Output PRA function. List of TSG and OCG Must contain the following elements
|
dataDEGs |
Output DEA function. Tibble containing differentially expressed genes. Must contain the following columns
|
dataTRRUST |
Tibble containing gene-TF pairs and type of interaction Must contain the following columns:
|
dataMAF |
Tibble containing mutation info from MAF Must at least contain the following columns:
|
dataMAVISp |
Output loadMAVISp function. List of tibbles, one for each protein. The tibbles must contain at least
Then the tibbles must contain columns mathing either of the following names(or they will be excluded)
|
stabClassMAVISp |
The protocol to use for mutation stability classification. Default is rosetta. Accepts one of the following strings
|
returns a tibble containing:
GENE
TF
InteractionType
PMID
tf_mutation
stab_class (the effect on stability as classified by MAVISp)
data('dataPRA') data('DEGsmatrix') data('dataTRRUST') data('dataMAF') data('dataMAVISp') TFinfluence(dataTRRUST = dataTRRUST, dataMAF = dataMAF, dataDEGs = DEGsmatrix, dataPRA = dataPRA, dataMAVISp = dataMAVISp, stabClassMAVISp = 'rosetta')
data('dataPRA') data('DEGsmatrix') data('dataTRRUST') data('dataMAF') data('dataMAVISp') TFinfluence(dataTRRUST = dataTRRUST, dataMAF = dataMAF, dataDEGs = DEGsmatrix, dataPRA = dataPRA, dataMAVISp = dataMAVISp, stabClassMAVISp = 'rosetta')
This function carries out the upstream regulator analysis
URA(dataGRN, DEGsmatrix, BPname, nCores = 1)
URA(dataGRN, DEGsmatrix, BPname, nCores = 1)
dataGRN |
output GNR function |
DEGsmatrix |
output DPA function |
BPname |
biological processes |
nCores |
number of cores to use |
an adjacent matrix
data(DEGsmatrix) dataDEGs <- DEGsmatrix data(dataGRN) data(DiseaseList) data(EAGenes) dataURA <- URA(dataGRN = dataGRN, DEGsmatrix = dataDEGs, BPname = c("apoptosis", "proliferation of cells"))
data(DEGsmatrix) dataDEGs <- DEGsmatrix data(dataGRN) data(DiseaseList) data(EAGenes) dataURA <- URA(dataGRN = dataGRN, DEGsmatrix = dataDEGs, BPname = c("apoptosis", "proliferation of cells"))