Title: | Identify oncogenes and tumor suppressor genes from omics data |
---|---|
Description: | Motivation: Understanding cancer mechanisms requires identifying the genes that play a role in the development of the pathology and characterizing that role (notably as oncogenes or tumor suppressors). Results: We present an R/Bioconductor package called MoonlightR, which returns a list of candidate driver genes for specific cancer types on the basis of TCGA expression data. The method first infers gene regulatory networks and then carries out a functional enrichment analysis (FEA), implementing an upstream regulator analysis (URA), to score the importance of well-known biological processes with respect to the studied cancer type. Finally, by means of random forests, MoonlightR predicts two specific roles for the candidate driver genes: i) tumor suppressor genes (TSGs) and ii) oncogenes (OCGs). As a consequence, this methodology not only identifies genes playing a dual role (e.g. TSG in one cancer type and OCG in another) but also helps elucidate the biological processes underlying their specific roles. In particular, MoonlightR can be used to discover OCGs and TSGs in the same cancer type. This may help answer the question of whether some genes change role between early stages (I, II) and late stages (III, IV) in breast cancer. In the future, this analysis could be useful to determine the causes of different resistances to chemotherapeutic treatments. |
Authors: | Antonio Colaprico [aut], Catharina Olsen [aut], Matthew H. Bailey [aut], Gabriel J. Odom [aut], Thilde Terkelsen [aut], Mona Nourbakhsh [aut], Astrid Saksager [aut], Tiago C. Silva [aut], André V. Olsen [aut], Laura Cantini [aut], Andrei Zinovyev [aut], Emmanuel Barillot [aut], Houtan Noushmehr [aut], Gloria Bertoli [aut], Isabella Castiglioni [aut], Claudia Cava [aut], Gianluca Bontempi [aut], Xi Steven Chen [aut], Elena Papaleo [aut], Matteo Tiberti [cre, aut] |
Maintainer: | Matteo Tiberti <[email protected]> |
License: | GPL (>= 3) |
Version: | 1.33.0 |
Built: | 2024-11-18 05:44:41 UTC |
Source: | https://github.com/bioc/MoonlightR |
A data set containing the following data:
data(dataFilt)
A 13742x20 matrix
dataFilt matrix with 13742 rows (genes) and 20 columns (samples identified by their TCGA barcodes: 10 TP, 10 NT)
a 13742x20 matrix
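As a minimal sketch of how the TP and NT samples can be told apart from the column names: in a TCGA barcode, characters 14-15 hold the sample type code ("01" for primary tumor, "11" for solid tissue normal). The barcodes below are made-up placeholders, not columns of dataFilt; with the real data one would start from colnames(dataFilt).

```r
# Made-up TCGA-style barcodes for illustration only;
# with the real data set use: barcodes <- colnames(dataFilt)
barcodes <- c(
  "TCGA-XX-0001-01A-11R-0000-07",
  "TCGA-XX-0002-01A-11R-0000-07",
  "TCGA-XX-0003-11A-11R-0000-07",
  "TCGA-XX-0004-11B-11R-0000-07"
)

# Characters 14-15 of the barcode encode the sample type
sample_code <- substr(barcodes, 14, 15)

# Map the two codes used in this data set to TP / NT labels
sample_type <- ifelse(sample_code == "01", "TP",
                      ifelse(sample_code == "11", "NT", "other"))

# Count samples per group
print(table(sample_type))
```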
output from GRN function
data(dataGRN)
A large list of 2 elements
dataGRN list of 2 elements (miTFGenes, maxmi) returned by the GRN function
a large list of 2 elements
A data set containing the following data:
data(dataURA)
A data frame with 100 rows and 2 variables
dataURA matrix with 100 rows (genes) and 2 columns "apoptosis" "proliferation of cells"
a 100x2 matrix
A data set containing the following data:
data(DEGsmatrix)
A 3502x5 matrix
DEGsmatrix matrix with 3502 rows (genes) and five columns "logFC" "logCPM" "LR" "PValue" "FDR"
the 3502x5 matrix
A data set containing the following data:
data(DiseaseList)
A list of 101 matrices
DiseaseList list for 101 biological processes, each containing a matrix with five columns: ID, Genes.in.dataset, Prediction based on expression direction, Log ratio, Findings
list of 101 matrices
This function carries out the differential phenotypes analysis
DPA( dataType, dataFilt, dataConsortium = "TCGA", fdr.cut = 0.01, logFC.cut = 1, diffmean.cut = 0.25, samplesType, colDescription, gset, gsetFile = "gsetFile.RData" )
dataType |
selected data type, e.g. "Gene expression" |
dataFilt |
obtained from getDataTCGA |
dataConsortium |
is TCGA or GEO, default TCGA |
fdr.cut |
threshold used to filter DEGs according to their corrected p-value (FDR) |
logFC.cut |
threshold used to filter DEGs according to their logFC |
diffmean.cut |
diffmean.cut for DMR |
samplesType |
samplesType |
colDescription |
colDescription |
gset |
gset |
gsetFile |
gsetFile |
result matrix from differential phenotype analysis
dataDEGs <- DPA(dataFilt = dataFilt, dataType = "Gene expression")
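The role of fdr.cut and logFC.cut can be sketched in base R on a toy table that reuses the DEGsmatrix column names. The values are made up, and the exact comparison DPA applies internally (strict inequality, absolute logFC) is an assumption here, not something stated in this manual.

```r
# Toy table with the same column names as DEGsmatrix; all values are made up
toy_degs <- data.frame(
  logFC  = c(2.3, -1.8, 0.4, -0.2, 1.5),
  logCPM = c(5.1, 4.2, 6.0, 3.3, 7.4),
  LR     = c(40.2, 31.5, 1.1, 0.3, 6.0),
  PValue = c(1e-9, 2e-7, 0.30, 0.60, 0.014),
  FDR    = c(1e-8, 1e-6, 0.45, 0.70, 0.03),
  row.names = c("geneA", "geneB", "geneC", "geneD", "geneE")
)

# Defaults taken from the DPA usage line
fdr.cut   <- 0.01
logFC.cut <- 1

# Assumed rule: keep genes that pass both the FDR and the |logFC| threshold
keep     <- toy_degs$FDR < fdr.cut & abs(toy_degs$logFC) > logFC.cut
filtered <- toy_degs[keep, ]

print(rownames(filtered))
```

With these toy values only geneA and geneB pass both cut-offs; geneE has a large enough logFC but fails the FDR threshold.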
A data set containing the following data:
data(EAGenes)
A 20038x5 matrix
EAGenes matrix with 20038 rows (genes) and five columns "ID" "Gene" "Description" "Location" "Family"
a 20038x5 matrix
This function carries out the functional enrichment analysis (FEA)
FEA(BPname = NULL, DEGsmatrix)
BPname |
BPname biological process such as "proliferation of cells"; NULL (the default in the usage above) if FEA should be carried out for all 101 biological processes |
DEGsmatrix |
DEGsmatrix output from DEA such as dataDEGs |
matrix from FEA
dataDEGs <- DPA(dataFilt = dataFilt, dataType = "Gene expression")
dataFEA <- FEA(DEGsmatrix = dataDEGs)
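This manual does not spell out which statistic FEA computes. As a generic illustration of gene-set enrichment (not necessarily what FEA does), the overlap between a list of DEGs and the genes annotated to one biological process can be tested with Fisher's exact test from base R. All gene names and set sizes below are hypothetical.

```r
# Hypothetical gene universe, DEG list and biological-process annotation
universe <- paste0("gene", 1:100)
degs     <- universe[1:20]              # 20 differentially expressed genes
bp_genes <- universe[c(1:10, 51:60)]    # 20 annotated genes, 10 of them DEGs

# 2x2 contingency table: DEG membership vs. annotation membership
in_deg <- universe %in% degs
in_bp  <- universe %in% bp_genes
tab    <- table(DEG = in_deg, BP = in_bp)

# One-sided test: are annotated genes over-represented among the DEGs?
res <- fisher.test(tab, alternative = "greater")

print(tab)
print(res$p.value)
```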
A character vector of GDC projects:
data(GDCprojects)
A character vector of 39 elements
character vector for GDC projects.
character vector of 39 elements
A data set containing the following data:
data(geneInfo)
A data frame with 20531 rows and 3 variables
geneInfo matrix with 20531 rows (genes) and 3 columns "geneLength" "gcContent" "chr"
a 20531x3 matrix
GEO_TCGAtab a 18x12 matrix that provides the GEO data set we matched to one of the 18 given TCGA cancer types
data(GEO_TCGAtab)
A 101x3 matrix
a 101x3 matrix
This function retrieves and prepares GEO data
getDataGEO(GEOobject = "GSE39004", platform = "GPL6244", TCGAtumor = NULL)
GEOobject |
GEO series accession, e.g. "GSE39004" |
platform |
GEO platform accession, e.g. "GPL6244" |
TCGAtumor |
tumor name |
return GEO gset
## Not run:
dataGEO <- getDataGEO(GEOobject = "GSE20347", platform = "GPL571")
## End(Not run)
This function retrieves and prepares TCGA data
getDataTCGA( cancerType, dataType, directory, cor.cut = 0.6, qnt.cut = 0.25, nSample, stage = "ALL", subtype = 0, samples = NULL )
cancerType |
select cancer type for which analysis should be run. panCancer for all available cancer types in TCGA. Defaults to panCancer |
dataType |
data type, such as gene expression, CNV, methylation, etc. |
directory |
Directory/Folder where the data was downloaded. Default: GDCdata |
cor.cut |
cor.cut |
qnt.cut |
qnt.cut |
nSample |
nSample |
stage |
stage |
subtype |
subtype |
samples |
samples |
returns filtered TCGA data
## Not run:
dataFilt <- getDataTCGA(cancerType = "LUAD", dataType = "Gene expression", directory = "data", nSample = 4)
## End(Not run)
This function carries out the gene regulatory network inference using parmigene
GRN( TFs, DEGsmatrix, DiffGenes = FALSE, normCounts, kNearest = 3, nGenesPerm = 10, nBoot = 10 )
TFs |
a vector of genes. |
DEGsmatrix |
DEGsmatrix output from DEA such as dataDEGs |
DiffGenes |
if TRUE, consider only differentially expressed genes in the GRN |
normCounts |
is a matrix of gene expression with genes in rows and samples in columns. |
kNearest |
the number of nearest neighbors to consider to estimate the mutual information. Must be less than the number of columns of normCounts. |
nGenesPerm |
nGenesPerm |
nBoot |
nBoot |
an adjacency matrix
dataDEGs <- DEGsmatrix
dataGRN <- GRN(TFs = rownames(dataDEGs)[1:100], DEGsmatrix = dataDEGs, DiffGenes = TRUE, normCounts = dataFilt)
This function carries out the GSEA enrichment analysis.
GSEA(DEGsmatrix, top, plot = FALSE)
DEGsmatrix |
DEGsmatrix output from DEA such as dataDEGs |
top |
is the number of top BP to plot |
plot |
if TRUE, return a GSEA plot |
return GSEA result
dataDEGs <- DEGsmatrix
# dataFEA <- GSEA(DEGsmatrix = dataDEGs)
A data set containing the following data:
data(knownDriverGenes)
A 101x3 matrix
TSG known tumor suppressor genes
OCG known oncogenes
a 101x3 matrix
A list containing the following data:
data(listMoonlight)
A Large list with 5 elements
listMoonlight output from moonlight's pipeline containing dataDEGs, dataURA, listCandidates
output from moonlight pipeline
This function carries out the literature phenotype analysis (LPA)
LPA(dataDEGs, BP, BPlist)
dataDEGs |
is output from DEA |
BP |
is biological process |
BPlist |
is list of genes annotated in BP |
table with the number of PubMed publications that affect, increase or decrease genes annotated in BP
data(DEGsmatrix)
BPselected <- c("apoptosis")
BPannotations <- DiseaseList[[match(BPselected, names(DiseaseList))]]$ID
moonlight is a tool for the identification of cancer driver genes. This function wraps the different steps of the complete analysis workflow, providing different solutions:
MoonlightR::FEA
MoonlightR::URA
MoonlightR::PIA
moonlight( cancerType = "panCancer", dataType = "Gene expression", directory = "GDCdata", BPname = NULL, cor.cut = 0.6, qnt.cut = 0.25, Genelist = NULL, fdr.cut = 0.01, logFC.cut = 1, corThreshold = 0.6, kNearest = 3, nGenesPerm = 10, DiffGenes = FALSE, nBoot = 100, nTF = NULL, nSample = NULL, thres.role = 0, stage = NULL, subtype = 0, samples = NULL )
cancerType |
select cancer type for which analysis should be run. panCancer for all available cancer types in TCGA. Defaults to panCancer |
dataType |
dataType |
directory |
directory |
BPname |
biological processes to use, if NULL: all processes will be used in analysis, RF for candidate; if not NULL the candidates for these processes will be determined (no learning) |
cor.cut |
cor.cut Threshold |
qnt.cut |
qnt.cut Threshold |
Genelist |
Genelist |
fdr.cut |
fdr.cut Threshold |
logFC.cut |
logFC.cut Threshold |
corThreshold |
corThreshold |
kNearest |
kNearest |
nGenesPerm |
nGenesPerm |
DiffGenes |
DiffGenes |
nBoot |
nBoot |
nTF |
nTF |
nSample |
nSample |
thres.role |
thres.role |
stage |
stage |
subtype |
subtype |
samples |
samples |
table with cancer driver genes TSG and OCG.
dataDEGs <- DPA(dataFilt = dataFilt, dataType = "Gene expression") # to change with moonlight
MoonlightR is a package designed for the identification of cancer driver genes. Please see the documentation on our Bioconductor page for more details: https://www.bioconductor.org/packages/release/bioc/html/MoonlightR.html
If you experience issues with the package, please open an Issue on our GitHub repository: https://github.com/ELELAB/MoonlightR
If you use this package in your research, please cite this paper: https://doi.org/10.1038/s41467-019-13803-0
This function visualizes the Moonlight results in a circos plot
plotCircos( listMoonlight, listMutation = NULL, additionalFilename = NULL, intensityColOCG = 0.5, intensityColTSG = 0.5, intensityColDual = 0.5, fontSize = 1 )
listMoonlight |
output of the moonlight function |
listMutation |
listMutation |
additionalFilename |
additionalFilename |
intensityColOCG |
intensityColOCG |
intensityColTSG |
intensityColTSG |
intensityColDual |
intensityColDual |
fontSize |
fontSize |
no return value, plot is saved
plotCircos(listMoonlight = listMoonlight, additionalFilename = "_ncancer5")
This function visualizes the results of the functional enrichment analysis (FEA) as a barplot
plotFEA( dataFEA, topBP = 10, additionalFilename = NULL, height, width, offsetValue = 5, angle = 90, xleg = 35, yleg = 5, titleMain, minY = -5, maxY = 10, mycols = c("#8DD3C7", "#FFFFB3", "#BEBADA") )
dataFEA |
dataFEA |
topBP |
topBP |
additionalFilename |
additionalFilename |
height |
Figure height |
width |
Figure width |
offsetValue |
offsetValue |
angle |
angle |
xleg |
xleg |
yleg |
yleg |
titleMain |
title of the plot |
minY |
minY |
maxY |
maxY |
mycols |
colors to use for the plot |
no return value, FEA result is plotted
dataFEA <- FEA(DEGsmatrix = DEGsmatrix)
plotFEA(dataFEA = dataFEA, additionalFilename = "_example", height = 20, width = 10)
This function visualizes the GRN as a hive plot
plotNetworkHive(dataGRN, namesGenes, thres, additionalFilename = NULL)
dataGRN |
output of the GRN function |
namesGenes |
list TSG and OCG to define axes |
thres |
threshold of edges to be included |
additionalFilename |
additionalFilename |
no return value; the hive plot is drawn
data(knownDriverGenes)
data(dataGRN)
plotNetworkHive(dataGRN = dataGRN, namesGenes = knownDriverGenes, thres = 0.55)
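The effect of thres ("threshold of edges to be included") can be sketched on a toy weight matrix. The matrix, its names and the strict comparison are assumptions made for illustration; how plotNetworkHive reads the weights from dataGRN is not described here.

```r
# Toy regulator-by-target weight matrix (e.g. mutual information); values are made up
mi <- matrix(
  c(0.10, 0.60, 0.70,
    0.58, 0.20, 0.90),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("TF1", "TF2"), c("g1", "g2", "g3"))
)

thres <- 0.55

# Positions of all entries above the threshold (assumed strict comparison)
idx <- which(mi > thres, arr.ind = TRUE)

# Edge list: one row per retained regulator-target pair
edges <- data.frame(
  from   = rownames(mi)[idx[, "row"]],
  to     = colnames(mi)[idx[, "col"]],
  weight = mi[idx]
)

print(edges)
```

Raising thres keeps fewer, stronger edges, which is usually what makes a dense network readable in a plot.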
This function visualizes the URA in a heatmap
plotURA(dataURA, additionalFilename = "URAplot")
dataURA |
output of the URA function |
additionalFilename |
figure name |
heatmap
data(dataURA)
dataDual <- PRA(dataURA = dataURA, BPname = c("apoptosis","proliferation of cells"), thres.role = 0)
TSGs_genes <- names(dataDual$TSG)
OCGs_genes <- names(dataDual$OCG)
plotURA(dataURA = dataURA[c(TSGs_genes, OCGs_genes),], additionalFilename = "_example")
This function carries out the pattern recognition analysis
PRA(dataURA, BPname, thres.role = 0)
dataURA |
output of the URA function |
BPname |
BPname |
thres.role |
thres.role |
returns list of TSGs and OCGs when biological processes are provided, otherwise a randomForest based classifier that can be used on new data
data(dataURA) dataDual <- PRA(dataURA = dataURA, BPname = c("apoptosis","proliferation of cells"), thres.role = 0)
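When biological processes are provided, the role assignment can be pictured as a sign pattern on the URA scores. The rule below is an assumption made for illustration, not taken from the PRA source: a gene is called a TSG if its apoptosis score is above thres.role and its proliferation score is below -thres.role, and an OCG in the opposite case. The scores are made up.

```r
# Toy URA-like score matrix with the two column names used by dataURA; values are made up
toy_ura <- matrix(
  c( 3, -2,
    -4,  5,
     1,  2,
     2, -0.5),
  ncol = 2, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC", "geneD"),
                  c("apoptosis", "proliferation of cells"))
)

thres.role <- 0

apo <- toy_ura[, "apoptosis"]
pro <- toy_ura[, "proliferation of cells"]

# Assumed rule: TSGs push apoptosis up and proliferation down, OCGs do the reverse
tsg <- names(which(apo >  thres.role & pro < -thres.role))
ocg <- names(which(apo < -thres.role & pro >  thres.role))

print(list(TSG = tsg, OCG = ocg))
```

Under this rule geneC, which scores positive for both processes, gets no role, and a larger thres.role would drop weakly scored genes such as geneD.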
A data set containing the following data:
data(tabGrowBlock)
A 101x3 matrix
tabGrowBlock matrix that defines, for each of the 101 biological processes, whether the process is growing or blocking cancer development
a 101x3 matrix
This function carries out the upstream regulator analysis
URA(dataGRN, DEGsmatrix, BPname, nCores = 1)
dataGRN |
output of the GRN function |
DEGsmatrix |
output of the DPA function |
BPname |
biological processes |
nCores |
number of cores to use |
an adjacency matrix
dataDEGs <- DEGsmatrix
dataGRN <- GRN(TFs = rownames(dataDEGs)[1:100], DEGsmatrix = dataDEGs, DiffGenes = TRUE, normCounts = dataFilt)
dataURA <- URA(dataGRN = dataGRN, DEGsmatrix = dataDEGs, BPname = c("apoptosis", "proliferation of cells"))