Title: | Identifying Genes Overexpressed in Subsets of Tumors from Tumor-Normal mRNA Expression Data |
---|---|
Description: | This package helps identify mRNAs that are overexpressed in subsets of tumors relative to normal tissue. Ideal inputs would be paired tumor-normal data from the same tissue from many patients (>15 pairs). This unsupervised approach relies on the observation that oncogenes are characteristically overexpressed in only a subset of tumors in the population, and may help identify oncogene candidates purely based on differences in mRNA expression between previously unknown subtypes. |
Authors: | Daniel Pique, John Greally, Jessica Mar |
Maintainer: | Daniel Pique <[email protected]> |
License: | GPL-3 |
Version: | 1.29.0 |
Built: | 2024-10-30 09:08:31 UTC |
Source: | https://github.com/bioc/oncomix |
These RNA-sequencing expression data were obtained from the Cancer Genome Atlas project (now the Genomic Data Commons (GDC)). The sequencing data were generated from adjacent normal breast tissue data from 113 patients with breast cancer. Quantification of RNA expression values was performed using standard GDC pipelines. The expression values are reported in transcripts per million reads. Out of an initial 73,599 RNA transcripts, 700 are included as part of this dataset. These 700 transcripts represent a random subset of the transcripts with at least 20 samples. Rows contain anonymized patient identifiers, while columns contain UCSC gene symbols.
Daniel Pique [email protected]
These RNA-sequencing expression data were obtained from the Cancer Genome Atlas project (now the Genomic Data Commons (GDC)). The sequencing data were generated from breast carcinoma samples from 113 patients. Quantification of RNA expression values was performed using standard GDC pipelines. The expression values are reported in transcripts per million reads. Out of an initial 73,599 RNA transcripts, 700 are included as part of this dataset. These 700 transcripts represent a random subset of the transcripts with at least 20 samples. Rows contain anonymized patient identifiers, while columns contain UCSC gene symbols.
Daniel Pique [email protected]
This function allows you to generate the parameters for two 2-component Gaussian mixture models with equal variances from 2 matrices of data with a priori labels (e.g., tumor vs. normal). This application was originally intended for matrices of gene expression data measured under 2 conditions.
mixModelParams(exprNml, exprTum)
exprNml |
A dataframe (S3 or S4), matrix, or SummarizedExperiment object containing normal data with patients as columns and genes as rows. |
exprTum |
A dataframe (S3 or S4), matrix, or SummarizedExperiment object containing tumor data with patients as columns and genes as rows. |
Returns a dataframe with n rows and 12 columns, where n is the number of genes. Each row contains the 12 mixture model parameters for one gene.
exprNml <- as.data.frame(matrix(data=rgamma(n=150, shape=2, rate=2),
    nrow=10, ncol=15))
colnames(exprNml) <- paste0("patientN", seq_len(ncol(exprNml)))
rownames(exprNml) <- paste0("gene", seq_len(nrow(exprNml)))
exprTum <- as.data.frame(matrix(data=rgamma(n=150, shape=4, rate=3),
    nrow=10, ncol=15))
colnames(exprTum) <- paste0("patientT", seq_len(ncol(exprTum)))
rownames(exprTum) <- paste0("gene", seq_len(nrow(exprTum)))
mmParams <- mixModelParams(exprNml, exprTum)
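To make the idea of a 2-component Gaussian mixture model with equal variances concrete, the following is a minimal sketch of an expectation-maximization (EM) fit in base R for a single gene. It is an illustration only and is not presented as the package's internal implementation; the function name `fitTwoCompEqualVar`, its arguments, and the simulated data are all assumptions made for this example.

```r
# Minimal EM fit of a 2-component Gaussian mixture with a shared variance.
# Illustration only: the function name and arguments are hypothetical.
fitTwoCompEqualVar <- function(x, maxIter = 200, tol = 1e-8) {
    # Initialise the component means from the lower and upper quartiles
    mu <- as.numeric(quantile(x, c(0.25, 0.75)))
    sigma2 <- var(x)
    pi1 <- 0.5
    logLikOld <- -Inf
    for (iter in seq_len(maxIter)) {
        # E-step: posterior probability that each point is in component 1
        d1 <- pi1 * dnorm(x, mu[1], sqrt(sigma2))
        d2 <- (1 - pi1) * dnorm(x, mu[2], sqrt(sigma2))
        r1 <- d1 / (d1 + d2)
        # Log-likelihood under the parameters used in this E-step
        logLik <- sum(log(d1 + d2))
        # M-step: update mixing proportion, means, and the shared variance
        pi1 <- mean(r1)
        mu <- c(sum(r1 * x) / sum(r1), sum((1 - r1) * x) / sum(1 - r1))
        sigma2 <- sum(r1 * (x - mu[1])^2 +
            (1 - r1) * (x - mu[2])^2) / length(x)
        # Stop once the log-likelihood has stopped changing
        if (abs(logLik - logLikOld) < tol) break
        logLikOld <- logLik
    }
    list(mu1 = mu[1], mu2 = mu[2], var = sigma2, pi1 = pi1)
}

# Simulated expression values for one gene (hypothetical data)
set.seed(1)
x <- c(rnorm(60, mean = 3, sd = 1), rnorm(40, mean = 7, sd = 1))
fit <- fitTwoCompEqualVar(x)
fit
```

Fitting such a model once to the normal samples and once to the tumor samples of a gene gives two sets of means, one variance each, and one mixing proportion each, which is the kind of per-gene summary described above.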
This function allows you to generate a plot of a 2-component Gaussian mixture model.
oncoMixBimodal(means = c(3, 7))
means |
Set the values for the difference between parameter means |
Returns a ggplot object that shows a 2-component Gaussian mixture model
oncoMixBimodal(means=c(3,7))
oncoMixBimodal(means=c(3,10))
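For readers who want to see what a 2-component Gaussian mixture density looks like numerically, here is a small base R sketch. It does not reproduce the ggplot object returned by the function above; the helper `mixDensity` and its `sd` and `prop` arguments are hypothetical and chosen only for illustration.

```r
# Density of a 2-component Gaussian mixture with a shared standard deviation.
# `sd` and `prop` are hypothetical settings chosen for illustration.
mixDensity <- function(x, means = c(3, 7), sd = 1, prop = 0.5) {
    prop * dnorm(x, means[1], sd) + (1 - prop) * dnorm(x, means[2], sd)
}

# Evaluate the density on a grid and draw it with base graphics
xGrid <- seq(-2, 12, length.out = 500)
dens <- mixDensity(xGrid, means = c(3, 7))
plot(xGrid, dens, type = "l", xlab = "expression", ylab = "density")
```

Moving the two means further apart, for example `means = c(3, 10)`, separates the two modes more clearly.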
This function allows you to generate a plot of the statistical model for an idealized oncogene candidate.
oncoMixIdeal(means = c(3, 7))
means |
Set the difference between parameter means for the overexpressed (oe) group. Defaults to c(3,7) |
Returns a ggplot object that shows the statistical model for an idealized/theoretical oncogene candidate mRNA that is overexpressed in a subset of tumors
oncoMixIdeal(means=c(3,10))
oncoMixIdeal(means=c(2,18.5))
This function allows you to generate a schematic of the assumptions of a traditional differential expression (DE) experiment between two known groups.
oncoMixTraditionalDE(means = c(3, 7))
means |
Set the values for the difference between parameter means |
Returns a ggplot object that shows the traditional method (2 sample t-test) for mRNA differential expression.
oncoMixTraditionalDE(means=c(3,7))
oncoMixTraditionalDE(means=c(3,10))
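The traditional approach referred to above, a 2-sample t-test between two known groups, can be sketched in base R as follows. The simulated values and group sizes are hypothetical and only serve to show the comparison that the schematic depicts.

```r
set.seed(42)
# Simulated expression for one gene in two known groups (hypothetical values)
nml <- rnorm(15, mean = 3, sd = 1)
tum <- rnorm(15, mean = 7, sd = 1)

# Two-sample t-test assuming equal variances in the two groups
res <- t.test(tum, nml, var.equal = TRUE)
res$estimate
res$p.value
```

This test compares the mean of all tumors against the mean of all normal samples, so it assumes that every tumor shifts in the same direction, in contrast to the mixture model approach, which allows only a subset of tumors to be overexpressed.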
This function allows you to plot a histogram of gene expression values from tumor and adjacent normal tissue, with the option of including the best-fitting Gaussian curve.
plotGeneHist(mmParams, exprNml, exprTum, isof)
mmParams |
The output from the mixModelParams function. |
exprNml |
A dataframe (S3 or S4), matrix, or SummarizedExperiment object containing normal data with patients as columns and genes as rows. |
exprTum |
A dataframe (S3 or S4), matrix, or SummarizedExperiment object containing tumor data with patients as columns and genes as rows. |
isof |
The gene isoform to visualize |
Returns a histogram of the gene expression values from the two groups.
exprNml <- as.data.frame(matrix(data=rgamma(n=150, shape=2, rate=2),
    nrow=10, ncol=15))
colnames(exprNml) <- paste0("patientN", seq_len(ncol(exprNml)))
rownames(exprNml) <- paste0("gene", seq_len(nrow(exprNml)))
exprTum <- as.data.frame(matrix(data=rgamma(n=150, shape=4, rate=3),
    nrow=10, ncol=15))
colnames(exprTum) <- paste0("patientT", seq_len(ncol(exprTum)))
rownames(exprTum) <- paste0("gene", seq_len(nrow(exprTum)))
mmParams <- mixModelParams(exprNml, exprTum)
isof <- rownames(mmParams)[1]
plotGeneHist(mmParams, exprNml, exprTum, isof)
These data were downloaded in September 2017 from the URL listed below. They represent a mapping between gene symbols in ongene, an oncogene database curated from the scientific literature, and gene identifiers from the University of California, Santa Cruz's genomic database.
Min Zhao [email protected]
http://ongene.bioinfo-minzhao.org/ongene_human.txt
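This function allows you to generate a scatter plot of the deltaMu2 and deltaMu1 statistics from the output of mixModelParams, which holds the parameters for two 2-component mixture models with equal variances.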
scatterMixPlot(mmParams, selIndThresh = 1, geneLabels = NULL)
mmParams |
The output from the mixModelParams function. Will utilize the deltaMu2 and deltaMu1 columns. |
selIndThresh |
This is the selectivity index threshold to use. All genes with SI values above this threshold will be highlighted in purple. Specify either selIndThresh or geneLabels (not both simultaneously). |
geneLabels |
A character vector of gene names used to label the genes with that name on the scatter plot. Specify either selIndThresh or geneLabels (not both simultaneously). |
Returns a ggplot scatter object that can be plotted
exprNml <- as.data.frame(matrix(data=rgamma(n=150, shape=2, rate=2),
    nrow=10, ncol=15))
colnames(exprNml) <- paste0("patientN", seq_len(ncol(exprNml)))
rownames(exprNml) <- paste0("gene", seq_len(nrow(exprNml)))
exprTum <- as.data.frame(matrix(data=rgamma(n=150, shape=4, rate=3),
    nrow=10, ncol=15))
colnames(exprTum) <- paste0("patientT", seq_len(ncol(exprTum)))
rownames(exprTum) <- paste0("gene", seq_len(nrow(exprTum)))
mmParams <- mixModelParams(exprNml, exprTum)
scatterMixPlot(mmParams)
This function allows you to subset genes that are above pre-specified quantiles and that most closely resemble the distribution of oncogenes.
topGeneQuants(mmParams, deltMu2Thr = 90, deltMu1Thr = 10, siThr = 0.99)
mmParams |
The output from the mixModelParams function. |
deltMu2Thr |
The percentile threshold for the deltaMu2 statistic. All genes exceeding this percentile threshold will be selected. |
deltMu1Thr |
The percentile threshold for the deltaMu1 statistic. All genes exceeding this percentile threshold will be selected. |
siThr |
The threshold for the selectivity index statistic (between 0-1). All genes exceeding this threshold will be selected. |
Returns a dataframe containing all genes meeting the prespecified thresholds.
exprNml <- as.data.frame(matrix(data=rgamma(n=150, shape=2, rate=2),
    nrow=10, ncol=15))
colnames(exprNml) <- paste0("patientN", seq_len(ncol(exprNml)))
rownames(exprNml) <- paste0("gene", seq_len(nrow(exprNml)))
exprTum <- as.data.frame(matrix(data=rgamma(n=150, shape=4, rate=3),
    nrow=10, ncol=15))
colnames(exprTum) <- paste0("patientT", seq_len(ncol(exprTum)))
rownames(exprTum) <- paste0("gene", seq_len(nrow(exprTum)))
mmParams <- mixModelParams(exprNml, exprTum)
topGeneQuants(mmParams)
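The percentile-based selection described above can be sketched in base R with `quantile()`. This is only an illustration of converting a percentile threshold into a cutoff and filtering on it; the table `geneStats` is simulated, its layout is an assumption, and the sketch covers only the deltaMu2 and selectivity index thresholds rather than the function's full logic.

```r
set.seed(7)
# Toy per-gene statistics; the column names mirror those mentioned above,
# but this table is simulated and its layout is an assumption.
geneStats <- data.frame(
    deltaMu2 = rnorm(100, mean = 1, sd = 2),
    SI = runif(100),
    row.names = paste0("gene", seq_len(100))
)

# Convert the percentile threshold (0-100) into a cutoff on the statistic
deltMu2Cut <- quantile(geneStats$deltaMu2, probs = 90 / 100)

# Keep genes exceeding both the percentile cutoff and the SI threshold
keep <- geneStats$deltaMu2 > deltMu2Cut & geneStats$SI > 0.5
topGenes <- geneStats[keep, ]
topGenes
```

Because the cutoff is a percentile of the observed statistic, the number of genes passing it scales with the size of the input rather than with a fixed effect size.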