Title: | Deconvolution of Heterogeneous Tissue Samples for mRNA-Seq data |
---|---|
Description: | DeconRNASeq is an R package for deconvolution of heterogeneous tissues based on mRNA-Seq data. It models expression levels from heterogeneous cell populations in mRNA-Seq as the weighted average of expression from the constituent cell types and predicts the cell type proportions of single expression profiles. |
Authors: | Ting Gong <[email protected]> Joseph D. Szustakowski <[email protected]> |
Maintainer: | Ting Gong <[email protected]> |
License: | GPL-2 |
Version: | 1.49.0 |
Built: | 2024-11-19 03:24:48 UTC |
Source: | https://github.com/bioc/DeconRNASeq |
Main function "DeconRNASeq" implements a nonnegative decomposition by quadratic programming as datasets = signature*A, where "datasets" is the originally measured data matrix (e.g. genes by samples), "signature" is the signature matrix (genes by cell types) and "A" is the cell type concentration matrix (cell types by samples).
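The forward model above can be illustrated with a small base R sketch. All values and names below (the toy `signature`, `A`, the gene and sample labels) are made up for illustration, and the sum-to-one constraint on each column of `A` is an assumption that follows from reading the entries as fractions.

```r
# Hypothetical toy signature: 5 genes by 2 cell types.
signature <- matrix(c(10,  0,
                       8,  1,
                       0, 12,
                       1,  9,
                       5,  5),
                    ncol = 2, byrow = TRUE,
                    dimnames = list(paste0("gene", 1:5), c("typeA", "typeB")))

# Hypothetical concentration matrix: 2 cell types by 3 samples.
# Each column holds nonnegative fractions that sum to 1 (assumption).
A <- cbind(s1 = c(0.2, 0.8), s2 = c(0.5, 0.5), s3 = c(0.9, 0.1))
rownames(A) <- colnames(signature)

# Forward model: the mixture matrix is genes by samples.
datasets <- signature %*% A
dim(datasets)
```

Deconvolution is the reverse step: given `datasets` and `signature`, recover a nonnegative `A`.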
Package: | DeconRNASeq |
Type: | Package |
Version: | 1.0 |
Date: | 2012-05-25 |
License: | GPL version 2 or later |
DeconRNASeq(datasets, signature)
Ting Gong [email protected] Joseph D. Szustakowski [email protected]
Gong, T., et al. (2011) Optimal Deconvolution of Transcriptional Profiling Data Using Quadratic Programming with Application to Complex Clinical Blood Samples, PLoS One, 6, e27156.
A data frame providing the expression profiles of the GSE19830 microarray samples.
all.datasets
A matrix of expression values for the GSE19830 microarray samples: the first three columns correspond to liver and the last three columns correspond to brain.
Ting Gong [email protected] Joseph D. Szustakowski [email protected]
data(rat_liver_brain)
array.proportions: a data frame providing the fractions of liver and brain from the GSE19830 microarray study
array.proportions
a matrix whose rows are the mixing samples' names and whose columns are the fractions of pure liver and brain tissue
Ting Gong [email protected] Joseph D. Szustakowski [email protected]
data(rat_liver_brain)
array.signatures: a data frame providing the expression values from rat pure liver and brain samples, each with three replicates
array.signatures
a data matrix with 30 expressions from rat pure liver and brain tissues
Ting Gong [email protected] Joseph D. Szustakowski [email protected]
data(rat_liver_brain)
Draws a plot of the condition numbers of signature matrices of all sizes, from a handful of genes at one extreme to the whole signature at the other.
condplot(step, cond)
step |
an array with the number of genes used to calculate the condition numbers of signature matrices, default stepwise = 20 |
cond |
an array with the condition numbers of signature matrices |
a plot for the condition numbers of signature matrices
Ting Gong [email protected] Joseph D. Szustakowski [email protected]
Gong, T., et al. (2011) Optimal Deconvolution of Transcriptional Profiling Data Using Quadratic Programming with Application to Complex Clinical Blood Samples, PLoS One, 6, e27156.
library(DeconRNASeq)
## toy data example:
step <- seq(20, 1000, by = 20)  # every 20 genes
## cell type-specific gene expression matrix:
x.signature <- matrix(rexp(2000), ncol = 2)
sig.cond <- sapply(step, function(x) kappa(scale(x.signature[1:x, ])))
condplot(step, sig.cond)
A data frame providing the RPKM of seven mixing samples.
datasets
A data frame with 31979 genes' expression on the 7 mixing samples: reads.1, reads.2, reads.3, reads.4, reads.5, reads.6, reads.7
Ting Gong [email protected] Joseph D. Szustakowski [email protected]
data(liver_kidney)
Estimates confidence intervals for the proportions predicted by deconvolution through bootstrapping.
decon.bootstrap(data.set, possible.signatures, n.sig, n.iter)
data.set |
the data object for mixing samples |
possible.signatures |
a data frame providing the expression values from pure tissue samples |
n.sig |
the number of genes/transcripts used for estimation of proportions from our deconvolution |
n.iter |
the number of bootstraps for our deconvolution |
A three-dimensional array storing the means and 95% confidence intervals
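The bootstrapping idea can be sketched in base R for the two-tissue case. This is not the code of decon.bootstrap: the resampling scheme (drawing `n_sig` genes with replacement), the helper `estimate_fraction`, and the simulated data are all assumptions made for illustration.

```r
set.seed(7)

# Hypothetical two-tissue data: 300 candidate signature genes, true liver fraction 0.4.
n_genes <- 300
sig <- cbind(liver = rexp(n_genes, rate = 0.1), kidney = rexp(n_genes, rate = 0.1))
mixture <- 0.4 * sig[, "liver"] + 0.6 * sig[, "kidney"] + rnorm(n_genes, sd = 0.5)

# Hypothetical helper: least-squares fraction of the first tissue,
# clipped to [0, 1]; the second fraction is one minus the first.
estimate_fraction <- function(m, s) {
  d <- s[, 1] - s[, 2]
  min(max(sum((m - s[, 2]) * d) / sum(d * d), 0), 1)
}

n_sig  <- 50   # genes drawn per bootstrap replicate
n_iter <- 200  # number of bootstrap replicates
boot <- replicate(n_iter, {
  idx <- sample(n_genes, n_sig, replace = TRUE)
  estimate_fraction(mixture[idx], sig[idx, , drop = FALSE])
})

# Bootstrap mean and a percentile-based 95% interval for the liver fraction.
c(mean = mean(boot), quantile(boot, c(0.025, 0.975)))
```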
Ting Gong [email protected] Joseph D. Szustakowski [email protected]
Gong, T., et al. (2011) Optimal Deconvolution of Transcriptional Profiling Data Using Quadratic Programming with Application to Complex Clinical Blood Samples, PLoS One, 6, e27156.
This function predicts the proportions of constituent cell types from gene expression data generated by RNA-Seq. It performs nonnegative quadratic programming to obtain a globally optimized, per-sample solution for the constituent cell types.
DeconRNASeq(datasets, signatures, proportions = NULL, checksig = FALSE, known.prop = FALSE, use.scale = TRUE, fig = TRUE)
datasets |
measured mixture data matrix, genes (transcripts) by samples, e.g. gene counts. The user can choose the appropriate measure: counts, RPKM, FPKM, etc. |
signatures |
signature matrix from different tissue/cell types, genes (transcripts) by cell types. For gene counts, the user can choose the appropriate measure: counts, RPKM, FPKM, etc. |
proportions |
proportion matrix from different tissue/cell types. |
checksig |
whether the condition number of the signature matrix should be checked, default = FALSE |
known.prop |
whether the proportions of cell types are known in advance (for proof of concept), default = FALSE |
use.scale |
whether the data should be centered or scaled, default = TRUE |
fig |
whether to generate the scatter plots of the estimated cell fractions vs. the true proportions of cell types, default = TRUE |
The data in the originally measured mixture sample matrix (datasets) and in the reference matrix (signatures) need to be non-negative. We recommend deconvolving without log-scaling the data.
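For intuition, the constrained least-squares problem has a closed form when there are only two cell types. The sketch below is not the package's solver: the helper `two_type_fraction` and the toy vectors are hypothetical, and the requirement that the two fractions sum to 1 is an assumption. Under that assumption the problem is a one-dimensional convex quadratic, so clipping the unconstrained minimiser to [0, 1] gives the constrained optimum.

```r
# Hypothetical helper, not part of DeconRNASeq: fraction of cell type 1
# in a two-type mixture, with fractions nonnegative and summing to 1.
two_type_fraction <- function(mixture, sig1, sig2) {
  d <- sig1 - sig2
  p <- sum((mixture - sig2) * d) / sum(d * d)  # unconstrained minimiser
  min(max(p, 0), 1)                            # clip to keep both fractions nonnegative
}

# Toy non-negative expression profiles on the original (non-log) scale.
sig1 <- c(10, 8, 0, 1, 5)
sig2 <- c(0, 1, 12, 9, 5)
mixture <- 0.3 * sig1 + 0.7 * sig2

p1 <- two_type_fraction(mixture, sig1, sig2)
c(type1 = p1, type2 = 1 - p1)
```

With more than two cell types there is no such shortcut, which is where a quadratic programming solver comes in.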
Function DeconRNASeq returns a list of results
out.all |
estimated cell type fraction matrix for all the mixture samples |
out.pca |
SVD-based PCA on the mixture samples, used to estimate the number of pure sources according to the cumulative R2 |
out.rmse |
averaged root mean square error (RMSE) measuring the differences between the fractions predicted by our model and the true fraction matrix for all the tissue types |
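The cumulative R2 idea behind out.pca can be sketched with base R's `svd`. This is an illustration only: the simulated data, the use of an uncentred SVD, and the 0.99 cut-off are assumptions, and the package's own preprocessing may differ.

```r
set.seed(42)

# Hypothetical data: 500 genes, 2 pure sources, 7 mixtures plus small noise.
n_genes <- 500
sources <- matrix(rexp(2 * n_genes), ncol = 2)
f <- seq(0.1, 0.9, length.out = 7)
fractions <- rbind(f, 1 - f)
mixtures <- sources %*% fractions + matrix(rnorm(n_genes * 7, sd = 0.01), ncol = 7)

# Share of the total sum of squares captured by the leading components.
sv <- svd(mixtures)$d
cum_r2 <- cumsum(sv^2) / sum(sv^2)

# Smallest number of components reaching the (assumed) 0.99 threshold.
n_sources <- which(cum_r2 >= 0.99)[1]
n_sources
```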
Ting Gong [email protected] Joseph D. Szustakowski [email protected]
Gong, T., et al. (2011) Optimal Deconvolution of Transcriptional Profiling Data Using Quadratic Programming with Application to Complex Clinical Blood Samples, PLoS One, 6, e27156.
## Please refer to our demo
## source("DeconRNASeq.R")
### multi_tissue: expression profiles for 10 mixing samples from multiple tissues
# data(multi_tissue)
# datasets <- x.data[,2:11]
# signatures <- x.signature.filtered.optimal[,2:6]
# proportions <- fraction
# DeconRNASeq(datasets, signatures, proportions, checksig=FALSE, known.prop = TRUE, use.scale = TRUE)
A data frame providing the fractions from multiple tissues in the mixing samples
fraction
A matrix whose rows are the mixing samples' names and whose columns are the fractions of pure tissues, including brain, muscle, lung, liver and heart
Ting Gong [email protected] Joseph D. Szustakowski [email protected]
data(multi_tissue)
a list containing:
1) datasets: a data frame providing the RPKM of seven mixing samples.
2) proportions: a data frame providing the fractions for liver and kidney in the mixing samples
3) signatures: a data frame providing the expression values from pure liver and kidney samples
liver_kidney
A list containing:
1) a data frame with expression values for 31979 genes in the 7 mixing samples: reads.1, reads.2, reads.3, reads.4, reads.5, reads.6, reads.7
2) a matrix whose rows are the mixing samples' names and whose columns are the fractions of pure liver and kidney tissue
3) a data matrix with 630 expressions from pure liver and kidney tissues
Ting Gong [email protected] Joseph D. Szustakowski [email protected]
data(liver_kidney)
a list containing:
1) x.data: a data frame providing the RPKM of nine mixing samples.
2) x.signature: a data frame providing the expression values from pure brain, muscle, lung, liver and heart samples.
3) x.signature.filtered: a data frame providing the expression values from pure brain, muscle, lung, liver and heart samples after filtering.
4) x.signature.filtered.optimal: a data frame providing the expression values from pure brain, muscle, lung, liver and heart samples used for the example in DeconRNASeq.
5) fraction: a data frame providing the fractions from 5 tissues in the mixing samples
multi_tissue
A list containing:
1) a matrix with the expression of all genes in the mixing samples: the first two columns hold the RefSeq accession numbers and gene symbols
2) a matrix whose rows are gene symbols and whose columns are RPKM expression values from pure tissues.
3) a matrix whose rows are gene symbols and whose columns are RPKM expression values from pure tissues: genes with RPKM less than 200 within any of the five tissues have been filtered.
4) a matrix whose rows are gene symbols and whose columns are RPKM expression values from pure tissues: based on the filtered signature matrix, the optimal number of genes has been selected for the deconvolution according to the condition numbers
5) a matrix whose rows are the mixing samples' names and whose columns are the fractions of pure tissues, including brain, muscle, lung, liver and heart
Ting Gong [email protected] Joseph D. Szustakowski [email protected]
data(multi_tissue)
Draws multiple plots of the cell proportions determined by deconvolution against the proportions of cells actually mixed. Each plot corresponds to one tissue/cell type.
multiplot(..., plotlist = NULL, cols)
... |
any number of the plot objects that store the scatter plots for all the cells/tissue types |
plotlist |
any other plot objects |
cols |
columns of the plots, default = 1 |
A PDF file containing the plots of the cell proportions determined by deconvolution against the proportions of cells actually mixed, annotated with the RMSE
Ting Gong [email protected] Joseph D. Szustakowski [email protected]
Gong, T., et al. (2011) Optimal Deconvolution of Transcriptional Profiling Data Using Quadratic Programming with Application to Complex Clinical Blood Samples, PLoS One, 6, e27156.
## The function is currently defined as
function (..., plotlist = NULL, cols)
{
    pdf("scatterplots.pdf")
    require(grid)
    plots <- c(list(...), plotlist)
    numPlots = length(plots)
    plotCols = cols
    plotRows = ceiling(numPlots/plotCols)
    grid.newpage()
    pushViewport(viewport(layout = grid.layout(plotRows, plotCols)))
    vplayout <- function(x, y) viewport(layout.pos.row = x, layout.pos.col = y)
    for (i in 1:numPlots) {
        curRow = ceiling(i/plotCols)
        curCol = (i - 1)%%plotCols + 1
        print(plots[[i]], vp = vplayout(curRow, curCol))
    }
    dev.off()
}
proportions: a data frame providing the fractions for liver and kidney in the mixing samples
proportions
a matrix whose rows are the mixing samples' names and whose columns are the fractions of pure liver and kidney tissue
Ting Gong [email protected] Joseph D. Szustakowski [email protected]
data(liver_kidney)
Calculates the root-mean-square error (RMSE) to measure the accuracy of the estimated proportions.
rmse(x, y)
x |
proportions from the actual measurement |
y |
estimated proportions from our deconvolution |
A number for RMSE
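As a reminder of the quantity involved, here is the textbook RMSE written in base R. The name `rmse_sketch` and the toy proportions are hypothetical; this shows the standard definition and is not a copy of the package's rmse.

```r
# Hypothetical stand-in using the standard RMSE definition.
rmse_sketch <- function(x, y) sqrt(mean((x - y)^2))

actual    <- c(0.10, 0.25, 0.50, 0.75)  # proportions actually mixed
estimated <- c(0.12, 0.23, 0.50, 0.79)  # proportions from a deconvolution

rmse_sketch(actual, estimated)
```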
Ting Gong [email protected] Joseph D. Szustakowski [email protected]
Gong, T., et al. (2011) Optimal Deconvolution of Transcriptional Profiling Data Using Quadratic Programming with Application to Complex Clinical Blood Samples, PLoS One, 6, e27156.
signatures: a data frame providing the expression values from pure liver and kidney samples
signatures
a data matrix with 630 expressions from pure liver and kidney tissues
Ting Gong [email protected] Joseph D. Szustakowski [email protected]
data(liver_kidney)
A data frame providing the RPKM of nine mixing samples.
x.data
A matrix with the expression of all genes in the mixing samples: the first two columns hold the RefSeq accession numbers and gene symbols
Ting Gong [email protected] Joseph D. Szustakowski [email protected]
data(multi_tissue)
A data frame providing the expression values from pure brain, muscle, lung, liver and heart samples.
x.signature
A matrix whose rows are gene symbols and whose columns are RPKM expression values from pure tissues.
Ting Gong [email protected] Joseph D. Szustakowski [email protected]
data(multi_tissue)
A data frame providing the expression values from pure brain, muscle, lung, liver and heart samples after filtering.
x.signature.filtered
A matrix whose rows are gene symbols and whose columns are RPKM expression values from pure tissues: genes with RPKM less than 200 within any of the five tissues have been filtered.
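One possible reading of this filter is that a gene is kept only if its RPKM is at least 200 in every one of the five tissues. The base R sketch below implements that reading on made-up values. The interpretation and the toy matrix `x` are assumptions, and this is not the procedure used to build the shipped data.

```r
tissues <- c("brain", "muscle", "lung", "liver", "heart")

# Hypothetical RPKM values: 4 genes by 5 pure tissues.
x <- matrix(c( 500, 250, 900, 300, 210,
               150, 800, 400, 600, 700,
               300, 300, 300, 300, 199,
              1000, 200, 450, 220, 640),
            ncol = 5, byrow = TRUE,
            dimnames = list(paste0("gene", 1:4), tissues))

# Keep a gene only if no tissue falls below the 200 RPKM threshold.
keep <- apply(x, 1, function(v) all(v >= 200))
x_filtered <- x[keep, , drop = FALSE]
rownames(x_filtered)
```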
Ting Gong [email protected] Joseph D. Szustakowski [email protected]
data(multi_tissue)
A data frame providing the expression values from pure brain, muscle, lung, liver and heart samples used for the example in DeconRNA-Seq.
x.signature.filtered.optimal
A matrix whose rows are gene symbols and whose columns are RPKM expression values from pure tissues: based on the filtered signature matrix, the optimal number of genes has been selected for the deconvolution according to the condition numbers
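Choosing a signature size from condition numbers might look like the sketch below. It assumes that the genes are already ranked so that the first n rows form the candidate signature and that the size with the smallest condition number is preferred. The random matrix `x.sig` is a placeholder, and the selection rule is an assumption, not the documented procedure.

```r
set.seed(3)

# Hypothetical signature: 1000 genes by 2 cell types.
x.sig <- matrix(rexp(2000), ncol = 2)
step <- seq(20, 1000, by = 20)  # candidate signature sizes

# 2-norm condition number of the first n rows, for each candidate size.
cond <- sapply(step, function(n) kappa(x.sig[1:n, , drop = FALSE], exact = TRUE))

# Size with the smallest condition number (assumed selection rule).
best_n <- step[which.min(cond)]
best_n
```

A smaller condition number means that the estimated proportions are less sensitive to noise in the mixture data.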
Ting Gong [email protected] Joseph D. Szustakowski [email protected]
data(multi_tissue)