Title: | Unsupervised Deconvolution of Tumor-Stromal Mixed Expressions |
---|---|
Description: | UNDO is an R package for unsupervised deconvolution of tumor and stromal mixed expression data. It detects marker genes and deconvolves the mixed expression data without any prior knowledge. |
Authors: | Niya Wang <[email protected]> |
Maintainer: | Niya Wang <[email protected]> |
License: | GPL-2 |
Version: | 1.49.0 |
Built: | 2024-10-31 06:22:35 UTC |
Source: | https://github.com/bioc/UNDO |
This package contains the main function "two_source_deconv", which deconvolves mixed tumor-stromal expression data in a completely unsupervised way. No prior knowledge of the mixing matrix or the pure expression profiles is needed. The package detects marker genes and calculates the mixing matrix and pure expressions automatically.
Package: | UNDO |
Type: | Package |
Version: | 1.7.3 |
Date: | 2014-04-30 |
License: | GPL version 2 or later |
two_source_deconv(ExpressionData,lowper=0.4,highper=0.1,epsilon1=0.01,epsilon2=0.01,A=NULL,S1=NULL,S2=NULL,return=0)
Niya Wang <[email protected]>
data(NumericalMixMCF7HS27)
X <- NumericalMixMCF7HS27
deconvResult <- two_source_deconv(X, lowper = 0.4, highper = 0.1, epsilon1 = 0.1, epsilon2 = 0.1, A = NULL, S1 = NULL, S2 = NULL, return = 0)
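The linear mixing model behind this kind of deconvolution can be illustrated with a small simulation. The sketch below uses base R only and does not call UNDO. The names `S`, `A` and `X`, the simulated values, and the convention that rows are genes and columns are samples are assumptions made for illustration.

```r
set.seed(1)

n_genes <- 1000

# Hypothetical pure expression profiles: column 1 = tumor, column 2 = stroma.
S <- cbind(tumor  = rexp(n_genes, rate = 1 / 500),
           stroma = rexp(n_genes, rate = 1 / 500))

# Hypothetical mixing matrix: one row per mixed sample, one column per source.
# Each row holds the tumor and stroma proportions of a sample and sums to 1.
A <- matrix(c(0.7, 0.3,
              0.2, 0.8), nrow = 2, byrow = TRUE)

# Linear mixing model: every mixed sample is a weighted sum of the pure profiles.
X <- S %*% t(A)

# Sample 1 is 70% tumor plus 30% stroma, gene by gene.
all.equal(X[, 1], 0.7 * S[, "tumor"] + 0.3 * S[, "stroma"])
```

Deconvolution runs this model backwards: only `X` is observed, and both `A` and `S` have to be estimated.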
Expression data from biologically mixed MCF7 and HS27
data(BiologicalMixMCF7HS27)
The format is:
Formal class 'ExpressionSet' [package "Biobase"] with 7 slots
  ..@ experimentData :Formal class 'MIAME' [package "Biobase"] with 13 slots
  .. .. ..@ name : chr ""
  .. .. ..@ lab : chr ""
  .. .. ..@ contact : chr ""
  .. .. ..@ title : chr ""
  .. .. ..@ abstract : chr ""
  .. .. ..@ url : chr ""
  .. .. ..@ pubMedIds : chr ""
  .. .. ..@ samples : list()
  .. .. ..@ hybridizations : list()
  .. .. ..@ normControls : list()
  .. .. ..@ preprocessing : list()
  .. .. ..@ other : list()
  .. .. ..@ .__classVersion__:Formal class 'Versions' [package "Biobase"] with 1 slots
  .. .. .. .. ..@ .Data:List of 2
  .. .. .. .. .. ..$ : int [1:3] 1 0 0
  .. .. .. .. .. ..$ : int [1:3] 1 1 0
  ..@ assayData :<environment: 0x0000000008d92618>
  ..@ phenoData :Formal class 'AnnotatedDataFrame' [package "Biobase"] with 4 slots
  .. .. ..@ varMetadata :'data.frame': 0 obs. of 1 variable:
  .. .. .. ..$ labelDescription: chr(0)
  .. .. ..@ data :'data.frame': 2 obs. of 0 variables
  .. .. ..@ dimLabels : chr [1:2] "sampleNames" "sampleColumns"
  .. .. ..@ .__classVersion__:Formal class 'Versions' [package "Biobase"] with 1 slots
  .. .. .. .. ..@ .Data:List of 1
  .. .. .. .. .. ..$ : int [1:3] 1 1 0
  ..@ featureData :Formal class 'AnnotatedDataFrame' [package "Biobase"] with 4 slots
  .. .. ..@ varMetadata :'data.frame': 0 obs. of 1 variable:
  .. .. .. ..$ labelDescription: chr(0)
  .. .. ..@ data :'data.frame': 22215 obs. of 0 variables
  .. .. ..@ dimLabels : chr [1:2] "featureNames" "featureColumns"
  .. .. ..@ .__classVersion__:Formal class 'Versions' [package "Biobase"] with 1 slots
  .. .. .. .. ..@ .Data:List of 1
  .. .. .. .. .. ..$ : int [1:3] 1 1 0
  ..@ annotation : chr "HG-U133A"
  ..@ protocolData :Formal class 'AnnotatedDataFrame' [package "Biobase"] with 4 slots
  .. .. ..@ varMetadata :'data.frame': 0 obs. of 1 variable:
  .. .. .. ..$ labelDescription: chr(0)
  .. .. ..@ data :'data.frame': 2 obs. of 0 variables
  .. .. ..@ dimLabels : chr [1:2] "sampleNames" "sampleColumns"
  .. .. ..@ .__classVersion__:Formal class 'Versions' [package "Biobase"] with 1 slots
  .. .. .. .. ..@ .Data:List of 1
  .. .. .. .. .. ..$ : int [1:3] 1 1 0
  ..@ .__classVersion__:Formal class 'Versions' [package "Biobase"] with 1 slots
  .. .. ..@ .Data:List of 4
  .. .. .. ..$ : int [1:3] 3 1 0
  .. .. .. ..$ : int [1:3] 2 23 6
  .. .. .. ..$ : int [1:3] 1 3 0
  .. .. .. ..$ : int [1:3] 1 0 0
data(BiologicalMixMCF7HS27)
str(BiologicalMixMCF7HS27)
A function used to calculate the E1 measurement when the real mixing matrix is provided
calc_E1(A, Aest)
A | real mixing matrix |
Aest | estimated mixing matrix |
E1 measurement (numeric)
Niya Wang <[email protected]>
# calculate the similarity of two random 2 x 2 matrices
A <- matrix(runif(4), 2, 2)
Aest <- matrix(runif(4), 2, 2)
E1 <- calc_E1(A, Aest)
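The source does not give the formula for E1. The sketch below shows one widely used measure of this kind, the permutation- and scale-invariant index often called the Amari index, as an assumption about what such a measurement can look like. The helper `e1_sketch` is hypothetical and is not the package's `calc_E1`, whose formula may differ.

```r
# Hypothetical helper illustrating an Amari-style index; not the package code.
e1_sketch <- function(A, Aest) {
  # If Aest equals A up to column order and column scale, P is a scaled
  # permutation matrix with a single nonzero entry per row and per column.
  P <- abs(solve(Aest) %*% A)
  row_term <- sum(rowSums(P) / apply(P, 1, max) - 1)
  col_term <- sum(colSums(P) / apply(P, 2, max) - 1)
  row_term + col_term
}

A <- matrix(c(0.775, 0.15, 0.225, 0.85), 2, 2)

e1_sketch(A, A)                                      # identical matrices
e1_sketch(A, A[, 2:1])                               # same columns, swapped order
e1_sketch(A, matrix(c(0.6, 0.3, 0.4, 0.7), 2, 2))    # a different estimate
```

Under this definition the first two calls give a value of essentially zero, while a genuinely different estimate gives a positive value; smaller means a closer match.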
When the number of input samples is larger than 2, this function is called to reduce the dimension to 2 by using PCA.
dimension_reduction(X)
X | gene expression data matrix |
X | |
dimenMatrix | the dimension reduction matrix used to recover the mixing matrix for all the samples |
Niya Wang ([email protected])
X <- matrix(runif(5000), 1000, 5)
dimenResult <- dimension_reduction(X)
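The idea of reducing more than two samples to two can be sketched with a singular value decomposition in base R. This is an illustration under stated assumptions, not the package's implementation: the data are left uncentered, and the names `dimen_matrix` and `X_reduced` are hypothetical.

```r
set.seed(1)
X <- matrix(runif(5000), 1000, 5)   # 1000 genes, 5 samples

# Singular value decomposition of the uncentered data; the right singular
# vectors are directions in sample space.
sv <- svd(X)

# Hypothetical 2 x 5 reduction matrix built from the two leading directions.
dimen_matrix <- t(sv$v[, 1:2])

# Reduced data: 1000 genes described by 2 "virtual" samples.
X_reduced <- X %*% t(dimen_matrix)
dim(X_reduced)
```

Under this convention, if a 2 x 2 mixing matrix `A2` is estimated from `X_reduced`, a mixing matrix for all five samples can be obtained as `t(dimen_matrix) %*% A2`, which is why the reduction matrix is kept alongside the reduced data.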
Check the input gene expression data to see whether they are nonempty, nonnegative, etc.
gene_expression_input(X)
X | gene expression data matrix with rows representing genes/probe sets and columns representing samples. |
If the input is valid, the output is the same as the input. If the input contains NA values, the corresponding rows are deleted. If the input contains negative values, the function stops with an error message.
Niya Wang ([email protected])
gene_expression <- matrix(runif(2000), 1000, 2)
valid_gene_expression <- gene_expression_input(gene_expression)
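The checks described above can be sketched in a few lines of base R. The function `check_expression_sketch` is a hypothetical stand-in written for illustration; its error messages and the order of its checks are assumptions, not the package's code.

```r
# Hypothetical stand-in for the described checks; not the package code.
check_expression_sketch <- function(X) {
  X <- as.matrix(X)
  if (!is.numeric(X) || nrow(X) == 0 || ncol(X) == 0) {
    stop("expression data must be a non-empty numeric matrix")
  }
  # Drop genes (rows) that contain at least one NA.
  X <- X[stats::complete.cases(X), , drop = FALSE]
  # Negative expression values are treated as an error.
  if (any(X < 0)) {
    stop("expression data must be nonnegative")
  }
  X
}

# Four genes, two samples; the third gene has a missing value.
X <- matrix(c(1, 2, NA, 4,
              5, 6, 7, 8), ncol = 2)
check_expression_sketch(X)   # the row holding NA is dropped
```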
Select the marker genes in tumor and stroma in an unsupervised way
marker_gene_selection(X, lowper, highper, epsilon1, epsilon2)
X | gene expression data |
lowper | The fraction of genes with the lowest norm that the user wants to remove. The value should be between 0 and 1. |
highper | The fraction of genes with the highest norm that the user wants to remove. The value should be between 0 and 1. |
epsilon1 | Controls the number of marker genes. As epsilon1 increases, the number of marker genes in source 1 increases. The value should be positive. |
epsilon2 | Controls the number of marker genes. As epsilon2 increases, the number of marker genes in source 2 increases. The value should be positive. |
a1 | The slope of marker genes in source 1 |
a2 | The slope of marker genes in source 2 |
MG1 | The gene list of marker genes in source 1 |
MG2 | The gene list of marker genes in source 2 |
dimenMatrix | dimension reduction matrix |
Niya Wang ([email protected])
X <- matrix(runif(20000), 10000, 2)
MG_set <- marker_gene_selection(X, 0.4, 0.1, 0.1, 0.1)
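The roles of the four tuning arguments can be sketched in base R. The sketch assumes a Euclidean norm, assumes that marker genes are the genes whose angle in the two-sample scatter plot lies within epsilon (in radians) of the two extreme angles, and summarizes each marker set by the mean of its unit-length gene vectors. All three choices are assumptions for illustration; the package's actual rule may differ.

```r
set.seed(1)
X <- matrix(runif(20000), 10000, 2)   # 10000 genes, 2 mixed samples
lowper <- 0.4
highper <- 0.1
epsilon1 <- 0.1
epsilon2 <- 0.1

# Norm of each gene across the two samples (Euclidean norm assumed).
gene_norm <- sqrt(rowSums(X^2))

# Remove the lowper fraction with the lowest norm and the highper fraction
# with the highest norm.
bounds <- quantile(gene_norm, c(lowper, 1 - highper))
keep <- gene_norm > bounds[1] & gene_norm < bounds[2]
X_kept <- X[keep, , drop = FALSE]

# Angle of each remaining gene in the sample-1 versus sample-2 scatter plot.
theta <- atan2(X_kept[, 2], X_kept[, 1])

# Assumed rule: marker genes lie within epsilon of the two extreme angles,
# so a larger epsilon selects more genes.
MG1 <- which(keep)[theta <= min(theta) + epsilon1]
MG2 <- which(keep)[theta >= max(theta) - epsilon2]

# Assumed summary: mean unit-length direction of each marker set.
a1 <- colMeans(X[MG1, , drop = FALSE] / gene_norm[MG1])
a2 <- colMeans(X[MG2, , drop = FALSE] / gene_norm[MG2])
```

With `lowper = 0.4` and `highper = 0.1`, about half of the genes survive the norm filter before the angular selection is applied.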
Calculate the mixing matrix based on the output from marker_gene_selection(), and scale the mixing matrix to make the sum of proportions from tumor and stroma equal to 1. The pure expression levels of tumor and stroma are also computed.
mixing_matrix_computation(X, a1, a2, dimenMatrix)
X | Gene expression data matrix |
a1 | The slope of marker genes in source 1 |
a2 | The slope of marker genes in source 2 |
dimenMatrix | The dimension reduction matrix used to recover the mixing matrix for all the samples |
Aest | estimated mixing matrix |
Sest | estimated pure gene expression of two sources |
Niya Wang ([email protected])
a1 <- matrix(runif(2), 2, 1)
a2 <- matrix(runif(2), 2, 1)
X <- 1000 * matrix(runif(20000), 10000, 2)
dimenMatrix <- NULL
Deconv <- mixing_matrix_computation(X, a1, a2, dimenMatrix)
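The scaling step can be sketched as a small linear solve. A marker-gene slope fixes only the direction of a column of the mixing matrix, so each column is rescaled until the proportions in every sample add up to 1. The sketch below is base R, not the package's code; the input directions and the model `X = S %*% t(A)` are assumptions for illustration.

```r
# Hypothetical slope (direction) vectors, each known only up to a positive scale.
a1 <- c(0.775, 0.15) * 3
a2 <- c(0.225, 0.85) * 0.5

# Find column scales c1, c2 with c1 * a1 + c2 * a2 = (1, 1), which makes
# every row of the mixing matrix sum to 1.
D <- cbind(a1, a2)
scales <- solve(D, c(1, 1))
Aest <- D %*% diag(scales)
rowSums(Aest)

# With X = S %*% t(A), the pure profiles follow by inverting the mixing matrix.
set.seed(1)
S <- matrix(rexp(2000, rate = 1 / 500), 1000, 2)
X <- S %*% t(Aest)
Sest <- X %*% solve(t(Aest))
all.equal(Sest, S)
```

Rescaling columns rather than normalizing rows keeps the estimated directions intact, so the columns of `Aest` still point along the marker-gene slopes.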
Real mixing matrix of the data set NumericalMixMCF7HS27
data(NumericalMixingMatrix)
The format is:
num [1:2, 1:2] 0.775 0.15 0.225 0.85
- attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr [1:2] "V1" "V2"
data(NumericalMixingMatrix)
str(NumericalMixingMatrix)
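Because `str()` lists matrix entries column by column, the format line is easy to misread. The sketch below rebuilds a matrix from the displayed values, which `str()` may have rounded, and checks the row sums. It does not load the package data, and reading rows as samples and columns as sources is an assumption.

```r
# Values copied from the str() output; they are listed column by column.
A <- matrix(c(0.775, 0.15, 0.225, 0.85), nrow = 2, ncol = 2,
            dimnames = list(NULL, c("V1", "V2")))
A

# Each row adds up to 1, consistent with rows holding per-sample proportions.
rowSums(A)
```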
Expression data from numerically mixed MCF7 and HS27
data(NumericalMixMCF7HS27)
The format is:
Formal class 'ExpressionSet' [package "Biobase"] with 7 slots
  ..@ experimentData :Formal class 'MIAME' [package "Biobase"] with 13 slots
  .. .. ..@ name : chr ""
  .. .. ..@ lab : chr ""
  .. .. ..@ contact : chr ""
  .. .. ..@ title : chr ""
  .. .. ..@ abstract : chr ""
  .. .. ..@ url : chr ""
  .. .. ..@ pubMedIds : chr ""
  .. .. ..@ samples : list()
  .. .. ..@ hybridizations : list()
  .. .. ..@ normControls : list()
  .. .. ..@ preprocessing : list()
  .. .. ..@ other : list()
  .. .. ..@ .__classVersion__:Formal class 'Versions' [package "Biobase"] with 1 slots
  .. .. .. .. ..@ .Data:List of 2
  .. .. .. .. .. ..$ : int [1:3] 1 0 0
  .. .. .. .. .. ..$ : int [1:3] 1 1 0
  ..@ assayData :<environment: 0x000000000e86a5d0>
  ..@ phenoData :Formal class 'AnnotatedDataFrame' [package "Biobase"] with 4 slots
  .. .. ..@ varMetadata :'data.frame': 0 obs. of 1 variable:
  .. .. .. ..$ labelDescription: chr(0)
  .. .. ..@ data :'data.frame': 2 obs. of 0 variables
  .. .. ..@ dimLabels : chr [1:2] "sampleNames" "sampleColumns"
  .. .. ..@ .__classVersion__:Formal class 'Versions' [package "Biobase"] with 1 slots
  .. .. .. .. ..@ .Data:List of 1
  .. .. .. .. .. ..$ : int [1:3] 1 1 0
  ..@ featureData :Formal class 'AnnotatedDataFrame' [package "Biobase"] with 4 slots
  .. .. ..@ varMetadata :'data.frame': 0 obs. of 1 variable:
  .. .. .. ..$ labelDescription: chr(0)
  .. .. ..@ data :'data.frame': 22215 obs. of 0 variables
  .. .. ..@ dimLabels : chr [1:2] "featureNames" "featureColumns"
  .. .. ..@ .__classVersion__:Formal class 'Versions' [package "Biobase"] with 1 slots
  .. .. .. .. ..@ .Data:List of 1
  .. .. .. .. .. ..$ : int [1:3] 1 1 0
  ..@ annotation : chr "HG-U133A"
  ..@ protocolData :Formal class 'AnnotatedDataFrame' [package "Biobase"] with 4 slots
  .. .. ..@ varMetadata :'data.frame': 0 obs. of 1 variable:
  .. .. .. ..$ labelDescription: chr(0)
  .. .. ..@ data :'data.frame': 2 obs. of 0 variables
  .. .. ..@ dimLabels : chr [1:2] "sampleNames" "sampleColumns"
  .. .. ..@ .__classVersion__:Formal class 'Versions' [package "Biobase"] with 1 slots
  .. .. .. .. ..@ .Data:List of 1
  .. .. .. .. .. ..$ : int [1:3] 1 1 0
  ..@ .__classVersion__:Formal class 'Versions' [package "Biobase"] with 1 slots
  .. .. ..@ .Data:List of 4
  .. .. .. ..$ : int [1:3] 3 1 0
  .. .. .. ..$ : int [1:3] 2 23 6
  .. .. .. ..$ : int [1:3] 1 3 0
  .. .. .. ..$ : int [1:3] 1 0 0
data(NumericalMixMCF7HS27)
str(NumericalMixMCF7HS27)
pure MCF7 and HS27 expression data
data(PureMCF7HS27)
The format is:
Formal class 'ExpressionSet' [package "Biobase"] with 7 slots
  ..@ experimentData :Formal class 'MIAME' [package "Biobase"] with 13 slots
  .. .. ..@ name : chr ""
  .. .. ..@ lab : chr ""
  .. .. ..@ contact : chr ""
  .. .. ..@ title : chr ""
  .. .. ..@ abstract : chr ""
  .. .. ..@ url : chr ""
  .. .. ..@ pubMedIds : chr ""
  .. .. ..@ samples : list()
  .. .. ..@ hybridizations : list()
  .. .. ..@ normControls : list()
  .. .. ..@ preprocessing : list()
  .. .. ..@ other : list()
  .. .. ..@ .__classVersion__:Formal class 'Versions' [package "Biobase"] with 1 slots
  .. .. .. .. ..@ .Data:List of 2
  .. .. .. .. .. ..$ : int [1:3] 1 0 0
  .. .. .. .. .. ..$ : int [1:3] 1 1 0
  ..@ assayData :<environment: 0x000000000e979d20>
  ..@ phenoData :Formal class 'AnnotatedDataFrame' [package "Biobase"] with 4 slots
  .. .. ..@ varMetadata :'data.frame': 0 obs. of 1 variable:
  .. .. .. ..$ labelDescription: chr(0)
  .. .. ..@ data :'data.frame': 2 obs. of 0 variables
  .. .. ..@ dimLabels : chr [1:2] "sampleNames" "sampleColumns"
  .. .. ..@ .__classVersion__:Formal class 'Versions' [package "Biobase"] with 1 slots
  .. .. .. .. ..@ .Data:List of 1
  .. .. .. .. .. ..$ : int [1:3] 1 1 0
  ..@ featureData :Formal class 'AnnotatedDataFrame' [package "Biobase"] with 4 slots
  .. .. ..@ varMetadata :'data.frame': 0 obs. of 1 variable:
  .. .. .. ..$ labelDescription: chr(0)
  .. .. ..@ data :'data.frame': 22215 obs. of 0 variables
  .. .. ..@ dimLabels : chr [1:2] "featureNames" "featureColumns"
  .. .. ..@ .__classVersion__:Formal class 'Versions' [package "Biobase"] with 1 slots
  .. .. .. .. ..@ .Data:List of 1
  .. .. .. .. .. ..$ : int [1:3] 1 1 0
  ..@ annotation : chr "HG-U133A"
  ..@ protocolData :Formal class 'AnnotatedDataFrame' [package "Biobase"] with 4 slots
  .. .. ..@ varMetadata :'data.frame': 0 obs. of 1 variable:
  .. .. .. ..$ labelDescription: chr(0)
  .. .. ..@ data :'data.frame': 2 obs. of 0 variables
  .. .. ..@ dimLabels : chr [1:2] "sampleNames" "sampleColumns"
  .. .. ..@ .__classVersion__:Formal class 'Versions' [package "Biobase"] with 1 slots
  .. .. .. .. ..@ .Data:List of 1
  .. .. .. .. .. ..$ : int [1:3] 1 1 0
  ..@ .__classVersion__:Formal class 'Versions' [package "Biobase"] with 1 slots
  .. .. ..@ .Data:List of 4
  .. .. .. ..$ : int [1:3] 3 1 0
  .. .. .. ..$ : int [1:3] 2 23 6
  .. .. .. ..$ : int [1:3] 1 3 0
  .. .. .. ..$ : int [1:3] 1 0 0
data(PureMCF7HS27)
str(PureMCF7HS27)
This is the main function, which calls all the other subfunctions to deconvolve the mixed expression data. When the real mixing matrix is supplied, it also compares the estimated mixing matrix with the real one and reports the E1 measurement.
two_source_deconv(ExpressionData, lowper = 0.4, highper = 0.1, epsilon1 = 0.01, epsilon2 = 0.01, A = NULL, S1=NULL, S2=NULL, return = 0)
ExpressionData | gene expression data matrix/ExpressionSet object |
lowper | The fraction of genes with the lowest norm that the user wants to remove. The value should be between 0 and 1. |
highper | The fraction of genes with the highest norm that the user wants to remove. The value should be between 0 and 1. |
epsilon1 | Controls the number of marker genes. As epsilon1 increases, the number of marker genes in source 1 increases. The value should be positive. |
epsilon2 | Controls the number of marker genes. As epsilon2 increases, the number of marker genes in source 2 increases. The value should be positive. |
A | real mixing matrix, if available |
S1 | Pure expression profile of the first source, if available |
S2 | Pure expression profile of the second source, if available |
return | If equal to 0, the estimated S is not returned; otherwise, the estimated S is returned. |
Aest | estimated mixing matrix |
E1 | E1 measurement between real and estimated mixing matrix |
Niya Wang ([email protected])
data(NumericalMixMCF7HS27)
X <- NumericalMixMCF7HS27
deconvResult <- two_source_deconv(X, lowper = 0.4, highper = 0.1, epsilon1 = 0.1, epsilon2 = 0.1, A = NULL, S1 = NULL, S2 = NULL, return = 0)
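How the steps chain together can be sketched on noise-free simulated data in base R. This toy version assumes perfect marker genes (expressed in one source only) and takes the single most extreme gene on each edge of the scatter plot instead of an epsilon-based marker set. It is not the package's algorithm, and all names and values are hypothetical.

```r
set.seed(1)
n <- 2000

# Hypothetical pure profiles with 100 perfect marker genes per source.
S <- matrix(rexp(2 * n, rate = 1 / 500), n, 2)
S[1:100, 2] <- 0     # genes expressed only in source 1
S[101:200, 1] <- 0   # genes expressed only in source 2

# Hypothetical real mixing matrix and numerically mixed data.
A <- matrix(c(0.775, 0.15, 0.225, 0.85), 2, 2)
X <- S %*% t(A)

# Marker genes sit on the two edges of the sample-1 versus sample-2 scatter
# plot, so the extreme angles give the column directions of A.
theta <- atan2(X[, 2], X[, 1])
edge_lo <- X[which.min(theta), ]
edge_hi <- X[which.max(theta), ]

# Rescale the two directions so that every row of the estimate sums to 1.
D <- cbind(edge_lo, edge_hi)
Aest <- D %*% diag(solve(D, c(1, 1)))

# Recover the pure profiles and compare both estimates with the truth.
Sest <- X %*% solve(t(Aest))
all.equal(Aest, A)
all.equal(Sest, S)
```

On real data the marker genes are noisy, which is where the norm filter (`lowper`, `highper`) and the tolerances (`epsilon1`, `epsilon2`) come in.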