Title: | Individual-Specific ceLl typE referencing Tool |
---|---|
Description: | ISLET is a method for signal deconvolution of general -omics data. It estimates individual-specific and cell-type-specific reference panels when multiple samples are observed from each subject. It takes as input the observed mixture data (feature by sample matrix), the cell type mixture proportions (sample by cell type matrix), and the sample-to-subject information. It solves for the reference panel on an individual basis and conducts tests to identify cell-type-specific differential expression (csDE) genes. It also improves estimated cell type mixture proportions by integrating personalized reference panels. |
Authors: | Hao Feng [aut, cre] , Qian Li [aut], Guanqun Meng [aut] |
Maintainer: | Hao Feng <[email protected]> |
License: | GPL-2 |
Version: | 1.9.0 |
Built: | 2024-10-30 07:33:31 UTC |
Source: | https://github.com/bioc/ISLET |
This function, caseEst, extracts the estimated reference panels from the case group. It takes one outputSol object as the input and produces a list of matrices containing the estimated reference panels. The length of the list is equal to the number of cell types. Each matrix contains the gene by subject reference panel for that specific cell type.
caseEst(res.sol)
res.sol |
A |
This accessor function extracts the estimated reference panels from the isletSolve step.
A list of matrices containing the estimated reference panels. The length of the list is equal to the number of cell types. Each matrix contains the gene by subject reference panel for that specific cell type.
Hao Feng <[email protected]>
data(GE600)
study123input <- dataPrep(dat_se=GE600_se)
res.sol <- isletSolve(input=study123input)
caseVal <- caseEst(res.sol)
This function, ctrlEst, extracts the estimated reference panels from the control group. It takes one outputSol object as the input and produces a list of matrices containing the estimated reference panels. The length of the list is equal to the number of cell types. Each matrix contains the gene by subject reference panel for that specific cell type.
ctrlEst(res.sol)
res.sol |
A |
This accessor function extracts the estimated reference panels from the isletSolve step.
A list of matrices containing the estimated reference panels. The length of the list is equal to the number of cell types. Each matrix contains the gene by subject reference panel for that specific cell type.
Hao Feng <[email protected]>
data(GE600)
study123input <- dataPrep(dat_se=GE600_se)
res.sol <- isletSolve(input=study123input)
ctrlVal <- ctrlEst(res.sol)
This function, dataPrep, is a necessary step to make your data ready and acceptable to ISLET. It takes one SummarizedExperiment object, described below, as the input and produces a list ready to be fed into ISLET downstream deconvolution (function isletSolve) and/or cell-type-specific differentially expressed gene testing (function isletTest).
dataPrep(dat_se)
dat_se |
A |
This is the initial step for using ISLET: it prepares your input data for downstream deconvolution (function isletSolve) and/or differentially expressed gene testing (function isletTest). The input data must follow the requirements listed above.
dataPrep returns an S4 object containing elements ready to serve as the input for downstream deconvolution (function isletSolve) and/or differentially expressed gene testing (function isletTest).
Hao Feng <[email protected]>
data(GE600)
ls()
## [1] "GE600_se"
study123input <- dataPrep(dat_se=GE600_se)
This function, dataPrepSlope, is a necessary step to make your data ready and acceptable to ISLET slope effect testing. It takes one SummarizedExperiment object containing both the case and control groups as the input and produces a list ready to be fed into ISLET downstream deconvolution (function isletSolve) and/or differentially expressed gene testing (function isletTest).
dataPrepSlope(dat_se)
dat_se |
A |
This is the initial step for using ISLET: it prepares your input data for downstream cell-type-specific differentially expressed gene testing (function isletTest) with respect to the slope variable. The input data matrices must follow the requirements listed above, and samples/subjects must be ordered and matched across matrices.
dataPrepSlope returns an S4 object containing elements ready to serve as the input for cell-type-specific differentially expressed gene testing (function isletTest).
Hao Feng <[email protected]>
data(GE600age)
ls()
## [1] "GE600age_se"
#(1) Data preparation
study456input <- dataPrepSlope(dat_se=GE600age_se)
#(2) [Downstream] Test for slope effect (i.e. age) difference in csDE testing
#result.test <- isletTest(input=study456input)
GE600 contains the raw example datasets for ISLET. It has the gene expression values, in the form of RNA-seq raw read counts, for 10 genes by 520 samples, with 83 cases and 89 controls, and multiple repeated measurements (i.e. time points) per subject. Data were combined by case/control status into one single SummarizedExperiment object. These raw example datasets need to be converted by the dataPrep function before they are ready for downstream deconvolution (function isletSolve) and/or differentially expressed gene testing (function isletTest).
data(GE600)
One SummarizedExperiment object containing the following elements:
counts
A gene expression value dataset, in the form of RNA-seq raw read counts, of 10 genes by 520 samples, with 83 cases and 89 controls, and multiple repeated measurements (i.e. time points) per subject.
colData
Sample meta-data. The first column is the group status (i.e. case/ctrl). The second column is the subject ID, which shows the relationship between the sample IDs and their subject IDs. The remaining 6 columns (i.e. columns 3-8) are the cell type proportions of all samples for the 6 cell types.
One SummarizedExperiment object.
data(GE600)
ls()
## [1] "GE600_se"
#show GE600_se details
GE600_se
#An object of class "SummarizedExperiment"
#Then, we can proceed to data preparation step, function 'dataPrep' for ISLET.
##The rest of the deconvolution/csDE analysis will then follow.
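To make the layout described above concrete, the following is a minimal sketch of how a small input with the same structure as GE600 could be assembled by hand. All values are simulated, and the column names `group` and `subject_ID`, the subject labels, and the object name `toy_se` are assumptions chosen for illustration; only the column order (group, subject ID, then six proportion columns) is taken from the description above. The SummarizedExperiment object is built only if that package is installed.

```r
# Toy inputs mirroring the GE600 layout (all values are simulated).
set.seed(1)
n_genes    <- 10
subjects   <- rep(c("s1", "s2", "s3", "s4"), each = 3)  # 3 repeated samples per subject
n_samples  <- length(subjects)
cell_types <- c("Bcells", "Tcells_CD4", "Tcells_CD8", "NKcells", "Mono", "Others")

# Gene-by-sample matrix of raw read counts.
counts <- matrix(rpois(n_genes * n_samples, lambda = 50),
                 nrow = n_genes,
                 dimnames = list(paste0("gene", seq_len(n_genes)),
                                 paste0("sample", seq_len(n_samples))))

# Cell type proportions: one row per sample, each row sums to 1.
props <- matrix(rexp(n_samples * length(cell_types)), nrow = n_samples)
props <- props / rowSums(props)
colnames(props) <- cell_types

# Sample meta-data: column 1 = group, column 2 = subject ID,
# columns 3-8 = cell type proportions (column names are hypothetical).
col_data <- data.frame(group = rep(c("case", "ctrl"), each = 6),
                       subject_ID = subjects,
                       props,
                       row.names = colnames(counts))

# Wrap both pieces in a SummarizedExperiment if the package is available.
if (requireNamespace("SummarizedExperiment", quietly = TRUE)) {
  toy_se <- SummarizedExperiment::SummarizedExperiment(
    assays  = list(counts = counts),
    colData = col_data)
}
```

Whether dataPrep accepts an object built this way depends on requirements that are not fully reproduced on this page, so the sketch should be read as an illustration of the data layout rather than a validated input.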
GE600age contains the example input datasets for ISLET's slope testing function. It has the gene expression values, in the form of RNA-seq raw read counts, for 10 genes by 520 samples, with 83 cases and 89 controls, and multiple repeated measurements (i.e. time points) per subject. Temporal measures are taken at different ages, with an age variable stored in the metadata. This is the main variable of interest in downstream testing. Data were combined by case/control status into one SummarizedExperiment object. These example datasets need to be converted by the dataPrep function before they are ready for downstream deconvolution (function isletSolve) and/or differentially expressed gene testing (function isletTest).
data(GE600age)
One SummarizedExperiment object containing the following elements:
counts
A gene expression value dataset, in the form of RNA-seq raw read counts, of 10 genes by 520 samples, with 83 cases and 89 controls, and multiple repeated measurements (i.e. time points) per subject.
colData
Sample meta-data. The first column is the case/ctrl group status. The second column is the subject ID, which shows the relationship between the sample IDs and their subject IDs. The third column is the age variable for each sample, which is the main variable in downstream testing. The remaining 6 columns (i.e. columns 4-9) are the cell type proportions of all samples for the 6 cell types.
One SummarizedExperiment object.
data(GE600age)
ls()
## [1] "GE600age_se"
#show GE600age_se details
GE600age_se
#An object of class "SummarizedExperiment"
#Then, we can proceed to data preparation step, function 'dataPrep' for ISLET.
##The rest of the csDE testing on age(slope) effect will then follow.
This core function, imply, plays a central role in the imply algorithm. It takes the output from the data preparation function implyDataPrep and uses linear mixed-effects models to solve for individual-specific and cell-type-specific reference panels. Parallel computing is implemented to enhance computational efficiency.
imply(dat123)
dat123 |
The list object output from data preparation function |
imply is a two-step algorithm to enhance cell deconvolution results by employing subject-specific and cell-type-specific (personalized) reference panels.
In step I, personalized reference panels are generated for each subject using linear mixed-effects models. These personalized references are tailored to individual subjects.
In step II, these personalized reference panels replace the population-level signature matrix, typically used in traditional reference-based deconvolution methods. This substitution enables a personalized deconvolution process across all subjects, by employing the non-negative least squares algorithm.
The function returns a list containing personalized reference panels and improved cell deconvolution results.
A list with the estimated components of the imply algorithm:
p.ref |
The estimated subject-specific and cell-type-specific reference panels. It is an array of dimension G by K by N, where G is the total number of genetic features, K is the total number of cell types, and N is the total number of subjects. |
imply.prop |
The improved cell deconvolution results based on personalized reference panels from |
Guanqun Meng <[email protected]>
data(GE600)
ls()
## [1] "GE600_se"
#(1) Data preparation
dat123 <- implyDataPrep(sim_se=GE600_se)
#(2) improved and personalized cell deconvolution
result <- imply(dat123)
str(result)
#List of 2
# $ p.ref : num [1:10, 1:6, 1:172] 0 0 0.952 13.438 19.007 ...
# ..- attr(*, "dimnames")=List of 3
# .. ..$ : chr [1:10] "gene1" "gene2" "gene3" "gene4" ...
# .. ..$ : chr [1:6] "Bcells" "Tcells_CD4" "Tcells_CD8" "NKcells" ...
# .. ..$ : chr [1:172] "210298" "223361" "228055" "229203" ...
# $ imply.prop:'data.frame': 520 obs. of 6 variables:
# ..$ Bcells : num [1:520] 0.4806 0.1912 0.0843 0.2177 0 ...
# ..$ Tcells_CD4: num [1:520] 0 0 0 0 0 ...
# ..$ Tcells_CD8: num [1:520] 0 0 0.2129 0.0507 0.2584 ...
# ..$ NKcells : num [1:520] 0.1114 0.2487 0.153 0 0.0543 ...
# ..$ Mono : num [1:520] 0.0547 0.0271 0 0 0.343 ...
# ..$ Others : num [1:520] 0.353 0.533 0.55 0.732 0.344 ...
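Step II of the algorithm described above relies on non-negative least squares. The sketch below illustrates that idea on simulated data for a single sample, using only base R: the constrained problem is solved with `optim` and box constraints rather than a dedicated NNLS solver, so it is a stand-in technique and not the package's own implementation. The helper name `nnls_fit`, the toy reference panel, and the final rescaling of the estimates to sum to one are all assumptions made for illustration.

```r
# Toy personalized reference panel for ONE subject: G genes by K cell types.
set.seed(42)
G <- 50
K <- 3
ref <- matrix(rexp(G * K, rate = 0.1), nrow = G)

# Simulated true proportions and the resulting noisy mixture signal.
true_prop <- c(0.6, 0.3, 0.1)
y <- as.vector(ref %*% true_prop) + rnorm(G, sd = 0.01)

# Non-negative least squares: minimise ||A x - b||^2 subject to x >= 0.
# Solved here with box-constrained L-BFGS-B (hypothetical helper).
nnls_fit <- function(A, b) {
  obj  <- function(x) sum((A %*% x - b)^2)
  grad <- function(x) as.vector(2 * crossprod(A, A %*% x - b))
  optim(rep(1 / ncol(A), ncol(A)), obj, grad,
        method = "L-BFGS-B", lower = 0)$par
}

est <- nnls_fit(ref, y)
est_prop <- est / sum(est)  # rescale so the proportions sum to one (assumption)
round(est_prop, 3)
```

In the personalized setting, each subject's samples would be deconvolved against that subject's own G by K slice of the reference array instead of a single population-level signature matrix.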
This implyDataPrep function serves as a necessary step in preparing your data for imply. It takes a single input, a SummarizedExperiment object, and generates a formatted S4 object that can be used as input for personalized deconvolution (imply).
implyDataPrep(sim_se)
sim_se |
A |
This is the initial step for preparing your input data for the imply algorithm, making it ready for personalized deconvolution using the imply function. Ensure that your input data adheres to the requirements listed below.
implyDataPrep returns an S4 object containing elements that are prepared to serve as input for downstream deconvolution using the imply function.
Guanqun Meng <[email protected]>
data(GE600)
ls()
## [1] "GE600_se"
dat123 <- implyDataPrep(sim_se=GE600_se)
This function, isletSolve, is a core function of ISLET. It takes the output from the data preparation function dataPrep and solves for individual-specific and cell-type-specific reference panels. Parallel computing is implemented to speed up the EM algorithm.
isletSolve(input, BPPARAM=bpparam() )
input |
The list object output from data preparation function |
BPPARAM |
An instance of |
For both the case group and the control group, the deconvolution result is a list of length K, where K is the number of cell types. Each of the K elements is a matrix of dimension G by N, storing the deconvoluted feature (G) by subject (N) values for that cell type.
case.ind.ref |
A list of length K, where K is the number of cell types. Each of the K elements in this list is a feature by subject matrix containing all the feature values (i.e. gene expression values) for the case group. It is one of the main products of the individual-specific and cell-type-specific solve algorithm. |
ctrl.ind.ref |
A list of length K, where K is the number of cell types. Each of the K elements in this list is a feature by subject matrix containing all the feature values (i.e. gene expression values) for the control group. It is one of the main products of the individual-specific and cell-type-specific solve algorithm. |
mLLK |
A scalar. The log-likelihood from the current model. It can be useful for testing purposes such as a Likelihood Ratio Test. |
Hao Feng <[email protected]>
data(GE600)
ls()
## [1] "GE600_se"
#(1) Data preparation
study123input <- dataPrep(dat_se=GE600_se)
#(2) Individual-specific and cell-type-specific deconvolution
result <- isletSolve(input=study123input)
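The list-of-K layout described above (one feature by subject matrix per cell type) can be rearranged into a single feature by cell type by subject array when a per-subject view is more convenient. The sketch below shows this with base R on a simulated stand-in; the object `toy_ref`, its dimension names, and the two cell types are assumptions for illustration and do not come from an actual isletSolve run.

```r
# Toy stand-in for a list of K matrices, each G features by N subjects.
set.seed(7)
G <- 4
N <- 3
cell_types <- c("Bcells", "NKcells")
toy_ref <- lapply(cell_types, function(k) {
  matrix(rexp(G * N), nrow = G,
         dimnames = list(paste0("gene", seq_len(G)),
                         paste0("subj", seq_len(N))))
})
names(toy_ref) <- cell_types

# Stack the K matrices into a G x N x K array, keeping dimension names.
stacked <- array(unlist(toy_ref),
                 dim = c(G, N, length(toy_ref)),
                 dimnames = c(dimnames(toy_ref[[1]]), list(names(toy_ref))))

# Reorder the dimensions to G x K x N (features, cell types, subjects).
ref_array <- aperm(stacked, c(1, 3, 2))
dim(ref_array)

# Reference panel (features by cell types) for a single subject.
ref_array[, , "subj2"]
```

The same reshaping applies to the case and control lists separately, since each covers only the subjects in its own group.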
This function, isletTest, takes the output from the data preparation function dataPrep and tests for csDE genes. It uses a Likelihood Ratio Test (LRT), iterating over all cell types. The output is a matrix of p-values from the LRT. Parallel computing is implemented to speed up the EM algorithm.
isletTest(input, BPPARAM=bpparam() )
input |
The list object output from data preparation function |
BPPARAM |
An instance of |
This function implements an LRT, runs it individually for each cell type, and then aggregates the results into a matrix.
A p-value matrix, in the dimension of feature by cell type. Each element is the LRT p-value, by contrasting case group and control group, for one feature in one cell type.
Hao Feng <[email protected]>
data(GE600)
ls()
## [1] "GE600_se"
#(1) Data preparation
study123input <- dataPrep(dat_se=GE600_se)
#(2) [optional for csDE genes testing] Individual-specific and cell-type-specific deconvolution
#result.solve <- isletSolve(input=study123input)
#(3) Test for csDE genes
result.test <- isletTest(input=study123input)
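For readers unfamiliar with how a likelihood ratio test turns two model fits into a p-value, the following sketch works through the generic calculation for one simulated feature. It uses ordinary linear models from base R as a deliberately simplified stand-in; ISLET's own models are fitted with an EM algorithm and are not reproduced here, and the variable names and simulated effect size are assumptions.

```r
# Toy data: one feature measured in a control and a case group.
set.seed(123)
group <- factor(rep(c("ctrl", "case"), each = 30))
expr  <- rnorm(60, mean = ifelse(group == "case", 6, 5))

# Null model: no group effect. Full model: group effect included.
fit_null <- lm(expr ~ 1)
fit_full <- lm(expr ~ group)

# LRT statistic: twice the gain in log-likelihood from the extra parameter(s).
lrt_stat <- as.numeric(2 * (logLik(fit_full) - logLik(fit_null)))

# Degrees of freedom: difference in the number of estimated parameters.
df_diff <- attr(logLik(fit_full), "df") - attr(logLik(fit_null), "df")

# P-value from the chi-squared reference distribution.
p_value <- pchisq(lrt_stat, df = df_diff, lower.tail = FALSE)
p_value
```

Repeating such a comparison for every feature and every cell type, and collecting the p-values, gives a feature by cell type matrix with the shape described above.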