Title: | Tools for finding Total RNA Expression Genes in single nucleus RNA-seq data |
---|---|
Description: | RNA abundance and cell size parameters could improve RNA-seq deconvolution algorithms to more accurately estimate cell type proportions given the different cell type transcription activity levels. A Total RNA Expression Gene (TREG) can facilitate estimating total RNA content using single molecule fluorescent in situ hybridization (smFISH). We developed a data-driven approach using a measure of expression invariance to find candidate TREGs in postmortem human brain single nucleus RNA-seq. This R package implements the method for identifying candidate TREGs from snRNA-seq data. |
Authors: | Louise Huuki-Myers [aut, cre] , Leonardo Collado-Torres [ctb] |
Maintainer: | Louise Huuki-Myers <[email protected]> |
License: | Artistic-2.0 |
Version: | 1.11.0 |
Built: | 2024-10-31 06:24:58 UTC |
Source: | https://github.com/bioc/TREG |
This function uses the data.frame()
generated to find the maximum Proportion
Zero across groups, then filter to a set of genes that pass the max Prop Zero
cutoff defined by the user.
filter_prop_zero(prop_zero_df, cutoff = 0.9, na.rm = TRUE)
filter_prop_zero(prop_zero_df, cutoff = 0.9, na.rm = TRUE)
prop_zero_df |
|
cutoff |
A |
na.rm |
a logical indicating whether missing values should be removed when max prop_zero is calculated. |
A character()
of gene names that are under the cutoff.
These names are from the rownames()
of the expression data.
Other Proportion Zero functions:
get_prop_zero()
## Get Proportion Zero data.frame prop_zero <- get_prop_zero(sce_zero_test, group_col = "group") ## Filter with max Proportion Zero cutoff = 0.59 filter_prop_zero(prop_zero, cutoff = 0.59) ## NA handling prop_zero[1, 1] <- NA ## If NA are present in the prop_zero_df the user can choose to: ## Remove NAs (default) ## genes with NA counts may be included in the final list filter_prop_zero(prop_zero, cutoff = 0.59, na.rm = TRUE) ## Include NA values ## any gene with genes with NA counts (so max prop_zero will be NA) ## will be removed from the final list, this will throw warning filter_prop_zero(prop_zero, cutoff = 0.59, na.rm = FALSE)
## Get Proportion Zero data.frame prop_zero <- get_prop_zero(sce_zero_test, group_col = "group") ## Filter with max Proportion Zero cutoff = 0.59 filter_prop_zero(prop_zero, cutoff = 0.59) ## NA handling prop_zero[1, 1] <- NA ## If NA are present in the prop_zero_df the user can choose to: ## Remove NAs (default) ## genes with NA counts may be included in the final list filter_prop_zero(prop_zero, cutoff = 0.59, na.rm = TRUE) ## Include NA values ## any gene with genes with NA counts (so max prop_zero will be NA) ## will be removed from the final list, this will throw warning filter_prop_zero(prop_zero, cutoff = 0.59, na.rm = FALSE)
This function calculates the Proportion Zero for each gene in each user defined group. Proportion Zero = number of zero counts for a gene for a group of cells/number of cells in the group.
get_prop_zero(sce, group_col = "cellType")
get_prop_zero(sce, group_col = "cellType")
sce |
SummarizedExperiment-class object |
group_col |
name of the column in the
colData() of |
For more information about calculating Proportion Zero, check equation 1 from the vignette in section "Calculate Proportion Zero and Pick Cutoff".
A data.frame()
containing proportion of zero counts, genes as rows,
groups as columns.
Other Proportion Zero functions:
filter_prop_zero()
## Basic Proportion counts == 0 rowSums(assays(sce_zero_test)$counts == 0) / ncol(sce_zero_test) ## Get proportion by the default group "cellType" get_prop_zero(sce_zero_test) ## Get proportion by user defined grouping of the data get_prop_zero(sce_zero_test, group_col = "group") ## Groups with missing levels will be dropped get_prop_zero(sce_zero_test, group_col = "cellType_na")
## Basic Proportion counts == 0 rowSums(assays(sce_zero_test)$counts == 0) / ncol(sce_zero_test) ## Get proportion by the default group "cellType" get_prop_zero(sce_zero_test) ## Get proportion by user defined grouping of the data get_prop_zero(sce_zero_test, group_col = "group") ## Groups with missing levels will be dropped get_prop_zero(sce_zero_test, group_col = "cellType_na")
This function finds the rank of each gene's expression for each cell,
grouped by the user defined variable. This data is used to compute the rank
invariance value for each gene later with rank_invariance()
.
rank_cells(sce, group_col = "cellType", assay = "logcounts")
rank_cells(sce, group_col = "cellType", assay = "logcounts")
sce |
SummarizedExperiment-class object with
the |
group_col |
name of the column in the
colData() of |
assay |
A |
A named list()
of matrix()
objects. Each matrix()
contains the
rank values for the cells belonging to one group.
Other invariance functions:
rank_group()
,
rank_invariance_express()
,
rank_invariance()
## Rank the genes for each cell, organized by "group" column rank_cells(sce_zero_test, group_col = "group")
## Rank the genes for each cell, organized by "group" column rank_cells(sce_zero_test, group_col = "group")
This function finds the rank of each gene's mean expression all cells in a
group. This data is used to compute the rank invariance value for each gene
later with rank_invariance()
.
rank_group(sce, group_col = "cellType", assay = "logcounts")
rank_group(sce, group_col = "cellType", assay = "logcounts")
sce |
SummarizedExperiment-class object with
the |
group_col |
name of the column in the
colData() of |
assay |
A |
Named list()
of ranks for each gene.
Other invariance functions:
rank_cells()
,
rank_invariance_express()
,
rank_invariance()
## Rank the genes for each group defined by "group" column rank_group(sce_zero_test, group_col = "group")
## Rank the genes for each group defined by "group" column rank_group(sce_zero_test, group_col = "group")
This function computes the Rank Invariance value for each gene, from the cell
and group ranks computed by rank_cells()
and rank_group()
respectively.
Genes with high RI values are considered good candidate TREGs.
rank_invariance(group_rank, cell_rank)
rank_invariance(group_rank, cell_rank)
group_rank |
A |
cell_rank |
A |
A numeric()
with the rank of invariance for each gene. High values
represent low Rank Invariance, these genes are considered good candidate TREGs.
Other invariance functions:
rank_cells()
,
rank_group()
,
rank_invariance_express()
## Get the rank of the gene in each group group_rank_test <- rank_group(sce_zero_test, group_col = "group") ## Get the rank of the gene for each cell cell_rank_test <- rank_cells(sce_zero_test, group_col = "group") ## Use both rankings to calculate rank_invariance rank_invar_test <- rank_invariance(group_rank_test, cell_rank_test) ## Highest RI value is best candidate TREG sort(rank_invar_test, decreasing = TRUE)
## Get the rank of the gene in each group group_rank_test <- rank_group(sce_zero_test, group_col = "group") ## Get the rank of the gene for each cell cell_rank_test <- rank_cells(sce_zero_test, group_col = "group") ## Use both rankings to calculate rank_invariance rank_invar_test <- rank_invariance(group_rank_test, cell_rank_test) ## Highest RI value is best candidate TREG sort(rank_invar_test, decreasing = TRUE)
This function computes the Rank Invariance value for each gene, over the
groups defined by the user. This function computes the same RI values as running
rank_cells()
+ rank_group()
+ rank_invariance()
.
Genes with high RI values are considered good candidate TREGs. This function is
more efficient than running the previous three functions.
rank_invariance_express(sce, group_col = "cellType", assay = "logcounts")
rank_invariance_express(sce, group_col = "cellType", assay = "logcounts")
sce |
SummarizedExperiment-class object with
the |
group_col |
name of the column in the
colData() of |
assay |
A |
A numeric()
with the rank of invariance for each gene. High values
represent low Rank Invariance, these genes are considered good candidate TREGs.
Other invariance functions:
rank_cells()
,
rank_group()
,
rank_invariance()
## Calculate RI for the sce object ## Highest RI value is best candidate TREG, and can change based on the grouping of interest rank_invariance_express(sce_zero_test, group_col = "group") rank_invariance_express(sce_zero_test, group_col = "cellType")
## Calculate RI for the sce object ## Highest RI value is best candidate TREG, and can change based on the grouping of interest rank_invariance_express(sce_zero_test, group_col = "group") rank_invariance_express(sce_zero_test, group_col = "cellType")
A simulated SummarizedExperiment-class
object representing the expression of 100 cells across 5 genes.
This object is used as an example dataset throughout TREG
.
sce_zero_test
sce_zero_test
A SummarizedExperiment-class object with 5 rows and 100 columns.
The colData
of the object contains demo sample information about the cell,
that were designed to be used as group_col
, (such as cell_type
, donor
, and region
).
The expression of the 5 genes were designed to show a range of
Proportion Zeros across groups for different genes.
The overall prop_zero
for these theoretical genes are:
g100: prop_zero = 1.00
g50: prop_zero = 0.50
g0: prop_zero = 0.00
gOffOn: prop_zero = 0.50
gVar: prop_zero = 0.54