Title: | Differential cell type change analysis using Logistic/linear Regression |
---|---|
Description: | The goal of LRcell is to identify the specific sub-cell types that drive the changes observed in a bulk RNA-seq differential gene expression experiment. To achieve this, LRcell utilizes sets of cell marker genes acquired from single-cell RNA-sequencing (scRNA-seq) as indicators for various cell types in the tissue of interest. Next, for each cell type, using its marker genes as indicators, we apply Logistic Regression on the complete set of genes with differential expression p-values to calculate a cell-type significance p-value. Finally, these p-values are compared to predict which cell type(s) are likely to be responsible for the differential gene expression pattern observed in the bulk RNA-seq experiments. LRcell is inspired by the LRpath [@sartor2009lrpath] algorithm developed by Sartor et al., originally designed for pathway/gene set enrichment analysis. LRcell contains three major components: LRcell analysis, plot generation and marker gene selection. All modules in this package are written in R. This package also provides marker genes for the human Prefrontal Cortex (pFC) brain region, human PBMC and nine mouse brain regions (Frontal Cortex, Cerebellum, Globus Pallidus, Hippocampus, Entopeduncular, Posterior Cortex, Striatum, Substantia Nigra and Thalamus). |
Authors: | Wenjing Ma [cre, aut] |
Maintainer: | Wenjing Ma <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.15.0 |
Built: | 2024-11-29 07:12:31 UTC |
Source: | https://github.com/bioc/LRcell |
This function takes the expression of a specific gene, a cell type annotation and a hyperparameter, and calculates the gene's enrichment score in each cell type.
enrich_posfrac_score(gene, expr, annot, power = 1)
gene | Gene name from the expression matrix. |
expr | Complete expression matrix with rows as genes and columns as cells. |
annot | Cell type annotation as a named vector, with cell ids as names and cell types as values. |
power | The penalty on the fraction of cells expressing the gene. |
Enrichment score list with cell type as names and enrichment score as values.
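As a rough illustration of the kind of score described above, the sketch below computes, for one gene, the ratio of its mean expression within each cell type to its mean across all cells, multiplied by the fraction of expressing cells raised to `power`. The formula, the helper name `toy_enrich_score` and the toy data are assumptions made for illustration only; the package's own implementation may differ.

```r
# Hypothetical helper (not part of LRcell): enrichment score of one gene per cell type.
# Assumed formula: (mean in cell type / mean over all cells) * (fraction expressing)^power
toy_enrich_score <- function(gene, expr, annot, power = 1) {
  x <- expr[gene, names(annot)]          # expression of the gene, ordered like annot
  overall_mean <- mean(x)                # assumed to be non-zero in this sketch
  scores <- vapply(split(x, annot), function(v) {
    frac_pos <- mean(v > 0)              # fraction of cells expressing the gene
    (mean(v) / overall_mean) * frac_pos^power
  }, numeric(1))
  as.list(scores)                        # list with cell types as names
}

# Toy data (all values invented): 2 genes x 6 cells, 2 cell types
expr <- matrix(c(4, 6, 0, 0, 0, 2,
                 1, 1, 1, 1, 1, 1),
               nrow = 2, byrow = TRUE,
               dimnames = list(c("GeneA", "GeneB"), paste0("cell", 1:6)))
annot <- c(cell1 = "Microglia", cell2 = "Microglia", cell3 = "Microglia",
           cell4 = "Astrocytes", cell5 = "Astrocytes", cell6 = "Astrocytes")

scores <- toy_enrich_score("GeneA", expr, annot, power = 1)
```

A larger `power` penalizes genes that are expressed in only a small fraction of a cell type's cells more strongly.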
A named vector with gene symbols as names and p-values as values. This is from a mouse Alzheimer's disease model (GEO: GSE90693), specifically 6 months after treatment in the Frontal Cortex brain region. In this dataset, we expect Microglia to be the most enriched cell type.
data(example_gene_pvals)
A named vector with 23,420 items
https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE90693 'GSE90693_RawCountsData_TPR50_6months_AllRegions.txt.gz'
An example output of LRcell using data example_gene_pvals and mouse_FC_marker_genes.
data(example_LRcell_res)
A data frame with 81 rows, one per mouse FC sub-cluster, and 8 variables:
The ID of each marker gene set, which can be a cell type or cluster
How many marker genes are contributing to the analysis
The coefficients of Logistic Regression or Linear Regression
The odds ratio, which quantifies association in Logistic Regression
The p-value calculated from the analysis
The FDR after BH correction
Genes that are contributing to the analysis
Cell type name
Get top marker genes for each subcluster
get_markergenes(enriched.g, method = c("LR", "LiR"), topn = 100)
enriched.g | A return value from LRcell_gene_enriched_scores or from the provided data. |
method | If "LR", the return value is a list of gene vectors; if "LiR", the return value is a list of named vectors with genes as names and enrichment scores as values. |
topn | Number of top genes to keep as marker genes. |
A list of top marker genes.
library(ExperimentHub)
eh <- ExperimentHub::ExperimentHub()
eh <- query(eh, "LRcellTypeMarkers")
# eh$title
enriched_genes <- eh[['EH4548']]
marker.g <- get_markergenes(enriched_genes, method="LR", topn=100)
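The selection step can be pictured as ranking each column of an enrichment-score matrix and keeping the highest-scoring genes. The following is a minimal base-R sketch under that assumption; `toy_top_markers` and the toy matrix are hypothetical, and the real function may treat ties or missing values differently.

```r
# Hypothetical helper (not part of LRcell): top-N genes per column of a score matrix.
toy_top_markers <- function(enriched, method = c("LR", "LiR"), topn = 100) {
  method <- match.arg(method)
  res <- lapply(colnames(enriched), function(ct) {
    s <- sort(enriched[, ct], decreasing = TRUE)   # named scores, highest first
    top <- head(s, topn)
    if (method == "LR") names(top) else top        # genes only, or genes with scores
  })
  names(res) <- colnames(enriched)
  res
}

# Toy score matrix (values invented): 4 genes x 2 clusters, filled column by column
enriched <- matrix(c(0.1, 2.0, 1.5, 0.3,
                     3.0, 0.2, 0.4, 1.1),
                   nrow = 4,
                   dimnames = list(paste0("Gene", 1:4), c("clusterA", "clusterB")))

markers_lr  <- toy_top_markers(enriched, method = "LR",  topn = 2)
markers_lir <- toy_top_markers(enriched, method = "LiR", topn = 2)
```

With "LR" each list element is a plain character vector of genes, whereas "LiR" keeps the scores so they can act as weights in a linear regression.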
This function wraps around LRcellCore and handles the case where the marker gene input and the brain region are empty.
LRcell( gene.p, marker.g = NULL, species = c("mouse", "human"), region = NULL, method = c("LR", "LiR"), min.size = 5, sig.cutoff = 0.05 )
gene.p | Named vector of gene-level p-values from DEG analysis, e.g. DESeq2 or limma. |
marker.g | List of cell-type specific marker genes derived from single-cell RNA-seq. The names of the list are cell-type or cluster names; the values are vectors of marker genes or named numeric vectors. LRcell provides marker gene lists for different human/mouse brain regions, but users can supply their own marker gene list as input. Default: NULL. |
species | Either 'mouse' or 'human'. Default: mouse. |
region | Specific brain regions provided by LRcell. For mouse, LRcell provides 9 brain regions: c("FC", "HC", "PC", "GP", "STR", "TH", "SN", "ENT", "CB"). For human, LRcell provides c("pFC", "PBMC"). |
method | Either "LR" (logistic regression) or "LiR" (linear regression). Logistic regression treats all cell-type specific marker genes equally; if some value quantifies the importance of each marker gene, linear regression can be performed instead. Default: LR. |
min.size | Minimal size of a marker gene set; this affects the balance of labels. |
sig.cutoff | Cutoff for the input genes' p-values. Default: 0.05. |
A list with LRcell results. Each item corresponds to one marker gene input and is a statistics table. In the table, each row corresponds to a marker gene set, and the columns are:
ID The ID of each marker gene set, which can be a cell type or cluster;
genes_num How many marker genes are contributing to the analysis;
coef The coefficients of Logistic Regression or Linear Regression;
odds_ratio The odds ratio, which quantifies association in Logistic Regression;
p-value The p-value calculated from the analysis;
FDR The FDR after BH correction;
lead_genes Genes that are contributing to the analysis.
data(example_gene_pvals)
res <- LRcell(example_gene_pvals, species="mouse", region="FC", method="LR")
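To make the idea behind the logistic regression concrete, here is a minimal base-R sketch in the spirit of LRpath: marker membership (1/0) is regressed on -log(p-value) across all genes, once per cell type, and the resulting p-values are BH-corrected. The helper `toy_lr`, the toy p-values and the marker sets are invented for illustration; the package's actual model, filtering by `min.size` and odds-ratio calculation are not reproduced here.

```r
# Toy gene-level p-values (invented): 200 genes, evenly spread over (0, 1)
genes  <- paste0("Gene", 1:200)
gene_p <- setNames((1:200) / 201, genes)

# Toy marker sets: typeA is concentrated among small p-values, typeB is spread evenly
marker_g <- list(
  typeA = genes[c(1, 2, 3, 4, 5, 6, 8, 10, 12, 15, 18, 22, 27, 33, 40, 50,
                  65, 85, 110, 150)],
  typeB = genes[seq(5, 195, by = 10)]
)

# Hypothetical helper (not part of LRcell): one logistic regression per marker set
toy_lr <- function(markers, gene_p) {
  df <- data.frame(is_marker = as.integer(names(gene_p) %in% markers),
                   score = -log(gene_p))    # larger score = more significant gene
  fit <- glm(is_marker ~ score, data = df, family = binomial())
  est <- summary(fit)$coefficients["score", ]
  c(coef = unname(est["Estimate"]), p.value = unname(est["Pr(>|z|)"]))
}

res <- as.data.frame(t(vapply(marker_g, toy_lr,
                              c(coef = 0, p.value = 0), gene_p = gene_p)))
res$FDR <- p.adjust(res$p.value, method = "BH")   # Benjamini-Hochberg correction
```

A positive coefficient means that marker genes of that cell type tend to sit among the more significant genes of the bulk experiment.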
This is a function which takes marker genes from single-cell RNA-seq as reference to calculate the enrichment of certain cell types in bulk DEG analysis. This algorithm borrows from Marques et al, 2016 (https://science.sciencemag.org/content/352/6291/1326.long).
LRcell_gene_enriched_scores( expr, annot, power = 1, parallel = TRUE, n.cores = 4 )
expr | Expression matrix with rows as genes and columns as cells, can be an object of Matrix or dgCMatrix or a dataframe. |
annot | Cell type annotation named vector with names as cell ids and values as cell types. |
power | The penalty on fraction of cells expressing the genes. |
parallel | Whether to run it in parallel. |
n.cores | How many cores to use in parallel mode. |
A numeric matrix with rows as genes and columns as cell types, values are gene enrichment scores.
This function takes marker genes from single-cell RNA-seq as a reference to calculate the enrichment of certain cell types in bulk DEG analysis. We assume that the bulk DEG signal is derived from certain cell-type specific patterns.
LRcellCore(gene.p, marker.g, method, min.size = 5, sig.cutoff = 0.05)
gene.p | Named vector of gene-level p-values from DEG analysis, e.g. DESeq2 or limma. |
marker.g | List of cell-type specific marker genes derived from single-cell RNA-seq. The names of the list are cell-type or cluster names; the values are vectors of marker genes or named numeric vectors. LRcell provides marker gene lists for different human/mouse brain regions, but users can supply their own marker gene list as input. |
method | Either "LR" (logistic regression) or "LiR" (linear regression). Logistic regression treats all cell-type specific marker genes equally; if some value quantifies the importance of each marker gene, linear regression can be performed instead. |
min.size | Minimal size of a marker gene set; this affects the balance of labels. |
sig.cutoff | Cutoff for the input genes' p-values. Default: 0.05. |
A data frame of LRcell statistics, as described in LRcell.
data(mouse_FC_marker_genes)
data(example_gene_pvals)
res <- LRcellCore(example_gene_pvals, mouse_FC_marker_genes, method="LR")
A named vector with subclusters as names and cell types as values in Mouse Brain. The cell types are pre-annotated by the dataset and include: Endothelial, FibroblastLike, Mural, Oligodendrocytes, Polydendrocytes, Astrocytes and Microglia.
data(mouse_celltypes)
A named vector with 565 subclusters:
Named vector with subclusters as names and cell types as values.
http://dropviz.org/ under tab 'data'
A list of marker genes with names indicating cell types. We selected the top 100 enriched genes from each subcluster as the marker gene list.
data(mouse_FC_marker_genes)
A named list with 81 subclusters in mouse Frontal Cortex:
Named list with subclusters as names and marker genes as values.
Calculated from gene enrichment scores
This function plots the LRcell result data frame. In this function, we take the LRcell result data frame and add cell types according to
plot_manhattan_enrich(lrcell_res, sig.cutoff = 0.05, label.topn = 5)
lrcell_res | LRcell result data frame. |
sig.cutoff | The p-value cutoff below which an LRcell result is shown as significant. |
label.topn | Number of significant cell types to label. |
A ggplot2 object
data(example_LRcell_res)
plot_manhattan_enrich(example_LRcell_res)
This function plots the distribution of the marker genes of a given cell type (or cluster) along the ranked list of DE genes.
plot_marker_dist(markers, gene.p, colour = "red")
markers | Vector of marker genes from a cell type or cluster of interest. |
gene.p | Named vector of gene-level p-values from DEG analysis, e.g. DESeq2 or limma. |
colour | Bar colour to use on the ggplot2 object. |
A ggplot2 object
data(example_gene_pvals)
data(mouse_FC_marker_genes)
Oligos_markers <- mouse_FC_marker_genes[["FC_9-5.Oligodendrocytes_5"]]
plot_marker_dist(Oligos_markers, example_gene_pvals)
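The positions drawn on such a plot come from where each marker gene falls in the ranked DE gene list. The base-R sketch below shows one way to obtain these positions, assuming genes are ranked from the smallest to the largest p-value and markers without a p-value are dropped; the gene names and values are invented, and this is not the package's plotting code.

```r
# Toy inputs (invented): gene-level p-values and a marker set
gene_p  <- c(GeneA = 0.20, GeneB = 0.001, GeneC = 0.03, GeneD = 0.75, GeneE = 0.0004)
markers <- c("GeneB", "GeneE", "GeneX")        # GeneX has no p-value in gene_p

ranked  <- names(sort(gene_p))                 # most significant gene first
present <- intersect(markers, ranked)          # keep only markers that were tested
marker_rank <- setNames(match(present, ranked), present)
```

Markers that cluster at low ranks indicate that the cell type's marker genes are concentrated among the most significant genes.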