Title: | Expression Weighted Celltype Enrichment |
---|---|
Description: | Used to determine which cell types are enriched within gene lists. The package provides tools for testing enrichments within simple gene lists (such as human disease associated genes) and those resulting from differential expression studies. The package does not depend upon any particular Single Cell Transcriptome dataset and user defined datasets can be loaded in and used in the analyses. |
Authors: | Alan Murphy [cre] , Brian Schilder [aut] , Nathan Skene [aut] |
Maintainer: | Alan Murphy <[email protected]> |
License: | GPL-3 |
Version: | 1.15.0 |
Built: | 2024-11-29 07:15:57 UTC |
Source: | https://github.com/bioc/EWCE |
Used to determine which cell types are enriched within gene lists. The package provides tools for testing enrichments within simple gene lists (such as human disease associated genes) and those resulting from differential expression studies. The package does not depend upon any particular Single Cell Transcriptome dataset and user defined datasets can be loaded in and used in the analyses.
EWCE: Expression Weighted Celltype Enrichment
Used to determine which cell types are enriched within gene lists. The package provides tools for testing enrichments within simple gene lists (such as human disease associated genes) and those resulting from differential expression studies.
The package does not depend upon any particular Single Cell Transcriptome dataset and user defined datasets can be loaded in and used in the analyses.
Maintainer: Alan Murphy [email protected] (ORCID)
Authors:
Brian Schilder [email protected] (ORCID)
Nathan Skene [email protected] (ORCID)
Useful links:
add_res_to_merging_list
adds EWCE results to a list
for merging analysis.
add_res_to_merging_list(full_res, existing_results = NULL)
add_res_to_merging_list(full_res, existing_results = NULL)
full_res |
Results list generated using bootstrap_enrichment_test or ewce_expression_data functions. Multiple results tables can be merged into one results table, as long as the 'list' column is set to distinguish them. |
existing_results |
Output of previous rounds from adding results to list. Leave empty if this is the first item in the list. |
Merged results list.
# Load the single cell data ctd <- ewceData::ctd() # Load the data tt_alzh <- ewceData::tt_alzh() # tt_alzh_BA36 <- ewceData::tt_alzh_BA36() # Use 3 bootstrap lists for speed, for publishable analysis use >10000 reps <- 3 # Use 5 up/down regulated genes (thresh) for speed, default is 250 thresh <- 5 # Run EWCE analysis # tt_results <- ewce_expression_data( # sct_data = ctd, tt = tt_alzh, annotLevel = 1, thresh = thresh, # reps = reps, ttSpecies = "human", sctSpecies = "mouse" # ) # tt_results_36 <- ewce_expression_data( # sct_data = ctd, tt = tt_alzh_BA36, annotLevel = 1, thresh = thresh, # reps = reps, ttSpecies = "human", sctSpecies = "mouse" # ) # Fill a list with the results results <- add_res_to_merging_list(tt_alzh) # results <- add_res_to_merging_list(tt_alzh_BA36, results)
# Load the single cell data ctd <- ewceData::ctd() # Load the data tt_alzh <- ewceData::tt_alzh() # tt_alzh_BA36 <- ewceData::tt_alzh_BA36() # Use 3 bootstrap lists for speed, for publishable analysis use >10000 reps <- 3 # Use 5 up/down regulated genes (thresh) for speed, default is 250 thresh <- 5 # Run EWCE analysis # tt_results <- ewce_expression_data( # sct_data = ctd, tt = tt_alzh, annotLevel = 1, thresh = thresh, # reps = reps, ttSpecies = "human", sctSpecies = "mouse" # ) # tt_results_36 <- ewce_expression_data( # sct_data = ctd, tt = tt_alzh_BA36, annotLevel = 1, thresh = thresh, # reps = reps, ttSpecies = "human", sctSpecies = "mouse" # ) # Fill a list with the results results <- add_res_to_merging_list(tt_alzh) # results <- add_res_to_merging_list(tt_alzh_BA36, results)
bin_columns_into_quantiles
bin_columns_into_quantiles
is an internal function used to convert a
vector of specificity into a vector of specificity quantiles.
This function can be iterated across a matrix using apply
to create a matrix of specificity quantiles.
bin_columns_into_quantiles( vec, numberOfBins = 40, defaultBin = as.integer(numberOfBins/2) )
bin_columns_into_quantiles( vec, numberOfBins = 40, defaultBin = as.integer(numberOfBins/2) )
vec |
The vector of gene of specificity values. |
numberOfBins |
Number of quantile bins to use (40 is recommended). |
defaultBin |
Which bin to assign when there's only one non-zero quantile. In situations where there's only one non-zero quantile, cut throws an error. Avoid these situations by using a default quantile. |
A vector with same length as vec
but with columns storing
quantiles instead of specificity.
ctd <- ewceData::ctd() ctd[[1]]$specificity_quantiles <- apply(ctd[[1]]$specificity, 2, FUN = bin_columns_into_quantiles)
ctd <- ewceData::ctd() ctd[[1]]$specificity_quantiles <- apply(ctd[[1]]$specificity, 2, FUN = bin_columns_into_quantiles)
bin_specificity_into_quantiles
is an internal function used to convert
add '$specificity_quantiles' to a ctd
bin_specificity_into_quantiles( ctdIN, numberOfBins, matrix_name = "specificity_quantiles", as_sparse = TRUE, verbose = TRUE )
bin_specificity_into_quantiles( ctdIN, numberOfBins, matrix_name = "specificity_quantiles", as_sparse = TRUE, verbose = TRUE )
ctdIN |
A single annotLevel of a ctd, i.e. |
numberOfBins |
Number of quantile 'bins' to use (40 is recommended). |
matrix_name |
Name of the specificity matrix to create (default: "specificity_quantiles"). |
as_sparse |
Convert to sparseMatrix. |
verbose |
Print messages. |
A ctd with "specificity_quantiles" matrix in each level
(or whatever matrix_name
was set to.).
ctd <- ewceData::ctd() ctd <- lapply(ctd, EWCE::bin_specificity_into_quantiles, numberOfBins = 40) print(ctd[[1]]$specificity_quantiles[1:3, ])
ctd <- ewceData::ctd() ctd <- lapply(ctd, EWCE::bin_specificity_into_quantiles, numberOfBins = 40) print(ctd[[1]]$specificity_quantiles[1:3, ])
bootstrap_enrichment_test
takes a genelist and a single cell type
transcriptome dataset and determines the probability of enrichment and fold
changes for each cell type.
bootstrap_enrichment_test( sct_data = NULL, hits = NULL, bg = NULL, genelistSpecies = NULL, sctSpecies = NULL, sctSpecies_origin = sctSpecies, output_species = "human", method = "homologene", reps = 100, no_cores = 1, annotLevel = 1, geneSizeControl = FALSE, controlledCT = NULL, mtc_method = "BH", sort_results = TRUE, standardise_sct_data = TRUE, standardise_hits = FALSE, verbose = TRUE, localHub = FALSE, store_gene_data = TRUE )
bootstrap_enrichment_test( sct_data = NULL, hits = NULL, bg = NULL, genelistSpecies = NULL, sctSpecies = NULL, sctSpecies_origin = sctSpecies, output_species = "human", method = "homologene", reps = 100, no_cores = 1, annotLevel = 1, geneSizeControl = FALSE, controlledCT = NULL, mtc_method = "BH", sort_results = TRUE, standardise_sct_data = TRUE, standardise_hits = FALSE, verbose = TRUE, localHub = FALSE, store_gene_data = TRUE )
sct_data |
List generated using generate_celltype_data. |
hits |
List of gene symbols containing the target gene list.
Will automatically be converted to human gene symbols
if |
bg |
List of gene symbols containing the background gene list
(including hit genes). If |
genelistSpecies |
Species that |
sctSpecies |
Species that |
sctSpecies_origin |
Species that the |
output_species |
Species to convert |
method |
R package to use for gene mapping:
|
reps |
Number of random gene lists to generate (Default: 100, but should be >=10,000 for publication-quality results). |
no_cores |
Number of cores to parallelise
bootstrapping |
annotLevel |
An integer indicating which level of |
geneSizeControl |
Whether you want to control for
GC content and transcript length. Recommended if the gene list originates
from genetic studies (Default: FALSE).
If set to |
controlledCT |
[Optional] If not NULL, and instead is the name of a cell type, then the bootstrapping controls for expression within that cell type. |
mtc_method |
Multiple-testing correction method (passed to p.adjust). |
sort_results |
Sort enrichment results from smallest to largest p-values. |
standardise_sct_data |
Should
|
standardise_hits |
Should
If |
verbose |
Print messages. |
localHub |
If working offline, add argument localHub=TRUE to work with a local, non-updated hub; It will only have resources available that have previously been downloaded. If offline, Please also see BiocManager vignette section on offline use to ensure proper functionality. |
store_gene_data |
Store sampled gene data for every bootstrap iteration.
When the number of bootstrap |
A list containing three elements:
hit.cells
: vector containing the summed proportion of
expression in each cell type for the target list.
gene_data:
data.table showing the number of time each gene
appeared in the bootstrap sample.
bootstrap_data
: matrix in which each row represents the
summed proportion of expression in each cell type for one of the
random lists
controlledCT
: the controlled cell type (if applicable)
# Load the single cell data sct_data <- ewceData::ctd() # Set the parameters for the analysis # Use 3 bootstrap lists for speed, for publishable analysis use >=10,000 reps <- 3 # Load gene list from Alzheimer's disease GWAS hits <- ewceData::example_genelist() # Bootstrap significance test, no control for transcript length or GC content full_results <- EWCE::bootstrap_enrichment_test( sct_data = sct_data, hits = hits, reps = reps, annotLevel = 1, sctSpecies = "mouse", genelistSpecies = "human")
# Load the single cell data sct_data <- ewceData::ctd() # Set the parameters for the analysis # Use 3 bootstrap lists for speed, for publishable analysis use >=10,000 reps <- 3 # Load gene list from Alzheimer's disease GWAS hits <- ewceData::example_genelist() # Bootstrap significance test, no control for transcript length or GC content full_results <- EWCE::bootstrap_enrichment_test( sct_data = sct_data, hits = hits, reps = reps, annotLevel = 1, sctSpecies = "mouse", genelistSpecies = "human")
check_ewce_genelist_inputs
Is used to check that hits and bg gene
lists passed to EWCE are setup correctly. Checks they are the
appropriate length.
Checks all hits are in bg. Checks the species match and if not
reduces to 1:1 orthologs.
check_ewce_genelist_inputs( sct_data, hits, bg = NULL, genelistSpecies = NULL, sctSpecies = NULL, sctSpecies_origin = sctSpecies, output_species = "human", method = "homologene", geneSizeControl = FALSE, standardise_sct_data = TRUE, standardise_hits = FALSE, min_genes = 4, verbose = TRUE )
check_ewce_genelist_inputs( sct_data, hits, bg = NULL, genelistSpecies = NULL, sctSpecies = NULL, sctSpecies_origin = sctSpecies, output_species = "human", method = "homologene", geneSizeControl = FALSE, standardise_sct_data = TRUE, standardise_hits = FALSE, min_genes = 4, verbose = TRUE )
sct_data |
List generated using generate_celltype_data. |
hits |
List of gene symbols containing the target gene list.
Will automatically be converted to human gene symbols
if |
bg |
List of gene symbols containing the background gene list
(including hit genes). If |
genelistSpecies |
Species that |
sctSpecies |
Species that |
sctSpecies_origin |
Species that the |
output_species |
Species to convert |
method |
R package to use for gene mapping:
|
geneSizeControl |
Whether you want to control for
GC content and transcript length. Recommended if the gene list originates
from genetic studies (Default: FALSE).
If set to |
standardise_sct_data |
Should
|
standardise_hits |
Should
If |
min_genes |
Minimum number of genes in a gene list to test. |
verbose |
Print messages. |
A list containing
hits
: Array of MGI/HGNC gene symbols containing the target
gene list.
bg
: Array of MGI/HGNC gene symbols containing the background
gene list.
ctd <- ewceData::ctd() example_genelist <- ewceData::example_genelist() # Called from "bootstrap_enrichment_test()" and "generate_bootstrap_plots()" checkedLists <- EWCE::check_ewce_genelist_inputs( sct_data = ctd, hits = example_genelist, sctSpecies = "mouse", genelistSpecies = "human" )
ctd <- ewceData::ctd() example_genelist <- ewceData::example_genelist() # Called from "bootstrap_enrichment_test()" and "generate_bootstrap_plots()" checkedLists <- EWCE::check_ewce_genelist_inputs( sct_data = ctd, hits = example_genelist, sctSpecies = "mouse", genelistSpecies = "human" )
After you run bootstrap_enrichment_test, check what percentage of significantly enriched cell types match an expected cell type.
check_percent_hits( full_results, target_celltype, mtc_method = "bonferroni", q_threshold = 0.05, verbose = TRUE )
check_percent_hits( full_results, target_celltype, mtc_method = "bonferroni", q_threshold = 0.05, verbose = TRUE )
full_results |
|
target_celltype |
Substring to search to matching cell types (case-insensitive). |
mtc_method |
Multiple-testing correction method. |
q_threshold |
Corrected significance threshold. |
verbose |
Print messages. |
Report list.
## Bootstrap significance test, ## no control for transcript length or GC content ## Use pre-computed results to speed up example full_results <- EWCE::example_bootstrap_results() report <- EWCE::check_percent_hits( full_results = full_results, target_celltype = "microglia" )
## Bootstrap significance test, ## no control for transcript length or GC content ## Use pre-computed results to speed up example full_results <- EWCE::example_bootstrap_results() report <- EWCE::check_percent_hits( full_results = full_results, target_celltype = "microglia" )
controlled_geneset_enrichment
tests whether a functional gene set is
still enriched in a disease gene set after controlling for the
disease gene set's enrichment in a particular cell type (the 'controlledCT')
controlled_geneset_enrichment( disease_genes, functional_genes, bg = NULL, sct_data, sctSpecies = NULL, output_species = "human", disease_genes_species = NULL, functional_genes_species = NULL, method = "homologene", annotLevel, reps = 100, controlledCT, use_intersect = FALSE, verbose = TRUE )
controlled_geneset_enrichment( disease_genes, functional_genes, bg = NULL, sct_data, sctSpecies = NULL, output_species = "human", disease_genes_species = NULL, functional_genes_species = NULL, method = "homologene", annotLevel, reps = 100, controlledCT, use_intersect = FALSE, verbose = TRUE )
disease_genes |
Array of gene symbols containing the disease gene list. Does not have to be disease genes. Must be from same species as the single cell transcriptome dataset. |
functional_genes |
Array of gene symbols containing the functional gene list. The enrichment of this gene set within the disease_genes is tested. Must be from same species as the single cell transcriptome dataset. |
bg |
List of gene symbols containing the background gene list
(including hit genes). If |
sct_data |
List generated using generate_celltype_data. |
sctSpecies |
Species that |
output_species |
Species to convert |
disease_genes_species |
Species of the
|
functional_genes_species |
Species of the
|
method |
R package to use for gene mapping:
|
annotLevel |
An integer indicating which level of |
reps |
Number of random gene lists to generate (Default: 100, but should be >=10,000 for publication-quality results). |
controlledCT |
[Optional] If not NULL, and instead is the name of a cell type, then the bootstrapping controls for expression within that cell type. |
use_intersect |
When |
verbose |
Print messages. |
A list containing three data frames:
p_controlled
The probability that functional_genes are
enriched in disease_genes while controlling for the level of specificity
in controlledCT
z_controlled
The z-score that functional_genes are enriched
in disease_genes while controlling for the level of specificity in
controlledCT
p_uncontrolled
The probability that functional_genes are
enriched in disease_genes WITHOUT controlling for the level of
specificity in controlledCT
z_uncontrolled
The z-score that functional_genes are enriched
in disease_genes WITHOUT controlling for the level of specificity in
controlledCT
reps=reps
controlledCT
actualOverlap=actual
The number of genes that overlap between
functional and disease gene sets
# See the vignette for more detailed explanations # Gene set enrichment analysis controlling for cell type expression # set seed for bootstrap reproducibility set.seed(12345678) ## load merged dataset from vignette ctd <- ewceData::ctd() schiz_genes <- ewceData::schiz_genes() hpsd_genes <- ewceData::hpsd_genes() # Use 3 bootstrap lists for speed, for publishable analysis use >10000 reps <- 3 res_hpsd_schiz <- EWCE::controlled_geneset_enrichment( disease_genes = schiz_genes, functional_genes = hpsd_genes, sct_data = ctd, annotLevel = 1, reps = reps, controlledCT = "pyramidal CA1" )
# See the vignette for more detailed explanations # Gene set enrichment analysis controlling for cell type expression # set seed for bootstrap reproducibility set.seed(12345678) ## load merged dataset from vignette ctd <- ewceData::ctd() schiz_genes <- ewceData::schiz_genes() hpsd_genes <- ewceData::hpsd_genes() # Use 3 bootstrap lists for speed, for publishable analysis use >10000 reps <- 3 res_hpsd_schiz <- EWCE::controlled_geneset_enrichment( disease_genes = schiz_genes, functional_genes = hpsd_genes, sct_data = ctd, annotLevel = 1, reps = reps, controlledCT = "pyramidal CA1" )
Copied from scKirby, which is not yet on CRAN or Bioconductor.
ctd_to_sce(object, as_sparse = TRUE, as_DelayedArray = FALSE, verbose = TRUE)
ctd_to_sce(object, as_sparse = TRUE, as_DelayedArray = FALSE, verbose = TRUE)
object |
CellTypeDataset object. |
as_sparse |
Store SingleCellExperiment matrices as sparse. |
as_DelayedArray |
Store SingleCellExperiment matrices as DelayedArray. |
verbose |
Print messages. |
SingleCellExperiment
ctd <- ewceData::ctd() sce <- EWCE::ctd_to_sce(ctd)
ctd <- ewceData::ctd() sce <- EWCE::ctd_to_sce(ctd)
drop_uninformative_genes
drops uninformative genes in order to reduce
compute time and noise in subsequent steps. It achieves this through several
steps, each of which are optional:
Drop non-1:1 orthologs:
Removes genes that don't have 1:1 orthologs
with the output_species
("human" by default).
Drop non-varying genes:
Removes genes that don't vary across cells
based on variance deciles.
Drop non-differentially expressed genes (DEGs):
Removes genes that are not significantly differentially
expressed across cell-types (multiple DEG methods available).
drop_uninformative_genes( exp, level2annot, mtc_method = "BH", adj_pval_thresh = 1e-05, convert_orths = FALSE, input_species = NULL, output_species = "human", non121_strategy = "drop_both_species", method = "homologene", as_sparse = TRUE, as_DelayedArray = FALSE, return_sce = FALSE, no_cores = 1, verbose = TRUE, ... )
drop_uninformative_genes( exp, level2annot, mtc_method = "BH", adj_pval_thresh = 1e-05, convert_orths = FALSE, input_species = NULL, output_species = "human", non121_strategy = "drop_both_species", method = "homologene", as_sparse = TRUE, as_DelayedArray = FALSE, return_sce = FALSE, no_cores = 1, verbose = TRUE, ... )
exp |
Expression matrix with gene names as rownames. |
level2annot |
Array of cell types, with each sequentially corresponding a column in the expression matrix. |
mtc_method |
Multiple-testing correction method used by DGE step. See p.adjust for more details. |
adj_pval_thresh |
Minimum differential expression significance
that a gene must demonstrate across |
convert_orths |
If |
input_species |
Which species the gene names in |
output_species |
Which species' genes names to convert |
non121_strategy |
How to handle genes that don't have
1:1 mappings between
|
method |
R package to use for gene mapping:
|
as_sparse |
Convert |
as_DelayedArray |
Convert |
return_sce |
Whether to return the filtered results as an expression matrix or a SingleCellExperiment. |
no_cores |
Number of cores to parallelise across.
Set to |
verbose |
Print messages. #' @inheritParams orthogene::convert_orthologs |
... |
Arguments passed on to
|
exp Expression matrix with gene names as row names.
cortex_mrna <- ewceData::cortex_mrna() # Use only a subset of genes to keep the example quick cortex_mrna$exp <- cortex_mrna$exp[1:300, ] ## Convert orthologs at the same time exp2_orth <- drop_uninformative_genes( exp = cortex_mrna$exp, level2annot = cortex_mrna$annot$level2class, input_species = "mouse" )
cortex_mrna <- ewceData::cortex_mrna() # Use only a subset of genes to keep the example quick cortex_mrna$exp <- cortex_mrna$exp[1:300, ] ## Convert orthologs at the same time exp2_orth <- drop_uninformative_genes( exp = cortex_mrna$exp, level2annot = cortex_mrna$annot$level2class, input_species = "mouse" )
ewce_expression_data
takes a differential gene expression (DGE)
results table and determines the probability of cell type enrichment
in the up- and down- regulated genes.
ewce_expression_data( sct_data, annotLevel = 1, tt, sortBy = "t", thresh = 250, reps = 100, ttSpecies = NULL, sctSpecies = NULL, output_species = NULL, bg = NULL, method = "homologene", verbose = TRUE, localHub = FALSE )
ewce_expression_data( sct_data, annotLevel = 1, tt, sortBy = "t", thresh = 250, reps = 100, ttSpecies = NULL, sctSpecies = NULL, output_species = NULL, bg = NULL, method = "homologene", verbose = TRUE, localHub = FALSE )
sct_data |
List generated using generate_celltype_data. |
annotLevel |
An integer indicating which level of |
tt |
Differential expression table. Can be output of topTable function. Minimum requirement is that one column stores a metric of increased/decreased expression (i.e. log fold change, t-statistic for differential expression etc) and another contains gene symbols. |
sortBy |
Column name of metric in |
thresh |
The number of up- and down- regulated genes to be included in each analysis (Default: 250). |
reps |
Number of random gene lists to generate (Default: 100, but should be >=10,000 for publication-quality results). |
ttSpecies |
The species the differential expression table was generated from. |
sctSpecies |
Species that |
output_species |
Species to convert |
bg |
List of gene symbols containing the background gene list
(including hit genes). If |
method |
R package to use for gene mapping:
|
verbose |
Print messages. |
localHub |
If working offline, add argument localHub=TRUE to work with a local, non-updated hub; It will only have resources available that have previously been downloaded. If offline, Please also see BiocManager vignette section on offline use to ensure proper functionality. |
A list containing five data frames:
results
: dataframe in which each row gives the statistics
(p-value, fold change and number of standard deviations from the mean)
associated with the enrichment of the stated cell type in the gene list.
An additional column *Direction* stores whether it the result is from the
up or downregulated set.
hit.cells.up
: vector containing the summed proportion of
expression in each cell type for the target list.
hit.cells.down
: vector containing the summed proportion of
expression in each cell type for the target list.
bootstrap_data.up
: matrix in which each row represents the
summed proportion of expression in each cell type for one of the random
lists.
bootstrap_data.down
: matrix in which each row represents the
summed proportion of expression in each cell type for one of the random
lists.
# Load the single cell data ctd <- ewceData::ctd() # Set the parameters for the analysis # Use 3 bootstrap lists for speed, for publishable analysis use >10000 reps <- 3 # Use 5 up/down regulated genes (thresh) for speed, default is 250 thresh <- 5 annotLevel <- 1 # <- Use cell level annotations (i.e. Interneurons) # Load the top table tt_alzh <- ewceData::tt_alzh() tt_results <- EWCE::ewce_expression_data( sct_data = ctd, tt = tt_alzh, annotLevel = 1, thresh = thresh, reps = reps, ttSpecies = "human", sctSpecies = "mouse" )
# Load the single cell data ctd <- ewceData::ctd() # Set the parameters for the analysis # Use 3 bootstrap lists for speed, for publishable analysis use >10000 reps <- 3 # Use 5 up/down regulated genes (thresh) for speed, default is 250 thresh <- 5 annotLevel <- 1 # <- Use cell level annotations (i.e. Interneurons) # Load the top table tt_alzh <- ewceData::tt_alzh() tt_results <- EWCE::ewce_expression_data( sct_data = ctd, tt = tt_alzh, annotLevel = 1, thresh = thresh, reps = reps, ttSpecies = "human", sctSpecies = "mouse" )
ewce_plot
generates plots of EWCE enrichment results
ewce_plot( total_res, mtc_method = "bonferroni", q_threshold = 0.05, ctd = NULL, annotLevel = 1, heights = c(0.3, 1), make_dendro = FALSE, verbose = TRUE )
ewce_plot( total_res, mtc_method = "bonferroni", q_threshold = 0.05, ctd = NULL, annotLevel = 1, heights = c(0.3, 1), make_dendro = FALSE, verbose = TRUE )
total_res |
Results data.frame generated using bootstrap_enrichment_test or ewce_expression_data functions. Multiple results tables can be merged into one results table, as long as the 'list' column is set to distinguish them. Multiple testing correction is then applied across all merged results. |
mtc_method |
Method to be used for multiple testing correction. Argument is passed to p.adjust (DEFAULT: "bonferroni). |
q_threshold |
Corrected significance threshold. |
ctd |
CellTypeDataset object. Should be provided so that the dendrogram can be taken from it and added to plots. |
annotLevel |
An integer indicating which level of |
heights |
The relative heights row in the grid. Will get repeated to match the dimensions of the grid. Passed to wrap_plots. |
make_dendro |
Add a dendrogram (requires |
verbose |
Print messages. |
A named list containing versions of the ggplot with and without the dendrogram. Note that cell type order on the x-axis is based on hierarchical clustering for both plots if make_dendro = TRUE.
## Bootstrap significance test, ## no control for transcript length or GC content ## Use pre-computed results to speed up example total_res <- EWCE::example_bootstrap_results()$results plt <- ewce_plot(total_res = total_res)
## Bootstrap significance test, ## no control for transcript length or GC content ## Use pre-computed results to speed up example total_res <- EWCE::example_bootstrap_results()$results plt <- ewce_plot(total_res = total_res)
Example cell type enrichment results produced by bootstrap_enrichment_test.
example_bootstrap_results(verbose = TRUE, localHub = FALSE)
example_bootstrap_results(verbose = TRUE, localHub = FALSE)
verbose |
Print messages. |
localHub |
If working offline, add argument localHub=TRUE to work with a local, non-updated hub; It will only have resources available that have previously been downloaded. If offline, Please also see BiocManager vignette section on offline use to ensure proper functionality. |
List with 3 items.
# Load the single cell data
ctd <- ewceData::ctd()
# Set the parameters for the analysis
# Use 3 bootstrap lists for speed, for publishable analysis use >=10,000
reps <- 3
# Load gene list from Alzheimer's disease GWAS
example_genelist <- ewceData::example_genelist()
# Bootstrap significance test, no control for transcript length or GC content
full_results <- EWCE::bootstrap_enrichment_test( sct_data = ctd, hits = example_genelist, reps = reps, annotLevel = 1, sctSpecies = "mouse", genelistSpecies = "human" )
bootstrap_results <- full_results
save(bootstrap_results,file = "inst/extdata/bootstrap_results.rda")
full_results <- example_bootstrap_results()
full_results <- example_bootstrap_results()
Example celltype enrichment results produced by ewce_expression_data.
example_transcriptome_results(verbose = TRUE, localHub = FALSE)
example_transcriptome_results(verbose = TRUE, localHub = FALSE)
verbose |
Print messages. |
localHub |
If working offline, add argument localHub=TRUE to work with a local, non-updated hub; It will only have resources available that have previously been downloaded. If offline, Please also see BiocManager vignette section on offline use to ensure proper functionality. |
List with 5 items.
## Load the single cell data
ctd <- ewceData::ctd()
## Set the parameters for the analysis
## Use 3 bootstrap lists for speed, for publishable analysis use >10,000
reps <- 3
annotLevel <- 1 # <- Use cell level annotations (i.e. Interneurons)
## Use 5 up/down regulated genes (thresh) for speed, default is 250
thresh <- 5
## Load the top table
tt_alzh <- ewceData::tt_alzh()
tt_results <- EWCE::ewce_expression_data( sct_data = ctd, tt = tt_alzh, annotLevel = 1, thresh = thresh, reps = reps, ttSpecies = "human", sctSpecies = "mouse" )
save(tt_results, file = "inst/extdata/tt_results.rda")
tt_results <- EWCE::example_transcriptome_results()
tt_results <- EWCE::example_transcriptome_results()
Removes rows from each matrix within a CellTypeDataset (CTD) that are not
within gene_subset
.
filter_ctd_genes(ctd, gene_subset)
filter_ctd_genes(ctd, gene_subset)
ctd |
CellTypeDataset. |
gene_subset |
Genes to subset to. |
Filtered CellTypeDataset.
ctd <- ewceData::ctd() ctd <- standardise_ctd(ctd, input_species="mouse") gene_subset <- rownames(ctd[[1]]$mean_exp)[1:100] ctd_subset <- EWCE::filter_ctd_genes(ctd = ctd, gene_subset = gene_subset)
ctd <- ewceData::ctd() ctd <- standardise_ctd(ctd, input_species="mouse") gene_subset <- rownames(ctd[[1]]$mean_exp)[1:100] ctd_subset <- EWCE::filter_ctd_genes(ctd = ctd, gene_subset = gene_subset)
Deprecated function. Please use filter_nonorthologs instead.
filter_genes_without_1to1_homolog( filenames, input_species = "mouse", convert_nonhuman_genes = TRUE, annot_levels = NULL, suffix = "_orthologs", verbose = TRUE )
filter_genes_without_1to1_homolog( filenames, input_species = "mouse", convert_nonhuman_genes = TRUE, annot_levels = NULL, suffix = "_orthologs", verbose = TRUE )
filenames |
List of file names for sct_data saved as .rda files. |
input_species |
Which species the gene names in |
convert_nonhuman_genes |
Whether to convert the |
annot_levels |
[Optional] Names of each annotation level. |
suffix |
Suffix to add to the file name (right before .rda). |
verbose |
Print messages. |
Note: This function replaces the original
filter_genes_without_1to1_homolog
function.
filter_genes_without_1to1_homolog
is
now a wrapper for filter_nonorthologs
.
List of the filtered CellTypeData file names.
# Load the single cell data ctd <- ewceData::ctd() tmp <- tempfile() save(ctd, file = tmp) fNames_ALLCELLS_orths <- EWCE::filter_nonorthologs(filenames = tmp)
# Load the single cell data ctd <- ewceData::ctd() tmp <- tempfile() save(ctd, file = tmp) fNames_ALLCELLS_orths <- EWCE::filter_nonorthologs(filenames = tmp)
filter_nonorthologs
Takes the filenames of CellTypeData files,
loads them, drops any genes which don't have a 1:1 orthologs with humans,
and then convert the gene to human orthologs.
The new files are then saved to disk, appending
'_orthologs' to the file name.
filter_nonorthologs( filenames, input_species = NULL, convert_nonhuman_genes = TRUE, annot_levels = NULL, suffix = "_orthologs", method = "homologene", non121_strategy = "drop_both_species", verbose = TRUE, ... )
filter_nonorthologs( filenames, input_species = NULL, convert_nonhuman_genes = TRUE, annot_levels = NULL, suffix = "_orthologs", method = "homologene", non121_strategy = "drop_both_species", verbose = TRUE, ... )
filenames |
List of file names for sct_data saved as .rda files. |
input_species |
Which species the gene names in |
convert_nonhuman_genes |
Whether to convert the |
annot_levels |
[Optional] Names of each annotation level. |
suffix |
Suffix to add to the file name (right before .rda). |
method |
R package to use for gene mapping:
|
non121_strategy |
How to handle genes that don't have
1:1 mappings between
|
verbose |
Print messages. |
... |
Arguments passed on to
|
Note: This function replaces the original
filter_genes_without_1to1_homolog
function.
filter_genes_without_1to1_homolog
is
now a wrapper for filter_nonorthologs
.
List of the filtered CellTypeData file names.
# Load the single cell data ctd <- ewceData::ctd() tmp <- tempfile() save(ctd, file = tmp) fNames_ALLCELLS_orths <- EWCE::filter_nonorthologs(filenames = tmp)
# Load the single cell data ctd <- ewceData::ctd() tmp <- tempfile() save(ctd, file = tmp) fNames_ALLCELLS_orths <- EWCE::filter_nonorthologs(filenames = tmp)
Given an expression matrix, wherein the rows are supposed to be HGNC symbols, find those symbols which are not official HGNC symbols, then correct them if possible. Return the expression matrix with corrected symbols.
fix_bad_hgnc_symbols( exp, dropNonHGNC = FALSE, as_sparse = TRUE, verbose = TRUE, localHub = FALSE )
fix_bad_hgnc_symbols( exp, dropNonHGNC = FALSE, as_sparse = TRUE, verbose = TRUE, localHub = FALSE )
exp |
An expression matrix where the rows are HGNC symbols or a SingleCellExperiment (SCE) or other Ranged Summarized Experiment (SE) type object. |
dropNonHGNC |
Boolean. Should symbols not recognised as HGNC symbols be dropped? |
as_sparse |
Convert |
verbose |
Print messages. |
localHub |
If working offline, add argument localHub=TRUE to work with a local, non-updated hub; It will only have resources available that have previously been downloaded. If offline, Please also see BiocManager vignette section on offline use to ensure proper functionality. |
Returns the expression matrix with the rownames corrected and rows representing the same gene merged. If a SingleCellExperiment (SCE) or other Ranged Summarized Experiment (SE) type object was inputted this will be returned with the corrected expression matrix under counts.
# create example expression matrix, could be part of a exp, annot list obj exp <- matrix(data = runif(70), ncol = 10) # Add HGNC gene names but add with an error: # MARCH8 is a HGNC symbol which if opened in excel will convert to Mar-08 rownames(exp) <- c("MT-TF", "MT-RNR1", "MT-TV", "MT-RNR2", "MT-TL1", "MT-ND1", "Mar-08") exp <- fix_bad_hgnc_symbols(exp) # fix_bad_hgnc_symbols warns the user of this possible issue
# create example expression matrix, could be part of a exp, annot list obj exp <- matrix(data = runif(70), ncol = 10) # Add HGNC gene names but add with an error: # MARCH8 is a HGNC symbol which if opened in excel will convert to Mar-08 rownames(exp) <- c("MT-TF", "MT-RNR1", "MT-TV", "MT-RNR2", "MT-TL1", "MT-ND1", "Mar-08") exp <- fix_bad_hgnc_symbols(exp) # fix_bad_hgnc_symbols warns the user of this possible issue
Also checks whether any gene names contain "Sep", "Mar" or "Feb". These should be checked for any suggestion that excel has corrupted the gene names.
fix_bad_mgi_symbols( exp, mrk_file_path = NULL, printAllBadSymbols = FALSE, as_sparse = TRUE, verbose = TRUE, localHub = FALSE )
fix_bad_mgi_symbols( exp, mrk_file_path = NULL, printAllBadSymbols = FALSE, as_sparse = TRUE, verbose = TRUE, localHub = FALSE )
exp |
An expression matrix where the rows are MGI symbols, or a SingleCellExperiment (SCE) or other Ranged Summarized Experiment (SE) type object. |
mrk_file_path |
Path to the MRK_List2 file which can be downloaded from www.informatics.jax.org/downloads/reports/index.html |
printAllBadSymbols |
Output to console all the bad gene symbols |
as_sparse |
Convert |
verbose |
Print messages. |
localHub |
If working offline, add argument localHub=TRUE to work with a local, non-updated hub; It will only have resources available that have previously been downloaded. If offline, Please also see BiocManager vignette section on offline use to ensure proper functionality. |
Returns the expression matrix with the rownames corrected and rows representing the same gene merged. If no corrections are necessary, input expression matrix is returned. If a SingleCellExperiment (SCE) or other Ranged Summarized Experiment (SE) type object was inputted this will be returned with the corrected expression matrix under counts.
# Load the single cell data cortex_mrna <- ewceData::cortex_mrna() # take a subset for speed cortex_mrna$exp <- cortex_mrna$exp[1:50, 1:5] cortex_mrna$exp <- fix_bad_mgi_symbols(cortex_mrna$exp)
# Load the single cell data cortex_mrna <- ewceData::cortex_mrna() # take a subset for speed cortex_mrna$exp <- cortex_mrna$exp[1:50, 1:5] cortex_mrna$exp <- fix_bad_mgi_symbols(cortex_mrna$exp)
Make sure celltypes don't contain characters that could interfere with downstream analyses. For example, the R package MAGMA.Celltyping cannot have spaces in celltype names because spaces are used as a delimiter in later steps.
fix_celltype_names( celltypes, replace_chars = "[-]|[.]|[ ]|[//]|[\\/]", make_unique = TRUE )
fix_celltype_names( celltypes, replace_chars = "[-]|[.]|[ ]|[//]|[\\/]", make_unique = TRUE )
celltypes |
Character vector of celltype names. |
replace_chars |
Regex string of characters to replace with "_" when renaming columns. |
make_unique |
Make all entries unique. |
Fixed celltype names.
ct <- c("microglia", "astryocytes", "Pyramidal SS") ct_fixed <- fix_celltype_names(celltypes = ct)
ct <- c("microglia", "astryocytes", "Pyramidal SS") ct_fixed <- fix_celltype_names(celltypes = ct)
generate_bootstrap_plots
takes a gene list and a single cell type
transcriptome dataset and generates plots which show how the expression of
the genes in the list compares to those in randomly generated gene lists.
generate_bootstrap_plots( sct_data = NULL, hits = NULL, bg = NULL, genelistSpecies = NULL, sctSpecies = NULL, output_species = "human", method = "homologene", reps = 100, annotLevel = 1, geneSizeControl = FALSE, full_results = NULL, listFileName = paste0("_level", annotLevel), adj_pval_thresh = 0.05, facets = "CellType", scales = "free_x", save_dir = file.path(tempdir(), "BootstrapPlots"), show_plot = TRUE, verbose = TRUE )
generate_bootstrap_plots( sct_data = NULL, hits = NULL, bg = NULL, genelistSpecies = NULL, sctSpecies = NULL, output_species = "human", method = "homologene", reps = 100, annotLevel = 1, geneSizeControl = FALSE, full_results = NULL, listFileName = paste0("_level", annotLevel), adj_pval_thresh = 0.05, facets = "CellType", scales = "free_x", save_dir = file.path(tempdir(), "BootstrapPlots"), show_plot = TRUE, verbose = TRUE )
sct_data |
List generated using generate_celltype_data. |
hits |
List of gene symbols containing the target gene list.
Will automatically be converted to human gene symbols
if |
bg |
List of gene symbols containing the background gene list
(including hit genes). If |
genelistSpecies |
Species that |
sctSpecies |
Species that |
output_species |
Species to convert |
method |
R package to use for gene mapping:
|
reps |
Number of random gene lists to generate (Default: 100, but should be >=10,000 for publication-quality results). |
annotLevel |
An integer indicating which level of |
geneSizeControl |
Whether you want to control for
GC content and transcript length. Recommended if the gene list originates
from genetic studies (Default: FALSE).
If set to |
full_results |
The full output of bootstrap_enrichment_test for the same gene list. |
listFileName |
String used as the root for files saved using this function. |
adj_pval_thresh |
Adjusted p-value threshold of celltypes to include in plots. |
facets |
|
scales |
Are scales shared across all facets (the default,
|
save_dir |
Directory where the BootstrapPlots folder should be saved, default is a temp directory. |
show_plot |
Print the plot. |
verbose |
Print messages. |
Saves a set of pdf files containing graphs and returns the file where
they are saved. These will be saved with the file name adjusted using the
value of listFileName
. The files are saved into the
'BootstrapPlot' folder.
Files start with one of the following:
qqplot_noText
: sorts the gene list according to how enriched
it is in the relevant cell type. Plots the value in the target list against
the mean value in the bootstrapped lists.
qqplot_wtGSym
: as above but labels the gene symbols for the
highest expressed genes.
bootDists
: rather than just showing the mean of the
bootstrapped lists, a boxplot shows the distribution of values
bootDists_LOG
: shows the bootstrapped distributions with the
y-axis shown on a log scale
## Load the single cell data sct_data <- ewceData::ctd() ## Set the parameters for the analysis ## Use 5 bootstrap lists for speed, for publishable analysis use >10000 reps <- 5 ## Load the gene list and get human orthologs hits <- ewceData::example_genelist() ## Bootstrap significance test, ## no control for transcript length or GC content ## Use pre-computed results to speed up example full_results <- EWCE::example_bootstrap_results() ### Skip this for example purposes # full_results <- EWCE::bootstrap_enrichment_test( # sct_data = sct_data, # hits = hits, # reps = reps, # annotLevel = 1, # sctSpecies = "mouse", # genelistSpecies = "human" # ) output <- EWCE::generate_bootstrap_plots( sct_data = sct_data, hits = hits, reps = reps, full_results = full_results, sctSpecies = "mouse", genelistSpecies = "human", annotLevel = 1 )
## Load the single cell data sct_data <- ewceData::ctd() ## Set the parameters for the analysis ## Use 5 bootstrap lists for speed, for publishable analysis use >10000 reps <- 5 ## Load the gene list and get human orthologs hits <- ewceData::example_genelist() ## Bootstrap significance test, ## no control for transcript length or GC content ## Use pre-computed results to speed up example full_results <- EWCE::example_bootstrap_results() ### Skip this for example purposes # full_results <- EWCE::bootstrap_enrichment_test( # sct_data = sct_data, # hits = hits, # reps = reps, # annotLevel = 1, # sctSpecies = "mouse", # genelistSpecies = "human" # ) output <- EWCE::generate_bootstrap_plots( sct_data = sct_data, hits = hits, reps = reps, full_results = full_results, sctSpecies = "mouse", genelistSpecies = "human", annotLevel = 1 )
Takes a gene list and a single cell type transcriptome dataset and generates plots which show how the expression of the genes in the list compares to those in randomly generated gene lists.
generate_bootstrap_plots_for_transcriptome( sct_data, tt, bg = NULL, thresh = 250, annotLevel = 1, reps = 100, full_results = NA, listFileName = "", showGNameThresh = 25, ttSpecies = NULL, sctSpecies = NULL, output_species = NULL, sortBy = "t", sig_only = TRUE, sig_col = "q", sig_thresh = 0.05, celltype_col = "CellType", plot_types = c("bootstrap", "bootstrap_distributions", "log_bootstrap_distributions"), save_dir = file.path(tempdir(), "BootstrapPlots"), method = "homologene", verbose = TRUE )
generate_bootstrap_plots_for_transcriptome( sct_data, tt, bg = NULL, thresh = 250, annotLevel = 1, reps = 100, full_results = NA, listFileName = "", showGNameThresh = 25, ttSpecies = NULL, sctSpecies = NULL, output_species = NULL, sortBy = "t", sig_only = TRUE, sig_col = "q", sig_thresh = 0.05, celltype_col = "CellType", plot_types = c("bootstrap", "bootstrap_distributions", "log_bootstrap_distributions"), save_dir = file.path(tempdir(), "BootstrapPlots"), method = "homologene", verbose = TRUE )
sct_data |
List generated using generate_celltype_data. |
tt |
Differential expression table. Can be output of topTable function. Minimum requirement is that one column stores a metric of increased/decreased expression (i.e. log fold change, t-statistic for differential expression etc) and another contains gene symbols. |
bg |
List of gene symbols containing the background gene list
(including hit genes). If |
thresh |
The number of up- and down- regulated genes to be included in each analysis (Default: 250). |
annotLevel |
An integer indicating which level of |
reps |
Number of random gene lists to generate (Default: 100, but should be >=10,000 for publication-quality results). |
full_results |
The full output of ewce_expression_data for the same gene list. |
listFileName |
String used as the root for files saved using this function. |
showGNameThresh |
Integer. If a gene has over X percent of it's expression proportion in a cell type, then list the gene name. |
ttSpecies |
The species the differential expression table was generated from. |
sctSpecies |
Species that |
output_species |
Species to convert |
sortBy |
Column name of metric in |
sig_only |
Should plots only be generated for cells which have significant changes? |
sig_col |
Column name in |
sig_thresh |
Threshold by which to filter |
celltype_col |
Column within |
plot_types |
Plot types to generate. |
save_dir |
Directory where the BootstrapPlots folder should be saved, default is a temp directory. |
method |
R package to use for gene mapping:
|
verbose |
Print messages. |
Saves a set of PDF files containing graphs.
Then returns a nested list with each plot
and
the path
where it was saved to.
Files start with one of the following:
qqplot_noText
: sorts the gene list according to how enriched
it is in the relevant cell type. Plots the value in the target list against
the mean value in the bootstrapped lists.
qqplot_wtGSym
: as above but labels the gene symbols for the
highest expressed genes.
bootDists
: rather than just showing the mean of the
bootstrapped lists, a boxplot shows the distribution of values
bootDists_LOG
: shows the bootstrapped distributions with the
y-axis shown on a log scale
## Load the single cell data ctd <- ewceData::ctd() ## Set the parameters for the analysis ## Use 3 bootstrap lists for speed, for publishable analysis use >10,000 reps <- 3 annotLevel <- 1 # <- Use cell level annotations (i.e. Interneurons) ## Use 5 up/down regulated genes (thresh) for speed, default is 250 thresh <- 5 ## Load the top table tt_alzh <- ewceData::tt_alzh() ## See ?example_transcriptome_results for full code to produce tt_results tt_results <- EWCE::example_transcriptome_results() ## Bootstrap significance test, ## no control for transcript length or GC content savePath <- EWCE::generate_bootstrap_plots_for_transcriptome( sct_data = ctd, tt = tt_alzh, thresh = thresh, annotLevel = 1, full_results = tt_results, listFileName = "examples", reps = reps, ttSpecies = "human", sctSpecies = "mouse", # Only do one plot type for demo purposes plot_types = "bootstrap" )
## Load the single cell data ctd <- ewceData::ctd() ## Set the parameters for the analysis ## Use 3 bootstrap lists for speed, for publishable analysis use >10,000 reps <- 3 annotLevel <- 1 # <- Use cell level annotations (i.e. Interneurons) ## Use 5 up/down regulated genes (thresh) for speed, default is 250 thresh <- 5 ## Load the top table tt_alzh <- ewceData::tt_alzh() ## See ?example_transcriptome_results for full code to produce tt_results tt_results <- EWCE::example_transcriptome_results() ## Bootstrap significance test, ## no control for transcript length or GC content savePath <- EWCE::generate_bootstrap_plots_for_transcriptome( sct_data = ctd, tt = tt_alzh, thresh = thresh, annotLevel = 1, full_results = tt_results, listFileName = "examples", reps = reps, ttSpecies = "human", sctSpecies = "mouse", # Only do one plot type for demo purposes plot_types = "bootstrap" )
generate_celltype_data
takes gene expression data and
cell type annotations and creates CellTypeData (CTD) files which
contain matrices of mean expression and specificity per cell type.
generate_celltype_data( exp, annotLevels, groupName, no_cores = 1, savePath = tempdir(), file_prefix = "ctd", as_sparse = TRUE, as_DelayedArray = FALSE, normSpec = FALSE, convert_orths = FALSE, input_species = "mouse", output_species = "human", non121_strategy = "drop_both_species", method = "homologene", force_new_file = TRUE, specificity_quantiles = TRUE, numberOfBins = 40, dendrograms = TRUE, return_ctd = FALSE, verbose = TRUE, ... )
generate_celltype_data( exp, annotLevels, groupName, no_cores = 1, savePath = tempdir(), file_prefix = "ctd", as_sparse = TRUE, as_DelayedArray = FALSE, normSpec = FALSE, convert_orths = FALSE, input_species = "mouse", output_species = "human", non121_strategy = "drop_both_species", method = "homologene", force_new_file = TRUE, specificity_quantiles = TRUE, numberOfBins = 40, dendrograms = TRUE, return_ctd = FALSE, verbose = TRUE, ... )
exp |
Numerical matrix with row for each gene and column for each cell. Row names are gene symbols. Column names are cell IDs which can be cross referenced against the annot data frame. |
annotLevels |
List with arrays of strings containing the cell type
names associated with each column in |
groupName |
A human readable name for referring to the dataset being used. |
no_cores |
Number of cores that should be used to speedup the
computation.
NOTE: Use |
savePath |
Directory where the CTD file should be saved. |
file_prefix |
Prefix to add to saved CTD file name. |
as_sparse |
Convert |
as_DelayedArray |
Convert |
normSpec |
Boolean indicating whether specificity data should be transformed to a normal distribution by cell type, giving equivalent scores across all cell types. |
convert_orths |
If |
input_species |
The species that the |
output_species |
Species to convert |
non121_strategy |
How to handle genes that don't have
1:1 mappings between
|
method |
R package to use for gene mapping:
|
force_new_file |
If a file of the same name as the one being created already exists, overwrite it. |
specificity_quantiles |
Compute specificity quantiles.
Recommended to set to |
numberOfBins |
Number of quantile 'bins' to use (40 is recommended). |
dendrograms |
Add dendrogram plots |
return_ctd |
Return the CTD object in a list along with the file name, instead of just the file name. |
verbose |
Print messages. |
... |
Arguments passed on to
|
File names for the saved CellTypeData (CTD) files.
# Load the single cell data cortex_mrna <- ewceData::cortex_mrna() # Use only a subset to keep the example quick expData <- cortex_mrna$exp[1:100, ] l1 <- cortex_mrna$annot$level1class l2 <- cortex_mrna$annot$level2class annotLevels <- list(l1 = l1, l2 = l2) fNames_ALLCELLS <- EWCE::generate_celltype_data( exp = expData, annotLevels = annotLevels, groupName = "allKImouse" )
# Load the single cell data cortex_mrna <- ewceData::cortex_mrna() # Use only a subset to keep the example quick expData <- cortex_mrna$exp[1:100, ] l1 <- cortex_mrna$annot$level1class l2 <- cortex_mrna$annot$level2class annotLevels <- list(l1 = l1, l2 = l2) fNames_ALLCELLS <- EWCE::generate_celltype_data( exp = expData, annotLevels = annotLevels, groupName = "allKImouse" )
get_celltype_table
Generates a table that can be used for
supplemenary tables of publications.
The table lists how many cells are associated with each cell type, the level
of annotation, and the dataset from which it was generated.
get_celltype_table(annot)
get_celltype_table(annot)
annot |
An annotation dataframe, which columns named 'level1class', 'level2class' and 'dataset_name' |
A dataframe with columns 'name', 'level', 'freq' and 'dataset_name'
# See PrepLDSC.Rmd for origin of merged_ALLCELLS$annot cortex_mrna <- ewceData::cortex_mrna() cortex_mrna$annot$dataset_name <- "cortex_mrna" celltype_table <- EWCE::get_celltype_table(cortex_mrna$annot)
# See PrepLDSC.Rmd for origin of merged_ALLCELLS$annot cortex_mrna <- ewceData::cortex_mrna() cortex_mrna$annot$dataset_name <- "cortex_mrna" celltype_table <- EWCE::get_celltype_table(cortex_mrna$annot)
Assess whether an object is a DelayedArray or one of its derived object types.
is_delayed_array(X)
is_delayed_array(X)
X |
Object. |
boolean
Assess whether an object is a Matrix or one of its derived object types.
is_matrix(X)
is_matrix(X)
X |
Object. |
boolean
Assess whether an object is a sparse matrix or one of its derived object types.
is_sparse_matrix(X)
is_sparse_matrix(X)
X |
Object. |
boolean
List all species that EWCE can convert genes from/to. Wrapper function for map_species.
list_species(verbose = TRUE)
list_species(verbose = TRUE)
verbose |
Print messages. |
List of species EWCE can input/output genes as.
list_species()
list_species()
load_rdata
Load processed data (.rda format) using a function that assigns it to a specific variable (so you don't have to guess what the loaded variable name is).
load_rdata(fileName)
load_rdata(fileName)
fileName |
Name of the file to load. |
Data object.
tmp <- tempfile() save(mtcars, file = tmp) mtcars2 <- load_rdata(tmp)
tmp <- tempfile() save(mtcars, file = tmp) mtcars2 <- load_rdata(tmp)
Import CellTypeDataset (CTD) references from a remote repository, standardize each, and then merge into one CTD. Optionally, can return these as a merged SingleCellExperiment.
merge_ctd( CTD_list, save_dir = tempdir(), standardise_CTD = FALSE, as_SCE = FALSE, gene_union = TRUE, merge_levels = seq(1, 5), save_split_SCE = FALSE, save_split_CTD = FALSE, save_merged_SCE = TRUE, force_new_quantiles = FALSE, numberOfBins = 40, as_sparse = TRUE, as_DelayedArray = FALSE, verbose = TRUE, ... )
merge_ctd( CTD_list, save_dir = tempdir(), standardise_CTD = FALSE, as_SCE = FALSE, gene_union = TRUE, merge_levels = seq(1, 5), save_split_SCE = FALSE, save_split_CTD = FALSE, save_merged_SCE = TRUE, force_new_quantiles = FALSE, numberOfBins = 40, as_sparse = TRUE, as_DelayedArray = FALSE, verbose = TRUE, ... )
CTD_list |
(Named) list of |
save_dir |
The directory to save merged files in. |
standardise_CTD |
Whether to run |
as_SCE |
If |
gene_union |
Whether to take the gene union or intersection when merging matrices (mean_exp,specificity, etc.). |
merge_levels |
Which CTD levels you want to merge.
Can be a single value (e.g. |
save_split_SCE |
Whether to save individual SCE files in the subdirectory standardized_CTD_SCE. |
save_split_CTD |
Whether to save individual CTD files in the subdirectory standardized_CTD. |
save_merged_SCE |
Save the final merged SCE object, or simply to return it. |
force_new_quantiles |
If specificity quantiles matrix already exists, create a new one. |
numberOfBins |
Number of bins to compute specificity quantiles with. |
as_sparse |
Convert matrices to sparse matrix. |
as_DelayedArray |
Convert matrices to |
verbose |
Print messages. |
... |
Additional arguments to be passed to |
List of CellTypeDatasets or SingleCellExperiments.
## Let's pretend these are different CTD datasets ctd1 <- ewceData::ctd() ctd2 <- ctd1 CTD_list <- list(ctd1, ctd2) CTD_merged <- EWCE::merge_ctd(CTD_list = CTD_list)
## Let's pretend these are different CTD datasets ctd1 <- ewceData::ctd() ctd2 <- ctd1 CTD_list <- list(ctd1, ctd2) CTD_merged <- EWCE::merge_ctd(CTD_list = CTD_list)
SingleCellExperiment
objectsMerge several SingleCellExperiment
(SCE) objects from
different batches/experiments.
Extracted from the
scMerge package.
merge_sce( sce_list, method = "intersect", cut_off_batch = 0.01, cut_off_overall = 0.01, use_assays = NULL, colData_names = NULL, batch_names = NULL, verbose = TRUE )
merge_sce( sce_list, method = "intersect", cut_off_batch = 0.01, cut_off_overall = 0.01, use_assays = NULL, colData_names = NULL, batch_names = NULL, verbose = TRUE )
sce_list |
A list contains the |
method |
A string indicates the method of combining the
gene expression matrix, either |
cut_off_batch |
A numeric vector indicating the cut-off for the proportion of a gene is expressed within each batch. |
cut_off_overall |
A numeric vector indicating the cut-off for the proportion of a gene is expressed overall data. |
use_assays |
A string vector indicating the expression matrices to be combined. The first assay named will be used to determine the proportion of zeros. |
colData_names |
A string vector indicating the |
batch_names |
A string vector indicating the batch names for the output SCE object. |
verbose |
Print messages. |
A SingleCellExperiment
object with the list of SCE
objects combined.
Yingxin Lin (modified by Brian Schilder)
ctd <- ewceData::ctd() sce_list <- EWCE::ctd_to_sce(object = ctd) sce_combine <- merge_sce(sce_list = sce_list)
ctd <- ewceData::ctd() sce_list <- EWCE::ctd_to_sce(object = ctd) sce_combine <- merge_sce(sce_list = sce_list)
merge_two_expfiles
Used to combine two single cell type datasets.
merge_two_expfiles( exp1, exp2, annot1, annot2, name1 = "", name2 = "", as_sparse = TRUE, as_DelayedArray = FALSE, verbose = TRUE )
merge_two_expfiles( exp1, exp2, annot1, annot2, name1 = "", name2 = "", as_sparse = TRUE, as_DelayedArray = FALSE, verbose = TRUE )
exp1 |
Numerical expression matrix for dataset1 with row for each gene and column for each cell. Row names are gene symbols. Column names are cell IDs which can be cross referenced against the annot data frame. |
exp2 |
Numerical expression matrix for dataset2 with row for each gene and column for each cell. Row names are gene symbols. Column names are cell IDs which can be cross referenced against the annot data frame. |
annot1 |
Annotation data frame for dataset1 which contains three columns at least: cell_id, level1class and level2class |
annot2 |
Annotation data frame for dataset2 which contains three columns at least: cell_id, level1class and level2class |
name1 |
Name used to refer to dataset 1. Leave blank if it's already a merged dataset. |
name2 |
Name used to refer to dataset 2. Leave blank if it's already a merged dataset. |
as_sparse |
Convert the merged |
as_DelayedArray |
Convert the merged |
verbose |
Print messages. |
List containing merged exp and annot.
cortex_mrna <- ewceData::cortex_mrna() exp1 <- cortex_mrna$exp[, 1:50] exp2 <- cortex_mrna$exp[, 51:100] annot1 <- cortex_mrna$annot[1:50, ] annot2 <- cortex_mrna$annot[51:100, ] merged_res <- EWCE::merge_two_expfiles( exp1 = exp1, exp2 = exp2, annot1 = annot1, annot2 = annot2, name1 = "dataset1", name2 = "dataset2" )
cortex_mrna <- ewceData::cortex_mrna() exp1 <- cortex_mrna$exp[, 1:50] exp2 <- cortex_mrna$exp[, 51:100] annot1 <- cortex_mrna$annot[1:50, ] annot2 <- cortex_mrna$annot[51:100, ] merged_res <- EWCE::merge_two_expfiles( exp1 = exp1, exp2 = exp2, annot1 = annot1, annot2 = annot2, name1 = "dataset1", name2 = "dataset2" )
merged_ewce
combines enrichment results from multiple studies
targetting the same scientific problem
merged_ewce(results, reps = 100)
merged_ewce(results, reps = 100)
results |
a list of EWCE results generated using add_res_to_merging_list. |
reps |
Number of random gene lists to generate (Default=100 but should be >=10,000 for publication-quality results). |
dataframe in which each row gives the statistics (p-value, fold change and number of standard deviations from the mean) associated with the enrichment of the stated cell type in the gene list.
# Load the single cell data ctd <- ewceData::ctd() # Use 3 bootstrap lists for speed, for publishable analysis use >10000 reps <- 3 # Use 5 up/down regulated genes (thresh) for speed, default is 250 thresh <- 5 # Load the data tt_alzh_BA36 <- ewceData::tt_alzh_BA36() tt_alzh_BA44 <- ewceData::tt_alzh_BA44() # Run EWCE analysis tt_results_36 <- EWCE::ewce_expression_data( sct_data = ctd, tt = tt_alzh_BA36, thresh = thresh, annotLevel = 1, reps = reps, ttSpecies = "human", sctSpecies = "mouse" ) tt_results_44 <- EWCE::ewce_expression_data( sct_data = ctd, tt = tt_alzh_BA44, thresh = thresh, annotLevel = 1, reps = reps, ttSpecies = "human", sctSpecies = "mouse" ) # Fill a list with the results results <- EWCE::add_res_to_merging_list(tt_results_36) results <- EWCE::add_res_to_merging_list(tt_results_44, results) # Perform the merged analysis # For publication reps should be higher merged_res <- EWCE::merged_ewce( results = results, reps = 2 ) print(merged_res)
# Load the single cell data ctd <- ewceData::ctd() # Use 3 bootstrap lists for speed, for publishable analysis use >10000 reps <- 3 # Use 5 up/down regulated genes (thresh) for speed, default is 250 thresh <- 5 # Load the data tt_alzh_BA36 <- ewceData::tt_alzh_BA36() tt_alzh_BA44 <- ewceData::tt_alzh_BA44() # Run EWCE analysis tt_results_36 <- EWCE::ewce_expression_data( sct_data = ctd, tt = tt_alzh_BA36, thresh = thresh, annotLevel = 1, reps = reps, ttSpecies = "human", sctSpecies = "mouse" ) tt_results_44 <- EWCE::ewce_expression_data( sct_data = ctd, tt = tt_alzh_BA44, thresh = thresh, annotLevel = 1, reps = reps, ttSpecies = "human", sctSpecies = "mouse" ) # Fill a list with the results results <- EWCE::add_res_to_merging_list(tt_results_36) results <- EWCE::add_res_to_merging_list(tt_results_44, results) # Perform the merged analysis # For publication reps should be higher merged_res <- EWCE::merged_ewce( results = results, reps = 2 ) print(merged_res)
Plot CellTypeData metrics such as mean_exp, specificity and/or specificity_quantiles.
plot_ctd(ctd, genes, level = 1, metric = "specificity", show_plot = TRUE)
plot_ctd(ctd, genes, level = 1, metric = "specificity", show_plot = TRUE)
ctd |
CellTypeDataset. |
genes |
Which genes in |
level |
Annotation level in |
metric |
Which metric in the
|
show_plot |
Whether to print the plot or simply return it. |
ggplot object.
ctd <- ewceData::ctd() plt <- EWCE::plot_ctd(ctd, genes = c("Apoe", "Gfap", "Gapdh"))
ctd <- ewceData::ctd() plt <- EWCE::plot_ctd(ctd, genes = c("Apoe", "Gfap", "Gapdh"))
prep_dendro
adds a dendrogram to a CellTypeDataset (CTD).
prep.dendro(ctdIN)
prep.dendro(ctdIN)
ctdIN |
A single annotLevel of a ctd, i.e. ctd[[1]] (the function is intended to be used via apply). |
A CellTypeDataset with dendrogram plotting info added.
Normalize expression matrix by accounting for library size. Uses sctransform.
sct_normalize(exp, as_sparse = TRUE, verbose = TRUE)
sct_normalize(exp, as_sparse = TRUE, verbose = TRUE)
exp |
Gene x cell expression matrix. |
as_sparse |
Convert |
verbose |
Print messages. |
Normalised expression matrix.
cortex_mrna <- ewceData::cortex_mrna() exp_sct_normed <- EWCE::sct_normalize(exp = cortex_mrna$exp[1:300, ])
cortex_mrna <- ewceData::cortex_mrna() exp_sct_normed <- EWCE::sct_normalize(exp = cortex_mrna$exp[1:300, ])
This function will take a CTD,
drop all genes without 1:1 orthologs with the
output_species
("human" by default),
convert the remaining genes to gene symbols,
assign names to each level,
and convert all matrices to sparse matrices and/or DelayedArray
.
standardise_ctd( ctd, dataset, input_species = NULL, output_species = "human", sctSpecies_origin = input_species, non121_strategy = "drop_both_species", method = "homologene", force_new_quantiles = TRUE, force_standardise = FALSE, remove_unlabeled_clusters = FALSE, numberOfBins = 40, keep_annot = TRUE, keep_plots = TRUE, as_sparse = TRUE, as_DelayedArray = FALSE, rename_columns = TRUE, make_columns_unique = FALSE, verbose = TRUE, ... )
standardise_ctd( ctd, dataset, input_species = NULL, output_species = "human", sctSpecies_origin = input_species, non121_strategy = "drop_both_species", method = "homologene", force_new_quantiles = TRUE, force_standardise = FALSE, remove_unlabeled_clusters = FALSE, numberOfBins = 40, keep_annot = TRUE, keep_plots = TRUE, as_sparse = TRUE, as_DelayedArray = FALSE, rename_columns = TRUE, make_columns_unique = FALSE, verbose = TRUE, ... )
ctd |
Input CellTypeData. |
dataset |
CellTypeData. name. |
input_species |
Which species the gene names in |
output_species |
Which species' genes names to convert |
sctSpecies_origin |
Species that the |
non121_strategy |
How to handle genes that don't have
1:1 mappings between
|
method |
R package to use for gene mapping:
|
force_new_quantiles |
By default, quantile computation is
skipped if they have already been computed.
Set |
force_standardise |
If |
remove_unlabeled_clusters |
Remove any samples that have numeric column names. |
numberOfBins |
Number of non-zero quantile bins. |
keep_annot |
Keep the column annotation data if provided. |
keep_plots |
Keep the dendrograms if provided. |
as_sparse |
Convert to sparse matrix. |
as_DelayedArray |
Convert to |
rename_columns |
Remove |
make_columns_unique |
Rename each columns with the prefix
|
verbose |
Print messages.
Set |
... |
Arguments passed on to
|
Standardised CellTypeDataset.
ctd <- ewceData::ctd() ctd_std <- EWCE::standardise_ctd( ctd = ctd, input_species = "mouse", dataset = "Zeisel2016" )
ctd <- ewceData::ctd() ctd_std <- EWCE::standardise_ctd( ctd = ctd, input_species = "mouse", dataset = "Zeisel2016" )