Package 'EWCE' reference manual

Title:	Expression Weighted Celltype Enrichment
Description:	Used to determine which cell types are enriched within gene lists. The package provides tools for testing enrichments within simple gene lists (such as human disease associated genes) and those resulting from differential expression studies. The package does not depend upon any particular Single Cell Transcriptome dataset and user defined datasets can be loaded in and used in the analyses.
Authors:	Alan Murphy [cre] , Brian Schilder [aut] , Nathan Skene [aut]
Maintainer:	Alan Murphy <[email protected]>
License:	GPL-3
Version:	1.15.1
Built:	2025-03-17 03:53:41 UTC
Source:	https://github.com/bioc/EWCE

EWCE: Expression Weighted Celltype Enrichment

Description

Used to determine which cell types are enriched within gene lists. The package provides tools for testing enrichments within simple gene lists (such as human disease associated genes) and those resulting from differential expression studies. The package does not depend upon any particular Single Cell Transcriptome dataset and user defined datasets can be loaded in and used in the analyses.

Details

EWCE: Expression Weighted Celltype Enrichment

The package does not depend upon any particular Single Cell Transcriptome dataset and user defined datasets can be loaded in and used in the analyses.

Author(s)

Maintainer: Alan Murphy [email protected] (ORCID)

Authors:

Brian Schilder [email protected] (ORCID)
Nathan Skene [email protected] (ORCID)

Add to results to merging list

Description

add_res_to_merging_list adds EWCE results to a list for merging analysis.

Usage

add_res_to_merging_list(full_res, existing_results = NULL)
add_res_to_merging_list(full_res, existing_results = NULL)

Arguments

`full_res`	Results list generated using bootstrap_enrichment_test or ewce_expression_data functions. Multiple results tables can be merged into one results table, as long as the 'list' column is set to distinguish them.
`existing_results`	Output of previous rounds from adding results to list. Leave empty if this is the first item in the list.

Value

Merged results list.

Examples

# Load the single cell data
ctd <- ewceData::ctd()

# Load the data
tt_alzh <- ewceData::tt_alzh()
# tt_alzh_BA36 <- ewceData::tt_alzh_BA36()
# Use 3 bootstrap lists for speed, for publishable analysis use >10000
reps <- 3
# Use 5 up/down regulated genes (thresh) for speed, default is 250
thresh <- 5
# Run EWCE analysis
# tt_results <- ewce_expression_data(
#    sct_data = ctd, tt = tt_alzh, annotLevel = 1, thresh = thresh,
#    reps = reps, ttSpecies = "human", sctSpecies = "mouse"
# )
# tt_results_36 <- ewce_expression_data(
#    sct_data = ctd, tt = tt_alzh_BA36, annotLevel = 1, thresh = thresh,
#    reps = reps, ttSpecies = "human", sctSpecies = "mouse"
# )

# Fill a list with the results
results <- add_res_to_merging_list(tt_alzh)
# results <- add_res_to_merging_list(tt_alzh_BA36, results)
# Load the single cell data
ctd <- ewceData::ctd()

# Load the data
tt_alzh <- ewceData::tt_alzh()
# tt_alzh_BA36 <- ewceData::tt_alzh_BA36()
# Use 3 bootstrap lists for speed, for publishable analysis use >10000
reps <- 3
# Use 5 up/down regulated genes (thresh) for speed, default is 250
thresh <- 5
# Run EWCE analysis
# tt_results <- ewce_expression_data(
#    sct_data = ctd, tt = tt_alzh, annotLevel = 1, thresh = thresh,
#    reps = reps, ttSpecies = "human", sctSpecies = "mouse"
# )
# tt_results_36 <- ewce_expression_data(
#    sct_data = ctd, tt = tt_alzh_BA36, annotLevel = 1, thresh = thresh,
#    reps = reps, ttSpecies = "human", sctSpecies = "mouse"
# )

# Fill a list with the results
results <- add_res_to_merging_list(tt_alzh)
# results <- add_res_to_merging_list(tt_alzh_BA36, results)

`bin_columns_into_quantiles`

Description

bin_columns_into_quantiles is an internal function used to convert a vector of specificity into a vector of specificity quantiles. This function can be iterated across a matrix using apply to create a matrix of specificity quantiles.

Usage

bin_columns_into_quantiles(
  vec,
  numberOfBins = 40,
  defaultBin = as.integer(numberOfBins/2)
)
bin_columns_into_quantiles(
  vec,
  numberOfBins = 40,
  defaultBin = as.integer(numberOfBins/2)
)

Arguments

`vec`	The vector of gene of specificity values.
`numberOfBins`	Number of quantile bins to use (40 is recommended).
`defaultBin`	Which bin to assign when there's only one non-zero quantile. In situations where there's only one non-zero quantile, cut throws an error. Avoid these situations by using a default quantile.

Value

A vector with same length as vec but with columns storing quantiles instead of specificity.

Examples

ctd <- ewceData::ctd()
ctd[[1]]$specificity_quantiles <- apply(ctd[[1]]$specificity, 2,
    FUN = bin_columns_into_quantiles)
ctd <- ewceData::ctd()
ctd[[1]]$specificity_quantiles <- apply(ctd[[1]]$specificity, 2,
    FUN = bin_columns_into_quantiles)

bin_specificity_into_quantiles

Description

bin_specificity_into_quantiles is an internal function used to convert add '$specificity_quantiles' to a ctd

Usage

bin_specificity_into_quantiles(
  ctdIN,
  numberOfBins,
  matrix_name = "specificity_quantiles",
  as_sparse = TRUE,
  verbose = TRUE
)
bin_specificity_into_quantiles(
  ctdIN,
  numberOfBins,
  matrix_name = "specificity_quantiles",
  as_sparse = TRUE,
  verbose = TRUE
)

Arguments

`ctdIN`	A single annotLevel of a ctd, i.e. `ctd[[1]]` (the function is intended to be used via `apply`).
`numberOfBins`	Number of quantile 'bins' to use (40 is recommended).
`matrix_name`	Name of the specificity matrix to create (default: "specificity_quantiles").
`as_sparse`	Convert to sparseMatrix.
`verbose`	Print messages.

Value

A ctd with "specificity_quantiles" matrix in each level (or whatever matrix_name was set to.).

Examples

ctd <- ewceData::ctd()
ctd <- lapply(ctd, EWCE::bin_specificity_into_quantiles, numberOfBins = 40)
print(ctd[[1]]$specificity_quantiles[1:3, ])
ctd <- ewceData::ctd()
ctd <- lapply(ctd, EWCE::bin_specificity_into_quantiles, numberOfBins = 40)
print(ctd[[1]]$specificity_quantiles[1:3, ])

Bootstrap cell type enrichment test

Description

bootstrap_enrichment_test takes a genelist and a single cell type transcriptome dataset and determines the probability of enrichment and fold changes for each cell type.

Usage

bootstrap_enrichment_test(
  sct_data = NULL,
  hits = NULL,
  bg = NULL,
  genelistSpecies = NULL,
  sctSpecies = NULL,
  sctSpecies_origin = sctSpecies,
  output_species = "human",
  method = "homologene",
  reps = 100,
  no_cores = 1,
  annotLevel = 1,
  geneSizeControl = FALSE,
  controlledCT = NULL,
  mtc_method = "BH",
  sort_results = TRUE,
  standardise_sct_data = TRUE,
  standardise_hits = FALSE,
  verbose = TRUE,
  localHub = FALSE,
  store_gene_data = TRUE
)
bootstrap_enrichment_test(
  sct_data = NULL,
  hits = NULL,
  bg = NULL,
  genelistSpecies = NULL,
  sctSpecies = NULL,
  sctSpecies_origin = sctSpecies,
  output_species = "human",
  method = "homologene",
  reps = 100,
  no_cores = 1,
  annotLevel = 1,
  geneSizeControl = FALSE,
  controlledCT = NULL,
  mtc_method = "BH",
  sort_results = TRUE,
  standardise_sct_data = TRUE,
  standardise_hits = FALSE,
  verbose = TRUE,
  localHub = FALSE,
  store_gene_data = TRUE
)

Arguments

`sct_data`	List generated using generate_celltype_data.
`hits`	List of gene symbols containing the target gene list. Will automatically be converted to human gene symbols if `geneSizeControl=TRUE`.
`bg`	List of gene symbols containing the background gene list (including hit genes). If `bg=NULL`, an appropriate gene background will be created automatically.
`genelistSpecies`	Species that `hits` genes came from (no longer limited to just "mouse" and "human"). See list_species for all available species.
`sctSpecies`	Species that `sct_data` is currently formatted as (no longer limited to just "mouse" and "human"). See list_species for all available species.
`sctSpecies_origin`	Species that the `sct_data` originally came from, regardless of its current gene format (e.g. it was previously converted from mouse to human gene orthologs). This is used for computing an appropriate backgrund.
`output_species`	Species to convert `sct_data` and `hits` to (Default: "human"). See list_species for all available species.
`method`	R package to use for gene mapping: `"gprofiler"` : Slower but more species and genes. `"homologene"` : Faster but fewer species and genes. `"babelgene"` : Faster but fewer species and genes. Also gives consensus scores for each gene mapping based on a several different data sources.
`reps`	Number of random gene lists to generate (Default: 100, but should be >=10,000 for publication-quality results).
`no_cores`	Number of cores to parallelise bootstrapping `reps` over.
`annotLevel`	An integer indicating which level of `sct_data` to analyse (Default: 1).
`geneSizeControl`	Whether you want to control for GC content and transcript length. Recommended if the gene list originates from genetic studies (Default: FALSE). If set to `TRUE`, then `hits` must be from humans.
`controlledCT`	[Optional] If not NULL, and instead is the name of a cell type, then the bootstrapping controls for expression within that cell type.
`mtc_method`	Multiple-testing correction method (passed to p.adjust).
`sort_results`	Sort enrichment results from smallest to largest p-values.
`standardise_sct_data`	Should `sct_data` be standardised? if `TRUE`: When `sctSpecies!=output_species` the `sct_data` will be checked for object formatting and the genes will be converted to the orthologs of the `output_species` with standardise_ctd (which calls map_genes internally). When `sctSpecies==output_species`, the `sct_data` will be checked for object formatting with standardise_ctd, but the gene names will remain untouched.
`standardise_hits`	Should `hits` be standardised? If `TRUE`: When `genelistSpecies!=output_species`, the genes will be converted to the orthologs of the `output_species` with convert_orthologs. When `genelistSpecies==output_species`, the genes will be standardised with map_genes. If `FALSE`, `hits` will be passed on to subsequent steps as-is.
`verbose`	Print messages.
`localHub`	If working offline, add argument localHub=TRUE to work with a local, non-updated hub; It will only have resources available that have previously been downloaded. If offline, Please also see BiocManager vignette section on offline use to ensure proper functionality.
`store_gene_data`	Store sampled gene data for every bootstrap iteration. When the number of bootstrap `reps` is very high (>=100k) and/or the number of genes in `hits` is very high, you may want to set `store_gene_data=FALSE` to avoid using excessive amounts of CPU memory.

Value

A list containing three elements:

hit.cells: vector containing the summed proportion of expression in each cell type for the target list.
gene_data: data.table showing the number of time each gene appeared in the bootstrap sample.
bootstrap_data: matrix in which each row represents the summed proportion of expression in each cell type for one of the random lists
controlledCT: the controlled cell type (if applicable)

Examples

# Load the single cell data
sct_data <- ewceData::ctd()
# Set the parameters for the analysis
# Use 3 bootstrap lists for speed, for publishable analysis use >=10,000
reps <- 3
# Load gene list from Alzheimer's disease GWAS
hits <- ewceData::example_genelist()

# Bootstrap significance test, no control for transcript length or GC content
full_results <- EWCE::bootstrap_enrichment_test(
    sct_data = sct_data,
    hits = hits,
    reps = reps,
    annotLevel = 1,
    sctSpecies = "mouse",
    genelistSpecies = "human")
# Load the single cell data
sct_data <- ewceData::ctd()
# Set the parameters for the analysis
# Use 3 bootstrap lists for speed, for publishable analysis use >=10,000
reps <- 3
# Load gene list from Alzheimer's disease GWAS
hits <- ewceData::example_genelist()

# Bootstrap significance test, no control for transcript length or GC content
full_results <- EWCE::bootstrap_enrichment_test(
    sct_data = sct_data,
    hits = hits,
    reps = reps,
    annotLevel = 1,
    sctSpecies = "mouse",
    genelistSpecies = "human")

check_ewce_genelist_inputs

Description

check_ewce_genelist_inputs Is used to check that hits and bg gene lists passed to EWCE are setup correctly. Checks they are the appropriate length. Checks all hits are in bg. Checks the species match and if not reduces to 1:1 orthologs.

Usage

check_ewce_genelist_inputs(
  sct_data,
  hits,
  bg = NULL,
  genelistSpecies = NULL,
  sctSpecies = NULL,
  sctSpecies_origin = sctSpecies,
  output_species = "human",
  method = "homologene",
  geneSizeControl = FALSE,
  standardise_sct_data = TRUE,
  standardise_hits = FALSE,
  min_genes = 4,
  verbose = TRUE
)
check_ewce_genelist_inputs(
  sct_data,
  hits,
  bg = NULL,
  genelistSpecies = NULL,
  sctSpecies = NULL,
  sctSpecies_origin = sctSpecies,
  output_species = "human",
  method = "homologene",
  geneSizeControl = FALSE,
  standardise_sct_data = TRUE,
  standardise_hits = FALSE,
  min_genes = 4,
  verbose = TRUE
)

Arguments

`sct_data`	List generated using generate_celltype_data.
`hits`	List of gene symbols containing the target gene list. Will automatically be converted to human gene symbols if `geneSizeControl=TRUE`.
`bg`	List of gene symbols containing the background gene list (including hit genes). If `bg=NULL`, an appropriate gene background will be created automatically.
`genelistSpecies`	Species that `hits` genes came from (no longer limited to just "mouse" and "human"). See list_species for all available species.
`sctSpecies`	Species that `sct_data` is currently formatted as (no longer limited to just "mouse" and "human"). See list_species for all available species.
`sctSpecies_origin`	Species that the `sct_data` originally came from, regardless of its current gene format (e.g. it was previously converted from mouse to human gene orthologs). This is used for computing an appropriate backgrund.
`output_species`	Species to convert `sct_data` and `hits` to (Default: "human"). See list_species for all available species.
`method`	R package to use for gene mapping: `"gprofiler"` : Slower but more species and genes. `"homologene"` : Faster but fewer species and genes. `"babelgene"` : Faster but fewer species and genes. Also gives consensus scores for each gene mapping based on a several different data sources.
`geneSizeControl`	Whether you want to control for GC content and transcript length. Recommended if the gene list originates from genetic studies (Default: FALSE). If set to `TRUE`, then `hits` must be from humans.
`standardise_sct_data`	Should `sct_data` be standardised? if `TRUE`: When `sctSpecies!=output_species` the `sct_data` will be checked for object formatting and the genes will be converted to the orthologs of the `output_species` with standardise_ctd (which calls map_genes internally). When `sctSpecies==output_species`, the `sct_data` will be checked for object formatting with standardise_ctd, but the gene names will remain untouched.
`standardise_hits`	Should `hits` be standardised? If `TRUE`: When `genelistSpecies!=output_species`, the genes will be converted to the orthologs of the `output_species` with convert_orthologs. When `genelistSpecies==output_species`, the genes will be standardised with map_genes. If `FALSE`, `hits` will be passed on to subsequent steps as-is.
`min_genes`	Minimum number of genes in a gene list to test.
`verbose`	Print messages.

Value

A list containing

hits: Array of MGI/HGNC gene symbols containing the target gene list.
bg: Array of MGI/HGNC gene symbols containing the background gene list.

Examples

ctd <- ewceData::ctd()
example_genelist <- ewceData::example_genelist()

# Called from "bootstrap_enrichment_test()" and "generate_bootstrap_plots()"
checkedLists <- EWCE::check_ewce_genelist_inputs(
    sct_data = ctd,
    hits = example_genelist,
    sctSpecies = "mouse",
    genelistSpecies = "human"
)
ctd <- ewceData::ctd()
example_genelist <- ewceData::example_genelist()

# Called from "bootstrap_enrichment_test()" and "generate_bootstrap_plots()"
checkedLists <- EWCE::check_ewce_genelist_inputs(
    sct_data = ctd,
    hits = example_genelist,
    sctSpecies = "mouse",
    genelistSpecies = "human"
)

Get percentage of target cell type hits

Description

After you run bootstrap_enrichment_test, check what percentage of significantly enriched cell types match an expected cell type.

Usage

check_percent_hits(
  full_results,
  target_celltype,
  mtc_method = "bonferroni",
  q_threshold = 0.05,
  verbose = TRUE
)
check_percent_hits(
  full_results,
  target_celltype,
  mtc_method = "bonferroni",
  q_threshold = 0.05,
  verbose = TRUE
)

Arguments

`full_results`	`bootstrap_enrichment_test` results.
`target_celltype`	Substring to search to matching cell types (case-insensitive).
`mtc_method`	Multiple-testing correction method.
`q_threshold`	Corrected significance threshold.
`verbose`	Print messages.

Value

Report list.

Examples

## Bootstrap significance test,
##  no control for transcript length or GC content
## Use pre-computed results to speed up example
full_results <- EWCE::example_bootstrap_results()

report <- EWCE::check_percent_hits(
    full_results = full_results,
    target_celltype = "microglia"
)
## Bootstrap significance test,
##  no control for transcript length or GC content
## Use pre-computed results to speed up example
full_results <- EWCE::example_bootstrap_results()

report <- EWCE::check_percent_hits(
    full_results = full_results,
    target_celltype = "microglia"
)

Celltype controlled geneset enrichment

Description

controlled_geneset_enrichment tests whether a functional gene set is still enriched in a disease gene set after controlling for the disease gene set's enrichment in a particular cell type (the 'controlledCT')

Usage

controlled_geneset_enrichment(
  disease_genes,
  functional_genes,
  bg = NULL,
  sct_data,
  sctSpecies = NULL,
  output_species = "human",
  disease_genes_species = NULL,
  functional_genes_species = NULL,
  method = "homologene",
  annotLevel,
  reps = 100,
  controlledCT,
  use_intersect = FALSE,
  verbose = TRUE
)
controlled_geneset_enrichment(
  disease_genes,
  functional_genes,
  bg = NULL,
  sct_data,
  sctSpecies = NULL,
  output_species = "human",
  disease_genes_species = NULL,
  functional_genes_species = NULL,
  method = "homologene",
  annotLevel,
  reps = 100,
  controlledCT,
  use_intersect = FALSE,
  verbose = TRUE
)

Arguments

`disease_genes`	Array of gene symbols containing the disease gene list. Does not have to be disease genes. Must be from same species as the single cell transcriptome dataset.
`functional_genes`	Array of gene symbols containing the functional gene list. The enrichment of this gene set within the disease_genes is tested. Must be from same species as the single cell transcriptome dataset.
`bg`	List of gene symbols containing the background gene list (including hit genes). If `bg=NULL`, an appropriate gene background will be created automatically.
`sct_data`	List generated using generate_celltype_data.
`sctSpecies`	Species that `sct_data` is currently formatted as (no longer limited to just "mouse" and "human"). See list_species for all available species.
`output_species`	Species to convert `sct_data` and `hits` to (Default: "human"). See list_species for all available species.
`disease_genes_species`	Species of the `disease_genes` gene set.
`functional_genes_species`	Species of the `functional_genes` gene set.
`method`	R package to use for gene mapping: `"gprofiler"` : Slower but more species and genes. `"homologene"` : Faster but fewer species and genes. `"babelgene"` : Faster but fewer species and genes. Also gives consensus scores for each gene mapping based on a several different data sources.
`annotLevel`	An integer indicating which level of `sct_data` to analyse (Default: 1).
`reps`	Number of random gene lists to generate (Default: 100, but should be >=10,000 for publication-quality results).
`controlledCT`	[Optional] If not NULL, and instead is the name of a cell type, then the bootstrapping controls for expression within that cell type.
`use_intersect`	When `species1` and `species2` are both different from `output_species`, this argument will determine whether to use the intersect (`TRUE`) or union (`FALSE`) of all genes from `species1` and `species2`.
`verbose`	Print messages.

Value

A list containing three data frames:

p_controlled The probability that functional_genes are enriched in disease_genes while controlling for the level of specificity in controlledCT
z_controlled The z-score that functional_genes are enriched in disease_genes while controlling for the level of specificity in controlledCT
p_uncontrolled The probability that functional_genes are enriched in disease_genes WITHOUT controlling for the level of specificity in controlledCT
z_uncontrolled The z-score that functional_genes are enriched in disease_genes WITHOUT controlling for the level of specificity in controlledCT
reps=reps
controlledCT
actualOverlap=actual The number of genes that overlap between functional and disease gene sets

Examples

# See the vignette for more detailed explanations
# Gene set enrichment analysis controlling for cell type expression
# set seed for bootstrap reproducibility
set.seed(12345678)
## load merged dataset from vignette
ctd <- ewceData::ctd()
schiz_genes <- ewceData::schiz_genes()
hpsd_genes <- ewceData::hpsd_genes()
# Use 3 bootstrap lists for speed, for publishable analysis use >10000
reps <- 3

res_hpsd_schiz <- EWCE::controlled_geneset_enrichment(
    disease_genes = schiz_genes,
    functional_genes = hpsd_genes,
    sct_data = ctd,
    annotLevel = 1,
    reps = reps,
    controlledCT = "pyramidal CA1"
)
# See the vignette for more detailed explanations
# Gene set enrichment analysis controlling for cell type expression
# set seed for bootstrap reproducibility
set.seed(12345678)
## load merged dataset from vignette
ctd <- ewceData::ctd()
schiz_genes <- ewceData::schiz_genes()
hpsd_genes <- ewceData::hpsd_genes()
# Use 3 bootstrap lists for speed, for publishable analysis use >10000
reps <- 3

res_hpsd_schiz <- EWCE::controlled_geneset_enrichment(
    disease_genes = schiz_genes,
    functional_genes = hpsd_genes,
    sct_data = ctd,
    annotLevel = 1,
    reps = reps,
    controlledCT = "pyramidal CA1"
)

CellTypeDataset to SingleCellExperiment

Description

Copied from scKirby, which is not yet on CRAN or Bioconductor.

Usage

ctd_to_sce(object, as_sparse = TRUE, as_DelayedArray = FALSE, verbose = TRUE)
ctd_to_sce(object, as_sparse = TRUE, as_DelayedArray = FALSE, verbose = TRUE)

Arguments

`object`	CellTypeDataset object.
`as_sparse`	Store SingleCellExperiment matrices as sparse.
`as_DelayedArray`	Store SingleCellExperiment matrices as DelayedArray.
`verbose`	Print messages.

Value

SingleCellExperiment

Examples

ctd <- ewceData::ctd()
sce <- EWCE::ctd_to_sce(ctd)
ctd <- ewceData::ctd()
sce <- EWCE::ctd_to_sce(ctd)

Drop uninformative genes

Description

drop_uninformative_genes drops uninformative genes in order to reduce compute time and noise in subsequent steps. It achieves this through several steps, each of which are optional:

Drop non-1:1 orthologs:
Removes genes that don't have 1:1 orthologs with the output_species ("human" by default).
Drop non-varying genes:
Removes genes that don't vary across cells based on variance deciles.
Drop non-differentially expressed genes (DEGs):
Removes genes that are not significantly differentially expressed across cell-types (multiple DEG methods available).

Usage

drop_uninformative_genes(
  exp,
  level2annot,
  mtc_method = "BH",
  adj_pval_thresh = 1e-05,
  convert_orths = FALSE,
  input_species = NULL,
  output_species = "human",
  non121_strategy = "drop_both_species",
  method = "homologene",
  as_sparse = TRUE,
  as_DelayedArray = FALSE,
  return_sce = FALSE,
  no_cores = 1,
  verbose = TRUE,
  ...
)
drop_uninformative_genes(
  exp,
  level2annot,
  mtc_method = "BH",
  adj_pval_thresh = 1e-05,
  convert_orths = FALSE,
  input_species = NULL,
  output_species = "human",
  non121_strategy = "drop_both_species",
  method = "homologene",
  as_sparse = TRUE,
  as_DelayedArray = FALSE,
  return_sce = FALSE,
  no_cores = 1,
  verbose = TRUE,
  ...
)

Arguments

`exp`	Expression matrix with gene names as rownames.
`level2annot`	Array of cell types, with each sequentially corresponding a column in the expression matrix.
`mtc_method`	Multiple-testing correction method used by DGE step. See p.adjust for more details.
`adj_pval_thresh`	Minimum differential expression significance that a gene must demonstrate across `level2annot` (i.e. cell types).
`convert_orths`	If `input_species!=output_species` and `convert_orths=TRUE`, will drop genes without 1:1 `output_species` orthologs and then convert `exp` gene names to those of `output_species`.
`input_species`	Which species the gene names in `exp` come from. See list_species for all available species.
`output_species`	Which species' genes names to convert `exp` to. See list_species for all available species.
`non121_strategy`	How to handle genes that don't have 1:1 mappings between `input_species`:`output_species`. Options include: `"drop_both_species" or "dbs" or 1` : Drop genes that have duplicate mappings in either the `input_species` or `output_species` (DEFAULT). `"drop_input_species" or "dis" or 2` : Only drop genes that have duplicate mappings in the `input_species`. `"drop_output_species" or "dos" or 3` : Only drop genes that have duplicate mappings in the `output_species`. `"keep_both_species" or "kbs" or 4` : Keep all genes regardless of whether they have duplicate mappings in either species. `"keep_popular" or "kp" or 5` : Return only the most "popular" interspecies ortholog mappings. This procedure tends to yield a greater number of returned genes but at the cost of many of them not being true biological 1:1 orthologs. `"sum","mean","median","min" or "max"` : When `gene_df` is a matrix and `gene_output="rownames"`, these options will aggregate many-to-one gene mappings (`input_species`-to-`output_species`) after dropping any duplicate genes in the `output_species`.
`method`	R package to use for gene mapping: `"gprofiler"` : Slower but more species and genes. `"homologene"` : Faster but fewer species and genes. `"babelgene"` : Faster but fewer species and genes. Also gives consensus scores for each gene mapping based on a several different data sources.
`as_sparse`	Convert `exp` to sparse matrix.
`as_DelayedArray`	Convert `exp` to `DelayedArray` for scalable processing.
`return_sce`	Whether to return the filtered results as an expression matrix or a SingleCellExperiment.
`no_cores`	Number of cores to parallelise across. Set to `NULL` to automatically optimise.
`verbose`	Print messages. #' @inheritParams orthogene::convert_orthologs
`...`	Arguments passed on to `orthogene::convert_orthologs` `gene_df` Data object containing the genes (see `gene_input` for options on how the genes can be stored within the object). Can be one of the following formats: `matrix` : A sparse or dense matrix. `data.frame` : A `data.frame`, `data.table`. or `tibble`. codelist : A `list` or character `vector`. Genes, transcripts, proteins, SNPs, or genomic ranges can be provided in any format (HGNC, Ensembl, RefSeq, UniProt, etc.) and will be automatically converted to gene symbols unless specified otherwise with the `...` arguments. Note: If you set `method="homologene"`, you must either supply genes in gene symbol format (e.g. "Sox2") OR set `standardise_genes=TRUE`. `gene_input` Which aspect of `gene_df` to get gene names from: `"rownames"` : From row names of data.frame/matrix. `"colnames"` : From column names of data.frame/matrix. `<column name>` : From a column in `gene_df`, e.g. `"gene_names"`. `gene_output` How to return genes. Options include: `"rownames"` : As row names of `gene_df`. `"colnames"` : As column names of `gene_df`. `"columns"` : As new columns "input_gene", "ortholog_gene" (and "input_gene_standard" if `standardise_genes=TRUE`) in `gene_df`. `"dict"` : As a dictionary (named list) where the names are input_gene and the values are ortholog_gene. `"dict_rev"` : As a reversed dictionary (named list) where the names are ortholog_gene and the values are input_gene. `standardise_genes` If `TRUE` AND `gene_output="columns"`, a new column "input_gene_standard" will be added to `gene_df` containing standardised HGNC symbols identified by gorth. `drop_nonorths` Drop genes that don't have an ortholog in the `output_species`. `agg_fun` Aggregation function passed to aggregate_mapped_genes. Set to `NULL` to skip aggregation step (default). `mthreshold` Maximum number of ortholog names per gene to show. Passed to gorth. Only used when `method="gprofiler"` (DEFAULT : `Inf`). `sort_rows` Sort `gene_df` rows alphanumerically. `gene_map` A data.frame that maps the current gene names to new gene names. This function's behaviour will adapt to different situations as follows: `gene_map=<data.frame>` : When a data.frame containing the gene key:value columns (specified by `input_col` and `output_col`, respectively) is provided, this will be used to perform aggregation/expansion. `gene_map=NULL` and `input_species!=output_species` : A `gene_map` is automatically generated by map_orthologs to perform inter-species gene aggregation/expansion. `gene_map=NULL` and `input_species==output_species` : A `gene_map` is automatically generated by map_genes to perform within-species gene gene symbol standardization and aggregation/expansion. `input_col` Column name within `gene_map` with gene names matching the row names of `X`. `output_col` Column name within `gene_map` with gene names that you wish you map the row names of `X` onto.

Value

exp Expression matrix with gene names as row names.

Examples

cortex_mrna <- ewceData::cortex_mrna()
# Use only a subset of genes to keep the example quick
cortex_mrna$exp <- cortex_mrna$exp[1:300, ]

## Convert orthologs at the same time
exp2_orth <- drop_uninformative_genes(
    exp = cortex_mrna$exp,
    level2annot = cortex_mrna$annot$level2class,
    input_species = "mouse"
)
cortex_mrna <- ewceData::cortex_mrna()
# Use only a subset of genes to keep the example quick
cortex_mrna$exp <- cortex_mrna$exp[1:300, ]

## Convert orthologs at the same time
exp2_orth <- drop_uninformative_genes(
    exp = cortex_mrna$exp,
    level2annot = cortex_mrna$annot$level2class,
    input_species = "mouse"
)

Bootstrap cell type enrichment test for transcriptome data

Description

ewce_expression_data takes a differential gene expression (DGE) results table and determines the probability of cell type enrichment in the up- and down- regulated genes.

Usage

ewce_expression_data(
  sct_data,
  annotLevel = 1,
  tt,
  sortBy = "t",
  thresh = 250,
  reps = 100,
  ttSpecies = NULL,
  sctSpecies = NULL,
  output_species = NULL,
  bg = NULL,
  method = "homologene",
  verbose = TRUE,
  localHub = FALSE
)
ewce_expression_data(
  sct_data,
  annotLevel = 1,
  tt,
  sortBy = "t",
  thresh = 250,
  reps = 100,
  ttSpecies = NULL,
  sctSpecies = NULL,
  output_species = NULL,
  bg = NULL,
  method = "homologene",
  verbose = TRUE,
  localHub = FALSE
)

Arguments

`sct_data`	List generated using generate_celltype_data.
`annotLevel`	An integer indicating which level of `sct_data` to analyse (Default: 1).
`tt`	Differential expression table. Can be output of topTable function. Minimum requirement is that one column stores a metric of increased/decreased expression (i.e. log fold change, t-statistic for differential expression etc) and another contains gene symbols.
`sortBy`	Column name of metric in `tt` which should be used to sort up- from down- regulated genes (Default: "t").
`thresh`	The number of up- and down- regulated genes to be included in each analysis (Default: 250).
`reps`	Number of random gene lists to generate (Default: 100, but should be >=10,000 for publication-quality results).
`ttSpecies`	The species the differential expression table was generated from.
`sctSpecies`	Species that `sct_data` is currently formatted as (no longer limited to just "mouse" and "human"). See list_species for all available species.
`output_species`	Species to convert `sct_data` and `hits` to (Default: "human"). See list_species for all available species.
`bg`	List of gene symbols containing the background gene list (including hit genes). If `bg=NULL`, an appropriate gene background will be created automatically.
`method`	R package to use for gene mapping: `"gprofiler"` : Slower but more species and genes. `"homologene"` : Faster but fewer species and genes. `"babelgene"` : Faster but fewer species and genes. Also gives consensus scores for each gene mapping based on a several different data sources.
`verbose`	Print messages.
`localHub`	If working offline, add argument localHub=TRUE to work with a local, non-updated hub; It will only have resources available that have previously been downloaded. If offline, Please also see BiocManager vignette section on offline use to ensure proper functionality.

Value

A list containing five data frames:

results: dataframe in which each row gives the statistics (p-value, fold change and number of standard deviations from the mean) associated with the enrichment of the stated cell type in the gene list. An additional column *Direction* stores whether it the result is from the up or downregulated set.
hit.cells.up: vector containing the summed proportion of expression in each cell type for the target list.
hit.cells.down: vector containing the summed proportion of expression in each cell type for the target list.
bootstrap_data.up: matrix in which each row represents the summed proportion of expression in each cell type for one of the random lists.
bootstrap_data.down: matrix in which each row represents the summed proportion of expression in each cell type for one of the random lists.

Examples

# Load the single cell data
ctd <- ewceData::ctd()

# Set the parameters for the analysis
# Use 3 bootstrap lists for speed, for publishable analysis use >10000
reps <- 3
# Use 5 up/down regulated genes (thresh) for speed, default is 250
thresh <- 5
annotLevel <- 1 # <- Use cell level annotations (i.e. Interneurons)

# Load the top table
tt_alzh <- ewceData::tt_alzh()

tt_results <- EWCE::ewce_expression_data(
    sct_data = ctd,
    tt = tt_alzh,
    annotLevel = 1,
    thresh = thresh,
    reps = reps,
    ttSpecies = "human",
    sctSpecies = "mouse"
)
# Load the single cell data
ctd <- ewceData::ctd()

# Set the parameters for the analysis
# Use 3 bootstrap lists for speed, for publishable analysis use >10000
reps <- 3
# Use 5 up/down regulated genes (thresh) for speed, default is 250
thresh <- 5
annotLevel <- 1 # <- Use cell level annotations (i.e. Interneurons)

# Load the top table
tt_alzh <- ewceData::tt_alzh()

tt_results <- EWCE::ewce_expression_data(
    sct_data = ctd,
    tt = tt_alzh,
    annotLevel = 1,
    thresh = thresh,
    reps = reps,
    ttSpecies = "human",
    sctSpecies = "mouse"
)

Plot EWCE results

Description

ewce_plot generates plots of EWCE enrichment results

Usage

ewce_plot(
  total_res,
  mtc_method = "bonferroni",
  q_threshold = 0.05,
  ctd = NULL,
  annotLevel = 1,
  heights = c(0.3, 1),
  make_dendro = FALSE,
  verbose = TRUE
)
ewce_plot(
  total_res,
  mtc_method = "bonferroni",
  q_threshold = 0.05,
  ctd = NULL,
  annotLevel = 1,
  heights = c(0.3, 1),
  make_dendro = FALSE,
  verbose = TRUE
)

Arguments

`total_res`	Results data.frame generated using bootstrap_enrichment_test or ewce_expression_data functions. Multiple results tables can be merged into one results table, as long as the 'list' column is set to distinguish them. Multiple testing correction is then applied across all merged results.
`mtc_method`	Method to be used for multiple testing correction. Argument is passed to p.adjust (DEFAULT: "bonferroni).
`q_threshold`	Corrected significance threshold.
`ctd`	CellTypeDataset object. Should be provided so that the dendrogram can be taken from it and added to plots.
`annotLevel`	An integer indicating which level of `ctd` to analyse (Default: 1).
`heights`	The relative heights row in the grid. Will get repeated to match the dimensions of the grid. Passed to wrap_plots.
`make_dendro`	Add a dendrogram (requires `ctd`).
`verbose`	Print messages.

Value

A named list containing versions of the ggplot with and without the dendrogram. Note that cell type order on the x-axis is based on hierarchical clustering for both plots if make_dendro = TRUE.

Examples

## Bootstrap significance test,
##  no control for transcript length or GC content
## Use pre-computed results to speed up example
total_res <- EWCE::example_bootstrap_results()$results 
plt <- ewce_plot(total_res = total_res)
## Bootstrap significance test,
##  no control for transcript length or GC content
## Use pre-computed results to speed up example
total_res <- EWCE::example_bootstrap_results()$results 
plt <- ewce_plot(total_res = total_res)

Example bootstrap enrichment results

Description

Example cell type enrichment results produced by bootstrap_enrichment_test.

Usage

example_bootstrap_results(verbose = TRUE, localHub = FALSE)
example_bootstrap_results(verbose = TRUE, localHub = FALSE)

Arguments

`verbose`	Print messages.
`localHub`	If working offline, add argument localHub=TRUE to work with a local, non-updated hub; It will only have resources available that have previously been downloaded. If offline, Please also see BiocManager vignette section on offline use to ensure proper functionality.

Value

List with 3 items.

Source

# Load the single cell data

ctd <- ewceData::ctd()

# Set the parameters for the analysis

# Use 3 bootstrap lists for speed, for publishable analysis use >=10,000

reps <- 3

# Load gene list from Alzheimer's disease GWAS

example_genelist <- ewceData::example_genelist()

# Bootstrap significance test, no control for transcript length or GC content

full_results <- EWCE::bootstrap_enrichment_test( sct_data = ctd, hits = example_genelist, reps = reps, annotLevel = 1, sctSpecies = "mouse", genelistSpecies = "human" )

bootstrap_results <- full_results

save(bootstrap_results,file = "inst/extdata/bootstrap_results.rda")

Examples

full_results <- example_bootstrap_results()
full_results <- example_bootstrap_results()

Example bootstrap celltype enrichment test for transcriptome data

Description

Example celltype enrichment results produced by ewce_expression_data.

Usage

example_transcriptome_results(verbose = TRUE, localHub = FALSE)
example_transcriptome_results(verbose = TRUE, localHub = FALSE)

Arguments

`verbose`	Print messages.
`localHub`	If working offline, add argument localHub=TRUE to work with a local, non-updated hub; It will only have resources available that have previously been downloaded. If offline, Please also see BiocManager vignette section on offline use to ensure proper functionality.

Value

List with 5 items.

Source

## Load the single cell data

ctd <- ewceData::ctd()

## Set the parameters for the analysis

## Use 3 bootstrap lists for speed, for publishable analysis use >10,000

reps <- 3

annotLevel <- 1 # <- Use cell level annotations (i.e. Interneurons)

## Use 5 up/down regulated genes (thresh) for speed, default is 250

thresh <- 5

## Load the top table

tt_alzh <- ewceData::tt_alzh()

tt_results <- EWCE::ewce_expression_data( sct_data = ctd, tt = tt_alzh, annotLevel = 1, thresh = thresh, reps = reps, ttSpecies = "human", sctSpecies = "mouse" )

save(tt_results, file = "inst/extdata/tt_results.rda")

Examples

tt_results <- EWCE::example_transcriptome_results()
tt_results <- EWCE::example_transcriptome_results()

Filter genes in a CellTypeDataset

Description

Removes rows from each matrix within a CellTypeDataset (CTD) that are not within gene_subset.

Usage

filter_ctd_genes(ctd, gene_subset)
filter_ctd_genes(ctd, gene_subset)

Arguments

`ctd`	CellTypeDataset.
`gene_subset`	Genes to subset to.

Value

Filtered CellTypeDataset.

Examples

 
ctd <- ewceData::ctd()
ctd <- standardise_ctd(ctd, input_species="mouse")
gene_subset <- rownames(ctd[[1]]$mean_exp)[1:100]
ctd_subset <- EWCE::filter_ctd_genes(ctd = ctd, gene_subset = gene_subset) 
ctd <- ewceData::ctd()
ctd <- standardise_ctd(ctd, input_species="mouse")
gene_subset <- rownames(ctd[[1]]$mean_exp)[1:100]
ctd_subset <- EWCE::filter_ctd_genes(ctd = ctd, gene_subset = gene_subset)

filter_genes_without_1to1_homolog

Description

Deprecated function. Please use filter_nonorthologs instead.

Usage

filter_genes_without_1to1_homolog(
  filenames,
  input_species = "mouse",
  convert_nonhuman_genes = TRUE,
  annot_levels = NULL,
  suffix = "_orthologs",
  verbose = TRUE
)
filter_genes_without_1to1_homolog(
  filenames,
  input_species = "mouse",
  convert_nonhuman_genes = TRUE,
  annot_levels = NULL,
  suffix = "_orthologs",
  verbose = TRUE
)

Arguments

`filenames`	List of file names for sct_data saved as .rda files.
`input_species`	Which species the gene names in `exp` come from.
`convert_nonhuman_genes`	Whether to convert the `exp` row names to human gene names.
`annot_levels`	[Optional] Names of each annotation level.
`suffix`	Suffix to add to the file name (right before .rda).
`verbose`	Print messages.

Details

Note: This function replaces the original filter_genes_without_1to1_homolog function. filter_genes_without_1to1_homolog is now a wrapper for filter_nonorthologs.

Value

List of the filtered CellTypeData file names.

Examples

# Load the single cell data
ctd <- ewceData::ctd()
tmp <- tempfile()
save(ctd, file = tmp)
fNames_ALLCELLS_orths <- EWCE::filter_nonorthologs(filenames = tmp)
# Load the single cell data
ctd <- ewceData::ctd()
tmp <- tempfile()
save(ctd, file = tmp)
fNames_ALLCELLS_orths <- EWCE::filter_nonorthologs(filenames = tmp)

Filter non-orthologs

Description

filter_nonorthologs Takes the filenames of CellTypeData files, loads them, drops any genes which don't have a 1:1 orthologs with humans, and then convert the gene to human orthologs. The new files are then saved to disk, appending '_orthologs' to the file name.

Usage

filter_nonorthologs(
  filenames,
  input_species = NULL,
  convert_nonhuman_genes = TRUE,
  annot_levels = NULL,
  suffix = "_orthologs",
  method = "homologene",
  non121_strategy = "drop_both_species",
  verbose = TRUE,
  ...
)
filter_nonorthologs(
  filenames,
  input_species = NULL,
  convert_nonhuman_genes = TRUE,
  annot_levels = NULL,
  suffix = "_orthologs",
  method = "homologene",
  non121_strategy = "drop_both_species",
  verbose = TRUE,
  ...
)

Arguments

`filenames`	List of file names for sct_data saved as .rda files.
`input_species`	Which species the gene names in `exp` come from.
`convert_nonhuman_genes`	Whether to convert the `exp` row names to human gene names.
`annot_levels`	[Optional] Names of each annotation level.
`suffix`	Suffix to add to the file name (right before .rda).
`method`	R package to use for gene mapping: `"gprofiler"` : Slower but more species and genes. `"homologene"` : Faster but fewer species and genes. `"babelgene"` : Faster but fewer species and genes. Also gives consensus scores for each gene mapping based on a several different data sources.
`non121_strategy`	How to handle genes that don't have 1:1 mappings between `input_species`:`output_species`. Options include: `"drop_both_species" or "dbs" or 1` : Drop genes that have duplicate mappings in either the `input_species` or `output_species` (DEFAULT). `"drop_input_species" or "dis" or 2` : Only drop genes that have duplicate mappings in the `input_species`. `"drop_output_species" or "dos" or 3` : Only drop genes that have duplicate mappings in the `output_species`. `"keep_both_species" or "kbs" or 4` : Keep all genes regardless of whether they have duplicate mappings in either species. `"keep_popular" or "kp" or 5` : Return only the most "popular" interspecies ortholog mappings. This procedure tends to yield a greater number of returned genes but at the cost of many of them not being true biological 1:1 orthologs. `"sum","mean","median","min" or "max"` : When `gene_df` is a matrix and `gene_output="rownames"`, these options will aggregate many-to-one gene mappings (`input_species`-to-`output_species`) after dropping any duplicate genes in the `output_species`.
`verbose`	Print messages.
`...`	Arguments passed on to `orthogene::convert_orthologs` `gene_df` Data object containing the genes (see `gene_input` for options on how the genes can be stored within the object). Can be one of the following formats: `matrix` : A sparse or dense matrix. `data.frame` : A `data.frame`, `data.table`. or `tibble`. codelist : A `list` or character `vector`. Genes, transcripts, proteins, SNPs, or genomic ranges can be provided in any format (HGNC, Ensembl, RefSeq, UniProt, etc.) and will be automatically converted to gene symbols unless specified otherwise with the `...` arguments. Note: If you set `method="homologene"`, you must either supply genes in gene symbol format (e.g. "Sox2") OR set `standardise_genes=TRUE`. `gene_input` Which aspect of `gene_df` to get gene names from: `"rownames"` : From row names of data.frame/matrix. `"colnames"` : From column names of data.frame/matrix. `<column name>` : From a column in `gene_df`, e.g. `"gene_names"`. `gene_output` How to return genes. Options include: `"rownames"` : As row names of `gene_df`. `"colnames"` : As column names of `gene_df`. `"columns"` : As new columns "input_gene", "ortholog_gene" (and "input_gene_standard" if `standardise_genes=TRUE`) in `gene_df`. `"dict"` : As a dictionary (named list) where the names are input_gene and the values are ortholog_gene. `"dict_rev"` : As a reversed dictionary (named list) where the names are ortholog_gene and the values are input_gene. `standardise_genes` If `TRUE` AND `gene_output="columns"`, a new column "input_gene_standard" will be added to `gene_df` containing standardised HGNC symbols identified by gorth. `output_species` Name of the output species (e.g. "human","chicken"). Use map_species to return a full list of available species. `drop_nonorths` Drop genes that don't have an ortholog in the `output_species`. `agg_fun` Aggregation function passed to aggregate_mapped_genes. Set to `NULL` to skip aggregation step (default). `mthreshold` Maximum number of ortholog names per gene to show. Passed to gorth. Only used when `method="gprofiler"` (DEFAULT : `Inf`). `as_sparse` Convert `gene_df` to a sparse matrix. Only works if `gene_df` is one of the following classes: `matrix` `Matrix` `data.frame` `data.table` `tibble` If `gene_df` is a sparse matrix to begin with, it will be returned as a sparse matrix (so long as `gene_output=` `"rownames"` or `"colnames"`). `as_DelayedArray` Convert aggregated matrix to DelayedArray. `sort_rows` Sort `gene_df` rows alphanumerically. `gene_map` A data.frame that maps the current gene names to new gene names. This function's behaviour will adapt to different situations as follows: `gene_map=<data.frame>` : When a data.frame containing the gene key:value columns (specified by `input_col` and `output_col`, respectively) is provided, this will be used to perform aggregation/expansion. `gene_map=NULL` and `input_species!=output_species` : A `gene_map` is automatically generated by map_orthologs to perform inter-species gene aggregation/expansion. `gene_map=NULL` and `input_species==output_species` : A `gene_map` is automatically generated by map_genes to perform within-species gene gene symbol standardization and aggregation/expansion. `input_col` Column name within `gene_map` with gene names matching the row names of `X`. `output_col` Column name within `gene_map` with gene names that you wish you map the row names of `X` onto.

Details

Note: This function replaces the original filter_genes_without_1to1_homolog function. filter_genes_without_1to1_homolog is now a wrapper for filter_nonorthologs.

Value

List of the filtered CellTypeData file names.

Examples

# Load the single cell data
ctd <- ewceData::ctd()
tmp <- tempfile()
save(ctd, file = tmp)
fNames_ALLCELLS_orths <- EWCE::filter_nonorthologs(filenames = tmp)
# Load the single cell data
ctd <- ewceData::ctd()
tmp <- tempfile()
save(ctd, file = tmp)
fNames_ALLCELLS_orths <- EWCE::filter_nonorthologs(filenames = tmp)

fix_bad_hgnc_symbols

Description

Given an expression matrix, wherein the rows are supposed to be HGNC symbols, find those symbols which are not official HGNC symbols, then correct them if possible. Return the expression matrix with corrected symbols.

Usage

fix_bad_hgnc_symbols(
  exp,
  dropNonHGNC = FALSE,
  as_sparse = TRUE,
  verbose = TRUE,
  localHub = FALSE
)
fix_bad_hgnc_symbols(
  exp,
  dropNonHGNC = FALSE,
  as_sparse = TRUE,
  verbose = TRUE,
  localHub = FALSE
)

Arguments

`exp`	An expression matrix where the rows are HGNC symbols or a SingleCellExperiment (SCE) or other Ranged Summarized Experiment (SE) type object.
`dropNonHGNC`	Boolean. Should symbols not recognised as HGNC symbols be dropped?
`as_sparse`	Convert `exp` to sparse matrix.
`verbose`	Print messages.
`localHub`	If working offline, add argument localHub=TRUE to work with a local, non-updated hub; It will only have resources available that have previously been downloaded. If offline, Please also see BiocManager vignette section on offline use to ensure proper functionality.

Value

Returns the expression matrix with the rownames corrected and rows representing the same gene merged. If a SingleCellExperiment (SCE) or other Ranged Summarized Experiment (SE) type object was inputted this will be returned with the corrected expression matrix under counts.

Examples

# create example expression matrix, could be part of a exp, annot list obj
exp <- matrix(data = runif(70), ncol = 10)
# Add HGNC gene names but add with an error:
# MARCH8 is a HGNC symbol which if opened in excel will convert to Mar-08
rownames(exp) <-
    c("MT-TF", "MT-RNR1", "MT-TV", "MT-RNR2", "MT-TL1", "MT-ND1", "Mar-08")
exp <- fix_bad_hgnc_symbols(exp)
# fix_bad_hgnc_symbols warns the user of this possible issue
# create example expression matrix, could be part of a exp, annot list obj
exp <- matrix(data = runif(70), ncol = 10)
# Add HGNC gene names but add with an error:
# MARCH8 is a HGNC symbol which if opened in excel will convert to Mar-08
rownames(exp) <-
    c("MT-TF", "MT-RNR1", "MT-TV", "MT-RNR2", "MT-TL1", "MT-ND1", "Mar-08")
exp <- fix_bad_hgnc_symbols(exp)
# fix_bad_hgnc_symbols warns the user of this possible issue

fix_bad_mgi_symbols - Given an expression matrix, wherein the rows are supposed to be MGI symbols, find those symbols which are not official MGI symbols, then check in the MGI synonm database for whether they match to a proper MGI symbol. Where a symbol is found to be an aliases for a gene that is already in the dataset, the combined reads are summed together.

Description

Also checks whether any gene names contain "Sep", "Mar" or "Feb". These should be checked for any suggestion that excel has corrupted the gene names.

Usage

fix_bad_mgi_symbols(
  exp,
  mrk_file_path = NULL,
  printAllBadSymbols = FALSE,
  as_sparse = TRUE,
  verbose = TRUE,
  localHub = FALSE
)
fix_bad_mgi_symbols(
  exp,
  mrk_file_path = NULL,
  printAllBadSymbols = FALSE,
  as_sparse = TRUE,
  verbose = TRUE,
  localHub = FALSE
)

Arguments

`exp`	An expression matrix where the rows are MGI symbols, or a SingleCellExperiment (SCE) or other Ranged Summarized Experiment (SE) type object.
`mrk_file_path`	Path to the MRK_List2 file which can be downloaded from www.informatics.jax.org/downloads/reports/index.html
`printAllBadSymbols`	Output to console all the bad gene symbols
`as_sparse`	Convert `exp` to sparse matrix.
`verbose`	Print messages.
`localHub`	If working offline, add argument localHub=TRUE to work with a local, non-updated hub; It will only have resources available that have previously been downloaded. If offline, Please also see BiocManager vignette section on offline use to ensure proper functionality.

Value

Returns the expression matrix with the rownames corrected and rows representing the same gene merged. If no corrections are necessary, input expression matrix is returned. If a SingleCellExperiment (SCE) or other Ranged Summarized Experiment (SE) type object was inputted this will be returned with the corrected expression matrix under counts.

Examples

# Load the single cell data
cortex_mrna <- ewceData::cortex_mrna()
# take a subset for speed
cortex_mrna$exp <- cortex_mrna$exp[1:50, 1:5]
cortex_mrna$exp <- fix_bad_mgi_symbols(cortex_mrna$exp)
# Load the single cell data
cortex_mrna <- ewceData::cortex_mrna()
# take a subset for speed
cortex_mrna$exp <- cortex_mrna$exp[1:50, 1:5]
cortex_mrna$exp <- fix_bad_mgi_symbols(cortex_mrna$exp)

Fix celltype names

Description

Make sure celltypes don't contain characters that could interfere with downstream analyses. For example, the R package MAGMA.Celltyping cannot have spaces in celltype names because spaces are used as a delimiter in later steps.

Usage

fix_celltype_names(
  celltypes,
  replace_chars = "[-]|[.]|[ ]|[//]|[\\/]",
  make_unique = TRUE
)
fix_celltype_names(
  celltypes,
  replace_chars = "[-]|[.]|[ ]|[//]|[\\/]",
  make_unique = TRUE
)

Arguments

`celltypes`	Character vector of celltype names.
`replace_chars`	Regex string of characters to replace with "_" when renaming columns.
`make_unique`	Make all entries unique.

Value

Fixed celltype names.

Examples

ct <- c("microglia", "astryocytes", "Pyramidal SS")
ct_fixed <- fix_celltype_names(celltypes = ct)
ct <- c("microglia", "astryocytes", "Pyramidal SS")
ct_fixed <- fix_celltype_names(celltypes = ct)

Generate bootstrap plots

Description

generate_bootstrap_plots takes a gene list and a single cell type transcriptome dataset and generates plots which show how the expression of the genes in the list compares to those in randomly generated gene lists.

Usage

generate_bootstrap_plots(
  sct_data = NULL,
  hits = NULL,
  bg = NULL,
  genelistSpecies = NULL,
  sctSpecies = NULL,
  output_species = "human",
  method = "homologene",
  reps = 100,
  annotLevel = 1,
  geneSizeControl = FALSE,
  full_results = NULL,
  listFileName = paste0("_level", annotLevel),
  adj_pval_thresh = 0.05,
  facets = "CellType",
  scales = "free_x",
  save_dir = file.path(tempdir(), "BootstrapPlots"),
  show_plot = TRUE,
  verbose = TRUE
)
generate_bootstrap_plots(
  sct_data = NULL,
  hits = NULL,
  bg = NULL,
  genelistSpecies = NULL,
  sctSpecies = NULL,
  output_species = "human",
  method = "homologene",
  reps = 100,
  annotLevel = 1,
  geneSizeControl = FALSE,
  full_results = NULL,
  listFileName = paste0("_level", annotLevel),
  adj_pval_thresh = 0.05,
  facets = "CellType",
  scales = "free_x",
  save_dir = file.path(tempdir(), "BootstrapPlots"),
  show_plot = TRUE,
  verbose = TRUE
)

Arguments

`sct_data`	List generated using generate_celltype_data.
`hits`	List of gene symbols containing the target gene list. Will automatically be converted to human gene symbols if `geneSizeControl=TRUE`.
`bg`	List of gene symbols containing the background gene list (including hit genes). If `bg=NULL`, an appropriate gene background will be created automatically.
`genelistSpecies`	Species that `hits` genes came from (no longer limited to just "mouse" and "human"). See list_species for all available species.
`sctSpecies`	Species that `sct_data` is currently formatted as (no longer limited to just "mouse" and "human"). See list_species for all available species.
`output_species`	Species to convert `sct_data` and `hits` to (Default: "human"). See list_species for all available species.
`method`	R package to use for gene mapping: `"gprofiler"` : Slower but more species and genes. `"homologene"` : Faster but fewer species and genes. `"babelgene"` : Faster but fewer species and genes. Also gives consensus scores for each gene mapping based on a several different data sources.
`reps`	Number of random gene lists to generate (Default: 100, but should be >=10,000 for publication-quality results).
`annotLevel`	An integer indicating which level of `sct_data` to analyse (Default: 1).
`geneSizeControl`	Whether you want to control for GC content and transcript length. Recommended if the gene list originates from genetic studies (Default: FALSE). If set to `TRUE`, then `hits` must be from humans.
`full_results`	The full output of bootstrap_enrichment_test for the same gene list.
`listFileName`	String used as the root for files saved using this function.
`adj_pval_thresh`	Adjusted p-value threshold of celltypes to include in plots.
`facets`	Please use `rows` and `cols` instead.
`scales`	Are scales shared across all facets (the default, `"fixed"`), or do they vary across rows (`"free_x"`), columns (`"free_y"`), or both rows and columns (`"free"`)?
`save_dir`	Directory where the BootstrapPlots folder should be saved, default is a temp directory.
`show_plot`	Print the plot.
`verbose`	Print messages.

Value

Saves a set of pdf files containing graphs and returns the file where they are saved. These will be saved with the file name adjusted using the value of listFileName. The files are saved into the 'BootstrapPlot' folder. Files start with one of the following:

qqplot_noText: sorts the gene list according to how enriched it is in the relevant cell type. Plots the value in the target list against the mean value in the bootstrapped lists.
qqplot_wtGSym: as above but labels the gene symbols for the highest expressed genes.
bootDists: rather than just showing the mean of the bootstrapped lists, a boxplot shows the distribution of values
bootDists_LOG: shows the bootstrapped distributions with the y-axis shown on a log scale

Examples

## Load the single cell data
sct_data <- ewceData::ctd()

## Set the parameters for the analysis
## Use 5 bootstrap lists for speed, for publishable analysis use >10000
reps <- 5

## Load the gene list and get human orthologs
hits <- ewceData::example_genelist()

## Bootstrap significance test,
##  no control for transcript length or GC content
## Use pre-computed results to speed up example
full_results <- EWCE::example_bootstrap_results()

### Skip this for example purposes
# full_results <- EWCE::bootstrap_enrichment_test(
#    sct_data = sct_data,
#    hits = hits,
#    reps = reps,
#    annotLevel = 1,
#    sctSpecies = "mouse",
#    genelistSpecies = "human"
# )

output <- EWCE::generate_bootstrap_plots(
    sct_data = sct_data,
    hits = hits,
    reps = reps,
    full_results = full_results,
    sctSpecies = "mouse",
    genelistSpecies = "human",
    annotLevel = 1
)
## Load the single cell data
sct_data <- ewceData::ctd()

## Set the parameters for the analysis
## Use 5 bootstrap lists for speed, for publishable analysis use >10000
reps <- 5

## Load the gene list and get human orthologs
hits <- ewceData::example_genelist()

## Bootstrap significance test,
##  no control for transcript length or GC content
## Use pre-computed results to speed up example
full_results <- EWCE::example_bootstrap_results()

### Skip this for example purposes
# full_results <- EWCE::bootstrap_enrichment_test(
#    sct_data = sct_data,
#    hits = hits,
#    reps = reps,
#    annotLevel = 1,
#    sctSpecies = "mouse",
#    genelistSpecies = "human"
# )

output <- EWCE::generate_bootstrap_plots(
    sct_data = sct_data,
    hits = hits,
    reps = reps,
    full_results = full_results,
    sctSpecies = "mouse",
    genelistSpecies = "human",
    annotLevel = 1
)

Generate bootstrap plots

Description

Takes a gene list and a single cell type transcriptome dataset and generates plots which show how the expression of the genes in the list compares to those in randomly generated gene lists.

Usage

generate_bootstrap_plots_for_transcriptome(
  sct_data,
  tt,
  bg = NULL,
  thresh = 250,
  annotLevel = 1,
  reps = 100,
  full_results = NA,
  listFileName = "",
  showGNameThresh = 25,
  ttSpecies = NULL,
  sctSpecies = NULL,
  output_species = NULL,
  sortBy = "t",
  sig_only = TRUE,
  sig_col = "q",
  sig_thresh = 0.05,
  celltype_col = "CellType",
  plot_types = c("bootstrap", "bootstrap_distributions", "log_bootstrap_distributions"),
  save_dir = file.path(tempdir(), "BootstrapPlots"),
  method = "homologene",
  verbose = TRUE
)
generate_bootstrap_plots_for_transcriptome(
  sct_data,
  tt,
  bg = NULL,
  thresh = 250,
  annotLevel = 1,
  reps = 100,
  full_results = NA,
  listFileName = "",
  showGNameThresh = 25,
  ttSpecies = NULL,
  sctSpecies = NULL,
  output_species = NULL,
  sortBy = "t",
  sig_only = TRUE,
  sig_col = "q",
  sig_thresh = 0.05,
  celltype_col = "CellType",
  plot_types = c("bootstrap", "bootstrap_distributions", "log_bootstrap_distributions"),
  save_dir = file.path(tempdir(), "BootstrapPlots"),
  method = "homologene",
  verbose = TRUE
)

Arguments

`sct_data`	List generated using generate_celltype_data.
`tt`	Differential expression table. Can be output of topTable function. Minimum requirement is that one column stores a metric of increased/decreased expression (i.e. log fold change, t-statistic for differential expression etc) and another contains gene symbols.
`bg`	List of gene symbols containing the background gene list (including hit genes). If `bg=NULL`, an appropriate gene background will be created automatically.
`thresh`	The number of up- and down- regulated genes to be included in each analysis (Default: 250).
`annotLevel`	An integer indicating which level of `sct_data` to analyse (Default: 1).
`reps`	Number of random gene lists to generate (Default: 100, but should be >=10,000 for publication-quality results).
`full_results`	The full output of ewce_expression_data for the same gene list.
`listFileName`	String used as the root for files saved using this function.
`showGNameThresh`	Integer. If a gene has over X percent of it's expression proportion in a cell type, then list the gene name.
`ttSpecies`	The species the differential expression table was generated from.
`sctSpecies`	Species that `sct_data` is currently formatted as (no longer limited to just "mouse" and "human"). See list_species for all available species.
`output_species`	Species to convert `sct_data` and `hits` to (Default: "human"). See list_species for all available species.
`sortBy`	Column name of metric in `tt` which should be used to sort up- from down- regulated genes (Default: "t").
`sig_only`	Should plots only be generated for cells which have significant changes?
`sig_col`	Column name in `tt` that contains the significance values.
`sig_thresh`	Threshold by which to filter `tt` by `sig_col`.
`celltype_col`	Column within `tt` that contains celltype names.
`plot_types`	Plot types to generate.
`save_dir`	Directory where the BootstrapPlots folder should be saved, default is a temp directory.
`method`	R package to use for gene mapping: `"gprofiler"` : Slower but more species and genes. `"homologene"` : Faster but fewer species and genes. `"babelgene"` : Faster but fewer species and genes. Also gives consensus scores for each gene mapping based on a several different data sources.
`verbose`	Print messages.

Value

Saves a set of PDF files containing graphs. Then returns a nested list with each plot and the path where it was saved to. Files start with one of the following:

qqplot_noText: sorts the gene list according to how enriched it is in the relevant cell type. Plots the value in the target list against the mean value in the bootstrapped lists.
qqplot_wtGSym: as above but labels the gene symbols for the highest expressed genes.
bootDists: rather than just showing the mean of the bootstrapped lists, a boxplot shows the distribution of values
bootDists_LOG: shows the bootstrapped distributions with the y-axis shown on a log scale

Examples

## Load the single cell data
ctd <- ewceData::ctd()

## Set the parameters for the analysis
## Use 3 bootstrap lists for speed, for publishable analysis use >10,000
reps <- 3
annotLevel <- 1 # <- Use cell level annotations (i.e. Interneurons)
## Use 5 up/down regulated genes (thresh) for speed, default is 250
thresh <- 5

## Load the top table
tt_alzh <- ewceData::tt_alzh()

## See ?example_transcriptome_results for full code to produce tt_results
tt_results <- EWCE::example_transcriptome_results()

## Bootstrap significance test,
## no control for transcript length or GC content
savePath <- EWCE::generate_bootstrap_plots_for_transcriptome(
    sct_data = ctd,
    tt = tt_alzh,
    thresh = thresh,
    annotLevel = 1,
    full_results = tt_results,
    listFileName = "examples",
    reps = reps,
    ttSpecies = "human",
    sctSpecies = "mouse", 
    # Only do one plot type for demo purposes
    plot_types = "bootstrap" 
) 
## Load the single cell data
ctd <- ewceData::ctd()

## Set the parameters for the analysis
## Use 3 bootstrap lists for speed, for publishable analysis use >10,000
reps <- 3
annotLevel <- 1 # <- Use cell level annotations (i.e. Interneurons)
## Use 5 up/down regulated genes (thresh) for speed, default is 250
thresh <- 5

## Load the top table
tt_alzh <- ewceData::tt_alzh()

## See ?example_transcriptome_results for full code to produce tt_results
tt_results <- EWCE::example_transcriptome_results()

## Bootstrap significance test,
## no control for transcript length or GC content
savePath <- EWCE::generate_bootstrap_plots_for_transcriptome(
    sct_data = ctd,
    tt = tt_alzh,
    thresh = thresh,
    annotLevel = 1,
    full_results = tt_results,
    listFileName = "examples",
    reps = reps,
    ttSpecies = "human",
    sctSpecies = "mouse", 
    # Only do one plot type for demo purposes
    plot_types = "bootstrap" 
)

Generate CellTypeData (CTD) file

Description

generate_celltype_data takes gene expression data and cell type annotations and creates CellTypeData (CTD) files which contain matrices of mean expression and specificity per cell type.

Usage

generate_celltype_data(
  exp,
  annotLevels,
  groupName,
  no_cores = 1,
  savePath = tempdir(),
  file_prefix = "ctd",
  as_sparse = TRUE,
  as_DelayedArray = FALSE,
  normSpec = FALSE,
  convert_orths = FALSE,
  input_species = "mouse",
  output_species = "human",
  non121_strategy = "drop_both_species",
  method = "homologene",
  force_new_file = TRUE,
  specificity_quantiles = TRUE,
  numberOfBins = 40,
  dendrograms = TRUE,
  return_ctd = FALSE,
  verbose = TRUE,
  ...
)
generate_celltype_data(
  exp,
  annotLevels,
  groupName,
  no_cores = 1,
  savePath = tempdir(),
  file_prefix = "ctd",
  as_sparse = TRUE,
  as_DelayedArray = FALSE,
  normSpec = FALSE,
  convert_orths = FALSE,
  input_species = "mouse",
  output_species = "human",
  non121_strategy = "drop_both_species",
  method = "homologene",
  force_new_file = TRUE,
  specificity_quantiles = TRUE,
  numberOfBins = 40,
  dendrograms = TRUE,
  return_ctd = FALSE,
  verbose = TRUE,
  ...
)

Arguments

`exp`	Numerical matrix with row for each gene and column for each cell. Row names are gene symbols. Column names are cell IDs which can be cross referenced against the annot data frame.
`annotLevels`	List with arrays of strings containing the cell type names associated with each column in `exp`.
`groupName`	A human readable name for referring to the dataset being used.
`no_cores`	Number of cores that should be used to speedup the computation. NOTE: Use `no_cores=1` when using this package in windows system.
`savePath`	Directory where the CTD file should be saved.
`file_prefix`	Prefix to add to saved CTD file name.
`as_sparse`	Convert `exp` to a sparse `Matrix`.
`as_DelayedArray`	Convert `exp` to `DelayedArray`.
`normSpec`	Boolean indicating whether specificity data should be transformed to a normal distribution by cell type, giving equivalent scores across all cell types.
`convert_orths`	If `input_species!=output_species` and `convert_orths=TRUE`, will drop genes without 1:1 `output_species` orthologs and then convert `exp` gene names to those of `output_species`.
`input_species`	The species that the `exp` dataset comes from. See list_species for all available species.
`output_species`	Species to convert `exp` to (Default: "human"). See list_species for all available species.
`non121_strategy`	How to handle genes that don't have 1:1 mappings between `input_species`:`output_species`. Options include: `"drop_both_species" or "dbs" or 1` : Drop genes that have duplicate mappings in either the `input_species` or `output_species` (DEFAULT). `"drop_input_species" or "dis" or 2` : Only drop genes that have duplicate mappings in the `input_species`. `"drop_output_species" or "dos" or 3` : Only drop genes that have duplicate mappings in the `output_species`. `"keep_both_species" or "kbs" or 4` : Keep all genes regardless of whether they have duplicate mappings in either species. `"keep_popular" or "kp" or 5` : Return only the most "popular" interspecies ortholog mappings. This procedure tends to yield a greater number of returned genes but at the cost of many of them not being true biological 1:1 orthologs. `"sum","mean","median","min" or "max"` : When `gene_df` is a matrix and `gene_output="rownames"`, these options will aggregate many-to-one gene mappings (`input_species`-to-`output_species`) after dropping any duplicate genes in the `output_species`.
`method`	R package to use for gene mapping: `"gprofiler"` : Slower but more species and genes. `"homologene"` : Faster but fewer species and genes. `"babelgene"` : Faster but fewer species and genes. Also gives consensus scores for each gene mapping based on a several different data sources.
`force_new_file`	If a file of the same name as the one being created already exists, overwrite it.
`specificity_quantiles`	Compute specificity quantiles. Recommended to set to `TRUE`.
`numberOfBins`	Number of quantile 'bins' to use (40 is recommended).
`dendrograms`	Add dendrogram plots
`return_ctd`	Return the CTD object in a list along with the file name, instead of just the file name.
`verbose`	Print messages.
`...`	Arguments passed on to `orthogene::convert_orthologs` `gene_df` Data object containing the genes (see `gene_input` for options on how the genes can be stored within the object). Can be one of the following formats: `matrix` : A sparse or dense matrix. `data.frame` : A `data.frame`, `data.table`. or `tibble`. codelist : A `list` or character `vector`. Genes, transcripts, proteins, SNPs, or genomic ranges can be provided in any format (HGNC, Ensembl, RefSeq, UniProt, etc.) and will be automatically converted to gene symbols unless specified otherwise with the `...` arguments. Note: If you set `method="homologene"`, you must either supply genes in gene symbol format (e.g. "Sox2") OR set `standardise_genes=TRUE`. `gene_input` Which aspect of `gene_df` to get gene names from: `"rownames"` : From row names of data.frame/matrix. `"colnames"` : From column names of data.frame/matrix. `<column name>` : From a column in `gene_df`, e.g. `"gene_names"`. `gene_output` How to return genes. Options include: `"rownames"` : As row names of `gene_df`. `"colnames"` : As column names of `gene_df`. `"columns"` : As new columns "input_gene", "ortholog_gene" (and "input_gene_standard" if `standardise_genes=TRUE`) in `gene_df`. `"dict"` : As a dictionary (named list) where the names are input_gene and the values are ortholog_gene. `"dict_rev"` : As a reversed dictionary (named list) where the names are ortholog_gene and the values are input_gene. `standardise_genes` If `TRUE` AND `gene_output="columns"`, a new column "input_gene_standard" will be added to `gene_df` containing standardised HGNC symbols identified by gorth. `drop_nonorths` Drop genes that don't have an ortholog in the `output_species`. `agg_fun` Aggregation function passed to aggregate_mapped_genes. Set to `NULL` to skip aggregation step (default). `mthreshold` Maximum number of ortholog names per gene to show. Passed to gorth. Only used when `method="gprofiler"` (DEFAULT : `Inf`). `sort_rows` Sort `gene_df` rows alphanumerically. `gene_map` A data.frame that maps the current gene names to new gene names. This function's behaviour will adapt to different situations as follows: `gene_map=<data.frame>` : When a data.frame containing the gene key:value columns (specified by `input_col` and `output_col`, respectively) is provided, this will be used to perform aggregation/expansion. `gene_map=NULL` and `input_species!=output_species` : A `gene_map` is automatically generated by map_orthologs to perform inter-species gene aggregation/expansion. `gene_map=NULL` and `input_species==output_species` : A `gene_map` is automatically generated by map_genes to perform within-species gene gene symbol standardization and aggregation/expansion. `input_col` Column name within `gene_map` with gene names matching the row names of `X`. `output_col` Column name within `gene_map` with gene names that you wish you map the row names of `X` onto.

Value

File names for the saved CellTypeData (CTD) files.

Examples

# Load the single cell data
cortex_mrna <- ewceData::cortex_mrna()
# Use only a subset to keep the example quick
expData <- cortex_mrna$exp[1:100, ]
l1 <- cortex_mrna$annot$level1class
l2 <- cortex_mrna$annot$level2class
annotLevels <- list(l1 = l1, l2 = l2)
fNames_ALLCELLS <- EWCE::generate_celltype_data(
    exp = expData,
    annotLevels = annotLevels,
    groupName = "allKImouse"
)
# Load the single cell data
cortex_mrna <- ewceData::cortex_mrna()
# Use only a subset to keep the example quick
expData <- cortex_mrna$exp[1:100, ]
l1 <- cortex_mrna$annot$level1class
l2 <- cortex_mrna$annot$level2class
annotLevels <- list(l1 = l1, l2 = l2)
fNames_ALLCELLS <- EWCE::generate_celltype_data(
    exp = expData,
    annotLevels = annotLevels,
    groupName = "allKImouse"
)

get_celltype_table

Description

get_celltype_table Generates a table that can be used for supplemenary tables of publications. The table lists how many cells are associated with each cell type, the level of annotation, and the dataset from which it was generated.

Usage

get_celltype_table(annot)
get_celltype_table(annot)

Arguments

annot

An annotation dataframe, which columns named 'level1class', 'level2class' and 'dataset_name'

Value

A dataframe with columns 'name', 'level', 'freq' and 'dataset_name'

Examples

# See PrepLDSC.Rmd for origin of merged_ALLCELLS$annot
cortex_mrna <- ewceData::cortex_mrna()
cortex_mrna$annot$dataset_name <- "cortex_mrna"
celltype_table <- EWCE::get_celltype_table(cortex_mrna$annot)
# See PrepLDSC.Rmd for origin of merged_ALLCELLS$annot
cortex_mrna <- ewceData::cortex_mrna()
cortex_mrna$annot$dataset_name <- "cortex_mrna"
celltype_table <- EWCE::get_celltype_table(cortex_mrna$annot)

Assess whether an object is a DelayedArray.

Description

Assess whether an object is a DelayedArray or one of its derived object types.

Usage

is_delayed_array(X)
is_delayed_array(X)

Arguments

X

Object.

Value

boolean

Assess whether an object is a Matrix

Description

Assess whether an object is a Matrix or one of its derived object types.

Usage

is_matrix(X)
is_matrix(X)

Arguments

X

Object.

Value

boolean

Assess whether an object is a sparse matrix

Description

Assess whether an object is a sparse matrix or one of its derived object types.

Usage

is_sparse_matrix(X)
is_sparse_matrix(X)

Arguments

X

Object.

Value

boolean

List all species

Description

List all species that EWCE can convert genes from/to. Wrapper function for map_species.

Usage

list_species(verbose = TRUE)
list_species(verbose = TRUE)

Arguments

verbose

Print messages.

Value

List of species EWCE can input/output genes as.

Examples

list_species()
list_species()

`load_rdata`

Description

Load processed data (.rda format) using a function that assigns it to a specific variable (so you don't have to guess what the loaded variable name is).

Usage

load_rdata(fileName)
load_rdata(fileName)

Arguments

fileName

Name of the file to load.

Value

Data object.

Examples

tmp <- tempfile()
save(mtcars, file = tmp)
mtcars2 <- load_rdata(tmp)
tmp <- tempfile()
save(mtcars, file = tmp)
mtcars2 <- load_rdata(tmp)

Merge multiple CellTypeDataset references

Description

Import CellTypeDataset (CTD) references from a remote repository, standardize each, and then merge into one CTD. Optionally, can return these as a merged SingleCellExperiment.

Usage

merge_ctd(
  CTD_list,
  save_dir = tempdir(),
  standardise_CTD = FALSE,
  as_SCE = FALSE,
  gene_union = TRUE,
  merge_levels = seq(1, 5),
  save_split_SCE = FALSE,
  save_split_CTD = FALSE,
  save_merged_SCE = TRUE,
  force_new_quantiles = FALSE,
  numberOfBins = 40,
  as_sparse = TRUE,
  as_DelayedArray = FALSE,
  verbose = TRUE,
  ...
)
merge_ctd(
  CTD_list,
  save_dir = tempdir(),
  standardise_CTD = FALSE,
  as_SCE = FALSE,
  gene_union = TRUE,
  merge_levels = seq(1, 5),
  save_split_SCE = FALSE,
  save_split_CTD = FALSE,
  save_merged_SCE = TRUE,
  force_new_quantiles = FALSE,
  numberOfBins = 40,
  as_sparse = TRUE,
  as_DelayedArray = FALSE,
  verbose = TRUE,
  ...
)

Arguments

`CTD_list`	(Named) list of `CellTypeDatasets`.
`save_dir`	The directory to save merged files in.
`standardise_CTD`	Whether to run `standardise_ctd`.
`as_SCE`	If `TRUE` (default), returns the merged results as a named list of SingleCellExperiments. If `FALSE`, returns as a CTD object.
`gene_union`	Whether to take the gene union or intersection when merging matrices (mean_exp,specificity, etc.).
`merge_levels`	Which CTD levels you want to merge. Can be a single value (e.g. `merge_levels=5`) or a list c(e.g. `merge_levels=c(1:5)`). If some CTD don't have the same number of levels, the maximum level depth available in that CTD will be used instead.
`save_split_SCE`	Whether to save individual SCE files in the subdirectory standardized_CTD_SCE.
`save_split_CTD`	Whether to save individual CTD files in the subdirectory standardized_CTD.
`save_merged_SCE`	Save the final merged SCE object, or simply to return it.
`force_new_quantiles`	If specificity quantiles matrix already exists, create a new one.
`numberOfBins`	Number of bins to compute specificity quantiles with.
`as_sparse`	Convert matrices to sparse matrix.
`as_DelayedArray`	Convert matrices to `DelayedArray`.
`verbose`	Print messages.
`...`	Additional arguments to be passed to `standardise_ctd`.

Value

List of CellTypeDatasets or SingleCellExperiments.

Examples

## Let's pretend these are different CTD datasets
ctd1 <- ewceData::ctd()
ctd2 <- ctd1
CTD_list <- list(ctd1, ctd2)
CTD_merged <- EWCE::merge_ctd(CTD_list = CTD_list)
## Let's pretend these are different CTD datasets
ctd1 <- ewceData::ctd()
ctd2 <- ctd1
CTD_list <- list(ctd1, ctd2)
CTD_merged <- EWCE::merge_ctd(CTD_list = CTD_list)

Merge multiple `SingleCellExperiment` objects

Description

Merge several SingleCellExperiment (SCE) objects from different batches/experiments. Extracted from the scMerge package.

Usage

merge_sce(
  sce_list,
  method = "intersect",
  cut_off_batch = 0.01,
  cut_off_overall = 0.01,
  use_assays = NULL,
  colData_names = NULL,
  batch_names = NULL,
  verbose = TRUE
)
merge_sce(
  sce_list,
  method = "intersect",
  cut_off_batch = 0.01,
  cut_off_overall = 0.01,
  use_assays = NULL,
  colData_names = NULL,
  batch_names = NULL,
  verbose = TRUE
)

Arguments

`sce_list`	A list contains the `SingleCellExperiment` Object from each batch.
`method`	A string indicates the method of combining the gene expression matrix, either `union` or `intersect`. Default to `intersect`. `union` only supports matrix class.
`cut_off_batch`	A numeric vector indicating the cut-off for the proportion of a gene is expressed within each batch.
`cut_off_overall`	A numeric vector indicating the cut-off for the proportion of a gene is expressed overall data.
`use_assays`	A string vector indicating the expression matrices to be combined. The first assay named will be used to determine the proportion of zeros.
`colData_names`	A string vector indicating the `colData` that are combined.
`batch_names`	A string vector indicating the batch names for the output SCE object.
`verbose`	Print messages.

Value

A SingleCellExperiment object with the list of SCE objects combined.

Author(s)

Yingxin Lin (modified by Brian Schilder)

Source

scMerge.

Examples

ctd <- ewceData::ctd()
sce_list <- EWCE::ctd_to_sce(object = ctd)
sce_combine <- merge_sce(sce_list = sce_list)
ctd <- ewceData::ctd()
sce_list <- EWCE::ctd_to_sce(object = ctd)
sce_combine <- merge_sce(sce_list = sce_list)

Merge two exp files

Description

merge_two_expfiles Used to combine two single cell type datasets.

Usage

merge_two_expfiles(
  exp1,
  exp2,
  annot1,
  annot2,
  name1 = "",
  name2 = "",
  as_sparse = TRUE,
  as_DelayedArray = FALSE,
  verbose = TRUE
)
merge_two_expfiles(
  exp1,
  exp2,
  annot1,
  annot2,
  name1 = "",
  name2 = "",
  as_sparse = TRUE,
  as_DelayedArray = FALSE,
  verbose = TRUE
)

Arguments

`exp1`	Numerical expression matrix for dataset1 with row for each gene and column for each cell. Row names are gene symbols. Column names are cell IDs which can be cross referenced against the annot data frame.
`exp2`	Numerical expression matrix for dataset2 with row for each gene and column for each cell. Row names are gene symbols. Column names are cell IDs which can be cross referenced against the annot data frame.
`annot1`	Annotation data frame for dataset1 which contains three columns at least: cell_id, level1class and level2class
`annot2`	Annotation data frame for dataset2 which contains three columns at least: cell_id, level1class and level2class
`name1`	Name used to refer to dataset 1. Leave blank if it's already a merged dataset.
`name2`	Name used to refer to dataset 2. Leave blank if it's already a merged dataset.
`as_sparse`	Convert the merged `exp` to a sparse matrix.
`as_DelayedArray`	Convert the merged `exp` to a `DelayedArray`.
`verbose`	Print messages.

Value

List containing merged exp and annot.

Examples

cortex_mrna <- ewceData::cortex_mrna()
exp1 <- cortex_mrna$exp[, 1:50]
exp2 <- cortex_mrna$exp[, 51:100]
annot1 <- cortex_mrna$annot[1:50, ]
annot2 <- cortex_mrna$annot[51:100, ]
merged_res <- EWCE::merge_two_expfiles(
    exp1 = exp1,
    exp2 = exp2,
    annot1 = annot1,
    annot2 = annot2,
    name1 = "dataset1",
    name2 = "dataset2"
)
cortex_mrna <- ewceData::cortex_mrna()
exp1 <- cortex_mrna$exp[, 1:50]
exp2 <- cortex_mrna$exp[, 51:100]
annot1 <- cortex_mrna$annot[1:50, ]
annot2 <- cortex_mrna$annot[51:100, ]
merged_res <- EWCE::merge_two_expfiles(
    exp1 = exp1,
    exp2 = exp2,
    annot1 = annot1,
    annot2 = annot2,
    name1 = "dataset1",
    name2 = "dataset2"
)

Multiple EWCE results from multiple studies

Description

merged_ewce combines enrichment results from multiple studies targetting the same scientific problem

Usage

merged_ewce(results, reps = 100)
merged_ewce(results, reps = 100)

Arguments

`results`	a list of EWCE results generated using add_res_to_merging_list.
`reps`	Number of random gene lists to generate (Default=100 but should be >=10,000 for publication-quality results).

Value

dataframe in which each row gives the statistics (p-value, fold change and number of standard deviations from the mean) associated with the enrichment of the stated cell type in the gene list.

Examples

# Load the single cell data
ctd <- ewceData::ctd()

# Use 3 bootstrap lists for speed, for publishable analysis use >10000
reps <- 3
# Use 5 up/down regulated genes (thresh) for speed, default is 250
thresh <- 5

# Load the data
tt_alzh_BA36 <- ewceData::tt_alzh_BA36()
tt_alzh_BA44 <- ewceData::tt_alzh_BA44()

# Run EWCE analysis
tt_results_36 <- EWCE::ewce_expression_data(
    sct_data = ctd,
    tt = tt_alzh_BA36,
    thresh = thresh,
    annotLevel = 1,
    reps = reps,
    ttSpecies = "human",
    sctSpecies = "mouse"
)
tt_results_44 <- EWCE::ewce_expression_data(
    sct_data = ctd,
    tt = tt_alzh_BA44,
    thresh = thresh,
    annotLevel = 1,
    reps = reps,
    ttSpecies = "human",
    sctSpecies = "mouse"
)

# Fill a list with the results
results <- EWCE::add_res_to_merging_list(tt_results_36)
results <- EWCE::add_res_to_merging_list(tt_results_44, results)

# Perform the merged analysis
# For publication reps should be higher
merged_res <- EWCE::merged_ewce(
    results = results,
    reps = 2
)
print(merged_res)
# Load the single cell data
ctd <- ewceData::ctd()

# Use 3 bootstrap lists for speed, for publishable analysis use >10000
reps <- 3
# Use 5 up/down regulated genes (thresh) for speed, default is 250
thresh <- 5

# Load the data
tt_alzh_BA36 <- ewceData::tt_alzh_BA36()
tt_alzh_BA44 <- ewceData::tt_alzh_BA44()

# Run EWCE analysis
tt_results_36 <- EWCE::ewce_expression_data(
    sct_data = ctd,
    tt = tt_alzh_BA36,
    thresh = thresh,
    annotLevel = 1,
    reps = reps,
    ttSpecies = "human",
    sctSpecies = "mouse"
)
tt_results_44 <- EWCE::ewce_expression_data(
    sct_data = ctd,
    tt = tt_alzh_BA44,
    thresh = thresh,
    annotLevel = 1,
    reps = reps,
    ttSpecies = "human",
    sctSpecies = "mouse"
)

# Fill a list with the results
results <- EWCE::add_res_to_merging_list(tt_results_36)
results <- EWCE::add_res_to_merging_list(tt_results_44, results)

# Perform the merged analysis
# For publication reps should be higher
merged_res <- EWCE::merged_ewce(
    results = results,
    reps = 2
)
print(merged_res)

Plot CellTypeData metrics

Description

Plot CellTypeData metrics such as mean_exp, specificity and/or specificity_quantiles.

Usage

plot_ctd(ctd, genes, level = 1, metric = "specificity", show_plot = TRUE)
plot_ctd(ctd, genes, level = 1, metric = "specificity", show_plot = TRUE)

Arguments

`ctd`	CellTypeDataset.
`genes`	Which genes in `ctd` to plot.
`level`	Annotation level in `ctd` to plot.
`metric`	Which metric in the `ctd` to plot: "mean_exp" "specificity" "specificity_quantiles"
`show_plot`	Whether to print the plot or simply return it.

Value

ggplot object.

Examples

ctd <- ewceData::ctd()
plt <- EWCE::plot_ctd(ctd, genes = c("Apoe", "Gfap", "Gapdh"))
ctd <- ewceData::ctd()
plt <- EWCE::plot_ctd(ctd, genes = c("Apoe", "Gfap", "Gapdh"))

prep.dendro

Description

prep_dendro adds a dendrogram to a CellTypeDataset (CTD).

Usage

prep.dendro(ctdIN)
prep.dendro(ctdIN)

Arguments

ctdIN

A single annotLevel of a ctd, i.e. ctd[[1]] (the function is intended to be used via apply).

Value

A CellTypeDataset with dendrogram plotting info added.

Normalize expression matrix

Description

Normalize expression matrix by accounting for library size. Uses sctransform.

Usage

sct_normalize(exp, as_sparse = TRUE, verbose = TRUE)
sct_normalize(exp, as_sparse = TRUE, verbose = TRUE)

Arguments

`exp`	Gene x cell expression matrix.
`as_sparse`	Convert `exp` to sparse matrix.
`verbose`	Print messages.

Value

Normalised expression matrix.

Examples

cortex_mrna <- ewceData::cortex_mrna()
exp_sct_normed <- EWCE::sct_normalize(exp = cortex_mrna$exp[1:300, ])
cortex_mrna <- ewceData::cortex_mrna()
exp_sct_normed <- EWCE::sct_normalize(exp = cortex_mrna$exp[1:300, ])

Convert a CellTypeDataset into standardized format

Description

This function will take a CTD, drop all genes without 1:1 orthologs with the output_species ("human" by default), convert the remaining genes to gene symbols, assign names to each level, and convert all matrices to sparse matrices and/or DelayedArray.

Usage

standardise_ctd(
  ctd,
  dataset,
  input_species = NULL,
  output_species = "human",
  sctSpecies_origin = input_species,
  non121_strategy = "drop_both_species",
  method = "homologene",
  force_new_quantiles = TRUE,
  force_standardise = FALSE,
  remove_unlabeled_clusters = FALSE,
  numberOfBins = 40,
  keep_annot = TRUE,
  keep_plots = TRUE,
  as_sparse = TRUE,
  as_DelayedArray = FALSE,
  rename_columns = TRUE,
  make_columns_unique = FALSE,
  verbose = TRUE,
  ...
)
standardise_ctd(
  ctd,
  dataset,
  input_species = NULL,
  output_species = "human",
  sctSpecies_origin = input_species,
  non121_strategy = "drop_both_species",
  method = "homologene",
  force_new_quantiles = TRUE,
  force_standardise = FALSE,
  remove_unlabeled_clusters = FALSE,
  numberOfBins = 40,
  keep_annot = TRUE,
  keep_plots = TRUE,
  as_sparse = TRUE,
  as_DelayedArray = FALSE,
  rename_columns = TRUE,
  make_columns_unique = FALSE,
  verbose = TRUE,
  ...
)

Arguments

`ctd`	Input CellTypeData.
`dataset`	CellTypeData. name.
`input_species`	Which species the gene names in `exp` come from. See list_species for all available species.
`output_species`	Which species' genes names to convert `exp` to. See list_species for all available species.
`sctSpecies_origin`	Species that the `sct_data` originally came from, regardless of its current gene format (e.g. it was previously converted from mouse to human gene orthologs). This is used for computing an appropriate backgrund.
`non121_strategy`	How to handle genes that don't have 1:1 mappings between `input_species`:`output_species`. Options include: `"drop_both_species" or "dbs" or 1` : Drop genes that have duplicate mappings in either the `input_species` or `output_species` (DEFAULT). `"drop_input_species" or "dis" or 2` : Only drop genes that have duplicate mappings in the `input_species`. `"drop_output_species" or "dos" or 3` : Only drop genes that have duplicate mappings in the `output_species`. `"keep_both_species" or "kbs" or 4` : Keep all genes regardless of whether they have duplicate mappings in either species. `"keep_popular" or "kp" or 5` : Return only the most "popular" interspecies ortholog mappings. This procedure tends to yield a greater number of returned genes but at the cost of many of them not being true biological 1:1 orthologs. `"sum","mean","median","min" or "max"` : When `gene_df` is a matrix and `gene_output="rownames"`, these options will aggregate many-to-one gene mappings (`input_species`-to-`output_species`) after dropping any duplicate genes in the `output_species`.
`method`	R package to use for gene mapping: `"gprofiler"` : Slower but more species and genes. `"homologene"` : Faster but fewer species and genes. `"babelgene"` : Faster but fewer species and genes. Also gives consensus scores for each gene mapping based on a several different data sources.
`force_new_quantiles`	By default, quantile computation is skipped if they have already been computed. Set `=TRUE` to override this and generate new quantiles.
`force_standardise`	If `ctd` has already been standardised, whether to rerun standardisation anyway (Default: `FALSE`).
`remove_unlabeled_clusters`	Remove any samples that have numeric column names.
`numberOfBins`	Number of non-zero quantile bins.
`keep_annot`	Keep the column annotation data if provided.
`keep_plots`	Keep the dendrograms if provided.
`as_sparse`	Convert to sparse matrix.
`as_DelayedArray`	Convert to `DelayedArray`.
`rename_columns`	Remove `replace_chars` from column names.
`make_columns_unique`	Rename each columns with the prefix `dataset.species.celltype`.
`verbose`	Print messages. Set `verbose=2` if you want to print all messages from internal functions as well.
`...`	Arguments passed on to `orthogene::convert_orthologs` `gene_df` Data object containing the genes (see `gene_input` for options on how the genes can be stored within the object). Can be one of the following formats: `matrix` : A sparse or dense matrix. `data.frame` : A `data.frame`, `data.table`. or `tibble`. codelist : A `list` or character `vector`. Genes, transcripts, proteins, SNPs, or genomic ranges can be provided in any format (HGNC, Ensembl, RefSeq, UniProt, etc.) and will be automatically converted to gene symbols unless specified otherwise with the `...` arguments. Note: If you set `method="homologene"`, you must either supply genes in gene symbol format (e.g. "Sox2") OR set `standardise_genes=TRUE`. `gene_input` Which aspect of `gene_df` to get gene names from: `"rownames"` : From row names of data.frame/matrix. `"colnames"` : From column names of data.frame/matrix. `<column name>` : From a column in `gene_df`, e.g. `"gene_names"`. `gene_output` How to return genes. Options include: `"rownames"` : As row names of `gene_df`. `"colnames"` : As column names of `gene_df`. `"columns"` : As new columns "input_gene", "ortholog_gene" (and "input_gene_standard" if `standardise_genes=TRUE`) in `gene_df`. `"dict"` : As a dictionary (named list) where the names are input_gene and the values are ortholog_gene. `"dict_rev"` : As a reversed dictionary (named list) where the names are ortholog_gene and the values are input_gene. `standardise_genes` If `TRUE` AND `gene_output="columns"`, a new column "input_gene_standard" will be added to `gene_df` containing standardised HGNC symbols identified by gorth. `drop_nonorths` Drop genes that don't have an ortholog in the `output_species`. `agg_fun` Aggregation function passed to aggregate_mapped_genes. Set to `NULL` to skip aggregation step (default). `mthreshold` Maximum number of ortholog names per gene to show. Passed to gorth. Only used when `method="gprofiler"` (DEFAULT : `Inf`). `sort_rows` Sort `gene_df` rows alphanumerically. `gene_map` A data.frame that maps the current gene names to new gene names. This function's behaviour will adapt to different situations as follows: `gene_map=<data.frame>` : When a data.frame containing the gene key:value columns (specified by `input_col` and `output_col`, respectively) is provided, this will be used to perform aggregation/expansion. `gene_map=NULL` and `input_species!=output_species` : A `gene_map` is automatically generated by map_orthologs to perform inter-species gene aggregation/expansion. `gene_map=NULL` and `input_species==output_species` : A `gene_map` is automatically generated by map_genes to perform within-species gene gene symbol standardization and aggregation/expansion. `input_col` Column name within `gene_map` with gene names matching the row names of `X`. `output_col` Column name within `gene_map` with gene names that you wish you map the row names of `X` onto.

Value

Standardised CellTypeDataset.

Examples

ctd <- ewceData::ctd()
ctd_std <- EWCE::standardise_ctd(
    ctd = ctd,
    input_species = "mouse",
    dataset = "Zeisel2016"
)
ctd <- ewceData::ctd()
ctd_std <- EWCE::standardise_ctd(
    ctd = ctd,
    input_species = "mouse",
    dataset = "Zeisel2016"
)

Package 'EWCE'

Help Index

EWCE: Expression Weighted Celltype Enrichment

Description

Details

Author(s)

See Also

Add to results to merging list

Description

Usage

Arguments

Value

Examples

bin_columns_into_quantiles

Description

Usage

Arguments

Value

Examples

bin_specificity_into_quantiles

Description

Usage

Arguments

Value

Examples

Bootstrap cell type enrichment test

Description

Usage

Arguments

Value

Examples

check_ewce_genelist_inputs

Description

Usage

Arguments

Value

Examples

Get percentage of target cell type hits

Description

Usage

Arguments

Value

Examples

Celltype controlled geneset enrichment

Description

Usage

Arguments

Value

Examples

CellTypeDataset to SingleCellExperiment

Description

Usage

Arguments

Value

Examples

Drop uninformative genes

Description

Usage

Arguments

Value

Examples

Bootstrap cell type enrichment test for transcriptome data

Description

Usage

Arguments

Value

Examples

Plot EWCE results

Description

Usage

Arguments

Value

Examples

Example bootstrap enrichment results

Description

Usage

Arguments

Value

Source

Examples

`bin_columns_into_quantiles`