Package 'cellity' reference manual

Title:	Quality Control for Single-Cell RNA-seq Data
Description:	A support vector machine approach to identifying and filtering low quality cells from single-cell RNA-seq datasets.
Authors:	Tomislav Illicic, Davis McCarthy
Maintainer:	Tomislav Ilicic <[email protected]>
License:	GPL (>= 2)
Version:	1.35.0
Built:	2025-03-24 05:56:18 UTC
Source:	https://github.com/bioc/cellity

Quality Control for Single-Cell RNA-seq Data

Description

cellity provides a support vector machine and PCA approaches to identifying and filtering low quality cells from single-cell RNA-seq datasets.

ASSESS CELL QUALITY USING PCA AND OUTLIER DETECTION

Description

ASSESS CELL QUALITY USING PCA AND OUTLIER DETECTION

Usage

assess_cell_quality_PCA(features, file = "")
assess_cell_quality_PCA(features, file = "")

Arguments

`features`	Input dataset containing features (cell x features)
`file`	Output_file where plot is saved

Details

This function applies PCA on features and uses outlier detection to determine which cells are low and which are high quality

Value

Returns a dataframe indicating which cell is low or high quality (0 or 1 respectively)

Examples

data(training_mES_features)
training_mES_features_all <- training_mES_features[[1]]
training_quality_PCA_allF <- assess_cell_quality_PCA(training_mES_features_all)
data(training_mES_features)
training_mES_features_all <- training_mES_features[[1]]
training_quality_PCA_allF <- assess_cell_quality_PCA(training_mES_features_all)

Assess quality of a cell - SVM version

Description

Assess quality of a cell - SVM version

Usage

assess_cell_quality_SVM(training_set_features, training_set_labels,
  ensemble_param, test_set_features)
assess_cell_quality_SVM(training_set_features, training_set_labels,
  ensemble_param, test_set_features)

Arguments

`training_set_features`	A training set containing features (cells x features) for prediction
`training_set_labels`	Annotation of each individual cell if high or low quality (1 or 0 respectively)
`ensemble_param`	Dataframe of parameters for SVM
`test_set_features`	Dataset to predict containing features (cells x features)

Details

This function takes a traning set + annotation to predict a test set. It requires that hyper-parameters have been optimised.

Value

Returns a dataframe indicating which cell is low or high quality (0 or 1 respectively)

data.frame with decision on quality of cells

Examples

data(param_mES_all)
data(training_mES_features)
data(training_mES_labels)
data(mES1_features)
data(mES1_labels)
mES1_features_all <- mES1_features[[1]]
training_mES_features_all <- training_mES_features[[1]]
mES1_quality_SVM <- assess_cell_quality_SVM( training_mES_features_all, 
training_mES_labels[,2], param_mES_all, mES1_features_all)
data(param_mES_all)
data(training_mES_features)
data(training_mES_labels)
data(mES1_features)
data(mES1_labels)
mES1_features_all <- mES1_features[[1]]
training_mES_features_all <- training_mES_features[[1]]
mES1_quality_SVM <- assess_cell_quality_SVM( training_mES_features_all, 
training_mES_labels[,2], param_mES_all, mES1_features_all)

Additional human genes that are used in feature extraction

Description

This list contains human genes that are used for feature extraction of biological features

Usage

extra_human_genes
extra_human_genes

Format

a list containing vectors of genes. Name indicates which GO category.

Value

NULL, but makes available a list with metadata

Author(s)

Tomislav Ilicic & Davis McCarthy, 2015-03-05

Source

Wellcome Trust Sanger Institute

Additional mouse genes that are used in feature extraction

Description

This list contains mouse genes that are used for feature extraction of biological features

Usage

extra_mouse_genes
extra_mouse_genes

Format

a list containing vectors of genes. Name indicates which GO category.

Value

NULL, but makes available a list with metadata

Author(s)

Tomislav Ilicic & Davis McCarthy, 2015-03-05

Source

Wellcome Trust Sanger Institute

Extracts biological and technical features for given dataset

Description

Extracts biological and technical features for given dataset

Usage

extract_features(counts_nm, read_metrics, prefix = "", output_dir = "",
  common_features = NULL, GO_terms = NULL, extra_genes = NULL,
  organism = "mouse")
extract_features(counts_nm, read_metrics, prefix = "", output_dir = "",
  common_features = NULL, GO_terms = NULL, extra_genes = NULL,
  organism = "mouse")

Arguments

`counts_nm`	Gene expression counts dataframe (genes x cells). Either normalised by library size or TPM values
`read_metrics`	Dataframe with mapping statistics produced by python pipeline
`prefix`	Prefix of outputfiles
`output_dir`	Output directory of files
`common_features`	Subset of features that are applicable within one species, but across cell types
`GO_terms`	DataFrame with gene ontology term IDs, that will be used in feature extraction
`extra_genes`	Additional genes used for feature extraction
`organism`	The target organism to generate the features for

Details

This function takes a combination of gene counts and mapping statistics to extract biological and technical features, which than can be used for quality data analysis

Value

a list with two elements, one providing all features, and one providing common features.

Examples

data(sample_counts)
data(sample_stats)
sample_counts_nm <- normalise_by_factor(sample_counts, colSums(sample_counts))
sample_features <- extract_features(sample_counts_nm, sample_stats)
data(sample_counts)
data(sample_stats)
sample_counts_nm <- normalise_by_factor(sample_counts, colSums(sample_counts))
sample_features <- extract_features(sample_counts_nm, sample_stats)

Helper Function to create all features

Description

Helper Function to create all features

Usage

feature_generation(counts_nm, read_metrics, GO_terms, extra_genes, organism)
feature_generation(counts_nm, read_metrics, GO_terms, extra_genes, organism)

Arguments

`counts_nm`	Gene expression counts dataframe (genes x cells). Either normalised by library size or TPM values
`read_metrics`	Dataframe with mapping statistics produced by python pipeline
`GO_terms`	DataFrame with gene ontology term IDs, that will be used in feature extraction
`extra_genes`	Additional genes used for feature extraction
`organism`	The target organism to generate the features for

Value

Returns the entire set of features in a data.frame

Information which genes and GO categories should be included as features. Also defines which features are cell-type independent (common features)

Description

This list contains metadata information that is used to extract features from in the function extract_features

Usage

feature_info
feature_info

Format

a list with 2 elements (GO_terms,common_features).

Value

NULL, but makes available a list with metadata

Author(s)

Tomislav Ilicic & Davis McCarthy, 2015-03-05

Source

Wellcome Trust Sanger Institute

Real test dataset containing all and common features from the paper (mES1)

Description

This list contains 2 dataframes where each contains features per cell (cell X features) that can be used for classification.

Usage

mES1_features
mES1_features

Format

a list with 2 elements (all_features, common_features).

Value

NULL, but makes available a list with 2 dataframes

Author(s)

Tomislav Ilicic & Davis McCarthy, 2015-03-05

Source

Wellcome Trust Sanger Institute

Real test dataset containing annotation of cells

Description

This data frame has 2 columns: First showing cell names, the second indicating if cell is of low (0) or high (1) quality

Usage

mES1_labels
mES1_labels

Format

a dataframe with 2 columns (cell_names, label).

Value

NULL, but makes available a dataframe with cell annotations

Author(s)

Tomislav Ilicic & Davis McCarthy, 2015-03-05

Source

Wellcome Trust Sanger Institute

Internal multiplot function to combine plots onto a grid

Description

Internal multiplot function to combine plots onto a grid

Usage

multiplot(..., plotlist = NULL, file, cols = 6, layout = NULL)
multiplot(..., plotlist = NULL, file, cols = 6, layout = NULL)

Arguments

`...`	individual plots to combine into a single plot
`plotlist`	a vector with names of plots to use in the plot
`file`	string giving filename to which pdf of plots is to be saved
`cols`	integer giving number of columns for the plot
`layout`	matrix defining the layout for the plots

Value

a plot object

Internal function to normalize by library size

Description

Internal function to normalize by library size

Usage

normalise_by_factor(counts, norm_factor)
normalise_by_factor(counts, norm_factor)

Arguments

`counts`	matrix of counts
`norm_factor`	vector of normalisation factors

Value

a matrix with normalized gene counts

Examples

data(sample_counts)
data(sample_stats)
sample_counts_nm <- normalise_by_factor(sample_counts, colSums(sample_counts))
data(sample_counts)
data(sample_stats)
sample_counts_nm <- normalise_by_factor(sample_counts, colSums(sample_counts))

Parameters used for SVM classification

Description

This data frame has 3 columns: gamma, cost, class.weights and is optimised for all features and our training data

Usage

param_mES_all
param_mES_all

Format

a dataframe with 3 columns (gamma, cost, class.weights).

Value

NULL, but makes available a dataframe with parameters

Author(s)

Tomislav Ilicic & Davis McCarthy, 2015-03-05

Source

Wellcome Trust Sanger Institute

Parameters used for SVM classification

Description

This data frame has 3 columns: gamma, cost, class.weights and is optimised for common features and our training data

Usage

param_mES_common
param_mES_common

Format

a dataframe with 3 columns (gamma, cost, class.weights).

Value

NULL, but makes available a dataframe with parameters

Author(s)

Tomislav Ilicic & Davis McCarthy, 2015-03-05

Source

Wellcome Trust Sanger Institute

Plots PCA of all features. Colors high and low quality cells based on outlier detection.

Description

Plots PCA of all features. Colors high and low quality cells based on outlier detection.

Usage

plot_pca(features, annot, pca, col, output_file)
plot_pca(features, annot, pca, col, output_file)

Arguments

`features`	Input dataset containing features (cell x features)
`annot`	Matrix annotation of each cell
`pca`	PCA of features
`col`	color code indicating what color high and what low quality cells
`output_file`	where plot is stored

Details

This function plots PCA of all features + most informative features

Value

Plots of PCA

Sample gene expression data containing 40 cells

Description

This data frame contains genes (rows) and cells (columns) showing raw read counts

Usage

sample_counts
sample_counts

Format

a dataframe with genes x cells

Value

NULL, but makes available a dataframe with raw read counts

Author(s)

Tomislav Ilicic & Davis McCarthy, 2015-03-05

Source

Wellcome Trust Sanger Institute

Sample read statistics data containing 40 cells

Description

This data frame contains read metrics (columns) and cells (rows)

Usage

sample_stats
sample_stats

Format

a dataframe with cells x metrics

Value

NULL, but makes available a dataframe with read statistics

Author(s)

Tomislav Ilicic & Davis McCarthy, 2015-03-05

Source

Wellcome Trust Sanger Institute

Converts all first letters to capital letters

Description

Converts all first letters to capital letters

Usage

simple_cap(x)
simple_cap(x)

Arguments

x

string

Value

a character vector in title case

Sums up normalised values of genes to groups.

Description

Supports TPM and proportion of mapped reads.

Usage

sum_prop(counts, genes_interest)
sum_prop(counts, genes_interest)

Arguments

`counts`	Normalised gene expression count matrix
`genes_interest`	dataframe of genes of interest to merge

Value

a vector of sums per group

Original training dataset containing all and common features from the paper (training mES)

Description

This list contains 2 dataframes where each contains features per cell (cell X features) that can be used for classification.

Usage

training_mES_features
training_mES_features

Format

a list with 2 elements (all_features, common_features).

Value

NULL, but makes available a list with 2 dataframes

Author(s)

Tomislav Ilicic & Davis McCarthy, 2015-03-05

Source

Wellcome Trust Sanger Institute

Original training dataset containing annotation of cells

Description

This data frame has 2 columns: First showing cell names, the second indicating if cell is of low (0) or high (1) quality

Usage

training_mES_labels
training_mES_labels

Format

a dataframe with 2 columns (cell_names, label).

Value

NULL, but makes available a dataframe with cell annotations

Author(s)

Tomislav Ilicic & Davis McCarthy, 2015-03-05

Source

Wellcome Trust Sanger Institute

Internal function to detect outliers from the mvoultier pacakge Modified slightly so that plots are not printed

Description

Internal function to detect outliers from the mvoultier pacakge Modified slightly so that plots are not printed

Usage

uni.plot(x, symb = FALSE, quan = 1/2, alpha = 0.025)
uni.plot(x, symb = FALSE, quan = 1/2, alpha = 0.025)

Arguments

`x`	A matrix containing counts
`symb`	Symbols
`quan`	quan
`alpha`	alpha

Value

a list of outlier indicators

Package 'cellity'

Help Index

Quality Control for Single-Cell RNA-seq Data

Description

ASSESS CELL QUALITY USING PCA AND OUTLIER DETECTION

Description

Usage

Arguments

Details

Value

Examples

Assess quality of a cell - SVM version

Description

Usage

Arguments

Details

Value

Examples

Additional human genes that are used in feature extraction

Description

Usage

Format

Value

Author(s)

Source

Additional mouse genes that are used in feature extraction

Description

Usage

Format

Value

Author(s)

Source

Extracts biological and technical features for given dataset

Description

Usage

Arguments

Details

Value

Examples

Helper Function to create all features

Description

Usage

Arguments

Value

Information which genes and GO categories should be included as features. Also defines which features are cell-type independent (common features)

Description

Usage

Format

Value

Author(s)

Source

Real test dataset containing all and common features from the paper (mES1)

Description

Usage

Format

Value

Author(s)

Source

Real test dataset containing annotation of cells

Description

Usage

Format

Value

Author(s)

Source

Internal multiplot function to combine plots onto a grid

Description

Usage

Arguments

Value

Internal function to normalize by library size

Description

Usage

Arguments

Value

Examples

Parameters used for SVM classification

Description

Usage

Format