Title: | Quality Control for Single-Cell RNA-seq Data |
---|---|
Description: | A support vector machine approach to identifying and filtering low quality cells from single-cell RNA-seq datasets. |
Authors: | Tomislav Ilicic, Davis McCarthy |
Maintainer: | Tomislav Ilicic <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.35.0 |
Built: | 2024-11-24 06:28:32 UTC |
Source: | https://github.com/bioc/cellity |
cellity provides support vector machine and PCA approaches for identifying and filtering low-quality cells from single-cell RNA-seq datasets.
Assess cell quality using PCA and outlier detection
assess_cell_quality_PCA(features, file = "")
features | Input dataset containing features (cells x features) |
file | Output file where the plot is saved |
This function applies PCA to the features and uses outlier detection to determine which cells are of low quality and which are of high quality.
Returns a dataframe indicating which cell is low or high quality (0 or 1 respectively)
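The general idea of PCA followed by outlier detection can be sketched in base R. This is only an illustration under stated assumptions: `pca_outlier_sketch` is a hypothetical helper, not part of cellity, and it swaps in a classical Mahalanobis-distance rule on the first two principal components in place of the package's own outlier detection.

```r
# Hypothetical helper for illustration only; not the cellity implementation.
# Assumption: cells are rows, features are columns, all values numeric.
pca_outlier_sketch <- function(features, alpha = 0.025) {
  # Drop constant columns, which cannot be scaled to unit variance
  keep <- apply(features, 2, function(v) stats::var(v) > 0)
  pca <- stats::prcomp(features[, keep, drop = FALSE],
                       center = TRUE, scale. = TRUE)
  scores <- pca$x[, 1:2, drop = FALSE]
  # Squared Mahalanobis distance of each cell from the centre of the scores
  d2 <- stats::mahalanobis(scores, colMeans(scores), stats::cov(scores))
  cutoff <- stats::qchisq(1 - alpha, df = 2)
  data.frame(cell = rownames(features),
             quality = as.integer(d2 <= cutoff),  # 1 = high, 0 = low
             stringsAsFactors = FALSE)
}

# Toy data: 200 cells, 5 features, with one cell shifted far from the rest
set.seed(1)
toy <- matrix(rnorm(200 * 5), nrow = 200,
              dimnames = list(paste0("cell", 1:200), paste0("f", 1:5)))
toy[1, ] <- toy[1, ] + 10
res <- pca_outlier_sketch(toy)
table(res$quality)
```

The output follows the same convention as above: 0 marks a cell flagged as an outlier (low quality) and 1 marks a cell kept as high quality.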
data(training_mES_features)
training_mES_features_all <- training_mES_features[[1]]
training_quality_PCA_allF <- assess_cell_quality_PCA(training_mES_features_all)
Assess quality of a cell - SVM version
assess_cell_quality_SVM(training_set_features, training_set_labels, ensemble_param, test_set_features)
training_set_features | A training set containing features (cells x features) for prediction |
training_set_labels | Annotation of each individual cell as high or low quality (1 or 0 respectively) |
ensemble_param | Dataframe of parameters for SVM |
test_set_features | Dataset to predict containing features (cells x features) |
This function takes a training set and its annotation to predict the quality of the cells in a test set. It requires that the hyper-parameters have been optimised.
Returns a dataframe indicating which cell is low or high quality (0 or 1 respectively)
data.frame with decision on quality of cells
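Predicted quality calls can be compared against existing annotations with a few lines of base R. The sketch below uses toy stand-ins rather than package output; the column names `cell`, `quality` and `label` are assumptions made for the illustration and may not match the columns returned by the package.

```r
# Toy stand-ins for a prediction data frame and an annotation data frame.
# Assumption: both hold one row per cell and a 0/1 quality call.
predicted <- data.frame(cell = paste0("cell", 1:6),
                        quality = c(1, 1, 0, 0, 1, 0))
labels <- data.frame(cell = paste0("cell", 1:6),
                     label = c(1, 1, 0, 1, 1, 0))

# Match cells by name so that row order does not matter
merged <- merge(predicted, labels, by = "cell")

# Cross-tabulate predicted calls against annotated calls
confusion <- table(predicted = merged$quality, annotated = merged$label)

# Fraction of cells whose predicted call agrees with the annotation
accuracy <- mean(merged$quality == merged$label)
```

Merging by cell name rather than by position guards against silently comparing calls for different cells when the two tables are ordered differently.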
data(param_mES_all)
data(training_mES_features)
data(training_mES_labels)
data(mES1_features)
data(mES1_labels)
mES1_features_all <- mES1_features[[1]]
training_mES_features_all <- training_mES_features[[1]]
mES1_quality_SVM <- assess_cell_quality_SVM(
  training_mES_features_all, training_mES_labels[,2],
  param_mES_all, mES1_features_all)
This list contains human genes that are used for feature extraction of biological features
extra_human_genes
a list containing vectors of genes. The name of each element indicates its GO category.
NULL, but makes available a list with metadata
Tomislav Ilicic & Davis McCarthy, 2015-03-05
Wellcome Trust Sanger Institute
This list contains mouse genes that are used for feature extraction of biological features
extra_mouse_genes
a list containing vectors of genes. The name of each element indicates its GO category.
NULL, but makes available a list with metadata
Tomislav Ilicic & Davis McCarthy, 2015-03-05
Wellcome Trust Sanger Institute
Extracts biological and technical features for given dataset
extract_features(counts_nm, read_metrics, prefix = "", output_dir = "", common_features = NULL, GO_terms = NULL, extra_genes = NULL, organism = "mouse")
counts_nm | Gene expression counts dataframe (genes x cells), either normalised by library size or given as TPM values |
read_metrics | Dataframe with mapping statistics produced by the Python pipeline |
prefix | Prefix of the output files |
output_dir | Output directory for the files |
common_features | Subset of features that are applicable within one species, but across cell types |
GO_terms | Dataframe with gene ontology term IDs that will be used in feature extraction |
extra_genes | Additional genes used for feature extraction |
organism | The target organism to generate the features for |
This function takes a combination of gene counts and mapping statistics to extract biological and technical features, which can then be used for quality analysis of the data.
a list with two elements, one providing all features, and one providing common features.
data(sample_counts)
data(sample_stats)
sample_counts_nm <- normalise_by_factor(sample_counts, colSums(sample_counts))
sample_features <- extract_features(sample_counts_nm, sample_stats)
Helper Function to create all features
feature_generation(counts_nm, read_metrics, GO_terms, extra_genes, organism)
counts_nm | Gene expression counts dataframe (genes x cells), either normalised by library size or given as TPM values |
read_metrics | Dataframe with mapping statistics produced by the Python pipeline |
GO_terms | Dataframe with gene ontology term IDs that will be used in feature extraction |
extra_genes | Additional genes used for feature extraction |
organism | The target organism to generate the features for |
Returns the entire set of features in a data.frame
This list contains metadata that the function extract_features uses to extract features.
feature_info
a list with 2 elements (GO_terms,common_features).
NULL, but makes available a list with metadata
Tomislav Ilicic & Davis McCarthy, 2015-03-05
Wellcome Trust Sanger Institute
This list contains 2 dataframes where each contains features per cell (cell X features) that can be used for classification.
mES1_features
a list with 2 elements (all_features, common_features).
NULL, but makes available a list with 2 dataframes
Tomislav Ilicic & Davis McCarthy, 2015-03-05
Wellcome Trust Sanger Institute
This data frame has 2 columns: the first contains cell names, the second indicates whether each cell is of low (0) or high (1) quality
mES1_labels
a dataframe with 2 columns (cell_names, label).
NULL, but makes available a dataframe with cell annotations
Tomislav Ilicic & Davis McCarthy, 2015-03-05
Wellcome Trust Sanger Institute
Internal multiplot function to combine plots onto a grid
multiplot(..., plotlist = NULL, file, cols = 6, layout = NULL)
... | individual plots to combine into a single plot |
plotlist | a vector with names of plots to use in the plot |
file | string giving the filename to which a pdf of the plots is saved |
cols | integer giving the number of columns for the plot |
layout | matrix defining the layout for the plots |
a plot object
Internal function to normalize by library size
normalise_by_factor(counts, norm_factor)
counts | matrix of counts |
norm_factor | vector of normalisation factors |
a matrix with normalized gene counts
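Normalising by library size amounts to dividing each cell's counts by that cell's factor. The following base R sketch shows the idea under the assumption that cells are columns and `norm_factor` holds one value per column. The name `normalise_sketch` is hypothetical and the code is not taken from the package source.

```r
# Hypothetical re-implementation for illustration; not the package source.
# Assumption: genes are rows, cells are columns, one factor per cell.
normalise_sketch <- function(counts, norm_factor) {
  counts <- as.matrix(counts)
  stopifnot(length(norm_factor) == ncol(counts))
  # sweep() divides every column j by norm_factor[j]
  sweep(counts, 2, norm_factor, "/")
}

# Toy count matrix: 3 genes x 2 cells, filled column by column
toy_counts <- matrix(c(10, 30, 60,
                        5,  5, 10),
                     nrow = 3,
                     dimnames = list(c("geneA", "geneB", "geneC"),
                                     c("cell1", "cell2")))
toy_nm <- normalise_sketch(toy_counts, colSums(toy_counts))
colSums(toy_nm)  # every cell sums to 1 after dividing by its library size
```

Using the column sums as factors, as in the package example, turns raw counts into per-cell proportions.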
data(sample_counts)
data(sample_stats)
sample_counts_nm <- normalise_by_factor(sample_counts, colSums(sample_counts))
This data frame has 3 columns (gamma, cost, class.weights) holding parameters optimised for all features and our training data
param_mES_all
a dataframe with 3 columns (gamma, cost, class.weights).
NULL, but makes available a dataframe with parameters
Tomislav Ilicic & Davis McCarthy, 2015-03-05
Wellcome Trust Sanger Institute
This data frame has 3 columns (gamma, cost, class.weights) holding parameters optimised for common features and our training data
param_mES_common
a dataframe with 3 columns (gamma, cost, class.weights).
NULL, but makes available a dataframe with parameters
Tomislav Ilicic & Davis McCarthy, 2015-03-05
Wellcome Trust Sanger Institute
Plots PCA of all features. Colors high and low quality cells based on outlier detection.
plot_pca(features, annot, pca, col, output_file)
features | Input dataset containing features (cells x features) |
annot | Matrix annotation of each cell |
pca | PCA of features |
col | Color code indicating which color marks high-quality cells and which marks low-quality cells |
output_file | File where the plot is stored |
This function plots PCA of all features + most informative features
Plots of PCA
This data frame contains genes (rows) and cells (columns) showing raw read counts
sample_counts
a dataframe with genes x cells
NULL, but makes available a dataframe with raw read counts
Tomislav Ilicic & Davis McCarthy, 2015-03-05
Wellcome Trust Sanger Institute
This data frame contains read metrics (columns) and cells (rows)
sample_stats
a dataframe with cells x metrics
NULL, but makes available a dataframe with read statistics
Tomislav Ilicic & Davis McCarthy, 2015-03-05
Wellcome Trust Sanger Institute
Converts all first letters to capital letters
simple_cap(x)
x | string |
a character vector in title case
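Capitalising the first letter of every word can be done in base R with `strsplit`, `toupper` and `substring`. The sketch below is an illustration only: `cap_sketch` is a hypothetical name, and the assumption that words are separated by single spaces may not match the package function exactly.

```r
# Hypothetical helper for illustration; not the package source.
# Assumption: words are separated by single spaces.
cap_sketch <- function(x) {
  vapply(x, function(s) {
    words <- strsplit(s, " ", fixed = TRUE)[[1]]
    # Upper-case the first character of each word, keep the rest unchanged
    paste0(toupper(substring(words, 1, 1)), substring(words, 2),
           collapse = " ")
  }, character(1), USE.NAMES = FALSE)
}

capped <- cap_sketch(c("cell cycle", "mitochondrial membrane"))
```

Only the first character of each word is changed, so letters that are already upper case elsewhere in a word stay as they are.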
Supports TPM and proportion of mapped reads.
sum_prop(counts, genes_interest)
counts | Normalised gene expression count matrix |
genes_interest | dataframe of genes of interest to merge |
a vector of sums per group
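Summing normalised values over a group of genes can be sketched in base R as follows. This assumes that the count matrix has gene names as row names and that the sum is taken per cell over the genes in the group. `sum_group_sketch` is a hypothetical name, and the package function may differ in detail.

```r
# Hypothetical helper for illustration; not the package source.
# Assumption: rows are genes (named), columns are cells.
sum_group_sketch <- function(counts, genes_interest) {
  counts <- as.matrix(counts)
  # Keep only the genes of interest that are present in the count matrix
  hits <- rownames(counts) %in% unlist(genes_interest)
  # drop = FALSE keeps a matrix even when a single gene matches
  colSums(counts[hits, , drop = FALSE])
}

# Toy normalised matrix: 3 genes x 2 cells, each column sums to 1
toy_prop <- matrix(c(0.10, 0.30, 0.60,
                     0.25, 0.25, 0.50),
                   nrow = 3,
                   dimnames = list(c("geneA", "geneB", "geneC"),
                                   c("cell1", "cell2")))
group_sum <- sum_group_sketch(toy_prop, c("geneA", "geneC", "geneZ"))
```

Genes in the group that are absent from the matrix (here `geneZ`) are simply ignored, so the result is the proportion of each cell's expression assigned to the genes that were found.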
This list contains 2 dataframes where each contains features per cell (cell X features) that can be used for classification.
training_mES_features
a list with 2 elements (all_features, common_features).
NULL, but makes available a list with 2 dataframes
Tomislav Ilicic & Davis McCarthy, 2015-03-05
Wellcome Trust Sanger Institute
This data frame has 2 columns: the first contains cell names, the second indicates whether each cell is of low (0) or high (1) quality
training_mES_labels
a dataframe with 2 columns (cell_names, label).
NULL, but makes available a dataframe with cell annotations
Tomislav Ilicic & Davis McCarthy, 2015-03-05
Wellcome Trust Sanger Institute
Internal function to detect outliers, taken from the mvoutlier package and modified slightly so that plots are not printed
uni.plot(x, symb = FALSE, quan = 1/2, alpha = 0.025)
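x | A matrix containing counts |
symb | Whether special symbols are used, as in mvoutlier's uni.plot |
quan | Proportion of observations used for the MCD estimate, as in mvoutlier (between 0.5 and 1) |
alpha | Maximum thresholding proportion, as in mvoutlier |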
a list of outlier indicators