Title: | Population-level Representation on scRNA-Seq data |
---|---|
Description: | This package aims at representing and summarizing the entire single-cell profile of a sample. It allows researchers to perform important bioinformatic analyses at the sample-level such as visualization and quality control. The main functions Estimate sample distribution and calculate statistical divergence among samples, and visualize the distance matrix through MDS plots. |
Authors: | William Torous [aut, cre] , Hao Wang [aut] , Elizabeth Purdom [aut], Boying Gong [aut] |
Maintainer: | William Torous <[email protected]> |
License: | Artistic-2.0 |
Version: | 1.5.0 |
Built: | 2024-12-18 03:40:42 UTC |
Source: | https://github.com/bioc/GloScope |
'example_SCE' is a SingleCellExperiment object which contains PCA embeddings and metadata for PBMCs from 20 COVID-infected and healthy control patients. Each sample is reduced to a random subset of 500 cells, for a total of 10,000 cells. The 'colData' slot of the object contains the metadata for each cell, its sample ID and phenotype. The dimensionality reductions slot contains the first 50 PCs, and these embeddings are provided by the authors of "Single-cell multi-omics analysis of the immune response in COVID-19" (Stephenson et al., 2021; doi: 10.1038/s41591-021-01329-2).
'example_SCE_small' is a SingleCellExperiment with the same structure as 'example_SCE', but only containing data from the first five samples. This is a smaller set for examples.
A SingleCellExperiment object with metadata and PCA embeddings
A SingleCellExperiment object
# Code to create the small SCE from the full sample # Reduction to 5 samples demonstrates data extraction from SCE objects data(example_SCE) sample_ids <- SingleCellExperiment::colData(example_SCE)$sample_id whKeep <- which(sample_ids %in% unique(sample_ids)[seq_len(5)]) example_SCE_small <- SingleCellExperiment::SingleCellExperiment( assays=list(counts=matrix(rep(0,2500),ncol=2500)), colData=SingleCellExperiment::colData(example_SCE)[whKeep,], reducedDims=list("PCA"=SingleCellExperiment::reducedDim(example_SCE,"PCA")[whKeep,]))
# Code to create the small SCE from the full sample # Reduction to 5 samples demonstrates data extraction from SCE objects data(example_SCE) sample_ids <- SingleCellExperiment::colData(example_SCE)$sample_id whKeep <- which(sample_ids %in% unique(sample_ids)[seq_len(5)]) example_SCE_small <- SingleCellExperiment::SingleCellExperiment( assays=list(counts=matrix(rep(0,2500),ncol=2500)), colData=SingleCellExperiment::colData(example_SCE)[whKeep,], reducedDims=list("PCA"=SingleCellExperiment::reducedDim(example_SCE,"PCA")[whKeep,]))
This function calculates a matrix of pairwise divergences between input samples of single cell data.
gloscope( embedding_matrix, cell_sample_ids, dens = c("GMM", "KNN"), dist_mat = c("KL", "JS"), r = 10000, num_components = seq_len(9), k = 50, BPPARAM = BiocParallel::SerialParam(), prefit_density = NULL, return_density = FALSE )
gloscope( embedding_matrix, cell_sample_ids, dens = c("GMM", "KNN"), dist_mat = c("KL", "JS"), r = 10000, num_components = seq_len(9), k = 50, BPPARAM = BiocParallel::SerialParam(), prefit_density = NULL, return_density = FALSE )
embedding_matrix |
a matrix of latent embeddings with rows corresponding to cells and columns to dimensions |
cell_sample_ids |
a vector of the samples IDs each cell comes from. Length must match the number of rows in 'embedding_matrix' |
dens |
the density estimation. One of c("GMM","KNN") |
dist_mat |
distance metric to calculate the distance. One of c("KL","JS") |
r |
number of Monte Carlo simulations to generate |
num_components |
a vector of integers for the number of components to fit GMMS to, default is seq_len(9) |
k |
number of nearest neighbours for KNN density estimation, default k = 50. |
BPPARAM |
BiocParallel parameters, default is running in serial. Set random seed with 'RNGseed' argument |
prefit_density |
a named list of pre-fit 'densityMclust' objects for each sample, default is NULL |
return_density |
return the GMM parameter list or not (if applicable), default is FALSE |
A matrix containing the pairwise divergence or distance between all pairs of samples
# Bring in small example data of single cell embeddings data(example_SCE_small) sample_ids <- SingleCellExperiment::colData(example_SCE_small)$sample_id pca_embeddings <- SingleCellExperiment::reducedDim(example_SCE_small,"PCA") # Run gloscope on first 10 PCA embeddings # We use 'KNN' option for speed ('GMM' is slightly slower) pca_embeddings_subset <- pca_embeddings[,seq_len(10)] # select the first 10 PCs dist_result <- gloscope(pca_embeddings_subset, sample_ids, dens="KNN", BPPARAM = BiocParallel::SerialParam(RNGseed=2)) dist_result
# Bring in small example data of single cell embeddings data(example_SCE_small) sample_ids <- SingleCellExperiment::colData(example_SCE_small)$sample_id pca_embeddings <- SingleCellExperiment::reducedDim(example_SCE_small,"PCA") # Run gloscope on first 10 PCA embeddings # We use 'KNN' option for speed ('GMM' is slightly slower) pca_embeddings_subset <- pca_embeddings[,seq_len(10)] # select the first 10 PCs dist_result <- gloscope(pca_embeddings_subset, sample_ids, dens="KNN", BPPARAM = BiocParallel::SerialParam(RNGseed=2)) dist_result
This function calculates a matrix of pairwise divergences between input samples' cell type proportion.
gloscope_proportion( cell_sample_ids, cell_type_ids, ep = 0, dist_mat = c("KL", "JS") )
gloscope_proportion( cell_sample_ids, cell_type_ids, ep = 0, dist_mat = c("KL", "JS") )
cell_sample_ids |
a vector of the samples IDs each cell comes from. Length must match the number of element in 'cell_type_ids' |
cell_type_ids |
a vector of use defined cell type |
ep |
an integer of error term added to 0 proportion. Default ep = 0. |
dist_mat |
distance metric to calculate the distance. One of c("KL","JS") |
clusprop_dist a symmetric matrix of divergences
# Bring in small example data of single cell embeddings data(example_SCE_small) sample_id <- SingleCellExperiment::colData(example_SCE_small)$sample_id cluster_id <- SingleCellExperiment::colData(example_SCE_small)$cluster_id dist_result <- gloscope_proportion(sample_id, cluster_id, ep = 0.5, dist_mat = "KL") dist_result
# Bring in small example data of single cell embeddings data(example_SCE_small) sample_id <- SingleCellExperiment::colData(example_SCE_small)$sample_id cluster_id <- SingleCellExperiment::colData(example_SCE_small)$cluster_id dist_result <- gloscope_proportion(sample_id, cluster_id, ep = 0.5, dist_mat = "KL") dist_result
This function creates a multidimensional scaling plot for a set of samples using their GloScope divergence. Each sample's scatter will be color-coded based on their phenotype. The function calls the 'isoMDS' function from the 'MASS' package.
'paletteBig' is a small helper function to create a large color palette for plotting
plotMDS(dist_mat, metadata_df, sample_id, group_id, k = 10) paletteBig()
plotMDS(dist_mat, metadata_df, sample_id, group_id, k = 10) paletteBig()
dist_mat |
The divergence matrix output of 'gloscope()' |
metadata_df |
A data frame contains each sample's metadata. Note this is NOT at the cell-level. |
sample_id |
The column name or index in metadata_df that contains the sample ID |
group_id |
The column name or index in metadata_df that contains the patient condition |
k |
Number of MDS dimension to generate, default = 10 |
A list containing the MDS embedding and plot of the distance matrix
mds - A data.frame containing the MDS embedding, with the number of rows equal to the number of samples.
plot - A ggplot object containing the plot object. 'print' of the object will create a plot.
data(example_SCE_small) sample_ids <- SingleCellExperiment::colData(example_SCE_small)$sample_id # Run gloscope on first 10 PCA embeddings # We use 'KNN' option for speed ('GMM' is slightly slower) pca_embeddings <- SingleCellExperiment::reducedDim(example_SCE_small,"PCA") pca_embeddings_subset <- pca_embeddings[,seq_len(10)] # select the first 10 PCs dist_result <- gloscope(pca_embeddings_subset, sample_ids, dens="KNN", BPPARAM = BiocParallel::SerialParam(RNGseed=2)) # make a per-sample metadata sample_metadata <- as.data.frame(unique(SingleCellExperiment::colData(example_SCE_small)[,c(1,2)])) mds_result <- plotMDS(dist_mat = dist_result, metadata_df = sample_metadata , "sample_id", "phenotype",k=2) mds_result$plot head(mds_result$mds)
data(example_SCE_small) sample_ids <- SingleCellExperiment::colData(example_SCE_small)$sample_id # Run gloscope on first 10 PCA embeddings # We use 'KNN' option for speed ('GMM' is slightly slower) pca_embeddings <- SingleCellExperiment::reducedDim(example_SCE_small,"PCA") pca_embeddings_subset <- pca_embeddings[,seq_len(10)] # select the first 10 PCs dist_result <- gloscope(pca_embeddings_subset, sample_ids, dens="KNN", BPPARAM = BiocParallel::SerialParam(RNGseed=2)) # make a per-sample metadata sample_metadata <- as.data.frame(unique(SingleCellExperiment::colData(example_SCE_small)[,c(1,2)])) mds_result <- plotMDS(dist_mat = dist_result, metadata_df = sample_metadata , "sample_id", "phenotype",k=2) mds_result$plot head(mds_result$mds)