Title: | Heterogeneity-Induced Pre-Processing tOol |
---|---|
Description: | For scRNA-seq data, it selects features and clusters the cells simultaneously for single-cell UMI data. It has a novel feature selection method using the zero inflation instead of gene variance, and computationally faster than other existing methods since it only relies on PCA+Kmeans rather than graph-clustering or consensus clustering. |
Authors: | Tae Kim [aut, cre], Mengjie Chen [aut] |
Maintainer: | Tae Kim <[email protected]> |
License: | GPL (>=2) |
Version: | 1.19.0 |
Built: | 2024-11-18 03:39:46 UTC |
Source: | https://github.com/bioc/HIPPO |
A reference data frame that matches ENSG IDs to HGNC symbols
ensg_hgnc
ensg_hgnc
A data frame with 46606 rows and 2 columns
Ensembl ENSG IDs
HGNC symbols
Access data from SCE object
get_data_from_sce(sce)
get_data_from_sce(sce)
sce |
SingleCellExperiment object |
count matrix
data(toydata) X = get_data_from_sce(toydata)
data(toydata) X = get_data_from_sce(toydata)
Access hippo object from SingleCellExperiment object.
get_hippo(sce)
get_hippo(sce)
sce |
SingleCellExperiment object |
hippo object embedded in SingleCellExperiment object
data(toydata) set.seed(20200321) toydata = hippo(toydata,K = 10,z_threshold = 1,outlier_proportion = 0.01) hippo_object = get_hippo(toydata)
data(toydata) set.seed(20200321) toydata = hippo(toydata,K = 10,z_threshold = 1,outlier_proportion = 0.01) hippo_object = get_hippo(toydata)
Return hippo_diffexp object
get_hippo_diffexp(sce, k = 1)
get_hippo_diffexp(sce, k = 1)
sce |
SingleCellExperiment object with hippo |
k |
integer round of result of interest |
data frame of differential expression test
data(toydata) set.seed(20200321) toydata = hippo(toydata,K = 10,z_threshold = 1,outlier_proportion = 0.01) toydata = hippo_diffexp(toydata) result1 = get_hippo_diffexp(toydata)
data(toydata) set.seed(20200321) toydata = hippo(toydata,K = 10,z_threshold = 1,outlier_proportion = 0.01) toydata = hippo_diffexp(toydata) result1 = get_hippo_diffexp(toydata)
HIPPO's hierarchical clustering
hippo(sce, K = 20, z_threshold = 2, outlier_proportion = 0.001, verbose = TRUE)
hippo(sce, K = 20, z_threshold = 2, outlier_proportion = 0.001, verbose = TRUE)
sce |
SingleCellExperiment object |
K |
number of clusters to ultimately get |
z_threshold |
numeric > 0 as a z-value threshold for selecting the features |
outlier_proportion |
numeric between 0 and 1, a cut-off so that when the proportion of important features reach this number, the clustering terminates |
verbose |
if set to TRUE, it shows progress of the algorithm |
a list of clustering result for each level of k=1, 2, ... K.
data(toydata) toydata = hippo(toydata,K = 10,z_threshold = 1,outlier_proportion = 0.01)
data(toydata) toydata = hippo(toydata,K = 10,z_threshold = 1,outlier_proportion = 0.01)
Conduct feature selection by computing test statistics for each gene
hippo_diagnostic_plot(sce, show_outliers = FALSE, zvalue_thresh = 10)
hippo_diagnostic_plot(sce, show_outliers = FALSE, zvalue_thresh = 10)
sce |
SingleCellExperiment object with count matrix |
show_outliers |
boolean to indicate whether to circle the outliers with given zvalue_thresh |
zvalue_thresh |
a numeric v for defining outliers |
a diagnostic plot that shows genes with zero inflation
data(toydata) hippo_diagnostic_plot(toydata, show_outliers=TRUE, zvalue_thresh = 2)
data(toydata) hippo_diagnostic_plot(toydata, show_outliers=TRUE, zvalue_thresh = 2)
HIPPO's differential expression
hippo_diffexp( sce, top.n = 5, switch_to_hgnc = FALSE, ref = NA, k = NA, plottitle = "" )
hippo_diffexp( sce, top.n = 5, switch_to_hgnc = FALSE, ref = NA, k = NA, plottitle = "" )
sce |
SingleCellExperiment object with hippo |
top.n |
number of markers to return |
switch_to_hgnc |
if the current gene names are ensemble ids, and would like to switch to hgnc |
ref |
a data frame with columns 'hgnc' and 'ensg' to match each other, only required when switch_to_hgnc is set to TRUE |
k |
number of rounds of clustering that you'd like to see result. Default is 1 to K |
plottitle |
title of the resulting plot |
list of differential expression result
data(toydata) set.seed(20200321) toydata = hippo(toydata,K = 10,z_threshold = 1,outlier_proportion = 0.01) result = hippo_diffexp(toydata)
data(toydata) set.seed(20200321) toydata = hippo(toydata,K = 10,z_threshold = 1,outlier_proportion = 0.01) result = hippo_diffexp(toydata)
compute t-SNE or umap of each round of HIPPO
hippo_dimension_reduction( sce, method = c("umap", "tsne"), perplexity = 30, featurelevel = 1 )
hippo_dimension_reduction( sce, method = c("umap", "tsne"), perplexity = 30, featurelevel = 1 )
sce |
SingleCellExperiment object with hippo object in it. |
method |
a string that determines the method for dimension reduction: either 'umap' or 'tsne |
perplexity |
numeric perplexity parameter for Rtsne function |
featurelevel |
the round of clustering that you will extract features to reduce the dimension |
a data frame of dimension reduction result for each k in 1, ..., K
data(toydata) set.seed(20200321) set.seed(20200321) toydata = hippo(toydata,K = 10,z_threshold = 1,outlier_proportion = 0.01) toydata = hippo_dimension_reduction(toydata, method="tsne") hippo_tsne_plot(toydata)
data(toydata) set.seed(20200321) set.seed(20200321) toydata = hippo(toydata,K = 10,z_threshold = 1,outlier_proportion = 0.01) toydata = hippo_dimension_reduction(toydata, method="tsne") hippo_tsne_plot(toydata)
HIPPO's feature heatmap
hippo_feature_heatmap( sce, switch_to_hgnc = FALSE, ref = NA, top.n = 50, kk = 2, plottitle = "" )
hippo_feature_heatmap( sce, switch_to_hgnc = FALSE, ref = NA, top.n = 50, kk = 2, plottitle = "" )
sce |
SingleCellExperiment object with hippo |
switch_to_hgnc |
if the current gene names are ensemble ids, and would like to switch to hgnc |
ref |
a data frame with columns 'hgnc' and 'ensg' to match each other, only required when switch_to_hgnc is set to TRUE |
top.n |
number of markers to return |
kk |
integer for the round of clustering that you'd like to see result. Default is 2 |
plottitle |
title for the plot |
list of differential expression result
data(toydata) set.seed(20200321) toydata = hippo(toydata,K = 10,z_threshold = 1,outlier_proportion = 0.01) hippo_feature_heatmap(toydata)
data(toydata) set.seed(20200321) toydata = hippo(toydata,K = 10,z_threshold = 1,outlier_proportion = 0.01) hippo_feature_heatmap(toydata)
visualize each round of hippo through t-SNE
hippo_pca_plot(sce, k = NA, pointsize = 0.5, pointalpha = 0.5, plottitle = "")
hippo_pca_plot(sce, k = NA, pointsize = 0.5, pointalpha = 0.5, plottitle = "")
sce |
SincleCellExperiment object with hippo and t-SNE result in it |
k |
number of rounds of clustering that you'd like to see result. Default is 1 to K |
pointsize |
size of the point for the plot (default 0.5) |
pointalpha |
transparency level of points for the plot (default 0.5) |
plottitle |
title for the ggplot |
ggplot for pca in each round
data(toydata) set.seed(20200321) toydata = hippo(toydata, K = 10,z_threshold = 1) hippo_pca_plot(toydata, k = 2:3)
data(toydata) set.seed(20200321) toydata = hippo(toydata, K = 10,z_threshold = 1) hippo_pca_plot(toydata, k = 2:3)
visualize each round of hippo through t-SNE
hippo_tsne_plot(sce, k = NA, pointsize = 0.5, pointalpha = 0.5, plottitle = "")
hippo_tsne_plot(sce, k = NA, pointsize = 0.5, pointalpha = 0.5, plottitle = "")
sce |
SincleCellExperiment object with hippo and t-SNE result in it |
k |
number of rounds of clustering that you'd like to see result. Default is 1 to K |
pointsize |
size of the point for the plot (default 0.5) |
pointalpha |
transparency level of points for the plot (default 0.5) |
plottitle |
title for the ggplot output |
ggplot object for t-SNE in each round
data(toydata) set.seed(20200321) toydata = hippo(toydata,K = 10,z_threshold = 1,outlier_proportion = 0.01) toydata = hippo_dimension_reduction(toydata, method="tsne") hippo_tsne_plot(toydata)
data(toydata) set.seed(20200321) toydata = hippo(toydata,K = 10,z_threshold = 1,outlier_proportion = 0.01) toydata = hippo_dimension_reduction(toydata, method="tsne") hippo_tsne_plot(toydata)
visualize each round of hippo through UMAP
hippo_umap_plot(sce, k = NA, pointsize = 0.5, pointalpha = 0.5, plottitle = "")
hippo_umap_plot(sce, k = NA, pointsize = 0.5, pointalpha = 0.5, plottitle = "")
sce |
SingleCellExperiment object with hippo and UMAP result in it |
k |
number of rounds of clustering that you'd like to see result. Default is 1 to K |
pointsize |
size of the point for the plot (default 0.5) |
pointalpha |
transparency level of points for the plot (default 0.5) |
plottitle |
title of the resulting plot |
ggplot object for umap in each round
data(toydata) set.seed(20200321) toydata = hippo(toydata,K = 10,z_threshold = 1,outlier_proportion = 0.01) toydata = hippo_dimension_reduction(toydata, method="umap") hippo_umap_plot(toydata)
data(toydata) set.seed(20200321) toydata = hippo(toydata,K = 10,z_threshold = 1,outlier_proportion = 0.01) toydata = hippo_dimension_reduction(toydata, method="umap") hippo_umap_plot(toydata)
Expected zero proportion under Negative Binomial
nb_prob_zero(lambda, theta)
nb_prob_zero(lambda, theta)
lambda |
numeric vector of means of negative binomial |
theta |
numeric vector of the dispersion parameter for negative binomial, 0 if poisson |
numeric vector of expected zero proportion under Negative Binomial
nb_prob_zero(3, 1.1)
nb_prob_zero(3, 1.1)
Expected zero proportion under Poisson
pois_prob_zero(lambda)
pois_prob_zero(lambda)
lambda |
numeric vector of means of Poisson |
numeric vector of expected proportion of zeros for each lambda
pois_prob_zero(3)
pois_prob_zero(3)
Preprocess UMI data without cell label so that each row contains information about each gene
preprocess_heterogeneous(X)
preprocess_heterogeneous(X)
X |
a matrix object with counts data |
data frame with one row for each gene.
data(toydata) df = preprocess_heterogeneous(get_data_from_sce(toydata))
data(toydata) df = preprocess_heterogeneous(get_data_from_sce(toydata))
Preprocess UMI data with inferred or known labels
preprocess_homogeneous(sce, label)
preprocess_homogeneous(sce, label)
sce |
SingleCellExperiment object with counts data |
label |
a numeric or character vector of inferred or known label |
data frame with one row for each gene.
data(toydata) labels = SingleCellExperiment::colData(toydata)$phenoid df = preprocess_homogeneous(toydata, label = labels)
data(toydata) labels = SingleCellExperiment::colData(toydata)$phenoid df = preprocess_homogeneous(toydata, label = labels)
A sample single cell sequencing data subsetted from Zheng2017
toydata
toydata
Single Cell experiment object with 10,000 genes and 100 cells
https://www.nature.com/articles/ncomms14049
visualize each round of hippo through zero proportion plot
zero_proportion_plot( sce, switch_to_hgnc = FALSE, ref = NA, k = NA, plottitle = "", top.n = 5, pointsize = 0.5, pointalpha = 0.5, textsize = 3 )
zero_proportion_plot( sce, switch_to_hgnc = FALSE, ref = NA, k = NA, plottitle = "", top.n = 5, pointsize = 0.5, pointalpha = 0.5, textsize = 3 )
sce |
SingleCellExperiment object with hippo element in it |
switch_to_hgnc |
boolean argument to indicate whether to change the gene names from ENSG IDs to HGNC symbols |
ref |
a data frame with hgnc column and ensg column |
k |
select rounds of clustering that you would like to see result. Default is 1 to K |
plottitle |
Title of your plot output |
top.n |
number of top genes to show the name |
pointsize |
size of the ggplot point |
pointalpha |
transparency level of the ggplot point |
textsize |
text size of the resulting plot |
a ggplot object that shows the zero proportions for each round
data(toydata) set.seed(20200321) toydata = hippo(toydata,K = 10,z_threshold = 1,outlier_proportion = 0.01) data(ensg_hgnc) zero_proportion_plot(toydata, switch_to_hgnc = TRUE, ref = ensg_hgnc)
data(toydata) set.seed(20200321) toydata = hippo(toydata,K = 10,z_threshold = 1,outlier_proportion = 0.01) data(ensg_hgnc) zero_proportion_plot(toydata, switch_to_hgnc = TRUE, ref = ensg_hgnc)
Expected zero proportion under Negative Binomial
zinb_prob_zero(lambda, theta, pi)
zinb_prob_zero(lambda, theta, pi)
lambda |
gene mean |
theta |
dispersion parameter, 0 if zero-inflated poisson |
pi |
zero inflation, 0 if negative binomial |
Expected zero proportion under Zero-Inflated Negative Binomial
zinb_prob_zero(3, 1.1, 0.1)
zinb_prob_zero(3, 1.1, 0.1)