Title: | A Supervised Approach for **P**r**e**dicting **c**ell Cycle Pr**o**gression using scRNA-seq data |
---|---|
Description: | Our approach provides a way to assign continuous cell cycle phase using scRNA-seq data, and consequently, allows to identify cyclic trend of gene expression levels along the cell cycle. This package provides method and training data, which includes scRNA-seq data collected from 6 individual cell lines of induced pluripotent stem cells (iPSCs), and also continuous cell cycle phase derived from FUCCI fluorescence imaging data. |
Authors: | Chiaowen Joyce Hsiao [aut, cre], Matthew Stephens [aut], John Blischak [ctb], Peter Carbonetto [ctb] |
Maintainer: | Chiaowen Joyce Hsiao <[email protected]> |
License: | GPL (>= 3) |
Version: | 1.19.0 |
Built: | 2024-12-25 05:45:41 UTC |
Source: | https://github.com/bioc/peco |
List of cell cycle genes and their associated cell cycle state as reported in Whitfield et al. 2002.
data(cellcyclegenes_whitfield2002)
data(cellcyclegenes_whitfield2002)
A list with the follwing elements
Gene symbol
ENSEMBL gene ID
Marker phase identified in Whitfield et al. 2002
We define distance between two angles: the minimum of the differences in both clockwise and counterclockwise directions.
circ_dist(y1, y2)
circ_dist(y1, y2)
y1 |
A vector of angles. |
y2 |
A vector of angles. |
A vector of distances between angles.
Joyce Hsiao, Matthew Stephens
# a vector of angles theta_ref <- seq(0,2*pi, length.out=100) # shift the origin of theta_ref to pi theta_compare <- shift_origin(theta_ref, origin = pi) mean(circ_dist(theta_ref, theta_compare)) # after rotation of angles, difference is 0 between the original # and the shifted angles theta_compare_rotated <- rotation(ref_var=theta_ref, shift_var=theta_compare) mean(circ_dist(theta_ref, theta_compare_rotated))
# a vector of angles theta_ref <- seq(0,2*pi, length.out=100) # shift the origin of theta_ref to pi theta_compare <- shift_origin(theta_ref, origin = pi) mean(circ_dist(theta_ref, theta_compare)) # after rotation of angles, difference is 0 between the original # and the shifted angles theta_compare_rotated <- rotation(ref_var=theta_ref, shift_var=theta_compare) mean(circ_dist(theta_ref, theta_compare_rotated))
Estimates cyclic trends of gene expression levels using training data.
cycle_npreg_insample(Y, theta, ncores = 2, polyorder = 2, method.trend = c("trendfilter", "loess", "bspline"))
cycle_npreg_insample(Y, theta, ncores = 2, polyorder = 2, method.trend = c("trendfilter", "loess", "bspline"))
Y |
A matrix of normalized and transformed gene expression values. Gene by sample. |
theta |
A vector of angles. |
ncores |
We use doParallel package for parallel computing. |
polyorder |
We estimate cyclic trends of gene expression levels using nonparamtric trend filtering. The default fits second degree polynomials. |
method.trend |
Varous methods that can be applied to estimate cyclic trend of gene expression levels. |
A list with four elements:
Y |
Gene expression marix. |
theta |
Vector of angles or cell cycle phases. |
sigma_est |
Estimated standard error of the cyclic trend for each gene. |
funs_est |
A list of functions for approximating the cyclic trends of gene express levels for each gene. |
Joyce Hsiao
cycle_npreg_mstep
for estimating cyclic functions given
inferred phases from cycle_npreg_loglik
,
cycle_npreg_outsample
for predicting cell cycle phase
using parameters learned from cycle_npreg_insample
Other peco classifier functions: cycle_npreg_loglik
,
cycle_npreg_mstep
,
cycle_npreg_outsample
,
initialize_grids
# see \code{\link{cycle_npreg_insample}}
# see \code{\link{cycle_npreg_insample}}
Infer angles or cell cycle phase based on gene expression data
cycle_npreg_loglik(Y, sigma_est, funs_est, grids = 100, method.grid = c("pca", "uniform"))
cycle_npreg_loglik(Y, sigma_est, funs_est, grids = 100, method.grid = c("pca", "uniform"))
Y |
Gene by sample expression matrix. |
sigma_est |
A vector of standard errors for each gene from the training data. |
funs_est |
A vector of cyclic functions estimated for each gene from the training data. |
grids |
number of bins to be selected along 0 to 2pi. |
method.grid |
The approach to initialize angles in the
computation. |
A list with the following three elements:
cell_times_est |
Inferred angles or cell cycle phases, NOT ordered. |
loglik_est |
Log-likelihood estimates for each gene. |
prob_per_cell_by_celltimes |
Probabilities of each cell belong to each bin. |
Joyce Hsiao
initialize_grids
for selecting
angles in cycle_npreg_loglik
,
cycle_npreg_mstep
for estimating cyclic functions given
inferred phases from cycle_npreg_loglik
,
cycle_npreg_outsample
for predicting cell cycle phase
using parameters learned from cycle_npreg_insample
Other peco classifier functions: cycle_npreg_insample
,
cycle_npreg_mstep
,
cycle_npreg_outsample
,
initialize_grids
This is used in both cycle_npreg_insample (training data fitting) and cycle_npreg_outsample (testing data prediction) to estimate cyclic trends of gene expression values. The function outputs for each gene standard error of the cyclic trend, cyclic function, and the estimated expression levels given the cyclic function.
cycle_npreg_mstep(Y, theta, method.trend = c("trendfilter", "loess", "bspline"), polyorder = 2, ncores = 2)
cycle_npreg_mstep(Y, theta, method.trend = c("trendfilter", "loess", "bspline"), polyorder = 2, ncores = 2)
Y |
Gene by sample expression matrix (log2CPM). |
theta |
Observed cell times. |
method.trend |
How to estimate cyclic trend of gene expression values?
We offer three options: 'trendfilter' ( |
polyorder |
We estimate cyclic trends of gene expression levels using nonparamtric trend filtering. The default fits second degree polynomials. |
ncores |
How many computing cores to use? We use doParallel package for parallel computing. |
A list with the following elements:
Y |
Input gene expression data. |
theta |
Input angles. |
mu_est |
Estimated expression levels given the cyclic function for each gene. |
sigma_est |
Estimated standard error of the cyclic trends for each gene |
funs |
Estimated cyclic functions |
Joyce Hsiao
cycle_npreg_insample
for estimating cyclic functions
given known phasesfrom training data,
cycle_npreg_outsample
for predicting cell cycle phase
using parameters learned from cycle_npreg_insample
Other peco classifier functions: cycle_npreg_insample
,
cycle_npreg_loglik
,
cycle_npreg_outsample
,
initialize_grids
Apply the estimates of cycle_npreg_insample to another gene expression dataset to infer an angle or cell cycle phase for each cell.
cycle_npreg_outsample(Y_test, sigma_est, funs_est, method.trend = c("trendfilter", "loess", "bspline"), normed = TRUE, polyorder = 2, method.grid = "uniform", ncores = 2, grids = 100, get_trend_estimates = FALSE)
cycle_npreg_outsample(Y_test, sigma_est, funs_est, method.trend = c("trendfilter", "loess", "bspline"), normed = TRUE, polyorder = 2, method.grid = "uniform", ncores = 2, grids = 100, get_trend_estimates = FALSE)
Y_test |
A SingleCellExperiment object. |
sigma_est |
Input from training data. A vector of gene-specific standard error of the cyclic trends. |
funs_est |
Input fron training data. A vector of cyclic functions estimating cyclic trends. |
method.trend |
Varous methods that can be applied to estimate cyclic trend of gene expression levels. |
normed |
Is the data already normalized? TRUE or FALSE. |
polyorder |
We estimate cyclic trends of gene expression levels using nonparamtric trend filtering. The default fits second degree polynomials. |
method.grid |
Method for defining bins along the circle. |
ncores |
We use doParallel package for parallel computing. |
grids |
number of bins to be selected along 0 to 2pi. |
get_trend_estimates |
To re-estimate the cylic trend based on the predicted cell cycle phase or not (T or F). Default FALSE. This step calls trendfilter and is computationally intensive. |
A list with the following elements:
Y |
The input gene expression marix. |
cell_times_est |
Inferred angles or cell cycle phases, NOT ordered. |
loglik_est |
Log-likelihood estimates for each gene. |
cell_times_reordered |
The inferred angles reordered (in ascending order). |
Y_reorded |
The input gene expression matrix reordered by cell_times_reordered. |
sigma_reordered |
Estimated standard error of the cyclic trend for each gene, reordered by cell_times_reordered. |
funs_reordered |
A list of functions for approximating the cyclic trends of gene express levels for each gene, reordered by cell_times_reordered. |
mu_reordered |
Estimated cyclic trend of gene expression values for each gene, reordered by cell_times_reordered. |
prob_per_cell_by_celltimes |
Probabilities of each cell belong to each bin. |
cycle_npreg_insample
for obtaining parameteres for
cyclic functions from training data,
cycle_npreg_loglik
for log-likehood at
angles between 0 to 2pi,
initialize_grids
for selecting
angles in cycle_npreg_loglik
,
cycle_npreg_mstep
for estimating cyclic functions given
inferred phases from cycle_npreg_loglik
Other peco classifier functions: cycle_npreg_insample
,
cycle_npreg_loglik
,
cycle_npreg_mstep
,
initialize_grids
# import data library(SingleCellExperiment) data(sce_top101genes) # select top 5 cyclic genes sce_top5 <- sce_top101genes[order(rowData(sce_top101genes)$pve_fucci, decreasing=TRUE)[1:5],] # Select samples from NA18511 for our prediction example coldata <- colData(sce_top5) which_samples_train <- rownames(coldata)[coldata$chip_id != "NA18511"] which_samples_predict <- rownames(coldata)[coldata$chip_id == "NA18511"] # learning cyclic functions of the genes using our training data sce_top5 <- data_transform_quantile(sce_top5) expr_quant <- assay(sce_top5, "cpm_quantNormed") Y_train <- expr_quant[, colnames(expr_quant) %in% which_samples_train] theta_train <- coldata$theta_shifted[rownames(coldata) %in% which_samples_train] names(theta_train) <- rownames(coldata)[rownames(coldata) %in% which_samples_train] # obtain cyclic function estimates model_5genes_train <- cycle_npreg_insample(Y = Y_train, theta = theta_train, polyorder=2, ncores=2, method.trend="trendfilter") # predict cell cycle model_5genes_predict <- cycle_npreg_outsample( Y_test=sce_top5[,colnames(sce_top5) %in% which_samples_predict], sigma_est=model_5genes_train$sigma_est, funs_est=model_5genes_train$funs_est, method.trend="trendfilter", ncores=2, get_trend_estimates=FALSE) # estimate cyclic gene expression levels given cell cycle for each gene predict_cyclic <- fit_cyclical_many(Y=assay(model_5genes_predict$Y,"cpm_quantNormed"), theta=colData(model_5genes_predict$Y)$cellcycle_peco) all.equal(names(predict_cyclic[[2]]), colnames(predict_cyclic[[1]])) par(mfrow=c(2,3), mar=c(4,4,3,1)) for (g in seq_along(rownames(model_5genes_predict$Y))) { plot(assay(model_5genes_predict$Y,"cpm_quantNormed")[ rownames(model_5genes_predict$Y)[g],], x=colData(model_5genes_predict$Y)$cellcycle_peco, axes=FALSE, xlab="FUCCI phase", ylab="Predicted phase") points(y=predict_cyclic$cellcycle_function[[ rownames(model_5genes_predict$Y)[g]]]( seq(0, 2*pi, length.out = 100)), x=seq(0, 2*pi, length.out = 100), pch=16, col="royalblue") axis(2); axis(1,at=c(0,pi/2, pi, 3*pi/2, 2*pi), labels=c(0,expression(pi/2), expression(pi), expression(3*pi/2), expression(2*pi))) abline(h=0, lty=1, col="black", lwd=.7) title(rownames(model_5genes_predict$Y_reordered)[g]) } title("Predicting cell cycle phase for NA18511", outer=TRUE)
# import data library(SingleCellExperiment) data(sce_top101genes) # select top 5 cyclic genes sce_top5 <- sce_top101genes[order(rowData(sce_top101genes)$pve_fucci, decreasing=TRUE)[1:5],] # Select samples from NA18511 for our prediction example coldata <- colData(sce_top5) which_samples_train <- rownames(coldata)[coldata$chip_id != "NA18511"] which_samples_predict <- rownames(coldata)[coldata$chip_id == "NA18511"] # learning cyclic functions of the genes using our training data sce_top5 <- data_transform_quantile(sce_top5) expr_quant <- assay(sce_top5, "cpm_quantNormed") Y_train <- expr_quant[, colnames(expr_quant) %in% which_samples_train] theta_train <- coldata$theta_shifted[rownames(coldata) %in% which_samples_train] names(theta_train) <- rownames(coldata)[rownames(coldata) %in% which_samples_train] # obtain cyclic function estimates model_5genes_train <- cycle_npreg_insample(Y = Y_train, theta = theta_train, polyorder=2, ncores=2, method.trend="trendfilter") # predict cell cycle model_5genes_predict <- cycle_npreg_outsample( Y_test=sce_top5[,colnames(sce_top5) %in% which_samples_predict], sigma_est=model_5genes_train$sigma_est, funs_est=model_5genes_train$funs_est, method.trend="trendfilter", ncores=2, get_trend_estimates=FALSE) # estimate cyclic gene expression levels given cell cycle for each gene predict_cyclic <- fit_cyclical_many(Y=assay(model_5genes_predict$Y,"cpm_quantNormed"), theta=colData(model_5genes_predict$Y)$cellcycle_peco) all.equal(names(predict_cyclic[[2]]), colnames(predict_cyclic[[1]])) par(mfrow=c(2,3), mar=c(4,4,3,1)) for (g in seq_along(rownames(model_5genes_predict$Y))) { plot(assay(model_5genes_predict$Y,"cpm_quantNormed")[ rownames(model_5genes_predict$Y)[g],], x=colData(model_5genes_predict$Y)$cellcycle_peco, axes=FALSE, xlab="FUCCI phase", ylab="Predicted phase") points(y=predict_cyclic$cellcycle_function[[ rownames(model_5genes_predict$Y)[g]]]( seq(0, 2*pi, length.out = 100)), x=seq(0, 2*pi, length.out = 100), pch=16, col="royalblue") axis(2); axis(1,at=c(0,pi/2, pi, 3*pi/2, 2*pi), labels=c(0,expression(pi/2), expression(pi), expression(3*pi/2), expression(2*pi))) abline(h=0, lty=1, col="black", lwd=.7) title(rownames(model_5genes_predict$Y_reordered)[g]) } title("Predicting cell cycle phase for NA18511", outer=TRUE)
For each gene, transform counts to CPM and then to a normal distribution.
data_transform_quantile(sce, ncores = 2)
data_transform_quantile(sce, ncores = 2)
sce |
SingleCellExperiment Object. |
ncores |
We use doParallel package for parallel computing. |
SingleCellExperiment Object with an added slot of cpm_quant, cpm slot is added if it doesn't exist.
Joyce Hsiao
# use our data library(SingleCellExperiment) data(sce_top101genes) # perform CPM normalization using scater, and # quantile-normalize the CPM values of each gene to normal distribution sce_top101genes <- data_transform_quantile(sce_top101genes, ncores=2) plot(y=assay(sce_top101genes, "cpm_quantNormed")[1,], x=assay(sce_top101genes, "cpm")[1,], xlab = "CPM bbefore quantile-normalization", ylab = "CPM after quantile-normalization")
# use our data library(SingleCellExperiment) data(sce_top101genes) # perform CPM normalization using scater, and # quantile-normalize the CPM values of each gene to normal distribution sce_top101genes <- data_transform_quantile(sce_top101genes, ncores=2) plot(y=assay(sce_top101genes, "cpm_quantNormed")[1,], x=assay(sce_top101genes, "cpm")[1,], xlab = "CPM bbefore quantile-normalization", ylab = "CPM after quantile-normalization")
Use bsplies to cyclic trend of gene expression levels
fit_bspline(yy, time)
fit_bspline(yy, time)
yy |
A vector of gene expression values for one gene. The expression values are assumed to have been normalized and transformed to standard normal distribution. |
time |
A vector of angels (cell cycle phase). |
A list with one element, pred.yy
, giving the
estimated cyclic trend.
Joyce Hsiao
library(SingleCellExperiment) data(sce_top101genes) # select top 10 cyclic genes sce_top10 <- sce_top101genes[order(rowData(sce_top101genes)$pve_fucci, decreasing=TRUE)[1:10],] coldata <- colData(sce_top10) # cell cycle phase based on FUCCI scores theta <- coldata$theta names(theta) <- rownames(coldata) # normalize expression counts sce_top10 <- data_transform_quantile(sce_top10, ncores=2) exprs_quant <- assay(sce_top10, "cpm_quantNormed") # order FUCCI phase and expression theta_ordered <- theta[order(theta)] yy_ordered <- exprs_quant[1, names(theta_ordered)] fit <- fit_bspline(yy_ordered, time=theta_ordered) plot(x=theta_ordered, y=yy_ordered, pch=16, cex=.7, axes=FALSE, ylab="quantile-normalized expression values", xlab="FUCCI phase", main = "bspline fit") points(x=theta_ordered, y=fit$pred.yy, col="blue", pch=16, cex=.7) axis(2) axis(1,at=c(0,pi/2, pi, 3*pi/2, 2*pi), labels=c(0,expression(pi/2), expression(pi), expression(3*pi/2), expression(2*pi))) abline(h=0, lty=1, col="black", lwd=.7)
library(SingleCellExperiment) data(sce_top101genes) # select top 10 cyclic genes sce_top10 <- sce_top101genes[order(rowData(sce_top101genes)$pve_fucci, decreasing=TRUE)[1:10],] coldata <- colData(sce_top10) # cell cycle phase based on FUCCI scores theta <- coldata$theta names(theta) <- rownames(coldata) # normalize expression counts sce_top10 <- data_transform_quantile(sce_top10, ncores=2) exprs_quant <- assay(sce_top10, "cpm_quantNormed") # order FUCCI phase and expression theta_ordered <- theta[order(theta)] yy_ordered <- exprs_quant[1, names(theta_ordered)] fit <- fit_bspline(yy_ordered, time=theta_ordered) plot(x=theta_ordered, y=yy_ordered, pch=16, cex=.7, axes=FALSE, ylab="quantile-normalized expression values", xlab="FUCCI phase", main = "bspline fit") points(x=theta_ordered, y=fit$pred.yy, col="blue", pch=16, cex=.7) axis(2) axis(1,at=c(0,pi/2, pi, 3*pi/2, 2*pi), labels=c(0,expression(pi/2), expression(pi), expression(3*pi/2), expression(2*pi))) abline(h=0, lty=1, col="black", lwd=.7)
We applied quadratic (second order) trend filtering using the trendfilter function in the genlasso package (Tibshirani, 2014). The trendfilter function implements a nonparametric smoothing method which chooses the smoothing parameter by cross-validation and fits a piecewise polynomial regression. In more specifics: The trendfilter method determines the folds in cross-validation in a nonrandom manner. Every k-th data point in the ordered sample is placed in the k-th fold, so the folds contain ordered subsamples. We applied five-fold cross-validation and chose the smoothing penalty using the option lambda.1se: among all possible values of the penalty term, the largest value such that the cross-validation standard error is within one standard error of the minimum. Furthermore, we desired that the estimated expression trend be cyclical. To encourage this, we concatenated the ordered gene expression data three times, with one added after another. The quadratic trend filtering was applied to the concatenated data series of each gene.
fit_cyclical_many(Y, theta, polyorder = 2, ncores = 2)
fit_cyclical_many(Y, theta, polyorder = 2, ncores = 2)
Y |
A matrix (gene by sample) of gene expression values. The expression values are assumed to have been normalized and transformed to standard normal distribution. |
theta |
A vector of cell cycle phase (angles) for single-cell samples. |
polyorder |
We estimate cyclic trends of gene expression levels using nonparamtric trend filtering. The default fits second degree polynomials. |
ncores |
doParallel package is used to perform parallel computing to reduce computational time. |
A list containing the following objects
predict.yy |
A matrix of predicted expression values at observed cell cycle. |
cellcycle_peco_ordered |
A vector of predicted cell cycle. The values range between 0 to 2pi |
cellcycle_function |
A list of predicted cell cycle functions. |
pve |
A vector of proportion of variance explained in each gene by the predicted cell cycle. |
Joyce Hsiao
fit_trendfilter
for fitting one gene
using trendfilter
library(SingleCellExperiment) data(sce_top101genes) # select top 10 cyclic genes sce_top10 <- sce_top101genes[order(rowData(sce_top101genes)$pve_fucci, decreasing=TRUE)[1:10],] coldata <- colData(sce_top10) # cell cycle phase based on FUCCI scores theta <- coldata$theta names(theta) <- rownames(coldata) # normalize expression counts sce_top10 <- data_transform_quantile(sce_top10, ncores=2) exprs_quant <- assay(sce_top10, "cpm_quantNormed") # order FUCCI phase and expression theta_ordered <- theta[order(theta)] yy_ordered <- exprs_quant[, names(theta_ordered)] fit <- fit_cyclical_many(Y=yy_ordered, theta=theta_ordered)
library(SingleCellExperiment) data(sce_top101genes) # select top 10 cyclic genes sce_top10 <- sce_top101genes[order(rowData(sce_top101genes)$pve_fucci, decreasing=TRUE)[1:10],] coldata <- colData(sce_top10) # cell cycle phase based on FUCCI scores theta <- coldata$theta names(theta) <- rownames(coldata) # normalize expression counts sce_top10 <- data_transform_quantile(sce_top10, ncores=2) exprs_quant <- assay(sce_top10, "cpm_quantNormed") # order FUCCI phase and expression theta_ordered <- theta[order(theta)] yy_ordered <- exprs_quant[, names(theta_ordered)] fit <- fit_cyclical_many(Y=yy_ordered, theta=theta_ordered)
Use loess to estimate cyclic trends of expression values
fit_loess(yy, time)
fit_loess(yy, time)
yy |
A vector of gene expression values for one gene. The expression values are assumed to have been normalized and transformed to standard normal distribution. |
time |
A vector of angles (cell cycle phase). |
A list with one element, pred.yy
, giving the
estimated cyclic trend.
Joyce Hsiao
library(SingleCellExperiment) data(sce_top101genes) # select top 10 cyclic genes sce_top10 <- sce_top101genes[order(rowData(sce_top101genes)$pve_fucci, decreasing=TRUE)[1:10],] coldata <- colData(sce_top10) # cell cycle phase based on FUCCI scores theta <- coldata$theta names(theta) <- rownames(coldata) # normalize expression counts sce_top10 <- data_transform_quantile(sce_top10, ncores=2) exprs_quant <- assay(sce_top10, "cpm_quantNormed") # order FUCCI phase and expression theta_ordered <- theta[order(theta)] yy_ordered <- exprs_quant[1, names(theta_ordered)] fit <- fit_loess(yy_ordered, time=theta_ordered) plot(x=theta_ordered, y=yy_ordered, pch=16, cex=.7, axes=FALSE, ylab="quantile-normalized expression values", xlab="FUCCI phase", main = "loess fit") points(x=theta_ordered, y=fit$pred.yy, col="blue", pch=16, cex=.7) axis(2) axis(1,at=c(0,pi/2, pi, 3*pi/2, 2*pi), labels=c(0,expression(pi/2), expression(pi), expression(3*pi/2), expression(2*pi))) abline(h=0, lty=1, col="black", lwd=.7)
library(SingleCellExperiment) data(sce_top101genes) # select top 10 cyclic genes sce_top10 <- sce_top101genes[order(rowData(sce_top101genes)$pve_fucci, decreasing=TRUE)[1:10],] coldata <- colData(sce_top10) # cell cycle phase based on FUCCI scores theta <- coldata$theta names(theta) <- rownames(coldata) # normalize expression counts sce_top10 <- data_transform_quantile(sce_top10, ncores=2) exprs_quant <- assay(sce_top10, "cpm_quantNormed") # order FUCCI phase and expression theta_ordered <- theta[order(theta)] yy_ordered <- exprs_quant[1, names(theta_ordered)] fit <- fit_loess(yy_ordered, time=theta_ordered) plot(x=theta_ordered, y=yy_ordered, pch=16, cex=.7, axes=FALSE, ylab="quantile-normalized expression values", xlab="FUCCI phase", main = "loess fit") points(x=theta_ordered, y=fit$pred.yy, col="blue", pch=16, cex=.7) axis(2) axis(1,at=c(0,pi/2, pi, 3*pi/2, 2*pi), labels=c(0,expression(pi/2), expression(pi), expression(3*pi/2), expression(2*pi))) abline(h=0, lty=1, col="black", lwd=.7)
We applied quadratic (second order) trend filtering using the trendfilter function in the genlasso package (Tibshirani, 2014). The trendfilter function implements a nonparametric smoothing method which chooses the smoothing parameter by cross-validation and fits a piecewise polynomial regression. In more specifics: The trendfilter method determines the folds in cross-validation in a nonrandom manner. Every k-th data point in the ordered sample is placed in the k-th fold, so the folds contain ordered subsamples. We applied five-fold cross-validation and chose the smoothing penalty using the option lambda.1se: among all possible values of the penalty term, the largest value such that the cross-validation standard error is within one standard error of the minimum. Furthermore, we desired that the estimated expression trend be cyclical. To encourage this, we concatenated the ordered gene expression data three times, with one added after another. The quadratic trend filtering was applied to the concatenated data series of each gene.
fit_trendfilter(yy, polyorder = 2)
fit_trendfilter(yy, polyorder = 2)
yy |
A vector of gene expression values for one gene that are ordered by cell cycle phase. Also, the expression values are normalized and transformed to standard normal distribution. |
polyorder |
We estimate cyclic trends of gene expression levels using nonparamtric trend filtering. The default fits second degree polynomials. |
A list with two elements:
trend.yy |
The estimated cyclic trend. |
pve |
Proportion of variance explained by the cyclic trend in the gene expression levels. |
Joyce Hsiao
library(SingleCellExperiment) data(sce_top101genes) # select top 10 cyclic genes sce_top10 <- sce_top101genes[order(rowData(sce_top101genes)$pve_fucci, decreasing=TRUE)[1:10],] coldata <- colData(sce_top10) # cell cycle phase based on FUCCI scores theta <- coldata$theta names(theta) <- rownames(coldata) # normalize expression counts sce_top10 <- data_transform_quantile(sce_top10, ncores=2) exprs_quant <- assay(sce_top10, "cpm_quantNormed") # order FUCCI phase and expression theta_ordered <- theta[order(theta)] yy_ordered <- exprs_quant[1, names(theta_ordered)] fit <- fit_trendfilter(yy_ordered) plot(x=theta_ordered, y=yy_ordered, pch=16, cex=.7, axes=FALSE, ylab="quantile-normalized expression values", xlab="FUCCI phase", main = "trendfilter fit") points(x=theta_ordered, y=fit$trend.yy, col="blue", pch=16, cex=.7) axis(2) axis(1,at=c(0,pi/2, pi, 3*pi/2, 2*pi), labels=c(0,expression(pi/2), expression(pi), expression(3*pi/2), expression(2*pi))) abline(h=0, lty=1, col="black", lwd=.7)
library(SingleCellExperiment) data(sce_top101genes) # select top 10 cyclic genes sce_top10 <- sce_top101genes[order(rowData(sce_top101genes)$pve_fucci, decreasing=TRUE)[1:10],] coldata <- colData(sce_top10) # cell cycle phase based on FUCCI scores theta <- coldata$theta names(theta) <- rownames(coldata) # normalize expression counts sce_top10 <- data_transform_quantile(sce_top10, ncores=2) exprs_quant <- assay(sce_top10, "cpm_quantNormed") # order FUCCI phase and expression theta_ordered <- theta[order(theta)] yy_ordered <- exprs_quant[1, names(theta_ordered)] fit <- fit_trendfilter(yy_ordered) plot(x=theta_ordered, y=yy_ordered, pch=16, cex=.7, axes=FALSE, ylab="quantile-normalized expression values", xlab="FUCCI phase", main = "trendfilter fit") points(x=theta_ordered, y=fit$trend.yy, col="blue", pch=16, cex=.7) axis(2) axis(1,at=c(0,pi/2, pi, 3*pi/2, 2*pi), labels=c(0,expression(pi/2), expression(pi), expression(3*pi/2), expression(2*pi))) abline(h=0, lty=1, col="black", lwd=.7)
For prediction, initialize grid points for cell cycle phase on a circle.
initialize_grids(Y, grids = 100, method.grid = c("pca", "uniform"))
initialize_grids(Y, grids = 100, method.grid = c("pca", "uniform"))
Y |
Gene expression matrix. Gene by sample. |
grids |
number of bins to be selected along 0 to 2pi. |
method.grid |
The approach to initialize angles in the
computation. |
A vector of initialized angles to be used in
cycle_npreg_loglik
to infer angles.
Joyce Hsiao
cycle_npreg_loglik
for log-likehood at
angles between 0 to 2pi,
cycle_npreg_mstep
for estimating cyclic functions given
inferred phases from cycle_npreg_loglik
,
cycle_npreg_outsample
for predicting cell cycle phase
using parameters learned from cycle_npreg_insample
Other peco classifier functions: cycle_npreg_insample
,
cycle_npreg_loglik
,
cycle_npreg_mstep
,
cycle_npreg_outsample
We use FUCCI intensities to infer the position of the cells in cell cycle progression. The result is a vector of angles on a unit circle corresponding to the positions of the cells in cell cycle progression.
intensity2circle(mat, plot.it = FALSE, method = c("trig", "algebraic"))
intensity2circle(mat, plot.it = FALSE, method = c("trig", "algebraic"))
mat |
A matrix of two columns of summarized fluorescence intensity. Rows correspond to samples. |
plot.it |
TRUE or FALSE. Plot the fitted results. |
method |
The method used to fit the circle. |
The inferred angles on unit circle based on the input intensity measurements.
Joyce Hsiao
# use our data library(SingleCellExperiment) data(sce_top101genes) # FUCCI scores - log10 transformed sum of intensities that were # corrected for background noise ints <- colData(sce_top101genes)[,c("rfp.median.log10sum.adjust", "gfp.median.log10sum.adjust")] intensity2circle(ints, plot.it=TRUE, method = "trig")
# use our data library(SingleCellExperiment) data(sce_top101genes) # FUCCI scores - log10 transformed sum of intensities that were # corrected for background noise ints <- colData(sce_top101genes)[,c("rfp.median.log10sum.adjust", "gfp.median.log10sum.adjust")] intensity2circle(ints, plot.it=TRUE, method = "trig")
Pre-computed results. Applied cycle_npreg_outsample and results stored in model_5genes_train to predict cell cycle phase for single-cell samples of NA19098. The predicted cell cycle is stored as variable cellcycle_peco.
data(model_5genes_predict)
data(model_5genes_predict)
A list with the follwing elements
Predict cell cycle, the values ranged between 0 to 2pi
Pre-computed results. Applied cycle_npreg_insample to obtain gene-specific cyclic trend parameters using samples from 5 individuals
data(model_5genes_train)
data(model_5genes_train)
A list with the follwing elements
a data.frame (gene by sample) of quantile-normailzed gene expression values
a vector of cell cycl phase values (range between 0 to 2pi)
a vector of estimated standard errors
a list of estimated cyclic functions
Because the origin of the cell cycle phases is arbitrary, we transform the angles prior to computing the distance (rotation and shifting) to minimize the distance between two vectors. After this, one can apply circ_dist to compute the distance between the output value and ref_var.
rotation(ref_var, shift_var)
rotation(ref_var, shift_var)
ref_var |
A vector of reference angles. |
shift_var |
A vector of angles to be compared to ref_var. |
The transformed values of shift_var after rotation and shifting.
Matthew Stephens
# a vector of angles theta_ref <- seq(0,2*pi, length.out=100) # shift the origin of theta_ref to pi theta_compare <- shift_origin(theta_ref, origin = pi) # rotate theta_compare in a such a way that the distance # between theta_ref and thet_compare is minimized theta_compare_rotated <- rotation(ref_var=theta_ref, shift_var=theta_compare) par(mofrow=c(1,2)) plot(x=theta_ref, y = theta_compare) plot(x=theta_ref, y = theta_compare_rotated)
# a vector of angles theta_ref <- seq(0,2*pi, length.out=100) # shift the origin of theta_ref to pi theta_compare <- shift_origin(theta_ref, origin = pi) # rotate theta_compare in a such a way that the distance # between theta_ref and thet_compare is minimized theta_compare_rotated <- rotation(ref_var=theta_ref, shift_var=theta_compare) par(mofrow=c(1,2)) plot(x=theta_ref, y = theta_compare) plot(x=theta_ref, y = theta_compare_rotated)
A SingleCellExperiment object (require SingleCellExperiment package) including molecule count data after gene and smaple filtering. The 'colData()' slot contains sample phenotype information and the 'rowData()' slot contains gene feature information.
data(sce_top101genes)
data(sce_top101genes)
A SingleCellExperiment object with 888 samples and the 101 significant cyclic genes,
Inferred angles of each cell along a circle, also known as FUCCI phase.
Shift origin of the angles for visualization
shift_origin(phase, origin)
shift_origin(phase, origin)
phase |
A vector of angles (in radians). |
origin |
the new origin of the angles. |
A vector of angles shifted to the new origin.
Joyce Hsiao
# make a vector of angles theta <- seq(0,2*pi, length.out=100) # shift the origin of theta to pi theta_shifted <- shift_origin(theta, origin = pi) plot(x=theta, y = theta_shifted)
# make a vector of angles theta <- seq(0,2*pi, length.out=100) # shift the origin of theta to pi theta_shifted <- shift_origin(theta, origin = pi) plot(x=theta, y = theta_shifted)
Pre-computed results. Applied fit_cyclic_many to 888 single-cell samples that have both normalized gene expression values and cell cycle labels to obtain training results that can be used as input for predicting cell cycle phase in other data.
data(training_human)
data(training_human)
A list with the follwing elements
Estimated cyclic expression values in the training data
Training labels ordered from 0 to 2pi
Nonparametric function of cyclic gene expression trend obtained by trendfilter function in genlasso
Proportion of variance explained in each gene by the cell cycl phase label