Title: | Tools for Single-Cell Analysis |
---|---|
Description: | Provides methods to perform trajectory analysis based on a minimum spanning tree constructed from cluster centroids. Computes pseudotemporal cell orderings by mapping cells in each cluster (or new cells) to the closest edge in the tree. Uses linear modelling to identify differentially expressed genes along each path through the tree. Several plotting and interactive visualization functions are also implemented. |
Authors: | Zhicheng Ji [aut, cre], Hongkai Ji [aut], Aaron Lun [ctb] |
Maintainer: | Zhicheng Ji <[email protected]> |
License: | GPL(>=2) |
Version: | 1.45.0 |
Built: | 2024-10-31 06:28:14 UTC |
Source: | https://github.com/bioc/TSCAN |
testing differentially expressed genes
difftest(data, TSCANorder, df = 3)
difftest(data, TSCANorder, df = 3)
data |
The raw single_cell data, which is a numeric matrix or data.frame. Rows represent genes/features and columns represent single cells. |
TSCANorder |
The TSCAN ordering generated by function |
df |
Numeric value specifying the degree of freedom used in the GAM model. |
This function tests whether a gene is significantly expressed given pseudotime ordering. Likelihood ratio test is performed to compare a generalized additive model (GAM) with a constant fit to get the p-values. The p-values are adjusted for multiple testing by fdr.
Data frame containing pvalues and qvalues of testing differentially expression.
Zhicheng Ji, Hongkai Ji <[email protected]>
data(lpsdata) procdata <- preprocess(lpsdata) lpsorder <- TSCANorder(exprmclust(procdata)) diffval <- difftest(procdata,lpsorder) #Selected differentially expressed genes under qvlue cutoff of 0.05 row.names(diffval)[diffval$qval < 0.05]
data(lpsdata) procdata <- preprocess(lpsdata) lpsorder <- TSCANorder(exprmclust(procdata)) diffval <- difftest(procdata,lpsorder) #Selected differentially expressed genes under qvlue cutoff of 0.05 row.names(diffval)[diffval$qval < 0.05]
Perform model-based clustering on expression values
exprmclust(data, clusternum = 2:9, modelNames = "VVV", reduce = T)
exprmclust(data, clusternum = 2:9, modelNames = "VVV", reduce = T)
data |
The raw single_cell data, which is a numeric matrix or data.frame. Rows represent genes/features and columns represent single cells. |
clusternum |
An integer vector specifying all possible cluster numbers. The best cluster number will be picked using BIC. The minimum value should be two other |
modelNames |
model to be used in model-based clustering. By default "ellipsoidal, varying volume, shape, and orientation" is used. |
reduce |
Whether to perform the PCA on the expression data. |
By default, this function first uses principal component analysis (PCA) to reduce dimensionality of original data. It then performs model-based clustering on the transformed expression values. A minimum-spanning-tree is constructed to link the cluster centers. The clustering results will be used for TSCAN ordering.
if more than one cluster detected, a list containing
pcareduceres Numeric matrix containing the transformed expression values after PCA.
MSTtree igraph object which is the result of constructing MST.
clusterid A named vector specifying which cluster the cells belong to.
clucenter Numeric matrix of the cluster centers.
if only one cluster detected, a list containing
pcareduceres Numeric matrix containing the transformed expression values after PCA.
Zhicheng Ji, Hongkai Ji <[email protected]>
Fraley, C., & Raftery, A. E. (2002). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association, 97(458), 611-631.
data(lpsdata) procdata <- preprocess(lpsdata) exprmclust(procdata)
data(lpsdata) procdata <- preprocess(lpsdata) exprmclust(procdata)
The dataset contains 16776 rows and 131 columns. Each row represent a gene and each column represent a single cell. This dataset is a subset of single-cell RNA-seq data provided by GEO GSE48968. Only unstimulated cells and cells after 6h of LPS stimulation are retained for the purpose of demonstration. Genes which have raw expression values of greater than zero in at least one cell are retained. For the original dataset please refer to GSE48968 on GEO (http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE48968).
A matrix with 16776 rows and 131 variables
http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE48968
Shalek, A. K., Satija, R., Shuga, J., Trombetta, J. J., Gennert, D., Lu, D., ... & Regev, A. (2014). Single-cell RNA-seq reveals dynamic paracrine control of cellular variation. Nature.
Map each cell to the closest edge on the MST, reporting also the distance to the corresponding vertices.
mapCellsToEdges(x, ...) ## S4 method for signature 'ANY' mapCellsToEdges(x, mst, clusters, columns = NULL) ## S4 method for signature 'SummarizedExperiment' mapCellsToEdges(x, ..., assay.type = "logcounts") ## S4 method for signature 'SingleCellExperiment' mapCellsToEdges( x, clusters = colLabels(x, onAbsence = "error"), ..., use.dimred = NULL )
mapCellsToEdges(x, ...) ## S4 method for signature 'ANY' mapCellsToEdges(x, mst, clusters, columns = NULL) ## S4 method for signature 'SummarizedExperiment' mapCellsToEdges(x, ..., assay.type = "logcounts") ## S4 method for signature 'SingleCellExperiment' mapCellsToEdges( x, clusters = colLabels(x, onAbsence = "error"), ..., use.dimred = NULL )
x |
A numeric matrix of coordinates where each row represents a cell/sample and each column represents a dimension (usually a PC or another low-dimensional embedding, but features or genes can also be used). Alternatively, a SummarizedExperiment or SingleCellExperiment object
containing such a matrix in its Alternatively, for SingleCellExperiments, this matrix may be extracted from its |
... |
For the generic, further arguments to pass to the specific methods. For the SummarizedExperiment method, further arguments to pass to the ANY method. For the SingleCellExperiment method, further arguments to pass to the SummarizedExperiment method
(if |
mst |
A graph object containing a MST,
constructed from the same coordinate space as the values in |
clusters |
A factor-like object of the same length as |
columns |
A character, logical or integer vector specifying the columns of |
assay.type |
An integer or string specifying the assay to use from a SummarizedExperiment |
use.dimred |
An integer or string specifying the reduced dimensions to use from a SingleCellExperiment |
For each cluster, we consider all edges of the MST involving that cluster. Each cell of that cluster is then mapped to the closest of these edges (where proximity is defined by Euclidean distance). The identity of and distance from each ends of the edge is reported; this can be useful for downstream pseudo-time calculations or to subset cells by those lying on a particular edge.
If clusters=NULL
, each cell can be mapped to any edge of the MST.
This is useful if the mst
was constructed from a different set of cells than those in x
,
allowing us to effectively project new datasets onto an existing MST.
Note, however, that the new x
must lie in the same coordinate space as the x
used to make mst
.
Some cells may simply be mapped to the edge endpoints. This manifests as values of zero for the distances from either end of the edge. For analyses focusing on a specific edge, it may be advisable to filter out such cells as their edge assignments are arbitrarily assigned and they do not contribute to any transitional process along the edge.
A DataFrame with one row per row of x
, containing the fields:
left.cluster
, the cluster on one end of the edge to which the cell was assigned.
right.cluster
, the cluster on the other end of the edge to which the cell was assigned.
left.distance
, the distance to the cluster centroid on one end.
right.distance
, the distance to the cluster centroid on the other end.
Note that the sum of the distances will always yield the edge length.
Aaron Lun
Ji Z and Ji H (2016). TSCAN: Pseudo-time reconstruction and evaluation in single-cell RNA-seq analysis. Nucleic Acids Res. 44, e117
createClusterMST
, to generate mst
.
quickPseudotime
, a wrapper to quickly perform these calculations.
# Mocking up a Y-shaped trajectory. centers <- rbind(c(0,0), c(0, -1), c(1, 1), c(-1, 1)) rownames(centers) <- seq_len(nrow(centers)) clusters <- sample(nrow(centers), 1000, replace=TRUE) cells <- centers[clusters,] cells <- cells + rnorm(length(cells), sd=0.5) # Creating the MST first: mst <- createClusterMST(cells, clusters=clusters) plot(mst) # Mapping cells to the MST: mapping <- mapCellsToEdges(cells, mst, clusters=clusters) head(mapping) # Also works with some random extra cells: extras <- matrix(rnorm(1000), ncol=2) emapping <- mapCellsToEdges(extras, mst, clusters=NULL) head(emapping)
# Mocking up a Y-shaped trajectory. centers <- rbind(c(0,0), c(0, -1), c(1, 1), c(-1, 1)) rownames(centers) <- seq_len(nrow(centers)) clusters <- sample(nrow(centers), 1000, replace=TRUE) cells <- centers[clusters,] cells <- cells + rnorm(length(cells), sd=0.5) # Creating the MST first: mst <- createClusterMST(cells, clusters=clusters) plot(mst) # Mapping cells to the MST: mapping <- mapCellsToEdges(cells, mst, clusters=clusters) head(mapping) # Also works with some random extra cells: extras <- matrix(rnorm(1000), ncol=2) emapping <- mapCellsToEdges(extras, mst, clusters=NULL) head(emapping)
Compute a pseudotime for each cell lying on each path through the MST from a given starting node.
orderCells(mapping, mst, start = NULL)
orderCells(mapping, mst, start = NULL)
mapping |
A DataFrame of MST-mapping information for each cell,
usually obtained by running |
mst |
A graph object containing a MST from |
start |
A character vector specifying the starting node from which to compute pseudotimes in each component of |
The pseudotimes are returned as a matrix where each row corresponds to cell in x
and each column corresponds to a path through the MST from start
to all nodes of degree 1.
(If start
is itself a node of degree 1, then paths are only considered to all other such nodes.)
This format is inspired by that from the slingshot package and provides a compact representation of branching events.
Each branching event in the MST results in a new path and thus a new column in the pseudotime matrix.
An NA
entry for a cell indicates that it is not assigned to that particular path.
All non-NA
entries for any given cell are guaranteed to be identical.
This reflects the fact that multiple paths will share a section of the MST for which the pseudotimes are the same.
If start=NULL
, the starting node is completely arbitrarily chosen as directionality is impossible to infer from the expression matrix alone.
However, it is often possible to use prior biological knowledge to pick an appropriate cluster as the starting node.
A PseudotimeOrdering object where rows are cells and columns are paths through mst
.
The first entry of pathStats
contains a numeric matrix with the pseudotimes of each cell in each path.
The cellData
contains mapping
and the metadata
contains the chosen start
.
Aaron Lun
Ji Z and Ji H (2016). TSCAN: Pseudo-time reconstruction and evaluation in single-cell RNA-seq analysis. Nucleic Acids Res. 44, e117
mapCellsToEdges
, to compute mapping
.
quickPseudotime
, a wrapper to quickly perform these calculations.
# Mocking up a Y-shaped trajectory. centers <- rbind(c(0,0), c(0, -1), c(1, 1), c(-1, 1)) rownames(centers) <- seq_len(nrow(centers)) clusters <- sample(nrow(centers), 1000, replace=TRUE) cells <- centers[clusters,] cells <- cells + rnorm(length(cells), sd=0.5) # Creating the MST and mapping the cells. mst <- createClusterMST(cells, clusters=clusters) mapping <- mapCellsToEdges(cells, mst, clusters=clusters) # Obtaining pseudo-time orderings. ordering <- orderCells(mapping, mst) unified <- rowMeans(pathStat(ordering), na.rm=TRUE) plot(cells[,1], cells[,2], col=topo.colors(21)[cut(unified, 21)], pch=16)
# Mocking up a Y-shaped trajectory. centers <- rbind(c(0,0), c(0, -1), c(1, 1), c(-1, 1)) rownames(centers) <- seq_len(nrow(centers)) clusters <- sample(nrow(centers), 1000, replace=TRUE) cells <- centers[clusters,] cells <- cells + rnorm(length(cells), sd=0.5) # Creating the MST and mapping the cells. mst <- createClusterMST(cells, clusters=clusters) mapping <- mapCellsToEdges(cells, mst, clusters=clusters) # Obtaining pseudo-time orderings. ordering <- orderCells(mapping, mst) unified <- rowMeans(pathStat(ordering), na.rm=TRUE) plot(cells[,1], cells[,2], col=topo.colors(21)[cut(unified, 21)], pch=16)
Calculate pseudotemporal ordering scores for orders
orderscore(subpopulation, orders)
orderscore(subpopulation, orders)
subpopulation |
Data frame with two columns. First column: cell names. Second column: sub-population codes. |
orders |
A list with various length containing pseudotime orderings. |
This function calculates pseudotemporal ordering scores (POS) based on the sub-population information and order information given by users. Cells should come from at least two cell sub-populations. These sub-population should be coded as 0,1,2,...
a numeric vector of calculated POS.
Zhicheng Ji, Hongkai Ji <[email protected]>
data(lpsdata) procdata <- preprocess(lpsdata) subpopulation <- data.frame(cell = colnames(procdata), sub = ifelse(grepl("Unstimulated",colnames(procdata)),0,1), stringsAsFactors = FALSE) lpsmclust <- exprmclust(procdata) #Comparing default TSCAN ordering and tuned TSCAN ordering order1 <- TSCANorder(lpsmclust) order2 <- TSCANorder(lpsmclust, c(1,2,3)) orders <- list(order1,order2) orderscore(subpopulation, orders)
data(lpsdata) procdata <- preprocess(lpsdata) subpopulation <- data.frame(cell = colnames(procdata), sub = ifelse(grepl("Unstimulated",colnames(procdata)),0,1), stringsAsFactors = FALSE) lpsmclust <- exprmclust(procdata) #Comparing default TSCAN ordering and tuned TSCAN ordering order1 <- TSCANorder(lpsmclust) order2 <- TSCANorder(lpsmclust, c(1,2,3)) orders <- list(order1,order2) orderscore(subpopulation, orders)
Compute the entropy of each cell, using this as a proxy for the differentiation status.
perCellEntropy(x, ...) ## S4 method for signature 'ANY' perCellEntropy(x, BPPARAM = NULL) ## S4 method for signature 'SummarizedExperiment' perCellEntropy(x, ..., assay.type = "counts")
perCellEntropy(x, ...) ## S4 method for signature 'ANY' perCellEntropy(x, BPPARAM = NULL) ## S4 method for signature 'SummarizedExperiment' perCellEntropy(x, ..., assay.type = "counts")
x |
A numeric matrix-like object containing counts for each cell (column) and feature (row). Alternatively, a SummarizedExperiment object containing such a matrix. |
... |
For the generic, further arguments to pass to specific methods. For the SummarizedExperiment method, further arguments to pass to the ANY method. |
BPPARAM |
A BiocParallelParam object from BiocParallel, specifying how calculations should be parallelized. |
assay.type |
An integer or string specifying the assay to use from a SummarizedExperiment |
Entropy values are computed from the proportion of counts assigned to each feature within a given cell.
The central idea is that undifferentiated cells have higher entropies because they are not yet committed to a single lineage,
and thus have low but persistent activity of the transcriptional programs for all lineages.
The cluster with the highest entropy values can be used to determine the start
cluster in orderCells
.
A numeric vector of entropies for all cells in x
.
Cells with all-zero values in x
will be assigned NA
entropies.
Aaron Lun
Grun D et al. (2016). De novo prediction of stem cell identity using single-cell transcriptome data. Cell Stem Cell 19, 266-77
Gulati GS et al. (2020). Single-cell transcriptional diversity is a hallmark of developmental potential. Science 367, 405-11
Guo M et al. (2017) SLICE: determining cell differentiation and lineage based on single cell entropy. Nucleic Acids Res. 45, e54
sce <- scuttle::mockSCE() ent <- perCellEntropy(sce) summary(ent) # Compute average entropy over mock clusters. clusters <- sample(ncol(sce), 5) by.cluster <- split(ent, clusters) mean.cluster.ent <- vapply(by.cluster, mean, 0)
sce <- scuttle::mockSCE() ent <- perCellEntropy(sce) summary(ent) # Compute average entropy over mock clusters. clusters <- sample(ncol(sce), 5) by.cluster <- split(ent, clusters) mean.cluster.ent <- vapply(by.cluster, mean, 0)
Plot the model-based clustering results
plotmclust( mclustobj, x = 1, y = 2, MSTorder = NULL, show_tree = T, show_cell_names = T, cell_name_size = 3, markerexpr = NULL )
plotmclust( mclustobj, x = 1, y = 2, MSTorder = NULL, show_tree = T, show_cell_names = T, cell_name_size = 3, markerexpr = NULL )
mclustobj |
The exact output of |
x |
The column of data after dimension reduction to be plotted on the horizontal axis. |
y |
The column of data after dimension reduction to be plotted on the vertical axis. |
MSTorder |
The arbitrary order of cluster to be shown on the plot. |
show_tree |
Whether to show the links between cells connected in the minimum spanning tree. |
show_cell_names |
Whether to draw the name of each cell in the plot. |
cell_name_size |
The size of cell name labels if show_cell_names is TRUE. |
markerexpr |
The gene expression used to define the size of nodes. |
This function will plot the gene expression data after dimension reduction and show the clustering results.
A ggplot2 object.
Zhicheng Ji, Hongkai Ji <[email protected]>
data(lpsdata) procdata <- preprocess(lpsdata) lpsmclust <- exprmclust(procdata) plotmclust(lpsmclust)
data(lpsdata) procdata <- preprocess(lpsdata) lpsmclust <- exprmclust(procdata) plotmclust(lpsmclust)
preprocess the raw single-cell data
preprocess( data, clusternum = NULL, takelog = TRUE, logbase = 2, pseudocount = 1, minexpr_value = 1, minexpr_percent = 0.5, cvcutoff = 1 )
preprocess( data, clusternum = NULL, takelog = TRUE, logbase = 2, pseudocount = 1, minexpr_value = 1, minexpr_percent = 0.5, cvcutoff = 1 )
data |
The raw single_cell data, which is a numeric matrix or data.frame. Rows represent genes/features and columns represent single cells. |
clusternum |
The number of clusters for doing cluster, typically 5 percent of number of all genes. The clustering will be done after all the transformation and trimming. If NULL no clustering will be performed. |
takelog |
Logical value indicating whether to take logarithm |
logbase |
Numeric value specifiying base of logarithm |
pseudocount |
Numeric value to be added to the raw data when taking logarithm |
minexpr_value |
Numeric value specifying the minimum cutoff of log transformed (if takelog is TRUE) value |
minexpr_percent |
Numeric value specifying the lowest percentage of highly expressed cells (expression value bigger than minexpr_value) for the genes/features to be retained. |
cvcutoff |
Numeric value specifying the minimum value of coefficient of variance for the genes/features to be retained. |
This function first takes logarithm of the raw data and then filters out genes/features in which too many cells are low expressed. It also filters out genes/features with low coefficient of variance which indicates the genes/features does not contain much information. The default setting will first take log2 of the raw data after adding a pseudocount of 1. Then genes/features in which at least half of cells have expression values are greater than 1 and the coefficeints of variance across all cells are at least 1 are retained.
Matrix or data frame with the same format as the input dataset.
Zhicheng Ji, Hongkai Ji <[email protected]>
data(lpsdata) procdata <- preprocess(lpsdata)
data(lpsdata) procdata <- preprocess(lpsdata)
A convenience wrapper to quickly compute a minimum spanning tree (MST) on the cluster centroids to obtain a pseudotime ordering of the cells.
quickPseudotime(x, ...) ## S4 method for signature 'ANY' quickPseudotime(x, clusters, others = NULL, ..., start = NULL, columns = NULL) ## S4 method for signature 'SummarizedExperiment' quickPseudotime(x, ..., assay.type = "logcounts") ## S4 method for signature 'SingleCellExperiment' quickPseudotime( x, clusters = colLabels(x, onAbsence = "error"), ..., others = NULL, use.dimred = NULL, other.dimreds = TRUE )
quickPseudotime(x, ...) ## S4 method for signature 'ANY' quickPseudotime(x, clusters, others = NULL, ..., start = NULL, columns = NULL) ## S4 method for signature 'SummarizedExperiment' quickPseudotime(x, ..., assay.type = "logcounts") ## S4 method for signature 'SingleCellExperiment' quickPseudotime( x, clusters = colLabels(x, onAbsence = "error"), ..., others = NULL, use.dimred = NULL, other.dimreds = TRUE )
x |
A numeric matrix of coordinates where each row represents a cell/sample and each column represents a dimension (usually a PC or another low-dimensional embedding, but features or genes can also be used). Alternatively, a SummarizedExperiment or SingleCellExperiment object
containing such a matrix in its Alternatively, for SingleCellExperiments, this matrix may be extracted from its |
... |
For the generic, further arguments to pass to the specific methods. For the ANY method, further arguments to pass to For the SummarizedExperiment method, further arguments to pass to the ANY method. For the SingleCellExperiment method, further arguments to pass to the SummarizedExperiment method
(if |
clusters |
A vector or factor of length equal to the number of cells in |
others |
List of numeric matrices with the same number of rows as |
start |
Passed to |
columns |
A character, logical or integer vector specifying the columns of |
assay.type |
An integer or string specifying the assay to use from a SummarizedExperiment |
use.dimred |
An integer or string specifying the reduced dimensions to use from a SingleCellExperiment |
other.dimreds |
Logical scalar indicating whether all dimensionality reduction results in |
This function simply calls, in order:
rowmean
, to compute the average low-dimensional coordinates for each cluster.
createClusterMST
on the average coordinates created from x
.
reportEdges
on the average coordinates for all entries of other
.
mapCellsToEdges
on the per-cell coordinates in x
with the constructed MST.
orderCells
on the mappings generated from x
onto the MST.
A List containing:
centered
, a list of numeric matrices containing the averaged coordinates for each cluster.
Each matrix corresponds to a dimensionality reduction result in x
.
mst
, a graph object containing the cluster-level MST computed on the coordinates from use
.
ordering
, a PseudotimeOrdering object containing the ordering for various paths through the MST computed from use
.
connected
, a list of data.frames containing the edge coordinates between centers.
Each data.frame corresponds to a dimensionality reduction result in x
.
Aaron Lun
createClusterMST
and friends, for the functions that do the actual work.
# Mocking up an SCE object: ncells <- 100 u <- matrix(rpois(20000, 5), ncol=ncells) pca <- matrix(runif(ncells*5), ncells) tsne <- matrix(rnorm(ncells*2), ncells) library(SingleCellExperiment) sce <- SingleCellExperiment(assays=list(counts=u), reducedDims=SimpleList(PCA=pca, tSNE=tsne)) # Clustering on our pretend PCA values: clusters <- kmeans(pca, 3)$cluster # Quickly computing the pseudotime: out <- quickPseudotime(sce, clusters, use.dimred="PCA") out$mst head(out$ordering)
# Mocking up an SCE object: ncells <- 100 u <- matrix(rpois(20000, 5), ncol=ncells) pca <- matrix(runif(ncells*5), ncells) tsne <- matrix(rnorm(ncells*2), ncells) library(SingleCellExperiment) sce <- SingleCellExperiment(assays=list(counts=u), reducedDims=SimpleList(PCA=pca, tSNE=tsne)) # Clustering on our pretend PCA values: clusters <- kmeans(pca, 3)$cluster # Quickly computing the pseudotime: out <- quickPseudotime(sce, clusters, use.dimred="PCA") out$mst head(out$ordering)
Provides the coordinates of the start and end of every edge in the MST,
possibly on a different coordinate space from that used to construct the MST.
This is mostly useful for plotting purposes in segments
or the equivalent ggplot2 functionality.
reportEdges(x, ...) ## S4 method for signature 'ANY' reportEdges(x, mst, clusters, combined = TRUE, columns = NULL) ## S4 method for signature 'SummarizedExperiment' reportEdges(x, ..., assay.type = "logcounts") ## S4 method for signature 'SingleCellExperiment' reportEdges( x, clusters = colLabels(x, onAbsence = "error"), ..., use.dimred = NULL )
reportEdges(x, ...) ## S4 method for signature 'ANY' reportEdges(x, mst, clusters, combined = TRUE, columns = NULL) ## S4 method for signature 'SummarizedExperiment' reportEdges(x, ..., assay.type = "logcounts") ## S4 method for signature 'SingleCellExperiment' reportEdges( x, clusters = colLabels(x, onAbsence = "error"), ..., use.dimred = NULL )
x |
A numeric matrix of coordinates where each row represents a cell/sample and each column represents a dimension (usually a PC or another low-dimensional embedding, but features or genes can also be used). Alternatively, a SummarizedExperiment or SingleCellExperiment object
containing such a matrix in its Alternatively, for SingleCellExperiments, this matrix may be extracted from its Alternatively, if |
... |
For the generic, further arguments to pass to the specific methods. For the SummarizedExperiment method, further arguments to pass to the ANY method. For the SingleCellExperiment method, further arguments to pass to the SummarizedExperiment method
(if |
mst |
A graph object containing a MST, typically the output of |
clusters |
A factor-like object of the same length as Alternatively, a matrix with number of rows equal to |
combined |
Logical scalar indicating whether a single data.frame of edge coordinates should be returned. |
columns |
A character, logical or integer vector specifying the columns of |
assay.type |
An integer or string specifying the assay to use from a SummarizedExperiment |
use.dimred |
An integer or string specifying the reduced dimensions to use from a SingleCellExperiment |
It is entirely possibly to supply, say, t-SNE coordinates in x
along with a MST constructed from the PCA coordinates.
This allows us to visualize the edges of the MST on other low-dimensional embeddings.
The coordinates in x
can be per-cell or, if clusters=NULL
, they are assumed to already be per-cluster means.
x
may also be NULL
, in which case the center coordinates are obtained
from the coordinates
vertex attribute of mst
.
A data.frame containing the start and end coordinates of segments representing all the edges in mst
.
If combined=FALSE
, a list of two data.frames is returned where corresponding rows represent the start and end coordinates of the same edge.
Aaron Lun
Ji Z and Ji H (2016). TSCAN: Pseudo-time reconstruction and evaluation in single-cell RNA-seq analysis. Nucleic Acids Res. 44, e117
createClusterMST
, to generate mst
.
quickPseudotime
, a wrapper to quickly perform these calculations.
# Mocking up a Y-shaped trajectory. centers <- rbind(c(0,0), c(0, -1), c(1, 1), c(-1, 1)) rownames(centers) <- seq_len(nrow(centers)) clusters <- sample(nrow(centers), 1000, replace=TRUE) cells <- centers[clusters,] cells <- cells + rnorm(length(cells), sd=0.5) # Creating the MST: mst <- createClusterMST(cells, clusters) # Plotting the MST on top of existing visualizations: edges <- reportEdges(x=NULL, mst, combined=FALSE) plot(cells[,1], cells[,2], col=clusters) segments(edges$start$dim1, edges$start$dim2, edges$end$dim1, edges$end$dim2, lwd=5) # Use with coordinates other than those used to make the MST: shifted.cells <- cells + 10 shift.edges <- reportEdges(shifted.cells, mst, clusters=clusters, combined=FALSE) plot(shifted.cells[,1], shifted.cells[,2], col=clusters) segments(shift.edges$start$dim1, shift.edges$start$dim2, shift.edges$end$dim1, shift.edges$end$dim2, lwd=5) # Also works for ggplot2: df <- data.frame(shifted.cells, cluster=factor(clusters)) shift.edges2 <- reportEdges(shifted.cells, mst, clusters=clusters) library(ggplot2) ggplot(df) + geom_point(aes(x=X1, y=X2, color=cluster)) + geom_line(data=shift.edges2, mapping=aes(x=dim1, y=dim2, group=edge))
# Mocking up a Y-shaped trajectory. centers <- rbind(c(0,0), c(0, -1), c(1, 1), c(-1, 1)) rownames(centers) <- seq_len(nrow(centers)) clusters <- sample(nrow(centers), 1000, replace=TRUE) cells <- centers[clusters,] cells <- cells + rnorm(length(cells), sd=0.5) # Creating the MST: mst <- createClusterMST(cells, clusters) # Plotting the MST on top of existing visualizations: edges <- reportEdges(x=NULL, mst, combined=FALSE) plot(cells[,1], cells[,2], col=clusters) segments(edges$start$dim1, edges$start$dim2, edges$end$dim1, edges$end$dim2, lwd=5) # Use with coordinates other than those used to make the MST: shifted.cells <- cells + 10 shift.edges <- reportEdges(shifted.cells, mst, clusters=clusters, combined=FALSE) plot(shifted.cells[,1], shifted.cells[,2], col=clusters) segments(shift.edges$start$dim1, shift.edges$start$dim2, shift.edges$end$dim1, shift.edges$end$dim2, lwd=5) # Also works for ggplot2: df <- data.frame(shifted.cells, cluster=factor(clusters)) shift.edges2 <- reportEdges(shifted.cells, mst, clusters=clusters) library(ggplot2) ggplot(df) + geom_point(aes(x=X1, y=X2, color=cluster)) + geom_line(data=shift.edges2, mapping=aes(x=dim1, y=dim2, group=edge))
plot expression values of individual genes against pseudotime axis
singlegeneplot(geneexpr, TSCANorder, cell_size = 2)
singlegeneplot(geneexpr, TSCANorder, cell_size = 2)
geneexpr |
The gene expression values. Names should agree with the pseudotime information. |
TSCANorder |
The output of function |
cell_size |
Size of cells in the plot. |
This function plots the expression values of individual genes against given pseudotime
ggplot2 object.
Zhicheng Ji, Hongkai Ji <[email protected]>
data(lpsdata) procdata <- preprocess(lpsdata) lpsmclust <- exprmclust(procdata) lpsorder <- TSCANorder(lpsmclust,orderonly=FALSE,flip=TRUE) #Choose STAT1 gene expression to plot STAT2expr <- log2(lpsdata["STAT2",]+1) singlegeneplot(STAT2expr, lpsorder)
data(lpsdata) procdata <- preprocess(lpsdata) lpsmclust <- exprmclust(procdata) lpsorder <- TSCANorder(lpsmclust,orderonly=FALSE,flip=TRUE) #Choose STAT1 gene expression to plot STAT2expr <- log2(lpsdata["STAT2",]+1) singlegeneplot(STAT2expr, lpsorder)
Implements a simple method of testing for significant differences with respect to pseudotime, based on fitting linear models with a spline basis matrix.
testPseudotime(x, ...) ## S4 method for signature 'ANY' testPseudotime( x, pseudotime, df = 5, get.lfc = TRUE, get.spline.coef = FALSE, trend.only = TRUE, block = NULL, row.data = NULL, BPPARAM = NULL ) ## S4 method for signature 'SummarizedExperiment' testPseudotime(x, ..., assay.type = "logcounts")
testPseudotime(x, ...) ## S4 method for signature 'ANY' testPseudotime( x, pseudotime, df = 5, get.lfc = TRUE, get.spline.coef = FALSE, trend.only = TRUE, block = NULL, row.data = NULL, BPPARAM = NULL ) ## S4 method for signature 'SummarizedExperiment' testPseudotime(x, ..., assay.type = "logcounts")
x |
A numeric matrix-like object containing log-expression values for cells (columns) and genes (rows). Alternatively, a SummarizedExperiment containing such a matrix. |
... |
For the generic, further arguments to pass to specific method. For the SummarizedExperiment method, further arguments to pass to the ANY method. |
pseudotime |
A numeric vector of length equal to |
df |
Integer scalar specifying the degrees of freedom for the splines. |
get.lfc |
Logical scalar indicating whether to return an overall log-fold change along each path. |
get.spline.coef |
Logical scalar indicating whether to return the estimates of the spline coefficients. |
trend.only |
Deprecated and ignored. |
block |
Factor of length equal to the number of cells in |
row.data |
A DataFrame with the same number and order of rows in |
BPPARAM |
A BiocParallelParam object from the BiocParallel package, used to control parallelization. |
assay.type |
String or integer scalar specifying the assay containing the log-expression matrix. |
This function fits a natural spline to the expression of each gene with respect to pseudotime. It then does an ANOVA to test whether any of the spline coefficients are non-zero. In this manner, genes exhibiting a significant (and potentially non-linear) trend with respect to the pseudotime can be detected as those with low p-values.
Branched trajectories with multiple paths are represented by a 2-dimensional pseudotime
.
In this case, only one path is tested at a time by only using one column of pseudotime
to form the spline basis matrix.
Cells with NA
values in any given pseudotime
column are assumed to be assigned to a different path and are ignored when fitting the corresponding model.
By default, estimates of the spline coefficients are not returned as they are difficult to interpret. Rather, a log-fold change of expression along each path is estimated to provide some indication of the overall magnitude and direction of any change.
block
can be used to fit a separate linear model to each of multiple batches,
after which the statistics are combined across batches as described in testLinearModel
.
This avoids potential confounding effects from batch-specific differences in the distribution of cells across pseudotime.
If pseudotime
is a vector, a DataFrame is returned containing the statistics for each gene (row),
including the p-value and its BH-adjusted equivalent.
If get.lfc=TRUE
, an overall log-fold change is returned for each path.
If get.spline.coef=TRUE
, the estimated spline coefficients are also returned (single path)
or the differences in the spline fits to the first path are returned (multiple paths).
If pseudotime
is a 2-dimensional object, a list of DataFrames is instead returned.
Each DataFrame has the same format as described above and contains test statistics for each column (i.e., lineage) in pseudotime
.
Aaron Lun
orderCells
, to generate the pseudotime matrix.
testLinearModel
, which performs the tests under the hood.
y <- matrix(rnorm(10000), ncol=100) u <- runif(100) testPseudotime(y, u) # Handling a blocking factor. b <- gl(2, 50) testPseudotime(y, u, block=b)
y <- matrix(rnorm(10000), ncol=100) u <- runif(100) testPseudotime(y, u) # Handling a blocking factor. b <- gl(2, 50) testPseudotime(y, u, block=b)
This package provides essential tools used in analyzing data from single-cell experiments
TSCAN enables users to easily construct and tune pseudotemporal cell ordering as well as analyzing differentially expressed genes. TSCAN comes with a user-friendly GUI written in shiny. More functions will come in the future.
Maintainer: Zhicheng Ji [email protected]
Authors:
Hongkai Ji
Other contributors:
Aaron Lun [email protected] [contributor]
Construct TSCAN order after exprmclust
TSCANorder(mclustobj, MSTorder = NULL, orderonly = T, flip = F, listbranch = F)
TSCANorder(mclustobj, MSTorder = NULL, orderonly = T, flip = F, listbranch = F)
mclustobj |
The exact output of the |
MSTorder |
A numeric vector specifying the order of clusters. |
orderonly |
Only return the ordering. State or pseudotime information will not be returned |
flip |
whether to flip the ordering |
listbranch |
whether to list the ordering results of all possible branches in MST. Only works if MSTorder in NULL. |
This function takes the exact output of exprmclust function and construct TSCAN order by mapping all cells onto the path that connects cluster centers. Users can also specify their own path.
if orderonly = F, a vector of ordered cell names. if orderonly = T, a data frame of ordered cell names, cell states and pseudotime.
Zhicheng Ji, Hongkai Ji <[email protected]>
data(lpsdata) procdata <- preprocess(lpsdata) lpsmclust <- exprmclust(procdata) TSCANorder(lpsmclust)
data(lpsdata) procdata <- preprocess(lpsdata) lpsmclust <- exprmclust(procdata) TSCANorder(lpsmclust)
Launch the TSCAN user interface in local machine
TSCANui()
TSCANui()
This function will automatically launch the TSCAN user interface in a web browser. The user interface provides many powerful functions which is not available by command line programming. It also provides a much easier and more convenient way to quickly explore single cell data and construct pseudotime analysis. The user interface can also be accessed by http://zhiji.shinyapps.io/TSCAN. Neither R nor any packages are required in this online version. However, it is highly recommended that the user interface be launched locally for faster running speed.
Zhicheng Ji, Hongkai Ji <[email protected]>
## Not run: TSCANui() ## End(Not run)
## Not run: TSCANui() ## End(Not run)