Title: | uSORT: A self-refining ordering pipeline for gene selection |
---|---|
Description: | This package is designed to uncover the intrinsic cell progression path from single-cell RNA-seq data. It incorporates data pre-processing, preliminary PCA gene selection, preliminary cell ordering, feature selection, refined cell ordering, and post-analysis interpretation and visualization. |
Authors: | Mai Chan Lau, Hao Chen, Jinmiao Chen |
Maintainer: | Hao Chen <[email protected]> |
License: | Artistic-2.0 |
Version: | 1.33.0 |
Built: | 2024-12-18 06:12:29 UTC |
Source: | https://github.com/bioc/uSORT |
A wrapper function for autoSPIN method which implements optimized local refinement using the selected SPIN sorting method, i.e. STS or Neighborhood.
autoSPIN(data, data_type = c("linear", "cyclical"), sorting_method = c("STS", "neighborhood"), alpha = 0.2, sigma_width = 1, no_randomization = 20, window_perc_range = c(0.1, 0.9), window_size_incre_perct = 0.05)
autoSPIN(data, data_type = c("linear", "cyclical"), sorting_method = c("STS", "neighborhood"), alpha = 0.2, sigma_width = 1, no_randomization = 20, window_perc_range = c(0.1, 0.9), window_size_incre_perct = 0.05)
data |
An log2 transformed expresssion matrix containing n-rows of cells and m-cols of genes. |
data_type |
A character string indicating the type of progression, i.e. 'linear' (strictly linear) or 'cyclical' (cyclically linear). |
sorting_method |
A character string indicating the choice of SPIN sorting method, i.e. 'STS' (Side-to-Side) or 'Neighborhood'. |
alpha |
A fraction value denoting the size of locality used for calculating the summed local variance. |
sigma_width |
An integer number denoting the degree of spread of the gaussian distribution which is used for computing weight matrix for Neighborhood sorting method. |
no_randomization |
An integer number indicating the number of repeated sorting, each of which uses randomly selected initial cell position. |
window_perc_range |
A fraction value indicating the range of window size to be examined during local refinement. |
window_size_incre_perct |
A fraction value indicating the step size at each iteration for incrementing window size. |
A data frame containing single column of ordered sample IDs.
set.seed(15) da <- iris[sample(150, 150, replace = FALSE), ] rownames(da) <- paste0('spl_',seq(1,nrow(da))) d <- da[,1:4] dl <- da[,5,drop=FALSE] res <- autoSPIN(data = d) dl <- dl[match(res$SampleID,rownames(dl)),] annot <- data.frame(id = seq(1,nrow(res)), label=dl, stringsAsFactors = FALSE) #ggplot(annot, aes(x=id, y=id, colour = label)) + geom_point() + theme_bw()
set.seed(15) da <- iris[sample(150, 150, replace = FALSE), ] rownames(da) <- paste0('spl_',seq(1,nrow(da))) d <- da[,1:4] dl <- da[,5,drop=FALSE] res <- autoSPIN(data = d) dl <- dl[match(res$SampleID,rownames(dl)),] annot <- data.frame(id = seq(1,nrow(res)), label=dl, stringsAsFactors = FALSE) #ggplot(annot, aes(x=id, y=id, colour = label)) + geom_point() + theme_bw()
A modified monocle's function for 'compareModels' which identifies and removes genes whose reduced_models is better than full_models in term of likelihood
clusterGenes1(expr_matrix, krange, method = function(x) { as.dist((1 - cor(t(x)))/2) }, ...)
clusterGenes1(expr_matrix, krange, method = function(x) { as.dist((1 - cor(t(x)))/2) }, ...)
expr_matrix |
Expression matrix. |
krange |
krange. |
method |
method function. |
... |
Other parameters. |
test_res a dataframe containing status of modeling and adjusted p-value
MaiChan Lau
A modified monocle's function for 'compareModels' which identifies and removes genes whose reduced_models is better than full_models in term of likelihood
compareModels1(full_models, reduced_models)
compareModels1(full_models, reduced_models)
full_models |
a Monocle's vgam full model |
reduced_models |
a Monocle's vgam reduced/ null model |
test_res a dataframe containing status of modeling and adjusted p-value
MaiChan Lau
A modified monocle's function for 'diff_test_helper1' which includes more attempts on finding models and also compute max. magnitude change in expression values predicted by GLM model
diff_test_helper1(x, fullModelFormulaStr, reducedModelFormulaStr, expressionFamily, lowerDetectionLimit = 0.1, type_ordering = "linear")
diff_test_helper1(x, fullModelFormulaStr, reducedModelFormulaStr, expressionFamily, lowerDetectionLimit = 0.1, type_ordering = "linear")
x |
an expression data |
fullModelFormulaStr |
a Monocle's model structure |
reducedModelFormulaStr |
a Monocle's model structure |
expressionFamily |
a Monocle's family character |
lowerDetectionLimit |
a threshold value |
type_ordering |
a character indicating the type of underlying cell progression, i.e. linear or circular |
test_res a dataframe containing status of modeling and adjusted p-value
MaiChan Lau
modified from FludigmSC pacakge
differentialGeneTest1(cds, fullModelFormulaStr = "expression~sm.ns(Pseudotime, df=3)", reducedModelFormulaStr = "expression~1", cores = 1)
differentialGeneTest1(cds, fullModelFormulaStr = "expression~sm.ns(Pseudotime, df=3)", reducedModelFormulaStr = "expression~1", cores = 1)
cds |
Input object. |
fullModelFormulaStr |
Full model formula. |
reducedModelFormulaStr |
Reduced model formula. |
cores |
Number of cores will be used. |
test results
A distance function A distance function computes cell-to-cell distance matrix.
distance.function(expr, method = c("Euclidean", "Correlation", "eJaccard", "none"))
distance.function(expr, method = c("Euclidean", "Correlation", "eJaccard", "none"))
expr |
An expresssion matrix containing n-rows of cells and m-cols of genes. |
method |
A character string indicating the distance function. |
A matrix containing n-by-n cell distance.
A feature/ gene selection function (1) removes sparsely expressed genes, (2) identifies differentially expressed genes based on preliminary cell ordering, (3) removes highly dispersed genes from the identified DEGs, (4) further picks genes which are expected to have large expression difference on the 2 extreme ends of preliminary cell ordering
driving_force_gene_selection(cds, scattering.cutoff.prob = 0.75, driving.force.cutoff = NULL, qval_cutoff = 0.05, min_expr = 0.1, data_type = c("linear", "cyclical"), nCores = 1)
driving_force_gene_selection(cds, scattering.cutoff.prob = 0.75, driving.force.cutoff = NULL, qval_cutoff = 0.05, min_expr = 0.1, data_type = c("linear", "cyclical"), nCores = 1)
cds |
a Monocle's CellDataSet object |
scattering.cutoff.prob |
probability used for removing largely dispersed genes |
driving.force.cutoff |
a value used for removing genes which do not change much along cell progress along cell progress path |
qval_cutoff |
a user-defined adjusted p-value below which genes are retained |
min_expr |
the minimum expression value |
data_type |
a character indicating the type of underlying cell progression, i.e. linear or cyclical. |
nCores |
Number of cores to use. |
integer
MaiChan Lau
dir <- system.file('extdata', package='uSORT') file <- list.files(dir, pattern='.txt$', full=TRUE) #exprs <- uSORT_preProcess(exprs_file = file) #exp_raw <- t(exprs$exprs_raw) #exp_trimmed <- t(exprs$exprs_log_trimed) #cds <- uSORT:::EXP_to_CellDataSet(exp_trimmed, exp_raw) #driver_genes <- driving_force_gene_selection(cds = cds)
dir <- system.file('extdata', package='uSORT') file <- list.files(dir, pattern='.txt$', full=TRUE) #exprs <- uSORT_preProcess(exprs_file = file) #exp_raw <- t(exprs$exprs_raw) #exp_trimmed <- t(exprs$exprs_log_trimed) #cds <- uSORT:::EXP_to_CellDataSet(exp_trimmed, exp_raw) #driver_genes <- driving_force_gene_selection(cds = cds)
A elbow detection function detects the elbow/knee of a given vector of values. Values will be sorted descendingly before detection, and the ID of those values above the elbow will be returned.
elbow_detection(scores, if_plot = FALSE)
elbow_detection(scores, if_plot = FALSE)
scores |
A vector of numeric scores. |
if_plot |
Boolean determine if plot the results. |
a vector of selected elements IDs
scores <- c(10, 9 ,8, 6, 3, 2, 1, 0.1) elbow_detection(scores, if_plot = TRUE)
scores <- c(10, 9 ,8, 6, 3, 2, 1, 0.1) elbow_detection(scores, if_plot = TRUE)
A function for constructing a Monocle's CellDataSet object from an expression matrix
EXP_to_CellDataSet(log2_exp = NULL, expression_data_raw = NULL, lod = 1)
EXP_to_CellDataSet(log2_exp = NULL, expression_data_raw = NULL, lod = 1)
log2_exp |
An log2 transformed expresssion matrix containing n-rows of cells and m-cols of genes. |
expression_data_raw |
A data frame containing raw expression values, with rownames of cells and colnames of genes. |
lod |
A value of limit of detection in the unit of TPM/CPM/RPKM. |
A CellDataSet object.
A gene detection function computes the fraction of genes detected in each cell, reproduced from FluidigmSC package.
fluidigmSC_analyzeGeneDetection(expression_data, threshold = 1)
fluidigmSC_analyzeGeneDetection(expression_data, threshold = 1)
expression_data |
A data frame containing raw expression values, with rownames of genes and colnames of cells. |
threshold |
A limit of detection in the unit of TPM/CPM/RPMK. |
A data frame containing a column of number of genes detected, and a column of the corresponding percentage of gene detection, rownames of cells.
An outlier detection function identifies cells with median expression below that of the bulk, reproduced from FluidigmSC package.
fluidigmSC_identifyExpOutliers(log2ex_data, expression_data_raw, threshold, step, fine_step, num_fine_test, pct_goodsample_threshold = 0.5, quantile_threshold = 0.95, low_quantile_threshold = 0.25, min_gene_number = 25, lod)
fluidigmSC_identifyExpOutliers(log2ex_data, expression_data_raw, threshold, step, fine_step, num_fine_test, pct_goodsample_threshold = 0.5, quantile_threshold = 0.95, low_quantile_threshold = 0.25, min_gene_number = 25, lod)
log2ex_data |
A data frame containing log2 tranformed expression values, with rownames of genes and colnames of cells. |
expression_data_raw |
A data frame containing raw expression values, with rownames of genes and colnames of cells. |
threshold |
A value in raw expression used as the starting threshold value. |
step |
An integer number indicating the increment of threshold value at each iteration. |
fine_step |
An integer number indicating the increment of threshold value at each iteration, at the refining stage. |
num_fine_test |
An integer number indicating the number of iteration of the refining stage. |
pct_goodsample_threshold |
A fraction value indicating the minimum percentage of samples on which the representative genes are detectable. |
quantile_threshold |
A probability of gene detection rate above which a sample is considered as good sample. |
low_quantile_threshold |
A probability of average gene expression value below which a sample is taken as an outlier. |
min_gene_number |
An integer indicating the minimum size of representative genes. |
lod |
A value of limit of detection in the unit of TPM/CPM/RPKM. |
A vector of character stating the IDs of outlier cells.
A gene finding function looking for genes in the target set x from the source set y, reproduced from FluidigmSC package.
fluidigmSC_isElementIgnoreCase(x, y, ignore_case = TRUE)
fluidigmSC_isElementIgnoreCase(x, y, ignore_case = TRUE)
x |
A vector of characters representing gene names (target genes). |
y |
A vector of characters representing gene names (source genes). |
ignore_case |
Boolean, if TRUE ignores letter case. |
A vector of characters representing gene names.
An expression reading function which imports expression data from .txt file, and then computes log2 transformed data, reproduced from FluidigmSC package.
fluidigmSC_readLinearExp(exp_file = TRUE, lod = 1)
fluidigmSC_readLinearExp(exp_file = TRUE, lod = 1)
exp_file |
Input file name in txt format, with rownames of cells and colnames of genes. |
lod |
A value of limit of detection in the unit of TPM/CPM/RPKM. It will be used as the starting value for outlier cell detection and the basis for removing scarce genes. |
A list containing expression_data_raw
(data frame), log2ex_data
(data frame),
and log2ex_avg_data
(data frame).
A gene trimming function removes genes whose average expression value is below the log2(threshold), and also present in at least 10
fluidigmSC_removeGenesByLinearExpForAllType(log2ex_data, log2ex_avg_data, threshold)
fluidigmSC_removeGenesByLinearExpForAllType(log2ex_data, log2ex_avg_data, threshold)
log2ex_data |
A data frame containing log2 tranformed expression values, with rownames of genes and colnames of cells. |
log2ex_avg_data |
A data frame containing log2 tranformed average expression values for individual gene. |
threshold |
A limit of detection in the unit of TPM/CPM/RPMK. |
A vector of character containing gene names of those passed the filtering.
A gene trimming function removes genes whose average expression value is below the log2(threshold); reproduced from FluidigmSC package.
fluidigmSC_removeGenesByLinearExpForAllType_log2(log2ex_data, threshold)
fluidigmSC_removeGenesByLinearExpForAllType_log2(log2ex_data, threshold)
log2ex_data |
A data frame containing log2 tranformed expression values, with rownames of genes and colnames of cells. |
threshold |
A limit of detection in the unit of TPM/CPM/RPMK. |
A vector of character containing gene names of those passed the filtering.
A wrapper function for Monocle sorting method
monocle_wrapper(log2_exp, expression_data_raw, lod = 1)
monocle_wrapper(log2_exp, expression_data_raw, lod = 1)
log2_exp |
An log2 transformed expresssion matrix containing n-rows of cells and m-cols of genes. |
expression_data_raw |
A data frame containing raw expression values, with rownames of cells and colnames of genes. |
lod |
A value of limit of detection in the unit of TPM/CPM/RPKM. |
A data frame containing single column of ordered sample IDs.
set.seed(15) da <- iris[sample(150, 150, replace = FALSE), ] rownames(da) <- paste0('spl_',seq(1,nrow(da))) d <- da[,1:4] dl <- da[,5,drop=FALSE] #res <- monocle_wrapper(log2_exp = d, expression_data_raw = d) #dl <- dl[match(res,rownames(dl)),] #annot <- data.frame(id = seq(1,length(res)), label=dl, stringsAsFactors = FALSE) #ggplot(annot, aes(x=id, y=id, colour = label)) + geom_point() + theme_bw()
set.seed(15) da <- iris[sample(150, 150, replace = FALSE), ] rownames(da) <- paste0('spl_',seq(1,nrow(da))) d <- da[,1:4] dl <- da[,5,drop=FALSE] #res <- monocle_wrapper(log2_exp = d, expression_data_raw = d) #dl <- dl[match(res,rownames(dl)),] #annot <- data.frame(id = seq(1,length(res)), label=dl, stringsAsFactors = FALSE) #ggplot(annot, aes(x=id, y=id, colour = label)) + geom_point() + theme_bw()
A sorting function using the Neighborhood algorithm
neighborhood_sorting(d, weights_mat = NULL, max_iter = 100)
neighborhood_sorting(d, weights_mat = NULL, max_iter = 100)
d |
A matrix containing n-by-n cell distance. |
weights_mat |
A weight matrix of size n-by-n. |
max_iter |
An integer number indicating the maximum number of iteration if sorting does not converge. |
A list containing ordering
(a vector of re-ordered sequence) and
cost
(a numeric value).
A wrapper function for Neighborhood sorting as proposed in [Tsafrir et al. 2005].
neighborhood_sorting_wrapper(expr, sigma_width = 1, no_randomization = 10)
neighborhood_sorting_wrapper(expr, sigma_width = 1, no_randomization = 10)
expr |
An expresssion matrix containing n-rows of cells and m-cols of genes. |
sigma_width |
An integer number determining the degree of spread of the gaussian distribution which is used for computing weight matrix for Neighborhood sorting method. |
no_randomization |
An integer number indicating the number of repeated sorting, each of which uses a randomaly selected initial cell ordering. |
A list containing permutated.expr
(data frame) and best.cost
(a numeric value).
A cost computation function for Neighborhood algorithm
neighborhood_sortingcost(expr = NULL, sigma_width = 1, method = c("Euclidean", "Correlation", "eJaccard", "none"))
neighborhood_sortingcost(expr = NULL, sigma_width = 1, method = c("Euclidean", "Correlation", "eJaccard", "none"))
expr |
An expresssion matrix containing n-rows of cells and m-cols of genes. |
sigma_width |
An integer number determining the degree of spread of the gaussian distribution which is used for computing weight matrix for Neighborhood sorting method. |
method |
A character string indicating the distance function. |
A numeric value of sorting cost.
set.seed(15) da <- iris[sample(150, 150, replace = FALSE), ] d <- da[,1:4] randomOrdering_cost <- neighborhood_sortingcost(d, method= 'Euclidean') randomOrdering_cost da <- iris d <- da[,1:4] properOrdering_cost <- neighborhood_sortingcost(d, method= 'Euclidean') properOrdering_cost
set.seed(15) da <- iris[sample(150, 150, replace = FALSE), ] d <- da[,1:4] randomOrdering_cost <- neighborhood_sortingcost(d, method= 'Euclidean') randomOrdering_cost da <- iris d <- da[,1:4] properOrdering_cost <- neighborhood_sortingcost(d, method= 'Euclidean') properOrdering_cost
Gene selection using PCA technique
pca_gene_selection(data)
pca_gene_selection(data)
data |
A matrix of data.frame with row.name of cells, and col.name of genes |
a vector of the names of selected genes.
dir <- system.file('extdata', package='uSORT') file <- list.files(dir, pattern='.txt$', full=TRUE) exprs <- uSORT_preProcess(exprs_file = file) exp_trimmed <- t(exprs$exprs_log_trimed) PCA_selected_genes <- pca_gene_selection(exp_trimmed)
dir <- system.file('extdata', package='uSORT') file <- list.files(dir, pattern='.txt$', full=TRUE) exprs <- uSORT_preProcess(exprs_file = file) exp_trimmed <- t(exprs$exprs_log_trimed) PCA_selected_genes <- pca_gene_selection(exp_trimmed)
R inplementation of wanderlust
Rwanderlust(data, s, l = 15, k = 15, num_graphs = 1, num_waypoints = 250, waypoints_seed = 123, flock_waypoints = 2, metric = "euclidean", voting_scheme = "exponential", band_sample = FALSE, partial_order = NULL, verbose = TRUE)
Rwanderlust(data, s, l = 15, k = 15, num_graphs = 1, num_waypoints = 250, waypoints_seed = 123, flock_waypoints = 2, metric = "euclidean", voting_scheme = "exponential", band_sample = FALSE, partial_order = NULL, verbose = TRUE)
data |
Input data matrix. |
s |
Starting point ID. |
l |
l nearest neighbours. |
k |
k nearest neighbours, k < l. |
num_graphs |
Number of repreated graphs. |
num_waypoints |
Number of waypoints to guide the trajectory detection. |
waypoints_seed |
The seed for reproducing the results. |
flock_waypoints |
The number of times for flocking the waypoints, default is 2. |
metric |
Distance calculation metric for nearest neighbour detection. |
voting_scheme |
The scheme of voting. |
band_sample |
Boolean, if band the sample |
partial_order |
default NULL |
verbose |
Boolean, if print the details |
a list containing Trajectory, Order, Waypoints
Hao Chen
set.seed(15) shuffled_iris <- iris[sample(150, 150, replace = FALSE), ] data <- shuffled_iris[,1:4] data_label <- shuffled_iris[,5] wishbone <- Rwanderlust(data = data, num_waypoints = 100, waypoints_seed = 2) pd1 <- data.frame(id = wishbone$Trajectory, label=data_label, stringsAsFactors = FALSE) pd2 <- data.frame(id = seq_along(row.names(data)), label=data_label, stringsAsFactors = FALSE) #ggplot(pd1, aes(x=id, y=id, colour = label)) + geom_point() + theme_bw() #ggplot(pd2, aes(x=id, y=id, colour = label)) + geom_point() + theme_bw()
set.seed(15) shuffled_iris <- iris[sample(150, 150, replace = FALSE), ] data <- shuffled_iris[,1:4] data_label <- shuffled_iris[,5] wishbone <- Rwanderlust(data = data, num_waypoints = 100, waypoints_seed = 2) pd1 <- data.frame(id = wishbone$Trajectory, label=data_label, stringsAsFactors = FALSE) pd2 <- data.frame(id = seq_along(row.names(data)), label=data_label, stringsAsFactors = FALSE) #ggplot(pd1, aes(x=id, y=id, colour = label)) + geom_point() + theme_bw() #ggplot(pd2, aes(x=id, y=id, colour = label)) + geom_point() + theme_bw()
An expression scattering measurement function computes the level of scattering for individual genes along the cell ordering
scattering_quantification_per_gene(CDS = NULL)
scattering_quantification_per_gene(CDS = NULL)
CDS |
a Monocle's CellDataSet object |
integer
MaiChan Lau
The parameters appeared on GUI are based on input method, this function is called by
uSORT_parameters_GUI
. For internal use only.
sorting_method_parameter_GUI(method = c("autoSPIN", "sWanderlust", "monocle", "Wanderlust", "SPIN", "none"))
sorting_method_parameter_GUI(method = c("autoSPIN", "sWanderlust", "monocle", "Wanderlust", "SPIN", "none"))
method |
method name. |
a list of parameters.
Hao Chen
A wrapper function for SPIN method provides a R version of SPIN [Tsafrir et al. 2005].
SPIN(data, sorting_method = c("STS", "neighborhood"), sigma_width = 1)
SPIN(data, sorting_method = c("STS", "neighborhood"), sigma_width = 1)
data |
An log2 transformed expresssion matrix containing n-rows of cells and m-cols of genes. |
sorting_method |
A character string indicating the choice of sorting method, i.e. 'STS' (Side-to-Side) or 'Neighborhood'. |
sigma_width |
An integer number determining the degree of spread of the gaussian distribution which is used for computing weight matrix for Neighborhood sorting method. |
A data frame containing single column of ordered sample IDs.
set.seed(15) da <- iris[sample(150, 150, replace = FALSE), ] rownames(da) <- paste0('spl_',seq(1,nrow(da))) d <- da[,1:4] dl <- da[,5,drop=FALSE] res <- SPIN(data = d) dl <- dl[match(res$SampleID,rownames(dl)),] annot <- data.frame(id = seq(1,nrow(res)), label=dl, stringsAsFactors = FALSE) #ggplot(annot, aes(x=id, y=id, colour = label)) + geom_point() + theme_bw()
set.seed(15) da <- iris[sample(150, 150, replace = FALSE), ] rownames(da) <- paste0('spl_',seq(1,nrow(da))) d <- da[,1:4] dl <- da[,5,drop=FALSE] res <- SPIN(data = d) dl <- dl[match(res$SampleID,rownames(dl)),] annot <- data.frame(id = seq(1,nrow(res)), label=dl, stringsAsFactors = FALSE) #ggplot(annot, aes(x=id, y=id, colour = label)) + geom_point() + theme_bw()
A sorting function using the Side-to-Side (STS) algorithm
STS_sorting(d, max_iter = 10)
STS_sorting(d, max_iter = 10)
d |
A matrix containing n-by-n cell distance. |
max_iter |
An integer number indicating the maximum number of iteration if sorting does not converge. |
A list containing ordering
(a vector of re-ordered sequence) and
cost
(a numeric value).
A wrapper function for Side-to-Side (STS) sorting as proposed in [Tsafrir et al. 2005].
STS_sorting_wrapper(expr, no_randomization = 10)
STS_sorting_wrapper(expr, no_randomization = 10)
expr |
An expresssion matrix containing n-rows of cells and m-cols of genes. |
no_randomization |
An integer number indicating the number of repeated sorting, each of which uses a randomaly selected initial cell ordering. |
A list containing permutated.expr
(data frame) and best.cost
(a numeric value).
A cost computation function for Side-to-Side (STS) algorithm
STS_sortingcost(expr = NULL, method = c("Euclidean", "Correlation", "eJaccard", "none"))
STS_sortingcost(expr = NULL, method = c("Euclidean", "Correlation", "eJaccard", "none"))
expr |
An expresssion matrix containing n-rows of cells and m-cols of genes. |
method |
A character string indicating the distance function. |
A numeric value of sorting cost.
set.seed(15) da <- iris[sample(150, 150, replace = FALSE), ] d <- da[,1:4] randomOrdering_cost <- STS_sortingcost(d, method= 'Euclidean') randomOrdering_cost da <- iris d <- da[,1:4] properOrdering_cost <- STS_sortingcost(d, method= 'Euclidean') properOrdering_cost
set.seed(15) da <- iris[sample(150, 150, replace = FALSE), ] d <- da[,1:4] randomOrdering_cost <- STS_sortingcost(d, method= 'Euclidean') randomOrdering_cost da <- iris d <- da[,1:4] properOrdering_cost <- STS_sortingcost(d, method= 'Euclidean') properOrdering_cost
A summed local variance function
summed_local_variance(expr = NULL, alpha = NULL, data_type = "linear")
summed_local_variance(expr = NULL, alpha = NULL, data_type = "linear")
expr |
An expresssion matrix containing n-rows of cells and m-cols of genes. |
alpha |
A fraction value indicating the size of window for local variance measurement. |
data_type |
A character string indicating the type of progression, i.e. 'linear' (strictly linear) or 'cyclical' (cyclically linear). |
A numeric value of the summed local variance.
A summed local variance function for cyclical linear data type
summed_local_variance_cyclical(d, alpha = 0.3)
summed_local_variance_cyclical(d, alpha = 0.3)
d |
A cell-to-cell distance matrix. |
alpha |
A fraction value indicating the size of window for local variance measurement. |
A numeric value of the summed local variance.
A summed local variance function for strictly linear data type
summed_local_variance_linear(d, alpha = 0.3)
summed_local_variance_linear(d, alpha = 0.3)
d |
A cell-to-cell distance matrix. |
alpha |
A fraction value indicating the size of window for local variance measurement. |
A numeric value of the summed local variance.
autoSPIN guided wanderlust. Specifically, we use autoSPIN to help find the starting point for wanderlust.
sWanderlust(data, data_type = c("linear", "cyclical"), SPIN_option = c("STS", "neighborhood"), alpha = 0.2, sigma_width = 1, diffusionmap_components = 4, l = 15, k = 15, num_waypoints = 150, flock_waypoints = 2, waypoints_seed = 2711)
sWanderlust(data, data_type = c("linear", "cyclical"), SPIN_option = c("STS", "neighborhood"), alpha = 0.2, sigma_width = 1, diffusionmap_components = 4, l = 15, k = 15, num_waypoints = 150, flock_waypoints = 2, waypoints_seed = 2711)
data |
data Input data matrix. |
data_type |
The data type which guides the autoSPIN sorting, including |
SPIN_option |
SPIN contains two options including |
alpha |
alpha parameter for autoSPIN, default is 0.2. |
sigma_width |
Sigma width parameter for SPIN, default is 1. |
diffusionmap_components |
Number of components from diffusion map used for wanderlust analysis, default is 4. |
l |
Number of nearest neighbors, default is 15. |
k |
Number of nearest neighbors for repeating graphs, default is 15, should be less than or equal to l. |
num_waypoints |
Number of waypoint used for wanderlust, default is 150. |
flock_waypoints |
The number of times for flocking the waypoints, default is 2. |
waypoints_seed |
The seed for reproducing the results. |
a vector of the sorted oder.
Hao Chen
set.seed(15) shuffled_iris <- iris[sample(150, 150, replace = FALSE), ] data <- shuffled_iris[,1:4] data_label <- shuffled_iris[,5] wishbone <- sWanderlust(data = data, num_waypoints = 100)
set.seed(15) shuffled_iris <- iris[sample(150, 150, replace = FALSE), ] data <- shuffled_iris[,1:4] data_label <- shuffled_iris[,5] wishbone <- sWanderlust(data = data, num_waypoints = 100)
determining initial trajectory and landmarks
trajectory_landmarks(knn, data, s, partial_order = NULL, waypoints = 250, waypoints_seed = 123, metric = "euclidean", flock_waypoints = 2, band_sample = FALSE)
trajectory_landmarks(knn, data, s, partial_order = NULL, waypoints = 250, waypoints_seed = 123, metric = "euclidean", flock_waypoints = 2, band_sample = FALSE)
knn |
A sparse matrix of knn. |
data |
data. |
s |
The ID of starting point. |
partial_order |
A vector of IDs specified as recommended waypoints, NULL to ignore. |
waypoints |
Either the number of waypoints, or specify the waypoint IDs. |
waypoints_seed |
Random sampling seed, for reproducible results. |
metric |
Distance calculation metric for nearest neighbour detection. |
flock_waypoints |
Iteration of using nearest points around waypoint to adjust its position. |
band_sample |
if give more chance to nearest neighbours of starting point in randomly waypoints selection. |
a list
This package is designed to uncover the intrinsic cell progression path from single-cell RNA-seq data.
The main function of uSORT-pacakge
which provides a workflow of sorting scRNA-seq data.
uSORT(exprs_file, log_transform = TRUE, remove_outliers = TRUE, preliminary_sorting_method = c("autoSPIN", "sWanderlust", "monocle", "Wanderlust", "SPIN", "none"), refine_sorting_method = c("autoSPIN", "sWanderlust", "monocle", "Wanderlust", "SPIN", "none"), project_name = "uSORT", result_directory = getwd(), nCores = 1, save_results = TRUE, reproduce_seed = 1234, scattering_cutoff_prob = 0.75, driving_force_cutoff = NULL, qval_cutoff_featureSelection = 0.05, pre_data_type = c("linear", "cyclical"), pre_SPIN_option = c("STS", "neighborhood"), pre_SPIN_sigma_width = 1, pre_autoSPIN_alpha = 0.2, pre_autoSPIN_randomization = 20, pre_wanderlust_start_cell = NULL, pre_wanderlust_dfmap_components = 4, pre_wanderlust_l = 15, pre_wanderlust_num_waypoints = 150, pre_wanderlust_waypoints_seed = 2711, pre_wanderlust_flock_waypoints = 2, ref_data_type = c("linear", "cyclical"), ref_SPIN_option = c("STS", "neighborhood"), ref_SPIN_sigma_width = 1, ref_autoSPIN_alpha = 0.2, ref_autoSPIN_randomization = 20, ref_wanderlust_start_cell = NULL, ref_wanderlust_dfmap_components = 4, ref_wanderlust_l = 15, ref_wanderlust_num_waypoints = 150, ref_wanderlust_flock_waypoints = 2, ref_wanderlust_waypoints_seed = 2711)
uSORT(exprs_file, log_transform = TRUE, remove_outliers = TRUE, preliminary_sorting_method = c("autoSPIN", "sWanderlust", "monocle", "Wanderlust", "SPIN", "none"), refine_sorting_method = c("autoSPIN", "sWanderlust", "monocle", "Wanderlust", "SPIN", "none"), project_name = "uSORT", result_directory = getwd(), nCores = 1, save_results = TRUE, reproduce_seed = 1234, scattering_cutoff_prob = 0.75, driving_force_cutoff = NULL, qval_cutoff_featureSelection = 0.05, pre_data_type = c("linear", "cyclical"), pre_SPIN_option = c("STS", "neighborhood"), pre_SPIN_sigma_width = 1, pre_autoSPIN_alpha = 0.2, pre_autoSPIN_randomization = 20, pre_wanderlust_start_cell = NULL, pre_wanderlust_dfmap_components = 4, pre_wanderlust_l = 15, pre_wanderlust_num_waypoints = 150, pre_wanderlust_waypoints_seed = 2711, pre_wanderlust_flock_waypoints = 2, ref_data_type = c("linear", "cyclical"), ref_SPIN_option = c("STS", "neighborhood"), ref_SPIN_sigma_width = 1, ref_autoSPIN_alpha = 0.2, ref_autoSPIN_randomization = 20, ref_wanderlust_start_cell = NULL, ref_wanderlust_dfmap_components = 4, ref_wanderlust_l = 15, ref_wanderlust_num_waypoints = 150, ref_wanderlust_flock_waypoints = 2, ref_wanderlust_waypoints_seed = 2711)
exprs_file |
Input file name in txt format, with rownames of cells and colnames of genes. |
log_transform |
Boolean, if log transform the data. |
remove_outliers |
Boolean, if remove the outliers. |
preliminary_sorting_method |
Method name for preliminary sorting, including |
refine_sorting_method |
Method name for refined sorting, including |
project_name |
A character name as the prefix of the saved result file. |
result_directory |
The directory indicating where to save the results. |
nCores |
Number of cores that will be employed for drive gene selection (parallel computing), default is 1. |
save_results |
Boolean determining if save the results. |
reproduce_seed |
A seed used for reproducing the result. |
scattering_cutoff_prob |
Scattering cutoff value probability for gene selection, default 0.75. |
driving_force_cutoff |
Driving force cutoff value for gene selection, default NULL(automatically). |
qval_cutoff_featureSelection |
Q value cutoff for gene selection, default 0.05. |
pre_data_type |
The data type which guides the autoSPIN sorting, including |
pre_SPIN_option |
SPIN contains two options including |
pre_SPIN_sigma_width |
Sigma width parameter for SPIN, default is 1. |
pre_autoSPIN_alpha |
alpha parameter for autoSPIN, default is 0.2. |
pre_autoSPIN_randomization |
Number of randomizations for autoSPIN, default is 20. |
pre_wanderlust_start_cell |
The name of starting cell for wanderlust, default is the first cell from the data. |
pre_wanderlust_dfmap_components |
Number of components from diffusion map used for wanderlust analysis, default is 4. |
pre_wanderlust_l |
Number of nearest neighbors used for wanderlust, default is 15. |
pre_wanderlust_num_waypoints |
Number of waypoint used for wanderlust, default is 150. |
pre_wanderlust_waypoints_seed |
The seed for reproducing the wanderlust results. |
pre_wanderlust_flock_waypoints |
The number of times for flocking the waypoints, default is 2. |
ref_data_type |
The data type which guides the autoSPIN sorting, including |
ref_SPIN_option |
SPIN contains two options including |
ref_SPIN_sigma_width |
Sigma width parameter for SPIN, default is 1. |
ref_autoSPIN_alpha |
alpha parameter for autoSPIN, default is 0.2. |
ref_autoSPIN_randomization |
Number of randomizations for autoSPIN, default is 20. |
ref_wanderlust_start_cell |
The name of starting cell for wanderlust, default is the first cell from the data. |
ref_wanderlust_dfmap_components |
Number of components from diffusion map used for wanderlust analysis, default is 4. |
ref_wanderlust_l |
Number of nearest neighbors used for wanderlust, default is 15. |
ref_wanderlust_num_waypoints |
Number of waypoint used for wanderlust, default is 150 |
ref_wanderlust_flock_waypoints |
The number of times for flocking the waypoints, default is 2. |
ref_wanderlust_waypoints_seed |
The seed for reproducing the wanderlust results. |
This package incorporates data pre-processing, preliminary PCA gene selection, preliminary
cell ordering, feature selection, refined cell ordering, and post-analysis interpretation
and visualization. The uSORT workflow can be implemented through calling the main function or the GUI.
uSORT
.
results object (a list)
dir <- system.file('extdata', package='uSORT') file <- list.files(dir, pattern='.txt$', full=TRUE) #remove the # symbol of the following codes to test #uSORT_results <- uSORT(exprs_file = file, project_name = "test", # preliminary_sorting_method = "autoSPIN", # refine_sorting_method = "sWanderlust", # save_results = FALSE)
dir <- system.file('extdata', package='uSORT') file <- list.files(dir, pattern='.txt$', full=TRUE) #remove the # symbol of the following codes to test #uSORT_results <- uSORT(exprs_file = file, project_name = "test", # preliminary_sorting_method = "autoSPIN", # refine_sorting_method = "sWanderlust", # save_results = FALSE)
uSORT-package
This GUI provides an easy way for applying the uSORT package.
uSORT_GUI()
uSORT_GUI()
the GUI for uSORT-package
Hao Chen
http://JinmiaoChenLab.github.io/uSORT/
interactive() #if(interactive()) uSORT_GUI() # remove the hash symbol to run
interactive() #if(interactive()) uSORT_GUI() # remove the hash symbol to run
This is a function for generating the GUI for uSORT,
it's called by uSORT_GUI
. For internal use only.
uSORT_parameters_GUI()
uSORT_parameters_GUI()
a list of parameters.
Hao Chen
A data loading and pre-processing function which firstly identifies outlier cells and scarcely expressed genes.
uSORT_preProcess(exprs_file, log_transform = TRUE, remove_outliers = TRUE, lod = 1)
uSORT_preProcess(exprs_file, log_transform = TRUE, remove_outliers = TRUE, lod = 1)
exprs_file |
Input file name in txt format, with rownames of cells and colnames of genes. |
log_transform |
Boolean, if TRUE log transform the data. |
remove_outliers |
Boolean, if TRUE remove the outliers. |
lod |
A value of limit of detection in the unit of TPM/CPM/RPKM. It will be used as the starting value for outlier cell detection and the basis for removing scarce genes. |
A list containing exprs_raw
(data frame) and exprs_log_trimed
(data.frame).
dir <- system.file('extdata', package='uSORT') file <- list.files(dir, pattern='.txt$', full=TRUE) exprs <- uSORT_preProcess(exprs_file = file)
dir <- system.file('extdata', package='uSORT') file <- list.files(dir, pattern='.txt$', full=TRUE) exprs <- uSORT_preProcess(exprs_file = file)
Sorting methods include autoSPIN
, sWanderlust
, monocle
,
Wanderlust
, SPIN
. Any of the sorting method can be called directly
using this funciton.
uSORT_sorting_wrapper(data, data_raw, method = c("autoSPIN", "sWanderlust", "monocle", "Wanderlust", "SPIN", "none"), data_type = c("linear", "cyclical"), SPIN_option = c("STS", "neighborhood"), SPIN_sigma_width = 1, autoSPIN_alpha = 0.2, autoSPIN_randomization = 20, wanderlust_start_cell = NULL, wanderlust_dfmap_components = 4, wanderlust_l = 15, wanderlust_num_waypoints = 150, wanderlust_waypoints_seed = 2711, wanderlust_flock_waypoints = 2)
uSORT_sorting_wrapper(data, data_raw, method = c("autoSPIN", "sWanderlust", "monocle", "Wanderlust", "SPIN", "none"), data_type = c("linear", "cyclical"), SPIN_option = c("STS", "neighborhood"), SPIN_sigma_width = 1, autoSPIN_alpha = 0.2, autoSPIN_randomization = 20, wanderlust_start_cell = NULL, wanderlust_dfmap_components = 4, wanderlust_l = 15, wanderlust_num_waypoints = 150, wanderlust_waypoints_seed = 2711, wanderlust_flock_waypoints = 2)
data |
Input preprocessed data matrix with row.name of cells and col.name of genes. |
data_raw |
Input raw data matrix with row.name of cells and col.name of genes, for monocle method. |
method |
The name of the sorting method to use, including |
data_type |
The type of the data, either |
SPIN_option |
The runing option of SPIN, |
SPIN_sigma_width |
Sigma width for SPIN. |
autoSPIN_alpha |
alpha for autoSPIN. |
autoSPIN_randomization |
Number of randomization for autoSPIN. |
wanderlust_start_cell |
The id of the starting cell for wanderlust. |
wanderlust_dfmap_components |
The number of components from diffusionmap for wanderlust. |
wanderlust_l |
The number of nearest neighbors used for wanderlust. |
wanderlust_num_waypoints |
The number of waypoints for wanderlust. |
wanderlust_waypoints_seed |
The seed for reproducible analysis. |
wanderlust_flock_waypoints |
The bumber of flock times for wanderlust. |
return the order of sorting results.
dir <- system.file('extdata', package='uSORT') file <- list.files(dir, pattern='.txt$', full=TRUE) exprs <- uSORT_preProcess(exprs_file = file) exp_trimmed <- t(exprs$exprs_log_trimed) PCA_selected_genes <- pca_gene_selection(exp_trimmed) exp_PCA_genes <- exp_trimmed[, PCA_selected_genes] #order <- uSORT_sorting_wrapper(data = exp_PCA_genes, method = 'autoSPIN')
dir <- system.file('extdata', package='uSORT') file <- list.files(dir, pattern='.txt$', full=TRUE) exprs <- uSORT_preProcess(exprs_file = file) exp_trimmed <- t(exprs$exprs_log_trimed) PCA_selected_genes <- pca_gene_selection(exp_trimmed) exp_PCA_genes <- exp_trimmed[, PCA_selected_genes] #order <- uSORT_sorting_wrapper(data = exp_PCA_genes, method = 'autoSPIN')
Save result object into a RData file. Save cell to cell distance heatmap for both preliminary and refined results. Creat plot of driver gene profiles on final ordering using heatmap.
uSORT_write_results(uSORT_results, project_name, result_directory)
uSORT_write_results(uSORT_results, project_name, result_directory)
uSORT_results |
Result object from |
project_name |
A prefix for the saving files. |
result_directory |
The path where to save the results. |
save the results.
dir <- system.file('extdata', package='uSORT') file <- list.files(dir, pattern='.txt$', full=TRUE) #remove the # symbol of the following codes to test #uSORT_results <- uSORT(exprs_file = file, # project_name = 'test', # preliminary_sorting_method = 'autoSPIN', # refine_sorting_method = 'sWanderlust', # save_results = FALSE) #uSORT_write_results(uSORT_results, # project_name = 'test', # result_directory = getwd())
dir <- system.file('extdata', package='uSORT') file <- list.files(dir, pattern='.txt$', full=TRUE) #remove the # symbol of the following codes to test #uSORT_results <- uSORT(exprs_file = file, # project_name = 'test', # preliminary_sorting_method = 'autoSPIN', # refine_sorting_method = 'sWanderlust', # save_results = FALSE) #uSORT_write_results(uSORT_results, # project_name = 'test', # result_directory = getwd())
A utility function for scattering_quantification_per_gene which computes the degree of scattering for single gene, whereby the value is computed by summing over the local values of smaller local windows
variability_per_gene(logExp = NULL, min_expr = 0.1, window_size_perct = 0.1, nonZeroExpr_perct = 0.1)
variability_per_gene(logExp = NULL, min_expr = 0.1, window_size_perct = 0.1, nonZeroExpr_perct = 0.1)
logExp |
a log-scale expression vector of a gene |
min_expr |
a minimum expression value |
window_size_perct |
a window size (in dispersion level |
nonZeroExpr_perct |
a minimum amount of cells (in expression, otherwise the associated window will be assigned to 0 disperson value |
integer
MaiChan Lau
a wrapper of wanderlust for sWanderlust
wanderlust_wrapper(data, s, diffusionmap_components = 4, l = 15, k = 15, num_graphs = 1, num_waypoints = 150, waypoints_seed = 123, flock_waypoints = 2)
wanderlust_wrapper(data, s, diffusionmap_components = 4, l = 15, k = 15, num_graphs = 1, num_waypoints = 150, waypoints_seed = 123, flock_waypoints = 2)
data |
Input data matrix. |
s |
The ID of starting point. |
diffusionmap_components |
Number of components from diffusion map used for wanderlust analysis, default is 4. |
l |
Number of nearest neighbors, default is 15. |
k |
Number of nearest neighbors for repeating graphs, default is 15, should be less than or equal to l. |
num_graphs |
Number of repreated graphs. |
num_waypoints |
Number of waypoint used for wanderlust, default is 150. |
waypoints_seed |
The seed for reproducing the results. |
flock_waypoints |
The number of times for flocking the waypoints, default is 2. |
sorted order.
Hao Chen