Package 'RCSL' reference manual

Title:	Rank Constrained Similarity Learning for single cell RNA sequencing data
Description:	A novel clustering algorithm and toolkit RCSL (Rank Constrained Similarity Learning) to accurately identify various cell types using scRNA-seq data from a complex tissue. RCSL considers both lo-cal similarity and global similarity among the cells to discern the subtle differences among cells of the same type as well as larger differences among cells of different types. RCSL uses Spearman’s rank correlations of a cell’s expression vector with those of other cells to measure its global similar-ity, and adaptively learns neighbour representation of a cell as its local similarity. The overall similar-ity of a cell to other cells is a linear combination of its global similarity and local similarity.
Authors:	Qinglin Mei [cre, aut], Guojun Li [fnd], Zhengchang Su [fnd]
Maintainer:	Qinglin Mei <[email protected]>
License:	Artistic-2.0
Version:	1.15.0
Built:	2025-04-02 05:38:27 UTC
Source:	https://github.com/bioc/RCSL

Cell type annotations of 'yan' datasets by Yan et al.

Description

Cell type annotations of 'yan' datasets by Yan et al.

Usage

ann
ann

Format

An object of class data.frame with 90 rows and 1 columns.

Source

http://dx.doi.org/10.1038/nsmb.2660

Each row corresponds to one cell of 'yan' dataset

Calculate the bolock-diagnal matrix B min_B>=0, B1=1, F'F=I ||B - A||_1 + r||B||^2 + 2lambdatrace(F'L*F)

Description

Calculate the bolock-diagnal matrix B min_B>=0, B*1=1, F'*F=I ||B - A||_1 + r*||B||^2 + 2*lambda*trace(F'*L*F)

Usage

BDSM(S, C)
BDSM(S, C)

Arguments

`S`	the calculated initial similarity matrix S
`C`	the estimated number of clusters C

Value

B block-diagonal matrix

y clustering results

Examples

gfData <- GenesFilter(yan)
res_SimS <- SimS(gfData)
C <- EstClusters(res_SimS$drData,res_SimS$S)
BDSM(res_SimS$S,C)

gfData <- GenesFilter(yan)
res_SimS <- SimS(gfData)
C <- EstClusters(res_SimS$drData,res_SimS$S)
BDSM(res_SimS$S,C)

Solve the problem: min 1/2x'Lx-x'd s.t. x>=0, 1'x=1

Description

Solve the problem: min 1/2*x'*L*x-x'*d s.t. x>=0, 1'x=1

Usage

EProjSimplexdiag(d, l)
EProjSimplexdiag(d, l)

Arguments

`d`	matrix or vector
`l`	matrix or vector

Value

Estimate the optimal number of clusters C for clustering

Description

Estimate the optimal number of clusters C for clustering

Usage

EstClusters(drData, S)
EstClusters(drData, S)

Arguments

`drData`	gene expression matrix after PCA processing
`S`	the calculated similarity matrix S from "SimS"

Value

C the estimated number of clusters

Examples

gfData <- GenesFilter(yan)
res_SimS <- SimS(gfData)
EstClusters(res_SimS$drData,res_SimS$S)

gfData <- GenesFilter(yan)
res_SimS <- SimS(gfData)
EstClusters(res_SimS$drData,res_SimS$S)

Solve the problem: ||A-B||^2 = ||A||^2 + ||B||^2 - 2A'B

Description

Solve the problem: ||A-B||^2 = ||A||^2 + ||B||^2 - 2*A'*B

Usage

EucDist(A, B)
EucDist(A, B)

Arguments

`A`	matrix or vector
`B`	matrix or vector

Value

d matrix or vector

Perform the step of gene filtering to normalizaed gene expression data

Description

Perform the step of gene filtering to normalizaed gene expression data

Usage

GenesFilter(data, gfRatio = 0.025)
GenesFilter(data, gfRatio = 0.025)

Arguments

`data`	the normalized gene expression matrix
`gfRatio`	the ratio of genes filtering

Value

the gene expression matrix after genes filtering gfData

Examples

data(yan)
GenesFilter(yan)

data(yan)
GenesFilter(yan)

Infer the development lineage based on the clustering results from RCSL and the pseudotime

Description

Infer the development lineage based on the clustering results from RCSL and the pseudotime

Usage

getLineage(drData, clustRes, pseudoTime, simMeasure = "kendall")
getLineage(drData, clustRes, pseudoTime, simMeasure = "kendall")

Arguments

`drData`	preprocessed gene expression data (each column represent a cell)
`clustRes`	the clustering results identified by RCSL
`pseudoTime`	inferred by PlotPseudoTime() using the similarity matrix S and starting cell
`simMeasure`	the calculation method of measuring the cluster centers' similarity

Value

lineage the cell lineages connected all the cluster centers based on the clustering results from RCSL

Examples

gfData <- GenesFilter(yan)
TrueLabel <- ann$cell_type1
res_SimS <- SimS(gfData)
C <- EstClusters(res_SimS$drData,res_SimS$S)
res_BDSM <- BDSM(res_SimS$S,C)
Pseudo <- PlotPseudoTime(res_SimS$S,TrueLabel,startPoint=1)
getLineage(res_SimS$drData,res_BDSM$y,Pseudo$pseudoTime)

gfData <- GenesFilter(yan)
TrueLabel <- ann$cell_type1
res_SimS <- SimS(gfData)
C <- EstClusters(res_SimS$drData,res_SimS$S)
res_BDSM <- BDSM(res_SimS$S,C)
Pseudo <- PlotPseudoTime(res_SimS$S,TrueLabel,startPoint=1)
getLineage(res_SimS$drData,res_BDSM$y,Pseudo$pseudoTime)

Calculate the neighbor representation of cells to the low-dimensional gene expression matrix

Description

Calculate the neighbor representation of cells to the low-dimensional gene expression matrix

Usage

NeigRepresent(
  drData,
  NN.method = "KNN",
  Dis.method = "Euclidean",
  LSH.TreeNum = 30,
  LSH.Dim = 500,
  LSH.Dis = "angular",
  neiRatio = 0.65
)
NeigRepresent(
  drData,
  NN.method = "KNN",
  Dis.method = "Euclidean",
  LSH.TreeNum = 30,
  LSH.Dim = 500,
  LSH.Dis = "angular",
  neiRatio = 0.65
)

Arguments

`drData`	gene expression matrix after dimensionality reduced by PCA
`NN.method`	the method of finding neighbors
`Dis.method`	the distance metric in finding neighbors
`LSH.TreeNum`	the tree number of LSH
`LSH.Dim`	the dimension in LSH
`LSH.Dis`	the distance metric in LSH
`neiRatio`	ratio of the number of selected

Value

the similarity matrix measured by neighbor representation NR

Examples

gfData <- GenesFilter(yan)
res_SimS <- SimS(gfData)
NeigRepresent(res_SimS$drData)

gfData <- GenesFilter(yan)
res_SimS <- SimS(gfData)
NeigRepresent(res_SimS$drData)

Plot the visualization of constructed Minimum Spanning Tree based on the clustering results of RCSL

Description

Plot the visualization of constructed Minimum Spanning Tree based on the clustering results of RCSL

Usage

PlotMST(
  drData,
  clustRes,
  TrueLabel,
  dataName = "",
  fontSize = 12,
  VisualMethod = "umap"
)
PlotMST(
  drData,
  clustRes,
  TrueLabel,
  dataName = "",
  fontSize = 12,
  VisualMethod = "umap"
)

Arguments

`drData`	preprocessed gene expression data
`clustRes`	the clustering results identified by RCSL
`TrueLabel`	the real cell types to color the dots in plot
`dataName`	the name of the data that will be showed in the plot
`fontSize`	the font size of the plot
`VisualMethod`	the method for 2D visualization including UMAP,t-SNE and PCA

Value

MSTPlot ggplot object of the visualization of constructed MST

Examples

gfData <- GenesFilter(yan)
TrueLabel <- ann$cell_type1
res_SimS <- SimS(gfData)
C <- EstClusters(res_SimS$drData,res_SimS$S)
res_BDSM <- BDSM(res_SimS$S,C)
PlotMST(res_SimS$drData,res_BDSM$y,TrueLabel)

gfData <- GenesFilter(yan)
TrueLabel <- ann$cell_type1
res_SimS <- SimS(gfData)
C <- EstClusters(res_SimS$drData,res_SimS$S)
res_BDSM <- BDSM(res_SimS$S,C)
PlotMST(res_SimS$drData,res_BDSM$y,TrueLabel)

Infer the pseudo-temporal ordering between the cell types using the distance from a cell type to the predefined starting cell type.

Description

Infer the pseudo-temporal ordering between the cell types using the distance from a cell type to the predefined starting cell type.

Usage

PlotPseudoTime(
  S,
  TrueLabel,
  startPoint,
  fontSize = 12,
  dataName = "",
  sim = TRUE
)
PlotPseudoTime(
  S,
  TrueLabel,
  startPoint,
  fontSize = 12,
  dataName = "",
  sim = TRUE
)

Arguments

`S`	the similarity matrix calculated by SimS() function
`TrueLabel`	the real cell types used to indicate the vertical axis
`startPoint`	the posiition of the starting cell in the matrix
`fontSize`	the font size of the plot
`dataName`	the name of the data that will be showed in the plot
`sim`	indicate the input data is simialrity matrix or not

Value

PstudoTime

PseudoTimePlot ggplot object of the pseudo-temporal ordering of cells

Examples

gfData <- GenesFilter(yan)
TrueLabel <- ann$cell_type1
res_SimS <- SimS(gfData)
PlotPseudoTime(res_SimS$S,TrueLabel,startPoint=1)

gfData <- GenesFilter(yan)
TrueLabel <- ann$cell_type1
res_SimS <- SimS(gfData)
PlotPseudoTime(res_SimS$S,TrueLabel,startPoint=1)

Infer the developmental trajectories based on the clustering results from RCSL

Description

Infer the developmental trajectories based on the clustering results from RCSL

Usage

PlotTrajectory(
  gfData,
  clustRes,
  TrueLabel,
  lineage,
  fontSize = 12,
  dataName = "",
  VisualMethod = "umap"
)
PlotTrajectory(
  gfData,
  clustRes,
  TrueLabel,
  lineage,
  fontSize = 12,
  dataName = "",
  VisualMethod = "umap"
)

Arguments

`gfData`	preprocessed gene expression data (each column represent a cell)
`clustRes`	the clustering results identified by RCSL
`TrueLabel`	the real cell types
`lineage`	the lineage obtained by getLineage()
`fontSize`	the size of font in the plot
`dataName`	the name of the data that will be showed in the plot
`VisualMethod`	the display method of 2-D visualization

Value

TrajectoryPlot ggplot object of the inferred developmental trajectories

Examples

gfData <- GenesFilter(yan)
TrueLabel <- ann$cell_type1
res_SimS <- SimS(gfData)
C <- EstClusters(res_SimS$drData,res_SimS$S)
res_BDSM <- BDSM(res_SimS$S,C)
Pseudo <- PlotPseudoTime(res_SimS$S,TrueLabel,startPoint=1)
Linea <- getLineage(res_SimS$drData,res_BDSM$y,Pseudo$pseudoTime)
PlotTrajectory(gfData,res_BDSM$y,TrueLabel,lineage=Linea)

gfData <- GenesFilter(yan)
TrueLabel <- ann$cell_type1
res_SimS <- SimS(gfData)
C <- EstClusters(res_SimS$drData,res_SimS$S)
res_BDSM <- BDSM(res_SimS$S,C)
Pseudo <- PlotPseudoTime(res_SimS$S,TrueLabel,startPoint=1)
Linea <- getLineage(res_SimS$drData,res_BDSM$y,Pseudo$pseudoTime)
PlotTrajectory(gfData,res_BDSM$y,TrueLabel,lineage=Linea)

Perform the RCSL program

Description

Perform the RCSL program

Usage

RCSL(
  data,
  GF = TRUE,
  gfRatio = 0.025,
  pcRatio = 0.95,
  NN.method = "KNN",
  Dis.method = "Euclidean",
  neiRatio = 0.65
)
RCSL(
  data,
  GF = TRUE,
  gfRatio = 0.025,
  pcRatio = 0.95,
  NN.method = "KNN",
  Dis.method = "Euclidean",
  neiRatio = 0.65
)

Arguments

`data`	normalizaed gene expression matrix(each column represents a cell)
`GF`	should I need the gene filter step?
`gfRatio`	the ratio of the gene filter
`pcRatio`	the ratio between the variance of the
`NN.method`	the method of finding neighbors
`Dis.method`	the distance metric in finding neighbors
`neiRatio`	ratio of the number of selected

Value

gfData gene expression matrix after genes filtering

B block-diagonal matrix

C estimated number of clusters

y clustering results

Examples

data(yan)
data <- log2(yan+1)
RCSL(yan[,1:20])
data(yan)
data <- log2(yan+1)
RCSL(yan[,1:20])

Calculate the initial similarity matrix

Description

Calculate the initial similarity matrix

Usage

SimS(
  data,
  pcRatio = 0.95,
  gamma = 0.8,
  NN.method = "KNN",
  Dis.method = "Euclidean",
  LSH.TreeNum = 30,
  LSH.Dim = 1000,
  LSH.Dis = "angular",
  neiRatio = 0.65
)
SimS(
  data,
  pcRatio = 0.95,
  gamma = 0.8,
  NN.method = "KNN",
  Dis.method = "Euclidean",
  LSH.TreeNum = 30,
  LSH.Dim = 1000,
  LSH.Dis = "angular",
  neiRatio = 0.65
)

Arguments

`data`	gene expression matrix after genes filtering
`pcRatio`	the ratio between the variance of the
`gamma`	the ratio of the global simialrity
`NN.method`	the method of finding neighbors
`Dis.method`	the distance metric in finding neighbors
`LSH.TreeNum`	the tree number of LSH
`LSH.Dim`	the dimension in LSH
`LSH.Dis`	the distance metric in LSH
`neiRatio`	ratio of the number of selected

Value

initial similarity matrix S

gene expression matrix after PCA processing drData

Examples

gfData <- GenesFilter(yan)
SimS(gfData)

gfData <- GenesFilter(yan)
SimS(gfData)

Trajectory analysis

Description

Trajectory analysis

Usage

TrajectoryAnalysis(
  gfData,
  drData,
  S,
  clustRes,
  fontSize = 12,
  TrueLabel,
  startPoint,
  dataName = "",
  sim = TRUE,
  simMeasure = "kendall",
  VisualMethod = "umap"
)
TrajectoryAnalysis(
  gfData,
  drData,
  S,
  clustRes,
  fontSize = 12,
  TrueLabel,
  startPoint,
  dataName = "",
  sim = TRUE,
  simMeasure = "kendall",
  VisualMethod = "umap"
)

Arguments

`gfData`	preprocessed gene expression data (each column represent a cell)
`drData`	preprocessed gene expression data (each column represent a cell)
`S`	the similarity matrix calculated by SimS() function
`clustRes`	the clustering results identified by RCSL
`fontSize`	the size of font in the plot
`TrueLabel`	the real cell types used to indicate the vertical axis
`startPoint`	the posiition of the starting cell in the matrix
`dataName`	the name of the data that will be showed in the plot
`sim`	indicate the input data is simialrity matrix or not
`simMeasure`	the calculation method of measuring the cluster centers' similarity
`VisualMethod`	the display method of 2-D visualization

Value

PseudoTimePlot, MSTPlot, TrajectoryPlot

Examples

gfData <- GenesFilter(yan)
TrueLabel <- ann$cell_type1
res_SimS <- SimS(gfData)
C <- EstClusters(res_SimS$drData,res_SimS$S)
res_BDSM <- BDSM(res_SimS$S,C)
TrajectoryAnalysis(gfData,res_SimS$drData,res_SimS$S,res_BDSM$y,
                   TrueLabel=TrueLabel,startPoint=1)

gfData <- GenesFilter(yan)
TrueLabel <- ann$cell_type1
res_SimS <- SimS(gfData)
C <- EstClusters(res_SimS$drData,res_SimS$S)
res_BDSM <- BDSM(res_SimS$S,C)
TrajectoryAnalysis(gfData,res_SimS$drData,res_SimS$S,res_BDSM$y,
                   TrueLabel=TrueLabel,startPoint=1)

A public scRNA-seq dataset by Yan et al.

Description

A public scRNA-seq dataset by Yan et al.

Usage

yan
yan

Format

An object of class data.frame with 20214 rows and 90 columns.

Source

http://dx.doi.org/10.1038/nsmb.2660

Columns represent cells, rows represent genes expression values.

Package 'RCSL'

Help Index

Cell type annotations of 'yan' datasets by Yan et al.

Description

Usage

Format

Source

Calculate the bolock-diagnal matrix B min_B>=0, B*1=1, F'*F=I ||B - A||_1 + r*||B||^2 + 2*lambda*trace(F'*L*F)

Description

Usage

Arguments

Value

Examples

Solve the problem: min 1/2*x'*L*x-x'*d s.t. x>=0, 1'x=1

Description

Usage

Arguments

Value

Estimate the optimal number of clusters C for clustering

Description

Usage

Arguments

Value

Examples

Solve the problem: ||A-B||^2 = ||A||^2 + ||B||^2 - 2*A'*B

Description

Usage

Arguments

Value

Perform the step of gene filtering to normalizaed gene expression data

Description

Usage

Arguments

Value

Examples

Infer the development lineage based on the clustering results from RCSL and the pseudotime

Description

Usage

Arguments

Value

Examples

Calculate the neighbor representation of cells to the low-dimensional gene expression matrix

Description

Usage

Arguments

Value

Examples

Plot the visualization of constructed Minimum Spanning Tree based on the clustering results of RCSL

Description

Usage

Arguments

Value

Examples

Infer the pseudo-temporal ordering between the cell types using the distance from a cell type to the predefined starting cell type.

Description

Usage

Arguments

Value

Examples

Infer the developmental trajectories based on the clustering results from RCSL

Description

Usage

Arguments

Value

Examples

Perform the RCSL program

Description

Usage

Arguments

Value

Examples

Calculate the initial similarity matrix

Description

Usage

Arguments

Value

Examples

Trajectory analysis

Description

Usage

Calculate the bolock-diagnal matrix B min_B>=0, B1=1, F'F=I ||B - A||_1 + r||B||^2 + 2lambdatrace(F'L*F)

Solve the problem: min 1/2x'Lx-x'd s.t. x>=0, 1'x=1

Solve the problem: ||A-B||^2 = ||A||^2 + ||B||^2 - 2A'B