Package 'Oscope' reference manual

Title:	Oscope - A statistical pipeline for identifying oscillatory genes in unsynchronized single cell RNA-seq
Description:	Oscope is a statistical pipeline developed to identifying and recovering the base cycle profiles of oscillating genes in an unsynchronized single cell RNA-seq experiment. The Oscope pipeline includes three modules: a sine model module to search for candidate oscillator pairs; a K-medoids clustering module to cluster candidate oscillators into groups; and an extended nearest insertion module to recover the base cycle order for each oscillator group.
Authors:	Ning Leng
Maintainer:	Ning Leng <[email protected]>
License:	Artistic-2.0
Version:	1.37.0
Built:	2025-03-18 03:20:01 UTC
Source:	https://github.com/bioc/Oscope

Oscope - A statistical pipeline for identifying oscillatory genes in unsynchronized single cell RNA-seq

Description

Oscope is a statistical pipeline developed to identifying and recovering the base cycle profiles of oscillating genes in an unsynchronized single cell RNA-seq experiment. The Oscope pipeline includes three modules: a sine model module to search for candidate oscillator pairs; a K-medoids clustering module to cluster candidate oscillators into groups; and an extended nearest insertion module to recover the base cycle order for each oscillator group.

Details

Package:	Oscope
Type:	Package
Version:	0.99.0
Date:	2015-7-27
License:	Artistic-2.0

Author(s)

Ning Leng

Maintainer: Ning Leng <[email protected]>

References

Leng et al. Oscope - A statistical pipeline for identifying oscillatory genes in unsynchronized single cell RNA-seq, accepted

Calculate absolute correlations among gene pairs

Description

Calculate absolute correlations among gene pairs

Usage

AbsCor(DataIn, method="pearson", diagNA=TRUE)
AbsCor(DataIn, method="pearson", diagNA=TRUE)

Arguments

`DataIn`	input data, gene-by-sample matrix
`method`	"pearson" or "spearman"; default is "pearson"
`diagNA`	whether replace diagonal values to NA's

Value

Output is a gene-by-gene matrix; the i, j th entry shows the absolute correlation of the ith and jth gene.

Author(s)

Ning Leng

Examples

AbsCor(matrix(rnorm(10),ncol=5))
AbsCor(matrix(rnorm(10),ncol=5))

Calculate estimated mean and variance of RNA-Seq data

Description

Calculate estimated mean and variance of RNA-Seq data

Usage

CalcMV(Data, Sizes=NULL, NormData=FALSE, MeanCutLow=100, MeanCutHigh=NULL, ApproxVal=10^-6, Plot=TRUE)
CalcMV(Data, Sizes=NULL, NormData=FALSE, MeanCutLow=100, MeanCutHigh=NULL, ApproxVal=10^-6, Plot=TRUE)

Arguments

`Data`	input data matrix; it should be a gene-by-sample or isoform-by sample matrix
`Sizes`	The library size factor for each sample. the number of values in Sizes is expected to be the same as the number of columns of Data. The library size factor will be estimated using the median normalization method implemented in EBSeq if Sizes is specified as NULL.
`NormData`	whether the data is already normalized. If NormData=TRUE, the specification of Sizes will be ignored and no normalization will be applied.
`MeanCutLow`, `MeanCutHigh`	we suggests the users to apply Oscope on genes with high mean and high variance. By default, MeanCutLow is specified as 100, consequently only genes with mean > 100 will be used. The CalcMV function will fit a linear regression on log(variance)~log(mean) on these genes. Genes with variance above this line are considered as the high mean high variance genes. The upper bound of mean may be specified using MeanCutHigh. If both are specified as NULL, all of the genes will be considered when fitting the regression.
`ApproxVal`	Default is 10^-6. It is used to approximate the estimate of parameter q for genes/isoforms whose estimated variance is less than estimated mean. q will be estimated using 1-ApproxVal
`Plot`	if Plot = T, a mean-variance plot will be shown. The fitted line will be shown and the selected genes will be marked in green.

Value

Output is a list with 6 sublists : Mean: estimated means of genes/isoforms; Var: estimated variances; Median: estimated medians; GeneToUse: the high mean high variance genes (suggested input for Oscope); Q: estimated q's (without apporximation); Q_mdf: estimated q's with approximations; Phi_mdf: estimated overdispersion parameter (phi), with approximations.

Author(s)

Ning Leng

Examples

exp=matrix(rnorm(100,1000,10),ncol=10)
rownames(exp)=paste0("g",1:10)
CalcMV(exp)
exp=matrix(rnorm(100,1000,10),ncol=10)
rownames(exp)=paste0("g",1:10)
CalcMV(exp)

Flag gene clusters with small within-cluster phase differences and/or small within-cluster sine scores

Description

Flag gene clusters with small within-cluster phase differences and/or small within-cluster sine scores

Usage

FlagCluster(SineRes, KMRes, Data, qt, thre=pi/4, qtincluster=.5, qtinpermu=.9 ,Seed=1)
FlagCluster(SineRes, KMRes, Data, qt, thre=pi/4, qtincluster=.5, qtinpermu=.9 ,Seed=1)

Arguments

`SineRes`	output of OscopeSine() function
`KMRes`	output of KMRes() function
`Data`	a gene-by-sample (isoform-by-sample) matrix indicating the rescaled expression of genes/isoforms. all values should be between [-1, 1].
`qt`, `thre`	Define a gene pair's linear score as min(eta, pi-eta), in which eta is defined as phase shift mod pi. A cluster will be flagged if the qt th quantile of within-cluster linear score is less than thre.
`qtincluster`, `qtinpermu`	To define clusters with small within-cluster sine scores, for each cluster we generate permuted data of these genes (different cell permutation for each gene). We calculate the within-cluster sine scores within the cluster of permuted genes, then infer whether the sine scores in the cluster of interest are greater than those generated by the permuted genes. A cluster will be flagged if its qtincluster th quantile in the original data is less than its qtinpermu th quantile in permuted data.
`Seed`	seed

Value

Output: RemoveID: a vector of cluster numbers that are flagged; SineCompreList: sine score and permuted sine score for each cluster; LinearList: linear score of each cluster

Author(s)

Ning Leng

Examples

aa <- sin(seq(0,1,.1))
bb <- sin(seq(0.5,1.5,.1))
cc <- sin(seq(0.9,1.9,.1))
tmp <- matrix(sin(rnorm(330)),ncol=11)
rownames(tmp) <- paste0("tmp",1:30)
Dat <- rbind(aa, bb, cc, tmp)
res1 <- OscopeSine(Dat)
res2 <- OscopeKM(res1, quan=.8, maxK=5)
res <- FlagCluster(res1, res2, Dat)
aa <- sin(seq(0,1,.1))
bb <- sin(seq(0.5,1.5,.1))
cc <- sin(seq(0.9,1.9,.1))
tmp <- matrix(sin(rnorm(330)),ncol=11)
rownames(tmp) <- paste0("tmp",1:30)
Dat <- rbind(aa, bb, cc, tmp)
res1 <- OscopeSine(Dat)
res2 <- OscopeKM(res1, quan=.8, maxK=5)
res <- FlagCluster(res1, res2, Dat)

Format SinFun outputs from lists to matrix

Description

Format SinFun outputs from lists to matrix

Usage

FormatSineOut(result, DataInSc, ShiftRg=pi/4)
FormatSineOut(result, DataInSc, ShiftRg=pi/4)

Arguments

`result`	Output from SineFun
`DataInSc`	a gene-by-sample (isoform-by-sample) matrix indicating the rescaled expression of two genes/isoforms. all values should be bettwen [-1, 1].
`ShiftRg`	phase shift cutoff.

Value

Output is a list with 4 sublists, each shows a N-by-N matrix, in which#' N is the total number of genes (isoforms). SimiMat: similarity matrix (sine scores); the sine scores are calculated by -log10(epsilon^2). DiffMat: dissimilarity matrix; shown are epsilon^2 for each gene pair. ShiftMat: optimal phase shift estimate for each pair of genes.

Author(s)

Ning Leng

Examples

aa <- sin(seq(0,1,.1))
bb <- sin(seq(0.5,1.5,.1))
cc <- sin(seq(0.9,1.9,.1))
DataInSc <- rbind(aa,bb,cc)
NumGene <- nrow(DataInSc)
Res <- sapply(1:(NumGene-1),function(i)SineFun(DataInSc, i),simplify=FALSE)
Out <- FormatSineOut(Res, DataInSc)
aa <- sin(seq(0,1,.1))
bb <- sin(seq(0.5,1.5,.1))
cc <- sin(seq(0.9,1.9,.1))
DataInSc <- rbind(aa,bb,cc)
NumGene <- nrow(DataInSc)
Res <- sapply(1:(NumGene-1),function(i)SineFun(DataInSc, i),simplify=FALSE)
Out <- FormatSineOut(Res, DataInSc)

Search for the optimal sample order by using the Extended Nearest Insertion

Description

Search for the optimal sample order by using the Extended Nearest Insertion

Usage

ImpShift(Data, Seq=NULL, NChun=4, RdmStart=FALSE, Ndg=3)
ImpShift(Data, Seq=NULL, NChun=4, RdmStart=FALSE, Ndg=3)

Arguments

`Data`	gene-by-sample matrix or isoform-by-sample matrix.It should be rescaled to values bwteen [-1,1].
`Seq`	NULL or a vector indicates the sample order. if specified, the samples will be first reordered by this vector.
`NChun`	number of starting points for polynomial fitting.
`RdmStart`	whether the start points are randomly selected.
`Ndg`	degree of polynomial.

Value

This function performs the extended nearest insertion (ENI). The ENI algorithm searchs for the optimal sample order which minimizes the MSE of sliding polynomial regression (SPR). This function will call PipeShiftCDF() function, which fits SPR to each row of the data. For each gene/isoform, SPR fits NChun polynomial curves with different starting points (samples). The samples with smaller order than the start point will be appended to follow the last sample when fitting. So each fitting consider same number of samples. If RdmStart = TRUE, the start points are randomly selected. Otherwise they are evenly sampled along the sample order. The aggregated MSE of a fit (using a specific start point) is defined as the summation of the MSEs of all genes/isoforms considered here. The MSE of the SPR is defined as the largest aggregated MSE across fits using different start points. The output returns the optimal order which provides the smallest SPR MSE.

Author(s)

Ning Leng

Examples

aa <- sin(seq(0,1,.1))
bb <- sin(seq(0.5,1.5,.1))
cc <- sin(seq(0.9,1.9,.1))
res <- ImpShift(rbind(aa,bb,cc), NChun=2)
aa <- sin(seq(0,1,.1))
bb <- sin(seq(0.5,1.5,.1))
cc <- sin(seq(0.9,1.9,.1))
res <- ImpShift(rbind(aa,bb,cc), NChun=2)

Run Extended Nearest Insertion and 2-opt on a gene cluster identified by OscopeKM function

Description

Run Extended Nearest Insertion and 2-opt on a gene cluster identified by OscopeKM function

Usage

NISFun(ClusterList, DataIn, i, Ndg=3, NChun=4, RdmStart=FALSE, N=20000, NCThre=1000)
NISFun(ClusterList, DataIn, i, Ndg=3, NChun=4, RdmStart=FALSE, N=20000, NCThre=1000)

Arguments

`ClusterList`	a list of gene clusters. Each sublist contains a vector of gene names.
`DataIn`	gene-by-sample matrix or isoform-by-sample matrix.It should be rescaled to values bwteen [-1,1].
`i`	the cluster of interest. If the second cluster in ClusterList is of interest, specify i=2.
`Ndg`	degree of polynomial.
`NChun`	number of starting points for polynomial fitting.
`RdmStart`	whether the start points are randomly selected.
`N`, `NCThre`	The 2-opt algorithm will stop if N iterations has been performed or if the optimal order remains unchanged for over NCThre iterations.

Value

This function performs the extended nearest insertion (ENI) and 2-opt algorithm to a particular cluster identified by OscopeKM function. The ENI algorithm searchs for the optimal sample order which minimizes the MSE of sliding polynomial regression (SPR). This function will call PipeShiftCDF() function, which fits SPR to each row of the data. For each gene/isoform, SPR fits NChun polynomial curves with different starting points (samples). The samples with smaller order than the start point will be appended to follow the last sample when fitting. So each fitting consider same number of samples. If RdmStart = TRUE, the start points are randomly selected. Otherwise they are evenly sampled along the sample order. The aggregated MSE of a fit (using a specific start point) is defined as the summation of the MSEs of all genes/isoforms considered here. The MSE of the SPR is defined as the largest aggregated MSE across fits using different start points. The output of PipeShiftCDF() returns the optimal order which provides the smallest SPR MSE. The 2-opt algorithm is then applied to improve the optimal order searching of the ENI. In each iteration, 2-opt algorithm will randomly choose two points (samples), the flip the samples between these two points. The new order will be adapted if it provides smaller SPR MSE. The output returns the optimal order for the cluster of interest.

Author(s)

Ning Leng

Examples

aa <- sin(seq(0,1,.1))
bb <- sin(seq(0.5,1.5,.1))
cc <- sin(seq(0.9,1.9,.1))
res <- NISFun(list(c("aa","bb"),"cc"), rbind(aa,bb,cc),i=1, NChun=2, N=50)
aa <- sin(seq(0,1,.1))
bb <- sin(seq(0.5,1.5,.1))
cc <- sin(seq(0.9,1.9,.1))
res <- NISFun(list(c("aa","bb"),"cc"), rbind(aa,bb,cc),i=1, NChun=2, N=50)

Rescale the gene/isoform expression matrix

Description

Rescale the gene/isoform expression matrix

Usage

NormForSine(Data, qt1=.05, qt2=.95)
NormForSine(Data, qt1=.05, qt2=.95)

Arguments

`Data`	input gene-by-sample matrix or isoform-by-sample matrix
`qt1`, `qt2`	thresholds for outlier adjustment. For each gene/isoform, values <= qt1 th quantile (>= qt2 th quantile) will be pushed to qt1 th quantile (qt2 th quantile) prior to the scaling. default values are 0.05 and 0.95.

Value

The output will be a gene-by-sample or isoform-by-sample matrix. For each gene/isoform, the expressions will be scaled linearly to [-1,1]

Author(s)

Ning Leng

Examples

NormForSine(matrix(rnorm(10), nrow=2))
NormForSine(matrix(rnorm(10), nrow=2))

Run the 2-opt algorithm to improve the optimal order searching of the Extended Nearest Insertion

Description

Run the 2-opt algorithm to improve the optimal order searching of the Extended Nearest Insertion

Usage

Opt2Shift(Data,N=20000,Seq,Ndg=3,NChun=4, NCThre=1000, RdmStart=FALSE)
Opt2Shift(Data,N=20000,Seq,Ndg=3,NChun=4, NCThre=1000, RdmStart=FALSE)

Arguments

`Data`	gene-by-sample matrix or isoform-by-sample matrix. It should be rescaled to values bwteen [-1,1].
`N`, `NCThre`	The 2-opt algorithm will stop if N iterations has been performed or if the optimal order remains unchanged for over NCThre iterations.
`Seq`	a vector indicates the sample order obtained from the ENI.
`Ndg`	degree of polynomial.
`NChun`	number of starting points for polynomial fitting.
`RdmStart`	whether the start points are randomly selected.

Value

This function performs the the 2-opt algorithm to improve the optimal order searching of the Extended Nearest Insertion (ENI). In each iteration, the function will randomly choose two points (samples), the flip the samples between these two points. The new order will be adapted if it provides smaller SPR MSE. The output returns the optimal order and its SPR MSE.

Author(s)

Ning Leng

Examples

aa <- sin(seq(0,1,.1))
bb <- sin(seq(0.5,1.5,.1))
cc <- sin(seq(0.9,1.9,.1))
res <- ImpShift(rbind(aa,bb,cc), NChun=2)
res2 <- Opt2Shift(rbind(aa,bb,cc), NChun=2, N=50, Seq=res)
aa <- sin(seq(0,1,.1))
bb <- sin(seq(0.5,1.5,.1))
cc <- sin(seq(0.9,1.9,.1))
res <- ImpShift(rbind(aa,bb,cc), NChun=2)
res2 <- Opt2Shift(rbind(aa,bb,cc), NChun=2, N=50, Seq=res)

Search for the optimal sample order for different gene clusters

Description

Search for the optimal sample order for different gene clusters

Usage

OscopeENI(KMRes, Data, ClusterUse=NULL, Ndg=3, NChun=4, RdmStart=FALSE,
N=20000, NCThre=1000, parallel=FALSE, parallelParam=NULL)
OscopeENI(KMRes, Data, ClusterUse=NULL, Ndg=3, NChun=4, RdmStart=FALSE,
N=20000, NCThre=1000, parallel=FALSE, parallelParam=NULL)

Arguments

`KMRes`	output of OscopeKM() function.
`Data`	gene-by-sample matrix or isoform-by-sample matrix. It should be rescaled to values bwteen [-1,1].
`ClusterUse`	a vector indicating what clusters are of interest. For example, by setting ClusterUse = c(1,2,3), only the top 3 clusters will be considered while recovering the base cycle order. If ClusterUse=NULL, all clusters will be used.
`Ndg`	degree of polynomial.
`NChun`	number of starting points for polynomial fitting.
`RdmStart`	whether the start points are randomly selected.
`N`, `NCThre`	The 2-opt algorithm will stop if N iterations has been performed or if the optimal order
`parallel`	whether apply parallel computing. if it is TRUE, BiocParallel will be called.
`parallelParam`	a SnowParam object to specify the clusters. If it is NULL, the default will be set as SnowParam(workers = 5, type = "SOCK") remains unchanged for over NCThre iterations.

Value

This function performs the extended nearest insertion (ENI) and 2-opt algorithm to all clusters (or a subset of picked clusters) identified by OscopeKM function. The function will recover independent orders to each of the clusters. For each cluster, the ENI algorithm will be applied to search for the optimal sample order which minimizes the MSE of sliding polynomial regression (SPR). This function will call PipeShiftCDF() function, which fits SPR to expression of each gene/isoform within a cluster. For each gene/isoform, SPR fits NChun polynomial curves with different starting points (samples). The samples with smaller order than the start point will be appended to follow the last sample when fitting. So each fitting consider same number of samples. If RdmStart = TRUE, the start points are randomly selected. Otherwise they are evenly sampled along the sample order. The aggregated MSE of a fit (using a specific start point) is defined as the summation of the MSEs of all genes/isoforms considered here. The MSE of the SPR is defined as the largest aggregated MSE across fits using different start points. The output of PipeShiftCDF() returns the optimal order which provides the smallest SPR MSE. The 2-opt algorithm will then be applied to improve the optimal order searching of the ENI. In each iteration, the 2-opt algorithm will randomly choose two points (samples), the flip the samples between these two points. The new order will be adapted if it provides smaller SPR MSE. The output returns the optimal order for each cluster of interest. It is a list with multiple sublists, in which each sublist includes the recovered order of the corresponding cluster in ClusterUse. If ClusterUse is not specified, the k th sublist shows the recovered order in KMRes

Author(s)

Ning Leng

Examples

aa <- sin(seq(0,1,.1))
bb <- sin(seq(0.5,1.5,.1))
cc <- sin(seq(0.9,1.9,.1))
dd <- sin(seq(1.2,2.2,.1))
res <- OscopeENI(list(c1=c("aa","bb"),c2=c("cc","dd")), rbind(aa,bb,cc,dd), NChun=2, N=50)
aa <- sin(seq(0,1,.1))
bb <- sin(seq(0.5,1.5,.1))
cc <- sin(seq(0.9,1.9,.1))
dd <- sin(seq(1.2,2.2,.1))
res <- OscopeENI(list(c1=c("aa","bb"),c2=c("cc","dd")), rbind(aa,bb,cc,dd), NChun=2, N=50)

Simulated gene level data set with 600 genes and 30 cells.

Description

Simulated gene expression to evaluate Oscope. 600 genes and 30 cell are simulated. The expression mean of the genes are randomly simulated in the range of 10-10000.

Format

GeneExampleData is a matrix with 600 genes (rows) and 30 cells (columns).

Examples

data(OscopeExampleData)
str(OscopeExampleData)
data(OscopeExampleData)
str(OscopeExampleData)

Oscope K medoid module

Description

Oscope K medoid module

Usage

OscopeKM(SineRes, quan=.95,cut=NULL,maxK=NULL,minSize=0, maxSize=200, fixK=NULL, rawscale=TRUE)
OscopeKM(SineRes, quan=.95,cut=NULL,maxK=NULL,minSize=0, maxSize=200, fixK=NULL, rawscale=TRUE)

Arguments

`SineRes`	output of OscopeSine function.
`quan`	only gene pairs with similarity score >= quan th quantile will be considered in the clustering analyses. Default is 0.95.
`cut`	pre-defined cutoff. Gene pairs with similarity score >= cut will be considered in cluster analyses. If cut is defined, quan will be ignored.
`maxK`	max number of clusters to consider (scan). if numbC=NULL, it will be calculated as [number of gene considered]/10
`minSize`, `maxSize`	Only clusters with minSize<= cluster size <= maxSize are reported in output.
`fixK`	if fixK is specified, the k-medoids algorithm will be applied with fixK clusters.
`rawscale`	Recall the input is the similarity matrix (-log10(distance from the sine model)). the k-medoids clustering will be applied using (-Input) as distance. If rawscale is defined as TRUE, the k-medoids clustering will be applied using -10^Input as distance.

Value

OscopeKM() calls scanK() function, which runs k-medoid clustering with varying number of clusters (k). The k is varied from 2 to maxK. The input should be the output of OscopeSine() function. scanK() function will cluster genes in gene pairs with high similarity score (the threshold can be defined using parameter quan). To select the top genes, the function first calculate the max similarity score for each gene, then select the genes with high max score.

The output object shows members in each cluster. clusters are sorted by median similarity score within cluster.

Author(s)

Ning Leng

Examples

aa <- sin(seq(0,1,.1))
bb <- sin(seq(0.5,1.5,.1))
cc <- sin(seq(0.9,1.9,.1))
tmp <- matrix(sin(rnorm(330)),ncol=11)
rownames(tmp) <- paste0("tmp",1:30)
Dat <- rbind(aa, bb, cc, tmp)
res1 <- OscopeSine(Dat)
res2 <- OscopeKM(res1, quan=.8, maxK=5)
aa <- sin(seq(0,1,.1))
bb <- sin(seq(0.5,1.5,.1))
cc <- sin(seq(0.9,1.9,.1))
tmp <- matrix(sin(rnorm(330)),ncol=11)
rownames(tmp) <- paste0("tmp",1:30)
Dat <- rbind(aa, bb, cc, tmp)
res1 <- OscopeSine(Dat)
res2 <- OscopeKM(res1, quan=.8, maxK=5)

Apply sine model on the full set of genes or isoforms

Description

Apply sine model on the full set of genes or isoforms

Usage

OscopeSine(DataInSc, parallel=FALSE, parallelParam=NULL)
OscopeSine(DataInSc, parallel=FALSE, parallelParam=NULL)

Arguments

`DataInSc`	a gene-by-sample (isoform-by-sample) matrix indicating the rescaled expression of two genes/isoforms. all values should be bettwen [-1, 1].
`parallel`	whether apply parallel computing. if it is TRUE, BiocParallel will be called.
`parallelParam`	a SnowParam object to specify the clusters. If it is NULL, the default will be set as SnowParam(workers = 5, type = "SOCK") remains unchanged for over NCThre iterations.

Value

Output is a list with 4 sublists, each shows a N-by-N matrix, in which N is the total number of genes (isoforms). SimiMat: similarity matrix (sine scores); the sine scores are calculated by -log10(epsilon^2). DiffMat: dissimilarity matrix; shown are epsilon^2 for each gene pair. ShiftMat: optimal phase shift estimate for each pair of genes.

Author(s)

Ning Leng

Examples

aa <- sin(seq(0,1,.1))
bb <- sin(seq(0.5,1.5,.1))
cc <- sin(seq(0.9,1.9,.1))
OscopeSine(rbind(aa,bb,cc))
aa <- sin(seq(0,1,.1))
bb <- sin(seq(0.5,1.5,.1))
cc <- sin(seq(0.9,1.9,.1))
OscopeSine(rbind(aa,bb,cc))

Define sine scroe cutoff using permuted data

Description

Define sine scroe cutoff using permuted data

Usage

PermuCut(Data, NumPermu=1000)
PermuCut(Data, NumPermu=1000)

Arguments

`Data`	a gene-by-sample (isoform-by-sample) matrix indicating the rescaled expression of genes/isoforms. all values should be between [-1, 1].
`NumPermu`	number of permuted genes to generate.

Value

Output contains a vector of numbers. Each number presents max sine score of a given permuted gene.

Author(s)

Ning Leng

Examples

aa <- sin(seq(0,1,.1))
bb <- sin(seq(0.5,1.5,.1))
cc <- sin(seq(0.9,1.9,.1))
tmp <- matrix(sin(rnorm(330)),ncol=11)
rownames(tmp) <- paste0("tmp",1:30)
Dat <- rbind(aa, bb, cc, tmp)
res1 <- PermuCut(Dat,100)
aa <- sin(seq(0,1,.1))
bb <- sin(seq(0.5,1.5,.1))
cc <- sin(seq(0.9,1.9,.1))
tmp <- matrix(sin(rnorm(330)),ncol=11)
rownames(tmp) <- paste0("tmp",1:30)
Dat <- rbind(aa, bb, cc, tmp)
res1 <- PermuCut(Dat,100)

Calculate residual of polynomial fit

Description

Calculate residual of polynomial fit

Usage

PipeR(Data,Ndg=3,Method="Poly")
PipeR(Data,Ndg=3,Method="Poly")

Arguments

`Data`	gene-by-sample matrix or isoform-by-sample matrix.It should be rescaled to values bwteen [-1,1].
`Ndg`	degree of polynomial.
`Method`	only polynomial fitting ("Poly") is available now.

Value

The function will fit polynomial curve to each row of the data. The output returns the MSE of each row (gene/isoform).

Author(s)

Ning Leng

Examples

aa <- sin(seq(0,1,.1))
bb <- sin(seq(0.5,1.5,.1))
cc <- sin(seq(0.9,1.9,.1))
res <- PipeR(rbind(aa,bb,cc))
aa <- sin(seq(0,1,.1))
bb <- sin(seq(0.5,1.5,.1))
cc <- sin(seq(0.9,1.9,.1))
res <- PipeR(rbind(aa,bb,cc))

Calculate residual of the sliding polynomial regression

Description

Calculate residual of the sliding polynomial regression

Usage

PipeShiftCDF(Data, Ndg=3, NChun=4, RdmStart=FALSE)
PipeShiftCDF(Data, Ndg=3, NChun=4, RdmStart=FALSE)

Arguments

`Data`	gene-by-sample matrix or isoform-by-sample matrix. It should be rescaled to values bwteen [-1,1].
`Ndg`	degree of polynomial.
`NChun`	number of starting points for polynomial fitting.
`RdmStart`	whether the start points are randomly selected.

Value

The function will fit sliding polynomial regression (SPR) to each row of the data. For each gene/isoform, SPR fits NChun polynomial curves with different starting points (samples). The samples with smaller order than the start point will be appended to follow the last sample when fitting. So each fitting consider same number of samples. If RdmStart = TRUE, the start points are randomly selected. Otherwise they are evenly sampled along the sample order. The aggregated MSE of a fit (using a specific start point) is defined as the summation of the MSEs of all genes/isoforms considered here. The output returns the MSE of the SPR, which is the largest aggregated MSE across fits using different start points.

Author(s)

Ning Leng

Examples

aa <- sin(seq(0,1,.1))
bb <- sin(seq(0.5,1.5,.1))
cc <- sin(seq(0.9,1.9,.1))
res <- PipeShiftCDF(rbind(aa,bb,cc), NChun=2)
aa <- sin(seq(0,1,.1))
bb <- sin(seq(0.5,1.5,.1))
cc <- sin(seq(0.9,1.9,.1))
res <- PipeShiftCDF(rbind(aa,bb,cc), NChun=2)

Run k-medoid algorithm with varying k on similarity matrix

Description

Run k-medoid algorithm with varying k on similarity matrix

Usage

scanK(SimiMatIn, quan=.95,cut=NULL, maxK=NULL,minSize=0, maxSize=200, fixK=NULL, rawscale=FALSE)
scanK(SimiMatIn, quan=.95,cut=NULL, maxK=NULL,minSize=0, maxSize=200, fixK=NULL, rawscale=FALSE)

Arguments

`SimiMatIn`	gene-by-gene similarity matrix
`quan`	only gene pairs with similarity score >= quan th quantile will be considered in the cluster analyses. Default is 0.95.
`cut`	pre-defined cutoff. Gene pairs with similarity score >= cut will be considered in cluster analyses. If cut is defined, quan will be ignored.
`maxK`	max number of clusters to consider (scan). if numbC=NULL, it will be calculated as [number of gene considered]/10.
`minSize`, `maxSize`	Only clusters with minSize<= cluster size <= maxSize are reported in output.
`fixK`	if fixK is specified, the k-medoids algorithm will be applied with fixK clusters.
`rawscale`	Recall the input is the similarity matrix (-log10(distance from the sine model)). the k-medoids clustering will be applied using (-Input) as distance. If rawscale is defined as TRUE, the k-medoids clustering will be applied using -10^Input as distance.

Value

scanK() function runs k-medoid clustering with varying number of clusters (k). The k is varied from 2 to maxK. The input of scanK() function should be a similarity matrix. scanK() function will cluster genes in gene pairs with high similarity score (the threshold can be defined using parameter quan). To select the top genes, the function first calculate the max similarity score for each gene, then select the genes with high max score.

The output object is a list with 4 sublists: membOut: members in each cluster. clusters are sorted by median similarity score within cluster;

MedCor: median similarity score for each cluster;

Mat: input similarity matrix;

filteredMat: similarity matrix, only showing the top genes used in clustering;

Kcluster: cluster indicator of each top gene.

Author(s)

Ning Leng

Examples

aa <- sin(seq(0,1,.1))
bb <- sin(seq(0.5,1.5,.1))
cc <- sin(seq(0.9,1.9,.1))
tmp <- matrix(sin(rnorm(330)),ncol=11)
rownames(tmp) <- paste0("tmp",1:30)
Dat <- rbind(aa, bb, cc, tmp)
res1 <- OscopeSine(Dat)
res2 <- scanK(res1$SimiMat, quan=.8, maxK=5)
aa <- sin(seq(0,1,.1))
bb <- sin(seq(0.5,1.5,.1))
cc <- sin(seq(0.9,1.9,.1))
tmp <- matrix(sin(rnorm(330)),ncol=11)
rownames(tmp) <- paste0("tmp",1:30)
Dat <- rbind(aa, bb, cc, tmp)
res1 <- OscopeSine(Dat)
res2 <- scanK(res1$SimiMat, quan=.8, maxK=5)

Apply sine model on one particular gene vs. other genes

Description

Apply sine model on one particular gene vs. other genes

Usage

SineFun(DataInSc,i)
SineFun(DataInSc,i)

Arguments

`DataInSc`	a gene-by-sample (isoform-by-sample) matrix indicating the rescaled expression of two genes/isoforms. all values should be bettwen [-1, 1].
`i`	the gene (isoform) of interest. The function will apply the sine model on gene (isoform) i vs. gene (isoform) j for all j > i. Gene (isoform) i (j) is defined as the gene (isoform )shown in the i (j) th row. i should be smaller than the total number of genes (isoforms).

Value

Output is a list with two sublists, each shows the optimal phi's (shift) and epsilon's (value). N-i entries will be included in each sublist (N is the total number of genes/isoforms). The kth entry indicates results of gene (isoform) i vs. i+k.

Author(s)

Ning Leng

Examples

aa <- sin(seq(0,1,.1))
bb <- sin(seq(0.5,1.5,.1))
cc <- sin(seq(0.9,1.9,.1))
SineFun(rbind(aa,bb,cc), 1)
aa <- sin(seq(0,1,.1))
bb <- sin(seq(0.5,1.5,.1))
cc <- sin(seq(0.9,1.9,.1))
SineFun(rbind(aa,bb,cc), 1)

Function for searching optimal phase shift

Description

Function for searching optimal phase shift

Usage

SineOptim(Pairdata)
SineOptim(Pairdata)

Arguments

Pairdata

a sample-by-2 matrix indicating the rescaled expression of two genes/isoforms. all values should be bettwen [-1, 1].

Value

Output provides the optimal phi (shift) and its corresponding epsilon^2 (value) of the sine model. epsilon_g1,g2^2 = sum_s [X_g1,s^2+X^2_g2,s -2X_g1,sX_g2,s cos(phi_g1,g2)-sin^2(phi_g1,g2)]^2

Author(s)

Ning Leng

Examples

aa <- sin(seq(0,1,.1))
bb <- sin(seq(0.5,1.5,.1))
SineOptim(cbind(aa,bb))
aa <- sin(seq(0,1,.1))
bb <- sin(seq(0.5,1.5,.1))
SineOptim(cbind(aa,bb))

Package 'Oscope'

Help Index

Oscope - A statistical pipeline for identifying oscillatory genes in unsynchronized single cell RNA-seq

Description

Details

Author(s)

References

Calculate absolute correlations among gene pairs

Description

Usage

Arguments

Value

Author(s)

Examples

Calculate estimated mean and variance of RNA-Seq data

Description

Usage

Arguments

Value

Author(s)

Examples

Flag gene clusters with small within-cluster phase differences and/or small within-cluster sine scores

Description

Usage

Arguments

Value

Author(s)

Examples

Format SinFun outputs from lists to matrix

Description

Usage

Arguments

Value

Author(s)

Examples

Search for the optimal sample order by using the Extended Nearest Insertion

Description

Usage

Arguments

Value

Author(s)

Examples

Run Extended Nearest Insertion and 2-opt on a gene cluster identified by OscopeKM function

Description

Usage

Arguments

Value

Author(s)

Examples

Rescale the gene/isoform expression matrix

Description

Usage

Arguments

Value

Author(s)

Examples

Run the 2-opt algorithm to improve the optimal order searching of the Extended Nearest Insertion

Description

Usage

Arguments

Value

Author(s)

Examples

Search for the optimal sample order for different gene clusters

Description

Usage

Arguments

Value

Author(s)

Examples

Simulated gene level data set with 600 genes and 30 cells.

Description

Format

Examples

Oscope K medoid module

Description

Usage

Arguments

Value

Author(s)