Title: | runibic: row-based biclustering algorithm for analysis of gene expression data in R |
---|---|
Description: | This package implements the UniBic algorithm in R. This biclustering algorithm for the analysis of gene expression data was introduced by Zhenjia Wang et al. in 2016. It is currently considered one of the most promising biclustering methods for identifying meaningful structures in complex and noisy data. |
Authors: | Patryk Orzechowski, Artur Pańszczyk |
Maintainer: | Patryk Orzechowski <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.29.0 |
Built: | 2024-12-27 06:00:57 UTC |
Source: | https://github.com/bioc/runibic |
This function retrieves the Longest Common Subsequence (LCS) between two integer vectors by backtracking the matrix obtained with dynamic programming.
backtrackLCS(x, y)
x | an integer vector |
y | an integer vector |
an integer vector containing the Longest Common Subsequence (LCS) of vectors x and y (i.e. the values that appear in both x and y in the same relative order, not necessarily contiguously)
runibic
pairwiseLCS
calculateLCS
A <- c(1, 2, 3, 4, 5)
B <- c(1, 2, 4)
backtrackLCS(A, B)
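The backtracking step can be sketched in base R as follows. This is a minimal illustration assuming the textbook LCS recurrence; lcs_backtrack_sketch is a hypothetical name, not part of runibic, and the package's C++ code may choose differently between equally long subsequences.

```r
# Hypothetical helper (not the package's C++ code): LCS via dynamic
# programming, then backtracking from the bottom-right cell of the table
lcs_backtrack_sketch <- function(x, y) {
  n <- length(x)
  m <- length(y)
  # L[i + 1, j + 1] holds the LCS length of x[1:i] and y[1:j]
  L <- matrix(0L, nrow = n + 1, ncol = m + 1)
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      if (x[i] == y[j]) {
        L[i + 1, j + 1] <- L[i, j] + 1L
      } else {
        L[i + 1, j + 1] <- max(L[i, j + 1], L[i + 1, j])
      }
    }
  }
  # Walk back: a match is part of the LCS; otherwise move towards the
  # neighbour that keeps the larger LCS length
  out <- integer(0)
  i <- n
  j <- m
  while (i > 0 && j > 0) {
    if (x[i] == y[j]) {
      out <- c(x[i], out)
      i <- i - 1
      j <- j - 1
    } else if (L[i, j + 1] >= L[i + 1, j]) {
      i <- i - 1
    } else {
      j <- j - 1
    }
  }
  out
}

lcs_backtrack_sketch(c(1, 2, 3, 4, 5), c(1, 2, 4))  # 1 2 4
```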
BCUnibic-class: an S4 class representing the UniBic biclustering algorithm for numeric input. The class is intended to be used with biclust.
BCUnibicD-class: an S4 class defining the UniBic biclustering algorithm for discrete input.
This function computes the Longest Common Subsequence (LCS) for each unique pair of rows of the input matrix. It returns a list sorted by LCS length, containing the length of each LCS and the indices of the first and second rows of the corresponding pair (elements lcslen, a and b, as used in the cluster example). Two sorting methods are available: the default uses a Fibonacci heap, as in the original implementation of UniBic; the alternative uses the standard sorting algorithm from the C++ STL.
calculateLCS(discreteInput, useFibHeap = TRUE)
discreteInput | an input discrete matrix |
useFibHeap | logical value choosing the sorting method for the output: TRUE (default) uses a Fibonacci heap, FALSE uses the C++ STL sort |
a list describing all pairs of rows, sorted by the length of their LCS
runibic
backtrackLCS
pairwiseLCS
A <- matrix(c(4, 3, 1, 2, 5, 8, 6, 7), nrow = 2, byrow = TRUE)
calculateLCS(A, TRUE)
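The idea behind calculateLCS can be sketched in base R. This is a simplified illustration, not the package's implementation: the helper names are hypothetical, pairs are ranked with base R's order() instead of a Fibonacci heap, and sorting by decreasing LCS length is an assumption. The element names lcslen, a and b mirror those used in the cluster example.

```r
# Hypothetical helper: LCS length of two vectors via dynamic programming
lcs_length <- function(x, y) {
  L <- matrix(0L, nrow = length(x) + 1, ncol = length(y) + 1)
  for (i in seq_along(x)) {
    for (j in seq_along(y)) {
      L[i + 1, j + 1] <- if (x[i] == y[j]) {
        L[i, j] + 1L
      } else {
        max(L[i, j + 1], L[i + 1, j])
      }
    }
  }
  L[length(x) + 1, length(y) + 1]
}

# Hypothetical sketch: score every unique pair of rows (a < b), then sort the
# pairs by decreasing LCS length with a plain order() call
pairwise_lcs_sketch <- function(m) {
  pairs <- utils::combn(nrow(m), 2)  # one column per pair of row indices
  lcslen <- apply(pairs, 2, function(p) lcs_length(m[p[1], ], m[p[2], ]))
  ord <- order(lcslen, decreasing = TRUE)
  list(lcslen = lcslen[ord], a = pairs[1, ord], b = pairs[2, ord])
}

# Illustrative rows, each a permutation of column indices
m <- matrix(c(3, 4, 2, 1,
              1, 3, 4, 2,
              1, 2, 3, 4), nrow = 3, byrow = TRUE)
res <- pairwise_lcs_sketch(m)
res$lcslen  # 3 3 2; the last pair is rows 1 and 3
```

A heap pays off when only the best-scoring pairs are consumed one at a time; a full sort, as here, is simpler when every pair is needed.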
This function searches for biclusters in the input matrix. The calculations are based on an integer matrix whose entries give the positions of the j-th smallest element in each row, and on the results of the Longest Common Subsequence (LCS) calculations between all rows of the input matrix. The parameters of this function can be obtained from other functions provided by this package (unisort and calculateLCS in the example).
cluster(discreteInput, discreteInputValues, scores, geneOne, geneTwo, rowNumber, colNumber)
discreteInput | an integer matrix with indices of sorted columns |
discreteInputValues | an integer matrix with discrete values |
scores | a numeric vector with LCS lengths |
geneOne | a numeric vector with the indices of the first rows in the pairwise LCS calculation |
geneTwo | a numeric vector with the indices of the second rows in the pairwise LCS calculation |
rowNumber | an integer giving the number of rows in the input matrix |
colNumber | an integer giving the number of columns in the input matrix |
a list with information about the biclusters found
A <- matrix(c(4, 3, 1, 2, 5, 8, 6, 7, 9, 10, 11, 12), nrow = 4, byrow = TRUE)
iA <- unisort(A)
lcsResults <- calculateLCS(A)
cluster(iA, A, lcsResults$lcslen, lcsResults$a, lcsResults$b, nrow(A), ncol(A))
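The strict, order-preserving part of this search can be sketched in base R: starting from a seed pattern of column indices (for example, an LCS of two rows of the index matrix), keep every row whose column ordering contains the whole pattern as a subsequence. This is a hypothetical simplification, not the cluster function itself; the expansion to approximate trends controlled by the consistency level t is not shown, and the helper names are illustrative.

```r
# Hypothetical helper: TRUE if `pattern` appears in `v` in the same order
# (not necessarily contiguously); a greedy left-to-right scan suffices
is_subsequence <- function(pattern, v) {
  k <- 1L
  for (e in v) {
    if (k <= length(pattern) && e == pattern[k]) k <- k + 1L
  }
  k > length(pattern)
}

# Hypothetical sketch: rows of the index matrix `ix` that follow the seed
# pattern exactly, i.e. rows whose values rise across the pattern's columns
strict_rows_sketch <- function(ix, pattern) {
  rows <- which(apply(ix, 1, function(v) is_subsequence(pattern, v)))
  list(rows = rows, cols = pattern)
}

A <- matrix(c(0.1, 2.0, 1.5, 0.3,
              5.0, 9.0, 7.0, 4.0,
              1.0, 0.5, 0.9, 2.0,
              2.0, 8.0, 3.0, 1.0), nrow = 4, byrow = TRUE)
ix <- t(apply(A, 1, order))  # columns listed from smallest to largest value
# c(1, 3, 2) is a longest common subsequence of rows 1 and 2 of ix
strict_rows_sketch(ix, pattern = c(1, 3, 2))$rows  # 1 2 4
```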
This function calculates the dynamic-programming matrix for the Longest Common Subsequence (LCS) of two integer vectors. The length of the LCS is stored in the last row and last column of this matrix.
pairwiseLCS(x, y)
x | an integer vector |
y | an integer vector |
a matrix, computed using dynamic programming, for the Longest Common Subsequence (LCS) of vectors x and y; its last entry holds the length of the LCS
runibic
calculateLCS
backtrackLCS
A <- c(1, 2, 3, 4, 5)
B <- c(1, 2, 4)
pairwiseLCS(A, B)
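Assuming pairwiseLCS follows the textbook LCS recurrence (this manual does not spell out its indexing, for example whether an extra zero row and column are stored), the matrix entries satisfy the following, where $L_{i,j}$ is the LCS length of the first $i$ elements of x and the first $j$ elements of y:

$$
L_{i,j} =
\begin{cases}
0 & \text{if } i = 0 \text{ or } j = 0,\\
L_{i-1,j-1} + 1 & \text{if } x_i = y_j,\\
\max\left(L_{i-1,j},\, L_{i,j-1}\right) & \text{otherwise.}
\end{cases}
$$

For x = (1, 2, 3, 4, 5) and y = (1, 2, 4), as in the example, the bottom-right entry is $L_{5,3} = 3$, the length of the common subsequence (1, 2, 4).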
runibic is a package that contains a much faster, parallel version of UniBic, one of the most accurate biclustering algorithms.
The original method was reimplemented from C in C++11, and OpenMP was added for parallelization.
If you use this package, please cite it as: Patryk Orzechowski, Artur Pańszczyk, Xiuzhen Huang, Jason H Moore; "runibic: a Bioconductor package for parallel row-based biclustering of gene expression data"; Bioinformatics, 2018, bty512, doi: https://doi.org/10.1093/bioinformatics/bty512
Each of the following functions, BCUnibic, BCUnibicD and runibic, performs biclustering using the UniBic biclustering algorithm. The main difference between the functions is that BCUnibicD requires a discretized matrix, whilst BCUnibic (or runibic) can be applied to a numeric one.
BCUnibic(x = NULL, t = 0.95, q = 0, f = 1, nbic = 100, div = 0, useLegacy = FALSE)
BCUnibicD(x = NULL, t = 0.95, q = 0, f = 1, nbic = 100, div = 0, useLegacy = FALSE)
runibic(x = NULL, t = 0.95, q = 0, f = 1, nbic = 100, div = 0, useLegacy = FALSE)
x | numeric or integer matrix (depending on the function) |
t | consistency level of the block, in the range (0.5, 1.0] |
q | a double value for quantile discretization |
f | filtering of overlapping blocks (default 1: do not remove any blocks) |
nbic | maximum number of biclusters in the output |
div | number of ranks for up(down)-regulated genes (default 0, meaning ncol(x)) |
useLegacy | logical value indicating whether to use legacy parameter settings |
For a given input matrix, we first perform discretization and create an index matrix using the runiDiscretize function.
The discretization takes the quantiles of the data into account.
The resulting index matrix makes it possible to detect order-preserving trends between each pair of rows, irrespective of the order of the columns.
After ranking, the matrix is split by rows into subgroups based on the significance of the future biclusters.
Within each chunk, the Longest Common Subsequence (LCS) is calculated for all pairs of rows.
LCS calculations are performed using dynamic programming and determine the longest order-preserving trend between each pair of rows.
After partitioning the matrix, strict order-preserving biclusters are determined and later expanded to approximate-trend biclusters within the cluster function.
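The notion of an order-preserving trend can be made concrete with a small base-R check. This is an illustrative sketch with a hypothetical helper, not part of runibic: for rows without tied values, the rows rise strictly along a given column order exactly when that order is a common subsequence of the rows' column orderings (as returned by R's order()).

```r
# Hypothetical helper (not part of runibic): TRUE when every selected row
# increases strictly as its columns are visited in the order given by `cols`
is_order_preserving <- function(x, rows, cols) {
  all(apply(x[rows, cols, drop = FALSE], 1,
            function(r) !is.unsorted(r, strictly = TRUE)))
}

x <- matrix(c(0.1, 2.0, 1.5, 0.3,
              5.0, 9.0, 7.0, 4.0,
              1.0, 0.5, 0.9, 2.0), nrow = 3, byrow = TRUE)

order(x[1, ])  # 1 4 3 2
order(x[2, ])  # 4 1 3 2
# c(1, 3, 2) is a common subsequence of both orderings, so rows 1 and 2
# rise together along columns 1, 3, 2 ...
is_order_preserving(x, rows = c(1, 2), cols = c(1, 3, 2))  # TRUE
# ... but row 3 does not (1.0 > 0.9)
is_order_preserving(x, rows = 1:3, cols = c(1, 3, 2))      # FALSE
```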
This package provides three main functions: runibic and BCUnibic perform the UniBic biclustering algorithm on numeric data, whilst BCUnibicD can be applied to integer data. The latter two methods are compatible with the Biclust class.
A Biclust object with the detected biclusters
BCUnibic: performs biclustering using UniBic on a numeric matrix. It is intended to be used as a method called from biclust.
BCUnibicD: performs biclustering using UniBic on an integer matrix. It is intended to be used as a method called from biclust.
runibic: performs biclustering using UniBic on a numeric matrix.
Patryk Orzechowski [email protected], Artur Pańszczyk [email protected]
Wang, Zhenjia, et al. "UniBic: Sequential row-based biclustering algorithm for analysis of gene expression data." Scientific Reports 6 (2016): 23466.
Patryk Orzechowski, Artur Pańszczyk, Xiuzhen Huang, Jason H. Moore: "runibic: a Bioconductor package for parallel row-based biclustering of gene expression data", bioRxiv (2017): 210682, doi: https://doi.org/10.1101/210682
runiDiscretize
set_runibic_params
BCUnibic-class
BCUnibicD-class
unisort
A <- matrix(replicate(100, rnorm(100)), nrow = 100, byrow = TRUE)
runibic(A)
BCUnibic(A)
BCUnibic(A, t = 0.95, q = 0, f = 1, nbic = 100, div = 0)
B <- runiDiscretize(A)
runibic(B)
BCUnibicD(B, t = 0.95, q = 0, f = 1, nbic = 100, div = 0)
biclust::biclust(A, method = BCUnibic(), t = 0.95, q = 0, f = 1, nbic = 100, div = 0)
biclust::biclust(B, method = BCUnibicD(), t = 0.95, q = 0, f = 1, nbic = 100, div = 0)
This function discretizes the input matrix.
runiDiscretize uses the parameters 'div' and 'q', which are set by the set_runibic_params function.
The function returns a discrete matrix with the number of ranks given by the parameter div.
In contrast to biclust::discretize, the function takes the quantile parameter 'q' into consideration.
When 'q' is greater than or equal to 0.5, a simple discretization is used, with levels of equal size determined by the quantiles.
If 'q' is lower than 0.5, an up(down)-regulated discretization divided into three parts is used.
runiDiscretize(x)
x | a numeric matrix |
a discretized matrix containing integers only
set_runibic_params
calculateLCS
discretize
A <- replicate(10, rnorm(20))
runiDiscretize(A)
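The two discretization modes described above can be illustrated with a simplified base-R sketch. It is not runiDiscretize: processing row by row, R's default quantile type, and the use of a single up and a single down level (ignoring div) in the q < 0.5 branch are all assumptions made for illustration, and the function names are hypothetical.

```r
# Hypothetical sketch of quantile-based discretization of one numeric row
discretize_row_sketch <- function(v, q, div = length(v)) {
  if (q >= 0.5) {
    # Simple mode: `div` levels of (roughly) equal size, cut at quantiles of v
    breaks <- quantile(v, probs = seq(0, 1, length.out = div + 1))
    as.integer(findInterval(v, breaks, rightmost.closed = TRUE, all.inside = TRUE))
  } else {
    # Up/down mode: three parts; lower tail -> -1, upper tail -> 1, rest -> 0
    lo <- quantile(v, probs = q)
    hi <- quantile(v, probs = 1 - q)
    out <- integer(length(v))
    out[v <= lo] <- -1L
    out[v >= hi] <- 1L
    out
  }
}

# Hypothetical wrapper applying the row sketch to every row of a matrix
discretize_matrix_sketch <- function(x, q, div = ncol(x)) {
  t(apply(x, 1, discretize_row_sketch, q = q, div = div))
}

v <- c(5, 1, 3, 2, 4)
discretize_row_sketch(v, q = 0.5)  # 5 1 3 2 4 (one level per value)
discretize_row_sketch(v, q = 0.2)  # 1 -1 0 0 0
```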
This function sets the parameters used by runibic.
set_runibic_params(t = 0.85, q = 0, f = 1, nbic = 100L, div = 0L, useLegacy = FALSE)
t | consistency level of the block, in the range (0.5, 1.0] |
q | a double value for quantile discretization |
f | filtering of overlapping blocks (default 1: do not remove any blocks) |
nbic | maximum number of biclusters in the output |
div | number of ranks for up(down)-regulated values (default 0, meaning ncol(x)) |
useLegacy | logical value indicating whether to use legacy parameter management |
NULL (an empty value)
set_runibic_params(0.85, 0, 1, 100, 0, FALSE)
This function sorts each row of an integer matrix separately and returns a matrix in which the value in the i-th row and j-th column represents the index of the j-th smallest value of the i-th row.
unisort(x)
x | an integer matrix |
an integer matrix with indices giving the position of the j-th smallest element in each row
runibic
calculateLCS
runiDiscretize
A <- matrix(c(4, 3, 1, 2, 5, 8, 6, 7), nrow = 2, byrow = TRUE)
unisort(A)
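The row-wise index matrix can be sketched with base R's order(). This is a hypothetical stand-in, not the package's C++ code: it returns 1-based R indices and keeps tied values in their left-to-right column order (order() is stable); the index base and tie handling of unisort itself are not stated in this manual.

```r
# Hypothetical sketch: for each row, list the column indices of its values
# from smallest to largest
unisort_sketch <- function(x) {
  t(apply(x, 1, order))
}

A <- matrix(c(4, 3, 1, 2, 5, 8, 6, 7), nrow = 2, byrow = TRUE)
unisort_sketch(A)
#      [,1] [,2] [,3] [,4]
# [1,]    3    4    2    1
# [2,]    1    3    4    2
```

Reading the first row: the smallest value (1) sits in column 3, the next (2) in column 4, then 3 in column 2 and 4 in column 1.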