| Title: | An R Package for Qualitative Biclustering in Support of Gene Co-Expression Analyses |
|---|---|
| Description: | The core function of this R package is to provide the implementation of the well-cited and well-reviewed QUBIC algorithm, aiming to deliver an effective and efficient biclustering capability. This package also includes the following related functions: (i) a qualitative representation of the input gene expression data, through a well-designed discretization way considering the underlying data property, which can be directly used in other biclustering programs; (ii) visualization of identified biclusters using heatmap in support of overall expression pattern analysis; (iii) bicluster-based co-expression network elucidation and visualization, where different correlation coefficient scores between a pair of genes are provided; and (iv) a generalize output format of biclusters and corresponding network can be freely downloaded so that a user can easily do following comprehensive functional enrichment analysis (e.g. DAVID) and advanced network visualization (e.g. Cytoscape). |
| Authors: | Yu Zhang [aut, cre], Qin Ma [aut] |
| Maintainer: | Yu Zhang <[email protected]> |
| License: | CC BY-NC-ND 4.0 + file LICENSE |
| Version: | 1.41.0 |
| Built: | 2026-05-28 14:43:48 UTC |
| Source: | https://github.com/bioc/QUBIC |
Class BCQU define a QUalitative BIClustering calcuator.
Class generator for objects of class BCQU.
BCQU qudiscretize qunetwork qunet2xml
QUBIC is a biclustering package, with source code upgrading from C code to C++ code.
The updated source code can avoid memory allocation error and is much efficient than the original one.
Based on our preliminary analysis, it can save 40% running time on a plant microarray data.
Whenever using this package, please cite as
Yu Zhang, Juan Xie, Jinyu Yang, Anne Fennell, Chi Zhang, Qin Ma;
QUBIC: a bioconductor package for qualitative biclustering analysis of gene co-expression data.
Bioinformatics, 2017; 33 (3): 450-452.
doi: 10.1093/bioinformatics/btw635
BCQUD performs a QUalitative BIClustering for a discret matrix.
BCQU(x = NULL, r = 1, q = 0.06, c = 0.95, o = 100, f = 1, k = max(ncol(x) %/% 20, 2), type = 'default', P = FALSE, C = FALSE, verbose = TRUE, weight = NULL, seedbicluster = NULL) BCQUD(x = NULL, c = 0.95, o = 100, f = 1, k = max(ncol(x) %/% 20, 2), type = 'default', P = FALSE, C = FALSE, verbose = TRUE, weight = NULL, seedbicluster = NULL) qubiclust_d(x, c = 0.95, o = 100, f = 1, k = max(ncol(x) %/% 20, 2), type = 'default', P = FALSE, C = FALSE, verbose = TRUE, weight = NULL, seedbicluster = NULL) qubiclust(x, r = 1L, q = 0.06, c = 0.95, o = 100, f = 1, k = max(ncol(x) %/% 20, 2), type = 'default', P = FALSE, C = FALSE, verbose = TRUE, weight = NULL, seedbicluster = NULL)BCQU(x = NULL, r = 1, q = 0.06, c = 0.95, o = 100, f = 1, k = max(ncol(x) %/% 20, 2), type = 'default', P = FALSE, C = FALSE, verbose = TRUE, weight = NULL, seedbicluster = NULL) BCQUD(x = NULL, c = 0.95, o = 100, f = 1, k = max(ncol(x) %/% 20, 2), type = 'default', P = FALSE, C = FALSE, verbose = TRUE, weight = NULL, seedbicluster = NULL) qubiclust_d(x, c = 0.95, o = 100, f = 1, k = max(ncol(x) %/% 20, 2), type = 'default', P = FALSE, C = FALSE, verbose = TRUE, weight = NULL, seedbicluster = NULL) qubiclust(x, r = 1L, q = 0.06, c = 0.95, o = 100, f = 1, k = max(ncol(x) %/% 20, 2), type = 'default', P = FALSE, C = FALSE, verbose = TRUE, weight = NULL, seedbicluster = NULL)
x |
the input data matrix, which could be the normalized gene expression matrix or its qualitative representation from Qdiscretization or other discretization ways.
(for example: a qualitative representation of gene expression data) |
r |
Affect the granularity of the biclusters. The range of possible ranks.
A user can start with a small value of |
q |
Affect the granularity of the biclusters. The percentage of the regulating conditions for each gene.
The choice of |
c |
The required consistency level of a bicluster. The default value of |
o |
The number of output biclusters. |
f |
Control parameter, to control the level of overlaps between to-be-identified biclusters.
The filter cut-off for data post-processing. For overlaps among to-be-identified biclusters.
Its default value is set to |
k |
The minimum column width of the block, minimum |
type |
The constrain type. |
P |
The flag to enlarge current bicluster using a p-value contrain,
which is defined based on its significance of expression consistency comparing to some simulated submatrix. Default: |
C |
The flag to set the lower bound of the condition number in a bicluster as 5% of the total condition number in the input data.
Only suggested to use when the input data has a few conditions (e.g. less than |
verbose |
If ' |
weight |
Alternative weight matrix provided by user, will append to default weight. |
seedbicluster |
Seed provided by user, normally should be a result of
|
For a given representing matrix of a microarray data set, we construct a weighted graph G with genes represented as vertices, edges connecting every pair of genes, and the weight of each edge being the similarity level between the two corresponding (entire) rows. Clearly, the higher a weight, the more similar two corresponding rows are. Intuitively, genes in a bicluster should induce a heavier subgraph of G because under a subset of the conditions, these genes have highly similar expression patterns that should make the weight of each involved edge heavier, comparing to the edges in the background. But it should be noted that some heavy subgraph may not necessarily correspond to a bicluster, i.e. genes from a heavy subgraph may not necessarily have similar expression patterns because different edges in a subgraph may have heavier weights under completely different subsets of conditions. It should also be noted that recognizing all heavy subgraphs in a weighted graph itself is computationally intractable because identification of maximum cliques in a graph is a special case of this, and the maximum clique problem is a well known intractable problem (NP-hard). So in our solution, we do not directly solve the problem of finding heavy subgraphs in a graph. Instead, we built our biclustering algorithm based on this graph representation of a microarray gene expression data, and tackle the biclustering problem as follows. We find all feasible biclusters (I,J) in the given data set such that min{|I|, |J|} is as large as possible, where I and J are subsets of genes and conditions, respectively.
Returns a QUBIC biclustering result object (class
QUBICBiclustResult).
BCQU(): Performs a QUalitative BIClustering.
BCQUD(): Performs a QUalitative BIClustering for a discret matrix.
qubiclust_d(): Performs a QUalitative BIClustering for a discret matrix.
qubiclust(): Performs a QUalitative BIClustering.
Yu Zhang, Juan Xie, Jinyu Yang, Anne Fennell, Chi Zhang, Qin Ma; QUBIC: a bioconductor package for qualitative biclustering analysis of gene co-expression data. Bioinformatics, 2017; 33 (3): 450-452.
BCQU-class qudiscretize qunetwork qunet2xml
# Random matrix with one embedded bicluster test <- matrix(rnorm(5000), 100, 50) test[11:20, 11:20] <- rnorm(100, 3, 0.3) res <- qubiclust(test, verbose = FALSE) summary(res) show(res) ## Not run: # Load example microarray matrix data(ecoli, package = "QUBICdata") # Biclustering res <- qubiclust(ecoli, r = 1, q = 0.06, c = 0.95, o = 100, f = 0.25, k = max(ncol(ecoli) %/% 20, 2), verbose = FALSE) # Visualize selected biclusters quheatmap(x = ecoli, bicResult = res, number = 5, showlabel = TRUE) quheatmap(x = ecoli, bicResult = res, number = c(4, 8), showlabel = TRUE) # Build network and export XGMML net <- qunetwork(ecoli, res, number = 5, groups = 5, method = "spearman") outfile <- tempfile(fileext = ".gr") sink(outfile) qunet2xml(net, minimum = 0.6, color = cbind(grDevices::rainbow(length(net[[2]]) - 1), "gray")) sink() unlink(outfile) ## End(Not run) # Biclustering of discretized yeast microarray data mat <- matrix(rnorm(100), 10, 10) disc <- qudiscretize(mat) qubiclust_d(disc)# Random matrix with one embedded bicluster test <- matrix(rnorm(5000), 100, 50) test[11:20, 11:20] <- rnorm(100, 3, 0.3) res <- qubiclust(test, verbose = FALSE) summary(res) show(res) ## Not run: # Load example microarray matrix data(ecoli, package = "QUBICdata") # Biclustering res <- qubiclust(ecoli, r = 1, q = 0.06, c = 0.95, o = 100, f = 0.25, k = max(ncol(ecoli) %/% 20, 2), verbose = FALSE) # Visualize selected biclusters quheatmap(x = ecoli, bicResult = res, number = 5, showlabel = TRUE) quheatmap(x = ecoli, bicResult = res, number = c(4, 8), showlabel = TRUE) # Build network and export XGMML net <- qunetwork(ecoli, res, number = 5, groups = 5, method = "spearman") outfile <- tempfile(fileext = ".gr") sink(outfile) qunet2xml(net, minimum = 0.6, color = cbind(grDevices::rainbow(length(net[[2]]) - 1), "gray")) sink() unlink(outfile) ## End(Not run) # Biclustering of discretized yeast microarray data mat <- matrix(rnorm(100), 10, 10) disc <- qudiscretize(mat) qubiclust_d(disc)
qudiscretize delivers a discrete matrix. It is useful if we just want to get a discretized matrix.
qudiscretize(x, r = 1L, q = 0.06)qudiscretize(x, r = 1L, q = 0.06)
x |
the input data matrix, which could be the normalized gene expression matrix or its qualitative representation from Qdiscretization or other discretization ways.
(for example: a qualitative representation of gene expression data) |
r |
Affect the granularity of the biclusters. The range of possible ranks.
A user can start with a small value of |
q |
Affect the granularity of the biclusters. The percentage of the regulating conditions for each gene.
The choice of |
qudiscretize convert a given gene expression matrix to a discrete matrix.
It's implimented in C++, providing a increase in speed over the C equivalent.
A qualitative discrete matrix
# Qualitative discretization on a synthetic matrix mat <- matrix(rnorm(35), 7, 5) qudiscretize(mat)# Qualitative discretization on a synthetic matrix mat <- matrix(rnorm(35), 7, 5) qudiscretize(mat)
This function can visualize the identifed biclusters using heatmap in support of overall expression pattern analysis,either for a single bicluster or two biclusters.
quheatmap( x, bicResult, number = 1, showlabel = FALSE, col = c("#313695", "#4575B4", "#74ADD1", "#ABD9E9", "#E0F3F8", "#FFFFBF", "#FEE090", "#FDAE61", "#F46D43", "#D73027", "#A50026"), ... )quheatmap( x, bicResult, number = 1, showlabel = FALSE, col = c("#313695", "#4575B4", "#74ADD1", "#ABD9E9", "#E0F3F8", "#FFFFBF", "#FEE090", "#FDAE61", "#F46D43", "#D73027", "#A50026"), ... )
x |
The data matrix |
bicResult |
Result object returned by |
number |
which bicluster to be plotted |
showlabel |
If TRUE, show the xlabel and ylabel |
col |
default: c("#313695", "#4575B4", "#74ADD1", "#ABD9E9", "#E0F3F8", "#FFFFBF", "#FEE090", "#FDAE61", "#F46D43", "#D73027", "#A50026") |
... |
Additional options in |
Invisibly returns NULL after drawing the heatmap.
# Load microarray matrix if (requireNamespace("QUBICdata", quietly = TRUE)) { data(ecoli, package = "QUBICdata") res <- qubiclust(ecoli, r = 1, q = 0.06, c = 0.95, o = 100, f = 0.25, k = max(ncol(ecoli) %/% 20, 2), verbose = FALSE) # Draw heatmap for the 5th identified bicluster par(mar = c(5, 4, 3, 5) + 0.1, mgp = c(0, 1, 0), cex.lab = 1.1, cex.axis = 0.5, cex.main = 1.1) quheatmap(x = ecoli, res, number = 5, showlabel = TRUE) # Draw heatmap for the 4th and 8th identified biclusters. par(mar = c(5, 5, 5, 5), cex.lab = 1.1, cex.axis = 0.5, cex.main = 1.1) quheatmap(x = ecoli, res, number = c(4, 8), showlabel = TRUE) }# Load microarray matrix if (requireNamespace("QUBICdata", quietly = TRUE)) { data(ecoli, package = "QUBICdata") res <- qubiclust(ecoli, r = 1, q = 0.06, c = 0.95, o = 100, f = 0.25, k = max(ncol(ecoli) %/% 20, 2), verbose = FALSE) # Draw heatmap for the 5th identified bicluster par(mar = c(5, 4, 3, 5) + 0.1, mgp = c(0, 1, 0), cex.lab = 1.1, cex.axis = 0.5, cex.main = 1.1) quheatmap(x = ecoli, res, number = 5, showlabel = TRUE) # Draw heatmap for the 4th and 8th identified biclusters. par(mar = c(5, 5, 5, 5), cex.lab = 1.1, cex.axis = 0.5, cex.main = 1.1) quheatmap(x = ecoli, res, number = c(4, 8), showlabel = TRUE) }
This function can convert the constructed co-expression networks into XGMML format, which can be used to do further network analysis in Cytoscape, Biomax and JNets.
qunet2xml( net, minimum = 0.6, color = cbind(grDevices::rainbow(length(net[[2]]) - 1), "gray") )qunet2xml( net, minimum = 0.6, color = cbind(grDevices::rainbow(length(net[[2]]) - 1), "gray") )
net |
Result of |
minimum |
cutoff, default: 0.6 |
color |
default: cbind(grDevices::rainbow(length(net[[2]]) - 1), 'gray') |
Text of XGMML
# Load microarray matrix if (requireNamespace("QUBICdata", quietly = TRUE)) { data(ecoli, package = "QUBICdata") res <- qubiclust(ecoli, r = 1, q = 0.06, c = 0.95, o = 100, f = 0.25, k = max(ncol(ecoli) %/% 20, 2), verbose = FALSE) # Get all biclusters net <- qunetwork(ecoli, res, number = 5, groups = 5, method = "spearman") # Save the network to a XGMML file outfile <- tempfile(fileext = ".gr") sink(outfile) qunet2xml(net, minimum = 0.6, color = cbind(grDevices::rainbow(length(net[[2]]) - 1), "gray")) sink() unlink(outfile) }# Load microarray matrix if (requireNamespace("QUBICdata", quietly = TRUE)) { data(ecoli, package = "QUBICdata") res <- qubiclust(ecoli, r = 1, q = 0.06, c = 0.95, o = 100, f = 0.25, k = max(ncol(ecoli) %/% 20, 2), verbose = FALSE) # Get all biclusters net <- qunetwork(ecoli, res, number = 5, groups = 5, method = "spearman") # Save the network to a XGMML file outfile <- tempfile(fileext = ".gr") sink(outfile) qunet2xml(net, minimum = 0.6, color = cbind(grDevices::rainbow(length(net[[2]]) - 1), "gray")) sink() unlink(outfile) }
This function can automatically create co-expression networks along with their visualization based on identified biclusters in QUBIC. Three correlation methods, Pearson, Kendall and Spearman, are available for a user, facilitating different preferences in practical usage.
qunetwork( x, BicRes, number = NULL, groups = NULL, method = c("pearson", "kendall", "spearman") )qunetwork( x, BicRes, number = NULL, groups = NULL, method = c("pearson", "kendall", "spearman") )
x |
The data matrix |
BicRes |
Result object returned by |
number |
Which bicluster to be plotted |
groups |
An object that indicates which nodes belong together. |
method |
A character string indicating which correlation coefficient (or covariance) is to be computed. One of 'pearson' (default), 'kendall', or 'spearman', can be abbreviated. |
a list contains a weights matrix and groupinfo
# Load microarray matrix if (requireNamespace("QUBICdata", quietly = TRUE)) { data(ecoli, package = "QUBICdata") res <- qubiclust(ecoli, r = 1, q = 0.06, c = 0.95, o = 100, f = 0.25, k = max(ncol(ecoli) %/% 20, 2), verbose = FALSE) # Constructing the network for the 5th identified bicluster. net <- qunetwork(ecoli, res, number = 5, groups = 5, method = "spearman") } ## Not run: if (requireNamespace('qgraph')) qgraph::qgraph(net[[1]], groups = net[[2]], layout = 'spring', minimum = 0.6, color = cbind(rainbow(length(net[[2]]) - 1),'gray'), edge.labels = FALSE) ## End(Not run) ## Not run: # Load microarray matrix data(ecoli, package = "QUBICdata") res <- qubiclust(ecoli, r = 1, q = 0.06, c = 0.95, o = 100, f = 0.25, k = max(ncol(ecoli) %/% 20, 2), verbose = FALSE) # Constructing the networks for the 4th and 8th identified biclusters, # using the whole network as a background. net <- qunetwork(ecoli, res, number = c(4, 8), groups = c(4, 8), method = "spearman") if (requireNamespace('qgraph')) qgraph::qgraph(net[[1]], groups = net[[2]], layout = 'spring', minimum = 0.6, color = c('red', 'blue', 'gold', 'gray'), edge.labels = FALSE) ## End(Not run)# Load microarray matrix if (requireNamespace("QUBICdata", quietly = TRUE)) { data(ecoli, package = "QUBICdata") res <- qubiclust(ecoli, r = 1, q = 0.06, c = 0.95, o = 100, f = 0.25, k = max(ncol(ecoli) %/% 20, 2), verbose = FALSE) # Constructing the network for the 5th identified bicluster. net <- qunetwork(ecoli, res, number = 5, groups = 5, method = "spearman") } ## Not run: if (requireNamespace('qgraph')) qgraph::qgraph(net[[1]], groups = net[[2]], layout = 'spring', minimum = 0.6, color = cbind(rainbow(length(net[[2]]) - 1),'gray'), edge.labels = FALSE) ## End(Not run) ## Not run: # Load microarray matrix data(ecoli, package = "QUBICdata") res <- qubiclust(ecoli, r = 1, q = 0.06, c = 0.95, o = 100, f = 0.25, k = max(ncol(ecoli) %/% 20, 2), verbose = FALSE) # Constructing the networks for the 4th and 8th identified biclusters, # using the whole network as a background. net <- qunetwork(ecoli, res, number = c(4, 8), groups = c(4, 8), method = "spearman") if (requireNamespace('qgraph')) qgraph::qgraph(net[[1]], groups = net[[2]], layout = 'spring', minimum = 0.6, color = c('red', 'blue', 'gold', 'gray'), edge.labels = FALSE) ## End(Not run)
This function can make a report for biclusters.
showinfo(matrix, bic)showinfo(matrix, bic)
matrix |
microarray matrix |
bic |
array of biclusters |
Text of report
# Load microarray matrix if (requireNamespace("QUBICdata", quietly = TRUE)) { data(ecoli, package = "QUBICdata") matrix <- ecoli[1:50, , drop = FALSE] res1 <- qubiclust(matrix, verbose = FALSE) res2 <- qubiclust(matrix, c = 0.9, verbose = FALSE) res3 <- qubiclust(matrix, c = 0.85, verbose = FALSE) # Show the report showinfo(matrix, list(res1, res2, res3)) }# Load microarray matrix if (requireNamespace("QUBICdata", quietly = TRUE)) { data(ecoli, package = "QUBICdata") matrix <- ecoli[1:50, , drop = FALSE] res1 <- qubiclust(matrix, verbose = FALSE) res2 <- qubiclust(matrix, c = 0.9, verbose = FALSE) res3 <- qubiclust(matrix, c = 0.85, verbose = FALSE) # Show the report showinfo(matrix, list(res1, res2, res3)) }