Package 'MCbiclust' reference manual

Title:	Massive correlating biclusters for gene expression data and associated methods
Description:	Custom made algorithm and associated methods for finding, visualising and analysing biclusters in large gene expression data sets. Algorithm is based on with a supplied gene set of size n, finding the maximum strength correlation matrix containing m samples from the data set.
Authors:	Robert Bentham
Maintainer:	Robert Bentham <[email protected]>
License:	GPL-2
Version:	1.31.0
Built:	2025-03-29 06:36:15 UTC
Source:	https://github.com/bioc/MCbiclust

Clinical information for CCLE data

Description

A dataset containing clinical information for the CCLE samples.

Usage

CCLE_samples
CCLE_samples

Format

A data frame with 967 rows and 14 variables:

CCLE.name: Sample name identifier.
Cell.line.primary.name: Cell line name.
Cell.line.aliases: Any known aliases of cell line.
Gender: Gender of patient cell line derived from.
Site.Primary: Primary site cell line derived from.
Histology: Histology of tumour cell line derived from.
Hist.Subtype1: Histology subtype of tumour cell line derived from.
Notes: Additional notes.
Source: Source of the cell line.
Expression.arrays: Expression array used.
SNP.arrays: SNP array used.
Oncomap: Oncomap mutation array used.
Hybrid.Capture.Sequencing: Hybrid capture sequencing used.
Name: Sample name identifier

Value

Source

http://www.broadinstitute.org/ccle/data/browseData Filename: CCLE_sample_info_file_2012-04-06.txt

Subset of expression levels of CCLE data

Description

A dataset containing the gene-centric RMA-normalized mRNA expression data for nearly 1000 genes and 500 samples taken as a random subset of the complete CCLE data. 1000 genes were selected randomly such that 500 were mitochondrial and 500 non-mitochondrial.

Usage

CCLE_small
CCLE_small

Format

A data frame with 1000 rows and 500 variables:

MKN74_STOMACH: mRNA expression on sample MKN74_STOMACH
OC316_OVARY: mRNA expressionr on sample OC316_OVARY
...

@source http://www.broadinstitute.org/ccle/data/browseData Filename: CCLE_Expression_Entrez_2012-04-06.gct.gz

Value

Calculate correlation score

Description

The standard method to calculate the correlation score used to judge biclusters in MCbiclust

Usage

CorScoreCalc(gene.expr.matrix, sample.vec)
CorScoreCalc(gene.expr.matrix, sample.vec)

Arguments

`gene.expr.matrix`	Gene expression matrix with genes as rows and samples as columns
`sample.vec`	Vector of samples

Value

The correlation score

Examples

data(CCLE_small)
data(Mitochondrial_genes)

mito.loc <- which(row.names(CCLE_small) %in% Mitochondrial_genes)
CCLE.mito <- CCLE_small[mito.loc,]

random.seed <- sample(seq(length = dim(CCLE.mito)[2]),10)
CCLE.seed <- FindSeed(gem = CCLE.mito,
                      seed.size = 10,
                      iterations = 100,
                      messages = 100)


CorScoreCalc(CCLE.mito, random.seed)
CorScoreCalc(CCLE.mito, CCLE.seed)

CCLE.hicor.genes <- as.numeric(HclustGenesHiCor(CCLE.mito,
                                                CCLE.seed,
                                                cuts = 8))

CorScoreCalc(CCLE.mito[CCLE.hicor.genes,], CCLE.seed)
data(CCLE_small)
data(Mitochondrial_genes)

mito.loc <- which(row.names(CCLE_small) %in% Mitochondrial_genes)
CCLE.mito <- CCLE_small[mito.loc,]

random.seed <- sample(seq(length = dim(CCLE.mito)[2]),10)
CCLE.seed <- FindSeed(gem = CCLE.mito,
                      seed.size = 10,
                      iterations = 100,
                      messages = 100)


CorScoreCalc(CCLE.mito, random.seed)
CorScoreCalc(CCLE.mito, CCLE.seed)

CCLE.hicor.genes <- as.numeric(HclustGenesHiCor(CCLE.mito,
                                                CCLE.seed,
                                                cuts = 8))

CorScoreCalc(CCLE.mito[CCLE.hicor.genes,], CCLE.seed)

Method for the calculation of a correlation vector

Description

Upon identifying a bicluster seed with FindSeed, one of the next steps is to identify which genes not in your chosen gene set are also highly correlated to the bicluster found. This is done by CVEval, and the output is known as the correlation vector.

Usage

CVEval(gem.part, gem.all, seed, splits)
CVEval(gem.part, gem.all, seed, splits)

Arguments

`gem.part`	Part of gene expression matrix only containing gene set of interest with genes as rows and samples as columns
`gem.all`	All of gene expression matrix
`seed`	Seed of highly correlating samples
`splits`	Number of cuts from hierarchical clustering

Details

CVeval uses hierarchical clustering to select the genes most representative of the bicluster and then uses the average expression of these genes across the sample seed and calculates the correlation of every gene measured across the sample seed to this average expression value.

The correlation vector is the output of this calculation.

Value

Correlation vector

Examples

data(CCLE_small)
data(Mitochondrial_genes)

mito.loc <- (row.names(CCLE_small) %in% Mitochondrial_genes)
CCLE.mito <- CCLE_small[mito.loc,]

set.seed(102)
CCLE.seed <- FindSeed(gem = CCLE.mito,
                      seed.size = 10,
                      iterations = 100,
                      messages = 1000)

CCLE.sort <- SampleSort(gem = CCLE.mito,seed = CCLE.seed,sort.length = 11)

# Full ordering are in Vignette_sort in sysdata.rda
CCLE.samp.sort <- MCbiclust:::Vignette_sort[[1]]

CCLE.pc1 <- PC1VecFun(top.gem = CCLE.mito,
                      seed.sort = CCLE.samp.sort,
                      n = 10)

CCLE.cor.vec <-  CVEval(gem.part = CCLE.mito,
                            gem.all = CCLE_small,
                            seed = CCLE.seed,
                            splits = 10)

CCLE.bic <- ThresholdBic(cor.vec = CCLE.cor.vec,sort.order = CCLE.samp.sort,
                         pc1 = as.numeric(CCLE.pc1))

CCLE.pc1 <- PC1Align(gem = CCLE_small, pc1 = CCLE.pc1,
                     cor.vec = CCLE.cor.vec ,
                     sort.order = CCLE.samp.sort,
                     bic =CCLE.bic)

CCLE.fork <- ForkClassifier(CCLE.pc1, samp.num = length(CCLE.bic[[2]]))

data(CCLE_small)
data(Mitochondrial_genes)

mito.loc <- (row.names(CCLE_small) %in% Mitochondrial_genes)
CCLE.mito <- CCLE_small[mito.loc,]

set.seed(102)
CCLE.seed <- FindSeed(gem = CCLE.mito,
                      seed.size = 10,
                      iterations = 100,
                      messages = 1000)

CCLE.sort <- SampleSort(gem = CCLE.mito,seed = CCLE.seed,sort.length = 11)

# Full ordering are in Vignette_sort in sysdata.rda
CCLE.samp.sort <- MCbiclust:::Vignette_sort[[1]]

CCLE.pc1 <- PC1VecFun(top.gem = CCLE.mito,
                      seed.sort = CCLE.samp.sort,
                      n = 10)

CCLE.cor.vec <-  CVEval(gem.part = CCLE.mito,
                            gem.all = CCLE_small,
                            seed = CCLE.seed,
                            splits = 10)

CCLE.bic <- ThresholdBic(cor.vec = CCLE.cor.vec,sort.order = CCLE.samp.sort,
                         pc1 = as.numeric(CCLE.pc1))

CCLE.pc1 <- PC1Align(gem = CCLE_small, pc1 = CCLE.pc1,
                     cor.vec = CCLE.cor.vec ,
                     sort.order = CCLE.samp.sort,
                     bic =CCLE.bic)

CCLE.fork <- ForkClassifier(CCLE.pc1, samp.num = length(CCLE.bic[[2]]))

Make correlation vector plot

Description

A function to visualise the differences between different found biclusters. Output is a matrix of plots. Each correlation vector is plotted against each other across the entire measured gene set in the lower diagonal plots, and a chosen gene set (e.g. mitochondrial) in the upper diagonal plots. The diagnal plots themselves show the density plots of the entire measured and chosen gene set. There are addition options to set the transparancy of the data points and names of the correlation vectors.

Usage

CVPlot(cv.df, geneset.loc, geneset.name, alpha1 = 0.005, alpha2 = 0.1,
  cnames = NULL)
CVPlot(cv.df, geneset.loc, geneset.name, alpha1 = 0.005, alpha2 = 0.1,
  cnames = NULL)

Arguments

`cv.df`	A dataframe containing the correlation vectors of one or more patterns.
`geneset.loc`	A gene set of interest (e.g. mitochondrial) to be plotted separately from rest of genes.
`geneset.name`	Name of geneset (e.g. mitochondrial genes)
`alpha1`	Transparency level of non-gene set genes
`alpha2`	Transparency level of gene set genes
`cnames`	Character vector containing names for the correlation vector

Value

A plot of the correlation vectors

Examples

data(CCLE_small)
data(Mitochondrial_genes)

mito.loc <- which(row.names(CCLE_small) %in% Mitochondrial_genes)
CCLE.mito <- CCLE_small[mito.loc,]

CCLE.seed <- list()
CCLE.cor.vec <- list()

for(i in 1:3){
    set.seed(i)
    CCLE.seed[[i]] <- FindSeed(gem = CCLE.mito,
                               seed.size = 10,
                               iterations = 100,
                               messages = 100)}

for(i in 1:3){
    CCLE.cor.vec[[i]] <-  CVEval(gem.part = CCLE.mito,
                                 gem.all = CCLE_small,
                                 seed = CCLE.seed[[i]],
                                splits = 10)}



CCLE.cor.df <- (as.data.frame(CCLE.cor.vec))

CVPlot(cv.df = CCLE.cor.df, geneset.loc = mito.loc,
       geneset.name = "Mitochondrial",alpha1 = 0.5)
data(CCLE_small)
data(Mitochondrial_genes)

mito.loc <- which(row.names(CCLE_small) %in% Mitochondrial_genes)
CCLE.mito <- CCLE_small[mito.loc,]

CCLE.seed <- list()
CCLE.cor.vec <- list()

for(i in 1:3){
    set.seed(i)
    CCLE.seed[[i]] <- FindSeed(gem = CCLE.mito,
                               seed.size = 10,
                               iterations = 100,
                               messages = 100)}

for(i in 1:3){
    CCLE.cor.vec[[i]] <-  CVEval(gem.part = CCLE.mito,
                                 gem.all = CCLE_small,
                                 seed = CCLE.seed[[i]],
                                splits = 10)}



CCLE.cor.df <- (as.data.frame(CCLE.cor.vec))

CVPlot(cv.df = CCLE.cor.df, geneset.loc = mito.loc,
       geneset.name = "Mitochondrial",alpha1 = 0.5)

Find highly correlated seed of samples for gene expression matrix

Description

FindSeed() is the key function in MCbiclust. It takes a gene expression matrix and by a stochastic method greedily searches for a seed of samples that maximizes the correlation score of the chosen gene set.

Usage

FindSeed(gem, seed.size, iterations, initial.seed = NULL, messages = 100)
FindSeed(gem, seed.size, iterations, initial.seed = NULL, messages = 100)

Arguments

`gem`	Gene expression matrix with genes as rows and samples as columns
`seed.size`	Size of sample seed
`iterations`	Number of iterations
`initial.seed`	Initial seed used, if NULL randomly chosen
`messages`	frequency of progress messages

Details

Additional options allow for the search to start at a chosen seed, for instance if a improvement to a known seed is desired. The result of FindSeed() is dependent on the number of iterations, with above 1000 usually providing a good seed, and above 10000 an optimum seed.

Value

Highly correlated seed

Examples

data(CCLE_small)
data(Mitochondrial_genes)

mito.loc <- which(row.names(CCLE_small) %in% Mitochondrial_genes)
CCLE.mito <- CCLE_small[mito.loc,]

random.seed <- sample(seq(length = dim(CCLE.mito)[2]),10)
CCLE.seed <- FindSeed(gem = CCLE.mito,
                      seed.size = 10,
                      iterations = 100,
                      messages = 100)


CorScoreCalc(CCLE.mito, random.seed)
CorScoreCalc(CCLE.mito, CCLE.seed)

CCLE.hicor.genes <- as.numeric(HclustGenesHiCor(CCLE.mito,
                                                CCLE.seed,
                                                cuts = 8))

CorScoreCalc(CCLE.mito[CCLE.hicor.genes,], CCLE.seed)
data(CCLE_small)
data(Mitochondrial_genes)

mito.loc <- which(row.names(CCLE_small) %in% Mitochondrial_genes)
CCLE.mito <- CCLE_small[mito.loc,]

random.seed <- sample(seq(length = dim(CCLE.mito)[2]),10)
CCLE.seed <- FindSeed(gem = CCLE.mito,
                      seed.size = 10,
                      iterations = 100,
                      messages = 100)


CorScoreCalc(CCLE.mito, random.seed)
CorScoreCalc(CCLE.mito, CCLE.seed)

CCLE.hicor.genes <- as.numeric(HclustGenesHiCor(CCLE.mito,
                                                CCLE.seed,
                                                cuts = 8))

CorScoreCalc(CCLE.mito[CCLE.hicor.genes,], CCLE.seed)

Calculate gene set enrichment of correlation vector using Mann-Whitney test

Description

The Mann-Whitney test is typically used due to the values of the correlation vector, not being normally distributed. GOEnrichmentAnalysis provides an interface with the GO database annotation to find the most significant GO terms.

Usage

GOEnrichmentAnalysis(gene.names, gene.values, sig.rate)
GOEnrichmentAnalysis(gene.names, gene.values, sig.rate)

Arguments

`gene.names`	Names of the genes in standard gene name format.
`gene.values`	Values associated with the genes, e.g the correlation vector output of `CVEval`.
`sig.rate`	Level of significance required after multiple hypothesis adjustment.

Value

Data frame of the significant gene sets, with GOID, GO Term, number of genes, number of genes in GO Term, number of genes in GO Term also in gene set, adjusted p-value, average value of correlation vector in gene set and phenotype describing whether average value of correlation vector is above or below the total average.

Examples

data(CCLE_small)
data(Mitochondrial_genes)

mito.loc <- (row.names(CCLE_small) %in% Mitochondrial_genes)
CCLE.mito <- CCLE_small[mito.loc,]

set.seed(101)
CCLE.seed <- FindSeed(gem = CCLE.mito,
                      seed.size = 10,
                      iterations = 100,
                      messages = 100)

CCLE.cor.vec <- CVEval(gem.part = CCLE.mito,
                       gem.all = CCLE_small,
                       seed = CCLE.seed, splits = 10)

# Significant GO terms can be calculated as follows:
# GEA <- GOEnrichmentAnalysis(gene.names = row.names(CCLE_small),
#                            gene.values = CCLE.cor.vec,
#                            sig.rate = 0.05)

data(CCLE_small)
data(Mitochondrial_genes)

mito.loc <- (row.names(CCLE_small) %in% Mitochondrial_genes)
CCLE.mito <- CCLE_small[mito.loc,]

set.seed(101)
CCLE.seed <- FindSeed(gem = CCLE.mito,
                      seed.size = 10,
                      iterations = 100,
                      messages = 100)

CCLE.cor.vec <- CVEval(gem.part = CCLE.mito,
                       gem.all = CCLE_small,
                       seed = CCLE.seed, splits = 10)

# Significant GO terms can be calculated as follows:
# GEA <- GOEnrichmentAnalysis(gene.names = row.names(CCLE_small),
#                            gene.values = CCLE.cor.vec,
#                            sig.rate = 0.05)

Find the most highly correlated genes using hierarchical clustering

Description

Upon finding an initial bicluster with FindSeed() not all the genes in the chosen geneset will be highly correlated to the bicluster. HclustGenesHiCor() uses the output of FindSeed() and hierarchical clustering to only select the genes that are most highly correlated to the bicluster. This is achieved by cutting the dendogram produced from the clustering into a set number of groups and then only selecting the groups that are most highly correlated to the bicluster

Usage

HclustGenesHiCor(gem, seed, cuts)
HclustGenesHiCor(gem, seed, cuts)

Arguments

`gem`	Gene expression matrix with genes as rows and samples as columns
`seed`	Seed of highly correlating samples
`cuts`	Number of groups to cut dendogram into

Value

Numeric vector of most highly correlated genes

Examples

data(CCLE_small)
data(Mitochondrial_genes)

mito.loc <- which(row.names(CCLE_small) %in% Mitochondrial_genes)
CCLE.mito <- CCLE_small[mito.loc,]

random.seed <- sample(seq(length = dim(CCLE.mito)[2]),10)
CCLE.seed <- FindSeed(gem = CCLE.mito,
                      seed.size = 10,
                      iterations = 100,
                      messages = 100)


CorScoreCalc(CCLE.mito, random.seed)
CorScoreCalc(CCLE.mito, CCLE.seed)

CCLE.hicor.genes <- as.numeric(HclustGenesHiCor(CCLE.mito,
                                                CCLE.seed,
                                                cuts = 8))

CorScoreCalc(CCLE.mito[CCLE.hicor.genes,], CCLE.seed)
data(CCLE_small)
data(Mitochondrial_genes)

mito.loc <- which(row.names(CCLE_small) %in% Mitochondrial_genes)
CCLE.mito <- CCLE_small[mito.loc,]

random.seed <- sample(seq(length = dim(CCLE.mito)[2]),10)
CCLE.seed <- FindSeed(gem = CCLE.mito,
                      seed.size = 10,
                      iterations = 100,
                      messages = 100)


CorScoreCalc(CCLE.mito, random.seed)
CorScoreCalc(CCLE.mito, CCLE.seed)

CCLE.hicor.genes <- as.numeric(HclustGenesHiCor(CCLE.mito,
                                                CCLE.seed,
                                                cuts = 8))

CorScoreCalc(CCLE.mito[CCLE.hicor.genes,], CCLE.seed)

MCbiclust: Massively Correlated biclustering

Description

MCbiclust is a R package for running massively correlating biclustering analysis.MCbiclust aims to find large scale biclusters with selected features being highly correlated with each other over a subset of samples.

Details

The package was originally designed in order to solve a problem in bioinformatics: to find biclusters representing different modes of regulation of mitochondria gene expression in disease states such as breast cancer. The same methods however, can be used on any gene expression data set to find biclusters of interest.

To learn more about MCbiclust, start with the vignette: browseVignettes(package = "MCbiclust")

List of known mitochondrial genes

Description

A dataset from MitoCarta1.0 containing the 1023 mitochondrial genes Availiable from the broad institute: http://www.broadinstitute.org/scientific-community/science/programs/metabolic-disease-program/publications/mitocarta/mitocarta-in-0

Usage

Mitochondrial_genes
Mitochondrial_genes

Format

A Character vector of the HGNC approved gene names

Value

Source

https://www.broadinstitute.org/publications/broad807s

Calculate PC1 vector of found pattern

Description

The correlations found between the chosen geneset in a subset of samples can be summarised by looking at the first principal component (PC1) using principal coponent analysis (PCA).

Usage

PC1VecFun(top.gem, seed.sort, n)
PC1VecFun(top.gem, seed.sort, n)

Arguments

`top.gem`	Gene expression matrix containing only highly correlating genes
`seed.sort`	Ordering of samples according to strength of correlation
`n`	Number of samples to use in calculation of PC1

Details

PC1VecFun() takes a gene expression matrix and the sample ordering and fits a PC1 value to all the samples based on a PCA analysis done on the first n samples.

Value

PC1 value for each sample

Examples

data(CCLE_small)
data(Mitochondrial_genes)

mito.loc <- (row.names(CCLE_small) %in% Mitochondrial_genes)
CCLE.mito <- CCLE_small[mito.loc,]

set.seed(102)
CCLE.seed <- FindSeed(gem = CCLE.mito,
                      seed.size = 10,
                      iterations = 100,
                      messages = 1000)

CCLE.sort <- SampleSort(gem = CCLE.mito,seed = CCLE.seed,sort.length = 11)

# Full ordering are in Vignette_sort in sysdata.rda
CCLE.samp.sort <- MCbiclust:::Vignette_sort[[1]]

CCLE.pc1 <- PC1VecFun(top.gem = CCLE.mito,
                      seed.sort = CCLE.samp.sort,
                      n = 10)

CCLE.cor.vec <-  CVEval(gem.part = CCLE.mito,
                            gem.all = CCLE_small,
                            seed = CCLE.seed,
                            splits = 10)

CCLE.bic <- ThresholdBic(cor.vec = CCLE.cor.vec,sort.order = CCLE.samp.sort,
                         pc1 = as.numeric(CCLE.pc1))

CCLE.pc1 <- PC1Align(gem = CCLE_small, pc1 = CCLE.pc1,
                     cor.vec = CCLE.cor.vec ,
                     sort.order = CCLE.samp.sort,
                     bic =CCLE.bic)

CCLE.fork <- ForkClassifier(CCLE.pc1, samp.num = length(CCLE.bic[[2]]))

data(CCLE_small)
data(Mitochondrial_genes)

mito.loc <- (row.names(CCLE_small) %in% Mitochondrial_genes)
CCLE.mito <- CCLE_small[mito.loc,]

set.seed(102)
CCLE.seed <- FindSeed(gem = CCLE.mito,
                      seed.size = 10,
                      iterations = 100,
                      messages = 1000)

CCLE.sort <- SampleSort(gem = CCLE.mito,seed = CCLE.seed,sort.length = 11)

# Full ordering are in Vignette_sort in sysdata.rda
CCLE.samp.sort <- MCbiclust:::Vignette_sort[[1]]

CCLE.pc1 <- PC1VecFun(top.gem = CCLE.mito,
                      seed.sort = CCLE.samp.sort,
                      n = 10)

CCLE.cor.vec <-  CVEval(gem.part = CCLE.mito,
                            gem.all = CCLE_small,
                            seed = CCLE.seed,
                            splits = 10)

CCLE.bic <- ThresholdBic(cor.vec = CCLE.cor.vec,sort.order = CCLE.samp.sort,
                         pc1 = as.numeric(CCLE.pc1))

CCLE.pc1 <- PC1Align(gem = CCLE_small, pc1 = CCLE.pc1,
                     cor.vec = CCLE.cor.vec ,
                     sort.order = CCLE.samp.sort,
                     bic =CCLE.bic)

CCLE.fork <- ForkClassifier(CCLE.pc1, samp.num = length(CCLE.bic[[2]]))

Calculate PointScore

Description

Using two gene sets that are represented of a known bicluster (one gene set being up regulated while other gene set is down regulated), samples are scored based on how well they match the known regulation of the bicluster.

Usage

PointScoreCalc(gene.expr.matrix, gene.loc1, gene.loc2)
PointScoreCalc(gene.expr.matrix, gene.loc1, gene.loc2)

Arguments

`gene.expr.matrix`	Gene expression matrix with genes as rows and samples as columns
`gene.loc1`	Location of the rows containing the genes in gene set 1 within the gene expression matrix
`gene.loc2`	Location of the rows containing the genes in gene set 2 within the gene expression matrix

Details

The PointScore of a sample can be directly compared to the PC1 value. The PointScore is typically used to identify samples related to the upper/lower fork of a bicluster without running the complete main MCbiclust pipeline on a dataset.

Value

Vector of point scores for each sample in the gene expression matrix

Examples

data(CCLE_small)
data(Mitochondrial_genes)

mito.loc <- (row.names(CCLE_small) %in% Mitochondrial_genes)
CCLE.mito <- CCLE_small[mito.loc,]

set.seed(102)
CCLE.seed <- FindSeed(gem = CCLE.mito,
                      seed.size = 10,
                      iterations = 100,
                      messages = 1000)

CCLE.sort <- SampleSort(gem = CCLE.mito,seed = CCLE.seed,sort.length = 11)

# Full ordering are in Vignette_sort in sysdata.rda
CCLE.samp.sort <- MCbiclust:::Vignette_sort[[1]]

CCLE.pc1 <- PC1VecFun(top.gem = CCLE.mito,
                      seed.sort = CCLE.samp.sort,
                      n = 10)

CCLE.hicor.genes <- as.numeric(HclustGenesHiCor(CCLE.mito,
                                                CCLE.seed,
                                                cuts = 8))

CCLE.cor.mat <- cor(t(CCLE.mito[CCLE.hicor.genes,CCLE.seed]))

gene.set1 <- labels(as.dendrogram(hclust(dist(CCLE.cor.mat)))[[1]])
gene.set2 <- labels(as.dendrogram(hclust(dist(CCLE.cor.mat)))[[2]])

gene.set1.loc <- which(row.names(CCLE.mito) %in% gene.set1)
gene.set2.loc <- which(row.names(CCLE.mito) %in% gene.set2)

ps.vec <- PointScoreCalc(CCLE.mito,gene.set1.loc,gene.set2.loc)

cor(ps.vec[CCLE.samp.sort], CCLE.pc1)
plot(ps.vec[CCLE.samp.sort])
plot(CCLE.pc1)
data(CCLE_small)
data(Mitochondrial_genes)

mito.loc <- (row.names(CCLE_small) %in% Mitochondrial_genes)
CCLE.mito <- CCLE_small[mito.loc,]

set.seed(102)
CCLE.seed <- FindSeed(gem = CCLE.mito,
                      seed.size = 10,
                      iterations = 100,
                      messages = 1000)

CCLE.sort <- SampleSort(gem = CCLE.mito,seed = CCLE.seed,sort.length = 11)

# Full ordering are in Vignette_sort in sysdata.rda
CCLE.samp.sort <- MCbiclust:::Vignette_sort[[1]]

CCLE.pc1 <- PC1VecFun(top.gem = CCLE.mito,
                      seed.sort = CCLE.samp.sort,
                      n = 10)

CCLE.hicor.genes <- as.numeric(HclustGenesHiCor(CCLE.mito,
                                                CCLE.seed,
                                                cuts = 8))

CCLE.cor.mat <- cor(t(CCLE.mito[CCLE.hicor.genes,CCLE.seed]))

gene.set1 <- labels(as.dendrogram(hclust(dist(CCLE.cor.mat)))[[1]])
gene.set2 <- labels(as.dendrogram(hclust(dist(CCLE.cor.mat)))[[2]])

gene.set1.loc <- which(row.names(CCLE.mito) %in% gene.set1)
gene.set2.loc <- which(row.names(CCLE.mito) %in% gene.set2)

ps.vec <- PointScoreCalc(CCLE.mito,gene.set1.loc,gene.set2.loc)

cor(ps.vec[CCLE.samp.sort], CCLE.pc1)
plot(ps.vec[CCLE.samp.sort])
plot(CCLE.pc1)

Methods for ordering samples

Description

After finding an initial bicluster with FindSeed() the next step is to extend the bicluster by ordering the remaining samples by how they preserve the correlation found.

Usage

SampleSort(gem, seed, num.cores = 1, sort.length = NULL)

MultiSampleSortPrep(gem, av.corvec, top.genes.num, groups, initial.seeds)
SampleSort(gem, seed, num.cores = 1, sort.length = NULL)

MultiSampleSortPrep(gem, av.corvec, top.genes.num, groups, initial.seeds)

Arguments

`gem`	Gene expression matrix with genes as rows and samples as columns
`seed`	Sample seed of highly correlating genes
`num.cores`	Number of cores used in parallel evaluation
`sort.length`	Number of samples to be sorted
`av.corvec`	List of average correlation vector
`top.genes.num`	Number of the top genes in correlation vector to use for sorting samples
`groups`	List showing what runs belong to which correlation vector group
`initial.seeds`	List of sample seeds from all runs

Details

SampleSort() is the basic function that achieves this, it takes the gene expression matrix, seed of samples, and also has options for the number of cores to run the method on and the number of samples to sort.

MultiSampleSortPrep() is a preparation function for SampleSort() when MCbiclust has been run multiple times and returns a list of gene expression matrices and seeds for each 'distinct' bicluster found.

Value

Order of samples by strength to correlation pattern

Examples

data(CCLE_small)
data(Mitochondrial_genes)

mito.loc <- (row.names(CCLE_small) %in% Mitochondrial_genes)
CCLE.mito <- CCLE_small[mito.loc,]

set.seed(102)
CCLE.seed <- FindSeed(gem = CCLE.mito,
                      seed.size = 10,
                      iterations = 100,
                      messages = 1000)

CCLE.sort <- SampleSort(gem = CCLE.mito,seed = CCLE.seed,sort.length = 11)

# Full ordering are in Vignette_sort in sysdata.rda
CCLE.samp.sort <- MCbiclust:::Vignette_sort[[1]]

CCLE.pc1 <- PC1VecFun(top.gem = CCLE.mito,
                      seed.sort = CCLE.samp.sort,
                      n = 10)

CCLE.cor.vec <-  CVEval(gem.part = CCLE.mito,
                            gem.all = CCLE_small,
                            seed = CCLE.seed,
                            splits = 10)

CCLE.bic <- ThresholdBic(cor.vec = CCLE.cor.vec,sort.order = CCLE.samp.sort,
                         pc1 = as.numeric(CCLE.pc1))

CCLE.pc1 <- PC1Align(gem = CCLE_small, pc1 = CCLE.pc1,
                     cor.vec = CCLE.cor.vec ,
                     sort.order = CCLE.samp.sort,
                     bic =CCLE.bic)

CCLE.fork <- ForkClassifier(CCLE.pc1, samp.num = length(CCLE.bic[[2]]))

data(CCLE_small)
data(Mitochondrial_genes)

mito.loc <- (row.names(CCLE_small) %in% Mitochondrial_genes)
CCLE.mito <- CCLE_small[mito.loc,]

set.seed(102)
CCLE.seed <- FindSeed(gem = CCLE.mito,
                      seed.size = 10,
                      iterations = 100,
                      messages = 1000)

CCLE.sort <- SampleSort(gem = CCLE.mito,seed = CCLE.seed,sort.length = 11)

# Full ordering are in Vignette_sort in sysdata.rda
CCLE.samp.sort <- MCbiclust:::Vignette_sort[[1]]

CCLE.pc1 <- PC1VecFun(top.gem = CCLE.mito,
                      seed.sort = CCLE.samp.sort,
                      n = 10)

CCLE.cor.vec <-  CVEval(gem.part = CCLE.mito,
                            gem.all = CCLE_small,
                            seed = CCLE.seed,
                            splits = 10)

CCLE.bic <- ThresholdBic(cor.vec = CCLE.cor.vec,sort.order = CCLE.samp.sort,
                         pc1 = as.numeric(CCLE.pc1))

CCLE.pc1 <- PC1Align(gem = CCLE_small, pc1 = CCLE.pc1,
                     cor.vec = CCLE.cor.vec ,
                     sort.order = CCLE.samp.sort,
                     bic =CCLE.bic)

CCLE.fork <- ForkClassifier(CCLE.pc1, samp.num = length(CCLE.bic[[2]]))

Slihouette validation of correlation vector clusters

Description

MCbiclust is a stochastic method and needs to be run multiple times to identify different biclusters. SilhouetteClustGroups() examines the correlation vectors calculated from different runs and uses the technique of examining silhouette widths to identify the number of distinct clusters (and hence biclusters) found.

Usage

SilhouetteClustGroups(cor.vec.mat, max.clusters, plots = FALSE, seed1 = 100,
  rand.vec = TRUE)
SilhouetteClustGroups(cor.vec.mat, max.clusters, plots = FALSE, seed1 = 100,
  rand.vec = TRUE)

Arguments

`cor.vec.mat`	Correlation matrix of the correlation vectors (CVs)
`max.clusters`	Maximum number of clusters to divide CVs into
`plots`	True or False for whether to show silhouette plots
`seed1`	Value used to set random seed
`rand.vec`	True or False for whether to add random correlation vector used for comparison

Value

The distinct clusters of correlation vectors

Examples

data(CCLE_small)
data(Mitochondrial_genes)

mito.loc <- (row.names(CCLE_small) %in% Mitochondrial_genes)
CCLE.mito <- CCLE_small[mito.loc,]

CCLE.seed <- list()
CCLE.cor.vec <- list()

for(i in 1:5){
    set.seed(i)
    CCLE.seed[[i]] <- FindSeed(gem = CCLE.mito,
                               seed.size = 10,
                               iterations = 100,
                               messages = 100)}

for(i in 1:5){
  CCLE.cor.vec[[i]] <-  CVEval(gem.part = CCLE.mito,
                               gem.all = CCLE_small,
                               seed = CCLE.seed[[i]],
                               splits = 10)}

CCLE.cor.mat <- as.matrix(as.data.frame(CCLE.cor.vec))

CCLE.clust.groups <- SilhouetteClustGroups(cor.vec.mat = CCLE.cor.mat,
                                           plots = TRUE,
                                           max.clusters = 10)

av.corvec.fun <- function(x) rowMeans(CCLE.cor.mat[,x])
CCLE.average.corvec <- lapply(X = CCLE.clust.groups,
                              FUN = av.corvec.fun)

multi.sort.prep <- MultiSampleSortPrep(gem = CCLE_small,
                                       av.corvec = CCLE.average.corvec,
                                       top.genes.num = 750,
                                       groups =CCLE.clust.groups,
                                       initial.seeds = CCLE.seed)  

multi.sort <- list()        
for(i in seq_len(length(CCLE.clust.groups))){
    multi.sort[[i]] <- SampleSort(multi.sort.prep[[1]][[i]],
                                  seed = multi.sort.prep[[2]][[i]],
                                  sort.length = 11)
}

data(CCLE_small)
data(Mitochondrial_genes)

mito.loc <- (row.names(CCLE_small) %in% Mitochondrial_genes)
CCLE.mito <- CCLE_small[mito.loc,]

CCLE.seed <- list()
CCLE.cor.vec <- list()

for(i in 1:5){
    set.seed(i)
    CCLE.seed[[i]] <- FindSeed(gem = CCLE.mito,
                               seed.size = 10,
                               iterations = 100,
                               messages = 100)}

for(i in 1:5){
  CCLE.cor.vec[[i]] <-  CVEval(gem.part = CCLE.mito,
                               gem.all = CCLE_small,
                               seed = CCLE.seed[[i]],
                               splits = 10)}

CCLE.cor.mat <- as.matrix(as.data.frame(CCLE.cor.vec))

CCLE.clust.groups <- SilhouetteClustGroups(cor.vec.mat = CCLE.cor.mat,
                                           plots = TRUE,
                                           max.clusters = 10)

av.corvec.fun <- function(x) rowMeans(CCLE.cor.mat[,x])
CCLE.average.corvec <- lapply(X = CCLE.clust.groups,
                              FUN = av.corvec.fun)

multi.sort.prep <- MultiSampleSortPrep(gem = CCLE_small,
                                       av.corvec = CCLE.average.corvec,
                                       top.genes.num = 750,
                                       groups =CCLE.clust.groups,
                                       initial.seeds = CCLE.seed)  

multi.sort <- list()        
for(i in seq_len(length(CCLE.clust.groups))){
    multi.sort[[i]] <- SampleSort(multi.sort.prep[[1]][[i]],
                                  seed = multi.sort.prep[[2]][[i]],
                                  sort.length = 11)
}

Methods for defining a bicluster

Description

A bicluster is the fundamental result found using MCbiclust. These three functions are essential for the precise definition of these biclusters.

Usage

ThresholdBic(cor.vec, sort.order, pc1, samp.sig = 0)

PC1Align(gem, pc1, cor.vec, sort.order, bic)

ForkClassifier(pc1, samp.num)
ThresholdBic(cor.vec, sort.order, pc1, samp.sig = 0)

PC1Align(gem, pc1, cor.vec, sort.order, bic)

ForkClassifier(pc1, samp.num)

Arguments

`cor.vec`	Correlation vector (output of `CVEval()`).
`sort.order`	Order of samples (output of `SampleSort()`).
`pc1`	PC1 values for samples (output of `PC1VecFun`).
`samp.sig`	Value between 0 and 1 determining number of samples in bicluster
`gem`	Gene expression matrix containing genes as rows and samples as columns.
`bic`	bicluster (output of `ThresholdBic()`)
`samp.num`	Number of samples in the bicluster

Details

ThresholdBic() takes as its main inputs the correlation vector (output of CVEval()), sample ordering (output of SampleSort()), PC1 vector (output of PC1VecFun) and returns a list of the genes and samples which belong to the bicluster according to a certain level of significance.

PC1Align() is a function used once the bicluster has been found to ensure that the upper fork samples (those with higher PC1 values) correspond to those samples that have genes with positive correlation vector values up-regulated.

ForkClassifier() is a function used to classify which samples are in the upper or lower fork.

Value

Defined bicluster

Examples

data(CCLE_small)
data(Mitochondrial_genes)

mito.loc <- (row.names(CCLE_small) %in% Mitochondrial_genes)
CCLE.mito <- CCLE_small[mito.loc,]

set.seed(102)
CCLE.seed <- FindSeed(gem = CCLE.mito,
                      seed.size = 10,
                      iterations = 100,
                      messages = 1000)

CCLE.sort <- SampleSort(gem = CCLE.mito,seed = CCLE.seed,sort.length = 11)

# Full ordering are in Vignette_sort in sysdata.rda
CCLE.samp.sort <- MCbiclust:::Vignette_sort[[1]]

CCLE.pc1 <- PC1VecFun(top.gem = CCLE.mito,
                      seed.sort = CCLE.samp.sort,
                      n = 10)

CCLE.cor.vec <-  CVEval(gem.part = CCLE.mito,
                            gem.all = CCLE_small,
                            seed = CCLE.seed,
                            splits = 10)

CCLE.bic <- ThresholdBic(cor.vec = CCLE.cor.vec,sort.order = CCLE.samp.sort,
                         pc1 = as.numeric(CCLE.pc1))

CCLE.pc1 <- PC1Align(gem = CCLE_small, pc1 = CCLE.pc1,
                     cor.vec = CCLE.cor.vec ,
                     sort.order = CCLE.samp.sort,
                     bic =CCLE.bic)

CCLE.fork <- ForkClassifier(CCLE.pc1, samp.num = length(CCLE.bic[[2]]))

data(CCLE_small)
data(Mitochondrial_genes)

mito.loc <- (row.names(CCLE_small) %in% Mitochondrial_genes)
CCLE.mito <- CCLE_small[mito.loc,]

set.seed(102)
CCLE.seed <- FindSeed(gem = CCLE.mito,
                      seed.size = 10,
                      iterations = 100,
                      messages = 1000)

CCLE.sort <- SampleSort(gem = CCLE.mito,seed = CCLE.seed,sort.length = 11)

# Full ordering are in Vignette_sort in sysdata.rda
CCLE.samp.sort <- MCbiclust:::Vignette_sort[[1]]

CCLE.pc1 <- PC1VecFun(top.gem = CCLE.mito,
                      seed.sort = CCLE.samp.sort,
                      n = 10)

CCLE.cor.vec <-  CVEval(gem.part = CCLE.mito,
                            gem.all = CCLE_small,
                            seed = CCLE.seed,
                            splits = 10)

CCLE.bic <- ThresholdBic(cor.vec = CCLE.cor.vec,sort.order = CCLE.samp.sort,
                         pc1 = as.numeric(CCLE.pc1))

CCLE.pc1 <- PC1Align(gem = CCLE_small, pc1 = CCLE.pc1,
                     cor.vec = CCLE.cor.vec ,
                     sort.order = CCLE.samp.sort,
                     bic =CCLE.bic)

CCLE.fork <- ForkClassifier(CCLE.pc1, samp.num = length(CCLE.bic[[2]]))

Package 'MCbiclust'

Help Index

Clinical information for CCLE data

Description

Usage

Format

Value

Source

Subset of expression levels of CCLE data

Description

Usage

Format

Value

Calculate correlation score

Description

Usage

Arguments

Value

Examples

Method for the calculation of a correlation vector

Description

Usage

Arguments

Details

Value

Examples

Make correlation vector plot

Description

Usage

Arguments

Value

Examples

Find highly correlated seed of samples for gene expression matrix

Description

Usage

Arguments

Details

Value

Examples

Calculate gene set enrichment of correlation vector using Mann-Whitney test

Description

Usage

Arguments

Value

Examples

Find the most highly correlated genes using hierarchical clustering

Description

Usage

Arguments

Value

Examples

MCbiclust: Massively Correlated biclustering

Description

Details

List of known mitochondrial genes

Description

Usage

Format

Value

Source

Calculate PC1 vector of found pattern

Description

Usage

Arguments

Details

Value

Examples

Calculate PointScore

Description

Usage

Arguments

Details

Value

Examples

Methods for ordering samples

Description

Usage

Arguments

Details

Value