Title: | BioTIP: An R package for characterization of Biological Tipping-Point |
---|---|
Description: | Adopting tipping-point theory to transcriptome profiles to unravel disease regulatory trajectory. |
Authors: | Zhezhen Wang, Andrew Goldstein, Yuxi Sun, Biniam Feleke, Qier An, Antonio Feliciano, Xinan Yang |
Maintainer: | Yuxi (Jennifer) Sun <[email protected]>, Zhezhen Wang <[email protected]>, and X Holly Yang <[email protected]> |
License: | GPL-2 |
Version: | 1.21.0 |
Built: | 2024-12-19 03:24:23 UTC |
Source: | https://github.com/bioc/BioTIP |
This function takes in ine (or two) matrix X (rows are genes, columns are samples) (or Y). It then calculates the average pairwise correlation between genes or samples. This method uses the method outlined by Schafer and Strimmer in "A Shrinkage Approach to Large-Scale Covariance Matrix Estimation and Implications for Functional Genomics" (2005)
avg.cor.shrink( X, Y = NULL, MARGIN = c(1, 2), shrink = TRUE, abs = FALSE, target = c("zero", "average", "half") )
avg.cor.shrink( X, Y = NULL, MARGIN = c(1, 2), shrink = TRUE, abs = FALSE, target = c("zero", "average", "half") )
X |
A G1 x S matrix of counts. Rows correspond to genes, columns correspond to samples. |
Y |
A G2 x S matrix of counts. Rows correspond to genes, columns correspond to samples. By default is NULL. |
MARGIN |
An integer indicate whether the rows (1, genes) or the columns (2, samples) to be calculated for pairwise correlation. |
shrink |
A flag specifying whether to shrink the correlation or not. |
abs |
A flag specifying whether to take the absolute value before taking the average (used for gene-gene correlations, not sample-sample correlations) |
target |
A character choose among ('zero', 'average', 'half'), indicating whether to shrink towards zero (for gene-gene correlations), shrink towards their empirical common average, one or 0.5 (for sample-sample correlations). |
The average pairwise correlation between genes or samples.
Andrew Goldstein [email protected]; Xinan H Yang [email protected]
Schafer and Strimmer (2005) "A Shrinkage Approach to Large-Scale Covariance Matrix Estimation and Implications for Functional Genomics"
## Generating a data X as coming from a multivariate normal distribution ## with 10 highly correlated variables, roughly simulating correlated genes. M = matrix(.9, nrow = 10, ncol = 10) diag(M) = 1 mu = rnorm(10) X = MASS::mvrnorm(1000, mu, M) dim(X) #1000 10 ## Mean value of standard pairwise correlation between 1000 genes # cor_tX = cor(t(X)) # mean(abs(cor_tX[upper.tri(cor_tX, diag = FALSE)])) # 0.9150228 ## Calculating estimated pairwise correlation between 1000 genes avg.cor.shrink(X, MARGIN=1,shrink = TRUE, targe='zero') # 0.8287838 M = matrix(.9, nrow = 20, ncol = 10) diag(M) = 1 Y = rbind(M,X) dim(Y) #1020 10 avg.cor.shrink(X, Y, MARGIN=1,shrink = TRUE, targe='zero') #0.8197959
## Generating a data X as coming from a multivariate normal distribution ## with 10 highly correlated variables, roughly simulating correlated genes. M = matrix(.9, nrow = 10, ncol = 10) diag(M) = 1 mu = rnorm(10) X = MASS::mvrnorm(1000, mu, M) dim(X) #1000 10 ## Mean value of standard pairwise correlation between 1000 genes # cor_tX = cor(t(X)) # mean(abs(cor_tX[upper.tri(cor_tX, diag = FALSE)])) # 0.9150228 ## Calculating estimated pairwise correlation between 1000 genes avg.cor.shrink(X, MARGIN=1,shrink = TRUE, targe='zero') # 0.8287838 M = matrix(.9, nrow = 20, ncol = 10) diag(M) = 1 Y = rbind(M,X) dim(Y) #1020 10 avg.cor.shrink(X, Y, MARGIN=1,shrink = TRUE, targe='zero') #0.8197959
A subset of gencode_gr extracted as: cod <- subset(gencode_gr, biotype == 'protein_coding')
cod
cod
A dataframe with 3 data columns and 1 metadata column.
chromosome names (chr21)
ranges
chromosome ranges on the genome (10906201-11029719)
specific strand of the genomic location (+,-,*)
This function takes in one (or two) matrix X (rows are genes, columns are samples) (or Y). It then calculates the average pairwise correlation between genes or samples. This method uses the method outlined by Schafer and Strimmer (2005).
cor.shrink( X, Y = NULL, MARGIN = c(1, 2), shrink = TRUE, target = c("zero", "average", "half") )
cor.shrink( X, Y = NULL, MARGIN = c(1, 2), shrink = TRUE, target = c("zero", "average", "half") )
X |
A G1 x S matrix of counts. Rows correspond to genes, columns correspond to samples. |
Y |
A G2 x S matrix of counts. Rows correspond to genes, columns correspond to samples. By default is NULL. |
MARGIN |
An integer indicate whether the rows (1, genes) or the columns (2, samples) to be calculated for pairwise correlation. |
shrink |
A flag specifying whether to shrink the correlation or not. |
target |
A character choose among ('zero', 'average', 'half'), indicating whether to shrink towards zero (for gene-gene correlations), shrink towards their empirical common average, one or 0.5 (for sample-sample correlations). |
The pairwise correlation between genes or samples. If Y==NULL, a G1 x G1 matrix is returned; otherwise, a G1 x G2 matrix is returned.
Andrew Goldstein [email protected]
Schafer and Strimmer (2005) "A Shrinkage Approach to Large-Scale Covariance Matrix Estimation and Implications for Functional Genomics"
require(MASS) ## Generating a data X as coming from a multivariate normal distribution ## with 10 highly correlated variables, roughly simulating correlated genes. M = matrix(.9, nrow = 10, ncol = 10) diag(M) = 1 mu = rnorm(10) X = MASS::mvrnorm(500, mu, M) dim(X) #500 10 ## Calculating pairwise correlation among 500 correlated genes cor_tX = cor(t(X)) mean(abs(cor_tX[upper.tri(cor_tX, diag = FALSE)])) # 0.9468098 ## Calculating estimated pairwise correlation among 500 correlated genes cor.matrix <- cor.shrink(X, MARGIN=1, shrink = TRUE, targe='zero') # 0.8287838 dim(cor.matrix) #[1] 500 500 mean(upper.tri(cor.matrix, diag=FALSE)) # 0.499 ## Calculating stimated pairwise correlation among 500 correlated genes ## and additional 100 random genes Y = rbind(X, matrix(rnorm(300*10), nrow = 300)) dim(Y) #800 10 cor.matrix <- cor.shrink(X, Y, MARGIN=1, shrink = TRUE, targe='zero') dim(cor.matrix) #[1] 500 800 mean(upper.tri(cor.matrix, diag=FALSE)) # 0.6868
require(MASS) ## Generating a data X as coming from a multivariate normal distribution ## with 10 highly correlated variables, roughly simulating correlated genes. M = matrix(.9, nrow = 10, ncol = 10) diag(M) = 1 mu = rnorm(10) X = MASS::mvrnorm(500, mu, M) dim(X) #500 10 ## Calculating pairwise correlation among 500 correlated genes cor_tX = cor(t(X)) mean(abs(cor_tX[upper.tri(cor_tX, diag = FALSE)])) # 0.9468098 ## Calculating estimated pairwise correlation among 500 correlated genes cor.matrix <- cor.shrink(X, MARGIN=1, shrink = TRUE, targe='zero') # 0.8287838 dim(cor.matrix) #[1] 500 500 mean(upper.tri(cor.matrix, diag=FALSE)) # 0.499 ## Calculating stimated pairwise correlation among 500 correlated genes ## and additional 100 random genes Y = rbind(X, matrix(rnorm(300*10), nrow = 300)) dim(Y) #800 10 cor.matrix <- cor.shrink(X, Y, MARGIN=1, shrink = TRUE, targe='zero') dim(cor.matrix) #[1] 500 800 mean(upper.tri(cor.matrix, diag=FALSE)) # 0.6868
A chr21 data from GENCODE GRCh37
gencode
gencode
A data frame of chr21 with 2619444 rows and 9 variables:
Rows, include chromosome numbers
Columns include seqname, source, feature, start , end, score, strand, frame and attribute
cod_gr is a subset of sample data 'gencode_gr'
The purpose of the getBiotypes
() function is to class both coding and noncoding
transcripts into biotypes using the most recent GENCODE annotations. This
tool can also be used to define potential lncRNAs, given an available genome
transcriptome assembly (a gtf file) or any genomic loci of interest.
getBiotypes(full_gr, gencode_gr, intron_gr = NULL, minoverlap = 1L)
getBiotypes(full_gr, gencode_gr, intron_gr = NULL, minoverlap = 1L)
full_gr |
A GRanges object which contains either coding or noncoding transcripts. Each GRanges objects' columns requires a unique identifications. For further details refer to the GRanges package. |
gencode_gr |
A GRanges object contatining a human Chr21 GENCODE reference annotation. A metadata column, "biotype", describes the transcript type. |
intron_gr |
A GRanges object containing the coordinates of non-coding transcripts. |
minoverlap |
An IRanges argument which detects minimum overlap between two IRanges objects. For more information about minoverlap argument refer to the IRanges package. |
For details of findOverlaps, type.partialOverlap, type.50Overlap type.toPlot, queryhits, and subjecthits see GenomicRanges https://www.bioconductor.org/packages/release/bioc/html/GenomicRanges.html, IRanges https://www.bioconductor.org/packages/release/bioc/html/IRanges.html, and BiocManager http://bioconductor.org/install/index.html.
A GRanges object that returns classified transcriptome biotypes.
Replace the PATH_FILE when loading your data locally.
Zhezhen Wang and Biniam Feleke
Reference GRCh37 genome https://www.gencodegenes.org/human/release_25lift37.html for details on gtf format visit ensemble https://useast.ensembl.org/info/website/upload/gff.html
Wang, Z.Z., J. M. Cunningham and X. H. Yang (2018). 'CisPi: a transcriptomic score for disclosing cis-acting disease-associated lincRNAs.' Bioinformatics 34(17): 664-670', PMID: 30423099'
# Input datasets from our package's data folder library(GenomicRanges) data("gencode") data("intron") data("ILEF") # Converting datasets to GRanges object gencode_gr = GRanges(gencode) ILEF_gr = GRanges(ILEF) cod_gr = GRanges(cod) intron_gr= GRanges(intron) # Filtering non-coding transcripts getBiotypes(ILEF_gr, gencode_gr, intron_gr) ## Not run: getBiotypes(intron_gr)
# Input datasets from our package's data folder library(GenomicRanges) data("gencode") data("intron") data("ILEF") # Converting datasets to GRanges object gencode_gr = GRanges(gencode) ILEF_gr = GRanges(ILEF) cod_gr = GRanges(cod) intron_gr= GRanges(intron) # Filtering non-coding transcripts getBiotypes(ILEF_gr, gencode_gr, intron_gr) ## Not run: getBiotypes(intron_gr)
This function runs over all states which are grouped samples.
For each state, this function splits the correlation network generated from
the function getNetwork
into several sub-networks (which we
called 'module'). The network nodes will be defined by the end-user. For
transcriptome analysis, network nodes can be the expressed transcripts. The
outputs of this function include the module IDs and node IDs per module.
getCluster_methods( igraphL, method = c("rw", "hcm", "km", "pam", "natural"), cutoff = NULL )
getCluster_methods( igraphL, method = c("rw", "hcm", "km", "pam", "natural"), cutoff = NULL )
igraphL |
A list of numerical matrices or a list of igraph objects. The list of igraph objects can be the output from the getNetwork function. |
method |
A mathematical clustering model for analyzing network nodes. Default is a random walk ('rw'). A method could be 'rw', 'hcm', 'km', 'pam', or 'natural', where:
|
cutoff |
A numeric value, default is NULL. For each method it means:
|
When method=rw: A list of communities
objects of R package
igraph, whose length is the length of the input object igraphL
.
These communities
objects can be used for
visualization when being assigned to the 'mark.groups' parameter of the
plot.igraph
function of the igraph package. Otherwise this
function returns a list of vectors, whose length is the length of the input
object igraphL
. The names of each vector are the pre-selected
transcript IDs by th function sd_selection
. Each vector,
whose length is the number of pre-selected transcript in a state, contains
the module IDs.
Zhezhen Wang [email protected]
test = list('state1' = matrix(sample(1:10, 6), 3, 3), 'state2' = matrix(sample(1:10, 6), 3, 3), 'state3' = matrix(sample(1:10, 6), 3, 3)) #assign colnames and rownames to the matrix for(i in names(test)){ colnames(test[[i]]) = 1:3 row.names(test[[i]]) = 1:3} #using 'rw' or 'natural' method igraphL <- getNetwork(test, fdr=1) #[1] "state1:3 nodes" #[1] "state2:3 nodes" #[1] "state3:3 nodes" cl <- getCluster_methods(igraphL) #using 'km', 'pam' or 'hcm' cl <- getCluster_methods(test, method = 'pam', cutoff=2)
test = list('state1' = matrix(sample(1:10, 6), 3, 3), 'state2' = matrix(sample(1:10, 6), 3, 3), 'state3' = matrix(sample(1:10, 6), 3, 3)) #assign colnames and rownames to the matrix for(i in names(test)){ colnames(test[[i]]) = 1:3 row.names(test[[i]]) = 1:3} #using 'rw' or 'natural' method igraphL <- getNetwork(test, fdr=1) #[1] "state1:3 nodes" #[1] "state2:3 nodes" #[1] "state3:3 nodes" cl <- getCluster_methods(igraphL) #using 'km', 'pam' or 'hcm' cl <- getCluster_methods(test, method = 'pam', cutoff=2)
getCTS obtains the identified BioTiP and its length based off of MCI scores.
getCTS(maxMCI, maxMCIms)
getCTS(maxMCI, maxMCIms)
maxMCI |
A numeric vector, whose length is the number of states.
This parameter is the maximum MCI score of each state, and it can be obtained from the output of |
maxMCIms |
A list of character vectors per state. The vectors are network nodes (e.g. transcript ids).
This parameter is the second element of the output of the function |
A character vector, in which the elements are the unique IDs of the network nodes of the BioTiP.
Antonio Feliciano y Pleyto and Zhezhen Wang [email protected]
maxMCI <- c(a = 2.56, b = 8.52, c = 2.36, d = 4.81, e = 5.26) maxMCIms <- list(a = c("A100", "A293", "C403"), b = c("B853", "D826", "A406"), c = c("J198", "D103", "B105"), d = c("K529", "D385", "E358"), e = c("J019", "U926", "N824")) identical(names(maxMCI), names(maxMCIms)) # TRUE getCTS(maxMCI, maxMCIms) # "Length: 3" # "B853" "D826" "A406"
maxMCI <- c(a = 2.56, b = 8.52, c = 2.36, d = 4.81, e = 5.26) maxMCIms <- list(a = c("A100", "A293", "C403"), b = c("B853", "D826", "A406"), c = c("J198", "D103", "B105"), d = c("K529", "D385", "E358"), e = c("J019", "U926", "N824")) identical(names(maxMCI), names(maxMCIms)) # TRUE getCTS(maxMCI, maxMCIms) # "Length: 3" # "B853" "D826" "A406"
Retrieve Ic scores (Pearson correlation of genes / Pearson correlation of samples) for the identified critical transition state
getIc( counts, sampleL, genes, output = c("Ic", "PCCg", "PCCs"), fun = c("cor", "BioTIP"), shrink = TRUE, use = c("everything", "all.obs", "complete.obs", "na.or.complete", "pairwise.complete.obs") )
getIc( counts, sampleL, genes, output = c("Ic", "PCCg", "PCCs"), fun = c("cor", "BioTIP"), shrink = TRUE, use = c("everything", "all.obs", "complete.obs", "na.or.complete", "pairwise.complete.obs") )
counts |
A numeric matrix or data frame. The rows and columns represent unique transcript IDs (geneID) and sample names, respectively. |
sampleL |
A list of vectors, whose length is the number of states. Each vector gives the sample names in a state. Note that the vector s (sample names) has to be among the column names of the R object 'df'. |
genes |
A character vector consisting of unique CTS gene ids. This can be obtained from |
output |
A string. Please select from 'Ic', 'PCCg', or 'PCCs'. Uses 'Ic' by default. 'PCCg' is the PCC between genes (numerator) and 'PCCs' is PCC between samples (denominator) |
fun |
An optional character string indicating the R functon to calculate correlations
for all possible pairs of columns of a matrix.
When using "BioTIP", The method is modified to ignore missing values, analogous to how
|
shrink |
A flag specifying whether to shrink the correlation or not.
This appraoch uses the method outlined by Schafer and Strimmer in
"A Shrinkage Approach to Large-Scale Covariance Matrix Estimation
and Implications for Functional Genomics" (2005)
Comparing to fun='cor', the 'BioTIP' method without shinkage is modified
to ignore missing values, analogous to how |
use |
An optional character string, when fun=="cor", it gives a method for computing covariances in the presence of missing values. This must be (an abbreviation of) one of the strings "everything", "all.obs", "complete.obs", "na.or.complete", or "pairwise.complete.obs". |
A list of numeric values, whose length and names are inherited from sampleL
Zhezhen Wang [email protected]; Xinan H Yang [email protected]
Schafer and Strimmer (2005) "A Shrinkage Approach to Large-Scale Covariance Matrix Estimation and Implications for Functional Genomics"
M. Mojtahedi et al., Cell Fate Decision as High-Dimensional Critical State Transition. PLoS Biol 14, e2000640 (2016).
counts = matrix(sample(1:100, 27), 3, 9) colnames(counts) = 1:9 row.names(counts) = c('loci1', 'loci2', 'loci3') cli = cbind(1:9, rep(c('state1', 'state2', 'state3'), each = 3)) colnames(cli) = c('samples', 'group') samplesL <- split(cli[, 1], f = cli[, 'group']) CTS = c('loci1', 'loci2') ## Comparing the results with an estiamted correlation matrix with that without estimation. Ic = getIc(counts, samplesL, CTS, fun='cor') Ic.2 = getIc(counts, samplesL, CTS, fun='BioTIP', shrink=FALSE) BioTIP = getIc(counts, samplesL, CTS, fun='BioTIP')
counts = matrix(sample(1:100, 27), 3, 9) colnames(counts) = 1:9 row.names(counts) = c('loci1', 'loci2', 'loci3') cli = cbind(1:9, rep(c('state1', 'state2', 'state3'), each = 3)) colnames(cli) = c('samples', 'group') samplesL <- split(cli[, 1], f = cli[, 'group']) CTS = c('loci1', 'loci2') ## Comparing the results with an estiamted correlation matrix with that without estimation. Ic = getIc(counts, samplesL, CTS, fun='cor') Ic.2 = getIc(counts, samplesL, CTS, fun='BioTIP', shrink=FALSE) BioTIP = getIc(counts, samplesL, CTS, fun='BioTIP')
This function calculates the BioTIP score on a given
data matrix X (or two matrixes X and Y). It can also calculate the score, if desired.
This appraoch uses the method outlined by Schafer and Strimmer in "A Shrinkage Approach to Large-Scale Covariance Matrix Estimation and Implications for Functional Genomics" (2005)
This approach is modified to ignore missing values, analogous to how
cor(X, use = "pairwise.complete.obs")
works.
The gene-gene correlations are shrunk towards 0, whereas the sample-sample correlations are shrunk towards their empirical average.
getIc.new( X, method = c("BioTIP", "Ic"), PCC_sample.target = c("average", "zero", "half"), output = c("IndexScore", "PCCg", "PCCs") )
getIc.new( X, method = c("BioTIP", "Ic"), PCC_sample.target = c("average", "zero", "half"), output = c("IndexScore", "PCCg", "PCCs") )
X |
A G x S matrix of counts. Rows correspond to genes, columns correspond to samples. |
method |
A flag specifying whether to calculate the BioTIP score
or the |
PCC_sample.target |
A character choose among ('average', 'zero', 'half'), indicating whether to shrink PCC towards towards their empirical common average, zero or 0.5 (for sample-sample correlations). |
output |
A string. Please select from 'IndexScore', 'PCCg', or 'PCCs'. Uses 'IndexScore' by default. 'PCCg' is the PCC between genes (numerator) and 'PCCs' is PCC between samples (denominator). |
A value containing the shrunk BioTIP or non-shrunk score
Andrew Goldstein [email protected]
## Generating a data X as coming from a multivariate normal distribution ## with 10 highly correlated variables, roughly simulating correlated genes. M = matrix(.9, nrow = 10, ncol = 10) diag(M) = 1 mu = rnorm(10) X = MASS::mvrnorm(1000, mu, M) dim(X) #1000 10 ## Calculating pairwise correlation between 1000 genes; then the mean value ## in two ways, respectively cor_tX = cor(t(X)) mean(abs(cor_tX[upper.tri(cor_tX, diag = FALSE)])) # 0.9150228 getIc.new(X, method = "Ic", output ='PCCg') # 0.9150228 getIc.new(X, method = "BioTIP", output ='PCCg') # 0.8287838 ## Uisng the Index of critical scoreing system, in two ways, respectively (newscore = getIc.new(X, method = "BioTIP")) (oldscore = getIc.new(X, method = "Ic"))
## Generating a data X as coming from a multivariate normal distribution ## with 10 highly correlated variables, roughly simulating correlated genes. M = matrix(.9, nrow = 10, ncol = 10) diag(M) = 1 mu = rnorm(10) X = MASS::mvrnorm(1000, mu, M) dim(X) #1000 10 ## Calculating pairwise correlation between 1000 genes; then the mean value ## in two ways, respectively cor_tX = cor(t(X)) mean(abs(cor_tX[upper.tri(cor_tX, diag = FALSE)])) # 0.9150228 getIc.new(X, method = "Ic", output ='PCCg') # 0.9150228 getIc.new(X, method = "BioTIP", output ='PCCg') # 0.8287838 ## Uisng the Index of critical scoreing system, in two ways, respectively (newscore = getIc.new(X, method = "BioTIP")) (oldscore = getIc.new(X, method = "Ic"))
This function reports the 'biomodule', which is the module with
the maximum Module Critical Index (MCI) scores for each state. Each state
can have multiple modules (groups of subnetworks derived from the function
getCluster_methods
). This function runs over all states.
getMaxMCImember(membersL, MCIl, minsize = 1)
getMaxMCImember(membersL, MCIl, minsize = 1)
membersL |
A list of integer vectors with unique ids as names. Each
vector represents the cluster number assign to that unique id. The length
of this list is equal to the number of states in the study. This can be the
first element of the output from function |
MCIl |
A list of numeric vectors with unique cluster numbers as names.
Each vector represents the MCI scores of that module. This can be the
second element of the output from function |
minsize |
A numerical value of the minimum module size (the number of transcripts in a cluster) to output for downstream analysis. |
A nested list whose length is the length of the input object
membersL
. Each internal list contains two objects: one object is
the vector of biomodule IDs across states, and the other object is a list of
transcript IDs (each defines the biomodule per state) across states.
Zhezhen Wang [email protected]
#1st option: get the input directly from getMCI function test = list('state1' = matrix(sample(1:10, 6), 4, 3), 'state2' = matrix(sample(1:10, 6), 4, 3), 'state3' = matrix(sample(1:10, 6), 4, 3)) # assign colnames and rownames to the matrix for(i in names(test)){ colnames(test[[i]]) = 1:3 row.names(test[[i]]) = c('g1', 'g2', 'g3', 'g4')} cluster = list(c(1, 2, 2, 1), c(1, 2, 3, 1), c(2, 2, 1, 1)) names(cluster) = names(test) for(i in names(cluster)){ names(cluster[[i]]) = c('g1', 'g2', 'g3', 'g4')} membersL_noweight <- getMCI(cluster, test) maxMCIms <- getMaxMCImember(membersL_noweight[[1]], membersL_noweight[[2]], min =3) #The same as maxMCIms <- getMaxMCImember(cluster, membersL_noweight[[2]], min =2) ## case1: using 'rw' method by default igraphL <- getNetwork(test, fdr=1) cl <- getCluster_methods(igraphL) ## make sure every element in list cl is a \code{communities} object sapply(cl, class) ## state1 state2 state3 ##"communities" "communities" "communities" ## If there is(are) state(s) that is(are) empty which will not be a communities object(s), ## please manually remove that state(s). cl = cl[which(sapply(cl, class) == 'communities')] ## and then run library(igraph) cluster = lapply(cl, membership) maxCIms <- getMaxMCImember(cluster, membersL_noweight[[2]], min =2) ## or run function 'getMCI' and use the 1st option membersL_noweight <- getMCI(cl, test) ## case2: using methods other than the default cl <- getCluster_methods(test, method = "pam", cutoff = 2) ## check to make sure membersL_noweight[[2]] has values and run maxCIms <- getMaxMCImember(cl, membersL_noweight[[2]], min =2)
#1st option: get the input directly from getMCI function test = list('state1' = matrix(sample(1:10, 6), 4, 3), 'state2' = matrix(sample(1:10, 6), 4, 3), 'state3' = matrix(sample(1:10, 6), 4, 3)) # assign colnames and rownames to the matrix for(i in names(test)){ colnames(test[[i]]) = 1:3 row.names(test[[i]]) = c('g1', 'g2', 'g3', 'g4')} cluster = list(c(1, 2, 2, 1), c(1, 2, 3, 1), c(2, 2, 1, 1)) names(cluster) = names(test) for(i in names(cluster)){ names(cluster[[i]]) = c('g1', 'g2', 'g3', 'g4')} membersL_noweight <- getMCI(cluster, test) maxMCIms <- getMaxMCImember(membersL_noweight[[1]], membersL_noweight[[2]], min =3) #The same as maxMCIms <- getMaxMCImember(cluster, membersL_noweight[[2]], min =2) ## case1: using 'rw' method by default igraphL <- getNetwork(test, fdr=1) cl <- getCluster_methods(igraphL) ## make sure every element in list cl is a \code{communities} object sapply(cl, class) ## state1 state2 state3 ##"communities" "communities" "communities" ## If there is(are) state(s) that is(are) empty which will not be a communities object(s), ## please manually remove that state(s). cl = cl[which(sapply(cl, class) == 'communities')] ## and then run library(igraph) cluster = lapply(cl, membership) maxCIms <- getMaxMCImember(cluster, membersL_noweight[[2]], min =2) ## or run function 'getMCI' and use the 1st option membersL_noweight <- getMCI(cl, test) ## case2: using methods other than the default cl <- getCluster_methods(test, method = "pam", cutoff = 2) ## check to make sure membersL_noweight[[2]] has values and run maxCIms <- getMaxMCImember(cl, membersL_noweight[[2]], min =2)
This function retrieves the cluster index and network-node ids for the identified biomodule (that shows the maximum MCI score) at each state in the study.
getMaxStats(membersL, idx)
getMaxStats(membersL, idx)
membersL |
A two-layer nested list of character or numeric values,
any one out of the five elements output by the function |
idx |
A vector of integers that are cluster ids of the biomodule
(the module with the highest MCI score) per state.
This is the first element of the result from |
A list describing the biomodule of each state, corresponding to one of the five elements
(members, MCI, Sd, PCC, and PCCo) outputted by the function getMCI
.
The calss of the vector depends on the class of the input parameter membersL
.
Zhezhen Wang [email protected]
test = list('state1' = matrix(sample(1:10, 6), 4, 3), 'state2' = matrix(sample(1:10, 6), 4, 3), 'state3' = matrix(sample(1:10, 6), 4, 3)) # assign colnames and rownames to the matrix for(i in names(test)){ colnames(test[[i]]) = 1:3 row.names(test[[i]]) = c('g1', 'g2', 'g3', 'g4') } cluster = list(c(1, 2, 2, 1), c(1, 2, 3, 1), c(2, 2, 1, 1)) names(cluster) = names(test) for(i in names(cluster)){ names(cluster[[i]]) = c('g1', 'g2', 'g3', 'g4') } membersL_noweight <- getMCI(cluster, test) idx = c(1, 2, 1) names(idx) = names(membersL_noweight[['sd']]) selectedSD = getMaxStats(membersL_noweight[['sd']], idx)
test = list('state1' = matrix(sample(1:10, 6), 4, 3), 'state2' = matrix(sample(1:10, 6), 4, 3), 'state3' = matrix(sample(1:10, 6), 4, 3)) # assign colnames and rownames to the matrix for(i in names(test)){ colnames(test[[i]]) = 1:3 row.names(test[[i]]) = c('g1', 'g2', 'g3', 'g4') } cluster = list(c(1, 2, 2, 1), c(1, 2, 3, 1), c(2, 2, 1, 1)) names(cluster) = names(test) for(i in names(cluster)){ names(cluster[[i]]) = c('g1', 'g2', 'g3', 'g4') } membersL_noweight <- getMCI(cluster, test) idx = c(1, 2, 1) names(idx) = names(membersL_noweight[['sd']]) selectedSD = getMaxStats(membersL_noweight[['sd']], idx)
This function calculates a module critical index (MCI) score for
each module per state within a dataset. Each module is a cluster of
transcripts generated from the function getCluster_methods
.
Note that a dataset should contains three or more states (samples in
groups).
getMCI( groups, countsL, adjust.size = FALSE, fun = c("cor", "BioTIP"), df = NULL )
getMCI( groups, countsL, adjust.size = FALSE, fun = c("cor", "BioTIP"), df = NULL )
groups |
A list of elements whose length is the member of states. The
elements could be either be vectors or |
countsL |
A list of x numeric count matrices or x data frame, where x is the number of states. |
adjust.size |
A Boolean value indicating if MCI score should be adjusted by module size (the number of transcripts in the module) or not. Default FALSE. This parameter is not recommended for fun=BioTIP. |
fun |
A character chosen between ("cor", "BioTIP"), indicating where an adjusted correlation matrix will be used to calculate the MCI score. |
df |
NULL or a numeric matrix or data frame, where rows and columns represent
unique transcript IDs (geneID) and sample names, respectively.
Used only when |
A list of five elements (members, MCI, Sd, PCC, and PCCo). Each of
element is a two-layer nested list whose length is the length of the input
object groups
. Each internal nested list is structured according to
the number of modules identified in that state.
members: vectors of unique ids
MCI: the MCI score
sd: standard deviation
PCC: Mean of pairwise Pearson Correlation Coefficient calculated among the loci in a module.
PCCo: Mean of pairwise Pearson Correlation Coefficient calculated between the loci in a module and the loci outside that module but inside the same state.
Zhezhen Wang [email protected]; Xinan H Yang [email protected]
test = list('state1' = matrix(sample(1:10, 6), 4, 3), 'state2' = matrix(sample(1:10, 6), 4, 3), 'state3' = matrix(sample(1:10, 6), 4, 3)) ## Assign colnames and rownames to the matrix for(i in names(test)){ colnames(test[[i]]) = 1:3 row.names(test[[i]]) = c('g1', 'g2', 'g3', 'g4')} cluster = list(c(1, 2, 2, 1), c(1, 2, 3, 1), c(2, 2, 1, 1)) names(cluster) = names(test) for(i in names(cluster)){ names(cluster[[i]]) = c('g1', 'g2', 'g3', 'g4')} membersL_noweight <- getMCI(cluster, test, fun='cor') names(membersL_noweight) ## [1] "members" "MCI" "sd" "PCC" "PCCo"
test = list('state1' = matrix(sample(1:10, 6), 4, 3), 'state2' = matrix(sample(1:10, 6), 4, 3), 'state3' = matrix(sample(1:10, 6), 4, 3)) ## Assign colnames and rownames to the matrix for(i in names(test)){ colnames(test[[i]]) = 1:3 row.names(test[[i]]) = c('g1', 'g2', 'g3', 'g4')} cluster = list(c(1, 2, 2, 1), c(1, 2, 3, 1), c(2, 2, 1, 1)) names(cluster) = names(test) for(i in names(cluster)){ names(cluster[[i]]) = c('g1', 'g2', 'g3', 'g4')} membersL_noweight <- getMCI(cluster, test, fun='cor') names(membersL_noweight) ## [1] "members" "MCI" "sd" "PCC" "PCCo"
This function calculates random MCI score, allowing an estimation of correlation matrix using the Schafer-Strimmer Method for the PCC_in component in the MCI score.
getMCI_inner( members, countsL, adjust.size, fun = c("cor", "BioTIP"), PCC_gene.target = "zero", M = NULL )
getMCI_inner( members, countsL, adjust.size, fun = c("cor", "BioTIP"), PCC_gene.target = "zero", M = NULL )
members |
An integer that is the length of genes in the CTS (critical transition signal). |
countsL |
A list of subset of dat matrix. The list length is the number of states. Each subset of matrix gives the genes in rows and samples in column. |
adjust.size |
A boolean value indicating if MCI score should be adjust by module size (the number of transcripts in the module) or not. Default FALSE. |
fun |
A character chosen between ("cor", "BioTIP"), indicating where an adjusted correlation matrix will be used to calculated the MCI score. |
PCC_gene.target |
A character 'zero' indicating that gene-gene correlation matrix will be shrunk. towards zero, used only for fun='BioTIP'. |
M |
is the overall shrunk correlation matrix, used only for fun='BioTIP'. |
A vector recording one MCI score per state.
Zhezhen Wang [email protected]; Xinan H Yang [email protected]
This function builds one correlation network for each state (sample group) and runs across all states. The network nodes are defined by the context of the input dataset. For transcriptomic network analysis, network nodes can be the expressed transcript IDs and network links can be the correlation coefficients. Using the Pearson Correlation Coefficient (PCC) analysis, this function assembles a correlation network of nodes (e.g., co-expressed transcripts) for each state using the R package igraph.
getNetwork(optimal, fdr = 0.05)
getNetwork(optimal, fdr = 0.05)
optimal |
A list of x numeric data frames, where x is the number of
states studied. Each data frame consists of loci with high standard
deviations. This object can be obtained through |
fdr |
A numeric cutoff value for a Pearson Correlation Coefficient (PCC) analysis. Default is 0.05. Transcripts are linked into a network if their correlations meet this PCC-significance criterion. |
A list of igraph objects whose length is the length of the input
object optimal
. Each object is a network of correlated nodes whose
PCCs meet the significant criteria based on the false discovery rate (FDR)
control. The length of the list is the number of states with PCC networks.
If no PCC meets the significant criteria in a state, the state
will be deleted from the output.
Zhezhen Wang [email protected]; Xinan H Yang [email protected]
test = list('state1' = matrix(sample(1:10, 6), 2, 3), 'state2'=matrix(sample(1:10, 6), 2, 3), 'state3' = matrix(sample(1:10, 6), 2, 3)) for(i in names(test)){ colnames(test[[i]]) = 1:3 row.names(test[[i]]) = 1:2} igraphL <- getNetwork(test, fdr=1) #[1] "state1:2 nodes" #[1] "state2:2 nodes" #[1] "state3:2 nodes
test = list('state1' = matrix(sample(1:10, 6), 2, 3), 'state2'=matrix(sample(1:10, 6), 2, 3), 'state3' = matrix(sample(1:10, 6), 2, 3)) for(i in names(test)){ colnames(test[[i]]) = 1:3 row.names(test[[i]]) = 1:2} igraphL <- getNetwork(test, fdr=1) #[1] "state1:2 nodes" #[1] "state2:2 nodes" #[1] "state3:2 nodes
The getReadthrough
() function is used to find long transcripts that cover more
than two coding regions for gene regions of interst.
getReadthrough(gr, cod_gr)
getReadthrough(gr, cod_gr)
gr |
A GRanges object that shows the start and end loci on genome. |
cod_gr |
A GRanges object contaning coding regions. |
For details of findOverlaps, type.partialOverlap, type.50Overlap type.toPlot, queryhits, readthrough and subjecthits see, GenomicRanges https://www.bioconductor.org/packages/release/bioc/html/GenomicRanges.html, IRanges https://www.bioconductor.org/packages/release/bioc/html/IRanges.html, and BiocManager http://bioconductor.org/install/index.html.
A GRanges object that returns overlapping regions of the classified transcript biotypes.
Replace the path_file when loading data locally to the data directory.
Zhezhen Wang and Biniam Feleke
Reference GRCh37 genome https://www.gencodegenes.org/human/release_25lift37.html. For details on gtf format visit ensemble https://useast.ensembl.org/info/website/upload/gff.html.
Wang, Z. Z., J. M. Cunningham and X. H. Yang (2018).'CisPi: a transcriptomic score for disclosing cis-acting disease-associated lincRNAs.' Bioinformatics34(17): 664-670'
#First Load datasets and libraries library(GenomicRanges) data("gencode") data("ILEF") data("cod") # Assigning datasets a GRanges object gencode_gr = GRanges(gencode) ILEF_gr = GRanges(ILEF) cod_gr = GRanges(cod) getReadthrough(ILEF_gr, cod_gr) ## Not run: getReadthrough(cod_gr)
#First Load datasets and libraries library(GenomicRanges) data("gencode") data("ILEF") data("cod") # Assigning datasets a GRanges object gencode_gr = GRanges(gencode) ILEF_gr = GRanges(ILEF) cod_gr = GRanges(cod) getReadthrough(ILEF_gr, cod_gr) ## Not run: getReadthrough(cod_gr)
A gene annotation samples with their corresponding geneId extracted from GENCODE database. A python script was then run to extcat GSE6136_cli dataset. The script is included in the R/data_raw folder.
GSE6136_cli
GSE6136_cli
A dataframe with 22690 columns and 27 rows column.
Names of gene interest
Summary of GSM genes
https://www.gencodegenes.org/human/
A gene annotation samples with their corresponding geneId extracted from GENCODE database. A python script was then run to extcat GSE6136_cli dataset. The script is included in the R/data_raw folder.
GSE6136_matrix
GSE6136_matrix
A dataframe with 22690 columns and 27 rows column.
Names of gene interest
Reference ID of the target genes
https://www.gencodegenes.org/human/
A dataset containing chromosomes in the genome regions of interest for 137 chromosome. The variables are as follows:
ILEF
ILEF
A data frame of chr21 with 137 rows and 4 variables:
chromosome names (chr1,chr6,chr21)
gene read start position (167684657,167729508)
end of gene read position (15710335,43717938)
width of gene
specific strand of the genomic location (+,-,*
name of the data rows(A1BG,vawser)
A dataset containing chromosomes in the genome regions of interest. The variables are as follows:
intron
intron
A data frame with 659327 rows and 5 variables:
chromosome names (chr1,chrM,chr21)
chromosome ranges on the genome(167684657–167729508)
specific strand of the genomic location (+,-,*)
internal assigned names(uc001aaa.3_intron_0_0_chr1_12228_f)
score not used in this data set(0)
The optimize.sd_selection
filters a multi-state dataset
based on a cutoff value for standard deviation per state and optimizes.
By default, a cutoff value of 0.01 is used. Suggested if each state contains more than 10 samples.
optimize.sd_selection( df, samplesL, B = 100, percent = 0.8, times = 0.8, cutoff = 0.01, method = c("other", "reference", "previous", "itself", "longitudinal reference"), control_df = NULL, control_samplesL = NULL )
optimize.sd_selection( df, samplesL, B = 100, percent = 0.8, times = 0.8, cutoff = 0.01, method = c("other", "reference", "previous", "itself", "longitudinal reference"), control_df = NULL, control_samplesL = NULL )
df |
A dataframe of numerics. The rows and columns represent unique transcript IDs (geneID) and sample names, respectively. |
samplesL |
A list of n vectors, where n equals to the number of states. Each vector gives the sample names in a state. Note that the vectors (sample names) has to be among the column names of the R object 'df'. |
B |
An integer indicating number of times to run this optimization, default 1000. |
percent |
A numeric value indicating the percentage of samples will be selected in each round of simulation. |
times |
A numeric value indicating the percentage of |
cutoff |
A positive numeric value. Default is 0.01. If < 1, automatically
goes to select top x percentage transcripts using the a selecting method (which is
either the |
method |
Selection of methods from
|
control_df |
A count matrix with unique loci as row names and samples names
of control samples as column names, only used for method |
control_samplesL |
A list of characters with stages as names of control samples, required for method 'longitudinal reference'. |
A list of dataframe of filtered transcripts with the highest standard
deviation are selected from df
based on a cutoff value assigned. The
resulting dataframe represents a subset of the raw input df
.
Zhezhen Wang [email protected]
Generate a line (or box) plot of Ic score and simulated Ic scores, with three horizontal lines: the min, max and 2*(max-min) value of the state of interests, or all values.
plot_Ic_Simulation( Ic, simulation, las = 0, ylim = NULL, order = NULL, main = NULL, ylab = "Ic", fun = c("matplot", "boxplot"), which2point = NULL )
plot_Ic_Simulation( Ic, simulation, las = 0, ylim = NULL, order = NULL, main = NULL, ylab = "Ic", fun = c("matplot", "boxplot"), which2point = NULL )
Ic |
A vector with names of states. If order is not assigned, then plot by the order of this vector. |
simulation |
A numeric matrix of Ic scores in which rows are states and columns are numbers of simulated times.
It can be obtained from |
las |
Numeric in 0, 1, 2, 3; the style of axis labels. Default is 0, meaning labels are parallel. (link to http://127.0.0.1:21580/library/graphics/html/par.html). |
ylim |
An integer vector of length 2. Default is NULL. |
order |
Characters of names of Ic to be plotted in a desired |
main |
A character vector. The title of the plot. Default is NULL. |
ylab |
titles y axes, as in plot. |
fun |
A character choose between ('matplot', 'boxplot'), indicating plot type. |
which2point |
A character (or integer) which state's values were used to set up the three horizontal lines. by default is NULL, indicating the values of all states will be used. |
Return a plot of the observed Ic (red) and simulated Ic (grey) scores per states.
Zhezhen Wang [email protected]; Xinan H Yang [email protected]
sim = matrix(sample(1:10, 9), 3, 3) row.names(sim) = paste0('state', 1:3) Ic = c('state1' = 3.4, 'state2' = 5.6, 'state3' = 2) plot_Ic_Simulation(Ic, sim)
sim = matrix(sample(1:10, 9), 3, 3) row.names(sim) = paste0('state', 1:3) Ic = c('state1' = 3.4, 'state2' = 5.6, 'state3' = 2) plot_Ic_Simulation(Ic, sim)
Box plots of observed (red) and simulated MCI scores by boostraping genes B
times,
with three horizontal lines: the min, max and 2*(max-min) value of the state of interests, or all all values.
plot_MCI_Simulation( MCI, simulation, las = 0, order = NULL, ylim = NULL, main = NULL, which2point = NULL, ... )
plot_MCI_Simulation( MCI, simulation, las = 0, order = NULL, ylim = NULL, main = NULL, which2point = NULL, ... )
MCI |
A named vector of max CI scores per state, can be obtained from function |
simulation |
A matrix state * number of simulated times, can be obtained from function |
las |
Numeric in 0, 1, 2, 3; the style of axis labels. Default is 0, meaning labels are parallel. (link to http://127.0.0.1:21580/library/graphics/html/par.html) |
order |
A vector of state names in the customized order to be plotted, set to NULL by default. |
ylim |
An integer vector of length 2. Default is NULL. |
main |
A character vector. The title of the plot. Default is NULL. |
which2point |
A character (or integer) which state's values were used to set up the three horizontal lines. by default is NULL, indicating the values of all states will be used. |
... |
Other parameters passed to this function |
Return a box plot of MCI(red) and simulated MCI(grey) scores per state.
Zhezhen Wang [email protected]
MCI = c(1:3); names(MCI) = c('a', 'b', 'c') simMCI = matrix(sample(1:100, 9), 3, 3) row.names(simMCI) = names(MCI) plot_MCI_Simulation(MCI, simMCI)
MCI = c(1:3); names(MCI) = c('a', 'b', 'c') simMCI = matrix(sample(1:100, 9), 3, 3) row.names(simMCI) = names(MCI) plot_MCI_Simulation(MCI, simMCI)
Generate a density plot of Ic score (orBioTIP score) from a simulation, which is the distance between the first-larget and the second-largest random scores. This is an alternative method to estimate the significance of an observed BioTIP (or Ic) score in a system. This measurement makes more sense to evaluate random scores of sample-label shuffling, in which the nature sample-sample correlation within a phenotypic state (or cell subpopulation) was removed.
plot_SS_Simulation( Ic, simulation, las = 0, xlim = NULL, ylim = NULL, order = NULL, main = "1st max - 2nd max", ylab = "1st max - 2nd max" )
plot_SS_Simulation( Ic, simulation, las = 0, xlim = NULL, ylim = NULL, order = NULL, main = "1st max - 2nd max", ylab = "1st max - 2nd max" )
Ic |
A vector with names of states. If order is not assigned, then plot by the order of this vector. |
simulation |
A numeric matrix of Ic scores in which rows are states and columns are numbers of simulated times.
It can be obtained from |
las |
Numeric in 0, 1, 2, 3; the style of axis labels. Default is 0, meaning labels are parallel. (link to http://127.0.0.1:21580/library/graphics/html/par.html). |
xlim |
An integer vector of length 2. Default is NULL. |
ylim |
An integer vector of length 2. Default is NULL. |
order |
Characters of names of Ic to be plotted in a desired |
main |
A character vector. The title of the plot. Default is NULL. |
ylab |
titles y axes, as in plot. |
Return a plot of the observed Ic (red) and simulated Ic (grey) scores per state.
Xinan H Yang [email protected]
sim = matrix(sample(1:10, 9), 3, 3) row.names(sim) = paste0('state', 1:3) Ic = c('state1' = 3.4, 'state2' = 15.6, 'state3' = 2) plot_SS_Simulation(Ic, sim)
sim = matrix(sample(1:10, 9), 3, 3) row.names(sim) = paste0('state', 1:3) Ic = c('state1' = 3.4, 'state2' = 15.6, 'state3' = 2) plot_SS_Simulation(Ic, sim)
A barplot of MCI for all clusters in all states.
plotBar_MCI( MCIl, ylim = NULL, nr = 1, nc = NULL, order = NULL, minsize = 3, states = NULL )
plotBar_MCI( MCIl, ylim = NULL, nr = 1, nc = NULL, order = NULL, minsize = 3, states = NULL )
MCIl |
A list can be obtained through getMCI. |
ylim |
A vector needed if the output barplots need to be on the same y scale. |
nr |
The number of rows to plot. |
nc |
The number of columns to plot, default length(groups). |
order |
A character vector of the order of the barplot. Default is NULL which uses the input list order. |
minsize |
A non-negative numeric value of minimum size allowed for a cluster. |
states |
A character of the state names to be shown on the plot. Default is NULL, assign this if you want to show all states including states without nodes. |
Return a barplot of MCI scores across states.
Zhezhen Wang [email protected]
Chen L, Liu R, Liu Z, Li M & Aihara K (2012) Detecting early-warning signals for sudden deterioration of complex diseases by dynamical network biomarkers Scientific Reports 2:342
test = list('state1' = matrix(sample(1:10, 6), 4, 3), 'state2' = matrix(sample(1:10, 6), 4, 3), 'state3' = matrix(sample(1:10, 6), 4, 3)) # assign colnames and rownames to the matrix for(i in names(test)){ colnames(test[[i]]) = 1:3 row.names(test[[i]]) = c('g1', 'g2', 'g3', 'g4') } cluster = list(c(1, 2, 2, 1), c(1, 2, 3, 1), c(2, 2, 1, 1)) names(cluster) = names(test) for(i in names(cluster)){ names(cluster[[i]]) = c('g1', 'g2', 'g3', 'g4') } membersL_noweight <- getMCI(cluster, test) plotBar_MCI(membersL_noweight)
test = list('state1' = matrix(sample(1:10, 6), 4, 3), 'state2' = matrix(sample(1:10, 6), 4, 3), 'state3' = matrix(sample(1:10, 6), 4, 3)) # assign colnames and rownames to the matrix for(i in names(test)){ colnames(test[[i]]) = 1:3 row.names(test[[i]]) = c('g1', 'g2', 'g3', 'g4') } cluster = list(c(1, 2, 2, 1), c(1, 2, 3, 1), c(2, 2, 1, 1)) names(cluster) = names(test) for(i in names(cluster)){ names(cluster[[i]]) = c('g1', 'g2', 'g3', 'g4') } membersL_noweight <- getMCI(cluster, test) plotBar_MCI(membersL_noweight)
plot a line plot with Ic score for each state
plotIc( Ic, las = 0, order = NULL, ylab = "Ic", col = "black", main = NULL, add = FALSE, ylim = NULL, lty = 1:5, lwd = 1 )
plotIc( Ic, las = 0, order = NULL, ylab = "Ic", col = "black", main = NULL, add = FALSE, ylim = NULL, lty = 1:5, lwd = 1 )
Ic |
A vector with names of states. If order is not assigned, then plot by the order of this vector. |
las |
Numeric in 0, 1, 2, 3; the style of axis labels. Default is 0, meaning labels are parallel. (link to http://127.0.0.1:21580/library/graphics/html/par.html). |
order |
A vector of state names in the customized order to be plotted, setting to NULL by default. |
ylab |
titles y axes, as in plot. |
col |
vector of colors. Colors are used cyclically. |
main |
A character vector. The title of the plot. Default is NULL. |
add |
logical. If TRUE, plots are added to current one. This is inherited from matplot. |
ylim |
An integer vector of length 2. Default is NULL. |
lty |
An vector of line types. This is also inherited from matplot. |
lwd |
Anineger of line widths. This is also inherited from matplot. |
Return a line plot of Ic score across states.
Zhezhen Wang [email protected]
Ic = c('state3' = 3.4, 'state1' = 5.6, 'state2' = 2) plotIc(Ic, order = c('state1', 'state2', 'state3'))
Ic = c('state3' = 3.4, 'state1' = 5.6, 'state2' = 2) plotIc(Ic, order = c('state1', 'state2', 'state3'))
This function generates a line plot over multiple states with the maximum MCI score per state. The module size (i.e., number of network nodes) is specified at each state in parentheses.
plotMaxMCI(maxMCIms, MCIl, las = 0, order = NULL, states = NULL)
plotMaxMCI(maxMCIms, MCIl, las = 0, order = NULL, states = NULL)
maxMCIms |
A list of 2 elements. The 1st element is an integer vector of module ids whose names are the state names.
The 2nd element is a list of character vectors per state. The vectors are network nodes (e.g. transcript ids).
This parameter can be obtained by running function |
MCIl |
A list of numeric vectors whose names are unique cluster ids.
Each vector represents the MCI scores of modules in a state.
This can be the second element of the output from the function |
las |
Numeric in 0, 1, 2, 3; the style of axis labels. Default is 0, meaning labels are parallel.
See |
order |
A vector of state names in the customized order to be plotted, setting to NULL by default. |
states |
A character vector of state names that will be shown on the plot, setting to NULL by default. Assign this if you want to show all states, including states with no resulted modules. This parameter will overwrite the parameter 'order'. |
Returns a line plot of maximum MCI scores across the states
Zhezhen Wang [email protected]
maxMCIms = list(c(state1 = 1, state2 = 2, state3 = 1), c(list(state1 = c('g1', 'g2', 'g3'), state2 = c('g3', 'g5'), state3 = c('g2', 'g6')))) MCIl = list(state1=c('1' = 8.84, '2' = 6.4), state2 = c('1' =NA, '2' = 9.5, '3' = NA), state3 = c('1' = 2.3, '2' = 1.4)) plotMaxMCI(maxMCIms, MCIl)
maxMCIms = list(c(state1 = 1, state2 = 2, state3 = 1), c(list(state1 = c('g1', 'g2', 'g3'), state2 = c('g3', 'g5'), state3 = c('g2', 'g6')))) MCIl = list(state1=c('1' = 8.84, '2' = 6.4), state2 = c('1' =NA, '2' = 9.5, '3' = NA), state3 = c('1' = 2.3, '2' = 1.4)) plotMaxMCI(maxMCIms, MCIl)
sd_selection
pre-selects highly oscillating transcripts
from the input dataset df
. The dataset must contain multiple sample
groups (or 'states'). For each state, the function filters the dataset using
a cutoff value for standard deviation. The default cutoff value is 0.01
(i.e., higher than the top 1 percentage standard deviation).
sd_selection( df, samplesL, cutoff = 0.01, method = c("other", "reference", "previous", "itself", "longitudinal reference"), control_df = NULL, control_samplesL = NULL )
sd_selection( df, samplesL, cutoff = 0.01, method = c("other", "reference", "previous", "itself", "longitudinal reference"), control_df = NULL, control_samplesL = NULL )
df |
A numeric matrix or data frame. The rows and columns represent unique transcript IDs (geneID) and sample names, respectively. |
samplesL |
A list of vectors, whose length is the number of states. Each vector gives the sample names in a state. Note that the vectors (sample names) has to be among the column names of the R object 'df'. |
cutoff |
A positive numeric value. Default is 0.01. If < 1,
automatically selects top x transcripts using the a selecting
method (which is either the |
method |
Selection of methods from
|
control_df |
A count matrix with unique loci as row names and samples names
of control samples as column names, only used for method |
control_samplesL |
A list of characters with stages as names of control samples, required for method 'longitudinal reference' |
sd_selection()
A list of data frames, whose length is the number
of states. The rows in each data frame are the filtered transcripts with
highest standard deviation selected from df
and based on an assigned cutoff
value. Each resulting data frame represents a subset of the raw
input df
, with the sample ID of the same state in the column.
Zhezhen Wang [email protected]
counts = matrix(sample(1:100, 18), 2, 9) colnames(counts) = 1:9 row.names(counts) = c('loci1', 'loci2') cli = cbind(1:9, rep(c('state1', 'state2', 'state3'), each = 3)) colnames(cli) = c('samples', 'group') samplesL <- split(cli[, 1], f = cli[, 'group']) test_sd_selection <- sd_selection(counts, samplesL, 0.01)
counts = matrix(sample(1:100, 18), 2, 9) colnames(counts) = 1:9 row.names(counts) = c('loci1', 'loci2') cli = cbind(1:9, rep(c('state1', 'state2', 'state3'), each = 3)) colnames(cli) = c('samples', 'group') samplesL <- split(cli[, 1], f = cli[, 'group']) test_sd_selection <- sd_selection(counts, samplesL, 0.01)
Simulating Ic scores for x
randomly selected samples, where x should be the same
as the length of identified critical-transition signal (CTS) (e.g., number of genes) and B
is self-defined running times.
simulation_Ic( obs.x, sampleL, counts, B = 1000, fun = c("cor", "BioTIP"), shrink = TRUE, use = c("everything", "all.obs", "complete.obs", "na.or.complete", "pairwise.complete.obs"), output = c("Ic", "PCCg", "PCCs") )
simulation_Ic( obs.x, sampleL, counts, B = 1000, fun = c("cor", "BioTIP"), shrink = TRUE, use = c("everything", "all.obs", "complete.obs", "na.or.complete", "pairwise.complete.obs"), output = c("Ic", "PCCg", "PCCs") )
obs.x |
An integer, length of identified CTS. |
sampleL |
A list of vectors, whose length is the number of states. Each vector gives the sample names in a state. Note that the vector s (sample names) has to be among the column names of the R object 'df'. |
counts |
A numeric matrix or dataframe in which columns are samples and rows are transcripts. Each row needs to have a unique row name (i.e. transcript ID). |
B |
An integer, setting the permutation with |
fun |
An optional character string indicating the R functon to calculate correlations
for all possible pairs of columns of a matrix.
When using "BioTIP", The method is modified to ignore missing values, analogous to how
|
shrink |
A flag specifying whether to shrink the correlation or not.
This appraoch uses the method outlined by Schafer and Strimmer in
"A Shrinkage Approach to Large-Scale Covariance Matrix Estimation
and Implications for Functional Genomics" (2005)
Comparing to fun='cor', the 'BioTIP' method without shinkage is modified
to ignore missing values, analogous to how |
use |
An optional character string, when fun=="cor", it gives a method for computing covariances in the presence of missing values. This must be (an abbreviation of) one of the strings "everything", "all.obs", "complete.obs", "na.or.complete", or "pairwise.complete.obs". |
output |
A string. Please select from 'Ic', 'PCCg', or 'PCCs'. Uses 'Ic' by default. 'PCCg' is the PCC between genes (numerator) and 'PCCs' is PCC between samples (denominator) |
A matrix of y
rows and B
columns where y
is the length of sampleL
and B
is self-defined. Each column is a set of Ic scores calculated for each state
Zhezhen Wang [email protected]
counts = matrix(sample(1:100, 27), 3, 9) colnames(counts) = 1:9 row.names(counts) = c('loci1', 'loci2', 'loci3') cli = cbind(1:9, rep(c('state1', 'state2', 'state3'), each = 3)) colnames(cli) = c('samples', 'group') samplesL <- split(cli[, 1], f = cli[, 'group']) simulation_Ic(2, samplesL, counts, B =3, fun="BioTIP", shrink=TRUE)
counts = matrix(sample(1:100, 27), 3, 9) colnames(counts) = 1:9 row.names(counts) = c('loci1', 'loci2', 'loci3') cli = cbind(1:9, rep(c('state1', 'state2', 'state3'), each = 3)) colnames(cli) = c('samples', 'group') samplesL <- split(cli[, 1], f = cli[, 'group']) simulation_Ic(2, samplesL, counts, B =3, fun="BioTIP", shrink=TRUE)
Run B
times of sample-label shuffling to calculate the Ic score,
where x should be the same as the length of identified BioTiP and B is self-defined.
simulation_Ic_sample( counts, sampleNo, Ic = NULL, genes, B = 1000, ylim = NULL, main = "simulation of samples", fun = c("cor", "BioTIP"), shrink = TRUE, use = c("everything", "all.obs", "complete.obs", "na.or.complete", "pairwise.complete.obs"), output = c("Ic", "PCCg", "PCCs"), plot = FALSE )
simulation_Ic_sample( counts, sampleNo, Ic = NULL, genes, B = 1000, ylim = NULL, main = "simulation of samples", fun = c("cor", "BioTIP"), shrink = TRUE, use = c("everything", "all.obs", "complete.obs", "na.or.complete", "pairwise.complete.obs"), output = c("Ic", "PCCg", "PCCs"), plot = FALSE )
counts |
A numeric matrix or data frame. The rows and columns represent unique transcript IDs (geneID) and sample names, respectively. |
sampleNo |
An integer of sample size at the tipping-point state. |
Ic |
A numeric value. Ic score of identified CTS (gene-set), useful when |
genes |
A character vector of identified CTS gene unique ids. |
B |
An integer indicating number of times to run this simulation, default 1000. |
ylim |
An integer vector of length 2. Default is NULL. |
main |
A character vector. The title of the plot. Default is NULL. |
fun |
An optional character string indicating the R functon to calculate correlations
for all possible pairs of columns of a matrix.
When using "BioTIP", The method is modified to ignore missing values, analogous to how
|
shrink |
A flag specifying whether to shrink the correlation or not.
This appraoch uses the method outlined by Schafer and Strimmer in
"A Shrinkage Approach to Large-Scale Covariance Matrix Estimation
and Implications for Functional Genomics" (2005)
Comparing to fun='cor', the 'BioTIP' method without shinkage is modified
to ignore missing values, analogous to how |
use |
An optional character string, when fun=="cor", it gives a method for computing covariances in the presence of missing values. This must be (an abbreviation of) one of the strings "everything", "all.obs", "complete.obs", "na.or.complete", or "pairwise.complete.obs". |
output |
A string. Please select from 'Ic', 'PCCg', or 'PCCs'. Uses 'Ic' by default. 'PCCg' is the PCC between genes (numerator) and 'PCCs' is PCC between samples (denominator) |
plot |
A Bollen value indicating whether a density plot will be plotted. |
A vector of B
values of BioTIP (or Ic) scores calculated for the state of interest.
Zhezhen Wang [email protected]; Xinan H Yang [email protected]
counts = matrix(sample(1:100, 27), 3, 9) colnames(counts) = 1:9 row.names(counts) = c('loci1', 'loci2', 'loci3') CTS = c('loci1', 'loci2') randomS <- simulation_Ic_sample(counts, sampleNo=3, Ic=3.4, genes=CTS, B=3, fun='BioTIP', plot=TRUE) dim(randomS)
counts = matrix(sample(1:100, 27), 3, 9) colnames(counts) = 1:9 row.names(counts) = c('loci1', 'loci2', 'loci3') CTS = c('loci1', 'loci2') randomS <- simulation_Ic_sample(counts, sampleNo=3, Ic=3.4, genes=CTS, B=3, fun='BioTIP', plot=TRUE) dim(randomS)
This function gets the MCI scores for randomly selected features (e.g. transcript ids),
simulationMCI( len, samplesL, df, adjust.size = FALSE, B = 1000, fun = c("cor", "BioTIP") )
simulationMCI( len, samplesL, df, adjust.size = FALSE, B = 1000, fun = c("cor", "BioTIP") )
len |
An integer that is the length of genes in the CTS (critical transition signal). |
samplesL |
A list of vectors, whose length is the number of states. Each vector gives the sample names in a state. Note that the vector s (sample names) has to be among the column names of the R object 'df'. |
df |
A numeric matrix or dataframe of numerics, factor or character. The rows and columns represent unique transcript IDs (geneID) and sample names, respectively |
adjust.size |
A boolean value indicating if MCI score should be adjust by module size (the number of transcripts in the module) or not. Default FALSE. |
B |
An integer, setting the permutation with |
fun |
A character chosen between ("cor", "BioTIP"), indicating where an adjusted correlation matrix will be used to calculated the MCI score. |
A numeric matrix indicating the MCI scores of permutation.
The dimension (row X column) of this matrix is the length of samplesL
* B
.
Zhezhen Wang [email protected]; Xinan H Yang [email protected]
counts = matrix(sample(1:100, 18), 3, 9) colnames(counts) = 1:9 row.names(counts) = c('loci1', 'loci2', 'loci3') cli = cbind(1:9, rep(c('state1', 'state2', 'state3'), each = 3)) colnames(cli) = c('samples', 'group') samplesL <- split(cli[, 1], f = cli[, 'group']) simMCI = simulationMCI(2, samplesL, counts, B=2) simMCI # [,1] [,2] #state1 2.924194 2.924194 #state2 20.877138 20.877138 #state3 2.924194 2.924194
counts = matrix(sample(1:100, 18), 3, 9) colnames(counts) = 1:9 row.names(counts) = c('loci1', 'loci2', 'loci3') cli = cbind(1:9, rep(c('state1', 'state2', 'state3'), each = 3)) colnames(cli) = c('samples', 'group') samplesL <- split(cli[, 1], f = cli[, 'group']) simMCI = simulationMCI(2, samplesL, counts, B=2) simMCI # [,1] [,2] #state1 2.924194 2.924194 #state2 20.877138 20.877138 #state3 2.924194 2.924194