| Title: | Removal of unwanted variation for gene-gene correlations and related analysis |
|---|---|
| Description: | RUVcorr applies global removal of unwanted variation (a ridged version of RUV) to real and simulated gene expression data. |
| Authors: | Saskia Freytag |
| Maintainer: | Saskia Freytag <[email protected]> |
| License: | GPL-2 |
| Version: | 1.45.0 |
| Built: | 2026-05-30 06:53:58 UTC |
| Source: | https://github.com/bioc/RUVcorr |
assessQuality assesses the quality of cleaning procedures, in terms of the resulting
gene-gene correlations, when the true underlying correlation structure is known.
assessQuality( est, true, index = "all", methods = c("all", "fnorm", "wrong.sign") )
est |
A matrix of estimated gene expression values. |
true |
A matrix of true correlations. |
index |
A vector of indices of genes to be included in
the assessment; if |
methods |
The method used for quality assessment;
if |
The squared Frobenius norm used for assessQuality has the following structure:
$$\frac{1}{n}\,\lVert \hat{y} - y \rVert_F^2 = \frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2$$
Here, the parameters $\hat{y}$ and $y$ denote the lower triangles of the
estimated and true Fisher-transformed correlation matrices, respectively.
The parameter $n$ denotes the number of elements in $\hat{y}$ and $y$.
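The two quality measures can be sketched in base R as follows. This is a minimal illustration, not RUVcorr's own code: the helper names frob_quality and wrong_sign are hypothetical, both take correlation matrices directly (assessQuality itself receives expression values in est), and the wrong-sign measure is returned as a proportion rather than a percentage.

```r
# Hypothetical helpers illustrating the measures used by assessQuality.
# Both take two correlation matrices for the same genes in the same order.

# Mean squared difference of the Fisher-transformed (atanh) lower triangles,
# i.e. the squared Frobenius norm divided by the number of elements n.
frob_quality <- function(est_cor, true_cor) {
  lt <- lower.tri(true_cor)            # strict lower triangle, no diagonal
  y_hat <- atanh(est_cor[lt])          # Fisher transformation
  y <- atanh(true_cor[lt])
  sum((y_hat - y)^2) / length(y)
}

# Proportion of gene pairs whose estimated correlation has a different sign
# from the true one (a true correlation of exactly 0 counts as its own sign).
wrong_sign <- function(est_cor, true_cor) {
  lt <- lower.tri(true_cor)
  mean(sign(est_cor[lt]) != sign(true_cor[lt]))
}
```

With expression data, pass cor(Y.hat) as est_cor. The diagonal is left out because atanh(1) is infinite.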
assessQuality returns a vector of the requested quality assessments.
Saskia Freytag
Y<-simulateGEdata(500, 500, 10, 2, 5, g=NULL, Sigma.eps=0.1, 250, 100, intercept=FALSE, check.input=FALSE)
assessQuality(Y$Y, Y$Sigma, index=1:100, methods="wrong.sign")
assessQuality(Y$Y, Y$Sigma, index=1:100, methods="fnorm")
background returns background genes for judging the quality of the cleaning.
These genes are supposed to represent the majority of genes. The positive control and
negative control genes should be excluded.
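A minimal sketch of this selection, assuming a plain uniform draw from the genes that are neither excluded nor negative controls; pick_background is a hypothetical helper, not part of RUVcorr.

```r
# Hypothetical helper: draw nBG distinct gene indices (columns) that are
# neither in exclude nor in nc_index.
pick_background <- function(n_genes, nBG, exclude, nc_index) {
  candidates <- setdiff(seq_len(n_genes), c(exclude, nc_index))
  if (nBG > length(candidates)) stop("nBG exceeds the number of eligible genes")
  # index via sample.int so a single remaining candidate is handled correctly
  candidates[sample.int(length(candidates), nBG)]
}

set.seed(1)
bg <- pick_background(500, nBG = 20, exclude = 1:100, nc_index = 251:500)
```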
background(Y, nBG, exclude, nc_index)
Y |
A matrix of gene expression values or an object of
the class |
nBG |
An integer setting the number of background genes. |
exclude |
A vector of indices of genes to exclude. |
nc_index |
A vector of indices of negative controls (also excluded from being background genes). |
background returns a vector of randomly chosen indices.
Saskia Freytag
Y<-simulateGEdata(500, 500, 10, 2, 5, g=NULL, Sigma.eps=0.1, 250, 100, intercept=FALSE, check.input=FALSE)
background(Y, nBG=20, exclude=1:100, nc_index=251:500)
calculateThreshold returns, for each supplied threshold, the proportion of genes from a random
selection that would be prioritised. It also fits a loess curve to the estimated points,
which allows the choice of a threshold for the prioritisation of genes.
calculateThreshold( X, exclude, index.ref, set.size = length(index.ref), Weights = NULL, thresholds = seq(0.05, 1, 0.05), anno = NULL, Factor = NULL, cpus = 1, parallel = FALSE )
X |
A matrix of gene expression values. |
exclude |
A vector of indices of genes to exclude. |
index.ref |
A vector of indices of reference genes used for prioritisation. |
set.size |
An integer giving the size of the set of genes that are to be prioritised. |
Weights |
An object of class |
thresholds |
A vector of thresholds; values should be in the range |
anno |
A dataframe or a matrix containing the annotation of arrays in |
Factor |
A character string corresponding to a column name of |
cpus |
An integer giving the number of cores that are supposed to be used. |
parallel |
A logical value indicating whether parallel computing should be used. |
The proportion of prioritised random genes is estimated by drawing 1000 random sets of genes and calculating how many would be prioritised at every given threshold. A gene is prioritised if at least one of its correlations with a known reference gene is above the given threshold.
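The random-set procedure can be sketched as below. The sketch assumes plain Pearson correlations (calculateThreshold can instead weight correlations by the levels of Factor via Weights), compares absolute correlations with the threshold, and reuses the same random sets at every threshold; threshold_curve and its argument B (the number of random sets) are hypothetical. The returned names mirror the components of the Threshold object described next.

```r
# Hypothetical sketch of the random-set estimate behind calculateThreshold.
threshold_curve <- function(X, exclude, index.ref, set.size = length(index.ref),
                            thresholds = seq(0.05, 1, 0.05), B = 1000) {
  pool <- setdiff(seq_len(ncol(X)), c(exclude, index.ref))
  # strongest absolute correlation of each eligible gene with any reference gene
  r <- abs(cor(X[, pool, drop = FALSE], X[, index.ref, drop = FALSE]))
  max_cor <- apply(r, 1, max)
  # B random gene sets (columns), reused at every threshold
  sets <- replicate(B, sample.int(length(pool), set.size))
  props <- vapply(thresholds,
                  function(t) mean(max_cor[as.vector(sets)] > t),
                  numeric(1))
  fit <- loess(props ~ thresholds)      # smooth curve through the points
  list(Prop.values = props, Thresholds = thresholds, loess.estimate = fit)
}
```

Because every threshold is evaluated on the same sets, the estimated proportions can only decrease as the threshold increases.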
calculateThreshold returns an object of class Threshold.
An object of class Threshold is a list with the following components:
Prop.values A vector of the proportion of prioritized genes.
Thresholds A vector containing the values in thresholds.
loess.estimate An object of class loess.
Saskia Freytag
Y<-simulateGEdata(500, 500, 10, 2, 5, g=NULL, Sigma.eps=0.1, 250, 100, intercept=FALSE, check.input=FALSE)
anno<-as.matrix(sample(1:4, dim(Y$Y)[1], replace=TRUE))
colnames(anno)<-"Factor"
weights<-findWeights(Y$Y, anno, "Factor")
calculateThreshold(Y$Y, exclude=seq(251,500,1), index.ref=seq_len(10), Weights=weights, anno=anno, Factor="Factor")
compareRanks calculates the difference between the ranks of known reference
gene pairs in two versions of the same data.
compareRanks(Y, Y.hat, ref_index, no.random = 1000, exclude_index)
Y |
A matrix of raw gene expression values. |
Y.hat |
A matrix of cleaned gene expression values. |
ref_index |
A vector of indices that are referring to genes of interest. |
no.random |
An integer giving the number of random genes. |
exclude_index |
A vector of indices to be excluded from the selection of random genes. |
The correlations between all random genes and reference genes are calculated
(including correlations between random and reference genes) using the two versions of
the data. The correlations are then ranked according to their absolute value (highest
to lowest), and the ranks of the reference gene pairs are extracted. For a particular
reference gene pair, the difference in ranks between the two versions of the data
is calculated as:
Rank in Y - Rank in Y.hat
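A minimal sketch of this comparison, assuming that all pairwise correlations among the reference and random genes are ranked together; rank_shift is a hypothetical helper, and compareRanks may differ in how it samples and ranks.

```r
# Hypothetical sketch: rank differences of reference gene pairs between
# raw (Y) and cleaned (Y.hat) data; rows are arrays, columns are genes.
rank_shift <- function(Y, Y.hat, ref_index, no.random, exclude_index) {
  pool <- setdiff(seq_len(ncol(Y)), c(ref_index, exclude_index))
  genes <- c(ref_index, pool[sample.int(length(pool), no.random)])
  lt <- lower.tri(diag(length(genes)))                  # each pair once
  is_ref <- outer(genes %in% ref_index, genes %in% ref_index, "&")[lt]
  pair_ranks <- function(M) {
    r <- cor(M[, genes])[lt]
    rank(-abs(r), ties.method = "first")[is_ref]        # rank 1 = strongest |cor|
  }
  # positive values: the pair moved up the ranking after cleaning
  pair_ranks(Y) - pair_ranks(Y.hat)
}
```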
compareRanks returns a vector of the differences in ranks of
the correlations of reference gene pairs estimated using raw or cleaned data.
Saskia Freytag
Y<-simulateGEdata(500, 500, 10, 2, 5, g=NULL, Sigma.eps=0.1, 250, 100, intercept=FALSE, check.input=FALSE)
Y.hat<-RUVNaiveRidge(Y, center=TRUE, nc_index=251:500, nu=0, kW=10)
compareRanks(Y$Y, Y.hat, ref_index=1:30, no.random=100, exclude_index=c(31:100,251:500))
correlationPlot produces a correlation plot to compare true and estimated gene-gene correlations.
correlationPlot( true, est, plot.genes = sample(seq_len(dim(true)[1]), 18), boxes = TRUE, title, line = -1 )
true |
A matrix of true gene-gene correlation values. |
est |
A matrix of estimated gene expression values. |
plot.genes |
A vector of indices of genes used in plotting; the suggested length of this vector is 18. |
boxes |
A logical scalar to indicate whether boxes
are drawn around sets of 6 genes; only available if |
title |
A character string describing the title of the plot. |
line |
The margin line on which the title is placed, starting at 0 and counting outwards. |
The upper triangle of the correlation plot shows the true gene-gene correlation values, while the lower triangle of the correlation plot shows the gene-gene correlation values calculated from the estimated gene expression values. This is possible because correlation matrices are symmetric.
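The combined layout can be sketched as a single matrix. combine_triangles is a hypothetical helper that only builds the values, leaving the drawing (for example with image()) aside.

```r
# Hypothetical helper: true correlations above the diagonal, correlations
# computed from the estimated expression values below it.
combine_triangles <- function(true_cor, est_expr, genes) {
  est_cor <- cor(est_expr[, genes])
  M <- true_cor[genes, genes]
  M[lower.tri(M)] <- est_cor[lower.tri(est_cor)]
  M
}
```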
correlationPlot returns a plot.
Saskia Freytag
Y<-simulateGEdata(500, 500, 10, 2, 5, g=NULL, Sigma.eps=0.1, 250, 100, intercept=FALSE, check.input=FALSE)
correlationPlot(Y$Sigma, Y$Y, title="Raw", plot.genes=c(sample(1:100, 6), sample(101:250, 6), sample(251:500, 6)))
ECDFPlot generates empirical cumulative distribution
functions (ECDF) for gene-gene correlation values.
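A minimal sketch of the underlying computation with base R's ecdf(), assuming that each gene pair is counted once via the lower triangle; cor_ecdf is a hypothetical helper.

```r
# Hypothetical helper: ECDF of the off-diagonal correlations of selected genes.
cor_ecdf <- function(cor_mat, index = seq_len(ncol(cor_mat))) {
  sub <- cor_mat[index, index]
  ecdf(sub[lower.tri(sub)])   # returns a step function
}
```

For example, plot(cor_ecdf(cor(Y.hat), 1:100), col = "red") draws one curve.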
ECDFPlot(X, Y, index = "all", col.X = "red", col.Y = "black", title, legend)
X |
A matrix or list of matrices of estimated gene-gene correlations. |
Y |
A matrix of reference gene-gene correlations (i.e. underlying known correlation structure). |
index |
A vector of indices of genes of interest. |
col.X |
The color or colors for ECDF as estimated from |
col.Y |
The color for ECDF as estimated from |
title |
A character string describing the title of the plot. |
legend |
A vector describing |
ECDFPlot returns a plot.
Saskia Freytag
Y<-simulateGEdata(500, 500, 10, 2, 5, g=NULL, Sigma.eps=0.1, 250, 100, intercept=FALSE, check.input=FALSE)
Y.hat<-RUVNaiveRidge(Y, center=TRUE, nc_index=251:500, 0, 10, check.input=TRUE)
Y.hat.cor<-cor(Y.hat)
par(mar=c(5.1, 4.1, 4.1, 2.1), mgp=c(3, 1, 0), las=0, mfrow=c(1, 1))
ECDFPlot(Y.hat.cor, Y$Sigma, index=1:100, title="Simulated data", legend=c("RUV", "Truth"))
ECDFPlot(list(Y.hat.cor, cor(Y$Y)), Y$Sigma, index=1:100, title="Simulated data", legend=c("RUV", "Raw", "Truth"), col.Y="black")
eigenvaluePlot plots the ratio of the i-th eigenvalue
of the SVD of the negative controls to the sum of all eigenvalues.
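The plotted ratios can be sketched as follows, assuming that the eigenvalues are the squared singular values of the (optionally column-centered) negative-control submatrix; eigen_share is hypothetical and the package may normalise differently.

```r
# Hypothetical sketch: share of the total for each of the first k eigenvalues.
eigen_share <- function(Y, nc_index, k = 10, center = TRUE) {
  Yc <- Y[, nc_index, drop = FALSE]
  if (center) Yc <- scale(Yc, center = TRUE, scale = FALSE)
  ev <- svd(Yc)$d^2                      # eigenvalues of t(Yc) %*% Yc
  (ev / sum(ev))[seq_len(min(k, length(ev)))]
}
```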
eigenvaluePlot(Y, nc_index, k = 10, center = TRUE, title = "Eigenvalue Plot")
Y |
A matrix of gene expressions. |
nc_index |
A vector of indices for the negative controls. |
k |
A numeric value giving the number of eigenvalues that should be displayed. |
center |
A logical scalar to indicate whether centering is needed. |
title |
A character string describing the title. |
eigenvaluePlot returns a plot.
Saskia Freytag
Y<-simulateGEdata(500, 500, 10, 2, 5, g=NULL, Sigma.eps=0.1, 250, 100, intercept=FALSE, check.input=FALSE)
eigenvaluePlot(Y$Y, nc_index=251:500, k=20, center=TRUE)
empNegativeControls finds suitable negative controls in real or simulated data.
empNegativeControls(Y, exclude, smoothing = 0.1, nc)
## Default S3 method:
empNegativeControls(Y, exclude, smoothing = 0.1, nc)
## S3 method for class 'simulateGEdata'
empNegativeControls(Y, exclude, smoothing = 0.1, nc)
Y |
A matrix of gene expression values or an
object of the class |
exclude |
A vector of indices to be excluded from being chosen as negative controls. |
smoothing |
A numerical scalar determining the amount of smoothing to be applied. |
nc |
An integer setting the number of negative controls. |
First, the mean expression of every gene (except the excluded genes) is calculated, and genes are assigned to bins accordingly; the bin width is given by the smoothing parameter. In each bin, the function picks a number of negative control genes proportional to the total number of genes in the bin. The genes picked in each bin are those with the lowest interquartile ranges in that bin.
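The binning scheme can be sketched in base R as follows. emp_nc is a hypothetical helper; how the per-bin counts are rounded is an assumption, so the total returned can differ slightly from nc.

```r
# Hypothetical sketch of mean-binned, lowest-IQR negative control selection.
emp_nc <- function(Y, exclude, smoothing = 0.1, nc) {
  genes <- setdiff(seq_len(ncol(Y)), exclude)
  m <- colMeans(Y[, genes, drop = FALSE])               # mean of each gene
  iqr <- apply(Y[, genes, drop = FALSE], 2, IQR)        # spread of each gene
  bins <- floor((m - min(m)) / smoothing)               # bins of width smoothing
  picked <- unlist(lapply(split(seq_along(genes), bins), function(idx) {
    n_bin <- round(nc * length(idx) / length(genes))    # proportional share
    idx[order(iqr[idx])][seq_len(min(n_bin, length(idx)))]
  }))
  genes[picked]
}
```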
empNegativeControls returns a vector of indices of
empirically chosen negative controls.
For simulated data it is advisable to use the known negative controls or restrict the empirical choice to the known negative controls by excluding all other genes.
Saskia Freytag
Y<-simulateGEdata(500, 500, 10, 2, 5, g=NULL, Sigma.eps=0.1, 250, 100, intercept=FALSE, check.input=TRUE)
empNegativeControls(Y, exclude=1:100, nc=100)
findWeights returns a list of variances and weights based on the correlation
between genes for each level of a factor found in the annotation. This function is
typically used to find the weights of each individual in the data set.
findWeights(X, anno, Factor)
X |
A matrix of gene expression values. |
anno |
A dataframe or a matrix containing the annotation of arrays in |
Factor |
A character string corresponding to a column name of |
Note that because calculating the weights involves finding correlations between all genes,
this function might take some time. Hence, recalculating weights is not advisable and
should be avoided. However, the inverse variances can often be used to calculate new weights.
In particular, when $w_i$ denotes the weight of the $i$-th level and $\sigma_i^2$
the variance as calculated from the gene-gene correlations:
$$w_i = \frac{\sigma_i^{-2}}{\sum_{j} \sigma_j^{-2}}$$
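Inverse-variance weighting itself is a one-liner. The sketch assumes each level's weight is its inverse variance divided by the sum of the inverse variances over all levels; inv_var_weights and its input sigma2 are hypothetical.

```r
# Hypothetical helper: normalised inverse-variance weights, one per level.
inv_var_weights <- function(sigma2) {
  inv <- 1 / sigma2
  inv / sum(inv)          # weights sum to 1; lower variance -> larger weight
}

w <- inv_var_weights(c(0.5, 1, 2))
```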
findWeights returns output of the class Weights.
An object of class Weights is a list with the following components:
Weights A list containing the weights of each level of Factor.
Inv.Sigma A list containing the inverse variances of each level of Factor.
Saskia Freytag
Y<-simulateGEdata(500, 500, 10, 2, 5, g=NULL, Sigma.eps=0.1, 250, 100, intercept=FALSE, check.input=FALSE)
anno<-as.matrix(sample(1:4, dim(Y$Y)[1], replace=TRUE))
colnames(anno)<-"Factor"
findWeights(Y$Y, anno, "Factor")
genePlot plots the means vs. the interquartile ranges of the gene
expression values of all genes, with the option of highlighting interesting sets of genes.
genePlot(Y, index = NULL, legend = NULL, col.h = "red", title)
Y |
A matrix of gene expression values or an object of the class |
index |
A vector of indices of genes of interest to be displayed in a different color, if |
legend |
A character string describing the highlighted genes. |
col.h |
The color of the highlighted genes. |
title |
A character string describing the title of the plot. |
genePlot returns a plot.
Saskia Freytag
Y<-simulateGEdata(500, 500, 10, 2, 5, g=NULL, Sigma.eps=0.1, 250, 100, intercept=FALSE, check.input=TRUE)
try(dev.off(), silent=TRUE)
par(mar=c(5.1, 4.1, 4.1, 2.1), mgp=c(3, 1, 0), las=0)
genePlot(Y, index=1:100, legend="Expressed genes", title="IQR-Mean Plot")
histogramPlot plots histograms of correlation values in expression data and
its reference.
histogramPlot( X, Y, legend, breaks = 40, title, col.X = "red", col.Y = "black", line = NULL )
X |
A matrix or a list of matrices of estimated gene-gene correlations. |
Y |
A matrix of reference gene-gene correlations (i.e. known underlying correlation structure). |
legend |
A vector of character strings describing the data contained in |
breaks |
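one of: a vector giving the breakpoints between histogram cells; a function to compute the vector of breakpoints;
a single number giving the number of cells for the histogram; a character string naming an algorithm to compute
the number of cells; or a function to compute the number of cells.
In the last three cases the number is a suggestion only; as the
breakpoints will be set to pretty values, the number is limited to 1e6 (with a warning if it was larger). |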
title |
A character string describing the title. |
col.X |
A vector or character string defining the color/colors associated with the data contained in |
col.Y |
The color associated with the data in |
line |
A vector giving the line type. |
In histogramPlot the default for breaks is 40 (for hist itself the default is "Sturges").
Other names for which algorithms are supplied are "Scott" and "FD" / "Freedman-Diaconis".
Case is ignored and partial matching is used. Alternatively, a function can be supplied which
computes the intended number of breaks or the actual breakpoints as a function of x.
histogramPlot returns a plot.
Saskia Freytag
Y<-simulateGEdata(500, 500, 10, 2, 5, g=NULL, Sigma.eps=0.1, 250, 100, intercept=FALSE, check.input=FALSE)
Y.hat<-RUVNaiveRidge(Y, center=TRUE, nc_index=251:500, 0, 10, check.input=FALSE)
Y.hat.cor<-cor(Y.hat[,1:100])
try(dev.off(), silent=TRUE)
par(mar=c(5.1, 4.1, 4.1, 2.1), mgp=c(3, 1, 0), las=0, mfrow=c(1, 1))
histogramPlot(Y.hat.cor, Y$Sigma[1:100, 1:100], title="Simulated data", legend=c("RUV", "Truth"))
try(dev.off(), silent=TRUE)
histogramPlot(list(Y.hat.cor, cor(Y$Y[, 1:100])), Y$Sigma[1:100, 1:100], title="Simulated data", col.Y="black", legend=c("RUV", "Raw", "Truth"))
is.optimizeParameters checks whether an object is of class optimizeParameters.
is.optimizeParameters(x)
x |
An object. |
is.optimizeParameters returns a logical scalar;
TRUE if the object is of the class optimizeParameters.
Saskia Freytag
Y<-simulateGEdata(500, 500, 10, 2, 5, g=NULL, Sigma.eps=0.1, 250, 100, intercept=FALSE, check.input=FALSE)
opt<-optimizeParameters(Y, kW.hat=c(1,5,10), nu.hat=c(100,1000), nc_index=251:500, methods=c("fnorm"), cpus=1, parallel=FALSE)
opt
is.optimizeParameters(opt)
is.simulateGEdata checks whether an object is of class simulateGEdata.
is.simulateGEdata(x)
x |
An object. |
is.simulateGEdata returns a logical scalar;
TRUE if the object is of the class simulateGEdata.
Saskia Freytag
Y<-simulateGEdata(500, 500, 10, 2, 5, g=NULL, Sigma.eps=0.1, 250, 100, intercept=TRUE, check.input=TRUE)
is.simulateGEdata(Y)
is.Threshold checks whether an object is of class Threshold.
is.Threshold(x)
x |
An object. |
is.Threshold returns a logical scalar;
TRUE if the object is of the class Threshold.
Saskia Freytag
is.Weights checks whether an object is of class Weights.
is.Weights(x)
x |
An object. |
is.Weights returns a logical scalar;
TRUE if the object is of the class Weights.
Saskia Freytag
optimizeParameters returns the optimal parameters to be
used in the removal of unwanted variation procedure when using simulated data.
optimizeParameters( Y, kW.hat = seq(5, 25, 5), nu.hat = c(0, 10, 100, 1000, 10000), nc_index, methods = c("all", "fnorm", "wrong.sign"), cpus = 1, parallel = FALSE, check.input = FALSE )
Y |
An object of the class |
kW.hat |
A vector of integers for |
nu.hat |
A vector of values for |
nc_index |
A vector of indices of the negative controls
used in |
methods |
The method used for quality assessment;
if |
cpus |
A number specifying how many workers to use for parallel computing. |
parallel |
Logical: if |
check.input |
Logical; if |
The simulated data are cleaned by removal of unwanted variation using every combination of the input parameters. The quality of each cleaning is judged either by the squared Frobenius norm of the difference between the correlations estimated from the cleaned data and the known correlations, or by the percentage of correlations estimated to have the wrong sign.
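The search over all parameter combinations is a plain grid search, sketched below under the assumption that quality_fn is a hypothetical function(kW, nu) returning a loss where smaller is better (for example the squared Frobenius distance); grid_search is not part of RUVcorr and its output is simpler than an optimizeParameters object.

```r
# Hypothetical sketch of an exhaustive search over kW and nu.
grid_search <- function(kW.hat, nu.hat, quality_fn) {
  grid <- expand.grid(kW = kW.hat, nu = nu.hat)   # every combination
  grid$loss <- mapply(quality_fn, grid$kW, grid$nu)
  list(results = grid,
       best = grid[which.min(grid$loss), c("kW", "nu")])
}
```

In practice quality_fn would clean the data with RUVNaiveRidge for the given kW and nu and then score the result against the known correlations.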
optimizeParameters returns output of the class
optimizeParameters.
An object of class optimizeParameters is a list containing the
following components:
All.results A matrix of output of the quality assessment for all combinations of input parameters.
Compare.raw A vector of the quality assessment for the uncorrected data.
Optimal.parameter A matrix or a vector giving the optimal parameter combination.
Saskia Freytag
assessQuality, RUVNaiveRidge,
funcPara
Y<-simulateGEdata(500, 500, 10, 2, 5, g=NULL, Sigma.eps=0.1, 250, 100, intercept=FALSE, check.input=FALSE)
opt<-optimizeParameters(Y, kW.hat=c(1,5,10), nu.hat=c(100,1000), nc_index=251:500, methods=c("fnorm"), cpus=1, parallel=FALSE, check.input=TRUE)
opt
PCAPlot generates principal component plots, with the option of coloring
arrays according to a known factor.
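The scores behind such a plot can be obtained with stats::prcomp(), treating arrays (rows) as observations and genes as variables; pc_scores is a hypothetical helper, and PCAPlot may center or scale differently.

```r
# Hypothetical helper: scores of the arrays on the requested components.
pc_scores <- function(Y, comp = c(1, 2)) {
  prcomp(Y, center = TRUE, scale. = FALSE)$x[, comp, drop = FALSE]
}
```

A colored version can then be drawn with, for example, plot(pc_scores(Y$Y), col = as.integer(factor(anno[, "Factor"]))).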
PCAPlot( Y, comp = c(1, 2), anno = NULL, Factor = NULL, numeric = FALSE, new.legend = NULL, title )
Y |
A matrix of gene expression values or an object of class |
comp |
A vector of length 2 specifying which principal components are to be used. |
anno |
A dataframe or a matrix containing the annotation of the arrays. |
Factor |
A character string describing the column name of
|
numeric |
A logical scalar indicating whether |
new.legend |
A vector describing the names used for labelling; if |
title |
A character string giving the title. |
PCAPlot returns a plot.
Saskia Freytag
Y<-simulateGEdata(500, 500, 10, 2, 5, g=NULL, Sigma.eps=0.1, 250, 100, intercept=FALSE, check.input=FALSE)
PCAPlot(Y$Y, title="")
## Create random annotation file
anno<-as.matrix(sample(1:4, dim(Y$Y)[1], replace=TRUE))
colnames(anno)<-"Factor"
try(dev.off(), silent=TRUE)
par(mar=c(5.1, 4.1, 4.1, 2.1), mgp=c(3, 1, 0), las=0, mfrow=c(1, 1))
PCAPlot(Y$Y, anno=anno, Factor="Factor", numeric=TRUE, title="")
plot.optimizeParameters generates a heatmap of the quality assessment values
stored in an object of class optimizeParameters.
## S3 method for class 'optimizeParameters'
plot( x, main = colnames(opt$All.results)[seq(3, dim(opt$All.results)[2], 1)], ... )
x |
An object of the class |
main |
A character string describing the title of the plot. |
... |
Further arguments passed to or from other methods. |
The black point in the heatmap denotes the optimal parameter combination.
plot.optimizeParameters returns a plot.
Saskia Freytag
Y<-simulateGEdata(500, 500, 10, 2, 5, g=2, Sigma.eps=0.1, 250, 100, intercept=FALSE, check.input=FALSE)
opt<-optimizeParameters(Y, kW.hat=c(1,5,10), nu.hat=c(100,100000), nc_index=seq(251,500,1), methods=c("fnorm"), cpus=1, parallel=FALSE)
try(dev.off(), silent=TRUE)
plot(opt, main="Heatmap Plot")
plotDesign returns a plot with differently colored strips representing
different factors relating to the study design.
plotDesign(anno, Factors, anno.names = Factors, orderby = NULL)
anno |
A dataframe or matrix containing the annotation of the study. |
Factors |
A vector of factors that should be plotted. |
anno.names |
A vector containing the names, the default |
orderby |
A character describing an element in |
plotDesign returns a plot.
Saskia Freytag
library(bladderbatch)
data(bladderdata)
expr.meta <- pData(bladderEset)
plotDesign(expr.meta, c("cancer", "outcome", "batch"), c("Diagnosis", "Outcome", "Batch"), orderby="batch")
plotThreshold plots objects of class Threshold.
plotThreshold(x, main = "", legend, col = NULL, ...)
x |
An object of class |
main |
A character string describing the title of the plot. |
legend |
A vector of character strings describing the different |
col |
A vector giving the colors, if |
... |
Further arguments passed to or from other methods. |
plotThreshold returns a plot.
Saskia Freytag
Y<-simulateGEdata(500, 500, 10, 2, 5, g=NULL, Sigma.eps=0.1, 250, 100, intercept=FALSE, check.input=FALSE)
anno<-as.matrix(sample(1:4, dim(Y$Y)[1], replace=TRUE))
colnames(anno)<-"Factor"
weights<-findWeights(Y$Y, anno, "Factor")
Thresh<-calculateThreshold(Y$Y, exclude=1:100, index.ref=1:10, Weights=weights, anno=anno, Factor="Factor")
plotThreshold(Thresh)
print.simulateGEdata is the print method
for objects of the class simulateGEdata.
## S3 method for class 'simulateGEdata'
print(x, ...)
x |
An object of the class |
... |
Further arguments passed to or from other methods. |
print.simulateGEdata prints information about the simulation and
the first 5 rows and 5 columns of all matrices.
Saskia Freytag
Y<-simulateGEdata(500, 500, 10, 2, 5, g=NULL, Sigma.eps=0.1, 250, 100, intercept=TRUE, check.input=FALSE)
Y
prioritise returns a set of genes from a candidate set of genes that are
correlated above a provided threshold with at least one of the provided reference
genes.
prioritise(X, ref_index, cand_index, anno, Factor, Weights, threshold)
X |
A matrix of gene expression values. |
ref_index |
A vector of indices of reference genes. |
cand_index |
A vector of indices of candidate genes. |
anno |
A dataframe or a matrix containing the annotation of arrays in |
Factor |
A character string corresponding to a column name of |
Weights |
An object of class |
threshold |
A value in the range |
prioritise returns a matrix with three columns. The first column gives
the names of the genes that were prioritised, the second column gives the
number of correlations above the threshold for the gene in question, and the
third column gives the sum of the absolute values of all correlations with reference genes
that are above the threshold.
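The prioritisation rule can be sketched as below. The sketch uses plain Pearson correlations (prioritise uses Weights across the levels of Factor), compares absolute correlations with the threshold, and returns gene indices rather than names; prioritise_sketch is hypothetical.

```r
# Hypothetical sketch: keep candidates with at least one |cor| above threshold.
prioritise_sketch <- function(X, ref_index, cand_index, threshold) {
  r <- unname(abs(cor(X[, cand_index, drop = FALSE],
                      X[, ref_index, drop = FALSE])))  # candidates x references
  hits <- r > threshold
  n_hits <- rowSums(hits)
  keep <- n_hits > 0
  data.frame(gene = cand_index[keep],
             n.above = n_hits[keep],                   # number of hits
             sum.abs.cor = rowSums(r * hits)[keep])    # sum of |cor| over hits
}
```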
Saskia Freytag
Y<-simulateGEdata(500, 500, 10, 2, 5, g=NULL, Sigma.eps=0.1, 250, 100, intercept=FALSE, check.input=TRUE)
colnames(Y$Y)<-1:dim(Y$Y)[2]
anno<-as.matrix(sample(1:5, dim(Y$Y)[1], replace=TRUE))
colnames(anno)<-"Factor"
weights<-findWeights(Y$Y, anno, "Factor")
prioritise(Y$Y, 1:10, 51:150, anno, "Factor", weights, 0.6)
RLEPlot generates three different types of
relative log expression plots for high-dimensional data.
RLEPlot( X, Y, center = TRUE, name, title, method = c("IQR.points", "IQR.boxplots", "minmax"), anno = NULL, Factor = NULL, numeric = FALSE, new.legend = NULL, outlier = FALSE )
X |
A matrix of gene expression values. |
Y |
A matrix of gene expression values. |
center |
A logical scalar; |
name |
A vector of characters describing the data contained in
|
title |
A character string describing the title of the plot. |
method |
The type of RLE plot to be displayed; possible inputs are
|
anno |
A dataframe or a matrix containing the annotation of
arrays in |
Factor |
A character string corresponding to a column name of
|
numeric |
A logical scalar indicating whether |
new.legend |
A vector describing the names used for labelling; if |
outlier |
A logical indicating whether outliers should be plotted; only
applicable when |
There are three different RLE plots that can be generated using RLEPlot:
"IQR.points" Median expression vs. interquartile range of every array.
"IQR.boxplots" Boxplots of the 25% and 75% quantiles of all arrays.
"minmax" Ordinary RLE plots for the 5 arrays with the smallest and largest interquartile ranges.
Note that normal RLE plots are not supplied as they are not very suitable for high-dimensional data.
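Relative log expression values are deviations from each gene's median across arrays. The per-array summaries used by "IQR.points" can be sketched as below, assuming this standard definition; rle_summary is a hypothetical helper and RLEPlot's centering option may change the details.

```r
# Hypothetical sketch: RLE deviations summarised per array (row).
rle_summary <- function(Y) {
  dev <- sweep(Y, 2, apply(Y, 2, median))     # subtract each gene's median
  data.frame(median = apply(dev, 1, median),  # one row per array
             iqr = apply(dev, 1, IQR))
}
```

An array that is shifted relative to the others shows up as a median deviation far from zero.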
RLEPlot returns a plot.
Saskia Freytag, Terry Speed
Y<-simulateGEdata(500, 500, 10, 2, 5, g=NULL, Sigma.eps=0.1, 250, 100, intercept=FALSE, check.input=FALSE)
Y.hat<-RUVNaiveRidge(Y, center=TRUE, nc_index=251:500, 0, 10, check.input=TRUE)
try(dev.off(), silent=TRUE)
par(mar=c(5.1, 4.1, 4.1, 2.1), mgp=c(3, 1, 0), las=0)
RLEPlot(Y$Y, Y.hat, name=c("Raw", "RUV"), title="", method="IQR.points")
try(dev.off(), silent=TRUE)
par(mfrow=c(1, 1))
RLEPlot(Y$Y, Y.hat, name=c("Raw", "RUV"), title="", method="IQR.boxplots")
try(dev.off(), silent=TRUE)
RLEPlot(Y$Y, Y.hat, name=c("Raw", "RUV"), title="", method="minmax")
#Create a random annotation file
anno<-as.matrix(sample(1:4, dim(Y.hat)[1], replace=TRUE))
colnames(anno)<-"Factor"
try(dev.off(), silent=TRUE)
RLEPlot(Y$Y, Y.hat, name=c("Raw", "RUV"), title="", method="IQR.points", anno=anno, Factor="Factor", numeric=TRUE)
RUVcorr applies global removal of unwanted variation (a ridged version of RUV) to real and simulated gene expression data.
All gene expression data are assumed to be arranged with rows corresponding to arrays and columns corresponding to genes.
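Many Bioconductor containers use the opposite orientation (for example, Biobase's `exprs()` returns features in rows and samples in columns), so such data need transposing first. The sketch below uses a hypothetical simulated matrix `expr_genes_by_arrays` to show the conversion.

```r
# Hypothetical input stored genes x arrays, as in many Bioconductor objects
set.seed(2)
expr_genes_by_arrays <- matrix(rnorm(1000 * 12), nrow = 1000,
                               dimnames = list(paste0("gene", 1:1000),
                                               paste0("array", 1:12)))

# Transpose so that rows = arrays and columns = genes, as RUVcorr expects
Y <- t(expr_genes_by_arrays)
dim(Y)  # 12 1000
```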
Saskia Freytag
RUVNaiveRidge applies the ridged version of global removal of unwanted variation
to simulated or real gene expression data.
RUVNaiveRidge(Y, center = TRUE, nc_index, nu, kW, check.input = FALSE)

## Default S3 method:
RUVNaiveRidge(Y, center = TRUE, nc_index, nu, kW, check.input = FALSE)

## S3 method for class 'simulateGEdata'
RUVNaiveRidge(Y, center = TRUE, nc_index, nu, kW, check.input = FALSE)
| Argument | Description |
|---|---|
| Y | A matrix of gene expression values or an object of class simulateGEdata. |
| center | A logical scalar; if |
| nc_index | A vector of indices of negative controls. |
| nu | A numeric scalar value of |
| kW | An integer setting the number of dimensions for the estimated noise. |
| check.input | A logical scalar; if |
The parameter kW sets how much noise is removed (the number of estimated noise dimensions), whereas the parameter nu sets the amount of ridging used to account for possible dependence between the noise and the factor of interest.
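To illustrate how kW and nu interact, here is a minimal base-R sketch of a ridged, naive RUV correction. It assumes the unwanted factors W are estimated from the leading kW singular vectors of the negative-control columns and that every gene is then ridge-regressed on W with penalty nu. The function `ruv_ridge_sketch` is hypothetical and is not the RUVcorr source.

```r
# Hypothetical sketch of ridged naive RUV; not the RUVcorr implementation.
# Y: arrays x genes; nc_index: negative-control columns;
# kW: number of unwanted factors; nu: ridge penalty (nu >= 0).
ruv_ridge_sketch <- function(Y, nc_index, nu, kW, center = TRUE) {
  if (center) Y <- scale(Y, center = TRUE, scale = FALSE)
  # Estimate W from the control genes (svd's own nu/nv arguments set how
  # many left/right singular vectors are returned)
  s <- svd(Y[, nc_index, drop = FALSE], nu = kW, nv = 0)
  W <- s$u %*% diag(s$d[seq_len(kW)], nrow = kW)
  # Ridge regression of every gene on W: alpha = (W'W + nu I)^{-1} W'Y
  alpha <- solve(crossprod(W) + nu * diag(kW), crossprod(W, Y))
  # Remove the estimated unwanted variation
  Y - W %*% alpha
}

set.seed(3)
Y <- matrix(rnorm(30 * 10), nrow = 30)  # 30 arrays x 10 genes
Y.clean <- ruv_ridge_sketch(Y, nc_index = 1:3, nu = 0, kW = 3)
```

With nu = 0 this is an ordinary least-squares projection, so controls lying entirely in the span of W are removed; as nu grows, the correction shrinks towards the (centered) input.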
RUVNaiveRidge returns a matrix of the cleaned
(RUV-treated) centered gene expression values.
Saskia Freytag, Laurent Jacob
Jacob L., Gagnon-Bartsch J., Speed T. Correcting gene expression data when neither the unwanted variation nor the factor of interest are observed. Berkeley Technical Reports (2012).
library(RUVcorr)
Y<-simulateGEdata(500, 500, 10, 2, 5, g=NULL, Sigma.eps=0.1, 250, 100, intercept=TRUE, check.input=FALSE)
Y
Y.hat<-RUVNaiveRidge(Y, center=TRUE, nc_index=251:500, nu=0, kW=9, check.input=TRUE)
# Estimated correlations of the first five genes after cleaning
cor(Y.hat[,1:5])
# True correlations of the same genes
Y$Sigma[1:5,1:5]
Y.hat<-RUVNaiveRidge(Y, center=FALSE, nc_index=251:500, nu=0, kW=10, check.input=TRUE)
cor(Y.hat[,1:5])
Y$Sigma[1:5,1:5]
simulateGEdata returns simulated noisy gene expression values of specified size
and its underlying gene-gene correlation.
simulateGEdata(n, m, k, size.alpha, corr.strength, g = NULL, Sigma.eps = 0.1,
  nc, ne, intercept = TRUE, check.input = FALSE)
| Argument | Description |
|---|---|
| n | An integer setting the number of genes. |
| m | An integer setting the number of arrays. |
| k | An integer setting the number of dimensions of the noise term; controls the dimension of |
| size.alpha | A numeric scalar giving the maximal and minimal absolute value of |
| corr.strength | An integer controlling the dimension of |
| g | An integer value between [1, min( |
| Sigma.eps | A numeric scalar setting the amount of random variation in |
| nc | An integer setting the number of negative controls. |
| ne | An integer setting the number of strongly expressed genes. |
| intercept | A logical value indicating whether the systematic noise has an intercept. |
| check.input | A logical scalar; if |
This function generates log2-transformed expression values of n genes in
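m arrays. The expression values consist of true expression and noise:
$$Y = X\beta + W\alpha + \epsilon$$
where $X\beta$ is the true expression, $W\alpha$ is the systematic noise and $\epsilon$ is random noise.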
The dimensions of the matrices $X$ and $\beta$ are used to control the size of the correlation between the genes. It is possible to simulate three different classes
of genes:
correlated genes expressed with true log2-transformed values from 0 to 16
correlated genes expressed with true log2-transformed values with mean 0
uncorrelated genes with true log2-transformed expression equal to 0 (negative controls)
The negative controls are always the last nc genes in the data,
whereas the strongly expressed genes are always the first ne genes in the data.
The parameter intercept controls whether the systematic noise has an
offset. Note that the intercept is one dimension of $W$.
Setting g to NULL simulates data in which $X$ and $W$ are independent;
increasing g increases the correlation between $X$ and $W$.
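The additive structure can be sketched in a few lines of base R. This assumes the model Y = Xβ + Wα + ε with rows = arrays and columns = genes; the helper `sim_sketch`, its argument `p` (number of columns of X) and the chosen distributions are illustrative assumptions, not the simulateGEdata source. X and W are drawn independently here, and the last nc genes get zero loadings on X so they act as negative controls.

```r
# Hypothetical sketch, not the simulateGEdata source:
# Y = X beta + W alpha + eps, rows = arrays, columns = genes.
sim_sketch <- function(m, n, k, p, nc, sigma_eps = 0.1) {
  X     <- matrix(rnorm(m * p), m, p)                 # factor of interest
  beta  <- matrix(rnorm(p * n), p, n)                 # gene loadings on X
  beta[, (n - nc + 1):n] <- 0                         # last nc genes: negative controls
  W     <- matrix(rnorm(m * k), m, k)                 # unwanted factors, independent of X
  alpha <- matrix(runif(k * n, -2, 2), k, n)          # gene loadings on W (illustrative)
  eps   <- matrix(rnorm(m * n, sd = sigma_eps), m, n) # random noise
  truth <- X %*% beta
  list(Truth = truth, Y = truth + W %*% alpha + eps)
}

set.seed(4)
sim <- sim_sketch(m = 50, n = 40, k = 3, p = 5, nc = 10)
```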
simulateGEdata returns output of the class simulateGEdata.
An object of class simulateGEdata is a list with the
following components:
Truth A matrix containing the values of .
Y A matrix containing the values in $Y$.
Noise A matrix containing the values in .
Sigma A matrix containing the true gene-gene correlations, as defined by .
Info A matrix containing some of the general information about the simulation.
Saskia Freytag, Johann Gagnon-Bartsch
Jacob L., Gagnon-Bartsch J., Speed T. Correcting gene expression data when neither the unwanted variation nor the factor of interest are observed. Berkeley Technical Reports (2012).
library(RUVcorr)
# Independent case (g=NULL)
Y<-simulateGEdata(500, 500, 10, 2, 5, g=NULL, Sigma.eps=0.1, 250, 100, intercept=TRUE, check.input=TRUE)
Y
# Correlated case (g=3)
Y<-simulateGEdata(500, 500, 10, 2, 5, g=3, Sigma.eps=0.1, 250, 100, intercept=TRUE, check.input=TRUE)
Y
wcor returns correlations weighted according to a provided object of
class Weights.
wcor(X, anno, Factor, Weights)
| Argument | Description |
|---|---|
| X | A matrix of gene expression values. |
| anno | A data frame or a matrix containing the annotation of the arrays in X. |
| Factor | A character string corresponding to a column name of anno. |
| Weights | An object of class Weights. |
wcor returns a matrix.
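The idea of a weighted gene-gene correlation can be sketched with base R's `stats::cov.wt`. The sketch assumes one weight per level of the annotation factor, supplied as a named numeric vector; the real Weights object returned by findWeights may be structured differently, and `wcor_sketch` is a hypothetical helper, not the RUVcorr implementation.

```r
# Hypothetical sketch, not RUVcorr's wcor(): weighted Pearson correlation
# between genes (columns of X), with one weight per array (row of X).
# Assumes weights are given per factor level as a named numeric vector.
wcor_sketch <- function(X, anno, Factor, level_weights) {
  w <- level_weights[as.character(anno[, Factor])]  # map each array to its weight
  cov.wt(X, wt = unname(w), cor = TRUE)$cor         # cov.wt rescales weights to sum to 1
}

set.seed(5)
X <- matrix(rnorm(40 * 4), nrow = 40)               # 40 arrays x 4 genes
anno <- matrix(sample(1:2, 40, replace = TRUE), ncol = 1,
               dimnames = list(NULL, "Factor"))
wc <- wcor_sketch(X, anno, "Factor", c("1" = 2, "2" = 1))
```

With equal weights for every level, the result reduces to the ordinary `cor(X)`.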
Saskia Freytag
library(RUVcorr)
Y<-simulateGEdata(500, 500, 10, 2, 5, g=NULL, Sigma.eps=0.1, 250, 100, intercept=FALSE, check.input=FALSE)
# Random annotation: one factor with 5 levels, one entry per array
anno<-as.matrix(sample(1:5, dim(Y$Y)[1], replace=TRUE))
colnames(anno)<-"Factor"
weights<-findWeights(Y$Y, anno, "Factor")
wcor(Y$Y[,1:5], anno, "Factor", weights)